legco_ai_assistant


Woody b2dd385443 feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation		2026-04-24 10:53:17 +08:00

PDF uploads now use parse_pdf_by_page() -> chunk_pages() -> extract page PDFs -> enhanced metadata with page_number, chunk_file_path, and document_id. Same-filename replacement deletes old chunks and PDFs before re-ingest. DOCX/TXT keep original flat flow with document_id added. RAGService.ingest_document() accepts optional document_id parameter.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
core	feat(backend): add PDF page extractor and chunk PDF storage config	2026-04-24 10:52:57 +08:00
models	feat(backend): add documents CRUD service methods and Pydantic schemas	2026-04-23 19:02:07 +08:00
routers	feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation	2026-04-24 10:53:17 +08:00
services	feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation	2026-04-24 10:53:17 +08:00
test	feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation	2026-04-24 10:53:17 +08:00
utils	feat(backend): add PDF page extractor and chunk PDF storage config	2026-04-24 10:52:57 +08:00
__init__.py	feat: Phase 1.1 project setup with config, database, and models	2026-04-22 16:13:52 +08:00
main.py	feat(backend): add documents CRUD endpoints and tests	2026-04-23 19:02:28 +08:00
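
The latest commit message describes page-aware chunking, where each chunk carries page_number, chunk_file_path, and document_id. The following is a minimal sketch of that idea, not the repository's code: the `Chunk` dataclass, the `chunk_pages_sketch` function, its parameters, the fixed-size character split, and the `chunks/<document_id>_page_<n>.pdf` path layout are all assumptions made for illustration. The real `parse_pdf_by_page()` and `chunk_pages()` signatures are not shown in this listing, so the sketch takes already-extracted page texts as input.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    document_id: str
    page_number: int      # 1-based page the chunk was taken from
    chunk_file_path: str  # hypothetical location of the single-page PDF
    text: str


def chunk_pages_sketch(pages, document_id, max_chars=500, pdf_dir="chunks"):
    """Split each page's text into chunks that never cross a page boundary.

    `pages` is a list of page texts in document order (assumed to be what a
    per-page PDF parser would return).
    """
    chunks = []
    for page_number, page_text in enumerate(pages, start=1):
        text = page_text.strip()
        if not text:
            continue  # blank pages produce no chunks
        # Assumed naming scheme: one extracted PDF per source page.
        path = f"{pdf_dir}/{document_id}_page_{page_number}.pdf"
        # Naive fixed-width split; a real chunker would likely respect
        # sentence or token boundaries.
        for start in range(0, len(text), max_chars):
            chunks.append(
                Chunk(document_id, page_number, path, text[start:start + max_chars])
            )
    return chunks


pages = ["First page text.", "", "Third page " + "x" * 20]
chunks = chunk_pages_sketch(pages, "doc-1", max_chars=16)
for c in chunks:
    print(c.page_number, c.chunk_file_path, repr(c.text))
```

Because every chunk keeps its own page_number and chunk_file_path, a retrieval hit can be traced back to one page and its extracted PDF. Sharing a document_id across chunks is also what would make a delete-then-re-ingest replacement straightforward.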