legco_ai_assistant/backend/app/services
Woody b2dd385443 feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation
PDF uploads now run parse_pdf_by_page() -> chunk_pages() -> per-page PDF extraction, and each chunk carries enhanced metadata: page_number, chunk_file_path, and document_id. Uploading a file with an existing filename deletes the old chunks and PDFs before re-ingest. DOCX/TXT uploads keep the original flat flow, with document_id added. RAGService.ingest_document() accepts an optional document_id parameter.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 10:53:17 +08:00
| File | Last commit | Date |
| --- | --- | --- |
| __init__.py | feat: Phase 1.1 project setup with config, database, and models | 2026-04-22 16:13:52 +08:00 |
| embedding_client.py | feat(backend): add embedding client and update LLM client | 2026-04-23 13:26:43 +08:00 |
| llm_client.py | debug(backend): add LLM request/response logging for OpenRouter debugging | 2026-04-23 16:28:43 +08:00 |
| query_decomposer.py | fix(backend): extract JSON from markdown code blocks in LLM responses | 2026-04-23 16:28:07 +08:00 |
| rag.py | feat(backend): refactor ingest pipeline for page-aware chunking with PDF generation | 2026-04-24 10:53:17 +08:00 |
| relevance_filter.py | fix(backend): extract JSON from markdown code blocks in LLM responses | 2026-04-23 16:28:07 +08:00 |
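
The page-aware ingest flow described in the latest commit message can be illustrated with a minimal sketch. This is not the code in rag.py: `InMemoryStore`, `Chunk`, `ingest_pdf`, the fixed-size splitting, and the `chunks/<document_id>/page_<n>.pdf` naming scheme are all hypothetical stand-ins, and the input is a list of already-parsed page texts rather than a real PDF. Only the metadata keys (`page_number`, `chunk_file_path`, `document_id`) and the same-filename replacement behavior come from the commit message. Deleting the old page PDFs from disk is not modeled here.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the vector store used by the real service."""
    chunks: list = field(default_factory=list)

    def delete_by_filename(self, filename: str) -> int:
        """Drop every chunk that came from `filename`; return how many were removed."""
        before = len(self.chunks)
        self.chunks = [c for c in self.chunks if c.metadata["filename"] != filename]
        return before - len(self.chunks)


def chunk_pages(pages, max_chars=200):
    """Split each page's text into fixed-size pieces, keeping the 1-based page number.

    Assumption: the real chunk_pages() may split differently (e.g. by tokens).
    """
    for page_number, text in enumerate(pages, start=1):
        for start in range(0, len(text), max_chars):
            yield page_number, text[start:start + max_chars]


def ingest_pdf(store, filename, pages, document_id=None):
    """Ingest parsed page texts; re-ingesting a filename replaces its old chunks."""
    document_id = document_id or str(uuid.uuid4())
    store.delete_by_filename(filename)  # same-filename replacement
    for page_number, piece in chunk_pages(pages):
        store.chunks.append(Chunk(
            text=piece,
            metadata={
                "filename": filename,
                "document_id": document_id,
                "page_number": page_number,
                # Assumed naming scheme for the extracted single-page PDF.
                "chunk_file_path": f"chunks/{document_id}/page_{page_number}.pdf",
            },
        ))
    return document_id
```

Calling `ingest_pdf(store, "minutes.pdf", pages)` twice with the same filename leaves only the chunks from the second call in the store, each tagged with the page it came from, so an answer can be traced back to a single page.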