legco_ai_assistant/backend/app/utils
Woody f4fa577fb0 feat(backend): add page-aware PDF parsing with per-page text extraction
Add parse_pdf_by_page(), which returns List[Tuple[int, str]] with 1-indexed page numbers. Pages with no extractable text are skipped. Follows the same error handling as the existing parse_pdf().

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 10:30:04 +08:00
__init__.py feat: Phase 1.1 project setup with config, database, and models 2026-04-22 16:13:52 +08:00
chunking.py feat: Phase 1.2 ingestion pipeline with chunking and metadata 2026-04-22 16:49:52 +08:00
docx_parser.py refactor(backend): update document parsers for DOCX and PDF 2026-04-23 13:27:08 +08:00
metadata.py fix(backend): preserve original filename in chunk metadata instead of temp file name 2026-04-24 10:14:58 +08:00
pdf_parser.py feat(backend): add page-aware PDF parsing with per-page text extraction 2026-04-24 10:30:04 +08:00
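
The commit message above describes the return contract of parse_pdf_by_page(): a List[Tuple[int, str]] with 1-indexed page numbers, omitting pages that yield no text. The actual implementation in pdf_parser.py is not shown in this listing, so the following is only a minimal sketch of that contract. The helper name index_pages is hypothetical, and it takes already-extracted page strings instead of calling a PDF library. Stripping whitespace and treating whitespace-only pages as empty are assumptions, and the error handling shared with parse_pdf() is not reproduced.

```python
from typing import Iterable, List, Optional, Tuple


def index_pages(page_texts: Iterable[Optional[str]]) -> List[Tuple[int, str]]:
    """Pair each non-empty page text with its 1-indexed page number.

    Hypothetical helper: `page_texts` stands in for whatever a PDF library
    returns per page (a string, an empty string, or None).
    """
    pages: List[Tuple[int, str]] = []
    for page_number, text in enumerate(page_texts, start=1):
        # Assumption: whitespace-only pages count as "no extractable text".
        cleaned = (text or "").strip()
        if not cleaned:
            # Skip the page; later pages keep their original page numbers.
            continue
        pages.append((page_number, cleaned))
    return pages


# Pages 2 and 3 have no text, so only pages 1 and 4 appear in the result.
sample = ["Cover page", "", None, "  Minutes of meeting  "]
result = index_pages(sample)
```

Because skipped pages do not shift the numbering, each tuple still points at the page's position in the source PDF, which is what a caller would need in order to cite the page a chunk came from.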