legco_ai_assistant

Woody 8c84062996	2026-04-24 10:52:57 +08:00
feat(backend): add PDF page extractor and chunk PDF storage config

New pdf_extractor.py with extract_page_as_pdf() and extract_pages_as_pdf() for extracting individual PDF pages as separate files. Adds document_chunk_path setting to config and document_chunk/ to .gitignore.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

__init__.py	feat: Phase 1.1 project setup with config, database, and models	2026-04-22 16:13:52 +08:00
chunking.py	feat(backend): add page-aware chunking with adjacent-page overlap	2026-04-24 10:30:18 +08:00
docx_parser.py	refactor(backend): update document parsers for DOCX and PDF	2026-04-23 13:27:08 +08:00
metadata.py	feat(backend): add page_number, chunk_file_path, document_id to chunk metadata	2026-04-24 10:30:40 +08:00
pdf_extractor.py	feat(backend): add PDF page extractor and chunk PDF storage config	2026-04-24 10:52:57 +08:00
pdf_parser.py	feat(backend): add page-aware PDF parsing with per-page text extraction	2026-04-24 10:30:04 +08:00