legco_ai_assistant/backend/app/utils
Woody d94abaac77 feat: Phase 1.2 ingestion pipeline with chunking and metadata
- Add document parsers (DOCX, PDF) with lazy imports
- Add TokenChunkingStrategy with ABC for future replacement
- Add metadata extraction (filename, upload_date, content_summary)
- Add RAGService for ChromaDB ingestion/retrieval/response generation
- Add POST /api/v1/ingest endpoint with file validation
- Test-first: 20 passed, 2 skipped (python-docx not installed)
2026-04-22 16:49:52 +08:00
..
__init__.py feat: Phase 1.1 project setup with config, database, and models 2026-04-22 16:13:52 +08:00
chunking.py feat: Phase 1.2 ingestion pipeline with chunking and metadata 2026-04-22 16:49:52 +08:00
docx_parser.py feat: Phase 1.2 ingestion pipeline with chunking and metadata 2026-04-22 16:49:52 +08:00
metadata.py feat: Phase 1.2 ingestion pipeline with chunking and metadata 2026-04-22 16:49:52 +08:00
pdf_parser.py feat: Phase 1.2 ingestion pipeline with chunking and metadata 2026-04-22 16:49:52 +08:00