legco_ai_assistant/backend/app/utils
Woody b11d31e2d1 feat: add sentence splitter and highlight data models (Phase 5.4.1-5.4.2)
- sentence_splitter.py: regex-based sentence splitting for English + Chinese punctuation
- highlight.py: 6 Pydantic models (ChunkHighlightTarget, HighlightBatchRequest,
  RelevantSentence, ChunkHighlights, HighlightBatchResult, HighlightBatchResponse)
- 43 tests: 13 sentence splitter + 30 model validation
2026-04-29 09:26:06 +08:00
..
__init__.py feat: Phase 1.1 project setup with config, database, and models 2026-04-22 16:13:52 +08:00
chunking.py feat(backend): add page-aware chunking with adjacent-page overlap 2026-04-24 10:30:18 +08:00
docx_parser.py feat: rewrite DOCX parser with table extraction 2026-04-28 16:42:41 +08:00
metadata.py refactor(test): rewrite tests to comply with integration-first rules 2026-04-27 11:46:58 +08:00
pdf_extractor.py feat(backend): add PDF page extractor and chunk PDF storage config 2026-04-24 10:52:57 +08:00
pdf_parser.py feat(backend): add page-aware PDF parsing with per-page text extraction 2026-04-24 10:30:04 +08:00
sentence_splitter.py feat: add sentence splitter and highlight data models (Phase 5.4.1-5.4.2) 2026-04-29 09:26:06 +08:00
text_to_pdf.py feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3) 2026-04-28 17:32:22 +08:00