legco_ai_assistant

Woody 73c1789698	2026-05-15 14:46:45 +08:00
fix: Q&A chunking always fell back to token — LLM never called, missing API fields

Three bugs caused 'Chunk by Question' to silently produce token chunks:

1. QuestionChunkingStrategy.chunk_pages() had a broken event-loop check that always skipped LLM structure detection in FastAPI's async context. Fixed by making chunk_pages() async and removing the is_running() guard.
2. get_chunking_strategy() factory never passed an LLMClient to QuestionChunkingStrategy. Fixed by creating LLMClient in the factory with graceful fallback to regex-only when config is incomplete.
3. rag.list_documents() and list_chunks() didn't extract strategy_type or Q&A fields from ChromaDB metadata, so the frontend always showed chunking_strategy='token' and null Q&A fields. Fixed by reading these fields from ChromaDB and routing them through the API.

Also: TokenChunkingStrategy.chunk_pages() made async for consistency with the question strategy; ingest router updated to await it. Tests updated (asyncio.run() for sync tests, async mock chunk_pages).
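The first bug in the commit message above can be illustrated with a minimal sketch. This is not the repository's code: `detect_structure`, `chunk_pages_broken`, `chunk_pages_fixed`, and `handler` are hypothetical stand-ins, and the exact shape of the original guard is an assumption based only on the description of an `is_running()` check. The point is that a synchronous method guarding on a running event loop takes the fallback branch every time it is called from an async request handler, whereas an async method can simply await the LLM call.

```python
import asyncio


async def detect_structure(text: str) -> str:
    """Hypothetical stand-in for the LLM structure-detection call."""
    await asyncio.sleep(0)  # simulate an awaitable network call
    return "question"


def chunk_pages_broken(text: str) -> str:
    """Assumed shape of the old guard: sync method that checks for a running loop."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None  # no loop running, e.g. a plain synchronous test
    if loop is not None and loop.is_running():
        # Inside an async handler a loop is always running,
        # so the LLM call is skipped and token chunking is used.
        return "token"
    return asyncio.run(detect_structure(text))


async def chunk_pages_fixed(text: str) -> str:
    """Async version: no guard needed, the caller awaits the result."""
    return await detect_structure(text)


async def handler() -> tuple[str, str]:
    """Hypothetical async request handler calling both variants."""
    return chunk_pages_broken("Q1 ..."), await chunk_pages_fixed("Q1 ...")


if __name__ == "__main__":
    print(asyncio.run(handler()))
```

Under these assumptions the broken variant still reaches the LLM stand-in when called from synchronous code, which may be why such a bug passes sync unit tests and only surfaces inside the async server.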
..
__init__.py	feat: Phase 1.1 project setup with config, database, and models	2026-04-22 16:13:52 +08:00
chunking.py	fix: Q&A chunking always fell back to token — LLM never called, missing API fields	2026-05-15 14:46:45 +08:00
docx_parser.py	feat: rewrite DOCX parser with table extraction	2026-04-28 16:42:41 +08:00
metadata.py	feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy	2026-05-15 12:44:04 +08:00
pdf_extractor.py	feat(backend): add PDF page extractor and chunk PDF storage config	2026-04-24 10:52:57 +08:00
pdf_parser.py	feat(backend): add page-aware PDF parsing with per-page text extraction	2026-04-24 10:30:04 +08:00
qa_chunking.py	feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy	2026-05-15 12:44:04 +08:00
sentence_splitter.py	feat: add sentence splitter and highlight data models (Phase 5.4.1-5.4.2)	2026-04-29 09:26:06 +08:00
table_extraction.py	feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy	2026-05-15 12:44:04 +08:00
text_to_pdf.py	feat(ingest): generate per-chunk PDFs for DOCX/TXT documents (Phase 5.3)	2026-04-28 17:32:22 +08:00