Three bugs caused 'Chunk by Question' to silently produce token chunks: 1. QuestionChunkingStrategy.chunk_pages() had a broken event-loop check that always skipped LLM structure detection in FastAPI's async context. Fixed by making chunk_pages() async and removing the is_running() guard. 2. get_chunking_strategy() factory never passed an LLMClient to QuestionChunkingStrategy. Fixed by creating LLMClient in the factory with graceful fallback to regex-only when config is incomplete. 3. rag.list_documents() and list_chunks() didn't extract strategy_type or Q&A fields from ChromaDB metadata, so the frontend always showed chunking_strategy='token' and null Q&A fields. Fixed by reading these fields from ChromaDB and routing them through the API. Also: TokenChunkingStrategy.chunk_pages() made async for consistency with the question strategy; ingest router updated to await it. Tests updated (asyncio.run() for sync tests, async mock chunk_pages). |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| chunking.py | ||
| docx_parser.py | ||
| metadata.py | ||
| pdf_extractor.py | ||
| pdf_parser.py | ||
| qa_chunking.py | ||
| sentence_splitter.py | ||
| table_extraction.py | ||
| text_to_pdf.py | ||