legco_ai_assistant/backend/app/models
Woody 14423c773a feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy
8.1 — Core algorithm (test-first):
- qa_chunking.py: preprocess_text, build_structure_detection_prompt,
  parse_llm_structure_response, Section dataclass, split_chinese_qa,
  split_english_qa, build_chunks_from_sections with recursive size split
- QuestionChunkingStrategy in chunking.py with _chunk_metadata tracking
- get_chunking_strategy() factory function
- table_extraction.py: vision LLM extraction, heuristic text fallback,
  disk cache, inject_tables_into_answer
- 18/18 tests pass (LLM parse, regex fast-pass, multi-page, ABC contract,
  size limit, chunk building, preprocess)

8.2 — Metadata enrichment:
- extract_metadata() accepts strategy_type + chunk_metadata params
- Q&A fields (question_id, question_index, section_heading, etc.)
  merged into ChromaDB metadata entries
- DocumentInfo.chunking_strategy + ChunkInfo Q&A fields in models
- 6/6 metadata tests pass

8.3 — Ingest API integration:
- POST /api/v1/ingest accepts ?strategy=token|question
- validate strategy against VALID_CHUNKING_STRATEGIES
- factory creates correct chunker; _chunk_metadata passed to extract_metadata
- 6/6 ingest integration tests pass, zero regressions on existing tests

8.4 — Frontend strategy selector:
- Radio button selector (Token / Question) on RAG Database page
- Strategy passed to ingest mutation via api.ts
- DocumentList: strategy badge (gray/blue)
- ChunkList: Q&A display with question_id, question_text, page range, table badge
- tsc --noEmit clean, vite build successful
2026-05-15 12:44:04 +08:00
..
__init__.py feat: Phase 1.1 project setup with config, database, and models 2026-04-22 16:13:52 +08:00
asr.py feat: Phase 2.1 config + infrastructure and 2.2 video upload backend 2026-05-06 13:08:19 +08:00
common.py feat: structured LLM output for decompose + citation fuzzy matching (Phase 5) 2026-04-28 15:39:17 +08:00
decompose.py feat: configurable SubQuestions via Step 1.2 system prompt page 2026-05-04 17:22:14 +08:00
documents.py feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy 2026-05-15 12:44:04 +08:00
highlight.py feat: track highlight generation prompt, response, and timing in history (Phase 5.5) 2026-04-29 11:18:21 +08:00
history.py feat: track highlight generation prompt, response, and timing in history (Phase 5.5) 2026-04-29 11:18:21 +08:00
ingest.py feat: Sub-Phase 8.0 — config & enums for Q&A-pair chunking strategy 2026-05-15 12:01:28 +08:00
prompts.py feat(prompts): add JSON export/import for profile prompt configurations 2026-04-27 19:44:35 +08:00
query.py feat: Phase 3 — Half Question button, Final Submit rename, ASR text always black 2026-05-14 21:27:21 +08:00
video.py feat: Phase 2.1 config + infrastructure and 2.2 video upload backend 2026-05-06 13:08:19 +08:00