legco_ai_assistant/backend/app/routers
Woody 14423c773a feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy
8.1 — Core algorithm (test-first):
- qa_chunking.py: preprocess_text, build_structure_detection_prompt,
  parse_llm_structure_response, Section dataclass, split_chinese_qa,
  split_english_qa, build_chunks_from_sections with recursive size split
- QuestionChunkingStrategy in chunking.py with _chunk_metadata tracking
- get_chunking_strategy() factory function
- table_extraction.py: vision LLM extraction, heuristic text fallback,
  disk cache, inject_tables_into_answer
- 18/18 tests pass (LLM parse, regex fast-pass, multi-page, ABC contract,
  size limit, chunk building, preprocess)

8.2 — Metadata enrichment:
- extract_metadata() accepts strategy_type + chunk_metadata params
- Q&A fields (question_id, question_index, section_heading, etc.)
  merged into ChromaDB metadata entries
- DocumentInfo.chunking_strategy + ChunkInfo Q&A fields in models
- 6/6 metadata tests pass

8.3 — Ingest API integration:
- POST /api/v1/ingest accepts ?strategy=token|question
- validate strategy against VALID_CHUNKING_STRATEGIES
- factory creates correct chunker; _chunk_metadata passed to extract_metadata
- 6/6 ingest integration tests pass, zero regressions on existing tests

8.4 — Frontend strategy selector:
- Radio button selector (Token / Question) on RAG Database page
- Strategy passed to ingest mutation via api.ts
- DocumentList: strategy badge (gray/blue)
- ChunkList: Q&A display with question_id, question_text, page range, table badge
- tsc --noEmit clean, vite build successful
2026-05-15 12:44:04 +08:00
..
__init__.py feat: Phase 1.1 project setup with config, database, and models 2026-04-22 16:13:52 +08:00
chunks.py fix: vLLM highlight batch failure — replace guided_json with response_format + add debug logging 2026-05-15 11:08:36 +08:00
documents.py feat: add chunk PDF serving endpoint and frontend clickable source links (1.5.6) 2026-04-24 11:49:39 +08:00
history.py feat(history): Phase 3.5 — Query History backend (service, API, timing, XML capture) 2026-04-25 22:59:53 +08:00
ingest.py feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy 2026-05-15 12:44:04 +08:00
prompts.py feat: configurable SubQuestions via Step 1.2 system prompt page 2026-05-04 17:22:14 +08:00
query.py feat: Phase 3 — Half Question button, Final Submit rename, ASR text always black 2026-05-14 21:27:21 +08:00
video.py fix: Phase 2 ASR pipeline — 9 bugs resolved, Full Transcript works end-to-end 2026-05-06 18:26:17 +08:00
ws_asr.py feat: Phase 4 — System Audio & Listen Mic capture into ASR → RAG 2026-05-14 22:55:06 +08:00