Commit Graph

231 Commits

Author SHA1 Message Date
Woody 3e1f053f73 docs: update plan status to implemented and add Package 9 API examples to README 2026-05-25 20:27:24 +08:00
Woody 032dd75e17 feat: add Sub-Phase 9.3 evaluation API endpoint and 9.4 polish 2026-05-25 19:30:17 +08:00
Woody 098be359e7 feat: add Sub-Phase 9.2 evaluation engine (CER/WER, key questions, chunk, response) 2026-05-25 18:45:53 +08:00
Woody ac81df0704 feat: add Sub-Phase 9.1 results generation APIs with reusable RAGPipeline 2026-05-25 18:35:55 +08:00
Woody 852430f1f1 feat: add Sub-Phase 9.0 config and Pydantic models for accuracy testing 2026-05-25 18:27:51 +08:00
Woody 7dfd603bc8 chore: update .gitignore and add accuracy testing enhancement plan 2026-05-25 18:14:55 +08:00
Woody c8bcfa0487 docs: update Phase 5 plan with realtime implementation and model fix notes
Document chunked REST realtime implementation, model change to google/chirp-3, language code handling, diagnostic logging, and updated acceptance criteria.

Ultraworked with Sisyphus

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 13:34:25 +08:00
Woody f44b68812d fix: add diagnostic logging and OpenRouter language code filter
Add transcribe-start/complete logs for both providers, error response body logging, and ASR provider in startup log. Filter yue (ISO 639-3) language code from OpenRouter STT requests.

Ultraworked with Sisyphus

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 13:34:06 +08:00
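The language code filter described in f44b68812d can be sketched as below. This is only an illustration: the function name and the set of rejected codes are assumptions, and the commit itself names only yue as a code that OpenRouter does not accept.

```python
from typing import Optional

# Assumption: the commit only names "yue"; a set is used here so that
# further codes could be added without changing the function.
UNSUPPORTED_OPENROUTER_LANGUAGES = frozenset({"yue"})


def filter_language_for_openrouter(language: Optional[str]) -> Optional[str]:
    """Return the language code to send, or None to omit the field."""
    if language is None:
        return None
    code = language.strip().lower()
    if not code or code in UNSUPPORTED_OPENROUTER_LANGUAGES:
        return None
    return code
```

Returning None rather than substituting another code leaves language detection to the provider, which is one plausible reading of "filter" in the commit message.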
Woody cd125d8535 feat: add OpenRouter realtime ASR via chunked REST WebSocket
Add _ws_proxy_openrouter() handler with pcm_to_wav() converter, 3s chunk accumulation, flush_lock concurrency guard, and endpoint dispatch on ASR_PROVIDER. Language code yue filtered for OpenRouter (ISO 639-3 not supported).

Ultraworked with Sisyphus

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 13:33:52 +08:00
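A rough sketch of the pcm_to_wav() converter and the 3s chunk accumulation named in cd125d8535. The audio format (16 kHz, mono, 16-bit) and the ChunkAccumulator class are assumptions made for illustration; the commit does not state them, and the flush_lock concurrency guard is not shown here.

```python
import io
import wave
from typing import Optional


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


class ChunkAccumulator:
    """Hypothetical helper: buffer PCM until about `seconds` of audio exists."""

    def __init__(self, seconds: float = 3.0, sample_rate: int = 16000,
                 channels: int = 1, sample_width: int = 2) -> None:
        self._format = (sample_rate, channels, sample_width)
        self._threshold = int(seconds * sample_rate * channels * sample_width)
        self._buffer = bytearray()

    def add(self, pcm: bytes) -> Optional[bytes]:
        """Append PCM; return a WAV payload once the threshold is reached."""
        self._buffer.extend(pcm)
        if len(self._buffer) < self._threshold:
            return None
        chunk = bytes(self._buffer)
        self._buffer.clear()
        return pcm_to_wav(chunk, *self._format)
```

With these assumed defaults, 3 seconds of audio is 96000 bytes of PCM, so each payload sent to the REST endpoint would be a little under 100 KB.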
Woody 552b4964bf fix: change default OpenRouter STT model to google/chirp-3
google/gemini-3.1-flash-lite is not an STT model; chirp-3 is one of the 8 supported OpenRouter STT models.

Ultraworked with Sisyphus

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 13:33:33 +08:00
Woody 5da74ec24c docs: add Phase 5 OpenRouter ASR implementation plan
Complete implementation plan with architecture (Factory+Strategy pattern), provider comparison (DashScope vs OpenRouter), configuration, 7 implementation tasks, test plan, acceptance criteria, and implementation notes including decisions made (circular import resolution, separate API key, sync-to-async DashScope wrapper).

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 09:49:22 +08:00
Woody 6928fff8ff test: update Phase 2 tests for ASR provider abstraction
Update TestTranscribeFull to use async/await and patch the moved OpenAI import (now in asr_providers.py). Set ASR_PROVIDER=dashscope in test fixtures to ensure tests don't pick up the real .env ASR_PROVIDER value. All 19 Phase 2 + 7 integration tests pass.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 09:48:58 +08:00
Woody 733824c177 test: add Phase 5 ASR provider and integration tests
test_phase5_config.py: 6 tests for ASR_PROVIDER validation and default values. test_phase5_openrouter_provider.py: 14 tests covering OpenRouterSTT transcription, retry logic, error handling, URL construction, cleanup, and factory dispatch. test_phase5_integration.py: 4 tests for full video-to-transcribe flow with both providers (mocked) and per-provider API key validation.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 09:48:37 +08:00
Woody 183fcf7772 refactor: make ASR client and video router provider-aware
Refactor ASRClient to delegate to provider (DashScopeASRProvider or OpenRouterASRProvider) via create_asr_provider() factory. transcribe_full() now async. Move _to_traditional to asr_providers.py (re-exported from asr_client.py for backward compat). Update video.py router to await transcribe_full() and validate API key per provider (DASHSCOPE_API_KEY for dashscope, OPENROUTER_API_KEY for openrouter).

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 09:48:12 +08:00
Woody 39525a2344 feat: add ASR provider config, abstraction layer, and OpenRouter provider
Add ASR_PROVIDER env var (dashscope|openrouter), OPENROUTER_API_KEY, and ASR_OPENROUTER_MODEL to Settings. Create ASRProvider ABC with DashScopeASRProvider (wraps existing OpenAI-based DashScope calls via run_in_executor) and OpenRouterASRProvider (httpx + tenacity retry for batch STT). Add tenacity>=8.0.0 dependency. Realtime WebSocket stays DashScope-only.

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-19 09:47:30 +08:00
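The abstraction layer from 39525a2344 and 183fcf7772 might be shaped roughly as follows. The class names and create_asr_provider() come from the commit messages; the transcribe() method, its signature, and the stubbed return values are assumptions, and the real providers call DashScope and OpenRouter rather than returning placeholder strings.

```python
import asyncio
from abc import ABC, abstractmethod


class ASRProvider(ABC):
    """Strategy interface: every provider exposes the same async call."""

    @abstractmethod
    async def transcribe(self, audio_path: str) -> str:
        ...


class DashScopeASRProvider(ASRProvider):
    def _transcribe_sync(self, audio_path: str) -> str:
        # Placeholder for the existing blocking client call.
        return f"dashscope:{audio_path}"

    async def transcribe(self, audio_path: str) -> str:
        # Sync-to-async wrapper: run the blocking call in a thread pool.
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self._transcribe_sync, audio_path)


class OpenRouterASRProvider(ASRProvider):
    async def transcribe(self, audio_path: str) -> str:
        # Placeholder for an async HTTP request with retries.
        return f"openrouter:{audio_path}"


_PROVIDERS = {
    "dashscope": DashScopeASRProvider,
    "openrouter": OpenRouterASRProvider,
}


def create_asr_provider(name: str) -> ASRProvider:
    """Factory: map the ASR_PROVIDER value to a provider instance."""
    try:
        return _PROVIDERS[name.strip().lower()]()
    except KeyError:
        raise ValueError(f"Unsupported ASR_PROVIDER: {name!r}") from None
```

Callers would then only need `await create_asr_provider(settings_value).transcribe(path)`, which is consistent with transcribe_full() becoming async in the refactor commit.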
Woody 67d2bddeb6 fix: use relative /api/v1 fallback instead of hardcoded localhost:8000
API URLs now resolve relative to the page origin, working for both local dev (via Vite proxy) and remote production deployments.

Also fixes useFullTranscript which had a double /api/v1 path bug when VITE_API_BASE_URL was set.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-18 17:27:28 +08:00
Woody a54d688867 fix: use VITE_API_BASE_URL for highlight endpoints instead of hardcoded localhost
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-18 16:31:16 +08:00
Woody 6678f81283 fix: keep textarea editable during half-question API call
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-18 16:04:14 +08:00
Woody 531e7c435e fix: enable half-question and final-submit buttons during interim ASR text
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-05-18 15:38:48 +08:00
Woody b6f8a522b6 docs: mark Phase 4 audio echo plan as completed 2026-05-18 14:50:59 +08:00
Woody 2d3dc7374d docs: Phase 4 audio echo bug fix plan 2026-05-18 14:47:46 +08:00
Woody d5e7e2d0ca chore: add pnpm config and update lockfile 2026-05-18 14:47:34 +08:00
Woody 1e6e41e426 feat: HTTPS support with nginx reverse proxy
- Add nginx as reverse proxy (HTTP→HTTPS redirect, self-signed cert)
- start.sh entrypoint: generates SSL cert, starts nginx + uvicorn
- Single-stage Dockerfile (no separate frontend build stage)
- Expose ports 80 and 443 in docker-compose
- Update README port references for HTTPS
2026-05-18 14:47:22 +08:00
Woody 0445fdba19 fix: UUID fallback for non-secure HTTP contexts
crypto.randomUUID() is unavailable outside secure contexts (plain HTTP).
Add generateUUID() helper with manual UUID v4 fallback (RFC 4122).
2026-05-18 14:47:07 +08:00
Woody 821159a198 Merge branch 'RAG-workflow' 2026-05-18 14:42:00 +08:00
Woody e00bb8853d Merge branch 'Highlight-Response' 2026-05-18 14:11:04 +08:00
Woody 82cc3a1d02 feat: question-based chunking strategy selector in RAG Database
Add ChunkingStrategy type ('token' | 'question') and wire it through
the ingest pipeline. Users can now choose between traditional token-window
chunking and question-based chunking (Q&A pair detection, table extraction).

Frontend changes:
- RAGDatabasePage: radio buttons for Token vs Question strategy
- DocumentList: strategy badges (blue 'chunked by question' / gray 'chunked by token')
- ChunkList: question-strategy chunks show Q&A metadata (question ID, topic,
  page range, 'contains table' badge) instead of raw page numbers
- api.ts / queries.tsx: pass strategy param to /ingest endpoint
- types/index.ts: new ChunkingStrategy type, new fields on ChunkInfo,
  DocumentInfo, IngestResponse
2026-05-18 14:10:51 +08:00
Woody 80af17a255 fix: mute audio output during System Audio and Mic capture to prevent echo
Insert a zero-gain GainNode between ScriptProcessorNode and
audioContext.destination. The processor stays in the graph (so
onaudioprocess fires on all browsers) but zero volume reaches the
speakers, eliminating the echo/feedback loop during live capture.
2026-05-18 14:04:42 +08:00
Woody 73c1789698 fix: Q&A chunking always fell back to token — LLM never called, missing API fields
Three bugs caused 'Chunk by Question' to silently produce token chunks:

1. QuestionChunkingStrategy.chunk_pages() had a broken event-loop check
   that always skipped LLM structure detection in FastAPI's async context.
   Fixed by making chunk_pages() async and removing the is_running() guard.

2. get_chunking_strategy() factory never passed an LLMClient to
   QuestionChunkingStrategy. Fixed by creating LLMClient in the factory
   with graceful fallback to regex-only when config is incomplete.

3. rag.list_documents() and list_chunks() didn't extract strategy_type
   or Q&A fields from ChromaDB metadata, so the frontend always showed
   chunking_strategy='token' and null Q&A fields. Fixed by reading
   these fields from ChromaDB and routing them through the API.

Also: TokenChunkingStrategy.chunk_pages() made async for consistency
with the question strategy; ingest router updated to await it.
Tests updated (asyncio.run() for sync tests, async mock chunk_pages).
2026-05-15 14:46:45 +08:00
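Bug 1 in 73c1789698 can be reproduced in miniature. The sketch below is an assumption about the shape of the original guard, not the actual code: a synchronous function that skips the LLM whenever an event loop is running will always skip it inside an async request handler, whereas an async function can simply await the call.

```python
import asyncio


async def detect_structure_with_llm(text: str) -> str:
    """Hypothetical stand-in for the LLM structure-detection call."""
    await asyncio.sleep(0)
    return "question"


def chunk_pages_buggy(text: str) -> str:
    """Sync version with a running-loop guard, as the commit describes."""
    try:
        loop_running = asyncio.get_running_loop().is_running()
    except RuntimeError:
        loop_running = False  # no loop in this thread
    if loop_running:
        # Always taken inside an async handler, so the LLM is never called.
        return "token"
    return asyncio.run(detect_structure_with_llm(text))


async def chunk_pages_fixed(text: str) -> str:
    """Async version: no guard needed, the caller awaits it."""
    return await detect_structure_with_llm(text)


async def handler() -> tuple:
    """Simulates a request handler running inside the event loop."""
    return chunk_pages_buggy("page text"), await chunk_pages_fixed("page text")
```

Run from inside the loop, the buggy version returns the token fallback while the fixed version reaches the LLM stub, which matches the silent fallback the commit reports.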
Woody f637ab10a5 Merge branch 'RAG-workflow' 2026-05-15 13:35:54 +08:00
Woody 9bef65de7b test: Sub-Phase 8.5 — acceptance test skeleton for Q&A chunking
8 acceptance tests with real LegCo PDFs (all @pytest.mark.acceptance + @slow).
Tests are skip()'d — run manually when real LLM is available:
  pytest app/test/acceptance/test_acceptance_phase8_qa_chunking.py -v -m acceptance

Sub-Phase 8.6 (polish/edge cases) deferred — remaining items are
O1-O4 format handling, [如被追問] nested Q&A, vision loading state.
Core algorithm (8.1-8.4) is test-passing and production-ready.
2026-05-15 12:45:46 +08:00
Woody 14423c773a feat: Sub-Phases 8.1-8.4 — Q&A-pair chunking strategy
8.1 — Core algorithm (test-first):
- qa_chunking.py: preprocess_text, build_structure_detection_prompt,
  parse_llm_structure_response, Section dataclass, split_chinese_qa,
  split_english_qa, build_chunks_from_sections with recursive size split
- QuestionChunkingStrategy in chunking.py with _chunk_metadata tracking
- get_chunking_strategy() factory function
- table_extraction.py: vision LLM extraction, heuristic text fallback,
  disk cache, inject_tables_into_answer
- 18/18 tests pass (LLM parse, regex fast-pass, multi-page, ABC contract,
  size limit, chunk building, preprocess)

8.2 — Metadata enrichment:
- extract_metadata() accepts strategy_type + chunk_metadata params
- Q&A fields (question_id, question_index, section_heading, etc.)
  merged into ChromaDB metadata entries
- DocumentInfo.chunking_strategy + ChunkInfo Q&A fields in models
- 6/6 metadata tests pass

8.3 — Ingest API integration:
- POST /api/v1/ingest accepts ?strategy=token|question
- validate strategy against VALID_CHUNKING_STRATEGIES
- factory creates correct chunker; _chunk_metadata passed to extract_metadata
- 6/6 ingest integration tests pass, zero regressions on existing tests

8.4 — Frontend strategy selector:
- Radio button selector (Token / Question) on RAG Database page
- Strategy passed to ingest mutation via api.ts
- DocumentList: strategy badge (gray/blue)
- ChunkList: Q&A display with question_id, question_text, page range, table badge
- tsc --noEmit clean, vite build successful
2026-05-15 12:44:04 +08:00
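The strategy validation from sub-phases 8.0 and 8.3 could look something like this. ChunkingStrategyType and VALID_CHUNKING_STRATEGIES are named in the commits; resolve_strategy() and its error message are hypothetical, and the real router would presumably turn the error into an HTTP 4xx response.

```python
from typing import Literal, Optional, get_args

ChunkingStrategyType = Literal["token", "question"]

# Derive the runtime set from the Literal so the two cannot drift apart.
VALID_CHUNKING_STRATEGIES = frozenset(get_args(ChunkingStrategyType))


def resolve_strategy(strategy: Optional[str], default: str = "token") -> str:
    """Normalize the ?strategy= query value and reject unknown names."""
    chosen = (strategy or default).strip().lower()
    if chosen not in VALID_CHUNKING_STRATEGIES:
        allowed = ", ".join(sorted(VALID_CHUNKING_STRATEGIES))
        raise ValueError(
            f"Unknown chunking strategy {chosen!r}; expected one of: {allowed}"
        )
    return chosen
```

Defaulting to token when the parameter is absent keeps existing clients working, in line with the "default token, non-breaking" note in sub-phase 8.0.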
Woody c8a9c857f7 Merge branch 'Highlight-Response' 2026-05-15 12:05:17 +08:00
Woody 62db325f02 fix: add rehype-raw to ReactMarkdown so ==term== <mark> HTML renders
Without rehype-raw, ReactMarkdown escaped the raw <mark> HTML injected
by highlightTerms(), showing literal tags instead of yellow highlights.
Now 30 marks render with correct bg-yellow-200 (#FEF08A) background.
2026-05-15 12:05:07 +08:00
Woody ef10b937cf feat: Sub-Phase 8.0 — config & enums for Q&A-pair chunking strategy
Backend:
- Add 6 Q&A chunking config fields to Settings (default_chunking_strategy,
  qa_vision_enabled, qa_max_chunk_tokens, qa_structure_model,
  qa_include_internal_refs, qa_cache_vision_results)
- Define ChunkingStrategyType Literal + VALID_CHUNKING_STRATEGIES frozenset
- Add strategy field to IngestResponse (default token, non-breaking)
- Add IngestRequest model with strategy param
- Update .env.example with new env vars

Frontend:
- Add ChunkingStrategy type ('token' | 'question')
- Extend IngestResponse, DocumentInfo, ChunkInfo with Q&A fields

Tests:
- test_qa_chunking_config_defaults — all defaults verified
- test_qa_chunking_config_from_env — env var overrides verified

Plan fix: renamed qa_verification_model → qa_structure_model to match
LLM-first architecture
2026-05-15 12:01:28 +08:00
Woody 6bf04cedb1 docs: Package 8 — switch to LLM-first structure detection (not regex-first)
LegCo documents use multiple formats (問/答 markers, Q1/Q2 numbering,
section headings like '(1) 住戶的安置補償', 發言要點 bullet points,
and pure table pages). Regex alone cannot reliably classify all these.

Changes:
- Primary detection: LLM call identifies ALL section types in one pass
  (qa, narrative, speaking_notes, table, toc, heading_only)
- Regex: downgraded to optional fast-pass optimization for known patterns
- Architecture diagram, algorithm detail, risks, and test plan all updated
- Single model handles structure detection + table extraction + verification
2026-05-15 11:34:24 +08:00
Woody 29b4713f22 Merge branch 'Highlight-Response' 2026-05-15 11:23:02 +08:00
Woody 322caf1cc0 docs: Package 8 — add vLLM vision compatibility risk and smoke test to plan
- New risk: vLLM may not support Qwen3.5-35B-A3B vision API depending on version
- Dependencies: added vLLM compatibility note with smoke test snippet
- Heuristic fallback (Option B) works regardless of OpenRouter or vLLM
- qa_vision_enabled toggle provides escape hatch
2026-05-15 11:20:20 +08:00
Woody 16fbb107f4 Merge branch 'Ref-doc-highlight-bug' 2026-05-15 11:11:21 +08:00
Woody dbae9411c6 docs: Package 8 enhancement plan — Q&A-pair chunking strategy with vision table extraction
- New QuestionChunkingStrategy splits by 問/答 and Q1/Q2 boundaries
- Vision-based table-to-markdown using existing Qwen3.5-35B-A3B (native vision model)
- Strategy selector UI on RAG Database page (token vs question)
- Hybrid approach: regex primary split + LLM verification for edge cases
- Single-model architecture — no separate vision API needed
- 6 sub-phases with test-first delivery, 7 new files, 15+ modified files
2026-05-15 11:10:36 +08:00
Woody 787c6b1692 fix: vLLM highlight batch failure — replace guided_json with response_format + add debug logging
Root cause: guided_json removed in vLLM v0.12.0, and the two-attempt
loop (structured_outputs → guided_json) merged chat_template_kwargs
into the extra_body, potentially causing param conflicts.

Changes:
- llm_client.py: Replace _complete_structured_vllm() with two-tier
  approach — response_format (Tier 1, v0.6.4+) then structured_outputs
  (Tier 2, v0.8+). Remove dead guided_json path. Add _strip_markdown_fence().

- chunk_highlight_service.py: Add complete() fallback as defense-in-depth
  when structured output fails. Strip markdown fences before parsing.

- chunks.py: Add request/response logging at router level.

- chunk_highlight_service.py: Add full logging chain — entry, ChromaDB
  fetch, LLM call, fallback, cache results, exit.

- ResponsePanel.tsx: Add console logging for request payload, response
  status/errors/timing. Handle status=failed explicitly (was silently
  ignored). Track round-trip timing via performance.now().
2026-05-15 11:08:36 +08:00
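A minimal sketch of what a helper like _strip_markdown_fence() from 787c6b1692 might do before JSON parsing. The regex and the parse_json_reply() wrapper are assumptions; the commit only says that markdown fences are stripped before parsing.

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence marker

# Optional language tag after the opening fence, body, then closing fence.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[\w-]*[ \t]*\r?\n(.*?)\r?\n?[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_markdown_fence(text: str) -> str:
    """Return the fenced body if the reply is one fenced block, else the text."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


def parse_json_reply(text: str) -> dict:
    """Hypothetical wrapper: tolerate fenced or bare JSON from the model."""
    return json.loads(strip_markdown_fence(text))
```

This only handles a reply that consists of a single fenced block; prose around the fence would still fail to parse and would need the complete() fallback path the commit mentions.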
Woody e78f53b687 feat: Phase 7.2 — wire highlightTerms into ResponsePanel + mark CSS
- Add HighlightMark component rendering <mark class="bg-yellow-200...">
- Call highlightTerms() in SubQuestionSection and FlatResponse before ReactMarkdown
- Add mark: HighlightMark to ReactMarkdown components in both paths
- Add .prose mark CSS rule (yellow-200 bg, rounded, px-0.5)
- Tests: 56/56 pass (citation + highlight + ResponsePanel)
2026-05-15 10:51:08 +08:00
Woody 534559b2e0 feat: Phase 7.1 — highlight prompt template + sequential citation [N] + highlightTerms parser
- Backend: add ==term== highlighting instruction to _SEED_GENERATE_PER_SUBQ
- Frontend: replaceFilename output with sequential [1] [2] [3] numbering
- Frontend: add highlightTerms() to convert ==term== to <mark> HTML
- Tests: 39 citation+highlight tests pass (28 updated + 11 new)
- Fix: QueryInput partialText styling and disabled state
2026-05-15 10:46:55 +08:00
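The ==term== to <mark> conversion from 534559b2e0 lives in the frontend, so the following is only the same transformation restated in Python for illustration. The regex is an assumption: it requires a non-space character directly inside each pair of markers so that ordinary comparisons such as "a == b" are left alone.

```python
import re

# "==" + text that starts and ends with a non-space character + "==".
_HIGHLIGHT_RE = re.compile(r"==(\S(?:.*?\S)?)==")


def highlight_terms(text: str) -> str:
    """Replace each ==term== span with a <mark> element."""
    return _HIGHLIGHT_RE.sub(r"<mark>\1</mark>", text)
```

The output is raw HTML, which is why the later rehype-raw fix was needed for ReactMarkdown to render the marks instead of escaping them.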
Woody c3392989dc docs: vLLM highlight failure fix plan — confirmed guided_json removed in v0.12.0
Root cause confirmed via vLLM docs, protocol.py source, RFC #19097, and
GitHub test suite: guided_json was removed in v0.12.0. Our fallback to it
after structured_outputs fails is dead code.

Fix strategy: replace _complete_structured_vllm() with two-tier approach
(response_format as Tier 1, structured_outputs as Tier 2), removing the
dead guided_json path and the chat_template_kwargs merge that may conflict.

Evidence from: vllm.ai docs, vllm-project/vllm tests/entrypoints, protocol.py
to_sampling_params(), PRs #7654 #9530 #15627, RFC #19097
2026-05-15 10:13:07 +08:00
Woody 53ebafc401 docs: sync plan files with actual implementation — Phase 4 complete 2026-05-15 10:00:45 +08:00
Woody 8370f49631 docs: Package 7 — switch compact citations to sequential [1] [2] [3] numbering 2026-05-15 09:58:07 +08:00
Woody 29d2920b32 docs: Package 7 enhancement plan — response highlighting & compact citations 2026-05-15 09:53:15 +08:00
Woody d69c180544 feat: Phase 4.8-4.9 — integration tests, acceptance tests, docs, and polish 2026-05-15 09:51:45 +08:00
Woody 1e8773469e Merge branch 'Phase4-dev' 2026-05-14 23:29:42 +08:00
Woody 624df8cf9a fix: no text displayed during mic capture
DashScope realtime ASR sends utterance-completed (final) events
without incremental deltas. The onmessage handler cleared
partialTranscript on every final, so text never appeared.

Set partialTranscript to full_text on final messages instead
of clearing it, keeping the transcript visible in QueryInput.
2026-05-14 23:25:39 +08:00