Document chunked REST realtime implementation, model change to google/chirp-3, language code handling, diagnostic logging, and updated acceptance criteria.
Ultraworked with Sisyphus
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add transcribe-start/complete logs for both providers, error response body logging, and ASR provider in startup log. Filter yue (ISO 639-3) language code from OpenRouter STT requests.
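The language filter can be sketched as below. This is a minimal illustration, not the code in this commit: build_stt_params and UNSUPPORTED_OPENROUTER_LANGS are hypothetical names, and the assumption is that an unsupported code is simply omitted so the provider falls back to its own language detection.

```python
from typing import Dict, Optional

# Assumption: only "yue" is named in the commit; the set form is illustrative.
UNSUPPORTED_OPENROUTER_LANGS = frozenset({"yue"})


def build_stt_params(model: str, language: Optional[str]) -> Dict[str, str]:
    """Build request parameters, leaving out language codes the provider rejects."""
    params = {"model": model}
    if language and language.lower() not in UNSUPPORTED_OPENROUTER_LANGS:
        params["language"] = language
    return params
```

With this shape, a yue request is sent without a language field, while other codes pass through unchanged.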
Add _ws_proxy_openrouter() handler with pcm_to_wav() converter, 3s chunk accumulation, flush_lock concurrency guard, and endpoint dispatch on ASR_PROVIDER. The yue language code is filtered out for OpenRouter, which does not accept ISO 639-3 codes.
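A rough sketch of the accumulate-and-flush idea follows. The audio format (16 kHz, 16-bit, mono) is an assumption, and ChunkBuffer with its send callback are hypothetical names; only pcm_to_wav, the 3s window, and the flush_lock guard come from the commit text.

```python
import asyncio
import io
import wave

SAMPLE_RATE = 16_000  # assumption: 16 kHz input
SAMPLE_WIDTH = 2      # assumption: 16-bit samples
CHANNELS = 1          # assumption: mono
CHUNK_SECONDS = 3
CHUNK_BYTES = SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * CHUNK_SECONDS


def pcm_to_wav(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> bytes:
    """Wrap raw PCM samples in a WAV container held in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


class ChunkBuffer:
    """Collect PCM until about 3 seconds are buffered, then send one WAV chunk."""

    def __init__(self, send):
        self._send = send  # hypothetical async callable taking WAV bytes
        self._pcm = bytearray()
        self._flush_lock = asyncio.Lock()

    async def feed(self, pcm: bytes) -> None:
        self._pcm.extend(pcm)
        if len(self._pcm) >= CHUNK_BYTES:
            await self.flush()

    async def flush(self) -> None:
        # The lock keeps two overlapping flushes from interleaving their sends.
        async with self._flush_lock:
            if not self._pcm:
                return
            pcm, self._pcm = bytes(self._pcm), bytearray()
            await self._send(pcm_to_wav(pcm))
```

The buffer is swapped out before the send is awaited, so audio that arrives during a slow request lands in the next chunk rather than being lost.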
google/gemini-3.1-flash-lite is not an STT model; chirp-3 is one of the 8 supported OpenRouter STT models.
Complete implementation plan with architecture (Factory+Strategy pattern), provider comparison (DashScope vs OpenRouter), configuration, 7 implementation tasks, test plan, acceptance criteria, and implementation notes including decisions made (circular import resolution, separate API key, sync-to-async DashScope wrapper).
Update TestTranscribeFull to use async/await and patch the moved OpenAI import (now in asr_providers.py). Set ASR_PROVIDER=dashscope in test fixtures to ensure tests don't pick up the real .env ASR_PROVIDER value. All 19 Phase 2 + 7 integration tests pass.
- test_phase5_config.py: 6 tests for ASR_PROVIDER validation and default values.
- test_phase5_openrouter_provider.py: 14 tests covering OpenRouterSTT transcription, retry logic, error handling, URL construction, cleanup, and factory dispatch.
- test_phase5_integration.py: 4 tests for full video-to-transcribe flow with both providers (mocked) and per-provider API key validation.
Refactor ASRClient to delegate to provider (DashScopeASRProvider or OpenRouterASRProvider) via create_asr_provider() factory. transcribe_full() now async. Move _to_traditional to asr_providers.py (re-exported from asr_client.py for backward compat). Update video.py router to await transcribe_full() and validate API key per provider (DASHSCOPE_API_KEY for dashscope, OPENROUTER_API_KEY for openrouter).
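The delegation could look roughly like the sketch below. The provider class names and create_asr_provider() come from this commit; the ASRProvider base class, the transcribe() signature, and the placeholder return values are assumptions made for illustration. asyncio.to_thread stands in for the sync-to-async DashScope wrapper mentioned in the plan.

```python
import asyncio
from abc import ABC, abstractmethod


class ASRProvider(ABC):
    """Hypothetical common interface for both providers."""

    @abstractmethod
    async def transcribe(self, audio_path: str) -> str:
        ...


class DashScopeASRProvider(ASRProvider):
    async def transcribe(self, audio_path: str) -> str:
        # Assumption: the blocking SDK call is moved off the event loop.
        return await asyncio.to_thread(self._transcribe_sync, audio_path)

    def _transcribe_sync(self, audio_path: str) -> str:
        return f"[dashscope] {audio_path}"  # placeholder for the real call


class OpenRouterASRProvider(ASRProvider):
    async def transcribe(self, audio_path: str) -> str:
        return f"[openrouter] {audio_path}"  # placeholder for the real call


_PROVIDERS = {
    "dashscope": DashScopeASRProvider,
    "openrouter": OpenRouterASRProvider,
}


def create_asr_provider(name: str) -> ASRProvider:
    """Return the provider registered under `name`, or raise ValueError."""
    try:
        provider_cls = _PROVIDERS[name.strip().lower()]
    except KeyError:
        raise ValueError(f"Unknown ASR_PROVIDER: {name!r}") from None
    return provider_cls()
```

A caller would resolve the provider once from ASR_PROVIDER and then await transcribe() without caring which backend is behind it.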
API URLs now resolve relative to the page origin, working for both local dev (via Vite proxy) and remote production deployments.
Also fixes useFullTranscript, which produced a doubled /api/v1 path when VITE_API_BASE_URL was set.
Add ChunkingStrategy type ('token' | 'question') and wire it through
the ingest pipeline. Users can now choose between traditional token-window
chunking and question-based chunking (Q&A pair detection, table extraction).
Frontend changes:
- RAGDatabasePage: radio buttons for Token vs Question strategy
- DocumentList: strategy badges (blue 'chunked by question' / gray 'chunked by token')
- ChunkList: question-strategy chunks show Q&A metadata (question ID, topic,
  page range, 'contains table' badge) instead of raw page numbers
- api.ts / queries.tsx: pass strategy param to /ingest endpoint
- types/index.ts: new ChunkingStrategy type, new fields on ChunkInfo,
  DocumentInfo, IngestResponse
Insert a zero-gain GainNode between ScriptProcessorNode and
audioContext.destination. The processor stays in the graph (so
onaudioprocess fires on all browsers) but zero volume reaches the
speakers, eliminating the echo/feedback loop during live capture.
Three bugs caused 'Chunk by Question' to silently produce token chunks:
1. QuestionChunkingStrategy.chunk_pages() had a broken event-loop check
   that always skipped LLM structure detection in FastAPI's async context.
   Fixed by making chunk_pages() async and removing the is_running() guard.
2. get_chunking_strategy() factory never passed an LLMClient to
   QuestionChunkingStrategy. Fixed by creating LLMClient in the factory
   with graceful fallback to regex-only when config is incomplete.
3. rag.list_documents() and list_chunks() didn't extract strategy_type
   or Q&A fields from ChromaDB metadata, so the frontend always showed
   chunking_strategy='token' and null Q&A fields. Fixed by reading
   these fields from ChromaDB and routing them through the API.
Also: TokenChunkingStrategy.chunk_pages() made async for consistency
with the question strategy; ingest router updated to await it.
Tests updated (asyncio.run() for sync tests, async mock chunk_pages).
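Bug 1 can be reproduced in isolation with the sketch below. The pre-fix function is a reconstruction from the description above, not the original code, and detect_structure is a hypothetical stand-in for the LLM call.

```python
import asyncio


async def detect_structure(pages):
    """Stand-in for the async LLM structure-detection call."""
    await asyncio.sleep(0)
    return [f"section:{page}" for page in pages]


def chunk_pages_sync(pages):
    """Pre-fix shape (reconstructed): bail out whenever a loop is running."""
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    if loop is not None and loop.is_running():
        return None  # inside an async request handler this branch is always taken
    return asyncio.run(detect_structure(pages))


async def chunk_pages(pages):
    """Post-fix shape: the method is async, so the caller simply awaits it."""
    return await detect_structure(pages)


async def handler():
    """Mimics an async route: a loop is already running here."""
    return chunk_pages_sync(["p1"]), await chunk_pages(["p1"])
```

Called from a plain script the sync version works, which is presumably why the problem only showed up once the code ran under FastAPI.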
8 acceptance tests with real LegCo PDFs (all @pytest.mark.acceptance + @slow).
The tests are skip()'d; run them manually when a real LLM is available:
    pytest app/test/acceptance/test_acceptance_phase8_qa_chunking.py -v -m acceptance
Sub-Phase 8.6 (polish/edge cases) is deferred. Remaining items are
O1-O4 format handling, [如被追問] ("if pressed further") nested Q&A, and the vision loading state.
Core algorithm (8.1-8.4) is test-passing and production-ready.
Without rehype-raw, ReactMarkdown escaped the raw <mark> HTML injected
by highlightTerms(), showing literal tags instead of yellow highlights.
Now 30 marks render with correct bg-yellow-200 (#FEF08A) background.
LegCo documents use multiple formats: 問/答 (question/answer) markers, Q1/Q2 numbering,
section headings like '(1) 住戶的安置補償' ("Rehousing and compensation for residents"),
發言要點 (speaking points) bullet points, and pure table pages. Regex alone cannot
reliably classify all of these.
Changes:
- Primary detection: LLM call identifies ALL section types in one pass
  (qa, narrative, speaking_notes, table, toc, heading_only)
- Regex: downgraded to optional fast-pass optimization for known patterns
- Architecture diagram, algorithm detail, risks, and test plan all updated
- Single model handles structure detection + table extraction + verification
- New risk: vLLM may not support Qwen3.5-35B-A3B vision API depending on version
- Dependencies: added vLLM compatibility note with smoke test snippet
- Heuristic fallback (Option B) works regardless of OpenRouter or vLLM
- qa_vision_enabled toggle provides escape hatch
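One way to picture the regex fast pass sitting in front of the LLM classifier is sketched below. The section type values are the ones listed above; SectionType, regex_fast_pass, classify_section, the marker pattern, and the llm_classify callback are all hypothetical and only illustrate the control flow.

```python
import re
from enum import Enum
from typing import Callable, Optional


class SectionType(str, Enum):
    QA = "qa"
    NARRATIVE = "narrative"
    SPEAKING_NOTES = "speaking_notes"
    TABLE = "table"
    TOC = "toc"
    HEADING_ONLY = "heading_only"


# Assumption: a line starting with 問, 答, or Q<number> plus a delimiter marks Q&A.
_QA_MARKER = re.compile(r"^\s*(?:問|答|Q\d+)\s*[:：.]", re.MULTILINE)


def regex_fast_pass(text: str) -> Optional[SectionType]:
    """Return a type only for patterns the regex is sure about, else None."""
    return SectionType.QA if _QA_MARKER.search(text) else None


def classify_section(text: str, llm_classify: Callable[[str], str]) -> SectionType:
    """Use the fast pass when it is confident, otherwise defer to the LLM."""
    hit = regex_fast_pass(text)
    if hit is not None:
        return hit
    return SectionType(llm_classify(text))
```

Anything the regex does not recognise goes to the model, so the fast pass can stay narrow without affecting coverage.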
Root cause confirmed via vLLM docs, protocol.py source, RFC #19097, and
GitHub test suite: guided_json was removed in v0.12.0. Our fallback to it
after structured_outputs fails is dead code.
Fix strategy: replace _complete_structured_vllm() with two-tier approach
(response_format as Tier 1, structured_outputs as Tier 2), removing the
dead guided_json path and the chat_template_kwargs merge that may conflict.
Evidence from: vllm.ai docs, vllm-project/vllm tests/entrypoints, protocol.py
to_sampling_params(), PRs #7654, #9530, #15627, RFC #19097
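The two-tier strategy might be shaped like the sketch below. The payload layouts are my reading of the vLLM request schema and should be checked against the deployed version; tier1_payload, tier2_payload, complete_structured, and the injected send callable are hypothetical names, not the replacement for _complete_structured_vllm() itself.

```python
import json
from typing import Any, Callable, Dict


def tier1_payload(base: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Tier 1: OpenAI-style response_format carrying a JSON schema."""
    return {
        **base,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "result", "schema": schema},
        },
    }


def tier2_payload(base: Dict[str, Any], schema: Dict[str, Any]) -> Dict[str, Any]:
    """Tier 2: structured_outputs field (assumed shape)."""
    return {**base, "structured_outputs": {"json": schema}}


def complete_structured(
    base: Dict[str, Any],
    schema: Dict[str, Any],
    send: Callable[[Dict[str, Any]], str],
) -> Any:
    """Try each tier in order and return the first response that parses as JSON."""
    last_error = None
    for build in (tier1_payload, tier2_payload):
        try:
            return json.loads(send(build(base, schema)))
        except Exception as exc:  # request failure or invalid JSON
            last_error = exc
    raise RuntimeError("all structured-output tiers failed") from last_error
```

Because there is no third tier, a failure in both paths surfaces as an error instead of silently falling through to a removed parameter.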
DashScope realtime ASR sends utterance-completed (final) events
without incremental deltas. The onmessage handler cleared
partialTranscript on every final, so text never appeared.
Set partialTranscript to full_text on final messages instead
of clearing it, keeping the transcript visible in QueryInput.