Add _ws_proxy_openrouter() handler with pcm_to_wav() converter, 3s chunk accumulation, flush_lock concurrency guard, and endpoint dispatch on ASR_PROVIDER. Language code yue filtered for OpenRouter (ISO 639-3 not supported).
Ultraworked with Sisyphus
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
google/gemini-3.1-flash-lite is not an STT model; chirp-3 is one of the 8 supported OpenRouter STT models.
Ultraworked with Sisyphus
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Complete implementation plan with architecture (Factory+Strategy pattern), provider comparison (DashScope vs OpenRouter), configuration, 7 implementation tasks, test plan, acceptance criteria, and implementation notes including decisions made (circular import resolution, separate API key, sync-to-async DashScope wrapper).
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Update TestTranscribeFull to use async/await and patch the moved OpenAI import (now in asr_providers.py). Set ASR_PROVIDER=dashscope in test fixtures to ensure tests don't pick up the real .env ASR_PROVIDER value. All 19 Phase 2 + 7 integration tests pass.
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
test_phase5_config.py: 6 tests for ASR_PROVIDER validation and default values. test_phase5_openrouter_provider.py: 14 tests covering OpenRouterSTT transcription, retry logic, error handling, URL construction, cleanup, and factory dispatch. test_phase5_integration.py: 4 tests for full video-to-transcribe flow with both providers (mocked) and per-provider API key validation.
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Refactor ASRClient to delegate to provider (DashScopeASRProvider or OpenRouterASRProvider) via create_asr_provider() factory. transcribe_full() now async. Move _to_traditional to asr_providers.py (re-exported from asr_client.py for backward compat). Update video.py router to await transcribe_full() and validate API key per provider (DASHSCOPE_API_KEY for dashscope, OPENROUTER_API_KEY for openrouter).
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
API URLs now resolve relative to the page origin, working for both local dev (via Vite proxy) and remote production deployments.
Also fixes useFullTranscript which had a double /api/v1 path bug when VITE_API_BASE_URL was set.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Add ChunkingStrategy type ('token' | 'question') and wire it through
the ingest pipeline. Users can now choose between traditional token-window
chunking and question-based chunking (Q&A pair detection, table extraction).
Frontend changes:
- RAGDatabasePage: radio buttons for Token vs Question strategy
- DocumentList: strategy badges (blue 'chunked by question' / gray 'chunked by token')
- ChunkList: question-strategy chunks show Q&A metadata (question ID, topic,
page range, 'contains table' badge) instead of raw page numbers
- api.ts / queries.tsx: pass strategy param to /ingest endpoint
- types/index.ts: new ChunkingStrategy type, new fields on ChunkInfo,
DocumentInfo, IngestResponse
Insert a zero-gain GainNode between ScriptProcessorNode and
audioContext.destination. The processor stays in the graph (so
onaudioprocess fires on all browsers) but zero volume reaches the
speakers, eliminating the echo/feedback loop during live capture.
Three bugs caused 'Chunk by Question' to silently produce token chunks:
1. QuestionChunkingStrategy.chunk_pages() had a broken event-loop check
that always skipped LLM structure detection in FastAPI's async context.
Fixed by making chunk_pages() async and removing the is_running() guard.
2. get_chunking_strategy() factory never passed an LLMClient to
QuestionChunkingStrategy. Fixed by creating LLMClient in the factory
with graceful fallback to regex-only when config is incomplete.
3. rag.list_documents() and list_chunks() didn't extract strategy_type
or Q&A fields from ChromaDB metadata, so the frontend always showed
chunking_strategy='token' and null Q&A fields. Fixed by reading
these fields from ChromaDB and routing them through the API.
Also: TokenChunkingStrategy.chunk_pages() made async for consistency
with the question strategy; ingest router updated to await it.
Tests updated (asyncio.run() for sync tests, async mock chunk_pages).
8 acceptance tests with real LegCo PDFs (all @pytest.mark.acceptance + @slow).
Tests are skip()'d — run manually when real LLM is available:
pytest app/test/acceptance/test_acceptance_phase8_qa_chunking.py -v -m acceptance
Sub-Phase 8.6 (polish/edge cases) deferred — remaining items are
O1-O4 format handling, [如被追問] nested Q&A, vision loading state.
Core algorithm (8.1-8.4) is test-passing and production-ready.
Without rehype-raw, ReactMarkdown escaped the raw <mark> HTML injected
by highlightTerms(), showing literal tags instead of yellow highlights.
Now 30 marks render with correct bg-yellow-200 (#FEF08A) background.
LegCo documents use multiple formats (問/答 markers, Q1/Q2 numbering,
section headings like '(1) 住戶的安置補償', 發言要點 bullet points,
and pure table pages). Regex alone cannot reliably classify all these.
Changes:
- Primary detection: LLM call identifies ALL section types in one pass
(qa, narrative, speaking_notes, table, toc, heading_only)
- Regex: downgraded to optional fast-pass optimization for known patterns
- Architecture diagram, algorithm detail, risks, and test plan all updated
- Single model handles structure detection + table extraction + verification
- New risk: vLLM may not support Qwen3.5-35B-A3B vision API depending on version
- Dependencies: added vLLM compatibility note with smoke test snippet
- Heuristic fallback (Option B) works regardless of OpenRouter or vLLM
- qa_vision_enabled toggle provides escape hatch
Root cause confirmed via vLLM docs, protocol.py source, RFC #19097, and
GitHub test suite: guided_json was removed in v0.12.0. Our fallback to it
after structured_outputs fails is dead code.
Fix strategy: replace _complete_structured_vllm() with two-tier approach
(response_format as Tier 1, structured_outputs as Tier 2), removing the
dead guided_json path and the chat_template_kwargs merge that may conflict.
Evidence from: vllm.ai docs, vllm-project/vllm tests/entrypoints, protocol.py
to_sampling_params(), PRs #7654#9530#15627, RFC #19097
DashScope realtime ASR sends utterance-completed (final) events
without incremental deltas. The onmessage handler cleared
partialTranscript on every final, so text never appeared.
Set partialTranscript to full_text on final messages instead
of clearing it, keeping the transcript visible in QueryInput.
useMediaStreamASR cleanup() cleared partialTranscript on stop,
causing live ASR text to vanish from QueryInput. Unlike video
ASR (which has onFinalTranscript to persist via queryText),
mic and system-audio hooks rely on partialTranscript for
display. Keep partialTranscript populated with the final
transcript instead of clearing it.
Adds two new live audio sources alongside file Upload:
- System Audio: getDisplayMedia() captures system/tab audio output,
pipes through WebSocket → DashScope realtime ASR → RAG.
- Listen Mic: getUserMedia() captures microphone input via the same
audio pipeline (shared useMediaStreamASR hook).
Backend: feature toggles (system_audio_enabled, mic_enabled) in
config.py, source query param gating in ws_asr.py, 10 config tests.
Bug fix: getDisplayMedia() rejected video:false per W3C spec —
changed to video:true then stop video tracks to allow audio-only
capture on Windows/macOS Chrome.
isDisabled, handleSubmit, and Half Question onClick all checked
question.trim() instead of displayValue.trim(). Since question state
is only updated on onFinalTranscript (complete sentences), interim
ASR delta text shown in the textarea via partialText was invisible
to the disabled check — buttons stayed disabled until sentence end.
Fix: use displayValue which includes partialText when user hasn't typed.
- Backend: add stop_after_decompose flag to QueryRequest, early-return
after decomposition in SSE stream with half_question:true event
- Frontend: add decomposeOnly method to useQueryDocumentStream hook
- QueryInput: remove grey italic from ASR partial text, rename Submit
to Final Submit, add gray Half Question button that decomposes
without clearing querybox text
- LTTPage: wire handleHalfQuestion to decomposeOnly
Reverts commits 284028b through b4096d6. Phase 4 (System Audio Capture)
will replace the YouTube use case with a more versatile getDisplayMedia approach.
Removed: YouTube router, HLS proxy, YouTubeService, YouTubeInput,
YouTubeVideoPlayer, useYouTubeASR hook, all Phase 3 tests, hls.js dep,
YouTube config fields, YouTube README/plan sections.
Modified files restored to pre-Phase-3 state: LTTPage (no source toggle),
api.ts (no YouTube extract), types (no YouTube types), config.py (no
youtube fields), main.py (no YouTube router), requirements.txt (no yt-dlp),
.env.example (no YouTube vars), package.json (no hls.js).
Relevant Phase 2 code preserved: ws_asr.py (unchanged), useVideoASR,
VideoPlayer, VideoUpload, QueryInput, Full Transcript.