Commit Graph

5 Commits

Author SHA1 Message Date
Woody 1699a249b0 feat: Phase 3.5 — YouTube → ASR integration with source toggle
- useYouTubeASR.ts: adapted from useVideoASR, captures audio from HTMLAudioElement
  (hls.js → <audio> → AudioContext.createMediaElementSource → ScriptProcessorNode → WebSocket)
  Play/pause events on videoElement; same return shape as useVideoASR
- LTTPage.tsx: Source toggle (Upload/YouTube tabs), YouTubeInput + YouTubeVideoPlayer
  wired with handleExtractSuccess → handleAudioReady → useYouTubeASR
  Full Transcript button hidden for YouTube source; unified asr variable
- QueryInput.tsx: no changes needed (already supports partialText/value from any source)
- Tests: 18 new (11 useYouTubeASR, 7 LTTPage integration)
- 189/189 total pass (zero regressions)
- Updated plan: 3.5 marked Complete, 5/7 sub-phases done
2026-05-09 17:00:32 +08:00
Woody 78d1f8cc91 feat: delta-based ASR transcript — use text field, utterance boundaries, stash on pause
Replace full_text responses with character-level deltas computed from
DashScope's monotonically-growing 'text' field. Stash-only events (empty
text) are skipped; trailing stash chars sent alongside deltas and
appended on pause to complete final sentences.

Backend:
- Delta = text[len(prev_text):] — simple suffix diff, no merge logic
- Track item_id for utterance boundaries, prepend space separator
- Send stash alongside delta for frontend pause handler

Frontend:
- Accumulate deltas locally (transcriptRef += msg.delta)
- Store lastStashRef from each message
- On pause: append stash to text, fire onFinalTranscript

Plan: .plans/phase2_enhancement_delta_sse.md updated to Complete
2026-05-07 11:26:19 +08:00
Woody cb0ac07786 fix: text accumulation — stashes are sliding windows, merge via overlap detection
DashScope stashes are ~7-char rolling windows, not cumulative. Each partial
event replaces the previous. Completed events rarely sent. This caused text to
jump/replace during streaming and disappear on pause.

Backend:
- Add _merge_stash() — finds overlapping suffix between successive stashes
  and appends only new characters, reconstructing full utterance from partials
- format_transcription_event returns raw stash for read_events to merge
- read_events maintains partial_buffer via _merge_stash, clears on completed
- Guard against empty/whitespace-only stashes

Frontend:
- transcriptRef + onFinalTranscriptRef avoid stale closures in pause handler
- stopStreaming fires onFinalTranscript(currentText) before clearing partial
- Removed blind setPartialTranscript('') that erased text on pause

Tests: 16/16 ws_protocol tests pass, frontend tests unchanged
Plan: Updated phase2_implementation_plan.md to Complete with 11-bug log
2026-05-06 20:06:39 +08:00
Woody fcb9ec1f6c fix: Phase 2 ASR pipeline — 9 bugs resolved, Full Transcript works end-to-end
- Vite proxy: forward /api and /ws to backend port 8000
- WebSocket URL: use backend host, not Vite HMR port
- LTTPage: callback ref replaces useRef (video element always null before)
- ws_asr: pass DashScope API key to OmniRealtimeConversation
- asr_client: fix data_url MIME type (audio/wav), omit extra_body when auto
- useFullTranscript: use absolute URL prefix for fetch
- QueryInput: add value prop for external Full Transcript injection
- QueryInput: fix displayValue || logic (partialText '' overrode question)
- ffmpeg: install static binary for audio extraction
- Integration tests: 7 tests (upload→transcribe flow)
- Acceptance tests: real DashScope tests (skippable)
- Structured logging: ws_asr.py + video.py
2026-05-06 18:26:17 +08:00
Woody a4e067822b feat: Phase 2.3 ASR proxy + full transcript and 2.4 frontend hooks
- Backend: DashScope WebSocket proxy (/ws/asr/{video_id}), DashScopeCallback
  sync-to-async bridge, ffmpeg audio extraction, POST /video/{id}/transcribe
- Frontend: useVideoASR hook (auto on play), useFullTranscript hook,
  QueryInput partialText prop, VideoUploadResponse types, uploadVideo API
- Tests: 41 backend + 26 frontend = 67 new tests, all passing
2026-05-06 13:41:24 +08:00