Commit Graph

5 Commits

Author SHA1 Message Date
Woody 78d1f8cc91 feat: delta-based ASR transcript — use text field, utterance boundaries, stash on pause
Replace full_text responses with character-level deltas computed from
DashScope's monotonically-growing 'text' field. Stash-only events (empty
text) are skipped; trailing stash chars sent alongside deltas and
appended on pause to complete final sentences.

Backend:
- Delta = text[len(prev_text):] — simple suffix diff, no merge logic
- Track item_id for utterance boundaries, prepend space separator
- Send stash alongside delta for frontend pause handler

Frontend:
- Accumulate deltas locally (transcriptRef += msg.delta)
- Store lastStashRef from each message
- On pause: append stash to text, fire onFinalTranscript

Plan: .plans/phase2_enhancement_delta_sse.md updated to Complete
2026-05-07 11:26:19 +08:00
Woody cb0ac07786 fix: text accumulation — stashes are sliding windows, merge via overlap detection
DashScope stashes are ~7-char rolling windows, not cumulative. Each partial
event replaces the previous. Completed events rarely sent. This caused text to
jump/replace during streaming and disappear on pause.

Backend:
- Add _merge_stash() — finds overlapping suffix between successive stashes
  and appends only new characters, reconstructing full utterance from partials
- format_transcription_event returns raw stash for read_events to merge
- read_events maintains partial_buffer via _merge_stash, clears on completed
- Guard against empty/whitespace-only stashes

Frontend:
- transcriptRef + onFinalTranscriptRef avoid stale closures in pause handler
- stopStreaming fires onFinalTranscript(currentText) before clearing partial
- Removed blind setPartialTranscript('') that erased text on pause

Tests: 16/16 ws_protocol tests pass, frontend tests unchanged
Plan: Updated phase2_implementation_plan.md to Complete with 11-bug log
2026-05-06 20:06:39 +08:00
Woody fcb9ec1f6c fix: Phase 2 ASR pipeline — 9 bugs resolved, Full Transcript works end-to-end
- Vite proxy: forward /api and /ws to backend port 8000
- WebSocket URL: use backend host, not Vite HMR port
- LTTPage: callback ref replaces useRef (video element always null before)
- ws_asr: pass DashScope API key to OmniRealtimeConversation
- asr_client: fix data_url MIME type (audio/wav), omit extra_body when auto
- useFullTranscript: use absolute URL prefix for fetch
- QueryInput: add value prop for external Full Transcript injection
- QueryInput: fix displayValue || logic (partialText '' overrode question)
- ffmpeg: install static binary for audio extraction
- Integration tests: 7 tests (upload→transcribe flow)
- Acceptance tests: real DashScope tests (skippable)
- Structured logging: ws_asr.py + video.py
2026-05-06 18:26:17 +08:00
Woody a4e067822b feat: Phase 2.3 ASR proxy + full transcript and 2.4 frontend hooks
- Backend: DashScope WebSocket proxy (/ws/asr/{video_id}), DashScopeCallback
  sync-to-async bridge, ffmpeg audio extraction, POST /video/{id}/transcribe
- Frontend: useVideoASR hook (auto on play), useFullTranscript hook,
  QueryInput partialText prop, VideoUploadResponse types, uploadVideo API
- Tests: 41 backend + 26 frontend = 67 new tests, all passing
2026-05-06 13:41:24 +08:00
Woody 9934749d2b feat: Phase 2.1 config + infrastructure and 2.2 video upload backend
- Add DashScope ASR and video upload config fields to Settings
- Create Pydantic models (video.py, asr.py)
- Create VideoService with validation, save, serve, delete
- Create ASR client stub with float32_to_s16le utility
- Implement POST /api/v1/video/upload with streaming validation
- Implement GET /api/v1/video/{video_id} with FileResponse
- Create WebSocket ASR endpoint stub
- Register new routers in main.py
- Update .env.example and requirements.txt
- Add reference examples for DashScope integration
- 8 tests passing (3 config + 5 video upload)
2026-05-06 13:08:19 +08:00