Phase 2 Enhancement: Use text field instead of stash
Created: 2026-05-07
Status: Planning
Depends on: Phase 2 (Complete)
1. Discovery
Stash log analysis revealed DashScope partial events contain TWO text fields:
| Field | Behavior | Description |
|---|---|---|
| `stash` | Sliding window, ~7-20 chars, replaced on each event | Latest characters recognized (raw ASR output) |
| `text` | Monotonically growing, never shrinks | Stable cumulative transcription of the current utterance |
Event sequence example:
stash="多" text="" ← text empty early on
stash="多谢" text=""
stash="多谢主席咁啊" text=""
stash="主席咁啊,亦" text="" ← stash slides, text still empty
stash="咁啊,亦都多谢" text="多谢主席咁啊亦" ← text starts populating
stash="都多谢邱主任" text="多谢主席咁啊亦都多谢邱主任头先..."
stash="point嘅详细介绍" text="多谢主席咁啊亦都多谢邱主任头先个诶powerpoint嘅详细介绍..."
text grows monotonically — it's the stable transcription. stash slides as new audio arrives.
2. Why This Is Better
- No `_merge_stash` needed: `text` is already cumulative per utterance
- No overlap detection: characters never change once set
- No risk of wrong merging: stashes sometimes overlap incorrectly (the sliding window may lose context)
- Simpler code: less logic, less surface area for bugs
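To make the merging risk concrete, here is a minimal sketch that replays the stash values from the event sequence in section 1 through a naive suffix/prefix overlap merge. `naive_merge` is a hypothetical stand-in written for illustration; it is not the project's `_merge_stash`, whose implementation is not shown in this document.

```python
def naive_merge(buffer: str, stash: str) -> str:
    """Hypothetical stand-in for a stash merge (NOT the project's _merge_stash).

    Appends `stash` to `buffer`, dropping the longest prefix of `stash`
    that already appears as a suffix of `buffer`.
    """
    for k in range(min(len(buffer), len(stash)), 0, -1):
        if buffer.endswith(stash[:k]):
            return buffer + stash[k:]
    return buffer + stash  # no overlap found: blind append


# Stash values copied from the event sequence in section 1.
stashes = [
    "多",
    "多谢",
    "多谢主席咁啊",
    "主席咁啊,亦",
    "咁啊,亦都多谢",
    "都多谢邱主任",
    "point嘅详细介绍",
]

merged = ""
for stash in stashes:
    merged = naive_merge(merged, stash)

print(merged)
```

With these sampled values, the last stash shares no overlap with the buffer, so it is appended blindly and the characters between "邱主任" and "point" that `text` does contain ("头先个诶power") never reach the merged result. The listed sequence may be an excerpt of the real event stream, but any merge of this kind is exposed to the same gap whenever the window slides past unseen characters.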
3. Changes Required
3.1 Backend: format_transcription_event (ws_asr.py)
# BEFORE: extract stash field
if event_type == "...transcription.text":
    stash = event.get("stash", "")
    return {"delta": "", "stash": stash, ...}

# AFTER: extract text field
if event_type == "...transcription.text":
    text = event.get("text", "")
    return {"delta": "", "text": text, ...}
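A self-contained sketch of the AFTER branch may help when reviewing the change. Everything here is an assumption layered on the snippet above: the function name `format_partial_event`, the `"type"` key, and the suffix match (the full event type is elided in this document, so the sketch does not guess its prefix). Only the `delta` and `text` keys of the result are shown; the elided remainder is left out.

```python
from typing import Any, Optional

# Placeholder only: the source elides the full event type ("...transcription.text"),
# so this sketch matches on the visible suffix instead of guessing the prefix.
PARTIAL_EVENT_SUFFIX = "transcription.text"


def format_partial_event(event: dict[str, Any]) -> Optional[dict[str, str]]:
    """Hypothetical, reduced version of the AFTER branch in 3.1."""
    # Assumption: the event type is stored under the "type" key.
    event_type = str(event.get("type") or "")
    if not event_type.endswith(PARTIAL_EVENT_SUFFIX):
        return None  # other event types are out of scope for this sketch
    # `or ""` also covers an explicit None value for "text".
    text = event.get("text") or ""
    return {"delta": "", "text": text}


print(format_partial_event({"type": "x.transcription.text", "stash": "多谢", "text": ""}))
```

The `stash` value is ignored on purpose: after this change it would only matter for the stash log, not for the payload passed on to `read_events`.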
3.2 Backend: read_events (ws_asr.py)
# BEFORE: merge stashes
else:
    stash = result.pop("stash", "")
    if stash.strip():
        partial_buffer = _merge_stash(partial_buffer, stash)
    display = build_display_text(accumulated_text, partial_buffer)

# AFTER: use text directly (already cumulative)
else:
    text = result.pop("text", "")
    if text.strip():
        partial_buffer = text  # text already cumulative
    display = build_display_text(accumulated_text, partial_buffer)
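The effect of the AFTER branch can be sketched in isolation by replaying a few partial results. `apply_partial` is a hypothetical wrapper around the branch, and this `build_display_text` is a stand-in that simply concatenates; the real helper's joining rules are not shown in this document. The sample `text` values are shortened versions of those in section 1.

```python
def build_display_text(accumulated_text: str, partial_buffer: str) -> str:
    """Hypothetical stand-in: the real helper's joining rules are not shown."""
    return accumulated_text + partial_buffer


def apply_partial(accumulated_text: str, partial_buffer: str, result: dict) -> tuple[str, str]:
    """Mirror the AFTER branch: overwrite the buffer with the cumulative text."""
    text = result.pop("text", "")
    if text.strip():
        partial_buffer = text  # replace, do not append: text is already cumulative
    return partial_buffer, build_display_text(accumulated_text, partial_buffer)


partial_buffer = ""
displays = []
# Shortened sample values based on the event sequence in section 1.
for text in ["", "多谢主席咁啊亦", "多谢主席咁啊亦都多谢邱主任"]:
    partial_buffer, display = apply_partial("", partial_buffer, {"delta": "", "text": text})
    displays.append(display)

print(displays)
```

Because an empty or whitespace-only `text` leaves `partial_buffer` untouched, the display stays as it was until `text` starts populating, which matches the early events in section 1 where `text` is still empty.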
3.3 Backend: Remove _merge_stash function
No longer needed.
3.4 Backend: Tests (test_phase2_ws_protocol.py)
- Replace `TestMergeStash` class with `TestTextFieldFormatting`
- Update partial event tests to use `text` field instead of `stash`
- Verify monotonic growth (text never shrinks character-by-character)
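One way the monotonic-growth check could look is sketched below. It assumes `unittest`-style test classes and introduces a hypothetical helper, `is_monotonic_growth`, that treats "never shrinks" as "each partial `text` starts with the previous one". The real tests would exercise the code paths in `ws_asr.py` rather than hard-coded strings.

```python
import unittest


def is_monotonic_growth(texts: list[str]) -> bool:
    """True if every text extends (or equals) the one before it."""
    return all(cur.startswith(prev) for prev, cur in zip(texts, texts[1:]))


class TestTextFieldFormatting(unittest.TestCase):
    """Sketch only: the real tests would exercise the ws_asr.py code paths."""

    def test_text_grows_monotonically(self):
        # Shortened sample values based on the event sequence in section 1.
        texts = ["", "多谢主席咁啊亦", "多谢主席咁啊亦都多谢邱主任"]
        self.assertTrue(is_monotonic_growth(texts))

    def test_sliding_stash_is_not_monotonic(self):
        # Two consecutive stash values from section 1: the window slides.
        stashes = ["多谢主席咁啊", "主席咁啊,亦"]
        self.assertFalse(is_monotonic_growth(stashes))
```

The second test documents why the same assertion could never hold for `stash`, which keeps the rationale for the switch visible in the test suite.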
3.5 Backend: Stash log format
Update log to capture both fields for future debugging:
_stash_logger.info(
    "seq=%d elapsed_ms=%d stash_len=%d text_len=%d stash=%r text=%r ...",
    ...
)
4. What Does NOT Change
- Frontend (`useVideoASR.ts`): already handles `full_text` correctly, no changes needed
- Frontend (`QueryInput.tsx`): unchanged
- Pause/stop logic: unchanged
- Completed event handling: unchanged (completed events already use the `transcript` field)
- `partial_buffer` variable: still used, just populated from `text` instead of merged stashes
5. Files Changed
| File | Change |
|---|---|
| `backend/app/routers/ws_asr.py` | Remove `_merge_stash()`, use `text` field, update stash logging |
| `backend/app/test/test_phase2_ws_protocol.py` | Replace merge tests with text-field tests |
6. Acceptance Criteria
- `text` field used for partial events instead of `stash`
- `_merge_stash` function removed
- Text displayed in QueryInput grows monotonically (no jumping/replacing)
- All 16 ws_protocol tests pass (updated)
- Text persists on pause (existing behavior, unchanged)
- Stash log captures both `stash` and `text` fields for reference
7. Rollback Risk
Low. Only two files change, both in the backend; the frontend is untouched. If the `text` field behaves unexpectedly, revert to the `_merge_stash` approach, which is already committed.