# Phase 2 Enhancement: Use `text` field instead of `stash`

**Created:** 2026-05-07
**Status:** Planning
**Depends on:** Phase 2 (Complete)

---

## 1. Discovery

Stash log analysis revealed DashScope partial events contain TWO text fields:

| Field | Behavior | Description |
|-------|----------|-------------|
| `stash` | Sliding window, ~7-20 chars, replaces on each event | Latest characters recognized (raw ASR output) |
| `text` | Monotonically growing, **never shrinks** | Stable cumulative transcription of current utterance |

```
Event sequence example:
  stash="多"                text=""                    ← text empty early on
  stash="多谢"              text=""
  stash="多谢主席咁啊"      text=""
  stash="主席咁啊,亦"      text=""                    ← stash slides, text still empty
  stash="咁啊,亦都多谢"    text="多谢主席咁啊亦"      ← text starts populating
  stash="都多谢邱主任"      text="多谢主席咁啊亦都多谢邱主任头先..."
  stash="point嘅详细介绍"   text="多谢主席咁啊亦都多谢邱主任头先个诶powerpoint嘅详细介绍..."
```

`text` grows monotonically: it is the stable transcription. `stash` slides as new audio arrives.

## 2. Why This Is Better

- **No `_merge_stash` needed**: `text` is already cumulative per utterance
- **No overlap detection**: characters never change once set
- **No risk of wrong merging**: stashes sometimes overlap incorrectly (sliding window may lose context)
- **Simpler code**: less logic, less surface area for bugs

## 3. Changes Required

### 3.1 Backend: `format_transcription_event` (ws_asr.py)

```python
# BEFORE: extract stash field
if event_type == "...transcription.text":
    stash = event.get("stash", "")
    return {"delta": "", "stash": stash, ...}

# AFTER: extract text field
if event_type == "...transcription.text":
    text = event.get("text", "")
    return {"delta": "", "text": text, ...}
```

### 3.2 Backend: `read_events` (ws_asr.py)

```python
# BEFORE: merge stashes
else:
    stash = result.pop("stash", "")
    if stash.strip():
        partial_buffer = _merge_stash(partial_buffer, stash)
    display = build_display_text(accumulated_text, partial_buffer)

# AFTER: use text directly (already cumulative)
else:
    text = result.pop("text", "")
    if text.strip():
        partial_buffer = text  # text already cumulative
    display = build_display_text(accumulated_text, partial_buffer)
```

### 3.3 Backend: Remove `_merge_stash` function

No longer needed.

### 3.4 Backend: Tests (`test_phase2_ws_protocol.py`)

- Replace `TestMergeStash` class with `TestTextFieldFormatting`
- Update partial event tests to use `text` field instead of `stash`
- Verify monotonic growth (text never shrinks character-by-character)

### 3.5 Backend: Stash log format

Update log to capture both fields for future debugging:

```python
_stash_logger.info(
    "seq=%d elapsed_ms=%d stash_len=%d text_len=%d stash=%r text=%r ...",
    ...
)
```

## 4. What Does NOT Change

- Frontend (`useVideoASR.ts`): already handles `full_text` correctly, no changes needed
- Frontend (`QueryInput.tsx`): unchanged
- Pause/stop logic: unchanged
- Completed event handling: unchanged (completed events already use `transcript` field)
- `partial_buffer` variable: still used, just populated from `text` instead of merged stashes

## 5. Files Changed

| File | Change |
|------|--------|
| `backend/app/routers/ws_asr.py` | Remove `_merge_stash()`, use `text` field, update stash logging |
| `backend/app/test/test_phase2_ws_protocol.py` | Replace merge tests with text-field tests |

## 6. Acceptance Criteria

- [ ] `text` field used for partial events instead of `stash`
- [ ] `_merge_stash` function removed
- [ ] Text displayed in QueryInput grows monotonically (no jumping/replacing)
- [ ] All 16 ws_protocol tests pass (updated)
- [ ] Text persists on pause (existing behavior, unchanged)
- [ ] Stash log captures both `stash` and `text` fields for reference

## 7. Rollback Risk

Low. Only 2 files changed, only backend. Frontend untouched. If `text` field behaves unexpectedly, revert to `_merge_stash` approach (already committed).
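
One way to notice the "`text` behaves unexpectedly" case early is to check that each partial `text` value extends the previous one within a single utterance. The following is a minimal sketch, not part of the existing code: the helper name `first_non_monotonic` is hypothetical, and it assumes the caller passes only the partial `text` values of one utterance (the buffer resets after a completed event). Empty values are skipped, mirroring the `if text.strip()` guard in `read_events`.

```python
from typing import Iterable, Optional, Tuple


def first_non_monotonic(texts: Iterable[str]) -> Optional[Tuple[int, str, str]]:
    """Return (index, previous, current) for the first partial `text` that
    does not extend the previous non-empty one, or None if every value
    grows monotonically.

    Hypothetical helper for illustration; `texts` is assumed to hold the
    partial `text` values of a single utterance, in arrival order.
    """
    previous = ""
    for index, current in enumerate(texts):
        # Early partial events carry an empty `text`; ignore them.
        if not current.strip():
            continue
        # Monotonic growth means the new value starts with the old one.
        if not current.startswith(previous):
            return index, previous, current
        previous = current
    return None


# Values shaped like the event sequence in section 1 grow monotonically.
partials = ["", "", "多谢主席咁啊亦", "多谢主席咁啊亦都多谢邱主任"]
assert first_non_monotonic(partials) is None

# A value that shrinks is reported with its position.
assert first_non_monotonic(["abcd", "abc"]) == (1, "abcd", "abc")
```

A check like this could back the monotonic-growth test in `TestTextFieldFormatting`, or run over captured stash logs before deciding whether a rollback is needed.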