Phase 2 Enhancement: Use `text` field instead of `stash`

Created: 2026-05-07 | Status: Planning | Depends on: Phase 2 (complete)

1. Discovery

Analysis of the stash log revealed that DashScope partial events carry two text fields:

| Field | Behavior | Description |
|---|---|---|
| `stash` | Sliding window of ~7-20 characters; replaced on each event | Latest recognized characters (raw ASR output) |
| `text` | Grows monotonically; never shrinks | Stable cumulative transcription of the current utterance |

Event sequence example:

```
  stash="多"               text=""             ← text empty early on
  stash="多谢"              text=""
  stash="多谢主席咁啊"        text=""
  stash="主席咁啊，亦"        text=""             ← stash slides, text still empty
  stash="咁啊，亦都多谢"      text="多谢主席咁啊亦"  ← text starts populating
  stash="都多谢邱主任"        text="多谢主席咁啊亦都多谢邱主任头先..."
  stash="point嘅详细介绍"    text="多谢主席咁啊亦都多谢邱主任头先个诶powerpoint嘅详细介绍..."
```

`text` grows monotonically: it is the stable transcription, although it can stay empty for the first few events of an utterance. `stash` slides forward as new audio arrives.
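The monotonicity claim can be checked mechanically. This is a minimal sketch, assuming "monotonic" means every non-empty value starts with the previous non-empty one; `is_prefix_chain` is a hypothetical helper name. It replays the excerpt above, and because the two `text` entries ending in `...` are truncated in the log, only their visible part is used.

```python
def is_prefix_chain(values):
    """Return True if every non-empty value starts with the previous non-empty value."""
    previous = ""
    for value in values:
        if not value:
            continue  # empty partials (early in an utterance) are skipped
        if not value.startswith(previous):
            return False
        previous = value
    return True


# Values from the excerpt; truncated log entries keep only their visible part.
stash_values = ["多", "多谢", "多谢主席咁啊", "主席咁啊，亦", "咁啊，亦都多谢", "都多谢邱主任", "point嘅详细介绍"]
text_values = ["", "", "", "", "多谢主席咁啊亦", "多谢主席咁啊亦都多谢邱主任头先",
               "多谢主席咁啊亦都多谢邱主任头先个诶powerpoint嘅详细介绍"]

print(is_prefix_chain(stash_values))  # False: the window slides past earlier characters
print(is_prefix_chain(text_values))   # True
```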

2. Why This Is Better

No `_merge_stash` needed: `text` is already cumulative per utterance
No overlap detection: characters do not change once they appear in `text`
No risk of wrong merges: consecutive stashes sometimes overlap incorrectly, because the sliding window can lose context
Simpler code: less logic and less surface area for bugs
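The merging risk can be seen with a deliberately naive overlap merge. The function below is a hypothetical illustration, not the project's `_merge_stash`: it joins each new stash at the longest suffix/prefix overlap with the buffer. It is fed the excerpted stashes as if they were consecutive events, which is an assumption (the excerpt skips events, since `text` contains words such as "头先个诶power" that no excerpted stash shows). When two stashes share no overlap, whatever slid out of the window between them is silently lost.

```python
def naive_overlap_merge(buffer: str, stash: str) -> str:
    """Append `stash` to `buffer`, reusing the longest suffix/prefix overlap.

    Hypothetical illustration only; not the project's _merge_stash.
    """
    for k in range(min(len(buffer), len(stash)), 0, -1):
        if buffer.endswith(stash[:k]):
            return buffer + stash[k:]
    return buffer + stash  # no overlap found: plain concatenation


stashes = ["多", "多谢", "多谢主席咁啊", "主席咁啊，亦", "咁啊，亦都多谢", "都多谢邱主任", "point嘅详细介绍"]
buffer = ""
for stash in stashes:
    buffer = naive_overlap_merge(buffer, stash)

print(buffer)  # 多谢主席咁啊，亦都多谢邱主任point嘅详细介绍
```

The merged buffer jumps from "邱主任" straight to "point", whereas the logged `text` for the same stretch contains "邱主任头先个诶powerpoint" with no merge step at all.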

3. Changes Required

3.1 Backend: `format_transcription_event` (ws_asr.py)

```python
# BEFORE: extract stash field
if event_type == "...transcription.text":
    stash = event.get("stash", "")
    return {"delta": "", "stash": stash, ...}

# AFTER: extract text field
if event_type == "...transcription.text":
    text = event.get("text", "")
    return {"delta": "", "text": text, ...}
```

3.2 Backend: `read_events` (ws_asr.py)

```python
# BEFORE: merge stashes
else:
    stash = result.pop("stash", "")
    if stash.strip():
        partial_buffer = _merge_stash(partial_buffer, stash)
    display = build_display_text(accumulated_text, partial_buffer)

# AFTER: use text directly (already cumulative)
else:
    text = result.pop("text", "")
    if text.strip():
        partial_buffer = text       # text already cumulative
    display = build_display_text(accumulated_text, partial_buffer)
```
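To see how the partial branch interacts with completed events, here is a minimal sketch of the per-connection state. `TranscriptState` is a hypothetical name, `display()` stands in for `build_display_text` under the assumption that it concatenates, and resetting `partial_buffer` on completion is also an assumption, since the plan does not show the completed-event code.

```python
from dataclasses import dataclass


@dataclass
class TranscriptState:
    """Hypothetical stand-in for the read_events state (not the project's code)."""

    accumulated_text: str = ""  # text of completed utterances
    partial_buffer: str = ""    # cumulative `text` of the current utterance

    def on_partial(self, text: str) -> str:
        # `text` is already cumulative, so it replaces the buffer; no merging.
        if text.strip():
            self.partial_buffer = text
        return self.display()

    def on_completed(self, transcript: str) -> str:
        # Assumption: completion appends the final transcript and clears the partial.
        self.accumulated_text += transcript
        self.partial_buffer = ""
        return self.display()

    def display(self) -> str:
        # Assumption: build_display_text concatenates; the real helper may differ.
        return self.accumulated_text + self.partial_buffer


state = TranscriptState()
for text in ["", "多谢主席咁啊亦", "多谢主席咁啊亦都多谢邱主任"]:
    print(state.on_partial(text))
```

Because an empty `text` never overwrites the buffer, the first few events of a new utterance (where `text` is still empty, as in the excerpt) show only the completed text until `text` starts populating.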

3.3 Backend: Remove `_merge_stash` function

No longer needed: `text` already carries the cumulative partial transcription.

3.4 Backend: Tests (`test_phase2_ws_protocol.py`)

Replace the `TestMergeStash` class with `TestTextFieldFormatting`
Update partial-event tests to use the `text` field instead of `stash`
Verify monotonic growth: within an utterance, each `text` value extends the previous one and never shrinks
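A sketch of what `TestTextFieldFormatting` might cover, written in pytest style (plain `assert` in a `Test*` class), which is an assumption about the existing test file. `format_partial` is a hypothetical stand-in for the AFTER branch of `format_transcription_event`; it omits the other payload fields that the plan elides with `...`, and the sample events are built from the visible part of the logged excerpt.

```python
def format_partial(event: dict) -> dict:
    # Hypothetical stand-in for the AFTER branch of format_transcription_event.
    return {"delta": "", "text": event.get("text", "")}


class TestTextFieldFormatting:
    def test_uses_text_not_stash(self):
        payload = format_partial({"stash": "主席咁啊，亦", "text": "多谢主席咁啊亦"})
        assert payload["text"] == "多谢主席咁啊亦"
        assert "stash" not in payload

    def test_missing_text_defaults_to_empty(self):
        assert format_partial({"stash": "多"})["text"] == ""

    def test_text_grows_monotonically(self):
        # Sample events built from the visible part of the logged excerpt.
        events = [
            {"stash": "多谢", "text": ""},
            {"stash": "咁啊，亦都多谢", "text": "多谢主席咁啊亦"},
            {"stash": "都多谢邱主任", "text": "多谢主席咁啊亦都多谢邱主任头先"},
        ]
        previous = ""
        for event in events:
            text = format_partial(event)["text"]
            if text:  # early partials may have an empty `text`
                assert text.startswith(previous)
                previous = text
```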

3.5 Backend: Stash log format

Update the log to capture both fields for future debugging:

```python
_stash_logger.info(
    "seq=%d elapsed_ms=%d stash_len=%d text_len=%d stash=%r text=%r ...",
    ...
)
```
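A minimal sketch of the updated log call using the standard `logging` module. It keeps only the fields named above (the plan's format string continues with `...`, which stays unspecified here); `stash_logger` and `log_partial` are hypothetical names, and the arguments in the example call are made-up sample values.

```python
import logging

logging.basicConfig(format="%(message)s")
stash_logger = logging.getLogger("asr.stash")  # hypothetical name for _stash_logger
stash_logger.setLevel(logging.INFO)


def log_partial(seq: int, elapsed_ms: int, stash: str, text: str) -> None:
    # %-style args are formatted lazily, only if the record is actually emitted.
    # %r quotes both strings, so empty values and stray whitespace stay visible.
    stash_logger.info(
        "seq=%d elapsed_ms=%d stash_len=%d text_len=%d stash=%r text=%r",
        seq, elapsed_ms, len(stash), len(text), stash, text,
    )


log_partial(4, 1200, "主席咁啊，亦", "")
```

Logging both lengths side by side makes it easy to spot events where `stash` is populated but `text` is still empty.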

4. What Does NOT Change

Frontend (`useVideoASR.ts`): already handles `full_text` correctly; no changes needed
Frontend (`QueryInput.tsx`): unchanged
Pause/stop logic: unchanged
Completed-event handling: unchanged (completed events already use the `transcript` field)
`partial_buffer` variable: still used, but populated from `text` instead of merged stashes

5. Files Changed

| File | Change |
|---|---|
| `backend/app/routers/ws_asr.py` | Remove `_merge_stash()`, use the `text` field, update stash logging |
| `backend/app/test/test_phase2_ws_protocol.py` | Replace merge tests with `text`-field tests |

6. Acceptance Criteria

Partial events use the `text` field instead of `stash`
`_merge_stash` function removed
Text displayed in QueryInput grows monotonically (no jumping or replacing)
All 16 ws_protocol tests pass (after updating)
Text persists on pause (existing behavior, unchanged)
Stash log captures both the `stash` and `text` fields for reference

7. Rollback Risk

Low. Only two backend files change; the frontend is untouched. If the `text` field behaves unexpectedly, revert to the `_merge_stash` approach (already committed).
