# Phase 2 Enhancement: Use `text` field instead of `stash`

**Created:** 2026-05-07
**Status:** Planning
**Depends on:** Phase 2 (Complete)

---

## 1. Discovery

Analysis of the stash log showed that DashScope partial events carry two text fields:

| Field | Behavior | Description |
|-------|----------|-------------|
| `stash` | Sliding window, ~7-20 chars, replaced on each event | Most recently recognized characters (raw ASR output) |
| `text` | Monotonically growing, **never shrinks** | Stable cumulative transcription of current utterance |

```
Event sequence example:
  stash="多"               text=""             ← text empty early on
  stash="多谢"              text=""
  stash="多谢主席咁啊"        text=""
  stash="主席咁啊，亦"        text=""             ← stash slides, text still empty
  stash="咁啊，亦都多谢"      text="多谢主席咁啊亦"  ← text starts populating
  stash="都多谢邱主任"        text="多谢主席咁啊亦都多谢邱主任头先..."
  stash="point嘅详细介绍"    text="多谢主席咁啊亦都多谢邱主任头先个诶powerpoint嘅详细介绍..."
```

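Within an utterance, each `text` value extends the previous one, so it can be used directly as the stable transcription. `stash` only holds the latest window and slides as new audio arrives. In the sequence above, `text` stays empty for the first few events and omits the comma that appears in `stash`.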

## 2. Why This Is Better

- **No `_merge_stash` needed**: `text` is already cumulative per utterance
- **No overlap detection**: characters in `text` do not change once set, so there is nothing to align
- **No risk of wrong merging**: merging stashes can go wrong when consecutive windows overlap incorrectly or the sliding window has lost context
- **Simpler code**: less logic, less surface area for bugs
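
To make the merging risk concrete, here is a minimal sketch that replays the `stash` values from the Section 1 example through a naive overlap merge. `merge_by_overlap` is a hypothetical stand-in written for this illustration; the real `_merge_stash` is not shown in this plan and may behave differently.

```python
def merge_by_overlap(buffer: str, window: str) -> str:
    """Hypothetical stand-in for _merge_stash (assumption, not the real code).

    Appends `window` to `buffer`, dropping the longest prefix of `window`
    that already appears as a suffix of `buffer`.
    """
    for size in range(min(len(buffer), len(window)), 0, -1):
        if buffer.endswith(window[:size]):
            return buffer + window[size:]
    # No overlap found: the only option is blind concatenation.
    return buffer + window


# stash values copied from the event sequence in Section 1
STASH_WINDOWS = [
    "多",
    "多谢",
    "多谢主席咁啊",
    "主席咁啊，亦",
    "咁啊，亦都多谢",
    "都多谢邱主任",
    "point嘅详细介绍",
]

merged = ""
for window in STASH_WINDOWS:
    merged = merge_by_overlap(merged, window)

# The last two windows listed share no overlap, so the merge jumps
# straight from "邱主任" to "point" and never sees "头先个诶power".
print(merged)
```

With the windows as listed, the merged result is missing the characters between "邱主任" and "point", while the `text` field for the same event contains them. Assigning `text` directly avoids that class of error.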

## 3. Changes Required

### 3.1 Backend: `format_transcription_event` (ws_asr.py)

```python
# BEFORE: extract stash field
if event_type == "...transcription.text":
    stash = event.get("stash", "")
    return {"delta": "", "stash": stash, ...}

# AFTER: extract text field
if event_type == "...transcription.text":
    text = event.get("text", "")
    return {"delta": "", "text": text, ...}
```

### 3.2 Backend: `read_events` (ws_asr.py)

```python
# BEFORE: merge stashes
else:
    stash = result.pop("stash", "")
    if stash.strip():
        partial_buffer = _merge_stash(partial_buffer, stash)
    display = build_display_text(accumulated_text, partial_buffer)

# AFTER: use text directly (already cumulative)
else:
    text = result.pop("text", "")
    if text.strip():
        partial_buffer = text       # text already cumulative
    display = build_display_text(accumulated_text, partial_buffer)
```
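
The AFTER branch can be exercised in isolation with a sketch like the one below. `apply_partial` is a hypothetical helper introduced only for illustration, and the `build_display_text` shown here is an assumed stand-in (simple concatenation); the real helper in `ws_asr.py` may format the text differently.

```python
from typing import Tuple


def build_display_text(accumulated_text: str, partial_buffer: str) -> str:
    # Assumed stand-in: the real build_display_text may differ.
    return accumulated_text + partial_buffer


def apply_partial(
    accumulated_text: str, partial_buffer: str, result: dict
) -> Tuple[str, str]:
    """Hypothetical wrapper around the AFTER branch of read_events.

    Returns the updated partial buffer and the text to display.
    """
    text = result.pop("text", "")
    if text.strip():
        # text is cumulative for the utterance, so it replaces the buffer.
        partial_buffer = text
    # An empty or whitespace-only text leaves the previous buffer in place.
    return partial_buffer, build_display_text(accumulated_text, partial_buffer)


buffer, display = apply_partial("Earlier sentence. ", "", {"text": ""})
buffer, display = apply_partial("Earlier sentence. ", buffer, {"text": "多谢主席咁啊亦"})
print(display)
```

Because empty `text` values are ignored, the early events in which `text` is still empty do not clear what is already on screen.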

### 3.3 Backend: Remove `_merge_stash` function

No longer needed once `read_events` assigns `text` directly.

### 3.4 Backend: Tests (`test_phase2_ws_protocol.py`)

- Replace `TestMergeStash` class with `TestTextFieldFormatting`
- Update partial event tests to use `text` field instead of `stash`
- Verify monotonic growth: each partial's `text` must start with the previous partial's `text` (it never shrinks or rewrites earlier characters)
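
A possible shape for the monotonic-growth check, assuming the test file is run with pytest-style plain assertions. `first_regression` and `SAMPLE_TEXTS` are hypothetical names, and the sample values are the `text` values from Section 1 with the trailing `...` dropped.

```python
from typing import List, Optional

# text values from the Section 1 example, truncated where the log was elided
SAMPLE_TEXTS = [
    "",
    "多谢主席咁啊亦",
    "多谢主席咁啊亦都多谢邱主任头先",
    "多谢主席咁啊亦都多谢邱主任头先个诶powerpoint嘅详细介绍",
]


def first_regression(texts: List[str]) -> Optional[int]:
    """Return the index of the first text that does not extend its
    predecessor, or None if every text starts with the previous one."""
    for i in range(1, len(texts)):
        if not texts[i].startswith(texts[i - 1]):
            return i
    return None


class TestTextFieldFormatting:
    def test_text_grows_monotonically(self):
        assert first_regression(SAMPLE_TEXTS) is None

    def test_shrinking_text_is_detected(self):
        # A shorter or rewritten text must be flagged at its index.
        assert first_regression(["abc", "abcd", "abx"]) == 2
```

The prefix check is stricter than comparing lengths: it also catches a `text` that keeps growing but rewrites earlier characters.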

### 3.5 Backend: Stash log format

Update the stash log to capture both fields for future debugging:

```python
_stash_logger.info(
    "seq=%d elapsed_ms=%d stash_len=%d text_len=%d stash=%r text=%r ...",
    ...
)
```

## 4. What Does NOT Change

- Frontend (`useVideoASR.ts`) — already handles `full_text` correctly, no changes needed
- Frontend (`QueryInput.tsx`) — unchanged
- Pause/stop logic — unchanged
- Completed event handling — unchanged (completed events already use `transcript` field)
- `partial_buffer` variable — still used, just populated from `text` instead of merged stashes

## 5. Files Changed

| File | Change |
|------|--------|
| `backend/app/routers/ws_asr.py` | Remove `_merge_stash()`, use `text` field, update stash logging |
| `backend/app/test/test_phase2_ws_protocol.py` | Replace merge tests with text-field tests |

## 6. Acceptance Criteria

- [ ] `text` field used for partial events instead of `stash`
- [ ] `_merge_stash` function removed
- [ ] Text displayed in QueryInput grows monotonically (no jumping/replacing)
- [ ] All 16 ws_protocol tests pass (updated)
- [ ] Text persists on pause (existing behavior, unchanged)
- [ ] Stash log captures both `stash` and `text` fields for reference

## 7. Rollback Risk

Low. Only two backend files change and the frontend is untouched. If the `text` field behaves unexpectedly, revert to the `_merge_stash` approach, which is already committed.