refactor: remove dead _merge_stash, add Phase 3 YouTube proxy plan

- Remove _merge_stash (dead code since the delta-based ASR refactor)
- Replace TestMergeStash with TestTextFieldFormatting (53/53 Phase 2 tests pass)
- Mark phase2_enhancement_use_text_field as Complete
- Add Phase 3 YouTube live stream proxy implementation plan
- Update README (dated ASR model names, new image tag)
Date: 2026-05-09 15:14:01 +08:00
commit 09b5ea7d64
parent c8d955c45c
5 changed files with 459 additions and 45 deletions
--- a/.plans/phase2_enhancement_use_text_field.md
+++ b/.plans/phase2_enhancement_use_text_field.md
@@ -1,7 +1,7 @@
 # Phase 2 Enhancement: Use `text` field instead of `stash`

 **Created:** 2026-05-07
-**Status:** Planning
+**Status:** Complete
 **Depends on:** Phase 2 (Complete)

 ---
@@ -107,12 +107,12 @@ _stash_logger.info(

 ## 6. Acceptance Criteria

- [ ] `text` field used for partial events instead of `stash`
- [ ] `_merge_stash` function removed
- [ ] Text displayed in QueryInput grows monotonically (no jumping/replacing)
- [ ] All 16 ws_protocol tests pass (updated)
- [ ] Text persists on pause (existing behavior, unchanged)
- [ ] Stash log captures both `stash` and `text` fields for reference
+- [x] `text` field used for partial events instead of `stash` (already in delta implementation)
+- [x] `_merge_stash` function removed (was dead code — never called since delta refactor)
+- [x] Text displayed in QueryInput grows monotonically (no jumping/replacing) — verified via `test_text_grows_monotonically_across_events`
+- [x] All ws_protocol tests pass (13 tests, updated: removed `TestMergeStash`, added `TestTextFieldFormatting`)
+- [x] Text persists on pause (existing behavior, unchanged)
+- [x] Stash log captures both `stash` and `text` fields for reference (already present in current code)

 ## 7. Rollback Risk

--- a/.plans/phase3_youtube_proxy_plan.md
+++ b/.plans/phase3_youtube_proxy_plan.md
@@ -0,0 +1,381 @@
+# Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan
+
+**Created:** 2026-05-09
+**Updated:** 2026-05-09 (user decisions incorporated)
+**Status:** Planning
+**Depends on:** Phase 1 (Complete), Phase 2 (Complete)
+
+---
+
+## 1. Overview
+
+Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload:
+
+1. The user pastes a YouTube URL.
+2. The backend extracts separate video-only and audio-only HLS streams via yt-dlp.
+3. The backend proxies the HLS manifests and .ts segments (zero re-encoding).
+4. The frontend plays the video in a `<video>` element via hls.js and loads the audio into a hidden `<audio>` element.
+5. `AudioContext.createMediaElementSource(audioElement)` feeds that audio into the existing ASR pipeline (WebSocket → DashScope).
+6. The transcript flows into QueryInput and from there into the Phase 1 RAG pipeline.
+
+**Same code works identically for live streams and VODs.**
+
+### Why Full Proxy (Not iframe)
+
+YouTube's official iframe player does not expose its audio track to the Web Audio API because of cross-origin restrictions. We proxy the HLS manifests and segments through our backend so the browser treats them as same-origin.
+
+### Audio Routing
+
+```
+YouTube HLS audio-only stream
+  → hls.js loads into hidden <audio> element
+  → AudioContext.createMediaElementSource(audioElement)
+  → ScriptProcessorNode (Float32 PCM)
+  → WebSocket → FastAPI → DashScope realtime ASR
+  → transcript → QueryInput
+```
+
+### Integration With Existing Pipeline
+
+This phase reuses the existing ASR infrastructure entirely:
+- `useVideoASR.ts` AudioContext graph pattern → adapted for YouTube audio element
+- `ws_asr.py` WebSocket → DashScope proxy → unchanged
+- `QueryInput.tsx` transcript display → unchanged
+- `LTTPage.tsx` layout → minor addition (source toggle)
+- RAG pipeline → unchanged
+
+---
+
+## 2. User Flow
+
+1. User selects "YouTube" source (instead of "Upload")
+2. User pastes YouTube URL → clicks "Load Stream"
+3. Backend extracts stream URLs → thumbnail shown as placeholder; video loads behind the scenes
+4. User presses play → video appears, audio routes to ASR pipeline (no auto-play)
+5. Real-time ASR transcription begins automatically on play
+6. Transcript flows into QueryInput → user can edit while streaming continues
+7. User pauses/stops → transcript stays, user edits and submits → RAG answer
+8. **"Full Transcript" button hidden for YouTube source** — real-time streaming ASR only
+9. **If the HLS stream fails**: auto-retry up to 3 times, re-extracting the URL each time; once the retries are exhausted, show an error ("Live stream unavailable" for live streams, "Service unavailable" otherwise)
+
+---
+
+## 3. Sub-Phases
+
+### Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)
+
+Add config fields, install dependencies, create skeletons, register router.
+
+**Test:** `test_phase3_config.py`
+
+**Tasks:**
+| # | Task | File |
+|---|------|------|
+| 3.1.1 | Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl` | `core/config.py` |
+| 3.1.2 | Update `.env.example` | `.env.example` |
+| 3.1.3 | Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json` | `requirements.txt`, `package.json` |
+| 3.1.4 | Create `models/youtube.py` — `YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat` | `models/youtube.py` |
+| 3.1.5 | Create `services/youtube_service.py` stub | `services/youtube_service.py` |
+| 3.1.6 | Create `services/hls_proxy.py` stub | `services/hls_proxy.py` |
+| 3.1.7 | Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/manifest.m3u8`, `GET /youtube/proxy/segment.ts` (upstream URL in `?url=`) | `routers/youtube.py` |
+| 3.1.8 | Register router in `main.py` | `main.py` |
+| 3.1.9 | Write and pass `test_phase3_config.py` | `app/test/` |
+
+---
+
+### Phase 3.2 — YouTube URL Extraction Backend (0.5 day)
+
+yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.
+
+**Test:** `test_phase3_youtube_extract.py`
+
+**Acceptance Criteria:**
+- `POST /api/v1/youtube/extract` accepts `{"url": "https://www.youtube.com/watch?v=..."}`
+- Returns `{ video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }`
+- VODs: extracts ~2–10 formats, returns best video+audio pair
+- Live streams: uses `ios` client for HLS, returns current live edge
+- Upcoming/scheduled streams: returns `is_upcoming: true` with scheduled start time
+- Invalid/private URLs: returns clear error
+- URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
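A cheap pre-check for the "Invalid/private URLs" criterion can reject malformed input before the slow yt-dlp call. The Python sketch below assumes the conventional 11-character `[A-Za-z0-9_-]` YouTube video ID format and a handful of common URL shapes; `extract_video_id` and the accepted host list are hypothetical. Private or removed videos can only be detected by yt-dlp itself, so this does not replace its error handling.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: conventional 11-character YouTube video ID alphabet.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
_YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}
_PATH_PREFIXES = ("live", "shorts", "embed")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for common YouTube URL shapes, or None if malformed."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = parsed.hostname or ""  # urlparse lowercases the hostname
    candidate: Optional[str] = None
    if host == "youtu.be":
        # https://youtu.be/<id>?t=...
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            # https://www.youtube.com/live/<id>, /shorts/<id>, /embed/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in _PATH_PREFIXES:
                candidate = parts[1]
    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None
```

The extracted ID also makes a natural cache key for the TTL cache, since different URL shapes for the same video map to one entry.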
+
+**Tasks:**
+| # | Task | File |
+|---|------|------|
+| 3.2.1 | Write tests first | `app/test/test_phase3_youtube_extract.py` |
+| 3.2.2 | Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection | `services/youtube_service.py` |
+| 3.2.3 | Implement `YouTubeService._select_best_formats()` — separate video/audio from format list, prefer ≤480p | `services/youtube_service.py` |
+| 3.2.4 | Implement format URL caching with TTL | `services/youtube_service.py` |
+| 3.2.5 | Implement `POST /api/v1/youtube/extract` route | `routers/youtube.py` |
+| 3.2.6 | Run tests → pass → commit | — |
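One possible shape for `_select_best_formats()`, sketched in Python. It assumes the entries of a yt-dlp info dict's `formats` list carry the usual `protocol`, `vcodec`, `acodec`, `height`, `tbr` and `abr` keys, with the string `"none"` marking an absent track. The name `select_best_formats`, the bitrate tie-breaks and the fallback (smallest video when nothing is ≤480p) are illustrative choices, not taken from the plan.

```python
from typing import Any, Dict, List, Optional, Tuple

Format = Dict[str, Any]  # one entry of a yt-dlp info dict's "formats" list


def _is_hls(fmt: Format) -> bool:
    # yt-dlp labels HLS formats with protocol "m3u8" or "m3u8_native"
    return str(fmt.get("protocol", "")).startswith("m3u8")


def _has(codec: Any) -> bool:
    # "none" means the track is absent; None means unknown, treated as absent here
    return codec not in (None, "none")


def select_best_formats(
    formats: List[Format], max_height: int = 480
) -> Tuple[Optional[Format], Optional[Format]]:
    """Pick one video-only and one audio-only HLS format."""
    hls = [f for f in formats if _is_hls(f)]
    video_only = [f for f in hls if _has(f.get("vcodec")) and f.get("acodec") == "none"]
    audio_only = [f for f in hls if _has(f.get("acodec")) and f.get("vcodec") == "none"]

    # Tallest video at or below max_height; tie-break on total bitrate.
    capped = [f for f in video_only if (f.get("height") or 0) <= max_height]
    if capped:
        video = max(capped, key=lambda f: (f.get("height") or 0, f.get("tbr") or 0))
    else:
        # Nothing small enough: fall back to the smallest available rendition.
        video = min(video_only, key=lambda f: f.get("height") or 0, default=None)

    # Highest audio bitrate; abr may be missing, so fall back to tbr.
    audio = max(audio_only, key=lambda f: f.get("abr") or f.get("tbr") or 0, default=None)
    return video, audio
```

Muxed or non-HLS formats are ignored on purpose, since the player expects two separate HLS manifests.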
+
+---
+
+### Phase 3.3 — HLS Proxy Backend (1 day)
+
+Proxy service that rewrites HLS manifests and proxies .ts segments. Both responses use FastAPI's `StreamingResponse`, so bytes are forwarded as they arrive instead of after the full upstream download.
+
+**Reference:** mediaflow-proxy M3U8Processor pattern (line-by-line streaming, URL rewriting)
+
+**Tests:** `test_phase3_hls_proxy.py`, `test_phase3_hls_manifest.py`
+
+**Acceptance Criteria:**
+- `GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded>` — fetches upstream manifest, rewrites all segment/sub-manifest URLs to point back to our proxy, streams response
+- `GET /api/v1/youtube/proxy/segment.ts?url=<encoded>` — fetches upstream .ts segment, proxies with correct Content-Type (`video/mp2t`) and CORS headers
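+- Lines rewritten: segment URIs, sub-manifest (variant playlist) URIs, and `URI="..."` attributes in tags such as `#EXT-X-KEY`; relative URIs are resolved against the upstream manifest URL before rewriting, absolute URLs are rewritten directly
+- Lines passed through unchanged: `#EXTINF:`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXT-X-STREAM-INF`, comments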
+- Client disconnect → upstream connection closed cleanly
+- CORS headers on every response: `access-control-allow-origin: *`
+- **Upstream failure → HTTP 502 with error detail; frontend retries up to 3 times with fresh URL before showing "Service unavailable"**
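The rewriting rules above can be sketched as a synchronous Python generator that emits each line as soon as it is read. The proxy paths mirror the data-flow example in section 10. Choosing `manifest.m3u8` vs `segment.ts` follows the HLS rule that the URI line after `#EXT-X-STREAM-INF`, and the `URI` of `#EXT-X-MEDIA` / `#EXT-X-I-FRAME-STREAM-INF`, name another playlist. Sending key and init-segment URIs to the segment route is an assumption; that route would then need to pass through the upstream Content-Type instead of forcing `video/mp2t`. `proxy_url` and `rewrite_manifest_lines` are hypothetical names. In the async route the input lines could come from an httpx response's `aiter_lines()` with an `async for` version of the same loop.

```python
import re
from typing import Iterable, Iterator
from urllib.parse import quote, urljoin

# Route prefix taken from the data-flow example in section 10.
PROXY_PREFIX = "/api/v1/youtube/proxy"

# URI="..." attribute inside tags such as #EXT-X-KEY, #EXT-X-MAP, #EXT-X-MEDIA.
_URI_ATTR = re.compile(r'URI="([^"]*)"')

# Tags whose URI attribute names another playlist rather than a segment or key.
_PLAYLIST_URI_TAGS = ("#EXT-X-MEDIA:", "#EXT-X-I-FRAME-STREAM-INF:")


def proxy_url(upstream: str, kind: str) -> str:
    """Wrap an absolute upstream URL in the proxy route for `kind`."""
    resource = "manifest.m3u8" if kind == "manifest" else "segment.ts"
    return f"{PROXY_PREFIX}/{resource}?url={quote(upstream, safe='')}"


def rewrite_manifest_lines(lines: Iterable[str], base_url: str) -> Iterator[str]:
    """Yield manifest lines one at a time, routing every URI through the proxy."""
    next_uri_is_playlist = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            yield line  # blank line: pass through
        elif line.startswith("#"):
            if line.startswith("#EXT-X-STREAM-INF:"):
                # Master playlist: the next URI line is a variant playlist.
                next_uri_is_playlist = True
            kind = "manifest" if line.startswith(_PLAYLIST_URI_TAGS) else "segment"
            # Tags without a URI attribute (#EXTINF, #EXT-X-TARGETDURATION, ...)
            # come back unchanged from sub().
            yield _URI_ATTR.sub(
                lambda m: 'URI="' + proxy_url(urljoin(base_url, m.group(1)), kind) + '"',
                line,
            )
        else:
            # URI line: resolve relative paths against the upstream manifest URL.
            kind = "manifest" if next_uri_is_playlist else "segment"
            next_uri_is_playlist = False
            yield proxy_url(urljoin(base_url, line.strip()), kind)
```

Because the generator keeps only one flag of state, memory use stays constant regardless of manifest length.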
+
+**Tasks:**
+| # | Task | File |
+|---|------|------|
+| 3.3.1 | Write tests first | `app/test/test_phase3_hls_proxy.py`, `app/test/test_phase3_hls_manifest.py` |
+| 3.3.2 | Implement `HLSProxyService.rewrite_manifest()` — streaming line-by-line, URL detection + rewriting | `services/hls_proxy.py` |
+| 3.3.3 | Implement `HLSProxyService.proxy_segment()` — httpx stream → StreamingResponse | `services/hls_proxy.py` |
+| 3.3.4 | Implement `GET /api/v1/youtube/proxy/manifest.m3u8` and `GET /api/v1/youtube/proxy/segment.ts` routes (upstream URL in `?url=`), dispatching to `rewrite_manifest()` or `proxy_segment()` | `routers/youtube.py` |
+| 3.3.5 | Run tests → pass → commit | — |
+
+---
+
+### Phase 3.4 — Frontend: YouTube Input + Video Player (1 day)
+
+URL input component and hls.js-based video player with two media elements: a visible `<video>` (video-only, muted) and a hidden `<audio>` (audio-only, for Web Audio API routing).
+
+**Tests:** `test_phase3_YouTubeInput.test.tsx`, `test_phase3_YouTubeVideoPlayer.test.tsx`
+
+**Acceptance Criteria:**
+- `YouTubeInput` accepts URL, validates format, shows loading/error states
+- `YouTubeVideoPlayer` uses `forwardRef<HTMLVideoElement>` (same pattern as `VideoPlayer`)
+- Video HLS loaded via hls.js into `<video muted>` element at 360p–480p (auto-best ≤ 480p)
+- Audio HLS loaded via hls.js into hidden `<audio>` element
+- Audio element exposes ref for parent to connect to AudioContext
+- Thumbnail displayed as placeholder until user presses play; video element replaces it on play
+- Video does NOT auto-play on load (waits for manual user play)
+- Loading spinner, error overlay, "LIVE" badge for live streams
+- **HLS error recovery**: on `hls.js` fatal error → re-extract stream URL → retry up to 3× → show "Service unavailable" on exhaustion
+- `crossOrigin="anonymous"` on both elements (required for the AudioContext graph)
+- No quality selector (low resolution only, sufficient for reference video)
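The retry rule (also stated in the user flow and the design decisions) is sketched here as a plain Python loop for readability; the real logic would live in the hls.js `Hls.Events.ERROR` handler on fatal errors, in TypeScript. This sketch reads "retry up to 3 times" as one initial attempt plus three retries, each with a freshly extracted URL. `load_with_retries`, `extract_url`, `load` and `StreamUnavailableError` are hypothetical names.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StreamUnavailableError(Exception):
    """Raised after the initial attempt and every retry have failed."""


def load_with_retries(
    extract_url: Callable[[], str],
    load: Callable[[str], T],
    max_retries: int = 3,
) -> T:
    """Load a stream, re-extracting a fresh URL before each attempt."""
    last_error: Optional[Exception] = None
    for _attempt in range(max_retries + 1):  # 1 initial attempt + max_retries retries
        # On retries this replaces a signed URL that may have expired.
        url = extract_url()
        try:
            return load(url)
        except Exception as exc:  # sketch only: real code would narrow this
            last_error = exc
    raise StreamUnavailableError(
        f"stream still failing after {max_retries} retries"
    ) from last_error
```

If "after 3 failures" is meant as three attempts in total, the same loop works with `max_retries=2`.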
+
+**Tasks:**
+| # | Task | File |
+|---|------|------|
+| 3.4.1 | Write tests first | `src/test/test_phase3_YouTubeInput.test.tsx`, `src/test/test_phase3_YouTubeVideoPlayer.test.tsx` |
+| 3.4.2 | Add YouTube types to `types/index.ts` | `types/index.ts` |
+| 3.4.3 | Add API functions to `lib/api.ts` | `lib/api.ts` |
+| 3.4.4 | Add TanStack Query hooks to `lib/queries.tsx` | `lib/queries.tsx` |
+| 3.4.5 | Create `components/YouTubeInput.tsx` — URL input, validation, loading/error states | `components/YouTubeInput.tsx` |
+| 3.4.6 | Create `components/YouTubeVideoPlayer.tsx` — hls.js dual-element player, forwardRef | `components/YouTubeVideoPlayer.tsx` |
+| 3.4.7 | Run tests → pass → commit | — |
+
+---
+
+### Phase 3.5 — Integration: YouTube → ASR Pipeline (1 day)
+
+Wire YouTube audio output into existing ASR pipeline. The key challenge: `useVideoASR` currently captures from `<video>` element; we need it to capture from the `<audio>` element loaded by hls.js.
+
+**Tests:** `test_phase3_useYouTubeASR.test.ts`, `test_phase3_LTTPage_integration.test.tsx`
+
+**Acceptance Criteria:**
+- `useYouTubeASR` hook: accepts `audioElement` ref, sets up AudioContext graph on mount
+- AudioContext.createMediaElementSource(audioElement) → ScriptProcessorNode → WebSocket
+- Auto-starts ASR on play, stops on pause/end (same lifecycle as `useVideoASR`)
+- Transcript flows into QueryInput (same `onFinalTranscript` callback)
+- QueryInput remains editable during streaming — user can type corrections while ASR appends
+- "Full Transcript" button hidden when YouTube source is active
+- Switching between "Upload" and "YouTube" sources clears previous state
+
+**Tasks:**
+| # | Task | File |
+|---|------|------|
+| 3.5.1 | Write tests first | `src/test/test_phase3_useYouTubeASR.test.ts` |
+| 3.5.2 | Create `hooks/useYouTubeASR.ts` — adapted from `useVideoASR.ts`, targets `<audio>` element | `hooks/useYouTubeASR.ts` |
+| 3.5.3 | Update `QueryInput.tsx` — accept transcript from either source | `components/QueryInput.tsx` |
+| 3.5.4 | Update `LTTPage.tsx` — add source toggle (Upload / YouTube), wire YouTubeInput + YouTubeVideoPlayer | `pages/LTTPage.tsx` |
+| 3.5.5 | Create `test_phase3_LTTPage_integration.test.tsx` | `src/test/` |
+| 3.5.6 | Run tests → pass → commit | — |
+
+---
+
+### Phase 3.6 — Integration & Acceptance Testing (1 day)
+
+**Tests:** `test_integration_phase3.py`, `test_acceptance_phase3_youtube.py`, `test_acceptance_phase3_live.py`
+
+**Tasks:**
+| # | Task |
+|---|------|
+| 3.6.1 | Implement integration test (mocked yt-dlp, real httpx proxy + hls.js) |
+| 3.6.2 | Implement acceptance: real YouTube VOD → extract → proxy → play |
+| 3.6.3 | Implement acceptance: real YouTube live stream → extract → proxy → play + ASR |
+| 3.6.4 | Full regression run (Phase 1 + 2 + 3 tests) |
+| 3.6.5 | Fix failures, final commit |
+
+---
+
+### Phase 3.7 — Polish & Deployment (0.5 day)
+
+| # | Task |
+|---|------|
+| 3.7.1 | Handle PO token expiration for live streams (log warning, auto-re-extract on failure) |
+| 3.7.2 | Update Dockerfile — ensure ffmpeg + yt-dlp available in container |
+| 3.7.3 | Update `docker-compose.yml` — add any new volumes/env vars |
+| 3.7.4 | Verify production build (`npm run build`, `docker compose up -d --build`) |
+| 3.7.5 | Update `README.md` — YouTube feature section |
+| 3.7.6 | Update `development_plan.md` — mark Phase 3 status |
+| 3.7.7 | Final commit |
+
+---
+
+## 4. Timeline
+
+| Sub-Phase | Description | Effort | Depends On |
+|---|---|---|---|
+| 3.1 | Config & Infrastructure | 0.5 day | — |
+| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 |
+| 3.3 | HLS Proxy Backend | 1 day | 3.1 |
+| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 |
+| 3.5 | YouTube → ASR Integration | 1 day | 3.4 |
+| 3.6 | Integration & Acceptance | 1 day | 3.5 |
+| 3.7 | Polish & Deployment | 0.5 day | 3.6 |
+| **Total** | | **5.5 days** | |
+
+3.2 (extraction) and 3.3 (proxy) can run concurrently.
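The effort column sums to 5.5 days, but because 3.2 and 3.3 can overlap, the calendar critical path is shorter. A quick check in Python, with efforts and dependencies copied from the timeline table:

```python
from functools import lru_cache
from typing import Dict, Tuple

# Sub-phase -> (effort in days, dependencies), copied from the timeline table.
PLAN: Dict[str, Tuple[float, Tuple[str, ...]]] = {
    "3.1": (0.5, ()),
    "3.2": (0.5, ("3.1",)),
    "3.3": (1.0, ("3.1",)),
    "3.4": (1.0, ("3.2", "3.3")),
    "3.5": (1.0, ("3.4",)),
    "3.6": (1.0, ("3.5",)),
    "3.7": (0.5, ("3.6",)),
}


@lru_cache(maxsize=None)
def finish_day(phase: str) -> float:
    """Earliest finish day: latest-finishing dependency plus own effort."""
    effort, deps = PLAN[phase]
    return max((finish_day(dep) for dep in deps), default=0.0) + effort


total_effort = sum(effort for effort, _ in PLAN.values())
critical_path = max(finish_day(phase) for phase in PLAN)
print(f"total effort {total_effort} days, critical path {critical_path} days")
# total effort 5.5 days, critical path 5.0 days
```

The path runs 3.1 → 3.3 → 3.4 → 3.5 → 3.6 → 3.7; 3.2 has half a day of slack.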
+
+---
+
+## 5. Dependencies
+
+**Backend:** `yt-dlp>=2024.0.0` (new), `httpx>=0.26.0` (already present), `aiofiles>=24.0.0` (already present)
+**Frontend:** `hls.js@^1.5.0` (new — NOT present, must install)
+**System:** ffmpeg on server (already required by Phase 2)
+
+---
+
+## 6. Config Fields
+
+```python
+# YouTube live stream proxy (Phase 3)
+youtube_proxy_enabled: bool = True
+yt_dlp_timeout: int = 30          # seconds for yt-dlp extraction
+yt_dlp_cache_ttl: int = 300       # seconds to cache extraction results
+```
+
+```bash
+# .env.example additions
+YOUTUBE_PROXY_ENABLED=true
+YT_DLP_TIMEOUT=30
+YT_DLP_CACHE_TTL=300
+```
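A minimal sketch of the "in-memory dict with TTL" decision. It stores an expiry per entry so the 5-minute (live) and 30-minute (VOD) values from the 3.2 criteria can coexist with the single `yt_dlp_cache_ttl` default; whether a separate VOD setting gets added is left open, and `ExtractionCache`, `LIVE_TTL_SECONDS` and `VOD_TTL_SECONDS` are hypothetical names. The injectable clock exists only to make expiry testable. The class is not thread-safe, which is acceptable if it is only touched from the asyncio event loop.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

LIVE_TTL_SECONDS = 300   # matches the yt_dlp_cache_ttl default above
VOD_TTL_SECONDS = 1800   # assumption: the "30 min for VOD" value from 3.2


class ExtractionCache:
    """In-memory cache of extraction results, keyed by video ID, with per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: caller should re-run yt-dlp
            return None
        return value

    def set(self, key: str, value: Any, is_live: bool) -> None:
        ttl = LIVE_TTL_SECONDS if is_live else VOD_TTL_SECONDS
        self._entries[key] = (self._clock() + ttl, value)
```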
+
+---
+
+## 7. Key Design Decisions
+
+| Decision | Choice | Why |
+|---|---|---|
+| Streaming protocol | HLS (m3u8) | hls.js plays it natively; DASH requires dash.js |
+| yt-dlp client | `ios` for live, `web` for VOD | `ios` returns HLS for live streams with 60fps support; format selector prefers ≤480p |
+| HTTP client for proxy | httpx (already present) | Streaming support via `httpx.AsyncClient.stream()`; no new dependency |
+| Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer whole file |
+| Audio element | Hidden `<audio>` + hls.js | The audio-only HLS stream needs its own media element; `createMediaElementSource` accepts any `HTMLMediaElement`, including `<audio>` |
+| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min |
+| **Full Transcript for YouTube** | **Disabled** | Button hidden; real-time streaming ASR only |
+| **QueryInput during streaming** | **Editable** | User can type corrections while transcript streams (same as existing ASR) |
+| **Video quality** | **360p–480p auto-best** | Low resolution sufficient for reference; no quality selector |
+| **Auto-play on load** | **Wait for manual play** | Thumbnail placeholder; user presses play. Respects autoplay policy. |
+| **Thumbnail** | **Stays until user presses play** | Clean transition; no black frame |
+| **Error recovery** | **Retry 3× → "Service unavailable"** | Auto-re-extract URL on HLS failure; after 3 failures, show error state |
+| **PO Tokens (live streams)** | **Graceful degradation for MVP** | Stream first ~30s; on failure retry 3× with fresh URL; after exhaustion show "Live stream unavailable" |
+
+---
+
+## 8. File Manifest
+
+### New Files
+```
+backend/
+  app/models/youtube.py
+  app/services/youtube_service.py
+  app/services/hls_proxy.py
+  app/routers/youtube.py
+  app/test/test_phase3_config.py
+  app/test/test_phase3_youtube_extract.py
+  app/test/test_phase3_hls_proxy.py
+  app/test/test_phase3_hls_manifest.py
+  app/test/test_integration_phase3.py
+  app/test/acceptance/test_acceptance_phase3_youtube.py
+  app/test/acceptance/test_acceptance_phase3_live.py
+
+frontend/src/
+  components/YouTubeInput.tsx
+  components/YouTubeVideoPlayer.tsx
+  hooks/useYouTubeASR.ts
+  test/test_phase3_YouTubeInput.test.tsx
+  test/test_phase3_YouTubeVideoPlayer.test.tsx
+  test/test_phase3_useYouTubeASR.test.ts
+  test/test_phase3_LTTPage_integration.test.tsx
+```
+
+### Modified Files
+```
+backend/app/core/config.py                     # Add 3 config fields
+backend/.env.example                            # Add 3 env vars
+backend/main.py                                 # Register youtube router
+backend/requirements.txt                        # Add yt-dlp
+
+frontend/package.json                           # Add hls.js
+frontend/src/types/index.ts                     # Add YouTube types
+frontend/src/lib/api.ts                         # Add extractYouTube(), getYouTubeProxyUrl()
+frontend/src/lib/queries.tsx                    # Add useYouTubeExtract() mutation
+frontend/src/pages/LTTPage.tsx                  # Add source toggle + YouTube components
+frontend/src/components/QueryInput.tsx          # Accept transcript from either source
+
+Dockerfile                                      # Add yt-dlp install step
+docker-compose.yml                              # Add env vars if needed
+README.md                                       # YouTube feature section
+development_plan.md                             # Mark Phase 3 status
+```
+
+---
+
+## 9. Known Risks & Mitigations
+
+| Risk | Impact | Mitigation |
+|---|---|---|
+| PO Token expiration (live streams cut at 30s) | High — live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
+| yt-dlp extraction slow (2-5s) | Medium — poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
+| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance |
+| hls.js audio sync drift vs video | Low — separate streams may drift | hls.js `liveSyncDuration` keeps both near live edge; test with 10+ min streams |
+| Safari `createMediaElementSource` on HLS | Low — known Safari bug with native HLS | hls.js uses MSE, not native HLS — works around Safari bug; Chrome/Firefox unaffected |
+| YouTube ToS for proxy | Low for internal demo | Personal/enterprise internal demo is generally fine; review for public product |
+
+---
+
+## 10. Example Data Flow
+
+```
+POST /api/v1/youtube/extract
+  Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
+  Response: {
+    "video_id": "dQw4w9WgXcQ",
+    "title": "Rick Astley - Never Gonna Give You Up",
+    "is_live": false,
+    "video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
+    "audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
+    "thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
+  }
+
+GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
+  → Fetches upstream manifest from googlevideo.com
+  → Rewrites segment URLs:
+      segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
+  → Streams rewritten manifest to browser
+
+GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
+  → Fetches upstream .ts segment via httpx.AsyncClient.stream()
+  → StreamingResponse with Content-Type: video/mp2t
+  → CORS: access-control-allow-origin: *
+```
+
+---
+
+## 11. References
+
+- **mediaflow-proxy**: Production FastAPI HLS proxy with M3U8Processor — [mhdzumair/mediaflow-proxy](https://github.com/mhdzumair/mediaflow-proxy)
+- **yt-dlp API docs**: [yt-dlp-yt-dlp.mintlify.app](https://yt-dlp-yt-dlp.mintlify.app/api/extractors)
+- **hls.js API docs**: [github.com/video-dev/hls.js/blob/master/docs/API.md](https://github.com/video-dev/hls.js/blob/master/docs/API.md)
+- **hls.js low-latency live**: `lowLatencyMode: true`, `liveSyncDuration: 1.5`
+- **Existing code patterns**: `.plans/phase2_implementation_plan.md`, `backend/app/routers/video.py`, `frontend/src/hooks/useVideoASR.ts`
--- a/README.md
+++ b/README.md
@@ -187,14 +187,14 @@ docker run -d --name legco_test -p 8888:8000 \
  -e HISTORY_DB_PATH=./data/history.db \
  -e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
  -e DASHSCOPE_API_KEY=your_dashscope_key \
-  -e ASR_MODEL_NAME=qwen3-asr-flash \
-  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime \
+  -e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
+  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
  -e VIDEO_UPLOAD_DIR=./uploads \
  -e MAX_VIDEO_SIZE_MB=300 \
  -v ~/woody/legco/data/chroma_db:/app/chroma_db \
  -v ~/woody/legco/data/document_chunk:/app/document_chunk \
  -v ~/woody/legco/data/data:/app/data \
-  legco_reranker:amd64
+  legco_reranker:amd64.01.02

 # Verify
 curl http://localhost:8888/health
--- a/backend/app/routers/ws_asr.py
+++ b/backend/app/routers/ws_asr.py
@@ -56,17 +56,6 @@ class DashScopeCallback(OmniRealtimeCallback):
        logger.info("dashscope-connection-closed code=%s msg=%s", code, msg)


-def _merge_stash(partial_buffer: str, new_stash: str) -> str:
-    if not new_stash.strip():
-        return partial_buffer
-    if not partial_buffer:
-        return new_stash
-    for i in range(min(len(partial_buffer), len(new_stash)), 0, -1):
-        if partial_buffer[-i:] == new_stash[:i]:
-            return partial_buffer + new_stash[i:]
-    return partial_buffer + " " + new_stash
-
-
 def format_transcription_event(event: dict, accumulated: str) -> dict | None:
    event_type = event.get("type", "")

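Because `text` only grows, the new characters in each partial event can be found by a prefix comparison instead of the overlap search the removed `_merge_stash` performed. The commit does not show the production delta code, so `compute_text_delta` below is a hypothetical sketch of that idea, exercised with the Cantonese fixtures from the updated tests.

```python
def compute_text_delta(previous_text: str, current_text: str) -> str:
    """Return the characters appended to `text` since the previous partial event."""
    if current_text.startswith(previous_text):
        # Normal case: text only grows, so the delta is the new suffix.
        return current_text[len(previous_text):]
    # Defensive fallback: text did not extend the previous value (for example a
    # new utterance began), so treat all of it as new.
    return current_text
```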
--- a/backend/app/test/test_phase2_ws_protocol.py
+++ b/backend/app/test/test_phase2_ws_protocol.py
@@ -75,42 +75,86 @@ class TestDashScopeCallback:
        loop.close()


-class TestMergeStash:
-    def test_merge_empty_buffer_returns_stash(self):
-        from app.routers.ws_asr import _merge_stash
+class TestTextFieldFormatting:
+    """Verify text field (stable cumulative) replaces stash merge logic.
+    
+    DashScope partial events contain:
+    - text: monotonically growing stable transcription (never shrinks)
+    - stash: sliding window of latest chars (raw, unstable)
+    
+    The production code uses text for delta computation and passes
+    stash through for frontend's trailing-char completion on pause.
+    _merge_stash has been removed — text is already cumulative.
+    """

-        assert _merge_stash("", "你好") == "你好"
+    def test_text_field_present_and_distinct_from_stash(self):
+        """text field is the stable prefix; stash is the trailing window."""
+        from app.routers.ws_asr import format_transcription_event

-    def test_merge_overlapping_suffix(self):
-        from app.routers.ws_asr import _merge_stash
+        event = {
+            "type": "conversation.item.input_audio_transcription.text",
+            "text": "多謝主席咁啊亦都多謝",
+            "stash": "邱主任",
+            "language": "yue",
+        }

-        assert _merge_stash("系多謝主席", "主席咁咧呢個") == "系多謝主席咁咧呢個"
+        result = format_transcription_event(event, "")
+        assert result is not None
+        assert not result["is_final"]
+        assert result["text"] == "多謝主席咁啊亦都多謝"
+        assert result["stash"] == "邱主任"

-    def test_merge_overlapping_single_char(self):
-        from app.routers.ws_asr import _merge_stash
+    def test_text_grows_monotonically_across_events(self):
+        """text field should never lose characters between successive events."""
+        from app.routers.ws_asr import format_transcription_event

-        assert _merge_stash("abcde", "efgh") == "abcdefgh"
+        events = [
+            {"type": "conversation.item.input_audio_transcription.text", "text": "多謝主席", "stash": "席咁啊", "language": "yue"},
+            {"type": "conversation.item.input_audio_transcription.text", "text": "多謝主席咁啊", "stash": "啊亦都", "language": "yue"},
+            {"type": "conversation.item.input_audio_transcription.text", "text": "多謝主席咁啊亦都多謝", "stash": "邱主任", "language": "yue"},
+        ]

-    def test_merge_no_overlap_appends_with_space(self):
-        from app.routers.ws_asr import _merge_stash
+        prev_text = ""
+        for event in events:
+            result = format_transcription_event(event, "")
+            assert result is not None
+            current_text = result["text"]
+            assert current_text.startswith(prev_text), (
+                f"text regressed: '{current_text}' does not start with '{prev_text}'"
+            )
+            prev_text = current_text

-        assert _merge_stash("你好", "世界") == "你好 世界"
+    def test_text_empty_early_on(self):
+        """Early in an utterance, text may be empty while stash has content."""
+        from app.routers.ws_asr import format_transcription_event

-    def test_merge_stash_subset_of_buffer(self):
-        from app.routers.ws_asr import _merge_stash
+        event = {
+            "type": "conversation.item.input_audio_transcription.text",
+            "text": "",
+            "stash": "多謝主席",
+            "language": "yue",
+        }

-        assert _merge_stash("系多謝主席咁咧", "咧呢") == "系多謝主席咁咧呢"
+        result = format_transcription_event(event, "")
+        assert result is not None
+        assert result["text"] == ""
+        assert result["stash"] == "多謝主席"

-    def test_merge_empty_stash_preserves_buffer(self):
-        from app.routers.ws_asr import _merge_stash
+    def test_text_and_stash_both_empty_is_valid(self):
+        """Both fields empty is a valid transient state."""
+        from app.routers.ws_asr import format_transcription_event

-        assert _merge_stash("你好", "") == "你好"
-        assert _merge_stash("", "") == ""
+        event = {
+            "type": "conversation.item.input_audio_transcription.text",
+            "text": "",
+            "stash": "",
+            "language": "yue",
+        }

-    def test_merge_whitespace_only_stash_preserves_buffer(self):
-        from app.routers.ws_asr import _merge_stash
-
-        assert _merge_stash("你好", "   ") == "你好"
+        result = format_transcription_event(event, "")
+        assert result is not None
+        assert result["text"] == ""
+        assert result["stash"] == ""


 class TestProxyFormatsTranscriptionTextEvent: