refactor: remove dead _merge_stash, add Phase 3 YouTube proxy plan

- Remove _merge_stash (dead code since delta-based ASR refactor)
- Replace TestMergeStash with TestTextFieldFormatting (53/53 Phase 2 tests pass)
- Mark phase2_enhancement_use_text_field as Complete
- Add Phase 3 YouTube live stream proxy implementation plan
- README updates
Woody 2026-05-09 15:14:01 +08:00
parent c8d955c45c
commit 09b5ea7d64
5 changed files with 459 additions and 45 deletions


@ -1,7 +1,7 @@
# Phase 2 Enhancement: Use `text` field instead of `stash`
**Created:** 2026-05-07
**Status:** Planning
**Status:** Complete
**Depends on:** Phase 2 (Complete)
---
@ -107,12 +107,12 @@ _stash_logger.info(
## 6. Acceptance Criteria
- [ ] `text` field used for partial events instead of `stash`
- [ ] `_merge_stash` function removed
- [ ] Text displayed in QueryInput grows monotonically (no jumping/replacing)
- [ ] All 16 ws_protocol tests pass (updated)
- [ ] Text persists on pause (existing behavior, unchanged)
- [ ] Stash log captures both `stash` and `text` fields for reference
- [x] `text` field used for partial events instead of `stash` (already in delta implementation)
- [x] `_merge_stash` function removed (was dead code — never called since delta refactor)
- [x] Text displayed in QueryInput grows monotonically (no jumping/replacing) — verified via `test_text_grows_monotonically_across_events`
- [x] All ws_protocol tests pass (13 tests, updated: removed `TestMergeStash`, added `TestTextFieldFormatting`)
- [x] Text persists on pause (existing behavior, unchanged)
- [x] Stash log captures both `stash` and `text` fields for reference (already present in current code)
## 7. Rollback Risk


@ -0,0 +1,381 @@
# Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan
**Created:** 2026-05-09
**Updated:** 2026-05-09 (user decisions incorporated)
**Status:** Planning
**Depends on:** Phase 1 (Complete), Phase 2 (Complete)
---
## 1. Overview
Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. User pastes a YouTube URL → backend extracts separate video-only and audio-only HLS streams via yt-dlp → backend proxies HLS manifests and .ts segments (zero re-encoding) → frontend plays video in `<video>` via hls.js, routes audio through hidden `<audio>` element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → transcript flows into QueryInput → Phase 1 RAG pipeline.
**Same code works identically for live streams and VODs.**
### Why Full Proxy (Not iframe)
YouTube's official iframe player does not expose the audio track to Web Audio API due to cross-origin restrictions. We proxy HLS segments through our backend so the browser treats them as same-origin.
### Audio Routing
```
YouTube HLS audio-only stream
→ hls.js loads into hidden <audio> element
→ AudioContext.createMediaElementSource(audioElement)
→ ScriptProcessorNode (Float32 PCM)
→ WebSocket → FastAPI → DashScope realtime ASR
→ transcript → QueryInput
```
### Integration With Existing Pipeline
This phase reuses the existing ASR infrastructure entirely:
- `useVideoASR.ts` AudioContext graph pattern → adapted for YouTube audio element
- `ws_asr.py` WebSocket → DashScope proxy → unchanged
- `QueryInput.tsx` transcript display → unchanged
- `LTTPage.tsx` layout → minor addition (source toggle)
- RAG pipeline → unchanged
---
## 2. User Flow
1. User selects "YouTube" source (instead of "Upload")
2. User pastes YouTube URL → clicks "Load Stream"
3. Backend extracts stream URLs → thumbnail shown as placeholder; video loads behind the scenes
4. User presses play → video appears, audio routes to ASR pipeline (no auto-play)
5. Real-time ASR transcription begins automatically on play
6. Transcript flows into QueryInput → user can edit while streaming continues
7. User pauses/stops → transcript stays, user edits and submits → RAG answer
8. **"Full Transcript" button hidden for YouTube source** — real-time streaming ASR only
9. **If HLS stream fails**: auto-retry up to 3 times with re-extracted URL → after 3 failures, show "Live stream unavailable" error
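
The retry rule in step 9 can be sketched in a few lines. The plan places this logic in the frontend; the Python below only illustrates the control flow and is not the planned implementation. It assumes "retry up to 3 times" means one initial attempt plus at most three retries, and `extract_url`, `load_stream`, and `StreamUnavailable` are hypothetical names introduced for illustration.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class StreamUnavailable(Exception):
    """Raised once the initial attempt and every retry have failed."""


def load_with_retries(
    extract_url: Callable[[], str],   # hypothetical: re-runs extraction, returns a fresh URL
    load_stream: Callable[[str], T],  # hypothetical: raises on a fatal HLS error
    max_retries: int = 3,
) -> T:
    url = extract_url()
    for attempt in range(max_retries + 1):  # first attempt + up to max_retries retries
        try:
            return load_stream(url)
        except Exception:
            if attempt == max_retries:
                # Retries exhausted: surface the error state from step 9.
                raise StreamUnavailable("Live stream unavailable")
            url = extract_url()  # re-extract a fresh URL before the next attempt
    raise AssertionError("unreachable")
```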
---
## 3. Sub-Phases
### Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)
Add config fields, install dependencies, create skeletons, register router.
**Test:** `test_phase3_config.py`
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.1.1 | Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl` | `core/config.py` |
| 3.1.2 | Update `.env.example` | `.env.example` |
| 3.1.3 | Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json` | `requirements.txt`, `package.json` |
| 3.1.4 | Create `models/youtube.py`: `YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat` | `models/youtube.py` |
| 3.1.5 | Create `services/youtube_service.py` stub | `services/youtube_service.py` |
| 3.1.6 | Create `services/hls_proxy.py` stub | `services/hls_proxy.py` |
| 3.1.7 | Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/{stream_type}/{path}` | `routers/youtube.py` |
| 3.1.8 | Register router in `main.py` | `main.py` |
| 3.1.9 | Write and pass `test_phase3_config.py` | `app/test/` |
---
### Phase 3.2 — YouTube URL Extraction Backend (0.5 day)
yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.
**Test:** `test_phase3_youtube_extract.py`
**Acceptance Criteria:**
- `POST /api/v1/youtube/extract` accepts `{"url": "https://www.youtube.com/watch?v=..."}`
- Returns `{ video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }`
- VODs: extracts ~210 formats, returns best video+audio pair
- Live streams: uses `ios` client for HLS, returns current live edge
- Upcoming/scheduled streams: returns `is_upcoming: true` with scheduled start time
- Invalid/private URLs: returns clear error
- URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
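
The URL-expiration criterion could be met with a small in-memory cache. The sketch below is a minimal illustration, not the planned `YouTubeService` code: `TTLCache` and `ttl_for` are hypothetical names, and the clock is injected only so that expiry can be tested without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        # Store the absolute expiry time next to the value.
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-extracts.
            del self._entries[key]
            return None
        return value


def ttl_for(is_live: bool) -> int:
    """TTL from the criteria above: 5 min for live, 30 min for VOD."""
    return 300 if is_live else 1800
```

A caller would key the cache by video ID (an assumption) and fall back to a fresh yt-dlp extraction whenever `get` returns `None`.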
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.2.1 | Write tests first | `app/test/test_phase3_youtube_extract.py` |
| 3.2.2 | Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection | `services/youtube_service.py` |
| 3.2.3 | Implement `YouTubeService._select_best_formats()` — separate video/audio from format list, prefer ≤480p | `services/youtube_service.py` |
| 3.2.4 | Implement format URL caching with TTL | `services/youtube_service.py` |
| 3.2.5 | Implement `POST /api/v1/youtube/extract` route | `routers/youtube.py` |
| 3.2.6 | Run tests → pass → commit | — |
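
Task 3.2.3 could look roughly like the sketch below. It assumes the format list is a list of yt-dlp-style dicts in which `vcodec` or `acodec` is `"none"` when that track is absent, with `height` and `abr` available for ranking. The function name, the fallback when nothing fits under the cap, and the tie-breaking are illustrative assumptions, not the planned implementation.

```python
from typing import Any, Dict, List, Optional, Tuple

Format = Dict[str, Any]


def select_best_formats(
    formats: List[Format], max_height: int = 480
) -> Tuple[Optional[Format], Optional[Format]]:
    """Pick one video-only and one audio-only format, preferring video <= max_height."""

    def has(fmt: Format, key: str) -> bool:
        # Treat a missing or None codec the same as the literal "none".
        return (fmt.get(key) or "none") != "none"

    video_only = [f for f in formats if has(f, "vcodec") and not has(f, "acodec")]
    audio_only = [f for f in formats if has(f, "acodec") and not has(f, "vcodec")]

    within_cap = [f for f in video_only if (f.get("height") or 0) <= max_height]
    if within_cap:
        # Highest resolution that still respects the cap.
        video = max(within_cap, key=lambda f: f.get("height") or 0)
    elif video_only:
        # Nothing under the cap: fall back to the smallest available.
        video = min(video_only, key=lambda f: f.get("height") or 0)
    else:
        video = None

    audio = max(audio_only, key=lambda f: f.get("abr") or 0, default=None)
    return video, audio
```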
---
### Phase 3.3 — HLS Proxy Backend (1 day)
Proxy service that rewrites HLS manifests and proxies .ts segments. StreamingResponse for minimal latency.
**Reference:** mediaflow-proxy M3U8Processor pattern (line-by-line streaming, URL rewriting)
**Tests:** `test_phase3_hls_proxy.py`, `test_phase3_hls_manifest.py`
**Acceptance Criteria:**
- `GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded>` — fetches upstream manifest, rewrites all segment/sub-manifest URLs to point back to our proxy, streams response
- `GET /api/v1/youtube/proxy/segment.ts?url=<encoded>` — fetches upstream .ts segment, proxies with correct Content-Type (`video/mp2t`) and CORS headers
- Lines rewritten: segment URIs, sub-manifest URIs, `#EXT-X-KEY:URI=`, absolute URLs
- Lines passed through: `#EXTINF:`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXT-X-STREAM-INF`, comments
- Client disconnect → upstream connection closed cleanly
- CORS headers on every response: `access-control-allow-origin: *`
- **Upstream failure → HTTP 502 with error detail; frontend retries up to 3 times with fresh URL before showing "Service unavailable"**
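
The rewrite rules above can be sketched as a line-by-line generator. This is a simplified illustration under stated assumptions, not the mediaflow-proxy code or the planned `HLSProxyService`: it picks the manifest endpoint by looking for `.m3u8` in the URL path, and it sends everything else, including `#EXT-X-KEY` URIs, through the segment endpoint. `rewrite_manifest_lines` and `_proxy_url` are hypothetical names.

```python
import re
from typing import Iterable, Iterator
from urllib.parse import quote, urljoin

PROXY_PREFIX = "/api/v1/youtube/proxy"
_URI_ATTR = re.compile(r'URI="([^"]+)"')


def _proxy_url(absolute_url: str) -> str:
    # Assumption: sub-manifests are recognisable by ".m3u8" in the path.
    path = absolute_url.split("?", 1)[0]
    endpoint = "manifest.m3u8" if ".m3u8" in path else "segment.ts"
    return f"{PROXY_PREFIX}/{endpoint}?url={quote(absolute_url, safe='')}"


def rewrite_manifest_lines(lines: Iterable[str], base_url: str) -> Iterator[str]:
    """Yield manifest lines one at a time with upstream URIs pointed at the proxy."""
    for raw in lines:
        line = raw.strip()
        if not line:
            yield line
        elif line.startswith("#"):
            # Tags pass through unless they carry a URI="..." attribute.
            yield _URI_ATTR.sub(
                lambda m: f'URI="{_proxy_url(urljoin(base_url, m.group(1)))}"',
                line,
            )
        else:
            # Bare URI line: resolve against the manifest URL, then wrap.
            yield _proxy_url(urljoin(base_url, line))
```

Because the function is a generator, it can be fed from a streamed upstream response and handed to a streaming response without buffering the whole manifest.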
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.3.1 | Write tests first | `app/test/test_phase3_hls_proxy.py`, `app/test/test_phase3_hls_manifest.py` |
| 3.3.2 | Implement `HLSProxyService.rewrite_manifest()` — streaming line-by-line, URL detection + rewriting | `services/hls_proxy.py` |
| 3.3.3 | Implement `HLSProxyService.proxy_segment()` — httpx stream → StreamingResponse | `services/hls_proxy.py` |
| 3.3.4 | Implement `GET /api/v1/youtube/proxy/{type}/{path}` route — dispatch manifest vs segment | `routers/youtube.py` |
| 3.3.5 | Run tests → pass → commit | — |
---
### Phase 3.4 — Frontend: YouTube Input + Video Player (1 day)
URL input component and hls.js-based video player. Two media elements: a visible `<video>` (video-only, muted) and a hidden `<audio>` (audio-only, for Web Audio API routing).
**Tests:** `test_phase3_YouTubeInput.test.tsx`, `test_phase3_YouTubeVideoPlayer.test.tsx`
**Acceptance Criteria:**
- `YouTubeInput` accepts URL, validates format, shows loading/error states
- `YouTubeVideoPlayer` uses `forwardRef<HTMLVideoElement>` (same pattern as `VideoPlayer`)
- Video HLS loaded via hls.js into `<video muted>` element at 360p-480p (auto-best ≤ 480p)
- Audio HLS loaded via hls.js into hidden `<audio>` element
- Audio element exposes ref for parent to connect to AudioContext
- Thumbnail displayed as placeholder until user presses play; video element replaces it on play
- Video does NOT auto-play on load (waits for manual user play)
- Loading spinner, error overlay, "LIVE" badge for live streams
- **HLS error recovery**: on `hls.js` fatal error → re-extract stream URL → retry up to 3× → show "Service unavailable" on exhaustion
- `crossOrigin="anonymous"` on both elements (required for AudioContext graph)
- No quality selector (low resolution only, sufficient for reference video)
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.4.1 | Write tests first | `src/test/test_phase3_YouTubeInput.test.tsx`, `src/test/test_phase3_YouTubeVideoPlayer.test.tsx` |
| 3.4.2 | Add YouTube types to `types/index.ts` | `types/index.ts` |
| 3.4.3 | Add API functions to `lib/api.ts` | `lib/api.ts` |
| 3.4.4 | Add TanStack Query hooks to `lib/queries.tsx` | `lib/queries.tsx` |
| 3.4.5 | Create `components/YouTubeInput.tsx` — URL input, validation, loading/error states | `components/YouTubeInput.tsx` |
| 3.4.6 | Create `components/YouTubeVideoPlayer.tsx` — hls.js dual-element player, forwardRef | `components/YouTubeVideoPlayer.tsx` |
| 3.4.7 | Run tests → pass → commit | — |
---
### Phase 3.5 — Integration: YouTube → ASR Pipeline (1 day)
Wire YouTube audio output into existing ASR pipeline. The key challenge: `useVideoASR` currently captures from `<video>` element; we need it to capture from the `<audio>` element loaded by hls.js.
**Tests:** `test_phase3_useYouTubeASR.test.ts`, `test_phase3_LTTPage_integration.test.tsx`
**Acceptance Criteria:**
- `useYouTubeASR` hook: accepts `audioElement` ref, sets up AudioContext graph on mount
- AudioContext.createMediaElementSource(audioElement) → ScriptProcessorNode → WebSocket
- Auto-starts ASR on play, stops on pause/end (same lifecycle as `useVideoASR`)
- Transcript flows into QueryInput (same `onFinalTranscript` callback)
- QueryInput remains editable during streaming — user can type corrections while ASR appends
- "Full Transcript" button hidden when YouTube source is active
- Switching between "Upload" and "YouTube" sources clears previous state
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.5.1 | Write tests first | `src/test/test_phase3_useYouTubeASR.test.ts` |
| 3.5.2 | Create `hooks/useYouTubeASR.ts` — adapted from `useVideoASR.ts`, targets `<audio>` element | `hooks/useYouTubeASR.ts` |
| 3.5.3 | Update `QueryInput.tsx` — accept transcript from either source | `components/QueryInput.tsx` |
| 3.5.4 | Update `LTTPage.tsx` — add source toggle (Upload / YouTube), wire YouTubeInput + YouTubeVideoPlayer | `pages/LTTPage.tsx` |
| 3.5.5 | Create `test_phase3_LTTPage_integration.test.tsx` | `src/test/` |
| 3.5.6 | Run tests → pass → commit | — |
---
### Phase 3.6 — Integration & Acceptance Testing (1 day)
**Tests:** `test_integration_phase3.py`, `test_acceptance_phase3_youtube.py`, `test_acceptance_phase3_live.py`
**Tasks:**
| # | Task |
|---|------|
| 3.6.1 | Implement integration test (mocked yt-dlp, real httpx proxy + hls.js) |
| 3.6.2 | Implement acceptance: real YouTube VOD → extract → proxy → play |
| 3.6.3 | Implement acceptance: real YouTube live stream → extract → proxy → play + ASR |
| 3.6.4 | Full regression run (Phase 1 + 2 + 3 tests) |
| 3.6.5 | Fix failures, final commit |
---
### Phase 3.7 — Polish & Deployment (0.5 day)
| # | Task |
|---|------|
| 3.7.1 | Handle PO token expiration for live streams (log warning, auto-re-extract on failure) |
| 3.7.2 | Update Dockerfile — ensure ffmpeg + yt-dlp available in container |
| 3.7.3 | Update `docker-compose.yml` — add any new volumes/env vars |
| 3.7.4 | Verify production build (`npm run build`, `docker compose up -d --build`) |
| 3.7.5 | Update `README.md` — YouTube feature section |
| 3.7.6 | Update `development_plan.md` — mark Phase 3 status |
| 3.7.7 | Final commit |
---
## 4. Timeline
| Sub-Phase | Description | Effort | Depends On |
|---|---|---|---|
| 3.1 | Config & Infrastructure | 0.5 day | — |
| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 |
| 3.3 | HLS Proxy Backend | 1 day | 3.1 |
| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 |
| 3.5 | YouTube → ASR Integration | 1 day | 3.4 |
| 3.6 | Integration & Acceptance | 1 day | 3.5 |
| 3.7 | Polish & Deployment | 0.5 day | 3.6 |
| **Total** | | **5.5 days** | |
3.2 (extraction) and 3.3 (proxy) can run concurrently.
---
## 5. Dependencies
**Backend:** `yt-dlp>=2024.0.0` (new), `httpx>=0.26.0` (already present), `aiofiles>=24.0.0` (already present)
**Frontend:** `hls.js@^1.5.0` (new — NOT present, must install)
**System:** ffmpeg on server (already required by Phase 2)
---
## 6. Config Fields
```python
# YouTube live stream proxy (Phase 3)
youtube_proxy_enabled: bool = True
yt_dlp_timeout: int = 30 # seconds for yt-dlp extraction
yt_dlp_cache_ttl: int = 300 # seconds to cache extraction results
```
```bash
# .env.example additions
YOUTUBE_PROXY_ENABLED=true
YT_DLP_TIMEOUT=30
YT_DLP_CACHE_TTL=300
```
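How the three variables map onto the config fields can be sketched with the standard library alone. The project's `core/config.py` presumably has its own settings class, so this is a stand-in under that assumption: `YouTubeProxySettings`, `load_youtube_settings`, and the accepted truthy strings are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class YouTubeProxySettings:
    youtube_proxy_enabled: bool = True
    yt_dlp_timeout: int = 30       # seconds for yt-dlp extraction
    yt_dlp_cache_ttl: int = 300    # seconds to cache extraction results


def load_youtube_settings(env: Mapping[str, str] = os.environ) -> YouTubeProxySettings:
    """Read the three Phase 3 variables, falling back to the documented defaults."""
    raw_enabled = env.get("YOUTUBE_PROXY_ENABLED", "true")
    enabled = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}
    return YouTubeProxySettings(
        youtube_proxy_enabled=enabled,
        yt_dlp_timeout=int(env.get("YT_DLP_TIMEOUT", "30")),
        yt_dlp_cache_ttl=int(env.get("YT_DLP_CACHE_TTL", "300")),
    )
```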
---
## 7. Key Design Decisions
| Decision | Choice | Why |
|---|---|---|
| Streaming protocol | HLS (m3u8) | hls.js plays it natively; DASH requires dash.js |
| yt-dlp client | `ios` for live, `web` for VOD | `ios` returns HLS for live streams with 60fps support; format selector prefers ≤480p |
| HTTP client for proxy | httpx (already present) | Streaming support via `httpx.stream()`; no new dependency |
| Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer whole file |
| Audio element | Hidden `<audio>` + hls.js | `createMediaElementSource` works on `<audio>` elements |
| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min |
| **Full Transcript for YouTube** | **Disabled** | Button hidden; real-time streaming ASR only |
| **QueryInput during streaming** | **Editable** | User can type corrections while transcript streams (same as existing ASR) |
| **Video quality** | **360p-480p auto-best** | Low resolution sufficient for reference; no quality selector |
| **Auto-play on load** | **Wait for manual play** | Thumbnail placeholder; user presses play. Respects autoplay policy. |
| **Thumbnail** | **Stays until user presses play** | Clean transition; no black frame |
| **Error recovery** | **Retry 3× → "Service unavailable"** | Auto-re-extract URL on HLS failure; after 3 failures, show error state |
| **PO Tokens (live streams)** | **Graceful degradation for MVP** | Stream first ~30s; on failure retry 3× with fresh URL; after exhaustion show "Live stream unavailable" |
---
## 8. File Manifest
### New Files
```
backend/
app/models/youtube.py
app/services/youtube_service.py
app/services/hls_proxy.py
app/routers/youtube.py
app/test/test_phase3_config.py
app/test/test_phase3_youtube_extract.py
app/test/test_phase3_hls_proxy.py
app/test/test_phase3_hls_manifest.py
app/test/test_integration_phase3.py
app/test/acceptance/test_acceptance_phase3_youtube.py
app/test/acceptance/test_acceptance_phase3_live.py
frontend/src/
components/YouTubeInput.tsx
components/YouTubeVideoPlayer.tsx
hooks/useYouTubeASR.ts
test/test_phase3_YouTubeInput.test.tsx
test/test_phase3_YouTubeVideoPlayer.test.tsx
test/test_phase3_useYouTubeASR.test.ts
test/test_phase3_LTTPage_integration.test.tsx
```
### Modified Files
```
backend/app/core/config.py # Add 3 config fields
backend/.env.example # Add 3 env vars
backend/main.py # Register youtube router
backend/requirements.txt # Add yt-dlp
frontend/package.json # Add hls.js
frontend/src/types/index.ts # Add YouTube types
frontend/src/lib/api.ts # Add extractYouTube(), getYouTubeProxyUrl()
frontend/src/lib/queries.tsx # Add useYouTubeExtract() mutation
frontend/src/pages/LTTPage.tsx # Add source toggle + YouTube components
frontend/src/components/QueryInput.tsx # Accept transcript from either source
Dockerfile # Add yt-dlp install step
docker-compose.yml # Add env vars if needed
README.md # YouTube feature section
development_plan.md # Mark Phase 3 status
```
---
## 9. Known Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| PO Token expiration (live streams cut at 30s) | High — live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
| yt-dlp extraction slow (2-5s) | Medium — poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance |
| hls.js audio sync drift vs video | Low — separate streams may drift | hls.js `liveSyncDuration` keeps both near live edge; test with 10+ min streams |
| Safari `createMediaElementSource` on HLS | Low — known Safari bug with native HLS | hls.js uses MSE, not native HLS — works around Safari bug; Chrome/Firefox unaffected |
| YouTube ToS for proxy | Low for internal demo | Personal/enterprise internal demo is generally fine; review for public product |
---
## 10. Example Data Flow
```
POST /api/v1/youtube/extract
Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
Response: {
"video_id": "dQw4w9WgXcQ",
"title": "Rick Astley - Never Gonna Give You Up",
"is_live": false,
"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
"thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
}
GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
→ Fetches upstream manifest from googlevideo.com
→ Rewrites segment URLs:
segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
→ Streams rewritten manifest to browser
GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
→ Fetches upstream .ts segment via httpx.stream()
→ StreamingResponse with Content-Type: video/mp2t
→ CORS: access-control-allow-origin: *
```
---
## 11. References
- **mediaflow-proxy**: Production FastAPI HLS proxy with M3U8Processor — [mhdzumair/mediaflow-proxy](https://github.com/mhdzumair/mediaflow-proxy)
- **yt-dlp API docs**: [yt-dlp-yt-dlp.mintlify.app](https://yt-dlp-yt-dlp.mintlify.app/api/extractors)
- **hls.js API docs**: [github.com/video-dev/hls.js/blob/master/docs/API.md](https://github.com/video-dev/hls.js/blob/master/docs/API.md)
- **hls.js low-latency live**: `lowLatencyMode: true`, `liveSyncDuration: 1.5`
- **Existing code patterns**: `.plans/phase2_implementation_plan.md`, `backend/app/routers/video.py`, `frontend/src/hooks/useVideoASR.ts`


@ -187,14 +187,14 @@ docker run -d --name legco_test -p 8888:8000 \
-e HISTORY_DB_PATH=./data/history.db \
-e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
-e DASHSCOPE_API_KEY=your_dashscope_key \
-e ASR_MODEL_NAME=qwen3-asr-flash \
-e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime \
-e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
-e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
-e VIDEO_UPLOAD_DIR=./uploads \
-e MAX_VIDEO_SIZE_MB=300 \
-v ~/woody/legco/data/chroma_db:/app/chroma_db \
-v ~/woody/legco/data/document_chunk:/app/document_chunk \
-v ~/woody/legco/data/data:/app/data \
legco_reranker:amd64
legco_reranker:amd64.01.02
# Verify
curl http://localhost:8888/health


@ -56,17 +56,6 @@ class DashScopeCallback(OmniRealtimeCallback):
logger.info("dashscope-connection-closed code=%s msg=%s", code, msg)
def _merge_stash(partial_buffer: str, new_stash: str) -> str:
if not new_stash.strip():
return partial_buffer
if not partial_buffer:
return new_stash
for i in range(min(len(partial_buffer), len(new_stash)), 0, -1):
if partial_buffer[-i:] == new_stash[:i]:
return partial_buffer + new_stash[i:]
return partial_buffer + " " + new_stash
def format_transcription_event(event: dict, accumulated: str) -> dict | None:
event_type = event.get("type", "")


@ -75,42 +75,86 @@ class TestDashScopeCallback:
loop.close()
class TestMergeStash:
def test_merge_empty_buffer_returns_stash(self):
from app.routers.ws_asr import _merge_stash
class TestTextFieldFormatting:
"""Verify text field (stable cumulative) replaces stash merge logic.
DashScope partial events contain:
- text: monotonically growing stable transcription (never shrinks)
- stash: sliding window of latest chars (raw, unstable)
The production code uses text for delta computation and passes
stash through for frontend's trailing-char completion on pause.
_merge_stash has been removed; text is already cumulative.
"""
assert _merge_stash("", "你好") == "你好"
def test_text_field_present_and_distinct_from_stash(self):
"""text field is the stable prefix; stash is the trailing window."""
from app.routers.ws_asr import format_transcription_event
def test_merge_overlapping_suffix(self):
from app.routers.ws_asr import _merge_stash
event = {
"type": "conversation.item.input_audio_transcription.text",
"text": "多謝主席咁啊亦都多謝",
"stash": "邱主任",
"language": "yue",
}
assert _merge_stash("系多謝主席", "主席咁咧呢個") == "系多謝主席咁咧呢個"
result = format_transcription_event(event, "")
assert result is not None
assert not result["is_final"]
assert result["text"] == "多謝主席咁啊亦都多謝"
assert result["stash"] == "邱主任"
def test_merge_overlapping_single_char(self):
from app.routers.ws_asr import _merge_stash
def test_text_grows_monotonically_across_events(self):
"""text field should never lose characters between successive events."""
from app.routers.ws_asr import format_transcription_event
assert _merge_stash("abcde", "efgh") == "abcdefgh"
events = [
{"type": "conversation.item.input_audio_transcription.text", "text": "多謝主席", "stash": "席咁啊", "language": "yue"},
{"type": "conversation.item.input_audio_transcription.text", "text": "多謝主席咁啊", "stash": "啊亦都", "language": "yue"},
{"type": "conversation.item.input_audio_transcription.text", "text": "多謝主席咁啊亦都多謝", "stash": "邱主任", "language": "yue"},
]
def test_merge_no_overlap_appends_with_space(self):
from app.routers.ws_asr import _merge_stash
prev_text = ""
for event in events:
result = format_transcription_event(event, "")
assert result is not None
current_text = result["text"]
assert current_text.startswith(prev_text) if prev_text else True, (
f"text regressed: '{current_text}' does not start with '{prev_text}'"
)
prev_text = current_text
assert _merge_stash("你好", "世界") == "你好 世界"
def test_text_empty_early_on(self):
"""Early in an utterance, text may be empty while stash has content."""
from app.routers.ws_asr import format_transcription_event
def test_merge_stash_subset_of_buffer(self):
from app.routers.ws_asr import _merge_stash
event = {
"type": "conversation.item.input_audio_transcription.text",
"text": "",
"stash": "多謝主席",
"language": "yue",
}
assert _merge_stash("系多謝主席咁咧", "咧呢") == "系多謝主席咁咧呢"
result = format_transcription_event(event, "")
assert result is not None
assert result["text"] == ""
assert result["stash"] == "多謝主席"
def test_merge_empty_stash_preserves_buffer(self):
from app.routers.ws_asr import _merge_stash
def test_text_empty_stash_only_is_still_valid(self):
"""Both fields empty is a valid transient state."""
from app.routers.ws_asr import format_transcription_event
assert _merge_stash("你好", "") == "你好"
assert _merge_stash("", "") == ""
event = {
"type": "conversation.item.input_audio_transcription.text",
"text": "",
"stash": "",
"language": "yue",
}
def test_merge_whitespace_only_stash_preserves_buffer(self):
from app.routers.ws_asr import _merge_stash
assert _merge_stash("你好", " ") == "你好"
result = format_transcription_event(event, "")
assert result is not None
assert result["text"] == ""
assert result["stash"] == ""
class TestProxyFormatsTranscriptionTextEvent: