legco_ai_assistant/.plans/phase3_youtube_proxy_plan.md

382 lines
18 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan
**Created:** 2026-05-09
**Updated:** 2026-05-09 (user decisions incorporated)
**Status:** Planning
**Depends on:** Phase 1 (Complete), Phase 2 (Complete)
---
## 1. Overview
Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. User pastes a YouTube URL → backend extracts separate video-only and audio-only HLS streams via yt-dlp → backend proxies HLS manifests and .ts segments (zero re-encoding) → frontend plays video in `<video>` via hls.js, routes audio through hidden `<audio>` element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → transcript flows into QueryInput → Phase 1 RAG pipeline.
**Same code works identically for live streams and VODs.**
### Why Full Proxy (Not iframe)
YouTube's official iframe player does not expose the audio track to Web Audio API due to cross-origin restrictions. We proxy HLS segments through our backend so the browser treats them as same-origin.
### Audio Routing
```
YouTube HLS audio-only stream
→ hls.js loads into hidden <audio> element
→ AudioContext.createMediaElementSource(audioElement)
→ ScriptProcessorNode (Float32 PCM)
→ WebSocket → FastAPI → DashScope realtime ASR
→ transcript → QueryInput
```
### Integration With Existing Pipeline
This phase reuses the existing ASR infrastructure entirely:
- `useVideoASR.ts` AudioContext graph pattern → adapted for YouTube audio element
- `ws_asr.py` WebSocket → DashScope proxy → unchanged
- `QueryInput.tsx` transcript display → unchanged
- `LTTPage.tsx` layout → minor addition (source toggle)
- RAG pipeline → unchanged
---
## 2. User Flow
1. User selects "YouTube" source (instead of "Upload")
2. User pastes YouTube URL → clicks "Load Stream"
3. Backend extracts stream URLs → thumbnail shown as placeholder; video loads behind the scenes
4. User presses play → video appears, audio routes to ASR pipeline (no auto-play)
5. Real-time ASR transcription begins automatically on play
6. Transcript flows into QueryInput → user can edit while streaming continues
7. User pauses/stops → transcript stays, user edits and submits → RAG answer
8. **"Full Transcript" button hidden for YouTube source** — real-time streaming ASR only
9. **If HLS stream fails**: auto-retry up to 3 times with re-extracted URL → after 3 failures, show "Live stream unavailable" error
---
## 3. Sub-Phases
### Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)
Add config fields, install dependencies, create skeletons, register router.
**Test:** `test_phase3_config.py`
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.1.1 | Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl` | `core/config.py` |
| 3.1.2 | Update `.env.example` | `.env.example` |
| 3.1.3 | Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json` | `requirements.txt`, `package.json` |
| 3.1.4 | Create `models/youtube.py``YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat` | `models/youtube.py` |
| 3.1.5 | Create `services/youtube_service.py` stub | `services/youtube_service.py` |
| 3.1.6 | Create `services/hls_proxy.py` stub | `services/hls_proxy.py` |
| 3.1.7 | Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/{stream_type}/{path}` | `routers/youtube.py` |
| 3.1.8 | Register router in `main.py` | `main.py` |
| 3.1.9 | Write and pass `test_phase3_config.py` | `app/test/` |
---
### Phase 3.2 — YouTube URL Extraction Backend (0.5 day)
yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.
**Test:** `test_phase3_youtube_extract.py`
**Acceptance Criteria:**
- `POST /api/v1/youtube/extract` accepts `{"url": "https://www.youtube.com/watch?v=..."}`
- Returns `{ video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }`
- VODs: extracts ~210 formats, returns best video+audio pair
- Live streams: uses `ios` client for HLS, returns current live edge
- Upcoming/scheduled streams: returns `is_upcoming: true` with scheduled start time
- Invalid/private URLs: returns clear error
- URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.2.1 | Write tests first | `app/test/test_phase3_youtube_extract.py` |
| 3.2.2 | Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection | `services/youtube_service.py` |
| 3.2.3 | Implement `YouTubeService._select_best_formats()` — separate video/audio from format list, prefer ≤480p | `services/youtube_service.py` |
| 3.2.4 | Implement format URL caching with TTL | `services/youtube_service.py` |
| 3.2.5 | Implement `POST /api/v1/youtube/extract` route | `routers/youtube.py` |
| 3.2.6 | Run tests → pass → commit | — |
---
### Phase 3.3 — HLS Proxy Backend (1 day)
Proxy service that rewrites HLS manifests and proxies .ts segments. StreamingResponse for minimal latency.
**Reference:** mediaflow-proxy M3U8Processor pattern (line-by-line streaming, URL rewriting)
**Tests:** `test_phase3_hls_proxy.py`, `test_phase3_hls_manifest.py`
**Acceptance Criteria:**
- `GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded>` — fetches upstream manifest, rewrites all segment/sub-manifest URLs to point back to our proxy, streams response
- `GET /api/v1/youtube/proxy/segment.ts?url=<encoded>` — fetches upstream .ts segment, proxies with correct Content-Type (`video/mp2t`) and CORS headers
- Lines rewritten: segment URIs, sub-manifest URIs, `#EXT-X-KEY:URI=`, absolute URLs
- Lines passed through: `#EXTINF:`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXT-X-STREAM-INFO`, comments
- Client disconnect → upstream connection closed cleanly
- CORS headers on every response: `access-control-allow-origin: *`
- **Upstream failure → HTTP 502 with error detail; frontend retries up to 3 times with fresh URL before showing "Service unavailable"**
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.3.1 | Write tests first | `app/test/test_phase3_hls_proxy.py`, `app/test/test_phase3_hls_manifest.py` |
| 3.3.2 | Implement `HLSProxyService.rewrite_manifest()` — streaming line-by-line, URL detection + rewriting | `services/hls_proxy.py` |
| 3.3.3 | Implement `HLSProxyService.proxy_segment()` — httpx stream → StreamingResponse | `services/hls_proxy.py` |
| 3.3.4 | Implement `GET /api/v1/youtube/proxy/{type}/{path}` route — dispatch manifest vs segment | `routers/youtube.py` |
| 3.3.5 | Run tests → pass → commit | — |
---
### Phase 3.4 — Frontend: YouTube Input + Video Player (1 day)
URL input component and hls.js-based video player. Two hidden media elements: visible `<video>` (video-only, muted) and hidden `<audio>` (audio-only, for Web Audio API routing).
**Tests:** `test_phase3_YouTubeInput.test.tsx`, `test_phase3_YouTubeVideoPlayer.test.tsx`
**Acceptance Criteria:**
- `YouTubeInput` accepts URL, validates format, shows loading/error states
- `YouTubeVideoPlayer` uses `forwardRef<HTMLVideoElement>` (same pattern as `VideoPlayer`)
- Video HLS loaded via hls.js into `<video muted>` element at 360p480p (auto-best ≤ 480p)
- Audio HLS loaded via hls.js into hidden `<audio>` element
- Audio element exposes ref for parent to connect to AudioContext
- Thumbnail displayed as placeholder until user presses play; video element replaces it on play
- Video does NOT auto-play on load (waits for manual user play)
- Loading spinner, error overlay, "LIVE" badge for live streams
- **HLS error recovery**: on `hls.js` fatal error → re-extract stream URL → retry up to 3× → show "Service unavailable" on exhaustion
- CrossOrigin="anonymous" on both elements (required for AudioContext graph)
- No quality selector (low resolution only, sufficient for reference video)
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.4.1 | Write tests first | `src/test/test_phase3_YouTubeInput.test.tsx`, `src/test/test_phase3_YouTubeVideoPlayer.test.tsx` |
| 3.4.2 | Add YouTube types to `types/index.ts` | `types/index.ts` |
| 3.4.3 | Add API functions to `lib/api.ts` | `lib/api.ts` |
| 3.4.4 | Add TanStack Query hooks to `lib/queries.tsx` | `lib/queries.tsx` |
| 3.4.5 | Create `components/YouTubeInput.tsx` — URL input, validation, loading/error states | `components/YouTubeInput.tsx` |
| 3.4.6 | Create `components/YouTubeVideoPlayer.tsx` — hls.js dual-element player, forwardRef | `components/YouTubeVideoPlayer.tsx` |
| 3.4.7 | Run tests → pass → commit | — |
---
### Phase 3.5 — Integration: YouTube → ASR Pipeline (1 day)
Wire YouTube audio output into existing ASR pipeline. The key challenge: `useVideoASR` currently captures from `<video>` element; we need it to capture from the `<audio>` element loaded by hls.js.
**Tests:** `test_phase3_useYouTubeASR.test.ts`, `test_phase3_LTTPage_integration.test.tsx`
**Acceptance Criteria:**
- `useYouTubeASR` hook: accepts `audioElement` ref, sets up AudioContext graph on mount
- AudioContext.createMediaElementSource(audioElement) → ScriptProcessorNode → WebSocket
- Auto-starts ASR on play, stops on pause/end (same lifecycle as `useVideoASR`)
- Transcript flows into QueryInput (same `onFinalTranscript` callback)
- QueryInput remains editable during streaming — user can type corrections while ASR appends
- "Full Transcript" button hidden when YouTube source is active
- Switching between "Upload" and "YouTube" sources clears previous state
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.5.1 | Write tests first | `src/test/test_phase3_useYouTubeASR.test.ts` |
| 3.5.2 | Create `hooks/useYouTubeASR.ts` — adapted from `useVideoASR.ts`, targets `<audio>` element | `hooks/useYouTubeASR.ts` |
| 3.5.3 | Update `QueryInput.tsx` — accept transcript from either source | `components/QueryInput.tsx` |
| 3.5.4 | Update `LTTPage.tsx` — add source toggle (Upload / YouTube), wire YouTubeInput + YouTubeVideoPlayer | `pages/LTTPage.tsx` |
| 3.5.5 | Create `test_phase3_LTTPage_integration.test.tsx` | `src/test/` |
| 3.5.6 | Run tests → pass → commit | — |
---
### Phase 3.6 — Integration & Acceptance Testing (1 day)
**Tests:** `test_integration_phase3.py`, `test_acceptance_phase3_youtube.py`, `test_acceptance_phase3_live.py`
**Tasks:**
| # | Task |
|---|------|
| 3.6.1 | Implement integration test (mocked yt-dlp, real httpx proxy + hls.js) |
| 3.6.2 | Implement acceptance: real YouTube VOD → extract → proxy → play |
| 3.6.3 | Implement acceptance: real YouTube live stream → extract → proxy → play + ASR |
| 3.6.4 | Full regression run (Phase 1 + 2 + 3 tests) |
| 3.6.5 | Fix failures, final commit |
---
### Phase 3.7 — Polish & Deployment (0.5 day)
| # | Task |
|---|------|
| 3.7.1 | Handle PO token expiration for live streams (log warning, auto-re-extract on failure) |
| 3.7.2 | Update Dockerfile — ensure ffmpeg + yt-dlp available in container |
| 3.7.3 | Update `docker-compose.yml` — add any new volumes/env vars |
| 3.7.4 | Verify production build (`npm run build`, `docker compose up -d --build`) |
| 3.7.5 | Update `README.md` — YouTube feature section |
| 3.7.6 | Update `development_plan.md` — mark Phase 3 status |
| 3.7.7 | Final commit |
---
## 4. Timeline
| Sub-Phase | Description | Effort | Depends On |
|---|---|---|---|
| 3.1 | Config & Infrastructure | 0.5 day | — |
| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 |
| 3.3 | HLS Proxy Backend | 1 day | 3.1 |
| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 |
| 3.5 | YouTube → ASR Integration | 1 day | 3.4 |
| 3.6 | Integration & Acceptance | 1 day | 3.5 |
| 3.7 | Polish & Deployment | 0.5 day | 3.6 |
| **Total** | | **5.5 days** | |
3.2 (extraction) and 3.3 (proxy) can run concurrently.
---
## 5. Dependencies
**Backend:** `yt-dlp>=2024.0.0` (new), `httpx>=0.26.0` (already present), `aiofiles>=24.0.0` (already present)
**Frontend:** `hls.js@^1.5.0` (new — NOT present, must install)
**System:** ffmpeg on server (already required by Phase 2)
---
## 6. Config Fields
```python
# YouTube live stream proxy (Phase 3)
youtube_proxy_enabled: bool = True
yt_dlp_timeout: int = 30 # seconds for yt-dlp extraction
yt_dlp_cache_ttl: int = 300 # seconds to cache extraction results
```
```bash
# .env.example additions
YOUTUBE_PROXY_ENABLED=true
YT_DLP_TIMEOUT=30
YT_DLP_CACHE_TTL=300
```
---
## 7. Key Design Decisions
| Decision | Choice | Why |
|---|---|---|
| Streaming protocol | HLS (m3u8) | hls.js plays it natively; DASH requires dash.js |
| yt-dlp client | `ios` for live, `web` for VOD | `ios` returns HLS for live streams with 60fps support; format selector prefers ≤480p |
| HTTP client for proxy | httpx (already present) | Streaming support via `httpx.stream()`; no new dependency |
| Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer whole file |
| Audio element | Hidden `<audio>` + hls.js | `createMediaElementSource` works on `<audio>` elements |
| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min |
| **Full Transcript for YouTube** | **Disabled** | Button hidden; real-time streaming ASR only |
| **QueryInput during streaming** | **Editable** | User can type corrections while transcript streams (same as existing ASR) |
| **Video quality** | **360p480p auto-best** | Low resolution sufficient for reference; no quality selector |
| **Auto-play on load** | **Wait for manual play** | Thumbnail placeholder; user presses play. Respects autoplay policy. |
| **Thumbnail** | **Stays until user presses play** | Clean transition; no black frame |
| **Error recovery** | **Retry 3× → "Service unavailable"** | Auto-re-extract URL on HLS failure; after 3 failures, show error state |
| **PO Tokens (live streams)** | **Graceful degradation for MVP** | Stream first ~30s; on failure retry 3× with fresh URL; after exhaustion show "Live stream unavailable" |
---
## 8. File Manifest
### New Files
```
backend/
app/models/youtube.py
app/services/youtube_service.py
app/services/hls_proxy.py
app/routers/youtube.py
app/test/test_phase3_config.py
app/test/test_phase3_youtube_extract.py
app/test/test_phase3_hls_proxy.py
app/test/test_phase3_hls_manifest.py
app/test/test_integration_phase3.py
app/test/acceptance/test_acceptance_phase3_youtube.py
app/test/acceptance/test_acceptance_phase3_live.py
frontend/src/
components/YouTubeInput.tsx
components/YouTubeVideoPlayer.tsx
hooks/useYouTubeASR.ts
test/test_phase3_YouTubeInput.test.tsx
test/test_phase3_YouTubeVideoPlayer.test.tsx
test/test_phase3_useYouTubeASR.test.ts
test/test_phase3_LTTPage_integration.test.tsx
```
### Modified Files
```
backend/app/core/config.py # Add 3 config fields
backend/.env.example # Add 3 env vars
backend/main.py # Register youtube router
backend/requirements.txt # Add yt-dlp
frontend/package.json # Add hls.js
frontend/src/types/index.ts # Add YouTube types
frontend/src/lib/api.ts # Add extractYouTube(), getYouTubeProxyUrl()
frontend/src/lib/queries.tsx # Add useYouTubeExtract() mutation
frontend/src/pages/LTTPage.tsx # Add source toggle + YouTube components
frontend/src/components/QueryInput.tsx # Accept transcript from either source
Dockerfile # Add yt-dlp install step
docker-compose.yml # Add env vars if needed
README.md # YouTube feature section
development_plan.md # Mark Phase 3 status
```
---
## 9. Known Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| PO Token expiration (live streams cut at 30s) | High — live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
| yt-dlp extraction slow (2-5s) | Medium — poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance |
| hls.js audio sync drift vs video | Low — separate streams may drift | hls.js `liveSyncDuration` keeps both near live edge; test with 10+ min streams |
| Safari `createMediaElementSource` on HLS | Low — known Safari bug with native HLS | hls.js uses MSE, not native HLS — works around Safari bug; Chrome/Firefox unaffected |
| YouTube ToS for proxy | Low for internal demo | Personal/enterprise internal demo is generally fine; review for public product |
---
## 10. Example Data Flow
```
POST /api/v1/youtube/extract
Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
Response: {
"video_id": "dQw4w9WgXcQ",
"title": "Rick Astley - Never Gonna Give You Up",
"is_live": false,
"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
"thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
}
GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
→ Fetches upstream manifest from googlevideo.com
→ Rewrites segment URLs:
segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
→ Streams rewritten manifest to browser
GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
→ Fetches upstream .ts segment via httpx.stream()
→ StreamingResponse with Content-Type: video/mp2t
→ CORS: access-control-allow-origin: *
```
---
## 11. References
- **mediaflow-proxy**: Production FastAPI HLS proxy with M3U8Processor — [mhdzumair/mediaflow-proxy](https://github.com/mhdzumair/mediaflow-proxy)
- **yt-dlp API docs**: [yt-dlp-yt-dlp.mintlify.app](https://yt-dlp-yt-dlp.mintlify.app/api/extractors)
- **hls.js API docs**: [github.com/video-dev/hls.js/blob/master/docs/API.md](https://github.com/video-dev/hls.js/blob/master/docs/API.md)
- **hls.js low-latency live**: `lowLatencyMode: true`, `liveSyncDuration: 1.5`
- **Existing code patterns**: `.plans/phase2_implementation_plan.md`, `backend/app/routers/video.py`, `frontend/src/hooks/useVideoASR.ts`