Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan
Created: 2026-05-09
Updated: 2026-05-09 (Phase 3.1 + 3.2 implemented)
Status: In Progress (3.1 Complete, 3.2 Complete)
Depends on: Phase 1 (Complete), Phase 2 (Complete)
1. Overview
Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. The user pastes a YouTube URL → the backend extracts stream URLs via yt-dlp (separate video-only and audio-only tracks for VODs; combined HLS for live) → the backend proxies HLS manifests and .ts segments with zero re-encoding → the frontend plays video in a <video> element via hls.js and routes audio through a hidden <audio> element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → the transcript flows into QueryInput → Phase 1 RAG pipeline.
The same client code path handles both live streams and VODs; the main difference is which formats the backend selects.
Why Full Proxy (Not iframe)
YouTube's official iframe player does not expose its audio track to the Web Audio API because of cross-origin restrictions. We proxy the HLS manifests and segments through our backend so the browser treats them as same-origin.
Audio Routing
YouTube HLS stream (combined video+audio for live; separate tracks for VOD)
→ hls.js loads into <video> (muted) and hidden <audio> element
→ AudioContext.createMediaElementSource(audioElement)
→ ScriptProcessorNode (Float32 PCM)
→ WebSocket → FastAPI → DashScope realtime ASR
→ transcript → QueryInput
Note: For VODs, separate video-only and audio-only tracks are used. For live streams, YouTube provides combined formats only, so the same HLS manifest URL is used for both elements; each element gets its own hls.js instance, which loads and demuxes the stream independently.
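The diagram does not say where the Float32 samples from the ScriptProcessorNode become the integer PCM that realtime ASR services typically expect. If that step is not already handled in useVideoASR.ts or ws_asr.py (an assumption; check the existing code), it is a clamp-and-scale per sample. A minimal Python sketch with a hypothetical function name; resampling from the AudioContext rate to whatever rate the ASR session is configured for is not shown:

```python
import struct
from typing import Sequence


def float32_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert Web Audio float samples in [-1.0, 1.0] to little-endian 16-bit PCM.

    Hypothetical helper: only needed if the existing pipeline does not already convert.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clamp: float samples can overshoot [-1, 1]
        # Asymmetric scale so -1.0 -> -32768 and +1.0 -> +32767 (full int16 range).
        ints.append(int(s * 32768) if s < 0 else int(s * 32767))
    return struct.pack(f"<{len(ints)}h", *ints)
```
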
Integration With Existing Pipeline
This phase reuses the existing ASR infrastructure with minimal changes:
- useVideoASR.ts: AudioContext graph pattern → adapted for the YouTube audio element
- ws_asr.py: WebSocket → DashScope proxy → unchanged
- QueryInput.tsx: transcript display → unchanged
- LTTPage.tsx: layout → minor addition (source toggle)
- RAG pipeline → unchanged
2. User Flow
- User selects "YouTube" source (instead of "Upload")
- User pastes YouTube URL → clicks "Load Stream"
- Backend extracts stream URLs → thumbnail shown as placeholder; video loads behind the scenes
- User presses play → video appears, audio routes to ASR pipeline (no auto-play)
- Real-time ASR transcription begins automatically on play
- Transcript flows into QueryInput → user can edit while streaming continues
- User pauses/stops → transcript stays, user edits and submits → RAG answer
- "Full Transcript" button hidden for YouTube source — real-time streaming ASR only
- If HLS stream fails: auto-retry up to 3 times with re-extracted URL → after 3 failures, show "Live stream unavailable" error
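The retry rule in the last step can be expressed as a small loop. The real logic would live in YouTubeVideoPlayer.tsx, reacting to hls.js fatal errors; it is sketched in Python here only to keep one language across examples. extract_url, play and StreamUnavailable are hypothetical names, and "retry up to 3 times" is read as one initial attempt plus three retries:

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class StreamUnavailable(Exception):
    """Every attempt failed; the UI switches to its "unavailable" error state."""


def play_with_retries(
    extract_url: Callable[[], str],
    play: Callable[[str], T],
    max_retries: int = 3,
) -> T:
    """Try once, then retry up to `max_retries` times, re-extracting a URL before each attempt."""
    last_error: Optional[BaseException] = None
    for _ in range(1 + max_retries):
        url = extract_url()  # fresh upstream URL: googlevideo URLs expire
        try:
            return play(url)
        except Exception as exc:  # stands in for an hls.js fatal error
            last_error = exc
    raise StreamUnavailable(f"stream failed after {max_retries} retries") from last_error
```

One thing the sketch makes visible: if re-extraction goes through POST /api/v1/youtube/extract, the 5-minute live cache would return the same expired URL unless the retry bypasses or invalidates that cache entry. The plan does not yet say how this is handled.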
3. Sub-Phases
Phase 3.1 — Configuration & Infrastructure Setup ✅ Complete
Add config fields, install dependencies, create skeletons, register router.
Test: test_phase3_config.py (11 tests)
Tasks:
| # | Task | File | Status |
|---|---|---|---|
| 3.1.1 | Add config fields: youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl | core/config.py | Done |
| 3.1.2 | Update .env.example | .env.example | Done |
| 3.1.3 | Add deps: yt-dlp>=2024.0.0 to requirements.txt, hls.js@^1.5.0 to package.json | requirements.txt, package.json | Done |
| 3.1.4 | Create models/youtube.py: YouTubeExtractRequest, YouTubeStreamResponse, StreamFormat | models/youtube.py | Done |
| 3.1.5 | Create services/youtube_service.py stub | services/youtube_service.py | Done |
| 3.1.6 | Create services/hls_proxy.py stub | services/hls_proxy.py | Done |
| 3.1.7 | Create routers/youtube.py stub: POST /youtube/extract, GET /youtube/proxy/{stream_type}/{path} | routers/youtube.py | Done |
| 3.1.8 | Register router in main.py | main.py | Done |
| 3.1.9 | Write and pass test_phase3_config.py | app/test/ | Done (11/11 pass) |
Phase 3.2 — YouTube URL Extraction Backend ✅ Complete
yt-dlp wrapper service that extracts stream URLs and formats. Returns proxy-wrapped URLs pointing back to our HLS proxy.
Test: test_phase3_youtube_extract.py (18 tests)
Acceptance Criteria:
- POST /api/v1/youtube/extract accepts {"url": "https://www.youtube.com/watch?v=..."}
- Returns { video_id, title, is_live, is_upcoming, video_proxy_url, audio_proxy_url, thumbnail_url, formats, error }
- VODs: extracts separate video-only and audio-only tracks; selects the best video ≤480p and the highest-bitrate audio
- Live streams: extracts combined HLS formats and uses the same URL for video and audio (hls.js demuxes)
- Upcoming/scheduled streams: returns is_upcoming: true with no proxy URLs
- Invalid/private URLs: returns 200 with the error field populated (yt-dlp exception caught)
- URL expiration: in-memory cache with TTL (5 min for live, 30 min for VOD)
- Service singleton: @lru_cache on _get_youtube_service() so the cache persists across requests
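The VOD and live selection rules above can be sketched over yt-dlp's info_dict["formats"] list, whose entries carry keys such as url, protocol, height, vcodec, acodec, abr and tbr (vcodec is "none" on audio-only tracks, acodec is "none" on video-only tracks). This illustrates the stated rules rather than reproducing the actual _select_best_formats(); the function names and exact tie-breaking are assumptions:

```python
from typing import Any, Dict, List, Optional, Tuple

Format = Dict[str, Any]  # one entry of yt-dlp's info_dict["formats"]


def _has_video(f: Format) -> bool:
    return f.get("vcodec") not in (None, "none") and bool(f.get("height"))


def _has_audio(f: Format) -> bool:
    return f.get("acodec") not in (None, "none")


def _is_hls(f: Format) -> bool:
    # yt-dlp reports HLS formats with protocol "m3u8" or "m3u8_native".
    return str(f.get("protocol", "")).startswith("m3u8")


def pick_best_video(formats: List[Format], max_height: int = 480) -> Optional[Format]:
    videos = [f for f in formats if _has_video(f)]
    if not videos:
        return None
    capped = [f for f in videos if f["height"] <= max_height]
    if capped:
        # HLS first, then the tallest rendition under the cap.
        return max(capped, key=lambda f: (_is_hls(f), f["height"]))
    # Nothing under the cap: smallest rendition, HLS first on equal height.
    return min(videos, key=lambda f: (f["height"], not _is_hls(f)))


def pick_best_audio(formats: List[Format]) -> Optional[Format]:
    audio_only = [f for f in formats if f.get("vcodec") == "none" and _has_audio(f)]
    return max(audio_only, key=lambda f: f.get("abr") or f.get("tbr") or 0, default=None)


def select_stream_urls(formats: List[Format], is_live: bool) -> Tuple[Optional[str], Optional[str]]:
    """Return upstream (video_url, audio_url) before proxy wrapping."""
    if not is_live:
        video = pick_best_video([f for f in formats if not _has_audio(f)])
        audio = pick_best_audio(formats)
        if video and audio:
            return video["url"], audio["url"]
    # Live streams, or VODs without separate tracks: one combined format feeds both elements.
    combined = pick_best_video([f for f in formats if _has_audio(f)])
    url = combined["url"] if combined else None
    return url, url
```
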
Implementation Discoveries:
- No iOS client needed: default yt-dlp works for both VOD (separate tracks) and live (combined HLS)
- Live streams use combined formats: all live formats include both video and audio; the same HLS URL serves both the <video> and <audio> elements
- Format selection (_pick_best_video): prefers ≤480p with HLS first, then falls back to ascending height with HLS preference
- Error response pattern: extraction errors return HTTP 200 with the error field populated (not 4xx); the API call itself succeeds, but YouTube returned an error
- Proxy URL construction (_build_proxy_url): URL-encodes the upstream URL into /api/v1/youtube/proxy/manifest.m3u8?url=<encoded>
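A minimal sketch of the proxy URL construction described above, assuming the upstream URL travels in a single url query parameter. build_proxy_url and upstream_from_proxy_url are hypothetical stand-ins, not the actual _build_proxy_url; the key detail is quote(..., safe=""), which also escapes "/", "?", "&" and "=" so the upstream query string cannot split into extra parameters of the proxy URL:

```python
from urllib.parse import parse_qs, quote, urlsplit

PROXY_PREFIX = "/api/v1/youtube/proxy"


def build_proxy_url(upstream_url: str, endpoint: str = "manifest.m3u8") -> str:
    """Wrap an upstream googlevideo.com URL in a same-origin proxy URL."""
    # safe="" escapes every reserved character, keeping the whole upstream URL
    # (including its own ?, & and =) inside the single `url` parameter.
    return f"{PROXY_PREFIX}/{endpoint}?url={quote(upstream_url, safe='')}"


def upstream_from_proxy_url(proxy_url: str) -> str:
    """Inverse, useful in tests: recover the upstream URL from a proxy URL."""
    return parse_qs(urlsplit(proxy_url).query)["url"][0]
```

On the receiving side, FastAPI decodes query parameters automatically, so a route parameter declared as url: str already holds the original upstream URL.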
Real-URL Verification:
VOD: https://www.youtube.com/watch?v=5bF3tkO5jAA → 24 formats, separate video+audio ✓
Live: https://www.youtube.com/watch?v=fN9uYWCjQaw → 6 combined formats, same URL ✓
Tasks:
| # | Task | File | Status |
|---|---|---|---|
| 3.2.1 | Write tests first | app/test/test_phase3_youtube_extract.py | Done |
| 3.2.2 | Implement YouTubeService.extract_streams(): yt-dlp wrapper with format selection | services/youtube_service.py | Done |
| 3.2.3 | Implement YouTubeService._select_best_formats() + _pick_best_video(): separate video/audio from format list, prefer ≤480p, combined fallback | services/youtube_service.py | Done |
| 3.2.4 | Implement format URL caching with TTL (live 5 min, VOD 30 min) | services/youtube_service.py | Done |
| 3.2.5 | Implement POST /api/v1/youtube/extract route with response model + error handling | routers/youtube.py | Done |
| 3.2.6 | Run tests → pass → verified with real URLs | — | Done (82/82 pass) |
Phase 3.3 — HLS Proxy Backend (1 day)
Proxy service that rewrites HLS manifests and proxies .ts segments. StreamingResponse for minimal latency.
Reference: mediaflow-proxy M3U8Processor pattern (line-by-line streaming, URL rewriting)
Tests: test_phase3_hls_proxy.py, test_phase3_hls_manifest.py
Acceptance Criteria:
- GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded>: fetches the upstream manifest, rewrites all segment and sub-manifest URLs to point back to our proxy, and streams the response
- GET /api/v1/youtube/proxy/segment.ts?url=<encoded>: fetches the upstream .ts segment and proxies it with the correct Content-Type (video/mp2t) and CORS headers
- Lines rewritten: segment URIs, sub-manifest URIs, #EXT-X-KEY:URI=, absolute URLs (relative URIs are first resolved against the upstream manifest URL)
- Lines passed through unchanged: #EXTINF:, #EXT-X-TARGETDURATION, #EXT-X-MEDIA-SEQUENCE, #EXT-X-STREAM-INF, comments
- Client disconnect → upstream connection closed cleanly
- CORS headers on every response: access-control-allow-origin: *
- Upstream failure → HTTP 502 with error detail; the frontend retries up to 3 times with a fresh URL before showing "Service unavailable"
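The rewriting rules above fit in a single pass that keeps one bit of state: whether an #EXT-X-STREAM-INF tag was just seen, which marks the next URI line as a variant playlist rather than a segment. The sketch below is a synchronous generator so it can be tested without a network; in HLSProxyService it would presumably consume httpx's Response.aiter_lines() and feed a StreamingResponse. It also rewrites URI="..." attributes on #EXT-X-MEDIA and #EXT-X-MAP, which the criteria do not list but HLS allows. All names, and the endpoint chosen for each URI kind, are assumptions:

```python
import re
from typing import Iterable, Iterator
from urllib.parse import quote, urljoin

PROXY_PREFIX = "/api/v1/youtube/proxy"
URI_ATTR = re.compile(r'URI="([^"]*)"')
# Tags whose URI attribute points at another playlist rather than a media resource.
PLAYLIST_URI_TAGS = ("#EXT-X-MEDIA:", "#EXT-X-I-FRAME-STREAM-INF:")


def proxied(upstream: str, endpoint: str) -> str:
    return f"{PROXY_PREFIX}/{endpoint}?url={quote(upstream, safe='')}"


def rewrite_manifest(lines: Iterable[str], manifest_url: str) -> Iterator[str]:
    """Yield rewritten playlist lines one at a time; nothing is buffered."""
    next_uri_is_playlist = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            yield line
        elif line.startswith("#"):
            if line.startswith("#EXT-X-STREAM-INF:"):
                next_uri_is_playlist = True  # the following URI line is a variant playlist
            endpoint = "manifest.m3u8" if line.startswith(PLAYLIST_URI_TAGS) else "segment.ts"
            # Rewrite URI="..." attributes (#EXT-X-KEY, #EXT-X-MAP, #EXT-X-MEDIA);
            # tags without one (#EXTINF, #EXT-X-TARGETDURATION, ...) pass through as-is.
            yield URI_ATTR.sub(
                lambda m: f'URI="{proxied(urljoin(manifest_url, m.group(1)), endpoint)}"', line
            )
        else:
            # A URI line: resolve relative to the upstream manifest, then point it at our proxy.
            endpoint = "manifest.m3u8" if next_uri_is_playlist else "segment.ts"
            yield proxied(urljoin(manifest_url, line.strip()), endpoint)
            next_uri_is_playlist = False
```
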
Tasks:
| # | Task | File |
|---|---|---|
| 3.3.1 | Write tests first | app/test/test_phase3_hls_proxy.py, app/test/test_phase3_hls_manifest.py |
| 3.3.2 | Implement HLSProxyService.rewrite_manifest(): streaming line-by-line, URL detection + rewriting | services/hls_proxy.py |
| 3.3.3 | Implement HLSProxyService.proxy_segment(): httpx stream → StreamingResponse | services/hls_proxy.py |
| 3.3.4 | Implement GET /api/v1/youtube/proxy/{type}/{path} route: dispatch manifest vs segment | routers/youtube.py |
| 3.3.5 | Run tests → pass → commit | — |
Phase 3.4 — Frontend: YouTube Input + Video Player (1 day)
URL input component and hls.js-based video player. Two media elements: a visible <video> (muted; video-only track for VODs, combined stream for live) and a hidden <audio> (audio-only track for VODs, combined stream for live; feeds the Web Audio API routing).
Tests: test_phase3_YouTubeInput.test.tsx, test_phase3_YouTubeVideoPlayer.test.tsx
Acceptance Criteria:
- YouTubeInput accepts a URL, validates its format, and shows loading/error states
- YouTubeVideoPlayer uses forwardRef<HTMLVideoElement> (same pattern as VideoPlayer)
- Video HLS loaded via hls.js into a <video muted> element at 360p–480p (auto-best ≤480p)
- Audio HLS loaded via hls.js into a hidden <audio> element
- Audio element exposes a ref for the parent to connect to the AudioContext
- Thumbnail displayed as a placeholder until the user presses play; the video element replaces it on play
- Video does NOT auto-play on load (waits for manual user play)
- Loading spinner, error overlay, "LIVE" badge for live streams
- HLS error recovery: on an hls.js fatal error → re-extract stream URL → retry up to 3× → show "Service unavailable" on exhaustion
- crossOrigin="anonymous" on both elements (required for the AudioContext graph)
- No quality selector (low resolution only, sufficient for reference video)
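The criteria leave the accepted URL shapes unspecified. One reasonable reading, sketched in Python for consistency with the other examples (the frontend version would be a TypeScript port, and the backend could apply the same check to YouTubeExtractRequest), accepts watch?v=, youtu.be/, /live/, /shorts/ and /embed/ links and checks the 11-character video ID format YouTube currently uses. The host list, path prefixes and function name are assumptions:

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}
PATH_PREFIXES = {"live", "shorts", "embed"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID, or None if the URL is not an accepted YouTube link."""
    parts = urlsplit(url.strip())
    host = parts.hostname or ""  # urlsplit lowercases the hostname
    if parts.scheme not in ("http", "https") or host not in YOUTUBE_HOSTS:
        return None
    path = [p for p in parts.path.split("/") if p]
    if host == "youtu.be":
        candidate = path[0] if path else ""
    elif path == ["watch"]:
        candidate = parse_qs(parts.query).get("v", [""])[0]
    elif len(path) >= 2 and path[0] in PATH_PREFIXES:
        candidate = path[1]
    else:
        candidate = ""
    # fullmatch (not match) so trailing junk such as a newline is rejected.
    return candidate if VIDEO_ID.fullmatch(candidate) else None
```
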
Tasks:
| # | Task | File |
|---|---|---|
| 3.4.1 | Write tests first | src/test/test_phase3_YouTubeInput.test.tsx, src/test/test_phase3_YouTubeVideoPlayer.test.tsx |
| 3.4.2 | Add YouTube types to types/index.ts | types/index.ts |
| 3.4.3 | Add API functions to lib/api.ts | lib/api.ts |
| 3.4.4 | Add TanStack Query hooks to lib/queries.tsx | lib/queries.tsx |
| 3.4.5 | Create components/YouTubeInput.tsx: URL input, validation, loading/error states | components/YouTubeInput.tsx |
| 3.4.6 | Create components/YouTubeVideoPlayer.tsx: hls.js dual-element player, forwardRef | components/YouTubeVideoPlayer.tsx |
| 3.4.7 | Run tests → pass → commit | — |
Phase 3.5 — Integration: YouTube → ASR Pipeline (1 day)
Wire YouTube audio output into existing ASR pipeline. The key challenge: useVideoASR currently captures from <video> element; we need it to capture from the <audio> element loaded by hls.js.
Tests: test_phase3_useYouTubeASR.test.ts, test_phase3_LTTPage_integration.test.tsx
Acceptance Criteria:
- useYouTubeASR hook: accepts an audioElement ref and sets up the AudioContext graph on mount
- AudioContext.createMediaElementSource(audioElement) → ScriptProcessorNode → WebSocket
- Auto-starts ASR on play, stops on pause/end (same lifecycle as useVideoASR)
- Transcript flows into QueryInput (same onFinalTranscript callback)
- "Full Transcript" button hidden when YouTube source is active
- Switching between "Upload" and "YouTube" sources clears previous state
Tasks:
| # | Task | File |
|---|---|---|
| 3.5.1 | Write tests first | src/test/test_phase3_useYouTubeASR.test.ts |
| 3.5.2 | Create hooks/useYouTubeASR.ts: adapted from useVideoASR.ts, targets the <audio> element | hooks/useYouTubeASR.ts |
| 3.5.3 | Update QueryInput.tsx: accept transcript from either source | components/QueryInput.tsx |
| 3.5.4 | Update LTTPage.tsx: add source toggle (Upload / YouTube), wire YouTubeInput + YouTubeVideoPlayer | pages/LTTPage.tsx |
| 3.5.5 | Create test_phase3_LTTPage_integration.test.tsx | src/test/ |
| 3.5.6 | Run tests → pass → commit | — |
Phase 3.6 — Integration & Acceptance Testing (1 day)
Tests: test_integration_phase3.py, test_acceptance_phase3_youtube.py, test_acceptance_phase3_live.py
Tasks:
| # | Task |
|---|---|
| 3.6.1 | Implement integration test (mocked yt-dlp, real httpx proxy + hls.js) |
| 3.6.2 | Implement acceptance: real YouTube VOD → extract → proxy → play |
| 3.6.3 | Implement acceptance: real YouTube live stream → extract → proxy → play + ASR |
| 3.6.4 | Full regression run (Phase 1 + 2 + 3 tests) |
| 3.6.5 | Fix failures, final commit |
Phase 3.7 — Polish & Deployment (0.5 day)
| # | Task |
|---|---|
| 3.7.1 | Handle PO token expiration for live streams (log warning, auto-re-extract on failure) |
| 3.7.2 | Update Dockerfile — ensure ffmpeg + yt-dlp available in container |
| 3.7.3 | Update docker-compose.yml — add any new volumes/env vars |
| 3.7.4 | Verify production build (npm run build, docker compose up -d --build) |
| 3.7.5 | Update README.md — YouTube feature section |
| 3.7.6 | Update development_plan.md — mark Phase 3 status |
| 3.7.7 | Final commit |
4. Timeline
| Sub-Phase | Description | Effort | Depends On | Status |
|---|---|---|---|---|
| 3.1 | Config & Infrastructure | 0.5 day | — | ✅ Complete |
| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 | ✅ Complete |
| 3.3 | HLS Proxy Backend | 1 day | 3.1 | ⏳ Next |
| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 | Pending |
| 3.5 | YouTube → ASR Integration | 1 day | 3.4 | Pending |
| 3.6 | Integration & Acceptance | 1 day | 3.5 | Pending |
| 3.7 | Polish & Deployment | 0.5 day | 3.6 | Pending |
| Total | | 5.5 days | | 2/7 done |
3.2 (extraction) and 3.3 (proxy) were planned to run concurrently, since both depend only on 3.1; 3.2 finished first, and 3.3 is next.
5. Dependencies
Backend: yt-dlp>=2024.0.0 (new), httpx>=0.26.0 (already present), aiofiles>=24.0.0 (already present)
Frontend: hls.js@^1.5.0 (new; declared in package.json by task 3.1.3, so run npm install to pull it in)
System: ffmpeg on server (already required by Phase 2)
6. Config Fields
# YouTube live stream proxy (Phase 3)
youtube_proxy_enabled: bool = True
yt_dlp_timeout: int = 30 # seconds for yt-dlp extraction
yt_dlp_cache_ttl: int = 300 # seconds to cache extraction results
# .env.example additions
YOUTUBE_PROXY_ENABLED=true
YT_DLP_TIMEOUT=30
YT_DLP_CACHE_TTL=300
7. Key Design Decisions
| Decision | Choice | Why |
|---|---|---|
| Streaming protocol | HLS (m3u8) | hls.js plays it in any browser with Media Source Extensions; DASH would require dash.js |
| yt-dlp client | Default (no special client) | Default extractor works for both VOD (separate tracks) and live (combined HLS); iOS client caused "No video formats" errors on some live streams |
| Live format strategy | Combined formats, same URL | Live HLS formats include both video and audio; the same URL feeds the <video> and <audio> elements, and hls.js demuxes each independently |
| HTTP client for proxy | httpx (already present) | Streaming support via httpx.stream(); no new dependency |
| Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer the whole file |
| Audio element | Hidden <audio> + hls.js | createMediaElementSource works on <audio> elements |
| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5 s); reuse results for 5 min (live) or 30 min (VOD) |
| Service lifetime | @lru_cache singleton | The cache must persist across HTTP requests for caching to work |
| Error response | HTTP 200 with error field | The API call succeeded; a YouTube error is a content-level failure, not a protocol failure |
| Full Transcript for YouTube | Disabled | Button hidden; real-time streaming ASR only |
| QueryInput during streaming | Editable | User can type corrections while the transcript streams (same as existing ASR) |
| Video quality | 360p–480p auto-best | Low resolution is sufficient for reference; no quality selector |
| Auto-play on load | Wait for manual play | Thumbnail placeholder; user presses play. Respects the browser autoplay policy. |
| Thumbnail | Stays until user presses play | Clean transition; no black frame |
| Error recovery | Retry 3× → "Service unavailable" | Auto-re-extract URL on HLS failure; after 3 failures, show error state |
| PO Tokens (live streams) | Graceful degradation for MVP | Stream the first ~30 s; on failure retry 3× with a fresh URL; after exhaustion show "Live stream unavailable" |
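The URL caching and service lifetime rows work together: the cache lives on the service instance, and @lru_cache on a zero-argument getter turns that instance into a per-process singleton. A minimal sketch with hypothetical names (TTLCache, YouTubeServiceSketch, get_youtube_service); the 5- and 30-minute TTLs are hard-coded here because the config section defines only one yt_dlp_cache_ttl field, and the clock is injectable so expiry can be tested without sleeping:

```python
import time
from functools import lru_cache
from typing import Any, Callable, Dict, Optional, Tuple

LIVE_TTL_S = 5 * 60   # live URLs expire quickly
VOD_TTL_S = 30 * 60   # VOD URLs stay valid longer


def ttl_for(is_live: bool) -> int:
    return LIVE_TTL_S if is_live else VOD_TTL_S


class TTLCache:
    """In-memory dict whose entries expire after a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            self._entries.pop(key, None)  # lazy eviction on read
            return None
        return value

    def set(self, key: str, value: Any, ttl_s: float) -> None:
        self._entries[key] = (self._clock() + ttl_s, value)

    def invalidate(self, key: str) -> None:
        # Lets a retry force a fresh yt-dlp extraction instead of reusing a stale URL.
        self._entries.pop(key, None)


class YouTubeServiceSketch:
    """Hypothetical stand-in for YouTubeService; it owns the cache."""

    def __init__(self) -> None:
        self.cache = TTLCache()


@lru_cache(maxsize=None)
def get_youtube_service() -> YouTubeServiceSketch:
    # No arguments, so every call returns the same instance (and therefore the same cache).
    return YouTubeServiceSketch()
```

Expired entries are evicted only when read, which is acceptable at demo scale. Because lru_cache is per process, each worker in a multi-worker deployment keeps its own cache.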
8. File Manifest
New Files
backend/
app/models/youtube.py ✅ Created (3.1)
app/services/youtube_service.py ✅ Created (3.1), implemented (3.2)
app/services/hls_proxy.py ✅ Stub created (3.1)
app/routers/youtube.py ✅ Created (3.1), implemented (3.2)
app/test/test_phase3_config.py ✅ Written (3.1, 11 tests)
app/test/test_phase3_youtube_extract.py ✅ Written (3.2, 18 tests)
app/test/test_phase3_hls_proxy.py ⏳ Pending (3.3)
app/test/test_phase3_hls_manifest.py ⏳ Pending (3.3)
app/test/test_integration_phase3.py ⏳ Pending (3.6)
app/test/acceptance/test_acceptance_phase3_youtube.py ⏳ Pending (3.6)
app/test/acceptance/test_acceptance_phase3_live.py ⏳ Pending (3.6)
frontend/src/
components/YouTubeInput.tsx ⏳ Pending (3.4)
components/YouTubeVideoPlayer.tsx ⏳ Pending (3.4)
hooks/useYouTubeASR.ts ⏳ Pending (3.5)
test/test_phase3_YouTubeInput.test.tsx ⏳ Pending (3.4)
test/test_phase3_YouTubeVideoPlayer.test.tsx ⏳ Pending (3.4)
test/test_phase3_useYouTubeASR.test.ts ⏳ Pending (3.5)
test/test_phase3_LTTPage_integration.test.tsx ⏳ Pending (3.5)
Modified Files
backend/app/core/config.py ✅ Done (3 fields)
backend/.env.example ✅ Done (3 vars)
backend/main.py ✅ Done (router registered)
backend/requirements.txt ✅ Done (yt-dlp added)
frontend/package.json ✅ Done (hls.js added)
frontend/src/types/index.ts ⏳ Pending (3.4)
frontend/src/lib/api.ts ⏳ Pending (3.4)
frontend/src/lib/queries.tsx ⏳ Pending (3.4)
frontend/src/pages/LTTPage.tsx ⏳ Pending (3.4-3.5)
frontend/src/components/QueryInput.tsx ⏳ Pending (3.5)
Dockerfile ⏳ Pending (3.7)
docker-compose.yml ⏳ Pending (3.7)
README.md ⏳ Pending (3.7)
development_plan.md ⏳ Pending (3.7)
9. Known Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| PO Token expiration (live streams cut at 30s) | High — live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
| yt-dlp extraction slow (2-5s) | Medium — poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; pip install -U yt-dlp in maintenance. Note: iOS client caused "No video formats" on Phoenix TV live stream; default extractor works for both tested URLs. Monitor for regressions. |
| hls.js audio sync drift vs video | Low — separate streams may drift | hls.js liveSyncDuration keeps both near live edge; test with 10+ min streams |
| Safari createMediaElementSource on HLS | Low — known Safari bug with native HLS | hls.js uses MSE, not native HLS, which works around the Safari bug; Chrome/Firefox unaffected |
| YouTube ToS for proxy | Low for internal demo | Personal/enterprise internal demo is generally fine; review for public product |
10. Example Data Flow
POST /api/v1/youtube/extract
Body: {"url": "https://www.youtube.com/watch?v=5bF3tkO5jAA"}
Response: {
"video_id": "5bF3tkO5jAA",
"title": "《2026年稅務(修訂)(自動交換資料)條例草案》委員會會議",
"is_live": false,
"is_upcoming": false,
"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=https%3A%2F%2Frr2---sn-jna...",
"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=https%3A%2F%2Frr2---sn-jna...",
"thumbnail_url": "https://i.ytimg.com/vi/5bF3tkO5jAA/hqdefault.jpg",
"formats": [...],
"error": null
}
# Live stream (combined formats → same URL for video and audio)
POST /api/v1/youtube/extract
Body: {"url": "https://www.youtube.com/watch?v=fN9uYWCjQaw"}
Response: {
"video_id": "fN9uYWCjQaw",
"is_live": true,
"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...",
"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...",
# video_proxy_url == audio_proxy_url (same combined HLS manifest)
}
GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>
→ Fetches upstream manifest from googlevideo.com
→ Rewrites segment URLs:
segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
→ Streams rewritten manifest to browser
GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
→ Fetches upstream .ts segment via httpx.stream()
→ StreamingResponse with Content-Type: video/mp2t
→ CORS: access-control-allow-origin: *
11. References
- mediaflow-proxy: Production FastAPI HLS proxy with M3U8Processor — mhdzumair/mediaflow-proxy
- yt-dlp API docs: yt-dlp-yt-dlp.mintlify.app
- hls.js API docs: github.com/video-dev/hls.js/blob/master/docs/API.md
- hls.js low-latency live: lowLatencyMode: true, liveSyncDuration: 1.5
- Existing code patterns: .plans/phase2_implementation_plan.md, backend/app/routers/video.py, frontend/src/hooks/useVideoASR.ts
12. Test Results (Current)
| Suite | Tests | Status |
|---|---|---|
| Phase 2 (existing) | 53 | ✅ All pass |
| Phase 3.1 (config) | 11 | ✅ All pass |
| Phase 3.2 (extraction) | 18 | ✅ All pass |
| Total | 82 | 0 failures |
Real-URL Smoke Tests
| URL | Type | Result |
|---|---|---|
| 5bF3tkO5jAA (LegCo meeting) | VOD | 24 formats, separate video+audio ✅ |
| fN9uYWCjQaw (Phoenix TV 24h) | Live | 6 combined HLS formats, same URL ✅ |