Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan
Created: 2026-05-09
Updated: 2026-05-09 (user decisions incorporated)
Status: Planning
Depends on: Phase 1 (Complete), Phase 2 (Complete)
1. Overview
Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. User pastes a YouTube URL → backend extracts separate video-only and audio-only HLS streams via yt-dlp → backend proxies HLS manifests and .ts segments (zero re-encoding) → frontend plays video in <video> via hls.js, routes audio through hidden <audio> element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → transcript flows into QueryInput → Phase 1 RAG pipeline.
Same code works identically for live streams and VODs.
Why Full Proxy (Not iframe)
YouTube's official iframe player does not expose the audio track to Web Audio API due to cross-origin restrictions. We proxy HLS segments through our backend so the browser treats them as same-origin.
Audio Routing
YouTube HLS audio-only stream
→ hls.js loads into hidden <audio> element
→ AudioContext.createMediaElementSource(audioElement)
→ ScriptProcessorNode (Float32 PCM)
→ WebSocket → FastAPI → DashScope realtime ASR
→ transcript → QueryInput
Integration With Existing Pipeline
This phase reuses the existing ASR infrastructure entirely:
- useVideoASR.ts AudioContext graph pattern → adapted for YouTube audio element
- ws_asr.py WebSocket → DashScope proxy → unchanged
- QueryInput.tsx transcript display → unchanged
- LTTPage.tsx layout → minor addition (source toggle)
- RAG pipeline → unchanged
2. User Flow
- User selects "YouTube" source (instead of "Upload")
- User pastes YouTube URL → clicks "Load Stream"
- Backend extracts stream URLs → thumbnail shown as placeholder; video loads behind the scenes
- User presses play → video appears, audio routes to ASR pipeline (no auto-play)
- Real-time ASR transcription begins automatically on play
- Transcript flows into QueryInput → user can edit while streaming continues
- User pauses/stops → transcript stays, user edits and submits → RAG answer
- "Full Transcript" button hidden for YouTube source — real-time streaming ASR only
- If HLS stream fails: auto-retry up to 3 times with re-extracted URL → after 3 failures, show "Live stream unavailable" error
3. Sub-Phases
Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)
Add config fields, install dependencies, create skeletons, register router.
Test: test_phase3_config.py
Tasks:
| # | Task | File |
|---|---|---|
| 3.1.1 | Add config fields: youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl | core/config.py |
| 3.1.2 | Update .env.example | .env.example |
| 3.1.3 | Add deps: yt-dlp>=2024.0.0 to requirements.txt, hls.js@^1.5.0 to package.json | requirements.txt, package.json |
| 3.1.4 | Create models/youtube.py — YouTubeExtractRequest, YouTubeStreamResponse, StreamFormat | models/youtube.py |
| 3.1.5 | Create services/youtube_service.py stub | services/youtube_service.py |
| 3.1.6 | Create services/hls_proxy.py stub | services/hls_proxy.py |
| 3.1.7 | Create routers/youtube.py stub: POST /youtube/extract, GET /youtube/proxy/{stream_type}/{path} | routers/youtube.py |
| 3.1.8 | Register router in main.py | main.py |
| 3.1.9 | Write and pass test_phase3_config.py | app/test/ |
Phase 3.2 — YouTube URL Extraction Backend (0.5 day)
yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.
Test: test_phase3_youtube_extract.py
Acceptance Criteria:
- POST /api/v1/youtube/extract accepts {"url": "https://www.youtube.com/watch?v=..."}
- Returns { video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }
- VODs: extracts ~2–10 formats, returns best video+audio pair
- Live streams: uses ios client for HLS, returns current live edge
- Upcoming/scheduled streams: returns is_upcoming: true with scheduled start time
- Invalid/private URLs: returns clear error
- URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
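The URL-expiration criterion above could be met with something like the following. This is a minimal sketch, not the planned YouTubeService code: the class name ExtractionCache, the constant names, and the injectable clock are hypothetical, and only the two TTL values come from this plan.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs taken from the acceptance criteria (seconds).
LIVE_TTL_SECONDS = 5 * 60
VOD_TTL_SECONDS = 30 * 60


class ExtractionCache:
    """Hypothetical in-memory cache keyed by video ID; entries expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, video_id: str, result: Any, is_live: bool) -> None:
        # Live URLs expire sooner upstream, so they get the shorter TTL.
        ttl = LIVE_TTL_SECONDS if is_live else VOD_TTL_SECONDS
        self._entries[video_id] = (self._clock() + ttl, result)

    def get(self, video_id: str) -> Optional[Any]:
        entry = self._entries.get(video_id)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-runs extraction.
            del self._entries[video_id]
            return None
        return result
```

A miss (None) would then be the signal for the service to run yt-dlp again and call put() with the fresh result.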
Tasks:
| # | Task | File |
|---|---|---|
| 3.2.1 | Write tests first | app/test/test_phase3_youtube_extract.py |
| 3.2.2 | Implement YouTubeService.extract_streams() — yt-dlp wrapper with format selection | services/youtube_service.py |
| 3.2.3 | Implement YouTubeService._select_best_formats() — separate video/audio from format list, prefer ≤480p | services/youtube_service.py |
| 3.2.4 | Implement format URL caching with TTL | services/youtube_service.py |
| 3.2.5 | Implement POST /api/v1/youtube/extract route | routers/youtube.py |
| 3.2.6 | Run tests → pass → commit | — |
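Task 3.2.3 can be illustrated with a plain function over yt-dlp-style format dictionaries. The sketch below assumes each format carries the protocol, vcodec, acodec, height, and abr keys that yt-dlp reports, with the string "none" marking an absent codec. The function name, the fallback to the lowest resolution when nothing fits under the cap, and the choice of highest audio bitrate are assumptions, not decisions recorded in this plan.

```python
from typing import Any, Dict, List, Optional, Tuple

MAX_HEIGHT = 480  # the plan prefers video at or below 480p

Format = Dict[str, Any]


def _has_codec(fmt: Format, key: str) -> bool:
    # Missing, None, and the string "none" are all treated as "codec absent".
    return (fmt.get(key) or "none") != "none"


def _is_hls(fmt: Format) -> bool:
    # HLS formats are assumed to report a protocol such as "m3u8_native".
    return str(fmt.get("protocol") or "").startswith("m3u8")


def select_best_formats(
    formats: List[Format],
) -> Tuple[Optional[Format], Optional[Format]]:
    """Return (video_only, audio_only) HLS formats, or None where unavailable."""
    hls = [f for f in formats if _is_hls(f)]
    video_only = [
        f for f in hls if _has_codec(f, "vcodec") and not _has_codec(f, "acodec")
    ]
    audio_only = [
        f for f in hls if _has_codec(f, "acodec") and not _has_codec(f, "vcodec")
    ]

    def height(f: Format) -> int:
        return f.get("height") or 0

    within_cap = [f for f in video_only if height(f) <= MAX_HEIGHT]
    if within_cap:
        video = max(within_cap, key=height)  # best quality under the cap
    else:
        # Assumption: if everything exceeds the cap, take the smallest stream.
        video = min(video_only, key=height, default=None)

    audio = max(audio_only, key=lambda f: f.get("abr") or 0, default=None)
    return video, audio
```

Returning None for a missing side lets the caller decide whether to report an error or fall back to a muxed format.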
Phase 3.3 — HLS Proxy Backend (1 day)
Proxy service that rewrites HLS manifests and proxies .ts segments. StreamingResponse for minimal latency.
Reference: mediaflow-proxy M3U8Processor pattern (line-by-line streaming, URL rewriting)
Tests: test_phase3_hls_proxy.py, test_phase3_hls_manifest.py
Acceptance Criteria:
- GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded> — fetches upstream manifest, rewrites all segment/sub-manifest URLs to point back to our proxy, streams response
- GET /api/v1/youtube/proxy/segment.ts?url=<encoded> — fetches upstream .ts segment, proxies with correct Content-Type (video/mp2t) and CORS headers
- Lines rewritten: segment URIs, sub-manifest URIs, #EXT-X-KEY:URI=, absolute URLs
- Lines passed through: #EXTINF:, #EXT-X-TARGETDURATION, #EXT-X-MEDIA-SEQUENCE, #EXT-X-STREAM-INF, comments
- Client disconnect → upstream connection closed cleanly
- CORS headers on every response: access-control-allow-origin: *
- Upstream failure → HTTP 502 with error detail; frontend retries up to 3 times with fresh URL before showing "Service unavailable"
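The rewriting rules above can be sketched as a generator that handles one manifest line at a time, so the manifest is never buffered whole. This is an illustration under stated assumptions rather than the planned HLSProxyService: the function names are hypothetical, relative URIs are resolved against the manifest's own URL, and the choice between the manifest and segment endpoints is a simple ".m3u8 in the path" heuristic that may need refining for real upstream URLs.

```python
import re
from typing import Iterable, Iterator
from urllib.parse import quote, urljoin

PROXY_BASE = "/api/v1/youtube/proxy"
_URI_ATTR = re.compile(r'URI="([^"]+)"')


def _proxy_url(upstream_url: str) -> str:
    # Heuristic (assumption): a path containing ".m3u8" is a manifest,
    # anything else is served through the segment endpoint.
    path = upstream_url.split("?", 1)[0]
    name = "manifest.m3u8" if ".m3u8" in path else "segment.ts"
    return f"{PROXY_BASE}/{name}?url={quote(upstream_url, safe='')}"


def rewrite_manifest_lines(lines: Iterable[str], manifest_url: str) -> Iterator[str]:
    """Yield manifest lines with every URI pointed back at the proxy."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            yield line
        elif line.startswith("#"):
            # Tags and comments pass through unchanged, except for URI="..."
            # attributes such as the one on #EXT-X-KEY.
            yield _URI_ATTR.sub(
                lambda m: 'URI="' + _proxy_url(urljoin(manifest_url, m.group(1))) + '"',
                line,
            )
        else:
            # Any non-tag, non-empty line is a segment or sub-manifest URI.
            yield _proxy_url(urljoin(manifest_url, line))
```

Because the function is a generator, its output could be fed to a streaming response as each upstream line arrives.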
Tasks:
| # | Task | File |
|---|---|---|
| 3.3.1 | Write tests first | app/test/test_phase3_hls_proxy.py, app/test/test_phase3_hls_manifest.py |
| 3.3.2 | Implement HLSProxyService.rewrite_manifest() — streaming line-by-line, URL detection + rewriting | services/hls_proxy.py |
| 3.3.3 | Implement HLSProxyService.proxy_segment() — httpx stream → StreamingResponse | services/hls_proxy.py |
| 3.3.4 | Implement GET /api/v1/youtube/proxy/{type}/{path} route — dispatch manifest vs segment | routers/youtube.py |
| 3.3.5 | Run tests → pass → commit | — |
Phase 3.4 — Frontend: YouTube Input + Video Player (1 day)
URL input component and hls.js-based video player. Two media elements: a visible <video> (video-only, muted) and a hidden <audio> (audio-only, for Web Audio API routing).
Tests: test_phase3_YouTubeInput.test.tsx, test_phase3_YouTubeVideoPlayer.test.tsx
Acceptance Criteria:
- YouTubeInput accepts URL, validates format, shows loading/error states
- YouTubeVideoPlayer uses forwardRef<HTMLVideoElement> (same pattern as VideoPlayer)
- Video HLS loaded via hls.js into <video muted> element at 360p–480p (auto-best ≤ 480p)
- Audio HLS loaded via hls.js into hidden <audio> element
- Audio element exposes ref for parent to connect to AudioContext
- Thumbnail displayed as placeholder until user presses play; video element replaces it on play
- Video does NOT auto-play on load (waits for manual user play)
- Loading spinner, error overlay, "LIVE" badge for live streams
- HLS error recovery: on hls.js fatal error → re-extract stream URL → retry up to 3× → show "Service unavailable" on exhaustion
- crossOrigin="anonymous" on both elements (required for AudioContext graph)
- No quality selector (low resolution only, sufficient for reference video)
Tasks:
| # | Task | File |
|---|---|---|
| 3.4.1 | Write tests first | src/test/test_phase3_YouTubeInput.test.tsx, src/test/test_phase3_YouTubeVideoPlayer.test.tsx |
| 3.4.2 | Add YouTube types to types/index.ts | types/index.ts |
| 3.4.3 | Add API functions to lib/api.ts | lib/api.ts |
| 3.4.4 | Add TanStack Query hooks to lib/queries.tsx | lib/queries.tsx |
| 3.4.5 | Create components/YouTubeInput.tsx — URL input, validation, loading/error states | components/YouTubeInput.tsx |
| 3.4.6 | Create components/YouTubeVideoPlayer.tsx — hls.js dual-element player, forwardRef | components/YouTubeVideoPlayer.tsx |
| 3.4.7 | Run tests → pass → commit | — |
Phase 3.5 — Integration: YouTube → ASR Pipeline (1 day)
Wire YouTube audio output into existing ASR pipeline. The key challenge: useVideoASR currently captures from <video> element; we need it to capture from the <audio> element loaded by hls.js.
Tests: test_phase3_useYouTubeASR.test.ts, test_phase3_LTTPage_integration.test.tsx
Acceptance Criteria:
- useYouTubeASR hook: accepts audioElement ref, sets up AudioContext graph on mount
- AudioContext.createMediaElementSource(audioElement) → ScriptProcessorNode → WebSocket
- Auto-starts ASR on play, stops on pause/end (same lifecycle as useVideoASR)
- Transcript flows into QueryInput (same onFinalTranscript callback)
- QueryInput remains editable during streaming — user can type corrections while ASR appends
- "Full Transcript" button hidden when YouTube source is active
- Switching between "Upload" and "YouTube" sources clears previous state
Tasks:
| # | Task | File |
|---|---|---|
| 3.5.1 | Write tests first | src/test/test_phase3_useYouTubeASR.test.ts |
| 3.5.2 | Create hooks/useYouTubeASR.ts — adapted from useVideoASR.ts, targets <audio> element | hooks/useYouTubeASR.ts |
| 3.5.3 | Update QueryInput.tsx — accept transcript from either source | components/QueryInput.tsx |
| 3.5.4 | Update LTTPage.tsx — add source toggle (Upload / YouTube), wire YouTubeInput + YouTubeVideoPlayer | pages/LTTPage.tsx |
| 3.5.5 | Create test_phase3_LTTPage_integration.test.tsx | src/test/ |
| 3.5.6 | Run tests → pass → commit | — |
Phase 3.6 — Integration & Acceptance Testing (1 day)
Tests: test_integration_phase3.py, test_acceptance_phase3_youtube.py, test_acceptance_phase3_live.py
Tasks:
| # | Task |
|---|---|
| 3.6.1 | Implement integration test (mocked yt-dlp, real httpx proxy + hls.js) |
| 3.6.2 | Implement acceptance: real YouTube VOD → extract → proxy → play |
| 3.6.3 | Implement acceptance: real YouTube live stream → extract → proxy → play + ASR |
| 3.6.4 | Full regression run (Phase 1 + 2 + 3 tests) |
| 3.6.5 | Fix failures, final commit |
Phase 3.7 — Polish & Deployment (0.5 day)
| # | Task |
|---|---|
| 3.7.1 | Handle PO token expiration for live streams (log warning, auto-re-extract on failure) |
| 3.7.2 | Update Dockerfile — ensure ffmpeg + yt-dlp available in container |
| 3.7.3 | Update docker-compose.yml — add any new volumes/env vars |
| 3.7.4 | Verify production build (npm run build, docker compose up -d --build) |
| 3.7.5 | Update README.md — YouTube feature section |
| 3.7.6 | Update development_plan.md — mark Phase 3 status |
| 3.7.7 | Final commit |
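Task 3.7.1 (log a warning, re-extract on failure) follows the same retry policy described in the user flow. The sketch below is a hypothetical, language-neutral illustration of that policy written in Python. It assumes one initial attempt plus up to three retries, each with a freshly extracted URL. The names open_with_reextract and StreamUnavailable, and the two callables passed in, are placeholders, not existing project code.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("youtube_proxy")  # hypothetical logger name

MAX_RETRIES = 3  # from the plan: retry up to 3 times, then give up


class StreamUnavailable(Exception):
    """Raised once every retry has failed."""


def open_with_reextract(
    extract: Callable[[], str],
    open_stream: Callable[[str], T],
    max_retries: int = MAX_RETRIES,
) -> T:
    """Open a stream, re-extracting a fresh URL after each failure."""
    url = extract()
    for attempt in range(max_retries + 1):
        try:
            return open_stream(url)
        except Exception as exc:  # broad on purpose; narrow this in real code
            if attempt == max_retries:
                raise StreamUnavailable("Live stream unavailable") from exc
            log.warning(
                "Stream failed (attempt %d of %d), re-extracting URL: %s",
                attempt + 1,
                max_retries + 1,
                exc,
            )
            url = extract()  # the old URL may have expired; fetch a new one
    raise AssertionError("unreachable")
```

Whether the first attempt counts toward the three is not stated in the plan, so the attempt count here is an assumption to confirm.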
4. Timeline
| Sub-Phase | Description | Effort | Depends On |
|---|---|---|---|
| 3.1 | Config & Infrastructure | 0.5 day | — |
| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 |
| 3.3 | HLS Proxy Backend | 1 day | 3.1 |
| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 |
| 3.5 | YouTube → ASR Integration | 1 day | 3.4 |
| 3.6 | Integration & Acceptance | 1 day | 3.5 |
| 3.7 | Polish & Deployment | 0.5 day | 3.6 |
| Total | | 5.5 days | |
3.2 (extraction) and 3.3 (proxy) can run concurrently.
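The effect of running 3.2 and 3.3 concurrently can be checked with a small calculation over the dependency table. This sketch copies the effort and dependency values from the timeline and assumes two people are available while the sub-phases overlap. The function name earliest_finish is hypothetical.

```python
from functools import lru_cache

# Effort (days) and dependencies, copied from the timeline table.
EFFORT_DAYS = {
    "3.1": 0.5, "3.2": 0.5, "3.3": 1.0, "3.4": 1.0,
    "3.5": 1.0, "3.6": 1.0, "3.7": 0.5,
}
DEPENDS_ON = {
    "3.1": [], "3.2": ["3.1"], "3.3": ["3.1"], "3.4": ["3.2", "3.3"],
    "3.5": ["3.4"], "3.6": ["3.5"], "3.7": ["3.6"],
}


@lru_cache(maxsize=None)
def earliest_finish(phase: str) -> float:
    """Earliest day a sub-phase can finish once all its dependencies are done."""
    start = max((earliest_finish(dep) for dep in DEPENDS_ON[phase]), default=0.0)
    return start + EFFORT_DAYS[phase]


total_effort = sum(EFFORT_DAYS.values())                    # 5.5 person-days
elapsed = max(earliest_finish(p) for p in EFFORT_DAYS)      # 5.0 calendar days
print(total_effort, elapsed)
```

Total effort stays at 5.5 person-days, while the elapsed time along the critical path (3.1 → 3.3 → 3.4 → 3.5 → 3.6 → 3.7) comes to 5.0 days.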
5. Dependencies
Backend: yt-dlp>=2024.0.0 (new), httpx>=0.26.0 (already present), aiofiles>=24.0.0 (already present)
Frontend: hls.js@^1.5.0 (new — NOT present, must install)
System: ffmpeg on server (already required by Phase 2)
6. Config Fields
# YouTube live stream proxy (Phase 3)
youtube_proxy_enabled: bool = True
yt_dlp_timeout: int = 30 # seconds for yt-dlp extraction
yt_dlp_cache_ttl: int = 300 # seconds to cache extraction results
# .env.example additions
YOUTUBE_PROXY_ENABLED=true
YT_DLP_TIMEOUT=30
YT_DLP_CACHE_TTL=300
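To show how the environment variable names map onto the three fields, here is a standard-library stand-in. How core/config.py actually loads settings is not shown in this plan, so the dataclass, the loader function, and the set of accepted truthy strings are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}  # assumed accepted spellings


@dataclass(frozen=True)
class YouTubeSettings:
    youtube_proxy_enabled: bool = True
    yt_dlp_timeout: int = 30      # seconds for yt-dlp extraction
    yt_dlp_cache_ttl: int = 300   # seconds to cache extraction results


def load_youtube_settings(env: Mapping[str, str] = os.environ) -> YouTubeSettings:
    """Read the three Phase 3 settings, falling back to the documented defaults."""
    defaults = YouTubeSettings()
    enabled_raw = env.get("YOUTUBE_PROXY_ENABLED")
    enabled = (
        defaults.youtube_proxy_enabled
        if enabled_raw is None
        else enabled_raw.strip().lower() in _TRUTHY
    )
    return YouTubeSettings(
        youtube_proxy_enabled=enabled,
        yt_dlp_timeout=int(env.get("YT_DLP_TIMEOUT", str(defaults.yt_dlp_timeout))),
        yt_dlp_cache_ttl=int(env.get("YT_DLP_CACHE_TTL", str(defaults.yt_dlp_cache_ttl))),
    )
```

Passing a plain dict instead of os.environ keeps the loader easy to exercise in test_phase3_config.py-style tests.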
7. Key Design Decisions
| Decision | Choice | Why |
|---|---|---|
| Streaming protocol | HLS (m3u8) | hls.js plays it natively; DASH requires dash.js |
| yt-dlp client | ios for live, web for VOD | ios returns HLS for live streams with 60fps support; format selector prefers ≤480p |
| HTTP client for proxy | httpx (already present) | Streaming support via httpx.stream(); no new dependency |
| Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer whole file |
| Audio element | Hidden <audio> + hls.js | createMediaElementSource works on <audio> elements |
| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min (live) or 30 min (VOD) |
| Full Transcript for YouTube | Disabled | Button hidden; real-time streaming ASR only |
| QueryInput during streaming | Editable | User can type corrections while transcript streams (same as existing ASR) |
| Video quality | 360p–480p auto-best | Low resolution sufficient for reference; no quality selector |
| Auto-play on load | Wait for manual play | Thumbnail placeholder; user presses play. Respects autoplay policy. |
| Thumbnail | Stays until user presses play | Clean transition; no black frame |
| Error recovery | Retry 3× → "Service unavailable" | Auto-re-extract URL on HLS failure; after 3 failures, show error state |
| PO Tokens (live streams) | Graceful degradation for MVP | Stream first ~30s; on failure retry 3× with fresh URL; after exhaustion show "Live stream unavailable" |
8. File Manifest
New Files
backend/
app/models/youtube.py
app/services/youtube_service.py
app/services/hls_proxy.py
app/routers/youtube.py
app/test/test_phase3_config.py
app/test/test_phase3_youtube_extract.py
app/test/test_phase3_hls_proxy.py
app/test/test_phase3_hls_manifest.py
app/test/test_integration_phase3.py
app/test/acceptance/test_acceptance_phase3_youtube.py
app/test/acceptance/test_acceptance_phase3_live.py
frontend/src/
components/YouTubeInput.tsx
components/YouTubeVideoPlayer.tsx
hooks/useYouTubeASR.ts
test/test_phase3_YouTubeInput.test.tsx
test/test_phase3_YouTubeVideoPlayer.test.tsx
test/test_phase3_useYouTubeASR.test.ts
test/test_phase3_LTTPage_integration.test.tsx
Modified Files
backend/app/core/config.py # Add 3 config fields
backend/.env.example # Add 3 env vars
backend/main.py # Register youtube router
backend/requirements.txt # Add yt-dlp
frontend/package.json # Add hls.js
frontend/src/types/index.ts # Add YouTube types
frontend/src/lib/api.ts # Add extractYouTube(), getYouTubeProxyUrl()
frontend/src/lib/queries.tsx # Add useYouTubeExtract() mutation
frontend/src/pages/LTTPage.tsx # Add source toggle + YouTube components
frontend/src/components/QueryInput.tsx # Accept transcript from either source
Dockerfile # Add yt-dlp install step
docker-compose.yml # Add env vars if needed
README.md # YouTube feature section
development_plan.md # Mark Phase 3 status
9. Known Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| PO Token expiration (live streams cut at 30s) | High — live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
| yt-dlp extraction slow (2-5s) | Medium — poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; pip install -U yt-dlp in maintenance |
| hls.js audio sync drift vs video | Low — separate streams may drift | hls.js liveSyncDuration keeps both near live edge; test with 10+ min streams |
| Safari createMediaElementSource on HLS | Low — known Safari bug with native HLS | hls.js uses MSE, not native HLS — works around Safari bug; Chrome/Firefox unaffected |
| YouTube ToS for proxy | Low for internal demo | Personal/enterprise internal demo is generally fine; review for public product |
10. Example Data Flow
POST /api/v1/youtube/extract
Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
Response: {
"video_id": "dQw4w9WgXcQ",
"title": "Rick Astley - Never Gonna Give You Up",
"is_live": false,
"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
"thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
}
GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
→ Fetches upstream manifest from googlevideo.com
→ Rewrites segment URLs:
segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
→ Streams rewritten manifest to browser
GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
→ Fetches upstream .ts segment via httpx.stream()
→ StreamingResponse with Content-Type: video/mp2t
→ CORS: access-control-allow-origin: *
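The url= parameter in the flow above has to survive a round trip: the extract endpoint encodes the upstream URL, and the proxy route decodes it again. The sketch below shows that round trip with the standard library. The two function names are hypothetical, and the sample upstream URL in the usage note is made up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PROXY_MANIFEST_PATH = "/api/v1/youtube/proxy/manifest.m3u8"


def build_manifest_proxy_url(upstream_url: str, stream_type: str) -> str:
    """Wrap an upstream manifest URL in a same-origin proxy URL."""
    # urlencode percent-escapes '&', '?', '=' and '%' inside the upstream URL,
    # so its own query string cannot leak into the proxy's query string.
    query = urlencode({"url": upstream_url, "type": stream_type})
    return f"{PROXY_MANIFEST_PATH}?{query}"


def decode_upstream_url(proxy_url: str) -> str:
    """Recover the upstream URL on the proxy side."""
    return parse_qs(urlsplit(proxy_url).query)["url"][0]
```

For example, build_manifest_proxy_url("https://upstream.example/index.m3u8?expire=1&sig=a%2Fb", "video") yields a URL ending in &type=video, and decode_upstream_url returns the original string unchanged.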
11. References
- mediaflow-proxy: Production FastAPI HLS proxy with M3U8Processor — mhdzumair/mediaflow-proxy
- yt-dlp API docs: yt-dlp-yt-dlp.mintlify.app
- hls.js API docs: github.com/video-dev/hls.js/blob/master/docs/API.md
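- hls.js low-latency live: lowLatencyMode: true, liveSyncDuration: 1.5
- Existing code patterns: .plans/phase2_implementation_plan.md, backend/app/routers/video.py, frontend/src/hooks/useVideoASR.ts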
.plans/phase2_implementation_plan.md,backend/app/routers/video.py,frontend/src/hooks/useVideoASR.ts