Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan

Created: 2026-05-09
Updated: 2026-05-09 (user decisions incorporated)
Status: Planning
Depends on: Phase 1 (Complete), Phase 2 (Complete)

1. Overview

Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. The user pastes a YouTube URL → the backend extracts separate video-only and audio-only HLS streams via yt-dlp → the backend proxies HLS manifests and .ts segments (zero re-encoding) → the frontend plays video in <video> via hls.js and routes audio through a hidden <audio> element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → transcript flows into QueryInput → Phase 1 RAG pipeline.

The same proxy and playback code path serves both live streams and VODs; only the yt-dlp client used for extraction differs (see Key Design Decisions).

Why Full Proxy (Not iframe)

YouTube's official iframe player does not expose the audio track to Web Audio API due to cross-origin restrictions. We proxy HLS segments through our backend so the browser treats them as same-origin.

Audio Routing

YouTube HLS audio-only stream
  → hls.js loads into hidden <audio> element
  → AudioContext.createMediaElementSource(audioElement)
  → ScriptProcessorNode (Float32 PCM)
  → WebSocket → FastAPI → DashScope realtime ASR
  → transcript → QueryInput
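The diagram labels the ScriptProcessorNode output as Float32 PCM. If the ASR side expects 16-bit PCM (an assumption here; the existing useVideoASR.ts or ws_asr.py may already handle this), the conversion step could be sketched as below. `float32_to_pcm16` is a hypothetical helper name, not part of the existing code, and the little-endian frame layout is also assumed.

```python
import struct


def float32_to_pcm16(frame: bytes) -> bytes:
    """Convert little-endian Float32 samples in [-1.0, 1.0] to 16-bit PCM.

    Hypothetical helper: assumes each WebSocket frame is a packed run of
    Float32 samples. Trailing bytes that do not form a full sample are dropped.
    """
    count = len(frame) // 4
    samples = struct.unpack(f"<{count}f", frame[: count * 4])
    # Clip first so out-of-range samples cannot overflow the int16 range.
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return struct.pack(f"<{count}h", *(int(s * 32767) for s in clipped))
```

Clipping before scaling keeps loud samples from wrapping around, which would otherwise show up as clicks in the audio sent to ASR.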

Integration With Existing Pipeline

This phase reuses the existing ASR infrastructure entirely:

useVideoASR.ts AudioContext graph pattern → adapted for YouTube audio element
ws_asr.py WebSocket → DashScope proxy → unchanged
QueryInput.tsx transcript display → unchanged
LTTPage.tsx layout → minor addition (source toggle)
RAG pipeline → unchanged

2. User Flow

User selects "YouTube" source (instead of "Upload")
User pastes YouTube URL → clicks "Load Stream"
Backend extracts stream URLs → thumbnail shown as placeholder; video loads behind the scenes
User presses play → video appears, audio routes to ASR pipeline (no auto-play)
Real-time ASR transcription begins automatically on play
Transcript flows into QueryInput → user can edit while streaming continues
User pauses/stops → transcript stays, user edits and submits → RAG answer
"Full Transcript" button hidden for YouTube source — real-time streaming ASR only
If HLS stream fails: auto-retry up to 3 times with re-extracted URL → after 3 failures, show "Live stream unavailable" error

3. Sub-Phases

Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)

Add config fields, install dependencies, create skeletons, register router.

Test: test_phase3_config.py

Tasks:

#	Task	File
3.1.1	Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl`	`core/config.py`
3.1.2	Update `.env.example`	`.env.example`
3.1.3	Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json`	`requirements.txt`, `package.json`
3.1.4	Create `models/youtube.py` — `YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat`	`models/youtube.py`
3.1.5	Create `services/youtube_service.py` stub	`services/youtube_service.py`
3.1.6	Create `services/hls_proxy.py` stub	`services/hls_proxy.py`
3.1.7	Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/{stream_type}/{path}`	`routers/youtube.py`
3.1.8	Register router in `main.py`	`main.py`
3.1.9	Write and pass `test_phase3_config.py`	`app/test/`
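The three models named in task 3.1.4 might look roughly like the sketch below. It uses stdlib dataclasses so that it runs on its own; in the FastAPI app these would more likely be Pydantic models. The response fields follow the shape listed under Phase 3.2, while the `StreamFormat` fields are assumptions, since the plan only names that model.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class YouTubeExtractRequest:
    url: str


@dataclass
class StreamFormat:
    # Assumed fields; the plan does not specify what StreamFormat holds.
    format_id: str
    kind: str  # "video" or "audio"
    url: str
    height: Optional[int] = None


@dataclass
class YouTubeStreamResponse:
    video_id: str
    title: str
    is_live: bool
    video_proxy_url: str
    audio_proxy_url: str
    thumbnail_url: Optional[str] = None
    # Set for upcoming/scheduled streams (see Phase 3.2 acceptance criteria).
    is_upcoming: bool = False


response = YouTubeStreamResponse(
    video_id="dQw4w9WgXcQ",
    title="Rick Astley - Never Gonna Give You Up",
    is_live=False,
    video_proxy_url="/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
    audio_proxy_url="/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
)
payload = asdict(response)  # plain dict, ready for JSON serialization
```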

Phase 3.2 — YouTube URL Extraction Backend (0.5 day)

yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.

Test: test_phase3_youtube_extract.py

Acceptance Criteria:

POST /api/v1/youtube/extract accepts {"url": "https://www.youtube.com/watch?v=..."}
Returns { video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }
VODs: extracts ~2–10 formats, returns best video+audio pair
Live streams: uses ios client for HLS, returns current live edge
Upcoming/scheduled streams: returns is_upcoming: true with scheduled start time
Invalid/private URLs: returns clear error
URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
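A minimal sketch of the format selection behind "returns best video+audio pair", assuming the input is the `formats` list from a yt-dlp info dict, where `vcodec`/`acodec` are `"none"` for a missing track and HLS entries carry a `protocol` of `m3u8` or `m3u8_native`. The function name and the fallback rules (smallest video above the cap, highest audio bitrate) are illustrative choices, not decisions recorded in the plan.

```python
from typing import Any, Optional

HLS_PROTOCOLS = {"m3u8", "m3u8_native"}


def select_best_formats(
    formats: list[dict[str, Any]], max_height: int = 480
) -> tuple[Optional[dict[str, Any]], Optional[dict[str, Any]]]:
    """Return (video_only, audio_only) HLS formats; None where nothing matches."""
    hls = [f for f in formats if f.get("protocol") in HLS_PROTOCOLS]
    video_only = [
        f for f in hls
        if f.get("vcodec", "none") != "none" and f.get("acodec", "none") == "none"
    ]
    audio_only = [
        f for f in hls
        if f.get("acodec", "none") != "none" and f.get("vcodec", "none") == "none"
    ]

    # Prefer the tallest video at or below the cap; otherwise take the smallest.
    capped = [f for f in video_only if (f.get("height") or 0) <= max_height]
    if capped:
        video = max(capped, key=lambda f: f.get("height") or 0)
    elif video_only:
        video = min(video_only, key=lambda f: f.get("height") or 0)
    else:
        video = None

    # Assumed tie-breaker for audio: highest average bitrate.
    audio = max(
        audio_only, key=lambda f: f.get("abr") or f.get("tbr") or 0, default=None
    )
    return video, audio
```

Returning `None` for a missing track lets the caller decide whether to fall back to a muxed format or report an error.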

Tasks:

#	Task	File
3.2.1	Write tests first	`app/test/test_phase3_youtube_extract.py`
3.2.2	Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection	`services/youtube_service.py`
3.2.3	Implement `YouTubeService._select_best_formats()` — separate video/audio from format list, prefer ≤480p	`services/youtube_service.py`
3.2.4	Implement format URL caching with TTL	`services/youtube_service.py`
3.2.5	Implement `POST /api/v1/youtube/extract` route	`routers/youtube.py`
3.2.6	Run tests → pass → commit	—
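Task 3.2.4 could be met with a small in-memory cache along these lines. This is a sketch only: the class name, the per-entry `ttl` argument, and the injectable clock are assumptions made so that live and VOD entries can expire at different times and so the behavior can be tested without sleeping.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """In-memory dict whose entries expire after a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: float) -> None:
        # Store the absolute expiry time next to the value.
        self._entries[key] = (self._clock() + ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller re-extracts.
            del self._entries[key]
            return None
        return value
```

A caller would key entries by video ID and pass the shorter TTL for live streams, whose signed URLs go stale sooner.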

Phase 3.3 — HLS Proxy Backend (1 day)

Proxy service that rewrites HLS manifests and proxies .ts segments. StreamingResponse for minimal latency.

Reference: mediaflow-proxy M3U8Processor pattern (line-by-line streaming, URL rewriting)

Tests: test_phase3_hls_proxy.py, test_phase3_hls_manifest.py

Acceptance Criteria:

GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded> — fetches upstream manifest, rewrites all segment/sub-manifest URLs to point back to our proxy, streams response
GET /api/v1/youtube/proxy/segment.ts?url=<encoded> — fetches upstream .ts segment, proxies with correct Content-Type (video/mp2t) and CORS headers
Lines rewritten: segment URIs, sub-manifest URIs, #EXT-X-KEY:URI=, absolute URLs
Lines passed through: #EXTINF:, #EXT-X-TARGETDURATION, #EXT-X-MEDIA-SEQUENCE, #EXT-X-STREAM-INF, comments
Client disconnect → upstream connection closed cleanly
CORS headers on every response: access-control-allow-origin: *
Upstream failure → HTTP 502 with error detail; frontend retries up to 3 times with fresh URL before showing "Service unavailable"
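The rewrite rules above can be sketched as a line-by-line generator. This is an illustration under assumptions, not the mediaflow-proxy implementation: the function names are hypothetical, sub-manifests are detected by a `.m3u8` path suffix, and any other URI (including `#EXT-X-KEY` keys) is routed to the segment endpoint.

```python
import re
from typing import Iterable, Iterator
from urllib.parse import quote, urljoin

PROXY_PREFIX = "/api/v1/youtube/proxy"
URI_ATTR = re.compile(r'URI="([^"]+)"')


def _proxy_url(absolute_url: str) -> str:
    # Sub-manifests return through the manifest route; the rest are segments.
    path = absolute_url.split("?", 1)[0]
    endpoint = "manifest.m3u8" if path.endswith(".m3u8") else "segment.ts"
    return f"{PROXY_PREFIX}/{endpoint}?url={quote(absolute_url, safe='')}"


def rewrite_manifest(lines: Iterable[str], base_url: str) -> Iterator[str]:
    """Yield manifest lines with every URI pointed back at the proxy."""
    for raw in lines:
        line = raw.strip()
        if not line:
            yield line
        elif line.startswith("#"):
            # Tags pass through; only a URI="..." attribute is rewritten.
            yield URI_ATTR.sub(
                lambda m: f'URI="{_proxy_url(urljoin(base_url, m.group(1)))}"',
                line,
            )
        else:
            # Bare line: a segment or sub-manifest URI, possibly relative.
            yield _proxy_url(urljoin(base_url, line))
```

Because it is a generator, the output can be fed to a streaming response without holding the whole manifest in memory. Resolving against `base_url` first means relative and absolute upstream URIs take the same path.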

Tasks:

#	Task	File
3.3.1	Write tests first	`app/test/test_phase3_hls_proxy.py`, `app/test/test_phase3_hls_manifest.py`
3.3.2	Implement `HLSProxyService.rewrite_manifest()` — streaming line-by-line, URL detection + rewriting	`services/hls_proxy.py`
3.3.3	Implement `HLSProxyService.proxy_segment()` — httpx stream → StreamingResponse	`services/hls_proxy.py`
3.3.4	Implement `GET /api/v1/youtube/proxy/{type}/{path}` route — dispatch manifest vs segment	`routers/youtube.py`
3.3.5	Run tests → pass → commit	—

Phase 3.4 — Frontend: YouTube Input + Video Player (1 day)

URL input component and hls.js-based video player. Two media elements: a visible <video> (video-only, muted) and a hidden <audio> (audio-only, for Web Audio API routing).

Tests: test_phase3_YouTubeInput.test.tsx, test_phase3_YouTubeVideoPlayer.test.tsx

Acceptance Criteria:

YouTubeInput accepts URL, validates format, shows loading/error states
YouTubeVideoPlayer uses forwardRef<HTMLVideoElement> (same pattern as VideoPlayer)
Video HLS loaded via hls.js into <video muted> element at 360p–480p (auto-best ≤ 480p)
Audio HLS loaded via hls.js into hidden <audio> element
Audio element exposes ref for parent to connect to AudioContext
Thumbnail displayed as placeholder until user presses play; video element replaces it on play
Video does NOT auto-play on load (waits for manual user play)
Loading spinner, error overlay, "LIVE" badge for live streams
HLS error recovery: on hls.js fatal error → re-extract stream URL → retry up to 3× → show "Service unavailable" on exhaustion
crossOrigin="anonymous" on both elements (required for AudioContext graph)
No quality selector (low resolution only, sufficient for reference video)

Tasks:

#	Task	File
3.4.1	Write tests first	`src/test/test_phase3_YouTubeInput.test.tsx`, `src/test/test_phase3_YouTubeVideoPlayer.test.tsx`
3.4.2	Add YouTube types to `types/index.ts`	`types/index.ts`
3.4.3	Add API functions to `lib/api.ts`	`lib/api.ts`
3.4.4	Add TanStack Query hooks to `lib/queries.tsx`	`lib/queries.tsx`
3.4.5	Create `components/YouTubeInput.tsx` — URL input, validation, loading/error states	`components/YouTubeInput.tsx`
3.4.6	Create `components/YouTubeVideoPlayer.tsx` — hls.js dual-element player, forwardRef	`components/YouTubeVideoPlayer.tsx`
3.4.7	Run tests → pass → commit	—

Phase 3.5 — Integration: YouTube → ASR Pipeline (1 day)

Wire the YouTube audio output into the existing ASR pipeline. The key challenge: useVideoASR currently captures from the <video> element; here the capture source must be the <audio> element loaded by hls.js.

Tests: test_phase3_useYouTubeASR.test.ts, test_phase3_LTTPage_integration.test.tsx

Acceptance Criteria:

useYouTubeASR hook: accepts audioElement ref, sets up AudioContext graph on mount
AudioContext.createMediaElementSource(audioElement) → ScriptProcessorNode → WebSocket
Auto-starts ASR on play, stops on pause/end (same lifecycle as useVideoASR)
Transcript flows into QueryInput (same onFinalTranscript callback)
QueryInput remains editable during streaming — user can type corrections while ASR appends
"Full Transcript" button hidden when YouTube source is active
Switching between "Upload" and "YouTube" sources clears previous state

Tasks:

#	Task	File
3.5.1	Write tests first	`src/test/test_phase3_useYouTubeASR.test.ts`
3.5.2	Create `hooks/useYouTubeASR.ts` — adapted from `useVideoASR.ts`, targets `<audio>` element	`hooks/useYouTubeASR.ts`
3.5.3	Update `QueryInput.tsx` — accept transcript from either source	`components/QueryInput.tsx`
3.5.4	Update `LTTPage.tsx` — add source toggle (Upload / YouTube), wire YouTubeInput + YouTubeVideoPlayer	`pages/LTTPage.tsx`
3.5.5	Create `test_phase3_LTTPage_integration.test.tsx`	`src/test/`
3.5.6	Run tests → pass → commit	—

Phase 3.6 — Integration & Acceptance Testing (1 day)

Tests: test_integration_phase3.py, test_acceptance_phase3_youtube.py, test_acceptance_phase3_live.py

Tasks:

#	Task
3.6.1	Implement integration test (mocked yt-dlp, real httpx proxy + hls.js)
3.6.2	Implement acceptance: real YouTube VOD → extract → proxy → play
3.6.3	Implement acceptance: real YouTube live stream → extract → proxy → play + ASR
3.6.4	Full regression run (Phase 1 + 2 + 3 tests)
3.6.5	Fix failures, final commit

Phase 3.7 — Polish & Deployment (0.5 day)

#	Task
3.7.1	Handle PO token expiration for live streams (log warning, auto-re-extract on failure)
3.7.2	Update Dockerfile — ensure ffmpeg + yt-dlp available in container
3.7.3	Update `docker-compose.yml` — add any new volumes/env vars
3.7.4	Verify production build (`npm run build`, `docker compose up -d --build`)
3.7.5	Update `README.md` — YouTube feature section
3.7.6	Update `development_plan.md` — mark Phase 3 status
3.7.7	Final commit

4. Timeline

Sub-Phase	Description	Effort	Depends On
3.1	Config & Infrastructure	0.5 day	—
3.2	YouTube URL Extraction	0.5 day	3.1
3.3	HLS Proxy Backend	1 day	3.1
3.4	Frontend Input + Player	1 day	3.2, 3.3
3.5	YouTube → ASR Integration	1 day	3.4
3.6	Integration & Acceptance	1 day	3.5
3.7	Polish & Deployment	0.5 day	3.6
Total		5.5 days

3.2 (extraction) and 3.3 (proxy) can run concurrently.
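The 5.5-day total is summed effort. Since 3.2 and 3.3 can overlap, elapsed time can be shorter if two people work in parallel (an assumption; the plan does not state staffing). The arithmetic, using the efforts and dependencies from the table:

```python
from functools import lru_cache

# (effort in days, prerequisites), copied from the timeline table.
SUB_PHASES = {
    "3.1": (0.5, []),
    "3.2": (0.5, ["3.1"]),
    "3.3": (1.0, ["3.1"]),
    "3.4": (1.0, ["3.2", "3.3"]),
    "3.5": (1.0, ["3.4"]),
    "3.6": (1.0, ["3.5"]),
    "3.7": (0.5, ["3.6"]),
}


@lru_cache(maxsize=None)
def earliest_finish(phase: str) -> float:
    """Earliest finish day, assuming every prerequisite starts as soon as it can."""
    effort, deps = SUB_PHASES[phase]
    return effort + max((earliest_finish(d) for d in deps), default=0.0)


total_effort = sum(effort for effort, _ in SUB_PHASES.values())
elapsed = max(earliest_finish(p) for p in SUB_PHASES)
print(f"total effort: {total_effort} days, critical path: {elapsed} days")
```

The critical path runs through 3.3 rather than 3.2, so running the two concurrently saves the 0.5 day that 3.2 would otherwise add.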

5. Dependencies

Backend: yt-dlp>=2024.0.0 (new), httpx>=0.26.0 (already present), aiofiles>=24.0.0 (already present)
Frontend: hls.js@^1.5.0 (new; NOT present, must install)
System: ffmpeg on server (already required by Phase 2)

6. Config Fields

# YouTube live stream proxy (Phase 3)
youtube_proxy_enabled: bool = True
yt_dlp_timeout: int = 30          # seconds for yt-dlp extraction
yt_dlp_cache_ttl: int = 300       # seconds to cache extraction results

# .env.example additions
YOUTUBE_PROXY_ENABLED=true
YT_DLP_TIMEOUT=30
YT_DLP_CACHE_TTL=300
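How the three environment variables map onto the config fields can be sketched with the standard library. The project's `core/config.py` may well use a settings library instead; the class name, loader function, and accepted truthy strings below are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class YouTubeSettings:
    youtube_proxy_enabled: bool = True
    yt_dlp_timeout: int = 30       # seconds for yt-dlp extraction
    yt_dlp_cache_ttl: int = 300    # seconds to cache extraction results


def load_youtube_settings(env: Mapping[str, str] = os.environ) -> YouTubeSettings:
    """Read the Phase 3 variables, falling back to the defaults above."""
    defaults = YouTubeSettings()
    enabled = env.get("YOUTUBE_PROXY_ENABLED")
    return YouTubeSettings(
        youtube_proxy_enabled=(
            defaults.youtube_proxy_enabled
            if enabled is None
            else enabled.strip().lower() in TRUTHY
        ),
        yt_dlp_timeout=int(env.get("YT_DLP_TIMEOUT", defaults.yt_dlp_timeout)),
        yt_dlp_cache_ttl=int(env.get("YT_DLP_CACHE_TTL", defaults.yt_dlp_cache_ttl)),
    )
```

Passing the environment as a mapping keeps the loader testable: a config test can hand in a plain dict instead of patching `os.environ`.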

7. Key Design Decisions

Decision	Choice	Why
Streaming protocol	HLS (m3u8)	hls.js plays it in the browser via Media Source Extensions; DASH would require dash.js
yt-dlp client	`ios` for live, `web` for VOD	`ios` returns HLS for live streams with 60fps support; format selector prefers ≤480p
HTTP client for proxy	httpx (already present)	Streaming support via `httpx.stream()`; no new dependency
Manifest rewriting	Line-by-line streaming	Live manifests can be large; never buffer whole file
Audio element	Hidden `<audio>` + hls.js	`createMediaElementSource` works on `<audio>` elements
URL caching	In-memory dict with TTL	yt-dlp extraction is slow (~2-5s); reuse for 5 min
Full Transcript for YouTube	Disabled	Button hidden; real-time streaming ASR only
QueryInput during streaming	Editable	User can type corrections while transcript streams (same as existing ASR)
Video quality	360p–480p auto-best	Low resolution sufficient for reference; no quality selector
Auto-play on load	Wait for manual play	Thumbnail placeholder; user presses play. Respects autoplay policy.
Thumbnail	Stays until user presses play	Clean transition; no black frame
Error recovery	Retry 3× → "Service unavailable"	Auto-re-extract URL on HLS failure; after 3 failures, show error state
PO Tokens (live streams)	Graceful degradation for MVP	Stream first ~30s; on failure retry 3× with fresh URL; after exhaustion show "Live stream unavailable"

8. File Manifest

New Files

backend/
  app/models/youtube.py
  app/services/youtube_service.py
  app/services/hls_proxy.py
  app/routers/youtube.py
  app/test/test_phase3_config.py
  app/test/test_phase3_youtube_extract.py
  app/test/test_phase3_hls_proxy.py
  app/test/test_phase3_hls_manifest.py
  app/test/test_integration_phase3.py
  app/test/acceptance/test_acceptance_phase3_youtube.py
  app/test/acceptance/test_acceptance_phase3_live.py

frontend/src/
  components/YouTubeInput.tsx
  components/YouTubeVideoPlayer.tsx
  hooks/useYouTubeASR.ts
  test/test_phase3_YouTubeInput.test.tsx
  test/test_phase3_YouTubeVideoPlayer.test.tsx
  test/test_phase3_useYouTubeASR.test.ts
  test/test_phase3_LTTPage_integration.test.tsx

Modified Files

backend/app/core/config.py                     # Add 3 config fields
backend/.env.example                            # Add 3 env vars
backend/main.py                                 # Register youtube router
backend/requirements.txt                        # Add yt-dlp

frontend/package.json                           # Add hls.js
frontend/src/types/index.ts                     # Add YouTube types
frontend/src/lib/api.ts                         # Add extractYouTube(), getYouTubeProxyUrl()
frontend/src/lib/queries.tsx                    # Add useYouTubeExtract() mutation
frontend/src/pages/LTTPage.tsx                  # Add source toggle + YouTube components
frontend/src/components/QueryInput.tsx          # Accept transcript from either source

Dockerfile                                      # Add yt-dlp install step
docker-compose.yml                              # Add env vars if needed
README.md                                       # YouTube feature section
development_plan.md                             # Mark Phase 3 status

9. Known Risks & Mitigations

Risk	Impact	Mitigation
PO Token expiration (live streams cut at 30s)	High — live streams unusable without token	Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify
yt-dlp extraction slow (2-5s)	Medium — poor UX on "Load Stream" click	Cache results with TTL; show progress indicator
YouTube format changes break yt-dlp	Medium — sudden breakage	Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance
hls.js audio sync drift vs video	Low — separate streams may drift	hls.js `liveSyncDuration` keeps both near live edge; test with 10+ min streams
Safari `createMediaElementSource` on HLS	Low — known Safari bug with native HLS	hls.js uses MSE, not native HLS — works around Safari bug; Chrome/Firefox unaffected
YouTube ToS for proxy	Low for internal demo	Personal/enterprise internal demo is generally fine; review for public product

10. Example Data Flow

POST /api/v1/youtube/extract
  Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
  Response: {
    "video_id": "dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up",
    "is_live": false,
    "video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
    "audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
    "thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
  }

GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
  → Fetches upstream manifest from googlevideo.com
  → Rewrites segment URLs:
      segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
  → Streams rewritten manifest to browser

GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
  → Fetches upstream .ts segment via httpx.stream()
  → StreamingResponse with Content-Type: video/mp2t
  → CORS: access-control-allow-origin: *
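The `url=<encoded_upstream_m3u8>` parameter in the flow above has to survive a round trip: the upstream URL carries its own query string, so it must be percent-encoded when the proxy URL is built and decoded exactly once when the request arrives. A sketch with hypothetical helper names (the upstream URL in the usage lines is made up for illustration):

```python
from urllib.parse import parse_qs, urlencode, urlsplit

PROXY_MANIFEST_PATH = "/api/v1/youtube/proxy/manifest.m3u8"


def build_manifest_proxy_url(upstream_url: str, stream_type: str) -> str:
    """Wrap an upstream manifest URL in a proxy URL (hypothetical helper)."""
    query = urlencode({"url": upstream_url, "type": stream_type})
    return f"{PROXY_MANIFEST_PATH}?{query}"


def read_proxy_params(proxy_url: str) -> tuple[str, str]:
    """Recover (upstream_url, stream_type), as the route handler would."""
    query = parse_qs(urlsplit(proxy_url).query)
    return query["url"][0], query["type"][0]


# Illustrative upstream URL with its own query string and an escaped '+'.
upstream = "https://manifest.example.com/hls/index.m3u8?sig=a%2Bb&expire=123"
proxied = build_manifest_proxy_url(upstream, "video")
assert read_proxy_params(proxied) == (upstream, "video")
```

Encoding the whole upstream URL as a single parameter keeps its `&` and `=` characters from being read as parameters of the proxy route.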

11. References

mediaflow-proxy: Production FastAPI HLS proxy with M3U8Processor — mhdzumair/mediaflow-proxy
yt-dlp API docs: yt-dlp-yt-dlp.mintlify.app
hls.js API docs: github.com/video-dev/hls.js/blob/master/docs/API.md
hls.js low-latency live: lowLatencyMode: true, liveSyncDuration: 1.5
Existing code patterns: .plans/phase2_implementation_plan.md, backend/app/routers/video.py, frontend/src/hooks/useVideoASR.ts
