legco_ai_assistant/.plans/phase3_youtube_proxy_plan.md

Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan

Created: 2026-05-09
Updated: 2026-05-09 (user decisions incorporated)
Status: Planning
Depends on: Phase 1 (Complete), Phase 2 (Complete)


1. Overview

Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. The user pastes a YouTube URL; the backend extracts separate video-only and audio-only HLS streams via yt-dlp and proxies the HLS manifests and .ts segments with zero re-encoding. The frontend plays the video in a <video> element via hls.js and routes the audio through a hidden <audio> element into AudioContext.createMediaElementSource(audioElement), which feeds the existing ASR pipeline (WebSocket → DashScope). The transcript then flows into QueryInput and on to the Phase 1 RAG pipeline.

Same code works identically for live streams and VODs.

Why Full Proxy (Not iframe)

YouTube's official iframe player does not expose its audio track to the Web Audio API because of cross-origin restrictions. We proxy the HLS manifests and segments through our backend so the browser treats them as same-origin.

Audio Routing

YouTube HLS audio-only stream
  → hls.js loads into hidden <audio> element
  → AudioContext.createMediaElementSource(audioElement)
  → ScriptProcessorNode (Float32 PCM)
  → WebSocket → FastAPI → DashScope realtime ASR
  → transcript → QueryInput
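
The chain above carries Float32 PCM out of the ScriptProcessorNode. The plan does not say where, or whether, those samples are converted before they reach DashScope. As a minimal sketch, assuming the ASR side expects 16-bit little-endian PCM (an assumption, not something this plan states), the conversion could look like this. The function name `float32_to_pcm16` is hypothetical.

```python
import struct
from typing import Iterable


def float32_to_pcm16(samples: Iterable[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM.

    Assumption: the downstream ASR accepts 16-bit PCM. Out-of-range samples
    are clamped rather than wrapped, which avoids loud clicks on overshoot.
    """
    out = bytearray()
    for sample in samples:
        clamped = max(-1.0, min(1.0, sample))
        out += struct.pack("<h", int(round(clamped * 32767)))
    return bytes(out)
```

Each input sample becomes two bytes, so a 4096-sample buffer yields 8192 bytes per WebSocket frame under this assumption.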

Integration With Existing Pipeline

This phase reuses the existing ASR infrastructure entirely:

  • useVideoASR.ts AudioContext graph pattern → adapted for YouTube audio element
  • ws_asr.py WebSocket → DashScope proxy → unchanged
  • QueryInput.tsx transcript display → unchanged
  • LTTPage.tsx layout → minor addition (source toggle)
  • RAG pipeline → unchanged

2. User Flow

  1. User selects "YouTube" source (instead of "Upload")
  2. User pastes YouTube URL → clicks "Load Stream"
  3. Backend extracts stream URLs → thumbnail shown as placeholder; video loads behind the scenes
  4. User presses play → video appears, audio routes to ASR pipeline (no auto-play)
  5. Real-time ASR transcription begins automatically on play
  6. Transcript flows into QueryInput → user can edit while streaming continues
  7. User pauses/stops → transcript stays, user edits and submits → RAG answer
  8. "Full Transcript" button hidden for YouTube source — real-time streaming ASR only
  9. If HLS stream fails: auto-retry up to 3 times with re-extracted URL → after 3 failures, show "Live stream unavailable" error

3. Sub-Phases

Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)

Add config fields, install dependencies, create skeletons, register router.

Test: test_phase3_config.py

Tasks:

| # | Task | File |
|---|------|------|
| 3.1.1 | Add config fields: youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl | core/config.py |
| 3.1.2 | Update .env.example | .env.example |
| 3.1.3 | Add deps: yt-dlp>=2024.0.0 to requirements.txt, hls.js@^1.5.0 to package.json | requirements.txt, package.json |
| 3.1.4 | Create models/youtube.py: YouTubeExtractRequest, YouTubeStreamResponse, StreamFormat | models/youtube.py |
| 3.1.5 | Create services/youtube_service.py stub | services/youtube_service.py |
| 3.1.6 | Create services/hls_proxy.py stub | services/hls_proxy.py |
| 3.1.7 | Create routers/youtube.py stub: POST /youtube/extract, GET /youtube/proxy/{stream_type}/{path} | routers/youtube.py |
| 3.1.8 | Register router in main.py | main.py |
| 3.1.9 | Write and pass test_phase3_config.py | app/test/ |

Phase 3.2 — YouTube URL Extraction Backend (0.5 day)

yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.
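
"Proxy-wrapped" here means the upstream manifest URL is percent-encoded and carried in a query parameter of our own proxy endpoint. A minimal sketch follows, using the endpoint shape from the example data flow later in this plan. The helper name `wrap_manifest_url` and the sample upstream URL are hypothetical.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Path taken from this plan's example data flow.
PROXY_PREFIX = "/api/v1/youtube/proxy"


def wrap_manifest_url(upstream_url: str, stream_type: str) -> str:
    """Return a proxy URL that points back at our backend (hypothetical helper)."""
    if stream_type not in ("video", "audio"):
        raise ValueError(f"unknown stream type: {stream_type!r}")
    # safe="" also encodes '/', '?', '&' and '=', so the upstream query string
    # cannot leak into the proxy URL's own query string.
    encoded = quote(upstream_url, safe="")
    return f"{PROXY_PREFIX}/manifest.m3u8?url={encoded}&type={stream_type}"


# Hypothetical upstream URL, for illustration only.
upstream = "https://example.googlevideo.com/hls/index.m3u8?expire=123&sig=abc"
wrapped = wrap_manifest_url(upstream, "audio")
# The proxy route can recover the original URL from the query string.
recovered = parse_qs(urlsplit(wrapped).query)["url"][0]
```

Encoding with `safe=""` is the important detail: the upstream URL has its own `&`-separated parameters, and leaving them unescaped would split them into parameters of the proxy request.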

Test: test_phase3_youtube_extract.py

Acceptance Criteria:

  • POST /api/v1/youtube/extract accepts {"url": "https://www.youtube.com/watch?v=..."}
  • Returns { video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }
  • VODs: extracts ~210 formats, returns best video+audio pair
  • Live streams: uses ios client for HLS, returns current live edge
  • Upcoming/scheduled streams: returns is_upcoming: true with scheduled start time
  • Invalid/private URLs: returns clear error
  • URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
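
The caching criterion above can be sketched as an in-memory dict keyed by video ID, with a per-entry expiry time. This is a minimal sketch, not the planned implementation: the class name `TTLCache`, the helper `ttl_for`, and the injectable clock are assumptions for illustration. The TTL values are the ones stated above (5 min live, 30 min VOD).

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop stale entry so the caller re-extracts
            return None
        return value


def ttl_for(is_live: bool) -> int:
    """TTL in seconds: 5 min for live streams, 30 min for VODs."""
    return 300 if is_live else 1800
```

A retry after an HLS failure would need to bypass or evict the cached entry, otherwise the "fresh URL" would be the same expired one.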

Tasks:

| # | Task | File |
|---|------|------|
| 3.2.1 | Write tests first | app/test/test_phase3_youtube_extract.py |
| 3.2.2 | Implement YouTubeService.extract_streams(): yt-dlp wrapper with format selection | services/youtube_service.py |
| 3.2.3 | Implement YouTubeService._select_best_formats(): separate video/audio from format list, prefer ≤480p | services/youtube_service.py |
| 3.2.4 | Implement format URL caching with TTL | services/youtube_service.py |
| 3.2.5 | Implement POST /api/v1/youtube/extract route | routers/youtube.py |
| 3.2.6 | Run tests → pass → commit | |
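
Task 3.2.3 can be sketched as a pure function over the list of format dicts that yt-dlp returns under `info["formats"]`. The sketch relies on yt-dlp's convention that a missing track is reported as the string `"none"` in `vcodec` or `acodec`. The function name and the tie-breaking by bitrate are assumptions. Formats that carry both audio and video are ignored here.

```python
from typing import Any, Dict, List, Optional, Tuple

Format = Dict[str, Any]


def select_best_formats(
    formats: List[Format], max_height: int = 480
) -> Tuple[Optional[Format], Optional[Format]]:
    """Pick the best video-only format at or below max_height and the best audio-only format."""
    video_only = [
        f for f in formats
        if f.get("vcodec") not in (None, "none")
        and f.get("acodec") == "none"
        and isinstance(f.get("height"), int)
        and f["height"] <= max_height
    ]
    audio_only = [
        f for f in formats
        if f.get("vcodec") == "none" and f.get("acodec") not in (None, "none")
    ]
    # Highest resolution wins; total bitrate (tbr) breaks ties.
    best_video = max(video_only, key=lambda f: (f["height"], f.get("tbr") or 0), default=None)
    # Highest audio bitrate wins; fall back to tbr when abr is missing.
    best_audio = max(audio_only, key=lambda f: f.get("abr") or f.get("tbr") or 0, default=None)
    return best_video, best_audio
```

Returning `None` for a missing side lets the caller decide whether to fall back to a muxed format or report an error.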

Phase 3.3 — HLS Proxy Backend (1 day)

Proxy service that rewrites HLS manifests and proxies .ts segments. StreamingResponse for minimal latency.

Reference: mediaflow-proxy M3U8Processor pattern (line-by-line streaming, URL rewriting)

Tests: test_phase3_hls_proxy.py, test_phase3_hls_manifest.py

Acceptance Criteria:

  • GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded> — fetches upstream manifest, rewrites all segment/sub-manifest URLs to point back to our proxy, streams response
  • GET /api/v1/youtube/proxy/segment.ts?url=<encoded> — fetches upstream .ts segment, proxies with correct Content-Type (video/mp2t) and CORS headers
  • Lines rewritten: segment URIs, sub-manifest URIs, #EXT-X-KEY:URI=, absolute URLs
  • Lines passed through: #EXTINF:, #EXT-X-TARGETDURATION, #EXT-X-MEDIA-SEQUENCE, #EXT-X-STREAM-INF, comments
  • Client disconnect → upstream connection closed cleanly
  • CORS headers on every response: access-control-allow-origin: *
  • Upstream failure → HTTP 502 with error detail; frontend retries up to 3 times with fresh URL before showing "Service unavailable"
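
The rewrite rules above can be sketched as a generator that processes one manifest line at a time, so nothing is buffered. This is a minimal sketch under stated assumptions: the function names are hypothetical, and choosing the manifest endpoint by a `.m3u8` path suffix is an assumption (everything else, including key URIs, goes to the segment endpoint here).

```python
import re
from typing import Iterable, Iterator
from urllib.parse import quote, urljoin, urlsplit

PROXY_PREFIX = "/api/v1/youtube/proxy"  # path from this plan's acceptance criteria
_URI_ATTR = re.compile(r'URI="([^"]+)"')


def _proxy_url(absolute_url: str) -> str:
    # Assumption: sub-manifests are recognisable by a .m3u8 path suffix.
    is_manifest = urlsplit(absolute_url).path.endswith(".m3u8")
    endpoint = "manifest.m3u8" if is_manifest else "segment.ts"
    return f"{PROXY_PREFIX}/{endpoint}?url={quote(absolute_url, safe='')}"


def rewrite_manifest_lines(lines: Iterable[str], base_url: str) -> Iterator[str]:
    """Yield manifest lines with every URI pointed back at the proxy."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            yield line
        elif line.startswith("#"):
            # Tags and comments pass through; only URI="..." attributes change.
            yield _URI_ATTR.sub(
                lambda m: f'URI="{_proxy_url(urljoin(base_url, m.group(1)))}"', line
            )
        else:
            # A bare line is a segment or sub-manifest URI, possibly relative.
            yield _proxy_url(urljoin(base_url, line))
```

`urljoin` resolves relative URIs against the manifest's own URL, which is why the proxy route needs the upstream URL of the manifest it just fetched.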

Tasks:

| # | Task | File |
|---|------|------|
| 3.3.1 | Write tests first | app/test/test_phase3_hls_proxy.py, app/test/test_phase3_hls_manifest.py |
| 3.3.2 | Implement HLSProxyService.rewrite_manifest(): streaming line-by-line, URL detection + rewriting | services/hls_proxy.py |
| 3.3.3 | Implement HLSProxyService.proxy_segment(): httpx stream → StreamingResponse | services/hls_proxy.py |
| 3.3.4 | Implement GET /api/v1/youtube/proxy/{type}/{path} route: dispatch manifest vs segment | routers/youtube.py |
| 3.3.5 | Run tests → pass → commit | |

Phase 3.4 — Frontend: YouTube Input + Video Player (1 day)

URL input component and hls.js-based video player. Two media elements: a visible <video> (video-only, muted) and a hidden <audio> (audio-only, for Web Audio API routing).

Tests: test_phase3_YouTubeInput.test.tsx, test_phase3_YouTubeVideoPlayer.test.tsx

Acceptance Criteria:

  • YouTubeInput accepts URL, validates format, shows loading/error states
  • YouTubeVideoPlayer uses forwardRef<HTMLVideoElement> (same pattern as VideoPlayer)
  • Video HLS loaded via hls.js into <video muted> element at ≤480p (auto-best)
  • Audio HLS loaded via hls.js into hidden <audio> element
  • Audio element exposes ref for parent to connect to AudioContext
  • Thumbnail displayed as placeholder until user presses play; video element replaces it on play
  • Video does NOT auto-play on load (waits for manual user play)
  • Loading spinner, error overlay, "LIVE" badge for live streams
  • HLS error recovery: on hls.js fatal error → re-extract stream URL → retry up to 3× → show "Service unavailable" on exhaustion
  • crossOrigin="anonymous" on both elements (required for the AudioContext graph)
  • No quality selector (low resolution only, sufficient for reference video)

Tasks:

| # | Task | File |
|---|------|------|
| 3.4.1 | Write tests first | src/test/test_phase3_YouTubeInput.test.tsx, src/test/test_phase3_YouTubeVideoPlayer.test.tsx |
| 3.4.2 | Add YouTube types to types/index.ts | types/index.ts |
| 3.4.3 | Add API functions to lib/api.ts | lib/api.ts |
| 3.4.4 | Add TanStack Query hooks to lib/queries.tsx | lib/queries.tsx |
| 3.4.5 | Create components/YouTubeInput.tsx: URL input, validation, loading/error states | components/YouTubeInput.tsx |
| 3.4.6 | Create components/YouTubeVideoPlayer.tsx: hls.js dual-element player, forwardRef | components/YouTubeVideoPlayer.tsx |
| 3.4.7 | Run tests → pass → commit | |

Phase 3.5 — Integration: YouTube → ASR Pipeline (1 day)

Wire YouTube audio output into existing ASR pipeline. The key challenge: useVideoASR currently captures from <video> element; we need it to capture from the <audio> element loaded by hls.js.

Tests: test_phase3_useYouTubeASR.test.ts, test_phase3_LTTPage_integration.test.tsx

Acceptance Criteria:

  • useYouTubeASR hook: accepts audioElement ref, sets up AudioContext graph on mount
  • AudioContext.createMediaElementSource(audioElement) → ScriptProcessorNode → WebSocket
  • Auto-starts ASR on play, stops on pause/end (same lifecycle as useVideoASR)
  • Transcript flows into QueryInput (same onFinalTranscript callback)
  • QueryInput remains editable during streaming — user can type corrections while ASR appends
  • "Full Transcript" button hidden when YouTube source is active
  • Switching between "Upload" and "YouTube" sources clears previous state

Tasks:

| # | Task | File |
|---|------|------|
| 3.5.1 | Write tests first | src/test/test_phase3_useYouTubeASR.test.ts |
| 3.5.2 | Create hooks/useYouTubeASR.ts: adapted from useVideoASR.ts, targets <audio> element | hooks/useYouTubeASR.ts |
| 3.5.3 | Update QueryInput.tsx: accept transcript from either source | components/QueryInput.tsx |
| 3.5.4 | Update LTTPage.tsx: add source toggle (Upload / YouTube), wire YouTubeInput + YouTubeVideoPlayer | pages/LTTPage.tsx |
| 3.5.5 | Create test_phase3_LTTPage_integration.test.tsx | src/test/ |
| 3.5.6 | Run tests → pass → commit | |

Phase 3.6 — Integration & Acceptance Testing (1 day)

Tests: test_integration_phase3.py, test_acceptance_phase3_youtube.py, test_acceptance_phase3_live.py

Tasks:

| # | Task |
|---|------|
| 3.6.1 | Implement integration test (mocked yt-dlp, real httpx proxy + hls.js) |
| 3.6.2 | Implement acceptance: real YouTube VOD → extract → proxy → play |
| 3.6.3 | Implement acceptance: real YouTube live stream → extract → proxy → play + ASR |
| 3.6.4 | Full regression run (Phase 1 + 2 + 3 tests) |
| 3.6.5 | Fix failures, final commit |

Phase 3.7 — Polish & Deployment (0.5 day)

| # | Task |
|---|------|
| 3.7.1 | Handle PO token expiration for live streams (log warning, auto-re-extract on failure) |
| 3.7.2 | Update Dockerfile: ensure ffmpeg + yt-dlp available in container |
| 3.7.3 | Update docker-compose.yml: add any new volumes/env vars |
| 3.7.4 | Verify production build (npm run build, docker compose up -d --build) |
| 3.7.5 | Update README.md: YouTube feature section |
| 3.7.6 | Update development_plan.md: mark Phase 3 status |
| 3.7.7 | Final commit |

4. Timeline

| Sub-Phase | Description | Effort | Depends On |
|-----------|-------------|--------|------------|
| 3.1 | Config & Infrastructure | 0.5 day | |
| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 |
| 3.3 | HLS Proxy Backend | 1 day | 3.1 |
| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 |
| 3.5 | YouTube → ASR Integration | 1 day | 3.4 |
| 3.6 | Integration & Acceptance | 1 day | 3.5 |
| 3.7 | Polish & Deployment | 0.5 day | 3.6 |
| Total | | 5.5 days | |

3.2 (extraction) and 3.3 (proxy) can run concurrently.
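
The 5.5-day total is the sum of efforts. Because 3.2 and 3.3 can overlap, the elapsed time along the longest dependency chain is shorter. The small calculation below uses only the efforts and dependencies from the timeline table; the variable and function names are illustrative.

```python
from functools import lru_cache

# Effort in days and dependencies, copied from the timeline table.
EFFORT = {"3.1": 0.5, "3.2": 0.5, "3.3": 1.0, "3.4": 1.0,
          "3.5": 1.0, "3.6": 1.0, "3.7": 0.5}
DEPENDS_ON = {"3.1": [], "3.2": ["3.1"], "3.3": ["3.1"], "3.4": ["3.2", "3.3"],
              "3.5": ["3.4"], "3.6": ["3.5"], "3.7": ["3.6"]}


@lru_cache(maxsize=None)
def earliest_finish(phase: str) -> float:
    """Earliest finish = latest finish among dependencies + own effort."""
    start = max((earliest_finish(dep) for dep in DEPENDS_ON[phase]), default=0.0)
    return start + EFFORT[phase]


total_effort = sum(EFFORT.values())                       # 5.5 days of work
critical_path = max(earliest_finish(p) for p in EFFORT)   # 5.0 days elapsed
```

The half-day difference is 3.2, which finishes inside the window already occupied by the longer 3.3. It is only realised if two people (or two parallel work streams) are available after 3.1.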


5. Dependencies

  • Backend: yt-dlp>=2024.0.0 (new), httpx>=0.26.0 (already present), aiofiles>=24.0.0 (already present)
  • Frontend: hls.js@^1.5.0 (new; not yet present, must be installed)
  • System: ffmpeg on server (already required by Phase 2)


6. Config Fields

# YouTube live stream proxy (Phase 3)
youtube_proxy_enabled: bool = True
yt_dlp_timeout: int = 30          # seconds for yt-dlp extraction
yt_dlp_cache_ttl: int = 300       # seconds to cache extraction results

# .env.example additions
YOUTUBE_PROXY_ENABLED=true
YT_DLP_TIMEOUT=30
YT_DLP_CACHE_TTL=300
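
How the three environment variables map onto the three config fields can be sketched with the standard library alone. The project's actual core/config.py may use a settings library instead; the class name `YouTubeSettings`, the loader function, and the accepted boolean spellings are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class YouTubeSettings:
    youtube_proxy_enabled: bool = True
    yt_dlp_timeout: int = 30       # seconds for yt-dlp extraction
    yt_dlp_cache_ttl: int = 300    # seconds to cache extraction results


def load_youtube_settings(env: Mapping[str, str] = os.environ) -> YouTubeSettings:
    """Build settings from environment variables, falling back to the plan's defaults."""

    def as_bool(raw: str) -> bool:
        # Assumption: these spellings count as true; anything else is false.
        return raw.strip().lower() in ("1", "true", "yes", "on")

    return YouTubeSettings(
        youtube_proxy_enabled=as_bool(env.get("YOUTUBE_PROXY_ENABLED", "true")),
        yt_dlp_timeout=int(env.get("YT_DLP_TIMEOUT", "30")),
        yt_dlp_cache_ttl=int(env.get("YT_DLP_CACHE_TTL", "300")),
    )
```

Passing the environment as a parameter keeps the loader easy to exercise in test_phase3_config.py without mutating `os.environ`.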

7. Key Design Decisions

| Decision | Choice | Why |
|----------|--------|-----|
| Streaming protocol | HLS (m3u8) | hls.js plays it via Media Source Extensions; DASH would require dash.js |
| yt-dlp client | ios for live, web for VOD | ios returns HLS for live streams with 60fps support; format selector prefers ≤480p |
| HTTP client for proxy | httpx (already present) | Streaming support via httpx.stream(); no new dependency |
| Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer the whole file |
| Audio element | Hidden <audio> + hls.js | createMediaElementSource works on <audio> elements |
| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min |
| Full Transcript for YouTube | Disabled | Button hidden; real-time streaming ASR only |
| QueryInput during streaming | Editable | User can type corrections while transcript streams (same as existing ASR) |
| Video quality | ≤480p auto-best | Low resolution sufficient for reference; no quality selector |
| Auto-play on load | Wait for manual play | Thumbnail placeholder; user presses play. Respects autoplay policy. |
| Thumbnail | Stays until user presses play | Clean transition; no black frame |
| Error recovery | Retry 3× → "Service unavailable" | Auto-re-extract URL on HLS failure; after 3 failures, show error state |
| PO Tokens (live streams) | Graceful degradation for MVP | Stream first ~30s; on failure retry 3× with fresh URL; after exhaustion show "Live stream unavailable" |

8. File Manifest

New Files

backend/
  app/models/youtube.py
  app/services/youtube_service.py
  app/services/hls_proxy.py
  app/routers/youtube.py
  app/test/test_phase3_config.py
  app/test/test_phase3_youtube_extract.py
  app/test/test_phase3_hls_proxy.py
  app/test/test_phase3_hls_manifest.py
  app/test/test_integration_phase3.py
  app/test/acceptance/test_acceptance_phase3_youtube.py
  app/test/acceptance/test_acceptance_phase3_live.py

frontend/src/
  components/YouTubeInput.tsx
  components/YouTubeVideoPlayer.tsx
  hooks/useYouTubeASR.ts
  test/test_phase3_YouTubeInput.test.tsx
  test/test_phase3_YouTubeVideoPlayer.test.tsx
  test/test_phase3_useYouTubeASR.test.ts
  test/test_phase3_LTTPage_integration.test.tsx

Modified Files

backend/app/core/config.py                     # Add 3 config fields
backend/.env.example                            # Add 3 env vars
backend/main.py                                 # Register youtube router
backend/requirements.txt                        # Add yt-dlp

frontend/package.json                           # Add hls.js
frontend/src/types/index.ts                     # Add YouTube types
frontend/src/lib/api.ts                         # Add extractYouTube(), getYouTubeProxyUrl()
frontend/src/lib/queries.tsx                    # Add useYouTubeExtract() mutation
frontend/src/pages/LTTPage.tsx                  # Add source toggle + YouTube components
frontend/src/components/QueryInput.tsx          # Accept transcript from either source

Dockerfile                                      # Add yt-dlp install step
docker-compose.yml                              # Add env vars if needed
README.md                                       # YouTube feature section
development_plan.md                             # Mark Phase 3 status

9. Known Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| PO Token expiration (live streams cut at 30s) | High: live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
| yt-dlp extraction slow (2-5s) | Medium: poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
| YouTube format changes break yt-dlp | Medium: sudden breakage | Pin yt-dlp version; CI test with known-good URLs; pip install -U yt-dlp in maintenance |
| hls.js audio sync drift vs video | Low: separate streams may drift | hls.js liveSyncDuration keeps both near live edge; test with 10+ min streams |
| Safari createMediaElementSource on HLS | Low: known Safari bug with native HLS | hls.js uses MSE, not native HLS, which works around the Safari bug; Chrome/Firefox unaffected |
| YouTube ToS for proxy | Low for internal demo | Personal/enterprise internal demo is generally fine; review for public product |

10. Example Data Flow

POST /api/v1/youtube/extract
  Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
  Response: {
    "video_id": "dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up",
    "is_live": false,
    "video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
    "audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
    "thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
  }

GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
  → Fetches upstream manifest from googlevideo.com
  → Rewrites segment URLs:
      segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
  → Streams rewritten manifest to browser

GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
  → Fetches upstream .ts segment via httpx.stream()
  → StreamingResponse with Content-Type: video/mp2t
  → CORS: access-control-allow-origin: *

11. References