legco_ai_assistant/.plans/phase3_youtube_proxy_plan.md

# Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan

**Created:** 2026-05-09
**Updated:** 2026-05-09 (user decisions incorporated)
**Status:** Planning
**Depends on:** Phase 1 (Complete), Phase 2 (Complete)

---

## 1. Overview

Phase 3 adds YouTube playback (live streams and VODs) as an alternative to file upload. The flow: the user pastes a YouTube URL → the backend extracts separate video-only and audio-only HLS streams via yt-dlp → the backend proxies the HLS manifests and `.ts` segments with zero re-encoding → the frontend plays video in a `<video>` element via hls.js and routes audio through a hidden `<audio>` element → `AudioContext.createMediaElementSource(audioElement)` → the existing ASR pipeline (WebSocket → DashScope) → the transcript flows into QueryInput → the Phase 1 RAG pipeline.

**The same proxy and playback code path serves both live streams and VODs; only extraction differs (yt-dlp client and cache TTL).**

### Why Full Proxy (Not iframe)

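YouTube's official iframe player does not expose its audio track to the Web Audio API: the player runs in a cross-origin iframe, so our page cannot reach its media element. Instead, we proxy the HLS manifests and segments through our backend so the browser loads them from our own API with CORS headers, which lets `createMediaElementSource` read the audio samples.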

### Audio Routing

```
YouTube HLS audio-only stream
  → hls.js loads into hidden <audio> element
  → AudioContext.createMediaElementSource(audioElement)
  → ScriptProcessorNode (Float32 PCM)
  → WebSocket → FastAPI → DashScope realtime ASR
  → transcript → QueryInput
```

### Integration With Existing Pipeline

This phase reuses the existing ASR infrastructure entirely:
- `useVideoASR.ts` AudioContext graph pattern → adapted for YouTube audio element
- `ws_asr.py` WebSocket → DashScope proxy → unchanged
- `QueryInput.tsx` transcript display → unchanged
- `LTTPage.tsx` layout → minor addition (source toggle)
- RAG pipeline → unchanged

---

## 2. User Flow

1. User selects "YouTube" source (instead of "Upload")
2. User pastes YouTube URL → clicks "Load Stream"
3. Backend extracts stream URLs → thumbnail shown as placeholder; video loads behind the scenes
4. User presses play → video appears, audio routes to ASR pipeline (no auto-play)
5. Real-time ASR transcription begins automatically on play
6. Transcript flows into QueryInput → user can edit while streaming continues
7. User pauses/stops → transcript stays, user edits and submits → RAG answer
8. **"Full Transcript" button hidden for YouTube source** — real-time streaming ASR only
9. **If HLS stream fails**: auto-retry up to 3 times with re-extracted URL → after 3 failures, show "Live stream unavailable" error

---

## 3. Sub-Phases

### Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)

Add config fields, install dependencies, create skeletons, register router.

**Test:** `test_phase3_config.py`

**Tasks:**
| # | Task | File |
|---|------|------|
| 3.1.1 | Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl` | `core/config.py` |
| 3.1.2 | Update `.env.example` | `.env.example` |
| 3.1.3 | Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json` | `requirements.txt`, `package.json` |
| 3.1.4 | Create `models/youtube.py` — `YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat` | `models/youtube.py` |
| 3.1.5 | Create `services/youtube_service.py` stub | `services/youtube_service.py` |
| 3.1.6 | Create `services/hls_proxy.py` stub | `services/hls_proxy.py` |
| 3.1.7 | Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/manifest.m3u8`, `GET /youtube/proxy/segment.ts` | `routers/youtube.py` |
| 3.1.8 | Register router in `main.py` | `main.py` |
| 3.1.9 | Write and pass `test_phase3_config.py` | `app/test/` |

---

### Phase 3.2 — YouTube URL Extraction Backend (0.5 day)

yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.

**Test:** `test_phase3_youtube_extract.py`

**Acceptance Criteria:**
- `POST /api/v1/youtube/extract` accepts `{"url": "https://www.youtube.com/watch?v=..."}`
- Returns `{ video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }`
- VODs: extracts ~2–10 formats, returns best video+audio pair
- Live streams: uses `ios` client for HLS, returns current live edge
- Upcoming/scheduled streams: returns `is_upcoming: true` with scheduled start time
- Invalid/private URLs: returns clear error
- URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
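
The TTL caching criterion above could be sketched as follows. This is a minimal in-memory version, not the final `YouTubeService` code: the class name `ExtractionCache`, its methods, and the injectable `clock` parameter are hypothetical, and only the two TTL values come from this plan.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs from the acceptance criteria: 5 min for live streams, 30 min for VODs.
LIVE_TTL_SECONDS = 5 * 60
VOD_TTL_SECONDS = 30 * 60


class ExtractionCache:
    """Hypothetical in-memory cache keyed by video ID with a per-entry TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, video_id: str, result: Any, is_live: bool) -> None:
        ttl = LIVE_TTL_SECONDS if is_live else VOD_TTL_SECONDS
        self._entries[video_id] = (self._clock() + ttl, result)

    def get(self, video_id: str) -> Optional[Any]:
        entry = self._entries.get(video_id)
        if entry is None:
            return None
        expires_at, result = entry
        if self._clock() >= expires_at:
            # Stale entry: drop it so the caller re-runs extraction.
            del self._entries[video_id]
            return None
        return result

    def invalidate(self, video_id: str) -> None:
        # Useful when a proxied URL fails before its TTL has elapsed.
        self._entries.pop(video_id, None)
```

A caller would check `get()` before invoking yt-dlp and call `invalidate()` when the frontend asks for a fresh URL after an HLS failure.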

**Tasks:**
| # | Task | File |
|---|------|------|
| 3.2.1 | Write tests first | `app/test/test_phase3_youtube_extract.py` |
| 3.2.2 | Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection | `services/youtube_service.py` |
| 3.2.3 | Implement `YouTubeService._select_best_formats()` — separate video/audio from format list, prefer ≤480p | `services/youtube_service.py` |
| 3.2.4 | Implement format URL caching with TTL | `services/youtube_service.py` |
| 3.2.5 | Implement `POST /api/v1/youtube/extract` route | `routers/youtube.py` |
| 3.2.6 | Run tests → pass → commit | — |
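
As an illustration of task 3.2.3, the format split could look roughly like the sketch below. It assumes each format is a dict with the keys yt-dlp normally reports (`protocol`, `vcodec`, `acodec`, `height`, `tbr`, `abr`); the function name and the fallback rule when nothing is at or below 480p are assumptions, not decisions recorded in this plan.

```python
from typing import Any, Dict, List, Optional, Tuple

Format = Dict[str, Any]
MAX_HEIGHT = 480  # the plan prefers <=480p


def _is_hls(fmt: Format) -> bool:
    # yt-dlp labels HLS formats with an "m3u8"-prefixed protocol.
    return str(fmt.get("protocol", "")).startswith("m3u8")


def select_best_formats(
    formats: List[Format], max_height: int = MAX_HEIGHT
) -> Tuple[Optional[Format], Optional[Format]]:
    """Return (video_only, audio_only) HLS formats, or None where missing."""
    hls = [f for f in formats if _is_hls(f)]
    video_only = [
        f for f in hls if f.get("vcodec") != "none" and f.get("acodec") == "none"
    ]
    audio_only = [
        f for f in hls if f.get("vcodec") == "none" and f.get("acodec") != "none"
    ]

    within_cap = [f for f in video_only if (f.get("height") or 0) <= max_height]
    if within_cap:
        # Highest resolution under the cap; bitrate breaks ties.
        video = max(
            within_cap, key=lambda f: (f.get("height") or 0, f.get("tbr") or 0)
        )
    elif video_only:
        # Assumption: if nothing fits the cap, take the smallest available.
        video = min(video_only, key=lambda f: f.get("height") or 0)
    else:
        video = None

    audio = max(
        audio_only, key=lambda f: f.get("abr") or f.get("tbr") or 0, default=None
    )
    return video, audio
```

Returning `None` for a missing half lets the route decide whether to fail the request or fall back to a muxed stream.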

---

### Phase 3.3 — HLS Proxy Backend (1 day)

Proxy service that rewrites HLS manifests and proxies .ts segments. StreamingResponse for minimal latency.

**Reference:** mediaflow-proxy M3U8Processor pattern (line-by-line streaming, URL rewriting)

**Tests:** `test_phase3_hls_proxy.py`, `test_phase3_hls_manifest.py`

**Acceptance Criteria:**
- `GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded>` — fetches upstream manifest, rewrites all segment/sub-manifest URLs to point back to our proxy, streams response
- `GET /api/v1/youtube/proxy/segment.ts?url=<encoded>` — fetches upstream .ts segment, proxies with correct Content-Type (`video/mp2t`) and CORS headers
- Lines rewritten: segment URIs, sub-manifest URIs, `#EXT-X-KEY:URI=`, absolute URLs
- Lines passed through unchanged: `#EXTINF:`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXT-X-STREAM-INF` (the tag line itself; the URI on the following line is rewritten), comments
- Client disconnect → upstream connection closed cleanly
- CORS headers on every response: `access-control-allow-origin: *`
- **Upstream failure → HTTP 502 with error detail; frontend retries up to 3 times with fresh URL before showing "Service unavailable"**
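
The rewriting rules above can be sketched as a line-by-line generator. This is a simplified illustration rather than the `HLSProxyService` implementation: the helper names are hypothetical, and routing by `.m3u8` suffix is an assumption about how upstream URLs look.

```python
import re
from typing import Iterable, Iterator
from urllib.parse import quote, urljoin

PROXY_PREFIX = "/api/v1/youtube/proxy"
_URI_ATTR = re.compile(r'URI="([^"]+)"')


def _proxy_url(upstream_url: str) -> str:
    # Assumption: playlists end in .m3u8; everything else is treated as a segment.
    path = upstream_url.split("?", 1)[0]
    endpoint = "manifest.m3u8" if path.endswith(".m3u8") else "segment.ts"
    return f"{PROXY_PREFIX}/{endpoint}?url={quote(upstream_url, safe='')}"


def rewrite_manifest(lines: Iterable[str], base_url: str) -> Iterator[str]:
    """Yield rewritten manifest lines one at a time (no full-file buffering)."""
    for raw in lines:
        line = raw.strip()
        if not line:
            yield ""
        elif line.startswith("#"):
            # Tags pass through; only URI="..." attributes (e.g. #EXT-X-KEY)
            # are resolved against the manifest URL and pointed at the proxy.
            yield _URI_ATTR.sub(
                lambda m: f'URI="{_proxy_url(urljoin(base_url, m.group(1)))}"',
                line,
            )
        else:
            # Any non-tag line is a segment or sub-manifest URI.
            yield _proxy_url(urljoin(base_url, line))
```

`urljoin` handles both relative and absolute URIs, so the same branch covers segment names like `segment_0.ts` and fully qualified upstream URLs. The caller is expected to append newlines when streaming the result.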

**Tasks:**
| # | Task | File |
|---|------|------|
| 3.3.1 | Write tests first | `app/test/test_phase3_hls_proxy.py`, `app/test/test_phase3_hls_manifest.py` |
| 3.3.2 | Implement `HLSProxyService.rewrite_manifest()` — streaming line-by-line, URL detection + rewriting | `services/hls_proxy.py` |
| 3.3.3 | Implement `HLSProxyService.proxy_segment()` — httpx stream → StreamingResponse | `services/hls_proxy.py` |
| 3.3.4 | Implement `GET /api/v1/youtube/proxy/manifest.m3u8` and `GET /api/v1/youtube/proxy/segment.ts` routes, each taking the upstream URL in the `url` query parameter | `routers/youtube.py` |
| 3.3.5 | Run tests → pass → commit | — |
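
For task 3.3.3 and the "client disconnect → upstream closed" criterion, one possible shape is an async generator whose `finally` block releases the upstream. The sketch below uses a stand-in `FakeUpstream` class (hypothetical; the real service would wrap an httpx streaming response) so that it runs without network access.

```python
import asyncio
from typing import AsyncIterator, Iterable


class FakeUpstream:
    """Stand-in for an upstream streaming response (hypothetical)."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        self._chunks = list(chunks)
        self.closed = False

    async def iter_chunks(self) -> AsyncIterator[bytes]:
        for chunk in self._chunks:
            await asyncio.sleep(0)  # yield control, like real network I/O
            yield chunk

    async def aclose(self) -> None:
        self.closed = True


async def relay(upstream: FakeUpstream) -> AsyncIterator[bytes]:
    """Forward chunks as they arrive; always close the upstream afterwards."""
    try:
        async for chunk in upstream.iter_chunks():
            yield chunk
    finally:
        # Runs on normal completion and when the consumer stops early
        # (the generator is closed or its task is cancelled).
        await upstream.aclose()


async def demo() -> bool:
    upstream = FakeUpstream([b"a", b"b", b"c"])
    stream = relay(upstream)
    await stream.__anext__()  # consumer reads one chunk...
    await stream.aclose()     # ...then goes away
    return upstream.closed
```

Here `asyncio.run(demo())` reports that the upstream was closed even though only one of three chunks was consumed, which is the behavior the acceptance criterion asks for.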

---

### Phase 3.4 — Frontend: YouTube Input + Video Player (1 day)

URL input component and hls.js-based video player. Two media elements: a visible `<video>` (video-only, muted) and a hidden `<audio>` (audio-only, for Web Audio API routing).

**Tests:** `test_phase3_YouTubeInput.test.tsx`, `test_phase3_YouTubeVideoPlayer.test.tsx`

**Acceptance Criteria:**
- `YouTubeInput` accepts URL, validates format, shows loading/error states
- `YouTubeVideoPlayer` uses `forwardRef<HTMLVideoElement>` (same pattern as `VideoPlayer`)
- Video HLS loaded via hls.js into `<video muted>` element at 360p–480p (auto-best ≤ 480p)
- Audio HLS loaded via hls.js into hidden `<audio>` element
- Audio element exposes ref for parent to connect to AudioContext
- Thumbnail displayed as placeholder until user presses play; video element replaces it on play
- Video does NOT auto-play on load (waits for manual user play)
- Loading spinner, error overlay, "LIVE" badge for live streams
- **HLS error recovery**: on `hls.js` fatal error → re-extract stream URL → retry up to 3× → show "Service unavailable" on exhaustion
- `crossOrigin="anonymous"` on both elements (required for AudioContext graph)
- No quality selector (low resolution only, sufficient for reference video)

**Tasks:**
| # | Task | File |
|---|------|------|
| 3.4.1 | Write tests first | `src/test/test_phase3_YouTubeInput.test.tsx`, `src/test/test_phase3_YouTubeVideoPlayer.test.tsx` |
| 3.4.2 | Add YouTube types to `types/index.ts` | `types/index.ts` |
| 3.4.3 | Add API functions to `lib/api.ts` | `lib/api.ts` |
| 3.4.4 | Add TanStack Query hooks to `lib/queries.tsx` | `lib/queries.tsx` |
| 3.4.5 | Create `components/YouTubeInput.tsx` — URL input, validation, loading/error states | `components/YouTubeInput.tsx` |
| 3.4.6 | Create `components/YouTubeVideoPlayer.tsx` — hls.js dual-element player, forwardRef | `components/YouTubeVideoPlayer.tsx` |
| 3.4.7 | Run tests → pass → commit | — |

---

### Phase 3.5 — Integration: YouTube → ASR Pipeline (1 day)

Wire YouTube audio output into existing ASR pipeline. The key challenge: `useVideoASR` currently captures from `<video>` element; we need it to capture from the `<audio>` element loaded by hls.js.

**Tests:** `test_phase3_useYouTubeASR.test.ts`, `test_phase3_LTTPage_integration.test.tsx`

**Acceptance Criteria:**
- `useYouTubeASR` hook: accepts `audioElement` ref, sets up AudioContext graph on mount
- AudioContext.createMediaElementSource(audioElement) → ScriptProcessorNode → WebSocket
- Auto-starts ASR on play, stops on pause/end (same lifecycle as `useVideoASR`)
- Transcript flows into QueryInput (same `onFinalTranscript` callback)
- QueryInput remains editable during streaming — user can type corrections while ASR appends
- "Full Transcript" button hidden when YouTube source is active
- Switching between "Upload" and "YouTube" sources clears previous state

**Tasks:**
| # | Task | File |
|---|------|------|
| 3.5.1 | Write tests first | `src/test/test_phase3_useYouTubeASR.test.ts` |
| 3.5.2 | Create `hooks/useYouTubeASR.ts` — adapted from `useVideoASR.ts`, targets `<audio>` element | `hooks/useYouTubeASR.ts` |
| 3.5.3 | Update `QueryInput.tsx` — accept transcript from either source | `components/QueryInput.tsx` |
| 3.5.4 | Update `LTTPage.tsx` — add source toggle (Upload / YouTube), wire YouTubeInput + YouTubeVideoPlayer | `pages/LTTPage.tsx` |
| 3.5.5 | Create `test_phase3_LTTPage_integration.test.tsx` | `src/test/` |
| 3.5.6 | Run tests → pass → commit | — |

---

### Phase 3.6 — Integration & Acceptance Testing (1 day)

**Tests:** `test_integration_phase3.py`, `test_acceptance_phase3_youtube.py`, `test_acceptance_phase3_live.py`

**Tasks:**
| # | Task |
|---|------|
| 3.6.1 | Implement integration test (mocked yt-dlp, real httpx proxy + hls.js) |
| 3.6.2 | Implement acceptance: real YouTube VOD → extract → proxy → play |
| 3.6.3 | Implement acceptance: real YouTube live stream → extract → proxy → play + ASR |
| 3.6.4 | Full regression run (Phase 1 + 2 + 3 tests) |
| 3.6.5 | Fix failures, final commit |

---

### Phase 3.7 — Polish & Deployment (0.5 day)

| # | Task |
|---|------|
| 3.7.1 | Handle PO token expiration for live streams (log warning, auto-re-extract on failure) |
| 3.7.2 | Update Dockerfile — ensure ffmpeg + yt-dlp available in container |
| 3.7.3 | Update `docker-compose.yml` — add any new volumes/env vars |
| 3.7.4 | Verify production build (`npm run build`, `docker compose up -d --build`) |
| 3.7.5 | Update `README.md` — YouTube feature section |
| 3.7.6 | Update `development_plan.md` — mark Phase 3 status |
| 3.7.7 | Final commit |
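
The re-extract-and-retry behavior in 3.7.1 can be sketched as a small control loop. In this plan the loop is driven by the frontend's hls.js error handler, so the Python below only illustrates the control flow. The exception classes and function names are hypothetical, and it assumes "retry 3×" means one initial attempt plus up to three retries.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("youtube_proxy")
MAX_RETRIES = 3  # from the plan: retry up to 3 times


class UpstreamError(Exception):
    """Hypothetical error raised when a proxied stream URL stops working."""


class StreamUnavailableError(Exception):
    """Raised once all retries are exhausted."""


def open_with_reextract(
    extract_url: Callable[[], str],
    open_stream: Callable[[str], T],
    max_retries: int = MAX_RETRIES,
) -> T:
    """Try the stream; on failure log a warning, re-extract, and retry."""
    url = extract_url()
    for attempt in range(max_retries + 1):  # initial attempt + retries
        try:
            return open_stream(url)
        except UpstreamError as exc:
            if attempt == max_retries:
                raise StreamUnavailableError("Live stream unavailable") from exc
            logger.warning(
                "Stream failed (%s); re-extracting, retry %d/%d",
                exc, attempt + 1, max_retries,
            )
            url = extract_url()  # fresh URL, e.g. after PO token expiry
    raise AssertionError("unreachable")
```

Each retry uses a freshly extracted URL rather than the failed one, which matches the "retry with fresh URL" wording in the design decisions.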

---

## 4. Timeline

| Sub-Phase | Description | Effort | Depends On |
|---|---|---|---|
| 3.1 | Config & Infrastructure | 0.5 day | — |
| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 |
| 3.3 | HLS Proxy Backend | 1 day | 3.1 |
| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 |
| 3.5 | YouTube → ASR Integration | 1 day | 3.4 |
| 3.6 | Integration & Acceptance | 1 day | 3.5 |
| 3.7 | Polish & Deployment | 0.5 day | 3.6 |
| **Total** | | **5.5 days** | |

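3.2 (extraction) and 3.3 (proxy) both depend only on 3.1 and can run concurrently. If they are worked in parallel, the critical path shortens from 5.5 days to 5 days.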

---

## 5. Dependencies

**Backend:** `yt-dlp>=2024.0.0` (new), `httpx>=0.26.0` (already present), `aiofiles>=24.0.0` (already present)
**Frontend:** `hls.js@^1.5.0` (new — NOT present, must install)
**System:** ffmpeg on server (already required by Phase 2)

---

## 6. Config Fields

```python
# YouTube live stream proxy (Phase 3)
youtube_proxy_enabled: bool = True
yt_dlp_timeout: int = 30          # seconds for yt-dlp extraction
yt_dlp_cache_ttl: int = 300       # seconds to cache extraction results
```

```bash
# .env.example additions
YOUTUBE_PROXY_ENABLED=true
YT_DLP_TIMEOUT=30
YT_DLP_CACHE_TTL=300
```
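
To show how the environment variables map onto the three fields, here is a standard-library sketch. It is only an illustration: the project's `core/config.py` may well use a settings library instead, and the names `YouTubeProxySettings` and `load_settings` are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUE_VALUES = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class YouTubeProxySettings:
    youtube_proxy_enabled: bool = True
    yt_dlp_timeout: int = 30       # seconds for yt-dlp extraction
    yt_dlp_cache_ttl: int = 300    # seconds to cache extraction results


def load_settings(env: Mapping[str, str] = os.environ) -> YouTubeProxySettings:
    """Read the three Phase 3 fields, falling back to the defaults above."""
    defaults = YouTubeProxySettings()
    enabled_raw = env.get("YOUTUBE_PROXY_ENABLED")
    enabled = (
        defaults.youtube_proxy_enabled
        if enabled_raw is None
        else enabled_raw.strip().lower() in _TRUE_VALUES
    )
    return YouTubeProxySettings(
        youtube_proxy_enabled=enabled,
        yt_dlp_timeout=int(env.get("YT_DLP_TIMEOUT", defaults.yt_dlp_timeout)),
        yt_dlp_cache_ttl=int(env.get("YT_DLP_CACHE_TTL", defaults.yt_dlp_cache_ttl)),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to exercise in `test_phase3_config.py`.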

---

## 7. Key Design Decisions

| Decision | Choice | Why |
|---|---|---|
| Streaming protocol | HLS (m3u8) | hls.js plays it in any browser with Media Source Extensions; DASH would require a separate player such as dash.js |
| yt-dlp client | `ios` for live, `web` for VOD | `ios` returns HLS for live streams with 60fps support; format selector prefers ≤480p |
| HTTP client for proxy | httpx (already present) | Async streaming support via `httpx.AsyncClient.stream()`; no new dependency |
| Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer whole file |
| Audio element | Hidden `<audio>` + hls.js | `createMediaElementSource` works on `<audio>` elements |
| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2–5 s); reuse results for 5 min (live) or 30 min (VOD) |
| **Full Transcript for YouTube** | **Disabled** | Button hidden; real-time streaming ASR only |
| **QueryInput during streaming** | **Editable** | User can type corrections while transcript streams (same as existing ASR) |
| **Video quality** | **360p–480p auto-best** | Low resolution sufficient for reference; no quality selector |
| **Auto-play on load** | **Wait for manual play** | Thumbnail placeholder; user presses play. Respects autoplay policy. |
| **Thumbnail** | **Stays until user presses play** | Clean transition; no black frame |
| **Error recovery** | **Retry 3× → "Service unavailable"** | Auto-re-extract URL on HLS failure; after 3 failures, show error state |
| **PO Tokens (live streams)** | **Graceful degradation for MVP** | Stream first ~30s; on failure retry 3× with fresh URL; after exhaustion show "Live stream unavailable" |

---

## 8. File Manifest

### New Files
```
backend/
  app/models/youtube.py
  app/services/youtube_service.py
  app/services/hls_proxy.py
  app/routers/youtube.py
  app/test/test_phase3_config.py
  app/test/test_phase3_youtube_extract.py
  app/test/test_phase3_hls_proxy.py
  app/test/test_phase3_hls_manifest.py
  app/test/test_integration_phase3.py
  app/test/acceptance/test_acceptance_phase3_youtube.py
  app/test/acceptance/test_acceptance_phase3_live.py

frontend/src/
  components/YouTubeInput.tsx
  components/YouTubeVideoPlayer.tsx
  hooks/useYouTubeASR.ts
  test/test_phase3_YouTubeInput.test.tsx
  test/test_phase3_YouTubeVideoPlayer.test.tsx
  test/test_phase3_useYouTubeASR.test.ts
  test/test_phase3_LTTPage_integration.test.tsx
```

### Modified Files
```
backend/app/core/config.py                     # Add 3 config fields
backend/.env.example                            # Add 3 env vars
backend/main.py                                 # Register youtube router
backend/requirements.txt                        # Add yt-dlp

frontend/package.json                           # Add hls.js
frontend/src/types/index.ts                     # Add YouTube types
frontend/src/lib/api.ts                         # Add extractYouTube(), getYouTubeProxyUrl()
frontend/src/lib/queries.tsx                    # Add useYouTubeExtract() mutation
frontend/src/pages/LTTPage.tsx                  # Add source toggle + YouTube components
frontend/src/components/QueryInput.tsx          # Accept transcript from either source

Dockerfile                                      # Add yt-dlp install step
docker-compose.yml                              # Add env vars if needed
README.md                                       # YouTube feature section
development_plan.md                             # Mark Phase 3 status
```

---

## 9. Known Risks & Mitigations

| Risk | Impact | Mitigation |
|---|---|---|
| PO Token expiration (live streams cut at 30s) | High — live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
| yt-dlp extraction slow (2-5s) | Medium — poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance |
| hls.js audio sync drift vs video | Low — separate streams may drift | hls.js `liveSyncDuration` keeps both near live edge; test with 10+ min streams |
| Safari `createMediaElementSource` on HLS | Low — known Safari bug with native HLS | hls.js uses MSE, not native HLS — works around Safari bug; Chrome/Firefox unaffected |
| YouTube ToS for proxy | Low for internal demo | Accepted risk for an internal demo only; requires a ToS review before any public product |

---

## 10. Example Data Flow

```
POST /api/v1/youtube/extract
  Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
  Response: {
    "video_id": "dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up",
    "is_live": false,
    "video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
    "audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
    "thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
  }

GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
  → Fetches upstream manifest from googlevideo.com
  → Rewrites segment URLs:
      segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
  → Streams rewritten manifest to browser

GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
  → Fetches upstream .ts segment via httpx.stream()
  → StreamingResponse with Content-Type: video/mp2t
  → CORS: access-control-allow-origin: *
```
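
The `url=...&type=...` query in the example above only works if the upstream URL is fully percent-encoded, because upstream URLs carry their own `?` and `&`. A minimal round-trip sketch, with hypothetical helper names and a made-up upstream URL:

```python
from typing import Dict
from urllib.parse import parse_qs, quote, urlsplit

PROXY_MANIFEST = "/api/v1/youtube/proxy/manifest.m3u8"


def build_proxy_url(upstream_url: str, stream_type: str) -> str:
    """Wrap an upstream manifest URL in a proxy URL (as in the extract response)."""
    if stream_type not in ("video", "audio"):
        raise ValueError(f"unknown stream type: {stream_type!r}")
    # safe="" also encodes "/", "?", "&" and "=", so the upstream query string
    # cannot leak into the proxy's own query parameters.
    return f"{PROXY_MANIFEST}?url={quote(upstream_url, safe='')}&type={stream_type}"


def parse_proxy_query(proxy_url: str) -> Dict[str, str]:
    """Recover the upstream URL and stream type on the proxy side."""
    query = parse_qs(urlsplit(proxy_url).query)
    return {"url": query["url"][0], "type": query.get("type", ["video"])[0]}


# Made-up upstream URL for illustration only.
upstream = "https://manifest.googlevideo.com/hls/index.m3u8?expire=123&sig=abc"
proxied = build_proxy_url(upstream, "audio")
assert parse_proxy_query(proxied) == {"url": upstream, "type": "audio"}
```

In a FastAPI route the framework decodes the `url` query parameter itself, so `parse_proxy_query` is here mainly to demonstrate that the encoding is lossless.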

---

## 11. References

- **mediaflow-proxy**: Production FastAPI HLS proxy with M3U8Processor — [mhdzumair/mediaflow-proxy](https://github.com/mhdzumair/mediaflow-proxy)
- **yt-dlp API docs**: [yt-dlp-yt-dlp.mintlify.app](https://yt-dlp-yt-dlp.mintlify.app/api/extractors)
- **hls.js API docs**: [github.com/video-dev/hls.js/blob/master/docs/API.md](https://github.com/video-dev/hls.js/blob/master/docs/API.md)
- **hls.js low-latency live**: `lowLatencyMode: true`, `liveSyncDuration: 1.5`
- **Existing code patterns**: `.plans/phase2_implementation_plan.md`, `backend/app/routers/video.py`, `frontend/src/hooks/useVideoASR.ts`