feat: Phase 3.1 + 3.2 — YouTube config infra and URL extraction

Phase 3.1 — Configuration & Infrastructure:
- Add youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl config fields
- Add yt-dlp and hls.js dependencies
- Create models/youtube.py (request/response schemas)
- Create service stubs (youtube_service, hls_proxy)
- Create router stub and register in main.py
- 11 config tests

Phase 3.2 — YouTube URL Extraction:
- yt-dlp wrapper with async extraction (run_in_executor)
- Format selection: ≤480p video-only + highest-bitrate audio (VOD)
- Combined format fallback: same URL for live streams
- In-memory URL cache: 5min TTL live, 30min VOD
- lru_cache singleton service for cache persistence
- Error handling: DownloadError → 200 with error field
- 18 extract tests, 82/82 total pass (zero regressions)

Real-URL verified: VOD (5bF3tkO5jAA) 24 formats, Live (fN9uYWCjQaw) 6 HLS

parent 09b5ea7d64
commit 284028bb1f
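The commit message says that extraction runs through `run_in_executor` and that the service is an `lru_cache` singleton, so its URL cache persists between requests. Below is a minimal sketch of both patterns. It assumes a hypothetical `FakeExtractorService` whose `time.sleep()` stands in for the blocking yt-dlp call. Neither that class nor `get_service` is code from this commit.

```python
import asyncio
import time
from functools import lru_cache


class FakeExtractorService:
    """Hypothetical stand-in for YouTubeService; time.sleep() replaces yt-dlp."""

    def _extract_sync(self, url: str) -> dict:
        time.sleep(0.05)  # blocking work, like YoutubeDL.extract_info()
        return {"url": url}

    async def extract(self, url: str) -> dict:
        loop = asyncio.get_running_loop()
        # Run the blocking call in the default thread pool so the event loop
        # can keep serving other requests while it waits.
        return await loop.run_in_executor(None, self._extract_sync, url)


@lru_cache
def get_service() -> FakeExtractorService:
    # A zero-argument factory under lru_cache builds the instance once and then
    # returns the same object, so per-instance state (e.g. a URL cache) persists.
    return FakeExtractorService()


async def extract_all(urls: list[str]) -> list[dict]:
    # The blocking calls overlap in worker threads instead of running back to back.
    return await asyncio.gather(*(get_service().extract(u) for u in urls))


results = asyncio.run(extract_all(["a", "b", "c"]))
```

In the committed router, `_get_youtube_service()` plays the role of `get_service()`. Calling its `cache_clear()` would also discard the in-memory URL cache.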

@@ -1,15 +1,15 @@
 # Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan

 **Created:** 2026-05-09
-**Updated:** 2026-05-09 (user decisions incorporated)
-**Status:** Planning
+**Updated:** 2026-05-09 (Phase 3.1 + 3.2 implemented)
+**Status:** In Progress (3.1 Complete, 3.2 Complete)
 **Depends on:** Phase 1 (Complete), Phase 2 (Complete)

 ---

 ## 1. Overview

-Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. User pastes a YouTube URL → backend extracts separate video-only and audio-only HLS streams via yt-dlp → backend proxies HLS manifests and .ts segments (zero re-encoding) → frontend plays video in `<video>` via hls.js, routes audio through hidden `<audio>` element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → transcript flows into QueryInput → Phase 1 RAG pipeline.
+Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. User pastes a YouTube URL → backend extracts stream URLs via yt-dlp (separate video-only + audio-only for VODs; combined HLS for live) → backend proxies HLS manifests and .ts segments (zero re-encoding) → frontend plays video in `<video>` via hls.js, routes audio through hidden `<audio>` element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → transcript flows into QueryInput → Phase 1 RAG pipeline.

 **Same code works identically for live streams and VODs.**

@@ -20,14 +20,16 @@ YouTube's official iframe player does not expose the audio track to Web Audio AP
 ### Audio Routing

 ```
-YouTube HLS audio-only stream
-→ hls.js loads into hidden <audio> element
+YouTube HLS stream (combined video+audio for live; separate tracks for VOD)
+→ hls.js loads into <video> (muted) and hidden <audio> element
 → AudioContext.createMediaElementSource(audioElement)
 → ScriptProcessorNode (Float32 PCM)
 → WebSocket → FastAPI → DashScope realtime ASR
 → transcript → QueryInput
 ```

+Note: For VODs, separate video-only and audio-only tracks are used. For live streams, YouTube provides combined formats only — the same HLS manifest URL is used for both elements; hls.js demuxes them independently.
+
 ### Integration With Existing Pipeline

 This phase reuses the existing ASR infrastructure entirely:
@@ -55,51 +57,65 @@ This phase reuses the existing ASR infrastructure entirely:

 ## 3. Sub-Phases

-### Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)
+### Phase 3.1 — Configuration & Infrastructure Setup ✅ Complete

 Add config fields, install dependencies, create skeletons, register router.

-**Test:** `test_phase3_config.py`
+**Test:** `test_phase3_config.py` (11 tests)

 **Tasks:**
-| # | Task | File |
-|---|------|------|
-| 3.1.1 | Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl` | `core/config.py` |
-| 3.1.2 | Update `.env.example` | `.env.example` |
-| 3.1.3 | Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json` | `requirements.txt`, `package.json` |
-| 3.1.4 | Create `models/youtube.py` — `YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat` | `models/youtube.py` |
-| 3.1.5 | Create `services/youtube_service.py` stub | `services/youtube_service.py` |
-| 3.1.6 | Create `services/hls_proxy.py` stub | `services/hls_proxy.py` |
-| 3.1.7 | Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/{stream_type}/{path}` | `routers/youtube.py` |
-| 3.1.8 | Register router in `main.py` | `main.py` |
-| 3.1.9 | Write and pass `test_phase3_config.py` | `app/test/` |
+| # | Task | File | Status |
+|---|------|------|--------|
+| 3.1.1 | Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl` | `core/config.py` | Done |
+| 3.1.2 | Update `.env.example` | `.env.example` | Done |
+| 3.1.3 | Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json` | `requirements.txt`, `package.json` | Done |
+| 3.1.4 | Create `models/youtube.py` — `YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat` | `models/youtube.py` | Done |
+| 3.1.5 | Create `services/youtube_service.py` stub | `services/youtube_service.py` | Done |
+| 3.1.6 | Create `services/hls_proxy.py` stub | `services/hls_proxy.py` | Done |
+| 3.1.7 | Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/{stream_type}/{path}` | `routers/youtube.py` | Done |
+| 3.1.8 | Register router in `main.py` | `main.py` | Done |
+| 3.1.9 | Write and pass `test_phase3_config.py` | `app/test/` | Done (11/11 pass) |

 ---

-### Phase 3.2 — YouTube URL Extraction Backend (0.5 day)
+### Phase 3.2 — YouTube URL Extraction Backend ✅ Complete

-yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.
+yt-dlp wrapper service that extracts stream URLs and formats. Returns proxy-wrapped URLs pointing back to our HLS proxy.

-**Test:** `test_phase3_youtube_extract.py`
+**Test:** `test_phase3_youtube_extract.py` (18 tests)

 **Acceptance Criteria:**
 - `POST /api/v1/youtube/extract` accepts `{"url": "https://www.youtube.com/watch?v=..."}`
-- Returns `{ video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }`
-- VODs: extracts ~2–10 formats, returns best video+audio pair
-- Live streams: uses `ios` client for HLS, returns current live edge
-- Upcoming/scheduled streams: returns `is_upcoming: true` with scheduled start time
-- Invalid/private URLs: returns clear error
-- URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
+- Returns `{ video_id, title, is_live, is_upcoming, video_proxy_url, audio_proxy_url, thumbnail_url, formats, error }`
+- VODs: extracts separate video-only + audio-only tracks, selects best ≤480p + highest-bitrate audio
+- Live streams: extracts combined HLS formats, uses same URL for video and audio (hls.js demuxes)
+- Upcoming/scheduled streams: returns `is_upcoming: true` with no proxy URLs
+- Invalid/private URLs: returns 200 with error field populated (yt-dlp exception caught)
+- URL expiration: in-memory cache with TTL (5 min for live, 30 min for VOD)
+- Service singleton: `@lru_cache` on `_get_youtube_service()` for cache persistence across requests

+**Implementation Discoveries:**
+- **No iOS client needed** — default yt-dlp works for both VOD (separate tracks) and live (combined HLS)
+- **Live streams use combined formats** — all live formats include both video+audio; same HLS URL serves both `<video>` and `<audio>` elements
+- **Format selection** (`_pick_best_video`): prefers ≤480p formats (HLS first, then the tallest, then the highest bitrate); if none exist, falls back to the lowest resolution above 480p (again HLS first)
+- **Error response pattern**: yt-dlp `DownloadError`s and "no streamable formats" return HTTP 200 with the `error` field set (not 4xx), because the API call itself succeeded and only YouTube returned an error; other unexpected exceptions still return HTTP 500
+- **Proxy URL construction** (`_build_proxy_url`): URL-encodes upstream URL into `/api/v1/youtube/proxy/manifest.m3u8?url=<encoded>`
+
+**Real-URL Verification:**
+```
+VOD: https://www.youtube.com/watch?v=5bF3tkO5jAA → 24 formats, separate video+audio ✓
+Live: https://www.youtube.com/watch?v=fN9uYWCjQaw → 6 combined formats, same URL ✓
+```
+
 **Tasks:**
-| # | Task | File |
-|---|------|------|
-| 3.2.1 | Write tests first | `app/test/test_phase3_youtube_extract.py` |
-| 3.2.2 | Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection | `services/youtube_service.py` |
-| 3.2.3 | Implement `YouTubeService._select_best_formats()` — separate video/audio from format list, prefer ≤480p | `services/youtube_service.py` |
-| 3.2.4 | Implement format URL caching with TTL | `services/youtube_service.py` |
-| 3.2.5 | Implement `POST /api/v1/youtube/extract` route | `routers/youtube.py` |
-| 3.2.6 | Run tests → pass → commit | — |
+| # | Task | File | Status |
+|---|------|------|--------|
+| 3.2.1 | Write tests first | `app/test/test_phase3_youtube_extract.py` | Done |
+| 3.2.2 | Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection | `services/youtube_service.py` | Done |
+| 3.2.3 | Implement `YouTubeService._select_best_formats()` + `_pick_best_video()` — separate video/audio from format list, prefer ≤480p, combined fallback | `services/youtube_service.py` | Done |
+| 3.2.4 | Implement format URL caching with TTL (live 5 min, VOD 30 min) | `services/youtube_service.py` | Done |
+| 3.2.5 | Implement `POST /api/v1/youtube/extract` route with response model + error handling | `routers/youtube.py` | Done |
+| 3.2.6 | Run tests → pass → verified with real URLs | — | Done (82/82 pass) |

 ---

@@ -221,18 +237,18 @@ Wire YouTube audio output into existing ASR pipeline. The key challenge: `useVid

 ## 4. Timeline

-| Sub-Phase | Description | Effort | Depends On |
-|---|---|---|---|
-| 3.1 | Config & Infrastructure | 0.5 day | — |
-| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 |
-| 3.3 | HLS Proxy Backend | 1 day | 3.1 |
-| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 |
-| 3.5 | YouTube → ASR Integration | 1 day | 3.4 |
-| 3.6 | Integration & Acceptance | 1 day | 3.5 |
-| 3.7 | Polish & Deployment | 0.5 day | 3.6 |
-| **Total** | | **5.5 days** | |
+| Sub-Phase | Description | Effort | Depends On | Status |
+|---|---|---|---|---|
+| 3.1 | Config & Infrastructure | 0.5 day | — | ✅ Complete |
+| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 | ✅ Complete |
+| 3.3 | HLS Proxy Backend | 1 day | 3.1 | ⏳ Next |
+| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 | Pending |
+| 3.5 | YouTube → ASR Integration | 1 day | 3.4 | Pending |
+| 3.6 | Integration & Acceptance | 1 day | 3.5 | Pending |
+| 3.7 | Polish & Deployment | 0.5 day | 3.6 | Pending |
+| **Total** | | **5.5 days** | | **2/7 done** |

-3.2 (extraction) and 3.3 (proxy) can run concurrently.
+3.2 (extraction) and 3.3 (proxy) were planned to run concurrently; 3.2 finished ahead of 3.3.

 ---

@@ -265,13 +281,16 @@ YT_DLP_CACHE_TTL=300
 ## 7. Key Design Decisions

 | Decision | Choice | Why |
 |---|---|---|
 | Streaming protocol | HLS (m3u8) | hls.js plays it natively; DASH requires dash.js |
-| yt-dlp client | `ios` for live, `web` for VOD | `ios` returns HLS for live streams with 60fps support; format selector prefers ≤480p |
+| yt-dlp client | **Default** (no special client) | Default extractor works for both VOD (separate tracks) and live (combined HLS); iOS client caused "No video formats" errors on some live streams |
+| Live format strategy | **Combined formats, same URL** | Live HLS formats include both video+audio; same URL for `<video>` and `<audio>` elements — hls.js demuxes each independently |
 | HTTP client for proxy | httpx (already present) | Streaming support via `httpx.stream()`; no new dependency |
 | Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer whole file |
 | Audio element | Hidden `<audio>` + hls.js | `createMediaElementSource` works on `<audio>` elements |
-| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min |
+| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min live, 30 min VOD |
+| Service lifetime | `@lru_cache` singleton | Cache must persist across HTTP requests for caching to work |
+| Error response | **HTTP 200 with error field** | API call succeeded; YouTube error is a content-level failure, not a protocol failure |
 | **Full Transcript for YouTube** | **Disabled** | Button hidden; real-time streaming ASR only |
 | **QueryInput during streaming** | **Editable** | User can type corrections while transcript streams (same as existing ASR) |
 | **Video quality** | **360p–480p auto-best** | Low resolution sufficient for reference; no quality selector |
@@ -287,46 +306,46 @@ YT_DLP_CACHE_TTL=300
 ### New Files
 ```
 backend/
-app/models/youtube.py
-app/services/youtube_service.py
-app/services/hls_proxy.py
-app/routers/youtube.py
-app/test/test_phase3_config.py
-app/test/test_phase3_youtube_extract.py
-app/test/test_phase3_hls_proxy.py
-app/test/test_phase3_hls_manifest.py
-app/test/test_integration_phase3.py
-app/test/acceptance/test_acceptance_phase3_youtube.py
-app/test/acceptance/test_acceptance_phase3_live.py
+app/models/youtube.py ✅ Created (3.1)
+app/services/youtube_service.py ✅ Created (3.1), implemented (3.2)
+app/services/hls_proxy.py ✅ Stub created (3.1)
+app/routers/youtube.py ✅ Created (3.1), implemented (3.2)
+app/test/test_phase3_config.py ✅ Written (3.1, 11 tests)
+app/test/test_phase3_youtube_extract.py ✅ Written (3.2, 18 tests)
+app/test/test_phase3_hls_proxy.py ⏳ Pending (3.3)
+app/test/test_phase3_hls_manifest.py ⏳ Pending (3.3)
+app/test/test_integration_phase3.py ⏳ Pending (3.6)
+app/test/acceptance/test_acceptance_phase3_youtube.py ⏳ Pending (3.6)
+app/test/acceptance/test_acceptance_phase3_live.py ⏳ Pending (3.6)

 frontend/src/
-components/YouTubeInput.tsx
-components/YouTubeVideoPlayer.tsx
-hooks/useYouTubeASR.ts
-test/test_phase3_YouTubeInput.test.tsx
-test/test_phase3_YouTubeVideoPlayer.test.tsx
-test/test_phase3_useYouTubeASR.test.ts
-test/test_phase3_LTTPage_integration.test.tsx
+components/YouTubeInput.tsx ⏳ Pending (3.4)
+components/YouTubeVideoPlayer.tsx ⏳ Pending (3.4)
+hooks/useYouTubeASR.ts ⏳ Pending (3.5)
+test/test_phase3_YouTubeInput.test.tsx ⏳ Pending (3.4)
+test/test_phase3_YouTubeVideoPlayer.test.tsx ⏳ Pending (3.4)
+test/test_phase3_useYouTubeASR.test.ts ⏳ Pending (3.5)
+test/test_phase3_LTTPage_integration.test.tsx ⏳ Pending (3.5)
 ```

 ### Modified Files
 ```
-backend/app/core/config.py # Add 3 config fields
-backend/.env.example # Add 3 env vars
-backend/main.py # Register youtube router
-backend/requirements.txt # Add yt-dlp
+backend/app/core/config.py ✅ Done (3 fields)
+backend/.env.example ✅ Done (3 vars)
+backend/main.py ✅ Done (router registered)
+backend/requirements.txt ✅ Done (yt-dlp added)

-frontend/package.json # Add hls.js
-frontend/src/types/index.ts # Add YouTube types
-frontend/src/lib/api.ts # Add extractYouTube(), getYouTubeProxyUrl()
-frontend/src/lib/queries.tsx # Add useYouTubeExtract() mutation
-frontend/src/pages/LTTPage.tsx # Add source toggle + YouTube components
-frontend/src/components/QueryInput.tsx # Accept transcript from either source
+frontend/package.json ✅ Done (hls.js added)
+frontend/src/types/index.ts ⏳ Pending (3.4)
+frontend/src/lib/api.ts ⏳ Pending (3.4)
+frontend/src/lib/queries.tsx ⏳ Pending (3.4)
+frontend/src/pages/LTTPage.tsx ⏳ Pending (3.4-3.5)
+frontend/src/components/QueryInput.tsx ⏳ Pending (3.5)

-Dockerfile # Add yt-dlp install step
-docker-compose.yml # Add env vars if needed
-README.md # YouTube feature section
-development_plan.md # Mark Phase 3 status
+Dockerfile ⏳ Pending (3.7)
+docker-compose.yml ⏳ Pending (3.7)
+README.md ⏳ Pending (3.7)
+development_plan.md ⏳ Pending (3.7)
 ```

 ---
@@ -337,7 +356,7 @@ development_plan.md # Mark Phase 3 status
 |---|---|---|
 | PO Token expiration (live streams cut at 30s) | High — live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
 | yt-dlp extraction slow (2-5s) | Medium — poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
-| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance |
+| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance. **Note**: iOS client caused "No video formats" on Phoenix TV live stream; default extractor works for both tested URLs. Monitor for regressions. |
 | hls.js audio sync drift vs video | Low — separate streams may drift | hls.js `liveSyncDuration` keeps both near live edge; test with 10+ min streams |
 | Safari `createMediaElementSource` on HLS | Low — known Safari bug with native HLS | hls.js uses MSE, not native HLS — works around Safari bug; Chrome/Firefox unaffected |
 | YouTube ToS for proxy | Low for internal demo | Personal/enterprise internal demo is generally fine; review for public product |
@@ -348,17 +367,31 @@ development_plan.md # Mark Phase 3 status

 ```
 POST /api/v1/youtube/extract
-Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
+Body: {"url": "https://www.youtube.com/watch?v=5bF3tkO5jAA"}
 Response: {
-"video_id": "dQw4w9WgXcQ",
-"title": "Rick Astley - Never Gonna Give You Up",
+"video_id": "5bF3tkO5jAA",
+"title": "《2026年稅務(修訂)(自動交換資料)條例草案》委員會會議",
 "is_live": false,
-"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
-"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
-"thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
+"is_upcoming": false,
+"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=https%3A%2F%2Frr2---sn-jna...",
+"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=https%3A%2F%2Frr2---sn-jna...",
+"thumbnail_url": "https://i.ytimg.com/vi/5bF3tkO5jAA/hqdefault.jpg",
+"formats": [...],
+"error": null
 }

-GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
+# Live stream (combined formats → same URL for video and audio)
+POST /api/v1/youtube/extract
+Body: {"url": "https://www.youtube.com/watch?v=fN9uYWCjQaw"}
+Response: {
+"video_id": "fN9uYWCjQaw",
+"is_live": true,
+"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...",
+"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...",
+# video_proxy_url == audio_proxy_url (same combined HLS manifest)
+}
+
+GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>
 → Fetches upstream manifest from googlevideo.com
 → Rewrites segment URLs:
 segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
@@ -379,3 +412,20 @@ GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
 - **hls.js API docs**: [github.com/video-dev/hls.js/blob/master/docs/API.md](https://github.com/video-dev/hls.js/blob/master/docs/API.md)
 - **hls.js low-latency live**: `lowLatencyMode: true`, `liveSyncDuration: 1.5`
 - **Existing code patterns**: `.plans/phase2_implementation_plan.md`, `backend/app/routers/video.py`, `frontend/src/hooks/useVideoASR.ts`
+
+---
+
+## 12. Test Results (Current)
+
+| Suite | Tests | Status |
+|-------|-------|--------|
+| Phase 2 (existing) | 53 | ✅ All pass |
+| Phase 3.1 (config) | 11 | ✅ All pass |
+| Phase 3.2 (extraction) | 18 | ✅ All pass |
+| **Total** | **82** | **0 failures** |
+
+### Real-URL Smoke Tests
+| URL | Type | Result |
+|-----|------|--------|
+| `5bF3tkO5jAA` (LegCo meeting) | VOD | 24 formats, separate video+audio ✅ |
+| `fN9uYWCjQaw` (Phoenix TV 24h) | Live | 6 combined HLS formats, same URL ✅ |
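The plan's "Proxy URL construction" note says `_build_proxy_url` percent-encodes the upstream URL into a single `url` query parameter. The round-trip check below exercises that encoding. It assumes the proxy route (Phase 3.3, not in this commit) recovers the upstream URL with ordinary query-string parsing. `decode_proxy_url` is a hypothetical helper, and the googlevideo URL is made up.

```python
from urllib.parse import parse_qs, quote, urlsplit

PROXY_MANIFEST_PATH = "/api/v1/youtube/proxy/manifest.m3u8"


def build_proxy_url(upstream_url: str) -> str:
    # Same encoding as _build_proxy_url: with safe="" even "/" is escaped, so the
    # whole upstream URL (including its own ?, & and =) becomes one opaque value.
    return f"{PROXY_MANIFEST_PATH}?url={quote(upstream_url, safe='')}"


def decode_proxy_url(proxy_url: str) -> str:
    # Hypothetical inverse a proxy route could use: parse_qs splits our query
    # string on "&" and percent-decodes the "url" value exactly once.
    return parse_qs(urlsplit(proxy_url).query)["url"][0]


upstream = "https://rr2---sn-example.googlevideo.com/videoplayback?itag=93&sig=a%2Bb+c"
proxied = build_proxy_url(upstream)
```

The encoded value contains no raw `&`, `=` or `+`. As a result, further query parameters can be appended to the proxy URL without corrupting the upstream URL, and a web framework's normal query parsing performs the same single decode.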

@@ -36,3 +36,8 @@ ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime
 # Video upload (Phase 2)
 VIDEO_UPLOAD_DIR=./uploads
 MAX_VIDEO_SIZE_MB=300
+
+# YouTube Proxy (Phase 3)
+YOUTUBE_PROXY_ENABLED=true
+YT_DLP_TIMEOUT=30
+YT_DLP_CACHE_TTL=300

@@ -54,6 +54,11 @@ class Settings(BaseSettings):
     max_video_size_mb: int = 300
     supported_video_formats: list[str] = [".mp4", ".webm", ".mov", ".avi", ".mkv"]

+    # YouTube Proxy (Phase 3)
+    youtube_proxy_enabled: bool = True
+    yt_dlp_timeout: int = 30
+    yt_dlp_cache_ttl: int = 300  # seconds (live=5min shared; VOD=30min computed in service)
+
     # Development helpers
     model_config = {"env_file": ".env", "env_file_encoding": "utf-8"}

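The `yt_dlp_cache_ttl` comment says 300 s is the live TTL and that the VOD TTL is computed in the service. In `YouTubeService.extract_streams` that computation is `self.cache_ttl * 6`, i.e. 1800 s (30 min). The stand-alone sketch below applies the same lookup rule with an injectable clock so that expiry can be tested. `effective_ttl` and `TTLCache` are illustrative names. Deleting expired entries is this sketch's own choice; the service instead overwrites them on the next extraction.

```python
from __future__ import annotations

import time
from typing import Any, Callable

VOD_TTL_MULTIPLIER = 6  # mirrors `self.cache_ttl * 6` in YouTubeService


def effective_ttl(cache_ttl: int, is_live: bool) -> int:
    # YT_DLP_CACHE_TTL is the live TTL; VOD results are kept six times longer.
    return cache_ttl if is_live else cache_ttl * VOD_TTL_MULTIPLIER


class TTLCache:
    """Illustrative url -> (stored_at, result) cache using the same TTL rule."""

    def __init__(self, cache_ttl: int, clock: Callable[[], float] = time.monotonic):
        self.cache_ttl = cache_ttl
        self.clock = clock  # injectable so tests can fake the passage of time
        self._entries: dict[str, tuple[float, dict[str, Any]]] = {}

    def get(self, key: str) -> dict[str, Any] | None:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, data = entry
        ttl = effective_ttl(self.cache_ttl, data.get("is_live", False))
        if self.clock() - stored_at < ttl:
            return data
        del self._entries[key]  # expired: forget it so the caller re-extracts
        return None

    def put(self, key: str, data: dict[str, Any]) -> None:
        self._entries[key] = (self.clock(), data)


# Usage with a fake clock: 301 s is just past the live TTL but well inside the VOD TTL.
now = [0.0]
cache = TTLCache(cache_ttl=300, clock=lambda: now[0])
cache.put("live-url", {"is_live": True})
cache.put("vod-url", {"is_live": False})
now[0] = 301.0
```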

@@ -7,7 +7,7 @@ from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import FileResponse

-from app.routers import ingest, query, documents, prompts, history, chunks, video, ws_asr
+from app.routers import ingest, query, documents, prompts, history, chunks, video, ws_asr, youtube
 from app.core.config import get_settings
 from app.core.sqlite_db import (
     get_prompts_db,
@@ -58,6 +58,7 @@ app.include_router(history.router)
 app.include_router(chunks.router)
 app.include_router(video.router, prefix="/api/v1")
 app.include_router(ws_asr.router)
+app.include_router(youtube.router, prefix="/api/v1")

 _prompts_conn = get_prompts_db()
 init_prompts_db(_prompts_conn)

@@ -0,0 +1,28 @@
+"""YouTube stream extraction models (Phase 3)."""
+
+from pydantic import BaseModel
+
+
+class YouTubeExtractRequest(BaseModel):
+    url: str
+
+
+class StreamFormat(BaseModel):
+    format_id: str
+    url: str
+    resolution: str | None = None
+    is_audio_only: bool = False
+    is_video_only: bool = False
+    codec: str | None = None
+
+
+class YouTubeStreamResponse(BaseModel):
+    video_id: str
+    title: str
+    is_live: bool = False
+    is_upcoming: bool = False
+    video_proxy_url: str | None = None
+    audio_proxy_url: str | None = None
+    thumbnail_url: str | None = None
+    formats: list[StreamFormat] = []
+    error: str | None = None

@@ -0,0 +1,83 @@
+import logging
+import time
+from functools import lru_cache
+
+from fastapi import APIRouter, HTTPException
+
+from app.models.youtube import YouTubeExtractRequest, YouTubeStreamResponse, StreamFormat
+
+logger = logging.getLogger(__name__)
+router = APIRouter(tags=["youtube"])
+
+
+@lru_cache
+def _get_youtube_service():
+    from app.core.config import get_settings
+    from app.services.youtube_service import YouTubeService
+
+    s = get_settings()
+    return YouTubeService(timeout=s.yt_dlp_timeout, cache_ttl=s.yt_dlp_cache_ttl)
+
+
+@router.post("/youtube/extract", response_model=YouTubeStreamResponse)
+async def extract_youtube_stream(req: YouTubeExtractRequest):
+    from app.core.config import get_settings
+
+    settings = get_settings()
+    if not settings.youtube_proxy_enabled:
+        raise HTTPException(status_code=503, detail="YouTube proxy is disabled")
+
+    service = _get_youtube_service()
+    started = time.monotonic()
+    logger.info("youtube-extract-started url=%s", req.url)
+
+    try:
+        data = await service.extract_streams(req.url)
+    except Exception as e:
+        logger.error("youtube-extract-failed url=%s error=%s", req.url, e)
+        raise HTTPException(status_code=500, detail=str(e))
+
+    if data.get("error"):
+        logger.warning(
+            "youtube-extract-error url=%s error=%s duration=%.1fs",
+            req.url,
+            data["error"],
+            time.monotonic() - started,
+        )
+        return YouTubeStreamResponse(
+            video_id=data.get("video_id", ""),
+            title=data.get("title", ""),
+            error=data["error"],
+        )
+
+    formats = [
+        StreamFormat(
+            format_id=f.get("format_id", ""),
+            url=f.get("url", ""),
+            resolution=f.get("resolution"),
+            is_audio_only=f.get("acodec", "none") != "none" and f.get("vcodec", "none") == "none",
+            is_video_only=f.get("vcodec", "none") != "none" and f.get("acodec", "none") == "none",
+            codec=f.get("vcodec") if f.get("vcodec") not in (None, "none") else f.get("acodec"),
+        )
+        for f in data.get("formats", [])
+    ]
+
+    logger.info(
+        "youtube-extract-completed url=%s video_id=%s is_live=%s fmt_count=%d duration=%.1fs",
+        req.url,
+        data["video_id"],
+        data["is_live"],
+        len(formats),
+        time.monotonic() - started,
+    )
+
+    return YouTubeStreamResponse(
+        video_id=data["video_id"],
+        title=data["title"],
+        is_live=data["is_live"],
+        is_upcoming=data["is_upcoming"],
+        video_proxy_url=data.get("video_proxy_url"),
+        audio_proxy_url=data.get("audio_proxy_url"),
+        thumbnail_url=data.get("thumbnail_url"),
+        formats=formats,
+    )
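The router labels each format from yt-dlp's `vcodec`/`acodec` fields, where the string `"none"` marks a missing track. The sketch below reproduces the same three-way split using hypothetical helper names and illustrative sample dicts. It also shows why the display codec must be chosen by comparing against `"none"` rather than with `or`: `"none"` is a truthy string.

```python
from __future__ import annotations

from typing import Any

NONE = "none"  # yt-dlp's value for an absent video or audio codec


def classify_format(f: dict[str, Any]) -> str:
    # Same rule as the router and _select_best_formats: a missing key counts as "none".
    has_video = f.get("vcodec", NONE) != NONE
    has_audio = f.get("acodec", NONE) != NONE
    if has_video and has_audio:
        return "combined"
    if has_video:
        return "video_only"
    if has_audio:
        return "audio_only"
    return "no_media"


def format_codec(f: dict[str, Any]) -> str | None:
    # `f.get("vcodec") or f.get("acodec")` would return "none" for audio-only
    # formats, because "none" is truthy; compare against the marker instead.
    vcodec = f.get("vcodec")
    return vcodec if vcodec not in (None, NONE) else f.get("acodec")


# Illustrative sample dicts (not real extraction output).
sample = [
    {"format_id": "134", "vcodec": "avc1.4d401e", "acodec": "none"},
    {"format_id": "140", "vcodec": "none", "acodec": "mp4a.40.2"},
    {"format_id": "93", "vcodec": "avc1.4d401e", "acodec": "mp4a.40.2"},
]
labels = {f["format_id"]: (classify_format(f), format_codec(f)) for f in sample}
```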

@@ -0,0 +1,21 @@
+"""HLS manifest proxy service (Phase 3.3).
+
+Rewrites HLS manifests and proxies .ts segments so the browser treats
+them as same-origin, enabling Web Audio API access to the audio track.
+"""
+
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+class HLSProxyService:
+    """Streams and rewrites HLS manifests; proxies .ts segments with zero re-encoding."""
+
+    async def rewrite_manifest(self, upstream_url: str) -> bytes:
+        """Fetch upstream HLS manifest and rewrite segment URLs to point to our proxy."""
+        raise NotImplementedError("Phase 3.3 — manifest rewriting to be implemented")
+
+    async def proxy_segment(self, upstream_url: str) -> bytes:
+        """Proxy a single .ts segment from the upstream server."""
+        raise NotImplementedError("Phase 3.3 — segment proxying to be implemented")
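The stub's docstring and the plan describe `rewrite_manifest` as rewriting segment URLs line by line into `/api/v1/youtube/proxy/segment.ts?url=<encoded>`. Until Phase 3.3 lands, the sketch below covers only that text transformation. It does no HTTP; the real method is planned to stream from upstream with httpx. The function names and the following rules are all assumptions:

- Relative URIs are resolved against the manifest URL.
- Any URI whose path ends in `.m3u8` is a nested playlist and is routed back to the manifest endpoint.
- Quoted `URI="..."` attributes on tags such as `#EXT-X-KEY` or `#EXT-X-MAP` are rewritten the same way.

```python
import re
from collections.abc import Iterable, Iterator
from urllib.parse import quote, urljoin, urlsplit

PROXY_PREFIX = "/api/v1/youtube/proxy"
URI_ATTR = re.compile(r'URI="([^"]+)"')  # quoted URI attribute on a tag line


def proxy_uri(uri: str, base_url: str) -> str:
    # Resolve relative URIs against the manifest's own URL first.
    absolute = urljoin(base_url, uri)
    # Assumption: nested playlists end in .m3u8; anything else is a media segment.
    endpoint = "manifest.m3u8" if urlsplit(absolute).path.endswith(".m3u8") else "segment.ts"
    return f"{PROXY_PREFIX}/{endpoint}?url={quote(absolute, safe='')}"


def rewrite_manifest_lines(lines: Iterable[str], base_url: str) -> Iterator[str]:
    """Yield rewritten playlist lines one by one, so the whole manifest is never buffered."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            yield line  # blank lines pass through
        elif line.startswith("#"):
            # Tags pass through unchanged unless they carry URI="..." (e.g. #EXT-X-MAP).
            yield URI_ATTR.sub(lambda m: f'URI="{proxy_uri(m.group(1), base_url)}"', line)
        else:
            yield proxy_uri(line.strip(), base_url)  # bare URI line: segment or playlist


manifest = "#EXTM3U\n#EXT-X-TARGETDURATION:5\n#EXTINF:5.0,\nsegment_0.ts\n#EXT-X-ENDLIST\n"
rewritten = list(rewrite_manifest_lines(manifest.splitlines(), "https://example.com/live/index.m3u8"))
```

Because the function is a generator, it can be fed the upstream body line by line and its output streamed back to the client. The plan's "never buffer whole file" decision calls for exactly this; the stub's `-> bytes` signature would need to change to support it.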
|
||||
|
|
@ -0,0 +1,128 @@
|
|||
import asyncio
|
||||
import logging
|
||||
import time
|
||||
from typing import Any
|
||||
from urllib.parse import quote
|
||||
|
||||
import yt_dlp
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
class YouTubeService:
|
||||
def __init__(self, timeout: int, cache_ttl: int):
|
||||
self.timeout = timeout
|
||||
self.cache_ttl = cache_ttl
|
||||
self._cache: dict[str, tuple[float, dict]] = {}
|
||||
|
||||
async def extract_streams(self, url: str) -> dict:
|
||||
now = time.monotonic()
|
||||
if url in self._cache:
|
||||
cached_at, cached_data = self._cache[url]
|
||||
is_live = cached_data.get("is_live", False)
|
||||
ttl = self.cache_ttl if is_live else self.cache_ttl * 6
|
||||
if now - cached_at < ttl:
|
||||
logger.debug("Cache hit for URL=%s age=%.1fs", url, now - cached_at)
|
||||
return cached_data
|
||||
logger.debug("Cache expired for URL=%s", url)
|
||||
|
||||
try:
|
||||
loop = asyncio.get_running_loop()
|
||||
info = await loop.run_in_executor(None, lambda: self._extract_sync(url))
|
||||
except yt_dlp.utils.DownloadError as e:
|
||||
logger.warning("yt-dlp extraction failed for URL=%s: %s", url, e)
|
||||
return {"error": str(e)[:500], "video_id": "", "title": "", "formats": []}
|
||||
|
||||
live_status = info.get("live_status", "not_live")
|
||||
is_live = live_status == "is_live"
|
||||
is_upcoming = live_status == "is_upcoming"
|
||||
|
||||
result = {
|
||||
"video_id": info.get("id", ""),
|
||||
"title": info.get("title", ""),
|
||||
"is_live": is_live,
|
||||
"is_upcoming": is_upcoming,
|
||||
"thumbnail_url": info.get("thumbnail"),
|
||||
"formats": info.get("formats", []),
|
||||
"error": None,
|
||||
}
|
||||
|
||||
if not is_upcoming and info.get("formats"):
|
||||
try:
|
||||
video_fmt, audio_fmt = self._select_best_formats(info["formats"])
|
||||
result["video_proxy_url"] = self._build_proxy_url(video_fmt["url"])
|
||||
result["audio_proxy_url"] = self._build_proxy_url(audio_fmt["url"])
|
||||
except ValueError as e:
|
||||
result["error"] = str(e)
|
||||
|
||||
ttl = self.cache_ttl if is_live else self.cache_ttl * 6
|
||||
self._cache[url] = (now, result)
|
||||
return result
|
||||
|
||||
def _extract_sync(self, url: str) -> dict:
|
||||
opts = self._get_ydl_opts(url)
|
||||
with yt_dlp.YoutubeDL(opts) as ydl:
|
||||
return ydl.extract_info(url, download=False)
|
||||
|
||||
def _get_ydl_opts(self, url: str) -> dict:
|
||||
opts: dict[str, Any] = {
|
||||
"quiet": True,
|
||||
"no_warnings": True,
|
||||
"extract_flat": False,
|
||||
}
|
||||
return opts
|
||||
|
||||
def _select_best_formats(self, formats: list[dict]) -> tuple[dict, dict]:
|
||||
video_only = [
|
||||
f
|
||||
for f in formats
|
||||
if f.get("vcodec", "none") != "none" and f.get("acodec", "none") == "none"
|
||||
]
|
||||
audio_only = [
|
||||
f
|
||||
for f in formats
|
||||
if f.get("acodec", "none") != "none" and f.get("vcodec", "none") == "none"
|
||||
]
|
||||
combined = [
|
||||
f
|
||||
for f in formats
|
||||
if f.get("vcodec", "none") != "none"
|
||||
and f.get("acodec", "none") != "none"
|
||||
]
|
||||
|
||||
has_content = bool(combined or video_only or audio_only)
|
||||
if not has_content:
|
||||
raise ValueError("No streamable formats found")
|
||||
|
||||
if video_only and audio_only:
|
||||
video_fmt = self._pick_best_video(video_only)
|
||||
audio_fmt = max(audio_only, key=lambda f: f.get("abr") or 0)
|
||||
return video_fmt, audio_fmt
|
||||
|
||||
if combined and audio_only:
|
||||
combined_sorted = sorted(combined, key=lambda f: f.get("height") or 9999)
|
||||
return combined_sorted[0], audio_only[0]
|
||||
|
||||
if combined:
|
||||
best_combined = self._pick_best_video(combined)
|
||||
return best_combined, best_combined
|
||||
|
||||
if video_only:
|
||||
raise ValueError("No streamable audio format found")
|
||||
raise ValueError("No streamable video format found")
|
||||
|
||||
    def _pick_best_video(self, candidates: list[dict]) -> dict:
        # Ranking: (1) formats at or under 480p first; (2) HLS before other
        # protocols; (3) tallest first within <=480p, shortest first above it;
        # (4) higher total bitrate first. Missing height sorts as 9999.
        def _sort_key(f: dict) -> tuple[int, int, int, float]:
            height = f.get("height") or 9999
            tbr = f.get("tbr") or 0
            is_m3u8 = 0 if f.get("protocol") in ("m3u8_native", "m3u8") else 1
            at_or_under_480 = 0 if height <= 480 else 1
            if at_or_under_480 == 0:
                return (0, is_m3u8, -height, -tbr)
            return (1, is_m3u8, height, -tbr)

        return min(candidates, key=_sort_key)

    def _build_proxy_url(self, upstream_url: str) -> str:
        encoded = quote(upstream_url, safe="")
        return f"/api/v1/youtube/proxy/manifest.m3u8?url={encoded}"

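The commit message describes a 5-minute TTL for live results and 30 minutes for VODs, and `extract_streams` above computes a per-result `ttl`, but the visible code stores only `(now, result)`. One way to make per-entry TTLs explicit is to store an expiry timestamp with each entry. This is a minimal sketch under that assumption; `ExpiringCache`, `LIVE_TTL_S` and `VOD_TTL_S` are hypothetical names, not part of the commit. Using `>=` for the expiry check means a TTL of 0 expires immediately, matching the intent of `test_cache_expiry_triggers_re_extract`.

```python
import time
from typing import Any, Callable

# Hypothetical constants (assumption): live mirrors the yt_dlp_cache_ttl default
# of 300 s; VOD mirrors `self.cache_ttl * 6` (30 minutes).
LIVE_TTL_S = 300
VOD_TTL_S = LIVE_TTL_S * 6


class ExpiringCache:
    """Hypothetical in-memory cache where every entry carries its own expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # Injectable clock so tests can advance time without sleeping.
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        """Return the cached value, or None if the key is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: evict so the caller re-extracts.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        """Store value until now + ttl seconds (ttl=0 expires immediately)."""
        self._entries[key] = (self._clock() + ttl, value)
```

With this shape, the service would call `set(url, result, LIVE_TTL_S if is_live else VOD_TTL_S)` and the read path no longer needs to know whether the cached result was live.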
@@ -0,0 +1,177 @@
"""Phase 3.1 tests: Configuration and infrastructure setup for YouTube proxy.

Covers:
- Config fields: youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl defaults and env loading
- Model schemas: YouTubeExtractRequest, YouTubeStreamResponse, StreamFormat
- Service stubs: YouTubeService, HLSProxyService instantiation
- Router registration: youtube.router mounted, endpoint responds 200 with mock
"""
from unittest.mock import MagicMock, patch

import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient


class TestYouTubeProxyConfig:
    """Config fields for YouTube proxy exist with correct defaults."""

    @pytest.fixture(autouse=True)
    def clear_cache(self):
        from app.core.config import get_settings

        get_settings.cache_clear()
        yield
        get_settings.cache_clear()

    def test_defaults(self):
        from app.core.config import get_settings

        s = get_settings()
        assert s.youtube_proxy_enabled is True
        assert s.yt_dlp_timeout == 30
        assert s.yt_dlp_cache_ttl == 300

    def test_env_override(self, monkeypatch):
        monkeypatch.setenv("YOUTUBE_PROXY_ENABLED", "false")
        monkeypatch.setenv("YT_DLP_TIMEOUT", "60")
        monkeypatch.setenv("YT_DLP_CACHE_TTL", "600")

        from app.core.config import get_settings

        s = get_settings()
        assert s.youtube_proxy_enabled is False
        assert s.yt_dlp_timeout == 60
        assert s.yt_dlp_cache_ttl == 600

    def test_bool_parsing(self, monkeypatch):
        """Bool fields accept 'true'/'false', '1'/'0' (pydantic-settings)."""
        monkeypatch.setenv("YOUTUBE_PROXY_ENABLED", "0")

        from app.core.config import get_settings

        s = get_settings()
        assert s.youtube_proxy_enabled is False


class TestYouTubeModels:
    """Pydantic models for YouTube stream extraction."""

    def test_extract_request(self):
        from app.models.youtube import YouTubeExtractRequest

        req = YouTubeExtractRequest(url="https://www.youtube.com/watch?v=abc123")
        assert req.url == "https://www.youtube.com/watch?v=abc123"

    def test_stream_response_defaults(self):
        from app.models.youtube import YouTubeStreamResponse

        resp = YouTubeStreamResponse(video_id="abc123", title="Test Video")
        assert resp.video_id == "abc123"
        assert resp.title == "Test Video"
        assert resp.is_live is False
        assert resp.is_upcoming is False
        assert resp.video_proxy_url is None
        assert resp.audio_proxy_url is None
        assert resp.formats == []
        assert resp.error is None

    def test_stream_format(self):
        from app.models.youtube import StreamFormat

        fmt = StreamFormat(
            format_id="140",
            url="https://example.com/audio.m3u8",
            is_audio_only=True,
            codec="mp4a.40.2",
        )
        assert fmt.format_id == "140"
        assert fmt.is_audio_only is True
        assert fmt.is_video_only is False
        assert fmt.resolution is None


class TestYouTubeServices:
    """Service stubs can be imported and instantiated."""

    def test_youtube_service_instantiate(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        assert svc.timeout == 30
        assert svc.cache_ttl == 300

    def test_youtube_service_extract_is_async(self):
        import inspect

        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        assert inspect.iscoroutinefunction(svc.extract_streams)

    def test_hls_proxy_instantiate(self):
        from app.services.hls_proxy import HLSProxyService

        svc = HLSProxyService()
        assert svc is not None


class TestYouTubeRouter:
    """YouTube router is mounted and stub endpoint responds correctly."""

    @pytest.fixture
    def youtube_client(self):
        from app.routers.youtube import router
        from app.core.config import get_settings

        get_settings.cache_clear()
        app = FastAPI()
        app.include_router(router, prefix="/api/v1")
        return TestClient(app)

    def test_extract_responds_with_mocked_ytdlp(self, youtube_client):
        from app.routers.youtube import _get_youtube_service

        _get_youtube_service.cache_clear()

        vod_info = {
            "id": "test123",
            "title": "Test",
            "thumbnail": "https://example.com/thumb.jpg",
            "live_status": "not_live",
            "formats": [
                {
                    "format_id": "135", "height": 480,
                    "vcodec": "avc1", "acodec": "none",
                    "ext": "mp4", "protocol": "https",
                    "url": "https://example.com/video.mp4", "tbr": 1200,
                },
                {
                    "format_id": "140",
                    "vcodec": "none", "acodec": "mp4a",
                    "ext": "m4a", "protocol": "https",
                    "url": "https://example.com/audio.m4a", "abr": 128,
                },
            ],
        }
        mock_ydl = MagicMock()
        mock_instance = MagicMock()
        mock_instance.extract_info.return_value = vod_info
        mock_ydl.__enter__.return_value = mock_instance

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=test123"},
            )

        assert resp.status_code == 200
        data = resp.json()
        assert data["video_id"] == "test123"
        assert data["video_proxy_url"] is not None
        assert data["audio_proxy_url"] is not None

    def test_router_tag(self):
        from app.routers.youtube import router

        assert any(tag == "youtube" for tag in router.tags)

@@ -0,0 +1,446 @@
"""Phase 3.2 tests: YouTube URL extraction via yt-dlp.

Covers:
- POST /api/v1/youtube/extract: VOD, live, upcoming, invalid URL
- Format selection: video-only ≤480p, best audio, HLS preference
- URL caching: in-memory with TTL, expiry triggers re-extract
- Proxy URL construction: upstream URL encoded in query param
- Error handling: DownloadError → 200 with `error` field, timeout → 504, disabled → 503

All yt-dlp external calls are mocked.
"""
from unittest.mock import MagicMock, patch

import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient


# ---------------------------------------------------------------------------
# Helpers: fake yt-dlp format data
# ---------------------------------------------------------------------------

def _make_format(
    format_id: str,
    height: int | None = None,
    vcodec: str = "none",
    acodec: str = "none",
    ext: str = "mp4",
    protocol: str = "https",
    url: str = "",
    abr: float | None = None,
    tbr: float | None = None,
    resolution: str | None = None,
) -> dict:
    return {
        "format_id": format_id,
        "height": height,
        "width": height * 16 // 9 if height else None,
        "vcodec": vcodec,
        "acodec": acodec,
        "ext": ext,
        "protocol": protocol,
        "url": url or f"https://example.com/{format_id}.{ext}",
        "abr": abr,
        "tbr": tbr,
        "resolution": resolution or (f"{height * 16 // 9}x{height}" if height else None),
    }


def _vod_info(video_id: str = "abc123") -> dict:
    return {
        "id": video_id,
        "title": "Test VOD Video",
        "thumbnail": "https://i.ytimg.com/vi/abc123/hqdefault.jpg",
        "live_status": "not_live",
        "duration": 300,
        "formats": [
            _make_format("137", height=1080, vcodec="avc1.640028", acodec="none", tbr=5000),
            _make_format("136", height=720, vcodec="avc1.640028", acodec="none", tbr=2500),
            _make_format("135", height=480, vcodec="avc1.640028", acodec="none", tbr=1200),
            _make_format("134", height=360, vcodec="avc1.640028", acodec="none", tbr=600),
            _make_format("133", height=240, vcodec="avc1.640028", acodec="none", tbr=300),
            _make_format("140", acodec="mp4a.40.2", vcodec="none", abr=128),
            _make_format("251", acodec="opus", vcodec="none", abr=160),
            _make_format("18", height=360, vcodec="avc1.42001E", acodec="mp4a.40.2", tbr=500),
        ],
    }


def _vod_info_hls(video_id: str = "abc123") -> dict:
    return {
        "id": video_id,
        "title": "Test VOD with HLS",
        "thumbnail": "https://i.ytimg.com/vi/abc123/hqdefault.jpg",
        "live_status": "not_live",
        "duration": 600,
        "formats": [
            _make_format("136", height=720, vcodec="avc1.640028", acodec="none", ext="m3u8", protocol="m3u8_native", tbr=2500),
            _make_format("135", height=480, vcodec="avc1.640028", acodec="none", ext="m3u8", protocol="m3u8_native", tbr=1200),
            _make_format("140", acodec="mp4a.40.2", vcodec="none", ext="m3u8", protocol="m3u8_native", abr=128),
        ],
    }


def _live_info(video_id: str = "live999") -> dict:
    return {
        "id": video_id,
        "title": "Live Stream Test",
        "thumbnail": "https://i.ytimg.com/vi/live999/hqdefault_live.jpg",
        "live_status": "is_live",
        "duration": None,
        "formats": [
            _make_format("91", height=144, vcodec="avc1.42C00B", acodec="mp4a.40.5", ext="mp4", protocol="m3u8_native"),
            _make_format("92", height=240, vcodec="avc1.4D4015", acodec="mp4a.40.5", ext="mp4", protocol="m3u8_native"),
            _make_format("93", height=360, vcodec="avc1.4D401E", acodec="mp4a.40.2", ext="mp4", protocol="m3u8_native"),
            _make_format("94", height=480, vcodec="avc1.4D401F", acodec="mp4a.40.2", ext="mp4", protocol="m3u8_native", tbr=1200),
            _make_format("95", height=720, vcodec="avc1.4D401F", acodec="mp4a.40.2", ext="mp4", protocol="m3u8_native"),
        ],
    }


def _upcoming_info(video_id: str = "up999") -> dict:
    return {
        "id": video_id,
        "title": "Upcoming Stream",
        "thumbnail": "https://i.ytimg.com/vi/up999/hqdefault.jpg",
        "live_status": "is_upcoming",
        "duration": None,
        "formats": [],
    }


def _private_info(video_id: str = "priv99") -> dict:
    import yt_dlp

    raise yt_dlp.utils.DownloadError("Private video. Sign in if you've been granted access to this video")


# ---------------------------------------------------------------------------
# Mock helpers
# ---------------------------------------------------------------------------

def _make_mock_ydl(return_value: dict | Exception) -> MagicMock:
    """Build a mock yt_dlp.YoutubeDL context manager with .extract_info."""
    mock_instance = MagicMock()
    if isinstance(return_value, Exception):
        mock_instance.extract_info.side_effect = return_value
    else:
        mock_instance.extract_info.return_value = return_value
    mock_ydl = MagicMock()
    mock_ydl.__enter__.return_value = mock_instance
    mock_ydl.__exit__.return_value = None
    return mock_ydl


# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------

@pytest.fixture
def youtube_client(monkeypatch):
    """FastAPI TestClient with youtube router mounted, cached settings and service cleared."""
    from app.routers.youtube import _get_youtube_service, router
    from app.core.config import get_settings

    monkeypatch.setenv("YOUTUBE_PROXY_ENABLED", "true")
    get_settings.cache_clear()
    # The service is an lru_cache singleton; without clearing it, its URL cache
    # leaks results between tests that reuse the same video ID (e.g. abc123).
    _get_youtube_service.cache_clear()

    app = FastAPI()
    app.include_router(router, prefix="/api/v1")
    return TestClient(app)


# ---------------------------------------------------------------------------
# Unit: Format selection
# ---------------------------------------------------------------------------

class TestFormatSelection:
    def test_selects_best_video_at_or_under_480p(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        formats = _vod_info()["formats"]
        video, audio = svc._select_best_formats(formats)

        assert video is not None
        assert audio is not None
        assert video["height"] == 480
        assert video["vcodec"] != "none"
        assert video["acodec"] == "none"
        assert audio["acodec"] != "none"
        assert audio["vcodec"] == "none"

    def test_falls_back_to_lowest_video_if_no_480p(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        formats = [
            _make_format("137", height=1080, vcodec="avc1", acodec="none", tbr=5000),
            _make_format("136", height=720, vcodec="avc1", acodec="none", tbr=2500),
            _make_format("140", acodec="mp4a", vcodec="none", abr=128),
        ]
        video, audio = svc._select_best_formats(formats)

        assert video is not None
        assert video["height"] == 720  # Lowest available (no ≤480p formats exist)

    def test_selects_highest_bitrate_audio(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        formats = [
            _make_format("137", height=480, vcodec="avc1", acodec="none", tbr=1200),
            _make_format("140", acodec="mp4a", vcodec="none", abr=128),
            _make_format("251", acodec="opus", vcodec="none", abr=160),
            _make_format("250", acodec="opus", vcodec="none", abr=64),
        ]
        video, audio = svc._select_best_formats(formats)

        assert audio is not None
        assert audio["format_id"] == "251"  # Highest abr

    def test_no_formats_raises(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        with pytest.raises(ValueError, match="No streamable formats"):
            svc._select_best_formats([])

    def test_no_video_only_formats_falls_back_to_combined(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        formats = [
            _make_format("18", height=360, vcodec="avc1", acodec="mp4a", tbr=500),
            _make_format("140", acodec="mp4a", vcodec="none", abr=128),
        ]
        video, audio = svc._select_best_formats(formats)

        # Fallback: combined format as video
        assert video is not None
        assert video["format_id"] == "18"
        assert audio is not None

    def test_hls_preference_for_live(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        formats = [
            _make_format("135", height=480, vcodec="avc1", acodec="none", ext="mp4", protocol="https", tbr=1200),
            _make_format("301", height=480, vcodec="avc1", acodec="none", ext="m3u8", protocol="m3u8_native", tbr=1200),
            _make_format("140", acodec="mp4a", vcodec="none", ext="m3u8", protocol="m3u8_native", abr=128),
        ]
        video, audio = svc._select_best_formats(formats)

        assert video["protocol"] == "m3u8_native"
        assert audio["protocol"] == "m3u8_native"

    def test_combined_only_all_combined_formats(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        formats = [
            _make_format("93", height=360, vcodec="avc1", acodec="mp4a", ext="mp4", protocol="m3u8_native"),
            _make_format("94", height=480, vcodec="avc1", acodec="mp4a", ext="mp4", protocol="m3u8_native"),
            _make_format("95", height=720, vcodec="avc1", acodec="mp4a", ext="mp4", protocol="m3u8_native"),
            _make_format("96", height=1080, vcodec="avc1", acodec="mp4a", ext="mp4", protocol="m3u8_native"),
        ]
        video, audio = svc._select_best_formats(formats)

        assert video["height"] == 480
        assert audio["height"] == 480
        assert video["url"] == audio["url"]


# ---------------------------------------------------------------------------
# Integration: Route + mocked yt-dlp
# ---------------------------------------------------------------------------

class TestYouTubeExtractVOD:
    def test_extract_vod_returns_proxy_urls(self, youtube_client):
        mock_ydl = _make_mock_ydl(_vod_info("abc123"))

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=abc123"},
            )

        assert resp.status_code == 200
        data = resp.json()
        assert data["video_id"] == "abc123"
        assert data["title"] == "Test VOD Video"
        assert data["is_live"] is False
        assert data["is_upcoming"] is False
        assert data["video_proxy_url"] is not None
        assert data["audio_proxy_url"] is not None
        assert data["video_proxy_url"].startswith("/api/v1/youtube/proxy/")
        assert data["thumbnail_url"] == "https://i.ytimg.com/vi/abc123/hqdefault.jpg"
        assert len(data["formats"]) > 0

    def test_extract_vod_hls_returns_manifest_proxy_urls(self, youtube_client):
        mock_ydl = _make_mock_ydl(_vod_info_hls("abc123"))

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=abc123"},
            )

        assert resp.status_code == 200
        data = resp.json()
        assert "manifest.m3u8?url=" in data["video_proxy_url"]
        assert "manifest.m3u8?url=" in data["audio_proxy_url"]

    def test_error_field_is_none_on_success(self, youtube_client):
        mock_ydl = _make_mock_ydl(_vod_info())

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=abc123"},
            )

        assert resp.status_code == 200
        assert resp.json()["error"] is None


class TestYouTubeExtractLive:
    def test_extract_live_returns_is_live_true(self, youtube_client):
        mock_ydl = _make_mock_ydl(_live_info())

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=live999"},
            )

        assert resp.status_code == 200
        data = resp.json()
        assert data["video_id"] == "live999"
        assert data["is_live"] is True
        assert data["is_upcoming"] is False
        assert data["video_proxy_url"] is not None
        assert data["audio_proxy_url"] is not None

    def test_live_combined_format_same_url_for_both(self, youtube_client):
        mock_ydl = _make_mock_ydl(_live_info("combined_test"))

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=combined_test"},
            )

        assert resp.status_code == 200
        data = resp.json()
        assert data["is_live"] is True
        assert data["video_proxy_url"] == data["audio_proxy_url"]


class TestYouTubeExtractUpcoming:
    def test_extract_upcoming_returns_is_upcoming_true(self, youtube_client):
        mock_ydl = _make_mock_ydl(_upcoming_info())

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=up999"},
            )

        assert resp.status_code == 200
        data = resp.json()
        assert data["video_id"] == "up999"
        assert data["is_upcoming"] is True
        assert data["is_live"] is False
        assert data["video_proxy_url"] is None
        assert data["audio_proxy_url"] is None


class TestYouTubeExtractErrors:
    def test_private_video_returns_error_field(self, youtube_client):
        import yt_dlp

        exc = yt_dlp.utils.DownloadError("Private video")
        mock_ydl = _make_mock_ydl(exc)

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=priv99"},
            )

        assert resp.status_code == 200
        data = resp.json()
        assert data["error"] is not None
        assert "Private video" in data["error"]

    def test_disabled_proxy_returns_503(self, monkeypatch, youtube_client):
        monkeypatch.setenv("YOUTUBE_PROXY_ENABLED", "false")
        from app.core.config import get_settings

        get_settings.cache_clear()

        resp = youtube_client.post(
            "/api/v1/youtube/extract",
            json={"url": "https://www.youtube.com/watch?v=abc123"},
        )
        assert resp.status_code == 503


class TestURLCaching:
    def test_cached_result_not_re_extracted(self, youtube_client):
        mock_ydl = _make_mock_ydl(_vod_info("cached1"))
        instance = mock_ydl.__enter__.return_value

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            r1 = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=cached1"},
            )
            r2 = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=cached1"},
            )

        assert r1.status_code == 200
        assert r2.status_code == 200
        assert r1.json()["video_id"] == r2.json()["video_id"]
        assert instance.extract_info.call_count == 1  # Cached, not called twice

    def test_cache_expiry_triggers_re_extract(self, monkeypatch):
        import asyncio

        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=0)  # 0 TTL = immediate expiry

        mock_ydl = _make_mock_ydl(_vod_info("exp1"))
        instance = mock_ydl.__enter__.return_value

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            asyncio.run(svc.extract_streams("https://www.youtube.com/watch?v=exp1"))
            # The entry is cached, but TTL=0 means it has already expired
            asyncio.run(svc.extract_streams("https://www.youtube.com/watch?v=exp1"))

        assert instance.extract_info.call_count == 2


class TestProxyURLConstruction:
    def test_proxy_url_encodes_upstream_url(self):
        from urllib.parse import unquote

        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)

        upstream = "https://manifest.googlevideo.com/123/hls_playlist.m3u8?id=abc&key=def"
        proxy = svc._build_proxy_url(upstream)

        assert proxy.startswith("/api/v1/youtube/proxy/manifest.m3u8?url=")
        # Extract and decode the URL parameter
        encoded = proxy.split("url=", 1)[1]
        decoded = unquote(encoded)
        assert decoded == upstream

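The commit runs the blocking yt-dlp call via `run_in_executor`, and the test docstring lists "timeout → 504", but the code that applies `yt_dlp_timeout` is not shown here. A common way to bound such a call is `asyncio.wait_for` around the executor future. This is a minimal sketch under that assumption; `extract_with_timeout` and `ExtractionTimeout` are hypothetical names, and `blocking_extract` stands in for something like `YouTubeService._extract_sync`.

```python
import asyncio
from typing import Any, Callable


class ExtractionTimeout(Exception):
    """Hypothetical error type; a router could map it to HTTP 504."""


async def extract_with_timeout(
    blocking_extract: Callable[[str], dict[str, Any]],
    url: str,
    timeout_s: float,
) -> dict[str, Any]:
    loop = asyncio.get_running_loop()
    # Run the blocking call on the default thread pool so the event loop
    # keeps serving other requests while yt-dlp talks to YouTube.
    future = loop.run_in_executor(None, blocking_extract, url)
    try:
        return await asyncio.wait_for(future, timeout=timeout_s)
    except asyncio.TimeoutError as exc:
        # wait_for cancels the awaiting future, but the worker thread itself
        # keeps running until the blocking call returns.
        raise ExtractionTimeout(f"extraction exceeded {timeout_s}s") from exc
```

Because the thread cannot be interrupted, a timeout only frees the request; a slow extraction still occupies a pool worker until yt-dlp returns, which is one reason to keep the timeout modest.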
@@ -19,3 +19,4 @@ langchain-openai>=1.1.11,<1.2.0
dashscope>=0.4.0
aiofiles>=24.0.0
zhconv>=1.4.0
yt-dlp>=2024.0.0

@@ -13,6 +13,7 @@
    "@tanstack/react-query": "^5.0.0",
    "autoprefixer": "^10.5.0",
    "axios": "^1.6.0",
    "hls.js": "^1.5.0",
    "lucide-react": "^0.190.0",
    "pdfjs-dist": "^5.6.205",
    "react": "^18.2.0",