feat: Phase 3.1 + 3.2 — YouTube config infra and URL extraction

Phase 3.1 — Configuration & Infrastructure:
- Add youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl config fields
- Add yt-dlp and hls.js dependencies
- Create models/youtube.py (request/response schemas)
- Create service stubs (youtube_service, hls_proxy)
- Create router stub and register in main.py
- 11 config tests

Phase 3.2 — YouTube URL Extraction:
- yt-dlp wrapper with async extraction (run_in_executor)
- Format selection: ≤480p video-only + highest-bitrate audio (VOD)
- Combined format fallback: same URL for live streams
- In-memory URL cache: 5min TTL live, 30min VOD
- lru_cache singleton service for cache persistence
- Error handling: DownloadError → 200 with error field
- 18 extract tests, 82/82 total pass (zero regressions)

Real-URL verified: VOD (5bF3tkO5jAA) 24 formats, Live (fN9uYWCjQaw) 6 HLS
Woody 2026-05-09 15:53:04 +08:00
parent 09b5ea7d64
commit 284028bb1f
12 changed files with 1036 additions and 90 deletions

View File

@ -1,15 +1,15 @@
# Phase 3: YouTube Live Stream Proxy → ASR → RAG — Implementation Plan
**Created:** 2026-05-09
**Updated:** 2026-05-09 (user decisions incorporated)
**Status:** Planning
**Updated:** 2026-05-09 (Phase 3.1 + 3.2 implemented)
**Status:** In Progress (3.1 Complete, 3.2 Complete)
**Depends on:** Phase 1 (Complete), Phase 2 (Complete)
---
## 1. Overview
Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. User pastes a YouTube URL → backend extracts separate video-only and audio-only HLS streams via yt-dlp → backend proxies HLS manifests and .ts segments (zero re-encoding) → frontend plays video in `<video>` via hls.js, routes audio through hidden `<audio>` element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → transcript flows into QueryInput → Phase 1 RAG pipeline.
Phase 3 adds YouTube live stream (and VOD) playback as an alternative to file upload. User pastes a YouTube URL → backend extracts stream URLs via yt-dlp (separate video-only + audio-only for VODs; combined HLS for live) → backend proxies HLS manifests and .ts segments (zero re-encoding) → frontend plays video in `<video>` via hls.js, routes audio through hidden `<audio>` element → AudioContext.createMediaElementSource(audioElement) → existing ASR pipeline (WebSocket → DashScope) → transcript flows into QueryInput → Phase 1 RAG pipeline.
**Same code works identically for live streams and VODs.**
@ -20,14 +20,16 @@ YouTube's official iframe player does not expose the audio track to Web Audio AP
### Audio Routing
```
YouTube HLS audio-only stream
→ hls.js loads into hidden <audio> element
YouTube HLS stream (combined video+audio for live; separate tracks for VOD)
→ hls.js loads into <video> (muted) and hidden <audio> element
→ AudioContext.createMediaElementSource(audioElement)
→ ScriptProcessorNode (Float32 PCM)
→ WebSocket → FastAPI → DashScope realtime ASR
→ transcript → QueryInput
```
Note: For VODs, separate video-only and audio-only tracks are used. For live streams, YouTube provides combined formats only — the same HLS manifest URL is used for both elements; hls.js demuxes them independently.
### Integration With Existing Pipeline
This phase reuses the existing ASR infrastructure entirely:
@ -55,51 +57,65 @@ This phase reuses the existing ASR infrastructure entirely:
## 3. Sub-Phases
### Phase 3.1 — Configuration & Infrastructure Setup (0.5 day)
### Phase 3.1 — Configuration & Infrastructure Setup ✅ Complete
Add config fields, install dependencies, create skeletons, register router.
**Test:** `test_phase3_config.py`
**Test:** `test_phase3_config.py` (11 tests)
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.1.1 | Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl` | `core/config.py` |
| 3.1.2 | Update `.env.example` | `.env.example` |
| 3.1.3 | Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json` | `requirements.txt`, `package.json` |
| 3.1.4 | Create `models/youtube.py`: `YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat` | `models/youtube.py` |
| 3.1.5 | Create `services/youtube_service.py` stub | `services/youtube_service.py` |
| 3.1.6 | Create `services/hls_proxy.py` stub | `services/hls_proxy.py` |
| 3.1.7 | Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/{stream_type}/{path}` | `routers/youtube.py` |
| 3.1.8 | Register router in `main.py` | `main.py` |
| 3.1.9 | Write and pass `test_phase3_config.py` | `app/test/` |
| # | Task | File | Status |
|---|------|------|--------|
| 3.1.1 | Add config fields: `youtube_proxy_enabled`, `yt_dlp_timeout`, `yt_dlp_cache_ttl` | `core/config.py` | Done |
| 3.1.2 | Update `.env.example` | `.env.example` | Done |
| 3.1.3 | Add deps: `yt-dlp>=2024.0.0` to `requirements.txt`, `hls.js@^1.5.0` to `package.json` | `requirements.txt`, `package.json` | Done |
| 3.1.4 | Create `models/youtube.py`: `YouTubeExtractRequest`, `YouTubeStreamResponse`, `StreamFormat` | `models/youtube.py` | Done |
| 3.1.5 | Create `services/youtube_service.py` stub | `services/youtube_service.py` | Done |
| 3.1.6 | Create `services/hls_proxy.py` stub | `services/hls_proxy.py` | Done |
| 3.1.7 | Create `routers/youtube.py` stub: `POST /youtube/extract`, `GET /youtube/proxy/{stream_type}/{path}` | `routers/youtube.py` | Done |
| 3.1.8 | Register router in `main.py` | `main.py` | Done |
| 3.1.9 | Write and pass `test_phase3_config.py` | `app/test/` | Done (11/11 pass) |
---
### Phase 3.2 — YouTube URL Extraction Backend (0.5 day)
### Phase 3.2 — YouTube URL Extraction Backend ✅ Complete
yt-dlp wrapper service that extracts separate video-only and audio-only HLS URLs. Returns proxy-wrapped URLs pointing back to our HLS proxy.
yt-dlp wrapper service that extracts stream URLs and formats. Returns proxy-wrapped URLs pointing back to our HLS proxy.
**Test:** `test_phase3_youtube_extract.py`
**Test:** `test_phase3_youtube_extract.py` (18 tests)
**Acceptance Criteria:**
- `POST /api/v1/youtube/extract` accepts `{"url": "https://www.youtube.com/watch?v=..."}`
- Returns `{ video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url }`
- VODs: extracts ~210 formats, returns best video+audio pair
- Live streams: uses `ios` client for HLS, returns current live edge
- Upcoming/scheduled streams: returns `is_upcoming: true` with scheduled start time
- Invalid/private URLs: returns clear error
- URL expiration: caches extraction result with TTL (5 min for live, 30 min for VOD)
- Returns `{ video_id, title, is_live, video_proxy_url, audio_proxy_url, thumbnail_url, formats, error }`
- VODs: extracts separate video-only + audio-only tracks, selects best ≤480p + highest-bitrate audio
- Live streams: extracts combined HLS formats, uses same URL for video and audio (hls.js demuxes)
- Upcoming/scheduled streams: returns `is_upcoming: true` with no proxy URLs
- Invalid/private URLs: returns 200 with error field populated (yt-dlp exception caught)
- URL expiration: in-memory cache with TTL (5 min for live, 30 min for VOD)
- Service singleton: `@lru_cache` on `_get_youtube_service()` for cache persistence across requests
**Implementation Discoveries:**
- **No iOS client needed** — default yt-dlp works for both VOD (separate tracks) and live (combined HLS)
- **Live streams use combined formats** — all live formats include both video+audio; same HLS URL serves both `<video>` and `<audio>` elements
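- **Format selection** (`_pick_best_video`): among formats at or below 480p, prefers HLS, then the greatest height, then the highest bitrate; if nothing is ≤480p, falls back to the smallest height above 480p, again preferring HLS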
- **Error response pattern**: extraction errors return HTTP 200 with `error` field (not 4xx); the API call itself succeeds but YouTube returned an error
- **Proxy URL construction** (`_build_proxy_url`): URL-encodes upstream URL into `/api/v1/youtube/proxy/manifest.m3u8?url=<encoded>`
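The format-selection rule in the list above can be tried in isolation. The sketch below restates the described ordering as a standalone function; `pick_best_video` and the sample format dicts are illustrative stand-ins, not the service code or real yt-dlp output.

```python
def pick_best_video(candidates: list[dict]) -> dict:
    """Pick a video format: <=480p first, HLS preferred, then height, then bitrate."""

    def sort_key(f: dict) -> tuple[int, int, int, float]:
        height = f.get("height") or 9999          # unknown height sorts last
        tbr = f.get("tbr") or 0
        not_hls = 0 if f.get("protocol") in ("m3u8_native", "m3u8") else 1
        if height <= 480:
            # Within the <=480p group: HLS first, then tallest, then highest bitrate.
            return (0, not_hls, -height, -tbr)
        # Above 480p: HLS first, then the smallest height available.
        return (1, not_hls, height, -tbr)

    return min(candidates, key=sort_key)


sample = [
    {"format_id": "a", "height": 360, "protocol": "m3u8_native", "tbr": 500},
    {"format_id": "b", "height": 480, "protocol": "https", "tbr": 1200},
    {"format_id": "c", "height": 480, "protocol": "m3u8_native", "tbr": 1000},
    {"format_id": "d", "height": 720, "protocol": "m3u8_native", "tbr": 2500},
]
best = pick_best_video(sample)
```

One consequence of this ordering is that the HLS preference outranks height: given only formats `a` and `b` from the sample, the 360p HLS entry is chosen over the 480p non-HLS one.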
**Real-URL Verification:**
```
VOD: https://www.youtube.com/watch?v=5bF3tkO5jAA → 24 formats, separate video+audio ✓
Live: https://www.youtube.com/watch?v=fN9uYWCjQaw → 6 combined formats, same URL ✓
```
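Upstream URLs like the ones verified above are wrapped into proxy URLs before being returned to the client. A minimal sketch of that round trip using only the standard library: `build_proxy_url` follows the behaviour described for `_build_proxy_url`, while `recover_upstream_url` is a hypothetical helper showing how the proxy side (Phase 3.3, not yet implemented) might decode the parameter. The sample upstream URL is made up.

```python
from urllib.parse import parse_qs, quote, urlsplit

PROXY_MANIFEST_PATH = "/api/v1/youtube/proxy/manifest.m3u8"


def build_proxy_url(upstream_url: str) -> str:
    """Percent-encode the whole upstream URL so it fits in a single query parameter."""
    encoded = quote(upstream_url, safe="")  # safe="" also encodes '/', '&', '?', '%'
    return f"{PROXY_MANIFEST_PATH}?url={encoded}"


def recover_upstream_url(proxy_url: str) -> str:
    """Hypothetical inverse: read the 'url' parameter back out of a proxy URL."""
    query = urlsplit(proxy_url).query
    return parse_qs(query)["url"][0]  # parse_qs undoes the percent-encoding once


upstream = "https://example.com/hls/index.m3u8?expire=123&sig=a%2Bb"
proxy = build_proxy_url(upstream)
```

Encoding with `safe=""` matters because upstream URLs carry their own `&`-separated parameters; left unencoded, those would be parsed as parameters of the proxy request instead of staying part of `url`.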
**Tasks:**
| # | Task | File |
|---|------|------|
| 3.2.1 | Write tests first | `app/test/test_phase3_youtube_extract.py` |
| 3.2.2 | Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection | `services/youtube_service.py` |
| 3.2.3 | Implement `YouTubeService._select_best_formats()` — separate video/audio from format list, prefer ≤480p | `services/youtube_service.py` |
| 3.2.4 | Implement format URL caching with TTL | `services/youtube_service.py` |
| 3.2.5 | Implement `POST /api/v1/youtube/extract` route | `routers/youtube.py` |
| 3.2.6 | Run tests → pass → commit | — |
| # | Task | File | Status |
|---|------|------|--------|
| 3.2.1 | Write tests first | `app/test/test_phase3_youtube_extract.py` | Done |
| 3.2.2 | Implement `YouTubeService.extract_streams()` — yt-dlp wrapper with format selection | `services/youtube_service.py` | Done |
| 3.2.3 | Implement `YouTubeService._select_best_formats()` + `_pick_best_video()` — separate video/audio from format list, prefer ≤480p, combined fallback | `services/youtube_service.py` | Done |
| 3.2.4 | Implement format URL caching with TTL (live 5 min, VOD 30 min) | `services/youtube_service.py` | Done |
| 3.2.5 | Implement `POST /api/v1/youtube/extract` route with response model + error handling | `routers/youtube.py` | Done |
| 3.2.6 | Run tests → pass → verified with real URLs | — | Done (82/82 pass) |
---
@ -221,18 +237,18 @@ Wire YouTube audio output into existing ASR pipeline. The key challenge: `useVid
## 4. Timeline
| Sub-Phase | Description | Effort | Depends On |
|---|---|---|---|
| 3.1 | Config & Infrastructure | 0.5 day | — |
| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 |
| 3.3 | HLS Proxy Backend | 1 day | 3.1 |
| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 |
| 3.5 | YouTube → ASR Integration | 1 day | 3.4 |
| 3.6 | Integration & Acceptance | 1 day | 3.5 |
| 3.7 | Polish & Deployment | 0.5 day | 3.6 |
| **Total** | | **5.5 days** | |
| Sub-Phase | Description | Effort | Depends On | Status |
|---|---|---|---|---|
| 3.1 | Config & Infrastructure | 0.5 day | — | ✅ Complete |
| 3.2 | YouTube URL Extraction | 0.5 day | 3.1 | ✅ Complete |
| 3.3 | HLS Proxy Backend | 1 day | 3.1 | ⏳ Next |
| 3.4 | Frontend Input + Player | 1 day | 3.2, 3.3 | Pending |
| 3.5 | YouTube → ASR Integration | 1 day | 3.4 | Pending |
| 3.6 | Integration & Acceptance | 1 day | 3.5 | Pending |
| 3.7 | Polish & Deployment | 0.5 day | 3.6 | Pending |
| **Total** | | **5.5 days** | | **2/7 done** |
3.2 (extraction) and 3.3 (proxy) can run concurrently.
3.2 (extraction) and 3.3 (proxy) were planned to run concurrently; 3.2 is now complete ahead of 3.3.
---
@ -265,13 +281,16 @@ YT_DLP_CACHE_TTL=300
## 7. Key Design Decisions
| Decision | Choice | Why |
|---|---|---|
| Streaming protocol | HLS (m3u8) | hls.js plays it natively; DASH requires dash.js |
| yt-dlp client | `ios` for live, `web` for VOD | `ios` returns HLS for live streams with 60fps support; format selector prefers ≤480p |
| yt-dlp client | **Default** (no special client) | Default extractor works for both VOD (separate tracks) and live (combined HLS); iOS client caused "No video formats" errors on some live streams |
| Live format strategy | **Combined formats, same URL** | Live HLS formats include both video+audio; same URL for `<video>` and `<audio>` elements — hls.js demuxes each independently |
| HTTP client for proxy | httpx (already present) | Streaming support via `httpx.stream()`; no new dependency |
| Manifest rewriting | Line-by-line streaming | Live manifests can be large; never buffer whole file |
| Audio element | Hidden `<audio>` + hls.js | `createMediaElementSource` works on `<audio>` elements |
| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min |
| URL caching | In-memory dict with TTL | yt-dlp extraction is slow (~2-5s); reuse for 5 min live, 30 min VOD |
| Service lifetime | `@lru_cache` singleton | Cache must persist across HTTP requests for caching to work |
| Error response | **HTTP 200 with error field** | API call succeeded; YouTube error is a content-level failure, not a protocol failure |
| **Full Transcript for YouTube** | **Disabled** | Button hidden; real-time streaming ASR only |
| **QueryInput during streaming** | **Editable** | User can type corrections while transcript streams (same as existing ASR) |
| **Video quality** | **360p-480p auto-best** | Low resolution sufficient for reference; no quality selector |
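Because of the "HTTP 200 with error field" decision in the table above, a caller cannot rely on the status code alone and has to inspect the response body. A minimal sketch of that check follows; `classify_extract_response` and its return labels are hypothetical and not part of the codebase, while the field names are those of `YouTubeStreamResponse`.

```python
def classify_extract_response(status_code: int, body: dict) -> str:
    """Map an /youtube/extract response to a coarse outcome label (illustrative)."""
    if status_code != 200:
        # e.g. 503 when the proxy is disabled, 500 on an unexpected server error
        return "request_failed"
    if body.get("error"):
        # The API call succeeded, but extraction reported a content-level failure.
        return "extraction_error"
    if body.get("is_upcoming"):
        # Scheduled streams come back without proxy URLs.
        return "upcoming"
    if body.get("video_proxy_url") and body.get("audio_proxy_url"):
        return "playable"
    return "no_streams"
```

The order of the checks is the point: the `error` field is examined before any proxy URL, since an error response carries a 200 status and would otherwise look like a success with missing streams.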
@ -287,46 +306,46 @@ YT_DLP_CACHE_TTL=300
### New Files
```
backend/
app/models/youtube.py
app/services/youtube_service.py
app/services/hls_proxy.py
app/routers/youtube.py
app/test/test_phase3_config.py
app/test/test_phase3_youtube_extract.py
app/test/test_phase3_hls_proxy.py
app/test/test_phase3_hls_manifest.py
app/test/test_integration_phase3.py
app/test/acceptance/test_acceptance_phase3_youtube.py
app/test/acceptance/test_acceptance_phase3_live.py
app/models/youtube.py ✅ Created (3.1)
app/services/youtube_service.py ✅ Created (3.1), implemented (3.2)
app/services/hls_proxy.py ✅ Stub created (3.1)
app/routers/youtube.py ✅ Created (3.1), implemented (3.2)
app/test/test_phase3_config.py ✅ Written (3.1, 11 tests)
app/test/test_phase3_youtube_extract.py ✅ Written (3.2, 18 tests)
app/test/test_phase3_hls_proxy.py ⏳ Pending (3.3)
app/test/test_phase3_hls_manifest.py ⏳ Pending (3.3)
app/test/test_integration_phase3.py ⏳ Pending (3.6)
app/test/acceptance/test_acceptance_phase3_youtube.py ⏳ Pending (3.6)
app/test/acceptance/test_acceptance_phase3_live.py ⏳ Pending (3.6)
frontend/src/
components/YouTubeInput.tsx
components/YouTubeVideoPlayer.tsx
hooks/useYouTubeASR.ts
test/test_phase3_YouTubeInput.test.tsx
test/test_phase3_YouTubeVideoPlayer.test.tsx
test/test_phase3_useYouTubeASR.test.ts
test/test_phase3_LTTPage_integration.test.tsx
components/YouTubeInput.tsx ⏳ Pending (3.4)
components/YouTubeVideoPlayer.tsx ⏳ Pending (3.4)
hooks/useYouTubeASR.ts ⏳ Pending (3.5)
test/test_phase3_YouTubeInput.test.tsx ⏳ Pending (3.4)
test/test_phase3_YouTubeVideoPlayer.test.tsx ⏳ Pending (3.4)
test/test_phase3_useYouTubeASR.test.ts ⏳ Pending (3.5)
test/test_phase3_LTTPage_integration.test.tsx ⏳ Pending (3.5)
```
### Modified Files
```
backend/app/core/config.py # Add 3 config fields
backend/.env.example # Add 3 env vars
backend/main.py # Register youtube router
backend/requirements.txt # Add yt-dlp
backend/app/core/config.py ✅ Done (3 fields)
backend/.env.example ✅ Done (3 vars)
backend/main.py ✅ Done (router registered)
backend/requirements.txt ✅ Done (yt-dlp added)
frontend/package.json # Add hls.js
frontend/src/types/index.ts # Add YouTube types
frontend/src/lib/api.ts # Add extractYouTube(), getYouTubeProxyUrl()
frontend/src/lib/queries.tsx # Add useYouTubeExtract() mutation
frontend/src/pages/LTTPage.tsx # Add source toggle + YouTube components
frontend/src/components/QueryInput.tsx # Accept transcript from either source
frontend/package.json ✅ Done (hls.js added)
frontend/src/types/index.ts ⏳ Pending (3.4)
frontend/src/lib/api.ts ⏳ Pending (3.4)
frontend/src/lib/queries.tsx ⏳ Pending (3.4)
frontend/src/pages/LTTPage.tsx ⏳ Pending (3.4-3.5)
frontend/src/components/QueryInput.tsx ⏳ Pending (3.5)
Dockerfile # Add yt-dlp install step
docker-compose.yml # Add env vars if needed
README.md # YouTube feature section
development_plan.md # Mark Phase 3 status
Dockerfile ⏳ Pending (3.7)
docker-compose.yml ⏳ Pending (3.7)
README.md ⏳ Pending (3.7)
development_plan.md ⏳ Pending (3.7)
```
---
@ -337,7 +356,7 @@ development_plan.md # Mark Phase 3 status
|---|---|---|
| PO Token expiration (live streams cut at 30s) | High — live streams unusable without token | Auto-re-extract on HLS failure; document cookie-based workaround; acceptance test to quantify |
| yt-dlp extraction slow (2-5s) | Medium — poor UX on "Load Stream" click | Cache results with TTL; show progress indicator |
| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance |
| YouTube format changes break yt-dlp | Medium — sudden breakage | Pin yt-dlp version; CI test with known-good URLs; `pip install -U yt-dlp` in maintenance. **Note**: iOS client caused "No video formats" on Phoenix TV live stream; default extractor works for both tested URLs. Monitor for regressions. |
| hls.js audio sync drift vs video | Low — separate streams may drift | hls.js `liveSyncDuration` keeps both near live edge; test with 10+ min streams |
| Safari `createMediaElementSource` on HLS | Low — known Safari bug with native HLS | hls.js uses MSE, not native HLS — works around Safari bug; Chrome/Firefox unaffected |
| YouTube ToS for proxy | Low for internal demo | Personal/enterprise internal demo is generally fine; review for public product |
@ -348,17 +367,31 @@ development_plan.md # Mark Phase 3 status
```
POST /api/v1/youtube/extract
Body: {"url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ"}
Body: {"url": "https://www.youtube.com/watch?v=5bF3tkO5jAA"}
Response: {
"video_id": "dQw4w9WgXcQ",
"title": "Rick Astley - Never Gonna Give You Up",
"video_id": "5bF3tkO5jAA",
"title": "《2026年稅務(修訂)(自動交換資料)條例草案》委員會會議",
"is_live": false,
"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=video",
"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...&type=audio",
"thumbnail_url": "https://i.ytimg.com/vi/dQw4w9WgXcQ/hqdefault.jpg"
"is_upcoming": false,
"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=https%3A%2F%2Frr2---sn-jna...",
"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=https%3A%2F%2Frr2---sn-jna...",
"thumbnail_url": "https://i.ytimg.com/vi/5bF3tkO5jAA/hqdefault.jpg",
"formats": [...],
"error": null
}
GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>&type=video
# Live stream (combined formats → same URL for video and audio)
POST /api/v1/youtube/extract
Body: {"url": "https://www.youtube.com/watch?v=fN9uYWCjQaw"}
Response: {
"video_id": "fN9uYWCjQaw",
"is_live": true,
"video_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...",
"audio_proxy_url": "/api/v1/youtube/proxy/manifest.m3u8?url=...",
# video_proxy_url == audio_proxy_url (same combined HLS manifest)
}
GET /api/v1/youtube/proxy/manifest.m3u8?url=<encoded_upstream_m3u8>
→ Fetches upstream manifest from googlevideo.com
→ Rewrites segment URLs:
segment_0.ts → /api/v1/youtube/proxy/segment.ts?url=<encoded_segment_url>
@ -379,3 +412,20 @@ GET /api/v1/youtube/proxy/segment.ts?url=<encoded_upstream_ts>
- **hls.js API docs**: [github.com/video-dev/hls.js/blob/master/docs/API.md](https://github.com/video-dev/hls.js/blob/master/docs/API.md)
- **hls.js low-latency live**: `lowLatencyMode: true`, `liveSyncDuration: 1.5`
- **Existing code patterns**: `.plans/phase2_implementation_plan.md`, `backend/app/routers/video.py`, `frontend/src/hooks/useVideoASR.ts`
---
## 12. Test Results (Current)
| Suite | Tests | Status |
|-------|-------|--------|
| Phase 2 (existing) | 53 | ✅ All pass |
| Phase 3.1 (config) | 11 | ✅ All pass |
| Phase 3.2 (extraction) | 18 | ✅ All pass |
| **Total** | **82** | **0 failures** |
### Real-URL Smoke Tests
| URL | Type | Result |
|-----|------|--------|
| `5bF3tkO5jAA` (LegCo meeting) | VOD | 24 formats, separate video+audio ✅ |
| `fN9uYWCjQaw` (Phoenix TV 24h) | Live | 6 combined HLS formats, same URL ✅ |

View File

@ -36,3 +36,8 @@ ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime
# Video upload (Phase 2)
VIDEO_UPLOAD_DIR=./uploads
MAX_VIDEO_SIZE_MB=300
# YouTube Proxy (Phase 3)
YOUTUBE_PROXY_ENABLED=true
YT_DLP_TIMEOUT=30
YT_DLP_CACHE_TTL=300

View File

@ -54,6 +54,11 @@ class Settings(BaseSettings):
    max_video_size_mb: int = 300
    supported_video_formats: list[str] = [".mp4", ".webm", ".mov", ".avi", ".mkv"]

    # YouTube Proxy (Phase 3)
    youtube_proxy_enabled: bool = True
    yt_dlp_timeout: int = 30
    yt_dlp_cache_ttl: int = 300  # seconds; used as-is for live streams (5 min), multiplied by 6 in the service for VODs (30 min)

    # Development helpers
    model_config = {"env_file": ".env", "env_file_encoding": "utf-8"}

View File

@ -7,7 +7,7 @@ from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import FileResponse
from app.routers import ingest, query, documents, prompts, history, chunks, video, ws_asr
from app.routers import ingest, query, documents, prompts, history, chunks, video, ws_asr, youtube
from app.core.config import get_settings
from app.core.sqlite_db import (
get_prompts_db,
@ -58,6 +58,7 @@ app.include_router(history.router)
app.include_router(chunks.router)
app.include_router(video.router, prefix="/api/v1")
app.include_router(ws_asr.router)
app.include_router(youtube.router, prefix="/api/v1")
_prompts_conn = get_prompts_db()
init_prompts_db(_prompts_conn)

View File

@ -0,0 +1,28 @@
"""YouTube stream extraction models (Phase 3)."""

from pydantic import BaseModel


class YouTubeExtractRequest(BaseModel):
    url: str


class StreamFormat(BaseModel):
    format_id: str
    url: str
    resolution: str | None = None
    is_audio_only: bool = False
    is_video_only: bool = False
    codec: str | None = None


class YouTubeStreamResponse(BaseModel):
    video_id: str
    title: str
    is_live: bool = False
    is_upcoming: bool = False
    video_proxy_url: str | None = None
    audio_proxy_url: str | None = None
    thumbnail_url: str | None = None
    formats: list[StreamFormat] = []
    error: str | None = None

View File

@ -0,0 +1,83 @@
import logging
import time
from functools import lru_cache

from fastapi import APIRouter, HTTPException

from app.models.youtube import YouTubeExtractRequest, YouTubeStreamResponse, StreamFormat

logger = logging.getLogger(__name__)

router = APIRouter(tags=["youtube"])


@lru_cache
def _get_youtube_service():
    from app.core.config import get_settings
    from app.services.youtube_service import YouTubeService

    s = get_settings()
    return YouTubeService(timeout=s.yt_dlp_timeout, cache_ttl=s.yt_dlp_cache_ttl)


@router.post("/youtube/extract", response_model=YouTubeStreamResponse)
async def extract_youtube_stream(req: YouTubeExtractRequest):
    from app.core.config import get_settings

    settings = get_settings()
    if not settings.youtube_proxy_enabled:
        raise HTTPException(status_code=503, detail="YouTube proxy is disabled")

    service = _get_youtube_service()
    started = time.monotonic()
    logger.info("youtube-extract-started url=%s", req.url)

    try:
        data = await service.extract_streams(req.url)
    except Exception as e:
        logger.error("youtube-extract-failed url=%s error=%s", req.url, e)
        raise HTTPException(status_code=500, detail=str(e))

    if data.get("error"):
        logger.warning(
            "youtube-extract-error url=%s error=%s duration=%.1fs",
            req.url,
            data["error"],
            time.monotonic() - started,
        )
        return YouTubeStreamResponse(
            video_id=data.get("video_id", ""),
            title=data.get("title", ""),
            error=data["error"],
        )

    formats = [
        StreamFormat(
            format_id=f.get("format_id", ""),
            url=f.get("url", ""),
            resolution=f.get("resolution"),
            is_audio_only=f.get("acodec", "none") != "none" and f.get("vcodec", "none") == "none",
            is_video_only=f.get("vcodec", "none") != "none" and f.get("acodec", "none") == "none",
            codec=f.get("vcodec") or f.get("acodec"),
        )
        for f in data.get("formats", [])
    ]

    logger.info(
        "youtube-extract-completed url=%s video_id=%s is_live=%s fmt_count=%d duration=%.1fs",
        req.url,
        data["video_id"],
        data["is_live"],
        len(formats),
        time.monotonic() - started,
    )

    return YouTubeStreamResponse(
        video_id=data["video_id"],
        title=data["title"],
        is_live=data["is_live"],
        is_upcoming=data["is_upcoming"],
        video_proxy_url=data.get("video_proxy_url"),
        audio_proxy_url=data.get("audio_proxy_url"),
        thumbnail_url=data.get("thumbnail_url"),
        formats=formats,
    )

View File

@ -0,0 +1,21 @@
"""HLS manifest proxy service (Phase 3.3).
Rewrites HLS manifests and proxies .ts segments so the browser treats
them as same-origin, enabling Web Audio API access to the audio track.
"""

import logging

logger = logging.getLogger(__name__)


class HLSProxyService:
    """Streams and rewrites HLS manifests; proxies .ts segments with zero re-encoding."""

    async def rewrite_manifest(self, upstream_url: str) -> bytes:
        """Fetch upstream HLS manifest and rewrite segment URLs to point to our proxy."""
        raise NotImplementedError("Phase 3.3 — manifest rewriting to be implemented")

    async def proxy_segment(self, upstream_url: str) -> bytes:
        """Proxy a single .ts segment from the upstream server."""
        raise NotImplementedError("Phase 3.3 — segment proxying to be implemented")

View File

@ -0,0 +1,128 @@
import asyncio
import logging
import time
from typing import Any
from urllib.parse import quote

import yt_dlp

logger = logging.getLogger(__name__)


class YouTubeService:
    def __init__(self, timeout: int, cache_ttl: int):
        # NOTE: timeout is stored but not yet passed to yt-dlp (see _get_ydl_opts).
        self.timeout = timeout
        self.cache_ttl = cache_ttl
        # url -> (time.monotonic() when extraction started, extraction result)
        self._cache: dict[str, tuple[float, dict]] = {}

    async def extract_streams(self, url: str) -> dict:
        now = time.monotonic()
        if url in self._cache:
            cached_at, cached_data = self._cache[url]
            is_live = cached_data.get("is_live", False)
            # Live entries use cache_ttl as-is; VOD entries live 6x longer.
            ttl = self.cache_ttl if is_live else self.cache_ttl * 6
            if now - cached_at < ttl:
                logger.debug("Cache hit for URL=%s age=%.1fs", url, now - cached_at)
                return cached_data
            logger.debug("Cache expired for URL=%s", url)

        try:
            # yt-dlp is blocking, so run it in the default thread pool.
            loop = asyncio.get_running_loop()
            info = await loop.run_in_executor(None, lambda: self._extract_sync(url))
        except yt_dlp.utils.DownloadError as e:
            # Extraction failures are returned as data (not raised) and are not cached.
            logger.warning("yt-dlp extraction failed for URL=%s: %s", url, e)
            return {"error": str(e)[:500], "video_id": "", "title": "", "formats": []}

        live_status = info.get("live_status", "not_live")
        is_live = live_status == "is_live"
        is_upcoming = live_status == "is_upcoming"

        result = {
            "video_id": info.get("id", ""),
            "title": info.get("title", ""),
            "is_live": is_live,
            "is_upcoming": is_upcoming,
            "thumbnail_url": info.get("thumbnail"),
            "formats": info.get("formats", []),
            "error": None,
        }

        if not is_upcoming and info.get("formats"):
            try:
                video_fmt, audio_fmt = self._select_best_formats(info["formats"])
                result["video_proxy_url"] = self._build_proxy_url(video_fmt["url"])
                result["audio_proxy_url"] = self._build_proxy_url(audio_fmt["url"])
            except ValueError as e:
                result["error"] = str(e)

        # The TTL is evaluated on read, so only the timestamp is stored here.
        self._cache[url] = (now, result)
        return result

    def _extract_sync(self, url: str) -> dict:
        opts = self._get_ydl_opts(url)
        with yt_dlp.YoutubeDL(opts) as ydl:
            return ydl.extract_info(url, download=False)

    def _get_ydl_opts(self, url: str) -> dict:
        opts: dict[str, Any] = {
            "quiet": True,
            "no_warnings": True,
            "extract_flat": False,
        }
        return opts

    def _select_best_formats(self, formats: list[dict]) -> tuple[dict, dict]:
        video_only = [
            f
            for f in formats
            if f.get("vcodec", "none") != "none" and f.get("acodec", "none") == "none"
        ]
        audio_only = [
            f
            for f in formats
            if f.get("acodec", "none") != "none" and f.get("vcodec", "none") == "none"
        ]
        combined = [
            f
            for f in formats
            if f.get("vcodec", "none") != "none"
            and f.get("acodec", "none") != "none"
        ]

        has_content = bool(combined or video_only or audio_only)
        if not has_content:
            raise ValueError("No streamable formats found")

        # VOD case: separate tracks available.
        if video_only and audio_only:
            video_fmt = self._pick_best_video(video_only)
            audio_fmt = max(audio_only, key=lambda f: f.get("abr") or 0)
            return video_fmt, audio_fmt

        # Combined video plus a separate audio track: take the lowest-height combined format.
        if combined and audio_only:
            combined_sorted = sorted(combined, key=lambda f: f.get("height") or 9999)
            return combined_sorted[0], audio_only[0]

        # Live case: only combined formats, so the same format serves video and audio.
        if combined:
            best_combined = self._pick_best_video(combined)
            return best_combined, best_combined

        if video_only:
            raise ValueError("No streamable audio format found")
        raise ValueError("No streamable video format found")

    def _pick_best_video(self, candidates: list[dict]) -> dict:
        def _sort_key(f: dict) -> tuple[int, int, int, int]:
            height = f.get("height") or 9999
            tbr = f.get("tbr") or 0
            is_m3u8 = 0 if f.get("protocol") in ("m3u8_native", "m3u8") else 1
            at_or_under_480 = 0 if height <= 480 else 1
            if at_or_under_480 == 0:
                # <=480p: HLS first, then tallest, then highest bitrate.
                return (0, is_m3u8, -height, -tbr)
            # >480p: HLS first, then smallest height, then highest bitrate.
            return (1, is_m3u8, height, -tbr)

        return sorted(candidates, key=_sort_key)[0]

    def _build_proxy_url(self, upstream_url: str) -> str:
        encoded = quote(upstream_url, safe="")
        return f"/api/v1/youtube/proxy/manifest.m3u8?url={encoded}"

View File

@ -0,0 +1,177 @@
"""Phase 3.1 tests: Configuration and infrastructure setup for YouTube proxy.
Covers:
- Config fields: youtube_proxy_enabled, yt_dlp_timeout, yt_dlp_cache_ttl defaults and env loading
- Model schemas: YouTubeExtractRequest, YouTubeStreamResponse, StreamFormat
- Service stubs: YouTubeService, HLSProxyService instantiation
- Router registration: youtube.router mounted, endpoint responds 200 with mock
"""

from unittest.mock import MagicMock, patch

import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient


class TestYouTubeProxyConfig:
    """Config fields for YouTube proxy exist with correct defaults."""

    @pytest.fixture(autouse=True)
    def clear_cache(self):
        from app.core.config import get_settings

        get_settings.cache_clear()
        yield
        get_settings.cache_clear()

    def test_defaults(self):
        from app.core.config import get_settings

        s = get_settings()
        assert s.youtube_proxy_enabled is True
        assert s.yt_dlp_timeout == 30
        assert s.yt_dlp_cache_ttl == 300

    def test_env_override(self, monkeypatch):
        monkeypatch.setenv("YOUTUBE_PROXY_ENABLED", "false")
        monkeypatch.setenv("YT_DLP_TIMEOUT", "60")
        monkeypatch.setenv("YT_DLP_CACHE_TTL", "600")

        from app.core.config import get_settings

        s = get_settings()
        assert s.youtube_proxy_enabled is False
        assert s.yt_dlp_timeout == 60
        assert s.yt_dlp_cache_ttl == 600

    def test_bool_parsing(self, monkeypatch):
        """Bool fields accept 'true'/'false', '1'/'0' (pydantic-settings)."""
        monkeypatch.setenv("YOUTUBE_PROXY_ENABLED", "0")

        from app.core.config import get_settings

        s = get_settings()
        assert s.youtube_proxy_enabled is False


class TestYouTubeModels:
    """Pydantic models for YouTube stream extraction."""

    def test_extract_request(self):
        from app.models.youtube import YouTubeExtractRequest

        req = YouTubeExtractRequest(url="https://www.youtube.com/watch?v=abc123")
        assert req.url == "https://www.youtube.com/watch?v=abc123"

    def test_stream_response_defaults(self):
        from app.models.youtube import YouTubeStreamResponse

        resp = YouTubeStreamResponse(video_id="abc123", title="Test Video")
        assert resp.video_id == "abc123"
        assert resp.title == "Test Video"
        assert resp.is_live is False
        assert resp.is_upcoming is False
        assert resp.video_proxy_url is None
        assert resp.audio_proxy_url is None
        assert resp.formats == []
        assert resp.error is None

    def test_stream_format(self):
        from app.models.youtube import StreamFormat

        fmt = StreamFormat(
            format_id="140",
            url="https://example.com/audio.m3u8",
            is_audio_only=True,
            codec="mp4a.40.2",
        )
        assert fmt.format_id == "140"
        assert fmt.is_audio_only is True
        assert fmt.is_video_only is False
        assert fmt.resolution is None


class TestYouTubeServices:
    """Service stubs can be imported and instantiated."""

    def test_youtube_service_instantiate(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        assert svc.timeout == 30
        assert svc.cache_ttl == 300

    def test_youtube_service_extract_is_async(self):
        from app.services.youtube_service import YouTubeService

        svc = YouTubeService(timeout=30, cache_ttl=300)
        import inspect

        assert inspect.iscoroutinefunction(svc.extract_streams)

    def test_hls_proxy_instantiate(self):
        from app.services.hls_proxy import HLSProxyService

        svc = HLSProxyService()
        assert svc is not None


class TestYouTubeRouter:
    """YouTube router is mounted and stub endpoint responds correctly."""

    @pytest.fixture
    def youtube_client(self):
        from app.routers.youtube import router
        from app.core.config import get_settings

        get_settings.cache_clear()

        app = FastAPI()
        app.include_router(router, prefix="/api/v1")
        return TestClient(app)

    def test_extract_responds_with_mocked_ytdlp(self, youtube_client):
        from app.routers.youtube import _get_youtube_service

        _get_youtube_service.cache_clear()

        vod_info = {
            "id": "test123",
            "title": "Test",
            "thumbnail": "https://example.com/thumb.jpg",
            "live_status": "not_live",
            "formats": [
                {
                    "format_id": "135", "height": 480,
                    "vcodec": "avc1", "acodec": "none",
                    "ext": "mp4", "protocol": "https",
                    "url": "https://example.com/video.mp4", "tbr": 1200,
                },
                {
                    "format_id": "140",
                    "vcodec": "none", "acodec": "mp4a",
                    "ext": "m4a", "protocol": "https",
                    "url": "https://example.com/audio.m4a", "abr": 128,
                },
            ],
        }

        mock_ydl = MagicMock()
        mock_instance = MagicMock()
        mock_instance.extract_info.return_value = vod_info
        mock_ydl.__enter__.return_value = mock_instance

        with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
            resp = youtube_client.post(
                "/api/v1/youtube/extract",
                json={"url": "https://www.youtube.com/watch?v=test123"},
            )

        assert resp.status_code == 200
        data = resp.json()
        assert data["video_id"] == "test123"
        assert data["video_proxy_url"] is not None
        assert data["audio_proxy_url"] is not None

    def test_router_tag(self):
        from app.routers.youtube import router

        assert any(tag == "youtube" for tag in router.tags)


@ -0,0 +1,446 @@
"""Phase 3.2 tests: YouTube URL extraction via yt-dlp.
Covers:
- POST /api/v1/youtube/extract: VOD, live, upcoming, private-video error
- Format selection: video-only at or under 480p, highest-bitrate audio, HLS preference, combined-format fallback
- URL caching: in-memory with TTL; expiry triggers re-extract
- Proxy URL construction: upstream URL percent-encoded in the `url` query param
- Error handling: DownloadError returns 200 with an `error` field; disabled proxy returns 503
All external yt-dlp calls are mocked.
"""
from unittest.mock import MagicMock, patch
import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient
# ---------------------------------------------------------------------------
# Helpers — fake yt-dlp format data
# ---------------------------------------------------------------------------
def _make_format(
format_id: str,
height: int | None = None,
vcodec: str = "none",
acodec: str = "none",
ext: str = "mp4",
protocol: str = "https",
url: str = "",
abr: float | None = None,
tbr: float | None = None,
resolution: str | None = None,
) -> dict:
return {
"format_id": format_id,
"height": height,
"width": height * 16 // 9 if height else None,
"vcodec": vcodec,
"acodec": acodec,
"ext": ext,
"protocol": protocol,
"url": url or f"https://example.com/{format_id}.{ext}",
"abr": abr,
"tbr": tbr,
"resolution": resolution or (f"{height * 16 // 9}x{height}" if height else None),
}
def _vod_info(video_id: str = "abc123") -> dict:
return {
"id": video_id,
"title": "Test VOD Video",
"thumbnail": "https://i.ytimg.com/vi/abc123/hqdefault.jpg",
"live_status": "not_live",
"duration": 300,
"formats": [
_make_format("137", height=1080, vcodec="avc1.640028", acodec="none", tbr=5000),
_make_format("136", height=720, vcodec="avc1.640028", acodec="none", tbr=2500),
_make_format("135", height=480, vcodec="avc1.640028", acodec="none", tbr=1200),
_make_format("134", height=360, vcodec="avc1.640028", acodec="none", tbr=600),
_make_format("133", height=240, vcodec="avc1.640028", acodec="none", tbr=300),
_make_format("140", acodec="mp4a.40.2", vcodec="none", abr=128),
_make_format("251", acodec="opus", vcodec="none", abr=160),
_make_format("18", height=360, vcodec="avc1.42001E", acodec="mp4a.40.2", tbr=500),
],
}
def _vod_info_hls(video_id: str = "abc123") -> dict:
return {
"id": video_id,
"title": "Test VOD with HLS",
"thumbnail": "https://i.ytimg.com/vi/abc123/hqdefault.jpg",
"live_status": "not_live",
"duration": 600,
"formats": [
_make_format("136", height=720, vcodec="avc1.640028", acodec="none", ext="m3u8", protocol="m3u8_native", tbr=2500),
_make_format("135", height=480, vcodec="avc1.640028", acodec="none", ext="m3u8", protocol="m3u8_native", tbr=1200),
_make_format("140", acodec="mp4a.40.2", vcodec="none", ext="m3u8", protocol="m3u8_native", abr=128),
],
}
def _live_info(video_id: str = "live999") -> dict:
return {
"id": video_id,
"title": "Live Stream Test",
"thumbnail": "https://i.ytimg.com/vi/live999/hqdefault_live.jpg",
"live_status": "is_live",
"duration": None,
"formats": [
_make_format("91", height=144, vcodec="avc1.42C00B", acodec="mp4a.40.5", ext="mp4", protocol="m3u8_native"),
_make_format("92", height=240, vcodec="avc1.4D4015", acodec="mp4a.40.5", ext="mp4", protocol="m3u8_native"),
_make_format("93", height=360, vcodec="avc1.4D401E", acodec="mp4a.40.2", ext="mp4", protocol="m3u8_native"),
_make_format("94", height=480, vcodec="avc1.4D401F", acodec="mp4a.40.2", ext="mp4", protocol="m3u8_native", tbr=1200),
_make_format("95", height=720, vcodec="avc1.4D401F", acodec="mp4a.40.2", ext="mp4", protocol="m3u8_native"),
],
}
def _upcoming_info(video_id: str = "up999") -> dict:
return {
"id": video_id,
"title": "Upcoming Stream",
"thumbnail": "https://i.ytimg.com/vi/up999/hqdefault.jpg",
"live_status": "is_upcoming",
"duration": None,
"formats": [],
}
def _private_info(video_id: str = "priv99") -> None:
# Always raises: mimics the DownloadError yt-dlp reports for a private video.
import yt_dlp
raise yt_dlp.utils.DownloadError("Private video. Sign in if you've been granted access to this video")
# ---------------------------------------------------------------------------
# Mock helpers
# ---------------------------------------------------------------------------
def _make_mock_ydl(return_value: dict | Exception) -> MagicMock:
"""Build a mock yt_dlp.YoutubeDL context manager with .extract_info."""
mock_instance = MagicMock()
if isinstance(return_value, Exception):
mock_instance.extract_info.side_effect = return_value
else:
mock_instance.extract_info.return_value = return_value
mock_ydl = MagicMock()
mock_ydl.__enter__.return_value = mock_instance
mock_ydl.__exit__.return_value = None
return mock_ydl
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture
def youtube_client(monkeypatch):
"""FastAPI TestClient with youtube router mounted, cached settings cleared."""
from app.routers.youtube import router
from app.core.config import get_settings
get_settings.cache_clear()
monkeypatch.setenv("YOUTUBE_PROXY_ENABLED", "true")
get_settings.cache_clear()
app = FastAPI()
app.include_router(router, prefix="/api/v1")
return TestClient(app)
# ---------------------------------------------------------------------------
# Unit: Format selection
# ---------------------------------------------------------------------------
class TestFormatSelection:
def test_selects_best_video_at_or_under_480p(self):
from app.services.youtube_service import YouTubeService
svc = YouTubeService(timeout=30, cache_ttl=300)
formats = _vod_info()["formats"]
video, audio = svc._select_best_formats(formats)
assert video is not None
assert audio is not None
assert video["height"] == 480
assert video["vcodec"] != "none"
assert video["acodec"] == "none"
assert audio["acodec"] != "none"
assert audio["vcodec"] == "none"
def test_falls_back_to_lowest_video_if_no_480p(self):
from app.services.youtube_service import YouTubeService
svc = YouTubeService(timeout=30, cache_ttl=300)
formats = [
_make_format("137", height=1080, vcodec="avc1", acodec="none", tbr=5000),
_make_format("136", height=720, vcodec="avc1", acodec="none", tbr=2500),
_make_format("140", acodec="mp4a", vcodec="none", abr=128),
]
video, audio = svc._select_best_formats(formats)
assert video is not None
assert video["height"] == 720 # Lowest available (no format at or under 480p exists)
def test_selects_highest_bitrate_audio(self):
from app.services.youtube_service import YouTubeService
svc = YouTubeService(timeout=30, cache_ttl=300)
formats = [
_make_format("137", height=480, vcodec="avc1", acodec="none", tbr=1200),
_make_format("140", acodec="mp4a", vcodec="none", abr=128),
_make_format("251", acodec="opus", vcodec="none", abr=160),
_make_format("250", acodec="opus", vcodec="none", abr=64),
]
video, audio = svc._select_best_formats(formats)
assert audio is not None
assert audio["format_id"] == "251" # Highest abr
def test_no_formats_raises(self):
from app.services.youtube_service import YouTubeService
svc = YouTubeService(timeout=30, cache_ttl=300)
with pytest.raises(ValueError, match="No streamable formats"):
svc._select_best_formats([])
def test_no_video_only_formats_falls_back_to_combined(self):
from app.services.youtube_service import YouTubeService
svc = YouTubeService(timeout=30, cache_ttl=300)
formats = [
_make_format("18", height=360, vcodec="avc1", acodec="mp4a", tbr=500),
_make_format("140", acodec="mp4a", vcodec="none", abr=128),
]
video, audio = svc._select_best_formats(formats)
# Fallback: combined format as video
assert video is not None
assert video["format_id"] == "18"
assert audio is not None
def test_hls_preference_for_live(self):
from app.services.youtube_service import YouTubeService
svc = YouTubeService(timeout=30, cache_ttl=300)
formats = [
_make_format("135", height=480, vcodec="avc1", acodec="none", ext="mp4", protocol="https", tbr=1200),
_make_format("301", height=480, vcodec="avc1", acodec="none", ext="m3u8", protocol="m3u8_native", tbr=1200),
_make_format("140", acodec="mp4a", vcodec="none", ext="m3u8", protocol="m3u8_native", abr=128),
]
video, audio = svc._select_best_formats(formats)
assert video["protocol"] == "m3u8_native"
assert audio["protocol"] == "m3u8_native"
def test_combined_only_all_combined_formats(self):
from app.services.youtube_service import YouTubeService
svc = YouTubeService(timeout=30, cache_ttl=300)
formats = [
_make_format("93", height=360, vcodec="avc1", acodec="mp4a", ext="mp4", protocol="m3u8_native"),
_make_format("94", height=480, vcodec="avc1", acodec="mp4a", ext="mp4", protocol="m3u8_native"),
_make_format("95", height=720, vcodec="avc1", acodec="mp4a", ext="mp4", protocol="m3u8_native"),
_make_format("96", height=1080, vcodec="avc1", acodec="mp4a", ext="mp4", protocol="m3u8_native"),
]
video, audio = svc._select_best_formats(formats)
assert video["height"] == 480
assert audio["height"] == 480
assert video["url"] == audio["url"]
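
The selection rules these tests pin down can be sketched roughly as follows. This is a minimal illustration, not the code in `app/services/youtube_service.py`: the function name `select_best_formats`, the `MAX_HEIGHT` constant, the private helpers, and the tie-breaking order (bitrate before HLS for audio) are all assumptions. Only the asserted outcomes come from the tests above.

```python
from typing import Optional

MAX_HEIGHT = 480  # assumed cap, taken from the "≤480p" rule in the commit message


def _has_video(fmt: dict) -> bool:
    return (fmt.get("vcodec") or "none") != "none"


def _has_audio(fmt: dict) -> bool:
    return (fmt.get("acodec") or "none") != "none"


def _is_hls(fmt: dict) -> bool:
    return str(fmt.get("protocol", "")).startswith("m3u8")


def _pick_video(candidates: list[dict]) -> Optional[dict]:
    if not candidates:
        return None
    capped = [f for f in candidates if (f.get("height") or 0) <= MAX_HEIGHT]
    if capped:
        # Tallest format under the cap; HLS wins a height tie.
        return max(capped, key=lambda f: (f.get("height") or 0, _is_hls(f)))
    # Nothing under the cap: take the lowest resolution on offer.
    return min(candidates, key=lambda f: (f.get("height") or 0, not _is_hls(f)))


def select_best_formats(formats: list[dict]) -> tuple[Optional[dict], Optional[dict]]:
    """Hypothetical stand-in for YouTubeService._select_best_formats."""
    video_only = [f for f in formats if _has_video(f) and not _has_audio(f)]
    audio_only = [f for f in formats if _has_audio(f) and not _has_video(f)]
    combined = [f for f in formats if _has_video(f) and _has_audio(f)]
    if not (video_only or audio_only or combined):
        raise ValueError("No streamable formats")

    # Prefer separate video-only tracks; fall back to combined formats.
    video = _pick_video(video_only or combined)
    if audio_only:
        # Highest audio bitrate; HLS only breaks a bitrate tie (assumed order).
        audio = max(audio_only, key=lambda f: (f.get("abr") or 0, _is_hls(f)))
    elif video is not None and _has_audio(video):
        # Combined-only case (typical for live): one stream serves both roles.
        audio = video
    else:
        audio = _pick_video(combined)
    return video, audio
```

In the combined-only case the same dict is returned for both roles, which is why the live tests can assert that the video and audio proxy URLs are equal.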
# ---------------------------------------------------------------------------
# Integration: Route + mocked yt-dlp
# ---------------------------------------------------------------------------
class TestYouTubeExtractVOD:
def test_extract_vod_returns_proxy_urls(self, youtube_client):
mock_ydl = _make_mock_ydl(_vod_info("abc123"))
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
resp = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=abc123"},
)
assert resp.status_code == 200
data = resp.json()
assert data["video_id"] == "abc123"
assert data["title"] == "Test VOD Video"
assert data["is_live"] is False
assert data["is_upcoming"] is False
assert data["video_proxy_url"] is not None
assert data["audio_proxy_url"] is not None
assert data["video_proxy_url"].startswith("/api/v1/youtube/proxy/")
assert data["thumbnail_url"] == "https://i.ytimg.com/vi/abc123/hqdefault.jpg"
assert len(data["formats"]) > 0
def test_extract_vod_hls_returns_manifest_proxy_urls(self, youtube_client):
mock_ydl = _make_mock_ydl(_vod_info_hls("abc123"))
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
resp = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=abc123"},
)
assert resp.status_code == 200
data = resp.json()
assert "manifest.m3u8?url=" in data["video_proxy_url"]
assert "manifest.m3u8?url=" in data["audio_proxy_url"]
def test_error_field_is_none_on_success(self, youtube_client):
mock_ydl = _make_mock_ydl(_vod_info())
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
resp = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=abc123"},
)
assert resp.status_code == 200
assert resp.json()["error"] is None
class TestYouTubeExtractLive:
def test_extract_live_returns_is_live_true(self, youtube_client):
mock_ydl = _make_mock_ydl(_live_info())
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
resp = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=live999"},
)
assert resp.status_code == 200
data = resp.json()
assert data["video_id"] == "live999"
assert data["is_live"] is True
assert data["is_upcoming"] is False
assert data["video_proxy_url"] is not None
assert data["audio_proxy_url"] is not None
def test_live_combined_format_same_url_for_both(self, youtube_client):
mock_ydl = _make_mock_ydl(_live_info("combined_test"))
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
resp = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=combined_test"},
)
assert resp.status_code == 200
data = resp.json()
assert data["is_live"] is True
assert data["video_proxy_url"] == data["audio_proxy_url"]
class TestYouTubeExtractUpcoming:
def test_extract_upcoming_returns_is_upcoming_true(self, youtube_client):
mock_ydl = _make_mock_ydl(_upcoming_info())
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
resp = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=up999"},
)
assert resp.status_code == 200
data = resp.json()
assert data["video_id"] == "up999"
assert data["is_upcoming"] is True
assert data["is_live"] is False
assert data["video_proxy_url"] is None
assert data["audio_proxy_url"] is None
class TestYouTubeExtractErrors:
def test_private_video_returns_error_field(self, youtube_client):
import yt_dlp
exc = yt_dlp.utils.DownloadError("Private video")
mock_ydl = _make_mock_ydl(exc)
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
resp = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=priv99"},
)
assert resp.status_code == 200
data = resp.json()
assert data["error"] is not None
assert "Private video" in data["error"]
def test_disabled_proxy_returns_503(self, monkeypatch, youtube_client):
monkeypatch.setenv("YOUTUBE_PROXY_ENABLED", "false")
from app.core.config import get_settings
get_settings.cache_clear()
resp = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=abc123"},
)
assert resp.status_code == 503
class TestURLCaching:
def test_cached_result_not_re_extracted(self, youtube_client):
mock_ydl = _make_mock_ydl(_vod_info("cached1"))
instance = mock_ydl.__enter__.return_value
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
r1 = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=cached1"},
)
r2 = youtube_client.post(
"/api/v1/youtube/extract",
json={"url": "https://www.youtube.com/watch?v=cached1"},
)
assert r1.status_code == 200
assert r2.status_code == 200
assert r1.json()["video_id"] == r2.json()["video_id"]
assert instance.extract_info.call_count == 1 # Cached, not called twice
def test_cache_expiry_triggers_re_extract(self):
from app.services.youtube_service import YouTubeService
svc = YouTubeService(timeout=30, cache_ttl=0) # 0 TTL = immediate expiry
mock_ydl = _make_mock_ydl(_vod_info("exp1"))
instance = mock_ydl.__enter__.return_value
with patch("app.services.youtube_service.yt_dlp.YoutubeDL", return_value=mock_ydl):
import asyncio
asyncio.run(svc.extract_streams("https://www.youtube.com/watch?v=exp1"))
# Cache should be set but TTL=0 means expired
asyncio.run(svc.extract_streams("https://www.youtube.com/watch?v=exp1"))
assert instance.extract_info.call_count == 2
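
The caching behaviour exercised above (a hit within the TTL, a forced re-extract when the TTL is 0) could be modelled with a small in-memory store like the one below. It is a sketch under stated assumptions: the class name `TTLCache`, its `get`/`set` methods, the per-entry `ttl` override, and the injectable `clock` are hypothetical and are not taken from `youtube_service.py`. The per-entry override is only there to illustrate the commit's "5min TTL live, 30min VOD" split.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after a TTL (seconds)."""

    def __init__(self, default_ttl: float, clock: Callable[[], float] = time.monotonic):
        self.default_ttl = default_ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        # Store the absolute expiry time so each entry can carry its own TTL.
        lifetime = self.default_ttl if ttl is None else ttl
        self._entries[key] = (self._clock() + lifetime, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        # Strict "<" means a TTL of 0 is already expired on the next read.
        if self._clock() < expires_at:
            return value
        del self._entries[key]
        return None


# Usage with a fake clock, so expiry can be shown without sleeping.
now = [1000.0]
cache = TTLCache(default_ttl=300, clock=lambda: now[0])
cache.set("live999", {"is_live": True})            # default TTL: 5 minutes
cache.set("abc123", {"is_live": False}, ttl=1800)  # VOD entry: 30 minutes
now[0] += 301
live_hit = cache.get("live999")  # expired
vod_hit = cache.get("abc123")    # still fresh
```

Using a monotonic clock rather than wall-clock time keeps expiry immune to system clock adjustments, and injecting the clock keeps tests deterministic.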
class TestProxyURLConstruction:
def test_proxy_url_encodes_upstream_url(self):
from app.services.youtube_service import YouTubeService
from urllib.parse import unquote
svc = YouTubeService(timeout=30, cache_ttl=300)
upstream = "https://manifest.googlevideo.com/123/hls_playlist.m3u8?id=abc&key=def"
proxy = svc._build_proxy_url(upstream)
assert proxy.startswith("/api/v1/youtube/proxy/manifest.m3u8?url=")
# Extract and decode the URL parameter
encoded = proxy.split("url=", 1)[1]
decoded = unquote(encoded)
assert decoded == upstream
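
For reference, one way to satisfy the round-trip this test checks is to percent-encode the whole upstream URL into a single query value. The sketch below is an assumption about how `_build_proxy_url` might work, not the committed code: the function name `build_proxy_url` and the constant are hypothetical, and the manifest path is copied from the test's `startswith` assertion. Non-HLS streams may well use a different proxy path, since the VOD test only asserts the `/api/v1/youtube/proxy/` prefix.

```python
from urllib.parse import quote, unquote

# Path taken from the assertion in the test above; the constant name is hypothetical.
PROXY_MANIFEST_PATH = "/api/v1/youtube/proxy/manifest.m3u8"


def build_proxy_url(upstream: str) -> str:
    """Wrap an upstream manifest URL in a same-origin proxy URL."""
    # safe="" also escapes "/", so the upstream URL travels as one opaque value
    # and its own "?", "&" and "=" cannot be mistaken for proxy parameters.
    return f"{PROXY_MANIFEST_PATH}?url={quote(upstream, safe='')}"


upstream = "https://manifest.googlevideo.com/123/hls_playlist.m3u8?id=abc&key=def"
proxy = build_proxy_url(upstream)
encoded = proxy.split("url=", 1)[1]
round_tripped = unquote(encoded)
```

Without the encoding, the upstream `&key=def` would be parsed as a second parameter of the proxy request and the manifest URL would be truncated.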


@ -19,3 +19,4 @@ langchain-openai>=1.1.11,<1.2.0
dashscope>=0.4.0
aiofiles>=24.0.0
zhconv>=1.4.0
yt-dlp>=2024.0.0


@ -13,6 +13,7 @@
"@tanstack/react-query": "^5.0.0",
"autoprefixer": "^10.5.0",
"axios": "^1.6.0",
"hls.js": "^1.5.0",
"lucide-react": "^0.190.0",
"pdfjs-dist": "^5.6.205",
"react": "^18.2.0",