Compare commits
10 Commits
a54d688867
...
c8bcfa0487
| Author | SHA1 | Date |
|---|---|---|
| | c8bcfa0487 | |
| | f44b68812d | |
| | cd125d8535 | |
| | 552b4964bf | |
| | 5da74ec24c | |
| | 6928fff8ff | |
| | 733824c177 | |
| | 183fcf7772 | |
| | 39525a2344 | |
| | 67d2bddeb6 | |
@ -0,0 +1,477 @@
|
|||
# Phase 5: OpenRouter ASR Provider
|
||||
|
||||
**Date:** 2026-05-18
|
||||
**Status:** ✅ Implemented (2026-05-19, updated 2026-05-19)
|
||||
**Source:** User request — add OpenRouter STT as alternative ASR provider for both batch and realtime
|
||||
**Model:** `google/chirp-3` (changed from `google/gemini-3.1-flash-lite` — gemini-3.1-flash-lite is not an STT model; OpenRouter `/audio/transcriptions` supports 8 specific models)
|
||||
**Research:** OpenRouter STT docs + librarian agent (real-world code patterns + model compatibility verification) + explore agent (codebase architecture map)
|
||||
**Test Results:** 49/49 core ASR tests pass (Phase 2 + Phase 5); 6/7 WS tests pass (1 pre-existing timeout)
|
||||
|
||||
---
|
||||
|
||||
## 1. Objective
|
||||
|
||||
Add OpenRouter as a second ASR provider for **batch transcription** (`transcribe_full`) and for **realtime** transcription. Because OpenRouter has no WebSocket STT endpoint, the realtime path uses chunked REST (~3s chunks, see Section 6) rather than true streaming.
|
||||
|
||||
Users select the provider via a single env var. The existing REST endpoint `POST /api/v1/video/{video_id}/transcribe` and the WebSocket endpoint `/ws/asr/{video_id}` are unchanged from the frontend's perspective.
|
||||
|
||||
---
|
||||
|
||||
## 2. Scope
|
||||
|
||||
| In Scope | Out of Scope |
|----------|-------------|
| OpenRouter batch transcription (`transcribe_full`) | Frontend provider selector UI |
| OpenRouter realtime WebSocket streaming (chunked REST, ~3s chunks) | True realtime streaming (no WebSocket STT endpoint exists) |
| `ASR_PROVIDER` env var switching (batch + realtime) | Changing existing DashScope code behavior |
| Provider abstraction (protocol class) | Retraining/changing models |
| Tests for new provider | Docker image rebuild |
| `.env.example` update | |
|
||||
|
||||
---
|
||||
|
||||
## 3. Architecture
|
||||
|
||||
### 3.1 Current Flow (DashScope-only)
|
||||
|
||||
```
POST /api/v1/video/{video_id}/transcribe
  → video.py router
    → VideoService.extract_audio() → WAV bytes
    → ASRClient(settings).transcribe_full(audio_bytes, language)
      → OpenAI SDK → DashScope Chat Completions API (audio input)
    → return text
```
|
||||
|
||||
### 3.2 New Flow (Provider-based)
|
||||
|
||||
```
POST /api/v1/video/{video_id}/transcribe
  → video.py router
    → VideoService.extract_audio() → WAV bytes
    → ASRClient(settings).transcribe_full(audio_bytes, language)
      ├── ASR_PROVIDER=dashscope → DashScopeASRProvider (existing logic)
      └── ASR_PROVIDER=openrouter → OpenRouterASRProvider (new)
    → return text
```
|
||||
|
||||
### 3.3 Provider Interface (Factory + Strategy Pattern)
|
||||
|
||||
Based on real-world multi-provider ASR patterns (DocsGPT, LiveKit, openai-agents-python), use **Factory + Strategy**:
|
||||
|
||||
```python
from abc import ABC, abstractmethod


class ASRProvider(ABC):
    """Abstract base for all ASR providers."""

    @abstractmethod
    async def transcribe(self, audio_bytes: bytes, language: str) -> str:
        """Transcribe audio bytes to traditional Chinese text.

        Raises ASRError on any failure (network, HTTP, empty response).
        """
        ...


class ASRProviderFactory:
    """Selects ASR provider based on settings."""

    _providers: dict[str, type[ASRProvider]] = {}

    @classmethod
    def register(cls, name: str, provider_cls: type[ASRProvider]) -> None:
        cls._providers[name] = provider_cls

    @classmethod
    def create(cls, name: str, settings) -> ASRProvider:
        provider_cls = cls._providers.get(name)
        if not provider_cls:
            raise ValueError(f"Unknown ASR provider: {name}")
        return provider_cls(settings)
```
|
||||
|
||||
**Why async?** The video router endpoint is already `async def`. The existing `transcribe_full` is sync (blocking), which blocks the event loop during 30-60s API calls. New providers should be async. Existing DashScope can be wrapped in `loop.run_in_executor()` temporarily.
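
The executor wrapping mentioned above can be sketched as follows. This is a minimal illustration, not the project's code: `blocking_transcribe` is a hypothetical stand-in for the existing synchronous DashScope call, and the heartbeat task exists only to show that the event loop keeps running while the blocking call is in a worker thread.

```python
import asyncio
import time


def blocking_transcribe(audio_bytes: bytes, language: str) -> str:
    """Hypothetical stand-in for the existing synchronous DashScope call."""
    time.sleep(0.05)  # simulates a slow network round trip
    return f"transcribed {len(audio_bytes)} bytes ({language})"


async def transcribe_async(audio_bytes: bytes, language: str) -> str:
    """Run the blocking call in the default thread pool so the loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_transcribe, audio_bytes, language)


async def demo() -> tuple:
    """Transcribe while a heartbeat task counts event-loop ticks."""
    ticks = 0

    async def heartbeat() -> None:
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    beat = asyncio.create_task(heartbeat())
    text = await transcribe_async(b"\x00" * 320, "yue")
    beat.cancel()
    return text, ticks
```

If the heartbeat count is greater than zero after `asyncio.run(demo())`, other coroutines were able to run during the blocking call, which is the property the router needs.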
|
||||
|
||||
### 3.4 Existing Provider Pattern (LLMClient)
|
||||
|
||||
The codebase already has a provider-switching pattern in `llm_client.py` — **single-class conditional branching**, not ABC/interface:
|
||||
|
||||
```python
# llm_client.py pattern:
if settings.vllm_engine:
    extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
else:
    extra_body = {"reasoning": {"enabled": False}}
```
|
||||
|
||||
For ASR, the same pattern would mean `ASRClient` checks `settings.asr_provider` to select the right SDK/URL. However, since DashScope and OpenRouter use fundamentally different APIs (DashScope = Chat Completions + audio input; OpenRouter = dedicated STT endpoint), the **Factory+Strategy** pattern (Section 3.3) is cleaner for ASR — each provider gets its own class implementing a common interface.
|
||||
|
||||
### 3.5 OpenRouter SDK vs Raw httpx
|
||||
|
||||
| Trade-off | Raw httpx | OpenRouter SDK (`pip install openrouter`) |
|-----------|-----------|------------------------------------------|
| Type safety | Manual | Pydantic models |
| Retry logic | Must implement (`tenacity`) | Built-in `retries=RetryConfig(...)` |
| Production readiness | Battle-tested | Beta (auto-generated from OpenAPI) |
| Dependencies | `httpx` (already installed) | SDK + Pydantic + extra deps |
|
||||
|
||||
**Decision**: Use **raw httpx + tenacity** for Phase 5. This matches the approach used by the production Python projects surveyed (lethe, openclaw) and avoids depending on a beta SDK. The official SDK can be adopted later if it stabilizes.
|
||||
|
||||
### 3.6 Retry & Error Handling
|
||||
|
||||
Based on production OpenRouter STT implementations (lethe, openrouter-proxy):
|
||||
|
||||
```python
import httpx
from tenacity import (
    retry, retry_if_exception, stop_after_attempt, wait_random_exponential,
)

RETRIABLE_STATUS = {429, 500, 502, 503, 504}


def _is_retriable(exc: BaseException) -> bool:
    """Retry transport failures and only the HTTP statuses listed above."""
    if isinstance(exc, httpx.HTTPStatusError):
        return exc.response.status_code in RETRIABLE_STATUS
    return isinstance(exc, httpx.TransportError)


@retry(
    reraise=True,
    stop=stop_after_attempt(4),
    wait=wait_random_exponential(multiplier=0.2, max=3.0),
    # A plain type check would also retry non-429 4xx errors, which must fail fast.
    retry=retry_if_exception(_is_retriable),
)
async def _call_stt_api(self, audio_b64: str, language: str) -> dict:
    """Call OpenRouter STT with retry and exponential backoff."""
    ...
```
|
||||
|
||||
Error categories to handle:
|
||||
| Error | Response | Retry? |
|-------|----------|--------|
| `httpx.HTTPStatusError` (429) | Rate limited | Yes (backoff) |
| `httpx.HTTPStatusError` (5xx) | Server error | Yes (backoff) |
| `httpx.HTTPStatusError` (4xx, non-429) | Client error | No |
| `httpx.ConnectError` | Connection failed | Yes |
| `httpx.TimeoutException` | Timeout (>120s) | Yes |
| Empty `result["text"]` | No transcription | No |
|
||||
|
||||
**Note:** `tenacity` is NOT currently in `requirements.txt`. Add it as a new dependency.
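
The retry rules in the error table can also be expressed without any third-party dependency. The sketch below is a stdlib-only illustration, not the project's tenacity-based code: `HTTPError`, `should_retry`, and `call_with_retry` are hypothetical names, and the attempt count and backoff constants simply mirror the values used in this section.

```python
import asyncio
import random

RETRIABLE_STATUS = {429, 500, 502, 503, 504}


class HTTPError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def should_retry(exc: Exception) -> bool:
    """Retry network-level failures and the retriable statuses only."""
    if isinstance(exc, HTTPError):
        return exc.status in RETRIABLE_STATUS
    return isinstance(exc, (ConnectionError, TimeoutError))


async def call_with_retry(func, attempts: int = 4, base: float = 0.2, cap: float = 3.0):
    """Await func() up to `attempts` times with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return await func()
        except Exception as exc:
            # Give up on the last attempt or on errors that retrying cannot fix.
            if attempt == attempts - 1 or not should_retry(exc):
                raise
            # Full jitter: sleep a random time up to min(cap, base * 2^attempt).
            await asyncio.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Keeping the classification in one predicate makes the "Retry?" column directly testable, independent of whichever retry library performs the waiting.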
|
||||
|
||||
### 3.7 API Differences
|
||||
|
||||
| | DashScope | OpenRouter |
|---|---|---|
| Endpoint | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1` | `https://openrouter.ai/api/v1/audio/transcriptions` |
| Method | Chat Completions (`POST /chat/completions`) | Dedicated STT (`POST /audio/transcriptions`) |
| Audio format | `data:audio/wav;base64,...` (data URL) | `{"data": "<base64>", "format": "wav"}` (raw base64) |
| Auth | `DASHSCOPE_API_KEY` | `OPENROUTER_API_KEY` (separate key for accounting flexibility) |
| Response | `choices[0].message.content` | `{"text": "...", "usage": {...}}` (no segments/timestamps/speaker labels) |
| SDK | `openai.OpenAI` | `httpx.AsyncClient` (no official SDK needed) |
|
||||
|
||||
---
|
||||
|
||||
## 4. Configuration
|
||||
|
||||
### 4.1 New Env Vars
|
||||
|
||||
| Variable | Default | Description |
|----------|---------|-------------|
| `ASR_PROVIDER` | `dashscope` | ASR provider: `dashscope` or `openrouter` |
| `OPENROUTER_API_KEY` | `""` | OpenRouter API key (for STT; separate from LLM_API_KEY for accounting) |
| `ASR_OPENROUTER_MODEL` | `google/chirp-3` | OpenRouter STT model name |
|
||||
|
||||
### 4.2 Settings Changes
|
||||
|
||||
Add to `Settings` class in `config.py`:
|
||||
|
||||
```python
# ASR provider (Phase 5)
asr_provider: str = "dashscope"  # "dashscope" or "openrouter"
openrouter_api_key: str = ""  # separate from llm_api_key for accounting
asr_openrouter_model: str = "google/chirp-3"
```
|
||||
|
||||
**Note:** OpenRouter STT uses:
|
||||
- `openrouter_api_key` — dedicated key (user preference for separate accounting)
|
||||
- `llm_base_url` — `https://openrouter.ai/api/v1` (base, STT endpoint appended: `/audio/transcriptions`)
|
||||
|
||||
### 4.3 Validation
|
||||
|
||||
Add a startup validation in `config.py` or `asr_client.py`:
|
||||
|
||||
```python
VALID_ASR_PROVIDERS = {"dashscope", "openrouter"}
if settings.asr_provider not in VALID_ASR_PROVIDERS:
    raise ValueError(f"Invalid ASR_PROVIDER: {settings.asr_provider}. Must be one of {VALID_ASR_PROVIDERS}")
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Implementation Tasks
|
||||
|
||||
### Task 5.1: Add config vars and validation
|
||||
|
||||
**File:** `backend/app/core/config.py`
|
||||
- Add `asr_provider: str = "dashscope"`
- Add `openrouter_api_key: str = ""`
- Add `asr_openrouter_model: str = "google/chirp-3"`
- Add `model_config` validation or runtime check in `get_settings()`
|
||||
|
||||
**Test file:** `backend/app/test/test_phase5_config.py`
|
||||
|
||||
### Task 5.2: Create OpenRouter ASR provider
|
||||
|
||||
**File:** `backend/app/services/asr_providers.py` (new)
|
||||
|
||||
```python
class OpenRouterASRProvider:
    def __init__(self, api_key: str, base_url: str, model: str):
        self.api_key = api_key
        # STT endpoint: base_url + /audio/transcriptions
        self.stt_url = f"{base_url.rstrip('/')}/audio/transcriptions"
        self.model = model
        self._client: httpx.AsyncClient | None = None

    async def transcribe(self, audio_bytes: bytes, language: str) -> str:
        """Transcribe using OpenRouter STT endpoint."""
        ...
```
|
||||
|
||||
**OpenRouter STT Request:**
|
||||
```python
import base64
import httpx

audio_b64 = base64.b64encode(audio_bytes).decode("ascii")

payload = {
    "model": self.model,
    "input_audio": {
        "data": audio_b64,  # raw base64, NOT data URL
        "format": "wav",
    },
}
if language and language != "auto":
    payload["language"] = language

response = await client.post(
    self.stt_url,
    headers={
        "Authorization": f"Bearer {self.api_key}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=120.0,  # 60s upstream timeout + buffer
)
response.raise_for_status()
result = response.json()
return _to_traditional(result["text"])
```
|
||||
|
||||
**Key design notes:**
|
||||
- Uses `httpx.AsyncClient` (already in `requirements.txt`)
|
||||
- Base64 format: raw bytes, NOT `data:audio/wav;base64,...` (DashScope uses data URL; OpenRouter wants raw base64)
|
||||
- Timeout: 120s (OpenRouter docs say 60s upstream timeout; add buffer)
|
||||
- Error handling: raise custom `ASRError` on HTTP errors, network errors, or empty response text
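
To make the base64 distinction in the notes concrete, here is a minimal sketch of the two encodings side by side. The helper names `to_data_url` and `build_openrouter_payload` are hypothetical; only the payload shape is taken from the request shown in this task.

```python
import base64


def to_data_url(audio_bytes: bytes) -> str:
    """DashScope-style input: base64 wrapped in a data URL."""
    b64 = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:audio/wav;base64,{b64}"


def build_openrouter_payload(audio_bytes: bytes, model: str, language: str = "auto") -> dict:
    """OpenRouter-style input: raw base64 plus an explicit format field."""
    payload = {
        "model": model,
        "input_audio": {
            "data": base64.b64encode(audio_bytes).decode("ascii"),
            "format": "wav",
        },
    }
    # Only forward an explicit language; "auto" means let the API detect it.
    if language and language != "auto":
        payload["language"] = language
    return payload
```

Both helpers encode the same bytes; the only difference is whether the `data:audio/wav;base64,` prefix is attached, which is the detail most likely to be mixed up when porting code between providers.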
|
||||
|
||||
**Test file:** `backend/app/test/test_phase5_openrouter_provider.py`
|
||||
|
||||
### Task 5.3: Refactor ASRClient to use provider abstraction
|
||||
|
||||
**File:** `backend/app/services/asr_client.py`
|
||||
|
||||
Changes:
|
||||
1. Define `ASRProvider` protocol (or ABC)
|
||||
2. Extract existing DashScope logic into `DashScopeASRProvider` (sync wrapper for now)
|
||||
3. `ASRClient.__init__` selects provider based on `settings.asr_provider`
|
||||
4. `ASRClient.transcribe_full` delegates to provider
|
||||
5. Make `transcribe_full` async (minor refactor to `video.py` router)
|
||||
|
||||
**Backward compatibility:** Default `asr_provider=dashscope` means zero behavior change for existing deployments.
|
||||
|
||||
**Test file:** `backend/app/test/test_phase2_asr_client.py` — update existing tests to work with new provider structure; add tests for provider switching.
|
||||
|
||||
### Task 5.4: Update video router for async transcription
|
||||
|
||||
**File:** `backend/app/routers/video.py`
|
||||
|
||||
Minimal change — the `asr.transcribe_full()` call becomes `await asr.transcribe_full()`:
|
||||
|
||||
```python
# Before (line 113):
text = asr.transcribe_full(audio_bytes, language=language)

# After:
text = await asr.transcribe_full(audio_bytes, language=language)
```
|
||||
|
||||
No other changes needed. The endpoint signature is already `async def`.
|
||||
|
||||
### Task 5.5: Update .env.example and config documentation
|
||||
|
||||
**File:** `backend/.env.example`
|
||||
- Add `ASR_PROVIDER`, `OPENROUTER_API_KEY`, and `ASR_OPENROUTER_MODEL` with usage comments
|
||||
|
||||
**File:** `AGENTS.md` or development plan
|
||||
- Note the new Phase 5 capability
|
||||
|
||||
### Task 5.6: Integration test (mock OpenRouter HTTP)
|
||||
|
||||
**File:** `backend/app/test/test_phase5_integration.py`
|
||||
- Test full flow: video upload → transcribe with `ASR_PROVIDER=openrouter` → verify text
|
||||
- Mock `httpx.AsyncClient.post` to return valid OpenRouter STT response
|
||||
|
||||
### Task 5.7: Acceptance test (real OpenRouter)
|
||||
|
||||
**File:** `backend/app/test/acceptance/test_acceptance_phase5_openrouter.py`
|
||||
- Real OpenRouter API call with a short test audio file
|
||||
- Verify transcription quality
|
||||
- Marked `@pytest.mark.acceptance` and `@pytest.mark.slow`
|
||||
|
||||
---
|
||||
|
||||
## 6. Realtime ASR (Chunked REST — Implemented)
|
||||
|
||||
OpenRouter has no WebSocket STT endpoint. For realtime streaming, we implemented **chunked REST**: send accumulated audio chunks to OpenRouter REST endpoint every ~3 seconds.
|
||||
|
||||
### 6.1 Implementation (`_ws_proxy_openrouter`)
|
||||
|
||||
**File:** `backend/app/routers/ws_asr.py`
|
||||
|
||||
```python
async def _ws_proxy_openrouter(client_ws: WebSocket, language: str = "yue"):
    """WebSocket proxy for OpenRouter ASR: chunked REST approach.

    Accumulates PCM audio from DashScope VPR server, flushes chunks ~every 3s
    to OpenRouter REST API via pcm_to_wav() conversion.
    """
```
|
||||
|
||||
**Key design:**
|
||||
- `pcm_to_wav(pcm_bytes, sample_rate=16000)` — converts raw PCM to WAV header + bytes
|
||||
- `flush_lock` (asyncio.Lock) — prevents concurrent API calls during chunk flush
|
||||
- ~3s chunk interval → calls OpenRouter `/audio/transcriptions` REST endpoint
|
||||
- PCM accumulation: receives PCM frames from DashScope VPR server, appends to buffer
|
||||
- On flush: converts accumulated PCM → WAV, sends to OpenRouter, emits `delta`/`full_text` events to client via WebSocket
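
As a cross-check on the PCM to WAV step, the standard-library `wave` module can build the same kind of container. This is an illustrative alternative, not the project's `pcm_to_wav()`: `pcm_to_wav_stdlib` and `wav_duration_seconds` are hypothetical names, and 16 kHz mono 16-bit audio is assumed.

```python
import io
import wave


def pcm_to_wav_stdlib(pcm_bytes: bytes, sample_rate: int = 16000,
                      channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw little-endian PCM in a WAV container using the stdlib."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the header back and compute the clip length in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Reading the header back with `wave` is a cheap way to confirm in a unit test that a hand-packed header describes the intended sample rate and length.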
|
||||
|
||||
### 6.2 Provider Dispatch in ws_asr
|
||||
|
||||
The WebSocket endpoint dispatches based on `ASR_PROVIDER`:
|
||||
|
||||
```python
# ws_asr.py endpoint dispatch:
if settings.asr_provider == "openrouter":
    await _ws_proxy_openrouter(websocket, language)
else:
    await _ws_proxy_dashscope(websocket, loop, language)
```
|
||||
|
||||
### 6.3 Language Code Handling
|
||||
|
||||
OpenRouter STT expects ISO 639-1 language codes. `yue` (ISO 639-3) is not supported — the chunked handler omits the language parameter when `language` is `"yue"` or `"auto"`, relying on auto-detection:
|
||||
|
||||
```python
if language and language not in ("auto", "yue"):
    payload["language"] = language
```
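
The same rule can be factored into a small helper so that the batch and realtime paths share it. A sketch, assuming the set of codes to drop is exactly the two named in this section; `language_param` and `OMIT_LANGUAGE_CODES` are hypothetical names.

```python
from typing import Optional

# Codes that should not be forwarded, so the API auto-detects the language.
OMIT_LANGUAGE_CODES = frozenset({"auto", "yue"})


def language_param(language: Optional[str]) -> dict:
    """Return the extra payload fields for a requested language."""
    if language and language not in OMIT_LANGUAGE_CODES:
        return {"language": language}
    return {}


payload = {"model": "google/chirp-3"}
payload.update(language_param("yue"))  # adds nothing; auto-detection is used
```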
|
||||
|
||||
### 6.4 Limitations
|
||||
|
||||
- **Latency**: ~3-5s delay per chunk (accumulation + API roundtrip). Not true realtime.
|
||||
- **No incremental results**: Each chunk produces a full transcription, not word-by-word streaming.
|
||||
- **DashScope VPR dependency**: The WebSocket still connects to DashScope's VPR server for audio capture; only the transcription API is swapped to OpenRouter.
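
For a sense of scale, the size of each chunk can be worked out directly. The sketch assumes 16 kHz mono 16-bit PCM and a 44-byte WAV header, matching the defaults of `pcm_to_wav()`; the function name `chunk_sizes` is hypothetical.

```python
import math


def chunk_sizes(seconds: float = 3.0, sample_rate: int = 16000,
                channels: int = 1, bytes_per_sample: int = 2) -> dict:
    """Estimate PCM, WAV, and base64 sizes for one chunk."""
    pcm = int(seconds * sample_rate * channels * bytes_per_sample)
    wav = pcm + 44  # canonical PCM WAV header
    b64 = 4 * math.ceil(wav / 3)  # base64 emits 4 chars per 3 input bytes
    return {"pcm_bytes": pcm, "wav_bytes": wav, "base64_chars": b64}


sizes = chunk_sizes()
```

With the defaults this comes to 96,000 bytes of PCM and about 128,000 base64 characters per request, so the per-chunk upload is small compared with the API round trip that dominates the latency.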
|
||||
|
||||
---
|
||||
|
||||
## 7. Test Plan
|
||||
|
||||
| Test File | What It Covers | Mock Strategy |
|-----------|---------------|---------------|
| `test_phase5_config.py` | Config validation, invalid provider rejection | No mocks (pure config) |
| `test_phase5_openrouter_provider.py` | OpenRouterASRProvider unit tests | Mock `httpx.AsyncClient` |
| `test_phase2_asr_client.py` (updated) | ASRClient with both providers | Mock DashScope + OpenRouter |
| `test_phase5_integration.py` | Full video→transcribe with OpenRouter | Mock `httpx` (TestClient) |
| `test_acceptance_phase5_openrouter.py` | Real OpenRouter API | None (real API) |
|
||||
|
||||
**Test-first rule:** Write tests BEFORE implementation (per AGENTS.md convention). Each implementation task references its test file.
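
A unit test in the style the table describes might look like the following. This is a sketch only: `FakeProvider` is a hypothetical stand-in for `OpenRouterASRProvider` that takes an injected client, so the example runs without `httpx` installed, and it raises `ValueError` where the real provider would raise `ASRError`.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


class FakeProvider:
    """Hypothetical provider that posts audio through an injected async client."""

    def __init__(self, client, url: str) -> None:
        self._client = client
        self._url = url

    async def transcribe(self, audio_b64: str) -> str:
        response = await self._client.post(
            self._url, json={"input_audio": {"data": audio_b64, "format": "wav"}}
        )
        response.raise_for_status()
        text = response.json().get("text", "")
        if not text.strip():
            raise ValueError("empty transcription")
        return text


def make_client(body: dict) -> AsyncMock:
    """Build a mock client whose post() resolves to a canned JSON response."""
    response = MagicMock()
    response.json.return_value = body
    client = AsyncMock()
    client.post.return_value = response
    return client


def test_transcribe_returns_text() -> None:
    client = make_client({"text": "hello"})
    provider = FakeProvider(client, "https://example.invalid/stt")
    assert asyncio.run(provider.transcribe("UklGRg==")) == "hello"
    client.post.assert_awaited_once()
```

Injecting the client keeps the test independent of the network and lets the same fixture cover the empty-response case by changing only the canned body.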
|
||||
|
||||
---
|
||||
|
||||
## 8. Acceptance Criteria
|
||||
|
||||
- [x] `ASR_PROVIDER=openrouter` in `.env` → batch transcription uses OpenRouter STT
|
||||
- [x] `ASR_PROVIDER=dashscope` (default) → same behavior as before (backward compat)
|
||||
- [x] Invalid `ASR_PROVIDER` value → clear error at startup
|
||||
- [x] Realtime WebSocket ASR dispatches to OpenRouter chunked REST when `ASR_PROVIDER=openrouter`
|
||||
- [x] Realtime WebSocket ASR stays DashScope when `ASR_PROVIDER=dashscope` (backward compat)
|
||||
- [x] OpenRouter transcription returns traditional Chinese (same `_to_traditional` conversion)
|
||||
- [x] Error handling: network errors, HTTP errors, empty responses → clear error messages
|
||||
- [x] All existing tests pass unchanged (with `ASR_PROVIDER=dashscope`)
|
||||
- [x] New tests pass
|
||||
- [ ] Acceptance test returns valid transcription from real OpenRouter (pending)
|
||||
|
||||
---
|
||||
|
||||
## 9. Dependencies & Risks
|
||||
|
||||
| Risk | Mitigation |
|------|-----------|
| OpenRouter STT latency > DashScope | Acceptable tradeoff; OpenRouter is cheaper |
| OpenRouter STT not as accurate for Cantonese | Language auto-detection used (yue omitted); needs acceptance testing |
| `transcribe_full` sync→async refactor could break callers | Only one caller (`video.py`); minimal blast radius |
| No streaming/WebSocket for OpenRouter | Chunked REST (~3s) implemented; documented latency tradeoff |
| OpenRouter 60s timeout for long videos | Document limitation; large files may need chunking (future) |
| Wrong model selected (e.g., non-STT model) | Librarian research confirmed 8 supported models; `google/chirp-3` verified compatible |
| Cantonese language code unsupported by OpenRouter STT | `yue` omitted; relies on auto-detection |
|
||||
|
||||
---
|
||||
|
||||
## 10. Estimated Effort
|
||||
|
||||
| Task | Est. Time |
|------|-----------|
| 5.1 Config | 15 min |
| 5.2 OpenRouter provider | 30 min |
| 5.3 Refactor ASRClient | 20 min |
| 5.4 Update video router | 5 min |
| 5.5 Update .env.example | 5 min |
| 5.6 Integration test | 20 min |
| 5.7 Acceptance test | 15 min |
| **Total** | **~2 hours** |
|
||||
|
||||
---
|
||||
|
||||
## 11. Implementation Notes (2026-05-19)
|
||||
|
||||
### Decisions During Implementation
|
||||
|
||||
- **`_to_traditional` moved to `asr_providers.py`** — original plan placed it in `asr_client.py` with a cross-import, but this caused a circular import (`asr_client` → `asr_providers` → `asr_client`). Moved to `asr_providers.py`; `asr_client.py` re-exports for backward compatibility with `ws_asr.py`.
|
||||
- **Separate `OPENROUTER_API_KEY`** — per user preference for independent accounting.
|
||||
- **`DashScopeASRProvider` wraps sync OpenAI call in `loop.run_in_executor()`** — avoids blocking the event loop without rewriting the existing DashScope client.
|
||||
- **Model: `google/chirp-3`** — original plan specified `google/gemini-3.1-flash-lite`, but this model is NOT in OpenRouter's supported STT model list (8 models: whisper variants, chirp-3, voxtral, qwen3-asr-flash). Changed after librarian agent verified model compatibility.
|
||||
- **Realtime OpenRouter: chunked REST (~3s)** — originally out of scope ("Realtime WebSocket stays DashScope-only"). User requested OpenRouter for realtime as well. Implemented via `_ws_proxy_openrouter()`: accumulates PCM from DashScope VPR server, converts to WAV via `pcm_to_wav()`, flushes to OpenRouter REST every ~3s. Uses `flush_lock` (asyncio.Lock) to prevent concurrent API calls.
|
||||
- **Language code filtering** — OpenRouter STT doesn't support ISO 639-3 codes like `yue`. The chunked handler omits the `language` parameter when `language` is `"yue"` or `"auto"`, relying on auto-detection.
|
||||
- **ffmpeg binary** — replaced x86-64 binary with aarch64 static build (johnvansickle.com) for Apple Silicon Mac compatibility.
|
||||
- **Diagnostic logging** — added provider selection, transcription start/complete, and error response body logging to both batch and realtime paths.
|
||||
|
||||
### Files Changed
|
||||
|
||||
| File | Action | Details |
|------|--------|---------|
| `backend/app/core/config.py` | Modified | 3 new settings + validation in `get_settings()`; default model: `google/chirp-3` |
| `backend/app/services/asr_providers.py` | **New** | `ASRProvider` ABC, `DashScopeASRProvider`, `OpenRouterASRProvider` (with tenacity retry), `create_asr_provider()` factory, `_to_traditional()` |
| `backend/app/services/asr_client.py` | Refactored | Thin wrapper; `transcribe_full` now async; re-exports `_to_traditional` for backward compat |
| `backend/app/routers/video.py` | Modified | `await transcribe_full()`; provider-aware API key validation |
| `backend/app/routers/ws_asr.py` | Modified | `pcm_to_wav()`, `_ws_proxy_openrouter()` (3s chunked REST), endpoint dispatch on `ASR_PROVIDER` |
| `backend/.env.example` | Modified | Phase 5 vars with usage comments; default: `google/chirp-3` |
| `backend/requirements.txt` | Modified | Added `tenacity>=8.0.0` |
|
||||
|
||||
### Test Files
|
||||
|
||||
| File | Tests | Status |
|------|-------|--------|
| `test_phase5_config.py` | 6 | ✅ |
| `test_phase5_openrouter_provider.py` | 14 | ✅ |
| `test_phase5_integration.py` | 4 | ✅ |
| `test_phase2_asr_client.py` | 19 (3 updated) | ✅ |
| `test_phase2_full_transcript.py` | 6 (updated fixtures) | ✅ |
| `test_integration_phase2.py` | 7 (updated fixtures) | ✅ |
|
||||
|
||||
### Pre-existing Test Failures (Unrelated)
|
||||
- Phase 3: `test_phase3_history_service.py`, `test_phase3_prompt_injection.py`, `test_phase3_prompt_service.py`, `test_phase3_prompts_router.py` — pre-existing failures in SQLite/prompt tests unrelated to ASR changes.
|
||||
- Phase 1: 1 config test — pre-existing, unrelated.
|
||||
- Phase 2 WS: 1 `test_phase2_ws_timeout` — pre-existing timeout, unrelated.
|
||||
|
|
@ -27,12 +27,27 @@ HISTORY_DB_PATH=./data/history.db
|
|||
|
||||
CORS_ORIGINS=["http://localhost:5173","http://localhost:3000"]
|
||||
|
||||
# Alibaba Cloud DashScope ASR (Phase 2)
|
||||
# -------- ASR Configuration (Phase 2 + Phase 5) --------
|
||||
|
||||
# ASR provider: "dashscope" or "openrouter"
|
||||
# dashscope: Alibaba Cloud DashScope – batch + realtime (WebSocket) Cantonese ASR
|
||||
# openrouter: OpenRouter STT – batch-only Cantonese ASR via REST API
|
||||
# NOTE: "openrouter" only affects batch (Full Transcript) transcription.
|
||||
# Realtime streaming always uses DashScope (OpenRouter has no WebSocket STT).
|
||||
ASR_PROVIDER=dashscope
|
||||
|
||||
# --- DashScope ASR (used when ASR_PROVIDER=dashscope, or for realtime) ---
|
||||
# Get your key from: https://modelstudio.console.alibabacloud.com
|
||||
DASHSCOPE_API_KEY=sk-your-dashscope-key-here
|
||||
ASR_MODEL_NAME=qwen3-asr-flash
|
||||
ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime
|
||||
|
||||
# --- OpenRouter STT (used when ASR_PROVIDER=openrouter) ---
|
||||
# Get your key from: https://openrouter.ai/keys
|
||||
# Separate key for independent accounting/billing
|
||||
OPENROUTER_API_KEY=
|
||||
ASR_OPENROUTER_MODEL=google/chirp-3
|
||||
|
||||
# Video upload (Phase 2)
|
||||
VIDEO_UPLOAD_DIR=./uploads
|
||||
MAX_VIDEO_SIZE_MB=300
|
||||
|
|
|
|||
|
|
@@ -52,10 +52,16 @@ class Settings(BaseSettings):
     qa_include_internal_refs: bool = True
     qa_cache_vision_results: bool = True

-    # Alibaba Cloud DashScope ASR (Phase 2)
+    # ASR Configuration (Phase 2 + Phase 5)
+    # Provider: "dashscope" (batch + realtime) or "openrouter" (batch-only)
+    asr_provider: str = "dashscope"
+    # DashScope ASR (used when asr_provider=dashscope, or for realtime WebSocket)
     dashscope_api_key: str = ""
     asr_model_name: str = "qwen3-asr-flash"
     asr_realtime_model_name: str = "qwen3-asr-flash-realtime"
+    # OpenRouter STT (used when asr_provider=openrouter)
+    openrouter_api_key: str = ""
+    asr_openrouter_model: str = "google/chirp-3"

     # Video upload (Phase 2)
     video_upload_dir: str = "./uploads"
@@ -70,8 +76,19 @@ class Settings(BaseSettings):
     model_config = {"env_file": ".env", "env_file_encoding": "utf-8"}


+VALID_ASR_PROVIDERS = frozenset({"dashscope", "openrouter"})
+
+
 @lru_cache
 def get_settings() -> Settings:
     s = Settings()
-    logger.info("Settings loaded: llm_model=%s embedding_model=%s", s.llm_model_name, s.embedding_model)
+    logger.info(
+        "Settings loaded: llm_model=%s embedding_model=%s asr_provider=%s",
+        s.llm_model_name, s.embedding_model, s.asr_provider,
+    )
+    if s.asr_provider not in VALID_ASR_PROVIDERS:
+        raise ValueError(
+            f"Invalid ASR_PROVIDER '{s.asr_provider}'. "
+            f"Must be one of: {', '.join(sorted(VALID_ASR_PROVIDERS))}"
+        )
     return s
|
|
|
|||
|
|
@@ -94,14 +94,20 @@ async def transcribe_video(video_id: str, language: str = "yue"):
     from app.core.config import get_settings
     settings = get_settings()

-    if not settings.dashscope_api_key:
+    provider = settings.asr_provider
+    if provider == "dashscope" and not settings.dashscope_api_key:
         raise HTTPException(
             status_code=500,
             detail="DASHSCOPE_API_KEY is not configured. Set it in .env to enable transcription.",
         )
+    if provider == "openrouter" and not settings.openrouter_api_key:
+        raise HTTPException(
+            status_code=500,
+            detail="OPENROUTER_API_KEY is not configured. Set it in .env to enable OpenRouter ASR.",
+        )

     transcribe_start = time.monotonic()
-    logger.info("transcribe-started video_id=%s language=%s", video_id, language)
+    logger.info("transcribe-started video_id=%s language=%s provider=%s", video_id, language, provider)

     service = _get_video_service()
     wav_path = await service.extract_audio(video_id)
@@ -110,7 +116,7 @@ async def transcribe_video(video_id: str, language: str = "yue"):
         audio_bytes = wav_path.read_bytes()
         logger.debug("audio-extracted video_id=%s wav_size=%d", video_id, len(audio_bytes))
         asr = ASRClient(settings)
-        text = asr.transcribe_full(audio_bytes, language=language)
+        text = await asr.transcribe_full(audio_bytes, language=language)
     except Exception as e:
         logger.error("transcribe-failed video_id=%s error=%s", video_id, e)
         raise HTTPException(status_code=500, detail=f"Transcription failed: {str(e)}")
|
|
|||
|
|
@ -2,12 +2,14 @@ import json
|
|||
import asyncio
|
||||
import base64
|
||||
import logging
|
||||
import struct
|
||||
import time
|
||||
|
||||
from fastapi import APIRouter, WebSocket, WebSocketDisconnect
|
||||
|
||||
from app.core.config import get_settings
|
||||
from app.services.asr_client import float32_to_s16le, build_display_text, _to_traditional
|
||||
from app.services.asr_providers import OpenRouterASRProvider
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
|
@ -83,6 +85,120 @@ def format_transcription_event(event: dict, accumulated: str) -> dict | None:
|
|||
return None
|
||||
|
||||
|
||||
def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 16000, channels: int = 1, bits_per_sample: int = 16) -> bytes:
|
||||
byte_rate = sample_rate * channels * bits_per_sample // 8
|
||||
block_align = channels * bits_per_sample // 8
|
||||
data_size = len(pcm_bytes)
|
||||
header = struct.pack(
|
||||
"<4sI4s4sIHHIIHH4sI",
|
||||
b"RIFF",
|
||||
36 + data_size,
|
||||
b"WAVE",
|
||||
b"fmt ",
|
||||
16,
|
||||
1, # PCM
|
||||
channels,
|
||||
sample_rate,
|
||||
byte_rate,
|
||||
block_align,
|
||||
bits_per_sample,
|
||||
b"data",
|
||||
data_size,
|
||||
)
|
||||
return header + pcm_bytes
|
||||
|
||||
|
||||
async def _ws_proxy_openrouter(client_ws: WebSocket, language: str = "yue"):
|
||||
settings = get_settings()
|
||||
session_start = time.monotonic()
|
||||
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key=settings.openrouter_api_key,
|
||||
base_url=settings.llm_base_url,
|
||||
model=settings.asr_openrouter_model,
|
||||
)
|
||||
logger.info(
|
||||
"openrouter-ws-started model=%s url=%s language=%s",
|
||||
settings.asr_openrouter_model,
|
||||
provider._stt_url,
|
||||
language,
|
||||
)
|
||||
|
||||
accumulated_text = ""
|
||||
audio_buffer = bytearray()
|
||||
chunk_count = 0
|
||||
last_flush = time.monotonic()
|
||||
flush_lock = asyncio.Lock()
|
||||
|
||||
async def flush_chunk():
|
||||
nonlocal audio_buffer, accumulated_text, chunk_count, last_flush
|
||||
if not audio_buffer:
|
||||
return
|
||||
|
||||
pcm_snapshot = bytes(audio_buffer)
|
||||
audio_buffer.clear()
|
||||
last_flush = time.monotonic()
|
||||
chunk_count += 1
|
||||
|
||||
try:
|
||||
wav_bytes = pcm_to_wav(pcm_snapshot)
|
||||
logger.debug(
|
||||
"openrouter-chunk-sending chunk=%d pcm_bytes=%d wav_bytes=%d",
|
||||
chunk_count, len(pcm_snapshot), len(wav_bytes),
|
||||
)
|
||||
text = await provider.transcribe(wav_bytes, language)
|
||||
if text.strip():
|
||||
accumulated_text = build_display_text(accumulated_text, text)
|
||||
await client_ws.send_json({
|
||||
"delta": "",
|
||||
"full_text": _to_traditional(accumulated_text),
|
||||
"language": language,
|
||||
"is_final": True,
|
||||
})
|
||||
logger.info(
|
||||
"openrouter-chunk-completed chunk=%d text_len=%d total_len=%d",
|
||||
chunk_count, len(text), len(accumulated_text),
|
||||
)
|
||||
except Exception as e:
|
||||
logger.error(
|
||||
"openrouter-chunk-failed chunk=%d pcm_bytes=%d error=%s",
|
||||
chunk_count, len(pcm_snapshot), e,
|
||||
)
|
||||
|
||||
async def chunk_timer():
|
||||
while True:
|
||||
await asyncio.sleep(3.0)
|
||||
async with flush_lock:
|
||||
if audio_buffer and (time.monotonic() - last_flush >= 3.0):
|
||||
await flush_chunk()
|
||||
|
||||
timer_task = asyncio.create_task(chunk_timer())
|
||||
|
||||
try:
|
||||
while True:
|
||||
float32_bytes = await client_ws.receive_bytes()
|
||||
s16_bytes = float32_to_s16le(float32_bytes)
|
||||
audio_buffer.extend(s16_bytes)
|
||||
except WebSocketDisconnect:
|
||||
logger.info(
|
||||
"openrouter-client-disconnected chunks=%d accumulated_len=%d",
|
||||
chunk_count, len(accumulated_text),
|
||||
)
|
||||
finally:
|
||||
timer_task.cancel()
|
||||
try:
|
||||
async with flush_lock:
|
||||
await flush_chunk()
|
||||
except Exception:
|
||||
pass
|
||||
await provider.close()
|
||||
duration = time.monotonic() - session_start
|
||||
logger.info(
|
||||
"openrouter-ws-closed chunks=%d text_len=%d duration=%.1fs",
|
||||
chunk_count, len(accumulated_text), duration,
|
||||
)
|
||||
|
||||
|
||||
async def _ws_proxy_dashscope(client_ws: WebSocket, loop: asyncio.AbstractEventLoop, language: str = "yue"):
|
||||
event_queue: asyncio.Queue = asyncio.Queue()
|
||||
callback = DashScopeCallback(event_queue, loop)
|
||||
|
|
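Review note: `_ws_proxy_openrouter` calls `float32_to_s16le` and `pcm_to_wav`, whose bodies are not part of this hunk. The following is a minimal stdlib-only sketch of what such helpers might look like. It assumes the client sends little-endian float32 samples in the range [-1.0, 1.0] and that the WAV should be 16 kHz mono; the sample rate and channel count are assumptions, not values taken from the diff.

```python
import array
import io
import sys
import wave


def float32_to_s16le(float32_bytes: bytes) -> bytes:
    """Convert little-endian float32 PCM samples to signed 16-bit little-endian PCM."""
    floats = array.array("f")
    floats.frombytes(float32_bytes)
    if sys.byteorder == "big":
        floats.byteswap()  # input is little-endian; normalize on big-endian hosts
    # Clamp to [-1.0, 1.0] before scaling so loud samples do not wrap around.
    ints = array.array("h", (int(max(-1.0, min(1.0, s)) * 32767) for s in floats))
    if sys.byteorder == "big":
        ints.byteswap()  # emit little-endian regardless of host order
    return ints.tobytes()


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1) -> bytes:
    """Wrap raw s16le PCM in a WAV container (sample_rate and channels are assumed defaults)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
    return buf.getvalue()
```

Clamping before scaling matters because browser audio can briefly exceed the nominal range, and an unclamped conversion would overflow the 16-bit range.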
@@ -213,13 +329,6 @@ async def ws_asr_endpoint(websocket: WebSocket, video_id: str, language: str = "
|
|||
settings = get_settings()
|
||||
client_host = websocket.client.host if websocket.client else "unknown"
|
||||
|
||||
if not settings.dashscope_api_key:
|
||||
await websocket.accept()
|
||||
await websocket.send_json({"error": "DASHSCOPE_API_KEY is not configured"})
|
||||
await websocket.close(code=1011, reason="DASHSCOPE_API_KEY not set")
|
||||
logger.warning("ws-rejected-no-apikey video_id=%s client=%s", video_id, client_host)
|
||||
return
|
||||
|
||||
if source == "system-audio" and not settings.system_audio_enabled:
|
||||
await websocket.accept()
|
||||
await websocket.send_json({"error": "System audio capture is disabled"})
|
||||
|
|
@@ -234,11 +343,32 @@ async def ws_asr_endpoint(websocket: WebSocket, video_id: str, language: str = "
|
|||
logger.warning("ws-rejected-mic-disabled video_id=%s client=%s", video_id, client_host)
|
||||
return
|
||||
|
||||
if settings.asr_provider == "openrouter":
|
||||
if not settings.openrouter_api_key:
|
||||
await websocket.accept()
|
||||
await websocket.send_json({"error": "OPENROUTER_API_KEY is not configured"})
|
||||
await websocket.close(code=1011, reason="OPENROUTER_API_KEY not set")
|
||||
logger.warning("ws-rejected-no-openrouter-key video_id=%s client=%s", video_id, client_host)
|
||||
return
|
||||
else:
|
||||
if not settings.dashscope_api_key:
|
||||
await websocket.accept()
|
||||
await websocket.send_json({"error": "DASHSCOPE_API_KEY is not configured"})
|
||||
await websocket.close(code=1011, reason="DASHSCOPE_API_KEY not set")
|
||||
logger.warning("ws-rejected-no-apikey video_id=%s client=%s", video_id, client_host)
|
||||
return
|
||||
|
||||
await websocket.accept()
|
||||
loop = asyncio.get_event_loop()
|
||||
logger.info("ws-connect video_id=%s lang=%s source=%s client=%s", video_id, language, source, client_host)
|
||||
logger.info(
|
||||
"ws-connect video_id=%s lang=%s source=%s client=%s provider=%s",
|
||||
video_id, language, source, client_host, settings.asr_provider,
|
||||
)
|
||||
|
||||
try:
|
||||
if settings.asr_provider == "openrouter":
|
||||
await _ws_proxy_openrouter(websocket, language)
|
||||
else:
|
||||
await _ws_proxy_dashscope(websocket, loop, language)
|
||||
except Exception as e:
|
||||
logger.error("ws-asr-error video_id=%s error=%s", video_id, e)
|
||||
|
|
|
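Review note: the OpenRouter WebSocket path buffers incoming audio and flushes it on a roughly 3-second timer, with a final flush when the client disconnects. The sketch below isolates that pattern so it can be exercised without a WebSocket or an STT call. `ChunkBuffer` and `on_chunk` are hypothetical names introduced for illustration; they do not appear in the diff.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ChunkBuffer:
    """Hypothetical helper: buffers bytes and flushes them on a fixed interval."""

    def __init__(self, on_chunk: Callable[[bytes], Awaitable[None]], interval: float = 3.0):
        self._on_chunk = on_chunk
        self._interval = interval
        self._buffer = bytearray()
        self._lock = asyncio.Lock()  # serializes timer flushes and the final flush
        self._timer: Optional[asyncio.Task] = None

    def start(self) -> None:
        """Start the background timer; must be called from a running event loop."""
        self._timer = asyncio.create_task(self._run_timer())

    def feed(self, data: bytes) -> None:
        self._buffer.extend(data)

    async def flush(self) -> None:
        async with self._lock:
            if not self._buffer:
                return
            # Snapshot and clear before awaiting so new audio lands in the next chunk.
            snapshot = bytes(self._buffer)
            self._buffer.clear()
            await self._on_chunk(snapshot)

    async def _run_timer(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            await self.flush()

    async def close(self) -> None:
        if self._timer is not None:
            self._timer.cancel()
            try:
                await self._timer  # wait until the cancellation has been processed
            except asyncio.CancelledError:
                pass
        await self.flush()  # send whatever is left in the buffer
```

Taking the snapshot and clearing the buffer before the first `await` is what keeps audio that arrives during a slow transcription request from being lost or sent twice.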
|||
|
|
@@ -1,9 +1,8 @@
|
|||
import struct
|
||||
import base64
|
||||
import logging
|
||||
from typing import Any
|
||||
|
||||
import zhconv
|
||||
from openai import OpenAI
|
||||
from app.services.asr_providers import create_asr_provider, ASRError, _to_traditional # noqa: F401
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
|
@@ -20,40 +19,16 @@ def build_display_text(accumulated: str, current: str) -> str:
|
|||
return " ".join(parts)
|
||||
|
||||
|
||||
def _to_traditional(text: str) -> str:
|
||||
if not text:
|
||||
return text
|
||||
return zhconv.convert(text, "zh-hant")
|
||||
|
||||
|
||||
class ASRClient:
|
||||
def __init__(self, settings):
|
||||
self.settings = settings
|
||||
def __init__(self, settings: Any):
|
||||
self._settings = settings
|
||||
self._provider = create_asr_provider(settings)
|
||||
|
||||
def transcribe_full(self, audio_bytes: bytes, language: str = "yue") -> str:
|
||||
audio_b64 = base64.b64encode(audio_bytes).decode()
|
||||
data_url = f"data:audio/wav;base64,{audio_b64}"
|
||||
|
||||
client = OpenAI(
|
||||
api_key=self.settings.dashscope_api_key,
|
||||
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
|
||||
)
|
||||
|
||||
asr_options: dict = {}
|
||||
if language != "auto":
|
||||
asr_options["language"] = language
|
||||
|
||||
resp = client.chat.completions.create(
|
||||
model=self.settings.asr_model_name,
|
||||
messages=[{ # type: ignore[list-item]
|
||||
"role": "user",
|
||||
"content": [{
|
||||
"type": "input_audio",
|
||||
"input_audio": {"data": data_url},
|
||||
}],
|
||||
}],
|
||||
extra_body={"asr_options": asr_options} if asr_options else None,
|
||||
)
|
||||
|
||||
result = resp.choices[0].message.content or ""
|
||||
return _to_traditional(result)
|
||||
async def transcribe_full(self, audio_bytes: bytes, language: str = "yue") -> str:
|
||||
try:
|
||||
return await self._provider.transcribe(audio_bytes, language)
|
||||
except ASRError:
|
||||
raise
|
||||
except Exception as e:
|
||||
logger.error("transcribe_full failed: %s", e)
|
||||
raise ASRError(f"Transcription failed: {e}") from e
|
||||
|
|
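Review note: `ASRClient.transcribe_full` changes from a synchronous method to a coroutine in this file, so every caller now has to `await` it. The sketch below reproduces only the error-wrapping behavior of the new method, using a hypothetical `FakeProvider` (not part of the diff) so it runs without network access. `ASRError` here is a local stand-in for the class defined in `app.services.asr_providers`.

```python
import asyncio
from typing import Optional


class ASRError(Exception):
    """Stand-in for app.services.asr_providers.ASRError."""


class FakeProvider:
    """Hypothetical provider for illustration only; it is not part of this diff."""

    def __init__(self, result: str = "", error: Optional[Exception] = None):
        self._result = result
        self._error = error

    async def transcribe(self, audio_bytes: bytes, language: str) -> str:
        if self._error is not None:
            raise self._error
        return self._result


async def transcribe_full(provider, audio_bytes: bytes, language: str = "yue") -> str:
    """Same wrapping as the new method: ASRError passes through unchanged,
    any other exception is re-raised as ASRError with the original as __cause__."""
    try:
        return await provider.transcribe(audio_bytes, language)
    except ASRError:
        raise
    except Exception as e:
        raise ASRError(f"Transcription failed: {e}") from e


if __name__ == "__main__":
    print(asyncio.run(transcribe_full(FakeProvider(result="測試"), b"audio")))
```

Because of the `from e` chaining, callers that only catch `ASRError` can still inspect `__cause__` to see the underlying failure.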
|
|||
|
|
@@ -0,0 +1,190 @@
|
|||
import asyncio
|
||||
import base64
|
||||
import logging
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
import httpx
|
||||
import zhconv
|
||||
from openai import OpenAI
|
||||
from tenacity import (
|
||||
retry,
|
||||
retry_if_exception_type,
|
||||
stop_after_attempt,
|
||||
wait_random_exponential,
|
||||
)
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
|
||||
def _to_traditional(text: str) -> str:
|
||||
if not text:
|
||||
return text
|
||||
return zhconv.convert(text, "zh-hant")
|
||||
|
||||
|
||||
class ASRError(Exception):
|
||||
pass
|
||||
|
||||
|
||||
class ASRProvider(ABC):
|
||||
@abstractmethod
|
||||
async def transcribe(self, audio_bytes: bytes, language: str) -> str:
|
||||
...
|
||||
|
||||
|
||||
class DashScopeASRProvider(ASRProvider):
|
||||
def __init__(self, api_key: str, model: str):
|
||||
self._api_key = api_key
|
||||
self._model = model
|
||||
|
||||
async def transcribe(self, audio_bytes: bytes, language: str) -> str:
|
||||
loop = asyncio.get_running_loop()
|
||||
logger.info(
|
||||
"asr-transcribe-start provider=dashscope model=%s audio_bytes=%d language=%s",
|
||||
self._model, len(audio_bytes), language,
|
||||
)
|
||||
return await loop.run_in_executor(
|
||||
None, self._transcribe_sync, audio_bytes, language
|
||||
)
|
||||
|
||||
def _transcribe_sync(self, audio_bytes: bytes, language: str) -> str:
|
||||
audio_b64 = base64.b64encode(audio_bytes).decode()
|
||||
data_url = f"data:audio/wav;base64,{audio_b64}"
|
||||
|
||||
client = OpenAI(
|
||||
api_key=self._api_key,
|
||||
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
|
||||
)
|
||||
|
||||
asr_options: dict = {}
|
||||
if language != "auto":
|
||||
asr_options["language"] = language
|
||||
|
||||
resp = client.chat.completions.create(
|
||||
model=self._model,
|
||||
messages=[{ # type: ignore[list-item]
|
||||
"role": "user",
|
||||
"content": [{
|
||||
"type": "input_audio",
|
||||
"input_audio": {"data": data_url},
|
||||
}],
|
||||
}],
|
||||
extra_body={"asr_options": asr_options} if asr_options else None,
|
||||
)
|
||||
|
||||
result = resp.choices[0].message.content or ""
|
||||
return _to_traditional(result)
|
||||
|
||||
|
||||
class OpenRouterASRProvider(ASRProvider):
|
||||
def __init__(self, api_key: str, base_url: str, model: str):
|
||||
self._api_key = api_key
|
||||
self._stt_url = f"{base_url.rstrip('/')}/audio/transcriptions"
|
||||
self._model = model
|
||||
self._client: httpx.AsyncClient | None = None
|
||||
|
||||
async def _get_client(self) -> httpx.AsyncClient:
|
||||
if self._client is None:
|
||||
self._client = httpx.AsyncClient(
|
||||
timeout=httpx.Timeout(120.0),
|
||||
headers={
|
||||
"Authorization": f"Bearer {self._api_key}",
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
)
|
||||
return self._client
|
||||
|
||||
async def transcribe(self, audio_bytes: bytes, language: str) -> str:
|
||||
audio_b64 = base64.b64encode(audio_bytes).decode("ascii")
|
||||
logger.info(
|
||||
"asr-transcribe-start provider=openrouter model=%s url=%s audio_bytes=%d language=%s",
|
||||
self._model, self._stt_url, len(audio_bytes), language,
|
||||
)
|
||||
|
||||
payload: dict = {
|
||||
"model": self._model,
|
||||
"input_audio": {
|
||||
"data": audio_b64,
|
||||
"format": "wav",
|
||||
},
|
||||
}
|
||||
# OpenRouter STT expects ISO-639-1 (2-letter) codes.
|
||||
# DashScope languages like "yue" (Cantonese, ISO-639-3) are not valid here.
|
||||
# Omit to let auto-detection handle it.
|
||||
if language and language not in ("auto", "yue"):
|
||||
payload["language"] = language
|
||||
|
||||
try:
|
||||
result = await self._call_stt_api(payload)
|
||||
except (httpx.TransportError, httpx.HTTPStatusError) as e:
|
||||
raise ASRError(f"OpenRouter STT request failed: {e}") from e
|
||||
|
||||
text = result.get("text", "")
|
||||
if not text:
|
||||
raise ASRError("OpenRouter STT returned empty transcription")
|
||||
|
||||
logger.info(
|
||||
"asr-transcribe-complete provider=openrouter text_len=%d",
|
||||
len(text),
|
||||
)
|
||||
return _to_traditional(text)
|
||||
|
||||
@retry(
|
||||
reraise=True,
|
||||
stop=stop_after_attempt(4),
|
||||
wait=wait_random_exponential(multiplier=0.2, max=3.0),
|
||||
retry=retry_if_exception_type((httpx.TransportError, httpx.HTTPStatusError)),
|
||||
)
|
||||
async def _call_stt_api(self, payload: dict) -> dict:
|
||||
client = await self._get_client()
|
||||
response = await client.post(self._stt_url, json=payload)
|
||||
if response.status_code >= 400:
|
||||
logger.error(
|
||||
"openrouter-stt-error status=%d body=%s",
|
||||
response.status_code,
|
||||
response.text[:500],
|
||||
)
|
||||
response.raise_for_status()
|
||||
return response.json()
|
||||
|
||||
async def close(self) -> None:
|
||||
if self._client is not None:
|
||||
await self._client.aclose()
|
||||
self._client = None
|
||||
|
||||
|
||||
def create_asr_provider(settings) -> ASRProvider:
|
||||
provider_name = settings.asr_provider
|
||||
logger.info(
|
||||
"asr-provider-selected provider=%s dashscope_key=%s openrouter_key=%s llm_base_url=%s",
|
||||
provider_name,
|
||||
"set" if settings.dashscope_api_key else "empty",
|
||||
"set" if settings.openrouter_api_key else "empty",
|
||||
settings.llm_base_url,
|
||||
)
|
||||
|
||||
if provider_name == "dashscope":
|
||||
logger.info("asr-provider-init provider=dashscope model=%s", settings.asr_model_name)
|
||||
return DashScopeASRProvider(
|
||||
api_key=settings.dashscope_api_key,
|
||||
model=settings.asr_model_name,
|
||||
)
|
||||
|
||||
if provider_name == "openrouter":
|
||||
if not settings.openrouter_api_key:
|
||||
raise ASRError(
|
||||
"OPENROUTER_API_KEY is not configured. "
|
||||
"Set it in .env to use OpenRouter ASR."
|
||||
)
|
||||
logger.info(
|
||||
"asr-provider-init provider=openrouter model=%s url=%s",
|
||||
settings.asr_openrouter_model,
|
||||
f"{settings.llm_base_url.rstrip('/')}/audio/transcriptions",
|
||||
)
|
||||
return OpenRouterASRProvider(
|
||||
api_key=settings.openrouter_api_key,
|
||||
base_url=settings.llm_base_url,
|
||||
model=settings.asr_openrouter_model,
|
||||
)
|
||||
|
||||
raise ValueError(f"Unknown ASR provider: {provider_name}")
|
||||
|
|
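Review note: `OpenRouterASRProvider.transcribe` omits the `language` field for `"auto"` and `"yue"` because the endpoint is described in the code comments as expecting two-letter ISO-639-1 codes. The sketch below shows one way that rule could be factored into a helper. Generalizing it to drop any code that is not two letters is an assumption of this sketch; the diff itself only special-cases `"auto"` and `"yue"`. `openrouter_language` and `build_payload` are hypothetical names.

```python
from typing import Optional


def openrouter_language(language: Optional[str]) -> Optional[str]:
    """Return a language code to send to the STT endpoint, or None to omit the field."""
    if not language or language == "auto":
        return None
    code = language.strip().lower()
    # Assumption: anything that is not a two-letter alphabetic code (e.g. "yue")
    # is dropped so the service falls back to auto-detection.
    if len(code) == 2 and code.isalpha():
        return code
    return None


def build_payload(model: str, audio_b64: str, language: Optional[str]) -> dict:
    """Build the JSON body in the shape used by the provider in this diff."""
    payload: dict = {
        "model": model,
        "input_audio": {"data": audio_b64, "format": "wav"},
    }
    code = openrouter_language(language)
    if code is not None:
        payload["language"] = code
    return payload
```

With this shape, a request for Cantonese (`"yue"`) produces the same payload as `"auto"`, which matches the behavior of the provider code above.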
@@ -27,6 +27,7 @@ def video_client(tmp_path, monkeypatch):
|
|||
upload_dir.mkdir()
|
||||
monkeypatch.setenv("VIDEO_UPLOAD_DIR", str(upload_dir))
|
||||
monkeypatch.setenv("MAX_VIDEO_SIZE_MB", "50")
|
||||
monkeypatch.setenv("ASR_PROVIDER", "dashscope")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
|
||||
|
||||
from app.core.config import get_settings
|
||||
|
|
@@ -49,7 +50,7 @@ def _upload_video(client, filename="test.mp4", content=b"\x00" * 1024):
|
|||
class TestUploadTranscribeFlow:
|
||||
"""Full upload → transcribe with mocked ASR and real file I/O."""
|
||||
|
||||
@patch("app.services.asr_client.OpenAI")
|
||||
@patch("app.services.asr_providers.OpenAI")
|
||||
@patch("app.services.video_service.asyncio.create_subprocess_exec")
|
||||
def test_upload_then_transcribe(self, mock_subprocess, mock_openai_cls, video_client):
|
||||
"""Upload video → extract audio (mocked ffmpeg) → transcribe (mocked ASR) → verify response."""
|
||||
|
|
@@ -93,7 +94,7 @@ class TestUploadTranscribeFlow:
|
|||
wav_path = upload_dir / f"{video_id}_audio.wav"
|
||||
assert not wav_path.exists(), "Temp WAV file should be cleaned up after transcription"
|
||||
|
||||
@patch("app.services.asr_client.OpenAI")
|
||||
@patch("app.services.asr_providers.OpenAI")
|
||||
@patch("app.services.video_service.asyncio.create_subprocess_exec")
|
||||
def test_upload_transcribe_custom_language(self, mock_subprocess, mock_openai_cls, video_client):
|
||||
"""Transcribe with language=en should pass it through."""
|
||||
|
|
|
|||
|
|
@@ -123,10 +123,12 @@ class TestToTraditional:
|
|||
|
||||
|
||||
class TestTranscribeFull:
|
||||
def test_returns_traditional_chinese_text(self, monkeypatch):
|
||||
@pytest.mark.asyncio
|
||||
async def test_returns_traditional_chinese_text(self, monkeypatch):
|
||||
from app.services.asr_client import ASRClient
|
||||
|
||||
settings = MagicMock()
|
||||
settings.asr_provider = "dashscope"
|
||||
settings.dashscope_api_key = "sk-test-key"
|
||||
settings.asr_model_name = "qwen3-asr-flash"
|
||||
|
||||
|
|
@@ -139,8 +141,8 @@ class TestTranscribeFull:
|
|||
mock_openai_client = MagicMock()
|
||||
mock_openai_client.chat.completions.create.return_value = mock_resp
|
||||
|
||||
with patch("app.services.asr_client.OpenAI", return_value=mock_openai_client):
|
||||
result = client.transcribe_full(b"fake-audio-bytes", language="yue")
|
||||
with patch("app.services.asr_providers.OpenAI", return_value=mock_openai_client):
|
||||
result = await client.transcribe_full(b"fake-audio-bytes", language="yue")
|
||||
|
||||
assert result == "測試結果"
|
||||
mock_openai_client.chat.completions.create.assert_called_once()
|
||||
|
|
@@ -148,10 +150,12 @@ class TestTranscribeFull:
|
|||
assert call_kwargs.kwargs["model"] == "qwen3-asr-flash"
|
||||
assert call_kwargs.kwargs["extra_body"]["asr_options"]["language"] == "yue"
|
||||
|
||||
def test_uses_correct_api_endpoint(self, monkeypatch):
|
||||
@pytest.mark.asyncio
|
||||
async def test_uses_correct_api_endpoint(self, monkeypatch):
|
||||
from app.services.asr_client import ASRClient
|
||||
|
||||
settings = MagicMock()
|
||||
settings.asr_provider = "dashscope"
|
||||
settings.dashscope_api_key = "sk-test-key"
|
||||
settings.asr_model_name = "qwen3-asr-flash"
|
||||
|
||||
|
|
@@ -164,17 +168,19 @@ class TestTranscribeFull:
|
|||
mock_openai_client = MagicMock()
|
||||
mock_openai_client.chat.completions.create.return_value = mock_resp
|
||||
|
||||
with patch("app.services.asr_client.OpenAI", return_value=mock_openai_client) as mock_openai_cls:
|
||||
client.transcribe_full(b"audio", language="yue")
|
||||
with patch("app.services.asr_providers.OpenAI", return_value=mock_openai_client) as mock_openai_cls:
|
||||
await client.transcribe_full(b"audio", language="yue")
|
||||
mock_openai_cls.assert_called_once_with(
|
||||
api_key="sk-test-key",
|
||||
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
|
||||
)
|
||||
|
||||
def test_auto_language_omits_language_param(self, monkeypatch):
|
||||
@pytest.mark.asyncio
|
||||
async def test_auto_language_omits_language_param(self, monkeypatch):
|
||||
from app.services.asr_client import ASRClient
|
||||
|
||||
settings = MagicMock()
|
||||
settings.asr_provider = "dashscope"
|
||||
settings.dashscope_api_key = "sk-test-key"
|
||||
settings.asr_model_name = "qwen3-asr-flash"
|
||||
|
||||
|
|
@@ -187,8 +193,8 @@ class TestTranscribeFull:
|
|||
mock_openai_client = MagicMock()
|
||||
mock_openai_client.chat.completions.create.return_value = mock_resp
|
||||
|
||||
with patch("app.services.asr_client.OpenAI", return_value=mock_openai_client):
|
||||
client.transcribe_full(b"audio", language="auto")
|
||||
with patch("app.services.asr_providers.OpenAI", return_value=mock_openai_client):
|
||||
await client.transcribe_full(b"audio", language="auto")
|
||||
|
||||
call_kwargs = mock_openai_client.chat.completions.create.call_args
|
||||
assert call_kwargs.kwargs.get("extra_body") is None
|
||||
|
|
|
|||
|
|
@@ -23,6 +23,7 @@ def video_client(tmp_path, monkeypatch):
|
|||
upload_dir.mkdir()
|
||||
monkeypatch.setenv("VIDEO_UPLOAD_DIR", str(upload_dir))
|
||||
monkeypatch.setenv("MAX_VIDEO_SIZE_MB", "50")
|
||||
monkeypatch.setenv("ASR_PROVIDER", "dashscope")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
|
||||
|
||||
from app.core.config import get_settings
|
||||
|
|
@@ -44,7 +45,7 @@ def _upload_video(client, filename="test.mp4", content=b"\x00" * 1024):
|
|||
|
||||
class TestTranscribeSuccess:
|
||||
@patch("app.routers.video.VideoService.extract_audio")
|
||||
@patch("app.services.asr_client.OpenAI")
|
||||
@patch("app.services.asr_providers.OpenAI")
|
||||
def test_transcribe_returns_response(self, mock_openai_cls, mock_extract, video_client):
|
||||
"""POST transcribe should return FullTranscriptResponse."""
|
||||
client, upload_dir = video_client
|
||||
|
|
@@ -74,7 +75,7 @@ class TestTranscribeSuccess:
|
|||
assert "測" in data["text"] or "試" in data["text"]
|
||||
|
||||
@patch("app.routers.video.VideoService.extract_audio")
|
||||
@patch("app.services.asr_client.OpenAI")
|
||||
@patch("app.services.asr_providers.OpenAI")
|
||||
def test_transcribe_custom_language(self, mock_openai_cls, mock_extract, video_client):
|
||||
"""POST transcribe with language param should pass it through."""
|
||||
client, upload_dir = video_client
|
||||
|
|
@@ -169,6 +170,7 @@ class TestTranscribeMissingApiKey:
|
|||
upload_dir.mkdir()
|
||||
monkeypatch.setenv("VIDEO_UPLOAD_DIR", str(upload_dir))
|
||||
monkeypatch.setenv("MAX_VIDEO_SIZE_MB", "50")
|
||||
monkeypatch.setenv("ASR_PROVIDER", "dashscope")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "")
|
||||
|
||||
from app.core.config import get_settings
|
||||
|
|
|
|||
|
|
@@ -0,0 +1,63 @@
|
|||
"""Phase 5 tests: ASR configuration validation.
|
||||
|
||||
Covers:
|
||||
- Valid ASR_PROVIDER values (dashscope, openrouter) load correctly
|
||||
- Invalid ASR_PROVIDER raises ValueError
|
||||
- Default values for new Phase 5 settings
|
||||
"""
|
||||
import os
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
class TestAsrProviderConfig:
|
||||
def test_dashscope_is_default(self, monkeypatch, tmp_path):
|
||||
monkeypatch.delenv("ASR_PROVIDER", raising=False)  # unset so the assertion exercises the default
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
s = get_settings()
|
||||
assert s.asr_provider == "dashscope"
|
||||
|
||||
def test_openrouter_provider_loads(self, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("ASR_PROVIDER", "openrouter")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test")
|
||||
monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-test")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
s = get_settings()
|
||||
assert s.asr_provider == "openrouter"
|
||||
|
||||
def test_invalid_provider_raises_valueerror(self, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("ASR_PROVIDER", "invalid_provider")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test")
|
||||
monkeypatch.setenv("OPENROUTER_API_KEY", "")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
with pytest.raises(ValueError, match="Invalid ASR_PROVIDER"):
|
||||
get_settings()
|
||||
|
||||
def test_openrouter_api_key_defaults_empty(self, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("ASR_PROVIDER", "dashscope")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test")
|
||||
monkeypatch.setenv("OPENROUTER_API_KEY", "")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
s = get_settings()
|
||||
assert s.openrouter_api_key == ""
|
||||
|
||||
def test_asr_openrouter_model_default(self, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("ASR_PROVIDER", "dashscope")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
s = get_settings()
|
||||
assert s.asr_openrouter_model == "google/gemini-3.1-flash-lite"
|
||||
|
||||
def test_openrouter_model_customizable(self, monkeypatch, tmp_path):
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test")
|
||||
monkeypatch.setenv("ASR_OPENROUTER_MODEL", "openai/whisper-large-v3")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
s = get_settings()
|
||||
assert s.asr_openrouter_model == "openai/whisper-large-v3"
|
||||
|
|
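Review note: these tests expect `get_settings()` to raise `ValueError` with a message matching `Invalid ASR_PROVIDER` for unknown values, and to fall back to `dashscope` by default. The config module is not shown in this part of the diff, so the following is only a sketch of validation logic that would satisfy those expectations. `load_asr_provider` and `VALID_ASR_PROVIDERS` are hypothetical names, and the whitespace stripping and lowercasing are assumptions.

```python
from typing import Mapping

VALID_ASR_PROVIDERS = ("dashscope", "openrouter")


def load_asr_provider(env: Mapping[str, str]) -> str:
    """Read ASR_PROVIDER from a mapping such as os.environ; default to 'dashscope'."""
    # Assumption: values are normalized before validation.
    value = env.get("ASR_PROVIDER", "dashscope").strip().lower()
    if value not in VALID_ASR_PROVIDERS:
        raise ValueError(
            f"Invalid ASR_PROVIDER {value!r}; expected one of {VALID_ASR_PROVIDERS}"
        )
    return value
```

Passing the environment in as a mapping keeps the function easy to test without `monkeypatch`, for example `load_asr_provider({"ASR_PROVIDER": "openrouter"})`.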
@@ -0,0 +1,125 @@
|
|||
"""Phase 5 tests: Integration — full video → transcribe with provider switching.
|
||||
|
||||
Covers:
|
||||
- Full transcript with dashscope provider (mocked OpenAI)
|
||||
- Full transcript with openrouter provider (mocked httpx)
|
||||
- API key validation per provider
|
||||
"""
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import httpx
|
||||
import pytest
|
||||
from fastapi import FastAPI
|
||||
from fastapi.testclient import TestClient
|
||||
|
||||
from app.routers.video import router
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def video_client(tmp_path, monkeypatch):
|
||||
upload_dir = tmp_path / "test_uploads"
|
||||
upload_dir.mkdir()
|
||||
monkeypatch.setenv("VIDEO_UPLOAD_DIR", str(upload_dir))
|
||||
monkeypatch.setenv("MAX_VIDEO_SIZE_MB", "50")
|
||||
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
app = FastAPI()
|
||||
app.include_router(router, prefix="/api/v1")
|
||||
return TestClient(app), upload_dir
|
||||
|
||||
|
||||
def _upload_video(client, filename="test.mp4", content=b"\x00" * 1024):
|
||||
resp = client.post(
|
||||
"/api/v1/video/upload",
|
||||
files={"file": (filename, content, "video/mp4")},
|
||||
)
|
||||
assert resp.status_code == 200
|
||||
return resp.json()["video_id"]
|
||||
|
||||
|
||||
class TestDashScopeIntegration:
|
||||
@patch("app.routers.video.VideoService.extract_audio")
|
||||
@patch("app.services.asr_providers.OpenAI")
|
||||
def test_transcribe_with_dashscope(self, mock_openai_cls, mock_extract, video_client, monkeypatch):
|
||||
monkeypatch.setenv("ASR_PROVIDER", "dashscope")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-dashscope-test")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
|
||||
client, upload_dir = video_client
|
||||
video_id = _upload_video(client)
|
||||
|
||||
fake_wav = upload_dir / "extracted.wav"
|
||||
fake_wav.write_bytes(b"RIFF" + b"\x00" * 100)
|
||||
mock_extract.return_value = fake_wav
|
||||
|
||||
mock_resp = MagicMock()
|
||||
mock_resp.choices = [MagicMock()]
|
||||
mock_resp.choices[0].message.content = "测试转录結果"
|
||||
mock_openai_instance = MagicMock()
|
||||
mock_openai_instance.chat.completions.create.return_value = mock_resp
|
||||
mock_openai_cls.return_value = mock_openai_instance
|
||||
|
||||
resp = client.post(f"/api/v1/video/{video_id}/transcribe")
|
||||
assert resp.status_code == 200
|
||||
assert "text" in resp.json()
|
||||
assert "測" in resp.json()["text"] or "轉" in resp.json()["text"]
|
||||
|
||||
|
||||
class TestOpenRouterIntegration:
|
||||
@patch("app.routers.video.VideoService.extract_audio")
|
||||
@patch("app.services.asr_providers.httpx.AsyncClient")
|
||||
def test_transcribe_with_openrouter(self, mock_httpx_cls, mock_extract, video_client, monkeypatch):
|
||||
monkeypatch.setenv("ASR_PROVIDER", "openrouter")
|
||||
monkeypatch.setenv("OPENROUTER_API_KEY", "sk-or-test")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
|
||||
client, upload_dir = video_client
|
||||
video_id = _upload_video(client)
|
||||
|
||||
fake_wav = upload_dir / "extracted.wav"
|
||||
fake_wav.write_bytes(b"RIFF" + b"\x00" * 100)
|
||||
mock_extract.return_value = fake_wav
|
||||
|
||||
mock_response = MagicMock(spec=httpx.Response)
|
||||
mock_response.json.return_value = {"text": "測試轉錄結果", "usage": {}}
|
||||
mock_response.raise_for_status = MagicMock()
|
||||
|
||||
mock_http_client = AsyncMock()
|
||||
mock_http_client.post.return_value = mock_response
|
||||
mock_httpx_cls.return_value = mock_http_client
|
||||
|
||||
resp = client.post(f"/api/v1/video/{video_id}/transcribe")
|
||||
assert resp.status_code == 200
|
||||
assert "text" in resp.json()
|
||||
assert "轉" in resp.json()["text"] or "錄" in resp.json()["text"]
|
||||
|
||||
|
||||
class TestApiKeyValidation:
|
||||
def test_missing_dashscope_key_returns_500(self, video_client, monkeypatch):
|
||||
monkeypatch.setenv("ASR_PROVIDER", "dashscope")
|
||||
monkeypatch.setenv("DASHSCOPE_API_KEY", "")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
|
||||
client, upload_dir = video_client
|
||||
video_id = _upload_video(client)
|
||||
|
||||
resp = client.post(f"/api/v1/video/{video_id}/transcribe")
|
||||
assert resp.status_code == 500
|
||||
assert "DASHSCOPE_API_KEY" in resp.json()["detail"]
|
||||
|
||||
def test_missing_openrouter_key_returns_500(self, video_client, monkeypatch):
|
||||
monkeypatch.setenv("ASR_PROVIDER", "openrouter")
|
||||
monkeypatch.setenv("OPENROUTER_API_KEY", "")
|
||||
from app.core.config import get_settings
|
||||
get_settings.cache_clear()
|
||||
|
||||
client, upload_dir = video_client
|
||||
video_id = _upload_video(client)
|
||||
|
||||
resp = client.post(f"/api/v1/video/{video_id}/transcribe")
|
||||
assert resp.status_code == 500
|
||||
assert "OPENROUTER_API_KEY" in resp.json()["detail"]
|
||||
|
|
@@ -0,0 +1,208 @@
|
|||
"""Phase 5 tests: OpenRouter ASR provider unit tests.
|
||||
|
||||
Covers:
|
||||
- Successful transcription via mocked httpx
|
||||
- Retry logic on 429, 5xx
|
||||
- Error handling for empty response, network errors
|
||||
- Language parameter handling (two-letter codes passed / "auto" and "yue" omitted)
|
||||
"""
|
||||
import json
|
||||
from unittest.mock import AsyncMock, MagicMock, patch
|
||||
|
||||
import httpx
|
||||
import pytest
|
||||
|
||||
from app.services.asr_providers import (
|
||||
ASRError,
|
||||
OpenRouterASRProvider,
|
||||
create_asr_provider,
|
||||
)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def mock_httpx_client():
|
||||
mock_client = AsyncMock(spec=httpx.AsyncClient)
|
||||
mock_response = MagicMock(spec=httpx.Response)
|
||||
mock_response.json.return_value = {"text": "測試轉錄結果", "usage": {}}
|
||||
mock_response.raise_for_status = MagicMock()
|
||||
mock_client.post.return_value = mock_response
|
||||
return mock_client
|
||||
|
||||
|
||||
@pytest.mark.asyncio
|
||||
class TestOpenRouterTranscribe:
|
||||
async def test_returns_traditional_chinese(self, mock_httpx_client):
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
provider._client = mock_httpx_client
|
||||
|
||||
result = await provider.transcribe(b"fake-wav-bytes", language="yue")
|
||||
|
||||
assert "測" in result or "試" in result or "轉" in result
|
||||
|
||||
async def test_sends_correct_payload(self, mock_httpx_client):
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
provider._client = mock_httpx_client
|
||||
|
||||
await provider.transcribe(b"fake-wav-bytes", language="yue")
|
||||
|
||||
call_args = mock_httpx_client.post.call_args
|
||||
assert call_args is not None
|
||||
payload = call_args.kwargs["json"]
|
||||
assert payload["model"] == "google/gemini-3.1-flash-lite"
|
||||
assert "data" in payload["input_audio"]
|
||||
assert payload["input_audio"]["format"] == "wav"
|
||||
assert "language" not in payload  # "yue" is not ISO-639-1, so the provider omits it
|
||||
|
||||
async def test_auto_language_omitted(self, mock_httpx_client):
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
provider._client = mock_httpx_client
|
||||
|
||||
await provider.transcribe(b"fake-wav-bytes", language="auto")
|
||||
|
||||
call_args = mock_httpx_client.post.call_args
|
||||
payload = call_args.kwargs["json"]
|
||||
assert "language" not in payload
|
||||
|
||||
async def test_default_language_yue_omitted(self, mock_httpx_client):
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
provider._client = mock_httpx_client
|
||||
|
||||
await provider.transcribe(b"fake-wav-bytes", language="yue")
|
||||
|
||||
call_args = mock_httpx_client.post.call_args
|
||||
payload = call_args.kwargs["json"]
|
||||
assert "language" not in payload  # provider drops "yue" and relies on auto-detection
|
||||
|
||||
async def test_raises_on_empty_text(self, mock_httpx_client):
|
||||
mock_httpx_client.post.return_value.json.return_value = {"text": "", "usage": {}}
|
||||
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
provider._client = mock_httpx_client
|
||||
|
||||
with pytest.raises(ASRError, match="empty transcription"):
|
||||
await provider.transcribe(b"fake-wav-bytes", language="yue")
|
||||
|
||||
async def test_raises_on_http_error(self):
|
||||
mock_client = AsyncMock(spec=httpx.AsyncClient)
|
||||
mock_response = MagicMock(spec=httpx.Response)
|
||||
mock_response.raise_for_status.side_effect = httpx.HTTPStatusError(
|
||||
"Server error",
|
||||
request=MagicMock(),
|
||||
response=MagicMock(status_code=500),
|
||||
)
|
||||
mock_client.post.return_value = mock_response
|
||||
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
provider._client = mock_client
|
||||
|
||||
with pytest.raises(ASRError, match="STT request failed"):
|
||||
await provider.transcribe(b"fake-wav-bytes", language="yue")
|
||||
|
||||
async def test_raises_on_network_error(self):
|
||||
mock_client = AsyncMock(spec=httpx.AsyncClient)
|
||||
mock_client.post.side_effect = httpx.ConnectError("Connection refused")
|
||||
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
provider._client = mock_client
|
||||
|
||||
with pytest.raises(ASRError, match="STT request failed"):
|
||||
await provider.transcribe(b"fake-wav-bytes", language="yue")
|
||||
|
||||
|
||||
class TestSttUrlConstruction:
|
||||
def test_appends_audio_transcriptions(self):
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
assert provider._stt_url == "https://openrouter.ai/api/v1/audio/transcriptions"
|
||||
|
||||
def test_handles_trailing_slash(self):
|
||||
provider = OpenRouterASRProvider(
|
||||
api_key="sk-test",
|
||||
base_url="https://openrouter.ai/api/v1/",
|
||||
model="google/gemini-3.1-flash-lite",
|
||||
)
|
||||
assert provider._stt_url == "https://openrouter.ai/api/v1/audio/transcriptions"
|
||||
|
||||
|
||||
class TestCloseClient:
    @pytest.mark.asyncio
    async def test_close_cleans_up_client(self):
        mock_client = AsyncMock(spec=httpx.AsyncClient)
        provider = OpenRouterASRProvider(
            api_key="sk-test",
            base_url="https://openrouter.ai/api/v1",
            model="google/gemini-3.1-flash-lite",
        )
        provider._client = mock_client

        await provider.close()
        mock_client.aclose.assert_awaited_once()
        assert provider._client is None


class TestCreateAsrProvider:
    def test_creates_dashscope(self, monkeypatch):
        settings = MagicMock()
        settings.asr_provider = "dashscope"
        settings.dashscope_api_key = "sk-test"
        settings.asr_model_name = "qwen3-asr-flash"

        from app.services.asr_providers import DashScopeASRProvider
        provider = create_asr_provider(settings)
        assert isinstance(provider, DashScopeASRProvider)

    def test_creates_openrouter(self, monkeypatch):
        settings = MagicMock()
        settings.asr_provider = "openrouter"
        settings.openrouter_api_key = "sk-or-test"
        settings.llm_base_url = "https://openrouter.ai/api/v1"
        settings.asr_openrouter_model = "google/gemini-3.1-flash-lite"

        provider = create_asr_provider(settings)
        assert isinstance(provider, OpenRouterASRProvider)

    def test_missing_openrouter_key_raises(self, monkeypatch):
        settings = MagicMock()
        settings.asr_provider = "openrouter"
        settings.openrouter_api_key = ""

        with pytest.raises(ASRError, match="OPENROUTER_API_KEY"):
            create_asr_provider(settings)

    def test_unknown_provider_raises(self, monkeypatch):
        settings = MagicMock()
        settings.asr_provider = "unknown"

        with pytest.raises(ValueError, match="Unknown ASR provider"):
            create_asr_provider(settings)

@@ -8,6 +8,7 @@ python-docx>=1.1.0
pypdf>=4.0.2
python-dotenv>=1.0.0
httpx>=0.26.0
tenacity>=8.0.0
openai>=2.26.0,<3.0.0
pytest==7.4.4
pytest-asyncio==0.23.4

@@ -7,7 +7,7 @@ import { getPdfViewerUrl } from '../lib/api'
 import { processCitations, processCitationsForSubq, extractCitedSources, highlightTerms } from '../utils/citationParser'
 import { bulletizeMarkdown } from '../utils/citationParser'

-const V2_BASE = `${import.meta.env.VITE_API_BASE_URL ?? 'http://localhost:8000/api/v1'}/v2`
+const V2_BASE = `${import.meta.env.VITE_API_BASE_URL ?? '/api/v1'}/v2`

 function getHighlightUrl(document_id: string, chunk_index: number, sub_question: string): string {
   return `${V2_BASE}/highlights?document_id=${encodeURIComponent(document_id)}&chunk_index=${chunk_index}&sub_question=${encodeURIComponent(sub_question)}`

@@ -13,8 +13,8 @@ export function useFullTranscript({ videoId }: UseFullTranscriptOptions) {
     setIsLoading(true)
     setError(null)
     try {
-      const base = import.meta.env.VITE_API_BASE_URL ?? ''
-      const resp = await fetch(`${base}/api/v1/video/${videoId}/transcribe`, {
+      const base = import.meta.env.VITE_API_BASE_URL ?? '/api/v1'
+      const resp = await fetch(`${base}/video/${videoId}/transcribe`, {
         method: 'POST',
       })
       if (!resp.ok) {

@@ -1,7 +1,7 @@
 import axios from 'axios'
 import type { ChunkingStrategy, QueryRequest, QueryResponse, QueryStreamEvent, IngestResponse, DocumentListResponse, ChunkInfo, DeleteResponse, PromptProfileListResponse, PromptSetResponse, PromptUpdateRequest, PromptBatchUpdateRequest, PromptActivateResponse, PromptStatusResponse, ProfileExportData, ProfileImportResponse, QueryHistoryList, QueryHistoryDetail, HistoryStats, HistoryDeleteResponse, FullTranscriptResponse, VideoUploadResponse } from '../types'

-const BASE_URL: string = import.meta.env.VITE_API_BASE_URL ?? 'http://localhost:8000/api/v1'
+const BASE_URL: string = import.meta.env.VITE_API_BASE_URL ?? '/api/v1'

 export const apiClient = axios.create({ baseURL: BASE_URL })

@@ -78,7 +78,7 @@ export const deleteChunk = async (chunkId: string): Promise<DeleteResponse> => {
 }

 export const getChunkPdfUrl = (filePath: string): string => {
-  const baseUrl: string = import.meta.env.VITE_API_BASE_URL ?? 'http://localhost:8000/api/v1'
+  const baseUrl: string = import.meta.env.VITE_API_BASE_URL ?? '/api/v1'
   return `${baseUrl}/chunks/${encodeURIComponent(filePath)}/pdf`
 }

@@ -265,7 +265,7 @@ describe('ResponsePanel', () => {
     await waitFor(() => {
       expect(mockFetch).toHaveBeenCalledTimes(1)
       expect(mockFetch).toHaveBeenCalledWith(
-        'http://localhost:8000/api/v1/v2/highlights/batch',
+        '/api/v1/v2/highlights/batch',
         expect.objectContaining({
           method: 'POST',
           headers: { 'Content-Type': 'application/json' },

@@ -61,7 +61,7 @@ export function processCitationsForSubq(

 function buildCitationUrl(source: SourceMetadata, highlightReady?: boolean): string | null {
   if (highlightReady && source.document_id && source.sub_question_text) {
-    const v2Base = `${import.meta.env.VITE_API_BASE_URL ?? 'http://localhost:8000/api/v1'}/v2`
+    const v2Base = `${import.meta.env.VITE_API_BASE_URL ?? '/api/v1'}/v2`
     return `${v2Base}/highlights?document_id=${encodeURIComponent(source.document_id)}&chunk_index=${source.chunk_index}&sub_question=${encodeURIComponent(source.sub_question_text)}`
   }
   if (source.chunk_file_path) {