feat: Phase 4.8-4.9 — integration tests, acceptance tests, docs, and polish

This commit is contained in:
Woody 2026-05-15 09:51:45 +08:00
parent 1e8773469e
commit d69c180544
12 changed files with 1073 additions and 15 deletions

View File

@ -1,8 +1,8 @@
# Phase 4: System Audio & Mic Capture → ASR → RAG — Implementation Plan # Phase 4: System Audio & Mic Capture → ASR → RAG — Implementation Plan
**Created:** 2026-05-09 **Created:** 2026-05-09
**Updated:** 2026-05-14 **Updated:** 2026-05-15
**Status:** 📋 Draft (Not Started) **Status:** ✅ Complete
**Depends on:** Phase 1 (Complete), Phase 2 (Complete), Phase 3 (Complete) **Depends on:** Phase 1 (Complete), Phase 2 (Complete), Phase 3 (Complete)
--- ---
@ -210,15 +210,15 @@ const asr = source === 'system-audio' ? systemAudioASR
| Sub-Phase | Description | Effort | Depends On | Status | | Sub-Phase | Description | Effort | Depends On | Status |
|-----------|-------------|--------|------------|--------| |-----------|-------------|--------|------------|--------|
| 4.1 | Config & Infrastructure | 0.5 day | — | 📋 Draft | | 4.1 | Config & Infrastructure | 0.5 day | — | ✅ Complete |
| 4.2 | System Audio Capture Hook (`useSystemAudioASR`) | 1 day | 4.1 | 📋 Draft | | 4.2 | System Audio Capture Hook (`useSystemAudioASR`) | 1 day | 4.1 | ✅ Complete |
| 4.3 | SystemAudioCapture UI Component | 0.5 day | 4.2 | 📋 Draft | | 4.3 | SystemAudioCapture UI Component | 0.5 day | 4.2 | ✅ Complete |
| 4.4 | Mic Capture Hook (`useMicASR`) | 0.5 day | 4.1 | 📋 Draft | | 4.4 | Mic Capture Hook (`useMicASR`) | 0.5 day | 4.1 | ✅ Complete |
| 4.5 | MicCapture UI Component | 0.5 day | 4.4 | 📋 Draft | | 4.5 | MicCapture UI Component | 0.5 day | 4.4 | ✅ Complete |
| 4.6 | LTTPage Integration (all 3 sources) | 0.5 day | 4.2, 4.3, 4.4, 4.5 | 📋 Draft | | 4.6 | LTTPage Integration (all 3 sources) | 0.5 day | 4.2, 4.3, 4.4, 4.5 | ✅ Complete |
| 4.7 | Backend Adjustments | 0.5 day | 4.1 | 📋 Draft | | 4.7 | Backend Adjustments | 0.5 day | 4.1 | ✅ Complete |
| 4.8 | Integration & Acceptance Tests | 1 day | 4.6, 4.7 | 📋 Draft | | 4.8 | Integration & Acceptance Tests | 1 day | 4.6, 4.7 | ✅ Complete |
| 4.9 | Polish & Documentation | 0.5 day | 4.8 | 📋 Draft | | 4.9 | Polish & Documentation | 0.5 day | 4.8 | ✅ Complete |
| **Total** | | **5.5 days** | | | | **Total** | | **5.5 days** | | |
### Phase 4.1 — Config & Infrastructure (0.5 day) ### Phase 4.1 — Config & Infrastructure (0.5 day)
@ -649,5 +649,5 @@ README.md ← add System Audio + Listen
**File Information** **File Information**
- Filename: `phase4_system_audio_plan.md` - Filename: `phase4_system_audio_plan.md`
- Created: 2026-05-09 - Created: 2026-05-09
- Updated: 2026-05-14 — Added Listen Mic as third source; removed YouTube - Updated: 2026-05-15 — Phase 4 complete (all 9 sub-phases)
- Status: Draft — awaiting review before Phase 4.1 implementation begins - Status: Complete — 46 frontend tests + 14 backend tests passing, production build verified

View File

@ -244,6 +244,63 @@ Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline
- `ffmpeg` on server (for batch transcription) - `ffmpeg` on server (for batch transcription)
- `dashscope` Python package (in `requirements.txt`) - `dashscope` Python package (in `requirements.txt`)
### System Audio Capture & Listen Mic (Phase 4)
Two additional live audio sources alongside video Upload:
#### System Audio Capture
Captures audio output from any application on your computer (browser tab, Spotify, Zoom) via `getDisplayMedia()`.
**How to use:**
1. Select the **"System Audio"** tab in the LTTPage source selector
2. Click **"Start Capture"**
3. Choose a browser tab or window in the permission dialog — make sure **"Share audio"** is checked
4. Real-time Cantonese ASR transcription flows into the QueryInput
5. Edit the transcript while capturing continues, then submit your query
**Use cases:** Transcribing YouTube videos, podcasts, lectures, or meetings playing on your computer without downloading files.
#### Listen Mic
Captures microphone input via `getUserMedia()`.
**How to use:**
1. Select the **"Listen Mic"** tab
2. Click **"Start Listening"**
3. Allow microphone access when prompted
4. Speak — real-time transcription flows into QueryInput
5. Edit transcript while listening, then submit your query
**Use cases:** Recording live meetings, dictating questions verbally, transcribing spoken Cantonese in real time.
#### Browser Compatibility
**System Audio (`getDisplayMedia`):**
| Platform / Browser | Tab Audio | System Audio | Supported |
|--------------------|-----------|-------------|-----------|
| Chrome/Edge (Windows) | ✅ | ✅ | **Full support** |
| Chrome/Edge (macOS 14.2+) | ✅ | ✅ | **Full support** |
| Chrome/Edge (Linux) | ✅ | ❌ | Tab audio only |
| Firefox | ❌ | ❌ | Not supported |
| Safari | ❌ | ❌ | Not supported |
**Listen Mic (`getUserMedia`):** Universally supported in all modern browsers (Chrome, Firefox, Safari, Edge).
#### Limitations
- System Audio capture requires Chrome or Edge (Chromium-based browsers)
- No "Full Transcript" button — streaming ASR only (no batch transcription for live sources)
- `getDisplayMedia()` always shows a screen/tab picker even for audio-only capture (browser limitation)
- Each capture session generates a new UUID; the WebSocket reconnects on every Start/Stop
#### Configuration
```bash
# In backend/.env — feature toggles (default: true)
SYSTEM_AUDIO_ENABLED=true
MIC_ENABLED=true
```
### Installing ffmpeg ### Installing ffmpeg
```bash ```bash

View File

@ -36,3 +36,8 @@ ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime
# Video upload (Phase 2) # Video upload (Phase 2)
VIDEO_UPLOAD_DIR=./uploads VIDEO_UPLOAD_DIR=./uploads
MAX_VIDEO_SIZE_MB=300 MAX_VIDEO_SIZE_MB=300
# Live audio capture toggles (Phase 4)
# Set to false to disable System Audio or Listen Mic capture
SYSTEM_AUDIO_ENABLED=true
MIC_ENABLED=true

View File

@ -0,0 +1,105 @@
"""Acceptance tests: Phase 4 System Audio and Mic Capture with real DashScope ASR.
Prerequisites:
- ChromaDB running (local or docker)
- .env configured with valid DASHSCOPE_API_KEY
- SYSTEM_AUDIO_ENABLED=true and MIC_ENABLED=true in .env
- Chrome/Edge browser for system audio capture tests
- Working microphone for mic capture tests
These tests require manual user interaction (browser permission dialogs).
Run with: pytest app/test/acceptance/test_acceptance_phase4.py -v -m acceptance
"""
import pytest
@pytest.mark.acceptance
@pytest.mark.slow
@pytest.mark.skip(reason="Requires real browser with getDisplayMedia support")
def test_real_system_audio_capture_connects():
"""Verify WebSocket endpoint accepts system-audio source with real DashScope ASR.
Manual steps:
1. Start backend with SYSTEM_AUDIO_ENABLED=true and valid DASHSCOPE_API_KEY
2. Open frontend in Chrome/Edge
3. Click 'System Audio' capture button
4. Browser prompts for screen/tab share select a tab playing audio
5. Verify WebSocket connects at /ws/asr/{video_id}?source=system-audio
6. Verify ASR transcript events appear in real time
7. Verify no 'System audio capture is disabled' error
"""
pass
@pytest.mark.acceptance
@pytest.mark.slow
@pytest.mark.skip(reason="Requires real microphone and browser permission")
def test_real_mic_capture_connects():
"""Verify WebSocket endpoint accepts mic source with real DashScope ASR.
Manual steps:
1. Start backend with MIC_ENABLED=true and valid DASHSCOPE_API_KEY
2. Open frontend in Chrome/Edge
3. Click 'Microphone' capture button
4. Browser prompts for microphone permission grant it
5. Verify WebSocket connects at /ws/asr/{video_id}?source=mic
6. Speak into microphone and verify ASR transcript events appear
7. Verify no 'Microphone capture is disabled' error
"""
pass
@pytest.mark.acceptance
@pytest.mark.slow
@pytest.mark.skip(reason="Requires real DashScope ASR and browser audio capture")
def test_real_dashscope_asr_with_system_audio():
"""Real DashScope ASR processes system audio PCM stream.
Manual steps:
1. Start backend with SYSTEM_AUDIO_ENABLED=true
2. Play a Cantonese audio/video clip in a browser tab
3. Start system audio capture targeting that tab
4. Verify DashScope OmniRealtimeConversation receives PCM chunks
5. Verify transcript sentences appear with sentence-begin/sentence-end events
6. Verify accumulated transcript text is non-empty
7. Check backend logs for 'dashscope-session-closed' with text_len > 0
"""
pass
@pytest.mark.acceptance
@pytest.mark.slow
@pytest.mark.skip(reason="Requires real DashScope ASR and microphone")
def test_real_dashscope_asr_with_microphone():
"""Real DashScope ASR processes microphone PCM stream.
Manual steps:
1. Start backend with MIC_ENABLED=true
2. Start microphone capture in frontend
3. Speak a clear Cantonese sentence (e.g. '今日天氣好好')
4. Verify DashScope returns transcription results
5. Verify transcript text matches spoken content (allow partial match)
6. Verify Traditional Chinese conversion applied (if _to_traditional active)
7. Check backend logs for 'dashscope-session-closed' with text_len > 0
"""
pass
@pytest.mark.acceptance
@pytest.mark.slow
@pytest.mark.skip(reason="Full end-to-end requires browser + ASR + LLM")
def test_end_to_end_capture_to_rag():
"""End-to-end: capture system audio -> ASR transcript -> RAG query.
Manual steps:
1. Ingest relevant documents via /api/v1/ingest
2. Start backend with SYSTEM_AUDIO_ENABLED=true, valid DASHSCOPE_API_KEY,
and valid LLM_BASE_URL/LLM_API_KEY
3. Open frontend, start system audio capture on a tab with relevant audio
4. Let ASR produce a transcript in the query input
5. Click 'Ask' to trigger RAG query
6. Verify SSE stream: decomposed -> retrieving -> filtering -> generating -> completed
7. Verify final answer contains bullet points with source citations
8. Verify sources reference ingested documents
"""
pass

View File

@ -0,0 +1,76 @@
"""Phase 4 integration tests: System Audio and Mic capture WebSocket endpoint.
Covers:
- WebSocket handshake with UUID-format video_id (no actual video file)
- source=system-audio connection accepted with language param
- source=mic connection accepted with language param
- Config toggles disable both system-audio and mic features
Uses FastAPI TestClient with real router. Only external DashScope ASR is
implicitly avoided (client disconnects before proxy call completes).
"""
import uuid
import pytest
from fastapi import FastAPI
from fastapi.testclient import TestClient
@pytest.fixture
def phase4_integration_app(monkeypatch):
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
monkeypatch.setenv("SYSTEM_AUDIO_ENABLED", "true")
monkeypatch.setenv("MIC_ENABLED", "true")
from app.core.config import get_settings
from app.routers.ws_asr import router
get_settings.cache_clear()
app = FastAPI()
app.include_router(router)
return app
def test_websocket_accepts_uuid_video_id(phase4_integration_app):
"""WebSocket connects with a proper UUID video_id; no actual video file needed."""
video_uuid = str(uuid.uuid4())
client = TestClient(phase4_integration_app)
with client.websocket_connect(f"/ws/asr/{video_uuid}") as ws:
pass
def test_websocket_accepts_system_audio_source(phase4_integration_app):
"""WebSocket with source=system-audio and language=en connects for audio processing."""
client = TestClient(phase4_integration_app)
with client.websocket_connect(
"/ws/asr/integ-test-vid?source=system-audio&language=en"
) as ws:
pass
def test_websocket_accepts_mic_source(phase4_integration_app):
"""WebSocket with source=mic and language=zh connects successfully."""
client = TestClient(phase4_integration_app)
with client.websocket_connect(
"/ws/asr/integ-test-vid?source=mic&language=zh"
) as ws:
pass
def test_config_toggles_disable_features(monkeypatch):
"""When both toggles disabled, system-audio and mic sources return error messages."""
monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
monkeypatch.setenv("SYSTEM_AUDIO_ENABLED", "false")
monkeypatch.setenv("MIC_ENABLED", "false")
from app.core.config import get_settings
from app.routers.ws_asr import router
get_settings.cache_clear()
app = FastAPI()
app.include_router(router)
client = TestClient(app)
with client.websocket_connect("/ws/asr/vid-1?source=system-audio") as ws:
data = ws.receive_json()
assert "disabled" in data.get("error", "").lower()
with client.websocket_connect("/ws/asr/vid-2?source=mic") as ws:
data = ws.receive_json()
assert "disabled" in data.get("error", "").lower()

View File

@ -143,11 +143,14 @@ User Question
| Phase 1 Frontend | 2-3 days | UI layout + text query flow | ✅ Complete | | Phase 1 Frontend | 2-3 days | UI layout + text query flow | ✅ Complete |
| Phase 2 Backend | 4-5 days | Video upload + WebSocket ASR + question extraction | ✅ Complete | | Phase 2 Backend | 4-5 days | Video upload + WebSocket ASR + question extraction | ✅ Complete |
| Phase 2 Frontend | 3-4 days | Video player + live transcript + auto/manual flow | ✅ Complete | | Phase 2 Frontend | 3-4 days | Video player + live transcript + auto/manual flow | ✅ Complete |
| Phase 4 System Audio & Mic | 5.5 days | System Audio capture + Listen Mic + real-time ASR → RAG | ✅ Complete |
| Testing & Polish | 1-2 days | End-to-end testing + deployment scripts | ⬜ Pending | | Testing & Polish | 1-2 days | End-to-end testing + deployment scripts | ⬜ Pending |
**Total Estimated Effort**: 13-17 developer days (2-3 weeks) **Total Estimated Effort**: 19-23 developer days (3-4 weeks)
> **Note:** Phase 3 (YouTube Live Stream Proxy → ASR) was implemented (5.5 days, 7 sub-phases) and later reverted in favor of Phase 4's more versatile System Audio Capture approach using `getDisplayMedia()`. > **Note:** Phase 3 (YouTube Live Stream Proxy → ASR) was implemented (5.5 days, 7 sub-phases) and later reverted in favor of Phase 4's more versatile System Audio Capture approach using `getDisplayMedia()`.
>
> **Phase 4** adds System Audio Capture (`getDisplayMedia`) and Listen Mic (`getUserMedia`) as live audio sources alongside video Upload. Both pipe audio through the existing WebSocket → DashScope realtime ASR → RAG pipeline. Implementation complete with 46 frontend + 14 backend tests. See `.plans/phase4_system_audio_plan.md` for details.
--- ---
@ -167,4 +170,4 @@ User Question
**File Information** **File Information**
- Filename: `development_plan.md` - Filename: `development_plan.md`
- Last Updated: May 2026 - Last Updated: May 2026
- Status: Phase 1 ✅, Phase 2 ✅ — Phase 4 (System Audio Capture) up next, Phase 3 removed - Status: Phase 1 ✅, Phase 2 ✅, Phase 4 ✅ — System Audio Capture & Listen Mic complete

View File

@ -0,0 +1,36 @@
/**
* Browser compatibility detection utilities.
* Used for showing/hiding compatibility warnings in capture UI components.
*/
/**
* Checks whether system audio capture (getDisplayMedia with audio) is supported.
* Requires Chromium-based browser (Chrome/Edge) Firefox and Safari do not support
* capturing system audio via the Screen Capture API.
*
* @returns true if the browser likely supports system audio capture
*/
export function isSystemAudioSupported(): boolean {
if (typeof navigator === 'undefined') return false
const ua = navigator.userAgent
// Firefox does not support capturing audio with getDisplayMedia
if (ua.includes('Firefox')) return false
// Safari does not support it either
if (ua.includes('Safari') && !ua.includes('Chrome') && !ua.includes('Chromium')) {
return false
}
// Mobile browsers don't support it
if (/Android|iPhone|iPad|iPod/i.test(ua)) return false
// Check for Chromium engine
if ('chrome' in window || ua.includes('Chrome') || ua.includes('Chromium') || ua.includes('Edg')) {
return true
}
// Default: assume not supported
return false
}

View File

@ -0,0 +1,283 @@
/**
* Phase 4 LTTPage integration tests for System Audio and Listen Mic tabs.
*
* Coverage:
* - SourceSelector renders all 3 tabs
* - Default source is Upload (VideoUpload visible)
* - Switching to System Audio tab renders SystemAudioCapture
* - Switching to Listen Mic tab renders MicCapture
* - System Audio tab hides Full Transcript button
* - Listen Mic tab hides Full Transcript button
* - Full Transcript button only visible in Upload tab when video present
* - QueryInput receives partialText from active ASR source
*/
import React from 'react'
import { render, screen, fireEvent, waitFor } from '@testing-library/react'
import { describe, it, expect, vi, beforeEach } from 'vitest'
import { LTTPage } from '../pages/LTTPage'
const mockQueryStreamMutate = vi.fn()
const mockMutate = vi.fn()
const mockReset = vi.fn()
const mockRequestFullTranscript = vi.fn()
const mockStartStreaming = vi.fn()
const mockStopStreaming = vi.fn()
const mockStartCapture = vi.fn()
const mockStopCapture = vi.fn()
const mockStartListening = vi.fn()
const mockStopListening = vi.fn()
let mockQueryStreamPhase = 'idle'
let mockQueryStreamExtractedQuestions: string[] | null = null
let mockIsPending = false
let mockIsError = false
let mockError: Error | null = null
let mockData: import('../types').VideoUploadResponse | null = null
let mockASRTranscript = ''
let mockASRPartialTranscript = ''
let mockASRIsStreaming = false
let mockASRStatus = 'idle'
let mockFTFullTranscript = ''
let mockFTIsLoading = false
let mockFTError: string | null = null
let mockSystemAudioStatus: import('../types').SystemAudioStatus = 'idle'
let mockSystemAudioTranscript = ''
let mockSystemAudioPartialTranscript = ''
let mockSystemAudioError: string | null = null
let mockMicStatus: import('../types').MicStatus = 'idle'
let mockMicTranscript = ''
let mockMicPartialTranscript = ''
let mockMicError: string | null = null
vi.mock('../lib/queries', () => ({
useQueryDocumentStream: () => ({
phase: mockQueryStreamPhase,
extractedQuestions: mockQueryStreamExtractedQuestions,
answer: null,
sources: null,
subQuestionSources: null,
historyId: null,
error: null,
mutate: mockQueryStreamMutate,
reset: vi.fn(),
}),
useVideoUpload: () => ({
mutate: mockMutate,
isPending: mockIsPending,
isError: mockIsError,
error: mockError,
data: mockData,
reset: mockReset,
}),
}))
vi.mock('../hooks/useVideoASR', () => ({
useVideoASR: () => ({
transcript: mockASRTranscript,
partialTranscript: mockASRPartialTranscript,
isStreaming: mockASRIsStreaming,
status: mockASRStatus,
startStreaming: mockStartStreaming,
stopStreaming: mockStopStreaming,
}),
}))
vi.mock('../hooks/useFullTranscript', () => ({
useFullTranscript: () => ({
fullTranscript: mockFTFullTranscript,
isLoading: mockFTIsLoading,
error: mockFTError,
requestFullTranscript: mockRequestFullTranscript,
}),
}))
vi.mock('../hooks/useSystemAudioASR', () => ({
useSystemAudioASR: () => ({
status: mockSystemAudioStatus,
transcript: mockSystemAudioTranscript,
partialTranscript: mockSystemAudioPartialTranscript,
error: mockSystemAudioError,
startCapture: mockStartCapture,
stopCapture: mockStopCapture,
}),
}))
vi.mock('../hooks/useMicASR', () => ({
useMicASR: () => ({
status: mockMicStatus,
transcript: mockMicTranscript,
partialTranscript: mockMicPartialTranscript,
error: mockMicError,
startListening: mockStartListening,
stopListening: mockStopListening,
}),
}))
vi.mock('../hooks/useMediaStreamASR', () => ({
useMediaStreamASR: () => ({
status: 'idle',
transcript: '',
partialTranscript: '',
error: null,
start: vi.fn(),
stop: vi.fn(),
}),
}))
vi.mock('../lib/api', () => ({
getVideoUrl: (videoId: string) => `http://localhost:8000/api/v1/video/${videoId}`,
}))
describe('LTTPage integration (Phase 4 — System Audio & Listen Mic tabs)', () => {
beforeEach(() => {
vi.clearAllMocks()
mockQueryStreamMutate.mockClear()
mockMutate.mockClear()
mockReset.mockClear()
mockRequestFullTranscript.mockClear()
mockStartCapture.mockClear()
mockStopCapture.mockClear()
mockStartListening.mockClear()
mockStopListening.mockClear()
mockQueryStreamPhase = 'idle'
mockQueryStreamExtractedQuestions = null
mockIsPending = false
mockIsError = false
mockError = null
mockData = null
mockASRTranscript = ''
mockASRPartialTranscript = ''
mockASRIsStreaming = false
mockASRStatus = 'idle'
mockFTFullTranscript = ''
mockFTIsLoading = false
mockFTError = null
mockSystemAudioStatus = 'idle'
mockSystemAudioTranscript = ''
mockSystemAudioPartialTranscript = ''
mockSystemAudioError = null
mockMicStatus = 'idle'
mockMicTranscript = ''
mockMicPartialTranscript = ''
mockMicError = null
})
it('renders SourceSelector with all 3 tabs', () => {
render(<LTTPage />)
const tablist = screen.getByRole('tablist')
expect(tablist).toBeInTheDocument()
const tabs = screen.getAllByRole('tab')
expect(tabs).toHaveLength(3)
expect(screen.getByRole('tab', { name: /upload/i })).toBeInTheDocument()
expect(screen.getByRole('tab', { name: /system audio/i })).toBeInTheDocument()
expect(screen.getByRole('tab', { name: /listen mic/i })).toBeInTheDocument()
})
it('default source is Upload, showing VideoUpload', () => {
render(<LTTPage />)
expect(screen.getByTestId('video-dropzone')).toBeInTheDocument()
expect(screen.queryByText('Start Capture')).not.toBeInTheDocument()
expect(screen.queryByText('Start Listening')).not.toBeInTheDocument()
})
it('switching to System Audio tab renders SystemAudioCapture', () => {
render(<LTTPage />)
fireEvent.click(screen.getByRole('tab', { name: /system audio/i }))
expect(screen.getByRole('button', { name: /start capture/i })).toBeInTheDocument()
expect(screen.queryByTestId('video-dropzone')).not.toBeInTheDocument()
expect(screen.queryByText('Start Listening')).not.toBeInTheDocument()
})
it('switching to Listen Mic tab renders MicCapture', () => {
render(<LTTPage />)
fireEvent.click(screen.getByRole('tab', { name: /listen mic/i }))
expect(screen.getByRole('button', { name: /start listening/i })).toBeInTheDocument()
expect(screen.queryByTestId('video-dropzone')).not.toBeInTheDocument()
expect(screen.queryByText('Start Capture')).not.toBeInTheDocument()
})
it('System Audio tab shows capture UI, not Full Transcript button', () => {
render(<LTTPage />)
fireEvent.click(screen.getByRole('tab', { name: /system audio/i }))
expect(screen.getByRole('button', { name: /start capture/i })).toBeInTheDocument()
expect(screen.queryByRole('button', { name: /full transcript/i })).not.toBeInTheDocument()
})
it('Listen Mic tab shows listening UI, not Full Transcript button', () => {
render(<LTTPage />)
fireEvent.click(screen.getByRole('tab', { name: /listen mic/i }))
expect(screen.getByRole('button', { name: /start listening/i })).toBeInTheDocument()
expect(screen.queryByRole('button', { name: /full transcript/i })).not.toBeInTheDocument()
})
it('Full Transcript button visible only in Upload tab', async () => {
const mockResponse = {
video_id: 'vid-456',
filename: 'test.mp4',
size_bytes: 1024,
url: 'http://localhost:8000/api/v1/video/vid-456',
}
mockMutate.mockImplementation((_vars: any, options?: any) => {
if (options?.onSuccess) {
options.onSuccess(mockResponse)
}
})
render(<LTTPage />)
const dropzone = screen.getByTestId('video-dropzone')
const file = new File(['dummy'], 'test.mp4', { type: 'video/mp4' })
fireEvent.drop(dropzone, {
dataTransfer: { files: [file] },
})
await waitFor(() => {
expect(screen.getByRole('button', { name: /full transcript/i })).toBeInTheDocument()
})
fireEvent.click(screen.getByRole('tab', { name: /system audio/i }))
expect(screen.queryByRole('button', { name: /full transcript/i })).not.toBeInTheDocument()
fireEvent.click(screen.getByRole('tab', { name: /upload/i }))
await waitFor(() => {
expect(screen.getByRole('button', { name: /full transcript/i })).toBeInTheDocument()
})
})
it('QueryInput receives transcript text from active ASR source', async () => {
mockSystemAudioPartialTranscript = 'sys audio partial'
render(<LTTPage />)
fireEvent.click(screen.getByRole('tab', { name: /system audio/i }))
await waitFor(() => {
const textarea = screen.getByPlaceholderText('Ask a question about your documents...')
expect(textarea).toHaveValue('sys audio partial')
})
})
})

View File

@ -0,0 +1,69 @@
import { describe, it, expect, vi } from 'vitest'
import { render, screen, fireEvent } from '@testing-library/react'
import { MicCapture } from '../components/MicCapture'
import type { MicStatus } from '../types'
const defaultProps = {
status: 'idle' as MicStatus,
error: null as string | null,
onStart: vi.fn(),
onStop: vi.fn(),
}
describe('MicCapture', () => {
it('renders "Start Listening" button with Mic icon in idle state', () => {
render(<MicCapture {...defaultProps} />)
expect(screen.getByRole('button', { name: /start listening/i })).toBeInTheDocument()
const svg = screen.getByRole('button', { name: /start listening/i }).querySelector('svg')
expect(svg).toBeInTheDocument()
})
it('does not render compatibility warning in idle state', () => {
render(<MicCapture {...defaultProps} />)
expect(screen.queryByText(/chrome\/edge/i)).not.toBeInTheDocument()
expect(screen.queryByText(/firefox/i)).not.toBeInTheDocument()
})
it('calls onStart when Start Listening button is clicked', () => {
const onStart = vi.fn()
render(<MicCapture {...defaultProps} onStart={onStart} />)
fireEvent.click(screen.getByRole('button', { name: /start listening/i }))
expect(onStart).toHaveBeenCalledTimes(1)
})
it('renders "Waiting for microphone permission..." with spinner in requesting state', () => {
render(<MicCapture {...defaultProps} status="requesting" />)
expect(screen.getByText(/waiting for microphone permission/i)).toBeInTheDocument()
const spinner = document.querySelector('.animate-spin')
expect(spinner).toBeInTheDocument()
})
it('renders "Stop Listening" button, pulsing green dot, and "Listening..." in listening state', () => {
render(<MicCapture {...defaultProps} status="listening" />)
expect(screen.getByRole('button', { name: /stop listening/i })).toBeInTheDocument()
expect(screen.getByText('Listening...')).toBeInTheDocument()
const dot = document.querySelector('.animate-pulse')
expect(dot).toBeInTheDocument()
})
it('renders "Stopping..." disabled button in stopping state', () => {
render(<MicCapture {...defaultProps} status="stopping" />)
const btn = screen.getByRole('button', { name: /stopping/i })
expect(btn).toBeDisabled()
})
it('renders error message in red box and "Try Again" button in error state', () => {
render(<MicCapture {...defaultProps} status="error" error="Mic not found" />)
expect(screen.getByText('Mic not found')).toBeInTheDocument()
const redBox = screen.getByText('Mic not found').closest('[class*="red"]')
expect(redBox).toBeInTheDocument()
expect(screen.getByRole('button', { name: /try again/i })).toBeInTheDocument()
})
it('calls onStart when "Try Again" is clicked in error state', () => {
const onStart = vi.fn()
render(<MicCapture {...defaultProps} status="error" error="fail" onStart={onStart} />)
fireEvent.click(screen.getByRole('button', { name: /try again/i }))
expect(onStart).toHaveBeenCalledTimes(1)
})
})

View File

@ -0,0 +1,77 @@
import { describe, it, expect, vi } from 'vitest'
import { render, screen, fireEvent } from '@testing-library/react'
import { SystemAudioCapture } from '../components/SystemAudioCapture'
import type { SystemAudioStatus } from '../types'
const defaultProps = {
status: 'idle' as SystemAudioStatus,
error: null as string | null,
onStart: vi.fn(),
onStop: vi.fn(),
}
describe('SystemAudioCapture', () => {
it('renders "Start Capture" button with MonitorSpeaker icon in idle state', () => {
render(<SystemAudioCapture {...defaultProps} />)
expect(screen.getByRole('button', { name: /start capture/i })).toBeInTheDocument()
const svg = screen.getByRole('button', { name: /start capture/i }).querySelector('svg')
expect(svg).toBeInTheDocument()
})
it('renders browser compatibility warning in idle state', () => {
render(<SystemAudioCapture {...defaultProps} />)
expect(screen.getByText(/chrome\/edge/i)).toBeInTheDocument()
const amberBox = screen.getByText(/chrome\/edge/i).closest('[class*="amber"]')
expect(amberBox).toBeInTheDocument()
})
it('calls onStart when Start Capture button is clicked', () => {
const onStart = vi.fn()
render(<SystemAudioCapture {...defaultProps} onStart={onStart} />)
fireEvent.click(screen.getByRole('button', { name: /start capture/i }))
expect(onStart).toHaveBeenCalledTimes(1)
})
it('renders "Waiting for permission..." with spinner in requesting state', () => {
render(<SystemAudioCapture {...defaultProps} status="requesting" />)
expect(screen.getByText(/waiting for permission/i)).toBeInTheDocument()
const spinner = document.querySelector('.animate-spin')
expect(spinner).toBeInTheDocument()
})
it('renders "Stop Capture" button, pulsing green dot, and "Capturing system audio..." in capturing state', () => {
render(<SystemAudioCapture {...defaultProps} status="capturing" />)
expect(screen.getByRole('button', { name: /stop capture/i })).toBeInTheDocument()
expect(screen.getByText(/capturing system audio/i)).toBeInTheDocument()
const dot = document.querySelector('.animate-pulse')
expect(dot).toBeInTheDocument()
})
it('renders "Stopping..." disabled button in stopping state', () => {
render(<SystemAudioCapture {...defaultProps} status="stopping" />)
const btn = screen.getByRole('button', { name: /stopping/i })
expect(btn).toBeDisabled()
})
it('renders error message in red box and "Try Again" button in error state', () => {
render(<SystemAudioCapture {...defaultProps} status="error" error="Permission denied" />)
expect(screen.getByText('Permission denied')).toBeInTheDocument()
const redBox = screen.getByText('Permission denied').closest('[class*="red"]')
expect(redBox).toBeInTheDocument()
expect(screen.getByRole('button', { name: /try again/i })).toBeInTheDocument()
})
it('calls onStart when "Try Again" is clicked in error state', () => {
const onStart = vi.fn()
render(<SystemAudioCapture {...defaultProps} status="error" error="fail" onStart={onStart} />)
fireEvent.click(screen.getByRole('button', { name: /try again/i }))
expect(onStart).toHaveBeenCalledTimes(1)
})
it('calls onStop when "Stop Capture" is clicked in capturing state', () => {
const onStop = vi.fn()
render(<SystemAudioCapture {...defaultProps} status="capturing" onStop={onStop} />)
fireEvent.click(screen.getByRole('button', { name: /stop capture/i }))
expect(onStop).toHaveBeenCalledTimes(1)
})
})

View File

@ -0,0 +1,166 @@
/**
* Phase 4 tests: useMicASR hook state management.
*
* Mocks useMediaStreamASR (the shared pipeline) and navigator.mediaDevices.getUserMedia.
* Does NOT test AudioContext/WebSocket internals those belong to useMediaStreamASR tests.
*/
import { renderHook, act } from '@testing-library/react'
import { describe, it, expect, vi, beforeEach } from 'vitest'
import type { MicStatus } from '../types'
const mockStart = vi.fn()
const mockStop = vi.fn()
vi.mock('../hooks/useMediaStreamASR', () => ({
useMediaStreamASR: () => ({
status: 'idle',
transcript: '',
partialTranscript: '',
error: null,
start: mockStart,
stop: mockStop,
}),
}))
import { useMicASR } from '../hooks/useMicASR'
function makeMicStream(audioTracks = 1) {
const audio = Array.from({ length: audioTracks }, () => ({ stop: vi.fn(), kind: 'audio' }))
return {
getAudioTracks: () => audio,
getTracks: () => audio,
} as unknown as MediaStream
}
function setGetUserMedia(fn: () => Promise<MediaStream>) {
Object.defineProperty(globalThis.navigator, 'mediaDevices', {
value: { getUserMedia: vi.fn(fn) },
writable: true,
configurable: true,
})
}
beforeEach(() => {
vi.clearAllMocks()
setGetUserMedia(async () => makeMicStream())
})
describe('useMicASR', () => {
it('test_initial_state', () => {
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
expect(result.current.status).toBe<MicStatus>('idle')
expect(result.current.transcript).toBe('')
expect(result.current.partialTranscript).toBe('')
expect(result.current.error).toBeNull()
})
it('test_returns_startListening_and_stopListening', () => {
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
expect(typeof result.current.startListening).toBe('function')
expect(typeof result.current.stopListening).toBe('function')
})
it('test_startListening_calls_getUserMedia_with_audio_constraints', async () => {
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startListening()
})
expect(navigator.mediaDevices.getUserMedia).toHaveBeenCalledWith({
audio: { echoCancellation: false, noiseSuppression: false, autoGainControl: false },
video: false,
})
})
it('test_startListening_sets_status_to_requesting_then_listening', async () => {
setGetUserMedia(async () => makeMicStream())
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
act(() => {
result.current.startListening()
})
expect(result.current.status).toBe<MicStatus>('requesting')
await act(async () => {
await new Promise((r) => setTimeout(r, 0))
})
expect(navigator.mediaDevices.getUserMedia).toHaveBeenCalled()
expect(mockStart).toHaveBeenCalledTimes(1)
expect(mockStart.mock.calls[0][0].getAudioTracks()).toHaveLength(1)
})
it('test_startListening_NotAllowedError_sets_idle', async () => {
setGetUserMedia(async () => { throw new DOMException('Denied', 'NotAllowedError') })
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startListening()
})
expect(result.current.status).toBe<MicStatus>('idle')
expect(result.current.error).toMatch(/Microphone access denied/)
})
it('test_startListening_NotFoundError_sets_error', async () => {
setGetUserMedia(async () => { throw new DOMException('No device', 'NotFoundError') })
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startListening()
})
expect(result.current.status).toBe<MicStatus>('error')
expect(result.current.error).toMatch(/No microphone found/)
})
it('test_startListening_NotSupportedError_sets_error', async () => {
setGetUserMedia(async () => { throw new DOMException('Not supported', 'NotSupportedError') })
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startListening()
})
expect(result.current.status).toBe<MicStatus>('error')
expect(result.current.error).toMatch(/not supported/)
})
it('test_startListening_no_audio_track_sets_error', async () => {
setGetUserMedia(async () => makeMicStream(0))
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startListening()
})
expect(result.current.status).toBe<MicStatus>('error')
expect(result.current.error).toMatch(/No microphone input/)
})
it('test_stopListening_calls_pipeline_stop', () => {
const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
act(() => {
result.current.stopListening()
})
expect(mockStop).toHaveBeenCalledTimes(1)
})
it('test_cleanup_on_unmount', () => {
const { unmount } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
expect(() => {
unmount()
}).not.toThrow()
})
})

View File

@ -0,0 +1,181 @@
/**
* Phase 4 tests: useSystemAudioASR hook state management.
*
* Mocks useMediaStreamASR (the shared pipeline) and navigator.mediaDevices.getDisplayMedia.
* Does NOT test AudioContext/WebSocket internals those belong to useMediaStreamASR tests.
*/
import { renderHook, act } from '@testing-library/react'
import { describe, it, expect, vi, beforeEach } from 'vitest'
import type { SystemAudioStatus } from '../types'
const mockStart = vi.fn()
const mockStop = vi.fn()
vi.mock('../hooks/useMediaStreamASR', () => ({
useMediaStreamASR: () => ({
status: 'idle',
transcript: '',
partialTranscript: '',
error: null,
start: mockStart,
stop: mockStop,
}),
}))
import { useSystemAudioASR } from '../hooks/useSystemAudioASR'
function makeStream(audioTracks = 1) {
const audio = Array.from({ length: audioTracks }, () => ({ stop: vi.fn(), kind: 'audio' }))
const video = [{ stop: vi.fn(), kind: 'video' }]
return {
getAudioTracks: () => audio,
getVideoTracks: () => video,
getTracks: () => [...audio, ...video],
} as unknown as MediaStream
}
function setGetDisplayMedia(fn: () => Promise<MediaStream>) {
Object.defineProperty(globalThis.navigator, 'mediaDevices', {
value: { getDisplayMedia: vi.fn(fn) },
writable: true,
configurable: true,
})
}
beforeEach(() => {
vi.clearAllMocks()
setGetDisplayMedia(async () => makeStream())
})
describe('useSystemAudioASR', () => {
it('test_initial_state', () => {
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
expect(result.current.status).toBe<SystemAudioStatus>('idle')
expect(result.current.transcript).toBe('')
expect(result.current.partialTranscript).toBe('')
expect(result.current.error).toBeNull()
})
it('test_returns_startCapture_and_stopCapture', () => {
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
expect(typeof result.current.startCapture).toBe('function')
expect(typeof result.current.stopCapture).toBe('function')
})
it('test_startCapture_calls_getDisplayMedia_with_correct_constraints', async () => {
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startCapture()
})
expect(navigator.mediaDevices.getDisplayMedia).toHaveBeenCalledWith({
video: true,
audio: { systemAudio: 'include', echoCancellation: false, noiseSuppression: false, autoGainControl: false },
})
})
it('test_startCapture_sets_status_to_requesting_then_capturing', async () => {
setGetDisplayMedia(async () => makeStream())
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
act(() => {
result.current.startCapture()
})
expect(result.current.status).toBe<SystemAudioStatus>('requesting')
await act(async () => {
await new Promise((r) => setTimeout(r, 0))
})
expect(navigator.mediaDevices.getDisplayMedia).toHaveBeenCalled()
expect(mockStart).toHaveBeenCalledTimes(1)
expect(mockStart.mock.calls[0][0].getAudioTracks()).toHaveLength(1)
})
it('test_startCapture_AbortError_sets_idle', async () => {
setGetDisplayMedia(async () => { throw new DOMException('User abort', 'AbortError') })
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startCapture()
})
expect(result.current.status).toBe<SystemAudioStatus>('idle')
expect(result.current.error).toMatch(/Permission denied/)
})
it('test_startCapture_NotAllowedError_sets_idle', async () => {
setGetDisplayMedia(async () => { throw new DOMException('Denied', 'NotAllowedError') })
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startCapture()
})
expect(result.current.status).toBe<SystemAudioStatus>('idle')
expect(result.current.error).toMatch(/Permission denied/)
})
it('test_startCapture_NotSupportedError_sets_error_with_platform_warning', async () => {
setGetDisplayMedia(async () => { throw new DOMException('Not supported', 'NotSupportedError') })
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startCapture()
})
expect(result.current.status).toBe<SystemAudioStatus>('error')
expect(result.current.error).toMatch(/not supported on this platform/)
})
it('test_startCapture_no_audio_track_sets_error', async () => {
setGetDisplayMedia(async () => makeStream(0))
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
await act(async () => {
await result.current.startCapture()
})
expect(result.current.status).toBe<SystemAudioStatus>('error')
expect(result.current.error).toMatch(/No audio track/)
})
it('test_stopCapture_calls_pipeline_stop', () => {
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
act(() => {
result.current.stopCapture()
})
expect(mockStop).toHaveBeenCalledTimes(1)
})
it('test_cleanup_on_unmount', () => {
const { unmount } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
expect(() => {
unmount()
}).not.toThrow()
})
it('test_rapid_start_stop_cycles', async () => {
const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
for (let i = 0; i < 3; i++) {
setGetDisplayMedia(async () => makeStream())
await act(async () => { await result.current.startCapture() })
act(() => { result.current.stopCapture() })
}
expect(mockStart).toHaveBeenCalledTimes(3)
expect(mockStop).toHaveBeenCalledTimes(3)
})
})