diff --git a/.plans/phase4_system_audio_plan.md b/.plans/phase4_system_audio_plan.md
index cad664c..bf0508c 100644
--- a/.plans/phase4_system_audio_plan.md
+++ b/.plans/phase4_system_audio_plan.md
@@ -1,8 +1,8 @@
# Phase 4: System Audio & Mic Capture → ASR → RAG - Implementation Plan
**Created:** 2026-05-09
-**Updated:** 2026-05-14
-**Status:** Draft (Not Started)
+**Updated:** 2026-05-15
+**Status:** ✅ Complete
**Depends on:** Phase 1 (Complete), Phase 2 (Complete), Phase 3 (Complete)
---
@@ -210,15 +210,15 @@ const asr = source === 'system-audio' ? systemAudioASR
| Sub-Phase | Description | Effort | Depends On | Status |
|-----------|-------------|--------|------------|--------|
-| 4.1 | Config & Infrastructure | 0.5 day | - | Draft |
-| 4.2 | System Audio Capture Hook (`useSystemAudioASR`) | 1 day | 4.1 | Draft |
-| 4.3 | SystemAudioCapture UI Component | 0.5 day | 4.2 | Draft |
-| 4.4 | Mic Capture Hook (`useMicASR`) | 0.5 day | 4.1 | Draft |
-| 4.5 | MicCapture UI Component | 0.5 day | 4.4 | Draft |
-| 4.6 | LTTPage Integration (all 3 sources) | 0.5 day | 4.2, 4.3, 4.4, 4.5 | Draft |
-| 4.7 | Backend Adjustments | 0.5 day | 4.1 | Draft |
-| 4.8 | Integration & Acceptance Tests | 1 day | 4.6, 4.7 | Draft |
-| 4.9 | Polish & Documentation | 0.5 day | 4.8 | Draft |
+| 4.1 | Config & Infrastructure | 0.5 day | - | ✅ Complete |
+| 4.2 | System Audio Capture Hook (`useSystemAudioASR`) | 1 day | 4.1 | ✅ Complete |
+| 4.3 | SystemAudioCapture UI Component | 0.5 day | 4.2 | ✅ Complete |
+| 4.4 | Mic Capture Hook (`useMicASR`) | 0.5 day | 4.1 | ✅ Complete |
+| 4.5 | MicCapture UI Component | 0.5 day | 4.4 | ✅ Complete |
+| 4.6 | LTTPage Integration (all 3 sources) | 0.5 day | 4.2, 4.3, 4.4, 4.5 | ✅ Complete |
+| 4.7 | Backend Adjustments | 0.5 day | 4.1 | ✅ Complete |
+| 4.8 | Integration & Acceptance Tests | 1 day | 4.6, 4.7 | ✅ Complete |
+| 4.9 | Polish & Documentation | 0.5 day | 4.8 | ✅ Complete |
| **Total** | | **5.5 days** | | |
### Phase 4.1 - Config & Infrastructure (0.5 day)
@@ -649,5 +649,5 @@ README.md - add System Audio + Listen
**File Information**
- Filename: `phase4_system_audio_plan.md`
- Created: 2026-05-09
-- Updated: 2026-05-14 - Added Listen Mic as third source; removed YouTube
-- Status: Draft - awaiting review before Phase 4.1 implementation begins
+- Updated: 2026-05-15 - Phase 4 complete (all 9 sub-phases)
+- Status: Complete - 46 frontend tests + 14 backend tests passing, production build verified
diff --git a/README.md b/README.md
index 4a4fa45..7810c85 100644
--- a/README.md
+++ b/README.md
@@ -244,6 +244,63 @@ Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline
- `ffmpeg` on server (for batch transcription)
- `dashscope` Python package (in `requirements.txt`)
+### System Audio Capture & Listen Mic (Phase 4)
+
+Two additional live audio sources alongside video Upload:
+
+#### System Audio Capture
+
+Captures audio output from any application on your computer (browser tab, Spotify, Zoom) via `getDisplayMedia()`.
+
+**How to use:**
+1. Select the **"System Audio"** tab in the LTTPage source selector
+2. Click **"Start Capture"**
+3. Choose a browser tab or window in the permission dialog, and make sure **"Share audio"** is checked
+4. Real-time Cantonese ASR transcription flows into the QueryInput
+5. Edit the transcript while capturing continues, then submit your query
+
+**Use cases:** Transcribing YouTube videos, podcasts, lectures, or meetings playing on your computer without downloading files.
+
+#### Listen Mic
+
+Captures microphone input via `getUserMedia()`.
+
+**How to use:**
+1. Select the **"Listen Mic"** tab
+2. Click **"Start Listening"**
+3. Allow microphone access when prompted
+4. Speak; real-time transcription flows into QueryInput
+5. Edit transcript while listening, then submit your query
+
+**Use cases:** Recording live meetings, dictating questions verbally, transcribing spoken Cantonese in real time.
+
+#### Browser Compatibility
+
+**System Audio (`getDisplayMedia`):**
+
+| Platform / Browser | Tab Audio | System Audio | Supported |
+|--------------------|-----------|-------------|-----------|
+| Chrome/Edge (Windows) | ✅ | ✅ | **Full support** |
+| Chrome/Edge (macOS 14.2+) | ✅ | ✅ | **Full support** |
+| Chrome/Edge (Linux) | ✅ | ❌ | Tab audio only |
+| Firefox | ❌ | ❌ | Not supported |
+| Safari | ❌ | ❌ | Not supported |
+
+**Listen Mic (`getUserMedia`):** Universally supported in all modern browsers (Chrome, Firefox, Safari, Edge).
+
+#### Limitations
+- System Audio capture requires Chrome or Edge (Chromium-based browsers)
+- No "Full Transcript" button: streaming ASR only (no batch transcription for live sources)
+- `getDisplayMedia()` always shows a screen/tab picker even for audio-only capture (browser limitation)
+- Each capture session generates a new UUID; the WebSocket reconnects on every Start/Stop
+
+#### Configuration
+```bash
+# In backend/.env: feature toggles (default: true)
+SYSTEM_AUDIO_ENABLED=true
+MIC_ENABLED=true
+```
+
### Installing ffmpeg
```bash
diff --git a/backend/.env.example b/backend/.env.example
index 175d7b0..c6466db 100644
--- a/backend/.env.example
+++ b/backend/.env.example
@@ -36,3 +36,8 @@ ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime
# Video upload (Phase 2)
VIDEO_UPLOAD_DIR=./uploads
MAX_VIDEO_SIZE_MB=300
+
+# Live audio capture toggles (Phase 4)
+# Set to false to disable System Audio or Listen Mic capture
+SYSTEM_AUDIO_ENABLED=true
+MIC_ENABLED=true
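
Reviewer note: the settings class in `app.core.config` is not part of this diff, so the following is only a minimal sketch of how the two toggles above could be read, assuming plain environment variables and a default of true as stated in the README. `read_bool_env` and `LiveCaptureToggles` are hypothetical names, not the project's actual API.

```python
import os
from dataclasses import dataclass


def read_bool_env(name: str, default: bool = True) -> bool:
    """Parse a boolean environment variable; unset or empty falls back to default."""
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class LiveCaptureToggles:
    """Hypothetical container for the Phase 4 feature toggles."""

    system_audio_enabled: bool
    mic_enabled: bool

    @classmethod
    def from_env(cls) -> "LiveCaptureToggles":
        # Variable names come from .env.example; both default to enabled.
        return cls(
            system_audio_enabled=read_bool_env("SYSTEM_AUDIO_ENABLED"),
            mic_enabled=read_bool_env("MIC_ENABLED"),
        )
```

With `SYSTEM_AUDIO_ENABLED=false` and `MIC_ENABLED` unset, `LiveCaptureToggles.from_env()` would report system audio as disabled and the microphone as enabled.
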
diff --git a/backend/app/test/acceptance/test_acceptance_phase4.py b/backend/app/test/acceptance/test_acceptance_phase4.py
new file mode 100644
index 0000000..6bc821b
--- /dev/null
+++ b/backend/app/test/acceptance/test_acceptance_phase4.py
@@ -0,0 +1,105 @@
+"""Acceptance tests: Phase 4 System Audio and Mic Capture with real DashScope ASR.
+
+Prerequisites:
+- ChromaDB running (local or docker)
+- .env configured with valid DASHSCOPE_API_KEY
+- SYSTEM_AUDIO_ENABLED=true and MIC_ENABLED=true in .env
+- Chrome/Edge browser for system audio capture tests
+- Working microphone for mic capture tests
+
+These tests require manual user interaction (browser permission dialogs).
+Run with: pytest app/test/acceptance/test_acceptance_phase4.py -v -m acceptance
+"""
+import pytest
+
+
+@pytest.mark.acceptance
+@pytest.mark.slow
+@pytest.mark.skip(reason="Requires real browser with getDisplayMedia support")
+def test_real_system_audio_capture_connects():
+ """Verify WebSocket endpoint accepts system-audio source with real DashScope ASR.
+
+ Manual steps:
+ 1. Start backend with SYSTEM_AUDIO_ENABLED=true and valid DASHSCOPE_API_KEY
+ 2. Open frontend in Chrome/Edge
+ 3. Select the 'System Audio' tab and click 'Start Capture'
+ 4. Browser prompts for screen/tab share; select a tab playing audio
+ 5. Verify WebSocket connects at /ws/asr/{video_id}?source=system-audio
+ 6. Verify ASR transcript events appear in real time
+ 7. Verify no 'System audio capture is disabled' error
+ """
+ pass
+
+
+@pytest.mark.acceptance
+@pytest.mark.slow
+@pytest.mark.skip(reason="Requires real microphone and browser permission")
+def test_real_mic_capture_connects():
+ """Verify WebSocket endpoint accepts mic source with real DashScope ASR.
+
+ Manual steps:
+ 1. Start backend with MIC_ENABLED=true and valid DASHSCOPE_API_KEY
+ 2. Open frontend in Chrome/Edge
+ 3. Select the 'Listen Mic' tab and click 'Start Listening'
+ 4. Browser prompts for microphone permission; grant it
+ 5. Verify WebSocket connects at /ws/asr/{video_id}?source=mic
+ 6. Speak into microphone and verify ASR transcript events appear
+ 7. Verify no 'Microphone capture is disabled' error
+ """
+ pass
+
+
+@pytest.mark.acceptance
+@pytest.mark.slow
+@pytest.mark.skip(reason="Requires real DashScope ASR and browser audio capture")
+def test_real_dashscope_asr_with_system_audio():
+ """Real DashScope ASR processes system audio PCM stream.
+
+ Manual steps:
+ 1. Start backend with SYSTEM_AUDIO_ENABLED=true
+ 2. Play a Cantonese audio/video clip in a browser tab
+ 3. Start system audio capture targeting that tab
+ 4. Verify DashScope OmniRealtimeConversation receives PCM chunks
+ 5. Verify transcript sentences appear with sentence-begin/sentence-end events
+ 6. Verify accumulated transcript text is non-empty
+ 7. Check backend logs for 'dashscope-session-closed' with text_len > 0
+ """
+ pass
+
+
+@pytest.mark.acceptance
+@pytest.mark.slow
+@pytest.mark.skip(reason="Requires real DashScope ASR and microphone")
+def test_real_dashscope_asr_with_microphone():
+ """Real DashScope ASR processes microphone PCM stream.
+
+ Manual steps:
+ 1. Start backend with MIC_ENABLED=true
+ 2. Start microphone capture in frontend
+ 3. Speak a clear Cantonese sentence (e.g. '今日天氣好好')
+ 4. Verify DashScope returns transcription results
+ 5. Verify transcript text matches spoken content (allow partial match)
+ 6. Verify Traditional Chinese conversion applied (if _to_traditional active)
+ 7. Check backend logs for 'dashscope-session-closed' with text_len > 0
+ """
+ pass
+
+
+@pytest.mark.acceptance
+@pytest.mark.slow
+@pytest.mark.skip(reason="Full end-to-end requires browser + ASR + LLM")
+def test_end_to_end_capture_to_rag():
+ """End-to-end: capture system audio -> ASR transcript -> RAG query.
+
+ Manual steps:
+ 1. Ingest relevant documents via /api/v1/ingest
+ 2. Start backend with SYSTEM_AUDIO_ENABLED=true, valid DASHSCOPE_API_KEY,
+ and valid LLM_BASE_URL/LLM_API_KEY
+ 3. Open frontend, start system audio capture on a tab with relevant audio
+ 4. Let ASR produce a transcript in the query input
+ 5. Click 'Ask' to trigger RAG query
+ 6. Verify SSE stream: decomposed -> retrieving -> filtering -> generating -> completed
+ 7. Verify final answer contains bullet points with source citations
+ 8. Verify sources reference ingested documents
+ """
+ pass
diff --git a/backend/app/test/test_integration_phase4.py b/backend/app/test/test_integration_phase4.py
new file mode 100644
index 0000000..b65ee7a
--- /dev/null
+++ b/backend/app/test/test_integration_phase4.py
@@ -0,0 +1,76 @@
+"""Phase 4 integration tests: System Audio and Mic capture WebSocket endpoint.
+
+Covers:
+- WebSocket handshake with UUID-format video_id (no actual video file)
+- source=system-audio connection accepted with language param
+- source=mic connection accepted with language param
+- Config toggles disable both system-audio and mic features
+
+Uses FastAPI TestClient with real router. Only external DashScope ASR is
+implicitly avoided (client disconnects before proxy call completes).
+"""
+import uuid
+
+import pytest
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+
+
+@pytest.fixture
+def phase4_integration_app(monkeypatch):
+ monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
+ monkeypatch.setenv("SYSTEM_AUDIO_ENABLED", "true")
+ monkeypatch.setenv("MIC_ENABLED", "true")
+ from app.core.config import get_settings
+ from app.routers.ws_asr import router
+ get_settings.cache_clear()
+ app = FastAPI()
+ app.include_router(router)
+ return app
+
+
+def test_websocket_accepts_uuid_video_id(phase4_integration_app):
+ """WebSocket connects with a proper UUID video_id; no actual video file needed."""
+ video_uuid = str(uuid.uuid4())
+ client = TestClient(phase4_integration_app)
+ with client.websocket_connect(f"/ws/asr/{video_uuid}") as ws:
+ pass
+
+
+def test_websocket_accepts_system_audio_source(phase4_integration_app):
+ """WebSocket with source=system-audio and language=en connects for audio processing."""
+ client = TestClient(phase4_integration_app)
+ with client.websocket_connect(
+ "/ws/asr/integ-test-vid?source=system-audio&language=en"
+ ) as ws:
+ pass
+
+
+def test_websocket_accepts_mic_source(phase4_integration_app):
+ """WebSocket with source=mic and language=zh connects successfully."""
+ client = TestClient(phase4_integration_app)
+ with client.websocket_connect(
+ "/ws/asr/integ-test-vid?source=mic&language=zh"
+ ) as ws:
+ pass
+
+
+def test_config_toggles_disable_features(monkeypatch):
+ """When both toggles disabled, system-audio and mic sources return error messages."""
+ monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
+ monkeypatch.setenv("SYSTEM_AUDIO_ENABLED", "false")
+ monkeypatch.setenv("MIC_ENABLED", "false")
+ from app.core.config import get_settings
+ from app.routers.ws_asr import router
+ get_settings.cache_clear()
+ app = FastAPI()
+ app.include_router(router)
+ client = TestClient(app)
+
+ with client.websocket_connect("/ws/asr/vid-1?source=system-audio") as ws:
+ data = ws.receive_json()
+ assert "disabled" in data.get("error", "").lower()
+
+ with client.websocket_connect("/ws/asr/vid-2?source=mic") as ws:
+ data = ws.receive_json()
+ assert "disabled" in data.get("error", "").lower()
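
Reviewer note: `test_config_toggles_disable_features` only checks that the first JSON message carries an `error` field containing "disabled". The `ws_asr` router itself is not shown in this diff, so the following is a hedged sketch of the kind of gate that would satisfy the test. `disabled_error` is a hypothetical helper; the two message strings are taken from the acceptance-test docstrings above, and the `source` values from the query strings used in the tests.

```python
from typing import Optional

# Messages quoted in the acceptance-test docstrings in this diff.
DISABLED_MESSAGES = {
    "system-audio": "System audio capture is disabled",
    "mic": "Microphone capture is disabled",
}


def disabled_error(
    source: Optional[str],
    system_audio_enabled: bool,
    mic_enabled: bool,
) -> Optional[dict]:
    """Return an error payload if the requested live source is toggled off.

    Returns None when the source is enabled, or when no live source was
    requested (for example the plain video-upload path with no `source` param).
    """
    enabled = {"system-audio": system_audio_enabled, "mic": mic_enabled}
    if source in enabled and not enabled[source]:
        return {"error": DISABLED_MESSAGES[source]}
    return None
```

A handler could send this payload with `send_json` right after accepting the connection and then close it, which is the sequence the test's `receive_json()` call appears to rely on.
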
diff --git a/development_plan.md b/development_plan.md
index 53df556..6b1e1ec 100644
--- a/development_plan.md
+++ b/development_plan.md
@@ -143,11 +143,14 @@ User Question
| Phase 1 Frontend | 2-3 days | UI layout + text query flow | ✅ Complete |
| Phase 2 Backend | 4-5 days | Video upload + WebSocket ASR + question extraction | ✅ Complete |
| Phase 2 Frontend | 3-4 days | Video player + live transcript + auto/manual flow | ✅ Complete |
+| Phase 4 System Audio & Mic | 5.5 days | System Audio capture + Listen Mic + real-time ASR → RAG | ✅ Complete |
| Testing & Polish | 1-2 days | End-to-end testing + deployment scripts | ⬜ Pending |
-**Total Estimated Effort**: 13-17 developer days (2-3 weeks)
+**Total Estimated Effort**: 19-23 developer days (3-4 weeks)
> **Note:** Phase 3 (YouTube Live Stream Proxy → ASR) was implemented (5.5 days, 7 sub-phases) and later reverted in favor of Phase 4's more versatile System Audio Capture approach using `getDisplayMedia()`.
+>
+> **Phase 4** adds System Audio Capture (`getDisplayMedia`) and Listen Mic (`getUserMedia`) as live audio sources alongside video Upload. Both pipe audio through the existing WebSocket → DashScope realtime ASR → RAG pipeline. Implementation complete with 46 frontend + 14 backend tests. See `.plans/phase4_system_audio_plan.md` for details.
---
@@ -167,4 +170,4 @@ User Question
**File Information**
- Filename: `development_plan.md`
- Last Updated: May 2026
-- Status: Phase 1 ✅, Phase 2 ✅ - Phase 4 (System Audio Capture) up next, Phase 3 removed
+- Status: Phase 1 ✅, Phase 2 ✅, Phase 4 ✅ - System Audio Capture & Listen Mic complete
diff --git a/frontend/src/lib/browser.ts b/frontend/src/lib/browser.ts
new file mode 100644
index 0000000..f83d683
--- /dev/null
+++ b/frontend/src/lib/browser.ts
@@ -0,0 +1,36 @@
+/**
+ * Browser compatibility detection utilities.
+ * Used for showing/hiding compatibility warnings in capture UI components.
+ */
+
+/**
+ * Checks whether system audio capture (getDisplayMedia with audio) is supported.
+ * Requires a Chromium-based browser (Chrome/Edge); Firefox and Safari do not support
+ * capturing system audio via the Screen Capture API.
+ *
+ * @returns true if the browser likely supports system audio capture
+ */
+export function isSystemAudioSupported(): boolean {
+ if (typeof navigator === 'undefined') return false
+
+ const ua = navigator.userAgent
+
+ // Firefox does not support capturing audio with getDisplayMedia
+ if (ua.includes('Firefox')) return false
+
+ // Safari does not support it either
+ if (ua.includes('Safari') && !ua.includes('Chrome') && !ua.includes('Chromium')) {
+ return false
+ }
+
+ // Mobile browsers don't support it
+ if (/Android|iPhone|iPad|iPod/i.test(ua)) return false
+
+ // Check for Chromium engine
+ if ('chrome' in window || ua.includes('Chrome') || ua.includes('Chromium') || ua.includes('Edg')) {
+ return true
+ }
+
+ // Default: assume not supported
+ return false
+}
diff --git a/frontend/src/test/test_phase4_LTTPage_integration.test.tsx b/frontend/src/test/test_phase4_LTTPage_integration.test.tsx
new file mode 100644
index 0000000..312f26c
--- /dev/null
+++ b/frontend/src/test/test_phase4_LTTPage_integration.test.tsx
@@ -0,0 +1,283 @@
+/**
+ * Phase 4 - LTTPage integration tests for System Audio and Listen Mic tabs.
+ *
+ * Coverage:
+ * - SourceSelector renders all 3 tabs
+ * - Default source is Upload (VideoUpload visible)
+ * - Switching to System Audio tab renders SystemAudioCapture
+ * - Switching to Listen Mic tab renders MicCapture
+ * - System Audio tab hides Full Transcript button
+ * - Listen Mic tab hides Full Transcript button
+ * - Full Transcript button only visible in Upload tab when video present
+ * - QueryInput receives partialText from active ASR source
+ */
+import React from 'react'
+import { render, screen, fireEvent, waitFor } from '@testing-library/react'
+import { describe, it, expect, vi, beforeEach } from 'vitest'
+import { LTTPage } from '../pages/LTTPage'
+
+const mockQueryStreamMutate = vi.fn()
+const mockMutate = vi.fn()
+const mockReset = vi.fn()
+const mockRequestFullTranscript = vi.fn()
+const mockStartStreaming = vi.fn()
+const mockStopStreaming = vi.fn()
+const mockStartCapture = vi.fn()
+const mockStopCapture = vi.fn()
+const mockStartListening = vi.fn()
+const mockStopListening = vi.fn()
+
+let mockQueryStreamPhase = 'idle'
+let mockQueryStreamExtractedQuestions: string[] | null = null
+
+let mockIsPending = false
+let mockIsError = false
+let mockError: Error | null = null
+let mockData: import('../types').VideoUploadResponse | null = null
+
+let mockASRTranscript = ''
+let mockASRPartialTranscript = ''
+let mockASRIsStreaming = false
+let mockASRStatus = 'idle'
+
+let mockFTFullTranscript = ''
+let mockFTIsLoading = false
+let mockFTError: string | null = null
+
+let mockSystemAudioStatus: import('../types').SystemAudioStatus = 'idle'
+let mockSystemAudioTranscript = ''
+let mockSystemAudioPartialTranscript = ''
+let mockSystemAudioError: string | null = null
+
+let mockMicStatus: import('../types').MicStatus = 'idle'
+let mockMicTranscript = ''
+let mockMicPartialTranscript = ''
+let mockMicError: string | null = null
+
+vi.mock('../lib/queries', () => ({
+ useQueryDocumentStream: () => ({
+ phase: mockQueryStreamPhase,
+ extractedQuestions: mockQueryStreamExtractedQuestions,
+ answer: null,
+ sources: null,
+ subQuestionSources: null,
+ historyId: null,
+ error: null,
+ mutate: mockQueryStreamMutate,
+ reset: vi.fn(),
+ }),
+ useVideoUpload: () => ({
+ mutate: mockMutate,
+ isPending: mockIsPending,
+ isError: mockIsError,
+ error: mockError,
+ data: mockData,
+ reset: mockReset,
+ }),
+}))
+
+vi.mock('../hooks/useVideoASR', () => ({
+ useVideoASR: () => ({
+ transcript: mockASRTranscript,
+ partialTranscript: mockASRPartialTranscript,
+ isStreaming: mockASRIsStreaming,
+ status: mockASRStatus,
+ startStreaming: mockStartStreaming,
+ stopStreaming: mockStopStreaming,
+ }),
+}))
+
+vi.mock('../hooks/useFullTranscript', () => ({
+ useFullTranscript: () => ({
+ fullTranscript: mockFTFullTranscript,
+ isLoading: mockFTIsLoading,
+ error: mockFTError,
+ requestFullTranscript: mockRequestFullTranscript,
+ }),
+}))
+
+vi.mock('../hooks/useSystemAudioASR', () => ({
+ useSystemAudioASR: () => ({
+ status: mockSystemAudioStatus,
+ transcript: mockSystemAudioTranscript,
+ partialTranscript: mockSystemAudioPartialTranscript,
+ error: mockSystemAudioError,
+ startCapture: mockStartCapture,
+ stopCapture: mockStopCapture,
+ }),
+}))
+
+vi.mock('../hooks/useMicASR', () => ({
+ useMicASR: () => ({
+ status: mockMicStatus,
+ transcript: mockMicTranscript,
+ partialTranscript: mockMicPartialTranscript,
+ error: mockMicError,
+ startListening: mockStartListening,
+ stopListening: mockStopListening,
+ }),
+}))
+
+vi.mock('../hooks/useMediaStreamASR', () => ({
+ useMediaStreamASR: () => ({
+ status: 'idle',
+ transcript: '',
+ partialTranscript: '',
+ error: null,
+ start: vi.fn(),
+ stop: vi.fn(),
+ }),
+}))
+
+vi.mock('../lib/api', () => ({
+ getVideoUrl: (videoId: string) => `http://localhost:8000/api/v1/video/${videoId}`,
+}))
+
+describe('LTTPage integration (Phase 4 - System Audio & Listen Mic tabs)', () => {
+ beforeEach(() => {
+ vi.clearAllMocks()
+ mockQueryStreamMutate.mockClear()
+ mockMutate.mockClear()
+ mockReset.mockClear()
+ mockRequestFullTranscript.mockClear()
+ mockStartCapture.mockClear()
+ mockStopCapture.mockClear()
+ mockStartListening.mockClear()
+ mockStopListening.mockClear()
+
+ mockQueryStreamPhase = 'idle'
+ mockQueryStreamExtractedQuestions = null
+
+ mockIsPending = false
+ mockIsError = false
+ mockError = null
+ mockData = null
+
+ mockASRTranscript = ''
+ mockASRPartialTranscript = ''
+ mockASRIsStreaming = false
+ mockASRStatus = 'idle'
+
+ mockFTFullTranscript = ''
+ mockFTIsLoading = false
+ mockFTError = null
+
+ mockSystemAudioStatus = 'idle'
+ mockSystemAudioTranscript = ''
+ mockSystemAudioPartialTranscript = ''
+ mockSystemAudioError = null
+
+ mockMicStatus = 'idle'
+ mockMicTranscript = ''
+ mockMicPartialTranscript = ''
+ mockMicError = null
+ })
+
+ it('renders SourceSelector with all 3 tabs', () => {
+ render(<LTTPage />)
+
+ const tablist = screen.getByRole('tablist')
+ expect(tablist).toBeInTheDocument()
+
+ const tabs = screen.getAllByRole('tab')
+ expect(tabs).toHaveLength(3)
+
+ expect(screen.getByRole('tab', { name: /upload/i })).toBeInTheDocument()
+ expect(screen.getByRole('tab', { name: /system audio/i })).toBeInTheDocument()
+ expect(screen.getByRole('tab', { name: /listen mic/i })).toBeInTheDocument()
+ })
+
+ it('default source is Upload, showing VideoUpload', () => {
+ render(<LTTPage />)
+
+ expect(screen.getByTestId('video-dropzone')).toBeInTheDocument()
+ expect(screen.queryByText('Start Capture')).not.toBeInTheDocument()
+ expect(screen.queryByText('Start Listening')).not.toBeInTheDocument()
+ })
+
+ it('switching to System Audio tab renders SystemAudioCapture', () => {
+ render(<LTTPage />)
+
+ fireEvent.click(screen.getByRole('tab', { name: /system audio/i }))
+
+ expect(screen.getByRole('button', { name: /start capture/i })).toBeInTheDocument()
+ expect(screen.queryByTestId('video-dropzone')).not.toBeInTheDocument()
+ expect(screen.queryByText('Start Listening')).not.toBeInTheDocument()
+ })
+
+ it('switching to Listen Mic tab renders MicCapture', () => {
+ render(<LTTPage />)
+
+ fireEvent.click(screen.getByRole('tab', { name: /listen mic/i }))
+
+ expect(screen.getByRole('button', { name: /start listening/i })).toBeInTheDocument()
+ expect(screen.queryByTestId('video-dropzone')).not.toBeInTheDocument()
+ expect(screen.queryByText('Start Capture')).not.toBeInTheDocument()
+ })
+
+ it('System Audio tab shows capture UI, not Full Transcript button', () => {
+ render(<LTTPage />)
+
+ fireEvent.click(screen.getByRole('tab', { name: /system audio/i }))
+
+ expect(screen.getByRole('button', { name: /start capture/i })).toBeInTheDocument()
+ expect(screen.queryByRole('button', { name: /full transcript/i })).not.toBeInTheDocument()
+ })
+
+ it('Listen Mic tab shows listening UI, not Full Transcript button', () => {
+ render(<LTTPage />)
+
+ fireEvent.click(screen.getByRole('tab', { name: /listen mic/i }))
+
+ expect(screen.getByRole('button', { name: /start listening/i })).toBeInTheDocument()
+ expect(screen.queryByRole('button', { name: /full transcript/i })).not.toBeInTheDocument()
+ })
+
+ it('Full Transcript button visible only in Upload tab', async () => {
+ const mockResponse = {
+ video_id: 'vid-456',
+ filename: 'test.mp4',
+ size_bytes: 1024,
+ url: 'http://localhost:8000/api/v1/video/vid-456',
+ }
+
+ mockMutate.mockImplementation((_vars: any, options?: any) => {
+ if (options?.onSuccess) {
+ options.onSuccess(mockResponse)
+ }
+ })
+
+ render(<LTTPage />)
+
+ const dropzone = screen.getByTestId('video-dropzone')
+ const file = new File(['dummy'], 'test.mp4', { type: 'video/mp4' })
+ fireEvent.drop(dropzone, {
+ dataTransfer: { files: [file] },
+ })
+
+ await waitFor(() => {
+ expect(screen.getByRole('button', { name: /full transcript/i })).toBeInTheDocument()
+ })
+
+ fireEvent.click(screen.getByRole('tab', { name: /system audio/i }))
+ expect(screen.queryByRole('button', { name: /full transcript/i })).not.toBeInTheDocument()
+
+ fireEvent.click(screen.getByRole('tab', { name: /upload/i }))
+ await waitFor(() => {
+ expect(screen.getByRole('button', { name: /full transcript/i })).toBeInTheDocument()
+ })
+ })
+
+ it('QueryInput receives transcript text from active ASR source', async () => {
+ mockSystemAudioPartialTranscript = 'sys audio partial'
+
+ render(<LTTPage />)
+
+ fireEvent.click(screen.getByRole('tab', { name: /system audio/i }))
+
+ await waitFor(() => {
+ const textarea = screen.getByPlaceholderText('Ask a question about your documents...')
+ expect(textarea).toHaveValue('sys audio partial')
+ })
+ })
+})
diff --git a/frontend/src/test/test_phase4_MicCapture.test.tsx b/frontend/src/test/test_phase4_MicCapture.test.tsx
new file mode 100644
index 0000000..9d2d87e
--- /dev/null
+++ b/frontend/src/test/test_phase4_MicCapture.test.tsx
@@ -0,0 +1,69 @@
+import { describe, it, expect, vi } from 'vitest'
+import { render, screen, fireEvent } from '@testing-library/react'
+import { MicCapture } from '../components/MicCapture'
+import type { MicStatus } from '../types'
+
+const defaultProps = {
+ status: 'idle' as MicStatus,
+ error: null as string | null,
+ onStart: vi.fn(),
+ onStop: vi.fn(),
+}
+
+describe('MicCapture', () => {
+ it('renders "Start Listening" button with Mic icon in idle state', () => {
+ render(<MicCapture {...defaultProps} />)
+ expect(screen.getByRole('button', { name: /start listening/i })).toBeInTheDocument()
+ const svg = screen.getByRole('button', { name: /start listening/i }).querySelector('svg')
+ expect(svg).toBeInTheDocument()
+ })
+
+ it('does not render compatibility warning in idle state', () => {
+ render(<MicCapture {...defaultProps} />)
+ expect(screen.queryByText(/chrome\/edge/i)).not.toBeInTheDocument()
+ expect(screen.queryByText(/firefox/i)).not.toBeInTheDocument()
+ })
+
+ it('calls onStart when Start Listening button is clicked', () => {
+ const onStart = vi.fn()
+ render(<MicCapture {...defaultProps} onStart={onStart} />)
+ fireEvent.click(screen.getByRole('button', { name: /start listening/i }))
+ expect(onStart).toHaveBeenCalledTimes(1)
+ })
+
+ it('renders "Waiting for microphone permission..." with spinner in requesting state', () => {
+ render(<MicCapture {...defaultProps} status="requesting" />)
+ expect(screen.getByText(/waiting for microphone permission/i)).toBeInTheDocument()
+ const spinner = document.querySelector('.animate-spin')
+ expect(spinner).toBeInTheDocument()
+ })
+
+ it('renders "Stop Listening" button, pulsing green dot, and "Listening..." in listening state', () => {
+ render(<MicCapture {...defaultProps} status="listening" />)
+ expect(screen.getByRole('button', { name: /stop listening/i })).toBeInTheDocument()
+ expect(screen.getByText('Listening...')).toBeInTheDocument()
+ const dot = document.querySelector('.animate-pulse')
+ expect(dot).toBeInTheDocument()
+ })
+
+ it('renders "Stopping..." disabled button in stopping state', () => {
+ render(<MicCapture {...defaultProps} status="stopping" />)
+ const btn = screen.getByRole('button', { name: /stopping/i })
+ expect(btn).toBeDisabled()
+ })
+
+ it('renders error message in red box and "Try Again" button in error state', () => {
+ render(<MicCapture {...defaultProps} status="error" error="Mic not found" />)
+ expect(screen.getByText('Mic not found')).toBeInTheDocument()
+ const redBox = screen.getByText('Mic not found').closest('[class*="red"]')
+ expect(redBox).toBeInTheDocument()
+ expect(screen.getByRole('button', { name: /try again/i })).toBeInTheDocument()
+ })
+
+ it('calls onStart when "Try Again" is clicked in error state', () => {
+ const onStart = vi.fn()
+ render(<MicCapture {...defaultProps} status="error" error="Mic not found" onStart={onStart} />)
+ fireEvent.click(screen.getByRole('button', { name: /try again/i }))
+ expect(onStart).toHaveBeenCalledTimes(1)
+ })
+})
diff --git a/frontend/src/test/test_phase4_SystemAudioCapture.test.tsx b/frontend/src/test/test_phase4_SystemAudioCapture.test.tsx
new file mode 100644
index 0000000..7a7a07a
--- /dev/null
+++ b/frontend/src/test/test_phase4_SystemAudioCapture.test.tsx
@@ -0,0 +1,77 @@
+import { describe, it, expect, vi } from 'vitest'
+import { render, screen, fireEvent } from '@testing-library/react'
+import { SystemAudioCapture } from '../components/SystemAudioCapture'
+import type { SystemAudioStatus } from '../types'
+
+const defaultProps = {
+ status: 'idle' as SystemAudioStatus,
+ error: null as string | null,
+ onStart: vi.fn(),
+ onStop: vi.fn(),
+}
+
+describe('SystemAudioCapture', () => {
+ it('renders "Start Capture" button with MonitorSpeaker icon in idle state', () => {
+ render(<SystemAudioCapture {...defaultProps} />)
+ expect(screen.getByRole('button', { name: /start capture/i })).toBeInTheDocument()
+ const svg = screen.getByRole('button', { name: /start capture/i }).querySelector('svg')
+ expect(svg).toBeInTheDocument()
+ })
+
+ it('renders browser compatibility warning in idle state', () => {
+ render(<SystemAudioCapture {...defaultProps} />)
+ expect(screen.getByText(/chrome\/edge/i)).toBeInTheDocument()
+ const amberBox = screen.getByText(/chrome\/edge/i).closest('[class*="amber"]')
+ expect(amberBox).toBeInTheDocument()
+ })
+
+ it('calls onStart when Start Capture button is clicked', () => {
+ const onStart = vi.fn()
+ render(<SystemAudioCapture {...defaultProps} onStart={onStart} />)
+ fireEvent.click(screen.getByRole('button', { name: /start capture/i }))
+ expect(onStart).toHaveBeenCalledTimes(1)
+ })
+
+ it('renders "Waiting for permission..." with spinner in requesting state', () => {
+ render(<SystemAudioCapture {...defaultProps} status="requesting" />)
+ expect(screen.getByText(/waiting for permission/i)).toBeInTheDocument()
+ const spinner = document.querySelector('.animate-spin')
+ expect(spinner).toBeInTheDocument()
+ })
+
+ it('renders "Stop Capture" button, pulsing green dot, and "Capturing system audio..." in capturing state', () => {
+ render(<SystemAudioCapture {...defaultProps} status="capturing" />)
+ expect(screen.getByRole('button', { name: /stop capture/i })).toBeInTheDocument()
+ expect(screen.getByText(/capturing system audio/i)).toBeInTheDocument()
+ const dot = document.querySelector('.animate-pulse')
+ expect(dot).toBeInTheDocument()
+ })
+
+ it('renders "Stopping..." disabled button in stopping state', () => {
+ render(<SystemAudioCapture {...defaultProps} status="stopping" />)
+ const btn = screen.getByRole('button', { name: /stopping/i })
+ expect(btn).toBeDisabled()
+ })
+
+ it('renders error message in red box and "Try Again" button in error state', () => {
+ render(<SystemAudioCapture {...defaultProps} status="error" error="Permission denied" />)
+ expect(screen.getByText('Permission denied')).toBeInTheDocument()
+ const redBox = screen.getByText('Permission denied').closest('[class*="red"]')
+ expect(redBox).toBeInTheDocument()
+ expect(screen.getByRole('button', { name: /try again/i })).toBeInTheDocument()
+ })
+
+ it('calls onStart when "Try Again" is clicked in error state', () => {
+ const onStart = vi.fn()
+ render(<SystemAudioCapture {...defaultProps} status="error" error="Permission denied" onStart={onStart} />)
+ fireEvent.click(screen.getByRole('button', { name: /try again/i }))
+ expect(onStart).toHaveBeenCalledTimes(1)
+ })
+
+ it('calls onStop when "Stop Capture" is clicked in capturing state', () => {
+ const onStop = vi.fn()
+ render(<SystemAudioCapture {...defaultProps} status="capturing" onStop={onStop} />)
+ fireEvent.click(screen.getByRole('button', { name: /stop capture/i }))
+ expect(onStop).toHaveBeenCalledTimes(1)
+ })
+})
diff --git a/frontend/src/test/test_phase4_useMicASR.test.ts b/frontend/src/test/test_phase4_useMicASR.test.ts
new file mode 100644
index 0000000..7abe4c1
--- /dev/null
+++ b/frontend/src/test/test_phase4_useMicASR.test.ts
@@ -0,0 +1,166 @@
+/**
+ * Phase 4 tests: useMicASR hook state management.
+ *
+ * Mocks useMediaStreamASR (the shared pipeline) and navigator.mediaDevices.getUserMedia.
+ * Does NOT test AudioContext/WebSocket internals; those belong to useMediaStreamASR tests.
+ */
+import { renderHook, act } from '@testing-library/react'
+import { describe, it, expect, vi, beforeEach } from 'vitest'
+import type { MicStatus } from '../types'
+
+const mockStart = vi.fn()
+const mockStop = vi.fn()
+
+vi.mock('../hooks/useMediaStreamASR', () => ({
+ useMediaStreamASR: () => ({
+ status: 'idle',
+ transcript: '',
+ partialTranscript: '',
+ error: null,
+ start: mockStart,
+ stop: mockStop,
+ }),
+}))
+
+import { useMicASR } from '../hooks/useMicASR'
+
+function makeMicStream(audioTracks = 1) {
+ const audio = Array.from({ length: audioTracks }, () => ({ stop: vi.fn(), kind: 'audio' }))
+ return {
+ getAudioTracks: () => audio,
+ getTracks: () => audio,
+ } as unknown as MediaStream
+}
+
+function setGetUserMedia(fn: () => Promise<MediaStream>) {
+ Object.defineProperty(globalThis.navigator, 'mediaDevices', {
+ value: { getUserMedia: vi.fn(fn) },
+ writable: true,
+ configurable: true,
+ })
+}
+
+beforeEach(() => {
+ vi.clearAllMocks()
+ setGetUserMedia(async () => makeMicStream())
+})
+
+describe('useMicASR', () => {
+ it('test_initial_state', () => {
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ expect(result.current.status).toBe('idle')
+ expect(result.current.transcript).toBe('')
+ expect(result.current.partialTranscript).toBe('')
+ expect(result.current.error).toBeNull()
+ })
+
+ it('test_returns_startListening_and_stopListening', () => {
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ expect(typeof result.current.startListening).toBe('function')
+ expect(typeof result.current.stopListening).toBe('function')
+ })
+
+ it('test_startListening_calls_getUserMedia_with_audio_constraints', async () => {
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startListening()
+ })
+
+ expect(navigator.mediaDevices.getUserMedia).toHaveBeenCalledWith({
+ audio: { echoCancellation: false, noiseSuppression: false, autoGainControl: false },
+ video: false,
+ })
+ })
+
+ it('test_startListening_sets_status_to_requesting_then_listening', async () => {
+ setGetUserMedia(async () => makeMicStream())
+
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ act(() => {
+ result.current.startListening()
+ })
+ expect(result.current.status).toBe('requesting')
+
+ await act(async () => {
+ await new Promise((r) => setTimeout(r, 0))
+ })
+
+ expect(navigator.mediaDevices.getUserMedia).toHaveBeenCalled()
+ expect(mockStart).toHaveBeenCalledTimes(1)
+ expect(mockStart.mock.calls[0][0].getAudioTracks()).toHaveLength(1)
+ })
+
+ it('test_startListening_NotAllowedError_sets_idle', async () => {
+ setGetUserMedia(async () => { throw new DOMException('Denied', 'NotAllowedError') })
+
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startListening()
+ })
+
+ expect(result.current.status).toBe('idle')
+ expect(result.current.error).toMatch(/Microphone access denied/)
+ })
+
+ it('test_startListening_NotFoundError_sets_error', async () => {
+ setGetUserMedia(async () => { throw new DOMException('No device', 'NotFoundError') })
+
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startListening()
+ })
+
+ expect(result.current.status).toBe('error')
+ expect(result.current.error).toMatch(/No microphone found/)
+ })
+
+ it('test_startListening_NotSupportedError_sets_error', async () => {
+ setGetUserMedia(async () => { throw new DOMException('Not supported', 'NotSupportedError') })
+
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startListening()
+ })
+
+ expect(result.current.status).toBe('error')
+ expect(result.current.error).toMatch(/not supported/)
+ })
+
+ it('test_startListening_no_audio_track_sets_error', async () => {
+ setGetUserMedia(async () => makeMicStream(0))
+
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startListening()
+ })
+
+ expect(result.current.status).toBe('error')
+ expect(result.current.error).toMatch(/No microphone input/)
+ })
+
+ it('test_stopListening_calls_pipeline_stop', () => {
+ const { result } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ act(() => {
+ result.current.stopListening()
+ })
+
+ expect(mockStop).toHaveBeenCalledTimes(1)
+ })
+
+ it('test_cleanup_on_unmount', () => {
+ const { unmount } = renderHook(() => useMicASR({ wsUrl: 'ws://test' }))
+
+ expect(() => {
+ unmount()
+ }).not.toThrow()
+ })
+})
diff --git a/frontend/src/test/test_phase4_useSystemAudioASR.test.ts b/frontend/src/test/test_phase4_useSystemAudioASR.test.ts
new file mode 100644
index 0000000..215496a
--- /dev/null
+++ b/frontend/src/test/test_phase4_useSystemAudioASR.test.ts
@@ -0,0 +1,181 @@
+/**
+ * Phase 4 tests: useSystemAudioASR hook state management.
+ *
+ * Mocks useMediaStreamASR (the shared pipeline) and navigator.mediaDevices.getDisplayMedia.
+ * Does NOT test AudioContext/WebSocket internals; those belong to useMediaStreamASR tests.
+ */
+import { renderHook, act } from '@testing-library/react'
+import { describe, it, expect, vi, beforeEach } from 'vitest'
+import type { SystemAudioStatus } from '../types'
+
+const mockStart = vi.fn()
+const mockStop = vi.fn()
+
+vi.mock('../hooks/useMediaStreamASR', () => ({
+ useMediaStreamASR: () => ({
+ status: 'idle',
+ transcript: '',
+ partialTranscript: '',
+ error: null,
+ start: mockStart,
+ stop: mockStop,
+ }),
+}))
+
+import { useSystemAudioASR } from '../hooks/useSystemAudioASR'
+
+function makeStream(audioTracks = 1) {
+ const audio = Array.from({ length: audioTracks }, () => ({ stop: vi.fn(), kind: 'audio' }))
+ const video = [{ stop: vi.fn(), kind: 'video' }]
+ return {
+ getAudioTracks: () => audio,
+ getVideoTracks: () => video,
+ getTracks: () => [...audio, ...video],
+ } as unknown as MediaStream
+}
+
+function setGetDisplayMedia(fn: () => Promise<MediaStream>) {
+ Object.defineProperty(globalThis.navigator, 'mediaDevices', {
+ value: { getDisplayMedia: vi.fn(fn) },
+ writable: true,
+ configurable: true,
+ })
+}
+
+beforeEach(() => {
+ vi.clearAllMocks()
+ setGetDisplayMedia(async () => makeStream())
+})
+
+describe('useSystemAudioASR', () => {
+ it('test_initial_state', () => {
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ expect(result.current.status).toBe('idle')
+ expect(result.current.transcript).toBe('')
+ expect(result.current.partialTranscript).toBe('')
+ expect(result.current.error).toBeNull()
+ })
+
+ it('test_returns_startCapture_and_stopCapture', () => {
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ expect(typeof result.current.startCapture).toBe('function')
+ expect(typeof result.current.stopCapture).toBe('function')
+ })
+
+ it('test_startCapture_calls_getDisplayMedia_with_correct_constraints', async () => {
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startCapture()
+ })
+
+ expect(navigator.mediaDevices.getDisplayMedia).toHaveBeenCalledWith({
+ video: true,
+ audio: { systemAudio: 'include', echoCancellation: false, noiseSuppression: false, autoGainControl: false },
+ })
+ })
+
+ it('test_startCapture_sets_status_to_requesting_then_capturing', async () => {
+ setGetDisplayMedia(async () => makeStream())
+
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ act(() => {
+ result.current.startCapture()
+ })
+ expect(result.current.status).toBe('requesting')
+
+ await act(async () => {
+ await new Promise((r) => setTimeout(r, 0))
+ })
+
+ expect(navigator.mediaDevices.getDisplayMedia).toHaveBeenCalled()
+ expect(mockStart).toHaveBeenCalledTimes(1)
+ expect(mockStart.mock.calls[0][0].getAudioTracks()).toHaveLength(1)
+ })
+
+ it('test_startCapture_AbortError_sets_idle', async () => {
+ setGetDisplayMedia(async () => { throw new DOMException('User abort', 'AbortError') })
+
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startCapture()
+ })
+
+ expect(result.current.status).toBe('idle')
+ expect(result.current.error).toMatch(/Permission denied/)
+ })
+
+ it('test_startCapture_NotAllowedError_sets_idle', async () => {
+ setGetDisplayMedia(async () => { throw new DOMException('Denied', 'NotAllowedError') })
+
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startCapture()
+ })
+
+ expect(result.current.status).toBe('idle')
+ expect(result.current.error).toMatch(/Permission denied/)
+ })
+
+ it('test_startCapture_NotSupportedError_sets_error_with_platform_warning', async () => {
+ setGetDisplayMedia(async () => { throw new DOMException('Not supported', 'NotSupportedError') })
+
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startCapture()
+ })
+
+ expect(result.current.status).toBe('error')
+ expect(result.current.error).toMatch(/not supported on this platform/)
+ })
+
+ it('test_startCapture_no_audio_track_sets_error', async () => {
+ setGetDisplayMedia(async () => makeStream(0))
+
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ await act(async () => {
+ await result.current.startCapture()
+ })
+
+ expect(result.current.status).toBe('error')
+ expect(result.current.error).toMatch(/No audio track/)
+ })
+
+ it('test_stopCapture_calls_pipeline_stop', () => {
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ act(() => {
+ result.current.stopCapture()
+ })
+
+ expect(mockStop).toHaveBeenCalledTimes(1)
+ })
+
+ it('test_cleanup_on_unmount', () => {
+ const { unmount } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ expect(() => {
+ unmount()
+ }).not.toThrow()
+ })
+
+ it('test_rapid_start_stop_cycles', async () => {
+ const { result } = renderHook(() => useSystemAudioASR({ wsUrl: 'ws://test' }))
+
+ for (let i = 0; i < 3; i++) {
+ setGetDisplayMedia(async () => makeStream())
+ await act(async () => { await result.current.startCapture() })
+ act(() => { result.current.stopCapture() })
+ }
+
+ expect(mockStart).toHaveBeenCalledTimes(3)
+ expect(mockStop).toHaveBeenCalledTimes(3)
+ })
+})