Merge branch 'Phase4-dev'

2026-05-14 23:29:42 +08:00 · 2026-05-14 23:29:42 +08:00 · 1e8773469e
parent a8a2cc0940 624df8cf9a
commit 1e8773469e
16 changed files with 1177 additions and 169 deletions
--- a/.plans/phase4_system_audio_plan.md
+++ b/.plans/phase4_system_audio_plan.md
@@ -1,7 +1,7 @@
-# Phase 4: System Audio Capture → ASR → RAG — Implementation Plan
+# Phase 4: System Audio & Mic Capture → ASR → RAG — Implementation Plan

 **Created:** 2026-05-09
-**Updated:** 2026-05-09
+**Updated:** 2026-05-14
 **Status:** 📋 Draft (Not Started)
 **Depends on:** Phase 1 (Complete), Phase 2 (Complete), Phase 3 (Complete)

@@ -9,24 +9,40 @@

 ## 1. Overview

-Phase 4 adds **system audio capture** as a third audio source in the LTTPage, alongside file Upload and YouTube. Instead of playing a video in the browser, the user captures audio output from any application on their computer (browser tab, Spotify, Zoom, system sounds) and pipes it through the existing ASR → RAG pipeline.
+Phase 4 adds two new live audio sources in the LTTPage, alongside file Upload:

-**Use cases:**
+1. **System Audio Capture** — captures audio output from any application on the user's computer (browser tab, Spotify, Zoom, system sounds) via `getDisplayMedia()`.
+2. **Listen Mic** — captures microphone input (user's voice, room audio) via `getUserMedia({ audio: true })`.
+
+Both pipe audio through the existing WebSocket → DashScope realtime ASR → RAG pipeline.
+
+### System Audio — Use Cases
 - Watching a YouTube video in a regular browser tab (no proxy needed — just share that tab's audio)
 - Listening to a podcast, lecture, or meeting and getting real-time transcript + RAG
 - Transcribing any audio playing on the computer without needing to download files

-### How It Works
+### Listen Mic — Use Cases
+- Recording a live meeting or lecture through the computer's microphone
+- Dictating questions or notes verbally and getting RAG answers
+- Transcribing spoken Cantonese in real time without a video source
+
+### How They Work

 ```
-User clicks "System Audio" → clicks "Start Capture"
-  → Browser shows permission dialog (screen/tab picker)
-  → User selects tab/window/screen (with audio)
-  → getDisplayMedia() returns MediaStream (with audio track)
-  → AudioContext.createMediaStreamSource(stream)
-  → ScriptProcessorNode (Float32 PCM, mono 16kHz)
-  → WebSocket → FastAPI → DashScope realtime ASR
-  → transcript → QueryInput → RAG Pipeline
+[System Audio]
+  User clicks "System Audio" → "Start Capture"
+    → Browser shows permission dialog (screen/tab picker)
+    → User selects tab/window/screen (with audio)
+    → getDisplayMedia() returns MediaStream (with audio track)
+    → AudioContext.createMediaStreamSource(stream)
+    → ScriptProcessorNode → WebSocket → DashScope ASR → Transcript → RAG
+
+[Listen Mic]
+  User clicks "Listen Mic" → "Start Listening"
+    → Browser shows microphone permission prompt
+    → getUserMedia({ audio: true }) returns MediaStream
+    → AudioContext.createMediaStreamSource(stream)
+    → ScriptProcessorNode → WebSocket → DashScope ASR → Transcript → RAG
 ```

 ### Audio Routing (vs Existing Sources)
@@ -34,59 +50,85 @@ User clicks "System Audio" → clicks "Start Capture"
 | Source | Audio Input | SourceNode Type | Start/Stop Trigger |
 |--------|-------------|-----------------|-------------------|
 | Upload | `<video>` element | `createMediaElementSource` | play/pause events |
-| YouTube | `<audio>` element | `createMediaElementSource` | play/pause events on `<video>` |
 | **System Audio** | MediaStream from `getDisplayMedia()` | `createMediaStreamSource` | Manual Start/Stop button + track ended event |
+| **Listen Mic** | MediaStream from `getUserMedia({ audio: true })` | `createMediaStreamSource` | Manual Start/Stop button + track ended event |

-### Why New Hook (Not Reuse Existing)
+### Why New Hooks (Not Reuse Existing)

-The existing `useVideoASR` and `useYouTubeASR` hooks depend on HTML media elements (`<video>`, `<audio>`) for both the audio source and play/pause lifecycle. System audio capture uses a **MediaStream** object (no DOM element), and its lifecycle is controlled by user permission (grant/revoke) and manual start/stop, not DOM events. A new hook is architecturally cleaner than overloading the existing ones with branching logic.
+The existing `useVideoASR` hook depends on HTML media elements (`<video>`) for both the audio source and play/pause lifecycle. Both new sources use **MediaStream** objects (no DOM element), and their lifecycle is controlled by user permission (grant/revoke) and manual start/stop, not DOM events.
+
+**System Audio** and **Listen Mic** share the same audio processing pipeline (`MediaStream → AudioContext → ScriptProcessorNode → WebSocket`) but differ in their capture API. A shared internal audio processing utility (`useMediaStreamASR` or similar) should be extracted to avoid code duplication between the two hooks.
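The shared pipeline mostly comes down to sample-format conversion. Below is a minimal sketch of two helpers such a utility might contain. It assumes the AudioContext runs at the device's native rate (often 44.1 or 48 kHz) and that the ASR endpoint expects 16 kHz, 16-bit little-endian mono PCM. The function names are hypothetical, and the exact format `useVideoASR.ts` sends should be checked against it:

```typescript
// Hypothetical helpers for the shared MediaStream -> PCM -> WebSocket pipeline.
// Assumes: mono Float32 input at the AudioContext's native rate, and an ASR
// endpoint expecting 16 kHz, 16-bit signed, little-endian mono PCM.

/** Downsample by averaging each block of input samples (simple, not band-limited). */
function downsampleTo16k(input: Float32Array, inputRate: number, targetRate = 16000): Float32Array {
  if (targetRate >= inputRate) return input // this sketch never upsamples
  const ratio = inputRate / targetRate
  const out = new Float32Array(Math.floor(input.length / ratio))
  for (let i = 0; i < out.length; i++) {
    // Input samples [start, end) collapse into output sample i
    const start = Math.floor(i * ratio)
    const end = Math.min(Math.floor((i + 1) * ratio), input.length)
    let sum = 0
    for (let j = start; j < end; j++) sum += input[j]
    out[i] = end > start ? sum / (end - start) : 0
  }
  return out
}

/** Convert Float32 samples in [-1, 1] to 16-bit signed little-endian PCM bytes. */
function floatTo16BitPCM(samples: Float32Array): ArrayBuffer {
  const buffer = new ArrayBuffer(samples.length * 2)
  const view = new DataView(buffer)
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])) // clamp out-of-range samples
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true) // true = little-endian
  }
  return buffer
}
```

Inside the processor callback, these helpers would be used roughly as `ws.send(floatTo16BitPCM(downsampleTo16k(e.inputBuffer.getChannelData(0), ctx.sampleRate)))`. Block averaging is only a crude low-pass filter. `ScriptProcessorNode` is also deprecated in favor of `AudioWorkletNode`. Both choices are acceptable for parity with `useVideoASR.ts`, but they are worth revisiting in the shared utility.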

 ---

 ## 2. User Flow

-1. User selects **"System Audio"** tab (third option alongside Upload / YouTube)
+### 2.1 System Audio
+
+1. User selects the **"System Audio"** tab (second of three: Upload | System Audio | Listen Mic)
 2. UI shows a **"Start Capture"** button with browser compatibility info
 3. User clicks **"Start Capture"**
 4. Browser opens **permission dialog** (screen/tab picker)
   - User selects a browser tab (e.g., "YouTube — Live Stream") or "Entire Screen"
   - User checks "Share audio" if available
 5. On approval: capture starts — status indicator shows "Capturing" with a live audio level meter
-6. Real-time ASR transcription flows into **QueryInput** (same as Upload/YouTube)
+6. Real-time ASR transcription flows into **QueryInput** (same as Upload)
 7. User can **edit transcript while capturing** continues
 8. User clicks **"Stop Capture"** to end — transcript stays in QueryInput
 9. User submits query → RAG pipeline processes it
-10. **"Full Transcript" button hidden** (streaming ASR only, same as YouTube)
+10. **"Full Transcript" button hidden** (streaming ASR only — no batch transcription for live sources)

-### Permission Denied Flow
+#### Permission Denied Flow
+- User clicks "Cancel" in permission dialog → error: "Permission denied — system audio capture requires your explicit permission"
+- User revokes permission (Chrome "Stop sharing") → capture stops gracefully, status: "Capture stopped"
+- No audio track in the stream → error: "No audio track found in the shared content"

-1. If user clicks "Cancel" in permission dialog → error state: "Permission denied — system audio capture requires your explicit permission"
-2. If user revokes permission (Chrome "Stop sharing") → capture stops gracefully, status: "Capture stopped"
-3. If no audio track in the stream → error: "No audio track found in the shared content"
+### 2.2 Listen Mic
+
+1. User selects **"Listen Mic"** tab (third option)
+2. UI shows a **"Start Listening"** button (no browser compatibility warning — widely supported)
+3. User clicks **"Start Listening"**
+4. Browser shows **microphone permission prompt** (first time only)
+5. On approval: listening starts — status indicator shows "Listening" with a live audio level meter
+6. Real-time ASR transcription flows into **QueryInput**
+7. User can **edit transcript while listening** continues
+8. User clicks **"Stop Listening"** to end — transcript stays in QueryInput
+9. User submits query → RAG pipeline processes it
+10. **"Full Transcript" button hidden** (streaming ASR only)
+
+#### Permission Denied Flow
+- User clicks "Block" in mic permission prompt → error: "Microphone access denied — please allow microphone access in your browser settings"
+- User revokes permission via browser UI → listening stops, status: "Microphone disconnected"
+- No audio track → error: "No microphone input detected"
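Both permission-denied flows can be driven by one lookup on the error a capture call rejects with. This is a sketch under stated assumptions. `NotAllowedError` and `NotFoundError` are standard `DOMException` names for these APIs. `NoAudioTrack` is a hypothetical sentinel the hooks would throw themselves. The message wording is adapted from the flows above:

```typescript
// Hypothetical mapping from capture failures to user-facing messages.
// 'NoAudioTrack' is a sentinel this sketch assumes the hooks throw themselves.
type CaptureSource = 'system-audio' | 'mic'

const CAPTURE_MESSAGES: Record<CaptureSource, { denied: string; noTrack: string }> = {
  'system-audio': {
    denied: 'Permission denied: system audio capture requires your explicit permission',
    noTrack: 'No audio track found in the shared content',
  },
  mic: {
    denied: 'Microphone access denied: please allow microphone access in your browser settings',
    noTrack: 'No microphone input detected',
  },
}

function describeCaptureError(source: CaptureSource, err: { name: string }): string {
  const messages = CAPTURE_MESSAGES[source]
  switch (err.name) {
    case 'NotAllowedError': // picker/prompt dismissed, or permission blocked
      return messages.denied
    case 'NotFoundError': // no capture device or source available
    case 'NoAudioTrack': // stream resolved but carried no audio
      return messages.noTrack
    default:
      return `Capture failed (${err.name})`
  }
}
```

Revocation ("Stop sharing", or permission withdrawn mid-session) does not surface as a rejected promise. It arrives as the track's `ended` event, so it belongs in the auto-stop path rather than in this mapping.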

 ---

 ## 3. Architecture

-### 3.1 Component Tree (LTTPage — System Audio Mode)
+### 3.1 Component Tree (LTTPage — All Sources)

 ```
 LTTPage
-├── SourceSelector (tabs: Upload | YouTube | System Audio)
+├── SourceSelector (tabs: Upload | System Audio | Listen Mic)
 ├── [source === 'system-audio']
-│   ├── SystemAudioCapture
-│   │   ├── Start/Stop button
-│   │   ├── Status indicator (idle | requesting | capturing | error)
-│   │   ├── Audio level meter (optional, nice-to-have)
-│   │   └── Browser compatibility note (non-Chrome users)
-│   └── (no video player — audio-only capture)
-├── QueryInput (receives transcript from useSystemAudioASR)
+│   └── SystemAudioCapture
+│       ├── Start/Stop button
+│       ├── Status indicator (idle | requesting | capturing | error)
+│       ├── Audio level meter (optional, nice-to-have)
+│       └── Browser compatibility note (non-Chrome users)
+├── [source === 'mic']
+│   └── MicCapture
+│       ├── Start/Stop button
+│       ├── Status indicator (idle | requesting | listening | error)
+│       └── Audio level meter (optional, nice-to-have)
+├── QueryInput (receives transcript from active ASR hook)
 ├── ExtractedQuestionsDisplay
 └── RAG Response Panel
 ```

 ### 3.2 Data Flow

+#### System Audio
 ```
 SystemAudioCapture (UI)
  │
@@ -99,32 +141,51 @@ useSystemAudioASR hook
  │     └── User picks tab/window → returns MediaStream
  │
  ├── AudioContext.createMediaStreamSource(stream)
-  │     └── MediaStreamAudioSourceNode
  │
  ├── ScriptProcessorNode (4096 buffer, mono 16kHz)
-  │     └── onaudioprocess: convert Float32 → Int16 PCM
  │
  ├── WebSocket → ws://host/ws/asr/{uuid}?language=yue
-  │     └── Sends binary PCM frames
  │
  └── Returns: { status, transcript, partialTranscript, startCapture, stopCapture }
-        │
-        ▼
-LTTPage unifies: const asr = source === 'system-audio' ? systemAudioASR : ...
+```
+
+#### Listen Mic
+```
+MicCapture (UI)
+  │
+  ├── "Start Listening" click → calls startListening() from hook
  │
  ▼
-QueryInput receives asr.partialTranscript
+useMicASR hook
+  │
+  ├── getUserMedia({ audio: true })
+  │     └── Browser shows mic permission prompt → returns MediaStream
+  │
+  ├── AudioContext.createMediaStreamSource(stream)
+  │
+  ├── ScriptProcessorNode (4096 buffer, mono 16kHz)
+  │
+  ├── WebSocket → ws://host/ws/asr/{uuid}?language=yue
+  │
+  └── Returns: { status, transcript, partialTranscript, startListening, stopListening }
+```
+
+#### LTTPage Unification
+```typescript
+const asr = source === 'system-audio' ? systemAudioASR
+  : source === 'mic' ? micASR
+  : uploadASR
 ```
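The hooks expose different control names (`startCapture` vs `startListening`), so the unified `asr` can only promise the fields every hook shares. This sketch assumes the upload hook (not shown here) also returns `transcript` and `partialTranscript`. `selectAsr` and `TranscriptView` are hypothetical names:

```typescript
// Hypothetical sketch: QueryInput consumes only the fields all three hooks share.
// Indexing a Record keyed by SourceType (instead of nesting ternaries) makes the
// compiler reject a source that has no entry.
type SourceType = 'upload' | 'system-audio' | 'mic'

interface TranscriptView {
  transcript: string
  partialTranscript: string
}

function selectAsr(source: SourceType, bySource: Record<SourceType, TranscriptView>): TranscriptView {
  return bySource[source]
}
```

Usage: `const asr = selectAsr(source, { upload: uploadASR, 'system-audio': systemAudioASR, mic: micASR })`. If a fourth `SourceType` member is added, this code fails to compile until the map is updated. Nested ternaries would not catch that.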

 ### 3.3 Backend Changes

-**Minimal.** The existing WebSocket ASR endpoint (`ws_asr.py`) already accepts audio from any source. The only addition is handling a **UUID-based `video_id`** for system audio sessions (no real video file).
+**Minimal.** The existing WebSocket ASR endpoint (`ws_asr.py`) already accepts audio from any source. The only additions are UUID-based `video_id` handling and feature toggles.

 | Change | File | Description |
 |--------|------|-------------|
 | Allow UUID video_id | `backend/app/routers/ws_asr.py` | Accept non-file-based video IDs (already accepts any string) |
-| Transcript persistence | `backend/app/services/history_service.py` | Store system audio transcripts with UUID session ID (optional — nice-to-have) |
-| Config | `backend/app/core/config.py` | Add `SYSTEM_AUDIO_ENABLED` toggle (default: true) |
+| Transcript persistence | `backend/app/services/history_service.py` | Store system audio & mic transcripts with UUID session ID (optional — nice-to-have) |
+| Config | `backend/app/core/config.py` | Add `SYSTEM_AUDIO_ENABLED` and `MIC_ENABLED` toggles (default: true) |

 **No changes needed to:**
 - DashScope ASR client (receives PCM, doesn't care about source)
@@ -135,11 +196,13 @@ QueryInput receives asr.partialTranscript

 | File | Status | Description |
 |------|--------|-------------|
+| `frontend/src/components/SourceSelector.tsx` | **New** | Reusable tab bar component (Upload \| System Audio \| Listen Mic) |
 | `frontend/src/hooks/useSystemAudioASR.ts` | **New** | Hook: getDisplayMedia → AudioContext → WebSocket |
-| `frontend/src/components/SystemAudioCapture.tsx` | **New** | UI: Start/Stop button, status, compatibility note |
-| `frontend/src/pages/LTTPage.tsx` | **Modified** | Add "System Audio" tab, wire hook, unify ASR |
-| `frontend/src/types/index.ts` | **Modified** | Add SystemAudioStatus type |
-| `frontend/src/components/SourceSelector.tsx` | **Refactor** | Extract source tabs into reusable component (optional — can inline in LTTPage) |
+| `frontend/src/hooks/useMicASR.ts` | **New** | Hook: getUserMedia → AudioContext → WebSocket |
+| `frontend/src/components/SystemAudioCapture.tsx` | **New** | UI: Start/Stop, status, compatibility note |
+| `frontend/src/components/MicCapture.tsx` | **New** | UI: Start/Stop, status |
+| `frontend/src/pages/LTTPage.tsx` | **Modified** | Add source selector, wire hooks, unify ASR, conditional rendering |
+| `frontend/src/types/index.ts` | **Modified** | Add SourceType, SystemAudioStatus, MicStatus types |

 ---

@@ -150,25 +213,31 @@ QueryInput receives asr.partialTranscript
 | 4.1 | Config & Infrastructure | 0.5 day | — | 📋 Draft |
 | 4.2 | System Audio Capture Hook (`useSystemAudioASR`) | 1 day | 4.1 | 📋 Draft |
 | 4.3 | SystemAudioCapture UI Component | 0.5 day | 4.2 | 📋 Draft |
-| 4.4 | LTTPage Integration | 0.5 day | 4.2, 4.3 | 📋 Draft |
-| 4.5 | Backend Adjustments | 0.5 day | 4.1 | 📋 Draft |
-| 4.6 | Integration & Acceptance Tests | 1 day | 4.4, 4.5 | 📋 Draft |
-| 4.7 | Polish & Documentation | 0.5 day | 4.6 | 📋 Draft |
-| **Total** | | **4.5 days** | | |
+| 4.4 | Mic Capture Hook (`useMicASR`) | 0.5 day | 4.1 | 📋 Draft |
+| 4.5 | MicCapture UI Component | 0.5 day | 4.4 | 📋 Draft |
+| 4.6 | LTTPage Integration (all 3 sources) | 0.5 day | 4.2, 4.3, 4.4, 4.5 | 📋 Draft |
+| 4.7 | Backend Adjustments | 0.5 day | 4.1 | 📋 Draft |
+| 4.8 | Integration & Acceptance Tests | 1 day | 4.6, 4.7 | 📋 Draft |
+| 4.9 | Polish & Documentation | 0.5 day | 4.8 | 📋 Draft |
+| **Total** | | **5.5 days** | | |

 ### Phase 4.1 — Config & Infrastructure (0.5 day)

-**Objective:** Add system audio feature toggle, define types, establish UUID generation.
+**Objective:** Add feature toggles, define types, establish UUID generation.

 **Tasks:**
-1. Add `SYSTEM_AUDIO_ENABLED` to `backend/app/core/config.py` (default: `True`)
+1. Add `SYSTEM_AUDIO_ENABLED` and `MIC_ENABLED` to `backend/app/core/config.py` (default: `True`)
 2. Add `SystemAudioStatus` type to `frontend/src/types/index.ts`:
   ```typescript
   type SystemAudioStatus = 'idle' | 'requesting' | 'capturing' | 'stopping' | 'error'
   ```
-3. Add `SystemAudioASRState` interface to types
-4. Add `video_id` UUID generation helper (frontend-side: `crypto.randomUUID()`)
-5. Verify WebSocket ASR endpoint accepts arbitrary `video_id` strings (it does — confirm with a quick test)
+3. Add `MicStatus` type:
+   ```typescript
+   type MicStatus = 'idle' | 'requesting' | 'listening' | 'stopping' | 'error'
+   ```
+4. Add `SystemAudioASRState` and `MicASRState` interfaces to types
+5. Add `video_id` UUID generation helper (frontend-side: `crypto.randomUUID()`)
+6. Verify WebSocket ASR endpoint accepts arbitrary `video_id` strings (it does — confirm with a quick test)
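Tasks 5 and 6 imply a small URL helper. This sketch assumes the backend is reachable on the page's own host (directly or through a dev-server proxy) and keeps the `/ws/asr/{video_id}?language=...` route. `buildAsrWsUrl` is a hypothetical name. The helper also picks `wss:` on HTTPS pages, a case the plan's `ws://host` placeholder does not cover:

```typescript
// Hypothetical helper: build the ASR WebSocket URL for a live session.
interface PageLocation {
  protocol: string // e.g. 'https:'
  host: string // hostname[:port]
}

function buildAsrWsUrl(loc: PageLocation, sessionId: string, language = 'yue'): string {
  // Secure pages must use wss://; browsers block ws:// from HTTPS as mixed content
  const scheme = loc.protocol === 'https:' ? 'wss:' : 'ws:'
  const url = new URL(`${scheme}//${loc.host}/ws/asr/${encodeURIComponent(sessionId)}`)
  url.searchParams.set('language', language)
  return url.toString()
}
```

In the browser, call it as `buildAsrWsUrl(window.location, crypto.randomUUID())`. Note that `crypto.randomUUID()` is itself only exposed in secure contexts (HTTPS or `localhost`).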

 **Test Files:** `backend/app/test/test_phase4_config.py`

@@ -205,7 +274,7 @@ interface UseSystemAudioASRReturn {

 **Pattern to Follow:**
 - AudioContext setup: follow `useVideoASR.ts` lines 45-143 (AudioContext, ScriptProcessor, sample rate conversion)
- WebSocket handling: follow `useYouTubeASR.ts` lines 35-100
+- WebSocket handling: follow the WebSocket setup in `useVideoASR.ts`
 - State management: adapt the patterns in `useVideoASR.ts` for a MediaStream source

 **Test Files:** `frontend/src/test/test_phase4_useSystemAudioASR.test.ts`
@@ -239,57 +308,128 @@ On Linux, only tab audio is available (not full system audio).

 **Test Files:** `frontend/src/test/test_phase4_SystemAudioCapture.test.tsx`

-### Phase 4.4 — LTTPage Integration (0.5 day)
+### Phase 4.4 — Mic Capture Hook (0.5 day)

-**Objective:** Wire the System Audio source into LTTPage, adding it as the third tab alongside Upload and YouTube.
+**Objective:** Create `useMicASR.ts` hook that captures microphone input and streams it to the ASR WebSocket.
+
+**Key Design:**
+```typescript
+interface UseMicASRProps {
+  wsUrl: string   // e.g., ws://localhost:8000/ws/asr/{uuid}?language=yue
+}
+
+interface UseMicASRReturn {
+  status: 'idle' | 'requesting' | 'listening' | 'stopping' | 'error'
+  transcript: string
+  partialTranscript: string
+  error: string | null
+  startListening: () => Promise<void>
+  stopListening: () => void
+}
+```
+
+**Implementation Details:**
+- `startListening()`: calls `navigator.mediaDevices.getUserMedia({ audio: true, video: false })`
+  - On success: creates AudioContext, `createMediaStreamSource(stream)`, connects ScriptProcessor → WebSocket
+  - On user deny: sets status to `'idle'`, sets error "Microphone access denied"
+  - On no audio track: sets status to `'error'`, sets error "No microphone input detected"
+- `stopListening()`: stops all tracks in the MediaStream, closes AudioContext, closes WebSocket
+- Auto-stop: listens for `track.onended` (user revokes permission) → calls stopListening
+- Audio processing: identical to useSystemAudioASR — `ScriptProcessorNode(4096)`, convert Float32 → Int16 PCM, send via WebSocket
+- WebSocket lifecycle: connect on listening start, close on listening stop
+- Cleanup: useEffect return closes AudioContext, WebSocket, and stops tracks
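The status rules above fit a small reducer that `useMicASR` could drive with `useReducer`. Under these rules, a denied prompt returns to `'idle'` with a message, a missing track ends in `'error'`, and `track.onended` triggers a stop. The sketch below uses hypothetical event names and is not a prescribed design:

```typescript
// Hypothetical status machine for useMicASR (e.g. for React's useReducer).
type MicStatus = 'idle' | 'requesting' | 'listening' | 'stopping' | 'error'

type MicEvent =
  | { type: 'START' } // "Start Listening" clicked
  | { type: 'GRANTED' } // getUserMedia resolved with an audio track
  | { type: 'DENIED'; message: string } // permission prompt blocked
  | { type: 'NO_TRACK'; message: string } // stream had no audio track
  | { type: 'STOP' } // "Stop Listening" clicked or track.onended fired
  | { type: 'STOPPED' } // tracks, AudioContext and WebSocket released

interface MicState {
  micStatus: MicStatus
  error: string | null
}

function micReducer(state: MicState, ev: MicEvent): MicState {
  switch (ev.type) {
    case 'START': // ignore repeated clicks while a session is in flight
      return state.micStatus === 'idle' || state.micStatus === 'error'
        ? { micStatus: 'requesting', error: null }
        : state
    case 'GRANTED':
      return state.micStatus === 'requesting' ? { micStatus: 'listening', error: null } : state
    case 'DENIED':
      return { micStatus: 'idle', error: ev.message }
    case 'NO_TRACK':
      return { micStatus: 'error', error: ev.message }
    case 'STOP':
      return state.micStatus === 'requesting' || state.micStatus === 'listening'
        ? { ...state, micStatus: 'stopping' }
        : state
    case 'STOPPED':
      return state.micStatus === 'stopping' ? { micStatus: 'idle', error: state.error } : state
    default:
      return state
  }
}
```

The reducer makes one case explicit: if the user clicks Stop while the permission prompt is still open, `GRANTED` is ignored in `'stopping'`. The hook must then stop the stream's tracks itself when `getUserMedia` eventually resolves.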
+
+**Code Sharing:** Extract shared audio processing logic (`MediaStream → AudioContext → ScriptProcessorNode → WebSocket`) into a reusable internal utility (`useMediaStreamASR` or `audioPipeline.ts`) to avoid duplication between `useSystemAudioASR` and `useMicASR`.
+
+**Test Files:** `frontend/src/test/test_phase4_useMicASR.test.ts`
+
+### Phase 4.5 — MicCapture UI Component (0.5 day)
+
+**Objective:** Create the `MicCapture.tsx` component with Start/Stop button and status display.
+
+**Component Props:**
+```typescript
+interface MicCaptureProps {
+  status: MicStatus
+  error: string | null
+  onStart: () => void
+  onStop: () => void
+}
+```
+
+**UI States:**
+1. **Idle**: "Start Listening" button (blue, prominent) — no compatibility warning needed (mic is universally supported)
+2. **Requesting**: "Waiting for microphone permission..." (loading spinner)
+3. **Listening**: "Stop Listening" button (red) + pulsing green dot + "Listening..."
+4. **Error**: Red banner with error message + "Try Again" button
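Deriving the rendered state from `status` and `error` in a plain function lets the four UI states be tested without rendering. In this sketch, the `'stopping'` text is an assumption, since the plan lists only four states. The idle case shows any error left behind by a denied prompt:

```typescript
// Hypothetical view model for MicCapture, derived from (status, error).
type MicStatus = 'idle' | 'requesting' | 'listening' | 'stopping' | 'error'

interface MicCaptureView {
  message: string | null // status or error text shown next to the button
  buttonText: string | null // null means the button is hidden
  action: 'start' | 'stop' | null // which prop callback the button invokes
  busy: boolean // show a spinner
}

function micCaptureView(micStatus: MicStatus, error: string | null): MicCaptureView {
  switch (micStatus) {
    case 'idle': // error may still be set after a denied prompt
      return { message: error, buttonText: 'Start Listening', action: 'start', busy: false }
    case 'requesting':
      return { message: 'Waiting for microphone permission...', buttonText: null, action: null, busy: true }
    case 'listening':
      return { message: 'Listening...', buttonText: 'Stop Listening', action: 'stop', busy: false }
    case 'stopping': // assumed transitional state, not in the plan's list
      return { message: 'Stopping...', buttonText: null, action: null, busy: true }
    case 'error':
      return { message: error ?? 'Unknown error', buttonText: 'Try Again', action: 'start', busy: false }
  }
}
```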
+
+**Test Files:** `frontend/src/test/test_phase4_MicCapture.test.tsx`
+
+### Phase 4.6 — LTTPage Integration (0.5 day)
+
+**Objective:** Create the `SourceSelector` tab bar component and wire both new sources into LTTPage.
+
+**New Component — `SourceSelector.tsx`:**
+```typescript
+interface SourceSelectorProps {
+  activeSource: SourceType
+  onSelect: (source: SourceType) => void
+}
+```
+- Three tabs: Upload (📁), System Audio (🔊), Listen Mic (🎤)
+- Active tab highlighted with blue background, inactive tabs gray
+- Icons from lucide-react: `Upload`, `MonitorSpeaker`, `Mic`

 **Changes to `LTTPage.tsx`:**
-1. Extend `SourceType` from `'upload' | 'youtube'` to `'upload' | 'youtube' | 'system-audio'`
-2. Add third tab button (icon: `AudioLines` from lucide-react) in the source selector
-3. Initialize `useSystemAudioASR` hook with a UUID-based WebSocket URL
-4. Update `asr` variable:
+1. Define `SourceType` as `'upload' | 'system-audio' | 'mic'` (replacing `'upload' | 'youtube'`)
+2. Add source state: `const [source, setSource] = useState<SourceType>('upload')`
+3. Render `<SourceSelector activeSource={source} onSelect={setSource} />` above the panels
+4. Initialize `useSystemAudioASR` and `useMicASR` hooks with session-scoped UUIDs (generated once when tab selected, reused across Start/Stop cycles)
+5. Update `asr` variable:
   ```typescript
-   const asr = source === 'youtube' ? youtubeASR 
-     : source === 'system-audio' ? systemAudioASR 
+   const asr = source === 'system-audio' ? systemAudioASR
+     : source === 'mic' ? micASR
     : uploadASR
   ```
-5. Conditional rendering:
-   ```
-   {source === 'upload' && <VideoUploader />}
-   {source === 'youtube' && <YouTubeMode />}
+6. Conditional rendering:
+   ```tsx
+   {source === 'upload' && <VideoUpload />}
   {source === 'system-audio' && <SystemAudioCapture />}
+   {source === 'mic' && <MicCapture />}
   ```
-6. WebSocket URL: `ws://host/ws/asr/{crypto.randomUUID()}?language=yue`
-7. Full Transcript button: hidden for system-audio (same as YouTube)
-8. QueryInput: remains editable during capture (same behavior as other sources)
+7. WebSocket URL: `ws://host/ws/asr/{sessionUUID}?language=yue` (UUID stable per session, regenerated only on source switch)
+8. Full Transcript button: hidden for system-audio AND mic (streaming ASR only)
+9. QueryInput: remains editable during capture/listening
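Steps 4 and 7 pin down the session-ID rule: reuse the ID across Start/Stop, and regenerate it on a source switch. Below is a minimal, framework-free sketch of that rule with an injected ID factory. The class name is hypothetical:

```typescript
// Hypothetical helper: one session ID per selected source, reused across
// Start/Stop cycles, regenerated only when the source changes.
type SourceType = 'upload' | 'system-audio' | 'mic'

class SessionIdTracker {
  private currentSource: SourceType | null = null
  private currentId = ''
  private readonly makeId: () => string

  // In the browser, pass () => crypto.randomUUID()
  constructor(makeId: () => string) {
    this.makeId = makeId
  }

  /** Returns the ID for `source`, minting a new one only on a source switch. */
  idFor(source: SourceType): string {
    if (source !== this.currentSource) {
      this.currentSource = source
      this.currentId = this.makeId()
    }
    return this.currentId
  }
}
```

Inside `LTTPage`, the same logic fits a `useRef` that is reset when `source` changes. `useMemo(() => crypto.randomUUID(), [source])` looks equivalent. However, React documents `useMemo` as a performance hint whose cache may be discarded, which would silently change the session ID.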

 **Test Files:** `frontend/src/test/test_phase4_LTTPage_integration.test.tsx`

-### Phase 4.5 — Backend Adjustments (0.5 day)
+### Phase 4.7 — Backend Adjustments (0.5 day)

-**Objective:** Ensure backend handles system audio sessions correctly.
+**Objective:** Ensure backend handles both system audio and mic sessions correctly.

 **Tasks:**
 1. Verify `ws_asr.py` WebSocket endpoint works with arbitrary `video_id` (UUID format) — likely no changes needed
-2. Add `SYSTEM_AUDIO_ENABLED` config validation in the router (return 503 if disabled)
-3. Handle system audio sessions in transcript history (optional — store with `source: 'system-audio'` metadata)
-4. Verify the ASR client handles system audio PCM identically to video audio
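+2. Add `SYSTEM_AUDIO_ENABLED` and `MIC_ENABLED` config validation in the router, rejecting the connection when the matching toggle is disabled. The router can only tell the sources apart if the client identifies itself (e.g. a `source` query parameter), since the URL carries only `video_id` and `language`. A WebSocket route does not return an HTTP status the way a REST route does; the usual approach is to close the socket before `accept()`, which rejects the handshake.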
+3. Handle system audio and mic sessions in transcript history (optional — store with `source: 'system-audio'` / `source: 'mic'` metadata)
+4. Verify the ASR client handles audio from both sources identically

 **No new endpoints needed.** The existing WebSocket and ASR infrastructure is source-agnostic.

 **Test Files:** `backend/app/test/test_phase4_config.py`

-### Phase 4.6 — Integration & Acceptance Tests (1 day)
+### Phase 4.8 — Integration & Acceptance Tests (1 day)

-**Objective:** Comprehensive tests for the system audio capture flow.
+**Objective:** Comprehensive tests for both capture flows.

 **Backend Integration Tests** (`backend/app/test/test_integration_phase4.py`):
 1. WebSocket accepts UUID video_id
 2. ASR processes audio from system audio session
-3. Config toggle disables feature
+3. ASR processes audio from mic session
+4. Config toggles disable features

 **Frontend Tests:**
-1. **Hook tests** (`test_phase4_useSystemAudioASR.test.ts`): ~10 tests
+
+1. **System Audio Hook tests** (`test_phase4_useSystemAudioASR.test.ts`): ~10 tests
   - Mock `getDisplayMedia` → successful capture
   - Mock `getDisplayMedia` → user cancels (permission denied)
   - Mock `getDisplayMedia` → no audio track
@@ -300,31 +440,52 @@ On Linux, only tab audio is available (not full system audio).
   - `stopCapture` cleanup
   - Multiple rapid start/stop cycles

-2. **Component tests** (`test_phase4_SystemAudioCapture.test.tsx`): ~5 tests
+2. **System Audio Component tests** (`test_phase4_SystemAudioCapture.test.tsx`): ~5 tests
   - All UI states render correctly (idle, requesting, capturing, error)
   - Start button calls onStart
   - Stop button calls onStop
   - Error state shows message and retry button
   - Compatibility note visible for non-Chrome (optional)

-3. **Integration tests** (`test_phase4_LTTPage_integration.test.tsx`): ~5 tests
+3. **Mic Hook tests** (`test_phase4_useMicASR.test.ts`): ~8 tests
+   - Mock `getUserMedia` → successful capture
+   - Mock `getUserMedia` → user denies (permission denied)
+   - Mock `getUserMedia` → no audio track
+   - AudioContext setup and teardown
+   - WebSocket connection lifecycle
+   - `track.onended` triggers auto-stop
+   - `stopListening` cleanup
+   - PCM conversion and sending
+
+4. **Mic Component tests** (`test_phase4_MicCapture.test.tsx`): ~4 tests
+   - All UI states render correctly (idle, requesting, listening, error)
+   - Start button calls onStart
+   - Stop button calls onStop
+   - Error state shows message and retry button
+
+5. **LTTPage Integration tests** (`test_phase4_LTTPage_integration.test.tsx`): ~8 tests
   - System Audio tab renders and switches correctly
+   - Listen Mic tab renders and switches correctly
   - ASR variable selects systemAudioASR when source is system-audio
-   - Full Transcript button hidden for system audio
+   - ASR variable selects micASR when source is mic
+   - Full Transcript button hidden for system audio and mic
   - QueryInput receives transcript from system audio
+   - QueryInput receives transcript from mic
   - Source switching preserves transcript

 **Acceptance Tests** (`backend/app/test/acceptance/test_acceptance_phase4.py`):
 - Real `getDisplayMedia` with actual browser (manual — requires human interaction)
+- Real `getUserMedia` with actual microphone (manual — requires human interaction)
 - Real DashScope ASR with system audio stream
- End-to-end: capture → ASR → transcript → RAG answer
+- Real DashScope ASR with microphone stream
+- End-to-end: capture → ASR → transcript → RAG answer (both sources)

-### Phase 4.7 — Polish & Documentation (0.5 day)
+### Phase 4.9 — Polish & Documentation (0.5 day)

 **Tasks:**
-1. Update `README.md` — add System Audio Capture section with usage instructions, browser compatibility table, and limitations
+1. Update `README.md` — add System Audio Capture and Listen Mic sections with usage instructions, browser compatibility table, and limitations
 2. Update `development_plan.md` — add Phase 4 row to timeline, mark status
-3. Add browser detection helper for compatibility warning
+3. Add browser detection helper for system audio compatibility warning
 4. Verify production build (`npm run build`)
 5. Run full CI regression (`pytest` + `vitest`)
 6. Final commit
@@ -335,34 +496,51 @@ On Linux, only tab audio is available (not full system audio).

 | Decision | Rationale |
 |----------|-----------|
-| New hook (`useSystemAudioASR`) rather than modifying existing | MediaStream source requires `createMediaStreamSource` (not `createMediaElementSource`), and lifecycle is permission-based (not play/pause events). Separate hook avoids branching complexity. |
-| UUID-based `video_id` | No actual video file for system audio. `crypto.randomUUID()` generates unique session IDs. Backend WebSocket already accepts arbitrary strings. |
-| Manual Start/Stop (not auto) | `getDisplayMedia()` requires explicit user action (browser policy). Cannot auto-start. |
-| No video display in System Audio mode | User watches content in another tab/window. Only capture status and audio controls shown. |
-| `video: false` in getDisplayMedia | Audio-only capture reduces bandwidth and permission scope. User only needs to share audio. |
-| Hide Full Transcript button for system audio | Same as YouTube — streaming ASR only. Full transcript would require recording and batch processing (future Phase 5). |
-| Browser compatibility note in UI | `getDisplayMedia` with audio is Chrome/Edge-only. Non-supporting browsers get clear messaging. |
+| New hooks rather than modifying existing | MediaStream source requires `createMediaStreamSource` (not `createMediaElementSource`), and lifecycle is permission-based (not play/pause events). Separate hooks avoid branching complexity. |
+| Two separate hooks + shared audio utility | System Audio and Mic share identical audio processing (MediaStream → PCM → WebSocket) but differ in capture API (`getDisplayMedia` vs `getUserMedia`) and UX. Extract shared pipeline to avoid duplication. |
+| UUID-based `video_id` (per-session) | No actual video file for live audio. UUID generated once when source tab is selected, reused across Start/Stop cycles within the same session. Regenerated only when switching between sources. Backend WebSocket already accepts arbitrary strings. |
+| Manual Start/Stop (not auto) | Both `getDisplayMedia()` and `getUserMedia()` require explicit user action (browser policy). Cannot auto-start. |
+| No video display in System Audio or Mic mode | User watches/listens to content elsewhere. Only capture status and audio controls shown. |
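+| `video: true` in getDisplayMedia, video track discarded | The Screen Capture API rejects `video: false` with a `TypeError`, so video must be requested. Only the audio track feeds the ASR pipeline; the video track can be stopped right after capture. |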
+| Hide Full Transcript button for both new sources | Streaming ASR only — no video file to batch transcribe. Full transcript would require audio recording (future Phase 5). |
+| Browser compatibility note only for System Audio | Mic (`getUserMedia`) is universally supported in all modern browsers. System Audio (`getDisplayMedia` with audio) is Chrome/Edge-only. |
+| Mic uses `getUserMedia({ audio: true, video: false })` | Audio-only capture — no camera needed. |

-### getDisplayMedia Options
+### getDisplayMedia Options (System Audio)

 ```javascript
 const stream = await navigator.mediaDevices.getDisplayMedia({
-  video: false,                        // No video needed
+  video: true,                        // Required: getDisplayMedia rejects video: false with a TypeError
  audio: {
-    systemAudio: 'include',            // Request system audio (tab + full system where supported)
-    echoCancellation: false,           // Don't filter audio
-    noiseSuppression: false,           // Don't filter audio
-    autoGainControl: false,            // Don't adjust volume
+    echoCancellation: false,
+    noiseSuppression: false,
+    autoGainControl: false,
-  }
+  },
+  systemAudio: 'include',             // Top-level option (Chromium), not an audio constraint
 })
 ```

-**Note on `video: false`:** Setting `video: false` tells the browser we only want audio. However, the browser permission dialog still shows screen/tab selection (there's no "audio-only picker"). The user must select a tab or screen to share — this is a browser limitation, not ours.
+**Note on `video`:** `getDisplayMedia()` cannot be audio-only. The Screen Capture spec rejects `video: false` with a `TypeError`, and there is no audio-only picker. The permission dialog always shows screen/tab selection, so the user must select a tab or screen and tick "Share audio". The captured video track is unused and can be stopped immediately.
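A start routine for `useSystemAudioASR` that uses these options might look like the following sketch. It assumes a hypothetical `NoAudioTrack` error name. It also uses minimal structural types instead of the DOM's `MediaStream`, so it can run against fakes. Stopping the video track while keeping the audio track live is the common pattern on Chromium, but it should be confirmed in the acceptance tests:

```typescript
// Hypothetical wrapper around getDisplayMedia; in the app, pass navigator.mediaDevices.
interface CaptureTrack {
  kind: string // 'audio' or 'video'
  stop(): void
}

interface CaptureStream {
  getTracks(): CaptureTrack[]
  getAudioTracks(): CaptureTrack[]
  getVideoTracks(): CaptureTrack[]
}

interface DisplayMediaSource {
  getDisplayMedia(options: Record<string, unknown>): Promise<CaptureStream>
}

async function startSystemAudioCapture(devices: DisplayMediaSource): Promise<CaptureStream> {
  const stream = await devices.getDisplayMedia({
    video: true, // mandatory; video: false rejects with a TypeError
    audio: { echoCancellation: false, noiseSuppression: false, autoGainControl: false },
    systemAudio: 'include', // Chromium option; other browsers ignore unknown members
  })
  if (stream.getAudioTracks().length === 0) {
    // "Share audio" was not ticked, or the chosen surface has no audio
    stream.getTracks().forEach((t) => t.stop())
    const err = new Error('No audio track found in the shared content')
    err.name = 'NoAudioTrack'
    throw err
  }
  // Only audio feeds the ASR pipeline, so release the video track right away
  stream.getVideoTracks().forEach((t) => t.stop())
  return stream
}
```

Call it directly from the Start button's click handler. `getDisplayMedia()` requires transient user activation and rejects with `InvalidStateError` otherwise.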
+
+### getUserMedia Options (Listen Mic)
+
+```javascript
+const stream = await navigator.mediaDevices.getUserMedia({
+  audio: {
+    echoCancellation: false,    // Don't filter audio (pass raw mic input)
+    noiseSuppression: false,    // Don't filter audio
+    autoGainControl: false,     // Don't adjust volume
+  },
+  video: false,
+})
+```

 ---

 ## 6. Browser Compatibility

+### System Audio (`getDisplayMedia`)
+
 | Platform / Browser | Tab Audio | System Audio | Works? |
 |--------------------|-----------|-------------|--------|
 | Chrome/Edge (Windows) | ✅ | ✅ | **Best — full support** |
@@ -376,11 +554,21 @@ const stream = await navigator.mediaDevices.getDisplayMedia({
 ```typescript
 function isSystemAudioSupported(): boolean {
  const isChromium = 'chrome' in window || navigator.userAgent.includes('Chrome')
-  // Firefox and Safari don't support audio in getDisplayMedia
  return isChromium && !navigator.userAgent.includes('Firefox')
 }
 ```

+### Listen Mic (`getUserMedia`)
+
+| Platform / Browser | Microphone | Works? |
+|--------------------|-----------|--------|
+| Chrome/Edge | ✅ | **Full support** |
+| Firefox | ✅ | **Full support** |
+| Safari | ✅ | **Full support** |
+| Mobile browsers | ✅ | **Full support** |
+
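+Mic capture is supported in all modern browsers, so no browser compatibility warning is needed. Both capture APIs do require a secure context (HTTPS or `localhost`); on a plain-HTTP origin `navigator.mediaDevices` is `undefined`.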
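The two tables reduce to a small check. The sketch below extends the `isSystemAudioSupported` idea with feature detection, a secure-context check, and a mobile exclusion. `detectCaptureSupport` is a hypothetical name. The inputs are passed in so the check can be tested outside a browser:

```typescript
// Hypothetical capture-support check combining both compatibility tables.
interface CaptureEnv {
  isSecureContext: boolean
  userAgent: string
  hasGetUserMedia: boolean
  hasGetDisplayMedia: boolean
}

interface CaptureSupport {
  systemAudio: boolean
  mic: boolean
}

function detectCaptureSupport(env: CaptureEnv): CaptureSupport {
  // navigator.mediaDevices is only exposed in secure contexts (HTTPS or localhost)
  if (!env.isSecureContext) return { systemAudio: false, mic: false }
  const ua = env.userAgent
  const isChromium = /Chrome\/|Edg\//.test(ua) && !/Firefox\//.test(ua)
  const isMobile = /Android|Mobile/.test(ua) // no getDisplayMedia audio on mobile
  return {
    mic: env.hasGetUserMedia,
    systemAudio: env.hasGetDisplayMedia && isChromium && !isMobile,
  }
}
```

In the browser, pass `window.isSecureContext`, `navigator.userAgent`, and `!!navigator.mediaDevices?.getUserMedia` / `!!navigator.mediaDevices?.getDisplayMedia`. User-agent strings can be reduced or spoofed, so the result should only decide whether to show the warning. Tab-only platforms (e.g. Linux) still pass this check.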
+
 ---

 ## 7. Test Strategy
@@ -389,16 +577,19 @@ function isSystemAudioSupported(): boolean {

 | File | Type | Count | Description |
 |------|------|-------|-------------|
-| `test_phase4_config.py` | Backend integration | 3 | Config toggle, WebSocket accepts UUID |
-| `test_phase4_useSystemAudioASR.test.ts` | Frontend unit | ~10 | Hook behavior: capture, permission, audio, WS |
+| `test_phase4_config.py` | Backend integration | 4 | Config toggles, WebSocket accepts UUID |
+| `test_phase4_useSystemAudioASR.test.ts` | Frontend unit | ~10 | Hook: capture, permission, audio, WS |
 | `test_phase4_SystemAudioCapture.test.tsx` | Frontend component | ~5 | UI states: idle, requesting, capturing, error |
-| `test_phase4_LTTPage_integration.test.tsx` | Frontend integration | ~5 | Tab switching, ASR unification, Full Transcript |
-| `test_integration_phase4.py` | Backend integration | 4 | Config toggle, WebSocket, ASR client |
-| `test_acceptance_phase4.py` | Acceptance | 3 | Real browser + real DashScope ASR |
+| `test_phase4_useMicASR.test.ts` | Frontend unit | ~8 | Hook: capture, permission, audio, WS |
+| `test_phase4_MicCapture.test.tsx` | Frontend component | ~4 | UI states: idle, requesting, listening, error |
+| `test_phase4_LTTPage_integration.test.tsx` | Frontend integration | ~8 | Tab switching, ASR unification, Full Transcript |
+| `test_integration_phase4.py` | Backend integration | 4 | Config toggles, WebSocket, ASR client |
+| `test_acceptance_phase4.py` | Acceptance | 5 | Real browser + real mic + real DashScope ASR |

 ### Mocking Strategy

 - **`getDisplayMedia`**: Mock with `vi.fn()` returning a synthetic MediaStream with an AudioTrack
+- **`getUserMedia`**: Mock with `vi.fn()` returning a synthetic MediaStream with an AudioTrack
 - **AudioContext**: Use a manual mock for `AudioContext` and `ScriptProcessorNode` (jsdom does not implement the Web Audio API)
 - **WebSocket**: Mock via `vitest` WebSocket mock (same pattern as Phase 2/3 tests)
 - **DashScope ASR**: Mock in CI; real in acceptance tests
@ -410,9 +601,13 @@ function isSystemAudioSupported(): boolean {
 ### New Files
 ```
 frontend/src/hooks/useSystemAudioASR.ts
+frontend/src/hooks/useMicASR.ts
+frontend/src/hooks/useMediaStreamASR.ts
 frontend/src/components/SystemAudioCapture.tsx
+frontend/src/components/MicCapture.tsx
+frontend/src/components/SourceSelector.tsx
 frontend/src/test/test_phase4_useSystemAudioASR.test.ts
 frontend/src/test/test_phase4_SystemAudioCapture.test.tsx
+frontend/src/test/test_phase4_useMicASR.test.ts
+frontend/src/test/test_phase4_MicCapture.test.tsx
 frontend/src/test/test_phase4_LTTPage_integration.test.tsx
 backend/app/test/test_phase4_config.py
 backend/app/test/test_integration_phase4.py
@ -422,11 +617,11 @@ backend/app/test/acceptance/test_acceptance_phase4.py

 ### Modified Files
 ```
-frontend/src/pages/LTTPage.tsx                    ← add "System Audio" tab, wire hook
-frontend/src/types/index.ts                       ← add SystemAudioStatus, SystemAudioASRState
-backend/app/core/config.py                        ← add SYSTEM_AUDIO_ENABLED
+frontend/src/pages/LTTPage.tsx                    ← add "System Audio" + "Listen Mic" tabs, wire hooks
+frontend/src/types/index.ts                       ← add SystemAudioStatus, MicStatus, ASRState types
+backend/app/core/config.py                        ← add SYSTEM_AUDIO_ENABLED, MIC_ENABLED
 development_plan.md                               ← add Phase 4 row
-README.md                                         ← add System Audio Capture section
+README.md                                         ← add System Audio + Listen Mic sections
 ```

 ---
@ -434,13 +629,17 @@ README.md                                         ← add System Audio Capture s
 ## 9. Acceptance Criteria

 - [ ] User can select "System Audio" tab in LTTPage
-- [ ] Clicking "Start Capture" opens browser permission dialog
-- [ ] On permission grant, audio streams through WebSocket to DashScope ASR
-- [ ] Real-time transcript flows into QueryInput
-- [ ] User can edit transcript while capture continues
+- [ ] User can select "Listen Mic" tab in LTTPage
+- [ ] Clicking "Start Capture" (System Audio) opens browser permission dialog
+- [ ] Clicking "Start Listening" (Listen Mic) opens microphone permission prompt
+- [ ] On permission grant, audio streams through WebSocket to DashScope ASR (both sources)
+- [ ] Real-time transcript flows into QueryInput (both sources)
+- [ ] User can edit transcript while capture/listening continues
 - [ ] "Stop Capture" properly closes MediaStream, AudioContext, WebSocket
-- [ ] Permission denied shows clear error message
-- [ ] Browser compatibility note shown for non-Chrome browsers
+- [ ] "Stop Listening" properly closes MediaStream, AudioContext, WebSocket
+- [ ] Permission denied shows clear error message (both sources)
+- [ ] Browser compatibility note shown for System Audio on non-Chrome browsers
+- [ ] No compatibility warning for Listen Mic (universally supported)
 - [ ] All CI tests pass (no regressions)
 - [ ] Acceptance tests pass with real DashScope ASR
 - [ ] `npm run build` produces clean production build
@ -450,4 +649,5 @@ README.md                                         ← add System Audio Capture s
 **File Information**
 - Filename: `phase4_system_audio_plan.md`
 - Created: 2026-05-09
+- Updated: 2026-05-14 — Added Listen Mic as third source; removed YouTube
 - Status: Draft — awaiting review before Phase 4.1 implementation begins
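The Mocking Strategy calls for a synthetic MediaStream with an AudioTrack. Here is a minimal sketch, assuming tests only need `getTracks`/`getAudioTracks`/`getVideoTracks` and `stop()`. `FakeTrack`, `FakeStream`, and `keepAudioOnly` are hypothetical names. `keepAudioOnly` restates the track handling in useSystemAudioASR as a plain function: stop the mandatory video track, and release everything if no audio track came back. That makes the handling testable without a browser.

```typescript
// Hypothetical stand-ins for MediaStreamTrack / MediaStream; only the members
// the capture hooks touch are modelled.
interface TrackLike {
  kind: 'audio' | 'video'
  readyState: 'live' | 'ended'
  stop(): void
}

interface StreamLike {
  getTracks(): TrackLike[]
  getAudioTracks(): TrackLike[]
  getVideoTracks(): TrackLike[]
}

class FakeTrack implements TrackLike {
  kind: 'audio' | 'video'
  readyState: 'live' | 'ended' = 'live'
  constructor(kind: 'audio' | 'video') {
    this.kind = kind
  }
  stop(): void {
    this.readyState = 'ended'
  }
}

class FakeStream implements StreamLike {
  private tracks: TrackLike[]
  constructor(tracks: TrackLike[]) {
    this.tracks = tracks
  }
  getTracks(): TrackLike[] {
    return [...this.tracks]
  }
  getAudioTracks(): TrackLike[] {
    return this.tracks.filter(t => t.kind === 'audio')
  }
  getVideoTracks(): TrackLike[] {
    return this.tracks.filter(t => t.kind === 'video')
  }
}

// Hypothetical helper mirroring useSystemAudioASR: drop the video track, then
// stop all tracks if the user did not enable "Share audio".
// Returns true when a live audio track remains for the pipeline.
function keepAudioOnly(stream: StreamLike): boolean {
  stream.getVideoTracks().forEach(t => t.stop())
  if (stream.getAudioTracks().length === 0) {
    stream.getTracks().forEach(t => t.stop())
    return false
  }
  return true
}
```

In a vitest test, a `FakeStream` could be returned from the mocked `getDisplayMedia`/`getUserMedia`. The `readyState` field then shows whether the hook released each track.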
--- a/backend/app/core/config.py
+++ b/backend/app/core/config.py
@ -54,6 +54,10 @@ class Settings(BaseSettings):
    max_video_size_mb: int = 300
    supported_video_formats: list[str] = [".mp4", ".webm", ".mov", ".avi", ".mkv"]

+    # Phase 4 — Live audio capture toggles
+    system_audio_enabled: bool = True
+    mic_enabled: bool = True
+
    # Development helpers
    model_config = {"env_file": ".env", "env_file_encoding": "utf-8"}

--- a/backend/app/routers/ws_asr.py
+++ b/backend/app/routers/ws_asr.py
@ -209,7 +209,7 @@ async def _ws_proxy_dashscope(client_ws: WebSocket, loop: asyncio.AbstractEventL


@router.websocket("/ws/asr/{video_id}")
-async def ws_asr_endpoint(websocket: WebSocket, video_id: str, language: str = "yue"):
+async def ws_asr_endpoint(websocket: WebSocket, video_id: str, language: str = "yue", source: str = "upload"):
    settings = get_settings()
    client_host = websocket.client.host if websocket.client else "unknown"

@ -220,9 +220,23 @@ async def ws_asr_endpoint(websocket: WebSocket, video_id: str, language: str = "
        logger.warning("ws-rejected-no-apikey video_id=%s client=%s", video_id, client_host)
        return

+    if source == "system-audio" and not settings.system_audio_enabled:
+        await websocket.accept()
+        await websocket.send_json({"error": "System audio capture is disabled"})
+        await websocket.close(code=1008, reason="System audio disabled")
+        logger.warning("ws-rejected-system-audio-disabled video_id=%s client=%s", video_id, client_host)
+        return
+
+    if source == "mic" and not settings.mic_enabled:
+        await websocket.accept()
+        await websocket.send_json({"error": "Microphone capture is disabled"})
+        await websocket.close(code=1008, reason="Mic disabled")
+        logger.warning("ws-rejected-mic-disabled video_id=%s client=%s", video_id, client_host)
+        return
+
    await websocket.accept()
    loop = asyncio.get_event_loop()
-    logger.info("ws-connect video_id=%s lang=%s client=%s", video_id, language, client_host)
+    logger.info("ws-connect video_id=%s lang=%s source=%s client=%s", video_id, language, source, client_host)

    try:
        await _ws_proxy_dashscope(websocket, loop, language)
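The endpoint now reads `source` (default `"upload"`) alongside `language` (default `"yue"`) from the query string. When the matching toggle is off, it rejects `system-audio` or `mic` with close code 1008. Below is a minimal client-side sketch of building that URL. It assumes a hypothetical `buildAsrWsUrl` helper and a `ws://` or `wss://` backend origin supplied by the frontend. This diff does not show how the hooks currently receive `wsUrl`.

```typescript
// Values accepted by the `source` query parameter in ws_asr.py.
type AsrSource = 'upload' | 'system-audio' | 'mic'

// Hypothetical helper: builds /ws/asr/{video_id}?language=...&source=...
// `base` is assumed to be the backend origin, e.g. "ws://localhost:8000".
function buildAsrWsUrl(
  base: string,
  videoId: string,
  source: AsrSource,
  language = 'yue',
): string {
  const url = new URL(`/ws/asr/${encodeURIComponent(videoId)}`, base)
  url.searchParams.set('language', language)
  url.searchParams.set('source', source)
  return url.toString()
}
```

For the live sources, `videoId` would be a client-generated UUID (the config tests connect with `test-uuid`). Restricting `source` to a union type keeps a typo from silently falling back to the server's `upload` default.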
--- a/backend/app/test/test_phase4_config.py
+++ b/backend/app/test/test_phase4_config.py
@ -0,0 +1,140 @@
+"""Phase 4 config tests: system audio and mic capture feature toggles."""
+import pytest
+from fastapi import FastAPI
+from fastapi.testclient import TestClient
+
+
+@pytest.fixture
+def phase4_ws_app(monkeypatch):
+    monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
+    monkeypatch.setenv("SYSTEM_AUDIO_ENABLED", "true")
+    monkeypatch.setenv("MIC_ENABLED", "true")
+    from app.core.config import get_settings
+    from app.routers.ws_asr import router
+    get_settings.cache_clear()
+    app = FastAPI()
+    app.include_router(router)
+    return app
+
+
+class TestWSSourceToggle:
+    def test_system_audio_source_connects(self, phase4_ws_app):
+        client = TestClient(phase4_ws_app)
+        with client.websocket_connect("/ws/asr/test-uuid?source=system-audio") as ws:
+            pass
+
+    def test_mic_source_connects(self, phase4_ws_app):
+        client = TestClient(phase4_ws_app)
+        with client.websocket_connect("/ws/asr/test-uuid?source=mic") as ws:
+            pass
+
+    def test_default_source_is_upload(self, phase4_ws_app):
+        client = TestClient(phase4_ws_app)
+        with client.websocket_connect("/ws/asr/test-uuid") as ws:
+            pass
+
+    def test_system_audio_disabled_rejects(self, monkeypatch):
+        monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
+        monkeypatch.setenv("SYSTEM_AUDIO_ENABLED", "false")
+        from app.core.config import get_settings
+        from app.routers.ws_asr import router
+        get_settings.cache_clear()
+        app = FastAPI()
+        app.include_router(router)
+        client = TestClient(app)
+        with client.websocket_connect("/ws/asr/test-uuid?source=system-audio") as ws:
+            data = ws.receive_json()
+            assert "disabled" in data.get("error", "").lower()
+
+    def test_mic_disabled_rejects(self, monkeypatch):
+        monkeypatch.setenv("DASHSCOPE_API_KEY", "sk-test-key")
+        monkeypatch.setenv("MIC_ENABLED", "false")
+        from app.core.config import get_settings
+        from app.routers.ws_asr import router
+        get_settings.cache_clear()
+        app = FastAPI()
+        app.include_router(router)
+        client = TestClient(app)
+        with client.websocket_connect("/ws/asr/test-uuid?source=mic") as ws:
+            data = ws.receive_json()
+            assert "disabled" in data.get("error", "").lower()
+
+
+def test_config_system_audio_defaults(monkeypatch, tmp_path):
+    monkeypatch.delenv("SYSTEM_AUDIO_ENABLED", raising=False)
+    monkeypatch.setenv("LLM_API_KEY", "sk-test")
+    monkeypatch.setenv("DP_API_KEY", "sk-test")
+    monkeypatch.setenv("EMBEDDING_API_KEY", "sk-test")
+    env_file = tmp_path / ".env"
+    env_file.write_text("")
+    monkeypatch.chdir(tmp_path)
+
+    from app.core.config import Settings, get_settings
+    get_settings.cache_clear()
+    settings = Settings(_env_file=())
+    assert settings.system_audio_enabled is True
+
+
+def test_config_mic_defaults(monkeypatch, tmp_path):
+    monkeypatch.delenv("MIC_ENABLED", raising=False)
+    monkeypatch.setenv("LLM_API_KEY", "sk-test")
+    monkeypatch.setenv("DP_API_KEY", "sk-test")
+    monkeypatch.setenv("EMBEDDING_API_KEY", "sk-test")
+    env_file = tmp_path / ".env"
+    env_file.write_text("")
+    monkeypatch.chdir(tmp_path)
+
+    from app.core.config import Settings, get_settings
+    get_settings.cache_clear()
+    settings = Settings(_env_file=())
+    assert settings.mic_enabled is True
+
+
+def test_config_system_audio_disabled(tmp_path, monkeypatch):
+    env_file = tmp_path / ".env"
+    env_file.write_text(
+        "SYSTEM_AUDIO_ENABLED=false\n"
+        "LLM_API_KEY=sk-test\n"
+        "DP_API_KEY=sk-test\n"
+        "EMBEDDING_API_KEY=sk-test\n"
+    )
+    monkeypatch.chdir(tmp_path)
+    from app.core.config import Settings, get_settings
+    get_settings.cache_clear()
+
+    settings = Settings()
+    assert settings.system_audio_enabled is False
+
+
+def test_config_mic_disabled(tmp_path, monkeypatch):
+    env_file = tmp_path / ".env"
+    env_file.write_text(
+        "MIC_ENABLED=false\n"
+        "LLM_API_KEY=sk-test\n"
+        "DP_API_KEY=sk-test\n"
+        "EMBEDDING_API_KEY=sk-test\n"
+    )
+    monkeypatch.chdir(tmp_path)
+    from app.core.config import Settings, get_settings
+    get_settings.cache_clear()
+
+    settings = Settings()
+    assert settings.mic_enabled is False
+
+
+def test_config_loads_both_toggles_from_env(tmp_path, monkeypatch):
+    env_file = tmp_path / ".env"
+    env_file.write_text(
+        "SYSTEM_AUDIO_ENABLED=true\n"
+        "MIC_ENABLED=true\n"
+        "LLM_API_KEY=sk-test\n"
+        "DP_API_KEY=sk-test\n"
+        "EMBEDDING_API_KEY=sk-test\n"
+    )
+    monkeypatch.chdir(tmp_path)
+    from app.core.config import Settings, get_settings
+    get_settings.cache_clear()
+
+    settings = Settings()
+    assert settings.system_audio_enabled is True
+    assert settings.mic_enabled is True
--- a/frontend/.pnpmrc
+++ b/frontend/.pnpmrc
@ -0,0 +1,2 @@
+onlyBuiltDependencies:
+  - esbuild
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
@ -2071,15 +2071,6 @@
        "node": ">=6.9.0"
      }
    },
-    "node_modules/@types/babel__generator": {
-      "dev": true
-    },
-    "node_modules/@types/babel__template": {
-      "dev": true
-    },
-    "node_modules/@types/babel__traverse": {
-      "dev": true
-    },
    "node_modules/@types/chai": {
      "version": "4.3.20",
      "resolved": "https://registry.npmjs.org/@types/chai/-/chai-4.3.20.tgz",
@ -2130,9 +2121,6 @@
        "@types/unist": "*"
      }
    },
-    "node_modules/@types/jest": {
-      "dev": true
-    },
    "node_modules/@types/mdast": {
      "version": "4.0.4",
      "resolved": "https://registry.npmjs.org/@types/mdast/-/mdast-4.0.4.tgz",
@ -2158,7 +2146,6 @@
        "undici-types": "~7.19.0"
      }
    },
-    "node_modules/@types/prop-types": {},
    "node_modules/@types/react": {
      "version": "18.3.28",
      "resolved": "https://registry.npmjs.org/@types/react/-/react-18.3.28.tgz",
--- a/frontend/package.json
+++ b/frontend/package.json
@ -34,5 +34,10 @@
    "ts-node": "^10.9.1",
    "typescript": "^5.1.6",
    "vitest": "^0.34.3"
+  },
+  "pnpm": {
+    "onlyBuiltDependencies": [
+      "esbuild"
+    ]
  }
 }
--- a/frontend/pnpm-workspace.yaml
+++ b/frontend/pnpm-workspace.yaml
@ -0,0 +1,2 @@
+allowBuilds:
+  esbuild: true
--- a/frontend/src/components/MicCapture.tsx
+++ b/frontend/src/components/MicCapture.tsx
@ -0,0 +1,80 @@
+import React from 'react'
+import { Mic, Loader2, AlertCircle, Circle } from 'lucide-react'
+import type { MicStatus } from '../types'
+
+export interface MicCaptureProps {
+  status: MicStatus
+  error: string | null
+  onStart: () => void
+  onStop: () => void
+}
+
+export const MicCapture: React.FC<MicCaptureProps> = ({
+  status,
+  error,
+  onStart,
+  onStop,
+}) => {
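+  // Also show the error while idle: useMicASR reports a denied permission
+  // prompt as status 'idle' with an error message.
+  if (error && (status === 'error' || status === 'idle')) {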
+    return (
+      <div className="h-full flex flex-col">
+        <div className="p-3 bg-red-50 border border-red-200 rounded-lg flex items-start gap-2">
+          <AlertCircle className="w-4 h-4 text-red-500 shrink-0 mt-0.5" />
+          <div className="flex-1">
+            <div className="text-sm text-red-700">{error}</div>
+            <button
+              onClick={onStart}
+              className="mt-2 text-xs text-red-600 hover:text-red-800 font-medium underline"
+            >
+              Try Again
+            </button>
+          </div>
+        </div>
+      </div>
+    )
+  }
+
+  if (status === 'requesting') {
+    return (
+      <div className="h-full flex flex-col items-center justify-center space-y-3">
+        <Loader2 className="w-8 h-8 text-blue-600 animate-spin" />
+        <div className="text-sm text-gray-600 font-medium">Waiting for microphone permission...</div>
+      </div>
+    )
+  }
+
+  if (status === 'listening' || status === 'stopping') {
+    return (
+      <div className="h-full flex flex-col items-center justify-center space-y-4">
+        <div className="flex items-center gap-2">
+          <Circle className="w-3 h-3 text-green-500 fill-green-500 animate-pulse" />
+          <span className="text-sm text-gray-600 font-medium">Listening...</span>
+        </div>
+        <div className="flex items-end gap-1 h-8">
+          <div className="w-2 bg-green-500 rounded-full animate-[bounce_1s_infinite]" style={{ height: '40%', animationDelay: '0ms' }} />
+          <div className="w-2 bg-green-500 rounded-full animate-[bounce_1s_infinite]" style={{ height: '70%', animationDelay: '150ms' }} />
+          <div className="w-2 bg-green-500 rounded-full animate-[bounce_1s_infinite]" style={{ height: '55%', animationDelay: '300ms' }} />
+        </div>
+        <button
+          onClick={onStop}
+          disabled={status === 'stopping'}
+          className="w-full px-4 py-2 bg-red-600 text-white font-medium rounded hover:bg-red-700 focus:outline-none focus:ring-2 focus:ring-red-500 focus:ring-offset-2 disabled:opacity-50 disabled:cursor-not-allowed disabled:hover:bg-red-600 transition-all duration-200"
+        >
+          {status === 'stopping' ? 'Stopping...' : 'Stop Listening'}
+        </button>
+      </div>
+    )
+  }
+
+  return (
+    <div className="h-full flex flex-col">
+      <button
+        onClick={onStart}
+        className="w-full px-4 py-2 bg-blue-600 text-white font-medium rounded hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 transition-all duration-200 flex items-center justify-center gap-2"
+      >
+        <Mic className="w-4 h-4" />
+        Start Listening
+      </button>
+    </div>
+  )
+}
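MicCapture picks one of four panels from `status` and `error`. Below is a minimal sketch of that branching as a pure function that can be unit-tested without rendering. It assumes `MicStatus` is exactly the five values the component and hook use (`idle`, `requesting`, `listening`, `stopping`, `error`). The hypothetical `selectMicPanel` shows the error card for any error outside an active session. A permission denial, which useMicASR reports as `status: 'idle'` plus a message, therefore still lands on the error panel.

```typescript
// Assumed to match MicStatus in frontend/src/types/index.ts (not shown in this diff).
type MicStatus = 'idle' | 'requesting' | 'listening' | 'stopping' | 'error'
type MicPanel = 'error' | 'requesting' | 'listening' | 'start'

// Hypothetical pure version of MicCapture's branching.
function selectMicPanel(status: MicStatus, error: string | null): MicPanel {
  // Error card for pipeline errors and for failures reported while idle.
  if (error && (status === 'error' || status === 'idle')) return 'error'
  if (status === 'requesting') return 'requesting'
  // 'stopping' keeps the listening panel with a disabled Stop button.
  if (status === 'listening' || status === 'stopping') return 'listening'
  return 'start'
}
```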
--- a/frontend/src/components/SourceSelector.tsx
+++ b/frontend/src/components/SourceSelector.tsx
@ -0,0 +1,42 @@
+import React from 'react'
+import { Upload, MonitorSpeaker, Mic } from 'lucide-react'
+import type { SourceType } from '../types'
+
+interface SourceSelectorProps {
+  activeSource: SourceType
+  onSelect: (source: SourceType) => void
+}
+
+export const SourceSelector: React.FC<SourceSelectorProps> = ({ activeSource, onSelect }) => {
+  const tabs: { id: SourceType; label: string; icon: React.ElementType }[] = [
+    { id: 'upload', label: 'Upload', icon: Upload },
+    { id: 'system-audio', label: 'System Audio', icon: MonitorSpeaker },
+    { id: 'mic', label: 'Listen Mic', icon: Mic },
+  ]
+
+  return (
+    <div className="flex gap-1 p-1 bg-gray-100 rounded-lg" role="tablist">
+      {tabs.map(tab => {
+        const isActive = activeSource === tab.id
+        const Icon = tab.icon
+        return (
+          <button
+            key={tab.id}
+            role="tab"
+            aria-selected={isActive}
+            onClick={() => onSelect(tab.id)}
+            className={[
+              'flex items-center gap-2 px-4 py-2 rounded-md text-sm font-medium transition-all duration-200',
+              isActive
+                ? 'bg-white text-blue-700 shadow-sm'
+                : 'text-gray-500 hover:text-gray-700 hover:bg-gray-50',
+            ].join(' ')}
+          >
+            <Icon className="w-4 h-4" />
+            {tab.label}
+          </button>
+        )
+      })}
+    </div>
+  )
+}
--- a/frontend/src/components/SystemAudioCapture.tsx
+++ b/frontend/src/components/SystemAudioCapture.tsx
@ -0,0 +1,86 @@
+import React from 'react'
+import { MonitorSpeaker, Loader2, AlertCircle, Circle } from 'lucide-react'
+import type { SystemAudioStatus } from '../types'
+
+export interface SystemAudioCaptureProps {
+  status: SystemAudioStatus
+  error: string | null
+  onStart: () => void
+  onStop: () => void
+}
+
+export const SystemAudioCapture: React.FC<SystemAudioCaptureProps> = ({
+  status,
+  error,
+  onStart,
+  onStop,
+}) => {
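+  // Also show the error while idle: useSystemAudioASR reports a denied or
+  // cancelled share prompt as status 'idle' with an error message.
+  if (error && (status === 'error' || status === 'idle')) {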
+    return (
+      <div className="h-full flex flex-col">
+        <div className="p-3 bg-red-50 border border-red-200 rounded-lg flex items-start gap-2">
+          <AlertCircle className="w-4 h-4 text-red-500 shrink-0 mt-0.5" />
+          <div className="flex-1">
+            <div className="text-sm text-red-700">{error}</div>
+            <button
+              onClick={onStart}
+              className="mt-2 text-xs text-red-600 hover:text-red-800 font-medium underline"
+            >
+              Try Again
+            </button>
+          </div>
+        </div>
+      </div>
+    )
+  }
+
+  if (status === 'requesting') {
+    return (
+      <div className="h-full flex flex-col items-center justify-center space-y-3">
+        <Loader2 className="w-8 h-8 text-blue-600 animate-spin" />
+        <div className="text-sm text-gray-600 font-medium">Waiting for permission...</div>
+      </div>
+    )
+  }
+
+  if (status === 'capturing' || status === 'stopping') {
+    return (
+      <div className="h-full flex flex-col items-center justify-center space-y-4">
+        <div className="flex items-center gap-2">
+          <Circle className="w-3 h-3 text-green-500 fill-green-500 animate-pulse" />
+          <span className="text-sm text-gray-600 font-medium">Capturing system audio...</span>
+        </div>
+        <div className="flex items-end gap-1 h-8">
+          <div className="w-2 bg-green-500 rounded-full animate-[bounce_1s_infinite]" style={{ height: '40%', animationDelay: '0ms' }} />
+          <div className="w-2 bg-green-500 rounded-full animate-[bounce_1s_infinite]" style={{ height: '70%', animationDelay: '150ms' }} />
+          <div className="w-2 bg-green-500 rounded-full animate-[bounce_1s_infinite]" style={{ height: '55%', animationDelay: '300ms' }} />
+        </div>
+        <button
+          onClick={onStop}
+          disabled={status === 'stopping'}
+          className="w-full px-4 py-2 bg-red-600 text-white font-medium rounded hover:bg-red-700 focus:outline-none focus:ring-2 focus:ring-red-500 focus:ring-offset-2 disabled:opacity-50 disabled:cursor-not-allowed disabled:hover:bg-red-600 transition-all duration-200"
+        >
+          {status === 'stopping' ? 'Stopping...' : 'Stop Capture'}
+        </button>
+      </div>
+    )
+  }
+
+  return (
+    <div className="h-full flex flex-col space-y-3">
+      <button
+        onClick={onStart}
+        className="w-full px-4 py-2 bg-blue-600 text-white font-medium rounded hover:bg-blue-700 focus:outline-none focus:ring-2 focus:ring-blue-500 focus:ring-offset-2 transition-all duration-200 flex items-center justify-center gap-2"
+      >
+        <MonitorSpeaker className="w-4 h-4" />
+        Start Capture
+      </button>
+      <div className="p-3 bg-amber-50 border border-amber-200 rounded-lg flex items-start gap-2">
+        <AlertCircle className="w-4 h-4 text-amber-600 shrink-0 mt-0.5" />
+        <div className="text-xs text-amber-700 leading-relaxed">
+          System audio capture works best in Chrome/Edge on Windows/macOS. Firefox and Safari do not support this feature. On Linux, only tab audio is available.
+        </div>
+      </div>
+    </div>
+  )
+}
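Both capture components render from a wrapper status. useMicASR and useSystemAudioASR derive it from the shared pipeline status with near-identical `switch` blocks; only the label for the active state differs (`listening` vs `capturing`). Below is a minimal sketch of that mapping as one pure function, assuming the pipeline status union from useMediaStreamASR. `toWrapperStatus` is a hypothetical name. It returns `null` for `requesting`, which neither wrapper maps, meaning the wrapper keeps its own value.

```typescript
type PipelineStatus = 'idle' | 'requesting' | 'streaming' | 'stopping' | 'error'

// Hypothetical shared mapping. `active` is the label each wrapper uses while
// audio is flowing: 'listening' for the mic, 'capturing' for system audio.
function toWrapperStatus<A extends string>(
  pipeline: PipelineStatus,
  active: A,
): A | 'idle' | 'stopping' | 'error' | null {
  switch (pipeline) {
    case 'streaming':
      return active
    case 'stopping':
    case 'error':
    case 'idle':
      return pipeline
    default:
      // 'requesting' is not handled by either wrapper's switch, so the wrapper
      // keeps whatever status it set itself.
      return null
  }
}
```

Each hook's effect could then reduce to `const next = toWrapperStatus(pipeline.status, 'listening'); if (next) setStatus(next)`. The wrapper-specific `setWrapperError(null)` calls would stay in the hooks.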
--- a/frontend/src/hooks/useMediaStreamASR.ts
+++ b/frontend/src/hooks/useMediaStreamASR.ts
@ -0,0 +1,191 @@
+import { useState, useRef, useCallback, useEffect } from 'react'
+import type { ASRMessage } from '../types'
+
+export interface UseMediaStreamASRProps {
+  wsUrl: string
+}
+
+export interface UseMediaStreamASRReturn {
+  status: 'idle' | 'requesting' | 'streaming' | 'stopping' | 'error'
+  transcript: string
+  partialTranscript: string
+  error: string | null
+  start: (stream: MediaStream) => void
+  stop: () => void
+}
+
+/**
+ * Shared audio pipeline: AudioContext → ScriptProcessorNode → Float32 PCM → WebSocket.
+ * Wrapper hooks (system audio, mic) obtain the MediaStream, then call `start(stream)`.
+ * Follows the exact audio-processing and WebSocket message pattern from useVideoASR.ts.
+ */
+export function useMediaStreamASR({ wsUrl }: UseMediaStreamASRProps): UseMediaStreamASRReturn {
+  const [status, setStatus] = useState<'idle' | 'requesting' | 'streaming' | 'stopping' | 'error'>('idle')
+  const [transcript, setTranscript] = useState('')
+  const [partialTranscript, setPartialTranscript] = useState('')
+  const [error, setError] = useState<string | null>(null)
+
+  const wsRef = useRef<WebSocket | null>(null)
+  const audioContextRef = useRef<AudioContext | null>(null)
+  const processorRef = useRef<ScriptProcessorNode | null>(null)
+  const sourceRef = useRef<MediaStreamAudioSourceNode | null>(null)
+  const streamRef = useRef<MediaStream | null>(null)
+  const isStreamingRef = useRef(false)
+  const isManualCloseRef = useRef(false)
+  const transcriptRef = useRef('')
+  const lastStashRef = useRef('')
+
+  const cleanup = useCallback(() => {
+    isStreamingRef.current = false
+
+    // Stash handling — mirrors useVideoASR stopStreaming lines 101-111
+    let currentText = transcriptRef.current.trim()
+    const stash = lastStashRef.current.trim()
+    if (stash && !currentText.endsWith(stash)) {
+      currentText += stash
+      transcriptRef.current = currentText
+    }
+    lastStashRef.current = ''
+    if (currentText) {
+      setTranscript(currentText)
+      // Keep partialTranscript populated so the text remains visible in QueryInput
+      // after the user stops capture/listening. Unlike video ASR, mic/system-audio
+      // hooks have no onFinalTranscript callback to persist via queryText.
+      setPartialTranscript(currentText)
+    }
+
+    if (streamRef.current) {
+      streamRef.current.getTracks().forEach(t => {
+        t.onended = null
+        t.stop()
+      })
+      streamRef.current = null
+    }
+
+    processorRef.current?.disconnect()
+    sourceRef.current?.disconnect()
+    processorRef.current = null
+    sourceRef.current = null
+
+    if (wsRef.current) {
+      isManualCloseRef.current = true
+      wsRef.current.close()
+      wsRef.current = null
+    }
+
+    if (audioContextRef.current) {
+      audioContextRef.current.close()
+      audioContextRef.current = null
+    }
+  }, [])
+
+  const stop = useCallback(() => {
+    setStatus('stopping')
+    cleanup()
+    setStatus('idle')
+  }, [cleanup])
+
+  const start = useCallback((stream: MediaStream) => {
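+    cleanup()
+    // Reset to 'requesting' so that a repeated 'error' after a retry is a real
+    // state change; the wrapper hooks only react when pipeline.status changes.
+    setStatus('requesting')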
+
+    setError(null)
+    transcriptRef.current = ''
+    lastStashRef.current = ''
+    setTranscript('')
+    setPartialTranscript('')
+    streamRef.current = stream
+    isManualCloseRef.current = false
+
+    stream.getAudioTracks().forEach(track => {
+      track.onended = () => {
+        cleanup()
+        setStatus('idle')
+      }
+    })
+
+    try {
+      // AudioContext + ScriptProcessorNode — mirrors useVideoASR lines 117-136
+      const audioContext = new AudioContext({ sampleRate: 16000 })
+      audioContextRef.current = audioContext
+
+      const source = audioContext.createMediaStreamSource(stream)
+      sourceRef.current = source
+
+      const processor = audioContext.createScriptProcessor(4096, 1, 1)
+      processorRef.current = processor
+
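+      // onaudioprocess: same Float32 PCM send path as useVideoASR lines 126-132,
+      // but the output buffer is kept silent. With a MediaStream source, copying
+      // input to output would play the mic (or the captured tab) back through the
+      // speakers, causing feedback or doubled audio.
+      processor.onaudioprocess = (e) => {
+        const float32Data = e.inputBuffer.getChannelData(0)
+        // The processor stays connected to destination only because Chrome may
+        // not fire onaudioprocess for an unconnected ScriptProcessorNode.
+        e.outputBuffer.getChannelData(0).fill(0)
+        if (!isStreamingRef.current) return
+        if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
+        wsRef.current.send(float32Data.buffer)
+      }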
+
+      source.connect(processor)
+      processor.connect(audioContext.destination)
+
+      const ws = new WebSocket(wsUrl)
+      wsRef.current = ws
+
+      ws.onopen = () => {
+        isStreamingRef.current = true
+        setStatus('streaming')
+      }
+
+      // Message parsing — mirrors useVideoASR lines 51-64 exactly
+      ws.onmessage = (e) => {
+        const msg: ASRMessage = JSON.parse(e.data)
+        if (msg.is_final && msg.full_text) {
+          transcriptRef.current = msg.full_text
+          lastStashRef.current = ''
+          setTranscript(msg.full_text)
+          setPartialTranscript(msg.full_text)
+        } else if (msg.delta) {
+          transcriptRef.current += msg.delta
+          lastStashRef.current = (msg as any).stash || ''
+          setTranscript(transcriptRef.current)
+          setPartialTranscript(transcriptRef.current)
+        }
+      }
+
+      ws.onerror = () => {
+        console.error('[useMediaStreamASR] WebSocket error')
+        setError('WebSocket connection error')
+        setStatus('error')
+        isManualCloseRef.current = true
+        cleanup()
+      }
+
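+      ws.onclose = (e) => {
+        isStreamingRef.current = false
+        if (isManualCloseRef.current) return
+        // Server-initiated close (e.g. code 1008 when the source is disabled):
+        // surface the close reason and release the capture stream and AudioContext.
+        setError(e.reason ? `ASR connection closed: ${e.reason}` : 'ASR connection closed unexpectedly')
+        setStatus('error')
+        cleanup()
+      }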
+    } catch (err) {
+      console.error('[useMediaStreamASR] start failed:', err)
+      setError(err instanceof Error ? err.message : 'Failed to start audio pipeline')
+      setStatus('error')
+    }
+  }, [wsUrl, cleanup])
+
+  useEffect(() => {
+    return () => {
+      if (streamRef.current) {
+        streamRef.current.getTracks().forEach(t => {
+          t.onended = null
+          t.stop()
+        })
+      }
+      processorRef.current?.disconnect()
+      sourceRef.current?.disconnect()
+      wsRef.current?.close()
+      audioContextRef.current?.close()
+    }
+  }, [])
+
+  return { status, transcript, partialTranscript, error, start, stop }
+}
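The transcript bookkeeping in useMediaStreamASR can be pulled into pure functions and unit-tested without a WebSocket. A final `full_text` replaces the transcript. A `delta` is appended and its `stash` remembered. On stop, a pending stash is appended unless the text already ends with it. Below is a minimal sketch, assuming messages carry only the `is_final`, `full_text`, `delta`, and `stash` fields the hook reads. `AsrMessageLike`, `applyAsrMessage`, and `finalizeTranscript` are hypothetical names.

```typescript
interface AsrMessageLike {
  is_final?: boolean
  full_text?: string
  delta?: string
  stash?: string
}

interface TranscriptState {
  text: string
  stash: string
}

// Mirrors ws.onmessage: a final full_text replaces everything; otherwise a
// delta is appended and the latest stash is remembered.
function applyAsrMessage(state: TranscriptState, msg: AsrMessageLike): TranscriptState {
  if (msg.is_final && msg.full_text) {
    return { text: msg.full_text, stash: '' }
  }
  if (msg.delta) {
    return { text: state.text + msg.delta, stash: msg.stash || '' }
  }
  // Messages without text leave the state unchanged.
  return state
}

// Mirrors the stash handling in cleanup(): append a pending stash unless the
// transcript already ends with it.
function finalizeTranscript(state: TranscriptState): string {
  let text = state.text.trim()
  const stash = state.stash.trim()
  if (stash && !text.endsWith(stash)) text += stash
  return text
}
```

Keeping these as pure functions would let the hook hold a single ref of type `TranscriptState` instead of two parallel refs (`transcriptRef` and `lastStashRef`).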
--- a/frontend/src/hooks/useMicASR.ts
+++ b/frontend/src/hooks/useMicASR.ts
@ -0,0 +1,85 @@
+import { useState, useEffect } from 'react'
+import type { MicStatus } from '../types'
+import { useMediaStreamASR } from './useMediaStreamASR'
+
+export function useMicASR({ wsUrl }: { wsUrl: string }) {
+  const pipeline = useMediaStreamASR({ wsUrl })
+  const [status, setStatus] = useState<MicStatus>('idle')
+  const [wrapperError, setWrapperError] = useState<string | null>(null)
+
+  useEffect(() => {
+    switch (pipeline.status) {
+      case 'streaming':
+        setStatus('listening')
+        setWrapperError(null)
+        break
+      case 'stopping':
+        setStatus('stopping')
+        break
+      case 'error':
+        setStatus('error')
+        setWrapperError(null)
+        break
+      case 'idle':
+        setStatus('idle')
+        break
+    }
+  }, [pipeline.status])
+
+  const startListening = async () => {
+    setWrapperError(null)
+    setStatus('requesting')
+
+    try {
+      const stream = await navigator.mediaDevices.getUserMedia({
+        audio: {
+          echoCancellation: false,
+          noiseSuppression: false,
+          autoGainControl: false,
+        },
+        video: false,
+      })
+
+      if (stream.getAudioTracks().length === 0) {
+        stream.getTracks().forEach(t => t.stop())
+        setStatus('error')
+        setWrapperError('No microphone input detected')
+        return
+      }
+
+      pipeline.start(stream)
+    } catch (err) {
+      console.error('[useMicASR] getUserMedia failed:', err)
+      if (err instanceof DOMException && err.name === 'NotAllowedError') {
+        setStatus('idle')
+        setWrapperError('Microphone access denied — please allow microphone access in your browser settings')
+        return
+      }
+      if (err instanceof DOMException && err.name === 'NotFoundError') {
+        setStatus('error')
+        setWrapperError('No microphone found. Please connect a microphone and try again.')
+        return
+      }
+      if (err instanceof DOMException && err.name === 'NotSupportedError') {
+        setStatus('error')
+        setWrapperError('Microphone access is not supported in this browser.')
+        return
+      }
+      setStatus('error')
+      setWrapperError(err instanceof Error ? err.message : 'Failed to start microphone capture')
+    }
+  }
+
+  const stopListening = () => {
+    pipeline.stop()
+  }
+
+  return {
+    status,
+    transcript: pipeline.transcript,
+    partialTranscript: pipeline.partialTranscript,
+    error: wrapperError ?? pipeline.error,
+    startListening,
+    stopListening,
+  }
+}
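useMicASR maps `getUserMedia` failures by `DOMException.name`. Below is a minimal sketch of that mapping as a pure classifier, assuming only the error's `name` matters. `classifyMicError` and the `MicFailureKind` labels are hypothetical. The hook currently reports `denied` as `status: 'idle'` and the other kinds as `'error'`.

```typescript
type MicFailureKind = 'denied' | 'no-device' | 'unsupported' | 'unknown'

// Hypothetical classifier mirroring the branches in useMicASR's catch block.
// Duck-typed on a string `name`; anything else is 'unknown', matching the
// hook's generic fallback message.
function classifyMicError(err: unknown): MicFailureKind {
  const name =
    typeof err === 'object' && err !== null && typeof (err as { name?: unknown }).name === 'string'
      ? (err as { name: string }).name
      : ''
  switch (name) {
    case 'NotAllowedError':
      return 'denied'
    case 'NotFoundError':
      return 'no-device'
    case 'NotSupportedError':
      return 'unsupported'
    default:
      return 'unknown'
  }
}
```

Duck-typing on `name` instead of `instanceof DOMException` lets plain objects stand in for DOMException in unit tests. The hook itself checks `instanceof DOMException`, so a non-DOMException error with the same name would fall into its generic branch.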
--- a/frontend/src/hooks/useSystemAudioASR.ts
+++ b/frontend/src/hooks/useSystemAudioASR.ts
@ -0,0 +1,91 @@
+import { useState, useEffect } from 'react'
+import type { SystemAudioStatus } from '../types'
+import { useMediaStreamASR } from './useMediaStreamASR'
+
+export function useSystemAudioASR({ wsUrl }: { wsUrl: string }) {
+  const pipeline = useMediaStreamASR({ wsUrl })
+  const [status, setStatus] = useState<SystemAudioStatus>('idle')
+  const [wrapperError, setWrapperError] = useState<string | null>(null)
+
+  useEffect(() => {
+    switch (pipeline.status) {
+      case 'streaming':
+        setStatus('capturing')
+        setWrapperError(null)
+        break
+      case 'stopping':
+        setStatus('stopping')
+        break
+      case 'error':
+        setStatus('error')
+        setWrapperError(null)
+        break
+      case 'idle':
+        setStatus('idle')
+        break
+    }
+  }, [pipeline.status])
+
+  const startCapture = async () => {
+    setWrapperError(null)
+    setStatus('requesting')
+
+    try {
+      // Per the Screen Capture spec, getDisplayMedia() rejects with a TypeError
+      // when video is false, so video must be requested even though only the
+      // audio track is used; the video track is stopped right after capture starts.
+      const stream = await navigator.mediaDevices.getDisplayMedia({
+        video: true,
+        audio: {
+          echoCancellation: false,
+          noiseSuppression: false,
+          autoGainControl: false,
+        },
+        systemAudio: 'include', // top-level DisplayMediaStreamOptions member, not an audio constraint
+      } as any)
+
+      // Stop video tracks immediately — we only need audio
+      stream.getVideoTracks().forEach((t) => t.stop())
+
+      if (stream.getAudioTracks().length === 0) {
+        stream.getTracks().forEach((t) => t.stop())
+        setStatus('error')
+        setWrapperError(
+          'No audio track found. Make sure to enable "Share audio" in the sharing dialog and select a tab or window that is playing audio.',
+        )
+        return
+      }
+
+      pipeline.start(stream)
+    } catch (err) {
+      console.error('[useSystemAudioASR] getDisplayMedia failed:', err)
+      if (err instanceof DOMException) {
+        if (err.name === 'AbortError' || err.name === 'NotAllowedError') {
+          setStatus('idle')
+          setWrapperError('Permission denied — system audio capture requires your explicit permission')
+          return
+        }
+        if (err.name === 'NotSupportedError') {
+          setStatus('error')
+          setWrapperError('System audio capture is not supported on this platform. Chrome/Edge on Linux can share tab audio only; for full system audio, try Chrome or Edge on Windows.')
+          return
+        }
+      }
+      setStatus('error')
+      setWrapperError(err instanceof Error ? err.message : 'Failed to start system audio capture')
+    }
+  }
+
+  const stopCapture = () => {
+    pipeline.stop()
+  }
+
+  return {
+    status,
+    transcript: pipeline.transcript,
+    partialTranscript: pipeline.partialTranscript,
+    error: wrapperError ?? pipeline.error,
+    startCapture,
+    stopCapture,
+  }
+}
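The `useEffect` in `useSystemAudioASR` is a fixed translation from the shared pipeline status to `SystemAudioStatus`, and `MicStatus` differs only in naming the active state `'listening'` instead of `'capturing'`. A minimal sketch of that translation as one pure function, assuming the pipeline reports exactly the four statuses handled in the `switch` (`PipelineStatus` is an assumed type; the real one lives in `useMediaStreamASR`, which this section does not show). `toWrapperStatus` is a hypothetical name.

```typescript
// Assumed: the pipeline statuses handled by the switch in useSystemAudioASR.
// The real type lives in useMediaStreamASR and may differ.
type PipelineStatus = 'idle' | 'streaming' | 'stopping' | 'error'

type SystemAudioStatus = 'idle' | 'requesting' | 'capturing' | 'stopping' | 'error'
type MicStatus = 'idle' | 'requesting' | 'listening' | 'stopping' | 'error'

// Hypothetical helper: `active` is the wrapper's name for "audio is flowing".
// 'requesting' is never returned; the hooks set it themselves before the pipeline starts.
function toWrapperStatus<A extends 'capturing' | 'listening'>(
  pipeline: PipelineStatus,
  active: A,
): 'idle' | 'stopping' | 'error' | A {
  switch (pipeline) {
    case 'streaming':
      return active
    case 'stopping':
      return 'stopping'
    case 'error':
      return 'error'
    case 'idle':
      return 'idle'
    default: {
      // Compile-time exhaustiveness check: fails to type-check if PipelineStatus grows.
      const unreachable: never = pipeline
      throw new Error(`Unhandled pipeline status: ${String(unreachable)}`)
    }
  }
}
```

Each hook's effect could then call `setStatus(toWrapperStatus(pipeline.status, 'capturing'))` (or `'listening'` in the mic hook), keeping the `setWrapperError(null)` calls for the streaming and error cases next to it.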
--- a/frontend/src/pages/LTTPage.tsx
+++ b/frontend/src/pages/LTTPage.tsx
@ -1,8 +1,10 @@
-import React, { useState, useCallback, useEffect } from 'react'
+import React, { useState, useCallback, useEffect, useMemo } from 'react'
 import { Loader2, AlertCircle, FileText } from 'lucide-react'
 import { Group, Panel, Separator } from 'react-resizable-panels'
 import { useQueryDocumentStream } from '../lib/queries'
 import { useVideoASR } from '../hooks/useVideoASR'
+import { useSystemAudioASR } from '../hooks/useSystemAudioASR'
+import { useMicASR } from '../hooks/useMicASR'
 import { useFullTranscript } from '../hooks/useFullTranscript'
 import { getVideoUrl } from '../lib/api'
 import { QueryInput } from '../components/QueryInput'
@ -10,15 +12,20 @@ import { ExtractedQuestionsDisplay } from '../components/ExtractedQuestionsDispl
 import { ResponsePanel } from '../components/ResponsePanel'
 import { VideoUpload } from '../components/VideoUpload'
 import { VideoPlayer } from '../components/VideoPlayer'
+import { SourceSelector } from '../components/SourceSelector'
+import { SystemAudioCapture } from '../components/SystemAudioCapture'
+import { MicCapture } from '../components/MicCapture'
+import type { SourceType } from '../types'

 export const LTTPage: React.FC = () => {
+  const [source, setSource] = useState<SourceType>('upload')
  const [currentVideoId, setCurrentVideoId] = useState<string | null>(null)
  const [queryText, setQueryText] = useState('')
  const [videoEl, setVideoEl] = useState<HTMLVideoElement | null>(null)

  const queryStream = useQueryDocumentStream()

-  const asr = useVideoASR({
+  const uploadASR = useVideoASR({
    videoId: currentVideoId ?? '',
    videoElement: videoEl,
    language: 'yue',
@ -29,6 +36,24 @@ export const LTTPage: React.FC = () => {

  const ft = useFullTranscript({ videoId: currentVideoId ?? '' })

+  const systemAudioWsUrl = useMemo(() => {
+    const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'
+    const host = import.meta.env.VITE_WS_HOST ?? window.location.host
+    return `${protocol}//${host}/ws/asr/${crypto.randomUUID()}?language=yue&source=system-audio`
+  }, [])
+  const micWsUrl = useMemo(() => {
+    const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'
+    const host = import.meta.env.VITE_WS_HOST ?? window.location.host
+    return `${protocol}//${host}/ws/asr/${crypto.randomUUID()}?language=yue&source=mic`
+  }, [])
+
+  const systemAudioASR = useSystemAudioASR({ wsUrl: systemAudioWsUrl })
+  const micASR = useMicASR({ wsUrl: micWsUrl })
+
+  const asr = source === 'system-audio' ? systemAudioASR
+    : source === 'mic' ? micASR
+    : uploadASR
+
  useEffect(() => {
    if (ft.fullTranscript) {
      setQueryText(ft.fullTranscript)
@ -58,6 +83,9 @@ export const LTTPage: React.FC = () => {

  return (
    <div className="h-full bg-gray-50">
+      <div className="px-4 pt-3">
+        <SourceSelector activeSource={source} onSelect={setSource} />
+      </div>
      <Group
        orientation="vertical"
        id="ltt-main-group"
@ -69,42 +97,58 @@ export const LTTPage: React.FC = () => {
            <Group orientation="horizontal" id="ltt-upper-group" className="h-full">
              <Panel id="ltt-upper-left" minSize="30%" defaultSize={50}>
                <div className="h-full p-4 overflow-hidden flex flex-col gap-3">
-                  {currentVideoId ? (
-                    <>
-                      <VideoPlayer ref={setVideoEl} src={videoUrl} />
-                      <button
-                        onClick={handleRequestFullTranscript}
-                        disabled={ft.isLoading}
-                        className="shrink-0 flex items-center justify-center gap-2 px-4 py-2 bg-gray-100 hover:bg-gray-200 text-gray-700 font-medium rounded-lg transition-colors duration-200 disabled:opacity-50 disabled:cursor-not-allowed"
-                      >
-                        {ft.isLoading ? (
-                          <Loader2 className="w-4 h-4 animate-spin" />
-                        ) : (
-                          <FileText className="w-4 h-4" />
+                  {source === 'upload' ? (
+                    currentVideoId ? (
+                      <>
+                        <VideoPlayer ref={setVideoEl} src={videoUrl} />
+                        <button
+                          onClick={handleRequestFullTranscript}
+                          disabled={ft.isLoading}
+                          className="shrink-0 flex items-center justify-center gap-2 px-4 py-2 bg-gray-100 hover:bg-gray-200 text-gray-700 font-medium rounded-lg transition-colors duration-200 disabled:opacity-50 disabled:cursor-not-allowed"
+                        >
+                          {ft.isLoading ? (
+                            <Loader2 className="w-4 h-4 animate-spin" />
+                          ) : (
+                            <FileText className="w-4 h-4" />
+                          )}
+                          <span>{ft.isLoading ? 'Transcribing...' : 'Full Transcript'}</span>
+                        </button>
+                        {ft.error && (
+                          <div
+                            data-testid="full-transcript-error"
+                            className="flex items-start gap-2 text-sm text-red-600"
+                          >
+                            <AlertCircle className="w-4 h-4 shrink-0 mt-0.5" />
+                            <span>{ft.error}</span>
+                          </div>
                        )}
-                        <span>{ft.isLoading ? 'Transcribing...' : 'Full Transcript'}</span>
-                      </button>
-                      {ft.error && (
-                        <div
-                          data-testid="full-transcript-error"
-                          className="flex items-start gap-2 text-sm text-red-600"
-                        >
-                          <AlertCircle className="w-4 h-4 shrink-0 mt-0.5" />
-                          <span>{ft.error}</span>
-                        </div>
-                      )}
-                      {asr.status === 'error' && (
-                        <div
-                          data-testid="asr-error-indicator"
-                          className="flex items-center gap-2 text-xs text-red-600 bg-red-50 border border-red-200 rounded px-2 py-1"
-                        >
-                          <AlertCircle className="w-3 h-3" />
-                          <span>ASR error</span>
-                        </div>
-                      )}
-                    </>
+                        {uploadASR.status === 'error' && (
+                          <div
+                            data-testid="asr-error-indicator"
+                            className="flex items-center gap-2 text-xs text-red-600 bg-red-50 border border-red-200 rounded px-2 py-1"
+                          >
+                            <AlertCircle className="w-3 h-3" />
+                            <span>ASR error</span>
+                          </div>
+                        )}
+                      </>
+                    ) : (
+                      <VideoUpload onUploadSuccess={handleUploadSuccess} />
+                    )
+                  ) : source === 'system-audio' ? (
+                    <SystemAudioCapture
+                      status={systemAudioASR.status}
+                      error={systemAudioASR.error}
+                      onStart={systemAudioASR.startCapture}
+                      onStop={systemAudioASR.stopCapture}
+                    />
                  ) : (
-                    <VideoUpload onUploadSuccess={handleUploadSuccess} />
+                    <MicCapture
+                      status={micASR.status}
+                      error={micASR.error}
+                      onStart={micASR.startListening}
+                      onStop={micASR.stopListening}
+                    />
                  )}
                </div>
              </Panel>
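The two `useMemo` blocks in `LTTPage` that build `systemAudioWsUrl` and `micWsUrl` differ only in the `source` query parameter. A minimal sketch of a shared builder, written as a pure function so the scheme selection can be tested outside the browser. `buildAsrWsUrl` and `AsrWsUrlOptions` are hypothetical; the path and query layout (`/ws/asr/<session-id>?language=yue&source=...`) are copied from the diff.

```typescript
// Hypothetical options (not in this commit). In LTTPage these values come from
// window.location.protocol, import.meta.env.VITE_WS_HOST ?? window.location.host,
// and crypto.randomUUID().
interface AsrWsUrlOptions {
  pageProtocol: string // "https:" or "http:"
  host: string
  sessionId: string
  source: 'system-audio' | 'mic'
  language?: string
}

function buildAsrWsUrl({ pageProtocol, host, sessionId, source, language = 'yue' }: AsrWsUrlOptions): string {
  // An https page must use wss:, otherwise the browser blocks the socket as mixed content.
  const protocol = pageProtocol === 'https:' ? 'wss:' : 'ws:'
  // URLSearchParams keeps insertion order and handles escaping.
  const params = new URLSearchParams({ language, source })
  return `${protocol}//${host}/ws/asr/${encodeURIComponent(sessionId)}?${params.toString()}`
}
```

In the component each URL would become `useMemo(() => buildAsrWsUrl({ pageProtocol: window.location.protocol, host: import.meta.env.VITE_WS_HOST ?? window.location.host, sessionId: crypto.randomUUID(), source: 'mic' }), [])`. The empty dependency list keeps one session ID per source for the lifetime of the page, as in the original.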
--- a/frontend/src/types/index.ts
+++ b/frontend/src/types/index.ts
@ -196,3 +196,29 @@ export interface VideoUploadResponse {
  size_bytes: number
  url: string
 }
+
+// Phase 4 — Live audio capture types
+
+export type SourceType = 'upload' | 'system-audio' | 'mic'
+
+export type SystemAudioStatus = 'idle' | 'requesting' | 'capturing' | 'stopping' | 'error'
+
+export type MicStatus = 'idle' | 'requesting' | 'listening' | 'stopping' | 'error'
+
+export interface SystemAudioASRState {
+  status: SystemAudioStatus
+  transcript: string
+  partialTranscript: string
+  error: string | null
+  startCapture: () => Promise<void>
+  stopCapture: () => void
+}
+
+export interface MicASRState {
+  status: MicStatus
+  transcript: string
+  partialTranscript: string
+  error: string | null
+  startListening: () => Promise<void>
+  stopListening: () => void
+}
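`SourceType` above is the union that the ternary chain in `LTTPage` (`source === 'system-audio' ? systemAudioASR : source === 'mic' ? micASR : uploadASR`) switches on. That chain silently falls back to `uploadASR` for any value it does not name. A minimal sketch of an alternative that makes the compiler check coverage: index a `Record<SourceType, T>`, so adding a fourth member to `SourceType` fails to type-check until it has an entry. `pickBySource` is a hypothetical helper, not part of this commit.

```typescript
// Copied from types/index.ts.
type SourceType = 'upload' | 'system-audio' | 'mic'

// Hypothetical helper: Record<SourceType, T> requires an entry for every member of the union.
function pickBySource<T>(source: SourceType, bySource: Record<SourceType, T>): T {
  return bySource[source]
}
```

For the `asr` selection the three hook results have different shapes, so the call may need an explicit type argument (for example the union of the three hook return types) rather than relying on inference.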