# Bug Fix Plan: Phase 4 Audio Echo/Overlapping

**Date**: 2026-05-18
**Status**: Planned
**Affected Feature**: Phase 4: System Audio Capture & Listen Mic

---

## Symptom

When using "System Audio" or "Listen Mic" capture, the captured audio plays back through the speakers, creating:

- **System Audio**: infinite echo loop (captured audio → speakers → recaptured → speakers → ...)
- **Listen Mic**: howling feedback loop (mic → speakers → mic → ...)

## Root Cause

**File**: `frontend/src/hooks/useMediaStreamASR.ts`, lines 118–128

The `ScriptProcessorNode.onaudioprocess` handler copies the captured PCM data into the output buffer, and the processor is connected directly to `audioContext.destination` (the system speakers):

```typescript
// Lines 118-128
processor.onaudioprocess = (e) => {
  const float32Data = e.inputBuffer.getChannelData(0)
  const outputData = e.outputBuffer.getChannelData(0)
  outputData.set(float32Data)  // ← copies captured audio to output
  // ...
  wsRef.current.send(float32Data.buffer)
}

source.connect(processor)
processor.connect(audioContext.destination)  // ← routes output to speakers
```

**Why video ASR is not affected**: `useVideoASR.ts` uses the same pattern, but there it is **intentional**, because the user needs to hear the video. Only Phase 4 live capture (system audio / mic) should mute its output.

**Backend**: `ws_asr.py` is clean. It is a passthrough proxy to DashScope ASR that returns JSON only and sends no audio back.

## Fix

**Single file to modify**: `frontend/src/hooks/useMediaStreamASR.ts`

**Approach**: Insert a `GainNode` with `gain = 0` between the processor and `audioContext.destination`. This keeps the processor in the audio graph (ensuring `onaudioprocess` fires in all browsers) while muting the output.

```
Before: source → processor → audioContext.destination  ❌
After:  source → processor → zeroGain(0.0) → destination  ✅
```

### Changes

1. **Add `gainNodeRef`** alongside the existing refs (~line 31)
2. **Create a zero-gain `GainNode`** after processor creation (~line 115)
3. **Replace** `processor.connect(audioContext.destination)` with the zero-gain path
4. **Remove** `outputData.set(float32Data)`, which is unnecessary once the output buffer is unused
5. **Clean up the gain node** in `cleanup()` and in the `useEffect` teardown

### Diff (conceptual)

```diff
  const processor = audioContext.createScriptProcessor(4096, 1, 1)
  processorRef.current = processor

+ const zeroGain = audioContext.createGain()
+ zeroGain.gain.value = 0
+ gainNodeRef.current = zeroGain

  processor.onaudioprocess = (e) => {
    const float32Data = e.inputBuffer.getChannelData(0)
-   const outputData = e.outputBuffer.getChannelData(0)
-   outputData.set(float32Data)
    if (!isStreamingRef.current) return
    if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
    wsRef.current.send(float32Data.buffer)
  }

  source.connect(processor)
- processor.connect(audioContext.destination)
+ processor.connect(zeroGain)
+ zeroGain.connect(audioContext.destination)
```

## Acceptance Criteria

- [ ] System Audio capture: transcript streams normally, **no audio playback**
- [ ] Listen Mic: transcript streams normally, **no feedback loop**
- [ ] Video ASR (Upload tab): video audio **still plays** (regression check)
- [ ] Existing Phase 4 tests pass: `pnpm test -- test_phase4`
- [ ] Stop/restart capture works (gain node cleaned up properly)

## Implementation Tasks

1. Modify `useMediaStreamASR.ts`: add the zero-gain node, remove the output copy, update cleanup
2. Verify with a manual test (System Audio + Listen Mic)
3. Run the existing Phase 4 frontend tests
4. Commit with message: `fix: mute audio output during System Audio and Mic capture to prevent echo`
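
The muted routing and its cleanup from the Fix section could be factored into one small helper, sketched below. This is only an illustration under assumptions: `connectMuted` and the `NodeLike`, `GainLike`, and `ContextLike` types are hypothetical names that do not exist in `useMediaStreamASR.ts`. The structural types let the sketch type-check outside a browser, and the real `AudioContext`, `ScriptProcessorNode`, and `GainNode` are expected to fit these shapes.

```typescript
// Hypothetical structural types (not part of the codebase); they describe
// only the parts of the Web Audio API that the sketch touches.
interface NodeLike {
  connect(target: unknown): unknown
  disconnect(): void
}
interface GainLike extends NodeLike {
  gain: { value: number }
}
interface ContextLike {
  destination: unknown
  createGain(): GainLike
}

// Hypothetical helper: routes `processor` to the destination through a
// zero-gain node and returns a teardown function.
function connectMuted(ctx: ContextLike, processor: NodeLike): () => void {
  const zeroGain = ctx.createGain()
  zeroGain.gain.value = 0            // mute everything that reaches the speakers

  processor.connect(zeroGain)        // processor stays in the audio graph
  zeroGain.connect(ctx.destination)  // graph still terminates at the destination

  let released = false
  return () => {
    if (released) return             // safe to call from both cleanup paths
    released = true
    processor.disconnect()
    zeroGain.disconnect()
  }
}
```

In the hook, the returned function could be stored in a ref and called from both `cleanup()` and the `useEffect` teardown. Because the teardown is guarded by the `released` flag, calling it from both places on a stop/restart cycle would not disconnect the nodes twice.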