Bug Fix Plan: Phase 4 Audio Echo/Overlapping
Date: 2026-05-18
Status: Completed
Commit: 80af17a — fix: mute audio output during System Audio and Mic capture to prevent echo
Affected Feature: Phase 4 — System Audio Capture & Listen Mic
Symptom
When using "System Audio" or "Listen Mic" capture, the captured audio plays back through the speakers, creating:
- System Audio: infinite echo loop (captured audio → speakers → recaptured → speakers → ...)
- Listen Mic: howling feedback loop (mic → speakers → mic → ...)
Root Cause
File: frontend/src/hooks/useMediaStreamASR.ts, lines 118–128
The ScriptProcessorNode.onaudioprocess handler copies captured PCM data to the output buffer, and the processor is connected directly to audioContext.destination (the system speakers):
// Lines 118-128
processor.onaudioprocess = (e) => {
  const float32Data = e.inputBuffer.getChannelData(0)
  const outputData = e.outputBuffer.getChannelData(0)
  outputData.set(float32Data) // ← copies captured audio to output
  // ...
  wsRef.current.send(float32Data.buffer)
}
source.connect(processor)
processor.connect(audioContext.destination) // ← routes output to speakers
Why video ASR is not affected: useVideoASR.ts uses the same pattern, but it's intentional — the user needs to hear the video. Only Phase 4 live capture (system audio / mic) should mute output.
Backend: ws_asr.py is clean — passthrough proxy to DashScope ASR, JSON only, no audio sent back.
Fix
Single file to modify: frontend/src/hooks/useMediaStreamASR.ts
Approach: Insert a GainNode with gain = 0 between the processor and audioContext.destination. This keeps the processor in the audio graph (ensuring onaudioprocess fires in all browsers) while muting output.
Before: source → processor → audioContext.destination ❌
After: source → processor → zeroGain(0.0) → destination ✅
Changes
- Add gainNodeRef alongside existing refs (~line 31)
- Create zero-gain GainNode after processor creation (~line 115)
- Replace processor.connect(audioContext.destination) with zero-gain path
- Remove outputData.set(float32Data), which is unnecessary because the output buffer is unused
- Clean up gain node in cleanup() and useEffect teardown
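The cleanup step in the list above could look like the following. This is a minimal sketch, not the code from the commit: releaseNode and the RefLike / Disconnectable shapes are hypothetical names introduced here, chosen to mirror a React mutable ref holding a GainNode.

```typescript
// Structural stand-ins so the sketch runs outside a browser.
// A GainNode has a compatible disconnect() method; a React ref has `current`.
interface Disconnectable {
  disconnect(): void
}

interface RefLike<T> {
  current: T | null
}

// Hypothetical helper (assumption, not part of the original hook):
// disconnects the node held in the ref and clears the ref.
// Returns true if a node was released, false if the ref was already empty.
function releaseNode<T extends Disconnectable>(ref: RefLike<T>): boolean {
  const node = ref.current
  if (node === null) return false
  node.disconnect() // detach from every downstream node, including destination
  ref.current = null // drop the reference so a restart creates a fresh node
  return true
}
```

With this shape, calling releaseNode(gainNodeRef) from both cleanup() and the useEffect teardown is harmless, because the second call finds an empty ref and does nothing.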
Diff (conceptual)
  const processor = audioContext.createScriptProcessor(4096, 1, 1)
  processorRef.current = processor
+ const zeroGain = audioContext.createGain()
+ zeroGain.gain.value = 0
+ gainNodeRef.current = zeroGain
  processor.onaudioprocess = (e) => {
    const float32Data = e.inputBuffer.getChannelData(0)
-   const outputData = e.outputBuffer.getChannelData(0)
-   outputData.set(float32Data)
    if (!isStreamingRef.current) return
    if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
    wsRef.current.send(float32Data.buffer)
  }
  source.connect(processor)
- processor.connect(audioContext.destination)
+ processor.connect(zeroGain)
+ zeroGain.connect(audioContext.destination)
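The wiring in the diff above can also be pulled into a small helper so that the routing is checkable without a browser. The sketch below is an assumption, not the committed implementation: connectMuted and the NodeLike / GainLike / ContextLike interfaces are hypothetical names that describe only the members of the Web Audio objects used here.

```typescript
// Structural stand-ins for the Web Audio types used by the fix.
// They list only the members this sketch touches.
interface NodeLike {
  connect(target: NodeLike): unknown
}

interface GainLike extends NodeLike {
  gain: { value: number }
}

interface ContextLike {
  destination: NodeLike
  createGain(): GainLike
}

// Hypothetical helper (assumption): builds
//   source → processor → zeroGain(0.0) → destination
// and returns the gain node so the caller can keep it in gainNodeRef.
function connectMuted(
  ctx: ContextLike,
  source: NodeLike,
  processor: NodeLike
): GainLike {
  const zeroGain = ctx.createGain()
  zeroGain.gain.value = 0 // mute whatever the processor writes to its output

  source.connect(processor)
  processor.connect(zeroGain)
  zeroGain.connect(ctx.destination) // processor stays in a graph that reaches destination
  return zeroGain
}
```

In the hook this would be called as gainNodeRef.current = connectMuted(audioContext, source, processor). A unit test can pass fake nodes that record each connect call and assert that the processor never connects to the destination directly.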
Acceptance Criteria
- System Audio capture: transcript streams normally, no audio playback
- Listen Mic: transcript streams normally, no feedback loop
- Video ASR (Upload tab): video audio still plays (regression check; useVideoASR.ts untouched)
- Existing Phase 4 tests pass: pnpm test -- test_phase4 → 76 passed, 0 failed
- Stop/restart capture works (gain node cleaned up properly)
Implementation Tasks
- Modify useMediaStreamASR.ts: add zero-gain node, remove output copy, update cleanup
- Verify with manual test (System Audio + Listen Mic)
- Run existing Phase 4 frontend tests → 76 passed
- Commit: 80af17a