From 2d3dc7374d5a609c74e68ebf59c567b1c4033536 Mon Sep 17 00:00:00 2001
From: Woody
Date: Mon, 18 May 2026 14:47:46 +0800
Subject: [PATCH] docs: Phase 4 audio echo bug fix plan

---
 .plans/debug_2026-05-18_phase4_audio_echo.md | 97 ++++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 .plans/debug_2026-05-18_phase4_audio_echo.md

diff --git a/.plans/debug_2026-05-18_phase4_audio_echo.md b/.plans/debug_2026-05-18_phase4_audio_echo.md
new file mode 100644
index 0000000..6a1eb1f
--- /dev/null
+++ b/.plans/debug_2026-05-18_phase4_audio_echo.md
@@ -0,0 +1,97 @@
+# Bug Fix Plan: Phase 4 Audio Echo/Overlapping
+
+**Date**: 2026-05-18
+**Status**: Planned
+**Affected Feature**: Phase 4 (System Audio Capture & Listen Mic)
+
+---
+
+## Symptom
+
+When using "System Audio" or "Listen Mic" capture, the captured audio plays back through the speakers, creating:
+
+- **System Audio**: infinite echo loop (captured audio → speakers → recaptured → speakers → ...)
+- **Listen Mic**: howling feedback loop (mic → speakers → mic → ...)
+
+## Root Cause
+
+**File**: `frontend/src/hooks/useMediaStreamASR.ts`, lines 118–128
+
+The `ScriptProcessorNode.onaudioprocess` handler copies captured PCM data to the output buffer, and the processor is connected directly to `audioContext.destination` (system speakers):
+
+```typescript
+// Lines 118-128
+processor.onaudioprocess = (e) => {
+  const float32Data = e.inputBuffer.getChannelData(0)
+  const outputData = e.outputBuffer.getChannelData(0)
+  outputData.set(float32Data) // ← copies captured audio to output
+  // ...
+  wsRef.current.send(float32Data.buffer)
+}
+
+source.connect(processor)
+processor.connect(audioContext.destination) // ← routes output to speakers
+```
+
+**Why video ASR is not affected**: `useVideoASR.ts` uses the same pattern, but there it is **intentional**: the user needs to hear the video. Only Phase 4 live capture (system audio / mic) should mute output.
+
+**Backend**: `ws_asr.py` is not involved. It is a passthrough proxy to DashScope ASR that returns JSON only and sends no audio back.
+
+## Fix
+
+**Single file to modify**: `frontend/src/hooks/useMediaStreamASR.ts`
+
+**Approach**: Insert a `GainNode` with `gain = 0` between the processor and `audioContext.destination`. This keeps the processor in the audio graph (ensuring `onaudioprocess` fires in all browsers) while muting output.
+
+```
+Before: source → processor → audioContext.destination ❌
+After:  source → processor → zeroGain(0.0) → destination ✅
+```
+
+### Changes
+
+1. **Add `gainNodeRef`** alongside existing refs (~line 31)
+2. **Create zero-gain `GainNode`** after processor creation (~line 115)
+3. **Replace** `processor.connect(audioContext.destination)` with zero-gain path
+4. **Remove** `outputData.set(float32Data)`, which is unnecessary once the output buffer is unused
+5. **Clean up gain node** in `cleanup()` and `useEffect` teardown
+
+### Diff (conceptual)
+
+```diff
+  const processor = audioContext.createScriptProcessor(4096, 1, 1)
+  processorRef.current = processor
+
++ const zeroGain = audioContext.createGain()
++ zeroGain.gain.value = 0
++ gainNodeRef.current = zeroGain
+
+  processor.onaudioprocess = (e) => {
+    const float32Data = e.inputBuffer.getChannelData(0)
+-   const outputData = e.outputBuffer.getChannelData(0)
+-   outputData.set(float32Data)
+    if (!isStreamingRef.current) return
+    if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
+    wsRef.current.send(float32Data.buffer)
+  }
+
+  source.connect(processor)
+- processor.connect(audioContext.destination)
++ processor.connect(zeroGain)
++ zeroGain.connect(audioContext.destination)
+```
+
+## Acceptance Criteria
+
+- [ ] System Audio capture: transcript streams normally, **no audio playback**
+- [ ] Listen Mic: transcript streams normally, **no feedback loop**
+- [ ] Video ASR (Upload tab): video audio **still plays** (regression check)
+- [ ] Existing Phase 4 tests pass: `pnpm test -- test_phase4`
+- [ ] Stop/restart capture works (gain node cleaned up properly)
+
+## Implementation Tasks
+
+1. Modify `useMediaStreamASR.ts`: add zero-gain node, remove output copy, update cleanup
+2. Verify with manual test (System Audio + Listen Mic)
+3. Run existing Phase 4 frontend tests
+4. Commit with message: `fix: mute audio output during System Audio and Mic capture to prevent echo`
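
Reviewer note: the conceptual diff in the plan covers the wiring but not the gain-node cleanup that the change list and the stop/restart criterion call for. The following is a minimal sketch of how wiring and teardown might fit together, not the actual code in `useMediaStreamASR.ts`. The helper `connectMuted` and the interfaces `NodeLike`, `GainLike`, and `ContextLike` are hypothetical names introduced only for illustration; they mirror the small part of the Web Audio API the plan relies on (`connect`, `disconnect`, `createGain`, `gain.value`, `destination`) so the logic can be exercised without a browser.

```typescript
// Hypothetical minimal shapes mirroring the Web Audio calls used in the plan.
interface NodeLike {
  connect(target: NodeLike): unknown
  disconnect(): void
}

interface GainLike extends NodeLike {
  gain: { value: number }
}

interface ContextLike {
  destination: NodeLike
  createGain(): GainLike
}

// Hypothetical helper: wires source → processor → zero gain → destination
// and returns a teardown function that disconnects all three nodes.
function connectMuted(
  ctx: ContextLike,
  source: NodeLike,
  processor: NodeLike,
): () => void {
  const zeroGain = ctx.createGain()
  zeroGain.gain.value = 0 // nothing audible reaches the destination

  source.connect(processor)
  processor.connect(zeroGain)
  zeroGain.connect(ctx.destination)

  let tornDown = false
  return () => {
    // Idempotent, so it can be called from more than one cleanup path.
    if (tornDown) return
    tornDown = true
    source.disconnect()
    processor.disconnect()
    zeroGain.disconnect()
  }
}
```

In the hook, the returned function could be stored in a ref and invoked from both `cleanup()` and the `useEffect` teardown; the guard makes a second call a no-op, which is the property the stop/restart criterion depends on.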