From 2d3dc7374d5a609c74e68ebf59c567b1c4033536 Mon Sep 17 00:00:00 2001
From: Woody <woody.ck.tse@gmail.com>
Date: Mon, 18 May 2026 14:47:46 +0800
Subject: [PATCH] docs: Phase 4 audio echo bug fix plan

---
 .plans/debug_2026-05-18_phase4_audio_echo.md | 97 ++++++++++++++++++++
 1 file changed, 97 insertions(+)
 create mode 100644 .plans/debug_2026-05-18_phase4_audio_echo.md

diff --git a/.plans/debug_2026-05-18_phase4_audio_echo.md b/.plans/debug_2026-05-18_phase4_audio_echo.md
new file mode 100644
index 0000000..6a1eb1f
--- /dev/null
+++ b/.plans/debug_2026-05-18_phase4_audio_echo.md
@@ -0,0 +1,97 @@
+# Bug Fix Plan: Phase 4 Audio Echo/Overlapping
+
+**Date**: 2026-05-18
+**Status**: Planned
+**Affected Feature**: Phase 4 — System Audio Capture & Listen Mic
+
+---
+
+## Symptom
+
+When using "System Audio" or "Listen Mic" capture, the captured audio plays back through the speakers, creating:
+
+- **System Audio**: infinite echo loop (captured audio → speakers → recaptured → speakers → ...)
+- **Listen Mic**: howling feedback loop (mic → speakers → mic → ...)
+
+## Root Cause
+
+**File**: `frontend/src/hooks/useMediaStreamASR.ts`, lines 118–128
+
+The `ScriptProcessorNode.onaudioprocess` handler copies captured PCM data to the output buffer, and the processor is connected directly to `audioContext.destination` (the system speakers):
+
+```typescript
+// Lines 118-128
+processor.onaudioprocess = (e) => {
+  const float32Data = e.inputBuffer.getChannelData(0)
+  const outputData = e.outputBuffer.getChannelData(0)
+  outputData.set(float32Data)              // ← copies captured audio to output
+  // ...
+  wsRef.current.send(float32Data.buffer)
+}
+
+source.connect(processor)
+processor.connect(audioContext.destination)  // ← routes output to speakers
+```
+
+**Why video ASR is left unchanged**: `useVideoASR.ts` uses the same pattern, but there it is **intentional** because the user needs to hear the video. Only Phase 4 live capture (system audio / mic) should mute output.
+
+**Backend**: `ws_asr.py` is not involved. It is a passthrough proxy to DashScope ASR that returns JSON only; no audio is sent back.
+
+## Fix
+
+**Single file to modify**: `frontend/src/hooks/useMediaStreamASR.ts`
+
+**Approach**: Insert a `GainNode` with `gain = 0` between the processor and `audioContext.destination`. This keeps the processor connected to the destination, which some browsers require before `onaudioprocess` fires, while muting the output.
+
+```
+Before:  source → processor → audioContext.destination    ❌
+After:   source → processor → zeroGain(0.0) → destination ✅
+```
+
+### Changes
+
+1. **Add `gainNodeRef`** alongside existing refs (~line 31)
+2. **Create zero-gain `GainNode`** after processor creation (~line 115)
+3. **Replace** `processor.connect(audioContext.destination)` with zero-gain path
+4. **Remove** `outputData.set(float32Data)` — unnecessary since output buffer is unused
+5. **Disconnect the gain node** and clear `gainNodeRef` in `cleanup()` and the `useEffect` teardown
+
+### Diff (conceptual)
+
+```diff
+  const processor = audioContext.createScriptProcessor(4096, 1, 1)
+  processorRef.current = processor
+
++ const zeroGain = audioContext.createGain()
++ zeroGain.gain.value = 0
++ gainNodeRef.current = zeroGain
+
+  processor.onaudioprocess = (e) => {
+    const float32Data = e.inputBuffer.getChannelData(0)
+-   const outputData = e.outputBuffer.getChannelData(0)
+-   outputData.set(float32Data)
+    if (!isStreamingRef.current) return
+    if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
+    wsRef.current.send(float32Data.buffer)
+  }
+
+  source.connect(processor)
+- processor.connect(audioContext.destination)
++ processor.connect(zeroGain)
++ zeroGain.connect(audioContext.destination)
+```
+
+## Acceptance Criteria
+
+- [ ] System Audio capture: transcript streams normally, **no audio playback**
+- [ ] Listen Mic: transcript streams normally, **no feedback loop**
+- [ ] Video ASR (Upload tab): video audio **still plays** (regression check)
+- [ ] Existing Phase 4 tests pass: `pnpm test -- test_phase4`
+- [ ] Stop/restart capture works (gain node cleaned up properly)
+
+## Implementation Tasks
+
+1. Modify `useMediaStreamASR.ts`: add zero-gain node, remove output copy, update cleanup
+2. Verify with manual test (System Audio + Listen Mic)
+3. Run existing Phase 4 frontend tests
+4. Commit with message: `fix: mute audio output during System Audio and Mic capture to prevent echo`