Bug Fix Plan: Phase 4 Audio Echo/Overlapping

Date: 2026-05-18
Status: Completed
Commit: 80af17a (fix: mute audio output during System Audio and Mic capture to prevent echo)
Affected Feature: Phase 4, System Audio Capture & Listen Mic

Symptom

When using "System Audio" or "Listen Mic" capture, the captured audio plays back through the speakers, creating:

System Audio: infinite echo loop (captured audio → speakers → recaptured → speakers → ...)
Listen Mic: howling feedback loop (mic → speakers → mic → ...)

Root Cause

File: frontend/src/hooks/useMediaStreamASR.ts, lines 118–128

The ScriptProcessorNode.onaudioprocess handler copies the captured PCM data into the output buffer, and the processor is connected directly to audioContext.destination (the system speakers):

// Lines 118-128
processor.onaudioprocess = (e) => {
  const float32Data = e.inputBuffer.getChannelData(0)
  const outputData = e.outputBuffer.getChannelData(0)
  outputData.set(float32Data)              // ← copies captured audio to output
  // ...
  wsRef.current.send(float32Data.buffer)
}

source.connect(processor)
processor.connect(audioContext.destination)  // ← routes output to speakers

Why video ASR is not affected: useVideoASR.ts uses the same pattern, but it's intentional — the user needs to hear the video. Only Phase 4 live capture (system audio / mic) should mute output.

Backend: ws_asr.py is clean — passthrough proxy to DashScope ASR, JSON only, no audio sent back.

Fix

Single file to modify: frontend/src/hooks/useMediaStreamASR.ts

Approach: Insert a GainNode with gain = 0 between the processor and audioContext.destination. This keeps the processor in the audio graph (ensuring onaudioprocess fires in all browsers) while muting output.

Before:  source → processor → audioContext.destination    ❌
After:   source → processor → zeroGain(0.0) → destination ✅
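
The routing above can be sketched as a small helper that is easy to exercise without a browser. This is a minimal sketch, not the code in useMediaStreamASR.ts: the helper name connectMuted and the NodeLike, GainLike and ContextLike interfaces are assumptions introduced here for illustration. They mirror only the subset of the Web Audio API that the routing uses.

```typescript
// Hypothetical structural types covering only what the routing needs.
// They are modeled on AudioNode, GainNode and AudioContext.
interface NodeLike {
  connect(target: NodeLike): unknown
  disconnect(): void
}
interface GainLike extends NodeLike {
  gain: { value: number }
}
interface ContextLike {
  destination: NodeLike
  createGain(): GainLike
}

// Hypothetical helper: wires source → processor → zero-gain → destination
// and returns the gain node so the caller can keep it for cleanup.
function connectMuted(
  ctx: ContextLike,
  source: NodeLike,
  processor: NodeLike,
): GainLike {
  const zeroGain = ctx.createGain()
  zeroGain.gain.value = 0 // silences whatever the processor outputs

  source.connect(processor)
  processor.connect(zeroGain)
  zeroGain.connect(ctx.destination) // processor stays part of the rendered graph

  return zeroGain
}
```

Returning the gain node lets the hook store it in gainNodeRef, so the same object can be disconnected when capture stops.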

Changes

Add gainNodeRef alongside existing refs (~line 31)
Create zero-gain GainNode after processor creation (~line 115)
Replace processor.connect(audioContext.destination) with zero-gain path
Remove outputData.set(float32Data) — unnecessary since output buffer is unused
Clean up gain node in cleanup() and useEffect teardown
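
The cleanup step in the list above is not shown in the conceptual diff, so here is one possible shape for it. It is a sketch under stated assumptions: releaseNode and teardownGraph are hypothetical helper names, and Ref stands in for the object returned by React's useRef. The actual cleanup() in the hook may be structured differently.

```typescript
// Hypothetical minimal shapes: anything with disconnect(), held in a ref-like box.
interface Disconnectable {
  disconnect(): void
}
interface Ref<T> {
  current: T | null
}

// Disconnects the node held by the ref (if any) and clears the ref.
// Calling it again is a no-op, so both cleanup() and the useEffect
// teardown can invoke it without double-disconnecting.
function releaseNode(ref: Ref<Disconnectable>): void {
  const node = ref.current
  if (!node) return
  node.disconnect()
  ref.current = null
}

// Releases every node of the capture graph that was stored in a ref.
function teardownGraph(...refs: Ref<Disconnectable>[]): void {
  for (const ref of refs) releaseNode(ref)
}
```

In the hook this would be called as teardownGraph(gainNodeRef, processorRef) from cleanup(), assuming both refs hold the nodes created when capture started.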

Diff (conceptual)

  const processor = audioContext.createScriptProcessor(4096, 1, 1)
  processorRef.current = processor

+ const zeroGain = audioContext.createGain()
+ zeroGain.gain.value = 0
+ gainNodeRef.current = zeroGain

  processor.onaudioprocess = (e) => {
    const float32Data = e.inputBuffer.getChannelData(0)
-   const outputData = e.outputBuffer.getChannelData(0)
-   outputData.set(float32Data)
    if (!isStreamingRef.current) return
    if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
    wsRef.current.send(float32Data.buffer)
  }

  source.connect(processor)
- processor.connect(audioContext.destination)
+ processor.connect(zeroGain)
+ zeroGain.connect(audioContext.destination)

Acceptance Criteria

System Audio capture: transcript streams normally, no audio playback
Listen Mic: transcript streams normally, no feedback loop
Video ASR (Upload tab): video audio still plays (regression check — useVideoASR.ts untouched)
Existing Phase 4 tests pass: pnpm test -- test_phase4 → 76 passed, 0 failed
Stop/restart capture works (gain node cleaned up properly)

Implementation Tasks

Modify useMediaStreamASR.ts: add zero-gain node, remove output copy, update cleanup
Verify with manual test (System Audio + Listen Mic)
Run existing Phase 4 frontend tests → 76 passed
Commit: 80af17a
