Bug Fix Plan: Phase 4 Audio Echo/Overlapping

Date: 2026-05-18
Status: Planned
Affected Feature: Phase 4: System Audio Capture & Listen Mic


Symptom

When using "System Audio" or "Listen Mic" capture, the captured audio plays back through the speakers, creating:

  • System Audio: infinite echo loop (captured audio → speakers → recaptured → speakers → ...)
  • Listen Mic: howling feedback loop (mic → speakers → mic → ...)

Root Cause

File: frontend/src/hooks/useMediaStreamASR.ts, lines 118-128

The ScriptProcessorNode.onaudioprocess handler copies the captured PCM data into the output buffer, and the processor is connected directly to audioContext.destination (the system speakers), so everything that is captured is also played back:

// Lines 118-128
processor.onaudioprocess = (e) => {
  const float32Data = e.inputBuffer.getChannelData(0)
  const outputData = e.outputBuffer.getChannelData(0)
  outputData.set(float32Data)              // ← copies captured audio to output
  // ...
  wsRef.current.send(float32Data.buffer)
}

source.connect(processor)
processor.connect(audioContext.destination)  // ← routes output to speakers

Why video ASR is not affected: useVideoASR.ts uses the same pattern, but there it is intentional, because the user needs to hear the video. Only the Phase 4 live capture paths (system audio and mic) should mute their output.

Backend: ws_asr.py is not involved. It is a passthrough proxy to DashScope ASR that returns JSON only; no audio is sent back to the client.

Fix

Single file to modify: frontend/src/hooks/useMediaStreamASR.ts

Approach: Insert a GainNode with gain = 0 between the processor and audioContext.destination. This keeps the processor in the audio graph (ensuring onaudioprocess fires in all browsers) while muting output.

Before:  source → processor → audioContext.destination    ❌
After:   source → processor → zeroGain(0.0) → destination ✅
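
The "After" graph can be sketched as a small wiring helper. This is only an illustration, not the hook's actual code: connectMuted, NodeLike, GainLike and ContextLike are hypothetical names, and the interfaces cover just the Web Audio members used here so the sketch can be type-checked and exercised outside a browser. Real AudioContext and AudioNode objects are expected to fit these shapes.

```typescript
// Minimal structural types (hypothetical) for the Web Audio members used below.
interface NodeLike {
  connect(destination: NodeLike): unknown;
  disconnect(): void;
}

interface GainLike extends NodeLike {
  gain: { value: number };
}

interface ContextLike {
  destination: NodeLike;
  createGain(): GainLike;
}

// Hypothetical helper: wires source → processor → zero gain → destination
// and returns the gain node so the caller can keep it for cleanup.
function connectMuted(
  ctx: ContextLike,
  source: NodeLike,
  processor: NodeLike,
): GainLike {
  const zeroGain = ctx.createGain();
  zeroGain.gain.value = 0; // mute whatever the processor outputs

  source.connect(processor); // captured audio still reaches the processor
  processor.connect(zeroGain); // processor stays part of the audio graph
  zeroGain.connect(ctx.destination); // the destination only receives silence

  return zeroGain;
}
```

In the hook, the returned node would be stored in gainNodeRef.current, matching change 1 in the list that follows.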

Changes

  1. Add gainNodeRef alongside existing refs (~line 31)
  2. Create zero-gain GainNode after processor creation (~line 115)
  3. Replace processor.connect(audioContext.destination) with zero-gain path
  4. Remove outputData.set(float32Data) — unnecessary since output buffer is unused
  5. Clean up gain node in cleanup() and useEffect teardown
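
Step 5 could look roughly like the sketch below. The hook's real cleanup() is not shown in this plan, so releaseNode, Disconnectable and RefLike are hypothetical names; RefLike mirrors the shape of a React ref object (a mutable current field) without importing React.

```typescript
// Hypothetical minimal shapes: anything with disconnect(), and a ref-like holder.
interface Disconnectable {
  disconnect(): void;
}

interface RefLike<T> {
  current: T | null;
}

// Disconnects the node held by the ref (if any) and clears the ref so that
// the next capture session creates a fresh node. Safe to call repeatedly.
function releaseNode<T extends Disconnectable>(ref: RefLike<T>): void {
  const node = ref.current;
  if (node === null) return; // nothing to release
  node.disconnect(); // detach from all downstream nodes
  ref.current = null; // drop the reference
}
```

Under these assumptions, both cleanup() and the useEffect teardown would call releaseNode(gainNodeRef) next to the existing processor and source cleanup, which is what the stop/restart acceptance criterion exercises.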

Diff (conceptual)

  const processor = audioContext.createScriptProcessor(4096, 1, 1)
  processorRef.current = processor

+ const zeroGain = audioContext.createGain()
+ zeroGain.gain.value = 0
+ gainNodeRef.current = zeroGain

  processor.onaudioprocess = (e) => {
    const float32Data = e.inputBuffer.getChannelData(0)
-   const outputData = e.outputBuffer.getChannelData(0)
-   outputData.set(float32Data)
    if (!isStreamingRef.current) return
    if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
    wsRef.current.send(float32Data.buffer)
  }

  source.connect(processor)
- processor.connect(audioContext.destination)
+ processor.connect(zeroGain)
+ zeroGain.connect(audioContext.destination)

Acceptance Criteria

  • System Audio capture: transcript streams normally, no audio playback
  • Listen Mic: transcript streams normally, no feedback loop
  • Video ASR (Upload tab): video audio still plays (regression check)
  • Existing Phase 4 tests pass: pnpm test -- test_phase4
  • Stop/restart capture works (gain node cleaned up properly)
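
Part of the "no audio playback" criteria could be backed by a small automated check, sketched below under stated assumptions. peakAmplitude, outputPeakAfter, BufferLike and ProcessEventLike are hypothetical names, and the event shape mirrors only the two fields the handler reads. The check covers the removal of the output copy, not the zero-gain wiring, and does not replace the manual System Audio and Listen Mic tests.

```typescript
// Hypothetical shapes mirroring the parts of the audio-process event the handler uses.
interface BufferLike {
  getChannelData(channel: number): Float32Array;
}

interface ProcessEventLike {
  inputBuffer: BufferLike;
  outputBuffer: BufferLike;
}

// Returns the largest absolute sample value; 0 means the buffer is silent.
function peakAmplitude(samples: Float32Array): number {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  return peak;
}

// Feeds one fake input block through a handler and reports the peak of
// whatever the handler wrote into a zero-filled stand-in output buffer.
function outputPeakAfter(
  handler: (e: ProcessEventLike) => void,
  input: Float32Array,
): number {
  const output = new Float32Array(input.length);
  const event: ProcessEventLike = {
    inputBuffer: { getChannelData: () => input },
    outputBuffer: { getChannelData: () => output },
  };
  handler(event);
  return peakAmplitude(output);
}
```

A handler that still calls outputData.set(float32Data) yields a non-zero peak for non-silent input, while the fixed handler should yield 0.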

Implementation Tasks

  1. Modify useMediaStreamASR.ts: add zero-gain node, remove output copy, update cleanup
  2. Verify with manual test (System Audio + Listen Mic)
  3. Run existing Phase 4 frontend tests
  4. Commit with message: fix: mute audio output during System Audio and Mic capture to prevent echo