docs: Phase 4 audio echo bug fix plan
Commit 2d3dc7374d (parent d5e7e2d0ca)

# Bug Fix Plan: Phase 4 Audio Echo/Overlapping

**Date**: 2026-05-18
**Status**: Planned
**Affected Feature**: Phase 4 (System Audio Capture & Listen Mic)

---

## Symptom

When using "System Audio" or "Listen Mic" capture, the captured audio plays back through the speakers, creating:

- **System Audio**: infinite echo loop (captured audio → speakers → recaptured → speakers → ...)
- **Listen Mic**: howling feedback loop (mic → speakers → mic → ...)

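To make the loop concrete, the sketch below models the amplitude of one sound after repeated trips around a capture → playback → recapture path. It is an illustrative simplification, not a measurement of the real system: the function name `echoAmplitudes` and the idea of a single fixed `loopGain` are assumptions introduced here.

```typescript
// Illustrative model only: amplitude of a signal on each pass around a
// capture → playback → recapture loop with a fixed loop gain.
function echoAmplitudes(initial: number, loopGain: number, passes: number): number[] {
  const amplitudes: number[] = []
  let current = initial
  for (let i = 0; i < passes; i++) {
    amplitudes.push(current)
    current *= loopGain // each trip around the loop scales the signal
  }
  return amplitudes
}

// Loop gain of 1 (audio replayed at full level): the echo never decays.
const looping = echoAmplitudes(1, 1, 4) // [1, 1, 1, 1]

// Loop gain of 0 (playback muted): the loop is broken after the first pass.
const muted = echoAmplitudes(1, 0, 4) // [1, 0, 0, 0]
```

Under this model, any change that drives the playback leg of the loop to zero removes the echo, regardless of how the audio is captured.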
## Root Cause

**File**: `frontend/src/hooks/useMediaStreamASR.ts`, lines 118–128

The `ScriptProcessorNode.onaudioprocess` handler copies captured PCM data to the output buffer, and the processor is connected directly to `audioContext.destination` (the system speakers):

```typescript
// Lines 118-128
processor.onaudioprocess = (e) => {
  const float32Data = e.inputBuffer.getChannelData(0)
  const outputData = e.outputBuffer.getChannelData(0)
  outputData.set(float32Data)  // ← copies captured audio to output
  // ...
  wsRef.current.send(float32Data.buffer)
}

source.connect(processor)
processor.connect(audioContext.destination)  // ← routes output to speakers
```

**Why video ASR is not affected**: `useVideoASR.ts` uses the same pattern, but there it is **intentional** because the user needs to hear the video. Only Phase 4 live capture (system audio / mic) should mute output.

**Backend**: `ws_asr.py` is clean. It is a passthrough proxy to DashScope ASR that returns JSON only; no audio is sent back.

## Fix

**Single file to modify**: `frontend/src/hooks/useMediaStreamASR.ts`

**Approach**: Insert a `GainNode` with `gain = 0` between the processor and `audioContext.destination`. This keeps the processor in the audio graph (ensuring `onaudioprocess` fires in all browsers) while muting output.

```
Before: source → processor → audioContext.destination  ❌
After:  source → processor → zeroGain(0.0) → destination  ✅
```

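The "After" routing can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the hook's actual code: `connectMuted` and the `ConnectableNode`, `GainLike`, and `ContextLike` interfaces are hypothetical names that mirror only the parts of the Web Audio API used here, so the example can run outside a browser.

```typescript
// Minimal structural types (assumptions) covering just the Web Audio
// members this sketch touches: connect(), gain.value, createGain(), destination.
interface ConnectableNode {
  connect(target: unknown): unknown
}
interface GainLike extends ConnectableNode {
  gain: { value: number }
}
interface ContextLike {
  destination: unknown
  createGain(): GainLike
}

// Hypothetical helper: wires source → processor → zero-gain node → destination.
// Returns the gain node so the caller can keep it in a ref and disconnect it later.
function connectMuted(
  ctx: ContextLike,
  source: ConnectableNode,
  processor: ConnectableNode,
): GainLike {
  const zeroGain = ctx.createGain()
  zeroGain.gain.value = 0 // silence everything that reaches the destination

  source.connect(processor)
  processor.connect(zeroGain) // processor stays in the graph
  zeroGain.connect(ctx.destination)
  return zeroGain
}
```

In the hook this would presumably be called as `gainNodeRef.current = connectMuted(audioContext, source, processor)`. Returning the node rather than storing it inside the helper keeps the wiring free of React-specific state.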
### Changes

1. **Add `gainNodeRef`** alongside existing refs (~line 31)
2. **Create zero-gain `GainNode`** after processor creation (~line 115)
3. **Replace** `processor.connect(audioContext.destination)` with the zero-gain path
4. **Remove** `outputData.set(float32Data)`, which is unnecessary because the output buffer is unused
5. **Clean up gain node** in `cleanup()` and `useEffect` teardown

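Step 5 above can be sketched as a small teardown helper. The plan does not show the body of `cleanup()`, so this is an assumption about its shape: `releaseNodes`, `Disconnectable`, and `Ref` are hypothetical names, and the real hook may release its nodes differently.

```typescript
// Minimal shapes (assumptions) so the sketch runs outside a browser.
// Web Audio nodes expose disconnect(); React refs expose a mutable `current`.
interface Disconnectable {
  disconnect(): void
}
interface Ref<T> {
  current: T | null
}

// Hypothetical teardown helper: disconnects every node that is still set,
// then clears its ref so the next start builds a fresh audio graph.
function releaseNodes(...refs: Ref<Disconnectable>[]): void {
  for (const ref of refs) {
    ref.current?.disconnect()
    ref.current = null
  }
}
```

A call such as `releaseNodes(processorRef, gainNodeRef)` from both `cleanup()` and the `useEffect` teardown would be safe to run twice, because a ref that is already `null` is skipped.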
### Diff (conceptual)

```diff
  const processor = audioContext.createScriptProcessor(4096, 1, 1)
  processorRef.current = processor

+ const zeroGain = audioContext.createGain()
+ zeroGain.gain.value = 0
+ gainNodeRef.current = zeroGain

  processor.onaudioprocess = (e) => {
    const float32Data = e.inputBuffer.getChannelData(0)
-   const outputData = e.outputBuffer.getChannelData(0)
-   outputData.set(float32Data)
    if (!isStreamingRef.current) return
    if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
    wsRef.current.send(float32Data.buffer)
  }

  source.connect(processor)
- processor.connect(audioContext.destination)
+ processor.connect(zeroGain)
+ zeroGain.connect(audioContext.destination)
```

## Acceptance Criteria

- [ ] System Audio capture: transcript streams normally, **no audio playback**
- [ ] Listen Mic: transcript streams normally, **no feedback loop**
- [ ] Video ASR (Upload tab): video audio **still plays** (regression check)
- [ ] Existing Phase 4 tests pass: `pnpm test -- test_phase4`
- [ ] Stop/restart capture works (gain node cleaned up properly)

## Implementation Tasks

1. Modify `useMediaStreamASR.ts`: add zero-gain node, remove output copy, update cleanup
2. Verify with manual test (System Audio + Listen Mic)
3. Run existing Phase 4 frontend tests
4. Commit with message: `fix: mute audio output during System Audio and Mic capture to prevent echo`