fix: text accumulation — stashes are sliding windows, merge via overlap detection

DashScope stashes are ~7-char rolling windows, not cumulative text: each
partial event replaces the previous one, and completed events are rarely
sent. This caused text to jump or be replaced during streaming and to
disappear on pause.

Backend:
- Add _merge_stash(): finds the overlap between the end of the buffer and
  the start of the next stash, then appends only the new characters,
  reconstructing the full utterance from partials
- format_transcription_event returns the raw stash for read_events to merge
- read_events maintains partial_buffer via _merge_stash and clears it on
  completed events
- Guard against empty/whitespace-only stashes

Frontend:
- transcriptRef + onFinalTranscriptRef avoid stale closures in pause handler
- stopStreaming fires onFinalTranscript(currentText) before clearing partial
- Removed blind setPartialTranscript('') that erased text on pause
Tests: 16/16 ws_protocol tests pass, frontend tests unchanged
Plan: Updated phase2_implementation_plan.md to Complete with 11-bug log
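
The overlap merge described above can be replayed on the sample stashes listed in the plan's bug log. This is a minimal standalone sketch for illustration, not the backend code itself: `merge_overlap` and `replay` are hypothetical names, and the space-joined fallback is assumed from the "no overlap" case described in the plan.

```python
from functools import reduce


def merge_overlap(buffer: str, stash: str) -> str:
    """Append only the part of `stash` that does not already end `buffer`."""
    if not stash.strip():
        return buffer  # ignore empty or whitespace-only stashes
    if not buffer:
        return stash
    # Try the longest possible overlap first, then shrink.
    for size in range(min(len(buffer), len(stash)), 0, -1):
        if buffer.endswith(stash[:size]):
            return buffer + stash[size:]
    return buffer + " " + stash  # nothing shared: treat as a new segment


def replay(stashes: list[str]) -> str:
    """Fold a sequence of sliding-window stashes into one utterance."""
    return reduce(merge_overlap, stashes, "")


if __name__ == "__main__":
    windows = ["系多謝主席", "主席咁咧呢個", "呢個古洞北", "三百二十五億"]
    print(replay(windows))
```

Because the match is purely textual, a coincidental overlap is merged as well: `merge_overlap("abc", "cat")` yields `"abcat"`, so a character repeated across a window boundary can be collapsed.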
parent fcb9ec1f6c
commit cb0ac07786
@@ -1,8 +1,8 @@
 # Phase 2: Video Upload + Video Audio ASR → RAG — Implementation Plan
 
 **Created:** 2026-05-06
-**Updated:** 2026-05-06 (video audio capture via createMediaElementSource; Full Transcript batch mode)
-**Status:** Planning — Not Started
+**Updated:** 2026-05-06 (all sub-phases complete; 11 bugs resolved)
+**Status:** Complete
 **Depends on:** Phase 1 (Complete)
 
 ---
@@ -311,7 +311,39 @@ frontend/src/pages/LTTPage.tsx, components/QueryInput.tsx, lib/api.ts, lib/queri
 
 ---
 
-## 9. Reference Code (`.examples/`)
+## 10. Bugs Resolved (11 Total)
+
+| # | Bug | Root Cause | Fix |
+|---|-----|-----------|-----|
+| 1 | Vite proxy missing | `/api` and `/ws` routes not proxied to backend port 8000 | Added proxy config in `vite.config.ts` |
+| 2 | `crossOrigin` missing on `<video>` | Cross-origin video blocked `createMediaElementSource` | Added `crossOrigin="anonymous"` to `VideoPlayer.tsx` |
+| 3 | Audio silent (consumed but not passed through) | `onaudioprocess` didn't copy input to output | Added `outputData.set(float32Data)` |
+| 4 | Audio graph recreated every play/pause | `createMediaElementSource` can only be called once per element | Setup audio graph once per video lifetime (`graphSetupRef`) |
+| 5 | `AudioContext` suspended on play | Autoplay policy requires user gesture | Added `audioContext.resume()` on play |
+| 6 | `dashscope` package not installed | Missing from `requirements.txt` | Installed `dashscope>=0.4.0` |
+| 7 | `api_key` not passed to DashScope | `OmniRealtimeConversation` needs explicit `api_key` | Added `api_key=settings.dashscope_api_key` |
+| 8 | `data_url` MIME type wrong | `data:audio/wav;base64` instead of `data:audio/wav;base64,` | Fixed data URL format in `transcribe_full()` |
+| 9 | `extra_body` sent when `language="auto"` | DashScope rejects `extra_body` with auto-detect | Omitted `extra_body` entirely when `language="auto"` |
+| 10 | Text accumulation broken — stashes are sliding windows | DashScope `stash` is a rolling ~7-char window, not cumulative. Each partial event replaces the previous. Completed events rarely sent. Old code showed partials as-is (text jumped). Simplification silently dropped partials (nothing appeared). | Added `_merge_stash()` — finds overlapping suffix between successive stashes and appends only the new chars. Partial events now contribute to a growing `partial_buffer`. `full_text = accumulated + partial_buffer`. |
+| 11 | Text disappeared on pause | `stopStreaming` set `partialTranscript = ''` and `onFinalTranscript` never fired (no completed events). | Removed blind clear; added `transcriptRef` to avoid stale closures; `stopStreaming` now fires `onFinalTranscript(currentText)` then clears partial. |
+
+### Text Accumulation Architecture (Bug #10 Detail)
+
+```
+Raw DashScope stashes (sliding window, ~7 chars each):
+"系多謝主席" → "主席咁咧呢個" → "呢個古洞北" → "三百二十五億"
+
+_merge_stash() reconstruction:
+"" + "系多謝主席" → "系多謝主席"
+"系多謝主席" + overlap("主席")="咁咧呢個" → "系多謝主席咁咧呢個"
+"系多謝主席咁咧呢個" + overlap("呢個")="古洞北" → "系多謝主席咁咧呢個古洞北"
+"系多謝主席咁咧呢個古洞北" + no overlap → append with space → "系多謝主席咁咧呢個古洞北 三百二十五億"
+
+Sent to frontend: {"full_text": accumulated + partial_buffer, "is_final": false}
+On pause: onFinalTranscript fires with current text, partialTranscript cleared
+```
+
+## 11. Reference Code (`.examples/`)
 
 | File | Content |
 |---|---|
@@ -48,15 +48,25 @@ class DashScopeCallback(OmniRealtimeCallback):
         logger.info("dashscope-connection-closed code=%s msg=%s", code, msg)
 
 
+def _merge_stash(partial_buffer: str, new_stash: str) -> str:
+    if not new_stash.strip():
+        return partial_buffer
+    if not partial_buffer:
+        return new_stash
+    for i in range(min(len(partial_buffer), len(new_stash)), 0, -1):
+        if partial_buffer[-i:] == new_stash[:i]:
+            return partial_buffer + new_stash[i:]
+    return partial_buffer + " " + new_stash
+
+
 def format_transcription_event(event: dict, accumulated: str) -> dict | None:
     event_type = event.get("type", "")
 
     if event_type == "conversation.item.input_audio_transcription.text":
         stash = event.get("stash", "")
-        display = build_display_text(accumulated, stash) if stash else accumulated
         return {
             "delta": "",
-            "full_text": _to_traditional(display),
+            "stash": stash,
             "language": event.get("language", "yue"),
             "is_final": False,
         }
@@ -104,26 +114,29 @@ async def _ws_proxy_dashscope(client_ws: WebSocket, loop: asyncio.AbstractEventL
     logger.info("dashscope-session-updated lang=%s", language)
 
     accumulated_text = ""
+    partial_buffer = ""
     chunk_count = 0
 
     async def read_events():
-        nonlocal accumulated_text
+        nonlocal accumulated_text, partial_buffer
         while True:
             event = await event_queue.get()
             result = format_transcription_event(event, accumulated_text)
-            if result is not None:
+            if result is None:
+                continue
             if result["is_final"]:
-                    event_type = event.get("type", "")
-                    if event_type == "conversation.item.input_audio_transcription.completed":
                 transcript = event.get("transcript", "")
                 if transcript and transcript.strip():
                     accumulated_text = build_display_text(accumulated_text, transcript)
-                            logger.info(
-                                "dashscope-utterance-completed text_len=%d lang=%s",
-                                len(accumulated_text),
-                                result.get("language", "yue"),
-                            )
+                    partial_buffer = ""
                     result["full_text"] = _to_traditional(accumulated_text)
+                    logger.info("dashscope-utterance-completed text_len=%d lang=%s", len(accumulated_text), result.get("language", "yue"))
+            else:
+                stash = result.pop("stash", "")
+                if stash.strip():
+                    partial_buffer = _merge_stash(partial_buffer, stash)
+                display = build_display_text(accumulated_text, partial_buffer)
+                result["full_text"] = _to_traditional(display)
             await client_ws.send_json(result)
 
     read_task = asyncio.create_task(read_events())
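
The two-buffer bookkeeping in `read_events` can be pictured as a small state holder: completed transcripts extend the committed text and reset the partial buffer, while partial stashes only grow the buffer. This is a simplified sketch under stated assumptions, not the router's code. `TranscriptState`, `join_text`, and `merge_overlap` are hypothetical stand-ins, and because `build_display_text` and `_to_traditional` are not shown in this diff, plain concatenation is assumed and script conversion is omitted.

```python
from dataclasses import dataclass


def merge_overlap(buffer: str, stash: str) -> str:
    """Suffix/prefix overlap merge (same idea as the diff's merge helper)."""
    if not stash.strip():
        return buffer
    if not buffer:
        return stash
    for size in range(min(len(buffer), len(stash)), 0, -1):
        if buffer.endswith(stash[:size]):
            return buffer + stash[size:]
    return buffer + " " + stash


def join_text(committed: str, tail: str) -> str:
    """Assumed join rule: plain concatenation (hypothetical)."""
    return committed + tail


@dataclass
class TranscriptState:
    committed: str = ""  # text taken from completed events
    partial: str = ""    # utterance rebuilt from sliding-window stashes

    def on_partial(self, stash: str) -> dict:
        # Empty or whitespace-only stashes leave the buffer untouched.
        if stash.strip():
            self.partial = merge_overlap(self.partial, stash)
        return {
            "full_text": join_text(self.committed, self.partial),
            "is_final": False,
        }

    def on_completed(self, transcript: str) -> dict:
        if transcript.strip():
            self.committed = join_text(self.committed, transcript)
            self.partial = ""  # the completed transcript supersedes partials
        return {"full_text": self.committed, "is_final": True}
```

Every partial message still carries a `full_text`, so a client that only ever receives partial events (the common case described in the commit message) always has the complete text seen so far.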
@@ -182,5 +195,9 @@ async def ws_asr_endpoint(websocket: WebSocket, video_id: str, language: str = "
         await _ws_proxy_dashscope(websocket, loop, language)
     except Exception as e:
         logger.error("ws-asr-error video_id=%s error=%s", video_id, e)
+        try:
+            await websocket.send_json({"error": "ASR service unavailable", "detail": str(e)})
+        except Exception:
+            pass
     finally:
         logger.info("ws-disconnect video_id=%s", video_id)
@@ -75,9 +75,47 @@ class TestDashScopeCallback:
         loop.close()
 
 
+class TestMergeStash:
+    def test_merge_empty_buffer_returns_stash(self):
+        from app.routers.ws_asr import _merge_stash
+
+        assert _merge_stash("", "你好") == "你好"
+
+    def test_merge_overlapping_suffix(self):
+        from app.routers.ws_asr import _merge_stash
+
+        assert _merge_stash("系多謝主席", "主席咁咧呢個") == "系多謝主席咁咧呢個"
+
+    def test_merge_overlapping_single_char(self):
+        from app.routers.ws_asr import _merge_stash
+
+        assert _merge_stash("abcde", "efgh") == "abcdefgh"
+
+    def test_merge_no_overlap_appends_with_space(self):
+        from app.routers.ws_asr import _merge_stash
+
+        assert _merge_stash("你好", "世界") == "你好 世界"
+
+    def test_merge_stash_subset_of_buffer(self):
+        from app.routers.ws_asr import _merge_stash
+
+        assert _merge_stash("系多謝主席咁咧", "咧呢") == "系多謝主席咁咧呢"
+
+    def test_merge_empty_stash_preserves_buffer(self):
+        from app.routers.ws_asr import _merge_stash
+
+        assert _merge_stash("你好", "") == "你好"
+        assert _merge_stash("", "") == ""
+
+    def test_merge_whitespace_only_stash_preserves_buffer(self):
+        from app.routers.ws_asr import _merge_stash
+
+        assert _merge_stash("你好", " ") == "你好"
+
+
 class TestProxyFormatsTranscriptionTextEvent:
-    def test_partial_event_format(self):
-        """Partial transcription event should format as ASRTranscriptEvent with is_final=False."""
+    def test_partial_event_returns_stash_field(self):
+        """Partial event returns raw stash for caller to merge."""
         from app.routers.ws_asr import format_transcription_event
 
         event = {
@@ -85,17 +123,16 @@ class TestProxyFormatsTranscriptionTextEvent:
             "stash": "你好",
             "language": "yue",
         }
-        accumulated = ""
 
-        result = format_transcription_event(event, accumulated)
+        result = format_transcription_event(event, "")
         assert result is not None
         assert result["is_final"] is False
         assert result["language"] == "yue"
         assert result["delta"] == ""
-        assert "你好" in result["full_text"]
+        assert result["stash"] == "你好"
 
-    def test_partial_with_accumulated(self):
-        """Partial event should combine accumulated + current stash."""
+    def test_partial_event_ignores_accumulated(self):
+        """Partial event returns stash unchanged regardless of accumulated."""
         from app.routers.ws_asr import format_transcription_event
 
         event = {
@@ -103,11 +140,9 @@ class TestProxyFormatsTranscriptionTextEvent:
             "stash": "世界",
             "language": "yue",
         }
-        accumulated = "你好"
 
-        result = format_transcription_event(event, accumulated)
-        assert "你好" in result["full_text"]
-        assert "世界" in result["full_text"]
+        result = format_transcription_event(event, "你好")
+        assert result["stash"] == "世界"
 
 
 class TestProxyFormatsTranscriptionCompletedEvent:
@@ -120,16 +155,15 @@ class TestProxyFormatsTranscriptionCompletedEvent:
             "transcript": "你好世界",
             "language": "yue",
         }
-        accumulated = ""
 
-        result = format_transcription_event(event, accumulated)
+        result = format_transcription_event(event, "")
         assert result is not None
         assert result["is_final"] is True
         assert result["language"] == "yue"
         assert "你好" in result["full_text"]
 
     def test_completed_updates_accumulated(self):
-        """Completed event should return updated accumulated text."""
+        """Completed event appends transcript to accumulated text."""
         from app.routers.ws_asr import format_transcription_event
 
         event = {
@@ -137,9 +171,8 @@ class TestProxyFormatsTranscriptionCompletedEvent:
             "transcript": "世界",
             "language": "yue",
         }
-        accumulated = "你好"
 
-        result = format_transcription_event(event, accumulated)
+        result = format_transcription_event(event, "你好")
         assert "你好" in result["full_text"]
         assert "世界" in result["full_text"]
 
@@ -147,6 +180,5 @@ class TestProxyFormatsTranscriptionCompletedEvent:
         """Unknown event types should return None."""
         from app.routers.ws_asr import format_transcription_event
 
-        event = {"type": "unknown.event"}
-        result = format_transcription_event(event, "")
+        result = format_transcription_event({"type": "unknown.event"}, "")
         assert result is None
@@ -53,6 +53,7 @@ export const VideoPlayer = forwardRef<HTMLVideoElement, VideoPlayerProps>(({ src
         data-testid="video-player"
         src={src}
         controls
+        crossOrigin="anonymous"
         className="w-full max-h-60 rounded-lg bg-black"
         onLoadStart={handleLoadStart}
         onCanPlay={handleCanPlay}
@@ -24,6 +24,10 @@ export function useVideoASR({
   const processorRef = useRef<ScriptProcessorNode | null>(null)
   const sourceRef = useRef<MediaElementAudioSourceNode | null>(null)
   const isStreamingRef = useRef(false)
+  const graphSetupRef = useRef(false)
+  const transcriptRef = useRef('')
+  const onFinalTranscriptRef = useRef(onFinalTranscript)
+  onFinalTranscriptRef.current = onFinalTranscript
 
   const getWSURL = useCallback(() => {
     const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'
@@ -33,20 +37,7 @@ export function useVideoASR({
     return `${protocol}//${backendHost}/ws/asr/${videoId}${langParam}`
   }, [videoId, language])
 
-  const startStreaming = useCallback(() => {
-    if (!videoElement) return
-    try {
-      setStatus('connecting')
-
-      const audioContext = new AudioContext({ sampleRate: 16000 })
-      audioContextRef.current = audioContext
-
-      const source = audioContext.createMediaElementSource(videoElement)
-      sourceRef.current = source
-
-      const processor = audioContext.createScriptProcessor(4096, 1, 1)
-      processorRef.current = processor
-
+  const connectWebSocket = useCallback(() => {
     const ws = new WebSocket(getWSURL())
     wsRef.current = ws
 
@@ -58,48 +49,83 @@ export function useVideoASR({
 
     ws.onmessage = (e) => {
       const msg: ASRMessage = JSON.parse(e.data)
+      transcriptRef.current = msg.full_text
       setTranscript(msg.full_text)
       setPartialTranscript(msg.is_final ? '' : msg.full_text)
       if (msg.is_final && msg.full_text.trim()) {
-        onFinalTranscript?.(msg.full_text)
+        onFinalTranscriptRef.current?.(msg.full_text)
       }
     }
 
-    ws.onerror = () => setStatus('error')
+    ws.onerror = (e) => {
+      console.error('[useVideoASR] WebSocket error:', e)
+      setStatus('error')
+    }
     ws.onclose = () => {
       isStreamingRef.current = false
       setIsStreaming(false)
       setStatus('disconnected')
     }
+  }, [getWSURL])
+
+  const closeWebSocket = useCallback(() => {
+    wsRef.current?.close()
+    wsRef.current = null
+  }, [])
+
+  const startStreaming = useCallback(() => {
+    if (!videoElement) return
+    try {
+      setStatus('connecting')
+      audioContextRef.current?.resume()
+      closeWebSocket()
+      connectWebSocket()
+    } catch (err) {
+      console.error('[useVideoASR] startStreaming failed:', err)
+      setStatus('error')
+    }
+  }, [videoElement, closeWebSocket, connectWebSocket])
+
+  const stopStreaming = useCallback(() => {
+    isStreamingRef.current = false
+    setIsStreaming(false)
+    closeWebSocket()
+    setStatus('idle')
+    const currentText = transcriptRef.current.trim()
+    if (currentText) {
+      onFinalTranscriptRef.current?.(currentText)
+      setPartialTranscript('')
+    }
+  }, [closeWebSocket])
+
+  useEffect(() => {
+    if (!videoElement || graphSetupRef.current) return
+    try {
+      const audioContext = new AudioContext({ sampleRate: 16000 })
+      audioContextRef.current = audioContext
+
+      const source = audioContext.createMediaElementSource(videoElement)
+      sourceRef.current = source
+
+      const processor = audioContext.createScriptProcessor(4096, 1, 1)
+      processorRef.current = processor
 
       processor.onaudioprocess = (e) => {
+        const float32Data = e.inputBuffer.getChannelData(0)
+        const outputData = e.outputBuffer.getChannelData(0)
+        outputData.set(float32Data)
         if (!isStreamingRef.current) return
         if (!wsRef.current || wsRef.current.readyState !== WebSocket.OPEN) return
-        const float32Data = e.inputBuffer.getChannelData(0)
         wsRef.current.send(float32Data.buffer)
       }
 
       source.connect(processor)
       processor.connect(audioContext.destination)
-    } catch {
-      setStatus('error')
+      graphSetupRef.current = true
+    } catch (err) {
+      console.error('[useVideoASR] audio graph setup failed:', err)
     }
-  }, [videoElement, getWSURL, onFinalTranscript])
+  }, [videoElement])
 
-  const stopStreaming = useCallback(() => {
-    isStreamingRef.current = false
-    setIsStreaming(false)
-    processorRef.current?.disconnect()
-    processorRef.current = null
-    sourceRef.current?.disconnect()
-    sourceRef.current = null
-    wsRef.current?.close()
-    wsRef.current = null
-    audioContextRef.current?.close()
-    audioContextRef.current = null
-    setStatus('idle')
-    setPartialTranscript('')
-  }, [])
-
   useEffect(() => {
     return () => {