# Phase 4: System Audio Capture → ASR → RAG — Implementation Plan

**Created:** 2026-05-09  
**Updated:** 2026-05-09  
**Status:** 📋 Draft (Not Started)  
**Depends on:** Phase 1 (Complete), Phase 2 (Complete), Phase 3 (Complete)

---

## 1. Overview

Phase 4 adds **system audio capture** as a third audio source in the LTTPage, alongside file Upload and YouTube. Instead of playing a video in the browser, the user captures audio output from any application on their computer (browser tab, Spotify, Zoom, system sounds) and pipes it through the existing ASR → RAG pipeline.

**Use cases:**

- Watching a YouTube video in a regular browser tab (no proxy needed — just share that tab's audio)
- Listening to a podcast, lecture, or meeting and getting real-time transcript + RAG
- Transcribing any audio playing on the computer without needing to download files

### How It Works

```
User clicks "System Audio" → clicks "Start Capture"
  → Browser shows permission dialog (screen/tab picker)
  → User selects tab/window/screen (with audio)
  → getDisplayMedia() returns MediaStream (with audio track)
  → AudioContext.createMediaStreamSource(stream)
  → ScriptProcessorNode (Float32 PCM, mono 16kHz)
  → WebSocket → FastAPI → DashScope realtime ASR
  → transcript → QueryInput → RAG Pipeline
```

### Audio Routing (vs Existing Sources)

| Source | Audio Input | SourceNode Type | Start/Stop Trigger |
|--------|-------------|-----------------|-------------------|
| Upload | `