# Phase 2: Video Upload + Video Audio ASR → RAG — Implementation Plan

**Created:** 2026-05-06
**Updated:** 2026-05-06 (video audio capture via createMediaElementSource; Full Transcript batch mode)
**Status:** Planning — Not Started
**Depends on:** Phase 1 (Complete)

---

## 1. Overview

Phase 2 adds video upload/playback and ASR transcription of the **video's audio track** (not the microphone). When the video plays, the browser captures the video's audio output and streams it to Alibaba Cloud DashScope for real-time transcription. A "Full Transcript" button sends the complete video audio, extracted on the backend with ffmpeg, for batch (non-streaming) transcription.

### Two ASR Modes

**Mode A — Streaming (real-time, auto on play):**

```