5.8 KiB

Raw Blame History

RAG Video Q&A Web Application - Development Plan

Project Overview
Web-based application built in two phases.

Phase 1: Text question → RAG retrieval → Point-form answer (strictly from database)
Phase 2: Video upload + player → real-time audio streaming → ASR transcription → question extraction → Phase 1 RAG flow

Tech Stack

Backend: Python + FastAPI (REST + WebSocket)
Frontend: TypeScript + React 18 (Vite) + shadcn/ui + Tailwind CSS
Server: Linux Ubuntu 22.04
RAG Database: ChromaDB (persistent)
LLM/ASR Integration: Dynamic via .env (supports local vLLM, OpenRouter, Alibaba Cloud)
- Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
Models:
- Embedding: qwen/qwen3-embedding-4b
- LLM: qwen/qwen3.5-35b-a3b
- ASR: Qwen/Qwen3-ASR-1.7B

Deployment

Development: Simple commands (uvicorn + npm run dev)
Production: Docker + Nginx

Project Structure (Monorepo)

app/ ├── backend/ # FastAPI │ ├── app/ │ │ ├── main.py │ │ ├── routers/ # query.py, ingest.py, video.py, ws_asr.py │ │ ├── services/ # rag.py, llm_client.py, asr_client.py, video_service.py │ │ ├── models/ # Pydantic schemas │ │ ├── core/ # config.py, database.py │ │ └── utils/ # chunking, metadata extraction │ ├── uploads/ # video storage (max 300MB) │ ├── requirements.txt │ └── .env.example ├── frontend/ # React + TypeScript (Vite) │ ├── src/ │ │ ├── components/ │ │ ├── pages/ │ │ ├── lib/ # api.ts │ │ └── App.tsx │ ├── package.json │ └── vite.config.ts ├── chroma_db/ # Persistent vector store ├── Dockerfile ├── docker-compose.yml ├── nginx.conf └── deploy.sh

Key Requirements Incorporated

LLM/ASR Configuration: Backend reads from .env for easy switching between development (OpenRouter / Alibaba Cloud) and production (local vLLM).
RAG Database: ChromaDB with metadata support (filename + extracted content metadata).
Embedding Model: qwen/qwen3-embedding-4b via sentence-transformers.
Document Ingestion: Via UI (project-based demo, no user authentication).
Video: MP4 and common formats, maximum 300MB.
ASR Flow: Both automatic (on transcript updates) and manual “Ask from Video” button.
UI Layout:
- Top-Left: Video player
- Top-Right: Real-time transcript + text input box
- Bottom Half: RAG response (bullet points with source metadata)
Authentication: Public demo (no login required).
Mobile: Not required at this stage.

Phase 1: Text Question → RAG → Point-Form Answer (5-7 days)

Backend (FastAPI)

Dynamic configuration via .env (LLM base URL, API key, model names).
services/rag.py: Persistent ChromaDB + Qwen embedding + metadata extraction (filename, upload date, content summary).
services/llm_client.py: OpenAI-compatible client for Qwen LLM with strict RAG prompt (only use retrieved context).
Endpoints:
- POST /api/v1/ingest – Document upload and ingestion with metadata.
- POST /api/v1/query – Question → retrieve → LLM → bullet-point response.

Frontend (React + TS)

Clean layout: Top-right input box, bottom response area.
Type-safe API calls using TanStack Query.
Display answer as clean bullet list with source metadata.

Phase 2: Video Upload + Real-Time ASR → RAG (8-10 days)

Backend Additions

Video upload (POST /api/v1/upload-video) with size/format validation (<300MB).
Static file serving for videos.
WebSocket /ws/asr/{video_id} for real-time audio chunk streaming.
ASR integration with Qwen/Qwen3-ASR-1.7B (file upload or audio content).
Question extraction via LLM, then trigger Phase 1 RAG (auto + manual support).

Frontend Additions

Drag & drop video upload + progress.
Video player (<video controls>).
Live transcript display (scrolling box).
Top-Left: Video player | Top-Right: Live transcript + manual input.
Bottom: RAG response panel.
Support both automatic “Ask” on transcript updates and manual button.

Development Timeline

Phase	Duration	Key Deliverables
Setup + Phase 1 Backend	3-4 days	FastAPI + Chroma + Metadata + LLM client
Phase 1 Frontend	2-3 days	UI layout + text query flow
Phase 2 Backend	4-5 days	Video upload + WebSocket ASR + question extraction
Phase 2 Frontend	3-4 days	Video player + live transcript + auto/manual flow
Testing & Polish	1-2 days	End-to-end testing + deployment scripts

Total Estimated Effort: 13-17 developer days (2-3 weeks)

Deployment Strategy

Development:

Backend: cd backend && uvicorn app.main:app --reload --port 8000
Frontend: cd frontend && npm run dev

Production:

Use docker-compose up -d (includes backend, built frontend, Nginx reverse proxy).
Simple deploy.sh script for building and restarting.

File Information

Filename: development_plan.md
Last Updated: April 2026
Status: Ready for implementation

5.8 KiB Raw Blame History Unescape Escape