141 lines
5.8 KiB
Markdown
141 lines
5.8 KiB
Markdown
# RAG Video Q&A Web Application - Development Plan
|
||
|
||
**Project Overview**
|
||
Web-based application built in two phases.
|
||
- **Phase 1**: Text question → RAG retrieval → Point-form answer (strictly from database)
|
||
- **Phase 2**: Video upload + player → real-time audio streaming → ASR transcription → question extraction → Phase 1 RAG flow
|
||
|
||
**Tech Stack**
|
||
- **Backend**: Python + FastAPI (REST + WebSocket)
|
||
- **Frontend**: TypeScript + React 18 (Vite) + shadcn/ui + Tailwind CSS
|
||
- **Server**: Linux Ubuntu 22.04
|
||
- **RAG Database**: ChromaDB (persistent)
|
||
- **LLM/ASR Integration**: Dynamic via `.env` (supports local vLLM, OpenRouter, Alibaba Cloud)
|
||
- Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
|
||
|
||
- **Models**:
|
||
- Embedding: `qwen/qwen3-embedding-4b`
|
||
- LLM: `qwen/qwen3.5-35b-a3b`
|
||
- ASR: `Qwen/Qwen3-ASR-1.7B`
|
||
|
||
**Deployment**
|
||
- Development: Simple commands (`uvicorn` + `npm run dev`)
|
||
- Production: Docker + Nginx
|
||
|
||
---
|
||
|
||
## Project Structure (Monorepo)
|
||
app/
|
||
├── backend/ # FastAPI
|
||
│ ├── app/
|
||
│ │ ├── main.py
|
||
│ │ ├── routers/ # query.py, ingest.py, video.py, ws_asr.py
|
||
│ │ ├── services/ # rag.py, llm_client.py, asr_client.py, video_service.py
|
||
│ │ ├── models/ # Pydantic schemas
|
||
│ │ ├── core/ # config.py, database.py
|
||
│ │ └── utils/ # chunking, metadata extraction
|
||
│ ├── uploads/ # video storage (max 300MB)
|
||
│ ├── requirements.txt
|
||
│ └── .env.example
|
||
├── frontend/ # React + TypeScript (Vite)
|
||
│ ├── src/
|
||
│ │ ├── components/
|
||
│ │ ├── pages/
|
||
│ │ ├── lib/ # api.ts
|
||
│ │ └── App.tsx
|
||
│ ├── package.json
|
||
│ └── vite.config.ts
|
||
├── chroma_db/ # Persistent vector store
|
||
├── Dockerfile
|
||
├── docker-compose.yml
|
||
├── nginx.conf
|
||
└── deploy.sh
|
||
|
||
|
||
---
|
||
|
||
## Key Requirements Incorporated
|
||
|
||
- **LLM/ASR Configuration**: Backend reads from `.env` for easy switching between development (OpenRouter / Alibaba Cloud) and production (local vLLM).
|
||
- **RAG Database**: ChromaDB with metadata support (filename + extracted content metadata).
|
||
- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers.
|
||
- **Document Ingestion**: Via UI (project-based demo, no user authentication).
|
||
- **Video**: MP4 and common formats, maximum 300MB.
|
||
- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** “Ask from Video” button.
|
||
- **UI Layout**:
|
||
- Top-Left: Video player
|
||
- Top-Right: Real-time transcript + text input box
|
||
- Bottom Half: RAG response (bullet points with source metadata)
|
||
- **Authentication**: Public demo (no login required).
|
||
- **Mobile**: Not required at this stage.
|
||
|
||
---
|
||
|
||
## Phase 1: Text Question → RAG → Point-Form Answer (5-7 days)
|
||
|
||
### Backend (FastAPI)
|
||
- Dynamic configuration via `.env` (LLM base URL, API key, model names).
|
||
- `services/rag.py`: Persistent ChromaDB + Qwen embedding + metadata extraction (filename, upload date, content summary).
|
||
- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM with **strict RAG prompt** (only use retrieved context).
|
||
- Endpoints:
|
||
- `POST /api/v1/ingest` – Document upload and ingestion with metadata.
|
||
- `POST /api/v1/query` – Question → retrieve → LLM → bullet-point response.
|
||
|
||
### Frontend (React + TS)
|
||
- Clean layout: Top-right input box, bottom response area.
|
||
- Type-safe API calls using TanStack Query.
|
||
- Display answer as clean bullet list with source metadata.
|
||
|
||
---
|
||
|
||
## Phase 2: Video Upload + Real-Time ASR → RAG (8-10 days)
|
||
|
||
### Backend Additions
|
||
- Video upload (`POST /api/v1/upload-video`) with size/format validation (<300MB).
|
||
- Static file serving for videos.
|
||
- WebSocket `/ws/asr/{video_id}` for real-time audio chunk streaming.
|
||
- ASR integration with `Qwen/Qwen3-ASR-1.7B` (file upload or audio content).
|
||
- Question extraction via LLM, then trigger Phase 1 RAG (auto + manual support).
|
||
|
||
### Frontend Additions
|
||
- Drag & drop video upload + progress.
|
||
- Video player (`<video controls>`).
|
||
- Live transcript display (scrolling box).
|
||
- Top-Left: Video player | Top-Right: Live transcript + manual input.
|
||
- Bottom: RAG response panel.
|
||
- Support both automatic “Ask” on transcript updates and manual button.
|
||
|
||
---
|
||
|
||
## Development Timeline
|
||
|
||
| Phase | Duration | Key Deliverables |
|
||
|-----------------------------|--------------|------------------|
|
||
| Setup + Phase 1 Backend | 3-4 days | FastAPI + Chroma + Metadata + LLM client |
|
||
| Phase 1 Frontend | 2-3 days | UI layout + text query flow |
|
||
| Phase 2 Backend | 4-5 days | Video upload + WebSocket ASR + question extraction |
|
||
| Phase 2 Frontend | 3-4 days | Video player + live transcript + auto/manual flow |
|
||
| Testing & Polish | 1-2 days | End-to-end testing + deployment scripts |
|
||
|
||
**Total Estimated Effort**: 13-17 developer days (2-3 weeks)
|
||
|
||
---
|
||
|
||
## Deployment Strategy
|
||
|
||
**Development**:
|
||
- Backend: `cd backend && uvicorn app.main:app --reload --port 8000`
|
||
- Frontend: `cd frontend && npm run dev`
|
||
|
||
**Production**:
|
||
- Use `docker-compose up -d` (includes backend, built frontend, Nginx reverse proxy).
|
||
- Simple `deploy.sh` script for building and restarting.
|
||
|
||
|
||
---
|
||
|
||
**File Information**
|
||
- Filename: `development_plan.md`
|
||
- Last Updated: April 2026
|
||
- Status: Ready for implementation
|