# RAG Video Q&A — Project Knowledge Base

**Generated:** 2026-04-22
**Source:** development_plan.md
**Status:** Greenfield (no code yet)

---

## OVERVIEW

RAG-powered Video Q&A web app. Phase 1: text → ChromaDB retrieval → bullet-point answer. Phase 2: video upload → real-time ASR → auto/manual RAG query. FastAPI backend + React 18 (Vite) frontend.

## STRUCTURE

```
app/
├── backend/                  # FastAPI (Python)
│   ├── app/
│   │   ├── main.py
│   │   ├── routers/          # query.py, ingest.py, video.py, ws_asr.py
│   │   ├── services/         # rag.py, llm_client.py, asr_client.py, video_service.py
│   │   ├── models/           # Pydantic schemas
│   │   ├── core/             # config.py, database.py
│   │   └── utils/            # chunking.py, metadata_extraction.py
│   ├── uploads/              # video storage (max 300MB)
│   ├── requirements.txt
│   └── .env.example
├── frontend/                 # React 18 + TS + Vite
│   ├── src/
│   │   ├── components/       # shadcn/ui + custom
│   │   ├── pages/
│   │   ├── lib/
│   │   │   └── api.ts        # API client (TanStack Query)
│   │   └── App.tsx
│   ├── package.json
│   └── vite.config.ts
├── chroma_db/                # Persistent vector store
├── Dockerfile
├── docker-compose.yml
├── nginx.conf
└── deploy.sh
```

## WHERE TO LOOK

| Task | Location | Notes |
|------|----------|-------|
| API routes | `backend/app/routers/` | Versioned `/api/v1/...` |
| Business logic | `backend/app/services/` | RAG, LLM, ASR, video |
| Schemas | `backend/app/models/` | Pydantic request/response |
| Config | `backend/app/core/config.py` | `.env` driven |
| DB init | `backend/app/core/database.py` | ChromaDB persistent |
| Frontend API | `frontend/src/lib/api.ts` | TanStack Query |
| UI components | `frontend/src/components/` | shadcn/ui + Tailwind |

## CODE MAP

*Greenfield — no code yet.
See development_plan.md for full specification.*

## CONVENTIONS

- **Backend**: `snake_case` files; routers thin, services thick; `.env` for all LLM/ASR config
- **Frontend**: PascalCase components; `lib/api.ts` single API client; TanStack Query for server state
- **API**: Path versioning `/api/v1/`; WebSocket at `/ws/asr/{video_id}`
- **RAG**: Strict prompt — answer ONLY from retrieved context; bullet-point format
- **Metadata**: Every doc chunk must have `filename`, `upload_date`, `content_summary`

## ANTI-PATTERNS (THIS PROJECT)

- Hardcode LLM URLs/keys — always `.env`
- Business logic in routers — belongs in `services/`
- Non-persistent ChromaDB — must use `chroma_db/` directory
- LLM hallucination outside retrieved context — strict RAG prompt enforced
- Plain text responses — always bullet points with source metadata
- Missing document metadata — breaks source attribution
- Add authentication — public demo only
- Mobile-first design — desktop only at this stage

## UNIQUE STYLES

- **Dual ASR trigger**: automatic (on transcript update) + manual "Ask from Video" button
- **Layout**: Top-Left video player | Top-Right transcript + input | Bottom RAG response
- **Provider switching**: same codebase runs dev (OpenRouter/Alibaba Cloud) and prod (local vLLM)
- **Video limit**: 300MB max, MP4 + common formats

## TESTING

**Backend test directory**: `backend/app/test/`

**Naming convention** (pytest, flat structure, phase-prefixed):

```
test_phase_.py
```

**Examples**:

- `test_phase1_ingest.py` — Document upload & ChromaDB ingestion
- `test_phase1_query.py` — RAG query endpoint
- `test_phase1_rag_service.py` — RAG retrieval + strict prompt logic
- `test_phase1_llm_client.py` — LLM client (mocked provider)
- `test_phase1_chunking.py` — Document chunking utils
- `test_phase1_metadata.py` — Metadata extraction
- `test_phase2_video_upload.py` — Video upload (<300MB, format validation)
- `test_phase2_asr_client.py` — ASR transcription client
- `test_phase2_ws_asr.py` — WebSocket
audio streaming
- `test_phase2_query_from_video.py` — Auto/manual trigger from transcript
- `test_integration_phase1.py` — End-to-end text → RAG → answer
- `test_integration_phase2.py` — End-to-end video → ASR → RAG → answer

**Rules**:

- Use `pytest` + `pytest-asyncio` for async tests
- Mock all external LLM/ASR calls (do not hit live APIs in tests)
- Use the `tmp_path` fixture for ChromaDB test instances
- Each test file must have a module-level docstring describing coverage

## SUB-PHASE DEVELOPMENT

**Workflow**: Plan → Implement → Acceptance Test → Commit

### Sub-Phase Plan Template

Each sub-phase plan (stored in `.plans/`) must include:

1. **Objective** — What this sub-phase delivers
2. **Acceptance Criteria** — List of behaviors that must work
3. **Acceptance Tests** — `test_acceptance_.py` file(s) with real environment
4. **Implementation Tasks** — Atomic steps to complete

### Acceptance Testing Rules

- **Unit tests** (`test_phase*.py`) — mocked, fast, CI-safe
- **Acceptance tests** (`test_acceptance_*.py`) — real environment, actual LLM/ASR calls

**Acceptance test requirements**:

- Run against real services (ChromaDB instance, actual LLM API, ASR if applicable)
- Name format: `test_acceptance__.py`
- Location: `backend/app/test/acceptance/`
- Use `pytest` markers: `@pytest.mark.acceptance` and `@pytest.mark.slow`
- Each acceptance test file must have a docstring describing the real environment setup
- Acceptance tests run manually before sub-phase completion, not in CI

**Example acceptance test**:

```python
"""Acceptance test: Phase 1 RAG query with real Qwen LLM.

Prerequisites:
- ChromaDB running (local or docker)
- .env configured with valid LLM_BASE_URL and LLM_API_KEY
- Test documents ingested via /api/v1/ingest
"""
import pytest


@pytest.mark.acceptance
@pytest.mark.slow
def test_query_with_real_llm():
    """Query should return bullet-point answer from actual LLM."""
    # Real HTTP call to LLM provider
    # Real ChromaDB retrieval
    pass
```

**Sub-phase completion checklist**:

- [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
- [ ] Code reviewed (self or peer)
- [ ] Sub-phase plan marked complete in `.plans/`
- [ ] Git commit with clear message referencing sub-phase plan

## COMMANDS

```bash
# Dev
# backend (run from backend/):
uvicorn app.main:app --reload --port 8000
# frontend (run from frontend/):
npm run dev

# Unit tests (mocked, CI-safe)
cd backend && pytest app/test/test_phase*.py -v

# Acceptance tests (real LLM/ASR/ChromaDB)
cd backend && pytest app/test/acceptance/ -v -m acceptance

# Prod
docker-compose up -d
./deploy.sh
```

## PLAN STORAGE

**All development plans** (including sub-plans, debug plans, and task breakdowns) **must be stored in `.plans/`**.

```
.plans/
├── development_plan.md       # Main development plan (root-level)
├── phase1_backend_plan.md    # Phase 1 backend tasks
├── phase1_frontend_plan.md   # Phase 1 frontend tasks
├── phase2_backend_plan.md    # Phase 2 backend tasks
├── phase2_frontend_plan.md   # Phase 2 frontend tasks
├── debug__.md                # Debug/diagnosis logs
└── _template.md              # Plan template (optional)
```

**Rules**:

- Name format: `_.md` (snake_case)
- Use `debug_` prefix for troubleshooting logs
- Root `development_plan.md` stays at root as canonical source
- Sub-plans reference root plan, never duplicate it

## NOTES

- No routing library specified — single-page app likely sufficient
- No client state library specified — `useState`/`useReducer` + TanStack Query
- WebSocket client not specified — may need to expand `lib/api.ts`
- shadcn/ui components are copied, not imported as npm package
- Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
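
Because no code exists yet, the RAG and metadata rules under CONVENTIONS are only stated in prose. The sketch below shows one way a service-layer helper might enforce them: it rejects chunks that lack `filename`, `upload_date`, or `content_summary`, then assembles a context-only prompt that asks for bullet points. The names `Chunk`, `missing_metadata`, and `build_strict_prompt`, and the prompt wording itself, are hypothetical and do not come from development_plan.md.

```python
from dataclasses import dataclass

# Metadata keys that CONVENTIONS requires on every document chunk.
REQUIRED_METADATA = ("filename", "upload_date", "content_summary")


@dataclass
class Chunk:
    """Hypothetical container for one retrieved chunk (not from the plan)."""
    text: str
    metadata: dict


def missing_metadata(chunk: Chunk) -> list[str]:
    """Return the required metadata keys that are absent or empty."""
    return [key for key in REQUIRED_METADATA if not chunk.metadata.get(key)]


def build_strict_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble a context-only prompt; the wording is illustrative only."""
    for chunk in chunks:
        missing = missing_metadata(chunk)
        if missing:
            # Missing metadata would break source attribution, so fail early.
            raise ValueError(f"chunk is missing metadata: {missing}")
    # Number each chunk so the answer can cite its sources.
    context = "\n\n".join(
        f"[{i}] {c.metadata['filename']} ({c.metadata['upload_date']})\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer ONLY from the context below. "
        "If the context does not contain the answer, say so.\n"
        "Format the answer as bullet points and cite sources by [number].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Keeping this logic in a plain function (for example inside `services/rag.py`) would let the mocked unit tests exercise the prompt rules without touching ChromaDB or a live LLM.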