# RAG Video Q&A — Project Knowledge Base

**Generated:** 2026-04-22
**Source:** development_plan.md
**Status:** Greenfield (no code yet)

---

## OVERVIEW

RAG-powered Video Q&A web app. Phase 1: text → ChromaDB retrieval → bullet-point answer. Phase 2: video upload → real-time ASR → auto/manual RAG query. FastAPI backend + React 18 (Vite) frontend.

## STRUCTURE

```
app/
├── backend/                  # FastAPI (Python)
│   ├── app/
│   │   ├── main.py
│   │   ├── routers/          # query.py, ingest.py, video.py, ws_asr.py
│   │   ├── services/         # rag.py, llm_client.py, asr_client.py, video_service.py
│   │   ├── models/           # Pydantic schemas
│   │   ├── core/             # config.py, database.py
│   │   └── utils/            # chunking.py, metadata_extraction.py
│   ├── uploads/              # video storage (max 300MB)
│   ├── requirements.txt
│   └── .env.example
├── frontend/                 # React 18 + TS + Vite
│   ├── src/
│   │   ├── components/       # shadcn/ui + custom
│   │   ├── pages/
│   │   ├── lib/
│   │   │   └── api.ts        # API client (TanStack Query)
│   │   └── App.tsx
│   ├── package.json
│   └── vite.config.ts
├── chroma_db/                # Persistent vector store
├── Dockerfile
├── docker-compose.yml
├── nginx.conf
└── deploy.sh
```

## WHERE TO LOOK

| Task | Location | Notes |
|------|----------|-------|
| API routes | `backend/app/routers/` | Versioned `/api/v1/...` |
| Business logic | `backend/app/services/` | RAG, LLM, ASR, video |
| Schemas | `backend/app/models/` | Pydantic request/response |
| Config | `backend/app/core/config.py` | `.env` driven |
| DB init | `backend/app/core/database.py` | ChromaDB persistent |
| Frontend API | `frontend/src/lib/api.ts` | TanStack Query |
| UI components | `frontend/src/components/` | shadcn/ui + Tailwind |

## CODE MAP

- **Backend**: FastAPI app with routers (query, ingest, video, ws_asr, prompts, history), services (rag, llm_client, asr_client, video_service, query_decomposer, relevance_filter, prompt_service, history_service), Pydantic models
- **Frontend**: React 18 + TypeScript + Vite with react-resizable-panels layout, TanStack Query, SSE streaming via
`queryDocumentStream()`, shadcn/ui + Tailwind components
- **Pipeline**: 3-step LLM workflow (decompose → retrieve → filter → generate; retrieval is a ChromaDB query, not an LLM call) with per-sub-question organization

## CONVENTIONS

- **Backend**: `snake_case` files; routers thin, services thick; `.env` for all LLM/ASR config
- **Frontend**: PascalCase components; `lib/api.ts` single API client; TanStack Query for server state
- **API**: Path versioning `/api/v1/`; WebSocket at `/ws/asr/{video_id}`
- **RAG**: Strict prompt — answer ONLY from retrieved context; bullet-point format
- **Metadata**: Every doc chunk must have `filename`, `upload_date`, `content_summary`

### RAG Pipeline (3-Step LLM Workflow — Per-Sub-Question)

```
User Question
    ↓
[LLM Call 1] QueryDecomposer — extract 2-5 sub-questions
    ↓
[ChromaDB] Retrieve per sub-question — each sub-q independently queries ChromaDB
    ↓
[LLM Call 2] RelevanceFilter (single call) — chunks grouped by sub-q, each scored against its own sub-q
    ↓
[LLM Call 3] ResponseGeneration — markdown sections per sub-question with ## headers
```

**Per-Sub-Question Organization**:

- Retrieval: `RAGService.retrieve_per_subquestion()` queries ChromaDB once per sub-question
- Filtering: `RelevanceFilter.filter_per_subquestion()` single LLM call with sub-q grouping
- Response: `RAGService.generate_response_per_subquestion()` produces markdown sections with grouped sources
- SSE Events: `decomposed → retrieving → filtering → generating → generating_subquestion (per sub-q) → completed`
- History: XML chunks wrapped in `` elements; sources stored as list-of-lists JSON
- Empty decomposition fallback (Decision #13): if decomposer returns `[]`, uses `[original_question]`

## ANTI-PATTERNS (THIS PROJECT)

- Hardcode LLM URLs/keys — always `.env`
- Business logic in routers — belongs in `services/`
- Non-persistent ChromaDB — must use `chroma_db/` directory
- LLM hallucination outside retrieved context — strict RAG prompt enforced
- Plain text responses — always bullet points with source metadata
- Missing document metadata — breaks source attribution
- Add authentication — public demo only
- Mobile-first design — desktop only at this stage
- Log to console only — all backend logs must go to `backend/app/log/` directory
- Commit log files to git — log files must be `.gitignore`d

## UNIQUE STYLES

- **Dual ASR trigger**: automatic (on transcript update) + manual "Ask from Video" button
- **Layout**: Top-Left video player | Top-Right transcript + input | Bottom RAG response
- **Provider switching**: same codebase runs dev (OpenRouter/Alibaba Cloud) and prod (local vLLM)
- **Video limit**: 300MB max, MP4 + common formats

## TESTING

**Backend test directory**: `backend/app/test/`

**Naming convention** (pytest, flat structure, phase-prefixed):

```
test_phase_.py
```

**Examples**:

- `test_phase1_ingest.py` — Document upload & ChromaDB ingestion
- `test_phase1_query.py` — RAG query endpoint
- `test_phase1_rag_service.py` — RAG retrieval + strict prompt logic
- `test_phase1_llm_client.py` — LLM client (mocked provider)
- `test_phase1_chunking.py` — Document chunking utils
- `test_phase1_metadata.py` — Metadata extraction
- `test_phase2_video_upload.py` — Video upload (<300MB, format validation)
- `test_phase2_asr_client.py` — ASR transcription client
- `test_phase2_ws_asr.py` — WebSocket audio streaming
- `test_phase2_query_from_video.py` — Auto/manual trigger from transcript
- `test_integration_phase1.py` — End-to-end text → RAG → answer
- `test_integration_phase2.py` — End-to-end video → ASR → RAG → answer

**Rules**:

- Use `pytest` + `pytest-asyncio` for async tests
- Mock all external LLM/ASR calls (do not hit live APIs in tests)
- Use `tmp_path` fixture for ChromaDB test instances
- Each test file must have a module-level docstring describing coverage

## SUB-PHASE DEVELOPMENT

**Workflow**: Plan → Write Test → Implement → Make Test Pass → Commit

### Sub-Phase Naming

Use decimal notation: **Phase X.Y** where X = major phase, Y = sub-phase number.

| Example | Scope |
|---------|-------|
| Phase 1.1 | Project setup, config, database |
| Phase 1.2 | Ingestion pipeline |
| Phase 1.3 | Query pipeline (3-step LLM workflow) |
| Phase 1.4 | Testing & polish |
| Phase 2.1 | Video upload backend |
| Phase 2.2 | ASR integration |

### Test-First Rule (MANDATORY)

Every sub-phase follows **test-driven delivery**:

1. **Write test first** — Before writing implementation code, write the test that defines "done"
2. **Implement** — Write the minimum code to make the test pass
3. **Run test** — Verify test passes (both unit and acceptance where applicable)
4. **Commit** — Only commit after tests pass. Never commit broken tests.
5. **Next sub-phase** — Only start next sub-phase after current is committed

**Enforcement**:

- Each Implementation Task in a sub-phase plan must list its test file(s)
- Tests must be in the `backend/app/test/` or `frontend/src/test/` directory
- Pre-commit: `pytest` must pass for backend, `npm test` for frontend

### Sub-Phase Plan Template

Each sub-phase plan (stored in `.plans/`) must include:

1. **Objective** — What this sub-phase delivers
2. **Test Files** — List of test files to write BEFORE implementation
3. **Acceptance Criteria** — List of behaviors that must work
4. **Acceptance Tests** — `test_acceptance_.py` file(s) with real environment
5. **Implementation Tasks** — Atomic steps, each referencing its test file

### Acceptance Testing Rules

**Unit tests** (`test_phase*.py`) — mocked, fast, CI-safe

**Acceptance tests** (`test_acceptance_*.py`) — real environment, actual LLM/ASR calls

**Acceptance test requirements**:

- Run against real services (ChromaDB instance, actual LLM API, ASR if applicable)
- Name format: `test_acceptance__.py`
- Location: `backend/app/test/acceptance/`
- Use `pytest` markers: `@pytest.mark.acceptance` and `@pytest.mark.slow`
- Each acceptance test file must have a docstring describing the real environment setup
- Acceptance tests run manually before sub-phase completion, not in CI

**Example acceptance test**:

```python
"""Acceptance test: Phase 1 RAG query with real Qwen LLM.

Prerequisites:
- ChromaDB running (local or docker)
- .env configured with valid LLM_BASE_URL and LLM_API_KEY
- Test documents ingested via /api/v1/ingest
"""
import pytest


@pytest.mark.acceptance
@pytest.mark.slow
def test_query_with_real_llm():
    """Query should return bullet-point answer from actual LLM."""
    # Real HTTP call to LLM provider
    # Real ChromaDB retrieval
    pass
```

**Sub-phase completion checklist**:

- [ ] All unit tests written BEFORE implementation
- [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
- [ ] Code reviewed (self or peer)
- [ ] Sub-phase plan marked complete in `.plans/`
- [ ] Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")

## COMMANDS

```bash
# Dev
uvicorn app.main:app --reload --port 8000   # backend (run from backend/)
npm run dev                                 # frontend (run from frontend/)

# Unit tests (mocked, CI-safe), backend
cd backend && pytest app/test/test_phase*.py -v

# Acceptance tests (real LLM/ASR/ChromaDB), backend
cd backend && pytest app/test/acceptance/ -v -m acceptance

# Prod
docker-compose up -d
./deploy.sh
```

## PLAN STORAGE

**All development plans** (including sub-plans, debug plans, and
task breakdowns) **must be stored in `.plans/`**.

```
.plans/
├── development_plan.md       # Main development plan (root-level)
├── phase1_backend_plan.md    # Phase 1 backend tasks
├── phase1_frontend_plan.md   # Phase 1 frontend tasks
├── phase2_backend_plan.md    # Phase 2 backend tasks
├── phase2_frontend_plan.md   # Phase 2 frontend tasks
├── debug__.md                # Debug/diagnosis logs
└── _template.md              # Plan template (optional)
```

**Rules**:

- Name format: `_.md` (snake_case)
- Use `debug_` prefix for troubleshooting logs
- Root `development_plan.md` stays at root as canonical source
- Sub-plans reference root plan, never duplicate it

## NOTES

- No routing library specified — single-page app likely sufficient
- No client state library specified — `useState`/`useReducer` + TanStack Query
- WebSocket client not specified — may need to expand `lib/api.ts`
- shadcn/ui components are copied, not imported as npm package
- Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
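
The per-sub-question flow described under CONVENTIONS (decompose, retrieve, filter, generate, plus the Decision #13 fallback) can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: `run_pipeline` and the injected callables `decompose`, `retrieve`, `filter_chunks`, and `generate` are hypothetical stand-ins for `QueryDecomposer`, `RAGService.retrieve_per_subquestion()`, `RelevanceFilter.filter_per_subquestion()`, and `RAGService.generate_response_per_subquestion()`, whose real signatures are not specified here. It also assumes generation is invoked once per sub-question, which the plan does not confirm.

```python
from typing import Callable

# Hypothetical type aliases; the real services may use richer Pydantic models.
Chunks = list[str]
Decompose = Callable[[str], list[str]]
Retrieve = Callable[[str], Chunks]
Filter = Callable[[dict[str, Chunks]], dict[str, Chunks]]
Generate = Callable[[str, Chunks], str]


def run_pipeline(
    question: str,
    decompose: Decompose,
    retrieve: Retrieve,
    filter_chunks: Filter,
    generate: Generate,
) -> tuple[str, list[str]]:
    """Return (markdown answer, ordered event names) for one question."""
    events: list[str] = []

    # LLM call 1: split into sub-questions; fall back to the original
    # question when the decomposer returns [] (Decision #13).
    sub_questions = decompose(question) or [question]
    events.append("decomposed")

    # ChromaDB: one independent retrieval per sub-question.
    events.append("retrieving")
    retrieved = {sq: retrieve(sq) for sq in sub_questions}

    # LLM call 2: a single filtering call over chunks grouped by sub-question.
    events.append("filtering")
    relevant = filter_chunks(retrieved)

    # LLM call 3 (assumed per sub-question here): one markdown section
    # per sub-question under a ## header.
    events.append("generating")
    sections = []
    for sq in sub_questions:
        events.append("generating_subquestion")
        sections.append(f"## {sq}\n{generate(sq, relevant.get(sq, []))}")

    events.append("completed")
    return "\n\n".join(sections), events


if __name__ == "__main__":
    # Stub callables stand in for the LLM and ChromaDB so this runs offline.
    answer, events = run_pipeline(
        "What is RAG?",
        decompose=lambda q: [],  # empty result triggers the fallback
        retrieve=lambda sq: [f"chunk about {sq}"],
        filter_chunks=lambda grouped: grouped,
        generate=lambda sq, chunks: "\n".join(f"- {c}" for c in chunks),
    )
    print(answer)
    print(events)
```

Passing the stages in as callables keeps the orchestration testable with mocks, which fits the rule that unit tests must not hit live LLM or ASR APIs.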