RAG Video Q&A — Project Knowledge Base
Generated: 2026-04-22 Updated: 2026-05-15 (Phase 4 added) Source: development_plan.md Status: Phase 1 ✅, Phase 2 ✅, Phase 4 ✅
OVERVIEW
RAG-powered Video Q&A web app. Phase 1: text → ChromaDB retrieval → bullet-point answer. Phase 2: video upload → real-time ASR → auto/manual RAG query. Phase 4: System Audio Capture + Listen Mic → real-time ASR → RAG. FastAPI backend + React 18 (Vite) frontend.
STRUCTURE
app/
├── backend/ # FastAPI (Python)
│ ├── app/
│ │ ├── main.py
│ │ ├── routers/ # query.py, ingest.py, video.py, ws_asr.py
│ │ ├── services/ # rag.py, llm_client.py, asr_client.py, video_service.py
│ │ ├── models/ # Pydantic schemas
│ │ ├── core/ # config.py, database.py
│ │ └── utils/ # chunking.py, metadata_extraction.py
│ ├── uploads/ # video storage (max 300MB)
│ ├── requirements.txt
│ └── .env.example
├── frontend/ # React 18 + TS + Vite
│ ├── src/
│ │ ├── components/ # shadcn/ui + custom (SourceSelector, SystemAudioCapture, MicCapture, etc.)
│ │ ├── hooks/ # useVideoASR, useMediaStreamASR, useSystemAudioASR, useMicASR, etc.
│ │ ├── pages/
│ │ ├── lib/
│ │ │ ├── api.ts # API client (TanStack Query)
│ │ │ └── browser.ts # browser detection (isSystemAudioSupported)
│ │ └── App.tsx
│ ├── package.json
│ └── vite.config.ts
├── chroma_db/ # Persistent vector store
├── Dockerfile
├── docker-compose.yml
├── nginx.conf
└── deploy.sh
WHERE TO LOOK
| Task | Location | Notes |
|---|---|---|
| API routes | `backend/app/routers/` | Versioned `/api/v1/...` |
| Business logic | `backend/app/services/` | RAG, LLM, ASR, video |
| Schemas | `backend/app/models/` | Pydantic request/response |
| Config | `backend/app/core/config.py` | `.env` driven (incl. `SYSTEM_AUDIO_ENABLED`, `MIC_ENABLED`) |
| DB init | `backend/app/core/database.py` | ChromaDB persistent |
| Frontend API | `frontend/src/lib/api.ts` | TanStack Query |
| UI components | `frontend/src/components/` | shadcn/ui + Tailwind (SourceSelector, SystemAudioCapture, MicCapture) |
| ASR hooks | `frontend/src/hooks/` | useVideoASR, useMediaStreamASR, useSystemAudioASR, useMicASR |
| Browser detection | `frontend/src/lib/browser.ts` | `isSystemAudioSupported()` |
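
The Config row above says settings are `.env` driven, including the `SYSTEM_AUDIO_ENABLED` and `MIC_ENABLED` toggles. A minimal sketch of how such toggles could be parsed, assuming plain environment variables; the helper names (`env_flag`, `FeatureFlags`, `load_feature_flags`) and the `False` defaults are hypothetical, and the real `config.py` may use a settings library instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, env: Mapping[str, str], default: bool = False) -> bool:
    """Parse a boolean variable; an unset or empty value falls back to default."""
    raw = env.get(name, "").strip().lower()
    if not raw:
        return default
    return raw in _TRUTHY


@dataclass(frozen=True)
class FeatureFlags:
    system_audio_enabled: bool
    mic_enabled: bool


def load_feature_flags(env: Mapping[str, str] = os.environ) -> FeatureFlags:
    # Defaulting both toggles to False is an assumption, not project behavior.
    return FeatureFlags(
        system_audio_enabled=env_flag("SYSTEM_AUDIO_ENABLED", env),
        mic_enabled=env_flag("MIC_ENABLED", env),
    )
```

Passing the mapping explicitly keeps the parser testable without touching the process environment.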
CODE MAP
- Backend: FastAPI app with routers (query, ingest, video, ws_asr, prompts, history), services (rag, llm_client, asr_client, video_service, query_decomposer, relevance_filter, prompt_service, history_service), Pydantic models
- Frontend: React 18 + TypeScript + Vite with react-resizable-panels layout, TanStack Query, SSE streaming via `queryDocumentStream()`, shadcn/ui + Tailwind components, SourceSelector tabs (Upload | System Audio | Listen Mic)
- Pipeline: 3-step LLM workflow (decompose → retrieve → filter → generate) with per-sub-question organization
- Audio Capture (Phase 4): System Audio (`getDisplayMedia`) and Listen Mic (`getUserMedia`) pipe audio via shared `useMediaStreamASR` → WebSocket → DashScope realtime ASR
CONVENTIONS
- Backend: `snake_case` files; routers thin, services thick; `.env` for all LLM/ASR config
- Frontend: PascalCase components; `lib/api.ts` single API client; TanStack Query for server state
- API: Path versioning `/api/v1/`; WebSocket at `/ws/asr/{video_id}`
- RAG: Strict prompt (answer ONLY from retrieved context); bullet-point format
- Metadata: Every doc chunk must have `filename`, `upload_date`, `content_summary`
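
The metadata convention can be enforced with a small guard before a chunk is written to the vector store. The sketch below is illustrative only: the function names are hypothetical, and treating an empty string as missing is an assumption rather than a documented rule.

```python
from typing import Any, Mapping

REQUIRED_METADATA_KEYS = ("filename", "upload_date", "content_summary")


def missing_metadata_keys(metadata: Mapping[str, Any]) -> list[str]:
    """Return the required keys that are absent or empty in a chunk's metadata."""
    return [
        key
        for key in REQUIRED_METADATA_KEYS
        if metadata.get(key) in (None, "")
    ]


def validate_chunk_metadata(metadata: Mapping[str, Any]) -> None:
    """Raise ValueError if a chunk lacks any required metadata field."""
    missing = missing_metadata_keys(metadata)
    if missing:
        raise ValueError(f"chunk metadata missing: {', '.join(missing)}")
```

Failing at ingestion time is cheaper than discovering a gap later, when source attribution in a response is already broken.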
RAG Pipeline (3-Step LLM Workflow — Per-Sub-Question)
User Question
↓
[LLM Call 1] QueryDecomposer — extract 2-5 sub-questions
↓
[ChromaDB] Retrieve per sub-question — each sub-q independently queries ChromaDB
↓
[LLM Call 2] RelevanceFilter (single call) — chunks grouped by sub-q, each scored against its own sub-q
↓
[LLM Call 3] ResponseGeneration — markdown sections per sub-question with ## headers
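
The flow above can be sketched as one orchestration function with the three LLM steps and the retriever injected as callables. This is a simplified illustration, not the project's `RAGService`: the name `run_pipeline`, the type aliases, and the use of plain strings for chunks are all assumptions.

```python
from typing import Callable, Dict, List

ChunksBySubQ = Dict[str, List[str]]

Decompose = Callable[[str], List[str]]          # LLM call 1
Retrieve = Callable[[str], List[str]]           # vector store query
FilterChunks = Callable[[ChunksBySubQ], ChunksBySubQ]  # LLM call 2
Generate = Callable[[str, ChunksBySubQ], str]   # LLM call 3


def run_pipeline(
    question: str,
    decompose: Decompose,
    retrieve: Retrieve,
    filter_chunks: FilterChunks,
    generate: Generate,
) -> str:
    """Run decompose, per-sub-question retrieval, filtering, then generation."""
    sub_questions = decompose(question)
    # One independent retrieval per sub-question, grouped by sub-question.
    chunks_by_subq = {sub_q: retrieve(sub_q) for sub_q in sub_questions}
    # A single filtering call receives every group at once.
    kept = filter_chunks(chunks_by_subq)
    return generate(question, kept)
```

Injecting the steps keeps the ordering testable with stubs while the real services supply the LLM and ChromaDB calls.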
Per-Sub-Question Organization:
- Retrieval: `RAGService.retrieve_per_subquestion()` queries ChromaDB once per sub-question
- Filtering: `RelevanceFilter.filter_per_subquestion()` makes a single LLM call with sub-q grouping
- Response: `RAGService.generate_response_per_subquestion()` produces markdown sections with grouped sources
- SSE Events: `decomposed → retrieving → filtering → generating → generating_subquestion (per sub-q) → completed`
- History: XML chunks wrapped in `<sub_q>` elements; sources stored as list-of-lists JSON
- Empty decomposition fallback (Decision #13): if decomposer returns `[]`, uses `[original_question]`
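
Two of the rules above, the empty-decomposition fallback and the SSE event order, are easy to pin down in code. The sketch below takes only the event names and their order from this document; the function names and the `(event, detail)` payload shape are hypothetical.

```python
from typing import Iterator, List, Tuple


def effective_sub_questions(original_question: str, decomposed: List[str]) -> List[str]:
    """Fall back to the original question when decomposition returns nothing."""
    return decomposed if decomposed else [original_question]


def sse_event_sequence(sub_questions: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, detail) pairs in the documented order."""
    for event in ("decomposed", "retrieving", "filtering", "generating"):
        yield event, ""
    # One generating_subquestion event per sub-question.
    for sub_q in sub_questions:
        yield "generating_subquestion", sub_q
    yield "completed", ""
```

With the fallback applied first, a question that cannot be decomposed still produces exactly one `generating_subquestion` event.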
ANTI-PATTERNS (THIS PROJECT)
- Hardcode LLM URLs/keys: always `.env`
- Business logic in routers: belongs in `services/`
- Non-persistent ChromaDB: must use `chroma_db/` directory
- LLM hallucination outside retrieved context: strict RAG prompt enforced
- Plain text responses: always bullet points with source metadata
- Missing document metadata: breaks source attribution
- Add authentication: public demo only
- Mobile-first design: desktop only at this stage
- Log to console only: all backend logs must go to `backend/app/log/` directory
- Commit log files to git: log files must be `.gitignore`d
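
For the logging rule, a file handler pointed at the log directory is enough to avoid console-only output. This is a minimal sketch using only the standard library; the logger name `rag_video_qa`, the file name `backend.log`, and the format string are assumptions, not project settings.

```python
import logging
import os
from pathlib import Path


def configure_file_logging(log_dir: Path, filename: str = "backend.log") -> logging.Logger:
    """Attach a file handler under log_dir, creating the directory if needed."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = os.path.abspath(log_dir / filename)

    logger = logging.getLogger("rag_video_qa")  # hypothetical logger name
    logger.setLevel(logging.INFO)

    # Skip if a handler for this file is already attached (e.g. on reload).
    already_attached = any(
        isinstance(h, logging.FileHandler) and h.baseFilename == log_path
        for h in logger.handlers
    )
    if not already_attached:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A call such as `configure_file_logging(Path("backend/app/log"))` at startup would match the directory named above.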
UNIQUE STYLES
- Dual ASR trigger: automatic (on transcript update) + manual "Ask from Video" button
- Layout: Top-Left video player / SystemAudioCapture / MicCapture | Top-Right transcript + input | Bottom RAG response
- Provider switching: same codebase runs dev (OpenRouter/Alibaba Cloud) and prod (local vLLM)
- Video limit: 300MB max, MP4 + common formats
- Three audio sources (Phase 4): Upload (video element), System Audio (getDisplayMedia), Listen Mic (getUserMedia) — unified via shared useMediaStreamASR pipeline
- Phase 4 ASR routing: WebSocket `/ws/asr/{video_id}?source=system-audio|mic&language=yue`; backend is source-agnostic
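
The Phase 4 WebSocket route can be illustrated with a small URL builder and parser. Only the path shape and the `source` and `language` parameters come from this document; the function names, the default of `yue` (taken from the example URL), and the rejection of other source values are assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ALLOWED_SOURCES = {"system-audio", "mic"}


def build_asr_ws_url(base: str, video_id: str, source: str, language: str = "yue") -> str:
    """Build the ASR WebSocket URL for a live capture source."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unsupported source: {source!r}")
    query = urlencode({"source": source, "language": language})
    return f"{base.rstrip('/')}/ws/asr/{video_id}?{query}"


def parse_asr_ws_query(url: str) -> dict:
    """Extract the source and language parameters from an ASR WebSocket URL."""
    params = parse_qs(urlsplit(url).query)
    return {
        "source": params.get("source", [""])[0],
        "language": params.get("language", [""])[0],
    }
```

For example, `build_asr_ws_url("ws://localhost:8000", "abc123", "mic")` yields a URL whose query string carries both parameters.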
TESTING
Backend test directory: backend/app/test/
Naming convention (pytest, flat structure, phase-prefixed):
test_phase<N>_<module_or_feature>.py
Examples:
- `test_phase1_ingest.py`: Document upload & ChromaDB ingestion
- `test_phase1_query.py`: RAG query endpoint
- `test_phase1_rag_service.py`: RAG retrieval + strict prompt logic
- `test_phase1_llm_client.py`: LLM client (mocked provider)
- `test_phase1_chunking.py`: Document chunking utils
- `test_phase1_metadata.py`: Metadata extraction
- `test_phase2_video_upload.py`: Video upload (<300MB, format validation)
- `test_phase2_asr_client.py`: ASR transcription client
- `test_phase2_ws_asr.py`: WebSocket audio streaming
- `test_phase2_query_from_video.py`: Auto/manual trigger from transcript
- `test_phase4_config.py`: System audio & mic capture feature toggles
- `test_phase4_*` (frontend): useSystemAudioASR, useMicASR, SystemAudioCapture, MicCapture, LTTPage integration
- `test_integration_phase1.py`: End-to-end text → RAG → answer
- `test_integration_phase2.py`: End-to-end video → ASR → RAG → answer
- `test_integration_phase4.py`: End-to-end WebSocket with system-audio/mic sources
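
The naming convention lends itself to a quick check, for instance in a lint step. The sketch below is hypothetical: the patterns are inferred from the example file names above (including the `test_integration_phase<N>.py` form), and `classify_test_file` is not part of the project.

```python
import re
from typing import Optional, Tuple

PHASE_TEST = re.compile(r"^test_phase(?P<phase>\d+)_(?P<feature>[a-z0-9_]+)\.py$")
INTEGRATION_TEST = re.compile(r"^test_integration_phase(?P<phase>\d+)\.py$")


def classify_test_file(name: str) -> Optional[Tuple[str, int]]:
    """Return (kind, phase) for a conforming file name, or None otherwise."""
    match = PHASE_TEST.match(name)
    if match:
        return "phase", int(match["phase"])
    match = INTEGRATION_TEST.match(name)
    if match:
        return "integration", int(match["phase"])
    return None
```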
Testing Rules (Python Backend):
- Prefer integration tests over unit tests with mocks for all backend features and API routes.
- Use the real application via `TestClient(FastAPI)`. Never mock the database or internal services.
- Use existing test database fixtures and `conftest.py`. Only mock truly external third-party APIs (LLM, ASR).
- Match the exact style and imports of existing tests in the `backend/app/test/` directory.
- Always run `pytest` after writing tests and iterate until they pass against the real system.
- Each test file must have a module-level docstring describing coverage.
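
The "only mock external APIs" rule can be shown in miniature. In the sketch below both classes are hypothetical stand-ins: `ExternalLLMClient` represents a third-party provider wrapper and `AnswerService` an internal service. Plain classes are used instead of `TestClient` so the example stays self-contained; a project test would go through the FastAPI app.

```python
from unittest.mock import patch


class ExternalLLMClient:
    """Stand-in for a third-party LLM API wrapper (hypothetical)."""

    def complete(self, prompt: str) -> str:
        raise RuntimeError("would call a real provider")


class AnswerService:
    """Internal service: exercised for real, never mocked."""

    def __init__(self, llm: ExternalLLMClient) -> None:
        self.llm = llm

    def answer(self, question: str, context: list[str]) -> str:
        prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
        return self.llm.complete(prompt)


def test_answer_uses_real_service_with_mocked_llm() -> None:
    # Patch only the external boundary; the service logic runs unmodified.
    with patch.object(ExternalLLMClient, "complete", return_value="- mocked bullet") as fake:
        service = AnswerService(ExternalLLMClient())
        result = service.answer("What is RAG?", ["RAG retrieves context."])

    assert result == "- mocked bullet"
    sent_prompt = fake.call_args.args[0]
    assert "RAG retrieves context." in sent_prompt
```

Asserting on the prompt that reached the mocked boundary checks the internal logic without a network call.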
SUB-PHASE DEVELOPMENT
Workflow: Plan → Write Test → Implement → Make Test Pass → Commit
Sub-Phase Naming
Use decimal notation: Phase X.Y where X = major phase, Y = sub-phase number.
| Example | Scope |
|---|---|
| Phase 1.1 | Project setup, config, database |
| Phase 1.2 | Ingestion pipeline |
| Phase 1.3 | Query pipeline (3-step LLM workflow) |
| Phase 1.4 | Testing & polish |
| Phase 2.1 | Video upload backend |
| Phase 2.2 | ASR integration |
Test-First Rule (MANDATORY)
Every sub-phase follows test-driven delivery:
- Write test first — Before writing implementation code, write the test that defines "done"
- Implement — Write the minimum code to make the test pass
- Run test — Verify test passes (both integration and acceptance where applicable)
- Commit — Only commit after tests pass. Never commit broken tests.
- Next sub-phase — Only start next sub-phase after current is committed
Enforcement:
- Each Implementation Task in a sub-phase plan must list its test file(s)
- Tests must be in the `backend/app/test/` or `frontend/src/test/` directory
- Pre-commit: `pytest` must pass for backend, `pnpm test` for frontend
Sub-Phase Plan Template
Each sub-phase plan (stored in .plans/) must include:
- Objective: What this sub-phase delivers
- Test Files: List of test files to write BEFORE implementation
- Acceptance Criteria: List of behaviors that must work
- Acceptance Tests: `test_acceptance_<subphase>.py` file(s) with real environment
- Implementation Tasks: Atomic steps, each referencing its test file
Acceptance Testing Rules
Integration tests (test_phase*.py) — TestClient + real DB, only external APIs mocked, CI-safe
Acceptance tests (test_acceptance_*.py) — real environment, actual LLM/ASR calls
Acceptance test requirements:
- Run against real services (ChromaDB instance, actual LLM API, ASR if applicable)
- Name format: `test_acceptance_<subphase>_<feature>.py`
- Location: `backend/app/test/acceptance/`
- Use `pytest` markers: `@pytest.mark.acceptance` and `@pytest.mark.slow`
- Each acceptance test file must have a docstring describing real environment setup
- Acceptance tests run manually before sub-phase completion, not in CI
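
Custom markers such as `acceptance` and `slow` are normally registered so that `-m acceptance` selects them without unknown-marker warnings. One way to do that is the `pytest_configure` hook in a `conftest.py`; placing it in the backend test directory and the marker descriptions below are assumptions.

```python
def pytest_configure(config) -> None:
    """Register the custom markers used by acceptance tests."""
    config.addinivalue_line(
        "markers", "acceptance: runs against real LLM/ASR/ChromaDB services"
    )
    config.addinivalue_line(
        "markers", "slow: long-running test, excluded from quick runs"
    )
```

The same registration can also live in the `markers` option of a pytest configuration file.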
Example acceptance test:

    """Acceptance test: Phase 1 RAG query with real Qwen LLM.

    Prerequisites:
    - ChromaDB running (local or docker)
    - .env configured with valid LLM_BASE_URL and LLM_API_KEY
    - Test documents ingested via /api/v1/ingest
    """
    import pytest


    @pytest.mark.acceptance
    @pytest.mark.slow
    def test_query_with_real_llm():
        """Query should return bullet-point answer from actual LLM."""
        # Real HTTP call to LLM provider
        # Real ChromaDB retrieval
        pass

Sub-phase completion checklist:
- All integration tests written BEFORE implementation
- All integration tests pass (`pytest app/test/test_phase*.py -v`)
- All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
- Code reviewed (self or peer)
- Sub-phase plan marked complete in `.plans/`
- Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")
COMMANDS
# Dev
backend: uvicorn app.main:app --reload --port 8000
frontend: pnpm run dev
# Integration tests (TestClient, real DB, only external APIs mocked)
backend: cd backend && pytest app/test/test_phase*.py -v
# Acceptance tests (real LLM/ASR/ChromaDB)
backend: cd backend && pytest app/test/acceptance/ -v -m acceptance
# Prod
docker-compose up -d
./deploy.sh
PLAN STORAGE
All development plans (including sub-plans, debug plans, and task breakdowns) must be stored in .plans/.
.plans/
├── development_plan.md # Main development plan (root-level)
├── phase1_backend_plan.md # Phase 1 backend tasks
├── phase1_frontend_plan.md # Phase 1 frontend tasks
├── phase2_backend_plan.md # Phase 2 backend tasks
├── phase2_frontend_plan.md # Phase 2 frontend tasks
├── debug_<date>_<issue>.md # Debug/diagnosis logs
└── _template.md # Plan template (optional)
Rules:
- Name format: `<purpose>_<optional_date>.md` (snake_case)
- Use `debug_` prefix for troubleshooting logs
- Root `development_plan.md` stays at root as canonical source
- Sub-plans reference root plan, never duplicate it
NOTES
- No routing library specified; a single-page app is likely sufficient
- No client state library specified; `useState`/`useReducer` + TanStack Query
- WebSocket client not specified; may need to expand `lib/api.ts`
- shadcn/ui components are copied, not imported as npm package
- Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727