RAG Video Q&A — Project Knowledge Base

Generated: 2026-04-22
Updated: 2026-05-15 (Phase 4 added)
Source: development_plan.md
Status: Phase 1 ✅, Phase 2 ✅, Phase 4 ✅

OVERVIEW

RAG-powered Video Q&A web app. Phase 1: text → ChromaDB retrieval → bullet-point answer. Phase 2: video upload → real-time ASR → auto/manual RAG query. Phase 4: System Audio Capture + Listen Mic → real-time ASR → RAG. FastAPI backend + React 18 (Vite) frontend.

STRUCTURE

app/
├── backend/           # FastAPI (Python)
│   ├── app/
│   │   ├── main.py
│   │   ├── routers/      # query.py, ingest.py, video.py, ws_asr.py
│   │   ├── services/     # rag.py, llm_client.py, asr_client.py, video_service.py
│   │   ├── models/       # Pydantic schemas
│   │   ├── core/         # config.py, database.py
│   │   └── utils/        # chunking.py, metadata_extraction.py
│   ├── uploads/          # video storage (max 300MB)
│   ├── requirements.txt
│   └── .env.example
├── frontend/          # React 18 + TS + Vite
│   ├── src/
│   │   ├── components/   # shadcn/ui + custom (SourceSelector, SystemAudioCapture, MicCapture, etc.)
│   │   ├── hooks/        # useVideoASR, useMediaStreamASR, useSystemAudioASR, useMicASR, etc.
│   │   ├── pages/
│   │   ├── lib/
│   │   │   ├── api.ts    # API client (TanStack Query)
│   │   │   └── browser.ts # browser detection (isSystemAudioSupported)
│   │   └── App.tsx
│   ├── package.json
│   └── vite.config.ts
├── chroma_db/         # Persistent vector store
├── Dockerfile
├── docker-compose.yml
├── nginx.conf
└── deploy.sh

WHERE TO LOOK

Task	Location	Notes
API routes	`backend/app/routers/`	Versioned `/api/v1/...`
Business logic	`backend/app/services/`	RAG, LLM, ASR, video
Schemas	`backend/app/models/`	Pydantic request/response
Config	`backend/app/core/config.py`	`.env` driven (incl. `SYSTEM_AUDIO_ENABLED`, `MIC_ENABLED`)
DB init	`backend/app/core/database.py`	ChromaDB persistent
Frontend API	`frontend/src/lib/api.ts`	TanStack Query
UI components	`frontend/src/components/`	shadcn/ui + Tailwind (SourceSelector, SystemAudioCapture, MicCapture)
ASR hooks	`frontend/src/hooks/`	useVideoASR, useMediaStreamASR, useSystemAudioASR, useMicASR
Browser detection	`frontend/src/lib/browser.ts`	isSystemAudioSupported()
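
The Config row says settings are `.env` driven and include `SYSTEM_AUDIO_ENABLED` and `MIC_ENABLED`. A minimal sketch of how such toggles might be read, assuming they are plain boolean strings. The `FeatureFlags` class, the `parse_bool` helper, and the default of "enabled" are hypothetical; the real `config.py` may use a settings library instead.

```python
from __future__ import annotations

import os
from dataclasses import dataclass

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def parse_bool(raw: str | None, default: bool) -> bool:
    """Parse a boolean-like env string; use default when unset or blank."""
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")


@dataclass(frozen=True)
class FeatureFlags:
    """Hypothetical container for the Phase 4 capture toggles."""

    system_audio_enabled: bool
    mic_enabled: bool

    @classmethod
    def from_env(cls, env=None) -> "FeatureFlags":
        # env is injectable so tests do not have to touch os.environ.
        env = os.environ if env is None else env
        return cls(
            system_audio_enabled=parse_bool(env.get("SYSTEM_AUDIO_ENABLED"), True),
            mic_enabled=parse_bool(env.get("MIC_ENABLED"), True),
        )
```

Passing a plain dict to `from_env` keeps a toggle test such as `test_phase4_config.py` independent of the process environment.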

CODE MAP

Backend: FastAPI app with routers (query, ingest, video, ws_asr, prompts, history), services (rag, llm_client, asr_client, video_service, query_decomposer, relevance_filter, prompt_service, history_service), Pydantic models
Frontend: React 18 + TypeScript + Vite with react-resizable-panels layout, TanStack Query, SSE streaming via queryDocumentStream(), shadcn/ui + Tailwind components, SourceSelector tabs (Upload | System Audio | Listen Mic)
Pipeline: 3 LLM calls (decompose, filter, generate) with a ChromaDB retrieval step between decompose and filter; results organized per sub-question
Audio Capture (Phase 4): System Audio (getDisplayMedia) and Listen Mic (getUserMedia) pipe audio via shared useMediaStreamASR → WebSocket → DashScope realtime ASR

CONVENTIONS

Backend: snake_case files; routers thin, services thick; .env for all LLM/ASR config
Frontend: PascalCase components; lib/api.ts single API client; TanStack Query for server state
API: Path versioning /api/v1/; WebSocket at /ws/asr/{video_id}
RAG: Strict prompt — answer ONLY from retrieved context; bullet-point format
Metadata: Every doc chunk must have filename, upload_date, content_summary
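
The metadata convention above can be checked before a chunk is written to the vector store. This is a sketch only: the three key names come from the convention, while the helper names and the choice to treat blank values as missing are assumptions.

```python
REQUIRED_METADATA_KEYS = ("filename", "upload_date", "content_summary")


def missing_metadata(metadata: dict) -> list:
    """Return the required keys that are absent or blank, in declaration order."""
    return [
        key
        for key in REQUIRED_METADATA_KEYS
        if not str(metadata.get(key) or "").strip()
    ]


def validate_chunk_metadata(metadata: dict) -> None:
    """Raise ValueError when a chunk would break source attribution."""
    missing = missing_metadata(metadata)
    if missing:
        raise ValueError(f"chunk metadata missing: {', '.join(missing)}")
```

Calling `validate_chunk_metadata` in the ingestion service (not the router) would keep the check consistent with the "routers thin, services thick" rule.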

RAG Pipeline (3-Step LLM Workflow — Per-Sub-Question)

User Question
    ↓
[LLM Call 1] QueryDecomposer — extract 2-5 sub-questions
    ↓
[ChromaDB] Retrieve per sub-question — each sub-q independently queries ChromaDB
    ↓
[LLM Call 2] RelevanceFilter (single call) — chunks grouped by sub-q, each scored against its own sub-q
    ↓
[LLM Call 3] ResponseGeneration — markdown sections per sub-question with ## headers
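
The flow above can be sketched as one orchestration function with each stage injected as a callable. The function name, the signatures, and the use of plain strings for chunks are illustrative assumptions; they are not the actual `RAGService` API.

```python
from __future__ import annotations

from typing import Callable

ChunksBySubQ = dict  # maps sub-question -> list of chunk texts


def run_pipeline(
    question: str,
    decompose: Callable[[str], list],
    retrieve: Callable[[str], list],
    relevance_filter: Callable[[ChunksBySubQ], ChunksBySubQ],
    generate: Callable[[str, ChunksBySubQ], str],
) -> str:
    """Run decompose -> retrieve -> filter -> generate with exactly three LLM calls."""
    # LLM call 1: split the question into sub-questions.
    sub_questions = decompose(question)
    # Vector store: one independent query per sub-question (no LLM involved).
    chunks = {sub_q: retrieve(sub_q) for sub_q in sub_questions}
    # LLM call 2: a single call that scores every group against its own sub-question.
    kept = relevance_filter(chunks)
    # LLM call 3: one response with a section per sub-question.
    return generate(question, kept)
```

Injecting the stages makes it possible to swap in stubs for the LLM-backed steps while leaving the retrieval step real, which matches the testing rules in this document.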

Per-Sub-Question Organization:

Retrieval: RAGService.retrieve_per_subquestion() queries ChromaDB once per sub-question
Filtering: RelevanceFilter.filter_per_subquestion() single LLM call with sub-q grouping
Response: RAGService.generate_response_per_subquestion() produces markdown sections with grouped sources
SSE Events: decomposed → retrieving → filtering → generating → generating_subquestion (per sub-q) → completed
History: XML chunks wrapped in <sub_q> elements; sources stored as list-of-lists JSON
Empty decomposition fallback (Decision #13): if decomposer returns [], uses [original_question]
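
The event order and the empty-decomposition fallback listed above can be sketched as a generator of SSE frames. The event names come from the list; the payload fields and function names are hypothetical, and the real stream interleaves these events with actual work.

```python
import json
from typing import Iterator


def sub_questions_or_fallback(decomposed: list, original_question: str) -> list:
    """Decision #13: an empty decomposition falls back to the original question."""
    return decomposed if decomposed else [original_question]


def sse_frame(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame (event line, data line, blank line)."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def event_sequence(original_question: str, decomposed: list) -> Iterator[str]:
    """Yield frames in the documented order, one generating_subquestion per sub-q."""
    sub_qs = sub_questions_or_fallback(decomposed, original_question)
    yield sse_frame("decomposed", {"sub_questions": sub_qs})
    yield sse_frame("retrieving", {})
    yield sse_frame("filtering", {})
    yield sse_frame("generating", {})
    for index, sub_q in enumerate(sub_qs):
        yield sse_frame("generating_subquestion", {"index": index, "sub_question": sub_q})
    yield sse_frame("completed", {})
```

With the fallback applied first, every later stage can assume at least one sub-question.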

ANTI-PATTERNS (THIS PROJECT)

Hardcode LLM URLs/keys — always .env
Business logic in routers — belongs in services/
Non-persistent ChromaDB — must use chroma_db/ directory
LLM hallucination outside retrieved context — strict RAG prompt enforced
Plain text responses — always bullet points with source metadata
Missing document metadata — breaks source attribution
Add authentication — public demo only
Mobile-first design — desktop only at this stage
Log to console only — all backend logs must go to backend/app/log/ directory
Commit log files to git — log files must be .gitignored
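
For the logging rule above, a minimal sketch of file logging into `backend/app/log/` using only the standard library. The function name, rotation limits, and format string are assumptions; the directory still needs its own `.gitignore` entry.

```python
import logging
import os
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_file_logging(log_dir: str = "backend/app/log", name: str = "app") -> logging.Logger:
    """Attach a rotating file handler under log_dir; safe to call more than once."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    log_path = os.path.abspath(directory / f"{name}.log")

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Skip if this file is already attached (e.g. after a dev-server reload).
    for handler in logger.handlers:
        if isinstance(handler, RotatingFileHandler) and handler.baseFilename == log_path:
            return logger

    handler = RotatingFileHandler(log_path, maxBytes=5_000_000, backupCount=3, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

The duplicate-handler check matters in development, where `uvicorn --reload` can re-run startup code and would otherwise write each line several times.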

UNIQUE STYLES

Dual ASR trigger: automatic (on transcript update) + manual "Ask from Video" button
Layout: Top-Left video player / SystemAudioCapture / MicCapture | Top-Right transcript + input | Bottom RAG response
Provider switching: same codebase runs dev (OpenRouter/Alibaba Cloud) and prod (local vLLM)
Video limit: 300MB max, MP4 + common formats
Three audio sources (Phase 4): Upload (video element), System Audio (getDisplayMedia), Listen Mic (getUserMedia) — unified via shared useMediaStreamASR pipeline
Phase 4 ASR routing: WebSocket /ws/asr/{video_id}?source=system-audio|mic&language=yue — backend is source-agnostic
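
The Phase 4 routing above puts the capture source and language in the query string. A sketch of building and reading that URL with the standard library; the helper names are hypothetical, and whether the Upload source sends a `source` parameter at all is not stated here, so only the two documented values are accepted.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

ALLOWED_SOURCES = ("system-audio", "mic")


def build_asr_ws_url(base: str, video_id: str, source: str, language: str = "yue") -> str:
    """Build /ws/asr/{video_id}?source=...&language=... on top of a ws:// base URL."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unsupported source: {source!r}")
    query = urlencode({"source": source, "language": language})
    return f"{base.rstrip('/')}/ws/asr/{quote(video_id, safe='')}?{query}"


def parse_asr_query(url: str) -> dict:
    """Return the first value of each query parameter."""
    params = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in params.items()}
```

For example, `build_asr_ws_url("ws://localhost:8000", "abc123", "mic")` targets the dev backend port used in the COMMANDS section.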

TESTING

Backend test directory: backend/app/test/

Naming convention (pytest, flat structure, phase-prefixed):

test_phase<N>_<module_or_feature>.py

Examples:

test_phase1_ingest.py — Document upload & ChromaDB ingestion
test_phase1_query.py — RAG query endpoint
test_phase1_rag_service.py — RAG retrieval + strict prompt logic
test_phase1_llm_client.py — LLM client (mocked provider)
test_phase1_chunking.py — Document chunking utils
test_phase1_metadata.py — Metadata extraction
test_phase2_video_upload.py — Video upload (<300MB, format validation)
test_phase2_asr_client.py — ASR transcription client
test_phase2_ws_asr.py — WebSocket audio streaming
test_phase2_query_from_video.py — Auto/manual trigger from transcript
test_phase4_config.py — System audio & mic capture feature toggles
test_phase4_* (frontend) — useSystemAudioASR, useMicASR, SystemAudioCapture, MicCapture, LTTPage integration
test_integration_phase1.py — End-to-end text → RAG → answer
test_integration_phase2.py — End-to-end video → ASR → RAG → answer
test_integration_phase4.py — End-to-end WebSocket with system-audio/mic sources
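
The naming convention above could be enforced with a small check, for example in a pre-commit script. This is a sketch: the two patterns are inferred from the listed examples, the function name is hypothetical, and acceptance test names follow a separate convention that is not covered here.

```python
import re
from typing import Optional, Tuple

# test_phase<N>_<module_or_feature>.py
PHASE_TEST = re.compile(r"^test_phase(?P<phase>\d+)_(?P<feature>[a-z0-9_]+)\.py$")
# test_integration_phase<N>.py
INTEGRATION_TEST = re.compile(r"^test_integration_phase(?P<phase>\d+)\.py$")


def classify_test_file(filename: str) -> Optional[Tuple[str, int]]:
    """Return (kind, phase number) for a conforming name, else None."""
    match = PHASE_TEST.match(filename)
    if match:
        return ("phase", int(match["phase"]))
    match = INTEGRATION_TEST.match(filename)
    if match:
        return ("integration", int(match["phase"]))
    return None
```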

Testing Rules (Python Backend):

Prefer integration tests over unit tests with mocks for all backend features and API routes.
Use real application via TestClient (FastAPI). Never mock the database or internal services.
Use existing test database fixtures and conftest.py. Only mock truly external third-party APIs (LLM, ASR).
Match the exact style and imports of existing tests in the backend/app/test/ directory.
Always run pytest after writing tests and iterate until they pass against the real system.
Each test file must have a module-level docstring describing coverage.

SUB-PHASE DEVELOPMENT

Workflow: Plan → Write Test → Implement → Make Test Pass → Commit

Sub-Phase Naming

Use decimal notation: Phase X.Y where X = major phase, Y = sub-phase number.

Example	Scope
Phase 1.1	Project setup, config, database
Phase 1.2	Ingestion pipeline
Phase 1.3	Query pipeline (3-step LLM workflow)
Phase 1.4	Testing & polish
Phase 2.1	Video upload backend
Phase 2.2	ASR integration

Test-First Rule (MANDATORY)

Every sub-phase follows test-driven delivery:

Write test first — Before writing implementation code, write the test that defines "done"
Implement — Write the minimum code to make the test pass
Run test — Verify test passes (both integration and acceptance where applicable)
Commit — Only commit after tests pass. Never commit broken tests.
Next sub-phase — Only start next sub-phase after current is committed

Enforcement:

Each Implementation Task in a sub-phase plan must list its test file(s)
Tests must be in the backend/app/test/ or frontend/src/test/ directory
Pre-commit: pytest must pass for backend, pnpm test for frontend

Sub-Phase Plan Template

Each sub-phase plan (stored in .plans/) must include:

Objective — What this sub-phase delivers
Test Files — List of test files to write BEFORE implementation
Acceptance Criteria — List of behaviors that must work
Acceptance Tests — test_acceptance_<subphase>.py file(s) with real environment
Implementation Tasks — Atomic steps, each referencing its test file

Acceptance Testing Rules

Integration tests (test_phase*.py): TestClient + real DB, only external APIs mocked, CI-safe
Acceptance tests (test_acceptance_*.py): real environment, actual LLM/ASR calls

Acceptance test requirements:

Run against real services (ChromaDB instance, actual LLM API, ASR if applicable)
Name format: test_acceptance_<subphase>_<feature>.py
Location: backend/app/test/acceptance/
Use pytest markers: @pytest.mark.acceptance and @pytest.mark.slow
Each acceptance test file must have docstring describing real environment setup
Acceptance tests run manually before sub-phase completion, not in CI
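
The two markers required above need to be registered, otherwise pytest warns about unknown marks. One way to do that is a `pytest_configure` hook in `conftest.py`, as sketched below. The marker descriptions are illustrative, and the project may register them in its pytest config file instead.

```python
# conftest.py (sketch)

MARKERS = (
    "acceptance: runs against real services (ChromaDB, LLM API, ASR); run manually, not in CI",
    "slow: long-running test",
)


def pytest_configure(config):
    """pytest hook: register the custom markers used by acceptance tests."""
    for line in MARKERS:
        config.addinivalue_line("markers", line)
```

With the markers registered, `pytest -m acceptance` selects only acceptance tests, and `pytest -m "not acceptance"` excludes them.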

Example acceptance test:

"""Acceptance test: Phase 1 RAG query with real Qwen LLM.

Prerequisites:
- ChromaDB running (local or docker)
- .env configured with valid LLM_BASE_URL and LLM_API_KEY
- Test documents ingested via /api/v1/ingest
"""
import pytest

@pytest.mark.acceptance
@pytest.mark.slow
def test_query_with_real_llm():
    """Query should return bullet-point answer from actual LLM."""
    # Real HTTP call to LLM provider
    # Real ChromaDB retrieval
    pass

Sub-phase completion checklist:

All integration tests written BEFORE implementation
All integration tests pass (pytest app/test/test_phase*.py -v)
All acceptance tests pass (pytest app/test/acceptance/ -v -m acceptance)
Code reviewed (self or peer)
Sub-phase plan marked complete in .plans/
Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")

COMMANDS

# Dev
backend:  uvicorn app.main:app --reload --port 8000
frontend: pnpm run dev

# Integration tests (TestClient, real DB, only external APIs mocked)
backend:  cd backend && pytest app/test/test_phase*.py -v

# Acceptance tests (real LLM/ASR/ChromaDB)
backend:  cd backend && pytest app/test/acceptance/ -v -m acceptance

# Prod
docker-compose up -d
./deploy.sh

PLAN STORAGE

All development plans (including sub-plans, debug plans, and task breakdowns) must be stored in .plans/.

.plans/
├── development_plan.md          # Main development plan (root-level)
├── phase1_backend_plan.md       # Phase 1 backend tasks
├── phase1_frontend_plan.md      # Phase 1 frontend tasks
├── phase2_backend_plan.md       # Phase 2 backend tasks
├── phase2_frontend_plan.md      # Phase 2 frontend tasks
├── debug_<date>_<issue>.md      # Debug/diagnosis logs
└── _template.md                 # Plan template (optional)

Rules:

Name format: <purpose>_<optional_date>.md (snake_case)
Use debug_ prefix for troubleshooting logs
Root development_plan.md stays at root as canonical source
Sub-plans reference root plan, never duplicate it

NOTES

No routing library specified — single-page app likely sufficient
No client state library specified — useState/useReducer + TanStack Query
WebSocket client not specified — may need to expand lib/api.ts
shadcn/ui components are copied, not imported as npm package
Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
