# RAG Video Q&A — Project Knowledge Base

**Generated:** 2026-04-22
**Source:** development_plan.md
**Status:** Greenfield (no code yet)

---

## OVERVIEW
RAG-powered Video Q&A web app. Phase 1: text → ChromaDB retrieval → bullet-point answer. Phase 2: video upload → real-time ASR → auto/manual RAG query. FastAPI backend + React 18 (Vite) frontend.

## STRUCTURE
```
app/
├── backend/           # FastAPI (Python)
│   ├── app/
│   │   ├── main.py
│   │   ├── routers/      # query.py, ingest.py, video.py, ws_asr.py
│   │   ├── services/     # rag.py, llm_client.py, asr_client.py, video_service.py
│   │   ├── models/       # Pydantic schemas
│   │   ├── core/         # config.py, database.py
│   │   └── utils/        # chunking.py, metadata_extraction.py
│   ├── uploads/          # video storage (max 300MB)
│   ├── requirements.txt
│   └── .env.example
├── frontend/          # React 18 + TS + Vite
│   ├── src/
│   │   ├── components/   # shadcn/ui + custom
│   │   ├── pages/
│   │   ├── lib/
│   │   │   └── api.ts    # API client (TanStack Query)
│   │   └── App.tsx
│   ├── package.json
│   └── vite.config.ts
├── chroma_db/         # Persistent vector store
├── Dockerfile
├── docker-compose.yml
├── nginx.conf
└── deploy.sh
```

## WHERE TO LOOK
| Task | Location | Notes |
|------|----------|-------|
| API routes | `backend/app/routers/` | Versioned `/api/v1/...` |
| Business logic | `backend/app/services/` | RAG, LLM, ASR, video |
| Schemas | `backend/app/models/` | Pydantic request/response |
| Config | `backend/app/core/config.py` | `.env` driven |
| DB init | `backend/app/core/database.py` | ChromaDB persistent |
| Frontend API | `frontend/src/lib/api.ts` | TanStack Query |
| UI components | `frontend/src/components/` | shadcn/ui + Tailwind |
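
To illustrate the `.env`-driven config row in the table above, `core/config.py` could read its values from the process environment. This is only a sketch: `Settings` and `load_settings` are hypothetical names, `CHROMA_DIR` is an assumed variable, and only `LLM_BASE_URL` and `LLM_API_KEY` are named elsewhere in this document. Loading the `.env` file into the environment (for example with a dotenv loader) is left out.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Runtime settings; every LLM value comes from the environment."""
    llm_base_url: str
    llm_api_key: str
    chroma_dir: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [k for k in ("LLM_BASE_URL", "LLM_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        llm_base_url=env["LLM_BASE_URL"],
        llm_api_key=env["LLM_API_KEY"],
        # CHROMA_DIR is an assumed variable; the default matches chroma_db/.
        chroma_dir=env.get("CHROMA_DIR", "chroma_db"),
    )
```

Failing at startup when a key is missing keeps hardcoded fallbacks for URLs or keys out of the codebase.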

## CODE MAP
- **Backend**: FastAPI app with routers (query, ingest, video, ws_asr, prompts, history), services (rag, llm_client, asr_client, video_service, query_decomposer, relevance_filter, prompt_service, history_service), Pydantic models
- **Frontend**: React 18 + TypeScript + Vite with react-resizable-panels layout, TanStack Query, SSE streaming via `queryDocumentStream()`, shadcn/ui + Tailwind components
- **Pipeline**: three LLM calls (decompose, filter, generate) around one ChromaDB retrieval step, in the order decompose → retrieve → filter → generate, organized per sub-question

## CONVENTIONS
- **Backend**: `snake_case` files; routers thin, services thick; `.env` for all LLM/ASR config
- **Frontend**: PascalCase components; `lib/api.ts` single API client; TanStack Query for server state
- **API**: Path versioning `/api/v1/`; WebSocket at `/ws/asr/{video_id}`
- **RAG**: Strict prompt — answer ONLY from retrieved context; bullet-point format
- **Metadata**: Every doc chunk must have `filename`, `upload_date`, `content_summary`
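
The metadata convention above could be enforced at ingestion time with a small guard. The following is a minimal sketch, not the project's actual code: `build_chunk_metadata` and `validate_chunk_metadata` are hypothetical helpers, and storing `upload_date` as an ISO string is an assumption.

```python
from datetime import date
from typing import Any, Dict

REQUIRED_METADATA_KEYS = ("filename", "upload_date", "content_summary")


def validate_chunk_metadata(metadata: Dict[str, Any]) -> None:
    """Raise ValueError if a required field is missing or blank."""
    missing = [
        key for key in REQUIRED_METADATA_KEYS
        if not str(metadata.get(key) or "").strip()
    ]
    if missing:
        raise ValueError(f"Chunk metadata missing: {', '.join(missing)}")


def build_chunk_metadata(
    filename: str, upload_date: date, content_summary: str
) -> Dict[str, str]:
    """Return metadata for one chunk with all three required fields set."""
    metadata = {
        "filename": filename,
        # Stored as an ISO string (assumption) so the value is a plain scalar.
        "upload_date": upload_date.isoformat(),
        "content_summary": content_summary,
    }
    validate_chunk_metadata(metadata)
    return metadata
```

Rejecting a chunk before it reaches the vector store means source attribution never has to handle partial metadata at query time.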

### RAG Pipeline (3-Step LLM Workflow — Per-Sub-Question)

```
User Question
    ↓
[LLM Call 1] QueryDecomposer — extract 2-5 sub-questions
    ↓
[ChromaDB] Retrieve per sub-question — each sub-q independently queries ChromaDB
    ↓
[LLM Call 2] RelevanceFilter (single call) — chunks grouped by sub-q, each scored against its own sub-q
    ↓
[LLM Call 3] ResponseGeneration — markdown sections per sub-question with ## headers
```
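
The flow in the diagram can be sketched as a single orchestration function with the four stages injected as callables. This is an illustration under stated assumptions: `run_pipeline`, its parameter names, and the stub lambdas are hypothetical, and chunks are simplified to plain strings.

```python
from typing import Callable, Dict, List

Chunk = str  # Simplification: a retrieved chunk is just its text here.
Grouped = Dict[str, List[Chunk]]  # sub-question -> its chunks


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str], List[Chunk]],
    filter_chunks: Callable[[Grouped], Grouped],
    generate: Callable[[str, Grouped], str],
) -> str:
    """Run decompose -> retrieve -> filter -> generate for one question."""
    sub_questions = decompose(question)  # LLM call 1
    # One independent vector-store query per sub-question.
    retrieved = {sq: retrieve(sq) for sq in sub_questions}
    relevant = filter_chunks(retrieved)  # LLM call 2 (single call)
    return generate(question, relevant)  # LLM call 3


# Usage with stubs in place of the real LLM and ChromaDB calls.
answer = run_pipeline(
    "What is RAG and why use it?",
    decompose=lambda q: ["What is RAG?", "Why use RAG?"],
    retrieve=lambda sq: [f"chunk about {sq}"],
    filter_chunks=lambda grouped: grouped,
    generate=lambda q, grouped: "\n\n".join(
        f"## {sq}\n- {chunks[0]}" for sq, chunks in grouped.items()
    ),
)
```

Passing the stages in as callables also makes the orchestration easy to unit test with mocks, which fits the rule that tests never hit live APIs.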

**Per-Sub-Question Organization**:
- Retrieval: `RAGService.retrieve_per_subquestion()` queries ChromaDB once per sub-question
- Filtering: `RelevanceFilter.filter_per_subquestion()` single LLM call with sub-q grouping
- Response: `RAGService.generate_response_per_subquestion()` produces markdown sections with grouped sources
- SSE Events: `decomposed → retrieving → filtering → generating → generating_subquestion (per sub-q) → completed`
- History: XML chunks wrapped in `<sub_q>` elements; sources stored as list-of-lists JSON
- Empty decomposition fallback (Decision #13): if decomposer returns `[]`, uses `[original_question]`
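
The event order and the empty-decomposition fallback listed above can be sketched together. This is a hedged illustration only: the function names and the JSON payload shapes are assumptions, while the event names and their order come from the list above. The frame layout (`event:` line, `data:` line, blank line) is the standard Server-Sent Events wire format.

```python
import json
from typing import Iterator, List


def normalize_sub_questions(original_question: str, decomposed: List[str]) -> List[str]:
    """Apply the empty-decomposition fallback (Decision #13)."""
    return decomposed if decomposed else [original_question]


def format_sse(event: str, data: dict) -> str:
    """Encode one SSE frame: event name, JSON data, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def pipeline_events(original_question: str, decomposed: List[str]) -> Iterator[str]:
    """Yield SSE frames in the documented order for one query."""
    sub_questions = normalize_sub_questions(original_question, decomposed)
    yield format_sse("decomposed", {"sub_questions": sub_questions})
    yield format_sse("retrieving", {})
    yield format_sse("filtering", {})
    yield format_sse("generating", {})
    for index, sub_question in enumerate(sub_questions):
        # One event per sub-question while its section is generated.
        yield format_sse(
            "generating_subquestion",
            {"index": index, "sub_question": sub_question},
        )
    yield format_sse("completed", {})
```

With an empty decomposition, the stream still carries exactly one `generating_subquestion` event, for the original question.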

## ANTI-PATTERNS (THIS PROJECT)
- Hardcode LLM URLs/keys — always `.env`
- Business logic in routers — belongs in `services/`
- Non-persistent ChromaDB — must use `chroma_db/` directory
- LLM hallucination outside retrieved context — strict RAG prompt enforced
- Plain text responses — always bullet points with source metadata
- Missing document metadata — breaks source attribution
- Add authentication — public demo only
- Mobile-first design — desktop only at this stage
- Log to console only — all backend logs must go to `backend/app/log/` directory
- Commit log files to git — log files must be `.gitignore`d
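
For the logging rule above, a file handler pointed at the log directory is one way to comply. The sketch below uses only the standard library; the logger name, the `backend.log` file name, and the rotation limits are assumptions, not project decisions.

```python
import logging
import os
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_logging(log_dir: str = "backend/app/log", level: int = logging.INFO) -> logging.Logger:
    """Send backend logs to a rotating file under log_dir."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    log_path = directory / "backend.log"  # hypothetical file name

    logger = logging.getLogger("rag_video_qa")  # hypothetical logger name
    logger.setLevel(level)

    # Avoid stacking duplicate handlers when called more than once.
    already_attached = any(
        isinstance(h, RotatingFileHandler)
        and h.baseFilename == os.path.abspath(log_path)
        for h in logger.handlers
    )
    if not already_attached:
        handler = RotatingFileHandler(
            log_path, maxBytes=5_000_000, backupCount=3, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging()` once at application startup would be enough; a console handler can still be added alongside the file handler during development.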

## UNIQUE STYLES
- **Dual ASR trigger**: automatic (on transcript update) + manual "Ask from Video" button
- **Layout**: Top-Left video player | Top-Right transcript + input | Bottom RAG response
- **Provider switching**: same codebase runs dev (OpenRouter/Alibaba Cloud) and prod (local vLLM)
- **Video limit**: 300MB max, MP4 + common formats
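
The video limit above could be checked before a file is written to `uploads/`. This is a minimal sketch with labeled assumptions: `validate_video_upload` is a hypothetical helper, 300MB is read as 300 MiB, and every extension other than `.mp4` is a guess at what "common formats" covers.

```python
from pathlib import Path

# 300MB limit, interpreted here as 300 MiB (assumption).
MAX_VIDEO_BYTES = 300 * 1024 * 1024
# Only MP4 is named in this document; the other extensions are assumptions.
ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm", ".mkv"}


def validate_video_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the upload breaks the format or size rule."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported video format: {extension or '(none)'}")
    if size_bytes <= 0:
        raise ValueError("Empty upload")
    if size_bytes > MAX_VIDEO_BYTES:
        raise ValueError(f"Video exceeds {MAX_VIDEO_BYTES} bytes")
```

An extension check is only a first filter; inspecting the container format of the uploaded bytes would be a stricter follow-up.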

## TESTING

**Backend test directory**: `backend/app/test/`

**Naming convention** (pytest, flat structure, phase-prefixed):
```
test_phase<N>_<module_or_feature>.py
```

**Examples**:
- `test_phase1_ingest.py` — Document upload & ChromaDB ingestion
- `test_phase1_query.py` — RAG query endpoint
- `test_phase1_rag_service.py` — RAG retrieval + strict prompt logic
- `test_phase1_llm_client.py` — LLM client (mocked provider)
- `test_phase1_chunking.py` — Document chunking utils
- `test_phase1_metadata.py` — Metadata extraction
- `test_phase2_video_upload.py` — Video upload (<300MB, format validation)
- `test_phase2_asr_client.py` — ASR transcription client
- `test_phase2_ws_asr.py` — WebSocket audio streaming
- `test_phase2_query_from_video.py` — Auto/manual trigger from transcript
- `test_integration_phase1.py` — End-to-end text → RAG → answer
- `test_integration_phase2.py` — End-to-end video → ASR → RAG → answer

**Rules**:
- Use `pytest` + `pytest-asyncio` for async tests
- Mock all external LLM/ASR calls (do not hit live APIs in tests)
- Use `tmp_path` fixture for ChromaDB test instances
- Each test file must have a module-level docstring describing coverage
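
A unit test that follows these rules might look like the sketch below. It is illustrative only: `answer_from_context` and the `complete` method are hypothetical stand-ins for the real service and LLM client, and the helper is defined inline just to keep the example self-contained. The bare `async def` test assumes pytest-asyncio runs in `asyncio_mode = auto`; otherwise the test needs `@pytest.mark.asyncio`.

```python
"""Unit tests for a hypothetical strict-RAG answer helper (LLM mocked).

Coverage: the prompt carries the retrieved context; no live API is called.
"""
from typing import List
from unittest.mock import AsyncMock


async def answer_from_context(llm, question: str, chunks: List[str]) -> str:
    """Hypothetical helper: ask the LLM to answer only from the chunks."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    prompt = f"Answer ONLY from this context:\n{context}\n\nQuestion: {question}"
    return await llm.complete(prompt)


async def test_answer_uses_mocked_llm():
    # AsyncMock stands in for the real provider, so no network call happens.
    llm = AsyncMock()
    llm.complete.return_value = "- ChromaDB stores the vectors"

    answer = await answer_from_context(
        llm, "Where are vectors stored?", ["ChromaDB stores the vectors"]
    )

    assert answer.startswith("- ")
    llm.complete.assert_awaited_once()
    sent_prompt = llm.complete.await_args.args[0]
    assert "ChromaDB stores the vectors" in sent_prompt
```

Asserting on the prompt that was sent, not only on the returned text, checks the strict-context rule without depending on any real model output.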

## SUB-PHASE DEVELOPMENT

**Workflow**: Plan → Write Test → Implement → Make Test Pass → Commit

### Sub-Phase Naming

Use decimal notation: **Phase X.Y** where X = major phase, Y = sub-phase number.

| Example | Scope |
|---------|-------|
| Phase 1.1 | Project setup, config, database |
| Phase 1.2 | Ingestion pipeline |
| Phase 1.3 | Query pipeline (3-step LLM workflow) |
| Phase 1.4 | Testing & polish |
| Phase 2.1 | Video upload backend |
| Phase 2.2 | ASR integration |

### Test-First Rule (MANDATORY)

Every sub-phase follows **test-driven delivery**:

1. **Write test first** — Before writing implementation code, write the test that defines "done"
2. **Implement** — Write the minimum code to make the test pass
3. **Run test** — Verify test passes (both unit and acceptance where applicable)
4. **Commit** — Only commit after tests pass. Never commit broken tests.
5. **Next sub-phase** — Only start next sub-phase after current is committed

**Enforcement**:
- Each Implementation Task in a sub-phase plan must list its test file(s)
- Tests must be in the `backend/app/test/` or `frontend/src/test/` directory
- Pre-commit: `pytest` must pass for backend, `npm test` for frontend

### Sub-Phase Plan Template

Each sub-phase plan (stored in `.plans/`) must include:
1. **Objective** — What this sub-phase delivers
2. **Test Files** — List of test files to write BEFORE implementation
3. **Acceptance Criteria** — List of behaviors that must work
4. **Acceptance Tests** — `test_acceptance_<subphase>_<feature>.py` file(s) run against the real environment
5. **Implementation Tasks** — Atomic steps, each referencing its test file

### Acceptance Testing Rules

- **Unit tests** (`test_phase*.py`): mocked, fast, CI-safe
- **Acceptance tests** (`test_acceptance_*.py`): real environment, actual LLM/ASR calls

**Acceptance test requirements**:
- Run against real services (ChromaDB instance, actual LLM API, ASR if applicable)
- Name format: `test_acceptance_<subphase>_<feature>.py`
- Location: `backend/app/test/acceptance/`
- Use `pytest` markers: `@pytest.mark.acceptance` and `@pytest.mark.slow`
- Each acceptance test file must have a docstring describing the real environment setup
- Acceptance tests run manually before sub-phase completion, not in CI
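
Custom markers such as `acceptance` and `slow` trigger an unknown-marker warning in pytest unless they are registered. One possible way to register them, assuming a `conftest.py` in `backend/app/test/` (the file location and the marker descriptions are assumptions), is:

```python
def pytest_configure(config):
    """Register the custom markers used by acceptance tests."""
    config.addinivalue_line(
        "markers", "acceptance: runs against real services (LLM, ASR, ChromaDB)"
    )
    config.addinivalue_line(
        "markers", "slow: takes noticeably longer than unit tests"
    )
```

With the markers registered, `pytest -m "not acceptance"` is one way to keep these tests out of a CI run.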

**Example acceptance test**:
```python
"""Acceptance test: Phase 1 RAG query with real Qwen LLM.

Prerequisites:
- ChromaDB running (local or docker)
- .env configured with valid LLM_BASE_URL and LLM_API_KEY
- Test documents ingested via /api/v1/ingest
"""
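import os

import httpx
import pytest

# Assumption: the backend runs locally; override with API_BASE_URL.
BASE_URL = os.getenv("API_BASE_URL", "http://localhost:8000")


@pytest.mark.acceptance
@pytest.mark.slow
def test_query_with_real_llm():
    """Query should return bullet-point answer from actual LLM."""
    # Real HTTP call to the backend (real ChromaDB retrieval + LLM provider).
    # Assumed: the `/api/v1/query` path and `question`/`answer` field names.
    response = httpx.post(
        f"{BASE_URL}/api/v1/query",
        json={"question": "What do the ingested documents cover?"},
        timeout=120.0,
    )
    assert response.status_code == 200
    answer = response.json()["answer"]
    # Bullet-point format: at least one line starts with a list marker.
    assert any(ln.lstrip().startswith(("-", "*")) for ln in answer.splitlines())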
```

**Sub-phase completion checklist**:
- [ ] All unit tests written BEFORE implementation
- [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
- [ ] Code reviewed (self or peer)
- [ ] Sub-phase plan marked complete in `.plans/`
- [ ] Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")

## COMMANDS
```bash
# Dev (run each in its own terminal, from the app/ root)
cd backend && uvicorn app.main:app --reload --port 8000
cd frontend && npm run dev

# Unit tests (mocked, CI-safe)
cd backend && pytest app/test/test_phase*.py -v

# Acceptance tests (real LLM/ASR/ChromaDB)
cd backend && pytest app/test/acceptance/ -v -m acceptance

# Prod
docker-compose up -d
./deploy.sh
```

## PLAN STORAGE

**All development plans** (including sub-plans, debug plans, and task breakdowns) **must be stored in `.plans/`**.

```
.plans/
├── development_plan.md          # Main development plan (root-level)
├── phase1_backend_plan.md       # Phase 1 backend tasks
├── phase1_frontend_plan.md      # Phase 1 frontend tasks
├── phase2_backend_plan.md       # Phase 2 backend tasks
├── phase2_frontend_plan.md      # Phase 2 frontend tasks
├── debug_<date>_<issue>.md      # Debug/diagnosis logs
└── _template.md                 # Plan template (optional)
```

**Rules**:
- Name format: `<purpose>_<optional_date>.md` (snake_case)
- Use `debug_` prefix for troubleshooting logs
- `development_plan.md` is the root plan and stays the canonical source
- Sub-plans reference root plan, never duplicate it

## NOTES
- No routing library specified — single-page app likely sufficient
- No client state library specified — `useState`/`useReducer` + TanStack Query
- WebSocket client not specified — may need to expand `lib/api.ts`
- shadcn/ui components are copied, not imported as npm package
- Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727