legco_ai_assistant/AGENTS.md

RAG Video Q&A — Project Knowledge Base

Generated: 2026-04-22 | Updated: 2026-05-15 (Phase 4 added) | Source: development_plan.md | Status: Phase 1, Phase 2, Phase 4


OVERVIEW

RAG-powered Video Q&A web app. Phase 1: text → ChromaDB retrieval → bullet-point answer. Phase 2: video upload → real-time ASR → auto/manual RAG query. Phase 4: System Audio Capture + Listen Mic → real-time ASR → RAG. FastAPI backend + React 18 (Vite) frontend.

STRUCTURE

app/
├── backend/           # FastAPI (Python)
│   ├── app/
│   │   ├── main.py
│   │   ├── routers/      # query.py, ingest.py, video.py, ws_asr.py
│   │   ├── services/     # rag.py, llm_client.py, asr_client.py, video_service.py
│   │   ├── models/       # Pydantic schemas
│   │   ├── core/         # config.py, database.py
│   │   └── utils/        # chunking.py, metadata_extraction.py
│   ├── uploads/          # video storage (max 300MB)
│   ├── requirements.txt
│   └── .env.example
├── frontend/          # React 18 + TS + Vite
│   ├── src/
│   │   ├── components/   # shadcn/ui + custom (SourceSelector, SystemAudioCapture, MicCapture, etc.)
│   │   ├── hooks/        # useVideoASR, useMediaStreamASR, useSystemAudioASR, useMicASR, etc.
│   │   ├── pages/
│   │   ├── lib/
│   │   │   ├── api.ts    # API client (TanStack Query)
│   │   │   └── browser.ts # browser detection (isSystemAudioSupported)
│   │   └── App.tsx
│   ├── package.json
│   └── vite.config.ts
├── chroma_db/         # Persistent vector store
├── Dockerfile
├── docker-compose.yml
├── nginx.conf
└── deploy.sh

WHERE TO LOOK

| Task | Location | Notes |
|---|---|---|
| API routes | backend/app/routers/ | Versioned /api/v1/... |
| Business logic | backend/app/services/ | RAG, LLM, ASR, video |
| Schemas | backend/app/models/ | Pydantic request/response |
| Config | backend/app/core/config.py | .env driven (incl. SYSTEM_AUDIO_ENABLED, MIC_ENABLED) |
| DB init | backend/app/core/database.py | ChromaDB persistent |
| Frontend API | frontend/src/lib/api.ts | TanStack Query |
| UI components | frontend/src/components/ | shadcn/ui + Tailwind (SourceSelector, SystemAudioCapture, MicCapture) |
| ASR hooks | frontend/src/hooks/ | useVideoASR, useMediaStreamASR, useSystemAudioASR, useMicASR |
| Browser detection | frontend/src/lib/browser.ts | isSystemAudioSupported() |
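
The Config row says the Phase 4 toggles are driven by .env. A minimal sketch of reading them follows. The helper names, the accepted truthy strings, and the default of False are assumptions for illustration, not the contents of config.py.

```python
import os

# Strings treated as "enabled" (assumed convention, not taken from config.py).
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ=None) -> bool:
    """Read a boolean feature toggle from an environment mapping."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUE_VALUES


def load_capture_flags(environ=None) -> dict[str, bool]:
    """Collect the two Phase 4 capture toggles named in the table above."""
    return {
        "system_audio_enabled": env_flag("SYSTEM_AUDIO_ENABLED", environ=environ),
        "mic_enabled": env_flag("MIC_ENABLED", environ=environ),
    }
```

Passing an explicit mapping instead of os.environ keeps the helper easy to exercise in test_phase4_config.py without touching the real environment.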

CODE MAP

  • Backend: FastAPI app with routers (query, ingest, video, ws_asr, prompts, history), services (rag, llm_client, asr_client, video_service, query_decomposer, relevance_filter, prompt_service, history_service), Pydantic models
  • Frontend: React 18 + TypeScript + Vite with react-resizable-panels layout, TanStack Query, SSE streaming via queryDocumentStream(), shadcn/ui + Tailwind components, SourceSelector tabs (Upload | System Audio | Listen Mic)
  • Pipeline: 3-step LLM workflow (decompose → filter → generate) with a ChromaDB retrieval step between decompose and filter; results are organized per sub-question
  • Audio Capture (Phase 4): System Audio (getDisplayMedia) and Listen Mic (getUserMedia) pipe audio via shared useMediaStreamASR → WebSocket → DashScope realtime ASR

CONVENTIONS

  • Backend: snake_case files; routers thin, services thick; .env for all LLM/ASR config
  • Frontend: PascalCase components; lib/api.ts single API client; TanStack Query for server state
  • API: Path versioning /api/v1/; WebSocket at /ws/asr/{video_id}
  • RAG: Strict prompt — answer ONLY from retrieved context; bullet-point format
  • Metadata: Every doc chunk must have filename, upload_date, content_summary
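
The metadata convention above can be checked before a chunk is written to the vector store. This is a sketch only: the function names are hypothetical, and treating blank strings as missing is an assumption.

```python
# Keys required on every document chunk, per the Metadata convention.
REQUIRED_METADATA_KEYS = ("filename", "upload_date", "content_summary")


def missing_metadata_keys(metadata: dict) -> list[str]:
    """Return the required keys that are absent or blank."""
    missing = []
    for key in REQUIRED_METADATA_KEYS:
        value = metadata.get(key)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(key)
    return missing


def validate_chunk_metadata(metadata: dict) -> None:
    """Raise ValueError when a chunk would break source attribution."""
    missing = missing_metadata_keys(metadata)
    if missing:
        raise ValueError(f"chunk metadata missing: {', '.join(missing)}")
```

Failing at ingestion time is usually cheaper than discovering an unattributable source in a generated answer.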

RAG Pipeline (3-Step LLM Workflow — Per-Sub-Question)

User Question
    ↓
[LLM Call 1] QueryDecomposer — extract 2-5 sub-questions
    ↓
[ChromaDB] Retrieve per sub-question — each sub-q independently queries ChromaDB
    ↓
[LLM Call 2] RelevanceFilter (single call) — chunks grouped by sub-q, each scored against its own sub-q
    ↓
[LLM Call 3] ResponseGeneration — markdown sections per sub-question with ## headers

Per-Sub-Question Organization:

  • Retrieval: RAGService.retrieve_per_subquestion() queries ChromaDB once per sub-question
  • Filtering: RelevanceFilter.filter_per_subquestion() single LLM call with sub-q grouping
  • Response: RAGService.generate_response_per_subquestion() produces markdown sections with grouped sources
  • SSE Events: decomposed → retrieving → filtering → generating → generating_subquestion (per sub-q) → completed
  • History: XML chunks wrapped in <sub_q> elements; sources stored as list-of-lists JSON
  • Empty decomposition fallback (Decision #13): if decomposer returns [], uses [original_question]
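
The event order and the empty-decomposition fallback above can be sketched as a generator. The four callables stand in for QueryDecomposer, ChromaDB retrieval, RelevanceFilter, and response generation; their signatures are assumptions. As a simplification, `generate` is called once per sub-question here, whereas the real service may build all sections in a single LLM call.

```python
from typing import Callable, Iterator


def run_pipeline(
    question: str,
    decompose: Callable[[str], list],
    retrieve: Callable[[str], list],
    filter_chunks: Callable[[dict], dict],
    generate: Callable[[str, list], str],
) -> Iterator[tuple[str, object]]:
    """Yield (event_name, payload) pairs in the documented SSE order."""
    # Decision #13: fall back to the original question on empty decomposition.
    sub_questions = decompose(question) or [question]
    yield ("decomposed", sub_questions)

    yield ("retrieving", None)
    # One independent retrieval per sub-question.
    retrieved = {sq: retrieve(sq) for sq in sub_questions}

    yield ("filtering", None)
    # Single filter call that receives chunks grouped by sub-question.
    kept = filter_chunks(retrieved)

    yield ("generating", None)
    sections = []
    for sq in sub_questions:
        yield ("generating_subquestion", sq)
        sections.append(f"## {sq}\n{generate(sq, kept.get(sq, []))}")

    yield ("completed", "\n\n".join(sections))
```

Injecting the stages as callables lets an integration test swap in a stubbed LLM while keeping the orchestration logic real.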

ANTI-PATTERNS (THIS PROJECT)

  • Hardcode LLM URLs/keys — always .env
  • Business logic in routers — belongs in services/
  • Non-persistent ChromaDB — must use chroma_db/ directory
  • LLM hallucination outside retrieved context — strict RAG prompt enforced
  • Plain text responses — always bullet points with source metadata
  • Missing document metadata — breaks source attribution
  • Add authentication — public demo only
  • Mobile-first design — desktop only at this stage
  • Log to console only — all backend logs must go to backend/app/log/ directory
  • Commit log files to git — log files must be .gitignored
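
For the logging rule above, one way to route backend logs to a directory uses only the standard library. The function name, the log file name, and the rotation limits are assumptions; only the backend/app/log/ location comes from this document.

```python
import logging
import os
from logging.handlers import RotatingFileHandler
from pathlib import Path


def configure_file_logging(log_dir: Path, name: str = "backend") -> logging.Logger:
    """Attach a rotating file handler that writes to log_dir/<name>.log."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = os.path.abspath(log_dir / f"{name}.log")

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Skip if this file is already attached (e.g. after a uvicorn --reload).
    already_attached = any(
        isinstance(h, RotatingFileHandler) and h.baseFilename == log_path
        for h in logger.handlers
    )
    if not already_attached:
        handler = RotatingFileHandler(
            log_path, maxBytes=5_000_000, backupCount=3, encoding="utf-8"
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A call such as `configure_file_logging(Path("backend/app/log"))` at startup would satisfy the rule, provided the directory is listed in .gitignore.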

UNIQUE STYLES

  • Dual ASR trigger: automatic (on transcript update) + manual "Ask from Video" button
  • Layout: Top-Left video player / SystemAudioCapture / MicCapture | Top-Right transcript + input | Bottom RAG response
  • Provider switching: same codebase runs dev (OpenRouter/Alibaba Cloud) and prod (local vLLM)
  • Video limit: 300MB max, MP4 + common formats
  • Three audio sources (Phase 4): Upload (video element), System Audio (getDisplayMedia), Listen Mic (getUserMedia) — unified via shared useMediaStreamASR pipeline
  • Phase 4 ASR routing: WebSocket /ws/asr/{video_id}?source=system-audio|mic&language=yue — backend is source-agnostic
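
The Phase 4 routing format above can be built and parsed with the standard library. Because the backend is described as source-agnostic, this sketch treats `source` as an opaque label and does not validate it. The function names and the choice to return None for absent parameters are assumptions.

```python
from typing import Optional
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit


def build_asr_path(
    video_id: str, source: Optional[str] = None, language: Optional[str] = None
) -> str:
    """Build /ws/asr/{video_id} with optional source and language parameters."""
    params = {k: v for k, v in (("source", source), ("language", language)) if v}
    path = f"/ws/asr/{quote(video_id, safe='')}"
    return f"{path}?{urlencode(params)}" if params else path


def parse_asr_path(path_with_query: str) -> dict:
    """Split an ASR WebSocket path into video_id, source, and language."""
    parts = urlsplit(path_with_query)
    segments = parts.path.strip("/").split("/")
    if len(segments) != 3 or segments[:2] != ["ws", "asr"]:
        raise ValueError(f"not an ASR WebSocket path: {parts.path!r}")
    query = parse_qs(parts.query)
    return {
        "video_id": unquote(segments[2]),
        # Absent parameters come back as None rather than a guessed default.
        "source": query.get("source", [None])[0],
        "language": query.get("language", [None])[0],
    }
```

For example, `build_asr_path("abc123", "system-audio", "yue")` produces a path in the same shape as the routing bullet above.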

TESTING

Backend test directory: backend/app/test/

Naming convention (pytest, flat structure, phase-prefixed):

test_phase<N>_<module_or_feature>.py

Examples:

  • test_phase1_ingest.py — Document upload & ChromaDB ingestion
  • test_phase1_query.py — RAG query endpoint
  • test_phase1_rag_service.py — RAG retrieval + strict prompt logic
  • test_phase1_llm_client.py — LLM client (mocked provider)
  • test_phase1_chunking.py — Document chunking utils
  • test_phase1_metadata.py — Metadata extraction
  • test_phase2_video_upload.py — Video upload (<300MB, format validation)
  • test_phase2_asr_client.py — ASR transcription client
  • test_phase2_ws_asr.py — WebSocket audio streaming
  • test_phase2_query_from_video.py — Auto/manual trigger from transcript
  • test_phase4_config.py — System audio & mic capture feature toggles
  • test_phase4_* (frontend) — useSystemAudioASR, useMicASR, SystemAudioCapture, MicCapture, LTTPage integration
  • test_integration_phase1.py — End-to-end text → RAG → answer
  • test_integration_phase2.py — End-to-end video → ASR → RAG → answer
  • test_integration_phase4.py — End-to-end WebSocket with system-audio/mic sources
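
The backend naming convention can be checked mechanically, for instance in a pre-commit script. The sketch below covers only the two backend patterns listed above; the regular expressions and the function name are assumptions, and frontend test files are out of scope.

```python
import re
from typing import Optional

# Patterns inferred from the example file names above (assumed, not official).
TEST_FILE_PATTERNS = {
    "phase": re.compile(r"^test_phase(\d+)_[a-z0-9_]+\.py$"),
    "integration": re.compile(r"^test_integration_phase(\d+)\.py$"),
}


def classify_test_file(name: str) -> Optional[tuple[str, int]]:
    """Return (kind, phase_number) for a conforming file name, else None."""
    for kind, pattern in TEST_FILE_PATTERNS.items():
        match = pattern.match(name)
        if match:
            return kind, int(match.group(1))
    return None
```

A file such as test_ingest.py, which lacks the phase prefix, is reported as None and can be flagged for renaming.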

Testing Rules (Python Backend):

  • Prefer integration tests over unit tests with mocks for all backend features and API routes.
  • Use real application via TestClient (FastAPI). Never mock the database or internal services.
  • Use existing test database fixtures and conftest.py. Only mock truly external third-party APIs (LLM, ASR).
  • Match the exact style and imports of existing tests in the backend/app/test/ directory.
  • Always run pytest after writing tests and iterate until they pass against the real system.
  • Each test file must have a module-level docstring describing coverage.

SUB-PHASE DEVELOPMENT

Workflow: Plan → Write Test → Implement → Make Test Pass → Commit

Sub-Phase Naming

Use decimal notation: Phase X.Y where X = major phase, Y = sub-phase number.

| Example | Scope |
|---|---|
| Phase 1.1 | Project setup, config, database |
| Phase 1.2 | Ingestion pipeline |
| Phase 1.3 | Query pipeline (3-step LLM workflow) |
| Phase 1.4 | Testing & polish |
| Phase 2.1 | Video upload backend |
| Phase 2.2 | ASR integration |

Test-First Rule (MANDATORY)

Every sub-phase follows test-driven delivery:

  1. Write test first — Before writing implementation code, write the test that defines "done"
  2. Implement — Write the minimum code to make the test pass
  3. Run test — Verify test passes (both integration and acceptance where applicable)
  4. Commit — Only commit after tests pass. Never commit broken tests.
  5. Next sub-phase — Only start next sub-phase after current is committed

Enforcement:

  • Each Implementation Task in a sub-phase plan must list its test file(s)
  • Tests must be in the backend/app/test/ or frontend/src/test/ directory
  • Pre-commit: pytest must pass for backend, pnpm test for frontend

Sub-Phase Plan Template

Each sub-phase plan (stored in .plans/) must include:

  1. Objective — What this sub-phase delivers
  2. Test Files — List of test files to write BEFORE implementation
  3. Acceptance Criteria — List of behaviors that must work
  4. Acceptance Tests — test_acceptance_<subphase>_<feature>.py file(s) run against the real environment
  5. Implementation Tasks — Atomic steps, each referencing its test file

Acceptance Testing Rules

  • Integration tests (test_phase*.py) — TestClient + real DB, only external APIs mocked, CI-safe
  • Acceptance tests (test_acceptance_*.py) — real environment, actual LLM/ASR calls

Acceptance test requirements:

  • Run against real services (ChromaDB instance, actual LLM API, ASR if applicable)
  • Name format: test_acceptance_<subphase>_<feature>.py
  • Location: backend/app/test/acceptance/
  • Use pytest markers: @pytest.mark.acceptance and @pytest.mark.slow
  • Each acceptance test file must have a docstring describing the real environment setup
  • Acceptance tests run manually before sub-phase completion, not in CI

Example acceptance test:

"""Acceptance test: Phase 1 RAG query with real Qwen LLM.

Prerequisites:
- ChromaDB running (local or docker)
- .env configured with valid LLM_BASE_URL and LLM_API_KEY
- Test documents ingested via /api/v1/ingest
"""
import pytest

@pytest.mark.acceptance
@pytest.mark.slow
def test_query_with_real_llm():
    """Query should return bullet-point answer from actual LLM."""
    # Real HTTP call to LLM provider
    # Real ChromaDB retrieval
    pass
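
The acceptance and slow markers used in the example are custom, so pytest warns about them unless they are registered. One way to register them is the `pytest_configure` hook in a conftest.py. The marker descriptions and the placement in backend/app/test/conftest.py are assumptions.

```python
def pytest_configure(config):
    """Register the custom markers so pytest does not warn about them."""
    config.addinivalue_line(
        "markers",
        "acceptance: runs against real services (ChromaDB, LLM, ASR)",
    )
    config.addinivalue_line(
        "markers",
        "slow: long-running test, run manually rather than in CI",
    )
```

With the markers registered, a CI run can exclude these tests with `pytest -m "not acceptance"`.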

Sub-phase completion checklist:

  • All integration tests written BEFORE implementation
  • All integration tests pass (pytest app/test/test_phase*.py -v)
  • All acceptance tests pass (pytest app/test/acceptance/ -v -m acceptance)
  • Code reviewed (self or peer)
  • Sub-phase plan marked complete in .plans/
  • Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")

COMMANDS

# Dev
backend:  uvicorn app.main:app --reload --port 8000
frontend: pnpm run dev

# Integration tests (TestClient, real DB, only external APIs mocked)
backend:  cd backend && pytest app/test/test_phase*.py -v

# Acceptance tests (real LLM/ASR/ChromaDB)
backend:  cd backend && pytest app/test/acceptance/ -v -m acceptance

# Prod
docker-compose up -d
./deploy.sh

PLAN STORAGE

All development plans (including sub-plans, debug plans, and task breakdowns) must be stored in .plans/.

.plans/
├── development_plan.md          # Main development plan (root-level)
├── phase1_backend_plan.md       # Phase 1 backend tasks
├── phase1_frontend_plan.md      # Phase 1 frontend tasks
├── phase2_backend_plan.md       # Phase 2 backend tasks
├── phase2_frontend_plan.md      # Phase 2 frontend tasks
├── debug_<date>_<issue>.md      # Debug/diagnosis logs
└── _template.md                 # Plan template (optional)

Rules:

  • Name format: <purpose>_<optional_date>.md (snake_case)
  • Use debug_ prefix for troubleshooting logs
  • Root development_plan.md stays at root as canonical source
  • Sub-plans reference root plan, never duplicate it

NOTES