init: project setup with AGENTS.md, test structure, and plan directory

2026-04-22 15:22:29 +08:00 · 2026-04-22 15:22:29 +08:00 · 3c2d647943
commit 3c2d647943
13 changed files with 547 additions and 0 deletions
--- a/.env.txt
+++ b/.env.txt
@ -0,0 +1 @@
+ALIBABA=sk-e84c76a30243448dadfd6eab6d90c3f2
--- a/AGENTS.md
+++ b/AGENTS.md
@ -0,0 +1,148 @@
+# RAG Video Q&A — Project Knowledge Base
+
+**Generated:** 2026-04-22
+**Source:** development_plan.md
+**Status:** Greenfield (no code yet)
+
+---
+
+## OVERVIEW
+RAG-powered Video Q&A web app. Phase 1: text → ChromaDB retrieval → bullet-point answer. Phase 2: video upload → real-time ASR → auto/manual RAG query. FastAPI backend + React 18 (Vite) frontend.
+
+## STRUCTURE
+```
+app/
+├── backend/           # FastAPI (Python)
+│   ├── app/
+│   │   ├── main.py
+│   │   ├── routers/      # query.py, ingest.py, video.py, ws_asr.py
+│   │   ├── services/     # rag.py, llm_client.py, asr_client.py, video_service.py
+│   │   ├── models/       # Pydantic schemas
+│   │   ├── core/         # config.py, database.py
+│   │   └── utils/        # chunking.py, metadata_extraction.py
+│   ├── uploads/          # video storage (max 300MB)
+│   ├── requirements.txt
+│   └── .env.example
+├── frontend/          # React 18 + TS + Vite
+│   ├── src/
+│   │   ├── components/   # shadcn/ui + custom
+│   │   ├── pages/
+│   │   ├── lib/
+│   │   │   └── api.ts    # API client (TanStack Query)
+│   │   └── App.tsx
+│   ├── package.json
+│   └── vite.config.ts
+├── chroma_db/         # Persistent vector store
+├── Dockerfile
+├── docker-compose.yml
+├── nginx.conf
+└── deploy.sh
+```
+
+## WHERE TO LOOK
+| Task | Location | Notes |
+|------|----------|-------|
+| API routes | `backend/app/routers/` | Versioned `/api/v1/...` |
+| Business logic | `backend/app/services/` | RAG, LLM, ASR, video |
+| Schemas | `backend/app/models/` | Pydantic request/response |
+| Config | `backend/app/core/config.py` | `.env` driven |
+| DB init | `backend/app/core/database.py` | ChromaDB persistent |
+| Frontend API | `frontend/src/lib/api.ts` | TanStack Query |
+| UI components | `frontend/src/components/` | shadcn/ui + Tailwind |
+
+## CODE MAP
+*Greenfield — no code yet. See development_plan.md for full specification.*
+
+## CONVENTIONS
+- **Backend**: `snake_case` files; routers thin, services thick; `.env` for all LLM/ASR config
+- **Frontend**: PascalCase components; `lib/api.ts` single API client; TanStack Query for server state
+- **API**: Path versioning `/api/v1/`; WebSocket at `/ws/asr/{video_id}`
+- **RAG**: Strict prompt — answer ONLY from retrieved context; bullet-point format
+- **Metadata**: Every doc chunk must have `filename`, `upload_date`, `content_summary`
+
+## ANTI-PATTERNS (THIS PROJECT)
+- Hardcode LLM URLs/keys — always `.env`
+- Business logic in routers — belongs in `services/`
+- Non-persistent ChromaDB — must use `chroma_db/` directory
+- LLM hallucination outside retrieved context — strict RAG prompt enforced
+- Plain text responses — always bullet points with source metadata
+- Missing document metadata — breaks source attribution
+- Add authentication — public demo only
+- Mobile-first design — desktop only at this stage
+
+## UNIQUE STYLES
+- **Dual ASR trigger**: automatic (on transcript update) + manual "Ask from Video" button
+- **Layout**: Top-Left video player | Top-Right transcript + input | Bottom RAG response
+- **Provider switching**: same codebase runs dev (OpenRouter/Alibaba Cloud) and prod (local vLLM)
+- **Video limit**: 300MB max, MP4 + common formats
+
+## TESTING
+
+**Backend test directory**: `backend/app/test/`
+
+**Naming convention** (pytest, flat structure, phase-prefixed):
+```
+test_phase<N>_<module_or_feature>.py
+```
+
+**Examples**:
+- `test_phase1_ingest.py` — Document upload & ChromaDB ingestion
+- `test_phase1_query.py` — RAG query endpoint
+- `test_phase1_rag_service.py` — RAG retrieval + strict prompt logic
+- `test_phase1_llm_client.py` — LLM client (mocked provider)
+- `test_phase1_chunking.py` — Document chunking utils
+- `test_phase1_metadata.py` — Metadata extraction
+- `test_phase2_video_upload.py` — Video upload (<300MB, format validation)
+- `test_phase2_asr_client.py` — ASR transcription client
+- `test_phase2_ws_asr.py` — WebSocket audio streaming
+- `test_phase2_query_from_video.py` — Auto/manual trigger from transcript
+- `test_integration_phase1.py` — End-to-end text → RAG → answer
+- `test_integration_phase2.py` — End-to-end video → ASR → RAG → answer
+
+**Rules**:
+- Use `pytest` + `pytest-asyncio` for async tests
+- Mock all external LLM/ASR calls (do not hit live APIs in tests)
+- Use `tmp_path` fixture for ChromaDB test instances
+- Each test file must have a module-level docstring describing coverage
+
+## COMMANDS
+```bash
+# Dev
+backend:  uvicorn app.main:app --reload --port 8000
+frontend: npm run dev
+
+# Test
+backend:  cd backend && pytest app/test/ -v
+
+# Prod
+docker-compose up -d
+./deploy.sh
+```
+
+## PLAN STORAGE
+
+**All development plans** (including sub-plans, debug plans, and task breakdowns) **must be stored in `.plans/`**.
+
+```
+.plans/
+├── development_plan.md          # Main development plan (root-level)
+├── phase1_backend_plan.md       # Phase 1 backend tasks
+├── phase1_frontend_plan.md      # Phase 1 frontend tasks
+├── phase2_backend_plan.md       # Phase 2 backend tasks
+├── phase2_frontend_plan.md      # Phase 2 frontend tasks
+├── debug_<date>_<issue>.md      # Debug/diagnosis logs
+└── _template.md                 # Plan template (optional)
+```
+
+**Rules**:
+- Name format: `<purpose>_<optional_date>.md` (snake_case)
+- Use `debug_` prefix for troubleshooting logs
+- Root `development_plan.md` stays at root as canonical source
+- Sub-plans reference root plan, never duplicate it
+
+## NOTES
+- No routing library specified — single-page app likely sufficient
+- No client state library specified — `useState`/`useReducer` + TanStack Query
+- WebSocket client not specified — may need to expand `lib/api.ts`
+- shadcn/ui components are copied, not imported as npm package
+- Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
--- a/backend/app/test/conftest.py
+++ b/backend/app/test/conftest.py
@ -0,0 +1,23 @@
+"""Shared pytest fixtures for backend tests.
+
+All external LLM/ASR calls must be mocked. Use tmp_path for ChromaDB instances.
+"""
+import pytest
+
+
+@pytest.fixture
+def mock_llm_client(monkeypatch):
+    """Mock LLM client to avoid hitting live APIs."""
+    pass  # TODO: implement mock
+
+
+@pytest.fixture
+def mock_asr_client(monkeypatch):
+    """Mock ASR client to avoid hitting live APIs."""
+    pass  # TODO: implement mock
+
+
+@pytest.fixture
+def chroma_test_dir(tmp_path):
+    """Provide a temporary directory for isolated ChromaDB instances."""
+    return tmp_path / "chroma_test"
--- a/backend/app/test/test_phase1_chunking.py
+++ b/backend/app/test/test_phase1_chunking.py
@ -0,0 +1,24 @@
+"""Phase 1 tests: Document chunking utilities.
+
+Covers:
+- Text splitting strategies
+- Chunk size and overlap parameters
+- Handling of different document formats
+"""
+import pytest
+
+
+class TestChunking:
+    """Document chunking utility tests."""
+
+    def test_chunk_size_limit(self):
+        """Should respect maximum chunk size."""
+        pass  # TODO: implement
+
+    def test_chunk_overlap(self):
+        """Should include overlap between adjacent chunks."""
+        pass  # TODO: implement
+
+    def test_empty_document(self):
+        """Should handle empty or whitespace-only documents."""
+        pass  # TODO: implement
--- a/backend/app/test/test_phase1_ingest.py
+++ b/backend/app/test/test_phase1_ingest.py
@ -0,0 +1,29 @@
+"""Phase 1 tests: Document ingestion endpoint.
+
+Covers:
+- POST /api/v1/ingest with valid documents
+- Metadata extraction (filename, upload_date, content_summary)
+- ChromaDB persistence with embeddings
+- Error handling for unsupported file types
+"""
+import pytest
+
+
+class TestIngest:
+    """Document upload and ChromaDB ingestion tests."""
+
+    def test_ingest_pdf_success(self):
+        """Should ingest PDF and return document ID with metadata."""
+        pass  # TODO: implement
+
+    def test_ingest_txt_success(self):
+        """Should ingest plain text and chunk correctly."""
+        pass  # TODO: implement
+
+    def test_ingest_metadata_extraction(self):
+        """Should extract filename, upload_date, content_summary."""
+        pass  # TODO: implement
+
+    def test_ingest_unsupported_format(self):
+        """Should reject unsupported file formats."""
+        pass  # TODO: implement
--- a/backend/app/test/test_phase1_llm_client.py
+++ b/backend/app/test/test_phase1_llm_client.py
@ -0,0 +1,25 @@
+"""Phase 1 tests: LLM client.
+
+Covers:
+- OpenAI-compatible API client for Qwen LLM
+- Provider switching via .env (OpenRouter, Alibaba Cloud, vLLM)
+- Error handling for API failures
+- Mocked responses in test mode
+"""
+import pytest
+
+
+class TestLLMClient:
+    """LLM client tests (all external calls mocked)."""
+
+    def test_llm_call_success(self, mock_llm_client):
+        """Should return structured response from mocked LLM."""
+        pass  # TODO: implement
+
+    def test_llm_provider_switching(self):
+        """Should switch base URL based on .env config."""
+        pass  # TODO: implement
+
+    def test_llm_api_error_handling(self):
+        """Should handle HTTP errors from LLM provider."""
+        pass  # TODO: implement
--- a/backend/app/test/test_phase1_metadata.py
+++ b/backend/app/test/test_phase1_metadata.py
@ -0,0 +1,25 @@
+"""Phase 1 tests: Metadata extraction utilities.
+
+Covers:
+- Filename extraction
+- Upload date generation
+- Content summary generation
+- Metadata schema validation
+"""
+import pytest
+
+
+class TestMetadata:
+    """Metadata extraction utility tests."""
+
+    def test_extract_filename(self):
+        """Should extract clean filename from path."""
+        pass  # TODO: implement
+
+    def test_generate_upload_date(self):
+        """Should generate ISO format upload date."""
+        pass  # TODO: implement
+
+    def test_content_summary(self):
+        """Should generate concise content summary."""
+        pass  # TODO: implement
--- a/backend/app/test/test_phase1_query.py
+++ b/backend/app/test/test_phase1_query.py
@ -0,0 +1,25 @@
+"""Phase 1 tests: RAG query endpoint.
+
+Covers:
+- POST /api/v1/query question → retrieve → LLM → bullet-point response
+- Strict RAG prompt enforcement (only use retrieved context)
+- Bullet-point response format
+- Source metadata inclusion
+"""
+import pytest
+
+
+class TestQuery:
+    """RAG query endpoint tests."""
+
+    def test_query_returns_bullets(self):
+        """Should return bullet-point answer with source metadata."""
+        pass  # TODO: implement
+
+    def test_query_strict_rag_no_hallucination(self):
+        """Should refuse to answer when no relevant context retrieved."""
+        pass  # TODO: implement
+
+    def test_query_includes_source_metadata(self):
+        """Should include filename, upload_date in response."""
+        pass  # TODO: implement
--- a/backend/app/test/test_phase1_rag_service.py
+++ b/backend/app/test/test_phase1_rag_service.py
@ -0,0 +1,25 @@
+"""Phase 1 tests: RAG service logic.
+
+Covers:
+- ChromaDB retrieval with Qwen embeddings
+- Context assembly for LLM prompt
+- Strict prompt construction (answer ONLY from retrieved context)
+- Metadata handling per chunk
+"""
+import pytest
+
+
+class TestRAGService:
+    """RAG retrieval and prompt logic tests."""
+
+    def test_retrieve_relevant_chunks(self):
+        """Should retrieve semantically relevant chunks from ChromaDB."""
+        pass  # TODO: implement
+
+    def test_strict_prompt_format(self):
+        """Should construct prompt forbidding external knowledge."""
+        pass  # TODO: implement
+
+    def test_chunk_metadata_preserved(self):
+        """Should preserve filename, upload_date, content_summary per chunk."""
+        pass  # TODO: implement
--- a/backend/app/test/test_phase2_asr_client.py
+++ b/backend/app/test/test_phase2_asr_client.py
@ -0,0 +1,25 @@
+"""Phase 2 tests: ASR transcription client.
+
+Covers:
+- Integration with Qwen/Qwen3-ASR-1.7B
+- File upload vs audio content input
+- Error handling for transcription failures
+- Mocked responses in test mode
+"""
+import pytest
+
+
+class TestASRClient:
+    """ASR client tests (all external calls mocked)."""
+
+    def test_asr_transcribe_audio(self, mock_asr_client):
+        """Should return transcript from mocked ASR."""
+        pass  # TODO: implement
+
+    def test_asr_file_upload_mode(self):
+        """Should support file path input."""
+        pass  # TODO: implement
+
+    def test_asr_audio_content_mode(self):
+        """Should support raw audio bytes input."""
+        pass  # TODO: implement
--- a/backend/app/test/test_phase2_video_upload.py
+++ b/backend/app/test/test_phase2_video_upload.py
@ -0,0 +1,29 @@
+"""Phase 2 tests: Video upload endpoint.
+
+Covers:
+- POST /api/v1/upload-video with size validation (<300MB)
+- Format validation (MP4 and common formats)
+- Static file serving
+- Error handling for oversized/invalid files
+"""
+import pytest
+
+
+class TestVideoUpload:
+    """Video upload endpoint tests."""
+
+    def test_upload_mp4_success(self):
+        """Should accept valid MP4 under 300MB."""
+        pass  # TODO: implement
+
+    def test_upload_size_limit(self):
+        """Should reject files over 300MB."""
+        pass  # TODO: implement
+
+    def test_upload_invalid_format(self):
+        """Should reject non-video formats."""
+        pass  # TODO: implement
+
+    def test_static_file_serving(self):
+        """Should serve uploaded video via static URL."""
+        pass  # TODO: implement
--- a/backend/app/test/test_phase2_ws_asr.py
+++ b/backend/app/test/test_phase2_ws_asr.py
@ -0,0 +1,28 @@
+"""Phase 2 tests: WebSocket ASR streaming.
+
+Covers:
+- /ws/asr/{video_id} connection lifecycle
+- Real-time audio chunk streaming
+- Transcript accumulation
+- Connection cleanup on disconnect
+"""
+import pytest
+
+
+class TestWebSocketASR:
+    """WebSocket ASR streaming tests."""
+
+    @pytest.mark.asyncio
+    async def test_ws_connection_established(self):
+        """Should accept WebSocket connection with valid video_id."""
+        pass  # TODO: implement
+
+    @pytest.mark.asyncio
+    async def test_ws_audio_chunk_streaming(self):
+        """Should process audio chunks and return transcripts."""
+        pass  # TODO: implement
+
+    @pytest.mark.asyncio
+    async def test_ws_disconnect_cleanup(self):
+        """Should clean up resources on client disconnect."""
+        pass  # TODO: implement
--- a/development_plan.md
+++ b/development_plan.md
@ -0,0 +1,140 @@
+# RAG Video Q&A Web Application - Development Plan
+
+**Project Overview**  
+Web-based application built in two phases.  
+- **Phase 1**: Text question → RAG retrieval → Point-form answer (strictly from database)  
+- **Phase 2**: Video upload + player → real-time audio streaming → ASR transcription → question extraction → Phase 1 RAG flow  
+
+**Tech Stack**  
+- **Backend**: Python + FastAPI (REST + WebSocket)  
+- **Frontend**: TypeScript + React 18 (Vite) + shadcn/ui + Tailwind CSS  
+- **Server**: Linux Ubuntu 22.04  
+- **RAG Database**: ChromaDB (persistent)  
+- **LLM/ASR Integration**: Dynamic via `.env` (supports local vLLM, OpenRouter, Alibaba Cloud)  
+    - Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
+
+- **Models**:  
+  - Embedding: `qwen/qwen3-embedding-4b`  
+  - LLM: `qwen/qwen3.5-35b-a3b`  
+  - ASR: `Qwen/Qwen3-ASR-1.7B`  
+
+**Deployment**  
+- Development: Simple commands (`uvicorn` + `npm run dev`)  
+- Production: Docker + Nginx  
+
+---
+
+## Project Structure (Monorepo)
+app/
+├── backend/                  # FastAPI
+│   ├── app/
+│   │   ├── main.py
+│   │   ├── routers/          # query.py, ingest.py, video.py, ws_asr.py
+│   │   ├── services/         # rag.py, llm_client.py, asr_client.py, video_service.py
+│   │   ├── models/           # Pydantic schemas
+│   │   ├── core/             # config.py, database.py
+│   │   └── utils/            # chunking, metadata extraction
+│   ├── uploads/              # video storage (max 300MB)
+│   ├── requirements.txt
+│   └── .env.example
+├── frontend/                 # React + TypeScript (Vite)
+│   ├── src/
+│   │   ├── components/
+│   │   ├── pages/
+│   │   ├── lib/              # api.ts
+│   │   └── App.tsx
+│   ├── package.json
+│   └── vite.config.ts
+├── chroma_db/                # Persistent vector store
+├── Dockerfile
+├── docker-compose.yml
+├── nginx.conf
+└── deploy.sh
+
+
+---
+
+## Key Requirements Incorporated
+
+- **LLM/ASR Configuration**: Backend reads from `.env` for easy switching between development (OpenRouter / Alibaba Cloud) and production (local vLLM).  
+- **RAG Database**: ChromaDB with metadata support (filename + extracted content metadata).  
+- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers.  
+- **Document Ingestion**: Via UI (project-based demo, no user authentication).  
+- **Video**: MP4 and common formats, maximum 300MB.  
+- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** “Ask from Video” button.  
+- **UI Layout**:  
+  - Top-Left: Video player  
+  - Top-Right: Real-time transcript + text input box  
+  - Bottom Half: RAG response (bullet points with source metadata)  
+- **Authentication**: Public demo (no login required).  
+- **Mobile**: Not required at this stage.  
+
+---
+
+## Phase 1: Text Question → RAG → Point-Form Answer (5-7 days)
+
+### Backend (FastAPI)
+- Dynamic configuration via `.env` (LLM base URL, API key, model names).  
+- `services/rag.py`: Persistent ChromaDB + Qwen embedding + metadata extraction (filename, upload date, content summary).  
+- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM with **strict RAG prompt** (only use retrieved context).  
+- Endpoints:  
+  - `POST /api/v1/ingest` – Document upload and ingestion with metadata.  
+  - `POST /api/v1/query` – Question → retrieve → LLM → bullet-point response.
+
+### Frontend (React + TS)
+- Clean layout: Top-right input box, bottom response area.  
+- Type-safe API calls using TanStack Query.  
+- Display answer as clean bullet list with source metadata.
+
+---
+
+## Phase 2: Video Upload + Real-Time ASR → RAG (8-10 days)
+
+### Backend Additions
+- Video upload (`POST /api/v1/upload-video`) with size/format validation (<300MB).  
+- Static file serving for videos.  
+- WebSocket `/ws/asr/{video_id}` for real-time audio chunk streaming.  
+- ASR integration with `Qwen/Qwen3-ASR-1.7B` (file upload or audio content).  
+- Question extraction via LLM, then trigger Phase 1 RAG (auto + manual support).
+
+### Frontend Additions
+- Drag & drop video upload + progress.  
+- Video player (`<video controls>`).  
+- Live transcript display (scrolling box).  
+- Top-Left: Video player | Top-Right: Live transcript + manual input.  
+- Bottom: RAG response panel.  
+- Support both automatic “Ask” on transcript updates and manual button.
+
+---
+
+## Development Timeline
+
+| Phase                        | Duration     | Key Deliverables |
+|-----------------------------|--------------|------------------|
+| Setup + Phase 1 Backend     | 3-4 days     | FastAPI + Chroma + Metadata + LLM client |
+| Phase 1 Frontend            | 2-3 days     | UI layout + text query flow |
+| Phase 2 Backend             | 4-5 days     | Video upload + WebSocket ASR + question extraction |
+| Phase 2 Frontend            | 3-4 days     | Video player + live transcript + auto/manual flow |
+| Testing & Polish            | 1-2 days     | End-to-end testing + deployment scripts |
+
+**Total Estimated Effort**: 13-17 developer days (2-3 weeks)
+
+---
+
+## Deployment Strategy
+
+**Development**:
+- Backend: `cd backend && uvicorn app.main:app --reload --port 8000`
+- Frontend: `cd frontend && npm run dev`
+
+**Production**:
+- Use `docker-compose up -d` (includes backend, built frontend, Nginx reverse proxy).
+- Simple `deploy.sh` script for building and restarting.
+
+
+---
+
+**File Information**  
+- Filename: `development_plan.md`  
+- Last Updated: April 2026  
+- Status: Ready for implementation