docs: add Phase 1 backend and frontend development plans

2026-04-22 15:47:27 +08:00 · 2026-04-22 15:47:27 +08:00 · 1f4e3a2572
parent be48b1d8c7
commit 1f4e3a2572
3 changed files with 395 additions and 15 deletions
--- a/.plans/phase1_backend_plan.md
+++ b/.plans/phase1_backend_plan.md
@ -0,0 +1,204 @@
 # Phase 1 Backend Development Plan
 **Source**: `development_plan.md`  
 **Scope**: FastAPI backend for text-based RAG Q&A  
 **Estimated Duration**: 3-4 days  
 **Status**: Draft
 ---
 ## Objective
 Build a complete FastAPI backend that:
 1. Accepts DOCX uploads, chunks text (1000 tokens / 200 overlap), embeds via Qwen, and stores in persistent ChromaDB with metadata
 2. Runs a 3-step RAG pipeline: query decomposition → retrieval → relevance filtering → bullet-point response
 3. Serves API endpoints for ingestion and querying with full metadata attribution
 ---
 ## Acceptance Criteria
 - [ ] `POST /api/v1/ingest` accepts DOCX, parses content, chunks at 1000/200, embeds, stores in ChromaDB with filename/upload_date/content_summary
 - [ ] `POST /api/v1/query` accepts natural language question, returns JSON with: `keywords`, `answer` (bullet points), `sources` (array of metadata objects)
 - [ ] Query pipeline executes 3 LLM calls: decomposition → relevance filter → response generation
 - [ ] All LLM/ASR configuration reads from `.env` (OpenRouter for dev)
 - [ ] ChromaDB persists to `chroma_db/` directory
 - [ ] Chunking strategy is abstracted (interface/class) for future replacement
 - [ ] All unit tests pass (`pytest app/test/test_phase1_*.py -v`)
 - [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
 ---
 ## Acceptance Tests
 **File**: `backend/app/test/acceptance/test_acceptance_phase1_ingest.py`
 - `test_ingest_docx_with_real_embedding()` — Upload DOCX, verify ChromaDB entries with metadata
 **File**: `backend/app/test/acceptance/test_acceptance_phase1_rag_query.py`
 - `test_query_with_real_llm()` — Ask question, verify 3-step pipeline produces bullet answer with sources
 - `test_query_keywords_displayed()` — Verify response includes extracted keywords
 ---
 ## Implementation Tasks
 ### Day 1: Project Setup & Core Infrastructure
 **Task 1.1**: Environment and dependencies
 - Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
 - Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
 - Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
 **Task 1.2**: Database initialization
 - Create `backend/app/core/database.py` — ChromaDB persistent client
 - Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
 - Function: `get_or_create_collection(name, embedding_function)`
 **Task 1.3**: Project structure
 - Create all `__init__.py` files for package structure
 - Create `backend/app/main.py` with FastAPI app, CORS middleware
 - Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
 **Task 1.4**: Pydantic schemas
 - `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
 - `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
 - `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
 ### Day 2: Ingestion Pipeline
 **Task 2.1**: DOCX parsing
 - `utils/docx_parser.py`: `parse_docx(file_path) -> str`
 - Handle paragraphs, tables, headers
 - Return plain text with preserved paragraph breaks
 **Task 2.2**: Chunking abstraction
 - `utils/chunking.py`: Abstract base class `ChunkingStrategy`
 - `TokenChunkingStrategy` implementation using tiktoken
 - Config: chunk_size=1000, overlap=200
 - Method: `chunk(text: str) -> list[str]`
 **Task 2.3**: Metadata extraction
 - `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
 - Returns list of metadata dicts matching chunk count
 - Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
 **Task 2.4**: Embedding service
 - `services/rag.py`: `RAGService` class
 - Initialize embedding function with `qwen/qwen3-embedding-4b`
 - Method: `ingest_document(file_path, chunks, metadata_list)`
 - Store in ChromaDB collection "documents"
 **Task 2.5**: Ingest endpoint
 - `routers/ingest.py`: `POST /api/v1/ingest`
 - Accept `UploadFile` (DOCX only, validate extension)
 - Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
 - Return `IngestResponse`
 **Task 2.6**: Unit tests
 - `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
 - `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
 ### Day 3: Query Pipeline (3-Step)
 **Task 3.1**: LLM client
 - `services/llm_client.py`: `LLMClient` class
 - Constructor takes config from `Settings`
 - Method: `complete(prompt: str, temperature: float = 0.7) -> str`
 - Use httpx with OpenAI-compatible API format
 - Handle errors gracefully
 **Task 3.2**: Query decomposition
 - `services/query_decomposer.py`: `QueryDecomposer` class
 - Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
 - Method: `decompose(question: str) -> list[str]`
 - Parse LLM JSON response into list of keywords
 **Task 3.3**: Retrieval from ChromaDB
 - `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
 - Join keywords with space for query text
 - Return list of `(chunk_text, metadata, distance)` tuples
 **Task 3.4**: Relevance filtering
 - `services/relevance_filter.py`: `RelevanceFilter` class
 - Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
 - Input: list of chunks
 - Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
 - Batch all chunks in single LLM call
 **Task 3.5**: Response generation
 - `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
 - Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
 - Include chunk content and metadata in context
 - Enforce bullet-point format via prompt
 **Task 3.6**: Query endpoint
 - `routers/query.py`: `POST /api/v1/query`
 - Full pipeline orchestration:
  1. Call `query_decomposer.decompose()` → get keywords
  2. Call `rag.retrieve()` → get chunks
  3. Call `relevance_filter.filter()` → filter chunks
  4. Call `rag.generate_response()` → get answer
 - Return `QueryResponse` with keywords, answer, sources
 ### Day 4: Testing & Polish
 **Task 4.1**: Unit tests
 - `test_phase1_query.py`: Test full pipeline with mocked LLM calls
 - `test_phase1_llm_client.py`: Test LLM client error handling
 - `test_phase1_rag_service.py`: Test retrieval and response generation
 **Task 4.2**: Acceptance tests
 - Create real `.env` with OpenRouter credentials
 - Run `test_acceptance_phase1_ingest.py` with real embedding
 - Run `test_acceptance_phase1_rag_query.py` with real LLM calls
 - Verify keywords appear, answer is bullet format, sources have metadata
 **Task 4.3**: Error handling
 - Add try/except in all endpoints
 - Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
 - Log errors with context
 **Task 4.4**: Documentation
 - Update `AGENTS.md` if any conventions changed
 - Add docstrings to all public methods
 - Verify all imports work
 ---
 ## New Services Required
 | Service | File | Responsibility |
 |---------|------|----------------|
 | Config | `core/config.py` | `.env` loading, Settings class |
 | Database | `core/database.py` | ChromaDB persistent client |
 | LLM Client | `services/llm_client.py` | OpenAI-compatible API wrapper |
 | Query Decomposer | `services/query_decomposer.py` | Extract keywords from question |
 | Relevance Filter | `services/relevance_filter.py` | Batch score chunk relevance |
 | RAG Service | `services/rag.py` | Embedding, retrieval, response generation |
 | DOCX Parser | `utils/docx_parser.py` | Extract text from DOCX |
 | Chunking | `utils/chunking.py` | Token-based chunking with overlap |
 | Metadata | `utils/metadata.py` | Extract file metadata |
 ---
 ## Environment Variables
 ```bash
 LLM_BASE_URL=https://openrouter.ai/api/v1
 LLM_API_KEY=your_openrouter_key
 LLM_MODEL_NAME=qwen/qwen3.5-35b-a3b
 EMBEDDING_MODEL=qwen/qwen3-embedding-4b
 EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
 CHROMA_DB_PATH=./chroma_db
 ```
 ---
 ## Notes
 - Chunking strategy uses ABC pattern for easy future replacement
 - Relevance filtering uses single batch call for efficiency
 - All LLM calls go through `LLMClient` for consistent error handling
 - ChromaDB collection name: "documents"
 - Metadata fields: filename, upload_date (ISO format), content_summary, chunk_index
 - Response format enforced purely through prompt engineering (no JSON schema)
--- a/.plans/phase1_frontend_plan.md
+++ b/.plans/phase1_frontend_plan.md
@ -0,0 +1,152 @@
 # Phase 1 Frontend Development Plan
 **Source**: `development_plan.md`  
 **Scope**: React 18 + TypeScript + Vite frontend for text-based RAG Q&A  
 **Estimated Duration**: 2-3 days  
 **Status**: Draft
 ---
 ## Objective
 Build a React frontend that:
 1. Pre-allocates Phase 2 grid layout (video area empty/hidden in Phase 1)
 2. Allows text input and displays extracted keywords + bullet-point RAG responses with source metadata
 3. Uses TanStack Query for type-safe API calls to the FastAPI backend
 ---
 ## Acceptance Criteria
 - [ ] Phase 2 grid layout renders: Top-Left (empty video placeholder), Top-Right (input + keywords), Bottom (response)
 - [ ] User can type a question and submit
 - [ ] Extracted keywords displayed prominently before final answer
 - [ ] Bullet-point answer displayed with source metadata (filename, upload_date)
 - [ ] Loading states for each pipeline step (keywords loading, answer loading)
 - [ ] Error handling for API failures
 - [ ] Responsive within desktop viewport (no mobile required)
 - [ ] All API calls use TanStack Query with proper caching/invalidation
 ---
 ## Acceptance Tests
 **File**: `frontend/src/test/e2e/phase1_query_flow.spec.ts` (or manual acceptance checklist)
 - User types question → sees keywords appear → sees bullet answer with sources
 - Empty state handled gracefully
 - API error shows user-friendly message
 ---
 ## Implementation Tasks
 ### Day 1: Project Setup & Layout
 1. **Project scaffold**
   - Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
   - Install dependencies: `tailwindcss`, `postcss`, `autoprefixer`, `@tanstack/react-query`, `axios`
   - Configure Tailwind CSS
   - Set up shadcn/ui (copy components or install via CLI)
 2. **API client**
   - `src/lib/api.ts` — Axios instance with base URL configuration
   - `src/lib/queries.ts` — TanStack Query hooks:
     - `useQueryDocument()` — POST /api/v1/query
     - `useIngestDocument()` — POST /api/v1/ingest
   - Type-safe request/response types matching backend Pydantic schemas
 3. **Layout structure**
   - `src/App.tsx` — Root component with Phase 2 grid pre-allocation
   - Grid layout using Tailwind CSS:
     ```
     Top-Left (50%):    VideoPlaceholder (hidden/empty in Phase 1)
     Top-Right (50%):   QueryInput + KeywordsDisplay
     Bottom (100%):     ResponsePanel
     ```
   - Use CSS Grid or Flexbox for clean separation
 ### Day 2: Components & Integration
 1. **QueryInput component**
   - `src/components/QueryInput.tsx`
   - Textarea for question input
   - Submit button with loading state
   - Calls `useQueryDocument` mutation on submit
 2. **KeywordsDisplay component**
   - `src/components/KeywordsDisplay.tsx`
   - Shows extracted keywords as tags/chips
   - Loading skeleton while keywords are being extracted
   - Animated entrance when keywords arrive
 3. **ResponsePanel component**
   - `src/components/ResponsePanel.tsx`
   - Displays bullet-point answer
   - Shows source metadata cards (filename, upload_date)
   - Loading skeleton while answer is being generated
   - Empty state when no query submitted yet
 4. **IngestPanel component (optional for Phase 1)**
   - `src/components/IngestPanel.tsx`
   - Simple file upload for DOCX
   - Progress indicator during upload
   - Success/error feedback
 5. **Error handling**
   - Global error boundary
   - Toast notifications for API errors
   - Retry mechanism for failed queries
 ### Day 3: Polish & Integration Testing
 1. **Loading states**
   - Skeleton loaders for each panel
   - Step-by-step progress indicator showing pipeline stage:
     "Extracting keywords..." → "Retrieving documents..." → "Filtering relevance..." → "Generating answer..."
 2. **Styling polish**
   - Consistent spacing and typography
   - Dark/light mode support (optional)
   - Smooth transitions between states
 3. **Integration with backend**
   - End-to-end test: upload DOCX → ask question → verify keywords + answer + sources
   - Verify CORS works correctly
   - Test error scenarios
 4. **Build verification**
   - `npm run build` succeeds
   - Production build serves correctly via `npm run preview`
 ---
 ## Dependencies
 ```json
 {
  "dependencies": {
    "react": "^18.2.0",
    "react-dom": "^18.2.0",
    "@tanstack/react-query": "^5.x",
    "axios": "^1.6.x",
    "tailwindcss": "^3.4.x",
    "lucide-react": "^0.x"
  },
  "devDependencies": {
    "@types/react": "^18.2.x",
    "@types/react-dom": "^18.2.x",
    "@vitejs/plugin-react": "^4.2.x",
    "typescript": "^5.3.x",
    "vite": "^5.0.x"
  }
 }
 ```
 ---
 ## Notes
 - Video area in Phase 1 should show a placeholder message: "Video upload coming in Phase 2" or be completely hidden
 - Keywords should be visually distinct from the final answer — consider using badges/tags
 - Source metadata cards should be collapsible to avoid cluttering the response area
 - Consider adding a "copy answer" button for convenience
--- a/development_plan.md
+++ b/development_plan.md
@ -2,7 +2,7 @@
 **Project Overview**  
 Web-based application built in two phases.  
- **Phase 1**: Text question → RAG retrieval → Point-form answer (strictly from database)  
+- **Phase 1**: Text question → query decomposition → RAG retrieval → relevance filtering → point-form answer (strictly from database)  
 - **Phase 2**: Video upload + player → real-time audio streaming → ASR transcription → question extraction → Phase 1 RAG flow  
 **Tech Stack**  
@ -14,8 +14,8 @@ Web-based application built in two phases.
    - Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
 - **Models**:  
-  - Embedding: `qwen/qwen3-embedding-4b`  
+  - Embedding: `qwen/qwen3-embedding-4b` (via sentence-transformers, provider-switchable via `.env`)  
-  - LLM: `qwen/qwen3.5-35b-a3b`  
+  - LLM: `qwen/qwen3.5-35b-a3b` (OpenRouter for dev, local vLLM for prod)  
  - ASR: `Qwen/Qwen3-ASR-1.7B`  
 **Deployment**  
@ -58,32 +58,56 @@ app/
 - **LLM/ASR Configuration**: Backend reads from `.env` for easy switching between development (OpenRouter / Alibaba Cloud) and production (local vLLM).  
 - **RAG Database**: ChromaDB with metadata support (filename + extracted content metadata).  
- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers.  
+- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers, provider-switchable via `.env` (OpenRouter for dev, local vLLM for prod).  
- **Document Ingestion**: Via UI (project-based demo, no user authentication).  
+- **Document Ingestion**: Via UI (project-based demo, no user authentication). Supported formats: DOCX.  
 - **Chunking Strategy**: 1000 tokens per chunk, 200 token overlap. Strategy abstracted for future replacement.  
 - **Video**: MP4 and common formats, maximum 300MB.  
- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** “Ask from Video” button.  
+- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** "Ask from Video" button.  
- **UI Layout**:  
+- **UI Layout** (Phase 2 grid, pre-allocated in Phase 1):  
-  - Top-Left: Video player  
+  - Top-Left: Video player (empty in Phase 1)  
-  - Top-Right: Real-time transcript + text input box  
+  - Top-Right: Text input box + extracted keywords display  
  - Bottom Half: RAG response (bullet points with source metadata)  
 - **Authentication**: Public demo (no login required).  
 - **Mobile**: Not required at this stage.  
 - **CORS**: Standard FastAPI CORS middleware for frontend-backend communication.
 ---
 ## Phase 1: Text Question → RAG → Point-Form Answer (5-7 days)
 ### RAG Pipeline (3-Step LLM Workflow)
 ```
 User Question
    ↓
 [LLM Call 1] Extract key questions + keywords from user input
    ↓                ← keywords shown to user in UI
 [ChromaDB] Retrieve chunks using extracted keywords
    ↓
 [LLM Call 2] Single batch relevance filter — evaluate all chunks, drop irrelevant ones
    ↓
 [LLM Call 3] Generate bullet-point response from filtered chunks only
 ```
 - **Query Decomposition** (`services/query_decomposer.py`): LLM extracts key questions and search keywords from user's natural language question. Keywords are displayed to the user for transparency.  
 - **Relevance Filtering** (`services/relevance_filter.py`): Single batch LLM call receives all retrieved chunks + original question. Returns relevance verdict for each chunk. Irrelevant chunks are discarded before response generation.  
 - **Strict RAG Prompt**: Final LLM call generates bullet-point answer using ONLY filtered relevant chunks. No external knowledge allowed. Response format enforced via prompt engineering.  
 ### Backend (FastAPI)
- Dynamic configuration via `.env` (LLM base URL, API key, model names).  
+- Dynamic configuration via `.env` (LLM base URL, API key, model names, embedding provider).  
 - `services/rag.py`: Persistent ChromaDB + Qwen embedding + metadata extraction (filename, upload date, content summary).  
- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM with **strict RAG prompt** (only use retrieved context).  
+- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM.  
 - `services/query_decomposer.py`: LLM-based keyword/question extraction.  
 - `services/relevance_filter.py`: LLM-based batch relevance scoring.  
 - `utils/chunking.py`: DOCX parsing + text chunking (1000 tokens, 200 overlap). Strategy abstracted for future replacement.  
 - Endpoints:  
-  - `POST /api/v1/ingest` – Document upload and ingestion with metadata.  
+  - `POST /api/v1/ingest` – DOCX upload, parsing, chunking, embedding, and ingestion with metadata.  
-  - `POST /api/v1/query` – Question → retrieve → LLM → bullet-point response.
+  - `POST /api/v1/query` – Full 3-step pipeline: decompose → retrieve → filter → respond. Returns bullet-point answer + extracted keywords + source metadata.  
 ### Frontend (React + TS)
- Clean layout: Top-right input box, bottom response area.  
+- Phase 2 grid layout pre-allocated: Top-Left video area (empty/hidden), Top-Right input area, Bottom response area.  
 - Type-safe API calls using TanStack Query.  
 - Display extracted keywords to user (shown before final answer arrives).  
 - Display answer as clean bullet list with source metadata.
 ---
@ -137,4 +161,4 @@ app/
 **File Information**  
 - Filename: `development_plan.md`  
 - Last Updated: April 2026  
- Status: Ready for implementation
+- Status: Phase 1 clarified, ready for sub-phase planning