From 1f4e3a257262d7b3539b902c5ebb36a754e5be87 Mon Sep 17 00:00:00 2001
From: Woody <woody.ck.tse@gmail.com>
Date: Wed, 22 Apr 2026 15:47:27 +0800
Subject: [PATCH] docs: add Phase 1 backend and frontend development plans

---
 .plans/phase1_backend_plan.md  | 204 +++++++++++++++++++++++++++++++++
 .plans/phase1_frontend_plan.md | 152 ++++++++++++++++++++++++
 development_plan.md            |  54 ++++++---
 3 files changed, 395 insertions(+), 15 deletions(-)
 create mode 100644 .plans/phase1_backend_plan.md
 create mode 100644 .plans/phase1_frontend_plan.md

diff --git a/.plans/phase1_backend_plan.md b/.plans/phase1_backend_plan.md
new file mode 100644
index 0000000..4cf60c2
--- /dev/null
+++ b/.plans/phase1_backend_plan.md
@@ -0,0 +1,204 @@
+# Phase 1 Backend Development Plan
+
+**Source**: `development_plan.md`  
+**Scope**: FastAPI backend for text-based RAG Q&A  
+**Estimated Duration**: 3-4 days  
+**Status**: Draft
+
+---
+
+## Objective
+
+Build a complete FastAPI backend that:
+1. Accepts DOCX uploads, chunks text (1000 tokens / 200 overlap), embeds via Qwen, and stores in persistent ChromaDB with metadata
+2. Runs a 3-step RAG pipeline: query decomposition → retrieval → relevance filtering → bullet-point response
+3. Serves API endpoints for ingestion and querying with full metadata attribution
+
+---
+
+## Acceptance Criteria
+
+- [ ] `POST /api/v1/ingest` accepts DOCX, parses content, chunks at 1000/200, embeds, stores in ChromaDB with filename/upload_date/content_summary
+- [ ] `POST /api/v1/query` accepts natural language question, returns JSON with: `keywords`, `answer` (bullet points), `sources` (array of metadata objects)
+- [ ] Query pipeline executes 3 LLM calls: decomposition → relevance filter → response generation
+- [ ] All LLM/ASR configuration reads from `.env` (OpenRouter for dev)
+- [ ] ChromaDB persists to `chroma_db/` directory
+- [ ] Chunking strategy is abstracted (interface/class) for future replacement
+- [ ] All unit tests pass (`pytest app/test/test_phase1_*.py -v`)
+- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
+
+---
+
+## Acceptance Tests
+
+**File**: `backend/app/test/acceptance/test_acceptance_phase1_ingest.py`
+- `test_ingest_docx_with_real_embedding()` — Upload DOCX, verify ChromaDB entries with metadata
+
+**File**: `backend/app/test/acceptance/test_acceptance_phase1_rag_query.py`
+- `test_query_with_real_llm()` — Ask question, verify 3-step pipeline produces bullet answer with sources
+- `test_query_keywords_displayed()` — Verify response includes extracted keywords
+
+---
+
+## Implementation Tasks
+
+### Day 1: Project Setup & Core Infrastructure
+
+**Task 1.1**: Environment and dependencies
+- Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
+- Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
+- Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
+
+**Task 1.2**: Database initialization
+- Create `backend/app/core/database.py` — ChromaDB persistent client
+- Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
+- Function: `get_or_create_collection(name, embedding_function)`
+
+**Task 1.3**: Project structure
+- Create all `__init__.py` files for package structure
+- Create `backend/app/main.py` with FastAPI app, CORS middleware
+- Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
+
+**Task 1.4**: Pydantic schemas
+- `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
+- `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
+- `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
+
+### Day 2: Ingestion Pipeline
+
+**Task 2.1**: DOCX parsing
+- `utils/docx_parser.py`: `parse_docx(file_path) -> str`
+- Handle paragraphs, tables, headers
+- Return plain text with preserved paragraph breaks
+
+**Task 2.2**: Chunking abstraction
+- `utils/chunking.py`: Abstract base class `ChunkingStrategy`
+- `TokenChunkingStrategy` implementation using tiktoken
+- Config: chunk_size=1000, overlap=200
+- Method: `chunk(text: str) -> list[str]`
+
+**Task 2.3**: Metadata extraction
+- `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
+- Returns list of metadata dicts matching chunk count
+- Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
+
+**Task 2.4**: Embedding service
+- `services/rag.py`: `RAGService` class
+- Initialize embedding function with `qwen/qwen3-embedding-4b`
+- Method: `ingest_document(file_path, chunks, metadata_list)`
+- Store in ChromaDB collection "documents"
+
+**Task 2.5**: Ingest endpoint
+- `routers/ingest.py`: `POST /api/v1/ingest`
+- Accept `UploadFile` (DOCX only, validate extension)
+- Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
+- Return `IngestResponse`
+
+**Task 2.6**: Unit tests
+- `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
+- `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
+
+### Day 3: Query Pipeline (3-Step)
+
+**Task 3.1**: LLM client
+- `services/llm_client.py`: `LLMClient` class
+- Constructor takes config from `Settings`
+- Method: `complete(prompt: str, temperature: float = 0.7) -> str`
+- Use httpx with OpenAI-compatible API format
+- Handle errors gracefully
+
+**Task 3.2**: Query decomposition
+- `services/query_decomposer.py`: `QueryDecomposer` class
+- Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
+- Method: `decompose(question: str) -> list[str]`
+- Parse LLM JSON response into list of keywords
+
+**Task 3.3**: Retrieval from ChromaDB
+- `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
+- Join keywords with space for query text
+- Return list of `(chunk_text, metadata, distance)` tuples
+
+**Task 3.4**: Relevance filtering
+- `services/relevance_filter.py`: `RelevanceFilter` class
+- Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
+- Input: list of chunks
+- Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
+- Batch all chunks in single LLM call
+
+**Task 3.5**: Response generation
+- `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
+- Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
+- Include chunk content and metadata in context
+- Enforce bullet-point format via prompt
+
+**Task 3.6**: Query endpoint
+- `routers/query.py`: `POST /api/v1/query`
+- Full pipeline orchestration:
+  1. Call `query_decomposer.decompose()` → get keywords
+  2. Call `rag.retrieve()` → get chunks
+  3. Call `relevance_filter.filter()` → filter chunks
+  4. Call `rag.generate_response()` → get answer
+- Return `QueryResponse` with keywords, answer, sources
+
+### Day 4: Testing & Polish
+
+**Task 4.1**: Unit tests
+- `test_phase1_query.py`: Test full pipeline with mocked LLM calls
+- `test_phase1_llm_client.py`: Test LLM client error handling
+- `test_phase1_rag_service.py`: Test retrieval and response generation
+
+**Task 4.2**: Acceptance tests
+- Create real `.env` with OpenRouter credentials
+- Run `test_acceptance_phase1_ingest.py` with real embedding
+- Run `test_acceptance_phase1_rag_query.py` with real LLM calls
+- Verify keywords appear, answer is bullet format, sources have metadata
+
+**Task 4.3**: Error handling
+- Add try/except in all endpoints
+- Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
+- Log errors with context
+
+**Task 4.4**: Documentation
+- Update `AGENTS.md` if any conventions changed
+- Add docstrings to all public methods
+- Verify all imports work
+
+---
+
+## New Services Required
+
+| Service | File | Responsibility |
+|---------|------|----------------|
+| Config | `core/config.py` | `.env` loading, Settings class |
+| Database | `core/database.py` | ChromaDB persistent client |
+| LLM Client | `services/llm_client.py` | OpenAI-compatible API wrapper |
+| Query Decomposer | `services/query_decomposer.py` | Extract keywords from question |
+| Relevance Filter | `services/relevance_filter.py` | Batch score chunk relevance |
+| RAG Service | `services/rag.py` | Embedding, retrieval, response generation |
+| DOCX Parser | `utils/docx_parser.py` | Extract text from DOCX |
+| Chunking | `utils/chunking.py` | Token-based chunking with overlap |
+| Metadata | `utils/metadata.py` | Extract file metadata |
+
+---
+
+## Environment Variables
+
+```bash
+LLM_BASE_URL=https://openrouter.ai/api/v1
+LLM_API_KEY=your_openrouter_key
+LLM_MODEL_NAME=qwen/qwen3.5-35b-a3b
+EMBEDDING_MODEL=qwen/qwen3-embedding-4b
+EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
+CHROMA_DB_PATH=./chroma_db
+```
+
+---
+
+## Notes
+
+- Chunking strategy uses ABC pattern for easy future replacement
+- Relevance filtering uses single batch call for efficiency
+- All LLM calls go through `LLMClient` for consistent error handling
+- ChromaDB collection name: "documents"
+- Metadata fields: filename, upload_date (ISO format), content_summary, chunk_index
+- Response format enforced purely through prompt engineering (no JSON schema)
\ No newline at end of file
diff --git a/.plans/phase1_frontend_plan.md b/.plans/phase1_frontend_plan.md
new file mode 100644
index 0000000..5857d75
--- /dev/null
+++ b/.plans/phase1_frontend_plan.md
@@ -0,0 +1,152 @@
+# Phase 1 Frontend Development Plan
+
+**Source**: `development_plan.md`  
+**Scope**: React 18 + TypeScript + Vite frontend for text-based RAG Q&A  
+**Estimated Duration**: 2-3 days  
+**Status**: Draft
+
+---
+
+## Objective
+
+Build a React frontend that:
+1. Pre-allocates Phase 2 grid layout (video area empty/hidden in Phase 1)
+2. Allows text input and displays extracted keywords + bullet-point RAG responses with source metadata
+3. Uses TanStack Query for type-safe API calls to the FastAPI backend
+
+---
+
+## Acceptance Criteria
+
+- [ ] Phase 2 grid layout renders: Top-Left (empty video placeholder), Top-Right (input + keywords), Bottom (response)
+- [ ] User can type a question and submit
+- [ ] Extracted keywords displayed prominently before final answer
+- [ ] Bullet-point answer displayed with source metadata (filename, upload_date)
+- [ ] Loading states for each pipeline step (keywords loading, answer loading)
+- [ ] Error handling for API failures
+- [ ] Responsive within desktop viewport (no mobile required)
+- [ ] All API calls use TanStack Query with proper caching/invalidation
+
+---
+
+## Acceptance Tests
+
+**File**: `frontend/src/test/e2e/phase1_query_flow.spec.ts` (or manual acceptance checklist)
+- User types question → sees keywords appear → sees bullet answer with sources
+- Empty state handled gracefully
+- API error shows user-friendly message
+
+---
+
+## Implementation Tasks
+
+### Day 1: Project Setup & Layout
+
+1. **Project scaffold**
+   - Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
+   - Install dependencies: `tailwindcss`, `postcss`, `autoprefixer`, `@tanstack/react-query`, `axios`
+   - Configure Tailwind CSS
+   - Set up shadcn/ui (copy components or install via CLI)
+
+2. **API client**
+   - `src/lib/api.ts` — Axios instance with base URL configuration
+   - `src/lib/queries.ts` — TanStack Query hooks:
+     - `useQueryDocument()` — POST /api/v1/query
+     - `useIngestDocument()` — POST /api/v1/ingest
+   - Type-safe request/response types matching backend Pydantic schemas
+
+3. **Layout structure**
+   - `src/App.tsx` — Root component with Phase 2 grid pre-allocation
+   - Grid layout using Tailwind CSS:
+     ```
+     Top-Left (50%):    VideoPlaceholder (hidden/empty in Phase 1)
+     Top-Right (50%):   QueryInput + KeywordsDisplay
+     Bottom (100%):     ResponsePanel
+     ```
+   - Use CSS Grid or Flexbox for clean separation
+
+### Day 2: Components & Integration
+
+1. **QueryInput component**
+   - `src/components/QueryInput.tsx`
+   - Textarea for question input
+   - Submit button with loading state
+   - Calls `useQueryDocument` mutation on submit
+
+2. **KeywordsDisplay component**
+   - `src/components/KeywordsDisplay.tsx`
+   - Shows extracted keywords as tags/chips
+   - Loading skeleton while keywords are being extracted
+   - Animated entrance when keywords arrive
+
+3. **ResponsePanel component**
+   - `src/components/ResponsePanel.tsx`
+   - Displays bullet-point answer
+   - Shows source metadata cards (filename, upload_date)
+   - Loading skeleton while answer is being generated
+   - Empty state when no query submitted yet
+
+4. **IngestPanel component (optional for Phase 1)**
+   - `src/components/IngestPanel.tsx`
+   - Simple file upload for DOCX
+   - Progress indicator during upload
+   - Success/error feedback
+
+5. **Error handling**
+   - Global error boundary
+   - Toast notifications for API errors
+   - Retry mechanism for failed queries
+
+### Day 3: Polish & Integration Testing
+
+1. **Loading states**
+   - Skeleton loaders for each panel
+   - Step-by-step progress indicator showing pipeline stage:
+     "Extracting keywords..." → "Retrieving documents..." → "Filtering relevance..." → "Generating answer..."
+
+2. **Styling polish**
+   - Consistent spacing and typography
+   - Dark/light mode support (optional)
+   - Smooth transitions between states
+
+3. **Integration with backend**
+   - End-to-end test: upload DOCX → ask question → verify keywords + answer + sources
+   - Verify CORS works correctly
+   - Test error scenarios
+
+4. **Build verification**
+   - `npm run build` succeeds
+   - Production build serves correctly via `npm run preview`
+
+---
+
+## Dependencies
+
+```json
+{
+  "dependencies": {
+    "react": "^18.2.0",
+    "react-dom": "^18.2.0",
+    "@tanstack/react-query": "^5.x",
+    "axios": "^1.6.x",
+    "tailwindcss": "^3.4.x",
+    "lucide-react": "^0.x"
+  },
+  "devDependencies": {
+    "@types/react": "^18.2.x",
+    "@types/react-dom": "^18.2.x",
+    "@vitejs/plugin-react": "^4.2.x",
+    "typescript": "^5.3.x",
+    "vite": "^5.0.x"
+  }
+}
+```
+
+---
+
+## Notes
+
+- Video area in Phase 1 should show a placeholder message: "Video upload coming in Phase 2" or be completely hidden
+- Keywords should be visually distinct from the final answer — consider using badges/tags
+- Source metadata cards should be collapsible to avoid cluttering the response area
+- Consider adding a "copy answer" button for convenience
\ No newline at end of file
diff --git a/development_plan.md b/development_plan.md
index 985da00..a54d87a 100644
--- a/development_plan.md
+++ b/development_plan.md
@@ -2,7 +2,7 @@
 
 **Project Overview**  
 Web-based application built in two phases.  
-- **Phase 1**: Text question → RAG retrieval → Point-form answer (strictly from database)  
+- **Phase 1**: Text question → query decomposition → RAG retrieval → relevance filtering → point-form answer (strictly from database)  
 - **Phase 2**: Video upload + player → real-time audio streaming → ASR transcription → question extraction → Phase 1 RAG flow  
 
 **Tech Stack**  
@@ -14,8 +14,8 @@ Web-based application built in two phases.
     - Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
 
 - **Models**:  
-  - Embedding: `qwen/qwen3-embedding-4b`  
-  - LLM: `qwen/qwen3.5-35b-a3b`  
+  - Embedding: `qwen/qwen3-embedding-4b` (via sentence-transformers, provider-switchable via `.env`)  
+  - LLM: `qwen/qwen3.5-35b-a3b` (OpenRouter for dev, local vLLM for prod)  
   - ASR: `Qwen/Qwen3-ASR-1.7B`  
 
 **Deployment**  
@@ -58,32 +58,56 @@ app/
 
 - **LLM/ASR Configuration**: Backend reads from `.env` for easy switching between development (OpenRouter / Alibaba Cloud) and production (local vLLM).  
 - **RAG Database**: ChromaDB with metadata support (filename + extracted content metadata).  
-- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers.  
-- **Document Ingestion**: Via UI (project-based demo, no user authentication).  
+- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers, provider-switchable via `.env` (OpenRouter for dev, local vLLM for prod).  
+- **Document Ingestion**: Via UI (project-based demo, no user authentication). Supported formats: DOCX.  
+- **Chunking Strategy**: 1000 tokens per chunk, 200 token overlap. Strategy abstracted for future replacement.  
 - **Video**: MP4 and common formats, maximum 300MB.  
-- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** “Ask from Video” button.  
-- **UI Layout**:  
-  - Top-Left: Video player  
-  - Top-Right: Real-time transcript + text input box  
+- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** "Ask from Video" button.  
+- **UI Layout** (Phase 2 grid, pre-allocated in Phase 1):  
+  - Top-Left: Video player (empty in Phase 1)  
+  - Top-Right: Text input box + extracted keywords display  
   - Bottom Half: RAG response (bullet points with source metadata)  
 - **Authentication**: Public demo (no login required).  
 - **Mobile**: Not required at this stage.  
+- **CORS**: Standard FastAPI CORS middleware for frontend-backend communication.
 
 ---
 
 ## Phase 1: Text Question → RAG → Point-Form Answer (5-7 days)
 
+### RAG Pipeline (3-Step LLM Workflow)
+
+```
+User Question
+    ↓
+[LLM Call 1] Extract key questions + keywords from user input
+    ↓                ← keywords shown to user in UI
+[ChromaDB] Retrieve chunks using extracted keywords
+    ↓
+[LLM Call 2] Single batch relevance filter — evaluate all chunks, drop irrelevant ones
+    ↓
+[LLM Call 3] Generate bullet-point response from filtered chunks only
+```
+
+- **Query Decomposition** (`services/query_decomposer.py`): LLM extracts key questions and search keywords from user's natural language question. Keywords are displayed to the user for transparency.  
+- **Relevance Filtering** (`services/relevance_filter.py`): Single batch LLM call receives all retrieved chunks + original question. Returns relevance verdict for each chunk. Irrelevant chunks are discarded before response generation.  
+- **Strict RAG Prompt**: Final LLM call generates bullet-point answer using ONLY filtered relevant chunks. No external knowledge allowed. Response format enforced via prompt engineering.  
+
 ### Backend (FastAPI)
-- Dynamic configuration via `.env` (LLM base URL, API key, model names).  
+- Dynamic configuration via `.env` (LLM base URL, API key, model names, embedding provider).  
 - `services/rag.py`: Persistent ChromaDB + Qwen embedding + metadata extraction (filename, upload date, content summary).  
-- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM with **strict RAG prompt** (only use retrieved context).  
+- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM.  
+- `services/query_decomposer.py`: LLM-based keyword/question extraction.  
+- `services/relevance_filter.py`: LLM-based batch relevance scoring.  
+- `utils/chunking.py`: DOCX parsing + text chunking (1000 tokens, 200 overlap). Strategy abstracted for future replacement.  
 - Endpoints:  
-  - `POST /api/v1/ingest` – Document upload and ingestion with metadata.  
-  - `POST /api/v1/query` – Question → retrieve → LLM → bullet-point response.
+  - `POST /api/v1/ingest` – DOCX upload, parsing, chunking, embedding, and ingestion with metadata.  
+  - `POST /api/v1/query` – Full 3-step pipeline: decompose → retrieve → filter → respond. Returns bullet-point answer + extracted keywords + source metadata.  
 
 ### Frontend (React + TS)
-- Clean layout: Top-right input box, bottom response area.  
+- Phase 2 grid layout pre-allocated: Top-Left video area (empty/hidden), Top-Right input area, Bottom response area.  
 - Type-safe API calls using TanStack Query.  
+- Display extracted keywords to user (shown before final answer arrives).  
 - Display answer as clean bullet list with source metadata.
 
 ---
@@ -137,4 +161,4 @@ app/
 **File Information**  
 - Filename: `development_plan.md`  
 - Last Updated: April 2026  
-- Status: Ready for implementation
+- Status: Phase 1 clarified, ready for sub-phase planning