docs: add Phase 1 backend and frontend development plans

This commit is contained in:
Woody 2026-04-22 15:47:27 +08:00
parent be48b1d8c7
commit 1f4e3a2572
3 changed files with 395 additions and 15 deletions

View File

@ -0,0 +1,204 @@
# Phase 1 Backend Development Plan
**Source**: `development_plan.md`
**Scope**: FastAPI backend for text-based RAG Q&A
**Estimated Duration**: 3-4 days
**Status**: Draft
---
## Objective
Build a complete FastAPI backend that:
1. Accepts DOCX uploads, chunks text (1000 tokens / 200 overlap), embeds via Qwen, and stores in persistent ChromaDB with metadata
2. Runs a 3-step RAG pipeline: query decomposition → retrieval → relevance filtering → bullet-point response
3. Serves API endpoints for ingestion and querying with full metadata attribution
---
## Acceptance Criteria
- [ ] `POST /api/v1/ingest` accepts DOCX, parses content, chunks at 1000/200, embeds, stores in ChromaDB with filename/upload_date/content_summary
- [ ] `POST /api/v1/query` accepts natural language question, returns JSON with: `keywords`, `answer` (bullet points), `sources` (array of metadata objects)
- [ ] Query pipeline executes 3 LLM calls: decomposition → relevance filter → response generation
- [ ] All LLM/ASR configuration reads from `.env` (OpenRouter for dev)
- [ ] ChromaDB persists to `chroma_db/` directory
- [ ] Chunking strategy is abstracted (interface/class) for future replacement
- [ ] All unit tests pass (`pytest app/test/test_phase1_*.py -v`)
- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
---
## Acceptance Tests
**File**: `backend/app/test/acceptance/test_acceptance_phase1_ingest.py`
- `test_ingest_docx_with_real_embedding()` — Upload DOCX, verify ChromaDB entries with metadata
**File**: `backend/app/test/acceptance/test_acceptance_phase1_rag_query.py`
- `test_query_with_real_llm()` — Ask question, verify 3-step pipeline produces bullet answer with sources
- `test_query_keywords_displayed()` — Verify response includes extracted keywords
---
## Implementation Tasks
### Day 1: Project Setup & Core Infrastructure
**Task 1.1**: Environment and dependencies
- Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
- Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
- Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
**Task 1.2**: Database initialization
- Create `backend/app/core/database.py` — ChromaDB persistent client
- Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
- Function: `get_or_create_collection(name, embedding_function)`
**Task 1.3**: Project structure
- Create all `__init__.py` files for package structure
- Create `backend/app/main.py` with FastAPI app, CORS middleware
- Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
**Task 1.4**: Pydantic schemas
- `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
- `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
- `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
### Day 2: Ingestion Pipeline
**Task 2.1**: DOCX parsing
- `utils/docx_parser.py`: `parse_docx(file_path) -> str`
- Handle paragraphs, tables, headers
- Return plain text with preserved paragraph breaks
**Task 2.2**: Chunking abstraction
- `utils/chunking.py`: Abstract base class `ChunkingStrategy`
- `TokenChunkingStrategy` implementation using tiktoken
- Config: chunk_size=1000, overlap=200
- Method: `chunk(text: str) -> list[str]`
**Task 2.3**: Metadata extraction
- `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
- Returns list of metadata dicts matching chunk count
- Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
**Task 2.4**: Embedding service
- `services/rag.py`: `RAGService` class
- Initialize embedding function with `qwen/qwen3-embedding-4b`
- Method: `ingest_document(file_path, chunks, metadata_list)`
- Store in ChromaDB collection "documents"
**Task 2.5**: Ingest endpoint
- `routers/ingest.py`: `POST /api/v1/ingest`
- Accept `UploadFile` (DOCX only, validate extension)
- Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
- Return `IngestResponse`
**Task 2.6**: Unit tests
- `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
- `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
### Day 3: Query Pipeline (3-Step)
**Task 3.1**: LLM client
- `services/llm_client.py`: `LLMClient` class
- Constructor takes config from `Settings`
- Method: `complete(prompt: str, temperature: float = 0.7) -> str`
- Use httpx with OpenAI-compatible API format
- Handle errors gracefully
**Task 3.2**: Query decomposition
- `services/query_decomposer.py`: `QueryDecomposer` class
- Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
- Method: `decompose(question: str) -> list[str]`
- Parse LLM JSON response into list of keywords
**Task 3.3**: Retrieval from ChromaDB
- `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
- Join keywords with space for query text
- Return list of `(chunk_text, metadata, distance)` tuples
**Task 3.4**: Relevance filtering
- `services/relevance_filter.py`: `RelevanceFilter` class
- Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
- Input: list of chunks
- Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
- Batch all chunks in single LLM call
**Task 3.5**: Response generation
- `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
- Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
- Include chunk content and metadata in context
- Enforce bullet-point format via prompt
**Task 3.6**: Query endpoint
- `routers/query.py`: `POST /api/v1/query`
- Full pipeline orchestration:
1. Call `query_decomposer.decompose()` → get keywords
2. Call `rag.retrieve()` → get chunks
3. Call `relevance_filter.filter()` → filter chunks
4. Call `rag.generate_response()` → get answer
- Return `QueryResponse` with keywords, answer, sources
### Day 4: Testing & Polish
**Task 4.1**: Unit tests
- `test_phase1_query.py`: Test full pipeline with mocked LLM calls
- `test_phase1_llm_client.py`: Test LLM client error handling
- `test_phase1_rag_service.py`: Test retrieval and response generation
**Task 4.2**: Acceptance tests
- Create real `.env` with OpenRouter credentials
- Run `test_acceptance_phase1_ingest.py` with real embedding
- Run `test_acceptance_phase1_rag_query.py` with real LLM calls
- Verify keywords appear, answer is bullet format, sources have metadata
**Task 4.3**: Error handling
- Add try/except in all endpoints
- Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
- Log errors with context
**Task 4.4**: Documentation
- Update `AGENTS.md` if any conventions changed
- Add docstrings to all public methods
- Verify all imports work
---
## New Services Required
| Service | File | Responsibility |
|---------|------|----------------|
| Config | `core/config.py` | `.env` loading, Settings class |
| Database | `core/database.py` | ChromaDB persistent client |
| LLM Client | `services/llm_client.py` | OpenAI-compatible API wrapper |
| Query Decomposer | `services/query_decomposer.py` | Extract keywords from question |
| Relevance Filter | `services/relevance_filter.py` | Batch score chunk relevance |
| RAG Service | `services/rag.py` | Embedding, retrieval, response generation |
| DOCX Parser | `utils/docx_parser.py` | Extract text from DOCX |
| Chunking | `utils/chunking.py` | Token-based chunking with overlap |
| Metadata | `utils/metadata.py` | Extract file metadata |
---
## Environment Variables
```bash
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your_openrouter_key
LLM_MODEL_NAME=qwen/qwen3.5-35b-a3b
EMBEDDING_MODEL=qwen/qwen3-embedding-4b
EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
CHROMA_DB_PATH=./chroma_db
```
---
## Notes
- Chunking strategy uses ABC pattern for easy future replacement
- Relevance filtering uses single batch call for efficiency
- All LLM calls go through `LLMClient` for consistent error handling
- ChromaDB collection name: "documents"
- Metadata fields: filename, upload_date (ISO format), content_summary, chunk_index
- Response format enforced purely through prompt engineering (no JSON schema)

View File

@ -0,0 +1,152 @@
# Phase 1 Frontend Development Plan
**Source**: `development_plan.md`
**Scope**: React 18 + TypeScript + Vite frontend for text-based RAG Q&A
**Estimated Duration**: 2-3 days
**Status**: Draft
---
## Objective
Build a React frontend that:
1. Pre-allocates Phase 2 grid layout (video area empty/hidden in Phase 1)
2. Allows text input and displays extracted keywords + bullet-point RAG responses with source metadata
3. Uses TanStack Query for type-safe API calls to the FastAPI backend
---
## Acceptance Criteria
- [ ] Phase 2 grid layout renders: Top-Left (empty video placeholder), Top-Right (input + keywords), Bottom (response)
- [ ] User can type a question and submit
- [ ] Extracted keywords displayed prominently before final answer
- [ ] Bullet-point answer displayed with source metadata (filename, upload_date)
- [ ] Loading states for each pipeline step (keywords loading, answer loading)
- [ ] Error handling for API failures
- [ ] Responsive within desktop viewport (no mobile required)
- [ ] All API calls use TanStack Query with proper caching/invalidation
---
## Acceptance Tests
**File**: `frontend/src/test/e2e/phase1_query_flow.spec.ts` (or manual acceptance checklist)
- User types question → sees keywords appear → sees bullet answer with sources
- Empty state handled gracefully
- API error shows user-friendly message
---
## Implementation Tasks
### Day 1: Project Setup & Layout
1. **Project scaffold**
- Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
- Install dependencies: `tailwindcss`, `postcss`, `autoprefixer`, `@tanstack/react-query`, `axios`
- Configure Tailwind CSS
- Set up shadcn/ui (copy components or install via CLI)
2. **API client**
- `src/lib/api.ts` — Axios instance with base URL configuration
- `src/lib/queries.ts` — TanStack Query hooks:
- `useQueryDocument()` — POST /api/v1/query
- `useIngestDocument()` — POST /api/v1/ingest
- Type-safe request/response types matching backend Pydantic schemas
3. **Layout structure**
- `src/App.tsx` — Root component with Phase 2 grid pre-allocation
- Grid layout using Tailwind CSS:
```
Top-Left (50%): VideoPlaceholder (hidden/empty in Phase 1)
Top-Right (50%): QueryInput + KeywordsDisplay
Bottom (100%): ResponsePanel
```
- Use CSS Grid or Flexbox for clean separation
### Day 2: Components & Integration
1. **QueryInput component**
- `src/components/QueryInput.tsx`
- Textarea for question input
- Submit button with loading state
- Calls `useQueryDocument` mutation on submit
2. **KeywordsDisplay component**
- `src/components/KeywordsDisplay.tsx`
- Shows extracted keywords as tags/chips
- Loading skeleton while keywords are being extracted
- Animated entrance when keywords arrive
3. **ResponsePanel component**
- `src/components/ResponsePanel.tsx`
- Displays bullet-point answer
- Shows source metadata cards (filename, upload_date)
- Loading skeleton while answer is being generated
- Empty state when no query submitted yet
4. **IngestPanel component (optional for Phase 1)**
- `src/components/IngestPanel.tsx`
- Simple file upload for DOCX
- Progress indicator during upload
- Success/error feedback
5. **Error handling**
- Global error boundary
- Toast notifications for API errors
- Retry mechanism for failed queries
### Day 3: Polish & Integration Testing
1. **Loading states**
- Skeleton loaders for each panel
- Step-by-step progress indicator showing pipeline stage:
"Extracting keywords..." → "Retrieving documents..." → "Filtering relevance..." → "Generating answer..."
2. **Styling polish**
- Consistent spacing and typography
- Dark/light mode support (optional)
- Smooth transitions between states
3. **Integration with backend**
- End-to-end test: upload DOCX → ask question → verify keywords + answer + sources
- Verify CORS works correctly
- Test error scenarios
4. **Build verification**
- `npm run build` succeeds
- Production build serves correctly via `npm run preview`
---
## Dependencies
```json
{
"dependencies": {
"react": "^18.2.0",
"react-dom": "^18.2.0",
"@tanstack/react-query": "^5.x",
"axios": "^1.6.x",
"tailwindcss": "^3.4.x",
"lucide-react": "^0.x"
},
"devDependencies": {
"@types/react": "^18.2.x",
"@types/react-dom": "^18.2.x",
"@vitejs/plugin-react": "^4.2.x",
"typescript": "^5.3.x",
"vite": "^5.0.x"
}
}
```
---
## Notes
- Video area in Phase 1 should show a placeholder message: "Video upload coming in Phase 2" or be completely hidden
- Keywords should be visually distinct from the final answer — consider using badges/tags
- Source metadata cards should be collapsible to avoid cluttering the response area
- Consider adding a "copy answer" button for convenience

View File

@ -2,7 +2,7 @@
**Project Overview** **Project Overview**
Web-based application built in two phases. Web-based application built in two phases.
- **Phase 1**: Text question → RAG retrieval → Point-form answer (strictly from database) - **Phase 1**: Text question → query decomposition → RAG retrieval → relevance filtering → point-form answer (strictly from database)
- **Phase 2**: Video upload + player → real-time audio streaming → ASR transcription → question extraction → Phase 1 RAG flow - **Phase 2**: Video upload + player → real-time audio streaming → ASR transcription → question extraction → Phase 1 RAG flow
**Tech Stack** **Tech Stack**
@ -14,8 +14,8 @@ Web-based application built in two phases.
- Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727 - Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
- **Models**: - **Models**:
- Embedding: `qwen/qwen3-embedding-4b` - Embedding: `qwen/qwen3-embedding-4b` (via sentence-transformers, provider-switchable via `.env`)
- LLM: `qwen/qwen3.5-35b-a3b` - LLM: `qwen/qwen3.5-35b-a3b` (OpenRouter for dev, local vLLM for prod)
- ASR: `Qwen/Qwen3-ASR-1.7B` - ASR: `Qwen/Qwen3-ASR-1.7B`
**Deployment** **Deployment**
@ -58,32 +58,56 @@ app/
- **LLM/ASR Configuration**: Backend reads from `.env` for easy switching between development (OpenRouter / Alibaba Cloud) and production (local vLLM). - **LLM/ASR Configuration**: Backend reads from `.env` for easy switching between development (OpenRouter / Alibaba Cloud) and production (local vLLM).
- **RAG Database**: ChromaDB with metadata support (filename + extracted content metadata). - **RAG Database**: ChromaDB with metadata support (filename + extracted content metadata).
- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers. - **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers, provider-switchable via `.env` (OpenRouter for dev, local vLLM for prod).
- **Document Ingestion**: Via UI (project-based demo, no user authentication). - **Document Ingestion**: Via UI (project-based demo, no user authentication). Supported formats: DOCX.
- **Chunking Strategy**: 1000 tokens per chunk, 200 token overlap. Strategy abstracted for future replacement.
- **Video**: MP4 and common formats, maximum 300MB. - **Video**: MP4 and common formats, maximum 300MB.
- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** “Ask from Video” button. - **ASR Flow**: Both **automatic** (on transcript updates) and **manual** "Ask from Video" button.
- **UI Layout**: - **UI Layout** (Phase 2 grid, pre-allocated in Phase 1):
- Top-Left: Video player - Top-Left: Video player (empty in Phase 1)
- Top-Right: Real-time transcript + text input box - Top-Right: Text input box + extracted keywords display
- Bottom Half: RAG response (bullet points with source metadata) - Bottom Half: RAG response (bullet points with source metadata)
- **Authentication**: Public demo (no login required). - **Authentication**: Public demo (no login required).
- **Mobile**: Not required at this stage. - **Mobile**: Not required at this stage.
- **CORS**: Standard FastAPI CORS middleware for frontend-backend communication.
--- ---
## Phase 1: Text Question → RAG → Point-Form Answer (5-7 days) ## Phase 1: Text Question → RAG → Point-Form Answer (5-7 days)
### RAG Pipeline (3-Step LLM Workflow)
```
User Question
[LLM Call 1] Extract key questions + keywords from user input
↓ ← keywords shown to user in UI
[ChromaDB] Retrieve chunks using extracted keywords
[LLM Call 2] Single batch relevance filter — evaluate all chunks, drop irrelevant ones
[LLM Call 3] Generate bullet-point response from filtered chunks only
```
- **Query Decomposition** (`services/query_decomposer.py`): LLM extracts key questions and search keywords from user's natural language question. Keywords are displayed to the user for transparency.
- **Relevance Filtering** (`services/relevance_filter.py`): Single batch LLM call receives all retrieved chunks + original question. Returns relevance verdict for each chunk. Irrelevant chunks are discarded before response generation.
- **Strict RAG Prompt**: Final LLM call generates bullet-point answer using ONLY filtered relevant chunks. No external knowledge allowed. Response format enforced via prompt engineering.
### Backend (FastAPI) ### Backend (FastAPI)
- Dynamic configuration via `.env` (LLM base URL, API key, model names). - Dynamic configuration via `.env` (LLM base URL, API key, model names, embedding provider).
- `services/rag.py`: Persistent ChromaDB + Qwen embedding + metadata extraction (filename, upload date, content summary). - `services/rag.py`: Persistent ChromaDB + Qwen embedding + metadata extraction (filename, upload date, content summary).
- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM with **strict RAG prompt** (only use retrieved context). - `services/llm_client.py`: OpenAI-compatible client for Qwen LLM.
- `services/query_decomposer.py`: LLM-based keyword/question extraction.
- `services/relevance_filter.py`: LLM-based batch relevance scoring.
- `utils/chunking.py`: DOCX parsing + text chunking (1000 tokens, 200 overlap). Strategy abstracted for future replacement.
- Endpoints: - Endpoints:
- `POST /api/v1/ingest` Document upload and ingestion with metadata. - `POST /api/v1/ingest` DOCX upload, parsing, chunking, embedding, and ingestion with metadata.
- `POST /api/v1/query` Question → retrieve → LLM → bullet-point response. - `POST /api/v1/query` Full 3-step pipeline: decompose → retrieve → filter → respond. Returns bullet-point answer + extracted keywords + source metadata.
### Frontend (React + TS) ### Frontend (React + TS)
- Clean layout: Top-right input box, bottom response area. - Phase 2 grid layout pre-allocated: Top-Left video area (empty/hidden), Top-Right input area, Bottom response area.
- Type-safe API calls using TanStack Query. - Type-safe API calls using TanStack Query.
- Display extracted keywords to user (shown before final answer arrives).
- Display answer as clean bullet list with source metadata. - Display answer as clean bullet list with source metadata.
--- ---
@ -137,4 +161,4 @@ app/
**File Information** **File Information**
- Filename: `development_plan.md` - Filename: `development_plan.md`
- Last Updated: April 2026 - Last Updated: April 2026
- Status: Ready for implementation - Status: Phase 1 clarified, ready for sub-phase planning