From 1f4e3a257262d7b3539b902c5ebb36a754e5be87 Mon Sep 17 00:00:00 2001
From: Woody
Date: Wed, 22 Apr 2026 15:47:27 +0800
Subject: [PATCH] docs: add Phase 1 backend and frontend development plans

---
 .plans/phase1_backend_plan.md  | 204 +++++++++++++++++++++++++++++++++
 .plans/phase1_frontend_plan.md | 152 ++++++++++++++++++++++++
 development_plan.md            |  54 ++++++---
 3 files changed, 395 insertions(+), 15 deletions(-)
 create mode 100644 .plans/phase1_backend_plan.md
 create mode 100644 .plans/phase1_frontend_plan.md

diff --git a/.plans/phase1_backend_plan.md b/.plans/phase1_backend_plan.md
new file mode 100644
index 0000000..4cf60c2
--- /dev/null
+++ b/.plans/phase1_backend_plan.md
@@ -0,0 +1,204 @@
+# Phase 1 Backend Development Plan
+
+**Source**: `development_plan.md`
+**Scope**: FastAPI backend for text-based RAG Q&A
+**Estimated Duration**: 3-4 days
+**Status**: Draft
+
+---
+
+## Objective
+
+Build a complete FastAPI backend that:
+1. Accepts DOCX uploads, chunks text (1000 tokens / 200 overlap), embeds via Qwen, and stores in persistent ChromaDB with metadata
+2. Runs a 3-step RAG pipeline: query decomposition → retrieval → relevance filtering → bullet-point response
+3. Serves API endpoints for ingestion and querying with full metadata attribution
+
+---
+
+## Acceptance Criteria
+
+- [ ] `POST /api/v1/ingest` accepts DOCX, parses content, chunks at 1000/200, embeds, stores in ChromaDB with filename/upload_date/content_summary
+- [ ] `POST /api/v1/query` accepts natural language question, returns JSON with: `keywords`, `answer` (bullet points), `sources` (array of metadata objects)
+- [ ] Query pipeline executes 3 LLM calls: decomposition → relevance filter → response generation
+- [ ] All LLM/ASR configuration reads from `.env` (OpenRouter for dev)
+- [ ] ChromaDB persists to `chroma_db/` directory
+- [ ] Chunking strategy is abstracted (interface/class) for future replacement
+- [ ] All unit tests pass (`pytest app/test/test_phase1_*.py -v`)
+- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
+
+---
+
+## Acceptance Tests
+
+**File**: `backend/app/test/acceptance/test_acceptance_phase1_ingest.py`
+- `test_ingest_docx_with_real_embedding()` — Upload DOCX, verify ChromaDB entries with metadata
+
+**File**: `backend/app/test/acceptance/test_acceptance_phase1_rag_query.py`
+- `test_query_with_real_llm()` — Ask question, verify 3-step pipeline produces bullet answer with sources
+- `test_query_keywords_displayed()` — Verify response includes extracted keywords
+
+---
+
+## Implementation Tasks
+
+### Day 1: Project Setup & Core Infrastructure
+
+**Task 1.1**: Environment and dependencies
+- Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
+- Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
+- Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
+
+**Task 1.2**: Database initialization
+- Create `backend/app/core/database.py` — ChromaDB persistent client
+- Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
+- Function: `get_or_create_collection(name, embedding_function)`
+
+**Task 1.3**: Project structure
+- Create all `__init__.py` files for package structure
+- Create `backend/app/main.py` with FastAPI app, CORS middleware
+- Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
+
+**Task 1.4**: Pydantic schemas
+- `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
+- `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
+- `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
+
+### Day 2: Ingestion Pipeline
+
+**Task 2.1**: DOCX parsing
+- `utils/docx_parser.py`: `parse_docx(file_path) -> str`
+- Handle paragraphs, tables, headers
+- Return plain text with preserved paragraph breaks
+
+**Task 2.2**: Chunking abstraction
+- `utils/chunking.py`: Abstract base class `ChunkingStrategy`
+- `TokenChunkingStrategy` implementation using tiktoken
+- Config: chunk_size=1000, overlap=200
+- Method: `chunk(text: str) -> list[str]`
+
+**Task 2.3**: Metadata extraction
+- `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
+- Returns list of metadata dicts matching chunk count
+- Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
+
+**Task 2.4**: Embedding service
+- `services/rag.py`: `RAGService` class
+- Initialize embedding function with `qwen/qwen3-embedding-4b`
+- Method: `ingest_document(file_path, chunks, metadata_list)`
+- Store in ChromaDB collection "documents"
+
+**Task 2.5**: Ingest endpoint
+- `routers/ingest.py`: `POST /api/v1/ingest`
+- Accept `UploadFile` (DOCX only, validate extension)
+- Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
+- Return `IngestResponse`
+
+**Task 2.6**: Unit tests
+- `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
+- `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
+
+### Day 3: Query Pipeline (3-Step)
+
+**Task 3.1**: LLM client
+- `services/llm_client.py`: `LLMClient` class
+- Constructor takes config from `Settings`
+- Method: `complete(prompt: str, temperature: float = 0.7) -> str`
+- Use httpx with OpenAI-compatible API format
+- Handle errors gracefully
+
+**Task 3.2**: Query decomposition
+- `services/query_decomposer.py`: `QueryDecomposer` class
+- Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
+- Method: `decompose(question: str) -> list[str]`
+- Parse LLM JSON response into list of keywords
+
+**Task 3.3**: Retrieval from ChromaDB
+- `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
+- Join keywords with space for query text
+- Return list of `(chunk_text, metadata, distance)` tuples
+
+**Task 3.4**: Relevance filtering
+- `services/relevance_filter.py`: `RelevanceFilter` class
+- Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
+- Input: list of chunks
+- Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
+- Batch all chunks in single LLM call
+
+**Task 3.5**: Response generation
+- `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
+- Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
+- Include chunk content and metadata in context
+- Enforce bullet-point format via prompt
+
+**Task 3.6**: Query endpoint
+- `routers/query.py`: `POST /api/v1/query`
+- Full pipeline orchestration:
+  1. Call `query_decomposer.decompose()` → get keywords
+  2. Call `rag.retrieve()` → get chunks
+  3. Call `relevance_filter.filter()` → filter chunks
+  4. Call `rag.generate_response()` → get answer
+- Return `QueryResponse` with keywords, answer, sources
+
+### Day 4: Testing & Polish
+
+**Task 4.1**: Unit tests
+- `test_phase1_query.py`: Test full pipeline with mocked LLM calls
+- `test_phase1_llm_client.py`: Test LLM client error handling
+- `test_phase1_rag_service.py`: Test retrieval and response generation
+
+**Task 4.2**: Acceptance tests
+- Create real `.env` with OpenRouter credentials
+- Run `test_acceptance_phase1_ingest.py` with real embedding
+- Run `test_acceptance_phase1_rag_query.py` with real LLM calls
+- Verify keywords appear, answer is bullet format, sources have metadata
+
+**Task 4.3**: Error handling
+- Add try/except in all endpoints
+- Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
+- Log errors with context
+
+**Task 4.4**: Documentation
+- Update `AGENTS.md` if any conventions changed
+- Add docstrings to all public methods
+- Verify all imports work
+
+---
+
+## New Services Required
+
+| Service | File | Responsibility |
+|---------|------|----------------|
+| Config | `core/config.py` | `.env` loading, Settings class |
+| Database | `core/database.py` | ChromaDB persistent client |
+| LLM Client | `services/llm_client.py` | OpenAI-compatible API wrapper |
+| Query Decomposer | `services/query_decomposer.py` | Extract keywords from question |
+| Relevance Filter | `services/relevance_filter.py` | Batch score chunk relevance |
+| RAG Service | `services/rag.py` | Embedding, retrieval, response generation |
+| DOCX Parser | `utils/docx_parser.py` | Extract text from DOCX |
+| Chunking | `utils/chunking.py` | Token-based chunking with overlap |
+| Metadata | `utils/metadata.py` | Extract file metadata |
+
+---
+
+## Environment Variables
+
+```bash
+LLM_BASE_URL=https://openrouter.ai/api/v1
+LLM_API_KEY=your_openrouter_key
+LLM_MODEL_NAME=qwen/qwen3.5-35b-a3b
+EMBEDDING_MODEL=qwen/qwen3-embedding-4b
+EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
+CHROMA_DB_PATH=./chroma_db
+```
+
+---
+
+## Notes
+
+- Chunking strategy uses ABC pattern for easy future replacement
+- Relevance filtering uses single batch call for efficiency
+- All LLM calls go through `LLMClient` for consistent error handling
+- ChromaDB collection name: "documents"
+- Metadata fields: filename, upload_date (ISO format), content_summary, chunk_index
+- Response format enforced purely through prompt engineering (no JSON schema)
\ No newline at end of file
diff --git a/.plans/phase1_frontend_plan.md b/.plans/phase1_frontend_plan.md
new file mode 100644
index 0000000..5857d75
--- /dev/null
+++ b/.plans/phase1_frontend_plan.md
@@ -0,0 +1,152 @@
+# Phase 1 Frontend Development Plan
+
+**Source**: `development_plan.md`
+**Scope**: React 18 + TypeScript + Vite frontend for text-based RAG Q&A
+**Estimated Duration**: 2-3 days
+**Status**: Draft
+
+---
+
+## Objective
+
+Build a React frontend that:
+1. Pre-allocates Phase 2 grid layout (video area empty/hidden in Phase 1)
+2. Allows text input and displays extracted keywords + bullet-point RAG responses with source metadata
+3. Uses TanStack Query for type-safe API calls to the FastAPI backend
+
+---
+
+## Acceptance Criteria
+
+- [ ] Phase 2 grid layout renders: Top-Left (empty video placeholder), Top-Right (input + keywords), Bottom (response)
+- [ ] User can type a question and submit
+- [ ] Extracted keywords displayed prominently before final answer
+- [ ] Bullet-point answer displayed with source metadata (filename, upload_date)
+- [ ] Loading states for each pipeline step (keywords loading, answer loading)
+- [ ] Error handling for API failures
+- [ ] Responsive within desktop viewport (no mobile required)
+- [ ] All API calls use TanStack Query with proper caching/invalidation
+
+---
+
+## Acceptance Tests
+
+**File**: `frontend/src/test/e2e/phase1_query_flow.spec.ts` (or manual acceptance checklist)
+- User types question → sees keywords appear → sees bullet answer with sources
+- Empty state handled gracefully
+- API error shows user-friendly message
+
+---
+
+## Implementation Tasks
+
+### Day 1: Project Setup & Layout
+
+1. **Project scaffold**
+   - Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
+   - Install dependencies: `tailwindcss`, `postcss`, `autoprefixer`, `@tanstack/react-query`, `axios`
+   - Configure Tailwind CSS
+   - Set up shadcn/ui (copy components or install via CLI)
+
+2. **API client**
+   - `src/lib/api.ts` — Axios instance with base URL configuration
+   - `src/lib/queries.ts` — TanStack Query hooks:
+     - `useQueryDocument()` — POST /api/v1/query
+     - `useIngestDocument()` — POST /api/v1/ingest
+   - Type-safe request/response types matching backend Pydantic schemas
+
+3. **Layout structure**
+   - `src/App.tsx` — Root component with Phase 2 grid pre-allocation
+   - Grid layout using Tailwind CSS:
+     ```
+     Top-Left (50%): VideoPlaceholder (hidden/empty in Phase 1)
+     Top-Right (50%): QueryInput + KeywordsDisplay
+     Bottom (100%): ResponsePanel
+     ```
+   - Use CSS Grid or Flexbox for clean separation
+
+### Day 2: Components & Integration
+
+1. **QueryInput component**
+   - `src/components/QueryInput.tsx`
+   - Textarea for question input
+   - Submit button with loading state
+   - Calls `useQueryDocument` mutation on submit
+
+2. **KeywordsDisplay component**
+   - `src/components/KeywordsDisplay.tsx`
+   - Shows extracted keywords as tags/chips
+   - Loading skeleton while keywords are being extracted
+   - Animated entrance when keywords arrive
+
+3. **ResponsePanel component**
+   - `src/components/ResponsePanel.tsx`
+   - Displays bullet-point answer
+   - Shows source metadata cards (filename, upload_date)
+   - Loading skeleton while answer is being generated
+   - Empty state when no query submitted yet
+
+4. **IngestPanel component (optional for Phase 1)**
+   - `src/components/IngestPanel.tsx`
+   - Simple file upload for DOCX
+   - Progress indicator during upload
+   - Success/error feedback
+
+5. **Error handling**
+   - Global error boundary
+   - Toast notifications for API errors
+   - Retry mechanism for failed queries
+
+### Day 3: Polish & Integration Testing
+
+1. **Loading states**
+   - Skeleton loaders for each panel
+   - Step-by-step progress indicator showing pipeline stage:
+     "Extracting keywords..." → "Retrieving documents..." → "Filtering relevance..." → "Generating answer..."
+
+2. **Styling polish**
+   - Consistent spacing and typography
+   - Dark/light mode support (optional)
+   - Smooth transitions between states
+
+3. **Integration with backend**
+   - End-to-end test: upload DOCX → ask question → verify keywords + answer + sources
+   - Verify CORS works correctly
+   - Test error scenarios
+
+4. **Build verification**
+   - `npm run build` succeeds
+   - Production build serves correctly via `npm run preview`
+
+---
+
+## Dependencies
+
+```json
+{
+  "dependencies": {
+    "react": "^18.2.0",
+    "react-dom": "^18.2.0",
+    "@tanstack/react-query": "^5.x",
+    "axios": "^1.6.x",
+    "tailwindcss": "^3.4.x",
+    "lucide-react": "^0.x"
+  },
+  "devDependencies": {
+    "@types/react": "^18.2.x",
+    "@types/react-dom": "^18.2.x",
+    "@vitejs/plugin-react": "^4.2.x",
+    "typescript": "^5.3.x",
+    "vite": "^5.0.x"
+  }
+}
+```
+
+---
+
+## Notes
+
+- Video area in Phase 1 should show a placeholder message: "Video upload coming in Phase 2" or be completely hidden
+- Keywords should be visually distinct from the final answer — consider using badges/tags
+- Source metadata cards should be collapsible to avoid cluttering the response area
+- Consider adding a "copy answer" button for convenience
\ No newline at end of file
diff --git a/development_plan.md b/development_plan.md
index 985da00..a54d87a 100644
--- a/development_plan.md
+++ b/development_plan.md
@@ -2,7 +2,7 @@

 **Project Overview**
 Web-based application built in two phases.
-- **Phase 1**: Text question → RAG retrieval → Point-form answer (strictly from database)
+- **Phase 1**: Text question → query decomposition → RAG retrieval → relevance filtering → point-form answer (strictly from database)
 - **Phase 2**: Video upload + player → real-time audio streaming → ASR transcription → question extraction → Phase 1 RAG flow

 **Tech Stack**
@@ -14,8 +14,8 @@ Web-based application built in two phases.
   - Alibaba Cloud reference: https://modelstudio.console.alibabacloud.com/ap-southeast-1?switchAgent=101503&tab=doc&productCode=p_efm&switchUserType=3#/doc/?type=model&url=2989727
 - **Models**:
-  - Embedding: `qwen/qwen3-embedding-4b`
-  - LLM: `qwen/qwen3.5-35b-a3b`
+  - Embedding: `qwen/qwen3-embedding-4b` (via sentence-transformers, provider-switchable via `.env`)
+  - LLM: `qwen/qwen3.5-35b-a3b` (OpenRouter for dev, local vLLM for prod)
   - ASR: `Qwen/Qwen3-ASR-1.7B`

 **Deployment**
@@ -58,32 +58,56 @@ app/

 - **LLM/ASR Configuration**: Backend reads from `.env` for easy switching between development (OpenRouter / Alibaba Cloud) and production (local vLLM).
 - **RAG Database**: ChromaDB with metadata support (filename + extracted content metadata).
-- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers.
-- **Document Ingestion**: Via UI (project-based demo, no user authentication).
+- **Embedding Model**: `qwen/qwen3-embedding-4b` via sentence-transformers, provider-switchable via `.env` (OpenRouter for dev, local vLLM for prod).
+- **Document Ingestion**: Via UI (project-based demo, no user authentication). Supported formats: DOCX.
+- **Chunking Strategy**: 1000 tokens per chunk, 200 token overlap. Strategy abstracted for future replacement.
 - **Video**: MP4 and common formats, maximum 300MB.
-- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** “Ask from Video” button.
-- **UI Layout**:
-  - Top-Left: Video player
-  - Top-Right: Real-time transcript + text input box
+- **ASR Flow**: Both **automatic** (on transcript updates) and **manual** "Ask from Video" button.
+- **UI Layout** (Phase 2 grid, pre-allocated in Phase 1):
+  - Top-Left: Video player (empty in Phase 1)
+  - Top-Right: Text input box + extracted keywords display
   - Bottom Half: RAG response (bullet points with source metadata)
 - **Authentication**: Public demo (no login required).
 - **Mobile**: Not required at this stage.
+- **CORS**: Standard FastAPI CORS middleware for frontend-backend communication.

 ---

 ## Phase 1: Text Question → RAG → Point-Form Answer (5-7 days)

+### RAG Pipeline (3-Step LLM Workflow)
+
+```
+User Question
+    ↓
+[LLM Call 1] Extract key questions + keywords from user input
+    ↓ ← keywords shown to user in UI
+[ChromaDB] Retrieve chunks using extracted keywords
+    ↓
+[LLM Call 2] Single batch relevance filter — evaluate all chunks, drop irrelevant ones
+    ↓
+[LLM Call 3] Generate bullet-point response from filtered chunks only
+```
+
+- **Query Decomposition** (`services/query_decomposer.py`): LLM extracts key questions and search keywords from user's natural language question. Keywords are displayed to the user for transparency.
+- **Relevance Filtering** (`services/relevance_filter.py`): Single batch LLM call receives all retrieved chunks + original question. Returns relevance verdict for each chunk. Irrelevant chunks are discarded before response generation.
+- **Strict RAG Prompt**: Final LLM call generates bullet-point answer using ONLY filtered relevant chunks. No external knowledge allowed. Response format enforced via prompt engineering.
+
 ### Backend (FastAPI)
-- Dynamic configuration via `.env` (LLM base URL, API key, model names).
+- Dynamic configuration via `.env` (LLM base URL, API key, model names, embedding provider).
 - `services/rag.py`: Persistent ChromaDB + Qwen embedding + metadata extraction (filename, upload date, content summary).
-- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM with **strict RAG prompt** (only use retrieved context).
+- `services/llm_client.py`: OpenAI-compatible client for Qwen LLM.
+- `services/query_decomposer.py`: LLM-based keyword/question extraction.
+- `services/relevance_filter.py`: LLM-based batch relevance scoring.
+- `utils/chunking.py`: DOCX parsing + text chunking (1000 tokens, 200 overlap). Strategy abstracted for future replacement.
 - Endpoints:
-  - `POST /api/v1/ingest` – Document upload and ingestion with metadata.
-  - `POST /api/v1/query` – Question → retrieve → LLM → bullet-point response.
+  - `POST /api/v1/ingest` – DOCX upload, parsing, chunking, embedding, and ingestion with metadata.
+  - `POST /api/v1/query` – Full 3-step pipeline: decompose → retrieve → filter → respond. Returns bullet-point answer + extracted keywords + source metadata.

 ### Frontend (React + TS)
-- Clean layout: Top-right input box, bottom response area.
+- Phase 2 grid layout pre-allocated: Top-Left video area (empty/hidden), Top-Right input area, Bottom response area.
 - Type-safe API calls using TanStack Query.
+- Display extracted keywords to user (shown before final answer arrives).
 - Display answer as clean bullet list with source metadata.

 ---
@@ -137,4 +161,4 @@ app/
 **File Information**
 - Filename: `development_plan.md`
 - Last Updated: April 2026
-- Status: Ready for implementation
+- Status: Phase 1 clarified, ready for sub-phase planning