# Phase 1 Backend Development Plan

**Source**: `development_plan.md`
**Scope**: FastAPI backend for text-based RAG Q&A
**Estimated Duration**: 3-4 days
**Status**: Draft

---

## Objective

Build a complete FastAPI backend that:

1. Accepts DOCX uploads, chunks text (1000 tokens / 200 overlap), embeds via Qwen, and stores in persistent ChromaDB with metadata
2. Runs a 3-step RAG pipeline: query decomposition → retrieval → relevance filtering → bullet-point response
3. Serves API endpoints for ingestion and querying with full metadata attribution

---

## Acceptance Criteria

- [ ] `POST /api/v1/ingest` accepts DOCX, parses content, chunks at 1000/200, embeds, stores in ChromaDB with filename/upload_date/content_summary
- [ ] `POST /api/v1/query` accepts natural language question, returns JSON with: `keywords`, `answer` (bullet points), `sources` (array of metadata objects)
- [ ] Query pipeline executes 3 LLM calls: decomposition → relevance filter → response generation
- [ ] All LLM/ASR configuration reads from `.env` (OpenRouter for dev)
- [ ] ChromaDB persists to `chroma_db/` directory
- [ ] Chunking strategy is abstracted (interface/class) for future replacement
- [ ] All unit tests pass (`pytest app/test/test_phase1_*.py -v`)
- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)

---

## Acceptance Tests

**File**: `backend/app/test/acceptance/test_acceptance_phase1_ingest.py`

- `test_ingest_docx_with_real_embedding()`: Upload DOCX, verify ChromaDB entries with metadata

**File**: `backend/app/test/acceptance/test_acceptance_phase1_rag_query.py`

- `test_query_with_real_llm()`: Ask question, verify 3-step pipeline produces bullet answer with sources
- `test_query_keywords_displayed()`: Verify response includes extracted keywords

---

## Implementation Tasks

### Phase 1.1: Project Setup & Core Infrastructure

**Test files to write first**:

- `test_phase1_config.py`: Test config loads from `.env` correctly
- `test_phase1_database.py`: Test ChromaDB client initialization

**Task 1.1.1**: Environment and dependencies

- Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
- Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
- Create `backend/app/core/config.py`: Pydantic Settings with `.env` loading

**Task 1.1.2**: Database initialization

- Create `backend/app/core/database.py`: ChromaDB persistent client
- Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
- Function: `get_or_create_collection(name, embedding_function)`

**Task 1.1.3**: Project structure

- Create all `__init__.py` files for package structure
- Create `backend/app/main.py` with FastAPI app, CORS middleware
- Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.

**Task 1.1.4**: Pydantic schemas

- `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
- `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
- `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`

**Commit**: "feat: Phase 1.1 project setup with config, database, and models"

### Phase 1.2: Ingestion Pipeline

**Test files to write first**:

- `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
- `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
- `test_phase1_metadata.py`: Test metadata extraction

**Task 1.2.1**: DOCX parsing

- `utils/docx_parser.py`: `parse_docx(file_path) -> str`
- Handle paragraphs, tables, headers
- Return plain text with preserved paragraph breaks

**Task 1.2.2**: Chunking abstraction

- `utils/chunking.py`: Abstract base class `ChunkingStrategy`
- `TokenChunkingStrategy` implementation using tiktoken
- Config: chunk_size=1000, overlap=200
- Method: `chunk(text: str) -> list[str]`

**Task 1.2.3**: Metadata extraction

- `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
- Returns list of metadata dicts matching chunk count
- Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)

**Task 1.2.4**: Embedding service

- `services/rag.py`: `RAGService` class
- Initialize embedding function with `qwen/qwen3-embedding-4b`
- Method: `ingest_document(file_path, chunks, metadata_list)`
- Store in ChromaDB collection "documents"

**Task 1.2.5**: Ingest endpoint

- `routers/ingest.py`: `POST /api/v1/ingest`
- Accept `UploadFile` (DOCX only, validate extension)
- Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
- Return `IngestResponse`

**Commit**: "feat: Phase 1.2 ingestion pipeline with chunking and metadata"

### Phase 1.3: Query Pipeline (3-Step)

**Test files to write first**:

- `test_phase1_llm_client.py`: Test LLM client error handling
- `test_phase1_rag_service.py`: Test retrieval and response generation
- `test_phase1_query.py`: Test full pipeline with mocked LLM calls

**Task 1.3.1**: LLM client

- `services/llm_client.py`: `LLMClient` class
- Constructor takes config from `Settings`
- Method: `complete(prompt: str, temperature: float = 0.7) -> str`
- Use httpx with OpenAI-compatible API format
- Handle errors gracefully

**Task 1.3.2**: Query decomposition

- `services/query_decomposer.py`: `QueryDecomposer` class
- Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
- Method: `decompose(question: str) -> list[str]`
- Parse LLM JSON response into list of keywords

**Task 1.3.3**: Retrieval from ChromaDB

- `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
- Join keywords with space for query text
- Return list of `(chunk_text, metadata, distance)` tuples

**Task 1.3.4**: Relevance filtering

- `services/relevance_filter.py`: `RelevanceFilter` class
- Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
- Input: list of chunks
- Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
- Batch all chunks in single LLM call

**Task 1.3.5**: Response generation

- `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
- Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
- Include chunk content and metadata in context
- Enforce bullet-point format via prompt

**Task 1.3.6**: Query endpoint

- `routers/query.py`: `POST /api/v1/query`
- Full pipeline orchestration:
  1. Call `query_decomposer.decompose()` → get keywords
  2. Call `rag.retrieve()` → get chunks
  3. Call `relevance_filter.filter()` → filter chunks
  4. Call `rag.generate_response()` → get answer
- Return `QueryResponse` with keywords, answer, sources

**Commit**: "feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response"

### Phase 1.4: Testing & Polish

**Test files to write first**:

- `test_acceptance_phase1_ingest.py`: Real embedding test
- `test_acceptance_phase1_rag_query.py`: Real LLM pipeline test

**Task 1.4.1**: Unit tests

- Run `pytest app/test/test_phase1_*.py -v`: all must pass
- Add missing test coverage for edge cases

**Task 1.4.2**: Acceptance tests

- Create real `.env` with OpenRouter credentials
- Run `test_acceptance_phase1_ingest.py` with real embedding
- Run `test_acceptance_phase1_rag_query.py` with real LLM calls
- Verify keywords appear, answer is bullet format, sources have metadata

**Task 1.4.3**: Error handling

- Add try/except in all endpoints
- Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
- Log errors with context

**Task 1.4.4**: Documentation

- Update `AGENTS.md` if any conventions changed
- Add docstrings to all public methods
- Verify all imports work

**Commit**: "feat: Phase 1.4 acceptance tests, error handling, and polish"

---

## New Services Required

| Service | File | Responsibility |
|---------|------|----------------|
| Config | `core/config.py` | `.env` loading, Settings class |
| Database | `core/database.py` | ChromaDB persistent client |
| LLM Client | `services/llm_client.py` | OpenAI-compatible API wrapper |
| Query Decomposer | `services/query_decomposer.py` | Extract keywords from question |
| Relevance Filter | `services/relevance_filter.py` | Batch score chunk relevance |
| RAG Service | `services/rag.py` | Embedding, retrieval, response generation |
| DOCX Parser | `utils/docx_parser.py` | Extract text from DOCX |
| Chunking | `utils/chunking.py` | Token-based chunking with overlap |
| Metadata | `utils/metadata.py` | Extract file metadata |

---

## Environment Variables

```bash
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your_openrouter_key
LLM_MODEL_NAME=qwen/qwen3.5-35b-a3b
EMBEDDING_MODEL=qwen/qwen3-embedding-4b
EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
CHROMA_DB_PATH=./chroma_db
```

---

## Notes

- Chunking strategy uses ABC pattern for easy future replacement
- Relevance filtering uses single batch call for efficiency
- All LLM calls go through `LLMClient` for consistent error handling
- ChromaDB collection name: "documents"
- Metadata fields: filename, upload_date (ISO format), content_summary, chunk_index
- Response format enforced purely through prompt engineering (no JSON schema)
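
### Sketch: chunking abstraction

As an illustration of the ABC pattern and the 1000/200 sliding window from Task 1.2.2, the chunker might look roughly like the sketch below. The class and method names come from the plan; the `encode`/`decode` constructor parameters are hypothetical, and the default whitespace tokenizer is only a stand-in so the example runs without tiktoken. A real implementation would presumably pass a tiktoken encoding's encode and decode functions instead.

```python
from abc import ABC, abstractmethod
from typing import Callable


class ChunkingStrategy(ABC):
    """Interface from Task 1.2.2; concrete strategies can be swapped later."""

    @abstractmethod
    def chunk(self, text: str) -> list[str]:
        """Split text into chunks."""


class TokenChunkingStrategy(ChunkingStrategy):
    """Sliding token window: each chunk repeats the last `overlap` tokens of the previous one."""

    def __init__(
        self,
        chunk_size: int = 1000,
        overlap: int = 200,
        # Assumption: whitespace split/join stands in for a tiktoken encode/decode pair.
        encode: Callable[[str], list] = str.split,
        decode: Callable[[list], str] = " ".join,
    ) -> None:
        if not 0 <= overlap < chunk_size:
            raise ValueError("overlap must be non-negative and smaller than chunk_size")
        self.chunk_size = chunk_size
        self.overlap = overlap
        self.encode = encode
        self.decode = decode

    def chunk(self, text: str) -> list[str]:
        tokens = self.encode(text)
        step = self.chunk_size - self.overlap  # how far the window advances each time
        chunks: list[str] = []
        for start in range(0, len(tokens), step):
            chunks.append(self.decode(tokens[start:start + self.chunk_size]))
            # Stop once the current window has reached the end of the token list.
            if start + self.chunk_size >= len(tokens):
                break
        return chunks
```

With the defaults, a 2500-token input produces windows starting at tokens 0, 800, and 1600, so the final chunk is shorter than `chunk_size`. Whether a short trailing chunk should be merged into the previous one is a design choice the plan leaves open.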
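
### Sketch: batch relevance filtering

The single-batch relevance filter from Task 1.3.4 could be sketched as below. This is a minimal illustration, not the planned `RelevanceFilter` class: the function name `filter_relevant`, the `complete` callable (standing in for `LLMClient.complete`), and the `[i]` chunk numbering in the prompt are all assumptions. It also assumes the model returns a bare JSON array with one score per chunk; real responses may need more defensive parsing.

```python
import json
from typing import Callable

# Prompt wording taken from Task 1.3.4; the trailing {chunks} slot is an assumption.
PROMPT = (
    "Given question '{question}' and these document chunks, "
    "rate each 0-10 for relevance. Return JSON array of scores.\n\n{chunks}"
)


def filter_relevant(
    question: str,
    chunks: list[str],
    metadata: list[dict],
    complete: Callable[[str], str],
    threshold: float = 7,
) -> list[tuple[str, dict]]:
    """Score all chunks in one LLM call and keep those scoring above `threshold`."""
    if not chunks:
        return []  # nothing to score, so skip the LLM call entirely
    numbered = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks))
    raw = complete(PROMPT.format(question=question, chunks=numbered))
    scores = json.loads(raw)
    # Reject responses that do not line up one-to-one with the input chunks.
    if not isinstance(scores, list) or len(scores) != len(chunks):
        raise ValueError("expected a JSON array with one score per chunk")
    return [
        (chunk, meta)
        for chunk, meta, score in zip(chunks, metadata, scores)
        if score > threshold
    ]
```

Passing the LLM call in as a plain callable keeps the function easy to unit test with a stub, which fits the plan's "mocked LLM calls" approach for `test_phase1_query.py`.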