8.4 KiB

Raw Blame History

Phase 1 Backend Development Plan

Source: development_plan.md
Scope: FastAPI backend for text-based RAG Q&A
Estimated Duration: 3-4 days
Status: Draft

Objective

Build a complete FastAPI backend that:

Accepts DOCX uploads, chunks text (1000 tokens / 200 overlap), embeds via Qwen, and stores in persistent ChromaDB with metadata
Runs a 3-step RAG pipeline: query decomposition → retrieval → relevance filtering → bullet-point response
Serves API endpoints for ingestion and querying with full metadata attribution

Acceptance Criteria

POST /api/v1/ingest accepts DOCX, parses content, chunks at 1000/200, embeds, stores in ChromaDB with filename/upload_date/content_summary
POST /api/v1/query accepts natural language question, returns JSON with: keywords, answer (bullet points), sources (array of metadata objects)
Query pipeline executes 3 LLM calls: decomposition → relevance filter → response generation
All LLM/ASR configuration reads from .env (OpenRouter for dev)
ChromaDB persists to chroma_db/ directory
Chunking strategy is abstracted (interface/class) for future replacement
All unit tests pass (pytest app/test/test_phase1_*.py -v)
All acceptance tests pass (pytest app/test/acceptance/ -v -m acceptance)

Acceptance Tests

File: backend/app/test/acceptance/test_acceptance_phase1_ingest.py

test_ingest_docx_with_real_embedding() — Upload DOCX, verify ChromaDB entries with metadata

File: backend/app/test/acceptance/test_acceptance_phase1_rag_query.py

test_query_with_real_llm() — Ask question, verify 3-step pipeline produces bullet answer with sources
test_query_keywords_displayed() — Verify response includes extracted keywords

Implementation Tasks

Day 1: Project Setup & Core Infrastructure

Task 1.1: Environment and dependencies

Create backend/requirements.txt with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
Create backend/.env.example with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
Create backend/app/core/config.py — Pydantic Settings with .env loading

Task 1.2: Database initialization

Create backend/app/core/database.py — ChromaDB persistent client
Function: get_chroma_client() returns persistent client pointing to chroma_db/
Function: get_or_create_collection(name, embedding_function)

Task 1.3: Project structure

Create all __init__.py files for package structure
Create backend/app/main.py with FastAPI app, CORS middleware
Include routers: app.include_router(ingest.router, prefix="/api/v1"), etc.

Task 1.4: Pydantic schemas

models/ingest.py: IngestResponse with document_id, chunk_count, filename
models/query.py: QueryRequest with question; QueryResponse with keywords, answer, sources
models/common.py: SourceMetadata with filename, upload_date, content_summary, chunk_index

Day 2: Ingestion Pipeline

Task 2.1: DOCX parsing

utils/docx_parser.py: parse_docx(file_path) -> str
Handle paragraphs, tables, headers
Return plain text with preserved paragraph breaks

Task 2.2: Chunking abstraction

utils/chunking.py: Abstract base class ChunkingStrategy
TokenChunkingStrategy implementation using tiktoken
Config: chunk_size=1000, overlap=200
Method: chunk(text: str) -> list[str]

Task 2.3: Metadata extraction

utils/metadata.py: extract_metadata(file_path, chunks) -> list[dict]
Returns list of metadata dicts matching chunk count
Each metadata has: filename, upload_date, content_summary (first 200 chars of chunk)

Task 2.4: Embedding service

services/rag.py: RAGService class
Initialize embedding function with qwen/qwen3-embedding-4b
Method: ingest_document(file_path, chunks, metadata_list)
Store in ChromaDB collection "documents"

Task 2.5: Ingest endpoint

routers/ingest.py: POST /api/v1/ingest
Accept UploadFile (DOCX only, validate extension)
Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
Return IngestResponse

Task 2.6: Unit tests

test_phase1_chunking.py: Test 1000/200 chunking with various text sizes
test_phase1_ingest.py: Mock ChromaDB, test endpoint flow

Day 3: Query Pipeline (3-Step)

Task 3.1: LLM client

services/llm_client.py: LLMClient class
Constructor takes config from Settings
Method: complete(prompt: str, temperature: float = 0.7) -> str
Use httpx with OpenAI-compatible API format
Handle errors gracefully

Task 3.2: Query decomposition

services/query_decomposer.py: QueryDecomposer class
Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
Method: decompose(question: str) -> list[str]
Parse LLM JSON response into list of keywords

Task 3.3: Retrieval from ChromaDB

services/rag.py: Add retrieve(query_keywords: list[str], n_results: int = 10)
Join keywords with space for query text
Return list of (chunk_text, metadata, distance) tuples

Task 3.4: Relevance filtering

services/relevance_filter.py: RelevanceFilter class
Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
Input: list of chunks
Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
Batch all chunks in single LLM call

Task 3.5: Response generation

services/rag.py: Add generate_response(question: str, chunks: list, metadata: list) -> str
Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
Include chunk content and metadata in context
Enforce bullet-point format via prompt

Task 3.6: Query endpoint

routers/query.py: POST /api/v1/query
Full pipeline orchestration:
1. Call query_decomposer.decompose() → get keywords
2. Call rag.retrieve() → get chunks
3. Call relevance_filter.filter() → filter chunks
4. Call rag.generate_response() → get answer
Return QueryResponse with keywords, answer, sources

Day 4: Testing & Polish

Task 4.1: Unit tests

test_phase1_query.py: Test full pipeline with mocked LLM calls
test_phase1_llm_client.py: Test LLM client error handling
test_phase1_rag_service.py: Test retrieval and response generation

Task 4.2: Acceptance tests

Create real .env with OpenRouter credentials
Run test_acceptance_phase1_ingest.py with real embedding
Run test_acceptance_phase1_rag_query.py with real LLM calls
Verify keywords appear, answer is bullet format, sources have metadata

Task 4.3: Error handling

Add try/except in all endpoints
Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
Log errors with context

Task 4.4: Documentation

Update AGENTS.md if any conventions changed
Add docstrings to all public methods
Verify all imports work

New Services Required

Service	File	Responsibility
Config	`core/config.py`	`.env` loading, Settings class
Database	`core/database.py`	ChromaDB persistent client
LLM Client	`services/llm_client.py`	OpenAI-compatible API wrapper
Query Decomposer	`services/query_decomposer.py`	Extract keywords from question
Relevance Filter	`services/relevance_filter.py`	Batch score chunk relevance
RAG Service	`services/rag.py`	Embedding, retrieval, response generation
DOCX Parser	`utils/docx_parser.py`	Extract text from DOCX
Chunking	`utils/chunking.py`	Token-based chunking with overlap
Metadata	`utils/metadata.py`	Extract file metadata

Environment Variables

LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your_openrouter_key
LLM_MODEL_NAME=qwen/qwen3.5-35b-a3b
EMBEDDING_MODEL=qwen/qwen3-embedding-4b
EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
CHROMA_DB_PATH=./chroma_db

Notes

Chunking strategy uses ABC pattern for easy future replacement
Relevance filtering uses single batch call for efficiency
All LLM calls go through LLMClient for consistent error handling
ChromaDB collection name: "documents"
Metadata fields: filename, upload_date (ISO format), content_summary, chunk_index
Response format enforced purely through prompt engineering (no JSON schema)

8.4 KiB Raw Blame History