11 KiB
Phase 1 Backend Development Plan
Source: development_plan.md
Scope: FastAPI backend for text-based RAG Q&A
Estimated Duration: 3-4 days
Status: ✅ Complete (Phase 1.1, 1.2, 1.3, 1.4 all done)
Objective
Build a complete FastAPI backend that:
- Accepts DOCX and PDF uploads, chunks text (1000 tokens / 200 overlap), embeds via Qwen, and stores in persistent ChromaDB with metadata
- Runs a 3-step RAG pipeline: query decomposition → retrieval → relevance filtering → bullet-point response
- Serves API endpoints for ingestion and querying with full metadata attribution
Acceptance Criteria
POST /api/v1/ingestaccepts DOCX and PDF, parses content, chunks at 1000/200, embeds, stores in ChromaDB with filename/upload_date/content_summaryPOST /api/v1/queryaccepts natural language question, returns JSON with:keywords,answer(bullet points),sources(array of metadata objects)- Query pipeline executes 3 LLM calls: decomposition → relevance filter → response generation
- All LLM/ASR configuration reads from
.env(OpenRouter for dev) - ChromaDB persists to
chroma_db/directory - Chunking strategy is abstracted (interface/class) for future replacement
- All unit tests pass (
pytest app/test/test_phase1_*.py -v) - All acceptance tests pass (
pytest app/test/acceptance/ -v -m acceptance)
Acceptance Tests
File: backend/app/test/acceptance/test_acceptance_phase1_ingest.py
test_ingest_docx_with_real_embedding()— Upload DOCX, verify ChromaDB entries with metadatatest_ingest_pdf_with_real_embedding()— Upload PDF, verify ChromaDB entries with metadata
File: backend/app/test/acceptance/test_acceptance_phase1_rag_query.py
test_query_with_real_llm()— Ask question, verify 3-step pipeline produces bullet answer with sourcestest_query_keywords_displayed()— Verify response includes extracted keywords
Implementation Tasks
Phase 1.1: Project Setup & Core Infrastructure
Test files to write first:
test_phase1_config.py— Test config loads from .env correctlytest_phase1_database.py— Test ChromaDB client initialization
Task 1.1.1: Environment and dependencies
- Create
backend/requirements.txtwith: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, pypdf, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken - Create
backend/.env.examplewith: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH - Create
backend/app/core/config.py— Pydantic Settings with.envloading
Task 1.1.2: Database initialization
- Create
backend/app/core/database.py— ChromaDB persistent client - Function:
get_chroma_client()returns persistent client pointing tochroma_db/ - Function:
get_or_create_collection(name, embedding_function)
Task 1.1.3: Project structure
- Create all
__init__.pyfiles for package structure - Create
backend/app/main.pywith FastAPI app, CORS middleware - Include routers:
app.include_router(ingest.router, prefix="/api/v1"), etc.
Task 1.1.4: Pydantic schemas
models/ingest.py:IngestResponsewithdocument_id,chunk_count,filenamemodels/query.py:QueryRequestwithquestion;QueryResponsewithkeywords,answer,sourcesmodels/common.py:SourceMetadatawithfilename,upload_date,content_summary,chunk_index
Commit: "feat: Phase 1.1 project setup with config, database, and models"
Status: ✅ Complete
Tests: 5 passed (2 config, 3 database)
Phase 1.2: Ingestion Pipeline
Test files to write first:
test_phase1_chunking.py— Test 1000/200 chunking with various text sizestest_phase1_ingest.py— Mock ChromaDB, test endpoint flowtest_phase1_metadata.py— Test metadata extraction
Task 1.2.1: Document parsing
utils/docx_parser.py:parse_docx(file_path) -> str— Extract text from DOCXutils/pdf_parser.py:parse_pdf(file_path) -> str— Extract text from PDF using pypdf- Both return plain text with preserved paragraph breaks
- Handle edge cases: empty docs, corrupted files, scanned PDFs (skip with warning)
Task 1.2.2: Chunking abstraction
utils/chunking.py: Abstract base classChunkingStrategyTokenChunkingStrategyimplementation using tiktoken- Config: chunk_size=1000, overlap=200
- Method:
chunk(text: str) -> list[str]
Task 1.2.3: Metadata extraction
utils/metadata.py:extract_metadata(file_path, chunks) -> list[dict]- Returns list of metadata dicts matching chunk count
- Each metadata has:
filename,upload_date,content_summary(first 200 chars of chunk)
Task 1.2.4: Embedding service
services/rag.py:RAGServiceclass- Initialize embedding function with
qwen/qwen3-embedding-4b - Method:
ingest_document(file_path, chunks, metadata_list) - Store in ChromaDB collection "documents"
Task 1.2.5: Ingest endpoint
routers/ingest.py:POST /api/v1/ingest- Accept
UploadFile(DOCX and PDF, validate extension) - Route to correct parser based on file extension
- Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
- Return
IngestResponse
Commit: "feat: Phase 1.2 ingestion pipeline with chunking and metadata"
Status: ✅ Complete
Tests: 20 passed, 2 skipped (python-docx not installed in test env)
Coverage: chunking (4), metadata (3), parsers (5), RAGService (6), ingest endpoint (4)
Phase 1.3: Query Pipeline (3-Step)
Test files to write first:
test_phase1_llm_client.py— Test LLM client error handlingtest_phase1_rag_service.py— Test retrieval and response generationtest_phase1_query.py— Test full pipeline with mocked LLM calls
Task 1.3.1: LLM client — ✅ Done in Phase 1.1
services/llm_client.py:LLMClientclass — Implemented- Constructor takes config from
Settings - Method:
complete(prompt: str, temperature: float = 0.7) -> str - Use httpx with OpenAI-compatible API format
- Handle errors gracefully
Task 1.3.2: Query decomposition
services/query_decomposer.py:QueryDecomposerclass — 🔄 Pending- Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
- Method:
decompose(question: str) -> list[str] - Parse LLM JSON response into list of keywords
Task 1.3.3: Retrieval from ChromaDB — ✅ Done in Phase 1.2
services/rag.py:retrieve(query_keywords: list[str], n_results: int = 10)— Implemented- Join keywords with space for query text
- Return list of
(chunk_text, metadata, distance)tuples
Task 1.3.4: Relevance filtering
services/relevance_filter.py:RelevanceFilterclass — 🔄 Pending- Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
- Input: list of chunks
- Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
- Batch all chunks in single LLM call
Task 1.3.5: Response generation — ✅ Done in Phase 1.2
services/rag.py:generate_response(question: str, chunks: list, metadata: list) -> str— Implemented- Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
- Include chunk content and metadata in context
- Enforce bullet-point format via prompt
Task 1.3.6: Query endpoint
routers/query.py:POST /api/v1/query— 🔄 Pending- Full pipeline orchestration:
- Call
query_decomposer.decompose()→ get keywords - Call
rag.retrieve()→ get chunks - Call
relevance_filter.filter()→ filter chunks - Call
rag.generate_response()→ get answer
- Call
- Return
QueryResponsewith keywords, answer, sources
Commit: "feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response"
Status: ✅ Complete
Tests: 13 passed (5 decomposer, 5 relevance filter, 3 query endpoint)
Phase 1.4: Testing & Polish
Test files to write first:
test_acceptance_phase1_ingest.py— Real embedding testtest_acceptance_phase1_rag_query.py— Real LLM pipeline test
Task 1.4.1: Unit tests
- Run
pytest app/test/test_phase1_*.py -v— all must pass - Add missing test coverage for edge cases
Task 1.4.2: Acceptance tests
- Create real
.envwith OpenRouter credentials - Run
test_acceptance_phase1_ingest.pywith real embedding - Run
test_acceptance_phase1_rag_query.pywith real LLM calls - Verify keywords appear, answer is bullet format, sources have metadata
Task 1.4.3: Error handling
- Add try/except in all endpoints
- Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
- Log errors with context
Task 1.4.4: Documentation
- Update
AGENTS.mdif any conventions changed - Add docstrings to all public methods
- Verify all imports work
Commit: "feat: Phase 1.4 acceptance tests, error handling, and polish"
Status: ✅ Complete
Tests: 41 unit tests passed (2 skipped), 5 acceptance tests passed
Acceptance: Full 3-step pipeline verified with real OpenRouter LLM calls
Services Status
| Service | File | Status | Responsibility |
|---|---|---|---|
| Config | core/config.py |
✅ Complete | .env loading, Settings class |
| Database | core/database.py |
✅ Complete | ChromaDB persistent client |
| LLM Client | services/llm_client.py |
✅ Complete | OpenAI-compatible API wrapper |
| Query Decomposer | services/query_decomposer.py |
✅ Complete | Extract keywords from question |
| Relevance Filter | services/relevance_filter.py |
✅ Complete | Batch score chunk relevance |
| RAG Service | services/rag.py |
✅ Complete | Embedding, retrieval, response generation |
| Ingest Router | routers/ingest.py |
✅ Complete | POST /api/v1/ingest endpoint |
| Query Router | routers/query.py |
✅ Complete | POST /api/v1/query endpoint |
| DOCX Parser | utils/docx_parser.py |
✅ Complete | Extract text from DOCX |
| PDF Parser | utils/pdf_parser.py |
✅ Complete | Extract text from PDF |
| Chunking | utils/chunking.py |
✅ Complete | Token-based chunking with overlap |
| Metadata | utils/metadata.py |
✅ Complete | Extract file metadata |
Environment Variables
LLM_BASE_URL=https://openrouter.ai/api/v1
LLM_API_KEY=your_openrouter_key
LLM_MODEL_NAME=qwen/qwen3.5-35b-a3b
EMBEDDING_MODEL=qwen/qwen3-embedding-4b
EMBEDDING_BASE_URL=https://openrouter.ai/api/v1
CHROMA_DB_PATH=./chroma_db
Notes
- Chunking strategy uses ABC pattern for easy future replacement
- Relevance filtering uses single batch call for efficiency
- All LLM calls go through
LLMClientfor consistent error handling - ChromaDB collection name: "documents"
- Metadata fields: filename, upload_date (ISO format), content_summary, chunk_index
- Response format enforced purely through prompt engineering (no JSON schema)