diff --git a/.plans/phase1_backend_plan.md b/.plans/phase1_backend_plan.md
index 4cf60c2..0ccfe0f 100644
--- a/.plans/phase1_backend_plan.md
+++ b/.plans/phase1_backend_plan.md
@@ -42,96 +42,110 @@ Build a complete FastAPI backend that:
 
 ## Implementation Tasks
 
-### Day 1: Project Setup & Core Infrastructure
+### Phase 1.1: Project Setup & Core Infrastructure
 
-**Task 1.1**: Environment and dependencies
+**Test files to write first**:
+- `test_phase1_config.py` — Test config loads from .env correctly
+- `test_phase1_database.py` — Test ChromaDB client initialization
+
+**Task 1.1.1**: Environment and dependencies
 - Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
 - Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
 - Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
 
-**Task 1.2**: Database initialization
+**Task 1.1.2**: Database initialization
 - Create `backend/app/core/database.py` — ChromaDB persistent client
 - Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
 - Function: `get_or_create_collection(name, embedding_function)`
 
-**Task 1.3**: Project structure
+**Task 1.1.3**: Project structure
 - Create all `__init__.py` files for package structure
 - Create `backend/app/main.py` with FastAPI app, CORS middleware
 - Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
 
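As a rough illustration of the test-first step for Task 1.1.1, the sketch below shows what a `test_phase1_config.py`-style check could look like. It is a stand-in under stated assumptions: it uses only the standard library in place of Pydantic Settings so that it runs without dependencies, and the names `Settings`, `parse_env_file`, and `load_settings` are hypothetical. Only the six keys come from the `.env.example` list above.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

# Keys taken from the backend/.env.example list in Task 1.1.1.
REQUIRED_KEYS = (
    "LLM_BASE_URL",
    "LLM_API_KEY",
    "LLM_MODEL_NAME",
    "EMBEDDING_MODEL",
    "EMBEDDING_BASE_URL",
    "CHROMA_DB_PATH",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical stdlib stand-in for the Pydantic Settings class."""

    llm_base_url: str
    llm_api_key: str
    llm_model_name: str
    embedding_model: str
    embedding_base_url: str
    chroma_db_path: str


def parse_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_settings(path: Path) -> Settings:
    """Build Settings from an env file, failing loudly on missing keys."""
    values = parse_env_file(path)
    missing = [key for key in REQUIRED_KEYS if key not in values]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return Settings(**{key.lower(): values[key] for key in REQUIRED_KEYS})


def test_config_loads_from_env_file() -> None:
    """Written before the implementation: defines "done" for the config task."""
    with tempfile.TemporaryDirectory() as tmp:
        env_path = Path(tmp) / ".env"
        env_path.write_text(
            "\n".join(f"{key}=value-{i}" for i, key in enumerate(REQUIRED_KEYS)),
            encoding="utf-8",
        )
        settings = load_settings(env_path)
    assert settings.llm_base_url == "value-0"
    assert settings.chroma_db_path == "value-5"


if __name__ == "__main__":
    test_config_loads_from_env_file()
    print("config test passed")
```

In the plan itself the class under test would presumably be the Pydantic Settings model in `backend/app/core/config.py`; the shape of the test (write a temporary `.env`, load it, assert on fields) should carry over largely unchanged.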
-**Task 1.4**: Pydantic schemas
+**Task 1.1.4**: Pydantic schemas
 - `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
 - `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
 - `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
 
-### Day 2: Ingestion Pipeline
+**Commit**: "feat: Phase 1.1 project setup with config, database, and models"
 
-**Task 2.1**: DOCX parsing
+### Phase 1.2: Ingestion Pipeline
+
+**Test files to write first**:
+- `test_phase1_chunking.py` — Test 1000/200 chunking with various text sizes
+- `test_phase1_ingest.py` — Mock ChromaDB, test endpoint flow
+- `test_phase1_metadata.py` — Test metadata extraction
+
+**Task 1.2.1**: DOCX parsing
 - `utils/docx_parser.py`: `parse_docx(file_path) -> str`
 - Handle paragraphs, tables, headers
 - Return plain text with preserved paragraph breaks
 
-**Task 2.2**: Chunking abstraction
+**Task 1.2.2**: Chunking abstraction
 - `utils/chunking.py`: Abstract base class `ChunkingStrategy`
 - `TokenChunkingStrategy` implementation using tiktoken
 - Config: chunk_size=1000, overlap=200
 - Method: `chunk(text: str) -> list[str]`
 
-**Task 2.3**: Metadata extraction
+**Task 1.2.3**: Metadata extraction
 - `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
 - Returns list of metadata dicts matching chunk count
 - Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
 
-**Task 2.4**: Embedding service
+**Task 1.2.4**: Embedding service
 - `services/rag.py`: `RAGService` class
 - Initialize embedding function with `qwen/qwen3-embedding-4b`
 - Method: `ingest_document(file_path, chunks, metadata_list)`
 - Store in ChromaDB collection "documents"
 
-**Task 2.5**: Ingest endpoint
+**Task 1.2.5**: Ingest endpoint
 - `routers/ingest.py`: `POST /api/v1/ingest`
 - Accept `UploadFile` (DOCX only, validate extension)
 - Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
 - Return `IngestResponse`
 
-**Task 2.6**: Unit tests
-- `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
-- `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
+**Commit**: "feat: Phase 1.2 ingestion pipeline with chunking and metadata"
 
-### Day 3: Query Pipeline (3-Step)
+### Phase 1.3: Query Pipeline (3-Step)
 
-**Task 3.1**: LLM client
+**Test files to write first**:
+- `test_phase1_llm_client.py` — Test LLM client error handling
+- `test_phase1_rag_service.py` — Test retrieval and response generation
+- `test_phase1_query.py` — Test full pipeline with mocked LLM calls
+
+**Task 1.3.1**: LLM client
 - `services/llm_client.py`: `LLMClient` class
 - Constructor takes config from `Settings`
 - Method: `complete(prompt: str, temperature: float = 0.7) -> str`
 - Use httpx with OpenAI-compatible API format
 - Handle errors gracefully
 
-**Task 3.2**: Query decomposition
+**Task 1.3.2**: Query decomposition
 - `services/query_decomposer.py`: `QueryDecomposer` class
 - Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
 - Method: `decompose(question: str) -> list[str]`
 - Parse LLM JSON response into list of keywords
 
-**Task 3.3**: Retrieval from ChromaDB
+**Task 1.3.3**: Retrieval from ChromaDB
 - `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
 - Join keywords with space for query text
 - Return list of `(chunk_text, metadata, distance)` tuples
 
-**Task 3.4**: Relevance filtering
+**Task 1.3.4**: Relevance filtering
 - `services/relevance_filter.py`: `RelevanceFilter` class
 - Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
 - Input: list of chunks
 - Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
 - Batch all chunks in single LLM call
 
-**Task 3.5**: Response generation
+**Task 1.3.5**: Response generation
 - `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
 - Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
 - Include chunk content and metadata in context
 - Enforce bullet-point format via prompt
 
-**Task 3.6**: Query endpoint
+**Task 1.3.6**: Query endpoint
 - `routers/query.py`: `POST /api/v1/query`
 - Full pipeline orchestration:
   1. Call `query_decomposer.decompose()` → get keywords
@@ -140,29 +154,36 @@ Build a complete FastAPI backend that:
   4. Call `rag.generate_response()` → get answer
 - Return `QueryResponse` with keywords, answer, sources
 
-### Day 4: Testing & Polish
+**Commit**: "feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response"
 
-**Task 4.1**: Unit tests
-- `test_phase1_query.py`: Test full pipeline with mocked LLM calls
-- `test_phase1_llm_client.py`: Test LLM client error handling
-- `test_phase1_rag_service.py`: Test retrieval and response generation
+### Phase 1.4: Testing & Polish
 
-**Task 4.2**: Acceptance tests
+**Test files to write first**:
+- `test_acceptance_phase1_ingest.py` — Real embedding test
+- `test_acceptance_phase1_rag_query.py` — Real LLM pipeline test
+
+**Task 1.4.1**: Unit tests
+- Run `pytest app/test/test_phase1_*.py -v` — all must pass
+- Add missing test coverage for edge cases
+
+**Task 1.4.2**: Acceptance tests
 - Create real `.env` with OpenRouter credentials
 - Run `test_acceptance_phase1_ingest.py` with real embedding
 - Run `test_acceptance_phase1_rag_query.py` with real LLM calls
 - Verify keywords appear, answer is bullet format, sources have metadata
 
-**Task 4.3**: Error handling
+**Task 1.4.3**: Error handling
 - Add try/except in all endpoints
 - Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
 - Log errors with context
 
-**Task 4.4**: Documentation
+**Task 1.4.4**: Documentation
 - Update `AGENTS.md` if any conventions changed
 - Add docstrings to all public methods
 - Verify all imports work
 
+**Commit**: "feat: Phase 1.4 acceptance tests, error handling, and polish"
+
 ---
 
 ## New Services Required
diff --git a/.plans/phase1_frontend_plan.md b/.plans/phase1_frontend_plan.md
index 5857d75..bce0b6c 100644
--- a/.plans/phase1_frontend_plan.md
+++ b/.plans/phase1_frontend_plan.md
@@ -40,7 +40,11 @@ Build a React frontend that:
 
 ## Implementation Tasks
 
-### Day 1: Project Setup & Layout
+### Phase 1.1: Project Setup & Layout
+
+**Test files to write first**:
+- `src/test/components/Layout.test.tsx` — Test grid renders correctly
+- `src/test/lib/api.test.ts` — Test API client configuration
 
 1. **Project scaffold**
    - Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
@@ -65,7 +69,14 @@ Build a React frontend that:
    ```
    - Use CSS Grid or Flexbox for clean separation
 
-### Day 2: Components & Integration
+**Commit**: "feat: Phase 1.1 frontend project setup with layout and API client"
+
+### Phase 1.2: Components & Integration
+
+**Test files to write first**:
+- `src/test/components/QueryInput.test.tsx` — Test input and submit
+- `src/test/components/KeywordsDisplay.test.tsx` — Test keyword rendering
+- `src/test/components/ResponsePanel.test.tsx` — Test bullet list and metadata
 
 1. **QueryInput component**
    - `src/components/QueryInput.tsx`
@@ -97,7 +108,12 @@ Build a React frontend that:
    - Toast notifications for API errors
    - Retry mechanism for failed queries
 
-### Day 3: Polish & Integration Testing
+**Commit**: "feat: Phase 1.2 frontend components with query flow"
+
+### Phase 1.3: Polish & Integration Testing
+
+**Test files to write first**:
+- `src/test/e2e/query_flow.spec.ts` — End-to-end test with backend
 
 1. **Loading states**
    - Skeleton loaders for each panel
@@ -118,6 +134,8 @@ Build a React frontend that:
    - `npm run build` succeeds
    - Production build serves correctly via `npm run preview`
 
+**Commit**: "feat: Phase 1.3 frontend polish, loading states, and integration"
+
 ---
 
 ## Dependencies
diff --git a/AGENTS.md b/AGENTS.md
index 379692a..4be5f8e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -107,15 +107,44 @@ test_phase_.py
 
 ## SUB-PHASE DEVELOPMENT
 
-**Workflow**: Plan → Implement → Acceptance Test → Commit
+**Workflow**: Plan → Write Test → Implement → Make Test Pass → Commit
+
+### Sub-Phase Naming
+
+Use decimal notation: **Phase X.Y** where X = major phase, Y = sub-phase number.
+
+| Example | Scope |
+|---------|-------|
+| Phase 1.1 | Project setup, config, database |
+| Phase 1.2 | Ingestion pipeline |
+| Phase 1.3 | Query pipeline (3-step LLM workflow) |
+| Phase 1.4 | Testing & polish |
+| Phase 2.1 | Video upload backend |
+| Phase 2.2 | ASR integration |
+
+### Test-First Rule (MANDATORY)
+
+Every sub-phase follows **test-driven delivery**:
+
+1. **Write test first** — Before writing implementation code, write the test that defines "done"
+2. **Implement** — Write the minimum code to make the test pass
+3. **Run test** — Verify test passes (both unit and acceptance where applicable)
+4. **Commit** — Only commit after tests pass. Never commit broken tests.
+5. **Next sub-phase** — Only start next sub-phase after current is committed
+
+**Enforcement**:
+- Each Implementation Task in a sub-phase plan must list its test file(s)
+- Tests must be in the `backend/app/test/` or `frontend/src/test/` directory
+- Pre-commit: `pytest` must pass for backend, `npm test` for frontend
 
 ### Sub-Phase Plan Template
 
 Each sub-phase plan (stored in `.plans/`) must include:
 1. **Objective** — What this sub-phase delivers
-2. **Acceptance Criteria** — List of behaviors that must work
-3. **Acceptance Tests** — `test_acceptance_.py` file(s) with real environment
-4. **Implementation Tasks** — Atomic steps to complete
+2. **Test Files** — List of test files to write BEFORE implementation
+3. **Acceptance Criteria** — List of behaviors that must work
+4. **Acceptance Tests** — `test_acceptance_.py` file(s) with real environment
+5. **Implementation Tasks** — Atomic steps, each referencing its test file
 
 ### Acceptance Testing Rules
 
@@ -151,11 +180,12 @@ def test_query_with_real_llm():
 ```
 
 **Sub-phase completion checklist**:
+- [ ] All unit tests written BEFORE implementation
 - [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
 - [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
 - [ ] Code reviewed (self or peer)
 - [ ] Sub-phase plan marked complete in `.plans/`
-- [ ] Git commit with clear message referencing sub-phase plan
+- [ ] Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")
 
 ## COMMANDS
 ```bash