docs: add test-first and Phase X.Y sub-phase naming to AGENTS.md and plans

2026-04-22 15:54:34 +08:00 · 2026-04-22 15:54:34 +08:00 · eeb464528a
parent 1f4e3a2572
commit eeb464528a
3 changed files with 106 additions and 37 deletions
--- a/.plans/phase1_backend_plan.md
+++ b/.plans/phase1_backend_plan.md
@ -42,96 +42,110 @@ Build a complete FastAPI backend that:
 ## Implementation Tasks
-### Day 1: Project Setup & Core Infrastructure
+### Phase 1.1: Project Setup & Core Infrastructure
-**Task 1.1**: Environment and dependencies
+**Test files to write first**:
 - `test_phase1_config.py` — Test config loads from .env correctly
 - `test_phase1_database.py` — Test ChromaDB client initialization
 **Task 1.1.1**: Environment and dependencies
 - Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
 - Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
 - Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
-**Task 1.2**: Database initialization
+**Task 1.1.2**: Database initialization
 - Create `backend/app/core/database.py` — ChromaDB persistent client
 - Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
 - Function: `get_or_create_collection(name, embedding_function)`
-**Task 1.3**: Project structure
+**Task 1.1.3**: Project structure
 - Create all `__init__.py` files for package structure
 - Create `backend/app/main.py` with FastAPI app, CORS middleware
 - Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
-**Task 1.4**: Pydantic schemas
+**Task 1.1.4**: Pydantic schemas
 - `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
 - `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
 - `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
-### Day 2: Ingestion Pipeline
+**Commit**: "feat: Phase 1.1 project setup with config, database, and models"
-**Task 2.1**: DOCX parsing
+### Phase 1.2: Ingestion Pipeline
 **Test files to write first**:
 - `test_phase1_chunking.py` — Test 1000/200 chunking with various text sizes
 - `test_phase1_ingest.py` — Mock ChromaDB, test endpoint flow
 - `test_phase1_metadata.py` — Test metadata extraction
 **Task 1.2.1**: DOCX parsing
 - `utils/docx_parser.py`: `parse_docx(file_path) -> str`
 - Handle paragraphs, tables, headers
 - Return plain text with preserved paragraph breaks
-**Task 2.2**: Chunking abstraction
+**Task 1.2.2**: Chunking abstraction
 - `utils/chunking.py`: Abstract base class `ChunkingStrategy`
 - `TokenChunkingStrategy` implementation using tiktoken
 - Config: chunk_size=1000, overlap=200
 - Method: `chunk(text: str) -> list[str]`
-**Task 2.3**: Metadata extraction
+**Task 1.2.3**: Metadata extraction
 - `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
 - Returns list of metadata dicts matching chunk count
 - Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
-**Task 2.4**: Embedding service
+**Task 1.2.4**: Embedding service
 - `services/rag.py`: `RAGService` class
 - Initialize embedding function with `qwen/qwen3-embedding-4b`
 - Method: `ingest_document(file_path, chunks, metadata_list)`
 - Store in ChromaDB collection "documents"
-**Task 2.5**: Ingest endpoint
+**Task 1.2.5**: Ingest endpoint
 - `routers/ingest.py`: `POST /api/v1/ingest`
 - Accept `UploadFile` (DOCX only, validate extension)
 - Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
 - Return `IngestResponse`
-**Task 2.6**: Unit tests
+**Commit**: "feat: Phase 1.2 ingestion pipeline with chunking and metadata"
 - `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
 - `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
-### Day 3: Query Pipeline (3-Step)
+### Phase 1.3: Query Pipeline (3-Step)
-**Task 3.1**: LLM client
+**Test files to write first**:
 - `test_phase1_llm_client.py` — Test LLM client error handling
 - `test_phase1_rag_service.py` — Test retrieval and response generation
 - `test_phase1_query.py` — Test full pipeline with mocked LLM calls
 **Task 1.3.1**: LLM client
 - `services/llm_client.py`: `LLMClient` class
 - Constructor takes config from `Settings`
 - Method: `complete(prompt: str, temperature: float = 0.7) -> str`
 - Use httpx with OpenAI-compatible API format
 - Handle errors gracefully
-**Task 3.2**: Query decomposition
+**Task 1.3.2**: Query decomposition
 - `services/query_decomposer.py`: `QueryDecomposer` class
 - Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
 - Method: `decompose(question: str) -> list[str]`
 - Parse LLM JSON response into list of keywords
-**Task 3.3**: Retrieval from ChromaDB
+**Task 1.3.3**: Retrieval from ChromaDB
 - `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
 - Join keywords with space for query text
 - Return list of `(chunk_text, metadata, distance)` tuples
-**Task 3.4**: Relevance filtering
+**Task 1.3.4**: Relevance filtering
 - `services/relevance_filter.py`: `RelevanceFilter` class
 - Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
 - Input: list of chunks
 - Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
 - Batch all chunks in single LLM call
-**Task 3.5**: Response generation
+**Task 1.3.5**: Response generation
 - `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
 - Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
 - Include chunk content and metadata in context
 - Enforce bullet-point format via prompt
-**Task 3.6**: Query endpoint
+**Task 1.3.6**: Query endpoint
 - `routers/query.py`: `POST /api/v1/query`
 - Full pipeline orchestration:
  1. Call `query_decomposer.decompose()` → get keywords
@ -140,29 +154,36 @@ Build a complete FastAPI backend that:
  4. Call `rag.generate_response()` → get answer
 - Return `QueryResponse` with keywords, answer, sources
-### Day 4: Testing & Polish
+**Commit**: "feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response"
-**Task 4.1**: Unit tests
+### Phase 1.4: Testing & Polish
 - `test_phase1_query.py`: Test full pipeline with mocked LLM calls
 - `test_phase1_llm_client.py`: Test LLM client error handling
 - `test_phase1_rag_service.py`: Test retrieval and response generation
-**Task 4.2**: Acceptance tests
+**Test files to write first**:
 - `test_acceptance_phase1_ingest.py` — Real embedding test
 - `test_acceptance_phase1_rag_query.py` — Real LLM pipeline test
 **Task 1.4.1**: Unit tests
 - Run `pytest app/test/test_phase1_*.py -v` — all must pass
 - Add missing test coverage for edge cases
 **Task 1.4.2**: Acceptance tests
 - Create real `.env` with OpenRouter credentials
 - Run `test_acceptance_phase1_ingest.py` with real embedding
 - Run `test_acceptance_phase1_rag_query.py` with real LLM calls
 - Verify keywords appear, answer is bullet format, sources have metadata
-**Task 4.3**: Error handling
+**Task 1.4.3**: Error handling
 - Add try/except in all endpoints
 - Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
 - Log errors with context
-**Task 4.4**: Documentation
+**Task 1.4.4**: Documentation
 - Update `AGENTS.md` if any conventions changed
 - Add docstrings to all public methods
 - Verify all imports work
 **Commit**: "feat: Phase 1.4 acceptance tests, error handling, and polish"
 ---
 ## New Services Required
--- a/.plans/phase1_frontend_plan.md
+++ b/.plans/phase1_frontend_plan.md
@ -40,7 +40,11 @@ Build a React frontend that:
 ## Implementation Tasks
-### Day 1: Project Setup & Layout
+### Phase 1.1: Project Setup & Layout
 **Test files to write first**:
 - `src/test/components/Layout.test.tsx` — Test grid renders correctly
 - `src/test/lib/api.test.ts` — Test API client configuration
 1. **Project scaffold**
   - Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
@ -65,7 +69,14 @@ Build a React frontend that:
     ```
   - Use CSS Grid or Flexbox for clean separation
-### Day 2: Components & Integration
+**Commit**: "feat: Phase 1.1 frontend project setup with layout and API client"
 ### Phase 1.2: Components & Integration
 **Test files to write first**:
 - `src/test/components/QueryInput.test.tsx` — Test input and submit
 - `src/test/components/KeywordsDisplay.test.tsx` — Test keyword rendering
 - `src/test/components/ResponsePanel.test.tsx` — Test bullet list and metadata
 1. **QueryInput component**
   - `src/components/QueryInput.tsx`
@ -97,7 +108,12 @@ Build a React frontend that:
   - Toast notifications for API errors
   - Retry mechanism for failed queries
-### Day 3: Polish & Integration Testing
+**Commit**: "feat: Phase 1.2 frontend components with query flow"
 ### Phase 1.3: Polish & Integration Testing
 **Test files to write first**:
 - `src/test/e2e/query_flow.spec.ts` — End-to-end test with backend
 1. **Loading states**
   - Skeleton loaders for each panel
@ -118,6 +134,8 @@ Build a React frontend that:
   - `npm run build` succeeds
   - Production build serves correctly via `npm run preview`
 **Commit**: "feat: Phase 1.3 frontend polish, loading states, and integration"
 ---
 ## Dependencies
--- a/AGENTS.md
+++ b/AGENTS.md
@ -107,15 +107,44 @@ test_phase<N>_<module_or_feature>.py
 ## SUB-PHASE DEVELOPMENT
-**Workflow**: Plan → Implement → Acceptance Test → Commit
+**Workflow**: Plan → Write Test → Implement → Make Test Pass → Commit
 ### Sub-Phase Naming
 Use decimal notation: **Phase X.Y** where X = major phase, Y = sub-phase number.
 | Example | Scope |
 |---------|-------|
 | Phase 1.1 | Project setup, config, database |
 | Phase 1.2 | Ingestion pipeline |
 | Phase 1.3 | Query pipeline (3-step LLM workflow) |
 | Phase 1.4 | Testing & polish |
 | Phase 2.1 | Video upload backend |
 | Phase 2.2 | ASR integration |
 ### Test-First Rule (MANDATORY)
 Every sub-phase follows **test-driven delivery**:
 1. **Write test first** — Before writing implementation code, write the test that defines "done"
 2. **Implement** — Write the minimum code to make the test pass
 3. **Run test** — Verify test passes (both unit and acceptance where applicable)
 4. **Commit** — Only commit after tests pass. Never commit broken tests.
 5. **Next sub-phase** — Only start next sub-phase after current is committed
 **Enforcement**:
 - Each Implementation Task in a sub-phase plan must list its test file(s)
 - Tests must be in the `backend/app/test/` or `frontend/src/test/` directory
 - Pre-commit: `pytest` must pass for backend, `npm test` for frontend
 ### Sub-Phase Plan Template
 Each sub-phase plan (stored in `.plans/`) must include:
 1. **Objective** — What this sub-phase delivers
-2. **Acceptance Criteria** — List of behaviors that must work
+2. **Test Files** — List of test files to write BEFORE implementation
-3. **Acceptance Tests** — `test_acceptance_<subphase>.py` file(s) with real environment
+3. **Acceptance Criteria** — List of behaviors that must work
-4. **Implementation Tasks** — Atomic steps to complete
+4. **Acceptance Tests** — `test_acceptance_<subphase>.py` file(s) with real environment
 5. **Implementation Tasks** — Atomic steps, each referencing its test file
 ### Acceptance Testing Rules
@ -151,11 +180,12 @@ def test_query_with_real_llm():
 ```
 **Sub-phase completion checklist**:
 - [ ] All unit tests written BEFORE implementation
 - [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
 - [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
 - [ ] Code reviewed (self or peer)
 - [ ] Sub-phase plan marked complete in `.plans/`
- [ ] Git commit with clear message referencing sub-phase plan
+- [ ] Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")
 ## COMMANDS
 ```bash