docs: add test-first and Phase X.Y sub-phase naming to AGENTS.md and plans

Woody 2026-04-22 15:54:34 +08:00
parent 1f4e3a2572
commit eeb464528a
3 changed files with 106 additions and 37 deletions

@ -42,96 +42,110 @@ Build a complete FastAPI backend that:
## Implementation Tasks ## Implementation Tasks
### Day 1: Project Setup & Core Infrastructure ### Phase 1.1: Project Setup & Core Infrastructure
**Task 1.1**: Environment and dependencies **Test files to write first**:
- `test_phase1_config.py` — Test config loads from .env correctly
- `test_phase1_database.py` — Test ChromaDB client initialization
**Task 1.1.1**: Environment and dependencies
- Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken - Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
- Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH - Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
- Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading - Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
**Task 1.2**: Database initialization **Task 1.1.2**: Database initialization
- Create `backend/app/core/database.py` — ChromaDB persistent client - Create `backend/app/core/database.py` — ChromaDB persistent client
- Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/` - Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
- Function: `get_or_create_collection(name, embedding_function)` - Function: `get_or_create_collection(name, embedding_function)`
**Task 1.3**: Project structure **Task 1.1.3**: Project structure
- Create all `__init__.py` files for package structure - Create all `__init__.py` files for package structure
- Create `backend/app/main.py` with FastAPI app, CORS middleware - Create `backend/app/main.py` with FastAPI app, CORS middleware
- Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc. - Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
**Task 1.4**: Pydantic schemas **Task 1.1.4**: Pydantic schemas
- `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename` - `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
- `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources` - `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
- `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index` - `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
### Day 2: Ingestion Pipeline **Commit**: "feat: Phase 1.1 project setup with config, database, and models"
**Task 2.1**: DOCX parsing ### Phase 1.2: Ingestion Pipeline
**Test files to write first**:
- `test_phase1_chunking.py` — Test 1000/200 chunking with various text sizes
- `test_phase1_ingest.py` — Mock ChromaDB, test endpoint flow
- `test_phase1_metadata.py` — Test metadata extraction
**Task 1.2.1**: DOCX parsing
- `utils/docx_parser.py`: `parse_docx(file_path) -> str` - `utils/docx_parser.py`: `parse_docx(file_path) -> str`
- Handle paragraphs, tables, headers - Handle paragraphs, tables, headers
- Return plain text with preserved paragraph breaks - Return plain text with preserved paragraph breaks
**Task 2.2**: Chunking abstraction **Task 1.2.2**: Chunking abstraction
- `utils/chunking.py`: Abstract base class `ChunkingStrategy` - `utils/chunking.py`: Abstract base class `ChunkingStrategy`
- `TokenChunkingStrategy` implementation using tiktoken - `TokenChunkingStrategy` implementation using tiktoken
- Config: chunk_size=1000, overlap=200 - Config: chunk_size=1000, overlap=200
- Method: `chunk(text: str) -> list[str]` - Method: `chunk(text: str) -> list[str]`
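The chunking abstraction in Task 1.2.2 could look roughly like the sketch below. It is an illustration under stated assumptions, not the plan's implementation: to stay dependency-free it treats whitespace-separated words as tokens, whereas the plan calls for tiktoken. The `tokenize` and `detokenize` hooks are hypothetical names marking where a tiktoken encoder would plug in.

```python
from abc import ABC, abstractmethod


class ChunkingStrategy(ABC):
    """Interface named in the plan: turn one text into a list of chunks."""

    @abstractmethod
    def chunk(self, text: str) -> list[str]:
        ...


class TokenChunkingStrategy(ChunkingStrategy):
    """Sliding window over tokens with a fixed overlap between windows.

    Assumption: tokens are whitespace-separated words here. The plan uses
    tiktoken; swap tokenize/detokenize (hypothetical hooks) for an encoder.
    """

    def __init__(self, chunk_size: int = 1000, overlap: int = 200) -> None:
        if chunk_size <= 0 or not 0 <= overlap < chunk_size:
            raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
        self.chunk_size = chunk_size
        self.overlap = overlap

    def tokenize(self, text: str) -> list[str]:
        return text.split()

    def detokenize(self, tokens: list[str]) -> str:
        return " ".join(tokens)

    def chunk(self, text: str) -> list[str]:
        tokens = self.tokenize(text)
        step = self.chunk_size - self.overlap  # each window advances by this much
        chunks = []
        for start in range(0, len(tokens), step):
            chunks.append(self.detokenize(tokens[start:start + self.chunk_size]))
            if start + self.chunk_size >= len(tokens):
                break  # this window already reached the end of the text
        return chunks
```

With the 1000/200 configuration, consecutive chunks share 200 tokens, and an empty input yields an empty list; both are natural cases for `test_phase1_chunking.py`.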
**Task 2.3**: Metadata extraction **Task 1.2.3**: Metadata extraction
- `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]` - `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
- Returns list of metadata dicts matching chunk count - Returns list of metadata dicts matching chunk count
- Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk) - Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
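A minimal sketch of `extract_metadata` as described in Task 1.2.3, using only the standard library. Two details are assumptions rather than part of the plan: `upload_date` is taken to be the ingestion time as an ISO 8601 UTC string, and `chunk_index` is included because the `SourceMetadata` schema lists it.

```python
from datetime import datetime, timezone
from pathlib import Path


def extract_metadata(file_path: str, chunks: list[str]) -> list[dict]:
    """Build one metadata dict per chunk, in the same order as the chunks."""
    filename = Path(file_path).name
    # Assumption: upload_date is the time of ingestion, as an ISO 8601 UTC string.
    upload_date = datetime.now(timezone.utc).isoformat()
    return [
        {
            "filename": filename,
            "upload_date": upload_date,
            "content_summary": chunk[:200],  # first 200 characters of the chunk
            "chunk_index": index,  # assumption: taken from the SourceMetadata schema
        }
        for index, chunk in enumerate(chunks)
    ]
```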
**Task 2.4**: Embedding service **Task 1.2.4**: Embedding service
- `services/rag.py`: `RAGService` class - `services/rag.py`: `RAGService` class
- Initialize embedding function with `qwen/qwen3-embedding-4b` - Initialize embedding function with `qwen/qwen3-embedding-4b`
- Method: `ingest_document(file_path, chunks, metadata_list)` - Method: `ingest_document(file_path, chunks, metadata_list)`
- Store in ChromaDB collection "documents" - Store in ChromaDB collection "documents"
**Task 2.5**: Ingest endpoint **Task 1.2.5**: Ingest endpoint
- `routers/ingest.py`: `POST /api/v1/ingest` - `routers/ingest.py`: `POST /api/v1/ingest`
- Accept `UploadFile` (DOCX only, validate extension) - Accept `UploadFile` (DOCX only, validate extension)
- Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup - Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
- Return `IngestResponse` - Return `IngestResponse`
**Task 2.6**: Unit tests **Commit**: "feat: Phase 1.2 ingestion pipeline with chunking and metadata"
- `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
- `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
### Day 3: Query Pipeline (3-Step) ### Phase 1.3: Query Pipeline (3-Step)
**Task 3.1**: LLM client **Test files to write first**:
- `test_phase1_llm_client.py` — Test LLM client error handling
- `test_phase1_rag_service.py` — Test retrieval and response generation
- `test_phase1_query.py` — Test full pipeline with mocked LLM calls
**Task 1.3.1**: LLM client
- `services/llm_client.py`: `LLMClient` class - `services/llm_client.py`: `LLMClient` class
- Constructor takes config from `Settings` - Constructor takes config from `Settings`
- Method: `complete(prompt: str, temperature: float = 0.7) -> str` - Method: `complete(prompt: str, temperature: float = 0.7) -> str`
- Use httpx with OpenAI-compatible API format - Use httpx with OpenAI-compatible API format
- Handle errors gracefully - Handle errors gracefully
**Task 3.2**: Query decomposition **Task 1.3.2**: Query decomposition
- `services/query_decomposer.py`: `QueryDecomposer` class - `services/query_decomposer.py`: `QueryDecomposer` class
- Prompt template: "Given question: '{question}', extract key search keywords as JSON array" - Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
- Method: `decompose(question: str) -> list[str]` - Method: `decompose(question: str) -> list[str]`
- Parse LLM JSON response into list of keywords - Parse LLM JSON response into list of keywords
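The parsing step of Task 1.3.2 is where malformed model output tends to surface, so a defensive sketch may help. `parse_keywords` is a hypothetical helper name, and the choice to return an empty list on unparseable output (instead of raising) is an assumption, not something the plan specifies.

```python
import json
import re

PROMPT_TEMPLATE = (
    "Given question: '{question}', extract key search keywords as JSON array"
)


def parse_keywords(llm_output: str) -> list[str]:
    """Pull a JSON array of strings out of an LLM reply.

    Assumption: the model may wrap the array in prose, so the span from the
    first '[' to the last ']' is extracted before parsing.
    """
    match = re.search(r"\[.*\]", llm_output, flags=re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    # Keep non-empty strings only, trimmed, preserving order.
    return [item.strip() for item in data if isinstance(item, str) and item.strip()]
```

A `decompose(question)` method would then format `PROMPT_TEMPLATE`, send it through the LLM client, and pass the reply to this helper.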
**Task 3.3**: Retrieval from ChromaDB **Task 1.3.3**: Retrieval from ChromaDB
- `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)` - `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
- Join keywords with space for query text - Join keywords with space for query text
- Return list of `(chunk_text, metadata, distance)` tuples - Return list of `(chunk_text, metadata, distance)` tuples
**Task 3.4**: Relevance filtering **Task 1.3.4**: Relevance filtering
- `services/relevance_filter.py`: `RelevanceFilter` class - `services/relevance_filter.py`: `RelevanceFilter` class
- Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores." - Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
- Input: list of chunks - Input: list of chunks
- Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7) - Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
- Batch all chunks in single LLM call - Batch all chunks in single LLM call
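The filtering half of Task 1.3.4 can be sketched independently of the LLM call. The function name `filter_by_relevance` is hypothetical, and the sketch assumes the batched prompt returns exactly one numeric score per chunk, in chunk order.

```python
import json


def filter_by_relevance(
    chunks: list[tuple[str, dict]],
    llm_scores_json: str,
    threshold: float = 7,
) -> list[tuple[str, dict]]:
    """Keep the (chunk, metadata) pairs whose score is strictly above threshold.

    Assumption: llm_scores_json is the raw reply to the batched prompt and holds
    one numeric score per chunk, in chunk order, e.g. "[9, 3, 8]".
    """
    scores = json.loads(llm_scores_json)
    if not isinstance(scores, list) or len(scores) != len(chunks):
        raise ValueError("expected one score per chunk")
    return [pair for pair, score in zip(chunks, scores) if float(score) > threshold]
```

Because the comparison is strict, a chunk scored exactly 7 is dropped at the default threshold, matching the plan's "score > threshold" wording.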
**Task 3.5**: Response generation **Task 1.3.5**: Response generation
- `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str` - `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
- Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources." - Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
- Include chunk content and metadata in context - Include chunk content and metadata in context
- Enforce bullet-point format via prompt - Enforce bullet-point format via prompt
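One way the prompt for Task 1.3.5 might be assembled is shown below. `build_answer_prompt` is a hypothetical helper, and the numbered `[n] filename` labels are an assumed citation format; the plan only requires that chunk content and metadata both appear in the context.

```python
def build_answer_prompt(question: str, chunks: list[str], metadata: list[dict]) -> str:
    """Assemble the context-grounded prompt sent to the LLM."""
    if len(chunks) != len(metadata):
        raise ValueError("chunks and metadata must have the same length")
    sections = []
    for number, (chunk, meta) in enumerate(zip(chunks, metadata), start=1):
        # Label each chunk so the model can cite it, e.g. "[1] report.docx".
        source = meta.get("filename", "unknown")
        sections.append(f"[{number}] {source}\n{chunk}")
    context = "\n\n".join(sections)
    return (
        "Answer question using ONLY these document chunks. "
        "Format as bullet points. Cite sources.\n\n"
        f"Question: {question}\n\n"
        f"Document chunks:\n{context}"
    )
```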
**Task 3.6**: Query endpoint **Task 1.3.6**: Query endpoint
- `routers/query.py`: `POST /api/v1/query` - `routers/query.py`: `POST /api/v1/query`
- Full pipeline orchestration: - Full pipeline orchestration:
1. Call `query_decomposer.decompose()` → get keywords 1. Call `query_decomposer.decompose()` → get keywords
@ -140,29 +154,36 @@ Build a complete FastAPI backend that:
4. Call `rag.generate_response()` → get answer 4. Call `rag.generate_response()` → get answer
- Return `QueryResponse` with keywords, answer, sources - Return `QueryResponse` with keywords, answer, sources
### Day 4: Testing & Polish **Commit**: "feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response"
**Task 4.1**: Unit tests ### Phase 1.4: Testing & Polish
- `test_phase1_query.py`: Test full pipeline with mocked LLM calls
- `test_phase1_llm_client.py`: Test LLM client error handling
- `test_phase1_rag_service.py`: Test retrieval and response generation
**Task 4.2**: Acceptance tests **Test files to write first**:
- `test_acceptance_phase1_ingest.py` — Real embedding test
- `test_acceptance_phase1_rag_query.py` — Real LLM pipeline test
**Task 1.4.1**: Unit tests
- Run `pytest app/test/test_phase1_*.py -v` — all must pass
- Add missing test coverage for edge cases
**Task 1.4.2**: Acceptance tests
- Create real `.env` with OpenRouter credentials - Create real `.env` with OpenRouter credentials
- Run `test_acceptance_phase1_ingest.py` with real embedding - Run `test_acceptance_phase1_ingest.py` with real embedding
- Run `test_acceptance_phase1_rag_query.py` with real LLM calls - Run `test_acceptance_phase1_rag_query.py` with real LLM calls
- Verify keywords appear, answer is bullet format, sources have metadata - Verify keywords appear, answer is bullet format, sources have metadata
**Task 4.3**: Error handling **Task 1.4.3**: Error handling
- Add try/except in all endpoints - Add try/except in all endpoints
- Return proper HTTP status codes (400 for bad input, 500 for LLM errors) - Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
- Log errors with context - Log errors with context
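The status-code rule in Task 1.4.3 can be kept in one place and unit-tested without a running server. The sketch below is framework-free; `BadInputError`, `LLMError`, and `to_http_error` are hypothetical names, and hiding the detail of 500 responses is an assumption about what should be exposed to clients.

```python
import logging

logger = logging.getLogger("app.errors")


class BadInputError(Exception):
    """Hypothetical error for invalid client input, e.g. a non-DOCX upload."""


class LLMError(Exception):
    """Hypothetical error for a failed or malformed LLM call."""


def to_http_error(exc: Exception, context: str) -> tuple[int, str]:
    """Map an exception to (status_code, detail) and log it with context."""
    if isinstance(exc, BadInputError):
        status, detail = 400, str(exc)
    else:
        # LLM failures and anything unexpected; internals stay out of the reply.
        status, detail = 500, "internal error"
    logger.error("%s failed with %s: %s", context, type(exc).__name__, exc)
    return status, detail
```

In an endpoint's `except` block, the returned pair would typically feed FastAPI's `HTTPException(status_code=..., detail=...)`.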
**Task 4.4**: Documentation **Task 1.4.4**: Documentation
- Update `AGENTS.md` if any conventions changed - Update `AGENTS.md` if any conventions changed
- Add docstrings to all public methods - Add docstrings to all public methods
- Verify all imports work - Verify all imports work
**Commit**: "feat: Phase 1.4 acceptance tests, error handling, and polish"
--- ---
## New Services Required ## New Services Required

@ -40,7 +40,11 @@ Build a React frontend that:
## Implementation Tasks ## Implementation Tasks
### Day 1: Project Setup & Layout ### Phase 1.1: Project Setup & Layout
**Test files to write first**:
- `src/test/components/Layout.test.tsx` — Test grid renders correctly
- `src/test/lib/api.test.ts` — Test API client configuration
1. **Project scaffold** 1. **Project scaffold**
- Initialize Vite project: `npm create vite@latest frontend -- --template react-ts` - Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
@ -65,7 +69,14 @@ Build a React frontend that:
``` ```
- Use CSS Grid or Flexbox for clean separation - Use CSS Grid or Flexbox for clean separation
### Day 2: Components & Integration **Commit**: "feat: Phase 1.1 frontend project setup with layout and API client"
### Phase 1.2: Components & Integration
**Test files to write first**:
- `src/test/components/QueryInput.test.tsx` — Test input and submit
- `src/test/components/KeywordsDisplay.test.tsx` — Test keyword rendering
- `src/test/components/ResponsePanel.test.tsx` — Test bullet list and metadata
1. **QueryInput component** 1. **QueryInput component**
- `src/components/QueryInput.tsx` - `src/components/QueryInput.tsx`
@ -97,7 +108,12 @@ Build a React frontend that:
- Toast notifications for API errors - Toast notifications for API errors
- Retry mechanism for failed queries - Retry mechanism for failed queries
### Day 3: Polish & Integration Testing **Commit**: "feat: Phase 1.2 frontend components with query flow"
### Phase 1.3: Polish & Integration Testing
**Test files to write first**:
- `src/test/e2e/query_flow.spec.ts` — End-to-end test with backend
1. **Loading states** 1. **Loading states**
- Skeleton loaders for each panel - Skeleton loaders for each panel
@ -118,6 +134,8 @@ Build a React frontend that:
- `npm run build` succeeds - `npm run build` succeeds
- Production build serves correctly via `npm run preview` - Production build serves correctly via `npm run preview`
**Commit**: "feat: Phase 1.3 frontend polish, loading states, and integration"
--- ---
## Dependencies ## Dependencies

@ -107,15 +107,44 @@ test_phase<N>_<module_or_feature>.py
## SUB-PHASE DEVELOPMENT ## SUB-PHASE DEVELOPMENT
**Workflow**: Plan → Implement → Acceptance Test → Commit **Workflow**: Plan → Write Test → Implement → Make Test Pass → Commit
### Sub-Phase Naming
Use decimal notation: **Phase X.Y** where X = major phase, Y = sub-phase number.
| Example | Scope |
|---------|-------|
| Phase 1.1 | Project setup, config, database |
| Phase 1.2 | Ingestion pipeline |
| Phase 1.3 | Query pipeline (3-step LLM workflow) |
| Phase 1.4 | Testing & polish |
| Phase 2.1 | Video upload backend |
| Phase 2.2 | ASR integration |
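If the **Phase X.Y** labels ever need to be checked mechanically (for example in commit messages), a small parser is enough. This is a sketch only; `parse_sub_phase` is a hypothetical helper and not part of the repository.

```python
import re
from typing import Optional

# "Phase X.Y": X is the major phase, Y the sub-phase number.
PHASE_PATTERN = re.compile(r"Phase (\d+)\.(\d+)")


def parse_sub_phase(text: str) -> Optional[tuple[int, int]]:
    """Return (major, sub) for the first 'Phase X.Y' label in text, else None."""
    match = PHASE_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Returning integers rather than the raw string means labels sort numerically, so Phase 1.10 orders after Phase 1.2 instead of before it.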
### Test-First Rule (MANDATORY)
Every sub-phase follows **test-driven delivery**:
1. **Write test first** — Before writing implementation code, write the test that defines "done"
2. **Implement** — Write the minimum code to make the test pass
3. **Run test** — Verify test passes (both unit and acceptance where applicable)
4. **Commit** — Only commit after tests pass. Never commit broken tests.
5. **Next sub-phase** — Only start next sub-phase after current is committed
**Enforcement**:
- Each Implementation Task in a sub-phase plan must list its test file(s)
- Tests must be in the `backend/app/test/` or `frontend/src/test/` directory
- Pre-commit: `pytest` must pass for backend, `npm test` for frontend
### Sub-Phase Plan Template ### Sub-Phase Plan Template
Each sub-phase plan (stored in `.plans/`) must include: Each sub-phase plan (stored in `.plans/`) must include:
1. **Objective** — What this sub-phase delivers 1. **Objective** — What this sub-phase delivers
2. **Acceptance Criteria** — List of behaviors that must work 2. **Test Files** — List of test files to write BEFORE implementation
3. **Acceptance Tests** — `test_acceptance_<subphase>.py` file(s) with real environment 3. **Acceptance Criteria** — List of behaviors that must work
4. **Implementation Tasks** — Atomic steps to complete 4. **Acceptance Tests** — `test_acceptance_<subphase>.py` file(s) with real environment
5. **Implementation Tasks** — Atomic steps, each referencing its test file
### Acceptance Testing Rules ### Acceptance Testing Rules
@ -151,11 +180,12 @@ def test_query_with_real_llm():
``` ```
**Sub-phase completion checklist**: **Sub-phase completion checklist**:
- [ ] All unit tests written BEFORE implementation
- [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`) - [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`) - [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
- [ ] Code reviewed (self or peer) - [ ] Code reviewed (self or peer)
- [ ] Sub-phase plan marked complete in `.plans/` - [ ] Sub-phase plan marked complete in `.plans/`
- [ ] Git commit with clear message referencing sub-phase plan - [ ] Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")
## COMMANDS ## COMMANDS
```bash ```bash