docs: add test-first and Phase X.Y sub-phase naming to AGENTS.md and plans
This commit is contained in:
parent
1f4e3a2572
commit
eeb464528a
|
|
@ -42,96 +42,110 @@ Build a complete FastAPI backend that:
|
||||||
|
|
||||||
## Implementation Tasks
|
## Implementation Tasks
|
||||||
|
|
||||||
### Day 1: Project Setup & Core Infrastructure
|
### Phase 1.1: Project Setup & Core Infrastructure
|
||||||
|
|
||||||
**Task 1.1**: Environment and dependencies
|
**Test files to write first**:
|
||||||
|
- `test_phase1_config.py` — Test config loads from .env correctly
|
||||||
|
- `test_phase1_database.py` — Test ChromaDB client initialization
|
||||||
|
|
||||||
|
**Task 1.1.1**: Environment and dependencies
|
||||||
- Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
|
- Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
|
||||||
- Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
|
- Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
|
||||||
- Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
|
- Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
|
||||||
|
|
||||||
**Task 1.2**: Database initialization
|
**Task 1.1.2**: Database initialization
|
||||||
- Create `backend/app/core/database.py` — ChromaDB persistent client
|
- Create `backend/app/core/database.py` — ChromaDB persistent client
|
||||||
- Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
|
- Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
|
||||||
- Function: `get_or_create_collection(name, embedding_function)`
|
- Function: `get_or_create_collection(name, embedding_function)`
|
||||||
|
|
||||||
**Task 1.3**: Project structure
|
**Task 1.1.3**: Project structure
|
||||||
- Create all `__init__.py` files for package structure
|
- Create all `__init__.py` files for package structure
|
||||||
- Create `backend/app/main.py` with FastAPI app, CORS middleware
|
- Create `backend/app/main.py` with FastAPI app, CORS middleware
|
||||||
- Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
|
- Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
|
||||||
|
|
||||||
**Task 1.4**: Pydantic schemas
|
**Task 1.1.4**: Pydantic schemas
|
||||||
- `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
|
- `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
|
||||||
- `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
|
- `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
|
||||||
- `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
|
- `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
|
||||||
|
|
||||||
### Day 2: Ingestion Pipeline
|
**Commit**: "feat: Phase 1.1 project setup with config, database, and models"
|
||||||
|
|
||||||
**Task 2.1**: DOCX parsing
|
### Phase 1.2: Ingestion Pipeline
|
||||||
|
|
||||||
|
**Test files to write first**:
|
||||||
|
- `test_phase1_chunking.py` — Test 1000/200 chunking with various text sizes
|
||||||
|
- `test_phase1_ingest.py` — Mock ChromaDB, test endpoint flow
|
||||||
|
- `test_phase1_metadata.py` — Test metadata extraction
|
||||||
|
|
||||||
|
**Task 1.2.1**: DOCX parsing
|
||||||
- `utils/docx_parser.py`: `parse_docx(file_path) -> str`
|
- `utils/docx_parser.py`: `parse_docx(file_path) -> str`
|
||||||
- Handle paragraphs, tables, headers
|
- Handle paragraphs, tables, headers
|
||||||
- Return plain text with preserved paragraph breaks
|
- Return plain text with preserved paragraph breaks
|
||||||
|
|
||||||
**Task 2.2**: Chunking abstraction
|
**Task 1.2.2**: Chunking abstraction
|
||||||
- `utils/chunking.py`: Abstract base class `ChunkingStrategy`
|
- `utils/chunking.py`: Abstract base class `ChunkingStrategy`
|
||||||
- `TokenChunkingStrategy` implementation using tiktoken
|
- `TokenChunkingStrategy` implementation using tiktoken
|
||||||
- Config: chunk_size=1000, overlap=200
|
- Config: chunk_size=1000, overlap=200
|
||||||
- Method: `chunk(text: str) -> list[str]`
|
- Method: `chunk(text: str) -> list[str]`
|
||||||
|
|
||||||
**Task 2.3**: Metadata extraction
|
**Task 1.2.3**: Metadata extraction
|
||||||
- `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
|
- `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
|
||||||
- Returns list of metadata dicts matching chunk count
|
- Returns list of metadata dicts matching chunk count
|
||||||
- Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
|
- Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
|
||||||
|
|
||||||
**Task 2.4**: Embedding service
|
**Task 1.2.4**: Embedding service
|
||||||
- `services/rag.py`: `RAGService` class
|
- `services/rag.py`: `RAGService` class
|
||||||
- Initialize embedding function with `qwen/qwen3-embedding-4b`
|
- Initialize embedding function with `qwen/qwen3-embedding-4b`
|
||||||
- Method: `ingest_document(file_path, chunks, metadata_list)`
|
- Method: `ingest_document(file_path, chunks, metadata_list)`
|
||||||
- Store in ChromaDB collection "documents"
|
- Store in ChromaDB collection "documents"
|
||||||
|
|
||||||
**Task 2.5**: Ingest endpoint
|
**Task 1.2.5**: Ingest endpoint
|
||||||
- `routers/ingest.py`: `POST /api/v1/ingest`
|
- `routers/ingest.py`: `POST /api/v1/ingest`
|
||||||
- Accept `UploadFile` (DOCX only, validate extension)
|
- Accept `UploadFile` (DOCX only, validate extension)
|
||||||
- Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
|
- Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
|
||||||
- Return `IngestResponse`
|
- Return `IngestResponse`
|
||||||
|
|
||||||
**Task 2.6**: Unit tests
|
**Commit**: "feat: Phase 1.2 ingestion pipeline with chunking and metadata"
|
||||||
- `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
|
|
||||||
- `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
|
|
||||||
|
|
||||||
### Day 3: Query Pipeline (3-Step)
|
### Phase 1.3: Query Pipeline (3-Step)
|
||||||
|
|
||||||
**Task 3.1**: LLM client
|
**Test files to write first**:
|
||||||
|
- `test_phase1_llm_client.py` — Test LLM client error handling
|
||||||
|
- `test_phase1_rag_service.py` — Test retrieval and response generation
|
||||||
|
- `test_phase1_query.py` — Test full pipeline with mocked LLM calls
|
||||||
|
|
||||||
|
**Task 1.3.1**: LLM client
|
||||||
- `services/llm_client.py`: `LLMClient` class
|
- `services/llm_client.py`: `LLMClient` class
|
||||||
- Constructor takes config from `Settings`
|
- Constructor takes config from `Settings`
|
||||||
- Method: `complete(prompt: str, temperature: float = 0.7) -> str`
|
- Method: `complete(prompt: str, temperature: float = 0.7) -> str`
|
||||||
- Use httpx with OpenAI-compatible API format
|
- Use httpx with OpenAI-compatible API format
|
||||||
- Handle errors gracefully
|
- Handle errors gracefully
|
||||||
|
|
||||||
**Task 3.2**: Query decomposition
|
**Task 1.3.2**: Query decomposition
|
||||||
- `services/query_decomposer.py`: `QueryDecomposer` class
|
- `services/query_decomposer.py`: `QueryDecomposer` class
|
||||||
- Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
|
- Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
|
||||||
- Method: `decompose(question: str) -> list[str]`
|
- Method: `decompose(question: str) -> list[str]`
|
||||||
- Parse LLM JSON response into list of keywords
|
- Parse LLM JSON response into list of keywords
|
||||||
|
|
||||||
**Task 3.3**: Retrieval from ChromaDB
|
**Task 1.3.3**: Retrieval from ChromaDB
|
||||||
- `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
|
- `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
|
||||||
- Join keywords with space for query text
|
- Join keywords with space for query text
|
||||||
- Return list of `(chunk_text, metadata, distance)` tuples
|
- Return list of `(chunk_text, metadata, distance)` tuples
|
||||||
|
|
||||||
**Task 3.4**: Relevance filtering
|
**Task 1.3.4**: Relevance filtering
|
||||||
- `services/relevance_filter.py`: `RelevanceFilter` class
|
- `services/relevance_filter.py`: `RelevanceFilter` class
|
||||||
- Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
|
- Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
|
||||||
- Input: list of chunks
|
- Input: list of chunks
|
||||||
- Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
|
- Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
|
||||||
- Batch all chunks in single LLM call
|
- Batch all chunks in single LLM call
|
||||||
|
|
||||||
**Task 3.5**: Response generation
|
**Task 1.3.5**: Response generation
|
||||||
- `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
|
- `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
|
||||||
- Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
|
- Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
|
||||||
- Include chunk content and metadata in context
|
- Include chunk content and metadata in context
|
||||||
- Enforce bullet-point format via prompt
|
- Enforce bullet-point format via prompt
|
||||||
|
|
||||||
**Task 3.6**: Query endpoint
|
**Task 1.3.6**: Query endpoint
|
||||||
- `routers/query.py`: `POST /api/v1/query`
|
- `routers/query.py`: `POST /api/v1/query`
|
||||||
- Full pipeline orchestration:
|
- Full pipeline orchestration:
|
||||||
1. Call `query_decomposer.decompose()` → get keywords
|
1. Call `query_decomposer.decompose()` → get keywords
|
||||||
|
|
@ -140,29 +154,36 @@ Build a complete FastAPI backend that:
|
||||||
4. Call `rag.generate_response()` → get answer
|
4. Call `rag.generate_response()` → get answer
|
||||||
- Return `QueryResponse` with keywords, answer, sources
|
- Return `QueryResponse` with keywords, answer, sources
|
||||||
|
|
||||||
### Day 4: Testing & Polish
|
**Commit**: "feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response"
|
||||||
|
|
||||||
**Task 4.1**: Unit tests
|
### Phase 1.4: Testing & Polish
|
||||||
- `test_phase1_query.py`: Test full pipeline with mocked LLM calls
|
|
||||||
- `test_phase1_llm_client.py`: Test LLM client error handling
|
|
||||||
- `test_phase1_rag_service.py`: Test retrieval and response generation
|
|
||||||
|
|
||||||
**Task 4.2**: Acceptance tests
|
**Test files to write first**:
|
||||||
|
- `test_acceptance_phase1_ingest.py` — Real embedding test
|
||||||
|
- `test_acceptance_phase1_rag_query.py` — Real LLM pipeline test
|
||||||
|
|
||||||
|
**Task 1.4.1**: Unit tests
|
||||||
|
- Run `pytest app/test/test_phase1_*.py -v` — all must pass
|
||||||
|
- Add missing test coverage for edge cases
|
||||||
|
|
||||||
|
**Task 1.4.2**: Acceptance tests
|
||||||
- Create real `.env` with OpenRouter credentials
|
- Create real `.env` with OpenRouter credentials
|
||||||
- Run `test_acceptance_phase1_ingest.py` with real embedding
|
- Run `test_acceptance_phase1_ingest.py` with real embedding
|
||||||
- Run `test_acceptance_phase1_rag_query.py` with real LLM calls
|
- Run `test_acceptance_phase1_rag_query.py` with real LLM calls
|
||||||
- Verify keywords appear, answer is bullet format, sources have metadata
|
- Verify keywords appear, answer is bullet format, sources have metadata
|
||||||
|
|
||||||
**Task 4.3**: Error handling
|
**Task 1.4.3**: Error handling
|
||||||
- Add try/except in all endpoints
|
- Add try/except in all endpoints
|
||||||
- Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
|
- Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
|
||||||
- Log errors with context
|
- Log errors with context
|
||||||
|
|
||||||
**Task 4.4**: Documentation
|
**Task 1.4.4**: Documentation
|
||||||
- Update `AGENTS.md` if any conventions changed
|
- Update `AGENTS.md` if any conventions changed
|
||||||
- Add docstrings to all public methods
|
- Add docstrings to all public methods
|
||||||
- Verify all imports work
|
- Verify all imports work
|
||||||
|
|
||||||
|
**Commit**: "feat: Phase 1.4 acceptance tests, error handling, and polish"
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## New Services Required
|
## New Services Required
|
||||||
|
|
|
||||||
|
|
@ -40,7 +40,11 @@ Build a React frontend that:
|
||||||
|
|
||||||
## Implementation Tasks
|
## Implementation Tasks
|
||||||
|
|
||||||
### Day 1: Project Setup & Layout
|
### Phase 1.1: Project Setup & Layout
|
||||||
|
|
||||||
|
**Test files to write first**:
|
||||||
|
- `src/test/components/Layout.test.tsx` — Test grid renders correctly
|
||||||
|
- `src/test/lib/api.test.ts` — Test API client configuration
|
||||||
|
|
||||||
1. **Project scaffold**
|
1. **Project scaffold**
|
||||||
- Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
|
- Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
|
||||||
|
|
@ -65,7 +69,14 @@ Build a React frontend that:
|
||||||
```
|
```
|
||||||
- Use CSS Grid or Flexbox for clean separation
|
- Use CSS Grid or Flexbox for clean separation
|
||||||
|
|
||||||
### Day 2: Components & Integration
|
**Commit**: "feat: Phase 1.1 frontend project setup with layout and API client"
|
||||||
|
|
||||||
|
### Phase 1.2: Components & Integration
|
||||||
|
|
||||||
|
**Test files to write first**:
|
||||||
|
- `src/test/components/QueryInput.test.tsx` — Test input and submit
|
||||||
|
- `src/test/components/KeywordsDisplay.test.tsx` — Test keyword rendering
|
||||||
|
- `src/test/components/ResponsePanel.test.tsx` — Test bullet list and metadata
|
||||||
|
|
||||||
1. **QueryInput component**
|
1. **QueryInput component**
|
||||||
- `src/components/QueryInput.tsx`
|
- `src/components/QueryInput.tsx`
|
||||||
|
|
@ -97,7 +108,12 @@ Build a React frontend that:
|
||||||
- Toast notifications for API errors
|
- Toast notifications for API errors
|
||||||
- Retry mechanism for failed queries
|
- Retry mechanism for failed queries
|
||||||
|
|
||||||
### Day 3: Polish & Integration Testing
|
**Commit**: "feat: Phase 1.2 frontend components with query flow"
|
||||||
|
|
||||||
|
### Phase 1.3: Polish & Integration Testing
|
||||||
|
|
||||||
|
**Test files to write first**:
|
||||||
|
- `src/test/e2e/query_flow.spec.ts` — End-to-end test with backend
|
||||||
|
|
||||||
1. **Loading states**
|
1. **Loading states**
|
||||||
- Skeleton loaders for each panel
|
- Skeleton loaders for each panel
|
||||||
|
|
@ -118,6 +134,8 @@ Build a React frontend that:
|
||||||
- `npm run build` succeeds
|
- `npm run build` succeeds
|
||||||
- Production build serves correctly via `npm run preview`
|
- Production build serves correctly via `npm run preview`
|
||||||
|
|
||||||
|
**Commit**: "feat: Phase 1.3 frontend polish, loading states, and integration"
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
|
||||||
40
AGENTS.md
40
AGENTS.md
|
|
@ -107,15 +107,44 @@ test_phase<N>_<module_or_feature>.py
|
||||||
|
|
||||||
## SUB-PHASE DEVELOPMENT
|
## SUB-PHASE DEVELOPMENT
|
||||||
|
|
||||||
**Workflow**: Plan → Implement → Acceptance Test → Commit
|
**Workflow**: Plan → Write Test → Implement → Make Test Pass → Commit
|
||||||
|
|
||||||
|
### Sub-Phase Naming
|
||||||
|
|
||||||
|
Use decimal notation: **Phase X.Y** where X = major phase, Y = sub-phase number.
|
||||||
|
|
||||||
|
| Example | Scope |
|
||||||
|
|---------|-------|
|
||||||
|
| Phase 1.1 | Project setup, config, database |
|
||||||
|
| Phase 1.2 | Ingestion pipeline |
|
||||||
|
| Phase 1.3 | Query pipeline (3-step LLM workflow) |
|
||||||
|
| Phase 1.4 | Testing & polish |
|
||||||
|
| Phase 2.1 | Video upload backend |
|
||||||
|
| Phase 2.2 | ASR integration |
|
||||||
|
|
||||||
|
### Test-First Rule (MANDATORY)
|
||||||
|
|
||||||
|
Every sub-phase follows **test-driven delivery**:
|
||||||
|
|
||||||
|
1. **Write test first** — Before writing implementation code, write the test that defines "done"
|
||||||
|
2. **Implement** — Write the minimum code to make the test pass
|
||||||
|
3. **Run test** — Verify test passes (both unit and acceptance where applicable)
|
||||||
|
4. **Commit** — Only commit after tests pass. Never commit broken tests.
|
||||||
|
5. **Next sub-phase** — Only start next sub-phase after current is committed
|
||||||
|
|
||||||
|
**Enforcement**:
|
||||||
|
- Each Implementation Task in a sub-phase plan must list its test file(s)
|
||||||
|
- Tests must be in the `backend/app/test/` or `frontend/src/test/` directory
|
||||||
|
- Pre-commit: `pytest` must pass for backend, `npm test` for frontend
|
||||||
|
|
||||||
### Sub-Phase Plan Template
|
### Sub-Phase Plan Template
|
||||||
|
|
||||||
Each sub-phase plan (stored in `.plans/`) must include:
|
Each sub-phase plan (stored in `.plans/`) must include:
|
||||||
1. **Objective** — What this sub-phase delivers
|
1. **Objective** — What this sub-phase delivers
|
||||||
2. **Acceptance Criteria** — List of behaviors that must work
|
2. **Test Files** — List of test files to write BEFORE implementation
|
||||||
3. **Acceptance Tests** — `test_acceptance_<subphase>.py` file(s) with real environment
|
3. **Acceptance Criteria** — List of behaviors that must work
|
||||||
4. **Implementation Tasks** — Atomic steps to complete
|
4. **Acceptance Tests** — `test_acceptance_<subphase>.py` file(s) with real environment
|
||||||
|
5. **Implementation Tasks** — Atomic steps, each referencing its test file
|
||||||
|
|
||||||
### Acceptance Testing Rules
|
### Acceptance Testing Rules
|
||||||
|
|
||||||
|
|
@ -151,11 +180,12 @@ def test_query_with_real_llm():
|
||||||
```
|
```
|
||||||
|
|
||||||
**Sub-phase completion checklist**:
|
**Sub-phase completion checklist**:
|
||||||
|
- [ ] All unit tests written BEFORE implementation
|
||||||
- [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
|
- [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
|
||||||
- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
|
- [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
|
||||||
- [ ] Code reviewed (self or peer)
|
- [ ] Code reviewed (self or peer)
|
||||||
- [ ] Sub-phase plan marked complete in `.plans/`
|
- [ ] Sub-phase plan marked complete in `.plans/`
|
||||||
- [ ] Git commit with clear message referencing sub-phase plan
|
- [ ] Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")
|
||||||
|
|
||||||
## COMMANDS
|
## COMMANDS
|
||||||
```bash
|
```bash
|
||||||
|
|
|
||||||
Loading…
Reference in New Issue