diff --git a/.plans/phase1_backend_plan.md b/.plans/phase1_backend_plan.md
index 4cf60c2..0ccfe0f 100644
--- a/.plans/phase1_backend_plan.md
+++ b/.plans/phase1_backend_plan.md
@@ -42,96 +42,110 @@ Build a complete FastAPI backend that:
 
 ## Implementation Tasks
 
-### Day 1: Project Setup & Core Infrastructure
+### Phase 1.1: Project Setup & Core Infrastructure
 
-**Task 1.1**: Environment and dependencies
+**Test files to write first**:
+- `test_phase1_config.py` — Test config loads from .env correctly
+- `test_phase1_database.py` — Test ChromaDB client initialization
+
+**Task 1.1.1**: Environment and dependencies
 - Create `backend/requirements.txt` with: fastapi, uvicorn[standard], pydantic, pydantic-settings, chromadb, sentence-transformers, python-docx, python-dotenv, httpx, pytest, pytest-asyncio, tiktoken
 - Create `backend/.env.example` with: LLM_BASE_URL, LLM_API_KEY, LLM_MODEL_NAME, EMBEDDING_MODEL, EMBEDDING_BASE_URL, CHROMA_DB_PATH
 - Create `backend/app/core/config.py` — Pydantic Settings with `.env` loading
 
-**Task 1.2**: Database initialization
+**Task 1.1.2**: Database initialization
 - Create `backend/app/core/database.py` — ChromaDB persistent client
 - Function: `get_chroma_client()` returns persistent client pointing to `chroma_db/`
 - Function: `get_or_create_collection(name, embedding_function)`
 
-**Task 1.3**: Project structure
+**Task 1.1.3**: Project structure
 - Create all `__init__.py` files for package structure
 - Create `backend/app/main.py` with FastAPI app, CORS middleware
 - Include routers: `app.include_router(ingest.router, prefix="/api/v1")`, etc.
 
-**Task 1.4**: Pydantic schemas
+**Task 1.1.4**: Pydantic schemas
 - `models/ingest.py`: `IngestResponse` with `document_id`, `chunk_count`, `filename`
 - `models/query.py`: `QueryRequest` with `question`; `QueryResponse` with `keywords`, `answer`, `sources`
 - `models/common.py`: `SourceMetadata` with `filename`, `upload_date`, `content_summary`, `chunk_index`
 
-### Day 2: Ingestion Pipeline
+**Commit**: "feat: Phase 1.1 project setup with config, database, and models"
 
-**Task 2.1**: DOCX parsing
+### Phase 1.2: Ingestion Pipeline
+
+**Test files to write first**:
+- `test_phase1_chunking.py` — Test 1000/200 chunking with various text sizes
+- `test_phase1_ingest.py` — Mock ChromaDB, test endpoint flow
+- `test_phase1_metadata.py` — Test metadata extraction
+
+**Task 1.2.1**: DOCX parsing
 - `utils/docx_parser.py`: `parse_docx(file_path) -> str`
 - Handle paragraphs, tables, headers
 - Return plain text with preserved paragraph breaks
 
-**Task 2.2**: Chunking abstraction
+**Task 1.2.2**: Chunking abstraction
 - `utils/chunking.py`: Abstract base class `ChunkingStrategy`
 - `TokenChunkingStrategy` implementation using tiktoken
 - Config: chunk_size=1000, overlap=200
 - Method: `chunk(text: str) -> list[str]`
 
-**Task 2.3**: Metadata extraction
+**Task 1.2.3**: Metadata extraction
 - `utils/metadata.py`: `extract_metadata(file_path, chunks) -> list[dict]`
 - Returns list of metadata dicts matching chunk count
 - Each metadata has: `filename`, `upload_date`, `content_summary` (first 200 chars of chunk)
 
-**Task 2.4**: Embedding service
+**Task 1.2.4**: Embedding service
 - `services/rag.py`: `RAGService` class
 - Initialize embedding function with `qwen/qwen3-embedding-4b`
 - Method: `ingest_document(file_path, chunks, metadata_list)`
 - Store in ChromaDB collection "documents"
 
-**Task 2.5**: Ingest endpoint
+**Task 1.2.5**: Ingest endpoint
 - `routers/ingest.py`: `POST /api/v1/ingest`
 - Accept `UploadFile` (DOCX only, validate extension)
 - Orchestration: save temp → parse → chunk → extract metadata → embed → store → cleanup
 - Return `IngestResponse`
 
-**Task 2.6**: Unit tests
-- `test_phase1_chunking.py`: Test 1000/200 chunking with various text sizes
-- `test_phase1_ingest.py`: Mock ChromaDB, test endpoint flow
+**Commit**: "feat: Phase 1.2 ingestion pipeline with chunking and metadata"
 
-### Day 3: Query Pipeline (3-Step)
+### Phase 1.3: Query Pipeline (3-Step)
 
-**Task 3.1**: LLM client
+**Test files to write first**:
+- `test_phase1_llm_client.py` — Test LLM client error handling
+- `test_phase1_rag_service.py` — Test retrieval and response generation
+- `test_phase1_query.py` — Test full pipeline with mocked LLM calls
+
+**Task 1.3.1**: LLM client
 - `services/llm_client.py`: `LLMClient` class
 - Constructor takes config from `Settings`
 - Method: `complete(prompt: str, temperature: float = 0.7) -> str`
 - Use httpx with OpenAI-compatible API format
 - Handle errors gracefully
 
-**Task 3.2**: Query decomposition
+**Task 1.3.2**: Query decomposition
 - `services/query_decomposer.py`: `QueryDecomposer` class
 - Prompt template: "Given question: '{question}', extract key search keywords as JSON array"
 - Method: `decompose(question: str) -> list[str]`
 - Parse LLM JSON response into list of keywords
 
-**Task 3.3**: Retrieval from ChromaDB
+**Task 1.3.3**: Retrieval from ChromaDB
 - `services/rag.py`: Add `retrieve(query_keywords: list[str], n_results: int = 10)`
 - Join keywords with space for query text
 - Return list of `(chunk_text, metadata, distance)` tuples
 
-**Task 3.4**: Relevance filtering
+**Task 1.3.4**: Relevance filtering
 - `services/relevance_filter.py`: `RelevanceFilter` class
 - Prompt: "Given question '{question}' and these document chunks, rate each 0-10 for relevance. Return JSON array of scores."
 - Input: list of chunks
 - Output: filtered list of (chunk, metadata) with score > threshold (e.g., 7)
 - Batch all chunks in single LLM call
 
-**Task 3.5**: Response generation
+**Task 1.3.5**: Response generation
 - `services/rag.py`: Add `generate_response(question: str, chunks: list, metadata: list) -> str`
 - Prompt: "Answer question using ONLY these document chunks. Format as bullet points. Cite sources."
 - Include chunk content and metadata in context
 - Enforce bullet-point format via prompt
 
-**Task 3.6**: Query endpoint
+**Task 1.3.6**: Query endpoint
 - `routers/query.py`: `POST /api/v1/query`
 - Full pipeline orchestration:
   1. Call `query_decomposer.decompose()` → get keywords
@@ -140,29 +154,36 @@ Build a complete FastAPI backend that:
   4. Call `rag.generate_response()` → get answer
 - Return `QueryResponse` with keywords, answer, sources
 
-### Day 4: Testing & Polish
+**Commit**: "feat: Phase 1.3 query pipeline with decomposition, relevance filter, and response"
 
-**Task 4.1**: Unit tests
-- `test_phase1_query.py`: Test full pipeline with mocked LLM calls
-- `test_phase1_llm_client.py`: Test LLM client error handling
-- `test_phase1_rag_service.py`: Test retrieval and response generation
+### Phase 1.4: Testing & Polish
 
-**Task 4.2**: Acceptance tests
+**Test files to write first**:
+- `test_acceptance_phase1_ingest.py` — Real embedding test
+- `test_acceptance_phase1_rag_query.py` — Real LLM pipeline test
+
+**Task 1.4.1**: Unit tests
+- Run `pytest app/test/test_phase1_*.py -v` — all must pass
+- Add missing test coverage for edge cases
+
+**Task 1.4.2**: Acceptance tests
 - Create real `.env` with OpenRouter credentials
 - Run `test_acceptance_phase1_ingest.py` with real embedding
 - Run `test_acceptance_phase1_rag_query.py` with real LLM calls
 - Verify keywords appear, answer is bullet format, sources have metadata
 
-**Task 4.3**: Error handling
+**Task 1.4.3**: Error handling
 - Add try/except in all endpoints
 - Return proper HTTP status codes (400 for bad input, 500 for LLM errors)
 - Log errors with context
 
-**Task 4.4**: Documentation
+**Task 1.4.4**: Documentation
 - Update `AGENTS.md` if any conventions changed
 - Add docstrings to all public methods
 - Verify all imports work
 
+**Commit**: "feat: Phase 1.4 acceptance tests, error handling, and polish"
+
 ---
 
 ## New Services Required
diff --git a/.plans/phase1_frontend_plan.md b/.plans/phase1_frontend_plan.md
index 5857d75..bce0b6c 100644
--- a/.plans/phase1_frontend_plan.md
+++ b/.plans/phase1_frontend_plan.md
@@ -40,7 +40,11 @@ Build a React frontend that:
 
 ## Implementation Tasks
 
-### Day 1: Project Setup & Layout
+### Phase 1.1: Project Setup & Layout
+
+**Test files to write first**:
+- `src/test/components/Layout.test.tsx` — Test grid renders correctly
+- `src/test/lib/api.test.ts` — Test API client configuration
 
 1. **Project scaffold**
    - Initialize Vite project: `npm create vite@latest frontend -- --template react-ts`
@@ -65,7 +69,14 @@ Build a React frontend that:
      ```
    - Use CSS Grid or Flexbox for clean separation
 
-### Day 2: Components & Integration
+**Commit**: "feat: Phase 1.1 frontend project setup with layout and API client"
+
+### Phase 1.2: Components & Integration
+
+**Test files to write first**:
+- `src/test/components/QueryInput.test.tsx` — Test input and submit
+- `src/test/components/KeywordsDisplay.test.tsx` — Test keyword rendering
+- `src/test/components/ResponsePanel.test.tsx` — Test bullet list and metadata
 
 1. **QueryInput component**
    - `src/components/QueryInput.tsx`
@@ -97,7 +108,12 @@ Build a React frontend that:
    - Toast notifications for API errors
    - Retry mechanism for failed queries
 
-### Day 3: Polish & Integration Testing
+**Commit**: "feat: Phase 1.2 frontend components with query flow"
+
+### Phase 1.3: Polish & Integration Testing
+
+**Test files to write first**:
+- `src/test/e2e/query_flow.spec.ts` — End-to-end test with backend
 
 1. **Loading states**
    - Skeleton loaders for each panel
@@ -118,6 +134,8 @@ Build a React frontend that:
    - `npm run build` succeeds
    - Production build serves correctly via `npm run preview`
 
+**Commit**: "feat: Phase 1.3 frontend polish, loading states, and integration"
+
 ---
 
 ## Dependencies
diff --git a/AGENTS.md b/AGENTS.md
index 379692a..4be5f8e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -107,15 +107,44 @@ test_phase<N>_<module_or_feature>.py
 
 ## SUB-PHASE DEVELOPMENT
 
-**Workflow**: Plan → Implement → Acceptance Test → Commit
+**Workflow**: Plan → Write Test → Implement → Make Test Pass → Commit
+
+### Sub-Phase Naming
+
+Use decimal notation: **Phase X.Y** where X = major phase, Y = sub-phase number.
+
+| Example | Scope |
+|---------|-------|
+| Phase 1.1 | Project setup, config, database |
+| Phase 1.2 | Ingestion pipeline |
+| Phase 1.3 | Query pipeline (3-step LLM workflow) |
+| Phase 1.4 | Testing & polish |
+| Phase 2.1 | Video upload backend |
+| Phase 2.2 | ASR integration |
+
+### Test-First Rule (MANDATORY)
+
+Every sub-phase follows **test-driven delivery**:
+
+1. **Write test first** — Before writing implementation code, write the test that defines "done"
+2. **Implement** — Write the minimum code to make the test pass
+3. **Run test** — Verify test passes (both unit and acceptance where applicable)
+4. **Commit** — Only commit after tests pass. Never commit broken tests.
+5. **Next sub-phase** — Only start next sub-phase after current is committed
+
+**Enforcement**:
+- Each Implementation Task in a sub-phase plan must list its test file(s)
+- Tests must be in the `backend/app/test/` or `frontend/src/test/` directory
+- Pre-commit: `pytest` must pass for backend, `npm test` for frontend
 
 ### Sub-Phase Plan Template
 
 Each sub-phase plan (stored in `.plans/`) must include:
 1. **Objective** — What this sub-phase delivers
-2. **Acceptance Criteria** — List of behaviors that must work
-3. **Acceptance Tests** — `test_acceptance_<subphase>.py` file(s) with real environment
-4. **Implementation Tasks** — Atomic steps to complete
+2. **Test Files** — List of test files to write BEFORE implementation
+3. **Acceptance Criteria** — List of behaviors that must work
+4. **Acceptance Tests** — `test_acceptance_<subphase>.py` file(s) with real environment
+5. **Implementation Tasks** — Atomic steps, each referencing its test file
 
 ### Acceptance Testing Rules
 
@@ -151,11 +180,12 @@ def test_query_with_real_llm():
 ```
 
 **Sub-phase completion checklist**:
+- [ ] All unit tests written BEFORE implementation
 - [ ] All unit tests pass (`pytest app/test/test_phase*.py -v`)
 - [ ] All acceptance tests pass (`pytest app/test/acceptance/ -v -m acceptance`)
 - [ ] Code reviewed (self or peer)
 - [ ] Sub-phase plan marked complete in `.plans/`
-- [ ] Git commit with clear message referencing sub-phase plan
+- [ ] Git commit with clear message referencing sub-phase (e.g., "feat: Phase 1.2 ingestion pipeline with tests")
 
 ## COMMANDS
 ```bash