docs: update enhancement plan with sub-phase 1.5.2 completion status

Mark sub-phase 1.5.2 (backend CRUD) as complete. Update acceptance criteria, risk mitigations, and test plan.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 19:05:01 +08:00 · 9a7329c5f8
parent c6abe5c335
commit 9a7329c5f8
1 changed file with 11 additions and 9 deletions
--- a/.plans/phase1_enhancement_plan.md
+++ b/.plans/phase1_enhancement_plan.md
@@ -2,7 +2,7 @@

 **Source**: User request (2026-04-23)  
 **Scope**: Frontend navigation + RAG Database management page + page-aware chunking with chunk PDFs  
-**Status**: 🔄 In Progress — Feature 1 ✅ Complete, Features 2-3 pending
+**Status**: 🔄 In Progress — Feature 1 ✅ Complete, Feature 2 backend ✅ Complete, Feature 2 frontend & Feature 3 pending

 ---

@@ -33,8 +33,8 @@ Enhance the existing Phase 1 application with three features:

 ### What's Missing (Gaps This Plan Fills)
 - ~~No routing or multi-page support~~ ✅ Done in Feature 1
-- No way to view what's stored in ChromaDB
-- No way to delete documents or chunks
+- ~~No way to view what's stored in ChromaDB~~ ✅ Backend CRUD done (sub-phase 1.5.2)
+- ~~No way to delete documents or chunks~~ ✅ Backend CRUD done (sub-phase 1.5.2)
 - No page-level awareness in chunking (all pages flattened before token splitting)
 - No persistent chunk files (chunks only exist as ChromaDB document text)
 - No clickable links in RAG responses to view source chunks
@@ -202,9 +202,9 @@ class DeleteResponse(BaseModel):

 ### 2.5 Acceptance Criteria

-- [ ] `GET /api/v1/documents` returns all documents with chunk counts
-- [ ] `DELETE /api/v1/documents/{document_id}` removes all chunks from ChromaDB + associated chunk PDFs
-- [ ] `DELETE /api/v1/chunks/{chunk_id}` removes a single chunk
+- [x] `GET /api/v1/documents` returns all documents with chunk counts
+- [x] `DELETE /api/v1/documents/{document_id}` removes all chunks from ChromaDB + associated chunk PDFs
+- [x] `DELETE /api/v1/chunks/{chunk_id}` removes a single chunk
 - [ ] RAG Database page shows all documents with chunk counts
 - [ ] User can expand a document to see its chunks
 - [ ] User can delete a document (with confirmation)
@@ -530,7 +530,7 @@ Feature 3 (Page-Aware Chunking) ← Modifies ingestion pipeline
 | Sub-Phase | Feature | Scope | Backend | Frontend | Status |
 |-----------|---------|-------|---------|----------|--------|
 | 1.5.1 | 1 | Nav bar + routing + page scaffold | None | NavBar, LTTPage, RAGDatabasePage, App.tsx refactor | ✅ Complete |
-| 1.5.2 | 2 | Backend CRUD for documents/chunks | documents router, RAGService methods, schemas | None | 📋 Pending |
+| 1.5.2 | 2 | Backend CRUD for documents/chunks | documents router, RAGService methods, schemas | None | ✅ Complete |
 | 1.5.3 | 2 | Frontend RAG Database page | None | RAGDatabasePage, DocumentList, ChunkList, DocumentUpload, API hooks | 📋 Pending |
 | 1.5.4 | 3 | Page-aware parsing & chunking | pdf_parser, chunking, metadata enhancements | None | 📋 Pending |
 | 1.5.5 | 3 | Chunk PDF generation & storage | pdf_extractor, config, ingest pipeline refactor | None | 📋 Pending |
@@ -580,12 +580,14 @@ document_chunk/

 | Risk | Impact | Mitigation |
 |------|--------|------------|
-| ChromaDB has no native "group by document" query | `list_documents()` needs manual grouping from all metadata | Group by `filename` + `document_id` in metadata. Cache result if slow. |
+| ~~ChromaDB has no native "group by document" query~~ | ~~`list_documents()` needs manual grouping from all metadata~~ | ✅ Resolved: Groups by document_id extracted from chunk IDs via `rsplit("_", 1)` |
 | Large PDFs → many chunk PDF files | Disk usage grows | One PDF per unique page (not per chunk). Pages shared by chunks reuse same file. |
 | Chunk spans multiple pages | Ambiguous page assignment | Tag chunk with STARTING page only. Note in UI. |
 | DOCX has no page numbers | `page_number` is None for DOCX chunks | Graceful degradation — show "N/A" or hide page info for DOCX. |
 | Deleting documents must clean up chunk files | Orphan files if deletion fails | Delete files after successful ChromaDB deletion. Log failures for manual cleanup. |
 | Path traversal in chunk PDF endpoint | Security risk | Validate `file_path` doesn't contain `..` or absolute paths. Use whitelist of known files. |
+| ChromaDB 1.5.8 requires `name()` on embedding functions | `_EmbeddingFunctionWrapper` crashes on `collection.get()` | ✅ Fixed: Added `name()` method returning `"custom_embedding_wrapper"` |
+| Existing ChromaDB data corrupted (HNSW segment error) | Endpoints return 500 against existing `chroma_db/` | Pre-existing issue. Works with fresh DB. May need `chroma_db` reset for production. |

 ---

@@ -613,7 +615,7 @@ None — all resolved.

 | File | Coverage |
 |------|----------|
-| `test_phase1_documents_router.py` | GET /documents, DELETE /documents/{id}, DELETE /chunks/{id} |
+| `test_phase1_documents_router.py` | ✅ GET /documents, DELETE /documents/{id}, DELETE /chunks/{id} (8 tests, all pass) |
 | `test_phase1_pdf_parser_pages.py` | parse_pdf_by_page() — multi-page PDFs, single-page, empty |
 | `test_phase1_page_aware_chunking.py` | chunk_with_pages() — cross-page chunks, single-page chunks |
 | `test_phase1_pdf_extractor.py` | extract_page_as_pdf() — valid page, out-of-range, corrupt PDF |
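
The resolved risk row in this diff says documents are grouped by a `document_id` extracted from chunk IDs via `rsplit("_", 1)`. The sketch below illustrates that idea in isolation. It assumes chunk IDs have the shape `<document_id>_<chunk_index>`, which the plan does not spell out, and the function name `group_chunk_ids` and the returned dictionary keys are hypothetical, not taken from the actual `RAGService` code.

```python
from collections import defaultdict


def group_chunk_ids(chunk_ids):
    """Group chunk IDs by document ID and count the chunks per document.

    Assumption (not confirmed by the plan): each chunk ID looks like
    "<document_id>_<chunk_index>".
    """
    groups = defaultdict(list)
    for chunk_id in chunk_ids:
        # rsplit("_", 1) splits on the LAST underscore only, so a document ID
        # that itself contains underscores stays in one piece.
        parts = chunk_id.rsplit("_", 1)
        # An ID without any underscore is treated as its own document.
        document_id = parts[0] if len(parts) == 2 else chunk_id
        groups[document_id].append(chunk_id)
    # Dicts preserve insertion order, so documents appear in first-seen order.
    return [
        {"document_id": doc_id, "chunk_count": len(ids)}
        for doc_id, ids in groups.items()
    ]


if __name__ == "__main__":
    print(group_chunk_ids(["report_2024_0", "report_2024_1", "notes_0"]))
```

Splitting from the right is what keeps an ID such as `report_2024_1` attached to document `report_2024` rather than `report`. It would misgroup chunks if the index part could itself contain an underscore, so that is worth checking against the real ID scheme.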