4 Commits

Author SHA1 Message Date
Woody 73c1789698 fix: Q&A chunking always fell back to token; LLM never called, missing API fields
Three bugs caused 'Chunk by Question' to silently produce token chunks:

1. QuestionChunkingStrategy.chunk_pages() had a broken event-loop check
   that always skipped LLM structure detection in FastAPI's async context.
   Fixed by making chunk_pages() async and removing the is_running() guard.

2. get_chunking_strategy() factory never passed an LLMClient to
   QuestionChunkingStrategy. Fixed by creating LLMClient in the factory
   with graceful fallback to regex-only when config is incomplete.

3. rag.list_documents() and list_chunks() didn't extract strategy_type
   or Q&A fields from ChromaDB metadata, so the frontend always showed
   chunking_strategy='token' and null Q&A fields. Fixed by reading
   these fields from ChromaDB and routing them through the API.

Also: TokenChunkingStrategy.chunk_pages() made async for consistency
with the question strategy; ingest router updated to await it.
Tests updated (asyncio.run() for sync tests, async mock chunk_pages).
2026-05-15 14:46:45 +08:00
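
The event-loop bug described in item 1 of the commit above can be illustrated with a minimal sketch. All names here (`detect_structure`, `chunk_pages_sync`, `chunk_pages_async`, `handler`) are hypothetical stand-ins, not the project's code, and the guard uses `asyncio.get_running_loop()` as an assumed equivalent of the removed `is_running()` check.

```python
import asyncio


async def detect_structure(text: str) -> list[str]:
    """Hypothetical stand-in for an LLM call that finds question boundaries."""
    await asyncio.sleep(0)  # simulate async I/O
    return [part.strip() for part in text.split("?") if part.strip()]


def chunk_pages_sync(text: str) -> tuple[str, list[str]]:
    """Buggy pattern: inside a running loop the guard always takes the fallback."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running, so it is safe to start one here.
        return "question", asyncio.run(detect_structure(text))
    # A loop is already running (as in an async request handler):
    # the detection step is skipped and token chunks are returned silently.
    return "token", text.split()


async def chunk_pages_async(text: str) -> tuple[str, list[str]]:
    """Fixed pattern: the coroutine simply awaits the detection step."""
    return "question", await detect_structure(text)


async def handler(text: str):
    """Simulates an async request handler calling both variants."""
    return chunk_pages_sync(text), await chunk_pages_async(text)
```

Called from `handler`, the synchronous variant reports the `token` strategy on every request, while the async variant reaches the detection step. This is also why callers such as the ingest router have to `await` the method once it becomes a coroutine.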
Woody d49756f374 feat: add chunk PDF serving endpoint and frontend clickable source links (1.5.6)
- Add page_number and chunk_file_path to SourceMetadata model and query router
- Add GET /chunks/{file_path}/pdf endpoint with path traversal protection
- Add View PDF links in ResponsePanel source cards and ChunkList component
- Update TypeScript types and API helper for chunk PDF URLs
- Add backend tests (5) and frontend ChunkList tests (7)
- Update enhancement plan: all 3 features complete
2026-04-24 11:49:39 +08:00
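
The path traversal protection mentioned for `GET /chunks/{file_path}/pdf` is not shown in the log. One common way to do it, sketched here under assumptions, is to resolve the requested path and reject anything that lands outside the chunk directory. The helper name `resolve_chunk_pdf` and the `.pdf` suffix check are hypothetical, and `Path.is_relative_to` requires Python 3.9 or later.

```python
from pathlib import Path


def resolve_chunk_pdf(base_dir: Path, file_path: str) -> Path:
    """Return the absolute path of a chunk PDF, or raise ValueError if unsafe."""
    base = base_dir.resolve()
    # Joining with an absolute file_path discards base, and ".." segments are
    # collapsed by resolve(), so both escapes are caught by the check below.
    candidate = (base / file_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes chunk directory: {file_path!r}")
    if candidate.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF path: {file_path!r}")
    return candidate
```

An endpoint would typically translate the `ValueError` into a 4xx response before any file is opened.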
Woody 4732b4949c feat(backend): clean up chunk PDFs on document and chunk deletion
Delete document endpoint now removes associated chunk PDF files from document_chunk/ before ChromaDB deletion. Delete chunk endpoint removes individual chunk PDF. Missing files logged as warnings, not errors.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 10:53:34 +08:00
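
The "missing files logged as warnings, not errors" behavior from the commit above might look roughly like the following sketch. The function name `remove_chunk_pdfs` and the logger name are assumptions for illustration only.

```python
import logging
from pathlib import Path

logger = logging.getLogger("chunk_cleanup")  # hypothetical logger name


def remove_chunk_pdfs(paths: list[Path]) -> int:
    """Delete each chunk PDF and return how many files were actually removed."""
    removed = 0
    for path in paths:
        try:
            path.unlink()
            removed += 1
        except FileNotFoundError:
            # A missing file is not fatal: warn and continue with the rest.
            logger.warning("chunk PDF already missing: %s", path)
    return removed
```

Running the file cleanup before the database deletion, as the commit describes, means the file paths are still available from the stored metadata when the files are removed.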
Woody f21085b3df feat(backend): add documents CRUD endpoints and tests
Add 4 REST endpoints for RAG database management: GET /documents, GET /documents/{id}/chunks, DELETE /documents/{id}, DELETE /chunks/{id}. Register documents router in main.py. 8 unit tests covering all CRUD operations.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 19:02:28 +08:00