Woody
b48c23001e
fix(backend): preserve original filename in chunk metadata instead of temp file name
...
When uploading files, the backend passes them through NamedTemporaryFile, causing os.path.basename to return temp names like 'tmp90i7xqa8.pdf'. Added original_filename parameter to extract_metadata() so the actual upload filename is stored.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent )
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-24 10:14:58 +08:00
Woody
4a22b906e4
refactor(backend): update ingest and query routers
...
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent )
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-04-23 13:26:32 +08:00
Woody
7493b3aaf6
feat: Phase 1.4 acceptance tests, error handling, and polish
...
- Implement acceptance tests for ingest (real ChromaDB) and query (real LLM)
- Full 3-step RAG pipeline verified: decompose → retrieve → filter → generate
- Add logging to ingest and query routers
- Improve error handling: empty doc detection, proper HTTPException re-raising
- Add .txt file support to ingest endpoint
- Fix query router: strip distance from retrieve tuples before relevance filter
- Update plan: Phase 1 backend complete (all acceptance criteria met)
- Tests: 41 unit passed, 5 acceptance passed (real OpenRouter calls)
2026-04-22 17:45:50 +08:00
Woody
d94abaac77
feat: Phase 1.2 ingestion pipeline with chunking and metadata
...
- Add document parsers (DOCX, PDF) with lazy imports
- Add TokenChunkingStrategy with ABC for future replacement
- Add metadata extraction (filename, upload_date, content_summary)
- Add RAGService for ChromaDB ingestion/retrieval/response generation
- Add POST /api/v1/ingest endpoint with file validation
- Test-first: 20 passed, 2 skipped (python-docx not installed)
2026-04-22 16:49:52 +08:00