Replace mocked DB/internal services with real ChromaDB/SQLite instances created under pytest's tmp_path.
Mock only truly external APIs: the LLM, and the embedding model (so vectors stay deterministic).
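A minimal sketch of a deterministic embedding double, assuming the goal is "same text in, same vector out" across runs. The class name `DeterministicEmbedding` and the hashing scheme are hypothetical. The `__call__(input)` shape is modeled on ChromaDB's `EmbeddingFunction` interface, but the sketch deliberately does not import chromadb, so wiring it into a collection may require subclassing the version you have installed.

```python
import hashlib
import math


class DeterministicEmbedding:
    """Hypothetical test double: maps each text to a fixed unit vector.

    The vector is derived from the text's SHA-256 digest, so results are
    reproducible without calling a real embedding API.
    """

    def __init__(self, dim: int = 8) -> None:
        # One digest byte per component, and SHA-256 yields 32 bytes.
        if not 1 <= dim <= 32:
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def __call__(self, input: list[str]) -> list[list[float]]:
        vectors = []
        for text in input:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            # Center bytes around zero (never exactly zero, so norm > 0).
            raw = [b - 127.5 for b in digest[: self.dim]]
            norm = math.sqrt(sum(x * x for x in raw))
            vectors.append([x / norm for x in raw])
        return vectors
```

Hashing keeps similarity scores stable between runs, which is what makes retrieval assertions in tests reproducible; the vectors carry no semantic meaning.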
13 test files rewritten (314 pass, 0 fail):
- Route tests: use TestClient + real ChromaDB, seed test data
- Service tests: use real PersistentClient/SQLite instances
- Pipeline tests: hit the SSE /query endpoint via TestClient, verify history
- Converted unittest.TestCase to pytest where applicable
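A minimal sketch of the service-test pattern above, assuming a hypothetical SQLite-backed `HistoryStore` as the service under test. Instead of mocking the database, the test opens a real SQLite file inside the directory pytest injects as `tmp_path`; outside pytest, a `tempfile.TemporaryDirectory` stands in for it.

```python
import sqlite3
import tempfile
from pathlib import Path


class HistoryStore:
    """Hypothetical SQLite-backed query history (stand-in for a real service)."""

    def __init__(self, db_path: Path) -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS history "
            "(id INTEGER PRIMARY KEY, question TEXT, answer TEXT)"
        )

    def add(self, question: str, answer: str) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO history (question, answer) VALUES (?, ?)",
                (question, answer),
            )

    def all(self) -> list[tuple[str, str]]:
        rows = self.conn.execute("SELECT question, answer FROM history ORDER BY id")
        return rows.fetchall()

    def close(self) -> None:
        self.conn.close()


def test_history_roundtrip(tmp_path: Path) -> None:
    # pytest gives each test a fresh tmp_path, so each test gets its own DB file.
    store = HistoryStore(tmp_path / "history.db")
    store.add("What is RAG?", "Retrieval-augmented generation.")
    assert store.all() == [("What is RAG?", "Retrieval-augmented generation.")]
    store.close()


if __name__ == "__main__":
    # Outside pytest, emulate tmp_path with a throwaway directory.
    with tempfile.TemporaryDirectory() as d:
        test_history_roundtrip(Path(d))
```

Because every test writes to its own directory, tests cannot leak state into each other, yet they still exercise real SQL instead of a mock's canned return values.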
Plus: fix metadata.py to filter None values from ChromaDB metadata
(pre-existing bug caught by real-DB ingestion tests)
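A minimal sketch of the None-filtering fix, assuming metadata arrives as a plain dict before being passed to ChromaDB, which the real-DB ingestion tests showed rejects None values. The helper name `clean_metadata` is hypothetical and not necessarily what metadata.py uses.

```python
from typing import Any, Dict, Union

# Scalar metadata types ChromaDB accepts: str, int, float, bool.
Scalar = Union[str, int, float, bool]


def clean_metadata(meta: Dict[str, Any]) -> Dict[str, Scalar]:
    """Hypothetical helper: drop keys whose value is None."""
    # Test identity with None rather than truthiness, so 0, False and ""
    # are kept as legitimate values.
    return {k: v for k, v in meta.items() if v is not None}
```

Checking `is not None` instead of truthiness matters here: a page number of 0 or a flag set to False is real data and must survive the filter.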
Enhance extract_metadata() with three new optional fields to support page-aware chunking. Validate that the provided lists have matching lengths. Fully backward compatible: existing callers are unaffected.
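A minimal sketch of the length check, assuming each new optional field is a per-chunk list that must line up with the chunk count. The commit does not name the three fields, so the helper `check_parallel_lists` and the example field name `page_numbers` are hypothetical.

```python
from typing import Optional, Sequence


def check_parallel_lists(n_chunks: int, **fields: Optional[Sequence]) -> None:
    """Raise ValueError if any provided list's length differs from n_chunks.

    Fields passed as None count as omitted, which is what keeps optional
    parameters backward compatible: callers that never pass them are unaffected.
    """
    for name, values in fields.items():
        if values is not None and len(values) != n_chunks:
            raise ValueError(
                f"{name} has {len(values)} entries but there are {n_chunks} chunks"
            )
```

For example, `check_parallel_lists(3, page_numbers=[1, 1, 2])` returns silently, while `check_parallel_lists(3, page_numbers=[1, 2])` raises before any misaligned metadata is stored.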
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
When files are uploaded, the backend writes them to a NamedTemporaryFile, so os.path.basename() returned temporary names such as 'tmp90i7xqa8.pdf'. Add an original_filename parameter to extract_metadata() so the actual upload filename is stored.
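A minimal sketch of why the temp name leaked and how an optional original_filename parameter fixes it. `extract_source_name` is a hypothetical, simplified stand-in for the filename handling inside extract_metadata(), not its actual code.

```python
import os
import tempfile
from typing import Optional


def extract_source_name(path: str, original_filename: Optional[str] = None) -> str:
    """Hypothetical stand-in for the filename part of extract_metadata().

    Prefer the client-supplied upload name; otherwise fall back to the
    on-disk basename, so callers that pass only a path behave as before.
    """
    if original_filename:
        # Strip directory components (using this OS's path separators).
        return os.path.basename(original_filename)
    return os.path.basename(path)


# Simulate an upload: the bytes land in a temp file with a random name.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 ...")
    upload_path = tmp.name

try:
    print(extract_source_name(upload_path))                       # e.g. tmp90i7xqa8.pdf
    print(extract_source_name(upload_path, "annual_report.pdf"))  # annual_report.pdf
finally:
    os.remove(upload_path)
```

Defaulting the new parameter to None keeps the change backward compatible: the basename fallback is exactly the old behavior.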