# Debug Log: Upload 500 Error (Phase 1 Frontend)

**Date**: 2026-04-23
**Issue**: Document upload (DOCX/PDF) via frontend returns "Request failed with status code 500"
**Status**: ✅ Resolved

---

## Symptoms

- Uploading `NEC4 ACC.docx` → HTTP 500: `DOCX library is not installed`
- Uploading `NEC4 ACC.pdf` → HTTP 500: `'function' object has no attribute 'name'`
- Query endpoint also failing with the same `.name` error

---

## Root Cause Analysis

**Environment**: Backend was running on global Anaconda Python 3.13 with packages that did NOT match `requirements.txt`.

| Package | requirements.txt | Actually Installed | Impact |
|---------|-----------------|-------------------|--------|
| python-docx | 1.1.0 | **Missing** | DOCX parsing fails |
| chromadb | 0.4.22 | 1.5.8 | API mismatch: embedding function signature changed |
| numpy | (transitive) | 2.4.4 | ChromaDB 0.4.22 uses `np.float_` (removed in NumPy 2.0) |
| pytest | 8.0.0 | 8.0.0 | Conflicts with pytest-asyncio==0.23.4 (requires pytest<8) |

---

## Fixes Applied

### 1. Created Project Venv (Python 3.11)

The pinned packages in `requirements.txt` require Python ≤3.11 (tiktoken, pydantic-core have no prebuilt wheels for 3.13).

```bash
cd backend
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

**Also fixed**: `pytest==8.0.0` → `pytest==7.4.4` in `requirements.txt` (dependency conflict).

### 2. Fixed NumPy Compatibility

ChromaDB 0.4.22 references `np.float_`, which was removed in NumPy 2.0.

```bash
pip install 'numpy<2'  # Downgraded to 1.26.4
```

### 3. Cleared Incompatible ChromaDB Database

The old `backend/chroma_db/` was created by ChromaDB 1.5.8 and was incompatible with the 0.4.22 schema.

```bash
rm -rf backend/chroma_db
```

### 4. Fixed Embedding Function Wrapper

ChromaDB 0.4.22 validates embedding function signatures against the `EmbeddingFunction` protocol (`__call__(self, input)`). The original code passed a plain function which:

1. Failed signature validation (`'function' object has no attribute 'name'`)
2. Used `asyncio.run()`, which cannot be called from inside a running event loop

**File**: `backend/app/core/database.py`

**Before**:

```python
def get_embedding_function_settings(settings):
    def _wrap(texts: list[str]) -> list[list[float]]:
        return asyncio.run(client.embed(texts))
    return _wrap
```

**After**:

```python
import asyncio


class _EmbeddingFunctionWrapper:
    def __init__(self, settings):
        self.settings = settings

    def __call__(self, input):
        from concurrent.futures import ThreadPoolExecutor

        def _run_in_thread(texts):
            # Fresh client and event loop per call, owned by the worker thread
            client = EmbeddingClient(self.settings)
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            try:
                return loop.run_until_complete(client.embed(texts))
            finally:
                loop.close()

        # A separate thread keeps this safe when the caller is already
        # inside a running event loop
        with ThreadPoolExecutor(max_workers=1) as executor:
            return executor.submit(_run_in_thread, input).result()
```

Key changes:

- Class-based wrapper with a `__call__(self, input)` signature matching the protocol
- Thread pool isolation for the async event loop (avoids `asyncio.run()` inside a running loop)
- Per-call `EmbeddingClient` instance + fresh event loop in the thread

---

## Files Changed

| File | Change |
|------|--------|
| `backend/requirements.txt` | `pytest==7.4.4` (was `8.0.0`) |
| `backend/app/core/database.py` | Added `_EmbeddingFunctionWrapper` class |
| `backend/.venv/` | New Python 3.11 venv (gitignored) |
| `backend/chroma_db/` | Cleared and recreated |

---

## Verification

| Test | Result |
|------|--------|
| DOCX upload (`NEC4 ACC.docx`, 315KB) | ✅ HTTP 200, 1 chunk |
| PDF upload (`NEC4 ACC.pdf`) | ✅ HTTP 200, 101 chunks |
| Query endpoint (`"What is NEC4 ACC?"`) | ✅ HTTP 200, keywords + bullet answer + sources |

---

## Prevention

1. **Always use the venv**: `source backend/.venv/bin/activate` before running the backend
2. **Never run the backend in the global env**: Package versions drift silently
3. **Clear `chroma_db/` when changing ChromaDB versions**: The on-disk schema may not be compatible across versions (a database written by 1.5.8 could not be opened by 0.4.22)
4. **Pin Python version**: Add `python_requires=">=3.9,<3.12"` to project config

---

## Related

- The frontend issue was a **red herring**: the frontend correctly displayed the 500 error. The actual bugs were all backend-side.
- The query endpoint was also affected because it shares the same `RAGService` → `get_chroma_client()` → embedding function path.
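
Prevention items 1, 2, and 4 could be backed by a small startup guard. The following is a minimal sketch, not code from this project: the function names (`parse_pins`, `find_mismatches`, `running_in_venv`, `python_in_range`) are hypothetical, it only understands exact `name==version` pins, and it compares version strings literally.

```python
import sys
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package_name: version} for lines of the form `name==version`."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0]   # drop trailing comments
        line = line.split(";", 1)[0]  # drop environment markers
        line = line.strip()
        if "==" not in line:
            continue  # this sketch skips ranges such as `numpy<2`
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0]  # drop extras such as `[standard]`
        pins[name.strip().lower()] = version.strip()
    return pins


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed_or_None)} for each mismatch."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != pinned:  # literal string comparison
            mismatches[name] = (pinned, installed)
    return mismatches


def running_in_venv():
    """True when the interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def python_in_range(low=(3, 9), high=(3, 12), version=None):
    """True when low <= (major, minor) < high; defaults mirror item 4."""
    current = tuple(version or sys.version_info[:2])
    return low <= current < high
```

One way to use it would be to read `requirements.txt` when the backend starts, call `find_mismatches(parse_pins(text))`, and refuse to start if the result is non-empty or if `running_in_venv()` or `python_in_range()` returns `False`. With the versions recorded in the root cause table, such a check would have reported `python-docx` as missing and `chromadb` as 1.5.8 instead of 0.4.22 before any upload was attempted.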