Debug Log: Upload 500 Error — Phase 1 Frontend

Date: 2026-04-23
Issue: Document upload (DOCX/PDF) via frontend returns "Request failed with status code 500"
Status: Resolved


Symptoms

  • Uploading NEC4 ACC.docx → HTTP 500: DOCX library is not installed
  • Uploading NEC4 ACC.pdf → HTTP 500: 'function' object has no attribute 'name'
  • Query endpoint also failing with same .name error

Root Cause Analysis

Environment: Backend was running on global Anaconda Python 3.13 with packages that did NOT match requirements.txt.

| Package | requirements.txt | Actually Installed | Impact |
| --- | --- | --- | --- |
| python-docx | 1.1.0 | Missing | DOCX parsing fails |
| chromadb | 0.4.22 | 1.5.8 | API mismatch: embedding function signature changed |
| numpy | (transitive) | 2.4.4 | ChromaDB 0.4.22 uses np.float_ (removed in NumPy 2.0) |
| pytest | 8.0.0 | 8.0.0 | Conflicts with pytest-asyncio==0.23.4 (requires pytest<8) |
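The kind of drift in the table above can be detected before the backend starts. The following is a minimal sketch, not part of the project: the helper names parse_pins, find_drift, and report_drift are hypothetical. It assumes requirements.txt uses plain `name==version` pins, and it compares version strings literally, so `8.0` and `8.0.0` would be reported as different.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Return {package: pinned_version} for each `name==version` line."""
    pins: dict[str, str] = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # unpinned or blank line: nothing to compare
        name, pinned = line.split("==", 1)
        name = name.split("[", 1)[0].strip().lower()  # drop extras like [standard]
        pins[name] = pinned.split(";", 1)[0].strip()  # drop environment markers
    return pins


def find_drift(pins: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Return {package: (pinned, installed)} for mismatches; installed is None if missing."""
    drift: dict[str, tuple[str, str | None]] = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift


def report_drift(path: str = "requirements.txt") -> None:
    """Print one line per mismatched or missing package."""
    with open(path, encoding="utf-8") as fh:
        drift = find_drift(parse_pins(fh.read()))
    for name, (pinned, installed) in sorted(drift.items()):
        print(f"{name}: pinned {pinned}, installed {installed or 'MISSING'}")
```

Calling report_drift("backend/requirements.txt") from the interpreter that runs the backend would list rows like those in the table. Transitive packages such as numpy are not pinned in the file, so this sketch would not flag them.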

Fixes Applied

1. Created Project Venv (Python 3.11)

The pinned packages in requirements.txt require Python ≤3.11 (tiktoken, pydantic-core have no prebuilt wheels for 3.13).

cd backend
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Also fixed: pytest==8.0.0 → pytest==7.4.4 in requirements.txt (dependency conflict).

2. Fixed NumPy Compatibility

ChromaDB 0.4.22 references np.float_ which was removed in NumPy 2.0.

pip install 'numpy<2'  # Downgraded to 1.26.4

3. Cleared Incompatible ChromaDB Database

The old backend/chroma_db/ directory was created by ChromaDB 1.5.8 and is incompatible with the 0.4.22 schema.

rm -rf backend/chroma_db
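Deleting the directory was fine here because the data could be re-ingested. Where that is less certain, a more cautious variant is to move the directory aside instead of removing it. This is a sketch of that alternative, not what was run during this fix; move_aside is a hypothetical helper name.

```python
from __future__ import annotations

import shutil
import time
from pathlib import Path


def move_aside(db_dir: str) -> Path | None:
    """Rename db_dir to <name>.bak-<timestamp>; return the new path, or None if absent."""
    src = Path(db_dir)
    if not src.exists():
        return None  # nothing to move
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.bak-{stamp}")
    shutil.move(str(src), str(dest))
    return dest
```

After move_aside("backend/chroma_db"), the backend would recreate an empty database on next start, and the old data would stay available for inspection.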

4. Fixed Embedding Function Wrapper

ChromaDB 0.4.22 validates embedding function signatures against the EmbeddingFunction protocol (__call__(self, input)). The original code passed a plain function, which:

  1. Failed signature validation ('function' object has no attribute 'name')
  2. Used asyncio.run() which cannot be called inside a running event loop
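The second failure can be reproduced without ChromaDB. The sketch below uses only the standard library; embed, sync_wrapper, and handler are hypothetical stand-ins for the embedding client, the original wrapper, and an async request handler.

```python
import asyncio


async def embed(texts):
    """Stand-in for an async embedding call; returns one dummy vector per text."""
    await asyncio.sleep(0)
    return [[0.0] for _ in texts]


def sync_wrapper(texts):
    """Mimics the original wrapper: blocks on the coroutine with asyncio.run()."""
    coro = embed(texts)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def handler():
    """Plays the role of an async request handler that calls the sync wrapper."""
    try:
        sync_wrapper(["hello"])
    except RuntimeError as exc:
        return str(exc)
    return "no error"


def demo():
    """Run the handler in an event loop and return the captured error message."""
    return asyncio.run(handler())
```

Called from ordinary synchronous code, sync_wrapper works. Called from inside handler, where a loop is already running, asyncio.run() raises RuntimeError, and demo() returns that message.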

File: backend/app/core/database.py

Before:

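def get_embedding_function_settings(settings):
    # Plain function: this is what failed ChromaDB's validation.
    # Note: `client` is not defined in this excerpt.
    def _wrap(texts: list[str]) -> list[list[float]]:
        # asyncio.run() raises RuntimeError when an event loop is already running.
        return asyncio.run(client.embed(texts))
    return _wrap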

After:

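import asyncio
from concurrent.futures import ThreadPoolExecutor

# EmbeddingClient is the project's embedding client; its import is not shown in this log.


class _EmbeddingFunctionWrapper:
    """Callable matching the __call__(self, input) signature ChromaDB expects."""

    def __init__(self, settings):
        self.settings = settings

    def __call__(self, input):
        def _run_in_thread(texts):
            # Fresh client and event loop per call, owned by the worker thread.
            client = EmbeddingClient(self.settings)
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
            try:
                return loop.run_until_complete(client.embed(texts))
            finally:
                loop.close()

        # Run in a separate thread so this works even when the caller's
        # thread already has a running event loop.
        with ThreadPoolExecutor(max_workers=1) as executor:
            return executor.submit(_run_in_thread, input).result()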

Key changes:

  • Class-based wrapper with __call__(self, input) signature matching protocol
  • Thread pool isolation for async event loop (avoids asyncio.run() inside running loop)
  • Per-call EmbeddingClient instance + fresh event loop in thread
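The thread-isolation idea can be tried in isolation with a fake coroutine. This sketch is a variant of the fix, not the code in database.py. It uses asyncio.run() inside the worker thread instead of creating the loop by hand, which is equivalent for this purpose because the worker thread has no running loop. fake_embed, ThreadedEmbeddingFunction, and call_from_running_loop are hypothetical names.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


async def fake_embed(texts):
    """Hypothetical stand-in for EmbeddingClient.embed(); one vector per text."""
    await asyncio.sleep(0)
    return [[float(len(t))] for t in texts]


class ThreadedEmbeddingFunction:
    """Callable with the __call__(self, input) shape described above."""

    def __call__(self, input):
        def _run_in_thread():
            # asyncio.run() creates and closes a fresh loop in this worker thread.
            return asyncio.run(fake_embed(input))

        # One short-lived worker thread per call; .result() blocks until done.
        with ThreadPoolExecutor(max_workers=1) as executor:
            return executor.submit(_run_in_thread).result()


def call_from_running_loop(texts):
    """Invoke the wrapper from inside a running event loop, as a request handler would."""
    async def _handler():
        return ThreadedEmbeddingFunction()(texts)

    return asyncio.run(_handler())
```

One trade-off to keep in mind: .result() blocks the calling thread, so when the caller is an async handler the event loop is stalled until the embeddings return. That is acceptable for a quick fix but may matter under concurrent load.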

Files Changed

| File | Change |
| --- | --- |
| backend/requirements.txt | pytest==7.4.4 (was 8.0.0) |
| backend/app/core/database.py | Added _EmbeddingFunctionWrapper class |
| backend/.venv/ | New Python 3.11 venv (gitignored) |
| backend/chroma_db/ | Cleared and recreated |

Verification

| Test | Result |
| --- | --- |
| DOCX upload (NEC4 ACC.docx, 315KB) | HTTP 200, 1 chunk |
| PDF upload (NEC4 ACC.pdf) | HTTP 200, 101 chunks |
| Query endpoint ("What is NEC4 ACC?") | HTTP 200, keywords + bullet answer + sources |

Prevention

  1. Always use the venv: source backend/.venv/bin/activate before running backend
  2. Never run backend in global env: Package versions drift silently
  3. Clear chroma_db/ when changing ChromaDB versions: the on-disk schema is not compatible across versions (here, a database written by 1.5.8 could not be read by 0.4.22)
  4. Pin Python version: Add python_requires=">=3.9,<3.12" to project config
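Items 1, 2, and 4 can also be enforced at startup, not only documented. The following is a minimal sketch of such a guard. The function names are hypothetical, and the venv test (sys.prefix differing from sys.base_prefix) only recognises venv/virtualenv environments, so a non-base conda environment would also be flagged.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def python_supported(version_info=sys.version_info, low=(3, 9), high=(3, 12)) -> bool:
    """True when low <= interpreter version < high, mirroring ">=3.9,<3.12"."""
    return low <= tuple(version_info[:2]) < high


def check_environment() -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not in_virtualenv():
        problems.append("not running inside a virtual environment")
    if not python_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"unsupported Python {found}; expected >=3.9,<3.12")
    return problems
```

The backend's entry point could call check_environment() and log or abort when the list is non-empty, which would have surfaced the global Anaconda 3.13 interpreter immediately.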

  • Frontend issue was a red herring — the frontend correctly displayed the 500 error. The actual bugs were all backend-side.
  • Query endpoint also affected because it shares the same RAGService → get_chroma_client() → embedding function path.