# LegCo Reranker

RAG-powered document Q&A app. Upload PDFs, ask questions in Cantonese, get bullet-point answers with citations.

## Quick Start (Dev)

```bash
# Backend (from the repo root)
cd backend
cp .env.example .env    # then edit .env and set your LLM API key
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend (in a second terminal, from the repo root)
cd frontend
npm install
npm run dev
```

Backend → `http://localhost:8000` | Frontend → `http://localhost:5173`

## Deploy with Docker

### Prerequisites

- Docker 24+ and Docker Compose v2
- OpenRouter API key (or compatible LLM provider)

### Setup

```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names

# 2. Build and start
docker compose up -d --build

# 3. Check health
curl http://localhost:8000/health
```

The app is served at `http://localhost:8000`, which hosts both the API and the frontend UI.
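
The container can take a few seconds to start answering, so a single `curl` right after `docker compose up` may fail. The following is a minimal sketch of a readiness poll in Python. It assumes only that `/health` returns HTTP 200 once the backend is up; the function name `wait_for_health` and its parameters are hypothetical and not part of this repo.

```python
import time
import urllib.request


def wait_for_health(url="http://localhost:8000/health", attempts=30, delay=2.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    `opener` and `sleep` are injectable so the loop can be exercised
    without a running backend.
    """
    for _ in range(attempts):
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers connection refused and urllib's URLError/HTTPError;
            # the container may still be starting.
            pass
        sleep(delay)
    return False

# Usage (assumed): call wait_for_health() after `docker compose up -d --build`
# and treat a False return value as a failed deployment.
```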

### Volumes

| Volume | Purpose |
|--------|---------|
| `chroma_data` | ChromaDB vector store (persistent) |
| `chunk_data` | Extracted PDF page files |
| `sqlite_data` | Prompt templates and query history |

### Environment Variables

All settings are configurable via `backend/.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
| `LLM_API_KEY` | — | API key for LLM provider |
| `LLM_MODEL_NAME` | `qwen/qwen3.5-35b-a3b` | Chat model |
| `EMBEDDING_MODEL` | `qwen/qwen3-embedding-4b` | Embedding model |
| `EMBEDDING_API_KEY` | — | API key for embeddings (falls back to `LLM_API_KEY`) |
| `RETRIEVAL_N_RESULTS` | `10` | Chunks retrieved per sub-question |
| `RELEVANCE_THRESHOLD` | `7.0` | Minimum relevance score (on a 0-10 scale) |
| `LLM_TIMEOUT` | `60.0` | LLM request timeout, in seconds |

### Production: Nginx Reverse Proxy

```nginx
# Include nginx.conf in your site config
# Key settings:
# - client_max_body_size 350M   (allow large PDF uploads)
# - proxy_read_timeout 300s     (LLM calls can take minutes)
```

```bash
# Install nginx
sudo apt install nginx

# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
```

### Stopping

```bash
docker compose down
```

### Updating

```bash
git pull
docker compose up -d --build
```

## Architecture

```
User → Nginx (80) → Uvicorn (8000)
                         ├── FastAPI API (/api/v1/*)
                         └── Static Frontend (/*)
                              └── React 18 + Vite + Tailwind
```

### RAG Pipeline (Per-Sub-Question)

```
User Question
  → [LLM] Decompose into 2-5 sub-questions
  → [ChromaDB] Retrieve 10 chunks per sub-question (RETRIEVAL_N_RESULTS)
  → [LLM] Score every chunk against the sub-question it was retrieved for (single call)
  → [LLM] Generate a markdown response per sub-question
  → SSE stream with per-sub-question sources
```
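
The flow above can be sketched in Python as follows. This is an illustrative outline, not the app's actual code: `run_pipeline`, `to_sse`, the chunk shape, and the four injected callables are all hypothetical, and applying `RELEVANCE_THRESHOLD` as a `>=` filter after scoring is an assumption based on its description as a minimum score.

```python
import json
from typing import Callable, Dict, Iterator, List, Tuple

Chunk = Dict[str, str]  # hypothetical shape: {"id": ..., "text": ...}


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str, int], List[Chunk]],
    score_all: Callable[[Dict[str, List[Chunk]]], Dict[Tuple[str, str], float]],
    generate: Callable[[str, List[Chunk]], str],
    n_results: int = 10,
    threshold: float = 7.0,
) -> Iterator[dict]:
    """Yield one event per sub-question with its answer and sources."""
    sub_questions = decompose(question)  # LLM: 2-5 sub-questions
    # Vector store: n_results chunks for each sub-question.
    candidates = {sq: retrieve(sq, n_results) for sq in sub_questions}
    # LLM: one call scores every (sub-question, chunk) pair.
    scores = score_all(candidates)
    for sq in sub_questions:
        # Assumed filter: keep chunks at or above the threshold.
        kept = [c for c in candidates[sq]
                if scores.get((sq, c["id"]), 0.0) >= threshold]
        yield {
            "sub_question": sq,
            "answer": generate(sq, kept),  # LLM: markdown answer
            "sources": [c["id"] for c in kept],
        }


def to_sse(event: dict) -> str:
    """Serialize one event as a Server-Sent Events `data:` frame."""
    return f"data: {json.dumps(event, ensure_ascii=False)}\n\n"
```

Injecting the four stages as callables keeps the control flow testable without an LLM or a vector store; `ensure_ascii=False` keeps Cantonese text readable in the stream.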

## Notes

- PDF upload limit: 300 MB
- Desktop only (not optimized for mobile)
- No authentication (public demo)
- All LLM calls are routed through a configurable base URL (`LLM_BASE_URL`)