Multi-stage Dockerfile: Node builds frontend, Python serves both API and static files.
docker-compose.yml with named volumes for ChromaDB, chunks, and SQLite data.
nginx.conf as reverse proxy with 350M upload limit and 300s LLM proxy timeout.
README with dev setup, deploy steps, env vars table, and architecture diagram.
Backend main.py: add catch-all route to serve frontend/dist/static files in production. Only activates when dist/ exists.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
| Name |
|---|
| .plans |
| backend |
| frontend |
| .env.txt |
| .gitignore |
| AGENTS.md |
| Dockerfile |
| README.md |
| development_plan.md |
| docker-compose.yml |
| nginx.conf |
README.md
LegCo Reranker
RAG-powered document Q&A app. Upload PDFs, ask questions in Cantonese, get bullet-point answers with citations.
Quick Start (Dev)
# Backend
cd backend
cp .env.example .env # edit .env with your LLM API key
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Frontend
cd frontend
npm install
npm run dev
Backend → http://localhost:8000 | Frontend → http://localhost:5173
Deploy with Docker
Prerequisites
- Docker 24+ and Docker Compose v2
- OpenRouter API key (or compatible LLM provider)
Setup
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names
# 2. Build and start
docker compose up -d --build
# 3. Check health
curl http://localhost:8000/health
The app is served at http://localhost:8000 — both the API and the frontend UI.
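The health check in step 3 can also be scripted so that a deploy waits until the container is ready. The following is a minimal sketch using only the Python standard library; the function names `http_ok` and `wait_until_healthy`, the retry count, and the delay are assumptions chosen for illustration, and only the `/health` URL comes from the steps above.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True when a GET to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_healthy(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,        # hypothetical default
    delay: float = 2.0,        # hypothetical default, in seconds
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health endpoint until it succeeds or attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to sleep after the final attempt
    return False


if __name__ == "__main__":
    print("healthy" if wait_until_healthy() else "not healthy")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.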
Volumes
| Volume | Purpose |
|---|---|
| `chroma_data` | ChromaDB vector store (persistent) |
| `chunk_data` | Extracted PDF page files |
| `sqlite_data` | Prompt templates and query history |
Environment Variables
All configurable via backend/.env:
| Variable | Default | Description |
|---|---|---|
| `LLM_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
| `LLM_API_KEY` | (none) | API key for LLM provider |
| `LLM_MODEL_NAME` | `qwen/qwen3.5-35b-a3b` | Chat model |
| `EMBEDDING_MODEL` | `qwen/qwen3-embedding-4b` | Embedding model |
| `EMBEDDING_API_KEY` | (none) | API key for embeddings (falls back to `LLM_API_KEY`) |
| `RETRIEVAL_N_RESULTS` | `10` | Chunks per sub-question |
| `RELEVANCE_THRESHOLD` | `7.0` | Min relevance score (0-10) |
| `LLM_TIMEOUT` | `60.0` | LLM request timeout in seconds |
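To make the defaults and the `EMBEDDING_API_KEY` fallback concrete, here is a minimal sketch of how these variables could be read. It is not the project's actual settings code: the `Settings` class and `load_settings` function are hypothetical names, and the sketch assumes the values from backend/.env have already been exported into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    llm_base_url: str
    llm_api_key: Optional[str]
    llm_model_name: str
    embedding_model: str
    embedding_api_key: Optional[str]
    retrieval_n_results: int
    relevance_threshold: float
    llm_timeout: float


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, applying the documented defaults."""
    llm_api_key = env.get("LLM_API_KEY") or None
    return Settings(
        llm_base_url=env.get("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        llm_api_key=llm_api_key,
        llm_model_name=env.get("LLM_MODEL_NAME", "qwen/qwen3.5-35b-a3b"),
        embedding_model=env.get("EMBEDDING_MODEL", "qwen/qwen3-embedding-4b"),
        # Falls back to the LLM key when no separate embedding key is set.
        embedding_api_key=env.get("EMBEDDING_API_KEY") or llm_api_key,
        retrieval_n_results=int(env.get("RETRIEVAL_N_RESULTS", "10")),
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "7.0")),
        llm_timeout=float(env.get("LLM_TIMEOUT", "60.0")),
    )
```

Passing a plain dict instead of `os.environ` makes the fallback easy to check, for example `load_settings({"LLM_API_KEY": "k"}).embedding_api_key` evaluates to `"k"`.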
Production: Nginx Reverse Proxy
# Include nginx.conf in your site config
# Key settings:
# - client_max_body_size 350M (allow large PDF uploads)
# - proxy_read_timeout 300s (LLM calls can take minutes)
# Install nginx
sudo apt install nginx
# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
Stopping
docker compose down
Updating
git pull
docker compose up -d --build
Architecture
User → Nginx (80) → Uvicorn (8000)
                      ├── FastAPI API (/api/v1/*)
                      └── Static Frontend (/*)
                            └── React 18 + Vite + Tailwind
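The route split in the diagram can be sketched as a small path resolver. This is a minimal sketch, not the project's actual `main.py`: the function name `resolve_frontend_file`, the `index.html` fallback for unknown paths, and the path-traversal guard are assumptions. Only the `/api/v1/` prefix and the rule that static serving is active only when the built `dist/` directory exists come from this repository's description.

```python
from pathlib import Path
from typing import Optional

API_PREFIX = "/api/v1/"  # taken from the architecture diagram


def resolve_frontend_file(dist_dir: Path, request_path: str) -> Optional[Path]:
    """Map a non-API request path to a file inside dist_dir.

    Returns None when the path belongs to the API, when dist_dir does not
    exist (frontend not built), or when there is no index.html to fall back to.
    """
    if request_path.startswith(API_PREFIX):
        return None  # leave it to the API routes
    if not dist_dir.is_dir():
        return None  # catch-all stays inactive without a build

    root = dist_dir.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Guard (assumption): never serve anything outside dist_dir.
    if candidate != root and root not in candidate.parents:
        candidate = root
    if candidate.is_file():
        return candidate
    # Fallback (assumption): unknown paths get index.html so that
    # client-side routing in the React app keeps working.
    index = root / "index.html"
    return index if index.is_file() else None


if __name__ == "__main__":
    print(resolve_frontend_file(Path("frontend/dist"), "/assets/app.js"))
```

In a FastAPI app, a function like this would typically sit behind a catch-all route registered after the API routes, so that endpoints such as `/health` are matched first.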
RAG Pipeline (Per-Sub-Question)
User Question
→ [LLM] Decompose into 2-5 sub-questions
→ [ChromaDB] Retrieve 10 chunks per sub-question
→ [LLM] Score all chunks against their own sub-question (single call)
→ [LLM] Generate markdown response per sub-question
→ SSE stream with per-sub-question sources
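The steps above can be sketched as a single generator. This is an illustrative sketch under stated assumptions, not the project's implementation: `run_pipeline`, `sse_event`, the four injected callables, the chunk dictionary shape, and the JSON fields of each event are all hypothetical. It also assumes that a chunk is kept when its score is greater than or equal to `RELEVANCE_THRESHOLD`.

```python
import json
from typing import Callable, Dict, Iterator, List

Chunk = Dict[str, str]  # hypothetical shape: {"id": ..., "text": ...}


def sse_event(payload: dict) -> str:
    """Format one Server-Sent Events message: a 'data:' line plus a blank line."""
    return f"data: {json.dumps(payload, ensure_ascii=False)}\n\n"


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],             # LLM: question -> sub-questions
    retrieve: Callable[[str, int], List[Chunk]],       # vector store lookup
    score: Callable[[Dict[str, List[Chunk]]], Dict[str, List[float]]],  # LLM, one call
    generate: Callable[[str, List[Chunk]], str],       # LLM: markdown answer
    n_results: int = 10,       # RETRIEVAL_N_RESULTS default
    threshold: float = 7.0,    # RELEVANCE_THRESHOLD default
) -> Iterator[str]:
    """Yield one SSE message per sub-question, each with its own sources."""
    sub_questions = decompose(question)
    retrieved = {sq: retrieve(sq, n_results) for sq in sub_questions}
    # Single scoring call: every chunk is scored against its own sub-question.
    scores = score(retrieved)
    for sq in sub_questions:
        kept = [c for c, s in zip(retrieved[sq], scores[sq]) if s >= threshold]
        answer = generate(sq, kept)
        yield sse_event(
            {"sub_question": sq, "answer": answer, "sources": [c["id"] for c in kept]}
        )
```

Injecting the LLM and retrieval steps as callables keeps the control flow testable without network access; a real backend would wire these to its LLM client and ChromaDB collection.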
Notes
- PDF upload limit: 300MB (nginx accepts request bodies up to 350M via client_max_body_size)
- Desktop only (not mobile-optimized)
- No authentication (public demo)
- All LLM calls routed through configurable base URL