LegCo Reranker
RAG-powered document Q&A app with video ASR. Upload PDFs, upload videos with Cantonese ASR transcription, ask questions, get bullet-point answers with citations.
Quick Start (Dev)
# Backend
cd backend
cp .env.example .env # edit .env with your LLM API key AND DashScope API key (for video ASR)
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
# Frontend
cd frontend
pnpm install
pnpm run dev
Backend → http://localhost:8000 | Frontend → http://localhost:5173
Deploy with Docker
Prerequisites
- Docker 24+ and Docker Compose v2
- OpenRouter API key (or compatible LLM provider)
- Alibaba Cloud DashScope API key (for video ASR transcription)
Setup
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names
# 2. Build and start
docker compose up -d --build
# 3. Check health
curl http://localhost:8000/health
The app is served at http://localhost:8000, which hosts both the API and the frontend UI.
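A deployment script may want to wait for the container to become healthy rather than run curl by hand. The following is a minimal sketch, assuming that /health answers with HTTP 200 once the app is ready (the response body is not documented here). The names `http_probe` and `wait_for_health` are hypothetical and not part of this repository.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not healthy yet.
        return False


def wait_for_health(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,
    delay: float = 2.0,
    probe: Callable[[str], bool] = http_probe,
) -> bool:
    """Poll `url` until the probe succeeds or `attempts` probes have failed."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give the container time to start
    return False
```

Calling `wait_for_health()` right after `docker compose up -d --build` returns True as soon as the endpoint responds, and False if it never does within the attempt budget.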
Volumes
| Volume | Purpose |
|---|---|
| chroma_data | ChromaDB vector store (persistent) |
| chunk_data | Extracted PDF page files |
| sqlite_data | Prompt templates and query history |
| uploads_data | Uploaded video files (persistent) |
Environment Variables
All configurable via backend/.env:
| Variable | Default | Description |
|---|---|---|
| LLM_BASE_URL | https://openrouter.ai/api/v1 | LLM API endpoint |
| LLM_API_KEY | (none) | API key for LLM provider |
| LLM_MODEL_NAME | qwen/qwen3.5-35b-a3b | Chat model |
| LLM_TIMEOUT | 60.0 | LLM request timeout in seconds |
| LLM_ENABLE_THINKING | false | Enable LLM thinking/reasoning tokens |
| VLLM_ENGINE | false | Use vLLM-format extra_body instead of OpenRouter |
| EMBEDDING_MODEL | qwen/qwen3-embedding-4b | Embedding model |
| EMBEDDING_BASE_URL | https://openrouter.ai/api/v1 | Embedding API endpoint |
| EMBEDDING_API_KEY | (none) | API key for embeddings (falls back to LLM_API_KEY) |
| CHROMA_DB_PATH | ./chroma_db | ChromaDB persistent storage |
| CHUNK_SIZE | 1000 | Token chunk size |
| CHUNK_OVERLAP | 200 | Token chunk overlap |
| RETRIEVAL_N_RESULTS | 10 | Chunks per sub-question |
| RELEVANCE_THRESHOLD | 7.0 | Min relevance score (0-10) |
| PROMPTS_DB_PATH | ./data/prompts.db | Prompt templates SQLite |
| HISTORY_DB_PATH | ./data/history.db | Query history SQLite |
| CORS_ORIGINS | ["http://localhost:5173","http://localhost:3000"] | Allowed CORS origins |
| DASHSCOPE_API_KEY | (none) | Alibaba Cloud DashScope API key (for video ASR) |
| ASR_MODEL_NAME | qwen3-asr-flash | ASR model for batch transcription |
| ASR_REALTIME_MODEL_NAME | qwen3-asr-flash-realtime | ASR model for real-time streaming |
| VIDEO_UPLOAD_DIR | ./uploads | Video file storage directory |
| MAX_VIDEO_SIZE_MB | 300 | Maximum video upload size |
| SUPPORTED_VIDEO_FORMATS | .mp4, .webm, .mov, .avi, .mkv | Allowed video file extensions |
Production: Nginx Reverse Proxy
# Include nginx.conf in your site config
# Key settings:
# - client_max_body_size 350M (allow large PDF and video uploads)
# - proxy_read_timeout 300s (LLM calls can take minutes)
# Install nginx
sudo apt install nginx
# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
Stopping
docker compose down
Updating
git pull
docker compose up -d --build
Cross-Platform Build (aarch64 → amd64)
When building on an aarch64/ARM64 machine (Apple Silicon, ARM Windows WSL2, Raspberry Pi) for deployment to an x86_64/amd64 server:
1. Install buildx
# Download buildx for arm64
mkdir -p ~/.docker/cli-plugins # the plugin directory may not exist yet
BUILDX_VERSION=$(wget -qO- https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | head -1 | cut -d'"' -f4)
wget "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-arm64" -O ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx
2. Register QEMU for amd64 emulation
docker run --privileged --rm tonistiigi/binfmt --install all
3. Build for amd64
DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t legco_reranker:amd64 .
4. Export and transfer to server
# Save image to tar file
docker save legco_reranker:amd64 -o legco_reranker_amd64.tar
# Compress (~762MB → ~250MB)
gzip legco_reranker_amd64.tar
# Transfer to server
scp legco_reranker_amd64.tar.gz user@server:/path/
# On the x86_64 server:
gunzip legco_reranker_amd64.tar.gz
docker load -i legco_reranker_amd64.tar
# Run
docker run -d --name legco -p 80:8000 --env-file backend/.env \
-v chroma_data:/app/chroma_db \
-v chunk_data:/app/document_chunk \
-v sqlite_data:/app/data \
legco_reranker:amd64
5. Test run (local, port 8888)
Before transferring the image to the server (step 4), test the amd64 image locally. Pass all config inline (no --env-file):
docker run -d --name legco_test -p 8888:8000 \
-e LLM_BASE_URL=https://openrouter.ai/api/v1 \
-e LLM_API_KEY=your_key_here \
-e LLM_MODEL_NAME=qwen/qwen3.6-35b-a3b \
-e LLM_TIMEOUT=60.0 \
-e LLM_ENABLE_THINKING=false \
-e VLLM_ENGINE=false \
-e EMBEDDING_MODEL=qwen/qwen3-embedding-4b \
-e EMBEDDING_BASE_URL=https://openrouter.ai/api/v1 \
-e EMBEDDING_API_KEY=your_key_here \
-e CHROMA_DB_PATH=./chroma_db \
-e CHUNK_SIZE=1000 \
-e CHUNK_OVERLAP=200 \
-e RETRIEVAL_N_RESULTS=10 \
-e RELEVANCE_THRESHOLD=7.0 \
-e PROMPTS_DB_PATH=./data/prompts.db \
-e HISTORY_DB_PATH=./data/history.db \
-e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
-e DASHSCOPE_API_KEY=your_dashscope_key \
-e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
-e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
-e VIDEO_UPLOAD_DIR=./uploads \
-e MAX_VIDEO_SIZE_MB=300 \
-v ~/woody/legco/data/chroma_db:/app/chroma_db \
-v ~/woody/legco/data/document_chunk:/app/document_chunk \
-v ~/woody/legco/data/data:/app/data \
legco_reranker:amd64
# Verify
curl http://localhost:8888/health
# Clean up
docker rm -f legco_test
Architecture
User → Nginx (80) → Uvicorn (8000)
├── FastAPI API (/api/v1/*)
└── Static Frontend (/*)
└── React 18 + Vite + Tailwind
RAG Pipeline (Per-Sub-Question)
User Question
→ [LLM] Decompose into 2-5 sub-questions
→ [ChromaDB] Retrieve 10 chunks per sub-question
→ [LLM] Score all chunks against their own sub-question (single call)
→ [LLM] Generate markdown response per sub-question
→ SSE stream with per-sub-question sources
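The flow above can be sketched in a few lines. This is an illustration under stated assumptions, not the backend's code: the LLM and ChromaDB steps are passed in as plain callables, and the names `Chunk`, `run_pipeline`, and `to_sse`, as well as the event fields, are hypothetical. The defaults mirror RETRIEVAL_N_RESULTS and RELEVANCE_THRESHOLD.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class Chunk:
    text: str
    source: str  # e.g. a file name and page, used as the citation


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str, int], List[Chunk]],
    score_all: Callable[[Dict[str, List[Chunk]]], Dict[str, List[float]]],
    generate: Callable[[str, List[Chunk]], str],
    n_results: int = 10,
    threshold: float = 7.0,
) -> Iterator[dict]:
    """Yield one event per sub-question with its answer and sources."""
    # Step 1: decompose; cap at five sub-questions as in the diagram.
    sub_questions = decompose(question)[:5]
    # Step 2: retrieve candidates for each sub-question separately.
    # (Assumes sub-questions are distinct, since they are used as dict keys.)
    candidates = {sq: retrieve(sq, n_results) for sq in sub_questions}
    # Step 3: one scoring call covers all chunks, each judged against
    # the sub-question it was retrieved for.
    scores = score_all(candidates)
    for sq in sub_questions:
        kept = [c for c, s in zip(candidates[sq], scores[sq]) if s >= threshold]
        # Step 4: generate a markdown answer from the chunks that passed.
        yield {
            "sub_question": sq,
            "answer": generate(sq, kept),
            "sources": [c.source for c in kept],
        }


def to_sse(event: dict) -> str:
    """Serialize one event as a Server-Sent Events data frame."""
    return f"data: {json.dumps(event, ensure_ascii=False)}\n\n"
```

Because each event is yielded as soon as its sub-question is answered, a streaming HTTP response can forward `to_sse(event)` frames to the client one at a time.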
Video Q&A (Phase 2)
Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline
Streaming Mode (real-time):
- Upload video → press play → transcript flows into QueryInput in real time
- Audio captured from video element (no microphone needed)
- Auto-starts on play, stops on pause/end
Full Transcript Mode (batch):
- Click "Full Transcript" button under video player
- Server extracts audio via ffmpeg → Full DashScope transcription
- Complete transcript fills QueryInput
Requirements:
- DASHSCOPE_API_KEY in .env
- ffmpeg on server (for batch transcription)
- dashscope Python package (in requirements.txt)
Installing ffmpeg
# Ubuntu/Debian
sudo apt install ffmpeg
# macOS
brew install ffmpeg
# Static build (no root, any Linux)
mkdir -p ~/.local/bin
wget -qO- https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz | tar -xJ -C /tmp
cp /tmp/ffmpeg-*-static/ffmpeg ~/.local/bin/
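With ffmpeg installed, the audio extraction step of Full Transcript Mode could look roughly like the sketch below. The 16 kHz mono output is an assumption (a common choice for speech recognition input), not a documented requirement of this app, and `build_extract_command` and `extract_audio` are hypothetical helper names.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_extract_command(video: Path, audio: Path, sample_rate: int = 16000) -> List[str]:
    """Build the ffmpeg argument list that strips video and keeps mono audio."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(video),          # input video
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to one channel
        "-ar", str(sample_rate),   # resample (assumed 16 kHz)
        str(audio),                # output format follows the file extension
    ]


def extract_audio(video: Path, audio: Path) -> Path:
    """Run ffmpeg and return the path of the extracted audio file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_extract_command(video, audio), check=True, capture_output=True)
    return audio
```

Building the argument list separately from running it keeps the command inspectable without ffmpeg being present, and passing a list (not a shell string) avoids quoting problems with uploaded file names.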
Notes
- PDF upload limit: 300MB
- Video upload limit: 300MB (same as PDF)
- ffmpeg required on server (for video transcription)
- DashScope ASR supports Cantonese (yue), Mandarin (zh), English (en), and auto-detect
- Desktop only (not mobile-optimized)
- No authentication (public demo)
- All LLM calls routed through configurable base URL
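The video upload limits listed above could be enforced with a check along these lines. This is a sketch only: it assumes 1 MB means 1024 × 1024 bytes and that extensions are matched case-insensitively, and `validate_video_upload` is a hypothetical name.

```python
from pathlib import Path
from typing import AbstractSet

# Defaults taken from SUPPORTED_VIDEO_FORMATS and MAX_VIDEO_SIZE_MB.
SUPPORTED_VIDEO_FORMATS = frozenset({".mp4", ".webm", ".mov", ".avi", ".mkv"})
MAX_VIDEO_SIZE_MB = 300


def validate_video_upload(
    filename: str,
    size_bytes: int,
    max_mb: int = MAX_VIDEO_SIZE_MB,
    formats: AbstractSet[str] = SUPPORTED_VIDEO_FORMATS,
) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    ext = Path(filename).suffix.lower()
    if ext not in formats:
        raise ValueError(f"unsupported video format: {ext or '(no extension)'}")
    # Assumption: megabytes are counted in binary units.
    if size_bytes > max_mb * 1024 * 1024:
        raise ValueError(f"file exceeds {max_mb} MB limit")
```

Rejecting the file before it is written to VIDEO_UPLOAD_DIR avoids filling the uploads volume with data that will never be transcribed.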