LegCo Reranker
RAG-powered document Q&A app with video ASR. Upload PDFs and videos (transcribed with Cantonese ASR), ask questions, and get bullet-point answers with citations.
Quick Start (Dev)
```bash
# Backend (from the repo root)
cd backend
cp .env.example .env   # edit .env with your LLM API key AND DashScope API key (for video ASR)
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend (in a second terminal, from the repo root)
cd frontend
pnpm install
pnpm run dev
```
Backend → http://localhost:8000 | Frontend → http://localhost:5173
Deploy with Docker
Prerequisites
- Docker 24+ and Docker Compose v2
- OpenRouter API key (or a key for another compatible LLM provider)
- Alibaba Cloud DashScope API key (for video ASR transcription)
Setup
```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names

# 2. Build and start
docker compose up -d --build

# 3. Check health
curl http://localhost:8000/health
```
The app is served at http://localhost:8000 — both the API and the frontend UI.
Volumes
| Volume | Purpose |
|---|---|
| `chroma_data` | ChromaDB vector store (persistent) |
| `chunk_data` | Extracted PDF page files |
| `sqlite_data` | Prompt templates and query history |
| `uploads_data` | Uploaded video files (persistent) |
Environment Variables
All are configurable via `backend/.env`:

| Variable | Default | Description |
|---|---|---|
| `LLM_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
| `LLM_API_KEY` | (none) | API key for the LLM provider |
| `LLM_MODEL_NAME` | `qwen/qwen3.5-35b-a3b` | Chat model |
| `LLM_TIMEOUT` | `60.0` | LLM request timeout in seconds |
| `LLM_ENABLE_THINKING` | `false` | Enable LLM thinking/reasoning tokens |
| `VLLM_ENGINE` | `false` | Use vLLM-format `extra_body` instead of the OpenRouter format |
| `EMBEDDING_MODEL` | `qwen/qwen3-embedding-4b` | Embedding model |
| `EMBEDDING_BASE_URL` | `https://openrouter.ai/api/v1` | Embedding API endpoint |
| `EMBEDDING_API_KEY` | (none) | API key for embeddings (falls back to `LLM_API_KEY`) |
| `CHROMA_DB_PATH` | `./chroma_db` | ChromaDB persistent storage path |
| `CHUNK_SIZE` | `1000` | Chunk size in tokens |
| `CHUNK_OVERLAP` | `200` | Overlap between consecutive chunks in tokens |
| `RETRIEVAL_N_RESULTS` | `10` | Chunks retrieved per sub-question |
| `RELEVANCE_THRESHOLD` | `7.0` | Minimum relevance score (0-10) for a chunk to be kept |
| `PROMPTS_DB_PATH` | `./data/prompts.db` | SQLite database for prompt templates |
| `HISTORY_DB_PATH` | `./data/history.db` | SQLite database for query history |
| `CORS_ORIGINS` | `["http://localhost:5173","http://localhost:3000"]` | Allowed CORS origins (JSON list) |
| `DASHSCOPE_API_KEY` | (none) | Alibaba Cloud DashScope API key (for video ASR) |
| `ASR_MODEL_NAME` | `qwen3-asr-flash` | ASR model for batch transcription |
| `ASR_REALTIME_MODEL_NAME` | `qwen3-asr-flash-realtime` | ASR model for real-time streaming |
| `VIDEO_UPLOAD_DIR` | `./uploads` | Video file storage directory |
| `MAX_VIDEO_SIZE_MB` | `300` | Maximum video upload size in MB |
| `SUPPORTED_VIDEO_FORMATS` | `.mp4, .webm, .mov, .avi, .mkv` | Allowed video file extensions |
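A minimal sketch of how a few of these variables could be read with only the standard library, assuming the behaviour the table describes: `EMBEDDING_API_KEY` falls back to `LLM_API_KEY`, and `CORS_ORIGINS` holds a JSON list. The `Settings` class, `load_settings`, and the accepted boolean spellings are hypothetical; the real backend may use a different settings library.

```python
import json
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Hypothetical subset of the variables in the table above.
    llm_base_url: str
    llm_api_key: str
    embedding_api_key: str
    cors_origins: list[str]
    chunk_size: int
    chunk_overlap: int
    relevance_threshold: float
    llm_enable_thinking: bool


def _as_bool(value: str) -> bool:
    # Assumption: "true", "1", or "yes" (any case) mean True; anything else is False.
    return value.strip().lower() in {"true", "1", "yes"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    llm_api_key = env.get("LLM_API_KEY", "")
    return Settings(
        llm_base_url=env.get("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        llm_api_key=llm_api_key,
        # Documented fallback: embeddings reuse the LLM key when unset or empty.
        embedding_api_key=env.get("EMBEDDING_API_KEY") or llm_api_key,
        # CORS_ORIGINS is a JSON array string, e.g. '["http://localhost:5173"]'.
        cors_origins=json.loads(
            env.get("CORS_ORIGINS", '["http://localhost:5173","http://localhost:3000"]')
        ),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "7.0")),
        llm_enable_thinking=_as_bool(env.get("LLM_ENABLE_THINKING", "false")),
    )
```

Passing an explicit mapping, e.g. `load_settings({"LLM_API_KEY": "sk-test"})`, makes the fallback easy to check without touching the real environment.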
Production: Nginx Reverse Proxy
```bash
# Include nginx.conf in your site config.
# Key settings:
#   - client_max_body_size 350M  (allows large PDF and video uploads)
#   - proxy_read_timeout 300s    (LLM calls can take minutes)

# Install nginx
sudo apt install nginx

# Copy the config and enable the site
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
```
Stopping
```bash
docker compose down
```
Updating
```bash
git pull
docker compose up -d --build
```
Cross-Platform Build (aarch64 → amd64)
When building on an aarch64/ARM64 machine (Apple Silicon, ARM Windows WSL2, Raspberry Pi) for deployment to an x86_64/amd64 server:
1. Install buildx
```bash
# Download buildx for arm64 into the Docker CLI plugin directory
mkdir -p ~/.docker/cli-plugins
BUILDX_VERSION=$(wget -qO- https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | head -1 | cut -d'"' -f4)
wget "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-arm64" -O ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx
```
2. Register QEMU for amd64 emulation
```bash
docker run --privileged --rm tonistiigi/binfmt --install all
```
3. Build for amd64
```bash
DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t legco_reranker:amd64 .
```
4. Export and transfer to server
```bash
# Save the image to a tar file
docker save legco_reranker:amd64 -o legco_reranker_amd64.tar

# Compress (~762MB → ~250MB)
gzip legco_reranker_amd64.tar

# Transfer to the server
scp legco_reranker_amd64.tar.gz user@server:/path/

# On the x86_64 server:
gunzip legco_reranker_amd64.tar.gz
docker load -i legco_reranker_amd64.tar

# Run
docker run -d --name legco -p 80:80 -p 443:443 --env-file backend/.env \
  -v chroma_data:/app/chroma_db \
  -v chunk_data:/app/document_chunk \
  -v sqlite_data:/app/data \
  legco_reranker:amd64
```
5. Test run (local, port 8888)
Before transferring to the server, test the amd64 image locally. Pass all config inline (no --env-file):
```bash
docker run -d --name legco_test -p 8888:443 \
  -e LLM_BASE_URL=https://openrouter.ai/api/v1 \
  -e LLM_API_KEY=your_key_here \
  -e LLM_MODEL_NAME=qwen/qwen3.6-35b-a3b \
  -e LLM_TIMEOUT=60.0 \
  -e LLM_ENABLE_THINKING=false \
  -e VLLM_ENGINE=false \
  -e EMBEDDING_MODEL=qwen/qwen3-embedding-4b \
  -e EMBEDDING_BASE_URL=https://openrouter.ai/api/v1 \
  -e EMBEDDING_API_KEY=your_key_here \
  -e CHROMA_DB_PATH=./chroma_db \
  -e CHUNK_SIZE=1000 \
  -e CHUNK_OVERLAP=200 \
  -e RETRIEVAL_N_RESULTS=10 \
  -e RELEVANCE_THRESHOLD=7.0 \
  -e PROMPTS_DB_PATH=./data/prompts.db \
  -e HISTORY_DB_PATH=./data/history.db \
  -e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
  -e DASHSCOPE_API_KEY=your_dashscope_key \
  -e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
  -e VIDEO_UPLOAD_DIR=./uploads \
  -e MAX_VIDEO_SIZE_MB=300 \
  -v ~/woody/legco/data/chroma_db:/app/chroma_db \
  -v ~/woody/legco/data/document_chunk:/app/document_chunk \
  -v ~/woody/legco/data/data:/app/data \
  legco_reranker:amd64

# Verify (accept the self-signed cert with -k)
curl -k https://localhost:8888/health

# Clean up
docker rm -f legco_test
```
Architecture
```
User → Nginx (80) → Uvicorn (8000)
                      ├── FastAPI API (/api/v1/*)
                      └── Static Frontend (/*)
                            └── React 18 + Vite + Tailwind
```
RAG Pipeline (Per-Sub-Question)
```
User Question
  → [LLM] Decompose into 2-5 sub-questions
  → [ChromaDB] Retrieve 10 chunks per sub-question
  → [LLM] Score all chunks against their own sub-question (single call)
  → [LLM] Generate markdown response per sub-question
  → SSE stream with per-sub-question sources
```
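The flow above can be sketched as a generator that yields one Server-Sent Events message per sub-question. This is an illustration under stated assumptions, not the backend's code: the four stages are passed in as plain callables (stand-ins for the LLM and ChromaDB calls), chunks scoring at or above the threshold are kept, and the event field names (`sub_question_index`, `answer`, `sources`) are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Chunk:
    text: str
    source: str  # e.g. a PDF filename
    page: int


def sse_event(data: dict) -> str:
    # One SSE message: a "data:" line followed by a blank line.
    return f"data: {json.dumps(data, ensure_ascii=False)}\n\n"


def run_pipeline(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str, int], list[Chunk]],
    score_all: Callable[[dict[str, list[Chunk]]], dict[str, list[float]]],
    generate: Callable[[str, list[Chunk]], str],
    n_results: int = 10,     # RETRIEVAL_N_RESULTS
    threshold: float = 7.0,  # RELEVANCE_THRESHOLD
) -> Iterator[str]:
    # 1. Decompose the question into sub-questions.
    sub_questions = decompose(question)
    # 2. Retrieve n_results chunks per sub-question.
    retrieved = {sq: retrieve(sq, n_results) for sq in sub_questions}
    # 3. A single scoring call covers every chunk against its own sub-question.
    scores = score_all(retrieved)
    # 4. Filter, generate, and stream each sub-question's answer with its sources.
    for index, sq in enumerate(sub_questions):
        kept = [c for c, s in zip(retrieved[sq], scores[sq]) if s >= threshold]
        yield sse_event({
            "sub_question_index": index,
            "sub_question": sq,
            "answer": generate(sq, kept),
            "sources": [{"source": c.source, "page": c.page} for c in kept],
        })
```

Injecting the stages as callables keeps the control flow testable with stubs, and streaming per sub-question lets the UI show the first answer before the last one is generated.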
Video Q&A (Phase 2)
Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline
Streaming Mode (real-time):
- Upload video → press play → transcript flows into QueryInput in real time
- Audio captured from video element (no microphone needed)
- Auto-starts on play, stops on pause/end
Full Transcript Mode (batch):
- Click "Full Transcript" button under video player
- The server extracts the audio with ffmpeg and sends it to DashScope for full transcription
- Complete transcript fills QueryInput
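The batch step's audio extraction might look like the sketch below. The exact ffmpeg flags the backend uses are not documented here; 16 kHz mono 16-bit PCM WAV is an assumed ASR-friendly target, and `build_ffmpeg_cmd` / `extract_audio` are hypothetical names.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video: Path, audio_out: Path, sample_rate: int = 16000) -> list[str]:
    # -vn drops the video stream, -ac 1 downmixes to mono, -ar resamples.
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", str(video),
        "-vn",
        "-ac", "1",
        "-ar", str(sample_rate),
        "-c:a", "pcm_s16le",     # 16-bit PCM, written to a .wav container
        str(audio_out),
    ]


def extract_audio(video: Path, audio_out: Path) -> Path:
    # Fail early with a clear message if ffmpeg is not installed on the server.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(video, audio_out), check=True, capture_output=True)
    return audio_out
```

Building the argument list separately from running it makes the command easy to inspect or log; passing a list (not a shell string) also avoids quoting problems with unusual filenames.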
Requirements:
- `DASHSCOPE_API_KEY` in `.env`
- `ffmpeg` on the server (for batch transcription)
- `dashscope` Python package (in `requirements.txt`)
System Audio Capture & Listen Mic (Phase 4)
Two additional live audio sources are available alongside video upload:
System Audio Capture
Captures audio output from other applications on your computer (e.g. a browser tab, Spotify, Zoom) via getDisplayMedia(). What can be captured depends on the browser and OS (see Browser Compatibility).
How to use:
- Select the "System Audio" tab in the LTTPage source selector
- Click "Start Capture"
- Choose a browser tab or window in the permission dialog — make sure "Share audio" is checked
- Real-time Cantonese ASR transcription flows into the QueryInput
- Edit the transcript while capturing continues, then submit your query
Use cases: Transcribing YouTube videos, podcasts, lectures, or meetings playing on your computer without downloading files.
Listen Mic
Captures microphone input via getUserMedia().
How to use:
- Select the "Listen Mic" tab
- Click "Start Listening"
- Allow microphone access when prompted
- Speak — real-time transcription flows into QueryInput
- Edit transcript while listening, then submit your query
Use cases: Recording live meetings, dictating questions verbally, transcribing spoken Cantonese in real time.
Browser Compatibility
System Audio (getDisplayMedia):
| Platform / Browser | Tab Audio | System Audio | Supported |
|---|---|---|---|
| Chrome/Edge (Windows) | ✅ | ✅ | Full support |
| Chrome/Edge (macOS 14.2+) | ✅ | ✅ | Full support |
| Chrome/Edge (Linux) | ✅ | ❌ | Tab audio only |
| Firefox | ❌ | ❌ | Not supported |
| Safari | ❌ | ❌ | Not supported |
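Listen Mic (getUserMedia): Supported in all modern browsers (Chrome, Firefox, Safari, Edge).

Both getUserMedia() and getDisplayMedia() are only available in a secure context (HTTPS or localhost), so a remote deployment must be served over HTTPS for these sources to work.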
Limitations
- System Audio capture requires Chrome or Edge (Chromium-based browsers)
- No "Full Transcript" button — streaming ASR only (no batch transcription for live sources)
- getDisplayMedia() always shows a screen/tab picker, even for audio-only capture (browser limitation)
- Each capture session generates a new UUID; the WebSocket reconnects on every Start/Stop
Configuration
```bash
# In backend/.env: feature toggles (default: true)
SYSTEM_AUDIO_ENABLED=true
MIC_ENABLED=true
```
Installing ffmpeg
```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Static build (no root, x86_64 Linux); make sure ~/.local/bin is on your PATH
mkdir -p ~/.local/bin
wget -qO- https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz | tar -xJ -C /tmp
cp /tmp/ffmpeg-*-static/ffmpeg ~/.local/bin/
```
Accuracy Testing API (Package 9)
Backend endpoints for generating test results and evaluating RAG pipeline accuracy. Designed for programmatic use — call via curl, Python requests, or any HTTP client.
All endpoints are at /api/v1/test/* and accessible through the same domain as the frontend (nginx proxies all paths to FastAPI).
1. Generate Test Result (Text)
Run the full RAG pipeline on a text question and capture every intermediate stage.
```bash
# Basic text generation
curl http://localhost:8000/api/v1/test/generate/text \
  -H "Content-Type: application/json" \
  -d '{
    "question": "立法會今日討論咗咩議題?",
    "profile": "A",
    "label": "Test run 2026-05-25"
  }'

# Response includes:
# - extracted_key_questions: decomposed sub-questions
# - retrieval: per-sub-question chunks with metadata and distance scores
# - filtered: chunks after relevance filter (with relevance_score)
# - response: final answer with source citations
# - timing: per-stage timing in ms
```
Response (partial):
```json
{
  "result_id": "a1b2c3d4e5f6",
  "input_type": "text",
  "profile": "A",
  "label": "Test run 2026-05-25",
  "input": { "text": "立法會今日討論咗咩議題?" },
  "extracted_key_questions": [
    "立法會:今日討論的主要議題",
    "會議記錄:近期立法會會議的討論內容"
  ],
  "retrieval": {
    "total_chunks_retrieved": 20,
    "retriever_time_ms": 456
  },
  "filtered": {
    "total_chunks_filtered": 14,
    "filter_time_ms": 789
  },
  "response": {
    "final_answer": "## Sub-question 0: ...\n\n- 今日立法會討論了三項主要議題... [meeting_minutes.pdf, page 1]",
    "generate_time_ms": 1011
  },
  "timing": {
    "decomposer_time_ms": 234,
    "total_time_ms": 2490
  }
}
```
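The same call can be made from Python with the standard library's `urllib.request`. A minimal sketch, assuming `label` is optional (the audio example below omits it); `build_generate_text_request` and `generate_text` are hypothetical helper names, and the 300 s timeout simply mirrors the nginx `proxy_read_timeout`.

```python
import json
import urllib.request
from typing import Optional


def build_generate_text_request(
    base_url: str, question: str, profile: str = "A", label: Optional[str] = None
) -> urllib.request.Request:
    body = {"question": question, "profile": profile}
    if label is not None:
        body["label"] = label
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/test/generate/text",
        # ensure_ascii=False keeps Cantonese text readable; send it as UTF-8 bytes.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(base_url: str, question: str, **kwargs) -> dict:
    # Requires a running backend; returns the decoded JSON result.
    req = build_generate_text_request(base_url, question, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)
```

For example, `generate_text("http://localhost:8000", "立法會今日討論咗咩議題?", label="Test run")["result_id"]` would return the ID to pass to the evaluation endpoint.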
2. Generate Test Result (Audio)
Transcribe audio via ASR, then run the RAG pipeline on the transcribed text. Optionally provide a reference transcript for later CER/WER evaluation.
```bash
# Audio generation with reference transcript (for later CER/WER scoring)
curl http://localhost:8000/api/v1/test/generate/audio \
  -F "audio_file=@legco_clip.wav" \
  -F "profile=A" \
  -F "reference_transcript=立法會今日討論咗咩議題?" \
  -F "language=yue" \
  -F "label=Cantonese LegCo audio test"

# Without reference transcript (CER/WER will return N/A in evaluation)
curl http://localhost:8000/api/v1/test/generate/audio \
  -F "audio_file=@meeting.mp3" \
  -F "profile=B" \
  -F "language=yue"
```
Compared to the text endpoint, the audio result includes extra fields:
```json
{
  "input_type": "audio",
  "input": {
    "text": "立法會今日討論咗咩議題?",
    "reference_transcript": "立法會今日討論咗咩議題?",
    "audio_filename": "legco_clip.wav",
    "audio_duration_seconds": 45.2,
    "asr_language": "yue"
  },
  "timing": {
    "asr_time_ms": 1234,
    "total_time_ms": 3724
  }
}
```
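In both sample responses, `total_time_ms` equals the sum of the stage timings: 234 + 456 + 789 + 1011 = 2490 ms for text, and adding `asr_time_ms` of 1234 gives 3724 ms for audio. Whether this holds for every run is not documented, so the sketch below (hypothetical helper names) reports any remainder instead of assuming zero. Field locations are taken from the samples above.

```python
# (stage name, (section, key)) pairs matching the sample response layout.
STAGE_KEYS = [
    ("asr", ("timing", "asr_time_ms")),
    ("decompose", ("timing", "decomposer_time_ms")),
    ("retrieve", ("retrieval", "retriever_time_ms")),
    ("filter", ("filtered", "filter_time_ms")),
    ("generate", ("response", "generate_time_ms")),
]


def stage_breakdown(result: dict) -> dict[str, int]:
    stages = {}
    for name, (section, key) in STAGE_KEYS:
        value = result.get(section, {}).get(key)
        if value is not None:  # asr_time_ms exists only for audio inputs
            stages[name] = value
    return stages


def unaccounted_ms(result: dict) -> int:
    # Time in total_time_ms not attributed to any listed stage.
    return result["timing"]["total_time_ms"] - sum(stage_breakdown(result).values())
```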
3. Evaluate Test Result
Run all four evaluation dimensions on a previously generated result:
- (i) Audio transcription accuracy: CER/WER (only for audio inputs with a reference transcript)
- (ii) Key questions quality: two evaluator LLMs score the extracted sub-questions against a 4-dimension rubric, and their scores are averaged
- (iii) Chunk accuracy: an LLM determines the ground-truth chunks, then precision/recall/F1 are computed for the retrieved chunks
- (iv) Response completeness: an ideal response is generated from the ground-truth chunks and compared with the pipeline's answer
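Dimension (i) uses the standard definitions: CER is the character-level edit distance (substitutions + deletions + insertions) divided by the reference length, and WER is the same at word level. The sketch below is illustrative; the backend's normalisation (punctuation, whitespace) is not documented, so stripping whitespace for CER and splitting on whitespace for WER are assumptions. Cantonese text has no spaces, so a real WER would need a word segmenter first.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Levenshtein distance, computed one dynamic-programming row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free when equal)
            )
        prev = curr
    return prev[-1]


def _rate(ref: list[str], hyp: list[str]) -> float:
    if not ref:
        raise ValueError("reference transcript is empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(reference: str, hypothesis: str) -> float:
    # Character error rate, ignoring whitespace.
    return _rate([c for c in reference if not c.isspace()],
                 [c for c in hypothesis if not c.isspace()])


def wer(reference: str, hypothesis: str) -> float:
    # Word error rate over whitespace-separated tokens.
    return _rate(reference.split(), hypothesis.split())
```

For example, dropping one character from a four-character reference gives `cer("立法會議", "立法議") == 0.25`.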
# Evaluate a previously saved result
```bash
curl http://localhost:8000/api/v1/test/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "result_id": "a1b2c3d4e5f6",
    "evaluation_config": {
      "key_questions_evaluators": [
        {
          "model_name": "deepseek-v4-pro",
          "base_url": "https://api.deepseek.com",
          "api_key_env": "DP_API_KEY",
          "enable_thinking": true
        },
        {
          "model_name": "qwen3-7b-max",
          "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
          "api_key_env": "DASHSCOPE_API_KEY",
          "enable_thinking": true
        }
      ],
      "chunk_evaluator": {
        "model_name": "qwen/qwen3.6-35b-a3b",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key_env": "LLM_API_KEY",
        "enable_thinking": true
      },
      "response_evaluator": {
        "model_name": "qwen/qwen3.6-35b-a3b",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key_env": "LLM_API_KEY",
        "enable_thinking": true
      }
    }
  }'
```
Response (partial — shows scoring structure):
```json
{
  "evaluation_id": "eval-abc123",
  "result_id": "a1b2c3d4e5f6",
  "status": "completed",
  "audio_evaluation": {
    "status": "completed",
    "cer": 0.052,
    "wer": 0.083
  },
  "key_questions_evaluation": {
    "average_scores": {
      "dimension_1_準確性": 36.0,
      "dimension_2_完整性": 22.5,
      "dimension_3_清晰度": 17.5,
      "dimension_4_簡潔性": 13.5
    },
    "average_total": 89.5
  },
  "chunk_evaluation": {
    "overall_unfiltered": { "avg_precision": 0.60, "avg_recall": 1.00, "avg_f1": 0.75 },
    "overall_filtered": { "avg_precision": 1.00, "avg_recall": 1.00, "avg_f1": 1.00 }
  },
  "response_evaluation": {
    "overall_completeness": 0.85,
    "overall_factual_accuracy": 0.92
  }
}
```
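A minimal sketch of the chunk metrics, assuming chunks are compared by ID (the string IDs in the usage are hypothetical) and that the `avg_*` fields are plain means over sub-questions. The sample numbers fit this reading: if 6 ground-truth chunks are among 10 retrieved, precision is 0.60, recall 1.00, and F1 = 2 × 0.6 × 1.0 / 1.6 = 0.75; a filter that keeps exactly those 6 scores 1.00 on all three.

```python
def precision_recall_f1(retrieved: set[str], ground_truth: set[str]) -> tuple[float, float, float]:
    hits = len(retrieved & ground_truth)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def average_metrics(per_sub_question: list[tuple[float, float, float]]) -> dict[str, float]:
    # Assumption: each avg_* field is the mean over sub-questions.
    n = len(per_sub_question)
    p, r, f = (sum(column) / n for column in zip(*per_sub_question))
    return {"avg_precision": p, "avg_recall": r, "avg_f1": f}
```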
4. Manage Results & Evaluations
```bash
# Quote URLs that contain ? or &: an unquoted & backgrounds curl in the shell.

# List all saved test results
curl "http://localhost:8000/api/v1/test/results?limit=10&offset=0"

# Retrieve a specific result
curl http://localhost:8000/api/v1/test/results/a1b2c3d4e5f6

# Delete a result
curl -X DELETE http://localhost:8000/api/v1/test/results/a1b2c3d4e5f6

# List all evaluation results
curl "http://localhost:8000/api/v1/test/evaluations?limit=10"

# Retrieve a specific evaluation
curl http://localhost:8000/api/v1/test/evaluations/eval-abc123

# Delete an evaluation
curl -X DELETE http://localhost:8000/api/v1/test/evaluations/eval-abc123
```
Full Workflow Example
```bash
# 1. Generate a test result
RESULT=$(curl -s http://localhost:8000/api/v1/test/generate/text \
  -H "Content-Type: application/json" \
  -d '{"question": "立法會討論咗咩房屋政策?", "profile": "A", "label": "housing policy test"}')
RESULT_ID=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['result_id'])")
echo "Generated result: $RESULT_ID"

# 2. Evaluate that result
curl -s http://localhost:8000/api/v1/test/evaluate \
  -H "Content-Type: application/json" \
  -d "{
    \"result_id\": \"$RESULT_ID\",
    \"evaluation_config\": {
      \"key_questions_evaluators\": [
        {\"model_name\": \"deepseek-v4-pro\", \"base_url\": \"https://api.deepseek.com\", \"api_key_env\": \"DP_API_KEY\", \"enable_thinking\": true},
        {\"model_name\": \"qwen3-7b-max\", \"base_url\": \"https://dashscope.aliyuncs.com/compatible-mode/v1\", \"api_key_env\": \"DASHSCOPE_API_KEY\", \"enable_thinking\": true}
      ],
      \"chunk_evaluator\": {\"model_name\": \"qwen/qwen3.6-35b-a3b\", \"base_url\": \"https://openrouter.ai/api/v1\", \"api_key_env\": \"LLM_API_KEY\", \"enable_thinking\": true},
      \"response_evaluator\": {\"model_name\": \"qwen/qwen3.6-35b-a3b\", \"base_url\": \"https://openrouter.ai/api/v1\", \"api_key_env\": \"LLM_API_KEY\", \"enable_thinking\": true}
    }
  }" | python3 -m json.tool

# 3. Check results
curl -s "http://localhost:8000/api/v1/test/results?limit=5" | python3 -m json.tool
curl -s "http://localhost:8000/api/v1/test/evaluations?limit=5" | python3 -m json.tool
```
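Hand-escaping JSON inside a double-quoted shell string is error-prone. A Python sketch of the same two-step workflow builds the evaluation body with `json.dumps`, so no escaping is needed. The model names, URLs, and `api_key_env` values are copied from the example above; `BASE_URL`, `evaluator`, `build_evaluation_payload`, `post_json`, and `run_workflow` are hypothetical names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust for your deployment
OPENROUTER = "https://openrouter.ai/api/v1"


def evaluator(model_name: str, base_url: str, api_key_env: str) -> dict:
    # Every evaluator in the example runs with thinking mode enabled.
    return {"model_name": model_name, "base_url": base_url,
            "api_key_env": api_key_env, "enable_thinking": True}


def build_evaluation_payload(result_id: str) -> dict:
    return {
        "result_id": result_id,
        "evaluation_config": {
            "key_questions_evaluators": [
                evaluator("deepseek-v4-pro", "https://api.deepseek.com", "DP_API_KEY"),
                evaluator("qwen3-7b-max",
                          "https://dashscope.aliyuncs.com/compatible-mode/v1",
                          "DASHSCOPE_API_KEY"),
            ],
            "chunk_evaluator": evaluator("qwen/qwen3.6-35b-a3b", OPENROUTER, "LLM_API_KEY"),
            "response_evaluator": evaluator("qwen/qwen3.6-35b-a3b", OPENROUTER, "LLM_API_KEY"),
        },
    }


def post_json(path: str, body: dict, timeout: float = 300.0) -> dict:
    # json.dumps handles all quoting of the request body.
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def run_workflow(question: str) -> dict:
    # Requires a running backend: generate a result, then evaluate it.
    result = post_json("/api/v1/test/generate/text",
                       {"question": question, "profile": "A", "label": "housing policy test"})
    return post_json("/api/v1/test/evaluate", build_evaluation_payload(result["result_id"]))
```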
Key Questions Marking Scheme (4 Dimensions)
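| Dimension | Weight | Full-marks criterion |
|---|---|---|
| 1. 準確性 (Fidelity) | 40 pts | Fully faithful to the original meaning; numbers and keywords are correct |
| 2. 完整性 (Completeness) | 25 pts | Covers all key elements (question + background + purpose) |
| 3. 清晰度 (Clarity) | 20 pts | Precise wording, clear logic, easy to read and understand |
| 4. 簡潔性 (Conciseness) | 15 pts | Expresses the complete meaning in the fewest words |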
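The weights sum to 100, and in the sample evaluation response the averaged dimension scores add up to the total: 36.0 + 22.5 + 17.5 + 13.5 = 89.5. A minimal sketch of that aggregation, assuming each evaluator returns one score per dimension (keyed as in the sample response), scores are averaged per dimension, and the total is the sum of those averages. The evaluator scores in the usage are made up for illustration.

```python
MAX_POINTS = {
    "dimension_1_準確性": 40,
    "dimension_2_完整性": 25,
    "dimension_3_清晰度": 20,
    "dimension_4_簡潔性": 15,
}


def average_rubric(evaluator_scores: list[dict[str, float]]) -> dict:
    # Reject scores outside each dimension's 0..max range.
    for scores in evaluator_scores:
        for dim, value in scores.items():
            if not 0 <= value <= MAX_POINTS[dim]:
                raise ValueError(f"{dim}={value} outside 0..{MAX_POINTS[dim]}")
    n = len(evaluator_scores)
    averages = {dim: sum(s[dim] for s in evaluator_scores) / n for dim in MAX_POINTS}
    return {"average_scores": averages, "average_total": sum(averages.values())}
```

Two hypothetical evaluators scoring 38/23/18/14 and 34/22/17/13 would reproduce the sample's averages and its 89.5 total.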
Requirements for Evaluation
- All evaluation prompts, marking schemes, and LLM interactions are in Chinese
- Both key-question evaluator models must succeed (3 retries each); partial scores are not produced
- Chunk evaluation processes ALL chunks from ALL documents in batches of 10
- Thinking mode (`enable_thinking: true`) is enabled on all evaluation models
- Stored results and evaluations are not auto-deleted; manage them via the DELETE endpoints
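The batching and retry rules above can be sketched as two small helpers. This is an illustration, not the backend's code: the docs do not say whether "3 retries" means 3 or 4 total attempts, so the sketch assumes one initial attempt plus 3 retries, and the linear backoff delay is also an assumption. Re-raising the last error mirrors the "no partial scores" rule: if one evaluator never succeeds, scoring stops.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batches(items: Sequence[T], size: int = 10) -> Iterator[Sequence[T]]:
    # Yield consecutive slices of at most `size` items.
    for start in range(0, len(items), size):
        yield items[start:start + size]


def with_retries(call: Callable[[], R], retries: int = 3, delay_s: float = 1.0) -> R:
    # One initial attempt plus `retries` more; the last error propagates.
    for attempt in range(retries + 1):
        try:
            return call()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay_s * (attempt + 1))  # simple linear backoff
    raise ValueError("retries must be >= 0")
```

With 23 chunks, `batches` yields groups of 10, 10, and 3.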
Notes
- PDF upload limit: 300MB
- Video upload limit: 300MB (same as PDF)
- ffmpeg required on server (for video transcription)
- DashScope ASR supports Cantonese (`yue`), Mandarin (`zh`), English (`en`), and auto-detection
- Desktop only (not mobile-optimized)
- No authentication (public demo)
- All LLM calls routed through configurable base URL