# LegCo Reranker

RAG-powered document Q&A app with video ASR. Upload PDFs, upload videos with Cantonese ASR transcription, ask questions, and get bullet-point answers with citations.

## Quick Start (Dev)

```bash
# Backend
cd backend
cp .env.example .env   # edit .env with your LLM API key AND DashScope API key (for video ASR)
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend (in a second terminal, from the repo root)
cd frontend
pnpm install
pnpm run dev
```

Backend → `http://localhost:8000` | Frontend → `http://localhost:5173`

## Deploy with Docker

### Prerequisites

- Docker 24+ and Docker Compose v2
- OpenRouter API key (or compatible LLM provider)
- Alibaba Cloud DashScope API key (for video ASR transcription)

### Setup

```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names

# 2. Build and start
docker compose up -d --build

# 3. Check health
curl http://localhost:8000/health
```

The app is served at `http://localhost:8000`, which hosts both the API and the frontend UI.
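When the deployment is scripted, the health check in step 3 can be retried until the container is ready. The sketch below is one way to do that in Python using only the standard library. `http_ok` and `wait_for_health` are hypothetical helper names, and treating an HTTP 200 from `/health` as "healthy" is an assumption, since the response body of that endpoint is not described here.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET to `url` answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_for_health(url, attempts=30, delay=2.0, probe=http_ok, sleep=time.sleep):
    """Poll `url` until `probe` succeeds.

    Returns the 1-based attempt number that succeeded, or 0 if every attempt failed.
    `probe` and `sleep` are injectable so the loop can be exercised without a server.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        sleep(delay)
    return 0


# Example (requires the stack to be running):
#   wait_for_health("http://localhost:8000/health")
```

A nonzero return value means the service answered within roughly `attempts * delay` seconds; a return of 0 is a cue to inspect `docker compose logs`.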

### Volumes

| Volume | Purpose |
|--------|---------|
| `chroma_data` | ChromaDB vector store (persistent) |
| `chunk_data` | Extracted PDF page files |
| `sqlite_data` | Prompt templates and query history |
| `uploads_data` | Uploaded video files (persistent) |

### Environment Variables

All configurable via `backend/.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
| `LLM_API_KEY` | (none) | API key for LLM provider |
| `LLM_MODEL_NAME` | `qwen/qwen3.5-35b-a3b` | Chat model |
| `LLM_TIMEOUT` | `60.0` | LLM request timeout in seconds |
| `LLM_ENABLE_THINKING` | `false` | Enable LLM thinking/reasoning tokens |
| `VLLM_ENGINE` | `false` | Use vLLM-format `extra_body` instead of OpenRouter |
| `EMBEDDING_MODEL` | `qwen/qwen3-embedding-4b` | Embedding model |
| `EMBEDDING_BASE_URL` | `https://openrouter.ai/api/v1` | Embedding API endpoint |
| `EMBEDDING_API_KEY` | (none) | API key for embeddings (falls back to `LLM_API_KEY`) |
| `CHROMA_DB_PATH` | `./chroma_db` | ChromaDB persistent storage |
| `CHUNK_SIZE` | `1000` | Token chunk size |
| `CHUNK_OVERLAP` | `200` | Token chunk overlap |
| `RETRIEVAL_N_RESULTS` | `10` | Chunks per sub-question |
| `RELEVANCE_THRESHOLD` | `7.0` | Min relevance score (0-10) |
| `PROMPTS_DB_PATH` | `./data/prompts.db` | Prompt templates SQLite |
| `HISTORY_DB_PATH` | `./data/history.db` | Query history SQLite |
| `CORS_ORIGINS` | `["http://localhost:5173","http://localhost:3000"]` | Allowed CORS origins |
| `DASHSCOPE_API_KEY` | (none) | Alibaba Cloud DashScope API key (for video ASR) |
| `ASR_MODEL_NAME` | `qwen3-asr-flash` | ASR model for batch transcription |
| `ASR_REALTIME_MODEL_NAME` | `qwen3-asr-flash-realtime` | ASR model for real-time streaming |
| `VIDEO_UPLOAD_DIR` | `./uploads` | Video file storage directory |
| `MAX_VIDEO_SIZE_MB` | `300` | Maximum video upload size |
| `SUPPORTED_VIDEO_FORMATS` | `.mp4, .webm, .mov, .avi, .mkv` | Allowed video file extensions |

### Production: Nginx Reverse Proxy

```nginx
# Include nginx.conf in your site config
# Key settings:
#   - client_max_body_size 350M   (allow large PDF and video uploads)
#   - proxy_read_timeout 300s     (LLM calls can take minutes)
```

```bash
# Install nginx
sudo apt install nginx

# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
```

### Stopping

```bash
docker compose down
```

### Updating

```bash
git pull
docker compose up -d --build
```

### Cross-Platform Build (aarch64 → amd64)

When building on an aarch64/ARM64 machine (Apple Silicon, ARM Windows WSL2, Raspberry Pi) for deployment to an x86_64/amd64 server:

#### 1. Install buildx

```bash
# Download buildx for arm64
BUILDX_VERSION=$(wget -qO- https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | head -1 | cut -d'"' -f4)
mkdir -p ~/.docker/cli-plugins   # the plugin directory may not exist yet
wget "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-arm64" -O ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx
```

#### 2. Register QEMU for amd64 emulation

```bash
docker run --privileged --rm tonistiigi/binfmt --install all
```

#### 3. Build for amd64

```bash
DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t legco_reranker:amd64 .
```

#### 4. Export and transfer to server

```bash
# Save image to tar file
docker save legco_reranker:amd64 -o legco_reranker_amd64.tar

# Compress (~762MB → ~250MB)
gzip legco_reranker_amd64.tar

# Transfer to server
scp legco_reranker_amd64.tar.gz user@server:/path/

# On the x86_64 server:
gunzip legco_reranker_amd64.tar.gz
docker load -i legco_reranker_amd64.tar

# Run
docker run -d --name legco -p 80:80 -p 443:443 --env-file backend/.env \
  -v chroma_data:/app/chroma_db \
  -v chunk_data:/app/document_chunk \
  -v sqlite_data:/app/data \
  legco_reranker:amd64
```

#### 5. Test run (local, port 8888)

Before transferring to the server, test the amd64 image locally. Pass all config inline (no `--env-file`):

```bash
docker run -d --name legco_test -p 8888:443 \
  -e LLM_BASE_URL=https://openrouter.ai/api/v1 \
  -e LLM_API_KEY=your_key_here \
  -e LLM_MODEL_NAME=qwen/qwen3.6-35b-a3b \
  -e LLM_TIMEOUT=60.0 \
  -e LLM_ENABLE_THINKING=false \
  -e VLLM_ENGINE=false \
  -e EMBEDDING_MODEL=qwen/qwen3-embedding-4b \
  -e EMBEDDING_BASE_URL=https://openrouter.ai/api/v1 \
  -e EMBEDDING_API_KEY=your_key_here \
  -e CHROMA_DB_PATH=./chroma_db \
  -e CHUNK_SIZE=1000 \
  -e CHUNK_OVERLAP=200 \
  -e RETRIEVAL_N_RESULTS=10 \
  -e RELEVANCE_THRESHOLD=7.0 \
  -e PROMPTS_DB_PATH=./data/prompts.db \
  -e HISTORY_DB_PATH=./data/history.db \
  -e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
  -e DASHSCOPE_API_KEY=your_dashscope_key \
  -e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
  -e VIDEO_UPLOAD_DIR=./uploads \
  -e MAX_VIDEO_SIZE_MB=300 \
  -v ~/woody/legco/data/chroma_db:/app/chroma_db \
  -v ~/woody/legco/data/document_chunk:/app/document_chunk \
  -v ~/woody/legco/data/data:/app/data \
  legco_reranker:amd64

# Verify (accept self-signed cert with -k)
curl -k https://localhost:8888/health

# Clean up
docker rm -f legco_test
```

## Architecture

```
User → Nginx (80) → Uvicorn (8000)
                      ├── FastAPI API (/api/v1/*)
                      └── Static Frontend (/*)
                            └── React 18 + Vite + Tailwind
```

### RAG Pipeline (Per-Sub-Question)

```
User Question
  → [LLM] Decompose into 2-5 sub-questions
  → [ChromaDB] Retrieve 10 chunks per sub-question
  → [LLM] Score all chunks against their own sub-question (single call)
  → [LLM] Generate markdown response per sub-question
  → SSE stream with per-sub-question sources
```

### Video Q&A (Phase 2)

```
Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline
```

**Streaming Mode (real-time):**

- Upload video → press play → transcript flows into QueryInput in real time
- Audio captured from video element (no microphone needed)
- Auto-starts on play, stops on pause/end

**Full Transcript Mode (batch):**

- Click "Full Transcript" button under video player
- Server extracts audio via ffmpeg → full DashScope transcription
- Complete transcript fills QueryInput

**Requirements:**

- `DASHSCOPE_API_KEY` in `.env`
- `ffmpeg` on server (for batch transcription)
- `dashscope` Python package (in `requirements.txt`)

### System Audio Capture & Listen Mic (Phase 4)

Two additional live audio sources alongside video upload:

#### System Audio Capture

Captures audio output from any application on your computer (browser tab, Spotify, Zoom) via `getDisplayMedia()`.

**How to use:**

1. Select the **"System Audio"** tab in the LTTPage source selector
2. Click **"Start Capture"**
3. Choose a browser tab or window in the permission dialog, and make sure **"Share audio"** is checked
4. Real-time Cantonese ASR transcription flows into the QueryInput
5. Edit the transcript while capturing continues, then submit your query

**Use cases:** Transcribing YouTube videos, podcasts, lectures, or meetings playing on your computer without downloading files.

#### Listen Mic

Captures microphone input via `getUserMedia()`.

**How to use:**

1. Select the **"Listen Mic"** tab
2. Click **"Start Listening"**
3. Allow microphone access when prompted
4. Speak; real-time transcription flows into QueryInput
5. Edit the transcript while listening, then submit your query

**Use cases:** Recording live meetings, dictating questions verbally, transcribing spoken Cantonese in real time.

#### Browser Compatibility

**System Audio (`getDisplayMedia`):**

| Platform / Browser | Tab Audio | System Audio | Supported |
|--------------------|-----------|--------------|-----------|
| Chrome/Edge (Windows) | ✅ | ✅ | **Full support** |
| Chrome/Edge (macOS 14.2+) | ✅ | ✅ | **Full support** |
| Chrome/Edge (Linux) | ✅ | ❌ | Tab audio only |
| Firefox | ❌ | ❌ | Not supported |
| Safari | ❌ | ❌ | Not supported |

**Listen Mic (`getUserMedia`):** Universally supported in all modern browsers (Chrome, Firefox, Safari, Edge).

#### Limitations

- System Audio capture requires Chrome or Edge (Chromium-based browsers)
- No "Full Transcript" button: streaming ASR only (no batch transcription for live sources)
- `getDisplayMedia()` always shows a screen/tab picker even for audio-only capture (browser limitation)
- Each capture session generates a new UUID; the WebSocket reconnects on every Start/Stop

#### Configuration

```bash
# In backend/.env: feature toggles (default: true)
SYSTEM_AUDIO_ENABLED=true
MIC_ENABLED=true
```

### Installing ffmpeg

```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Static build (no root, any Linux)
mkdir -p ~/.local/bin
wget -qO- https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz | tar -xJ -C /tmp
cp /tmp/ffmpeg-*-static/ffmpeg ~/.local/bin/
```

## Accuracy Testing API (Package 9)

Backend endpoints for generating test results and evaluating RAG pipeline accuracy. Designed for programmatic use: call via `curl`, Python `requests`, or any HTTP client.

All endpoints are at `/api/v1/test/*` and accessible through the same domain as the frontend (nginx proxies all paths to FastAPI).

### 1. Generate Test Result (Text)

Run the full RAG pipeline on a text question and capture every intermediate stage.
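
For callers who prefer Python to curl, the request can be built with the standard library alone. This is a minimal sketch, not the project's own client: `build_generate_text_request` and `generate_text` are hypothetical helpers, the base URL assumes the local address used elsewhere in this README, and treating `label` as optional is an assumption. The `question`, `profile`, and `label` fields are the ones this endpoint's curl example sends.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev/Docker address from this README


def build_generate_text_request(question, profile="A", label=None, base_url=BASE_URL):
    """Build (but do not send) a POST request for /api/v1/test/generate/text."""
    payload = {"question": question, "profile": profile}
    if label is not None:  # assumption: the label field may be omitted
        payload["label"] = label
    return urllib.request.Request(
        f"{base_url}/api/v1/test/generate/text",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_text(question, timeout=120.0, **kwargs):
    """Send the request and return the decoded JSON result (needs a running backend)."""
    req = build_generate_text_request(question, **kwargs)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Example (requires the backend to be running):
#   result = generate_text("立法會今日討論咗咩議題?", label="Test run")
#   print(result["result_id"])
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a live server.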

```bash
# Basic text generation
curl http://localhost:8000/api/v1/test/generate/text \
  -H "Content-Type: application/json" \
  -d '{
    "question": "立法會今日討論咗咩議題?",
    "profile": "A",
    "label": "Test run 2026-05-25"
  }'

# Response includes:
#   - extracted_key_questions: decomposed sub-questions
#   - retrieval: per-sub-question chunks with metadata and distance scores
#   - filtered: chunks after relevance filter (with relevance_score)
#   - response: final answer with source citations
#   - timing: per-stage timing in ms
```

**Response** (partial):

```json
{
  "result_id": "a1b2c3d4e5f6",
  "input_type": "text",
  "profile": "A",
  "label": "Test run 2026-05-25",
  "input": { "text": "立法會今日討論咗咩議題?" },
  "extracted_key_questions": [
    "立法會:今日討論的主要議題",
    "會議記錄:近期立法會會議的討論內容"
  ],
  "retrieval": { "total_chunks_retrieved": 20, "retriever_time_ms": 456 },
  "filtered": { "total_chunks_filtered": 14, "filter_time_ms": 789 },
  "response": {
    "final_answer": "## Sub-question 0: ...\n\n- 今日立法會討論了三項主要議題... [meeting_minutes.pdf, page 1]",
    "generate_time_ms": 1011
  },
  "timing": { "decomposer_time_ms": 234, "total_time_ms": 2490 }
}
```

### 2. Generate Test Result (Audio)

Transcribe audio via ASR, then run the RAG pipeline on the transcribed text. Optionally provide a reference transcript for later CER/WER evaluation.

```bash
# Audio generation with reference transcript (for later CER/WER scoring)
curl http://localhost:8000/api/v1/test/generate/audio \
  -F "audio_file=@legco_clip.wav" \
  -F "profile=A" \
  -F "reference_transcript=立法會今日討論咗咩議題?" \
  -F "language=yue" \
  -F "label=Cantonese LegCo audio test"

# Without reference transcript (CER/WER will return N/A in evaluation)
curl http://localhost:8000/api/v1/test/generate/audio \
  -F "audio_file=@meeting.mp3" \
  -F "profile=B" \
  -F "language=yue"
```

Compared to the text endpoint, the audio result includes extra fields:

```json
{
  "input_type": "audio",
  "input": {
    "text": "立法會今日討論咗咩議題?",
    "reference_transcript": "立法會今日討論咗咩議題?",
    "audio_filename": "legco_clip.wav",
    "audio_duration_seconds": 45.2,
    "asr_language": "yue"
  },
  "timing": { "asr_time_ms": 1234, "total_time_ms": 3724 }
}
```

### 3. Evaluate Test Result

Run all four evaluation dimensions on a previously generated result:

- **(i) Audio transcription accuracy**: CER/WER (only for audio inputs with a reference transcript)
- **(ii) Key questions quality**: two evaluator LLMs score against a 4-dimension rubric, and the scores are averaged
- **(iii) Chunk accuracy**: an LLM determines the ground-truth chunks, then precision/recall/F1 are computed
- **(iv) Response completeness**: an ideal response is generated from the ground-truth chunks and compared with the actual response

```bash
# Evaluate a previously saved result
curl http://localhost:8000/api/v1/test/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "result_id": "a1b2c3d4e5f6",
    "evaluation_config": {
      "key_questions_evaluators": [
        {
          "model_name": "deepseek-v4-pro",
          "base_url": "https://api.deepseek.com",
          "api_key_env": "DP_API_KEY",
          "enable_thinking": true
        },
        {
          "model_name": "qwen3-7b-max",
          "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
          "api_key_env": "DASHSCOPE_API_KEY",
          "enable_thinking": true
        }
      ],
      "chunk_evaluator": {
        "model_name": "qwen/qwen3.6-35b-a3b",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key_env": "LLM_API_KEY",
        "enable_thinking": true
      },
      "response_evaluator": {
        "model_name": "qwen/qwen3.6-35b-a3b",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key_env": "LLM_API_KEY",
        "enable_thinking": true
      }
    }
  }'
```

**Response** (partial, shows scoring structure):

```json
{
  "evaluation_id": "eval-abc123",
  "result_id": "a1b2c3d4e5f6",
  "status": "completed",
  "audio_evaluation": { "status": "completed", "cer": 0.052, "wer": 0.083 },
  "key_questions_evaluation": {
    "average_scores": {
      "dimension_1_準確性": 36.0,
      "dimension_2_完整性": 22.5,
      "dimension_3_清晰度": 17.5,
      "dimension_4_簡潔性": 13.5
    },
    "average_total": 89.5
  },
  "chunk_evaluation": {
    "overall_unfiltered": { "avg_precision": 0.60, "avg_recall": 1.00, "avg_f1": 0.75 },
    "overall_filtered": { "avg_precision": 1.00, "avg_recall": 1.00, "avg_f1": 1.00 }
  },
  "response_evaluation": {
    "overall_completeness": 0.85,
    "overall_factual_accuracy": 0.92
  }
}
```

### 4. Manage Results & Evaluations

URLs with query strings are quoted so the shell does not interpret `?` or `&`.

```bash
# List all saved test results
curl "http://localhost:8000/api/v1/test/results?limit=10&offset=0"

# Retrieve a specific result
curl http://localhost:8000/api/v1/test/results/a1b2c3d4e5f6

# Delete a result
curl -X DELETE http://localhost:8000/api/v1/test/results/a1b2c3d4e5f6

# List all evaluation results
curl "http://localhost:8000/api/v1/test/evaluations?limit=10"

# Retrieve a specific evaluation
curl http://localhost:8000/api/v1/test/evaluations/eval-abc123

# Delete an evaluation
curl -X DELETE http://localhost:8000/api/v1/test/evaluations/eval-abc123
```

### Full Workflow Example

```bash
# 1. Generate a test result
RESULT=$(curl -s http://localhost:8000/api/v1/test/generate/text \
  -H "Content-Type: application/json" \
  -d '{"question": "立法會討論咗咩房屋政策?", "profile": "A", "label": "housing policy test"}')

RESULT_ID=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['result_id'])")
echo "Generated result: $RESULT_ID"

# 2. Evaluate that result
curl -s http://localhost:8000/api/v1/test/evaluate \
  -H "Content-Type: application/json" \
  -d "{
    \"result_id\": \"$RESULT_ID\",
    \"evaluation_config\": {
      \"key_questions_evaluators\": [
        {\"model_name\": \"deepseek-v4-pro\", \"base_url\": \"https://api.deepseek.com\", \"api_key_env\": \"DP_API_KEY\", \"enable_thinking\": true},
        {\"model_name\": \"qwen3-7b-max\", \"base_url\": \"https://dashscope.aliyuncs.com/compatible-mode/v1\", \"api_key_env\": \"DASHSCOPE_API_KEY\", \"enable_thinking\": true}
      ],
      \"chunk_evaluator\": {\"model_name\": \"qwen/qwen3.6-35b-a3b\", \"base_url\": \"https://openrouter.ai/api/v1\", \"api_key_env\": \"LLM_API_KEY\", \"enable_thinking\": true},
      \"response_evaluator\": {\"model_name\": \"qwen/qwen3.6-35b-a3b\", \"base_url\": \"https://openrouter.ai/api/v1\", \"api_key_env\": \"LLM_API_KEY\", \"enable_thinking\": true}
    }
  }" | python3 -m json.tool

# 3. Check results
curl -s "http://localhost:8000/api/v1/test/results?limit=5" | python3 -m json.tool
curl -s "http://localhost:8000/api/v1/test/evaluations?limit=5" | python3 -m json.tool
```

### Key Questions Marking Scheme (4 Dimensions)

| Dimension | Weight | Full-mark criteria |
|-----------|--------|--------------------|
| 1. Fidelity (準確性) | 40 points | Fully faithful to the original meaning; numbers and keywords are correct |
| 2. Completeness (完整性) | 25 points | Covers all key elements (question + background + purpose) |
| 3. Clarity (清晰度) | 20 points | Precise language, clear logic, easy to read and understand |
| 4. Conciseness (簡潔性) | 15 points | Conveys the complete meaning in the fewest words |

### Requirements for Evaluation

- All evaluation prompts, marking schemes, and LLM interactions are in **Chinese**
- Both key questions evaluator models must succeed (3 retries each); no partial scores are reported
- Chunk evaluation processes ALL chunks from ALL documents in batches of 10
- Thinking mode (`enable_thinking: true`) is enabled on all evaluation models
- Stored results and evaluations are not auto-deleted; manage them via the DELETE endpoints

## Notes

- PDF upload limit: 300MB
- Video upload limit: 300MB (same as PDF)
- ffmpeg required on server (for video transcription)
- DashScope ASR supports Cantonese (yue), Mandarin (zh), English (en), auto-detect
- Desktop only (not mobile-optimized)
- No authentication (public demo)
- All LLM calls routed through configurable base URL
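
The metrics reported by `/api/v1/test/evaluate` (CER for transcripts, precision/recall/F1 for chunks) have standard textbook definitions. The sketch below shows those definitions for reference when reading evaluation output. It is an assumption that the backend computes them exactly this way, and the function names are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute each cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def precision_recall_f1(retrieved, ground_truth):
    """Set-based precision, recall, and F1 over chunk identifiers."""
    retrieved, ground_truth = set(retrieved), set(ground_truth)
    hits = len(retrieved & ground_truth)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    # 2*hits / (|retrieved| + |truth|) equals the harmonic mean of precision and recall.
    total = len(retrieved) + len(ground_truth)
    f1 = 2 * hits / total if total else 0.0
    return precision, recall, f1
```

For example, five retrieved chunks of which three are ground truth, with no ground-truth chunk missed, give precision 0.6, recall 1.0, and F1 0.75, the same shape as the `overall_unfiltered` figures in the sample evaluation response. WER applies the same edit distance to word tokens; Cantonese text would first need a word segmenter, which is outside this sketch.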