LegCo Reranker

RAG-powered document Q&A app with video ASR. Upload PDFs and videos (transcribed with Cantonese ASR), ask questions, and get bullet-point answers with citations.

Quick Start (Dev)

# Backend
cd backend
cp .env.example .env    # edit .env with your LLM API key AND DashScope API key (for video ASR)
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend
cd frontend
pnpm install
pnpm run dev

Backend → http://localhost:8000 | Frontend → http://localhost:5173

Deploy with Docker

Prerequisites

  • Docker 24+ and Docker Compose v2
  • OpenRouter API key (or compatible LLM provider)
  • Alibaba Cloud DashScope API key (for video ASR transcription)

Setup

# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names

# 2. Build and start
docker compose up -d --build

# 3. Check health
curl http://localhost:8000/health

The app is served at http://localhost:8000 — both the API and the frontend UI.

Volumes

| Volume | Purpose |
|---|---|
| chroma_data | ChromaDB vector store (persistent) |
| chunk_data | Extracted PDF page files |
| sqlite_data | Prompt templates and query history |
| uploads_data | Uploaded video files (persistent) |

Environment Variables

All configurable via backend/.env:

| Variable | Default | Description |
|---|---|---|
| LLM_BASE_URL | https://openrouter.ai/api/v1 | LLM API endpoint |
| LLM_API_KEY | | API key for LLM provider |
| LLM_MODEL_NAME | qwen/qwen3.5-35b-a3b | Chat model |
| LLM_TIMEOUT | 60.0 | LLM request timeout in seconds |
| LLM_ENABLE_THINKING | false | Enable LLM thinking/reasoning tokens |
| VLLM_ENGINE | false | Use vLLM-format extra_body instead of OpenRouter |
| EMBEDDING_MODEL | qwen/qwen3-embedding-4b | Embedding model |
| EMBEDDING_BASE_URL | https://openrouter.ai/api/v1 | Embedding API endpoint |
| EMBEDDING_API_KEY | | API key for embeddings (falls back to LLM_API_KEY) |
| CHROMA_DB_PATH | ./chroma_db | ChromaDB persistent storage |
| CHUNK_SIZE | 1000 | Token chunk size |
| CHUNK_OVERLAP | 200 | Token chunk overlap |
| RETRIEVAL_N_RESULTS | 10 | Chunks per sub-question |
| RELEVANCE_THRESHOLD | 7.0 | Min relevance score (0-10) |
| PROMPTS_DB_PATH | ./data/prompts.db | Prompt templates SQLite |
| HISTORY_DB_PATH | ./data/history.db | Query history SQLite |
| CORS_ORIGINS | ["http://localhost:5173","http://localhost:3000"] | Allowed CORS origins |
| DASHSCOPE_API_KEY | | Alibaba Cloud DashScope API key (for video ASR) |
| ASR_MODEL_NAME | qwen3-asr-flash | ASR model for batch transcription |
| ASR_REALTIME_MODEL_NAME | qwen3-asr-flash-realtime | ASR model for real-time streaming |
| VIDEO_UPLOAD_DIR | ./uploads | Video file storage directory |
| MAX_VIDEO_SIZE_MB | 300 | Maximum video upload size |
| SUPPORTED_VIDEO_FORMATS | .mp4, .webm, .mov, .avi, .mkv | Allowed video file extensions |
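
To make the defaults and the EMBEDDING_API_KEY fallback concrete, here is a minimal stdlib sketch of how a few of these variables could be read. It is an illustration only: the `Settings` class and `load_settings` function are hypothetical names, and the real backend may load its configuration differently (for example through a settings library).

```python
import json
from dataclasses import dataclass
from typing import List, Mapping


@dataclass
class Settings:
    llm_base_url: str
    llm_api_key: str
    embedding_api_key: str
    chunk_size: int
    chunk_overlap: int
    relevance_threshold: float
    cors_origins: List[str]


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment-like mapping, using the defaults from the table."""
    llm_key = env.get("LLM_API_KEY", "")
    return Settings(
        llm_base_url=env.get("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        llm_api_key=llm_key,
        # Fall back to LLM_API_KEY when no separate embedding key is set.
        embedding_api_key=env.get("EMBEDDING_API_KEY") or llm_key,
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "7.0")),
        # CORS_ORIGINS is written as a JSON array, so it is parsed with json.loads.
        cors_origins=json.loads(
            env.get("CORS_ORIGINS", '["http://localhost:5173","http://localhost:3000"]')
        ),
    )


# Example with only the LLM key set; pass os.environ in real use.
settings = load_settings({"LLM_API_KEY": "sk-example"})
print(settings.embedding_api_key, settings.cors_origins)
```

Passing a plain mapping rather than reading `os.environ` directly keeps the function easy to exercise without touching the real environment.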

Production: Nginx Reverse Proxy

# Include nginx.conf in your site config
# Key settings:
# - client_max_body_size 350M   (allow large PDF uploads)
# - proxy_read_timeout 300s     (LLM calls can take minutes)

# Install nginx
sudo apt install nginx

# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Stopping

docker compose down

Updating

git pull
docker compose up -d --build

Cross-Platform Build (aarch64 → amd64)

When building on an aarch64/ARM64 machine (Apple Silicon, ARM Windows WSL2, Raspberry Pi) for deployment to an x86_64/amd64 server:

1. Install buildx

# Download buildx for arm64
mkdir -p ~/.docker/cli-plugins   # wget -O fails if this directory does not exist
BUILDX_VERSION=$(wget -qO- https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | head -1 | cut -d'"' -f4)
wget "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-arm64" -O ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx

2. Register QEMU for amd64 emulation

docker run --privileged --rm tonistiigi/binfmt --install all

3. Build for amd64

DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t legco_reranker:amd64 .

4. Export and transfer to server

# Save image to tar file
docker save legco_reranker:amd64 -o legco_reranker_amd64.tar

# Compress (~762MB → ~250MB)
gzip legco_reranker_amd64.tar

# Transfer to server
scp legco_reranker_amd64.tar.gz user@server:/path/

# On the x86_64 server:
gunzip legco_reranker_amd64.tar.gz
docker load -i legco_reranker_amd64.tar

# Run
docker run -d --name legco -p 80:80 -p 443:443 --env-file backend/.env \
  -v chroma_data:/app/chroma_db \
  -v chunk_data:/app/document_chunk \
  -v sqlite_data:/app/data \
  legco_reranker:amd64

5. Test run (local, port 8888)

Before transferring to the server, test the amd64 image locally. Pass all config inline (no --env-file):

docker run -d --name legco_test -p 8888:443 \
  -e LLM_BASE_URL=https://openrouter.ai/api/v1 \
  -e LLM_API_KEY=your_key_here \
  -e LLM_MODEL_NAME=qwen/qwen3.6-35b-a3b \
  -e LLM_TIMEOUT=60.0 \
  -e LLM_ENABLE_THINKING=false \
  -e VLLM_ENGINE=false \
  -e EMBEDDING_MODEL=qwen/qwen3-embedding-4b \
  -e EMBEDDING_BASE_URL=https://openrouter.ai/api/v1 \
  -e EMBEDDING_API_KEY=your_key_here \
  -e CHROMA_DB_PATH=./chroma_db \
  -e CHUNK_SIZE=1000 \
  -e CHUNK_OVERLAP=200 \
  -e RETRIEVAL_N_RESULTS=10 \
  -e RELEVANCE_THRESHOLD=7.0 \
  -e PROMPTS_DB_PATH=./data/prompts.db \
  -e HISTORY_DB_PATH=./data/history.db \
  -e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
  -e DASHSCOPE_API_KEY=your_dashscope_key \
  -e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
  -e VIDEO_UPLOAD_DIR=./uploads \
  -e MAX_VIDEO_SIZE_MB=300 \
  -v ~/woody/legco/data/chroma_db:/app/chroma_db \
  -v ~/woody/legco/data/document_chunk:/app/document_chunk \
  -v ~/woody/legco/data/data:/app/data \
  legco_reranker:amd64

# Verify (accept self-signed cert with -k)
curl -k https://localhost:8888/health

# Clean up
docker rm -f legco_test

Architecture

User → Nginx (80) → Uvicorn (8000)
                         ├── FastAPI API (/api/v1/*)
                         └── Static Frontend (/*)
                              └── React 18 + Vite + Tailwind

RAG Pipeline (Per-Sub-Question)

User Question
  → [LLM] Decompose into 2-5 sub-questions
  → [ChromaDB] Retrieve 10 chunks per sub-question
  → [LLM] Score all chunks against their own sub-question (single call)
  → [LLM] Generate markdown response per sub-question
  → SSE stream with per-sub-question sources
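
The flow above can be sketched in Python with the four stages passed in as plain callables. This is a rough sketch rather than the backend's code: `run_pipeline` and the callable signatures are hypothetical, the LLM and ChromaDB calls are stubbed out, and keeping chunks whose score is at least RELEVANCE_THRESHOLD is an assumption based on the "Min relevance score" description.

```python
from typing import Callable, Dict, List, Tuple

Chunk = Dict[str, str]


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str, int], List[Chunk]],
    score: Callable[[List[Tuple[str, Chunk]]], List[float]],
    generate: Callable[[str, List[Chunk]], str],
    n_results: int = 10,      # RETRIEVAL_N_RESULTS
    threshold: float = 7.0,   # RELEVANCE_THRESHOLD
) -> List[Dict[str, object]]:
    sub_questions = decompose(question)

    # Retrieve chunks per sub-question, remembering which sub-question owns each chunk.
    owners: List[int] = []
    pairs: List[Tuple[str, Chunk]] = []
    for index, sub_question in enumerate(sub_questions):
        for chunk in retrieve(sub_question, n_results):
            owners.append(index)
            pairs.append((sub_question, chunk))

    # Score every (sub-question, chunk) pair in a single call.
    scores = score(pairs)

    results: List[Dict[str, object]] = []
    for index, sub_question in enumerate(sub_questions):
        # Keep only this sub-question's chunks that reach the threshold.
        kept = [
            chunk
            for owner, (_, chunk), value in zip(owners, pairs, scores)
            if owner == index and value >= threshold
        ]
        results.append(
            {"sub_question": sub_question, "answer": generate(sub_question, kept), "sources": kept}
        )
    return results


# Stubbed stages stand in for the LLM and vector store.
demo = run_pipeline(
    "What was discussed?",
    decompose=lambda q: ["topic A", "topic B"],
    retrieve=lambda sq, n: [{"text": f"{sq} chunk {i}"} for i in range(n)],
    score=lambda pairs: [9.0 if p[1]["text"].endswith("0") else 3.0 for p in pairs],
    generate=lambda sq, chunks: f"- {sq}: {len(chunks)} source(s)",
    n_results=3,
)
for item in demo:
    print(item["answer"])
```

In the real app each result would be streamed over SSE as it is produced; the sketch returns a list to stay self-contained.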

Video Q&A (Phase 2)

Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline

Streaming Mode (real-time):

  • Upload video → press play → transcript flows into QueryInput in real time
  • Audio captured from video element (no microphone needed)
  • Auto-starts on play, stops on pause/end

Full Transcript Mode (batch):

  • Click "Full Transcript" button under video player
  • Server extracts audio via ffmpeg → full DashScope transcription
  • Complete transcript fills QueryInput
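
The audio-extraction step in batch mode could look roughly like the sketch below. The function name `build_ffmpeg_cmd` is hypothetical, and the choice of 16 kHz mono WAV is an assumption (a common input format for speech recognition); the flags the server actually passes to ffmpeg are not documented here.

```python
import shlex
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(video_path: str, sample_rate: int = 16000) -> List[str]:
    """Return an ffmpeg argument list that extracts a mono WAV track next to the video."""
    wav_path = str(Path(video_path).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream, keep audio only
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample (assumed 16 kHz)
        wav_path,
    ]


cmd = build_ffmpeg_cmd("uploads/meeting.mp4")
print(shlex.join(cmd))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`; using an argument list instead of a shell string avoids quoting problems with unusual filenames.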

Requirements:

  • DASHSCOPE_API_KEY in .env
  • ffmpeg on server (for batch transcription)
  • dashscope Python package (in requirements.txt)

System Audio Capture & Listen Mic (Phase 4)

Two additional live audio sources alongside video upload:

System Audio Capture

Captures audio output from any application on your computer (browser tab, Spotify, Zoom) via getDisplayMedia().

How to use:

  1. Select the "System Audio" tab in the LTTPage source selector
  2. Click "Start Capture"
  3. Choose a browser tab or window in the permission dialog — make sure "Share audio" is checked
  4. Real-time Cantonese ASR transcription flows into the QueryInput
  5. Edit the transcript while capturing continues, then submit your query

Use cases: Transcribing YouTube videos, podcasts, lectures, or meetings playing on your computer without downloading files.

Listen Mic

Captures microphone input via getUserMedia().

How to use:

  1. Select the "Listen Mic" tab
  2. Click "Start Listening"
  3. Allow microphone access when prompted
  4. Speak — real-time transcription flows into QueryInput
  5. Edit transcript while listening, then submit your query

Use cases: Recording live meetings, dictating questions verbally, transcribing spoken Cantonese in real time.

Browser Compatibility

System Audio (getDisplayMedia):

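| Platform / Browser | Tab Audio | System Audio | Supported |
|---|---|---|---|
| Chrome/Edge (Windows) | Yes | Yes | Full support |
| Chrome/Edge (macOS 14.2+) | Yes | Yes | Full support |
| Chrome/Edge (Linux) | Yes | No | Tab audio only |
| Firefox | No | No | Not supported |
| Safari | No | No | Not supported |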

Listen Mic (getUserMedia): Supported in all modern browsers (Chrome, Firefox, Safari, Edge).

Limitations

  • System Audio capture requires Chrome or Edge (Chromium-based browsers)
  • No "Full Transcript" button — streaming ASR only (no batch transcription for live sources)
  • getDisplayMedia() always shows a screen/tab picker even for audio-only capture (browser limitation)
  • Each capture session generates a new UUID; the WebSocket reconnects on every Start/Stop

Configuration

# In backend/.env — feature toggles (default: true)
SYSTEM_AUDIO_ENABLED=true
MIC_ENABLED=true

Installing ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Static build (no root, any Linux)
mkdir -p ~/.local/bin
wget -qO- https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz | tar -xJ -C /tmp
cp /tmp/ffmpeg-*-static/ffmpeg ~/.local/bin/

Accuracy Testing API (Package 9)

Backend endpoints for generating test results and evaluating RAG pipeline accuracy. Designed for programmatic use — call via curl, Python requests, or any HTTP client.

All endpoints are at /api/v1/test/* and accessible through the same domain as the frontend (nginx proxies all paths to FastAPI).

1. Generate Test Result (Text)

Run the full RAG pipeline on a text question and capture every intermediate stage.

# Basic text generation
curl http://localhost:8000/api/v1/test/generate/text \
  -H "Content-Type: application/json" \
  -d '{
    "question": "立法會今日討論咗咩議題?",
    "profile": "A",
    "label": "Test run 2026-05-25"
  }'

# Response includes:
#   - extracted_key_questions: decomposed sub-questions
#   - retrieval: per-sub-question chunks with metadata and distance scores
#   - filtered: chunks after relevance filter (with relevance_score)
#   - response: final answer with source citations
#   - timing: per-stage timing in ms

Response (partial):

{
  "result_id": "a1b2c3d4e5f6",
  "input_type": "text",
  "profile": "A",
  "label": "Test run 2026-05-25",
  "input": { "text": "立法會今日討論咗咩議題?" },
  "extracted_key_questions": [
    "立法會:今日討論的主要議題",
    "會議記錄:近期立法會會議的討論內容"
  ],
  "retrieval": {
    "total_chunks_retrieved": 20,
    "retriever_time_ms": 456
  },
  "filtered": {
    "total_chunks_filtered": 14,
    "filter_time_ms": 789
  },
  "response": {
    "final_answer": "## Sub-question 0: ...\n\n- 今日立法會討論了三項主要議題... [meeting_minutes.pdf, page 1]",
    "generate_time_ms": 1011
  },
  "timing": {
    "decomposer_time_ms": 234,
    "total_time_ms": 2490
  }
}
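
The same call can be made from Python. The sketch below only builds the request with the standard library, so it runs without a backend; `build_generate_text_request` is a hypothetical helper, and treating `label` as optional and defaulting `profile` to "A" are assumptions for the text endpoint.

```python
import json
import urllib.request
from typing import Optional


def build_generate_text_request(
    base_url: str, question: str, profile: str = "A", label: Optional[str] = None
) -> urllib.request.Request:
    """Build a POST request for /api/v1/test/generate/text without sending it."""
    payload = {"question": question, "profile": profile}
    if label is not None:
        payload["label"] = label
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/test/generate/text",
        # ensure_ascii=False keeps the Cantonese text readable in the body.
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_text_request(
    "http://localhost:8000", "立法會今日討論咗咩議題?", label="Test run 2026-05-25"
)
print(req.get_method(), req.full_url)

# To send it against a running backend:
#   with urllib.request.urlopen(req, timeout=120) as resp:
#       result = json.load(resp)
#       print(result["result_id"], result["timing"]["total_time_ms"])
```

A generous timeout is advisable when sending, since the pipeline makes several LLM calls per request.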

2. Generate Test Result (Audio)

Transcribe audio via ASR, then run the RAG pipeline on the transcribed text. Optionally provide a reference transcript for later CER/WER evaluation.

# Audio generation with reference transcript (for later CER/WER scoring)
curl http://localhost:8000/api/v1/test/generate/audio \
  -F "audio_file=@legco_clip.wav" \
  -F "profile=A" \
  -F "reference_transcript=立法會今日討論咗咩議題?" \
  -F "language=yue" \
  -F "label=Cantonese LegCo audio test"

# Without reference transcript (CER/WER will return N/A in evaluation)
curl http://localhost:8000/api/v1/test/generate/audio \
  -F "audio_file=@meeting.mp3" \
  -F "profile=B" \
  -F "language=yue"

Compared to the text endpoint, the audio result includes extra fields:

{
  "input_type": "audio",
  "input": {
    "text": "立法會今日討論咗咩議題?",
    "reference_transcript": "立法會今日討論咗咩議題?",
    "audio_filename": "legco_clip.wav",
    "audio_duration_seconds": 45.2,
    "asr_language": "yue"
  },
  "timing": {
    "asr_time_ms": 1234,
    "total_time_ms": 3724
  }
}
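
For reference, CER and WER are both edit-distance ratios between the reference transcript and the ASR output. The sketch below shows the standard definitions; it is not taken from the backend, and the whitespace-based `wer` is an assumption that suits English, while Cantonese text would first need a word segmenter.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One substituted character out of eleven.
print(round(cer("立法會今日討論咗咩議題", "立法會今日討論左咩議題"), 3))
```

Both rates can exceed 1.0 when the hypothesis is much longer than the reference, because insertions count as errors.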

3. Evaluate Test Result

Run all four evaluation dimensions on a previously generated result:

  • (i) Audio transcription accuracy: CER/WER (audio inputs with a reference transcript only)
  • (ii) Key questions quality: two evaluator LLMs each score against the 4-dimension rubric, and their scores are averaged
  • (iii) Chunk accuracy: an LLM determines the ground-truth chunks, then precision/recall/F1 are computed against them
  • (iv) Response completeness: an ideal response is generated from the ground-truth chunks and compared with the actual response

# Evaluate a previously saved result
curl http://localhost:8000/api/v1/test/evaluate \
  -H "Content-Type: application/json" \
  -d '{
    "result_id": "a1b2c3d4e5f6",
    "evaluation_config": {
      "key_questions_evaluators": [
        {
          "model_name": "deepseek-v4-pro",
          "base_url": "https://api.deepseek.com",
          "api_key_env": "DP_API_KEY",
          "enable_thinking": true
        },
        {
          "model_name": "qwen3-7b-max",
          "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
          "api_key_env": "DASHSCOPE_API_KEY",
          "enable_thinking": true
        }
      ],
      "chunk_evaluator": {
        "model_name": "qwen/qwen3.6-35b-a3b",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key_env": "LLM_API_KEY",
        "enable_thinking": true
      },
      "response_evaluator": {
        "model_name": "qwen/qwen3.6-35b-a3b",
        "base_url": "https://openrouter.ai/api/v1",
        "api_key_env": "LLM_API_KEY",
        "enable_thinking": true
      }
    }
  }'

Response (partial — shows scoring structure):

{
  "evaluation_id": "eval-abc123",
  "result_id": "a1b2c3d4e5f6",
  "status": "completed",
  "audio_evaluation": {
    "status": "completed",
    "cer": 0.052,
    "wer": 0.083
  },
  "key_questions_evaluation": {
    "average_scores": {
      "dimension_1_準確性": 36.0,
      "dimension_2_完整性": 22.5,
      "dimension_3_清晰度": 17.5,
      "dimension_4_簡潔性": 13.5
    },
    "average_total": 89.5
  },
  "chunk_evaluation": {
    "overall_unfiltered": { "avg_precision": 0.60, "avg_recall": 1.00, "avg_f1": 0.75 },
    "overall_filtered": { "avg_precision": 1.00, "avg_recall": 1.00, "avg_f1": 1.00 }
  },
  "response_evaluation": {
    "overall_completeness": 0.85,
    "overall_factual_accuracy": 0.92
  }
}
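
The chunk_evaluation numbers follow the usual set-based definitions of precision, recall, and F1. The sketch below reproduces the `overall_unfiltered` figures for a single sub-question; the chunk IDs are made up for illustration, and how the backend averages across sub-questions is not shown here.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    retrieved: Iterable[str], ground_truth: Iterable[str]
) -> Tuple[float, float, float]:
    """Compare retrieved chunk IDs against the ground-truth chunk IDs."""
    retrieved_set, truth_set = set(retrieved), set(ground_truth)
    hits = len(retrieved_set & truth_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(truth_set) if truth_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical IDs: 5 chunks retrieved, 3 of them relevant, none missed.
p, r, f1 = precision_recall_f1(
    retrieved=["c1", "c2", "c3", "c4", "c5"],
    ground_truth=["c1", "c2", "c3"],
)
print(round(p, 2), round(r, 2), round(f1, 2))  # roughly 0.6 1.0 0.75
```

After the relevance filter drops the two irrelevant chunks, all three values become 1.0, which matches the `overall_filtered` row in the sample response.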

4. Manage Results & Evaluations

# List all saved test results
curl "http://localhost:8000/api/v1/test/results?limit=10&offset=0"

# Retrieve a specific result
curl http://localhost:8000/api/v1/test/results/a1b2c3d4e5f6

# Delete a result
curl -X DELETE http://localhost:8000/api/v1/test/results/a1b2c3d4e5f6

# List all evaluation results
curl "http://localhost:8000/api/v1/test/evaluations?limit=10"

# Retrieve a specific evaluation
curl http://localhost:8000/api/v1/test/evaluations/eval-abc123

# Delete an evaluation
curl -X DELETE http://localhost:8000/api/v1/test/evaluations/eval-abc123

Full Workflow Example

# 1. Generate a test result
RESULT=$(curl -s http://localhost:8000/api/v1/test/generate/text \
  -H "Content-Type: application/json" \
  -d '{"question": "立法會討論咗咩房屋政策?", "profile": "A", "label": "housing policy test"}')
RESULT_ID=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['result_id'])")
echo "Generated result: $RESULT_ID"

# 2. Evaluate that result
curl -s http://localhost:8000/api/v1/test/evaluate \
  -H "Content-Type: application/json" \
  -d "{
    \"result_id\": \"$RESULT_ID\",
    \"evaluation_config\": {
      \"key_questions_evaluators\": [
        {\"model_name\": \"deepseek-v4-pro\", \"base_url\": \"https://api.deepseek.com\", \"api_key_env\": \"DP_API_KEY\", \"enable_thinking\": true},
        {\"model_name\": \"qwen3-7b-max\", \"base_url\": \"https://dashscope.aliyuncs.com/compatible-mode/v1\", \"api_key_env\": \"DASHSCOPE_API_KEY\", \"enable_thinking\": true}
      ],
      \"chunk_evaluator\": {\"model_name\": \"qwen/qwen3.6-35b-a3b\", \"base_url\": \"https://openrouter.ai/api/v1\", \"api_key_env\": \"LLM_API_KEY\", \"enable_thinking\": true},
      \"response_evaluator\": {\"model_name\": \"qwen/qwen3.6-35b-a3b\", \"base_url\": \"https://openrouter.ai/api/v1\", \"api_key_env\": \"LLM_API_KEY\", \"enable_thinking\": true}
    }
  }" | python3 -m json.tool

# 3. Check results
curl -s "http://localhost:8000/api/v1/test/results?limit=5" | python3 -m json.tool
curl -s "http://localhost:8000/api/v1/test/evaluations?limit=5" | python3 -m json.tool
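
Building the evaluation body in Python avoids the escaped quotes needed in the shell version of step 2. This is a sketch with hypothetical helper names (`evaluator`, `build_evaluation_payload`); the model names, base URLs, and key variable names are copied from the example above and should be replaced with whatever your deployment uses.

```python
import json
from typing import Dict


def evaluator(model_name: str, base_url: str, api_key_env: str, enable_thinking: bool = True) -> Dict[str, object]:
    """One evaluator entry; api_key_env is the name of a variable, not the key itself."""
    return {
        "model_name": model_name,
        "base_url": base_url,
        "api_key_env": api_key_env,
        "enable_thinking": enable_thinking,
    }


def build_evaluation_payload(result_id: str) -> Dict[str, object]:
    """Assemble the JSON body for POST /api/v1/test/evaluate."""
    openrouter = evaluator("qwen/qwen3.6-35b-a3b", "https://openrouter.ai/api/v1", "LLM_API_KEY")
    return {
        "result_id": result_id,
        "evaluation_config": {
            "key_questions_evaluators": [
                evaluator("deepseek-v4-pro", "https://api.deepseek.com", "DP_API_KEY"),
                evaluator(
                    "qwen3-7b-max",
                    "https://dashscope.aliyuncs.com/compatible-mode/v1",
                    "DASHSCOPE_API_KEY",
                ),
            ],
            "chunk_evaluator": openrouter,
            "response_evaluator": dict(openrouter),  # separate copy of the same settings
        },
    }


payload = build_evaluation_payload("a1b2c3d4e5f6")
print(json.dumps(payload, indent=2))
```

The printed JSON can be saved to a file and sent with `curl -d @file.json`, or posted directly with any HTTP client.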

Key Questions Marking Scheme (4 Dimensions)

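| Dimension | Weight | Full-marks criteria |
|---|---|---|
| 1. Fidelity (準確性) | 40 points | Fully faithful to the original meaning; figures and keywords are error-free |
| 2. Completeness (完整性) | 25 points | Covers all key elements (question + background + purpose) |
| 3. Clarity (清晰度) | 20 points | Precise language, clear logic, easy to read and understand |
| 4. Conciseness (簡潔性) | 15 points | Expresses the most complete meaning in the fewest words |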
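
As a worked example of how two evaluators' marks combine under this scheme, the sketch below averages per-dimension scores and sums them into a total out of 100. The individual evaluator scores are invented for illustration (chosen so the averages match the sample response earlier), and `average_scores` is a hypothetical helper, not the backend's function.

```python
from typing import Dict, List, Tuple

# Maximum points per dimension, keyed as in the evaluation response.
MAX_POINTS: Dict[str, int] = {
    "dimension_1_準確性": 40,
    "dimension_2_完整性": 25,
    "dimension_3_清晰度": 20,
    "dimension_4_簡潔性": 15,
}


def average_scores(evaluator_scores: List[Dict[str, float]]) -> Tuple[Dict[str, float], float]:
    """Average each dimension across evaluators and return (averages, total)."""
    if not evaluator_scores:
        raise ValueError("at least one evaluator score set is required")
    averages: Dict[str, float] = {}
    for dimension, max_points in MAX_POINTS.items():
        values = [scores[dimension] for scores in evaluator_scores]
        for value in values:
            if not 0 <= value <= max_points:
                raise ValueError(f"{dimension} score {value} is outside 0-{max_points}")
        averages[dimension] = sum(values) / len(values)
    return averages, sum(averages.values())


# Invented scores from two evaluators.
evaluator_a = {"dimension_1_準確性": 37, "dimension_2_完整性": 23, "dimension_3_清晰度": 18, "dimension_4_簡潔性": 14}
evaluator_b = {"dimension_1_準確性": 35, "dimension_2_完整性": 22, "dimension_3_清晰度": 17, "dimension_4_簡潔性": 13}
averages, total = average_scores([evaluator_a, evaluator_b])
print(averages, total)
```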

Requirements for Evaluation

  • All evaluation prompts, marking schemes, and LLM interactions are in Chinese
  • Both key questions evaluator models must succeed (3 retries each) — no partial scores
  • Chunk evaluation processes ALL chunks from ALL documents in batches of 10
  • Thinking mode (enable_thinking: true) is enabled on all evaluation models
  • Stored results and evaluations are not auto-deleted — manage via DELETE endpoints

Notes

  • PDF upload limit: 300MB
  • Video upload limit: 300MB (same as PDF)
  • ffmpeg required on server (for video transcription)
  • DashScope ASR supports Cantonese (yue), Mandarin (zh), English (en), auto-detect
  • Desktop only (not mobile-optimized)
  • No authentication (public demo)
  • All LLM calls routed through configurable base URL
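
As an illustration of the video upload limits noted above, a server-side check might look like the sketch below. `validate_video_upload` is a hypothetical name, the constants mirror the SUPPORTED_VIDEO_FORMATS and MAX_VIDEO_SIZE_MB defaults, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_VIDEO_FORMATS = {".mp4", ".webm", ".mov", ".avi", ".mkv"}
MAX_VIDEO_SIZE_MB = 300


def validate_video_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return an error message, or None when the upload is acceptable."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_VIDEO_FORMATS:
        return f"unsupported video format: {suffix or '(none)'}"
    # Assumes binary megabytes (1 MB = 1024 * 1024 bytes).
    if size_bytes > MAX_VIDEO_SIZE_MB * 1024 * 1024:
        return f"file exceeds {MAX_VIDEO_SIZE_MB} MB"
    return None


print(validate_video_upload("meeting.MP4", 50 * 1024 * 1024))
print(validate_video_upload("notes.txt", 1024))
```

Note that the nginx `client_max_body_size` of 350M sits above this limit, so oversized uploads between 300 and 350 MB would reach the application and need a check like this one.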