LegCo Reranker

RAG-powered document Q&A app with video ASR. Upload PDFs or videos (transcribed with Cantonese ASR), ask questions, and get bullet-point answers with citations.

Quick Start (Dev)

# Backend
cd backend
cp .env.example .env    # edit .env with your LLM API key AND DashScope API key (for video ASR)
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend
cd frontend
pnpm install
pnpm run dev

Backend → http://localhost:8000 | Frontend → http://localhost:5173

Deploy with Docker

Prerequisites

Docker 24+ and Docker Compose v2
OpenRouter API key (or compatible LLM provider)
Alibaba Cloud DashScope API key (for video ASR transcription)

Setup

# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names

# 2. Build and start
docker compose up -d --build

# 3. Check health
curl http://localhost:8000/health

The app is served at http://localhost:8000 — both the API and the frontend UI.
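
For scripted deployments, the health check in step 3 can also be done from Python. This is a minimal sketch using only the standard library; the function name `check_health` and the 5-second timeout are illustrative assumptions, and it treats any HTTP 200 from `/health` as healthy without inspecting the response body.

```python
import urllib.error
import urllib.request


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200.

    Hypothetical helper: only the status code is checked, because the
    shape of the response body is not documented here.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


if __name__ == "__main__":
    print("healthy" if check_health() else "unreachable")
```

Calling it in a retry loop after `docker compose up -d --build` gives the container time to start before the check fails.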

Volumes

Volume	Purpose
`chroma_data`	ChromaDB vector store (persistent)
`chunk_data`	Extracted PDF page files
`sqlite_data`	Prompt templates and query history
`uploads_data`	Uploaded video files (persistent)

Environment Variables

All configurable via backend/.env:

Variable	Default	Description
`LLM_BASE_URL`	`https://openrouter.ai/api/v1`	LLM API endpoint
`LLM_API_KEY`	—	API key for LLM provider
`LLM_MODEL_NAME`	`qwen/qwen3.5-35b-a3b`	Chat model
`LLM_TIMEOUT`	`60.0`	LLM request timeout in seconds
`LLM_ENABLE_THINKING`	`false`	Enable LLM thinking/reasoning tokens
`VLLM_ENGINE`	`false`	Use vLLM-format `extra_body` instead of OpenRouter
`EMBEDDING_MODEL`	`qwen/qwen3-embedding-4b`	Embedding model
`EMBEDDING_BASE_URL`	`https://openrouter.ai/api/v1`	Embedding API endpoint
`EMBEDDING_API_KEY`	—	API key for embeddings (falls back to `LLM_API_KEY`)
`CHROMA_DB_PATH`	`./chroma_db`	ChromaDB persistent storage
`CHUNK_SIZE`	`1000`	Token chunk size
`CHUNK_OVERLAP`	`200`	Token chunk overlap
`RETRIEVAL_N_RESULTS`	`10`	Chunks per sub-question
`RELEVANCE_THRESHOLD`	`7.0`	Min relevance score (0-10)
`PROMPTS_DB_PATH`	`./data/prompts.db`	Prompt templates SQLite
`HISTORY_DB_PATH`	`./data/history.db`	Query history SQLite
`CORS_ORIGINS`	`["http://localhost:5173","http://localhost:3000"]`	Allowed CORS origins
`DASHSCOPE_API_KEY`	—	Alibaba Cloud DashScope API key (for video ASR)
`ASR_MODEL_NAME`	`qwen3-asr-flash`	ASR model for batch transcription
`ASR_REALTIME_MODEL_NAME`	`qwen3-asr-flash-realtime`	ASR model for real-time streaming
`VIDEO_UPLOAD_DIR`	`./uploads`	Video file storage directory
`MAX_VIDEO_SIZE_MB`	`300`	Maximum video upload size
`SUPPORTED_VIDEO_FORMATS`	`.mp4, .webm, .mov, .avi, .mkv`	Allowed video file extensions
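
To illustrate how a few of these variables relate, here is a hedged sketch of a settings loader. It is not the project's actual configuration code: the `Settings` class and `load_settings` function are hypothetical, only a subset of the variables is shown, and the defaults are copied from the table above. It shows the `EMBEDDING_API_KEY` fallback to `LLM_API_KEY` and the JSON-list format of `CORS_ORIGINS`.

```python
import json
import os
from dataclasses import dataclass
from typing import List, Mapping

DEFAULT_CORS = '["http://localhost:5173","http://localhost:3000"]'


@dataclass(frozen=True)
class Settings:
    """Hypothetical subset of the variables in the table above."""
    llm_base_url: str
    llm_api_key: str
    embedding_api_key: str
    relevance_threshold: float
    cors_origins: List[str]
    max_video_size_mb: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    llm_key = env.get("LLM_API_KEY", "")
    return Settings(
        llm_base_url=env.get("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        llm_api_key=llm_key,
        # An unset or empty EMBEDDING_API_KEY falls back to LLM_API_KEY.
        embedding_api_key=env.get("EMBEDDING_API_KEY") or llm_key,
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "7.0")),
        # CORS_ORIGINS is written as a JSON array of origin strings.
        cors_origins=json.loads(env.get("CORS_ORIGINS", DEFAULT_CORS)),
        max_video_size_mb=int(env.get("MAX_VIDEO_SIZE_MB", "300")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.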

Production: Nginx Reverse Proxy

# Include nginx.conf in your site config
# Key settings:
# - client_max_body_size 350M   (allows PDF and video uploads up to 300MB)
# - proxy_read_timeout 300s     (LLM calls can take minutes)

# Install nginx
sudo apt install nginx

# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Stopping

docker compose down

Updating

git pull
docker compose up -d --build

Cross-Platform Build (aarch64 → amd64)

When building on an aarch64/ARM64 machine (Apple Silicon, Windows on ARM with WSL2, Raspberry Pi) for deployment to an x86_64/amd64 server:

1. Install buildx

# Download the latest buildx release for arm64 into the Docker CLI plugin directory
mkdir -p ~/.docker/cli-plugins
BUILDX_VERSION=$(wget -qO- https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | head -1 | cut -d'"' -f4)
wget "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-arm64" -O ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx

2. Register QEMU for amd64 emulation

docker run --privileged --rm tonistiigi/binfmt --install all

3. Build for amd64

DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t legco_reranker:amd64 .

4. Export and transfer to server

# Save image to tar file
docker save legco_reranker:amd64 -o legco_reranker_amd64.tar

# Compress (~762MB → ~250MB)
gzip legco_reranker_amd64.tar

# Transfer to server
scp legco_reranker_amd64.tar.gz user@server:/path/

# On the x86_64 server:
gunzip legco_reranker_amd64.tar.gz
docker load -i legco_reranker_amd64.tar

# Run
docker run -d --name legco -p 80:80 -p 443:443 --env-file backend/.env \
  -v chroma_data:/app/chroma_db \
  -v chunk_data:/app/document_chunk \
  -v sqlite_data:/app/data \
  legco_reranker:amd64

5. Test run (local, port 8888)

Do this before step 4: test the amd64 image locally before transferring it to the server. Pass all configuration inline with -e flags (no --env-file):

docker run -d --name legco_test -p 8888:443 \
  -e LLM_BASE_URL=https://openrouter.ai/api/v1 \
  -e LLM_API_KEY=your_key_here \
  -e LLM_MODEL_NAME=qwen/qwen3.6-35b-a3b \
  -e LLM_TIMEOUT=60.0 \
  -e LLM_ENABLE_THINKING=false \
  -e VLLM_ENGINE=false \
  -e EMBEDDING_MODEL=qwen/qwen3-embedding-4b \
  -e EMBEDDING_BASE_URL=https://openrouter.ai/api/v1 \
  -e EMBEDDING_API_KEY=your_key_here \
  -e CHROMA_DB_PATH=./chroma_db \
  -e CHUNK_SIZE=1000 \
  -e CHUNK_OVERLAP=200 \
  -e RETRIEVAL_N_RESULTS=10 \
  -e RELEVANCE_THRESHOLD=7.0 \
  -e PROMPTS_DB_PATH=./data/prompts.db \
  -e HISTORY_DB_PATH=./data/history.db \
  -e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
  -e DASHSCOPE_API_KEY=your_dashscope_key \
  -e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
  -e VIDEO_UPLOAD_DIR=./uploads \
  -e MAX_VIDEO_SIZE_MB=300 \
  -v ~/woody/legco/data/chroma_db:/app/chroma_db \
  -v ~/woody/legco/data/document_chunk:/app/document_chunk \
  -v ~/woody/legco/data/data:/app/data \
  legco_reranker:amd64

# Verify (accept self-signed cert with -k)
curl -k https://localhost:8888/health

# Clean up
docker rm -f legco_test

Architecture

User → Nginx (80) → Uvicorn (8000)
                         ├── FastAPI API (/api/v1/*)
                         └── Static Frontend (/*)
                              └── React 18 + Vite + Tailwind

RAG Pipeline (Per-Sub-Question)

User Question
  → [LLM] Decompose into 2-5 sub-questions
  → [ChromaDB] Retrieve 10 chunks per sub-question
  → [LLM] Score all chunks against their own sub-question (single call)
  → [LLM] Generate markdown response per sub-question
  → SSE stream with per-sub-question sources
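
The scoring step in the pipeline above can be pictured as a per-sub-question filter. The sketch below is an illustration, not the project's code: `ScoredChunk` and `select_sources` are hypothetical names, the scores are assumed to arrive already computed by the LLM, and treating `RELEVANCE_THRESHOLD` as an inclusive minimum (score >= 7.0) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class ScoredChunk(NamedTuple):
    sub_question: str   # the sub-question this chunk was retrieved for
    chunk_id: str
    score: float        # LLM relevance score on a 0-10 scale


def select_sources(
    scored: Iterable[ScoredChunk], threshold: float = 7.0
) -> Dict[str, List[ScoredChunk]]:
    """Keep chunks at or above the threshold, grouped by sub-question.

    Each chunk is only compared against its own sub-question, and the
    kept chunks are ordered from most to least relevant.
    """
    kept: Dict[str, List[ScoredChunk]] = defaultdict(list)
    for chunk in scored:
        if chunk.score >= threshold:  # assumption: threshold is inclusive
            kept[chunk.sub_question].append(chunk)
    return {
        question: sorted(chunks, key=lambda c: c.score, reverse=True)
        for question, chunks in kept.items()
    }
```

Under this sketch, a sub-question whose chunks all score below the threshold simply has no entry in the result, so the generation step can tell that no supporting source was found for it.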

Video Q&A (Phase 2)

Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline

Streaming Mode (real-time):

Upload video → press play → transcript flows into QueryInput in real time
Audio captured from video element (no microphone needed)
Auto-starts on play, stops on pause/end

Full Transcript Mode (batch):

Click "Full Transcript" button under video player
Server extracts audio via ffmpeg → Full DashScope transcription
Complete transcript fills QueryInput
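
The server-side audio extraction step might look roughly like the following sketch. The function names are hypothetical, and the output format (16 kHz mono 16-bit PCM WAV) is an assumption chosen because it is a common ASR input; the format the backend actually sends to DashScope is not specified here.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_cmd(video: Path, audio: Path, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that extracts a mono WAV track from a video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video),         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample (16 kHz is an assumption)
        "-c:a", "pcm_s16le",      # 16-bit PCM audio
        str(audio),
    ]


def extract_audio(video: Path, audio: Path) -> None:
    """Run ffmpeg; raises CalledProcessError if extraction fails."""
    subprocess.run(build_ffmpeg_cmd(video, audio), check=True, capture_output=True)
```

Keeping the command construction separate from the `subprocess.run` call lets the arguments be checked on machines where ffmpeg is not installed.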

Requirements:

DASHSCOPE_API_KEY in .env
ffmpeg on server (for batch transcription)
dashscope Python package (in requirements.txt)

System Audio Capture & Listen Mic (Phase 4)

Two additional live audio sources alongside video upload:

System Audio Capture

Captures audio output from any application on your computer (browser tab, Spotify, Zoom) via getDisplayMedia().

How to use:

Select the "System Audio" tab in the LTTPage source selector
Click "Start Capture"
Choose a browser tab or window in the permission dialog — make sure "Share audio" is checked
Real-time Cantonese ASR transcription flows into the QueryInput
Edit the transcript while capturing continues, then submit your query

Use cases: Transcribing YouTube videos, podcasts, lectures, or meetings playing on your computer without downloading files.

Listen Mic

Captures microphone input via getUserMedia().

How to use:

Select the "Listen Mic" tab
Click "Start Listening"
Allow microphone access when prompted
Speak — real-time transcription flows into QueryInput
Edit transcript while listening, then submit your query

Use cases: Recording live meetings, dictating questions verbally, transcribing spoken Cantonese in real time.

Browser Compatibility

System Audio (getDisplayMedia):

Platform / Browser	Tab Audio	System Audio	Support Level
Chrome/Edge (Windows)	✅	✅	Full support
Chrome/Edge (macOS 14.2+)	✅	✅	Full support
Chrome/Edge (Linux)	✅	❌	Tab audio only
Firefox	❌	❌	Not supported
Safari	❌	❌	Not supported

Listen Mic (getUserMedia): Supported in all modern browsers (Chrome, Firefox, Safari, Edge).

Limitations

System Audio capture requires Chrome or Edge (Chromium-based browsers)
No "Full Transcript" button — streaming ASR only (no batch transcription for live sources)
getDisplayMedia() always shows a screen/tab picker even for audio-only capture (browser limitation)
Each capture session generates a new UUID; the WebSocket reconnects on every Start/Stop

Configuration

# In backend/.env — feature toggles (default: true)
SYSTEM_AUDIO_ENABLED=true
MIC_ENABLED=true
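
As an illustration of how boolean toggles like these could be read on the backend, here is a small sketch. The helper `env_flag` and the accepted spellings (`1/true/yes/on`, `0/false/no/off`) are assumptions rather than the project's actual parsing rules; the default of `true` for an unset variable matches the comment above.

```python
import os
from typing import Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_flag(name: str, default: bool = True, env: Mapping[str, str] = os.environ) -> bool:
    """Read a boolean feature toggle, falling back to `default` when unset."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Fail loudly on typos rather than silently enabling or disabling a feature.
    raise ValueError(f"{name} must be a boolean value, got {raw!r}")


if __name__ == "__main__":
    print("system audio:", env_flag("SYSTEM_AUDIO_ENABLED"))
    print("listen mic:", env_flag("MIC_ENABLED"))
```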

Installing ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Static build (no root, any Linux)
mkdir -p ~/.local/bin
wget -qO- https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz | tar -xJ -C /tmp
cp /tmp/ffmpeg-*-static/ffmpeg ~/.local/bin/

Notes

PDF upload limit: 300MB
Video upload limit: 300MB (same as PDF)
ffmpeg required on server (for video transcription)
DashScope ASR supports Cantonese (yue), Mandarin (zh), English (en), and auto-detect
Desktop only (not mobile-optimized)
No authentication (public demo)
All LLM calls routed through configurable base URL
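
The video upload limit and allowed extensions noted above could be enforced with a check along these lines. This is a sketch under stated assumptions: `validate_video` is a hypothetical helper, the constants mirror the documented defaults of `MAX_VIDEO_SIZE_MB` and `SUPPORTED_VIDEO_FORMATS`, and counting 1 MB as 1024 * 1024 bytes is an assumption.

```python
from pathlib import PurePath

# Mirrors the documented defaults; the real values come from backend/.env.
SUPPORTED_VIDEO_FORMATS = {".mp4", ".webm", ".mov", ".avi", ".mkv"}
MAX_VIDEO_SIZE_MB = 300


def validate_video(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the upload has a bad extension or is too large."""
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_VIDEO_FORMATS:
        raise ValueError(f"unsupported video format: {ext or '(none)'}")
    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    if size_bytes > MAX_VIDEO_SIZE_MB * 1024 * 1024:
        raise ValueError(f"video exceeds {MAX_VIDEO_SIZE_MB}MB limit")
```

When a reverse proxy sits in front of the app, its body-size limit also needs to stay above this value, otherwise the proxy rejects the upload before the backend check runs.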