Woody 7c03137577 fix: mic transcript disappearing after stop
useMediaStreamASR cleanup() cleared partialTranscript on stop,
causing live ASR text to vanish from QueryInput. Unlike video
ASR (which has onFinalTranscript to persist via queryText),
mic and system-audio hooks rely on partialTranscript for
display. Keep partialTranscript populated with the final
transcript instead of clearing it.
2026-05-14 23:19:11 +08:00
.examples feat: Phase 2.1 config + infrastructure and 2.2 video upload backend 2026-05-06 13:08:19 +08:00
.plans feat: Phase 4 — System Audio & Listen Mic capture into ASR → RAG 2026-05-14 22:55:06 +08:00
backend feat: Phase 4 — System Audio & Listen Mic capture into ASR → RAG 2026-05-14 22:55:06 +08:00
frontend fix: mic transcript disappearing after stop 2026-05-14 23:19:11 +08:00
.env.txt init: project setup with AGENTS.md, test structure, and plan directory 2026-04-22 15:22:29 +08:00
.gitignore chore: gitignore .research, switch to flash, tighten sub-questions 2026-05-04 16:38:58 +08:00
AGENTS.md docs: use pnpm instead of npm in dev commands 2026-05-14 20:22:33 +08:00
Dockerfile fix: add ffmpeg, uploads volume to Docker deployment for Phase 2 2026-05-07 11:32:09 +08:00
README.md docs: use pnpm instead of npm in dev commands 2026-05-14 20:22:33 +08:00
development_plan.md chore: add pnpm lockfiles, Phase 4 plan, and dev plan status update 2026-05-14 20:26:17 +08:00
docker-compose.yml fix: add ffmpeg, uploads volume to Docker deployment for Phase 2 2026-05-07 11:32:09 +08:00
nginx.conf feat(deploy): add Dockerfile, compose, nginx config, and README 2026-04-27 17:17:53 +08:00
package-lock.json chore: add pnpm lockfiles, Phase 4 plan, and dev plan status update 2026-05-14 20:26:17 +08:00
package.json chore: add pnpm lockfiles, Phase 4 plan, and dev plan status update 2026-05-14 20:26:17 +08:00

README.md

LegCo Reranker

RAG-powered document Q&A app with video ASR. Upload PDFs or videos (transcribed with Cantonese ASR), ask questions, and get bullet-point answers with citations.

Quick Start (Dev)

# Backend
cd backend
cp .env.example .env    # edit .env with your LLM API key AND DashScope API key (for video ASR)
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend
cd frontend
pnpm install
pnpm run dev

Backend → http://localhost:8000 | Frontend → http://localhost:5173

Deploy with Docker

Prerequisites

  • Docker 24+ and Docker Compose v2
  • OpenRouter API key (or compatible LLM provider)
  • Alibaba Cloud DashScope API key (for video ASR transcription)

Setup

# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names

# 2. Build and start
docker compose up -d --build

# 3. Check health
curl http://localhost:8000/health

The app is served at http://localhost:8000 — both the API and the frontend UI.
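The health check in step 3 can also be scripted, for example to wait until the container is ready after `docker compose up`. The following is a minimal sketch, assuming `/health` answers with HTTP 200 once the app is up. The helper name `wait_for_health`, the retry count, and the delay are illustrative and not part of the project.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:8000/health", attempts=10, delay=2.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or the attempts run out.

    `opener` and `sleep` are injectable so the helper can be exercised
    without a running server. Returns True on success, False otherwise.
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting connections yet
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next attempt
    return False

# Usage (hypothetical): wait_for_health() returns True once the app responds.
```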

Volumes

Volume        Purpose
chroma_data   ChromaDB vector store (persistent)
chunk_data    Extracted PDF page files
sqlite_data   Prompt templates and query history
uploads_data  Uploaded video files (persistent)

Environment Variables

All configurable via backend/.env:

Variable                 Default                                            Description
LLM_BASE_URL             https://openrouter.ai/api/v1                       LLM API endpoint
LLM_API_KEY              (none)                                             API key for LLM provider
LLM_MODEL_NAME           qwen/qwen3.5-35b-a3b                               Chat model
LLM_TIMEOUT              60.0                                               LLM request timeout in seconds
LLM_ENABLE_THINKING      false                                              Enable LLM thinking/reasoning tokens
VLLM_ENGINE              false                                              Use vLLM-format extra_body instead of OpenRouter
EMBEDDING_MODEL          qwen/qwen3-embedding-4b                            Embedding model
EMBEDDING_BASE_URL       https://openrouter.ai/api/v1                       Embedding API endpoint
EMBEDDING_API_KEY        (none)                                             API key for embeddings (falls back to LLM_API_KEY)
CHROMA_DB_PATH           ./chroma_db                                        ChromaDB persistent storage
CHUNK_SIZE               1000                                               Token chunk size
CHUNK_OVERLAP            200                                                Token chunk overlap
RETRIEVAL_N_RESULTS      10                                                 Chunks per sub-question
RELEVANCE_THRESHOLD      7.0                                                Min relevance score (0-10)
PROMPTS_DB_PATH          ./data/prompts.db                                  Prompt templates SQLite
HISTORY_DB_PATH          ./data/history.db                                  Query history SQLite
CORS_ORIGINS             ["http://localhost:5173","http://localhost:3000"]  Allowed CORS origins
DASHSCOPE_API_KEY        (none)                                             Alibaba Cloud DashScope API key (for video ASR)
ASR_MODEL_NAME           qwen3-asr-flash                                    ASR model for batch transcription
ASR_REALTIME_MODEL_NAME  qwen3-asr-flash-realtime                           ASR model for real-time streaming
VIDEO_UPLOAD_DIR         ./uploads                                          Video file storage directory
MAX_VIDEO_SIZE_MB        300                                                Maximum video upload size
SUPPORTED_VIDEO_FORMATS  .mp4, .webm, .mov, .avi, .mkv                      Allowed video file extensions
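
To illustrate how the defaults and the `EMBEDDING_API_KEY` fallback in the table above might be resolved, here is a minimal sketch using only the standard library. The function `load_settings` and the lowercase keys are hypothetical, and the real backend may load its configuration differently (for example through a settings library).

```python
import json
import os


def load_settings(env=None):
    """Resolve a subset of the README settings from an environment mapping."""
    env = os.environ if env is None else env
    llm_api_key = env.get("LLM_API_KEY", "")
    return {
        "llm_base_url": env.get("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        "llm_api_key": llm_api_key,
        "llm_timeout": float(env.get("LLM_TIMEOUT", "60.0")),
        "llm_enable_thinking": env.get("LLM_ENABLE_THINKING", "false").lower() == "true",
        # Falls back to LLM_API_KEY when unset or empty, as the table describes.
        "embedding_api_key": env.get("EMBEDDING_API_KEY") or llm_api_key,
        "chunk_size": int(env.get("CHUNK_SIZE", "1000")),
        "chunk_overlap": int(env.get("CHUNK_OVERLAP", "200")),
        "retrieval_n_results": int(env.get("RETRIEVAL_N_RESULTS", "10")),
        "relevance_threshold": float(env.get("RELEVANCE_THRESHOLD", "7.0")),
        # CORS_ORIGINS is written as a JSON array string.
        "cors_origins": json.loads(
            env.get("CORS_ORIGINS", '["http://localhost:5173","http://localhost:3000"]')
        ),
        "max_video_size_mb": int(env.get("MAX_VIDEO_SIZE_MB", "300")),
    }
```

Passing a plain dict instead of `os.environ` makes the defaults easy to inspect without touching the real environment.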

Production: Nginx Reverse Proxy

# Include nginx.conf in your site config
# Key settings:
# - client_max_body_size 350M   (allow large PDF and video uploads)
# - proxy_read_timeout 300s     (LLM calls can take minutes)
# Install nginx
sudo apt install nginx

# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx

Stopping

docker compose down

Updating

git pull
docker compose up -d --build

Cross-Platform Build (aarch64 → amd64)

When building on an aarch64/ARM64 machine (Apple Silicon, ARM Windows WSL2, Raspberry Pi) for deployment to an x86_64/amd64 server:

1. Install buildx

# Download buildx for arm64 (create the CLI plugin directory first)
mkdir -p ~/.docker/cli-plugins
BUILDX_VERSION=$(wget -qO- https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | head -1 | cut -d'"' -f4)
wget "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-arm64" -O ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx

2. Register QEMU for amd64 emulation

docker run --privileged --rm tonistiigi/binfmt --install all

3. Build for amd64

DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t legco_reranker:amd64 .

4. Export and transfer to server

# Save image to tar file
docker save legco_reranker:amd64 -o legco_reranker_amd64.tar

# Compress (~762MB → ~250MB)
gzip legco_reranker_amd64.tar

# Transfer to server
scp legco_reranker_amd64.tar.gz user@server:/path/

# On the x86_64 server:
gunzip legco_reranker_amd64.tar.gz
docker load -i legco_reranker_amd64.tar

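# Run (the uploads_data mount assumes the default VIDEO_UPLOAD_DIR=./uploads, i.e. /app/uploads in the container)
docker run -d --name legco -p 80:8000 --env-file backend/.env \
  -v chroma_data:/app/chroma_db \
  -v chunk_data:/app/document_chunk \
  -v sqlite_data:/app/data \
  -v uploads_data:/app/uploads \
  legco_reranker:amd64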

5. Test run (local, port 8888)

Before transferring to the server, test the amd64 image locally. Pass all config inline (no --env-file):

docker run -d --name legco_test -p 8888:8000 \
  -e LLM_BASE_URL=https://openrouter.ai/api/v1 \
  -e LLM_API_KEY=your_key_here \
  -e LLM_MODEL_NAME=qwen/qwen3.6-35b-a3b \
  -e LLM_TIMEOUT=60.0 \
  -e LLM_ENABLE_THINKING=false \
  -e VLLM_ENGINE=false \
  -e EMBEDDING_MODEL=qwen/qwen3-embedding-4b \
  -e EMBEDDING_BASE_URL=https://openrouter.ai/api/v1 \
  -e EMBEDDING_API_KEY=your_key_here \
  -e CHROMA_DB_PATH=./chroma_db \
  -e CHUNK_SIZE=1000 \
  -e CHUNK_OVERLAP=200 \
  -e RETRIEVAL_N_RESULTS=10 \
  -e RELEVANCE_THRESHOLD=7.0 \
  -e PROMPTS_DB_PATH=./data/prompts.db \
  -e HISTORY_DB_PATH=./data/history.db \
  -e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
  -e DASHSCOPE_API_KEY=your_dashscope_key \
  -e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
  -e VIDEO_UPLOAD_DIR=./uploads \
  -e MAX_VIDEO_SIZE_MB=300 \
  -v ~/woody/legco/data/chroma_db:/app/chroma_db \
  -v ~/woody/legco/data/document_chunk:/app/document_chunk \
  -v ~/woody/legco/data/data:/app/data \
  legco_reranker:amd64

# Verify
curl http://localhost:8888/health

# Clean up
docker rm -f legco_test

Architecture

User → Nginx (80) → Uvicorn (8000)
                         ├── FastAPI API (/api/v1/*)
                         └── Static Frontend (/*)
                              └── React 18 + Vite + Tailwind

RAG Pipeline (Per-Sub-Question)

User Question
  → [LLM] Decompose into 2-5 sub-questions
  → [ChromaDB] Retrieve 10 chunks per sub-question
  → [LLM] Score all chunks against their own sub-question (single call)
  → [LLM] Generate markdown response per sub-question
  → SSE stream with per-sub-question sources
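
The flow above can be sketched as plain Python with the LLM and ChromaDB stages passed in as callables. This is an illustration under stated assumptions, not the project's implementation: `run_pipeline` and its parameter names are hypothetical, and keeping chunks whose score is at or above `RELEVANCE_THRESHOLD` is an assumed reading of "Min relevance score".

```python
from typing import Callable, Dict, List


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str, int], List[str]],
    score: Callable[[Dict[str, List[str]]], Dict[str, List[float]]],
    generate: Callable[[str, List[str]], str],
    n_results: int = 10,
    threshold: float = 7.0,
) -> List[dict]:
    """Run the per-sub-question flow with caller-supplied stages."""
    sub_questions = decompose(question)
    # Retrieve candidate chunks for each sub-question independently.
    candidates = {sq: retrieve(sq, n_results) for sq in sub_questions}
    # One scoring call covers every chunk, each judged against its own sub-question.
    scores = score(candidates)
    results = []
    for sq in sub_questions:
        # Assumed filter: keep chunks scoring at or above the threshold.
        kept = [c for c, s in zip(candidates[sq], scores[sq]) if s >= threshold]
        results.append({"sub_question": sq, "sources": kept, "answer": generate(sq, kept)})
    return results
```

Each returned item pairs one sub-question with its own sources and answer, which matches the per-sub-question shape of the stream described above.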

Video Q&A (Phase 2)

Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline

Streaming Mode (real-time):

  • Upload video → press play → transcript flows into QueryInput in real time
  • Audio captured from video element (no microphone needed)
  • Auto-starts on play, stops on pause/end

Full Transcript Mode (batch):

  • Click "Full Transcript" button under video player
  • Server extracts audio via ffmpeg, then sends it to DashScope for full transcription
  • Complete transcript fills QueryInput

Requirements:

  • DASHSCOPE_API_KEY in .env
  • ffmpeg on server (for batch transcription)
  • dashscope Python package (in requirements.txt)
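
For the batch mode's audio extraction step, a minimal sketch of how the server might call ffmpeg is shown here. The function names, the WAV output, and the 16 kHz mono settings are assumptions chosen for illustration. The README does not state which ffmpeg options the backend actually uses.

```python
import shutil
import subprocess
from pathlib import Path


def build_extract_command(video_path, audio_path, sample_rate=16000):
    """Return an ffmpeg argument list that writes mono WAV audio from a video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample (16 kHz is an assumed rate)
        "-f", "wav",              # force WAV output
        str(audio_path),
    ]


def extract_audio(video_path, audio_path):
    """Run ffmpeg; raises if ffmpeg is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_extract_command(video_path, audio_path),
                   check=True, capture_output=True)
    return Path(audio_path)
```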

Installing ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Static build (no root, any Linux)
mkdir -p ~/.local/bin
wget -qO- https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz | tar -xJ -C /tmp
cp /tmp/ffmpeg-*-static/ffmpeg ~/.local/bin/

Notes

  • PDF upload limit: 300MB
  • Video upload limit: 300MB (same as PDF)
  • ffmpeg required on server (for video transcription)
  • DashScope ASR supports Cantonese (yue), Mandarin (zh), English (en), and language auto-detection
  • Desktop only (not mobile-optimized)
  • No authentication (public demo)
  • All LLM calls routed through configurable base URL
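
The video upload limit and allowed extensions noted above could be enforced with a check along these lines. This is a hedged sketch: `validate_video_upload` is a hypothetical helper, and treating 1 MB as 1024 × 1024 bytes is an assumption about how `MAX_VIDEO_SIZE_MB` is interpreted.

```python
from pathlib import Path

# Defaults taken from the Environment Variables table.
SUPPORTED_VIDEO_FORMATS = {".mp4", ".webm", ".mov", ".avi", ".mkv"}
MAX_VIDEO_SIZE_MB = 300


def validate_video_upload(filename, size_bytes,
                          max_mb=MAX_VIDEO_SIZE_MB,
                          formats=SUPPORTED_VIDEO_FORMATS):
    """Return (ok, reason) for an upload described by name and size."""
    ext = Path(filename).suffix.lower()  # extension check is case-insensitive
    if ext not in formats:
        return False, f"unsupported format: {ext or '(none)'}"
    if size_bytes > max_mb * 1024 * 1024:  # assumed: 1 MB = 1024 * 1024 bytes
        return False, f"file exceeds {max_mb} MB"
    return True, "ok"
```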