
Woody 7c03137577 fix: mic transcript disappearing after stop useMediaStreamASR cleanup() cleared partialTranscript on stop, causing live ASR text to vanish from QueryInput. Unlike video ASR (which has onFinalTranscript to persist via queryText), mic and system-audio hooks rely on partialTranscript for display. Keep partialTranscript populated with the final transcript instead of clearing it.		2026-05-14 23:19:11 +08:00
.examples	feat: Phase 2.1 config + infrastructure and 2.2 video upload backend	2026-05-06 13:08:19 +08:00
.plans	feat: Phase 4 — System Audio & Listen Mic capture into ASR → RAG	2026-05-14 22:55:06 +08:00
backend	feat: Phase 4 — System Audio & Listen Mic capture into ASR → RAG	2026-05-14 22:55:06 +08:00
frontend	fix: mic transcript disappearing after stop	2026-05-14 23:19:11 +08:00
.env.txt	init: project setup with AGENTS.md, test structure, and plan directory	2026-04-22 15:22:29 +08:00
.gitignore	chore: gitignore .research, switch to flash, tighten sub-questions	2026-05-04 16:38:58 +08:00
AGENTS.md	docs: use pnpm instead of npm in dev commands	2026-05-14 20:22:33 +08:00
Dockerfile	fix: add ffmpeg, uploads volume to Docker deployment for Phase 2	2026-05-07 11:32:09 +08:00
README.md	docs: use pnpm instead of npm in dev commands	2026-05-14 20:22:33 +08:00
development_plan.md	chore: add pnpm lockfiles, Phase 4 plan, and dev plan status update	2026-05-14 20:26:17 +08:00
docker-compose.yml	fix: add ffmpeg, uploads volume to Docker deployment for Phase 2	2026-05-07 11:32:09 +08:00
nginx.conf	feat(deploy): add Dockerfile, compose, nginx config, and README	2026-04-27 17:17:53 +08:00
package-lock.json	chore: add pnpm lockfiles, Phase 4 plan, and dev plan status update	2026-05-14 20:26:17 +08:00
package.json	chore: add pnpm lockfiles, Phase 4 plan, and dev plan status update	2026-05-14 20:26:17 +08:00

README.md

LegCo Reranker

RAG-powered document Q&A app with video ASR. Upload PDFs, upload videos with Cantonese ASR transcription, ask questions, get bullet-point answers with citations.

Quick Start (Dev)

# Backend
cd backend
cp .env.example .env    # edit .env: set your LLM API key and your DashScope API key (needed for video ASR)
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend (in a second terminal, from the repo root)
cd frontend
pnpm install
pnpm run dev

Backend → http://localhost:8000 | Frontend → http://localhost:5173
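Before starting the backend, it can help to confirm that `.env` actually sets the keys the Quick Start asks for. The sketch below is a hypothetical helper (`parse_env` and `missing_keys` are not part of this repo). It assumes simple `KEY=VALUE` lines and treats `LLM_API_KEY` and `DASHSCOPE_API_KEY` as the required set, based on the prerequisites listed in this README.

```python
# Assumption: the keys this README says you must fill in before running.
REQUIRED_KEYS = ("LLM_API_KEY", "DASHSCOPE_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(text: str, required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or left empty."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

Usage: `missing_keys(Path("backend/.env").read_text())` (with `from pathlib import Path`) returns an empty list when both keys are set. Inline `# comments` after a value are not handled by this sketch.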

Deploy with Docker

Prerequisites

Docker 24+ and Docker Compose v2
OpenRouter API key (or compatible LLM provider)
Alibaba Cloud DashScope API key (for video ASR transcription)

Setup

# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names

# 2. Build and start
docker compose up -d --build

# 3. Check health
curl http://localhost:8000/health

The app is served at http://localhost:8000 — both the API and the frontend UI.
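After `docker compose up -d --build`, the container may need a few seconds before `/health` answers, so a single `curl` can fail spuriously. A minimal stdlib polling sketch, assuming only that `/health` returns HTTP 200 once the app is ready (the README shows the endpoint but not its response body); `wait_for_health` is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str = "http://localhost:8000/health",
                    timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: connection refused, timeout, or non-2xx status
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The monotonic clock keeps the deadline correct even if the system time changes during startup.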

Volumes

Volume	Purpose
`chroma_data`	ChromaDB vector store (persistent)
`chunk_data`	Extracted PDF page files
`sqlite_data`	Prompt templates and query history
`uploads_data`	Uploaded video files (persistent)

Environment Variables

All configurable via backend/.env:

Variable	Default	Description
`LLM_BASE_URL`	`https://openrouter.ai/api/v1`	LLM API endpoint
`LLM_API_KEY`	—	API key for LLM provider
`LLM_MODEL_NAME`	`qwen/qwen3.5-35b-a3b`	Chat model
`LLM_TIMEOUT`	`60.0`	LLM request timeout in seconds
`LLM_ENABLE_THINKING`	`false`	Enable LLM thinking/reasoning tokens
`VLLM_ENGINE`	`false`	Use vLLM-format `extra_body` instead of OpenRouter
`EMBEDDING_MODEL`	`qwen/qwen3-embedding-4b`	Embedding model
`EMBEDDING_BASE_URL`	`https://openrouter.ai/api/v1`	Embedding API endpoint
`EMBEDDING_API_KEY`	—	API key for embeddings (falls back to `LLM_API_KEY`)
`CHROMA_DB_PATH`	`./chroma_db`	ChromaDB persistent storage
`CHUNK_SIZE`	`1000`	Token chunk size
`CHUNK_OVERLAP`	`200`	Token chunk overlap
`RETRIEVAL_N_RESULTS`	`10`	Chunks per sub-question
`RELEVANCE_THRESHOLD`	`7.0`	Min relevance score (0-10)
`PROMPTS_DB_PATH`	`./data/prompts.db`	Prompt templates SQLite
`HISTORY_DB_PATH`	`./data/history.db`	Query history SQLite
`CORS_ORIGINS`	`["http://localhost:5173","http://localhost:3000"]`	Allowed CORS origins
`DASHSCOPE_API_KEY`	—	Alibaba Cloud DashScope API key (for video ASR)
`ASR_MODEL_NAME`	`qwen3-asr-flash`	ASR model for batch transcription
`ASR_REALTIME_MODEL_NAME`	`qwen3-asr-flash-realtime`	ASR model for real-time streaming
`VIDEO_UPLOAD_DIR`	`./uploads`	Video file storage directory
`MAX_VIDEO_SIZE_MB`	`300`	Maximum video upload size (MB)
`SUPPORTED_VIDEO_FORMATS`	`.mp4, .webm, .mov, .avi, .mkv`	Allowed video file extensions
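The table can be read as a set of typed defaults. The sketch below is a hypothetical `Settings` loader, not the backend's actual config class, and covers only a subset of the variables. It shows the documented `EMBEDDING_API_KEY` → `LLM_API_KEY` fallback, the JSON-array form of `CORS_ORIGINS`, and `SUPPORTED_VIDEO_FORMATS` parsed as the comma-separated list shown above (an assumption about its on-disk format).

```python
import json
from dataclasses import dataclass


@dataclass
class Settings:
    llm_base_url: str
    llm_api_key: str
    embedding_api_key: str
    llm_timeout: float
    chunk_size: int
    chunk_overlap: int
    relevance_threshold: float
    cors_origins: list[str]
    max_video_size_mb: int
    supported_video_formats: tuple[str, ...]


def load_settings(env: dict[str, str]) -> Settings:
    """Build settings from an environment mapping, using the table's defaults."""
    llm_key = env.get("LLM_API_KEY", "")
    formats = env.get("SUPPORTED_VIDEO_FORMATS", ".mp4, .webm, .mov, .avi, .mkv")
    return Settings(
        llm_base_url=env.get("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        llm_api_key=llm_key,
        # Documented fallback: reuse LLM_API_KEY when EMBEDDING_API_KEY is unset or empty.
        embedding_api_key=env.get("EMBEDDING_API_KEY") or llm_key,
        llm_timeout=float(env.get("LLM_TIMEOUT", "60.0")),
        chunk_size=int(env.get("CHUNK_SIZE", "1000")),
        chunk_overlap=int(env.get("CHUNK_OVERLAP", "200")),
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "7.0")),
        # CORS_ORIGINS is written as a JSON array in this README.
        cors_origins=json.loads(env.get(
            "CORS_ORIGINS", '["http://localhost:5173","http://localhost:3000"]')),
        max_video_size_mb=int(env.get("MAX_VIDEO_SIZE_MB", "300")),
        supported_video_formats=tuple(
            ext.strip().lower() for ext in formats.split(",") if ext.strip()),
    )
```

Usage: `load_settings(dict(os.environ))`. Parsing everything in one place means a malformed value (for example a non-JSON `CORS_ORIGINS`) fails at startup rather than on the first request.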

Production: Nginx Reverse Proxy

# Include nginx.conf in your site config
# Key settings:
# - client_max_body_size 350M   (allow 300MB PDF/video uploads plus request overhead)
# - proxy_read_timeout 300s     (LLM calls can take minutes)

# Install nginx
sudo apt install nginx

# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
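`client_max_body_size` has to exceed the app's 300MB upload limit, because a multipart request carries some overhead on top of the file itself. A small sketch that converts nginx size strings (nginx's `k`/`m`/`g` suffixes are 1024-based) and checks the margin. The helper names and the 10MB `overhead_mb` margin are hypothetical, and reading the app's MB limit as MiB is an assumption.

```python
import re

_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def nginx_size_to_bytes(value: str) -> int:
    """Convert an nginx size such as '350M' or '512k' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([kKmMgG]?)\s*", value)
    if match is None:
        raise ValueError(f"not an nginx size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def body_limit_ok(nginx_limit: str, max_upload_mb: int,
                  overhead_mb: int = 10) -> bool:
    """True if nginx accepts a max-size upload plus an assumed multipart margin."""
    # Assumption: the app's MB limit means MiB (1024 * 1024 bytes).
    needed = (max_upload_mb + overhead_mb) * 1024 ** 2
    return nginx_size_to_bytes(nginx_limit) >= needed
```

With the values above, `body_limit_ok("350M", 300)` holds, while a limit of exactly `300M` would reject the largest allowed uploads.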

Stopping

docker compose down

Updating

git pull
docker compose up -d --build

Cross-Platform Build (aarch64 → amd64)

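When building on an aarch64/ARM64 machine (Apple Silicon, ARM Windows WSL2, Raspberry Pi) for deployment to an x86_64/amd64 server, follow the steps below. Steps 1 and 2 are for Docker Engine on Linux arm64; Docker Desktop already bundles buildx and amd64 emulation, so Desktop users can start at step 3.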

1. Install buildx

# Download buildx for arm64 (create the plugin directory first)
mkdir -p ~/.docker/cli-plugins
BUILDX_VERSION=$(wget -qO- https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | head -1 | cut -d'"' -f4)
wget "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-arm64" -O ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx
docker buildx version   # verify the plugin is picked up

2. Register QEMU for amd64 emulation

docker run --privileged --rm tonistiigi/binfmt --install all

3. Build for amd64

DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t legco_reranker:amd64 .

4. Export and transfer to server

# Save image to tar file
docker save legco_reranker:amd64 -o legco_reranker_amd64.tar

# Compress (~762MB → ~250MB)
gzip legco_reranker_amd64.tar

# Transfer to server
scp legco_reranker_amd64.tar.gz user@server:/path/

# On the x86_64 server:
gunzip legco_reranker_amd64.tar.gz
docker load -i legco_reranker_amd64.tar

# Run (copy backend/.env to the server first)
docker run -d --name legco -p 80:8000 --env-file backend/.env \
  -v chroma_data:/app/chroma_db \
  -v chunk_data:/app/document_chunk \
  -v sqlite_data:/app/data \
  -v uploads_data:/app/uploads \
  legco_reranker:amd64

5. Test run (local, port 8888)

Before transferring to the server, test the amd64 image locally. Pass all config inline (no --env-file):

docker run -d --name legco_test -p 8888:8000 \
  -e LLM_BASE_URL=https://openrouter.ai/api/v1 \
  -e LLM_API_KEY=your_key_here \
  -e LLM_MODEL_NAME=qwen/qwen3.6-35b-a3b \
  -e LLM_TIMEOUT=60.0 \
  -e LLM_ENABLE_THINKING=false \
  -e VLLM_ENGINE=false \
  -e EMBEDDING_MODEL=qwen/qwen3-embedding-4b \
  -e EMBEDDING_BASE_URL=https://openrouter.ai/api/v1 \
  -e EMBEDDING_API_KEY=your_key_here \
  -e CHROMA_DB_PATH=./chroma_db \
  -e CHUNK_SIZE=1000 \
  -e CHUNK_OVERLAP=200 \
  -e RETRIEVAL_N_RESULTS=10 \
  -e RELEVANCE_THRESHOLD=7.0 \
  -e PROMPTS_DB_PATH=./data/prompts.db \
  -e HISTORY_DB_PATH=./data/history.db \
  -e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
  -e DASHSCOPE_API_KEY=your_dashscope_key \
  -e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
  -e VIDEO_UPLOAD_DIR=./uploads \
  -e MAX_VIDEO_SIZE_MB=300 \
  -v ~/woody/legco/data/chroma_db:/app/chroma_db \
  -v ~/woody/legco/data/document_chunk:/app/document_chunk \
  -v ~/woody/legco/data/data:/app/data \
  legco_reranker:amd64

# Verify
curl http://localhost:8888/health

# Clean up
docker rm -f legco_test
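The test run spells out every variable with `-e`. If you already have a `backend/.env`, a small hypothetical helper can generate the same flags instead of retyping them. It assumes simple `KEY=VALUE` lines, passes values through verbatim, and uses `shlex.join` so values such as the JSON `CORS_ORIGINS` are quoted safely for the shell.

```python
import shlex


def env_to_docker_args(env_text: str) -> list[str]:
    """Turn KEY=VALUE lines into ['-e', 'KEY=VALUE', ...], skipping comments."""
    args: list[str] = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        args += ["-e", f"{key.strip()}={value.strip()}"]
    return args


def docker_run_command(env_text: str, image: str = "legco_reranker:amd64",
                       host_port: int = 8888) -> str:
    """Build a shell-quoted `docker run` line for a local test container."""
    cmd = ["docker", "run", "-d", "--name", "legco_test",
           "-p", f"{host_port}:8000", *env_to_docker_args(env_text), image]
    return shlex.join(cmd)
```

The generated line omits the `-v` mounts; add them as in the example above if the test container should reuse existing data.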

Architecture

User → Nginx (80) → Uvicorn (8000)
                         ├── FastAPI API (/api/v1/*)
                         └── Static Frontend (/*)
                              └── React 18 + Vite + Tailwind
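The diagram shows one Uvicorn process serving both `/api/v1/*` and the built frontend under `/*`. A stdlib sketch of the routing decision such a setup typically needs, assuming the React build is a single-page app where unknown paths fall back to `index.html`. `resolve_route` and its return values are hypothetical, not the FastAPI code in `backend/`; `/health` is treated as an API route because the README probes it directly on port 8000.

```python
from posixpath import normpath


def resolve_route(path: str, static_files: set[str]) -> tuple[str, str]:
    """Decide whether a request path goes to the API or the static frontend.

    Returns ("api", path) for /api/v1/* and /health, otherwise ("static", file).
    """
    if path.startswith("/api/v1/") or path == "/health":
        return ("api", path)
    # Normalise so '..' segments cannot escape the build directory.
    relative = normpath("/" + path.lstrip("/")).lstrip("/")
    if relative in static_files:
        return ("static", relative)
    # SPA fallback: let the client-side router handle unknown paths.
    return ("static", "index.html")
```

Checking the API prefix first keeps API 404s from being masked by the `index.html` fallback.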

RAG Pipeline (Per-Sub-Question)

User Question
  → [LLM] Decompose into 2-5 sub-questions
  → [ChromaDB] Retrieve RETRIEVAL_N_RESULTS (default 10) chunks per sub-question
  → [LLM] Score every chunk (0-10) against its own sub-question in a single call; drop chunks below RELEVANCE_THRESHOLD
  → [LLM] Generate markdown response per sub-question
  → SSE stream with per-sub-question sources
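The per-sub-question flow can be sketched with the LLM and ChromaDB calls injected as plain functions, so the control flow is visible without a provider. Everything here is hypothetical: the function names, the `Chunk`/`SubAnswer` types, and filtering by `RELEVANCE_THRESHOLD` after scoring (inferred from the environment table). The real backend streams each result over SSE rather than yielding Python objects.

```python
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Chunk:
    id: str
    text: str


@dataclass
class SubAnswer:
    question: str
    answer: str
    sources: list[str]


def run_pipeline(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str, int], list[Chunk]],
    score: Callable[[list[tuple[str, Chunk]]], list[float]],
    generate: Callable[[str, list[Chunk]], str],
    n_results: int = 10,
    threshold: float = 7.0,
) -> Iterator[SubAnswer]:
    """Yield one SubAnswer per sub-question (a stand-in for the SSE stream)."""
    sub_questions = decompose(question)  # LLM: 2-5 sub-questions
    retrieved = {q: retrieve(q, n_results) for q in sub_questions}
    # One scoring call covering every (sub-question, chunk) pair.
    pairs = [(q, c) for q in sub_questions for c in retrieved[q]]
    scores = score(pairs)
    kept: dict[str, list[Chunk]] = {q: [] for q in sub_questions}
    for (q, chunk), s in zip(pairs, scores):
        if s >= threshold:  # assumed meaning of RELEVANCE_THRESHOLD
            kept[q].append(chunk)
    for q in sub_questions:
        answer = generate(q, kept[q])  # LLM: markdown answer for this sub-question
        yield SubAnswer(q, answer, [c.id for c in kept[q]])
```

Scoring all pairs in one call trades a longer prompt for fewer round trips, and each chunk is judged only against the sub-question that retrieved it.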

Video Q&A (Phase 2)

Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline

Streaming Mode (real-time):

Upload video → press play → transcript flows into QueryInput in real time
Audio captured from video element (no microphone needed)
Auto-starts on play, stops on pause/end
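In streaming mode the ASR returns provisional (partial) text that is later superseded by finalized sentences. The hypothetical `TranscriptBuffer` below models one way to keep QueryInput stable: finalized text accumulates, the partial tail is overwritten on each update, and stopping (pause/end) keeps the last partial instead of clearing it, which is the behavior the latest mic-transcript fix describes. The partial/final event model is generic and is not DashScope's message format; the real frontend implements this in React hooks.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical model of live ASR text shown in the query box."""
    finals: list[str] = field(default_factory=list)
    partial: str = ""

    def on_partial(self, text: str) -> None:
        # Provisional text for the sentence in progress: replaced, not appended.
        self.partial = text

    def on_final(self, text: str) -> None:
        # A finalized sentence supersedes the current partial.
        self.finals.append(text)
        self.partial = ""

    def stop(self) -> None:
        # On pause/end, keep what was last shown instead of discarding it.
        if self.partial:
            self.finals.append(self.partial)
            self.partial = ""

    @property
    def display(self) -> str:
        # No separator between segments, which suits Cantonese/Chinese text.
        return "".join(self.finals) + self.partial
```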

Full Transcript Mode (batch):

Click "Full Transcript" button under video player
Server extracts the audio track with ffmpeg → full DashScope transcription
Complete transcript fills QueryInput
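For Full Transcript mode the server extracts audio with ffmpeg before sending it to DashScope. A sketch of building and running that command; the 16 kHz mono 16-bit WAV target is an assumption (a common input format for speech recognition), not confirmed from the backend, and `extract_audio_cmd`/`extract_audio` are hypothetical helpers.

```python
import shutil
import subprocess
from pathlib import Path


def extract_audio_cmd(video: Path, out_wav: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes mono PCM WAV."""
    return [
        "ffmpeg", "-y",           # overwrite out_wav if it already exists
        "-i", str(video),
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample (16 kHz assumed)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(out_wav),
    ]


def extract_audio(video: Path, out_wav: Path) -> Path:
    """Run ffmpeg; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(extract_audio_cmd(video, out_wav), check=True, capture_output=True)
    return out_wav
```

Passing the command as a list avoids shell quoting problems with uploaded filenames.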

Requirements:

DASHSCOPE_API_KEY in .env
ffmpeg on server (for batch transcription)
dashscope Python package (in requirements.txt)

Installing ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Static build (no root; this archive is for x86_64 Linux)
mkdir -p ~/.local/bin
wget -qO- https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz | tar -xJ -C /tmp
cp /tmp/ffmpeg-*-static/ffmpeg ~/.local/bin/
export PATH="$HOME/.local/bin:$PATH"   # add to ~/.bashrc to persist

Notes

PDF upload limit: 300MB
Video upload limit: 300MB by default (same as PDF; set via MAX_VIDEO_SIZE_MB)
ffmpeg required on server (for batch video transcription)
DashScope ASR supports Cantonese (yue), Mandarin (zh), English (en), and auto-detection
Desktop only (not mobile-optimized)
No authentication (public demo)
All LLM calls routed through configurable base URL
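The notes give a 300MB video limit, and the environment table lists the allowed extensions. A sketch of a server-side check combining the two; `validate_video_upload` is hypothetical, and reading "MB" as MiB (1024 × 1024 bytes) is an assumption.

```python
from pathlib import PurePath

ALLOWED = (".mp4", ".webm", ".mov", ".avi", ".mkv")


def validate_video_upload(filename: str, size_bytes: int,
                          max_mb: int = 300, allowed=ALLOWED) -> None:
    """Raise ValueError if the upload's extension or size is not allowed."""
    suffix = PurePath(filename).suffix.lower()  # case-insensitive: .MP4 == .mp4
    if suffix not in allowed:
        raise ValueError(f"unsupported format {suffix or '(none)'}; allowed: {', '.join(allowed)}")
    limit = max_mb * 1024 * 1024  # assumption: MB means MiB
    if size_bytes > limit:
        raise ValueError(f"file is {size_bytes / 1024 ** 2:.1f} MB; limit is {max_mb} MB")
```

Checking the extension before the size lets the server reject obviously wrong files without reading the body.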