# LegCo Reranker

RAG-powered document Q&A app with video ASR. Upload PDFs or videos (transcribed with Cantonese ASR), ask questions, and get bullet-point answers with citations.

## Quick Start (Dev)

```bash
# Backend
cd backend
cp .env.example .env   # edit .env with your LLM API key and DashScope API key (for video ASR)
pip install -r requirements.txt
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Frontend (in a second terminal, from the repository root)
cd frontend
npm install
npm run dev
```

Backend → `http://localhost:8000` | Frontend → `http://localhost:5173`

## Deploy with Docker

### Prerequisites

- Docker 24+ and Docker Compose v2
- OpenRouter API key (or compatible LLM provider)
- Alibaba Cloud DashScope API key (for video ASR transcription)

### Setup

```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your API keys and model names

# 2. Build and start
docker compose up -d --build

# 3. Check health
curl http://localhost:8000/health
```

The app, both the API and the frontend UI, is served at `http://localhost:8000`.

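
After `docker compose up`, the container may need a few seconds before `/health` answers. The following is a minimal sketch of a readiness poll, assuming only that `/health` returns HTTP 200 once the app is ready (the response body is not documented here, so it is ignored). The names `http_status` and `wait_for_health` are hypothetical and not part of the project.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 503).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_for_health(
    url: str = "http://localhost:8000/health",
    attempts: int = 30,
    delay: float = 2.0,
    fetch_status: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until it answers 200 or the attempts run out."""
    for attempt in range(attempts):
        if fetch_status(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_for_health()` with the defaults waits up to about a minute. The `fetch_status` and `sleep` parameters exist only so the loop can be exercised without a running server.
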
### Volumes

| Volume | Purpose |
|--------|---------|
| `chroma_data` | ChromaDB vector store (persistent) |
| `chunk_data` | Extracted PDF page files |
| `sqlite_data` | Prompt templates and query history |
| `uploads_data` | Uploaded video files (persistent) |

### Environment Variables

All are configurable via `backend/.env`:

| Variable | Default | Description |
|----------|---------|-------------|
| `LLM_BASE_URL` | `https://openrouter.ai/api/v1` | LLM API endpoint |
| `LLM_API_KEY` | (none) | API key for the LLM provider |
| `LLM_MODEL_NAME` | `qwen/qwen3.5-35b-a3b` | Chat model |
| `LLM_TIMEOUT` | `60.0` | LLM request timeout in seconds |
| `LLM_ENABLE_THINKING` | `false` | Enable LLM thinking/reasoning tokens |
| `VLLM_ENGINE` | `false` | Use vLLM-format `extra_body` instead of OpenRouter |
| `EMBEDDING_MODEL` | `qwen/qwen3-embedding-4b` | Embedding model |
| `EMBEDDING_BASE_URL` | `https://openrouter.ai/api/v1` | Embedding API endpoint |
| `EMBEDDING_API_KEY` | (none) | API key for embeddings (falls back to `LLM_API_KEY`) |
| `CHROMA_DB_PATH` | `./chroma_db` | ChromaDB persistent storage |
| `CHUNK_SIZE` | `1000` | Chunk size in tokens |
| `CHUNK_OVERLAP` | `200` | Overlap between consecutive chunks, in tokens |
| `RETRIEVAL_N_RESULTS` | `10` | Chunks retrieved per sub-question |
| `RELEVANCE_THRESHOLD` | `7.0` | Minimum relevance score (0-10) |
| `PROMPTS_DB_PATH` | `./data/prompts.db` | Prompt templates SQLite database |
| `HISTORY_DB_PATH` | `./data/history.db` | Query history SQLite database |
| `CORS_ORIGINS` | `["http://localhost:5173","http://localhost:3000"]` | Allowed CORS origins |
| `DASHSCOPE_API_KEY` | (none) | Alibaba Cloud DashScope API key (for video ASR) |
| `ASR_MODEL_NAME` | `qwen3-asr-flash` | ASR model for batch transcription |
| `ASR_REALTIME_MODEL_NAME` | `qwen3-asr-flash-realtime` | ASR model for real-time streaming |
| `VIDEO_UPLOAD_DIR` | `./uploads` | Video file storage directory |
| `MAX_VIDEO_SIZE_MB` | `300` | Maximum video upload size in MB |
| `SUPPORTED_VIDEO_FORMATS` | `.mp4, .webm, .mov, .avi, .mkv` | Allowed video file extensions |

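
To illustrate how a few of these variables interact, here is a minimal sketch of a settings loader using only the standard library. It assumes values arrive as plain environment strings; the backend's actual loader is not shown in this README and may work differently. `Settings` and `load_settings` are hypothetical names, and the check that the overlap is smaller than the chunk size is an added assumption.

```python
import json
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

DEFAULT_CORS = '["http://localhost:5173","http://localhost:3000"]'


@dataclass(frozen=True)
class Settings:
    llm_base_url: str
    llm_api_key: Optional[str]
    embedding_api_key: Optional[str]
    chunk_size: int
    chunk_overlap: int
    relevance_threshold: float
    cors_origins: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the documented variables, applying the table defaults."""
    llm_key = env.get("LLM_API_KEY") or None
    chunk_size = int(env.get("CHUNK_SIZE", "1000"))
    chunk_overlap = int(env.get("CHUNK_OVERLAP", "200"))
    # Assumption: an overlap as large as the chunk itself would never advance.
    if chunk_overlap >= chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    return Settings(
        llm_base_url=env.get("LLM_BASE_URL", "https://openrouter.ai/api/v1"),
        llm_api_key=llm_key,
        # Documented behaviour: embeddings fall back to the LLM key.
        embedding_api_key=env.get("EMBEDDING_API_KEY") or llm_key,
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        relevance_threshold=float(env.get("RELEVANCE_THRESHOLD", "7.0")),
        # CORS_ORIGINS is written as a JSON array in the table above.
        cors_origins=json.loads(env.get("CORS_ORIGINS", DEFAULT_CORS)),
    )
```

Passing a plain dictionary, for example `load_settings({"LLM_API_KEY": "k"})`, makes the fallback and the defaults easy to inspect without touching the real environment.
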
### Production: Nginx Reverse Proxy

```nginx
# Include nginx.conf in your site config
# Key settings:
# - client_max_body_size 350M (allow large PDF and video uploads)
# - proxy_read_timeout 300s (LLM calls can take minutes)
```

```bash
# Install nginx
sudo apt install nginx

# Copy config
sudo cp nginx.conf /etc/nginx/sites-available/legco
sudo ln -s /etc/nginx/sites-available/legco /etc/nginx/sites-enabled/
sudo nginx -t && sudo systemctl reload nginx
```

### Stopping

```bash
docker compose down
```

### Updating

```bash
git pull
docker compose up -d --build
```

### Cross-Platform Build (aarch64 → amd64)

When building on an aarch64/ARM64 machine (Apple Silicon, Windows on ARM with WSL2, Raspberry Pi) for deployment to an x86_64/amd64 server:

#### 1. Install buildx

```bash
# Make sure the CLI plugin directory exists
mkdir -p ~/.docker/cli-plugins

# Download buildx for arm64
BUILDX_VERSION=$(wget -qO- https://api.github.com/repos/docker/buildx/releases/latest | grep tag_name | head -1 | cut -d'"' -f4)
wget "https://github.com/docker/buildx/releases/download/${BUILDX_VERSION}/buildx-${BUILDX_VERSION}.linux-arm64" -O ~/.docker/cli-plugins/docker-buildx
chmod +x ~/.docker/cli-plugins/docker-buildx
```

#### 2. Register QEMU for amd64 emulation

```bash
docker run --privileged --rm tonistiigi/binfmt --install all
```

#### 3. Build for amd64

```bash
DOCKER_BUILDKIT=1 docker build --platform linux/amd64 -t legco_reranker:amd64 .
```

#### 4. Export and transfer to server

```bash
# Save image to tar file
docker save legco_reranker:amd64 -o legco_reranker_amd64.tar

# Compress (~762MB → ~250MB)
gzip legco_reranker_amd64.tar

# Transfer to server
scp legco_reranker_amd64.tar.gz user@server:/path/

# On the x86_64 server:
gunzip legco_reranker_amd64.tar.gz
docker load -i legco_reranker_amd64.tar

# Run
docker run -d --name legco -p 80:8000 --env-file backend/.env \
  -v chroma_data:/app/chroma_db \
  -v chunk_data:/app/document_chunk \
  -v sqlite_data:/app/data \
  legco_reranker:amd64
```

#### 5. Test run (local, port 8888)

Before transferring to the server, test the amd64 image locally. Pass all config inline (no `--env-file`):

```bash
docker run -d --name legco_test -p 8888:8000 \
  -e LLM_BASE_URL=https://openrouter.ai/api/v1 \
  -e LLM_API_KEY=your_key_here \
  -e LLM_MODEL_NAME=qwen/qwen3.6-35b-a3b \
  -e LLM_TIMEOUT=60.0 \
  -e LLM_ENABLE_THINKING=false \
  -e VLLM_ENGINE=false \
  -e EMBEDDING_MODEL=qwen/qwen3-embedding-4b \
  -e EMBEDDING_BASE_URL=https://openrouter.ai/api/v1 \
  -e EMBEDDING_API_KEY=your_key_here \
  -e CHROMA_DB_PATH=./chroma_db \
  -e CHUNK_SIZE=1000 \
  -e CHUNK_OVERLAP=200 \
  -e RETRIEVAL_N_RESULTS=10 \
  -e RELEVANCE_THRESHOLD=7.0 \
  -e PROMPTS_DB_PATH=./data/prompts.db \
  -e HISTORY_DB_PATH=./data/history.db \
  -e CORS_ORIGINS='["http://localhost:5173","http://localhost:3000"]' \
  -e DASHSCOPE_API_KEY=your_dashscope_key \
  -e ASR_MODEL_NAME=qwen3-asr-flash-2026-02-10 \
  -e ASR_REALTIME_MODEL_NAME=qwen3-asr-flash-realtime-2026-02-10 \
  -e VIDEO_UPLOAD_DIR=./uploads \
  -e MAX_VIDEO_SIZE_MB=300 \
  -v ~/woody/legco/data/chroma_db:/app/chroma_db \
  -v ~/woody/legco/data/document_chunk:/app/document_chunk \
  -v ~/woody/legco/data/data:/app/data \
  legco_reranker:amd64

# Verify
curl http://localhost:8888/health

# Clean up
docker rm -f legco_test
```

## Architecture

```
User → Nginx (80) → Uvicorn (8000)
                      ├── FastAPI API (/api/v1/*)
                      └── Static Frontend (/*)
                            └── React 18 + Vite + Tailwind
```

### RAG Pipeline (Per-Sub-Question)

```
User Question
  → [LLM] Decompose into 2-5 sub-questions
  → [ChromaDB] Retrieve 10 chunks per sub-question
  → [LLM] Score all chunks against their own sub-question (single call)
  → [LLM] Generate markdown response per sub-question
  → SSE stream with per-sub-question sources
```

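
The steps above can be sketched as a single function with the LLM and vector-store calls injected as callables. This is an illustration under stated assumptions, not the project's implementation: every name here (`SubAnswer`, `run_pipeline`, and the four callables) is hypothetical, prompts and the SSE layer are omitted, and it assumes that chunks scoring below `RELEVANCE_THRESHOLD` are dropped before generation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List


@dataclass
class SubAnswer:
    sub_question: str
    markdown: str
    sources: List[str]


def run_pipeline(
    question: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str, int], List[str]],
    score: Callable[[Dict[str, List[str]]], Dict[str, List[float]]],
    generate: Callable[[str, List[str]], str],
    n_results: int = 10,
    threshold: float = 7.0,
) -> Iterator[SubAnswer]:
    """Yield one answer per sub-question, each with its own sources."""
    # 1. Split the user question into sub-questions.
    sub_questions = decompose(question)
    # 2. Retrieve candidate chunks separately for every sub-question.
    candidates = {sq: retrieve(sq, n_results) for sq in sub_questions}
    # 3. Score all chunks in one call; each chunk is judged against
    #    the sub-question it was retrieved for.
    scores = score(candidates)
    for sq in sub_questions:
        # Assumption: keep only chunks at or above the threshold.
        kept = [c for c, s in zip(candidates[sq], scores[sq]) if s >= threshold]
        # 4. Generate a markdown answer from the surviving chunks.
        yield SubAnswer(sub_question=sq, markdown=generate(sq, kept), sources=kept)
```

Because the function is a generator, a caller can forward each `SubAnswer` to the client as soon as it is ready, which matches the per-sub-question streaming described above.
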
### Video Q&A (Phase 2)

```
Video → Audio → DashScope ASR → Transcript → QueryInput → RAG Pipeline
```

**Streaming Mode (real-time):**
- Upload video → press play → transcript flows into QueryInput in real time
- Audio captured from video element (no microphone needed)
- Auto-starts on play, stops on pause/end

**Full Transcript Mode (batch):**
- Click "Full Transcript" button under video player
- Server extracts audio via ffmpeg → full DashScope transcription
- Complete transcript fills QueryInput

**Requirements:**
- `DASHSCOPE_API_KEY` in `.env`
- `ffmpeg` on server (for batch transcription)
- `dashscope` Python package (in `requirements.txt`)

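
For the batch mode, the audio-extraction step might look like the sketch below. The output format (16 kHz mono, with the container chosen by the output file extension) is an assumption that is common for speech recognition; this README does not state what the server actually produces. `build_ffmpeg_command` and `extract_audio` are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video: Path, audio: Path, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg argument list that keeps only a mono audio track."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(video),         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the given rate in Hz
        str(audio),
    ]


def extract_audio(video: Path, audio: Path) -> Path:
    """Run ffmpeg and return the audio path; raises CalledProcessError on failure."""
    subprocess.run(build_ffmpeg_command(video, audio), check=True, capture_output=True)
    return audio
```

Passing the arguments as a list rather than a shell string avoids quoting problems with uploaded file names.
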
### YouTube Live Stream Proxy (Phase 3)

Proxy YouTube live streams and VODs through the backend, with real-time ASR transcription piped into the RAG pipeline. No file upload is needed.

```
YouTube URL → yt-dlp extract → HLS manifest URLs
      ↓
HLS Proxy (backend): rewrites segment URLs → client fetches via proxy
      ↓
Frontend: hls.js plays video/audio → AudioContext → WebSocket → ASR → transcript
```

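
The "rewrites segment URLs" step can be illustrated with a small manifest rewriter. This is a sketch under assumptions: the proxy route `/api/v1/youtube/segment?url=` is hypothetical (the real route is not documented here), and only plain URI lines are rewritten. Tags that carry a `URI="..."` attribute, such as `#EXT-X-KEY` or `#EXT-X-MAP`, are left untouched.

```python
from urllib.parse import quote, urljoin

# Hypothetical backend route that fetches the upstream URL for the client.
PROXY_PREFIX = "/api/v1/youtube/segment?url="


def rewrite_manifest(manifest: str, manifest_url: str, proxy_prefix: str = PROXY_PREFIX) -> str:
    """Point every URI line of an HLS playlist at the proxy."""
    out = []
    for line in manifest.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # URI line: resolve relative paths against the manifest URL,
            # then percent-encode the result into the proxy query string.
            absolute = urljoin(manifest_url, stripped)
            out.append(proxy_prefix + quote(absolute, safe=""))
        else:
            # Tags, comments, and blank lines pass through unchanged.
            out.append(line)
    return "\n".join(out) + "\n"
```

With this approach the browser only ever talks to the backend, so upstream hosts and tokens stay on the server side.
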
**How to use:**
1. Toggle source from "Upload" to "YouTube" in the video panel
2. Paste a YouTube URL (live stream or VOD)
3. Click "Load Stream"; the backend extracts the streams via yt-dlp
4. Press play; the video plays via hls.js and the audio feeds real-time ASR
5. Transcript flows into QueryInput as you watch

**Configuration:**

| Variable | Default | Description |
|----------|---------|-------------|
| `YOUTUBE_PROXY_ENABLED` | `false` | Enable YouTube proxy feature |
| `YT_DLP_TIMEOUT` | `30` | yt-dlp extraction timeout (seconds) |
| `YT_DLP_CACHE_TTL` | `300` | Cache TTL for extracted stream info |

**Requirements:**
- `YOUTUBE_PROXY_ENABLED=true` in `.env`
- `yt-dlp` (auto-installed via `requirements.txt`)
- `DASHSCOPE_API_KEY` in `.env` (for ASR)

**Known limitations:**
- YouTube may require PO tokens for some videos (especially live streams); the stream may need re-extraction if the tokens expire
- Video quality is limited to 480p (there is no quality selector in the UI; low resolution is sufficient for reference viewing)
- YouTube segment URLs expire after ~6 hours
- "Full Transcript" button is hidden for the YouTube source (streaming ASR only)

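
`YT_DLP_CACHE_TTL` implies that extracted stream info is cached and re-extracted once it goes stale. A minimal sketch of such a cache follows, assuming the TTL is expressed in seconds (the table does not state the unit). `TTLCache` is a hypothetical name, and the extraction function is injected rather than calling yt-dlp directly.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Cache values per key and reload them after `ttl` seconds."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], V]) -> V:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            # Entry is still fresh: skip the expensive extraction.
            return hit[1]
        # Missing or stale: extract again and remember when we did.
        value = loader(key)
        self._entries[key] = (now, value)
        return value
```

A short TTL relative to the ~6 hour segment URL lifetime keeps cached entries well inside the window in which the URLs are still valid.
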
### Installing ffmpeg

```bash
# Ubuntu/Debian
sudo apt install ffmpeg

# macOS
brew install ffmpeg

# Static build (no root, x86_64 Linux); make sure ~/.local/bin is on your PATH
mkdir -p ~/.local/bin
wget -qO- https://johnvansickle.com/ffmpeg/releases/ffmpeg-release-amd64-static.tar.xz | tar -xJ -C /tmp
cp /tmp/ffmpeg-*-static/ffmpeg ~/.local/bin/
```

## Notes

- PDF upload limit: 300MB
- Video upload limit: 300MB (same as PDF)
- ffmpeg required on server (for video transcription)
- DashScope ASR supports Cantonese (yue), Mandarin (zh), English (en), auto-detect
- Desktop only (not mobile-optimized)
- No authentication (public demo)
- All LLM calls routed through configurable base URL
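
The video upload limit and the allowed extensions can be checked before a file is stored. The sketch below is an illustration only: `validate_video_upload` is a hypothetical name, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the notes do not say how the limit is measured.

```python
from pathlib import PurePath
from typing import AbstractSet

# Defaults taken from the environment variable table.
SUPPORTED_VIDEO_FORMATS = frozenset({".mp4", ".webm", ".mov", ".avi", ".mkv"})
MAX_VIDEO_SIZE_MB = 300


def validate_video_upload(
    filename: str,
    size_bytes: int,
    allowed: AbstractSet[str] = SUPPORTED_VIDEO_FORMATS,
    max_mb: int = MAX_VIDEO_SIZE_MB,
) -> None:
    """Raise ValueError if the upload has a bad extension or is too large."""
    # Compare extensions case-insensitively so "clip.MP4" is accepted.
    suffix = PurePath(filename).suffix.lower()
    if suffix not in allowed:
        raise ValueError(f"unsupported video format: {suffix or '(none)'}")
    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    if size_bytes > max_mb * 1024 * 1024:
        raise ValueError(f"file exceeds {max_mb} MB limit")
```

Checking the extension only guards against accidental uploads; it says nothing about the actual file contents.
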