legco_ai_assistant/.examples
Woody 9934749d2b feat: Phase 2.1 config + infrastructure and 2.2 video upload backend
- Add DashScope ASR and video upload config fields to Settings
- Create Pydantic models (video.py, asr.py)
- Create VideoService with validation, save, serve, delete
- Create ASR client stub with float32_to_s16le utility
- Implement POST /api/v1/video/upload with streaming validation
- Implement GET /api/v1/video/{video_id} with FileResponse
- Create WebSocket ASR endpoint stub
- Register new routers in main.py
- Update .env.example and requirements.txt
- Add reference examples for DashScope integration
- 8 tests passing (3 config + 5 video upload)
2026-05-06 13:08:19 +08:00
README.md
alibaba_asr_backend.py
alibaba_asr_frontend_react.tsx
alibaba_asr_frontend_vanilla.html

README.md

Alibaba Cloud DashScope ASR — Reference Examples

Adapted from /mnt/c/Users/woody/Documents/projects/voice input/ (Cantonese voice-to-text web app).

Files

File                                 What                                               Language
alibaba_asr_backend.py               FastAPI WebSocket proxy to DashScope realtime ASR  Python
alibaba_asr_frontend_vanilla.html    Browser audio capture + WebSocket (vanilla JS)     HTML/JS
alibaba_asr_frontend_react.tsx       React/TS hook + component for audio capture        TypeScript/React

Architecture

Browser (Float32 PCM, 16kHz mono)
  │  WebSocket: send(float32Data.buffer)
  ▼
FastAPI Backend (/ws/asr/{video_id})
  │  Convert Float32 → S16_LE → base64
  ▼
Alibaba Cloud DashScope (wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime)
  │  Model: qwen3-asr-flash-realtime
  ▼  Language: yue (Cantonese)
Transcript JSON → Browser
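The backend's Float32 → S16_LE → base64 step can be sketched with the standard library alone. This is a minimal sketch, not the code in alibaba_asr_backend.py. The helper names f32_to_s16le and encode_chunk are hypothetical. The sketch also assumes two things about the browser's data: it sends little-endian Float32 samples, which is what Float32Array.buffer yields on common little-endian hardware, and the nominal sample range is [-1.0, 1.0].

```python
import base64
import struct


def f32_to_s16le(data: bytes) -> bytes:
    """Hypothetical helper: little-endian Float32 PCM -> signed 16-bit little-endian PCM."""
    if len(data) % 4 != 0:
        raise ValueError("Float32 PCM buffer length must be a multiple of 4 bytes")
    count = len(data) // 4
    # Assumption: the browser's Float32Array bytes are little-endian.
    samples = struct.unpack(f"<{count}f", data)
    # Clamp to [-1.0, 1.0] so overshooting samples cannot overflow int16, then scale.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{count}h", *ints)


def encode_chunk(data: bytes) -> str:
    """Hypothetical helper: browser Float32 bytes -> base64 text of S16_LE PCM."""
    return base64.b64encode(f32_to_s16le(data)).decode("ascii")
```

Each 4-byte input sample becomes 2 output bytes. The payload therefore halves before base64 encoding enlarges it again by about a third.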

Key Details

  • Audio format: Float32 PCM, 16kHz, mono (browser) → S16_LE PCM, 16kHz, mono, base64 (DashScope)
  • Model: qwen3-asr-flash-realtime (WebSocket realtime, unlimited duration)
  • Endpoint: wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime
  • SDK: pip install "dashscope>=0.4.0" (quote the specifier so the shell does not treat > as a redirect)
  • Cantonese: language code yue (supported natively by DashScope)
  • VAD: Server-side (Alibaba Cloud handles voice activity detection)
  • Pricing: ~$0.00009 (USD) per second of audio
  • Features: punctuation, inverse text normalization (ITN), filler-word filtering, multi-language auto-detection
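The audio format and pricing figures above translate into per-second payload sizes and a rough cost estimate. The sketch below assumes the ~$0.00009 per second rate quoted above. It ignores WebSocket/JSON framing and per-chunk base64 padding. The function names are hypothetical.

```python
SAMPLE_RATE_HZ = 16_000          # mono, per the audio format above
PRICE_PER_SECOND_USD = 0.00009   # approximate rate quoted above (assumption)


def bandwidth_per_second(sample_rate: int = SAMPLE_RATE_HZ) -> dict[str, int]:
    """Bytes (or base64 characters) for one second of mono audio at each hop."""
    float32_bytes = sample_rate * 4   # browser -> backend: 4 bytes per Float32 sample
    s16le_bytes = sample_rate * 2     # after conversion: 2 bytes per int16 sample
    # base64 emits 4 characters per 3 input bytes, rounding up to a full group
    base64_chars = 4 * ((s16le_bytes + 2) // 3)
    return {
        "float32_bytes": float32_bytes,
        "s16le_bytes": s16le_bytes,
        "base64_chars": base64_chars,
    }


def estimate_cost_usd(seconds: float, rate: float = PRICE_PER_SECOND_USD) -> float:
    """Approximate recognition cost for a stream of the given length."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return seconds * rate


if __name__ == "__main__":
    print(bandwidth_per_second())
    print(f"1 hour: ${estimate_cost_usd(3600):.2f}")
```

At that rate, one hour (3,600 seconds) of audio comes to about $0.32. Converting to S16_LE also halves the upstream payload compared with sending the browser's Float32 data directly.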

Dependencies

# Python
dashscope>=0.4.0
openai>=1.52.0
zhconv>=1.4.0       # Simplified → Traditional Chinese (optional)

# No additional JS deps needed; native Web APIs only:
#   WebSocket, AudioContext, ScriptProcessorNode (deprecated in favor of AudioWorkletNode), getUserMedia