Alibaba Cloud DashScope ASR — Reference Examples
Adapted from a Cantonese voice-to-text web app (`/mnt/c/Users/woody/Documents/projects/voice input/`).
Files
| File | Purpose | Language |
|---|---|---|
| `alibaba_asr_backend.py` | FastAPI WebSocket proxy to DashScope realtime ASR | Python |
| `alibaba_asr_frontend_vanilla.html` | Browser audio capture + WebSocket (vanilla JS) | HTML/JS |
| `alibaba_asr_frontend_react.tsx` | React/TS hook + component for audio capture | TypeScript/React |
Architecture
```
Browser (Float32 PCM, 16kHz mono)
   │  WebSocket: send(float32Data.buffer)
   ▼
FastAPI backend (/ws/asr/{video_id})
   │  Convert Float32 → S16_LE → base64
   ▼
Alibaba Cloud DashScope (wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime)
   │  Model: qwen3-asr-flash-realtime
   │  Language: yue (Cantonese)
   ▼
Transcript JSON → Browser
```
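The backend's conversion step can be sketched with the standard library alone. This is a minimal sketch, assuming each WebSocket message carries raw little-endian Float32 samples (what `float32Data.buffer` yields on little-endian platforms, which covers mainstream browsers). The name `float32_to_s16le_base64` is hypothetical and is not taken from `alibaba_asr_backend.py`.

```python
import base64
import struct


def float32_to_s16le_base64(raw: bytes) -> str:
    """Convert little-endian Float32 PCM bytes to base64-encoded S16_LE PCM.

    Hypothetical helper mirroring the Float32 -> S16_LE -> base64 step.
    """
    if len(raw) % 4 != 0:
        raise ValueError("Float32 PCM payload must be a multiple of 4 bytes")
    count = len(raw) // 4
    samples = struct.unpack(f"<{count}f", raw)
    # Web Audio samples are nominally in [-1.0, 1.0]; clip anything outside
    # that range so scaling cannot overflow a signed 16-bit integer.
    pcm16 = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    payload = struct.pack(f"<{count}h", *pcm16)
    return base64.b64encode(payload).decode("ascii")


if __name__ == "__main__":
    # 2.0 is out of range and gets clipped to full scale.
    chunk = struct.pack("<4f", 0.0, 0.5, -1.0, 2.0)
    encoded = float32_to_s16le_base64(chunk)
    print(struct.unpack("<4h", base64.b64decode(encoded)))  # (0, 16383, -32767, 32767)
```

Scaling by 32767 rather than 32768 keeps +1.0 and -1.0 symmetric. At 16kHz mono, one second of audio is 64,000 bytes of Float32 from the browser and 32,000 bytes of S16_LE before base64, which adds roughly a third.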
Key Details
- Audio format: Float32 PCM, 16kHz, mono (browser) → S16_LE (signed 16-bit little-endian) PCM, 16kHz, mono, base64-encoded (DashScope)
- Model: `qwen3-asr-flash-realtime` (WebSocket realtime, unlimited duration)
- Endpoint: `wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime`
- SDK: `pip install "dashscope>=0.4.0"` (quoted so the shell does not treat `>` as a redirect)
- Cantonese: language code `yue` (supported natively by DashScope)
- VAD: server-side (Alibaba Cloud handles voice activity detection)
- Pricing: ~$0.00009 per second of audio (about $0.32 per hour)
- Features: punctuation, inverse text normalization (ITN), filler-word filtering, multi-language auto-detection
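The settings above (16kHz PCM, `yue`, server-side VAD) have to reach DashScope as JSON events over the WebSocket. The sketch below assumes an OpenAI Realtime-style protocol: the event names `session.update` and `input_audio_buffer.append`, the `session` field names, and selecting the model through a `?model=` query parameter are all assumptions to check against the DashScope documentation, not details confirmed by this README.

```python
import json

# Endpoint and model name come from this README; the query-parameter form is assumed.
DASHSCOPE_REALTIME_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
ASR_MODEL = "qwen3-asr-flash-realtime"


def realtime_url(model: str = ASR_MODEL) -> str:
    """Build the WebSocket URL (assumes the model is passed as ?model=...)."""
    return f"{DASHSCOPE_REALTIME_URL}?model={model}"


def session_update_event(language: str = "yue", sample_rate: int = 16000) -> str:
    """Session configuration event; all field names are assumptions."""
    event = {
        "type": "session.update",
        "session": {
            "input_audio_format": "pcm",  # raw S16_LE samples
            "sample_rate": sample_rate,
            "input_audio_transcription": {"language": language},
            "turn_detection": {"type": "server_vad"},  # let the server do VAD
        },
    }
    return json.dumps(event)


def audio_append_event(b64_pcm16: str) -> str:
    """Wrap one base64 S16_LE chunk in an (assumed) append event."""
    return json.dumps({"type": "input_audio_buffer.append", "audio": b64_pcm16})


if __name__ == "__main__":
    print(realtime_url())
    print(session_update_event())
    print(audio_append_event("AAD/fw=="))  # two S16_LE samples: 0 and 32767
```

Keeping the event builders as pure functions that return strings leaves the choice of WebSocket client (the `dashscope` SDK or a raw socket) open.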
Dependencies
```
# Python
dashscope>=0.4.0
openai>=1.52.0
zhconv>=1.4.0  # Simplified → Traditional Chinese (optional)

# JavaScript: no extra packages needed; native Web APIs only:
# WebSocket, AudioContext, ScriptProcessorNode (deprecated; AudioWorkletNode is its replacement), getUserMedia
```
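`zhconv` is listed as an optional dependency for converting transcripts from Simplified to Traditional Chinese. A minimal sketch of the last hop in the diagram (Transcript JSON → Browser) follows. The event type `conversation.item.input_audio_transcription.completed` and its `transcript` field are assumptions in the OpenAI Realtime style, and the outgoing `{"type": "transcript", "text": ...}` shape is hypothetical. `zhconv.convert(text, "zh-hk")` is zhconv's conversion call; the code falls back to the raw text when zhconv is not installed.

```python
import json
from typing import Callable, Optional

# Assumed event type for a finished transcription segment.
TRANSCRIPT_DONE = "conversation.item.input_audio_transcription.completed"


def to_traditional(text: str) -> str:
    """Simplified -> Traditional (Hong Kong variant) if zhconv is installed."""
    try:
        import zhconv  # optional dependency
    except ImportError:
        return text  # no converter available: pass the transcript through
    return zhconv.convert(text, "zh-hk")


def to_browser_message(
    raw_event: str, convert: Callable[[str], str] = to_traditional
) -> Optional[str]:
    """Turn one DashScope event into a browser message, or None to skip it."""
    event = json.loads(raw_event)
    if event.get("type") != TRANSCRIPT_DONE:
        return None  # ignore session, VAD, and other events in this sketch
    text = convert(event.get("transcript", ""))
    # ensure_ascii=False keeps Chinese characters readable in the JSON payload.
    return json.dumps({"type": "transcript", "text": text}, ensure_ascii=False)


if __name__ == "__main__":
    done = json.dumps({"type": TRANSCRIPT_DONE, "transcript": "你好"})
    print(to_browser_message(done, convert=lambda s: s))
```

Passing the converter as a parameter keeps `zhconv` optional and lets tests run without it.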