# Alibaba Cloud DashScope ASR: Reference Examples

Adapted from `/mnt/c/Users/woody/Documents/projects/voice input/` (a Cantonese voice-to-text web app).

## Files

| File | Description | Language |
|------|-------------|----------|
| `alibaba_asr_backend.py` | FastAPI WebSocket proxy to DashScope realtime ASR | Python |
| `alibaba_asr_frontend_vanilla.html` | Browser audio capture + WebSocket client (vanilla JS) | HTML/JS |
| `alibaba_asr_frontend_react.tsx` | React hook + component for audio capture | TypeScript/React |

## Architecture

```
Browser (Float32 PCM, 16 kHz mono)
   │  WebSocket: send(float32Data.buffer)
   ▼
FastAPI backend (/ws/asr/{video_id})
   │  Convert Float32 → S16_LE → base64
   ▼
Alibaba Cloud DashScope (wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime)
   │  Model: qwen3-asr-flash-realtime
   │  Language: yue (Cantonese)
   ▼
Transcript JSON → Browser
```

## Key Details

- **Audio format**: Float32 PCM, 16 kHz, mono from the browser; S16_LE PCM, 16 kHz, mono, base64-encoded for DashScope
- **Model**: `qwen3-asr-flash-realtime` (realtime streaming over WebSocket, unlimited duration)
- **Endpoint**: `wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime`
- **SDK**: `pip install "dashscope>=0.4.0"` (quote the specifier so the shell does not treat `>` as a redirect)
- **Cantonese**: language code `yue` (supported natively by DashScope)
- **VAD**: server-side (Alibaba Cloud performs voice activity detection)
- **Pricing**: ~$0.00009/second
- **Features**: punctuation, inverse text normalization (ITN), filler-word filtering, multi-language auto-detection

## Dependencies

```
# Python
dashscope>=0.4.0
openai>=1.52.0
zhconv>=1.4.0  # Simplified → Traditional Chinese (optional)
```

The frontend needs no additional JS dependencies; it uses only native Web APIs: `WebSocket`, `AudioContext`, `ScriptProcessorNode`, and `getUserMedia`. (`ScriptProcessorNode` is deprecated in favor of `AudioWorkletNode` but is still widely supported.)
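The backend's Float32 → S16_LE → base64 step described under Architecture and Key Details can be sketched with the Python standard library alone. This is a minimal sketch, not the code in `alibaba_asr_backend.py`: the function name `float32_to_s16le_base64` is hypothetical, it assumes each binary WebSocket message holds raw little-endian Float32 samples already at 16 kHz mono, and it does not show how the resulting string is wrapped in a DashScope event.

```python
import base64
import struct


def float32_to_s16le_base64(raw: bytes) -> str:
    """Hypothetical helper: convert one browser Float32 PCM message to
    base64-encoded S16_LE PCM (sample rate and channel count unchanged)."""
    if len(raw) % 4:
        raise ValueError("Float32 PCM payload length must be a multiple of 4 bytes")
    n = len(raw) // 4
    # A Float32Array buffer uses the platform byte order, which is
    # little-endian on essentially all browser hardware; "<" assumes that.
    floats = struct.unpack(f"<{n}f", raw)
    ints = []
    for s in floats:
        s = max(-1.0, min(1.0, s))  # clip out-of-range samples to [-1.0, 1.0]
        # Asymmetric scaling maps -1.0 to -32768 and 1.0 to 32767
        ints.append(int(s * 32768) if s < 0 else int(s * 32767))
    pcm16 = struct.pack(f"<{n}h", *ints)  # "<h" = signed 16-bit little-endian
    return base64.b64encode(pcm16).decode("ascii")
```

Halving the sample width halves the payload before encoding; base64 then adds roughly a third back. Clipping before scaling keeps a slightly hot browser sample from overflowing the 16-bit range.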
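The audio format and pricing figures under Key Details imply some simple per-chunk arithmetic. The sketch below assumes a hypothetical 100 ms chunk (the README does not state the chunk size), treats the approximate $0.00009/second rate as per second of audio, and uses illustrative names (`chunk_sizes`, `estimated_cost_usd`) that do not come from the reference files.

```python
from typing import NamedTuple

SAMPLE_RATE_HZ = 16_000          # 16 kHz mono, per Key Details
PRICE_PER_SECOND_USD = 0.00009   # approximate rate quoted in Key Details


class ChunkSizes(NamedTuple):
    samples: int
    float32_bytes: int   # browser → backend (4 bytes per sample)
    s16le_bytes: int     # after conversion (2 bytes per sample)
    base64_chars: int    # backend → DashScope


def chunk_sizes(duration_ms: int) -> ChunkSizes:
    """Payload sizes for one mono chunk of the given length."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    s16le_bytes = samples * 2
    # base64 emits 4 characters per 3-byte group, padding the last group
    base64_chars = 4 * ((s16le_bytes + 2) // 3)
    return ChunkSizes(samples, samples * 4, s16le_bytes, base64_chars)


def estimated_cost_usd(audio_seconds: float) -> float:
    """Rough cost estimate, assuming billing per second of audio."""
    return audio_seconds * PRICE_PER_SECOND_USD
```

For example, `chunk_sizes(100)` gives 1,600 samples: 6,400 bytes of Float32 from the browser, 3,200 bytes of S16_LE, and 4,268 base64 characters sent to DashScope. `estimated_cost_usd(3600)` comes to about $0.32 for an hour of audio.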