Agora Voice Agent with AssemblyAI Universal-3 Pro Streaming
Build a real-time transcription bot that joins Agora channels, captures participant audio as PCM frames, and streams it to AssemblyAI Universal-3 Pro Streaming — with 307ms P50 latency and support for 99+ languages.
Architecture
Browser/Mobile clients
│ WebRTC (Agora SDK)
▼
Agora Channel
│ server subscribes as bot user
▼
Python Server Bot
(agora-python-server-sdk)
│ PcmAudioFrame per participant
│ sample_rate=16000, pcm_s16le
▼
AssemblyAI Universal-3 Pro Streaming
wss://streaming.assemblyai.com/v3/ws
│ Turn events with transcript
▼
Your application logic
(drive LLM, store transcript, trigger webhook)
Why Agora + AssemblyAI?
| Metric | AssemblyAI Universal-3 Pro | Agora Built-in STT |
|---|---|---|
| P50 latency | 307ms | ~600–900ms |
| Word Error Rate | 8.9% | ~14–18% |
| Speaker diarization | ✅ Real-time | ❌ |
| LLM Gateway | ✅ 20+ models | ❌ |
| Languages | 99+ | Limited |
| Audio formats | PCM, μ-law, Opus | PCM only |
Prerequisites
- Python 3.9+
- Agora account — App ID and App Certificate
- AssemblyAI API key
Quick Start
git clone https://github.com/kelseyefoster/voice-agent-agora-universal-3-pro
cd voice-agent-agora-universal-3-pro
pip install -r requirements.txt
cp .env.example .env
# Fill in AGORA_APP_ID, AGORA_APP_CERT, ASSEMBLYAI_API_KEY
python bot.py --channel my-channel
Environment Setup
AGORA_APP_ID=your_agora_app_id
AGORA_APP_CERT=your_agora_certificate
AGORA_CHANNEL=my-channel
AGORA_BOT_UID=9999
ASSEMBLYAI_API_KEY=your_assemblyai_api_key
Obtain Agora credentials from the Agora Console and your AssemblyAI API key from the AssemblyAI dashboard.
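As a minimal sketch, the bot could load these variables at startup and fail fast when one is missing. The `BotConfig` dataclass and `load_config` helper are hypothetical names used for illustration and are not taken from the repository; the defaults for channel and UID simply mirror the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names match the .env entries above.
REQUIRED_VARS = ("AGORA_APP_ID", "AGORA_APP_CERT", "ASSEMBLYAI_API_KEY")


@dataclass(frozen=True)
class BotConfig:  # hypothetical container, not from the repo
    app_id: str
    app_cert: str
    channel: str
    bot_uid: int
    aai_api_key: str


def load_config(env: Optional[Mapping[str, str]] = None) -> BotConfig:
    """Read settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return BotConfig(
        app_id=env["AGORA_APP_ID"],
        app_cert=env["AGORA_APP_CERT"],
        channel=env.get("AGORA_CHANNEL", "my-channel"),
        bot_uid=int(env.get("AGORA_BOT_UID", "9999")),
        aai_api_key=env["ASSEMBLYAI_API_KEY"],
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to unit-test.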
Core Integration
For each participant, the bot runs two tasks concurrently: one pulls audio frames from Agora and forwards them to AssemblyAI, and the other handles incoming transcript events.
import asyncio
import json
import os
import websockets
from agora.rtc.agora_service import AgoraService, AgoraServiceConfig
from agora.rtc.rtc_connection import RTCConnConfig
from agora.rtc.agora_base import (
    ClientRoleType,
    ChannelProfileType,
    AudioScenarioType,
)

SAMPLE_RATE = 16000
CHANNELS = 1
AAI_WS_URL = (
    "wss://streaming.assemblyai.com/v3/ws"
    f"?sample_rate={SAMPLE_RATE}"
    "&speech_model=u3-rt-pro"
    "&format_turns=true"
)


async def stream_participant(agora_channel, uid: int, api_key: str):
    headers = {"Authorization": api_key}
    async with websockets.connect(AAI_WS_URL, additional_headers=headers) as ws:
        # The first message on the socket carries the session id.
        begin = json.loads(await ws.recv())
        print(f"[uid={uid}] AAI session: {begin['id']}")

        async def send_audio():
            # Forward each PCM frame as a binary WebSocket message.
            async for frame in agora_channel.get_audio_frames(uid):
                await ws.send(frame.data)

        async def recv_transcripts():
            # Print only completed turns; partial Turn events are skipped.
            async for message in ws:
                event = json.loads(message)
                if event["type"] == "Turn" and event.get("end_of_turn"):
                    print(f"[uid={uid}] {event['transcript']}")

        await asyncio.gather(send_audio(), recv_transcripts())
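To run one stream per participant, something has to start a task when a user joins and cancel it when they leave. The sketch below shows one way to track those tasks; `ParticipantStreams`, `on_user_joined`, and `on_user_left` are hypothetical names, and wiring them to the Agora SDK's join/leave callbacks is left out because the exact callback names depend on the SDK version.

```python
import asyncio
from typing import Callable, Coroutine, Dict, List


class ParticipantStreams:
    """Tracks one streaming task per participant uid (hypothetical helper)."""

    def __init__(self, stream_fn: Callable[[int], Coroutine]):
        # stream_fn(uid) should return a coroutine that streams that user's audio.
        self._stream_fn = stream_fn
        self._tasks: Dict[int, asyncio.Task] = {}

    def on_user_joined(self, uid: int) -> None:
        # Start a task only once per uid; must be called from the event loop thread.
        if uid not in self._tasks:
            self._tasks[uid] = asyncio.create_task(self._stream_fn(uid))

    async def on_user_left(self, uid: int) -> None:
        # Cancel the participant's task and wait for it to unwind.
        task = self._tasks.pop(uid, None)
        if task is not None:
            task.cancel()
            await asyncio.gather(task, return_exceptions=True)

    @property
    def active_uids(self) -> List[int]:
        return sorted(self._tasks)
```

In the bot, `stream_fn` could be `lambda uid: stream_participant(agora_channel, uid, api_key)`. If the SDK delivers its callbacks on a non-asyncio thread, hand them to the loop with `loop.call_soon_threadsafe` rather than calling `on_user_joined` directly.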
Audio Format
Configure Agora to output 16 kHz mono before subscribing — this eliminates resampling and matches AssemblyAI's preferred format:
agora_channel.set_playback_audio_frame_before_mixing_parameters(
    num_of_channels=1,
    sample_rate=16000,
)
agora_channel.subscribe_all_audio()
Each PcmAudioFrame contains 160 samples (10 ms) of 16-bit little-endian PCM, which is 320 bytes at 16 kHz mono. The bot forwards each frame to AssemblyAI as it arrives, with no intermediate buffering.
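Streaming endpoints often accept only a certain range of chunk durations, so check AssemblyAI's streaming documentation for the current limits before sending 10 ms frames one by one. If larger chunks are needed, a small accumulator can batch frames before each send. This is a sketch under assumptions: `FrameBatcher` and `bytes_for_ms` are hypothetical helpers, and the 50 ms target is an assumed value, not one taken from the source.

```python
SAMPLE_RATE = 16000      # Hz, as configured above
BYTES_PER_SAMPLE = 2     # 16-bit PCM
CHANNELS = 1


def bytes_for_ms(duration_ms: int) -> int:
    """Size in bytes of `duration_ms` of 16 kHz mono 16-bit PCM."""
    return SAMPLE_RATE * duration_ms // 1000 * BYTES_PER_SAMPLE * CHANNELS


class FrameBatcher:
    """Accumulates small PCM frames and emits fixed-size chunks (hypothetical helper)."""

    def __init__(self, chunk_ms: int = 50):  # 50 ms is an assumed target
        self._chunk_bytes = bytes_for_ms(chunk_ms)
        self._buffer = bytearray()

    def push(self, frame: bytes) -> list:
        """Add one frame; return zero or more complete chunks ready to send."""
        self._buffer.extend(frame)
        chunks = []
        while len(self._buffer) >= self._chunk_bytes:
            chunks.append(bytes(self._buffer[: self._chunk_bytes]))
            del self._buffer[: self._chunk_bytes]
        return chunks

    def flush(self) -> bytes:
        """Return whatever is left (possibly empty), e.g. before terminating."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest
```

Inside `send_audio`, the send line would become `for chunk in batcher.push(frame.data): await ws.send(chunk)`, with one `FrameBatcher` per participant.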
Handling Transcripts
Turn events arrive while a participant is speaking; the one with end_of_turn set marks a natural speech boundary. Route that final transcript to your LLM, database, or webhook:
async def recv_transcripts(ws, uid: int):
    async for message in ws:
        event = json.loads(message)
        if event["type"] == "Turn" and event.get("end_of_turn"):
            transcript = event["transcript"]
            print(f"[uid={uid}] {transcript}")
            # send_to_llm is your own coroutine; it is not defined in this snippet.
            await send_to_llm(uid, transcript)
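Separating parsing from routing makes the handler easier to test. The sketch below pulls the completed-turn check into a pure function and keeps a per-speaker history in memory. `final_transcript` and `TranscriptLog` are hypothetical names; the event fields (`type`, `end_of_turn`, `transcript`) are the ones used in the code above.

```python
import json
from collections import defaultdict
from typing import List, Optional


def final_transcript(message: str) -> Optional[str]:
    """Return the transcript of a completed turn, or None for any other message."""
    event = json.loads(message)
    if event.get("type") != "Turn" or not event.get("end_of_turn"):
        return None
    text = event.get("transcript", "").strip()
    return text or None  # treat empty turns as nothing to route


class TranscriptLog:
    """In-memory per-speaker history (hypothetical; swap in a database or webhook)."""

    def __init__(self) -> None:
        self._turns = defaultdict(list)

    def add(self, uid: int, text: str) -> None:
        self._turns[uid].append(text)

    def history(self, uid: int) -> List[str]:
        # .get avoids creating an empty entry for unknown speakers.
        return list(self._turns.get(uid, []))
```

The receive loop then reduces to calling `final_transcript(message)` and, when it returns text, passing it to `log.add(uid, text)` and to whatever downstream handler the application uses.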
Terminating Cleanly
Send a Terminate message to flush the final turn:
async def close_stream(ws):
    await ws.send(json.dumps({"type": "Terminate"}))
    async for message in ws:
        event = json.loads(message)
        if event["type"] == "Termination":
            print(f"Audio processed: {event['audio_duration_seconds']}s")
            break
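If the network drops after Terminate is sent, the loop above can wait indefinitely for a Termination event. One way to guard against that is to bound the wait, as sketched here. `close_stream_with_timeout` is a hypothetical variant, the 5-second default is an assumption, and `ws` is expected to offer `send` and async iteration like the connection used above.

```python
import asyncio
import json
from typing import Optional


async def close_stream_with_timeout(ws, timeout: float = 5.0) -> Optional[dict]:
    """Send Terminate, then wait up to `timeout` seconds for the Termination event.

    Returns the Termination event, or None if it did not arrive in time or
    the connection closed first. The 5-second default is an assumption.
    """
    await ws.send(json.dumps({"type": "Terminate"}))

    async def wait_for_termination() -> Optional[dict]:
        async for message in ws:
            event = json.loads(message)
            if event.get("type") == "Termination":
                return event
        return None  # connection closed without a Termination event

    try:
        return await asyncio.wait_for(wait_for_termination(), timeout)
    except asyncio.TimeoutError:
        return None
```

Any final Turn events that arrive before Termination are skipped in this sketch; a real bot would likely route them through its transcript handler first.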
Production Token Generation
pip install agora-token-builder
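import os
import time

from agora_token_builder import RtcTokenBuilder, Role_Subscriber


def generate_bot_token(app_id: str, app_cert: str, channel: str, uid: int) -> str:
    # The token expires one hour from now.
    expire = int(time.time()) + 3600
    return RtcTokenBuilder.buildTokenWithUid(
        app_id, app_cert, channel, uid, Role_Subscriber, expire
    )


# Channel and bot UID come from the environment variables defined earlier.
channel = os.environ["AGORA_CHANNEL"]
bot_uid = int(os.environ["AGORA_BOT_UID"])
token = generate_bot_token(
    os.environ["AGORA_APP_ID"],
    os.environ["AGORA_APP_CERT"],
    channel,
    bot_uid,
)
# `connection` is the bot's existing Agora RTC connection object.
connection.connect(token, channel, str(bot_uid))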
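Because the token above expires after an hour, a long-running bot needs a fresh one before that point. The sketch below only computes when to renew; `token_expiry` and `renewal_delay` are hypothetical helpers, the 300-second margin is an assumption, and applying the new token to a live connection depends on the SDK and is not shown.

```python
import time
from typing import Optional

TOKEN_TTL_SECONDS = 3600      # matches the expiry used above
RENEWAL_MARGIN_SECONDS = 300  # assumed safety margin before expiry


def token_expiry(issued_at: Optional[float] = None, ttl: int = TOKEN_TTL_SECONDS) -> int:
    """Absolute Unix timestamp at which a token issued at `issued_at` expires."""
    issued_at = time.time() if issued_at is None else issued_at
    return int(issued_at) + ttl


def renewal_delay(expire_ts: int, now: Optional[float] = None,
                  margin: int = RENEWAL_MARGIN_SECONDS) -> float:
    """Seconds to wait before generating a fresh token; never negative."""
    now = time.time() if now is None else now
    return max(0.0, float(expire_ts - margin - now))
```

A background task could `await asyncio.sleep(renewal_delay(expire_ts))`, call `generate_bot_token` again, and then pass the result to whatever token-renewal call the SDK provides.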