Pipecat Voice Agent with AssemblyAI Universal-3 Pro Streaming
Build a real-time voice agent using Pipecat — Daily.co's open-source Voice AI framework — with the AssemblyAI Universal-3 Pro Streaming model as the speech-to-text engine.
Pipecat's modular pipeline design means you can swap any component without touching the rest. AssemblyAI has a first-party Pipecat plugin with full Universal-3 Pro Streaming support — no manual WebSocket wiring required.
Why AssemblyAI in Pipecat?
| Metric | AssemblyAI Universal-3 Pro | Deepgram Nova-3 |
|---|---|---|
| P50 latency | 307 ms | 516 ms |
| P99 latency | 1,012 ms | 1,907 ms |
| Word Error Rate | 8.14% | 9.87% |
| Neural turn detection | ✅ | ❌ (VAD only) |
| Mid-session prompting | ✅ | ❌ |
| Anti-hallucination | ✅ | ❌ |
| Real-time diarization | ✅ | ❌ |
The roughly 41% lower P50 latency (307 ms vs. 516 ms) is noticeable in live conversation, and neural turn detection means fewer awkward double-responses when users pause mid-thought.
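The headline percentage can be reproduced from the table. The short sketch below computes the relative reduction for each numeric row; the helper name `relative_reduction` is illustrative and not part of any library.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return how much lower `improved` is than `baseline`, as a fraction."""
    return (baseline - improved) / baseline


# Figures copied from the comparison table above.
p50 = relative_reduction(516, 307)    # P50 latency, ms
p99 = relative_reduction(1907, 1012)  # P99 latency, ms
wer = relative_reduction(9.87, 8.14)  # word error rate, %

print(f"P50 latency: {p50:.0%} lower")  # 41%
print(f"P99 latency: {p99:.0%} lower")  # 47%
print(f"WER: {wer:.0%} lower (relative)")  # 18%
```

The 41% figure corresponds to the P50 row; the P99 gap is somewhat larger in relative terms.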
Prerequisites
- Python 3.11+
- AssemblyAI API key
- Daily.co API key
- OpenAI API key
- Cartesia API key
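Before starting the bot, it can help to confirm that all four keys are actually present in the environment. The sketch below is one minimal way to do that. Only `ASSEMBLYAI_API_KEY` appears in this guide's examples; the other three variable names are assumed conventions and may not match the repository's `.env.example`.

```python
import os

# ASSEMBLYAI_API_KEY is used in this guide's examples; the other three
# names are assumptions and may differ from the repo's .env.example.
REQUIRED_KEYS = (
    "ASSEMBLYAI_API_KEY",
    "DAILY_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_keys(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing a plain dictionary to `missing_keys` makes the check easy to exercise without touching the real environment.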
Quick Start
git clone https://github.com/kelseyefoster/voice-agent-pipecat-universal-3-pro
cd voice-agent-pipecat-universal-3-pro
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
python create_room.py
python bot.py --url https://your-name.daily.co/your-room
Open the room URL in your browser and begin conversing.
Key Configuration Examples
Keyterm Prompting
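import os

# Import paths as of recent Pipecat releases; they may differ in older versions.
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService

stt = AssemblyAISTTService(
    connection_params=AssemblyAIConnectionParams(
        api_key=os.environ["ASSEMBLYAI_API_KEY"],
        speech_model="u3-rt-pro",
        # Domain-specific terms the model should favor when transcribing.
        keyterms_prompt=["AssemblyAI", "Universal-3", "Pipecat", "YourBrandName"],
    )
)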
Keyterm prompting supports up to 1,000 terms per session, which is particularly valuable for medical, legal, and financial vocabulary.
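Because the list is capped per session, it may be worth cleaning it before passing it to `keyterms_prompt`. The following is a minimal sketch under that assumption; `prepare_keyterms` and `MAX_KEYTERMS` are hypothetical names, and the only fact taken from this guide is the 1,000-term limit.

```python
MAX_KEYTERMS = 1000  # per-session limit stated in this guide


def prepare_keyterms(terms):
    """Trim whitespace, drop blanks, and remove exact duplicates in first-seen order."""
    seen = set()
    cleaned = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.add(term)
            cleaned.append(term)
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(
            f"{len(cleaned)} keyterms exceeds the limit of {MAX_KEYTERMS}"
        )
    return cleaned


keyterms = prepare_keyterms(["AssemblyAI", " Pipecat ", "Pipecat", "", "Universal-3"])
print(keyterms)  # ['AssemblyAI', 'Pipecat', 'Universal-3']
```

Raising on overflow rather than silently truncating keeps it obvious when a term list has outgrown the limit.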
Real-Time Speaker Diarization
connection_params=AssemblyAIConnectionParams(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    speech_model="u3-rt-pro",
    speaker_labels=True,
    max_speakers=2,
)
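Once speaker labels are enabled, downstream code usually wants whole turns rather than individual fragments. The sketch below merges consecutive fragments from the same speaker. The `(speaker, text)` tuple shape is a hypothetical stand-in, since this guide does not show the plugin's actual diarized output format.

```python
from itertools import groupby


def merge_by_speaker(segments):
    """Join consecutive (speaker, text) segments from the same speaker into one turn."""
    turns = []
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        turns.append((speaker, " ".join(text for _, text in group)))
    return turns


# Hypothetical diarized fragments; real output will have a different shape.
segments = [("A", "Hi,"), ("A", "I need help."), ("B", "Sure."), ("A", "Thanks.")]
for speaker, text in merge_by_speaker(segments):
    print(f"Speaker {speaker}: {text}")
```

Only adjacent fragments are merged, so a speaker who talks again later starts a new turn.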
Multilingual Support
connection_params=AssemblyAIConnectionParams(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    speech_model="u3-rt-pro",
    language_detection=True,
)
Supported languages: English, Spanish, French, German, Italian, Portuguese.
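If an application routes callers by language, a small guard can flag languages outside this list early. This is a rough sketch: keying the languages by ISO 639-1 code is an assumption made for illustration, and `is_supported` is a hypothetical helper, not part of the plugin.

```python
# Languages listed in this guide, keyed by ISO 639-1 code (the keying is an assumption).
SUPPORTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
}


def is_supported(language_code: str) -> bool:
    """Accept plain codes ("es") as well as regional tags ("pt-BR", "en_US")."""
    primary = language_code.strip().replace("_", "-").split("-")[0].lower()
    return primary in SUPPORTED_LANGUAGES


print(is_supported("pt-BR"))  # True
print(is_supported("ja"))     # False
```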
Turn Detection Tuning
connection_params=AssemblyAIConnectionParams(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    speech_model="u3-rt-pro",
    end_of_turn_confidence_threshold=0.7,
    min_end_of_turn_silence_when_confident=300,
    max_turn_silence=1000,
)
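To build intuition for how these three settings might interact, here is a simplified decision rule. It is an illustrative model only, not AssemblyAI's actual implementation: it assumes the two silence values are in milliseconds, that a confident prediction ends the turn after the shorter silence, and that the longer silence ends the turn regardless of confidence. `TurnParams` and `turn_ended` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnParams:
    # Defaults mirror the values in the configuration above.
    end_of_turn_confidence_threshold: float = 0.7
    min_end_of_turn_silence_when_confident: int = 300  # assumed milliseconds
    max_turn_silence: int = 1000  # assumed milliseconds


def turn_ended(confidence: float, silence_ms: int, p: TurnParams = TurnParams()) -> bool:
    """Illustrative rule only; the real model's logic is not shown in this guide."""
    confident = confidence >= p.end_of_turn_confidence_threshold
    if confident and silence_ms >= p.min_end_of_turn_silence_when_confident:
        return True  # confident end of turn after a short pause
    return silence_ms >= p.max_turn_silence  # fallback: long silence ends the turn


print(turn_ended(0.9, 350))   # True: confident, and past the short pause
print(turn_ended(0.4, 600))   # False: not confident, silence below the maximum
print(turn_ended(0.4, 1200))  # True: silence exceeded the maximum
```

Under this model, raising the confidence threshold or the minimum silence makes the agent wait longer before responding, while lowering `max_turn_silence` bounds how long it waits when the model is unsure.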
Deploy to PipecatCloud
pip install pipecatcloud
pcc auth login
pcc init
pcc secrets set my-agent-secrets --file .env
pcc deploy