Mart Schweiger

Posted on Apr 3 • Originally published at assemblyai.com

Pipecat Voice Agent with AssemblyAI Universal-3 Pro Streaming

#ai #python #tutorial #assemblyai


Build a real-time voice agent using Pipecat — Daily.co's open-source Voice AI framework — with the AssemblyAI Universal-3 Pro Streaming model as the speech-to-text engine.

Pipecat's modular pipeline design means you can swap any component without touching the rest. AssemblyAI has a first-party Pipecat plugin with full Universal-3 Pro Streaming support — no manual WebSocket wiring required.
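The swap-one-stage idea can be shown with a conceptual toy. This is not Pipecat's API: in Pipecat, stages are `FrameProcessor` subclasses passed to `Pipeline([...])`. Here each stage is a plain function, and every name (`fake_stt`, `other_stt`, `run_pipeline`, and so on) is hypothetical and used only for illustration.

```python
# Conceptual toy of a frame pipeline; NOT Pipecat's real API.
from typing import Callable, List

Processor = Callable[[str], str]


def fake_stt(audio: str) -> str:
    # Stand-in speech-to-text stage: "transcribes" by stripping a prefix.
    return audio.removeprefix("audio:")


def other_stt(audio: str) -> str:
    # A second stand-in STT with different behavior, used to show swapping.
    return audio.removeprefix("audio:").upper()


def fake_llm(text: str) -> str:
    # Stand-in LLM stage: echoes the transcript back.
    return f"You said: {text}"


def fake_tts(text: str) -> str:
    # Stand-in text-to-speech stage: wraps the reply as "speech".
    return f"speech<{text}>"


def run_pipeline(processors: List[Processor], frame: str) -> str:
    # Each stage only sees the previous stage's output, so any stage can be replaced.
    for process in processors:
        frame = process(frame)
    return frame


pipeline = [fake_stt, fake_llm, fake_tts]
swapped = [other_stt, fake_llm, fake_tts]  # only the STT stage changes

print(run_pipeline(pipeline, "audio:hello"))
print(run_pipeline(swapped, "audio:hello"))
```

The LLM and TTS stages never change when the STT stage is swapped, which is the property the real pipeline relies on when you drop in AssemblyAI as the speech-to-text service.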

Why AssemblyAI in Pipecat?

Metric	AssemblyAI Universal-3 Pro	Deepgram Nova-3
P50 latency	307 ms	516 ms
P99 latency	1,012 ms	1,907 ms
Word Error Rate	8.14%	9.87%
Neural turn detection	✅	❌ (VAD only)
Mid-session prompting	✅	❌
Anti-hallucination	✅	❌
Real-time diarization	✅	❌

The 41% lower P50 latency is noticeable in live conversation, and neural turn detection means fewer awkward double-responses when users pause mid-thought.
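The 41% figure is the relative reduction in P50 latency, (516 − 307) / 516. The same arithmetic can be applied to the other numeric rows of the table; this small sketch uses only the numbers quoted above.

```python
# Figures copied from the comparison table above (latency in ms, WER in %).
assemblyai = {"p50_ms": 307, "p99_ms": 1012, "wer_pct": 8.14}
deepgram = {"p50_ms": 516, "p99_ms": 1907, "wer_pct": 9.87}


def relative_reduction(ours: float, theirs: float) -> float:
    """Percentage by which `ours` is lower than `theirs`."""
    return (theirs - ours) / theirs * 100


for metric in assemblyai:
    pct = relative_reduction(assemblyai[metric], deepgram[metric])
    print(f"{metric}: {pct:.1f}% lower")
```

This works out to about 40.5% lower P50 latency (the rounded 41%), about 46.9% lower P99 latency, and about 17.5% lower relative word error rate.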

Prerequisites

Python 3.11+
AssemblyAI API key
Daily.co API key
OpenAI API key
Cartesia API key
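Since the bot needs all four keys, a quick pre-flight check can catch a missing one before a session starts. This is a minimal sketch: `ASSEMBLYAI_API_KEY` is the name used in the configuration examples below, but `DAILY_API_KEY`, `OPENAI_API_KEY`, and `CARTESIA_API_KEY` are assumed names, so match them to the repo's `.env.example`. The `missing_keys` helper is hypothetical.

```python
from typing import Iterable, List, Mapping

# ASSEMBLYAI_API_KEY matches the config examples; the other three names are
# assumptions and should be checked against the repo's .env.example.
REQUIRED_KEYS = (
    "ASSEMBLYAI_API_KEY",
    "DAILY_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
)


def missing_keys(env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required variable names that are unset or blank, in order."""
    return [name for name in required if not env.get(name, "").strip()]


print(missing_keys({"ASSEMBLYAI_API_KEY": "abc", "OPENAI_API_KEY": " "}))
```

In practice you would call `missing_keys(os.environ)` after loading `.env` into the environment (for example with python-dotenv's `load_dotenv()`) and exit early if the list is non-empty.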

Quick Start

git clone https://github.com/kelseyefoster/voice-agent-pipecat-universal-3-pro
cd voice-agent-pipecat-universal-3-pro

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

cp .env.example .env   # then add your API keys to .env

python create_room.py  # creates a Daily room to join
python bot.py --url https://your-name.daily.co/your-room   # your Daily room URL

Open the room URL in your browser and begin conversing.
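The contents of `create_room.py` aren't shown here. As an assumption about what such a script does, Daily's REST API creates a room with `POST https://api.daily.co/v1/rooms` (authenticated with a Bearer API key) and returns JSON that includes the room's `url`. The sketch below builds that request with the standard library; the helper names and the one-hour expiry are illustrative choices, not taken from the repo.

```python
import json
import time
import urllib.request

DAILY_ROOMS_URL = "https://api.daily.co/v1/rooms"


def build_room_request(api_key: str, ttl_seconds: int = 3600) -> urllib.request.Request:
    """Build (but do not send) a request that creates a Daily room expiring after ttl_seconds."""
    # "exp" is a Unix timestamp after which the room can no longer be joined.
    body = {"properties": {"exp": int(time.time()) + ttl_seconds}}
    return urllib.request.Request(
        DAILY_ROOMS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_room(api_key: str) -> str:
    """Send the request and return the new room's URL (needs network access and a real key)."""
    with urllib.request.urlopen(build_room_request(api_key)) as resp:
        return json.load(resp)["url"]
```

With a real key, `create_room(os.environ["DAILY_API_KEY"])` (the variable name is an assumption) would return the URL you pass to `bot.py --url`. Setting an expiry keeps test rooms from piling up in your Daily account.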

Key Configuration Examples

Keyterm Prompting

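import os

from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService

# Bias recognition toward domain vocabulary and brand names
stt = AssemblyAISTTService(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    connection_params=AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        keyterms_prompt=["AssemblyAI", "Universal-3", "Pipecat", "YourBrandName"],
    ),
)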

Keyterm prompting supports up to 1,000 terms per session, which is particularly valuable in medical, legal, and financial domains with specialized vocabulary.
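When the term list comes from a product catalog or glossary, it can be worth cleaning it before passing it as `keyterms_prompt`. This hypothetical `prepare_keyterms` helper trims whitespace, drops blanks and case-insensitive duplicates, keeps the original order, and caps the list at the 1,000-term limit quoted above.

```python
from typing import Iterable, List

MAX_KEYTERMS = 1000  # per-session limit quoted above


def prepare_keyterms(terms: Iterable[str], limit: int = MAX_KEYTERMS) -> List[str]:
    """Trim, drop blanks and case-insensitive duplicates, keep order, cap at limit."""
    seen = set()
    result: List[str] = []
    for term in terms:
        if len(result) >= limit:
            break
        cleaned = " ".join(term.split())  # also collapses internal runs of whitespace
        key = cleaned.casefold()
        if not cleaned or key in seen:
            continue
        seen.add(key)
        result.append(cleaned)
    return result


print(prepare_keyterms(["Pipecat", " pipecat ", "Universal-3", "", "AssemblyAI"]))
```

The result would be passed as `keyterms_prompt=prepare_keyterms(raw_terms)`. Treating "Pipecat" and "pipecat" as duplicates is a design choice here, not a documented requirement of the API.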

Real-Time Speaker Diarization

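stt = AssemblyAISTTService(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    connection_params=AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        speaker_labels=True,  # tag transcripts with a speaker label
        max_speakers=2,       # expected maximum number of speakers
    ),
)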
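Once transcripts carry speaker labels, a common next step is to merge consecutive fragments from the same speaker before handing them to the LLM. The exact shape of speaker-labeled transcripts inside Pipecat frames isn't shown here, so this sketch assumes a hypothetical list of `(speaker_label, text)` tuples.

```python
from typing import List, Tuple

Utterance = Tuple[str, str]  # (speaker label, text); hypothetical shape


def merge_speaker_turns(utterances: List[Utterance]) -> List[Utterance]:
    """Join consecutive utterances from the same speaker into one turn."""
    merged: List[Utterance] = []
    for speaker, text in utterances:
        if merged and merged[-1][0] == speaker:
            # Same speaker as the previous fragment: extend that turn.
            merged[-1] = (speaker, f"{merged[-1][1]} {text}")
        else:
            merged.append((speaker, text))
    return merged


transcript = [("A", "Hi,"), ("A", "I need help."), ("B", "Sure."), ("A", "Thanks.")]
for speaker, text in merge_speaker_turns(transcript):
    print(f"Speaker {speaker}: {text}")
```

This prints three turns (A, B, A), so the LLM sees who said what rather than a flat stream of fragments.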

Multilingual Support

stt = AssemblyAISTTService(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    connection_params=AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        language_detection=True,  # detect the spoken language automatically
    ),
)

Supported languages: English, Spanish, French, German, Italian, Portuguese.
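With language detection on, downstream stages (such as the LLM prompt or the TTS voice) may need to react to the detected language. The code format AssemblyAI returns isn't specified here, so this sketch assumes ISO 639-1 codes, possibly with a region suffix such as `pt-BR`, and falls back to English for anything outside the six listed languages. `resolve_language` is a hypothetical helper.

```python
from typing import Optional

# Assumed ISO 639-1 codes for the six supported languages listed above.
SUPPORTED = {"en", "es", "fr", "de", "it", "pt"}


def resolve_language(detected: Optional[str], fallback: str = "en") -> str:
    """Normalize a detected code such as 'es' or 'pt-BR'; fall back if unsupported."""
    if not detected:
        return fallback
    base = detected.strip().lower().replace("_", "-").split("-")[0]
    return base if base in SUPPORTED else fallback


for code in ["es", "pt-BR", "ja", None]:
    print(code, "->", resolve_language(code))
```

The resolved code could then select, for example, a language-specific system prompt or TTS voice.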

Turn Detection Tuning

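stt = AssemblyAISTTService(
    api_key=os.environ["ASSEMBLYAI_API_KEY"],
    connection_params=AssemblyAIConnectionParams(
        speech_model="u3-rt-pro",
        end_of_turn_confidence_threshold=0.7,        # model confidence needed to end a turn early
        min_end_of_turn_silence_when_confident=300,  # ms of silence required once confident
        max_turn_silence=1000,                       # ms of silence that ends a turn regardless
    ),
)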
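The three turn-detection parameters interact roughly as follows: a turn can end early once the model's end-of-turn confidence clears the threshold and a short silence has passed, and any silence longer than `max_turn_silence` ends the turn regardless. This is a simplified model for building intuition, not AssemblyAI's actual implementation; `TurnConfig` and `is_end_of_turn` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TurnConfig:
    # Mirrors the three parameters above; silences are in milliseconds.
    end_of_turn_confidence_threshold: float = 0.7
    min_end_of_turn_silence_when_confident: int = 300
    max_turn_silence: int = 1000


def is_end_of_turn(confidence: float, silence_ms: int, cfg: TurnConfig) -> bool:
    """Simplified model: end early when confident, otherwise after max_turn_silence."""
    confident = confidence >= cfg.end_of_turn_confidence_threshold
    if confident and silence_ms >= cfg.min_end_of_turn_silence_when_confident:
        return True
    return silence_ms >= cfg.max_turn_silence


cfg = TurnConfig()
print(is_end_of_turn(0.9, 350, cfg))   # confident, and 350 ms >= 300 ms
print(is_end_of_turn(0.4, 350, cfg))   # not confident, and 350 ms < 1000 ms
print(is_end_of_turn(0.4, 1200, cfg))  # silence cap reached
```

Under this model, raising the confidence threshold makes the agent wait for the silence cap more often (fewer interruptions, slower replies), while lowering `max_turn_silence` makes it respond sooner to hesitant speakers at the risk of cutting them off.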

Deploy to Pipecat Cloud

pip install pipecatcloud
pcc auth login
pcc init
pcc secrets set my-agent-secrets --file .env
pcc deploy
