Mart Schweiger
Posted on • Originally published at assemblyai.com

Pipecat Voice Agent with AssemblyAI Universal-3 Pro Streaming

Build a real-time voice agent using Pipecat — Daily.co's open-source Voice AI framework — with the AssemblyAI Universal-3 Pro Streaming model as the speech-to-text engine.

Pipecat's modular pipeline design means you can swap any component without touching the rest. AssemblyAI has a first-party Pipecat plugin with full Universal-3 Pro Streaming support — no manual WebSocket wiring required.
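To make the "swap any component" idea concrete, here is a toy sketch in plain Python. It does not use Pipecat's API: the stage functions and `run_pipeline` helper are hypothetical stand-ins that only illustrate how an ordered list of stages lets one stage be replaced while the others stay untouched.

```python
from typing import Callable, List

# A stage takes the previous stage's output and returns its own output.
Stage = Callable[[str], str]


def run_pipeline(stages: List[Stage], data: str) -> str:
    """Pass data through each stage in order and return the final result."""
    for stage in stages:
        data = stage(data)
    return data


# Hypothetical stand-ins for speech-to-text, LLM, and text-to-speech stages.
def stt_a(audio: str) -> str:
    return f"text({audio})"


def stt_b(audio: str) -> str:
    return f"TEXT({audio})"


def llm(text: str) -> str:
    return f"reply({text})"


def tts(text: str) -> str:
    return f"audio({text})"


pipeline_a = [stt_a, llm, tts]
pipeline_b = [stt_b, llm, tts]  # only the STT stage differs

print(run_pipeline(pipeline_a, "hi"))
print(run_pipeline(pipeline_b, "hi"))
```

In a real Pipecat app the stages are frame processors rather than plain functions, but the ordering principle is the same.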

Why AssemblyAI in Pipecat?

| Metric | AssemblyAI Universal-3 Pro | Deepgram Nova-3 |
| --- | --- | --- |
| P50 latency | 307 ms | 516 ms |
| P99 latency | 1,012 ms | 1,907 ms |
| Word Error Rate | 8.14% | 9.87% |
| Neural turn detection | ✅ | ❌ (VAD only) |
| Mid-session prompting | | |
| Anti-hallucination | | |
| Real-time diarization | | |

The 41% lower P50 latency is noticeable in live conversation, and neural turn detection means fewer awkward double-responses when users pause mid-thought.
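The 41% figure can be checked from the latency numbers quoted above. The short calculation below is only arithmetic on those published values; the helper name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(baseline_ms: float, candidate_ms: float) -> float:
    """Return the fraction by which candidate_ms is lower than baseline_ms."""
    return (baseline_ms - candidate_ms) / baseline_ms


# Latency values taken from the comparison above (Deepgram as the baseline).
p50 = relative_reduction(516, 307)
p99 = relative_reduction(1907, 1012)

print(f"P50: {p50:.1%} lower, P99: {p99:.1%} lower")
# prints "P50: 40.5% lower, P99: 46.9% lower"
```

So the 41% claim matches the P50 gap after rounding, and the P99 gap is somewhat larger.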

Prerequisites

  • Python 3.11+
  • AssemblyAI API key
  • Daily.co API key
  • OpenAI API key
  • Cartesia API key
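Before starting the bot, it can help to confirm that all four keys are actually set. The sketch below assumes the keys are exposed as environment variables; only `ASSEMBLYAI_API_KEY` appears in this article, so the other three variable names are assumptions and should be matched against the repository's `.env.example`.

```python
import os
from typing import List, Mapping

# ASSEMBLYAI_API_KEY is used later in this article; the other names are assumed.
REQUIRED_KEYS = [
    "ASSEMBLYAI_API_KEY",
    "DAILY_API_KEY",
    "OPENAI_API_KEY",
    "CARTESIA_API_KEY",
]


def missing_keys(env: Mapping[str, str], required: List[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required variable names that are absent or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```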

Quick Start

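# Clone the example repository
git clone https://github.com/kelseyefoster/voice-agent-pipecat-universal-3-pro
cd voice-agent-pipecat-universal-3-pro

# Create and activate a virtual environment, then install dependencies
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Copy the environment template, then fill in your API keys in .env
cp .env.example .env

# Create a Daily room, then start the bot with that room's URL
python create_room.py
python bot.py --url https://your-name.daily.co/your-room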

Open the room URL in your browser and begin conversing.

Key Configuration Examples

Keyterm Prompting

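import os

# Import paths assume a recent Pipecat release; adjust to your installed version.
from pipecat.services.assemblyai.models import AssemblyAIConnectionParams
from pipecat.services.assemblyai.stt import AssemblyAISTTService

stt = AssemblyAISTTService(
    connection_params=AssemblyAIConnectionParams(
        api_key=os.environ["ASSEMBLYAI_API_KEY"],
        speech_model="u3-rt-pro",
        # Domain-specific terms the model should favor when transcribing
        keyterms_prompt=["AssemblyAI", "Universal-3", "Pipecat", "YourBrandName"],
    )
)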

Keyterm prompting supports up to 1,000 terms per session, which is particularly valuable for medical, legal, and financial domains.
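If your term list comes from a product catalog or glossary, it may contain duplicates or exceed the limit. The helper below is a minimal sketch for tidying such a list before passing it as `keyterms_prompt`. The function name `prepare_keyterms`, the case-insensitive de-duplication, and the choice to truncate rather than raise are all assumptions made for illustration; the 1,000 limit is the figure quoted above.

```python
from typing import Iterable, List

MAX_KEYTERMS = 1000  # per-session limit quoted in this article


def prepare_keyterms(terms: Iterable[str], limit: int = MAX_KEYTERMS) -> List[str]:
    """Strip whitespace, drop blanks and case-insensitive duplicates, cap at limit."""
    seen = set()
    cleaned: List[str] = []
    for term in terms:
        term = term.strip()
        key = term.lower()
        if not term or key in seen:
            continue
        seen.add(key)
        cleaned.append(term)
    # Keep the earliest terms, so put the most important ones first.
    return cleaned[:limit]


print(prepare_keyterms([" Pipecat", "pipecat", "", "AssemblyAI"]))
# prints ['Pipecat', 'AssemblyAI']
```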

Real-Time Speaker Diarization

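stt = AssemblyAISTTService(
    connection_params=AssemblyAIConnectionParams(
        api_key=os.environ["ASSEMBLYAI_API_KEY"],
        speech_model="u3-rt-pro",
        speaker_labels=True,  # label transcribed speech by speaker
        max_speakers=2,  # upper bound on distinct speakers
    )
)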
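Once transcripts carry speaker labels, a common next step is to merge consecutive segments from the same speaker into readable turns. The sketch below assumes segments arrive as `(speaker, text)` tuples. That shape is hypothetical, since this article does not show how the plugin exposes diarized output, so adapt the unpacking to the frames you actually receive.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (speaker label, text), a hypothetical shape


def merge_by_speaker(segments: List[Segment]) -> List[Segment]:
    """Join consecutive segments that share a speaker into one entry."""
    merged: List[Segment] = []
    for speaker, text in segments:
        if merged and merged[-1][0] == speaker:
            # Same speaker as the previous entry: extend that entry's text.
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


turns = merge_by_speaker([("A", "Hi"), ("A", "there"), ("B", "Hello")])
print(turns)
# prints [('A', 'Hi there'), ('B', 'Hello')]
```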

Multilingual Support

stt = AssemblyAISTTService(
    connection_params=AssemblyAIConnectionParams(
        api_key=os.environ["ASSEMBLYAI_API_KEY"],
        speech_model="u3-rt-pro",
        language_detection=True,  # detect the spoken language automatically
    )
)

Supported languages: English, Spanish, French, German, Italian, Portuguese.
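If the rest of your agent branches on language (for example, to pick a TTS voice), a small lookup over the six languages listed above can guard against unexpected values. This sketch assumes languages are identified by ISO 639-1 codes, optionally with a region suffix such as `pt-BR`. How the service actually reports the detected language is not shown in this article, and `is_supported` is a hypothetical helper.

```python
# ISO 639-1 codes for the six languages listed in this article.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
    "it": "Italian",
    "pt": "Portuguese",
}


def is_supported(code: str) -> bool:
    """Return True if the base language of code is in the supported list."""
    # Normalize "pt_BR" or "PT-br" down to the base language "pt".
    base = code.strip().lower().replace("_", "-").split("-")[0]
    return base in SUPPORTED_LANGUAGES


print(is_supported("pt-BR"), is_supported("ja"))
# prints "True False"
```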

Turn Detection Tuning

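stt = AssemblyAISTTService(
    connection_params=AssemblyAIConnectionParams(
        api_key=os.environ["ASSEMBLYAI_API_KEY"],
        speech_model="u3-rt-pro",
        end_of_turn_confidence_threshold=0.7,  # confidence needed to end a turn
        min_end_of_turn_silence_when_confident=300,  # ms of silence once confident
        max_turn_silence=1000,  # ms of silence that ends the turn regardless
    )
)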
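To build intuition for how these three settings interact, here is a simplified model of the end-of-turn decision. It is a sketch of the general logic (end the turn when the model is confident and a short silence has passed, or when silence alone runs long enough), not AssemblyAI's actual implementation. The function `turn_ended` and its inputs are hypothetical.

```python
def turn_ended(
    confidence: float,
    silence_ms: int,
    *,
    end_of_turn_confidence_threshold: float = 0.7,
    min_end_of_turn_silence_when_confident: int = 300,
    max_turn_silence: int = 1000,
) -> bool:
    """Simplified end-of-turn rule using the three tuning parameters."""
    # Fallback: enough silence ends the turn regardless of confidence.
    if silence_ms >= max_turn_silence:
        return True
    # Otherwise require both a confident prediction and a short pause.
    return (
        confidence >= end_of_turn_confidence_threshold
        and silence_ms >= min_end_of_turn_silence_when_confident
    )


print(turn_ended(0.9, 350))   # confident, pause long enough
print(turn_ended(0.9, 200))   # confident, pause too short
print(turn_ended(0.2, 600))   # not confident, below the silence cap
print(turn_ended(0.2, 1000))  # silence cap reached
```

Under this model, raising the confidence threshold or either silence value makes the agent wait longer before replying, which reduces interruptions at the cost of responsiveness.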

Deploy to PipecatCloud

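# Install the Pipecat Cloud CLI and log in
pip install pipecatcloud
pcc auth login

# Initialize the project for deployment
pcc init

# Upload the keys in .env as a secret set, then deploy the agent
pcc secrets set my-agent-secrets --file .env
pcc deploy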

Resources
