Text-to-Speech (TTS) plays a crucial role in modern AI agents, especially those operating in telephony, customer support, voice bots, and conversational interfaces. If your agent needs to “talk,” you need a reliable and natural-sounding TTS system.
In this guide, we’ll explore how to integrate Papla Media’s TTS engine into the VideoSDK Agent Framework to generate smooth, high-quality speech responses from your AI agent.
Whether you're building voice assistants, IVR flows, or real-time conversational bots, Papla TTS is a strong addition to your pipeline.
Why Papla Media TTS?
Papla Media provides fast, high-quality TTS with:
- Natural, expressive voices
- Quick response times, ideal for real-time interactions
- Simple configuration & flexible model selection
- Seamless integration with VideoSDK Agent Pipelines
This makes it great for telephony agents, WhatsApp voice interactions, and AI-driven customer workflows.
Getting Started
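1) Prerequisites
- Python below 3.13 (for example, 3.12). PaplaTTS does not support Python 3.13.
- A VideoSDK account to get your VIDEOSDK_AUTH_TOKEN
- A VideoSDK meeting ID (optional: the example agent auto-creates a room if you omit it)
- API keys for Papla Media, Deepgram, and OpenAI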
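Since PaplaTTS does not run on Python 3.13, a small guard at the top of main.py can fail fast with a clear message instead of an obscure install or import error. This is a minimal sketch; check_python_version is a hypothetical helper, not part of VideoSDK or the Papla plugin.

```python
import sys
from typing import Sequence


def check_python_version(version_info: Sequence[int] = sys.version_info) -> None:
    """Raise RuntimeError on Python 3.13+, which PaplaTTS does not support."""
    major_minor = tuple(version_info[:2])
    if major_minor >= (3, 13):
        raise RuntimeError(
            f"Python {major_minor[0]}.{major_minor[1]} detected; "
            "PaplaTTS requires a Python version below 3.13."
        )
```

Call check_python_version() before any videosdk imports so the error appears first.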
2) Project Setup
Create a project folder with the following structure:
├── main.py # Core logic of your AI agent
├── requirements.txt # Python dependencies
└── .env # Store your API keys
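If you'd rather create this layout from a script, here is a small sketch using pathlib. The scaffold helper is hypothetical, and the requirements.txt line is copied from the pip install command in this guide (without the shell quotes, which requirements files don't need). Existing files are never overwritten.

```python
from pathlib import Path

# Dependency line taken from the install step of this guide
REQUIREMENTS = "videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector,papla]\n"


def scaffold(root: Path) -> None:
    """Create main.py, requirements.txt and .env under root, skipping files that exist."""
    root.mkdir(parents=True, exist_ok=True)
    files = {
        "main.py": "",
        "requirements.txt": REQUIREMENTS,
        ".env": "",
    }
    for name, content in files.items():
        path = root / name
        if not path.exists():
            path.write_text(content, encoding="utf-8")
```

For example, scaffold(Path("papla-agent")) creates the folder and its three files; you can then install with pip install -r requirements.txt.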
3) Create and activate your virtual environment
For macOS:
python3.12 -m venv venv
source venv/bin/activate
For Windows (make sure python points to a version below 3.13):
python -m venv venv
venv\Scripts\activate
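To confirm the virtual environment is actually active before installing anything, you can compare sys.prefix with sys.base_prefix: inside a venv they differ, outside one they are equal. A minimal sketch (in_virtualenv is a hypothetical helper name):

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """True when running inside a venv, where sys.prefix points at the environment."""
    return prefix != base_prefix


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}, virtual environment active: {in_virtualenv()}")
```

Run it with the activated interpreter; it should report True and a version below 3.13.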
4) Install all the dependencies
VideoSDK provides Papla support as a separate plugin:
pip install "videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector,papla]"
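To check that the install succeeded, you could look up the modules this guide imports without loading the plugins themselves. This sketch uses importlib.util.find_spec from the standard library; the module list is copied from the import statements used in this guide. Note that for dotted names, find_spec imports the parent packages (such as videosdk) first.

```python
import importlib.util

# Module paths copied from the import statements used in this guide
PLUGIN_MODULES = (
    "videosdk.agents",
    "videosdk.plugins.silero",
    "videosdk.plugins.turn_detector",
    "videosdk.plugins.deepgram",
    "videosdk.plugins.openai",
    "videosdk.plugins.papla",
)


def missing_modules(names=PLUGIN_MODULES) -> list:
    """Return the module names that cannot be located in the active environment."""
    missing = []
    for name in names:
        try:
            # For dotted names, find_spec imports the parent packages first
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # A parent package (for example "videosdk") is not installed
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules()
    print("All plugins found" if not missing else f"Missing: {', '.join(missing)}")
```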
5) Import the PaplaTTS Module
The plugin exposes the PaplaTTS class, which we can attach to a CascadingPipeline.
from videosdk.plugins.papla import PaplaTTS
from videosdk.agents import CascadingPipeline
6) Set Up Authentication
Papla requires an API key. You can generate this from your Papla Media dashboard.
Add it to your .env file, along with the keys for the other services in the pipeline:
PAPLA_API_KEY=your-papla-media-api-key
DEEPGRAM_API_KEY=your-deepgram-api-key
OPENAI_API_KEY=your-openai-api-key
VIDEOSDK_AUTH_TOKEN=your-videosdk-auth-token
API keys: get them from Papla, Deepgram, OpenAI, and ElevenLabs (if you use it), and generate your VideoSDK auth token from the VideoSDK Dashboard by following its token-generation guide.
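Missing or empty keys are a common cause of startup failures, so it can help to validate the .env file before launching the agent. In practice you would usually load it with python-dotenv's load_dotenv(); the sketch below sticks to the standard library so it can be checked in isolation. parse_env and missing_keys are hypothetical helpers; the parser handles only simple KEY=value lines (with optional spaces around = and optional quotes), not the full .env syntax.

```python
import os
from pathlib import Path

# Keys used by the pipeline in this guide
REQUIRED_KEYS = ("PAPLA_API_KEY", "DEEPGRAM_API_KEY", "OPENAI_API_KEY", "VIDEOSDK_AUTH_TOKEN")


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, allowing spaces around '=' and quoted values."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and anything that is not an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(env, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent or empty in env."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    file_values = parse_env(env_file.read_text(encoding="utf-8")) if env_file.exists() else {}
    # Variables already exported in the shell take precedence over .env values
    merged = {**file_values, **os.environ}
    print("Missing keys:", missing_keys(merged) or "none")
```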
Integrating Papla TTS into Your Agent
The integration takes two steps: create a PaplaTTS instance, then pass it to your pipeline.
Initialize the Papla TTS Engine
tts = PaplaTTS(
    # Optional: omit api_key if PAPLA_API_KEY is set in your .env
    api_key="your-papla-media-api-key",
)
Add TTS to the Agent Pipeline
pipeline = CascadingPipeline(
    tts=tts
)
Your agent pipeline now supports Papla TTS.
Every text response generated by your LLM will be passed to Papla Media and converted into speech before being sent to the user.
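To make the order of operations concrete, here is a toy model of a single turn in a cascading pipeline. The three stage functions are hypothetical stand-ins for DeepgramSTT, OpenAILLM, and PaplaTTS; they only mimic how data moves from one stage to the next and do not call any real service.

```python
import asyncio


# Hypothetical stand-ins for DeepgramSTT, OpenAILLM and PaplaTTS.
# They only mimic the order in which data flows through a cascading pipeline.
async def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the incoming "audio" is already a transcript


async def generate_reply(transcript: str) -> str:
    return f"You said: {transcript}"


async def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # a real TTS engine would return synthesized audio


async def handle_turn(audio: bytes) -> bytes:
    """One user turn: STT -> LLM -> TTS, each stage awaiting the previous one."""
    transcript = await speech_to_text(audio)
    reply = await generate_reply(transcript)
    return await text_to_speech(reply)


if __name__ == "__main__":
    print(asyncio.run(handle_turn(b"hello")))
```

This sketch ignores streaming, voice activity detection, and turn detection, which the real pipeline configures through its vad and turn_detector arguments.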
Papla TTS Configuration Options
You can customize the TTS behavior using the following parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| model_id | str | "papla_p1" | Selects the TTS model/voice |
| api_key | str | (none) | Your Papla API key (optional if set in .env) |
| base_url | str | "https://api.papla.media/v1" | Use only if you're pointing to a custom API endpoint |
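One way to keep these options in one place is a small config object that passes only the values you actually override. This is a sketch: PaplaConfig is a hypothetical helper, not part of the plugin; its field names and defaults are taken from the table above.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_BASE_URL = "https://api.papla.media/v1"


@dataclass(frozen=True)
class PaplaConfig:
    """Hypothetical helper mirroring the PaplaTTS parameters in the table above."""

    model_id: str = "papla_p1"
    api_key: Optional[str] = None
    base_url: str = DEFAULT_BASE_URL

    def as_kwargs(self) -> dict:
        """Build keyword arguments, passing only values that differ from the defaults."""
        kwargs = {"model_id": self.model_id}
        if self.api_key:
            # When omitted, the key is expected to come from PAPLA_API_KEY in .env
            kwargs["api_key"] = self.api_key
        if self.base_url != DEFAULT_BASE_URL:
            # Only needed for a custom endpoint
            kwargs["base_url"] = self.base_url
        return kwargs


# Usage in main.py (requires the papla plugin):
# tts = PaplaTTS(**PaplaConfig(model_id="papla_p1").as_kwargs())
```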
How Papla Fits in the CascadingPipeline
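VideoSDK’s CascadingPipeline processes each user turn in a structured flow: speech-to-text (Deepgram), then the LLM (OpenAI), then text-to-speech (Papla). Silero VAD detects when the user is speaking, and the turn detector decides when they have finished. The complete main.py below wires these pieces together: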
import asyncio
from videosdk.agents import Agent, AgentSession, CascadingPipeline, ConversationFlow, JobContext, RoomOptions, WorkerJob
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.papla import PaplaTTS

# Download the turn detector model ahead of time so it is ready when the session starts
pre_download_model()


class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant that can answer questions and help with tasks.")

    async def on_enter(self):
        # Greet the user when the agent joins the session
        await self.session.say("Hello! How can I help?")

    async def on_exit(self):
        # Say goodbye before the agent leaves
        await self.session.say("Goodbye!")


async def start_session(context: JobContext):
    # Create agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # Create pipeline: Deepgram STT -> OpenAI LLM -> Papla TTS
    pipeline = CascadingPipeline(
        stt=DeepgramSTT(model="nova-2", language="en"),
        llm=OpenAILLM(model="gpt-4o"),
        tts=PaplaTTS(),  # no api_key here: relies on PAPLA_API_KEY from .env
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()
        await context.shutdown()


def make_context() -> JobContext:
    room_options = RoomOptions(
        # room_id="YOUR_MEETING_ID",  # Set to join a pre-created room; omit to auto-create
        name="VideoSDK Cascaded Agent",
        playground=True
    )
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
Run your file:
python main.py
You can also run the agent in console mode:
python main.py console
Conclusion
Papla Media TTS is a powerful, fast, and easy-to-integrate solution for generating natural speech inside VideoSDK’s Agent Framework. With just a few lines of code, your agent can turn text responses into lifelike audio, well suited to telephony and other voice-first use cases.
If you're building conversational AI agents, this integration is one of the simplest ways to add high-quality TTS into your workflow.
Resources and Next Steps
- Our open-source framework for building real-time multimodal conversational AI agents: https://github.com/videosdk-live/agents
- Build a telephony agent using VideoSDK: https://docs.videosdk.live/ai_agents/ai-phone-agent-quick-start
- Build a WhatsApp voice agent using Papla: https://docs.videosdk.live/ai_agents/whatsapp-voice-agent-quick-start
💡 We’d love to hear from you!
- Did you manage to set up your first AI voice agent in Python?
- What challenges did you face while integrating the cascading pipeline?
- Are you more interested in the cascading pipeline or the realtime pipeline?
- How do you see AI voice assistants transforming customer experience in your business?
👉 Share your thoughts, roadblocks, or success stories in the comments or join our Discord community. We’re excited to learn from your journey and help you build even better AI-powered communication tools!