Chaitrali Kakde

How to Build an AI Telephony Agent in Python: Beginner’s Guide

Introduction: Why AI Telephony Agents Matter

Are you looking to create an AI voice assistant that can answer phone calls automatically? AI telephony agents are revolutionizing how businesses handle customer support and real-time communication.

In this guide, we’ll walk you through building a fully functional AI telephony agent in Python, even if you’re a beginner. By the end of this tutorial, your AI agent will be able to:

  • Answer inbound phone calls automatically
  • Make outbound calls
  • Understand human speech using Speech-to-Text (STT)
  • Respond intelligently with Large Language Models (LLM)
  • Speak naturally with Text-to-Speech (TTS)

Prerequisites for Building an AI Telephony Agent

This guide uses the realtime pipeline. If you want to use the cascading pipeline instead, follow this guide: https://docs.videosdk.live/ai_agents/core-components/cascading-pipeline

Before starting, make sure you have the following:

  1. Python 3.12+ installed on your machine.
  2. VideoSDK account to get your VIDEOSDK_TOKEN.
  3. API keys:
    • Google API key for the realtime pipeline
  4. Basic understanding of Python (functions, classes, async programming).

Tip for beginners: Store your API keys in a .env file to keep them secure.
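
A missing or empty key is an easy mistake to make at this stage, so a small startup check can save some debugging. The sketch below is only an illustration: the helper name find_missing_keys is hypothetical (it is not part of VideoSDK), and it checks only the two variable names used in this guide.

```python
# Variable names used in this guide's .env file.
REQUIRED_KEYS = ("VIDEOSDK_TOKEN", "GOOGLE_API_KEY")


def find_missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or blank in `env` (hypothetical helper)."""
    return [key for key in required if not env.get(key, "").strip()]


# Demo with a plain dict standing in for the real environment.
example_env = {"VIDEOSDK_TOKEN": "abc123", "GOOGLE_API_KEY": ""}
print(find_missing_keys(example_env))  # ['GOOGLE_API_KEY']
```

In a real script you would pass os.environ after calling load_dotenv() and stop early if the returned list is not empty.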


Project Setup

Create a project folder with the following structure:

├── main.py          # Core logic of your AI agent
├── requirements.txt # Python dependencies
└── .env             # Store your API keys

.env file example:

VIDEOSDK_TOKEN="your_videosdk_token_here"
GOOGLE_API_KEY="your_google_api_key_here"

requirements.txt example:

videosdk-agents==0.0.32
videosdk-plugins-google==0.0.32
python-dotenv==1.1.1
requests==2.31.

These libraries will handle speech recognition, AI responses, and voice synthesis.


Step 1: Writing Your AI Telephony Agent

Here’s a beginner-friendly Python script for your AI agent:

import asyncio
import traceback
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, RoomOptions, WorkerJob, Options
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from dotenv import load_dotenv
import os
import logging
logging.basicConfig(level=logging.INFO)

load_dotenv()

# Define the agent's behavior and personality
class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful AI assistant that answers phone calls. Keep your responses concise and friendly.",
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello! I'm your real-time assistant. How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye! It was great talking with you!")

async def start_session(context: JobContext):
    # Configure the Gemini model for real-time voice
    model = GeminiRealtime(
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(
            voice="Leda",
            response_modalities=["AUDIO"]
        )
    )
    pipeline = RealTimePipeline(model=model)
    session = AgentSession(agent=MyVoiceAgent(), pipeline=pipeline)

    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

if __name__ == "__main__":
    try:
        # Register the agent with a unique ID
        options = Options(
            agent_id="MyTelephonyAgent",  # CRITICAL: Unique identifier for routing
            register=True,               # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,            # Concurrent calls to handle
            host="localhost",
            port=8081,
            )
        job = WorkerJob(entrypoint=start_session, options=options)
        job.start()
    except Exception:
        traceback.print_exc()

Step 2: Running Your AI Agent Locally

# Create a virtual environment
python3 -m venv .venv

# Activate virtual environment
source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Run the AI agent
python main.py

Keep this terminal open: your agent must stay active to answer calls.


Step 3: Connect Your AI Agent to the Phone Network

1. Configure an Inbound Gateway

Purchase a Number and Create a SIP Trunk in Twilio

  • Log in to your Twilio Console.
  • Purchase a phone number if you don't already have one.
  • Create a new SIP Trunk in the Twilio Voice section.

Configure Inbound Gateway in VideoSDK

  • Open the VideoSDK Dashboard.
  • Go to Telephony > Inbound Gateway.

  • Click Add Gateway and enter your Twilio number to create an inbound gateway.

  • After creation, you will see an Inbound Gateway URI (e.g., sip:your-org-id.sip.videosdk.live). Copy this URI.

Configure Twilio SIP Trunk Origination

  • In your Twilio SIP Trunk, go to the Origination section.
  • Add the copied Inbound Gateway URI as the Origination target.
  • Save your changes.

2. Configure an Outbound Gateway

Configure Twilio SIP Trunk Termination

  • In your Twilio SIP Trunk, go to the Termination section.
  • Set up the Termination SIP URI (the address VideoSDK will use for outbound calls).

  • Add allowed IP addresses and set up authentication credentials (username and password) for the trunk.

Configure Outbound Gateway in VideoSDK

  • In the VideoSDK Dashboard, go to Telephony > Outbound Gateway.
  • Click Add Gateway and enter the Twilio Termination URI and authentication credentials.

  • Save the gateway.

Add routing rules

  • Go to Telephony > Routing Rules and click Add.

  • Configure the rule:
    • Gateway: Select the Inbound or Outbound Gateway you just created.
    • Numbers: Add the phone number associated with the gateway.
    • Dispatch: Choose Agent.
    • Agent Type: Set to Self Hosted.
    • Agent ID: Enter MyTelephonyAgent. This must match the agent_id in your main.py file.

  • Click Create to save the rule.

Initiate an Outbound Call

Once your outbound gateway is configured, you can initiate outbound calls using the VideoSDK API. This allows you to dial out from a meeting to any external phone number or SIP endpoint.

Use the following API to trigger an outbound call.

Parameters:

  • gatewayId: ID of the outbound SIP gateway to use.
  • sipCallTo: The destination phone number (E.164 format).
  • destinationRoomId: (Optional) The room ID to connect the call to.
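
Since sipCallTo expects E.164 format, it can help to check the number before sending the request. Below is a minimal sketch, assuming a simple format check is enough for your use case; the helper name is_e164 is hypothetical, and the check does not confirm that the number actually exists or is reachable.

```python
import re

# E.164: a leading "+", then a non-zero first digit, with at most 15 digits in total.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if `number` matches the E.164 shape (format check only)."""
    return E164_PATTERN.fullmatch(number) is not None


print(is_e164("+14155550123"))  # True
print(is_e164("415-555-0123"))  # False
```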

POST request to https://api.videosdk.live/v2/sip/call

Node.js


import fetch from 'node-fetch';
const options = {
    method: "POST",
    headers: {
        "Authorization": "$YOUR_TOKEN",
        "Content-Type": "application/json",
    },
    body: JSON.stringify({
        "gatewayId" : "gw_123456789",
        "sipCallTo" : "+14155550123"
    }),
};
const url= `https://api.videosdk.live/v2/sip/call`;
const response = await fetch(url, options);
const data = await response.json();
console.log(data);
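
Because the agent in this guide is written in Python and requirements.txt already lists requests, the same call can be sketched in Python. This is an illustrative translation of the Node.js example, not an official client: the function names build_call_payload and initiate_call and the 10-second timeout are assumptions, while the endpoint, headers, and body fields come from the example above.

```python
API_URL = "https://api.videosdk.live/v2/sip/call"


def build_call_payload(gateway_id, sip_call_to, destination_room_id=None):
    """Build the JSON body for the outbound call request (hypothetical helper)."""
    payload = {"gatewayId": gateway_id, "sipCallTo": sip_call_to}
    if destination_room_id is not None:
        # destinationRoomId is optional, so only send it when provided.
        payload["destinationRoomId"] = destination_room_id
    return payload


def initiate_call(token, payload, timeout=10):
    """POST the payload to the outbound call endpoint and return the parsed JSON."""
    # Imported here so the payload builder works even where requests is not installed.
    import requests

    response = requests.post(
        API_URL,
        headers={"Authorization": token, "Content-Type": "application/json"},
        json=payload,
        timeout=timeout,  # assumed value; adjust to your needs
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
    return response.json()


# Building the payload has no side effects, so it is safe to run anywhere.
print(build_call_payload("gw_123456789", "+14155550123"))
```

To place a real call, you would run something like initiate_call(os.getenv("VIDEOSDK_TOKEN"), payload) with your own gateway ID and destination number.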


RESPONSE

{
  "message": "Call initiated successfully",
  "data": {
    "callId": "call_123456789",
    "status": "INITIATED",
    "roomId": "room_123456",
    "sipCallTo": "+14155550123",
    "sipCallFrom": "+14155559876",
    "gatewayId": "gw_123456789",
    "metadata": {
      "campaignId": "cmp_123",
      "source": "crm"
    },
    "timelog": [
      {
        "status": "INITIATED",
        "timestamp": "2025-08-21T11:45:00.000Z"
      }
    ]
  }
}
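
If you trigger calls from Python, you will probably want to pull a few fields out of this response, for example to log the call ID. The sketch below assumes the response has the shape of the sample above; the helper name summarize_call is hypothetical, and the embedded JSON is a trimmed copy of that sample.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE_RESPONSE = """
{
  "message": "Call initiated successfully",
  "data": {
    "callId": "call_123456789",
    "status": "INITIATED",
    "roomId": "room_123456",
    "timelog": [
      {"status": "INITIATED", "timestamp": "2025-08-21T11:45:00.000Z"}
    ]
  }
}
"""


def summarize_call(response: dict) -> dict:
    """Extract the fields most useful for logging (hypothetical helper)."""
    data = response.get("data", {})
    timelog = data.get("timelog", [])
    return {
        "call_id": data.get("callId"),
        "status": data.get("status"),
        "room_id": data.get("roomId"),
        # Timestamp of the most recent status change, if any was recorded.
        "last_event_at": timelog[-1]["timestamp"] if timelog else None,
    }


summary = summarize_call(json.loads(SAMPLE_RESPONSE))
print(summary["call_id"], summary["status"])  # call_123456789 INITIATED
```

Using .get() with defaults keeps the helper from raising if a field is missing, which is a cautious choice when the exact response shape is not guaranteed.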

Conclusion

You’ve just learned how to build a complete AI telephony agent in Python using VideoSDK, from writing your first script to running it locally and connecting it to the global phone network. With the realtime pipeline, your agent can now answer inbound calls, make outbound calls, and interact with users in real time.

  • 🚀 Try it yourself: Clone this setup and customize your own AI voice agent today.
  • 📚 Explore more: Check out the VideoSDK documentation for more features.
  • 💡 Build smarter assistants: Experiment with different voices, languages, and AI models to create a unique experience.
  • Resources: Watch this video walkthrough for more clarity: https://youtu.be/WgEvRs0zqcI?si=qWKudU-qYIVEYXeo

💡 We’d love to hear from you!

  • Did you manage to set up your first AI telephony agent in Python?
  • What challenges did you face while integrating with SIP providers like Twilio?

👉 Share your thoughts, roadblocks, or success stories in the comments or join our Discord community. We’re excited to learn from your journey and help you build even better AI-powered communication tools!