VideoSDK makes it extremely simple for developers to build real-time conversational AI agents that run over any communication channel including web, mobile, telephony, and now WhatsApp voice calls.
With VideoSDK’s SIP Gateway, you can connect WhatsApp calls directly into your AI agent without managing telephony infrastructure, media servers, SIP stacks, codecs, or real-time streaming pipelines. VideoSDK handles everything end-to-end so you can focus on your conversation logic.
This guide walks you through building a WhatsApp AI Voice Agent powered by VideoSDK, where all call processing, audio streaming, routing, and agent execution happen inside the VideoSDK platform.
What You Can Build With VideoSDK SIP Gateway
Using VideoSDK’s Agent SDK + SIP Gateway, you can build:
- AI customer support agents
- Appointment-booking assistants
- Product recommendation bots
- Voice-driven automation
- Multi-turn conversational agents
- Custom IVR logic, decision trees, or LLM-driven flows
All of these run in real time with millisecond-level audio streaming latency.
How VideoSDK Handles a WhatsApp Voice Call
When a WhatsApp user initiates a call, the VideoSDK platform handles the entire pipeline:
- The call is forwarded via SIP from the Meta Business Platform.
- VideoSDK SIP Gateway receives the call and negotiates media.
- VideoSDK applies your configured Routing Rules.
- Your VideoSDK AI Agent is spun up or assigned automatically.
- The Agent receives real-time audio and processes it using STT → LLM → TTS.
- VideoSDK streams audio back to the caller with ultra-low latency.
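The STT → LLM → TTS step in the flow above can be pictured as three stages chained per caller turn. The following is a conceptual sketch only, not VideoSDK's implementation: `handle_turn` and the `fake_*` stages are hypothetical stand-ins that pass text around as bytes so the flow can be run without any speech models.

```python
from typing import Callable


def handle_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    llm: Callable[[str], str],
    tts: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn: caller audio in, reply audio out."""
    transcript = stt(audio)   # speech-to-text
    reply = llm(transcript)   # generate the agent's reply
    return tts(reply)         # text-to-speech


# Hypothetical stand-in stages so the sketch runs without real models.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

In a real deployment the platform streams audio continuously rather than handing over one complete buffer per turn; the sketch only shows the order of the stages.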
Prerequisites
To let VideoSDK receive WhatsApp calls, you must configure SIP forwarding on the Meta platform.
This is a one-time setup and requires:
- Meta Business Manager
- WhatsApp Business Account (WABA)
- A verified phone number
- Meta Developer App with the whatsapp_business_management permission
- A permanent user access token
Once SIP forwarding is enabled, VideoSDK becomes the call destination for your WhatsApp number.
Integrating inbound/outbound WhatsApp calls requires updating your number's settings via the Meta Graph API. This guide covers the process in Part 3: Enable WhatsApp SIP Forwarding. For a deeper understanding of the API, refer to the official Meta Graph API overview.
Part 1: Build and Run Your Custom Voice Agent
Step 1: Project Setup
Create a dedicated directory for your AI agent project and add the following files:
your-agent/
├── .env # Stores your API keys
├── requirements.txt # Lists Python dependencies
└── main.py # Your agent logic
This structure keeps your configuration clean and your code easy to manage as the agent grows.
Step 2: Add Credentials & Dependencies
1. Add Credentials
Inside your .env file, add your API keys:
VIDEOSDK_AUTH_TOKEN="your_videosdk_token_here"
GOOGLE_API_KEY="your_google_api_key_here"
- VideoSDK Auth Token: get it from your VideoSDK dashboard
- Google API Key: required for Gemini Realtime STT/LLM/TTS (if using the Google plugin)
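A missing or blank key is easier to diagnose before the agent starts than in the middle of a call. Here is a minimal sketch of such a check, assuming the two variable names from the .env file above; the helper `missing_env_vars` is a hypothetical name, not part of any VideoSDK package.

```python
import os

# Variable names taken from the .env example in this guide.
REQUIRED_VARS = ("VIDEOSDK_AUTH_TOKEN", "GOOGLE_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]
```

One way to use it is to call `missing_env_vars()` right after `load_dotenv()` in main.py and exit with a clear message if the returned list is not empty.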
2. Install Dependencies
Add the required dependencies to requirements.txt:
videosdk-agents==0.0.45
videosdk-plugins-google==0.0.45
python-dotenv==1.1.1
Step 3: Create Your AI Agent Logic
The code below uses the realtime pipeline. If you want to choose the STT, LLM, and TTS providers separately, use the cascading pipeline instead. Add the following to main.py:
import asyncio, os, traceback, logging
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, WorkerJob, Options
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
load_dotenv()

# Define the agent's behavior and personality
class MyWhatsappAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a friendly and helpful assistant answering WhatsApp calls. Keep your responses concise and clear.",
        )

    # Greet the caller when the session starts
    async def on_enter(self) -> None:
        await self.session.say("Hello! You've reached the VideoSDK assistant. How can I help you today?")

    # Say goodbye when the session ends
    async def on_exit(self) -> None:
        await self.session.say("Thank you for calling. Goodbye!")

async def start_session(context: JobContext):
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"])
    )
    pipeline = RealTimePipeline(model=model)
    session = AgentSession(agent=MyWhatsappAgent(), pipeline=pipeline)
    try:
        await context.connect()
        await session.start()
        # Keep the session alive until the job is cancelled
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

if __name__ == "__main__":
    try:
        options = Options(
            agent_id="agent1",  # CRITICAL: Unique ID for routing
            register=True,      # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,
        )
        job = WorkerJob(entrypoint=start_session, options=options)
        job.start()
    except Exception:
        traceback.print_exc()
Step 4: Run the Agent
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install packages
pip install -r requirements.txt
# Run the agent
python main.py
Part 2: Configure VideoSDK Gateways and Routing
1. Configure an Inbound Gateway
Purchase a Number and Create a SIP Trunk in Twilio
- Log in to your Twilio Console.
- Purchase a phone number if you don't already have one.
- Create a new SIP Trunk in the Twilio Voice section.
Configure Inbound Gateway in VideoSDK
- Open the VideoSDK Dashboard.
- Go to Telephony > Inbound Gateway.
- Click Add Gateway and enter your Twilio number to create an inbound gateway.
- After creation, you will see an Inbound Gateway URI (e.g., sip:your-org-id.sip.videosdk.live). Copy this URI.
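The Meta settings request in Part 3 takes a bare hostname (the example there has the shape 9WXXXXXXX.sip.videosdk.live), while the dashboard shows a full sip: URI. Assuming the hostname Meta expects is simply the host part of this Inbound Gateway URI, which is inferred from the two examples in this guide rather than confirmed here, a small helper can extract it. `sip_hostname` is a hypothetical name, and the sketch handles only simple URIs (no IPv6 literals).

```python
def sip_hostname(uri: str) -> str:
    """Return the host part of a simple SIP URI, e.g. 'sip:host' or 'sip:user@host:5060'."""
    scheme, sep, rest = uri.strip().partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"Not a SIP URI: {uri!r}")
    rest = rest.split(";", 1)[0].split("?", 1)[0]  # drop URI parameters and headers
    host_port = rest.rsplit("@", 1)[-1]            # drop optional user info
    host = host_port.split(":", 1)[0]              # drop optional port
    if not host:
        raise ValueError(f"No host found in SIP URI: {uri!r}")
    return host
```

For the example URI above, `sip_hostname("sip:your-org-id.sip.videosdk.live")` gives the string to place in the hostname field later.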
Configure Twilio SIP Trunk Origination
- In your Twilio SIP Trunk, go to the Origination section.
- Add the copied Inbound Gateway URI as the Origination target.
- Save your changes.
2. Configure an Outbound Gateway
Configure Twilio SIP Trunk Termination
- In your Twilio SIP Trunk, go to the Termination section.
- Set up the Termination SIP URI (the address VideoSDK will use for outbound calls).
- Add allowed IP addresses and set up authentication credentials (username and password) for the trunk.
Configure Outbound Gateway in VideoSDK
- In the VideoSDK Dashboard, go to Telephony > Outbound Gateway.
- Click Add Gateway and enter the Twilio Termination URI and authentication credentials.
- Save the gateway.
Add Routing Rules
- Go to Telephony > Routing Rules and click Add.
- Configure the rule:
  - Gateway: Select the inbound or outbound gateway you just created.
  - Numbers: Add the phone number associated with the gateway.
  - Dispatch: Choose Agent.
  - Agent Type: Set to Self Hosted.
  - Agent ID: Enter agent1. This must match the agent_id in your main.py file.
- Click Create to save the rule.
Part 3: Enable WhatsApp SIP Forwarding
Now, we'll instruct Meta to forward incoming WhatsApp calls to your VideoSDK Inbound Gateway. This is done via the Meta Graph API.
Step 1: API Request
Use the following curl command to update your WhatsApp phone number's settings:
curl --location 'https://graph.facebook.com/v19.0/{{phone_number_id}}/settings' \
--header 'Authorization: Bearer {{access_token}}' \
--header 'Content-Type: application/json' \
--data '{ "calling": { "status": "ENABLED", "sip": { "status": "ENABLED", "servers": [ { "hostname": "9WXXXXXXX.sip.videosdk.live" } ] }, "srtp_key_exchange_protocol": "DTLS" } }'
Replace the placeholders:
- {{phone_number_id}}: Your WhatsApp Business Phone Number ID from the Meta dashboard.
- {{access_token}}: A valid User or System User access token with the whatsapp_business_management permission.
- 9WXXXXXXX.sip.videosdk.live: The hostname of your VideoSDK Inbound Gateway from Part 2.
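If you prefer to script this step, the same request can be assembled in Python with the standard library. This is a sketch that mirrors the curl command above field for field; the function name `build_sip_forwarding_request` and its parameters are hypothetical, and the function only builds the request without sending it.

```python
import json
import urllib.request

# API version taken from the curl example in this guide.
GRAPH_API_BASE = "https://graph.facebook.com/v19.0"


def build_sip_forwarding_request(phone_number_id, access_token, gateway_hostname):
    """Build (but do not send) the settings request that enables SIP forwarding."""
    payload = {
        "calling": {
            "status": "ENABLED",
            "sip": {
                "status": "ENABLED",
                "servers": [{"hostname": gateway_hostname}],
            },
            "srtp_key_exchange_protocol": "DTLS",
        }
    }
    return urllib.request.Request(
        url=f"{GRAPH_API_BASE}/{phone_number_id}/settings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",  # curl defaults to POST when --data is given
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. Keeping construction separate from sending makes it easy to inspect the URL and body before touching a live number.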
Time to Talk! Test Your Agent
Keep Your Agent Running
Make sure your main.py script is still running locally before making or receiving calls. The agent must be active to handle any communication.
Receive an Inbound Call
- Ensure your main.py script is still running locally.
- Using a different WhatsApp account, place a voice call to your WhatsApp Business number.
- Your local agent will answer, and you'll hear its greeting. Start a conversation!
Make an Outbound Call
To have your agent initiate a call to a WhatsApp number, use the VideoSDK SIP Call API.
curl --request POST \
--url https://api.videosdk.live/v2/sip/call \
--header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
--header 'Content-Type: application/json' \
--data '{ "gatewayId": "your_outbound_gateway_id", "sipCallTo": "whatsapp_number_to_call" }'
This commands your agent to dial out through your configured outbound gateway.
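The outbound call can also be triggered from Python. The sketch below reproduces the curl command above using only the standard library; `build_outbound_call_request` and its parameter names are hypothetical, and the header and body fields are copied from the curl example rather than from a separate API reference.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this guide.
SIP_CALL_URL = "https://api.videosdk.live/v2/sip/call"


def build_outbound_call_request(videosdk_token, gateway_id, sip_call_to):
    """Build (but do not send) the request that asks VideoSDK to place a call."""
    payload = {"gatewayId": gateway_id, "sipCallTo": sip_call_to}
    return urllib.request.Request(
        url=SIP_CALL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The curl example sends the token as-is, without a "Bearer" prefix.
            "Authorization": videosdk_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send it with `urllib.request.urlopen(build_outbound_call_request(token, gateway_id, number))` once your agent is running.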
You’ve now seen how to build an AI-powered WhatsApp Voice Agent using VideoSDK—from setting up your Python agent locally to connecting it with real WhatsApp phone numbers through VideoSDK’s SIP Gateway. With the Realtime Pipeline doing the heavy lifting, your agent can answer WhatsApp calls instantly, process live audio with STT → LLM → TTS, and deliver natural, low-latency conversations without any telephony infrastructure on your end.
- Try it yourself: Clone this setup and customize your own AI voice agent today.
- Explore more: Check out the VideoSDK documentation for more features.
- Build smarter assistants: Experiment with different voices, languages, and AI models to create a unique experience.
- Resources: Watch this video walkthrough for more detail: https://youtu.be/KWfCWE8S_4U?si=f08FfapQkVCfrlGh
We’d love to hear from you!
- Did you manage to set up your first AI WhatsApp agent in Python?
- What challenges did you face while integrating with SIP providers like Twilio?
👉 Share your thoughts, roadblocks, or success stories in the comments or join our Discord community. We’re excited to learn from your journey and help you build even better AI-powered communication tools!





