Chaitrali Kakde

How to Build an AI WhatsApp Voice Agent with VideoSDK: Step-by-Step Guide

VideoSDK makes it extremely simple for developers to build real-time conversational AI agents that run over any communication channel including web, mobile, telephony, and now WhatsApp voice calls.

With VideoSDK’s SIP Gateway, you can connect WhatsApp calls directly into your AI agent without managing telephony infrastructure, media servers, SIP stacks, codecs, or real-time streaming pipelines. VideoSDK handles everything end-to-end so you can focus on your conversation logic.

This guide walks you through building a WhatsApp AI Voice Agent powered by VideoSDK, where call processing, audio streaming, routing, and agent execution all happen inside the VideoSDK platform.

What You Can Build With VideoSDK SIP Gateway

Using VideoSDK’s Agent SDK + SIP Gateway, you can build:

  • AI customer support agents
  • Appointment-booking assistants
  • Product recommendation bots
  • Voice-driven automation
  • Multi-turn conversational agents
  • Custom IVR logic, decision trees, or LLM-driven flows

All of these run in real time with millisecond-level audio streaming latency.

How VideoSDK Handles a WhatsApp Voice Call

When a WhatsApp user initiates a call, the VideoSDK platform handles the entire pipeline:

[Figure: WhatsApp architecture]

  1. The call is forwarded via SIP from the Meta Business Platform.
  2. VideoSDK SIP Gateway receives the call and negotiates media.
  3. VideoSDK applies your configured Routing Rules.
  4. Your VideoSDK AI Agent is spun up or assigned automatically.
  5. The Agent receives real-time audio and processes it using STT → LLM → TTS.
  6. VideoSDK streams audio back to the caller with ultra-low latency.
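
To make step 5 concrete, here is a minimal sketch of one STT → LLM → TTS turn. It is not VideoSDK code: the three stage functions are hypothetical stubs, and a real agent streams audio continuously instead of passing whole utterances around.

```python
from typing import Callable

# Hypothetical stage signatures, for illustration only.
SpeechToText = Callable[[bytes], str]
LanguageModel = Callable[[str], str]
TextToSpeech = Callable[[str], bytes]


def handle_turn(audio_in: bytes, stt: SpeechToText,
                llm: LanguageModel, tts: TextToSpeech) -> bytes:
    """Run one conversational turn: caller audio in, agent audio out."""
    transcript = stt(audio_in)    # speech -> text
    reply_text = llm(transcript)  # text -> reply text
    return tts(reply_text)        # reply text -> speech


# Stub stages so the sketch runs without any provider account.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


reply = handle_turn(b"hello", fake_stt, fake_llm, fake_tts)
```

Keeping the stages as separate callables mirrors the idea that each one can be swapped independently.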

Prerequisites

To let VideoSDK receive WhatsApp calls, you must configure SIP forwarding on the Meta platform.

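This is a one-time setup. As Part 3 shows, it requires your WhatsApp Business Phone Number ID and an access token with the whatsapp_business_management permission.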

Once SIP forwarding is enabled, VideoSDK becomes the call destination for your WhatsApp number.

Integrating inbound/outbound WhatsApp calls requires updating your number's settings via the Meta Graph API. This guide covers the process in Part 3: Enable WhatsApp SIP Forwarding. For a deeper understanding of the API, refer to the official Meta Graph API overview.

Part 1: Build and Run Your Custom Voice Agent

Step 1: Project Setup

Create a dedicated directory for your AI agent project and add the following files:

your-agent/
 ├── .env                  # Stores your API keys
 ├── requirements.txt      # Lists Python dependencies
 └── main.py               # Your agent logic

This structure keeps your configuration clean and your code easy to manage as the agent grows.

Step 2: Add Credentials & Dependencies

1. Add Credentials

Inside your .env file, add your API keys:

VIDEOSDK_AUTH_TOKEN="your_videosdk_token_here"
GOOGLE_API_KEY="your_google_api_key_here"
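
A missing key usually only shows up once a call comes in, so it can help to fail fast at startup. The helper below is a small sketch for that purpose; `missing_keys` is a hypothetical name, not part of VideoSDK or python-dotenv.

```python
import os

# The two variables this guide stores in .env.
REQUIRED_KEYS = ("VIDEOSDK_AUTH_TOKEN", "GOOGLE_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the required keys that are unset or blank in `env`."""
    return [key for key in required if not env.get(key, "").strip()]
```

In main.py you could call `missing_keys()` right after `load_dotenv()` and raise an error if the returned list is not empty.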

2. Install Dependencies

Add the required dependencies to requirements.txt:

videosdk-agents==0.0.45
videosdk-plugins-google==0.0.45
python-dotenv==1.1.1
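
Because the versions above are pinned, a stale virtual environment can quietly run a different release. The following sketch uses only the standard library to compare installed versions against the pins; `version_mismatches` is a hypothetical helper, and you can run it once the packages are installed in Step 4.

```python
from importlib import metadata

# Pins copied from requirements.txt above.
PINNED = {
    "videosdk-agents": "0.0.45",
    "videosdk-plugins-google": "0.0.45",
    "python-dotenv": "1.1.1",
}


def version_mismatches(pinned=PINNED):
    """Map each package to (expected, installed) where the two differ.

    `installed` is None when the package is not installed at all.
    """
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems
```

An empty dictionary means the environment matches the pins.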

Step 3: Create Your AI Agent Logic

If you want to configure the STT, LLM, and TTS providers separately, use a cascading pipeline instead. The code below uses the realtime pipeline:

import asyncio, os, traceback, logging
from videosdk.agents import Agent, AgentSession, RealTimePipeline, JobContext, WorkerJob, Options
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig
from dotenv import load_dotenv

logging.basicConfig(level=logging.INFO)
load_dotenv()

# Define the agent's behavior and personality
class MyWhatsappAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a friendly and helpful assistant answering WhatsApp calls. Keep your responses concise and clear.",
        )
    async def on_enter(self) -> None:
        await self.session.say("Hello! You've reached the VideoSDK assistant. How can I help you today?")
    async def on_exit(self) -> None:
        await self.session.say("Thank you for calling. Goodbye!")

async def start_session(context: JobContext):

    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        api_key=os.getenv("GOOGLE_API_KEY"),
        config=GeminiLiveConfig(voice="Leda", response_modalities=["AUDIO"])
    )

    pipeline = RealTimePipeline(model=model)
    session = AgentSession(agent=MyWhatsappAgent(), pipeline=pipeline)

    try:
        await context.connect()
        await session.start()
        await asyncio.Event().wait()
    finally:
        await session.close()
        await context.shutdown()

if __name__ == "__main__":
    try:
        options = Options(
            agent_id="agent1",  # CRITICAL: Unique ID for routing; must match the Agent ID in your routing rule
            register=True,      # REQUIRED: Register with VideoSDK for telephony
            max_processes=10,
        )
        job = WorkerJob(entrypoint=start_session, options=options)
        job.start()
    except Exception:
        # Print the full stack trace so startup failures are easy to diagnose
        traceback.print_exc()

Step 4: Run the Agent

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install packages
pip install -r requirements.txt

# Run the agent
python main.py

Part 2: Configure VideoSDK Gateways and Routing

1. Configure an Inbound Gateway

Purchase a Number and Create a SIP Trunk in Twilio

  • Log in to your Twilio Console.
  • Purchase a phone number if you don't already have one.
  • Create a new SIP Trunk in the Twilio Voice section.

Configure Inbound Gateway in VideoSDK

  • Open the VideoSDK Dashboard.
  • Go to Telephony > Inbound Gateway.

[Image: Inbound Gateway in the VideoSDK dashboard]

  • Click Add Gateway and enter your Twilio number to create an inbound gateway.

[Image: Inbound Gateway in the VideoSDK dashboard]

  • After creation, you will see an Inbound Gateway URI (e.g., sip:your-org-id.sip.videosdk.live). Copy this URI.
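
The Meta request in Part 3 takes a bare hostname (its example has no sip: prefix), so you will likely need to strip the scheme from this URI. Here is a small sketch that does this for simple SIP URIs; `sip_hostname` is a hypothetical helper, and it does not handle IPv6 literals or URI headers.

```python
def sip_hostname(uri: str) -> str:
    """Return the host part of a simple SIP URI.

    Accepts forms such as 'sip:host' and 'sip:user@host:5060;transport=tcp'.
    """
    scheme, sep, rest = uri.strip().partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    rest = rest.split(";", 1)[0]    # drop URI parameters
    rest = rest.rsplit("@", 1)[-1]  # drop optional user info
    host = rest.split(":", 1)[0]    # drop optional port
    if not host:
        raise ValueError(f"no host in SIP URI: {uri!r}")
    return host


host = sip_hostname("sip:your-org-id.sip.videosdk.live")
```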

Configure Twilio SIP Trunk Origination

  • In your Twilio SIP Trunk, go to the Origination section.
  • Add the copied Inbound Gateway URI as the Origination target.
  • Save your changes.

2. Configure an Outbound Gateway

Configure Twilio SIP Trunk Termination

  • In your Twilio SIP Trunk, go to the Termination section.
  • Set up the Termination SIP URI (the address VideoSDK will use for outbound calls).

  • Add allowed IP addresses and set up authentication credentials (username and password) for the trunk.

[Image: Twilio SIP URI and credentials]

Configure Outbound Gateway in VideoSDK

  • In the VideoSDK Dashboard, go to Telephony > Outbound Gateway.
  • Click Add Gateway and enter the Twilio Termination URI and authentication credentials.

[Image: Outbound Gateway in the VideoSDK dashboard]

  • Save the gateway.

3. Add Routing Rules

  • Go to Telephony > Routing Rules and click Add.

  • Configure the rule:

    • Gateway: Select the inbound or outbound gateway you just created.
    • Numbers: Add the phone number associated with the gateway.
    • Dispatch: Choose Agent.
    • Agent Type: Set to Self Hosted.
    • Agent ID: Enter agent1. This must match the agent_id in your main.py file.
  • Click Create to save the rule.
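
One way to picture what a routing rule does is as a lookup from gateway and number to an agent ID. The sketch below is only a mental model under that assumption. `RoutingRule`, `resolve_agent`, the gateway name, and the phone numbers are all hypothetical and say nothing about how VideoSDK implements routing internally.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class RoutingRule:
    gateway: str              # hypothetical gateway name
    numbers: Tuple[str, ...]  # phone numbers this rule applies to
    agent_id: str             # must equal Options(agent_id=...) in main.py


def resolve_agent(rules: Sequence[RoutingRule], gateway: str,
                  number: str) -> Optional[str]:
    """Return the agent_id of the first rule matching gateway and number."""
    for rule in rules:
        if rule.gateway == gateway and number in rule.numbers:
            return rule.agent_id
    return None


rules = [RoutingRule("whatsapp-inbound", ("+15550100",), "agent1")]
matched = resolve_agent(rules, "whatsapp-inbound", "+15550100")
```

If the Agent ID in the rule and the agent_id in main.py differ, a lookup like this finds an ID that no running worker has registered, so the call has nowhere to go.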

Part 3: Enable WhatsApp SIP Forwarding

Now, we'll instruct Meta to forward incoming WhatsApp calls to your VideoSDK Inbound Gateway. This is done via the Meta Graph API.

Step 1: API Request

Use the following curl command to update your WhatsApp phone number's settings:

curl --location 'https://graph.facebook.com/v19.0/{{phone_number_id}}/settings' \
--header 'Authorization: Bearer {{access_token}}' \
--header 'Content-Type: application/json' \
--data '{ "calling": { "status": "ENABLED", "sip": { "status": "ENABLED", "servers": [ { "hostname": "9WXXXXXXX.sip.videosdk.live" } ] }, "srtp_key_exchange_protocol": "DTLS" } }'

Replace the placeholders:

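  • {{phone_number_id}}: Your WhatsApp Business Phone Number ID from the Meta dashboard.
  • {{access_token}}: A valid User or System User access token with the whatsapp_business_management permission.
  • 9WXXXXXXX.sip.videosdk.live: The hostname from your own VideoSDK Inbound Gateway URI (created in Part 2), without the sip: prefix.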
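
If you prefer to script this step from Python, the same request can be built with the standard library. This is a sketch that mirrors the curl command above field for field. `build_settings_request` is a hypothetical helper name, and nothing is sent until you pass the result to `urllib.request.urlopen`.

```python
import json
import urllib.request


def build_settings_request(phone_number_id: str, access_token: str,
                           gateway_host: str) -> urllib.request.Request:
    """Build the POST that enables calling and SIP forwarding for a number."""
    payload = {
        "calling": {
            "status": "ENABLED",
            "sip": {
                "status": "ENABLED",
                "servers": [{"hostname": gateway_host}],
            },
            "srtp_key_exchange_protocol": "DTLS",
        }
    }
    return urllib.request.Request(
        url=f"https://graph.facebook.com/v19.0/{phone_number_id}/settings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Building the request separately from sending it lets you inspect the URL and body before touching your live number's settings.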

Time to Talk! Test Your Agent

Keep Your Agent Running

Make sure your main.py script is still running locally before making or receiving calls. The agent must be active to handle any communication.

Receive an Inbound Call

  1. Ensure your main.py script is still running locally.
  2. Using a different WhatsApp account, place a voice call to your WhatsApp Business number.
  3. Your local agent will answer, and you'll hear its greeting. Start a conversation!

Make an Outbound Call

To have your agent initiate a call to a WhatsApp number, use the VideoSDK SIP Call API.

curl --request POST \
--url https://api.videosdk.live/v2/sip/call \
--header 'Authorization: YOUR_VIDEOSDK_TOKEN' \
--header 'Content-Type: application/json' \
--data '{ "gatewayId": "your_outbound_gateway_id", "sipCallTo": "whatsapp_number_to_call" }'

This commands your agent to dial out through your configured outbound gateway.
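
To trigger outbound calls from your own backend, you can wrap the same request in Python. As a sketch, the helper below copies the endpoint, header format, and JSON fields from the curl command above; `build_outbound_call` is a hypothetical name, and the request is only sent when you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

SIP_CALL_URL = "https://api.videosdk.live/v2/sip/call"


def build_outbound_call(token: str, gateway_id: str,
                        call_to: str) -> urllib.request.Request:
    """Build the POST that asks VideoSDK to dial out through a gateway."""
    body = {"gatewayId": gateway_id, "sipCallTo": call_to}
    return urllib.request.Request(
        url=SIP_CALL_URL,
        data=json.dumps(body).encode("utf-8"),
        # The token goes in the header as-is, as in the curl example.
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )
```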

You’ve now seen how to build an AI-powered WhatsApp voice agent using VideoSDK, from setting up your Python agent locally to connecting it with real WhatsApp phone numbers through VideoSDK’s SIP Gateway. With the Realtime Pipeline doing the heavy lifting, your agent can answer WhatsApp calls as they arrive, process live audio through a single realtime model (or through STT → LLM → TTS if you switch to a cascading pipeline), and deliver natural, low-latency conversations without any telephony infrastructure on your end.

  • Try it yourself: Clone this setup and customize your own AI voice agent today.
  • Explore more: Check out the VideoSDK documentation for more features.
  • Build smarter assistants: Experiment with different voices, languages, and AI models to create a unique experience.
  • Resources: Watch this video walkthrough for more detail: https://youtu.be/KWfCWE8S_4U?si=f08FfapQkVCfrlGh

We’d love to hear from you!

  • Did you manage to set up your first AI WhatsApp agent in Python?
  • What challenges did you face while integrating with SIP providers like Twilio?

👉 Share your thoughts, roadblocks, or success stories in the comments or join our Discord community. We’re excited to learn from your journey and help you build even better AI-powered communication tools!
