How to Enable DTMF Events in a Telephony AI Agent

Not every caller wants to speak to a voice agent. In many call scenarios, users expect to press a key to make a selection, confirm an action, or move forward in a call flow. This is especially common in menu-based systems, for short responses, or in situations where speech recognition may not be reliable.

DTMF (Dual-Tone Multi-Frequency) input gives voice agents a clear and predictable way to handle these interactions. When a caller presses a key on their phone, the agent receives that input instantly and can use it to control the call flow or trigger application logic.

In this post, we’ll explore how DTMF events can be used in a VideoSDK-powered voice agent, starting from common interaction patterns and moving into how the system processes keypad input in real time.

Typical Interaction Patterns Using DTMF

DTMF input is commonly used at decision points in a call, such as:

Selecting options from a call menu
Confirming or canceling an action
Providing short numeric input
Navigating between steps in a call flow

These interactions are simple, fast, and familiar to callers, which makes them a good fit for structured voice experiences.
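
As an illustration of the "short numeric input" pattern, here is a minimal sketch of a buffer that collects key presses until the caller presses a terminator key or reaches a maximum length. The `DigitCollector` class and its parameters are hypothetical helpers written for this post, not part of the VideoSDK API, and the sketch assumes each key press arrives as a single-character string.

```python
from typing import List, Optional


class DigitCollector:
    """Buffers key presses until a terminator key or a maximum length is reached."""

    def __init__(self, terminator: str = "#", max_digits: int = 6) -> None:
        self.terminator = terminator
        self.max_digits = max_digits
        self._buffer: List[str] = []

    def feed(self, key: str) -> Optional[str]:
        """Add one key press; return the collected digits once input is complete."""
        if key == self.terminator:
            return self._flush()
        if key.isdigit():
            self._buffer.append(key)
            if len(self._buffer) >= self.max_digits:
                return self._flush()
        # Any other key (for example "*") is ignored in this sketch.
        return None

    def _flush(self) -> str:
        collected = "".join(self._buffer)
        self._buffer.clear()
        return collected


collector = DigitCollector(max_digits=4)
results = [collector.feed(key) for key in "12#"]
print(results)  # [None, None, '12']
```

Calling `feed` from a DTMF callback would let the agent wait for a complete entry, such as a PIN or an order number, before acting on it.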

How It Works

DTMF Event Detection: The agent detects key presses (0–9, *, #) from the caller during a call session.
Real-Time Processing: Each key press generates a DTMF event that is delivered to the agent immediately.
Callback Integration: A user-defined callback function handles incoming DTMF events.
Action Execution: The agent acts on the received input, for example by advancing an IVR flow, collecting user input, or triggering a workflow in your application.
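
The four stages above can be sketched without any telephony stack. The example below is a simplified simulation, not VideoSDK code: `FakeDtmfSource` is a hypothetical stand-in for the layer that detects key presses, and the action names are placeholders.

```python
import asyncio
from typing import Awaitable, Callable, List

DtmfCallback = Callable[[str], Awaitable[None]]


class FakeDtmfSource:
    """Hypothetical stand-in for the telephony layer; not part of VideoSDK."""

    def __init__(self, callback: DtmfCallback) -> None:
        self._callback = callback

    async def press(self, key: str) -> None:
        # Detection: accept only single, valid DTMF keys.
        if len(key) != 1 or key not in "0123456789*#":
            raise ValueError(f"not a DTMF key: {key!r}")
        # Real-time delivery: hand the event to the user-defined callback.
        await self._callback(key)


async def demo() -> List[str]:
    actions: List[str] = []

    async def on_dtmf(key: str) -> None:
        # Action execution: map the key to application logic.
        actions.append({"1": "sales", "2": "support"}.get(key, "repeat-menu"))

    source = FakeDtmfSource(on_dtmf)
    for key in ["1", "2", "9"]:
        await source.press(key)
    return actions


print(asyncio.run(demo()))  # ['sales', 'support', 'repeat-menu']
```

In a real agent, the gateway and SDK play the role of `FakeDtmfSource`; your code only supplies the callback.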

Step 1: Enabling DTMF Events

DTMF event detection can be enabled in two ways:

Via Dashboard:

When creating or editing a SIP gateway in the VideoSDK dashboard, enable the DTMF option.

Via API:
Set the enableDtmf parameter to true when creating or updating a SIP gateway using the API.

curl -X POST https://api.videosdk.live/v2/sip/inbound-gateways \
  -H "Authorization: $YOUR_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "Twilio Inbound Gateway",
    "enableDtmf": "true",
    "numbers": ["+0123456789"]
  }'

Once enabled, DTMF events will be detected and published for all calls routed through that gateway.
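
If you would rather create the gateway from Python, the same request can be built with the standard library. This is a sketch under a few assumptions: the payload mirrors the curl body, including the string value "true" for enableDtmf (whether the API also accepts a JSON boolean is not confirmed here), the token is passed as-is in the Authorization header, and `build_gateway_request` is a hypothetical helper name. The example only builds and prints the request; sending it is left as a commented line.

```python
import json
import urllib.request
from typing import List


def build_gateway_request(token: str, name: str, numbers: List[str]) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an inbound gateway."""
    payload = {"name": name, "enableDtmf": "true", "numbers": numbers}
    return urllib.request.Request(
        url="https://api.videosdk.live/v2/sip/inbound-gateways",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


request = build_gateway_request("YOUR_TOKEN", "Twilio Inbound Gateway", ["+0123456789"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```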

Step 2: Implementation

To set up inbound calls, outbound calls, and routing rules, check out the Quick Start Example.

from videosdk.agents import AgentSession, DTMFHandler, JobContext

async def entrypoint(ctx: JobContext):
    # `agent` is your Agent instance, created earlier in this function (see the full example).

    async def dtmf_callback(digit: int):
        if digit == 1:
            agent.instructions = "You are a Sales Representative. Your goal is to sell our products"
            await agent.session.say(
                "Routing you to Sales. Hi, I'm from Sales. How can I help you today?"
            )
        elif digit == 2:
            agent.instructions = "You are a Support Specialist. Your goal is to help customers with technical issues."
            await agent.session.say(
                "Routing you to Support. Hi, I'm from Support. What issue are you facing?"
            )
        else:
            await agent.session.say(
                "Invalid input. Press 1 for Sales or 2 for Support."
            )

    dtmf_handler = DTMFHandler(dtmf_callback)

    session = AgentSession(
        # Pass agent, pipeline, and conversation_flow here as in the full example.
        dtmf_handler=dtmf_handler,
    )
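
As menus grow, the if/elif chain can be replaced with a lookup table. The sketch below shows one way to do that. `StubAgent` and `StubSession` are hypothetical stand-ins so the example runs on its own; in a real agent you would pass your own Agent instance instead. The callback converts the incoming value with `str()` on the assumption that the key may arrive either as a number or as a string.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StubSession:
    """Hypothetical stand-in for the agent session; records what would be spoken."""
    spoken: List[str] = field(default_factory=list)

    async def say(self, text: str) -> None:
        self.spoken.append(text)


@dataclass
class StubAgent:
    """Hypothetical stand-in for the voice agent."""
    instructions: str = "You are a helpful voice assistant."
    session: StubSession = field(default_factory=StubSession)


# Key -> (new agent instructions, message spoken to the caller).
MENU: Dict[str, Tuple[str, str]] = {
    "1": ("You are a Sales Representative.", "Routing you to Sales."),
    "2": ("You are a Support Specialist.", "Routing you to Support."),
}


def make_dtmf_callback(agent: StubAgent):
    async def dtmf_callback(digit) -> None:
        # Normalize so the lookup works whether the key arrives as 1 or "1".
        entry = MENU.get(str(digit))
        if entry is None:
            await agent.session.say("Invalid input. Press 1 for Sales or 2 for Support.")
            return
        agent.instructions, greeting = entry
        await agent.session.say(greeting)

    return dtmf_callback


agent = StubAgent()
callback = make_dtmf_callback(agent)
asyncio.run(callback(2))
print(agent.instructions)    # You are a Support Specialist.
print(agent.session.spoken)  # ['Routing you to Support.']
```

Adding a menu option then means adding one entry to `MENU` rather than another branch in the callback.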

Full Working Example

import logging
from videosdk.agents import (
    Agent, AgentSession, CascadingPipeline, ConversationFlow, DTMFHandler,
    JobContext, Options, RoomOptions, WorkerJob,
)
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)
pre_download_model()

class VoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant that can answer questions."
        )

    async def on_enter(self) -> None:
        await self.session.say("Hello, how can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Goodbye!")

async def entrypoint(ctx: JobContext):

    agent = VoiceAgent()
    conversation_flow = ConversationFlow(agent)

    pipeline = CascadingPipeline(
        stt=DeepgramSTT(),
        llm=OpenAILLM(),
        tts=ElevenLabsTTS(),
        vad=SileroVAD(),
        turn_detector=TurnDetector()
    )

    async def dtmf_callback(message):
        print("DTMF message received:", message)

    dtmf_handler = DTMFHandler(dtmf_callback)

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow,
        dtmf_handler=dtmf_handler,
    )

    await session.start(wait_for_participant=True, run_until_shutdown=True)

def make_context() -> JobContext:
    room_options = RoomOptions(name="DTMF Agent Test", playground=True)
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(
        entrypoint=entrypoint,
        jobctx=make_context,
        options=Options(
            agent_id="YOUR_AGENT_ID",
            max_processes=2,
            register=True,
            host="localhost",
            port=8081,
        ),
    )
    job.start()

By enabling DTMF detection and handling events at the agent level, you can build predictable call flows, guide users through menus, and trigger application logic without interrupting the call experience. When combined with voice input, DTMF gives you more control over how users interact with your agent.

This makes DTMF a practical addition to any voice agent that needs clear, deterministic user input during a call.

Resources and Next Steps

Explore the dtmf-implementation on GitHub.
To set up inbound calls, outbound calls, and routing rules, check out the Quick Start Example.
Learn how to deploy your AI Agents.
Explore more: Check out the VideoSDK documentation for more features.
👉 Share your thoughts, roadblocks, or success stories in the comments or join our Discord community. We’re excited to learn from your journey and help you build even better AI-powered communication tools!