<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Chaitrali Kakde</title>
    <description>The latest articles on DEV Community by Chaitrali Kakde (@chaitrali_kakde).</description>
    <link>https://dev.to/chaitrali_kakde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3526274%2F4eedffba-502b-483d-b9d2-d0ea6a83cc45.jpg</url>
      <title>DEV Community: Chaitrali Kakde</title>
      <link>https://dev.to/chaitrali_kakde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chaitrali_kakde"/>
    <language>en</language>
    <item>
      <title>Build AI Voice Agent with Gemini 3.1 Flash Live and VideoSDK</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Tue, 31 Mar 2026 13:20:13 +0000</pubDate>
      <link>https://dev.to/video-sdk/build-ai-voice-agent-with-gemini-31-flash-live-and-videosdk-12ep</link>
      <guid>https://dev.to/video-sdk/build-ai-voice-agent-with-gemini-31-flash-live-and-videosdk-12ep</guid>
      <description>&lt;p&gt;Google just launched Gemini 3.1 Flash Live Preview, its most capable real-time voice and audio model yet. If you're building AI voice agents, conversational apps, or anything that needs low-latency audio intelligence, this model is a big deal. And with VideoSDK's Python SDK, plugging it into your app takes just a few minutes.&lt;/p&gt;

&lt;p&gt;In this blog, we'll walk through what the new model can do, and then build a working voice agent step by step using VideoSDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New in Gemini 3.1 Flash Live Preview
&lt;/h2&gt;

&lt;p&gt;Google describes this as its "highest-quality audio and voice model yet," and there are a few things that actually back that up.&lt;/p&gt;

&lt;p&gt;It's built for real-time, audio-first experiences. Unlike models that convert speech to text and then process it, Gemini 3.1 Flash Live works audio-to-audio, meaning it hears you and responds as audio, keeping the conversation feeling natural and fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here's what stands out:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Lower latency than before. Compared to 2.5 Flash Native Audio, this model is noticeably faster. Fewer awkward pauses, snappier responses. That matters a lot when you're building voice agents where delays break the experience.&lt;/li&gt;
&lt;li&gt;It actually understands how you say things. The model picks up on acoustic nuances such as pitch, pace, and tone, so it can tell when you're asking a casual question vs. when you sound urgent or confused.&lt;/li&gt;
&lt;li&gt;Better background noise handling. It filters out noise more effectively, which means it works in real environments, not just quiet studios.&lt;/li&gt;
&lt;li&gt;Multilingual out of the box. Over 90 languages supported for real-time conversations.&lt;/li&gt;
&lt;li&gt;Longer conversation memory. It can follow the thread of a conversation for twice as long as the previous generation. So your agent won't "forget" what was said earlier in a long session.&lt;/li&gt;
&lt;li&gt;Tool use during live conversations. This one is huge for agent builders. The model can now trigger external tools (APIs, functions, searches) while a live conversation is happening, not just at the end of a turn.&lt;/li&gt;
&lt;li&gt;Multimodal awareness. It handles audio and video inputs together, so you can build agents that respond to what they see and hear at the same time.&lt;/li&gt;
&lt;li&gt;The model ID is &lt;code&gt;gemini-3.1-flash-live-preview&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
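&lt;p&gt;To make the tool-use point concrete, here is a rough sketch of a function declaration in the JSON-schema shape the Gemini Live API uses for tool calling. The &lt;code&gt;get_weather&lt;/code&gt; tool and its parameters are illustrative examples, not part of any SDK:&lt;/p&gt;

```python
# Illustrative only: a tool declaration in the function_declarations
# format the Gemini Live API accepts. get_weather is a made-up example,
# not a real API or SDK helper.
def make_weather_tool():
    return {
        "function_declarations": [
            {
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string", "description": "City name"},
                    },
                    "required": ["city"],
                },
            }
        ]
    }
```

When the model decides mid-conversation that it needs the tool, it emits a function call with these arguments; your code runs the function and streams the result back into the live session.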

&lt;h2&gt;
  
  
  Building a Voice Agent with VideoSDK
&lt;/h2&gt;

&lt;p&gt;VideoSDK gives you everything you need to wire Gemini 3.1 Flash Live into a real voice application. Here's how to get set up from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create and Activate a Python Virtual Environment
&lt;/h2&gt;

&lt;p&gt;First, create a clean Python environment so your project dependencies stay isolated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python3&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="n"&gt;venv&lt;/span&gt; &lt;span class="n"&gt;venv&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Activate it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS/Linux&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="n"&gt;venv&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nb"&gt;bin&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;activate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;venv&lt;/span&gt;\&lt;span class="n"&gt;Scripts&lt;/span&gt;\&lt;span class="n"&gt;activate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see &lt;code&gt;(venv)&lt;/code&gt; in your terminal prompt, which means you're good to go.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Set Up Your Environment Variables
&lt;/h2&gt;

&lt;p&gt;Create a .env file in your project root and add your API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;VIDEOSDK_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_videosdk_token_here&lt;/span&gt;
&lt;span class="n"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_google_api_key_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can get your VideoSDK auth token from the VideoSDK dashboard and your Google API key from Google AI Studio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; when &lt;code&gt;GOOGLE_API_KEY&lt;/code&gt; is set in your .env file, do not pass &lt;code&gt;api_key&lt;/code&gt; as a parameter in your code; the SDK picks it up automatically.&lt;/p&gt;
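&lt;p&gt;As a quick sanity check, you can verify both variables are actually visible to your process before the agent starts. This small helper is our own convenience, not part of the SDK, and it assumes the .env file has already been loaded (e.g. via python-dotenv):&lt;/p&gt;

```python
import os

# The two variables this tutorial's .env file defines.
REQUIRED_VARS = ("VIDEOSDK_AUTH_TOKEN", "GOOGLE_API_KEY")

def missing_env_vars(required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not os.getenv(name)]

# Example guard you might place at the top of main.py:
# if missing_env_vars():
#     raise SystemExit(f"Missing keys: {missing_env_vars()}")
```

Failing fast here is nicer than a cryptic authentication error deep inside the session startup.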

&lt;h2&gt;
  
  
  Step 3: Install the Required Packages
&lt;/h2&gt;

&lt;p&gt;Install VideoSDK's agents SDK along with the Google plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;videosdk-agents[google]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Create Your Agent (main.py)
&lt;/h2&gt;

&lt;p&gt;Create a file called main.py in your project folder and paste in the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WorkerJob&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GeminiRealtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GeminiLiveConfig&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%(asctime)s - %(name)s - %(levelname)s - %(message)s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyVoiceAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You Are VideoSDK&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Voice Agent.You are a helpful voice assistant that can answer questions and help with tasks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, how can I help you today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goodbye!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyVoiceAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GeminiRealtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-live-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;# When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
&lt;/span&gt;        &lt;span class="c1"&gt;# api_key="AIXXXXXXXXXXXXXXXXXXXX", 
&lt;/span&gt;        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;GeminiLiveConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Leda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr.
&lt;/span&gt;            &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_for_participant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_until_shutdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;room_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="c1"&gt;# room_id="&amp;lt;room_id&amp;gt;", # Replace it with your actual room_id
&lt;/span&gt;        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gemini Realtime Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;playground&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorkerJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entrypoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  To run the agent:
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you run this command, a playground URL will appear in your terminal. You can use this URL to interact with your AI agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Can You Build With This?
&lt;/h2&gt;

&lt;p&gt;Gemini 3.1 Flash Live + VideoSDK opens up a pretty wide range of real-world use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer support voice bots. Replace or supplement your call center with agents that actually understand tone and can handle multilingual customers in real time.&lt;/li&gt;
&lt;li&gt;AI meeting assistants. Agents that join calls, take notes, answer questions from participants, and trigger follow-up actions mid-conversation.&lt;/li&gt;
&lt;li&gt;Healthcare intake agents. Voice-based triage agents that collect patient information, ask follow-up questions, and route to the right department all in a natural spoken conversation.&lt;/li&gt;
&lt;li&gt;Language tutors. Real-time conversation partners that catch pronunciation issues, adjust their pace based on the learner, and respond naturally.&lt;/li&gt;
&lt;li&gt;Voice-controlled IoT and home automation. Agents that listen continuously, understand context, and trigger device actions through tool use all in sub-second response times.&lt;/li&gt;
&lt;li&gt;Live interview prep tools. Candidates practice answering questions aloud and get spoken feedback instantly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Gemini 3.1 Flash Live Preview is a meaningful step forward for real-time voice AI. The improvements in latency, noise handling, multilingual support, and especially live tool use make it a strong foundation for production voice agents.&lt;/p&gt;

&lt;p&gt;VideoSDK wraps all of that into a clean Python SDK that gets you from zero to a running agent in a handful of lines. Whether you're prototyping or building something you intend to ship, the setup here gives you a solid starting point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps and Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Check the &lt;a href="https://docs.videosdk.live/ai_agents/plugins/realtime/google-live-api" rel="noopener noreferrer"&gt;Gemini 3.1 implementation docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://docs.videosdk.live/ai_agents/deployments/introduction" rel="noopener noreferrer"&gt;deploy your agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;👉 Share your thoughts, roadblocks, or success stories in the comments, or join our &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;Discord community&lt;/a&gt;. We’re excited to learn from your journey and help you build even better AI-powered communication.&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Google Gemini 3.1 Flash Live is more powerful than you think</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Tue, 31 Mar 2026 10:27:49 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/building-real-time-ai-voice-agents-with-google-gemini-31-flash-live-and-videosdk-26f5</link>
      <guid>https://dev.to/chaitrali_kakde/building-real-time-ai-voice-agents-with-google-gemini-31-flash-live-and-videosdk-26f5</guid>
      <description>&lt;p&gt;Google just launched Gemini 3.1 Flash Live Preview, its most capable real-time voice and audio model yet. If you're building AI voice agents, conversational apps, or anything that needs low-latency audio intelligence, this model is a big deal. And with VideoSDK's Python SDK, plugging it into your app takes just a few minutes.&lt;/p&gt;

&lt;p&gt;In this blog, we'll walk through what the new model can do, and then build a working voice agent step by step using VideoSDK.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New in Gemini 3.1 Flash Live Preview
&lt;/h2&gt;

&lt;p&gt;Google describes this as its "highest-quality audio and voice model yet," and there are a few things that actually back that up.&lt;/p&gt;

&lt;p&gt;It's built for real-time, audio-first experiences. Unlike models that convert speech to text and then process it, Gemini 3.1 Flash Live works audio-to-audio, meaning it hears you and responds as audio, keeping the conversation feeling natural and fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Here's what stands out:
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Lower latency than before. Compared to 2.5 Flash Native Audio, this model is noticeably faster. Fewer awkward pauses, snappier responses. That matters a lot when you're building voice agents where delays break the experience.&lt;/li&gt;
&lt;li&gt;It actually understands how you say things. The model picks up on acoustic nuances such as pitch, pace, and tone, so it can tell when you're asking a casual question vs. when you sound urgent or confused.&lt;/li&gt;
&lt;li&gt;Better background noise handling. It filters out noise more effectively, which means it works in real environments, not just quiet studios.&lt;/li&gt;
&lt;li&gt;Multilingual out of the box. Over 90 languages supported for real-time conversations.&lt;/li&gt;
&lt;li&gt;Longer conversation memory. It can follow the thread of a conversation for twice as long as the previous generation. So your agent won't "forget" what was said earlier in a long session.&lt;/li&gt;
&lt;li&gt;Tool use during live conversations. This one is huge for agent builders. The model can now trigger external tools (APIs, functions, searches) while a live conversation is happening, not just at the end of a turn.&lt;/li&gt;
&lt;li&gt;Multimodal awareness. It handles audio and video inputs together, so you can build agents that respond to what they see and hear at the same time.&lt;/li&gt;
&lt;li&gt;The model ID is &lt;code&gt;gemini-3.1-flash-live-preview&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Building a Voice Agent with VideoSDK
&lt;/h2&gt;

&lt;p&gt;VideoSDK gives you everything you need to wire Gemini 3.1 Flash Live into a real voice application. Here's how to get set up from scratch.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1: Create and Activate a Python Virtual Environment
&lt;/h2&gt;

&lt;p&gt;First, create a clean Python environment so your project dependencies stay isolated.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python3&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="n"&gt;venv&lt;/span&gt; &lt;span class="n"&gt;venv&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Activate it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;macOS/Linux&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;source&lt;/span&gt; &lt;span class="n"&gt;venv&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nb"&gt;bin&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;activate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Windows&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;venv&lt;/span&gt;\&lt;span class="n"&gt;Scripts&lt;/span&gt;\&lt;span class="n"&gt;activate&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see &lt;code&gt;(venv)&lt;/code&gt; in your terminal prompt, which means you're good to go.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 2: Set Up Your Environment Variables
&lt;/h2&gt;

&lt;p&gt;Create a .env file in your project root and add your API keys:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;VIDEOSDK_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_videosdk_token_here&lt;/span&gt;
&lt;span class="n"&gt;GOOGLE_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_google_api_key_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can get your VideoSDK auth token from the VideoSDK dashboard and your Google API key from Google AI Studio.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; when &lt;code&gt;GOOGLE_API_KEY&lt;/code&gt; is set in your .env file, do not pass &lt;code&gt;api_key&lt;/code&gt; as a parameter in your code; the SDK picks it up automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 3: Install the Required Packages
&lt;/h2&gt;

&lt;p&gt;Install VideoSDK's agents SDK along with the Google plugin:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;videosdk-agents[google]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Step 4: Create Your Agent (main.py)
&lt;/h2&gt;

&lt;p&gt;Create a file called main.py in your project folder and paste in the following code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WorkerJob&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GeminiRealtime&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GeminiLiveConfig&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%(asctime)s - %(name)s - %(levelname)s - %(message)s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyVoiceAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You Are VideoSDK&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s Voice Agent.You are a helpful voice assistant that can answer questions and help with tasks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, how can I help you today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goodbye!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyVoiceAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GeminiRealtime&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-live-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="c1"&gt;# When GOOGLE_API_KEY is set in .env - DON'T pass api_key parameter
&lt;/span&gt;        &lt;span class="c1"&gt;# api_key="AIXXXXXXXXXXXXXXXXXXXX", 
&lt;/span&gt;        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;GeminiLiveConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Leda&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, and Zephyr.
&lt;/span&gt;            &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_for_participant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_until_shutdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;room_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="c1"&gt;# room_id="&amp;lt;room_id&amp;gt;", # Replace it with your actual room_id
&lt;/span&gt;        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Gemini Realtime Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;playground&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorkerJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entrypoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Run the Agent
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once you run this command, a playground URL will appear in your terminal. You can use this URL to interact with your AI agent.&lt;/p&gt;
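&lt;p&gt;The agent above expects GOOGLE_API_KEY to be set in a .env file (see the comment in the code). If you prefer not to add the python-dotenv dependency, a minimal loader sketch looks like this; load_dotenv() from python-dotenv is the standard alternative:&lt;/p&gt;

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: put KEY=value pairs into os.environ.

    python-dotenv's load_dotenv() does the same job (and more); this is
    just a dependency-free sketch for local development.
    """
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without KEY=value shape
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: never clobber variables set in the real environment
            os.environ.setdefault(key.strip(), value.strip().strip('"'))
```

&lt;p&gt;Call load_env() at the top of main.py, before constructing GeminiRealtime, and leave the api_key parameter out as the comment in the code suggests.&lt;/p&gt;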

&lt;h2&gt;
  
  
  What Can You Build With This?
&lt;/h2&gt;

&lt;p&gt;Gemini 3.1 Flash Live + VideoSDK opens up a pretty wide range of real-world use cases:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Customer support voice bots. Replace or supplement your call center with agents that actually understand tone and can handle multilingual customers in real time.&lt;/li&gt;
&lt;li&gt;AI meeting assistants. Agents that join calls, take notes, answer questions from participants, and trigger follow-up actions mid-conversation.&lt;/li&gt;
&lt;li&gt;Healthcare intake agents. Voice-based triage agents that collect patient information, ask follow-up questions, and route to the right department, all in a natural spoken conversation.&lt;/li&gt;
&lt;li&gt;Language tutors. Real-time conversation partners that catch pronunciation issues, adjust their pace based on the learner, and respond naturally.&lt;/li&gt;
&lt;li&gt;Voice-controlled IoT and home automation. Agents that listen continuously, understand context, and trigger device actions through tool use, all with sub-second response times.&lt;/li&gt;
&lt;li&gt;Live interview prep tools. Candidates practice answering questions aloud and get spoken feedback instantly.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Gemini 3.1 Flash Live Preview is a meaningful step forward for real-time voice AI. The improvements in latency, noise handling, multilingual support, and especially live tool use make it a strong foundation for production voice agents.&lt;/p&gt;

&lt;p&gt;VideoSDK wraps all of that into a clean Python SDK that gets you from zero to a running agent in a handful of lines. Whether you're prototyping or building something you intend to ship, the setup here gives you a solid starting point.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps and Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Check the &lt;a href="https://docs.videosdk.live/ai_agents/plugins/realtime/google-live-api" rel="noopener noreferrer"&gt;Gemini 3.1 implementation docs&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://docs.videosdk.live/ai_agents/deployments/introduction" rel="noopener noreferrer"&gt;deploy your agents&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;👉 Share your thoughts, roadblocks, or success stories in the comments or join our &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;Discord community&lt;/a&gt; ↗. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Why You Don’t Need 3 API Keys to Build an AI Voice Agent</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Fri, 20 Feb 2026 09:54:33 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/why-you-dont-need-3-api-keys-to-build-an-ai-voice-agent-3jmm</link>
      <guid>https://dev.to/chaitrali_kakde/why-you-dont-need-3-api-keys-to-build-an-ai-voice-agent-3jmm</guid>
      <description>&lt;p&gt;Building AI voice agents used to mean juggling multiple providers — one for speech-to-text (STT), another for language models (LLMs), and yet another for text-to-speech (TTS). Each came with separate API keys, dashboards, billing, quotas, integration headaches, and failure points. The result? Powerful systems but slow to build, hard to maintain, and painful to scale.&lt;/p&gt;

&lt;p&gt;Today, that complexity is no longer necessary.&lt;/p&gt;

&lt;p&gt;With Inferencing in VideoSDK AI Voice Agents, you don’t need three different API keys or vendor accounts. Everything (STT, LLM, TTS, and realtime models) runs through a single unified platform, directly inside your voice pipeline, via the Agent Runtime Dashboard and the Python Agents SDK.&lt;/p&gt;

&lt;p&gt;Inferencing works seamlessly with both the CascadingPipeline and the RealtimePipeline, giving you the flexibility to build modular, staged agents or fully streaming, low-latency voice experiences. Whether you need incremental transcripts, tool-calling workflows, or native realtime audio conversations, VideoSDK lets you do it all — without the API chaos.&lt;/p&gt;

&lt;p&gt;Sign up to the &lt;a href="https://dub.sh/zXYQt7V" rel="noopener noreferrer"&gt;VideoSDK Dashboard&lt;/a&gt; to try Inference.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is VideoSDK Inference?
&lt;/h2&gt;

&lt;p&gt;VideoSDK Inference is a managed gateway that gives you access to multiple AI models without requiring your own API keys for providers like Sarvam AI or Google Gemini.&lt;/p&gt;

&lt;p&gt;Authentication, routing, retries, and billing are handled by VideoSDK; usage is simply charged against your VideoSDK account balance.&lt;/p&gt;
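&lt;p&gt;To make the retry point concrete: conceptually, the gateway runs logic like this between your pipeline and each provider, so transient errors never reach your agent code. This is a sketch of the general technique, not VideoSDK internals:&lt;/p&gt;

```python
import time

def call_with_retries(request_fn, max_attempts=3, base_delay=0.5):
    """Retry a provider call with exponential backoff.

    A managed inference gateway runs logic like this on your behalf, so
    pipeline code never sees transient provider errors. Conceptual
    sketch only, not VideoSDK's implementation.
    """
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ConnectionError:
            # Last attempt: give up and surface the error
            if attempt == max_attempts - 1:
                raise
            # Back off: base_delay, 2x, 4x, ... between attempts
            time.sleep(base_delay * (2 ** attempt))
```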

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fme0goxxv5c6xjv2hv17p.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fme0goxxv5c6xjv2hv17p.webp" alt=" " width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Supported Categories&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;STT: Sarvam, Google, Deepgram&lt;/li&gt;
&lt;li&gt;LLMs: Google Gemini&lt;/li&gt;
&lt;li&gt;TTS: Sarvam, Google, Cartesia&lt;/li&gt;
&lt;li&gt;Realtime: Gemini Native Audio&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Inferencing via &lt;a href="https://dub.sh/zXYQt7V" rel="noopener noreferrer"&gt;Agent Runtime Dashboard&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Inferencing in VideoSDK is now fully accessible through the dashboard, giving developers direct control over model selection and pipeline configuration without needing to manage infrastructure manually.&lt;/p&gt;

&lt;p&gt;Sign up at the &lt;a href="https://dub.sh/zXYQt7V" rel="noopener noreferrer"&gt;VideoSDK Dashboard&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dub.sh/zXYQt7V" rel="noopener noreferrer"&gt;From the dashboard, developers can&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select STT, LLM, TTS, or Realtime models and enable them in the pipeline with a single click.&lt;/li&gt;
&lt;li&gt;Switch providers instantly, allowing rapid experimentation and iteration.&lt;/li&gt;
&lt;li&gt;Attach deployment endpoints for web or telephony, making the agent immediately accessible to users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this approach, ideas move from configuration to live, interactive conversations in minutes, making it possible to test new workflows, swap models, or iterate on conversational design almost instantly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Inferencing via Code (Agents SDK)
&lt;/h2&gt;

&lt;p&gt;With VideoSDK Inferencing, developers can now integrate STT, LLM, TTS, and Realtime models directly into their voice agents, all handled inside VideoSDK. This enables rapid experimentation, modular pipelines, and low-latency real-time conversations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;The Inference plugin is included in the core VideoSDK Agents SDK. Install it via:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;videosdk&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Importing Inference Classes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can import the Inference classes directly from videosdk.agents.inference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from videosdk.agents.inference import STT, LLM, TTS, Realtime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CascadingPipeline Example
&lt;/h2&gt;

&lt;p&gt;The CascadingPipeline is ideal for modular, stage-by-stage processing. Here’s an example of building a simple agent using STT, LLM, and TTS via the VideoSDK Inference Gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;pipeline = CascadingPipeline(
        stt=STT.sarvam(model_id="saarika:v2.5", language="en-IN"),
        llm=LLM.google(model="gemini-2.5-flash"),
        tts=TTS.sarvam(model_id="bulbul:v2", speaker="anushka", language="en-IN"),
        vad=SileroVAD()
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
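&lt;p&gt;To see what stage-by-stage processing buys you, here is a toy model of the cascading flow with plain Python stand-ins (the function names are illustrative, not the SDK's API): each stage finishes before the next starts, so any one of them can be swapped independently.&lt;/p&gt;

```python
def fake_stt(audio):
    """Stand-in for STT.sarvam(...): audio in, transcript out."""
    return audio.decode("utf-8")

def fake_llm(transcript):
    """Stand-in for LLM.google(...): transcript in, reply text out."""
    return "You said: " + transcript

def fake_tts(text):
    """Stand-in for TTS.sarvam(...): reply text in, audio out."""
    return text.encode("utf-8")

def cascading_pipeline(audio):
    # Each stage completes before the next begins; swapping a provider
    # means replacing exactly one function, leaving the others untouched.
    transcript = fake_stt(audio)
    reply = fake_llm(transcript)
    return fake_tts(reply)
```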



&lt;h2&gt;
  
  
  RealTimePipeline Example
&lt;/h2&gt;

&lt;p&gt;For low-latency, fully streaming voice agents, the RealTimePipeline handles Realtime inference with minimal delay. Here’s an example using Gemini Live Native Audio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RealTimePipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Realtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash-native-audio-preview-12-2025&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Puck&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;language_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
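&lt;p&gt;The realtime path has a different shape: rather than running discrete stages, it streams audio through one open session, so output can begin before all the input has arrived. A toy generator-based sketch of that property (stand-ins, not the real RealTimePipeline):&lt;/p&gt;

```python
def realtime_pipeline(audio_chunks):
    """Toy streaming loop: yield a response chunk per input chunk.

    A real Realtime model (e.g. Gemini Native Audio) holds one open
    session and streams audio both ways; the property shown here is
    that output starts while input is still arriving.
    """
    for chunk in audio_chunks:
        # In a real session this would be the model's incremental audio
        yield chunk.upper()
```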



&lt;p&gt;With this approach, developers retain:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full programmatic control over pipeline stages, model parameters, and execution behavior.&lt;/li&gt;
&lt;li&gt;Modular provider replacement, making it easy to swap STT, LLM, or TTS engines.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result: a fully configurable, production-ready AI voice agent that can be deployed in minutes.&lt;/p&gt;
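&lt;p&gt;Modular provider replacement can be pictured as a registry keyed by provider name, so swapping engines is a one-line change. The names below are illustrative only; the SDK exposes providers as TTS.sarvam(...), TTS.google(...), and so on.&lt;/p&gt;

```python
# Hypothetical registry of TTS engines keyed by provider name; in the
# real SDK these would be TTS.sarvam(...), TTS.google(...), etc.
TTS_PROVIDERS = {
    "sarvam": lambda: "bulbul:v2",
    "google": lambda: "google-tts",
    "cartesia": lambda: "cartesia-tts",
}

def make_tts(provider):
    """Pick a TTS engine by name; unknown providers fail loudly."""
    if provider not in TTS_PROVIDERS:
        raise ValueError("unknown TTS provider: " + provider)
    return TTS_PROVIDERS[provider]()
```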

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Voice AI is no longer limited by model capability. It’s limited by how fast you can deploy it. With Inferencing in VideoSDK AI Voice Agents, deployment becomes effortless. Whether through the dashboard or programmatically via the SDK, you can build, select, enable, and go live in minutes.&lt;/p&gt;

&lt;p&gt;The era of modular, low-latency, real-time voice agents is here. With Inferencing, your ideas move from concept to conversation faster than ever.&lt;/p&gt;

&lt;p&gt;Build. Select. Configure. Go live.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources and Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;For more information, read the &lt;a href="https://docs.videosdk.live/ai_agents/plugins/inference/videosdk-inference" rel="noopener noreferrer"&gt;Inference documentation&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://docs.videosdk.live/ai_agents/deployments/introduction" rel="noopener noreferrer"&gt;deploy your AI Agents&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Sign up at &lt;a href="https://dub.sh/zXYQt7V" rel="noopener noreferrer"&gt;VideoSDK Dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;👉 Share your thoughts, roadblocks, or success stories in the comments or join our &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;Discord community&lt;/a&gt; ↗. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Introducing Phone Numbers: Build AI Telephony Agents in 60 seconds</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Tue, 17 Feb 2026 07:13:43 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/introducing-phone-numbers-build-ai-telephony-agents-in-60-seconds-3b33</link>
      <guid>https://dev.to/chaitrali_kakde/introducing-phone-numbers-build-ai-telephony-agents-in-60-seconds-3b33</guid>
      <description>&lt;p&gt;Today, we’re launching &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Phone Numbers&lt;/a&gt; a first-party telephony capability that lets you connect voice agents and calling workflows directly to the phone network, without relying on third-party providers like Twilio.&lt;/p&gt;

&lt;p&gt;You can now purchase phone numbers straight from the &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Dashboard&lt;/a&gt;, attach them to your agent or calling logic, and go live in minutes.&lt;/p&gt;

&lt;p&gt;No external accounts. No SIP trunk configuration. No extra setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4crcadaxwdn31znch5h.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4crcadaxwdn31znch5h.webp" alt=" " width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fewer Hops. Better Calls.
&lt;/h2&gt;

&lt;p&gt;Until now, developers building phone-based voice experiences on VideoSDK typically had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sign up with a third-party telephony provider (like Twilio)&lt;/li&gt;
&lt;li&gt;Purchase and manage phone numbers externally&lt;/li&gt;
&lt;li&gt;Configure SIP trunks and credentials&lt;/li&gt;
&lt;li&gt;Integrate and debug multiple systems before making the first call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By offering native phone numbers directly within VideoSDK, we remove those extra layers and make phone-based voice apps significantly easier to build and operate, in just three steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy7n7jaeyrqmc9lu2y0o.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy7n7jaeyrqmc9lu2y0o.webp" alt=" " width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Getting started with &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Phone Numbers&lt;/a&gt; is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Buy a Phone Number: Search and purchase available phone numbers directly from the VideoSDK Dashboard.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attach Your Agent or Logic: Associate the phone number with your voice agent or inbound call routing logic so incoming calls are handled automatically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go Live: Call the number and your agent answers instantly.&lt;br&gt;
Manage numbers, update routing, and monitor usage all from one place.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/K4g51dBkrrk"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Build
&lt;/h2&gt;

&lt;p&gt;Phone calls remain critical across many industries. With VideoSDK Phone Numbers, you can spin up phone-based voice agents in minutes for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sales qualification and lead routing&lt;/li&gt;
&lt;li&gt;Order tracking and delivery updates&lt;/li&gt;
&lt;li&gt;Appointment booking and status updates&lt;/li&gt;
&lt;li&gt;Restaurant and service business call automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your application involves real-time voice and the phone network, this feature is built for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Sign up at the &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Have feedback or ideas? Comment below or join the &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;VideoSDK Discord community&lt;/a&gt; ↗ to build and ship AI call agents faster. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>twilio</category>
      <category>agents</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Announcing VideoSDK Inference: One Magic API for Every Voice AI Model</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Tue, 17 Feb 2026 07:08:59 +0000</pubDate>
      <link>https://dev.to/video-sdk/announcing-videosdk-inference-one-magic-api-for-every-voice-ai-model-41bg</link>
      <guid>https://dev.to/video-sdk/announcing-videosdk-inference-one-magic-api-for-every-voice-ai-model-41bg</guid>
      <description>&lt;p&gt;Building AI voice agents has always been powerful, but slow. You had models STT, LLMs, TTS and the tools to use them. But maintaining accounts across multiple vendors for speech recognition, language models, and speech synthesis, each with its own keys, quotas, billing, and APIs was a major challenge.&lt;/p&gt;

&lt;p&gt;Today, that changes.&lt;/p&gt;

&lt;p&gt;We’re thrilled to announce Inferencing in &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK AI Voice Agents&lt;/a&gt;, a unified way to run STT, LLM, TTS, and Realtime models directly inside your voice pipeline, without managing multiple accounts, through the Agent Runtime Dashboard and the Python Agents SDK.&lt;/p&gt;

&lt;p&gt;Inferencing works in both the &lt;a href="https://docs.videosdk.live/ai_agents/core-components/cascading-pipeline" rel="noopener noreferrer"&gt;CascadingPipeline&lt;/a&gt; and the &lt;a href="https://docs.videosdk.live/ai_agents/core-components/realtime-pipeline" rel="noopener noreferrer"&gt;RealtimePipeline&lt;/a&gt;, giving you full flexibility to build modular or fully streaming voice agents. Whether you want incremental transcripts, staged execution, or fully native realtime audio, Inferencing makes it easy.&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported Categories
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;STT: Sarvam, Google, Deepgram&lt;/li&gt;
&lt;li&gt;LLMs: Google Gemini&lt;/li&gt;
&lt;li&gt;TTS: Sarvam, Google, Cartesia&lt;/li&gt;
&lt;li&gt;Realtime: Gemini Native Audio&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  What is VideoSDK Inference?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Inference&lt;/a&gt; is a managed gateway that gives you access to multiple AI models. All without providing your own API keys for providers like Sarvam AI or Google Gemini.&lt;/p&gt;

&lt;p&gt;Authentication, routing, retries, and billing are handled by VideoSDK; usage is simply charged against your VideoSDK account balance.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14ls0kk7kadxoln35gt4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F14ls0kk7kadxoln35gt4.png" alt=" " width="800" height="247"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Inferencing via &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;Agent Runtime Dashboard&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Inferencing in VideoSDK is now fully accessible through the dashboard, giving developers direct control over model selection and pipeline configuration without needing to manage infrastructure manually.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;From the dashboard&lt;/a&gt;, developers can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select STT, LLM, TTS, or Realtime models and enable them in the pipeline with a single click.&lt;/li&gt;
&lt;li&gt;Switch providers instantly, allowing rapid experimentation and iteration.&lt;/li&gt;
&lt;li&gt;Attach deployment endpoints for web or telephony, making the agent immediately accessible to users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With this approach, ideas move from configuration to live, interactive conversations in minutes, making it possible to test new workflows, swap models, or iterate on conversational design almost instantly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Watch this video to get started&lt;/strong&gt; 👇&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/2XvhVJETSrU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Inferencing via &lt;a href="https://dub.sh/q854DMD" rel="noopener noreferrer"&gt;Code (Agents SDK)&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;With VideoSDK Inferencing, developers can now integrate STT, LLM, TTS, and Realtime models directly into their voice agents, all handled inside VideoSDK. This enables rapid experimentation, modular pipelines, and low-latency real-time conversations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Installation&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The Inference plugin is included in the core VideoSDK Agents SDK. Install it via:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;videosdk&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  CascadingPipeline Example
&lt;/h2&gt;

&lt;p&gt;The CascadingPipeline is ideal for modular, stage-by-stage processing. Here’s an example of building a simple agent using STT, LLM, and TTS via the VideoSDK Inference Gateway:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CascadingPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;STT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sarvam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;saarika:v2.5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-IN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;google&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;tts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TTS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sarvam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bulbul:v2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;speaker&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anushka&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-IN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;vad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SileroVAD&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;RealTimePipeline Example&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For low-latency, fully streaming voice agents, the RealTimePipeline handles Realtime inference with minimal delay. Here’s an example using Gemini Live Native Audio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RealTimePipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Realtime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gemini&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash-native-audio-preview-12-2025&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;voice&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Puck&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;language_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en-US&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;With this approach, developers retain:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full programmatic control over pipeline stages, model parameters, and execution behavior.&lt;/li&gt;
&lt;li&gt;Modular provider replacement, making it easy to swap STT, LLM, or TTS engines.&lt;/li&gt;
&lt;/ul&gt;
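&lt;p&gt;The modular-replacement point can be sketched in plain Python. These classes are illustrative stand-ins, not the actual VideoSDK interfaces; the idea is simply that any object exposing the same method can fill a pipeline slot.&lt;/p&gt;

```python
# Conceptual sketch of modular provider replacement -- illustrative
# stand-in classes, NOT the actual VideoSDK interfaces.

class ProviderA_STT:
    def transcribe(self, audio: bytes) -> str:
        return "transcript from provider A"

class ProviderB_STT:
    def transcribe(self, audio: bytes) -> str:
        return "transcript from provider B"

class SimplePipeline:
    def __init__(self, stt):
        self.stt = stt  # any provider with a .transcribe() method fits

    def run(self, audio: bytes) -> str:
        return self.stt.transcribe(audio)

# Swapping the STT engine is a one-line change:
print(SimplePipeline(ProviderA_STT()).run(b"pcm-audio"))
print(SimplePipeline(ProviderB_STT()).run(b"pcm-audio"))
```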

&lt;p&gt;&lt;strong&gt;The result:&lt;/strong&gt; a fully configurable, production-ready AI voice agent that can be deployed in minutes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Voice AI is no longer limited by model capability. It’s limited by how fast you can deploy it. With Inferencing in VideoSDK AI Voice Agents, deployment becomes effortless. Whether through the dashboard or programmatically via the SDK, you can build, select, enable, and go live in minutes.&lt;/p&gt;

&lt;p&gt;The era of modular, low-latency, real-time voice agents is here. With Inferencing, your ideas move from concept to conversation faster than ever.&lt;/p&gt;

&lt;p&gt;Build. Select. Configure. Go live.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources and Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;For more information, read the &lt;a href="https://docs.videosdk.live/ai_agents/plugins/inference/videosdk-inference" rel="noopener noreferrer"&gt;Inference documentation&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://docs.videosdk.live/ai_agents/deployments/introduction" rel="noopener noreferrer"&gt;deploy your AI Agents&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Sign up at &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;👉 Share your thoughts, roadblocks, or success stories in the comments or join our &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;Discord community&lt;/a&gt; ↗. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Introducing Phone Numbers: Build AI Telephony Agents in 60 seconds</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Tue, 17 Feb 2026 06:55:33 +0000</pubDate>
      <link>https://dev.to/video-sdk/introducing-phone-numbers-build-ai-telephony-agents-in-60-seconds-4la5</link>
      <guid>https://dev.to/video-sdk/introducing-phone-numbers-build-ai-telephony-agents-in-60-seconds-4la5</guid>
<description>&lt;p&gt;Today, we’re launching &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Phone Numbers&lt;/a&gt;, a first-party telephony capability that lets you connect voice agents and calling workflows directly to the phone network, without relying on third-party providers like Twilio.&lt;/p&gt;

&lt;p&gt;You can now purchase phone numbers straight from the &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Dashboard&lt;/a&gt;, attach them to your agent or calling logic, and go live in minutes.&lt;/p&gt;

&lt;p&gt;No external accounts. No SIP trunk configuration. No extra setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4crcadaxwdn31znch5h.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe4crcadaxwdn31znch5h.webp" alt=" " width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Fewer Hops. Better Calls.
&lt;/h2&gt;

&lt;p&gt;Until now, developers building phone-based voice experiences on VideoSDK typically had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sign up with a third-party telephony provider (like Twilio)&lt;/li&gt;
&lt;li&gt;Purchase and manage phone numbers externally&lt;/li&gt;
&lt;li&gt;Configure SIP trunks and credentials&lt;/li&gt;
&lt;li&gt;Integrate and debug multiple systems before making the first call&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By offering native phone numbers directly within VideoSDK, we remove those extra layers and make phone-based voice apps significantly easier to build and operate, in just three steps.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy7n7jaeyrqmc9lu2y0o.webp" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzy7n7jaeyrqmc9lu2y0o.webp" alt=" " width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Getting started with &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Phone Numbers&lt;/a&gt; is simple:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Buy a Phone Number: Search and purchase available phone numbers directly from the VideoSDK Dashboard.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Attach Your Agent or Logic: Associate the phone number with your voice agent or inbound call routing logic so incoming calls are handled automatically.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Go Live: Call the number and your agent answers instantly.&lt;br&gt;
Manage numbers, update routing, and monitor usage, all from one place.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/K4g51dBkrrk"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  What You Can Build
&lt;/h2&gt;

&lt;p&gt;Phone calls remain critical across many industries. With VideoSDK Phone Numbers, you can spin up phone-based voice agents in minutes for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sales qualification and lead routing&lt;/li&gt;
&lt;li&gt;Order tracking and delivery updates&lt;/li&gt;
&lt;li&gt;Appointment booking and status updates&lt;/li&gt;
&lt;li&gt;Restaurant and service business call automation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your application involves real-time voice and the phone network, this feature is built for you.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Sign up at the &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK Dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Have feedback or ideas? Comment below or join our Discord to build and ship AI call agents faster with &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;VideoSDK Discord community&lt;/a&gt; ↗. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>twilio</category>
      <category>agents</category>
      <category>techtalks</category>
    </item>
    <item>
      <title>Introducing the Gladia Speech to Text Plugin in VideoSDK</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Wed, 21 Jan 2026 07:41:00 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/introducing-the-gladia-speech-to-text-plugin-in-videosdk-4c27</link>
      <guid>https://dev.to/chaitrali_kakde/introducing-the-gladia-speech-to-text-plugin-in-videosdk-4c27</guid>
<description>&lt;p&gt;Speech-to-text is the first and most critical step in any voice agent. If transcription is slow or inaccurate, everything downstream, from reasoning to responses, suffers. Gladia STT is built for real-time transcription with strong multilingual support, fast partial results, and robust handling of code-switching.&lt;/p&gt;

&lt;p&gt;In this guide, we’ll walk through how to integrate Gladia STT with the VideoSDK Agents SDK and use it as a reliable input layer for voice-driven applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Gladia STT?
&lt;/h2&gt;

&lt;p&gt;Many voice agents operate in environments where users switch languages mid-sentence or expect instant feedback while speaking. Gladia is optimized for these scenarios. It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low-latency transcription&lt;/li&gt;
&lt;li&gt;Support for multiple languages&lt;/li&gt;
&lt;li&gt;Automatic code-switching&lt;/li&gt;
&lt;li&gt;Partial transcripts for faster turn detection&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it a strong choice for real-time agents, live calls, and interactive voice applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Features
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Transcription&lt;/strong&gt;: Gladia streams transcription results as audio is processed, reducing perceived latency in conversations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multilingual Support&lt;/strong&gt;: You can specify one or more languages, making it suitable for global or multilingual users.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Code-Switching&lt;/strong&gt;: Gladia can automatically detect and switch languages within the same conversation without manual intervention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Partial Transcripts&lt;/strong&gt;: By enabling partial transcripts, agents can start reasoning before the user finishes speaking, improving responsiveness.&lt;/li&gt;
&lt;/ul&gt;
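&lt;p&gt;The partial-transcript point is worth making concrete: the agent can act on an in-progress draft and commit only when the engine marks an utterance final. A toy illustration of that control flow (pure Python, independent of the Gladia or VideoSDK APIs):&lt;/p&gt;

```python
# Toy illustration of interim (partial) vs. final transcript handling.
# Not the Gladia/VideoSDK API -- just the control-flow idea: refine a
# draft on partials, commit an utterance only when it is marked final.

def handle_transcripts(events):
    """events: list of (text, is_final) tuples streamed from an STT engine."""
    draft = ""
    finals = []
    for text, is_final in events:
        if is_final:
            finals.append(text)  # commit the finished utterance
            draft = ""
        else:
            draft = text         # keep refining the in-progress draft
    return finals, draft

finals, draft = handle_transcripts([
    ("bonjour", False),
    ("bonjour, book a", False),
    ("bonjour, book a table", True),  # code-switched utterance finalized
    ("for two", False),
])
```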

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;Install the Gladia-STT VideoSDK Agents package:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;pip install "videosdk-plugins-gladia"&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Authentication
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Sign up at Gladia: signup link&lt;/li&gt;
&lt;li&gt;Sign up at VideoSDK: authentication token&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;GLADIA_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_api_key_here&lt;/span&gt;
&lt;span class="n"&gt;VIDEOSDK_AUTH_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;your_videosdk_token_here&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When using environment variables, you don’t need to pass the API key directly in code; the SDK reads it automatically.&lt;/p&gt;
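&lt;p&gt;The usual pattern behind this behavior is an explicit-argument-or-environment fallback. The sketch below illustrates the convention; it is an assumption about how the SDK resolves the key, not its exact internals.&lt;/p&gt;

```python
import os

# Common explicit-argument-or-environment fallback pattern.
# Illustrative of the convention only, not VideoSDK's exact internals.

def resolve_api_key(explicit=None, env_var="GLADIA_API_KEY"):
    key = explicit or os.environ.get(env_var)
    if not key:
        raise ValueError(f"Provide api_key or set {env_var}")
    return key

os.environ["GLADIA_API_KEY"] = "demo-key"
print(resolve_api_key())            # falls back to the environment
print(resolve_api_key("explicit"))  # an explicit argument wins
```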

&lt;p&gt;&lt;strong&gt;Importing Gladia STT&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.gladia&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GladiaSTT&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Basic Usage Example
&lt;/h2&gt;

&lt;p&gt;Below is a minimal example showing how to configure Gladia STT and attach it to a cascading pipeline.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.gladia&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GladiaSTT&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CascadingPipeline&lt;/span&gt;

&lt;span class="c1"&gt;# Initialize the Gladia STT model
&lt;/span&gt;&lt;span class="n"&gt;stt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GladiaSTT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-gladia-api-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;languages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;code_switching&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;receive_partial_transcripts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;#  Add stt to a cascading pipeline
&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CascadingPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Real-time transcription&lt;/li&gt;
&lt;li&gt;Automatic language switching&lt;/li&gt;
&lt;li&gt;Partial transcripts for faster downstream processing&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Configuration Options
&lt;/h2&gt;

&lt;p&gt;Gladia STT provides fine-grained control over transcription behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;languages&lt;/code&gt;: List of language codes to detect (e.g., ["en", "fr"])&lt;/li&gt;
&lt;li&gt;&lt;code&gt;code_switching&lt;/code&gt;: Enables automatic language switching&lt;/li&gt;
&lt;li&gt;&lt;code&gt;receive_partial_transcripts&lt;/code&gt;: Streams interim results for lower latency&lt;/li&gt;
&lt;li&gt;&lt;code&gt;model&lt;/code&gt;: STT model to use (default: "solaria-1")&lt;/li&gt;
&lt;li&gt;&lt;code&gt;input_sample_rate&lt;/code&gt;: Incoming audio sample rate&lt;/li&gt;
&lt;li&gt;&lt;code&gt;output_sample_rate&lt;/code&gt;: Processing sample rate&lt;/li&gt;
&lt;li&gt;&lt;code&gt;encoding&lt;/code&gt;: Audio encoding format&lt;/li&gt;
&lt;li&gt;&lt;code&gt;bit_depth&lt;/code&gt;: Audio bit depth&lt;/li&gt;
&lt;li&gt;&lt;code&gt;channels&lt;/code&gt;: Number of audio channels (mono or stereo)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These parameters let you tune accuracy, latency, and compatibility with your audio pipeline.&lt;/p&gt;
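&lt;p&gt;Gathering the options above in one place, a fuller configuration might look like this. The parameter names come from the list above; the concrete values (sample rates, encoding, bit depth) are illustrative assumptions, not documented defaults.&lt;/p&gt;

```python
# Illustrative Gladia STT configuration, collected as a plain dict.
# Parameter names follow the option list above; the concrete values
# are assumptions chosen for illustration, not documented defaults.

gladia_config = {
    "languages": ["en", "fr"],            # languages to detect
    "code_switching": True,               # switch languages mid-conversation
    "receive_partial_transcripts": True,  # stream interim results
    "model": "solaria-1",                 # default model per the list above
    "input_sample_rate": 16000,           # incoming audio sample rate (Hz)
    "output_sample_rate": 16000,          # processing sample rate (Hz)
    "encoding": "wav/pcm",                # audio encoding format (assumed)
    "bit_depth": 16,                      # audio bit depth
    "channels": 1,                        # mono audio
}

# The dict would then be unpacked into the constructor,
# e.g. GladiaSTT(**gladia_config).
print(sorted(gladia_config))
```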

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Gladia STT provides a strong foundation for real-time voice agents by combining speed, accuracy, and multilingual flexibility. When integrated with VideoSDK’s agent pipelines, it enables agents to listen effectively even in dynamic, multilingual conversations. A reliable STT layer like Gladia helps ensure that downstream reasoning and responses stay accurate, responsive, and consistent.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources and Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Explore the &lt;a href="https://docs.videosdk.live/ai_agents/plugins/stt/gladia" rel="noopener noreferrer"&gt;Gladia STT documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://docs.videosdk.live/ai_agents/deployments/introduction" rel="noopener noreferrer"&gt;deploy your AI Agents.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Explore more: Check out the &lt;a href="https://dub.sh/gGWIuJh" rel="noopener noreferrer"&gt;VideoSDK documentation&lt;/a&gt; for more features.&lt;/li&gt;
&lt;li&gt;👉 Share your thoughts, roadblocks, or success stories in the comments or join our &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;Discord community&lt;/a&gt; ↗. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>api</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to enable Voice Mail Detection in AI Voice Agents</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Fri, 09 Jan 2026 10:46:25 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/how-to-enable-voice-mail-detection-in-ai-voice-agents-1k34</link>
      <guid>https://dev.to/chaitrali_kakde/how-to-enable-voice-mail-detection-in-ai-voice-agents-1k34</guid>
<description>&lt;p&gt;When building outbound calling workflows, one of the most common challenges is handling unanswered calls that get redirected to voicemail systems. Without proper detection, an AI agent may continue speaking unnecessarily or wait indefinitely, leading to wasted resources and a poor user experience.&lt;/p&gt;

&lt;p&gt;Voice Mail Detection in VideoSDK solves this problem by automatically identifying voicemail scenarios and allowing your agent to take the appropriate action such as leaving a message or ending the call gracefully.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Problem This Solves
&lt;/h2&gt;

&lt;p&gt;In outbound calling workflows, unanswered calls are often routed to voicemail systems. Without detection, agents may continue speaking or wait unnecessarily.&lt;/p&gt;

&lt;p&gt;Voice Mail Detection lets you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Detect voicemail systems automatically&lt;/li&gt;
&lt;li&gt;Control how your agent responds&lt;/li&gt;
&lt;li&gt;End calls cleanly after voicemail handling&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Enabling Voice Mail Detection
&lt;/h2&gt;

&lt;p&gt;To use voicemail detection, import and add VoiceMailDetector to your agent configuration, and register a callback that defines how voicemail should be handled.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;VoiceMailDetector&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAILLM&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;voice_mail_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Voice Mail message received:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;voicemail&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VoiceMailDetector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAILLM&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;custom_callback_voicemail&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;voice_mail_detector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;voicemail&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
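&lt;p&gt;Conceptually, detection amounts to classifying the first few seconds of the callee's speech. The toy version below uses a keyword heuristic purely for illustration; the real VoiceMailDetector delegates this judgment to the LLM you pass in, within the listening &lt;code&gt;duration&lt;/code&gt; window.&lt;/p&gt;

```python
# Toy voicemail classifier: a keyword heuristic over the opening utterance.
# Purely illustrative -- VoiceMailDetector actually delegates this decision
# to the configured LLM rather than matching fixed phrases.

VOICEMAIL_CUES = ("leave a message", "after the tone", "not available", "voicemail")

def looks_like_voicemail(opening_utterance: str) -> bool:
    text = opening_utterance.lower()
    return any(cue in text for cue in VOICEMAIL_CUES)

print(looks_like_voicemail("Hi, you've reached Sam. Please leave a message."))  # True
print(looks_like_voicemail("Hello? Who is this?"))                              # False
```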



&lt;h2&gt;
  
  
  Full Working Example
&lt;/h2&gt;

&lt;p&gt;To set up incoming call handling, outbound calling, and routing rules, check out the Quick Start Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CascadingPipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;WorkerJob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;ConversationFlow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;VoiceMailDetector&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.deepgram&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeepgramSTT&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAILLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.elevenlabs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ElevenLabsTTS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.silero&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SileroVAD&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.turn_detector&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TurnDetector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pre_download_model&lt;/span&gt;

&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%(asctime)s - %(name)s - %(levelname)s - %(message)s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;span class="nf"&gt;pre_download_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VoiceAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful voice assistant that can answer questions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, how can I help you today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goodbye!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;entrypoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VoiceAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conversation_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;CascadingPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;DeepgramSTT&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAILLM&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;tts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ElevenLabsTTS&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;vad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SileroVAD&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;turn_detector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TurnDetector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;voice_mail_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Voice Mail message received:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;voice_mail_detector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VoiceMailDetector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAILLM&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;voice_mail_callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conversation_flow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;voice_mail_detector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;voice_mail_detector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_for_participant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_until_shutdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;room_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Voice Mail Detector Test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;playground&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorkerJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entrypoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;entrypoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_AGENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_processes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;register&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8081&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Voice Mail Detection ensures your AI agent handles unanswered calls intelligently. By automatically detecting voicemail and triggering the right action, it prevents wasted time, improves call efficiency, and makes outbound workflows reliable and production-ready.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources and Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Explore the voice mail detection implementation on GitHub.&lt;/li&gt;
&lt;li&gt;Read the voice mail detection docs.&lt;/li&gt;
&lt;li&gt;To set up inbound calls, outbound calls, and routing rules, check out the Quick Start Example.&lt;/li&gt;
&lt;li&gt;Learn how to deploy your AI Agents.&lt;/li&gt;
&lt;li&gt;Explore more: Check out the VideoSDK documentation for more features.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 Share your thoughts, roadblocks, or success stories in the comments or join our Discord community ↗. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to enable DTMF Events in Telephony AI Agent</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Mon, 05 Jan 2026 08:49:02 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/how-to-enable-dtmf-events-in-telephony-ai-agent-49g7</link>
      <guid>https://dev.to/chaitrali_kakde/how-to-enable-dtmf-events-in-telephony-ai-agent-49g7</guid>
      <description>&lt;p&gt;Not every caller wants to speak to a voice agent. In many call scenarios, users expect to press a key to make a selection, confirm an action, or move forward in a call flow. This is especially common in menu-based systems, short responses, or situations where speech recognition may not be reliable.&lt;/p&gt;

&lt;p&gt;DTMF (Dual-Tone Multi-Frequency) input gives voice agents a clear and predictable way to handle these interactions. When a caller presses a key on their phone, the agent receives that input instantly and can use it to control the call flow or trigger application logic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In this post, we’ll explore how DTMF events can be used in a VideoSDK-powered voice agent, starting from common interaction patterns and moving into how the system processes keypad input in real time.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DTMF Event Detection&lt;/strong&gt;: The agent detects key presses (0–9, *, #) from the caller during a call session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Processing&lt;/strong&gt;: Each key press generates a DTMF event that is delivered to the agent immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Callback Integration&lt;/strong&gt;: A user-defined callback function handles incoming DTMF events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action Execution&lt;/strong&gt;: The agent acts on the received input, for example to build IVR flows, collect user selections, or trigger application logic.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Step 1: Enabling DTMF Events
&lt;/h2&gt;

&lt;p&gt;DTMF event detection can be enabled in two ways:&lt;/p&gt;

&lt;p&gt;1) &lt;strong&gt;Via Dashboard&lt;/strong&gt;:&lt;br&gt;
When creating or editing a SIP gateway in the VideoSDK dashboard, enable the DTMF option.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe07d1ij7a15v1pqhq2qs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe07d1ij7a15v1pqhq2qs.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2) &lt;strong&gt;Via API&lt;/strong&gt;:&lt;br&gt;
Set the enableDtmf parameter to true when creating or updating a SIP gateway using the API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;curl&lt;/span&gt;    &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Authorization: $YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; \ 
  &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;H&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Content-Type: application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; \ 
  &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; : &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Twilio Inbound Gateway&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enableDtmf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; : &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
    &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;numbers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; : [&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;+0123456789&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;]

  }&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; \ 
  &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;XPOST&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;//&lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;videosdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;live&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;sip&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;inbound&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;gateways&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
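&lt;p&gt;If you prefer to stay in Python, the same gateway request can be built with the standard library. This is a sketch mirroring the curl call above; the endpoint and payload fields come from that example, the &lt;code&gt;build_gateway_request&lt;/code&gt; helper name is ours, and the token is a placeholder:&lt;/p&gt;

```python
import json
import urllib.request


def build_gateway_request(token: str) -> urllib.request.Request:
    """Build the same POST request as the curl example above."""
    payload = {
        "name": "Twilio Inbound Gateway",
        "enableDtmf": True,  # enable DTMF event detection on this gateway
        "numbers": ["+0123456789"],
    }
    return urllib.request.Request(
        "https://api.videosdk.live/v2/sip/inbound-gateways",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )


# Send it with: urllib.request.urlopen(build_gateway_request("YOUR_TOKEN"))
```

&lt;p&gt;Since this is a plain JSON POST, any HTTP client (requests, httpx, and so on) works just as well.&lt;/p&gt;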



&lt;p&gt;Once enabled, DTMF events will be detected and published for all calls routed through that gateway.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;To set up inbound calls, outbound calls, and routing rules, check out the &lt;a href="https://docs.videosdk.live/telephony/ai-telephony-agent-quick-start#part-2-connect-your-agent-to-the-phone-network" rel="noopener noreferrer"&gt;Quick Start Example&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Step 2: Implementation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;DTMFHandler&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;entrypoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dtmf_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;digit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;digit&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a Sales Representative. Your goal is to sell our products&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Routing you to Sales. Hi, I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m from Sales. How can I help you today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;digit&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a Support Specialist. Your goal is to help customers with technical issues.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Routing you to Support. Hi, I&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;m from Support. What issue are you facing?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Invalid input. Press 1 for Sales or 2 for Support.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;dtmf_handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DTMFHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dtmf_callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;dtmf_handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dtmf_handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
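&lt;p&gt;The callback above acts on each key press in isolation. Many IVR flows need multi-digit input, such as an account number, which means buffering digits until a terminator key arrives. Here is a minimal sketch in plain Python; the &lt;code&gt;DigitCollector&lt;/code&gt; class and its &lt;code&gt;#&lt;/code&gt; terminator are illustrative helpers, not part of the SDK, but &lt;code&gt;on_digit&lt;/code&gt; has the same shape as the callback you would hand to &lt;code&gt;DTMFHandler&lt;/code&gt;:&lt;/p&gt;

```python
import asyncio


class DigitCollector:
    """Buffers DTMF key presses until a terminator key arrives."""

    def __init__(self, terminator: str = "#"):
        self.terminator = terminator
        self._digits: list[str] = []
        self._complete: asyncio.Future | None = None

    async def on_digit(self, digit: str) -> None:
        # Feed each DTMF event into the buffer; resolve on the terminator.
        if digit == self.terminator:
            if self._complete and not self._complete.done():
                self._complete.set_result("".join(self._digits))
        else:
            self._digits.append(digit)

    async def collect(self, timeout: float = 10.0) -> str:
        # Wait for a complete entry; raise TimeoutError if the caller stalls.
        self._digits.clear()
        self._complete = asyncio.get_running_loop().create_future()
        return await asyncio.wait_for(self._complete, timeout)


async def demo() -> str:
    collector = DigitCollector()
    entry = asyncio.create_task(collector.collect(timeout=1.0))
    await asyncio.sleep(0)                 # let collect() install its future
    for key in ["4", "2", "1", "7", "#"]:  # simulated key presses
        await collector.on_digit(key)
    return await entry


print(asyncio.run(demo()))  # prints 4217
```

&lt;p&gt;The timeout matters in practice: callers abandon menus mid-entry, and without it the agent would wait on a future that never resolves.&lt;/p&gt;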



&lt;h2&gt;
  
  
  Full Working Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CascadingPipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;WorkerJob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;ConversationFlow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="n"&gt;DTMFHandler&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.deepgram&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeepgramSTT&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAILLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.elevenlabs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ElevenLabsTTS&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.silero&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SileroVAD&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.turn_detector&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TurnDetector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pre_download_model&lt;/span&gt;

&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;basicConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;level&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INFO&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;%(asctime)s - %(name)s - %(levelname)s - %(message)s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handlers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;StreamHandler&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;span class="nf"&gt;pre_download_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;VoiceAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful voice assistant that can answer questions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello, how can I help you today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goodbye!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;entrypoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;

    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VoiceAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conversation_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;CascadingPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;DeepgramSTT&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAILLM&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;tts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ElevenLabsTTS&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;vad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SileroVAD&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="n"&gt;turn_detector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TurnDetector&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dtmf_callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DTMF message received:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;dtmf_handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DTMFHandler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dtmf_callback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; 
        &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conversation_flow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_flow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;dtmf_handler&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dtmf_handler&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wait_for_participant&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;run_until_shutdown&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;room_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DTMF Agent Test&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;playground&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorkerJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entrypoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;entrypoint&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_AGENT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_processes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;register&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;localhost&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8081&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By enabling DTMF detection and handling events at the agent level, you can build predictable call flows, guide users through menus, and trigger application logic without interrupting the call experience. When combined with voice input, DTMF gives you more control over how users interact with your agent.&lt;/p&gt;

&lt;p&gt;This makes DTMF a practical addition to any voice agent that needs clear, deterministic user input during a call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources and Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Explore the &lt;a href="https://github.com/videosdk-live/agents-quickstart/blob/main/DTMF%20Handler/dtmf_handler.py" rel="noopener noreferrer"&gt;DTMF event implementation example&lt;/a&gt; for the full code.&lt;/li&gt;
&lt;li&gt;To set up inbound calls, outbound calls, and routing rules, check out the &lt;a href="https://docs.videosdk.live/telephony/ai-telephony-agent-quick-start#part-2-connect-your-agent-to-the-phone-network" rel="noopener noreferrer"&gt;Quick Start Example&lt;/a&gt;.
&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://docs.videosdk.live/ai_agents/deployments/introduction" rel="noopener noreferrer"&gt;deploy your AI Agents.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Explore more: Check out the &lt;a href="https://dub.sh/gGWIuJh" rel="noopener noreferrer"&gt;VideoSDK documentation&lt;/a&gt; for more features.&lt;/li&gt;
&lt;li&gt;👉 Share your thoughts, roadblocks, or success stories in the comments or join our &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;Discord community &lt;/a&gt;↗. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>python</category>
      <category>sip</category>
      <category>agents</category>
    </item>
    <item>
      <title>How to enable preemptive response in AI Voice Agents</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Mon, 29 Dec 2025 11:35:17 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/how-to-enable-preemptive-response-in-ai-voice-agents-4j75</link>
      <guid>https://dev.to/chaitrali_kakde/how-to-enable-preemptive-response-in-ai-voice-agents-4j75</guid>
      <description>&lt;p&gt;When it comes to voice AI, the real challenge isn’t speed it’s timing&lt;/p&gt;

&lt;p&gt;A response that arrives a second too late feels unnatural. That tiny pause is enough to remind users they’re talking to a machine. Humans don’t wait for sentences to end; we anticipate intent and respond at the right moment. Traditional voice agents don’t: they wait for silence, and that’s what makes conversations feel slow.&lt;/p&gt;

&lt;p&gt;Preemptive Response fixes this by letting voice agents start understanding and preparing responses while the user is still speaking.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Preemptive Response?
&lt;/h2&gt;

&lt;p&gt;Preemptive Response is a capability that allows a voice agent to start understanding a user’s intent before they finish speaking.&lt;/p&gt;

&lt;p&gt;As the user talks, the Speech-to-Text engine emits partial transcripts in real time. These partial results are enough for the agent to begin reasoning early, instead of waiting for the full sentence and a moment of silence.&lt;/p&gt;

&lt;p&gt;The goal isn’t to interrupt the user; it’s to be ready at the right moment.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Preemptive Response Works
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywycvw5t3p8rkzzzsvuu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fywycvw5t3p8rkzzzsvuu.png" alt=" " width="800" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User audio is streamed to the STT, which generates partial transcripts.&lt;/li&gt;
&lt;li&gt;These partial transcripts are immediately sent to the LLM to enable preemptive (early) responses.&lt;/li&gt;
&lt;li&gt;The LLM output is then passed to the TTS to generate the spoken response.&lt;/li&gt;
&lt;/ul&gt;
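&lt;p&gt;The effect of this pipeline is easiest to see on a toy timeline. The sketch below fakes the moving parts with &lt;code&gt;asyncio&lt;/code&gt; sleeps (all names and timings are invented for illustration; no SDK or network involved): the reactive agent waits for the full utterance before calling the LLM, while the preemptive one restarts a draft on every partial and is already mid-inference when speech ends.&lt;/p&gt;

```python
import asyncio
import time

PARTIALS = ["what is", "what is the", "what is the weather"]


async def fake_llm(prompt: str) -> str:
    # Stand-in for an LLM call with ~0.3 s of model latency.
    await asyncio.sleep(0.3)
    return f"answer to: {prompt}"


async def reactive_agent() -> float:
    # Waits for the whole utterance (one partial every 0.1 s), then reasons.
    start = time.perf_counter()
    await asyncio.sleep(0.1 * len(PARTIALS))  # user still speaking
    await fake_llm(PARTIALS[-1])
    return time.perf_counter() - start


async def preemptive_agent() -> float:
    # Starts reasoning on each partial, restarting when the transcript grows.
    start = time.perf_counter()
    task = None
    for partial in PARTIALS:
        if task:
            task.cancel()  # transcript changed; drop the stale draft
        task = asyncio.create_task(fake_llm(partial))
        await asyncio.sleep(0.1)  # wait for the next partial
    await task  # the last partial equals the final transcript here
    return time.perf_counter() - start


reactive = asyncio.run(reactive_agent())
preemptive = asyncio.run(preemptive_agent())
print(f"reactive: {reactive:.2f}s  preemptive: {preemptive:.2f}s")
```

&lt;p&gt;With three partials 100 ms apart and a 300 ms model call, the preemptive run finishes roughly 100 ms sooner; in a real agent the savings grow with utterance length, because inference overlaps speech instead of following it.&lt;/p&gt;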

&lt;h2&gt;
  
  
  Enabling Preemptive Response
&lt;/h2&gt;

&lt;p&gt;To enable this feature, set the enable_preemptive_generation flag to True when initializing your STT plugin (e.g., DeepgramSTTV2).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.deepgram&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeepgramSTTV2&lt;/span&gt;

&lt;span class="n"&gt;stt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DeepgramSTTV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;enable_preemptive_generation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once enabled, partial transcripts start flowing automatically and your agent begins preparing responses earlier by design.&lt;/p&gt;

&lt;p&gt;Currently, preemptive response generation is limited to Deepgram’s STT implementation and is available only in the Flux model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementation
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prerequisites&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A VideoSDK authentication token (generate one from &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;app.videosdk.live&lt;/a&gt;; follow the guide to generate a VideoSDK token)&lt;/li&gt;
&lt;li&gt;A &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK meeting ID &lt;/a&gt;(you can generate one using the Create Room API or through the &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK dashboard&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Python 3.12 or higher&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Install dependencies&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Want to use a different provider? Check out our plugins for &lt;a href="https://docs.videosdk.live/ai_agents/plugins/stt/openai" rel="noopener noreferrer"&gt;STT&lt;/a&gt;, &lt;a href="https://docs.videosdk.live/ai_agents/plugins/llm/openai" rel="noopener noreferrer"&gt;LLM&lt;/a&gt;, and &lt;a href="https://docs.videosdk.live/ai_agents/plugins/tts/eleven-labs" rel="noopener noreferrer"&gt;TTS&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Set API Keys in .env&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DEEPGRAM_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your Deepgram API Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;OPENAI_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your OpenAI API Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;ELEVENLABS_API_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your ElevenLabs API Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;VIDEOSDK_AUTH_TOKEN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VideoSDK Auth token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;API Keys&lt;/strong&gt; - Get API keys from &lt;a href="https://console.deepgram.com/" rel="noopener noreferrer"&gt;Deepgram ↗&lt;/a&gt;, &lt;a href="https://platform.openai.com/api-keys" rel="noopener noreferrer"&gt;OpenAI ↗&lt;/a&gt;, &lt;a href="https://elevenlabs.io/app/settings/api-keys" rel="noopener noreferrer"&gt;ElevenLabs ↗&lt;/a&gt; &amp;amp; the &lt;a href="https://app.videosdk.live" rel="noopener noreferrer"&gt;VideoSDK Dashboard ↗&lt;/a&gt;, then follow the guide to &lt;a href="https://docs.videosdk.live/ai_agents/authentication-and-token" rel="noopener noreferrer"&gt;generate a VideoSDK token&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Full Working Example
&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CascadingPipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;WorkerJob&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ConversationFlow&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.silero&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SileroVAD&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.turn_detector&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;TurnDetector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pre_download_model&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.deepgram&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DeepgramSTTV2&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAILLM&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;videosdk.plugins.elevenlabs&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ElevenLabsTTS&lt;/span&gt;

&lt;span class="c1"&gt;# Pre-download the Turn Detector model to avoid delays during startup
&lt;/span&gt;&lt;span class="nf"&gt;pre_download_model&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyVoiceAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful voice assistant that can answer questions and help with tasks.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hello! How can I help you today?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;on_exit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;say&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Goodbye!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Create the agent and conversation flow
&lt;/span&gt;    &lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MyVoiceAgent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;conversation_flow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ConversationFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Define the pipeline with Preemptive Generation enabled
&lt;/span&gt;    &lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CascadingPipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;stt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;DeepgramSTTV2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flux-general-en&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;enable_preemptive_generation&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;  &lt;span class="c1"&gt;# Enable low-latency partials
&lt;/span&gt;        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;OpenAILLM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;tts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;ElevenLabsTTS&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;eleven_flash_v2_5&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;vad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;SileroVAD&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.35&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;turn_detector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;TurnDetector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Initialize the session
&lt;/span&gt;    &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentSession&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;conversation_flow&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;conversation_flow&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="c1"&gt;# Keep the session running
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;wait&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Clean up resources
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;shutdown&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;room_options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RoomOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VideoSDK Cascaded Agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;playground&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;JobContext&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;room_options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;WorkerJob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;entrypoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;jobctx&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;make_context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Run the Python script&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also run the script in console mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="n"&gt;console&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Preemptive Response enabled, the voice agent no longer waits for speech to end. It begins processing intent as audio arrives, reducing latency and keeping conversations natural. The result is a responsive, end-to-end voice experience that feels fluid in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Explore the &lt;a href="https://dub.sh/WRhwbz3" rel="noopener noreferrer"&gt;preemptive-response-docs&lt;/a&gt; for more information.&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://dub.sh/GB2p2A4" rel="noopener noreferrer"&gt;deploy your AI Agents.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Visit &lt;a href="https://dub.sh/StzJRCB" rel="noopener noreferrer"&gt;Deepgram's flux documentation.&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>deepgram</category>
      <category>voiceai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>How to enable DTMF Events in Telephony AI Agent</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Mon, 29 Dec 2025 11:07:35 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/how-to-enable-dtmf-events-in-telephony-ai-agent-h5n</link>
      <guid>https://dev.to/chaitrali_kakde/how-to-enable-dtmf-events-in-telephony-ai-agent-h5n</guid>
      <description>&lt;p&gt;Not every caller wants to speak to a voice agent. In many call scenarios, users expect to press a key to make a selection, confirm an action, or move forward in a call flow. This is especially common in menu-based systems, short responses, or situations where speech recognition may not be reliable&lt;/p&gt;

&lt;p&gt;DTMF (Dual-Tone Multi-Frequency) input gives voice agents a clear and predictable way to handle these interactions. When a caller presses a key on their phone, the agent receives that input instantly and can use it to control the call flow or trigger application logic.&lt;/p&gt;

&lt;p&gt;In this post, we’ll explore how DTMF events can be used in a VideoSDK-powered voice agent, starting from common interaction patterns and moving into how the system processes keypad input in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Typical Interaction Patterns Using DTMF
&lt;/h2&gt;

&lt;p&gt;DTMF input is commonly used at decision points in a call, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Selecting options from a call menu&lt;/li&gt;
&lt;li&gt;Confirming or canceling an action&lt;/li&gt;
&lt;li&gt;Providing short numeric input&lt;/li&gt;
&lt;li&gt;Navigating between steps in a call flow&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These interactions are simple, fast, and familiar to callers, which makes them a good fit for structured voice experiences.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DTMF Event Detection&lt;/strong&gt;: The agent detects key presses (0–9, *, #) from the caller during a call session.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-Time Processing&lt;/strong&gt;: Each key press generates a DTMF event that is delivered to the agent immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Callback Integration&lt;/strong&gt;: A user-defined callback function handles incoming DTMF events.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action Execution&lt;/strong&gt;: The agent executes actions or triggers workflows based on the received DTMF input like building IVR flows, collecting user input, or triggering actions in your application.&lt;/li&gt;
&lt;/ul&gt;
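&lt;p&gt;The "Action Execution" step above often boils down to a lookup from key press to call-flow step. Here is an illustrative fragment (not the VideoSDK API) showing that mapping with a fallback for unmapped keys:&lt;/p&gt;

```python
# Illustrative DTMF routing table (not the VideoSDK API): map key presses
# (0-9, *, #) to call-flow steps, with a fallback for unmapped keys.
MENU = {
    "1": "sales",
    "2": "support",
    "0": "operator",
    "#": "main_menu",
}

def route_dtmf(key: str) -> str:
    """Return the next call-flow step for a key press."""
    return MENU.get(key, "invalid")

print(route_dtmf("1"))   # sales
print(route_dtmf("9"))   # invalid
```

&lt;p&gt;A table like this keeps the menu logic declarative: adding a new branch is a one-line change rather than another elif in the callback.&lt;/p&gt;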

&lt;h2&gt;
  
  
  Step 1: Enabling DTMF Events
&lt;/h2&gt;

&lt;p&gt;DTMF event detection can be enabled in two ways:&lt;/p&gt;

&lt;h2&gt;
  
  
  Via Dashboard:
&lt;/h2&gt;

&lt;p&gt;When creating or editing a SIP gateway in the &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/KJotBDtEI" rel="noopener noreferrer"&gt;VideoSDK dashboard&lt;/a&gt;, enable the DTMF option.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkgcd6ltlnc3wliilp1r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgkgcd6ltlnc3wliilp1r.png" alt=" " width="800" height="417"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Via API:
&lt;/h2&gt;

&lt;p&gt;Set the enableDtmf parameter to true when creating or updating a SIP gateway using the API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl    -H 'Authorization: $YOUR_TOKEN' \ 
  -H 'Content-Type: application/json' \ 
  -d '{
    "name" : "Twilio Inbound Gateway",
    "enableDtmf" : "true",
    "numbers" : ["+0123456789"]

  }' \ 
  -XPOST https://api.videosdk.live/v2/sip/inbound-gateways
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once enabled, DTMF events will be detected and published for all calls routed through that gateway.&lt;/p&gt;
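&lt;p&gt;Since the rest of this post uses Python, here is the same gateway-creation request expressed with the standard library. Only the endpoint and JSON fields come from the curl example above; the helper names are illustrative.&lt;/p&gt;

```python
import json
import urllib.request

def gateway_payload(name: str, numbers: list, enable_dtmf: bool = True) -> dict:
    """JSON body for POST /v2/sip/inbound-gateways (fields from the curl example)."""
    return {"name": name, "enableDtmf": enable_dtmf, "numbers": numbers}

def create_inbound_gateway(token: str, payload: dict) -> dict:
    """Send the create-gateway request; `token` is your VideoSDK auth token."""
    req = urllib.request.Request(
        "https://api.videosdk.live/v2/sip/inbound-gateways",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": token, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # performs the network call
        return json.loads(resp.read())

payload = gateway_payload("Twilio Inbound Gateway", ["+0123456789"])
# create_inbound_gateway("YOUR_VIDEOSDK_AUTH_TOKEN", payload)  # needs a real token
```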

&lt;h2&gt;
  
  
  Step 2: Implementation
&lt;/h2&gt;

&lt;p&gt;To set up inbound calls, outbound calls, and routing rules, check out the Quick Start Example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from videosdk.agents import AgentSession, DTMFHandler

async def entrypoint(ctx: JobContext):

    async def dtmf_callback(digit: int):
        if digit == 1:
            agent.instructions = "You are a Sales Representative. Your goal is to sell our products"
            await agent.session.say(
                "Routing you to Sales. Hi, I'm from Sales. How can I help you today?"
            )
        elif digit == 2:
            agent.instructions = "You are a Support Specialist. Your goal is to help customers with technical issues."
            await agent.session.say(
                "Routing you to Support. Hi, I'm from Support. What issue are you facing?"
            )
        else:
            await agent.session.say(
                "Invalid input. Press 1 for Sales or 2 for Support."
            )

    dtmf_handler = DTMFHandler(dtmf_callback)

    session = AgentSession(
        dtmf_handler = dtmf_handler,
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Full Working Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import logging
from videosdk.agents import Agent, AgentSession, CascadingPipeline, WorkerJob, ConversationFlow, JobContext, RoomOptions, Options, DTMFHandler
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model

logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", handlers=[logging.StreamHandler()])
pre_download_model()
class VoiceAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant that can answer questions."
        )
    async def on_enter(self) -&amp;gt; None:
        await self.session.say("Hello, how can I help you today?")

    async def on_exit(self) -&amp;gt; None:
        await self.session.say("Goodbye!")

async def entrypoint(ctx: JobContext):

    agent = VoiceAgent()
    conversation_flow = ConversationFlow(agent)

    pipeline=CascadingPipeline(
        stt=DeepgramSTT(),
        llm=OpenAILLM(),
        tts=ElevenLabsTTS(),
        vad=SileroVAD(),
        turn_detector=TurnDetector()
    )

    async def dtmf_callback(message):
        print("DTMF message received:", message)

    dtmf_handler = DTMFHandler(dtmf_callback)

    session = AgentSession(
        agent=agent, 
        pipeline=pipeline,
        conversation_flow=conversation_flow,
        dtmf_handler = dtmf_handler,
    )

    await session.start(wait_for_participant=True, run_until_shutdown=True)

def make_context() -&amp;gt; JobContext:
    room_options = RoomOptions(name="DTMF Agent Test", playground=True)
    return JobContext(room_options=room_options) 

if __name__ == "__main__":
    job = WorkerJob(entrypoint=entrypoint, jobctx=make_context, options=Options(agent_id="YOUR_AGENT_ID", max_processes=2, register=True, host="localhost", port=8081))
    job.start()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By enabling DTMF detection and handling events at the agent level, you can build predictable call flows, guide users through menus, and trigger application logic without interrupting the call experience. When combined with voice input, DTMF gives you more control over how users interact with your agent.&lt;/p&gt;

&lt;p&gt;This makes DTMF a practical addition to any voice agent that needs clear, deterministic user input during a call.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources and Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Explore the &lt;a href="https://github.com/videosdk-live/agents-quickstart/blob/main/DTMF%20Handler/dtmf_handler.py" rel="noopener noreferrer"&gt;dtmf-implementation&lt;/a&gt; on github.&lt;/li&gt;
&lt;li&gt;To set up inbound calls, outbound calls, and routing rules, check out the &lt;a href="https://docs.videosdk.live/telephony/ai-telephony-agent-quick-start#part-2-connect-your-agent-to-the-phone-network" rel="noopener noreferrer"&gt;Quick Start Example.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://docs.videosdk.live/ai_agents/deployments/introduction" rel="noopener noreferrer"&gt;deploy your AI Agents.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Explore more: Check out the &lt;a href="https://docs.videosdk.live/telephony/introduction" rel="noopener noreferrer"&gt;VideoSDK documentation&lt;/a&gt; for more features.&lt;/li&gt;
&lt;li&gt;👉 Share your thoughts, roadblocks, or success stories in the comments or join our &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;Discord community ↗&lt;/a&gt;. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Transfer Calls in AI Voice Agents Using SIP Telephony</title>
      <dc:creator>Chaitrali Kakde</dc:creator>
      <pubDate>Mon, 29 Dec 2025 10:47:01 +0000</pubDate>
      <link>https://dev.to/chaitrali_kakde/how-to-transfer-calls-in-ai-voice-agents-using-sip-telephony-533a</link>
      <guid>https://dev.to/chaitrali_kakde/how-to-transfer-calls-in-ai-voice-agents-using-sip-telephony-533a</guid>
      <description>&lt;p&gt;When someone calls a business, they’re usually looking for a quick and clear resolution not a conversation with a bot that can’t help them move forward. Sometimes it’s a billing issue. Sometimes it’s a sales inquiry. And sometimes, they just want to speak to a real person&lt;/p&gt;

&lt;p&gt;A well-designed AI voice agent knows when to assist and when to step aside. That’s where Call Transfer in VideoSDK comes in. It allows your AI agent to transfer an ongoing SIP call to the right person, without disconnecting the caller or breaking the flow of the conversation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Call Transfer
&lt;/h2&gt;

&lt;p&gt;Call Transfer enables an AI agent to move an active SIP call to another phone number without ending the session. From the caller’s perspective, the transition is automatic, the call continues without dropping, there’s no need to redial, and the conversation flows naturally without awkward pauses or repeated explanations.&lt;/p&gt;

&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The agent evaluates the user’s intent to determine when a call transfer is required and then triggers the function tool.&lt;/li&gt;
&lt;li&gt;When the function tool is triggered, it tells the system to move the call to another phone number.&lt;/li&gt;
&lt;li&gt;The ongoing SIP call is forwarded to the new number instantly, without disconnecting or redialing.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How To Trigger Call Transfer
&lt;/h2&gt;

&lt;p&gt;To set up incoming call handling, outbound calling, and routing rules, check out the &lt;a href="https://docs.videosdk.live/telephony/ai-telephony-agent-quick-start#part-2-connect-your-agent-to-the-phone-network" rel="noopener noreferrer"&gt;Quick Start Example&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from videosdk.agents import Agent, function_tool, 

class CallTransferAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are the Call Transfer Agent. Help transfer ongoing calls to a new number using the transfer_call tool.",
        )

    async def on_enter(self) -&amp;gt; None:
        await self.session.say("Hello Buddy, How can I help you today?")

    async def on_exit(self) -&amp;gt; None:
        await self.session.say("Goodbye Buddy, Thank you for calling!")

    @function_tool
    async def transfer_call(self) -&amp;gt; None:
        """Transfer the call to the provided number"""
        token = os.getenv("VIDEOSDK_AUTH_TOKEN")
        transfer_to = os.getenv("CALL_TRANSFER_TO")
        return await self.session.call_transfer(token, transfer_to)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Full Working Example
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import logging
import os

from videosdk.agents import (Agent, AgentSession, CascadingPipeline, function_tool, WorkerJob, ConversationFlow, JobContext, RoomOptions, Options)
from videosdk.plugins.deepgram import DeepgramSTT
from videosdk.plugins.google import GoogleLLM
from videosdk.plugins.cartesia import CartesiaTTS
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model

# Setup logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()]
)

# Pre-download turn detector model
pre_download_model()


class CallTransferAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions=(
                "You are the Call Transfer Agent. "
                "Help transfer ongoing calls to a new number using the transfer_call tool."
            )
        )

    async def on_enter(self) -&amp;gt; None:
        await self.session.say("Hello Buddy, How can I help you today?")

    async def on_exit(self) -&amp;gt; None:
        await self.session.say("Goodbye Buddy, Thank you for calling!")

    @function_tool
    async def transfer_call(self) -&amp;gt; None:
        """Transfer the call to the provided number"""
        token = os.getenv("VIDEOSDK_AUTH_TOKEN")
        transfer_to = os.getenv("CALL_TRANSFER_TO")
        return await self.session.call_transfer(token, transfer_to)


async def entrypoint(ctx: JobContext):
    agent = CallTransferAgent()
    conversation_flow = ConversationFlow(agent)

    pipeline = CascadingPipeline(
        stt=DeepgramSTT(),
        llm=GoogleLLM(),
        tts=CartesiaTTS(),
        vad=SileroVAD(),
        turn_detector=TurnDetector()
    )

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    await session.start(wait_for_participant=True, run_until_shutdown=True)


def make_context() -&amp;gt; JobContext:
    room_options = RoomOptions(name="Call Transfer Agent", playground=True)
    return JobContext(room_options=room_options)


if __name__ == "__main__":
    job = WorkerJob(
        entrypoint=entrypoint,
        jobctx=make_context,
        options=Options(
            agent_id="YOUR_AGENT_ID",
            register=True,
            host="localhost",
            port=8081
        )
    )
    job.start()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
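&lt;p&gt;With the environment variables exported, the agent starts like any Python script (assuming the file above is saved as &lt;code&gt;call_transfer.py&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;python call_transfer.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Since &lt;code&gt;RoomOptions&lt;/code&gt; is created with &lt;code&gt;playground=True&lt;/code&gt;, you should be able to try the agent in VideoSDK's browser playground before wiring it up to a real SIP number.&lt;/p&gt;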



&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Call Transfer transforms your AI voice agent from a simple responder into a capable call-handling system. By automatically routing ongoing SIP calls to the right person, it ensures users never experience dropped calls or awkward handoffs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Resources and Next Steps
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Visit the &lt;a href="https://thrive.zohopublic.in/aref/ccUcTqaXyb/8ljdhUfmV" rel="noopener noreferrer"&gt;VideoSDK dashboard&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Explore the &lt;a href="https://github.com/videosdk-live/agents-quickstart/blob/main/Call%20Transfer/call_transfer.py" rel="noopener noreferrer"&gt;call transfer implementation&lt;/a&gt; on GitHub.&lt;/li&gt;
&lt;li&gt;To set up inbound calls, outbound calls, and routing rules, check out the &lt;a href="https://docs.videosdk.live/telephony/ai-telephony-agent-quick-start#part-2-connect-your-agent-to-the-phone-network" rel="noopener noreferrer"&gt;Quick Start Example.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Learn how to &lt;a href="https://docs.videosdk.live/ai_agents/deployments/introduction" rel="noopener noreferrer"&gt;deploy your AI Agents.&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Explore more: Check out the &lt;a href="https://docs.videosdk.live/telephony/introduction" rel="noopener noreferrer"&gt;VideoSDK documentation&lt;/a&gt; for more features.&lt;/li&gt;
&lt;li&gt;👉 Share your thoughts, roadblocks, or success stories in the comments or join our &lt;a href="https://dub.sh/yDV95i6" rel="noopener noreferrer"&gt;Discord community ↗&lt;/a&gt;. We’re excited to learn from your journey and help you build even better AI-powered communication tools!&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>webdev</category>
      <category>ai</category>
      <category>agents</category>
      <category>sip</category>
    </item>
  </channel>
</rss>
