When it comes to voice AI, the real challenge isn’t speed; it’s timing.
A response that arrives a second too late feels unnatural. That tiny pause is enough to remind users they’re talking to a machine. Humans don’t wait for sentences to end; we anticipate intent and respond at the right moment. Traditional voice agents don’t. They wait for silence, and that’s what makes conversations feel slow.
Preemptive Response fixes this by letting voice agents start understanding and preparing responses while the user is still speaking.
What Is Preemptive Response?
Preemptive Response is a capability that allows a voice agent to start understanding a user’s intent before they finish speaking.
As the user talks, the Speech-to-Text engine emits partial transcripts in real time. These partial results are enough for the agent to begin reasoning early, instead of waiting for the full sentence and a moment of silence.
The goal isn’t to interrupt the user; it’s to be ready at the right moment.
How Preemptive Response Works
- User audio is streamed to the STT, which generates partial transcripts.
- These partial transcripts are immediately sent to the LLM to enable preemptive (early) responses.
- The LLM output is then passed to the TTS to generate the spoken response (see the sketch after this list).
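Conceptually, the loop looks something like the sketch below. This is illustrative pseudocode, not the SDK’s internals: the event fields (is_partial, text) and the generate/speak helpers are hypothetical stand-ins, and in practice CascadingPipeline wires all of this up for you.

import asyncio

# Illustrative sketch only. CascadingPipeline handles this wiring internally;
# the event fields and the generate/speak helpers here are hypothetical.
async def preemptive_loop(stt_stream, llm, tts):
    draft = None
    async for event in stt_stream:
        if event.is_partial:
            # A partial transcript arrived: start (or restart) reasoning
            # while the user is still speaking.
            if draft is not None:
                draft.cancel()
            draft = asyncio.create_task(llm.generate(event.text))
        else:
            # The turn ended. The draft is already in flight (or finished),
            # so the agent can respond with little added latency.
            if draft is not None:
                await tts.speak(await draft)
                draft = None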
Enabling Preemptive Response
To enable this feature, set the enable_preemptive_generation flag to True when initializing your STT plugin (e.g., DeepgramSTTV2).
from videosdk.plugins.deepgram import DeepgramSTTV2

stt = DeepgramSTTV2(
    enable_preemptive_generation=True
)
Once enabled, partial transcripts start flowing automatically and your agent begins preparing responses earlier by design.
Currently, preemptive response generation is supported only by Deepgram’s STT implementation, and only with the Flux model.
Implementation
Prerequisites
- A VideoSDK authentication token (generate one from app.videosdk.live; see the guide to generating a VideoSDK token)
- A VideoSDK meeting ID (you can generate one using the Create Room API, as sketched after this list, or through the VideoSDK dashboard)
- Python 3.12 or higher
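For the meeting ID, a minimal sketch of calling the Create Room API might look like this. It assumes the requests package is installed and VIDEOSDK_AUTH_TOKEN is set in your environment; check the Create Room API reference for the exact request and response shape.

import os
import requests

# Minimal sketch: create a room via VideoSDK's Create Room API.
# Assumes VIDEOSDK_AUTH_TOKEN is set and `requests` is installed.
resp = requests.post(
    "https://api.videosdk.live/v2/rooms",
    headers={"Authorization": os.environ["VIDEOSDK_AUTH_TOKEN"]},
)
resp.raise_for_status()
print(resp.json()["roomId"])  # use this as your meeting ID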
Install dependencies
pip install "videosdk-agents[deepgram,openai,elevenlabs,silero,turn_detector]"
Want to use a different provider? Check out our plugins for STT, LLM, and TTS.
Set API Keys in .env
DEEPGRAM_API_KEY="Your Deepgram API Key"
OPENAI_API_KEY="Your OpenAI API Key"
ELEVENLABS_API_KEY="Your ElevenLabs API Key"
VIDEOSDK_AUTH_TOKEN="Your VideoSDK Auth Token"
Get your API keys from Deepgram ↗, OpenAI ↗, ElevenLabs ↗, and the VideoSDK Dashboard ↗ (see the guide to generating a VideoSDK token).
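The plugins read these keys from the process environment, so make sure the .env file is loaded before the agent starts. A common approach, assuming python-dotenv is installed (pip install python-dotenv):

from dotenv import load_dotenv

# Load variables from .env into the process environment at startup.
load_dotenv()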
Full Working Example
import asyncio
import os
from videosdk.agents import Agent, AgentSession, CascadingPipeline, JobContext, RoomOptions, WorkerJob, ConversationFlow
from videosdk.plugins.silero import SileroVAD
from videosdk.plugins.turn_detector import TurnDetector, pre_download_model
from videosdk.plugins.deepgram import DeepgramSTTV2
from videosdk.plugins.openai import OpenAILLM
from videosdk.plugins.elevenlabs import ElevenLabsTTS

# Pre-download the Turn Detector model to avoid delays during startup
pre_download_model()

class MyVoiceAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You are a helpful voice assistant that can answer questions and help with tasks.")

    async def on_enter(self):
        await self.session.say("Hello! How can I help you today?")

    async def on_exit(self):
        await self.session.say("Goodbye!")

async def start_session(context: JobContext):
    # 1. Create the agent and conversation flow
    agent = MyVoiceAgent()
    conversation_flow = ConversationFlow(agent)

    # 2. Define the pipeline with Preemptive Generation enabled
    pipeline = CascadingPipeline(
        stt=DeepgramSTTV2(
            model="flux-general-en",
            enable_preemptive_generation=True  # Enable low-latency partials
        ),
        llm=OpenAILLM(model="gpt-4o"),
        tts=ElevenLabsTTS(model="eleven_flash_v2_5"),
        vad=SileroVAD(threshold=0.35),
        turn_detector=TurnDetector(threshold=0.8)
    )

    # 3. Initialize the session
    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        conversation_flow=conversation_flow
    )

    try:
        await context.connect()
        await session.start()
        # Keep the session running
        await asyncio.Event().wait()
    finally:
        # Clean up resources
        await session.close()
        await context.shutdown()

def make_context() -> JobContext:
    room_options = RoomOptions(
        name="VideoSDK Cascaded Agent",
        playground=True
    )
    return JobContext(room_options=room_options)

if __name__ == "__main__":
    job = WorkerJob(entrypoint=start_session, jobctx=make_context)
    job.start()
Run the Python Script
python main.py
You can also run the script in console mode:
python main.py console
With Preemptive Response enabled, the voice agent no longer waits for speech to end. It begins processing intent as audio arrives, reducing latency and keeping conversations natural. The result is a responsive, end-to-end voice experience that feels fluid in real time.
Next Steps
- Explore the preemptive-response-docs for more information.
- Learn how to deploy your AI Agents.
- Visit Deepgram's flux documentation.
