Think about how much time you spend scheduling meetings, checking the weather, sending emails, or jotting down notes throughout the day. These small tasks, though simple, add up, and that’s where AI voice agents can help.
Voice assistants powered by VideoSDK, combined with MCP (Model Context Protocol), allow you to integrate your assistant with real-world tools such as Google Calendar, Notion, weather APIs, reminder apps, and more. With just a voice command, you can automate tasks, fetch data, or control devices seamlessly.
For developers, this is a golden opportunity to create customized workflows and integrations without reinventing the wheel. For users, it means smarter interactions and less manual effort.
📖 What is MCP Integration?
MCP (Model Context Protocol) is a flexible communication layer that allows your AI voice agent to exchange data and events with external services. Whether you're using a local MCP server hosted on your machine or a cloud-based automation platform like Zapier, MCP enables your agent to trigger actions such as scheduling meetings, setting reminders, or fetching information, all through voice commands.
Key Components:
- Agent: The AI voice interface built with VideoSDK that listens to user input and dispatches requests.
- MCP Server: Acts as a bridge, connecting the agent to external tools and APIs.
- Integrations: Services like Google Calendar, Notion, or weather APIs that the MCP server can trigger.
📂 Local vs Cloud MCP Integration
| Feature | Local MCP Server | Cloud Integration (Zapier, etc.) |
|---|---|---|
| Hosting | Your machine or private server | Third-party automation platforms |
| Security | You control data and access end to end | Depends on the cloud provider's security model |
| Flexibility | Customizable workflows | Pre-built connectors and triggers |
| Use Case | Internal apps, data privacy | Fast deployment, multi-step automation |
💥 How to Integrate VideoSDK AI Agents with an MCP Server
MCP follows a client-server architecture, where the AI agent communicates with external services using structured requests and responses. With VideoSDK’s built-in MCP support, you can easily connect your agent to external Python scripts or APIs for advanced workflows.
⚙ Installation Prerequisites
- A VideoSDK authentication token (generate one from app.videosdk.live; see the guide to generating a VideoSDK token)
- A VideoSDK meeting ID (you can generate one using the Create Room API, as sketched after the install commands below, or through the VideoSDK dashboard)
- A Google Cloud Project with the Gemini API enabled and an API Key (Refer to the Google Cloud documentation for setup instructions).
- Python 3.12 or higher
pip install fastmcp # For creating MCP servers
pip install videosdk-agents # VideoSDK agents with MCP support
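If you don't have a meeting ID yet, you can create one programmatically. The sketch below uses only the Python standard library and assumes the Create Room API endpoint (POST https://api.videosdk.live/v2/rooms) and its roomId response field as described in the VideoSDK docs; create_meeting_id is a hypothetical helper, not part of the SDK:
import json
import urllib.request

# Hypothetical helper: create a VideoSDK room and return its meeting ID.
# Assumes the response JSON contains a "roomId" field, per the VideoSDK REST docs.
def create_meeting_id(videosdk_token: str) -> str:
    request = urllib.request.Request(
        "https://api.videosdk.live/v2/rooms",
        data=b"",  # empty POST body
        method="POST",
        headers={"Authorization": videosdk_token},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["roomId"]

# Usage: print(create_meeting_id("your_videosdk_auth_token_here"))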
1. STDIO Transport (MCPServerStdio): For local process communication:
import sys
from pathlib import Path

from videosdk.agents import Agent, MCPServerStdio

# Path to the external Python script (recipe service)
mcp_server_path = Path(__file__).parent / "recipe_service.py"

class RecipeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a cooking assistant. Use external services to suggest recipes.",
            mcp_servers=[
                MCPServerStdio(
                    executable_path=sys.executable,
                    process_arguments=[str(mcp_server_path)],
                    session_timeout=30
                )
            ]
        )

# Usage:
# User says: "Suggest a recipe with chicken and rice"
# Agent calls the external MCP server script
# Script returns: "Try chicken fried rice with vegetables"
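The recipe_service.py script referenced above is external to this snippet, so here is a hypothetical minimal version, built with FastMCP in the same pattern as the time server shown later in this post. The recipe logic is a hard-coded stub, not a real recipe API:
from mcp.server.fastmcp import FastMCP

# Hypothetical recipe_service.py: a stub MCP server the RecipeAgent above can launch
mcp = FastMCP("RecipeService")

@mcp.tool()
def suggest_recipe(ingredients: str) -> str:
    """Suggest a simple recipe for the given comma-separated ingredients."""
    # Stub logic; a real service would query a recipe API here
    if "chicken" in ingredients and "rice" in ingredients:
        return "Try chicken fried rice with vegetables"
    return f"Try a simple stir-fry with {ingredients}"

if __name__ == "__main__":
    mcp.run(transport="stdio")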
2. HTTP Transport (MCPServerHTTP): For remote service communication:
from videosdk.agents import MCPServerHTTP
mcp_servers=[
    MCPServerHTTP(
        endpoint_url="https://your-mcp-server.com/api/mcp",
        session_timeout=30
    )
]
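Cloud automation platforms expose MCP endpoints in exactly this form; the Zapier-style URL in the multi-server example below drops straight into the same endpoint_url parameter.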
- Multiple MCP Servers: you can combine both transports in a single agent:
mcp_servers=[
    MCPServerStdio(
        executable_path=sys.executable,
        process_arguments=[str(mcp_script)],
        session_timeout=30
    ),
    MCPServerHTTP(
        endpoint_url="https://mcp.zapier.com/api/mcp/s/your-server-id",
        session_timeout=30
    )
]
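With multiple servers configured, the agent sees the combined set of tools they expose, and the model typically selects whichever tool fits the user's request, so one voice command can reach either the local script or the cloud service.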
Full working example
- Create a main.py file
import asyncio
import sys
from pathlib import Path

from videosdk.agents import Agent, AgentSession, RealTimePipeline, MCPServerStdio, MCPServerHTTP
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

class MyVoiceAgent(Agent):
    def __init__(self):
        # Define paths to your MCP servers
        mcp_script = Path(__file__).parent.parent / "MCP_Example" / "mcp_stdio_example.py"
        super().__init__(
            instructions="""You are a helpful assistant with access to real-time data.
            You can provide current time information.
            Always be conversational and helpful in your responses.""",
            mcp_servers=[
                # STDIO MCP Server (local Python script for time)
                MCPServerStdio(
                    executable_path=sys.executable,  # Use the current Python interpreter
                    process_arguments=[str(mcp_script)],
                    session_timeout=30
                ),
                # HTTP MCP Server (external service, e.g. Zapier)
                MCPServerHTTP(
                    endpoint_url="https://your-mcp-service.com/api/mcp",
                    session_timeout=30
                )
            ]
        )

    async def on_enter(self) -> None:
        await self.session.say("Hi there! How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Thank you for using the assistant. Goodbye!")

async def main(context: dict):
    # Configure the Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        config=GeminiLiveConfig(
            voice="Leda",  # Available voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr
            response_modalities=["AUDIO"]
        )
    )

    pipeline = RealTimePipeline(model=model)
    agent = MyVoiceAgent()

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        context=context
    )

    try:
        # Start the session
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()

if __name__ == "__main__":
    def make_context():
        # When VIDEOSDK_AUTH_TOKEN is set in .env, DON'T include videosdk_auth
        return {
            "meetingId": "your_actual_meeting_id_here",  # Replace with an actual meeting ID
            "name": "AI Voice Agent",
            "videosdk_auth": "your_videosdk_auth_token_here"  # Replace with an actual token
        }

    # Launch the agent session with the context above
    asyncio.run(main(make_context()))
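As the comment in make_context() notes, you can keep the token out of your code by putting VIDEOSDK_AUTH_TOKEN in a .env file. A minimal sketch of that approach, assuming the python-dotenv package is installed (pip install python-dotenv):
from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # loads VIDEOSDK_AUTH_TOKEN from a local .env file into the environment

def make_context():
    # With the token in the environment, omit videosdk_auth entirely
    return {
        "meetingId": "your_actual_meeting_id_here",
        "name": "AI Voice Agent",
    }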
- Create the MCP server file, mcp_stdio_example.py
from mcp.server.fastmcp import FastMCP
import datetime

# Create the MCP server
mcp = FastMCP("CurrentTimeServer")

@mcp.tool()
def get_current_time() -> str:
    """Get the current time in the user's location"""
    # Get the current time
    now = datetime.datetime.now()
    # Return a formatted time string
    return f"The current time is {now.strftime('%H:%M:%S')} on {now.strftime('%Y-%m-%d')}"

if __name__ == "__main__":
    # Run the server with STDIO transport
    mcp.run(transport="stdio")
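Extending the agent with more capabilities follows the same pattern. As a hypothetical example, here is a second STDIO server with a stubbed weather tool (a real version would call a weather API); save it as its own file and point another MCPServerStdio entry at it:
from mcp.server.fastmcp import FastMCP

# Hypothetical mcp_weather_example.py: same pattern, different tool
mcp = FastMCP("WeatherServer")

@mcp.tool()
def get_weather(city: str) -> str:
    """Return a (stubbed) weather report for the given city."""
    # Stub data; a real implementation would call a weather API here
    return f"The weather in {city} is sunny and 24 degrees Celsius"

if __name__ == "__main__":
    mcp.run(transport="stdio")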
Full implementation on GitHub: https://github.com/videosdk-live/agents-quickstart/blob/main/MCP/mcp_agent.py
Conclusion
With VideoSDK AI voice agents and MCP servers, developers can automate everyday workflows through voice commands: scheduling meetings, setting reminders, updating Notion, fetching the weather, and sending emails. Both local scripts and cloud integrations are supported, making automation flexible and scalable.
We hope this deep dive into building AI voice agents with VideoSDK and MCP servers has been helpful.
💡 We’d love to hear from you!
- Did you try implementing any of the examples above?
- Are there specific day-to-day workflows you want your AI agent to handle?
- Do you have questions about setting up MCP servers locally?
Drop your thoughts, questions, or experiences in the comments below (or on Discord), and let’s make AI voice agents even more practical and powerful together!
Resources
- GitHub repo: https://github.com/videosdk-community/ai-agent-demo
- YouTube video: https://www.youtube.com/watch?v=I0bpn7weeqg
- YouTube video on calendar integration: https://www.youtube.com/watch?v=_lrG65ozLI0
- Read more about MCP here: https://docs.videosdk.live/ai_agents/mcp-integration
