How to Build an MCP Server from Scratch: Beginner’s Guide🥳🔥

Think about how much time you spend scheduling meetings, checking the weather, sending emails, or jotting down notes throughout the day. These small tasks, though simple, add up, and that’s where AI voice agents can help.

Voice assistants powered by VideoSDK, combined with MCP (Model Context Protocol), allow you to integrate your assistant with real-world tools such as Google Calendar, Notion, weather APIs, reminder apps, and more. With just a voice command, you can automate tasks, fetch data, or control devices seamlessly.

For developers, this is a golden opportunity to create customized workflows and integrations without reinventing the wheel. For users, it means smarter interactions and less manual effort.


📖 What is MCP Integration?

MCP (Model Context Protocol) is a flexible communication layer that allows your AI voice agent to exchange data and events with external services. Whether you're using a local MCP server hosted on your machine or a cloud-based automation platform like Zapier, MCP enables your agent to trigger actions such as scheduling meetings, setting reminders, or fetching information, all through voice commands.

Key Components:

  • Agent: The AI voice interface built with VideoSDK that listens to user input and dispatches requests.
  • MCP Server: Acts as a bridge, connecting the agent to external tools and APIs.
  • Integrations: Services like Google Calendar, Notion, or weather APIs that the MCP server can trigger.

MCP architecture diagram


📂 Local vs Cloud MCP Integration

| Feature | Local MCP Server | Cloud Integration (Zapier, etc.) |
| --- | --- | --- |
| Hosting | Your machine or private server | Third-party automation platforms |
| Security | You control everything | Easier to implement, but depends on the cloud provider |
| Flexibility | Customizable workflows | Pre-built connectors and triggers |
| Use Case | Internal apps, data privacy | Fast deployment, multi-step automation |

💥 How to Integrate VideoSDK AI Agents with an MCP Server

MCP follows a client-server architecture, where the AI agent communicates with external services using structured requests and responses. With VideoSDK’s built-in MCP support, you can easily connect your agent to external Python scripts or APIs for advanced workflows.


⚙ Installation Prerequisites

pip install fastmcp  # For creating MCP servers
pip install videosdk-agents  # VideoSDK agents with MCP support
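The full example later in this guide also needs your VideoSDK auth token and a meeting ID. How you load them is up to you; one common pattern (an assumption here, not mandated by VideoSDK) is to keep the token in the VIDEOSDK_AUTH_TOKEN environment variable referenced in that example and read it at startup:

# Hypothetical config helper: read the VideoSDK auth token from the environment.
# The variable name VIDEOSDK_AUTH_TOKEN matches the one mentioned in the full example below.
import os

videosdk_auth = os.environ.get("VIDEOSDK_AUTH_TOKEN")
if not videosdk_auth:
    raise RuntimeError("Set VIDEOSDK_AUTH_TOKEN before starting the agent")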

1. STDIO Transport (MCPServerStdio): For communicating with a local process:

import sys
from pathlib import Path
from videosdk.agents import Agent, MCPServerStdio

# Path to the external Python script (recipe service)
mcp_server_path = Path(__file__).parent / "recipe_service.py"

class RecipeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a cooking assistant. Use external services to suggest recipes.",
            mcp_servers=[
                MCPServerStdio(
                    executable_path=sys.executable,
                    process_arguments=[str(mcp_server_path)],
                    session_timeout=30
                )
            ]
        )

# Usage:
# User says: "Suggest a recipe with chicken and rice"
# Agent calls the external MCP server script
# Script returns: "Try chicken fried rice with vegetables"
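The recipe_service.py script referenced above isn’t shown in this section. The sketch below is an illustrative guess at what it could contain, reusing the same FastMCP pattern as the time server later in this post; the tool name suggest_recipe and its logic are assumptions, not part of the official example:

# recipe_service.py - illustrative MCP server for the RecipeAgent example (assumed implementation)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("RecipeService")

@mcp.tool()
def suggest_recipe(ingredients: str) -> str:
    """Suggest a simple recipe based on a comma-separated list of ingredients."""
    items = [i.strip().lower() for i in ingredients.split(",") if i.strip()]
    if "chicken" in items and "rice" in items:
        return "Try chicken fried rice with vegetables."
    return "Try a simple stir-fry with " + (", ".join(items) if items else "whatever you have on hand") + "."

if __name__ == "__main__":
    # STDIO transport so MCPServerStdio can launch this script as a subprocess
    mcp.run(transport="stdio")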

2. HTTP Transport (MCPServerHTTP): For remote service communication:

from videosdk.agents import MCPServerHTTP

mcp_servers=[
    MCPServerHTTP(
        endpoint_url="https://your-mcp-server.com/api/mcp",
        session_timeout=30
    )
]
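The endpoint_url should point at an MCP server that is reachable over HTTP, whether that’s a hosted service like Zapier MCP or one you run yourself. As a rough sketch, the same FastMCP server used elsewhere in this post can be exposed over a network transport instead of STDIO; the transport name and default local endpoint below are assumptions based on the MCP Python SDK, so check your installed SDK version and the VideoSDK docs for the exact URL format MCPServerHTTP expects:

# mcp_http_example.py - sketch of serving the same tools over HTTP instead of STDIO
# (transport name and default endpoint are assumptions; verify against your MCP SDK version)
from mcp.server.fastmcp import FastMCP
import datetime

mcp = FastMCP("CurrentTimeServer")

@mcp.tool()
def get_current_time() -> str:
    """Get the current time."""
    return datetime.datetime.now().strftime("The current time is %H:%M:%S on %Y-%m-%d")

if __name__ == "__main__":
    # SSE transport; the MCP Python SDK serves this locally (commonly at http://127.0.0.1:8000/sse)
    mcp.run(transport="sse")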
3. Multiple MCP Servers: Combine a local script and a remote service in the same agent:
mcp_servers=[
    MCPServerStdio(
        executable_path=sys.executable,
        process_arguments=[str(mcp_script)],
        session_timeout=30
    ),
    MCPServerHTTP(
        endpoint_url="https://mcp.zapier.com/api/mcp/s/your-server-id",
        session_timeout=30
    )
]

Full working example

  • Create a main.py file
import asyncio
import sys
from pathlib import Path
from videosdk.agents import Agent, AgentSession, RealTimePipeline, MCPServerStdio, MCPServerHTTP
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

class MyVoiceAgent(Agent):
    def __init__(self):
        # Path to the local MCP server script (mcp_stdio_example.py, created below, in the same directory)
        mcp_script = Path(__file__).parent / "mcp_stdio_example.py"
        super().__init__(
            instructions="""You are a helpful assistant with access to real-time data. 
            You can provide current time information. 
            Always be conversational and helpful in your responses.""",
            mcp_servers=[
                # STDIO MCP Server (Local Python script for time)
                MCPServerStdio(
                    executable_path=sys.executable,  # Use current Python interpreter
                    process_arguments=[str(mcp_script)],
                    session_timeout=30
                ),
                # HTTP MCP Server (external service, e.g. Zapier)
                MCPServerHTTP(
                    endpoint_url="https://your-mcp-service.com/api/mcp",
                    session_timeout=30
                )
            ]
        )

    async def on_enter(self) -> None:
        await self.session.say("Hi there! How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Thank you for using the assistant. Goodbye!")

async def main(context: dict):

    # Configure Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        config=GeminiLiveConfig(
            voice="Leda",  # Available voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr
            response_modalities=["AUDIO"]
        )
    )

    pipeline = RealTimePipeline(model=model)
    agent = MyVoiceAgent()

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        context=context
    )

    try:
        # Start the session
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()

if __name__ == "__main__":
    def make_context():
        # When VIDEOSDK_AUTH_TOKEN is set in .env - DON'T include videosdk_auth
        return {
            "meetingId": "your_actual_meeting_id_here",  # Replace with actual meeting ID
            "name": "AI Voice Agent",
            "videosdk_auth": "your_videosdk_auth_token_here"  # Replace with actual token
        }
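As written, the snippet defines make_context() but never launches anything. A minimal way to wire it up, assuming you just want to run this single file directly (the linked GitHub quickstart may use VideoSDK’s own job runner instead), is to add one line at the bottom of the if __name__ == "__main__": block:

    # Minimal launch (an assumption, not from the original snippet):
    # run the async entrypoint with the context built above
    asyncio.run(main(make_context()))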
  • Create another mcp_stdio_example.py file
from mcp.server.fastmcp import FastMCP
import datetime

# Create the MCP server
mcp = FastMCP("CurrentTimeServer")

@mcp.tool()
def get_current_time() -> str:
    """Get the current time in the user's location"""

    # Get current time
    now = datetime.datetime.now()

    # Return formatted time string
    return f"The current time is {now.strftime('%H:%M:%S')} on {now.strftime('%Y-%m-%d')}"

if __name__ == "__main__":
    # Run the server with STDIO transport
    mcp.run(transport="stdio")
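Before wiring the server into the agent, you can sanity-check the tool logic by importing and calling the function directly. This bypasses MCP entirely; it just confirms the Python function behaves as expected:

# Quick local check of the tool function itself (not an MCP call)
from mcp_stdio_example import get_current_time

print(get_current_time())
# e.g. "The current time is 14:02:31 on 2025-01-01"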

Github code full implementation : https://github.com/videosdk-live/agents-quickstart/blob/main/MCP/mcp_agent.py


Conclusion

With VideoSDK AI voice agents and MCP servers, developers can automate everyday workflows like scheduling meetings, setting reminders, updating Notion, checking the weather, and sending emails, all through voice commands. Both local scripts and cloud integrations are supported, making automation flexible and scalable.

We hope this deep dive into building AI voice agents with VideoSDK and MCP servers has been helpful.

💡 We’d love to hear from you!

  • Did you try implementing any of the examples above?
  • Are there specific day-to-day workflows you want your AI agent to handle?
  • Do you have questions about setting up MCP servers locally?

Drop your thoughts, questions, or experiences in the comments below (or on Discord), and let’s make AI voice agents even more practical and powerful together!


