How to Build an MCP Server from Scratch: Beginner’s Guide🥳🔥

Think about how much time you spend scheduling meetings, checking the weather, sending emails, or jotting down notes throughout the day. These small tasks, though simple, add up, and that’s where AI voice agents can help.

Voice assistants powered by VideoSDK, combined with MCP (Model Context Protocol), allow you to integrate your assistant with real-world tools such as Google Calendar, Notion, weather APIs, reminder apps, and more. With just a voice command, you can automate tasks, fetch data, or control devices seamlessly.

For developers, this is a golden opportunity to create customized workflows and integrations without reinventing the wheel. For users, it means smarter interactions and less manual effort.


📖 What is MCP Integration?

MCP (Model Context Protocol) is a flexible communication layer that allows your AI voice agent to exchange data and events with external services. Whether you're using a local MCP server hosted on your machine or a cloud-based automation platform like Zapier, MCP enables your agent to trigger actions such as scheduling meetings, setting reminders, or fetching information, all through voice commands.

Key Components:

  • Agent: The AI voice interface built with VideoSDK that listens to user input and dispatches requests.
  • MCP Server: Acts as a bridge, connecting the agent to external tools and APIs.
  • Integrations: Services like Google Calendar, Notion, or weather APIs that the MCP server can trigger.

MCP architecture diagram


📂 Local vs Cloud MCP Integration

| Feature | Local MCP Server | Cloud Integration (Zapier, etc.) |
| --- | --- | --- |
| Hosting | Your machine or private server | Third-party automation platforms |
| Security | You control everything | Easier to implement, but depends on the cloud provider |
| Flexibility | Customizable workflows | Pre-built connectors and triggers |
| Use Case | Internal apps, data privacy | Fast deployment, multi-step automation |

💥 How to Integrate VideoSDK AI Agents with an MCP Server

MCP follows a client-server architecture, where the AI agent communicates with external services using structured requests and responses. With VideoSDK’s built-in MCP support, you can easily connect your agent to external Python scripts or APIs for advanced workflows.


⚙ Installation Prerequisites

pip install fastmcp  # For creating MCP servers
pip install videosdk-agents  # VideoSDK agents with MCP support
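The full example later in this guide also needs your VideoSDK auth token and a meeting ID. How you load them is up to you; one common pattern (an assumption here, not mandated by VideoSDK) is to keep the token in the VIDEOSDK_AUTH_TOKEN environment variable referenced in that example and read it at startup:

# Hypothetical config helper: read the VideoSDK auth token from the environment.
# The variable name VIDEOSDK_AUTH_TOKEN matches the one mentioned in the full example below.
import os

videosdk_auth = os.environ.get("VIDEOSDK_AUTH_TOKEN")
if not videosdk_auth:
    raise RuntimeError("Set VIDEOSDK_AUTH_TOKEN before starting the agent")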

1. STDIO Transport (MCPServerStdio): For communicating with a local process:

import sys
from pathlib import Path
from videosdk.agents import Agent, MCPServerStdio

# Path to the external Python script (recipe service)
mcp_server_path = Path(__file__).parent / "recipe_service.py"

class RecipeAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a cooking assistant. Use external services to suggest recipes.",
            mcp_servers=[
                MCPServerStdio(
                    executable_path=sys.executable,
                    process_arguments=[str(mcp_server_path)],
                    session_timeout=30
                )
            ]
        )

# Usage:
# User says: "Suggest a recipe with chicken and rice"
# Agent calls the external MCP server script
# Script returns: "Try chicken fried rice with vegetables"
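The recipe_service.py script referenced above isn’t shown in this section. The sketch below is an illustrative guess at what it could contain, reusing the same FastMCP pattern as the time server later in this post; the tool name suggest_recipe and its logic are assumptions, not part of the official example:

# recipe_service.py - illustrative MCP server for the RecipeAgent example (assumed implementation)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("RecipeService")

@mcp.tool()
def suggest_recipe(ingredients: str) -> str:
    """Suggest a simple recipe based on a comma-separated list of ingredients."""
    items = [i.strip().lower() for i in ingredients.split(",") if i.strip()]
    if "chicken" in items and "rice" in items:
        return "Try chicken fried rice with vegetables."
    return "Try a simple stir-fry with " + (", ".join(items) if items else "whatever you have on hand") + "."

if __name__ == "__main__":
    # STDIO transport so MCPServerStdio can launch this script as a subprocess
    mcp.run(transport="stdio")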

2. HTTP Transport (MCPServerHTTP): For remote service communication:

from videosdk.agents import MCPServerHTTP

mcp_servers=[
    MCPServerHTTP(
        endpoint_url="https://your-mcp-server.com/api/mcp",
        session_timeout=30
    )
]
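The endpoint_url should point at an MCP server that is reachable over HTTP, whether that’s a hosted service like Zapier MCP or one you run yourself. As a rough sketch, the same FastMCP server used elsewhere in this post can be exposed over a network transport instead of STDIO; the transport name and default local endpoint below are assumptions based on the MCP Python SDK, so check your installed SDK version and the VideoSDK docs for the exact URL format MCPServerHTTP expects:

# mcp_http_example.py - sketch of serving the same tools over HTTP instead of STDIO
# (transport name and default endpoint are assumptions; verify against your MCP SDK version)
from mcp.server.fastmcp import FastMCP
import datetime

mcp = FastMCP("CurrentTimeServer")

@mcp.tool()
def get_current_time() -> str:
    """Get the current time."""
    return datetime.datetime.now().strftime("The current time is %H:%M:%S on %Y-%m-%d")

if __name__ == "__main__":
    # SSE transport; the MCP Python SDK serves this locally (commonly at http://127.0.0.1:8000/sse)
    mcp.run(transport="sse")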
3. Multiple MCP Servers: Combine a local script and a remote service in the same agent:
mcp_servers=[
    MCPServerStdio(
        executable_path=sys.executable,
        process_arguments=[str(mcp_script)],
        session_timeout=30
    ),
    MCPServerHTTP(
        endpoint_url="https://mcp.zapier.com/api/mcp/s/your-server-id",
        session_timeout=30
    )
]

Full working example

  • Create a main.py file
import asyncio
import sys
from pathlib import Path
from videosdk.agents import Agent, AgentSession, RealTimePipeline, MCPServerStdio, MCPServerHTTP
from videosdk.plugins.google import GeminiRealtime, GeminiLiveConfig

class MyVoiceAgent(Agent):
    def __init__(self):
        # Path to the local MCP server script (mcp_stdio_example.py, created below, in the same directory)
        mcp_script = Path(__file__).parent / "mcp_stdio_example.py"
        super().__init__(
            instructions="""You are a helpful assistant with access to real-time data. 
            You can provide current time information. 
            Always be conversational and helpful in your responses.""",
            mcp_servers=[
                # STDIO MCP Server (Local Python script for time)
                MCPServerStdio(
                    executable_path=sys.executable,  # Use current Python interpreter
                    process_arguments=[str(mcp_script)],
                    session_timeout=30
                ),
                # HTTP MCP Server (external service, e.g. Zapier)
                MCPServerHTTP(
                    endpoint_url="https://your-mcp-service.com/api/mcp",
                    session_timeout=30
                )
            ]
        )

    async def on_enter(self) -> None:
        await self.session.say("Hi there! How can I help you today?")

    async def on_exit(self) -> None:
        await self.session.say("Thank you for using the assistant. Goodbye!")

async def main(context: dict):

    # Configure Gemini Realtime model
    model = GeminiRealtime(
        model="gemini-2.0-flash-live-001",
        config=GeminiLiveConfig(
            voice="Leda",  # Available voices: Puck, Charon, Kore, Fenrir, Aoede, Leda, Orus, Zephyr
            response_modalities=["AUDIO"]
        )
    )

    pipeline = RealTimePipeline(model=model)
    agent = MyVoiceAgent()

    session = AgentSession(
        agent=agent,
        pipeline=pipeline,
        context=context
    )

    try:
        # Start the session
        await session.start()
        # Keep the session running until manually terminated
        await asyncio.Event().wait()
    finally:
        # Clean up resources when done
        await session.close()

if __name__ == "__main__":
    def make_context():
        # When VIDEOSDK_AUTH_TOKEN is set in .env - DON'T include videosdk_auth
        return {
            "meetingId": "your_actual_meeting_id_here",  # Replace with actual meeting ID
            "name": "AI Voice Agent",
            "videosdk_auth": "your_videosdk_auth_token_here"  # Replace with actual token
        }
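As written, the snippet defines make_context() but never launches anything. A minimal way to wire it up, assuming you just want to run this single file directly (the linked GitHub quickstart may use VideoSDK’s own job runner instead), is to add one line at the bottom of the if __name__ == "__main__": block:

    # Minimal launch (an assumption, not from the original snippet):
    # run the async entrypoint with the context built above
    asyncio.run(main(make_context()))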
  • Create another mcp_stdio_example.py file
from mcp.server.fastmcp import FastMCP
import datetime

# Create the MCP server
mcp = FastMCP("CurrentTimeServer")

@mcp.tool()
def get_current_time() -> str:
    """Get the current time in the user's location"""

    # Get current time
    now = datetime.datetime.now()

    # Return formatted time string
    return f"The current time is {now.strftime('%H:%M:%S')} on {now.strftime('%Y-%m-%d')}"

if __name__ == "__main__":
    # Run the server with STDIO transport
    mcp.run(transport="stdio")
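Before wiring the server into the agent, you can sanity-check the tool logic by importing and calling the function directly. This bypasses MCP entirely; it just confirms the Python function behaves as expected:

# Quick local check of the tool function itself (not an MCP call)
from mcp_stdio_example import get_current_time

print(get_current_time())
# e.g. "The current time is 14:02:31 on 2025-01-01"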

Github code full implementation : https://github.com/videosdk-live/agents-quickstart/blob/main/MCP/mcp_agent.py


Conclusion

With VideoSDK AI voice agents and MCP servers, developers can automate everyday workflows like scheduling meetings, setting reminders, updating Notion, checking the weather, and sending emails, all through voice commands. Both local scripts and cloud integrations are supported, making automation flexible and scalable.

We hope this deep dive into building AI voice agents with VideoSDK and MCP servers has been helpful.

💡 We’d love to hear from you!

  • Did you try implementing any of the examples above?
  • Are there specific day-to-day workflows you want your AI agent to handle?
  • Do you have questions about setting up MCP servers locally?

Drop your thoughts, questions, or experiences in the comments below (or on Discord), and let’s make AI voice agents even more practical and powerful together!


