Hello! I'm Sakasegawa (https://x.com/gyakuse_en)!
About This Article
Today, I will introduce MCP (Model Context Protocol), announced by Anthropic (https://modelcontextprotocol.io/), and show you how to create your own MCP server and how to use it with LLMs other than Claude. I chose to cover MCP because it embodies concepts that are important for building agent systems, which have been attracting a lot of attention recently.
What You Will Gain From Today's Article
- Understanding of MCP
- Specific implementation methods
After reading today's article, you'll be able to create a video recommendation app using MCP+ChatGPT+YouTube.
Before MCP
Now, before we get into what MCP is, let's think about AI agents.
MCP is actually a fairly simple protocol (so simple that everyone seems to be building something like it, and it is quite unclear whether MCP itself will become widespread), and it is something you naturally end up wanting once you start thinking about AI agents.
For example, an AI agent specialized in weather would understand various tools related to weather, select the appropriate tool in response to a user's command (e.g., "Tell me today's weather"), execute the tool, and provide an answer based on the information obtained.
At this point, you may be reminded of Function Calling, which lets a model select the appropriate tool from a user's natural-language instruction. MCP arises from the motivation that the tools themselves need to be properly defined in the first place.
A function-calling definition is a set consisting of a function name, a description, required parameters, and expected return values. For a weather-forecast function, the location and date would be defined as parameters, and the forecast information as the expected return value. However, this definition does not cover the part where the actual weather API is called. So we need a definition of the tool itself. How should we define it? (Correction: many function-calling APIs do not actually specify the function's return value; the main thing is defining the function and its required parameters.)
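For illustration, a weather-forecast definition in the OpenAI function-calling style might look like this (the name `get_weather` and its fields are hypothetical):

```python
# Hypothetical function-calling definition for a weather tool.
# It specifies the name, description, and parameters, but says
# nothing about how the weather API is actually called.
get_weather_def = {
    "name": "get_weather",
    "description": "Get the weather forecast for a location and date.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"},
            "date": {"type": "string", "description": "Date in YYYY-MM-DD"},
        },
        "required": ["location", "date"],
    },
}
```

Note that nothing here binds `get_weather` to an actual implementation; that gap is exactly what a tool server fills.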
A tool server that summarizes weather information would ideally do the following:
- When connected to the server, it tells you:
  - A list of available tools
  - The parameters and return values defined for each tool
  - Workflow-like bundles that combine tools sensibly
- When you send an appropriate request to a tool or workflow, it executes and returns the results nicely
  - For weather, it would be nice if images were returned along with text
MCP is a protocol that codifies what such a tool server should ideally look like.
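As a rough sketch of what such a protocol exchange looks like: MCP messages are JSON-RPC 2.0, and a `tools/list` request/response pair might look roughly like this (the tool shown is illustrative):

```python
# Illustrative JSON-RPC 2.0 messages for MCP's tools/list exchange.
# The client asks the server which tools it offers:
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# The server answers with tool names, descriptions, and input schemas:
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_forecast",
                "description": "Get the weather forecast.",
                "inputSchema": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            }
        ]
    },
}
```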
What is MCP?
Model Context Protocol (MCP) is an open protocol that enables seamless integration between LLM applications and external data sources or tools. Whether you're building an AI-powered IDE, enhancing a chat interface, or creating custom AI workflows, MCP provides a standardized way to connect and provide the necessary context for LLMs.
MCP is an open protocol and can be integrated into any application. IDEs, AI tools, and other software can connect using MCP in a standardized way for local integration.
Quoted from https://modelcontextprotocol.io/introduction
The above explanation might be a bit difficult to understand, but as explained earlier, you can think of it as a mechanism to provide various tools to LLMs.
Terminology Used in MCP
- MCP Hosts: Programs that want to access resources through MCP, such as Claude Desktop, IDEs, and AI tools.
  - In short, the front end of the interaction with the user.
- MCP Clients: Protocol clients that maintain a one-to-one connection with a server.
  - A service layer that mediates the connection between the host and the server.
- MCP Servers: Lightweight programs, each providing specific functionality through the standardized Model Context Protocol.
  - The actual servers.
- Local Resources: Resources on your computer, such as databases, files, and services, that the MCP server can safely access.
  - Resources handled locally by the server.
- Remote Resources: Resources on the internet that the MCP server connects to via APIs.
  - Resources in the cloud.
MCP Servers
Now, let's take a closer look at MCP servers.
An MCP server provides the following:
- Resources
  - Logs, images, API responses (if the MCP server communicates with an external API, for example), etc.
- Prompts
  - A mechanism for interactively running workflows (single or multiple tool executions).
  - From the MCP client there are two cases: tools are called directly, or they are called via prompts.
- Tools
  - The actual implementation of the execution.
  - The MCP client can retrieve the set of tools with `tools/list`.
- Notifications
  - The server notifies clients when resources, prompts, tools, etc. are added.
  - If you watch resources, you can conveniently surface progress information like "here is what is happening right now".
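On the wire these notifications are ordinary JSON-RPC notification messages. As a sketch (the method name follows my reading of the spec), a tool-list-change notification looks like:

```python
# Illustrative JSON-RPC notification an MCP server sends when its
# tool list changes. Notifications carry no "id" because no reply
# is expected from the client.
notification = {
    "jsonrpc": "2.0",
    "method": "notifications/tools/list_changed",
}
```

On receiving this, a client would typically re-issue `tools/list` to refresh its view of the server.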
Example of MCP Server Implementation
Implementing Tools
Here's a sample implementation. All you have to do is write the implementation under the `if name == ...` branch in `call_tool` and add the tool to `list_tools`.
```python
import mcp.types as types
from mcp.server import Server

app = Server("example-server")

@app.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="calculate_sum",
            description="Add two numbers together",
            inputSchema={
                "type": "object",
                "properties": {
                    "a": {"type": "number"},
                    "b": {"type": "number"}
                },
                "required": ["a", "b"]
            }
        )
    ]

@app.call_tool()
async def call_tool(
    name: str,
    arguments: dict
) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
    if name == "calculate_sum":
        a = arguments["a"]
        b = arguments["b"]
        result = a + b
        return [types.TextContent(type="text", text=str(result))]
    raise ValueError(f"Tool not found: {name}")
```
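To see the tool in action, here is a hedged sketch of what a `tools/call` request for `calculate_sum` carries and how the handler above turns it into text content (the message shapes are illustrative):

```python
# Illustrative tools/call request for the calculate_sum tool.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "calculate_sum",
        "arguments": {"a": 1, "b": 2},
    },
}

# The handler adds a and b and wraps the result as text content:
args = request["params"]["arguments"]
result_text = str(args["a"] + args["b"])
response_content = [{"type": "text", "text": result_text}]  # text is "3"
```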
Concrete Example of SQLite MCP Server
Let's consider an SQLite MCP server to better understand.
This SQLite server contains sales data for a store.
Pre-stage
- Perform a handshake with the MCP server and get `tools/list`, etc.
  - It might be a good idea to dynamically convert `tools/list` and `prompts/list` into function-calling definitions.
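For reference, the handshake is MCP's `initialize` exchange. A hedged sketch of the client's side (the field values here are illustrative):

```python
# Rough sketch of the MCP handshake the client performs before
# asking for tools/list. Values are illustrative.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 0,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "example-host", "version": "0.0.1"},
    },
}

# After the server replies and the client acknowledges with the
# "initialized" notification, the client can request the tool list:
tools_list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```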
Usage Stage
- Tool detection on the MCP host
  - The host application analyzes the request content.
- Function Calling
  - It determines that the request corresponds to the operation "Get top-selling products".
  - It infers that the tool `top-selling` should be applied.
- Parameter validation
  - The tool `top-selling` requires the following arguments:
    - `date`: Date of the sales data (required)
    - `limit`: Number of top products (optional, default is 5)
  - The MCP host obtains or completes these arguments from the user request or conversation history.
    - Example: if `date` is not specified, use today's date as the default value.
- Generating and sending SQL statements
  - The MCP host generates the appropriate SQL statement.
  - This SQL statement is sent to the MCP endpoint corresponding to the tool `top-selling`.
  - The MCP server endpoint will be `tools/call`.
- Processing on the MCP server
  - The server receives the `tools/call` request.
    - Tool name: `top-selling`
    - Arguments: `{"date": "2024-12-01", "limit": 5}`
  - The server validates the request and executes the corresponding SQL statement.
  - It queries the SQLite database internally and gets the results.
- Returning results to the host
  - The MCP server returns the query results to the host in JSON format.
- Processing results on the host
  - The MCP host formats the received data and responds to the user in natural language.
  - Example: "Today's top-selling products are: 1st Apples (5,000 yen), 2nd Gorillas (4,500 yen), 3rd Trumpets (3,000 yen)..."
And that is roughly how an MCP server fits into the flow.
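As an illustration of the server-side step, here is a hypothetical implementation of the `top-selling` tool's body using Python's `sqlite3` (the table layout and column names are my own invention):

```python
import sqlite3

# Hypothetical body of the "top-selling" tool: given a date and a
# limit, return the products with the highest sales on that date.
def top_selling(conn: sqlite3.Connection, date: str, limit: int = 5):
    cur = conn.execute(
        "SELECT product, SUM(price) AS total FROM sales "
        "WHERE date = ? GROUP BY product ORDER BY total DESC LIMIT ?",
        (date, limit),
    )
    return cur.fetchall()

# In-memory demo data matching the example answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, product TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2024-12-01", "Apples", 5000),
        ("2024-12-01", "Gorillas", 4500),
        ("2024-12-01", "Trumpets", 3000),
    ],
)
print(top_selling(conn, "2024-12-01", limit=2))
# → [('Apples', 5000), ('Gorillas', 4500)]
```

The MCP server would wrap this result as text or JSON content in its `tools/call` response.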
LLMs Other Than Claude + MCP Server
Finally, let's actually create a simple MCP server and run it using an LLM other than Claude. It's a mystery whether it will work well with function calling. I hope it does. For now, let's implement it.
Creating an MCP Server
https://modelcontextprotocol.io/docs/first-server/python
https://github.com/modelcontextprotocol/create-python-server
This time, we will use the YouTube Data API v3 to create a tool that searches YouTube.
```shell
uvx create-mcp-server
```

I named it `mcp_server_youtube`.

- A directory named `mcp_server_youtube` will be created.
- The server implementation goes in `mcp_server_youtube/src/mcp_server_youtube/server.py`.
Implementation
The implementation of the MCP server was mostly done using gpt-4o.
- Key points
  - Since only the `youtube-search` tool is registered on this server, processing happens only when the request arriving at `handle_call_tool` matches `youtube-search`.
  - The YouTube Data API v3 is simply wrapped as-is.
- Benefits of this
  - If you're just calling an API from function calling, you don't need an MCP server. However, packaging it as an independent MCP server makes it easier to reuse.
```python
import asyncio
from googleapiclient.discovery import build
from mcp.server.models import InitializationOptions
import mcp.types as types
from mcp.server import NotificationOptions, Server
import mcp.server.stdio
from pydantic import AnyUrl
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Set YouTube Data API key
YOUTUBE_API_KEY = os.getenv("YOUTUBE_API_KEY")
if not YOUTUBE_API_KEY:
    raise ValueError("YouTube API Key is not set in the environment variables.")

youtube = build("youtube", "v3", developerKey=YOUTUBE_API_KEY)

# Initialize MCP server
server = Server("mcp_server_youtube")

# Resource (server state) to hold video search results
search_results = {}

# Function to search YouTube videos using YouTube Data API
def search_youtube_videos(query: str, max_results: int = 5):
    """
    Search YouTube videos and return the results.
    """
    request = youtube.search().list(
        q=query,
        part="snippet",
        type="video",
        maxResults=max_results
    )
    response = request.execute()

    # Format and return video information
    videos = []
    for item in response["items"]:
        video_info = {
            "title": item["snippet"]["title"],
            "description": item["snippet"]["description"],
            "channel": item["snippet"]["channelTitle"],
            "url": f"https://www.youtube.com/watch?v={item['id']['videoId']}"
        }
        videos.append(video_info)
    return videos

# Endpoint to list tools
@server.list_tools()
async def handle_list_tools() -> list[types.Tool]:
    """
    Provide a list of tools.
    """
    return [
        types.Tool(
            name="youtube-search",
            description="Search for YouTube videos by query.",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query for YouTube videos."
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of results to return.",
                        "default": 5
                    }
                },
                "required": ["query"]
            }
        )
    ]

# Endpoint to execute tools
@server.call_tool()
async def handle_call_tool(
    name: str, arguments: dict
) -> list[types.TextContent | types.EmbeddedResource]:
    """
    Execute the tool to perform a YouTube search.
    """
    if name != "youtube-search":
        raise ValueError(f"Unknown tool: {name}")

    # Get arguments
    query = arguments.get("query")
    max_results = arguments.get("max_results", 5)

    # Perform YouTube search
    videos = search_youtube_videos(query, max_results)

    # Save results to resources
    search_results[query] = videos

    # Return results to the client
    results_text = "\n".join(
        [f"{video['title']} ({video['url']}) - {video['channel']}" for video in videos]
    )
    return [
        types.TextContent(
            type="text",
            text=f"Search results for '{query}':\n{results_text}"
        )
    ]

# Endpoint to list resources
@server.list_resources()
async def handle_list_resources() -> list[types.Resource]:
    """
    List search results as resources.
    """
    return [
        types.Resource(
            uri=AnyUrl(f"youtube-search://{query}"),
            name=f"Search Results for '{query}'",
            description=f"Search results for query: '{query}'",
            mimeType="text/plain",
        )
        for query in search_results
    ]

# Endpoint to read resources
@server.read_resource()
async def handle_read_resource(uri: AnyUrl) -> str:
    """
    Return the results for the specified search query.
    """
    query = uri.path.lstrip("/")
    if query not in search_results:
        raise ValueError(f"No results found for query: {query}")

    # Format and return results
    videos = search_results[query]
    return "\n".join(
        [f"{video['title']} ({video['url']}) - {video['channel']}" for video in videos]
    )

# Main function
async def main():
    async with mcp.server.stdio.stdio_server() as (read_stream, write_stream):
        await server.run(
            read_stream,
            write_stream,
            InitializationOptions(
                server_name="mcp_server_youtube",
                server_version="0.0.1",
                capabilities=server.get_capabilities(
                    notification_options=NotificationOptions(),
                    experimental_capabilities={}
                )
            )
        )

# Execution
if __name__ == "__main__":
    asyncio.run(main())
```
Implementing the Host Side
Next, let's implement the host side using gradio.
This was also implemented with the support of gpt-4o.
- Key points
  - Intent detection (the MCP-host role) using gpt-4o's Function Calling
  - The MCP-client part is handled by `mcp_search_youtube`
- If you want to go a bit further
  - Given a list of MCP servers to subscribe to, perform a handshake at startup, obtain `tools/list`, etc., and dynamically generate the function-calling definitions.
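As a sketch of that "go a bit further" idea: MCP tool definitions already use JSON Schema, just like OpenAI function definitions, so the conversion can be mechanical. The helper name below is hypothetical, and the renaming mirrors how the host code registers the tool as `youtube_search` with an underscore:

```python
# Hypothetical helper: convert MCP tool definitions (as returned by
# tools/list) into OpenAI function-calling definitions dynamically.
def mcp_tools_to_functions(tools: list[dict]) -> list[dict]:
    return [
        {
            # Use underscores to match the name style the host registers
            "name": t["name"].replace("-", "_"),
            "description": t.get("description", ""),
            "parameters": t["inputSchema"],
        }
        for t in tools
    ]

# Example: the youtube-search tool definition from the server above.
mcp_tools = [
    {
        "name": "youtube-search",
        "description": "Search for YouTube videos by query.",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]
functions = mcp_tools_to_functions(mcp_tools)
```

With this in place, the `functions=` argument in the host could be generated at startup instead of hard-coded.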
```python
import gradio as gr
import openai
import json
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from dotenv import load_dotenv
import os
import asyncio

# Load environment variables
load_dotenv()

# OpenAI API key
openai.api_key = os.getenv("OPENAI_API_KEY")

# MCP server settings
SERVER_COMMAND = os.getenv("PYTHON_PATH")  # /path/to/venv/bin/python
SERVER_SCRIPT = os.getenv(
    "MCP_SERVER_PATH"
)  # /path/to/mcp_server_youtube/src/mcp_server_youtube/server.py
print(SERVER_COMMAND, SERVER_SCRIPT)

# MCP client's YouTube search tool function
def mcp_search_youtube(query: str, max_results: int = 3):
    """
    Search YouTube via the MCP server (synchronous version)
    """
    async def async_search():
        server_params = StdioServerParameters(
            command=SERVER_COMMAND, args=[SERVER_SCRIPT], env=None
        )
        # Asynchronous connection to the MCP server
        async with stdio_client(server_params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(
                    "youtube-search",
                    arguments={"query": query, "max_results": max_results},
                )
                return result.content[0].text

    # Run asynchronous processing synchronously
    return asyncio.run(async_search())

# Gradio chat function
def chat_with_mcp(history, user_input):
    """
    Implements conversation with ChatGPT (synchronous version)
    """
    # Initialize history on first call
    if history is None:
        history = []

    # Build current conversation history
    conversation = [
        {
            "role": "system",
            "content": "You are a helpful assistant that can search YouTube videos or have normal conversations.",
        }
    ]

    # Convert from Gradio history to conversation history
    for msg in history:
        conversation.append({"role": msg["role"], "content": msg["content"]})

    # Add user input
    conversation.append({"role": "user", "content": user_input})

    # Call ChatGPT API
    completion = openai.chat.completions.create(
        model="gpt-4o",
        messages=conversation,
        functions=[
            {
                "name": "youtube_search",
                "description": "Search YouTube videos using a query.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "query": {
                            "type": "string",
                            "description": "Search query for YouTube.",
                        },
                        "max_results": {
                            "type": "integer",
                            "description": "Maximum number of results to return.",
                            "default": 3,
                        },
                    },
                    "required": ["query"],
                },
            }
        ],
        function_call="auto",
    )

    # Get response
    message = completion.choices[0].message
    print(message)

    # In case of Function Calling
    if message.function_call:
        print("Function Calling")
        function_call = message.function_call
        function_name = function_call.name
        print(f"Function Name: {function_name}")
        function_args = json.loads(function_call.arguments)
        if function_name == "youtube_search":
            # Call MCP tool
            query = function_args["query"]
            max_results = function_args.get("max_results", 3)

            # Get function call results
            search_results = mcp_search_youtube(query, max_results)
            print("Search Results:")
            print(search_results)

            # Add the assistant's function call, then the function result,
            # to the conversation (the API expects the call before the result)
            conversation.append(message)
            conversation.append(
                {
                    "role": "function",
                    "name": function_name,
                    "content": search_results,
                }
            )

            # Call the API again to get the final response
            completion = openai.chat.completions.create(
                model="gpt-4o",
                messages=conversation,
            )
            assistant_response = completion.choices[0].message.content

            # Update history
            history.append({"role": "user", "content": user_input})
            history.append({"role": "assistant", "content": assistant_response})
            return history
        else:
            # For unknown functions
            assistant_response = f"Unknown function: {function_name}"
            history.append({"role": "user", "content": user_input})
            history.append({"role": "assistant", "content": assistant_response})
            return history
    else:
        # For normal responses
        assistant_response = message.content
        history.append({"role": "user", "content": user_input})
        history.append({"role": "assistant", "content": assistant_response})
        return history

# Gradio interface
with gr.Blocks() as demo:
    chatbot = gr.Chatbot(type="messages")
    with gr.Row():
        txt = gr.Textbox(show_label=False, placeholder="Type your message here...")
        submit_btn = gr.Button("Submit")
    submit_btn.click(chat_with_mcp, [chatbot, txt], chatbot)
    txt.submit(chat_with_mcp, [chatbot, txt], chatbot)

# Run the app
demo.launch()
```
Summary
- MCP is a way to define tools effectively.
- Implementation is relatively quick with the help of LLMs, so go ahead and build your own.