The Model Context Protocol (MCP): what it is and how to build a server
Your team's LLM-powered application talks to a search index through one custom integration, a code repository through another, a Postgres database through a chain of LangChain tools, and a file system through raw Python I/O calls. Every new data source means writing a new integration. Every integration uses a different authentication model and returns data in a different shape. The LLM application is tightly coupled to every backend it touches, and swapping one out requires changing the application code directly.
The Model Context Protocol (MCP) exists to replace this bespoke plumbing with a single, standardized interface. Think of it as a USB-C port for LLM applications: one connector shape, one protocol, and any compatible server can plug into any compatible client without custom wiring.
Why a standard protocol matters
LLM-powered tools have exploded in capability over the past two years, but the integration story has not kept up. Each AI application (IDE assistant, chat client, agent framework) historically built its own connectors for databases, APIs, document stores, and code repositories. There was no shared contract. If you wanted to use a specific code search tool with two different AI assistants, you needed two separate integrations.
MCP borrows its design philosophy from the Language Server Protocol (LSP), which standardized how code editors talk to language analyzers. Before LSP, each editor had its own plugin for each language. After LSP, one language server worked with every editor. MCP aims to do the same for AI tools and the data sources they need.
The protocol is an open standard, originally created at Anthropic and published under the MIT license. The specification reached stable at version 2025-11-25, and the Python SDK (mcp on PyPI) is at 1.27.2 as of May 2026. A 2.0.0 alpha was published in June 2026 with an updated transport layer.
How MCP works
MCP uses JSON-RPC 2.0 as its message format. A client (the AI application) connects to a server (a service that provides context) over one of three transport types:
- stdio: the client spawns the server as a child process and communicates over stdin/stdout. Best for local, single-user setups.
- SSE (Server-Sent Events): the server runs as an HTTP endpoint, the client connects over HTTP. Works across machines.
- Streamable HTTP: a newer transport that allows bidirectional streaming over HTTP. Added in the 2025-11-25 spec.
Here is the conceptual architecture:
flowchart LR
subgraph Client["Client (AI App)"]
A["Host<br/>IDE / Chat / Agent"]
B["MCP Client<br/>Protocol handler"]
end
subgraph Server["MCP Server"]
C["MCP Server<br/>Protocol handler"]
D["Resources<br/>context data"]
E["Tools<br/>executable functions"]
F["Prompts<br/>templated workflows"]
end
A <--> B
B <-->|JSON-RPC 2.0<br/>stdio / SSE / HTTP| C
C --> D
C --> E
C --> F
Every MCP session begins with a capability negotiation handshake. The client announces what features it supports (sampling, roots, elicitation). The server announces what features it offers (resources, tools, prompts). Both sides agree on a feature set before any data exchange happens.
Server primitives
Servers offer three main categories of functionality:
Resources expose data to the LLM. Think of them as GET endpoints in a REST API. A resource has a URI and returns content in a structured format. Example: file:///logs/2026-06-01.txt returns the content of that log file. Resources are how the LLM loads context.
Tools are functions the LLM can invoke. Think of them as POST endpoints. A tool has a name, a description, and an input schema (JSON Schema). The LLM can call a tool to execute code, query a database, or trigger an external action. Unlike resources, tools are invoked on demand.
Prompts are reusable templates for LLM interactions. A prompt defines a message template with parameter slots. The client can populate the template and present the result to the user as a pre-built interaction.
Client primitives
Clients can also offer features to servers:
- Sampling: the server can request the client to generate an LLM response, enabling agentic loops where one model delegates to another.
- Roots: the server can request information about filesystem or URI boundaries, so it knows where it is allowed to operate.
- Elicitation: the server can request additional information from the user through the client's UI.
Building an MCP server in Python
The mcp package (v1.27.2) provides a high-level API called FastMCP that makes building a server straightforward. Here is a complete server that exposes a weather tool and a greeting resource:
from mcp.server.fastmcp import FastMCP
# Create an MCP server
mcp = FastMCP("Weather Demo")
# Add a tool: get weather for a city
@mcp.tool()
def get_weather(city: str, units: str = "celsius") -> str:
"""Get the current weather for a city."""
# In production, call a real weather API here
return f"Weather in {city}: 22 degrees {units}, partly cloudy"
# Add a resource: city data by URI
@mcp.resource("city://{name}")
def city_info(name: str) -> str:
"""Get information about a city."""
cities = {
"dubai": "Dubai, UAE. Population: 3.6M. Timezone: UTC+4.",
"london": "London, UK. Population: 8.9M. Timezone: UTC+0.",
"tokyo": "Tokyo, Japan. Population: 14M. Timezone: UTC+9.",
}
return cities.get(name.lower(), f"City '{name}' not found.")
# Add a prompt template
@mcp.prompt()
def travel_planning(city: str) -> str:
"""Generate a travel planning prompt for a destination."""
return (
f"You are a travel assistant helping someone plan a trip to {city}. "
f"Provide practical advice on weather, transportation, and attractions."
)
# Run with stdio transport (default)
if __name__ == "__main__":
mcp.run()
Install it and run:
pip install "mcp[cli]"
python weather_server.py
The server starts on stdio by default. For HTTP transport, change the last line:
mcp.run(transport="streamable-http")
Testing with the MCP Inspector
The official MCP Inspector is a browser-based tool for testing servers:
npx -y @modelcontextprotocol/inspector
Point it at your server endpoint (or stdio command) and you can browse resources, invoke tools, and inspect messages without writing a client.
MCP vs the alternatives
| Feature | MCP | Custom API / REST | LangChain Tools | OpenAI function calling |
|---|---|---|---|---|
| Standardized protocol | Yes | No | No (framework-specific) | No (API-specific) |
| Primitive types | Resources, Tools, Prompts | Endpoints only | Tools only | Functions only |
| Transport options | stdio, SSE, Streamable HTTP | HTTP only | In-process only | HTTP only |
| Bidirectional | Yes (sampling, roots) | Request-response only | Request-response only | Request-response only |
| Auth model | OAuth 2.1 (spec), pluggable | Custom per API | Custom per integration | API key |
| Client independence | Any MCP client | One client per API | LangChain only | OpenAI only |
The main differentiator is client independence. A server written for MCP works with any MCP-compatible client: Claude Code, Claude Desktop, the Continue.dev VS Code extension, or a custom agent framework. Custom APIs and framework-specific tools lock you into one ecosystem.
Common pitfalls
Thinking tools are free. Tools execute arbitrary code on your server. Every tool invocation consumes compute and may have side effects. The LLM cannot distinguish between a cheap operation (reading a config file) and an expensive one (running a 100-row batch query). Set usage limits or implement a permission layer for destructive operations.
Resource URIs must be meaningful. A resource URI is not just a label -- it is the identifier the LLM uses to request data. Using opaque URIs (resource://abc123) makes it impossible for the LLM to discover resources. Use hierarchical, descriptive URIs that hint at the content structure, like docs://project/api/reference or db://customers/orders?status=pending.
Forgetting the capability handshake. If you add a new tool to an existing server and your client does not re-negotiate capabilities, the client will not know the tool exists. The capability exchange happens at connection time. Restart both sides after changing what a server offers.
Overloading a server. An MCP server that exposes 50 tools and 200 resources becomes as hard to navigate as a REST API with 50 endpoints. Group related functionality into separate servers and let the client connect to multiple servers. Claude Desktop and other hosts already support multi-server setups.
Assuming tools are always available to the LLM. Tool invocation requires user consent in most host applications. The user must approve each tool call. Design your tools to be meaningful in a single invocation, because multi-step approval flows create a poor user experience.
When NOT to use it
MCP is the wrong choice if:
-
You are building a single-purpose script. If your Python script calls one API and prints the result, MCP adds unnecessary complexity. Just use
requestsdirectly. - You need sub-millisecond latency. The JSON-RPC serialization and transport overhead adds a few milliseconds per call. For latency-critical, high-frequency operations (real-time streaming inference, hardware control), use a direct connection.
- Your data source has no LLM interaction. MCP is designed to serve context to LLMs. If you are building a regular web application backend with no AI component, use a standard REST or gRPC API.
- Your users are all on one framework. If every consumer of your service uses LangChain and will only ever use LangChain, writing a LangChain tool directly is simpler than writing an MCP server plus a LangChain MCP adapter. MCP pays off when you have multiple client types.
TL;DR
- MCP standardizes how LLM applications connect to data sources. One server works with any MCP-compatible client.
- The protocol uses JSON-RPC 2.0 over stdio, SSE, or Streamable HTTP transport. Features are negotiated at connection time.
- Servers expose Resources (data), Tools (executable functions), and Prompts (templates). Clients can offer Sampling, Roots, and Elicitation.
- The Python SDK
mcp(v1.27.2) provides FastMCP, a decorator-based API for building servers in a few lines of code. - MCP pays off when you have multiple client types consuming the same data sources. For single-purpose scripts or single-framework setups, a direct integration is simpler.
- Use the MCP Inspector (
npx @modelcontextprotocol/inspector) to test servers without writing a client.
Next post: building a multi-server MCP setup that connects a code search service, a documentation index, and a database gateway into a single AI assistant, with practical trade-offs on transport selection and auth.
Top comments (0)