I run my blog, devmindset.dev, through a custom MCP server. Publishing posts, updating SEO metadata, assigning categories: all of it goes through a protocol that, a year ago, didn't exist in production form. So I'm not writing about MCP from the documentation's point of view, but from the point of view of someone who stood up a working server and operates it daily. This isn't another "hello world"; it's protocol architecture, deliberate design decisions, and production code in Python.
The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024, now developed under the Agentic AI Foundation (Linux Foundation). The stable spec is dated 2025-11-25, and the largest revision since launch lands on July 28, 2026 — more on that shortly, because it changes how you design transport. But let's start with the question most tutorials skip: what does this protocol actually solve?
What MCP actually solves
The problem MCP addresses is combinatorial. You have M LLM applications (Claude Desktop, Cursor, VS Code, ChatGPT) and N external systems (a database, GitHub, an internal API, WordPress). Without a shared standard, every pair needs a bespoke integration — that's M×N implementations, each with its own format, its own auth, its own maintenance burden. MCP collapses that into M+N: you write the server once, and every compliant client can discover and use it without a line of code on its side.
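To make the arithmetic concrete, here is a tiny sketch that counts integrations for the four hosts and four systems named above. The lists are just those examples, not a survey of the ecosystem.

```python
# Hosts and external systems taken from the examples in the text.
hosts = ["Claude Desktop", "Cursor", "VS Code", "ChatGPT"]
systems = ["database", "GitHub", "internal API", "WordPress"]

# Without a shared protocol: one bespoke integration per (host, system) pair.
bespoke = len(hosts) * len(systems)

# With MCP: one client implementation per host plus one server per system.
with_mcp = len(hosts) + len(systems)

print(bespoke, with_mcp)  # 16 8
```

The gap widens quickly: add a fifth host and the bespoke count grows by four, while the MCP count grows by one.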
Mechanically, MCP sits on JSON-RPC 2.0 and defines three roles. The host is the LLM application that coordinates everything. The client is instantiated by the host — one client per server. The server provides context and capabilities. It's deliberately modeled on the Language Server Protocol: just as LSP standardized language support across editors, MCP standardizes wiring tools and data into the AI ecosystem.
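For a feel of what travels between client and server, here is a minimal sketch of a `tools/call` exchange built by hand with the standard library. The helper names (`make_request`, `make_result`) are mine, and the tool name and arguments are illustrative; in practice an SDK builds these messages for you.

```python
import json


def make_request(request_id: int, method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def make_result(request_id: int, result: dict) -> dict:
    """Build the matching JSON-RPC 2.0 success response (same id as the request)."""
    return {"jsonrpc": "2.0", "id": request_id, "result": result}


# Client -> server: ask the server to run a tool (illustrative name and arguments).
call = make_request(
    1,
    "tools/call",
    {"name": "get_forecast", "arguments": {"city": "Wrocław", "days": 3}},
)

# Server -> client: the tool result comes back as a list of content items.
reply = make_result(
    call["id"],
    {"content": [{"type": "text", "text": "Forecast for Wrocław (3 days): ..."}],
     "isError": False},
)

wire = json.dumps(call, ensure_ascii=False)  # what actually goes over the transport
print(wire)
```

The `id` is what lets the client pair a response with its request, which matters once several calls are in flight at the same time.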
And here's the first misconception to defuse: MCP is not "function calling." Function calling is a single-vendor mechanism — you define functions in your code and one specific model invokes them. MCP is a transport protocol and a negotiation layer: the server advertises its capabilities, the client discovers them at runtime, and versions are negotiated at initialization. Function calling lives inside one application; an MCP server is reusable across any host.
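The version negotiation mentioned above can be sketched in a few lines. This is my reading of the lifecycle rules, not SDK code: the server echoes the client's requested version if it supports it and otherwise offers its own newest one, and the client proceeds only if it can speak what was offered. The function names and the supported-version list are illustrative.

```python
# Versions this hypothetical server can speak, newest first (illustrative list).
SERVER_SUPPORTED = ["2025-11-25", "2025-06-18", "2025-03-26"]


def server_pick_version(requested: str, supported: list = SERVER_SUPPORTED) -> str:
    """Echo the client's version if supported, otherwise offer the server's newest."""
    return requested if requested in supported else supported[0]


def client_accepts(offered: str, client_supported: list) -> bool:
    """The client continues only if it can speak the version the server offered."""
    return offered in client_supported


offered = server_pick_version("2025-06-18")
print(offered, client_accepts(offered, ["2025-06-18", "2025-03-26"]))
```

Function calling has no equivalent step; the schema is baked into one application, so there is nothing to negotiate.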
Three primitives: tools, resources, and prompts
An MCP server exposes capabilities through three primitives. Conflating them is the most common design mistake — each has a different contract and a different use.
| Primitive | What it is | Who controls it | Use for |
|---|---|---|---|
| Tool | Executable action with validation and logic | Model (calls when needed) | Side-effecting operations, complex logic |
| Resource | Read-only data under a URI template | Application / host | Static or semi-static context |
| Prompt | Reusable template | User (selects deliberately) | Repeatable, structured instructions |
Rule of thumb: Tool when you need input validation and business logic ("create a post with title X and status Y"). Resource when you expose data under a simple parameter ("the contents of document Z"). Prompt when you hand the user a ready-made, parameterized scenario. In practice, most servers start and end with tools — the rest is context optimization.
Transport: stdio vs Streamable HTTP
MCP defines two transports, and choosing between them is the first architectural decision when building an MCP server.
| Dimension | stdio | Streamable HTTP |
|---|---|---|
| Location | Local, same machine | Remote, over HTTPS |
| Run model | Host subprocess | Network service |
| Clients | One (process) | Many concurrently |
| Authorization | Inherited from OS | OAuth 2.1 / OIDC |
| Use for | CLI tools, local integrations | Production servers, SaaS |
And here's the change most material hasn't caught up to yet. The 2026-07-28 revision (currently a release candidate) removes the protocol-level session — the Mcp-Session-Id header is gone (SEP-2567). Protocol version, client info, and capabilities now travel in _meta on every request, and a new server/discover method lets the client fetch server capabilities on demand. The practical consequence: any request can land on any server instance. The sticky routing and shared session stores that horizontal deployments used to need are no longer required at the protocol layer.
This doesn't mean your application has to be stateless. A server that needs state across calls does what HTTP APIs have always done: mint an explicit handle (say, a basket_id) from one tool and have the model pass it back as an ordinary argument on later calls. So design for stateless transport from the start — it's the direction the protocol is heading, and the cheaper path to scale.
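Here is a minimal sketch of the explicit-handle pattern, using plain functions in place of registered tools. Everything in it is hypothetical: `create_basket`, `add_item`, and the in-memory `BASKETS` dict, which stands in for a shared store such as a database. A dict only works inside one process, so a real multi-instance deployment would need an external store.

```python
import uuid

# Stand-in for a shared store; hypothetical, and only valid within a single process.
BASKETS = {}


def create_basket() -> str:
    """Mint an explicit handle that the model passes back on later calls."""
    basket_id = uuid.uuid4().hex
    BASKETS[basket_id] = []
    return basket_id


def add_item(basket_id: str, item: str) -> str:
    """Look the state up by handle; no protocol-level session is involved."""
    items = BASKETS.get(basket_id)
    if items is None:
        return f"Error: unknown basket_id '{basket_id}'. Call create_basket first."
    items.append(item)
    return f"Basket {basket_id} now holds {len(items)} item(s)."


handle = create_basket()
print(add_item(handle, "keyboard"))
print(add_item("missing", "mouse"))
```

Because the handle is an ordinary argument, it shows up in the tool's input schema, and any instance that can reach the store can serve the call.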
A minimal production MCP server — FastMCP
The official Python SDK ships FastMCP — a high-level framework that generates the input schema from signatures and docstrings, integrates Pydantic validation, and registers tools with a decorator. Below is not a "hello world" but a skeleton with everything that separates a toy from production code: a Pydantic model for validation, behavior annotations, async I/O, error handling, and full typing.
```python
from __future__ import annotations

import os

import httpx
from pydantic import BaseModel, Field, ConfigDict
from mcp.server.fastmcp import FastMCP

# Name the server per the {service}_mcp convention
mcp = FastMCP("weather_mcp")

API_BASE = "https://api.example-weather.com/v1"


class ForecastInput(BaseModel):
    """Input validation for a forecast query."""

    model_config = ConfigDict(
        str_strip_whitespace=True,
        extra="forbid",  # reject unknown fields
    )

    city: str = Field(..., description="City name, e.g. 'Wrocław'",
                      min_length=1, max_length=100)
    days: int = Field(default=3, description="Forecast horizon in days",
                      ge=1, le=14)


def _handle_error(e: Exception) -> str:
    """Consistent, actionable error messages for the model."""
    if isinstance(e, httpx.HTTPStatusError):
        code = e.response.status_code
        if code == 404:
            return "Error: city not found. Check the spelling of the name."
        if code == 429:
            return "Error: rate limit exceeded. Wait before retrying."
        return f"Error: API returned status {code}."
    if isinstance(e, httpx.TimeoutException):
        return "Error: request timed out. Please try again."
    return f"Error: unexpected exception: {type(e).__name__}"


@mcp.tool(
    name="get_forecast",
    annotations={
        "title": "Get weather forecast",
        "readOnlyHint": True,   # does not modify state
        "openWorldHint": True,  # reaches an external API
    },
)
async def get_forecast(params: ForecastInput) -> str:
    """Return a weather forecast for a city.

    Args:
        params: validated input (city, days).

    Returns:
        str: a formatted forecast or an actionable error message.
    """
    api_key = os.environ.get("WEATHER_API_KEY")
    if not api_key:
        return "Config error: WEATHER_API_KEY is missing from the environment."

    try:
        async with httpx.AsyncClient(timeout=10.0) as client:
            resp = await client.get(
                f"{API_BASE}/forecast",
                params={"q": params.city, "days": params.days},
                headers={"Authorization": f"Bearer {api_key}"},
            )
            resp.raise_for_status()
            data = resp.json()
    except Exception as e:
        return _handle_error(e)

    lines = [f"Forecast for {params.city} ({params.days} days):"]
    for day in data["forecast"]:
        lines.append(f"  {day['date']}: {day['temp_c']}°C, {day['condition']}")
    return "\n".join(lines)


if __name__ == "__main__":
    mcp.run()  # stdio transport (default)
```
Several things here are deliberate. The Pydantic model with extra="forbid" rejects unknown fields instead of silently ignoring them. The decorator annotations (readOnlyHint, openWorldHint) are signals to the host. All I/O is async. And the secret comes from an environment variable, not the code — which I'll come back to under security.
Error handling that helps the model
Look at the _handle_error function above. This isn't cosmetics. An error message in an MCP server is read by the model, not by a human staring at logs — and it decides whether the model recovers the call sensibly or gets stuck. "Error 404" says nothing; "city not found, check the spelling" tells the model what to do next. Treat every message as a recovery instruction, not a log line.
It's the same discipline as treating debugging as a process of deduction rather than guessing: a precise signal instead of noise shortens the path to the cause. The difference is that here the recipient of the signal is a model planning its next step.
Security: why tool descriptions are untrusted
The MCP spec says it plainly: tools represent arbitrary code execution and must be treated with appropriate caution. Moreover, descriptions of tool behavior, including annotations, are untrusted unless they come from a trusted server. This is not a formality. A malicious server can smuggle instructions into a tool description, or into a tool's result, that the model then treats as a command. That's prompt injection via tool output.
The consequences for you as a server author are concrete. Keep secrets in environment variables, never in code or descriptions (you can see it above — WEATHER_API_KEY from os.environ). For remote transport use OAuth 2.1 / OIDC — the 2026-07-28 revision aligns authorization more closely with OAuth and OpenID Connect, and the Enterprise-Managed Authorization extension is now stable. Validate every input with Pydantic, because the model can pass anything. And set annotations honestly:
| Annotation | Meaning | Example |
|---|---|---|
| readOnlyHint | Tool does not modify state | Fetch a forecast, read a post |
| destructiveHint | Irreversible operation | Delete a resource |
| idempotentHint | Repeating changes nothing | Set a value to X |
| openWorldHint | Reaches external systems | Query a weather API |
The host builds user-consent flows on these signals. A dishonest annotation (say, readOnlyHint on a tool that deletes data) isn't just bad code; it breaks the security contract the entire MCP trust model rests on.
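To show why honesty matters, here is a sketch of how a host might turn annotations into a consent decision. The policy and the `needs_confirmation` function are hypothetical, not how any particular host behaves. The fallback values follow the defaults the spec assigns when a hint is absent, as I understand them: a tool is assumed to be non-read-only, destructive, non-idempotent, and open-world.

```python
from typing import Optional

# Defaults applied when a server omits a hint (my reading of the spec's defaults).
DEFAULTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}


def needs_confirmation(annotations: Optional[dict], server_trusted: bool) -> bool:
    """Hypothetical host policy: should the user confirm before this tool runs?"""
    if not server_trusted:
        return True  # hints from an untrusted server carry no weight
    hints = {**DEFAULTS, **(annotations or {})}
    if hints["readOnlyHint"]:
        return False  # declared read-only by a trusted server
    return hints["destructiveHint"]


print(needs_confirmation({"readOnlyHint": True}, server_trusted=True))   # False
print(needs_confirmation({"readOnlyHint": True}, server_trusted=False))  # True
print(needs_confirmation(None, server_trusted=True))                     # True
```

Under a policy like this, a deleting tool that claims readOnlyHint on a trusted server would run without a prompt, which is exactly the failure the trust model depends on you not causing.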
State, concurrency, and scaling
A production server handles many clients at once, and every tool does I/O — a call to an API, a database, a disk. That's why all the code is async (async def, httpx.AsyncClient): one process serves many concurrent calls without blocking, because while waiting on a network response the event loop switches to another task.
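A small standard-library sketch makes the effect visible. `fake_tool_call` is a made-up stand-in for a tool that waits on the network, with `asyncio.sleep` playing the part of the round trip; three such calls run concurrently in one process.

```python
import asyncio
import time


async def fake_tool_call(name: str, delay: float) -> str:
    """Stand-in for a tool doing network I/O; the sleep yields to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    start = time.perf_counter()
    # Three calls in flight at once; gather returns results in argument order.
    results = await asyncio.gather(
        fake_tool_call("a", 0.2),
        fake_tool_call("b", 0.2),
        fake_tool_call("c", 0.2),
    )
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")
```

The total wall time should land near the longest single delay rather than the 0.6 s sum, because the waits overlap. A blocking client inside a tool would serialize them again.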
This is the same I/O-scaling problem that epoll and io_uring solve underneath the event loop: a "one thread per connection" model doesn't scale indefinitely. An MCP server over Streamable HTTP sits on that same foundation, so async isn't an ornament; it's the condition for serving many clients on one instance. And thanks to the stateless core in the 2026-07-28 revision, horizontal scaling comes down to standing up more instances behind a load balancer, with no sticky sessions.
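```python
# Local: stdio (default)
mcp.run()

# Remote: Streamable HTTP, scales horizontally.
# In the official SDK's FastMCP (1.x), the transport name is hyphenated and the
# port is a server setting rather than a run() argument.
mcp.settings.port = 8000
mcp.run(transport="streamable-http")
```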
Conclusion
Building an MCP server that isn't a toy comes down to a few deliberate decisions: transport choice (stdio locally, Streamable HTTP in production), the right primitive (tool vs resource vs prompt), Pydantic validation, actionable errors, secrets in the environment, and honest annotations. FastMCP takes the boilerplate off your hands, but architecture and security stay on yours.
One more thing, and it's fresh: design for statelessness. The 2026-07-28 revision makes transport sessionless by default, and that's the cheapest path to scale the protocol has ever offered. An MCP server written today around explicit state handles instead of sessions will survive that change without a rewrite. This is the first post in a series on MCP — the next ones go deeper into security and advanced patterns.
Originally published on devmindset.dev — Linux internals, systems programming, and the self-taught developer mindset.