The Model Context Protocol (MCP) went from "Anthropic side project" to industry standard in eighteen months. As of March 2026, MCP SDKs are pulling 97 million monthly downloads. Every serious agent framework — Claude, Cursor, OpenAI Agents SDK, Microsoft Agent Framework — speaks MCP natively.
If you're a Python backend engineer, MCP is the most leveraged thing you can learn right now. This post is a practical walkthrough of shipping a production-grade MCP server using FastMCP, the Python framework that makes it boring.
## What MCP actually is
MCP is a protocol for exposing tools, resources, and prompts to an AI agent in a standardized way. Instead of each agent framework inventing its own adapter format, you write your server once and it plugs into any MCP-compatible client.
Think of it as "USB-C for agents."
A minimal server exposes:

- **Tools** — functions the agent can call (e.g. `search_customers`, `get_order_status`)
- **Resources** — URIs the agent can read (e.g. `crm://contacts/123`)
- **Prompts** — parameterized prompt templates
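On the wire, all of this rides on JSON-RPC 2.0. Schematically, a tool invocation is a request like the following (the method name follows the MCP spec; the tool name and arguments match the example server below):

```python
import json

# Schematic JSON-RPC 2.0 request an MCP client sends to invoke a tool.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "search_customers",
        "arguments": {"query": "acme", "tier": "enterprise"},
    },
}

# Serialize for the transport, then decode as a server would.
wire = json.dumps(request)
decoded = json.loads(wire)
print(decoded["params"]["name"])  # search_customers
```

The server's response carries the tool's return value back under the same `id`; the client framework handles all of this for you.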
## Starter: a FastMCP server in 40 lines
```python
# server.py
from fastmcp import FastMCP
from pydantic import BaseModel
import httpx

mcp = FastMCP("internal-crm")


class Customer(BaseModel):
    id: str
    name: str
    tier: str
    mrr: float


@mcp.tool()
async def search_customers(query: str, tier: str | None = None) -> list[Customer]:
    """Search the CRM for customers by name or email. Optionally filter by tier."""
    async with httpx.AsyncClient() as client:
        r = await client.get(
            "https://crm.internal/api/search",
            params={"q": query, "tier": tier},
        )
        return [Customer(**row) for row in r.json()]


@mcp.tool()
async def get_customer_notes(customer_id: str) -> str:
    """Fetch the latest account-manager notes for a customer."""
    async with httpx.AsyncClient() as client:
        r = await client.get(f"https://crm.internal/api/notes/{customer_id}")
        return r.text


@mcp.resource("crm://customer/{customer_id}")
async def customer_resource(customer_id: str) -> str:
    """Read-only customer profile."""
    async with httpx.AsyncClient() as client:
        r = await client.get(f"https://crm.internal/api/customer/{customer_id}")
        return r.text


if __name__ == "__main__":
    mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)
```
That's a complete, production-adjacent MCP server. Type-safe inputs and outputs via Pydantic. Docstrings become tool descriptions the agent reads. Resources get URIs the agent can embed in its context.
## The transport shift: stdio → Streamable HTTP
Every MCP tutorial from 2024 used stdio transport — the server runs as a subprocess, the agent pipes JSON-RPC over stdin/stdout. That's fine for desktop tools like Claude Desktop. It's the wrong answer for production.
Streamable HTTP (finalized in the 2025 spec) fixes this:
- Servers run as long-lived HTTP services, not per-invocation subprocesses
- Scale horizontally behind a load balancer
- Share across teams and apps
- Deploy once, discover via URL
In FastMCP, the switch is one line: `transport="streamable-http"`.
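Under the hood, a Streamable HTTP client POSTs JSON-RPC messages to a single endpoint, starting with an `initialize` handshake. A sketch of that first message (the version string and client info here are illustrative; check the spec revision you target):

```python
import json

# Schematic first request a client POSTs to an MCP server's HTTP endpoint.
# "2025-03-26" is the protocol revision that introduced Streamable HTTP.
initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
}
body = json.dumps(initialize)
```

You'll never write this by hand — the client library does the handshake — but it's useful to know what's crossing your load balancer.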
## Auth: OAuth 2.1 the boring way
MCP's 2025 spec added OAuth 2.1 as the standard auth mechanism. You don't roll your own. FastMCP ships with OAuth middleware that plugs into your existing IdP (Auth0, Okta, Cognito, Clerk, etc.):
```python
from fastmcp.auth import OAuth2Middleware

mcp.add_middleware(OAuth2Middleware(
    issuer="https://tufail.auth0.com/",
    audience="mcp-internal-crm",
    required_scope="crm:read",
))
```
The agent handles the authorization dance. Your server just enforces scopes on each tool.
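"Enforcing scopes" boils down to: verify the bearer token against your IdP's keys, then confirm the required scope is in its `scope` claim. Here's a toy sketch of just the scope logic — it deliberately skips signature verification, which a real server must never do:

```python
import base64
import json


def _b64url_decode(seg: str) -> bytes:
    # JWT segments are base64url without padding; restore padding first.
    return base64.urlsafe_b64decode(seg + "=" * (-len(seg) % 4))


def has_scope(jwt: str, required: str) -> bool:
    # Decode ONLY the payload. A production server verifies the signature
    # (e.g. against the IdP's JWKS) before trusting any claim.
    payload = json.loads(_b64url_decode(jwt.split(".")[1]))
    return required in payload.get("scope", "").split()


# Fake, unsigned token for illustration only.
claims = {"aud": "mcp-internal-crm", "scope": "crm:read crm:write"}
payload_seg = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode().rstrip("=")
fake = "e30." + payload_seg + ".sig"

print(has_scope(fake, "crm:read"))   # True
print(has_scope(fake, "admin:all"))  # False
```

The middleware does exactly this (plus the verification you should never skip) before any tool handler runs.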
## Deploying to AWS without overspending
Two patterns we've landed on for production MCP:
**Pattern A — Low-traffic internal tools: Lambda + API Gateway**

- Use `mangum` or FastMCP's ASGI adapter to run inside Lambda
- Cold starts ~300-500 ms (acceptable for human-speed agent interactions)
- Cost: near-zero when idle
**Pattern B — High-traffic shared servers: ECS Fargate behind ALB**

- One service per logical server
- Auto-scale on CPU/memory
- Pair with ElastiCache for stateful session continuity
- Cost: predictable, ~$30/mo for a small always-on service
The mistake we made early on: treating every MCP server like it needed an always-on Fargate task. For servers that handle <10 agent calls/hour, Lambda is dramatically cheaper.
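A back-of-envelope break-even check makes the choice obvious. The numbers below are placeholder assumptions, not quoted AWS rates — plug in your region's actual pricing:

```python
def monthly_cost_lambda(calls_per_hour: float,
                        cost_per_call: float = 0.00002) -> float:
    # Pure pay-per-invocation: scales linearly with traffic, zero when idle.
    # cost_per_call is a placeholder; use your real per-request + GB-second rate.
    return calls_per_hour * 24 * 30 * cost_per_call


def monthly_cost_fargate(base: float = 30.0) -> float:
    # Always-on task: roughly flat cost regardless of traffic.
    return base


for rate in (10, 1_000, 100_000):
    cheaper = "lambda" if monthly_cost_lambda(rate) < monthly_cost_fargate() else "fargate"
    print(f"{rate} calls/hr -> {cheaper}")
```

At ten calls an hour, the Lambda bill rounds to pennies; the crossover only arrives at traffic levels few internal tools ever see.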
## What to expose — and what not to
The #1 mistake I see is devs exposing their entire internal API as MCP tools. Don't.
Good MCP servers are curated for an agent's use case. Ask: what would a smart human operator need to do their job? Expose those 5-15 tools. Not your 300-endpoint API.
Good tool design:

- **One clear job per tool.** `search_customers`, not `crm_unified_query`.
- **Typed inputs and outputs.** Pydantic makes this cheap.
- **Honest docstrings.** The agent reads them. Lie in the docstring and the agent will confidently call your tool wrong.
- **Idempotent where possible.** Agents retry. Accept that.
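On that last point: one common pattern is accepting a caller-supplied idempotency key and caching results by it, so an agent's retry returns the original result instead of repeating a side effect. A minimal sketch (in-memory cache; a real server would back this with Redis or DynamoDB and a TTL):

```python
# Results keyed by the caller-supplied idempotency key.
_results: dict[str, str] = {}


def create_ticket(summary: str, idempotency_key: str) -> str:
    """Create a support ticket; retries with the same key are no-ops."""
    if idempotency_key in _results:
        # Retry: return the original ticket instead of creating a duplicate.
        return _results[idempotency_key]
    ticket_id = f"TICKET-{len(_results) + 1}"  # stand-in for the real write
    _results[idempotency_key] = ticket_id
    return ticket_id


first = create_ticket("Refund order 42", idempotency_key="abc")
retry = create_ticket("Refund order 42", idempotency_key="abc")
print(first == retry)  # True
```

For read-only tools you get idempotency for free; it's the write tools where this pattern earns its keep.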
## What's next
Remote MCP servers + fine-grained OAuth scopes are unlocking internal-AI-assistant work that was impossible a year ago. If you're a Python backend engineer and you haven't shipped an MCP server yet, pick your highest-leverage internal system and wrap it. You'll be surprised how quickly it changes how your team works.