Tufail Khan

Posted on • Originally published at tufail.dev

Building MCP Servers in Python: a production primer for 2026

The Model Context Protocol (MCP) went from "Anthropic side project" to industry standard in eighteen months. As of March 2026, MCP SDKs are pulling 97 million monthly downloads. Every serious agent framework — Claude, Cursor, OpenAI Agents SDK, Microsoft Agent Framework — speaks MCP natively.

If you're a Python backend engineer, MCP is the highest-leverage thing you can learn right now. This post is a practical walkthrough of shipping a production-grade MCP server using FastMCP, the Python framework that makes it boring.

What MCP actually is

MCP is a protocol for exposing tools, resources, and prompts to an AI agent in a standardized way. Instead of each agent framework inventing its own adapter format, you write your server once and it plugs into any MCP-compatible client.

Think of it as "USB-C for agents."

A minimal server exposes:

  • Tools — functions the agent can call (e.g. search_customers, get_order_status)
  • Resources — URIs the agent can read (e.g. crm://contacts/123)
  • Prompts — parameterized prompt templates

Starter: a FastMCP server in 40 lines

# server.py
from fastmcp import FastMCP
from pydantic import BaseModel
import httpx

mcp = FastMCP("internal-crm")

class Customer(BaseModel):
    id: str
    name: str
    tier: str
    mrr: float

@mcp.tool()
async def search_customers(query: str, tier: str | None = None) -> list[Customer]:
    """Search the CRM for customers by name or email. Optionally filter by tier."""
    async with httpx.AsyncClient() as client:
        r = await client.get(
            "https://crm.internal/api/search",
            params={"q": query, "tier": tier},
        )
        return [Customer(**row) for row in r.json()]

@mcp.tool()
async def get_customer_notes(customer_id: str) -> str:
    """Fetch the latest account-manager notes for a customer."""
    async with httpx.AsyncClient() as client:
        r = await client.get(f"https://crm.internal/api/notes/{customer_id}")
        return r.text

@mcp.resource("crm://customer/{customer_id}")
async def customer_resource(customer_id: str) -> str:
    """Read-only customer profile."""
    async with httpx.AsyncClient() as client:
        r = await client.get(f"https://crm.internal/api/customer/{customer_id}")
        return r.text

if __name__ == "__main__":
    mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)

That's a complete, production-adjacent MCP server. Type-safe inputs and outputs via Pydantic. Docstrings become tool descriptions the agent reads. Resources get URIs the agent can embed in its context.
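To make "docstrings become tool descriptions" concrete, here is roughly what an agent sees when it lists the server's tools. This is an illustrative sketch of the MCP `tools/list` entry for `search_customers`, hand-written to match the spec's field names; FastMCP's exact generated schema may differ in details (e.g. how it encodes `str | None`).

```python
# Approximately what an MCP client sees for search_customers after
# tools/list. FastMCP derives the JSON Schema from the type hints;
# the docstring becomes the description. (Illustrative sketch, not
# FastMCP's literal output.)
tool_listing = {
    "name": "search_customers",
    "description": "Search the CRM for customers by name or email. "
                   "Optionally filter by tier.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "tier": {"type": ["string", "null"]},  # tier: str | None = None
        },
        "required": ["query"],  # tier has a default, so it's optional
    },
}
```

This is why honest type hints and docstrings matter: they are the only documentation the agent ever reads.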

The transport shift: stdio → Streamable HTTP

Every MCP tutorial from 2024 used stdio transport — the server runs as a subprocess, the agent pipes JSON-RPC over stdin/stdout. That's fine for desktop tools like Claude Desktop. It's the wrong answer for production.

Streamable HTTP (finalized in the 2025 spec) fixes this:

  • Servers run as long-lived HTTP services, not per-invocation subprocesses
  • Scale horizontally behind a load balancer
  • Share across teams and apps
  • Deploy once, discover via URL

In FastMCP, the switch is one line: transport="streamable-http".
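Under the hood, Streamable HTTP is still JSON-RPC: the client POSTs each message to a single MCP endpoint. A sketch of the request body a client would send to invoke `search_customers` (the message shape follows the MCP spec's `tools/call`; the `Accept` header lets the server reply with plain JSON or an SSE stream):

```python
import json

# JSON-RPC 2.0 request for invoking a tool over Streamable HTTP.
call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_customers",
        "arguments": {"query": "acme", "tier": "enterprise"},
    },
}

# The client POSTs this to the server's MCP endpoint. Advertising
# both content types lets the server choose a single JSON response
# or a streamed one.
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
}

body = json.dumps(call)
```

Because it's plain HTTP, everything you already know about load balancers, retries, and observability applies unchanged.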

Auth: OAuth 2.1 the boring way

MCP's 2025 spec added OAuth 2.1 as the standard auth mechanism. You don't roll your own. FastMCP ships with OAuth middleware that plugs into your existing IdP (Auth0, Okta, Cognito, Clerk, etc.):

from fastmcp.auth import OAuth2Middleware

mcp.add_middleware(OAuth2Middleware(
    issuer="https://tufail.auth0.com/",
    audience="mcp-internal-crm",
    required_scope="crm:read",
))

The agent handles the authorization dance. Your server just enforces scopes on each tool.
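"Enforces scopes on each tool" can be as thin as a decorator. Here's a framework-agnostic sketch — `require_scope` and the `_scopes` keyword are my own convention for the example, not FastMCP API; in a real server you'd pull the validated token's scopes from the request context instead of passing them in:

```python
import functools

class ScopeError(PermissionError):
    """Raised when the caller's token is missing a required scope."""

def require_scope(scope: str):
    """Reject calls whose token scopes don't include `scope`.

    Sketch only: the caller supplies the token's scopes via a
    `_scopes` keyword. In FastMCP you'd read them from the
    middleware-validated request context instead.
    """
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, _scopes=frozenset(), **kwargs):
            if scope not in _scopes:
                raise ScopeError(f"missing required scope: {scope}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@require_scope("crm:read")
def get_customer_notes(customer_id: str) -> str:
    return f"notes for {customer_id}"
```

With the scope present, `get_customer_notes("c1", _scopes={"crm:read"})` runs normally; without it, the call fails before your tool body ever executes — which is exactly where you want authorization failures to happen.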

Deploying to AWS without overspending

Two patterns we've landed on for production MCP:

Pattern A — Low-traffic internal tools: Lambda + API Gateway

  • Use mangum or FastMCP's ASGI adapter to run inside Lambda
  • Cold starts ~300-500ms (acceptable for human-speed agent interactions)
  • Cost: near-zero when idle

Pattern B — High-traffic shared servers: ECS Fargate behind ALB

  • One service per logical server
  • Auto-scale on CPU/memory
  • Pair with ElastiCache for stateful session continuity
  • Cost: predictable, ~$30/mo for a small always-on service

The mistake we made early on: treating every MCP server like it needed an always-on Fargate task. For servers that handle <10 agent calls/hour, Lambda is dramatically cheaper.
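The back-of-envelope math makes the gap obvious. The rates below are assumptions (typical us-east-1-style list prices from memory, free tier excluded) — check current AWS pricing before relying on them:

```python
# Rough Lambda cost for a low-traffic MCP server. Rates are
# ASSUMED list prices; verify against current AWS pricing.
PER_REQUEST = 0.20 / 1_000_000      # $ per invocation (assumed)
PER_GB_SECOND = 0.0000166667        # $ per GB-second (assumed)

calls_per_month = 10 * 24 * 30      # the "<10 agent calls/hour" case
gb = 0.5                            # 512 MB function
seconds_per_call = 1.0              # generous per-call duration

lambda_cost = calls_per_month * (
    PER_REQUEST + gb * seconds_per_call * PER_GB_SECOND
)
fargate_cost = 30.0                 # small always-on Fargate service

# Even with generous duration assumptions, Lambda lands at pennies
# per month versus a fixed ~$30/mo task.
```

At these assumed rates the Lambda bill is a few cents a month — two to three orders of magnitude under the always-on option.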

What to expose — and what not to

The #1 mistake I see is devs exposing their entire internal API as MCP tools. Don't.

Good MCP servers are curated for an agent's use case. Ask: what would a smart human operator need to do their job? Expose those 5-15 tools. Not your 300-endpoint API.

Good tool design:

  • One clear job per tool. search_customers not crm_unified_query.
  • Typed inputs and outputs. Pydantic makes this cheap.
  • Honest docstrings. The agent reads them. Lie in the docstring and the agent will confidently call your tool wrong.
  • Idempotent where possible. Agents retry. Accept that.

What's next

Remote MCP servers + fine-grained OAuth scopes are unlocking internal-AI-assistant work that was impossible a year ago. If you're a Python backend engineer and you haven't shipped an MCP server yet, pick your highest-leverage internal system and wrap it. You'll be surprised how quickly it changes how your team works.
