DEV Community

Nebula

Building Production-Grade AI Agents with MCP: A Complete Guide for 2026

The Model Context Protocol crossed 97 million monthly SDK downloads in March 2026. It went from Anthropic's internal experiment to the Linux Foundation's Agentic AI Foundation in roughly 14 months — faster than any developer protocol I've seen. OpenAI deprecated its proprietary Assistants API in favor of MCP. Google's ADK, LangGraph, CrewAI, and Microsoft's Agent Framework all support it. The ecosystem now hosts over 13,000 public servers.

If you're building AI agents that talk to external systems, MCP isn't optional anymore. It's table stakes.

The problem is that most tutorials still show toy examples — a single add(a, b) tool on stdio transport. That's not what production looks like. Production means handling OAuth 2.1 flows, managing Streamable HTTP sessions, designing tools that agents actually use correctly, and deploying servers that survive real traffic.

Here's the guide I wish existed when I started building MCP infrastructure.

Why MCP Beats Custom Integrations

Before MCP, connecting an AI agent to three external systems meant three custom integrations. Each with its own auth flow, error handling, retry logic, and data parser. When one API changed its response format, your agent broke silently.

MCP standardizes this at the protocol level. Every MCP server speaks JSON-RPC 2.0. Every tool declares its input and output schema. The agent discovers capabilities at runtime — no hardcoded endpoint lists, no stale documentation. Add a new tool to your server, and every connected agent can use it immediately.
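That runtime discovery is plain JSON-RPC. A sketch of what a tools/list exchange looks like on the wire — the tool name and schema here are illustrative, matching the weather example used later in this post:

```python
import json

# The client asks the server what it can do: no hardcoded endpoint list.
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# A typical response: each tool declares its input schema up front,
# so the agent can validate a call before invoking it.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "get_forecast",
                "description": "Get weather forecast for a city.",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "days": {"type": "integer"},
                    },
                    "required": ["city"],
                },
            }
        ]
    },
}

wire = json.dumps(list_request)
print(wire)
```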

The three primitives cover every integration pattern:

  • Tools — executable functions the agent invokes (API calls, database queries, file writes)
  • Resources — read-only context the agent consumes (configs, logs, documentation)
  • Prompts — reusable workflow templates that structure interactions

The distinction between tools and resources matters more than most guides acknowledge. Tools change state — they're your agent's verbs. Resources provide context — they're your agent's nouns. When an agent conflates the two, you get agents that try to "read" a database by calling write functions, or worse, agents that mutate state thinking they're just gathering information.

Choosing Your Transport: The stdio vs Streamable HTTP Decision

MCP defines two transport mechanisms. Picking the wrong one causes architecture headaches later.

stdio runs locally. Your client spawns the MCP server as a child process and talks through stdin/stdout. Zero infrastructure, zero network config, perfect isolation. Use stdio for CLI tools, desktop applications, and local dev workflows.

Streamable HTTP is for everything else. The client sends JSON-RPC messages via HTTP POST, and the server responds with either an SSE stream or a JSON object. It works through firewalls, load balancers, and CDNs. Multi-client deployments — your agent server serving hundreds of concurrent agent sessions — require Streamable HTTP.

The old SSE transport was deprecated in the June 2025 spec revision. If you're starting a new project, use Streamable HTTP. Full stop.
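A sketch of the request shape, as I read the current spec: the client advertises that it accepts both response forms, and after initialization it echoes back the session ID the server assigned. The header values below are illustrative:

```python
import json

# A Streamable HTTP client must accept both a plain JSON response and an
# SSE stream, because the server may answer with either.
headers = {
    "Content-Type": "application/json",
    "Accept": "application/json, text/event-stream",
    # Session ID assigned by the server during initialization (illustrative)
    "Mcp-Session-Id": "example-session-id",
}

# The body is an ordinary JSON-RPC request, exactly as it would be over stdio.
body = json.dumps({
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "get_forecast", "arguments": {"city": "London"}},
})
```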

Building a Production MCP Server in Python

The Python SDK's FastMCP pattern is the fastest path from zero to working server. But toy examples skip the patterns that matter in production.

Here's a server template that handles real-world requirements:

import os
from mcp.server.fastmcp import FastMCP

# Validate at startup — fail fast
WEATHER_API_KEY = os.environ.get("WEATHER_API_KEY")
if not WEATHER_API_KEY:
    raise RuntimeError("Missing WEATHER_API_KEY")

mcp = FastMCP("weather-service", json_response=True)

@mcp.tool()
def get_forecast(
    city: str,
    days: int = 3,
    units: str = "metric"
) -> dict:
    """Get weather forecast for a city.

    Args:
        city: City name (e.g., 'London', 'Tokyo')
        days: Number of forecast days (1-7)
        units: 'metric' for Celsius, 'imperial' for Fahrenheit

    Returns:
        Forecast dict with daily high/low and conditions
    """
    if days < 1 or days > 7:
        raise ValueError(f"Days must be 1-7, got {days}")
    if units not in ("metric", "imperial"):
        raise ValueError(f"Units must be 'metric' or 'imperial', got {units}")

    # Production: use your actual API client with retries
    import httpx
    response = httpx.get(
        "https://api.weatherapi.com/v1/forecast.json",
        params={
            "key": WEATHER_API_KEY,
            "q": city,
            "days": days,
        },
        timeout=10.0,
    )
    response.raise_for_status()
    data = response.json()

    return {
        "city": data["location"]["name"],
        "forecast_days": data["forecast"]["forecastday"][:days],
        "units": units,
    }

@mcp.resource("weather://current/{city}")
def current_conditions(city: str) -> str:
    """Get current weather conditions for a city."""
    import httpx, json
    response = httpx.get(
        "https://api.weatherapi.com/v1/current.json",
        params={"key": WEATHER_API_KEY, "q": city},
        timeout=10.0,
    )
    response.raise_for_status()
    return json.dumps(response.json())

if __name__ == "__main__":
    mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)

Run it with uv run weather_server.py and any MCP client can connect. The server exposes get_forecast as a tool and weather://current/{city} as a dynamic resource.

What Makes This Different from Tutorial Examples

Three patterns that separate production servers from demos:

1. Validation at startup. If WEATHER_API_KEY isn't set, the server crashes before accepting any connection. Silent missing-config failures during deployment are harder to debug than explicit startup errors.

2. Input validation in tools. The days and units parameters are validated before any external call. Bad input gets a structured ValueError — the MCP framework translates it to a proper JSON-RPC error. Your agent gets a clear error message instead of a 500 from the upstream API.

3. Timeouts on every external call. The timeout=10.0 on httpx.get prevents a slow upstream API from hanging your MCP server. In production, you'd add retry logic with exponential backoff, but the timeout is the minimum safety net.
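That retry logic can be sketched as a small wrapper, assuming transient failures surface as exceptions. This is a minimal version; in production you'd more likely reach for a library such as tenacity:

```python
import time
import random

def with_retries(fn, max_attempts=4, base_delay=0.5):
    """Call fn, retrying on exception with exponential backoff plus jitter."""
    def wrapper(*args, **kwargs):
        for attempt in range(max_attempts):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the error to the caller
                # base, 2x base, 4x base, ... plus jitter to avoid thundering herds
                time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
    return wrapper

# Usage with a hypothetical flaky upstream call that fails twice, then succeeds
calls = {"n": 0}

def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient upstream failure")
    return {"status": "ok"}

fetch = with_retries(flaky_fetch, base_delay=0.01)
result = fetch()  # succeeds on the third attempt
print(result)
```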

Advanced Tool Design: What Good MCP Servers Get Right

The most common failure mode in MCP deployments isn't the server code — it's tool descriptions. Agents rely entirely on tool names and descriptions to decide which tool to call. Vague descriptions produce wrong tool selections, and wrong tool selections produce wasted tokens and confused users.

The Description Formula

Every tool description should answer three questions:

  1. When should the agent call this tool?
  2. What data does it need as input?
  3. What structure will it get back?

# BAD: No trigger condition, no output description
@mcp.tool()
def search_db(query: str) -> list:
    """Search the database"""
    ...

# GOOD: Clear trigger, input guidance, output contract
@mcp.tool()
def search_db(query: str, limit: int = 20) -> list:
    """Search user records in the PostgreSQL database.
    Call this when the user asks about specific users, accounts, or transactions.
    Returns a list of matching records with id, name, email, and created_at fields.
    Maximum 100 results per query.
    """
    ...

Multi-Tool Orchestration

Production agents rarely call just one tool. The sequence matters, and your server should make correct sequences obvious through tool design.

Consider a deployment pipeline where the agent needs to: check the current state, build, then deploy. If these are three separate tools with no hints, the agent might deploy without checking state first.

The solution is a composite tool that encapsulates the workflow:

@mcp.tool()
def deploy_application(
    project: str,
    branch: str = "main",
    environment: str = "production",
    dry_run: bool = False
) -> dict:
    """Deploy an application to the target environment.

    This tool handles the full deployment pipeline:
    1. Validates the target branch exists and is up-to-date
    2. Runs the build process and checks for errors
    3. Deploys to the specified environment
    4. Runs post-deployment health checks

    Set dry_run=True to see what would happen without making changes.
    Use search_build_status first if you need to check the current state.
    """
    # Step 1: Validate
    if environment not in ("staging", "production"):
        raise ValueError(f"Environment must be 'staging' or 'production', got {environment}")

    # Step 2: Build
    build_result = run_build(project, branch)
    if build_result["status"] != "success":
        return {"status": "failed", "step": "build", "error": build_result["error"]}

    if dry_run:
        return {"status": "dry_run", "would_deploy": True}

    # Step 3: Deploy
    deploy_result = execute_deploy(project, environment)
    if deploy_result["status"] != "success":
        return {"status": "failed", "step": "deploy", "error": deploy_result["error"]}

    # Step 4: Health check
    health = run_health_checks(deploy_result["url"])
    return {
        "status": "deployed" if health["healthy"] else "deployed_unhealthy",
        "url": deploy_result["url"],
        "health_checks": health,
    }

The agent calls one tool. The server orchestrates four steps. The agent gets a single structured result. This pattern reduces token usage, prevents partial-state errors, and makes the agent's behavior predictable.

Authentication: OAuth 2.1 for MCP Servers

Every MCP server that accesses protected resources needs authentication. The MCP spec supports OAuth 2.1 for remote servers, and this is the pattern production deployments should use.

The flow works like this:

  1. Client connects to your MCP server via Streamable HTTP
  2. Server responds with 401 Unauthorized and a WWW-Authenticate header pointing to the authorization endpoint
  3. Client redirects the user through the OAuth consent flow
  4. User grants permission, returns with an authorization code
  5. Client exchanges the code for an access token
  6. All subsequent requests include Authorization: Bearer <token>

Implementing this correctly means:

import os
from datetime import datetime, UTC

import jwt
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

SECRET_KEY = os.environ["MCP_OAUTH_SECRET"]

@app.post("/mcp")
async def mcp_endpoint(request: Request):
    auth_header = request.headers.get("Authorization")
    if not auth_header or not auth_header.startswith("Bearer "):
        return JSONResponse(
            status_code=401,
            headers={"WWW-Authenticate": "Bearer realm=\"MCP\""},
            content={"error": "unauthorized"},
        )

    token = auth_header.split(" ", 1)[1]
    try:
        # PyJWT verifies the exp claim during decode and raises on expiry
        payload = jwt.decode(token, SECRET_KEY, algorithms=["HS256"])
    except jwt.ExpiredSignatureError:
        return JSONResponse(status_code=401, content={"error": "token_expired"})
    except jwt.InvalidTokenError:
        return JSONResponse(status_code=401, content={"error": "invalid_token"})

    # Process the MCP JSON-RPC request with authenticated session
    return await process_mcp_request(request, payload)

The key insight: token validation happens at the transport layer, not inside individual tools. Every tool call arrives already authenticated. Your tool code can assume the identity of the caller is known.
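What that jwt.decode call verifies can be illustrated with stdlib primitives. A minimal HS256 sketch of signing and checking a token (PyJWT additionally validates exp and other registered claims, which this sketch omits):

```python
import base64
import hashlib
import hmac
import json

def b64url(data: bytes) -> str:
    """Base64url encoding without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(payload: dict, secret: str) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = b64url(json.dumps(payload).encode())
    signing_input = f"{header}.{body}".encode()
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{body}.{b64url(sig)}"

def verify_hs256(token: str, secret: str) -> bool:
    """Recompute the signature and compare in constant time."""
    header, body, sig = token.split(".")
    signing_input = f"{header}.{body}".encode()
    expected = b64url(hmac.new(secret.encode(), signing_input, hashlib.sha256).digest())
    # constant-time comparison prevents timing attacks on the signature
    return hmac.compare_digest(sig, expected)

token = sign_hs256({"sub": "agent-42", "exp": 9999999999}, "dev-secret")
print(verify_hs256(token, "dev-secret"))    # True
print(verify_hs256(token, "wrong-secret"))  # False
```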

Deployment Patterns

Single-Server Architecture

For small teams and single-tenant deployments, a single MCP server process works fine:

# Deploy behind a reverse proxy
uv run weather_server.py  # listens on port 8000, as set in mcp.run()
# nginx handles TLS, rate limiting, and access logging
# systemd keeps the process alive

Multi-Server Orchestration

Production systems with multiple MCP servers need orchestration. This is where platforms like Nebula come in — you can spin up a unified agent workspace that connects to multiple MCP servers, manages their lifecycle, and provides a single interface for the agent to discover and call tools across all of them.

The architecture looks like:

Agent Host (Nebula)
├── MCP Client → GitHub Server (PR management)
├── MCP Client → Database Server (read-only queries)
├── MCP Client → Search Server (web research)
└── MCP Client → Custom Server (your domain-specific tools)

Each MCP client connection is isolated. If the database server goes down, the GitHub server continues working. The agent degrades gracefully instead of crashing entirely.
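That isolation can be sketched with a small dispatcher over hypothetical server stubs, assuming each client connection surfaces failures as exceptions:

```python
def call_tool(servers: dict, server_name: str, tool: str, **kwargs):
    """Route a tool call to one server; a failure there doesn't affect the rest."""
    try:
        return {"status": "ok", "result": servers[server_name](tool, **kwargs)}
    except Exception as exc:
        # The agent sees a structured error and can fall back to other servers.
        return {"status": "unavailable", "server": server_name, "error": str(exc)}

# Hypothetical stubs standing in for real MCP client connections
def github_server(tool, **kwargs):
    return f"{tool} executed"

def database_server(tool, **kwargs):
    raise ConnectionError("database server is down")

servers = {"github": github_server, "database": database_server}

print(call_tool(servers, "github", "list_prs"))     # ok
print(call_tool(servers, "database", "run_query"))  # unavailable, GitHub unaffected
```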

Scaling Streamable HTTP Servers

Streamable HTTP servers can handle thousands of concurrent connections, but you need proper session management:

from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SessionState:
    connected_at: datetime
    last_activity: datetime

# Track active sessions
active_sessions: dict[str, SessionState] = {}

@mcp.tool()
def list_active_sessions() -> list:
    """List all currently active agent sessions.
    Call this when you need to audit which agents are connected.
    Returns session IDs, connected duration, and last activity timestamp.
    """
    return [
        {
            "session_id": sid,
            "duration": (datetime.now(tz=timezone.utc) - s.connected_at).total_seconds(),
            "last_activity": s.last_activity.isoformat(),
        }
        for sid, s in active_sessions.items()
    ]

For high-traffic deployments, put your Streamable HTTP servers behind a load balancer with sticky sessions. MCP sessions maintain state — a given agent should always connect to the same server instance unless you're storing session state in Redis.
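One way to keep that choice swappable is to hide session state behind a small store interface. A sketch with an in-memory implementation; a Redis-backed version of the same interface would lift the sticky-session requirement:

```python
from datetime import datetime, timezone

class InMemorySessionStore:
    """Per-instance store: requires sticky sessions at the load balancer."""

    def __init__(self):
        self._sessions: dict[str, dict] = {}

    def touch(self, session_id: str) -> None:
        """Create the session on first sight, then bump its last activity."""
        now = datetime.now(tz=timezone.utc)
        session = self._sessions.setdefault(
            session_id, {"connected_at": now, "last_activity": now}
        )
        session["last_activity"] = now

    def get(self, session_id: str):
        return self._sessions.get(session_id)

# A Redis-backed store implementing the same touch/get interface would let
# any server instance resume any session, so stickiness becomes optional.
store = InMemorySessionStore()
store.touch("session-abc")
print(store.get("session-abc") is not None)  # True
```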

Common Pitfalls and How to Avoid Them

Too Many Tools

I've seen MCP servers with 47 registered tools. Agents get confused. They call the wrong tool, or they can't find the right one among 47 options.

Rule of thumb: a single MCP server should expose 3-10 tools. If you need more, split into multiple servers. One server for GitHub operations. Another for database queries. A third for monitoring and alerting.

Missing Error Contracts

Every tool should return structured errors, not raw exceptions:


@mcp.tool()
def query_database(sql: str) -> dict:
    """Execute a read-only SQL query against the analytics database.
    Only SELECT statements are allowed. INSERT, UPDATE, DELETE will be rejected.
    Returns a dict with 'columns' and 'rows' keys.
    """
    if not sql.strip().upper().startswith("SELECT"):
        return {
            "status": "error",
            "message": "Only SELECT queries are allowed",
            "query": sql,
        }

    try:
        results = execute_query(sql)
        return {"status": "ok", "columns": results["columns"], "rows": results["rows"]}
    except Exception as exc:
        return {"status": "error", "message": str(exc), "query": sql}
