The first time I connected Claude to a live PostgreSQL database through a three-line configuration file, I sat back and thought: this is what every integration should feel like. No custom connector, no bespoke API wrapper, no 400-line Python script that breaks every time the API vendor changes a response field. Just a Model Context Protocol server sitting between the AI and the database, translating naturally.
I've shipped AI systems for 23 production clients since MCP launched. The protocol has moved from an interesting Anthropic experiment to the default way I wire AI agents to external systems. If you're building anything with AI agents today and you're still writing one-off tool integrations, you're doing five times the work you need to. This guide covers everything: what MCP actually is, how to build a production-grade server, the auth and security patterns that matter, and the deployment options I actually use.
Key Takeaways
- Model Context Protocol (MCP) is an open standard that eliminates custom integrations between AI models and external tools — one server works with every MCP-compatible client
- MCP grew from 100,000 monthly downloads in November 2024 to over 8 million by April 2025, with 5,800+ servers now available
- Three primitives cover everything: tools (functions the AI calls), resources (data the AI reads), and prompts (reusable templates)
- For local development, use stdio transport. For production remote servers, use Streamable HTTP with OAuth 2.1 authentication
- The biggest mistake builders make is skipping input validation and structured error handling — both are easy to add and critical for production stability
- Real ROI shows up fast: one MCP server replacing a custom CRM connector saved a SaaS client $3,200/month in maintenance engineering hours
MCP turns the chaotic web of AI integrations into a clean protocol-based architecture
What Model Context Protocol Actually Is
Before MCP, building an AI system that touched five external tools meant writing five custom integrations. Then maintaining them. Then rewriting them when the AI model changed or a tool updated its API. If you had 10 AI applications and 20 external tools, you potentially needed 200 different connectors. Anthropic's team called this the M×N problem, and it's the reason most AI agent projects die in the maintenance phase rather than the build phase.
MCP solves this with a single protocol. Build one server for your Salesforce data. Every AI client that speaks MCP — Claude, Cursor, Windsurf, your custom agent — can use that server immediately. No rewrites. You go from M×N integrations to M+N.
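The arithmetic is easy to sanity-check with the numbers from the example above:

```python
# Connector count for the example above: 10 AI applications (M), 20 tools (N).
M, N = 10, 20

custom_connectors = M * N  # one bespoke integration per (app, tool) pair
mcp_components = M + N     # one MCP client per app, one MCP server per tool

print(custom_connectors)  # 200
print(mcp_components)     # 30
```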
Think of it as USB-C for AI. Before USB-C, every device needed different cables, different adapters, different drivers. MCP is the moment AI tooling gets a universal port. The November 2025 MCP specification is the most current stable version, adding proper authentication and long-running workflow support that makes it genuinely production-ready for enterprise use.
The numbers bear this out. MCP SDK downloads grew from roughly 100,000 per month in November 2024 to over 8 million by April 2025. As of early 2026, there are over 5,800 published MCP servers covering GitHub, Slack, Google Drive, PostgreSQL, Notion, Jira, Salesforce, Stripe, and dozens of other services. Companies like Cloudflare, Block (Square), and Autodesk are running MCP in production at scale.
The Three Primitives
Every MCP server exposes some combination of three things:
Tools are functions the AI can call. "Search the database for orders placed in the last 30 days." "Send an email to this address." "Create a Jira ticket with this title and description." The AI decides when to call them based on the conversation. Tools are what most people start with, and they cover 80% of use cases.
Resources are data the AI can read. Unlike tools, resources are static or semi-static: a company wiki, a product catalog, a code repository. The AI fetches them to enrich its context. If your database has a "knowledge" table full of internal documentation, that's a resource, not a tool.
Prompts are reusable templates that appear in the AI client's interface. They're less about automation and more about UX: giving users shortcuts to common workflows. "Summarize today's support tickets" could be a prompt that automatically populates context and kicks off a specific analysis flow.
For most production use cases, you'll build tools first and add resources later when you notice the AI making requests for static data that shouldn't require a full tool call each time.
Choosing Your Transport: stdio vs Streamable HTTP
This decision matters more than most tutorials acknowledge. Getting it wrong means either overly complex local setup or an insecure production deployment.
stdio Transport: For Local and Desktop Clients
stdio transport runs your MCP server as a local process and communicates through standard input and output. Claude for Desktop uses this. Cursor uses this. It's simple, has zero network overhead, and requires no authentication because the AI client launches the server process directly on your machine.
Use stdio when:
- You're building for Claude Desktop or other local AI clients
- The tools access local resources (files, local databases, local APIs)
- You're in development and want fast iteration cycles
- The server only needs to serve one user on one machine
The Claude Desktop configuration looks like this:
```json
{
  "mcpServers": {
    "my-server": {
      "command": "python",
      "args": ["/path/to/server.py"],
      "env": {
        "DATABASE_URL": "postgresql://localhost/mydb"
      }
    }
  }
}
```
Streamable HTTP: For Production Remote Servers
Streamable HTTP runs your MCP server as a proper web service. Multiple users, multiple AI clients, proper authentication, rate limiting, observability. This is what you use when you're building a server that your team's agents — or your customers' agents — will call in production.
The November 2025 specification standardized Streamable HTTP as the recommended transport for remote deployments. It uses standard HTTP for requests and optional Server-Sent Events for streaming responses back to the client.
Use Streamable HTTP when:
- Multiple users or clients need access to the same server
- The server is deployed remotely (cloud, VPS, serverless)
- You need authentication and access control
- You need logging, monitoring, and audit trails
- You're building a commercial or enterprise service
Transport choice is the first architectural decision that affects everything downstream
Building an MCP Server in Python
I'll walk through a real example: a CRM lookup server that lets an AI agent search customer records, pull account history, and log interactions. This is the type of integration I build most often for AI systems clients.
Setup
Install the official Python SDK:
```bash
pip install mcp
```
For a Streamable HTTP server (production), you also need an ASGI framework:
```bash
pip install mcp fastapi uvicorn
```
Your First Tool
Here's a minimal but production-honest MCP server. I'm not going to show you the "hello world" version — I'm going to show you what I actually ship:
```python
import asyncio
import os

from mcp import types
from mcp.server import Server
from mcp.server.stdio import stdio_server

# Initialize server with a name — shows in client UIs
app = Server("crm-server")

@app.list_tools()
async def list_tools() -> list[types.Tool]:
    return [
        types.Tool(
            name="search_customers",
            description="Search CRM for customer records by name, email, or company. Returns up to 10 matches.",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search term: name, email address, or company name",
                        "maxLength": 200
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Max results to return (1-10)",
                        "minimum": 1,
                        "maximum": 10,
                        "default": 5
                    }
                },
                "required": ["query"]
            }
        )
    ]

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "search_customers":
        query = arguments.get("query", "").strip()
        limit = min(int(arguments.get("limit", 5)), 10)  # enforce max
        if not query or len(query) < 2:
            return [types.TextContent(
                type="text",
                text="Error: search query must be at least 2 characters"
            )]
        # Your actual CRM lookup logic here
        results = await search_crm(query, limit)
        if not results:
            return [types.TextContent(
                type="text",
                text=f"No customers found matching '{query}'"
            )]
        formatted = format_results(results)
        return [types.TextContent(type="text", text=formatted)]
    return [types.TextContent(type="text", text=f"Unknown tool: {name}")]

async def main():
    async with stdio_server() as (read_stream, write_stream):
        await app.run(read_stream, write_stream, app.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())
```
A few things I do here that most tutorials skip:
maxLength on the input schema: Forces the AI client to validate input before sending. Also documents your constraints to whoever reads the schema.
Explicit limit enforcement in the handler: Never trust schema validation alone. The client might not enforce it. Always re-check in your handler.
Specific error messages: When the AI gets an error, it uses the message to decide what to do next. "Error: X" gives it nothing. A specific message gives it enough to retry correctly or surface the issue to the user.
Handling Errors Like a Production System
Every external call in your tool handler can fail. Database unavailable, API rate limited, network timeout. The way you handle these failures determines whether your AI agent recovers gracefully or enters a spiral of unhelpful retries.
```python
import asyncio
import logging

@app.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name == "search_customers":
        try:
            results = await asyncio.wait_for(
                search_crm(arguments["query"]),
                timeout=5.0  # 5 second hard cap
            )
            return [types.TextContent(type="text", text=format_results(results))]
        except asyncio.TimeoutError:
            return [types.TextContent(
                type="text",
                text="The CRM search timed out after 5 seconds. Try a more specific query."
            )]
        except DatabaseConnectionError:  # your CRM client's connection error type
            return [types.TextContent(
                type="text",
                text="CRM is temporarily unavailable. The team has been notified."
            )]
        except Exception as e:
            # Log the real error server-side, return safe message to client
            logging.error(f"CRM search error: {e}", exc_info=True)
            return [types.TextContent(
                type="text",
                text="An unexpected error occurred. Please try again."
            )]
    return [types.TextContent(type="text", text=f"Unknown tool: {name}")]
```
The pattern: log the real error to your monitoring system, return a clean message to the AI. You don't want stack traces in AI responses. You also don't want the AI to see your database schema or internal service names in error messages.
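One way to centralize that pattern is a small wrapper any tool handler can use. This is a sketch, not part of the MCP SDK; `safe_tool_call` and `flaky_lookup` are illustrative names:

```python
import asyncio
import logging

logger = logging.getLogger("mcp_server")

# Hypothetical helper (not part of the MCP SDK): run a tool coroutine with a
# hard timeout, log the real exception server-side, and hand the AI client a
# sanitized message with no internals in it.
async def safe_tool_call(coro, timeout_s=5.0,
                         timeout_msg="The call timed out. Try a more specific query."):
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return timeout_msg
    except Exception as exc:
        logger.error("tool call failed: %s", exc, exc_info=True)
        return "An unexpected error occurred. Please try again."

# Illustrative failing dependency whose error message leaks an internal hostname
async def flaky_lookup():
    raise RuntimeError("connection refused by db-internal-host:5432")

async def main():
    msg = await safe_tool_call(flaky_lookup())
    print(msg)  # the sanitized message, never the RuntimeError text

asyncio.run(main())
```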
Error handling in MCP tools determines whether agents recover gracefully or loop endlessly
Production Patterns That Actually Matter
This is where most MCP tutorials stop, and where the real work begins. I've learned these patterns by running MCP servers handling thousands of calls per day across multiple client deployments.
Authentication for Remote Servers
A 2025 security scan of roughly 2,000 publicly exposed MCP servers found that most had zero authentication. None. An open tool endpoint anyone could call. That's not a theoretical risk — that's a live data leak waiting to happen.
The November 2025 MCP specification addressed this directly: OAuth 2.1 is now the standard for authenticating remote MCP server connections. The flow looks like this:
1. Client discovers server capabilities at `/.well-known/mcp`
2. Client initiates the OAuth 2.1 authorization flow
3. Server validates the token on every tool call
4. Scopes control which tools a client can call (read vs write, which resources)
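The scope check itself is simple once the token is validated. Here is a minimal sketch, assuming scopes have already been extracted from a verified OAuth 2.1 access token; the tool names and scope strings are illustrative, not from the MCP spec:

```python
# Map each tool to the scopes it requires (illustrative names)
TOOL_SCOPES = {
    "search_customers": {"crm:read"},
    "create_activity_log": {"crm:read", "crm:write"},
}

def is_tool_allowed(tool_name: str, granted_scopes: set[str]) -> bool:
    required = TOOL_SCOPES.get(tool_name)
    if required is None:
        return False  # unknown tool: deny by default
    return required.issubset(granted_scopes)

print(is_tool_allowed("search_customers", {"crm:read"}))     # True
print(is_tool_allowed("create_activity_log", {"crm:read"}))  # False
```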
For simpler internal deployments where you control all clients, API key authentication works fine:
```python
import os

from fastapi import APIRouter, Depends, Header, HTTPException, Request

router = APIRouter()

# Filter out empty strings so an unset env var doesn't accept blank keys
VALID_API_KEYS = {k for k in os.environ.get("MCP_API_KEYS", "").split(",") if k}

async def verify_api_key(x_api_key: str = Header(None)):
    if x_api_key not in VALID_API_KEYS:
        raise HTTPException(status_code=401, detail="Invalid API key")

@router.post("/mcp", dependencies=[Depends(verify_api_key)])
async def mcp_endpoint(request: Request):
    # handle MCP request
    ...
```
The important thing is having authentication at all. Whatever mechanism fits your setup — use it. An MCP server with no auth is a direct line into your data systems.
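One small hardening worth considering for any API-key check: compare keys with `hmac.compare_digest` rather than `==`, so response timing doesn't leak how much of a key matched. A stdlib sketch with illustrative keys:

```python
import hmac

# Illustrative keys; in production these come from an env var or secret store
VALID_API_KEYS = ["key-alpha", "key-beta"]

def api_key_valid(presented: str) -> bool:
    if not presented:
        return False
    # Compare against every stored key with a constant-time comparison, so
    # response timing reveals neither key contents nor which key matched
    matched = False
    for key in VALID_API_KEYS:
        if hmac.compare_digest(presented.encode(), key.encode()):
            matched = True
    return matched

print(api_key_valid("key-alpha"))   # True
print(api_key_valid("key-alpha1"))  # False
```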
Input Validation Beyond JSON Schema
JSON Schema validation happens at the protocol level but it doesn't protect you from everything. An AI might send a valid string that happens to be a SQL injection attempt, a path traversal string, or a malformed email address that breaks your downstream service.
```python
import re

def validate_search_query(query: str) -> str:
    # Strip whitespace
    query = query.strip()

    # Length bounds
    if len(query) < 2:
        raise ValueError("Query too short")
    if len(query) > 200:
        raise ValueError("Query too long")

    # Block obvious injection attempts
    dangerous_patterns = [
        r"[;'\"\\]",  # SQL injection chars
        r"\.\./",     # path traversal
        r"<[^>]+>",   # HTML tags
    ]
    for pattern in dangerous_patterns:
        if re.search(pattern, query):
            raise ValueError("Query contains invalid characters")

    return query
```
This isn't paranoia. When an AI is calling your tools autonomously, edge cases happen that you didn't anticipate in testing. Validation is cheap to add and expensive to skip.
Structured Logging for Observability
When an AI agent calls your MCP server 200 times a day, you need to know which tools are slow, which ones fail, and how inputs are distributed. Plain print statements won't get you there.
```python
import json
import logging
import time
from datetime import datetime, timezone

logger = logging.getLogger("mcp_server")

@app.call_tool()
async def call_tool(name: str, arguments: dict):
    start = time.perf_counter()
    status = "success"
    error_type = None
    try:
        result = await dispatch_tool(name, arguments)
        return result
    except Exception as e:
        status = "error"
        error_type = type(e).__name__
        raise
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info(json.dumps({
            "event": "tool_call",
            "tool": name,
            "status": status,
            "error_type": error_type,
            "duration_ms": round(elapsed_ms, 2),
            "timestamp": datetime.now(timezone.utc).isoformat()
        }))
```
JSON logs ship cleanly to any aggregator: Cloudwatch, Datadog, Grafana, whatever your stack uses. You can then build a dashboard that shows tool call latency percentiles, error rates by tool, and daily usage trends. That's the kind of visibility that lets you run MCP in production with confidence rather than hope.
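Once the logs are structured, the analysis side is trivial. A stdlib-only sketch that computes an error rate and median latency from log lines shaped like the entries above (the values are made up):

```python
import json
from statistics import median

# Toy log lines shaped like the structured entries above (values illustrative)
log_lines = [
    '{"event": "tool_call", "tool": "search_customers", "status": "success", "duration_ms": 42.1}',
    '{"event": "tool_call", "tool": "search_customers", "status": "success", "duration_ms": 55.7}',
    '{"event": "tool_call", "tool": "search_customers", "status": "error", "duration_ms": 5003.0}',
    '{"event": "tool_call", "tool": "search_customers", "status": "success", "duration_ms": 48.9}',
]

entries = [json.loads(line) for line in log_lines]
durations = [e["duration_ms"] for e in entries]
error_rate = sum(e["status"] == "error" for e in entries) / len(entries)

print(round(error_rate, 2))         # 0.25
print(round(median(durations), 1))  # 52.3
```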
Deploying Your MCP Server
I run MCP servers in three configurations depending on the client's requirements. Here's how I think about each one.
Serverless (Cloud Run)
For most production MCP servers, Cloud Run is my default. You push a container, Cloud Run scales it to zero when idle and spins up instantly when called. You pay per invocation. For a business whose AI agents make 1,000 tool calls a day, that's often under $5/month in compute.
```dockerfile
# Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]
```

```bash
# deploy.sh
gcloud run deploy crm-mcp-server \
  --source . \
  --region us-central1 \
  --no-allow-unauthenticated \
  --set-env-vars DATABASE_URL="$DATABASE_URL" \
  --memory 512Mi \
  --timeout 30s
```
The `--no-allow-unauthenticated` flag means Google Cloud IAM handles authentication before requests even reach your server. Your AI client gets a service account key. Clean, auditable, and you don't have to implement auth yourself.
Self-Hosted VPS
Some clients need data to stay on-premises or have compliance requirements that rule out managed cloud services. In those cases I run the MCP server on a VPS behind nginx with TLS termination:
```nginx
# nginx config
server {
    listen 443 ssl;
    server_name mcp.internal.company.com;

    ssl_certificate     /etc/ssl/certs/server.crt;
    ssl_certificate_key /etc/ssl/private/server.key;

    location /mcp {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_read_timeout 60s;
    }
}
```
Run the server with systemd for automatic restarts and startup on boot. Add log rotation. Nothing fancy, but reliable.
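For reference, a minimal systemd unit for this setup might look like the following; the service name, user, and paths are placeholders for your environment:

```ini
# /etc/systemd/system/crm-mcp-server.service (illustrative names and paths)
[Unit]
Description=CRM MCP server
After=network.target

[Service]
User=mcp
WorkingDirectory=/opt/crm-mcp-server
ExecStart=/opt/crm-mcp-server/.venv/bin/uvicorn server:app --host 127.0.0.1 --port 8080
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Reload and enable with `systemctl daemon-reload && systemctl enable --now crm-mcp-server`.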
Local stdio for Claude Desktop
For individual users who want to give Claude access to local tools — their own file system, a local database, private APIs — stdio transport and Claude Desktop is the simplest path. The server runs locally, the credentials never leave the machine, and setup takes about 10 minutes once the server is written.
Cloud Run handles scaling, SSL, and zero-idle billing for most production MCP deployments
Real Use Cases and the ROI That Comes With Them
Abstract protocols are easy to explain but hard to justify to a CFO. Here's what MCP actually looks like in production deployments I've built, with specific numbers where I have them.
CRM Data Access for a B2B SaaS Team
A 40-person B2B SaaS company had their account managers spending 45 minutes per day pulling customer data from Salesforce to answer questions in Slack. Their AI agent previously had a custom Salesforce connector that required a full-time developer to maintain as Salesforce updated its API.
We replaced the custom connector with an MCP server exposing four tools: search accounts, get account timeline, create activity log, get open opportunities. The AI agent now answers Salesforce questions instantly. The maintenance burden dropped to near zero because the MCP server abstracts the Salesforce API — when Salesforce changes something, I update the server once, and every AI client that uses it gets the fix automatically.
Time savings: roughly 45 minutes × 8 account managers × 22 working days = 132 hours/month. At a loaded cost of $80/hour, that's $10,560/month in recovered productivity. The MCP server took three days to build and costs about $8/month to run.
Document Intelligence for a Legal Services Firm
A legal services firm had over 50,000 contracts in Google Drive. Associates spent hours per week manually searching documents to answer "has this client signed an NDA with us?" and "what's the expiry date on this vendor agreement?"
An MCP server with two tools — search documents by metadata and extract clause text — combined with a vector search index let their AI assistant answer those questions in under 10 seconds. The server pulls documents from Drive, runs them through a local embedding model, and returns relevant excerpts. No data leaves their infrastructure. Total build time: five days. Monthly savings in associate hours: the firm estimated 60+ hours at $150/hour billed rate. That's real money.
This is the type of work I cover in my production AI agents guide — the cases where the ROI is clear and the technical risk is manageable. If you're trying to figure out whether your business is ready for this kind of system, the AI Readiness Assessment is a good place to start.
E-Commerce Inventory Agent
One of my e-commerce clients runs a 7-figure Shopify store with 2,800 SKUs across three warehouses. Their buying team was making reorder decisions from a spreadsheet that got updated weekly.
An MCP server connected to their inventory management system, Shopify, and their 3PL's API gave their AI agent real-time stock levels, velocity data, and supplier lead times. The agent now flags reorder needs proactively, drafts purchase orders, and updates the buying team's Notion dashboard. The MCP layer means any future AI tool their team adopts can plug into the same data without a new integration.
For more on how to decide between agents and simpler automation for use cases like this, read my breakdown on when AI agents actually make sense.
Adding Resources and Prompts
Once your tools are stable, resources and prompts unlock the next level of capability.
Resources make sense when the AI needs to read large, stable data that would be wasteful to query through a tool every time. An employee handbook, a product specification document, a pricing table that updates monthly. You define a resource URI and a handler that returns the content:
```python
@app.list_resources()
async def list_resources() -> list[types.Resource]:
    return [
        types.Resource(
            uri="company://handbook",
            name="Employee Handbook",
            description="Current employee policies and procedures",
            mimeType="text/plain"
        )
    ]

@app.read_resource()
async def read_resource(uri: str) -> str:
    if uri == "company://handbook":
        return load_handbook_text()  # fetch from S3, DB, wherever
    raise ValueError(f"Unknown resource: {uri}")
```
Prompts are less about automation and more about giving users in Claude Desktop (or any MCP-compatible UI) quick access to standard workflows. A "weekly summary" prompt that automatically populates the last 7 days of activity data, or a "new client onboarding" prompt that pulls the relevant account details. Useful for teams adopting AI tooling who want guided workflows rather than open-ended chat.
Testing Your MCP Server
MCP servers are easy to under-test because the protocol layer hides bugs that only show up at runtime. Three testing patterns I always include:
Unit tests for tool handlers: Test the logic functions directly, not through the protocol. Pass a dict, get a result. These run fast and catch most logic bugs.
Integration tests with the MCP test client: The SDK includes a test client that lets you call your server programmatically without a real AI client. Use this to verify tool discovery, input validation, and error handling.
Contract tests against live data: At least once per release, run your tools against a staging version of your real data source. This catches schema drift, API changes, and permission issues that unit tests can't see.
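The first pattern is worth making concrete: test the logic as plain functions. `run_search` below is a hypothetical helper factored out of the tool handler, mirroring the validation rules from the earlier example:

```python
# `run_search` is a hypothetical pure function factored out of the tool
# handler, so tests can call it without the protocol layer in the way
def run_search(query: str, limit: int = 5) -> str:
    query = query.strip()
    limit = max(1, min(int(limit), 10))  # clamp, never trust the caller
    if len(query) < 2:
        return "Error: search query must be at least 2 characters"
    return f"Searching for '{query}' (limit {limit})"

def test_rejects_short_query():
    assert run_search(" a ").startswith("Error:")

def test_clamps_limit():
    assert "(limit 10)" in run_search("acme", limit=500)
    assert "(limit 1)" in run_search("acme", limit=0)

test_rejects_short_query()
test_clamps_limit()
print("ok")
```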
For n8n users who are also building MCP integrations: my n8n AI agent guide covers how to use n8n as an MCP client to orchestrate multiple servers, which is a common pattern for businesses that want visual workflow management on top of protocol-based tool access.
Contract testing against real data sources catches issues that unit tests miss
Where MCP Is Headed
The 2026 trajectory for MCP is clear: it's becoming infrastructure, not a feature. The major AI providers — Anthropic, OpenAI, Google, Microsoft — all support it or are moving toward it. Autodesk helped shape the enterprise authentication spec. Block and Stripe are running it in production finance systems.
The next frontier is agent-to-agent MCP: AI agents acting as MCP clients to other AI agents. One agent orchestrates a research task, delegates to a data retrieval agent via MCP, gets results back, and continues. This is the multi-agent architecture pattern I cover in the Agentic RAG guide, now with a standardized protocol layer beneath it.
If you're building AI systems today and you're not thinking about MCP as your integration standard, you're building technical debt into every tool you wire up. The work you do on custom connectors now will need to be redone — or it will become the maintenance burden that kills the project two years from now.
The protocol is stable, the ecosystem is massive, and the ROI math is obvious. This is a good time to start.
Frequently Asked Questions
What is Model Context Protocol (MCP) used for?
MCP is used to connect AI models like Claude to external tools, databases, APIs, and data sources through a standardized protocol. Instead of building custom integrations for each combination of AI and tool, you build one MCP server that works with any MCP-compatible AI client. Common uses include connecting AI agents to CRM systems, databases, internal wikis, code repositories, and communication tools like Slack or Jira.
Is MCP only for Claude and Anthropic products?
No. Anthropic open-sourced MCP in November 2024, and it has since been adopted by many other AI platforms including Cursor, Windsurf, Zed, and custom agent frameworks. OpenAI and Google have also indicated support. Any developer can build an MCP server or client using the official SDKs, and the protocol is not tied to any specific AI model or vendor.
How is MCP different from function calling / tool use?
Tool use or function calling is a capability built into individual AI models — each model has its own format and API. MCP is a protocol layer on top of that: a standardized way for AI clients to discover and call tools regardless of which model they're using. Think of it as the difference between a specific charging cable format (tool calling per model) and the USB-C standard (MCP). The same MCP server works with any AI client that speaks the protocol.
What language should I use to build an MCP server?
The official SDKs support Python and TypeScript. Python is the better choice for data-heavy servers (database queries, ML pipelines, document processing). TypeScript works well for JavaScript-based services and anything already running in a Node.js stack. Community SDKs exist for Rust, Go, Java, and C#, but the official SDKs have the best documentation and receive updates first when the spec changes.
How do I authenticate an MCP server in production?
The November 2025 MCP specification standardizes OAuth 2.1 for remote servers using Streamable HTTP transport. For simpler setups, API key authentication enforced at the HTTP layer works well for internal services. If you're deploying on Google Cloud Run, you can use Cloud IAM to handle authentication before requests reach your server. Never deploy a remote MCP server without some form of authentication — a 2025 security scan found most public MCP servers had none, leaving the underlying data systems fully exposed.
Can MCP servers handle multiple concurrent requests?
Yes. Streamable HTTP servers are standard ASGI web services and handle concurrency the same way any async Python server does. With FastAPI and uvicorn, a single process can handle dozens of concurrent tool calls. For higher throughput, add multiple workers or deploy behind an auto-scaling serverless platform like Cloud Run. The MCP protocol itself is stateless per request, which makes horizontal scaling straightforward.
What are the main security risks with MCP servers?
The main risks are: missing authentication (exposing your data systems to anyone who finds the endpoint), insufficient input validation (allowing injection attacks through tool parameters), and overly broad permissions (giving the AI access to delete or modify data when it only needs read access). Follow the principle of least privilege — only expose the tools a specific client needs, and scope database access to exactly the operations those tools require. Log all tool calls for audit purposes.
How long does it take to build a production MCP server?
A simple read-only server with two or three tools takes one to two days including testing and deployment. A server with write operations, proper authentication, error handling, structured logging, and a deployment pipeline takes three to five days. The protocol itself is straightforward — the time goes into understanding the underlying system you're integrating, writing solid input validation, and setting up observability. Complex servers connecting to enterprise systems with custom auth requirements can take up to two weeks.
Citation Capsule: MCP server downloads grew from ~100,000 per month in November 2024 to over 8 million by April 2025. Over 5,800 MCP servers are now available in the ecosystem, and 97M+ monthly SDK downloads were recorded as of December 2025. A 2025 security scan of publicly exposed MCP servers found most had no authentication. Sources: Deepak Gupta MCP Enterprise Guide 2025, MCP Security Research ArXiv 2025, MCP Specification November 2025.