DEV Community

TheProdSDE

Posted on • Originally published at pub.towardsai.net

MCP for Backend Engineers: When to Use It (and When to Skip It)

"If you only have one consumer, MCP is a liability."

Real code, real cost, and an honest answer to the question every AI team gets wrong.


TL;DR

  • 1 app · 1 team · 1 toolset → use plain tool calling
  • Multiple teams or AI surfaces → use MCP
  • Existing APIs → wrap them in MCP
  • Early-stage / MVP → skip MCP

MCP is a scaling decision, not a starting point.


Why This Exists

A client had a simple ask: "Let our AI assistant use our REST API."

Three teams built three integrations. All three were wrong:

- Model called /getCustomer when it needed /getOrders
- Auth header dropped silently on retry
- Three teams hardcoded three different endpoint assumptions

Result: 3 integrations. 3 slightly different bugs. 1 very awkward client call.

That's when MCP became the answer. One server wrapping the API. Any AI host connects to it. The client got exactly what they asked for — and we got a pattern reused three times since.

MCP is not about smarter models — it's about cleaner boundaries.


MCP in One Line

MCP = API Gateway + Plugin System + Contract Layer for LLMs

Without MCP                 With MCP
Duplicate integrations      Single contract
Auth bugs everywhere        Centralized auth
Hardcoded tools             Dynamic discovery
Fragile wrappers            Reusable servers
Rebuild per host            Plug into existing system

The Hero Diagram: MCP vs Tool Calling

This is the architecture difference that actually matters.

graph TD
    subgraph TC["❌ Tool Calling — Tightly Coupled"]
        direction TB
        TC_H["AI Host"]
        TC_T1["Tool: get_customer()"]
        TC_T2["Tool: list_orders()"]
        TC_T3["Tool: check_inventory()"]
        TC_H --> TC_T1
        TC_H --> TC_T2
        TC_H --> TC_T3
    end

    subgraph MCP["✅ MCP — Loosely Coupled via Contract"]
        direction TB
        MH1["Host: Web Chat"]
        MH2["Host: Slack Bot"]
        MH3["Host: VS Code"]

        MS1["customer-mcp-server\nOwner: CRM Team"]
        MS2["payments-mcp-server\nOwner: Payments Team"]
        MS3["infra-mcp-server\nOwner: Platform Team"]

        MH1 --> MS1
        MH1 --> MS2
        MH2 --> MS1
        MH2 --> MS3
        MH3 --> MS2
        MH3 --> MS3
    end

The shift: Tool calling is a local decision. MCP is an organizational contract.


What MCP Actually Is

Model Context Protocol (MCP) is an open protocol that standardises how AI applications connect to external tools and data sources. Think of it as USB-C for AI tools — one connector, many servers, any host.

Instead of every IDE, chat UI, or agent framework inventing its own plugin format, MCP defines a common contract using JSON-RPC 2.0 over Streamable HTTP or stdio.
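
On the wire this is plain JSON-RPC 2.0. As a rough sketch, using the method names from the MCP spec (`tools/list`, `tools/call`) with trimmed, illustrative payloads rather than a real transcript:

```python
import json

# Minimal sketch of the JSON-RPC 2.0 messages an MCP client exchanges.
# Method names come from the MCP spec; payloads are trimmed illustrations.

# Discovery: ask the server what tools it exposes.
list_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# Invocation: call one tool by name with structured arguments.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "get_customer",
        "arguments": {"customer_id": "C-123"},
    },
}

# The server's result is keyed to the same request id.
call_response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "content": [
            {"type": "text", "text": '{"customer_id": "C-123", "tier": "premium"}'}
        ],
    },
}

print(json.dumps(call_request, indent=2))
```

Because every host speaks this same envelope, swapping one host for another changes nothing on the server side.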

Three roles in the spec:

  • Host — the application the user sees (IDE, CLI, chat UI, agent framework)
  • Client — a connector inside the host that speaks the MCP protocol
  • Server — a service that exposes tools, resources, and prompts to the model

The client sits between host and server. It connects to one server at a time and speaks the protocol; the host can orchestrate multiple clients simultaneously.

graph TD
    subgraph HOST["AI Host (Orchestrates Everything)"]
        direction TB
        H["Orchestrator Logic + LLM Calls"]
        C1["MCP Client 1"]
        C2["MCP Client 2"]
        H --> C1
        H --> C2
    end

    subgraph SERVERS["MCP Servers (Owned per Team)"]
        S1["customer-mcp-server"]
        S2["infra-mcp-server"]
    end

    C1 --> S1
    C2 --> S2

Key Insight: The model never talks to your backend directly. It only talks to tools exposed by MCP servers. That's the boundary.


Case Study: Wrapping a Client REST API

Before: Three separate UIs (web, Slack bot, IDE) called the REST API directly. Each integrator made slightly different assumptions about endpoints and auth — producing three brittle integrations and repeated bugs.

After: One MCP server wrapping the existing API. Same tool list and prompts exposed to every host.

Results in production:

  • Integration time reduced by ~60% for new hosts
  • Tool-call auth failures dropped from multiple incidents to zero in the first month (root cause: centralized token handling)
  • Two new hosts onboarded with zero backend changes

This is the practical payoff: a small upfront cost for long-term reduction in duplicated integration work and clearer ownership.


Should You Use MCP? — Decide Before Reading Further

Most tutorials make you read 80% before answering this. Here it is upfront.

If you pick MCP too early, you're trading ~1–2 extra days of setup for zero immediate benefit. Start with the simplest thing. Reach for MCP when the "three wrappers" problem is already real.

Three questions that determine everything:

  1. How many consumers will call these tools?

    • One app, one team → plain tool calling, done
    • Multiple apps or teams → MCP starts making sense
  2. Who owns the tools vs who owns the AI host?

    • Same team owns both → plain tool calling, tight coupling is fine
    • Different teams → MCP gives you the clean boundary
  3. Do these tools already exist as APIs?

    • Yes → wrap them as an MCP server, zero changes to existing backend
    • No → build them either way, MCP adds structure from day one

Five Real Scenarios

Scenario 1 — Internal chatbot for one team
A team wants an AI that queries their own database and sends Slack alerts. One app, one team, they control everything.
Plain tool calling. MCP adds overhead with zero benefit here.

Scenario 2 — Your client's REST API (the exact story above)
Client has an existing API. Wants multiple AI surfaces to consume it — web app today, Slack bot next month, VS Code extension later.
MCP. Wrap the API once, every host connects via the same protocol.

Scenario 3 — Enterprise platform, multiple teams
Payments team, infra team, CRM team — each owns their domain. One central AI console orchestrates across all three.
MCP per domain. Each team ships and secures their own server.

Scenario 4 — Quick prototype / MVP
You're validating an idea over a weekend. Speed matters more than architecture.
Plain tool calling always. Get the idea working first. MCP is a refactor you do when it proves valuable.

Scenario 5 — Agentic workflow with long-running tasks
Agent coordinates across multiple systems, tasks need to be traceable and retryable across sessions.
MCP. The clean server/client boundary makes observability and retry logic significantly easier to implement.

Decision Flow

flowchart TD
    Start([Start])
    Start --> Prototype{Prototype / MVP?}
    Prototype -- Yes --> Plain[Plain tool calling]
    Prototype -- No --> Multi{Multiple hosts / teams?}
    Multi -- No --> Plain
    Multi -- Yes --> Ownership{Same team owns tools & host?}
    Ownership -- Yes --> Plain
    Ownership -- No --> APICheck{APIs already exist?}
    APICheck -- Yes --> MCP[Use MCP]
    APICheck -- No --> Consider[Build either way — MCP adds structure]
    classDef recommend fill:#f9f,stroke:#333,stroke-width:1px;
    class MCP recommend;

Key Insight: MCP is triggered by scale of ownership and consumption — not complexity.


Where MCP Is a Bad Idea

MCP is the wrong choice when:

  • You're at the early-stage or MVP phase
  • Only one host consumes the tools
  • One team owns the full stack
  • Latency is business-critical
  • Tool shapes are still evolving

If you only have one consumer, MCP is architecture cosplay.


Core MCP Concepts

Server-side primitives

1. Tools
Executable actions — often with side effects. The LLM decides when to call them.
Examples: create_invoice, deploy_service, run_sql_query.

2. Resources
Read-only data and context the model is allowed to see — not actions.
Examples: docs, config files, logs, user profile JSON, query results.

3. Prompts
Reusable workflow templates — Postman collections for LLM behaviour. Server-defined recipes the host can invoke.
Examples: bug_report_prompt, deployment_checklist_prompt.

Client-side capabilities

4. Sampling
The server asks the client: "Run a model completion for me using your model access." Useful when servers don't hold model keys.

5. Roots
The server asks: "What file paths or URIs can I operate on?" Organises where tools can act.

6. Elicitation
The server asks the client to collect more information from the user.
Example: "Ask the user which environment to deploy to: dev/staging/prod."


Request / Response Lifecycle

The most important thing to understand: the Host orchestrates LLM calls — not the MCP Client. The client only handles server communication. This distinction is what most sequence diagrams get wrong.

sequenceDiagram
    autonumber
    participant U as User
    participant H as Host / Orchestrator
    participant L as LLM
    participant C as MCP Client
    participant S as MCP Server

    U ->> H : Query
    H ->> C : discover tools
    C ->> S : tools/list (JSON-RPC)
    S -->> C : tool schemas
    C -->> H : return schemas
    H ->> L : prompt + tool schemas
    L -->> H : tool call decision
    H ->> C : execute tool call
    C ->> S : tools/call(name, args)
    S -->> C : tool result
    C -->> H : return result
    H ->> L : result + continue prompt
    L -->> H : final answer
    H -->> U : output

Key Insight: The model decides what to call, but the server controls what exists. That's the contract.


Minimal MCP Server (Python)

Three things to watch in this block:

  • @mcp.tool() is an executable action with side effects
  • @mcp.resource() is read-only context — the model sees it, doesn't call it
  • The /health endpoint is not optional — Container Apps and AKS need it for readiness probes. Skip it and your pods restart in a loop.
# server.py
# requires: fastmcp>=2.0.0, python-dotenv

import os
import uuid
from dotenv import load_dotenv
from fastmcp import FastMCP

load_dotenv()

mcp = FastMCP(
    name="customer-service",
    instructions="Customer service MCP server.",
)

# ── Tool: get_customer ────────────────────────────────────────────────────────
@mcp.tool()
async def get_customer(customer_id: str) -> dict:
    """Fetch customer profile by ID."""
    # Replace with your real DB call
    return {
        "customer_id": customer_id,
        "name": "Priya Sharma",
        "email": "priya@example.com",
        "tier": "premium",
        "last_order_date": "2026-03-15",
    }

# ── Tool: get_orders ─────────────────────────────────────────────────────────
@mcp.tool()
async def get_orders(customer_id: str, limit: int = 5) -> dict:
    """Fetch recent orders for a customer."""
    return {
        "customer_id": customer_id,
        "orders": [
            {"order_id": f"ORD-{i}", "status": "delivered", "amount": 1200 + i * 50}
            for i in range(1, limit + 1)
        ],
    }

# ── Tool: create_support_ticket ───────────────────────────────────────────────
@mcp.tool()
async def create_support_ticket(
    customer_id: str,
    issue: str,
    priority: str = "medium",
) -> dict:
    """Create a support ticket for a customer issue."""
    return {
        "ticket_id": f"TKT-{uuid.uuid4().hex[:8].upper()}",
        "customer_id": customer_id,
        "issue": issue,
        "priority": priority,
        "status": "open",
    }

# ── Resource: customer playbook ───────────────────────────────────────────────
@mcp.resource("resource://customer-playbook", mime_type="text/markdown")
async def customer_playbook() -> str:
    """Internal guidance for handling customer incidents."""
    # FastMCP 2.x resources are addressed by URI and can return their
    # content directly; this is read-only context, not an action.
    return """\
# Customer Handling Playbook

- Always greet the customer by name.
- Premium customers: consider goodwill credit for severe issues.
- Escalate to L3 if incident impacts > 10 customers.
- SLA: 4 hours for premium, 24 hours for standard.
"""

# ── Prompt: email draft ───────────────────────────────────────────────────────
@mcp.prompt
async def email_draft_prompt(customer_name: str, issue: str) -> str:
    """Template for customer apology emails; arguments become prompt parameters."""
    return (
        "You are a senior customer success manager.\n"
        f"Write a short apology email to {customer_name} about: {issue}.\n"
        "Tone: professional, empathetic. Under 100 words."
    )

# ── Health check — required for ACA and AKS readiness probes ─────────────────
@mcp.custom_route("/health", methods=["GET"])
async def health(request):
    # FastMCP serves over Starlette, so return a Starlette response
    from starlette.responses import JSONResponse
    return JSONResponse({"status": "ok", "server": "customer-mcp"})

if __name__ == "__main__":
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=int(os.getenv("PORT", 8000)),
    )

Start it:

pip install "fastmcp>=2.0.0" python-dotenv
python server.py
# Live at http://localhost:8000/mcp

Inspect without writing a single line of client code:

npx @modelcontextprotocol/inspector http://localhost:8000/mcp

MCP Client + Agent Loop

The key line is client.list_tools() — tools are discovered dynamically from the server, not hardcoded in the client. Every host that connects to this server gets the same tool list automatically, even as you add new tools.

# client.py
# requires: fastmcp>=2.0.0, openai>=1.0.0, python-dotenv

import asyncio
import json
import os
from dotenv import load_dotenv
from fastmcp import Client
from openai import AsyncOpenAI

load_dotenv()

openai_client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
MCP_URL = os.getenv("MCP_SERVER_URL", "http://localhost:8000/mcp")


async def run(query: str) -> str:
    async with Client(MCP_URL) as client:

        # Discover tools from the server — not hardcoded here
        tools = await client.list_tools()
        openai_tools = [
            {
                "type": "function",
                "function": {
                    "name": t.name,
                    "description": t.description,
                    "parameters": t.inputSchema,
                },
            }
            for t in tools
        ]

        print(f"\n[MCP] {len(tools)} tools discovered: {[t.name for t in tools]}")
        messages = [{"role": "user", "content": query}]

        while True:
            resp = await openai_client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=openai_tools,
                tool_choice="auto",
            )
            msg = resp.choices[0].message
            messages.append(msg)

            # No tool calls = final answer
            if not msg.tool_calls:
                return msg.content

            # Execute each tool call via the MCP server
            for tc in msg.tool_calls:
                args = json.loads(tc.function.arguments)
                print(f"[Tool] {tc.function.name}({args})")

                result = await client.call_tool(tc.function.name, args)
                tool_content = (
                    result.content[0].text if result.content else "{}"
                )
                messages.append({
                    "role": "tool",
                    "tool_call_id": tc.id,
                    "content": tool_content,
                })


if __name__ == "__main__":
    queries = [
        "Look up customer C-123 and tell me their tier.",
        "Get the last 3 orders for customer C-123.",
        "C-123 had a late delivery. Open a ticket and draft an apology email.",
    ]
    for q in queries:
        print(f"\n[Query] {q}")
        print(asyncio.run(run(q)))
        print("-" * 60)

Run it (server must be up):

python client.py

Auth — The Part Most Tutorials Skip

TL;DR:
Internal / pod-to-pod → skip auth entirely, use network policy.
Public-facing → OAuth 2.1 + PKCE, no exceptions. The spec mandates it.
Everything else is a practical middle ground.

For remote MCP servers (Streamable HTTP), the spec prescribes OAuth 2.1 Authorization Code + PKCE. The MCP client acts as the OAuth client; your server acts as the resource server; an Authorization Server (Keycloak, Auth0, Azure AD) issues the tokens.

The flow:

  1. Client hits server without a token → 401 with WWW-Authenticate pointing to /.well-known/oauth-protected-resource
  2. Client discovers the auth server from that metadata
  3. Client runs OAuth 2.1 Authorization Code + PKCE
  4. Client retries MCP requests with Authorization: Bearer <token>
  5. Server validates the token and serves tools/resources

Here's the token validation middleware — the piece that took longest to get right because the aud (audience) claim check is easy to miss. Without it, a token issued for a different service will pass validation silently.

# auth_middleware.py
import os
import httpx
from authlib.jose import jwt, JsonWebKey
from starlette.requests import Request
from starlette.responses import JSONResponse

MCP_RESOURCE_ID = os.getenv("MCP_RESOURCE_ID", "https://your-mcp-server/")
AUTH_SERVER_URL  = os.getenv("AUTH_SERVER_URL", "")
_jwks_cache: dict | None = None

SKIP_PATHS = {"/health", "/.well-known/oauth-protected-resource"}


async def fetch_jwks() -> dict:
    global _jwks_cache
    if _jwks_cache:
        return _jwks_cache
    async with httpx.AsyncClient() as http:
        # Keycloak's JWKS path; Auth0 and Entra ID expose theirs at different well-known endpoints
        r = await http.get(f"{AUTH_SERVER_URL}/protocol/openid-connect/certs")
        r.raise_for_status()
        _jwks_cache = r.json()
    return _jwks_cache


async def oauth_middleware(request: Request, call_next):
    if request.url.path in SKIP_PATHS:
        return await call_next(request)

    auth = request.headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return JSONResponse(
            status_code=401,
            content={"error": "missing_token"},
            headers={"WWW-Authenticate": f'Bearer realm="{MCP_RESOURCE_ID}"'},
        )

    token = auth.removeprefix("Bearer ").strip()

    try:
        jwks   = await fetch_jwks()
        claims = jwt.decode(token, JsonWebKey.import_key_set(jwks))
        claims.validate()

        # ── The check most tutorials miss ───────────────────────────
        aud = claims.get("aud", [])
        if isinstance(aud, str):
            aud = [aud]
        if MCP_RESOURCE_ID not in aud:
            return JSONResponse(status_code=403, content={"error": "wrong_audience"})

        request.state.user = dict(claims)
        return await call_next(request)

    except Exception as e:
        return JSONResponse(status_code=401, content={"error": str(e)})

Wire it into your server:

# Add to server.py after: mcp = FastMCP(...)
# Exact wiring is version-dependent; in fastmcp 2.x you can grab the ASGI app
# and attach the function as Starlette HTTP middleware, then serve with uvicorn.
from starlette.middleware.base import BaseHTTPMiddleware
from auth_middleware import oauth_middleware

app = mcp.http_app()  # Starlette app exposing /mcp
app.add_middleware(BaseHTTPMiddleware, dispatch=oauth_middleware)
# Run: uvicorn server:app --host 0.0.0.0 --port 8000

Real-World Cost (Numbers, Not Theory)

Expect:

  • +3–100ms MCP protocol overhead per call (varies by gateway)
  • +280–950ms total round-trip latency including tool execution (p50–p95)
  • +1.8–4.2 seconds for multi-step agentic flows (3-agent chains, p50–p95)
  • Extra infra: containers, auth service, monitoring
  • Non-trivial engineering overhead during setup

If your system is latency-sensitive or single-tool focused, MCP will feel slower — because it is.

You're buying reuse and consistency, not speed.
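
To see where the multi-step numbers come from, here's a back-of-envelope sketch: three sequential tool hops at the per-hop p50–p95 range above stack to roughly 0.8–2.9s of transport overhead alone, and model inference time on top is what pushes observed totals into the 1.8–4.2s band. (The helper below is illustrative, not from any library.)

```python
# Back-of-envelope latency budget for chained tool calls, using the
# per-hop round-trip range quoted above (280-950 ms, p50-p95).
HOP_P50_MS, HOP_P95_MS = 280, 950

def chain_budget_ms(steps: int) -> tuple[int, int]:
    """Best/worst-case transport latency added by `steps` sequential hops."""
    return steps * HOP_P50_MS, steps * HOP_P95_MS

low_ms, high_ms = chain_budget_ms(3)
print(f"3 sequential hops add {low_ms}-{high_ms} ms before any model time")
```

That arithmetic is why "chain fewer, fatter tools" beats "chain many thin tools" in latency-sensitive flows.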


Observability & Debugging

Make MCP servers and client-host interactions observable from day one:

  • Trace every request — propagate a correlation ID across host → client → server → backend and include it in logs and responses
  • Metrics to track — mcp_tool_call_latency, mcp_tool_call_errors, mcp_discovery_time, mcp_auth_failures, per-tool success rates
  • Structured JSON logs — (timestamp, level, correlation_id, tool, args, duration, error) to make postmortems fast
  • Distributed tracing — use OpenTelemetry to connect host traces to server traces for full request/response visibility
  • Troubleshooting checklist — confirm /.well-known/oauth-protected-resource, check token aud claim, verify /health readiness, replay failing tool calls against a local mock server

Testing & QA

Treat the MCP server like a public contract:

  • Contract tests — verify the tool list and input/output schemas remain compatible across releases. Fail CI on breaking schema changes.
  • Unit tests — each @mcp.tool() should have unit tests for edge cases
  • Integration tests — run the server against a mocked backend and a simulated client that performs discovery + tool calls
  • Smoke tests — lightweight end-to-end checks that run on deploy (e.g., a small request to get_customer and get_orders)
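
A contract test can be as small as snapshotting the discovered schemas and diffing them in CI. A sketch (the snapshot shape and helper names are mine; in a real suite the tool list would come from `client.list_tools()` against a staging server):

```python
# Fail CI when the tool contract drifts in a breaking way.

def schema_snapshot(tools: list[dict]) -> dict:
    """Reduce a discovered tool list to the fields consumers depend on."""
    return {
        t["name"]: {
            "description": t["description"],
            "parameters": t["inputSchema"],
        }
        for t in tools
    }

def assert_compatible(current: dict, golden: dict) -> None:
    """Removing or reshaping a tool is breaking; adding a new one is fine."""
    for name, spec in golden.items():
        assert name in current, f"tool removed: {name}"
        assert current[name]["parameters"] == spec["parameters"], \
            f"schema changed: {name}"

# Example run against an in-memory tool list instead of a live server:
tools = [{
    "name": "get_customer",
    "description": "Fetch customer profile by ID.",
    "inputSchema": {"type": "object",
                    "properties": {"customer_id": {"type": "string"}}},
}]
golden = schema_snapshot(tools)        # in CI: load from a committed JSON file
assert_compatible(schema_snapshot(tools), golden)
print("contract OK")
```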

Production Checklist

  • [ ] Health/readiness: expose /health and use readiness probes (ACA/AKS)
  • [ ] Replicas: min-replicas >= 1 for interactive workloads to avoid cold starts
  • [ ] Auth: validate aud claim and enforce least-privilege scopes
  • [ ] Rate limiting & quotas: protect downstream systems from runaway agents
  • [ ] Secrets: store credentials in Key Vault / secret manager — avoid plain env vars for DB credentials
  • [ ] Observability: metrics, logs, and traces wired to your platform
  • [ ] Cost & latency: budget for an extra 280–950ms network hop per tool call

Deploying on Azure

MCP is not Azure-specific — it runs on any platform that supports HTTP. But if you're already in the Azure ecosystem, here's the honest breakdown.

flowchart TD
    A["Infra context?"]
    A -->|Greenfield| B["Azure Container Apps\n✅ Default choice"]
    A -->|Existing cluster| C["AKS + Workload Identity"]
    A -->|Low usage / event-driven| D["Azure Functions"]
    A -->|Existing App Service| E["App Service /mcp"]

Azure Deployment Decision Table

Situation                           Best fit                 Why
Greenfield, no existing infra       Azure Container Apps     Least ops overhead, managed ingress
Already running AKS                 AKS + Ingress            Reuse cluster auth, colocate workloads
Multiple servers, same team         ACA shared environment   Internal discovery, Dapr sidecars
Infrequent tools / cost-sensitive   Azure Functions          Per-invocation billing, zero idle cost
Existing App Service web app        App Service /mcp         Zero infrastructure change
Enterprise, Istio service mesh      AKS + mTLS               Zero trust alongside OAuth 2.1

Option 1 — Azure Container Apps (start here)

Recommended for most teams starting fresh. Handles autoscaling, ingress, Entra ID integration, and scale-to-zero with no MCP-specific platform knowledge needed.

RG="mcp-rg"
ACR="<your-acr-name>"

# Build and push
az acr build --registry $ACR --image customer-mcp-server:latest .

# Create environment
az containerapp env create \
  --name mcp-env --resource-group $RG --location eastus

# Store credentials in Key Vault — never directly in env vars
az keyvault secret set \
  --vault-name mcp-keyvault \
  --name database-url \
  --value "postgresql://user:pass@host:5432/db"

# Deploy
az containerapp create \
  --name customer-mcp-server \
  --resource-group $RG \
  --environment mcp-env \
  --image $ACR.azurecr.io/customer-mcp-server:latest \
  --target-port 8000 \
  --ingress external \
  --min-replicas 1 \
  --max-replicas 10 \
  --secrets db-url=keyvaultref:<key-vault-secret-uri> \
  --env-vars DATABASE_URL=secretref:db-url

min-replicas 1 is critical for interactive use — scale-to-zero causes cold start that breaks mid-conversation flow. Expect ~1–3s latency even on warm replicas; design your agent loop timeouts accordingly (30s is a safe outer bound).
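
That outer bound can be enforced directly in the agent loop with asyncio.wait_for. A sketch (the wrapper name is mine; swap the stand-in coroutine for your real `client.call_tool(...)`):

```python
import asyncio

TOOL_TIMEOUT_S = 30  # safe outer bound from the deployment note above

async def call_tool_bounded(coro, timeout_s: float = TOOL_TIMEOUT_S):
    """Run one tool call with a hard deadline.

    On timeout, return a tool-style error payload the model can reason
    about, instead of hanging the whole conversation on a cold replica.
    """
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"error": "tool_timeout", "timeout_s": timeout_s}

async def slow_tool():
    await asyncio.sleep(0.2)  # stand-in for client.call_tool(name, args)
    return {"status": "ok"}

async def main():
    print(await call_tool_bounded(slow_tool(), timeout_s=1.0))   # completes
    print(await call_tool_bounded(slow_tool(), timeout_s=0.05))  # times out

asyncio.run(main())
```

Feeding the timeout back as a tool result lets the model retry or apologize gracefully rather than leaving the user staring at a spinner.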

Option 2 — AKS (if you already run a cluster)

Use Workload Identity over storing credentials in env vars. Reuse cluster auth and network policies.

# mcp-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: customer-mcp-server
  namespace: mcp-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app: customer-mcp-server
  template:
    metadata:
      labels:
        app: customer-mcp-server
    spec:
      containers:
        - name: mcp-server
          image: <your-acr>.azurecr.io/customer-mcp-server:latest
          ports:
            - containerPort: 8000
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: database-url
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 15
            periodSeconds: 20
---
apiVersion: v1
kind: Service
metadata:
  name: customer-mcp-server
  namespace: mcp-system
spec:
  selector:
    app: customer-mcp-server
  ports:
    - port: 8000
      targetPort: 8000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mcp-ingress
  namespace: mcp-system
spec:
  rules:
    - host: mcp.yourdomain.internal
      http:
        paths:
          - path: /mcp
            pathType: Prefix
            backend:
              service:
                name: customer-mcp-server
                port:
                  number: 8000
kubectl create namespace mcp-system
kubectl apply -f mcp-deployment.yaml
kubectl get pods -n mcp-system

Option 3 — Azure Functions (infrequent, event-driven tools)

Best for tools called rarely — scheduled jobs, webhook-triggered actions, batch processors. Per-invocation billing means zero idle cost. Not suitable for interactive agent loops — cold start breaks conversational flow.

Option 4 — App Service (add /mcp to an existing service)

Mount the MCP server on /mcp alongside existing routes. Zero infrastructure change. Lowest friction path for adding MCP to a service that already exists.


Common Failure Modes

  • Token misconfiguration — auth breaks silently on retry
  • Tool explosion — too many vaguely named tools confuse the model
  • Latency stacking — chained MCP calls compound response time (budget 280–950ms per hop)
  • Business logic duplication — logic living in both the server and the calling agent
  • Weak observability — no tracing across the client–server boundary
  • No ownership boundaries — multiple teams modifying the same server

Critical rule: One MCP server = one owning team. Violating this recreates the exact problem MCP was meant to solve.


What MCP Does NOT Solve

  • Bad tool design
  • Weak or vague prompts
  • Latency at the model level
  • Capability gaps in the underlying LLM
  • Business logic duplication inside your tools

MCP standardises access. It does not improve what you expose.


Migration Path

Don't start with MCP. Migrate to it when the pain appears:

  1. Start with plain tool calling
  2. Identify tools being duplicated across teams or surfaces
  3. Wrap those shared tools in an MCP server
  4. Add auth, deployment, and observability incrementally

Avoid premature standardisation. It's just premature optimisation with a fancier name.




If this saved you from building an MCP server you didn't need — hit clap. It takes one second and tells the algorithm this is worth distributing. If you've already wired MCP into production, drop your setup in the comments — specifically curious what the auth layer looks like on your end.


The Honest Verdict

MCP won't make your architecture simpler. It will make it honest — every tool, resource, and auth scope exactly where it belongs, exposed through a contract any host can consume.

If you're building for one app, one team: skip it. Plain tool calling ships faster and is entirely adequate.

But if you're building something multiple systems or teams will consume — especially if those tools already exist as APIs — MCP ends the "three copies of the same integration, each slightly wrong" problem permanently.

That's the only problem it solves. It solves it very well.




Written by TheProdSDE · Follow for production-grade AI systems, backend architecture, and honest engineering takes.
