Nerav Doshi

Posted on Jun 15 • Edited on Jun 24 • Originally published at pipelineandprompts.com

MCP Server Architecture for Platform Teams — Giving AI Live Access to Your Infrastructure

#platformengineering #kubernetes #devops #aiinthestack

Pipeline & Prompts | Byte size guides on DevOps, Cloud and AI

AI in the Stack #3

⚡ Byte Size Summary

MCP (Model Context Protocol) is the standard that lets AI agents interact with external systems — your cluster, your observability stack, your ticketing system — without bespoke integration code for every tool.

MCP directly addresses AI hallucination and 2AM incident response by grounding AI answers in live system state. It does not capture tribal knowledge on its own; that needs RAG alongside it.

This article covers the production-grade architecture: what MCP servers are, how to design them for platform engineering use cases, and what you need to get right before running them anywhere near production.

In logistics, the hardest problems rarely come from missing data.

They come from disconnected systems.

The warehouse knows one thing. The transportation management system knows another. Inventory systems lag behind reality by hours. Operators work around the gaps manually — copying numbers between screens, making calls to confirm what the system should already know, carrying context in their heads because no single system has the full picture.

I spent years watching intelligent people solve problems that should not have existed, because the systems around them were designed to optimise locally rather than coordinate globally. The data was there. The capability was there. The coordination layer was not.

Modern infrastructure operations feel surprisingly similar.

Your Kubernetes cluster knows the state of every pod. Your observability stack knows the error rates and latency trends. Your ticketing system knows what changes were deployed in the last 24 hours. Your CI/CD pipeline knows what is currently in flight. And your AI assistant — the tool you are increasingly asking to help you reason about incidents — knows none of it, unless you paste it in manually.

Model Context Protocol is the coordination layer that changes this. Not by giving AI access to everything at once, but by giving it a structured, auditable, controlled way to request the context it needs, from the systems that have it, at the moment it needs it.

That is what this article is about.

What MCP Actually Is

Model Context Protocol (MCP) is an open standard, introduced by Anthropic, that defines how AI models communicate with external tools and data sources. Think of it as a common language that sits between an AI assistant and the systems it needs to interact with.

Before MCP, every AI integration was bespoke. You wanted your LLM to query your Kubernetes cluster? Write a custom function. You wanted it to check PagerDuty? Write another one. You wanted it to search your runbooks and open a Jira ticket? Three separate integrations, all maintained independently, all breaking in different ways when APIs change.

MCP replaces that with a standard. An MCP server exposes a set of tools — defined capabilities the AI can invoke — plus resources — data it can read. The AI client (Claude, Cursor, any MCP-compatible host) discovers what tools are available, decides which to call based on the user's question, calls them, and incorporates the results into its response.
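As a rough sketch of what that discover-then-call exchange looks like on the wire: MCP messages are JSON-RPC 2.0, with `tools/list` for discovery and `tools/call` for invocation. The helper function names below are hypothetical, and the tool name and arguments are borrowed from the Kubernetes examples later in this article. In practice a client SDK builds these messages for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def list_tools_request() -> dict:
    """Discovery: ask the server which tools it exposes."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def call_tool_request(name: str, arguments: dict) -> dict:
    """Invocation: call one tool by name with JSON arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


discover = list_tools_request()
invoke = call_tool_request(
    "get_pod_status",
    {"namespace": "payments", "pod_name": "checkout-abc123"},  # illustrative values
)
print(json.dumps(invoke, indent=2))
```

The point of the sketch is that the AI client only ever sends a tool name and arguments. Everything about how that maps onto a real API call stays inside the MCP server.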

The AI does not have direct access to your systems. It has access to an MCP server that mediates that access. That distinction matters enormously for security and governance — which is why this article spends as much time on architecture as on implementation.

Why Platform Engineers Should Care

The RAG pipeline from Article 02 was useful for static knowledge — runbooks, documentation, past incident reports. MCP is useful for live state.

When an engineer asks "what is causing the latency spike in the payments service right now?" — that is not a runbook question. It requires current pod status, recent deployment events, live error rates, and possibly the last three alerts that fired. None of that lives in a document. All of it lives in systems your MCP server can reach.

The distinction between what MCP solves and what it does not matters before you design anything.

AI hallucination: yes, directly. For infrastructure questions, hallucination typically happens when an LLM answers from training data instead of ground truth. MCP gives the AI a way to retrieve live, authoritative state before responding. It does not eliminate hallucination entirely, since an LLM can still skip the tool call or misinterpret what it retrieves, but it directly attacks the root cause for infrastructure questions.

2AM incidents: yes, directly. This is the primary operational use case. Instead of an engineer manually checking five systems in sequence while half-asleep, an AI with MCP access can pull pod status, recent events, and active alerts in a single query and reason across all of it at once. Speed and context at the moment they are hardest to find.

Too many dashboards: partially. MCP does not reduce the number of dashboards in your environment. It gives an AI a way to query across the systems those dashboards represent, so an engineer asks one question instead of navigating five screens. The dashboards still exist. You stop having to drive them manually during an incident.

Tribal knowledge: not alone. MCP surfaces what your systems know. It does not surface what your team knows: the undocumented context that lives in people's heads, the runbook that exists nowhere in any system, the reason a service is named what it is. That is a RAG problem. The combination of RAG (for historical and human knowledge) and MCP (for live system state) is where the tribal knowledge gap actually starts to close. Neither alone is sufficient.

An AI that can read your runbooks and query your cluster simultaneously is a meaningful operational tool. An AI that can only do one of those things is a limited one.

MCP Server Architecture for Platform Engineering

A production-grade MCP server for a platform team has four layers: governance, backend clients, tool definitions, and transport and auth.

Every tool invocation travels this path: the AI client sends a request, the Auth Gateway validates identity before anything reaches your infrastructure, the MCP server processes it through governance and audit controls, and the Kubernetes API Server enforces access policy independently of the application layer. Two enforcement gates — not one. That is the architecture the implementation sections below are built around.

The four layers in code:

Layer 1 — Governance First

Before writing a single tool definition, decide and enforce these three things:

Read-only by default. Every tool that touches production infrastructure should be read-only unless you have explicitly designed the write path with human approval steps. An MCP server that can kubectl delete anything is an incident waiting to happen. Start with read, earn trust, expand deliberately.

Audit logging. Every tool call should be logged with: timestamp, tool name, input parameters, calling session identity, and response status. This is your audit trail when something goes wrong. It is also how you demonstrate to your security team that AI is not a black box.

Rate limiting. An AI in an agentic loop can call tools hundreds of times in seconds. Without rate limiting, a runaway agent can overload your Kubernetes API server, spam your ticketing system, or trigger alert storms in your observability stack. Set per-session and per-tool limits before you deploy.
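A minimal sketch of how the audit and rate-limit controls could fit together, using only the standard library. The names here (`audit_record`, `SlidingWindowLimiter`, `guarded_call`) and the limits are assumptions for illustration, not the article's `audit` module; in a real server the session identity would come from the auth layer.

```python
import json
import logging
import time
from collections import defaultdict, deque
from typing import Optional

audit_logger = logging.getLogger("mcp.audit")


def audit_record(tool: str, inputs: dict, session_id: str, status: str) -> str:
    """Build one JSON audit line with the fields listed above."""
    return json.dumps({
        "timestamp": time.time(),
        "tool": tool,
        "inputs": inputs,
        "session": session_id,
        "status": status,
    })


class SlidingWindowLimiter:
    """Allow at most max_calls per (session, tool) within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # (session, tool) -> call timestamps

    def allow(self, session_id: str, tool: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        calls = self._calls[(session_id, tool)]
        # Drop timestamps that have aged out of the window
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_calls:
            return False
        calls.append(now)
        return True


limiter = SlidingWindowLimiter(max_calls=3, window_seconds=10)  # illustrative limits


def guarded_call(session_id: str, tool: str, inputs: dict) -> str:
    """Rate-limit, then audit, and return the status that was logged."""
    status = "allowed" if limiter.allow(session_id, tool) else "rate_limited"
    audit_logger.info(audit_record(tool, inputs, session_id, status))
    return status
```

Rejected calls are audited too. A burst of `rate_limited` entries from one session is often the first visible sign of a runaway agent loop.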

Layer 2 — Backend Clients

The MCP server needs clients for each system it connects to. Keep these thin — their job is to call APIs and return structured data, not to contain business logic.

For a Kubernetes-connected MCP server, using the official kubernetes Python client:

# k8s_client.py
from kubernetes import client, config
from typing import Optional

class KubernetesClient:
    def __init__(self, in_cluster: bool = False):
        if in_cluster:
            config.load_incluster_config()
        else:
            config.load_kube_config()
        self.v1 = client.CoreV1Api()
        self.apps_v1 = client.AppsV1Api()

    def get_pod_status(self, namespace: str, pod_name: str) -> dict:
        pod = self.v1.read_namespaced_pod(name=pod_name, namespace=namespace)
        return {
            "name": pod.metadata.name,
            "namespace": pod.metadata.namespace,
            "phase": pod.status.phase,
            "conditions": [
                {"type": c.type, "status": c.status, "reason": c.reason}
                for c in (pod.status.conditions or [])
            ],
            "container_statuses": [
                {
                    "name": cs.name,
                    "ready": cs.ready,
                    "restart_count": cs.restart_count,
                    "state": str(cs.state)
                }
                for cs in (pod.status.container_statuses or [])
            ]
        }

    def list_failing_pods(self, namespace: Optional[str] = None) -> list[dict]:
        if namespace:
            pods = self.v1.list_namespaced_pod(namespace=namespace)
        else:
            pods = self.v1.list_pod_for_all_namespaces()

        failing = []
        for pod in pods.items:
            if pod.status.phase not in ("Running", "Succeeded"):
                failing.append({
                    "name": pod.metadata.name,
                    "namespace": pod.metadata.namespace,
                    "phase": pod.status.phase,
                    "reason": pod.status.reason
                })
        return failing

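    def get_recent_events(self, namespace: str, limit: int = 20) -> list[dict]:
        # The API's own `limit` does not order by recency, so list the
        # namespace's events, sort them, and slice afterwards. For very busy
        # namespaces, consider paginating or filtering with a field selector.
        events = self.v1.list_namespaced_event(namespace=namespace)

        def event_time(e):
            # last_timestamp can be None; fall back to other timestamps so the
            # sort never compares None (or a string) against a datetime
            return e.last_timestamp or e.event_time or e.metadata.creation_timestamp

        newest_first = sorted(events.items, key=event_time, reverse=True)
        return [
            {
                "type": e.type,
                "reason": e.reason,
                "message": e.message,
                "involved_object": e.involved_object.name,
                "count": e.count,
                "last_timestamp": str(event_time(e))
            }
            for e in newest_first[:limit]
        ]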
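Phase alone can understate trouble: a pod whose container is crash-looping usually still reports phase Running. Below is a hedged sketch of a complementary check that works on plain dicts. The `waiting_reason` field, the `unhealthy_reason` helper, and the restart threshold are assumptions for illustration; the client above returns container state as a string, so it would need to expose the waiting reason separately for this to apply.

```python
from typing import Optional

# Waiting reasons that usually mean a container is stuck, not just starting
STUCK_REASONS = {"CrashLoopBackOff", "ImagePullBackOff", "ErrImagePull"}


def unhealthy_reason(pod: dict, max_restarts: int = 5) -> Optional[str]:
    """Return a short reason if the pod looks unhealthy, else None."""
    if pod["phase"] not in ("Running", "Succeeded"):
        return f"phase={pod['phase']}"
    for cs in pod.get("container_statuses", []):
        reason = cs.get("waiting_reason")
        if reason in STUCK_REASONS:
            return f"{cs['name']}: {reason}"
        if cs.get("restart_count", 0) > max_restarts:
            return f"{cs['name']}: {cs['restart_count']} restarts"
    return None


# Illustrative input: phase says Running, but the container is crash-looping
pod = {
    "name": "checkout-7d9f",
    "phase": "Running",
    "container_statuses": [
        {"name": "app", "restart_count": 2, "waiting_reason": "CrashLoopBackOff"}
    ],
}
print(unhealthy_reason(pod))
```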

Layer 3 — Tool Definitions

This is the layer the AI interacts with directly. Tool descriptions are not just documentation — they are what the LLM reads to decide whether to call the tool and how to format its inputs. Write them precisely.

# tools.py
from mcp.server import Server
from mcp.types import Tool, TextContent
import json
import logging

from k8s_client import KubernetesClient
from audit import log_tool_call

logger = logging.getLogger(__name__)
k8s = KubernetesClient(in_cluster=False)  # Set True when running inside the cluster


def register_tools(server: Server):

    @server.list_tools()
    async def list_tools():
        return [
            Tool(
                name="get_pod_status",
                description=(
                    "Get the current status of a specific Kubernetes pod, including phase, "
                    "readiness conditions, container states, and restart counts. "
                    "Use this when investigating why a specific pod is unhealthy or not ready."
                ),
                inputSchema={
                    "type": "object",
                    "properties": {
                        "namespace": {
                            "type": "string",
                            "description": "The Kubernetes namespace the pod is in"
                        },
                        "pod_name": {
                            "type": "string",
                            "description": "The exact name of the pod"
                        }
                    },
                    "required": ["namespace", "pod_name"]
                }
            ),
            Tool(
                name="list_failing_pods",
                description=(
                    "List all pods that are not in Running or Succeeded state across the cluster "
                    "or within a specific namespace. Use this as a first step when an incident "
                    "is reported and you need to identify which pods are affected."
                ),
                inputSchema={
                    "type": "object",
                    "properties": {
                        "namespace": {
                            "type": "string",
                            "description": "Optional: filter to a specific namespace"
                        }
                    }
                }
            ),
            Tool(
                name="get_recent_events",
                description=(
                    "Retrieve recent Kubernetes events for a namespace, ordered by most recent first. "
                    "Events capture warnings, errors, and state changes. Use this to understand "
                    "what happened in the cluster leading up to an issue."
                ),
                inputSchema={
                    "type": "object",
                    "properties": {
                        "namespace": {
                            "type": "string",
                            "description": "The namespace to retrieve events from"
                        },
                        "limit": {
                            "type": "integer",
                            "description": "Maximum number of events to return (default 20)",
                            "default": 20
                        }
                    },
                    "required": ["namespace"]
                }
            )
        ]

    @server.call_tool()
    async def call_tool(name: str, arguments: dict):
        log_tool_call(tool=name, inputs=arguments)  # Always audit first

        try:
            if name == "get_pod_status":
                result = k8s.get_pod_status(
                    namespace=arguments["namespace"],
                    pod_name=arguments["pod_name"]
                )
            elif name == "list_failing_pods":
                result = k8s.list_failing_pods(
                    namespace=arguments.get("namespace")
                )
            elif name == "get_recent_events":
                result = k8s.get_recent_events(
                    namespace=arguments["namespace"],
                    limit=arguments.get("limit", 20)
                )
            else:
                return [TextContent(type="text", text=f"Unknown tool: {name}")]

            return [TextContent(type="text", text=json.dumps(result, indent=2))]

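        except Exception as e:
            # logger.exception keeps the full traceback for operators;
            # the AI client only receives the short error message
            logger.exception("Tool %s failed", name)
            return [TextContent(type="text", text=f"Tool execution failed: {str(e)}")]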
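Because the LLM generates the arguments, it can help to check them against the tool's input schema before dispatching. The sketch below is a deliberately small validator covering only `required` and basic `type` checks. The `validate_arguments` helper is hypothetical, and a full JSON Schema library would be the more complete option.

```python
# Maps the JSON Schema type names used in the tool definitions to Python types
_JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "object": dict, "array": list}

# Copy of the get_recent_events input schema, trimmed to what the check needs
GET_RECENT_EVENTS_SCHEMA = {
    "type": "object",
    "properties": {
        "namespace": {"type": "string"},
        "limit": {"type": "integer", "default": 20},
    },
    "required": ["namespace"],
}


def validate_arguments(schema: dict, arguments: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    properties = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in arguments:
            errors.append(f"missing required argument: {field}")
    for field, value in arguments.items():
        if field not in properties:
            errors.append(f"unexpected argument: {field}")
            continue
        type_name = properties[field].get("type")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # no type declared, or one this sketch does not handle
        # bool is a subclass of int in Python, so reject it explicitly for integers
        wrong_bool = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or wrong_bool:
            errors.append(f"{field} must be of type {type_name}")
    return errors


print(validate_arguments(GET_RECENT_EVENTS_SCHEMA, {"limit": "5"}))
```

Returning the error list to the model as the tool result gives it a chance to correct the call, which is usually more useful than a bare `KeyError` message.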

Layer 4 — Transport and Auth

MCP supports two transport modes:

stdio: the server runs as a subprocess of the AI client. Simple, local, no network exposure. Right for developer workstations and local tooling.

HTTP with SSE (Server-Sent Events): the server runs as a persistent service, reachable over the network. Required for shared team tooling, remote access, and running inside a cluster. For production deployments, SSE transport with mutual TLS (mTLS) is the hardened path; API key authentication is acceptable for internal cluster traffic with network policy controls in place. Newer revisions of the MCP specification replace this transport with Streamable HTTP, so check which one your SDK version and clients expect before standardising on it.

For a platform team MCP server running on Kubernetes:

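# main.py
import hmac
import logging
import os

import uvicorn  # assumed to be installed alongside the MCP SDK
from mcp.server import Server
from mcp.server.sse import SseServerTransport
from starlette.applications import Starlette
from starlette.middleware import Middleware
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse, Response
from starlette.routing import Mount, Route

from tools import register_tools

logging.basicConfig(level=logging.INFO)

# Injected from the platform-mcp-secrets Secret by the Deployment; never hardcode it
EXPECTED_API_KEY = os.environ["MCP_API_KEY"]

server = Server("platform-mcp")
register_tools(server)


class APIKeyMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        api_key = request.headers.get("X-API-Key", "")
        # Constant-time comparison avoids leaking key contents through timing
        if not hmac.compare_digest(api_key.encode(), EXPECTED_API_KEY.encode()):
            return JSONResponse({"error": "Unauthorised"}, status_code=401)
        return await call_next(request)


# Clients open the event stream on /sse and POST their messages to /messages/
transport = SseServerTransport("/messages/")


async def handle_sse(request):
    async with transport.connect_sse(
        request.scope, request.receive, request._send
    ) as streams:
        await server.run(
            streams[0], streams[1], server.create_initialization_options()
        )
    return Response()


app = Starlette(
    routes=[
        Route("/sse", endpoint=handle_sse),
        # Without this mount, clients have nowhere to POST their messages
        Mount("/messages/", app=transport.handle_post_message),
    ],
    middleware=[Middleware(APIKeyMiddleware)],
)

if __name__ == "__main__":
    # Port matches containerPort in the Deployment
    uvicorn.run(app, host="0.0.0.0", port=8080)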

Kubernetes Deployment

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: platform-mcp-server
  namespace: platform-tools
spec:
  replicas: 1
  selector:
    matchLabels:
      app: platform-mcp-server
  template:
    metadata:
      labels:
        app: platform-mcp-server
    spec:
      serviceAccountName: platform-mcp-sa  # Read-only SA — see RBAC below
      containers:
        - name: mcp-server
          image: your-registry/platform-mcp:latest
          ports:
            - containerPort: 8080
          env:
            - name: MCP_API_KEY
              valueFrom:
                secretKeyRef:
                  name: platform-mcp-secrets
                  key: api-key
---
# k8s/rbac.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: platform-mcp-reader
rules:
  - apiGroups: [""]
    resources: ["pods", "events", "namespaces", "nodes"]
    verbs: ["get", "list", "watch"]   # Read-only — no create, update, delete
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: platform-mcp-reader-binding
subjects:
  - kind: ServiceAccount
    name: platform-mcp-sa
    namespace: platform-tools
roleRef:
  kind: ClusterRole
  name: platform-mcp-reader
  apiGroup: rbac.authorization.k8s.io

The RBAC configuration enforces the governance constraint at the Kubernetes level — not just in application code. Even if a bug in the tool definitions allowed a write operation to reach the Kubernetes client, the service account has no permission to execute it.

Defence in depth. Not one gate — two.
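One way to keep that second gate from eroding over time is a small check in CI that fails if the role ever gains a non-read verb. The sketch below is an assumption about how such a check might look: the `write_verbs` helper is hypothetical, and the rules are written as Python dicts mirroring the ClusterRole above, where in practice you would load them from the YAML file.

```python
READ_ONLY_VERBS = {"get", "list", "watch"}

# Mirrors the rules of the platform-mcp-reader ClusterRole
RULES = [
    {
        "apiGroups": [""],
        "resources": ["pods", "events", "namespaces", "nodes"],
        "verbs": ["get", "list", "watch"],
    },
    {
        "apiGroups": ["apps"],
        "resources": ["deployments", "replicasets"],
        "verbs": ["get", "list", "watch"],
    },
]


def write_verbs(rules: list[dict]) -> list[tuple[str, str]]:
    """Return (resource, verb) pairs that grant more than read access."""
    violations = []
    for rule in rules:
        for verb in rule.get("verbs", []):
            if verb not in READ_ONLY_VERBS:  # also catches the "*" wildcard
                for resource in rule.get("resources", []):
                    violations.append((resource, verb))
    return violations


violations = write_verbs(RULES)
if violations:
    raise SystemExit(f"ClusterRole is not read-only: {violations}")
print("ClusterRole grants read-only access")
```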

What This Unlocks

With a platform MCP server running, a Claude-powered assistant can handle questions like these using live cluster data:

"What pods are failing in the payments namespace right now?" → calls list_failing_pods
"Why did the checkout service restart three times this morning?" → calls get_pod_status + get_recent_events
"Is there anything unusual happening across the cluster before I deploy?" → calls list_failing_pods across all namespaces

This is the coordination layer the opening story was pointing at. In logistics, the fix for disconnected systems was never better dashboards — it was a shared integration layer that let every system speak to every other system through a common protocol. MCP is that layer for AI and infrastructure.

Combined with the RAG pipeline from Article 02, the same assistant can cross-reference live cluster state against your runbooks — returning answers grounded in documentation and informed by current reality simultaneously. That is the operational use case MCP was built for.

What to Build Next

The server in this article covers Kubernetes read operations. The natural extensions, covered in the GitHub repo, are:

Prometheus integration — add a get_metrics tool that queries PromQL (Prometheus Query Language) and returns current error rates and latency percentiles
PagerDuty integration — add get_active_incidents and get_recent_alerts tools
Write operations with human approval — a restart_pod tool that creates a Jira ticket and waits for human sign-off before executing; this is the governance pattern that makes agentic write operations safe in production

The write operation pattern — where the AI prepares an action, a human approves it, and the MCP server executes — is covered in Article 05 of this series.
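To make the shape of that pattern concrete ahead of Article 05, here is a minimal sketch of an approval gate. It is an illustration under assumptions, not the series' implementation: the `ProposedAction` class and its states are hypothetical, and the ticketing step is reduced to an `approve` call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    EXECUTED = "executed"


@dataclass
class ProposedAction:
    """An action the AI has prepared but may not run on its own."""
    description: str
    execute_fn: Callable[[], str]
    status: Status = Status.PENDING
    approver: str = ""

    def approve(self, approver: str) -> None:
        if self.status is not Status.PENDING:
            raise RuntimeError(f"cannot approve an action in state {self.status.value}")
        self.status = Status.APPROVED
        self.approver = approver

    def execute(self) -> str:
        # Only an approved action runs; executed actions cannot be replayed
        if self.status is not Status.APPROVED:
            raise PermissionError("action has not been approved by a human")
        result = self.execute_fn()
        self.status = Status.EXECUTED
        return result


# Illustrative usage: the lambda stands in for the real restart call
action = ProposedAction("restart pod checkout-7d9f", lambda: "restart requested")
action.approve("on-call engineer")
print(action.execute())
```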

What's Next

Article 04 — Prompt Versioning in Production: Treat Prompts Like Infrastructure Artifacts

System prompts are configuration. Changing them without version control, testing, or rollback strategy is the same mistake engineers made with infrastructure before Terraform existed. Next: how to version, test, and deploy prompts with the same discipline you apply to everything else in your stack.
