DEV Community

joshyfruit

From IT Manager to AI Engineer: Build a Cloud Infrastructure Agent with Cloud Run's Managed MCP Server

Google Cloud NEXT '26 Challenge Submission

Google Cloud NEXT '26 dropped over 260 announcements. Most headlines went to Gemini 3.1, TPU v8, and the Agentic Data Cloud. But buried in the Cloud Run section was something that made me stop scrolling — a fully managed remote MCP server, now generally available. If you manage infrastructure AND build AI systems, this one's for you.


Why This Hit Different for Me

I wear two hats. One day I'm SSH-ing into VMs, reviewing Cloud Run deployments, and making sure services don't fall over at 2am. The next I'm wiring up LLM agents, building tool pipelines, and figuring out why my context window blew up. These two worlds have always felt weirdly disconnected.

MCP (Model Context Protocol) on Cloud Run is the first thing I've seen that genuinely bridges them. Instead of hand-crafting API clients for every infra operation your agent needs to do, you point it at a managed MCP server — and suddenly your AI agent can deploy services, read logs, and inspect health metrics like a junior SRE who never sleeps.

Let's build it.


What We're Building

By the end of this walkthrough you'll have:

  1. The built-in Cloud Run MCP server wired up to Gemini CLI so you can manage deployments via natural language
  2. A custom MCP server running on Cloud Run that exposes infrastructure health tools
  3. An ADK agent that combines both to answer questions like "Which of my services had errors in the last hour?"

Here's the full picture of what we're assembling:

Architecture diagram showing the Cloud Run Managed MCP Server setup — Gemini CLI and ADK Agent connect through Model Armor and Cloud IAM to both the Google Managed MCP Server (run.googleapis.com/mcp) and a custom infra-health-mcp service on Cloud Run, which reads from Cloud Logging and Artifact Registry


Prerequisites

  • A Google Cloud project with billing enabled
  • gcloud CLI installed and authenticated
  • Python 3.10+
  • Docker (for building the custom server)
  • Gemini CLI installed

Set your project up front so every command just works:

export PROJECT_ID="my-project-id"
export REGION="us-central1"
gcloud config set project $PROJECT_ID

IAM roles you'll need on your account:

  • roles/run.admin
  • roles/iam.serviceAccountUser
  • roles/artifactregistry.writer

Part 1 — Use the Built-in Cloud Run MCP Server

Google now hosts a fully managed MCP server at https://run.googleapis.com/mcp. It exposes tools like list_services, get_service, deploy_service_from_image, and deploy_service_from_archive — no setup required on your end.

Step 1: Authenticate

The managed endpoint uses your Google Cloud identity. Make sure your ADC (Application Default Credentials) are set:

gcloud auth application-default login

Step 2: Wire it to Gemini CLI

Open (or create) ~/.gemini/settings.json and add:

{
  "mcpServers": {
    "cloud-run": {
      "httpUrl": "https://run.googleapis.com/mcp"
    }
  }
}

Step 3: Talk to Your Infrastructure

Fire up Gemini CLI and try this:

gemini

> List all my Cloud Run services in us-central1

You'll see it call list_services under the hood and return a clean summary of every service, its URL, and status. No gcloud run services list --region us-central1 --format=json | jq ... gymnastics required.

Try something bolder:

> Deploy the image us-docker.pkg.dev/cloudrun/container/hello to a new service
  called "hello-from-agent" in us-central1

It calls deploy_service_from_image, fills in the parameters, and your service is live. That's infrastructure-as-conversation, and honestly it feels a little magical the first time.

Here's what a full agent session looks like — listing services, spotting errors, and triggering a hotfix deploy all from one prompt chain:

Terminal screenshot showing a Gemini CLI session: the agent calls infra-health-mcp to list services, checks error rates in parallel across three services, surfaces a 5xx spike on api-gateway, then deploys a hotfix image via the Cloud Run managed MCP server

IT Specialist note: The managed endpoint enforces Cloud IAM on every call. If your credentials don't have run.services.create, the deploy fails cleanly with a permission error — not a hallucinated success. That's the kind of guardrail you need when agents touch production infra.
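That same principle is worth copying into your own tools: surface permission failures as structured errors the model can reason about, never as fabricated success. A minimal sketch — guard_tool is my own hypothetical helper, not part of any SDK, and PermissionError stands in here for google.api_core's PermissionDenied:

```python
import json

# Wrap a tool body so IAM-style failures come back as explicit, structured
# errors instead of crashing, or worse, letting the model guess at a result.
# guard_tool is a hypothetical helper, not from any SDK.
def guard_tool(fn):
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except PermissionError as exc:  # stand-in for google.api_core PermissionDenied
            return json.dumps({"error": "permission_denied", "detail": str(exc)})
    return wrapper


@guard_tool
def deploy(service_name: str) -> str:
    # Simulate the caller lacking run.services.create
    raise PermissionError("caller lacks run.services.create")


print(deploy("hello-from-agent"))
```

The agent sees a clean, parseable failure and can tell you it lacks permission rather than inventing a deployment URL.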


Part 2 — Build & Deploy Your Own Custom MCP Server

The built-in server covers Cloud Run operations. But what about your custom health checks, log analysis, or cross-service diagnostics? That's where you roll your own.

We'll build an Infra Health MCP Server with three tools:

  • list_services — wraps Cloud Run's Admin API
  • get_service_error_rate — queries Cloud Logging for 5xx errors
  • check_service_health — returns a simple green/yellow/red status

Step 1: Create the Project

mkdir infra-health-mcp && cd infra-health-mcp

Create pyproject.toml:

[project]
name = "infra-health-mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
    "fastmcp>=2.0.0",
    "google-cloud-run>=0.10.0",
    "google-cloud-logging>=3.0.0",
]

Step 2: Write the MCP Server

Create server.py:

import asyncio
import json
import logging
import os
from datetime import datetime, timedelta, timezone

from fastmcp import FastMCP
from google.cloud import run_v2, logging as cloud_logging

logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s]: %(message)s", level=logging.INFO)

mcp = FastMCP("Infra Health MCP Server")
run_client = run_v2.ServicesClient()
log_client = cloud_logging.Client()


@mcp.tool()
def list_services(project_id: str, region: str) -> str:
    """List all Cloud Run services with their status and URLs.

    Args:
        project_id: Google Cloud project ID
        region: GCP region (e.g. us-central1)

    Returns:
        JSON list of services with name, URL, and last deployment time
    """
    logger.info(f"Listing services in {project_id}/{region}")
    parent = f"projects/{project_id}/locations/{region}"
    services = []
    for svc in run_client.list_services(parent=parent):
        services.append({
            "name": svc.name.split("/")[-1],
            "uri": svc.uri,
            "last_deployed": svc.update_time.isoformat() if svc.update_time else "unknown",
            "ready": svc.terminal_condition.state.name if svc.terminal_condition else "unknown",
        })
    return json.dumps(services, indent=2)


@mcp.tool()
def get_service_error_rate(project_id: str, region: str, service_name: str, minutes: int = 60) -> str:
    """Get the 5xx error count for a Cloud Run service over a time window.

    Args:
        project_id: Google Cloud project ID
        region: GCP region
        service_name: Name of the Cloud Run service
        minutes: How many minutes back to look (default 60)

    Returns:
        JSON with the 5xx error count over the requested window
    """
    logger.info(f"Checking error rate for {service_name} over last {minutes} minutes")
    since = datetime.now(timezone.utc) - timedelta(minutes=minutes)

    filter_str = (
        f'resource.type="cloud_run_revision" '
        f'resource.labels.service_name="{service_name}" '
        f'resource.labels.location="{region}" '
        f'httpRequest.status>=500 '
        f'timestamp>="{since.isoformat()}"'
    )

    error_count = sum(1 for _ in log_client.list_entries(
        resource_names=[f"projects/{project_id}"],
        filter_=filter_str,
    ))

    return json.dumps({
        "service": service_name,
        "window_minutes": minutes,
        "error_count_5xx": error_count,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    })


@mcp.tool()
def check_service_health(project_id: str, region: str, service_name: str) -> str:
    """Return a simple health status for a Cloud Run service.

    Args:
        project_id: Google Cloud project ID
        region: GCP region
        service_name: Name of the Cloud Run service

    Returns:
        JSON with status (green/yellow/red) and a human-readable reason
    """
    logger.info(f"Health check for {service_name}")
    name = f"projects/{project_id}/locations/{region}/services/{service_name}"
    svc = run_client.get_service(name=name)

    state = svc.terminal_condition.state.name if svc.terminal_condition else "UNKNOWN"

    if state == "CONDITION_SUCCEEDED":
        status, reason = "green", "Service is running and healthy"
    elif state in ("CONDITION_FAILED", "CONTAINER_FAILED"):
        status, reason = "red", f"Service is in a failed state: {state}"
    else:
        status, reason = "yellow", f"Service state is uncertain: {state}"

    return json.dumps({"service": service_name, "status": status, "reason": reason})


if __name__ == "__main__":
    port = int(os.getenv("PORT", 8080))
    logger.info(f"Infra Health MCP server starting on port {port}")
    asyncio.run(
        mcp.run_async(
            transport="streamable-http",
            host="0.0.0.0",
            port=port,
        )
    )

Why Streamable HTTP? Cloud Run is stateless and scales horizontally. The older SSE transport needed persistent connections — a terrible fit for serverless. Streamable HTTP uses plain POST/GET, so every request is independent. Your MCP server scales to zero between calls and you only pay when it's actually doing work.
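To make that concrete, here is roughly what one stateless exchange looks like on the wire: a single self-contained JSON-RPC message POSTed to the server, with nothing held between calls. The make_tool_call helper below is illustrative, not an MCP SDK function:

```python
import json

# Sketch: one Streamable HTTP round trip is one self-contained JSON-RPC POST
# body, so the server needs no per-connection state between calls.
# make_tool_call is a hypothetical helper, not part of any MCP SDK.
def make_tool_call(call_id: int, tool: str, arguments: dict) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": call_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


body = make_tool_call(1, "check_service_health", {
    "project_id": "my-project-id",
    "region": "us-central1",
    "service_name": "api-gateway",
})
print(body)
```

Any instance of your Cloud Run service can answer any of these requests, which is exactly why scale-to-zero works.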

Step 3: Containerize It

Create Dockerfile:

FROM python:3.13-slim

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

COPY . /app
WORKDIR /app

ENV PYTHONUNBUFFERED=1

RUN uv sync

# Cloud Run ignores EXPOSE and injects PORT at runtime; document the default
EXPOSE 8080

CMD ["uv", "run", "server.py"]

Step 4: Create a Service Account

Your MCP server needs permission to read Cloud Run and Cloud Logging:

gcloud iam service-accounts create infra-health-sa \
  --display-name="Infra Health MCP Server"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/run.viewer"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role="roles/logging.viewer"

Step 5: Build & Deploy

# Create Artifact Registry repo
gcloud artifacts repositories create mcp-servers \
  --repository-format=docker \
  --location=$REGION

# Build and push
gcloud builds submit \
  --tag "${REGION}-docker.pkg.dev/${PROJECT_ID}/mcp-servers/infra-health:latest"

# Deploy
gcloud run deploy infra-health-mcp \
  --image "${REGION}-docker.pkg.dev/${PROJECT_ID}/mcp-servers/infra-health:latest" \
  --region=$REGION \
  --no-allow-unauthenticated \
  --memory=512Mi \
  --cpu=1 \
  --concurrency=80 \
  --timeout=120 \
  --service-account="infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com"

Cloud Run gives you a URL like https://infra-health-mcp-<hash>-uc.a.run.app. Grab it:

export MCP_URL=$(gcloud run services describe infra-health-mcp \
  --region=$REGION \
  --format='value(status.url)')

Step 6: Test It Locally via the Cloud Run Proxy

Don't expose your MCP server to the internet directly. Use the proxy to test with your local credentials:

gcloud run services proxy infra-health-mcp --region=$REGION --port=3000

Now hit it at http://localhost:3000 — your credentials are injected automatically, no token management needed.


Part 3 — Wire Both Servers into an ADK Agent

Now the fun part. We'll build an agent that uses both MCP servers — the built-in Cloud Run one and your custom infra health server — to answer infrastructure questions like a seasoned SRE.

Install ADK

pip install google-adk

Create the Agent

Create agent.py:

import asyncio
import os

from google.adk.agents import LlmAgent
from google.adk.runners import InMemoryRunner
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPConnectionParams
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.genai import types

MCP_URL = os.environ["MCP_URL"]  # your infra-health-mcp URL

# Connect to both MCP servers
cloud_run_tools = MCPToolset(
    connection_params=StreamableHTTPConnectionParams(
        url="https://run.googleapis.com/mcp"
    )
)
infra_health_tools = MCPToolset(
    connection_params=StreamableHTTPConnectionParams(url=MCP_URL)
)

agent = LlmAgent(
    name="infra_agent",  # must be a valid identifier, so no hyphens
    model="gemini-2.0-flash",
    instruction=(
        "You are an infrastructure operations assistant. "
        "You have access to Cloud Run management tools and infrastructure health tools. "
        "When asked about service health or errors, always check both the service status "
        "and recent error rates before answering. Be concise and actionable."
    ),
    tools=[cloud_run_tools, infra_health_tools],
)


async def main():
    runner = InMemoryRunner(agent=agent)
    session = await runner.session_service.create_session(
        app_name=runner.app_name, user_id="demo-user"
    )

    # Example queries — swap these for interactive input
    queries = [
        "List all my Cloud Run services in us-central1",
        "Which services had 5xx errors in the last hour?",
        "Give me a health summary for all services",
    ]

    for query in queries:
        print(f"\n>>> {query}")
        message = types.Content(role="user", parts=[types.Part(text=query)])
        async for event in runner.run_async(
            user_id="demo-user", session_id=session.id, new_message=message
        ):
            if event.is_final_response() and event.content and event.content.parts:
                print(event.content.parts[0].text)


if __name__ == "__main__":
    asyncio.run(main())

Run it:

MCP_URL=$MCP_URL python agent.py

You'll see the agent autonomously call list_services, then loop over each service calling get_service_error_rate and check_service_health — building a full infra health picture without you writing a single orchestration loop.

This is the moment it clicks. You didn't write "for service in services: check health". The agent reasoned its way to that pattern. Your job was defining the tools. That's a genuine shift in how we build infra tooling.
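For contrast, here is the orchestration loop you'd otherwise write and maintain by hand. The call_tool argument is a hypothetical stand-in for whatever MCP client plumbing you'd wire up yourself; the agent derives this exact pattern from the tool descriptions alone:

```python
import json


# The hand-rolled version of what the agent figures out on its own.
# call_tool is a hypothetical callable: (tool_name, **kwargs) -> JSON string.
def health_report(call_tool, project_id: str, region: str) -> list[dict]:
    services = json.loads(call_tool("list_services",
                                    project_id=project_id, region=region))
    report = []
    for svc in services:
        health = json.loads(call_tool("check_service_health",
                                      project_id=project_id, region=region,
                                      service_name=svc["name"]))
        errors = json.loads(call_tool("get_service_error_rate",
                                      project_id=project_id, region=region,
                                      service_name=svc["name"]))
        report.append({
            "service": svc["name"],
            "status": health["status"],
            "errors_5xx": errors["error_count_5xx"],
        })
    return report
```

Every line of that loop is code you no longer own, test, or debug.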


Security: Don't Skip This Section

Agents with infrastructure permissions need real guardrails. Here's what I'd put in place before letting this near production:

1. Scope service account permissions tightly. The infra-health-sa has read-only roles. If you want an agent that can also deploy, create a separate service account for write operations and require explicit approval flows before those tools fire.

2. Use IAM deny policies for the write MCP tools. You can explicitly deny run.services.create on specific service accounts at the project level — useful if you only want agents to have deploy access in staging, not prod.
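Deny policies are plain JSON attached with gcloud iam policies create. A sketch of what that might look like; the permission string and principal format below reflect my understanding of the deny-policy schema, so verify them against the current docs before relying on this:

```json
{
  "displayName": "Block agent-initiated deploys in prod",
  "rules": [
    {
      "denyRule": {
        "deniedPrincipals": [
          "principal://iam.googleapis.com/projects/-/serviceAccounts/infra-health-sa@my-project-id.iam.gserviceaccount.com"
        ],
        "deniedPermissions": [
          "run.googleapis.com/services.create"
        ]
      }
    }
  ]
}
```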

3. Enable Model Armor. Google's Model Armor sits in front of MCP calls and blocks prompt injection attempts, malicious URIs, and unsafe content before they reach your tools. Enable it in the Google Cloud console under AI Safety.

4. Cloud Audit Logs are your friend. Every MCP tool call made through Google-managed servers is logged automatically. Set up a log-based alert for any deploy_service_from_image calls from service accounts that shouldn't be deploying.

# Example: alert on unexpected deploys
gcloud logging metrics create unexpected-agent-deploy \
  --description="MCP deploy calls from unexpected accounts" \
  --log-filter='protoPayload.methodName="google.cloud.run.v2.Services.CreateService"'

My Honest Take

What impressed me most at NEXT '26 isn't any single feature — it's that Google is treating MCP as a first-class citizen across the entire platform. BigQuery has a managed MCP server. Cloud Logging has one. Cloud SQL is getting one. This is becoming the standard interface layer between AI agents and cloud services.

For IT specialists and infrastructure engineers, this is actually exciting rather than threatening. The tedious parts of infra ops — writing one-off scripts to list resources, cross-referencing logs with deployment times, checking health across 20 services — are exactly what agents are good at. You shift from doing the repetitive tasks to designing the tools that do them.

The rough edges? Auth setup for remote MCP servers is still fiddly, especially in multi-project setups. The ADK toolset documentation is still catching up to the pace of announcements. And "fully managed" doesn't yet mean "zero config" — you still need to wire up IAM carefully.

But the direction is clear, and the foundation is solid. The infra engineer who learns to build good MCP servers is going to be unreasonably productive over the next few years.


What's Next

Drop a comment if you build something cool with this. I'm especially curious what domain-specific MCP servers people come up with.


Built and tested as part of the Google Cloud NEXT '26 Writing Challenge. All code examples use placeholder project IDs — swap in your own before running.
