Google Cloud NEXT '26 dropped over 260 announcements. Most headlines went to Gemini 3.1, TPU v8, and the Agentic Data Cloud. But buried in the Cloud Run section was something that made me stop scrolling — a fully managed remote MCP server, now generally available. If you manage infrastructure AND build AI systems, this one's for you.
Why This Hit Different for Me
I wear two hats. One day I'm SSH-ing into VMs, reviewing Cloud Run deployments, and making sure services don't fall over at 2am. The next I'm wiring up LLM agents, building tool pipelines, and figuring out why my context window blew up. These two worlds have always felt weirdly disconnected.
MCP (Model Context Protocol) on Cloud Run is the first thing I've seen that genuinely bridges them. Instead of hand-crafting API clients for every infra operation your agent needs to do, you point it at a managed MCP server — and suddenly your AI agent can deploy services, read logs, and inspect health metrics like a junior SRE who never sleeps.
Let's build it.
What We're Building
By the end of this walkthrough you'll have:
- The built-in Cloud Run MCP server wired up to Gemini CLI so you can manage deployments via natural language
- A custom MCP server running on Cloud Run that exposes infrastructure health tools
- An ADK agent that combines both to answer questions like "Which of my services had errors in the last hour?"
Here's the full picture of what we're assembling: the built-in server handles deployments, the custom server handles diagnostics, and an ADK agent sits on top of both.
Prerequisites
- A Google Cloud project with billing enabled
- gcloud CLI installed and authenticated
- Python 3.10+
- Docker (for building the custom server)
- Gemini CLI installed
Set your project up front so every command just works:
export PROJECT_ID="my-project-id"
export REGION="us-central1"
gcloud config set project $PROJECT_ID
IAM roles you'll need on your account:
- roles/run.admin
- roles/iam.serviceAccountUser
- roles/artifactregistry.writer
Part 1 — Use the Built-in Cloud Run MCP Server
Google now hosts a fully managed MCP server at https://run.googleapis.com/mcp. It exposes tools like list_services, get_service, deploy_service_from_image, and deploy_service_from_archive — no setup required on your end.
Step 1: Authenticate
The managed endpoint uses your Google Cloud identity. Make sure your ADC (Application Default Credentials) are set:
gcloud auth application-default login
Step 2: Wire it to Gemini CLI
Open (or create) ~/.gemini/settings.json and add:
{
"mcpServers": {
"cloud-run": {
"url": "https://run.googleapis.com/mcp",
"transport": "http"
}
}
}
Step 3: Talk to Your Infrastructure
Fire up Gemini CLI and try this:
gemini
> List all my Cloud Run services in us-central1
You'll see it call list_services under the hood and return a clean summary of every service, its URL, and status. No gcloud run services list --region us-central1 --format=json | jq ... gymnastics required.
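Under the hood, that natural-language prompt turns into a single MCP tools/call message. Here's a minimal sketch of the JSON-RPC 2.0 payload an MCP client POSTs — the envelope fields come from the MCP spec, but the exact argument names the managed server expects are assumptions (the real schema comes back from tools/list):

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 message an MCP client POSTs to call a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Roughly what the CLI sends for the prompt above (argument names
# are illustrative, not the managed server's documented schema):
payload = build_tool_call("list_services", {"region": "us-central1"})
print(payload)
```

The client never constructs this by hand, of course — the model picks the tool and fills the arguments from the schema it discovered at startup.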
Try something bolder:
> Deploy the image us-docker.pkg.dev/cloudrun/container/hello to a new service
called "hello-from-agent" in us-central1
It calls deploy_service_from_image, fills in the parameters, and your service is live. That's infrastructure-as-conversation, and honestly it feels a little magical the first time.
A full agent session chains these together — listing services, spotting errors, and triggering a hotfix deploy from a single prompt chain.
IT Specialist note: The managed endpoint enforces Cloud IAM on every call. If your credentials don't have run.services.create, the deploy fails cleanly with a permission error — not a hallucinated success. That's the kind of guardrail you need when agents touch production infra.
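If you build your own tools, it's worth mirroring that behavior: catch the permission error and hand the agent structured data it can reason about, rather than letting a raw stack trace bubble up. A small pattern sketch — the exception class here is a stand-in; against the real client libraries you'd catch google.api_core.exceptions.PermissionDenied:

```python
import json

class PermissionDenied(Exception):
    """Stand-in for google.api_core.exceptions.PermissionDenied."""

def safe_tool(fn):
    """Wrap a tool so IAM failures come back as data, not a crash."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except PermissionDenied as exc:
            return json.dumps({
                "ok": False,
                "error": "permission_denied",
                "detail": str(exc),
            })
    return wrapper

@safe_tool
def deploy(service: str) -> str:
    # Simulate what happens when the service account lacks run.services.create
    raise PermissionDenied("run.services.create denied on projects/demo")

print(deploy("hello-from-agent"))
```

An agent that receives `{"ok": false, "error": "permission_denied"}` can explain the failure or ask for approval; one that sees a traceback mid-stream often just retries blindly.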
Part 2 — Build & Deploy Your Own Custom MCP Server
The built-in server covers Cloud Run operations. But what about your custom health checks, log analysis, or cross-service diagnostics? That's where you roll your own.
We'll build an Infra Health MCP Server with three tools:
- list_services — wraps Cloud Run's Admin API
- get_service_error_rate — queries Cloud Logging for 5xx errors
- check_service_health — returns a simple green/yellow/red status
Step 1: Create the Project
mkdir infra-health-mcp && cd infra-health-mcp
Create pyproject.toml:
[project]
name = "infra-health-mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
"fastmcp>=2.0.0",
"google-cloud-run>=0.10.0",
"google-cloud-logging>=3.0.0",
]
Step 2: Write the MCP Server
Create server.py:
import asyncio
import json
import logging
import os
from datetime import datetime, timedelta, timezone
from fastmcp import FastMCP
from google.cloud import run_v2, logging as cloud_logging
logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s]: %(message)s", level=logging.INFO)
mcp = FastMCP("Infra Health MCP Server")
run_client = run_v2.ServicesClient()
log_client = cloud_logging.Client()
@mcp.tool()
def list_services(project_id: str, region: str) -> str:
"""List all Cloud Run services with their status and URLs.
Args:
project_id: Google Cloud project ID
region: GCP region (e.g. us-central1)
Returns:
JSON list of services with name, URL, and last deployment time
"""
logger.info(f"Listing services in {project_id}/{region}")
parent = f"projects/{project_id}/locations/{region}"
services = []
for svc in run_client.list_services(parent=parent):
services.append({
"name": svc.name.split("/")[-1],
"uri": svc.uri,
"last_deployed": svc.update_time.isoformat() if svc.update_time else "unknown",
"ready": svc.terminal_condition.state.name if svc.terminal_condition else "unknown",
})
return json.dumps(services, indent=2)
@mcp.tool()
def get_service_error_rate(project_id: str, region: str, service_name: str, minutes: int = 60) -> str:
"""Get the 5xx error count for a Cloud Run service over a time window.
Args:
project_id: Google Cloud project ID
region: GCP region
service_name: Name of the Cloud Run service
minutes: How many minutes back to look (default 60)
Returns:
JSON with total requests, error count, and error rate percentage
"""
logger.info(f"Checking error rate for {service_name} over last {minutes} minutes")
since = datetime.now(timezone.utc) - timedelta(minutes=minutes)
filter_str = (
f'resource.type="cloud_run_revision" '
f'resource.labels.service_name="{service_name}" '
f'resource.labels.location="{region}" '
f'httpRequest.status>=500 '
f'timestamp>="{since.isoformat()}"'
)
error_count = sum(1 for _ in log_client.list_entries(
filter_=filter_str,
projects=[project_id],
))
return json.dumps({
"service": service_name,
"window_minutes": minutes,
"error_count_5xx": error_count,
"checked_at": datetime.now(timezone.utc).isoformat(),
})
@mcp.tool()
def check_service_health(project_id: str, region: str, service_name: str) -> str:
"""Return a simple health status for a Cloud Run service.
Args:
project_id: Google Cloud project ID
region: GCP region
service_name: Name of the Cloud Run service
Returns:
JSON with status (green/yellow/red) and a human-readable reason
"""
logger.info(f"Health check for {service_name}")
name = f"projects/{project_id}/locations/{region}/services/{service_name}"
svc = run_client.get_service(name=name)
state = svc.terminal_condition.state.name if svc.terminal_condition else "UNKNOWN"
if state == "CONDITION_SUCCEEDED":
status, reason = "green", "Service is running and healthy"
elif state in ("CONDITION_FAILED", "CONTAINER_FAILED"):
status, reason = "red", f"Service is in a failed state: {state}"
else:
status, reason = "yellow", f"Service state is uncertain: {state}"
return json.dumps({"service": service_name, "status": status, "reason": reason})
if __name__ == "__main__":
port = int(os.getenv("PORT", 8080))
logger.info(f"Infra Health MCP server starting on port {port}")
asyncio.run(
mcp.run_async(
transport="streamable-http",
host="0.0.0.0",
port=port,
)
)
Why Streamable HTTP? Cloud Run is stateless and scales horizontally. The older SSE transport needed persistent connections — a terrible fit for serverless. Streamable HTTP uses plain POST/GET, so every request is independent. Your MCP server scales to zero between calls and you only pay when it's actually doing work.
Step 3: Containerize It
Create Dockerfile:
FROM python:3.13-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
COPY . /app
WORKDIR /app
ENV PYTHONUNBUFFERED=1
RUN uv sync
# Cloud Run injects PORT at runtime; a literal value here avoids a
# build-time failure from expanding an unset $PORT
EXPOSE 8080
CMD ["uv", "run", "server.py"]
Step 4: Create a Service Account
Your MCP server needs permission to read Cloud Run and Cloud Logging:
gcloud iam service-accounts create infra-health-sa \
--display-name="Infra Health MCP Server"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/run.viewer"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/logging.viewer"
Step 5: Build & Deploy
# Create Artifact Registry repo
gcloud artifacts repositories create mcp-servers \
--repository-format=docker \
--location=$REGION
# Build and push
gcloud builds submit \
--tag "${REGION}-docker.pkg.dev/${PROJECT_ID}/mcp-servers/infra-health:latest"
# Deploy
gcloud run deploy infra-health-mcp \
--image "${REGION}-docker.pkg.dev/${PROJECT_ID}/mcp-servers/infra-health:latest" \
--region=$REGION \
--no-allow-unauthenticated \
--memory=512Mi \
--cpu=1 \
--concurrency=80 \
--timeout=120 \
--service-account="infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com"
Cloud Run gives you a URL like https://infra-health-mcp-<hash>-uc.a.run.app. Grab it:
export MCP_URL=$(gcloud run services describe infra-health-mcp \
--region=$REGION \
--format='value(status.url)')
Step 6: Test It Locally via the Cloud Run Proxy
Don't expose your MCP server to the internet directly. Use the proxy to test with your local credentials:
gcloud run services proxy infra-health-mcp --region=$REGION --port=3000
Now hit it at http://localhost:3000 — your credentials are injected automatically, no token management needed.
Part 3 — Wire Both Servers into an ADK Agent
Now the fun part. We'll build an agent that uses both MCP servers — the built-in Cloud Run one and your custom infra health server — to answer infrastructure questions like a seasoned SRE.
Install ADK
pip install google-adk
Create the Agent
Create agent.py:
import asyncio
import os
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StreamableHTTPConnectionParams
from google.genai import types
# FastMCP serves streamable HTTP under /mcp by default
MCP_URL = os.environ["MCP_URL"].rstrip("/") + "/mcp"
async def main():
    # Connect to both MCP servers
    cloud_run_tools = MCPToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://run.googleapis.com/mcp")
    )
    infra_health_tools = MCPToolset(
        connection_params=StreamableHTTPConnectionParams(url=MCP_URL)
    )
    agent = LlmAgent(
        name="infra_agent",  # agent names must be valid identifiers, so no hyphens
        model="gemini-2.0-flash",
        instruction=(
            "You are an infrastructure operations assistant. "
            "You have access to Cloud Run management tools and infrastructure health tools. "
            "When asked about service health or errors, always check both the service status "
            "and recent error rates before answering. Be concise and actionable."
        ),
        tools=[cloud_run_tools, infra_health_tools],
    )
    # ADK executes agents through a Runner bound to a session
    session_service = InMemorySessionService()
    runner = Runner(agent=agent, app_name="infra", session_service=session_service)
    session = await session_service.create_session(app_name="infra", user_id="me")
    # Example queries — swap these for interactive input
    queries = [
        "List all my Cloud Run services in us-central1",
        "Which services had 5xx errors in the last hour?",
        "Give me a health summary for all services",
    ]
    for query in queries:
        print(f"\n>>> {query}")
        message = types.Content(role="user", parts=[types.Part(text=query)])
        async for event in runner.run_async(
            user_id="me", session_id=session.id, new_message=message
        ):
            if event.is_final_response() and event.content:
                print(event.content.parts[0].text)
if __name__ == "__main__":
    asyncio.run(main())
Run it. Since the service was deployed with --no-allow-unauthenticated, the simplest path is to point MCP_URL at the authenticated proxy from Step 6:
MCP_URL=http://localhost:3000 python agent.py
You'll see the agent autonomously call list_services, then loop over each service calling get_service_error_rate and check_service_health — building a full infra health picture without you writing a single orchestration loop.
This is the moment it clicks. You didn't write "for service in services: check health". The agent reasoned its way to that pattern. Your job was defining the tools. That's a genuine shift in how we build infra tooling.
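For contrast, here's the orchestration loop the agent spared you — sketched with stubbed-out tools, since the point is the loop's shape, not real API output (the service names and statuses below are invented):

```python
import json

# Stubs standing in for the real MCP tools:
def list_services(project_id: str, region: str) -> str:
    return json.dumps([{"name": "api"}, {"name": "worker"}])

def check_service_health(project_id: str, region: str, service_name: str) -> str:
    status = "green" if service_name == "api" else "red"
    return json.dumps({"service": service_name, "status": status})

def health_report(project_id: str, region: str) -> dict:
    """The explicit fan-out the agent derives on its own from the tool schemas."""
    report = {}
    for svc in json.loads(list_services(project_id, region)):
        result = json.loads(check_service_health(project_id, region, svc["name"]))
        report[svc["name"]] = result["status"]
    return report

print(health_report("my-project-id", "us-central1"))
```

Every line of health_report is code you used to write and maintain yourself; with MCP, the model synthesizes the equivalent plan at runtime from the tool descriptions alone.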
Security: Don't Skip This Section
Agents with infrastructure permissions need real guardrails. Here's what I'd put in place before letting this near production:
1. Scope service account permissions tightly. The infra-health-sa has read-only roles. If you want an agent that can also deploy, create a separate service account for write operations and require explicit approval flows before those tools fire.
2. Use IAM deny policies for the write MCP tools. You can explicitly deny run.services.create on specific service accounts at the project level — useful if you only want agents to have deploy access in staging, not prod.
3. Enable Model Armor. Google's Model Armor sits in front of MCP calls and blocks prompt injection attempts, malicious URIs, and unsafe content before they reach your tools. Enable it in the Google Cloud console under AI Safety.
4. Cloud Audit Logs are your friend. Every MCP tool call made through Google-managed servers is logged automatically. Set up a log-based alert for any deploy_service_from_image calls from service accounts that shouldn't be deploying.
# Example: alert on unexpected deploys
gcloud logging metrics create unexpected-agent-deploy \
--description="MCP deploy calls from unexpected accounts" \
--log-filter='protoPayload.methodName="google.cloud.run.v2.Services.CreateService"'
My Honest Take
What impressed me most at NEXT '26 isn't any single feature — it's that Google is treating MCP as a first-class citizen across the entire platform. BigQuery has a managed MCP server. Cloud Logging has one. Cloud SQL is getting one. This is becoming the standard interface layer between AI agents and cloud services.
For IT specialists and infrastructure engineers, this is actually exciting rather than threatening. The tedious parts of infra ops — writing one-off scripts to list resources, cross-referencing logs with deployment times, checking health across 20 services — are exactly what agents are good at. You shift from doing the repetitive tasks to designing the tools that do them.
The rough edges? Auth setup for remote MCP servers is still fiddly, especially in multi-project setups. The ADK toolset documentation is still catching up to the pace of announcements. And "fully managed" doesn't yet mean "zero config" — you still need to wire up IAM carefully.
But the direction is clear, and the foundation is solid. The infra engineer who learns to build good MCP servers is going to be unreasonably productive over the next few years.
What's Next
- Explore the official Cloud Run MCP docs
- Check out the ADK MCP codelab
- Try adding a deploy_service tool to your custom server — and see how differently it feels when an agent handles rollback logic
Drop a comment if you build something cool with this. I'm especially curious what domain-specific MCP servers people come up with.
Built and tested as part of the Google Cloud NEXT '26 Writing Challenge. All code examples use placeholder project IDs — swap in your own before running.

