Google Cloud NEXT '26 dropped over 260 announcements. Most headlines went to Gemini 3.1, TPU v8, and the Agentic Data Cloud. But buried in the Cloud Run section was something that made me stop scrolling — a fully managed remote MCP server, now generally available. If you manage infrastructure AND build AI systems, this one's for you.
Why This Hit Different for Me
I wear two hats. One day I'm SSH-ing into VMs, reviewing Cloud Run deployments, and making sure services don't fall over at 2am. The next I'm wiring up LLM agents, building tool pipelines, and figuring out why my context window blew up. These two worlds have always felt weirdly disconnected.
MCP (Model Context Protocol) on Cloud Run is the first thing I've seen that genuinely bridges them. Instead of hand-crafting API clients for every infra operation your agent needs to do, you point it at a managed MCP server — and suddenly your AI agent can deploy services, read logs, and inspect health metrics like a junior SRE who never sleeps.
Let's build it.
What We're Building
By the end of this walkthrough you'll have:
- The built-in Cloud Run MCP server wired up to Gemini CLI so you can manage deployments via natural language
- A custom MCP server running on Cloud Run that exposes infrastructure health tools
- An ADK agent that combines both to answer questions like "Which of my services had errors in the last hour?"
Here's the full picture of what we're assembling: the built-in server handles deployments, the custom server handles diagnostics, and an ADK agent sits on top of both.
Prerequisites
- A Google Cloud project with billing enabled
- gcloud CLI installed and authenticated
- Python 3.10+
- Docker (for building the custom server)
- Gemini CLI installed
Set your project up front so every command just works:
export PROJECT_ID="my-project-id"
export REGION="us-central1"
gcloud config set project $PROJECT_ID
IAM roles you'll need on your account:
- roles/run.admin
- roles/iam.serviceAccountUser
- roles/artifactregistry.writer
Part 1 — Use the Built-in Cloud Run MCP Server
Google now hosts a fully managed MCP server at https://run.googleapis.com/mcp. It exposes tools like list_services, get_service, deploy_service_from_image, and deploy_service_from_archive — no setup required on your end.
Step 1: Authenticate
The managed endpoint uses your Google Cloud identity. Make sure your ADC (Application Default Credentials) are set:
gcloud auth application-default login
Step 2: Wire it to Gemini CLI
Open (or create) ~/.gemini/settings.json and add:
{
"mcpServers": {
"cloud-run": {
"url": "https://run.googleapis.com/mcp",
"transport": "http"
}
}
}
Step 3: Talk to Your Infrastructure
Fire up Gemini CLI and try this:
gemini
> List all my Cloud Run services in us-central1
You'll see it call list_services under the hood and return a clean summary of every service, its URL, and status. No gcloud run services list --region us-central1 --format=json | jq ... gymnastics required.
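Under the hood, that natural-language prompt turns into a single MCP tools/call message. Here's a minimal sketch of the JSON-RPC 2.0 payload an MCP client POSTs — the envelope fields come from the MCP spec, but the exact argument names the managed server expects are assumptions (the real schema comes back from tools/list):

```python
import json

def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> str:
    """Build the JSON-RPC 2.0 message an MCP client POSTs to call a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })

# Roughly what the CLI sends for the prompt above (argument names
# are illustrative, not the managed server's documented schema):
payload = build_tool_call("list_services", {"region": "us-central1"})
print(payload)
```

The client never constructs this by hand, of course — the model picks the tool and fills the arguments from the schema it discovered at startup.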
Try something bolder:
> Deploy the image us-docker.pkg.dev/cloudrun/container/hello to a new service
called "hello-from-agent" in us-central1
It calls deploy_service_from_image, fills in the parameters, and your service is live. That's infrastructure-as-conversation, and honestly it feels a little magical the first time.
A full agent session chains these together — listing services, spotting errors, and triggering a hotfix deploy from a single prompt chain.
IT Specialist note: The managed endpoint enforces Cloud IAM on every call. If your credentials don't have run.services.create, the deploy fails cleanly with a permission error — not a hallucinated success. That's the kind of guardrail you need when agents touch production infra.
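If you build your own tools, it's worth mirroring that behavior: catch the permission error and hand the agent structured data it can reason about, rather than letting a raw stack trace bubble up. A small pattern sketch — the exception class here is a stand-in; against the real client libraries you'd catch google.api_core.exceptions.PermissionDenied:

```python
import json

class PermissionDenied(Exception):
    """Stand-in for google.api_core.exceptions.PermissionDenied."""

def safe_tool(fn):
    """Wrap a tool so IAM failures come back as data, not a crash."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except PermissionDenied as exc:
            return json.dumps({
                "ok": False,
                "error": "permission_denied",
                "detail": str(exc),
            })
    return wrapper

@safe_tool
def deploy(service: str) -> str:
    # Simulate what happens when the service account lacks run.services.create
    raise PermissionDenied("run.services.create denied on projects/demo")

print(deploy("hello-from-agent"))
```

An agent that receives `{"ok": false, "error": "permission_denied"}` can explain the failure or ask for approval; one that sees a traceback mid-stream often just retries blindly.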
Part 2 — Build & Deploy Your Own Custom MCP Server
The built-in server covers Cloud Run operations. But what about your custom health checks, log analysis, or cross-service diagnostics? That's where you roll your own.
We'll build an Infra Health MCP Server with three tools:
- list_services — wraps Cloud Run's Admin API
- get_service_error_rate — queries Cloud Logging for 5xx errors
- check_service_health — returns a simple green/yellow/red status
Step 1: Create the Project
mkdir infra-health-mcp && cd infra-health-mcp
Create pyproject.toml:
[project]
name = "infra-health-mcp"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = [
"fastmcp>=2.0.0",
"google-cloud-run>=0.10.0",
"google-cloud-logging>=3.0.0",
]
Step 2: Write the MCP Server
Create server.py:
import asyncio
import json
import logging
import os
from datetime import datetime, timedelta, timezone
from fastmcp import FastMCP
from google.cloud import run_v2, logging as cloud_logging
logger = logging.getLogger(__name__)
logging.basicConfig(format="[%(levelname)s]: %(message)s", level=logging.INFO)
mcp = FastMCP("Infra Health MCP Server")
run_client = run_v2.ServicesClient()
log_client = cloud_logging.Client()
@mcp.tool()
def list_services(project_id: str, region: str) -> str:
"""List all Cloud Run services with their status and URLs.
Args:
project_id: Google Cloud project ID
region: GCP region (e.g. us-central1)
Returns:
JSON list of services with name, URL, and last deployment time
"""
logger.info(f"Listing services in {project_id}/{region}")
parent = f"projects/{project_id}/locations/{region}"
services = []
for svc in run_client.list_services(parent=parent):
services.append({
"name": svc.name.split("/")[-1],
"uri": svc.uri,
"last_deployed": svc.update_time.isoformat() if svc.update_time else "unknown",
"ready": svc.terminal_condition.state.name if svc.terminal_condition else "unknown",
})
return json.dumps(services, indent=2)
@mcp.tool()
def get_service_error_rate(project_id: str, region: str, service_name: str, minutes: int = 60) -> str:
"""Get the 5xx error count for a Cloud Run service over a time window.
Args:
project_id: Google Cloud project ID
region: GCP region
service_name: Name of the Cloud Run service
minutes: How many minutes back to look (default 60)
Returns:
JSON with total requests, error count, and error rate percentage
"""
logger.info(f"Checking error rate for {service_name} over last {minutes} minutes")
since = datetime.now(timezone.utc) - timedelta(minutes=minutes)
filter_str = (
f'resource.type="cloud_run_revision" '
f'resource.labels.service_name="{service_name}" '
f'resource.labels.location="{region}" '
f'httpRequest.status>=500 '
f'timestamp>="{since.isoformat()}"'
)
error_count = sum(1 for _ in log_client.list_entries(
filter_=filter_str,
projects=[project_id],
))
return json.dumps({
"service": service_name,
"window_minutes": minutes,
"error_count_5xx": error_count,
"checked_at": datetime.now(timezone.utc).isoformat(),
})
@mcp.tool()
def check_service_health(project_id: str, region: str, service_name: str) -> str:
"""Return a simple health status for a Cloud Run service.
Args:
project_id: Google Cloud project ID
region: GCP region
service_name: Name of the Cloud Run service
Returns:
JSON with status (green/yellow/red) and a human-readable reason
"""
logger.info(f"Health check for {service_name}")
name = f"projects/{project_id}/locations/{region}/services/{service_name}"
svc = run_client.get_service(name=name)
state = svc.terminal_condition.state.name if svc.terminal_condition else "UNKNOWN"
if state == "CONDITION_SUCCEEDED":
status, reason = "green", "Service is running and healthy"
elif state in ("CONDITION_FAILED", "CONTAINER_FAILED"):
status, reason = "red", f"Service is in a failed state: {state}"
else:
status, reason = "yellow", f"Service state is uncertain: {state}"
return json.dumps({"service": service_name, "status": status, "reason": reason})
if __name__ == "__main__":
port = int(os.getenv("PORT", 8080))
logger.info(f"Infra Health MCP server starting on port {port}")
asyncio.run(
mcp.run_async(
transport="streamable-http",
host="0.0.0.0",
port=port,
)
)
Why Streamable HTTP? Cloud Run is stateless and scales horizontally. The older SSE transport needed persistent connections — a terrible fit for serverless. Streamable HTTP uses plain POST/GET, so every request is independent. Your MCP server scales to zero between calls and you only pay when it's actually doing work.
Step 3: Containerize It
Create Dockerfile:
FROM python:3.13-slim
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
COPY . /app
WORKDIR /app
ENV PYTHONUNBUFFERED=1
RUN uv sync
# Cloud Run injects PORT at runtime; a literal value here avoids a
# build-time failure from expanding an unset $PORT
EXPOSE 8080
CMD ["uv", "run", "server.py"]
Step 4: Create a Service Account
Your MCP server needs permission to read Cloud Run and Cloud Logging:
gcloud iam service-accounts create infra-health-sa \
--display-name="Infra Health MCP Server"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/run.viewer"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role="roles/logging.viewer"
Step 5: Build & Deploy
# Create Artifact Registry repo
gcloud artifacts repositories create mcp-servers \
--repository-format=docker \
--location=$REGION
# Build and push
gcloud builds submit \
--tag "${REGION}-docker.pkg.dev/${PROJECT_ID}/mcp-servers/infra-health:latest"
# Deploy
gcloud run deploy infra-health-mcp \
--image "${REGION}-docker.pkg.dev/${PROJECT_ID}/mcp-servers/infra-health:latest" \
--region=$REGION \
--no-allow-unauthenticated \
--memory=512Mi \
--cpu=1 \
--concurrency=80 \
--timeout=120 \
--service-account="infra-health-sa@${PROJECT_ID}.iam.gserviceaccount.com"
Cloud Run gives you a URL like https://infra-health-mcp-<hash>-uc.a.run.app. Grab it:
export MCP_URL=$(gcloud run services describe infra-health-mcp \
--region=$REGION \
--format='value(status.url)')
Step 6: Test It Locally via the Cloud Run Proxy
Don't expose your MCP server to the internet directly. Use the proxy to test with your local credentials:
gcloud run services proxy infra-health-mcp --region=$REGION --port=3000
Now hit it at http://localhost:3000 — your credentials are injected automatically, no token management needed.
Part 3 — Wire Both Servers into an ADK Agent
Now the fun part. We'll build an agent that uses both MCP servers — the built-in Cloud Run one and your custom infra health server — to answer infrastructure questions like a seasoned SRE.
Install ADK
pip install google-adk
Create the Agent
Create agent.py:
import asyncio
import os
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, StreamableHTTPConnectionParams
from google.genai import types
# FastMCP serves streamable HTTP under /mcp by default
MCP_URL = os.environ["MCP_URL"].rstrip("/") + "/mcp"
async def main():
    # Connect to both MCP servers
    cloud_run_tools = MCPToolset(
        connection_params=StreamableHTTPConnectionParams(url="https://run.googleapis.com/mcp")
    )
    infra_health_tools = MCPToolset(
        connection_params=StreamableHTTPConnectionParams(url=MCP_URL)
    )
    agent = LlmAgent(
        name="infra_agent",  # agent names must be valid identifiers, so no hyphens
        model="gemini-2.0-flash",
        instruction=(
            "You are an infrastructure operations assistant. "
            "You have access to Cloud Run management tools and infrastructure health tools. "
            "When asked about service health or errors, always check both the service status "
            "and recent error rates before answering. Be concise and actionable."
        ),
        tools=[cloud_run_tools, infra_health_tools],
    )
    # ADK executes agents through a Runner bound to a session
    session_service = InMemorySessionService()
    runner = Runner(agent=agent, app_name="infra", session_service=session_service)
    session = await session_service.create_session(app_name="infra", user_id="me")
    # Example queries — swap these for interactive input
    queries = [
        "List all my Cloud Run services in us-central1",
        "Which services had 5xx errors in the last hour?",
        "Give me a health summary for all services",
    ]
    for query in queries:
        print(f"\n>>> {query}")
        message = types.Content(role="user", parts=[types.Part(text=query)])
        async for event in runner.run_async(
            user_id="me", session_id=session.id, new_message=message
        ):
            if event.is_final_response() and event.content:
                print(event.content.parts[0].text)
if __name__ == "__main__":
    asyncio.run(main())
Run it. Since the service was deployed with --no-allow-unauthenticated, the simplest path is to point MCP_URL at the authenticated proxy from Step 6:
MCP_URL=http://localhost:3000 python agent.py
You'll see the agent autonomously call list_services, then loop over each service calling get_service_error_rate and check_service_health — building a full infra health picture without you writing a single orchestration loop.
This is the moment it clicks. You didn't write "for service in services: check health". The agent reasoned its way to that pattern. Your job was defining the tools. That's a genuine shift in how we build infra tooling.
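For contrast, here's the orchestration loop the agent spared you — sketched with stubbed-out tools, since the point is the loop's shape, not real API output (the service names and statuses below are invented):

```python
import json

# Stubs standing in for the real MCP tools:
def list_services(project_id: str, region: str) -> str:
    return json.dumps([{"name": "api"}, {"name": "worker"}])

def check_service_health(project_id: str, region: str, service_name: str) -> str:
    status = "green" if service_name == "api" else "red"
    return json.dumps({"service": service_name, "status": status})

def health_report(project_id: str, region: str) -> dict:
    """The explicit fan-out the agent derives on its own from the tool schemas."""
    report = {}
    for svc in json.loads(list_services(project_id, region)):
        result = json.loads(check_service_health(project_id, region, svc["name"]))
        report[svc["name"]] = result["status"]
    return report

print(health_report("my-project-id", "us-central1"))
```

Every line of health_report is code you used to write and maintain yourself; with MCP, the model synthesizes the equivalent plan at runtime from the tool descriptions alone.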
Security: Don't Skip This Section
Agents with infrastructure permissions need real guardrails. Here's what I'd put in place before letting this near production:
1. Scope service account permissions tightly. The infra-health-sa has read-only roles. If you want an agent that can also deploy, create a separate service account for write operations and require explicit approval flows before those tools fire.
2. Use IAM deny policies for the write MCP tools. You can explicitly deny run.services.create on specific service accounts at the project level — useful if you only want agents to have deploy access in staging, not prod.
3. Enable Model Armor. Google's Model Armor sits in front of MCP calls and blocks prompt injection attempts, malicious URIs, and unsafe content before they reach your tools. Enable it in the Google Cloud console under AI Safety.
4. Cloud Audit Logs are your friend. Every MCP tool call made through Google-managed servers is logged automatically. Set up a log-based alert for any deploy_service_from_image calls from service accounts that shouldn't be deploying.
# Example: alert on unexpected deploys
gcloud logging metrics create unexpected-agent-deploy \
--description="MCP deploy calls from unexpected accounts" \
--log-filter='protoPayload.methodName="google.cloud.run.v2.Services.CreateService"'
My Honest Take
What impressed me most at NEXT '26 isn't any single feature — it's that Google is treating MCP as a first-class citizen across the entire platform. BigQuery has a managed MCP server. Cloud Logging has one. Cloud SQL is getting one. This is becoming the standard interface layer between AI agents and cloud services.
For IT specialists and infrastructure engineers, this is actually exciting rather than threatening. The tedious parts of infra ops — writing one-off scripts to list resources, cross-referencing logs with deployment times, checking health across 20 services — are exactly what agents are good at. You shift from doing the repetitive tasks to designing the tools that do them.
The rough edges? Auth setup for remote MCP servers is still fiddly, especially in multi-project setups. The ADK toolset documentation is still catching up to the pace of announcements. And "fully managed" doesn't yet mean "zero config" — you still need to wire up IAM carefully.
But the direction is clear, and the foundation is solid. The infra engineer who learns to build good MCP servers is going to be unreasonably productive over the next few years.
What's Next
- Explore the official Cloud Run MCP docs
- Check out the ADK MCP codelab
- Try adding a deploy_service tool to your custom server — and see how differently it feels when an agent handles rollback logic
Drop a comment if you build something cool with this. I'm especially curious what domain-specific MCP servers people come up with.
Built and tested as part of the Google Cloud NEXT '26 Writing Challenge. All code examples use placeholder project IDs — swap in your own before running.

