Bringing Async MCP to Google Cloud Run — Introducing cloudrun-mcp
When you design distributed AI or agentic workloads on Google Cloud Run, you repeatedly run into three problems:
- How to authenticate workloads securely
- How to maintain long-lived, event-driven sessions
- How to stream model context data efficiently without blocking threads
cloudrun-mcp solves all three in one lightweight Python SDK.
What is MCP (Model Context Protocol)?
MCP (Model Context Protocol) is an emerging open standard for exchanging context between AI models, tools, and environments.
Think of it as “WebSockets for AI knowledge.”
Instead of hardcoding API calls, your model connects to an MCP server and streams structured events such as:
- context.create
- document.attach
- agent.reply
For developers deploying AI agents on Cloud Run, GKE, or hybrid workloads, an async client is essential for scalability.
Introducing cloudrun-mcp
Async MCP (Model Context Protocol) client for Cloud Run.
Built by Raghava Chellu (February 2026), cloudrun-mcp brings:
- First-class async streaming
- Automatic Cloud Run authentication
- Agentic-AI-friendly APIs
to your production workloads.
How It Works
Under the hood:
- The client uses aiohttp to maintain an HTTP/1.1 keep-alive streaming session.
- Inside Cloud Run, it queries the metadata service to obtain a signed JWT:
http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=<your-audience>
- Each event from the MCP server arrives as a Server-Sent Event (SSE).
- The SDK yields events as a Python async iterator, ready for real-time AI reasoning loops.
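The metadata-server token exchange described above can be sketched in a few lines. The SDK itself uses aiohttp, but for a self-contained illustration the snippet below uses only the standard library; the helper names are illustrative and not part of cloudrun-mcp:

```python
from urllib.parse import quote
import urllib.request

# Cloud Run / GKE / GCE expose this endpoint inside the workload.
METADATA_BASE = (
    "http://metadata.google.internal/computeMetadata/v1"
    "/instance/service-accounts/default/identity"
)

def identity_url(audience: str) -> str:
    # URL-encode the audience so a full https:// URL survives as a query value.
    return f"{METADATA_BASE}?audience={quote(audience, safe='')}"

def fetch_id_token(audience: str) -> str:
    # The Metadata-Flavor header is mandatory; the server rejects requests without it.
    req = urllib.request.Request(
        identity_url(audience), headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode()
```

Outside GCP this call fails, because `metadata.google.internal` only resolves inside Google's network; that is exactly why the requirements below mandate metadata-server access.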
Installation
pip install cloudrun-mcp
Requirements
- Python ≥ 3.10
- Deployed on GCP (Cloud Run / GKE / GCE) with metadata-server access
Usage Example
import asyncio
from cloudrun_mcp import MCPClient

async def main():
    client = MCPClient(base_url="https://your-mcp-server.run.app")
    async for event in client.events():
        print(event)

asyncio.run(main())
Typical Output Stream
{"event":"context.create","status":"ok"}
{"event":"model.response","content":"42"}
{"event":"model.done"}
That’s it — you’ve connected an async agent running on Cloud Run to an MCP backend and are receiving real-time context updates.
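For intuition about what happens between the wire and that output: SSE frames arrive as `data:`-prefixed lines terminated by a blank line. The parser below is a minimal hypothetical sketch of that decoding step, not the SDK's actual implementation:

```python
import json
from typing import Iterable, Iterator

def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Accumulate `data:` lines; a blank line terminates one SSE event."""
    buf = []
    for line in lines:
        if line.startswith("data:"):
            buf.append(line[len("data:"):].strip())
        elif line == "" and buf:
            # One complete event: decode the accumulated payload as JSON.
            yield json.loads("\n".join(buf))
            buf = []

# Simulated wire traffic matching the output stream shown above.
stream = [
    'data: {"event":"context.create","status":"ok"}',
    "",
    'data: {"event":"model.done"}',
    "",
]
events = list(parse_sse(stream))
```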
Why Async MCP Matters
AI workloads are evolving from simple request-response APIs to long-running reasoning graphs.
Synchronous I/O becomes a bottleneck.
cloudrun-mcp leverages Python’s asyncio to keep event loops responsive across:
- Streaming token generation
- Function-calling orchestration
- Multi-model chains
It’s especially powerful for Agentic AI, where orchestrators consume continuous model context (tool outputs, planning updates, memory events).
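To make the benefit concrete, the toy sketch below (all names hypothetical, no cloudrun-mcp required) shows one asyncio event loop interleaving a simulated event stream with a concurrent heartbeat task; with blocking I/O, the heartbeat would stall until the stream finished:

```python
import asyncio

async def fake_events():
    # Stand-in for client.events(): yields with simulated network latency.
    for name in ("context.create", "model.response", "model.done"):
        await asyncio.sleep(0.01)
        yield {"event": name}

async def heartbeat(counter: list):
    # Runs concurrently with the stream consumer on the same event loop.
    for _ in range(3):
        await asyncio.sleep(0.005)
        counter.append("tick")

async def main():
    counter = []
    hb = asyncio.create_task(heartbeat(counter))
    events = [e async for e in fake_events()]  # consume the stream
    await hb                                   # heartbeat ran in parallel
    return events, counter

events, counter = asyncio.run(main())
```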
Authentication Deep Dive
The SDK automatically:
- Discovers the metadata endpoint
- Retrieves an ID token targeting your MCP server
- Injects it into request headers
Authorization: Bearer <token>
- Refreshes tokens every ~55 minutes
No OAuth flows.
No key.json files.
Perfect for production micro-agents.
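A refresh cadence like "~55 minutes" can be modeled as a small cache that refetches shortly before the token's 60-minute expiry. The class below is an illustrative sketch under that assumption, not the SDK's internal implementation:

```python
import time

class TokenCache:
    """Cache an ID token, refreshing ~5 minutes before the 60-minute expiry."""
    REFRESH_AFTER = 55 * 60  # seconds

    def __init__(self, fetch):
        self._fetch = fetch          # callable returning a fresh token string
        self._token = None
        self._fetched_at = 0.0

    def get(self, now=None) -> str:
        now = time.monotonic() if now is None else now
        if self._token is None or now - self._fetched_at >= self.REFRESH_AFTER:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

# Simulated clock: the fetcher counts how often a real token would be minted.
calls = []
cache = TokenCache(lambda: calls.append(1) or f"tok-{len(calls)}")
first = cache.get(now=0)
same = cache.get(now=54 * 60)       # still within the refresh window
refreshed = cache.get(now=56 * 60)  # past 55 minutes: refetch
```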
Streaming with Back-Pressure Control
async for event in client.events(buffer=32):
    await handle_event(event)
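A `buffer` argument like this suggests a bounded internal queue. The hypothetical sketch below shows the underlying mechanism with a plain asyncio.Queue: once the queue is full, the producer blocks until the consumer catches up, which is the back-pressure:

```python
import asyncio

async def producer(queue: asyncio.Queue):
    # put() suspends when the queue is full, so a slow consumer
    # naturally throttles the producer (back-pressure).
    for i in range(5):
        await queue.put({"event": f"e{i}"})
    await queue.put(None)  # sentinel: end of stream

async def consumer(queue: asyncio.Queue, out: list):
    while (item := await queue.get()) is not None:
        out.append(item["event"])

async def main():
    queue = asyncio.Queue(maxsize=2)  # analogous to buffer=32 above
    out = []
    await asyncio.gather(producer(queue), consumer(queue, out))
    return out

received = asyncio.run(main())
```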
Typical Deployment Pattern
[MCP Clients] <--SSE--> [cloudrun-mcp SDK] <--Auth--> [Cloud Run Service]
                               \
                                ↳ [Agent Processors / Vector DB / Pub/Sub Pipelines]
cloudrun-mcp acts as the async bridge between Cloud identity and AI reasoning streams.
Real-World Use Cases
🔹 Event-Driven AI Agents
Agents listening to MCP streams and triggering workflows automatically.
🔹 LLM Orchestration Pipelines
Streaming intermediate reasoning steps to dashboards.
🔹 IoT Telemetry Ingestion
Continuous SSE device streams pushed to Pub/Sub.
🔹 Hybrid Edge Inference
Bridge local MCP hubs with Cloud Run decision services.
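One way to wire the first of these use cases, assuming the event names shown earlier (`document.attach`, `agent.reply`); the registry and handlers below are hypothetical, not part of cloudrun-mcp:

```python
import asyncio

# Hypothetical registry mapping MCP event names to async workflows.
HANDLERS = {}

def on(event_name):
    def register(fn):
        HANDLERS[event_name] = fn
        return fn
    return register

@on("document.attach")
async def index_document(event, sink):
    # e.g. push to a vector DB; here we just record the action.
    sink.append(("indexed", event["doc"]))

@on("agent.reply")
async def forward_reply(event, sink):
    # e.g. publish to Pub/Sub; here we just record the action.
    sink.append(("forwarded", event["content"]))

async def dispatch(events, sink):
    # In production, `events` would be `client.events()` from the SDK.
    for event in events:
        handler = HANDLERS.get(event["event"])
        if handler:
            await handler(event, sink)

sink = []
asyncio.run(dispatch(
    [{"event": "document.attach", "doc": "spec.pdf"},
     {"event": "agent.reply", "content": "done"},
     {"event": "model.done"}],   # no handler registered: silently skipped
    sink,
))
```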
Design Philosophy
The SDK follows three principles:
- Async First — built entirely on asyncio
- Zero Secrets — uses Workload Identity exclusively
- Agentic Friendly — integrates with frameworks like LangChain or CrewAI