Raghava Chellu

Bringing Async MCP to Google Cloud Run — Introducing cloudrun-mcp

When you design distributed AI or agentic workloads on Google Cloud Run, you often juggle three recurring problems:

  • How to authenticate workloads securely
  • How to maintain long-lived, event-driven sessions
  • How to stream model context data efficiently without blocking threads

cloudrun-mcp solves all three in one lightweight Python SDK.

What is MCP (Model Context Protocol)?

MCP (Model Context Protocol) is an emerging open standard for exchanging context between AI models, tools, and environments.

Think of it as “WebSockets for AI knowledge.”

Instead of hardcoding API calls, your model connects to an MCP server and streams structured events such as:

  • context.create
  • document.attach
  • agent.reply

For developers deploying AI agents on Cloud Run, GKE, or hybrid workloads, an async client is essential for scalability.

Introducing cloudrun-mcp

Async MCP (Model Context Protocol) client for Cloud Run.

Built by Raghava Chellu (February 2026), cloudrun-mcp brings:

  • First-class async streaming
  • Automatic Cloud Run authentication
  • Agentic-AI-friendly APIs

to your production workloads.

How It Works

Under the hood:

  • The client uses aiohttp to maintain an HTTP/1.1 keep-alive streaming session.
  • Inside Cloud Run, it queries the metadata service to obtain a signed JWT:
http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience=<your-audience>
  • Each event from the MCP server arrives as a Server-Sent Event (SSE).
  • The SDK yields events as a Python async iterator, ready for real-time AI reasoning loops.
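That last step, turning a stream of SSE lines into a Python async iterator, can be sketched in a few lines. This is a simplified, hypothetical illustration (the `sse_events` and `fake_stream` names are mine, not the SDK's internals); in production the line source would be an aiohttp streaming response body rather than a stub:

```python
import asyncio
import json

async def sse_events(lines):
    """Parse raw SSE lines into event dicts.

    `lines` is any async iterable of text lines, e.g. the body of an
    aiohttp streaming response. Only `data:` fields are handled here;
    a full parser would also track `event:` and `id:` fields.
    """
    async for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])

async def fake_stream():
    # Stand-in for an aiohttp response body iterator.
    for raw in ['data: {"event": "context.create", "status": "ok"}',
                "",  # SSE events are separated by blank lines
                'data: {"event": "model.done"}']:
        yield raw

async def collect():
    return [event async for event in sse_events(fake_stream())]

print(asyncio.run(collect()))
```

Because `sse_events` is itself an async generator, downstream code consumes it with plain `async for`, exactly like `client.events()` in the usage example below.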

Installation

pip install cloudrun-mcp

Requirements

  • Python ≥ 3.10
  • Deployed on GCP (Cloud Run / GKE / GCE) with metadata-server access

Usage Example

import asyncio
from cloudrun_mcp import MCPClient

async def main():
    client = MCPClient(base_url="https://your-mcp-server.run.app")

    async for event in client.events():
        print(event)

asyncio.run(main())

Typical Output Stream

{"event":"context.create","status":"ok"}
{"event":"model.response","content":"42"}
{"event":"model.done"}

That’s it — you’ve connected an async agent running on Cloud Run to an MCP backend and are receiving real-time context updates.
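Since each line in that stream is a standalone JSON object, a consumer can route events by their `event` field. Here is a minimal dispatch sketch; the `dispatch` helper and handler table are illustrative, not part of the SDK:

```python
import json

def dispatch(raw_line, handlers):
    """Route one JSON event line to a handler keyed by its `event` field."""
    event = json.loads(raw_line)
    handler = handlers.get(event["event"], lambda e: None)  # ignore unknown events
    return handler(event)

handlers = {
    "model.response": lambda e: e["content"],
    "model.done": lambda e: "done",
}

print(dispatch('{"event":"model.response","content":"42"}', handlers))  # 42
```

A real agent would register coroutines instead of lambdas, but the routing idea is the same.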

Why Async MCP Matters

AI workloads are evolving from simple request-response APIs to long-running reasoning graphs.

Synchronous I/O becomes a bottleneck.

cloudrun-mcp leverages Python’s asyncio to keep event loops responsive across:

  • Streaming token generation
  • Function-calling orchestration
  • Multi-model chains

It’s especially powerful for Agentic AI, where orchestrators consume continuous model context (tool outputs, planning updates, memory events).
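The "responsive event loop" claim is easy to see in miniature: because every `await` yields control, several streams share one thread instead of blocking each other. The snippet below is a generic asyncio illustration (the `stream_tokens` name and the planner/tools labels are invented for the example):

```python
import asyncio

async def stream_tokens(name, tokens, log):
    # Each coroutine yields at every await, so multiple streams
    # interleave cooperatively on a single thread.
    for tok in tokens:
        await asyncio.sleep(0)  # stand-in for awaiting network I/O
        log.append(f"{name}:{tok}")

async def main():
    log = []
    await asyncio.gather(
        stream_tokens("planner", ["step1", "step2"], log),
        stream_tokens("tools", ["call", "result"], log),
    )
    return log

print(asyncio.run(main()))
```

With synchronous I/O, the "tools" stream could not make progress until "planner" finished; under asyncio the two interleave, which is what keeps long reasoning loops responsive.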

Authentication Deep Dive

The SDK automatically:

  • Discovers the metadata endpoint
  • Retrieves an ID token targeting your MCP server
  • Injects it into request headers
Authorization: Bearer <token>
  • Refreshes tokens every ~55 minutes

No OAuth flows.
No key.json files.
Perfect for production micro-agents.
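The metadata endpoint is only reachable from inside GCP, but the refresh logic itself is straightforward to sketch. The `TokenCache` below is a hypothetical illustration of the caching pattern described above, with the HTTP fetch injected so it runs anywhere; it is not the SDK's actual code. In production, `fetch` would GET the metadata identity URL with the `Metadata-Flavor: Google` header:

```python
import time

REFRESH_INTERVAL = 55 * 60  # refresh ~5 minutes before the 1-hour expiry

class TokenCache:
    """Cache an ID token and re-fetch it every ~55 minutes.

    `fetch` is any callable returning a fresh token string;
    `clock` is injectable so the refresh logic is testable.
    """
    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def token(self):
        now = self._clock()
        if self._token is None or now - self._fetched_at >= REFRESH_INTERVAL:
            self._token = self._fetch()
            self._fetched_at = now
        return self._token

    def auth_header(self):
        return {"Authorization": f"Bearer {self.token()}"}
```

Injecting the clock and the fetcher keeps the 55-minute policy in one place and lets the refresh behavior be verified without a metadata server.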

Streaming with Back-Pressure Control

async for event in client.events(buffer=32):
    await handle_event(event)
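The `buffer` argument presumably bounds an internal event queue. The standard-library way to get that behavior is a bounded `asyncio.Queue`, where a full buffer makes the producer wait until the consumer catches up. A self-contained sketch under that assumption (the producer/consumer names are illustrative):

```python
import asyncio

async def producer(queue, n):
    # With maxsize set, put() suspends once the buffer is full,
    # so a slow consumer throttles the producer: back-pressure.
    for i in range(n):
        await queue.put(i)
    await queue.put(None)  # sentinel: stream finished

async def slow_consumer(queue):
    seen = []
    while (item := await queue.get()) is not None:
        await asyncio.sleep(0.001)  # simulate slow event handling
        seen.append(item)
    return seen

async def main():
    queue = asyncio.Queue(maxsize=32)  # mirrors events(buffer=32)
    _, seen = await asyncio.gather(producer(queue, 100), slow_consumer(queue))
    return seen

print(len(asyncio.run(main())))
```

Without the `maxsize` bound, a fast MCP server could grow the queue without limit; with it, memory use stays capped at roughly `buffer` undelivered events.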

Typical Deployment Pattern

[MCP Clients] <--SSE--> [cloudrun-mcp SDK] <--Auth--> [Cloud Run Service]
         \
          ↳ [Agent Processors / Vector DB / PubSub Pipelines]

cloudrun-mcp acts as the async bridge between Cloud identity and AI reasoning streams.

Real-World Use Cases

🔹 Event-Driven AI Agents

Agents listening to MCP streams and triggering workflows automatically.

🔹 LLM Orchestration Pipelines

Streaming intermediate reasoning steps to dashboards.

🔹 IoT Telemetry Ingestion

Continuous SSE device streams pushed to Pub/Sub.

🔹 Hybrid Edge Inference

Bridge local MCP hubs with Cloud Run decision services.

Design Philosophy

The SDK follows three principles:

  • Async First — built entirely on asyncio
  • Zero Secrets — uses Workload Identity exclusively
  • Agentic Friendly — integrates with frameworks like LangChain or CrewAI
