Cedric Sebastian

The Most Underrated Infrastructure Shift of the Year: Google Cloud’s New Serverless

Google Cloud NEXT '26 Challenge Submission

This is a submission for the Google Cloud NEXT Writing Challenge

Topic: Cloud Run’s new serverless primitives—Cloud Run Instances, Cloud Run Sandboxes, and Ephemeral Disk Storage—announced at Google Cloud Next ’26, plus native integration with the Gemini Enterprise Agent Platform via the Model Context Protocol (MCP).

Format: Opinion Piece


Introduction: The Announcement Nobody Is Talking About

Google Cloud Next ’26 produced 260 announcements [1]. Predictably, the headlines went to the Gemini Enterprise Agent Platform—the rebranded, expanded successor to Vertex AI—alongside eighth-generation TPUs, the Agentic Data Cloud, and the Wiz-powered Agentic Defense story [2]. Thomas Kurian’s keynote framed the entire conference around a singular, aggressive thesis: the agentic enterprise is here, and Google Cloud is its operating system [6].

Buried beneath that flashy narrative, however, was a set of infrastructure announcements that will easily outlast the keynote buzz. Google quietly introduced three new Cloud Run primitives that fundamentally redefine what serverless means on GCP. Alongside these, the platform revealed a tight, native integration with the Gemini Enterprise Agent Platform via the Model Context Protocol (MCP) [6]. Together, these updates transform Cloud Run from a passive HTTP host into a stateful, secure, agent-first compute fabric.

The developer community has largely ignored these primitives in favor of the Gemini spectacle. That is a massive mistake. This evolution addresses the single greatest barrier to production-grade AI agents—trustworthy, isolated, and persistent execution—in a way no hyperscaler has delivered before. If developers overlook these primitives now, they will spend the next two years retrofitting architectures that could have been built correctly from day one.


The Problem: Why “Traditional” Serverless Failed the Agentic Era

Before examining the new primitives, we have to understand the architectural gap they fill. Cloud Run established itself as the gold standard for stateless, request-driven compute, abstracting away the Kubernetes control plane while preserving Knative’s operational simplicity. But this model was designed for a world of synchronous HTTP requests, not for AI agents that:

  • Run continuously for hours.
  • Maintain complex conversational state.
  • Execute dynamically generated, untrusted code.
  • Demand persistent, local scratch space.

The industry workarounds had become folklore. Developers pinged their own endpoints to keep containers warm. They massively over-provisioned RAM because the local filesystem was an in-memory tmpfs—meaning if you staged a 2 GB dataset, you consumed 2 GB of your container’s memory, often triggering an OOM (Out of Memory) kill.

For anything more complex, teams begrudgingly fell back to GKE clusters or VMs. These weren’t edge cases; they were the default, broken patterns for agentic workloads. One Chinese developer perfectly summarized the prevailing anxiety in the community:

“Every time you give an agent real tool permissions (not a toy demo), you feel an inescapable, primal fear.” [8]

Giving an LLM the keys to a shell and hoping it behaves isn’t security; it’s a prayer. This is exactly the crisis Google’s new Cloud Run primitives solve.


The Four Primitives That Change Serverless

Google announced four interconnected capabilities at Next ’26. Here is a breakdown of what they are and why they matter.

The Architectural Shift at a Glance

| Capability | Traditional Serverless Model | Next '26 Cloud Run Primitives |
| --- | --- | --- |
| Compute | Scale-to-zero, request-driven | **Instances:** guaranteed, long-lived background processing |
| Security | Basic container-level isolation | **Sandboxes:** zero-trust, gVisor-backed micro-VMs |
| Storage | In-memory tmpfs (the "RAM tax") | **Ephemeral Disk:** dedicated NVMe block storage |
| Orchestration | Custom API middleware | **MCP Integration:** native, IAM-authenticated agent scaling |

1. Cloud Run Instances (Preview): Guaranteed Baseline Compute

Historically, Cloud Run’s scale-to-zero model left a frustrating gap for architectures requiring continuous background processing or persistent daemons. Cloud Run Instances bridge that gap by provisioning long-lived, dedicated containers that do not scale to zero [3].

From a systems-engineering perspective, this is the most critical primitive. It provides a native serverless abstraction for continuous event backbones without relying on HTTP concurrency hacks. The official blog demonstrated a single command to provision such an instance with a Cloud Storage volume mount:

```shell
gcloud run instances create \
  --image alpine/openclaw:latest \
  --port 18789 \
  --memory 4Gi \
  --default-url \
  --add-volume mount-path=/home/node/.openclaw,type=cloud-storage,bucket=$BUCKET_NAME
```

Developers no longer need to choose between the operational simplicity of serverless and the continuous persistence requirements of stateful workloads.
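A service deployed as an Instance typically runs a blocking worker loop rather than an HTTP handler. Here is a minimal, local sketch of that shape using Python's standard library; the queue, handler, and stop event are illustrative, not part of any Cloud Run API:

```python
import queue
import threading

def process(event: dict) -> str:
    # Placeholder business logic; a real daemon would call your own code here.
    return f"handled {event['id']}"

def worker_loop(events: "queue.Queue[dict]", results: list, stop: threading.Event) -> None:
    # Long-lived loop: an Instance never scales to zero, so blocking here is safe.
    while not stop.is_set():
        try:
            event = events.get(timeout=0.1)
        except queue.Empty:
            continue
        results.append(process(event))
        events.task_done()

if __name__ == "__main__":
    events: "queue.Queue[dict]" = queue.Queue()
    results: list = []
    stop = threading.Event()
    t = threading.Thread(target=worker_loop, args=(events, results, stop), daemon=True)
    t.start()
    for i in range(3):
        events.put({"id": i})
    events.join()  # block until every queued event has been processed
    stop.set()
    t.join()
    print(results)  # ['handled 0', 'handled 1', 'handled 2']
```

The same loop that would be an anti-pattern on request-driven Cloud Run (nothing keeps the container alive between requests) becomes the natural program shape on an Instance.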

2. Cloud Run Sandboxes (Preview): Zero-Trust Execution

When building systems that dynamically generate and execute code, you are dealing with fundamentally untrusted workloads. Cloud Run Sandboxes expose gVisor-backed, micro-VM isolation as a configurable developer primitive, allowing you to spin up execution environments with granular egress and syscall restrictions [4].

This is the exact same isolation technology Google uses to protect Gemini itself. The performance characteristics are staggering: 300 sandboxes launched per second per cluster, sub-second time-to-first-instruction, and stateful session persistence [5]. Prior approaches forced a trade-off: sacrifice isolation (shared storage) or sacrifice continuity (reconstructing the environment for every call). Cloud Run Sandboxes maintain strict boundaries while preserving session state—acting as a "memory-equipped prison" for autonomous code execution.
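Since the Sandboxes API is preview-only, here is the dispatch pattern sketched with a plain subprocess as a local stand-in. To be clear: a subprocess bounds runtime and crashes but is not a gVisor-grade security boundary; it only illustrates the shape of handing untrusted code to an isolated executor with a hard timeout:

```python
import subprocess
import sys

def run_untrusted(code: str, timeout_s: float = 2.0) -> tuple:
    """Run generated code in a separate interpreter process with a hard timeout.

    NOTE: this is a local illustration only. A subprocess does not provide the
    syscall filtering or network-egress controls of a gVisor micro-VM.
    """
    # "-I" puts the child interpreter in isolated mode: it ignores environment
    # variables and the user's site-packages.
    proc = subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return proc.returncode, proc.stdout.strip()

rc, out = run_untrusted("print(sum(range(10)))")
print(rc, out)  # 0 45
```

With Cloud Run Sandboxes, the orchestrator keeps this same dispatch shape but delegates the isolation boundary to the platform instead of the local OS.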

3. Ephemeral Disk Storage (Preview): Breaking the RAM Tax

This primitive will immediately improve the quality of life for developers doing data-intensive work. Previously, staging large datasets meant paying a massive "RAM tax" due to the tmpfs filesystem.

Ephemeral Disk Storage attaches dedicated, local, high-speed block storage (NVMe-backed) directly to your instances [3]. This storage dies with the instance, but crucially, it does not consume your memory allocation. A Kafka consumer buffering gigabytes of streaming data can now write to local NVMe rather than holding it in memory, avoiding OOM kills entirely [4].
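The consumer-side change this enables is small: append records to a file on local disk instead of an in-memory list. A runnable sketch, where a temporary file stands in for the ephemeral NVMe mount (the actual mount path depends on your configuration):

```python
import tempfile

class DiskBuffer:
    """Spill incoming records to local disk instead of holding them in RAM.

    On Cloud Run the backing file would live on the Ephemeral Disk mount;
    a temporary file stands in here so the sketch runs anywhere.
    """
    def __init__(self):
        self._file = tempfile.TemporaryFile()

    def append(self, record: bytes) -> None:
        # Memory footprint stays flat no matter how many records arrive.
        self._file.write(record + b"\n")

    def drain(self) -> list:
        # Flush, read everything back, and truncate for the next batch.
        self._file.flush()
        self._file.seek(0)
        records = [line.rstrip(b"\n") for line in self._file]
        self._file.seek(0)
        self._file.truncate()
        return records

buf = DiskBuffer()
for i in range(3):
    buf.append(f"record-{i}".encode())
print(buf.drain())  # [b'record-0', b'record-1', b'record-2']
```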

4. Gemini Enterprise Agent Platform Integration via MCP (Preview)

Google tightly integrated Cloud Run with the Gemini Enterprise Agent Platform using the Model Context Protocol (MCP). This allows multi-agent systems to securely interface with internal infrastructure without building custom authentication middleware; the transport layer is natively authenticated via Google Cloud IAM [4].

This elevates Cloud Run from a mere hosting target to a cognitive capability. An agent equipped with an MCP-mapped skill can dynamically act as its own platform engineer—provisioning an Instance for a background daemon or a Sandbox for safe code execution on the fly.
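MCP itself is JSON-RPC 2.0 on the wire, with tool invocations expressed as `tools/call` requests. A sketch of constructing one such request (the tool name and arguments are hypothetical; IAM authentication happens at the transport layer and is not shown):

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must be unique per connection

def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request, as MCP defines it."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)

# Hypothetical tool exposed by a Cloud Run-hosted MCP server:
msg = mcp_tool_call("provision_sandbox", {"memory": "2Gi", "egress": "deny-all"})
print(msg)
```

The point is that the agent-to-infrastructure contract is just structured messages; what Next '26 adds is a managed, IAM-authenticated transport underneath them, so none of the auth plumbing appears in application code.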


Production Patterns: What This Actually Enables

Architectural features are only as valuable as the systems they enable. Here are four production patterns these primitives unlock [4]:

  • Continuous Stateful Ingestion: Combine Instances and Ephemeral Disk to deploy a horizontally scalable Kafka consumer group entirely on serverless. Instances maintain uninterrupted connections; NVMe buffers the payloads.
  • Autonomous Zero-Trust SOC Analyst: Security agents often need to execute dynamic scripts to analyze suspicious binaries. Cloud Run Sandboxes confine that execution to an isolated gVisor micro-VM with restricted network egress, keeping the blast radius at zero.
  • Agentic Firewall Configuration Loops: An Instance can run continuously to tail telemetry logs. Upon detecting an anomaly, it triggers a Gemini Agent via MCP, which securely navigates the firewall architecture via IAM and updates blocklists in real time.
  • The Agentic Platform Engineer: A local agent equipped with an MCP-mapped Skill can dynamically orchestrate its own Instances, Sandboxes, and Ephemeral Disks to solve complex problems, provisioning its own compute resources in real time.
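The telemetry-watcher pattern in the third bullet reduces to a filter-and-trigger loop. A schematic sketch, where the anomaly rule and trigger callback are placeholders for real detection logic and a real MCP call:

```python
def watch(lines, is_anomaly, trigger):
    """Scan a telemetry stream and fire the trigger for each anomalous line.

    In production this loop would run indefinitely inside a Cloud Run Instance,
    tailing live logs, and `trigger` would invoke a Gemini agent over MCP.
    """
    fired = []
    for line in lines:
        if is_anomaly(line):
            fired.append(trigger(line))
    return fired

logs = ["GET /health 200", "POST /admin 403", "GET / 200"]
alerts = watch(logs, lambda l: " 403" in l, lambda l: f"blocklist-update: {l}")
print(alerts)  # ['blocklist-update: POST /admin 403']
```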

The Critique: What’s Missing and What to Watch

An honest evaluation requires acknowledging the gaps:

  • Preview Gating: The most valuable primitives are preview-only and limited to select customers. Developers must weigh the risk of betting on preview APIs against immediate architectural benefits.
  • Vendor Coupling Risk: The MCP integration is deeply tied to Google Cloud IAM. While the A2A (Agent-to-Agent) protocol and ADK (Agent Development Kit) are open-source, the managed hosting layer and Agent Registry are heavily coupled to GCP [5].
  • Observability Maturity: Debugging what happened inside an ephemeral gVisor container requires a mature observability pipeline. While ADK 1.0’s OpenTelemetry integration is encouraging, the sandbox-specific tracing story needs clearer documentation.
  • The Competitive Landscape: AWS Lambda and Azure Container Apps haven't responded with equivalent primitives yet—but they will. Google’s gVisor-backed serverless sandboxing is a massive first-mover advantage, but it is a window of opportunity, not an impenetrable moat.

Conclusion: The Foundation, Not the Fireworks

The Gemini Enterprise Agent Platform is the user-facing story of Next ’26. But platforms need foundations, and Cloud Run’s new primitives are precisely that: the infrastructure layer that makes agentic workloads production-grade rather than a flimsy proof-of-concept.

My recommendation to the developer community is straightforward: look past the keynote pyrotechnics. Experiment with Cloud Run Instances for your stateful workloads. Evaluate Cloud Run Sandboxes as the execution boundary for any agent running untrusted code. Plan your capacity models around Ephemeral Disk rather than RAM over-provisioning.

The agentic era demands infrastructure that is stateful, secure, and highly composable. Google shipped exactly that at Next ’26. The rest of us just haven’t noticed yet.


References

[1] Google Cloud, “260 things we announced at Google Cloud Next ’26 – a recap,” Google Cloud Blog, Apr. 25, 2026. [Online]. Available: https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2026-wrap-up

[2] Google Cloud, “Welcome to Google Cloud Next ‘26,” Google Cloud Blog, Apr. 23, 2026. [Online]. Available: https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26

[3] S. Giannini and B. Runkle, “What’s new in Cloud Run at Next ‘26,” Google Cloud Blog, Apr. 23, 2026. [Online].

[4] Google Cloud Developer Advocates, “Stateful Serverless: The Architectural Evolution of Cloud Run for the Agentic Era,” Medium, Apr. 28, 2026. [Online]. Available: https://medium.com/google-cloud/architectural-evolution-of-cloud-run-for-the-agentic-era-28af995cecb0

[5] R. Bobade, “I Looked Past the Keynote Hype: Why A2A + ADK Is the Real Story of Google Cloud NEXT ’26,” dev.to, Apr. 23, 2026. [Online].

[6] M. Gerstenhaber and M. Bachman, “Introducing Gemini Enterprise Agent Platform, powering the next wave of agents,” Google Cloud Blog, Apr. 23, 2026. [Online]. Available: https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise-agent-platform

[7] Google Cloud, “Day 1 at Google Cloud Next ‘26 recap,” Google Cloud Blog, Apr. 23, 2026. [Online]. Available: https://cloud.google.com/blog/topics/google-cloud-next/next26-day-1-recap

[8] “每秒300个沙盒:Google把AI代理的‘定时炸弹’拆了,” 163.com, Apr. 30, 2026. [Online]. Available: https://www.163.com/dy/article/KRNLIEBV05561FZW.html
