Afjal Quraishi
The Most Underrated Announcement at Google Cloud Next '26: GKE Agent Sandbox

Google Cloud NEXT '26 Challenge Submission

Google Cloud Next '26 dropped 260 announcements in two days. The Gemini Enterprise Agent Platform got the keynote spotlight. The 8th-gen TPUs got the infrastructure crowd buzzing. Workspace Intelligence is already in everyone's LinkedIn posts.

But about 56 minutes into the Developer Keynote, Ankur Kotwal said something that quietly validated a concern every team building agentic apps has been sitting with: agents need boundaries. Secure execution environments. A place where LLM-generated code can run without touching your credentials, your internal services, or your infrastructure.

And then — quietly tucked inside the Kubernetes section — GKE Agent Sandbox went GA.

I think it's the most important thing they announced for developers actually building agentic AI systems right now. Let me show you why, and walk you through what it actually looks like in code.


The Problem Nobody Talks About Loudly Enough

Here's a scenario every team building agentic apps eventually hits:

Your AI agent decides, autonomously, to write and execute some Python to answer a user's question. Maybe it's analyzing a CSV, running a calculation, scraping some data. The LLM generates the code. Your system runs it — on your infrastructure, with your credentials potentially in scope.

If that agent is compromised, or just confidently wrong and destructive, it has the keys to your kingdom.

This isn't hypothetical. It's the current default state of most "agentic" apps in production today, and it's a serious problem once you stop celebrating what agents can do and start asking where they should be allowed to do it.

The industry has been so busy building the reasoning layer that it's barely started engineering the execution layer.


What GKE Agent Sandbox Actually Is

GKE Agent Sandbox is a managed GKE add-on that gives you isolated, stateful, single-replica environments specifically designed for running untrusted, LLM-generated code. It's based on the open-source kubernetes-sigs/agent-sandbox project (a real Kubernetes SIG effort), and on Google Cloud it runs with managed gVisor providing kernel-level isolation — meaning untrusted code can't escape to the host OS, full stop.

The key differentiators from "just run this in a container":

The Claim Model separates requesting a sandbox from managing one. Your agent logic creates a SandboxClaim referencing a SandboxTemplate; the controller handles provisioning (there's a sketch of what that claim looks like just after these four points). Your AI orchestrator doesn't need to be a Kubernetes expert.

Warm Pools pre-provision sandboxes so they're claimable in under a second, eliminating cold-start latency — the tradeoff that historically made "safe" execution impractical at scale.

Pod Snapshots (limited preview) let you checkpoint and restore full sandbox state. Idle sandboxes can be snapshotted, suspended, and resumed exactly where they left off — which has significant cost implications for long-lived agent sessions.

Default Deny networking means every sandbox is air-gapped by default. Compromised sandboxes can't reach your internal services or the GKE control plane unless you explicitly allow it in your SandboxTemplate.
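To make the Claim Model concrete, here's roughly what a claim looks like. This is a sketch rather than something lifted from the docs: I'm assuming the claim references a template the same way the SandboxWarmPool shown later in this post does, so double-check the exact field names against the CRD reference before using it.

# sandbox-claim.yaml (illustrative; verify field names against the CRD reference)
apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxClaim
metadata:
  name: session-4f2a                 # hypothetical per-session claim name
  namespace: default
spec:
  sandboxTemplateRef:                # assumed to mirror SandboxWarmPool's template reference
    name: python-runtime-template

The point of the pattern is that a manifest like this is all your orchestrator ever has to care about; the controller does the Pod provisioning behind it.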


What It Actually Looks Like

Here's where I'll go beyond the docs summary and show you the actual setup. Start by defining your SandboxTemplate and SandboxWarmPool in a single manifest:

# sandbox-template-and-pool.yaml
apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxTemplate
metadata:
  name: python-runtime-template
  namespace: default
spec:
  podTemplate:
    spec:
      runtimeClassName: gvisor          # kernel-level isolation
      automountServiceAccountToken: false  # no credentials in scope
      securityContext:
        runAsNonRoot: true
      nodeSelector:
        sandbox.gke.io/runtime: gvisor
      tolerations:
        - key: "sandbox.gke.io/runtime"
          value: "gvisor"
          effect: "NoSchedule"
      containers:
        - name: python-runtime
          image: registry.k8s.io/agent-sandbox/python-runtime-sandbox:v0.1.0
          ports:
            - containerPort: 8888
          resources:
            limits:
              memory: "1Gi"             # required
          securityContext:
            capabilities:
              drop: ["ALL"]             # required
---
apiVersion: extensions.agents.x-k8s.io/v1alpha1
kind: SandboxWarmPool
metadata:
  name: python-sandbox-warmpool
  namespace: default
spec:
  replicas: 2
  sandboxTemplateRef:
    name: python-runtime-template

Notice the details that matter for security: automountServiceAccountToken: false means the pod gets no Kubernetes service account token (and therefore no credentials) by default, drop: ["ALL"] strips every Linux capability, and runtimeClassName: gvisor routes the workload's syscalls through gVisor's user-space kernel, so untrusted code never talks to the host kernel directly.

Apply it, deploy a Sandbox Router, and now your agent code looks like this:

from k8s_agent_sandbox import SandboxClient
from k8s_agent_sandbox.models import SandboxLocalTunnelConnectionConfig

# Connect to the Sandbox Router through a local tunnel (the dev-friendly default;
# see the production caveats later in this post).
client = SandboxClient(
    connection_config=SandboxLocalTunnelConnectionConfig()
)

# Claim an isolated sandbox built from the template defined above.
sandbox = client.create_sandbox(
    template="python-runtime-template",
    namespace="default"
)

try:
    # Run the (potentially LLM-generated) command inside the gVisor-isolated Pod.
    result = sandbox.commands.run("python3 analyze.py --input data.csv")
    print(result.stdout)
finally:
    # Tear the sandbox down once the work is done.
    sandbox.delete()

That's it. Your agent calls sandbox.commands.run(), the code executes in a gVisor-isolated Pod claimed from the warm pool in sub-second time, and when it's done, the sandbox is gone. The Sandbox Router handles all the traffic routing. Your orchestration layer never touches a Pod directly.

The pattern is clean: your agent reasons, the sandbox executes, network policies air-gap everything.
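In a real agent, you'd typically wrap that pattern in a single tool function the model can call. Here's a minimal sketch that reuses the client calls from the snippet above; the function name and how you register it with your agent framework are my own, not part of the SDK:

from k8s_agent_sandbox import SandboxClient
from k8s_agent_sandbox.models import SandboxLocalTunnelConnectionConfig

def execute_in_sandbox(command: str) -> str:
    """Tool exposed to the agent: run one command in a fresh, isolated sandbox."""
    client = SandboxClient(connection_config=SandboxLocalTunnelConnectionConfig())
    # Claim an isolated environment built from the template above.
    sandbox = client.create_sandbox(
        template="python-runtime-template",
        namespace="default",
    )
    try:
        # The model-generated command runs inside the gVisor-isolated Pod,
        # with no service account token and default-deny networking.
        result = sandbox.commands.run(command)
        return result.stdout
    finally:
        # Always release the sandbox, even if the command blows up.
        sandbox.delete()

Register execute_in_sandbox as a tool in whatever framework you're using, and the LLM never gets anything more powerful than "run a command in a disposable, air-gapped box."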


How It Compares to the Alternatives

Before I tell you this is the right architecture, let me acknowledge what teams were doing before this:

| Approach | Isolation | Cold Start | Cloud-native | Self-hostable |
| --- | --- | --- | --- | --- |
| GKE Agent Sandbox | Kernel (gVisor) | <1s (warm pool) | ✅ Full GKE integration | ✅ (open-source SIG) |
| E2B | VM-level microVMs | ~1–3s | ❌ | Managed service only |
| Modal | Container + network | ~2–5s | Partial | — |
| Cloud Run Jobs | Container | ~3–8s | — | — |
| Plain Kubernetes Pod | Container only | ~10–30s | — | — |

E2B and Modal are both excellent products — they pioneered the category of purpose-built AI execution sandboxes and deserve credit for that. But they're fully managed, proprietary, and not integrated into your existing GKE cluster. If you're already running a GKE-based agentic stack, GKE Agent Sandbox gives you the same isolation story without adding another vendor dependency, and without giving up the flexibility to run it yourself.

Cloud Run Jobs are a common "good enough" alternative — but they're stateless, have no warm pool, and give you container-level isolation, not kernel-level isolation.


The Open Source Angle That Makes This Durable

This is something the keynote didn't dwell on, but it matters a lot: this isn't Google locking you into a proprietary runtime.

kubernetes-sigs/agent-sandbox is a real Kubernetes SIG project. The CRD-based API (SandboxTemplate, SandboxClaim, SandboxWarmPool) is designed to be vendor-neutral, with runtime support for both gVisor and Kata Containers. You can deploy the open-source controller on a non-GKE cluster — on-prem, on another cloud, in a hybrid deployment — right now.

What you get from the managed GKE version that you don't get self-hosting:

  • Automatic controller upgrades and security patches
  • Native integration with GKE's Pod Snapshots feature
  • The 300-sandboxes-per-second scale on Axion N4A instances
  • First-class support in the GKE console

The abstraction you build your agent against is portable. The engineering investment you make in this architecture isn't cloud-locked. For teams with cloud sovereignty concerns or existing hybrid Kubernetes deployments, that's not a minor footnote — it's the entire story.


My Honest Critique

I don't want to just be a hype machine, so here's where the friction actually is:

The gVisor compatibility gap is real. Not every Python workload runs cleanly on gVisor. Syscalls that hit unimplemented kernel features will fail silently or with confusing errors. Before you commit to this architecture, validate your specific libraries (especially anything using ctypes, low-level networking, or FUSE). The official docs are honest about this but don't give you a concrete compatibility matrix.
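One cheap way to de-risk this is to run your dependency imports inside a sandbox before you wire anything into the agent. A rough sketch, again reusing the client API from earlier and assuming commands.run() executes a shell command string:

from k8s_agent_sandbox import SandboxClient
from k8s_agent_sandbox.models import SandboxLocalTunnelConnectionConfig

client = SandboxClient(connection_config=SandboxLocalTunnelConnectionConfig())
sandbox = client.create_sandbox(template="python-runtime-template", namespace="default")
try:
    # Swap in the libraries your agent (and your sandbox image) actually use.
    # 2>&1 folds import errors into stdout so gVisor-related failures are visible.
    result = sandbox.commands.run('python3 -c "import numpy, pandas, requests" 2>&1')
    print(result.stdout or "imports OK")
finally:
    sandbox.delete()

It won't catch every syscall edge case, but it surfaces the obvious breakage before you're debugging it through an agent trace.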

The Sandbox Router needs careful production configuration. The quick-start guide uses kubectl port-forward for the tunnel, which is explicitly marked as a dev-only approach. In production, you need a proper ingress setup in front of the Router. The docs are thin on what "careful configuration" actually means at scale — specifically, how to handle Router failures without dropping in-flight sandbox sessions.

Multi-tenancy guidance is missing. The docs do the happy path (one agent, one sandbox) very well. But if you need to isolate sandboxes across different end-users — so that user A's agent can never reach user B's sandbox — the guidance is almost absent. The right answer is probably separate namespaces with strict NetworkPolicy and RBAC scoping the SandboxClaim creation, but you're left to figure that out yourself. This is the gap I'd most want Google to fill in before recommending this for multi-tenant production systems.
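If you go the namespace-per-tenant route, the RBAC half of it is at least standard Kubernetes. A rough sketch (the sandboxclaims resource name assumes the usual lowercase-plural convention for the CRDs shown earlier, and the service account name is made up):

# tenant-a-rbac.yaml (sketch; verify the resource name against the installed CRDs)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sandbox-claimer
  namespace: tenant-a                  # one namespace per end-user or tenant
rules:
  - apiGroups: ["extensions.agents.x-k8s.io"]
    resources: ["sandboxclaims"]
    verbs: ["create", "get", "list", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: agent-orchestrator-can-claim
  namespace: tenant-a
subjects:
  - kind: ServiceAccount
    name: agent-orchestrator           # hypothetical SA used by tenant A's agent backend
    namespace: tenant-a
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: sandbox-claimer

Pair it with a per-namespace default-deny NetworkPolicy and you're most of the way there, but this is exactly the reference pattern I'd want Google to publish.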

Pod Snapshots are still limited preview. This feature is the one that makes agent state management genuinely elegant — checkpoint a long-running data analysis session, restore it 10 minutes later without re-running setup. It's not broadly available yet. The warm pool approach is a solid interim, but snapshots are the real unlock.


The Bigger Picture

Google's framing at Next '26 was all about the "agentic era" — AI that doesn't just answer questions but takes actions across systems. That framing only holds up if the execution layer is trustworthy.

GKE Agent Sandbox is the piece of infrastructure that makes the rest of the agentic stack defensible. Without something like it, "autonomous agents" means "autonomous access to your entire cloud environment." With it, you have actual isolation boundaries, actual security controls, and an actual engineering story for running untrusted code at scale.

The TPUs and the Gemini models get the headlines. But GKE Agent Sandbox is the thing I'd actually go build with right now.

Want to try it? The open-source kubernetes-sigs/agent-sandbox repo and the GKE Agent Sandbox docs are the place to start.


Tags: googlecloud kubernetes ai devops security

