The Problem Every Agent Builder Knows
Your agent just generated some Python. Now what? You need to run it. Somewhere. Safely. Without it touching your prod database, your secrets, your other pods, or anything else it wasn't supposed to touch.
So you cobbled something together. Maybe a size-1 StatefulSet with gVisor. Maybe a subprocess with a timeout. Maybe a Docker container you spin up per-request and pray the cold start isn't too painful. It works — mostly. Until it doesn't.
The DIY agent sandbox is one of the most common pieces of technical debt in agentic AI systems right now. GKE Agent Sandbox, GA as of Cloud Next '26, is the opinionated answer to it.
What You're Probably Doing Today
Let's be honest about the DIY path. Here's a typical pattern:
```yaml
# StatefulSet (size 1) + gVisor + manual warm pool
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: agent-sandbox
spec:
  serviceName: agent-sandbox  # plus the headless Service you manage yourself
  replicas: 1  # pray you sized this right
  selector:
    matchLabels:
      app: agent-sandbox
  template:
    metadata:
      labels:
        app: agent-sandbox
    spec:
      runtimeClassName: gvisor
      containers:
        - name: sandbox
          image: my-sandbox:latest
          resources:
            limits:
              cpu: "1"
              memory: 512Mi
# + manual PVC + headless Service + custom lifecycle mgmt
# + warm pool you have to manage yourself
# + no snapshot support; crash = start over
```
This works at one sandbox. At ten it's fine. At a hundred it's a maintenance nightmare. You're writing glue code for provisioning, lifecycle management, networking, and warm pools — none of which is your actual product.
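To make "glue code" concrete, here's a minimal sketch of the leasing logic you end up owning. It uses the official kubernetes Python client, but the labels, names, and pool-tracking approach are illustrative assumptions, not any real product's API:

```python
# DIY glue sketch: lease a "warm" pod from a StatefulSet-backed pool.
# Labels, names, and pool logic are illustrative assumptions.
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()
apps = client.AppsV1Api()

NAMESPACE = "agents"

def lease_sandbox_pod() -> str:
    """Grab an unleased pod and mark it taken.

    Racy under concurrent claims; real code needs locking or a queue.
    """
    pods = core.list_namespaced_pod(
        NAMESPACE, label_selector="app=agent-sandbox,leased!=true"
    )
    if not pods.items:
        # Pool is empty: scale up and eat the multi-minute cold start.
        sts = apps.read_namespaced_stateful_set("agent-sandbox", NAMESPACE)
        apps.patch_namespaced_stateful_set_scale(
            "agent-sandbox", NAMESPACE,
            {"spec": {"replicas": sts.spec.replicas + 1}},
        )
        raise RuntimeError("no warm pod available; scaling up")
    pod = pods.items[0]
    core.patch_namespaced_pod(
        pod.metadata.name, NAMESPACE,
        {"metadata": {"labels": {"leased": "true"}}},
    )
    return pod.metadata.name
```

And that's the happy path; retries, health checks, and reaping leaked leases are still on you.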
What Agent Sandbox Gives You Instead
| DIY Approach | GKE Agent Sandbox |
|---|---|
| StatefulSet + gVisor wired manually | Managed gVisor via SandboxClaim CRD |
| Cold starts of 2–3 min per sandbox | Sub-second via SandboxWarmPool |
| Crash = restart from zero, no state | Pod Snapshots — checkpoint and resume |
| Manual warm pool sizing and mgmt | WarmPool declared, GKE manages it |
| Custom networking + routing code | Sandbox Router handles all traffic |
| No SDK — raw Kubernetes YAML | Python SDK — no YAML in your hot path |
The numbers that matter:
- 300 sandboxes/sec provisioned per cluster
- Sub-second time to first instruction from warm pool
- 90% latency reduction over cold starts
- 30% better price-performance on Axion N4A vs leading competitors
Hands-On Tutorial: Enable GKE Agent Sandbox From Scratch
Level: Intermediate (knows Kubernetes basics)
Time: ~15 minutes
Requirements: GCP project with billing enabled, gcloud CLI, kubectl, Python 3.10+
You'll go from zero to a running, isolated sandbox cluster — with a warm pool ready to claim in under a second. All commands run in Cloud Shell.
Step 1 — Set Your Environment Variables
Open Cloud Shell and define these once. Every command below uses them — no manual substitution needed.
```bash
export PROJECT_ID=$(gcloud config get project)
export CLUSTER_NAME="agent-sandbox-cluster"
export REGION="us-central1"
export CLUSTER_VERSION="1.35.2-gke.1269000"
export NODE_POOL_NAME="agent-sandbox-pool"
export MACHINE_TYPE="e2-standard-2"
```
Note: GKE version 1.35.2-gke.1269000 or later is required. Earlier versions don't support Agent Sandbox.
Step 2 — Create the GKE Standard Cluster
Create the base cluster first. Agent Sandbox gets added via a dedicated node pool — you can't enable it on the default pool.
```bash
gcloud beta container clusters create ${CLUSTER_NAME} \
  --region=${REGION} \
  --cluster-version=${CLUSTER_VERSION}
```
Prefer Autopilot? Use this single command instead — it handles the node pool automatically, then skip straight to Step 5:
```bash
gcloud beta container clusters create-auto ${CLUSTER_NAME} \
  --region=${REGION} \
  --cluster-version=${CLUSTER_VERSION} \
  --enable-agent-sandbox
```
Step 3 — Add a gVisor-Enabled Node Pool
Agent Sandbox requires a dedicated node pool with gVisor enabled and the cos_containerd image type. This is non-negotiable — gVisor won't work on other image types.
```bash
gcloud container node-pools create ${NODE_POOL_NAME} \
  --cluster=${CLUSTER_NAME} \
  --machine-type=${MACHINE_TYPE} \
  --region=${REGION} \
  --image-type=cos_containerd \
  --sandbox=type=gvisor
```
Step 4 — Enable the Agent Sandbox Feature
Now flip the switch that installs the Agent Sandbox controller and registers the CRDs on your cluster.
```bash
gcloud beta container clusters update ${CLUSTER_NAME} \
  --region=${REGION} \
  --enable-agent-sandbox
```
Verify it worked:
```bash
gcloud beta container clusters describe ${CLUSTER_NAME} \
  --region=${REGION} \
  --format="value(addonsConfig.agentSandboxConfig.enabled)"
# Expected output: True
```
✅ If you see True — you're live. The Agent Sandbox controller is running and the SandboxTemplate, SandboxWarmPool, and SandboxClaim CRDs are registered in your cluster.
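As a second check, you can confirm the CRDs themselves landed. The grep is the safe form, since exact resource names can vary by release:

```bash
kubectl get crds | grep sandbox
# Expect entries under the sandbox.gke.io API group
```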
Step 5 — Apply Your SandboxTemplate and WarmPool
Define your runtime blueprint and tell GKE how many pre-warmed sandboxes to keep ready. Save this as sandbox-setup.yaml:
```yaml
apiVersion: sandbox.gke.io/v1
kind: SandboxTemplate
metadata:
  name: python-agent-runtime
spec:
  runtimeClassName: gvisor
  containers:
    - name: runtime
      image: python:3.11-slim
      resources:
        requests: { cpu: "500m", memory: "256Mi" }
        limits: { cpu: "1", memory: "512Mi" }
---
apiVersion: sandbox.gke.io/v1
kind: SandboxWarmPool
metadata:
  name: python-agent-pool
spec:
  template: python-agent-runtime
  size: 5  # 5 pre-warmed sandboxes; adjust to your load
```
Apply it and watch the pool fill up:
```bash
kubectl apply -f sandbox-setup.yaml

# Watch the warm pool fill up
kubectl get sandboxwarmpool python-agent-pool -w
```
Step 6 — Install the Python Client and Run Your First Sandbox
Install the client locally and open a dev tunnel to the Sandbox Router. This is the fastest way to test without setting up Ingress.
```bash
# Install the client
pip install agentic-sandbox-client

# Get credentials for your cluster
gcloud container clusters get-credentials ${CLUSTER_NAME} \
  --region=${REGION}

# Open a dev tunnel to the Sandbox Router
kubectl port-forward svc/sandbox-router-svc 8080:8080
```
Now in a new terminal tab, claim your first sandbox. Save this as test_sandbox.py:
```python
import asyncio

from agent_sandbox import SandboxClient


async def main():
    client = SandboxClient(dev_mode=True)

    # Claim from the warm pool; should be sub-second
    sandbox = await client.claim(template="python-agent-runtime")
    print(f"Sandbox claimed: {sandbox.id}")

    # Run code inside the isolated sandbox
    result = await sandbox.execute(
        "print('Hello from inside gVisor isolation!')"
    )
    print(f"Output: {result.stdout}")

    await sandbox.release()
    print("Sandbox released back to pool.")


asyncio.run(main())
```
Run it:
```bash
python test_sandbox.py
```
✅ Expected output:
```
Sandbox claimed: sandbox-abc123
Output: Hello from inside gVisor isolation!
Sandbox released back to pool.
```
Teardown when done to avoid unexpected charges:
```bash
gcloud container clusters delete ${CLUSTER_NAME} --region=${REGION} --quiet
```
Total time from zero to first sandboxed execution: ~15 minutes. Compare that to the days you'd spend wiring up the DIY equivalent.
The Core Concepts — Fast
1. SandboxTemplate + SandboxClaim
Template is the reusable blueprint — runtime class, resource limits, image. Claim is how your app requests one. Separation of concerns: infra team owns the template, your orchestrator just creates claims.
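Step 5 showed the template side; the claim is deliberately tiny. A minimal sketch of what one looks like in YAML (assuming the spec mirrors the template reference used by SandboxWarmPool above; check the CRD for the authoritative schema):

```yaml
apiVersion: sandbox.gke.io/v1
kind: SandboxClaim
metadata:
  generateName: agent-run-   # hypothetical naming
spec:
  template: python-agent-runtime  # binds the claim to the SandboxTemplate
```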
2. SandboxWarmPool
Declares how many pre-warmed, pre-initialized sandboxes to keep ready. When a claim comes in, it grabs one from the pool instead of cold-starting. This is where sub-second latency comes from.
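Because the pool is declarative, resizing it is just a spec change. For example, ahead of a traffic spike:

```bash
# Bump the pool from 5 to 10 pre-warmed sandboxes
kubectl patch sandboxwarmpool python-agent-pool \
  --type merge -p '{"spec": {"size": 10}}'
```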
3. Sandbox Router
A stable ClusterIP endpoint that routes traffic to the right sandbox pod. In dev mode, tunnel with kubectl port-forward. In prod, your orchestrator talks to the router directly with RBAC or Workload Identity auth.
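In production you'd skip the tunnel and point your orchestrator at the router's in-cluster address. A sketch, assuming the client accepts an explicit endpoint (the endpoint parameter name and the default namespace are assumptions; check the SDK docs):

```python
from agent_sandbox import SandboxClient

# Hypothetical prod wiring: talk to the router Service directly
# instead of a kubectl port-forward tunnel. Parameter name assumed.
client = SandboxClient(
    endpoint="http://sandbox-router-svc.default.svc.cluster.local:8080",
)
```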
The Open Source Angle — Why It Matters Architecturally
GKE Agent Sandbox is a managed wrapper around the kubernetes-sigs/agent-sandbox open-source controller. This is not a detail — it's load-bearing for your architecture decisions.
The SandboxClaim, SandboxTemplate, and SandboxWarmPool CRDs are becoming a vendor-neutral standard under SIG Apps. Build your orchestrator against these primitives today, and you're not locked into GKE. Any cluster that runs the open-source controller speaks the same API.
You're not betting on Google. You're betting on an emerging Kubernetes standard.
Honest Critique — What's Still Missing
Pod Snapshots is still in preview. The resume-from-state story is the most compelling feature for long-running agents, and it's not fully baked yet. The rest of the system is solid, but this is the piece you'll want before committing to the architecture for stateful multi-step agents.
The Python SDK is the only first-class client. If your orchestrator is in Go, TypeScript, or anything else, you're talking raw Kubernetes API for now. Workable, but it pushes complexity back onto you.
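The raw-API path is workable from any language. Here's the shape of it, sketched with Python's official kubernetes client as a stand-in (the group, version, and plural are assumptions read off the sandbox.gke.io/v1 manifests above):

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Create a SandboxClaim straight against the CRD, no SDK involved.
# group/version/plural assumed from the sandbox.gke.io/v1 YAML above.
claim = api.create_namespaced_custom_object(
    group="sandbox.gke.io",
    version="v1",
    namespace="default",
    plural="sandboxclaims",
    body={
        "apiVersion": "sandbox.gke.io/v1",
        "kind": "SandboxClaim",
        "metadata": {"generateName": "agent-run-"},
        "spec": {"template": "python-agent-runtime"},
    },
)
print(claim["metadata"]["name"])
```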
Dev mode uses kubectl port-forward. Fine for local testing, but your dev/prod parity story needs thought. The production path with RBAC/Workload Identity is genuinely different from the tunnel-based dev path.
Bottom Line
If you're running agents that execute untrusted code and you're not using something like this — you have a security incident waiting to happen. The DIY path is not a permanent solution; it's a liability you're carrying.
Agent Sandbox gives you kernel-level isolation, sub-second provisioning, and a clean Python SDK, all backed by an open standard that won't trap you. The snapshots piece isn't fully there yet — but everything else is production-ready today.
The agentic AI era needed proper infrastructure. Not workarounds, not duct tape, not "good enough for now." GKE Agent Sandbox is that infrastructure — and it's available today. Your next agent deserves better than the hack you're currently running. Ship it right.
GKE Agent Sandbox is GA as of Google Cloud Next '26, April 22, 2026. Requires GKE v1.35.2-gke.1269000+.
Open-source controller: github.com/kubernetes-sigs/agent-sandbox
Official docs: cloud.google.com/kubernetes-engine/docs/how-to/agent-sandbox