Isolating (sandboxing) Agents gives you the ability to run agentic workflows in a safe, secure, and governed way. Without isolation, your Agents can access just about anything you can, along with performing any type of web research and API calls.
With sandboxing solving this problem, the next question is "where and how will sandboxes run?", and that's where Agent Substrate comes into play.
In this blog post, you'll learn what Agent Substrate is and how to deploy it on GKE.
Prerequisites
To follow along with this blog post from a hands-on perspective, you will need:
- A GCP account
- A GKE cluster
What Is Agent Substrate?
There are two things that Kubernetes is incredibly good at out of the box:
- Orchestration
- Clustering worker nodes to ensure users have a pool of GPU, CPU, and memory
What isn't out of the box, but can be built on top of k8s, is more efficient hardware resource management, lower latency, and support for Agentic workflows (e.g., running and isolating Agents). However, the Kubernetes primitives (Pods, autoscaling, clustering of Worker Nodes) are still very much needed, so the Agentic era calls for a tool/platform that builds on top of k8s as we know it today: something with its own Control Plane/management layer that still uses what Kubernetes has to offer.
That's where Agent Substrate comes into play.
Under the hood, Substrate uses gVisor (the same runtime used by the Agent Sandbox project from the Kubernetes SIGs), a container sandbox developed by Google that focuses on security, isolation, and efficiency (e.g., not taking up a ton of hardware resources).
Substrate Internals
There are four main parts to Substrate:
- ate-api-server (control plane)
- atenet-router (the Envoy/DNS router)
- valkey (the state store)
- pod-certificate-controller
Alongside these, there are the agent-like "Actors" and the Workers that run them.
You will also see atelet, a per-node agent (a DaemonSet that runs on every worker node). It manages Worker Pods, drives runsc (gVisor) checkpoint/restore, and streams snapshots to and from the GCS bucket that you will create in an upcoming section.
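To make the checkpoint/restore idea more concrete, here's a minimal sketch of what checkpointing a sandboxed container and copying the snapshot to GCS could look like if you did it by hand. This is not atelet's code: the function name, container ID, and paths are hypothetical, runsc global flags such as --root (which must match your runtime's state directory) are omitted, and atelet streams snapshots rather than copying a finished directory.

```shell
#!/bin/sh
# Hypothetical sketch: checkpoint a runsc (gVisor) container, then copy the
# checkpoint image to GCS. atelet automates this; this is NOT atelet's code.

# snapshot_to_gcs CONTAINER_ID LOCAL_DIR BUCKET
snapshot_to_gcs() {
  container_id="$1"
  local_dir="$2"
  bucket="$3"

  mkdir -p "$local_dir" || return 1
  # Save the sandbox's state as a checkpoint image in LOCAL_DIR.
  runsc checkpoint --image-path="$local_dir" "$container_id" || return 1
  # Upload the image directory under gs://BUCKET/CONTAINER_ID/.
  gcloud storage cp --recursive "$local_dir" "gs://${bucket}/${container_id}/"
}
```

Restoring is roughly the reverse: copy the image back down and point runsc restore at it.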
System Components
Each of the four system workloads above mounts a podCertificate volume. The pod certificates give these components (or rather, the Pods running them) auto-issued, auto-rotated TLS certificates that they use for mTLS with each other.
Per Google: Pod Certificates is a native Kubernetes feature that automatically issues short-lived X.509 TLS certificates directly to running Pods. Introduced as an alpha feature in Kubernetes v1.34 and advanced to Beta in v1.35, this capability allows workloads to authenticate to the kube-apiserver and establish mutual TLS (mTLS) with other workloads natively.
Pod Certificates are a hard requirement for Agent Substrate, as they're how Substrate gives each component an auto-rotated per-pod mTLS identity. The pod declares a podCertificate projected volume source, which triggers a PodCertificateRequest, and the signer fulfills it. The kubelet projects (and auto-rotates) the credential bundle into the pod, and that volume must be mounted for the pod to run.
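As a sketch of what declaring a podCertificate projected volume source looks like, the function below prints a minimal Pod manifest. The field names follow the upstream Kubernetes podCertificate projection (check the API reference for your cluster version); the signer name, Pod name, image, and mount path are hypothetical, and Substrate's real manifests may differ.

```shell
#!/bin/sh
# render_pod_cert_pod NAME SIGNER: prints a Pod manifest whose projected volume
# asks the kubelet for a certificate from SIGNER (a hypothetical signer name).
render_pod_cert_pod() {
  name="$1"
  signer="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: mtls
      mountPath: /var/run/mtls
      readOnly: true
  volumes:
  - name: mtls
    projected:
      sources:
      # The kubelet creates a PodCertificateRequest for this source and writes
      # the private key plus certificate chain to credentialbundle.pem.
      - podCertificate:
          signerName: ${signer}
          keyType: ECDSAP256
          credentialBundlePath: credentialbundle.pem
EOF
}
```

For example, render_pod_cert_pod demo example.com/hypothetical-signer | kubectl apply -f - would likely leave the Pod in ContainerCreating until a signer issues the certificate, which matches the "must be mounted for the pod to run" behavior described above.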
To clarify an important distinction:
- Pod certs == identity for Substrate's own infrastructure (the four pods above). This is what needs Pod Certificates.
- Actor identity == the SessionIdentity gRPC service (MintJWT/MintCert), backed by the session-id JWT/CA pool secrets. Actor/worker/ateom pods do not mount podCertificate volumes.
So the feature isn't about giving agents certs; it's about the platform securing itself.
Actors
Substrate runs Agent-like workloads called "actors" and maps them onto what it calls "workers", which are k8s Pods. With workers, you get:
- Functionality for managing the actors' lifecycle (e.g., create, destroy, suspend, and resume actors)
- The ability to assign actors to workers in real time
- The ability to route incoming traffic to actors
Because Actors run efficiently, you can pack many Actors onto a single Worker. Google tested this with 250 stateful Actors across only 8 Pods (the Workers), roughly 31 Actors per Worker.
Interacting With Substrate
Because Substrate has its own management plane and resources, you interact with it through its own command-line tool, ate, which runs as a kubectl plugin (kubectl ate). More on this in the upcoming configuration sections.
Environment Configuration Needs/Prereqs
There are a few things that you will need configured for your Google Kubernetes Engine (GKE) cluster, GCP environment, and CLI tools.
- gcloud, along with all of the auth that goes with it, to manage your GCP and GKE environment from the terminal.
export PROJECT_ID=<your-project-id>
gcloud auth login
gcloud auth application-default login --project="$PROJECT_ID"
gcloud auth configure-docker gcr.io
- The required APIs for Substrate.
gcloud services enable \
cloudresourcemanager.googleapis.com \
container.googleapis.com \
networkconnectivity.googleapis.com \
serviceusage.googleapis.com \
storage.googleapis.com \
--project="$PROJECT_ID"
- The Agent Substrate repo cloned down in your local environment. You can clone it from here.
- Local tools on your terminal:
  - Go (v1.26.3 or above)
  - kubectl
  - git
  - openssl for converting the Valkey CA cert (more on that later)
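If you want to confirm these prerequisites in one go, here's a preflight sketch. The helper names are hypothetical; it assumes PROJECT_ID is exported and relies on sort -V, which GNU coreutils and recent BSD/macOS sort support.

```shell
#!/bin/sh
# Preflight sketch for the prerequisites above. Assumes PROJECT_ID is exported.

# Every API from the `gcloud services enable` step.
REQUIRED_APIS="cloudresourcemanager.googleapis.com container.googleapis.com
networkconnectivity.googleapis.com serviceusage.googleapis.com storage.googleapis.com"

# check_apis_enabled: fails if any required API is not enabled on PROJECT_ID.
check_apis_enabled() {
  enabled=$(gcloud services list --enabled --project="$PROJECT_ID" \
    --format="value(config.name)") || return 1
  missing=0
  for api in $REQUIRED_APIS; do
    if ! printf '%s\n' "$enabled" | grep -qxF "$api"; then
      echo "not enabled: $api" >&2
      missing=1
    fi
  done
  return "$missing"
}

# version_ge A B: succeeds when version A >= version B (uses `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_go_version MIN: compares the local Go toolchain against MIN.
check_go_version() {
  # `go env GOVERSION` prints something like "go1.26.3"; keep just "1.26.3".
  current=$(go env GOVERSION | sed 's/^go//; s/ .*//')
  version_ge "$current" "$1"
}
```

After sourcing it, run check_apis_enabled and check_go_version 1.26.3. For the remaining tools, command -v kubectl git openssl (in bash or zsh) prints a path for each one that's installed.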
Why Use GKE or Kind?
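The podCertificate projected volume source is implemented in the kubelet and API server, but it sits behind feature gates that default to off as of k8s 1.36. To use it, you need to turn those gates on for the k8s API server (and, because the volume itself is mounted by the kubelet, for the kubelet too), along with the beta certificates API. Something like: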
--feature-gates=PodCertificateRequest=true,ClusterTrustBundle=true,ClusterTrustBundleProjection=true
--runtime-config=certificates.k8s.io/v1beta1=true
The problem is that not all managed k8s services (for example, AKS) let you turn on these features. GKE does, because it provides a "knob" for this out of the box, and unmanaged/raw k8s clusters (Kind, kubeadm, etc.) do too, because you manage the configuration yourself.
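For Kind, the same gates can go in the cluster config instead of raw API server flags. A minimal sketch, assuming a kind version that supports the featureGates and runtimeConfig config fields and a node image new enough to serve certificates.k8s.io/v1beta1 (the function name and file name are arbitrary):

```shell
#!/bin/sh
# write_kind_config FILE: writes a kind cluster config that enables the Pod
# Certificate feature gates and the certificates.k8s.io/v1beta1 API.
write_kind_config() {
  cat > "$1" <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  PodCertificateRequest: true
  ClusterTrustBundle: true
  ClusterTrustBundleProjection: true
runtimeConfig:
  "certificates.k8s.io/v1beta1": "true"
EOF
}
```

Then create the cluster with kind create cluster --config kind-substrate.yaml. kind applies featureGates cluster-wide, including the kubelet, which matters here because the kubelet is what mounts the volume.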
ko
ko is a build tool for Go container images that originated at Google. It builds an image straight from Go source, without a Dockerfile or a Docker daemon, and pushes it to your KO_DOCKER_REPO. Valkey (the state store) is deployed for you by the install script, so you don't have to install it manually.
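For reference, building a single image with ko looks roughly like the sketch below. install-ate.sh handles this for you; ./cmd/example is a placeholder import path, not an actual Substrate package, and the wrapper function is hypothetical.

```shell
#!/bin/sh
# ko_build_component IMPORT_PATH: builds one Go main package with ko and
# pushes the image to $KO_DOCKER_REPO (hypothetical wrapper).
ko_build_component() {
  if [ -z "${KO_DOCKER_REPO:-}" ]; then
    echo "KO_DOCKER_REPO is not set; ko wouldn't know where to push" >&2
    return 1
  fi
  ko build "$1"
}
```

ko prints the pushed image reference (including its digest), and KO_DEFAULTPLATFORMS=linux/amd64 from the env file controls which platforms it builds for.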
Configure Your Environment
With the prereqs, environment configs, and an overview of Agent Substrate and its components covered, let's get hands-on and deploy the Substrate environment.
- Within the substrate directory that you cloned, run the following:
cp hack/ate-dev-env.sh.example .ate-dev-env.sh
- Edit .ate-dev-env.sh with your environment configs. Since you already have a GKE cluster per the Prerequisites section, you only need the following in the file:
# --- Project / identity ---
export PROJECT_ID=my-substrate-proj
export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")
# --- Your existing cluster ---
export CLUSTER_NAME=substrate-poc
export CLUSTER_LOCATION=us-central1-c
# Set to your kubeconfig context so install-ate.sh skips `gcloud get-credentials`:
export KUBECTL_CONTEXT=gke_my-substrate-proj_us-central1-c_substrate-poc
# --- Snapshot bucket (GCE_REGION is the BUCKET's region, not the cluster's) ---
export GCE_REGION=us-central1
export BUCKET_NAME=snapshot-substrate-test-${PROJECT_ID}
# --- Image registry for ko ---
export KO_DOCKER_REPO="gcr.io/${PROJECT_ID}/ate-images"
export KO_DEFAULTPLATFORMS=linux/amd64
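- Derive the two identities from the values you set in step 2. Source the env file first so that PROJECT_ID and PROJECT_NUMBER are set (you can also append these exports to .ate-dev-env.sh so later shell sessions pick them up).
source .ate-dev-env.sh
export ATELET_PRINCIPAL="principal://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${PROJECT_ID}.svc.id.goog/subject/ns/ate-system/sa/atelet"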
export NODE_SA="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"
- Ensure the GKE cluster has the Pod Certificate beta APIs and Workload Identity enabled.
source .ate-dev-env.sh
gcloud container clusters update "$CLUSTER_NAME" \
--location="$CLUSTER_LOCATION" --project="$PROJECT_ID" \
--enable-kubernetes-unstable-apis=certificates.k8s.io/v1beta1/podcertificaterequests,certificates.k8s.io/v1beta1/clustertrustbundles
gcloud container clusters update "$CLUSTER_NAME" \
--location="$CLUSTER_LOCATION" --project="$PROJECT_ID" \
--workload-pool="${PROJECT_ID}.svc.id.goog"
- Create a snapshot bucket for your Actors.
gcloud storage buckets create "gs://${BUCKET_NAME}" \
--project="$PROJECT_ID" --location="$GCE_REGION" --uniform-bucket-level-access
- Grant atelet IAM permissions on the bucket so it can read and write snapshots.
gcloud storage buckets add-iam-policy-binding "gs://${BUCKET_NAME}" \
--member="$ATELET_PRINCIPAL" --role=roles/storage.objectAdmin
gcloud storage buckets add-iam-policy-binding "gs://${BUCKET_NAME}" \
--member="$ATELET_PRINCIPAL" --role=roles/storage.bucketViewer
- Grant project-level IAM permissions for the GKE nodes and atelet.
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:${NODE_SA}" --role=roles/storage.objectViewer
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="serviceAccount:${NODE_SA}" --role=roles/artifactregistry.reader
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="$ATELET_PRINCIPAL" --role=roles/storage.objectAdmin
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
--member="$ATELET_PRINCIPAL" --role=roles/artifactregistry.reader
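Before moving on, it can help to confirm that the previous steps actually took effect. Here's a minimal verification sketch, assuming .ate-dev-env.sh has been sourced and ATELET_PRINCIPAL and NODE_SA are exported as above. The helper names are hypothetical, not part of the Substrate repo; only the gcloud commands and output fields are standard.

```shell
#!/bin/sh
# Verification sketch for the steps above.

# require_vars NAME...: fails if any named variable is unset or empty
# (e.g. PROJECT_NUMBER is empty when `gcloud projects describe` failed).
require_vars() {
  status=0
  for var in "$@"; do
    eval "value=\${$var:-}"   # POSIX indirect lookup; names come from you
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      status=1
    fi
  done
  return "$status"
}

# check_workload_pool: confirms Workload Identity uses PROJECT_ID.svc.id.goog.
check_workload_pool() {
  pool=$(gcloud container clusters describe "$CLUSTER_NAME" \
    --location="$CLUSTER_LOCATION" --project="$PROJECT_ID" \
    --format="value(workloadIdentityConfig.workloadPool)") || return 1
  if [ "$pool" != "${PROJECT_ID}.svc.id.goog" ]; then
    echo "unexpected workload pool: '${pool}'" >&2
    return 1
  fi
}

# has_roles MEMBER ROLE...: checks the project-level roles bound to MEMBER.
has_roles() {
  member="$1"
  shift
  # After flattening, each line is "ROLE<TAB>MEMBER".
  pairs=$(gcloud projects get-iam-policy "$PROJECT_ID" \
    --flatten="bindings[].members" \
    --format="value(bindings.role,bindings.members)") || return 1
  roles=$(printf '%s\n' "$pairs" | awk -F '\t' -v m="$member" '$2 == m { print $1 }')
  for role in "$@"; do
    if ! printf '%s\n' "$roles" | grep -qxF "$role"; then
      echo "missing $role for $member" >&2
      return 1
    fi
  done
}
```

For example: require_vars PROJECT_ID PROJECT_NUMBER CLUSTER_NAME CLUSTER_LOCATION BUCKET_NAME ATELET_PRINCIPAL NODE_SA && check_workload_pool && has_roles "$ATELET_PRINCIPAL" roles/storage.objectAdmin roles/artifactregistry.reader. has_roles only reads the project policy, so the bucket-level bindings aren't covered, and IAM changes can take a few minutes to propagate, so a failed check right after a grant may just need a retry.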
New Node Pools
Mounting the Pod Certificate volume is a kubelet (node-level) capability, and a node's kubelet config is fixed when the node is created. Enabling the beta APIs on the control plane doesn't retroactively apply to nodes that already exist. Because this is an existing cluster, its nodes predate the enablement, so they have to be recreated to pick up the feature. The simplest way to get fresh nodes is a new node pool (a same-version upgrade won't recreate them, because the nodes already match the control-plane version).
- Create a new node pool with C3 machines.
gcloud container node-pools create substrate-pool \
--cluster="$CLUSTER_NAME" --location="$CLUSTER_LOCATION" --project="$PROJECT_ID" \
--machine-type=c3-standard-4 --num-nodes=1 \
--workload-metadata=GKE_METADATA
- Wait for the new nodes to show up.
kubectl get nodes -l cloud.google.com/gke-nodepool=substrate-pool
- Delete the old node pool.
gcloud container node-pools delete default-pool \
--cluster="$CLUSTER_NAME" --location="$CLUSTER_LOCATION" --project="$PROJECT_ID"
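The node listing above is a one-shot check. If you'd rather block until the new nodes are Ready (ideally before deleting default-pool, so evicted workloads have somewhere to land), kubectl wait can do it. A small sketch; the function name and the 10-minute default timeout are arbitrary choices:

```shell
#!/bin/sh
# wait_for_pool POOL [TIMEOUT]: blocks until every node in the GKE node pool
# POOL reports the Ready condition, or fails once TIMEOUT (default 10m) passes.
wait_for_pool() {
  kubectl wait --for=condition=Ready node \
    -l "cloud.google.com/gke-nodepool=$1" --timeout="${2:-10m}"
}
```

For example, wait_for_pool substrate-pool 15m. Note that kubectl wait exits with an error if no nodes match the label yet.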
With the cluster environment configured, let's install Agent Substrate.
Installing Substrate
Within the substrate directory, you will see the install-ate.sh script under the hack directory. It builds the core images (via ko, pushed to KO_DOCKER_REPO) and deploys the Agent Substrate control/management plane and node components:
- The CRDs
- ate-api-server (control plane)
- pod-certificate-controller (in-cluster mTLS signer that fulfills the PodCertificateRequests)
- atelet (node DaemonSet)
- atenet (DNS + Envoy router)
- valkey (dynamic state store)

- Run the following command:
./hack/install-ate.sh --deploy-ate-system
You'll see the installation in progress.
- Wait for the system Pods to come up.
kubectl get pods -n ate-system --watch
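--watch is handy interactively, but in a script you may prefer to block until the Pods are Ready and print warning events if they aren't (for example, a podCertificate volume that can't be mounted). A hedged sketch; the function name and timeout are arbitrary, and if ate-system ever runs Jobs, their completed Pods never become Ready, so you'd swap --all for a label selector:

```shell
#!/bin/sh
# wait_for_ate_system [TIMEOUT]: waits for every Pod in ate-system to be Ready;
# on failure, prints Warning events (e.g. volume mount or image pull errors).
wait_for_ate_system() {
  if ! kubectl -n ate-system wait --for=condition=Ready pod --all \
      --timeout="${1:-10m}"; then
    kubectl -n ate-system get events --field-selector type=Warning
    return 1
  fi
}
```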
Once the Pods are running, Substrate is installed.
Install The Substrate CLI
With the Substrate system up and running, you need a way to interact with its control/management plane. To do that, you'll use the ate kubectl plugin (the kubectl ate sub-command).
- Install the command.
go install ./cmd/kubectl-ate
- Add Go's bin directory to your PATH (this example assumes zsh; use ~/.bashrc for bash).
echo 'export PATH="$PATH:$(go env GOPATH)/bin"' >> ~/.zshrc
source ~/.zshrc
- Test out the sub-command.
kubectl ate --help
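Two small conveniences, both hypothetical helpers rather than part of the Substrate repo: append_once avoids stacking duplicate PATH lines in your shell rc file if you re-run the setup, and check_ate_plugin relies on kubectl's plugin convention, where any executable named kubectl-<name> on your PATH becomes kubectl <name>.

```shell
#!/bin/sh
# append_once FILE LINE: appends LINE to FILE only if it isn't already there,
# so re-running the setup doesn't stack duplicate PATH entries.
append_once() {
  touch "$1"
  grep -qxF "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# check_ate_plugin: kubectl finds `kubectl ate` as an executable named
# kubectl-ate on PATH, so fail early if it isn't there.
check_ate_plugin() {
  if ! command -v kubectl-ate >/dev/null 2>&1; then
    echo "kubectl-ate not found on PATH; is \$(go env GOPATH)/bin on PATH?" >&2
    return 1
  fi
}
```

For example: append_once ~/.zshrc 'export PATH="$PATH:$(go env GOPATH)/bin"' && check_ate_plugin. kubectl plugin list also shows every plugin kubectl can find.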
You now have ate installed and are ready to interact with Agent Substrate.
Wrapping Up
As the Agentic AI era continues to change how we think about Agents, so will the systems we run them on. The next phase of those systems is Sandboxes, which will keep rising in popularity across organizations because they isolate Agents from an ingress and egress perspective and limit what actions Agents can take with the tools available to them. I see Sandboxes becoming especially important as autonomous Agents become more prevalent.