DEV Community: Michael Levan

Cutting Idle Agent Costs by 90% with Agent Substrate

Michael Levan — Tue, 30 Jun 2026 12:40:06 +0000

Cost is everything. In just about every agentic conversation, the three things that come up for enterprises implementing AI workloads are:

Cost
Observability
Security

And as AI continues to throw everyone for a loop when it comes to cost management (e.g., Uber running out of its yearly token budget in one quarter), the ability to shrink resource usage (hardware, for example) will be crucial moving forward.

In this blog post, you will learn how to cut costs by 90% using Agent Substrate in comparison to Agents running in k8s Deployments/Pods.

The Cost Comparison

Agents need a place to run. The "place to run" needs to be a platform that's easily managed, orchestrated, and has the ability to cluster resources. Resources like CPU, GPU, and memory need to be able to scale and expand. Without this, it's a matter of manually managing servers that Agents are running on and clients to interact with said server.

That's why so many organizations choose Kubernetes to run Agentic workloads.

When running Agents per Pod, however, that can get costly very quickly in terms of hardware (GPU, CPU, memory) and performance (can your cluster scale up and down quickly based on resource needs when it comes to Agents coming up and going down per use?).

The tests in this blog post show:

Always-on Agents running in k8s.
Actors running in Workers via Agent Substrate

The comparison will be 50 always-on Pods versus 50 Actors across 5-7 Workers (Pods). With 50 Agents running at one per Pod on one side, and the same 50 Agents running as Actors at roughly 5-10 per Worker on the other, you can already imagine the hardware resource savings that can be accomplished.

Right now, the majority of organizations start off with the "one Agent per Pod" approach as that's the fastest way to show value and get up and running. For the future, however, Agents in Actors via Agent Substrate will be how organizations deploy when they care about efficiency, optimization, and managing cost.

Let's dive in from a hands-on perspective.

Prerequisites

To follow along in a hands-on fashion, you will need:

A Kubernetes cluster in GKE or Kind locally
Agent Substrate installed
kubectl-ate installed

You can install Agent Substrate and kubectl-ate from the Agent Substrate repo.

Installation & Configuration

Within the Agent Substrate repo, you will see a file in the hack directory called ate-dev-env.sh.example. Make a copy of the file:

cp hack/ate-dev-env.sh.example .ate-dev-env.sh

Then, edit the file with your cluster and account information. For example, if you are using GCP and deploy a GKE cluster, the .ate-dev-env.sh will look like the below:

PROJECT_ID=<your-project-id>
PROJECT_NUMBER=<your-project-number>
GCE_REGION=<bucket-region>
CLUSTER_LOCATION=<cluster-zone-or-region>
CLUSTER_NAME=<your-gke-cluster>
BUCKET_NAME=<your-snapshot-bucket>
KO_DOCKER_REPO=gcr.io/<your-project-id>/ate-images
KUBECTL_CONTEXT=<your-kube-context>

Once you've filled in your .ate-dev-env.sh file, you can source it and install the counter demo which exists in the Agent Substrate repo.

source .ate-dev-env.sh
./hack/install-ate.sh --deploy-demo-counter

You can then install the kubectl-ate CLI to interact with Actors, Workers, and Templates.

go install ./cmd/kubectl-ate

export PATH="$(go env GOPATH)/bin:$PATH"

Verify that Substrate is up and operational:

kubectl get workerpools.ate.dev counter -n ate-demo-counter
kubectl get pods -n ate-demo-counter
kubectl ate get workers

A Quick Note About GKE

If your GKE cluster is regional or has Worker Nodes spread across zones, pin the demo WorkerPool to one zone before creating benchmark actors. Agent Substrate uses checkpoint/restore, and gVisor restores can fail if a snapshot created on one underlying CPU platform is restored on a node with a different CPU feature set.

Choose the zone where you want the counter workers to run:

export SUBSTRATE_WORKER_ZONE=us-east1-d

Then patch the counter WorkerPool to schedule Workers in that zone.

kubectl patch workerpools.ate.dev counter \
  -n ate-demo-counter \
  --type=merge \
  -p "{\"spec\":{\"template\":{\"nodeSelector\":{\"topology.kubernetes.io/zone\":\"${SUBSTRATE_WORKER_ZONE}\"}}}}"
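Hand-escaping the JSON inside the -p flag is easy to get wrong. As a minimal sketch, assuming a hypothetical helper named zone_patch (it is not part of kubectl or Agent Substrate), you can build the same merge patch with printf and inspect it before applying it:

```shell
# Hypothetical helper (not part of kubectl or Agent Substrate): builds the
# merge patch that pins the WorkerPool template to a single zone.
zone_patch() {
  printf '{"spec":{"template":{"nodeSelector":{"topology.kubernetes.io/zone":"%s"}}}}' "$1"
}

# Print the patch so it can be inspected before it is applied.
# The us-east1-d fallback is only an illustration, matching the export above.
zone_patch "${SUBSTRATE_WORKER_ZONE:-us-east1-d}"
printf '\n'
```

The printed JSON is the same document as the escaped string in the kubectl patch command above, so it could be passed as -p "$(zone_patch "$SUBSTRATE_WORKER_ZONE")" instead.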

Configure The Benchmark

With the installation and configuration complete, you can now start setting up the benchmark environment and tests.

There are several environment variables below:

ACTOR_COUNT is the number of logical counter agents to test in both scenarios.
BENCHMARK_NAMESPACE is the namespace for the always-on Kubernetes baseline workloads and the in-cluster benchmark client Pod.
BASELINE_PREFIX is the name prefix for the Kubernetes baseline Deployments and Services.
SUBSTRATE_PREFIX is the name prefix for the Substrate actors created by the benchmark.
TEMPLATE_REF is the Substrate actor template reference in <namespace>/<name> format. The counter demo creates ate-demo-counter/counter.
SUBSTRATE_ROUTER_URL is the in-cluster URL for atenet-router. The benchmark client sends Substrate actor traffic through this service.
BASELINE_CPU_REQUEST and BASELINE_MEMORY_REQUEST are the CPU and memory requests assigned to each always-on Kubernetes baseline Pod. They make baseline resource consumption explicit.
SUBSTRATE_WORKER_CPU_REQUEST and SUBSTRATE_WORKER_MEMORY_REQUEST are the matching per-Pod requests used for the Substrate Workers in the comparison.
BASELINE_RESULTS_FILE is the local TSV file for the Kubernetes baseline latency results.
SUBSTRATE_RESULTS_FILE and SUMMARY_FILE are the files that contain the Substrate results and the overall summary.

export ACTOR_COUNT=50
export BENCHMARK_NAMESPACE=cost-comparison
export BASELINE_PREFIX=k8s-counter
export SUBSTRATE_PREFIX=substrate-counter
export TEMPLATE_REF=ate-demo-counter/counter
export SUBSTRATE_ROUTER_URL=http://atenet-router.ate-system.svc:80

export BASELINE_CPU_REQUEST=50m
export BASELINE_MEMORY_REQUEST=64Mi
export SUBSTRATE_WORKER_CPU_REQUEST=50m
export SUBSTRATE_WORKER_MEMORY_REQUEST=64Mi

export BASELINE_RESULTS_FILE=baseline-kubernetes-results.tsv
export SUBSTRATE_RESULTS_FILE=substrate-results.tsv
export SUMMARY_FILE=cost-comparison-summary.txt
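Since the later steps silently misbehave if one of these variables is empty, a quick check can help. This is a minimal sketch, assuming a hypothetical helper named require_vars (not something the Substrate repo provides):

```shell
# Hypothetical helper: reports every listed variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf 'missing: %s\n' "$name" >&2
      missing=1
    fi
  done
  return "$missing"
}

require_vars ACTOR_COUNT BENCHMARK_NAMESPACE BASELINE_PREFIX SUBSTRATE_PREFIX \
  TEMPLATE_REF SUBSTRATE_ROUTER_URL BASELINE_CPU_REQUEST BASELINE_MEMORY_REQUEST \
  || printf 'Set the variables listed above before continuing.\n'
```

If everything is exported, the check prints nothing.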

Get the counter image from the live ActorTemplate. This keeps the Kubernetes baseline on the same counter server image used by the Substrate demo.

export COUNTER_IMAGE=$(kubectl get actortemplates.ate.dev counter \
  -n ate-demo-counter \
  -o jsonpath='{.spec.containers[0].image}')

printf "Counter image: %s\n" "$COUNTER_IMAGE"

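Check the image. If the following prints nothing, the image reference was resolved and is valid. If it still starts with ko://, the check reports it and exits (note that exit 1 closes the current shell, so run it from a script or subshell).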

case "$COUNTER_IMAGE" in
  ko://*)
    printf "Counter image was not resolved: %s\n" "$COUNTER_IMAGE"
    exit 1
    ;;
esac

Retrieve the Substrate Worker count.

export WORKER_REPLICAS=$(kubectl get workerpools.ate.dev counter \
  -n ate-demo-counter \
  -o jsonpath='{.spec.replicas}')

printf "Logical agents: %s\nSubstrate workers: %s\n" \
  "$ACTOR_COUNT" "$WORKER_REPLICAS"

Deploying Kubernetes Always-On Pods

In this section, you will deploy the Kubernetes Pods that will be running the counter demo.

Create a Namespace for the Pods to exist.

kubectl create namespace "$BENCHMARK_NAMESPACE"

Deploy one Deployment and Service object per logical counter Agent.

for i in $(seq 1 "$ACTOR_COUNT"); do
  name=$(printf "%s-%03d" "$BASELINE_PREFIX" "$i")

  kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${name}
  namespace: ${BENCHMARK_NAMESPACE}
  labels:
    app.kubernetes.io/name: counter
    app.kubernetes.io/part-of: cost-comparison
    cost-comparison/model: always-on-kubernetes
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: counter
      app.kubernetes.io/instance: ${name}
  template:
    metadata:
      labels:
        app.kubernetes.io/name: counter
        app.kubernetes.io/instance: ${name}
        app.kubernetes.io/part-of: cost-comparison
        cost-comparison/model: always-on-kubernetes
    spec:
      containers:
      - name: counter
        image: ${COUNTER_IMAGE}
        command:
        - /ko-app/counter
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: ${BASELINE_CPU_REQUEST}
            memory: ${BASELINE_MEMORY_REQUEST}
---
apiVersion: v1
kind: Service
metadata:
  name: ${name}
  namespace: ${BENCHMARK_NAMESPACE}
  labels:
    app.kubernetes.io/name: counter
    app.kubernetes.io/part-of: cost-comparison
    cost-comparison/model: always-on-kubernetes
spec:
  selector:
    app.kubernetes.io/name: counter
    app.kubernetes.io/instance: ${name}
  ports:
  - name: http
    port: 80
    targetPort: 80
EOF
done

Confirm the baseline.

kubectl get deployments -n "$BENCHMARK_NAMESPACE" \
  -l cost-comparison/model=always-on-kubernetes

kubectl get pods -n "$BENCHMARK_NAMESPACE" \
  -l cost-comparison/model=always-on-kubernetes

You'll see an output with 50 objects like in the example below:

NAME                               READY   STATUS    RESTARTS   AGE
k8s-counter-001-9f9f44464-h526d    1/1     Running   0          128m
k8s-counter-002-9678fb86f-bwpwm    1/1     Running   0          128m
k8s-counter-003-54bccfd7db-k5wgz   1/1     Running   0          128m
k8s-counter-004-5957785959-59k24   1/1     Running   0          128m
k8s-counter-005-5df4559dd4-pgm4v   1/1     Running   0          128m
k8s-counter-006-5597c6cd9-nj9mm    1/1     Running   0          128m
k8s-counter-007-8d5c5bb74-j2n6p    1/1     Running   0          128m
k8s-counter-008-ff9898c5f-hc7hw    1/1     Running   0          128m
k8s-counter-009-77bc4cf8bd-rk2sq   1/1     Running   0          128m
k8s-counter-010-578684f8c8-d7xg8   1/1     Running   0          128m
k8s-counter-011-8697447c4f-5jz9p   1/1     Running   0          128m
k8s-counter-012-5d7bc67f4d-hhhkh   1/1     Running   0          128m
xxxxx
xxxxx
xxxxx
xxxxx
xxxxx

Create Substrate Actors

In this configuration, you can create one Actor per logical counter Agent.

for i in $(seq 1 "$ACTOR_COUNT"); do
  actor=$(printf "%s-%03d" "$SUBSTRATE_PREFIX" "$i")
  kubectl ate create actor "$actor" --template "$TEMPLATE_REF" || true
done

When you run the above, you will see 50 Actors deployed via kubectl ate get actors.

However, you will only see 7 Workers.

That's the optimization and efficiency right there. It's the same exact deployment/configuration as the Kubernetes section above, except with Agent Substrate, you can run the same workloads in 7 Pods (Workers) instead of 50.
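To put a number on that density, here is a minimal sketch, assuming a hypothetical helper named agents_per_worker. The fallback values of 50 and 7 are only there so the snippet runs outside a cluster; in your shell, ACTOR_COUNT and WORKER_REPLICAS from the earlier steps are used instead.

```shell
# Hypothetical helper: logical agents per Worker Pod, rounded up.
# This is a density figure, not a count of concurrently running Actors,
# because idle Actors stay suspended.
agents_per_worker() {
  actors=$1
  workers=$2
  echo $(( (actors + workers - 1) / workers ))
}

agents_per_worker "${ACTOR_COUNT:-50}" "${WORKER_REPLICAS:-7}"
```

With 50 logical agents and 7 Workers this prints 8, and with 5 Workers it prints 10.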

How Actors Work

You’ll see 50 Agent Substrate Actors and a smaller set of Workers (Pods). The Actors are logical workloads, but they are not actively running while they are STATUS_SUSPENDED. By default, Actors are in a "suspended" state until they are used, which is why Actors are so great from an efficiency perspective. When traffic arrives for an Actor, Agent Substrate assigns that actor to an available Worker, resumes it, serves the request, and can suspend it again afterward. This is the efficiency model: many idle actors can exist without each requiring its own always-on Kubernetes Pod.

Create a Benchmark Client

The benchmark client runs inside the cluster so both paths avoid local port-forward overhead.

benchmark-client = temporary in-cluster curl Pod.

Why it exists:

Sends requests to the Kubernetes baseline Services.
Sends requests to Substrate through atenet-router.
Avoids local kubectl port-forward latency.
Keeps both benchmark paths inside the cluster network for a fairer comparison.

kubectl delete pod benchmark-client \
  -n "$BENCHMARK_NAMESPACE" \
  --ignore-not-found

kubectl run benchmark-client \
  -n "$BENCHMARK_NAMESPACE" \
  --image=curlimages/curl:8.10.1 \
  --restart=Never \
  --command -- sleep 3600

kubectl wait --for=condition=Ready pod/benchmark-client \
  -n "$BENCHMARK_NAMESPACE" \
  --timeout=2m

Run The Benchmark For Kubernetes Always On

Each baseline agent receives two requests:

First measured request.
Second warm request.

Because these are always-on Pods, both requests should be served by already running Kubernetes workloads.

kubectl exec -n "$BENCHMARK_NAMESPACE" benchmark-client -- sh -c '
set -eu
actor_count="$1"
prefix="$2"
namespace="$3"

printf "agent\tfirst_seconds\twarm_seconds\n"

for i in $(seq 1 "$actor_count"); do
  name=$(printf "%s-%03d" "$prefix" "$i")
  url="http://${name}.${namespace}.svc.cluster.local"

  first_seconds=$(curl -sS -o /dev/null -w "%{time_total}" -X POST "$url")
  warm_seconds=$(curl -sS -o /dev/null -w "%{time_total}" -X POST "$url")

  printf "%s\t%s\t%s\n" "$name" "$first_seconds" "$warm_seconds"
done
' sh "$ACTOR_COUNT" "$BASELINE_PREFIX" "$BENCHMARK_NAMESPACE" > "$BASELINE_RESULTS_FILE"

Inspect the baseline results:

column -t -s $'\t' "$BASELINE_RESULTS_FILE"

You'll see an output similar to the below for 50 counters.

agent            first_seconds  warm_seconds
k8s-counter-001  0.023404       0.003911
k8s-counter-002  0.023275       0.005233
k8s-counter-003  0.015850       0.003773
k8s-counter-004  0.017657       0.005033
k8s-counter-005  0.014946       0.004443
k8s-counter-006  0.015616       0.004212
k8s-counter-007  0.016875       0.004261
k8s-counter-008  0.014731       0.004317
k8s-counter-009  0.017053       0.004707
k8s-counter-010  0.013013       0.003273
k8s-counter-011  0.014281       0.004552
k8s-counter-012  0.018644       0.003734
xxxxx
xxxxx

Run The Benchmark For Substrate

Each Substrate actor receives two requests:

Wake request, which resumes a suspended Actor and serves the request.
Warm request, which hits the already-running Actor.

After each actor is measured, the Actor is suspended so the worker can serve the next Actor.

printf "actor\twake_seconds\twarm_seconds\n" > "$SUBSTRATE_RESULTS_FILE"

for i in $(seq 1 "$ACTOR_COUNT"); do
  actor=$(printf "%s-%03d" "$SUBSTRATE_PREFIX" "$i")
  actor_host="${actor}.actors.resources.substrate.ate.dev"

  result=$(kubectl exec -n "$BENCHMARK_NAMESPACE" benchmark-client -- sh -c '
set -eu
router_url="$1"
actor_host="$2"

wake_seconds=$(curl -sS -o /dev/null -w "%{time_total}" \
  -X POST \
  -H "Host: ${actor_host}" \
  "$router_url")

warm_seconds=$(curl -sS -o /dev/null -w "%{time_total}" \
  -X POST \
  -H "Host: ${actor_host}" \
  "$router_url")

printf "%s\t%s" "$wake_seconds" "$warm_seconds"
' sh "$SUBSTRATE_ROUTER_URL" "$actor_host")

  printf "%s\t%s\n" "$actor" "$result" >> "$SUBSTRATE_RESULTS_FILE"
  kubectl ate suspend actor "$actor" >/dev/null
done

Inspect the Substrate results:

column -t -s $'\t' "$SUBSTRATE_RESULTS_FILE"
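With both TSV files written, it can be handy to reduce each one to a pair of averages. The following is a minimal sketch, assuming a hypothetical helper named average_latency. It only relies on the three-column layout written by the loops above and makes no claim about what numbers you will see.

```shell
# Hypothetical helper: averages columns 2 and 3 of a results TSV,
# skipping the header row. Prints "<column-2 average>\t<column-3 average>".
average_latency() {
  awk -F '\t' 'NR > 1 && NF >= 3 { a += $2; b += $3; n++ }
    END { if (n > 0) printf "%.6f\t%.6f\n", a / n, b / n }' "$1"
}

# Summarize whichever result files exist in the current directory.
for file in "${BASELINE_RESULTS_FILE:-}" "${SUBSTRATE_RESULTS_FILE:-}"; do
  if [ -n "$file" ] && [ -f "$file" ]; then
    printf '%s\t' "$file"
    average_latency "$file"
  fi
done
```

For the baseline file, the two values are the average first and warm request times. For the Substrate file, they are the average wake and warm request times. The output could be redirected into "$SUMMARY_FILE" if you want to keep it.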

The Results

Now it's time to measure the results and see if Substrate really saves resources and helps optimize workloads running in k8s.

export BASELINE_RUNNING_PODS=$(kubectl get pods \
  -n "$BENCHMARK_NAMESPACE" \
  -l cost-comparison/model=always-on-kubernetes \
  --field-selector=status.phase=Running \
  --no-headers | wc -l | tr -d ' ')

export SUBSTRATE_WORKLOAD_PODS=$(kubectl get pods \
  -n ate-demo-counter \
  --field-selector=status.phase=Running \
  --no-headers | wc -l | tr -d ' ')

The results for k8s always on:

printf "baseline_cpu_request_per_pod=%s\n" "$BASELINE_CPU_REQUEST"
printf "baseline_memory_request_per_pod=%s\n" "$BASELINE_MEMORY_REQUEST"

baseline_cpu_request_per_pod=50m
baseline_memory_request_per_pod=64Mi

The results for Substrate:

baseline_cpu_request_per_pod=50m
baseline_memory_request_per_pod=64Mi
substrate_worker_cpu_request_per_pod=50m
substrate_worker_memory_request_per_pod=64Mi
baseline_total_cpu_request_millicores=2500
baseline_total_memory_request_mib=3200
substrate_total_cpu_request_millicores=250
substrate_total_memory_request_mib=320

The above shows the requested-capacity savings:

Kubernetes baseline: 50 Pods * 50m CPU / 64Mi = 2500m CPU / 3200Mi

Substrate: 5 Pods * 50m CPU / 64Mi = 250m CPU / 320Mi

Which results in a 90% reduction!
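The arithmetic behind those totals can be reproduced in the shell. This is a minimal sketch, assuming two hypothetical helpers (total_request and reduction_percent) that only understand the m and Mi suffixes used in this post:

```shell
# Hypothetical helper: total_request <pod_count> <per_pod_request>
# Strips the unit suffix (Mi or m) and multiplies by the Pod count.
total_request() {
  value=${2%Mi}
  value=${value%m}
  echo $(( $1 * value ))
}

# Hypothetical helper: reduction_percent <baseline_total> <substrate_total>
reduction_percent() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

baseline_cpu=$(total_request 50 50m)     # 50 always-on Pods * 50m
substrate_cpu=$(total_request 5 50m)     # 5 Worker Pods * 50m
baseline_mem=$(total_request 50 64Mi)    # 50 always-on Pods * 64Mi
substrate_mem=$(total_request 5 64Mi)    # 5 Worker Pods * 64Mi

printf 'cpu: %sm -> %sm (%s%% less)\n' "$baseline_cpu" "$substrate_cpu" \
  "$(reduction_percent "$baseline_cpu" "$substrate_cpu")"
printf 'memory: %sMi -> %sMi (%s%% less)\n' "$baseline_mem" "$substrate_mem" \
  "$(reduction_percent "$baseline_mem" "$substrate_mem")"
```

Both lines report 90% less, which matches the 2500m/3200Mi versus 250m/320Mi totals above. Swap in "$BASELINE_RUNNING_PODS" and "$SUBSTRATE_WORKLOAD_PODS" to run the same calculation against your own cluster.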

You can also capture actual CPU and memory usage if metrics-server is installed:

kubectl top pods -n "$BENCHMARK_NAMESPACE" \
  -l cost-comparison/model=always-on-kubernetes || true

kubectl top pods -n ate-demo-counter || true

Notice how many resources are saved with just a few Substrate Worker Pods (the output of the second command) vs. the k8s always-on workloads (50 Pods, 50 Agents).

The Kubernetes baseline is running one always-on Pod per logical workload. With ACTOR_COUNT=50, that means 50 k8s-counter-* Pods are running even when they are mostly idle. Each baseline Pod has explicit CPU and memory requests, so the always-on capacity grows linearly with the number of logical workloads.

The Substrate side is running the same logical workload count as actors, but only the worker pool stays hot. In this run, the counter WorkerPool has 5 counter-deployment-* Pods. Each worker Pod has explicit CPU and memory requests, so the requested always-on capacity grows with worker count, not Actor count.

The main difference is the always-on footprint:

Kubernetes baseline: 50 running workload Pods
Agent Substrate:     5 running worker Pods for 50 logical actors

Your result shows the core optimization: 50 idle logical workloads do not require 50 always-on workload Pods when they run as Substrate actors.

Agent Substrate: Building Actors and Workers

Michael Levan — Sat, 06 Jun 2026 19:19:49 +0000

Any time AI is spoken about, it's followed up with "and this is how much it's costing us". Cost management/optimization along with overall optimization of Agentic resources is top of mind for everyone in an organization, from the engineer implementing Agents to the CFO trying to figure out how to effectively spend tokens.

In this blog post, you will learn how to set up Agent Substrate Actors and Workers to ensure agentic efficiency.

Prerequisites

To follow along from a hands-on perspective, you will need the following:

A GKE or Kind cluster (due to the need for Pod Certificates).
Substrate installed. You can follow the guide here to do so.
Clone the Substrate repo (demos and such that can be run from there).

If you don't have a cluster readily available, you can follow along from a theoretical perspective and configure it at a later time.

TL;DR: What Is Substrate?

API/management plane + k8s Worker Nodes.

Substrate implements its own control plane as it's built with more efficiency and optimization for the era of AI Agents. However, k8s is still the best place to cluster resources and have them orchestrated. Substrate combines what it's best at with what Kubernetes is best at.

With Substrate, you have Workers (Kubernetes Pods) and Actors (Agents running inside said Pods).

More here: https://www.cloudnativedeepdive.com/agent-substrate-the-agentic-ai-isolation-layer-on-k8s/

Deploy The Substrate Demo

The counter demo is a small stateful Go HTTP server that increments an in-memory counter on every request. Deploying it creates the Namespace, Worker Pool and Actor Template.

WorkerPool == the pool of warm k8s Pods that actors get multiplexed onto.

ActorTemplate == the immutable Actor definition/schema. Substrate builds the snapshot from it so new Actors hydrate instantly.

Deploy the demo.

./hack/install-ate.sh --deploy-demo-counter

Ensure that the golden snapshot is ready.

kubectl wait --for=condition=Ready actortemplate/counter \
  -n ate-demo-counter --timeout=5m

Check to see that all resources are created.

kubectl get workerpool,actortemplate -n ate-demo-counter

Create An Actor

With the Actor Template and Worker Pool in place, you can now create an Actor from the snapshotted template.

Create an Actor.

kubectl ate create actor my-counter-1 --template ate-demo-counter/counter

You'll notice that the Actor starts in a SUSPENDED state as there hasn't been any traffic sent to it.

kubectl ate get actor my-counter-1

In the next section, you'll send traffic to the Actor to see it used.

Test The Actor

The first step is to port-forward the router. The router in Substrate routes to Actors by a specific DNS name: <actor-id>.actors.resources.substrate.ate.dev

Within the terminal, port-forward the router.

kubectl port-forward -n ate-system svc/atenet-router 8000:80

In another terminal, test the connectivity by sending a request to it.

curl -X POST -H "Host: my-counter-1.actors.resources.substrate.ate.dev" \
  http://localhost:8000

The first request triggers an on-demand resume. The Substrate Control Plane then claims a warm Worker (Pod), restores the snapshot into the Sandbox, and forwards the request. This is where a lot of the "magic happens" in Substrate. Instead of having Agents constantly running, they only run when they receive a request. This saves resources and money.
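To see the difference between the resume request and a follow-up request for yourself, here is a minimal sketch, assuming two hypothetical helpers (actor_host and time_actor_request). The curl flags are the same ones used above, plus -w to print the total request time.

```shell
# Hypothetical helper: the Host header the router expects for an Actor.
actor_host() {
  printf '%s.actors.resources.substrate.ate.dev' "$1"
}

# Hypothetical helper: time one POST through the port-forwarded router.
# It needs the port-forward from the previous step to be running.
# The second argument overrides the router URL (defaults to localhost:8000).
time_actor_request() {
  curl -sS -o /dev/null -w '%{time_total}\n' \
    -X POST -H "Host: $(actor_host "$1")" \
    "${2:-http://localhost:8000}"
}

# Show the Host header that would be sent for the Actor created above.
actor_host my-counter-1
printf '\n'
```

Running time_actor_request my-counter-1 twice in a row gives you one timing for the request that triggers the resume and one for the request that hits the already-running Actor.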

Congrats! You have officially tested and ensured that Actors (Agents) work within your Substrate cluster.

Agent Substrate: The Agentic AI Isolation Layer On K8s

Michael Levan — Sun, 31 May 2026 17:57:12 +0000

Isolating/sandboxing Agents gives you the ability to run agentic workflows in a safe, secure, and governed way. Without it, your Agents can access just about anything you can, along with doing any type of web research and API calls.

With sandboxing solving this agentic issue, the next question is "where and how will sandboxes run?" and that's where Substrate comes into play.

In this blog, you'll learn about what Substrate is and how to deploy it in GKE.

Prerequisites

To follow along with this blog post from a hands-on perspective, you will need:

A GCP account
A GKE cluster

What Is Agent Substrate

There are two things that Kubernetes is incredibly good at out of the box:

Orchestration
Clustering worker nodes to ensure users have a pool of GPU, CPU, and memory

What can be built on top of k8s that isn't out of the box is higher levels of efficiency for hardware resource management, lower latency, and the implementation of Agentic workflows (e.g., running Agents and isolating Agents). However, the primitives of Kubernetes (Pods, autoscaling, clustering of Worker Nodes) are still very much needed, so there needs to be a tool/platform for the Agentic era that builds on top of what we know as k8s today. Something that has its own Control Plane/management layer, but still uses what Kubernetes has to offer.

That's where Agent Substrate comes into play.

Underneath the hood, Substrate uses gVisor (same thing as the Agent Sandbox project from the CNCF SIG), which is a container sandbox developed by Google that focuses on security, isolation, and the ability to use it in an efficient fashion (e.g., not taking up a ton of hardware resources).

Substrate Internals

There are four main parts to Substrate:

ate-api-server (control plane)
atenet-router (the Envoy/DNS router)
valkey (the state store)
pod-certificate-controller itself

And the "agent-like Actors" along with Workers.

You will also see atelet, which is a per-node agent (a DaemonSet that runs on every worker node). It manages Worker Pods, drives runsc checkpoint/restore, and streams snapshots to/from the GCS bucket that you will be creating in an upcoming section.

System Components

All four of these system workloads mount podCertificate volumes. The pod certs are so that these components (or rather, the Pods running the components) get auto-issued, auto-rotated TLS certs to do mTLS between each other.

💡

Per Google: Pod Certificates is a native Kubernetes feature that automatically issues short-lived X.509 TLS certificates directly to running Pods. Introduced as an alpha feature in Kubernetes v1.34 and advanced to Beta in v1.35, this capability allows workloads to authenticate to the kube-apiserver and establish mutual TLS (mTLS) with other workloads natively.

Pod Certificates are a hard requirement for Agent Substrate, as they're how Substrate gives each component an auto-rotated per-pod mTLS identity. The pod declares a podCertificate projected volume source, which triggers a PodCertificateRequest, and the signer fulfills it. The kubelet projects (and auto-rotates) the credential bundle into the pod, and that volume must be mounted for the pod to run.

To clarify two separate distinctions:

Pod certs == identity for Substrate's own infrastructure (the four pods above). This is what needs Pod Certificates.
Actor identity == the SessionIdentity gRPC service (MintJWT/MintCert), backed by the session-id JWT/CA pool secrets. Actor/worker/ateom pods do not mount podCertificate volumes.

So the feature isn't about giving agents certs, it's about the platform securing itself.

Actors

Substrate runs Agent-like workloads called “actors”. It then maps the actors onto what Substrate calls "workers", which are k8s Pods. With workers, you get:

Functionality for managing the actor lifecycle (e.g., create, destroy, suspend, resume actors)
The ability to assign actors to workers in real time
Route incoming traffic to actors.

Because of Substrate's efficiency in how Actors run, you can run a plethora of Actors on a single Worker. Google tested this with 250 Stateful Actors across only 8 Pods (the Workers).

Interacting With Substrate

Because Substrate has its own management plane and resources, you can interact with it via its own command-line tool, ate.

For example: kubectl ate (more to come on this in the upcoming configuration sections).

Environment Configuration Needs/Prereqs

There are a few things that you will need configured for your Google Kubernetes Engine (GKE) cluster, GCP environment, and CLI tools.

gcloud and all of the auth that goes with it to manage your GCP and GKE environment on the terminal.

export PROJECT_ID=<your-project-id>

gcloud auth login
gcloud auth application-default login --project="$PROJECT_ID"
gcloud auth configure-docker gcr.io

The required APIs for Substrate.

gcloud services enable \
  cloudresourcemanager.googleapis.com \
  container.googleapis.com \
  networkconnectivity.googleapis.com \
  serviceusage.googleapis.com \
  storage.googleapis.com \
  --project="$PROJECT_ID"

The Agent Substrate repo cloned down in your local environment. You can clone it from here.
Local tools on your terminal
1. Go (v1.26.3 or above)
2. kubectl
3. git
4. openssl for converging the Valkey CA cert (more on that later)

Why Use GKE or Kind?

The podCertificate projected volume source is code in the kubelet/apiserver, but it's behind feature gates that default to off as of k8s 1.36. To use it, you need to turn them on via the k8s API Server. Something like:

--feature-gates=PodCertificateRequest=true,ClusterTrustBundle=true,ClusterTrustBundleProjection=true
--runtime-config=certificates.k8s.io/v1beta1=true

The problem is that not all managed k8s services (for example, AKS) allow you to turn on this feature. GKE does, as it provides a "knob" out of the box, and unmanaged/raw k8s clusters (Kind, Kubeadm, etc.) allow you to because you manage the configuration.
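For a local Kind cluster, the same gates and runtime config can be expressed in a Kind cluster config file. The following is a minimal sketch under a few assumptions: the file name kind-substrate.yaml is arbitrary, the gate names are simply copied from the flags above, and the Agent Substrate repo may ship its own Kind setup, which should take precedence over this one.

```shell
# Assumption: a local Kind cluster. The file name is arbitrary.
KIND_CONFIG=${KIND_CONFIG:-kind-substrate.yaml}

# featureGates and runtimeConfig are top-level fields of the Kind v1alpha4
# cluster config. The gate names mirror the API Server flags shown above.
cat > "$KIND_CONFIG" <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  PodCertificateRequest: true
  ClusterTrustBundle: true
  ClusterTrustBundleProjection: true
runtimeConfig:
  "certificates.k8s.io/v1beta1": "true"
EOF

printf 'wrote %s\n' "$KIND_CONFIG"
```

You could then create the cluster with kind create cluster --config "$KIND_CONFIG". Whether every gate name is accepted depends on the Kubernetes version of the Kind node image you use.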

ko

ko is a build tool for Go container images from Google. It builds an image straight from Go source without a Dockerfile or a Docker daemon. Images are built and pushed by ko to your KO_DOCKER_REPO. valkey (the state store) can be deployed for you by an install script, so you don't have to install it manually.

Configure Your Environment

With the prereqs, environment configs, and explanations of Agent Substrate and its components, let's get hands-on and deploy the Substrate environment.

Within the substrate directory that you cloned, run the following:

cp hack/ate-dev-env.sh.example .ate-dev-env.sh

Edit .ate-dev-env.sh with your environment configs. Since you already have a GKE cluster per the Prerequisites section, you will only need the following in the file:

  # --- Project / identity ---
  export PROJECT_ID=my-substrate-proj
  export PROJECT_NUMBER=$(gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)")

  # --- Your existing cluster ---
  export CLUSTER_NAME=substrate-poc
  export CLUSTER_LOCATION=us-central1-c
  # Set to your kubeconfig context so install-ate.sh skips `gcloud get-credentials`:
  export KUBECTL_CONTEXT=gke_my-substrate-proj_us-central1-c_substrate-poc

  # --- Snapshot bucket (GCE_REGION is the BUCKET's region, not the cluster's) ---
  export GCE_REGION=us-central1
  export BUCKET_NAME=snapshot-substrate-test-${PROJECT_ID}

  # --- Image registry for ko ---
  export KO_DOCKER_REPO="gcr.io/${PROJECT_ID}/ate-images"
  export KO_DEFAULTPLATFORMS=linux/amd64

Derive the two identities from the project values set above.

export ATELET_PRINCIPAL="principal://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${PROJECT_ID}.svc.id.goog/subject/ns/ate-system/sa/atelet"
export NODE_SA="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

Ensure the GKE cluster has the Pod Certificate beta APIs and Workload Identity enabled.

source .ate-dev-env.sh

gcloud container clusters update "$CLUSTER_NAME" \
  --location="$CLUSTER_LOCATION" --project="$PROJECT_ID" \
  --enable-kubernetes-unstable-apis=certificates.k8s.io/v1beta1/podcertificaterequests,certificates.k8s.io/v1beta1/clustertrustbundles

gcloud container clusters update "$CLUSTER_NAME" \
  --location="$CLUSTER_LOCATION" --project="$PROJECT_ID" \
  --workload-pool="${PROJECT_ID}.svc.id.goog"

Create a snapshot bucket for your Actors.

gcloud storage buckets create "gs://${BUCKET_NAME}" \
  --project="$PROJECT_ID" --location="$GCE_REGION" --uniform-bucket-level-access

Create IAM permissions for atelet for when it is interacting with the bucket.

gcloud storage buckets add-iam-policy-binding "gs://${BUCKET_NAME}" \
  --member="$ATELET_PRINCIPAL" --role=roles/storage.objectAdmin
gcloud storage buckets add-iam-policy-binding "gs://${BUCKET_NAME}" \
  --member="$ATELET_PRINCIPAL" --role=roles/storage.bucketViewer

Grant project-level IAM permissions for the GKE nodes and atelet.

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${NODE_SA}" --role=roles/storage.objectViewer
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="serviceAccount:${NODE_SA}" --role=roles/artifactregistry.reader

gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="$ATELET_PRINCIPAL" --role=roles/storage.objectAdmin
gcloud projects add-iam-policy-binding "$PROJECT_ID" \
  --member="$ATELET_PRINCIPAL" --role=roles/artifactregistry.reader

New Node Pools

Mounting the Pod Certificate volume is a kubelet (node-level) capability, and a node's kubelet config is fixed when the node is created. Enabling the beta APIs on the control plane doesn't retroactively apply to nodes that already exist. Since this was an existing cluster, its nodes predate the enablement, so they have to be recreated to pick up the feature. The simplest way to get fresh nodes is a new node pool (a same-version upgrade won't recreate them because the nodes already match the control-plane version).

Create c3 type node pools.

  gcloud container node-pools create substrate-pool \
    --cluster="$CLUSTER_NAME" --location="$CLUSTER_LOCATION" --project="$PROJECT_ID" \
    --machine-type=c3-standard-4 --num-nodes=1 \
    --workload-metadata=GKE_METADATA

Wait for the node pools.

kubectl get nodes -l cloud.google.com/gke-nodepool=substrate-pool

Delete the old node pools.

  gcloud container node-pools delete default-pool \
    --cluster="$CLUSTER_NAME" --location="$CLUSTER_LOCATION" --project="$PROJECT_ID"

With the cluster environment configured and installed, let's install Agent Substrate.

Installing Substrate

Within the substrate directory, you will see an install-ate.sh file under the hack directory, which builds the core images (via ko, pushed to KO_DOCKER_REPO) and deploys the Agent Substrate control plane/management plane and node components:

The CRDs
ate-api-server (control plane)
pod-certificate-controller (in-cluster mTLS signer that fulfills the PodCertificateRequests)
atelet (node DaemonSet)
atenet (DNS + Envoy router)
valkey (dynamic state store)

Run the following command:

./hack/install-ate.sh --deploy-ate-system

You'll see the installation in progress.

Wait for the system Pods to come up.

kubectl get pods -n ate-system --watch

After the Pods come up, Substrate is now installed.

Install The Substrate CLI

With the Substrate system up and running, you need a way to interact with its control/management plane. To do that, you'll use the ate sub-command.

Install the command.

go install ./cmd/kubectl-ate

Add the binary to your path.

echo 'export PATH="$PATH:$(go env GOPATH)/bin"' >> ~/.zshrc
source ~/.zshrc

Test out the sub-command.

kubectl ate --help
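
If kubectl ate is not recognized, the usual cause is that the binary is not on PATH, since kubectl discovers plugins as executables named kubectl-<name>. A small sketch of a check; check_ate_plugin is a hypothetical helper name:

```shell
# Report whether the kubectl-ate plugin binary is discoverable on PATH.
# check_ate_plugin is a hypothetical helper name.
check_ate_plugin() {
  if command -v kubectl-ate >/dev/null 2>&1; then
    echo "kubectl-ate found at $(command -v kubectl-ate)"
  else
    echo "kubectl-ate not found on PATH" >&2
    return 1
  fi
}
```

Running check_ate_plugin after sourcing your shell profile shows which copy of the binary kubectl will pick up.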

You now have ate installed and are ready to interact with Agent Substrate.

Wrapping Up

As the Agentic AI era continues to change how we think about Agents, so will the systems that we run them on. The next phase of "the systems we run them on" is Sandboxes, which will continue to rise in popularity for many organizations, as it gives the ability to isolate Agents from an ingress and egress perspective, along with what actions they can take with the tools that are available to them. I see Sandboxes being especially important as autonomous Agents become more relevant as well.

Multi-Model Failover In Your AI Gateway

Michael Levan — Sat, 09 May 2026 13:44:20 +0000

Think about two scenarios that are pretty common. 1) You hit a rate limit or run out of tokens, so you have to "downgrade" to a smaller/less powerful Model. 2) An LLM provider is down or having intermittent issues.

In these two cases, what do you do if you only have one Model set up for your Gateway to route to?

In this blog post, you'll learn how to set up failover for your LLMs.

Prerequisites

To follow along with this blog post from a hands-on perspective, you will need the following:

A Kubernetes cluster (local is fine).
Agentgateway installed along with the Kubernetes Gateway API CRDs. If you don't have agentgateway installed, you can learn how to do so here.
API access to your LLM provider. The example in this blog uses Anthropic, but you can use OpenAI, Gemini, etc.

If you don't have the above, that's fine! You can still follow along from a theoretical perspective and implement it when you're able.

Gateway Setup

The first thing you will need to do is set up a Gateway, AgentgatewayBackend, and HTTPRoute. The AgentgatewayBackend is what tells your Gateway what to route to. As you'll see in the example below, you'll route to an Opus Model.

Set your Anthropic API key as an environment variable so it can be saved as a k8s secret.

export ANTHROPIC_API_KEY=

Create the k8s secret with your API key.

kubectl apply -f- <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-secret
  namespace: agentgateway-system
type: Opaque
stringData:
  Authorization: $ANTHROPIC_API_KEY
EOF

Create a Gateway object that uses the agentgateway Gateway Class and allows routes from its own Namespace (from: Same).

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway-openshell
  namespace: agentgateway-system
spec:
  gatewayClassName: agentgateway
  listeners:
    - name: http
      port: 8080
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: Same
EOF

Create the AgentgatewayBackend that ensures your Gateway routes to the right Model.

kubectl apply -f - <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: anthropic
  namespace: agentgateway-system
spec:
  ai:
    provider:
      anthropic:
        model: "claude-opus-4-6"
  policies:
    auth:
      secretRef:
        name: anthropic-secret
EOF

Create the HTTPRoute so that your traffic is routed to the appropriate endpoint.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: openshell-openai
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: agentgateway-openshell
      namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1
    backendRefs:
    - name: anthropic
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF

Test your Gateway.

export GATEWAY_ADDRESS=$(kubectl get svc -n agentgateway-system agentgateway-openshell -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo $GATEWAY_ADDRESS

curl "http://$GATEWAY_ADDRESS:8080/v1/chat/completions" -H content-type:application/json -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a skilled cloud-native network engineer."
    },
    {
      "role": "user",
      "content": "Write me a paragraph containing the best way to think about Istio Ambient Mesh"
    }
  ]
}' | jq

You should see an output similar to the screenshot below.

With the Gateway configured, let's test Model failover.

Failover Configuration

Now that the Gateway is deployed and the AgentgatewayBackend points to an Opus Model, let's see what happens when a failover occurs. Before that, however, you need to update the AgentgatewayBackend to utilize multiple Models.

Apply the AgentgatewayBackend below, which just updates what you already have to contain multiple Models.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: anthropic
  namespace: agentgateway-system
spec:
  ai:
    groups:
      - providers:
          - name: anthropic-opus-46
            anthropic:
              model: claude-opus-4-6
            policies:
              auth:
                secretRef:
                  name: anthropic-secret
      - providers:
          - name: anthropic-sonnet-46
            anthropic:
              model: claude-sonnet-4-6
            policies:
              auth:
                secretRef:
                  name: anthropic-secret
EOF

Test the curl again to ensure that you can still route to a Model.

curl "http://$GATEWAY_ADDRESS:8080/v1/chat/completions" -H content-type:application/json -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a skilled cloud-native network engineer."
    },
    {
      "role": "user",
      "content": "Write me a paragraph containing the best way to think about Istio Ambient Mesh"
    }
  ]
}' | jq

Notice in the screenshot below that it's automatically routing to Opus 4.6. The reason is that Opus is the Model in the first providers entry under groups.

Now that the curl still works, it's time to test a failover.

Apply the AgentgatewayBackend again, except this time, specify a "fake" Model.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: anthropic
  namespace: agentgateway-system
spec:
  ai:
    groups:
      - providers:
          - name: anthropic-opus-46
            anthropic:
              model: claude-opus-4-6-FAKE
            policies:
              auth:
                secretRef:
                  name: anthropic-secret
      - providers:
          - name: anthropic-sonnet-46
            anthropic:
              model: claude-sonnet-4-6
            policies:
              auth:
                secretRef:
                  name: anthropic-secret
EOF

Create an AgentgatewayPolicy that uses your AgentgatewayBackend as a target reference and marks a provider as unhealthy based on response codes.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: failover-health
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: agentgateway.dev
    kind: AgentgatewayBackend
    name: anthropic
  backend:
    health:
      unhealthyCondition: "response.code == 404 || response.code == 429"
      eviction:
        duration: 10s
        consecutiveFailures: 1
EOF

Run the curl again.

curl "http://$GATEWAY_ADDRESS:8080/v1/chat/completions" -H content-type:application/json -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a skilled cloud-native network engineer."
    },
    {
      "role": "user",
      "content": "Write me a paragraph containing the best way to think about Istio Ambient Mesh"
    }
  ]
}' | jq

You'll now see that the Model used is Sonnet.

404 is the HTTP status code returned when the Model can't be found (in this case, because of the fake Model name). You'll also see 429 in the policy. That's the HTTP status code for rate limits (Too Many Requests).
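
To watch the failover from the command line, you can print the status code and the serving Model for a handful of requests. This is a sketch under a few assumptions: probe_failover is a hypothetical helper, jq is installed, and the response body follows the OpenAI-style chat completion shape with a top-level model field.

```shell
# Send chat completion requests and print each HTTP status and serving model.
# probe_failover is a hypothetical helper; it assumes an OpenAI-style response
# body with a top-level "model" field, and that jq is installed.
probe_failover() {
  local url="$1"
  local count="${2:-3}"
  local i response code model
  for i in $(seq 1 "$count"); do
    # -w appends the status code on its own line after the response body.
    response="$(curl -s -w '\n%{http_code}' "$url" \
      -H 'content-type: application/json' \
      -d '{"messages":[{"role":"user","content":"ping"}]}')"
    code="$(printf '%s\n' "$response" | tail -n 1)"
    # Drop the status line, then read the model name from the JSON body.
    model="$(printf '%s\n' "$response" | sed '$d' | jq -r '.model // "unknown"' 2>/dev/null)" \
      || model="unknown"
    echo "request ${i}: HTTP ${code}, model ${model}"
    sleep 1
  done
}
```

For example, probe_failover "http://$GATEWAY_ADDRESS:8080/v1/chat/completions" 5 prints one line per request, which makes it easier to see which Model answered each one.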

Wrapping Up

Rate limits, Models failing, endpoints not reachable, and Models being deprecated are all real things that occur in production. Ensuring you have a Model failover set up means you can properly manage your Agentic uptime.

Configuring Tool Traces In Your MCP Gateway

Michael Levan — Sun, 26 Apr 2026 11:30:31 +0000

An Agent makes a call to an LLM. The LLM decides which MCP server tool should be used for a task. The Agent then makes a call to said tool. This can happen once, or it can happen hundreds of times.

Here's the question: Do you know what MCP Server tools were used, when they were used, and where the prompt originated from? In other words, how can you actually track and confirm tool traces within your MCP Gateway?

That's where having an MCP Gateway that exposes these traces and metrics comes into play.

In this blog, you'll learn how to do full end-to-end trace observability for any MCP Server and tool.

Prerequisites

To follow along with this blog post in a hands-on fashion, you will need:

A k8s cluster (local is fine).
Agentgateway OSS installed, which you can find here.
A GitHub account because you will need a PAT (personal access token) to use the GitHub Copilot MCP Server.

If you don't have a k8s cluster, there's a large chunk of this blog post that's pretty visual, so you can follow along from a theoretical standpoint.

How Agentgateway Exposes MCP Traces

Agentgateway exposes the agentgateway_mcp_requests_total metric which includes:

The method used
Resource
MCP Server
MCP session ID
Tool name
Listener
Route
Routing rules

I can view the metrics within agentgateway after I make an MCP Server tool call by port-forwarding the Gateway Pod and running a curl against port 15020, which is the agentgateway Pod's internal metrics/stats listener.

kubectl port-forward -n agentgateway-system pod/mcp-gateway-7f9f6679cd-d5jmg 15020:15020

curl -s http://127.0.0.1:15020/metrics | grep agentgateway_mcp_requests

And then I can see the following metric output:

agentgateway_mcp_requests_total{method="tools/call",resource_type="tool",server="github-copilot",resource="get_me",bind="3000/agentgateway-system/mcp-gateway",gateway="agentgateway-system/mcp-gateway",listener="mcp",route="agentgateway-system/mcp-route",route_rule="unknown"} 1
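
With many tools in play, the raw metric lines get long. A small sketch that reduces each tools/call counter to the tool name and its count, based only on the label layout shown in the output above; mcp_tool_counts is a hypothetical helper name:

```shell
# Print "<tool> <count>" for each MCP tool-call counter read on stdin.
# mcp_tool_counts is a hypothetical helper; it relies on the label layout
# shown in the sample metric line (a resource="..." label and a trailing count).
mcp_tool_counts() {
  grep 'method="tools/call"' \
    | sed -n 's/^agentgateway_mcp_requests_total{.*,resource="\([^"]*\)".*} \([0-9.][0-9.]*\)$/\1 \2/p'
}
```

Piping the metrics endpoint through it, as in curl -s http://127.0.0.1:15020/metrics | mcp_tool_counts, would print get_me 1 for the sample line above.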

However, if you want to collect more distinct information within a tracing tool using an OTel collector, you can use CEL expressions to specify what you want exported:

  - name: mcp.tool_name
    expression: 'default(mcp.tool.name, "")'
  - name: mcp.tool_target
    expression: 'default(mcp.tool.target, "")'
  - name: mcp.method_name
    expression: 'default(mcp.methodName, "")'

Agentgateway emits base traces, but with the above, you can enrich the trace output with MCP-specific details.

In the next section, you will set up an agentgateway configuration with MCP so the traces can be viewed within an observability tool.

Gateway and MCP Setup

With some theory on the "how" and the "why" done, let's get hands-on and see how to set your gateway and MCP Server up.

Create a new Gateway using the agentgateway Gateway Class.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: mcp-gateway
  namespace: agentgateway-system
  labels:
    app: github-mcp-server
spec:
  gatewayClassName: agentgateway
  listeners:
    - name: mcp
      port: 3000
      protocol: HTTP
      allowedRoutes:
        namespaces:
          from: Same
EOF

Create a secret to authenticate with the GitHub Copilot MCP Server.

export GITHUB_PAT=

kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: github-pat
  namespace: agentgateway-system
type: Opaque
stringData:
  Authorization: "Bearer ${GITHUB_PAT}"
EOF

Create an agentgatewaybackend, which tells the Gateway to route to the GitHub Copilot MCP Server.

kubectl apply -f - <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: github-mcp-server
  namespace: agentgateway-system
spec:
  mcp:
    targets:
      - name: github-copilot
        static:
          host: api.githubcopilot.com
          port: 443
          path: /mcp/
          protocol: StreamableHTTP
          policies:
            tls: {}
            auth:
              secretRef:
                name: github-pat
EOF

Create an HTTPRoute with the agentgatewaybackend as the reference/target.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mcp-route
  namespace: agentgateway-system
  labels:
    app: github-mcp-server
spec:
  parentRefs:
    - name: mcp-gateway
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /mcp
      backendRefs:
        - name: github-mcp-server
          namespace: agentgateway-system
          group: agentgateway.dev
          kind: AgentgatewayBackend
EOF

With the Gateway configured, let's do a test to ensure that the MCP Server can be connected to.

Quick Test

Retrieve your gateway's ALB IP address. If you're running a k8s cluster locally, you may not have this, so you can instead use localhost wherever $GATEWAY_IP is used.

export GATEWAY_IP=$(kubectl get svc mcp-gateway -n agentgateway-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $GATEWAY_IP

Open MCP Inspector, which is a popular MCP client.

npx modelcontextprotocol/inspector#0.18.0

You should be able to see the list of tools within the GitHub Copilot MCP Server.

Now that we know the MCP Server works as expected, let's set up the observability configuration.

Observability Setup

With the Gateway and MCP Server configured and connection tested, it's time to set up the tracing mechanism along with the OTel collector and the ability to view the traces visually. This section will cover how to set up Tempo, an OTel tracing collector, and kube-prometheus-stack (which bundles Grafana).

Install Tempo.

helm upgrade --install tempo tempo \
  --repo https://grafana.github.io/helm-charts \
  --version 1.16.0 \
  --namespace telemetry \
  --create-namespace \
  --values - <<EOF
persistence:
  enabled: false
tempo:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
EOF

Install the OTel traces collector.

helm upgrade --install opentelemetry-collector-traces opentelemetry-collector \
  --repo https://open-telemetry.github.io/opentelemetry-helm-charts \
  --version 0.127.2 \
  --set mode=deployment \
  --set image.repository="otel/opentelemetry-collector-contrib" \
  --set command.name="otelcol-contrib" \
  --namespace telemetry \
  --create-namespace \
  -f - <<EOF
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  exporters:
    otlp/tempo:
      endpoint: http://tempo.telemetry.svc.cluster.local:4317
      tls:
        insecure: true
    debug:
      verbosity: detailed
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [debug, otlp/tempo]
EOF

Install Grafana with Tempo as the data source.

helm upgrade --install kube-prometheus-stack kube-prometheus-stack \
  --repo https://prometheus-community.github.io/helm-charts \
  --namespace telemetry \
  --create-namespace \
  --values - <<EOF
alertmanager:
  enabled: false
prometheus:
  prometheusSpec:
    enableRemoteWriteReceiver: true
grafana:
  enabled: true
  datasources:
    datasources.yaml:
      apiVersion: 1
      datasources:
      - name: Prometheus
        type: prometheus
        uid: prometheus
        access: proxy
        url: http://kube-prometheus-stack-prometheus.telemetry:9090
      - name: Tempo
        type: tempo
        uid: tempo
        access: proxy
        url: http://tempo.telemetry.svc.cluster.local:3100
EOF

Because agentgateway and the OTel collector are in different namespaces, the Kubernetes Gateway API requires a reference grant.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-otel-collector-traces-access
  namespace: telemetry
spec:
  from:
  - group: agentgateway.dev
    kind: AgentgatewayPolicy
    namespace: agentgateway-system
  to:
  - group: ""
    kind: Service
    name: opentelemetry-collector-traces
EOF

Enable traces on the MCP Gateway and use CEL to add the attributes you want within the trace so you can get a visual representation of the MCP Server tool call.

kubectl apply -f - <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: mcp-tracing
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: mcp-gateway
  frontend:
    tracing:
      backendRef:
        name: opentelemetry-collector-traces
        namespace: telemetry
        port: 4317
      protocol: GRPC
      clientSampling: "true"
      randomSampling: "true"
      resources:
      - name: service.name
        expression: '"agentgateway-mcp"'
      - name: deployment.environment.name
        expression: '"development"'
      attributes:
        add:
        - name: mcp.method_name
          expression: 'default(mcp.methodName, "")'
        - name: mcp.session_id
          expression: 'default(mcp.sessionId, "")'
        - name: mcp.tool_name
          expression: 'default(mcp.tool.name, "")'
        - name: mcp.tool_target
          expression: 'default(mcp.tool.target, "")'
        - name: backend.name
          expression: 'default(backend.name, "")'
        - name: backend.type
          expression: 'default(backend.type, "")'
    accessLog:
      attributes:
        add:
        - name: mcp.tool_name
          expression: 'default(mcp.tool.name, "")'
        - name: mcp.tool_target
          expression: 'default(mcp.tool.target, "")'
        - name: mcp.method_name
          expression: 'default(mcp.methodName, "")'
EOF

You now have everything you need to capture MCP Server and tool traces.

Run A Tool Call

In the previous section, you installed and configured tracing to work for your MCP Gateway. Now, it's time to put it to the test and look at some traces along with seeing them visually.

Make a tool call from MCP Inspector (the trace below comes from the get_me tool), then run the following command. This pulls the last 100 log lines from your OTel tracing collector.

kubectl logs -n telemetry -l app.kubernetes.io/instance=opentelemetry-collector-traces --tail=100

Once you run the above, you will see the following:

InstrumentationScope agentgateway
Span #0
    Trace ID       : 2aba28785b334fd105f4c2b4dee2b6f5
    Parent ID      :
    ID             : e93f58c47c723d77
    Name           : POST /mcp/*
    Kind           : Server
    Start time     : 2026-04-25 23:44:23.281302662 +0000 UTC
    End time       : 2026-04-25 23:44:23.4975073 +0000 UTC
    Status code    : Unset
    Status message :
Attributes:
     -> gateway: Str(agentgateway-system/mcp-gateway)
     -> listener: Str(mcp)
     -> route: Str(agentgateway-system/mcp-route)
     -> src.addr: Str(10.224.0.62:36759)
     -> http.method: Str(POST)
     -> http.host: Str(52.225.32.209)
     -> http.path: Str(/mcp)
     -> http.version: Str(HTTP/1.1)
     -> http.status: Int(200)
     -> trace.id: Str(2aba28785b334fd105f4c2b4dee2b6f5)
     -> span.id: Str(e93f58c47c723d77)
     -> protocol: Str(mcp)
     -> mcp.method.name: Str(tools/call)
     -> mcp.target: Str(github-copilot)
     -> mcp.resource.type: Str(tool)
     -> mcp.session.id: Str(vzMGQa/WpW6P/rAJ57/rpM3Wl+so/kCXVV8oWkdPBc6hwIgTLGjd5eOgZ9XC1JCRtjBsQwMZnTAUdPR+YFvYdYadKTJVPrH85YllyUdxGXvDDAmX+HkjmuvfPNNu9wcyEWc6DlyoHFJPANq2/vct)
     -> duration: Str(216ms)
     -> url.scheme: Str(http)
     -> network.protocol.version: Str(1.1)
     -> mcp.method_name: Str(tools/call)
     -> mcp.session_id: Str(vzMGQa/WpW6P/rAJ57/rpM3Wl+so/kCXVV8oWkdPBc6hwIgTLGjd5eOgZ9XC1JCRtjBsQwMZnTAUdPR+YFvYdYadKTJVPrH85YllyUdxGXvDDAmX+HkjmuvfPNNu9wcyEWc6DlyoHFJPANq2/vct)
     -> mcp.tool_name: Str(get_me)
     -> mcp.tool_target: Str(github-copilot)
     -> backend.name: Str(agentgateway-system/github-mcp-server)
     -> backend.type: Str(mcp)

Notice how everything and anything you can think of in terms of "how do I observe an MCP Server tool call" is in the log.
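
The debug output is verbose, so it can help to keep only the attributes added by the tracing policy. A minimal sketch that filters on the attribute names from that policy; mcp_attributes is a hypothetical helper name:

```shell
# Keep only the MCP-specific attribute lines from collector debug output.
# mcp_attributes is a hypothetical helper name.
mcp_attributes() {
  grep -E -- '-> mcp\.(tool_name|tool_target|method_name|session_id):' \
    | sed 's/^ *-> //'
}
```

Pipe the kubectl logs command from this section into mcp_attributes to get one line per attribute, such as mcp.tool_name: Str(get_me).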

You can also view them from a dashboard.

To view the traces in Grafana, do the following:

Port-forward the Grafana service: kubectl --namespace telemetry port-forward svc/kube-prometheus-stack-grafana 3000
Open http://localhost:3000
Log in with the username admin. To get the password, run kubectl get secret kube-prometheus-stack-grafana -n telemetry -o jsonpath='{.data.admin-password}' | base64 --decode

Once logged in, if you go to the traces view under Explore -> Tempo, you can see your MCP Server tool calls.

And if you dive into the span, you can see everything from the tool name to the session ID. This is a collection of everything you'd need to know to follow the trace of a tool call from your Agent.

Wrapping Up

Observability and tracking MCP Server tool calls isn't about putting red tape in front of MCP. Yes, you need to secure MCP Servers and the tools that an Agent is calling. There is definitely no debating that. What you also need, however, is the ability to confirm that an Agent actually called the right tool and to collect/log the details of each Agent call for auditing and tracking purposes. MCP Servers are "black boxes" and so is just about everything else in AI, but if you set up proper tracing, you can understand what's going on within the system in production.

Managing MCP Servers and Tools With Agentregistry OSS

Michael Levan — Sat, 04 Apr 2026 22:38:18 +0000

Three big topics when it comes to MCP:

How do you know the MCP Server is secure?
Where is it stored?
Is it version-controlled, or can anyone just change it at any time?

And that's where having an MCP registry comes into play.

In this blog post, you'll learn how to securely store your MCP Server and its available tools so they can be used later within your Agents.

Quick Recap: What Is MCP?

Model Context Protocol (MCP) is a spec/standard created and open-sourced by Anthropic. The goal of MCP is to have a server that hosts tools, and these tools implement certain functionality for what you're working on. For example, you can use a Kubernetes MCP Server that can do everything from list/describe/log Pods to deploying objects to Kubernetes. Under the hood, MCP uses JSON-RPC 2.0 as the communication layer between an Agent (the client) and MCP tools (hosted on the server).
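
To make the JSON-RPC 2.0 part concrete, here is a sketch that prints the request body a client would send for a tools/call. The helper name mcp_tools_call and the tool name list_pods in the usage note are hypothetical, and the arguments object is left empty for simplicity:

```shell
# Print a JSON-RPC 2.0 request for an MCP tools/call with empty arguments.
# mcp_tools_call is a hypothetical helper; tool names are not escaped,
# so this only suits simple names without quotes or backslashes.
mcp_tools_call() {
  local id="$1"
  local tool="$2"
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":{}}}\n' \
    "$id" "$tool"
}
```

Running mcp_tools_call 1 list_pods prints a single JSON object with the jsonrpc, id, method, and params fields that the protocol expects.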

The "Is MCP Dead" Debate

I was at MCPDevSummit in NY this week, and I caught a keynote that explained the need for MCP Server tools pretty nicely from a theoretical perspective. Right now, it may be easier for Agents to talk to MCP Server tools vs having them talk to tens or hundreds of APIs directly. The reason why is that it's simpler for an Agent to call a tool and have that tool (because a tool, underneath the hood, is simply a function/method) call the APIs instead. What this could come down to is fewer tokens used and less context bloat, along with, hopefully, better results.

Configuring Agentregistry Locally

With an understanding of what MCP is at a high level, let's dive into the hands-on portion of this blog post. In this section, you'll get agentregistry deployed, which takes around 30 seconds.

Pull down the latest version of agentregistry.

curl -fsSL https://raw.githubusercontent.com/agentregistry-dev/agentregistry/main/scripts/get-arctl | bash

Run the following command, which starts the agentregistry daemon.

arctl daemon start

You'll see an output similar to the following:

Starting agentregistry daemon...
✓ agentregistry daemon started successfully

Open Docker and you'll see agentregistry running along with a link you can click to reach the UI.

You should now see the agentregistry UI.

Sidenote: if you have a remote registry, you can connect to it with the following:

arctl --registry-url http://YOUR-HOST:12121 version

Adding An MCP Server To Agentregistry

With agentregistry deployed, you can now add an MCP Server to the registry to ensure it's stored and secured. For testing purposes, let's use the filesystem MCP Server that's stored on GitHub.

Using arctl mcp publish, you'll pass in the following details:

1. MCP Server: server-filesystem
2. Type: NPM
3. Version: 0.6.3

arctl mcp publish io.github.modelcontextprotocol/server-filesystem \
  --type npm \
  --package-id @modelcontextprotocol/server-filesystem \
  --version 0.6.3 \
  --description 'MCP server for filesystem access' \
  --git https://github.com/modelcontextprotocol/servers.git -v

The MCP Server will now show in your registry.

You can also add your MCP Server via the UI.

Click the purple + Add button and choose Server.

Add in the details about your MCP Server.

Conclusion

Having a safe, secure, and reliable place to store something as prone to security incidents as MCP Servers is key to creating a proper posture for you and your organization when using AI. This is why agentregistry can also be used to store Agent Skills and prompts. Because the majority of what you're using is either a function/method (an MCP Server tool) or .MD files/text files, shadow AI can easily occur.

Running OpenClaw on Kubernetes

Michael Levan — Sat, 14 Mar 2026 17:00:08 +0000

The "new and exciting" way of interacting with Agents is the personal assistant method (from iMessage, WhatsApp, or whatever else) and this interest is taking the industry by storm. OpenAI "bought" OpenClaw, Nvidia is investing in its own version of personal assistants, and several other organizations are trying to figure out how to implement this in production.

The question is - does it run on your infra?

In this blog post, you'll learn how to implement OpenClaw in Kubernetes and observe/secure it with agentgateway.

Prerequisites

To follow along with this blog post from a hands-on perspective, you will need the following:

A Kubernetes cluster running with at least 2 vCPUs and 2–4 GB RAM, though 8 GB RAM and higher are recommended for smoother performance.
agentgateway installed, which you can find here.

Containerization Setup

There are two different ways that you can use OpenClaw in a containerized fashion:

Build your own container image. There's a Dockerfile in the OpenClaw repo which you can find here.
Use a container image that was already built. There is an official Alpine image which you can find here.

The first option will, of course, be the most secure, as you can build the container image yourself and ensure you know what is within the Dockerfile. In air-gapped environments, this would be the ideal setup.

Agentgateway Config To Observe and Secure Agentic Traffic

Once the containerization setup is complete, you can begin the agentgateway setup.

Create a Gateway using the Kubernetes Gateway API CRDs and the agentgateway Gateway class.

kubectl apply -f- <<EOF
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: agentgateway-oc
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  gatewayClassName: agentgateway
  listeners:
  - protocol: HTTP
    port: 8081
    name: http
    allowedRoutes:
      namespaces:
        from: All
EOF

Set an environment variable with your Anthropic API key.

export ANTHROPIC_API_KEY=

Create a Kubernetes Secret with the API key.

kubectl apply -f- <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-secret
  namespace: agentgateway-system
  labels:
    app: agentgateway-oc
type: Opaque
stringData:
  Authorization: $ANTHROPIC_API_KEY
EOF

Create an agentgatewaybackend, which tells the Gateway what to route to. In this case, it's using Anthropic as the LLM Provider.

Please note: When using an ai agentgatewaybackend with the Anthropic provider, agentgateway attempts to parse and re-marshal the request body as a structured LLM message, which fails on OpenClaw's native Anthropic format due to a missing type field in complex message content. Switching to a static backend pointing directly at api.anthropic.com:443 tells agentgateway to forward the request as-is without any LLM-specific processing, while still providing routing, observability, and logging on all traffic. The tls: {} policy is required because api.anthropic.com listens on HTTPS (port 443), and without it, agentgateway sends plain HTTP, which Cloudflare rejects.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  labels:
    app: agentgateway-oc
  name: anthropic
  namespace: agentgateway-system
spec:
  static:
    host: api.anthropic.com
    port: 443
  policies:
    tls: {}
EOF

Create a route that points to the path /v1/messages, which is the format that Anthropic expects.

kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ocroute
  namespace: agentgateway-system
spec:
  parentRefs:
    - name: agentgateway-oc
      namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /v1/messages
    backendRefs:
    - name: anthropic
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF

Deploy OpenClaw On Kubernetes

With the Gateway deployed, let's set up OpenClaw.

Please note: This deployment is just for testing and does not include anything for persistent volumes for data that is not ephemeral. If you want that configuration, you can create a PVC and mount it on /home/node/.openclaw and /home/node/workspace.
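
For reference, a PVC along the lines described above might look like the following. This is a sketch only: the claim name openclaw-data, the 1Gi size, and the openclaw-pvc.yaml file name are assumptions, and the storage class is left to the cluster default.

```shell
# Write a PVC manifest to a file for review before applying it.
# The claim name, size, and file name are assumptions, not OpenClaw defaults.
cat > openclaw-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-data
  labels:
    app: openclaw
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
```

After kubectl apply -f openclaw-pvc.yaml, the Deployment would also need a volume that references the claim and volumeMounts for the two paths (with distinct subPath values if one claim backs both). Keep in mind that the wrapper script in the Deployment copies the overlay over openclaw.json at each container start, so that file would be reset even with a volume mounted.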

Create a ConfigMap to map the configuration that's needed for OpenClaw to route traffic through agentgateway. Please remember you will need to replace the baseUrl with the hostname or IP of your Gateway.

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: openclaw-agw-config
  labels:
    app: openclaw
data:
  agw-overlay.json: |
    {
      "gateway": {
        "bind": "lan"
      },
      "models": {
        "mode": "merge",
        "providers": {
          "anthropic": {
            "baseUrl": "http://YOUR_AGENTGATEWAY_HOSTNAME_OR_IP:8081",
            "models": []
          }
        }
      }
    }
EOF

Create a Kubernetes Deployment that points to the OpenClaw Alpine container image.

Please note: When deploying OpenClaw in Kubernetes with agentgateway, the openclaw.json config file needs to include the agentgateway baseUrl to route LLM traffic through the gateway. However, OpenClaw auto-generates its base config (including auth tokens and default settings) at startup, and any config modification, whether from the initial overlay or from running openclaw onboard, triggers OpenClaw's built-in hot-reload, which performs a full process restart that kills PID 1 and causes the container to crash. The solution uses a wrapper shell script that pre-creates openclaw.json with the agentgateway overlay before OpenClaw starts (so initial startup merges cleanly), and runs OpenClaw inside a while true loop so the shell remains PID 1 and automatically restarts OpenClaw whenever a config change triggers its internal restart, preventing the container from exiting.

Please note: The models: [] parameter is required by the schema, but it also causes an ANTHROPIC_MODEL_ALIASES error. This is a known bug in v2026.3.12. The ANTHROPIC_MODEL_ALIASES error is a temporal dead zone issue that affects any config using an Anthropic primary model. The workaround is to use v2026.3.11 instead. That's why you see that image pinned in the deployment below.

kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw
  labels:
    app: openclaw
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openclaw
  template:
    metadata:
      labels:
        app: openclaw
    spec:
      containers:
        - name: openclaw
          image: alpine/openclaw:2026.3.11
          ports:
            - name: gateway
              containerPort: 18789
              protocol: TCP
            - name: bridge
              containerPort: 18790
              protocol: TCP
          command:
            - sh
            - -c
            - |
              mkdir -p /home/node/.openclaw
              cp /tmp/agw-overlay.json /home/node/.openclaw/openclaw.json
              trap 'kill $(jobs -p) 2>/dev/null' EXIT
              while true; do
                docker-entrypoint.sh node openclaw.mjs gateway --allow-unconfigured
                echo "OpenClaw process exited, restarting..."
                sleep 2
              done
          volumeMounts:
            - name: agw-config
              mountPath: /tmp/agw-overlay.json
              subPath: agw-overlay.json
          resources:
            requests:
              cpu: "4"
              memory: "8Gi"
            limits:
              cpu: "4"
              memory: "8Gi"
      volumes:
        - name: agw-config
          configMap:
            name: openclaw-agw-config
EOF

Create a Service for OpenClaw. This Service will be used in the next section when implementing agentgateway for secure and observable OpenClaw traffic.

kubectl apply -f -<<EOF
apiVersion: v1
kind: Service
metadata:
  name: openclaw
  labels:
    app: openclaw
spec:
  type: ClusterIP
  selector:
    app: openclaw
  ports:
    - name: gateway
      port: 18789
      targetPort: gateway
      protocol: TCP
    - name: bridge
      port: 18790
      targetPort: bridge
      protocol: TCP
EOF

You'll now see that OpenClaw is running.

openclaw-bf55866b7-s7wn6   1/1     Running       0          4m36s

Onboard OpenClaw

For OpenClaw to work, you need to set configurations like how you want to interact with OpenClaw (iMessage, Telegram, etc.) and the LLM Provider you want to use. To do that, you need to run the openclaw onboard command. Because this is running in Kubernetes, you can exec into the Pod.

kubectl exec -it YOUR_OPENCLAW_POD_NAME -n default -- openclaw onboard

You'll see an output similar to the below and you can get started with the onboarding process.

After the onboarding, you can test and ensure that OpenClaw is passing traffic through agentgateway.

kubectl exec YOUR_OPENCLAW_POD_NAME -n default -- openclaw agent --message "Say hi"

And you'll see traffic routing through agentgateway similar to the below:

2026-03-14T15:41:26.634010Z     info    request gateway=agentgateway-system/agentgateway-oc listener=http route=agentgateway-system/ocroute endpoint=api.anthropic.com:443 src.addr=10.224.0.149:62282 http.method=POST http.host=52.241.254.163 http.path=/v1/messages http.version=HTTP/1.1 http.status=200 protocol=http duration=2936ms
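
If you want to check these log lines in a script rather than by eye, the key=value layout is easy to split apart. The following is a minimal sketch that assumes every field is whitespace-separated and that no value contains a space, which holds for the sample line above but is not a documented guarantee of the log format. The function name parse_gateway_log is hypothetical.

```python
def parse_gateway_log(line: str) -> dict:
    """Split one request log line into timestamp, level, message, and key=value fields."""
    timestamp, level, message, *tokens = line.split()
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:  # skip any token that is not in key=value form
            fields[key] = value
    return {"timestamp": timestamp, "level": level, "message": message, "fields": fields}


# The sample line from the agentgateway output above.
sample = (
    "2026-03-14T15:41:26.634010Z     info    request "
    "gateway=agentgateway-system/agentgateway-oc listener=http "
    "route=agentgateway-system/ocroute endpoint=api.anthropic.com:443 "
    "src.addr=10.224.0.149:62282 http.method=POST http.host=52.241.254.163 "
    "http.path=/v1/messages http.version=HTTP/1.1 http.status=200 "
    "protocol=http duration=2936ms"
)

record = parse_gateway_log(sample)
fields = record["fields"]
print(fields["endpoint"], fields["http.path"], fields["http.status"], fields["duration"])
```

The fields worth checking first are endpoint (confirming the request left through the gateway to Anthropic), http.path, and http.status.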

Route and Secure OpenAI Azure Foundry Traffic Through Your AI Gateway

Michael Levan — Tue, 10 Mar 2026 15:21:50 +0000

As you begin to expand into various Agentic frameworks, there's a good chance you will end up choosing the one that exists within the cloud provider you're already using. If you're in Azure, that's Azure Foundry.

The question then becomes "How do I securely route and observe the traffic?".

In this blog post, you'll learn how to route Foundry traffic through a secure, reliable, and performant AI Gateway with agentgateway.

Prerequisites

To follow along with this blog post in a hands-on fashion, you'll need the following:

An Azure account.
Agentgateway installed (OSS), which you can find here.

What Is Microsoft Foundry

Foundry is the Agentic framework within Azure. If you use AWS and have heard of Bedrock before or GCP and have heard of Vertex AI, it's all very similar. They allow you to host Models from various providers (OpenAI, Anthropic, etc.) and connect to those Models from a centralized endpoint with the same API key/token (so you don't have to worry about various keys per provider). Some of the services, like Foundry, also allow you to connect to tools and fine-tune the Models you're working with.

The "tldr" is that it's an Agentic hosting platform to connect to various LLMs.

Azure Foundry Setup

With the knowledge around what Foundry is in place, let's dive into the setup. You'll start with setting up Foundry.

Within the Azure portal, search for foundry.

In the Foundry portal, click the blue + Create button.

Create the Foundry resource within your respective subscription and resource group.

Once Foundry is created, you'll see a UI similar to the below. Save the project API key. You'll need it for the next section when you create the Gateway configuration.

Within Foundry, search for gpt-5-mini. Realistically, you can use any Model, but the mini Models will save you some money.

Deploy the Model with the default settings.

With the Model deployed, you will now be able to reach it with agentgateway.

Gateway Configuration

Create an environment variable with the Foundry API key that you saved in the previous section in step 4.

export AZURE_FOUNDRY_API_KEY=

Create a Gateway object listening on port 8081.

kubectl apply -f- <<EOF
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: agentgateway-azureopenai-route
  namespace: agentgateway-system
  labels:
    app: agentgateway-azureopenai-route
spec:
  gatewayClassName: agentgateway
  listeners:
  - protocol: HTTP
    port: 8081
    name: http
    allowedRoutes:
      namespaces:
        from: All
EOF

Save the ALB IP of the Gateway in an environment variable. If you're not using a k8s cluster that can create a public ALB IP, you can use localhost when connecting to the Gateway as long as you port-forward the k8s Gateway svc.

export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system agentgateway-azureopenai-route -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo $INGRESS_GW_ADDRESS
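
Some load balancers report a hostname and others report an IP, which is why the jsonpath expression asks for both fields. The sketch below shows the same lookup against Service JSON, with a localhost fallback for the port-forward case. The helper name gateway_address and the sample Service fragments are hypothetical, and unlike the jsonpath expression, this version prefers the hostname when both fields are present.

```python
from typing import Optional


def gateway_address(service: dict) -> Optional[str]:
    """Return the first load balancer hostname or IP from a Service object, if any."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        return None  # no external address has been assigned
    first = ingress[0]
    return first.get("hostname") or first.get("ip")


# Hypothetical Service fragments, shaped like `kubectl get svc ... -o json` output.
with_ip = {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}
with_hostname = {"status": {"loadBalancer": {"ingress": [{"hostname": "lb.example.com"}]}}}
pending = {"status": {"loadBalancer": {}}}

for svc in (with_ip, with_hostname, pending):
    # Falling back to localhost assumes a port-forward to the Gateway svc is running.
    print(gateway_address(svc) or "localhost")
```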

Create a k8s secret that stores the Foundry API key.

kubectl apply -f- <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: azureopenai-secret
  namespace: agentgateway-system
  labels:
    app: agentgateway-azureopenai-route
type: Opaque
stringData:
  Authorization: $AZURE_FOUNDRY_API_KEY
EOF

The agentgateway backend will tell the Gateway what to route to. In this case, it's the gpt-5-mini Model. You'll also point to the Foundry endpoint.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  labels:
    app: agentgateway-azureopenai-route
  name: azureopenai
  namespace: agentgateway-system
spec:
  ai:
    provider:
      azureopenai:
        endpoint: mlevantesting-resource.services.ai.azure.com
        deploymentName: gpt-5-mini
        apiVersion: 2025-01-01-preview
  policies:
    auth:
      secretRef:
        name: azureopenai-secret
EOF

The last step is to create a route. Because you're using a GPT Model, the path will be /v1/chat/completions, but you can set a custom route to shorten the path.

kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: azureopenai
  namespace: agentgateway-system
  labels:
    app: agentgateway-azureopenai-route
spec:
  parentRefs:
    - name: agentgateway-azureopenai-route
      namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /azureopenai
    filters:
    - type: URLRewrite
      urlRewrite:
        path:
          type: ReplaceFullPath
          replaceFullPath: /v1/chat/completions
    backendRefs:
    - name: azureopenai
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF
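
To make the match-and-rewrite behavior of this route concrete, here is a small sketch of the two steps: a PathPrefix match on whole path segments, followed by a ReplaceFullPath rewrite. It is an illustration of the Gateway API semantics only, not agentgateway's implementation, and the function names are hypothetical.

```python
from typing import Optional


def path_prefix_matches(prefix: str, path: str) -> bool:
    """PathPrefix matches whole path segments, so /azureopenai does not match /azureopenaix."""
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def route(path: str, prefix: str = "/azureopenai",
          replacement: str = "/v1/chat/completions") -> Optional[str]:
    """Return the upstream path after ReplaceFullPath, or None if the rule does not match."""
    if not path_prefix_matches(prefix, path):
        return None
    return replacement  # the whole path is replaced, whatever followed the prefix


for candidate in ("/azureopenai", "/azureopenai/anything", "/azureopenaix", "/other"):
    print(candidate, "->", route(candidate))
```

Because the rewrite replaces the full path, anything a client appends after /azureopenai is dropped before the request reaches the backend.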

Test the route to the OpenAI Model via agentgateway. Swap out $INGRESS_GW_ADDRESS with localhost if your Gateway doesn't have a public ALB IP.

curl "$INGRESS_GW_ADDRESS:8081/azureopenai" -v -H content-type:application/json -d '{
  "messages": [
    {
      "role": "system",
      "content": "You are a skilled cloud-native network engineer."
    },
    {
      "role": "user",
      "content": "Write me a paragraph containing the best way to think about Istio Ambient Mesh"
    }
  ]
}' | jq
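
If you'd rather run the same check from a script, the curl call maps onto a plain HTTP POST. The sketch below builds that request with Python's standard library. It assumes the Gateway is reachable on port 8081 at the address you pass in; build_chat_request and send are hypothetical helper names, and send is defined but not called here because it needs a live Gateway.

```python
import json
import urllib.request


def build_chat_request(gateway_address: str, user_prompt: str) -> urllib.request.Request:
    """Build the POST request that the curl command sends to the /azureopenai route."""
    body = {
        "messages": [
            {"role": "system", "content": "You are a skilled cloud-native network engineer."},
            {"role": "user", "content": user_prompt},
        ]
    }
    return urllib.request.Request(
        url=f"http://{gateway_address}:8081/azureopenai",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body (requires a reachable Gateway)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_chat_request(
    "localhost",
    "Write me a paragraph containing the best way to think about Istio Ambient Mesh",
)
print(request.get_method(), request.full_url)
```

Calling send(request) against a running Gateway returns the decoded JSON response, which is the same payload that curl pipes into jq.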

You should see an output similar to the below.

Intercept, Inspect, Secure: Proxying Claude Code CLI Traffic

Michael Levan — Fri, 20 Feb 2026 12:39:27 +0000

Architecture diagrams always look something like this:

Agent -> Gateway -> LLM (or MCP Server).

The Agents that organizations are typically referring to are Agents that perform an action via prompts by a user or autonomously, and those Agents are usually running in a system somewhere in production. That is, however, not where the majority of Agentic traffic originates. The larger chunk of traffic comes from Agentic clients (Claude Code CLI, Cursor, Copilot, etc.) and because of that, we now must think about Agents not only running in production systems, but on someone's laptop.

In this blog post, you'll learn how to secure that traffic within an Agentic client with agentgateway.

Gateway Configuration

The first thing to ensure is that you have a proper AI Gateway configured so traffic from the Agentic CLI client can get from point A to point B securely. In this case, you can use agentgateway, which is an AI Gateway built from the ground up specifically for AI traffic.

Generate an API key and put it into an environment variable so a k8s Secret can be created with it later.

export ANTHROPIC_API_KEY=

Create the Gateway object.

kubectl apply -f- <<EOF
kind: Gateway
apiVersion: gateway.networking.k8s.io/v1
metadata:
  name: agentgateway-route
  namespace: agentgateway-system
  labels:
    app: agentgateway
spec:
  gatewayClassName: enterprise-agentgateway
  infrastructure:
    parametersRef:
      name: tracing
      group: enterpriseagentgateway.solo.io
      kind: EnterpriseAgentgatewayParameters
  listeners:
  - protocol: HTTP
    port: 8080
    name: http
    allowedRoutes:
      namespaces:
        from: All
EOF

Create the secret for Anthropic. This way, you have proper access to Anthropic via your Gateway for LLM calls.

kubectl apply -f- <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: anthropic-secret
  namespace: agentgateway-system
  labels:
    app: agentgateway-route
type: Opaque
stringData:
  Authorization: $ANTHROPIC_API_KEY
EOF

Create an Agentgateway Backend.

Two things to keep in mind with the AgentgatewayBackend config.

The first is that the routes go through /v1/messages and not /v1/chat/completions like you'd normally see in an OpenAI API format spec route. Agentgateway can handle the translation (from Anthropic spec to OpenAI spec), but because you're routing traffic directly through Claude, no translation occurs, which is why the Anthropic spec is needed.

The second is that the two configurations below either specify a Model (Opus) or use empty braces ({}) so that any Model can be used. If you specify a Model in your AgentgatewayBackend and then use a different Model in Claude Code CLI, you will get a 400 error that says something along the lines of "thinking mode isn't enabled". That isn't the error Claude Code should be showing you, but it's what you'll most likely see. If you specify Opus, you must use Opus in your Claude Code CLI configuration. If you specify no Model and just a Provider (anthropic: {}), you can use any Model you'd like.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  labels:
    app: agentgateway-route
  name: anthropic
  namespace: agentgateway-system
spec:
  ai:
    provider:
        anthropic:
          model: "claude-opus-4-6"
  policies:
    ai:
      routes:
        '/v1/messages': Messages
        '*': Passthrough
    auth:
      secretRef:
        name: anthropic-secret
EOF

Or without a specified Model

kubectl apply -f - <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  labels:
    app: agentgateway-route
  name: anthropic
  namespace: agentgateway-system
spec:
  ai:
    provider:
      anthropic: {}
  policies:
    auth:
      secretRef:
        name: anthropic-secret
    ai:
      routes:
        '/v1/messages': Messages
        '*': Passthrough
EOF

Create the routing configurations that point to your Gateway and use the Agentgateway Backend you created in the previous step as the reference.

kubectl apply -f- <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: claude
  namespace: agentgateway-system
  labels:
    app: agentgateway-route
spec:
  parentRefs:
    - name: agentgateway-route
      namespace: agentgateway-system
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: anthropic
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF

Test Connectivity

With the Gateway, Backend, and Route configured, let's ensure that the Claude Code CLI traffic can successfully go through agentgateway.

Store the Gateway's ALB IP in an environment variable. If you're running this locally and don't have access to an ALB IP, you can skip this step and just use localhost after port-forwarding the Gateway service.

export INGRESS_GW_ADDRESS=$(kubectl get svc -n agentgateway-system agentgateway-route -o jsonpath="{.status.loadBalancer.ingress[0]['hostname','ip']}")
echo $INGRESS_GW_ADDRESS

Test the LLM connectivity through your Gateway with a single prompt.

ANTHROPIC_BASE_URL="http://$INGRESS_GW_ADDRESS:8080" claude -p "What is a credit card"

Or with localhost.

ANTHROPIC_BASE_URL="http://127.0.0.1:8080" claude -p "What is a credit card"

You can also start an interactive Claude Code CLI session by running ANTHROPIC_BASE_URL="http://127.0.0.1:8080" claude or ANTHROPIC_BASE_URL="http://$INGRESS_GW_ADDRESS:8080" claude, and you'll be able to prompt it with whatever you'd like.

With the traffic connectivity tested, let's implement Prompt Guards.

Prompt Guards

Connectivity through agentgateway with Claude Code CLI has been tested and confirmed, so now, let's move into the security piece.

The number 1 thing organizations want to be able to secure is what can actually get prompted via an Agent. For example, the last thing you want is to have someone prompt an Agent with "Delete all of the Kubernetes clusters in production" and have it actually do it. To avoid this, you need to ensure that what a user can prompt is something that they should be able to prompt.

Modify the AgentgatewayBackend with a prompt guard. Notice how this is a regex and for the test, we want to block any traffic that has the words credit card in it.

kubectl apply -f- <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  labels:
    app: agentgateway-route
  name: anthropic
  namespace: agentgateway-system
spec:
  ai:
    provider:
        anthropic:
          model: "claude-opus-4-6"
  policies:
    ai:
      routes:
        '/v1/messages': Messages
        '*': Passthrough
      promptGuard:
        request:
        - response:
            message: "Rejected due to inappropriate content"
          regex:
            action: Reject
            matches:
            - "credit card"
    auth:
      secretRef:
        name: anthropic-secret
EOF

You can also do the same thing without a Model specified:

kubectl apply -f - <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  labels:
    app: agentgateway-route
  name: anthropic
  namespace: agentgateway-system
spec:
  ai:
    provider:
      anthropic: {}
  policies:
    auth:
      secretRef:
        name: anthropic-secret
    ai:
      routes:
        '/v1/messages': Messages
        '*': Passthrough
      promptGuard:
        request:
        - response:
            message: "Rejected due to inappropriate content"
          regex:
            action: Reject
            matches:
            - "credit card"
EOF
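
It helps to reason about what a regex guard will and won't catch before relying on it. The sketch below reproduces the Reject behavior in Python using the same pattern and message as the config above. It is only an illustration: which parts of the request agentgateway inspects and how its regex engine treats case are assumptions you should verify against your own Gateway, and check_prompt is a hypothetical name.

```python
import re
from typing import Optional

REJECT_MESSAGE = "Rejected due to inappropriate content"
BLOCKED_PATTERNS = [re.compile(r"credit card")]  # same pattern as the promptGuard config


def check_prompt(prompt: str) -> Optional[str]:
    """Return the rejection message if any blocked pattern matches, else None."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(prompt):
            return REJECT_MESSAGE
    return None


for prompt in ("What is a credit card", "What is a debit card", "What is a Credit Card"):
    print(repr(prompt), "->", check_prompt(prompt))
```

In Python this pattern is case sensitive, so "Credit Card" passes through. If the Gateway behaves the same way, a broader pattern such as one with a case-insensitive flag would be needed to cover that variant.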

Run the check again by running either of the following:

ANTHROPIC_BASE_URL="http://$INGRESS_GW_ADDRESS:8080" claude -p "What is a credit card"
ANTHROPIC_BASE_URL="http://$INGRESS_GW_ADDRESS:8080" claude and then prompting within Claude Code: What is a credit card

You'll get an output similar to the one below.

With traffic routing through agentgateway from Claude Code CLI and the knowledge of how prompt guards can work in this scenario, you can now secure traffic from anyone's laptop/desktop when they're using an Agentic CLI client.

Build AI Agents on Kubernetes: Kagent + Amazon Bedrock Setup Guide

Michael Levan — Sat, 24 Jan 2026 13:19:06 +0000

Managing various LLM provider accounts, subscriptions, and cost can get cumbersome for many organizations in a world where multiple LLMs are used. To avoid this, you can use what can be called a "middle ground" between your Agent and the LLM provider.

With AWS Bedrock, you can set up an API key and access various LLMs from Claude to GPT to Llama from one place. Instead of having multiple API keys and various accounts, you can route all of your Agentic traffic from your Agent to an LLM via Bedrock.

In this blog post, you'll learn how to set up an Agent via kagent to access Bedrock Models and use them to perform any action you'd like.

Prerequisites

To follow along with this blog post from a hands-on perspective, you should have the following:

A Kubernetes cluster.
Kagent installed, which you can find here.

Configuring Access To AWS

The first step is ensuring that you have proper access to AWS so you can use the Model that you'd like to implement within your Agent.

Create environment variables with your AWS access key, secret, and region. To retrieve an AWS access key and secret, you'll need to create them in AWS IAM.

export AWS_ACCESS_KEY_ID=<your-access-key-id>
export AWS_SECRET_ACCESS_KEY=<your-secret-access-key>
export AWS_REGION=us-east-1

Once you have access, you can run the command below, which lists the Claude inference profiles available in your region of choice.

aws bedrock list-inference-profiles --region us-east-1 \
  --query "inferenceProfileSummaries[?contains(inferenceProfileId, 'claude')].{id:inferenceProfileId,name:inferenceProfileName}" \
  --output table

Here's an example of the output you should see on your terminal.

-------------------------------------------------------------------------------------------
|                                  ListInferenceProfiles                                  |
+---------------------------------------------------+-------------------------------------+
|                        id                         |                name                 |
+---------------------------------------------------+-------------------------------------+
|  us.anthropic.claude-sonnet-4-20250514-v1:0       |  US Claude Sonnet 4                 |
|  global.anthropic.claude-sonnet-4-5-20250929-v1:0 |  Global Claude Sonnet 4.5           |
|  us.anthropic.claude-haiku-4-5-20251001-v1:0      |  US Anthropic Claude Haiku 4.5      |
|  global.anthropic.claude-haiku-4-5-20251001-v1:0  |  Global Anthropic Claude Haiku 4.5  |
|  us.anthropic.claude-opus-4-5-20251101-v1:0       |  US Anthropic Claude Opus 4.5       |
|  global.anthropic.claude-opus-4-5-20251101-v1:0   |  GLOBAL Anthropic Claude Opus 4.5   |
|  us.anthropic.claude-sonnet-4-5-20250929-v1:0     |  US Anthropic Claude Sonnet 4.5     |
+---------------------------------------------------+-------------------------------------+
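
The --query expression in the command above is a JMESPath filter plus a projection. If the syntax is unfamiliar, this sketch does the same thing in Python against a response-shaped dictionary: keep the summaries whose ID contains "claude", then keep only the ID and name. The sample data is trimmed from the table above, and the last entry is a made-up non-Claude profile added to show the filter working.

```python
def claude_profiles(response: dict) -> list:
    """Mimic the --query filter: keep IDs containing 'claude', project id and name."""
    return [
        {"id": s["inferenceProfileId"], "name": s["inferenceProfileName"]}
        for s in response.get("inferenceProfileSummaries", [])
        if "claude" in s["inferenceProfileId"]
    ]


# Trimmed sample; the final entry is hypothetical and exists only to be filtered out.
sample_response = {
    "inferenceProfileSummaries": [
        {"inferenceProfileId": "us.anthropic.claude-sonnet-4-20250514-v1:0",
         "inferenceProfileName": "US Claude Sonnet 4"},
        {"inferenceProfileId": "global.anthropic.claude-haiku-4-5-20251001-v1:0",
         "inferenceProfileName": "Global Anthropic Claude Haiku 4.5"},
        {"inferenceProfileId": "us.example.other-model-v1:0",
         "inferenceProfileName": "Hypothetical non-Claude profile"},
    ]
}

profiles = claude_profiles(sample_response)
for profile in profiles:
    print(profile["id"], "|", profile["name"])
```

Swapping 'claude' for another substring in the --query expression filters for a different Model family in the same way.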

Next, go into AWS Bedrock and generate an API key. Although you have access to your AWS account, there's a separate API key needed to access LLMs via AWS Bedrock.

Create an environment variable with the API key.

export BEDROCK_API_KEY=

With this configuration, you can now begin the Model and Agent setup so you can access LLMs via Bedrock through kagent.

Model And Agent Setup

The next phase is to create a Model Config, which is how the Agent knows what Model to access. In this case, the Model referenced in the Model Config will be an OpenAI GPT Model.

Create a Kubernetes secret that contains your AWS access key, secret, and Bedrock API key.

kubectl create secret generic kagent-bedrock-aws -n kagent \
  --from-literal=AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
  --from-literal=AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
  --from-literal=BEDROCK_API_KEY=$BEDROCK_API_KEY \
  --from-literal=AWS_SESSION_TOKEN=""

Implement a Model config that calls out to the openai.gpt-oss-20b-1:0 Model using your Bedrock API key secret. You'll also see the base URL which is the URL where the Model and provider exist via Bedrock.

kubectl apply -f - <<EOF
apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
  name: bedrock-model-config
  namespace: kagent
spec:
  apiKeySecret: kagent-bedrock-aws
  apiKeySecretKey: BEDROCK_API_KEY
  model: openai.gpt-oss-20b-1:0
  provider: OpenAI
  openAI:
    baseUrl: "https://bedrock-runtime.us-east-1.amazonaws.com/openai/v1"
EOF

Check that the Model was accepted.

kubectl get modelconfig bedrock-model-config -n kagent -o jsonpath='{.status.conditions}' | jq

You'll see an output similar to the below:

[
  {
    "lastTransitionTime": "...",
    "message": "",
    "reason": "ModelConfigReconciled",
    "status": "True",
    "type": "Accepted"
  }
]
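
In automation, you may want to gate the next step on this condition instead of reading the JSON by eye. The sketch below checks a conditions list for type Accepted with status "True", using the sample output above as input. The function name is_accepted is hypothetical, and the status value is compared as a string because that is how it appears in the output.

```python
import json


def is_accepted(conditions: list) -> bool:
    """True if the conditions list contains type=Accepted with status=True."""
    return any(
        c.get("type") == "Accepted" and c.get("status") == "True"
        for c in conditions
    )


# The sample output shown above, parsed from JSON.
sample = json.loads("""
[
  {
    "lastTransitionTime": "...",
    "message": "",
    "reason": "ModelConfigReconciled",
    "status": "True",
    "type": "Accepted"
  }
]
""")

print("accepted" if is_accepted(sample) else "not accepted yet")
```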

With the Model config set up, you can now create the Agent and test it.

Using Bedrock With Kagent

With kagent installed, you have access to various CRDs like the ModelConfig object you created in the previous section. Within the kagent CRDs, you also have access to the Agent object, which allows you to define everything from what Model Config to use to the prompt to MCP Server tools and Agent Skills.

Create a new Agent with the YAML below. It includes all of the secrets needed, a prompt, and a few MCP Server tools.

kubectl apply -f - <<EOF
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: bedrock-agent-test
  namespace: kagent
spec:
  description: Kubernetes troubleshooting agent powered by a Model served via Bedrock
  type: Declarative
  declarative:
    modelConfig: bedrock-model-config
    deployment:
      env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: kagent-bedrock-aws
              key: AWS_ACCESS_KEY_ID
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: kagent-bedrock-aws
              key: AWS_SECRET_ACCESS_KEY
        - name: AWS_SESSION_TOKEN
          valueFrom:
            secretKeyRef:
              name: kagent-bedrock-aws
              key: AWS_SESSION_TOKEN
    systemMessage: |
      You're a friendly and helpful agent that uses Kubernetes tools to help with troubleshooting and deployments.

      # Instructions
      - If user question is unclear, ask for clarification before running any tools
      - Always be helpful and friendly
      - If you don't know how to answer the question, respond with "Sorry, I don't know how to answer that"

      # Response format
      - ALWAYS format your response as Markdown
      - Include a summary of actions you took and an explanation of the result
    tools:
      - type: McpServer
        mcpServer:
          name: kagent-tool-server
          kind: RemoteMCPServer
          toolNames:
          - k8s_get_available_api_resources
          - k8s_get_cluster_configuration
          - k8s_get_events
          - k8s_get_pod_logs
          - k8s_get_resource_yaml
          - k8s_get_resources
          - k8s_check_service_connectivity
EOF

Wait until the Agent is up and operational.

kubectl get pods -n kagent --watch

Open the kagent dashboard.
Go to your new Agent.
Prompt it with something like What can you do?. You'll see an output similar to the one below.

Routing Observable and Secure Traffic Through Claude

Michael Levan — Sun, 18 Jan 2026 15:38:08 +0000

Observing and securing AI traffic in the enterprise should cover everything from servers and cloud environments to laptops, desktops, and mobile devices. This level of observability and security isn't "new"; the industry has had it for years with Mobile Device Management (MDM) software. With AI workloads, however, the concepts of properly observing and securing local systems seem to have been forgotten.

And we can't forget about AI traffic.

In this blog post, you'll learn how to route local AI traffic through agentgateway when tools like Claude Desktop are interacting with MCP Servers.

Prerequisites

To follow along from a hands-on perspective, you'll need the following:

A Kubernetes cluster.
Claude Desktop.
Agentgateway installed.

The Low-Hanging Fruit

Organizations, enterprises, teams, and engineers are working on consistent ways to implement Agentic infrastructure, whether that be on systems, domain-specific Agents, generic Agents, MCP, and everything in between. Today, this is typically happening at what we can call the "backend layer". The "backend layer" is made up of the cloud environments, servers running AI workloads, and networks.

However, there's one piece to the puzzle that seems to be overlooked - the "frontend layer". These are the user devices (laptops, desktops, mobile devices) within the organization that are being used at work.

In the engineering space, that typically falls into the LLM, Agents, or desktop software that engineers are using (Claude Code, Claude Desktop, Gemini CLI, etc.). With these "frontend layer" tools, it's open to all with zero observability or security. Now, the goal isn't to completely lock everything down to where no one can use AI, but there needs to be defense in depth, security practices, and perhaps most importantly, observability for all AI traffic even, and especially, when it's coming from a local machine.

Much like traffic from all systems (laptops, desktops, mobile devices) goes through the enterprise's internal networks (passing through a router, governed by firewall rules, and observed at the packet level), AI traffic needs to be looked at the same way.

Deploying An MCP Server

The first step in the journey is to give Claude Desktop "something" to route to. This could be another Agent, various Models, or an MCP Server for specific tool selection needs. This section will walk you through how to deploy an MCP Server on a Kubernetes cluster.

Deploy the following configuration which contains a configmap that has the MCP Server configuration, a Kubernetes Deployment, and a Kubernetes Service.

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: mcp-math-script
  namespace: default
data:
  server.py: |
    import uvicorn
    from mcp.server.fastmcp import FastMCP
    from starlette.applications import Starlette
    from starlette.routing import Route
    from starlette.requests import Request
    from starlette.responses import JSONResponse, Response

    mcp = FastMCP("Math-Service")

    @mcp.tool()
    def add(a: int, b: int) -> int:
        return a + b

    @mcp.tool()
    def multiply(a: int, b: int) -> int:
        return a * b

    async def handle_mcp(request: Request):
        try:
            data = await request.json()
            method = data.get("method")
            msg_id = data.get("id")
            result = None

            if method == "initialize":
                result = {
                    "protocolVersion": "2024-11-05",
                    "capabilities": {"tools": {}},
                    "serverInfo": {"name": "Math-Service", "version": "1.0"}
                }

            elif method == "notifications/initialized":
                # Notifications are fire-and-forget, return empty 202 response
                return Response(status_code=202)

            elif method == "tools/list":
                tools_list = await mcp.list_tools()
                result = {
                    "tools": [
                        {
                            "name": t.name,
                            "description": t.description,
                            "inputSchema": t.inputSchema
                        } for t in tools_list
                    ]
                }

            elif method == "tools/call":
                params = data.get("params", {})
                name = params.get("name")
                args = params.get("arguments", {})

                # Call the tool
                tool_result = await mcp.call_tool(name, args)

                # --- FIX: Serialize the content objects manually ---
                serialized_content = []
                for content in tool_result:
                    if hasattr(content, "type") and content.type == "text":
                        serialized_content.append({"type": "text", "text": content.text})
                    elif hasattr(content, "type") and content.type == "image":
                        serialized_content.append({
                            "type": "image",
                            "data": content.data,
                            "mimeType": content.mimeType
                        })
                    else:
                        # Fallback for dictionaries or other types
                        serialized_content.append(content if isinstance(content, dict) else str(content))

                result = {
                    "content": serialized_content,
                    "isError": False
                }

            elif method == "ping":
                result = {}

            else:
                return JSONResponse(
                    {"jsonrpc": "2.0", "id": msg_id, "error": {"code": -32601, "message": "Method not found"}},
                    status_code=404
                )

            return JSONResponse({"jsonrpc": "2.0", "id": msg_id, "result": result})

        except Exception as e:
            # Print error to logs for debugging
            import traceback
            traceback.print_exc()
            return JSONResponse(
                {"jsonrpc": "2.0", "id": None, "error": {"code": -32603, "message": str(e)}},
                status_code=500
            )

    app = Starlette(routes=[
        Route("/mcp", handle_mcp, methods=["POST"]),
        Route("/", lambda r: JSONResponse({"status": "ok"}), methods=["GET"])
    ])

    if __name__ == "__main__":
        print("Starting Fixed Math Server on port 8000...")
        uvicorn.run(app, host="0.0.0.0", port=8000)
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-math-server
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcp-math-server
  template:
    metadata:
      labels:
        app: mcp-math-server
    spec:
      containers:
      - name: math
        image: python:3.11-slim
        command: ["/bin/sh", "-c"]
        args:
        - |
          pip install "mcp[cli]" uvicorn starlette &&
          python /app/server.py
        ports:
        - containerPort: 8000
        volumeMounts:
        - name: script-volume
          mountPath: /app
        readinessProbe:
          httpGet:
            path: /
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: script-volume
        configMap:
          name: mcp-math-script
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-math-server
  namespace: default
spec:
  selector:
    app: mcp-math-server
  ports:
  - port: 80
    targetPort: 8000
EOF

The MCP Server should now be running in a Pod via the default Namespace with the mcp-math-server k8s Service sitting in front of the Pod.
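
The handler in server.py speaks JSON-RPC 2.0 over a POST to /mcp, and it only reads the method, the id, and (for tools/call) params.name and params.arguments. The sketch below builds the three messages a client would send in order. It is a minimal illustration of the message shapes this particular handler accepts, not a full MCP client: a spec-compliant client sends additional fields in initialize, and rpc_request is a hypothetical helper name.

```python
import itertools
import json
from typing import Optional

_request_ids = itertools.count(1)  # JSON-RPC requests need a unique id per call


def rpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request body for the /mcp endpoint."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


initialize = rpc_request("initialize")
list_tools = rpc_request("tools/list")
call_add = rpc_request("tools/call", {"name": "add", "arguments": {"a": 2, "b": 3}})

for message in (initialize, list_tools, call_add):
    print(json.dumps(message))
```

Each printed line is a body you could POST to the Service (or to the Gateway's /mcp path later on) with a Content-Type of application/json.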

Configuring A Gateway

With the MCP Server deployed, you need a way to pass traffic through to it. When Agents communicate with other Agents, MCP Servers, or LLMs, there's a "middle layer", which is how the Agent gets from point A (itself) to point B (the MCP Server in this case). That "middle layer" is where the packets flow, and it is the Gateway.

If you aren't running on a Kubernetes cluster that has the ability to create a public ALB with an IP address that's accessible externally, you can use something like MetalLB or port-forward the Gateway in your terminal.

Create a new Gateway, which will use the enterprise-agentgateway Gateway Class. It will be listening on port 8080 and allow routes from the same Namespace as where the Gateway is deployed (agentgateway-system).

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: agentgateway-mcp
  namespace: agentgateway-system
spec:
  gatewayClassName: enterprise-agentgateway
  listeners:
  - name: http
    port: 8080
    protocol: HTTP
    allowedRoutes:
      namespaces:
        from: Same
EOF

Implement an agentgateway backend, which is what tells the Gateway what to route to. In this case, it's the MCP Server that you deployed in the previous section.

kubectl apply -f - <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayBackend
metadata:
  name: demo-mcp-server
  namespace: agentgateway-system
spec:
  mcp:
    targets:
      - name: demo-mcp-server
        static:
          host: mcp-math-server.default.svc.cluster.local
          port: 80
          path: /mcp
          protocol: StreamableHTTP
EOF
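
The host value in the backend follows the Kubernetes Service DNS convention of service name, Namespace, then the cluster domain. As a minimal sketch (the helper name service_fqdn is hypothetical, and cluster.local is assumed to be your cluster domain, which is the common default), the name can be derived like this:

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Return the in-cluster DNS name for a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


# The Service created earlier lives in the "default" Namespace.
backend_host = service_fqdn("mcp-math-server", "default")
print(backend_host)
```

Note that the backend points at port 80, which is the Service port, and not at 8000, which is the container port the Service forwards to.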

Create an HTTP route so there's a path for the Gateway to route to. In this case, the "path" is the MCP Server via the agentgateway backend.

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: mcp-route
  namespace: agentgateway-system
spec:
  parentRefs:
  - name: agentgateway-mcp
  rules:
  - backendRefs:
    - name: demo-mcp-server
      namespace: agentgateway-system
      group: agentgateway.dev
      kind: AgentgatewayBackend
EOF
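
The HTTPRoute is created in agentgateway-system because the Gateway listener sets allowedRoutes.namespaces.from to Same. The sketch below is a simplified model of that rule, not the controller's actual implementation. The function name route_can_attach is hypothetical, and only the All and Same values are modeled.

```python
def route_can_attach(gateway_namespace: str, route_namespace: str, allowed_from: str = "Same") -> bool:
    """Simplified model of allowedRoutes.namespaces.from for "All" and "Same"."""
    if allowed_from == "All":
        return True
    if allowed_from == "Same":
        return gateway_namespace == route_namespace
    # "Selector" depends on Namespace labels, which this sketch does not model.
    raise ValueError(f"unsupported value: {allowed_from}")


# A route in the Gateway's own Namespace attaches; one in "default" does not.
print(route_can_attach("agentgateway-system", "agentgateway-system"))
print(route_can_attach("agentgateway-system", "default"))
```

If you would rather keep routes next to the workloads they serve, the listener would need a different from value.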

Retrieve the IP address of the Gateway. If an external one doesn't exist, you can port-forward the Gateway service.

export GATEWAY_IP=$(kubectl get svc agentgateway-mcp -n agentgateway-system -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $GATEWAY_IP

Open MCP Inspector to test the traffic to the MCP Server.

npx modelcontextprotocol/inspector#0.16.2

Add the following URL into MCP Inspector. If you're port forwarding the Gateway service, use localhost instead of an IP address.

http://YOUR_ALB_LB_IP:8080/mcp

If you list the tools, you should see the add and multiply tools.
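
Under the hood, listing tools is a JSON-RPC 2.0 call to the MCP tools/list method. The sketch below only builds the URL and request body and parses a sample response; it makes no network call. The helper names and the trimmed sample response are assumptions for illustration, and a real client (such as MCP Inspector) also performs the MCP initialize handshake first.

```python
import json


def mcp_url(gateway_ip: str = "", port: int = 8080, path: str = "/mcp") -> str:
    """Build the gateway URL, using localhost when no external IP is set."""
    host = gateway_ip.strip() or "localhost"
    return f"http://{host}:{port}{path}"


def tools_list_request(request_id: int = 1) -> dict:
    """JSON-RPC 2.0 body for the MCP "tools/list" method."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}


def tool_names(response: dict) -> list:
    """Extract tool names from a "tools/list" result."""
    return [tool["name"] for tool in response.get("result", {}).get("tools", [])]


# Hypothetical response, trimmed to the fields used here.
sample_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"tools": [{"name": "add"}, {"name": "multiply"}]},
}

print(mcp_url())  # no external IP given, so the port-forwarded address is used
print(json.dumps(tools_list_request()))
print(tool_names(sample_response))
```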

Configure Claude Desktop With An MCP Server

The last step is to configure Claude Desktop to route through/use the AI gateway (agentgateway) that you deployed in the previous section. This will ensure that the traffic flowing from Claude Desktop to the MCP Server is observable, has the ability to be secured, and is going through a properly built Gateway designed specifically for AI workloads.

Create a new file called claude_desktop_config.json in the Claude Desktop configuration directory (the following example uses the macOS path).

mkdir -p ~/Library/Application\ Support/Claude
cat > ~/Library/Application\ Support/Claude/claude_desktop_config.json << 'EOF'
{
  "mcpServers": {
    "math-service": {
      "command": "npx",
      "args": ["-y", "supergateway", "--streamableHttp", "http://YOUR_ALB_LB_IP:8080/mcp"]
    }
  }
}
EOF
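
If you switch between a load balancer IP and a port-forwarded localhost address, it can be easier to generate this JSON than to hand-edit it. The following is a minimal sketch; the function name claude_config is hypothetical, and it only prints the JSON rather than writing to the Claude Desktop directory.

```python
import json


def claude_config(gateway_url: str, server_name: str = "math-service") -> dict:
    """Build the mcpServers entry used in claude_desktop_config.json."""
    return {
        "mcpServers": {
            server_name: {
                "command": "npx",
                "args": ["-y", "supergateway", "--streamableHttp", gateway_url],
            }
        }
    }


# Assumes the Gateway is port-forwarded locally; swap in your own address.
config = claude_config("http://localhost:8080/mcp")
print(json.dumps(config, indent=2))
```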

After saving the config, restart Claude Desktop for the changes to take effect. If you don't see any errors when opening Claude Desktop, the configuration you added worked as expected.

With Claude Desktop open, ask it a simple question like "What is 2 + 2?"

Traffic from Claude Desktop is now routing through agentgateway!

Running Any AI Agent on Kubernetes: Step-by-Step

Michael Levan — Sat, 13 Dec 2025 23:35:58 +0000

There are many Agent creation frameworks, ranging from CrewAI to kagent to LangChain and several others, which are typically written in Python or JS. If you're an engineer working on Kubernetes, you may be thinking "What about a declarative Agent deployment method?"

In this blog post, you'll see how to create your own Agent in an Agent framework and then deploy it to kagent in a declarative fashion.

Prerequisites

To follow along with this blog post, you should have the following:

A Kubernetes cluster deployed with kagent installed. If you've never installed kagent, you can find the how-to here.
Python 3.10 or above installed.
Docker Desktop (or just the Docker engine) installed to build the container image.

What Are BYO Agents

BYO (Bring Your Own) means you can create an Agent in any of the providers that kagent supports. You can also create your Agent entirely in kagent and connect it to MCP servers there, but if you're already used to writing your Agents in Python using CrewAI, ADK, LangChain, or any other framework, kagent gives you the ability to import those Agents. The only thing you need to do is containerize the Agent, which is straightforward with a Dockerfile (you'll see an example in the section on creating Agents).

Building An Agent

Now that you know what BYO Agents are, it's time to create an Agent and see it run within Kubernetes. The next two sections walk you through building a custom Agent with Agent Development Kit (ADK), which is an Agent creation framework, and then using an existing Agent to see how one that's ready for Kubernetes gets deployed.

Creating An Agent

Install the Google ADK library. Depending on where you're running the below, you may need to use pip3 instead of pip.

pip install google-adk

With the adk command, use the create subcommand to create a scaffolding for an ADK Agent in Python.

adk create NAME_OF_YOUR_AGENT

You should see an output similar to the one below (with the name of your Agent).

The scaffolding gives you an Agent template, so you can cd into the directory and use the run subcommand to see it in action.

cd adk/NAME_OF_YOUR_AGENT && adk run NAME_OF_YOUR_AGENT

Using An Existing Agent

To make life a bit easier, instead of building out everything needed to containerize the Agent, you can use one that I already built and tested. If you're wondering "Well, why did I build an Agent then?", it's so that you can use the example in this section as a reference to containerize and run your own Agent afterward.

Clone the agentic-demo-code repo and cd into the adk/troubleshoot-agent directory.
Open the Dockerfile and you should see the file contents below.

### STAGE 1: base image
ARG DOCKER_REGISTRY=ghcr.io
ARG VERSION=0.7.4
FROM $DOCKER_REGISTRY/kagent-dev/kagent/kagent-adk:$VERSION

WORKDIR /app

COPY troubleshootagent/ troubleshootagent/
COPY pyproject.toml pyproject.toml
COPY uv.lock uv.lock
COPY how-it-works.md how-it-works.md

RUN uv sync --locked --refresh

CMD ["troubleshootagent"]

Run the following command to build the container image.

docker build . -t troubleshootagent:latest

If you see an error about a "uv sync", run the following command to create a lock file for library versions and dependencies.

uv lock

You should see that the image was fully built.

With the Agent container image built locally, you'll need to push it to a container registry of your choosing. Docker Hub is free, so you can use that if you'd prefer. Below is an example that uses my Docker Hub org.

docker tag troubleshootagent:latest adminturneddevops/troubleshootagent:latest

docker push adminturneddevops/troubleshootagent:latest

If you don't want to push the container image to your own container registry, you can use adminturneddevops/troubleshootagent:latest in the next section since that container image is public.

Deploying An Agent On Kubernetes

With the Agent fully built, it's time to deploy it on Kubernetes using the kagent framework. This will give you a declarative method of running Agents in a mature orchestration platform like Kubernetes.

For the Agent to work, it'll connect to an LLM. You need authentication/API access to an LLM of your choosing. In this scenario, Google Gemini is used, but you can swap it for any AI Provider you'd like to use.

Use an environment variable to export the API key.

export GOOGLE_API_KEY=

Create a Kubernetes Secret with the API key.

kubectl apply -f- <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: kagent-google
  namespace: kagent
type: Opaque
stringData:
  GOOGLE_API_KEY: $GOOGLE_API_KEY
EOF
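
The manifest uses stringData, which accepts the plain-text key; Kubernetes stores it base64-encoded under data. The sketch below shows that encoding under the assumption of a placeholder value (example-key is not a real credential, and to_secret_data is a hypothetical helper name).

```python
import base64


def to_secret_data(string_data: dict) -> dict:
    """Base64-encode plain-text values, mirroring how stringData ends up in data."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_data.items()
    }


# "example-key" is a placeholder, not a real credential.
encoded = to_secret_data({"GOOGLE_API_KEY": "example-key"})
print(encoded)

# Decoding reverses it, which is why base64 alone is not encryption.
decoded = base64.b64decode(encoded["GOOGLE_API_KEY"]).decode("utf-8")
print(decoded)
```

Because the value is only encoded, access to the Secret should still be restricted with RBAC.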

Use the Agent object via the kagent CRDs to add the Agent to kagent.

kubectl apply -f - <<EOF
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
  name: troubleshoot-agent
  namespace: kagent
spec:
  description: This agent is used to be a Platform Engineering troubleshoot expert.
  type: BYO
  byo:
    deployment:
      image: adminturneddevops/troubleshootagent:latest
      env:
        - name: GOOGLE_API_KEY
          valueFrom:
            secretKeyRef:
              name: kagent-google
              key: GOOGLE_API_KEY
EOF
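
If you plan to deploy several BYO Agents, the manifest can be templated. The following sketch mirrors only the fields shown in the YAML above and assumes nothing else about the kagent schema. The function name byo_agent_manifest and the Agent name my-byo-agent are hypothetical.

```python
import json


def byo_agent_manifest(name, image, secret_name, secret_key,
                       namespace="kagent", description=""):
    """Build a BYO Agent manifest with the same fields as the YAML in this section."""
    return {
        "apiVersion": "kagent.dev/v1alpha2",
        "kind": "Agent",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "description": description,
            "type": "BYO",
            "byo": {
                "deployment": {
                    "image": image,
                    "env": [
                        {
                            "name": secret_key,
                            "valueFrom": {
                                "secretKeyRef": {"name": secret_name, "key": secret_key}
                            },
                        }
                    ],
                }
            },
        },
    }


manifest = byo_agent_manifest(
    name="my-byo-agent",  # hypothetical Agent name
    image="adminturneddevops/troubleshootagent:latest",
    secret_name="kagent-google",
    secret_key="GOOGLE_API_KEY",
)
print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON as well as YAML, the printed output can be piped into kubectl apply -f - the same way as the heredoc.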

Confirm that the Agent is running by looking at the Pod in the kagent Namespace.

kubectl get pods -n kagent
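
To check readiness from a script instead of reading the table by eye, you can parse the JSON form of the same command. This is a minimal sketch: the function name ready_pods and the Pod names in the sample are hypothetical, and the sample is a heavily trimmed stand-in for real output.

```python
import json


def ready_pods(pod_list: dict) -> list:
    """Return the names of Pods whose Ready condition is True."""
    names = []
    for pod in pod_list.get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions):
            names.append(pod["metadata"]["name"])
    return names


# Hypothetical, trimmed output of: kubectl get pods -n kagent -o json
sample = json.loads("""
{"items": [
  {"metadata": {"name": "agent-a"},
   "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
  {"metadata": {"name": "agent-b"},
   "status": {"conditions": [{"type": "Ready", "status": "False"}]}}
]}
""")
print(ready_pods(sample))
```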

You can now begin using the Agent in kagent.