TL;DR: Kubernetes schedules LLM workloads well, but it does not give them the isolation boundary they need once they start calling tools, executing code, or handling tenant data.
Open Source Summit North America made one thing obvious: the cloud native crowd has moved from "can Kubernetes run LLM workloads?" to "what breaks when we trust Kubernetes too much?"
That is the right question.
The default Kubernetes security model assumes a pod is mostly an application packaging unit. It gives you namespaces, cgroups, seccomp, AppArmor, service accounts, admission control, and network policy. All of that matters. None of it changes the central fact that normal containers share the host kernel.
For a stateless API, that tradeoff is usually fine. For an LLM tool runner that can read files, call APIs, invoke Python, shell out to package managers, and chain actions across systems, that boundary starts looking thin.
The uncomfortable version is this: vanilla Kubernetes is orchestration, not containment.
The Problem
LLM inference by itself is not the scary part. A model server that receives a prompt and returns tokens is mostly a specialized API service with GPU scheduling problems.
The risk changes when the workload gains agency:
```text
Prompt input
  -> retrieval
  -> tool selection
  -> code execution
  -> network call
  -> file write
  -> another tool call
```
At that point, the workload is no longer just serving traffic. It is interpreting untrusted text and turning it into actions.
That is why the recent CNCF security conversation around AI sandboxing matters. Kubernetes can restart a failed pod, route around a bad node, and roll a deployment. It cannot understand whether a prompt is trying to turn a tool into an escape path. It also cannot turn a shared kernel into a hard tenant boundary.
What I Tried First
My first instinct was the usual Kubernetes hardening stack:
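```yaml
apiVersion: v1
kind: Pod
metadata:
  name: llm-worker
spec:
  # Pod-level defaults: no root user, default seccomp filter
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: worker
      image: ghcr.io/example/llm-worker:latest
      # Container-level hardening: no privilege escalation,
      # immutable root filesystem, no Linux capabilities
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```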
That should still be the baseline. The mistake is treating it as the finish line.
Pod Security Standards reduce obvious footguns. NetworkPolicy controls blast radius. RBAC prevents a compromised workload from casually listing secrets or mutating the cluster. Admission policies keep the platform honest.
But an LLM agent running untrusted code is not just a badly configured web pod. It is closer to a multi-tenant execution service. That needs a runtime boundary, not only a YAML checklist.
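For the Pod Security Standards part of that baseline, a minimal sketch is to label the namespace so the built-in Pod Security Admission controller enforces the `restricted` profile. The namespace name `llm-agents` is an assumption for illustration; the label keys are the standard ones.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: llm-agents   # hypothetical namespace name
  labels:
    # Reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    # Also record violations in audit logs and surface them as client warnings
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

The hardened pod shown earlier should pass this profile. It still shares the host kernel, which is the gap the next section addresses.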
The Runtime Choice
The Kubernetes primitive that makes this manageable is RuntimeClass.
Instead of creating one magical "secure cluster," you route workloads by risk:
```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: gvisor
handler: runsc
---
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
```
Then each workload declares the boundary it needs:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tool-using-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: tool-using-agent
  template:
    metadata:
      labels:
        app: tool-using-agent
    spec:
      runtimeClassName: kata
      serviceAccountName: llm-agent
      containers:
        - name: agent
          image: ghcr.io/example/tool-agent:2026.05
```
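Declaring the runtime in the pod spec only helps if someone cannot forget it. One way to enforce it, sketched here under assumptions, is a ValidatingAdmissionPolicy (the `admissionregistration.k8s.io/v1` API, available in recent Kubernetes releases) that rejects pods without a sandboxed runtime. The policy name and the `workload-risk: untrusted` namespace label are hypothetical; the allowed runtime names match the RuntimeClass objects defined above.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-sandboxed-runtime   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
  validations:
    # CEL expression: the pod must name one of the sandboxed runtimes
    - expression: >-
        has(object.spec.runtimeClassName) &&
        object.spec.runtimeClassName in ['gvisor', 'kata']
      message: "pods in this namespace must set runtimeClassName to gvisor or kata"
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-sandboxed-runtime
spec:
  policyName: require-sandboxed-runtime
  validationActions: ["Deny"]
  matchResources:
    namespaceSelector:
      matchLabels:
        workload-risk: untrusted   # hypothetical label; apply it to agent namespaces
```

Scoping the binding by namespace label keeps plain inference namespaces on the default runtime while agent namespaces cannot opt out.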
My rule of thumb:
| Workload | Runtime | Why |
|---|---|---|
| Plain inference API | runc or gvisor | Low tool risk, latency sensitive |
| Retrieval worker with narrow egress | gvisor | Better syscall boundary with less operational change |
| Agent that calls tools | kata | VM boundary per pod, Kubernetes friendly |
| Arbitrary code execution | Firecracker-style microVM | Treat it like untrusted tenant compute |
gVisor is the easiest first step because it integrates as an OCI runtime through runsc. Kata is the better fit when the isolation requirement is stronger and a VM per pod is acceptable. Firecracker is the most interesting boundary for code execution, but it is also the one I would least casually bolt onto an existing cluster without a real operations plan.
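If only some nodes have the Kata runtime installed, the RuntimeClass itself can carry the scheduling constraint and the per-pod overhead, so individual workloads do not have to. This is a sketch of a fuller variant of the `kata` RuntimeClass above; the node label, taint key, and overhead figures are placeholders, not measured values.

```yaml
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata
handler: kata
# Resources the scheduler adds on top of container requests for each pod
overhead:
  podFixed:
    memory: "160Mi"   # placeholder; measure on your own nodes
    cpu: "250m"       # placeholder
# Merged into every pod that uses this RuntimeClass
scheduling:
  nodeSelector:
    sandbox.example.com/kata: "true"   # hypothetical node label
  tolerations:
    - key: sandbox.example.com/kata    # hypothetical taint key
      operator: Exists
      effect: NoSchedule
```

Tainting the sandbox nodes and tolerating the taint only through the RuntimeClass keeps ordinary pods off the more expensive VM-capable hosts.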
The Minimum Policy Set
The runtime is only one layer. I would not run LLM workloads without this set:
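```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-worker-egress
spec:
  podSelector:
    matchLabels:
      app: tool-using-agent
  policyTypes: ["Egress"]
  egress:
    # Allow HTTPS to the model gateway namespace only
    - to:
        - namespaceSelector:
            matchLabels:
              # kubernetes.io/metadata.name is set automatically on every
              # namespace, so no manual "name" label is required
              kubernetes.io/metadata.name: model-gateway
      ports:
        - protocol: TCP
          port: 443
    # Allow OTLP gRPC to the telemetry namespace only
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: telemetry
      ports:
        - protocol: TCP
          port: 4317
```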
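One thing to watch: an egress allowlist this narrow also blocks DNS, so the agent may not be able to resolve the gateway or telemetry service names. NetworkPolicies are additive, so a second policy can allow lookups without loosening the first. This sketch assumes cluster DNS runs in `kube-system` with the common `k8s-app: kube-dns` label; check your distribution before relying on it.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: llm-worker-dns   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: tool-using-agent
  policyTypes: ["Egress"]
  egress:
    - to:
        # namespaceSelector and podSelector in one entry means both must match
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns   # assumption: label used by your cluster DNS pods
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```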
Also make the service account boring:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: llm-agent
automountServiceAccountToken: false
```
If the workload does not need Kubernetes API access, do not mount a token. If it does, bind only the exact verbs it needs.
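As a sketch of what "only the exact verbs" can look like, suppose the agent needs to read a single ConfigMap and nothing else. The namespace `llm-agents`, the ConfigMap name `agent-config`, and the Role name are assumptions for illustration.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: llm-agent-config-reader   # hypothetical name
  namespace: llm-agents           # hypothetical namespace
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["agent-config"]   # hypothetical; limits access to one object
    verbs: ["get"]                    # no list, watch, or write
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: llm-agent-config-reader
  namespace: llm-agents
subjects:
  - kind: ServiceAccount
    name: llm-agent
    namespace: llm-agents
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: llm-agent-config-reader
```

Because the service account disables token automounting, a pod that genuinely needs this access would also have to set `automountServiceAccountToken: true` in its own spec, which makes the exception explicit and reviewable.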
Benchmark Plan
I am not going to fake GPU numbers from a laptop. These measurements need a real GPU node before I publish any final performance claims.
This is the results matrix I would fill in:
| Runtime | Cold start p50 | Cold start p95 | Tokens per second | RSS overhead | Notes |
|---|---|---|---|---|---|
| runc | TODO | TODO | TODO | TODO | baseline |
| gVisor | TODO | TODO | TODO | TODO | syscall boundary |
| Kata | TODO | TODO | TODO | TODO | VM per pod |
| Firecracker | TODO | TODO | TODO | TODO | strongest code runner candidate |
The important part is measuring the right things. Startup time matters for bursty agents. Throughput matters for inference. RSS overhead matters because GPU nodes are already expensive. Operational failure modes matter more than all three.
The Takeaway
If you are running a normal model server, Kubernetes plus standard hardening may be enough.
If you are running tool-using agents, code execution, tenant prompts, or workloads with broad egress, plain pods are the wrong abstraction. Use Kubernetes for scheduling. Use sandboxed runtimes for containment. Keep policy enforcement outside the model path where possible.
Kubernetes is still the control plane. It just should not be the only security boundary.
References
- CNCF: https://www.cncf.io/blog/2026/04/30/ai-sandboxing-is-having-its-kubernetes-moment/
- Kubernetes Agent Sandbox: https://kubernetes.io/blog/2026/03/20/running-agents-on-kubernetes-with-agent-sandbox/
- llm-d joins CNCF: https://www.cncf.io/blog/2026/03/24/welcome-llm-d-to-the-cncf-evolving-kubernetes-into-sota-ai-infrastructure/
- gVisor: https://github.com/google/gvisor
- Kata Containers: https://katacontainers.io/
- Firecracker containerd: https://github.com/firecracker-microvm/firecracker-containerd


