We taught agents to act on their own. Now someone has to watch the door.

#agents #ai #kubernetes #security

I work at Tigera, so take this with the appropriate grain of salt. But I've spent enough time around Kubernetes security to know when something is genuinely a new problem versus an old problem wearing a new hat. AI agents are a new problem.

This week we shipped Lynx. I want to talk about why it exists, because the "why" is more interesting than the feature list, and the why is something I've watched play out in real clusters.

The thing nobody wants to say out loud

Most of our security tooling assumes workloads are predictable. A service does roughly the same thing every time you call it. You can reason about it, write a policy for it, and trust that the policy holds because the behavior holds.

AI agents break that assumption. The same agent, given the same task twice, can take two different paths. It reads untrusted input, decides what to do next, calls a tool, calls another agent, talks to an LLM, and you find out what happened after it happened. The credential it used was valid. That tells you the door opened. It tells you nothing about what walked through.

That gap is the whole problem. We've gotten very good at handing out keys and very bad at knowing what anyone does once they're inside.

What usually happens instead

Here's the part that made this real for me. Most teams already have agents running. They just don't know how many.

Someone on the data team wires up an agent to summarize tickets. Someone in platform builds one to triage alerts. A few of them call out to OpenAI or Anthropic directly, with a key pasted into an env var, on a pod nobody registered anywhere. Security finds out when the bill arrives or when something breaks.

When we point Lynx's discovery at a cluster, the first scan almost always turns up agents the platform team didn't know existed. It's a little uncomfortable and very useful.

How Lynx actually thinks about this

The core idea is simple to say and hard to build: put one control point in the path of every agent interaction, and require nothing from the agent's code to do it.

That last bit matters more than it sounds. If governance depends on developers importing your library and using it correctly, you don't have governance. You have a suggestion. Lynx works at the platform level instead, so it applies whether or not the person who wrote the agent cooperated.

A few of the design choices I think are worth calling out:

It runs on Kubernetes primitives. The whole data model is a handful of custom resources stored in the API server. No bolt-on database to operate. Telemetry goes to a ClickHouse you bring yourself, so your data stays where you want it.

It doesn't issue agents their own long-lived keys. The gateway holds the upstream credential. When agent A needs to call agent B, the gateway mints a fresh token scoped to that single hop, good for a few minutes. If it leaks, it's worthless almost immediately and useless anywhere except that one path. Compare that to the env-var API key that works forever, everywhere, for anyone who copies it.
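To make the hop-scoped idea concrete, here is a minimal sketch in Python. It is not Lynx's implementation or token format. The key, the function names, and the claim fields are all made up for illustration. It only shows the property that matters: a token bound to one caller, one callee, and a short expiry.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical signing key that only the gateway holds; agents never see it.
GATEWAY_KEY = b"demo-only-secret"


def _sign(payload: bytes) -> str:
    return hmac.new(GATEWAY_KEY, payload, hashlib.sha256).hexdigest()


def mint_hop_token(caller: str, callee: str, ttl_seconds: int = 300,
                   now: Optional[float] = None) -> str:
    """Mint a token that is valid only for caller -> callee, for ttl_seconds."""
    issued = time.time() if now is None else now
    claims = {"sub": caller, "aud": callee, "exp": issued + ttl_seconds}
    payload = json.dumps(claims, sort_keys=True).encode()
    return base64.urlsafe_b64encode(payload).decode() + "." + _sign(payload)


def verify_hop_token(token: str, caller: str, callee: str,
                     now: Optional[float] = None) -> bool:
    """Accept the token only on the exact hop it was minted for, before expiry."""
    current = time.time() if now is None else now
    try:
        encoded, signature = token.split(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:
        return False  # malformed token
    if not hmac.compare_digest(signature, _sign(payload)):
        return False  # tampered with, or signed by someone else
    claims = json.loads(payload)
    return (claims["sub"] == caller
            and claims["aud"] == callee
            and current < claims["exp"])
```

A copied token fails on any other hop and fails everywhere once the few minutes are up, which is exactly what the env-var key can't offer.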

Policy is default-deny and written in Cedar, the same language whether you're authorizing a request or constraining what an agent can do at the syscall level. One mental model for two layers usually beats two clever models you have to hold in your head at once.
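If "default-deny" feels abstract, this small Python sketch models the decision rule Cedar uses: nothing is allowed unless a permit matches, and a matching forbid always wins. It is not Cedar syntax and not how Lynx evaluates policy. The Rule type and the agent, action, and resource names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rule:
    effect: str      # "permit" or "forbid"
    principal: str   # who is asking, e.g. an agent name
    action: str      # what it wants to do
    resource: str    # what it wants to do it to

    def matches(self, principal: str, action: str, resource: str) -> bool:
        return (self.principal, self.action, self.resource) == (principal, action, resource)


def is_allowed(rules: Iterable[Rule], principal: str, action: str, resource: str) -> bool:
    """Default-deny: allow only if some permit matches and no forbid matches."""
    matching = [r for r in rules if r.matches(principal, action, resource)]
    if any(r.effect == "forbid" for r in matching):
        return False  # an explicit forbid overrides any permit
    return any(r.effect == "permit" for r in matching)  # no match at all means deny


# Hypothetical rules for illustration only.
rules = [
    Rule("permit", "triage-agent", "call_tool", "pager"),
    Rule("permit", "triage-agent", "call_tool", "billing-db"),
    Rule("forbid", "triage-agent", "call_tool", "billing-db"),
]
```

With these rules the triage agent can reach the pager, cannot reach the billing database despite the permit, and any agent nobody wrote a rule for is denied without anyone having to remember to block it.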

And it watches the kernel. The discovery and anomaly detection run on eBPF, down where the TLS handshake happens. An agent that skips the gateway entirely still shows up, because it can't make a network call without the kernel seeing it. That's how shadow agents stop being invisible.
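The eBPF part is well beyond a blog snippet, but the logic that sits on top of it is easy to picture. Assume the kernel-level probe hands you a stream of (workload, destination host) pairs. That shape is my assumption for this sketch, not Lynx's actual data model, and the host list is only an illustrative pair of well-known LLM endpoints. A shadow agent is then any workload seen talking to an LLM endpoint that nobody registered.

```python
from typing import Dict, Iterable, Set, Tuple

# Illustrative list only; a real deployment would track many more endpoints.
LLM_HOSTS = {"api.openai.com", "api.anthropic.com"}


def find_shadow_agents(observed: Iterable[Tuple[str, str]],
                       registered: Set[str]) -> Dict[str, Set[str]]:
    """Return unregistered workloads mapped to the LLM hosts they contacted.

    observed:   (workload, destination_host) pairs, assumed to come from
                kernel-level network observation.
    registered: names of the agents the platform team knows about.
    """
    shadows: Dict[str, Set[str]] = {}
    for workload, host in observed:
        if host in LLM_HOSTS and workload not in registered:
            shadows.setdefault(workload, set()).add(host)
    return shadows


# Hypothetical observations for illustration only.
observed = [
    ("triage-agent", "api.anthropic.com"),
    ("ticket-summarizer", "api.openai.com"),
    ("ticket-summarizer", "api.anthropic.com"),
    ("checkout-service", "payments.internal"),
]
shadows = find_shadow_agents(observed, registered={"triage-agent"})
```

The point of doing the observation in the kernel is that the input to this comparison doesn't depend on the agent announcing itself.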

The line I keep coming back to

There's a phrase from the launch that stuck with me: the difference between hoping an agent behaves after it gets a valid token, and knowing what it did.

That's the actual shift. Identity and access control answer "should this be allowed to start." They were never built to answer "what did it do, on whose behalf, and which policy let it." For deterministic software you could mostly skip the second question. For agents you can't, because the answer changes run to run.

Who this is really for

If you're on a platform or AI team, the pitch is that you can keep shipping agents fast without flying blind, and you get the audit trail handed to you instead of building it later under pressure.

If you're the person who has to sign off on the risk, it's the first time you can see all of it: the registered agents, the shadow ones, who they talk to, and what they're allowed to touch. It's already running in production at some large banks, which are not exactly known for relaxed risk appetites.

What I actually think

I'm wary of "AI-native security" as a phrase. A lot of it is old products with new copy. This one is different in a way I can defend: the problem really is new, because non-deterministic workloads really do break the assumptions our existing tools were built on. You can't audit your way out of it with a Cloud Access Security Broker (CASB) designed for SaaS logins.

I don't think agents are going to slow down to wait for governance to catch up. So the honest question isn't whether to put something in their path. It's whether you'd rather do it now, on your terms, or after the first incident, on someone else's.

If you want the deeper technical version, Peter Kelly's walkthrough of how Lynx works under the hood is worth your time, and Ratan Tipirneni's post covers the reasoning behind building it at all.

Disclosure: I work at Tigera. These are my own views, not an official company position.