<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tigera Inc</title>
    <description>The latest articles on DEV Community by Tigera Inc (tigeraio).</description>
    <link>https://dev.to/tigeraio</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F12572%2Fe692e88e-7a1e-49d5-870b-930d459570c0.png</url>
      <title>DEV Community: Tigera Inc</title>
      <link>https://dev.to/tigeraio</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tigeraio"/>
    <language>en</language>
    <item>
      <title>Introducing Tigera Lynx</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Fri, 19 Jun 2026 17:45:47 +0000</pubDate>
      <link>https://dev.to/tigeraio/introducing-tigera-lynx-1pk</link>
      <guid>https://dev.to/tigeraio/introducing-tigera-lynx-1pk</guid>
      <description>&lt;p&gt;Today we're announcing the general availability of Tigera Lynx, a unified control plane for Kubernetes-native AI agents.&lt;/p&gt;

&lt;p&gt;Lynx gives enterprises a single place to find every agent in their Kubernetes estate, tighten posture, assign a sandbox, give each agent a cryptographic identity, enforce policy on every action it takes, audit what agents actually do, and detect anomalous behavior — without changing a line of agent code.&lt;/p&gt;

&lt;p&gt;It sits in the path of every agent call (agent-to-agent, agent-to-tool, and agent-to-LLM) to authenticate, authorize, mediate, and audit each one. It plugs into the tools you already run, including your identity provider (Entra ID, Okta) or SPIFFE/SPIRE and your existing observability systems, and is built on open standards rather than proprietary lock-in.&lt;/p&gt;

&lt;p&gt;Built on a decade of deep Kubernetes network security experience, Lynx is generally available today 👉 &lt;a href="https://www.tigera.io/tigera-products/lynx/" rel="noopener noreferrer"&gt;https://www.tigera.io/tigera-products/lynx/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aisecurity</category>
      <category>agenticai</category>
      <category>kubernetes</category>
      <category>security</category>
    </item>
    <item>
      <title>How Lynx Works: A Technical Walkthrough</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Thu, 18 Jun 2026 17:19:30 +0000</pubDate>
      <link>https://dev.to/tigeraio/how-lynx-works-a-technical-walkthrough-akp</link>
      <guid>https://dev.to/tigeraio/how-lynx-works-a-technical-walkthrough-akp</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkrfsenp43wdi8yfdkams.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fkrfsenp43wdi8yfdkams.png" alt="Tigera Lynx" width="800" height="500"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We launched Lynx this week. Instead of restating the pitch, I want to explain how it’s built and why we made the architectural choices we did. If you run Kubernetes and you’re starting to put AI agents on it, this is roughly the system you’d end up designing yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lynx is a control and data plane for all agentic AI traffic, providing a registry, gateway, audit, authentication with token exchange, policy enforcement, agent sandboxing, shadow agent discovery, and advanced AI capabilities such as a red team agent and a guardian supervising agent to keep your agents on track. Lynx is a single control point in the path of every agent call&lt;/strong&gt; – agent-to-agent, agent-to-MCP, agent-to-LLM. Every call is authenticated, authorized against policy, and recorded, with no changes to agent code.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvxz5xkcwuppqgyewhze9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvxz5xkcwuppqgyewhze9.png" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The constraints we started from
&lt;/h2&gt;

&lt;p&gt;Four principles shaped the design:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No agent code changes.&lt;/strong&gt; Governance has to be applied by the platform, not adopted as a library. If it requires a code change, it won’t land uniformly – and uniformity is the entire point.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No new database in the control plane.&lt;/strong&gt; The source of truth is the Kubernetes API server and the data model is custom resources – there’s no separate datastore to run, back up, and secure. (Telemetry is the one thing that needs a column store at scale; that’s kept separate and is bring-your-own.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don’t reinvent the data plane.&lt;/strong&gt; Proxying agentic protocols – MCP, A2A, streaming LLM traffic – well is a full-time job. We wanted to own the &lt;em&gt;policy&lt;/em&gt;, not the proxy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Catch what doesn’t opt in.&lt;/strong&gt; A governance layer that only sees traffic routed through it is blind exactly where the risk is. We needed an out-of-band way to find the agents nobody registered.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The data model
&lt;/h2&gt;

&lt;p&gt;Lynx is Kubernetes-native to the core: its entire vocabulary is a small set of custom resources – &lt;code&gt;Agent&lt;/code&gt;, &lt;code&gt;MCPServer&lt;/code&gt;, &lt;code&gt;LLMProvider&lt;/code&gt;, &lt;code&gt;ServiceIdentity&lt;/code&gt;, and &lt;code&gt;Policy&lt;/code&gt; – stored in the Kubernetes API itself. There’s no Lynx database; every record is something you can manage, GitOps, and RBAC like anything else in your cluster. The registry is a thin API in front of these resources. It records agents; it doesn’t run them.&lt;/p&gt;

&lt;p&gt;Two ideas matter throughout. First, &lt;strong&gt;workload identity is the join key&lt;/strong&gt; that ties a running pod to its registry record. Second, an agent becomes governed by two independent acts – it runs with an identity, and someone registers it – which means registration can happen in CI/CD at deploy time while the workload itself stays unaware of Lynx.&lt;/p&gt;

&lt;h2&gt;
  
  
  Identity: reuse what you already trust
&lt;/h2&gt;

&lt;p&gt;A workload proves who it is with &lt;strong&gt;one of two mechanisms&lt;/strong&gt; – it’s one or the other, and which one is recorded in the workload’s registration:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;SPIFFE/SPIRE&lt;/strong&gt; for mTLS workload identity, where private keys never leave the pod.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OIDC&lt;/strong&gt; for tokens from your existing IdP (Entra ID, Okta, Keycloak). The binding is the issuer/subject pair you record at registration time. Because the in-cluster Kubernetes API server is itself an OIDC issuer, this path also covers plain &lt;strong&gt;Kubernetes ServiceAccount tokens&lt;/strong&gt; – a pod’s projected token simply &lt;em&gt;is&lt;/em&gt; its identity, with nothing to mount beyond what Kubernetes already gives every pod.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can register more than one identity on an agent – which turns an IdP migration into a config change rather than a cutover – but any single call authenticates by exactly one. Human access to the dashboard uses the same IdP over OIDC, kept distinct so a person’s token is never mistaken for an agent’s. And not every caller is an agent: a &lt;code&gt;ServiceIdentity&lt;/code&gt; lets a plain service or human-operated client be governed by the same machinery.&lt;/p&gt;

&lt;p&gt;The validation pipeline behind all of this is deliberately strict and shared by every component: issuers are matched against a per-service allow-list (there is no “any issuer” mode), tokens are signature-verified and bound to an audience, and keys rotate automatically. The result is that agents reuse identity you already trust, rather than living in a parallel, ungoverned credential system.&lt;/p&gt;
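
&lt;p&gt;To make the claim checks concrete, here is a minimal Python sketch of the shape of that pipeline, not Lynx’s actual code. The &lt;code&gt;ServiceTrust&lt;/code&gt; type, the &lt;code&gt;validate_claims&lt;/code&gt; function, and the example issuer URL are assumptions made for illustration, and signature verification against the issuer’s published keys is assumed to have happened before this step.&lt;/p&gt;

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ServiceTrust:
    """Hypothetical per-service trust settings (illustrative names)."""
    allowed_issuers: frozenset
    audience: str


def validate_claims(claims: dict, trust: ServiceTrust, now: Optional[float] = None) -> tuple:
    """Return the (issuer, subject) binding, or raise ValueError.

    Only the claim checks are shown; the token signature is assumed
    to have been verified already.
    """
    now = time.time() if now is None else now

    issuer = claims.get("iss")
    # No wildcard mode: an issuer that is not explicitly listed is rejected.
    if issuer not in trust.allowed_issuers:
        raise ValueError("issuer not on allow-list")

    # The "aud" claim may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if trust.audience not in audiences:
        raise ValueError("token not bound to this audience")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        raise ValueError("token expired or missing exp")

    subject = claims.get("sub")
    if not subject:
        raise ValueError("missing subject")
    return (issuer, subject)


if __name__ == "__main__":
    trust = ServiceTrust(frozenset({"https://idp.example.com"}), "lynx-gateway")
    claims = {"iss": "https://idp.example.com", "sub": "agent-a",
              "aud": "lynx-gateway", "exp": time.time() + 60}
    print(validate_claims(claims, trust))
```

&lt;p&gt;The returned issuer/subject pair is what a registry lookup would key on in this sketch, mirroring the binding recorded at registration time.&lt;/p&gt;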

&lt;h2&gt;
  
  
  One gateway for A2A, MCP, and LLM
&lt;/h2&gt;

&lt;p&gt;The framing that matters most: Lynx is a &lt;strong&gt;single consolidated gateway for all three classes of agentic traffic&lt;/strong&gt;. Today these tend to be governed by three different things – a service mesh for agent-to-agent, a bespoke proxy or SDK wrapper for MCP, an egress gateway or nothing at all for LLM calls. That fragmentation is how you end up with three identity models, three policy languages, and three audit trails nobody can correlate.&lt;/p&gt;

&lt;p&gt;Lynx collapses them into one control point with one identity model, one policy language, and one audit trail. Agents, MCP servers, and LLM providers are all first-class objects with their own governed routes, and every call – whatever its kind – is authenticated, authorized, and recorded the same way.&lt;/p&gt;

&lt;p&gt;The LLM path has a property teams feel immediately: &lt;strong&gt;the gateway holds the provider credential, the agent never does&lt;/strong&gt;. Upstream API keys live in one governed place, rotate centrally, and never sit in agent pods – and when a provider needs no upstream auth, the gateway strips the caller’s credential so a Lynx-issued token can’t leak to a third party.&lt;/p&gt;

&lt;h2&gt;
  
  
  The data plane: drive the proxy, don’t fork it
&lt;/h2&gt;

&lt;p&gt;The proxy in the request path is &lt;a href="https://agentgateway.dev" rel="noopener noreferrer"&gt;agentgateway&lt;/a&gt;, the open-source Rust proxy purpose-built for LLM/MCP/A2A traffic. We run it &lt;strong&gt;unmodified&lt;/strong&gt; and drive it the way Envoy is driven – over xDS. Our control plane watches the custom resources and compiles them into the proxy’s native configuration; the proxy itself never sees a Lynx resource, has no Kubernetes access, and holds no cluster privileges.&lt;/p&gt;

&lt;p&gt;That decoupling is deliberate, and it buys four things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Blast radius –&lt;/strong&gt; a malformed registration drops one route; it can’t corrupt the proxy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Least privilege –&lt;/strong&gt; the component on the wire has no API-server reach.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Schema freedom –&lt;/strong&gt; we evolve our data model without touching the proxy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hot reconfiguration –&lt;/strong&gt; register an agent and its route is programmed live, no restart.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The customer who already likes agentgateway gets it as-is, with Lynx’s governance layered on through the same open extension points they already trust – no proprietary lock-in at the data path.&lt;/p&gt;
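
&lt;p&gt;The blast-radius property is easiest to see in code. The following Python sketch shows one way a control plane could compile registry records into proxy routes while isolating bad input. The &lt;code&gt;Route&lt;/code&gt; type, the &lt;code&gt;spec.endpoint&lt;/code&gt; field, and the path layout are hypothetical; they are not the Lynx resource schema or agentgateway’s configuration format.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """Hypothetical proxy-native route (not agentgateway's real schema)."""
    path_prefix: str
    upstream: str


def _name_of(res) -> str:
    """Best-effort resource name for error reporting."""
    meta = res.get("metadata") if isinstance(res, dict) else None
    return meta.get("name", "?") if isinstance(meta, dict) else "?"


def compile_routes(resources: list) -> tuple:
    """Translate registry records into routes, skipping malformed ones.

    Returns (routes, rejected), where rejected holds (name, reason) pairs.
    """
    routes, rejected = [], []
    for res in resources:
        try:
            kind = res["kind"].lower()
            name = res["metadata"]["name"]
            upstream = res["spec"]["endpoint"]  # assumed field name
            if not upstream.startswith(("http://", "https://")):
                raise ValueError("endpoint must be an http(s) URL")
            routes.append(Route(f"/{kind}/{name}", upstream))
        except (KeyError, TypeError, AttributeError, ValueError) as err:
            # A bad record costs exactly one route; the rest still compile.
            rejected.append((_name_of(res), str(err)))
    return routes, rejected


if __name__ == "__main__":
    records = [
        {"kind": "Agent", "metadata": {"name": "billing"},
         "spec": {"endpoint": "http://billing.agents.svc:8080"}},
        {"kind": "MCPServer", "metadata": {"name": "broken"}, "spec": {}},
    ]
    good, bad = compile_routes(records)
    print(good)
    print(bad)
```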

&lt;h2&gt;
  
  
  The decision point: policy in the path, credentials minted per hop
&lt;/h2&gt;

&lt;p&gt;Before the proxy forwards a request, it calls back into Lynx’s decision point, which runs the same sequence every time: authenticate the caller, validate the destination’s requirements, and &lt;strong&gt;evaluate policy in &lt;a href="https://www.cedarpolicy.com/" rel="noopener noreferrer"&gt;Cedar&lt;/a&gt;&lt;/strong&gt; – a formally-grounded language, default-deny, with LLM, MCP, and agent access all expressed in one model. Only on an allow does the request proceed.&lt;/p&gt;

&lt;p&gt;The property I care most about is what happens on allow: &lt;strong&gt;the gateway mints a fresh, short-lived credential scoped to that one hop&lt;/strong&gt;. When Agent A calls Agent B, A never holds a credential for B – it proves only its own identity, and the gateway issues a token good for exactly that destination, for a couple of minutes. A leaked token is useless beyond a single hop: no shared secrets, no standing keys, no blast radius.&lt;/p&gt;
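
&lt;p&gt;As a rough illustration of “default-deny, then mint per hop”, here is a small Python sketch. It is not Cedar and not Lynx’s token format: the &lt;code&gt;ALLOW&lt;/code&gt; rule set stands in for a real policy engine, the HMAC-signed payload stands in for a real signed JWT, and the signing key, function names, and 120-second lifetime are all assumptions chosen for the example.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"demo-only-key"  # assumption: a real gateway would use managed keys
TOKEN_TTL_SECONDS = 120  # stands in for "a couple of minutes"

# Hypothetical allow rules: (caller, action, destination).
# Anything not listed here is denied.
ALLOW = {("agent-a", "invoke", "agent-b")}


def authorize(caller: str, action: str, destination: str) -> bool:
    """Default-deny: permit only what a rule explicitly allows."""
    return (caller, action, destination) in ALLOW


def mint_hop_token(caller: str, destination: str, now: float) -> str:
    """Mint a short-lived token whose audience is a single destination."""
    claims = {"sub": caller, "aud": destination,
              "iat": int(now), "exp": int(now) + TOKEN_TTL_SECONDS}
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig


def handle_call(caller: str, action: str, destination: str, now=None):
    """Return a per-hop token on allow, or None on deny."""
    now = time.time() if now is None else now
    if not authorize(caller, action, destination):
        return None  # a denied hop never produces a credential
    return mint_hop_token(caller, destination, now)


if __name__ == "__main__":
    print(handle_call("agent-a", "invoke", "agent-b") is not None)
    print(handle_call("agent-a", "invoke", "agent-c"))
```

&lt;p&gt;Note that the caller never supplies or receives a credential for the destination up front; the only token that exists is the one minted after the allow decision, and its audience names exactly one destination.&lt;/p&gt;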

&lt;p&gt;For multi-step chains – A calls B, which calls a tool – this extends into proper &lt;strong&gt;on-behalf-of delegation built on RFC 8693 token exchange&lt;/strong&gt;. An agent presents the token it already has and asks for a destination; Lynx validates it, checks policy &lt;em&gt;at the moment of issuance&lt;/em&gt; (so an unauthorized hop never even produces a credential), and mints a destination-scoped token carrying the original subject. The payoff is threefold: agents stay IdP-agnostic (one endpoint, one credential), delegation is genuine and auditable end-to-end rather than just the last leg, and least privilege is enforced by construction. To the rest of your estate, Lynx looks like an ordinary OAuth2 provider – standards in, standards out.&lt;/p&gt;
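
&lt;p&gt;For readers who have not used RFC 8693 before, this Python sketch builds the form-encoded body of a token-exchange request. The &lt;code&gt;grant_type&lt;/code&gt; and token-type URNs and the parameter names come from the RFC itself. Everything else is an assumption for illustration, including the choice of &lt;code&gt;audience&lt;/code&gt; to name the destination and the placeholder token value; Lynx’s own endpoint and exact parameters are not shown here.&lt;/p&gt;

```python
from urllib.parse import urlencode

# Identifiers defined by RFC 8693.
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_exchange_request(subject_token: str, audience: str) -> str:
    """Form-encode an RFC 8693 token-exchange request body.

    subject_token is the token the agent already holds; audience names
    the destination it wants to call (an assumed mapping for this sketch).
    """
    return urlencode({
        "grant_type": GRANT_TYPE,
        "subject_token": subject_token,
        "subject_token_type": JWT_TOKEN_TYPE,
        "audience": audience,
    })


if __name__ == "__main__":
    # "example-token" is a placeholder, not a real credential.
    print(build_exchange_request("example-token", "agent-b"))
```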

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fj8dm0dw8lay64w3vu3xq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fj8dm0dw8lay64w3vu3xq.png" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Catching what routes around the gateway
&lt;/h2&gt;

&lt;p&gt;A gateway only governs traffic that flows through it – and the agents that &lt;em&gt;don’t&lt;/em&gt; route through it are exactly the ones worth finding. So Lynx watches at a layer the workload can’t bypass or tamper with: the kernel, via eBPF, deployed as a per-node agent that needs no instrumentation of the workloads it observes.&lt;/p&gt;

&lt;p&gt;The first signal is LLM egress. Any workload calling a provider does a TLS handshake; Lynx observes that in the kernel, attributes it to a pod, and checks whether that pod is a registered agent – classifying each as registered, a &lt;strong&gt;shadow agent&lt;/strong&gt;, or unattributable. This is the backstop for the LLM path specifically: even an agent that calls a provider &lt;em&gt;directly,&lt;/em&gt; bypassing the gateway entirely, still does a handshake the kernel sees. The gateway governs what routes through it; this finds what goes around it.&lt;/p&gt;
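
&lt;p&gt;The three-way classification can be sketched in a few lines of Python. This shows only the decision logic, under stated assumptions: that the observed handshake yields a server name, that pod attribution yields a workload identity or nothing, and that the registry can be treated as a set of identities. The hostname list and function name are illustrative, and the eBPF collection itself is out of scope.&lt;/p&gt;

```python
from typing import Optional

# Example provider hostnames to watch for (an illustrative list only).
LLM_PROVIDER_HOSTS = {"api.openai.com", "api.anthropic.com"}


def classify_egress(server_name: str,
                    pod_identity: Optional[str],
                    registered: set) -> Optional[str]:
    """Classify one observed TLS handshake.

    Returns None when the destination is not a known LLM provider,
    otherwise "registered", "shadow", or "unattributable".
    """
    if server_name not in LLM_PROVIDER_HOSTS:
        return None  # not LLM egress; nothing to classify
    if pod_identity is None:
        return "unattributable"  # handshake seen, but no pod could be matched
    return "registered" if pod_identity in registered else "shadow"


if __name__ == "__main__":
    registry = {"ns/payments/sa/billing-agent"}  # hypothetical identity format
    print(classify_egress("api.openai.com", "ns/payments/sa/billing-agent", registry))
    print(classify_egress("api.openai.com", "ns/dev/sa/experiment", registry))
    print(classify_egress("api.openai.com", None, registry))
```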

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fd4s2zyg0md5jjnr32ufw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fd4s2zyg0md5jjnr32ufw.png" width="800" height="292"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent sandboxing
&lt;/h2&gt;

&lt;p&gt;The same vantage point is also an &lt;strong&gt;enforcement&lt;/strong&gt; point. Lynx can run each agent inside a tailored kernel-level sandbox – a per-workload syscall policy that constrains which operations the pod may perform – rather than letting it act with the full ambient authority of its pod. Notably, those policies are written in the &lt;strong&gt;same Cedar language&lt;/strong&gt; as request authorization and compiled down to run in the kernel, so one policy model drives both the request path and the sandbox. Because enforcement lives in the kernel, a flagged or shadow agent is contained immediately and “unbypassably”, rather than merely alerted on.&lt;/p&gt;

&lt;p&gt;This is also where the platform is heading next: a per-agent behavioral baseline over kernel-level activity, with anomaly detection for the cases a request-time policy can’t catch – credential theft, lateral movement, an agent doing something it never has – and agent-specific threats such as memory and context poisoning. Policy governs intent; this layer is about what actually happens.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F9i7bzap5qllcdljhbh6x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F9i7bzap5qllcdljhbh6x.png" width="800" height="439"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tracing, audit, and compliance
&lt;/h2&gt;

&lt;p&gt;Everything emits OpenTelemetry, and the design choice that pays off here is that the gateway’s authorization decisions and the agents’ own reasoning – their LLM and tool calls – land in the &lt;strong&gt;same distributed trace&lt;/strong&gt;. You don’t get one system for “what the agent did” and a separate one for “what the platform allowed”; you get a single, correlated timeline of each interaction.&lt;/p&gt;

&lt;p&gt;That timeline is what turns governance into an audit story. Every call carries who the caller was, on whose behalf it acted, which policy permitted it, and what the decision was – and because each hop is independently authorized and freshly credentialed, the chain is attributable &lt;strong&gt;end-to-end&lt;/strong&gt;, not just at the edge. Alongside the request traces, every change to the system itself – a registration, a policy edit – is recorded as an audit event capturing the actor, the operation, and the exact before-and-after. Together these are the reproducible, cryptographically attributable record that incident response and auditors ask for, and that frameworks such as SOC 2, HIPAA, GDPR, and financial-services regimes require you to produce on demand – without a separate logging project bolted on after the fact.&lt;/p&gt;

&lt;p&gt;Traces and audit records flow into ClickHouse (bring-your-own), which powers the dashboard’s inventory, policy editor, audit search, agent-to-agent traffic graph, and shadow-agent views.&lt;/p&gt;

&lt;h2&gt;
  
  
  Driving Lynx: dashboard, CLI, and MCP
&lt;/h2&gt;

&lt;p&gt;Everything in Lynx is an API over Kubernetes resources, so there are three ways to operate it – all thin clients over the same control plane:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The dashboard&lt;/strong&gt;. A web UI for the people who live in this day to day – agent and MCP inventory, a Cedar policy editor, the agent-to-agent traffic graph, audit search, and trace exploration. It’s a Next.js and React app that renders agent execution traces with &lt;a href="https://github.com/evilmartians/agent-prism" rel="noopener noreferrer"&gt;agent-prism&lt;/a&gt;, so a multi-hop, multi-agent interaction reads as a single timeline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;lyctl&lt;/strong&gt;. A single Go CLI for everything scriptable – registering agents and MCP servers, authoring and testing policies, and standing up a complete demo environment in one command. It’s the natural fit for CI/CD, where registration belongs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP&lt;/strong&gt;. Lynx ships its own Model Context Protocol server that exposes the governance operations – list and register agents, write policies, inspect audit traces – as MCP tools. So you can drive Lynx straight from an AI assistant like &lt;strong&gt;Claude or Cursor&lt;/strong&gt;: “register this agent,” “which agents can reach the payments MCP server?”, “what changed in policy yesterday?” The platform that governs agents is itself operable by one.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fvgbgvvl9q61wn9a14mv4.png" width="800" height="455"&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Built on open standards
&lt;/h2&gt;

&lt;p&gt;We deliberately built Lynx on proven, open technology rather than inventing a parallel stack – it’s why it drops into an existing cluster and speaks the protocols your tooling already speaks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes-native –&lt;/strong&gt; the entire data model is custom resources in the Kubernetes API; it installs as a single Helm chart and runs no database of its own.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity –&lt;/strong&gt; SPIFFE/SPIRE for workload mTLS, and OIDC/OAuth2 with your existing IdP (including Kubernetes ServiceAccount tokens). Per-hop delegation uses RFC 8693 token exchange, and tokens are verified through standard JWKS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy –&lt;/strong&gt; authorization is expressed in &lt;a href="https://www.cedarpolicy.com/" rel="noopener noreferrer"&gt;Cedar&lt;/a&gt;, a formally-grounded, open policy language – the same language whether it’s evaluated in the request path or compiled into the kernel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data plane –&lt;/strong&gt; the open-source &lt;a href="https://agentgateway.dev" rel="noopener noreferrer"&gt;agentgateway&lt;/a&gt; proxy, driven dynamically over xDS and integrated through the standard ext-authz contract, with native fluency in MCP and A2A.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Visibility and enforcement –&lt;/strong&gt; eBPF for kernel-level discovery and sandboxing, with no instrumentation of the workloads themselves.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability –&lt;/strong&gt; OpenTelemetry end to end, stored in ClickHouse.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The throughline: Lynx contributes the governance layer – identity binding, Cedar policy, per-hop credentials, the agent registry – and bridges to everything else through open, standard contracts. No proprietary lock-in at the parts that matter most.&lt;/p&gt;

&lt;h2&gt;
  
  
  How it installs
&lt;/h2&gt;

&lt;p&gt;Lynx is a single Helm chart on any conformant Kubernetes cluster. The minimal install is the registry and the gateway; the data plane, the policy decision point, the kernel-level detector, the telemetry pipeline, and the UI are each switched on as you need them. The most revealing first step is to turn on discovery and watch what’s already talking to LLM providers across your fleet – for most teams, that first scan surfaces agents nobody knew were running.&lt;/p&gt;

&lt;p&gt;Explore our product page to &lt;a href="https://www.tigera.io/tigera-products/lynx/" rel="noopener noreferrer"&gt;see Lynx in action&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/how-lynx-works-a-technical-walkthrough/" rel="noopener noreferrer"&gt;How Lynx Works: A Technical Walkthrough&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>technicalblog</category>
      <category>aiagentsecurity</category>
      <category>products</category>
    </item>
    <item>
      <title>Why We Built Lynx: Bringing Control to the Age of AI Agents</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Wed, 17 Jun 2026 13:00:22 +0000</pubDate>
      <link>https://dev.to/tigeraio/why-we-built-lynx-bringing-control-to-the-age-of-ai-agents-12j0</link>
      <guid>https://dev.to/tigeraio/why-we-built-lynx-bringing-control-to-the-age-of-ai-agents-12j0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqedokj0ntxl0xu01kzal.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fqedokj0ntxl0xu01kzal.png" alt="Tigera Lynx" width="691" height="304"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For a decade, one question has guided everything we’ve built at Tigera: how do you take a programmatic approach to securing a dynamic, rapidly changing system with many moving parts? Calico has answered that question for Global 2000 companies running the largest Kubernetes platforms in the world, securing tens of millions of mission-critical transactions every day. Today I’m excited to announce the next chapter of that work: Lynx, a unified control plane for Kubernetes-native AI agents.&lt;/p&gt;

&lt;p&gt;Lynx lets us apply our deep knowledge of Kubernetes and eBPF, along with our expertise in building scalable, high-performance systems, to the security challenges that come with deploying AI agents. Before I explain how Lynx addresses these challenges, it’s worth being clear about why AI agents are so hard to secure in the first place.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI agents broke the assumptions security stacks were built on
&lt;/h2&gt;

&lt;p&gt;The enterprise security tooling most organizations run was designed for workloads that are deterministic. A service does roughly the same thing today that it did yesterday. You can reason about its behavior, define what it’s allowed to touch, and trust that a valid credential maps to expected actions.&lt;/p&gt;

&lt;p&gt;AI agents don’t work that way. They’re autonomous and non-deterministic. An agent acts on behalf of a user, reaches for whatever tool, LLM, or other agent it needs, carries a delegation chain, and reads untrusted input as it goes. The same agent can take a different path every time it runs. A valid credential no longer guarantees good behavior; it just guarantees the door opens. And every time a new agent or tool comes online, or the platform itself changes, the blast radius shifts again.&lt;/p&gt;

&lt;p&gt;This leaves three teams staring at the same problem from three different angles, with none of them able to give a confident answer. The AI team wants to experiment with the latest technology and move fast. Platform engineering teams are measured on how fast they deploy, but can’t prove the platform is actually under control. And the security team is asked to approve agents whose posture they have no real way to defend. Everyone needs to be accountable, but no one has the right controls.&lt;/p&gt;

&lt;p&gt;Lynx closes that gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Lynx does
&lt;/h2&gt;

&lt;p&gt;Lynx sits in the path of every agent call, whether agent-to-agent, agent-to-tool, or agent-to-LLM, and authenticates, authorizes, mediates, and audits each one. It does this without changing a line of agent code, and it plugs into the tools you already run: your identity provider (Entra ID, Okta) or SPIFFE/SPIRE, and your existing observability systems. It’s built on open standards, not proprietary lock-in.&lt;/p&gt;

&lt;p&gt;One control plane brings together five capabilities that, until now, teams have been trying to stitch together by hand.&lt;/p&gt;

&lt;p&gt;It starts with &lt;strong&gt;discovery&lt;/strong&gt;. A central registry catalogs every agent, including its owner, its purpose, and its version, while eBPF-powered auto-discovery finds the agents nobody registered. Shadow agents are flagged and quarantined, and any agent’s actions can be reconstructed end-to-end through OpenTelemetry traces.&lt;/p&gt;

&lt;p&gt;From there, Lynx manages &lt;strong&gt;posture&lt;/strong&gt;. AI-CSPM continuously evaluates every agent against a baseline and surfaces drift and over-permissions the moment they appear, with per-agent sandboxing and pre-built compliance packs mapping to GDPR, HIPAA, SOC 2, and financial services requirements. A Red Team Agent continuously probes for weaknesses in posture and misconfigurations.&lt;/p&gt;

&lt;p&gt;It gives every agent a real &lt;strong&gt;identity&lt;/strong&gt;. Each one receives a verifiable cryptographic identity through integration with your identity provider (Entra ID, Okta) or through SPIFFE/SPIRE, with no shared secrets. Long-lived API keys give way to short-lived, tightly scoped, auto-rotated tokens. A JWT is minted for every hop of a multi-agent workflow, so credentials are scoped to a single hop rather than handed around.&lt;/p&gt;

&lt;p&gt;Lynx authorizes every transaction and enforces &lt;strong&gt;policy&lt;/strong&gt; at the gateway. A single default-deny policy governs LLM, MCP, and agent access using the Cedar policy language, evaluated before any call executes. A misbehaving agent can be quarantined instantly, and a high-stakes call can be routed to a human, again with no agent code changes. Lynx also provides the other controls you need to secure and manage your agents: prompt-injection protection, rate limiting, guardrails, budgets, spend limits, custom webhooks, MCP multiplexing and aggregation, and session management.&lt;/p&gt;

&lt;p&gt;Lastly, Lynx watches for &lt;strong&gt;anomalous behavior&lt;/strong&gt; at a layer agents can’t tamper with. eBPF and LSM observe every syscall, network call, and file access at the kernel, catching credential theft and lateral movement even when an action technically passes policy. This produces a forensic audit trail, and a Guardian Agent detects anomalous behavior and quarantines suspicious agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security &amp;amp; visibility for Kubernetes-native AI
&lt;/h2&gt;

&lt;p&gt;When I look at AI agents, I don’t see a new category that requires us to start over. I see the next class of workload that is autonomous, distributed, and increasingly embedded in critical business processes. I see AI agents actively interacting with traditional applications and databases running in containers/VMs, and the need for a unified solution that can secure this traffic. The discipline is the same one we’ve practiced from the start: give every workload a verifiable identity, evaluate every action against policy before it runs, and observe behavior closely enough to catch what policy alone can miss. And do this in a manner that is agnostic of the underlying infrastructure so that we can help you avoid platform lock-in and the risk of a price hike that goes with it.&lt;/p&gt;

&lt;p&gt;As our CTO Peter Kelly puts it, because we watch behavior with eBPF and LSM at the kernel, we can detect an agent going wrong even when it carries a valid credential—and produce a reproducible audit trail to prove it. That’s the difference between hoping an agent behaves after acquiring a valid token, and knowing what it did.&lt;/p&gt;

&lt;p&gt;AI agents are going to keep multiplying across your estate. The question isn’t whether you’ll run them. It’s whether you can see them, govern them, and answer for them. With Lynx, you can.&lt;/p&gt;

&lt;p&gt;Lynx is generally available today. It scales horizontally on a Kubernetes-native architecture with no per-call overhead, and it’s already running in production at some of the world’s largest banks.&lt;/p&gt;

&lt;p&gt;Explore our product page to &lt;a href="https://www.tigera.io/tigera-products/lynx/" rel="noopener noreferrer"&gt;see Lynx in action&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/why-we-built-lynx-bringing-control-to-the-age-of-ai-agents/" rel="noopener noreferrer"&gt;Why We Built Lynx: Bringing Control to the Age of AI Agents&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>companyblog</category>
      <category>featuredblog</category>
      <category>aiagentsecurity</category>
      <category>products</category>
    </item>
    <item>
      <title>Five Principles of an Accountable AI Agent Network: How to Evaluate Any Governance Platform</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Wed, 10 Jun 2026 20:19:08 +0000</pubDate>
      <link>https://dev.to/tigeraio/five-principles-of-an-accountable-ai-agent-network-how-to-evaluate-any-governance-platform-2jcm</link>
      <guid>https://dev.to/tigeraio/five-principles-of-an-accountable-ai-agent-network-how-to-evaluate-any-governance-platform-2jcm</guid>
      <description>&lt;p&gt;The &lt;a href="https://www.tigera.io/blog/the-ai-agent-accountability-crisis-why-governance-isnt-keeping-up-with-deployment/" rel="noopener noreferrer"&gt;first post&lt;/a&gt; in this series argued that AI agent governance hasn’t kept pace with deployment. The &lt;a href="https://www.tigera.io/blog/the-five-pillars-of-ai-agent-accountability-a-diagnostic-framework-for-engineering-leaders/" rel="noopener noreferrer"&gt;second&lt;/a&gt; laid out the five pillars of accountability, and what is required. The &lt;a href="https://www.tigera.io/blog/the-ai-agent-accountability-gap-why-network-policies-api-gateways-and-rbac-are-not-enough/" rel="noopener noreferrer"&gt;third&lt;/a&gt; walked through why network policies, API gateways, MCP/A2A protocols, DIY security patterns, and Role-based Access Control (RBAC) each leave critical accountability gaps.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;So what does good look like?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The five pillars define &lt;strong&gt;what&lt;/strong&gt; &lt;a href="https://www.tigera.io/blog/your-ai-agents-are-autonomous-but-are-they-accountable/" rel="noopener noreferrer"&gt;AI agent accountability&lt;/a&gt; requires. The principles below define &lt;strong&gt;how&lt;/strong&gt; a governance platform should deliver it. These are the architectural principles your team should evaluate any AI agent governance solution against, whether you build it, buy it, or assemble it from open-source components.&lt;/p&gt;

&lt;p&gt;If a vendor pitches you a governance platform that fails any of these five, walk away.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the five principles of an accountable AI agent network?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.tigera.io/learn/guides/kubernetes-security/kubernetes-network-policy/" rel="noopener noreferrer"&gt;Kubernetes Network Policies&lt;/a&gt; are essential for securing any cluster. They restrict which pods can communicate with which other pods at the network level, and they should absolutely be part of your security posture.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Default-deny:&lt;/strong&gt; No agent communicates unless a policy explicitly permits it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attribute-based policy:&lt;/strong&gt; Policies reference agent attributes, not agent names.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-trust identity:&lt;/strong&gt; Every request authenticated, every identity verified.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit by design:&lt;/strong&gt; Every interaction produces a structured, correlated trace automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes-native:&lt;/strong&gt; The platform extends your existing infrastructure rather than replacing it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each principle below explains why it matters and what a passing solution looks like.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjx576wo13n6xvjgaq2b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyjx576wo13n6xvjgaq2b.png" width="800" height="130"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Use the five principles as a checklist when evaluating any governance platform. A platform that fails even one of them risks becoming security theater.&lt;/p&gt;

&lt;h3&gt;
  
  
  Principle 1: Default-deny
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;No agent communicates with any other agent unless explicitly permitted by policy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the only safe starting posture for accountability. If your governance layer defaults to &lt;em&gt;allowing&lt;/em&gt; communication and only blocks what’s explicitly forbidden, every interaction you didn’t anticipate is ungoverned, and you can’t be accountable for what you didn’t authorize.&lt;/p&gt;

&lt;p&gt;Default-deny flips the model: nothing is allowed until a policy explicitly permits it. Every allowed interaction is intentional, traceable, and auditable. New agents are isolated by default until policies are written to grant them access, which is exactly the behavior you want in a governed network.&lt;/p&gt;

&lt;p&gt;Default-deny seems restrictive, but in practice it’s liberating. Your security team doesn’t have to anticipate every possible &lt;strong&gt;&lt;em&gt;bad&lt;/em&gt;&lt;/strong&gt; interaction. They only have to define the &lt;strong&gt;&lt;em&gt;good&lt;/em&gt;&lt;/strong&gt; ones.&lt;/p&gt;
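&lt;p&gt;A minimal sketch of the default-deny posture, assuming a toy in-memory rule list. The &lt;code&gt;Rule&lt;/code&gt; type, the &lt;code&gt;is_allowed&lt;/code&gt; helper, and the agent names are hypothetical and exist only for illustration; a real policy engine would match on attributes rather than names, as Principle 2 explains.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One explicit permit: caller may perform action on target."""
    caller: str
    action: str
    target: str


def is_allowed(rules, caller, action, target):
    # Default-deny: the answer is False unless a rule explicitly permits it.
    return Rule(caller, action, target) in rules


# Hypothetical rule set with a single explicit permit.
rules = [Rule("billing-agent", "call", "ledger-agent")]

print(is_allowed(rules, "billing-agent", "call", "ledger-agent"))  # explicitly permitted
print(is_allowed(rules, "new-agent", "call", "ledger-agent"))      # no rule, so denied
```

&lt;p&gt;The point of the sketch is the absence of any deny list: a newly deployed agent is isolated simply because nobody has written a rule for it yet.&lt;/p&gt;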

&lt;h3&gt;
  
  
  Principle 2: Attribute-based policy
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Policies should reference agent attributes, not agent names.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Hardcoding agent names in policies creates a governance system that breaks every time you add or rename an agent. It’s the equivalent of maintaining a firewall with hundreds of IP-based rules instead of using network segments.&lt;/p&gt;

&lt;p&gt;Attribute-based policies reference properties like capabilities, risk levels, team ownership, and environment tags. Instead of &lt;em&gt;“Agent-Finance-v2 can call Agent-Compliance-v3,”&lt;/em&gt; the policy says &lt;em&gt;“Agents with capability=financial-analysis can call agents with capability=compliance-query.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This approach has a powerful scaling property: when a new agent registers with matching attributes, existing policies apply automatically. Governance grows with the agent network, not against it. A team deploying a new agent doesn’t need to file a ticket to update allow-lists; they describe the agent’s attributes at registration time, and the policy engine handles the rest.&lt;/p&gt;

&lt;p&gt;This is the principle that separates a security model that survives at 10 agents from one that survives at 1,000.&lt;/p&gt;
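&lt;p&gt;The scaling property can be sketched in a few lines. This is an illustration under stated assumptions, not any product’s policy engine: the &lt;code&gt;Agent&lt;/code&gt; and &lt;code&gt;Policy&lt;/code&gt; types are hypothetical, and &lt;code&gt;Agent-Finance-v3&lt;/code&gt; is an invented newcomer added to show that no policy change is needed when attributes match.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    capability: str


@dataclass
class Policy:
    """Permit callers with one capability to call agents with another."""
    caller_capability: str
    target_capability: str


def is_allowed(policies, caller, target):
    # Match on attributes only; agent names never appear in a policy.
    return any(
        p.caller_capability == caller.capability
        and p.target_capability == target.capability
        for p in policies
    )


policies = [Policy("financial-analysis", "compliance-query")]

finance_v2 = Agent("Agent-Finance-v2", "financial-analysis")
compliance_v3 = Agent("Agent-Compliance-v3", "compliance-query")
# Hypothetical agent registered later with matching attributes.
finance_v3 = Agent("Agent-Finance-v3", "financial-analysis")

print(is_allowed(policies, finance_v2, compliance_v3))
print(is_allowed(policies, finance_v3, compliance_v3))  # covered by the existing policy
print(is_allowed(policies, compliance_v3, finance_v2))  # reverse direction is not permitted
```

&lt;p&gt;Renaming an agent changes nothing in the policy list, which is the property that name-based allow-lists lack.&lt;/p&gt;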

&lt;h3&gt;
  
  
  Principle 3: Zero-trust identity
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Every request authenticated. Every identity verified. Trust nothing by default.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agent networks are susceptible to the same identity threats as any distributed system: spoofing, replay attacks, credential theft. But agents add a unique challenge: they operate on behalf of users. This means both the &lt;strong&gt;workload identity&lt;/strong&gt; (is this actually the agent it claims to be?) and the &lt;strong&gt;user identity&lt;/strong&gt; (on whose behalf is this agent acting?) must be verified.&lt;/p&gt;

&lt;p&gt;A governance platform should support &lt;strong&gt;dual authentication&lt;/strong&gt;: cryptographic workload identity (proving the agent is genuine) and token-based user identity (establishing who triggered the action). Both identities should be available for policy evaluation and audit logging.&lt;/p&gt;

&lt;p&gt;Short-lived credentials, automatic rotation, and cryptographic verification should be standard, not optional add-ons. Static API keys and long-lived tokens are liabilities in an agent network; compromised credentials can enable automated lateral movement at machine speed.&lt;/p&gt;
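&lt;p&gt;As a rough sketch of the “both identities, both short-lived” rule: the code below assumes that cryptographic verification (for example, checking a SPIFFE SVID or an OIDC token signature) has already happened upstream and only shows the gate that follows. The &lt;code&gt;VerifiedIdentity&lt;/code&gt; and &lt;code&gt;RequestContext&lt;/code&gt; types, the subject strings, and the timestamps are hypothetical.&lt;/p&gt;

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerifiedIdentity:
    subject: str
    expires_at: float  # Unix timestamp; short-lived by design


@dataclass
class RequestContext:
    workload: Optional[VerifiedIdentity]  # which agent is calling
    user: Optional[VerifiedIdentity]      # on whose behalf it acts


def authenticated(ctx, now=None):
    """Require both identities, and require both to be unexpired."""
    now = time.time() if now is None else now
    identities = (ctx.workload, ctx.user)
    return all(i is not None and i.expires_at > now for i in identities)


# Hypothetical identities; the clock is pinned to 1000 for a repeatable example.
agent = VerifiedIdentity("spiffe://example.org/agent/finance", expires_at=1300)
alice = VerifiedIdentity("alice@example.org", expires_at=1900)

print(authenticated(RequestContext(agent, alice), now=1000))  # both present and valid
print(authenticated(RequestContext(agent, None), now=1000))   # user identity missing
print(authenticated(RequestContext(agent, alice), now=1500))  # workload credential expired
```

&lt;p&gt;Keeping both identities on the request context is what lets later policy evaluation and audit logging refer to either one.&lt;/p&gt;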

&lt;h3&gt;
  
  
  Principle 4: Audit by design, not by afterthought
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Every interaction produces a structured, correlated trace automatically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If your team has to &lt;em&gt;add&lt;/em&gt; logging after the fact, you’ve already lost accountability. Audit records should be a &lt;strong&gt;byproduct of the governance layer’s enforcement&lt;/strong&gt;, not a separate system bolted on later.&lt;/p&gt;

&lt;p&gt;When the governance layer evaluates a policy and permits (or denies) an agent interaction, that evaluation &lt;em&gt;is&lt;/em&gt; the audit record. It captures: who called whom, what policy was evaluated, what the decision was, what attributes matched, and when it happened. These records should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Structured&lt;/strong&gt; (not free-text logs),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Correlated&lt;/strong&gt; across multi-hop chains (using distributed trace IDs),&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Queryable&lt;/strong&gt; by agent, by policy, by time range, by outcome.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The practical implication: the audit trail should be a &lt;strong&gt;first-class product&lt;/strong&gt; of the governance platform, not a configuration option. If you have to enable it, someone will forget. If it’s built in, it’s always there.&lt;/p&gt;
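&lt;p&gt;One way to picture “the evaluation is the audit record” is an enforcement function that cannot return a decision without also writing the record. The sketch below is illustrative only: the in-memory &lt;code&gt;AUDIT_LOG&lt;/code&gt;, the &lt;code&gt;enforce&lt;/code&gt; function, the policy id, and the agent names are all hypothetical, and the record carries only a subset of the fields listed above.&lt;/p&gt;

```python
import json
import uuid
from datetime import datetime, timezone

AUDIT_LOG = []  # in-memory stand-in for a real audit store

# Hypothetical attribute-based policy, given an id so records can cite it.
POLICIES = [
    {"id": "fin-to-compliance",
     "caller_capability": "financial-analysis",
     "target_capability": "compliance-query"},
]


def enforce(caller, target, trace_id):
    """Evaluate policy and emit the audit record in the same step."""
    matched = next(
        (p for p in POLICIES
         if p["caller_capability"] == caller["capability"]
         and p["target_capability"] == target["capability"]),
        None,
    )
    record = {
        "trace_id": trace_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "caller": caller["name"],
        "target": target["name"],
        "policy_id": matched["id"] if matched else None,
        "decision": "permit" if matched else "deny",
    }
    AUDIT_LOG.append(record)  # there is no separate logging call to forget
    return record["decision"] == "permit"


analyst = {"name": "analyst-agent", "capability": "financial-analysis"}
compliance = {"name": "compliance-agent", "capability": "compliance-query"}

trace = str(uuid.uuid4())             # one trace id shared across hops
enforce(analyst, compliance, trace)   # hop 1: matches the policy
enforce(compliance, analyst, trace)   # hop 2: no matching policy, denied

# Structured records can be filtered by trace, agent, policy, or outcome.
chain = [r for r in AUDIT_LOG if r["trace_id"] == trace]
print(json.dumps(chain, indent=2))
```

&lt;p&gt;Because denials are recorded through the same path as permits, the trail answers both “what happened” and “what was attempted” for a given trace.&lt;/p&gt;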

&lt;h3&gt;
  
  
  Principle 5: Kubernetes-native
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The governance layer should work with your existing infrastructure, not replace it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Enterprises have invested heavily in Kubernetes, Helm charts, GitOps pipelines, RBAC, namespaces, and observability stacks. An AI agent governance platform that requires a separate control plane, its own deployment model, or a proprietary orchestration layer will face adoption resistance and operational overhead.&lt;/p&gt;

&lt;p&gt;The governance platform should be deployable via Helm, manageable via CRDs, observable (e.g. via Prometheus or OpenTelemetry), and compatible with existing identity infrastructure (OIDC providers, SPIFFE/SPIRE). It should feel like a natural extension of the Kubernetes platform, not a foreign system that happens to run on it.&lt;/p&gt;

&lt;p&gt;This isn’t just about developer experience. It’s about &lt;strong&gt;operational sustainability&lt;/strong&gt;. If the governance platform requires specialized skills your platform team doesn’t have, it will become a bottleneck instead of an enabler.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the principles reinforce each other
&lt;/h2&gt;

&lt;p&gt;These five principles aren’t independent. They reinforce each other:&lt;/p&gt;

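&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Principle&lt;/th&gt;
&lt;th&gt;What it enables&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Default-deny&lt;/td&gt;
&lt;td&gt;Provenance; every allowed interaction was explicitly authorized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Attribute-based policy&lt;/td&gt;
&lt;td&gt;Governance at scale; authorization grows with the network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zero-trust identity&lt;/td&gt;
&lt;td&gt;Trust in audit records; every participant is verified&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Audit by design&lt;/td&gt;
&lt;td&gt;Traceability and compliance; every decision is recorded&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes-native&lt;/td&gt;
&lt;td&gt;Adoption; the platform integrates with existing infrastructure&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;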

&lt;p&gt;When evaluating governance solutions, test each principle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If a solution requires you to default to allowing communication and only block specific interactions, &lt;strong&gt;it fails Principle 1.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If it requires naming agents in policies, &lt;strong&gt;it fails Principle 2.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If it relies on static API keys or long-lived tokens, &lt;strong&gt;it fails Principle 3.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If it doesn’t produce correlated audit trails automatically, &lt;strong&gt;it fails Principle 4.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If it needs its own control plane outside Kubernetes, &lt;strong&gt;it fails Principle 5.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The right solution delivers all five. Because accountability requires nothing less.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What’s the difference between default-deny and zero-trust?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Default-deny is a policy posture — no communication unless explicitly permitted. Zero-trust is an identity posture — every identity must be verified, every time. They reinforce each other but aren’t interchangeable. A platform with zero-trust identity but default-allow policy is still ungoverned.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why does Kubernetes-native matter for AI agent accountability?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Because adoption is the difference between a governance platform that works and one that gets shelved. If your platform team has to learn a new control plane, run a parallel deployment pipeline, or operate a proprietary policy engine, the governance layer becomes a bottleneck — and ungoverned agents start showing up because the official path is too slow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I build this myself with SPIFFE, OPA, and OpenTelemetry?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Technically yes. Practically, you’ll spend 6–12 months on the &lt;a href="https://www.tigera.io/blog/calculating-the-kubernetes-integration-tax-what-your-diy-networking-stack-actually-costs/" rel="noopener noreferrer"&gt;integration glue&lt;/a&gt;, audit correlation across multi-hop chains, dual identity verification, attribute-based policy modeling, and the human oversight surface. We covered the build-vs-buy tradeoff in &lt;a href="https://www.tigera.io/blog/the-ai-agent-accountability-gap-why-network-policies-api-gateways-and-rbac-are-not-enough/#diy-security-patterns-four-tools-no-unified-policy-layer" rel="noopener noreferrer"&gt;post 3 of this series&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Are these principles specific to Tigera Lynx?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. These are architectural principles for any accountable agent governance platform — whether commercial, open source, or homegrown. We use them ourselves to evaluate Lynx, and we’d encourage you to use them to evaluate every option you consider.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Default-deny&lt;/strong&gt; is the only safe starting posture. Anything else leaves ungoverned interactions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attribute-based policy&lt;/strong&gt; is the principle that lets governance scale past 100 agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-trust identity&lt;/strong&gt; must verify both the workload (is this the right agent?) and the user (on whose behalf is it acting?).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit by design&lt;/strong&gt; means audit records are a byproduct of enforcement, not a separate system.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kubernetes-native&lt;/strong&gt; ensures the platform actually gets adopted instead of bypassed.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get the strategic guide for accountable AI agents
&lt;/h2&gt;

&lt;p&gt;We wrote a strategic guide, &lt;a href="https://info.tigera.io/rs/805-GFH-732/images/Whitepaper_Accountability_for_AI_Agents.pdf" rel="noopener noreferrer"&gt;Accountable AI Agents: A Strategic Guide for AI &amp;amp; Security Leaders Governing Autonomous AI at Scale&lt;/a&gt;, that walks through these principles in depth, including a side-by-side comparison of common governance approaches and how they score against each principle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://info.tigera.io/rs/805-GFH-732/images/Whitepaper_Accountability_for_AI_Agents.pdf" rel="noopener noreferrer"&gt;Get the strategic guide for accountable AI agents →&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/five-principles-of-an-accountable-ai-agent-network-how-to-evaluate-any-governance-platform/" rel="noopener noreferrer"&gt;Five Principles of an Accountable AI Agent Network: How to Evaluate Any Governance Platform&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>technicalblog</category>
      <category>aiagentsecurity</category>
      <category>products</category>
    </item>
    <item>
      <title>Multi-Layer Policy for Securing AI Agents</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Wed, 03 Jun 2026 18:58:09 +0000</pubDate>
      <link>https://dev.to/tigeraio/multi-layer-policy-for-securing-ai-agents-4h95</link>
      <guid>https://dev.to/tigeraio/multi-layer-policy-for-securing-ai-agents-4h95</guid>
      <description>&lt;p&gt;As part of our work at Tigera building products that create secure runtime environments for enterprise agents at scale in the real world, one small part of this puzzle I think about a lot is policy, and runtime enforcement of policy, and how to create a comprehensive secure runtime, configured from one place. The more companies we talk to trying to lock down and secure these platforms at runtime, the more I believe &lt;a href="https://www.tigera.io/learn/guides/ai-agent-security/" rel="noopener noreferrer"&gt;AI Agent security&lt;/a&gt; needs policy in multiple places, not just one (e.g., not just at the gateway layer), and ideally expressed in the same policy language.&lt;/p&gt;

&lt;p&gt;At the L7 gateway layer, every agent call is observable: who is calling, what they are calling, what attributes both sides carry, what the requested action is. This is where you decide whether an agent should be permitted to talk to a particular MCP server, invoke a particular tool, delegate to another agent, or call a particular LLM. The atoms of policy here are identity, action, resource, and context.&lt;/p&gt;

&lt;p&gt;At the agent runtime layer, or kernel layer in a container, what the agent does inside its own runtime is observable: syscalls, file access, library loads, network connections that bypass the brokered channel. This is where you decide whether the agent can read a file, open a socket, spawn a subprocess, or load a library. The atoms of policy here are processes, paths, file descriptors, and system calls.&lt;/p&gt;

&lt;p&gt;Both layers are necessary. The gateway alone cannot constrain what an agent does inside its runtime once it holds a token. The kernel alone cannot reason about identity, delegation, or multi-hop intent. Building policy at one and not the other leaves a category gap.&lt;/p&gt;

&lt;p&gt;The architectural choice that makes this work in practice is using one policy language for both. We use Cedar at the gateway and interpret and translate Cedar to &lt;a href="https://docs.tigera.io/calico/latest/about/kubernetes-training/about-ebpf" rel="noopener noreferrer"&gt;eBPF&lt;/a&gt; policy for the kernel: same policies, two enforcement points, one place to author and review.&lt;/p&gt;

&lt;h2&gt;
  
  
  Policy at the gateway: enforcing agent intent
&lt;/h2&gt;

&lt;p&gt;The gateway sees intent. It is the right place to enforce &lt;em&gt;who can talk to whom, under what conditions, with what level of human oversight.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A Cedar policy that constrains which agents can invoke which tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="n"&gt;permit&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;principal&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="n"&gt;Group&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"finance-agents"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Action&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"invokeTool"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ToolSet&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"finance-readonly"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;principal&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;risk_level&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"low"&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;
  &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;delegation_depth&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This policy expresses several things that are hard to model in RBAC or in a network policy. The principal is identified by group membership but constrained by attribute (&lt;code&gt;risk_level&lt;/code&gt;). The resource is a typed set of tools. The condition includes a check on delegation depth; agents more than three hops deep in a delegation chain are refused even if they pass every other check.&lt;/p&gt;
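&lt;p&gt;To make the boundary of the delegation check concrete, here is a plain-Python mirror of the conditions in the Cedar policy above. It is a reading aid rather than how Cedar evaluates policies: the &lt;code&gt;Request&lt;/code&gt; type and &lt;code&gt;permits&lt;/code&gt; function are hypothetical, group and tool-set membership are flattened into Python sets, and it assumes no other policy permits the request (Cedar denies by default when no &lt;code&gt;permit&lt;/code&gt; matches).&lt;/p&gt;

```python
from dataclasses import dataclass

MAX_DELEGATION_DEPTH = 3  # mirrors the bound in the Cedar condition


@dataclass
class Request:
    principal_groups: set
    risk_level: str
    action: str
    resource_toolsets: set
    delegation_depth: int


def permits(req):
    """True only when every clause of the policy holds."""
    return (
        "finance-agents" in req.principal_groups
        and req.action == "invokeTool"
        and "finance-readonly" in req.resource_toolsets
        and req.risk_level == "low"
        and MAX_DELEGATION_DEPTH >= req.delegation_depth
    )


base = dict(
    principal_groups={"finance-agents"},
    risk_level="low",
    action="invokeTool",
    resource_toolsets={"finance-readonly"},
)

print(permits(Request(delegation_depth=3, **base)))  # at the limit: permitted
print(permits(Request(delegation_depth=4, **base)))  # one hop too deep: refused
```

&lt;p&gt;A depth of exactly 3 still satisfies the condition; the request is refused only once the chain is deeper than that, or when any other clause fails.&lt;/p&gt;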

&lt;p&gt;The gateway layer naturally enforces delegation rules, per-hop token issuance with scope reduction, agent-to-MCP tool authorization, agent-to-LLM constraints, human-in-the-loop hooks for high-stakes actions, and attribute-based decisions across all of these.&lt;/p&gt;

&lt;p&gt;What the gateway cannot do is constrain what happens after it issues a token. Once the agent has the credential, the kernel is the only layer that sees what the process actually does with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Policy at the kernel: constraining agent behaviour
&lt;/h2&gt;

&lt;p&gt;The kernel sees behaviour. It is the right place to enforce &lt;em&gt;what an agent process is allowed to do at the operating system level&lt;/em&gt;, regardless of what tokens it holds.&lt;/p&gt;

&lt;p&gt;A baseline sandbox for an agent workload, expressed conceptually in the same Cedar policy model and compiled to BPF programs at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rego"&gt;&lt;code&gt;&lt;span class="n"&gt;permit&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;principal&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="n"&gt;AgentClass&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"data-analyst"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Action&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"readFile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Action&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"writeFile"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="n"&gt;is&lt;/span&gt; &lt;span class="n"&gt;FilePath&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;when&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="n"&gt;like&lt;/span&gt; &lt;span class="s2"&gt;"/workspace/analyst-*"&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"/var/run/secrets/analyst-key"&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="n"&gt;forbid&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="n"&gt;principal&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="n"&gt;AgentClass&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"data-analyst"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;Action&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"connectNetwork"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt; &lt;span class="n"&gt;is&lt;/span&gt; &lt;span class="n"&gt;NetworkDestination&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;unless&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="n"&gt;in&lt;/span&gt; &lt;span class="n"&gt;DestinationSet&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="s2"&gt;"approved-llm-endpoints"&lt;/span&gt; &lt;span class="p"&gt;||&lt;/span&gt;
  &lt;span class="n"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"lynx-gateway.internal"&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The compilation target is BPF LSM hooks, cgroup network hooks, and file access enforcement at the inode boundary. When the agent process steps outside what the policy permits, the kernel refuses the operation – &lt;code&gt;EPERM&lt;/code&gt; for the syscall, &lt;code&gt;ECONNREFUSED&lt;/code&gt; for the network connection, &lt;code&gt;ENOENT&lt;/code&gt; for the file access. The agent gets the same error it would get for any prohibited operation, regardless of what credentials it holds.&lt;/p&gt;

&lt;p&gt;The kernel layer naturally enforces file access boundaries, network egress restrictions, syscall allowlisting, library load constraints, and process-spawn limits. The same observation pipeline that feeds enforcement also feeds threat detection.&lt;/p&gt;

&lt;p&gt;What the kernel cannot do is reason about why an action is being attempted. It sees a &lt;code&gt;connect()&lt;/code&gt; system call. It does not know whether the call is part of a legitimate multi-hop delegation that the gateway already authorized. That context only exists at the L7 layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The dual-layer architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs685pbznwc9qm61rs2vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs685pbznwc9qm61rs2vm.png" width="800" height="492"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architectural integration matters as much as either layer in isolation. Cedar policies authored once, evaluated at the gateway, compiled to BPF for kernel enforcement. The compilation is not magical—only the substrate-relevant subset of Cedar policies compiles. The rest stay at the gateway. Either way, security teams write Cedar; the runtime decides which layer is the right one to enforce at.&lt;/p&gt;

&lt;p&gt;This integration is what makes the dual-layer approach operationally sustainable. Without one policy language, you end up with two policy stores, two review processes, two engineering teams, and inevitable divergence between what the gateway permits and what the kernel allows. With Cedar at both layers, the policy you wrote is the policy that gets enforced everywhere.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why single-layer policy isn’t enough for AI agent security
&lt;/h2&gt;

&lt;p&gt;Policy at the gateway alone defends against unauthorized callers and out-of-scope actions. It does not defend against a compromised agent that has a legitimate token and uses it to do things outside its intended behaviour, such as reading credential files, exfiltrating data through side channels, or escalating privilege inside its runtime.&lt;/p&gt;

&lt;p&gt;Policy at the kernel alone defends against process-level misbehaviour. It does not understand identity or delegation, cannot reason about whether a network connection is part of a legitimate multi-hop chain, and has no way to enforce human-in-the-loop approval flows.&lt;/p&gt;

&lt;p&gt;Combined, the two layers cover the threat model that either layer alone misses. A compromised agent with a legitimate token can still call out through the gateway, but its local actions are constrained by the kernel sandbox. A misconfigured Cedar policy at the gateway is mitigated by the substrate baseline. A shadow agent that never registered is observed and contained at the kernel.&lt;/p&gt;

&lt;p&gt;For Kubernetes-native enterprises building agent infrastructure into regulated workloads, this is the architecture worth building toward. Gateway policy for what agents are allowed to ask for. Kernel policy for what they are allowed to do. Same language for both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Going deeper
&lt;/h2&gt;

&lt;p&gt;Multi-layer policy is one piece of a larger problem: making AI agent infrastructure accountable end-to-end. Traceability, authorization provenance, identity and ownership, policy-based governance at scale, and human oversight and intervention—they all have to work together.&lt;/p&gt;

&lt;p&gt;Read: &lt;a href="https://www.tigera.io/blog/the-five-pillars-of-ai-agent-accountability-a-diagnostic-framework-for-engineering-leaders/" rel="noopener noreferrer"&gt;The Five Pillars of AI Agent Accountability →&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/multi-layer-policy-for-securing-ai-agents/" rel="noopener noreferrer"&gt;Multi-Layer Policy for Securing AI Agents&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>technicalblog</category>
      <category>aiagentsecurity</category>
      <category>bestpractices</category>
    </item>
    <item>
      <title>What’s new in Calico: Spring 2026 Release</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Tue, 02 Jun 2026 16:10:41 +0000</pubDate>
      <link>https://dev.to/tigeraio/whats-new-in-calico-spring-2026-release-1lgg</link>
      <guid>https://dev.to/tigeraio/whats-new-in-calico-spring-2026-release-1lgg</guid>
      <description>&lt;p&gt;Kubernetes has come a long way since its debut in 2014. It’s gone from running a couple of containerized microservices to orchestrating fleets of production workloads spanning everything from AI agents to full-scale VMs running in pods. As Kubernetes adoption grows, and its use cases stretch to cover more ground, managing its increasingly complex networking and security landscape demands operational maturity and a platform that supports it.&lt;/p&gt;

&lt;p&gt;The Spring 2026 release of Calico provides that support in two key areas:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Unified operations across Kubernetes pods and VMs&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;KubeVirt Live Migration in Bridge Mode&lt;/strong&gt; allows you to migrate VM workloads with IPs preserved, minimal packet loss, and fast route convergence. VMs can move between nodes for planned maintenance, load balancing, and high availability without interrupting network connectivity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Egress Gateway Layer 2 Advertisements&lt;/strong&gt; (Enterprise exclusive) lets pod traffic egress with IPs from the host’s own subnet, so workloads get a stable identity the rest of your network already recognizes. This eliminates the need for BGP peering to advertise Egress Gateway IPs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy recommendations for VMs and hosts&lt;/strong&gt; (Enterprise exclusive) automates and scales policy authoring for Calico-managed workloads running outside of your Kubernetes clusters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenStack Live Migration Improvements&lt;/strong&gt; let you migrate VM workloads running in high-availability OpenStack environments with minimal risk of service disruption during maintenance. Preloading policies on the target node keeps downtime inside the single-digit-second SLOs regulated workloads require.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Production-grade operations at scale&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Whisker Policy Verdict and UI Improvements&lt;/strong&gt; reveal connectivity blockers in minutes by letting you see the actual tier, policy, and rule that denied a flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Calico Load Balancer – Maintenance Mode&lt;/strong&gt; (Enterprise exclusive) supports graceful node maintenance by excluding backends on nodes marked for maintenance from new Maglev assignments, allowing existing connections to drain naturally. Operators can monitor active connections via Prometheus metrics to determine when it is safe to proceed with node maintenance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What’s new in Calico Open Source v3.32
&lt;/h2&gt;

&lt;p&gt;Two noteworthy new features headline this release: KubeVirt Live Migration and Whisker UI improvements.&lt;/p&gt;

&lt;h3&gt;
  
  
  KubeVirt Live Migration in Bridge Mode
&lt;/h3&gt;

&lt;p&gt;Running VMs in Kubernetes comes with many challenges, among them the need to preserve a VM’s IP during live migration so that network traffic can continue uninterrupted. One way to handle this is with Multus and a bridge CNI, statically configuring the VM’s IP and plumbing it directly into the underlay. That preserves the IP, but the VMs sit outside of Calico, which means no microsegmentation, no observability, and no shared tooling with the pods running alongside these VMs. With Calico v3.32, Calico IPAM assigns persistent IPs to KubeVirt VMs. The IP survives live migration and pod restarts and can be advertised upstream over BGP. VMs share the same Kubernetes-native pod network as containers, with the same CNI, policies, observability, load balancing, QoS, and Layer 7 traffic management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78gpfj5q82jo7im75jj1.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F78gpfj5q82jo7im75jj1.jpg" alt="Live migration in bridge mode ships as a tech preview in OSS v3.32 and moves to production GA in the June release." width="800" height="438"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Live migration in bridge mode ships as a tech preview in Calico Open Source v3.32 and moves to production GA in the June release.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits of KubeVirt Live Migration in Bridge Mode:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Migrate VMs With Live Connections:&lt;/strong&gt; Ensure long-lived TCP sessions such as database queries stay connected across the migration so applications don’t have to reconnect.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep VM Workloads Reachable During Maintenance:&lt;/strong&gt; Live migrate VMs to new nodes without blocking user access to applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor VM Migrations in a Shared Dashboard:&lt;/strong&gt; Track live-migration success rates, duration, and post-move network metrics in the same place you track pod activity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run One Network, Not Two:&lt;/strong&gt; Stop maintaining parallel networking layers. VMs share the CNI, policy framework, and observability stack with your pod workloads.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario: Live Migration That Keeps VMs on the Pod Network
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Situation:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A financial services enterprise is consolidating its virtualization estate onto KubeVirt on Kubernetes. The VM count sits in the six figures across dozens of clusters. Live migration is part of routine operations: VMs move between nodes during patching, capacity rebalancing, and host failures. The current workaround is Multus and a bridge CNI plumbed into the underlay, which keeps the IP through the move but leaves the VMs outside Calico’s pod network. The platform team would like to implement microsegmentation and observability for VMs as they do for containerized applications.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The Calico Solution:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Calico IPAM assigns each KubeVirt VM a persistent IP that survives live migration and pod restarts, advertised to the upstream network over BGP. Every VM runs on the same Kubernetes-native pod network as the containers next to it, with the same network policies, observability, load balancing, QoS, and Layer 7 traffic management. When nodes go down for maintenance, VMs move and connections survive. The microsegmentation and observability story stays intact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Whisker Policy Verdict and UI Improvements
&lt;/h3&gt;

&lt;p&gt;Knowing a flow was blocked by policy is a good start to troubleshooting a connection problem. It does not, however, answer the more important question: which policy is responsible, and why? Without knowing the reason a flow is denied, the problem cannot be fixed. Tracing a flow’s journey across multiple policy tiers and rules can be unreliable and time-consuming, potentially prolonging an outage.&lt;/p&gt;

&lt;p&gt;The Whisker updates in v3.32 put the verdict, the matching policy, and the full tier chain right in the flow log view. Drill down into a flow to see all the policies that were invoked. Filter by policy kind, tier, namespace, and policy name to find out which flows the selected policies act on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits of Whisker Verdict Improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;See the Policy Kind, Tier, and Rule Behind Every Verdict:&lt;/strong&gt; Surface the full evaluation chain, not just the allow/deny decision.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter by Verdict or Policy:&lt;/strong&gt; Narrow the flow log view to just denies or filter by kind, tier, namespace and name, or any combination, to see which flows a set of policies affects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Close Policy-Denial Tickets in Minutes:&lt;/strong&gt; Reduce the troubleshooting path from a lengthy and painstaking analysis of policy layers to a thirty-second click into the matching rule.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Let Application Teams Self-Serve:&lt;/strong&gt; Trace your team’s own policy denies without waiting on the platform team.&lt;/li&gt;
&lt;/ul&gt;
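
&lt;p&gt;One way to picture the filtering described above is as a predicate over flow records. The sketch below is illustrative only: the &lt;code&gt;Flow&lt;/code&gt; and &lt;code&gt;PolicyHit&lt;/code&gt; types and their field names are hypothetical and do not represent Whisker’s actual data model or API.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PolicyHit:
    """One policy that evaluated a flow (hypothetical shape)."""
    kind: str        # e.g. "NetworkPolicy" or "GlobalNetworkPolicy"
    tier: str
    namespace: str   # empty string for cluster-scoped policies
    name: str
    action: str      # "Allow", "Deny", or "Pass"


@dataclass(frozen=True)
class Flow:
    """A flow log entry with its final verdict and evaluation chain."""
    source: str
    dest: str
    verdict: str             # final "Allow" or "Deny"
    hits: List[PolicyHit]    # policies evaluated, in tier order


def filter_flows(flows, verdict=None, kind=None, tier=None,
                 namespace=None, name=None):
    """Return flows with the given verdict that a matching policy evaluated.

    Any filter left as None is ignored, so filters can be combined freely.
    """
    wanted = {"kind": kind, "tier": tier, "namespace": namespace, "name": name}
    policy_filter_set = any(value is not None for value in wanted.values())

    def hit_matches(hit):
        return all(value is None or getattr(hit, field) == value
                   for field, value in wanted.items())

    selected = []
    for flow in flows:
        if verdict is not None and flow.verdict != verdict:
            continue
        if policy_filter_set and not any(hit_matches(h) for h in flow.hits):
            continue
        selected.append(flow)
    return selected


def denying_policy(flow) -> Optional[PolicyHit]:
    """Return the first policy hit that denied the flow, if any."""
    return next((h for h in flow.hits if h.action == "Deny"), None)
```

&lt;p&gt;Calling &lt;code&gt;filter_flows(flows, verdict="Deny")&lt;/code&gt; narrows the view to denies, while passing &lt;code&gt;tier&lt;/code&gt; or &lt;code&gt;name&lt;/code&gt; answers the inverse question of which flows a given set of policies touches.&lt;/p&gt;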

&lt;h3&gt;
  
  
  Scenario: The Five-Minute Incident That Used to Take an Hour
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Situation:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A developer on the web-app team opens a ticket: their new service can’t reach the payment service. An on-call platform engineer pulls up Whisker, sees the flow was denied, and starts the usual investigation, checking tiers, scanning policies and cross-referencing rules, while walking the developer through each step. Forty minutes later, they find the issue: the payment tier has a default-deny policy that doesn’t include web-app in its allowed-set.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The Calico Solution:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
With the Whisker verdict view, the platform engineer opens the flow log, filters to denied flows for the web-app service, and clicks the first matching row. The verdict panel immediately shows the tier, policy, and rule that produced the deny, with enough context to describe the fix. The incident is resolved in five minutes, and the ticket closes with a clear remediation path. The platform engineer then stages the fixed policy and, in Whisker, filters by kind, tier, and policy name to check whether any other flows will be affected, averting potential problems.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigera.io%2Fapp%2Fuploads%2F2026%2F06%2FWhats-new-in-Calico-Spring-2026-Release-2.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.tigera.io%2Fapp%2Fuploads%2F2026%2F06%2FWhats-new-in-Calico-Spring-2026-Release-2.gif" alt="Click a denied flow to see the tier, the policy, and the rule that produced the verdict." width="640" height="362"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Click a denied flow to see the tier, the policy, and the rule that produced the verdict.&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  ClusterNetworkPolicy: Cluster-Wide Policy Goes Standard
&lt;/h3&gt;

&lt;p&gt;Calico has had GlobalNetworkPolicy for years: cluster-scoped policy that sits above namespace boundaries and gives platform teams a place to define org-wide guardrails, default-deny baselines, and cross-namespace controls. The Kubernetes SIG-Network ClusterNetworkPolicy spec is the upstream community’s version of the same idea, and Calico Open Source v3.32 implements it.&lt;/p&gt;

&lt;p&gt;While this is more housekeeping than a headline feature, it has two important implications. First, for the Kubernetes community, Calico’s conformant implementation keeps the spec moving and helps cement cluster-wide policy as a first-class part of the standard. Second, for platform teams already running Calico, ClusterNetworkPolicy provides the same cluster-level control surface as GlobalNetworkPolicy, but utilizes the standard upstream API. This means that tooling built around the spec remains reusable and consistent, regardless of the underlying network implementation.&lt;/p&gt;

&lt;p&gt;If you’ve been using GlobalNetworkPolicy in your policy pipelines, you don’t need to do anything; everything keeps working. If you’re starting fresh or building tooling that needs to work across multiple CNIs, ClusterNetworkPolicy is now an option to consider.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits of ClusterNetworkPolicy:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Define Policy Cluster-Wide With the Standard API:&lt;/strong&gt; Use the upstream SIG-Network ClusterNetworkPolicy spec at the cluster level, no vendor-specific CRD required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adopt the Standard Without Re-Learning:&lt;/strong&gt; ClusterNetworkPolicy mirrors GlobalNetworkPolicy in shape and behavior, so platform teams already running Calico’s cluster-scoped policy keep the same mental model and tooling.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Stay Aligned With Where Kubernetes Is Heading:&lt;/strong&gt; Calico’s early implementation moves the SIG-Network ClusterNetworkPolicy spec toward general adoption, cementing cluster-wide policy as a first-class Kubernetes concept.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fthjmtlub8g9r5gru59ml.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fthjmtlub8g9r5gru59ml.jpg" alt="Cluster-wide network policy scope, now in the standard upstream API" width="800" height="395"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Cluster-wide network policy scope, now in the standard upstream API&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  OpenStack Live Migration Improvements
&lt;/h3&gt;

&lt;p&gt;Calico’s route management work in v3.32 closes the gap that’s kept regulated workloads out of OpenStack live migration. By preloading network policies on the target node ahead of a migration, traffic resumes the moment the VM lands instead of waiting for the network to catch up. This solution, which leverages the same route management code that powers KubeVirt Bridge-Mode live migration, addresses the pain of migration for specific industries that measure downtime in single-digit seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits of OpenStack Live Migration Improvements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Migrate Within Your Downtime SLO:&lt;/strong&gt; Complete OpenStack live migrations within the single-digit-second SLOs that regulated workloads require.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Live Migration During Active Hours:&lt;/strong&gt; Run live migration without having to wait for off-hours maintenance windows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario: Migrating a Trading Workload During Market Hours
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Situation:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A regulated financial-data provider runs a trading workload on OpenStack with a single-digit-second downtime SLO for live migrations. Their current KVM live migration routinely stalls long enough to violate it. The platform team has been limited to performing host maintenance during narrow after-hours windows, and some migrations have simply been deferred indefinitely.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The Calico Solution:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
After upgrading to Calico v3.32, the team measures live-migration downtime against their reference workload and finds it consistently within SLO. Host maintenance is now possible during trading hours. Deferred migrations can be scheduled and completed without requiring an after-hours rotation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0hdljemxwn0ij97gbcr.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0hdljemxwn0ij97gbcr.jpg" alt="The node is ready when the VM arrives reducing downtime" width="800" height="426"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The node is ready when the VM arrives, reducing downtime&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Also in this release: Istio Ambient Mode comes to Calico Open Source
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Not new, but new here&lt;/strong&gt;. Calico Enterprise v3.22.1 bundled Istio Ambient Mesh in the Tigera Operator, bringing production-hardened, one hundred percent upstream Istio images with sidecarless mTLS to the Calico stack.&lt;/p&gt;

&lt;p&gt;As of Calico Open Source v3.32, the same capability is available in the open-source edition. If your platform team is running Istio in sidecar mode, or has given up on service mesh because of its complexity and resource usage, Istio’s ambient mode is worth a second look. In ambient mode there are no sidecars to wrangle on every upgrade, no per-pod CPU and memory overhead, and a much smaller surface to patch when the next CVE lands.&lt;/p&gt;

&lt;p&gt;For the full story including architecture, migration path, and a sidecar-tax deep dive, read the &lt;a href="https://www.tigera.io/blog/whats-new-in-calico-winter-2026-release/" rel="noopener noreferrer"&gt;Winter 2026 launch blog post&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s new in Calico Enterprise and Calico Cloud
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;KubeVirt Live Migration in Bridge Mode&lt;/strong&gt;, which is part of Calico Open Source v3.32, is also available in Calico Enterprise, where it arrives as a tech preview in v3.23 EP2. For organizations evaluating KubeVirt as their landing spot for VMs, this is the release that makes Calico a supported production target.&lt;/p&gt;

&lt;p&gt;Beyond KubeVirt, three Platform-exclusive capabilities help you achieve operational maturity at scale, keeping your policy estate clean, unifying management across cluster and non-cluster workloads, and running load-balancer maintenance without customer impact.&lt;/p&gt;

&lt;h3&gt;
  
  
  Last Evaluated Metrics, Now via API (Cloud and Enterprise)
&lt;/h3&gt;

&lt;p&gt;As customers extend microsegmentation across Kubernetes, the policy set grows, sometimes into the thousands for large enterprises. Workloads change, applications change, and the policies that were essential six months ago may not match traffic anymore. Unused policies don’t announce themselves. They lurk, no longer evaluating traffic but still on the books: a security and compliance risk that violates the least-privilege posture you’ve spent years building towards.&lt;/p&gt;

&lt;p&gt;The Winter 2026 release introduced the “Last Evaluated” metric to surface policies and rules that haven’t matched traffic within a configurable window. Spring 2026 adds API access. Platform teams can now query the metric programmatically and feed it into automated cleanup workflows, compliance reports, scheduled alerts, or command line utilities. The same data that supports a PCI DSS v4.1 audit conversation can now flow into a Prometheus alerting rule or a nightly cleanup-candidate report.&lt;/p&gt;

&lt;p&gt;One thing worth being explicit about: the metric tells you whether a policy is evaluating traffic, not whether it should still exist. Customers still make the call about what’s genuinely unused, based on knowledge of the workloads. The API uncovers the candidates. The platform team makes the decision.&lt;/p&gt;
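
&lt;p&gt;As a rough sketch of how a nightly cleanup-candidate report might use this data, the example below filters policies by their last-evaluated time. The record shape (a policy name paired with a timestamp, or &lt;code&gt;None&lt;/code&gt; if never evaluated), the policy names, and the function name are all assumptions made for illustration; this is not the actual API response format.&lt;/p&gt;

```python
from datetime import datetime, timedelta, timezone


def cleanup_candidates(records, now, window_days=90):
    """Return names of policies not evaluated within the window.

    `records` is an iterable of (policy_name, last_evaluated) pairs, where
    last_evaluated is a timezone-aware datetime, or None if the policy has
    never matched traffic. This shape is an assumption for illustration.
    """
    window = timedelta(days=window_days)
    candidates = []
    for name, last_evaluated in records:
        if last_evaluated is None or now - last_evaluated >= window:
            candidates.append(name)
    return sorted(candidates)


if __name__ == "__main__":
    now = datetime(2026, 6, 2, tzinfo=timezone.utc)
    records = [
        ("payments/allow-web-app", now - timedelta(days=3)),
        ("legacy/allow-batch-export", now - timedelta(days=200)),
        ("legacy/allow-old-reporting", None),
    ]
    # These are review candidates only; owners still decide what to delete.
    print(cleanup_candidates(records, now))
```

&lt;p&gt;The function only produces a list for review. Deciding whether a candidate is genuinely unused remains a human call.&lt;/p&gt;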

&lt;p&gt;&lt;strong&gt;Key Benefits of Last Evaluated Metrics:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Automate Policy Hygiene:&lt;/strong&gt; Pipe Last Evaluated data into Prometheus alerts, scheduled reports, or any other workflow you already run.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generate Compliance Evidence on Demand:&lt;/strong&gt; Show auditors that every active rule is in use, the proof PCI DSS v4.1 and similar standards require.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Troubleshoot From the CLI:&lt;/strong&gt; Query last-evaluated state directly via terminal during an incident, no browser required.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Decommission Unused Policies Without Guesswork:&lt;/strong&gt; Confidently clean up unused policies, not only to maintain that least-privilege posture but also to reduce etcd memory pressure and shorten policy-engine evaluation time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario: Pruning a Microsegmentation Estate at Scale
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Situation:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A large financial-services platform team has been running Calico for several years. Their policy set has grown to several thousand policies accumulated from successive microsegmentation projects, decommissioned services, and one-off tickets. A PCI DSS v4.1 audit is approaching, and the auditor wants evidence that every active rule is actually serving a purpose. Manually reviewing several thousand rules isn’t feasible, and the team can’t safely delete what they don’t understand.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The Calico Solution:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The platform team uses the Last Evaluated Metrics API to pull a list of policies and rules that haven’t matched traffic in the last 90 days. They route the output to a CSV, distribute it to the owning teams, and ask each team to confirm or contest each candidate. Within two weeks the policy set is several hundred rules smaller and the auditor gets the evidence trail directly from the metric output, not from a manual investigation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsez97k4u4zgyanu8w8n.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvsez97k4u4zgyanu8w8n.jpg" alt="Automate your least-privileged posture" width="800" height="395"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Automate your least-privileged posture&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Egress Gateway Layer 2 Advertisements
&lt;/h3&gt;

&lt;p&gt;With Egress Gateway Layer 2 Advertisements, Calico Enterprise v3.23 EP2 eliminates the need for cluster-specific egress IP pools and for BGP peering with ToR switches. You can now assign addresses from the host’s subnet to egress gateways, SNAT egress traffic to the gateway’s host node IP, and forward packets using ARP. This means less reliance on coordinating with the network team, more efficient use of routable IP addresses, and simplified firewall rules for reduced operational overhead.&lt;/p&gt;
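
&lt;p&gt;Because egress gateway addresses now come from the host’s subnet, a simple pre-flight check is to confirm that each address you plan to assign actually falls inside that subnet. The following is a minimal sketch using Python’s standard &lt;code&gt;ipaddress&lt;/code&gt; module; the function name and the example addresses are assumptions, and it is not part of Calico.&lt;/p&gt;

```python
import ipaddress


def partition_by_host_subnet(host_subnet, candidate_ips):
    """Split candidate egress gateway IPs into those inside and outside the host subnet."""
    network = ipaddress.ip_network(host_subnet, strict=True)
    inside, outside = [], []
    for raw in candidate_ips:
        address = ipaddress.ip_address(raw)
        # Membership test: is this address within the host's subnet?
        (inside if address in network else outside).append(raw)
    return inside, outside


if __name__ == "__main__":
    # Example values only; substitute your own host subnet and planned IPs.
    ok, rejected = partition_by_host_subnet(
        "10.0.8.0/24", ["10.0.8.200", "10.0.8.201", "10.0.9.5"]
    )
    print("usable:", ok)
    print("outside host subnet:", rejected)
```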

&lt;p&gt;&lt;strong&gt;Key Benefits of Egress Gateway Layer 2 Advertisements:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduce the Need for Coordination with the Network Team:&lt;/strong&gt; Allocate IPs to new egress gateways without extensive intervention by the networking team, significantly increasing deployment velocity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forward Packets Using ARP:&lt;/strong&gt; Decrease operational overhead by doing away with BGP sessions on top-of-rack switches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avoid Depleting Routable IPs in Large Environments:&lt;/strong&gt; Configure a shared set of allow-listed IPs rather than a per-tenant pool, preserving scarce routable IPs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Maintain One Firewall Ruleset:&lt;/strong&gt; Pod egress IPs come from the host’s own subnet, so the firewall team works with the same address space it already maintains for hosts and VMs, making firewall configuration and ongoing maintenance much simpler.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbxdm19ic1mocboi7x7eg.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbxdm19ic1mocboi7x7eg.jpg" alt="Pod egress lives in the same address space your network team already maintains for hosts and VMs" width="799" height="455"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Pod egress lives in the same address space your network team already maintains for hosts and VMs&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario: Cluster Scale-Up Without a Firewall Ticket
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Situation:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A financial services platform team exposes a set of cluster services to external partner systems through a corporate firewall. Pod egress traffic uses IPs from a cluster-managed pool that the network team registers in the firewall ruleset. Every time the platform team scales the cluster, the pool changes, the firewall ruleset needs updating, and a change-control ticket flows between the two teams. They meet monthly to reconcile drift.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The Calico Solution:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Egress Gateway Layer 2 Advertisements moves pod egress identity into the host’s own subnet. Pod traffic now exits the cluster using a uniquely identifiable IP address from the host’s routable subnet, which can be allowed by the network firewall. Cluster scale-ups stop triggering firewall changes. The reconciliation meeting comes off the calendar.&lt;/p&gt;

&lt;h3&gt;
  
  
  Policy Recommendations for VMs and Hosts
&lt;/h3&gt;

&lt;p&gt;Calico’s policy recommendations engine has been a valuable tool in a platform engineer’s arsenal, giving teams a head start authoring policies for Kubernetes pods. Until now, however, they could not take advantage of this productivity boost when it came to hosts running outside a cluster. A new VM or bare-metal workload meant manually combing through flow logs and hand-authoring policies, which, at scale, often became a significant microsegmentation bottleneck. Policy Recommendations for VMs and Hosts extends the policy recommendation engine to non-cluster workloads. As of v3.23 EP2, Calico observes traffic to and from VMs and bare-metal hosts and generates recommended starting policies, just as it does for the workloads running in your cluster. The same review-and-apply process platform engineers use for pods now applies to every workload Calico manages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Benefits of Policy Recommendations for VMs and Hosts:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dispense with Hand-Rolling Policies for VMs and Hosts:&lt;/strong&gt; Calico generates starting points for non-cluster workloads from observed traffic, the same way it does for pods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scale Microsegmentation Across the Whole Estate:&lt;/strong&gt; Bring least-privilege policies to hundreds or thousands of non-cluster workloads without writing each one by hand.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use One Authoring Workflow for Every Workload:&lt;/strong&gt; Work with the same tooling and the same review pattern across pods, VMs, and bare-metal hosts.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario: Microsegmenting a Thousand VMs Without a Thousand Authoring Tasks
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Situation:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
A telco runs Kubernetes workloads for 5G edge services alongside thousands of VMs for legacy signaling systems. The platform team has automated policy recommendations for pods, but every new VM workload comes with a manual policy-authoring task. The team cannot keep pace with the VM side, so default policies on VMs trend toward permissive over time.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The Calico Solution:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
With Policy Recommendations for VMs and Hosts, the team’s existing recommendation workflow now covers VMs and bare-metal workloads. Recommendations come in based on observed traffic. The team reviews and applies them at the same rate they already review and apply pod recommendations. Microsegmentation extends across the entire estate without doubling the authoring workload.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5esj7ph1sz4gwxjxz9oc.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5esj7ph1sz4gwxjxz9oc.jpg" alt="One review-any-apply workflow across pods, VMs and bare-metal hosts" width="799" height="381"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;One review-and-apply workflow across pods, VMs and bare-metal hosts&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Calico Load Balancer – Maintenance Mode (Enterprise Exclusive)
&lt;/h3&gt;

&lt;p&gt;Choosing a software load balancer was already the right call for platform teams who wanted declarative service exposure and consistent-hash session affinity, capabilities Calico Load Balancer has delivered since v3.23 EP1.&lt;/p&gt;

&lt;p&gt;With v3.23 EP2, the call gets easier. The fast, predictable failover that a pair of hardware load balancers in HA handles cleanly is now native to Calico’s software LB and ready to take over from that expensive 2018 LB you thought you had to replace. Calico Load Balancer now supports label-based node exclusion. Setting &lt;code&gt;maglev.tigera.io/exclude=true&lt;/code&gt; on a node tells Calico Load Balancer to stop forwarding new connections to the backends the node hosts while keeping existing sessions flowing until they complete naturally. Prometheus metrics expose per-node active session counts so operators can watch them decline to zero before proceeding with the drain.&lt;/p&gt;
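
&lt;p&gt;The “watch them decline to zero” step can be automated with a small gate in a maintenance script. The sketch below assumes you have already collected a chronological list of per-node active-session counts from Prometheus; how those samples are fetched, and the metric name, are deliberately left out because they are not specified here. The function name and the three-sample default are illustrative choices.&lt;/p&gt;

```python
def safe_to_drain(session_counts, quiet_samples=3):
    """Return True once the most recent samples all report zero sessions.

    `session_counts` is a chronological list of active-session counts for one
    excluded node, sampled at a fixed interval. Requiring several consecutive
    zeros guards against acting on a single momentary dip.
    """
    needed = max(1, quiet_samples)
    recent = session_counts[-needed:]
    # Not enough history yet means we cannot call the node quiet.
    return len(recent) == needed and all(count == 0 for count in recent)


if __name__ == "__main__":
    history = [42, 17, 5, 1, 0, 0, 0]
    if safe_to_drain(history):
        print("No active sessions in recent samples; proceed with the drain.")
    else:
        print("Sessions still active; keep waiting.")
```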

&lt;p&gt;&lt;strong&gt;Key Benefits of Graceful Maglev Session Handling:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Patch Nodes During Business Hours:&lt;/strong&gt; Take nodes out of load-balancer rotation for kernel patches, kubelet upgrades, or hardware work without scheduling around customer traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drain a Node With a Single Label:&lt;/strong&gt; Set &lt;code&gt;maglev.tigera.io/exclude=true&lt;/code&gt; on a node and Calico Load Balancer stops forwarding new connections to its backends, with no custom scripts or out-of-band coordination.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drain Without Forcing Disconnects:&lt;/strong&gt; Active sessions on the excluded node keep flowing until they complete naturally so maintenance doesn’t cut off in-flight work.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Know When It’s Safe to Drain:&lt;/strong&gt; Prometheus metrics expose per-node session counts so operators can watch them decline to zero before proceeding with maintenance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Scenario: Maintenance That Customers Never Notice
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The Situation:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Scheduled maintenance on a node serving live customer traffic has always been a balancing act. Take the node out of rotation too early and customers with in-flight transactions get cut off mid-session. Wait too long and the maintenance window slips. Most teams have either accepted some level of session disruption or built bespoke tooling to coordinate their load balancer’s health checks with the drain workflow.&lt;br&gt;&lt;br&gt;
&lt;strong&gt;The Calico Solution:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
The platform engineer labels the node with &lt;code&gt;maglev.tigera.io/exclude=true&lt;/code&gt;. From that moment, Calico routes new connections to backends elsewhere in the cluster. Existing sessions on the excluded node keep flowing until they complete, so customers with in-flight transactions finish them naturally. The engineer watches per-node session counts in Prometheus, and when the count reaches zero, drains the node. The maintenance happens. The customers don’t notice.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1xw2mzxvjrvotal4k79.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr1xw2mzxvjrvotal4k79.jpg" alt="Same fast, predictable failover as hardware load balancers but Kubernetes native" width="799" height="509"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Same fast, predictable failover as hardware load balancers but Kubernetes native&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Started with Calico Spring 2026
&lt;/h2&gt;

&lt;p&gt;The Spring 2026 release closes some critical Day 2 operations gaps, unifying operations across Kubernetes pods and VMs and collapsing two operational worlds into one network, one policy model, and one observability stack. It removes long-standing operational friction and clears the way for scaling infrastructure securely and efficiently, helping teams take that next step towards Kubernetes operational maturity.&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Environment&lt;/th&gt;
&lt;th&gt;Action Required&lt;/th&gt;
&lt;th&gt;Documentation Link&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Calico Open Source&lt;/td&gt;
&lt;td&gt;Upgrade to Calico v3.32&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.tigera.io/calico/latest/release-notes/" rel="noopener noreferrer"&gt;Calico Open Source release notes&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calico Enterprise&lt;/td&gt;
&lt;td&gt;Upgrade to Enterprise v3.23 EP2&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.tigera.io/calico-enterprise/3.23/getting-started/upgrading/" rel="noopener noreferrer"&gt;Upgrade Calico Enterprise documentation&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Calico Cloud&lt;/td&gt;
&lt;td&gt;Follow instructions to update connected clusters&lt;/td&gt;
&lt;td&gt;&lt;a href="https://docs.tigera.io/calico-cloud/get-started/upgrade-cluster" rel="noopener noreferrer"&gt;Upgrade Calico Cloud instructions&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;To learn more about these new product capabilities and see them in action, &lt;a href="https://www.tigera.io/demo/" rel="noopener noreferrer"&gt;schedule a demo&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/whats-new-in-calico-spring-2026-release/" rel="noopener noreferrer"&gt;What’s new in Calico: Spring 2026 Release&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>companyblog</category>
      <category>technicalblog</category>
      <category>opensource</category>
      <category>products</category>
    </item>
    <item>
      <title>The AI Agent Accountability Gap: Why Network Policies, API Gateways, And RBAC Are Not Enough</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Wed, 27 May 2026 18:45:39 +0000</pubDate>
      <link>https://dev.to/tigeraio/the-ai-agent-accountability-gap-why-network-policies-api-gateways-and-rbac-are-not-enough-49b8</link>
      <guid>https://dev.to/tigeraio/the-ai-agent-accountability-gap-why-network-policies-api-gateways-and-rbac-are-not-enough-49b8</guid>
      <description>&lt;p&gt;In &lt;a href="https://www.tigera.io/blog/the-five-pillars-of-ai-agent-accountability-a-diagnostic-framework-for-engineering-leaders/" rel="noopener noreferrer"&gt;The Five Pillars of AI Agent Accountability: A Diagnostic Framework for Engineering Leaders&lt;/a&gt;, we walked through each pillar of AI agent accountability (traceability, authorization provenance, identity and ownership, policy at scale, and human oversight) and argued that most enterprises today sit at Level 0 or Level 1 of the Accountability Maturity Model.&lt;/p&gt;

&lt;p&gt;The most common reaction we get when we share that framework is some version of: &lt;strong&gt;“We’re already covered. We have network policies. We have an API gateway. We have RBAC.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This article is for that reaction.&lt;/p&gt;

&lt;p&gt;Enterprises aren’t starting from zero. Most have invested in security, networking, and identity infrastructure that works well for traditional workloads. The problem isn’t a lack of tools. It’s that existing tools were &lt;a href="https://www.paloaltonetworks.com/cyberpedia/what-is-agentic-ai-governance" rel="noopener noreferrer"&gt;designed for model outputs, not autonomous actions&lt;/a&gt;. They assume a world where services are deterministic, communication patterns are predictable, and humans make all the decisions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigera.io/learn/guides/ai-agent-security/agentic-ai-security/" rel="noopener noreferrer"&gt;Agentic AI&lt;/a&gt; breaks every one of those assumptions. Here’s where the most common approaches each leave a critical accountability gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  Network policies: the wrong abstraction level
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.tigera.io/learn/guides/kubernetes-security/kubernetes-network-policy/" rel="noopener noreferrer"&gt;Kubernetes Network Policies&lt;/a&gt; are essential for securing any cluster. They restrict which pods can communicate with which other pods at the network level, and they should absolutely be part of your security posture.&lt;/p&gt;

&lt;p&gt;But network policies operate at the wrong abstraction level for agent accountability. They can say &lt;em&gt;“pods in namespace A can reach pods in namespace B.”&lt;/em&gt; They cannot say &lt;em&gt;“Agent A with risk-level=low can only call agents with risk-level=low.”&lt;/em&gt; They have no concept of agent identity, capabilities, or policy attributes.&lt;/p&gt;

&lt;p&gt;More critically, network policies produce &lt;strong&gt;no audit trail&lt;/strong&gt;. When a connection is allowed, there’s no record of &lt;em&gt;why&lt;/em&gt; it was allowed: no policy name, no attribute match, no traceable decision. When your compliance team asks &lt;em&gt;“Was this interaction authorized by policy?”&lt;/em&gt;, a network policy gives you nothing to show them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accountability gap:&lt;/strong&gt; No agent-level authorization, no audit trail, no provenance.&lt;/p&gt;

&lt;h2&gt;
  
  
  API gateways: built for north-south, not agent-to-agent
&lt;/h2&gt;

&lt;p&gt;API gateways (e.g. NGINX, Kong, Envoy, cloud-native gateways) are designed for request routing, rate limiting, and basic authentication. They work well for north-south traffic: external clients accessing internal services.&lt;/p&gt;

&lt;p&gt;But agent-to-agent communication is east-west traffic between internal services, often with complex multi-hop chains. API gateways don’t understand agent identities, don’t evaluate agent-specific policies, and don’t produce agent-aware audit trails that correlate across multiple hops.&lt;/p&gt;

&lt;p&gt;An API gateway can tell you &lt;em&gt;“a request came from IP 10.0.3.47 and was routed to service X.”&lt;/em&gt; It can’t tell you &lt;em&gt;“Agent A (owned by the finance team, risk-level=medium) called Agent B (owned by the compliance team, capability=audit-query) and this was permitted by policy P-2847.”&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That’s the level of detail your compliance team needs. An API gateway will never give it to them.&lt;/p&gt;
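
&lt;p&gt;To make the contrast concrete, here is a minimal sketch of the two kinds of record. The field names, the &lt;code&gt;AuditRecord&lt;/code&gt; type, and the &lt;code&gt;can_answer_auditor&lt;/code&gt; check are all hypothetical and chosen for illustration; only the example values (the IP address, the teams, the attributes, and policy P-2847) come from the text above.&lt;/p&gt;

```python
from dataclasses import asdict, dataclass


@dataclass
class AgentRef:
    name: str
    owner: str        # owning team
    attributes: dict  # e.g. {"risk-level": "medium"}


@dataclass
class AuditRecord:
    caller: AgentRef
    callee: AgentRef
    decision: str     # "allow" or "deny"
    policy_id: str    # the policy that produced the decision


# Fields an auditor needs in order to tie a call to a policy (an assumption).
REQUIRED_FIELDS = ("caller", "callee", "decision", "policy_id")


def can_answer_auditor(entry: dict) -> bool:
    """True when an entry names both agents and the policy behind the decision."""
    return all(entry.get(name) for name in REQUIRED_FIELDS)


def describe(record: AuditRecord) -> str:
    """Render the record as the sentence a compliance reviewer would read."""
    return (
        f"{record.caller.name} (owned by {record.caller.owner}) called "
        f"{record.callee.name} (owned by {record.callee.owner}): "
        f"{record.decision} under policy {record.policy_id}"
    )


# What a gateway access log typically captures.
gateway_entry = {"source_ip": "10.0.3.47", "routed_to": "service-x"}

# What an agent-aware audit record would capture.
agent_record = AuditRecord(
    caller=AgentRef("Agent A", "finance", {"risk-level": "medium"}),
    callee=AgentRef("Agent B", "compliance", {"capability": "audit-query"}),
    decision="allow",
    policy_id="P-2847",
)
```

&lt;p&gt;Here &lt;code&gt;can_answer_auditor(gateway_entry)&lt;/code&gt; is false because the entry carries no agent or policy fields, while &lt;code&gt;can_answer_auditor(asdict(agent_record))&lt;/code&gt; is true.&lt;/p&gt;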

&lt;p&gt;&lt;strong&gt;Accountability gap:&lt;/strong&gt; No agent identity awareness, no policy evaluation, no multi-hop trace correlation.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP and A2A protocols: communication without governance
&lt;/h2&gt;

&lt;p&gt;The Model Context Protocol (MCP) and &lt;a href="https://www.tigera.io/blog/how-ai-agents-communicate-understanding-the-a2a-protocol-for-kubernetes/" rel="noopener noreferrer"&gt;Agent-to-Agent (A2A) protocol&lt;/a&gt; represent major progress in standardizing agent communication. MCP standardizes how agents connect to tools. A2A standardizes how agents coordinate with each other.&lt;/p&gt;

&lt;p&gt;Both are important infrastructure. And both explicitly assume that someone else handles governance.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MCP&lt;/strong&gt; solves the &lt;em&gt;how&lt;/em&gt; of tool access: A consistent protocol for discovering and calling tools. It does not solve the &lt;em&gt;who&lt;/em&gt;: which agents are allowed to access which tools, under what conditions, and with what audit trail.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A2A&lt;/strong&gt; solves the &lt;em&gt;how&lt;/em&gt; of agent coordination: Capability discovery, task delegation, lifecycle tracking. It does not solve the &lt;em&gt;who&lt;/em&gt;: which agents are allowed to delegate to which other agents, or who is accountable when a delegated task goes wrong.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These protocols are necessary but not sufficient. They are the plumbing, not the governance. Using MCP without agent governance is like having HTTP without authentication; the communication works, but anyone can call anything.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accountability gap:&lt;/strong&gt; Protocols handle communication mechanics, not authorization, policy enforcement, or audit.&lt;/p&gt;

&lt;h2&gt;
  
  
  DIY security patterns: four tools, no unified policy layer
&lt;/h2&gt;

&lt;p&gt;The O’Reilly book &lt;a href="https://www.oreilly.com/library/view/generative-ai-on/9781098171919/" rel="noopener noreferrer"&gt;Generative AI on Kubernetes&lt;/a&gt; (2026) documents four patterns for securing MCP communication: token passthrough, service account delegation, OAuth2 token exchange, and mTLS with SPIFFE/SPIRE. Each pattern is sound on its own.&lt;/p&gt;

&lt;p&gt;The problem is that implementing all four creates four disconnected systems with no unified policy layer:&lt;/p&gt;

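&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Pattern&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;What it misses&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Token passthrough&lt;/td&gt;
&lt;td&gt;Propagates user identity through hops&lt;/td&gt;
&lt;td&gt;No agent-level policy evaluation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service accounts&lt;/td&gt;
&lt;td&gt;Authenticates workloads&lt;/td&gt;
&lt;td&gt;Loses user attribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OAuth2 token exchange&lt;/td&gt;
&lt;td&gt;Preserves both identities&lt;/td&gt;
&lt;td&gt;Requires a separate token-exchange service to operate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SPIFFE/SPIRE mTLS&lt;/td&gt;
&lt;td&gt;Cryptographic workload identity&lt;/td&gt;
&lt;td&gt;No knowledge of agent capabilities or team ownership&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;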

&lt;p&gt;None of these patterns produce a correlated audit trail that spans the full agent interaction chain. None evaluate declarative policies based on agent attributes. None provide a dashboard for human oversight of agent communication patterns.&lt;/p&gt;

&lt;p&gt;Building accountability from these primitives is like building a car from raw steel: technically possible, but nobody should have to do it from scratch. We’ve seen platform teams sink six to twelve months of engineering into &lt;a href="https://www.tigera.io/blog/calculating-the-kubernetes-integration-tax-what-your-diy-networking-stack-actually-costs/" rel="noopener noreferrer"&gt;stitching this together&lt;/a&gt;, only to discover they still can’t answer the auditor’s question.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accountability gap:&lt;/strong&gt; Fragmented security, no unified policy layer, no correlated audit, significant engineering investment required.&lt;/p&gt;

&lt;h2&gt;
  
  
  RBAC alone: doesn’t survive agent #101
&lt;/h2&gt;

&lt;p&gt;Role-Based Access Control is the default model for most authorization systems. Assign agents to roles, grant roles permissions, done.&lt;/p&gt;

&lt;p&gt;RBAC works at a small scale. With 10 agents and 3 roles, the matrix is manageable. But RBAC requires explicit enumeration, where every agent must be assigned to a role, and every permission must be granted to a role. When you add agent #101, someone must decide which role it belongs to and update the bindings.&lt;/p&gt;

&lt;p&gt;More fundamentally, RBAC cannot express the nuanced policies that agent governance requires:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“Agents with overlapping capabilities can communicate with each other.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“Low-risk agents can call low-risk agents, but medium-risk agents can call both low and medium.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“Agents on the same team can access that team’s MCP servers.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These attribute-based policies are natural to express in English but impossible to model cleanly in RBAC without an explosion of roles. By agent #200, the role matrix is unmaintainable and new agents start getting deployed without governance. That is exactly the shadow agent problem we covered in our previous blog post, &lt;a href="https://www.tigera.io/blog/the-ai-agent-accountability-crisis-why-governance-isnt-keeping-up-with-deployment/#the-shadow-agent-problem" rel="noopener noreferrer"&gt;The AI Agent Accountability Crisis: Why Governance Isn’t Keeping Up With Deployment&lt;/a&gt;.&lt;/p&gt;
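
&lt;p&gt;The three example policies above can be written as small predicates over agent attributes. This is a minimal sketch, not any product’s policy language: the &lt;code&gt;Agent&lt;/code&gt; and &lt;code&gt;McpServer&lt;/code&gt; types, the attribute names, and the numeric risk ordering are assumptions made for illustration. The point it shows is that a new agent is governed as soon as its attributes are set, with no role binding to update.&lt;/p&gt;

```python
from dataclasses import dataclass

# Risk levels mentioned in the text, ordered from least to most privileged.
RISK_ORDER = {"low": 0, "medium": 1}


@dataclass
class Agent:
    name: str
    team: str
    risk: str
    capabilities: frozenset = frozenset()


@dataclass
class McpServer:
    name: str
    team: str


def share_capability(caller: Agent, callee: Agent) -> bool:
    """Agents with overlapping capabilities can communicate with each other."""
    return not caller.capabilities.isdisjoint(callee.capabilities)


def risk_allows(caller: Agent, callee: Agent) -> bool:
    """Low-risk agents call low-risk agents; medium-risk call low and medium."""
    return RISK_ORDER[caller.risk] >= RISK_ORDER[callee.risk]


def same_team_server(agent: Agent, server: McpServer) -> bool:
    """Agents on a team can access that team's MCP servers."""
    return agent.team == server.team


# One hundred existing agents, then agent #101 arrives with its own attributes.
existing = [
    Agent(f"agent-{i}", "finance", "low", frozenset({"report"}))
    for i in range(1, 101)
]
newcomer = Agent("agent-101", "finance", "medium",
                 frozenset({"report", "audit-query"}))
ledger = McpServer("ledger-mcp", "finance")
```

&lt;p&gt;No rule mentions &lt;code&gt;agent-101&lt;/code&gt; by name, yet every predicate already gives an answer for it. In an RBAC model the same outcome would need a role for each distinct combination of team, risk level, and capability set.&lt;/p&gt;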

&lt;p&gt;&lt;strong&gt;Accountability gap:&lt;/strong&gt; Doesn’t scale, can’t express attribute-based policies, requires manual updates for every new agent.&lt;/p&gt;

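&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Approach&lt;/th&gt;
&lt;th&gt;What it does well&lt;/th&gt;
&lt;th&gt;Accountability gap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Kubernetes Network Policies&lt;/td&gt;
&lt;td&gt;Pod-to-pod isolation&lt;/td&gt;
&lt;td&gt;No agent identity, no audit trail&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API gateways&lt;/td&gt;
&lt;td&gt;North-south request routing&lt;/td&gt;
&lt;td&gt;No east-west, no policy correlation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP / A2A protocols&lt;/td&gt;
&lt;td&gt;Standardize agent communication&lt;/td&gt;
&lt;td&gt;Communication, not governance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DIY security patterns&lt;/td&gt;
&lt;td&gt;Per-pattern soundness&lt;/td&gt;
&lt;td&gt;Four disconnected systems, no unified policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RBAC&lt;/td&gt;
&lt;td&gt;Simple at small scale&lt;/td&gt;
&lt;td&gt;Doesn’t scale to large numbers of agents, no attribute policies&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;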

&lt;h2&gt;
  
  
  The AI agent accountability layer is the missing piece
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Every existing approach covers part of the problem. None of them, alone or stacked together, deliver AI agent accountability. The missing piece is the unified layer above them.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The industry has solved agent &lt;strong&gt;communication&lt;/strong&gt; (MCP, A2A) and agent &lt;strong&gt;infrastructure&lt;/strong&gt; (Kubernetes, GPUs, model serving). What’s missing is the &lt;a href="https://www.tigera.io/blog/your-ai-agents-are-autonomous-but-are-they-accountable/" rel="noopener noreferrer"&gt;accountability layer&lt;/a&gt;, the control plane that answers three questions for every agent interaction:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Who authorized this?&lt;/strong&gt; Traceable to a specific, auditable policy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What policy permitted it?&lt;/strong&gt; With attribute-based evaluation, not hardcoded names.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What’s the full record?&lt;/strong&gt; End-to-end distributed trace with every hop, decision, and outcome.&lt;/li&gt;
&lt;/ol&gt;
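
&lt;p&gt;One way to picture the third question is a decision log in which every hop of a multi-agent chain carries the same trace ID. The sketch below is illustrative only: the &lt;code&gt;HopDecision&lt;/code&gt; fields, the agent names, and the policy IDs are hypothetical, and a real system would more likely rely on an existing distributed tracing format than on a hand-rolled structure.&lt;/p&gt;

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class HopDecision:
    trace_id: str    # shared by every hop in one interaction chain
    hop: int         # position in the chain, starting at 1
    caller: str
    callee: str
    policy_id: str   # which policy produced the decision
    allowed: bool


def chains_by_trace(decisions: Iterable[HopDecision]) -> Dict[str, List[HopDecision]]:
    """Group decisions by trace ID and order each chain by hop number."""
    grouped = defaultdict(list)
    for decision in decisions:
        grouped[decision.trace_id].append(decision)
    return {
        trace_id: sorted(hops, key=lambda h: h.hop)
        for trace_id, hops in grouped.items()
    }


def explain(chain: List[HopDecision]) -> List[str]:
    """Answer 'what policy permitted it?' for every hop in the chain."""
    return [
        f"hop {h.hop}: {h.caller} to {h.callee}, "
        f"{'allowed' if h.allowed else 'denied'} by {h.policy_id}"
        for h in chain
    ]


# Records often arrive out of order and interleaved with other traces.
log = [
    HopDecision("trace-1", 2, "planner", "ledger-tool", "policy-b", True),
    HopDecision("trace-2", 1, "chatbot", "search-tool", "policy-c", False),
    HopDecision("trace-1", 1, "intake", "planner", "policy-a", True),
]
```

&lt;p&gt;Calling &lt;code&gt;explain(chains_by_trace(log)["trace-1"])&lt;/code&gt; returns the two hops of the first interaction in order, each with the policy that decided it. That ordered, per-hop record is what per-tool logs cannot provide on their own.&lt;/p&gt;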

&lt;p&gt;The immaturity of the space is striking. A recent review of 43 AI risk frameworks found that &lt;a href="https://www.ibm.com/think/insights/ethics-governance-agentic-ai" rel="noopener noreferrer"&gt;only two even addressed agent-specific risks&lt;/a&gt;. This is the gap that will determine which enterprises can scale agentic AI responsibly, and which will be forced to cancel projects, face compliance failures, or deal with incidents they can’t investigate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions about AI agent accountability
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Aren’t network policies enough if I’m using a service mesh?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No. A service mesh adds mTLS and routing, but its policy layer still operates on workload identities and namespaces, not agent capabilities, owners, or risk levels. You still can’t produce an audit trail that names which policy permitted a specific agent-to-agent call, or scale that policy without manual updates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I add an authorization layer on top of MCP myself?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
You can, and many teams are trying. The hard part isn’t the policy engine; it’s the audit correlation across multi-hop chains, the dual identity verification (workload + user), the visual oversight surface, and the attribute-based policy model that scales. Stitching those together is a 6–12 month engineering investment that delivers a worse outcome than purpose-built tooling.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about ABAC instead of RBAC?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Attribute-Based Access Control (ABAC) is on the right track; it’s exactly the model AI agent governance needs. But ABAC by itself is a policy language, not a complete platform. You still need agent identity, agent registration, attribute population, audit correlation, and a human oversight surface around it. ABAC is a piece of the answer, not the whole answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does Tigera’s solution replace these tools?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Tigera’s solution complements them. Network policies still secure your cluster. Service meshes still handle mTLS. MCP and A2A still standardize agent communication. Our platform adds the accountability layer above them, the layer that answers who, what, and why for every agent interaction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Network policies, API gateways, MCP/A2A protocols, DIY security patterns, and RBAC each solve a different problem;&lt;/strong&gt;  none of them solves AI agent accountability.&lt;/li&gt;
&lt;li&gt;The missing layer is the accountability layer: the one that ties identity, policy, and audit together across every agent interaction.&lt;/li&gt;
&lt;li&gt;Without that layer, your compliance team has no answer to &lt;em&gt;“which policy permitted this?”&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Building it from primitives is technically possible, but it’s a 6–12 month investment that still leaves gaps.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get the strategic guide for accountable AI agents
&lt;/h2&gt;

&lt;p&gt;If your team is currently trying to assemble accountability from network policies, OAuth2 exchange, SPIFFE, and a homegrown policy engine, then read our guide first.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://info.tigera.io/rs/805-GFH-732/images/Whitepaper_Accountability_for_AI_Agents.pdf" rel="noopener noreferrer"&gt;Accountable AI Agents: A Strategic Guide for AI &amp;amp; Security Leaders Governing Autonomous AI at Scale&lt;/a&gt; covers the full framework: the five pillars, the maturity model, the principles, and the three-step roadmap. No code, no product demos. Just what your leadership team needs to make the build-vs-buy call.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://info.tigera.io/rs/805-GFH-732/images/Whitepaper_Accountability_for_AI_Agents.pdf" rel="noopener noreferrer"&gt;Get the strategic guide for accountable AI agents →&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/the-ai-agent-accountability-gap-why-network-policies-api-gateways-and-rbac-are-not-enough/" rel="noopener noreferrer"&gt;The AI Agent Accountability Gap: Why Network Policies, API Gateways, And RBAC Are Not Enough&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>featuredblog</category>
      <category>technicalblog</category>
      <category>aiagentsecurity</category>
      <category>bestpractices</category>
    </item>
    <item>
      <title>The Case for VM and Container Consolidation in 2026</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Tue, 26 May 2026 18:50:17 +0000</pubDate>
      <link>https://dev.to/tigeraio/the-case-for-vm-and-container-consolidation-in-2026-1fo4</link>
      <guid>https://dev.to/tigeraio/the-case-for-vm-and-container-consolidation-in-2026-1fo4</guid>
      <description>&lt;p&gt;&lt;em&gt;Two platforms, two teams, two procurement relationships, all doing one job. There’s a reason it ended up this way. There isn’t a reason it has to stay this way.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Ask anyone at a typical enterprise why the VM platform and the container platform are separate, and they’ll give you a sensible answer. The VM estate has been there for fifteen years. It runs the workloads the business depends on. &lt;a href="https://www.tigera.io/learn/guides/kubernetes-101/" rel="noopener noreferrer"&gt;Kubernetes&lt;/a&gt; got stood up later, when application teams started building microservices, and giving them their own environment made more sense than retrofitting one onto VMware. Two platforms, two teams, two roadmaps.&lt;/p&gt;

&lt;p&gt;That’s how most enterprises got here.&lt;/p&gt;

&lt;p&gt;The reasoning was sound at the time. The question is whether it still is.&lt;/p&gt;

&lt;p&gt;This is the consolidation question most enterprises haven’t actually revisited, and it’s the one quietly absorbing more of your budget each year.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfbkmr6lkbkfcrn0a4q3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfbkmr6lkbkfcrn0a4q3.png" alt="Figure 1. The current state most enterprises operate today." width="800" height="460"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1. The current state most enterprises operate today.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why VM and container platforms ended up separate
&lt;/h2&gt;

&lt;p&gt;If you operate both platforms, you know the shape of this already. There’s a VMware team: vSphere admins, network engineers who know NSX, storage specialists, plus a separate procurement relationship for the underlying virtualisation stack. Then there’s a Kubernetes team: platform engineers, CNI specialists, GitOps people, a different set of vendor relationships. Each team runs its own upgrade calendar, its own monitoring stack, its own security posture, its own incident process. They share office space at offsites. They don’t share much else.&lt;/p&gt;

&lt;p&gt;Both teams are doing the same job. They keep infrastructure available for the workloads above it. One set of those workloads happens to be virtual machines and the other happens to be containers, which is a real technical distinction, but it isn’t the distinction your operational model was built around. Your operational model was built around the platforms themselves, and the platforms are separate because of when they were stood up.&lt;/p&gt;

&lt;p&gt;Most enterprises don’t re-examine this. The platforms are separate because they always have been. The teams are separate because the platforms are. The procurement is separate because the teams are. Every layer of duplication has a reasonable justification, but the foundational decision underneath all of them, that VMs and containers belong on different infrastructure, is one nobody actually revisits.&lt;/p&gt;

&lt;h2&gt;
  
  
  What KubeVirt changed about running VMs and containers together
&lt;/h2&gt;

&lt;p&gt;The technical answer to this stopped being theoretical a few years ago. &lt;a href="https://www.tigera.io/learn/guides/kubevirt/" rel="noopener noreferrer"&gt;KubeVirt&lt;/a&gt; is a &lt;a href="https://www.cncf.io/projects/kubevirt/" rel="noopener noreferrer"&gt;CNCF project&lt;/a&gt; that lets virtual machines run as native objects on a Kubernetes cluster. It’s in production at NVIDIA, Cloudflare, and ByteDance. This is no longer “an interesting research direction.” It’s the platform pattern that some of the largest, least forgiving infrastructure operators in the world use to run their VMs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkscu192vapdrefxs00un.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkscu192vapdrefxs00un.png" alt="Figure 2. The unified state — same workloads, one operational model." width="800" height="460"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2. The unified state — same workloads, one operational model.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Which means the original reason your platforms are separate doesn’t really hold anymore. You don’t need a VMware-specific stack to host VMs and a Kubernetes-specific stack to host containers. Both can run on Kubernetes. The platform team you already have, the one that operates your container infrastructure, can take on the VMs too, with the same tooling, the same security model, the same upgrade pattern.&lt;/p&gt;

&lt;p&gt;The networking layer is the part most teams underestimate. VMs have to keep their existing IPs, VLANs, and firewall references so the rest of your infrastructure doesn’t break. This is the part Calico was built for. Whether your platform team wants to &lt;a href="https://www.tigera.io/blog/lift-and-shift-vms-to-kubernetes-with-calico-l2-bridge-networks/" rel="noopener noreferrer"&gt;lift and shift VMs onto Kubernetes&lt;/a&gt; with the network they already have, or modernise them onto a more dynamic networking model over time, Calico supports both paths on the same platform. Teams don’t have to commit to one approach up front, and they don’t have to migrate the network and the workload in the same step.&lt;/p&gt;

&lt;p&gt;This isn’t a pitch about throwing out what you have. The migration is real work, and the order in which you do it matters. But consolidating onto one platform is no longer experimental, and that changes the math on staying where you are.&lt;/p&gt;

&lt;h2&gt;
  
  
  What VM and container consolidation means for your roadmap
&lt;/h2&gt;

&lt;p&gt;If you’re a CTO or VP Engineering, the question to ask your platform leads isn’t “should we adopt KubeVirt?” That’s an implementation question. The strategic question is whether running two platforms is still the right operational model, or whether it’s something worth a look now that the alternative is real.&lt;/p&gt;

&lt;p&gt;Running two platforms compounds slowly. Two teams, two upgrade cadences, two vendor relationships, two of everything. The cost keeps accumulating until the next renewal cycle, the next hiring round, or the next hardware refresh forces a decision you’d rather have made on your own terms.&lt;/p&gt;

&lt;p&gt;The first step isn’t the whole programme. Before you can consolidate at scale, you have to migrate one real VM end-to-end without breaking the network it lives on. That’s what your platform team will need to evaluate first, and it’s what our migration guide walks through in detail. It’s written for the engineers who’ll do the work, not for the executive sponsoring it. But if you’re at the point of asking whether the two-platform arrangement is still serving you, it’s the right thing to send their way.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigera.io/lp/ebook-the-complete-guide-to-vm-networking-for-kubernetes/" rel="noopener noreferrer"&gt;Read the VM migration guide&lt;/a&gt; →&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/the-case-for-vm-and-container-consolidation-in-2026/" rel="noopener noreferrer"&gt;The Case for VM and Container Consolidation in 2026&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>featuredblog</category>
      <category>technicalblog</category>
      <category>bestpractices</category>
      <category>vmmigration</category>
    </item>
    <item>
      <title>Kubernetes Operational Maturity: Secure and Resilient Cluster Federation with Cluster Mesh</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Mon, 25 May 2026 19:26:14 +0000</pubDate>
      <link>https://dev.to/tigeraio/kubernetes-operational-maturity-secure-and-resilient-cluster-federation-with-cluster-mesh-3apk</link>
      <guid>https://dev.to/tigeraio/kubernetes-operational-maturity-secure-and-resilient-cluster-federation-with-cluster-mesh-3apk</guid>
      <description>&lt;p&gt;Practically no one runs a single Kubernetes cluster in production these days. Maybe that’s how it started, but data sovereignty requirements, acquisitions, AI initiatives, and the need for edge servers, among other considerations, have pulled most enterprises into multi-cluster territory whether they planned for it or not. Reaching Kubernetes operational maturity, the point at which a fleet of clusters operates as one secure, observable, policy-consistent system, depends entirely on how those clusters are connected. Operating in a &lt;a href="https://www.tigera.io/learn/guides/kubernetes-networking/kubernetes-multi-cluster/" rel="noopener noreferrer"&gt;multi-cluster environment&lt;/a&gt; has become the unspoken standard, one that requires a careful re-evaluation of the network architectures used to link clusters together.&lt;/p&gt;

&lt;p&gt;That re-evaluation rarely happens. Most enterprises connect their clusters with the same networking patterns they were using before Kubernetes existed: load balancers fronting internal services, DNS records published to external zones, and IP-based firewall rules. Those patterns were built for north-south traffic moving in and out of a traditional data center perimeter, not for east-west traffic moving between internal workloads.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running east-west traffic on north-south plumbing
&lt;/h2&gt;

&lt;p&gt;The conventional way to make services in one cluster reachable from another is to expose them externally: a load balancer in front, a DNS name registered in a public zone, and a firewall rule allowing traffic in. This works, but it is not ideal, because clusters are not separate entities making the odd API call to each other. They are part of a web of interconnected services that should be able to communicate securely and with a minimum of friction.&lt;/p&gt;

&lt;p&gt;Exposing these services through external DNS providers, adding hops to send traffic through load balancers, and creating firewall rules to allow that traffic between internal workloads all increase the potential attack surface, introduce latency, and pile more responsibilities onto the network team. Securing traffic between workloads gets harder at every layer. Egress rules end up broad and permissive because there is no per-pod identity to write a tighter rule against. Source IPs are rewritten by SNAT before traffic reaches the destination, so the audit trail that compliance teams depend on cannot reliably identify the originating workload. Each cluster also runs its own set of &lt;a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/" rel="noopener noreferrer"&gt;network policies&lt;/a&gt; with no awareness of the others, leaving gaps wherever those policy sets disagree.&lt;/p&gt;

&lt;p&gt;Visibility suffers in the same way. Each cluster’s observability stack only sees traffic that lives inside it, so the moment a flow crosses a cluster boundary it becomes someone else’s problem. The destination workload sees a connection arriving from a load balancer or a NAT gateway rather than the workload that actually made the call, which means the receiving team can’t tell who is calling their endpoints or whether those endpoints should answer. Tracing a request from a service in one cluster to an endpoint in another means correlating timestamps and partial signals across two or three tools that were never designed to talk to each other. During an incident that gap is the difference between a five-minute fix and a three-hour bridge call. Mean Time To Resolution (MTTR) stretches accordingly.&lt;/p&gt;

&lt;p&gt;It is common to see enterprises with eight to twelve clusters where most internal-trust traffic now traverses external load balancers, public DNS, and inspection points designed for traffic from the open internet. This was probably the only option when that first cluster, with its half dozen trailblazing workloads, was spun up. Now there’s a better way to connect clusters at scale, and it was built for Kubernetes from the start.&lt;/p&gt;

&lt;h2&gt;
  
  
  How cluster mesh rewires multi-cluster networking
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tlooymzxv7q11x433xh.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6tlooymzxv7q11x433xh.jpg" alt="Cluster Mesh changes the way workloads connect" width="800" height="374"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Cluster Mesh changes the way workloads connect&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;A cluster mesh &lt;a href="https://www.tigera.io/learn/guides/kubernetes-security/kubernetes-federation/" rel="noopener noreferrer"&gt;federates Kubernetes clusters&lt;/a&gt; into a single flat overlay network. Pods talk to pods directly across cluster boundaries, services resolve through native Kubernetes DNS rather than an external provider, and traffic is encrypted end-to-end, typically with WireGuard. &lt;a href="https://www.tigera.io/learn/guides/kubernetes-security/kubernetes-network-policy/" rel="noopener noreferrer"&gt;Network policy&lt;/a&gt; is expressed against workload identity, such as namespace, label, or service account, instead of IP addresses that change every time a pod is rescheduled.&lt;/p&gt;

&lt;p&gt;Four important things change at the architecture level. East-west traffic stops leaving the trust boundary, because the overlay terminates inside the cluster nodes. DNS resolution moves back inside Kubernetes, removing the external dependency. Identity replaces IP as the unit of policy enforcement, which means a policy written today is still valid after the workload has moved across nodes, regions, or clusters. And telemetry flows through one fabric across every cluster instead of being assembled after the fact from per-cluster silos.&lt;/p&gt;
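
&lt;p&gt;To make the identity-versus-IP point concrete, here is a minimal Python sketch. It is not any product's policy engine: the &lt;code&gt;Workload&lt;/code&gt; and &lt;code&gt;IdentityRule&lt;/code&gt; types, the pod names, and the IP addresses are all hypothetical, chosen only to show why a rule bound to namespace and labels keeps matching after a reschedule while an IP allow-list does not.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical view of a pod: its identity plus whatever IP it holds right now."""
    name: str
    namespace: str
    labels: frozenset  # frozenset of (key, value) pairs
    ip: str


@dataclass(frozen=True)
class IdentityRule:
    """Allow traffic from workloads in a namespace that carry a set of labels."""
    namespace: str
    required_labels: frozenset

    def allows(self, source: Workload) -> bool:
        return (source.namespace == self.namespace
                and self.required_labels.issubset(source.labels))


def ip_rule_allows(allowed_ips: set, source: Workload) -> bool:
    """An IP allow-list, shown only for contrast."""
    return source.ip in allowed_ips


# The same service before and after it is rescheduled into another cluster.
before = Workload("checkout-7d9", "shop", frozenset({("app", "checkout")}), "10.1.4.17")
after = Workload("checkout-b42", "shop", frozenset({("app", "checkout")}), "10.9.2.33")

rule = IdentityRule("shop", frozenset({("app", "checkout")}))
ip_allow_list = {"10.1.4.17"}

print(rule.allows(before), rule.allows(after))
print(ip_rule_allows(ip_allow_list, before), ip_rule_allows(ip_allow_list, after))
```

&lt;p&gt;The identity rule matches both instances because the namespace and labels moved with the workload; the IP allow-list only matches the first, because the address did not.&lt;/p&gt;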

&lt;p&gt;A cluster mesh stops treating each cluster as a sovereign country with its own borders, customs, and identity papers, and starts treating the fleet as a federation where workloads move freely under shared rules.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cluster mesh means a more secure and resilient architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7co4twpd0gd5z1pu3ut2.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7co4twpd0gd5z1pu3ut2.jpg" alt="Workloads connect across clusters in a Kubernetes native way" width="800" height="387"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Workloads connect across clusters in a Kubernetes native way&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;By treating a group of connected clusters as members of one network, cluster mesh shrinks the attack surface by keeping internal services off public DNS where they did not belong in the first place. Policies stay valid as workloads move across nodes, regions, and clusters, because identity rather than IP is what they bind to. Inter-cluster traffic can be encrypted and policies applied uniformly across the entire fleet.&lt;/p&gt;

&lt;p&gt;Pods connect to each other directly and observability stops being a per-cluster silo. Flow logs can now follow a request from the client all the way to the service handling it, even when those two live in different clusters.&lt;/p&gt;

&lt;p&gt;Day-to-day operations become smoother too, since the platform team stops having to file tickets with the networking team every time a new service ships and connecting that service no longer requires a new VIP or a new DNS record.&lt;/p&gt;

&lt;p&gt;In other words, calls between clusters are treated like the east-west traffic they are.&lt;/p&gt;

&lt;p&gt;Even compliance work gets noticeably lighter because the default state of the network already satisfies most of what auditors ask about: encryption in transit, identity attribution, and workload-level audit trails.&lt;/p&gt;

&lt;h2&gt;
  
  
  How mature is your inter-cluster networking?
&lt;/h2&gt;

&lt;p&gt;Here is what each of the four stages looks like in practice, and what each one says about the work that still lies ahead.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Beginner.&lt;/strong&gt; A single cluster, or multiple clusters with no inter-cluster connectivity. Services exposed via external load balancers and manual DNS records.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intermediate.&lt;/strong&gt; VPC peering or transit gateways connect the clusters. External DNS handles service discovery. Some traffic is encrypted, much of it isn’t.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advanced.&lt;/strong&gt; A cluster mesh with overlay networking, native Kubernetes service discovery, WireGuard encryption, and identity-based policies enforced consistently across clusters.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Optimized.&lt;/strong&gt; The cluster mesh is fully GitOps-managed, with unified observability and real-time anomaly detection across the fleet.&lt;/li&gt;
&lt;/ul&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Connectivity&lt;/th&gt;
&lt;th&gt;Service discovery&lt;/th&gt;
&lt;th&gt;Encryption&lt;/th&gt;
&lt;th&gt;Policy &amp;amp; observability&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Beginner&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Single cluster, or multi-cluster with no inter-cluster connectivity&lt;/td&gt;
&lt;td&gt;Manual DNS records, external load balancers&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Per-cluster, no fleet view&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Intermediate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;VPC peering or transit gateways&lt;/td&gt;
&lt;td&gt;External DNS&lt;/td&gt;
&lt;td&gt;Partial&lt;/td&gt;
&lt;td&gt;Per-cluster, inconsistent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Advanced&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Cluster mesh with overlay networking&lt;/td&gt;
&lt;td&gt;Native Kubernetes service discovery&lt;/td&gt;
&lt;td&gt;WireGuard, end-to-end&lt;/td&gt;
&lt;td&gt;Identity-based, consistent across clusters&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Optimized&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GitOps-managed cluster mesh&lt;/td&gt;
&lt;td&gt;Native, fully automated&lt;/td&gt;
&lt;td&gt;End-to-end&lt;/td&gt;
&lt;td&gt;Unified observability, real-time anomaly detection&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;In our experience, most enterprises are at the Intermediate stage for connectivity and at the Beginner stage for the surrounding pillars (egress, &lt;a href="https://www.tigera.io/learn/guides/microsegmentation/microsegmentation-security/" rel="noopener noreferrer"&gt;microsegmentation&lt;/a&gt;, and observability) that compound on top of it. This will likely change as organizations grow into their Kubernetes adoption, progressing step by step towards operational excellence.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI raises the stakes
&lt;/h2&gt;

&lt;p&gt;AI has made proper multi-cluster architecture even more urgent. GPU scarcity by region, data residency requirements for training data, blast-radius isolation between training and inference, and the operational pattern of separating data preparation, training, and inference into purpose-built clusters are pushing teams into multi-cluster topologies whether they planned for it or not. The architecture you bring to that moment determines whether multi-cluster becomes a strength or a liability.&lt;/p&gt;

&lt;p&gt;The full nine-pillar reference architecture, including the egress, microsegmentation, observability, and service mesh pillars that build directly on cluster mesh, is in our ebook, &lt;a href="https://www.tigera.io/lp/ebook-building-resilient-multi-cluster-kubernetes/" rel="noopener noreferrer"&gt;&lt;em&gt;Building Resilient Multi-Cluster Kubernetes&lt;/em&gt;&lt;/a&gt;. If you would rather work through it hands-on, our &lt;a href="https://www.tigera.io/event/from-reference-architecture-to-production-a-hands-on-kubernetes-workshop/" rel="noopener noreferrer"&gt;reference architecture workshop&lt;/a&gt; walks through the first five pillars, the next steps on your operational maturity journey, in a working environment.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;Read the ebook: &lt;a href="https://www.tigera.io/lp/ebook-building-resilient-multi-cluster-kubernetes/" rel="noopener noreferrer"&gt;Building Resilient Multi-Cluster Kubernetes →&lt;/a&gt;&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/kubernetes-operational-maturity-secure-and-resilient-cluster-federation-with-cluster-mesh/" rel="noopener noreferrer"&gt;Kubernetes Operational Maturity: Secure and Resilient Cluster Federation with Cluster Mesh&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>technicalblog</category>
      <category>bestpractices</category>
      <category>unifiedplatform</category>
    </item>
    <item>
      <title>The Five Pillars of AI Agent Accountability: A Diagnostic Framework for Engineering Leaders</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Fri, 22 May 2026 17:51:17 +0000</pubDate>
      <link>https://dev.to/tigeraio/the-five-pillars-of-ai-agent-accountability-a-diagnostic-framework-for-engineering-leaders-34ip</link>
      <guid>https://dev.to/tigeraio/the-five-pillars-of-ai-agent-accountability-a-diagnostic-framework-for-engineering-leaders-34ip</guid>
      <description>&lt;p&gt;You’re in a board meeting. The CISO is presenting on AI risk. The CFO asks a simple question:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;“When that finance agent we deployed last quarter accessed a customer payment record, can we tell who authorized it, what policy permitted it, and produce the full audit trail?”&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The CISO looks at the head of the platform. The head of the platform looks at security. Nobody answers.&lt;/p&gt;

&lt;p&gt;If you can picture that meeting happening at your company, you’re not alone. &lt;a href="https://www.mckinsey.com/capabilities/risk-and-resilience/our-insights/trust-in-the-age-of-agents" rel="noopener noreferrer"&gt;McKinsey&lt;/a&gt; found that &lt;strong&gt;only one-third of organizations have AI agent governance maturity at level 3 or higher&lt;/strong&gt;. The other two-thirds are exactly the silence in that boardroom.&lt;/p&gt;

&lt;p&gt;This post is the diagnostic framework that closes that gap. It’s part 2 of a five-part series on AI agent accountability, and if you only have time to read one post in the series, read this one. By the end you’ll have a five-question assessment to run with your team this week, and a maturity model to score where you stand today.&lt;/p&gt;

&lt;p&gt;Not all governance equals &lt;a href="https://www.tigera.io/blog/your-ai-agents-are-autonomous-but-are-they-accountable/" rel="noopener noreferrer"&gt;AI agent accountability&lt;/a&gt;. Many enterprises believe they’re covered because they have network policies or an API gateway, but governance without accountability is &lt;strong&gt;security theater&lt;/strong&gt;: it might prevent some bad outcomes, but it can’t prove why good outcomes were permitted, trace what happened when something goes wrong, or satisfy an auditor asking for evidence.&lt;/p&gt;

&lt;p&gt;True AI agent accountability requires five distinct capabilities working together. Miss any one and you have a gap that will surface during your next incident, audit, or regulatory review.&lt;/p&gt;

&lt;h2&gt;
  
  
  What are the five pillars of AI agent accountability?
&lt;/h2&gt;

&lt;p&gt;The five pillars are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Traceability:&lt;/strong&gt; Every agent interaction produces an end-to-end record automatically.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authorization provenance:&lt;/strong&gt; Every permitted action is traceable to a specific, auditable policy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Identity and ownership:&lt;/strong&gt; Every agent has a verified identity and a clear human owner.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy-based governance at scale:&lt;/strong&gt; Declarative, attribute-based policies that don’t break at 100 agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human oversight and intervention:&lt;/strong&gt; Humans can see, review, and override agent behavior in real time.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffia6tmjhemuryp5syqo5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffia6tmjhemuryp5syqo5.png" width="799" height="173"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Each pillar comes with a question you can ask your team. Below, we’ll work through each one, and at the end, a 5-level maturity model and a 5-question assessment to score where you stand today.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 1: Traceability
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;“Can you trace what happened, end to end?”&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When Agent A calls Agent B, which calls Tool C, which accesses Database D, can you reconstruct the entire chain? Not just that it happened, but when, how long each step took, and what the outcome was at each hop?&lt;/p&gt;

&lt;p&gt;Traceability means every agent interaction produces a structured, correlated record automatically. This is distributed tracing applied to agent communication. Each hop in the chain is a span; the full trace tells the complete story of an interaction from trigger to outcome.&lt;/p&gt;
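
&lt;p&gt;As a rough illustration of the span-per-hop idea, the Python sketch below records the Agent A, Agent B, Tool C, Database D chain and rebuilds the path from parent links. The &lt;code&gt;Span&lt;/code&gt; fields, the &lt;code&gt;record_hop&lt;/code&gt; and &lt;code&gt;reconstruct_path&lt;/code&gt; helpers, and the timings are assumptions made for this example, and the sketch only handles a simple linear chain. A production system would more likely rely on an established tracing library.&lt;/p&gt;

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One hop in an agent interaction (hypothetical field set)."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    caller: str
    callee: str
    started_ms: int
    duration_ms: int
    outcome: str


def record_hop(trace, caller, callee, started_ms, duration_ms, outcome):
    """Append a span whose parent is the previous hop, sharing one trace ID."""
    parent_id = trace[-1].span_id if trace else None
    trace_id = trace[0].trace_id if trace else uuid.uuid4().hex
    span = Span(trace_id, uuid.uuid4().hex[:16], parent_id,
                caller, callee, started_ms, duration_ms, outcome)
    trace.append(span)
    return span


def reconstruct_path(trace):
    """Walk parent links from the root span to rebuild the chain in order."""
    by_parent = {span.parent_id: span for span in trace}
    path, current = [], by_parent.get(None)
    while current is not None:
        path.append(f"{current.caller} -> {current.callee} "
                    f"({current.outcome}, {current.duration_ms} ms)")
        current = by_parent.get(current.span_id)
    return path


trace = []
record_hop(trace, "agent-a", "agent-b", 0, 120, "ok")
record_hop(trace, "agent-b", "tool-c", 10, 90, "ok")
record_hop(trace, "tool-c", "database-d", 20, 40, "ok")

path = reconstruct_path(trace)
for line in path:
    print(line)
```

&lt;p&gt;Because every span carries the same trace ID and a pointer to its parent, the full story of the interaction can be reassembled from the records alone, without correlating timestamps by hand.&lt;/p&gt;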

&lt;p&gt;Without traceability, incident response is guesswork. You know something went wrong, but you can’t determine the chain of events that led there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The test:&lt;/strong&gt; Can your team pull up a single interaction and see the full path it took across every agent and tool in your network, with timestamps and outcomes at every hop?&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 2: Authorization provenance
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;“Can you prove why it was permitted?”&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Blocking unauthorized actions is table stakes. The harder (and more important) question is, can you prove why authorized actions were permitted?&lt;/p&gt;

&lt;p&gt;Authorization provenance means every allowed interaction is traceable to a specific, auditable policy. Not just “Agent A was allowed to call Agent B,” but “Agent A was allowed to call Agent B because Policy X grants agents with capability Y access to agents with risk-level Z.”&lt;/p&gt;
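
&lt;p&gt;One way to picture this is an authorization function that returns more than a yes or no. The Python sketch below is illustrative only: the &lt;code&gt;Policy&lt;/code&gt; and &lt;code&gt;Decision&lt;/code&gt; types, the attribute names, and the &lt;code&gt;policy-x&lt;/code&gt; identifier are hypothetical stand-ins for “Policy X”, “capability Y”, and “risk-level Z”.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policy:
    """Hypothetical attribute-based policy: who may call whom."""
    policy_id: str
    caller_attrs: dict  # attributes the caller must have
    callee_attrs: dict  # attributes the callee must have


@dataclass
class Decision:
    """The verdict plus the evidence an auditor would ask for."""
    allowed: bool
    policy_id: Optional[str]
    matched: dict


def matches(required: dict, actual: dict) -> bool:
    return all(actual.get(key) == value for key, value in required.items())


def authorize(policies, caller: dict, callee: dict) -> Decision:
    """Return the first matching policy along with the attributes that matched."""
    for policy in policies:
        if matches(policy.caller_attrs, caller) and matches(policy.callee_attrs, callee):
            return Decision(True, policy.policy_id,
                            {"caller": policy.caller_attrs, "callee": policy.callee_attrs})
    return Decision(False, None, {})


policies = [Policy("policy-x", {"capability": "payments-read"}, {"risk_level": "low"})]
agent_a = {"name": "agent-a", "capability": "payments-read"}
agent_b = {"name": "agent-b", "risk_level": "low"}

decision = authorize(policies, agent_a, agent_b)
print(decision)
```

&lt;p&gt;Storing the returned &lt;code&gt;Decision&lt;/code&gt; alongside the interaction record is what turns “it wasn’t blocked” into “it was permitted by this policy, for these reasons.”&lt;/p&gt;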

&lt;p&gt;This is the difference between a lock on the door and a sign-in sheet. The lock prevents unauthorized entry. The sign-in sheet proves who was authorized, when, and by what authority.&lt;/p&gt;

&lt;p&gt;Without authorization provenance, your compliance team cannot demonstrate that access was intentional and governed, only that it wasn’t blocked. That distinction is the difference between passing an audit and failing one.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The test:&lt;/strong&gt; For any &lt;a href="https://www.tigera.io/blog/how-ai-agents-communicate-understanding-the-a2a-protocol-for-kubernetes/" rel="noopener noreferrer"&gt;agent-to-agent interaction&lt;/a&gt; in your network, can you identify the specific policy that permitted it and the attributes that triggered that policy?&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 3: Identity and ownership
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;“Who owns this agent, and who is responsible when it acts?”&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Every agent must have two things: a verified identity (it is who it claims to be) and a &lt;a href="https://thehackernews.com/2026/01/who-approved-this-agent-rethinking.html" rel="noopener noreferrer"&gt;clear owner&lt;/a&gt; (a person accountable for its behavior).&lt;/p&gt;

&lt;p&gt;Identity means the governance layer can verify that an agent is genuinely the agent it claims to be, and not a compromised workload masquerading as a legitimate one. This requires cryptographic identity verification, not just a name in a configuration file.&lt;/p&gt;

&lt;p&gt;Ownership means that when an incident occurs, there is a specific person (not a team alias, not a Slack channel, not “the AI team”) who is accountable. Without clear ownership definitions, &lt;a href="https://www.paloaltonetworks.com/cyberpedia/what-is-agentic-ai-governance" rel="noopener noreferrer"&gt;accountability diffuses across components&lt;/a&gt;, and diffused accountability is no accountability at all.&lt;/p&gt;

&lt;p&gt;Agent registration should capture: who registered it, what team owns it, what it’s designed to do, and what permissions it’s been granted.&lt;/p&gt;
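
&lt;p&gt;A registration record along those lines could be sketched as a small Python data class. The field names below are assumptions drawn from the list above rather than any real schema, and the agent, owner, and permission values are made up for the example.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class AgentRegistration:
    """Hypothetical registration record for one agent."""
    agent_id: str
    registered_by: str
    owner: str   # a named person, not a team alias
    team: str
    purpose: str
    permissions: list = field(default_factory=list)

    def missing_fields(self):
        """Names of fields left empty, so gaps are visible at registration time."""
        return [name for name, value in vars(self).items() if value in ("", None, [])]


complete = AgentRegistration(
    agent_id="finance-reconciler",
    registered_by="j.doe",
    owner="a.smith",
    team="finance-platform",
    purpose="Reconcile payment records nightly",
    permissions=["payments:read"],
)
incomplete = AgentRegistration("report-bot", "j.doe", "", "analytics", "Build weekly reports")

print(complete.missing_fields())
print(incomplete.missing_fields())
```

&lt;p&gt;Rejecting registrations with a non-empty &lt;code&gt;missing_fields()&lt;/code&gt; result is one simple way to stop ownerless agents from reaching production.&lt;/p&gt;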

&lt;p&gt;&lt;strong&gt;The test:&lt;/strong&gt; Pick any agent in your network. Can you immediately identify its verified identity, who registered it, which team owns it, and what permissions it has, all without asking around?&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 4: Policy-based governance at scale
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;“Does your security model survive agent #101?”&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;With 10 agents, you can manage permissions by hand. You write explicit rules: “Agent A can call Agent B. Agent C can call Agent D.” You maintain a spreadsheet. It works.&lt;/p&gt;

&lt;p&gt;With 100 agents, it doesn’t. With 1,000, it’s impossible. Every new agent requires updating every relevant policy. Permissions become a tangled web that nobody fully understands. New agents get deployed ungoverned because updating the allow-lists is too slow.&lt;/p&gt;
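
&lt;p&gt;The arithmetic behind that breakdown is easy to check. The Python sketch below assumes the worst case, where every agent could call every other agent and each directed pair needs its own explicit rule, and compares it with rules written over a small set of attribute values. The choice of three attribute values, for example three risk levels, is an assumption for illustration.&lt;/p&gt;

```python
def max_explicit_rules(agent_count: int) -> int:
    """Directed caller-to-callee pairs: each agent may call every other agent."""
    return agent_count * (agent_count - 1)


def max_attribute_rules(attribute_values: int) -> int:
    """Pairs of attribute values (for example risk levels), including same-value pairs."""
    return attribute_values * attribute_values


for agents in (10, 100, 1000):
    print(agents, max_explicit_rules(agents), max_attribute_rules(3))
```

&lt;p&gt;At 10 agents the explicit approach tops out at 90 pairs, at 100 it is 9,900, and at 1,000 it is 999,000, while the attribute-based count stays at 9 no matter how many agents register.&lt;/p&gt;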

&lt;p&gt;Scalable governance requires &lt;strong&gt;declarative, attribute-based policies&lt;/strong&gt;. Instead of naming specific agents, policies reference agent attributes: capabilities, risk levels, teams, environments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;“Low-risk agents can communicate with low-risk agents.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“Agents on the finance team can access finance MCP servers.”&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;“Agents in production can only call production-grade tools.”&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When a new agent registers with matching attributes, it’s governed from day one — automatically. No policy updates required. No spreadsheet to maintain. The governance scales with the agent network, not against it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The test:&lt;/strong&gt; When your team deploys a new agent next week, will it be governed by existing policies automatically, or will someone need to manually update an allow-list?&lt;/p&gt;

&lt;h3&gt;
  
  
  Pillar 5: Human oversight and intervention
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;&lt;strong&gt;“Can a human review, approve, or override?”&lt;/strong&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://thefuturesociety.org/aiagentsintheeu/" rel="noopener noreferrer"&gt;EU AI Act&lt;/a&gt; (Article 14) requires effective human oversight of high-risk AI systems. But human oversight doesn’t mean a human approves every agent action; that would eliminate the value of agents entirely.&lt;/p&gt;

&lt;p&gt;Effective human oversight means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Visibility:&lt;/strong&gt; Humans can see what agents are doing, which agents are communicating, and what policies govern them.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Review:&lt;/strong&gt; Humans can examine agent interactions after the fact, with enough context to understand what happened and why.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intervention:&lt;/strong&gt; Humans can modify policies, revoke agent access, or halt agent communication in real time when necessary.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard, not log file:&lt;/strong&gt; The oversight interface should be a visual dashboard with communication graphs and policy visualization, not a grep command on a log file.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The test:&lt;/strong&gt; Right now, can someone on your team open a dashboard, see which agents are communicating with which, and modify the policies governing that communication — all without touching a terminal?&lt;/p&gt;

&lt;h2&gt;
  
  
  How to assess your AI agent accountability maturity
&lt;/h2&gt;

&lt;p&gt;Run this five-question assessment with your platform lead, security lead, and one compliance representative in a 30-minute meeting. For each question, you have three possible answers: &lt;em&gt;&lt;strong&gt;Yes&lt;/strong&gt;&lt;/em&gt; (you’ve got it), &lt;em&gt;&lt;strong&gt;Partial&lt;/strong&gt;&lt;/em&gt; (you can answer for some agents but not all), or &lt;em&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/em&gt; (gap).&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pick the most recent agent-to-agent interaction in your environment.&lt;/strong&gt; Can someone on the call pull up the full trace (every hop, timestamp, and outcome) in under five minutes? (Pillar 1)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;For that same interaction, can you name the specific policy that permitted it&lt;/strong&gt; and the agent attributes that triggered the match? (Pillar 2)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pick one production agent at random.&lt;/strong&gt; Can you produce (from a system, not a wiki) its verified identity, registered owner, team, and granted permissions? (Pillar 3)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Imagine your team deploys a brand-new agent tomorrow.&lt;/strong&gt; Will your existing policies govern it automatically, or will someone need to update an allow-list? (Pillar 4)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open whatever dashboard your team uses to view agent activity.&lt;/strong&gt; Does it show communication graphs and policy state visually, or are you grep-ing a log file? (Pillar 5)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Count your answers. &lt;strong&gt;Five Yes&lt;/strong&gt; = Level 4. &lt;strong&gt;Mostly Yes, occasional Partial&lt;/strong&gt; = Level 3. &lt;strong&gt;Yes on identity but No on policy enforcement&lt;/strong&gt; = Level 2. &lt;strong&gt;Inventory only, no identity verification&lt;/strong&gt; = Level 1. &lt;strong&gt;Couldn’t run the assessment because you don’t know what agents exist&lt;/strong&gt; = Level 0.&lt;/p&gt;
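
&lt;p&gt;If you want to tally results across several teams, the scoring can be approximated in a few lines of Python. The rubric above is qualitative, so the thresholds in this sketch are our own reading of it and should be treated as assumptions: “mostly Yes, occasional Partial” is taken to mean no No answers and at least three Yes answers, and any other result with a Yes on Pillar 3 (identity) is scored as Level 2. The &lt;code&gt;maturity_level&lt;/code&gt; function name is hypothetical.&lt;/p&gt;

```python
def maturity_level(answers):
    """Map assessment answers to a maturity level (0-4).

    answers maps pillar number (1-5) to "yes", "partial", or "no".
    Pass None when the assessment could not be run because nobody
    knows which agents exist.
    """
    if answers is None:
        return 0
    values = [answers[pillar] for pillar in range(1, 6)]
    if all(value == "yes" for value in values):
        return 4
    # Assumed reading of "mostly Yes, occasional Partial".
    if "no" not in values and values.count("yes") >= 3:
        return 3
    # Assumed reading of "Yes on identity" without reaching Level 3.
    if answers[3] == "yes":
        return 2
    return 1


all_yes = {pillar: "yes" for pillar in range(1, 6)}
one_partial = dict(all_yes)
one_partial[2] = "partial"
identity_only = {1: "no", 2: "no", 3: "yes", 4: "no", 5: "no"}
all_no = {pillar: "no" for pillar in range(1, 6)}

for case in (None, all_no, identity_only, one_partial, all_yes):
    print(maturity_level(case))
```

&lt;p&gt;Adjust the thresholds to match how strictly your own team wants to interpret a Partial answer.&lt;/p&gt;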

&lt;p&gt;If you scored below Level 3, you’re in the McKinsey two-thirds. The good news: you now know exactly which pillar to fix first.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Accountability Maturity Model
&lt;/h2&gt;

&lt;p&gt;The five pillars map to a five-level progression. Use it to track where you are today and where you’re heading.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4cm6osgqs2frjf646kq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi4cm6osgqs2frjf646kq.png" width="800" height="193"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Level&lt;/th&gt;
&lt;th&gt;State&lt;/th&gt;
&lt;th&gt;What you can do&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Level 0: Blind&lt;/td&gt;
&lt;td&gt;No visibility&lt;/td&gt;
&lt;td&gt;You don’t know what agents exist in your network, let alone what they’re doing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level 1: Inventory&lt;/td&gt;
&lt;td&gt;Awareness&lt;/td&gt;
&lt;td&gt;You know what agents exist, but not what they do, who they talk to, or what policies govern them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level 2: Authenticated&lt;/td&gt;
&lt;td&gt;Identity verification&lt;/td&gt;
&lt;td&gt;Your agents have cryptographic identities, but communication is not yet governed by policy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level 3: Controlled&lt;/td&gt;
&lt;td&gt;Policy enforcement&lt;/td&gt;
&lt;td&gt;You have policies governing agent communication, and unauthorized interactions are blocked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Level 4: Accountable&lt;/td&gt;
&lt;td&gt;Full accountability&lt;/td&gt;
&lt;td&gt;You can trace, prove, and audit every agent action, with authorization provenance, identity verification, and human oversight&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Most enterprises today are at &lt;strong&gt;Level 0&lt;/strong&gt; or &lt;strong&gt;Level 1&lt;/strong&gt;. They lack verified identities, policy enforcement, and end-to-end auditability. The goal is Level 4, and the gap between where most organizations are and where they need to be is the &lt;a href="https://www.tigera.io/blog/the-ai-agent-accountability-crisis-why-governance-isnt-keeping-up-with-deployment/" rel="noopener noreferrer"&gt;AI agent accountability crisis&lt;/a&gt; this framework addresses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What is the most important pillar of AI agent accountability?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
All five are required, but authorization provenance is the one most enterprises miss. Plenty of teams can block unauthorized actions; very few can show why an authorized action was permitted, traceable to a specific policy. Without provenance, you have security but not accountability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How is AI agent accountability different from observability?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Observability tells you what happened. Accountability tells you what was permitted, by which policy, and on whose authority. Observability is a prerequisite, but it’s not enough on its own; your trace data needs to be tied to policy decisions and identity claims to count as accountability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How does AI agent accountability relate to AI agent security?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
They’re complementary, not interchangeable. &lt;a href="https://www.tigera.io/learn/guides/ai-agent-security/" rel="noopener noreferrer"&gt;AI agent security&lt;/a&gt; focuses on preventing compromise—stopping prompt injection, blocking unauthorized API access, eliminating shadow agents. Accountability focuses on proving what authorized agents did and why. You need both: security keeps the bad agents out, accountability keeps the good agents honest. The five pillars in this framework assume strong AI agent security is already in place.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I assess my AI agent governance maturity using these pillars?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Yes — that’s exactly what the assessment and maturity model above are for. Walk through each pillar’s “test” with your team. If you can’t answer cleanly on all five, you’re at Level 3 or below, regardless of what tooling you’ve deployed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do I need all five pillars on day one?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
No, but you need a path to all five. A platform that delivers two pillars natively and forces you to bolt on the other three is an accountability gap waiting to surface. We cover what to look for in future articles of this series.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is the difference between Level 3 and Level 4 in the maturity model?&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Level 3 means unauthorized interactions are blocked: you have policy enforcement. Level 4 means you can also prove why every authorized interaction was permitted, with audit evidence tied to a specific policy and identity. Level 3 is security; Level 4 is accountability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;AI agent accountability rests on five pillars: traceability, authorization provenance, identity and ownership, policy at scale, and human oversight.&lt;/li&gt;
&lt;li&gt;Each pillar has a clear test you can run against your environment today.&lt;/li&gt;
&lt;li&gt;The five pillars map to a five-level Accountability Maturity Model — most enterprises are at Level 0 or 1.&lt;/li&gt;
&lt;li&gt;Run the 5-question assessment with your platform, security, and compliance leads to score where you stand.&lt;/li&gt;
&lt;li&gt;Missing any single pillar creates a gap that will surface during your next incident, audit, or regulatory review.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get the strategic guide for accountable AI agents
&lt;/h2&gt;

&lt;p&gt;We wrote a strategic guide for engineering and security leaders that goes deeper into each pillar, including detailed assessment questions, the full maturity model, and a practical roadmap to Level 4.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accountable AI Agents: A Strategic Guide for AI &amp;amp; Security Leaders Governing Autonomous AI at Scale&lt;/strong&gt; — no code, no product demos. Just the framework your leadership team needs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://info.tigera.io/rs/805-GFH-732/images/Whitepaper_Accountability_for_AI_Agents.pdf" rel="noopener noreferrer"&gt;&lt;em&gt;&lt;strong&gt;Get the strategic guide for accountable AI agents →&lt;/strong&gt;&lt;/em&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/the-five-pillars-of-ai-agent-accountability-a-diagnostic-framework-for-engineering-leaders/" rel="noopener noreferrer"&gt;The Five Pillars of AI Agent Accountability: A Diagnostic Framework for Engineering Leaders&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>featuredblog</category>
      <category>technicalblog</category>
      <category>aiagentsecurity</category>
      <category>bestpractices</category>
    </item>
    <item>
      <title>KubeVirt Live Migration Done Right: What it Takes to Run VMs on Kubernetes</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Thu, 14 May 2026 20:53:44 +0000</pubDate>
      <link>https://dev.to/tigeraio/kubevirt-live-migration-done-right-what-it-takes-to-run-vms-on-kubernetes-369i</link>
      <guid>https://dev.to/tigeraio/kubevirt-live-migration-done-right-what-it-takes-to-run-vms-on-kubernetes-369i</guid>
      <description>&lt;p&gt;Running VMs in Kubernetes sounds like a crazy workaround for avoiding vendor lock-in, and standardizing legacy applications and newer containerized workloads on one control plane with one set of security policies to govern them all. It is, however, a rapidly growing pattern, and &lt;a href="https://www.tigera.io/learn/guides/kubevirt/kubevirt-live-migration/" rel="noopener noreferrer"&gt;KubeVirt live migration&lt;/a&gt; — moving running VMs between nodes without downtime — is increasingly central to platform engineering use cases that require full VMs, like on-demand CI/CD pipelines.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.tigera.io/learn/guides/kubevirt/" rel="noopener noreferrer"&gt;KubeVirt&lt;/a&gt; is gaining traction as a way to bring VMs into Kubernetes as first-class workloads, managed with the same tools and primitives that platform teams already use for containers. It has, however, introduced some unique challenges.&lt;/p&gt;

&lt;p&gt;Here’s the uncomfortable truth about that migration: compute and storage are the easy parts. Networking is where migrations stall, roadblocks multiply, and platform teams start questioning whether KubeVirt was the right call in the first place.&lt;/p&gt;

&lt;p&gt;If your VMs have no fixed IP dependencies, no VLAN memberships, and no upstream firewall rules scoped to specific subnets, you can migrate them into Kubernetes without losing sleep over the networking layer. If you’re running hundreds or thousands of VMs with IP addresses hardcoded into application configs, DNS entries, and firewall ACLs — and you need to move those VMs to Kubernetes without rewriting any of it — then your networking layer is about to become the most important decision in your migration.&lt;/p&gt;

&lt;p&gt;What follows is a technical walk-through of the L2 plumbing that keeps KubeVirt VMs connected when they move between nodes in a production cluster and how it eliminates the need to update your complicated network infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Kubernetes Networking Wasn’t Built for VMs
&lt;/h2&gt;

&lt;p&gt;In a traditional hypervisor environment — vSphere, Hyper-V, Nutanix — VMs sit on VLANs and have fixed IPs. Upstream firewalls, load balancers, and DNS records all reference those IPs. A security team owns the VLAN segmentation while the network team owns the routing. This network infrastructure is the accumulated work of many years and forms a static, and somewhat brittle, system of securing hosts and getting traffic to its destination. The &lt;a href="https://www.tigera.io/learn/guides/kubernetes-networking/" rel="noopener noreferrer"&gt;Kubernetes networking&lt;/a&gt; model, with its dynamic allocation of IPs that are meaningful only inside a cluster, is at odds with this traditional approach. Therein lies the problem.&lt;/p&gt;

&lt;p&gt;The upstream network has no direct visibility into the pod network. When a VM is migrated from your existing hypervisor into Kubernetes, its original network segment is not preserved. The VM gets a new IP from the pod CIDR, and every firewall rule, DNS entry, and load balancer config that referenced the old IP is now broken. For a handful of VMs, you can reconfigure your firewall rules and routing manually. For hundreds or thousands, reconfiguration is not only costly in engineering effort but also risks breaking critical functionality and introducing security blind spots.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two Networking Modes, Two Different Problems
&lt;/h2&gt;

&lt;p&gt;Before diving into solutions, it helps to understand how KubeVirt presents networking to VMs. There are two modes for the primary pod interface, and they solve different problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Masquerade mode&lt;/strong&gt; decouples the pod IP from the VM IP. KubeVirt assigns a static IP to the VM internally and uses NAT rules to translate between the two. Live migration works out of the box because the pod IP can change without affecting the VM. The trade-off is that you need a service-level abstraction to reach the VM from outside the pod, which makes this mode impractical for production workloads that need stable, directly-addressable IPs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bridge mode&lt;/strong&gt; is the production-grade option. The pod IP and the VM IP are identical. The VM is directly reachable on the network. No NAT, no service abstraction. But bridge mode introduces a hard problem: when a VM live-migrates to a new node, KubeVirt creates a new pod on the destination. That new pod gets a fresh IP from the CNI. The VM still thinks it has its original IP. The result is a routing mismatch — the network doesn’t know where to send traffic, and the VM’s connections break.&lt;/p&gt;

&lt;p&gt;KubeVirt only handles memory and disk migration. This does not matter much in masquerade mode since the VM’s IP is decoupled from the pod’s IP via NAT but becomes a critical consideration in bridge mode. So the &lt;a href="https://www.tigera.io/learn/guides/kubernetes-networking/kubernetes-cni/" rel="noopener noreferrer"&gt;CNI&lt;/a&gt; has to do three things to ensure nothing breaks: preserve the IP across the pod transition, converge routes so the rest of the network knows the VM has moved, and ensure network policy is in place on the destination before the VM goes live.&lt;/p&gt;

&lt;h2&gt;
  
  
  Live Migration in Bridge Mode: What Happens Under the Hood
&lt;/h2&gt;

&lt;p&gt;VMs need to move between nodes for a variety of reasons, such as maintenance, load balancing, or high availability. What actually happens during a live migration in bridge mode, and why is it so hard to get right?&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpn602akqrbmext24o6av.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpn602akqrbmext24o6av.png" alt="The 5-step network handover during live migration in bridge mode" width="799" height="462"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;The 5-step network handover during live migration in bridge mode&lt;/em&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Core Challenge
&lt;/h3&gt;

&lt;p&gt;When a migration is triggered using the KubeVirt command line utility, &lt;a href="https://kubevirt.io/user-guide/user_workloads/virtctl_client_tool/" rel="noopener noreferrer"&gt;virtctl&lt;/a&gt;, KubeVirt creates a new pod on a destination node chosen by the Kubernetes scheduler in the usual way based on available resources, affinity rules, shared storage, etc. Next, KubeVirt copies the VM’s memory state using libvirt’s pre-copy and post-copy mechanisms.&lt;/p&gt;

&lt;p&gt;Then things get a bit interesting.&lt;/p&gt;

&lt;p&gt;The source pod continues running during the whole process. From a networking perspective, the same IP now needs to exist in two places temporarily — on the source node (where the VM is still running) and on the destination (where it’s about to go live).&lt;/p&gt;

&lt;p&gt;The CNI has to solve three problems simultaneously: IP persistence across pod lifecycles, route convergence during the handover window, and policy continuity so the VM isn’t exposed during migration.&lt;/p&gt;

&lt;p&gt;Let’s look at how Calico makes this happen.&lt;/p&gt;

&lt;h3&gt;
  
  
  IP Persistence: IPAM That Understands VMs
&lt;/h3&gt;

&lt;p&gt;Traditionally, Calico IPAM allocates IPs to pods. The IPAM handle (the ownership ticket for an IP reservation) is derived from the pod’s identity. This works for containers because pods are ephemeral. But a KubeVirt VM is more like a Kubernetes Deployment: you define a VirtualMachine resource, and KubeVirt creates a randomly-named pod to run it. Every time you restart or migrate the VM, the pod changes, but the VM stays the same with the same identity, memory state and the same IP.&lt;/p&gt;

&lt;p&gt;Since IPAM assigns the IP to the pod, every migration means a new IP, which defeats the purpose of preserving the VM’s IP and breaks any firewall rules, load balancer configurations or DNS records pointed at this IP.&lt;/p&gt;

&lt;p&gt;To fix this, Calico constructs the IPAM handle from the VM’s name instead of the pod’s name, ensuring that the reservation persists across pod lifecycles. When a VM migrates and its old pod is destroyed, the IPAM handle survives because it’s tied to the VM identity. When the new pod starts, IPAM finds the existing handle and reuses the same IP. During migration, IPAM transiently tracks dual ownership (an active owner on the source node and an alternate owner on the destination), then converges to a single owner once the source pod is cleaned up.&lt;/p&gt;
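
&lt;p&gt;To make the difference concrete, here is a minimal Python sketch of the idea. It is a toy model, not Calico’s implementation: the &lt;code&gt;ToyIpam&lt;/code&gt; class, the handle string formats, and the pod and VM names are all hypothetical and chosen only for illustration.&lt;/p&gt;

```python
import ipaddress


class ToyIpam:
    """Toy allocator (hypothetical): maps an opaque handle to one reserved IP."""

    def __init__(self, cidr):
        self._free = list(ipaddress.ip_network(cidr).hosts())
        self._by_handle = {}

    def assign(self, handle):
        # Reuse the existing reservation if this handle already owns an IP.
        if handle not in self._by_handle:
            self._by_handle[handle] = self._free.pop(0)
        return self._by_handle[handle]

    def release(self, handle):
        # Return the IP to the back of the pool when its owner goes away.
        ip = self._by_handle.pop(handle, None)
        if ip is not None:
            self._free.append(ip)


def pod_handle(namespace, pod_name):
    # Assumed format: the handle dies with the randomly named pod.
    return f"pod/{namespace}/{pod_name}"


def vm_handle(namespace, vm_name):
    # Assumed format: the handle follows the VM, whichever pod runs it.
    return f"vm/{namespace}/{vm_name}"


# Pod-keyed handles: the migration creates a new pod, so the IP changes.
ipam = ToyIpam("10.0.0.0/29")
old = ipam.assign(pod_handle("default", "virt-launcher-db-abc12"))
ipam.release(pod_handle("default", "virt-launcher-db-abc12"))
new = ipam.assign(pod_handle("default", "virt-launcher-db-xyz89"))
pod_keyed_changed = old != new

# VM-keyed handle: the destination pod asks with the same handle, same IP.
ipam2 = ToyIpam("10.0.0.0/29")
before = ipam2.assign(vm_handle("default", "db"))
after = ipam2.assign(vm_handle("default", "db"))
vm_keyed_stable = before == after
```

&lt;p&gt;In the pod-keyed case the second pod receives a different address, while the VM-keyed lookup returns the original reservation. The dual-ownership tracking described above is deliberately left out of this sketch.&lt;/p&gt;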

&lt;h3&gt;
  
  
  Route Convergence: The GARP Handover
&lt;/h3&gt;

&lt;p&gt;IP persistence ensures the VM keeps its address. Route convergence ensures the rest of the network knows where to find it. Here’s the sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Migration initiated.&lt;/strong&gt; The CNI watches for migration events in the Kubernetes API. As soon as one is created, it starts preparing the destination node’s networking — policies, routes, interface configuration — so that everything is in place before the VM actually moves.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Memory pre-copy.&lt;/strong&gt; KubeVirt and libvirt handle the iterative memory copy. The VM continues running on the source node. Traffic continues routing to the source at standard priority.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;VM goes live on destination.&lt;/strong&gt; The VM broadcasts a &lt;a href="https://www.practicalnetworking.net/series/arp/gratuitous-arp/" rel="noopener noreferrer"&gt;Gratuitous ARP (GARP)&lt;/a&gt; packet announcing “I own this IP now, and I’m on this node.” Felix, Calico’s per-node agent, picks up this GARP and immediately advertises a high-priority route for the VM’s IP via the destination node. The rest of the network then starts steering traffic for the VM’s IP toward the new node, overriding the old route.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Route priority override.&lt;/strong&gt; This is a critical engineering detail. Normal routing uses a standard metric (1024). During migration, the destination node advertises the VM’s route with a lower metric (512), and because lower metrics are preferred, that route takes priority. Because the source pod still exists briefly in a post-life state, both nodes momentarily have routes for the same IP. The higher-priority route ensures all traffic is forwarded to the destination, even before the source pod is fully cleaned up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cleanup and steady state.&lt;/strong&gt; Once the source pod terminates, the high-priority route is replaced with a standard-priority route. The source node’s route is removed. The network converges to its normal state with the VM on its new node at the same IP.&lt;/li&gt;
&lt;/ol&gt;
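
&lt;p&gt;The handover sequence above can be sketched as a tiny route-selection model in Python. The two metric values come from the description above. Everything else is an assumption made for illustration: the &lt;code&gt;Route&lt;/code&gt; type, the node names, the example IP, and the simplification that the lowest metric always wins. This is not Felix’s code.&lt;/p&gt;

```python
from dataclasses import dataclass

STANDARD_METRIC = 1024   # normal routes, per the article
MIGRATION_METRIC = 512   # destination route during handover, per the article
VM_IP = "10.0.0.5"       # hypothetical VM address


@dataclass(frozen=True)
class Route:
    dest_ip: str
    via_node: str
    metric: int


def best_route(routes, dest_ip):
    # Simplified selection: among routes to the same IP, lowest metric wins.
    candidates = [r for r in routes if r.dest_ip == dest_ip]
    return min(candidates, key=lambda r: r.metric)


# Steps 1-2: the VM runs on the source; only the standard route exists.
routes = [Route(VM_IP, "node-a", STANDARD_METRIC)]
during_precopy = best_route(routes, VM_IP).via_node

# Steps 3-4: after the GARP, the destination advertises a preferred route
# while the source route is still present.
routes.append(Route(VM_IP, "node-b", MIGRATION_METRIC))
after_garp = best_route(routes, VM_IP).via_node

# Step 5: the source route is removed and the destination route returns
# to the standard metric.
routes = [Route(VM_IP, "node-b", STANDARD_METRIC)]
steady_state = best_route(routes, VM_IP).via_node
```

&lt;p&gt;The point of the override is visible in the middle step: both routes coexist, yet traffic already resolves to the destination node.&lt;/p&gt;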

&lt;h3&gt;
  
  
  Policy Continuity
&lt;/h3&gt;

&lt;p&gt;The CNI watches for migration events and uses the lead time to pre-program network policies on the destination node while the memory copy is still in progress. By the time the VM cuts over, its security posture is already in place, leaving no gap for unsanctioned traffic to slip through.&lt;/p&gt;

&lt;p&gt;This works because &lt;a href="https://www.tigera.io/learn/guides/kubernetes-security/kubernetes-network-policy/" rel="noopener noreferrer"&gt;Kubernetes network policies&lt;/a&gt; use label selectors, not IP addresses. The policies follow the VM’s identity (its labels, namespace, and network membership), not its physical location. When the VM appears on the destination node with the same labels, the same policies apply automatically. One nuance worth noting: while the policy rules carry over, stateful connection tracking (conntrack) does not currently replicate between nodes. Established connections survive because the routes converge, but the destination node evaluates them as new flows. Full conntrack replication is a planned future enhancement.&lt;/p&gt;
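
&lt;p&gt;A short Python sketch shows why label-based selection is indifferent to node placement. The policy name, labels, and node names are hypothetical, and the matcher only handles exact key-value equality, a small subset of what real selectors support.&lt;/p&gt;

```python
def selects(selector, labels):
    # A workload matches when every selector key has the same value in its labels.
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical policy and workload records, for illustration only.
policy = {"name": "allow-app-to-db", "selector": {"app": "db", "tier": "backend"}}

vm_on_source = {"node": "node-a", "labels": {"app": "db", "tier": "backend"}}
# After migration the VM runs on another node but keeps its labels.
vm_on_destination = {"node": "node-b", "labels": dict(vm_on_source["labels"])}
unrelated_pod = {"node": "node-b", "labels": {"app": "web"}}

applies_on_source = selects(policy["selector"], vm_on_source["labels"])
applies_on_destination = selects(policy["selector"], vm_on_destination["labels"])
applies_to_unrelated = selects(policy["selector"], unrelated_pod["labels"])
```

&lt;p&gt;The node field never enters the match, which is why the same policy applies before and after the move.&lt;/p&gt;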

&lt;h2&gt;
  
  
  Portability and Standardization for VMs
&lt;/h2&gt;

&lt;p&gt;If you’re familiar with vSphere, you know that vMotion, paired with the vSphere Distributed Switch, handles live migration networking seamlessly. However, this transparency relies on a vertically integrated stack that is not portable to other cloud environments.&lt;/p&gt;

&lt;p&gt;In Kubernetes, the stack is disaggregated. Components like KubeVirt (VM lifecycle), CNI (networking), policy engines (security), and storage operators (disks) each manage their own part. For live migration, the CNI must coordinate with KubeVirt’s migration state machine to manage the VM’s temporary dual-existence across two nodes and converge routing without a centralized controller.&lt;/p&gt;

&lt;p&gt;The Kubernetes approach is fundamentally different. It uses open standards: CRI, CNI, CSI, and NetworkPolicy. KubeVirt extends this: VMs are custom resources, managed by kubectl and scheduled by the same control plane. This approach demands a CNI that understands the unique lifecycle, identity, and networking requirements of a pod running a VM, but it also makes VMs portable.&lt;/p&gt;

&lt;p&gt;It also means your containers and VMs can be managed and monitored with the same policies and tools, which brings not only operational efficiency but also better security and more reliable auditing.&lt;/p&gt;

&lt;p&gt;Live migration is one piece of a larger networking story. If your KubeVirt rollout involves bridge mode at scale, multi-cluster topologies, BGP peering, or policy parity across VMs and containers, those decisions compound quickly. We pulled the full picture into &lt;a href="https://www.tigera.io/lp/ebook-the-complete-guide-to-vm-networking-for-kubernetes/" rel="noopener noreferrer"&gt;The Complete Guide to VM Networking for Kubernetes&lt;/a&gt;, a practitioner’s reference covering the architectural choices, networking modes, and operational patterns that determine whether a migration ships or stalls.&lt;/p&gt;

&lt;p&gt;Get &lt;a href="https://www.tigera.io/lp/ebook-the-complete-guide-to-vm-networking-for-kubernetes/" rel="noopener noreferrer"&gt;The Complete Guide to VM Networking for Kubernetes&lt;/a&gt; →&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/kubevirt-live-migration-done-right-what-it-takes-to-run-vms-on-kubernetes/" rel="noopener noreferrer"&gt;KubeVirt Live Migration Done Right: What it Takes to Run VMs on Kubernetes&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>technicalblog</category>
      <category>bestpractices</category>
      <category>vmmigration</category>
    </item>
    <item>
      <title>The AI Agent Accountability Crisis: Why Governance Isn’t Keeping Up With Deployment</title>
      <dc:creator>Alister Baroi</dc:creator>
      <pubDate>Thu, 14 May 2026 18:08:21 +0000</pubDate>
      <link>https://dev.to/tigeraio/the-ai-agent-accountability-crisis-why-governance-isnt-keeping-up-with-deployment-5cl0</link>
      <guid>https://dev.to/tigeraio/the-ai-agent-accountability-crisis-why-governance-isnt-keeping-up-with-deployment-5cl0</guid>
      <description>&lt;p&gt;Every enterprise is building AI agents. Marketing has one summarizing campaign performance. Engineering has one triaging incidents. Customer support has one resolving tickets. Finance has one processing invoices. Each was built by a different team, using a different framework, with different assumptions about security.&lt;/p&gt;

&lt;p&gt;Now those agents are talking to each other &lt;a href="https://www.tigera.io/blog/how-ai-agents-communicate-understanding-the-a2a-protocol-for-kubernetes/" rel="noopener noreferrer"&gt;through agent-to-agent (A2A) communication&lt;/a&gt;. The incident-triage agent calls the customer-support agent to check affected accounts. The invoice agent calls an external payment API. The marketing agent queries a data warehouse with customer records.&lt;/p&gt;

&lt;p&gt;When something goes wrong (and at this scale of deployment, it will), can you answer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Who authorized the action?&lt;/li&gt;
&lt;li&gt;What policy permitted it?&lt;/li&gt;
&lt;li&gt;What was the full chain of events?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you can’t, you have an accountability gap.&lt;/p&gt;

&lt;p&gt;This is part one of a five-part series on AI agent accountability for engineering and security leaders. We’ll work through the gap between agent deployment and governance, the diagnostic framework that exposes it, why your existing tools won’t close it, and the principles you’ll need to evaluate any solution that claims it can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is AI agent accountability?
&lt;/h2&gt;

&lt;p&gt;AI agent accountability is the ability to trace, prove, and audit every action an AI agent takes. This includes which policy permitted the action, which identity initiated it, and what the downstream effects were. It’s the layer above agent communication (MCP, A2A) and agent infrastructure (Kubernetes, GPUs, model serving) that answers the question: &lt;strong&gt;&lt;em&gt;who’s responsible when the agent acts?&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff56qeqh952pqywk8hden.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff56qeqh952pqywk8hden.png" width="800" height="398"&gt;&lt;/a&gt;&lt;br&gt;&lt;br&gt;
A landmark &lt;a href="https://fortune.com/2026/03/26/ai-agents-accountability-accenture-wharton-report/" rel="noopener noreferrer"&gt;2026 report from Accenture and the Wharton School of Business&lt;/a&gt; put the gap bluntly: “&lt;strong&gt;Intelligence may be scalable, but accountability is not.&lt;/strong&gt;” As enterprises race to deploy agents across every function, the governance architecture has not kept pace.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agents are scaling faster than governance
&lt;/h2&gt;

&lt;p&gt;The scale of the problem is not theoretical anymore. Major analyst firms have quantified it:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Source&lt;/th&gt;
&lt;th&gt;Finding&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;McKinsey, 2026&lt;/td&gt;
&lt;td&gt;80% of organizations have encountered risky behavior from AI agents, actions that were unintended, unauthorized, or outside acceptable guardrails.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;McKinsey, 2026&lt;/td&gt;
&lt;td&gt;Only one-third (~33%) of organizations report governance maturity.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gartner, 2025&lt;/td&gt;
&lt;td&gt;Over 40% of agentic AI projects will be canceled by the end of 2027 due to escalating costs, unclear value, or inadequate risk controls.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ISACA, 2025&lt;/td&gt;
&lt;td&gt;66% of industry leaders believe formal agent accountability frameworks will become mandatory within the next two years.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dataiku, 2026&lt;/td&gt;
&lt;td&gt;87% of CIOs report AI agents are already embedded in their enterprises, yet 75% lack real-time visibility into agent operations in production.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;These are not edge cases. This is the mainstream enterprise experience with agentic AI in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  Shadow agents: the new AI agent security gap
&lt;/h2&gt;

&lt;p&gt;A decade ago, enterprises faced “&lt;strong&gt;Shadow IT&lt;/strong&gt;”: employees adopting cloud services without IT approval, creating ungoverned sprawl that took years to bring under control. The same pattern is repeating with AI agents, but faster and with higher stakes.&lt;/p&gt;

&lt;p&gt;Low-code platforms have made it easy for almost anyone to create an AI agent. Building agents is now table stakes. Scaling them with governance is the real differentiator.&lt;/p&gt;

&lt;p&gt;Unlike cloud services, agents don’t just store data. They act. They make decisions, call APIs or MCP servers, access databases, and communicate with other agents. An ungoverned cloud service might leak data. &lt;strong&gt;But an ungoverned agent will leak data, take actions on that data, and propagate those actions across other agents in a chain that nobody can trace&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;When an AI agent operates without clear ownership or accountability, productivity gains turn into systemic &lt;a href="https://www.tigera.io/learn/guides/ai-agent-security/" rel="noopener noreferrer"&gt;AI agent security&lt;/a&gt; risks. When something goes wrong, there is no clear owner to take responsibility, remediate, or even understand the full blast radius.&lt;/p&gt;

&lt;h2&gt;
  
  
  The regulatory deadlines
&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://thefuturesociety.org/how-ai-agents-are-governed-under-the-eu-ai-act/" rel="noopener noreferrer"&gt;EU AI Act&lt;/a&gt;‘s main body takes effect in August 2026. For enterprises deploying agentic AI, three articles are particularly relevant:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Article 12&lt;/strong&gt; requires high-risk AI systems to log their actions to ensure accountability and traceability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Article 13&lt;/strong&gt; requires clear and comprehensible information about how AI systems function and make decisions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Article 14&lt;/strong&gt; requires that high-risk systems are subject to effective human oversight, which is especially important for agentic AI, given the challenges of supervising autonomous agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The European Commission may also assess degree of autonomy as a relevant factor when determining whether a system poses unacceptable risks. The more independent your agents are, the higher the regulatory bar.&lt;/p&gt;

&lt;p&gt;The US is not far behind. The &lt;a href="https://leg.colorado.gov/bills/sb24-205" rel="noopener noreferrer"&gt;Colorado AI Act (SB 24-205)&lt;/a&gt;, delayed to &lt;a href="https://www.clarkhill.com/news-events/news/colorados-ai-law-delayed-until-june-2026-what-the-latest-setback-means-for-businesses/" rel="noopener noreferrer"&gt;June 30, 2026&lt;/a&gt;, requires deployers of high-risk AI systems to implement risk management programs, complete impact assessments, disclose to consumers when AI makes consequential decisions, and report algorithmic discrimination to the state attorney general. It applies to any company doing business in Colorado.&lt;br&gt;&lt;br&gt;
Colorado is not an outlier; it’s just the leading edge. &lt;a href="https://iapp.org/resources/article/us-state-ai-governance-legislation-tracker" rel="noopener noreferrer"&gt;California, New York, Utah, and Texas&lt;/a&gt; have also enacted AI governance laws. At the federal level, &lt;a href="https://www.americanactionforum.org/list-of-proposed-ai-bills-table/" rel="noopener noreferrer"&gt;80+ AI governance bills&lt;/a&gt; are under consideration in the current Congress. The &lt;a href="https://www.nist.gov/itl/ai-risk-management-framework" rel="noopener noreferrer"&gt;NIST AI Risk Management Framework&lt;/a&gt; is already the de facto US enterprise standard, even where it isn’t legally required.&lt;/p&gt;

&lt;p&gt;Compliance deadlines on both sides of the Atlantic are weeks away, not months or years.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core tension, and why it’s solvable
&lt;/h2&gt;

&lt;p&gt;Enterprises want agent autonomy. That’s the entire point: agents acting independently to drive efficiency and scale. But they also need accountability: knowing what happened, why it was permitted, and who is responsible.&lt;/p&gt;

&lt;p&gt;These seem to conflict. More autonomy means less control. More control means less autonomy.&lt;/p&gt;

&lt;p&gt;But this is a false dichotomy. As &lt;a href="https://www.paloaltonetworks.com/cyberpedia/what-is-agentic-ai-governance" rel="noopener noreferrer"&gt;Palo Alto Networks&lt;/a&gt; puts it: &lt;em&gt;&lt;strong&gt;autonomy changes how systems operate, it doesn’t change who’s responsible&lt;/strong&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;The same tension existed in microservices a decade ago. Teams wanted independent deployments (autonomy) with reliable service communication (control). The answer wasn’t to choose one over the other. It was to build a governance layer (service meshes, mTLS, observability) that delivered both.&lt;/p&gt;

&lt;p&gt;AI agents need the same evolution. The question isn’t whether to give agents autonomy or accountability. It’s whether you have the governance infrastructure to deliver both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently asked questions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;What is the difference between AI agent accountability and AI agent security?&lt;/strong&gt; Security is about preventing unauthorized actions (blocking the bad). Accountability is about proving why authorized actions were permitted (auditing the good). You need both. A locked door (security) without a sign-in sheet (accountability) leaves your compliance team with nothing to show an auditor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Why is AI agent accountability a 2026 priority?&lt;/strong&gt;  Three forces are converging this year: rapid agent deployment (87% of CIOs report agents already in production), maturing regulatory regimes (EU AI Act in August, Colorado AI Act in June), and the first wave of public agent-related incidents driving boardroom attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Do the EU and US AI Acts apply to my AI agents?&lt;/strong&gt; If your agent is classified as a high-risk AI system under the Acts, then yes. Under the EU AI Act, Articles 12 (logging), 13 (transparency), and 14 (human oversight) all apply directly. Degree of autonomy is one of the factors regulators consider when assessing risk classification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Are network policies and RBAC enough for AI agent governance?&lt;/strong&gt;  No. &lt;a href="https://www.tigera.io/learn/guides/kubernetes-security/kubernetes-network-policy/" rel="noopener noreferrer"&gt;Network policies&lt;/a&gt; operate at the wrong abstraction level (pod-to-pod, not agent-to-agent) and produce no audit trail. RBAC requires explicit enumeration that breaks down past about 100 agents, and can’t express attribute-based policies. We’ll cover this in detail in a later post of the series.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Key takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;80% of organizations have already encountered risky AI agent behavior, but only one-third have governance maturity to match.&lt;/li&gt;
&lt;li&gt;The EU AI Act and Colorado AI Act both take effect in 2026, so accountability requirements are no longer optional; they are mandatory.&lt;/li&gt;
&lt;li&gt;AI agent accountability is the missing layer above agent communication (MCP, A2A) and agent infrastructure (Kubernetes).&lt;/li&gt;
&lt;li&gt;Autonomy and accountability are not in conflict, but you need a governance layer to deliver both.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Get the strategic guide for accountable AI agents&lt;/p&gt;

&lt;p&gt;We wrote our guide, &lt;em&gt;Accountable AI Agents: A Strategic Guide for AI &amp;amp; Security Leaders Governing Autonomous AI at Scale&lt;/em&gt;, to help engineering and security leaders close this gap. No code, no product demos, no fluff. Just the framework your leadership team needs to govern AI agents before the next incident (or the next regulation) forces your hand.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://info.tigera.io/rs/805-GFH-732/images/Whitepaper_Accountability_for_AI_Agents.pdf" rel="noopener noreferrer"&gt;Get the strategic guide for accountable AI agents&lt;/a&gt; →&lt;/p&gt;

&lt;p&gt;The post &lt;a href="https://www.tigera.io/blog/the-ai-agent-accountability-crisis-why-governance-isnt-keeping-up-with-deployment/" rel="noopener noreferrer"&gt;The AI Agent Accountability Crisis: Why Governance Isn’t Keeping Up With Deployment&lt;/a&gt; appeared first on &lt;a href="https://www.tigera.io" rel="noopener noreferrer"&gt;Tigera – Creator of Calico&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>featuredblog</category>
      <category>technicalblog</category>
      <category>aiagentsecurity</category>
      <category>bestpractices</category>
    </item>
  </channel>
</rss>
