DEV Community

Michael Tuszynski

Originally published at mpt.solutions

Your Platform Team Needs an Agent Policy — Yesterday

On March 3rd, an attacker compromised the Xygeni GitHub Action by poisoning a mutable tag. Every CI runner referencing xygeni/xygeni-action@v5 quietly started executing a reverse shell to a C2 server. The exposure window lasted a week. 137+ repositories were affected.

The root cause wasn't exotic. A GitHub App private key with overly broad permissions got compromised. Combined with a maintainer's personal access token, the attacker could create a PR and move the tag — no human review required.

This is what happens when automated actors run without governance. And it's about to get much worse.

Agents Are a New User Persona

Your platform team already manages identities for developers, service accounts, and CI bots. But AI agents are a fundamentally different category.

A developer reads docs, thinks, and opens a PR. A service account runs a fixed script. An AI agent does something in between — it reasons about what to do, then acts. It might create infrastructure, modify configurations, call APIs, or chain together a dozen tools. The blast radius of a compromised or misconfigured agent is closer to a rogue admin than a broken cron job.

Yet most organizations treat agents like any other service account. Same IAM roles. Same broad permissions. Same lack of runtime monitoring.

The numbers back this up. A 2026 Gravitee report found that 80.9% of technical teams have pushed agents into active testing or production, but only 14.4% went live with full security and IT approval. And here's the kicker: 82% of executives feel confident their existing policies cover unauthorized agent actions, while only 21% have actual visibility into what their agents access, which tools they call, or what data they touch.

That's not a gap. That's a canyon.

What an Agent Policy Actually Looks Like

An agent policy isn't a PDF that legal signs off on. It's a set of enforced constraints that your platform team builds into the golden path. Here's what that means in practice:

Identity and RBAC

Every agent gets a dedicated identity — not a shared service account, not a developer's credentials. Each identity maps to a role with explicitly scoped permissions. If an agent writes Terraform, it gets write access to the specific modules it manages and nothing else.

This sounds obvious. In practice, most teams hand agents the same broad IAM role they use for local development because it's faster to ship.
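As a minimal sketch of what per-agent scoping might look like in code (the identity names and permission pairs are illustrative, not from any real IAM system):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class AgentIdentity:
    """A dedicated identity for one agent -- never shared, never borrowed."""
    agent_id: str
    # Explicit allowlist of (action, resource) pairs this agent may perform.
    permissions: frozenset = field(default_factory=frozenset)

    def can(self, action: str, resource: str) -> bool:
        return (action, resource) in self.permissions

# A Terraform-writing agent scoped to the modules it manages, nothing else.
tf_agent = AgentIdentity(
    agent_id="agent-terraform-networking",
    permissions=frozenset({
        ("write", "modules/vpc"),
        ("write", "modules/subnets"),
        ("plan", "modules/vpc"),
    }),
)

assert tf_agent.can("write", "modules/vpc")
assert not tf_agent.can("write", "modules/iam")   # out of scope
assert not tf_agent.can("delete", "modules/vpc")  # never granted
```

The point isn't the data structure; it's that the permission set is enumerated per agent, so "what can this agent do?" has a precise answer.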

Runtime Boundaries

Static permissions aren't enough. Agents make decisions at runtime, and those decisions need guardrails:

  • Rate limits on API calls and resource creation
  • Allowlists for which tools and endpoints an agent can invoke
  • Cost ceilings per execution (an agent that spins up 50 GPU instances because the prompt was ambiguous is an expensive mistake)
  • Mandatory human-in-the-loop for destructive operations — deleting resources, modifying security groups, pushing to main
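These four guardrails compose into a single policy gate that sits between the agent and its tools. Here's a hedged sketch of that pattern — the tool names, thresholds, and verdict strings are all invented for illustration:

```python
import time
from collections import deque

# Hypothetical set of operations that always require human sign-off.
DESTRUCTIVE = {"delete_resource", "modify_security_group", "push_to_main"}

class RuntimeGuard:
    """Checks each proposed agent action against runtime policy before it
    executes. Thresholds here are illustrative, not recommendations."""

    def __init__(self, allowed_tools, max_calls_per_min=30, cost_ceiling_usd=50.0):
        self.allowed_tools = set(allowed_tools)
        self.max_calls_per_min = max_calls_per_min
        self.cost_ceiling_usd = cost_ceiling_usd
        self.spend = 0.0
        self.calls = deque()  # timestamps of recent calls

    def check(self, tool: str, est_cost_usd: float = 0.0) -> str:
        now = time.monotonic()
        # Drop timestamps older than the rate-limit window.
        while self.calls and now - self.calls[0] > 60:
            self.calls.popleft()
        if tool not in self.allowed_tools:
            return "block"      # not on the allowlist
        if len(self.calls) >= self.max_calls_per_min:
            return "block"      # rate limit exceeded
        if self.spend + est_cost_usd > self.cost_ceiling_usd:
            return "block"      # would blow the cost ceiling
        if tool in DESTRUCTIVE:
            return "escalate"   # human-in-the-loop required
        self.calls.append(now)
        self.spend += est_cost_usd
        return "allow"
```

Every verdict — allow, block, escalate — is itself an auditable event, which feeds directly into the next requirement.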

Audit and Observability

Every agent action should produce a trace. Not just logs — structured traces that capture the reasoning chain, the tools invoked, the data accessed, and the outcome. When something goes wrong (and it will), you need to reconstruct exactly what the agent did and why.
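A trace event along these lines captures all four elements. This is a sketch only — in production you'd emit to a tracing backend (OpenTelemetry or similar) rather than stdout, and the field names are assumptions, not a standard schema:

```python
import json
import time
import uuid

def emit_trace(agent_id, step, tool, inputs, outcome, reasoning):
    """Emit one structured trace event per agent action as a JSON line.
    Field names are illustrative, not a standard schema."""
    event = {
        "trace_id": str(uuid.uuid4()),
        "ts": time.time(),
        "agent_id": agent_id,
        "step": step,            # position in the agent's action chain
        "tool": tool,            # which tool the agent invoked
        "inputs": inputs,        # what data it accessed
        "outcome": outcome,      # what actually happened
        "reasoning": reasoning,  # why the agent chose this action
    }
    print(json.dumps(event))
    return event
```

The reasoning field is the one most teams skip — and the one you'll want most during an incident review.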

The CNCF's 2026 forecast frames this well: the enterprise shift to autonomy will be defined by four control mechanisms — golden paths, guardrails, safety nets, and manual review workflows. All four apply to agents.

Supply Chain Verification

The Xygeni attack was a supply chain attack on an automated actor. Your agent policy needs to cover the agents' own dependencies: pinned versions (not mutable tags), signature verification, and provenance checks for any action or tool an agent consumes. If your CI agent references some-action@v3, you're trusting that the tag hasn't been moved. Pin to a commit SHA instead.
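Checking for mutable references is easy to automate. A rough sketch that flags `uses:` lines in a GitHub Actions workflow not pinned to a full commit SHA (the regex is a simplification and the action names below are made up):

```python
import re

# Matches `uses: owner/action@ref` lines in a GitHub Actions workflow.
USES = re.compile(r"uses:\s*([\w.-]+/[\w.-]+)@(\S+)")
SHA = re.compile(r"^[0-9a-f]{40}$")  # full commit SHA = immutable

def find_mutable_pins(workflow_text: str):
    """Return (action, ref) pairs that are NOT pinned to a commit SHA."""
    return [
        (action, ref)
        for action, ref in USES.findall(workflow_text)
        if not SHA.match(ref)
    ]

workflow = """
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
      - uses: some-org/some-action@8f4b7f84864484a7bf31766abe9204da3cbe65b3
"""
print(find_mutable_pins(workflow))  # only the @v4 reference is flagged
```

Run it as a pre-merge check and mutable tags never reach your pipelines in the first place.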

Start With the Blast Radius

You don't need to boil the ocean. Start by answering one question for every agent in production: what's the worst thing this agent could do with its current permissions?

If the answer makes you uncomfortable, you've found your first policy item.

From there:

  1. Inventory your agents. You can't govern what you can't see. OneTrust, CloudEagle, and similar platforms now offer agent discovery — continuously scanning for AI agents, their ownership, integrations, and data access.

  2. Scope permissions to the task. Apply least-privilege like you would for any identity. An agent that summarizes Jira tickets doesn't need write access to your infrastructure.

  3. Add runtime guardrails before production. Galileo's open-source Agent Control and Palo Alto's agentic governance tools are both worth evaluating. The pattern is the same: intercept agent actions at runtime, check them against policy, and block or escalate violations.

  4. Pin your dependencies. Mutable tags are a liability. Every action, plugin, or tool your agents consume should be pinned to an immutable reference.

  5. Build the audit trail now. Retroactively reconstructing what an agent did is painful. Instrument from day one.
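The inventory in step 1 doesn't need a vendor platform to get started. Even a simple structured record per agent — kept in version control and reviewed regularly — forces the blast-radius question to be answered in writing. A sketch, with entirely hypothetical agents:

```python
from dataclasses import dataclass

@dataclass
class AgentRecord:
    """One inventory entry per agent -- the minimum needed to answer
    'what's the worst thing this agent could do?'"""
    name: str
    owner: str               # team accountable for this agent
    integrations: list       # systems it touches
    data_access: list        # data classes it can read
    worst_case: str          # the honest answer, written down

inventory = [
    AgentRecord(
        name="jira-summarizer",
        owner="platform-team",
        integrations=["jira"],
        data_access=["ticket text"],
        worst_case="leaks ticket contents in a summary",
    ),
    AgentRecord(
        name="terraform-agent",
        owner="infra-team",
        integrations=["github", "aws"],
        data_access=["tf state"],
        worst_case="applies a destructive plan without review",
    ),
]

# Surface agents whose written worst case involves destructive changes.
risky = [a.name for a in inventory if "destructive" in a.worst_case]
print(risky)
```

If filling in `worst_case` for an agent makes someone uncomfortable, that agent goes to the top of the policy backlog.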

This Is a Platform Problem

Some teams try to solve agent governance at the application layer — each team building their own guardrails. That doesn't scale, and it doesn't produce consistent policy enforcement.

This is a platform engineering problem. The same team that builds your internal developer platform, manages your golden paths, and enforces your deployment policies should own agent governance. They have the infrastructure context. They have the policy enforcement mechanisms. And they're already thinking about developer experience, which matters because overly restrictive agent policies that slow teams down will just get bypassed.

The Xygeni attack was a preview. The attack surface for AI agents in CI/CD, infrastructure management, and code generation is growing fast. Your platform team needs an agent policy — not next quarter, not after the first incident. Yesterday.
