Michael "Mike" K. Saleme

Posted on Apr 8

Authenticated, Authorized, and Still Unsafe: The Missing Layer in Agent Security

#ai #security #llm #agents

Most agent security starts with the same two questions:

Who is this agent?
What is it allowed to do?

Those are necessary questions. But they are no longer sufficient.

In testing agent systems, some of the most interesting failures do not come from unauthorized access. They come from agents that are fully authenticated, correctly authorized, and still surprisingly easy to push into unsafe behavior.

The pattern is familiar.

An agent has valid credentials. It has approved tool access. The policy layer says it is allowed to operate. Then a tool returns poisoned output, a trusted context window picks up subtle drift, or a multi-step task gradually reframes what “reasonable” looks like. No auth boundary is broken. No role is obviously violated. But the agent still ends up taking an action it should not take.

That is the gap.

Identity governance governs access.
It does not fully govern judgment.

That missing layer is what I mean by decision governance.

Identity governance is necessary, but it solves the first layer

Identity and access governance helps answer foundational questions:

Is this agent authentic?
Which tools can it access?
Which permissions does it have?
Which policies define its role?

Without that layer, there is no meaningful control plane.

But identity governance mainly answers whether an agent should be allowed to act.
It does not fully answer whether the agent can be trusted to continue acting safely once the interaction becomes adversarial, ambiguous, or manipulative.

That is where current agent security models start to thin out.

A concrete failure mode

Imagine an authenticated agent with legitimate access to internal tools.

It queries a trusted tool for guidance before taking the next step in a workflow. The tool output is not obviously malicious. It looks like a normal operational instruction, but it contains subtle poison: an over-broad assumption, a hidden escalation path, or guidance that reframes the task in a more permissive way.

The agent accepts that output because, from the outside, nothing looks broken.

The identity is valid.
The tool is approved.
The permissions are correct.
The request path is authorized.

And yet the resulting decision is unsafe.

That is not an identity governance failure.
It is a decision governance failure.

The problem is not who the agent is.
The problem is whether its decision process remains trustworthy under pressure.

Authorized does not mean safe

Across agent systems, the important failures increasingly are not simple login or permission failures.
They are authorized agents behaving unsafely under adversarial conditions.

That pressure can come from:

poisoned tool output
context drift over long workflows
gradual capability escalation
prompt injection routed through seemingly trusted surfaces
normalization of deviance across repeated steps
goal corruption hidden inside legitimate-looking tasks

In other words, the system can look governed at the identity layer while still being fragile at the behavioral layer.

What decision governance needs to cover

If identity governance asks whether an agent is allowed to act, decision governance asks whether the resulting behavior can still be trusted.

A practical way to think about decision governance is whether an agent can resist:

poisoned tools - when trusted tools return misleading or manipulative output
context drift - when small shifts in framing accumulate into unsafe behavior
capability escalation - when an agent gradually justifies actions beyond its intended operating scope
normalization of deviance - when repeated borderline behavior becomes treated as normal
unsafe delegation chains - when risk is hidden across multi-step tool use or agent-to-agent handoffs

This is not a replacement for identity governance.
It is the next layer on top of it.

Layer 1: Identity and Access Governance

Controls who the agent is, what it can access, and what authority it has.

Layer 2: Decision Governance

Tests whether the agent continues acting safely, reliably, and policy-consistently when the environment becomes adversarial.

That second layer is where many current agent security programs still feel underbuilt.

What this means in practice

Teams should test whether agents can:

reject poisoned tool output
detect context drift before it compounds
resist gradual privilege or scope expansion
maintain policy alignment over multi-step workflows
fail safely when signals conflict

Why this matters now

This gap was easier to ignore when agents were mostly passive copilots.

It becomes harder to ignore when agents can:

call external tools
orchestrate workflows across systems
trigger transactions
persist across sessions
act semi-autonomously over long horizons
influence regulated or high-impact outcomes

In those environments, control failure is often not about login failure.
It is about decision failure.

Decision failure is often subtle. It can look like:

a legitimate action taken for the wrong reason
an escalation that appears operationally sensible
a boundary crossed gradually instead of all at once
a system drifting into unsafe norms through repetition

That is why verification matters.

From governance claims to governance proof

A lot of the industry conversation today uses familiar enterprise language:

AI risk management
Zero Trust
access control
policy enforcement
guardrails
observability

All of that is useful.

But the harder question is no longer whether those controls are declared.
It is whether they hold when conditions are messy.

That is the shift from governance as architecture to governance as verification.

Identity governance tells you the agent is who it claims to be.
Decision governance asks whether it can still be trusted once tools, context, and incentives start pushing in the wrong direction.

Why I think this deserves its own category

After testing agent systems across protocols and platforms, the recurring pattern is hard to ignore: authorized systems can still be manipulated into brittle or unsafe behavior without any obvious auth-layer violation.

That suggests the industry needs a cleaner way to talk about the problem.

“Decision governance” is my attempt to name that missing layer.
Not as a slogan, but as a practical framing for what needs to be tested.

If your controls cannot tell you whether an agent remains safe under adversarial pressure, then your governance model is incomplete.

Where the open-source work fits

This is the reason I built an open-source harness around this problem.

The goal is not to claim agent safety is solved.
It is to make the gap between authorization and trustworthy behavior more testable.

Not as a generic scanner or a compliance checkbox, but as a way to pressure-test whether declared controls survive real interaction.

Because in agent systems, “authorized” is not the same thing as “safe.”

If you are deploying autonomous or semi-autonomous agents in high-impact environments, that is the shift worth paying attention to.

Identity governance is necessary. Decision governance is what comes next. Verification is how the two connect.

If you want to see the open-source framework behind this work, it is here:
https://github.com/msaleme/red-team-blue-team-agent-fabric

DEV Community