In March 2026, a security scanner called Trivy was compromised for less than a day. The stolen credentials cascaded downstream into LiteLLM, a library used by thousands of companies to connect their applications to AI services. Within 40 minutes, attackers harvested credentials from an estimated 500,000 machines across 1,000 SaaS environments.
Mercor, a $10 billion AI recruiting startup that handles contractor data for OpenAI, Anthropic, and Meta, was one of those companies. Meta indefinitely suspended all work with Mercor. A class action lawsuit was filed on behalf of 40,000 affected individuals. Lapsus$ claimed to have exfiltrated 4 terabytes of data.
The attacker group TeamPCP didn't target Mercor directly. They poisoned a dependency that Mercor happened to use. The agents had valid credentials. The tools executed normally. Nothing looked wrong until it was too late.
This is the pattern that keeps repeating. And no one has solved it yet.
The Problem Is Not Detection
Every major AI agent security incident in the past 18 months shares the same characteristic: the breach came through authorized channels.
Meta's internal Sev 1 incident in March 2026 involved an AI agent that posted responses and exposed user data to unauthorized engineers. The agent wasn't hacked. It had permission to act. It just acted in ways nobody expected.
The Salesloft Drift breach in August 2025 compromised 700+ organizations through stolen OAuth tokens from a legitimate SaaS integration. The attacker looked like a trusted connection because they were using one.
The EchoLeak vulnerability in Microsoft 365 Copilot demonstrated zero-click data exfiltration, requiring no user interaction at all. The AI assistant extracted sensitive data from OneDrive, SharePoint, and Teams through approved channels with no visibility at the application or identity layer.
The common thread: sandboxes don't help. Firewalls don't help. Content moderation doesn't help. The agent has legitimate access to tools it is supposed to use. The question is whether any given tool call, at this moment, in this context, should actually be allowed.
That's authorization. And almost nobody is doing it at the tool call level.
What Authorization Actually Means for AI Agents
Traditional authorization answers a simple question: does this user have permission to access this resource?
AI agent authorization has to answer a harder question: does this user, in this session, with this behavioral history, have permission to call this specific tool with these specific parameters, given everything that has happened in this conversation so far?
That's a fundamentally different problem. And it requires more than binary allow/deny decisions.
Consider what happens when a customer support agent receives a request that looks like a GDPR data subject access request. The request is legitimate in form. The user may or may not be who they claim to be. The data they're requesting is sensitive. The right answer isn't "allow" or "deny." The right answer is "defer this to a human who can verify the request" or "allow the query but redact the PII from the output before the model sees it."
These are non-binary authorization decisions. Allow, deny, modify, defer, step-up. Five decision types, not two.
Most agent frameworks today have two modes: the tool is available, or it isn't. There's no middle ground. No "you can call this tool but the output will be filtered." No "this request looks suspicious, pause and ask for additional verification." No "the last three requests in this session show a pattern of escalation, tighten the permissions."
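The five decision types can be sketched as a small vocabulary that a gate returns instead of a boolean. This is an illustrative sketch, not any framework's actual API; the tool names and the risk threshold are made up:

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    """Five authorization outcomes instead of a boolean allow/deny."""
    ALLOW = "allow"      # execute the tool call as requested
    DENY = "deny"        # block the call outright
    MODIFY = "modify"    # execute, but transform the output (e.g. redact PII)
    DEFER = "defer"      # suspend and route to a human reviewer
    STEP_UP = "step_up"  # require additional authentication first


@dataclass
class Verdict:
    decision: Decision
    reason: str
    # Optional transform applied to the tool output when decision is MODIFY.
    output_filter: callable = None


def decide(tool_name: str, params: dict, session_risk: float) -> Verdict:
    """Toy policy: escalate the response as session risk grows."""
    if tool_name == "export_customer_records" and session_risk > 0.7:
        return Verdict(Decision.DEFER, "bulk export during high-risk session")
    if tool_name == "query_users":
        redact = lambda rows: [{k: "***" if k == "email" else v
                                for k, v in r.items()} for r in rows]
        return Verdict(Decision.MODIFY, "redact PII from results", redact)
    return Verdict(Decision.ALLOW, "no elevated risk signals")
```

The point of the richer return type is that "you may call this tool" and "the model may see the raw result" become separate questions.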
Why Guardrails Are Not Authorization
The industry response to AI agent security has been guardrails. Input filters. Output scanners. Content moderation. Prompt injection detection.
Guardrails answer the question: "Is this input/output safe?"
Authorization answers a different question: "Is this action permitted?"
A guardrail can detect that a prompt contains an injection attempt. But what about a request that contains no injection language at all? What about a user with valid admin credentials who is being socially engineered into exfiltrating data through a series of perfectly normal-looking requests?
I ran 222 adversarial attack vectors across 35 categories against an AI agent with full admin credentials. The attacker had valid credentials. Role checks passed on every call. The question was whether anything beneath role-based access control could detect and stop the attacks.
With only role-based permissions (traditional authorization), the pass rate (the share of attacks detected and stopped) was 30.2%. The agent was completely exposed.
Adding adaptive prompt hardening, which adjusts the agent's defensive posture based on behavioral signals in the conversation, brought the pass rate to 57.1%.
Adding non-binary decision types brought it further. When the system could modify tool outputs (redacting PII before the model saw it), defer suspicious requests to human review, and require step-up authentication for elevated risk actions, the pass rate reached 81.3%.
Fixing the remaining gaps, adding signed audit receipts and hash-chained context for tamper detection, brought the final score to 99.5%. 221 out of 222 attacks stopped. The single remaining failure was a social engineering attack using legal citation pressure with zero injection language. That's a model-layer behavior that middleware cannot fix.
The point isn't the specific numbers. The point is the progression: traditional permissions alone fail catastrophically when the attacker has valid credentials. You need infrastructure that watches behavior, not just identity.
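Hash-chained audit receipts are a standard construction: each record commits to the hash of the record before it, so tampering with or reordering any entry breaks every subsequent link. A minimal sketch, with illustrative field names and a hardcoded key for brevity:

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key"  # in practice, a managed secret, never a constant


def append_receipt(chain: list, event: dict) -> dict:
    """Append a signed, hash-chained receipt for one tool-call decision."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    sig = hmac.new(SIGNING_KEY, digest.encode(), hashlib.sha256).hexdigest()
    receipt = {"event": event, "prev": prev_hash, "hash": digest, "sig": sig}
    chain.append(receipt)
    return receipt


def verify_chain(chain: list) -> bool:
    """Recompute every link; a tampered event or reordering breaks the chain."""
    prev_hash = "0" * 64
    for r in chain:
        body = json.dumps({"event": r["event"], "prev": prev_hash}, sort_keys=True)
        if hashlib.sha256(body.encode()).hexdigest() != r["hash"]:
            return False
        expected_sig = hmac.new(SIGNING_KEY, r["hash"].encode(),
                                hashlib.sha256).hexdigest()
        if not hmac.compare_digest(r["sig"], expected_sig):
            return False
        prev_hash = r["hash"]
    return True
```

The value for forensics is that an attacker who compromises the agent cannot quietly rewrite the record of what the agent did without invalidating the chain.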
What's Missing from Current Frameworks
I looked at every major AI agent framework: LangChain, CrewAI, AutoGen, the OpenAI Agents SDK, plus the Model Context Protocol. Here's what they have in common: tool registration is permissive by default. If a tool is registered, the agent can call it. And if the agent can call it, no middleware layer evaluates whether this specific call, at this specific moment, should be allowed.
Some frameworks have human-in-the-loop features. LangChain has Interrupts. CrewAI has Human Input. These are useful for high-risk actions where you always want human approval. But they're static. The decision to require approval is made at design time, not at runtime based on what's actually happening in the conversation.
What's missing is a runtime authorization layer that sits between the agent and its tools and makes per-call decisions based on accumulated session context. A layer that knows this is the fifth database query in three minutes and the last two returned escalating amounts of data. A layer that knows the user started by asking about weather and is now requesting customer records.
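Structurally, that layer is just a wrapper that every tool call must pass through and that the model cannot route around. A minimal sketch, simplified to allow/deny for brevity; the class name, gate signature, and rate rule are all hypothetical:

```python
class ToolCallDenied(Exception):
    pass


class AuthorizingToolProxy:
    """Wraps a tool so every call passes through a per-call gate first.

    `gate` receives the tool name, the arguments, and the accumulated
    session context, and returns True/False. (Illustrative API, not a
    specific framework's.)
    """

    def __init__(self, name, fn, gate, session):
        self.name, self.fn, self.gate, self.session = name, fn, gate, session

    def __call__(self, **kwargs):
        # Record the attempt before deciding, so even denied calls are visible
        # to future authorization decisions in this session.
        self.session.setdefault("calls", []).append((self.name, kwargs))
        if not self.gate(self.name, kwargs, self.session):
            raise ToolCallDenied(f"{self.name} blocked by runtime policy")
        return self.fn(**kwargs)


# Example gate: deny further database queries once the session shows
# rapid repetition, even though each individual call looks fine.
def rate_aware_gate(name, kwargs, session):
    recent = [c for c in session.get("calls", []) if c[0] == name]
    return not (name == "query_db" and len(recent) > 3)
```

Because the gate sees the session history, it can refuse the fourth identical query even though the first three were, individually, unremarkable.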
Microsoft recently released the Agent Governance Toolkit. It's a policy engine that intercepts agent actions and evaluates them against organizational rules. That's governance. It answers "does our policy allow this type of action?" But policy enforcement and permission enforcement are different layers. A perfectly enforced policy doesn't help if the policy itself doesn't account for behavioral context.
The Compromised Admin Problem
The hardest scenario in AI agent security isn't the outsider trying to break in. It's the insider with valid credentials being manipulated through conversation.
I call this the compromised admin scenario. The attacker has:
- Valid credentials that pass authentication
- An admin role that passes authorization checks
- Legitimate access to every tool the agent exposes
Traditional security is useless here. Auth passes. Role checks pass. The firewall sees a valid session. Every tool call looks normal in isolation.
The only defense is behavioral. Does the pattern of requests over the session indicate escalation? Is the user asking for data that doesn't align with their stated purpose? Did the conversation start with small talk and gradually shift to requesting sensitive exports?
This is where trust degradation matters. A system that tracks cumulative risk across a session and never re-escalates trust once it degrades can catch patterns that individual request evaluation misses. If three requests in a row trigger low-level suspicion signals, the fourth request faces a higher bar, even if it looks completely normal on its own.
This isn't theoretical. In my testing, the attacks that bypassed role-based permissions but were caught by behavioral detection were exactly these kinds of multi-turn manipulation patterns. The attacker builds credibility over several turns, then exploits it.
What Needs to Happen
AI agent authorization needs three things that almost no framework provides today:
Per-tool permission enforcement at the middleware layer. Not in the model's instructions. Not in the system prompt. In infrastructure code that the agent cannot bypass. If the gate says deny, the tool call never executes, regardless of what the model wants to do.
Non-binary decision types. Allow and deny are not enough. Modify (redact PII from tool outputs before the model sees them). Defer (suspend the request for human review when context is ambiguous). Step-up (require additional authentication when behavioral signals indicate elevated risk).
Session-aware trust tracking. Authorization decisions should account for everything that has happened in the session, not just the current request. Trust should degrade monotonically. Once suspicion is triggered, it doesn't reset.
These aren't novel ideas. Banking systems have done session-aware risk scoring for decades. The challenge is applying them to AI agents where the "user" might be another agent, the "session" might span multiple tool calls across different services, and the "risk" is determined by conversational patterns that traditional security systems were never designed to evaluate.
The Gap Is Closing
The good news is that the industry is starting to recognize this problem. RSA 2026 had 41 companies in the Agent Security/NHI category, making it the largest category at the conference. OWASP released the Securing Agentic Applications Guide and is working on V2.0. NIST launched the AI Agent Standards Initiative. Microsoft shipped the Agent Governance Toolkit.
The bad news is that most of this work focuses on governance, monitoring, and detection. What's still largely missing is runtime enforcement at the tool call level, the layer that actually stops the attack before it executes.
The pattern from every incident in 2025 and 2026 is the same: the breach didn't come through the front door. It came through a tool the agent was already authorized to use. Until frameworks build authorization into the execution path itself, we'll keep reading the same headlines with different company names.
I'm building AgentLock, an open-source authorization standard for AI agent tool calls. Apache 2.0 licensed, installable with `pip install agentlock`. The benchmark data referenced in this article is available in the GitHub repo.