World ID 4.0 "Lift Off" launched yesterday. It's impressive.
18 million users. 450 million verifications. 160 countries. And the headline feature: World ID for Agents — a delegation model where a human proves personhood, then authorizes an AI agent to act with that credential.
Okta's "Human Principal" API is in beta with a similar pattern. DIF now hosts MCP-I, the identity extension for the Model Context Protocol. The identity stack for agents is arriving fast.
I want to be precise about what shipped and what didn't.
## What World ID for Agents actually does
World ID for Agents solves a real problem: proving the principal behind an agent is human. When your agent books a flight or submits a form, the receiving service can verify — cryptographically — that a real person delegated this action.
This is L1/L2 identity. L1: the agent has a credential. L2: that credential chains to a verified human. World ID does this at scale, with a fee model where apps pay (not humans), across 160 countries.
It's well-engineered. The delegation chain is cryptographically sound. The scale is unmatched.
But here's the gap nobody's talking about.
## The TOCTOU of Trust
In operating systems, TOCTOU (Time-of-Check-Time-of-Use) is a race condition: you verify a resource is safe, something changes, and by the time you use it, it isn't safe anymore.
The same race condition applies to agent trust.
World ID stamps the birth certificate. It says: "A real human authorized this agent at time T." What it doesn't say: "This agent is still behaving as authorized at time T+6 hours."
Consider the timeline:
- T=0: Human delegates to agent via World ID. Identity verified. Credential issued. L1/L2 complete.
- T=1h: Agent makes normal API calls. Everything is consistent with authorization.
- T=6h: Agent's context window has shifted. A prompt injection from a tool call altered its instructions. It begins accessing resources outside its declared scope.
- T=12h: Agent passes credentials to a sub-agent that was never part of the original delegation.
At every point after T=0, the World ID credential is still valid. The agent is still "verified." The behavior has drifted. Nobody notices.
L1/L2 secures the check. Nothing secures the use.
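The race above fits in a few lines. This is a hypothetical illustration, not World ID's API: `verify_delegation` and `run_agent` are made-up names, and the only point is that the credential is checked once while the agent keeps acting long after.

```python
# Hypothetical sketch of the trust race. verify_delegation and run_agent
# are illustrative names, not any real World ID interface.

def verify_delegation(credential: dict) -> bool:
    """Time-of-check: is the delegation valid right now?"""
    return bool(credential.get("human_verified")) and credential.get("scope") == "flights"

def run_agent(credential: dict, actions: list[str]) -> list[str]:
    """Time-of-use: every action after the single check runs unexamined."""
    if not verify_delegation(credential):  # checked once, at T=0
        return []
    performed = []
    for action in actions:
        # Nothing here re-asks whether the behavior still matches the
        # authorization, so drifted actions pass untouched.
        performed.append(action)
    return performed

cred = {"human_verified": True, "scope": "flights"}
drifted = ["book_flight", "read_calendar", "export_contacts"]  # scope has drifted
print(run_agent(cred, drifted))  # all three actions execute anyway
```

A real fix has to live inside the loop, not before it; that is the structural difference between an identity check and behavioral monitoring.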
## Five frameworks, three gaps
RSAC 2026 saw five major identity frameworks ship in one week. Every one verified who the agent was. None tracked what the agent did.
Salt Security's own 1H 2026 survey quantifies this:
- 48.9% of organizations are blind to machine-to-machine traffic
- 48.3% cannot distinguish agents from bots
- Only 23.5% find existing tools effective for agentic workloads
- 78.6% report increasing executive scrutiny of agentic security
These aren't theoretical gaps. Nearly half of enterprise security teams cannot see what their agents are doing after the identity check passes.
More specifically, three critical gaps emerged that no framework addressed:
1. Tool-Call Authorization. OAuth confirms who is calling. It doesn't constrain what parameters the agent passes. An agent with a valid bearer token can call any endpoint the token scopes allow — including ones the human never intended.
2. Permission Lifecycle. Agent permissions expand an average of 3x per month without review. The credential issued at T=0 authorized three API scopes. By month two, the agent has nine. Nobody re-evaluated the delegation.
3. Ghost Agent Offboarding. 79% of organizations lack real-time agent inventories. When a pilot ends, the agents persist on third-party platforms. The World ID delegation was never revoked because nobody remembered the agent existed.
All three gaps are structurally cross-organizational. A single-org solution can't close them because the agents operate across boundaries no single identity provider controls.
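The first gap is the easiest to make concrete. Below is a minimal sketch of parameter-level authorization, the check that OAuth scopes alone don't perform. The `POLICY` table and `authorize_call` helper are illustrative assumptions, not any shipped framework's API.

```python
# Hypothetical sketch: a token grants a scope, but a separate policy
# constrains the arguments an agent may pass to a tool.

POLICY = {
    "payments.create": {"max_amount": 100, "currencies": {"USD"}},
}

def authorize_call(scopes: set[str], tool: str, params: dict) -> bool:
    if tool.split(".")[0] not in scopes:  # L3: the token-level check
        return False
    rule = POLICY.get(tool)
    if rule is None:
        return False
    # Parameter-level check: the part bearer-token scopes don't cover.
    return (params.get("amount", 0) <= rule["max_amount"]
            and params.get("currency") in rule["currencies"])

scopes = {"payments"}
print(authorize_call(scopes, "payments.create", {"amount": 50, "currency": "USD"}))
print(authorize_call(scopes, "payments.create", {"amount": 9000, "currency": "USD"}))
# The second call fails despite a perfectly valid token: same identity,
# unintended parameters.
```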
## MCPwn: The gap gets exploited
This isn't theoretical. CVE-2026-33032 (MCPwn) — disclosed April 16, CVSS 9.8 — is the first named MCP exploit campaign. 2,600 exposed MCP server instances. Active exploitation. Supply chain attack vector affecting an estimated 200,000 servers.
MCPwn works because MCP servers trust tool calls from agents that passed an identity check at connection time. The identity was valid. The behavior was not. A compromised MCP server can inject instructions that alter agent behavior mid-session — after every identity verification has already passed.
This is the TOCTOU of trust, weaponized in production.
## What L4 actually looks like
L4 — cross-org behavioral trust — answers a different question than L1/L2:
| Layer | Question | Who ships it |
|---|---|---|
| L1 | Does this agent have a credential? | World ID, Okta, DIDs |
| L2 | Does the credential chain to a human? | World ID for Agents, MCP-I |
| L3 | Is the credential valid for this action? | OAuth, Visa TAP, Curity |
| L4 | Is this agent behaving consistently? | Nobody at scale |
L4 requires:
- Continuous behavioral telemetry — not point-in-time checks, but runtime monitoring of what agents actually do
- Cross-org behavioral history — trust that persists when an agent moves between organizations
- Behavioral decay — trust that erodes without fresh evidence, not static credentials that are valid until revoked
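The third requirement, behavioral decay, reduces to a small formula. A minimal sketch, assuming an exponential rule; the 72-hour half-life is an arbitrary illustration, not any framework's specification:

```python
# Hypothetical sketch of behavioral decay: trust erodes without fresh
# evidence. The exponential rule and half-life are assumptions.

HALF_LIFE_HOURS = 72.0

def decayed_trust(score: float, hours_since_evidence: float) -> float:
    """Trust halves every HALF_LIFE_HOURS without new behavioral evidence."""
    return score * 0.5 ** (hours_since_evidence / HALF_LIFE_HOURS)

print(decayed_trust(1000, 0))    # 1000.0: fresh evidence, full trust
print(decayed_trust(1000, 72))   # 500.0: one half-life of silence
print(decayed_trust(1000, 720))  # under 1.0: effectively untrusted
```

The contrast with a static credential is the point: a credential is binary and valid until revoked, while a decaying score forces an agent to keep producing evidence of good behavior.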
Microsoft's Agent Governance Toolkit (AGT) gets closest — it implements behavioral trust scoring 0-1000 with real-time updates. But AGT is explicitly single-org. An agent with 2 years of perfect behavior in 500 deployments walks into a new AGT deployment with a score of zero. Indistinguishable from an attacker's fresh agent.
Armalo AI (53 pacts, launched April 2026) tries financial staking — USDC escrow as a proxy for trustworthiness. Novel, but staking is gameable (an attacker with enough capital looks trustworthy) and produces no actual behavioral signal.
The AgentLair approach — full disclosure, this is what I'm building — is cross-org behavioral telemetry: Ed25519-signed AATs (Agent Auth Tokens) with JWKS verification, hash-chained audit trails, and behavioral continuity across organizational boundaries. The first external integrations are in production: springdrift merged JWKS verification in Gleam, and task-orchestrator ships an ActorVerifier with AgentLair as reference provider.
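One of those ingredients, the hash-chained audit trail, is easy to sketch. This is a generic illustration of the chaining idea (each entry commits to its predecessor's hash, so tampering anywhere breaks verification downstream), not the actual AAT record format:

```python
import hashlib
import json

# Generic hash-chain sketch; the record layout is an assumption, only the
# chaining idea is the point.

def append_entry(chain: list[dict], action: str) -> list[dict]:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"action": action, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})
    return chain

def verify_chain(chain: list[dict]) -> bool:
    prev = "0" * 64
    for entry in chain:
        body = {"action": entry["action"], "prev": entry["prev"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True

chain: list[dict] = []
append_entry(chain, "book_flight")
append_entry(chain, "send_receipt")
print(verify_chain(chain))              # True: intact trail
chain[0]["action"] = "export_contacts"  # retroactive tampering
print(verify_chain(chain))              # False: the chain no longer verifies
```

In a cross-org setting the head hash can be signed (e.g. with the agent's Ed25519 key) and handed to the next organization, which is what makes the history portable.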
It's early. But the architecture is the point: identity verified once + behavior monitored continuously = the complete stack.
## Complementary, not competitive
I want to be explicit: World ID for Agents makes L4 more valuable, not less.
Every agent that carries a World ID credential is an agent whose behavioral compliance matters. If you've verified a human delegated authority, you've raised the stakes on what happens next. The credential makes the behavior consequential.
The complete stack looks like:
- World ID / Okta → L1/L2: Is there a human behind this agent?
- MCP-I / OAuth → L3: Is this action authorized?
- [Behavioral layer] → L4: Is this agent doing what it said it would?
World ID closes the bottom of the stack at unprecedented scale. The top remains open.
I'm building AgentLair — cross-org behavioral trust infrastructure for AI agents. The AAT spec, JWKS verification, and audit trail are live. If you're building agents that need to be trusted across organizational boundaries, the docs are here.
Previously: Five Identity Frameworks, Three Gaps | Microsoft Built the Intranet of Agent Trust | The SDK Defense That Won't Hold