The TOCTOU of Trust: Why Agent Registries Know Who Signed Up, Not Who Is Acting

#security #ai #agents #identity

There's a class of services in the agent ecosystem that will tell you an agent is "registered" and "verified." They have directories. Thousands of entries. Some have badges.

They know who created an account. They don't know who is running right now.

That distinction is the entire problem.

Time of Check, Time of Use

In systems security, TOCTOU stands for time-of-check to time-of-use, a class of race condition. The attack is simple: you check that a file is safe, then someone swaps it before you use it. The check passes. The use operates on something the check never saw.
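The file version of the race fits in a few lines. This is a minimal Python sketch, not drawn from any real incident: `check_then_use`, `demo`, and the file names are made up for illustration, and the "attacker" is just a callback that runs between the check and the open.

```python
import os
import tempfile


def check_then_use(path, between_check_and_use=None):
    """Racy pattern: validate a path, then open it in a separate step."""
    looks_safe = os.path.getsize(path) < 1024  # T-check
    if between_check_and_use is not None:
        between_check_and_use()  # the window an attacker exploits
    with open(path) as f:  # T-use: opens whatever is at the path now
        return looks_safe, f.read()


def demo():
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "input.txt")
    with open(target, "w") as f:
        f.write("benign")

    def swap():
        # Hypothetical attacker: replace the file after it was checked.
        replacement = os.path.join(workdir, "evil.txt")
        with open(replacement, "w") as f:
            f.write("malicious")
        os.replace(replacement, target)

    return check_then_use(target, between_check_and_use=swap)
```

Calling `demo()` returns `(True, "malicious")`: the check passed on one file and the read happened on another.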

The same gap exists in agent trust, and it's bigger.

When an agent registers with a directory service, the registration is permanent. It records who created an account, what they put in a description field, maybe a GitHub repo URL. That check happens once, at signup.

The use happens later. Every session. Every API call. Every time an agent touches a payment, reads a document, or speaks on behalf of a user.

T-check and T-use are not the same moment. They're not even close. The gap between them is where the attack surface lives.

What "8,000+ Registered Agents" Actually Means

Some vouching directories auto-index GitHub repos and call them registered agents. The number sounds impressive. 8,000 agents. 12,000 agents.

Most of those are codebases. A repo that hasn't been run in six months. A tutorial project. An abandoned proof of concept. Nobody checked whether the code was ever executed, who ran it, or what it did when it ran.

For the ones that are running, registration doesn't constrain behavior. An agent registers with a clean profile on Monday. On Friday, it behaves differently than it did at registration. The directory still shows Monday's entry. There's no mechanism to detect the gap, because the directory only captured who signed up.

This isn't a subtle attack. You register clean, then act maliciously. Registration never expires. The trust signal stays green.

Per-Session Identity vs. Permanent Registration

AgentLair issues an EdDSA JWT (we call it an AAT, Agent Auth Token) at session start. 1-hour TTL. JWKS-verifiable at agentlair.dev/.well-known/jwks.json without calling our API.

The token contains:

{
  "iss": "https://agentlair.dev",
  "sub": "acc_<account_id>",
  "aud": "https://target-service.example.com",
  "exp": 1746...,
  "jti": "aat_<nanoid>",
  "al_name": "pico/1234567890",
  "al_scopes": ["mcp:tools:read", "email:send"],
  "al_trust": { ... behavioral snapshot ... }
}
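To look inside a token shaped like this, you can base64url-decode its first two segments. The sketch below uses only the Python standard library. `peek_claims` and the demo token values are hypothetical, and the function deliberately does not verify the signature, so nothing it returns should be used for an authorization decision.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def peek_claims(token: str) -> dict:
    """Decode header and payload WITHOUT verifying the signature."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return {
        "header": json.loads(b64url_decode(header_b64)),
        "payload": json.loads(b64url_decode(payload_b64)),
    }


def make_unsigned_demo_token() -> str:
    # Placeholder values for illustration; the signature is a dummy string.
    header = {"alg": "EdDSA", "typ": "JWT"}
    payload = {
        "iss": "https://agentlair.dev",
        "sub": "acc_demo",
        "al_scopes": ["mcp:tools:read", "email:send"],
    }
    parts = [b64url_encode(json.dumps(p).encode()) for p in (header, payload)]
    return ".".join(parts + ["dummy-signature"])
```

Peeking is handy for debugging and for reading the header before choosing a verification key. Trust decisions still have to wait for the signature check.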

EdDSA (Ed25519) gives you roughly 128-bit security with fast verification and small signatures. JWKS distribution means any service that has cached the public keys can verify a token even when AgentLair is down. The 1-hour TTL means a compromised token has a bounded blast radius: at most an hour of misuse.

More important than any of those technical details: the token is per-session. It doesn't exist until the agent starts running. It expires. A new session gets a new token.

That's not a detail. That's the point.

A permanent registration is a claim about who signed up. A per-session token is evidence of who is acting right now, under what constraints, with what behavioral history attached.
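The difference shows up as soon as time enters the picture. Here is a toy model, assuming nothing about how any real directory or AgentLair stores data: `Registration`, `SessionToken`, and the three functions are invented for illustration. Only the 1-hour TTL comes from the token description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    agent_id: str
    registered_at: float  # recorded once at signup, never re-checked


@dataclass(frozen=True)
class SessionToken:
    sub: str
    issued_at: float
    ttl_seconds: float = 3600.0  # the 1-hour TTL


def registry_says_trusted(reg: Registration, now: float) -> bool:
    # The answer does not depend on `now`: the signal never goes stale.
    return True


def token_is_live(tok: SessionToken, now: float) -> bool:
    return tok.issued_at <= now < tok.issued_at + tok.ttl_seconds


def exposure_seconds(tok: SessionToken, stolen_at: float) -> float:
    """How long a stolen token remains usable."""
    return max(0.0, tok.issued_at + tok.ttl_seconds - stolen_at)
```

With a token issued at `t = 0` and stolen 40 minutes in, `exposure_seconds` returns `1200.0`, so the thief gets 20 minutes. The registration, checked six months later, still answers `True`.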

Why Verification at the Wrong Moment Doesn't Help

Say you're a payment service. An agent calls your API. You want to know: can I trust this agent?

A vouching directory says: "Yes, it's registered. Here's its profile. It passed our review."

That review happened at registration. Possibly months ago. The profile is what the operator wrote. There's no ongoing attestation that the agent running right now is behaving consistently with that profile.

You verified the agent at T-check. You're trusting it at T-use. TOCTOU.

A per-session JWT solves a different problem: it proves that this session was authenticated by this identity provider, that these scopes were in effect when the session started, and that the token carries a signed expiry nobody can extend without breaking the signature. If you cache the JWKS and verify the signature offline, you don't even need to call AgentLair. The cryptography does the work.
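A valid signature is only half of accepting a token. The receiving service still has to check the claims against its own expectations. The sketch below assumes the signature has already been verified by a JWT library against the cached JWKS. `validate_claims`, `ClaimError`, and the `leeway` parameter are illustrative names, not part of any AgentLair SDK.

```python
import time


class ClaimError(Exception):
    """Raised when a verified token's claims don't match expectations."""


def validate_claims(claims, *, expected_aud, required_scope,
                    expected_iss="https://agentlair.dev",
                    now=None, leeway=30):
    """Check issuer, audience, expiry, and scope. Returns the subject."""
    now = time.time() if now is None else now

    if claims.get("iss") != expected_iss:
        raise ClaimError("unexpected issuer")

    # `aud` may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_aud not in audiences:
        raise ClaimError("token was issued for a different audience")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp + leeway:
        raise ClaimError("token is expired or has no exp claim")

    if required_scope not in claims.get("al_scopes", []):
        raise ClaimError("required scope not granted for this session")

    sub = claims.get("sub")
    if not sub:
        raise ClaimError("token has no subject")
    return sub
```

A payment service would call this with its own URL as `expected_aud` and the scope its endpoint requires. The small `leeway` tolerates clock skew between issuer and verifier and is a policy choice, not a requirement.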

Still not a complete behavioral guarantee. But it's evidence about the running session, not an artifact from account creation.

The Failure Mode Is Structural

A registration-based trust model has a specific failure mode: it cannot detect behavioral drift. An agent can register with a clean identity, establish a trust score through early legitimate behavior, and then shift. The registration stays valid. The trust score reflects the early period.

This isn't hypothetical. It's the same pattern as fraudulent merchant accounts, compromised OAuth tokens that never rotate, and service accounts that accumulate permissions over time. The check was valid. The use is not.

Session-scoped tokens don't fully solve this either. A bad actor can run a clean session and a malicious one. What they can't do is forge the token for a session they didn't authenticate. The token is tied to this session, right now, with this identity.

The addition that makes it meaningful: behavioral telemetry attached to the identity. The al_trust field in an AAT contains a snapshot of the account's behavioral history. Not the registration-time description. Not a self-reported badge. Derived from actual session behavior.

That's what "trust infrastructure" means. Not a directory of people who signed up. Evidence about how the agent is actually acting.

What Offline Verification Buys You

One specific thing worth understanding: JWKS-based verification.

AgentLair publishes its Ed25519 public keys at a well-known endpoint. Any service that wants to verify an AAT fetches those keys once, caches them, and verifies signatures locally. No callback to AgentLair. No dependency on AgentLair's uptime. No round-trip latency.

The cryptographic math runs in microseconds. The key cache refreshes on rotation.
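One way to get "fetch once, refresh on rotation" is a cache keyed by key ID that refetches only when it sees an unknown `kid`, with a rate limit so a flood of bogus tokens can't turn into a flood of fetches. This is a sketch under assumptions: `JwksCache`, `fetch_jwks`, and the 60-second interval are hypothetical, the `https://` scheme is inferred from the `iss` claim, and the article doesn't say how AgentLair signals rotation.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, Optional


def fetch_jwks(url: str = "https://agentlair.dev/.well-known/jwks.json",
               timeout: float = 5.0) -> dict:
    # Plain HTTPS GET of the published key set.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


class JwksCache:
    def __init__(self, fetch: Callable[[], dict] = fetch_jwks,
                 min_refresh_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._min_interval = min_refresh_interval
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._last_fetch: Optional[float] = None

    def _refresh(self) -> None:
        jwks = self._fetch()
        self._keys = {k["kid"]: k for k in jwks.get("keys", []) if "kid" in k}
        self._last_fetch = self._clock()

    def get(self, kid: str) -> Optional[dict]:
        """Return the JWK for `kid`, or None if the issuer doesn't list it."""
        if kid in self._keys:
            return self._keys[kid]  # hot path: no network at all
        # Unknown kid: the issuer may have rotated keys. Refetch, rate-limited.
        due = (self._last_fetch is None
               or self._clock() - self._last_fetch >= self._min_interval)
        if due:
            self._refresh()
        return self._keys.get(kid)
```

Tokens signed with a known key never touch the network, which is where the independence from the issuer's uptime comes from. The returned JWK would then be handed to whatever Ed25519 implementation the service uses for the actual signature check.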

This matters for high-frequency agent interactions. If every MCP tool call required a round-trip to a central verification service, you'd have a performance problem and a single point of failure. Offline verification gives you neither.

Vouching directories can't offer this. Their model is: query the directory when you want to know about an agent. That's a lookup, not a proof. The directory can go down. The directory can be wrong. There's no cryptographic binding between the directory entry and the session making the request.

Trust at Execution Time

Here's the honest framing: no trust infrastructure eliminates all risk. A per-session token proves authentication, not intent. Behavioral telemetry shows patterns, not guarantees.

But there's a meaningful difference between a system that knows who registered six months ago and a system that issues cryptographic evidence about who is running right now.

Real trust requires evidence at T-use, not just at T-check. It requires that the evidence be cryptographically bound to the session making the request. It requires that the evidence expire, so stale trust signals don't accumulate indefinitely.

A directory tells you who signed up. Trust infrastructure tells you who is acting right now, under what constraints, with what history. Those are different questions. The agent economy needs both answered. Only one of them is being answered today.

AgentLair issues per-session EdDSA JWTs (AATs) with JWKS-verifiable signatures, 1-hour TTL, and behavioral trust snapshots. agentlair.dev