I build an agent firewall, and the question I keep hitting is not "did it block the attack." It is "how would anyone else know what my agent did, without taking my word for it." Most tools answer that with "we keep tamper-proof logs" and stop. That phrase sounds like the strongest possible property, yet it still requires trusting whoever holds the signing key. So I wrote down a way to grade the gap, as an open standard, and shipped it with a checker so nobody has to trust me about it either.
What AEL grades
Agent Evidence Levels (AEL) grades a record of what an AI agent did by one question: how much of it can an outside party verify, and how much omission can they detect, without trusting the vendor or the operator? It runs AEL-0 through AEL-4, and it ships with a runnable reference checker and a conformance corpus, so a grade is something you demonstrate, not something you assert.
The levels
- AEL-0, authentic and ordered. Records are signed and hash-linked. Modification and interior deletion are detectable. Tail truncation and outright fabrication are not, because one keyholder produced everything.
- AEL-1, gap and truncation evident. A signed open, heartbeats so silence is itself signed, and a signed close committing to a count. Now a missing tail or a silent gap within a run shows.
- AEL-2, cross-domain omission evident. A second recorder under a different verified signing key. For declared covered event classes, an omission on one side that the other recorded becomes detectable.
- AEL-3, externally anchored. Chain heads registered in a declared external append-only log under a different verified log key, so anchored history cannot be presented in conflicting versions without detection.
- AEL-4, counterparty-confirmed. For declared confirmed flows, the destination attests what it received, including "nothing." AEL confirms receipt, not harmlessness or meaning.
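To make the difference between the first two levels concrete, here is a minimal sketch, not the reference checker. The record layout, the field names, and the helpers `sign`, `append`, `verify_links`, and `verify_closed` are all hypothetical, and HMAC stands in for a real asymmetric signature only to keep the example inside the standard library. Heartbeats are left out for brevity.

```python
import hashlib
import hmac
import json

KEY = b"demo-recorder-key"  # hypothetical key; a real recorder would use an asymmetric keypair

def sign(payload: dict, key: bytes = KEY) -> str:
    # HMAC is only a stand-in for a signature in this sketch.
    body = json.dumps(payload, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()

def append(chain: list, kind: str, data: dict) -> None:
    # Each record commits to the signature of the record before it.
    prev = chain[-1]["sig"] if chain else "genesis"
    payload = {"seq": len(chain), "kind": kind, "data": data, "prev": prev}
    chain.append({**payload, "sig": sign(payload)})

def verify_links(chain: list) -> bool:
    """AEL-0 style check: every record is signed and linked to its predecessor."""
    prev = "genesis"
    for rec in chain:
        payload = {k: rec[k] for k in ("seq", "kind", "data", "prev")}
        if rec["prev"] != prev or not hmac.compare_digest(rec["sig"], sign(payload)):
            return False
        prev = rec["sig"]
    return True

def verify_closed(chain: list) -> bool:
    """AEL-1 style check: a signed open, and a signed close committing to a count."""
    if len(chain) < 2 or not verify_links(chain):
        return False
    first, last = chain[0], chain[-1]
    return (first["kind"] == "open" and last["kind"] == "close"
            and last["data"].get("count") == len(chain))

chain = []
append(chain, "open", {"run": "demo"})
append(chain, "event", {"action": "fetch"})
append(chain, "event", {"action": "write"})
append(chain, "close", {"count": len(chain) + 1})  # count includes the close record

interior_deleted = chain[:1] + chain[2:]   # drop a record from the middle
truncated = chain[:-1]                     # drop the tail
tampered = [dict(r) for r in chain]
tampered[1]["data"] = {"action": "other"}  # modify a record after signing

print(verify_links(interior_deleted))  # False: the link to the dropped record breaks
print(verify_links(tampered))          # False: the signature no longer matches
print(verify_links(truncated))         # True: links alone cannot see a missing tail
print(verify_closed(truncated))        # False: the signed close is gone
```

Note that whoever holds `KEY` can still build a different chain from scratch that passes both checks, which is exactly the limit the higher levels exist to narrow.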
A grade is the minimum across the required dimensions, cumulative from AEL-0. There is a reproducibility suffix, R, for when the recorded decision can be re-derived from the recorded inputs.
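One way to read "cumulative" is that a level only counts if every level below it also passes. The sketch below encodes that reading; the function name `cumulative_level` and the input shape are hypothetical and are not taken from the spec or the reference checker. The R suffix is tracked separately and is not modeled here.

```python
from typing import Dict, Optional

def cumulative_level(passed: Dict[int, bool]) -> Optional[int]:
    """Return the highest N such that levels 0..N all pass, or None if AEL-0 fails."""
    level = None
    for n in range(5):  # AEL-0 through AEL-4
        if not passed.get(n, False):
            break
        level = n
    return level

# Anchoring (level 3) passes here, but the missing second recorder (level 2)
# caps the result at 1.
print(cumulative_level({0: True, 1: True, 2: False, 3: True}))  # 1
```

Under this reading, a strong result on one dimension never compensates for a weaker one beneath it.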
What no level claims
No level proves completeness against the party holding the signing keys. A keyholder can construct a clean history, sign every part of it, and pass every internal check. Omission-evidence is bought only with additional signed evidence, one verified keyholder at a time, and organizational independence stays declared unless it is established outside AEL. Each level states plainly the limit it does not cover. That honesty is the point of the scale.
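A small sketch of what "additional signed evidence" buys, assuming two recorders whose signatures have already been verified under different keys. The event shape (`cls`, `id`), the set of covered classes, and the helper `missing_from` are hypothetical illustrations, not the reference checker's format.

```python
def missing_from(primary: list, witness: list, covered: set) -> list:
    """Return covered events the witness recorded that the primary log lacks."""
    seen = {(e["cls"], e["id"]) for e in primary}
    return [e for e in witness
            if e["cls"] in covered and (e["cls"], e["id"]) not in seen]

covered = {"egress"}  # declared covered event classes

primary = [
    {"cls": "egress", "id": "a1"},
    # "a2" was silently dropped on this side
]
witness = [
    {"cls": "egress", "id": "a1"},
    {"cls": "egress", "id": "a2"},
    {"cls": "debug", "id": "d1"},   # not a covered class, so never flagged
]

print(missing_from(primary, witness, covered))
# [{'cls': 'egress', 'id': 'a2'}]

# If both recorders omit the same event, the comparison stays silent.
print(missing_from(primary, witness[:1], covered))
# []
```

The second call is the limit in miniature: the comparison only detects what at least one keyholder chose to record, and it says nothing about whether the two keyholders are actually independent.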
Two questions it teaches you to ask a vendor
- What AEL does your evidence earn when the reference checker runs on an artifact you hand me?
- If a record were silently dropped, who outside your trust domain would detect it, and how?
Come poke holes in it
The spec, the reference checker, and the conformance corpus are public and open-source. The project is authored under my company and is meant to be donated to a neutral home once the vocabulary has a life of its own. I would rather find the holes now than defend them later, so if a level claims more than the checker proves, open an issue and show me.
github.com/luckyPipewrench/agent-evidence-levels
Run the checker on your own agent's evidence, or on a vendor's, and read the grade for yourself.