DEV Community

Michael "Mike" K. Saleme
Michael "Mike" K. Saleme

Posted on

The Agentic Maturity Model Is Missing an Axis: Who Validated the Claim

On June 3, the OWASP GenAI Security Project published State of Agentic AI Security and Governance 2.0, and with it an Enterprise Adoption Maturity Model that grades two things at once.

One axis measures deployment: AT0 Shadow AI through AT5 custom in-house agents that you built and whose identity, tools, and boundaries you control. The other measures governance maturity: Level 0 ad hoc through Level 3, where agents are treated as critical infrastructure with governance-as-code, kill switches, and real-time drift dashboards.

It is the clearest two-axis picture the field has published. It also shares a blind spot with every maturity model that came before it.

Both axes describe what the organization does. Neither captures who verified that it does it.

Two organizations, same cell, different truth

Take two organizations that both self-place at Governance Level 3. Both claim governance-as-code. Both claim kill switches. Both claim continuous drift monitoring.

One arrived there through an internal red-team's self-attestation. The other arrived through independent adversarial assessment with a published, reproducible evidence base. On the matrix, they occupy the same cell. In a procurement review, in an incident post-mortem, in front of a regulator, they are not the same artifact.

A maturity model that measures what an organization does, but not who validated it, grades the claim and not the control.

The pattern already exists in established assurance

This is not a novel demand. Assurance practice has separated self-attestation from independent validation for decades. A SOC 2 Type I report describes controls as designed; a Type II report tests whether they operated over time. A vendor security questionnaire and a third-party penetration test answer different questions, and no mature buyer treats them as interchangeable.

Agentic governance has not yet imported that distinction. The EU AI Act's high-risk obligations take effect in August 2026, and they turn on demonstrable oversight, not asserted oversight. The maturity model needs a third axis that the regulation is about to require anyway: evidence type.

What the third axis looks like

Evidence type asks one question of every governance claim: what class of evidence supports it, and is the claim stronger than that evidence permits?

This pattern exists in disciplined evaluation work. For example, in the public agent-security-harness VS-R01 evaluation of agent-payment infrastructure, every finding is tagged with an evidence class:

  • E1 — static or documentation observation
  • E2 — admission-time runtime observation (the API's response at the input gate, before settlement)
  • E3 — settlement-time runtime observation
  • E4 — adversarial replay and persistence validated
  • E5 — cross-context isolation confirmed against both negative and positive controls

Each class maps to a maximum permitted claim strength. An E2 observation may describe how an API admits or refuses a crafted input; it may not claim the platform enforces a limit, because enforcement is a settlement-time property and settlement was not measured. The most common failure mode in agent-security writeups — making an enforcement claim from admission evidence — becomes visible at review time instead of in production.

That is the third axis made concrete. It is reproducible from a public branch state by any reviewer with their own test enrollment, which is the property that separates evidence from assertion.

The cell isn't the credential

The OWASP model is a real advance, and the right place to put this. Adoption tells you how much autonomy an organization has handed its agents. Governance maturity tells you how much control it claims to have built. Evidence type tells you whether anyone outside the organization can check.

For agents that hold credentials, move money, and act on untrusted input, the third question is the one that survives contact with a regulator. Grade the evidence, not the claim.


Sources

Top comments (0)