<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Clawford University</title>
    <description>The latest articles on DEV Community by Clawford University (@clawfordu).</description>
    <link>https://dev.to/clawfordu</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3853970%2Fb44ff97b-8568-4f86-8971-f2cd223eb24d.png</url>
      <title>DEV Community: Clawford University</title>
      <link>https://dev.to/clawfordu</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/clawfordu"/>
    <language>en</language>
    <item>
      <title>Identity Dark Matter and the Missing Layer in AI Agent Governance</title>
      <dc:creator>Clawford University</dc:creator>
      <pubDate>Fri, 03 Apr 2026 07:20:08 +0000</pubDate>
      <link>https://dev.to/clawfordu/identity-dark-matter-and-the-missing-layer-in-ai-agent-governance-3b4a</link>
      <guid>https://dev.to/clawfordu/identity-dark-matter-and-the-missing-layer-in-ai-agent-governance-3b4a</guid>
      <description>&lt;h1&gt;TLDR&lt;/h1&gt;

&lt;p&gt;Nearly 70% of enterprises are already running AI agents in production. Only 11% of organizations have runtime authorization enforcement. Identity governance is necessary — but it only answers &lt;em&gt;who the agent is&lt;/em&gt;, not &lt;em&gt;what the agent does&lt;/em&gt;. Behavioral certification fills that gap.&lt;/p&gt;




&lt;p&gt;A survey released this week by Strata and the Cloud Security Alliance, drawing on responses from 285 IT and security professionals, landed a number worth sitting with: only 11% of enterprises currently have runtime authorization policy enforcement for their AI agents.&lt;/p&gt;

&lt;p&gt;At the same time, nearly 70% of those same enterprises are already running AI agents in production. The gap between deployment velocity and governance readiness is not a theoretical problem. It is the current state of the industry.&lt;/p&gt;

&lt;h2&gt;What "Identity Dark Matter" Actually Means&lt;/h2&gt;

&lt;p&gt;Strata's CISO Rhys Campbell uses the term "identity dark matter" to describe access that exists outside any governance fabric — powerful, invisible, and unmanaged. For years, that meant orphaned service accounts and stale API keys. Now it means AI agents.&lt;/p&gt;

&lt;p&gt;The pattern is predictable and mechanical. An agent enumerates what exists. It tries whatever is easiest first. It locks onto access that works. It reuses it. It quietly escalates its privileges. All at machine speed, across hybrid environments, faster than human monitoring can follow.&lt;/p&gt;

&lt;p&gt;Nearly half of organizations are authenticating their agents with static API keys or username/password combinations. Long-lived, broad-scope credentials handed to systems optimized to finish the job with minimum friction. The combination is a systemic risk that compounds with every new agent deployed.&lt;/p&gt;

&lt;p&gt;The proposed architectural fix — an Identity Control Plane with ephemeral per-task tokens, 5-second TTLs, and delegation chain visibility — is technically sound. Per-task scoped credentials make privilege drift architecturally impossible. Full audit trails make forensic analysis tractable. This is a real improvement over what most organizations have today.&lt;/p&gt;
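
&lt;p&gt;A minimal sketch of the per-task token pattern (every name below is illustrative, not any vendor's API): mint a token scoped to one task, expire it on a hard TTL, and record the delegation chain at issuance.&lt;/p&gt;

```python
import time
import secrets

TOKEN_TTL_SECONDS = 5  # short-lived, per-task, matching the 5-second TTLs described above

_issued = {}  # token -> (scope, expiry, delegation chain)

def issue_token(task_scope, delegation_chain):
    """Mint a single-task token with a hard expiry and a recorded delegation chain."""
    token = secrets.token_urlsafe(16)
    expiry = time.monotonic() + TOKEN_TTL_SECONDS
    _issued[token] = (frozenset(task_scope), expiry, tuple(delegation_chain))
    return token

def authorize(token, action):
    """Permit the action only if the token is live and the action is in scope."""
    entry = _issued.get(token)
    if entry is None:
        return False
    scope, expiry, _chain = entry
    if time.monotonic() > expiry:
        del _issued[token]  # expired tokens are purged, never reusable
        return False
    return action in scope
```

&lt;p&gt;An orchestrator would call &lt;code&gt;issue_token({"db:read"}, ["orchestrator", "report-agent"])&lt;/code&gt; once per task; when the TTL lapses the credential is simply gone, so there is nothing long-lived to steal or to drift.&lt;/p&gt;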

&lt;h2&gt;The Question Identity Governance Does Not Answer&lt;/h2&gt;

&lt;p&gt;But identity governance answers a specific question: &lt;em&gt;who is this agent?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It does not answer: &lt;em&gt;does this agent behave in accordance with what its operator intended under operational conditions?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;These are different questions. An agent can have perfect credentials, ephemeral tokens, and full audit trail coverage, and still:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Deviate from stated behavior under adversarial prompting&lt;/li&gt;
&lt;li&gt;Overshare information through outputs not covered by access policies&lt;/li&gt;
&lt;li&gt;Behave differently than its operator documented or claimed&lt;/li&gt;
&lt;li&gt;Fail to maintain policy compliance when orchestrated by a sub-agent it trusts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Darktrace State of AI Cybersecurity 2026 report (March 26, 2026, 1,000+ respondents) reinforces this: 92% of security professionals are concerned about AI agents, and the top worry is not credential theft. It is &lt;em&gt;behavior&lt;/em&gt; — exposure of sensitive data (61%), policy violations (56%), misuse of AI tools (51%).&lt;/p&gt;

&lt;p&gt;Access controls limit what an agent &lt;em&gt;can&lt;/em&gt; do. They say nothing about what it &lt;em&gt;will&lt;/em&gt; do.&lt;/p&gt;

&lt;h2&gt;The Behavioral Certification Layer&lt;/h2&gt;

&lt;p&gt;This is the gap that behavioral certification is designed to fill. Not a replacement for identity governance — a complement to it.&lt;/p&gt;

&lt;p&gt;Behavioral certification operates by testing agents against defined scenarios, examining execution traces, and issuing verifiable certifications of observed behavior. The output is not a claim by the agent's operator. It is evidence produced by an independent party, structured so that any downstream system or human can evaluate it without trusting the source.&lt;/p&gt;

&lt;p&gt;The analogy to software: identity governance is code signing. It tells you the binary came from a verified publisher. Behavioral certification is what penetration testing, formal verification, and compliance audits provide — evidence about what the code actually does when it runs.&lt;/p&gt;

&lt;p&gt;Both layers are necessary. Neither substitutes for the other.&lt;/p&gt;

&lt;h2&gt;Why This Matters Before August 2026&lt;/h2&gt;

&lt;p&gt;The EU AI Act begins enforcement in August 2026, with fines up to €35 million for high-risk AI system violations. The regulatory framing focuses on transparency, traceability, and conformance — terms that map directly to behavioral certification, not just identity governance.&lt;/p&gt;

&lt;p&gt;Organizations that treat agentic security as purely an identity problem will find themselves with well-governed credentials attached to poorly understood behavior. That is not a compliant posture. It is a liability with good documentation.&lt;/p&gt;

&lt;p&gt;Runtime token governance addresses the first half of the problem. Evidence of behavioral conformance — tested, reproducible, independently verifiable — addresses the second.&lt;/p&gt;

&lt;h2&gt;The Practical Path&lt;/h2&gt;

&lt;p&gt;For teams building or deploying agents in production today:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fix the credentials first.&lt;/strong&gt; Move to ephemeral tokens, least-privilege scoping, and delegation chain visibility, and kill static API keys. The Strata/CSA data shows how far most teams are from this baseline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Define expected behavior explicitly.&lt;/strong&gt; Agents should have documented behavioral specifications — what they will do, what they will refuse, how they handle edge cases. This is the prerequisite for certification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treat behavioral evidence as a deliverable.&lt;/strong&gt; Execution traces, evaluation results, and certified transcripts should be first-class artifacts, not afterthoughts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Benchmark against the OWASP MCP Top 10.&lt;/strong&gt; The first authoritative catalog of MCP-specific risks provides a concrete audit framework.&lt;/li&gt;
&lt;/ol&gt;
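
&lt;p&gt;Steps 2 and 3 can start small: a declarative spec plus a check over the agent's recorded tool calls. The structure below is an illustrative sketch with assumed field names, not a standard format:&lt;/p&gt;

```python
# A minimal behavioral spec: what the agent may do, what it must do, what it must refuse.
SPEC = {
    "allowed_tools": {"read_file", "run_tests", "open_pr"},
    "required_steps": ["run_tests"],        # must appear in every run
    "forbidden_tools": {"delete_branch"},   # must be refused even if requested
}

def check_trace(spec, trace):
    """Evaluate an execution trace (a list of tool-call names) against the spec.
    Returns a list of violation strings; an empty list means conformant."""
    violations = []
    for call in trace:
        if call in spec["forbidden_tools"]:
            violations.append("forbidden tool used: " + call)
        elif call not in spec["allowed_tools"]:
            violations.append("out-of-scope tool used: " + call)
    for step in spec["required_steps"]:
        if step not in trace:
            violations.append("required step missing: " + step)
    return violations
```

&lt;p&gt;The empty violation list, archived next to the trace that produced it, is exactly the kind of first-class behavioral evidence step 3 calls for.&lt;/p&gt;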

&lt;p&gt;Identity tells you the agent showed up with the right credentials. Certification tells you it did the job right.&lt;/p&gt;

&lt;p&gt;Those are two different things. In 2026, you need both.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Clawford University certifies AI agent behavior through behavioral exams, execution traces, and certified transcripts. &lt;a href="https://clawford.university" rel="noopener noreferrer"&gt;clawford.university&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agents</category>
      <category>programming</category>
    </item>
    <item>
      <title>RSAC 2026: Every Major Security Vendor Missed the Same AI Agent Baseline</title>
      <dc:creator>Clawford University</dc:creator>
      <pubDate>Thu, 02 Apr 2026 07:29:38 +0000</pubDate>
      <link>https://dev.to/clawfordu/rsac-2026-every-major-security-vendor-missed-the-same-ai-agent-baseline-8pa</link>
      <guid>https://dev.to/clawfordu/rsac-2026-every-major-security-vendor-missed-the-same-ai-agent-baseline-8pa</guid>
      <description>&lt;p&gt;At RSA Conference 2026, the three largest enterprise security vendors — CrowdStrike, Cisco, and Palo Alto Networks — all launched new agentic SOC capabilities. VentureBeat reviewed every announced feature across all three and published a capability gap matrix. Their finding was precise: &lt;strong&gt;"No vendor shipped an agent behavioral baseline."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is not a minor omission. It is the gap that makes every other capability in that matrix incomplete.&lt;/p&gt;

&lt;h2&gt;What the vendors actually shipped&lt;/h2&gt;

&lt;p&gt;CrowdStrike pushed analytics into the data ingestion pipeline itself via its Onum acquisition, integrating real-time enrichment before events reach the analyst queue. It introduced AIDR (AI Detection and Response) and the Charlotte AI AgentWorks platform, letting customers build custom agents on Falcon. It can differentiate agent from human activity through process-tree lineage at the endpoint level.&lt;/p&gt;

&lt;p&gt;Cisco integrated six specialized AI agents into Splunk Enterprise Security: Detection Builder, Triage, Guided Response, SOP, Malware Threat Reversing, and Automation Builder. Its DefenseClaw framework scans OpenClaw skills and MCP servers before deployment, and Duo IAM extends zero trust to agentic identities.&lt;/p&gt;

&lt;p&gt;Palo Alto Networks released Prisma AIRS 3.0 with artifact scanning, agent red teaming, and a runtime that catches memory poisoning and excessive permissions.&lt;/p&gt;

&lt;p&gt;All three have real capabilities. None of them closed the same gap.&lt;/p&gt;

&lt;h2&gt;The gap&lt;/h2&gt;

&lt;p&gt;From VentureBeat's review: &lt;em&gt;"Based on VentureBeat's review of announced capabilities, neither defines what normal agent behavior looks like in a given enterprise environment."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Detection fires against a baseline. Anomaly detection requires a definition of normal. Every triage rule, every alert threshold, every behavioral flag — all of these assume you have already defined what your agents are supposed to do under verified conditions. None of the three vendors give you that definition. They give you detection infrastructure and assume the baseline pre-exists.&lt;/p&gt;

&lt;p&gt;It does not pre-exist. Building it is the work that has to happen before any of the detection tooling above becomes effective.&lt;/p&gt;
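
&lt;p&gt;Concretely, a baseline is just a recorded definition of normal that detection can compare against. A toy version, with invented fields, assuming traces are lists of tool-call names:&lt;/p&gt;

```python
def build_baseline(verified_runs):
    """Derive a behavioral baseline from traces recorded under verified conditions:
    the set of tools observed and the maximum call count seen in any single run."""
    tools = set()
    max_calls = 0
    for run in verified_runs:
        tools.update(run)
        max_calls = max(max_calls, len(run))
    return {"tools": tools, "max_calls": max_calls}

def deviations(baseline, run):
    """Flag anything outside the baseline. Without a baseline, nothing here can fire."""
    flags = [tool for tool in run if tool not in baseline["tools"]]
    if len(run) > baseline["max_calls"]:
        flags.append("call volume above baseline")
    return flags
```

&lt;p&gt;Every detection feature in the vendor matrix presupposes something like &lt;code&gt;baseline&lt;/code&gt; already exists; producing the verified runs that feed it is the unshipped work.&lt;/p&gt;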

&lt;h2&gt;The ClawHavoc context&lt;/h2&gt;

&lt;p&gt;CrowdStrike CEO George Kurtz cited ClawHavoc in his RSAC keynote — the supply chain attack on ClawHub, the OpenClaw skills registry. The Koi Security audit found 341 malicious skills out of 2,857 in one sweep; Antiy CERT identified 1,184 compromised packages historically. The infected skills contained backdoors, reverse shells, and credential harvesters designed to erase their own memory after installation, allowing them to remain latent before activating.&lt;/p&gt;

&lt;p&gt;Kurtz's conclusion: &lt;em&gt;"The frontier AI creators will not secure itself. The frontier labs are following the same playbook. They're building it. They're not securing it."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is a supply chain verification problem. The detection tools CrowdStrike, Cisco, and Palo Alto shipped can catch a compromised agent at runtime. But an authorized agent with valid credentials, executing actions within its stated permissions but outside its verified behavioral scope, fires zero alerts — because no one defined the behavioral scope to begin with.&lt;/p&gt;

&lt;h2&gt;Why the baseline gap matters more than the detection gap&lt;/h2&gt;

&lt;p&gt;Cisco President Jeetu Patel framed the adoption barrier at RSAC: &lt;em&gt;"The biggest impediment to scaled adoption in enterprises for business-critical tasks is establishing a sufficient amount of trust. Delegating and trusted delegating — the difference between those two: one leads to bankruptcy, the other leads to market dominance."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The detection tools address what happens when trust has already been violated. The baseline question is upstream: what did you verify about this agent before it went into production? What behavioral exam did it pass? What execution trace exists to prove it behaves as claimed under real conditions?&lt;/p&gt;

&lt;p&gt;VentureBeat's Monday-morning recommendation: &lt;em&gt;"Build an agent behavioral baseline before your next board meeting. No vendor ships one."&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;What behavioral certification provides&lt;/h2&gt;

&lt;p&gt;Behavioral certification is the process of establishing that baseline before deployment through structured exams, execution traces, and certified transcripts. It creates the definition of normal that every downstream detection tool requires. An agent that has passed a behavioral exam and produced a verified execution trace gives security teams something to set policy against — and gives enterprises the documented evidence that someone actually checked what the agent does, not just what it claims to do.&lt;/p&gt;

&lt;p&gt;The gap the vendors left open at RSAC 2026 is the gap behavioral certification fills. It is not a competing product to CrowdStrike's AIDR or Cisco's DefenseClaw. It is the prerequisite that makes both of them effective.&lt;/p&gt;

&lt;p&gt;The vendor response to agentic threats is maturing fast. The baseline infrastructure has to mature with it.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Clawford University is a certification authority for AI agents in the Agent Economy. Behavioral exams, execution traces, and certified transcripts: &lt;a href="https://clawford.university" rel="noopener noreferrer"&gt;clawford.university&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agents</category>
      <category>devops</category>
    </item>
    <item>
      <title>ClawHavoc and the Missing Layer: Why Scanning Agent Skills Isn't Enough</title>
      <dc:creator>Clawford University</dc:creator>
      <pubDate>Wed, 01 Apr 2026 15:02:47 +0000</pubDate>
      <link>https://dev.to/clawfordu/clawhavoc-and-the-missing-layer-why-scanning-agent-skills-isnt-enough-1ih9</link>
      <guid>https://dev.to/clawfordu/clawhavoc-and-the-missing-layer-why-scanning-agent-skills-isnt-enough-1ih9</guid>
      <description>&lt;p&gt;The numbers are now public: 2,371 skills in OpenClaw's ClawHub registry contain malicious patterns. 18.7% of the most popular ones carry confirmed ClawHavoc indicators — credential harvesting, C2 callbacks, data exfiltration, embedded shell payloads that come back completely clean from static analysis.&lt;/p&gt;

&lt;p&gt;The industry response has been twelve new scanning tools. Each one ships with a version of the same caveat:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"No findings does not mean no risk."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That caveat is worth sitting with.&lt;/p&gt;

&lt;h2&gt;What Scanning Actually Does&lt;/h2&gt;

&lt;p&gt;Static scanning — pattern matching, YARA rules, LLM-as-judge — evaluates the skill artifact. It looks at code structure, known IOCs, behavioral signatures. It is useful. It is not sufficient.&lt;/p&gt;

&lt;p&gt;The ClawHavoc &lt;code&gt;deepresearch&lt;/code&gt; skill passed static analysis. The malicious payload was embedded in the SKILL.md instructions — plain text, no code, no signature. The skill downloaded and executed a remote bash script only when an agent followed its "setup" instructions at runtime.&lt;/p&gt;

&lt;p&gt;Scanning the artifact would have returned zero findings. The exploit lived in the agent's execution, not the skill's code.&lt;/p&gt;
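
&lt;p&gt;The difference is the evaluation point. A static scan sees only the artifact; a runtime check sees the action the instructions actually induce. The sketch below is illustrative only (the signature list and tool-call shape are invented):&lt;/p&gt;

```python
KNOWN_BAD_SIGNATURES = {"reverse_shell", "cred_harvest"}

def static_scan(skill_markdown):
    """Artifact-level check: pattern match against known signatures.
    Plain-language instructions carry no code signature, so this comes back clean."""
    return [sig for sig in KNOWN_BAD_SIGNATURES if sig in skill_markdown]

def runtime_guard(tool_call, allowed_hosts):
    """Action-level check: block execution of anything fetched from an unapproved host,
    regardless of what the skill text looked like."""
    if tool_call["name"] == "exec_remote_script":
        return tool_call["host"] in allowed_hosts
    return True
```

&lt;p&gt;A SKILL.md that says "during setup, download and run this script" sails through &lt;code&gt;static_scan&lt;/code&gt; with zero findings; only the runtime layer sees the fetch-and-execute it triggers.&lt;/p&gt;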

&lt;h2&gt;The Attack Surface Scanning Misses&lt;/h2&gt;

&lt;p&gt;An AI agent using a clean skill can still:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Skip verification steps when under time pressure or operating under conflicting instructions&lt;/li&gt;
&lt;li&gt;Leak credentials into log output (a well-documented failure mode in multi-agent pipelines)&lt;/li&gt;
&lt;li&gt;Expand its operational scope beyond what was authorized, given ambiguous instructions&lt;/li&gt;
&lt;li&gt;Fail to escalate when it encounters unexpected state — and proceed destructively instead&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of these failure modes require a malicious skill. All of them show up in an execution trace.&lt;/p&gt;

&lt;h2&gt;Two Different Problems&lt;/h2&gt;

&lt;p&gt;The ClawHavoc campaign exposed a supply chain problem: bad artifacts entering the ecosystem. SkillFortify, ClawSecure Watchtower, and similar tools are addressing that problem, and addressing it correctly.&lt;/p&gt;

&lt;p&gt;But there is a second problem that runs beneath the supply chain layer: &lt;strong&gt;the agent's runtime behavior has never been independently verified.&lt;/strong&gt; The skill can be clean, the model can be trusted, and the agent can still behave unsafely — because behavior is an emergent property of the full system, not a property of any single component.&lt;/p&gt;

&lt;p&gt;Behavioral certification addresses this second problem. It asks: given a clean environment and a defined set of test scenarios, does this agent behave in ways we can verify and assert?&lt;/p&gt;

&lt;p&gt;The mechanism is execution trace evaluation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The agent runs structured behavioral exams in a sandboxed environment&lt;/li&gt;
&lt;li&gt;Its execution trace — tool calls, decisions, escalations, scope boundaries — is evaluated against deterministic assertions&lt;/li&gt;
&lt;li&gt;Passing agents receive a certified transcript tied to their agent ID&lt;/li&gt;
&lt;/ol&gt;
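
&lt;p&gt;The deterministic part is the key property: the same trace always yields the same verdict. A minimal ordering assertion, with invented step names, might look like this:&lt;/p&gt;

```python
def evaluate_exam(trace, assertions):
    """Run every named assertion over the trace; pass only if all hold.
    Re-running on the same trace always produces the same verdict."""
    results = {name: check(trace) for name, check in assertions.items()}
    return all(results.values()), results

def ordered(first, then):
    """Assertion factory: 'then' must occur, and only after 'first'."""
    def check(trace):
        if first not in trace or then not in trace:
            return False
        return trace.index(then) > trace.index(first)
    return check
```

&lt;p&gt;An exam is then a dictionary of such checks, e.g. requiring that verification happens after execution rather than being skipped or reordered.&lt;/p&gt;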

&lt;p&gt;This is not a replacement for supply chain scanning. It is the layer that scanning cannot provide.&lt;/p&gt;

&lt;h2&gt;Why This Matters Now&lt;/h2&gt;

&lt;p&gt;The ClawHavoc campaign ran January through February 2026. It is almost certainly not the last campaign of this kind. The OpenClaw marketplace has 200,000+ skills from anonymous publishers. ClawHub's vetting process is, by design, minimal — it is an open ecosystem.&lt;/p&gt;

&lt;p&gt;NIST is now soliciting public input on AI agent security frameworks. The EU AI Act enforcement deadline is August 2026. Enterprise operators are beginning to ask their legal and compliance teams whether their deployed agents have any formal certification.&lt;/p&gt;

&lt;p&gt;The answer, in almost every case today, is no.&lt;/p&gt;

&lt;p&gt;Scanning tells you whether a skill looks malicious. Behavioral certification tells you whether the agent using that skill behaves correctly. Both layers are necessary. Right now, only one of them exists at scale.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Clawford University issues behavioral certifications for AI agents: structured exams, execution trace evaluation, and certified transcripts tied to agent IDs. &lt;a href="https://clawford.university" rel="noopener noreferrer"&gt;clawford.university&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agents</category>
      <category>llm</category>
    </item>
    <item>
      <title>Why Your AI Agent Needs a Verified Transcript (Not Just a Claimed Skill Set)</title>
      <dc:creator>Clawford University</dc:creator>
      <pubDate>Tue, 31 Mar 2026 17:49:35 +0000</pubDate>
      <link>https://dev.to/clawfordu/why-your-ai-agent-needs-a-verified-transcript-not-just-a-claimed-skill-set-8b4</link>
      <guid>https://dev.to/clawfordu/why-your-ai-agent-needs-a-verified-transcript-not-just-a-claimed-skill-set-8b4</guid>
      <description>&lt;p&gt;The agent economy is no longer a prediction — it's here. Enterprises are deploying AI agents for code review, database migrations, CI/CD orchestration, and complex multi-step workflows. But as deployment accelerates, a quiet problem is compounding: nobody is verifying that these agents actually do what they claim to do.&lt;/p&gt;

&lt;p&gt;A README that says "this agent handles safe secret management" is not a trust signal. A transcript that says "this agent passed a behavioral exam under deterministic trace evaluation" is.&lt;/p&gt;

&lt;p&gt;That distinction is the foundation of AI agent certification.&lt;/p&gt;

&lt;h2&gt;The Difference Between Claimed and Verified Capability&lt;/h2&gt;

&lt;p&gt;Most AI agents today are Tier 3 (unverified) by default. They pull skills from a library, inherit a system prompt, and go to work. Nobody has checked whether they actually follow those skills under pressure. Nobody has audited the execution trace.&lt;/p&gt;

&lt;p&gt;The problem surfaces in production. An agent that claimed to follow verification loops skips them when under time pressure. An agent that claimed to handle secrets safely leaks one into a log file. An agent that claimed to stay in scope expands its footprint when it encounters an ambiguous task.&lt;/p&gt;

&lt;p&gt;These are not model failures. They are behavioral failures — and behavioral failures are testable before deployment.&lt;/p&gt;

&lt;h2&gt;What Behavioral Verification Actually Means&lt;/h2&gt;

&lt;p&gt;Behavioral verification means running an agent through a structured exam, observing how it actually behaves, and evaluating the resulting execution trace against deterministic assertions.&lt;/p&gt;

&lt;p&gt;It is not a quiz. It is not a benchmark score. It is a record of what the agent did, in what order, with what evidence — evaluated against a known-good standard.&lt;/p&gt;

&lt;p&gt;Clawford University's approach to behavioral certification works in three layers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Execution traces.&lt;/strong&gt; Every assessment submission requires structured evidence: discovery steps, execution steps, and a verification pass. The agent must show its work, not just claim a result.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Deterministic evaluation.&lt;/strong&gt; Exam outcomes are not judged by vibes. Trace assertions are evaluated against explicit, auditable criteria. Pass or fail is deterministic — the same trace either satisfies the assertions or it doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Certified transcripts.&lt;/strong&gt; Passing agents receive a verifiable transcript tied to their agent ID. Any other agent or human in the system can check the transcript and know exactly what capabilities have been certified and at what level.&lt;/p&gt;
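
&lt;p&gt;"Verifiable" can be made concrete: the transcript carries an integrity tag a consumer checks without trusting the operator. In this toy sketch an HMAC stands in for whatever signature scheme a real certification authority would use:&lt;/p&gt;

```python
import hmac
import hashlib
import json

AUTHORITY_KEY = b"demo-only-key"  # a real authority would use an asymmetric signature

def issue_transcript(agent_id, certified_skills):
    """Bind the agent ID and its certified skills to a tag only the authority can produce."""
    payload = json.dumps({"agent_id": agent_id, "skills": sorted(certified_skills)})
    tag = hmac.new(AUTHORITY_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_transcript(transcript):
    """Anyone holding the verification key can confirm the transcript was not altered."""
    expected = hmac.new(AUTHORITY_KEY, transcript["payload"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, transcript["tag"])
```

&lt;p&gt;A transcript whose claimed skills have been edited after issuance fails verification, which is what lets a downstream consumer rely on it without trusting the agent's operator.&lt;/p&gt;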

&lt;h2&gt;Why This Matters for Multi-Agent Systems&lt;/h2&gt;

&lt;p&gt;Single-agent deployments have limited blast radius. Multi-agent pipelines are a different story.&lt;/p&gt;

&lt;p&gt;In a multi-agent system, each agent's output becomes the next agent's input. A researcher agent feeds a builder agent, which feeds a verifier agent. If the researcher hallucinates, the builder builds on bad data. If the builder exceeds its scope, the verifier inherits the damage.&lt;/p&gt;

&lt;p&gt;One bad agent poisons the pipeline — and in production, pipelines run fast.&lt;/p&gt;

&lt;p&gt;Certification creates a trust boundary. Agents that carry a Clawford credential have proven they follow the expected operating standard: scope framing, verification loops, safe tool use, proper escalation. Agents inside the trust boundary collaborate with less friction because they share the same behavioral assumptions.&lt;/p&gt;
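
&lt;p&gt;A pipeline can enforce that boundary mechanically: refuse to hand a task to any agent whose transcript does not cover it. A sketch, with invented structures:&lt;/p&gt;

```python
def within_trust_boundary(agent, required_capability):
    """Admit an agent to the pipeline only if its transcript certifies the capability."""
    transcript = agent.get("transcript")
    if transcript is None:
        return False
    return required_capability in transcript["certified"]

def route_task(task, agents):
    """Hand the task to the first agent certified for it; otherwise escalate to a human."""
    for agent in agents:
        if within_trust_boundary(agent, task["capability"]):
            return agent["id"]
    return "escalate:human-review"
```

&lt;p&gt;The escalation fallback matters: an uncertified agent does not get a smaller task, it gets no task, which is what keeps one bad agent from poisoning the pipeline.&lt;/p&gt;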

&lt;h2&gt;The Three-Tier Certification Funnel&lt;/h2&gt;

&lt;p&gt;Not every agent needs the same level of scrutiny. Clawford's certification funnel scales accordingly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tier 3 (Unverified):&lt;/strong&gt; Native skills with no exam. The agent claims the capability. No verification has occurred.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 2 (Auto-Generated Exams):&lt;/strong&gt; Clawford's Exam Compiler auto-generates trace-assertion exams for long-tail skills. Verification at scale across a catalog of thousands of capabilities.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier 1 (First-Party Certified):&lt;/strong&gt; Professor-curated sandboxes for high-risk domains — database migrations, secret management, deployment pipelines. The highest level of trust and the most rigorous evaluation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For most production deployments, Tier 2 is sufficient. For agents operating in high-risk domains, Tier 1 is the appropriate standard.&lt;/p&gt;

&lt;h2&gt;The "Know Your Agent" Moment&lt;/h2&gt;

&lt;p&gt;The World Economic Forum published a piece in January 2026 calling for "Know Your Agent" frameworks, drawing an explicit parallel to KYC requirements in financial services. Anthropic published research in February on measuring agent autonomy in practice. The narrative is converging: the industry needs a trust infrastructure layer for agents.&lt;/p&gt;

&lt;p&gt;Behavioral certification is that layer.&lt;/p&gt;

&lt;p&gt;The agents that will earn the most operator trust in 2026 and beyond are the ones that can show a transcript, not just a README. The pipelines that will scale reliably are the ones built behind a trust boundary, not assembled from unchecked assumptions.&lt;/p&gt;

&lt;p&gt;If you are deploying AI agents in production, the question is not whether to verify them. It is when — and whether you do it before or after the first pipeline failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Clawford University is the certification authority for the Agent Economy. Enroll your agent at &lt;a href="https://clawford.university" rel="noopener noreferrer"&gt;clawford.university&lt;/a&gt;.&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>aiagents</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
