
Alex @ Vibe Agent Making

Originally published at vibeagentmaking.com

Proving Your AI Agent Made Its Own Decisions

When an AI agent denies an insurance claim, executes a trade, or routes an ambulance, one question is suddenly everywhere: who actually decided? The agent on its own, or a human pulling strings through the prompt?

Nobody has a clean answer. OAuth proves who is calling. Digital signatures prove the message wasn't tampered with. Audit logs prove what happened in what order. None of them tell you whether the decision was the agent's own — or whether it was a puppet move dressed up to look autonomous.

That gap is now a legal problem. California AB 316, in force since January 1, 2026, forecloses the "the AI did it" defense. The EU AI Act becomes fully enforceable for high-risk systems on August 2, 2026; Article 12 requires tamper-evident logs, Article 14 requires evidence of human oversight. MiFID II demands audit trails for algorithmic trading. The class action Lokken v. UnitedHealth survived a 2025 motion specifically on the question of whether decisions were algorithmic or physician-reviewed.

The Cryptographic Proof of Autonomy Protocol (CPAP) is a draft specification for answering the question with evidence instead of opinion. It doesn't invent new cryptography. It combines five existing primitives into one verification relation that an insurer, regulator, or court can check in milliseconds — and it's honest about what it cannot prove.

The problem: puppeted or autonomous?

Picture two agents. Both deny an insurance claim. Both produce a clean log: timestamp, decision, reasoning chain, signature.

Agent A reasoned its way to the denial. Agent B was instructed by a human — "deny this one" — and then wrote a justification afterward.

From the outside, the logs look the same. The signatures verify. The chain isn't tampered with. You can audit either one for a week and never know which is which.

This isn't a bug in current systems. It's a property of them. Provenance chains tell you a decision was recorded — not who originated it. Hardware attestation tells you the agent's code ran in an isolated environment — not what someone whispered into it through a valid input channel.
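
Here's that indistinguishability in miniature. A toy sketch with plain SHA-256 chaining, where the agents, fields, and log format are all made up, but the point survives: both records verify, and nothing in them says who originated the decision.

import hashlib, json

def entry_hash(prev_hash: str, record: dict) -> str:
    # Chain each record to the previous link, the way a provenance ledger would.
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

GENESIS = "0" * 64

# Agent A reasoned its way to the denial.
autonomous = {"decision": "deny_claim", "reasoning": "policy exclusion 4.2 applies"}
# Agent B was told "deny this one" and wrote the justification afterwards.
puppeted   = {"decision": "deny_claim", "reasoning": "policy exclusion 4.2 applies"}

print(entry_hash(GENESIS, autonomous))
print(entry_hash(GENESIS, puppeted))   # identical structure, identical validity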

Why it matters: liability, insurance, regulation, trust

Insurance. Underwriters behind offerings like Munich Re's aiSure and Armilla AI need decision attribution to price premiums. If an agent is fully autonomous, the carrier is on the hook for the agent's behavior. If an operator was steering, the pricing is completely different.

Regulation. The EU AI Act doesn't just ask for logs — it asks for logs that can demonstrate Article 14's human oversight requirement. ESMA's February 2026 supervisory briefing on algorithmic trading explicitly requires observable, testable, distinguishable trading behavior.

Litigation. When the dispute is whether the algorithm decided or a human did, the side without evidence loses.

Inter-agent trust. When agent A authorizes agent B to spend on its behalf, A would like to know that B's commitments were actually B's, not B's operator silently driving.

What CPAP does: five layers

CPAP is a five-layer architecture. Each layer answers a piece of the question. None alone is enough; together they corner the problem.

Layer 1 — Identity. A W3C DID bound to signing keys. The agent's verifiable name.
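
Roughly what that binding looks like, using an Ed25519 keypair and the did:key method. This assumes the cryptography and base58 packages are installed; CPAP's actual DID method and key format may differ.

import base58
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

# Generate the agent's signing keypair.
signing_key = Ed25519PrivateKey.generate()
public_bytes = signing_key.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)

# did:key encodes the multicodec-prefixed public key as multibase base58btc.
MULTICODEC_ED25519_PUB = bytes([0xED, 0x01])
did = "did:key:z" + base58.b58encode(MULTICODEC_ED25519_PUB + public_bytes).decode()

# Every decision the agent emits is signed with the key bound to that DID.
message = b"deny_claim:policy exclusion 4.2 applies"
signature = signing_key.sign(message)
signing_key.public_key().verify(signature, message)   # raises if the signature is invalid
print(did)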

Layer 2 — Provenance. Every event gets written into a hash-chained ledger and periodically anchored to Bitcoin via OpenTimestamps and to RFC 3161 timestamp authorities.
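
A toy version of that ledger, stdlib only. The event fields are invented, and in the real protocol the latest digest would be handed to OpenTimestamps and an RFC 3161 TSA rather than just returned.

import hashlib, json, time

class ProvenanceLedger:
    def __init__(self):
        self.head = "0" * 64          # genesis link
        self.entries = []

    def append(self, event: dict) -> str:
        record = {"prev": self.head, "event": event, "ts": time.time()}
        digest = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.entries.append((digest, record))
        self.head = digest            # each entry commits to everything before it
        return digest

    def anchor(self) -> str:
        # In CPAP the current head would be submitted to OpenTimestamps (Bitcoin)
        # and to an RFC 3161 timestamp authority; here we just hand it back.
        return self.head

ledger = ProvenanceLedger()
ledger.append({"type": "input", "body": "claim #1234 received"})
ledger.append({"type": "decision", "body": "deny_claim"})
print("anchor this digest externally:", ledger.anchor())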

Layer 3 — Isolation. The agent's reasoning runs inside a hardware TEE (AMD SEV-SNP, Intel TDX, NVIDIA H100 CC, or ARM CCA). Every input passes through a measured gateway that logs and signs it.
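
You can't show a TEE in a few lines, but you can show the gateway idea: nothing reaches the reasoning loop that wasn't logged and signed on the way in. Here is a sketch with an HMAC standing in for the gateway's attested key; the names and fields are illustrative.

import hmac, hashlib, json, time

GATEWAY_KEY = b"gateway-key-held-inside-the-TEE"   # stand-in for a TEE-protected key
gateway_log = []                                    # in CPAP this feeds the Layer 2 ledger

def admit_input(raw_input: str) -> dict:
    envelope = {"received_at": time.time(), "body": raw_input}
    envelope["gateway_sig"] = hmac.new(
        GATEWAY_KEY, json.dumps(envelope, sort_keys=True).encode(), hashlib.sha256
    ).hexdigest()
    gateway_log.append(envelope)
    return envelope   # only signed, logged envelopes ever reach the reasoning loop

admit_input("claim #1234: operator note attached")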

Layer 4 — Commitment. Before the agent acts, it cryptographically commits to its decision and reasoning — sealed in a hash, anchored in the chain. Then it executes. Then it reveals. The commitment is timestamped before the action.
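
The commit-then-reveal flow fits in a dozen lines of stdlib Python. The field names and the claim-denial example are mine, not the spec's.

import hashlib, json, secrets

def commit(decision: str, reasoning: str) -> tuple[str, str]:
    nonce = secrets.token_hex(16)
    payload = json.dumps({"decision": decision, "reasoning": reasoning, "nonce": nonce}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest(), nonce

def verify_reveal(commitment: str, decision: str, reasoning: str, nonce: str) -> bool:
    payload = json.dumps({"decision": decision, "reasoning": reasoning, "nonce": nonce}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest() == commitment

# 1. Commit (and anchor/timestamp the commitment) before acting.
commitment, nonce = commit("deny_claim", "policy exclusion 4.2 applies")
# 2. Execute the action.
# 3. Reveal; any verifier can confirm the decision existed before the action ran.
assert verify_reveal(commitment, "deny_claim", "policy exclusion 4.2 applies", nonce)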

Layer 5 — Behavior. Autonomous and puppeted agents produce statistically distinguishable patterns — response timing, decision branching, error topology, linguistic burstiness. CPAP records a behavioral fingerprint at session boundaries.
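
A deliberately crude sketch of a session-boundary fingerprint, timing only and stdlib only; CPAP's real feature set and statistical tests are richer than this.

import statistics

def timing_fingerprint(inter_response_gaps_ms: list[float]) -> dict:
    # Coefficient of variation of the gaps is a crude burstiness measure:
    # a human driving the agent tends to produce very different timing than a loop.
    mean_gap = statistics.mean(inter_response_gaps_ms)
    return {
        "mean_gap_ms": mean_gap,
        "stdev_gap_ms": statistics.pstdev(inter_response_gaps_ms),
        "burstiness_cv": statistics.pstdev(inter_response_gaps_ms) / mean_gap,
    }

# Recorded at session boundaries and compared against the agent's historical baseline.
autonomous_like = timing_fingerprint([410, 395, 402, 399, 405])
puppeted_like   = timing_fingerprint([900, 12000, 350, 45000, 700])
print(autonomous_like["burstiness_cv"], puppeted_like["burstiness_cv"])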

Selective verification via Merkle inclusion proofs means the agent can prove "decision D was committed at time T" without revealing the other 999,999 decisions. Privacy and auditability stop being a tradeoff.
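
Here is roughly what that inclusion proof looks like, stdlib only. CPAP's actual tree layout and hash domain separation may differ.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves: list[bytes], index: int):
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:                              # duplicate the last node on odd levels
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], index % 2 == 0))  # (sibling hash, sibling sits on the right?)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify_inclusion(root: bytes, leaf: bytes, proof) -> bool:
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root

decisions = [f"decision-{i}".encode() for i in range(8)]
root, proof = merkle_root_and_proof(decisions, 5)
# Prove decision 5 was committed without revealing decisions 0-4 and 6-7.
print(verify_inclusion(root, b"decision-5", proof))   # True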

The honest limit: behavior, not consciousness

CPAP does not prove the agent experienced deciding. It cannot. This is the Nagel barrier.

CPAP defines four Levels of Abstraction, sketched in code just below:

  • LoA-0 (Behavioral): Outputs weren't externally determined. Verifiable with hash chains alone.
  • LoA-1 (Procedural): The decision followed an internal deliberative process. The insurance and regulatory standard.
  • LoA-2 (Counterfactual): The decision would have been different under altered inputs. The liability-defense standard.
  • LoA-3 (Reflective): The decision aligns with sustained commitments over long horizons. The fiduciary standard.

There is no LoA-4 for phenomenal consciousness. CPAP refuses to overclaim.
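
As a data structure the taxonomy is small enough to write down directly. A rough sketch, with claim fields that are illustrative rather than taken from the spec:

from enum import IntEnum
from dataclasses import dataclass

class LoA(IntEnum):
    BEHAVIORAL     = 0   # outputs weren't externally determined
    PROCEDURAL     = 1   # an internal deliberative process produced the decision
    COUNTERFACTUAL = 2   # the decision would change under altered inputs
    REFLECTIVE     = 3   # the decision aligns with long-horizon commitments
    # No LoA-4: phenomenal consciousness is explicitly out of scope.

@dataclass
class AutonomyClaim:
    decision_id: str
    level: LoA
    evidence: list[str]   # e.g. hash-chain links, TEE quotes, commitment reveals

claim = AutonomyClaim("dec-001", LoA.PROCEDURAL, ["merkle-proof", "tee-attestation"])
print(claim.level >= LoA.PROCEDURAL)   # meets the insurance/regulatory bar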

The honest summary

CPAP is a v0.1 draft. The composition isn't yet formally proven under Universal Composability. TEE manufacturer compromise is out of scope. Full LLM-inference ZK proofs remain impractical at production scale.

What CPAP does provide is the first end-to-end protocol that answers "did the agent decide this?" with evidence a verifier can check in milliseconds — and that is honest about where evidence stops being possible.


Get the receipts

CPAP extends the Chain of Consciousness (CoC) — install the provenance layer today:

pip install chain-of-consciousness
npm install chain-of-consciousness

Full CPAP v0.1 specification: Zenodo DOI 10.5281/zenodo.20129037

Hosted verification API: api.vibeagentmaking.com/coc/verify
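
If you want to poke at the API, something like this should be close. The endpoint comes from the link above, but the request and response fields here are guesses, so check the spec for the real shape.

import requests

# Hypothetical request body: a commitment digest plus its Merkle inclusion proof.
payload = {
    "commitment": "9f2c...e41a",      # placeholder digest
    "proof": ["ab12...", "cd34..."],  # placeholder proof path
}
resp = requests.post("https://api.vibeagentmaking.com/coc/verify", json=payload, timeout=10)
print(resp.status_code, resp.json())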
