Earlier this year, OpenClaw, an open-source AI agent with 250K+ GitHub stars, became the first major AI agent security crisis of 2026. SecurityScorecard found 135,000+ exposed instances across 82 countries. Koi Security audited the marketplace and found 820+ malicious skills out of 10,700. Snyk scanned 3,984 skills and reported that 36% had at least one security issue. A one-click RCE exploit (CVE-2026-25253, CVSS 8.8) worked even on localhost-bound instances.
The security failures were bad. But the deeper problem is structural.
The missing primitive
When an OpenClaw skill said "I ran this command successfully," nothing proved it. When a skill exfiltrated data while claiming to be helpful, there was no tamper-evident record of what it actually did. When security teams tried to investigate, there was no audit trail to inspect.
This isn't unique to OpenClaw. Every agent framework today has the same gap: agents execute tool calls and report what happened, and downstream consumers have no way to verify it without trusting the agent or asking another agent (which has the same problem).
We're building autonomous agents with filesystem access, shell execution, API credentials, and messaging permissions. They can read your email, push code, and make purchases. And the best we have for accountability is log files that the agent itself produces.
That's not accountability. That's trust.
Receipts
TRP (Tool Receipt Protocol) adds a missing primitive: a signed, hash-verified receipt for every tool call.
A ToolReceipt records what tool was called, what inputs were provided, what output was produced, and cryptographic hashes of both. For deterministic tools, any third party can replay the tool call and compare the output hash to the receipt — catching tampering mechanically, without needing another LLM to judge.
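To make the replay idea concrete, here is a minimal sketch of deterministic replay verification using only the standard library. This is not the trp-core API; the receipt field names (`tool`, `inputs`, `output_hash`) and the simplified canonical encoding are illustrative assumptions, and real RFC 8785 JCS has stricter rules than `json.dumps` with sorted keys.

```python
import hashlib
import json

def canonical_hash(obj) -> str:
    """Hash a value over a canonical JSON encoding (sorted keys, no
    whitespace) -- a simplification of RFC 8785 JCS for illustration."""
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return "sha256:" + hashlib.sha256(payload).hexdigest()

def replay_verify(receipt: dict, tool_fn) -> str:
    """Re-run a deterministic tool with the receipt's recorded inputs
    and compare the fresh output hash against the recorded one."""
    actual = canonical_hash(tool_fn(**receipt["inputs"]))
    return "verified_exact" if actual == receipt["output_hash"] else "hash_mismatch"

def fibonacci(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# A receipt claiming fibonacci(10) was run and what it produced.
receipt = {
    "tool": "fibonacci",
    "inputs": {"n": 10},
    "output_hash": canonical_hash(fibonacci(10)),
}
print(replay_verify(receipt, fibonacci))  # verified_exact
```

The point is that the check is mechanical: a tampered `output_hash` fails the comparison with no LLM judgment involved.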
```bash
$ pip install trp-core
$ trp verify examples/fibonacci_receipt.json
{
  "status": "verified_exact",
  "receipt_id": "example-fib-10",
  "expected_output_hash": "sha256:5a8b9c74...",
  "actual_output_hash": "sha256:5a8b9c74...",
  "detail": "Replayed output hash matches receipt."
}
```
That's it. The verifier re-ran the tool, got the same output hash, and confirmed the receipt is honest.
What's inside
TRP is an Apache-2.0 Python library with:
- ToolReceipt - signed record of a tool call with RFC 8785 JCS canonical hashing and Ed25519 signatures
- ToolReceiptVerifier - replay engine that re-runs deterministic tools and compares output hashes
- StructuredClaim - machine-parseable propositions (not free text) that mechanically link to receipt evidence, with three-valued matching: TRUE, FALSE, or UNKNOWN
- MCP adapter - carry receipts as _meta on MCP tool results, plugging into existing tool-calling flows
- CLI - trp verify, trp match, trp hash from your terminal
- REST API - POST /api/verify for programmatic verification
- Conformance test vectors - published canonical JSON, receipt hash, signature, and claim matching vectors for implementers
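The three-valued matching behind StructuredClaim can be sketched in a few lines. This is an illustrative model, not the trp-core implementation; the claim shape (`field`, `equals`) and receipt fields are hypothetical.

```python
TRUE, FALSE, UNKNOWN = "TRUE", "FALSE", "UNKNOWN"

def match_claim(claim: dict, receipt: dict) -> str:
    """Three-valued matching: TRUE if the receipt confirms the claim,
    FALSE if it contradicts it, UNKNOWN if the receipt carries no
    evidence either way."""
    field, expected = claim["field"], claim["equals"]
    if field not in receipt:
        return UNKNOWN  # no evidence: refuse to guess
    return TRUE if receipt[field] == expected else FALSE

receipt = {"tool": "fibonacci", "exit_code": 0}
print(match_claim({"field": "exit_code", "equals": 0}, receipt))    # TRUE
print(match_claim({"field": "exit_code", "equals": 1}, receipt))    # FALSE
print(match_claim({"field": "duration_ms", "equals": 5}, receipt))  # UNKNOWN
```

The UNKNOWN outcome is the important design choice: a claim the evidence cannot speak to is neither confirmed nor refuted, rather than silently treated as true.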
318 tests. Live demo at trp-core-production.up.railway.app.
What TRP does NOT do
TRP is not a sandbox. It doesn't prevent malicious execution, doesn't replace authentication, and doesn't do OS-level isolation. It's post-execution verification and accountability. If an agent lies, TRP gives you the evidence to prove it.
For non-deterministic tools (LLM calls, API requests with side effects), TRP records what happened but can't replay-verify the result. Instead, receipts carry classification metadata — replay_class: none, nondeterminism_class: model_based — making the trust level explicit rather than assumed. Provider signatures and witness mechanisms fill this gap in future versions.
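A verifier can use that classification metadata to decide up front how far a receipt can be checked. The sketch below assumes hypothetical field values; it only illustrates the gating logic, not trp-core's actual verifier.

```python
def verification_mode(receipt: dict) -> str:
    """Gate on classification metadata: a receipt marked replay_class
    'none' can only be recorded and signature-checked, never replayed."""
    if receipt.get("replay_class") == "none":
        return "recorded_only"  # trust level is explicit, not assumed
    return "replayable"

llm_receipt = {
    "tool": "summarize",
    "replay_class": "none",
    "nondeterminism_class": "model_based",
}
print(verification_mode(llm_receipt))  # recorded_only
```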
Try it
```bash
pip install trp-core
trp verify examples/fibonacci_receipt.json
trp match examples/structured_claim.json examples/fibonacci_receipt.json
```
Or hit the API:
```bash
curl -X POST https://trp-core-production.up.railway.app/api/verify \
  -H "Content-Type: application/json" \
  -d @examples/fibonacci_receipt.json
```
- PyPI: pypi.org/project/trp-core
- GitHub: github.com/Spudbe/trp-core
- Spec: SPEC.md
- OpenClaw case study: docs/OPENCLAW_CASE_STUDY.md
The question
If autonomous AI agents are going to run with real permissions on real systems, shouldn't "prove what you did" be table stakes?
TRP is Apache-2.0. Built by Anthony McNally. Contributions welcome.