AI agents are no longer passive.
They execute shell commands, modify files, call APIs, and trigger real-world actions.
Now consider this:
Your agent deletes production data. You check the logs. Logs say: "No destructive action executed."
Now what?
The Real Problem
Logs are not evidence. They are:
- editable
- reorderable
- controlled by the same system that produced them
A log is just a story told after the fact. And with AI agents? That story may not be trustworthy.
Failure Scenario
Here's what actually executed:
1. read config
2. call API
3. rm production.db
Here's what the logs showed:
1. read config
2. call API
# <missing>
Was step 3 never executed? Removed? Corrupted?
You cannot prove anything.
What "Proof" Requires
For logs to become evidence, they must be:
- tamper-evident
- sequential
- independently verifiable
The Idea: Hash-Chained Execution
Each action is:
- canonicalized (RFC 8785)
- hashed (SHA-256)
- linked to the previous entry
- signed (Ed25519)
Entry 0 → Entry 1 → Entry 2 → ...
Modify anything — the chain breaks instantly.
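The append path above can be sketched in a few lines of Python. This is a minimal illustration, not guardclaw's actual format: the entry schema (`seq`, `action`, `prev`, `hash`) and the function names are hypothetical, the canonicalization is a crude stand-in for full RFC 8785 (which also normalizes numbers and strings), and the Ed25519 signing step is left as a comment because it needs a third-party library such as PyNaCl.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for entry 0

def canonicalize(body: dict) -> bytes:
    # Stand-in for RFC 8785 (JCS): sorted keys, no insignificant whitespace.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def append_entry(ledger: list, action: str) -> dict:
    # Link each entry to the previous one by embedding its hash.
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"seq": len(ledger), "action": action, "prev": prev}
    entry = dict(body, hash=hashlib.sha256(canonicalize(body)).hexdigest())
    # A real ledger would also Ed25519-sign entry["hash"] here (e.g. with PyNaCl).
    ledger.append(entry)
    return entry

ledger = []
for action in ["read config", "call API", "rm production.db"]:
    append_entry(ledger, action)
```

Because each entry's hash covers `prev`, rewriting any record invalidates every record after it.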
Demo
$ guardclaw verify ledger.jsonl
✓ VALID — 1024 entries
Edit one byte:
$ guardclaw verify ledger.jsonl
✗ CHAIN BREAK at entry 47
No ambiguity.
What This Guarantees
✅ Order of execution
✅ Integrity of records
✅ No silent modification
What It Does NOT Guarantee
❌ Correctness
❌ Safety
❌ Truthful inputs
Integrity ≠ intelligence. This is not observability. It's an integrity layer for agent execution — similar to what Git does for code history, or Certificate Transparency does for TLS.
Open Question
If your AI agent deletes data, sends money, or executes infrastructure changes —
how do you prove what actually happened?
What I'm Building
I've been exploring this problem with an open-source project:
👉 github.com/viruswami5511/guardclaw
Would love feedback from anyone running agents in production.