AI agents are no longer passive.
They execute shell commands, modify files, call APIs, and trigger real-world actions.
Now consider this:
Your agent deletes production data. You check the logs. Logs say: "No destructive action executed."
Now what?
The Real Problem
Logs are not evidence. They are:
- editable
- reorderable
- controlled by the same system that produced them
A log is just a story told after the fact. And with AI agents? That story may not be trustworthy.
Failure Scenario
Here's what actually executed:
1. read config
2. call API
3. rm production.db
Here's what the logs showed:
1. read config
2. call API
# <missing>
Was step 3 never executed? Removed? Corrupted?
You cannot prove anything.
What "Proof" Requires
For logs to become evidence, they must be:
- tamper-evident
- sequential
- independently verifiable
The Idea: Hash-Chained Execution
Each action is:
- canonicalized (RFC 8785)
- hashed (SHA-256)
- linked to the previous entry
- signed (Ed25519)
Entry 0 → Entry 1 → Entry 2 → ...
Modify anything — the chain breaks instantly.
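The append path above can be sketched in a few lines of Python. This is a minimal illustration, not guardclaw's actual format: the entry schema (`seq`, `action`, `prev`, `hash`) and the function names are hypothetical, the canonicalization is a crude stand-in for full RFC 8785 (which also normalizes numbers and strings), and the Ed25519 signing step is left as a comment because it needs a third-party library such as PyNaCl.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for entry 0

def canonicalize(body: dict) -> bytes:
    # Stand-in for RFC 8785 (JCS): sorted keys, no insignificant whitespace.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def append_entry(ledger: list, action: str) -> dict:
    # Link each entry to the previous one by embedding its hash.
    prev = ledger[-1]["hash"] if ledger else GENESIS
    body = {"seq": len(ledger), "action": action, "prev": prev}
    entry = dict(body, hash=hashlib.sha256(canonicalize(body)).hexdigest())
    # A real ledger would also Ed25519-sign entry["hash"] here (e.g. with PyNaCl).
    ledger.append(entry)
    return entry

ledger = []
for action in ["read config", "call API", "rm production.db"]:
    append_entry(ledger, action)
```

Because each entry's hash covers `prev`, rewriting any record invalidates every record after it.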
Demo
$ guardclaw verify ledger.jsonl
✓ VALID — 1024 entries
Edit one byte:
$ guardclaw verify ledger.jsonl
✗ CHAIN BREAK at entry 47
No ambiguity.
What This Guarantees
✅ Order of execution
✅ Integrity of records
✅ No silent modification
What It Does NOT Guarantee
❌ Correctness
❌ Safety
❌ Truthful inputs
Integrity ≠ intelligence. This is not observability. It's an integrity layer for agent execution — similar to what Git does for code history, or Certificate Transparency does for TLS.
Open Question
If your AI agent deletes data, sends money, or executes infrastructure changes —
how do you prove what actually happened?
What I'm Building
I've been exploring this problem with an open-source project:
👉 github.com/viruswami5511/guardclaw
Would love feedback from anyone running agents in production.