DEV Community

Viru Swami

Your AI Agent Can Delete Production — Can You Prove It?

AI agents are no longer passive.

They execute shell commands, modify files, call APIs, and trigger real-world actions.

Now consider this:

Your agent deletes production data. You check the logs. The logs say: "No destructive action executed."

Now what?


The Real Problem

Logs are not evidence. They are:

  • editable
  • reorderable
  • controlled by the same system that produced them

A log is just a story told after the fact. And with AI agents? That story may not be trustworthy.


Failure Scenario

Here's what actually executed:

1. read config
2. call API
3. rm production.db

Here's what the logs showed:

1. read config
2. call API
# <missing>

Was step 3 never executed? Removed? Corrupted?

You cannot prove anything.


What "Proof" Requires

For logs to become evidence, they must be:

  • tamper-evident
  • sequential
  • independently verifiable

The Idea: Hash-Chained Execution

Each action is:

  • canonicalized (RFC 8785)
  • hashed (SHA-256)
  • linked to the previous entry
  • signed (Ed25519)

Entry 0 → Entry 1 → Entry 2 → ...

Modify any entry and verification fails at the first entry that no longer matches.
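
The canonicalize, hash, and link steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GuardClaw's actual implementation: the field names (`seq`, `action`, `prev_hash`, `hash`), the `GENESIS` placeholder, and the `append_entry` helper are hypothetical; `json.dumps` with sorted keys only approximates RFC 8785 canonicalization; and Ed25519 signing is left out because it needs a third-party library.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def canonical_bytes(obj: dict) -> bytes:
    # Approximation of RFC 8785: sorted keys, no insignificant whitespace.
    # A full JCS implementation also pins down number and string serialization.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def append_entry(ledger: list, action: str) -> dict:
    """Append an action whose hash covers its content and the previous hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    body = {"seq": len(ledger), "action": action, "prev_hash": prev_hash}
    entry = dict(body, hash=hashlib.sha256(canonical_bytes(body)).hexdigest())
    ledger.append(entry)
    return entry


ledger = []
for action in ["read config", "call API", "rm production.db"]:
    append_entry(ledger, action)

# Each entry's prev_hash is the hash of the entry before it.
for entry in ledger:
    print(entry["seq"], entry["action"], entry["hash"][:12])
```

Because every hash covers the previous entry's hash, changing an earlier record changes every hash that should follow it. In a signed design, the signature would presumably cover the same canonical bytes or the resulting hash; that part is not shown here.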


Demo

$ guardclaw verify ledger.jsonl
✓ VALID — 1024 entries

Edit one byte:

$ guardclaw verify ledger.jsonl
✗ CHAIN BREAK at entry 47

No ambiguity.
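
To make the kind of check behind a verify command more concrete, here is a hedged sketch of chain verification in Python. It does not reproduce GuardClaw's ledger format or its output; the entry layout and the `build_ledger` and `first_break` helpers are hypothetical, the canonicalization is only an approximation of RFC 8785, and signature checking is omitted.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(body: dict) -> str:
    # Hash a canonical-ish JSON encoding (sorted keys, compact separators).
    data = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def build_ledger(actions: list) -> list:
    """Build a small hash-chained ledger to verify against."""
    ledger, prev = [], GENESIS
    for seq, action in enumerate(actions):
        body = {"seq": seq, "action": action, "prev_hash": prev}
        prev = entry_hash(body)
        ledger.append(dict(body, hash=prev))
    return ledger


def first_break(ledger: list) -> Optional[int]:
    """Return the index of the first inconsistent entry, or None if the chain holds."""
    prev = GENESIS
    for i, entry in enumerate(ledger):
        body = {k: v for k, v in entry.items() if k != "hash"}
        if (entry["seq"] != i
                or entry["prev_hash"] != prev
                or entry_hash(body) != entry["hash"]):
            return i
        prev = entry["hash"]
    return None


ledger = build_ledger(["read config", "call API", "rm production.db", "send report"])
intact = first_break(ledger)

edited = [dict(e) for e in ledger]
edited[1]["action"] = "call API (harmless)"  # rewrite one record in place
edited_break = first_break(edited)

dropped = ledger[:2] + ledger[3:]  # silently remove the entry at index 2
dropped_break = first_break(dropped)

print(intact, edited_break, dropped_break)
```

Run as written, this prints `None 1 2`: the untouched ledger verifies, the edited record is flagged at index 1, and the removed record is flagged at index 2. The sketch does not detect removal of the most recent entries, since nothing later in the chain depends on them; catching that would need the latest hash stored or signed somewhere outside the ledger.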


What This Guarantees

✅ Order of execution

✅ Integrity of records

✅ No silent modification

What It Does NOT Guarantee

❌ Correctness

❌ Safety

❌ Truthful inputs

Integrity ≠ intelligence. This is not observability. It's an integrity layer for agent execution, similar to what Git does for code history or what Certificate Transparency does for TLS certificate issuance.


Open Question

If your AI agent deletes data, sends money, or executes infrastructure changes —

how do you prove what actually happened?


What I'm Building

I've been exploring this problem with an open-source project:

👉 github.com/viruswami5511/guardclaw

Would love feedback from anyone running agents in production.
