Radoslav Tsvetkov

Posted on Jun 13

How to prove what your AI agent actually did (to someone who doesn't trust you)

#ai #rust #opensource #security

You let an AI agent edit your repo, run commands, call tools. The next morning someone asks: what did it actually do? And you realize the only honest answer you have is "trust me."

That answer does not survive contact with a client, a security reviewer, or an auditor. I got tired of it, so I built a tool that produces an answer a skeptic can check on their own machine. This is a walkthrough of how it works, with commands you can run in about two minutes.

The tool is soma, an open-source (Apache-2.0) runtime. The interesting part, though, is not the tool but the four ideas it stitches together. You could rebuild any of them yourself; the point of this post is the pattern.

The problem, stated precisely

Every big vendor shipped agent governance this year (Microsoft Agent 365, OpenAI Frontier, Google's agent platform). They are good at managing fleets. But the audit trail they produce lives inside the vendor's cloud. To believe the log, you have to believe the cloud that produced it.

That is fine until the person asking does not trust your cloud, your machine, or you. A freelancer cannot show a client their M365 tenant. Some regulated teams cannot put an American cloud in the trust path. An auditor wants to verify, not to take your word.

So the design goal is unusual: evidence that survives leaving your trust boundary. A record that a stranger can verify with tools they already have, no special software, no account.

The four ideas

1. Gate at the spawn boundary, not inside the agent

Agents are good at finding paths around guardrails you put inside the loop. So the check happens before the process exists.

# Install: one binary, zero dependencies to fetch.
git clone https://github.com/radotsvetkov/soma && cd soma
cargo build --release
alias soma="$PWD/target/release/soma"   # quoted so a path with spaces still works

soma init --name demo --with-builtins

# Govern any agent CLI. The launch is checked before anything runs.
soma wrap --label fix-tests -- claude -p "fix the failing tests"

If the command matches a deny rule or the autonomy level forbids it, nothing spawns:

soma wrap -- sudo rm -rf /tmp/something
# refused, exit non-zero, and the refusal is recorded with the rule that fired

The key move: the agent never gets a chance to run, and the refusal itself is part of the record. You can prove later that something was blocked.
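To make the ordering concrete, here is a minimal sketch of a spawn-boundary gate in Python. It is not soma's implementation: the rule format (shell-style patterns matched against the joined command line), the `DENY_RULES` list, and the in-memory `journal` list are all assumptions made for illustration. The one property it does share with the description above is that the check runs before any process exists, and a refusal is recorded along with the rule that fired.

```python
import fnmatch
import shlex
import subprocess

# Hypothetical deny rules: shell-style patterns matched against the full command line.
DENY_RULES = ["sudo *", "rm -rf /*"]


def check_launch(argv, deny_rules=DENY_RULES):
    """Return the first deny rule that matches argv, or None if the launch is allowed."""
    command_line = shlex.join(argv)
    for rule in deny_rules:
        if fnmatch.fnmatchcase(command_line, rule):
            return rule
    return None


def wrap(argv, journal):
    """Check the command first; spawn only if no rule fires. Record either outcome."""
    rule = check_launch(argv)
    if rule is not None:
        # Refused: nothing is spawned, and the refusal itself becomes evidence.
        journal.append({"event": "refused", "argv": argv, "rule": rule})
        return 1
    journal.append({"event": "spawned", "argv": argv})
    return subprocess.run(argv).returncode
```

Calling `wrap(["sudo", "rm", "-rf", "/tmp/something"], journal)` returns a non-zero code without ever reaching `subprocess.run`, and leaves one `refused` entry in the journal.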

2. Hash-chain the log so tampering is detectable

Every action becomes a line in an append-only JSONL file. Each line carries its own SHA-256 and the hash of the previous line. That makes the file a hash chain: change any line and its stored hash no longer matches, and fixing that hash breaks the link from the line after it.

soma log verify
# recomputes the whole chain from scratch

Let's break it on purpose:

# edit one word of history
sed -i.bak 's/governed/harmless/' .soma/events.jsonl

soma log verify
# TAMPERED at line 7: stored hash mismatch   (exits 1, names the exact line)

# restore it byte for byte
mv .soma/events.jsonl.bak .soma/events.jsonl
soma log verify
# chain valid again

This is the part that makes a demo land. You can watch the tampering get caught at the exact line, then watch the chain verify again once the original bytes are restored.

3. Make the evidence verifiable without the tool

A hash chain you can only check with my tool is not much of a proof. So the export bundles everything a stranger needs, plus a VERIFY.md that spells out the checks in plain shell.

soma export
# produces a bundle in exports/ with manifest.json, events.jsonl, VERIFY.md

On the skeptic's machine, with no soma installed, two checks:

# 1. every file digest matches the manifest
shasum -a 256 events.jsonl   # compare against manifest.json

# 2. recompute the chain by hand: for each line, sha256 of the line with its
#    own hash field removed must equal the stored hash, and each "prev" must
#    equal the previous line's hash.

The verification cost is seconds, on their hardware, with shasum and a JSON reader. No trust in me required. That is the whole point.
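The first check is easy to script with nothing but a stock Python install. The sketch below assumes a manifest shaped like `{"files": {"<name>": "<sha256 hex>"}}`; that layout is a guess made for illustration, so adjust the lookup to whatever the real manifest.json contains. The helper names `file_sha256` and `check_bundle` are likewise made up here.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path):
    """Stream a file through SHA-256; same digest as `shasum -a 256 <path>` prints."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_bundle(bundle_dir):
    """Return the names of files whose digest differs from the manifest entry."""
    bundle = Path(bundle_dir)
    manifest = json.loads((bundle / "manifest.json").read_text(encoding="utf-8"))
    # Assumed layout: {"files": {"<name>": "<sha256 hex>", ...}}
    return [
        name
        for name, expected in manifest["files"].items()
        if file_sha256(bundle / name) != expected
    ]
```

An empty list from `check_bundle("exports/<bundle>")` means every listed file matches its recorded digest; any name in the list is a file that changed after export.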

4. Anchor the chain so even the operator cannot rewrite history

Here is the honest hole in everything above: I control the binary, the journal, and the machine. The hash chain proves the file was not casually edited. It does not prove I did not regenerate the whole thing.

That is what timestamp anchoring is for. soma submits the chain head to a public RFC 3161 timestamp authority and stores the response.

soma preset apply hybrid-default   # opt in to network (fresh projects are local-only)
soma anchor now                    # timestamps the chain head at a public TSA

Now anyone can verify the timestamp with stock openssl against the authority's certificate. After an anchor, I cannot backdate or rewrite history that crosses it without the public timestamps contradicting me.

What this proves: this exact journal state existed at time T, and everything after extends it. What it does not prove: that the events were true when written. Nothing prevents an operator from writing fiction going forward; the anchor prevents revising it after the fact. That is a much smaller claim than "trust my logs," but it is one a skeptic can check.
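For a sense of how little is sent to the authority, here is a sketch that builds a minimal RFC 3161 `TimeStampReq` for a SHA-256 chain head by hand. It is an illustration, not soma's encoder: it omits the optional nonce and policy fields that real clients often include, and the function name `timestamp_request` is made up for this example. Only the digest leaves the machine, never the journal contents.

```python
import hashlib

# DER encoding of AlgorithmIdentifier { sha256 (2.16.840.1.101.3.4.2.1), NULL }
SHA256_ALGORITHM_ID = bytes.fromhex("300d06096086480165030402010500")


def _der(tag, content):
    """Wrap content in a DER tag-length-value; short-form lengths only (< 128 bytes)."""
    if len(content) >= 128:
        raise ValueError("long-form DER lengths are not handled in this sketch")
    return bytes([tag, len(content)]) + content


def timestamp_request(chain_head_hex):
    """Build a minimal RFC 3161 TimeStampReq for a SHA-256 chain head (no nonce)."""
    digest = bytes.fromhex(chain_head_hex)
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    version = _der(0x02, b"\x01")  # INTEGER 1
    imprint = _der(0x30, SHA256_ALGORITHM_ID + _der(0x04, digest))  # MessageImprint
    cert_req = _der(0x01, b"\xff")  # BOOLEAN TRUE: ask the TSA to include its certificate
    return _der(0x30, version + imprint + cert_req)
```

The result is 59 bytes, sent as an HTTP POST with content type `application/timestamp-query`. The signed reply is what a skeptic later checks with `openssl ts -verify` against the authority's certificate.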

The honest limits (these are the point)

An evidence tool that oversells itself is broken on arrival, so:

wrap is not a sandbox. It gates the launch and records evidence. A wrapped agent can still do what your OS user can do. For hard isolation, run it under a container or a restricted user.
Anchoring proves no-rewrite, not truth. See above.
Single operator, single machine. No fleets, no agent identity, no RBAC. If you need to govern 10,000 agents across an org, the hyperscaler products are genuinely the right call. soma is for the case where you are the one who has to prove things.
Zero dependencies means hand-rolled crypto. SHA-256 (tested against the FIPS vectors), JSON, and HTTP are written out in src/. TLS deliberately is not; that goes through system curl. The trade is an audit surface you can read in a sitting, and cargo tree is one line. Reviewing the supply chain is reviewing src/.

Why bother

Nobody reads audit logs, which is the usual objection. True. The design does not optimize for reading; it optimizes for disputes. Nobody looks at the journal until something goes wrong, and then the only question is whether the record settles the argument or becomes another argument. A bundle that verifies in seconds on the skeptic's machine settles it.

If you run agents for clients, or compliance keeps asking you questions you can only answer with screenshots, this might be useful. The repo is github.com/radotsvetkov/soma, there is a desktop cockpit at github.com/radotsvetkov/cockpit, and I would genuinely like to hear where it breaks. The honest-limits list grows with every good objection.

What evidence do you wish you had the last time an agent touched something that mattered?
