Here's a pattern I kept hitting with coding agents (Claude Code, Cursor, doesn't matter which):
Session 1: the agent fixes an auth module. Runs the tests. 42/42 pass. Great.
Session 2, next morning: a different agent (or the same one, fresh context) touches the same area. It doesn't know the tests passed yesterday. So it re-reads the file, re-runs the suite, re-derives the same conclusions — burning a few thousand tokens to rediscover state that already existed.
Multiply that by every file, every session, every agent in a multi-agent setup. Agents have no memory for what's been verified. Everything is Groundhog Day.
The fix: stamps as cached verification
A stamp is ~15 tokens of JSON that travels with the file (YAML frontmatter in markdown, XMP in images, OOXML part in docx, sidecar for everything else):
bash
# Session 1 — agent finishes, stamps what it verified
$ akf stamp auth.py --agent claude-code --evidence "42/42 tests passed"
# Session 2 — any agent, any tool, checks before building on it
$ akf check auth.py
OK trust=0.65 agent=claude-code evidence=test_pass age=1d claims=1
One line, ~20 tokens to read, exit codes for scripting (0 = trust it, 1 = re-verify, 2 = no metadata). The math that made me build this: a stamp costs ~15 tokens; re-verifying costs 15,000.
The parts that turned out to matter
Evidence is weighted, not binary. A bare "AI generated this" stamp scores low no matter how confident the model claims to be — that's by design, it's unverified inference. A test-run receipt lifts it. A human review lifts it more:
$ akf stamp report.md --agent claude-code # bare
$ akf check report.md
LOW trust=0.21 reason=below_threshold # don't trust it
$ akf stamp report.md --evidence "tests pass" # verified
$ akf check report.md
OK trust=0.65 evidence=test_pass
Staleness is mechanical. If the file changed after it was stamped, the stamp no longer describes the file:
$ echo "quick edit" >> auth.py
$ akf check auth.py
STALE trust=0.65 reason=modified_after_stamp # exit 1 → re-verify
Memories decay. This one surprised me with how useful it is. Agent memory files get a trust half-life — a fact stamped three months ago about a since-refactored codebase shouldn't rank like one from yesterday:
$ akf stamp memory/project-facts.md --preset memory --agent claude-code
# fresh: OK trust=0.75 age=2d
# two months: LOW trust=0.17 age=61d → agent re-verifies instead of trusting a stale memory
Zero-effort mode. Nobody stamps by hand at scale, including me. akf init wires a git post-commit hook and a Claude Code hook so every file an agent writes gets stamped automatically.
What I'll be honest about
- Stamps are claims, not proofs. An agent can lie in a stamp exactly like a developer can lie in a commit message. There's Ed25519 signing for the who, but "were the tests actually run" is an accountability trail, not a cryptographic guarantee.
- Staleness is mtime-based right now, which false-positives on fresh git checkouts. Content-hash comparison is the next fix.
- It's a metadata format, so it has the standard chicken-and-egg problem. The single-player value (your own agents across your own sessions) has to carry it until the network effect exists — that's exactly why check had to be worth running selfishly.
Try it
pip install akf # or: npm install akf-format
akf init # hooks for git + Claude Code
akf quickstart # 60-second demo
There's an MCP server (https://pypi.org/project/mcp-server-akf/) (works with Claude Desktop/Code, Cursor, any MCP client), a GitHub Action (https://github.com/marketplace/actions/akf-certify) that posts trust reports on PRs, and the spec + SDKs on GitHub (https://github.com/HMAKT99/AKF). MIT licensed.
I'd genuinely like to hear where this model breaks — especially from anyone running multi-agent pipelines.
Top comments (1)
The 15-tokens-vs-15k economic framing is the right lens, and staleness-by-file-hash is the clever half. The gap I would poke at is transitive invalidation. You stamp auth.py green, a helper it imports changes, auth.py's own bytes never move, so the stamp stays OK while the thing it verified is no longer true. Content-hashing one file catches direct edits but not the dependency closure.
Two options that worked for us on a similar cache: hash the import set into the stamp so a changed dependency flips it stale, or fall back to a coarse mtime-of-any-dep check when you can't resolve imports. Also curious how you stop stamp inflation, since the cheapest way for an agent to earn a green check is to run a weak test and stamp evidence=test_pass. The weight has to key off coverage or it turns into a rubber stamp.