keesan.eth
AI coding agents need receipts, not just better prompts

AI coding agents are getting good enough to run real engineering tasks, but not safe enough to run without guardrails.

The failure mode is not always dramatic.

Sometimes the agent just keeps working.

It retries.
It rewrites.
It spends tokens.
It changes files.
It says it is done.

Then another engineer opens the diff and realizes the agent solved the wrong problem.

That creates a new engineering question:

Can another engineer audit this run later?

That is why I’m building MartinLoop.

MartinLoop is an open-source control plane for AI coding agents. The goal is to make every agent run bounded, inspectable, and test-verifiable.

The first version focuses on:

  • hard budget caps
  • JSONL run records
  • audit trails
  • failure classification
  • test-verified completion
  • reproducible agent runs

The thesis is simple:

The next layer of AI coding is not only better prompts.

It is governance.

Before agents touch serious repos, teams need receipts:

  • what the agent tried
  • what it changed
  • how much it spent
  • what commands it ran
  • what tests passed
  • what failed
  • why it stopped
  • whether a human can resume, revert, or rerun it
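As a starting point for discussion, here is a rough sketch of that receipt as a data structure, mapping the fields above one for one. Every name and type is an assumption, not the project's actual format.

```python
# A rough sketch of a default "agent receipt"; names and types are illustrative.
from dataclasses import dataclass, field

@dataclass
class AgentReceipt:
    run_id: str
    goal: str                                               # what the agent was asked to do
    attempts: list[str] = field(default_factory=list)       # what it tried
    files_changed: list[str] = field(default_factory=list)  # what it changed
    tokens_spent: int = 0                                    # how much it spent
    commands_run: list[str] = field(default_factory=list)   # what commands it ran
    tests_passed: list[str] = field(default_factory=list)
    tests_failed: list[str] = field(default_factory=list)
    stop_reason: str = "unknown"                             # why it stopped
    resumable: bool = False                                  # can a human resume, revert, or rerun it?
```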

I’m looking for feedback from developers using Claude Code, Codex, Cursor, Devin-style agents, or custom coding agents in real repos.

What would you want in the default “agent receipt”?

GitHub: https://github.com/Keesan12/Martin-Loop
Site: https://martinloop.com

Top comments (1)

keesan.eth

The hardest part I’m thinking through right now is safe halting.

A dumb token cap can stop an agent mid-change and leave the repo inconsistent.

The better model may be halt boundaries: only check budget at clean state transitions, then stop with an actionable diagnostic.
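Roughly what I have in mind, as a sketch only; the names are placeholders, not the actual implementation:

```python
# Sketch of a "halt boundary" check, assuming a token budget and a notion of
# clean repo state. Function and field names are hypothetical.
class BudgetExceeded(Exception):
    def __init__(self, diagnostic):
        super().__init__(diagnostic)
        self.diagnostic = diagnostic

def maybe_halt(tokens_used, token_cap, repo_is_clean, last_completed_step):
    """Only enforce the cap at a clean state transition, so a halt
    never strands a half-applied change."""
    if repo_is_clean and tokens_used >= token_cap:
        raise BudgetExceeded(
            f"Halted at clean boundary after '{last_completed_step}': "
            f"{tokens_used}/{token_cap} tokens used. Safe to resume, revert, or rerun."
        )
```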

Curious how others would design that.

Trace intelligence is also interesting: once every run emits a structured JSONL record, those records can be mined for recurring failure patterns.