I measured what one self-check per turn changes in a coding agent
I ran a real eval on Self-Inspect (the keyless metathought tool), and the result is
worth writing up because it splits cleanly into what moved and what didn't.
The setup
Two coding agents build the same usage-billing module over a fixed 30-turn
conversation with a product manager whose requirements pile up and quietly
contradict earlier ones. One agent consults Self-Inspect once per turn; the other
never does.
- Model: Claude Sonnet 4.6, four agents (two per condition).
- The base prompt is byte-identical across both conditions. No "be careful," no "watch for edge cases." The only difference is the one call.
- Scored on whether each turn's reply surfaces a decision-fork: an assumption, precondition, edge case, or risk it raises rather than silently choosing.

Top comments (0)