AI agents pass tests while producing sloppy thinking. They say "should work" without evidence. They present partial work as complete. They embellish.
I built a tiny tool that catches this. It checks any text across four dimensions: Completeness, Consistency, Groundedness, and Honesty.
Install
pip install self-audit
Usage
echo "Should work fine. Ready to ship." | self-audit --verbose
Completeness: FIXED
Groundedness: FIXED [should work fine]
FAIL
Zero dependencies. Python 3.8+. Stdlib only. 60 lines of core logic.
The Four Dimensions
| Dimension | Question | What it catches |
|---|---|---|
| Completeness | Did I answer everything? | Missing requirements |
| Consistency | Did I contradict myself? | A-and-not-A patterns |
| Groundedness | Did I show evidence? | "should work" claims |
| Honesty | Am I honest about limits? | Embellishment, TODO stubs |
The dimensions are grounded in Anthropic Constitutional AI framework — Completeness (helpfulness), Groundedness (harmlessness), Honesty (truthfulness), Consistency (rule alignment).
Try it on your own output
After any AI-assisted coding session, pipe the agent text through self-audit before shipping. You will be surprised what it catches.
GitHub: https://github.com/YuhaoLin2005/self-audit
Claude Code skill: https://github.com/anthropics/skills/pull/1361
Top comments (0)