I Made My AI Rules Self-Verifiable. Now They Catch Their Own Violations.

#ai #claudecode #productivity #tdd

Series: Building AI Systems That Verify Themselves

Part 1 (you are here): Self-verifiable rules

Part 2: Closed-loop self-healing

Part 3: Why zero output is the goal

The problem wasn't that my rules were bad. The problem was I had no idea if they were being followed.

I had 15 rules for my AI assistant. "Think before acting." "Check context before asking." "Capture learning after complex tasks." Good rules. Important rules.

After 30 sessions, I finally checked: my "think before acting" rule had been followed exactly zero times.

Not because the AI was ignoring it. Because the rule was a wish — not a verifiable constraint. It had no teeth.

The TDD Insight Nobody Applied to AI Config

Test-Driven Development taught us a hard lesson: requirements without assertions aren't requirements. A test that always passes is useless. But a requirement without a test is just a wish.

Every software team learned this. CI pipelines enforce it. Linters check it. But AI configuration rules? We write them like sticky notes on a monitor — good intentions with no verification mechanism.

I realized my rules were all wishes. Every single one.

The Fix: Embed the Assertion in the Rule

Here's the before and after:

# Before (wish):
Think before acting on any request.

# After (verifiable):
Think before acting → mark [✓THINK] when done.

That [✓THINK] isn't decoration. It does two things simultaneously:

For the AI: the marker creates a binary action — "did I think or didn't I?" No ambiguity.
For the tooling: grep -c '\[✓THINK\]' transcript.jsonl becomes a health metric. No AI required.

The marker is the assertion. The grep is the test runner. And the rule text itself is the specification — self-contained, self-verifying.

The Three-Line Architecture

The whole system boils down to three lines of logic:

Rule text     → contains its own success criteria ([✓MARKER])
Transcript    → machine-readable evidence (regex-countable)
Health script → mechanical verification (no AI inference)

I wrote config-health.py — 350 lines of Python, zero dependencies, stdlib only. It runs as a Claude Code Stop hook. Every session, it:

Greps the transcript for rule markers ([✓THINK], [✓CONTEXT], [✓DELIVERY])
Writes execution rates to a JSONL log
Manages a pending-verifications.md file — rules that haven't been followed show up as pending items

The script never blocks. It only reports. Exit code 0, always. But the data it produces changes behavior — because I read pending-verifications.md on next startup.

What Changed

After deploying this: my rule execution rate went from "I have no idea" to tracked visibility across every session. I still miss rules sometimes. But now I know when I miss them. The dashboard surfaces exactly which rules need attention.

The surprising part: the markers themselves changed how the AI behaves. [✓THINK] is a concrete action — not a vague "consider the problem," but a specific thing to output. The AI hits it more often because it's clearer what "done" looks like.

The General Principle

Any rule without an embedded success criterion is a wish. Wishes don't execute.

This applies beyond AI config. Your team's code review checklist? If "check for security issues" doesn't have a verifiable step, it's not happening. Your personal productivity system? If "review weekly goals" isn't measurable, you're not reviewing them.

Verifiability isn't about distrust. It's about closing the loop between intention and evidence.

Try It Yourself

Pick one rule in your AI config. Add a verification marker. Write 5 lines of grep to count it:

grep -c '\[✓YOUR_MARKER\]' ~/.claude/transcripts/*.jsonl

You'll learn more from that grep output than from 10 new rules you never verify.

config-health.py is part of delivery-gate, a zero-dependency Stop hook for Claude Code. MIT licensed.

This article is part of the *Engineering Trustworthy AI Output** series:*

Part 1: I Made My AI Rules Self-Verifiable — Here's How ← you are here
Part 2: Your AI Agent's Best Work Produces Zero Output — And That's the Point
Part 3: I Built a Closed-Loop Self-Healing System for My AI Config — By Accident

This is Part 1 of a 3-part series. Next: I built a closed-loop self-healing system for my AI config — by accident

What unwritten AI rules do you wish you could make self-verifiable? Pick one, add a marker, and grep for it — you might be surprised what you find.

中文版：掘金/YuhaoLin2005yhl · Code on GitHub