I didn't set out to build a self-healing system. I just wanted to stop forgetting things. What emerged was a feedback loop that fixes itself without me.
After 40+ Claude Code sessions, I noticed a pattern: identify a problem → write a rule → forget about it → rediscover the same problem 5 sessions later → write the same rule again.
The rules folder was growing. But behavior wasn't changing.
The Alertmanager Problem
Prometheus Alertmanager has a concept that changed how I think about systems: alert → acknowledge → fix → auto-resolve.
Without the "auto-resolve" step, alerts pile up. You get a graveyard of acknowledged-but-never-fixed issues. They're not "open" — they're dead. Nobody looks at them. Nobody fixes them. They just sit there, eroding trust in the entire monitoring system.
My AI config had the same problem. Problems were identified. Rules were written. But nothing tracked whether the rules were actually followed — or whether the problem was actually fixed.
The missing piece wasn't more rules. It was a closed loop.
The Accidental Architecture
Three files. That's all it took to close the loop:
config-health.py (Stop hook)
↓ counts markers, detects gaps
pending-verifications.md (persistent memory)
↓ surfaces unresolved items
BODY.md (startup rules, read every session)
↓ tells AI to focus on pending items
↓ AI follows the rule, produces markers
config-health.py (next Stop hook)
↓ detects markers → auto-deletes from pending
The circuit:
- Measure: config-health.py greps the transcript for rule markers like [✓THINK] and [✓CONTEXT]
- Remember: rules with low execution rates get written to pending-verifications.md, so they persist across sessions
- Surface: BODY.md reads pending-verifications.md on startup, so the AI sees "these rules need attention"
- Re-check: on the next Stop hook, config-health.py counts again; if the rule is now being followed, it is auto-deleted from pending
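The measure, remember, and re-check steps can be sketched as one small pure function. This is a minimal sketch, not the actual config-health.py: the function names, the 70% target (borrowed from the THINK example in this article), and the idea of computing a rule's execution rate as markers per turn are all assumptions made for illustration.

```python
import re

# Rule names mentioned in the article; a real config may have more.
RULES = ("THINK", "CONTEXT")
TARGET_RATE = 0.70  # assumed threshold, not taken from the real hook


def count_markers(transcript: str, rule: str) -> int:
    """Measure: count occurrences of a marker such as [✓THINK]."""
    return len(re.findall(re.escape(f"[✓{rule}]"), transcript))


def update_pending(transcript: str, turns: int, pending: set[str]) -> set[str]:
    """Remember and re-check: return the new set of pending rules.

    `turns` is an assumed denominator (how often a rule could have fired).
    """
    updated = set(pending)
    for rule in RULES:
        rate = count_markers(transcript, rule) / turns if turns else 0.0
        if rate > TARGET_RATE:
            updated.discard(rule)  # rule is being followed: auto-resolve
        else:
            updated.add(rule)      # low execution rate: keep it visible
    return updated
```

Because the function returns a new set instead of editing a file, the same code path handles both adding a rule to pending and auto-deleting it once the rate recovers.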
I didn't design this loop. It emerged from three independent decisions:
- Rules should be self-verifiable (embed markers in rule text)
- Monitoring should be silent on success (zero tokens on normal path)
- Unresolved issues should persist across sessions (write to file, read on startup)
Each decision solved an immediate problem. Together, they formed a complete circuit.
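The "silent on success" decision is easy to show in code. A rough sketch, assuming the hook already has a per-rule execution rate in hand; the function name and the 70% default are hypothetical, and the message format simply mirrors the THINK example quoted in this article.

```python
def render_pending(rates: dict[str, float], target: float = 0.70) -> str:
    """Return one line per rule at or below target; empty string if all pass."""
    lines = [
        f"{rule} rule execution: {rate:.0%} (target: >{target:.0%})"
        for rule, rate in sorted(rates.items())
        if rate <= target
    ]
    return "\n".join(lines)
```

When every rule clears the target, the function returns an empty string, so nothing is added to the startup context on the normal path.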
The Self-Healing Part
Here's the part I didn't expect: the loop runs without me.
I don't check dashboards. I don't review reports. I don't maintain a tracking spreadsheet. The system surfaces issues when they exist and silences them when they're fixed. My only job is to follow the rules when BODY.md tells me to.
Last week's session: BODY.md surfaced "THINK rule execution: 40% (target: >70%)." I paid extra attention. Hit [✓THINK] 12 times. Next session, pending-verifications.md was clean — config-health had auto-deleted the entry.
I didn't manually "close" anything. The loop closed itself.
What Makes This Work
Three properties that any self-healing loop needs:
- Measurement is mechanical, not judgment. Regex counting doesn't get tired, doesn't rationalize, doesn't say "eh, probably fine." It counts or it doesn't.
- Memory survives sessions. pending-verifications.md is a file on disk. It doesn't disappear when the AI's context window resets. Problems can't hide by waiting you out.
- Re-check is automatic. You don't have to remember to verify. The hook runs every session. Fixed problems auto-resolve. Only real issues stay visible.
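The memory property needs nothing more than a plain file. A minimal sketch, assuming one "- RULE" bullet per pending rule; the real layout of pending-verifications.md isn't shown in this article, so both the format and the helper names here are hypothetical.

```python
from pathlib import Path


def save_pending(path: Path, pending: set[str]) -> None:
    """Write one bullet per pending rule; an empty set leaves an empty file."""
    lines = [f"- {rule}" for rule in sorted(pending)]
    path.write_text("\n".join(lines) + ("\n" if lines else ""), encoding="utf-8")


def load_pending(path: Path) -> set[str]:
    """Read the bullets back; a missing file means nothing is pending."""
    if not path.exists():
        return set()
    text = path.read_text(encoding="utf-8")
    return {line[2:].strip() for line in text.splitlines() if line.startswith("- ")}
```

The state lives entirely on disk, so a fresh session with an empty context window still sees whatever the previous Stop hook wrote.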
The General Principle
Closed loops don't require complex architecture. They require a measurement, a memory, and a re-check. Three files, if they form a complete circuit, are enough.
This is the same pattern that makes TDD work (write test → run test → fix → re-run). The same pattern that makes CI work (push → build → test → report). The same pattern that makes any feedback system work.
The insight isn't new. What's new is applying it to personal AI configuration — the rules and habits that govern how you work with AI assistants.
Try It Yourself
Take one rule in your AI config. Add these three pieces:
- A measurement: grep -c '\[✓YOUR_RULE\]' transcript.jsonl
- A memory: write the count to a file that gets read on startup
- A re-check: run the same grep next session, compare
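If your hook is written in Python, the same three pieces fit in a few lines. A rough sketch under stated assumptions: the transcript is a text file with one record per line, the marker appears literally in it, and counts are kept in a small JSON file. The state file and the function names are hypothetical. Like grep -c, this counts matching lines, not total occurrences.

```python
import json
from pathlib import Path


def count_matching_lines(transcript: Path, marker: str) -> int:
    """Measurement: number of lines containing the marker (grep -c semantics)."""
    with transcript.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if marker in line)


def recheck(transcript: Path, state: Path, marker: str) -> tuple[int, int]:
    """Memory and re-check: return (previous, current) and store the new count."""
    counts = {}
    if state.exists():
        counts = json.loads(state.read_text(encoding="utf-8"))
    previous = counts.get(marker, 0)
    current = count_matching_lines(transcript, marker)
    counts[marker] = current  # remembered for the next session
    state.write_text(json.dumps(counts, ensure_ascii=False), encoding="utf-8")
    return previous, current
```

Comparing the two returned numbers is the whole re-check: if the current count has not moved, the rule still needs attention.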
That's your first closed loop. If you get the circuit right, it might start self-healing without you noticing.
The loop lives in delivery-gate (the mechanical layer) and checkgrow (the methodology layer). Both MIT licensed, both open to contributions.
This article is part of the Engineering Trustworthy AI Output series:
- Part 1: I Made My AI Rules Self-Verifiable — Here's How
- Part 2: Your AI Agent's Best Work Produces Zero Output — And That's the Point
- Part 3: I Built a Closed-Loop Self-Healing System for My AI Config — By Accident ← you are here