DEV Community

Cover image for Your AI Obeys Rules That Expired. So Do You.
Self-Correcting Systems
Self-Correcting Systems

Posted on

Your AI Obeys Rules That Expired. So Do You.

You told yourself you would stop. Biting your nails, reaching for your phone the second it buzzes, the road you don't drive anymore that your hands still turn onto. You decided, consciously, with the whole front of your brain, that the rule was retired. And your body kept running it anyway.

There is a reason, and it is not weakness. When you repeat an action enough, your brain moves it off the deliberate circuit and onto an automatic one. Wendy Wood, who has spent decades studying this, describes it this way: a mature habit lives in procedural memory, which shields it from the abstract knowledge and judgment you would otherwise use to override it. The habit is protected from what you now know. You updated the instruction upstairs. The old one keeps executing downstairs, where your new knowledge can't reach it.

That is the exact problem I work on. I just usually work on it in machines.

The part I didn't tell you last time

I build a tool that audits the memory of AI agents. The one-line pitch is "find the old instructions your AI should stop obeying." I already wrote up the day I pointed it at my own files and it flagged my own product slogan as a stale instruction. That was funny, and I won't re-run the whole thing here.

Here is what happened after, which I haven't written down until now, and it is the better story.

I fixed the false alarm. The old detector matched loose vocabulary, so I tightened it to require real supersession language before it fires. Sensible. Then it flagged the paragraph I wrote describing the fix, because that paragraph contained the word "superseded." The tightened detector reproduced the original bug one level up. And that same afternoon it walked straight past a genuinely stale plan sitting in another file, a real retired instruction that almost steered live work weeks earlier, because that plan was written in plain prose and never announced itself with a keyword.

Nazar Boyko had already called it in the comments. He asked whether tightening the detector to require those keywords just walks right back into the prose case I had flagged as the harder one, because the false positive and the false negative come from the same root cause: reading vocabulary instead of the authority relationship. He was right, and my recursive fix is his prediction proven on my own machine within the day. Mike Czerwinski and mote named the same mechanism from other angles, token match versus predicate structure, the difference between using a word and only mentioning it. This is the correction loop I actually want to offer you. Not a tool that never fails. Failures that get named, published, and credited to the readers who saw them, sometimes against me, within a day.

The word underneath all of it

The sentence my own work keeps returning to is this: relevance is not authority.

A memory showing up when you need it is not the same as that memory being in charge. Finding the right note and obeying the right note are two different acts, and the gap between them is where everything goes wrong. The road to your old job is intensely relevant every single morning. It has zero authority over where you are actually going.

Machines and minds run on the same bug here. Whatever holds your instructions, a memory file or a nervous system, keeps executing them past their expiry unless something re-derives whether they still deserve to run. Agents do not automatically re-authorize their own memory. Neither do you. The old rule keeps its badge because nobody ever asks it to show the badge again.

Patching blindspots is not the fix

So the obvious move is to catch the bad rule. Patch the blindspot. And that works, once. Then a new blindspot shows up in a shape you didn't anticipate, and you patch that one. You can spend forever patching blindspots and never once build the thing that makes patching unnecessary, because you are always exactly one unexpected case behind. At some point the honest question stops being "which rule was wrong" and becomes "why does this system assume the day will go as planned at all."

Because the day never goes as planned. That is not the exception. That is the job.

Psychology has a real name for the distinction that matters here: routine expertise versus adaptive expertise. Routine expertise is fast and clean inside the familiar, but its learning halts; it just gets more efficient at the cases it already knows. Adaptive expertise is the other thing: noticing when your practiced knowledge is insufficient for the situation actually in front of you, and reasoning past it in real time.

I watch that difference at my day job. When a system goes down, my manager tells us to "figure out a workaround." On its face it is maddening, because if the people who built the system can't fix it, how am I supposed to? But that instruction is doing something exact. It is demanding adaptive expertise. In that moment I either freeze and recite the script that no longer applies and look like a helpless fool, or I reason from what I actually understand about the customer's problem and build an answer the training never gave me. The anomaly is the exam. No amount of memorized procedure passes it, because the whole definition of an anomaly is that it is not in the procedure.

This is what I actually want a machine to be able to do. Not answer a clever question when I sit down and ask it. React, on its own, when it is working for someone and something abnormal shows up that it was never trained on, and instead of failing confidently, reason it through: "I normally do this, but this case is different, so I have to think past my parameters and find a precise answer right now." It is the closest thing to critical thinking a machine can have, and it is a completely different target than "remember more" or "patch the last mistake."

It was never about deleting the memory

I want to be careful about one thing, because it is easy to get wrong. The fix is not erasing memories. You can't erase the real ones anyway. There are things I carry that I would never speak on and could never delete, and I don't believe an agent's memory should be casually messed with either. The scar is not the problem. The old road is not the problem. The problem is authority over the next action. The memory can stay exactly where it is. What has to be re-derived, live, is whether it gets to govern what you do in a moment it was never made for.

What I can actually show you

I want to be exact about where this stands, because the whole point of the work is not overclaiming.

What exists: the audit above, with the false alarm and the miss both on the record. The covered bug was fixed at the root. The unsolved part was left as a visible failing test instead of hidden behind a roadmap sentence. I also wrote down what the harder, reasoning version would have to prove before I built it, so I can't move the goalposts later. And one early attempt to run it failed for ordinary reasons, an empty API balance and a corrupted output stream, and the system recorded both failures truthfully instead of inventing a result.

What does not exist yet: proof the reasoning version works. The real-time, reason-past-your-parameters ability I just described is the goal, not the receipt. If it fails when it finally runs, that failure gets published as plainly as a win.

What to do with this

You don't need my tool to run the audit that matters.

Take the thing that actually governs you. The runbook, the team's "we have always done it this way," the personal rule you never question. Go line by line and ask two things of each: when was this last re-derived, and what would even notice if it had expired. Most of what runs your day has never once been asked to show its badge.

If you build agents, hear the sharp version. Your memory layer needs an authority layer, and "the model will notice on its own" is not one. Retrieval solved finding. It never solved permission.

And if you build nothing but a life, hear the human version. The next time you flinch at a rule, obey a should, or take the old road without deciding to, stop and ask the only question that has ever mattered: who retired this, and did anyone tell me.

Because your AI obeys rules that expired. And so, quietly, all day, do you.


The tool, the false alarm, the fix, and the failing test I left visible are documented in the companion piece: I Pointed My Memory Auditor At Itself. It Flagged My Own Slogan.

Top comments (0)