I deleted forty rules from our harness last week. The agent got better.
I have written before about the flywheel that adds rules: reviews surface patterns, patterns become rules, rules go in CLAUDE.md, the agent stops making that mistake. The cycle is satisfying because every turn produces something. The harness gets denser, the agent gets sharper, the team gets more out of every piece of feedback.
What nobody talks about is what happens to those rules six months later.
The story is the same in every team that has run the flywheel long enough. The CLAUDE.md crossed 800 lines. The agent started ignoring some rules and overweighting others. New contributors stopped reading the file because it was too long to absorb. I started writing the same rule twice, in two different sections, with two slightly different formulations, because I had forgotten the first one. The flywheel was still spinning. The harness was getting worse.
The fix is obvious in retrospect. Rules have lifecycles. They are born, they mature, they go stale, and they should be deleted. A harness that only ever grows is a codebase that only ever grows; it ends the same way.
How a rule is born
The good ones come from incidents. Something went wrong, you traced it to a behavior, you wrote a rule that would have caught it. The rule has a clear origin and a clear test: would this rule, applied to the agent's behavior, have prevented the incident.
A surprising number come from the opposite direction: an aspirational rule, written from taste rather than from a real failure. "Always prefer composition over inheritance." "Never use abbreviations in variable names." These rules are not wrong, but they are not battle-tested either. They are a wish list dressed as a rulebook, and they tend to age badly because nobody can remember whether the agent is actually violating them.
The test I use: if I cannot point to the commit, PR, or incident that produced the rule, the rule is on probation. It may turn out to be useful. It is more likely to be noise.
When a rule is doing its job
A mature rule has two properties. The agent respects it without prompting, and the team can name the failure mode it prevents. The first is observable; the second is institutional memory.
Mature rules are quiet. They do not fire during reviews because the agent does not produce the violating pattern in the first place. The temptation, looking at a quiet rule, is to assume it is dead. That is exactly wrong. A mature rule is doing its work invisibly. You only know whether it is dead by removing it and watching what comes back.
This is the discipline most teams skip. They look at a rule, do not remember what it does, and either leave it (the harness grows) or delete it without checking (the harness regresses). Neither is the practice. The practice is to remove it deliberately, run a representative task, see whether the agent still does the right thing, and then decide.
I call this the dead-rule test. It is the only honest way to know whether a rule is still pulling its weight.
How a rule goes stale
Three shapes show up over and over.
First, the code the rule was guarding gets deleted or refactored away. A rule about how to extend a particular base class is irrelevant once the base class is gone. The rule sits in CLAUDE.md, the agent reads it, the agent tries to apply it to code that no longer exists, the agent gets confused. The rule is now actively harmful.
Second, a newer rule replaces or contradicts the older one. Teams accumulate two rules that say almost the same thing in different words, or worse, two rules that point in opposite directions. The agent picks one, sometimes the wrong one, often inconsistently across sessions. The team does not notice because no individual review surfaces the contradiction.
Third, the rule was always wrong. It was added in a moment of frustration, codified a bad lesson from a single incident, or reflected a preference one person held and nobody else does. These rules are the easiest to delete and the hardest to find, because nobody wants to admit they wrote a rule that did not pay rent.
How a rule should retire
The act of deleting a rule is anticlimactic. Three lines vanish from CLAUDE.md, the agent's context is two hundred tokens lighter, and life goes on. The discipline is not in the delete itself; it is in the audit that produces the delete.
I run an audit once a month. The shape is simple. Read the harness top to bottom. For each rule, ask three questions: do I remember why this rule exists, is the pattern it prevents still possible in the current codebase, and would removing it change anything I have observed in the last month? Any rule that fails two of the three is a candidate. Any rule that fails all three is gone.
There is no mechanism to comment in CLAUDE.md, so for the cautious: either rely on the git history or create a separate markdown file for the deleted rules that is not read by Claude. If the agent's behavior degrades, restore the rule and learn something about what it was actually doing. If nothing changes, delete the rule entirely.
The audit is the part that almost nobody does. It is uncomfortable because it requires admitting that the rule you wrote in March was not as load-bearing as you thought. It is necessary because the alternative is a harness that nobody can read.
The second flywheel
The original flywheel adds rules. The second flywheel, the one I am arguing for, removes them. They have to run together or the system breaks.
The way I think about it: every rule has a cost in tokens and in attention. The agent reads the harness on every session. The team reads the harness when onboarding, when reviewing changes to the harness itself, when investigating an agent failure. Every line of CLAUDE.md is a tax that the rule has to be worth paying. The audit is how you stop paying taxes on rules that no longer return value.
Two flywheels, opposite directions, both running.
That is the shape of a healthy harness. Not the size of it.
Top comments (1)
Some comments may only be visible to logged-in visitors. Sign in to view all comments.