DEV Community

Cover image for Your agent stops obeying your rules halfway through the session. Here's the structural reason — and the fix.
Anisa
Anisa

Posted on

Your agent stops obeying your rules halfway through the session. Here's the structural reason — and the fix.

You've seen this one. You give your coding agent a clear rule: "Don't edit files I didn't ask you to touch." It behaves. Twenty minutes and a dozen tool calls later, it edits the three files you never mentioned, and you find out when the test suite goes red.
The rule never left the prompt. The agent just stopped following it. No error, no warning — it quietly aged out.
This isn't a smarts problem and it isn't bad luck. It's structural, it's measurable, and once you see the mechanism you can stop it for good. Here's the failure, the fixes that don't work, the real cause, and the rebuild.
The failing example
Say your CLAUDE.md (or .cursorrules — same story) opens with:

Never edit files outside the src/payments/ directory without asking first.

Early in the session the agent is perfect. It asks before it strays. Then you send it on a multi-step task — read these files, run the tests, fix the failures, refactor the helper. Each step piles tool calls and output into the context. Around the fifteenth tool call it edits src/auth/session.ts to "fix an import" — a file you never put in scope — and keeps going like nothing happened.
A developer on the Cursor forum said it cleaner than I can: "If we can't trust it to use our rules, how can we rely on any output?"
The fixes that don't work
Most people reach for one of three, and all three fail for the same reason.

  1. Write the rule louder. Bold it. ALL CAPS. Add "IMPORTANT" and three exclamation points. This feels like it should help. It doesn't — stronger wording is subject to the same attention dynamics as the original line (codeongrass, 2026).
  2. Repeat the rule five times in the system prompt. Now you've spent more of the prompt and bought a few more tool calls of compliance before the same decay sets in.
  3. Buy a bigger context window. The most expensive non-fix. Chroma's Context Rot study (July 14, 2025) tested 18 frontier models — GPT-4.1, Claude 4, Gemini 2.5, Qwen3 — and every one got less accurate as the input grew, with the steepest drop between 100K and 500K tokens. A million-token window doesn't fix this; it just delays the point where you notice. The structural cause The model attends to every token in context, but not equally. Attention is shaped by recency and density. Your rule was written once, at the very top. Since then the context has filled with tool calls, file dumps, and the agent's own reasoning — thousands of tokens, all newer than the rule. Past roughly fifteen tool calls, the system prompt is a small, old island in a sea of recent tokens. Its effective weight drops and the constraint stops steering. One engineer documented the threshold directly: Claude agents reliably break system-prompt constraints past about fifteen tool calls, from attention dilution at high context depth — format requirements ignored, approval gates skipped, autonomy rules bypassed, all with no error (codeongrass, 2026). It scales past coding, too. One widely-cited 2025 analysis attributed nearly 65% of enterprise agent failures to context drift and memory loss during multi-step reasoning — not to the model being incapable. The failures are silent: the agent produces confident, plausible output while running on a rule it has effectively forgotten. So the rule isn't being disobeyed. It's being out-voted by everything you've added since. The rebuild If the problem is that the rule is old and far away, the fix is to make it recent and close — at the exact moment it's needed. Stop treating the rule as something the model must remember. Make it a step the model must perform, re-stated right before the action that tends to break it.
  4. Move the constraint to the point of action. Before: # CLAUDE.md Never edit files outside src/payments/ without asking. After: # CLAUDE.md Before EVERY file edit, output this gate and stop if it fails:
    • Files in scope:
    • File I'm about to edit:
    • In scope? If no, ask before proceeding. The rule is now the most recent thing in context at the moment of the decision, not the oldest.
  5. Make the good behavior a required action, not a forbidden one. "Don't edit out-of-scope files" is a thing not to do — easy to drop. "State the file and confirm it's in scope before editing" is a thing to do — it leaves an artifact you can see in the transcript. Forbidding decays; requiring leaves a trace.
  6. Re-inject at the boundary. If your harness allows it, repeat the one rule that matters in the last user turn before a risky step. The last message gets the highest effective attention — use that slot for the constraint you can't afford to lose. It's the same reason "put the important instruction at the end" keeps working. The pattern underneath all three: don't ask the model to hold the rule across a growing context. Re-present the rule at the moment of the decision, every time. Same model, different shape — the rule holds at tool call fifty the same as tool call five. The 4-point checklist

Is every must-hold rule a step it performs, not a line it remembers? Stated once at the top = it will age out.
Does the rule live next to the action it governs? Distance from the decision equals decay.
Is compliance visible? A rule the agent has to answer for in the transcript beats a rule it silently should follow.
Did you test past tool call fifteen? Rule-drop mostly shows up deep in a session. A five-call demo proves nothing.

One more thing
If you've got a prompt that behaves at the start and drifts by the end, that's the kind of failure I rebuild. I send one short teardown a week — a real flaky prompt, the structural reason it broke, and the rebuild that holds:dm me.

Top comments (0)