Douglas Walseth

Posted on • Originally published at walseth.ai

Your AI Agent Forgets Its Rules Every 45 Minutes. Here's the Fix.

Every AI coding agent has the same silent failure mode: context compression. When the conversation gets long enough, the LLM compresses earlier messages to make room. Your carefully crafted system prompts, project rules, and behavioral constraints? Gone. The agent keeps working, but now it's working without guardrails.

This isn't theoretical. We run a 6-agent production system that processes thousands of tool calls per day. Before we fixed this, agents would silently lose their CLAUDE.md instructions, forget which files they'd already modified, and repeat work they'd done 30 minutes ago. The worst part: they never told us. They just kept generating confident, rule-violating output.

The Problem: Invisible Knowledge Loss

LLMs have finite context windows. Claude's is 200K tokens. Sounds like a lot — until your agent has read 40 files, run 80 commands, and accumulated 150K tokens of conversation history. At that point, the system compresses. Earlier messages get summarized or dropped entirely.
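The arithmetic is easy to underestimate. Here is a rough back-of-envelope for the scenario above — the per-item token costs are illustrative assumptions, not measurements:

```python
# Rough context-budget estimate for a long agent session.
# All per-item token costs below are illustrative assumptions.
CONTEXT_WINDOW = 200_000           # Claude's context window, in tokens

files_read = 40
commands_run = 80
avg_tokens_per_file = 2_500        # assumption: a mid-sized source file
avg_tokens_per_command = 400       # assumption: command plus truncated output
response_overhead = 18_000         # assumption: the model's own replies

used = (files_read * avg_tokens_per_file
        + commands_run * avg_tokens_per_command
        + response_overhead)
print(f"{used:,} tokens used, {used / CONTEXT_WINDOW:.0%} of the window")
```

With these assumptions the session sits at 150K tokens — 75% of the window — before the agent has done anything unusual, which is exactly where compression starts eating your instructions.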

What gets lost first? The instructions at the top. The CLAUDE.md rules. The project constraints. The memory of what the agent already tried.

Detection tools can't help here. By the time you detect that the agent forgot its rules, it's already shipped rule-violating code. You need prevention.

The Fix: One Hook, 150 Tool Calls

The solution is a PostToolUse hook that monitors context consumption and flushes critical knowledge to persistent storage before compression hits.

Here's the core logic (simplified from our production implementation):

#!/usr/bin/env python3
"""Pre-compaction memory flush hook (PostToolUse)."""
import json
import os
import sys
from datetime import datetime
from pathlib import Path

FLUSH_THRESHOLD = 150  # ~75% of the context window, in our workloads

def main():
    agent = os.environ.get("AGENT_NAME", "default")
    session_file = Path(f"/tmp/session_{agent}.json")

    # Claude Code passes the tool call details as JSON on stdin
    try:
        event = json.load(sys.stdin)
    except ValueError:
        event = {}

    # Load or initialize session state
    if session_file.exists():
        session = json.loads(session_file.read_text())
    else:
        session = {"tool_calls": 0, "files_read": [], "files_written": [], "flushed": False}

    # Track every tool call, and which files it touched
    session["tool_calls"] += 1
    tool = event.get("tool_name", "")
    path = event.get("tool_input", {}).get("file_path")
    if path:
        if tool == "Read" and path not in session["files_read"]:
            session["files_read"].append(path)
        elif tool in ("Write", "Edit") and path not in session["files_written"]:
            session["files_written"].append(path)

    # Flush before compression hits
    if session["tool_calls"] >= FLUSH_THRESHOLD and not session["flushed"]:
        memory_path = Path(f"data/agents/{agent}/MEMORY.md")
        memory_path.parent.mkdir(parents=True, exist_ok=True)
        summary = f"\n## Session {datetime.now().isoformat()}\n"
        summary += f"- Files modified: {', '.join(session['files_written'])}\n"
        summary += f"- Files read: {len(session['files_read'])} total\n"
        summary += f"- Tool calls: {session['tool_calls']}\n"

        with open(memory_path, "a") as f:
            f.write(summary)

        session["flushed"] = True
        print(json.dumps({"notification": "Memory flushed to persistent storage before context compression."}))

    session_file.write_text(json.dumps(session))

if __name__ == "__main__":
    main()

Make the script executable (chmod +x), then register it as a Claude Code hook in your project's .claude/settings.local.json:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "",
        "hooks": [{"type": "command", "command": "hooks/pre_compaction_flush.py"}]
      }
    ]
  }
}

That's it. Every tool call increments the counter. At 150 calls — which in our workloads corresponds to roughly 75% of a 200K-token context window — the hook writes a session summary to persistent storage. When compression happens, the knowledge survives.
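You can sanity-check the mechanics without running an agent at all. The sketch below reimplements just the counter-and-flush decision as a standalone function (the `record_tool_call` name and the in-memory session dict are illustrative, not part of the hook itself) and drives it through 160 simulated tool calls:

```python
FLUSH_THRESHOLD = 150

def record_tool_call(session):
    """Increment the counter; report whether this call triggers the one-time flush."""
    session["tool_calls"] += 1
    if session["tool_calls"] >= FLUSH_THRESHOLD and not session["flushed"]:
        session["flushed"] = True
        return True  # flush exactly once per session
    return False

session = {"tool_calls": 0, "flushed": False}
flushes = [i + 1 for i in range(160) if record_tool_call(session)]
print(flushes)  # the flush fires once, at call 150
```

The `flushed` flag is what keeps this idempotent: calls 151 through 160 pass the threshold too, but only call 150 writes the summary.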

What Changes in Practice

Before the hook, our agents exhibited three failure patterns after context compression:

  1. Rule amnesia. Agent forgets CLAUDE.md constraints and starts violating project rules. In our system, this caused an average of 12 rule violations per agent per day post-compression.

  2. Work repetition. Agent re-reads files it already processed, re-runs commands it already executed. We measured 23% wasted tool calls from redundant work after compression events.

  3. Silent context loss. Agent continues with high confidence but degraded capability. No error, no warning, no indication that critical context was lost. This is the most dangerous pattern — the agent doesn't know what it doesn't know.

After deploying the hook across our 6-agent system:

  • Rule violations post-compression dropped to near-zero. The agent reads the flushed memory file on restart and recovers its constraints.
  • Redundant tool calls reduced by 18%. The session summary tells the agent what it already did.
  • We paired the flush hook with a PreToolUse memory search enforcer — a second hook that reminds the agent to read persistent memory before its first substantive action. Belt and suspenders.
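That second hook can be sketched along the same lines. The version below is illustrative rather than our production code: on the first tool call of a session it emits a reminder to read the agent's MEMORY.md before doing anything substantive. The file layout and `AGENT_NAME` variable match the flush hook; the first-call marker file and the reminder wording are assumptions.

```python
#!/usr/bin/env python3
"""PreToolUse memory search enforcer (illustrative sketch)."""
import json
import os
from pathlib import Path

def memory_reminder(agent, first_call):
    """Return a reminder on the session's first tool call, else None."""
    if not first_call:
        return None
    memory_path = Path(f"data/agents/{agent}/MEMORY.md")
    if memory_path.exists():
        return (f"Before proceeding, read {memory_path} - it holds state "
                f"flushed from earlier sessions.")
    return None  # nothing to recover for this agent

def main():
    agent = os.environ.get("AGENT_NAME", "default")
    marker = Path(f"/tmp/seen_{agent}")   # marker file: has this session started?
    first_call = not marker.exists()
    marker.touch()
    msg = memory_reminder(agent, first_call)
    if msg:
        print(json.dumps({"notification": msg}))

if __name__ == "__main__":
    main()
```

Registering it under "PreToolUse" in the same settings file follows the same pattern as the flush hook above.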

Why This Matters Beyond Our System

Context compression isn't unique to our setup. Every long-running AI agent session hits it. If you're running Claude Code, Cursor, Windsurf, or any agentic coding tool on a large codebase, your agent is losing context regularly. You just don't see it because the failure is silent.

The enforcement ladder framework treats this as an L5 problem — the highest enforcement level. L5 means automated, zero-awareness-required. The hook fires automatically. The agent doesn't need to remember to save its memory. The system handles it.

This is the difference between detection and prevention. A monitoring tool tells you the agent forgot its rules after it shipped bad code. An L5 hook prevents the forgetting from happening at all.

Try It

The hook above works with any Claude Code project. Drop it in your repo, register it in settings, and your agent stops losing context at compression boundaries.

If you want to see how your entire AI development pipeline scores on enforcement posture — including compaction vulnerability, rule enforcement, and test coverage — run our free governance scanner:


Free codebase governance audit: walseth.ai
