
Scott Bishop

Your AI Agent Has a Memory Problem (And So Do You)

You've been there. Hour three of a session. Your AI agent was sharp at 9 AM, nailing file edits, remembering your architecture decisions, following your naming conventions. Now it's suggesting an approach you rejected forty minutes ago. It's re-reading files it already read. It just called a function with the wrong signature, one it wrote correctly two hours earlier.

You think: the model is getting dumber.

It isn't. You have a memory leak.

Every LLM-based agent operates inside a fixed context window. The tokens are cheap now, but attention is still scarce. A bigger window didn't eliminate degradation. It moved the failure mode. In 2024, agents broke because they ran out of room. In 2026, agents break because they're drowning in noise.

If you've written C or managed memory pools, you already know the playbook. If you haven't, the short version: a fixed memory system needs deliberate allocation, active cleanup, and a human who treats the buffer like the finite resource it is. I've been running a production project through 95 AI-assisted sessions. Here's what I learned about keeping the context clean.

Budget Your Memory Pools Before You Start

Every modern AI agent pre-allocates context at session start. Claude Code loads CLAUDE.md. Gemini CLI loads GEMINI.md. Cursor loads .cursorrules. These are reserved memory pools carved out of the total budget before you type a single prompt.

On my project, I learned this lesson the hard way. Multiple documents were loading every session: a context handoff, an execution plan, a mission statement, and other files that had accumulated in the project root without anyone asking whether they needed to be there. None of them had size limits. They just grew. By the time I noticed the quality degrading and actually checked, the handoff alone had grown past 100KB, with a massive execution plan on top of that. Add the system prompt, tool definitions, and memory system, and more than half the context window was gone before I'd typed a single prompt.

The fix was a cleanup and a restructuring. Files that didn't need to load every session got moved out of the root and into a docs directory with an index. The execution plan got split into three files: near-term (active work), future (not yet relevant), and archive (completed steps that would never be touched again). The handoff got summarized and trimmed. Then I set the caps that should have existed from the start, based on what left enough room to actually get work done:

Tier 1 loads every session. Hard cap at 50KB, with a trim trigger at 40KB. When the handoff document crosses 40KB, I archive sections immediately. When it crosses 50KB, something has already gone wrong.

Tier 2 loads on demand. Specs, strategy docs, design decisions. These only enter context when the current task requires them.

Tier 3 is archival. Session histories, completed plans. These almost never load.
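The tier caps are easy to enforce with a pre-session check. Here's a minimal sketch; the file names are hypothetical stand-ins for your Tier 1 set, and the thresholds are the 40KB trim trigger and 50KB hard cap described above:

```python
# Sketch: enforce Tier 1 size caps before starting a session.
# File names and thresholds are examples from one setup; adjust to yours.
import os

TIER1_FILES = ["HANDOFF.md", "PLAN-near-term.md"]  # hypothetical names
TRIM_TRIGGER = 40 * 1024   # start archiving sections here
HARD_CAP = 50 * 1024       # something has already gone wrong here

def check_tier1(paths):
    """Return (path, size, status) for every Tier 1 document."""
    report = []
    for path in paths:
        size = os.path.getsize(path) if os.path.exists(path) else 0
        if size >= HARD_CAP:
            status = "OVER CAP"
        elif size >= TRIM_TRIGGER:
            status = "TRIM NOW"
        else:
            status = "ok"
        report.append((path, size, status))
    return report

if __name__ == "__main__":
    for path, size, status in check_tier1(TIER1_FILES):
        print(f"{status:>8}  {size:>7,} bytes  {path}")
```

Run it before every session, or wire it into a pre-commit hook, so the trim happens at 40KB instead of at "why is the agent suddenly dumb."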

Every token spent reading "how we got here" is a token unavailable for "what we're building now." The completed steps in my execution plan would never be touched again, yet they were consuming attention every session. After the split and the cleanup, the active working set dropped dramatically. That's real attention reclaimed for actual development.

This isn't just a suggestion. I've watched the quality degrade in real time when Tier 1 documents bloat past their budget. The agent starts hedging, restating things, losing track of decisions made earlier in the session. The arena got too small for the work.

Your Persistent Memory Is an Instruction Cache

Your CLAUDE.md, your .cursorrules, your system prompt. These function like an instruction cache: they load on every cold start and define how the agent behaves. The tradeoff is the same one every embedded developer knows. A bigger instruction cache means a smaller data cache.

I keep 21 persistent memory edits that load automatically at zero tool call cost. File editing rules, writing style conventions, commit message patterns, TDD discipline, seed script usage, the project base directory. Each one prevents a class of recurring mistakes. The file editing rules alone encode three separate bugs I discovered through painful experience: the ghost {} files that appear when you use the wrong tool to create new files, the silent corruption from dollar-sign $ characters in markdown files, and the doubled context cost from copying files to the AI's environment, editing them there, then copying them back.

These 21 items are dense and well-structured. They cover what the agent needs right now, not everything it might ever need. Every item I add pushes something else further from the agent's attention. I've pruned items that were too verbose, consolidated items that overlapped, and removed items that addressed problems we solved structurally.

The sweet spot is a tightly curated instruction set. If you have 200 lines of rules in your CLAUDE.md that you haven't reviewed in a month, you have stale cache. Audit it. Trim it. Your agent is reading every line on every turn.
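One way to make that audit concrete is to measure where the bytes actually go. A sketch that splits a rules file on `## ` headings and reports per-section cost; the filename and heading convention are assumptions about your setup:

```python
# Sketch: report the byte cost of each section in a rules file,
# so the biggest offenders are obvious. Assumes "## " section headings;
# the CLAUDE.md filename is an example.
import os

def section_sizes(text):
    """Split markdown on '## ' headings and return (heading, bytes) pairs."""
    sizes = []
    heading, buf = "(preamble)", []
    for line in text.splitlines(keepends=True):
        if line.startswith("## "):
            sizes.append((heading, sum(len(l.encode()) for l in buf)))
            heading, buf = line.strip(), []
        else:
            buf.append(line)
    sizes.append((heading, sum(len(l.encode()) for l in buf)))
    return sizes

if __name__ == "__main__" and os.path.exists("CLAUDE.md"):
    with open("CLAUDE.md", encoding="utf-8") as f:
        for heading, size in sorted(section_sizes(f.read()), key=lambda s: -s[1]):
            print(f"{size:>6} bytes  {heading}")
```

Sorting by size makes the stale cache visible: the fattest section is usually the one you haven't reviewed in a month.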

Garbage Collection: Don't Wait for the Crash

When context fills, something has to give. Claude Code runs auto-compaction at roughly 83.5% of window capacity, summarizing older conversation to reclaim tokens. Gemini CLI offers /compact for manual garbage collection.

I don't use either. I take a proactive approach instead. When context usage crosses roughly 75%, the session gets a warning. This isn't cosmetic. It means: stop starting new tasks, update the handoff documents with current state, save any decisions or context the next session will need, and wrap cleanly.
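The threshold check itself is trivial, which is the point: it's a discipline, not a feature. A sketch, assuming an illustrative 200K-token window rather than any particular agent's API:

```python
# Sketch: a session-budget check. The window size and threshold are
# illustrative; no agent exposes exactly this API.
WINDOW_TOKENS = 200_000
WRAP_THRESHOLD = 0.75

def should_wrap_up(tokens_used, window=WINDOW_TOKENS, threshold=WRAP_THRESHOLD):
    """True once context usage crosses the wrap-up line."""
    return tokens_used / window >= threshold

# At 150K of a 200K window: stop starting new tasks, update the handoff.
```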

The reason is simple. Every time I've pushed past that threshold, the output quality has fallen off a cliff. The agent gets lazy, skips steps, hallucinates details it would have gotten right an hour earlier.

Session hygiene matters here. All context updates happen at the end of the current session while the context is still intact. I never defer doc updates to the next session. If the handoff document doesn't reflect what just happened, the next session starts with stale memory.

Named Documents Beat Chat History

The biggest friction point in long-running AI projects is drift. You agree on a direction, then three prompts later the model is hallucinating a different architecture. Chat history is a terrible source of truth because it's littered with abandoned approaches, resolved debates, and superseded decisions.

I stopped relying on chat history entirely. Every major step becomes a named plan document: a technical specification with file paths, test requirements, and acceptance criteria. The agent doesn't "remember" our previous session. It reads a document that tells it exactly where things stand.

The handoff document is now at version 60 across 95 sessions. It's been rewritten, trimmed, restructured, and archived for every session that produced any kind of artifact. It's always current. When a new session reads it, the agent knows the project state, the tech stack, the open decisions, and the next task. We're not chatting. We're executing against a versioned document.
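For illustration, the shape of such a handoff document might look like this; the headings are hypothetical, not a prescribed format:

```markdown
# Handoff — v60

## Project state
One paragraph: what works, what's in flight, what just shipped.

## Tech stack
Pinned versions, plus the reason for any unusual choice.

## Open decisions
- Decision still pending, with the options on the table.

## Next task
Exact file paths, test requirements, and acceptance criteria.
```

The point is that every section answers a question the next session would otherwise burn tokens rediscovering.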

This is pointer arithmetic instead of memory scanning. When the agent needs to know something, there's a direct address for it, not a search through 200K tokens of conversation history hoping the relevant passage gets enough attention weight.

Fragmentation: The Invisible Performance Killer

After an hour of work, your context is littered with resolved bugs, error messages from fixed issues, file reads from modules you've moved on from, and tool call artifacts that served their purpose thirty minutes ago. All of it is still consuming attention.

The signal-to-noise ratio matters more than the total token count. A session with 50K tokens of focused, relevant context will outperform a session with 200K tokens where only 30% is still relevant.

I hit this pattern repeatedly where the agent re-reads a file because the earlier read is buried under newer context. It re-derives a conclusion it already reached. It asks me to confirm something I confirmed twenty messages ago. Responses get longer but less useful. The agent hedges, qualifies, restates. It's not thinking harder. It's thrashing.

The fix is session discipline. When a task is done, start a new session. I know it feels wasteful to close a session that still has "room." The room is full of noise. A clean start with a current handoff document gives you 100% signal. A continued session with 70% noise gives you 30% signal in a bigger window. The math isn't close.

The Practices That Actually Work

These are the specific things I do every day that keep sessions productive. None of them are theoretical.

Give direct addresses, not search queries. When I say "fix the bug in src/stage2-analyze/security/map-to-contract.mjs, the determineOverallStatus function is returning FAIL instead of SKIPPED when the findings array is empty," the agent goes straight to the problem. No search, no exploration, no wasted tokens on file discovery. Five seconds of typing keeps the session clean.

Stop bad responses immediately. When the agent starts heading the wrong direction, I stop it within the first few sentences. Every token of a bad response is permanently in context and the model has to reconcile it against the correction. A 2,000 token wrong answer followed by "no, that's not what I asked" is 2,500 tokens of pollution. Catching it at 50 tokens is 100 tokens of pollution.

Feed tasks one at a time. Don't paste a 15-item list and say "work through this." Every item you won't reach for an hour is consuming attention right now. Present the next task when the current one is done. When you hit the session budget, you've completed N items cleanly rather than N items poorly. The same discipline applies in the other direction: the agent will want to hand you many things to work on at once. Push back and ask for one item at a time. If you have context left when you finish it, load the next one in then.

Do the simple things yourself. Moving a file, renaming a directory, creating a folder, running a quick git command. Each one costs zero tokens in your file explorer and costs a tool call plus confirmation in the AI session. I handle most git operations, tagging, and publishing myself. Every mechanical action the AI does is attention taken away from reasoning.

Audit your instruction cache. I review my memory edits and handoff documents and prune what's stale. Items that addressed problems we've since solved structurally get removed. Items that are too verbose get condensed. The AI won't do this for you. It will happily load a bloated execution plan every session and never flag it. The only reason mine got split and trimmed was because I checked the file sizes and said "this is too much." That kind of maintenance is on you.

Never defer the handoff. At the end of every session, before closing, the handoff document gets updated. The execution plan gets updated. Memory edits get reviewed. This happens while the context is still intact and the agent still has full recall of what just happened. If you defer it to "next time," the next session starts with yesterday's state and today's problems.
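A freshness check can make "never defer the handoff" mechanical. A sketch that compares modification times; the paths are examples, and a real setup might compare against the last commit timestamp instead:

```python
# Sketch: refuse to call a session "closed" if the handoff document is
# older than any of the files it's supposed to describe. Paths are examples.
import os

def handoff_is_fresh(handoff_path, source_paths):
    """True if the handoff was touched after every listed source file."""
    handoff_mtime = os.path.getmtime(handoff_path)
    return all(handoff_mtime >= os.path.getmtime(p) for p in source_paths)
```

Run it as the last step of the session ritual: if it returns False, the agent still has full context and can update the handoff now, cheaply.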

Infinite Context Is Just Infinite Noise

Context windows will keep growing. Pricing will keep falling. More memory has never eliminated the need for memory management. A 64KB microcontroller and a 128GB server both need allocation strategies. The scale changes. The discipline doesn't.

Your agent isn't getting dumber. Your context is getting dirty. Clean it up.


Scott Bishop is the founder of Fidensa, an independent certification authority for AI capabilities. He's spent 30 years building regulated, high-trust systems across international finance, federal government, and the Fortune 500.
