Your agent has a 200K token context window. So you dump everything in there — MEMORY.md, daily logs, project notes, old conversations — and figure the model will sort it out. It won't.
The research says your middle context is a dead zone
In 2023, researchers from Stanford, UC Berkeley, and Samaya AI published "Lost in the Middle: How Language Models Use Long Contexts." They tested models on tasks where the relevant information was placed at different positions in the input. The results were consistent: models performed best when key information appeared at the very beginning or the very end of the context. Information in the middle got ignored.
This wasn't a fluke finding. Nelson Liu and the team tested across multiple model families and context lengths. Performance degraded significantly — sometimes by 20% or more — when the answer was buried in the middle third of the input.
Google DeepMind followed up with similar findings. So did Anthropic's own internal research on Claude's attention patterns. The pattern holds: long context doesn't mean good context.
What this means for your agent
If you're loading 50KB of MEMORY.md into every session, here's what actually happens:
- The model reads the first few thousand tokens carefully
- Attention drops off through the middle
- It picks back up near the end, where your actual conversation starts
That preference you stored six months ago about using TypeScript? It's sitting in paragraph 47 of your memory file. The model probably won't notice it when it matters.
The math makes it worse. A 50KB MEMORY.md is roughly 12,500 tokens. At $3 per million input tokens (Claude Sonnet pricing), that's about $0.04 per session just to load memories your agent might not even use. Run 50 sessions a day and you're spending $2/day on context that's partially invisible to the model.
Stuffing vs. retrieval: a real comparison
Stuffing approach (MEMORY.md):
- Load everything every session: ~12,500 tokens
- Model sees all memories but attends unevenly to them
- Cost: $0.04 per session regardless of relevance
- Old memories compete with new ones for attention
Retrieval approach (MemoClaw recall):
- Query for relevant memories: 5-10 results, ~500-1,000 tokens
- Model sees only what's relevant to the current conversation
- Cost: $0.005 per recall + ~$0.003 in input tokens
- Important memories surface when they're actually needed
The retrieval approach uses roughly 4-8% of the tokens and puts them where the model actually pays attention: right before the conversation starts.
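The per-session comparison above is easy to check. Here's a quick sketch using the article's figures: $3 per million input tokens (Claude Sonnet pricing) and a flat $0.005 fee per MemoClaw recall.

```python
# Back-of-the-envelope per-session cost for each approach,
# using the article's figures (assumptions, not measured data).

PRICE_PER_TOKEN = 3 / 1_000_000  # $3 per million input tokens

def session_cost(tokens: int, recall_fee: float = 0.0) -> float:
    """Input-token cost for one session, plus any per-recall API fee."""
    return tokens * PRICE_PER_TOKEN + recall_fee

stuffing = session_cost(12_500)           # load all of MEMORY.md every session
retrieval = session_cost(1_000, 0.005)    # ~1,000 recalled tokens + recall fee

print(f"stuffing:  ${stuffing:.4f}/session")    # ~$0.0375
print(f"retrieval: ${retrieval:.4f}/session")   # ~$0.0080
print(f"token fraction: {1_000 / 12_500:.0%}")  # 8% at the upper bound
```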
Why "just use a bigger context window" doesn't fix this
Every few months, someone announces a longer context window. Gemini hit 1M tokens. Claude went to 200K. GPT-4 Turbo did 128K. And every time, people assume the memory problem is solved.
It isn't. Longer windows don't change the attention distribution. They make the middle-zone problem worse because there's more middle to lose things in. A 1M token context with your answer at position 500K is worse than a 4K context with your answer at position 2K.
The lost-in-the-middle researchers tested this explicitly. Extending context length didn't improve retrieval from the middle. It just gave models more text to skim past.
What actually works
The fix isn't bigger contexts. It's smaller, targeted contexts with the right information.
With MemoClaw, instead of loading everything, you recall what's relevant:
```shell
memoclaw recall "user's TypeScript preferences"
```
You get back 5-10 semantically matched memories. You inject those at the start of your prompt. The model sees exactly what it needs, right where it pays the most attention.
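A minimal sketch of the injection step. The `recall_memories` helper is a stand-in for however you actually call MemoClaw (CLI, HTTP, or skill) — the canned return value and names here are illustrative, not the real API. The point is that recalled memories land at the very top of the prompt, in the primacy position.

```python
from typing import List

def recall_memories(query: str, limit: int = 10) -> List[str]:
    """Stub for a MemoClaw recall call; replace with your real client.
    Returns short memory strings semantically matched to the query."""
    return ["User prefers TypeScript with strict mode enabled"]  # canned example

def build_system_prompt(base_prompt: str, query: str) -> str:
    """Prepend recalled memories so they occupy the start of the context,
    where attention is strongest, instead of the middle of a memory file."""
    memories = recall_memories(query)
    memory_block = "\n".join(f"- {m}" for m in memories)
    return f"Relevant memories:\n{memory_block}\n\n{base_prompt}"
```

Whatever client you use, the shape stays the same: recall first, then build the prompt with the results in front.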
For an OpenClaw agent, this looks like:
- Session starts
- Agent calls `recall` with a query about the current task
- Gets back relevant memories (preferences, past decisions, corrections)
- Those go into the system prompt, before the conversation
- Agent works with full context on what matters, zero noise from six months of irrelevant notes
The token cost drops from ~12,500 to ~800. The relevant information moves from "somewhere in the middle" to "right at the top." The model stops missing things.
The numbers
Here's a side-by-side for an agent running 30 sessions per day over a month:
| | MEMORY.md stuffing | MemoClaw retrieval |
|---|---|---|
| Tokens loaded per session | ~12,500 | ~800 |
| Monthly input token cost | ~$33.75 | ~$2.16 |
| MemoClaw API cost | $0 | ~$4.50 (30 recalls/day) |
| Total monthly cost | ~$33.75 | ~$6.66 |
| Relevant info position | Scattered | Top of context |
| Missed memories | Common (middle zone) | Rare (semantic match) |
You save about $27/month per agent and your agent actually remembers the things that matter.
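The monthly table reduces to a few lines of arithmetic: 30 sessions a day for 30 days, at the same $3/million-token and $0.005/recall rates assumed above.

```python
# Reproduce the monthly comparison table (figures from the article).

SESSIONS = 30 * 30        # 30 sessions/day for a 30-day month = 900 sessions
PRICE = 3 / 1_000_000     # $3 per million input tokens

stuffing_monthly = 12_500 * SESSIONS * PRICE        # ~$33.75
retrieval_tokens = 800 * SESSIONS * PRICE           # ~$2.16
recall_fees = SESSIONS * 0.005                      # ~$4.50
retrieval_monthly = retrieval_tokens + recall_fees  # ~$6.66

print(f"stuffing:  ${stuffing_monthly:.2f}/month")
print(f"retrieval: ${retrieval_monthly:.2f}/month")
print(f"savings:   ${stuffing_monthly - retrieval_monthly:.2f}/month")  # ~$27.09
```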
Start with the expensive memories first
You don't have to migrate everything at once. Start with the memories your agent keeps forgetting:
- User corrections ("I prefer tabs over spaces" stored with importance 0.9)
- Project-specific context that only matters for one workspace
- Preferences that were set months ago and keep getting lost in the file
Move those to MemoClaw, keep the rest in MEMORY.md for now, and see if your agent starts getting things right more often. If you've got an OpenClaw agent running, install the skill and run a migration:
```shell
memoclaw migrate ~/path/to/MEMORY.md --namespace my-project
```
Your context window is expensive real estate. Stop filling it with things the model won't read.
References:
- Liu et al., "Lost in the Middle: How Language Models Use Long Contexts" (2023). arXiv:2307.03172
- Pricing based on Anthropic Claude 3.5 Sonnet rates as of early 2026.
Top comments (2)
I ran into this exact problem a month ago — dumped everything into context, agent performed worse than when I gave it less information. Counterintuitive until you read the Stanford paper.
The practical solution I landed on: a three-layer system. Layer 1 is a short scratchpad that gets overwritten every cycle (current state only). Layer 2 is structured knowledge files organized by topic. Layer 3 is semantic search that pulls only what's relevant per query. The agent never sees the full corpus — it searches first, reads specific sections, then acts.
The biggest win was stopping the agent from reading its own full memory file at startup. Sounds obvious in retrospect, but "load everything on boot" is the default pattern everyone starts with. The retrieval-first approach cut my context usage by maybe 70% and the output quality actually went up.
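The commenter's three layers could be sketched as a small class. All names here are illustrative, and the keyword match in layer 3 is a stand-in for whatever embedding store you'd actually use for semantic search.

```python
from pathlib import Path
from typing import Dict, List

class LayeredMemory:
    """Sketch of the three-layer scheme described above (names illustrative)."""

    def __init__(self, root: Path):
        self.scratchpad = root / "scratch.md"    # layer 1: overwritten each cycle
        self.knowledge_dir = root / "knowledge"  # layer 2: topic-organized files
        self.index: Dict[str, str] = {}          # layer 3: stand-in for a vector index

    def write_scratch(self, state: str) -> None:
        """Layer 1: replace, never append -- current state only."""
        self.scratchpad.write_text(state)

    def read_topic(self, topic: str) -> str:
        """Layer 2: read one topic file, never the whole corpus."""
        return (self.knowledge_dir / f"{topic}.md").read_text()

    def search(self, query: str, limit: int = 5) -> List[str]:
        """Layer 3: naive keyword match standing in for semantic search."""
        hits = [text for key, text in self.index.items()
                if query.lower() in key.lower()]
        return hits[:limit]
```

The agent only ever touches one layer at a time — it never loads the whole corpus on boot.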
The multi-turn problem compounds this in a way that doesn't get enough attention. Even if you do careful top-loading of retrieved context at session start, every exchange adds tokens that push that context deeper into the middle. After 15-20 turns, your well-placed retrieved memories are sitting in the dead zone you were trying to avoid.
One approach I've found useful: re-retrieval at decision points, not just at session start. When the agent is about to make a significant architectural or implementation choice, trigger a fresh recall pass specifically for that decision type. It costs a bit more per session but keeps the relevant context in the primacy position when it matters.
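That decision-point re-retrieval could look roughly like this — the `recall` stub and the decision-type queries are illustrative, not a real MemoClaw interface:

```python
# Sketch of decision-point re-retrieval: refresh recalled context right
# before a significant choice, so it sits near the end of the message
# list (recency position) instead of 20 turns deep in the middle.

DECISION_QUERIES = {
    "architecture": "past architectural decisions and constraints",
    "implementation": "coding conventions and user corrections",
}

def recall(query: str) -> list:
    """Stub: replace with a real MemoClaw recall call."""
    return [f"(memory matched to: {query})"]

def before_decision(decision_type: str, messages: list) -> list:
    """Append a fresh recall result just before the next model call."""
    query = DECISION_QUERIES.get(decision_type)
    if query is None:
        return messages  # not a tracked decision type; leave the turn alone
    memories = "\n".join(recall(query))
    return messages + [{"role": "user",
                        "content": f"Relevant memories for this decision:\n{memories}"}]
```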
The token cost table is a good reality check. People focus on "is my agent getting good results" without tracking what they're spending to load context that isn't helping. $27/month per agent is real money at scale, and that's before you factor in the quality degradation from middle-zone attention loss.