You've felt it. The first twenty minutes with Claude Code, Cursor, or whatever agent you live in are magic. It nails the refactor, remembers your conventions, one-shots the test.
Then, an hour in, it turns into an intern who skipped lunch. It forgets a function you defined ten messages ago. It re-suggests a fix you already rejected. It loops. You start a sentence with "as I said earlier..." and feel slightly insane.
Everyone has a theory. "The model got nerfed." "It's MCP." "They're throttling me." I believed all three at various points.
The boring truth is none of them. It's your context window quietly filling up with garbage — and I can show you exactly where the garbage comes from.
The model didn't get dumber. The signal did.
LLMs don't have memory. Every turn, the entire conversation gets re-sent to the model: your messages, its replies, and — this is the killer — every byte of output from every tool it ran.
That last part is where it falls apart. Watch what a single innocent command costs you:
- npm run build on a real project: 2–10 KB of webpack noise.
- git log without limits: pages of commits.
- One cat of a 600-line file: the whole file, verbatim, forever.
- A failing test suite: stack traces stacked on stack traces.
Ask your agent to "check why the build is failing" and it might dump 50 KB into the context in a single tool call. You never see most of it — it scrolls past — but the model carries all of it on every subsequent turn.
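To put a rough number on "carries all of it on every subsequent turn", here is a back-of-the-envelope sketch. It assumes roughly 4 bytes per token, which is a common rule of thumb and not a real tokenizer count, and it ignores any caching your provider does. The function name resend_cost is made up for illustration.

```shell
# resend_cost BYTES TURNS
# Rough estimate of how many tokens one tool result adds up to when it
# rides along on every later turn.
# Assumption: ~4 bytes per token (rule of thumb, not a tokenizer).
resend_cost() {
  bytes=$1
  turns=$2
  tokens_per_turn=$((bytes / 4))
  total=$((tokens_per_turn * turns))
  echo "per turn: ~${tokens_per_turn} tokens, over ${turns} turns: ~${total} tokens"
}

# A 50 KB dump (51200 bytes) that stays in the window for 30 more turns:
resend_cost 51200 30
# prints: per turn: ~12800 tokens, over 30 turns: ~384000 tokens
```

The exact figures matter less than the shape: one careless tool call is paid for again on every turn that follows it.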
This is context rot: as the window fills, the ratio of signal (your actual task) to noise (raw tool sludge) collapses. Attention spreads thinner across a longer, junkier transcript. The model isn't lazier. It's drowning. There's solid research now showing model accuracy drops well before you hit the hard token limit — performance degrades with how full the window is, not just whether it overflowed.
So the agent that "got dumber mid-session" got dumber because you (via its tools) fed it a haystack and then asked it to find a needle in it.
How I actually confirmed it
I stopped guessing and measured. Before blaming the model, look at what's eating your window:
- In Claude Code, run /context to see the breakdown. The first time I did this, tool results were the single biggest slice, bigger than my code and bigger than the conversation.
- Notice which calls are fat. For me it was always the same culprits: build output, git log, dependency trees, and reading whole files just to "have a look."
That was the whole diagnosis. The expensive stuff wasn't my thinking or the model's replies. It was raw command output I never needed to read in full.
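If your agent has no built-in breakdown, you can get a crude version of the same picture from the shell. The sketch below is one possible way to do it, not a feature of any particular tool: the helper name size_of is hypothetical, and it measures bytes of output rather than tokens.

```shell
# size_of CMD [ARGS...]
# Runs a command, discards its output, and prints only how big that
# output was. stderr is merged in, on the assumption that the agent
# would capture both streams.
size_of() {
  bytes=$("$@" 2>&1 | wc -c | tr -d ' ')
  printf '%s\t%s\n' "$bytes" "$*"
}

# Example (commented out because it depends on your repo):
# compare the usual suspects, biggest first.
# { size_of git log; size_of git log --oneline -n 20; size_of npm run build; } | sort -rn
```

Running the fat and the trimmed version of the same command side by side makes it easy to see which calls are worth slimming down first.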
The fix: keep raw output OUT of the window
The principle is one sentence: the model should see conclusions, not raw data.
Once I framed it that way, the fixes were obvious — and most of them need zero special tooling:
1. Summarize at the source. Don't let the agent run npm test and inhale 8 KB. Run it so it returns "3 failures: auth.test.ts:40, 51, 77". Pipe the output through grep, tail -n 20, or wc -l, or pass a --quiet flag where the tool has one. A summary is worth a thousand log lines.
2. Never read a whole file just to understand it. Reading is for editing. For "where is JWT validated?", search and read the 15 relevant lines, not the 600-line file. Whole-file reads are the most common self-inflicted context wound I see.
3. Use sub-agents as a firewall. Spin a throwaway agent to do the noisy exploration — grep the codebase, read the logs, crawl the docs — and have it report back only a paragraph of findings. All the sludge dies with the sub-agent. Your main context stays clean.
4. Start fresh more often than feels necessary. Finished a sub-task? New session. A clean window with a tight summary beats a "full" window with all the history every single time. Long-lived threads are a vanity metric.
5. Route the truly heavy stuff through a sandbox. This is the one I leaned on hardest. I run my commands through a layer that executes them outside the model's context, indexes the full output, and returns only the slice I searched for. So a 56 KB build log becomes "here are the 4 lines mentioning the error." The model gets the answer; the noise never touches the window. (I use a tool called context-mode for this, but the pattern matters more than the tool: you can fake it with command | tee log.txt | grep ERROR and only ever feed the agent the grep output.)
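Fixes 1, 2, and 5 can be approximated with a few small shell helpers. This is a minimal sketch, not how context-mode or any specific agent works. The function names (summarize_failures, peek, run_sliced) are hypothetical, and the FAIL pattern is an assumption about your test runner's output that you would need to adjust.

```shell
# Fix 1: summarize at the source.
# Reads test output on stdin; prints a count plus only the failing lines.
# Assumption: failing lines contain the word FAIL.
summarize_failures() {
  failing=$(grep 'FAIL' || true)
  if [ -z "$failing" ]; then
    echo "failures: 0"
  else
    count=$(printf '%s\n' "$failing" | wc -l | tr -d ' ')
    echo "failures: $count"
    printf '%s\n' "$failing"
  fi
}

# Fix 2: read the relevant lines, not the whole file.
# peek PATTERN FILE [CONTEXT] prints each match with line numbers and
# CONTEXT lines on either side (default 7, so about 15 lines per match).
peek() {
  grep -n -C "${3:-7}" -- "$1" "$2"
}

# Fix 5: keep the full output on disk, hand back only a slice.
# run_sliced LOGFILE PATTERN CMD [ARGS...]
# Saves everything CMD prints to LOGFILE and shows at most the last 20
# matching lines, plus a one-line pointer to the full log.
run_sliced() {
  log=$1
  pattern=$2
  shift 2
  status=0
  "$@" > "$log" 2>&1 || status=$?
  grep -n -- "$pattern" "$log" | tail -n 20
  total=$(wc -l < "$log" | tr -d ' ')
  echo "(exit $status, full output: $total lines in $log)"
}

# Demo with canned input instead of a real test run:
printf 'PASS login.test.ts\nFAIL auth.test.ts:40\nFAIL auth.test.ts:51\n' | summarize_failures
# prints:
# failures: 2
# FAIL auth.test.ts:40
# FAIL auth.test.ts:51
```

In practice you would call something like run_sliced build.log ERROR npm run build and let the agent see only those few lines; the full log stays on disk if it ever needs to search further.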
The mindset shift
Stop treating the context window like RAM you fill until it's full. Treat it like a whiteboard in a meeting: everything you write on it competes for attention, and a cluttered board makes the whole room dumber. Wipe it often. Only write what matters.
Your agent didn't get nerfed. It got buried. Dig it out and the magic comes back — for the whole session, not just the first twenty minutes.
If this matched your experience, I'd love to hear which tool eats your context the most — for me it's git log every time. Follow me @enjoy_kumawat for more practical AI-tooling notes.