DEV Community

Keith MacKay

Posted on • Originally published at tlcmentor.substack.com

Context in Context: Why AI Tools Degrade Over Longer Work Sessions

What every executive needs to understand about AI context windows (and how to get the best performance from your AI assistants)


You invested in AI assistants. You were excited. Your initial tests were impressive. But three months in, as sessions get longer, you are hearing (or finding yourself thinking): "I didn't notice before, but it seems to be forgetting stuff I told it just a little while ago."

You're not imagining it.

A hidden constraint sits at the heart of every AI assistant. It determines whether these tools accelerate your work or frustrate you by losing information you've already provided. Many users don't know this constraint exists.

It's called the context window. And if you're using AI for any significant tasks, you need to understand it.

The 30-Second Explanation

Did you ever take a test where you were allowed to create only a single page of reference notes? That page was your context window for that test (especially if you slept through class and treated homework as optional). It contained your entire understanding of the subject at hand.

Another way to think about it: picture your AI assistant with a notepad. It uses this pad as working memory to track your conversation, the files it reads, the domain knowledge it adds for the task, and the instructions it follows. That notepad has a fixed size. Everything the AI needs must fit on that pad: your prompts, the model's responses, supporting documents, tool definitions, all of it. When the notepad fills up, the AI doesn't "remember" earlier information the way a human might (one who actually attended the lectures!). It loses access. Or worse, it gets confused trying to juggle too much at once.

That notepad is the "context window." And it fills up faster than a conference room whiteboard on day three of a strategy offsite.

Even worse, your LLM front-end provides additional opportunities to load EVEN MORE things into your context window, taking up precious space: MCP server details, skills, local documents, and so forth.

Why This Hits Your Bottom Line

Currently, typical AI assistants have context windows of roughly 200,000 tokens: about 150,000 words, or a 500-page book. Sounds enormous.

It's not.

Here's what actually fills that space:

  • Every message exchanged (yours and the AI's responses)
  • Every file the AI reads to understand your code
  • System instructions governing the AI's behavior
  • Tool definitions for every capability you've enabled
  • Search results, error messages, test outputs

One complex coding session can exhaust that entire budget in minutes.
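To make the budget concrete, here is a minimal back-of-the-envelope sketch of that accounting. The 4-characters-per-token ratio is a common rule of thumb for English text, not an exact tokenizer, and the session contents are invented for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count: roughly 4 characters per token for English prose."""
    return max(1, len(text) // 4)

def context_usage(messages, files, system_prompt, tool_defs, window=200_000):
    """Sum every category that occupies the window; return (tokens used, fraction)."""
    used = estimate_tokens(system_prompt)
    used += sum(estimate_tokens(m) for m in messages)
    used += sum(estimate_tokens(f) for f in files)
    used += sum(estimate_tokens(t) for t in tool_defs)
    return used, used / window

# A modest hypothetical session: a few exchanges, two mid-sized source
# files, one tool enabled. The files alone dominate the budget.
used, frac = context_usage(
    messages=["Refactor the auth module"] * 6,
    files=["x" * 40_000, "y" * 60_000],   # two files, ~10k and ~15k tokens
    system_prompt="You are a coding assistant.",
    tool_defs=["tool schema " * 500],     # one verbose tool definition
)
print(f"{used} tokens used ({frac:.0%} of window)")
```

Even this small session consumes over a tenth of a 200k window before any real work happens; a handful of large files or a long transcript closes the gap fast.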

When the budget runs low, three things break:

  1. Quality tanks. Responses become generic, like getting advice from someone who skimmed your email instead of reading it.

  2. Memory fails. Decisions from early in the conversation get pushed out by newer information. The AI develops corporate amnesia (ever see the movie "Memento"?).

  3. Consistency disappears. The AI contradicts itself. It ignores patterns it established an hour ago. It's like working with someone who wasn't at the last three meetings but won't admit it.

Your impression wasn't wrong. The tool really does degrade the longer it's used.

...And You Can't Even Use ALL of It

Many experiments have shown that using ALL of the context window available to you is, to use the technical term, "a very bad idea". Models give greater weight to information at the beginning and end of the context window, yielding a mushy middle that gets de-emphasized; researchers sometimes call this the "lost in the middle" effect. I think of it as analogous to primacy and recency bias in humans (I haven't researched whether this was designed or entirely emergent, so feel free to educate me in the comments!).

Many tools, when the window fills, automatically compact the accumulated information down to what they deem important in order to clear workspace and continue the task. This compaction is "lossy": it throws away some of the information that was provided as context and keeps only a summary. That is a serious limitation for complex tasks.

Many coding shops have standards for preserving project context as the window fills. I have often seen 80% cited as the maximum share of the context window to use for a task. Given my own experience, I like to take action when my context window is about 60% full, meaning even LESS available room for context. Some well-respected AI coders in the field recommend using no more than 30-40% of the window...so YMMV.

What do you do when you hit your context window target? Historically, my colleagues and I generally used a strategy of pushing session context into an external file, starting a brand-new session, reloading the context from the file, and continuing where we left off. That's an annoying interruption, but a small price to pay for continued high-quality work.

That strategy has evolved. Now, where possible, we break tasks into small independent components that can be distributed across a swarm of agents, each with its own context window. With upfront thinking and strategizing, we can use the tools in such a way that we (almost) never hit the context window limits...but real limits remain when working with massive codebases, tightly intertwined systems, high levels of detail, and so forth.
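The checkpoint-and-restart strategy can be sketched in a few lines. This is a toy illustration, not any tool's actual mechanism: the threshold, file name, and prompt format are all assumptions; the key idea is persisting distilled decisions rather than the raw transcript.

```python
import json
from pathlib import Path

THRESHOLD = 0.60  # act at ~60% of the window, per the rule of thumb above

def should_checkpoint(tokens_used: int, window: int = 200_000) -> bool:
    """True once the session has crossed the chosen usage threshold."""
    return tokens_used / window >= THRESHOLD

def checkpoint(decisions: list[str], path: Path) -> None:
    """Persist only the distilled context, not the full conversation."""
    path.write_text(json.dumps({"decisions": decisions}, indent=2))

def bootstrap_prompt(path: Path) -> str:
    """Build the opening prompt for a fresh session from the saved file."""
    state = json.loads(path.read_text())
    bullets = "\n".join(f"- {d}" for d in state["decisions"])
    return f"Continue the previous task. Decisions made so far:\n{bullets}"

# Usage: at 130k of 200k tokens, save the state and seed a new session.
notes = Path("session_context.json")
if should_checkpoint(130_000):
    checkpoint(["Use PostgreSQL 16", "API errors follow RFC 9457"], notes)
    print(bootstrap_prompt(notes))
```

The fresh session starts with a few hundred tokens of distilled decisions instead of a hundred thousand tokens of transcript.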

The Real Business Impact

This isn't a minor annoyance. It directly affects four things you care about:

Developer productivity. Developers who don't understand context waste hours fighting degraded AI. Those who do understand achieve 2-4x the output from the same tools.

AI costs. Many commercial AI tools charge effectively by token. Poor context management means paying for waste: the AI reading unnecessary files, repeating explanations, loading unused tools, spinning in circles with polluted context.

Code quality. An AI drowning in context produces inconsistent code, misses requirements, and makes errors that must be caught and fixed.

Adoption success. Nothing kills AI enthusiasm faster than unpredictable performance. Teams that hit degradation repeatedly often abandon tools that could have been transformative. You paid for a sports car; they're convinced it's a lemon.

The Insight That Separates Winners from Strugglers

One mental shift changes everything:

Context is a budget, not a bucket.

Buckets fill passively until they overflow. Budgets demand active allocation.

Teams that treat context as a budget ask: What does the AI need to see right now? What can we exclude? When do we start fresh? How do we move tasks AWAY FROM the AI and into deterministic code where possible? (For instance: a colleague of mine had a terrific retro agent that would review completed sessions and improve processes for subsequent runs. I added an instruction to look for opportunities to write scripts for things that could be done deterministically in future runs, and it estimated a 60-80% future savings in tokens. This was almost certainly overstated, but even a 20% savings in any budget has real value!)

Teams that treat context as a bucket keep piling in information until performance craters. Then they blame the tool.

What Good Looks Like

Developers who get it:

  • Run workflows that break projects into discrete tasks, each achievable within a single context window
  • Start fresh sessions for distinct tasks instead of running marathon conversations
  • Delegate focused tasks to specialized "subagents" that return summaries, not raw data dumps (the tools themselves are beginning to do this: for instance, Claude Code tasks can now launch a subagent per task that runs in its own context window)
  • Write scripts for repetitive operations instead of walking the AI through each step
  • Maintain project docs that give the AI architectural context without reading every file
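The subagent pattern above can be sketched with a stand-in. Here `run_subagent` is hypothetical: a real tool would launch a fresh-context agent, while this toy version just filters raw output. The point it illustrates is that only the compact summary re-enters the parent session's window.

```python
def run_subagent(task: str, raw_material: str) -> str:
    """Stand-in for a subagent with its own context window: it consumes the
    raw material privately and returns only a compact result."""
    errors = [line for line in raw_material.splitlines() if "ERROR" in line]
    first = errors[0] if errors else "none"
    return f"{task}: {len(errors)} errors found; first: {first}"

# The parent delegates triage; the noisy test log never enters its context.
raw_output = "\n".join([
    "PASS test_login",
    "ERROR test_billing: timeout",
    "PASS test_search",
    "ERROR test_export: 500",
])
summary = run_subagent("Triage test failures", raw_output)
print(summary)
```

The parent session's context grows by the length of `summary`, not the length of the full log, and the savings compound as logs grow.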

Teams that get it:

  • Establish clear conventions for when to start new sessions
  • Create shared documentation that bootstraps AI context efficiently, and continually update it as the team learns
  • Run regular retrospectives to find tasks that should be scripts, not conversations
  • Track context-related productivity metrics

Organizations that get it:

  • Recognize that AI tool performance depends on usage patterns, not just tool selection
  • Invest in training for context-aware workflows
  • Choose AI tools that show context consumption (you can't manage what you can't see), or use tool add-ins or augmentation to allow context monitoring
  • Set realistic expectations: AI assistants are powerful, but they have constraints

The Hidden Tax on Every Session

Modern AI tools are extensible. They connect to databases, file systems, APIs, and specialized capabilities through plugins, generally via MCP, the Model Context Protocol: a widely adopted standard that lets existing software describe its capabilities and available information to AI agents. Thousands of MCP integrations already exist for connecting AI agents to existing software tools and data sources.

Each integration provides power. Each also consumes context: silently, automatically, whether used or not. This is a great simplifier for connectivity/integration, but the MCP standard has no inherent security provisions and comes with a high up-front token cost.

Think of it like subscribing to every streaming service, every magazine, and every meal kit delivery "just in case." Except instead of cluttering your mailbox, you're cluttering your AI's brain.

When you connect an MCP extension, its entire capability description loads into the context window. One extension with 20 features might consume 10,000-15,000 tokens just by being referenced in your project. Organizations that enable every available extension "just in case" can burn 30-40% of their context budget before anyone types a prompt.
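A rough sketch of where that overhead comes from: every enabled tool's schema is serialized into the context before the first prompt. The schemas below are invented, and the 4-characters-per-token ratio is the same rule-of-thumb approximation as before.

```python
import json

def schema_tokens(tool_schema: dict) -> int:
    """Approximate tokens one tool definition consumes (~4 chars per token)."""
    return len(json.dumps(tool_schema)) // 4

# A hypothetical extension exposing 20 similar operations.
extension = [
    {
        "name": f"db_operation_{i}",
        "description": "Runs a parameterized query against the warehouse "
                       "and returns rows as JSON." * 4,
        "parameters": {"query": "string", "timeout_s": "number",
                       "dry_run": "boolean"},
    }
    for i in range(20)
]

# Paid on every session, whether or not any of these tools is ever called.
overhead = sum(schema_tokens(t) for t in extension)
print(f"~{overhead} tokens consumed before the first prompt")
```

Real MCP servers often carry far longer descriptions and examples per tool, which is how a single chatty extension can reach the five-figure token costs described above.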

The lesson is counterintuitive but critical: More capability often means less intelligence. Enable what you'll use. Disable what you won't.

Alternative strategies include:

  • Connect to necessary tools via their APIs through scripts created by the LLM
  • Use "progressive disclosure" strategies like "skills" in Claude Code, where skill components load into the context window only on an as-needed basis
  • Create separate processes or agents with their own context windows for specific tasks or specific MCP instances

The Counterintuitive Truth

Here's what surprises most people:

The goal isn't to maximize the amount of information that goes into the AI. It's to optimize what it pays attention to.

More context often produces worse results. An AI with focused, relevant context significantly outperforms one drowning in information. Just like a human does.

Experienced AI users spend effort curating what goes into context. Amateurs just collect stuff and pile more in. Curate for quality, don't collect for completeness.

Warning Signs Your Team Has Context Problems

Listen for these phrases:

  • "It keeps forgetting what I told it": Classic context overflow
  • "Sometimes it's brilliant, sometimes it's useless": Inconsistent context load
  • "The longer I work, the worse it gets": Session running too long
  • "I have to keep re-explaining the same thing": Important context pushed out
  • "I just restart when it stops working": Context pollution forcing fresh starts

Five Questions for Your Next Leadership Meeting

If your organization uses AI development tools, ask these:

  1. Do we train developers on context management? Most teams don't. They pay for it in lost productivity every day.

  2. Can we measure context consumption? If you can't see it, you can't manage it. Does your tooling provide visibility?

  3. Do we have conventions for session management? Ad-hoc approaches leave productivity on the table. What triggers a fresh start?

  4. Are we using extensions judiciously? Or does everyone enable everything "just in case"?

  5. Have we identified tasks that should be scripted? Repetitive AI-assisted tasks often work better as simple automation.

Relief Is Coming Soon (But Not Here Just Yet)

The good news: context constraints won't stay this tight forever.

Context windows are growing. Gemini has a 1-million-token context window. Claude Code allows 1-million-token context windows via API usage, but not (yet) to subscribers. Context windows are expected to continue to grow in size, and management of them will only continue to improve.

Usage strategies are also improving. A recently described technique from MIT researchers called Recursive Language Models (RLMs) shows particular promise. RLM isn't a specific product. It's a strategy that can be applied to any LLM: decompose large problems into smaller pieces, analyze each in its own focused context, then synthesize the results. For those familiar with MapReduce (the approach Google popularized for distributing large computations across many servers and then combining the results), this strategy is vaguely analogous. Divide-and-conquer for AI.
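The decompose-analyze-synthesize pattern can be shown with a toy. `analyze_chunk` here is a stand-in for one focused LLM call (it just greps for a keyword); a real RLM-style system would run a model over each chunk. Everything below is illustrative, not how any specific tool works.

```python
def split_into_chunks(document: str, max_chars: int = 200) -> list[str]:
    """Decompose: break a large input into pieces that each fit a small window."""
    chunks, current = [], ""
    for line in document.splitlines():
        if current and len(current) + len(line) > max_chars:
            chunks.append(current)
            current = ""
        current += line + "\n"
    if current:
        chunks.append(current)
    return chunks

def analyze_chunk(chunk: str, keyword: str) -> str:
    """Analyze: stand-in for one focused model call over one chunk."""
    return "\n".join(l for l in chunk.splitlines() if keyword in l)

def synthesize(partials) -> str:
    """Synthesize: merge the per-chunk findings into one answer."""
    return "\n".join(p for p in partials if p)

# A fake "codebase" far larger than any single chunk, with a few TODOs buried in it.
codebase = "\n".join(
    f"def handler_{i}(): pass  # TODO auth" if i % 7 == 0
    else f"def handler_{i}(): pass"
    for i in range(30)
)
answer = synthesize(analyze_chunk(c, "TODO") for c in split_into_chunks(codebase))
print(answer)
```

No single step ever sees the whole input, yet the merged answer covers all of it; that is the scale property the RLM strategy is after.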

I fully expect that, within 12 months, RLM-based tools will navigate multi-million-line codebases with the coherence that today's tools bring to small projects. "Codebase too large" will stop being a hard limit.

But here's the catch: RLMs solve scale, not quality. Understanding what code does differs from understanding why it exists. And the fundamentals of context hygiene (attention, relevance, signal-to-noise) will still matter. Larger windows just mean more opportunities to drown in noise if you're not deliberate.

Master context management now, and you'll be ready when the constraints loosen. Ignore it, and you'll just have more rope to hang yourself with.

The Bottom Line

AI coding assistants are genuinely transformative. But they're not magic. They operate within real constraints (constraints that, when ignored, undermine the entire investment).

Context management is one of the highest-leverage skills for AI-assisted development. Teams that understand it extract dramatically more value from identical tools. Teams that don't understand it blame the technology for problems rooted in usage patterns.

The AI isn't getting worse. Its notepad just filled up.

Once you understand this, everything changes: how you deploy AI tools, how you train teams to use them, how you measure success.

The notepad is finite. How you manage it determines whether AI becomes your organization's secret weapon, or its most expensive frustration.


The technical details matter less than the organizational practices built around them. Context management isn't an engineering problem. It's a capability your teams either have or don't.

