Why more context often makes LLMs worse—and what to do instead
1. Introduction
The Context Window Arms Race
The expansion of context windows has been staggering:
- March 2023: GPT-4 launches with a 32K-token variant
- November 2023: GPT-4 Turbo extends to 128K
- February 2024: Gemini 1.5 hits 1M, later expanding to 2M
- March 2024: Claude 3 reaches 200K
In just two years, context capacity grew from 32K to 2M tokens—a 62× increase.
The developer intuition was immediate and seemingly logical:
“If everything fits, just put everything in.”
The Paradox: More Context, Worse Results
Practitioners are discovering a counterintuitive pattern:
the more context you provide, the worse the model performs.
Common symptoms include:
- Passing an entire codebase → misunderstood design intent
- Including exhaustive logs → critical errors overlooked
- Providing comprehensive documentation → unfocused responses
This phenomenon has a name in the research literature:
“Lost in the Middle” (Liu et al., 2023).
Information placed in the middle of long contexts is systematically neglected.
The uncomfortable truth is this:
A context window is not just storage capacity. It is cognitive load.
This article explores why Context Stuffing fails, what Anthropic’s Claude Code reveals about effective context management, and how to shift from Prompt Engineering to Context Engineering—the discipline of architectural curation for AI systems.
2. Why “More Context” Doesn’t Mean “Better Understanding”
Capacity vs. Capability
We must distinguish between two fundamentally different concepts:
- Capacity: How much data fits in memory (e.g. 200K, 2M tokens)
- Capability: The ability to prioritize, connect, and reason over that data
Just because a model can ingest 2 million tokens does not mean it can pay attention to them equally.
Providing a 2M-token context to an LLM is like handing a new developer 10,000 pages of documentation on day one and expecting them to fix a bug in five minutes.
They won’t understand the system—they will immediately drown in it.
Attention Dilution and “Lost in the Middle”
This limitation is rooted in the self-attention mechanism.
As token count increases, attention distributions flatten, signal-to-noise ratios drop, and relevant information gets buried.
Liu et al. (2023) demonstrated that information placed in the middle of long contexts is systematically neglected—even when explicitly relevant—while content at the beginning and end receives disproportionate attention.
In short:
Context expansion increases what can be accessed, not what can be understood.
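You can observe this effect directly with a needle-in-a-haystack style probe (in the spirit of Kamradt's test cited in the references): bury one relevant fact at varying depths inside irrelevant filler and check whether the model still retrieves it. A minimal sketch; `ask(prompt) -> str` is a placeholder for whatever LLM client you use, and the needle text is invented for illustration:

```python
# Probe for positional sensitivity: place a single "needle" fact at varying
# depths inside filler text and check whether the model can still answer.
FILLER = "The afternoon sky stayed a uniform grey over the harbour. " * 2000
NEEDLE = "The staging deployment password is 'quartz-417'."
QUESTION = "What is the staging deployment password?"

def build_prompt(depth: float) -> str:
    """Insert the needle at a relative depth (0.0 = start, 1.0 = end)."""
    cut = int(len(FILLER) * depth)
    return f"{FILLER[:cut]}\n{NEEDLE}\n{FILLER[cut:]}\n\nQuestion: {QUESTION}\nAnswer:"

def run_probe(ask) -> None:
    """`ask(prompt) -> str` is whatever client call you normally use."""
    for depth in (0.0, 0.25, 0.5, 0.75, 1.0):
        answer = ask(build_prompt(depth))
        print(f"depth={depth:.2f}  retrieved={'quartz-417' in answer}")
```

If the effect is present, the middle depths are where retrieval fails first, exactly the pattern Liu et al. describe.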
Real-World Symptoms
In practice, adding information often degrades accuracy:
- Entire codebases → architectural misinterpretation
- Exhaustive logs → critical signals buried
- Comprehensive docs → answers drift off-topic
These are not failures of model intelligence.
They are failures of information structure and prioritization—problems no amount of context capacity can solve.
3. The 75% Rule: Lessons from Claude Code
The Problem: Quality Degradation in Long Sessions
The strongest evidence against Context Stuffing comes from Claude Code, Anthropic’s terminal-based coding agent with a 200K context window.
Soon after its release, users reported recurring issues:
- Code quality degraded over long sessions
- Earlier design decisions were forgotten
- Auto-compact sometimes failed, causing infinite loops
At the time, Claude Code routinely used over 90% of its available context.
The Solution: Auto-Compact at 75%
Anthropic then shipped a counterintuitive fix:
Trigger auto-compact when context usage reaches 75%.
This meant:
- ~150K tokens used for storage
- ~50K tokens deliberately left empty
What looked like waste turned out to be the key to dramatic quality improvements.
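The mechanism is easy to express in code. Below is a minimal sketch of threshold-triggered compaction; `count_tokens` and `summarize` are placeholders you would supply, and the structure illustrates the behaviour described above rather than Anthropic's actual implementation:

```python
CONTEXT_WINDOW = 200_000       # total token budget
COMPACT_THRESHOLD = 0.75       # compact once 75% of the window is in use
KEEP_RECENT = 10               # most recent messages kept verbatim

def maybe_compact(messages: list[dict], count_tokens, summarize) -> list[dict]:
    """Fold older turns into a summary once usage crosses the threshold.

    `count_tokens(messages) -> int` and `summarize(messages) -> str` are
    stand-ins for your tokenizer and summarization call.
    """
    if count_tokens(messages) < CONTEXT_WINDOW * COMPACT_THRESHOLD:
        return messages                              # plenty of headroom left
    older, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    summary = summarize(older)                       # compress, don't truncate
    return [{"role": "system", "content": f"Summary of earlier work:\n{summary}"}] + recent
```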
Why It Works: Inference Space
Several hypotheses explain why this works:
- Context Compression — Low-relevance information is removed
- Information Restructuring — Summaries reorganize scattered data
- Preserving Room for Reasoning — Empty space enables generation
As one developer put it:
“That free context space isn’t wasted—it’s where reasoning happens.”
This mirrors computer memory behavior:
Running at 95% RAM doesn't mean the last 5% is wasted; the system needs that slack to keep working. Push to 100%, and everything grinds to a halt.
Takeaway
Filling context to capacity degrades output quality.
Effective context management requires headroom—space reserved for reasoning, not just retrieval.
4. The Three Principles of Context Engineering
The era of prompt wording tweaks is ending.
As Hamel Husain observed:
“AI Engineering is Context Engineering.”
The critical skill is no longer what you say to the model, but what you put in front of it—and what you deliberately leave out.
Principle 1: Isolation
Do not dump the monolith.
Borrow Bounded Contexts from Domain-Driven Design.
Provide the smallest effective context for the task.
Example: Add OAuth2 authentication
Needed:
- `User` model
- `SessionController`
- `routes.rb`
- Relevant auth middleware
Not needed:
- Billing module
- CSS styles
- Unrelated APIs
- Other test fixtures
Ask:
What is the minimum context required to solve this problem?
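One way to enforce this is to build the prompt from an explicit allowlist rather than globbing the repository. A small sketch; the file paths mirror the OAuth2 example above and are purely illustrative:

```python
from pathlib import Path

# Only the files inside this task's bounded context; nothing else is included.
RELEVANT_FILES = [
    "app/models/user.rb",
    "app/controllers/session_controller.rb",
    "config/routes.rb",
    "app/middleware/authentication.rb",
]

def build_task_context(repo_root: str, task: str) -> str:
    """Concatenate the task description with only the allowlisted files."""
    parts = [f"# Task\n{task}"]
    for rel in RELEVANT_FILES:
        path = Path(repo_root) / rel
        if path.exists():
            parts.append(f"# File: {rel}\n{path.read_text()}")
    return "\n\n".join(parts)
```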
Principle 2: Chaining
Pass artifacts, not histories.
Break workflows into stages:
Plan → Execute → Reflect
Each stage receives only the previous stage’s output—not the entire conversation history.
This keeps context fresh and signal-dense.
Ask:
Can this be decomposed into stages that pass summaries instead of transcripts?
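In code, each stage is a separate call that receives only the previous stage's artifact. A minimal sketch, with `ask(prompt) -> str` again standing in for your LLM client:

```python
def plan(ask, task: str) -> str:
    """Stage 1: produce a short, numbered plan (the only artifact passed on)."""
    return ask(f"Write a concise, numbered implementation plan for:\n{task}")

def execute(ask, plan_text: str, code_context: str) -> str:
    """Stage 2: sees the plan and the isolated code context, not the planning chat."""
    return ask(f"Plan:\n{plan_text}\n\nRelevant code:\n{code_context}\n\nImplement the plan.")

def reflect(ask, change: str) -> str:
    """Stage 3: reviews only the produced change, not the full transcript."""
    return ask(f"Review this change for bugs and missed requirements:\n{change}")

def run_pipeline(ask, task: str, code_context: str) -> str:
    plan_text = plan(ask, task)                      # artifact 1: the plan
    change = execute(ask, plan_text, code_context)   # artifact 2: the change
    return reflect(ask, change)                      # final artifact: the review
```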
Principle 3: Headroom
Never run a model at 100% capacity.
Adopt the 75% Rule.
The context window must hold both input and output. Stuffing 195K tokens of input into a 200K window leaves almost no room for the model to reason or respond.
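Back-of-the-envelope budgeting makes the arithmetic concrete. The split below follows the 75% Rule from Section 3; the amount reserved for output is an assumption for illustration, not a fixed figure:

```python
CONTEXT_WINDOW = 200_000   # total window, shared by input and output

def input_budget(window: int = CONTEXT_WINDOW,
                 usage_ceiling: float = 0.75,   # the 75% Rule
                 reserved_output: int = 8_000   # assumed room for the response
                 ) -> int:
    """Tokens actually available for input once headroom is protected."""
    return int(window * usage_ceiling) - reserved_output

print(input_budget())  # 142000: far below the naive "fill it to 195K" instinct
```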
Ask:
Have I left enough space for the model to think—not just respond?
Treat the context window as a scarce cognitive resource, not infinite storage.
5. Why 200K Is the Sweet Spot
Despite 2M-token models, 200K is the practical sweet spot for Context Engineering.
Cognitive Scale
150K tokens (75% of 200K) is roughly one technical book—about the largest coherent “project state” both humans and LLMs can manage. Beyond that, you need chapters, summaries, and architecture.
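The rough conversion behind that claim, assuming the common approximation of about 0.75 words per token and roughly 300 words per printed page:

```python
tokens = 150_000
words = tokens * 0.75          # ~0.75 words per token (rule of thumb)
pages = words / 300            # ~300 words per printed page
print(f"{tokens:,} tokens ≈ {words:,.0f} words ≈ {pages:,.0f} pages")
# 150,000 tokens ≈ 112,500 words ≈ 375 pages: roughly one technical book
```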
Cost and Latency
Self-attention scales as O(n²).
Doubling the context roughly quadruples attention compute.
200K balances performance, latency, and cost.
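A quick calculation shows how sharply this bites, assuming attention compute grows with the number of token pairs (real serving cost also depends on KV-cache memory and per-token pricing):

```python
BASELINE = 200_000  # tokens

for tokens in (32_000, 200_000, 2_000_000):
    ratio = (tokens / BASELINE) ** 2   # quadratic growth in token pairs
    print(f"{tokens:>9,} tokens -> {ratio:6.2f}x the attention compute of 200K")
```

Going from 200K to 2M is a 10× jump in tokens but, under this scaling, a 100× jump in attention compute.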
Methodological Discipline
200K forces curation.
Exceeding it is a code smell: unclear boundaries, oversized tasks, or stuffing instead of structuring.
Anthropic offers 1M tokens—but behind premium tiers.
The implicit message:
1M is for special cases. 200K is the default for a reason.
The constraint is not a limitation—it is the design principle.
6. Conclusion: From Prompt Engineering to Context Engineering
The context window arms race delivered a 62× increase in capacity.
But capacity was never the bottleneck.
The bottleneck is—and always has been—curation.
The shift is fundamental:
| Prompt Engineering | Context Engineering |
|---|---|
| “How do I phrase this?” | “What should the model see?” |
| Optimizing words | Architecting information |
| Single-shot prompts | Multi-stage pipelines |
| Filling capacity | Preserving headroom |
Three Questions to Ask Before Every Task
1. Am I stuffing context just because I can? Relevant beats exhaustive.
2. Is this context isolated to the real problem? If you can't state the boundary, you haven't found it.
3. Have I left room for the model to think? Output quality requires input restraint.
The era of prompt engineering rewarded clever wording.
The era of context engineering rewards architectural judgment.
The question is no longer:
What should I say to the model?
The question is:
What world should the model see?
7. References
Research Papers
- Liu et al., Lost in the Middle: How Language Models Use Long Contexts (2023). https://arxiv.org/abs/2307.03172
Tools & Methodologies
- planstack.ai. https://github.com/planstack-ai/planstack
Empirical Studies
- Greg Kamradt, Needle in a Haystack. https://github.com/gkamradt/LLMTest_NeedleInAHaystack
Articles & Analysis
- How Claude Code Got Better by Protecting More Context (2024). https://hyperdev.matsuoka.com/p/how-claude-code-got-better-by-protecting
- Hamel Husain, Context Rot. https://hamel.dev/notes/llm/rag/p6-context_rot.html
