OpenAI says "Context is a scarce resource."
Treat it like one.
A giant instruction file feels safe. It feels thorough. But in reality, it crowds out the actual task, the code, and the relevant constraints.
The agent doesn't get smarter with more text.
It just gets distracted.
It either:
- Misses the real constraint buried in noise
- Starts optimizing for the wrong objective
- Or worse, overfits to instructions that don't matter right now
The Right Mental Model
Think of context like RAM in a running system.
RAM is:
- Finite
- Expensive
- Meant for what's actively being processed
You don't load your entire hard drive into memory just because it might be useful.
Same with LLM context.
So what would you do to optimize RAM?
Do the same for context.
Garbage Collect Aggressively
Remove:
- Old decisions that no longer apply
- Duplicated instructions
- Outdated constraints
- "Nice-to-know" explanations
If it's not needed for this task, it shouldn't be in memory.
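As a minimal sketch of what "garbage collecting" context could look like in practice (all names here, like `ContextItem`, are illustrative, not from any specific framework): evict entries that are superseded, duplicated, or irrelevant to the current task before each model call.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    task_id: str              # which task this item belongs to
    superseded: bool = False  # True when a newer decision replaces it

def garbage_collect(items, current_task):
    """Keep only items that are live, unique, and relevant to this task."""
    seen = set()
    kept = []
    for item in items:
        if item.superseded:               # old decision that no longer applies
            continue
        if item.task_id != current_task:  # not needed for *this* task
            continue
        if item.text in seen:             # duplicated instruction
            continue
        seen.add(item.text)
        kept.append(item)
    return kept
```

Run this on every turn, not once: context rots continuously, so collection has to be continuous too.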
Load on Demand (Lazy Loading)
Don't preload:
- All coding standards
- All architecture docs
- All squad rules
Instead:
- Inject only what's relevant to the current step
- Use smaller scoped agents
- Pull specific docs when needed
Context should be dynamic, not monolithic.
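One hedged way to sketch lazy loading: a registry that maps topics to loader functions, so a doc is only materialized when the current step actually asks for it. The topics and doc strings below are made up for illustration.

```python
# Docs are behind lambdas so nothing is loaded until a step requests it.
DOC_LOADERS = {
    "frontend": lambda: "frontend-guidelines: prefer small components",
    "backend": lambda: "backend-guidelines: validate all inputs",
    "architecture": lambda: "architecture-principles: keep modules decoupled",
}

def build_context(step_topics):
    """Inject only the docs relevant to the topics this step touches."""
    return [DOC_LOADERS[t]() for t in step_topics if t in DOC_LOADERS]
```

A backend-only step pays for backend docs and nothing else; the frontend guidelines never enter memory.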
Compress, Don't Copy
Replace:
- Long paragraphs
- Repeated policy text
- Verbose explanations
With:
- Bullet summaries
- Structured rules
- Canonical references
You don't duplicate libraries in RAM — you reference them.
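A rough sketch of the library analogy (the `POLICIES` table and its contents are hypothetical): store each long policy once in a canonical table, and put only a short reference in the prompt, the way a process references a shared library instead of copying it.

```python
# Canonical store: the full text lives here exactly once.
POLICIES = {
    "error-handling": "Full error-handling policy text lives here ...",
}

def compress(context_lines):
    """Replace any line that duplicates a known policy with a reference."""
    out = []
    for line in context_lines:
        ref = next((k for k, v in POLICIES.items() if line == v), None)
        out.append(f"[see policy: {ref}]" if ref else line)
    return out
```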
Modularize Instructions
Instead of one giant instruction file:
- core-standards.md
- frontend-guidelines.md
- backend-guidelines.md
- architecture-principles.md
Load only what the current task touches.
Context should be composable.
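The composition step might look like this, assuming the instruction modules above live as separate markdown files on disk: a task declares which areas it touches, and only those files are concatenated into the prompt.

```python
from pathlib import Path

MODULES = {
    "core": "core-standards.md",
    "frontend": "frontend-guidelines.md",
    "backend": "backend-guidelines.md",
    "architecture": "architecture-principles.md",
}

def compose_instructions(task_areas, base_dir="."):
    """Concatenate only the instruction modules the task touches."""
    parts = []
    for area in task_areas:
        path = Path(base_dir) / MODULES[area]
        if path.exists():
            parts.append(path.read_text())
    return "\n\n".join(parts)
```

A backend bugfix loads `core-standards.md` and `backend-guidelines.md`; the frontend and architecture docs stay on disk.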
Separate Long-Term vs Working Memory
Some things are:
- Stable principles (coding philosophy, architectural values)
- Temporary task constraints (fix this bug, implement this endpoint)
Don't mix them.
Keep:
- Stable principles lean and abstract
- Task context precise and scoped
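This separation can be sketched as two distinct slots that are only combined at prompt-assembly time (the principle strings and section headers below are illustrative): stable principles persist across tasks, while task constraints are supplied fresh each time and never leak into long-term memory.

```python
# Long-term memory: lean, abstract, rarely edited.
STABLE_PRINCIPLES = [
    "Prefer simple designs over clever ones",
    "Make behavior observable",
]

def assemble_prompt(task_constraints):
    """Combine lean stable principles with precisely scoped task context."""
    lines = ["# Principles", *STABLE_PRINCIPLES,
             "# Current task", *task_constraints]  # working memory, per task
    return "\n".join(lines)
```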
Avoid Over-Specification
The more constraints you add, the more the model optimizes for instruction compliance, and the less it reasons about the actual problem.
High-signal beats high-volume.
Optimize for Relevance, Not Completeness
You don't win by giving the model everything.
You win by giving it exactly what it needs to think clearly.
The goal isn't:
"Did I include all the instructions?"
The goal is:
"Did I include the right instructions?"
Final Take
Large context != better output.
Relevant context = better reasoning.
Treat context like RAM:
- Keep it lean
- Keep it current
- Load intentionally
- Evict aggressively
Systems that manage memory well perform better.
Agents are no different.