OpenAI says "Context is a scarce resource."
Treat it like one.
A giant instruction file feels safe. It feels thorough. But in reality, it crowds out the actual task, the code, and the relevant constraints.
The agent doesn't get smarter with more text.
It just gets distracted.
It either:
- Misses the real constraint buried in noise
- Starts optimizing for the wrong objective
- Or worse, overfits to instructions that don't matter right now
The Right Mental Model
Think of context like RAM in a running system.
RAM is:
- Finite
- Expensive
- Meant for what's actively being processed
You don't load your entire hard drive into memory just because it might be useful.
Same with LLM context.
So what would you do to optimize RAM?
Do the same for context.
Garbage Collect Aggressively
Remove:
- Old decisions that no longer apply
- Duplicated instructions
- Outdated constraints
- "Nice-to-know" explanations
If it's not needed for this task, it shouldn't be in memory.
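As a minimal sketch of what "garbage collecting" context could look like in practice (all names here, like `ContextItem`, are illustrative, not from any specific framework): evict entries that are superseded, duplicated, or irrelevant to the current task before each model call.

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    text: str
    task_id: str              # which task this item belongs to
    superseded: bool = False  # True when a newer decision replaces it

def garbage_collect(items, current_task):
    """Keep only items that are live, unique, and relevant to this task."""
    seen = set()
    kept = []
    for item in items:
        if item.superseded:               # old decision that no longer applies
            continue
        if item.task_id != current_task:  # not needed for *this* task
            continue
        if item.text in seen:             # duplicated instruction
            continue
        seen.add(item.text)
        kept.append(item)
    return kept
```

Run this on every turn, not once: context rots continuously, so collection has to be continuous too.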
Load on Demand (Lazy Loading)
Don't preload:
- All coding standards
- All architecture docs
- All squad rules
Instead:
- Inject only what's relevant to the current step
- Use smaller scoped agents
- Pull specific docs when needed
Context should be dynamic, not monolithic.
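One hedged way to sketch lazy loading: a registry that maps topics to loader functions, so a doc is only materialized when the current step actually asks for it. The topics and doc strings below are made up for illustration.

```python
# Docs are behind lambdas so nothing is loaded until a step requests it.
DOC_LOADERS = {
    "frontend": lambda: "frontend-guidelines: prefer small components",
    "backend": lambda: "backend-guidelines: validate all inputs",
    "architecture": lambda: "architecture-principles: keep modules decoupled",
}

def build_context(step_topics):
    """Inject only the docs relevant to the topics this step touches."""
    return [DOC_LOADERS[t]() for t in step_topics if t in DOC_LOADERS]
```

A backend-only step pays for backend docs and nothing else; the frontend guidelines never enter memory.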
Compress, Don't Copy
Replace:
- Long paragraphs
- Repeated policy text
- Verbose explanations
With:
- Bullet summaries
- Structured rules
- Canonical references
You don't duplicate libraries in RAM — you reference them.
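A rough sketch of the library analogy (the `POLICIES` table and its contents are hypothetical): store each long policy once in a canonical table, and put only a short reference in the prompt, the way a process references a shared library instead of copying it.

```python
# Canonical store: the full text lives here exactly once.
POLICIES = {
    "error-handling": "Full error-handling policy text lives here ...",
}

def compress(context_lines):
    """Replace any line that duplicates a known policy with a reference."""
    out = []
    for line in context_lines:
        ref = next((k for k, v in POLICIES.items() if line == v), None)
        out.append(f"[see policy: {ref}]" if ref else line)
    return out
```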
Modularize Instructions
Instead of one giant instruction file:
- core-standards.md
- frontend-guidelines.md
- backend-guidelines.md
- architecture-principles.md
Load only what the current task touches.
Context should be composable.
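The composition step might look like this, assuming the instruction modules above live as separate markdown files on disk: a task declares which areas it touches, and only those files are concatenated into the prompt.

```python
from pathlib import Path

MODULES = {
    "core": "core-standards.md",
    "frontend": "frontend-guidelines.md",
    "backend": "backend-guidelines.md",
    "architecture": "architecture-principles.md",
}

def compose_instructions(task_areas, base_dir="."):
    """Concatenate only the instruction modules the task touches."""
    parts = []
    for area in task_areas:
        path = Path(base_dir) / MODULES[area]
        if path.exists():
            parts.append(path.read_text())
    return "\n\n".join(parts)
```

A backend bugfix loads `core-standards.md` and `backend-guidelines.md`; the frontend and architecture docs stay on disk.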
Separate Long-Term vs Working Memory
Some things are:
- Stable principles (coding philosophy, architectural values)
- Temporary task constraints (fix this bug, implement this endpoint)
Don't mix them.
Keep:
- Stable principles lean and abstract
- Task context precise and scoped
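This separation can be sketched as two distinct slots that are only combined at prompt-assembly time (the principle strings and section headers below are illustrative): stable principles persist across tasks, while task constraints are supplied fresh each time and never leak into long-term memory.

```python
# Long-term memory: lean, abstract, rarely edited.
STABLE_PRINCIPLES = [
    "Prefer simple designs over clever ones",
    "Make behavior observable",
]

def assemble_prompt(task_constraints):
    """Combine lean stable principles with precisely scoped task context."""
    lines = ["# Principles", *STABLE_PRINCIPLES,
             "# Current task", *task_constraints]  # working memory, per task
    return "\n".join(lines)
```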
Avoid Over-Specification
The more constraints you add, the more the model optimizes for instruction compliance, and the less it reasons about the actual problem.
High-signal beats high-volume.
Optimize for Relevance, Not Completeness
You don't win by giving the model everything.
You win by giving it exactly what it needs to think clearly.
The goal isn't:
"Did I include all the instructions?"
The goal is:
"Did I include the right instructions?"
Final Take
Large context != better output.
Relevant context = better reasoning.
Treat context like RAM:
- Keep it lean
- Keep it current
- Load intentionally
- Evict aggressively
Systems that manage memory well perform better.
Agents are no different.