A lot of AI coding workflows degrade the exact same way.
At first, everything feels incredible.
Your coding agent:
- understands the project
- moves insanely fast
- eliminates boilerplate
- compounds your momentum
Then a few weeks later:
AGENTS.md turns into a novel.
Prompts get bloated.
The model starts missing obvious things.
Responses become inconsistent.
Token usage quietly becomes absurd.
I kept running into this while building Empirical.
Eventually I realized the problem wasn’t:
“The model needs more context.”
The problem was:
“The model is carrying too much irrelevant context at once.”
That distinction changed everything.
The Hidden Failure Mode of Coding Agents
Most teams solve AI memory like this:
“Just add it to the prompt.”
And over time the context fills up with:
Permanent Context Soup
- architecture decisions
- coding standards
- deployment notes
- UI preferences
- old implementation details
- temporary fixes
- abandoned experiments
- half-finished thoughts
Eventually every request drags all of it around forever.
Even when most of it has absolutely nothing to do with the current task.
That creates a brutal signal-to-noise problem.
The model starts giving temporary junk and critical architecture decisions equal weight.
You can actually feel the degradation happen.
Symptoms:
- the agent gets fuzzier
- architecture drift increases
- outputs become inconsistent
- you spend more time correcting than building
Bigger Context Windows Aren’t the Real Solution
I think the industry is optimizing the wrong thing right now.
Everyone keeps pushing toward:
Bigger Everything
- million-token windows
- infinite memory
- larger context sizes
- stuffing more into prompts
But humans don’t work that way.
Good engineering teams don’t bring every document into every meeting.
Most information is situational.
Most memory should stay dormant until it becomes relevant.
That was the shift for me.
Not:
“How do I fit more into context?”
But:
“How do I load only what matters right now?”
What Worked Better
I started treating AI memory like layered working memory instead of a permanently stuffed prompt.
1. Lean Persistent Context
Keep permanent instructions extremely small.
Only things like:
- architecture principles
- coding philosophy
- project identity
- non-negotiables
That layer should stay lean on purpose.
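As a rough sketch, the whole persistent layer can fit in a handful of lines. Every entry below is an illustrative placeholder, not a recommendation from any real project:

```python
# Illustrative persistent layer: a few durable principles, nothing else.
# Each entry is an example placeholder standing in for your own rules.
PERSISTENT_CONTEXT = """\
Architecture: hexagonal; domain code never imports framework code.
Philosophy: small, reviewable changes; every bug fix ships with a test.
Identity: internal billing API, Python 3.12, FastAPI.
Non-negotiable: no secrets in source; config comes from the environment.
"""
```

If that layer ever needs scrolling, something that belongs in a lower layer has leaked into it.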
2. Retrieved Context
Pull implementation knowledge dynamically based on:
Relevance Signals
- semantic similarity
- current task
- related code paths
- previous work in the same area
Only relevant context enters the active prompt.
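Here is a minimal sketch of that retrieval step. The `embed()` function is a hashed bag-of-words placeholder, a stand-in for whatever embedding model you actually use; the shape of the logic is the point: score stored knowledge against the current task and keep only the top few hits.

```python
import math

def embed(text: str) -> list[float]:
    """Placeholder embedding (hashed bag-of-words). Swap in a real model."""
    vec = [0.0] * 256
    for token in text.lower().split():
        vec[hash(token) % 256] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors from embed() are unit-length, so cosine is just a dot product.
    return sum(x * y for x, y in zip(a, b))

def retrieve(task: str, knowledge: list[str], k: int = 3, floor: float = 0.2) -> list[str]:
    """Return only the stored entries relevant to the current task."""
    query = embed(task)
    scored = [(cosine(query, embed(entry)), entry) for entry in knowledge]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:k] if score >= floor]
```

Semantic similarity is just the simplest relevance signal; in practice you would also boost entries touching the same code paths or recent work in the same area.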
3. Session Context
Use temporary working memory for:
Active Work
- bugs
- in-progress features
- short-lived implementation decisions
Then let it expire naturally instead of polluting long-term memory forever.
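A sketch of that expiry behavior, plus how the three layers combine into a single prompt. This reuses `PERSISTENT_CONTEXT` and `retrieve()` from the earlier sketches, and the two-hour TTL is an arbitrary illustrative choice:

```python
import time

class SessionMemory:
    """Short-lived working notes that expire instead of becoming permanent."""

    def __init__(self, ttl_seconds: float = 2 * 60 * 60):
        self.ttl = ttl_seconds
        self._notes: list[tuple[float, str]] = []  # (timestamp, note)

    def add(self, note: str) -> None:
        self._notes.append((time.time(), note))

    def active(self) -> list[str]:
        # Drop anything older than the TTL; expired notes never reach the
        # prompt and never get promoted to long-term memory.
        cutoff = time.time() - self.ttl
        self._notes = [(t, n) for t, n in self._notes if t >= cutoff]
        return [n for _, n in self._notes]

def build_prompt(task: str, knowledge: list[str], session: SessionMemory) -> str:
    # Persistent principles + retrieved knowledge + live session notes.
    # Everything else stays dormant until it becomes relevant.
    parts = [PERSISTENT_CONTEXT, *retrieve(task, knowledge), *session.active(), task]
    return "\n\n".join(parts)
```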
What Changed
The biggest surprise wasn’t even the token savings.
It was how much sharper the agents became once the noise disappeared.
After reducing context bloat:
- responses became more focused
- architecture stayed more consistent
- prompt babysitting dropped significantly
- outputs drifted less between sessions
The token reduction was just the measurable side effect.
Results
| Workflow | Context Reduction |
|---|---|
| Smaller focused tasks | ~22% |
| Larger iterative workflows | Up to ~45% |
That compounds fast once agents start looping.
The Bigger Realization
I think a lot of AI tooling is accidentally recreating bad human organizational habits.
We already know what happens when people dump everything into:
Organizational Chaos
- giant docs
- giant meetings
- giant Slack threads
- giant Notion pages
Clarity collapses.
Coding agents seem to behave better when memory works more like human working memory:
Better Memory Pattern
- small active focus
- relevant recall
- long-term memory separated from immediate attention
That mattered far more than raw context size.
Full Breakdown
I wrote the complete breakdown here:
- retrieval architecture
- layered memory strategy
- implementation lessons
- where the 22–45% savings actually came from
→ Reducing Coding Agent Context Usage by 22–45% with Retrieval-Based Memory Systems