Looped

Posted on May 24

Context bloat is the new bundle bloat

#ai #programming #github #githubactions

TL;DR: AI coding agents are now part of the dev workflow, but most repos still treat context like it is free. It is not. dist/, coverage output, source maps, generated clients, logs, snapshots, and lockfile churn all make agents waste attention. I built ContextLevy as a small PR guardrail for that.

The thing nobody wants to admit

Your AI coding agent reads your repo mess.

Not always perfectly. Not always literally every file. But enough that messy repositories become worse environments for tools like:

Cursor
Codex
Claude Code
Copilot
local coding agents
random CLI agents your team experiments with next week

And the mess usually does not look dramatic.

It looks like this:

+ dist/index.js
+ coverage/lcov.info
+ generated/client.ts
+ bundle.js.map
+ snapshots/
+ debug.log
+ package-lock.json with 9,000 changed lines

None of that necessarily breaks your app.

But it does make your repo heavier to reason about.

As humans, we learned to ignore junk.

For agents, junk becomes context.

Bundle bloat already taught us this lesson

Frontend devs understand bundle bloat because we learned to measure it.

A pull request adding 30 KB of JavaScript is easy to miss.

A bot comment saying this is harder to ignore:

Bundle size increased by 30 KB
Main chunk increased by 18 KB
Vendor chunk increased by 12 KB

That comment does not replace engineering judgment.

It creates friction at the right moment.

Before the cost lands in main.

That is the same mental model I think AI-heavy repos need now.

Bundle bloat was a hidden cost for users.
Context bloat is a hidden cost for agents.

What is context bloat?

Context bloat is when a repo accumulates files that are technically valid, but low-value or noisy for AI coding workflows.

Common examples

File/change	Why it is noisy
`dist/`	Usually generated output, not source of truth
`coverage/`	Huge text output that agents should rarely inspect
`*.map`	Source maps can be massive and low-signal
generated clients	Sometimes needed, often overwhelming
lockfile churn	Can dominate PR diffs with little semantic value
snapshots	Useful for tests, noisy for reasoning
logs	Almost never belong in repo context
agent instruction files	Small changes can affect agent behavior a lot
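
To make the categories above concrete, here is a minimal sketch of how a path could be mapped to a noisy-file category with glob patterns. The pattern list is an assumption made up for illustration, not ContextLevy's actual defaults, and `classify` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Hypothetical category -> glob patterns. These are illustrative guesses,
# not ContextLevy's real defaults. Order matters: the first match wins.
NOISY_PATTERNS = {
    "build output": ["dist/*", "build/*"],
    "coverage": ["coverage/*"],
    "source map": ["*.map"],
    "generated client": ["generated/*"],
    "lockfile": ["package-lock.json", "yarn.lock", "pnpm-lock.yaml"],
    "snapshot": ["*.snap", "snapshots/*"],
    "log": ["*.log"],
    "agent instructions": ["AGENTS.md", "CLAUDE.md", ".cursorrules"],
}


def classify(path: str) -> Optional[str]:
    """Return the first matching category, or None for ordinary files."""
    for category, patterns in NOISY_PATTERNS.items():
        # fnmatchcase's "*" also matches "/", so "dist/*" covers nested files.
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return category
    return None
```

A real rule set would need per-repo overrides, because some of these files are committed on purpose.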

The issue is not that these files are always bad.

The issue is that they should not become invisible cost.

“Just use `.gitignore`” is not enough

You should use .gitignore.

Seriously.

But .gitignore only affects files that are not tracked yet. Once a file is committed, Git keeps following it no matter what the ignore rules say.

It does not help much when:

generated files are intentionally committed
old junk is already tracked
snapshots grow over time
lockfiles churn hard
someone changes agent instructions in a risky way
different tools use different ignore/indexing behavior
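
The "old junk is already tracked" case is easy to check for. As a small sketch, assuming `git` is on the PATH and the script runs inside a repository, this lists files that Git still tracks even though an ignore rule matches them. The helper names are hypothetical.

```python
import subprocess
from typing import List


def parse_paths(output: str) -> List[str]:
    """Split `git ls-files` output into a list of non-empty paths."""
    return [line for line in output.splitlines() if line.strip()]


def tracked_but_ignored(repo_dir: str = ".") -> List[str]:
    """List files that Git tracks even though an ignore rule matches them."""
    result = subprocess.run(
        # --cached: tracked files; --ignored: only those matching ignore rules;
        # --exclude-standard: use .gitignore and the other standard ignore sources.
        ["git", "ls-files", "--cached", "--ignored", "--exclude-standard"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_paths(result.stdout)
```

Anything this returns can be untracked with `git rm --cached <path>`, which leaves the working copy on disk.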

Also, .gitignore is local hygiene: it acts silently on each developer's machine.

A PR comment is team hygiene.

It shows up where the merge decision happens.

“Just tell the AI to ignore it” also does not scale

This sounds good until you remember how real teams work.

One dev uses Cursor.

Another uses Claude Code.

Someone else uses Codex.

Someone runs a local model through a CLI.

Next month, the team tries another agent entirely.

Every tool has different rules for indexing, retrieval, file search, ignore behavior, and context selection.

The repo is the shared layer.

Cleaner repo context helps every tool downstream.

So I built ContextLevy

ContextLevy is a small open-source tool that acts like:

bundle-size checks, but for AI coding context

It runs on pull requests and flags diffs that add a lot of context weight.

It catches things like

committed build output
coverage reports
source maps
generated clients
large lockfile churn
snapshots
logs
agent instruction changes

It can run as

a GitHub Action
a GitHub App
a local CLI

It does not

call an LLM
upload your code
judge code quality
replace code review
pretend to be an AI platform

It just analyzes the diff and leaves a focused PR comment.

That is the whole point.

Small guardrail. Clear feedback.
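
To show how little machinery "analyzes the diff" needs, here is a rough sketch that estimates added tokens per file from unified diff text. It is not ContextLevy's implementation. The four-characters-per-token ratio is a common rule of thumb and an assumption here, and `added_tokens_by_file` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Optional

# Rough heuristic (an assumption): real tokenizers vary by model and content.
CHARS_PER_TOKEN = 4


def added_tokens_by_file(diff_text: str) -> Dict[str, int]:
    """Estimate added tokens per file from unified diff text (e.g. `git diff`)."""
    added_chars: Dict[str, int] = defaultdict(int)
    current: Optional[str] = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            current = None  # a new file section starts here
        elif line.startswith("+++ "):
            target = line[4:]
            # "+++ b/path" names the new file; "+++ /dev/null" is a deletion.
            current = target[2:] if target.startswith("b/") else None
        elif line.startswith("+") and current is not None:
            added_chars[current] += len(line) - 1  # drop the leading "+"
    return {path: chars // CHARS_PER_TOKEN for path, chars in added_chars.items()}
```

This deliberately skips edge cases: an added line whose content itself begins with "++ " would be misread as a file header, so a production parser should track hunk boundaries.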

What a ContextLevy comment is supposed to do

The goal is not to shame people for committing generated files.

Sometimes generated files belong in the repo.

The goal is to make the cost visible:

ContextLevy · Warning · ~84k added context tokens

Largest contributors:
+ coverage/lcov.info
+ dist/index.js
+ generated/client.ts

Suggestion:
Consider ignoring coverage output and build artifacts unless they are intentionally tracked.

That is it.

Just a useful nudge before main gets heavier.
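
For illustration, a comment in the shape shown above could be produced from per-file token estimates with something like the sketch below. The threshold value, the "OK" level, and the `render_comment` name are assumptions, not the tool's real behavior.

```python
from typing import Dict

# Hypothetical threshold; a real tool would make this configurable.
WARN_THRESHOLD = 20_000


def render_comment(tokens_by_file: Dict[str, int], top_n: int = 3) -> str:
    """Format a short PR comment listing the largest contributors."""
    total = sum(tokens_by_file.values())
    level = "Warning" if total >= WARN_THRESHOLD else "OK"
    # Sort by estimated tokens, largest first, and keep the top few.
    largest = sorted(tokens_by_file.items(), key=lambda item: item[1], reverse=True)
    lines = [
        f"ContextLevy · {level} · ~{round(total / 1000)}k added context tokens",
        "",
        "Largest contributors:",
    ]
    lines += [f"+ {path}" for path, _ in largest[:top_n]]
    return "\n".join(lines)
```

Keeping the list to a few entries is a design choice: the comment should point at the biggest offenders, not reproduce the whole diff.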

Why this is not just another AI wrapper

Most AI devtools try to add more intelligence.

ContextLevy does the opposite.

It assumes the boring part matters:

what files exist
what changed in the PR
how much text was added
whether that text is likely useful
whether the repo is getting noisier over time

A lot of AI tooling discourse focuses on better models.

But model quality is only half the story.

The other half is context quality.

Garbage context still hurts, even with better models.

Bigger context windows do not fix this.

They just make it easier to stuff more junk into the prompt.

The fair criticism

The obvious criticism is:

“Couldn’t I make this with a script?”

Yes.

You can also write your own formatter, linter, bundle-size checker, release script, changelog generator, and dependency bot.

Most useful devtools are not valuable because the underlying idea is hard to replicate.

They are valuable because they package the boring workflow into something teams actually run.

The value is in:

useful defaults
CI integration
PR comments
config
predictable output
low setup cost
making the issue visible consistently

That is what ContextLevy is trying to be.

Who this is for

ContextLevy makes sense if:

your team uses AI coding agents heavily
your repo has lots of generated or build output
your PRs often include noisy files
you care about keeping AI context clean
you want a lightweight CI guardrail

It probably does not make sense if:

your repo is tiny
you barely use coding agents
your team already has strict generated-file policies
you do not want another PR check
you expect semantic code review from it

That last point matters.

ContextLevy is not a reviewer.

It is a warning light.

Why I think this will matter more

AI coding agents are moving from autocomplete to actual development loops.

People now ask agents to:

explain unfamiliar codebases
implement cross-file features
review pull requests
debug CI failures
migrate frameworks
generate tests
refactor architecture

That means repo context is becoming part of the development environment.

We already optimize package size.

We already optimize test speed.

We already optimize CI time.

We already optimize dependency weight.

So why are we pretending AI context is free?

My actual question

I am not claiming ContextLevy is the final answer.

I am trying to figure out if this problem deserves more serious tooling.

Repo:

https://github.com/unloopedmido/contextlevy

I would genuinely like blunt feedback:

Is “context bloat” a real problem you have felt?
Would you install a PR check for this?
Are the default noisy-file categories correct?
What would make this feel like a serious devtool instead of AI-tool noise?
Is the bundle-size analogy clear, or does it feel forced?

Final thought

Bundle bloat became obvious once teams started measuring it.

Context bloat is still mostly invisible.

But as AI agents become normal parts of development, invisible repo noise will matter more.

Maybe the fix is not complicated.

Maybe it starts with a simple PR comment saying:

“This change adds a lot of context weight. Are you sure?”

That is what ContextLevy is trying to do.
