DEV Community

Looped

Context bloat is the new bundle bloat

TL;DR: AI coding agents are now part of the dev workflow, but most repos still treat context like it is free. It is not. dist/, coverage output, source maps, generated clients, logs, snapshots, and lockfile churn all make agents waste attention. I built ContextLevy as a small PR guardrail for that.


The thing nobody wants to admit

Your AI coding agent reads your repo mess.

Not always perfectly. Not always literally every file. But enough that messy repositories become worse environments for tools like:

  • Cursor
  • Codex
  • Claude Code
  • Copilot
  • local coding agents
  • random CLI agents your team experiments with next week

And the mess usually does not look dramatic.

It looks like this:

+ dist/index.js
+ coverage/lcov.info
+ generated/client.ts
+ bundle.js.map
+ snapshots/
+ debug.log
+ package-lock.json with 9,000 changed lines

None of that necessarily breaks your app.

But it does make your repo heavier to reason about.

Humans have learned to tune out junk.

For agents, junk becomes context.


Bundle bloat already taught us this lesson

Frontend devs understand bundle bloat because we learned to measure it.

A pull request adding 30 KB of JavaScript is easy to miss.

A bot comment like this one is much harder to ignore:

Bundle size increased by 30 KB
Main chunk increased by 18 KB
Vendor chunk increased by 12 KB

That comment does not replace engineering judgment.

It creates friction at the right moment.

Before the cost lands in main.

That is the same mental model I think AI-heavy repos need now.

Bundle bloat was a hidden cost for users.
Context bloat is a hidden cost for agents.


What is context bloat?

Context bloat is when a repo accumulates files that are technically valid, but low-value or noisy for AI coding workflows.

Common examples

| File/change | Why it matters |
|---|---|
| dist/ | Usually generated output, not the source of truth |
| coverage/ | Huge text output that agents should rarely inspect |
| *.map | Source maps can be massive and low-signal |
| generated clients | Sometimes needed, often overwhelming |
| lockfile churn | Can dominate PR diffs with little semantic value |
| snapshots | Useful for tests, noisy for reasoning |
| logs | Almost never belong in repo context |
| agent instruction files | Small changes can affect agent behavior a lot |
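To make the table concrete, here is a minimal Python sketch of how a PR check might map changed paths to these categories. The directory names and file globs in `RULES` are assumptions chosen to mirror the table (the agent-instruction names are common conventions such as `AGENTS.md` and `CLAUDE.md`); they are not ContextLevy's actual defaults.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical rules mirroring the table: (category, directory names, file-name globs).
# Illustrative only; a real tool would make these configurable.
RULES: list[tuple[str, set[str], tuple[str, ...]]] = [
    ("build output", {"dist", "build"}, ()),
    ("coverage", {"coverage"}, ("lcov.info",)),
    ("source map", set(), ("*.map",)),
    ("generated client", {"generated"}, ()),
    ("lockfile", set(), ("package-lock.json", "yarn.lock", "pnpm-lock.yaml")),
    ("snapshot", {"snapshots", "__snapshots__"}, ("*.snap",)),
    ("log", set(), ("*.log",)),
    # Not noise as such, but small edits here can change agent behavior a lot.
    ("agent instructions", set(), ("AGENTS.md", "CLAUDE.md", ".cursorrules", "copilot-instructions.md")),
]


def classify(path: str) -> str | None:
    """Return the first matching category for a repo-relative path, or None."""
    parts = PurePosixPath(path).parts
    directories, name = parts[:-1], parts[-1]
    for category, dir_names, name_globs in RULES:
        # A directory rule matches at any depth, e.g. packages/web/dist/app.js.
        if any(d in dir_names for d in directories):
            return category
        # Case-sensitive glob against the file name only.
        if any(fnmatchcase(name, glob) for glob in name_globs):
            return category
    return None
```

For example, `classify("packages/web/dist/app.js")` returns `"build output"` and `classify("src/index.ts")` returns `None`. First match wins, so `dist/bundle.js.map` counts as build output rather than as a source map.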

The issue is not that these files are always bad.

The issue is that they should not become an invisible cost.


“Just use .gitignore” is not enough

You should use .gitignore.

Seriously.

But .gitignore only helps with files before they are tracked.

It does not help much when:

  • generated files are intentionally committed
  • old junk is already tracked
  • snapshots grow over time
  • lockfiles churn hard
  • someone changes agent instructions in a risky way
  • different tools use different ignore/indexing behavior

Also, .gitignore is local hygiene.

A PR comment is team hygiene.

It shows up where the merge decision happens.
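For the "old junk is already tracked" case, Git can already report it: `git ls-files --cached --ignored --exclude-standard` lists tracked files that match your ignore rules. The Python sketch below shows the same idea with deliberately simplified matching (no `!` negation, no `**`, no anchored patterns with an inner `/`), so treat it as an illustration rather than a faithful `.gitignore` implementation. The function names are hypothetical.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _matches(path: str, pattern: str) -> bool:
    """Simplified .gitignore matching: names are matched at any depth."""
    parts = PurePosixPath(path).parts
    if pattern.endswith("/"):
        # "dist/" only matches directories, so skip the final (file) component.
        name = pattern.rstrip("/")
        return any(fnmatchcase(part, name) for part in parts[:-1])
    # "coverage" or "*.log" may match a directory or the file itself.
    return any(fnmatchcase(part, pattern) for part in parts)


def tracked_but_ignored(tracked: list[str], gitignore_text: str) -> list[str]:
    """Return tracked paths that the given ignore rules would have excluded."""
    patterns = [
        line.strip()
        for line in gitignore_text.splitlines()
        # Skip blanks, comments, and negations (unsupported in this sketch).
        if line.strip() and not line.strip().startswith(("#", "!"))
    ]
    return [path for path in tracked if any(_matches(path, p) for p in patterns)]
```

Feeding it the output of `git ls-files` and the contents of `.gitignore` gives a rough list of files that slipped in before the ignore rule existed.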


“Just tell the AI to ignore it” also does not scale

This sounds good until you remember how real teams work.

One dev uses Cursor.

Another uses Claude Code.

Someone else uses Codex.

Someone runs a local model through a CLI.

Next month, the team tries another agent entirely.

Every tool has different rules for indexing, retrieval, file search, ignore behavior, and context selection.

The repo is the shared layer.

Cleaner repo context helps every tool downstream.


So I built ContextLevy

ContextLevy is a small open-source tool that acts like:

bundle-size checks, but for AI coding context

It runs on pull requests and flags diffs that add a lot of context weight.

It catches things like

  • committed build output
  • coverage reports
  • source maps
  • generated clients
  • large lockfile churn
  • snapshots
  • logs
  • agent instruction changes
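Flagging "a lot of context weight" needs some size measure. A minimal sketch, assuming that text added in the diff is a fair proxy and using the common rule of thumb of roughly four characters per token (real tokenizers vary by model and content), could read `git diff` output and total the added characters per file. The function name and the `4.0` default are assumptions, not ContextLevy's method.

```python
from __future__ import annotations

import math
from collections import Counter


def added_tokens_by_file(unified_diff: str, chars_per_token: float = 4.0) -> Counter[str]:
    """Roughly estimate tokens added per file in default `git diff` output.

    Quoted paths (e.g. non-ASCII file names) are not handled in this sketch.
    """
    added_chars: Counter[str] = Counter()
    current: str | None = None
    in_header = False
    for line in unified_diff.splitlines():
        if line.startswith("diff --git "):
            # A new file section starts; its target path comes from the "+++" line.
            in_header, current = True, None
        elif line.startswith("@@"):
            in_header = False  # hunk body follows
        elif in_header:
            if line.startswith("+++ "):
                target = line[4:]
                # "+++ /dev/null" means the file was deleted: nothing was added.
                current = target[2:] if target.startswith("b/") else None
        elif line.startswith("+") and current is not None:
            added_chars[current] += len(line) - 1  # ignore the leading "+"
    return Counter(
        {path: math.ceil(n / chars_per_token) for path, n in added_chars.items()}
    )
```

Tracking the header state means an added line whose content happens to start with `++` is still counted as content, not mistaken for a file header. `sum(added_tokens_by_file(diff_text).values())` gives a rough total for the whole PR.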

It can run as

  • a GitHub Action
  • a GitHub App
  • a local CLI

It does not

  • call an LLM
  • upload your code
  • judge code quality
  • replace code review
  • pretend to be an AI platform

It just analyzes the diff and leaves a focused PR comment.

That is the whole point.

Small guardrail. Clear feedback.


What a ContextLevy comment is supposed to do

The goal is not to shame people for committing generated files.

Sometimes generated files belong in the repo.

The goal is to make the cost visible:

ContextLevy · Warning · ~84k added context tokens

Largest contributors:
+ coverage/lcov.info
+ dist/index.js
+ generated/client.ts

Suggestion:
Consider ignoring coverage output and build artifacts unless they are intentionally tracked.

That is it.

Just a useful nudge before main gets heavier.
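The comment format above is simple enough to render deterministically from per-file estimates. A minimal sketch, assuming a hypothetical `warn_at` threshold and an `OK` level that the sample does not show; it only reproduces the sample's layout and is not ContextLevy's actual renderer.

```python
from __future__ import annotations


def format_tokens(n: int) -> str:
    """Round to thousands ("~84k"); counts under 1,000 are shown as-is."""
    return f"~{round(n / 1000)}k" if n >= 1000 else f"~{n}"


def render_comment(tokens_by_file: dict[str, int], warn_at: int = 20_000, top_n: int = 3) -> str:
    total = sum(tokens_by_file.values())
    level = "Warning" if total >= warn_at else "OK"  # hypothetical levels
    # Largest files first; sorted() is stable, so ties keep their input order.
    largest = sorted(tokens_by_file.items(), key=lambda item: item[1], reverse=True)[:top_n]
    lines = [
        f"ContextLevy · {level} · {format_tokens(total)} added context tokens",
        "",
        "Largest contributors:",
        *(f"+ {path}" for path, _ in largest),
    ]
    return "\n".join(lines)
```

Listing only the top few contributors keeps the comment short even when a PR touches hundreds of generated files.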


Why this is not just another AI wrapper

Most AI devtools try to add more intelligence.

ContextLevy does the opposite.

It assumes the boring part matters:

  • what files exist
  • what changed in the PR
  • how much text was added
  • whether that text is likely useful
  • whether the repo is getting noisier over time
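"Whether the repo is getting noisier over time" can be made measurable by comparing the share of tracked bytes that sit in noisy paths on the base branch and on the PR head. A minimal sketch, assuming you already have a `{path: size_in_bytes}` map for each side (for example from `git ls-tree -r -l <commit>`); `NOISY_PREFIXES`, `NOISY_SUFFIXES`, and the function names are hypothetical stand-ins for real rules.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping

# Hypothetical rules: top-level noisy directories and noisy file extensions.
NOISY_PREFIXES = ("dist/", "coverage/", "generated/")
NOISY_SUFFIXES = (".map", ".log", ".snap")


def is_noisy(path: str) -> bool:
    return path.startswith(NOISY_PREFIXES) or path.endswith(NOISY_SUFFIXES)


def noisy_share(sizes: Mapping[str, int], noisy: Callable[[str], bool] = is_noisy) -> float:
    """Fraction of tracked bytes that live in noisy files (0.0 for an empty tree)."""
    total = sum(sizes.values())
    if total == 0:
        return 0.0
    return sum(size for path, size in sizes.items() if noisy(path)) / total


def noise_drift(
    base: Mapping[str, int],
    head: Mapping[str, int],
    noisy: Callable[[str], bool] = is_noisy,
) -> float:
    """Positive when the PR head is noisier than the base branch."""
    return noisy_share(head, noisy) - noisy_share(base, noisy)
```

Using a ratio rather than a raw byte count keeps a large but legitimate source change from looking like a noise regression.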

A lot of AI tooling discourse focuses on better models.

But model quality is only half the story.

The other half is context quality.

Garbage context still hurts, even with better models.

Bigger context windows do not fix this.

They just make it easier to stuff more junk into the prompt.


The fair criticism

The obvious criticism is:

“Couldn’t I make this with a script?”

Yes.

You can also write your own formatter, linter, bundle-size checker, release script, changelog generator, and dependency bot.

Most useful devtools are not valuable because the underlying idea is hard to build.

They are valuable because they package the boring workflow into something teams actually run.

The value is in:

  • useful defaults
  • CI integration
  • PR comments
  • config
  • predictable output
  • low setup cost
  • making the issue visible consistently

That is what ContextLevy is trying to be.
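The "useful defaults", "config", and "predictable output" points can be as small as two thresholds plus an allowlist for files a team commits on purpose. A sketch under those assumptions; `LevyConfig`, its field names, and the default numbers are hypothetical, not ContextLevy's configuration schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class LevyConfig:
    # Hypothetical defaults, not ContextLevy's real ones.
    warn_tokens: int = 20_000
    fail_tokens: int = 100_000
    # Globs for files tracked on purpose (e.g. a committed API client).
    allow: tuple[str, ...] = ()


def severity(tokens_by_file: dict[str, int], cfg: LevyConfig = LevyConfig()) -> str:
    """Map per-file token estimates to "pass", "warn", or "fail"."""
    counted = sum(
        n
        for path, n in tokens_by_file.items()
        if not any(fnmatchcase(path, glob) for glob in cfg.allow)
    )
    if counted >= cfg.fail_tokens:
        return "fail"
    if counted >= cfg.warn_tokens:
        return "warn"
    return "pass"
```

Because the verdict depends only on the inputs and the config, the same PR always gets the same result, which is what makes a check like this tolerable in CI.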


Who this is for

ContextLevy makes sense if:

  • your team uses AI coding agents heavily
  • your repo has lots of generated or build output
  • your PRs often include noisy files
  • you care about keeping AI context clean
  • you want a lightweight CI guardrail

It probably does not make sense if:

  • your repo is tiny
  • you barely use coding agents
  • your team already has strict generated-file policies
  • you do not want another PR check
  • you expect semantic code review from it

That last point matters.

ContextLevy is not a reviewer.

It is a warning light.


Why I think this will matter more

AI coding agents are moving from autocomplete to actual development loops.

People now ask agents to:

  • explain unfamiliar codebases
  • implement cross-file features
  • review pull requests
  • debug CI failures
  • migrate frameworks
  • generate tests
  • refactor architecture

That means repo context is becoming part of the development environment.

We already optimize package size.

We already optimize test speed.

We already optimize CI time.

We already optimize dependency weight.

So why are we pretending AI context is free?


My actual question

I am not claiming ContextLevy is the final answer.

I am trying to figure out if this problem deserves more serious tooling.

Repo:

https://github.com/unloopedmido/contextlevy

I would genuinely like blunt feedback:

  1. Is “context bloat” a real problem you have felt?
  2. Would you install a PR check for this?
  3. Are the default noisy-file categories correct?
  4. What would make this feel like a serious devtool instead of AI-tool noise?
  5. Is the bundle-size analogy clear, or does it feel forced?

Final thought

Bundle bloat became obvious once teams started measuring it.

Context bloat is still mostly invisible.

But as AI agents become normal parts of development, invisible repo noise will matter more.

Maybe the fix is not complicated.

Maybe it starts with a simple PR comment saying:

“This change adds a lot of context weight. Are you sure?”

That is what ContextLevy is trying to do.
