Everyone has a version of this story. Mine: I git add -A'd in a hurry, a
stray build artifact came along for the ride, and a few commits later the repo
had a 200 MB binary baked into its history. Cloning got slow for the whole team.
Fixing it meant git filter-repo, a force-push, and a Slack message that began
with "sorry, everyone needs to re-clone."
The cheaper versions are just as annoying: an accidental node_modules/, a
dist/ folder, a database dump, a .DS_Store. And the genuinely scary one — a
.env or a *.pem — where the moment it's pushed, the secret is burned and
rotating it is the good outcome.
The thing all of these share: they're trivial to prevent and expensive to undo.
Once it's in history, it's a rewrite. The leverage is entirely at the moment
before the commit.
"Just use .gitignore"
Sure — for the files you remembered to list. But .gitignore is a fixed list
you maintain by hand, and it's quietly easy to defeat:
- a git add -f walks straight past it,
- a file that was committed before you added the ignore rule stays tracked,
- a fresh clone of a template repo arrives with half the rules missing.
.gitignore is a default. What I wanted was a guard — something that looks
at what's actually staged, right before it becomes a commit, and stops me.
So I built bloatguard: one CLI, zero dependencies, Node and Python.
What it does
It flags two things in whatever you're about to commit:
- Big files: anything over --max-size (default 5 MB), whatever it is. This is the 200 MB binary catch.
- Junk patterns: a curated set that almost never belongs in git. Think node_modules/, dist/, build/, target/, *.log, *.zip, *.sqlite, .env (but not .env.example), *.pem, *.key, .DS_Store, and friends.
$ git commit -m "wip"
bloatguard 3 item(s) should not be committed (14 staged file(s) scanned)
✗ node_modules/ (1,240 files, 88.4 MB) — dependency directory — reinstall instead of committing
✗ assets/demo.mp4 (84.0 MB) — larger than the 5.0 MB limit
✗ .env (412 B) — .env file — may contain secrets
Fix: add the pattern to .gitignore then git rm --cached <file>, or keep it on purpose with --allow <glob> / --max-size <size>
That whole node_modules/ — 1,240 files — collapses into a single line with a
count and total size, instead of scrolling your terminal off the screen.
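For the curious, here is roughly the shape of that check as a Python sketch. It is an illustration under stated assumptions, not bloatguard's actual source: the names classify and scan_staged, the reason strings, the abbreviated rule table, and the reading of "5 MB" as 5 * 1024 * 1024 bytes are all made up for this example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative subset of the rules named above; not bloatguard's real table.
MAX_SIZE = 5 * 1024 * 1024  # assumption: "5 MB" means 5 * 1024 * 1024 bytes
JUNK_DIRS = {"node_modules", "dist", "build", "target"}
JUNK_GLOBS = ("*.log", "*.zip", "*.sqlite", "*.pem", "*.key", ".DS_Store")


def classify(path, size, max_size=MAX_SIZE):
    """Return (item, reason) for a path that should be flagged, else None."""
    parts = PurePosixPath(path).parts
    # A junk directory anywhere above the file flags the directory itself,
    # so every file underneath it maps to the same item.
    for depth, part in enumerate(parts[:-1], start=1):
        if part in JUNK_DIRS:
            return "/".join(parts[:depth]) + "/", "junk directory"
    name = parts[-1]
    if name == ".env":  # exact match, so .env.example is left alone
        return path, "env file"
    if any(fnmatchcase(name, glob) for glob in JUNK_GLOBS):
        return path, "junk pattern"
    if size > max_size:
        return path, "over size limit"
    return None


def scan_staged(staged, max_size=MAX_SIZE):
    """Collapse (path, size) pairs into one finding per flagged item."""
    findings = {}
    for path, size in staged:
        hit = classify(path, size, max_size)
        if hit is None:
            continue
        item, reason = hit
        entry = findings.setdefault(item, {"reason": reason, "files": 0, "bytes": 0})
        entry["files"] += 1
        entry["bytes"] += size
    return findings
```

Because every file under a junk directory maps to the same item key, a thousand staged paths under node_modules/ end up as one entry carrying a file count and a byte total, which is what makes the one-line summary possible.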
Wire it into the commit
npx bloatguard install # writes .git/hooks/pre-commit
# or: pip install bloatguard && bloatguard install
Now when a commit stages anything flagged, the hook exits non-zero and git
aborts the commit before it happens. Fix it (or --allow it on purpose) and
commit again. The installer refuses to clobber a pre-commit hook you already
have, and bloatguard uninstall removes it cleanly.
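The "refuses to clobber" part is easy to get wrong, so here is a minimal Python sketch of one way to do it. Treat everything in it as an assumption for illustration: the marker comment, the hook body, and the function names are hypothetical, not what the real installer writes. The one thing it leans on that is not an assumption is git's behavior: an executable .git/hooks/pre-commit that exits non-zero aborts the commit.

```python
import stat
from pathlib import Path

# Hypothetical hook body: the marker line and the bare `bloatguard` call are
# assumptions for this sketch, not the script the real installer writes.
MARKER = "# managed-by: bloatguard-sketch"
HOOK_BODY = f"#!/bin/sh\n{MARKER}\nexec bloatguard\n"


def hook_path(repo_root):
    return Path(repo_root) / ".git" / "hooks" / "pre-commit"


def install_hook(repo_root):
    """Create the pre-commit hook; raise FileExistsError if one is present."""
    hook = hook_path(repo_root)
    hook.parent.mkdir(parents=True, exist_ok=True)
    # Mode "x" is exclusive creation: it fails instead of overwriting.
    with open(hook, "x", encoding="utf-8", newline="\n") as handle:
        handle.write(HOOK_BODY)
    # Git only runs hooks that are executable.
    hook.chmod(hook.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook


def uninstall_hook(repo_root):
    """Remove the hook only if this sketch wrote it; report whether it did."""
    hook = hook_path(repo_root)
    if hook.is_file() and MARKER in hook.read_text(encoding="utf-8"):
        hook.unlink()
        return True
    return False
```

Opening with mode "x" means there is no window between checking for an existing hook and writing a new one. The sketch ignores core.hooksPath and linked worktrees, where the hooks directory lives somewhere else.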
You can also just run it ad hoc: bloatguard scans the staged set, bloatguard scan
sweeps the whole working tree (honoring .gitignore), and --json is there for tooling.
A few choices I'd defend
- Read-only, always. bloatguard never stages, modifies, or deletes anything. It reports and sets an exit code; the fix is yours. A tool that auto-git rms your files is a tool you stop trusting.
- No false-positive theater. .env.example / .env.sample are explicitly spared; --allow "assets/*.zip" whitelists the archive you do commit on purpose. A guard that cries wolf gets git commit --no-verify'd into oblivion, and then it's protecting nothing.
- Zero dependencies, two languages. Pure stdlib on both sides, and the Node and Python ports sort their input so they emit byte-identical output: a mixed-language team gets one answer.
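The last two points come down to a filter and a sort. A small Python sketch of the idea, with hypothetical names (apply_allowlist, render) and a made-up tab-separated output format. The sort key is also an assumption: UTF-8 bytes is just one explicit key that both ports could agree on, since Python compares strings by code point while JavaScript's default sort compares UTF-16 code units.

```python
from fnmatch import fnmatchcase


def apply_allowlist(findings, allow_globs=()):
    """Drop findings whose path matches any allow glob.

    Note: fnmatch's "*" also matches "/", so "assets/*.zip" covers nested
    directories too. A real tool may choose stricter glob semantics.
    """
    return [
        (path, reason)
        for path, reason in findings
        if not any(fnmatchcase(path, glob) for glob in allow_globs)
    ]


def render(findings):
    """Sort by UTF-8 bytes so ordering never depends on locale or input order."""
    ordered = sorted(findings, key=lambda item: item[0].encode("utf-8"))
    return "".join(f"{path}\t{reason}\n" for path, reason in ordered)
```

With an explicit key, the same findings produce the same text no matter what order the filesystem or git handed them over in, which is the property that lets two implementations be compared byte for byte.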
MIT, both repos public: bloatguard (Node) · bloatguard-py (Python).
I want to grow the junk list sensibly. What's the worst thing you've watched
go into a repo — and what pattern would've caught it? Terraform state? Jupyter
checkpoints? A 2 GB ML model? Tell me and I'll add a rule.