The use of AI Agents creates a distinctive smell... One can tell the GH Repo owner was high on Claude just by looking at verbose and hard to follo...
The reframe that lands: the skill isn't a "make it smaller" prompt, it's an anti-cheating harness. SLOC works only because it forces real simplification, whereas "improve quality" stays a taste argument. Sharp.
The "code is cheap" take always comes from people who've never inherited the mess. Generating code is cheap. Living with it isn't. The next time you open that codebase, whether it's you or an agent, you're paying for every lazy abstraction, every comment that explains nothing, every subsystem that exists because nobody deleted it. And the worst part is that it never looks obviously bad. Each piece is defensible on its own. The bloat just... accumulates.
Using line count as a target is probably dumb in theory, yet it apparently works in practice, because "make it smaller while keeping tests green" forces actual simplification in a way that "please improve code quality" never does. Also, the point about Opus constantly checking in, and that saving the run rather than ruining it, is such a good one. Full autonomy would have just confidently gamed the metric the whole time.
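To make the "smaller while keeping tests green" rule concrete, here is a minimal sketch of an acceptance gate. It is not the post author's harness: the `Snapshot` type, the `accept` function, and the extra rule that the number of tests must not shrink are all assumptions, added to illustrate one way a run could be stopped from gaming the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical summary of a repo state (all fields are assumptions)."""
    sloc: int          # source lines of code, excluding tests
    tests_total: int   # number of tests collected
    tests_passed: int  # number of tests that passed


def accept(before: Snapshot, after: Snapshot) -> bool:
    """Accept a candidate change only if it is a real reduction."""
    all_green = after.tests_passed == after.tests_total
    # Assumed anti-cheating rule: deleting tests must not count as progress.
    tests_kept = after.tests_total >= before.tests_total
    smaller = after.sloc < before.sloc
    return all_green and tests_kept and smaller


if __name__ == "__main__":
    baseline = Snapshot(sloc=1000, tests_total=50, tests_passed=50)
    print(accept(baseline, Snapshot(sloc=900, tests_total=50, tests_passed=50)))
    print(accept(baseline, Snapshot(sloc=900, tests_total=40, tests_passed=40)))
```

The second candidate is smaller and green, but it got there by dropping tests, so the gate rejects it.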
Bloat is the visible symptom. The deeper problem is that AI-generated code lacks the constraint signals human-written code accumulates from architectural reviews. Without pre-flight invariants on what shape new code is allowed to take, every generation adds a little more surface area than it should.
Really enjoyed this. The 31.7% SLOC reduction with tests still green is a great reminder that AI code quality problems are usually accumulation problems, not one big failure. I especially liked your point that deep boundaries and simpler interfaces matter more than just chasing a line-count metric — that matches what I keep seeing when shipping agent-assisted products.
nice one 👍️
This is one of the clearest writeups I’ve seen on the difference between working software and owned software.
I ran into a similar pattern while building a real front end/back end system with AI assistance. The obvious failures were not the hardest part. The harder part was that the repo could keep functioning while quietly accumulating things that no longer had a rightful place: old assumptions, duplicated state, partial fixes, placeholder systems that stayed wired in, and code that looked intentional only because it had been explained into looking intentional.
That is actually the path that led me to build Scarab as a diagnostic system. Not as another agent, and not as a “fixer,” but as a way to inspect whether the codebase still matches its own declared structure and ownership.
The SLOC reduction here is interesting, but the deeper signal is what the reduction exposed: bloat is often not just size. It is residue from decisions the system never properly retired.
This mirrors what we see across our NestJS microservices. Each AI-generated PR looks reasonable in isolation, but nobody catches the slow accumulation of redundant state layers or placeholder abstractions until cognitive load becomes unbearable. Your anti-cheating framing is the key insight — without it, the agent optimises the scoreboard, not the system. We've started treating AI like an army of capable juniors: great at local execution, but you still need seniors who can step back and say "delete this entire subsystem, it's ceremonial." The SLOC-as-forcing-function idea is sharp. Going to try this on one of our repos.
First of all, there's little point in comparing different models: they are constantly evolving, and no two are exactly alike. I agree that a model needs to receive clear instructions (guidelines) from the programmer.
AI-grown codebases often accumulate “plausible structure” faster than real design. The cleanup step needs ownership: delete unused abstractions, collapse duplicate patterns, and make the tests describe intended behavior before asking the agent for more code.
"The codebase had that familiar AI smell: a lot of local competence, a lot of plausible safety" — this is the single best description of AI-generated code I've read.
I've noticed the same pattern in my own projects. The code looks right at every individual call site, but the system-level cost (coupling, dead paths, abstraction layers for problems that don't exist) accumulates invisibly. Human-written code has the opposite problem: individual functions are often messier, but the system-level architecture tends to be cleaner because the human was holding the full picture.
Interesting question: if we know AI code tends to bloat, should we be running SLOC budgets as a CI check? Enforce a "you can add code if you remove code" ratio?
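A rough sketch of what such a check might look like, assuming the CI job feeds it the output of `git diff --numstat` (tab-separated added, deleted, path; binary files show `-`). The function names and the `max_ratio` parameter are hypothetical, and numstat counts raw changed lines rather than true SLOC, so treat this as an approximation.

```python
from typing import Tuple


def parse_numstat(numstat: str) -> Tuple[int, int]:
    """Sum added and removed line counts from `git diff --numstat` output."""
    added = removed = 0
    for line in numstat.splitlines():
        parts = line.split("\t", 2)
        # Skip malformed lines and binary files (reported as "-\t-\tpath").
        if len(parts) != 3 or parts[0] == "-" or parts[1] == "-":
            continue
        added += int(parts[0])
        removed += int(parts[1])
    return added, removed


def within_budget(added: int, removed: int, max_ratio: float = 1.0) -> bool:
    """Hypothetical rule: added lines may not exceed max_ratio * removed lines."""
    return added <= max_ratio * removed


if __name__ == "__main__":
    sample = "10\t4\tsrc/a.py\n-\t-\tdocs/logo.png\n0\t12\tsrc/b.py\n"
    added, removed = parse_numstat(sample)
    print(added, removed, within_budget(added, removed))
```

A strict 1:1 ratio would block any purely additive PR, so in practice the ratio, or an exemption label for new features, would probably need tuning per repo.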
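Thanks for sharing, this perspective is really useful. The "weight" of AI-written code really isn't something token counts can measure; it's more a question of whether you can still understand it when you come back three months later.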