Solid article — the GitClear and METR data really drive the point home. The perception gap (feeling 20% faster while being 19% slower) is probably the most dangerous finding here, because it means teams won't self-correct without measurement.
I agree with most of this, with one caveat: I think the framing slightly over-indexes on AI as the cause. The refactoring decline and duplication trends were already underway — AI just poured fuel on existing habits. Teams that were already rigorous about architecture tend to stay rigorous with AI tools. The ones that weren't are now generating debt faster.
The real challenge, in my experience, is that the mitigations (careful prompting, pushing back on AI's proposed approach, thorough code reviews, periodic refactoring) all require discipline that scales poorly with team size. On a small team where everyone knows the codebase, you can catch the subtle architectural drift. On a larger team? No-one wants to spend their entire day reviewing AI-generated PRs, and code reviews themselves become performative when you're staring at 100 changed files.
One practical tip I'd add to your detection patterns: aim for smaller, more frequent PRs. This is probably the single highest-leverage workflow change for teams using AI tools heavily. A 15-file PR gets a genuine review. A 100-file PR gets a rubber stamp. If AI is helping you write code faster, use that speed to ship smaller increments — not bigger ones. It makes every other mitigation (review quality, refactoring ratio, clone detection) actually feasible.
The "AI generates, humans consolidate" framing is exactly right. The problem is when teams treat the generation step as the finish line.
The perception gap is exactly what makes this insidious — teams optimizing for velocity metrics look great on paper while the codebase quietly degrades. Measurement has to include churn rate and code half-life, not just output speed.
The perception gap is what makes this systemic — teams genuinely believe they are shipping faster while the codebase degrades underneath. Without explicit measurement (churn rate, refactoring ratio, duplication index), there is no feedback loop to self-correct. Curious what caveat you had in mind — always interested in where the nuance lies.
You nailed it — AI amplified existing habits, not created new ones. And the smaller PRs tip is the most actionable takeaway I wish I'd emphasized more. A 15-file PR gets genuine review; a 100-file PR gets a rubber stamp. That alone changes everything downstream.
You're absolutely right that the refactoring decline predates AI — it accelerated trends that were already there. Your point about smaller, more frequent PRs is the practical lever I should have emphasized more. A 15-file PR gets genuine scrutiny; a 100-file one gets a rubber stamp regardless of whether AI wrote it. That's probably the single highest-leverage workflow change teams can make right now. The perception gap compounds exactly because the feedback loop is broken — teams feel faster, so they never measure, and the drift stays invisible until it's structural. Appreciate you adding that nuance.
For further actions, you may consider blocking this person and/or reporting abuse
We're a place where coders share, stay up-to-date and grow their careers.
Solid article — the GitClear and METR data really drive the point home. The perception gap (feeling 20% faster while being 19% slower) is probably the most dangerous finding here, because it means teams won't self-correct without measurement.
I agree with most of this, with one caveat: I think the framing slightly over-indexes on AI as the cause. The refactoring decline and duplication trends were already underway — AI just poured fuel on existing habits. Teams that were already rigorous about architecture tend to stay rigorous with AI tools. The ones that weren't are now generating debt faster.
The real challenge, in my experience, is that the mitigations (careful prompting, pushing back on AI's proposed approach, thorough code reviews, periodic refactoring) all require discipline that scales poorly with team size. On a small team where everyone knows the codebase, you can catch the subtle architectural drift. On a larger team? No-one wants to spend their entire day reviewing AI-generated PRs, and code reviews themselves become performative when you're staring at 100 changed files.
One practical tip I'd add to your detection patterns: aim for smaller, more frequent PRs. This is probably the single highest-leverage workflow change for teams using AI tools heavily. A 15-file PR gets a genuine review. A 100-file PR gets a rubber stamp. If AI is helping you write code faster, use that speed to ship smaller increments — not bigger ones. It makes every other mitigation (review quality, refactoring ratio, clone detection) actually feasible.
The "AI generates, humans consolidate" framing is exactly right. The problem is when teams treat the generation step as the finish line.
The perception gap is exactly what makes this insidious — teams optimizing for velocity metrics look great on paper while the codebase quietly degrades. Measurement has to include churn rate and code half-life, not just output speed.
The perception gap is what makes this systemic — teams genuinely believe they are shipping faster while the codebase degrades underneath. Without explicit measurement (churn rate, refactoring ratio, duplication index), there is no feedback loop to self-correct. Curious what caveat you had in mind — always interested in where the nuance lies.
You nailed it — AI amplified existing habits, not created new ones. And the smaller PRs tip is the most actionable takeaway I wish I'd emphasized more. A 15-file PR gets genuine review; a 100-file PR gets a rubber stamp. That alone changes everything downstream.
You're absolutely right that the refactoring decline predates AI — it accelerated trends that were already there. Your point about smaller, more frequent PRs is the practical lever I should have emphasized more. A 15-file PR gets genuine scrutiny; a 100-file one gets a rubber stamp regardless of whether AI wrote it. That's probably the single highest-leverage workflow change teams can make right now. The perception gap compounds exactly because the feedback loop is broken — teams feel faster, so they never measure, and the drift stays invisible until it's structural. Appreciate you adding that nuance.