According to a CodeRabbit analysis of 1,000+ repositories, AI co-authored code introduces 1.7x more major issues than human-written code. The vulnerability rate is 2.74x higher. GitHub's 2025 Octoverse data shows Copilot now generates 46% of code in files where it's enabled. And a METR study found that experienced developers using AI assistants were actually 19% slower on real tasks — despite believing they were 24% faster.
The productivity feels real. The debt is real too. We're starting to see the bill.
## The three-month cliff
Every team I've talked to that adopted AI coding tools heavily describes the same pattern: massive output gains in months one through three, followed by an escalating maintenance burden that erases those gains by month six.
The pattern has a name now. Developers are calling it the "Spaghetti Point" — the moment when the codebase generated by AI assistants becomes harder to modify than code written from scratch would have been.
According to GitClear's 2025 developer productivity report, code churn (lines modified or deleted within 14 days of being written) increased 39% in repositories with heavy AI assistance. That's not refactoring — that's rework. Code written fast, reviewed inadequately, and fixed repeatedly.
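GitClear's churn metric (lines modified or deleted within 14 days of being written) can be approximated from your own history. The sketch below is a rough file-level proxy, not GitClear's exact line-level methodology: it takes per-commit numstat records (the kind `git log --numstat` produces) and attributes deletions to the most recent additions in the same file, oldest first.

```python
from collections import defaultdict
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(days=14)

def churn_rate(commits):
    """Estimate churn: the share of added lines that were deleted
    in the same file within 14 days of being added.

    `commits` is a list of (timestamp, path, lines_added,
    lines_deleted) tuples in oldest-first order -- a simplified
    stand-in for parsed `git log --numstat` output. This is a
    file-level proxy; real churn tools track individual lines.
    """
    open_adds = defaultdict(list)  # path -> [(added_at, lines_left)]
    total_added = 0
    churned = 0
    for ts, path, added, deleted in commits:
        # Attribute deletions to recent additions in FIFO order.
        remaining = deleted
        surviving = []
        for added_at, lines_left in open_adds[path]:
            if remaining > 0 and ts - added_at <= CHURN_WINDOW:
                taken = min(lines_left, remaining)
                churned += taken
                remaining -= taken
                lines_left -= taken
            if lines_left > 0:
                surviving.append((added_at, lines_left))
        open_adds[path] = surviving
        if added > 0:
            open_adds[path].append((ts, added))
            total_added += added
    return churned / total_added if total_added else 0.0
```

Run it over a quarter of history in an AI-heavy repo and a baseline repo; a materially higher ratio in the AI-heavy one is the rework signal GitClear is describing.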
The economics are brutal. A 2025 analysis by Uplevel estimated that AI-generated code carries maintenance costs 4x higher than human-written code by year two. The initial velocity gain — real, measurable, impressive — gets consumed by debugging sessions where no one can explain why the code works the way it does, because the "why" never existed. This is the same epistemological problem that's eroding trust in open source: AI-generated code has no intent. You can't reconstruct reasoning that never happened.
## Why the bugs are different
AI-generated bugs are structurally different from human bugs, and that difference makes them more expensive to find and fix.
Human bugs have intent trails. A developer who writes a race condition usually has a mental model that's almost right — they thought about concurrency but missed one case. You can read the code, reconstruct the thinking, find the gap. The fix follows from understanding the original intent.
AI bugs have no intent. The code was generated from a probability distribution, not a mental model. When a Copilot-generated function has a subtle type coercion error, there's no reasoning to reconstruct. You can't ask "what were they thinking?" because nothing was thinking. You have to understand the code from scratch, as if reading a stranger's work with no comments and no commit history that explains decisions.
According to Snyk's 2025 AI security report, 35 new CVEs were attributed to AI-generated code in March 2026 alone. Repositories using Copilot leak 40% more secrets (API keys, credentials, tokens) than non-Copilot repositories. The AI doesn't understand what's secret — it pattern-matches from training data that included leaked credentials.
| Bug type | Human-written code | AI-written code |
|---|---|---|
| Root cause analysis | Follow the intent trail | Start from zero — no intent exists |
| Time to diagnose | 1-2 hours typical | 3-5 hours (no reasoning to reconstruct) |
| Recurrence after fix | Low (developer updates mental model) | High (same prompt generates same pattern) |
| Security issues per KLOC | Baseline | 2.74x higher (CodeRabbit data) |
| Code churn within 14 days | Baseline | +39% (GitClear data) |
## The organizational blind spot
The real damage isn't technical — it's organizational. Teams measuring developer productivity by lines of code or PRs merged are seeing their best numbers ever. The dashboards look great. Velocity is up. Sprint commitments are being met.
What the dashboards don't show: time spent in code review has increased 45% (because reviewers now treat every PR as potentially AI-generated and requiring deeper verification). Bug reports from production are up 30% despite passing all automated tests. And senior engineers are spending more time reading and understanding code than writing it — the exact inverse of what AI tools were supposed to enable.
This is the same productivity illusion we measured in team velocity: AI makes individual tasks faster while making the overall system slower. The local optimization creates a global pessimization.
## What I got wrong
I initially thought the problem was adoption immaturity — that teams would learn to use AI tools effectively and the quality issues would resolve. After watching a dozen teams go through the cycle over the past year, I think the problem is structural.
AI code generation optimizes for plausibility, not correctness. The output looks right, passes superficial review, and often works for the happy path. The failures are in edge cases, error handling, security boundaries, and long-term maintainability — exactly the things that junior developers also get wrong, because those are the things that require understanding, not pattern matching.
The teams that are succeeding with AI code generation share three practices:
1. AI writes, humans architect. The AI generates implementation within a structure that a human designed. The human defines the interfaces, the error handling strategy, the security boundaries. The AI fills in the bodies. This preserves intent at the architectural level while leveraging AI speed at the implementation level.
2. Review budgets increased, not decreased. Teams that cut code review time because "the AI wrote it" are the ones hitting the Spaghetti Point fastest. The teams that survive allocate more review time — not less — because the verification burden is higher for machine-generated code.
3. Aggressive deletion of AI-generated code that can't be explained. If a developer can't explain why a function works the way it does — regardless of whether it passes tests — it gets rewritten by hand. This is expensive in the short term and cheap in the long term.
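The first practice — humans own the contract, AI fills the bodies — looks roughly like this in code. A minimal sketch with hypothetical names (`PaymentGateway`, `PaymentError` are illustrations, not a real library): the human-authored abstract class pins down the interface, the error-handling strategy, and the boundary; the generated subclass is free to vary only inside that contract.

```python
from abc import ABC, abstractmethod

class PaymentError(Exception):
    """Human-defined error taxonomy: callers handle exactly this."""

class PaymentGateway(ABC):
    """Human-authored contract. The interface, error strategy, and
    security boundary are decided here, with intent on record."""

    @abstractmethod
    def charge(self, amount_cents: int, token: str) -> str:
        """Return a transaction id. Must raise PaymentError on any
        failure -- never return None, never leak provider errors."""

class GeneratedGateway(PaymentGateway):
    # AI-generated body: reviewable against the contract above,
    # replaceable without touching any caller.
    def charge(self, amount_cents: int, token: str) -> str:
        if amount_cents <= 0:
            raise PaymentError("amount must be positive")
        if not token:
            raise PaymentError("missing payment token")
        # Stand-in for a real provider call.
        return f"txn_{hash((amount_cents, token)) & 0xFFFFFF:06x}"
```

The point of the abstract base is that the "why" survives even when the implementation is regenerated: intent lives in the contract, not in the generated body.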
## The historical pattern
This cycle is familiar. Every productivity tool that dramatically increases output velocity eventually forces a reckoning with quality.
3D printing was going to democratize manufacturing. It did — and it also created a mountain of low-quality plastic objects that nobody needed. The lasting value came from professionals using 3D printing within disciplined design processes, not from everyone printing everything.
No-code tools were going to replace developers. They did increase output — and they also created a generation of applications that couldn't scale, couldn't be debugged, and couldn't be maintained when the original builder left. The lasting value came from no-code as a prototyping tool, not a production platform.
Vibe coding is following the same arc. The output explosion is real. The quality reckoning is coming. The lasting value will come from AI as an implementation accelerator within disciplined engineering practices — not from AI as a replacement for engineering judgment.
## The question worth asking
If your team adopted AI coding tools in the last twelve months, run this check: compare the bug rate and code churn rate in your most AI-assisted repositories against your least AI-assisted ones. Normalize for team size and feature complexity.
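The comparison itself is simple once you have the inputs. A minimal sketch, assuming you can pull per-repo bug counts from your tracker and churned-line counts from a churn tool (the dict keys here are hypothetical, not a standard schema); normalizing by KLOC is a crude stand-in for team size and complexity:

```python
def debt_signal(ai_repo: dict, baseline_repo: dict) -> tuple[float, float]:
    """Compare an AI-heavy repo against a baseline repo.

    Each argument is a dict with 'bugs' (production bug count),
    'churned_lines' (lines reworked within 14 days), and 'kloc'
    (thousands of lines, as a size normalizer). Returns
    (bug_ratio, churn_ratio): values above 1.0 mean the AI-heavy
    repo is worse on that axis after normalization.
    """
    def per_kloc(repo: dict, key: str) -> float:
        return repo[key] / repo["kloc"]

    return (
        per_kloc(ai_repo, "bugs") / per_kloc(baseline_repo, "bugs"),
        per_kloc(ai_repo, "churned_lines")
        / per_kloc(baseline_repo, "churned_lines"),
    )
```

If both ratios come back well above 1.0 while your velocity dashboards show gains, that gap is the debt accruing.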
If the AI-heavy repos show higher churn and more production bugs — even if they also show higher velocity — you're accumulating the debt. The hangover is coming. The question is whether you pay it down deliberately (with review discipline, architectural boundaries, and aggressive deletion) or discover it when the codebase becomes unmaintainable.
The trust tax isn't just an open-source problem. It's inside your organization too.
::: {.schema-faq style="display:none;"}
[{"q":"Does AI-generated code have more bugs than human code?","a":"Yes. According to CodeRabbit's analysis of 1,000+ repositories, AI co-authored code has 1.7x more major issues and a 2.74x higher vulnerability rate. GitClear found code churn (rework within 14 days) increased 39% in repositories with heavy AI assistance."},{"q":"What is vibe coding and what are the risks?","a":"Vibe coding is using AI tools like Copilot or ChatGPT to generate code by describing what you want in natural language. The risk is maintenance debt: code generated without human intent is harder to debug, carries 2.74x more vulnerabilities, and costs an estimated 4x more to maintain by year two."},{"q":"Are developers actually faster with AI coding tools?","a":"Not necessarily. A METR study found experienced developers were 19% slower on real tasks with AI assistants, despite believing they were 24% faster. Local task speed increases, but time spent in review, debugging, and understanding AI-generated code offsets the gains at the team level."}]
:::
Originally published at talvinder.com.