A developer recently ran a permutation test on 29 years of rsync releases. The question was simple: did bug density increase after Claude-assisted development started?
The answer is yes. And it's statistically significant.
Here's what happened, what the data actually shows, and why the real lesson isn't "AI bad" — it's that our workflows haven't caught up.
The Setup
rsync is a 29-year-old C codebase. Battle-tested. Used everywhere. Maintained by a small team, sometimes one person at a time.
When Claude was introduced as a coding assistant, the team got a productivity boost. More changes landed faster. But the bug profile changed too.
The analysis used a severity-weighted bugs-per-10-commits metric. Every release was plotted. Then the post-Claude releases were checked against the historical distribution.
Where they landed matters.
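To make the metric concrete, here is a minimal sketch of a severity-weighted bugs-per-10-commits calculation. The weights, the `Release` record, and the function name are my own assumptions for illustration. The original pipeline may score severity differently.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical severity weights; the real analysis may use a different scale.
SEVERITY_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 4.0, "critical": 8.0}


@dataclass
class Release:
    tag: str                        # release label (illustrative)
    commits: int                    # commits that went into the release
    bug_severities: Sequence[str]   # one severity label per bug attributed to it


def weighted_bugs_per_10_commits(release: Release) -> float:
    """Sum the severity weights of a release's bugs, normalised per 10 commits."""
    if release.commits <= 0:
        raise ValueError("a release needs at least one commit")
    weighted = sum(SEVERITY_WEIGHTS[s] for s in release.bug_severities)
    return 10.0 * weighted / release.commits


# Made-up example: 40 commits, three bugs with weights 1 + 4 + 2 = 7.
example = Release(tag="example", commits=40, bug_severities=["low", "high", "medium"])
density = weighted_bugs_per_10_commits(example)  # 10 * 7 / 40 = 1.75
```

Normalising by commits is what keeps a busy release from looking worse just because it had more changes in it.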
The Methodology
The author spent days building this. Not a quick "I asked ChatGPT" take. A proper statistical pipeline:
- DuckDB for collating data from every release
- Exact permutation test (not a parametric approximation)
- Severity-weighted bug scoring
- Reproducible end to end — the full pipeline is on GitHub
The methodology was reviewed by a statistician before any code was written. Every number in the final report is auto-templated from the Python analysis script, so the figures in the prose cannot drift from what the script actually computed. That rules out hallucinated or mistyped numbers in the write-up itself.
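The auto-templating idea is simple enough to sketch. This is not the author's code. The template text, the field names, and the numbers in the example are all placeholders I made up to show the pattern: the report prose only ever receives values from the results dictionary.

```python
from string import Template

# Hypothetical report template; the field names are illustrative only.
REPORT_TEMPLATE = Template(
    "Observed difference in weighted bug density: $diff\n"
    "Exact permutation p-value: $p_value\n"
)


def render_report(results: dict) -> str:
    """Fill the report from computed results; no number is typed by hand."""
    formatted = {
        "diff": f"{results['diff']:.2f}",
        "p_value": f"{results['p_value']:.4f}",
    }
    # substitute() raises KeyError if the template expects a field we did not supply.
    return REPORT_TEMPLATE.substitute(formatted)


# Placeholder values, not results from the rsync analysis.
report = render_report({"diff": 1.5, "p_value": 0.0123})
```

Because `substitute()` fails loudly on a missing field, a report cannot ship with a number the script never produced.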
What the Distribution Shows
The post-Claude releases fall outside the historical distribution. Not by a small margin. The permutation test flags them clearly.
This doesn't mean Claude caused bugs in the "Claude wrote bad code" sense. The data can't tell us why. What it shows is a shift in the release profile.
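If you haven't used one, an exact permutation test is easy to sketch. The version below is a generic one-sided test on the difference in means. It assumes that statistic, and it is not taken from the author's pipeline. It enumerates every way of choosing which releases get the "after" label and counts how often the relabelled difference is at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(before, after):
    """One-sided exact permutation test on mean(after) - mean(before).

    Returns (observed_difference, p_value). Every possible relabelling is
    enumerated, so this is only practical when the number of combinations
    is small enough to loop over.
    """
    pooled = list(before) + list(after)
    n, k = len(pooled), len(after)
    if k == 0 or k == n:
        raise ValueError("both groups need at least one value")
    total = sum(pooled)
    observed = mean(after) - mean(before)

    extreme = 0
    count = 0
    for idx in combinations(range(n), k):
        after_sum = sum(pooled[i] for i in idx)
        diff = after_sum / k - (total - after_sum) / (n - k)
        # Small tolerance so floating-point noise does not drop exact ties.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return observed, extreme / count


# Made-up densities: four "before" releases and two "after" releases.
obs, p = exact_permutation_test([1.0, 2.0, 3.0, 4.0], [9.0, 10.0])
```

With a small "after" group the p-value has a hard floor: in this toy example there are only 15 possible labellings, so the smallest achievable p-value is 1/15. That is worth keeping in mind whenever a handful of recent releases is compared against a long history.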
My take: when you accelerate development, you change the review dynamics. The maintainer reviews differently when they know AI wrote some of the code. They trust the AI's output differently. The cadence changes. And when the cadence changes, the defect profile changes too.
What This Actually Means
Three takeaways for engineering teams using AI coding assistants:
- Review practices must adapt: Your existing code review process was designed for human-written code. It assumes certain patterns of mistakes. AI makes different mistakes. Your review checklist needs updating.
- Metrics matter: This analysis is possible because rsync has 29 years of release data. Most teams don't track bugs per release rigorously enough to detect shifts like this. If you're introducing AI tools, instrument your process first.
- The frame is wrong: The conversation around AI code quality is stuck on "does AI write good code." That's the wrong question. The right question is: "does our process handle AI-assisted code correctly?" The code itself might be fine. The process around it might not be.
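On the "instrument your process first" point, even a crude baseline check beats nothing. The sketch below is a simple monitoring heuristic of my own, not the permutation test from the analysis. It asks where a new release's bug density falls relative to your own history. The function names and the 0.95 threshold are assumptions.

```python
from bisect import bisect_left


def historical_percentile(history, new_value):
    """Fraction of historical densities strictly below new_value."""
    ordered = sorted(history)
    if not ordered:
        raise ValueError("need at least one historical release")
    return bisect_left(ordered, new_value) / len(ordered)


def flag_release(history, new_value, threshold=0.95):
    """True when the new release sits at or above the chosen percentile."""
    return historical_percentile(history, new_value) >= threshold


# Made-up history of per-release bug densities.
history = [0.5, 1.0, 1.5, 2.0]
rank = historical_percentile(history, 1.75)   # 3 of 4 values are lower
alarm = flag_release(history, 3.0)            # above everything seen so far
```

This only works if the history exists, which is the whole argument for collecting it before you change the tooling.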
The Hard Part
The hardest thing to dismiss about this analysis is that it's reproducible. Anyone can rerun the pipeline and check the numbers. And so far, no one has poked holes in the methodology.
That's uncomfortable if you've been telling yourself that AI coding assistants are a pure productivity win. They are a productivity win. But they also change the risk profile.
The teams that will succeed with AI coding tools aren't the ones that adopt them fastest. They're the ones that adapt their processes to match the new reality.
What changes has your team made to code review since adopting AI assistants? I'm genuinely curious what's working — and what isn't.
