Every major tech company is pushing AI-generated code. Microsoft says 30% (targeting 80%). Google says 30%. Uber reports 65-72%. Amazon mandated 80% AI tool usage. Shopify made AI "mandatory."
But Google's own DORA research shows a paradox: AI increases throughput while decreasing delivery stability. Teams ship faster, but the code breaks more often.
I wanted to understand why. So I built Evolution Engine, an open source CLI that detects development process drift — when patterns in commit history, CI builds, deployments, and dependency signals shift in ways that often precede production issues. Then I ran it on 10 major open source repos across cloud infrastructure, frontend frameworks, AI tooling, and developer platforms.
No AI APIs are called during analysis. All pattern detection is deterministic and statistical. The tool runs entirely locally — your code never leaves your machine.
Here's what I found.
## The scale
Across 10 repos, the tool analyzed over 130,000 commits, generating 250,000+ events across git, CI, deployment, and dependency signal families. It matched patterns from a knowledge base calibrated across 200+ open source repositories.
| Metric | Result |
|---|---|
| Repos analyzed | 10 |
| Total commits | 130,000+ |
| Total events | 250,000+ |
| Signal families | 4 (git, CI, deployment, dependency) |
| Repos with significant drift | 10 out of 10 |
| Average drift signals per repo | 6.6 |
| Average correlation patterns per repo | 24.3 |
Every single repo had significant drift signals. Every one.
## Finding 1: CI build times spike dramatically — and nobody notices
The most consistent pattern across all 10 repos: CI build duration spikes that dwarf historical baselines.
| Repo type | Normal CI time | Spike | Deviation |
|---|---|---|---|
| Cloud SDK (monorepo) | ~45 seconds | 6+ hours | 1,552x |
| AI framework | ~95 seconds | 55 minutes | 889x |
| Cloud infrastructure toolkit | ~26 seconds | 70+ minutes | 111x |
| Edge platform SDK | ~33 seconds | 60+ minutes | 74x |
| Commerce framework | ~64 seconds | 8+ minutes | 43x |
| Code editor | ~41 seconds | 23+ minutes | 34x |
| Frontend framework | ~45 seconds | 6 minutes | 13x |
| Fullstack framework | ~6 minutes | 32 minutes | 5x |
These aren't gradual slowdowns — they're sudden spikes, often tied to a single commit or dependency change. The problem? Most teams don't track CI duration as a process signal. They notice when builds fail, but a 34x slowdown that still passes? That drifts silently.
8 out of 10 repos had CI spikes exceeding 10x their baseline. The median spike was 53x.
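As a rough illustration of how this kind of spike can be caught (a minimal sketch, not Evolution Engine's actual detector — the function name and thresholds here are hypothetical), compare each build's duration against a rolling median of recent builds:

```python
from statistics import median

def detect_ci_spikes(durations, window=20, threshold=10.0):
    """Flag builds whose duration exceeds `threshold` times the median
    of the preceding `window` builds.

    durations: build times in seconds, oldest first.
    Returns (index, ratio) tuples for flagged builds.
    """
    spikes = []
    for i in range(window, len(durations)):
        baseline = median(durations[i - window:i])
        if baseline > 0:
            ratio = durations[i] / baseline
            if ratio >= threshold:
                spikes.append((i, round(ratio, 1)))
    return spikes

# A 34x spike against a ~45-second baseline gets flagged;
# steady builds produce no signal.
print(detect_ci_spikes([45.0] * 20 + [1530.0]))
```

The key design choice is using the median, not the mean, as the baseline: a single earlier outlier would otherwise inflate the baseline and hide later spikes.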
## Finding 2: Release cadence gaps correlate with code spread
When a repo's release cadence suddenly lengthens, it's almost always accompanied by increased code dispersion — changes spread across unrelated parts of the codebase.
| Repo type | Normal cadence | Gap | Slowdown |
|---|---|---|---|
| Cloud SDK (monorepo) | ~2.9 hours | 22 days | 182x |
| Commerce framework | ~1.5 days | 37 days | 24x |
| Cloud infrastructure toolkit | ~21 hours | 16.5 days | 18x |
| Fullstack framework | ~6 days | 96 days | 16x |
| Logging library | ~13 days | 200 days | 15x |
| Frontend framework | ~28 days | 113 days | 4x |
This correlation showed up as a known pattern in 8 out of 10 repos. When engineers touch more unrelated files per commit and releases slow down, something structural has shifted — often a large refactoring, a dependency migration, or (increasingly) an AI-assisted batch change that touches more files than a human would.
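Detecting the cadence half of this pattern can be sketched the same way: measure each inter-release gap against the repo's median gap. This is an illustrative sketch under my own assumptions (function name and the 4x factor are hypothetical), not the tool's implementation:

```python
from statistics import median

def cadence_slowdowns(timestamps, factor=4.0):
    """Given release timestamps (ascending, any consistent unit),
    return (gap_index, slowdown_ratio) for gaps that exceed
    `factor` times the median inter-release gap.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        return []
    base = median(gaps)
    return [(i, g / base) for i, g in enumerate(gaps)
            if base > 0 and g >= factor * base]

# Five releases at a steady cadence, then one 16x gap.
print(cadence_slowdowns([0, 10, 20, 30, 40, 200]))
```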
## Finding 3: Co-change novelty drops to zero
"Co-change novelty" measures how often files that change together in a commit have changed together before. A score of 1.0 means entirely novel pairings. A score of 0.0 means the exact same files are changing together repeatedly.
In 9 out of 10 repos, I found commits where co-change novelty dropped to zero — indicating repetitive, pattern-locked changes rather than organic development. This is a hallmark of:
- Automated dependency bumps (bots touching the same lockfiles repeatedly)
- Code generation tools producing similar diffs
- AI-assisted changes that follow templates rather than addressing unique problems
The interesting question: is this a problem? Sometimes repetitive changes are exactly right (automated security patches). But when novelty drops to zero and CI times spike and release cadence gaps appear, the correlation suggests something has gone wrong.
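The metric itself is simple to compute from history. Here is a sketch based on the definition above (not Evolution Engine's exact formula — in particular, treating single-file commits as fully novel is my own convention):

```python
from itertools import combinations

def co_change_novelty(commits):
    """For each commit (a set of changed file paths), return the fraction
    of file pairs never seen together in any earlier commit:
    1.0 = entirely novel pairings, 0.0 = every pair has co-changed before.
    """
    seen = set()
    scores = []
    for files in commits:
        pairs = {p for p in combinations(sorted(files), 2)}
        if pairs:
            scores.append(len(pairs - seen) / len(pairs))
        else:
            scores.append(1.0)  # single-file commit: no pairs to judge
        seen |= pairs
    return scores

# A bot bumping the same manifest + lockfile scores 0.0 on repeat.
print(co_change_novelty([{"a", "b"}, {"a", "b"}, {"a", "c"}]))
```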
## Finding 4: Merge-back commits create statistical blind spots
Three repos had single commits touching 10,000-21,000+ files. These are merge-back commits in monorepos — technically expected, but they create extreme statistical outliers that mask real drift signals underneath.
If your drift detection (or any metrics tool) doesn't account for these outliers, the signal-to-noise ratio collapses. A legitimate 34x CI spike looks insignificant next to a 14,000x `files_touched` outlier.
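One standard defense, shown here as a sketch rather than a claim about how Evolution Engine handles it, is scoring deviations with the median and MAD instead of the mean and standard deviation, so one merge-back commit can't stretch the scale for everything else:

```python
from statistics import median

def robust_z(values):
    """Modified z-scores using median and median absolute deviation (MAD).
    0.6745 rescales MAD to be comparable to a standard deviation
    under normality. Unlike mean/stddev scores, a single extreme
    outlier barely shifts the baseline for the other points.
    """
    m = median(values)
    mad = median(abs(v - m) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - m) / mad for v in values]

# Typical commits stay near 0; the 100-file outlier scores far out.
print(robust_z([1, 2, 3, 4, 100]))
```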
## Finding 5: Cross-family correlations reveal systemic patterns
The most interesting findings weren't individual metrics — they were correlations between signal families:
- CI duration <-> files touched: When commit size increases, build times increase non-linearly. This correlation appeared in all 10 repos.
- Deployment cadence <-> code dispersion: When releases slow down, changes spread wider. Found in 8/10 repos.
- Dependency changes <-> change locality: When dependencies change, subsequent code changes tend to be less focused. Found in 7/10 repos.
These cross-family patterns are invisible if you only monitor one signal family (just CI, or just git). You need the full picture.
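Because relationships like "CI duration vs. files touched" are non-linear, a rank correlation is a natural way to test them. A self-contained Spearman implementation (my sketch of the general technique, not the tool's method) looks like this:

```python
def _ranks(vals):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    ranks = [0.0] * len(vals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Captures any monotone relationship, linear or not.
    """
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den if den else 0.0

# files_touched vs CI seconds: non-linear but monotone -> rho = 1.0
print(spearman([1, 2, 3, 4], [30, 45, 120, 900]))
```

In practice you would run this per repo across paired event series from two signal families and flag correlations above a calibrated threshold.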
## What this means for AI-assisted development
Google's DORA research found that AI increases throughput but decreases stability. My findings suggest why:
- AI generates larger commits — more files touched per change, increasing CI load
- AI follows templates — co-change novelty drops, creating repetitive patterns
- AI doesn't respect cadence — large batch changes break release rhythm
- The drift is gradual — no single commit looks wrong, but the aggregate pattern shifts
The fix isn't to stop using AI tools. It's to monitor the process signals they affect. The same way you'd monitor application performance after a deployment, you should monitor development process patterns after adopting AI coding tools.
## What's next
This is the first in a series. In upcoming posts, I'll publish detailed case studies of individual repos (with permission from maintainers where applicable) and dive deeper into specific patterns — like how dependency drift predicts deployment instability, and what "healthy" drift patterns look like versus problematic ones.
## Try it yourself
Evolution Engine is open source. Install it and run it on any repo:

```
pip install evolution-engine
evo analyze /path/to/your/repo
```
The tool generates an interactive HTML report with all findings, plus an investigation prompt you can paste into any AI assistant for root cause analysis — so your AI tools can help diagnose the drift patterns they create.
All analysis is local and statistical. No code leaves your machine. No AI APIs are called.
GitHub: github.com/alpsla/evolution-engine
Website: codequal.dev
I built this. Evolution Engine is open source — dual-licensed: CLI and adapters are MIT, core engine is BSL 1.1 (converts to MIT in 2029). Happy to answer questions in the comments.