A Carnegie Mellon study tracked 807 GitHub projects that adopted Cursor (an AI-native code editor) and compared them against 1,380 control repos over 20 months. The result is the most detailed picture we have of what AI coding tools actually do to a codebase over time.
The headline: developers wrote 281% more lines of code in the first month after adopting Cursor. By month two, the boost dropped to 48%. By month three, it was effectively zero. But code complexity increased 41% and static analysis warnings rose 30%, and those numbers never came back down.
This is the first large-scale longitudinal study of AI coding tool adoption, and the pattern it reveals matters for every team evaluating these tools.
The velocity spike is real but temporary
The CMU researchers (He, Miller, Agarwal, Kästner, Vasilescu) measured lines of code added per month as a proxy for velocity. The data is unambiguous for the first month: AI-assisted developers produce significantly more code.
| Time after adoption | Velocity change vs. control | Complexity change |
|---|---|---|
| Month 1 | +281% | +41% |
| Month 2 | +48% | +41% |
| Month 3 | +12% | +41% |
| Month 6+ | ~0% | +41% |
The velocity gains dissipate completely. The complexity stays.
Why does the speed advantage vanish? The paper attributes it to a feedback loop: the added complexity from AI-generated code makes subsequent changes harder, which slows development, which eliminates the velocity advantage. The technical debt from the fast phase becomes the drag on the slower phase.
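This feedback loop can be expressed as a toy model (mine, not the paper's; every parameter is illustrative): AI assistance multiplies raw output early on, but each unit of extra output deposits complexity, and accumulated complexity divides every later month's effective velocity.

```python
def simulate(months=6, ai_boost=3.0, debt_rate=0.41, drag=0.3):
    """Toy model of the CMU feedback loop -- all numbers illustrative,
    chosen only to reproduce the qualitative shape of the data."""
    complexity = 0.0
    velocities = []
    for m in range(months):
        raw = 1.0 + ai_boost * 0.5 ** m  # AI boost fades as easy wins run out
        # Accumulated complexity drags down this month's effective velocity
        velocities.append(raw / (1.0 + drag * complexity))
        # The extra output leaves complexity (debt) behind for later months
        complexity += debt_rate * (raw - 1.0)
    return velocities

velocities = simulate()
```

Under these made-up parameters the curve has the same shape as the table above: a large month-one multiple, a rapid fall, and a long tail at or below baseline, because the debt never decays even though the boost does.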
This matches other studies
The CMU findings are consistent with three other independent datasets.
GitClear analyzed 211 million changed lines across repos from Google, Microsoft, and Meta (2020-2024). They found copy-pasted code surpassed refactored code for the first time ever in 2024. Code churn (new code rewritten within two weeks) nearly doubled from 3.1% to 5.7%.
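The churn metric can be sketched in a few lines, assuming the definition above (the share of newly added lines rewritten or deleted within two weeks). The `line_events` structure is invented for illustration; a real pipeline would reconstruct line lifetimes from `git log -p` or similar.

```python
from datetime import datetime, timedelta

CHURN_WINDOW = timedelta(days=14)

def churn_rate(line_events):
    """line_events: list of (line_id, added_at, removed_at_or_None).
    Returns the fraction of added lines rewritten/deleted within the window."""
    added = len(line_events)
    churned = sum(
        1 for _, added_at, removed_at in line_events
        if removed_at is not None and removed_at - added_at <= CHURN_WINDOW
    )
    return churned / added if added else 0.0

events = [
    ("a", datetime(2024, 1, 1), datetime(2024, 1, 5)),   # rewritten in 4 days -> churn
    ("b", datetime(2024, 1, 1), None),                   # still alive -> not churn
    ("c", datetime(2024, 1, 1), datetime(2024, 3, 1)),   # rewritten after 60 days -> not churn
    ("d", datetime(2024, 1, 2), datetime(2024, 1, 10)),  # rewritten in 8 days -> churn
]
rate = churn_rate(events)  # 2 of 4 lines churned -> 0.5
```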
The METR randomized trial tested 16 experienced open-source developers (on repos averaging 22,000+ stars) across 246 tasks. With Cursor Pro and Claude 3.5 Sonnet, they were 19% slower than without AI. They had expected to be 24% faster, and even after seeing the data they still believed the AI had sped them up.
A Qodo survey of 609 developers found 65% say AI misses relevant context during refactoring and code review. The most alarming finding: junior developers (under 2 years' experience) reported the lowest quality improvements from AI but the highest confidence in shipping unreviewed AI-generated code.
| Study | Sample | Key finding |
|---|---|---|
| CMU/Cursor (2025) | 807 repos | +41% complexity, velocity gone by month 3 |
| GitClear (2025) | 211M lines | Copy-paste surpassed refactoring for first time |
| METR (2025) | 16 devs, 246 tasks | 19% slower with AI (perceived 24% faster) |
| Qodo (2025) | 609 devs | 65% say AI misses context, juniors most overconfident |
Why complexity increases but velocity doesn't stay
The core mechanism is context loss. AI coding tools generate code that works in isolation but doesn't account for how it interacts with the rest of the project. The generated code compiles and passes the test it was written for, but it introduces coupling and structural assumptions that make the next change harder.
This is amplified in codebases with tightly coupled systems. Game development is a clear example: physics, rendering, input, audio, and state management all interact, which means AI-generated code that works in isolation can break five other systems. But the same dynamic applies to any sufficiently complex backend: microservice interactions, database migrations, auth flows, and caching layers all share this property.
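A contrived sketch of this mechanism (every name here is invented, not from the study): the generated helper passes the one test it was written for, yet it bakes in structural assumptions that other parts of the project silently depend on.

```python
# Contrived illustration -- all names invented. The helper below is
# "correct in isolation" but assumes health is a bare int on shared state
# and that mutation needs no clamping or notification event.

game_state = {"player": {"health": 100, "max_health": 100}}

def apply_damage(amount):
    """AI-generated in isolation: mutates shared state directly, with no
    clamping and no death/notification event for other systems."""
    game_state["player"]["health"] -= amount

# The one test it was written against passes:
apply_damage(30)
assert game_state["player"]["health"] == 70

# But elsewhere in the project it creates invalid states:
apply_damage(200)
health = game_state["player"]["health"]
# health is now -130: a HUD bar that renders health / max_health, or a
# save format that assumes 0..100, now receives out-of-range data.
```

Each addition like this compiles, passes its test, and quietly narrows the set of safe future changes, which is the shape of the 41% complexity increase.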
The 2025 Stack Overflow survey provides the adoption context: 84% of 49,000+ developers use or plan to use AI tools, but 46% distrust their accuracy (up from 31% in 2024). Adoption is near-universal. Trust is declining. The CMU data explains why.
What the data suggests about using AI tools effectively
The CMU study doesn't conclude that AI coding tools are useless. It suggests something more specific: the value is front-loaded and the cost is back-loaded. This has direct implications for how teams should integrate these tools.
AI works best for discrete, bounded tasks. Generating a function, writing a test, scaffolding boilerplate. These are one-shot tasks whose full context fits in what the model can see, so it tends to get them right. The CMU velocity spike in month one is likely driven by early adopters picking the low-hanging fruit: boilerplate and simple implementations.
AI fails when it makes architectural decisions. File organization, system boundaries, module coupling. These are exactly the decisions that drive the 41% complexity increase. The model doesn't understand the project's architecture, so its code additions fight the existing structure.
Context-aware tools perform differently. Tools that integrate with the project's build system, type checker, or runtime environment can avoid some of the context loss that drives complexity. In game development, tools like Ziva that read the engine's scene tree and node structure produce code that follows engine idioms rather than guessing. The same principle applies to any domain: an AI tool that understands your Rails models, your Terraform state, or your Kubernetes config will generate better code than one that only sees the file you have open.
Review AI code more aggressively, not less. The Qodo data on junior developer overconfidence is the most actionable finding. Senior developers report the biggest quality gains from AI (68.2%) precisely because they review its output harder. Juniors report smaller gains (51.9%) but skip review more often. If your team uses AI tools, the code review process matters more, not less.
The bottom line
AI coding tools are productivity tools with a hidden interest rate. The velocity you borrow in month one comes due as complexity in month three. The CMU data documents this pattern quantitatively across 807 repos.
The GDC 2026 survey found only 7% of game developers view AI positively, down from 13% the year before. The Stack Overflow 2025 survey shows 77% of developers say vibe coding is not part of their professional work. The industry is past the hype phase. What remains is a tool that works within a specific scope and fails predictably outside it.
Use AI for the tasks it's good at. Review what it produces. Don't let it decide how your systems connect. The data is clear enough now to make these decisions with confidence rather than hope.