Artem Koltunov

Originally published at Medium

AI Coding Tools in Practice: What a 25-40% Productivity Gain Really Looks Like

Our JavaScript team tested AI-assisted development on production code. Here's what we measured, what surprised us, and why we think the real gain is 25-40% -- not the 10x you keep hearing about.


Over the past year, AI coding tools have been surrounded by bold claims: "Develop twice as fast." "10x developer productivity." "Code that practically writes itself."

We decided to test these claims on real work -- not demo projects, but production code. The kind of long-lived repositories that power SDKs and developer platforms, systems that must be maintained, reviewed, and understood years after the code is written.

What We Tested

Our JavaScript team works with AI models like GPT Codex, GPT-5.2, Opus 4.5, and Gemini 3.5 through IDE plugins -- specifically GitHub Copilot Chat in WebStorm and IntelliJ IDEA.

Recently, we also got access to Cursor, an IDE with deeply integrated AI that can operate across an entire project. Unlike traditional AI plugins where you manually select files and copy code into prompts, Cursor sees the whole codebase, creates files in the right locations, and applies changes directly.

The biggest immediate impact wasn't smarter code generation -- it was the disappearance of small mechanical tasks. Less time copying code, managing context, and stitching pieces together. That alone produced an early productivity improvement of roughly 20%.

To see where this advantage held up -- and where it didn't -- we ran three experiments on active codebases.

Three Experiments

Important note: The first two experiments used GitHub Copilot Chat inside WebStorm, our usual IDE. The third introduced Cursor, which gave us a chance to compare a traditional AI plugin approach with a full-project AI environment.

Experiment 1: Extending a Production SDK

We added new AI-related functionality to an existing JavaScript SDK: AI Summarize (generating summaries from ~1000 chat messages) and AI Gateway (recognizing text in images and generating descriptions). The task included API integration, SDK adaptation, tests, and usage examples.
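In rough outline, the two features shaped up along these lines. This is a hypothetical sketch with mocked AI calls; the method names, options, and return shapes are illustrative, not the real SDK surface:

```javascript
// Hypothetical sketch of the two features; the AI calls are mocked so
// the example is self-contained. Names and shapes are illustrative.
const sdk = {
  ai: {
    // AI Summarize: condense a large batch of chat messages.
    async summarize({ messages, maxLength = 200 }) {
      // A real implementation would call the summarization endpoint.
      const text = `Summary of ${messages.length} messages`;
      return { summary: text.slice(0, maxLength) };
    },
    // AI Gateway: recognize text in an image and describe it.
    async describeImage({ imageUrl }) {
      // A real implementation would route the image through the gateway.
      return { recognizedText: "", description: `Description of ${imageUrl}` };
    },
  },
};

// Usage: summarize ~1000 chat messages, then describe an image.
const messages = Array.from({ length: 1000 }, (_, i) => ({ id: i, text: `msg ${i}` }));
sdk.ai.summarize({ messages }).then(({ summary }) => console.log(summary));
sdk.ai.describeImage({ imageUrl: "https://example.com/pic.png" })
  .then(({ description }) => console.log(description));
```

Each feature also needed tests and usage examples like the calls above, which is where the generated scaffolding saved the most typing.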

For this task we used GitHub Copilot Chat inside WebStorm. The AI could generate useful code, but we still had to gather context manually -- selecting files, pasting snippets, and explaining how modules interact -- before integrating whatever came back.

Even with that overhead, AI assistance made a noticeable difference.

Result: ~18 hours with AI vs. 24+ hours without. A gain of 30-35%.

What sped things up wasn't deep architectural insight. It was the smaller tasks: generating scaffolding, following existing patterns, and wiring pieces together faster than a human would type them.

Experiment 2: Untangling Long-Lived Branches

Several parallel branches had been evolving separately since 2021. They contained overlapping logic, slightly different implementations, and subtle behavioral differences.

Normally, merging something like this is slow and mentally draining. It requires reading a lot of unfamiliar code and carefully comparing approaches.

Using Copilot Chat, we could feed sections of each branch to the model, ask it to highlight overlaps and divergences, and get explanations of unfamiliar code. That made it much easier to focus on the important part of the job -- deciding which implementation actually made sense.

Result: ~1.5 days with AI vs. ~1 week without -- a several-fold acceleration for tasks involving analysis and comparison of large codebases.

The biggest advantage here wasn't generating code at all. It was simply making large amounts of existing code easier to understand.

Experiment 3: Integrating an SDK Into a Product (with Cursor)

This experiment used Cursor. Two developers worked in parallel using different AI models (GPT-5.2 Codex and Opus 4.5). We created a complete Redux environment, connected Figma, generated layouts, and integrated business logic.

At first, the results looked impressive.

Result: ~20 hours with Cursor vs. ~40 hours without. Getting to working code 2x faster.

But this experiment also exposed a limitation that didn't show up in the earlier tasks.

The Hidden Problem With AI-Generated Code

The AI-generated code from Experiment 3 compiled, the interface behaved correctly, and the basic tests passed. If we had stopped there, we would have considered the integration complete.

But during code review, one of the developers noticed something odd.

An image identifier already existed inside one of the objects being passed through the system. Logically, the code should have simply reused that ID. Instead, the generated implementation took a much longer route: it fetched the ID, downloaded the associated blob, created a new file from it, uploaded that file back to the server, and then returned a new identifier.

From the outside, nothing was broken. Internally, the process was doing far more work than necessary. Each time the logic ran, it duplicated data, added network calls, and quietly increased resource usage.
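In simplified form, the difference between the generated implementation and the direct one looks like this. The names and the mocked server are illustrative, not the actual SDK code:

```javascript
// Mocked server so the extra work is countable; all names are
// illustrative, not the actual SDK code.
let networkCalls = 0;
const server = {
  downloadBlob(id) { networkCalls += 1; return { data: `blob-of-${id}` }; },
  uploadFile(file) { networkCalls += 1; return `new-id-for-${file.name}`; },
};

// What the AI generated: fetch the blob, re-wrap it, re-upload it,
// and return a brand-new identifier.
function resolveImageIdGenerated(message) {
  const blob = server.downloadBlob(message.imageId); // extra download
  const file = { name: "copy.png", data: blob.data };
  return server.uploadFile(file);                    // extra upload, duplicated data
}

// What the architecture called for: the ID is already on the object.
function resolveImageIdDirect(message) {
  return message.imageId; // zero network calls
}

const message = { imageId: "img-123" };
resolveImageIdGenerated(message);
console.log(networkCalls);                  // 2 round trips per invocation
console.log(resolveImageIdDirect(message)); // "img-123", no network traffic
```

Both functions return a valid identifier, which is why every test passed; only reading the body reveals the wasted round trips.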

We discovered this only because we opened the code and read it carefully.

This turned out to be a pattern we started noticing more often with AI-generated code. The output usually works, but the logic behind it doesn't always match the architecture of the system it's being added to. In shared components like SDKs, such inefficiencies can spread quietly through every product that depends on them.

What Industry Research Shows

While we were running these experiments, we studied key industry research. Our experience aligned closely with what independent analysts are measuring.

Productivity and Code Quality

GitClear's 2025 analysis found that AI tools can increase development speed by 20-55%, but the amount of "sustainable code" -- code that stays in the codebase without being rewritten -- grows by only about 10%. Developers produce code faster, but a noticeable portion still ends up being revised or refactored later.

A randomized controlled study by METR (July 2025) produced a striking result: experienced developers working on their own mature projects actually spent 19% more time with AI tools, while subjectively estimating a 20% speedup. The key takeaway: perceived speed and actual speed are different things. The full data are available on arXiv and GitHub.

The Cost of Reviewing AI Code

Sonar's State of AI in Code report (January 2026) found that 95% of developers spend significant effort checking AI-generated code, and 38% consider it harder to review than human-written code. Developers read and verify code far more slowly than AI generates it, which creates a natural ceiling on productivity gains.

Architectural Limitations of AI-Generated Code

Ox Security's "Army of Juniors" report (October 2025) describes AI-generated code as "highly functional but systematically lacking architectural thinking." This explains why the code works but accumulates hidden problems.

Technical Debt

HFS Research + Unqork (November 2025) surveyed 123 respondents from Global 2000 organizations: while 84% expect AI to reduce costs, 43% admit that AI creates new technical debt. Opinions on long-term impact are split almost evenly -- 55% expect debt reduction, 45% expect increase.

Forrester predicts that by 2026, 75% of tech leaders will face moderate or serious technical debt, with AI code generation without engineering discipline being a key factor.

Impact on Delivery Stability

Google's DORA Report 2024 found a critical correlation: a 25% increase in AI usage is associated with a 7.2% decrease in delivery stability. There's a 2.1% productivity gain and a 2.6% increase in job satisfaction -- but at the cost of a 1.5% throughput decrease and a 7.2% stability decrease. The 2025 DORA Report confirms these findings.

Why the Real Gain Is 25-40%

Looking across both our experiments and the broader research, the same pattern keeps appearing.

AI tools clearly speed up certain parts of development: reducing boilerplate, navigating large codebases, scaffolding new functionality, and accelerating the path to a working implementation.

But those gains come with a counterweight. The code still needs to be understood, reviewed, and integrated into an existing system. Developers reason about code far more slowly than AI can generate it.
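A back-of-envelope Amdahl-style calculation makes the ceiling concrete. The numbers below are our own assumptions for illustration, not figures from the cited studies:

```javascript
// Amdahl-style estimate: AI speeds up only the "generatable" share of a
// task, while review/integration time stays the same or grows.
// All inputs below are assumptions for illustration.
function overallSpeedup(genFraction, genSpeedup, reviewOverhead = 1.0) {
  // genFraction: share of total time spent on work AI can generate
  // genSpeedup: how much faster that share becomes with AI
  // reviewOverhead: multiplier on the remaining review/integration time
  const newTime = genFraction / genSpeedup + (1 - genFraction) * reviewOverhead;
  return 1 / newTime;
}

// Half the work 3x faster, review 10% slower: ~1.4x overall, i.e. ~40%.
console.log(overallSpeedup(0.5, 3, 1.1).toFixed(2)); // "1.40"

// Even an infinite generation speedup is capped by the review half:
console.log(overallSpeedup(0.5, Infinity, 1.0).toFixed(2)); // "2.00"
```

As long as reading, reviewing, and integrating code stays human-paced, the overall gain converges toward the fraction of work AI can't touch -- which is why our observed 25-40% is about what the arithmetic predicts.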

Without proper review, teams accumulate what we call "AI legacy code" -- code that works but nobody on the team truly understands. Over time, it becomes easier to regenerate than to modify. But regeneration means spending time and resources on problems that were already solved. In high-debt environments, losses reach 30-40% of the change budget and 10-20% of system operation costs.

This situation can develop within months of heavy AI adoption if developers are not fully involved in reviewing what gets merged.

That's why the dramatic claims about "10x productivity" rarely hold up in real engineering environments. In practice, the gains stabilize in the 25-40% range -- meaningful enough to matter, but not so large that engineering judgment becomes unnecessary.

Conclusion

AI coding tools are most useful when treated as assistants rather than replacements for engineering judgment.

They excel at analyzing and comparing large volumes of code -- tasks that take humans significant time but that AI handles very quickly. They reduce friction in everyday development and can meaningfully accelerate time-to-working-code.

At the same time, tasks requiring deep understanding of business logic and architectural optimization are often solved by AI in suboptimal ways. The resulting code works but is redundant. The system functions correctly on the surface, but hidden problems related to performance, resource usage, and maintainability can form inside.

Architectural decisions, quality control, and responsibility for results must stay with the team. With this discipline in place, AI tools deliver a real, measurable, and sustainable productivity boost.


