I tried to evaluate how AI has affected frontend development at our company. I looked at all 7 of our projects and 5 developers who use AI in their work.
The general feeling was that we'd started writing code faster. But it was only a feeling, and I wanted either confirmation or refutation. Management is over the moon about how wonderful AI is, and even gave us corporate access to Cursor and Claude Code. They love to bring this up at every All Hands meeting, but nobody has actually shown any proof that adopting AI has had a real impact on how much useful code we actually ship to our users.
The Metric
For the metric, I decided to simply take the amount of code written. I know that's a bit crude, but I don't see how to measure this objectively otherwise (suggestions welcome in the comments).
Methodology
Take four six-month periods:
- Last six months (2025-10-01 → 2026-03-31) — most of our developers actively used tools like Cursor and Claude Code.
- The six months before that (2025-04-01 → 2025-09-30) — used them, but not as actively.
- Another six months back (2024-10-01 → 2025-03-31) — this was mostly just chatting with an LLM. Nothing like launching Claude Code from the console, and the models were significantly dumber than today.
- Another six months back (2024-04-01 → 2024-09-30) — the ChatGPT 3.5–4o era, when if anyone used AI for coding at all, it was only for isolated snippets in chat mode. This last period serves as the baseline.
Digging further back didn't seem worth it.
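The plugin does the actual counting, but to make the metric concrete, here is a minimal sketch (TypeScript on Node, run inside the repo; this is not how Git Insight works internally) of what "amount of code per period" boils down to: summing the added and removed lines that `git log --numstat` reports for each date range, before any exclusions are applied.

```typescript
// rough-count.ts: a back-of-the-envelope cross-check, not the plugin's logic.
// Sums the added/removed lines that `git log --numstat` reports per half-year period.
import { execSync } from "node:child_process";

const periods = [
  { name: "baseline", since: "2024-04-01", until: "2024-09-30" },
  { name: "second",   since: "2024-10-01", until: "2025-03-31" },
  { name: "third",    since: "2025-04-01", until: "2025-09-30" },
  { name: "fourth",   since: "2025-10-01", until: "2026-03-31" },
];

for (const p of periods) {
  // --numstat prints "<added>\t<removed>\t<path>" per changed file; binary files show "-".
  const out = execSync(
    `git log --numstat --pretty=format: --since=${p.since} --until=${p.until}`,
    { encoding: "utf8", maxBuffer: 256 * 1024 * 1024 },
  );

  let added = 0;
  let removed = 0;
  for (const line of out.split("\n")) {
    const [a, d] = line.split("\t");
    if (!a || a === "-") continue; // skip blank separator lines and binary files
    added += Number(a);
    removed += Number(d);
  }
  console.log(`${p.name}: +${added} / -${removed}`);
}
```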
Caveats in Counting Code
So we compare the volume of code written, but there are some caveats (a small sketch of how these buckets look in practice follows the list):
- Auto-generated files. Things like package-lock.json, test mocks, vendored branding dependencies, and the like have to be excluded.
- Unit tests. We've genuinely started writing more unit tests via AI; in fact, that's now the only way we write them. Tests are great, of course, but what matters to the end user is product code, not tests. So for a clean experiment, tests have to come out of the comparison (the numbers with tests included are also shown below).
- Documentation. We've also started writing more documentation: plans for AI, plus all sorts of instruction and skill files. Since this code/text doesn't ship to the user, we exclude it as well.
- Noise. All sorts of garbage needs to be removed: commits that apply new linter/formatter rules, accidental inclusions of auto-generated code, mass code moves, chains of reverts, and similar junk — to get a reasonably clean dataset.
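Here is a sketch of that bucketing as code. It's illustrative only: the real exclusions are configured as glob patterns in the plugin, and the vendored-branding path below is made up.

```typescript
// classify.ts: a sketch of the bucketing described above, kept dependency-free
// with plain string checks instead of globs. Not the plugin's implementation.
type Bucket = "excluded" | "test" | "doc" | "product";

export function classify(path: string): Bucket {
  // Auto-generated and vendored files: dropped from the statistics entirely.
  if (path.endsWith("package-lock.json")) return "excluded";
  if (path.endsWith("mock.json")) return "excluded";          // auto-generated test mocks
  if (path.startsWith("vendor/branding/")) return "excluded"; // made-up path for branding deps

  // Tests and docs: counted, but reported separately from product code.
  if (/\.(spec|test)\.[tj]sx?$/.test(path)) return "test";
  if (path.endsWith(".md")) return "doc";

  return "product";
}

// classify("src/app/cart.spec.ts") === "test"
// classify("src/app/cart.ts")      === "product"
```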
Accounting for the Number of Developers
Before comparing numbers between the half-year periods, we need to handle one more thing: what if we started writing more code simply because there are more developers, or because the new folks happen to write more? In our case, the team has indeed grown in headcount lately (greetings to everyone who insists AI will replace us all). So the cohort for comparison only includes people who have been at the company for the past two years, and only those who definitely use AI in their work. We have employees who are pretty lukewarm on AI and use it sparingly; they're out of the calculation.
A spoiler right away: among the 5 developers, there are two who use AI in their work but whose numbers haven't moved at all over the last two years — they wrote what they wrote, and they still write the same amount. Make of that what you will, but the fact stands: if you give someone a microwave and they spend less time heating food, that doesn't mean they'll start eating more. Maybe these people just rest more now while AI writes the code for them. Or maybe code-generation speed was never the bottleneck in their work in the first place.
For the metric, we'll use the average amount of code per working day for the whole group.
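To spell out that normalization, here is a sketch of one reading of it (the plugin reports these numbers directly; the totals in the example are made up):

```typescript
// per-day.ts: how the headline numbers are normalized (a sketch, not the plugin's code).
// "Lines per working day" = lines written by the fixed 5-person cohort in a period,
// divided by the number of working days in that period.
interface PeriodTotals {
  added: number;       // lines added by the cohort, after all exclusions
  removed: number;     // lines removed by the cohort, after all exclusions
  workingDays: number; // roughly 125-130 working days in a six-month period
}

function perWorkingDay({ added, removed, workingDays }: PeriodTotals) {
  return {
    added: Math.round(added / workingDays),
    removed: Math.round(removed / workingDays),
  };
}

// Made-up totals, just to show the shape of the output:
console.log(perWorkingDay({ added: 32_000, removed: 12_800, workingDays: 128 }));
// -> { added: 250, removed: 100 }
```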
The Tool
For all of this, we'll use the Git Insight plugin for JetBrains IDEs — it can do everything described.
Preparing the Data
The first step is to merge all the big feature branches into some integration branch. Our project happens to have two feature branches where large features have been in development for several months, and that code needs to be counted too, otherwise the statistics get skewed.
Step-by-Step
1. Remove unwanted file extensions
Open the plugin, go to the Project Statistics tab, look at the list of file extensions, and drop the ones that shouldn't be counted — you might immediately spot some XML files that have no business being in the stats. They can be removed from the statistics right from the context menu (right click).
2. Look for suspiciously large files
Walk through the extension list and drill into them (double click) — look for suspiciously large files (sort by size). It depends on your project, but in ours these were mock.json files generated automatically for unit tests — we don't want them in the stats. They can be removed from the same context menu, and not one by one but by glob patterns.
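In our case the pattern for those generated mocks is roughly the following (adjust it to wherever your generated fixtures actually live):
**/mock.json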
3. Exclude folders and files
By default the plugin doesn't count files already in .gitignore, plus some others like package-lock.json, archives, and images. But your project might have some folder you want to exclude too — for example, one containing branding dependencies with library code. Go to the Settings tab and add a glob pattern to exclude such files/folders.
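For example, if the branding dependencies live in a dedicated folder, the pattern could look like this (the folder name below is made up):
branding-libs/**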
4. Deal with suspicious commits
Go to Developer Statistics → Suspicious commits. This view collects commits the plugin considers suspicious: too large, looking like a formatter run, reverts, and so on.
It's really worth working through this list — you can exclude either whole commits from the statistics, or individual files/folders inside a commit. Don't skip this! In one of our projects I found a chain of commits applying a new ESLint ruleset: a commit, its revert, another commit, then another revert, and finally a new commit — 200,000 lines of changes in total. I removed all of them from the stats.
You can also click on bars in the chart if there are anomalies and look at the commits from that period — maybe you'll find something else to drop.
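If you want a second opinion outside the IDE, a quick way to surface the same kind of anomalies is to list commits whose total churn exceeds some threshold and eyeball them. A rough sketch (the threshold and the parsing are deliberately simplistic):

```typescript
// big-commits.ts: lists commits with suspiciously large churn for manual review.
import { execSync } from "node:child_process";

const THRESHOLD = 5_000; // lines changed; anything above is worth a manual look

// One header line per commit ("<hash>\t<subject>"), followed by a --shortstat summary line.
const out = execSync("git log --shortstat --pretty=format:%h%x09%s", {
  encoding: "utf8",
  maxBuffer: 256 * 1024 * 1024,
});

let current = "";
for (const line of out.split("\n")) {
  if (line.includes("\t")) {
    current = line; // remember the commit header for the stat line that follows
    continue;
  }
  const ins = /(\d+) insertion/.exec(line);
  const del = /(\d+) deletion/.exec(line);
  const churn = Number(ins?.[1] ?? 0) + Number(del?.[1] ?? 0);
  if (churn > THRESHOLD) console.log(`${churn}\t${current}`);
}
```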
5. Create Tests and Docs file categories in Settings
We need to be able to view statistics both with and without tests and documentation. A category consists of a name and glob patterns matching files. For our projects, tests look like this:
*.spec.[tj]s
*.spec.[tj]sx
*.test.[tj]s
*.test.[tj]sx
For documentation:
*.md
6. First Results
This is what we get out of it: two charts, one for all code and one for code without tests and docs.
I extended the timeline back to 2023, when all 5 developers were already at the company.
Judging by the charts, the overall trend is positive. The only thing left is to look at the actual numbers.
You can also see that during the first two years our developers were producing a fairly stable amount of code — which is at least an indirect sign that the methodology holds up.
7. The Numbers
| Period | All code (lines per working day) | Without tests and docs |
|---|---|---|
| Base (2024-04 → 2024-09) | +249/-100 | +193/-89 |
| Second (2024-10 → 2025-03) | +327/-146 (+31%/+46%) | +263/-119 (+36%/+34%) |
| Third (2025-04 → 2025-09) | +472/-243 (+90%/+143%) | +399/-222 (+107%/+149%) |
| Fourth (2025-10 → 2026-03) | +721/-332 (+190%/+232%) | +479/-283 (+148%/+218%) |

The numbers are average lines added/removed per working day for the cohort; the percentages show the change relative to the base period.
8. Conclusions
There's a clear trend of code volume rising steadily. Code review is becoming the bottleneck in development.
Overall, we got at least a 2x increase in code produced. Maybe even closer to 3x.
On the other hand, all we've learned is that we write three times more code. We don't really know anything about that code. Maybe there's more tech debt now? Maybe it's code for features our customers don't actually need? Or maybe it really is solid, useful code. That's a much harder question, and not an easy one to answer objectively.
Either way, in the AI era we keep writing more and more code — that part is a fact. Development has changed dramatically over the past couple of years, and I can't recall another shift like it in our industry over the past 15 years. Interesting to see where this all takes us.