The Numbers Don't Add Up
Every vendor pitch says the same thing: AI will make your developers 50% faster. GitHub claims Copilot users complete tasks 55% faster. Sounds great on a slide deck.
Here's the reality check.
METR's randomized controlled trial — actual experienced open-source developers, on their own repos, doing real work — found that AI tools made developers 19% slower. Not faster. Slower. And the kicker: developers believed they were 24% faster while being measurably slower.
Faros AI's study across 10,000+ developers and 1,255 teams tells a similar story. Individual throughput goes up. Developers merge 98% more pull requests. But PR review time balloons by 91%. Bug rates increase 9% per developer. PR sizes grow 154%.
At the company level? No measurable productivity improvement.
That's not a tooling problem. That's a systems problem.
Where AI Actually Helps (and Where It Doesn't)
AI coding assistants are genuinely good at:
- Boilerplate and repetition. Config files, test scaffolding, CRUD endpoints. The stuff nobody wants to write.
- Exploration and prototyping. Trying three approaches in the time it took to try one.
- Translation between languages and frameworks. Porting patterns from Go to TypeScript, or vice versa.
- Documentation first drafts. Getting from blank page to decent starting point.
AI coding assistants are bad at:
- Architecture decisions. LLMs don't understand your system's constraints, trade-offs, or history.
- Debugging production issues. They lack context about your infrastructure, traffic patterns, and failure modes.
- Code review. Automated review catches style issues. It misses the "this will cause a race condition under load" kind of problems.
- Knowing when not to write code. The best engineering decisions are often about what you don't build.
The pattern is clear: AI accelerates the cheapest part of software development (writing code) while doing nothing for the expensive parts (design, review, debugging, deployment, maintenance).
The Bottleneck Shift Nobody Talks About
When developers produce more code faster, the bottleneck moves downstream. Every team we've worked with that adopted AI coding tools aggressively saw the same sequence:
- Weeks 1-4: Developers feel faster. PR volume spikes.
- Month 2: Review queues back up. Senior engineers spend all day reviewing AI-generated PRs.
- Month 3: Bug reports climb. AI-generated code passes CI but fails in edge cases nobody tested.
- Month 4: Lead times are longer than before because everything is stuck in review.
Faros AI's data confirms this: developers on high-AI-adoption teams touch 47% more PRs per day, but the review bottleneck absorbs all the gains. It's Amdahl's Law in action: speeding up one stage of the pipeline only helps in proportion to that stage's share of total delivery time.
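The arithmetic is worth making concrete. Suppose writing code accounts for a quarter of end-to-end delivery time and AI doubles that stage's speed (these numbers are illustrative, not from either study); Amdahl's Law gives the overall gain:

```python
def overall_speedup(fraction_accelerated: float, stage_speedup: float) -> float:
    """Amdahl's Law: overall speedup when only one stage gets faster.

    fraction_accelerated: share of total delivery time spent in the
        accelerated stage (e.g. 0.25 if coding is 25% of lead time).
    stage_speedup: how much faster that stage becomes (e.g. 2.0 for 2x).
    """
    return 1 / ((1 - fraction_accelerated) + fraction_accelerated / stage_speedup)

# Illustrative: coding is 25% of lead time and AI makes it 2x faster.
print(round(overall_speedup(0.25, 2.0), 3))  # 1.143 -> only ~14% faster overall

# Even infinitely fast coding caps out at 1 / 0.75, about 1.33x.
print(round(overall_speedup(0.25, 1e9), 3))  # 1.333
```

Double the typing speed, get a 14% improvement end to end. And if review slows down at the same time, even that evaporates.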
What Engineering Leaders Should Actually Do
1. Fix the Pipeline, Not the Typing Speed
If your deployment pipeline takes 45 minutes and your code review queue is 3 days deep, making developers type faster changes nothing. Invest in:
- Automated review gates that catch the obvious stuff before a human looks at it
- Smaller PR culture — AI makes it easy to generate massive changesets, but massive PRs are review killers
- Deployment confidence — feature flags, canary releases, automated rollbacks
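The PR-size gate is the cheapest of these to start with. A minimal sketch of one, run as a CI step (the 400-line threshold and the `origin/main` base branch are assumptions; tune both for your repo):

```python
import subprocess

MAX_CHANGED_LINES = 400  # illustrative threshold; tune for your team

def count_changed(numstat: str) -> int:
    """Sum added + deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":  # binary files report "-" instead of line counts
            total += int(added) + int(deleted)
    return total

def check_pr_size(base: str = "origin/main") -> bool:
    """Return True if the current branch's diff against base is under the limit."""
    out = subprocess.run(
        ["git", "diff", "--numstat", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    n = count_changed(out)
    if n > MAX_CHANGED_LINES:
        print(f"PR touches {n} lines (limit {MAX_CHANGED_LINES}); split it up.")
        return False
    print(f"PR size OK: {n} changed lines.")
    return True
```

Wire `check_pr_size` into CI as a required status check and oversized AI-generated changesets get bounced before they ever hit a reviewer's queue.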
2. Measure What Matters
Stop measuring lines of code or PRs merged. Start measuring:
- Lead time from commit to production
- Change failure rate — are you shipping more bugs?
- Review turnaround — is your review queue growing?
- Time to recovery when something breaks
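These four map directly onto the DORA metrics, and two of them fall out of data you already have. A minimal sketch, assuming hypothetical deployment records with `committed`, `deployed`, and `caused_incident` fields (adapt the field names to whatever your CI/CD system actually emits):

```python
from datetime import datetime
from statistics import median

# Hypothetical deployment records; real ones come from your CI/CD system.
deploys = [
    {"committed": datetime(2024, 5, 1, 9), "deployed": datetime(2024, 5, 2, 15),
     "caused_incident": False},   # 30h lead time
    {"committed": datetime(2024, 5, 3, 10), "deployed": datetime(2024, 5, 3, 18),
     "caused_incident": True},    # 8h lead time
    {"committed": datetime(2024, 5, 4, 8), "deployed": datetime(2024, 5, 6, 8),
     "caused_incident": False},   # 48h lead time
]

def lead_time_hours(records) -> float:
    """Median commit-to-production lead time, in hours."""
    return median(
        (r["deployed"] - r["committed"]).total_seconds() / 3600 for r in records
    )

def change_failure_rate(records) -> float:
    """Fraction of deployments that caused an incident."""
    return sum(r["caused_incident"] for r in records) / len(records)

print(lead_time_hours(deploys))              # 30.0
print(round(change_failure_rate(deploys), 2))  # 0.33
```

Track both over time, segmented by AI-heavy versus AI-light teams, and you can see directly whether the extra PR volume is translating into delivery or into rework.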
If AI adoption increases PRs merged but also increases your change failure rate, you're not winning. You're just failing faster.
3. Use AI for the Boring Stuff, Humans for the Hard Stuff
The best teams we see treat AI as a force multiplier for the tedious work:
- Generating test cases from specifications
- Writing migration scripts
- Producing first-draft documentation
- Scaffolding new services from templates
And they keep humans firmly in control of:
- System design and architecture
- Security-critical code paths
- Performance-sensitive implementations
- Cross-team integration points
4. Don't Skip the Learning Phase
The METR study found one reason AI slowed developers down: context switching between their own thinking and the AI's output. Experienced developers had deep mental models of their codebases. AI suggestions often didn't match those models, requiring extra time to evaluate and adapt.
This means junior developers might get bigger speed gains (they have fewer established mental models to conflict with), but they also need more supervision, not less. AI doesn't replace mentorship. If anything, it makes mentorship more important — someone needs to catch the plausible-looking-but-wrong code that juniors accept uncritically.
The Honest Assessment
AI-assisted engineering is here to stay. 82% of developers use AI tools weekly. About 27% of production code is now AI-authored. That's not going backwards.
But the productivity gains are real only when you redesign your entire delivery pipeline around higher code volume — not when you bolt AI onto an unchanged process and hope for the best.
The teams getting actual value from AI are the ones that:
- Invested in review automation before increasing code output
- Set up quality gates that catch AI-specific failure modes
- Trained their developers on when to use AI and when to think for themselves
- Measured end-to-end delivery metrics, not just coding speed
The rest are just generating more code to review, more bugs to fix, and more PRs to merge — while wondering why the roadmap still slips.
Need help integrating AI tools into your engineering workflow without creating new bottlenecks? Let's talk.