Michael Larson


Which speeds up development more: AI Coding Agents or Pair Programming?

AI is transforming software engineering, but how much does it really speed up development? Big tech companies claim that AI can boost code output by 30% or more. But does that mean teams are actually 30% more efficient? Are features reaching customers 30% faster? Let's dig in and find out.

My knowledge of queue theory and the theory of constraints made me skeptical. Sure, developers might code faster with AI, but what about bottlenecks elsewhere in the pipeline? Could speeding up coding actually slow things down downstream? And what about tried-and-true practices like pair programming and trunk-based development: could they be even faster than pull requests and feature branches? I decided to put these ideas to the test with a simulation, using GitHub Copilot and some queue theory tools.

Setup

Here's how I set up the simulation:

General Key Assumptions

  • Infinite backlog of tickets: Developers always have something to work on, and tickets are independent (no merge conflicts).

  • Each ticket takes about one day to complete.

  • After finishing a ticket, a developer immediately picks up a new one.

  • AI-assisted scenarios benefit from a 30% speedup in coding time.

Key Assumptions for Pull Request Scenarios

  • Every ticket is submitted for review.

  • About 75% of PRs get feedback that requires rework, with the time to fix randomly distributed.

  • Once rework is done, the code goes back into the review queue.

Key Assumptions for Pairing and Trunk-Based Development Scenarios

  • The team uses trunk-based development, so there is no review queue; pairs push code directly to main.

  • The pairs wait about 6 minutes for automated tests.

  • If defects are found, the pair fixes them before moving on.

  • Pair programming has 60% of the defect rate of solo work.

This setup lets us compare how different workflows—traditional PR, AI-enhanced PR, pair programming, and AI-enhanced pairs—affect lead times and throughput, with all other variables held constant.
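Here's a toy sketch of how a workflow like the PR scenario can be wired up as a discrete-event simulation. The team size, review time, and fix time below are illustrative stand-ins I picked for the sketch, not the exact parameters of the full simulation linked at the end of this post:

```python
import heapq
import random

def simulate(n_devs=5, code_hours=8.0, review_hours=0.25,
             rework_prob=0.75, fix_hours=2.0, horizon=480.0, seed=1):
    """Toy sketch of the PR workflow: devs feed a single FIFO review
    queue, and ~75% of reviews bounce back for rework before merging."""
    rng = random.Random(seed)
    events, seq = [], 0  # min-heap of (time, tiebreak, kind, ticket)

    def push(t, kind, pr):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, pr))
        seq += 1

    for _ in range(n_devs):  # every dev starts coding a ticket at t=0
        push(rng.expovariate(1 / code_hours), "submit", {"created": 0.0})

    reviewer_free = 0.0  # single reviewer serving PRs in FIFO order
    lead_times, total_wait, n_reviews = [], 0.0, 0

    while events:
        t, _, kind, pr = heapq.heappop(events)
        if t > horizon:  # 60 business days, 8 hours each
            break
        if kind in ("submit", "resubmit"):
            start = max(t, reviewer_free)      # wait if the reviewer is busy
            total_wait += start - t
            n_reviews += 1
            reviewer_free = start + rng.expovariate(1 / review_hours)
            push(reviewer_free, "reviewed", pr)
            if kind == "submit":               # dev immediately starts the next ticket
                push(t + rng.expovariate(1 / code_hours), "submit", {"created": t})
        else:  # review finished
            if rng.random() < rework_prob:     # feedback: fix, then resubmit
                push(t + rng.expovariate(1 / fix_hours), "resubmit", pr)
            else:
                lead_times.append(t - pr["created"])

    return len(lead_times), sum(lead_times) / len(lead_times), total_wait / n_reviews

base_done, base_lead, base_wait = simulate()
ai_done, ai_lead, ai_wait = simulate(code_hours=8.0 * 0.7)  # 30% coding speedup
print(f"baseline: {base_done} tickets, lead {base_lead:.1f} h, queue wait {base_wait:.2f} h")
print(f"with AI:  {ai_done} tickets, lead {ai_lead:.1f} h, queue wait {ai_wait:.2f} h")
```

Even in this stripped-down version, the same pattern shows up: faster coding raises throughput, but PRs spend noticeably longer waiting for the reviewer.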

Results

After running the simulation for 60 business days, here are the results for each scenario:

Results for Pull Request Workflow

A total of 246 tickets were completed, with a mean lead time of about 23 hours. This serves as our baseline for everything else.

Results for Pull Request + AI Workflow

Now our team gets an AI coding tool they love and starts using it right away, making them 30% faster at coding than before. This results in about a 23% increase in the number of tickets completed, but our lead time is still 23 hours. We also notice a lot more PRs waiting for review, and the wait time for feedback increases by 2.6 times! So while we're getting more done, it's putting increasing pressure on our pull request queue and the resources we have allocated for reviewing code.

Results for Pair Programming + Trunk-Based Development Workflow

Now imagine the team looks at their PR queue and sees that lead times are still high. They decide to pair up instead of working alone, and agree to trust each pair to push code directly into the main branch. Each pair is committed to fixing any build breaks immediately before moving on to other work. In this scenario, they complete 240 tasks, but their lead time drops to about 9 hours, and the time spent in rework is reduced by 79% compared to our baseline.

Results for Pair Programming + Trunk-Based Development + AI Workflow

Our team now re-introduces AI coding tools, with the same 30% speedup as our other AI-enhanced scenario. Their lead time falls to 7 hours, and their rework time drops by 82% compared to our baseline. They also complete 316 tickets, the most of all the scenarios tested.

Overall Results

| Scenario | Total Tickets | Mean Lead Time (hours) | Avg Rework Cycles | Avg PR Queue Time (hours) | Total Rework Time (hours) |
|---|---|---|---|---|---|
| Traditional PR | 246 | 23.06 | 2.47 | 0.68 | 1259.30 |
| AI-Enhanced PR | 303 | 23.04 | 2.86 | 2.45 | 1369.03 |
| Pairing & TBD | 240 | 9.29 | 0.65 | 0.00 | 269.06 |
| AI-Enhanced Pairs | 316 | 7.01 | 0.60 | 0.00 | 218.96 |
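Pulling the relative deltas out of those numbers (values copied straight from the results above) makes the trade-offs easier to see at a glance:

```python
# Relative change vs. the Traditional PR baseline (numbers from the results above)
baseline = {"tickets": 246, "lead_h": 23.06, "rework_h": 1259.30}
scenarios = {
    "AI-Enhanced PR":    {"tickets": 303, "lead_h": 23.04, "rework_h": 1369.03},
    "Pairing & TBD":     {"tickets": 240, "lead_h": 9.29,  "rework_h": 269.06},
    "AI-Enhanced Pairs": {"tickets": 316, "lead_h": 7.01,  "rework_h": 218.96},
}
for name, s in scenarios.items():
    deltas = {k: s[k] / baseline[k] - 1 for k in baseline}
    print(f"{name}: {deltas['tickets']:+.1%} tickets, "
          f"{deltas['lead_h']:+.1%} lead time, {deltas['rework_h']:+.1%} rework time")
```

AI-enhanced PRs ship more tickets but barely move lead time, while both pairing scenarios cut lead time and rework dramatically.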

Implications

So, what does all this mean for teams thinking about adopting AI coding tools?

  • AI coding tools alone won’t solve workflow bottlenecks. If your process is slowed down by reviews, handoffs, or queues, speeding up the coding step won’t make a big difference. The simulation showed that even with a generous 30% speedup, lead times barely changed when the review queue was still present. This makes sense—queue theory tells us that if we increase the arrival rate but do nothing to address the service rate, backlog will continue to increase.

  • Pair programming offers both speed and predictability. By working together and skipping the review queue, the scenario that used pair programming and trunk-based development saw much faster and more consistent lead times. This means less waiting, more predictability, and a smoother flow of work.

  • Defect rates matter, but workflow matters more. While pair programming reduced defect rates in the simulation, the biggest gains came from removing bottlenecks. AI did not reduce defects in this model, and in some real-world studies, it may even increase rework.

  • Optimize the process before investing in new tools. If you want to ship faster, focus on removing queues and unnecessary handoffs. Once your workflow is streamlined, then consider how tools like AI can help you go even faster.
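The first point, about arrival rates outrunning service rates, can be made concrete with the standard M/M/1 back-of-the-envelope formula. The reviewer service rate of 1 PR/hour below is an assumed number for illustration, not a figure from the simulation:

```python
def mm1_wait(lam, mu):
    """Mean time a job waits in queue before service starts (Wq = rho / (mu - lam))."""
    rho = lam / mu  # utilization of the single server (the reviewer)
    assert rho < 1, "unstable: arrivals outpace the reviewer"
    return rho / (mu - lam)

# Assumed rates: one reviewer clears mu = 1.0 PR/hour.
print(mm1_wait(0.60, 1.0))  # team submits 0.60 PR/hour -> 1.5 h average wait
print(mm1_wait(0.78, 1.0))  # 30% more PRs submitted -> ~3.5 h, more than double
```

The nonlinearity is the point: a 30% bump in arrivals doesn't add 30% more waiting; as utilization approaches 1, queue time grows without bound.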

Conclusion

The recent METR research study shows that while AI can speed up several tasks developers do day to day, rework from fixing mistakes made by AI can consume more time than is saved elsewhere. When you also look at how effective pair programming and trunk-based development are at reducing lead time and rework, the sweet spot seems to be combining AI with pair programming.

So often when we're working alone, we lose track of time and think, "Let me try one more thing, it's going to work this time!" But we end up wasting hours. The same thing happens with AI coding assistants: "If I prompt it one more time, then everything will be fixed!" A human pairing partner can help us see the bigger picture and pull us out before we sink hours into a dead end.

If you want to use AI tools, experiment and use DORA metrics to see if they're really helping your developers. Listen to their frustrations and joys using AI tools. Adopt practices such as trunk-based development and pair programming. Use AI to help automate manual reviews that slow work and cause handoffs. Bottom line: don't just assume AI is going to be a miracle cure and fix all your problems. Experiment, but also pair this with good technical practices and observability.

Feel free to look at my simulation here on GitHub.
