Blaine Osepchuk

Posted on Oct 16, 2017 • Edited on Nov 8, 2017 • Originally published at smallbusinessprogramming.com

Optimal pull request size

#productivity #efficiency #codequality

Every team has an optimal pull request size, it's likely much smaller than you think, and making your pull requests your optimal size will improve the performance of your team. I'm going to convince you that these three statements are true by the end of this post.

What does the research say?

A large body of research has shown that code reviews are the most cost-effective way of finding defects in code¹. But most of the value of a code review comes from the first reviewer in the first hour².

How much code can my team review in an hour?

My team tracks how much time we spend on each code review. I compared those times to the number of lines changed in our pull requests to produce this scatter plot³.

The data is pretty messy but if we want a 90% chance of completing a code review in an hour, we probably need to cap our pull requests at around 200 lines of code.

Finding your optimal pull request size

This is really just an optimization problem. You have a cost associated with creating, reviewing, managing, and merging a pull request, which we can think of like a transaction cost. But you also have a cost associated with letting the pull request remain as work in progress, which we can think of like a holding cost. I'll define these costs in more detail in a minute. I just want to establish that you are trying to find the pull request size that minimizes the sum of your costs (the lowest point on the blue line below).

Pull request transaction costs

It takes time and effort to:

break stories into bite sized chunks
make the them independent or schedule them so they don't block each other
then actually make the changes required
create a testing plan
test the changes
create a pull request
review it
merge it
deploy it to production (we do continuous delivery)

If you did this for every line you changed, you would be at the far left on graph above. Not good. But making larger pull requests creates costs of its own.

Pull request holding costs

The larger the pull request, the:

longer it takes for the reviewer to find a big enough slot of time to address it (and the less motivated the reviewer will be to review it)
less chance it will pass code review on the first attempt and therefore require multiple reviews
longer it will take to review
more chance the review will exceed an hour (or whatever your personal limit is for effective code reviews) and become be ineffective
more chance it will grow larger still and need even more review
bigger the risk that the author made a bad assumption and will have to redo much more work
bigger the risk that the requirements will change before the code is released
more demoralizing it becomes to work on this code (multiple reviews of the same code sucks for everyone)
more it sucks to have to re-review and re-test the whole pull request if some little part of it fails
less certainty we have over when the code will be merged
more chance of a merge conflict
more chance it will block other stories in the sprint and derail the schedule
longer it will take to get the code into production and start delivering business value

If someone submitted a 10,000 line pull request in our project I can guarantee that all of these points would come into play. But our data and experience show that much smaller pull requests are still incurring huge holding costs.

Why would smaller pull requests by better?

Smaller batch sizes increase flow, according to Donald Reinertsen in his book: "The Principles of Product Development."

Specifically:

Reducing batch size reduces cycle time
Reducing cycle time reduces variability in flow
Reducing batch sizes accelerates feedback
Reducing batch sizes reduces risk
Reducing batch sizes reduces overhead
Large batch sizes reduce efficiency
Large batch sizes inherently lower motivation and urgency
Large batch sizes cause exponential cost and schedule growth
Large batches lead to even larger batches
The entire batch is limited by its worst element

Applying lean manufacturing concepts to pull requests

Some very bright people at Toyota invented the Toyota Production System, which includes single-piece flow. And they used this system to take over the auto industry. This is a similar idea but Reinertsen tweaked it for product development. The key to making this really work is to lower your transaction costs. If you become really efficient at processing pull requests, your overall cost curve will shift down. And your optimal batch size will shift to the left (get smaller). It's a virtuous cycle.

You want to get stories moving across your sprint board very rapidly. And you want to get new code into production as quickly as possible. You don't want pull requests stalled in review for days at a time. And you really don't want failed pull requests and all the rework that goes with them.

Still not convinced that the optimal pull request size is small?

Let's do some math. The more code in your pull request, the more chance that you've introduced a defect. I know software defect rates vary widely but let's use the often quoted 10-50 defects/kloc for released commercial software. But, we are talking about a code review--not released software--so let's assume that your defect rate is 50 defects/kloc. That's one defect every 20 lines of code on average⁴.

Good code reviews catch 60-90%¹ of the defects in the code so let's pick a middle value and say that you can catch 75% of them. That means you should find one defect for every 27 lines of code on average.

Smaller pull requests contain fewer defects

A 200 line pull request should contain 10 defects and you should expect to find 7.5 of them. That means it's extremely unlikely that it will pass code review on your first attempt. Now let's redo the math with a 50 line pull request. There should be 2.5 defects in this pull request and you should expect to find 1.9 of them. You've got a much better chance of passing code review on the first try with a smaller pull request.

When your big pull request fails code review, you have to fix the whole pull request, re-review the whole pull request, re-test the whole pull request, and hope that the nobody has introduced a new error along the way. When your small pull request fails code review, you still have to do all the same steps. But it will go much more quickly because you are dealing with much less code.

What's wrong with the way my team handles pull requests?

My team is struggling with our pull requests and our problems are all related to what I like to call "mega pull requests". Mega pull requests are pull requests that have gotten out of control and have become difficult to review and deploy. We recognize them by their large diffs, multitude of comments, multiple reviews, many days spent in review, and many hours spent trying to get them into the 'done' column.

How a mega pull request is born

Suppose my colleague creates a 200 line pull request. I see it in Jira and just from the title I know it's going to need my full attention. So I schedule it in a big block of time tomorrow and plan to tackle it all at once. I review it and find a couple of problems. I write my comments in BitBucket and bounce it back to the author. He's moved onto something else so it takes him until the next morning to review my comments, change the code, update the testing plan, retest the branch, and put it back in review.

We repeat this cycle once or twice while we deal with other stories, meetings, unplanned work, and other interruptions. By now, this pull request has been in review for the better part of a week and now it might have merge conflicts. If so, the author fixes them, retests, and sends it back to review. At some point I eventually get all the tests in the testing plan to pass, I approve the pull request, merge into master, and move on to the next story.

We probably have one pull request like this per sprint⁵. But on rare occasions, we've had pull requests bounce several times over several weeks. I'm not joking. Mega pull requests kill our productivity.

What's helped

We've tried all kinds of things to prevent the formation of mega pull requests but our ideas have mostly been ineffective. The only things that helped a little were:

automating our code styling and formatting with a commit hook so it's never an issue
doing a refactoring story in front of the main story
avoiding stray changes in the main story no matter how ugly the code - we just put it in the backlog and make a comment in the code with the Jira story ID

We think we've still got room to improve so we're interested in finding our optimal pull request size.

How do we find our optimal pull request size?

We conduct experiments. Our current policy is to only create a pull request that will take less than an hour to review. But we suspect our velocity will improve if we create smaller pull requests. So we'll try a 45 minute limit for two months, examine the data, set a new target, and repeat the experiment until we find our optimal pull request size.

Wrapping up

Every team has an optimal pull request size, it's likely much smaller than you think, and making your pull requests your optimal size will improve the performance of your team. You can find your optimal pull request size by doing experiments and measuring the change in your velocity.

I think I've made a pretty strong case for smaller pull requests. But I'd love to hear what works for your team. Have you experimented with the size of your pull requests? Have you found your optimal pull request size? How big are they and how has your velocity changed since you found it?

Fagan ME (1976) Design and code inspections to reduce errors in program development. IBM Syst J 15: 182-211. ↩
Cohen J (2010) Modern code review. Oram A, Wilson G, editors. Making software: what really works, and why we believe it. Sepastopol (California): O'Reilly. pp. 329-336. ↩
I removed spurious data like file renames, automatically generated files, and changes caused by running a formatting tool over the code. I also removed pull requests that didn't pass review. ↩
This is just a back-of-the-napkin math. You can use a wide range of defect and detection rates and still reach the conclusion that large code changes should contain detectable errors that cause your code review to fail. Please don't clobber me in the comments about the exact numbers I've used here. ↩
We have a 'zero known defects' policy so code goes back in forth in code review until we think it has zero defects and meets our other quality objectives. We're trying to improve the quality of our code base and repay our technical debt. This will change our optimal pull request size compared to a team that has a 'ship it and we'll fix it in maintenance' policy. ↩

Top comments (10)

Jason Huebel • May 30 '19

I just submitted by first PR to a project on GitHub. Since I'm a new contributor to the project, would this still apply? My PR fully implemented a new feature for an open sourced CRM.

I probably made some other mistakes in how I submitted the PR, though. For instance, the code formatting was a mess. Being new to the project, code readability was important so I could wrap my head around the author's preferred way of doing things. So I ran PHP CS Fixer against the app source with PSR2/Symfony formatting. That changed the formatting in a large number of files, which probably wasn't the smartest idea. While the code to implement the new feature probably only amounted to about 400 - 500 lines, the PSR2 formatting ballooned that to 17K lines changed (like I said, the code was a mess).

What could I have done differently?

Blaine Osepchuk • May 31 '19

Hey Jason, I recommend you follow the instruction/rules of the project. If the project doesn't have any rules, you might email the maintainer and ask what they prefer before you spend your valuable free time writing a PR that's going to get rejected after a 30 second review, which is probably what's going to happen to your PR.

In the absence of specific advice from the maintainer, I would follow my own advice in this post. PRs need to be easy to review. They need to be small enough that your brain doesn't give it before you understand the changes and make sure it's okay.

Since I wrote this post we've lowered the time limit for reviews to 40 minutes. My team rejects PRs when we hit the 40 minute mark--"too long" is the feedback the author gets. But that's our limit. The sweet spot seems to be around 20 minutes.

So how would we handle your situation? Chained PRs.

The first PR is just formatting the whole project in one commit with the message: "ran the project through the format -no manual code changes."

Then we'd branch off of that and create one or more stories to refactor the code where we want to make our change and/or add unit tests. Refactoring means zero behavior changes--if the behavior changes you're not refactoring. We call them refactoring PRs and we keep them small and tightly focused.

Once we've got a nice clean place for our new code we branch off of the last refactoring branch and start creating the classes we'll need for our new functionality. That's often one class plus its unit tests per PR. The new classes aren't called by any existing code yet but they give the reviewer a chance to think about them, look at the tests, and see if there are any errors or missing test cases. We keep creating classes with unit tests and joining them with the other classes we've created in small steps until we've built everything we need for the feature.

Then the final PR joins the new feature to the existing system and activates it.

What else? The author usually discusses his design/code/PR plan with the reviewer at the beginning of the process. They agree on the design and the approach before a single branch is created. The reviewer is also reviewing the PRs as soon as possible after they are created so if things are going off the rails the author can get feedback from the reviewer before he wastes a bunch of time.

Note: this is what works for my team. We are a permanent team, we work together all time, and we value low defect, highly maintainable code. This approach might not work with the maintainer of an open source project you've never met or someone with other priorities for their code base and time.

Jason Huebel • May 31 '19

Pointing to your last comment about "someone with other priorities", this seems to be the case with the project I'm contributing to. It hasn't had any development activity for a couple of years.

Truthfully, if the original author never merges my PR it won't bother me at all. But I felt obligated (regardless of the permissive MIT license) to contribute back to a project whose code I was basing my changes on.

If it turns out that the original author isn't very responsive, I'll probably just keep moving along with my fork since the changes I'm making are beneficial to the company I work for.

Blaine Osepchuk • May 31 '19

That sounds like a very reasonable approach.

Cheers, Jason.

Leonardo Holanda • Nov 9 '23

Hi @bosepchuk! I have dealt with this problem before as an intern and decided to build a VSCode extension to help developers control their PR size. I would like to ask what do you think about it! Do you think it can play a role as a solution to this problem?

Here's the link: marketplace.visualstudio.com/items...

Blaine Osepchuk • Dec 23 '23 • Edited

Hi Leonardo,

Sorry I missed your question. I've been away from the platform for while.

I looked at your extension and I'm not sure how helpful I would find it.

But I have pretty well-developed eye for limiting PR size and I'm thinking about it when I'm making stories (and breaking planned changes into reviewable chunks), not after I've already started making code changes.

BUT there's no harm in making or using such a tool and I applaud you for releasing a tool that might help others do better work. It might be a really great for less experienced devs.

Cheers.