Nicolas Mattia

Posted on Mar 7, 2019

Hunt bugs down before they are merged!

#productivity #testing #agile #coding

I discuss a few reasons why catching bugs in master is more expensive than
before they are merged, try to explain why some people think this isn't true,
and talk a bit about merge queues.

Hunt bugs down before they are merged!

Here's something I heard a few times from recalcitrant coworkers after I've
asked them to write more tests in one of my signature uptight PR reviews:

The cost of fixing a bug after it has been merged into the master branch
is the same as that of fixing it before it has been merged into the
master branch.

I beg to differ! Catching a bug once it has landed in the main codebase has a
huge associated cost. It boils down to the following reasons, which I'll
describe individually in the next few sections:

When someone hits a bug in the main code base, they were most likely not looking for one, but trying to get work done.
Whomever is tasked with fixing the bug may have lost context relevant to the faulty code, or maybe they never had any to start with.
Reporting a bug creates noise in the form of tickets and pull requests (PRs).

Let's jump right in!

story time

You are a frontend engineer for a company that stores online notes. You're
tasked with writing an autocomplete function allowing the user to find notes
more quickly:


+-----------------------+  +------+
| Gr|                   |  | open |
+-----------------------+  +------+
|                       |
| Great minds.txt       |
|                       |
| Grocery list          |
|                       |
| Grrr! said the lion   |
|                       |
| ...                   |
+-----------------------+

The following HTTP call returns the list of all of a user's notes:

GET /users/<user>/notes?query=<prefix>

It gets in the way

story time

You've been doing amazing work. You got the flow. Mind like water. You're done
with your frontend changes. You spin up a local instance of the server.

You try it out and realize that the server segfaults when there are no matches.

You despair. There goes your flow. The name of the backend engineers flash
before your eyes. You consider not inviting them to your wedding. You make a
mental list of who's most likely to have introduced the bug. You consider
quitting your job and becoming a barista in a different country -- maybe even
in New Orleans. You'll need a specialty, maybe peanut butter cookies. You'll
kill it with your peanut butter cookies. People will know about them from New
England to New Mexico. You'll become famous and, in a few years, run for
president of the United States. Then you consider the logistics: your fiancée
will need to quit her job, you'll need to hire a moving service, your dog may
not survive the flight.

You stash your local changes and set out to write a test case reproducing the
issue.

Unless you're a quality analyst, you probably don't go around trying to find
bugs for fun. Most of us discover bugs while trying to get something done:
maybe you're simply using the server, or you're writing a new feature that
depends on a different feature or function. Discovering wrong behavior in a
codebase is almost always more of an annoyance than a nice surprise.

Had the bug been caught before merge, the bug reporter would most likely have
been the person who introduced the faulty behavior, and would have most likely
been expecting to encounter "some" bug (correct software on the first attempt
is a myth, sorry).

It creates noise

story time

you> @channel I'm experiencing server crashes when there are no matches on
/users/foo/notes?query=bar, anybody knows anything about this?

tom> please use @here instead of @channel

pat> @tom please don't use "at" channel

jen> @you yes I think this is related to #2551

mol> @jen #2551 was fixed last week

jen> okay nevermind, no idea @you :(

mkt> @channel lunch?

Whether you are reporting issues on Slack, GitHub, JIRA, you name it, reporting
a bug creates some noise. Some people will need to label the ticket, will try
to help you figure out the cause, others may simply get distracted because of
the extra Slack/GitHub/JIRA notifications.

All this can be avoided if bugs are caught before they are merged to master:
bugs are typically not reported before the code is part of the mainstream
codebase.

### The Hunt

This one should be pretty straightforward. Finding a bug that's been introduced by a diff before merge into the master branch limits the search space to that diff only. Trying to find a bug on master means potentially having to consider the whole codebase, unless your codebase lends itself to things like bisecting.

Lack of context

story time

The VP of engineering comes to you, in his typical nonchalant tread:

VP: You'll need to fix that server crash bug.

You: But!

VP: I know it's not your area of expertise, but we need this fixed before
release.

You: But!

VP: See the bright side: you'll get to learn some Rust!

You: But... wait what?

VP: Yeah Andy rewrote the backend in rust last week. He's on holiday now
though.

You: ...

VP: Good luck! Don't forget about Engineering breakfast tomorrow.

As the VP leaves, you sing to yourself:

... there's a moooooon over bourbon street ...

Most of the time no one knows where the bug is and who introduced it. This
means that the odds of picking the right engineer to hunt down a bug are about
the same as trying to guess who in your team got their tonsils removed last --
if anybody actually got their tonsils removed.

The person tasked with fixing the bug then is not particularly likely to know
about the part of the codebase that is at fault. Even if the person who
introduced the bug is working on hunting it and fixing it, that person might
not have worked on that part of the codebase for a while. The context is either
absent or has been lost with time, which make debugging longer and more
frustrating.

This can be avoided by catching the bugs right after they've been written, and
before they've been merged.

Moar noise

story time

you> @channel I've got a fix for the server crash issue, anybody care to
review #2578?

tom> please use @here instead of @channel

pat> @tom please don't use "at" channel

jen> @you yes I'll have a look in a sec

mol> @you sure np

mkt> @channel coffee break?

Fixed the issue? A new round of PRs and reviews creates even more noise. At best this justifies hiring the intern whose job it is to move JIRA tickets around. Most likely it's a waste of time for everyone involved.

Why the fallacy?

So why do some people tend to think that it's more efficient to merge new
code and do damage control later (a.k.a shoot-from-the-hip coding)? Well, in
the short term, it may save time for the implementer (let's call a spade a
spade: the culprit) by:

Not writing tests or coming up with enough test cases.
Potentially merging without review.

These may save time to one person in the short term by allowing them to merge
faster and move on to other things; in the greater scheme of things however
time and energy is wasted.

Another reason why people tend to believe that fixing a bug in master isn't
necessarily worse than catching it before it is merged is that people often
don't realize that there's a cost associated with reporting bugs. And this cost
grows linearily with the number of people subscribed to the bug tracker,
involved in triage, and QA.

If you like your team, please make sure you catch your bugs before others find
them themselves!

Appendix: merge queues for greater good

Bugs are not the only way to break master, race conditions also apply to
software process. They may happen when a CI system only runs tests on a branch,
without rebasing on master first (note: some CI systems, like Travis and
CircleCI, will test both your branch as it is and after having rebased it
on or merged it into master). The following diagram lists three branches,
master, bob and alice:


bob master alice
 .    ✓      .   master is green :)
      |
 .    +------+   Alice uses `makeRainbow()`
      |      |
 .    |      ✓   branch alice is green :)
      |      |
 +----+      |   Bob removes `makeRainbow()`
 |    |      |
 ✓    |      |   branch `bob` is green :)
 |    |      |
 +---->      |   branch "bob" is merged
      |      |
      ✓      |   master is green :)
      |      |
      <------+   branch "alice" is merged
      |
      x          master is red :(
      |
      v

Alice forks a branch and used the function makeRainbow(). No bug introduced,
CI is green, everyone's happy. Around the same time Bob decides to do some
clean up, forks master, and removes the function makeRainbow(). From his
point of view, no one is using it, CI is green, everyone's happy.

Regardless of who merges first -- Alice or Bob -- the end result can't be good:
Alice's feature makes use of a function that doesn't exist in the codebase
anymore after Bob's changes. The solution to avoid those issues is to always
rebase a branch before running CI tests:


bob master alice
 .    ✓      .   master is green :)
      |
 .    +------+   Alice uses `makeRainbow()`
      |      |
 .    |      ✓   branch "alice" is green :)
      |      |
 +----+      |   Bob removes `makeRainbow()`
 |    |      |
 <----+      |   branch "bob" is rebased on master
 |    |      |
 ✓    |      |   branch "bob" is green and ready for merge :)
 |    |      |
 +---->      |   branch "bob" is merged
      |      |
      |      ?   merge is prevented on branch alice :|
      |      |
      +------>   branch "alice" is rebased on master
      |      |
      |      x   branch "alice" is red :(
      |      |
      |      |   Alice writes a new implementation of `makeRainbow()`
      |      |
      |      ✓   branch "alice" is green and ready for merge :)
      |      |
      <------+   branch "alice" is merged
      |
      ✓          master is green :)
      |
      v

The only thing you have to do is to ensure that no branch is merged without
being strictly on top of master. The simplest solution is to manually keep
track of the branches that should be merged:

Whenever someone wants to merge, their branch is enqueued.
Exactly one person pops branches from the queue, rebases them, and merges them if CI is happy.

This is a very tedious process and is better automatized by tools like bors-ng.

The Poor Man's Merge Queue

Merge queues need not be fancy. The first time we realized that we were wasting
a lot of time fixing rebase issues, my team and I came up with a very simple
solution: keep the PR numbers in a Slack channel's "topic" field. Got a PR you
want to merge? Add it to the right of the list. The left-most PR has either
been merged or failed CI? Remove it from the list, and rebase the new left-most
PR.

Top comments (4)

Dan • Mar 8 '19

I appreciate this article more than many others I’ve read because it comes from experience. I’m just trying to learn git and haven’t really been very involved with it yet.

In your scenario everything is happening with the master branch. New features are merged there and the CI is connected to it. So I get the point of this that bugs should be taken care of before being deployed. I like how you illustrated that. Very easy to understand. I also agree that taking care of bugs before merging is very important and means less mess later.

I do have a question that hopefully you can comment on...

If one goal is to avoid bugs in deployment, then why not have a separate branch for development?

Again I have little experience but I believe something like this:

branch master - deployment version connected to CI
branch develop - features get merged into this branch; testing is done before a release is merged into master; all new features are branched from here

I'm just genuinely curious because I'm trying to learn and wonder what reasons a workplace may have in choosing a certain workflow.

Nicolas Mattia • Mar 8 '19

Hey Dan!

I’m just trying to learn git and haven’t really been very involved with it yet.

Nice, you're in for a treat! It's a bit confusing at first. After a while you start realizing how amazing it is.

If one goal is to avoid bugs in deployment, then why not have a separate branch for development?

If you have a branch for development that is not connected to CI, you might run into a lot of issues: it means that people can basically commit anything to that branch, without ensuring that the tests are run!

When you decide to "release" that branch by merging it into the master branch, you'll run the tests, and some might fail. If a test fails there you face the "Hunt" problem I mentioned above: any one of the commits that you added to the development branch -- since the last release to master -- may be the culprit.

Another somewhat related workflow is to have a development branch where tests are run and fork a new branch for each "release". You might add a few features to the development branch, running the tests every time, and (say) once a month fork a branch. This new branch might be release-april-2019.

Quite a few large projects operate this way; those release- branches are basically feature freeze, and no new bugs are introduced. It becomes a bit tricky when bugs are found: you need to first fix them in the develop branch and then "backport" them to the release branches. If the develop branch and the release branch have diverged too much, you may have to implement the fix twice.

Hope that answers your question!

Dan • Mar 9 '19

Thanks Nicolas, that does answer my question and of course leaves me with more to think about. You are right that git is a bit confusing at first. I understand how it works a lot more now - but the one thing I don't find online much is anything written about weird scenarios that happen when using git with a team. Mostly I have found tutorials and such - which are good to an extent but my mind naturally wonders... "what if this happens...?" Your article has helped to answer some of those questions. Good stuff. Thanks again!

Ben Dowen • Mar 14 '19

And even more expensive if it makes it to production! and Even cheap if you find it during "Design" :)