I'm sure many of us have been there. A bug has appeared and you're sure it was working a couple of weeks ago. But it's a big project, with hundreds of commits and thousands of changes since then. Luckily Git has some tools to help track down exactly when and why that breaking change was introduced.
The simpler tool is git blame. It is useful when you know which code is causing the problem and only need to find the commit that explains why the change was made. For example, say the code in boggles.lang looks like this:
refresh_button.enable()
frequency = compute_frequency(next_time)
baffler.aeronate(frequency)
and the problem is that refresh_button is disabled.
Now, perhaps you're familiar with the code. Maybe you even wrote most of this functionality. You go "Of course! Aeronating the baffler starts the norigator, and the refresh button is disabled whenever the norigator is running." So you know that the problem is with the line
baffler.aeronate(frequency)
Of course, you can't just go deleting this line. Someone added it for a specific purpose. This is where git blame comes in. You start with
git blame boggles.lang
and browse to the line
baffler.aeronate(frequency)
It will show up as something like
30376ff1 (Tomasz 2018-10-23 14:56:11 +0100 65) baffler.aeronate(frequency)
This line has the following pieces of information, in order:
- The commit identifier
- The author of the commit
- The date and time of the commit
- The line number
- The contents of the line itself
What this tells you is the last commit that touched this line. That doesn't mean that it was the commit that introduced the problem. To check the commit, you would run
git show 30376ff1
which might tell you something like
commit 30376ff158bd8713fbf3308efcc6a46d0b9cdf57
Author: Tomasz <email@example.com>
Date:   Tue Oct 23 14:56:11 2018 +0100

    Clean up trailing whitespace
This is not very helpful...
Whitespace changes are practically never what I'm looking for when tracking down a regression, which is why I use the -w option of git blame to ignore whitespace changes:
git blame -w boggles.lang
Now the output might look like this
3f056217 (Lucia 2018-10-19 10:22:41 +0100 65) baffler.aeronate(frequency)
Again going in with
git show 3f056217
and scrolling to the appropriate line in the diff might now reveal something like this
 refresh_button.enable()
-fq = compute_frequency(next_time)
-baffler.aeronate(fq)
+frequency = compute_frequency(next_time)
+baffler.aeronate(frequency)
This diff shows only that the local variable fq was renamed to frequency; the issue-causing aeronation is still present. We must dig deeper.
git blame has a nice feature where it can begin looking from any specific commit. We now know that the issue we're tracking was introduced before commit 3f056217, so we can run
git blame -w 3f056217^ boggles.lang
(The ^ is important; it tells Git to start the search from the parent of 3f056217. You can also use ~ instead of ^ to get the parent; the difference between the two does not matter in this case.)
This sequence is already familiar. Scroll to the appropriate line in the output:
7e3d2394 (Peter 2018-10-17 14:02:01 +0100 65) baffler.aeronate(fq)
Check the commit:
git show 7e3d2394
Scroll down to the code in the diff:
 refresh_button.enable()
+fq = compute_frequency(next_time)
+baffler.aeronate(fq)
And success! Now we have found the commit that introduced the problematic aeronation.
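To try the blame steps above without touching a real project, here is a minimal sketch that rebuilds the scenario in a throwaway repository. The demo identity, the commit layout, and the blame_line3 helper are assumptions made for illustration; only the git blame options themselves (-w, -L, --porcelain, and the revision argument) are standard.

```shell
set -eu
# Throwaway repository; identity and commit contents are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Commit 1: only the refresh button line exists.
printf 'refresh_button.enable()\n' > boggles.lang
git add boggles.lang
git commit -q -m 'Add refresh button'

# Commit 2: the aeronation is introduced (the commit we want to find).
printf 'refresh_button.enable()\nfq = compute_frequency(next_time)\nbaffler.aeronate(fq)\n' > boggles.lang
git commit -q -am 'Make sure baffler is aeronated'
intro=$(git rev-parse HEAD)

# Commit 3: rename fq to frequency, leaving trailing spaces on line 3.
printf 'refresh_button.enable()\nfrequency = compute_frequency(next_time)\nbaffler.aeronate(frequency)  \n' > boggles.lang
git commit -q -am 'Rename fq to frequency'
rename=$(git rev-parse HEAD)

# Commit 4: whitespace-only cleanup of line 3.
printf 'refresh_button.enable()\nfrequency = compute_frequency(next_time)\nbaffler.aeronate(frequency)\n' > boggles.lang
git commit -q -am 'Clean up trailing whitespace'
cleanup=$(git rev-parse HEAD)

# Hypothetical helper: print the full hash blamed for line 3.
# Extra arguments (options and an optional start revision) are passed through.
blame_line3() {
  git blame --porcelain -L 3,3 "$@" -- boggles.lang | sed -n '1s/ .*//p'
}

echo "plain blame:        $(blame_line3)"
echo "ignoring whitespace: $(blame_line3 -w)"
echo "from rename parent:  $(blame_line3 -w "$rename^")"
```

Plain blame lands on the whitespace cleanup, -w lands on the rename, and starting from the parent of the rename lands on the commit that introduced the call.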
So what next? Now I would read the commit message of 7e3d2394, and potentially the whole commit diff, to determine what functionality the commit introduced, and try to figure out how to reconcile those needs with the refresh button's need to stay enabled. If I could not figure that out, I would likely have a chat with Peter, who introduced the aeronation, to see if together we could come up with something. A typical case might be that the aeronation is doing too much and does not in fact need to start the norigator in all cases. This would be handled by splitting the method into two and calling only the aeronation part here.
What if you can't tell where the problematic code is? You can tell that the code has changed since it was first written, but you have no idea how aeronating the baffler could have anything to do with the refresh button, so that's not where your mind goes first. And there have been many changes here recently, so it's not that there's only one change to consider.
Git itself can help you here. The git bisect command is made for exactly this situation. If you know a commit that is "good" and another later commit that is "bad", you just tell git bisect that, and it will let you look through commits between them until you zero in on the one that caused the change from good to bad.
The way I begin this process is by first running
git log --first-parent
and looking for a commit where the feature was still working. Of course I can't tell that by just reading the log, so I mostly rely on the commit timestamps and my memory of when things were definitely fine.
(The --first-parent option makes git log follow only the first parent of each merge, so it shows the commits made on the current branch and not the individual commits from other branches that have been merged. It's useful for this kind of case.)
Let's say I picked commit 7102f250 as a likely one where the feature was still working. Now I will do
git checkout 7102f250
and test this version. If I discover that the feature is not working, I will make a note that it was broken already on this version and go back in the logs until I find a working version.
After I have a working version, it's time to start bisecting:
git bisect start
git bisect bad master
git bisect good 7102f250
(If I discovered some earlier commit where the feature was already broken, I would place that commit's identifier in place of master on the git bisect bad master line.)
After executing all these statements, there will be output something like
Bisecting: 105 revisions left to test after this (roughly 7 steps)
[3f056217e73576fc6ba8f0cb8c277627fd154e43] <Imagine useful commit message here>
This means that git has determined there are 105 possible versions that could have introduced the bug. It estimates needing 7 tries to determine the actual version (the search is binary-search-like so it needs roughly the binary logarithm of the number of possibilities). And it has checked out a suitable version to try.
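The "binary logarithm" estimate can be checked with a few lines of shell. This is only a rough sketch of the arithmetic: the bisect_steps function is a made-up helper that counts how many halvings it takes to narrow n candidates down to one, and it is not claimed to be the exact formula Git uses for its estimate.

```shell
# Hypothetical helper: count the halvings needed to narrow n candidates to one.
bisect_steps() {
  n=$1
  steps=0
  while [ "$n" -gt 1 ]; do
    n=$(( (n + 1) / 2 ))   # keep the larger half, i.e. the worst case
    steps=$(( steps + 1 ))
  done
  echo "$steps"
}

bisect_steps 105   # prints 7
```

Doubling the number of candidate commits adds only one more step, which is why bisecting stays practical even across hundreds of commits.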
Now that git has checked out a version to test, you need to do exactly that. Test whether this now-checked-out version has the bug or not. If it does, tell git
git bisect bad
If it works correctly, tell git
git bisect good
And if for some reason you can't determine whether the bug is present or not, you can do
git bisect skip
For instance, if the version has another bug that prevents you from getting to where your bug of interest is, you would need to use skip.
Continuing with this process, testing each version that git checks out and marking it as good or bad, should eventually end up with something like
7e3d2394e73576fc6ba8f0cb8c277627fd154e43 is the first bad commit
commit 7e3d2394e73576fc6ba8f0cb8c277627fd154e43
Author: Peter <firstname.lastname@example.org>
Date:   Wed Oct 17 14:02:01 2018 +0100

    Make sure baffler is aeronated...
Now we have found the same commit that we found earlier with git blame. The bisect process is more time-consuming, requiring testing of each version, but it also requires less manual work, since git is taking care of checking out versions to test.
Finally, after finding where the problem was introduced, you should make a note of the commit identifier and run
git bisect reset
to stop the bisection process.
git bisect is not limited to finding regressions. It is a general tool for locating any change that happened because of a single commit. The "testing" phase doesn't need to involve running the program if the good/bad distinction can be made by inspecting the code. Also, when not specifically finding regressions, you can use the aliases old/new instead of good/bad, so that you don't get confused by the labels if, say, you're instead tracking down when a specific bug was fixed.
git blame is useful for finding when a specific line was added. But it cannot be used to find when a specific line was deleted. It occasionally happens that I notice a crucial line of code is no longer present, so I want to know when and why it was deleted. git bisect is a good tool for this: a version containing the line is good and a version not containing it is bad. With this, bisecting will find the commit that deleted the line.
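Since "does the file still contain the line" is something grep can answer, this particular search can be automated. The sketch below is one way to do it under stated assumptions: the throwaway repository, the demo identity, and the crucial_call() line are invented for illustration. It relies on git bisect run treating exit status 0 as good and 1 as bad, which matches what grep -q returns for "found" and "not found".

```shell
set -eu
# Throwaway repository; file contents and identity are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Ten commits; the crucial line disappears in commit 7.
i=1
while [ "$i" -le 10 ]; do
  if [ "$i" -lt 7 ]; then
    printf 'setup()\ncrucial_call()\nrevision %s\n' "$i" > boggles.lang
  else
    printf 'setup()\nrevision %s\n' "$i" > boggles.lang
  fi
  git add boggles.lang
  git commit -q -m "Change $i"
  if [ "$i" -eq 1 ]; then first=$(git rev-parse HEAD); fi
  if [ "$i" -eq 7 ]; then deleter=$(git rev-parse HEAD); fi
  i=$((i + 1))
done

# Bad = current HEAD (line missing), good = first commit (line present).
git bisect start HEAD "$first" > /dev/null
# grep exits 0 when the line is present (good) and 1 when it is absent (bad).
git bisect run grep -qF 'crucial_call()' boggles.lang > /dev/null

# When bisecting finishes, refs/bisect/bad points at the first bad commit.
found=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "line was deleted in: $found"
```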
There are even more features. You can record the bisect session with git bisect log and get to the same state later with git bisect replay. Editing the log in between lets you change your good/bad decisions, for instance if you made a mistake. It is also possible to use a script for the testing part with git bisect run, which makes the whole bisection process completely automatic. I rarely find a use for this, since for me the testing is typically unique to the issue, and trying to script it would often be more work. But in some cases, e.g., failing automated tests, it can be useful.
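As a rough illustration of the scripted variant, the sketch below bisects a throwaway repository with a small check script. Everything about the repository is an assumption made for the demo (the buildable and result.txt files stand in for "does it build" and "does the feature work"); the part that comes from Git itself is the exit-status convention of git bisect run: 0 means good, 125 means skip, and other values from 1 to 127 mean bad.

```shell
set -eu
# Throwaway workspace; the check script lives outside the repository.
work=$(mktemp -d)
repo="$work/repo"
mkdir "$repo"
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Eight commits: commit 3 cannot be "built", and the bug appears in commit 6.
i=1
while [ "$i" -le 8 ]; do
  if [ "$i" -eq 3 ]; then echo no > buildable; else echo yes > buildable; fi
  if [ "$i" -ge 6 ]; then echo broken > result.txt; else echo ok > result.txt; fi
  echo "$i" > revision.txt
  git add .
  git commit -q -m "Change $i"
  if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
  if [ "$i" -eq 6 ]; then culprit=$(git rev-parse HEAD); fi
  i=$((i + 1))
done

# Hypothetical test script standing in for a real build-and-test step.
cat > "$work/check.sh" <<'EOF'
#!/bin/sh
# Untestable commit: ask git bisect to skip it.
if grep -qx no buildable; then exit 125; fi
# Exit 0 (good) if the feature works, 1 (bad) otherwise.
grep -qx ok result.txt
EOF

git bisect start HEAD "$good" > /dev/null
git bisect run sh "$work/check.sh" > /dev/null
found=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1
echo "first bad commit: $found"
```

Whether the unbuildable commit is actually visited depends on which midpoints bisect picks; the exit 125 branch only matters if it is.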
It's never as simple as in a constructed example specifically made to demonstrate how bisect works. In one of my projects, the result was often very unstable and didn't seem to find the actual problematic commit. It turned out that the compilation process didn't properly recompile everything that was needed after bisect checked out another commit. That's when I learned to run git clean -dxf between bisect steps to make sure I had a clean tree. Usually it's enough to do a regular clean build and installation, though.
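Because git clean -dxf deletes ignored files as well as untracked ones, it is worth seeing what it removes before relying on it. The following is a small sketch in a throwaway repository; the file names are made up for the demo, and the dry run with -n is an extra precaution rather than part of the workflow described above.

```shell
set -eu
# Throwaway repository with an ignored build directory.
repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
printf 'build/\n' > .gitignore
printf 'source\n' > main.src
git add .gitignore main.src
git commit -q -m 'Initial commit'

# Simulate stale build output (ignored) and a stray untracked file.
mkdir build
printf 'stale\n' > build/main.o
printf 'scratch\n' > notes.tmp

git clean -dxn   # dry run: only lists what would be removed
git clean -dxf   # -d: untracked directories, -x: ignored files too, -f: really delete
```

Tracked files are left alone, but anything untracked or ignored is gone afterwards, including local configuration files that happen to be ignored.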
Other times, it's happened that I haven't been able to find a working version, no matter how far back I go. That's usually caused by a toolchain update on my computer, or sometimes a dependency update that doesn't get rolled back when checking out earlier code. This can cause serious self-doubt when you first encounter it, but after a few times, you start recognizing it. It helps immensely if the dependency management system uses lock files, or if you specify precise versions for the dependencies (some teams ensure this by also committing the dependencies themselves), so that the historical system behavior remains reproducible.
git bisect skip is there when some commit has other problems that make it not testable. But if you end up skipping many commits, git bisect may in the end just tell you "The problem is in one of these 10 commits". One common situation where this can happen is when a feature branch has been rebased and the main branch has changed something that new functionality on the feature branch depended on, making all the older commits on the feature branch unusable.
But after the obvious potential problems are dealt with, I find I can trust git bisect fully. One case that has stuck with me was when bisect pointed to a commit that seemed to have nothing to do with the bug. I did rerun the bisection but got the same result. After accepting that, yes, this was the commit that introduced the bug, I dug deeper and discovered that the change had in fact exposed a race condition that already existed in the code. This might have blown up in our face at any time. Being able to focus on just a small change discovered with bisect greatly reduced the time to fix that bug, and perhaps we never would have figured it out without that focus.