César H. Vortmann

Posted on May 13, 2019 • Edited on May 19, 2019

What makes a good commit message?

#git

What makes a good commit message?

Communication is arguably the most important of the five Extreme Programming (XP) values.

With the rise of distributed teams it is extremely important to maintain good habits of communication in all forms. Given the differences in timezones, written communication becomes very important. Commit messages are important means of communication between team members and for the lifecycle of the teams and projects since they include the context on which they were created. By inspecting the project history we can find out why some decisions were made when they were made.

During the research to write this post I've stumbled over mostly 2 types of messages:

Countless uninformative commit messages
Messages written in a format that blurred away what was important about the change by pointing outwards of the repository. Let's call them look-elsewhere messages.

I'm going to talk about each one of these types, how we can improve the message to convey more meaning and information and, finally, give some examples of good commit messages.

What matters most in team software development is communication. (Kent Beck, Extreme Programming Explained: Embrace Change)

Uninformative messages

Uninformative messages are those that do not convey any information when you read them. For example:

add cli new
fixes
fix code review comments
no message
description
wip
hackz

They are specially bad on pressure situations where we could leverage the Git history to help us discover information about a problem.

Imagine the following situation: You've just arrived at the office and there's a major bug and your company is losing money. Since it's doesn't seem related to parts of the codebase you're comfortable working with you start investigating by using git log --grep and some keywords. You find nothing. Then you use git log -- file/path in a file you think it may be related to the problem. You find a commit that look suspicious, maybe the commit introduces the error message you're seeing. Unfortunately, again, the commit message does not tell you why the code is that way. You then look the author's name in the hope you can ask for that person's help. No luck, that person is long gone from the company.

Wouldn't be great if the commit messages helped you? None of this is possible with these messages. As I've said, they do not convey any information to the reader. We should avoid them.

Look-elsewhere messages

In this category are messages that include Github issue numbers or Jira identifications in their description. They tend to not include any more information. For example:

Fix issue #123
[JIRA-1234] Fix a problem with the route system when...

These messages have 2 problems:

Favour information that is outside of codebase
Identifications take valuable space from the commit message

Favouring information that is outside the codebase

It's important to information to live the closest to the codebase as possible since the codebase is the single source of truth. Jira or Github or any separated tool from Git are secondary tools. In case any of them burn in a fire or we decide to switch providers, we'll still have access to the Git repository. If all information lives somewhere else that's not the Git repository we would be in a very bad situation. Thus, we should strive to write as much information in the commit messages as we can. Those extra minutes of thought to explain to whoever comes next why the changes were made (not the what, since git show will tell you that) will be worth.

Identifications take valuable space from the commit message

A Jira ticket identification with 10 characters take 20% of the suggested 50 characters of the subject line. Interesting though is that our eyes seem to naturally skip that part of the commit message. That happens because normally we're looking for something in the Git history that we still don't know what ticket it relates to. Similarly, if you go to a Github project searching for an error messages and all commit messages start with a bug tracking number you'll soon notice that your eyes skip those parts. Your eyes are telling you that they are not the most important part of the information you're looking for.

We end up, then, with way less space to write about what was really important. When looking at the Git history tree we want to know how the code evolved, why the changes were made the way they were. We're, hardly, looking for an identification since we do not yet know that JIRA-1234 is related to cached roles while investigating a cached role problem.

That's not to say that Jira identifications are not useful but that they shouldn't take space away from most important information. Simply, they should not come first but should, rather, appear elsewhere. A suggestion is to write them as the last part of the message, after the body of the message was written. For that, we could use --- as a separator and have a numbered or bulleted list of useful information. We can also use [1] tags throughout the message similarly to how references are written in scientific papers.

With those suggestion we can still glimpse information when doing git log. And we can also find commits with git log --oneline --grep=JIRA-1234 if we are actually looking for commits related to JIRA-1234. This way all the information stays in the Git repository and then point outward to other helpful tools.

Good commit messages

Following are examples of what I consider good commit messages given the arguments I've made and guidelines put forward by other people (check the links section). They were taken from Open Source projects (with some adaptations). I've ommited authors and project names to focus in the messages themselves.

Make the cached role coherrent with the actual one

When observing the leader running a master role, set the cached role stored in the state_handler to master as well. Failure to do so resulted in the manually promoted node to continue running with a cached 'replica' role. This led to the failure to create replication slots for the new replicas.

We could do it conditionally, but both reading and writing the role require the same lock, and the unconditional approach makes the unit tests simpler.

Fix rewind behavior when paused

* Check if we can rewind when deciding to call Postgresql.follow during pause.

Without this the following sequence of events occurs:

1. Manual failover from node, demotion sets need_rewind flag. Rewind is not done because can_rewind is False.
2. Cluster is put into paused mode. Rewind is not attempted because recovery.conf matches.
3. PostgreSQL goes down (pg_ctl stop).
4. Because need_rewind is set PostgreSQL is restarted.
5. Sysadmin is unhappy because Patroni is doing stuff behind his back during pause.

* Allow superuser user to be missing from config for rewind.

In other places a missing superuser authentication section is allowed and we default to libpq's use default OS user with no password. This makes rewind work with a missing superuser configuration.

---

Issue: [#365](https://link/to/issue/365)

Make Source.indexOf(ByteString) significantly faster

Previously the algorithm that did this was extremely inefficient, and had worst case runtime of O(N * S * S) for N is size of the bytestring and S is the number of segments.

The new code runs in O(N * S). It accomplishes this by not starting each search at the first segment, which could occur many times when called by RealBufferedSource.

Convert specs to RSpec 3.1.7 syntax with Transpec

This conversion is done by Transpec 2.3.8 with the following command:

    transpec

* 46 conversions
    from: it { should ... }
      to: it { is_expected.to ... }

* 7 conversions
    from: it { should_not ... }
      to: it { is_expected.not_to ... }

* 2 conversions
    from: it { should have(n).items }
      to: it 'has n dependencies' do expect(subject.size).to eq(n) end

* 2 conversions
    from: obj.should
      to: expect(obj).to

* 2 conversions
    from: obj.stub(:message)
      to: allow(obj).to receive(:message)

* 1 conversion
    from: == expected
      to: eq(expected)

* 1 conversion
    from: it { should have(n).items }
      to: it 'has n feature' do expect(subject.size).to eq(n) end

* 1 conversion
    from: it { should have(n).items }
      to: it 'has n features' do expect(subject.size).to eq(n) end

* 1 conversion
    from: it { should have(n).items }
      to: it 'has n files' do expect(subject.size).to eq(n) end

For more details: https://github.com/yujinakayama/transpec#supported-conversions

Formatting

Beside all the previous points is always nice to read something that is well written. We feel better when someone took care with proper formatting and spacing. In that way we can think of a commit message as an Email with subject and text. In fact they are used for that when creating and sending patches by Email. Or even a letter, since that's still used heavily in some parts of the world (🇩🇪). So, remember:

Start your sentences with an uppercase letter
Don't finish the subject line with a dot

Messages generated by Git

It's also recommended that we don't edit the generated commit messages because it's misleading to Git users. When we see a commit messages that starts with Revert we immediately think that it was generated by calling git revert. That same suggestion goes to Merge messages or any other message generated by Git.

Links

Modified and updated from original

DEV Community

What makes a good commit message?

What makes a good commit message?

Uninformative messages

Look-elsewhere messages

Favouring information that is outside the codebase

Identifications take valuable space from the commit message

Good commit messages

Formatting

Messages generated by Git

Links