For roughly my first year of writing code, I used Git the way I take a multivitamin: I was told I should do it, but I didn’t really understand how it worked and only vaguely understood the benefits. Unsurprisingly, I made a ton of Git mistakes. Running ‘git rebase’ was terrifying. There were many times when I sat down to bang out some code and, hours later, I was still going line by line fixing merge conflicts, wondering what the hell had just happened.
But little by little, I started understanding Git. I even came to appreciate it. It is an amazing piece of technology: although it is highly sophisticated, it is useful and accessible even to beginner developers. There are lots of beginner guides to using Git out there, and lots of guides that define the key terms. In this post, I want to focus instead on giving you the intuition you need to use Git confidently.
Git really does just 2 things. It tracks changes to a set of code and it enables people to manage those changes. Based on that core functionality, Git offers 2 primary benefits: time travel and collaboration.
Over time, a codebase changes. New features are added. Bugs are fixed. Parts are rewritten (or “refactored” as developers like to call it). Ideally, all these changes are positive. But sometimes they are not. Let’s say you unintentionally introduced a bug into your code and now things are seriously broken. You have 2 options. You could write more code to fix it. Or you could just undo the broken code that you added; you could just go back in time to when your code didn’t have that bug. With Git that is as easy as running one command. This isn’t a hack. Huge companies use Git to undo changes to products that millions of people use.
Changes happen over time and they also happen between people working on the same codebase. Git enables you to track and manage changes that different people make to the same codebase. If you have even one collaborator on a project, this benefit becomes evident immediately.
I think definitions are important, so let’s look at the official definition from the Git homepage:
Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows.
There are clearly lots of advantages to Git. It’s free, open source, fast, efficient, and small, and it works with projects of any size. All great! But there are also some things in that definition that maybe weren’t so obvious. One of them may have been “SCM”.
I’ll admit, I had to Google it. The first result on Google didn’t seem right.
So I tried “what does SCM mean”, and that also did not work.
Finally, I tried “what does SCM mean git” and got what I was looking for: in this context, SCM stands for source code management.
No surprises there, hopefully. It should make sense by now that Git is a source code management tool. It turns out that there are other SCMs out there too. Git is actually fairly young: it was first released in 2005. SCM tools had existed for decades before Git came along; apparently they just weren’t that great. Linus Torvalds decided to make a better one, and that became Git. (By the way, Linus also created Linux and still maintains it. What a legend.)
Another key term from the definition is “distributed version control system”. The “distributed” part in particular is what really sets Git apart. In practice, one of the things it means is that each person working on a codebase keeps a full copy of the repository, including its complete history, on their own computer. A big benefit of this is that working on the code is much faster, because you don’t have to stay constantly in sync with a central server. There are a lot more benefits to a distributed version control system, and you can read about them here.
As I said before, there are tons of guides out there about the commands you need to run to use Git. This one is a good example. Here I’m instead going to focus on giving you an intuitive understanding of the concepts so that you can understand when to use those commands and what they are doing.
I think of Git like a courtroom stenographer. It sits there silently in the background just recording all the changes you make to your code. It only “speaks up” when you ask it for something. You can ask it to show you the changes or you can tell it to do something with those changes.
In a typical Git project there are two versions of the same repository: the remote repo and the local repo. The remote repo lives on a remote server (i.e., “the cloud”). These days, that remote server is probably GitHub, GitLab, or Bitbucket. Your local repo is a copy of the same code that is in the remote repo, but it is stored on your computer. When you’re working with Git, one of the main things you’ll be doing is keeping the remote repo in sync with your local repo and vice versa. I’ll elaborate. Let’s say you’re adding a new feature. You make changes to your local repo. Once you’re finished, you push those changes to the remote repo. Everyone else on the project can then pull these changes so that their local repos are up to date with the remote repo.
We organize human history by years. The Magna Carta was written in 1215. The US was founded in 1776. You are X years old.
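To make push and pull a little more concrete, here is a toy model in Python. It is only an illustration, not how Git actually transfers data: it assumes each repo is just an ordered list of commit messages and that histories only ever grow at the end. `ToyRepo`, `push`, and `pull` are hypothetical names invented for this sketch.

```python
# Toy model of push and pull. NOT how Git transfers data: here each repo is
# just a list of commit messages, and histories may only grow at the end.


class ToyRepo:
    """A hypothetical, simplified repository: an ordered list of commits."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])  # copy, so repos don't share a list

    def commit(self, message):
        self.commits.append(message)


def push(local, remote):
    """Copy commits the remote doesn't have yet (fast-forward only)."""
    # The remote's history must be a prefix of ours; otherwise we'd need to
    # pull and merge first.
    if local.commits[:len(remote.commits)] != remote.commits:
        raise ValueError("remote has commits you don't have; pull first")
    remote.commits.extend(local.commits[len(remote.commits):])


def pull(local, remote):
    """Copy commits we don't have yet from the remote (fast-forward only)."""
    if remote.commits[:len(local.commits)] != local.commits:
        raise ValueError("histories have diverged; a merge is needed")
    local.commits.extend(remote.commits[len(local.commits):])


remote = ToyRepo(["initial commit"])
yours = ToyRepo(remote.commits)    # you clone the project
theirs = ToyRepo(remote.commits)   # so does your teammate

yours.commit("add new feature")    # you work locally...
push(yours, remote)                # ...then push to the remote
pull(theirs, remote)               # your teammate pulls your work
print(theirs.commits)  # ['initial commit', 'add new feature']
```

Notice that both functions refuse to run when the two histories have diverged. That situation is exactly where merging comes in.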
We organize a codebase’s history by commit. This feature was added in “commit ABC”. That bug was introduced in “commit XYZ”. The latest stable version is “commit JKL”.
A commit is a snapshot of the state of the codebase at a particular moment. When you make changes and commit them, the new state of the codebase is captured in a new commit. That’s the cycle, over and over: commit + changes = new commit.
These commits form a timeline of the project. This timeline is called the “git history”.
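Here is a minimal sketch of that idea, assuming a toy representation in which a commit is a Python dict holding a full snapshot of the files, a message, and the ID of its parent commit. `make_commit` and `history` are hypothetical helpers. The ID is a SHA-1 hash of the commit’s contents, which is in the spirit of Git’s commit IDs, but Git’s real object format is different.

```python
import hashlib
import json

# Toy model: a commit is a full snapshot of the files, a message, and the ID
# of the commit that came before it (None for the very first commit).


def make_commit(files, message, parent=None):
    """Return (commit_id, commit) for a snapshot of `files` (name -> contents)."""
    commit = {"files": dict(files), "message": message, "parent": parent}
    # Hash the commit's contents to get an ID, in the spirit of Git's commit
    # IDs (Git hashes a different format; we also shorten it for display).
    payload = json.dumps(commit, sort_keys=True).encode()
    commit_id = hashlib.sha1(payload).hexdigest()[:7]
    return commit_id, commit


def history(commits, head):
    """Follow parent pointers from `head` back to the first commit."""
    log = []
    while head is not None:
        log.append(commits[head]["message"])
        head = commits[head]["parent"]
    return log


commits = {}
first_id, first = make_commit({"app.py": "print('hi')"}, "first commit")
commits[first_id] = first
second_id, second = make_commit({"app.py": "print('hello')"}, "change greeting",
                                parent=first_id)
commits[second_id] = second

print(history(commits, second_id))  # ['change greeting', 'first commit']
# "Time travel": the old snapshot is still there, untouched.
print(commits[first_id]["files"]["app.py"])  # print('hi')
```

Walking the parent pointers gives you the git history, and because old snapshots are never modified, going back in time is just a matter of looking up an older commit.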
At any point in time, any number of things could happen. You could go eat some food. You could call a friend. You could close your laptop. In other words, branching out from this point in time are many possible futures. Git branches work the same way: from any commit, you can branch off and start a separate line of changes, and you can have as many branches as you like.
When you’re working with other people, this branching feature becomes especially useful. Let’s say you want to add a new feature. You would get the latest version of the codebase. The convention is that the main version of a codebase is stored on a branch named “master” (many newer projects, and GitHub’s default for new repositories, use the name “main” instead). So to be more precise, you will get the latest commit on the master branch. You create a new branch off that commit and start implementing your feature. Once you’re done, you create a new commit and add that commit to the master branch.
What if, while you were working on that feature, your boss comes to you and asks you to fix a small bug? You can just create a new branch from the latest master commit, fix the bug, create a new commit, and add it to the master branch. Then you can go back to the branch you were building the new feature on and pick up where you left off.
The pattern is the same over and over. You create a branch off of the latest commit in the master branch, make changes, create a new commit, and then add that commit to the master branch.
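The pattern can be sketched with a toy model in which a branch is nothing more than a name pointing at a commit, and each commit remembers its parent. The single-letter commit IDs and all helper names are hypothetical. “Adding to master” is modeled here as a fast-forward, which simply moves master’s pointer forward and is only safe when master hasn’t moved on since the branch was created.

```python
# Toy model of branches: a branch is just a name pointing at a commit, and
# each commit records its parent. Commit IDs are single letters for brevity.

parents = {"A": None}       # commit ID -> parent commit ID
branches = {"master": "A"}  # branch name -> commit ID it points at


def create_branch(name, from_branch="master"):
    """Start a new branch at the commit that `from_branch` points at."""
    branches[name] = branches[from_branch]


def commit_on(branch, commit_id):
    """Record a new commit on top of `branch` and move the branch to it."""
    parents[commit_id] = branches[branch]
    branches[branch] = commit_id


def is_ancestor(maybe_ancestor, commit_id):
    """True if `maybe_ancestor` appears in the history leading to `commit_id`."""
    while commit_id is not None:
        if commit_id == maybe_ancestor:
            return True
        commit_id = parents[commit_id]
    return False


def fast_forward_master(branch):
    """Add a branch's work to master by moving master's pointer forward."""
    if not is_ancestor(branches["master"], branches[branch]):
        raise ValueError(f"master has moved on; {branch!r} needs a merge")
    branches["master"] = branches[branch]


create_branch("new-feature")      # branch off the latest master commit
commit_on("new-feature", "B")     # start implementing the feature
create_branch("bugfix")           # the boss asks for a quick fix
commit_on("bugfix", "C")
fast_forward_master("bugfix")     # master now points at C
commit_on("new-feature", "D")     # pick up the feature where you left off
print(branches)  # {'master': 'C', 'new-feature': 'D', 'bugfix': 'C'}
```

Calling `fast_forward_master("new-feature")` at this point raises an error, because master now points at the bugfix commit `C`, which isn’t in the feature branch’s history. Combining the two lines of work requires a merge.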
Now that you have the intuition, let me clarify 2 terms that you will come across a lot. The first is “merging”. When you add a commit from your branch to the master branch, what actually happens is that your branch is “merged” into the master branch. It’s like two roads merging into one: your branch flows into the master branch, just as one road flows into the other. The analogy is not perfect, though, because your branch doesn’t disappear automatically. What is really happening is that the changes on your branch are combined with the contents of the master branch. Your branch and its contents still exist until you delete them.
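As a rough illustration, here is a toy three-way merge over whole files. It assumes each snapshot is a dict of file name to contents and that we know `base`, the snapshot both branches started from. Real Git merges line by line and handles many more cases; `merge_snapshots` is a hypothetical helper for this sketch.

```python
# Toy three-way merge of whole files. A snapshot is a dict of
# file name -> contents; `base` is the snapshot both branches started from.


def merge_snapshots(base, ours, theirs):
    """Combine two snapshots that both descend from `base`."""
    merged = {}
    for name in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        if o == t:        # both sides agree, or neither touched the file
            result = o
        elif o == b:      # only "theirs" changed (or added/deleted) the file
            result = t
        elif t == b:      # only "ours" changed (or added/deleted) the file
            result = o
        else:             # both changed the same file differently
            raise ValueError(f"merge conflict in {name}")
        if result is not None:  # None means the file doesn't exist (deleted)
            merged[name] = result
    return merged


base = {"README": "v1", "app.py": "v1"}
master = {"README": "v2", "app.py": "v1"}     # a teammate edited README
feature = {"README": "v1", "app.py": "v2",    # you edited app.py...
           "buttons.py": "new"}               # ...and added a file

master = merge_snapshots(base, master, feature)
print(dict(sorted(master.items())))
# {'README': 'v2', 'app.py': 'v2', 'buttons.py': 'new'}
```

Merging produces a new state for master but leaves the `feature` snapshot alone, much as your branch sticks around after a real merge.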
The other term is “pull request,” and it is far less intuitive than merging. If you’re working on a codebase with other people, you probably won’t be allowed to just add a commit to the master branch, especially if you’re working at a company. Instead, you make a request to add your commit. Someone else reviews it, makes sure everything looks good, and then they merge your branch into the master branch and your commit is added.
In GitHub this request is called a “pull request”. This is weird. You pull changes from the remote repo and push changes to the remote repo; that is how everyone naturally describes it. When you submit a request like this, in your mind you are saying, “I am requesting to push these changes into master.” So I think it would be more intuitive to call a “pull request” a “push request”. But the logic GitHub uses is that you are requesting that your changes be pulled into master. That’s why it’s called a “pull request”. GitLab, probably GitHub’s main competitor, calls these requests “merge requests”, which makes a lot more sense to me.
This is where Git starts to amaze. Just storing snapshots of the state of the repo isn’t that impressive. But Git can determine the differences between 2 snapshots. This is impressive.
Recall that a git history is just a chain of commits. This is useful, but it is like knowing only the dates on a timeline. For example, World War I started in 1914. Then World War II started in 1939. But how did we go from one world war to another? Why did a second world war happen just 21 years after the first one ended? To answer these questions, we need to look at how things changed between those two points in time.
The same intuition applies to the history of your codebase. Git can find the differences between two commits, so it can tell you what changed between one commit and the next. With Git, you can easily see how the codebase changed over time.
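Python’s standard `difflib` module can produce a diff in the same unified format that `git diff` uses, which makes for a small sketch of comparing two commits. The two snapshots below are made-up examples, and `diff_commits` is a hypothetical helper, not part of Git.

```python
import difflib

# Two made-up commits: snapshots of the same file at two points in time.
old_commit = {"style.css": "body {\n  color: black;\n  background: white;\n}\n"}
new_commit = {"style.css": "body {\n  color: black;\n  background: navy;\n}\n"}


def diff_commits(old, new):
    """Return a unified diff of every file that differs between two snapshots."""
    out = []
    for name in sorted(set(old) | set(new)):
        a = old.get(name, "").splitlines(keepends=True)
        b = new.get(name, "").splitlines(keepends=True)
        # Identical files produce no output at all.
        out.extend(difflib.unified_diff(a, b, fromfile=f"a/{name}", tofile=f"b/{name}"))
    return "".join(out)


print(diff_commits(old_commit, new_commit))
# --- a/style.css
# +++ b/style.css
# @@ -1,4 +1,4 @@
#  body {
#    color: black;
# -  background: white;
# +  background: navy;
#  }
```

Only the changed line is marked with `-` (old) and `+` (new); the surrounding lines are shown as context, which is what makes a diff so much quicker to review than two full snapshots.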
This difference-finding functionality has all kinds of uses. When you create a pull request (merge request), Git will determine the differences between your commit and the latest commit on the master branch. GitHub (and the other products) display these differences in a UI. If you’re reviewing code, you only have to look at what changed. Another common use is to find bugs. If you know when a bug appeared, you can compare two commits to see the precise code that introduced it.
If you’ve had bad experiences with Git, it’s likely because of merge conflicts. They have caused developers to use more bad words than perhaps anything else. The thing with merge conflicts, however, is that Git has to have them. Even if Git could read your mind, they couldn’t be eradicated completely. This will make sense shortly.
Imagine you’ve been working on a new feature in a branch called “new-buttons”. Yesterday, your collaborator added a new commit to the master branch. Before you keep working on your feature, you want to make sure that you are working with the latest version of the code. To do this, you first pull the latest commit from the master branch on the remote repo, so that your local version of master is up to date with the remote one. Then you merge this up-to-date version of master into the “new-buttons” branch you’ve been working on. Let’s pause for a moment. These steps may be unfamiliar, but if you’ve read this far, you should have the intuition to understand them. If they don’t make sense yet, reread the paragraph, think about it some more, and if you’re still not getting it, just ask in the comments and I’ll be happy to help :-)
You merge the master branch into your “new-buttons” branch, and there are merge conflicts. Why? You touched the same code that your collaborator touched in their commit. Let’s get some intuition for why this is a problem. Remember that branches are like alternate futures branching out from one point in time. In one of those alternate futures, you have short hair on November 23, 2030. In another one, you have long hair on that date. But in any one future (branch), you can’t have both. If you tried to combine those two possible futures, that is, merge those two branches, you would have to pick: will you have short hair or long hair?
When there is a merge conflict, Git is telling you, “You need to pick.” Let’s say both you and your collaborator changed the background color of the app. When you try to merge their changes into your branch, Git will raise a merge conflict and ask you to resolve it by picking either your version or your collaborator’s version of the conflicting code (or by writing a combination of the two). Git doesn’t know which one you want to use. Maybe you don’t even know! Maybe you need to call a meeting and make a decision with your collaborator. Thanks to Git, you know that you and your collaborator have incompatible versions of the future and that you need to fix this.
Merge conflicts can be avoided by making sure that no two developers change the same piece of code. Because merge conflicts are so annoying, and can require a lot of work to resolve, developers go to great lengths to stay organized and coordinate their work in order to avoid them. However, in practice merge conflicts are nearly impossible to avoid if more than one person is working on a project. Thankfully, Git not only clearly flags the conflicts but also makes it easy to pick the versions of the code you want to use in order to resolve them.
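To see what “you need to pick” looks like, here is a toy line-by-line three-way merge that writes Git-style conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`) around lines that both sides changed, plus a helper that resolves each conflict by keeping one side. It assumes all three versions have the same number of lines, which real Git does not require; `merge_lines` and `resolve` are hypothetical names, and the settings are made up.

```python
# Toy line-by-line three-way merge that writes Git-style conflict markers.
# Assumes base, ours, and theirs all have the same number of lines.


def merge_lines(base, ours, theirs, their_name="master"):
    """Return (merged_lines, number_of_conflicts)."""
    merged, conflicts = [], 0
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:   # same change on both sides, or only ours changed
            merged.append(o)
        elif o == b:           # only theirs changed this line
            merged.append(t)
        else:                  # both changed the same line differently
            conflicts += 1
            merged += ["<<<<<<< HEAD", o, "=======", t, f">>>>>>> {their_name}"]
    return merged, conflicts


def resolve(lines, keep="ours"):
    """Resolve every conflict block by keeping one side ("ours" or "theirs")."""
    out, side = [], None
    for line in lines:
        if line.startswith("<<<<<<< "):
            side = "ours"
        elif line == "=======" and side == "ours":
            side = "theirs"
        elif line.startswith(">>>>>>> ") and side == "theirs":
            side = None
        elif side is None or side == keep:
            out.append(line)
    return out


base = ["title: My App", "background: white"]
ours = ["title: My App", "background: navy"]      # your change on new-buttons
theirs = ["title: Our App", "background: green"]  # collaborator's change on master

merged, conflicts = merge_lines(base, ours, theirs)
print(conflicts)  # 1
print("\n".join(merged))
# title: Our App
# <<<<<<< HEAD
# background: navy
# =======
# background: green
# >>>>>>> master
print(resolve(merged, keep="theirs"))  # ['title: Our App', 'background: green']
```

The title line merges cleanly because only one side changed it; the background line is flagged because both sides changed it, and only a human can decide which version wins.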
I hope you now have some good intuition about Git that will help you have a better experience working with it. From here, I would suggest jumping into particular commands and workflows in Git. You could start with this great article that goes in-depth on the main parts. I would encourage you to think about what each command and series of commands is actually doing so that you can continue building your intuition.
Good luck and happy coding!
Cover photo credit: Caleb Jones on Unsplash