A recurring conversation in developer circles is whether you should use `git merge --squash` when merging, or create explicit merge commits. The short answer: you shouldn't squash.
People have strong opinions about this. The thing is, my opinion is the correct one. Squashing commits serves no purpose other than losing information. It doesn't make for a cleaner history. At most, it helps subpar git clients show a cleaner commit graph, and it saves a bit of space by not storing intermediate file states.
Let me show you why.
In many ways you can just see git as a filesystem.
– Linus (in 'Re: more git updates..' - MARC)
Git is in many ways a very dumb graph database. When you check in code, it actually stores the content of all the tracked files in your repository.
The content of each file is stored as a "blob" node in the database. The filenames are stored separately in a "tree" node: If you rename a file, no new content node will be created. Only a new tree node will be created.
Commits are stored as "commit" nodes. A commit object points to a tree and adds metadata: author, committer, message, and parent commits. A merge commit has multiple parents.
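To make this content addressing concrete, here is a minimal sketch that computes a blob ID by hand, without running git at all. It assumes a repository using the default SHA-1 object format and GNU coreutils' `sha1sum` (on macOS, `shasum` does the same job); `blob_id` is a hypothetical helper name. A blob's ID is the SHA-1 of the header `blob <size>` plus a NUL byte, followed by the raw content. The file name plays no part, which is why renaming a file leaves its blob untouched.

```shell
#!/usr/bin/env bash
# Sketch: compute a git blob ID by hand.
# ID = SHA-1 of "blob <size-in-bytes>\0" followed by the raw file content.
# The file name is not hashed; names live in tree objects.
# Assumes a SHA-1 repository format and GNU coreutils (sha1sum).
set -euo pipefail

# blob_id FILE: print the object ID git would assign to FILE's content.
blob_id() {
  local file="$1" size
  size=$(wc -c < "$file" | tr -d ' ')   # byte count; tr strips BSD wc padding
  { printf 'blob %s\0' "$size"; cat "$file"; } | sha1sum | cut -d' ' -f1
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'hello\n' > "$workdir/foo.txt"          # same content as `echo hello`
cp "$workdir/foo.txt" "$workdir/renamed.txt"   # same content, different name

foo_id=$(blob_id "$workdir/foo.txt")
renamed_id=$(blob_id "$workdir/renamed.txt")

echo "$foo_id"       # ce013625030ba8dba906f756967f9e9ca394464a
echo "$renamed_id"   # identical: the name is not part of the blob
```

The first ID matches the `foo.txt` blob in the `git cat-file` session that follows; inside a repository, `git hash-object foo.txt` prints the same value.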
Here is a visualization from Scott Chacon's Git Internals:
Enough theory, we have work to get done. Let's create a simple git repository:
```
> mkdir squash-merges-considered-harmful
> cd squash-merges-considered-harmful
> git init
> echo hello > foo.txt
> git add foo.txt
> git commit -m "Initial commit"
[main (root-commit) 02a154b] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt
> echo more >> foo.txt
> git add foo.txt
> git commit -m "Add more"
[main 16660f8] Add more
 1 file changed, 1 insertion(+)
```
We can now look at the contents of the objects we created:
```
# initial commit
❯ git cat-file -p 02a154b
tree f269b7cd59094d5365ef6b5618098cbcbeee0c43
author Manuel Odendahl <email@example.com> 1653303427 -0400
committer Manuel Odendahl <firstname.lastname@example.org> 1653303427 -0400

Initial commit

# initial tree
❯ git cat-file -p f269b7cd59094d5365ef6b5618098cbcbeee0c43
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    foo.txt

# initial foo.txt
❯ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello

# second commit
❯ git cat-file -p 16660f8
tree 5a0c4a660a13c0ada7611651399abb362756f83e
parent 02a154bc4f0fa9bca567676d45d136619c076a95
author Manuel Odendahl <email@example.com> 1653303485 -0400
committer Manuel Odendahl <firstname.lastname@example.org> 1653303485 -0400

Add more

# second tree
❯ git cat-file -p 5a0c4a660a13c0ada7611651399abb362756f83e
100644 blob 2227cddb7f6318ea735a1c4adb52f5cd36c5783c    foo.txt

❯ git cat-file -p 2227cddb7f6318ea735a1c4adb52f5cd36c5783c
hello
more
```
Branches and tags (including the branches and tags of remote repositories) are just pointers to commit nodes.
```
❯ cat .git/refs/heads/main
16660f8b1d1538ed1b55d8533b3ee7feb68e474c
```
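To see that a branch is nothing more than a file holding a commit ID, here is a small sketch in a throwaway repository. It assumes git is installed and uses the default "files" ref storage; the branch name `experiment` is hypothetical. Git may later move refs into `.git/packed-refs` (for example during `git gc`), so `git rev-parse` is the robust way to resolve a ref; reading the file directly is only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: a branch is just a named pointer to a commit.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the initial branch "main"
git config user.name "Example"
git config user.email "example@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"

head_commit=$(git rev-parse HEAD)

# Create a branch by writing the pointer directly: no checkout, no new objects.
git update-ref refs/heads/experiment "$head_commit"

# The loose ref file contains only the commit ID.
ref_content=$(cat .git/refs/heads/experiment)
echo "experiment -> $ref_content"

# HEAD itself is a symbolic ref: a pointer to a branch, not to a commit.
cat .git/HEAD    # ref: refs/heads/main
```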
But Manuel, you ask, how do `git diff`, `git merge` and all that funky stuff work?
When you run `git diff`, git compares the state of two trees from scratch, every time, using one of its diff algorithms (Myers by default; patience and histogram are also available).
When you do a rebase, git computes the diff introduced by each commit of the branch being rebased, and then applies those diffs, one by one, to the destination. This "moves" the branch over to the destination, with fresh tree and commit nodes.
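As a rough model of that replay, assuming a linear, conflict-free branch, you can cherry-pick each commit of the branch, oldest first, onto the destination and compare the result with a real `git rebase`. This is a sketch, not how `git rebase` is implemented: the real command also handles conflicts, merges inside the branch, empty commits and `--onto`. The branch names `topic`, `replayed` and `rebased` are hypothetical, and git must be installed.

```shell
#!/usr/bin/env bash
# Sketch: model a rebase as "replay each commit's diff onto the destination".
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo base > base.txt
git add base.txt
git commit -q -m "Base"

# A topic branch with two commits.
git checkout -q -b topic
echo one > topic.txt
git add topic.txt
git commit -q -m "Topic 1"
echo two >> topic.txt
git commit -q -am "Topic 2"

# Meanwhile, main moves on.
git checkout -q main
echo upstream > main.txt
git add main.txt
git commit -q -m "Upstream work"

# Manual replay: start at the destination and apply each topic commit's diff,
# oldest first. cherry-pick creates brand-new commit objects.
git checkout -q -b replayed main
for commit in $(git rev-list --reverse main..topic); do
  git cherry-pick "$commit" > /dev/null
done

# The real thing, on a copy of the branch so both results can be compared.
git branch rebased topic
git rebase -q main rebased

echo "replayed tree: $(git rev-parse 'replayed^{tree}')"
echo "rebased tree:  $(git rev-parse 'rebased^{tree}')"
```

Both branches end up pointing at the same tree, while their commits differ from the originals on `topic`, because the new commits have a different parent.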
When you do a merge, git first searches for the common ancestor of both branches to be merged, the merge base (this can be a bit more involved depending on your graph). It computes the diff of each branch against that common ancestor, and then combines both diffs in what is called a three-way merge.
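The heart of a three-way merge can be tried on individual files with `git merge-file`, which takes the current version, the common ancestor and the other version, in that order. A minimal sketch, assuming git is installed (no repository is needed); the file names are hypothetical. A full merge additionally finds the merge base (`git merge-base` exposes that step) and works across whole trees, but the per-file idea is the same.

```shell
#!/usr/bin/env bash
# Sketch: file-level three-way merges with git merge-file.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

printf '%s\n' a b c d e > base.txt     # common ancestor
printf '%s\n' A b c d e > ours.txt     # our side changed the first line
printf '%s\n' a b c d E > theirs.txt   # their side changed the last line

# Argument order: <current> <base> <other>. -p prints the result to stdout
# instead of overwriting ours.txt. Exit status = number of conflicts.
merged=$(git merge-file -p ours.txt base.txt theirs.txt)
echo "$merged"

# Both sides change the same line: git cannot decide, so it emits
# conflict markers and exits with the number of conflicts (here 1).
printf '%s\n' a b X d e > ours2.txt
printf '%s\n' a b Y d e > theirs2.txt
conflicts=0
git merge-file -p ours2.txt base.txt theirs2.txt > conflicted.txt || conflicts=$?
echo "conflicts: $conflicts"
```

Changes to separate regions combine cleanly because each side's diff against the common ancestor can be applied independently; only overlapping diffs need a human.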
The resulting commit has multiple parent fields. The parent fields don't affect the content at all: they record where the commit came from (which git later uses to find merge bases and to show history). The tree the merge commit points to is what actually counts. Once a three-way merge has been computed and applied, git doesn't really care how the resulting tree was computed.
This is literally all there is to git, and the mental model that I use every day, even as I'm doing the most advanced git surgery.
So what is a squash merge? A squash merge is the same as a normal merge, except that it records only one parent commit. It basically slices off a whole part of the git graph, which will later be garbage collected if nothing else references it. You're basically losing information for no reason.
Let's look at this in practice. Let's create a few commits on top of the ones we have, and then do both a squash merge and a non-squash merge, and look at the results.
```
> git checkout -B work-branch
Switched to a new branch 'work-branch'
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "Add more"
[work-branch 4b84cfe] Add more
 1 file changed, 1 insertion(+)
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "And more"
[work-branch 1836f1c] And more
 1 file changed, 1 insertion(+)
❯ git checkout -B no-squash-merge main
Switched to a new branch 'no-squash-merge'
❯ git merge --no-squash --no-ff work-branch
Merge made by the 'ort' strategy.
 foo.txt | 2 ++
 1 file changed, 2 insertions(+)
❯ git checkout -B squash-merge main
Switched to a new branch 'squash-merge'
❯ git merge --squash --ff work-branch
Updating 16660f8..1836f1c
Fast-forward
Squash commit -- not updating HEAD
 foo.txt | 2 ++
 1 file changed, 2 insertions(+)
❯ git commit
[squash-merge 150c57d] Squashed commit of the following:
 1 file changed, 2 insertions(+)
```
Let's look at the resulting graph and commits.
```
❯ git log --graph --pretty=oneline --abbrev-commit --all
* 150c57d (HEAD -> squash-merge) Squashed commit of the following:
| * 535b740 (no-squash-merge) Merge branch 'work-branch' into no-squash-merge
|/|
| * 1836f1c (work-branch) And more
| * 4b84cfe Add more
|/
* 16660f8 (main) Add more
* 02a154b Initial commit

❯ git cat-file -p no-squash-merge
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
parent 1836f1c53221ae701a038bf5ae380770ea911665
author Manuel Odendahl <email@example.com> 1653304391 -0400
committer Manuel Odendahl <firstname.lastname@example.org> 1653304391 -0400

Merge branch 'work-branch' into no-squash-merge

* work-branch:
  And more
  Add more

❯ git cat-file -p squash-merge
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
author Manuel Odendahl <email@example.com> 1653304543 -0400
committer Manuel Odendahl <firstname.lastname@example.org> 1653304543 -0400

Squashed commit of the following:

commit 1836f1c53221ae701a038bf5ae380770ea911665
Author: Manuel Odendahl <email@example.com>
Date:   Mon May 23 07:11:08 2022 -0400

    And more

commit 4b84cfe11aa51da994448e602e1bc4cc6083d691
Author: Manuel Odendahl <firstname.lastname@example.org>
Date:   Mon May 23 07:11:03 2022 -0400

    Add more
```
You can see that both `squash-merge` and `no-squash-merge` point to the exact same tree. The only differences are the commit message and the missing second parent in the squash merge.
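You can check this without reading `cat-file` output by asking git for each commit's tree and parents directly. Here is a sketch that replays the experiment above in a throwaway repository, assuming git is installed; it sets a local identity so it runs anywhere, and the squash commit message is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: a squash merge and a --no-ff merge of the same branch record the
# same tree; they differ only in their parent lists (and messages).
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
echo more >> foo.txt
git commit -q -am "Add more"

git checkout -q -b work-branch
echo "Add more" >> foo.txt
git commit -q -am "Add more"
echo "Add more" >> foo.txt
git commit -q -am "And more"

git checkout -q -b no-squash-merge main
git merge -q --no-ff --no-edit work-branch

git checkout -q -b squash-merge main
git merge -q --squash work-branch
git commit -q -m "Squashed work-branch"

# Same tree object for both results.
echo "no-squash-merge tree: $(git rev-parse 'no-squash-merge^{tree}')"
echo "squash-merge tree:    $(git rev-parse 'squash-merge^{tree}')"

# %p lists abbreviated parent IDs: two for the merge, one for the squash.
echo "no-squash-merge parents: $(git log -1 --format=%p no-squash-merge)"
echo "squash-merge parents:    $(git log -1 --format=%p squash-merge)"
```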
To read more about the underpinnings of git, I recommend experimenting with the git command line, as well as the following resources:
But Manuel, you say, the history is so much cleaner!
To which I counter that it is actually not. If you want to hide the link to the second parent of the non-squash merge (the "right" parent, as it is called; the "left" or first parent being `main`), all you need to do is hide it. On the command line or in a proper tool, use the option to show only first parents. If you only look at first parents, and configure your git tooling to write the full log of the branch into the merge commit message (I personally use the GitHub CLI `gh` or some git commit hooks to do it), the squash merge commit looks identical to the non-squash merge commit.
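Git can also write that branch log into the merge message by itself: with the `merge.log` setting, `git merge` appends one-line summaries of the merged commits (`true` means at most 20). The `* work-branch:` block in the merge commit shown earlier has that shape. A sketch, assuming git is installed; branch names are hypothetical, and this is a built-in alternative, not the `gh`/hook setup mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: record the merged branch's commit subjects in the merge message,
# then read history along first parents only.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"
git config merge.log true   # append one-line summaries of merged commits

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"

git checkout -q -b feature
echo one >> foo.txt
git commit -q -am "First step"
echo two >> foo.txt
git commit -q -am "Second step"

git checkout -q main
git merge -q --no-ff --no-edit feature

# The merge message now lists the branch's commits.
msg=$(git log -1 --format=%B)
echo "$msg"

# First-parent view: one entry per merge, as compact as a squashed history,
# while the side commits are still there when you need them.
git log --first-parent --oneline
```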
Here is a `git log` command of mine to quickly look at the history of the main branch and create a changelog:
```
> git log --pretty=format:'# %ad %H %s' --date=short --first-parent --reverse
# 2022-05-23 02a154bc4f0fa9bca567676d45d136619c076a95 Initial commit
# 2022-05-23 16660f8b1d1538ed1b55d8533b3ee7feb68e474c Add more
# 2022-05-23 535b740f42e331175f3766c1374116e329a78f7e Merge branch 'work-branch' into no-squash-merge
```
When using GitHub and pull requests, this shows the pull request's author, the branch name (which in my case contains the ticket name and a short description) and the date on a single line. Here's a slightly more complex real-world example (anonymized):
```
# 2021-12-15 123 Merge pull request #5937 from garbo/TK-234/feature-1
# 2021-12-16 234 Merge pull request #5938 from bongo/TK-235/feature-2
# 2021-12-16 456 Merge pull request #5939 from gingo/TK-236/feature-3
```
But Manuel, why keep all those commits lying around when we have all we need in the commit message?
One reason just comes down to preference. I like to see the actual log of what a person did on their branch. Did they make many small commits? On which days (this might make it easier to look up documents or Slack conversations related to the work)? Did they merge other branches into their work (useful when resolving merge conflicts and other boo boos)?
I have done a lot of git cleanup work, and while they are not supposed to exist, big merges with thousands of lines happen, and having a single monolithic commit that contains 80 different changes is a nightmare.
The other reason is that the side history is extremely useful in practice. When hunting down a bug, I often use `git bisect`. I first run `git bisect start --first-parent` to jump from main commit to main commit. Once I have found which pull request introduced the bug, I bisect on the original branch. Instead of having to figure out which line in the pull-request merge might cause the bug, I have a much more granular path. Often, it surfaces a single-line commit, which leads to a painless and immediate bugfix.
Since you can drive your bisect with your unit tests (`git bisect run`), you often have no manual work to do at all, given sufficiently small and atomic commits on side branches. Losing that capability would seriously impact my sanity when I have to fix bugs.
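Here is a sketch of that two-stage bisect, driven by a tiny stand-in for a unit test (a `grep` for a marker string). It assumes Git 2.29 or newer, since that is where `git bisect start` gained `--first-parent`; the repository contents and the `commit_file` helper are hypothetical. After `git bisect run` finishes, `refs/bisect/bad` points at the first bad commit, so the script reads it before resetting.

```shell
#!/usr/bin/env bash
# Sketch: bisect main's first parents to find the guilty merge, then bisect
# inside the merged branch to find the guilty commit.
# Assumes Git >= 2.29 for `git bisect start --first-parent`.
set -euo pipefail

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

commit_file() {   # hypothetical helper: commit_file FILE CONTENT MESSAGE
  echo "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file value.txt ok "Initial commit"
first=$(git rev-parse HEAD)
commit_file main.txt 1 "Main work 1"

git checkout -q -b feature
commit_file feature.txt a "Feature step 1"
commit_file value.txt BUG "Feature step 2"    # the commit that breaks things
culprit=$(git rev-parse HEAD)
commit_file feature.txt b "Feature step 3"

git checkout -q main
git merge -q --no-ff --no-edit feature
guilty_merge=$(git rev-parse HEAD)
commit_file main.txt 2 "Main work 2"
commit_file main.txt 3 "Main work 3"

# Stand-in "unit test": exit 0 (good) unless value.txt contains BUG.
is_good='! grep -q BUG value.txt'

# Stage 1: only visit commits on main's first-parent chain.
git bisect start --first-parent main "$first" > /dev/null
git bisect run sh -c "$is_good" > /dev/null
found_merge=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1

# Stage 2: the merge's first parent is good and its second parent (the
# branch tip) is bad, so this only visits the branch's own commits.
git bisect start "$found_merge^2" "$found_merge^1" > /dev/null
git bisect run sh -c "$is_good" > /dev/null
found_commit=$(git rev-parse refs/bisect/bad)
git bisect reset > /dev/null 2>&1

echo "guilty merge:  $(git log -1 --format=%s "$found_merge")"
echo "guilty commit: $(git log -1 --format=%s "$found_commit")"
```

With a squash merge, stage 2 would have nothing to bisect: the search would stop at one commit containing the whole branch.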
And that is why squashing history is harmful. It literally just deletes information from the git graph, by dropping a single `parent` entry from the merge commit.