A recurring conversation in developer circles is whether you should use git merge --squash when merging or create explicit merge commits. The short answer: you shouldn't squash.
People have strong opinions about this. The thing is that my opinion is the correct one. Squashing commits has no purpose other than losing information. It doesn't make for a cleaner history. At most, it helps subpar git clients show a cleaner commit graph and saves a bit of space by not storing intermediate file states.
Let me show you why.
Git tracks contents, not diffs
In many ways you can just see git as a filesystem.
– Linus (in 'Re: more git updates..' - MARC)
Git is in many ways a very dumb graph database. When you check in code, it actually stores the content of all the tracked files in your repository.
The content of each file is stored as a "blob" node in the database. The filenames are stored separately in a "tree" node: If you rename a file, no new content node will be created. Only a new tree node will be created.
Commits are stored as "commit" nodes. A commit object points to a tree, and adds metadata: author, committer, message and parent commits. A merge commit has multiple parents.
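To make the blob/tree split concrete, here is a minimal sketch that renames a file and checks that the blob is reused while the tree changes. It assumes a throwaway repository in a temporary directory; the `g` helper, the `Demo` identity and the file names are hypothetical and only there to make the example self-contained.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" regardless of local config

# Hypothetical helper: run git with a fixed demo identity.
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

echo hello > foo.txt
git add foo.txt
g commit -q -m "Initial commit"

# Rename the file without touching its content.
git mv foo.txt bar.txt
g commit -q -m "Rename foo.txt to bar.txt"

# <commit>:<path> resolves to the blob stored at that path in that commit.
blob_before=$(git rev-parse HEAD~1:foo.txt)
blob_after=$(git rev-parse HEAD:bar.txt)
tree_before=$(git rev-parse 'HEAD~1^{tree}')
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "blob before rename: $blob_before"
echo "blob after rename:  $blob_after"
echo "tree before rename: $tree_before"
echo "tree after rename:  $tree_after"
```

The two blob IDs should match while the two tree IDs differ: only the tree, which maps names to blobs, had to be rewritten.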
Here is a visualization from Scott Chacon's Git Internals:
Looking at a real git repository
Enough theory, we have work to get done. Let's create a simple git repository:
> mkdir squash-merges-considered-harmful
> cd squash-merges-considered-harmful
> git init
> echo hello > foo.txt
> git add foo.txt
> git commit -m "Initial commit"
[main (root-commit) 02a154b] Initial commit
1 file changed, 1 insertion(+)
create mode 100644 foo.txt
> echo more >> foo.txt
> git add foo.txt
> git commit -m "Add more"
[main 16660f8] Add more
1 file changed, 1 insertion(+)
We can now look at the contents of the objects we created:
# initial commit
❯ git cat-file -p 02a154b
tree f269b7cd59094d5365ef6b5618098cbcbeee0c43
author Manuel Odendahl <wesen@ruinwesen.com> 1653303427 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653303427 -0400
Initial commit
# initial tree
❯ git cat-file -p f269b7cd59094d5365ef6b5618098cbcbeee0c43
100644 blob ce013625030ba8dba906f756967f9e9ca394464a foo.txt
# initial foo.txt
❯ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello
# second commit
❯ git cat-file -p 16660f8
tree 5a0c4a660a13c0ada7611651399abb362756f83e
parent 02a154bc4f0fa9bca567676d45d136619c076a95
author Manuel Odendahl <wesen@ruinwesen.com> 1653303485 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653303485 -0400
Add more
# second tree
❯ git cat-file -p 5a0c4a660a13c0ada7611651399abb362756f83e
100644 blob 2227cddb7f6318ea735a1c4adb52f5cd36c5783c foo.txt
❯ git cat-file -p 2227cddb7f6318ea735a1c4adb52f5cd36c5783c
hello
more
Branches and tags (including those on remote repositories) are just pointers to commit nodes.
❯ cat .git/refs/heads/main
16660f8b1d1538ed1b55d8533b3ee7feb68e474c
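As a small illustration of "a branch is just a pointer", the sketch below creates a branch and then reads the file that backs it. It assumes a fresh repository where refs are stored as loose files under .git/refs (refs can also be packed, in which case the file will not exist); the `g` helper and the branch names are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: run git with a fixed demo identity.
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

echo hello > foo.txt
git add foo.txt
g commit -q -m "Initial commit"

head_commit=$(git rev-parse HEAD)

# Creating a branch writes one small file containing a commit ID.
git branch demo-branch
ref_file=$(cat .git/refs/heads/demo-branch)

# The same thing done through plumbing: point a new ref at a commit.
git update-ref refs/heads/handmade "$head_commit"
handmade=$(git rev-parse handmade)

echo "HEAD commit:      $head_commit"
echo "demo-branch file: $ref_file"
echo "handmade branch:  $handmade"
```

All three values should be the same commit ID; no objects are copied when a branch is created.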
But we still use diffs and merges
But Manuel, you ask, how does git diff and git merge and all that funky stuff work?
When you run git diff, git runs a diff algorithm (it ships with several) to compare the state of two trees, every time.
When you do a rebase, git computes the diff for each commit of the branch before rebase, and then applies those diffs to the destination, thus "moving" the branch over to the destination, with fresh tree and commit nodes.
When you do a merge, git first searches for the common parent of both branches to be merged (this can be a bit more involved depending on your graph). It computes the diff of each branch to that original commit, and then merges both diffs in what is called a three-way merge.
The resulting commit has multiple parent fields. The parent fields don't really mean anything except for informational purposes; the tree the merge commit points to is what actually counts. Once a three-way merge has been computed and applied, git doesn't really care how the resulting tree was computed.
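The inputs of a three-way merge can be inspected by hand. The following sketch, under the assumption of a throwaway repository with two branches that change different ends of the same file, finds the common ancestor with git merge-base, shows each side's diff against it, and then lets git merge combine them. The `g` helper, branch name `feature` and file contents are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: run git with a fixed demo identity.
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

printf 'one\ntwo\nthree\nfour\nfive\n' > foo.txt
git add foo.txt
g commit -q -m "Base"
base_commit=$(git rev-parse HEAD)

# One side appends a line...
git checkout -q -b feature
printf 'one\ntwo\nthree\nfour\nfive\nsix\n' > foo.txt
g commit -q -am "Feature: append a line"

# ...the other side prepends one.
git checkout -q main
printf 'zero\none\ntwo\nthree\nfour\nfive\n' > foo.txt
g commit -q -am "Main: prepend a line"

# The common ancestor is the third input of the three-way merge.
merge_base=$(git merge-base main feature)
git --no-pager diff "$merge_base" main      # what main changed
git --no-pager diff "$merge_base" feature   # what feature changed

# Combine both sets of changes into a new tree and record both parents.
g merge -q --no-ff -m "Merge feature" feature
cat foo.txt
```

Because the two diffs touch different regions of the file, the merge applies both without a conflict, and the resulting commit lists main's old tip and feature's tip as its parents.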
This is literally all there is to git, and the mental model that I use every day, even as I'm doing the most advanced git surgery.
What is a squash merge?
So what is a squash merge? A squash merge is the same as a normal merge, except that it records only one parent commit instead of two. It basically slices off a whole part of the git graph, which will later be garbage collected if not referenced anymore. You're basically losing information for no reason.
Let's look at this in practice. Let's create a few commits on top of the ones we have, and then do both a squash merge and a non-squash merge, and look at the results.
> git checkout -B work-branch
Switched to a new branch 'work-branch'
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "Add more"
[main 4b84cfe] Add more
1 file changed, 1 insertion(+)
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "And more"
[main 1836f1c] And more
1 file changed, 1 insertion(+)
❯ git checkout -B no-squash-merge main
Switched to a new branch 'no-squash-merge'
❯ git merge --no-squash --no-ff work-branch
Merge made by the 'ort' strategy.
foo.txt | 2 ++
1 file changed, 2 insertions(+)
❯ git checkout -B squash-merge main
Switched to a new branch 'squash-merge'
❯ git merge --squash --ff work-branch
Updating 16660f8..1836f1c
Fast-forward
Squash commit -- not updating HEAD
foo.txt | 2 ++
1 file changed, 2 insertions(+)
❯ git commit
[squash-merge 150c57d] Squashed commit of the following:
1 file changed, 2 insertions(+)
Let's look at the resulting graph and commits.
❯ git log --graph --pretty=oneline --abbrev-commit --all
* 150c57d (HEAD -> squash-merge) Squashed commit of the following:
| * 535b740 (no-squash-merge) Merge branch 'work-branch' into no-squash-merge
|/|
| * 1836f1c (work-branch) And more
| * 4b84cfe Add more
|/
* 16660f8 (main) Add more
* 02a154b Initial commit
❯ git cat-file -p no-squash-merge
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
parent 1836f1c53221ae701a038bf5ae380770ea911665
author Manuel Odendahl <wesen@ruinwesen.com> 1653304391 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653304391 -0400
Merge branch 'work-branch' into no-squash-merge
* work-branch:
And more
Add more
❯ git cat-file -p squash-merge
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
author Manuel Odendahl <wesen@ruinwesen.com> 1653304543 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653304543 -0400
Squashed commit of the following:
commit 1836f1c53221ae701a038bf5ae380770ea911665
Author: Manuel Odendahl <wesen@ruinwesen.com>
Date: Mon May 23 07:11:08 2022 -0400
And more
commit 4b84cfe11aa51da994448e602e1bc4cc6083d691
Author: Manuel Odendahl <wesen@ruinwesen.com>
Date: Mon May 23 07:11:03 2022 -0400
Add more
You can see that both squash-merge and no-squash-merge point to the exact same tree. The only differences are the commit message and the missing second parent in the squash merge.
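The comparison can also be done mechanically instead of by eye. This is a minimal sketch that rebuilds a similar scenario in a throwaway repository and compares tree IDs and parent counts; the `g` helper and the commit messages are hypothetical, and the hashes will differ from the ones shown in this post.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: run git with a fixed demo identity.
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

echo hello > foo.txt
git add foo.txt
g commit -q -m "Initial commit"

git checkout -q -b work-branch
echo more >> foo.txt
g commit -q -am "Add more"
echo "and more" >> foo.txt
g commit -q -am "And more"

# Regular merge commit: two parents.
git checkout -q -b no-squash-merge main
g merge -q --no-ff -m "Merge branch 'work-branch'" work-branch

# Squash merge: stages the combined result, then a normal one-parent commit.
git checkout -q -b squash-merge main
git merge -q --squash work-branch
g commit -q -m "Squashed work-branch"

tree_merge=$(git rev-parse 'no-squash-merge^{tree}')
tree_squash=$(git rev-parse 'squash-merge^{tree}')
parents_merge=$(git cat-file -p no-squash-merge | grep -c '^parent ')
parents_squash=$(git cat-file -p squash-merge | grep -c '^parent ')

echo "merge tree:  $tree_merge ($parents_merge parents)"
echo "squash tree: $tree_squash ($parents_squash parent)"
```

Both branches should report the same tree ID, with two parents on the merge commit and one on the squash commit.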
To read more about the underpinnings of git, I can recommend just experimenting with the git command line, and the following resources:
But the history!
But Manuel, you say, the history is so much cleaner!
To which I counter that it is actually not. If you want to hide the link to the right parent of the non-squash merge (as it is called, the left parent being main), all you need to do is hide it. If you use the command line or a proper tool, use the option to show only first parents. If you look only at the first parent, and configure your git tool to fill the full log history of the branch into the merge commit message (I personally use the GitHub CLI gh or some git commit hooks to do it), the squash merge commit looks identical to the non-squash merge commit.
A favorite git log command of mine to quickly look at the history of the main branch, and create a changelog:
> git log --pretty=format:'# %ad %H %s' --date=short --first-parent --reverse
# 2022-05-23 02a154bc4f0fa9bca567676d45d136619c076a95 Initial commit
# 2022-05-23 16660f8b1d1538ed1b55d8533b3ee7feb68e474c Add more
# 2022-05-23 535b740f42e331175f3766c1374116e329a78f7e Merge branch 'work-branch' into no-squash-merge
When using GitHub and pull requests, this will show author, branch name (which would contain ticket name and short description in my case) and date on a single line. Here's a slightly more complex real-world example (anonymized):
2021-12-15 123 Merge pull request #5937 from garbo/TK-234/feature-1
2021-12-16 234 Merge pull request #5938 from bongo/TK-235/feature-2
2021-12-16 456 Merge pull request #5939 from gingo/TK-236/feature-3
But why?
But Manuel, why keep all those commits lying around when we have all we need in the commit message?
One reason comes down to preference. I like to see the actual log of what a person did on their branch. Did they do many small commits? On which days (this might make looking up documents or Slack conversations related to the work easier)? Did they merge other branches into their work (useful when resolving merge conflicts and other boo-boos)?
I have done a lot of git cleanup work, and while they are not supposed to exist, big merges with thousands of lines happen, and having a single monolithic commit that contains 80 different changes is a nightmare.
The other reason is that the side history is extremely useful in practice. When hunting down a bug, I often use git bisect. I first use git bisect start --first-parent to jump from main commit to main commit. But once I have found which pull request led to the bug, I bisect on the original branch. Instead of having to figure out which line in the pull-request merge might cause the bug, I have a much more granular path. Often, it surfaces a single-line commit, and leads to a painless and immediate bugfix.
As you can drive your bisect with your unit tests, you often have no work to do at all, given sufficiently atomic and small commits on side branches. Losing that capability would seriously impact my sanity when I have to fix bugs.
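The two-stage bisect described above can be sketched as follows. This is an illustration under stated assumptions, not a recipe for any particular project: it uses a throwaway repository, a fake "bug" that is simply the word bug appearing in foo.txt, a hypothetical `g` helper for the commit identity, and a Git version that supports --first-parent on git bisect start (2.29 or later).

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

# Hypothetical helper: run git with a fixed demo identity.
g() { git -c user.name=Demo -c user.email=demo@example.com "$@"; }

echo ok > foo.txt
git add foo.txt
g commit -q -m "Initial commit"
good=$(git rev-parse HEAD)

# Side branch with three small commits; the second one introduces the "bug".
git checkout -q -b feature
echo one >> foo.txt
g commit -q -am "Step 1"
echo bug >> foo.txt
g commit -q -am "Step 2"
culprit=$(git rev-parse HEAD)
echo three >> foo.txt
g commit -q -am "Step 3"

git checkout -q main
g merge -q --no-ff -m "Merge feature" feature
merge=$(git rev-parse HEAD)
echo later > bar.txt
git add bar.txt
g commit -q -m "Later work on main"

# Stage 1: walk only the first-parent chain of main to find the bad merge.
# The test command exits 0 (good) when foo.txt does not contain "bug".
git bisect start --first-parent HEAD "$good" >/dev/null
git bisect run sh -c '! grep -q bug foo.txt' >/dev/null
found_merge=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

# Stage 2: bisect the merged branch itself, between the merge's two parents.
git bisect start "$found_merge^2" "$found_merge^1" >/dev/null
git bisect run sh -c '! grep -q bug foo.txt' >/dev/null
found_commit=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "bad merge:  $found_merge"
echo "bad commit: $found_commit"
```

With a squash merge the second stage has nothing to walk: the side commits are not reachable from the squashed commit, so the search stops at the whole pull request.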
Conclusion
And that is why squashing history is harmful. It's literally just deleting information from the git graph by dropping a single parent entry from the merge commit.

Latest comments (80)
The subtlety that's always missing in these discussions is that this isn't really (or at least doesn't need to be) an across-the-board decision.
In most work settings, there are two scenarios - features and minor changes:
1. Guy works on a new feature for a week and makes tons of commits, essentially just using git like a "disk drive" - this sounds bad, but if that's what they set out to do for this particular task, squashing before submitting the PR is probably a must.
2. Guy makes 4-5 changes to package.json, carefully documenting each change with commit messages explaining why he made that particular change. Squashing in this case would be absolutely terrible.
If you squash in case (2) you're making everyone's job harder. If I'm trying to solve a problem, and I place the cursor over a version constraint in package.json, and I see Guy's most recent commit to this line, I need to be able to see why he made that particular change - if it's been squashed with 4-5 other changes, I can't tell which reasons were the ones pertaining to that particular line change. I can't make sense of the changes, and I can't revert the change.
On the other hand, if you didn't squash in case (1) you're just leaving a very long and noisy commit log where only every 10th or 50th commit explains anything actually useful.
Both of these situations are bad.
But if you've enabled squash commits as a default, you're going to lose a lot of useful information - that's why this is not a decision you can make across-the-board. It needs to be every contributor's decision if they squash - they should do that locally and not bother reviewers with even seeing that noisy history.
(And if your team's entire log history is useless noise that never helpfully explains why any change was made, well, then squash away for all I care - you've got a much bigger cultural problem.)
Absolutely true that merge is the heart of git and squashing is a kind of perversion. It just so happens that it's exactly the right kind of perverted for some team workflows.
In my company we have a pretty strict rule about squashing your commits on your private branch into a neat, minimal history before you merge your PR. This makes sense for keeping shared history manageable but causes many problems.
git rebase -i HEAD~2 I don't get the last 2 commits, I get all the commits that were merged in. That's one of the reasons our company's policy is to rebase on trunk, not merge it in. You can see how far we're getting from the promised land of merge-only purity here, and toward the greater pain of fixing merge conflicts with rebase.
And there's also a much more important problem. With every PR, the last commit has passed through an extensive CI pipeline of checks. Ensuring every commit in the PR passes all checks is infeasible. As a result, all those other commits must be assumed to be broken (especially when they're artificial Frankenstein commits created on the fly in an interactive rebase). These unsafe commits lie around like hand grenades with loose pins. Release managers must be extremely careful to avoid them when trying to assemble a good release. As a rule of thumb that means ignoring everything except merge commits, but what if there's a fast-forward merge? We could find a workaround for that, but why should a large team of variously skilled developers be exposed to a history that is mostly made of unsafe commits?
A solution
Outcome
Drawbacks
git blame is less helpful for getting granular detail. If we were using git the way its free software advocates intended, as a complete and self-contained source of truth for codebase history, then this would be a dealbreaker. But that's not how we're using it. We have a self-hosted Gitlab and the PRs with all their attached comments and CI runs are a far richer source of historical information than the git repo alone. When I need to understand why something was done that's always where I first look. The purism of avoiding vendor lock-in is nice, but I've never seen it amount to something for repo hosting.
Conclusion
I agree with these people that the benefit far outweighs the gotchas in practice.
For any developers who actually code professionally in a big team, this is horrible advice. Nobody has time to care about your commit history; it's just noise. People don't spend time investigating in which sub-commit of which branch the original issue happened, blah blah. You are busy finding solutions and moving on to the next task.
Pedantic un-pragmatic advice, ignore this article and start coding, you are not an academic or a historian.
“Squashing commits has no purpose other than losing information.”
Cleaning your house has no purpose other than losing dirt. Pretty neat tho…
What no one is mentioning is that the squash feature on GitHub PRs preserves the original commits that were squashed. If you REALLY need to go back and examine the granular history of a PR then you can still do so. On teams I've been on, we strive for PRs that are not over-scoped and where the squashed message tells you exactly what feature / fix was added by those line changes. We also often use Conventional Commits, which help in a big way with release note automation. When I look at a main branch history like this:
I see VERY clearly what features and fixes have gone in between versions 1.0.0 and 1.1.0. I also have easy links to the PRs that were squashed to produce those commits if I need to drill down any further. If a feature needs to be reverted it's a very easy reversion (no extra parameters).
If you end up with a PR that has a very large scope, there are two things one can/should do:
I'll agree that just having a "linear commit history" shouldn't be the only reason for doing squash commits. But if it simplifies your team's workflow, reduces cognitive load, and makes understanding exactly what is included in a release easier to find then I say it's worth doing.
That said, I feel there is a "best of both worlds" place we could get to if we strived for it.
First, a merge commit is really a squash but with an extra parent link to the branch where the squash originated. I wish this was driven home more. However, the standard commit title for these PR merge commits is always something like these:
Those titles don't help me. It has the PR number, which I have to individually click on and look up, and a branch name, which could easily be too brief, poorly written or even irrelevant.
Nothing in Git, from what I understand, prevents those titles being similar to the squash titles I shared above. So if GitHub produced them you'd suddenly have the clarity you have with squash merges.
Second, the Git CLI commands and various Git UIs default to showing and working with the full branched-out history of everything. You need to know special parameters or set certain settings in order to see and work with a simplified linear view. If these tools defaulted to a linear history and required special parameters in order to drill down into merged branches, I feel that would improve the developer experience a bunch. You get an easy-to-understand summarized linear history and the ability to go deeper when you need to.
Of course, outside of the GUIs maybe, any changes in how people work with Git are very hard pushes from what I understand.
Third, GitHub could allow the PR author to set the merging strategy to be used in advance. Since each developer may have their own style, some with very intentful and effortful PR commits like your own @wesen, others who commit WIP things quickly and often, and some with a mix. This gives the author the ability to decide how they will ultimately formulate their PR. Obviously certain projects can still limit what PR merge strategies are available, and admins could still override the author's preset wishes. But the author at least has a chance to influence how the commits in the PR will be laid into the base branch.
But just to wrap up my argument, I think like any other tool squashing vs merging vs rebasing are options that teams can consider and make a decision on using given whatever their needs and circumstances are. There is no one size fits all approach to it.
Alternatively, if your git history after working on a new feature is just "wip brrr whoops", don't even bother your team with having to see that in your PR: squash them on your own local system before opening the PR. 🙌
It feels so good to read a tech blog not written by a junior enthusiast with a degree trying to skip that 5 years learning period.
Thanks mate.
I despise git for always going against my way, and I am looking at alternative CVS workflows. The whole fact that there is room for arguing is so toxic.
I really liked the way you explained the way squash is working. But it seems that the main argument against squashing is that it just drops the history?
That's fair, but it doesn't mean squash is bad, it means one just needs to know the cost, correct?
Yes. But most people argue that it "cleans up" history, because they are unaware that you can easily hide the right parent when printing out logs, for example. I find losing history a very high cost to avoid using a pretty-print flag.
That's a very good argument. The other perspective to consider is that more and more people don't use git from the command line, so they see whatever their git tool shows, which may be unable to pretty-print in the first place.
One reason I'm for the squash merge camp hasn't been mentioned, so I think I should mention it here so @wesen can correct me.
We use squash merge to make it easy to revert a whole PR since it's just a commit.
It can be reverted after some time has passed easily by just reverting that commit.
What do you recommend for this, so I can leave the wrong squash merge camp and follow the righteous path, oh great @wesen?
You can do exactly the same for a merge commit by using git revert -m1. The squash merge commit and the merge commit both point to the same tree hash, they only differ wrt the parent commits. With a squash merge, you only have 1, so git revert knows "ok well you just want to revert to the parent". With the merge commit, you have 2, so you have to tell it "please use the left parent (aka, the parent on the main branch) to revert to". Easy peasy!
A number of commenters have made the distinction between squashing before the PR is submitted and after it is submitted. I contend it's an irrelevant distinction.
If you are in the squash-before-but-not-after crowd, I counter that once a PR starts being reviewed, and updated and re-reviewed, you're going to get a whole list of commits that you'll end up wanting to squash anyway.
My criterion for squashing is this: for each particular commit, if you cannot roll back that commit and have a working functional system, then there is no point in having that commit in your history; squash it out.
Now, if you think there is value in the various conversations surrounding those commits, then keep them around, off the main branch like this:
main branch directly...), naming the copy it archive/branchname
branchbname
archive/branchname in the PR's comments.
Great post!