Manuel Odendahl

Posted on May 23, 2022

⛔ Squash commits considered harmful ⛔

#programming #productivity #git #tutorial

A recurring conversation in developer circles is whether you should use git merge --squash when merging or create explicit merge commits. The short answer: you shouldn't squash.

People have strong opinions about this. The thing is that my opinion is the correct one. Squashing commits has no purpose other than losing information. It doesn't make for a cleaner history. At most it helps subpar git clients show a cleaner commit graph, and saves a bit of space by not storing intermediate file states.

Let me show you why.

Git tracks contents, not diffs

In many ways you can just see git as a filesystem.
– Linus (in 'Re: more git updates..' - MARC)

Git is in many ways a very dumb graph database. When you check in code, it actually stores the content of all the tracked files in your repository.

The content of each file is stored as a "blob" node in the database. The filenames are stored separately in a "tree" node: If you rename a file, no new content node will be created. Only a new tree node will be created.

Commits are stored as "commit" nodes. A commit object points to a tree, and adds metadata: author, committer, message and parent commits. A merge commit has multiple parents.
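
The claim about renames is easy to check yourself. Here is a minimal sketch, assuming a throwaway repository in a temporary directory (the file names and the identity are placeholders of mine, not part of the walkthrough further down):

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
blob_before=$(git rev-parse HEAD:foo.txt)    # id of the blob holding the content
tree_before=$(git rev-parse 'HEAD^{tree}')   # id of the tree mapping names to blobs

# Rename the file without touching its content.
git mv foo.txt bar.txt
git commit -q -m "Rename foo.txt to bar.txt"
blob_after=$(git rev-parse HEAD:bar.txt)
tree_after=$(git rev-parse 'HEAD^{tree}')

echo "blob before: $blob_before"
echo "blob after:  $blob_after"
echo "tree before: $tree_before"
echo "tree after:  $tree_after"
```

The two blob ids should match while the tree ids differ: the content was stored once, and only the name mapping changed.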

Here is a visualization from Scott Chacon's Git Internals:

Looking at a real git repository

Enough theory, we have work to get done. Let's create a simple git repository:



> mkdir squash-merges-considered-harmful
> cd squash-merges-considered-harmful 
> git init
> echo hello > foo.txt
> git add foo.txt
> git commit -m "Initial commit"
[main (root-commit) 02a154b] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt
> echo more >> foo.txt
> git add foo.txt
> git commit -m "Add more" 
[main 16660f8] Add more
 1 file changed, 1 insertion(+)

We can now look at the contents of the objects we created:



# initial commit
❯ git cat-file -p 02a154b
tree f269b7cd59094d5365ef6b5618098cbcbeee0c43
author Manuel Odendahl <wesen@ruinwesen.com> 1653303427 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653303427 -0400

Initial commit
# initial tree
❯ git cat-file -p f269b7cd59094d5365ef6b5618098cbcbeee0c43
100644 blob ce013625030ba8dba906f756967f9e9ca394464a    foo.txt
# initial foo.txt
❯ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello

# second commit
❯ git cat-file -p 16660f8
tree 5a0c4a660a13c0ada7611651399abb362756f83e
parent 02a154bc4f0fa9bca567676d45d136619c076a95
author Manuel Odendahl <wesen@ruinwesen.com> 1653303485 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653303485 -0400

Add more
# second tree
❯ git cat-file -p 5a0c4a660a13c0ada7611651399abb362756f83e
100644 blob 2227cddb7f6318ea735a1c4adb52f5cd36c5783c    foo.txt
❯ git cat-file -p 2227cddb7f6318ea735a1c4adb52f5cd36c5783c
hello
more

Branches, tags (and branches, tags on remote repositories) are just pointers to commit nodes.



❯ cat .git/refs/heads/main         
16660f8b1d1538ed1b55d8533b3ee7feb68e474c
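
To make the "just pointers" point concrete, here is a small sketch, assuming a fresh throwaway repository; the branch name experiment is made up for the example. It creates a branch by writing a ref directly instead of going through git branch:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
head_commit=$(git rev-parse HEAD)

# Creating a branch is nothing more than writing a ref that holds a commit id.
git update-ref refs/heads/experiment "$head_commit"
experiment_commit=$(git rev-parse experiment)

# List every ref together with the object it points to.
git for-each-ref --format='%(refname) -> %(objectname)'
```

Both refs should list the same commit id, and no new commit, tree or blob was created along the way.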

But we still use diffs and merges

But Manuel, you ask, how does git diff and git merge and all that funky stuff work?

When you run git diff, git runs a diff algorithm (it ships with several) to compare the state of two trees, every time.
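
A quick way to convince yourself, sketched on a throwaway repository (file names are placeholders): diffing two commits and diffing the two trees they point to should print the same thing, and you can swap the algorithm with --diff-algorithm.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
echo more >> foo.txt
git commit -q -am "Add more"

# Diff the two commits, then diff the two trees they point to.
diff_by_commit=$(git diff HEAD~1 HEAD)
diff_by_tree=$(git diff 'HEAD~1^{tree}' 'HEAD^{tree}')
echo "$diff_by_tree"

# Same trees, different algorithm: the diff is recomputed on every call.
git diff --diff-algorithm=patience HEAD~1 HEAD
```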

When you do a rebase, git computes the diff for each commit of the branch before rebase, and then applies those diffs to the destination, thus "moving" the branch over to the destination, with fresh tree and commit nodes.
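
Here is a rough sketch of what that means in practice, assuming a throwaway repository with made-up file names. After the rebase the commit has a new id and a new parent, but git patch-id reports the same patch:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
git branch -M main

git checkout -q -b work-branch
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"
old_commit=$(git rev-parse HEAD)
old_patch=$(git show "$old_commit" | git patch-id --stable | cut -d' ' -f1)

# Meanwhile, main moves on with unrelated work.
git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "Unrelated work on main"

# Replay the branch on top of the new main.
git checkout -q work-branch
git rebase -q main
new_commit=$(git rev-parse HEAD)
new_patch=$(git show "$new_commit" | git patch-id --stable | cut -d' ' -f1)

echo "commit: $old_commit -> $new_commit"
echo "patch:  $old_patch -> $new_patch"
```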

When you do a merge, git first searches for the common ancestor of both branches to be merged, the merge base (this can be a bit more involved depending on your graph). It computes the diff of each branch to that original commit, and then merges both diffs in what is called a three-way merge.

The resulting commit has multiple parent fields. The parent fields don't really mean anything except for informational purposes; the tree the merge commit points to is what actually counts. Once a three-way merge has been computed and applied, git doesn't really care how the resulting tree was computed.
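
The merge steps can be sketched like this, assuming a throwaway repository where main and work-branch have diverged (all names are placeholders). git merge-base prints the common ancestor git starts from, and the merge commit ends up with two parent lines:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
git branch -M main
fork_point=$(git rev-parse HEAD)

git checkout -q -b work-branch
echo feature > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "Unrelated work on main"

# The common ancestor that the three-way merge starts from.
merge_base=$(git merge-base main work-branch)

git merge -q --no-edit work-branch
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')

echo "merge base:   $merge_base"
echo "parent count: $parent_count"
```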

This is literally all there is to git, and the mental model that I use every day, even as I'm doing the most advanced git surgery.

What is a squash merge?

So what is a squash merge? A squash merge is the same as a normal merge, except that it records only one parent commit: the tip of the branch you merged into. It basically slices off a whole part of the git graph, which will later be garbage collected if nothing references it anymore. You're basically losing information for no reason.

Let's look at this in practice. Let's create a few commits on top of the ones we have, and then do both a squash merge and a non-squash merge, and look at the results.



❯ git checkout -B work-branch
Switched to a new branch 'work-branch'
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "Add more"
[work-branch 4b84cfe] Add more
 1 file changed, 1 insertion(+)
❯ echo "Add more" >> foo.txt
❯ git add foo.txt && git commit -m "And more"
[work-branch 1836f1c] And more
 1 file changed, 1 insertion(+)
❯ git checkout -B no-squash-merge main
Switched to a new branch 'no-squash-merge'
❯ git merge --no-squash --no-ff work-branch
Merge made by the 'ort' strategy.
 foo.txt | 2 ++
 1 file changed, 2 insertions(+)
❯ git checkout -B squash-merge main
Switched to a new branch 'squash-merge'
❯ git merge --squash --ff work-branch
Updating 16660f8..1836f1c
Fast-forward
Squash commit -- not updating HEAD
 foo.txt | 2 ++
 1 file changed, 2 insertions(+)
❯ git commit
[squash-merge 150c57d] Squashed commit of the following:
 1 file changed, 2 insertions(+)

Let's look at the resulting graph and commits.



❯ git log --graph --pretty=oneline --abbrev-commit --all
* 150c57d (HEAD -> squash-merge) Squashed commit of the following:
| * 535b740 (no-squash-merge) Merge branch 'work-branch' into no-squash-merge
|/| 
| * 1836f1c (work-branch) And more
| * 4b84cfe Add more
|/  
* 16660f8 (main) Add more
* 02a154b Initial commit
❯ git cat-file -p no-squash-merge
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
parent 1836f1c53221ae701a038bf5ae380770ea911665
author Manuel Odendahl <wesen@ruinwesen.com> 1653304391 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653304391 -0400

Merge branch 'work-branch' into no-squash-merge

* work-branch:
  And more
  Add more

❯ git cat-file -p squash-merge   
tree 58c1fb22faa444b264e98a5ae4c4ddb07be09697
parent 16660f8b1d1538ed1b55d8533b3ee7feb68e474c
author Manuel Odendahl <wesen@ruinwesen.com> 1653304543 -0400
committer Manuel Odendahl <wesen@ruinwesen.com> 1653304543 -0400

Squashed commit of the following:

commit 1836f1c53221ae701a038bf5ae380770ea911665
Author: Manuel Odendahl <wesen@ruinwesen.com>
Date:   Mon May 23 07:11:08 2022 -0400

    And more

commit 4b84cfe11aa51da994448e602e1bc4cc6083d691
Author: Manuel Odendahl <wesen@ruinwesen.com>
Date:   Mon May 23 07:11:03 2022 -0400

    Add more

You can see that both squash-merge and no-squash-merge point to the exact same tree. The only differences are the commit message and the missing parent in the squash merge.
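
If you'd rather check this with a script than by eyeballing hashes, here is a self-contained sketch that rebuilds a similar situation in a temporary directory (the hashes will differ from the ones shown in this post) and compares trees and parent counts:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
git branch -M main

git checkout -q -b work-branch
echo "Add more" >> foo.txt
git commit -q -am "Add more"
echo "And more" >> foo.txt
git commit -q -am "And more"

# Regular merge commit.
git checkout -q -b no-squash-merge main
git merge -q --no-ff --no-edit work-branch

# Squash merge: stage the combined result, then commit it by hand.
git checkout -q -b squash-merge main
git merge -q --squash work-branch
git commit -q -m "Squashed work-branch"

merge_tree=$(git rev-parse 'no-squash-merge^{tree}')
squash_tree=$(git rev-parse 'squash-merge^{tree}')
merge_parents=$(git cat-file -p no-squash-merge | grep -c '^parent ')
squash_parents=$(git cat-file -p squash-merge | grep -c '^parent ')

echo "trees:   $merge_tree / $squash_tree"
echo "parents: $merge_parents / $squash_parents"
```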

To read more about the underpinnings of git, I can recommend just experimenting with the git command line, and the following resources:

But the history!

But Manuel, you say, the history is so much cleaner!

To which I counter that it actually is not. If you want to hide the link to the right parent of the non-squash merge (as it is called, the left parent being main), all you need to do is hide it. If you use the command line or a proper tool, use the option to only show first parents. If you only look at the first parent, and configure your git tool to fill a full log of the branch into the merge commit message (I personally use the GitHub CLI gh or some git commit hooks to do it), the squash merge commit looks identical to the non-squash merge commit.
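
One way to get there with plain git, sketched on a throwaway repository: git merge --log adds a short log of the merged commits to the merge message, and --first-parent hides the side branch when you read the history. This is only one option, and it is an assumption of this sketch rather than the gh or hook setup mentioned above.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
git branch -M main

git checkout -q -b work-branch
echo "Add more" >> foo.txt
git commit -q -am "Add more"
echo "And more" >> foo.txt
git commit -q -am "And more"

# --log copies one-line summaries of the merged commits into the merge message.
git checkout -q main
git merge -q --no-ff --log --no-edit work-branch
merge_message=$(git log -1 --format=%B)
echo "$merge_message"

# Only the mainline is shown; the side branch stays in the graph but out of sight.
git log --first-parent --oneline
first_parent_count=$(git rev-list --first-parent --count HEAD)
all_count=$(git rev-list --count HEAD)
```

Setting merge.log to true in your git config should have the same effect as passing --log every time.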

A favorite git log command of mine to quickly look at the history of the main branch, and create a changelog:



> git log --pretty=format:'# %ad %H %s' --date=short --first-parent --reverse
# 2022-05-23 02a154bc4f0fa9bca567676d45d136619c076a95 Initial commit
# 2022-05-23 16660f8b1d1538ed1b55d8533b3ee7feb68e474c Add more
# 2022-05-23 535b740f42e331175f3766c1374116e329a78f7e Merge branch 'work-branch' into no-squash-merge

When using GitHub and pull requests, this will show author, branch name (which would contain ticket name and short description in my case) and date on a single line. Here's a slightly more complex real-world example (anonymized):



  
  
2021-12-15 123 Merge pull request #5937 from garbo/TK-234/feature-1
2021-12-16 234 Merge pull request #5938 from bongo/TK-235/feature-2
2021-12-16 456 Merge pull request #5939 from gingo/TK-236/feature-3

But why?

But Manuel, why keep all those commits lying around when we have all we need in the commit message?

One reason comes down to just preference. I like to see the actual log of what a person did on their branch. Did they do many small commits? On which days (this might make looking up documents or Slack conversations related to the work easier)? Did they merge other branches into their work (useful when resolving merge conflicts and other boo boos)?

I have done a lot of git cleanup work, and while they are not supposed to exist, big merges with thousands of lines happen, and having a single monolithic commit that contains 80 different changes is a nightmare.

The other one is what actually makes the side history extremely useful. When hunting down a bug, I often use git bisect. I first use git bisect start --first-parent to jump from main commit to main commit. But once I have found which pull request led to the bug, I bisect on the original branch. Instead of having to figure out which line in the pull-request merge might cause the bug, I have a much more granular path. Often, it surfaces a single-line commit, and leads to a painless and immediate bugfix.

As you can drive your bisect with your unit tests, you often have no work to do at all, given sufficiently atomic and small commits on side branches. Losing that capability would seriously impact my sanity when I have to fix bugs.
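
Here is a sketch of that two-pass bisect on a toy repository. Everything in it is made up for illustration: the file app.txt, the commit messages, and the "test", which is just a grep for the string BUG. It also assumes Git 2.29 or newer, where --first-parent is accepted by git bisect start.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo base > app.txt
git add app.txt
git commit -q -m "Initial commit"
git branch -M main
git tag known-good

# A feature branch whose second commit introduces the bug.
git checkout -q -b feature
echo "step 1" >> app.txt
git commit -q -am "Feature step 1"
echo "BUG" >> app.txt
git commit -q -am "Feature step 2"
buggy_commit=$(git rev-parse HEAD)
echo "step 3" >> app.txt
git commit -q -am "Feature step 3"

git checkout -q main
git merge -q --no-ff --no-edit feature
merge_commit=$(git rev-parse HEAD)
echo later > other.txt
git add other.txt
git commit -q -m "Later work on main"

# Pass 1: walk only the mainline to find the offending merge.
git bisect start --first-parent HEAD known-good >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null
first_bad_on_main=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

# Pass 2: bisect inside the merged branch, between the merge's two parents.
git bisect start "${merge_commit}^2" "${merge_commit}^1" >/dev/null
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null
first_bad_on_branch=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "bad merge:  $first_bad_on_main"
echo "bad commit: $first_bad_on_branch"
```

With a squash merge the second pass has nothing to work with: the search stops at the one big commit.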

Conclusion

And that is why squashing history is harmful. It's literally just deleting information from the git graph by dropping a single parent entry from the merge commit.

Oldest comments (80)

Manuel Odendahl • May 23 '22 • Edited

If you have a good reason to squash commit, please post it here. But I don't think you have.

ecyrbe • May 23 '22

I have some pretty good reasons to squash my commits when working with a team:

Avoid the rebase nightmare where I need to fix the same conflicts for each commit.
Avoid the revert nightmare when reverting a faulty merge.
Easily remove unneeded git history (essentially bug-hunting commits, trial-and-error commits). And don't tell anyone not to commit unfinished work. Git was created to allow and encourage it.
Your development process (a bunch of test red, impl test green, refactor) has nothing to do with your product history. What matters is the feature history. And that's what I want to see in the product git history.

Manuel Odendahl • May 23 '22

There's a couple of tricks to have an easier time rebasing:

You can avoid "rebase nightmare" by using git rerere. It records how you resolve conflicts and allows you to replay it.
For "reverting", you just need to checkout the tree to the state that you want to revert to, and make an appropriate commit message.

I always think "tree state" first, more so than caring about individual commits. I can always link up the graph by manually putting in a parent link when merging, if I do want to show what happened in the history.

Realizing that git only cares about file contents, not diffs or commit or patches, really freed up how I can navigate "complicated" issues.

For the last two points you raise, my approach is to use --first-parent or similar flags to just look at the part of the history i care about (usually, one commit per ticket on the main branch) and link it up to product features (the ticket themselves). No need to squash.
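
A rough sketch of the rerere flow, on a throwaway repository with a deliberately conflicting one-line file (the names are placeholders). The conflict is resolved by hand once, the merge is thrown away, and the second attempt should pick up the recorded resolution:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false
git config rerere.enabled true               # record and replay conflict resolutions

echo base > f.txt
git add f.txt
git commit -q -m "Initial commit"
git branch -M main

git checkout -q -b topic
echo topic > f.txt
git commit -q -am "Change on topic"

git checkout -q main
echo main > f.txt
git commit -q -am "Change on main"

# First attempt: the merge conflicts, and we resolve it by hand.
git merge topic >/dev/null 2>&1 || true
echo resolved > f.txt
git add f.txt
git commit -q -m "Merge topic (conflict resolved by hand)"

# Throw the merge away and try again: rerere replays the resolution.
git reset -q --hard HEAD~1
git merge topic >/dev/null 2>&1 || true
replayed=$(cat f.txt)
echo "f.txt after second merge: $replayed"
```

The file content is restored, but by default you still have to git add and commit it yourself.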

Ingo Steinke, web developer • May 23 '22

Tell that to GitHub, they seem to have made squash commits the new default. Not possible to make merge commits in their web UI anymore, and that sucks!

Manuel Odendahl • May 23 '22

I think you can with an option?

Ingo Steinke, web developer • May 23 '22

It used to be possible, but in practice, it is always grayed out, and this does not seem to be the result of a conscious decision by the project maintainers.

Create a merge commit: Not enabled for this repository
Squash and merge
Rebase and merge: Not enabled for this repository

Manuel Odendahl • May 23 '22

I think it has to be enabled in the repo settings. But now that git bisect and git blame all support --first-parent, I really don't see the point anymore. Maybe save some space because the intermediary blobs are garbage collected, but that kind of only makes sense on big public repositories like linux. And even then, people maintain different repositories that still keep the individual history.

lukens • Jun 10 '22

My understanding is that GitHub also has some kind of hidden tag on the PR branch, so you can still view it on GitHub after it is squashed, so it presumably doesn't save any space for them.

Simon Egersand 🎈 • May 23 '22

I assume you are referring to the "Squash and merge" option on GitHub? If so, yes I 100% agree with you.

On the other hand, if you mean devs should not squash and rebase before pushing a PR, then I disagree.

PS. Some of the formatting in your post is off :) Around the last example with git

Manuel Odendahl • May 23 '22

by squash you mean collapse all commits into a single one? because i think that's wrong :)

Manuel Odendahl • May 23 '22

I do think spending some time in git rebase --interactive (or magit in my case) makes a lot of sense, however.

Simon Egersand 🎈 • May 23 '22

Yeah, totally agree!

Simon Egersand 🎈 • May 23 '22

No, that's not what I mean. I was confused if that was what you meant. I guess we're on same page :D

lukas1 • May 25 '22

It's not as bad. Provided you keep reference to the PR number in the commit message. Luckily Github includes PR number into the commit message when merging automatically and in the github history it will even create a link directly to the PR. That way one does not lose the history of the PR itself, should anyone really need it.

It worked well with one team I was involved in.

Teaching good commit practices and using git to its full potential is doable, when majority of the team is already good with it and only some developers need help, if the whole team has problems with that, it's not so easy, the option to squash merge saves a lot of time.

Also helps to get rid of nasty merge commits of merging main branch into feature branch, if github is setup so that it requires the feature branch to have latest changes from main branch (which should be required). Rebasing would be preferable, but it's not as comfortable, because it will require new approval from your team, if your protected branches rules require an approval before merge (which it should).

Simon Egersand 🎈 • May 23 '22

Yeah, me too. This is one of (if not the most) important practices you should do to make the PR review process quick. A quick PR review process in turn speeds up the shipping of code => business moves faster => greater chances of success for the company.

Always make your commits nice before asking for a PR review!

Manuel Odendahl • May 23 '22

do you look at individual commits when doing a review? because the diff view shown is just the comparison of the trees, the history itself is irrelevant.

Simon Egersand 🎈 • May 23 '22

I do look at individual commits of the PR, yeah. Sometimes it makes sense to split up a task into multiple commits, or include refactor work, and that work should not be combined IMO.

Manuel Odendahl • May 23 '22

I agree. I'm not the greatest at this (often solo dev on things), but it's a good skill to know.

arthurolga • May 23 '22

I don't like Squash. If you are working with good commits, there should be few per PR; if you have something around 10 commits on a feature, it probably should be separated into smaller tasks.

If the developer wants to aggregate two or more commits, like if they refactored some part of the code, BEFORE MAKING THE PR, then I totally like Squash.

If you have small commits with good names, it probably is better to just Merge than Squash and Merge, e.g.

Commit 1: Add DatePicker component to App
Commit 2: Make API call for Date Service on UserScreen
Commit 3: Make API call for Date Service on PostScreen

If something is breaking, it will probably be easier to see in which commit, also allows for better cherry picking.

Christian Kozalla • May 23 '22

I'm using GitLab at work and we are using the Squash-and-Merge option of GitLab.. I don't know how that compares to the way GitHub is doing it. But I also don't know how Squash-and-Merge compares to manually squashing, pushing and then opening a PR

We, as a team, are using Squash-and-Merge because one single feature will be mapped to one single commit. But I suppose the same is possible with unsquashed merge commits..

Manuel Odendahl • May 23 '22

Only one way to know! look at the graph, and use git cat-file to look into the internals.

Magnus Markling • May 23 '22 • Edited

If squashing means losing too much information, then your PRs are probably too big to begin with. Imho it's a code (or process) smell that should be brought to attention asap.

As for looking at what the developer did in their branch, I tend to think the PR should speak for itself. How we got there is not important. Unless you're also prepared to spend a lot of time cleaning up your branches before sending PRs. (Time you could possibly spend making many small PRs instead.)

Nice trick for the CLI tools with "first parent"! I was not aware it even existed. Unfortunately it's not available in most graphical tools that I'm aware of, so those users will be stuck with the "ugly" history.

Manuel Odendahl • May 23 '22 • Edited

I had a long conversation about that with other developers, it was very interesting, and I plan to write about "big evil merges" in the future.

Situations where big branch merges might happen (for valid reasons, imo):

merging relatively independent projects (in the context of a monorepo, for example)
wide "rip off the bandaid" refactor (especially type-system / compiler driven refactors)
having to merge shitty code from someone who left / from external contributors over whom you only have so much control
slow PR / merge cycles (can have many reasons: reviewers are scarce, QA is a bottleneck)
overall politics: management thinks PRs are a waste of time, crunch time

In general, I'm not a fan of "in a perfect world you wouldn't need more information" arguments.

In my experience, even small clean PRs can benefit from having a granular history, say when git blaming something 3 years down the road.

As for UI tools, I use magit / sourcetree / intellij's history browser, I'm sorry if other tools don't support it :/

I wish all tools supported --first-parent, because the (valid, because tool friction matters) reason is "my tool doesn't know how to display the information i want, thus i have to lose context for it", arguing that "merges make the history sloppy" is just a cop-out, it's just not true. I think one reason for that is that many developers don't know how git internally works, and thus have a warped understanding of what the history is. Git's CLI tooling really doesn't help here.

Jack • May 24 '22

I used to insist devs squashed/rebased/etc. their commits before opening a PR and then use rebase-merge to merge the PR into main.
Over time I've learned the value of a squash merge. If a PR is too big to be able to describe in one commit message, or too complicated to understand from looking at the diff, then you're doing too much in one go.
Squash merging PRs is absolutely fine if your branch has nothing but work-in-progress commits. If you feel like you're losing something by squashing, then you need to rethink your process...

Manuel Odendahl • May 24 '22

So you are saying to only do pull request that have the size of a single commit?

Jack • May 24 '22

As a (very) general rule, yes. You should be able to understand a change based on a single commit message, yes.
Obviously sometimes you have a big feature that can't be released piece-by-piece. In that case I would have a feature branch, and then individual branches off that. You PR (with squash commits) each smaller piece of work into the feature branch, and then at the end merge (not squash) the feature branch in. You have a history of all the pieces of work done, but not all the useless wip commits that don't actually tell any kind of story...

Manuel Odendahl • May 24 '22

That makes sense. I see a pattern emerging here. I think that usually, when we "argue", we often are actually solving the same problem, often in the same way, but with different words.

I operate under the premise that your branch history is meaningful, and has relevant commits. If you do a ton of WIP commits, I would question why you would do a "WIP" commit in the first place, because squash merge or not, you are robbing yourself of helpful history while developing your feature already. I also heavily use interactive staging (staging individual hunks), both for "pseudo review", and to split up my work in proper chunks, with git commit hooks validating at every step of the way that my tests run. If I still manage to make a mess (say, I'm tired, or in a rush, or just frustrated), I will often spend the time to go back with interactive rebase and eliminate the junk either with squash / revert+squash or plain delete, more rarely split up bigger commits into smaller ones.

What you describe in your workflow above is, to me, basically what I am achieving by keeping side branch history. I would say, as someone who often had to merge dirty crap branches, I do like to keep the WIP commits anyway, because they give me an insight into what someone was trying to do, what their cognitive style is, what they were struggling with, to be able to assist them better.

But let's let things speak. I recently merged a "big" commit, setting out to build a feature that led me to start introducing typescript annotations. We are a fairly fast-moving two-dev team and reasonably trust each other, so the other dev was fine with keeping both the typescript introduction and the actual feature in the same PR. Here's my history in this case:

# 2022-05-17 597679dd482ca990cffb5fe73bbd91108163d4cc :art: Psalm fixes for Sql and OrdersSplits in tadmin
# 2022-05-17 9f67f64c17d0f4fb9e7f9f86d04a1c56770b5ad8 :art: Start adding some API typescript to tadmin
# 2022-05-17 51dedf0360364369bd873d65476185fab8e4e97c :sparkles: :zap: Faster items summary query (still not instant)
# 2022-05-17 a776a961952511ea756a999949d8a5e49fbcc37c :zap: Make it even a bit faster. Computing links is slow.
# 2022-05-17 025ef30f90e429cf4cc3d882c8b56b4dca9cd7f6 :zap: Make productQuantities computation even faster by getting managestock/isVirtual up front
# 2022-05-18 dc4d4b4584f954fbc8c3bb8633afd7c74e45b37c :art: Fix intellij code style at least roughly
# 2022-05-18 c2f5e98983121fcd7c8e42f944f5eba62de2ca69 :art: Use transients for image and permalink in getOrdersSummary
# 2022-05-18 9bba2793b0b8eaa503233948298c8d0f5a4a832b :art: Introduce RowType for Table/useTable, split out into files
# 2022-05-18 96eb7fd43777a2f74d3fe333e31d6fdd4b8254b8 :art: Fighting some odd import weirdnesses, gonna stop tweaking for now
# 2022-05-18 143fde48f4783296123a308578c1ce320ea9c575 :art: Start adding type to ManageOrders useTable
# 2022-05-19 1782ef761a5cc82e884ff7fbec33654f8b19b805 :tractor: Move api types to shared code
# 2022-05-19 dd3bdbadc69c0f5932d24fb09c467415d9b1289e :sparkles: Add more typing to useTableApi
# 2022-05-19 c66c30129283b995420f1f69a4113d0f19a53b4d :art: :tractor: Split out the different types of summary bars
# 2022-05-19 93f3ee27d111ebe6f1a114620e3e722db1f85ecf :ambulance: Fix proper backend query, remove permalink from sideview
# 2022-05-20 37578d82a18da86b42fe2932a72c52dfff3f8604 :art: First attempt at latching on to triggerSearch
# 2022-05-20 d57be0a0e88c8af0a8c9f2a7c2dabfcecfaeaf7a :art: Remove error_log
# 2022-05-20 63a77c1515e8a866dcc3c4d9c1322a2cd3b43035 :art: Remove error_log
# 2022-05-20 5420e6a64499cb812fd44566205cf8abebebfdd9 :sparkles: Proper orders summary debouncing
# 2022-05-20 883ecc39b38ed34ce47ead9036f993a263050bf3 :art: Cleanup dependency handling to avoid expensive backend calls
# 2022-05-20 d24292be1df37cba702c54f982f1630cb2f3a42a :ambulance: Fix useEffect eslint check
# 2022-05-20 cf70860c5f46e2725c0139e33ddcada656ff3112 :art: Fix prettier changes
# 2022-05-20 e067d0ddbf508501970ad0578fdd9e60b37a68e3 :ambulance: Fix type annotations
# 2022-05-20 54c70047ad5e3f6858799c6faaefe027d112b330 :sparkles: Measure and return performance measures
# 2022-05-20 0731046f579e441c5ed96292199e82df0c4d1c1a :poop: Bunch of performance measurement and debug logging
# 2022-05-20 37ae6f0afccf868d829103e45acfee2f069df744 :ambulance: Fix eslint warnings
# 2022-05-20 da06324156213dc6524adc53fc06708e3b1e5073 :ambulance: Fix PHP initialization of start_time
# 2022-05-20 858db0142afc2c21ee90b3a08fb557938877bc33 :art: Fix loading indicator
# 2022-05-23 7731a0a694b617585c51325442048491beea8998 :sparkles: :lipstick: Add checkbox to enable getting all orders summary
# 2022-05-23 2e89e73f96a772de978cd8228b44a34532794904 :art: Undo unnecessary stuff
# 2022-05-23 22c850563b8937951d1b3a46d19d6857222f14be :art: Use a single php-cs-fixer config
# 2022-05-23 8aa2b0ed201a55b196fb169e8be921d30ce9aa0e :zap: :art: Cache thumbnails for 24h
# 2022-05-23 cbd6fce9f072bb091197cc2fcec9063c7207d5db :pencil: Slight whitespace adjusts
# 2022-05-23 e74df0d3a02679f2c98b2f24c0245245e5956be6 :ambulance: Fix php-cs-fixer
# 2022-05-23 bd1e34d466495d86c1f53510be0936df909445f8 :art: Make linkbutton clickable, fix markup
# 2022-05-23 fafaeb6644f61e2b6ed1df475a2b6266e2eaf425 :art: Remove logging entries

Those are all valuable commits to me that I would like to keep for the long run. Maybe I'll figure out that the reason a certain DB query doesn't work anymore is not say, the API change, but actually the "Fix php-cs-fixer" commit. Of course I could make a separate PR for "fix php-cs-fixer", and then again for "Make linkbutton clickable", and then again for "Remove logging entries", but then we end up where we started, except with a lot more PRs and CI runs.

Manuel Odendahl • May 24 '22 • Edited

You'll note I use gitmoji, which I also find very useful, as I can at a glance recognize what the reason is behind commits. To show the graphical view:

Manuel Artero Anguita 🟨 • May 23 '22

Hi Manuel, I'm Manuel.

you got me with this:

The thing is that my opinion is the correct one

Above all considering that my opinion is the correct one. Kidding.

Nice post, I just disagree, squashing is fine. But I can see your points. In the end it's a tool and sometimes will be handy sometimes won't.

Manuel Odendahl • May 23 '22

Do you use squashing because you want to have a "clean" history per default? Or do you have other reasons?

Manuel Artero Anguita 🟨 • May 23 '22

IMO too much information leads to disinformation. Checking the actual "WIP" commits from a feature branch is a "thin grain info" I've never-ever required.

Cleaner history is , yep, the main reason.

But actually I've faced another issue in the past; there was this repo 15+ years old in my company, with hundreds of committers through the ages, and commits in the order of n * 100.000. Dealing with this repo was a challenge actually! too much useless info at .git/ folder. What I'm trying to say is that "thin grain" info do weigh. Of course you need to reach those numbers.

Manuel Odendahl • May 23 '22

A lot of people bring up "WIP" commits. Do you often do WIP commits? I personally rarely do (I do get frustrated and use the 💩 emoji, as I use gitmoji, but still make meaningful commits). But the point of the article was that you can easily hide all that information and focus on what you need.

As for historical gits, I wonder if people here ever did a git "cleanup" where most of the ancient state gets culled, and just the last few years are kept. Cruft does indeed accumulate.

Luke Inglis • May 26 '22

Just adding a little perspective, I don't think I've ever worked on a team with anyone who didn't use WIP commits. I work on a small team and there is a lot of context switching that needs to happen and 'finishing' a commit before switching to something else just isn't an option.

Manuel Odendahl • May 26 '22

Interesting. I have the opposite experience. I use git stash in those cases, or do a git rebase --interactive to clean things up later.

lukens • Jun 10 '22

I don't like the idea of most of the ancient state being culled.

The codebase at my current job has a cutoff from when it was moved to git, and there's even less hope of finding out why something was done for code that predates that than there is for the rest of the codebase.

Maybe I've always worked at the wrong places, but I've never been in a place where I wish there were fewer commits in the history, but a lot of the time I have wished there were more commits (often when trying to review code), so that I had a finer grain insight into why a particular line of code was written, and what else was changed for the same purpose.

I'm 100% with you that losing this information is nothing but a bad thing. I find that even the worst git commits tend to provide the best and most accurate and up-to date documentation of the code; it amazes me that so many people choose to throw that away!

Rasmus Schultz • Feb 17 '25

I don't like the idea of most of the ancient state being culled.

Me either.

No need to clone the entire history if you don't need it though?

git clone --depth 100, should be fine?

That way, you're only cloning what you need - and you're not throwing anything away, so if you do need the full history, you can always fetch that later.
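
A small sketch of that, using a local throwaway repository as the "remote" (the paths, the five commits and the depth of 2 are made up for the example):

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Build a small source repository with five commits.
git init -q source
cd source
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false
for i in 1 2 3 4 5; do
  echo "change $i" >> log.txt
  git add log.txt
  git commit -q -m "Commit $i"
done
cd "$work"

# Shallow clone: only the two most recent commits are transferred.
git clone -q --depth 2 "file://$work/source" shallow
shallow_count=$(git -C shallow rev-list --count HEAD)

# Later, fetch the rest of the history on demand.
git -C shallow fetch -q --unshallow
full_count=$(git -C shallow rev-list --count HEAD)

echo "commits before: $shallow_count, after: $full_count"
```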

Charles Robertson • May 30 '25

@manuartero Sorry, but I fundamentally disagree with your statement:

too much information leads to disinformation

Not, if you understand how Git works and how to read this information correctly.
Once you delete information, via an Interactive Rebase, you lose it forever, which could be disastrous, when trying to track bugs, later on.

Charles Robertson • May 30 '25

I feel that squashing is for people who want to package a group of commits into one commit so that we have one commit per PR. To me, this is more of a psychological problem. And anyway, there are better tools to manage this flow, like Gerrit.
I really don’t have a problem seeing several commits per PR. In fact, I prefer the fact that a PR is split up into smaller chunks for code review. And I can also feel assured that I still have a true audit history.
This is why I never use interactive rebase.

Mike Martin • May 23 '22

We occasionally need to revert a feature from a release branch if it fails UAT. While this isn't too common, It's much easier to do this if the feature is squashed into a single commit (pre-push). Yet to see much of a downside.

Manuel Odendahl • May 23 '22

You can just use git revert -m1 to revert a merge commit relative to its first parent (which is what reverting the squashed commit would do). -m1 says "treat the first parent as the mainline", aka the one that a squash merge preserves.

git revert really just applies the inverse of a commit's changes to the checked out tree, and prepares a pretty commit message. It doesn't really have much to do with the history itself. If what you want is the exact tree of an earlier commit, you could pretty much get the same result by doing this from the repository root (haven't fully tried this out, just a sketch):

git restore --source="${hashyouwanttorevertto}" --staged --worktree -- .
git commit -m "Revert XXX"
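
For the -m1 case, here is a small self-contained sketch on a throwaway repository (branch and file names are placeholders). After reverting the merge, the tree should be back to exactly what the first parent had:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git config commit.gpgsign false

echo hello > foo.txt
git add foo.txt
git commit -q -m "Initial commit"
git branch -M main

git checkout -q -b feature
echo "part 1" >> foo.txt
git commit -q -am "Feature part 1"
echo "part 2" >> foo.txt
git commit -q -am "Feature part 2"

git checkout -q main
git merge -q --no-ff --no-edit feature
tree_before_merge=$(git rev-parse 'HEAD^1^{tree}')

# Revert the whole merge relative to its first parent (the mainline).
git revert -m 1 --no-edit HEAD >/dev/null
tree_after_revert=$(git rev-parse 'HEAD^{tree}')

echo "before merge: $tree_before_merge"
echo "after revert: $tree_after_revert"
```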

tzwel • May 23 '22

awesome article, right from the beginning it was obvious you know what you are doing, keep it up
