Ben Halpern

Posted on Feb 26, 2018

Git is one of the most brilliant pieces of software ever written...And other opinions on git

#git

A little while ago I started this discussion on the topic of git:

Is git the be all and end all of version control?

Ben Halpern ・ Oct 9 '17

#discuss #git

I was curious about whether git truly was the winner across all use cases. My thought was that, for some use cases, certain features of git amount to unnecessary complexity. I wasn't overly opinionated, but I thought it would be a useful conversation to have.

Some of the responses to the thread addressed the question head on, but many of them were simply an opportunity to express the history and brilliance of git. It was a great read and I want to share some of it.

leob • Oct 10 '17

Git is, in my opinion, one of the most brilliant pieces of software ever written (I'd go as far as saying it's Linus' biggest achievement, rather than Linux). It's an example of software that's so well designed that it keeps surprising you (well me, at least) by its versatility and performance.

By now there aren't any serious competitors left, and maybe that's what's prompting you to ask "aren't there alternatives?". In other areas (backend frameworks, frontend frameworks, operating systems, etc.) there are dozens of alternatives with none of them clearly superior, but Git is indeed dominant, and I'd say deservedly so.

For me it's a breath of fresh air to have at least one area where I'm not bogged down by too many choices with no clear added value.

Add to that the huge installed base (GitHub alone, but also in enterprises) and the huge boost that Git (and GitHub) gave OSS and "social coding" (sharing, contributing), and the conclusion for me is that any alternative has to be clearly superior: not 50% or 100% "better" but an order of magnitude better, otherwise it won't stand a chance.

Joshua Ballanco • Oct 9 '17 • Edited

Git is the classic example of something that was invented to solve one problem, but ended up solving a different, more important problem along the way. For that reason, I do think that Git could (nay, will) eventually be supplanted by something better, something that directly addresses the more important problem that Git solved by accident.

To be more specific: when Linus created Git, his goal was to create a version control system that would be fully distributed. You have to understand, he was targeting the Linux kernel, a project which is extremely widely distributed and for which there are many "sources of truth". In other words, the goal was that the Linux Foundation could host a Git server with history, branches, and tags for the "official" Linux kernel, but Red Hat or Canonical or Debian might also host Git servers with their own branches, history, and tags for kernels that are just as complete as the "official" version.

Most teams, however, don't have more than one "source of truth". Every time you see people complain that "GitHub is down", what they're really saying is "even though Git is fully distributed and capable of supporting multiple 'authoritative' nodes, we've agreed on using GitHub as the de facto central repository we all depend on".

Why is it, then, that so many people use a version control system when they're not interested in its raison d'être? Because along the way Linus solved a smaller, but much more valuable, issue: local versioning.

Young coders may not remember, but back in the bad ol' days of Subversion, if you wanted to save something you were working on so that you could move on to something else, your choices were: a.) save the file locally, then try and walk back the changes when it came time to svn commit, or b.) make sure you are connected to the Subversion server, svn commit, and pray there are no conflicts with anything anyone else was working on.

Actually, there was a 3rd option known as "SVK". On the surface, svk worked much like git does today: you would keep a copy of the Subversion repository locally, to which you could commit as needed, and periodically you would svk sync with the main Subversion repository.

So, I honestly believe that Git "won" not because it is completely distributed, but merely because it allows developers to work locally at will, and synchronize centrally only as needed. Really, this is a natural progression from the earliest version control systems where you would have to actually lock a file on the central server before you could edit it, to Subversion which did away with locks but still required access to the central server in order to version changes. Some new system that focused more on the local developer experience, and simplified the "central repository" situation (since the vast majority of teams will only ever have one central repository), could easily eat Git's lunch.

And in follow-up to Joshua's comment:

Evan Wilde • Oct 10 '17

This is a good response, though I think it misses one fairly important point in the "why git was created" part. BitKeeper, the original VCS for Linux, had extremely restrictive licensing, but was free (as in money) for open source projects. Then, in 2005, BitMover changed the licensing so that open source projects would be charged too. Most developers on the project could have purchased the software, but a separate project at the OSDL (now the Linux Foundation) that infringed the restrictive license meant that nobody working for the OSDL, Linus Torvalds included, could use it.

Linus, in his typical Linus-y way, flipped BitMover the bird and built his own: git, "the stupid content tracker". BitKeeper was distributed and had most of the necessary features, but its license was too restrictive.

Jilles van Gurp • Oct 10 '17

What makes git so genius is the distinction between branches and repositories: there is none. Just because you call your local branch master and I call my local branch master doesn't mean we are both on the same branch. Rebases and merges work the same way against local branches and remote branches. The use of a cryptographic, content-addressed object store is what makes this work: syncing is as simple as exchanging read-only objects.
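To make the "content-addressed object store" idea concrete, here is a minimal Python sketch. It assumes git's default SHA-1 object format, in which a file's id is the SHA-1 of the header "blob <size>" plus a NUL byte, followed by the raw bytes. The dict-based store and the put and sync helpers are hypothetical stand-ins for illustration, not git's actual on-disk storage or transfer protocol.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object id git assigns to a file's content (default SHA-1 object format)."""
    # The id covers a small header plus the bytes, so it depends only on the
    # content, never on the file name, the branch, or the repository.
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


def put(store: dict[str, bytes], content: bytes) -> str:
    """Add content to a toy object store keyed by its id (hypothetical helper)."""
    oid = git_blob_id(content)
    store.setdefault(oid, content)  # storing identical content twice is a no-op
    return oid


def sync(src: dict[str, bytes], dst: dict[str, bytes]) -> int:
    """Copy the objects dst lacks; objects are immutable, so nothing can conflict."""
    missing = [oid for oid in src if oid not in dst]
    for oid in missing:
        dst[oid] = src[oid]
    return len(missing)


if __name__ == "__main__":
    mine: dict[str, bytes] = {}
    yours: dict[str, bytes] = {}
    put(mine, b"hello\n")
    put(yours, b"hello\n")  # same content gives the same id in both stores
    put(mine, b"new feature\n")
    print(sync(mine, yours))  # 1: only the object yours lacked is copied
```

Because ids depend only on content, the same file committed on two branches or in two clones gets the same id, which is why syncing reduces to "send the objects the other side doesn't have". For example, git_blob_id(b"") returns e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the id git gives an empty file.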

So, even though teams don't necessarily use many remotes, they do use loads of branches, which is why they need Git. There are at least n+1 branches, where n is the number of people, plus whatever feature, production, development, etc. branches people create. On my team that means well over a thousand in the past six months. Some branches are long-lived, some are short-lived.

GitHub is just a convenient way to synchronize changes. Most teams use it incorrectly by allowing multiple individuals to write to the same repository instead of cloning and creating pull requests.

There is still plenty of room for improvement. Git is not very usable for non-experts. A lot of the complexity has to do with the many ways merges and rebases can fail to work as expected. One limitation here is that a merge commit typically has only two parents. Git actually allows for more than two (the so-called octopus merge), but everyday workflows rarely use this. There have been some alternatives where this is used to track changes from multiple branches.
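On the point about merge commits with more than two parents: in a commit object each parent is simply one more "parent" line, so nothing in the format limits a merge to two. The Python sketch below serializes a simplified commit (real commits can carry extra headers, such as signatures) and walks a hypothetical history in which one merge commit has three parents. The tree id, parent ids, identity string and branch names are made-up placeholders.

```python
import hashlib


def commit_object(tree: str, parents: list[str], ident: str, message: str) -> tuple[str, bytes]:
    """Serialize a simplified commit object and return (object id, raw bytes)."""
    lines = [f"tree {tree}"]
    # One "parent" line per parent: two for an ordinary merge, more for an octopus.
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    raw = f"commit {len(body)}\0".encode() + body
    return hashlib.sha1(raw).hexdigest(), raw


def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """Every commit reachable from `commit` (itself included) via all parent edges."""
    seen: set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


if __name__ == "__main__":
    oid, raw = commit_object(
        "0" * 40,                               # placeholder tree id
        ["a" * 40, "b" * 40, "c" * 40],         # three placeholder parent ids
        "A U Thor <author@example.com> 0 +0000",
        "Merge branches a, b and c",
    )
    print(raw.count(b"\nparent "))  # 3
    # Hypothetical history: three topic branches off "base", joined by one merge.
    history = {"base": [], "a": ["base"], "b": ["base"], "c": ["base"], "merge": ["a", "b", "c"]}
    print(sorted(ancestors("merge", history)))  # ['a', 'b', 'base', 'c', 'merge']
```

The single merge commit pulls the history of all three branches into view, which is the property the comment suggests could be used more widely to track changes from multiple branches.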

Another area where Git struggles is larger code bases. The Linux kernel is not small, but there are examples of big companies for which git just isn't good enough to track everything they keep in one repository. Google, for example, does this (not using git). So scale is an issue. And even though git is distributed, it is not sharded, which imposes upper limits on what can go in a single git repository. The model has always been that you need the entire repository locally to be able to work with it. Wouldn't it be nice if you only needed a subset of it to be able to work? IPFS copies some of git's design and stores content-addressed objects in a distributed FS. Wouldn't it be nice if somebody rebuilt git on top of that?

So, there's plenty of room for improvement in the Git world.

This comment thread addressed the question more directly:

Sergio Daniel Xalambrí • Oct 9 '17 • Edited

I think Git is simple enough to build tools on top of it (like GitHub) that extend it.

Example: pull requests aren't a feature of Git, but nobody would use a version control platform (GitHub, GitLab, etc.) without that feature, which was built on top of Git branches.

So instead of rebuilding Git from the ground up, I think it's better to improve the tools built on Git, maybe simplifying the flow so it's harder to mess up.

Ben Halpern • Oct 9 '17

That's reasonable. Git could be a bit like the assembly language of version control: whatever abstractions you build, they compile down to git for compatibility and to leverage the immense work put into it.


Zack Philipps • Oct 10 '17

Git Flow, anyone?

It's worth questioning whether we can improve on any bit of software, but it's also exciting to marvel at the power and stability of some of the tools we work with. This discussion left me with a higher appreciation of git and I hope you enjoyed reading about it.


Ghost • Feb 26 '18

Once I was introduced to git, I used it for everything, the way I use Emacs for everything (though I have yet to embrace the Magit porcelain). While I don't know if the benefit would outweigh the effort, I wouldn't mind seeing a decentralized, Mastodon-style online solution for git. I don't mind GitHub, but I much prefer Bitbucket to back up, organize and otherwise manage my own repositories. A federated network would be ideal for sharing and connecting with projects while using one's chosen flavour.

Mike Lloyd • Mar 12 '18

I would agree, a federated network would be nice. It's also a nightmare I'm not sure many people would be willing to take on. Could be interesting to see, though. :)

Personally, I run a private Bitbucket server as I prefer the interface and features (read: cheaper), but I will keep GitHub up to date as GitHub is the Facebook of developers.

leob • Feb 27 '18

I think that what makes Git so great is its flexibility, but that this flexibility is based on a very small number of well-conceived concepts. All of its capabilities, simple and not so simple, are built on top of that small set of concepts - no matter how many commands there are its basis (core) remains simple.

And have you ever thought about how extremely reliable Git is? It's possible to "lose" a change by making a blatant mistake but I've never encountered a situation where something was lost or went wrong because of the software itself. It's a piece of software that works, and works extremely well.

When there's criticism of Git it's most often targeted at ease of use, the commands (naming etc) not being intuitive, etc. However the number of commands that you'll actually use day to day is small and it doesn't take a lot of time to get familiar with them.

Ever noticed how the git command line even guides you with helpful messages about what "git reset HEAD <file>" and "git checkout -- <file>" do?

Personally I use bash aliases a lot so I just type 2 or 3 letters to issue the most used commands.

One thing I'll grant is that some concepts are quite confusing: after using Git for years, I still get confused about the difference between "merge" and "rebase". On the other hand, in 99% of (basic) use cases you can get by with basic commands; I'm not even using most of the 'power tools' in Git (advice for novices: just stick with 'merge' to stay on the safe side).

Maybe they should add a "--novice" or "--expert" switch to the Git command line?

Mike Lloyd • Feb 27 '18

When I see git, I see several challenges.

1. Usability. I've used git for years with libgit2, git2go, various Python libraries, etc. Git is very complicated, there is no easy way to use it programmatically, and the CLI isn't always straightforward either. git pull upstream master doesn't have the same syntax as git checkout upstream/master. There are plumbing commands and porcelain commands: when do you use which, and which one solves the problem at hand? These things may be evident to me, but to younger devs? It's black magic with a side of voodoo. I actually got my current job by helping someone get unstuck from a git problem (no joke, ask me about it sometime)...
2. No centrality. Git succeeds over Subversion because of its lack of centrality, but while we've all agreed on GitHub as the OSS standard, what happens when GitHub is unavailable? Granted, this problem exists for Subversion too, but I'd like to see a multi-master, peered server with consistency to help solve it. That solves one problem and potentially creates others, but availability is important if you communally centralise. In my mind, Git's lack of central control is both its greatest and its most frustrating feature.
3. Workflows/Branching Strategy. This is by far the most opinionated aspect of Git. Everyone has what they believe is the best workflow, and Git really makes zero effort at being opinionated. I've used Git Flow, Rebasing, Release Merges, and others whose names I'm not entirely sure I understand. I understand they all have their place, but for the love of all that is holy, STANDARDISE as a community and tool ecosystem. Yes, I can get someone out of a 3-way conflicting merge, but that doesn't mean I enjoy it. I feel a lot of the Git tooling lends itself to confusing workflows and bad situations because it has no opinion on how things should be done.
4. Checkouts. From what Microsoft says (I believe them, for what it's worth), the Windows codebase is the largest Git repository in the world. Checkouts take HOURS, hurting both developers and build infrastructure. Microsoft had to write a tool called GVFS (Git Virtual File System), which presents a virtual filesystem representing the remote repository, so when a developer checks something out, they only download the files they are working on, not the entire tree. The checkout process shouldn't need to resolve the entire tree, and n-10 should almost be the default to prevent long wait times.
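The on-demand idea in the checkouts point can be sketched as a toy in-memory model: the list of paths is available up front, but a file's contents are downloaded only the first time it is read. This is not how GVFS itself is implemented (it works at the filesystem level against a server); LazyWorkingTree and the fetch_blob callback are hypothetical names chosen for illustration.

```python
from typing import Callable


class LazyWorkingTree:
    """Toy model of on-demand checkout (hypothetical, not GVFS's design):
    paths are known up front, contents are fetched on first read."""

    def __init__(self, tree: dict[str, str], fetch_blob: Callable[[str], bytes]):
        self._tree = tree              # path -> blob id: cheap metadata
        self._fetch_blob = fetch_blob  # hypothetical "download one object" call
        self._cache: dict[str, bytes] = {}

    def paths(self) -> list[str]:
        # Listing files needs no file contents at all.
        return sorted(self._tree)

    def read(self, path: str) -> bytes:
        # Only the first read of a path triggers a download; later reads hit the cache.
        if path not in self._cache:
            self._cache[path] = self._fetch_blob(self._tree[path])
        return self._cache[path]

    def materialized(self) -> int:
        """How many files have actually been downloaded so far."""
        return len(self._cache)


if __name__ == "__main__":
    remote = {"id-1": b"int main(void) { return 0; }\n", "id-2": b"# docs\n"}
    wt = LazyWorkingTree({"src/main.c": "id-1", "README.md": "id-2"}, remote.__getitem__)
    wt.read("src/main.c")
    print(wt.materialized())  # 1: README.md was never downloaded
```

The cost of a checkout then scales with the files a developer actually touches rather than with the size of the whole tree, which is the trade-off the comment is asking for.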

I think Git is a good tool; I don't see it as the end-all, and I definitely think it has a long way to go before it could ever be considered the end-all tool.

Pedro Rodrigues • Feb 28 '18 • Edited

Ok, had to step in.

git pull upstream master means: fetch the master branch from the upstream remote and merge it into the branch I currently have checked out.

git checkout upstream/master means: check out the remote-tracking ref upstream/master, i.e. upstream's master branch as it was when last fetched.

They do different things against different branches: master and upstream/master are two completely different refs. If you can't see this, you haven't grasped git yet.

The second command leaves you in a detached HEAD state, btw.
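The distinction between master and upstream/master can be modelled with a small Python toy in which local branches and remote-tracking refs live in separate tables. It assumes every merge is a fast-forward, so "merging" just moves a ref; ToyRepo, its methods and the remote_commit parameter (standing in for whatever the remote branch points at) are hypothetical and ignore everything else git does.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model of refs: local branches and remote-tracking refs are separate names."""

    branches: dict[str, str] = field(default_factory=dict)         # "master" -> commit
    remote_tracking: dict[str, str] = field(default_factory=dict)  # "upstream/master" -> commit
    # HEAD is either ("branch", name) or ("detached", commit).
    head: tuple[str, str] = ("branch", "master")

    def checkout(self, ref: str) -> None:
        if ref in self.branches:
            self.head = ("branch", ref)
        elif ref in self.remote_tracking:
            # No local branch is created: HEAD points straight at the commit.
            self.head = ("detached", self.remote_tracking[ref])
        else:
            raise KeyError(f"unknown ref: {ref}")

    def fetch(self, remote: str, branch: str, remote_commit: str) -> None:
        # Fetching only updates the remote-tracking ref, never a local branch.
        self.remote_tracking[f"{remote}/{branch}"] = remote_commit

    def pull(self, remote: str, branch: str, remote_commit: str) -> None:
        # pull = fetch + merge into whatever HEAD currently is.
        self.fetch(remote, branch, remote_commit)
        kind, name = self.head
        # Toy assumption: the merge is a fast-forward, so it just moves a ref.
        if kind == "branch":
            self.branches[name] = remote_commit
        else:
            self.head = ("detached", remote_commit)


if __name__ == "__main__":
    repo = ToyRepo(branches={"master": "c1"})
    repo.pull("upstream", "master", "c3")     # like: git pull upstream master
    repo.fetch("upstream", "master", "c4")    # upstream moves on; local master does not
    repo.checkout("upstream/master")          # like: git checkout upstream/master
    print(repo.head, repo.branches["master"]) # ('detached', 'c4') c3
```

After the fetch, master and upstream/master point at different commits, and checking out the remote-tracking ref detaches HEAD instead of switching to a branch.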

mbtts • Mar 5 '18

Disappointing response to a very thoughtful, well-written and accurate critique. Mike Lloyd (the original poster) does understand the difference (read it carefully). The point was that the design presents an unnecessarily high barrier to entry.

Git is very clever, but there is almost no abstraction between the underlying implementation and the user interface (the command line). As another example, some commands are combined into aliases (pull is fetch and merge), some are combined using a flag (checkout -b is branch and checkout), and changing the case of a flag often changes the behaviour in subtle ways.

Pedro Rodrigues • Mar 5 '18 • Edited

I can safely say your first line is copy pasted.

I'm stopping the discussion right here. Git is not a finished product and was never intended to have a proper user interface; git is complex until you grasp it, and that's why you're complaining about git.

I absolutely love git. My response to you is that a 'flag is just the same as running 2 commands', and how is that confusing? Is that the issue? That's syntactic sugar: it makes you type less and do more (it's usually added for things people use a lot), and you can always ignore it.

mbtts • Mar 6 '18

A yardstick/measurement for good software I have found helpful to apply is:

a. Does it make simple things easy?
b. Does it make complex things possible?

I would not score git highly on the first of these two criteria.

To address these two points specifically, I don't feel they stand up to much critical analysis:

"git is complex until you grasp it"

Lots of other software is also complex (in some cases even more complex than git). The issue is not the complexity; the issue is the level of abstraction (or lack thereof) from the implementation.

"flag is just the same as running 2 commands' and how that is"

Lots of languages/tools use syntactic sugar (shortcuts/aliases) and are far more consistent in their approach. There is no problem with adding a flag in order to combine two commands if this approach is applied consistently (and with git it is not as per examples cited).

Being passionate about technology is great - but in my experience the best engineers and developers I have worked with also have:

i. The ability to evaluate tools and technology (pros and cons) rationally with as little personal bias as possible.
ii. Empathy, understanding and the ability to listen to others.

Please note I am not picking on you personally or trying to patronise; it is not simple and I am still working on these skills as well.

I do feel this thread would progress much better if you didn't make baseless accusations (examples below):

"I can safely say your first line is copy pasted."
"you haven't grasped git yet."

Mike Lloyd • Mar 12 '18

Thanks for your comment. The point I made was exactly what you explained: they perform two different actions with different results, but the ambiguity of the commands doesn't help new users, nor is it explained well.

If you can't see this you haven't grasped git yet.

I don't think there's a need to be rude. If you can't see this then you haven't grasped it yet. ;)

Yawar Amin • Jul 6 '19

Re: (2), this seems to be a different problem than what git is trying to solve. What you're describing sounds like a distributed, peer-to-peer git protocol server. That is a problem vastly different in size and scope than a 'dumb content tracker' that git set out to be. It's a problem that really the GitHubs and the GitLabs of the world should be solving, imho.

Re: (3), people can't even standardize on tabs or spaces in this industry, I think you're asking for too much here 😉

Re: (4), Windows codebase checkouts are one thing, but in most real-world projects, checkouts probably are not going to involve traversing gigabytes of files. Even leaving aside that subsequent checkouts will be faster because git just needs to diff files in place, even the initial checkout should not be a big deal for most projects. Unless you work on Windows, of course.

Re: being considered the end-all tool, you're absolutely right about that, git should not be considered the end-all tool, because it never set out to be that.