Discussion on: Is git the be all and end all of version control?

View post

Git is the classic example of something that was invented to solve one problem, but ended up solving a different, more important problem along the way. For that reason, I do think that Git could (nay, will) eventually be supplanted by something better, something that directly addresses the more important problem that Git solved by accident.

To be more specific: when Linus created Git, his goal was to create a version control system that would be fully distributed. You have to understand, he was targeting the Linux kernel, a project which is extremely widely distributed and for which there are many "sources of truth". In other words, the goal was that the Linux Foundation could host a Git server with history, branches, and tags for the "official" Linux kernel, but RedHat or Canonical or Debian might also host Git servers with their own branches, history, and tags for kernels that are just as complete as the "official" version.

Most teams, however, don't have more than one "source of truth". Every time you see people complain that "GitHub is down", what they're really saying is "even though Git is fully distributed and capable of supporting multiple 'authoritative' nodes, we've agreed on using GitHub as the de facto central repository we all depend on".

Why is it, then, that so many people use a version control system when they're not interested in its raison d'être? Because along the way Linus solved a smaller, but much more valuable, issue: local versioning.

Young coders may not remember, but back in the bad ol' days of Subversion, if you wanted to save something you were working on so that you could move on to something else, your choices were: a.) save the file locally, then try and walk back the changes when it came time to svn commit, or b.) make sure you are connected to the Subversion server, svn commit, and pray there are no conflicts with anything anyone else was working on.

Actually, there was a 3rd option known as "SVK". On the surface, svk worked much like git does today: you would keep a copy of the Subversion repository locally, to which you could commit as needed, and periodically you would svk sync with the main Subversion repository.

So, I honestly believe that Git "won" not because it is completetly distributed, but merely because it allows developers to work locally at will, and synchronize centrally only as needed. Really, this is a natural progression from the earliest version control systems where you would have to actually lock a file on the central server before you could edit it, to Subversion which did away with locks but still required access to the central server in order to version changes. Some new system that focused more on the local developer experience, and simplified the "central repository" situation (since the vast majority of teams will only ever have one central repository), could easily eat Git's lunch.

Ben Halpern • Oct 9 '17

Great reply

Dave Cridland • Oct 11 '17

Subversion... I used to dream of Subversion. I had to use CVS.

But anyway - Subversion did something truly amazing. People had used CVS - which is itself based around RCS - for so long that it'd somehow become a fact of life that version control systems were too difficult to write.

Even BitKeeper was itself based on RCS. And of course BitMover forbid anyone using BitKeeper - paid or not - if they worked on any other version control system.

Once Subversion proved that version control systems were, in fact, things which ordinary mortals could write, then suddenly everyone was writing them.

Evan Wilde • Oct 10 '17

This is a good response, though I think it is missing one fairly important point in the why git was created portion. Bitkeeper, the original VCS for linux, originally had extremely restrictive licensing, but was free (money) for open source projects. Then in 2005, changed the licensing to start charging open source projects too. Most developers in the project would be able to purchase the software, but due to a separate project being worked on by the OSDL (Linux foundation now) that infringed on the restrictive licensing made it so that anyone working for the OSDL (Linus Torvalds) couldn't use the software.

Linus, in his typical Linus-y way, flipped Bitmover a bird and built his own, git, "the stupid content tracker". Bitkeeper was distributed and had most of the features necessary, but was too restrictive.

Ben Sinclair • Mar 25 '18

I used to just keep a copy of the entire Subversion checkout in another directory, then diff -qr them and rsync them as needed if I wanted to work on a local branch. I'm pretty sure that was common practice and it's basically just a manual way of doing what git does.

Git made it easier by reducing the friction and fragility of local history management, and Github made it "win" because the interface was so much prettier than WebSvn and because it was entirely hosted and mostly gratis.

I don't know anyone working in a company who has a second "source of truth" beyond GitHub, Bitbucket, or their local machine these days.

Ho Anh • Jan 25 '20

Sorry sir, what do 'authoritative' nodes mean?

Jilles van Gurp • Oct 10 '17

The distinction between a different branches and repositories is what makes git so genius. There is none. Just because you call your local branch master and I call my local branch master doesn't mean we are both on the same branch. Rebases and merges work the same when performed against local branches and remote branches. The use of a cryptographic content addressed object store is what makes this work. Syncing is as simple as exchanging read only objects.

So, even though teams don't necessarily use many remotes, they do use loads of branches, which is why they need Git. There are at least n+1 branches where n is the number of people plus whatever feature, production, developmnent, etc. branches people create. On my team that means well over a thousand in the past six months. Some branches are long lived, some are short lived.

Github is just a convenient way to synchronize change. Most teams use it incorrectly by allowing multiple individuals to write to the same repository instead of cloning and creating pull requests.

There is still plenty of room for improvement. Git is not very usable for non experts. A lot of the complexity has to do with dealing with the many ways merges and rebases can fail to work as expected. One limitation here is that a merge commit typically has only two ancestors. Git actually allows for more than two but the porcelain just does not use this. There have been some alternatives where this is used to track changes from multiple branches.

Another area where Git is struggling is larger code bases. The linux kernel is not small but there are examples of big companies where git just isn't good enough for tracking everything they have in one repository. E.g. Google does this (not using git for that). So scale is an issue. Then even though git is distributed, it is not sharded. This imposes upper limits on what can be in a single git repository. The model has always been that you need the entire repository locally to be able to work with it. Wouldn't it be nice if you'd only need a subset of that to be able to work.IPFS copies some of the design for git and stores content addressed objects in a distributed FS. Wouldn't it be nice if somebody rebuilt git on top of that?

So, there's plenty of room for improvement in the Git world.

Pavlo Shchelokovskyy • Oct 15 '17

Here is an article about changes MS had to do to Git, interesting read

arstechnica.com/information-techno...

Mike Gasparelli • Oct 15 '17

Unless I'm misunderstanding, Microsoft actually built an implementation to allow working with a subset of files in a repo. They called it Git Virtual File System in case you want to check it out!

Osaetin Daniel • Oct 15 '17

Well i know it's not Sharding. but if you need a Subset of a git repo you can do a shallow clone.

I find that useful for Repo's that are very large.