I've seen a couple of articles that try to explain git as a system of snapshots rather than changes. I'm really not sure how this explanation is supposed to help when working with git, especially since git doesn't always save entire blobs, instead using an assortment of partial file diffs to save space.
Git is also really good at moving changes, not just blobs, from one place to another.
So maybe someone can explain why I would want to use snapshots to explain git.
You say that "git doesn't always save entire blobs, using an assortment of partial file diffs to save space." Can you provide an example?
If you do "git diff HEAD..HEAD" you will get a diff between the two commits but this does not mean it used a diff to store the files in HEAD.
I will oversimplify: when you do a git commit, Git takes the "current state" of all your files and stores them. If a file in the state being recorded is the same as in a previously recorded "state", Git will just point to the already stored file, so it takes no extra space. It may look like git records only changes, but what it actually recorded is "commit 012abc has file2, whose contents can be taken from file1 at commit 123456".
The real details are not exactly like this. Files are not named "fileX at commitY". Instead, they are named using the SHA-1 signature of their contents, and a Git commit says "this commit has file1, which is object af52bc63d617f58[...], and fileN, which is object 87f32ab88c[...]". You can see how, if a file has the same content in two commits, both will point to the same object, because both have the same SHA-1. Linus calls this "content addressable".
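You can poke at this yourself (a quick sketch; "file1" stands in for any tracked file):

```sh
# Inspect the content-addressed objects behind the latest commit.
git cat-file -p HEAD            # the commit records a single tree object
git cat-file -p 'HEAD^{tree}'   # the tree maps filenames to blob SHA-1s
git hash-object file1           # identical bytes always hash to the same id
```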
When you do a git diff commit1..commit2, if the file in the two commits points to the same SHA-1 you can assume it has the same content, skipping the content comparison for that file. This makes git really fast.
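For example (another sketch; "file1" is a placeholder path):

```sh
# Resolve the blob id of the same path in two commits.
git rev-parse HEAD~:file1
git rev-parse HEAD:file1
# If the two ids match, the contents are identical and no byte-by-byte
# comparison is needed.
```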
I thought the docs would be more straightforward, but eventually you get to the answer: "When Git packs objects, it looks for files that are named and sized similarly, and stores just the deltas from one version of the file to the next."
(edit) The point was, I don't understand how it helps at a conceptual level even if git did follow through 100%.
git-scm.com/book/en/v2/Git-Interna...
Yes, but that's only for "packed" files. Packfiles are used as a storage optimization only in convenient situations (like old commits; see note [1]). If Git doesn't pack (which is better for speed), then it follows the native model: taking "states" or "snapshots" of your content and storing the objects without packing. The native model is still compressed, but it stores the whole object.
So: the model for content history is snapshots. The storage is whole objects (natively), with deltas (only when space-optimizing objects into packfiles).
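You can watch both layers with plumbing commands (a small sketch; run it in any repo that has some loose objects):

```sh
git count-objects -v   # "count" = loose snapshot objects, "in-pack" = packed ones
git gc                 # repacks loose objects, delta-compressing where it helps
git count-objects -v   # the loose count drops; the in-pack count grows
```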
The page you are reading is specifically about packfiles. That's why it doesn't say anything about the native model, but see note [1].
The page you are looking for is this one:
You may want to read this question from Stack Overflow:
Note [1]: From the page you linked: "Git does this if you have too many loose objects around, if you run the git gc command manually, or if you push to a remote server."
Hope I helped.
Pack files are still part of git. I do understand how git stores files, which is why I mentioned pack files.
None of this helps me understand why emphasizing git as snapshot storage is helpful for explaining git.
(edit) I could basically say the same about svn. Its content model is snapshots; its storage is diffs, which it uses to rebuild the content. But that is not helpful for using or understanding svn.
OK, so I do see where this mental model provides an understanding. If you make multiple commits to the same file and then cherry-pick only the latest commit, the changes from the other commits come along too; however, it seems git will mark the extra changes as a conflict.
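A rough sketch of that scenario (branch and file names are made up):

```sh
# On branch "feature": several commits in a row, all editing notes.txt.
git switch main
git cherry-pick feature   # apply only the tip commit of "feature"
# The diff being applied is computed between the tip snapshot and its parent
# snapshot. If main lacks the earlier edits, the surrounding lines differ and
# git raises a conflict rather than silently bringing the other changes in.
```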