I'm posting here a series of introductory articles I wrote on another platform; I hope you will enjoy them.
In my previous article, I wrote about the differences between GIT and other version control systems, some basic terminology, the states and working areas, what lies at the heart of a GIT system and the objects it stores.
Starting from the last point, let me recall the objects that GIT stores: blobs, trees, commits and tags.
But where and how are these objects stored?
At the core of GIT is a simple key-value data store that lets you store any kind of content. When you store something, GIT gives you back a key (the SHA-1 checksum of the content) that you can use to retrieve the data at any time.
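To see the key-value store in action without making any commit, the sketch below writes a value straight into the object database with git hash-object -w and reads it back by its key with git cat-file -p. It assumes git is on your PATH and works in a throwaway repository created with mktemp -d; the value "test content" is just an example.

```shell
#!/bin/sh
# Sketch: Git's object database used directly as a key-value store.
# Assumes git is installed; the repository is a throwaway temp directory.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Store a value: -w writes the object into .git/objects and
# --stdin reads the content from standard input. The key is printed back.
key=$(echo 'test content' | git hash-object -w --stdin)
echo "key:   $key"

# Retrieve the value using only its key.
echo "value: $(git cat-file -p "$key")"

# The key also tells us where the object lives on disk.
find .git/objects -type f
```

With Git's default SHA-1 object format, the key is d670460b4b4aece5915caf5c68d12f560a9fe3e4, and the object is stored as .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4. Storing the same content again always yields the same key.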
When you initialise a new repository, GIT creates a .git folder. Inside it, the objects sub-folder (with its own info and pack sub-folders) is where objects are stored, as you can see from the following example:
```
git init plain_git
Initialized empty Git repository in /tmp/plain_git/.git/
cd plain_git
find .git/objects
.git/objects
.git/objects/pack
.git/objects/info
```
At this point there aren't any objects in the repository yet, so let's add one:
```
touch status.txt
# Now edit and save the file

# check the status of the repository
git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        status.txt

nothing added to commit but untracked files present (use "git add" to track)

# add the file to the staging area
git add .

# then save it to your repository
git commit -m "first commit"
[master (root-commit) 51dcfd9] first commit
 1 file changed, 1 insertion(+)
 create mode 100644 status.txt

# Now check the objects created
find .git/objects
.git/objects
.git/objects/51
.git/objects/51/e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
.git/objects/51/dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
.git/objects/pack
.git/objects/info
.git/objects/53
.git/objects/53/8d4c75373bb8ebb9af381c4e8287b6f0819533
```
What happens when you create a file? When you run the git status command, GIT checks whether each file in the working directory is already tracked, that is, recorded in the staging area (the index). Since status.txt is not, it is listed as untracked.
To start tracking it, you need to add it to the staging area with:
git add <file-name>
At this point the file is staged and ready to be committed with:
git commit -m "first commit"
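You can watch the object database grow through these steps. The sketch below replays them in a throwaway repository and counts the loose objects after each one. It assumes git is installed; the demo identity passed with -c is a hypothetical placeholder so the commit works without a configured user, and count_objects is a hypothetical helper. It shows that the blob is already written by git add, while the tree and commit objects appear at commit time.

```shell
#!/bin/sh
# Sketch: count loose objects after each step of the create/add/commit flow.
# Assumes git is installed; repository and identity are throwaway placeholders.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Hypothetical helper: loose objects live in two-character sub-folders
# such as .git/objects/51/, so info/ and pack/ are skipped.
count_objects() {
  find .git/objects -type f -path '.git/objects/??/*' | wc -l | tr -d ' '
}

echo 'This is my first commit' > status.txt
git status --short                 # "?? status.txt" means untracked
after_create=$(count_objects)

git add status.txt
git status --short                 # "A  status.txt" means staged
after_add=$(count_objects)
git ls-files --stage               # index entry: mode, blob ID, stage, path

git -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"
after_commit=$(count_objects)

echo "objects: created=$after_create added=$after_add committed=$after_commit"
```

The counts come out as 0, 1 and 3: one blob after git add, then a tree and a commit on top of it.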
After committing, we can list the files GIT has created by running find with the -type f option, which skips directories:
```
find .git/objects -type f
.git/objects/51/e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
.git/objects/51/dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
.git/objects/53/8d4c75373bb8ebb9af381c4e8287b6f0819533
```
As you can see, after the commit there are three files under the '51' and '53' sub-folders. But where do these names come from?
When GIT stores a file (the blob object is actually written as soon as you run git add), it calculates a 40-character SHA-1 checksum of the file content plus a header, as follows.
First, it builds the header:
<type-of-content> + <space> + <content size in bytes> + <\0> (null char), e.g. blob 7\000
It then prepends the header to the content:
<header> + <content>
and finally calculates the SHA-1 checksum of the result. The same recipe applies to trees, commits and tags, with their own type in the header.
The '51' and '53' sub-folder names come directly from the first two hexadecimal characters of the checksum, while the file names are the remaining 38 characters.
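The recipe above can be reproduced by hand. The sketch below builds the "blob <size>\000" header, hashes header plus content, and splits the result into the folder/file path GIT uses. It assumes GNU coreutils' sha1sum (on macOS, shasum -a 1 should work in its place); the helper names blob_sha1 and object_path are hypothetical.

```shell
#!/bin/sh
# Sketch: compute a blob's object ID by hand and derive its storage path.
# Assumes GNU coreutils' sha1sum; blob_sha1 and object_path are hypothetical names.
set -e

# Print the SHA-1 Git would assign to a file stored as a blob.
blob_sha1() {
  size=$(wc -c < "$1" | tr -d ' ')    # content size in bytes
  # header "blob <size>" + NUL byte, immediately followed by the raw content
  { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

# Split an object ID into .git/objects/<first 2 chars>/<remaining 38 chars>.
object_path() {
  dir=$(printf '%s' "$1" | cut -c 1-2)
  file=$(printf '%s' "$1" | cut -c 3-)
  printf '.git/objects/%s/%s\n' "$dir" "$file"
}

tmp=$(mktemp)
echo 'test content' > "$tmp"           # 13 bytes, including the trailing newline
id=$(blob_sha1 "$tmp")
echo "$id"
object_path "$id"
```

For this 13-byte file the sketch prints d670460b4b4aece5915caf5c68d12f560a9fe3e4 and .git/objects/d6/70460b4b4aece5915caf5c68d12f560a9fe3e4, the same ID that git hash-object reports for that file.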
Now let's take a look at each file in detail:
```
# Tree object
git cat-file -p 51e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
100644 blob 538d4c75373bb8ebb9af381c4e8287b6f0819533    status.txt

# Commit object: author info, committer info and commit message
git cat-file -p 51dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
tree 51e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
author wildeng <firstname.lastname@example.org> 1563483367 +0100
committer wildeng <email@example.com> 1563483367 +0100

first commit

# Content of the file
git cat-file -p 538d4c75373bb8ebb9af381c4e8287b6f0819533
This is my first commit
```
As you can see, I used the command
git cat-file -p <SHA-1>
which is a sort of Swiss Army knife for GIT objects: it lets you inspect any kind of GIT object, and the -p option pretty-prints its content.
If, for example, you apply it to a commit, it first decompresses the object file (GIT stores objects zlib-compressed) and then prints its content in a readable form on STDOUT.
For reference, here is the output of the -h (help) option:
```
git cat-file -h
usage: git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>
   or: git cat-file (--batch | --batch-check) [--follow-symlinks] [--textconv | --filters]

<type> can be one of: blob, tree, commit, tag
    -t                    show object type
    -s                    show object size
    -e                    exit with zero when there's no error
    -p                    pretty-print object's content
    --textconv            for blob objects, run textconv on object's content
    --filters             for blob objects, run filters on object's content
    --path <blob>         use a specific path for --textconv/--filters
    --allow-unknown-type  allow -s and -t to work with broken/corrupt objects
    --buffer              buffer --batch output
    --batch[=<format>]    show info and content of objects fed from the standard input
    --batch-check[=<format>]
                          show info about objects fed from the standard input
    --follow-symlinks     follow in-tree symlinks (used with --batch or --batch-check)
    --batch-all-objects   show all objects with --batch or --batch-check
```
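The -t and -s options listed in the help are handy for a quick inventory. The sketch below recreates a single-commit repository similar to the one in this article and prints the ID, type and size in bytes of every loose object, rebuilding each ID from its folder and file name. It assumes git is installed; the demo identity is a hypothetical placeholder, and the commit ID will differ from the one above because it includes the author and a timestamp.

```shell
#!/bin/sh
# Sketch: list every loose object with its type (-t) and size in bytes (-s).
# Assumes git is installed; the repository and identity are throwaway placeholders.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo 'This is my first commit' > status.txt
git add status.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"

# .git/objects/51/dcfd9... -> folder "51" + file "dcfd9..." -> full object ID
for path in $(find .git/objects -type f -path '.git/objects/??/*'); do
  id=$(echo "$path" | cut -d / -f 3,4 | tr -d /)
  echo "$id $(git cat-file -t "$id") $(git cat-file -s "$id")"
done
```

The loop prints one commit, one tree and one blob; the blob's size is 24, the length of "This is my first commit" plus its trailing newline.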
As a final step, let's add an annotated tag (the -a option) to our commit and see what happens:
```
# Adding a tag to our commit
git tag -a v0.1 -m "a nice commit"

# looking for the created files, we see that one more file
# has been created
find .git/objects -type f
.git/objects/51/e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
.git/objects/51/dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
.git/objects/53/8d4c75373bb8ebb9af381c4e8287b6f0819533
# new folder and new file created
.git/objects/11/85637c960727f803e5229393839629479c7b8d

# content of the TAG object
git cat-file -p 1185637c960727f803e5229393839629479c7b8d
object 51dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
type commit
tag v0.1
tagger wildeng <firstname.lastname@example.org> 1563610234 +0100

a nice commit
a nice tag
```
When you add an annotated tag, GIT creates a new tag object with its own SHA-1 checksum and stores it in the sub-folder named after the first two characters of that checksum (here, '11').
In this object, GIT saves the ID and type of the tagged object (a commit), the tag name (v0.1), the tagger with a timestamp, and the human-readable message attached to the tag.
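To check that the tag really is a separate object pointing at the commit, the sketch below creates an annotated tag in a throwaway single-commit repository, asks for its type, and "peels" it back to the commit with git rev-parse 'v0.1^{commit}'. It assumes git is installed; the demo identity and the messages are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: an annotated tag is its own object that points at a commit.
# Assumes git is installed; repository, identity and messages are placeholders.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo 'This is my first commit' > status.txt
git add status.txt
# Both commit and annotated tag record an identity, so pass it to each.
git -c user.name=demo -c user.email=demo@example.com commit -q -m "first commit"
git -c user.name=demo -c user.email=demo@example.com tag -a v0.1 -m "a nice commit"

tag_id=$(git rev-parse v0.1)                # ID of the tag object itself
commit_id=$(git rev-parse 'v0.1^{commit}')  # peel the tag down to the commit
echo "tag object:    $tag_id ($(git cat-file -t "$tag_id"))"
echo "tagged commit: $commit_id ($(git cat-file -t "$commit_id"))"
git cat-file -p v0.1                        # object, type, tag, tagger, message
```

The two IDs differ: the tag has its own checksum, and its first line ("object ...") holds the ID of the commit it points to.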
Now you should understand what GIT does when you commit a file, which objects are involved and how GIT stores them.
In the next episode, I will talk about what happens in a basic GIT workflow, how you can look at the history of the repository and how you can change it.