
Another GIT Introduction part 2

Alain Mauri (@wildeng) ・5 min read

I'm posting here a series of introductory articles I wrote on another platform, hoping you will enjoy them.

In my previous article, I wrote about the differences between GIT and other versioning systems, some basic terminology, states and working areas, what lies at the heart of a GIT system and the objects it stores.

Starting from that last point, let me recall the objects that GIT stores: blobs, trees, commits and tags.

But where and how are these objects stored?

At the core of GIT there is a simple key-value data store that lets you store any kind of content. When you store something, the system gives you back a key that can be used to retrieve the data at any time.
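
The key-value behaviour can be tried directly with two plumbing commands, git hash-object and git cat-file. What follows is only a minimal sketch, assuming git is installed; the scratch directory and the sample text are made up for the example and are not part of the repository used in this article.

```shell
# Create a throwaway repository (hypothetical scratch directory)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Store some content: -w writes the object into .git/objects,
# --stdin reads the content from standard input.
# GIT answers with the key of the stored object.
key=$(echo 'test content' | git hash-object -w --stdin)
echo "$key"

# Retrieve the content again using only the key
value=$(git cat-file -p "$key")
echo "$value"
```

The key printed by the first command is all you need to get the content back, which is exactly what a key-value store promises.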

When you initialise a new repository, GIT creates a .git folder containing, among other things, an objects sub-folder with two sub-folders of its own, objects/info and objects/pack, as you can see in the following example:

git init plain_git
Initialized empty Git repository in /tmp/plain_git/.git/
cd plain_git
find .git/objects
.git/objects
.git/objects/pack
.git/objects/info

Obviously, at this point there aren't any objects in the repository, so you need to add one:

touch status.txt
# Now edit and save the file
# check the status of the repository

git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    status.txt

nothing added to commit but untracked files present (use "git add" to track)

# add the file to the staging area 
git add .

# then save it to your repository
git commit -m "first commit"
[master (root-commit) 51dcfd9] first commit
 1 file changed, 1 insertion(+)
 create mode 100644 status.txt

# Now check the objects created

find .git/objects
.git/objects
.git/objects/51
.git/objects/51/e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
.git/objects/51/dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
.git/objects/pack
.git/objects/info
.git/objects/53
.git/objects/53/8d4c75373bb8ebb9af381c4e8287b6f0819533

What happens when you create a file? When you run the git status command, GIT checks whether the file was in the previous snapshot of your repository or in the staging area and, if not, marks it as untracked.

To start tracking it, you need to add it to the staging area using

git add <file-name>

At this point the file is staged and ready to be committed using

git commit -m "first commit"
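
The transition from untracked to staged can be followed with the short form of git status. This is a minimal sketch, assuming git is installed and using a scratch repository created only for the example; the last command also shows that the blob for the file already exists in the object store once the file is staged, before any commit.

```shell
# Throwaway repository (hypothetical scratch directory)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo 'This is my first commit' > status.txt

# Untracked: the short status shows "??" in front of the file name
before=$(git status --short)
echo "$before"

git add status.txt

# Staged as a new file: the short status now shows "A"
after=$(git status --short)
echo "$after"

# The blob was written by "git add": ask GIT for the type of the
# object whose key matches the content of the file
kind=$(git cat-file -t "$(git hash-object status.txt)")
echo "$kind"
```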

After committing, using the following command

find .git/objects -type f

we can take a look at what kind of files GIT creates:

find .git/objects -type f
.git/objects/51/e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
.git/objects/51/dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
.git/objects/53/8d4c75373bb8ebb9af381c4e8287b6f0819533

As you can see, after the commit GIT has created three files under the '51' and '53' sub-folders, but where do these names come from?

Whenever GIT stores an object (the blob for a file is written when you stage it, the tree and the commit when you commit), it calculates a 40-character SHA-1 checksum of the content plus a header, in the following way.

First it builds the header:

<type-of-content> + <space> + <content size in bytes> + <\0> (null char)
eg: blob 7\000

It then concatenates the header to the content

<header> + <content>

and calculates the SHA-1 checksum of this new content.

The '51' and '53' sub-folder names come directly from the first two hexadecimal characters of the checksum, while the file names are the remaining 38 characters.
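
The steps above can be reproduced by hand. The sketch below assumes a Linux-like shell where sha1sum is available (on other systems the tool may have a different name) and uses a made-up sample text rather than the file from this article. It builds the header, prepends it to the content, computes the SHA-1 and compares the result with what git hash-object reports.

```shell
# Sample content (hypothetical): "test content" plus a newline, 13 bytes in total
content='test content'

# Header "blob 13" followed by a null byte, then the content itself
manual=$(printf 'blob 13\000%s\n' "$content" | sha1sum | cut -d ' ' -f 1)

# The same checksum computed by GIT (without -w nothing is written)
from_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "$manual"     # d670460b4b4aece5915caf5c68d12f560a9fe3e4
echo "$from_git"   # same value as above

# First two characters: sub-folder name. Remaining 38: file name.
folder=$(printf '%s' "$manual" | cut -c 1-2)
file=$(printf '%s' "$manual" | cut -c 3-40)
echo ".git/objects/$folder/$file"
```
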
Now, let's take a look at the files in detail using the following commands:

# Tree object
git cat-file -p 51e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
100644 blob 538d4c75373bb8ebb9af381c4e8287b6f0819533    status.txt

# Commit object: Author info, Committer info and commit message
git cat-file -p 51dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
tree 51e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
author wildeng <wildeng@myemail.com> 1563483367 +0100
committer wildeng <wildeng@myemail.com> 1563483367 +0100

first commit

# Content of the file
git cat-file -p 538d4c75373bb8ebb9af381c4e8287b6f0819533
This is my first commit

As you can see, I used the command

git cat-file -p <SHA-1>

which is a sort of Swiss army knife for GIT objects. It helps you inspect any kind of GIT object, and the -p option pretty-prints its content.

If, for example, you apply it to a commit, it first uncompresses the object file (objects are stored zlib-compressed) and then displays it nicely on STDOUT.

As a reference, here is the output of the -h (help) option:

git cat-file -h
usage: git cat-file (-t [--allow-unknown-type] | -s [--allow-unknown-type] | -e | -p | <type> | --textconv | --filters) [--path=<path>] <object>
 or: git cat-file (--batch | --batch-check) [--follow-symlinks] [--textconv | --filters]
<type> can be one of: blob, tree, commit, tag
 -t show object type
 -s show object size
 -e exit with zero when there's no error
 -p pretty-print object's content
 --textconv for blob objects, run textconv on object's content
 --filters for blob objects, run filters on object's content
 --path <blob> use a specific path for --textconv/--filters
 --allow-unknown-type allow -s and -t to work with broken/corrupt objects
 --buffer buffer --batch output
 --batch[=<format>] show info and content of objects fed from the standard input
 --batch-check[=<format>]
 show info about objects fed from the standard input
 --follow-symlinks follow in-tree symlinks (used with --batch or --batch-check)
 --batch-all-objects show all objects with --batch or --batch-check
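
Besides -p, the -t and -s options from the list above are handy when you only want to know what an object is and how big it is. The following is a minimal sketch on a scratch repository; the identity passed with -c is a placeholder so that the example does not depend on your global configuration, and the names HEAD, HEAD^{tree} and HEAD:status.txt are used instead of full checksums to reach the commit, its tree and the blob.

```shell
# Throwaway repository with one commit (hypothetical scratch directory)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo 'This is my first commit' > status.txt
git add status.txt
git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'first commit'

# -t prints the type of the object
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:status.txt)

# -s prints the size of the content in bytes (23 characters plus a newline here)
blob_size=$(git cat-file -s HEAD:status.txt)

echo "$commit_type $tree_type $blob_type $blob_size"
```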

As a final step let's add a tag to our repository and see what happens:

# Adding a tag to our commit
git tag -a v0.1 -m "a nice commit"

# looking for the created files we see that one more file
# has been created
find .git/objects -type f
.git/objects/51/e1acfa6ecdb46d6c9d4ad13e82b5cab90d5f3f
.git/objects/51/dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
.git/objects/53/8d4c75373bb8ebb9af381c4e8287b6f0819533

# new folder and new file created
.git/objects/11/85637c960727f803e5229393839629479c7b8d

# content of the TAG object
git cat-file -p 1185637c960727f803e5229393839629479c7b8d
object 51dcfd9bba09650f50ccad8dde3ad5fe8ba68ad2
type commit
tag v0.1
tagger wildeng <wildeng@myemail.com> 1563610234 +0100

a nice commit

a nice tag

When you add an annotated tag (the -a option), GIT creates a new object with its own SHA-1 checksum and creates a sub-folder to store it.

In this object, GIT saves the checksum and the type of the object that has been tagged (a commit), the name of the tag (v0.1), the tagger and the human readable message added to the tag.
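
The relation between a tag object and the commit it points to can be checked with git rev-parse. This is a minimal sketch on a scratch repository, with a placeholder identity passed through -c; the second tag, v0.1-light, is a hypothetical lightweight tag added only for contrast, since a tag created without -a is just a name for the commit and has no object of its own.

```shell
# Throwaway repository with one commit (hypothetical scratch directory)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
echo 'This is my first commit' > status.txt
git add status.txt
git -c user.name='Example' -c user.email='example@example.com' commit -q -m 'first commit'

# Annotated tag: creates a tag object
git -c user.name='Example' -c user.email='example@example.com' tag -a v0.1 -m 'a nice commit'

# The tag name resolves to the tag object, not directly to the commit
tag_object=$(git rev-parse v0.1)
tag_type=$(git cat-file -t "$tag_object")

# Peeling the tag with ^{commit} gives the commit that was tagged
tagged_commit=$(git rev-parse 'v0.1^{commit}')
head_commit=$(git rev-parse HEAD)

# Lightweight tag for contrast: it resolves straight to the commit
git tag v0.1-light
light_type=$(git cat-file -t v0.1-light)

echo "$tag_type $light_type"
echo "$tagged_commit"
echo "$head_commit"
```
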
Now you should have an understanding of what GIT does when you commit a file, which objects are involved and how GIT stores them.
In the next episode I will talk about what happens in a basic GIT workflow, how you can take a look at the history of the repository and how you can change it.
Enjoy!

References:

GIT - GIT Objects

GIT Pocket Guide
