We developers use git all the time. Git internals might feel like magic, but what git actually does is really simple. Let us peek under the hood to see how git works.
Let's create an empty folder and initialize a git repo in it by running
$ git init
This creates a .git folder inside your empty folder. The structure of this .git folder is as follows:
.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
If you open up the HEAD file in your text editor, you will see the following text in it:
ref: refs/heads/master
which means that your current branch is master.
Now add a file and make a first commit by doing
$ echo "Hello" >> README.md
$ git add .
$ git commit -m "Initial commit"
Now run $ git log
and you will get
commit acde617e8ab39bb157821d3bf84d04e157bff52c (HEAD -> master)
Author: username <test@email.com>
Date: Wed Aug 05 18:43:48 2020 +0330
Initial commit
Note:
- The exact commit hash that you get will differ from what you see here, because it depends (among other things) on your username, email and the time at which you make the commit.
And if you open up refs/heads/master, it will contain the text acde617e8ab39bb157821d3bf84d04e157bff52c
and nothing else.
In git, each commit is identified by a hash. The content of the file refs/heads/master means that master is pointing to the commit
acde617e8ab39bb157821d3bf84d04e157bff52c
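To make the HEAD to branch to commit chain concrete, here is a minimal Python sketch that follows it by reading the two files. The function name `resolve_head` and the throwaway directory are made up for illustration. It also assumes the branch is stored as a plain file under refs/heads; git can also keep refs in a packed-refs file, which this sketch does not handle.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_head(git_dir: Path) -> Tuple[Optional[str], str]:
    """Follow HEAD to the commit hash it currently points to."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # e.g. "refs/heads/master"
        commit_hash = (git_dir / ref).read_text().strip()
        return ref, commit_hash
    # Otherwise HEAD holds a commit hash directly ("detached HEAD").
    return None, head


# Build a throwaway directory that mimics the two files discussed above.
with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp) / ".git"
    (fake_git / "refs" / "heads").mkdir(parents=True)
    (fake_git / "HEAD").write_text("ref: refs/heads/master\n")
    (fake_git / "refs" / "heads" / "master").write_text(
        "acde617e8ab39bb157821d3bf84d04e157bff52c\n"
    )
    resolved = resolve_head(fake_git)

print(resolved)
```

To try it on a real repository, you could call `resolve_head(Path(".git"))` from the folder you created earlier.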
// TODO:
1. Now make a second commit
2. Check `$ git log` and the content of **refs/heads/master**
After you have made your first commit, if you inspect the contents of the .git folder again you will see something new:
.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           └── master
├── objects
│   ├── ac
│   │   └── de617e8ab39bb157821d3bf84d04e157bff52c
│   ├── dc
│   │   └── 0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
│   ├── e9
│   │   └── 65047ad7c57865823c7d992b1d046ea66edf78
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   └── master
    └── tags
There is a new file called index and some weird-looking entries inside the objects folder (we will ignore all the other new things for now).
When you run $ git add .
git takes the changes that you have made and creates objects for them. The name of each object is determined by running a short header (the object type and size) followed by your file content through the SHA-1 algorithm. SHA-1 takes some input and outputs a 160-bit digest, which git writes as a 40-character hexadecimal string.
Let's try to generate SHA1 of the file README.md. You can do that by running
$ git hash-object README.md
which will give you output
e965047ad7c57865823c7d992b1d046ea66edf78
So that is where the content of the file README.md is stored (inside the objects folder). The first two characters of the hash are used as the folder name and the remaining 38 as the file name. The file 65047ad7c57865823c7d992b1d046ea66edf78 is zlib-compressed, so it looks like binary data in a text editor. To see its content we can run
$ git cat-file -p e965047ad7c57865823c7d992b1d046ea66edf78
which outputs
Hello
which is the content of your README.md!
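If you are curious what `git cat-file` has to undo, here is a rough Python sketch of the loose object format: a header of the form "type size", a NUL byte, then the raw content, all compressed with zlib. The helper names `encode_loose_object` and `decode_loose_object` are hypothetical, and the sketch only round-trips bytes in memory rather than touching a real repository.

```python
import zlib
from typing import Tuple


def encode_loose_object(obj_type: str, content: bytes) -> bytes:
    """Build the compressed bytes git would store for a loose object."""
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return zlib.compress(header + content)


def decode_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decompress a loose object and split it into (type, content)."""
    data = zlib.decompress(raw)
    header, _, content = data.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


raw = encode_loose_object("blob", b"Hello\n")
print(decode_loose_object(raw))  # ('blob', b'Hello\n')
```

On a real repository you could feed `decode_loose_object` the bytes of .git/objects/e9/65047ad7c57865823c7d992b1d046ea66edf78 (opened in binary mode) and should get the same pair back, assuming the object has not been moved into a pack file.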
But what are other two objects?
There are two other objects in the objects directory. What are those?
Git has four types of objects: blob, tree, commit and tag. A blob stores the content of a file; the one we just saw is a blob.
You can see the type of the object by running
$ git cat-file -t e965047ad7c57865823c7d992b1d046ea66edf78
which will print
blob
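The object type is also part of what gets hashed. As a small sketch (the function name `git_blob_hash` is made up here), the name of our blob can be reproduced in Python by hashing the header "blob", a space, the content length and a NUL byte, followed by the content itself. It assumes README.md contains exactly "Hello" plus a newline, which is what the echo command above writes on most systems.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """SHA-1 of the blob header plus content, as a 40-character hex string."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_hash(b"Hello\n"))  # e965047ad7c57865823c7d992b1d046ea66edf78
```

This is why two files with identical content always end up as the same blob, no matter what they are called.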
When you run
$ git cat-file -p acde617e8ab39bb157821d3bf84d04e157bff52c
you will get
tree dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
author username <test@email.com> 1597845162 +0530
committer username <test@email.com> 1597845162 +0530
Initial commit
That is our actual commit. It has an author, a committer, the commit message and something called a tree, which is another git object.
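The commit text is simple enough to pick apart by hand. Below is a minimal sketch of a parser (the name `parse_commit` is hypothetical): header lines come first, then a blank line, then the message. It assumes one value per header, so it would keep only the last parent of a merge commit and does not understand multi-line headers such as signatures.

```python
def parse_commit(text: str) -> dict:
    """Split the text of a commit object into its headers and message."""
    headers_part, _, message = text.partition("\n\n")
    fields = {}
    for line in headers_part.splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["message"] = message.strip()
    return fields


sample = (
    "tree dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f\n"
    "author username <test@email.com> 1597845162 +0530\n"
    "committer username <test@email.com> 1597845162 +0530\n"
    "\n"
    "Initial commit\n"
)
commit = parse_commit(sample)
print(commit["tree"])
print(commit["message"])
```

The first commit has no parent line; later commits add a line starting with "parent" that names the previous commit.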
Let's see what that tree object has by running
$ git cat-file -p dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
It outputs
100644 blob e965047ad7c57865823c7d992b1d046ea66edf78 README.md
It has the name of the file and the name of the blob that holds the file content. It is essentially what your working directory looked like at that commit.
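What `cat-file -p` prints for a tree is a friendly rendering; on disk each entry is stored as the mode, a space, the file name, a NUL byte and then the 20 raw bytes of the hash. Here is a sketch of that layout in Python. The helper names are invented, and sorting entries by plain file name is a simplification of git's real ordering rules (which treat directories specially).

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (mode, file name, object hash in hex)


def build_tree(entries: List[Entry]) -> bytes:
    """Serialize entries into the body of a tree object."""
    body = b""
    for mode, name, sha_hex in sorted(entries, key=lambda e: e[1]):
        body += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha_hex)
    return body


def parse_tree(body: bytes) -> List[Entry]:
    """Read the entries back out of a tree body."""
    entries = []
    pos = 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)
        mode, name = body[pos:nul].decode().split(" ", 1)
        sha_hex = body[nul + 1:nul + 21].hex()  # 20 raw bytes = 40 hex chars
        entries.append((mode, name, sha_hex))
        pos = nul + 21
    return entries


def object_hash(obj_type: str, body: bytes) -> str:
    """Hash a body the same way the blob was hashed: header + NUL + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


entries = [("100644", "README.md", "e965047ad7c57865823c7d992b1d046ea66edf78")]
body = build_tree(entries)
print(parse_tree(body))
print(object_hash("tree", body))
```

If the layout assumed here matches what your git version writes, the last line should print the same tree hash that your own commit points to, but that is worth checking against your repository rather than taking on trust.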
Let's sum up our understanding so far. When you make a commit in git, the content of each file is passed through SHA-1 (together with a small header) to get a 40-character string, and a file with that name is created to store the content. Git then creates a tree object, which is essentially how your working directory looked at that point in time: the tree says which blobs are associated with which file names. Finally there is a commit object, which points to this tree object and also holds the commit message, the author and the committer (with their emails and timestamps).
// TODO
1. Now make another commit
2. inspect the contents of the .git folder
3. See which objects are now in the .git/objects folder
4. Look at content of objects
Branching
Now let's create a branch by running
$ git branch feature-1
Now let's take a look at the content of the .git folder:
.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── feature-1
│           └── master
├── objects
│   ├── ac
│   │   └── de617e8ab39bb157821d3bf84d04e157bff52c
│   ├── dc
│   │   └── 0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
│   ├── e9
│   │   └── 65047ad7c57865823c7d992b1d046ea66edf78
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   ├── feature-1
    │   └── master
    └── tags
Now there is a new file called refs/heads/feature-1, and if we take a peek at its content it will be the hash of the commit from which you created the branch.
Now if we check out the feature-1 branch by running
$ git checkout feature-1
the content of our HEAD file changes to
ref: refs/heads/feature-1
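Seen from the file system, both steps are just small text writes. The sketch below mimics them in a throwaway directory; `create_branch` and `switch_head` are made-up names. It covers only the ref and HEAD bookkeeping, whereas a real checkout also updates the index and the files in your working directory.

```python
import tempfile
from pathlib import Path


def create_branch(git_dir: Path, name: str, commit_hash: str) -> None:
    """A branch is a file under refs/heads holding a commit hash."""
    ref_path = git_dir / "refs" / "heads" / name
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text(commit_hash + "\n")


def switch_head(git_dir: Path, name: str) -> None:
    """Point HEAD at a branch (only the HEAD part of a checkout)."""
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{name}\n")


with tempfile.TemporaryDirectory() as tmp:
    fake_git = Path(tmp) / ".git"
    fake_git.mkdir()
    create_branch(fake_git, "feature-1", "acde617e8ab39bb157821d3bf84d04e157bff52c")
    switch_head(fake_git, "feature-1")
    branch_text = (fake_git / "refs" / "heads" / "feature-1").read_text()
    head_text = (fake_git / "HEAD").read_text()

print(branch_text.strip())
print(head_text.strip())
```

This is also why creating a branch in git is so cheap: nothing is copied, only one small file is written.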
// TODO
1. Try creating a file .git/refs/heads/feature-2
2. Run git log
3. Put the hash inside that file
4. Try running git branch
Staging area
When you create a file, you are creating it in your local file system. After you are done, you add the file to git by running $ git add .
This adds the file to the staging area. Then, when you make a commit, the commit object is created from the files in the staging area.
So the question now is: where is this staging area? The answer is that it lives in the index file.
We can see the contents of the staging area by running
$ git ls-files --stage
which gives us
100644 e965047ad7c57865823c7d992b1d046ea66edf78 0 README.md
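Each line of that output has four fields: the file mode, the object hash, a stage number and the path. A small sketch of a parser follows; the `IndexEntry` type and `parse_stage_line` function are hypothetical. Git separates the path from the other fields with a tab, and splitting on any whitespace at most three times handles both that and the spaces shown above.

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    mode: str
    object_id: str
    stage: int
    path: str


def parse_stage_line(line: str) -> IndexEntry:
    """Parse one line printed by `git ls-files --stage`."""
    mode, object_id, stage, path = line.rstrip("\n").split(None, 3)
    return IndexEntry(mode, object_id, int(stage), path)


entry = parse_stage_line(
    "100644 e965047ad7c57865823c7d992b1d046ea66edf78 0\tREADME.md"
)
print(entry)
```

Note that the hash in the index is the same blob hash we saw earlier, and that the stage number is 0 when there is no merge conflict on that path.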
// TODO
1. Create a README2.md file
2. Run git ls-files --stage and look at its content
3. Run git add .
4. Now run git ls-files --stage again
Note:
- We have skipped some details like tags, packing...
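- The number 100644 is the file mode: 100 marks a regular file and 644 is its permissions (git only records whether a file is executable, so regular files show up as either 100644 or 100755)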
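The mode is an octal number in the same format Unix uses, so Python's standard `stat` module can decode it. This is just an illustrative sketch, and `describe_mode` is a made-up helper name.

```python
import stat
from typing import Tuple


def describe_mode(mode_text: str) -> Tuple[bool, str, str]:
    """Return (is regular file, ls-style string, permission bits in octal)."""
    mode = int(mode_text, 8)  # the mode is written in octal
    return stat.S_ISREG(mode), stat.filemode(mode), oct(stat.S_IMODE(mode))


print(describe_mode("100644"))  # (True, '-rw-r--r--', '0o644')
print(describe_mode("100755"))  # (True, '-rwxr-xr-x', '0o755')
```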
Thanks to Yancy Min for sharing their work on Unsplash
Top comments (2)
Very insightful content!
So basically all that we edit and write in our files are stored inside a mere 40 char string encoded with SHA1? Or are there any limitations to this?
Thank you!
Thank you @heytulsiprasad there are some limitations of SHA1. Git is working on migrating to SHA256 for better hash security lwn.net/Articles/811068/ but it will take some time.