Kenichiro Nakamura

Posted on May 7, 2020

git deep dive part 3: HEAD, index and working tree

#git

In the previous article, I added branches and commits. I now have a good idea of what's happening behind the scenes, but it's also important to understand the concept of "trees" and how reset and checkout work.

Three trees

HEAD, the index and the working tree are the three trees in git. The official documentation explains the three trees concept very clearly in its Reset Demystified article.

HEAD

As we already examined, I only need a commit to get the entire snapshot. HEAD points to the current commit, which is usually the latest commit of the current branch, so we can think of HEAD as the last committed snapshot.

gitDeepDive> git ls-tree -r HEAD
100644 blob 44f41854d770f2a38d368936b14975d280cbd950    docs/article.txt
100644 blob 2b366bf2f2784dbf26fcd56e1cedb3afc1345753    docs/news.txt
100644 blob 20fe8be9820a49252e2a4dd37a60e678cd5cda14    hello.txt

index

The index contains the "staged items". It keeps a reference to the latest staged blob for each file. Initially, it has the same ids as the last commit. If I modify a file and run git add, the index then holds the new id, which is different from the one in the last commit.

gitDeepDive> git ls-files --stage
100644 44f41854d770f2a38d368936b14975d280cbd950 0       docs/article.txt
100644 2b366bf2f2784dbf26fcd56e1cedb3afc1345753 0       docs/news.txt
100644 20fe8be9820a49252e2a4dd37a60e678cd5cda14 0       hello.txt

Working Tree (a.k.a. Working Directory)

The working tree is the root folder of the repository (the gitDeepDive folder in this case) with the checked-out files in it. I usually modify files in this directory and git takes care of the rest.

What does staging a file mean?

When I run git add, I stage files. git creates the blobs and their ids behind the scenes. Let's see the behavior again.

Modify files and stage them

Run the following commands to create a new file, modify an existing one, and stage both.

git checkout dev
echo "cool blog" > blog.txt
echo "The third line" >> hello.txt
git add .\blog.txt .\hello.txt

Then, check the index. blog.txt is added, and the id of hello.txt has changed because I modified the file.

gitDeepDive> git ls-files --stage
100644 a3b6a8ee47c62758ed838e056da40f4c83fdc55a 0       blog.txt
100644 44f41854d770f2a38d368936b14975d280cbd950 0       docs/article.txt
100644 2b366bf2f2784dbf26fcd56e1cedb3afc1345753 0       docs/news.txt
100644 e4838ed8db44f8614513a8c1417e408b9d1b367c 0       hello.txt
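Comparing this listing with the earlier git ls-tree -r HEAD output is essentially how to tell what is staged. Here is a minimal Python sketch of that comparison. The function name staged_changes is my own and not part of git, and the two dictionaries are simply copied by hand from the listings in this article.

```python
def staged_changes(head, index):
    """Compare two path -> id mappings and report what is staged."""
    changes = {}
    for path in sorted(set(head) | set(index)):
        if path not in head:
            changes[path] = "added"        # in the index only
        elif path not in index:
            changes[path] = "deleted"      # in HEAD only
        elif head[path] != index[path]:
            changes[path] = "modified"     # same path, different blob id
    return changes

# Ids copied from the git ls-tree -r HEAD output.
head = {
    "docs/article.txt": "44f41854d770f2a38d368936b14975d280cbd950",
    "docs/news.txt": "2b366bf2f2784dbf26fcd56e1cedb3afc1345753",
    "hello.txt": "20fe8be9820a49252e2a4dd37a60e678cd5cda14",
}
# Ids copied from the git ls-files --stage output after git add.
index = {
    "blog.txt": "a3b6a8ee47c62758ed838e056da40f4c83fdc55a",
    "docs/article.txt": "44f41854d770f2a38d368936b14975d280cbd950",
    "docs/news.txt": "2b366bf2f2784dbf26fcd56e1cedb3afc1345753",
    "hello.txt": "e4838ed8db44f8614513a8c1417e408b9d1b367c",
}

print(staged_changes(head, index))
# {'blog.txt': 'added', 'hello.txt': 'modified'}
```

Files whose id is the same in both mappings do not show up at all, which is why the two docs files are missing from the result.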

The following items are created or modified in the .git directory.

The index file was updated
The a3 and e4 folders were added under objects
b6a8ee47c62758ed838e056da40f4c83fdc55a was added in the a3 folder
838ed8db44f8614513a8c1417e408b9d1b367c was added in the e4 folder

Now blog.txt and the updated hello.txt exist in both the .git directory and the working tree.
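The folder and file names in the list follow directly from the ids: the first two hex characters become the folder name and the remaining 38 become the file name. A small Python sketch of that mapping, where loose_object_path is a name I made up for illustration and not a git API:

```python
from pathlib import PurePosixPath

def loose_object_path(object_id):
    """Return the path of a loose object relative to the .git directory."""
    if len(object_id) != 40 or any(c not in "0123456789abcdef" for c in object_id):
        raise ValueError("expected a 40-character lowercase hex SHA-1 id")
    # The first two hex characters name the folder, the other 38 name the file.
    return PurePosixPath("objects") / object_id[:2] / object_id[2:]

print(loose_object_path("a3b6a8ee47c62758ed838e056da40f4c83fdc55a"))
# objects/a3/b6a8ee47c62758ed838e056da40f4c83fdc55a
print(loose_object_path("e4838ed8db44f8614513a8c1417e408b9d1b367c"))
# objects/e4/838ed8db44f8614513a8c1417e408b9d1b367c
```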

Un-stage files

When I want to "unstage" a file, which means removing the change from the index (staging area) while keeping my edits in the working tree, I can use git restore with the --staged option.

git restore --staged hello.txt

Then the index goes back to the original id of hello.txt.

gitDeepDive> git ls-files --stage
100644 a3b6a8ee47c62758ed838e056da40f4c83fdc55a 0       blog.txt
100644 44f41854d770f2a38d368936b14975d280cbd950 0       docs/article.txt
100644 2b366bf2f2784dbf26fcd56e1cedb3afc1345753 0       docs/news.txt
100644 20fe8be9820a49252e2a4dd37a60e678cd5cda14 0       hello.txt

Even though the index entry is restored, the blob object created by git add still exists.

gitDeepDive> git cat-file blob e4838ed8db44f8614513a8c1417e408b9d1b367c
hello git
The second line
The third line
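To make the stage and unstage behavior concrete, here is a toy Python model of the three trees plus the object store. It is only a sketch under my own assumptions: the ToyRepo class and its methods are hypothetical names, and real git stores far more than these dictionaries.

```python
import hashlib

class ToyRepo:
    """A toy model of the three trees; not how git stores data on disk."""

    def __init__(self, committed):
        self.objects = {}                      # object store: id -> content
        self.head = {}                         # last commit: path -> id
        self.working_tree = dict(committed)    # path -> content
        for path, content in committed.items():
            self.head[path] = self._store(content)
        self.index = dict(self.head)           # the index starts equal to HEAD

    def _store(self, content):
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        object_id = hashlib.sha1(header + content).hexdigest()
        self.objects[object_id] = content
        return object_id

    def add(self, path):
        """Like git add: write a blob and point the index entry at it."""
        object_id = self._store(self.working_tree[path])
        self.index[path] = object_id
        return object_id

    def restore_staged(self, path):
        """Like git restore --staged: copy the entry from HEAD back to the index."""
        if path in self.head:
            self.index[path] = self.head[path]
        else:
            self.index.pop(path, None)   # a new file simply leaves the index

repo = ToyRepo({"hello.txt": b"hello git\nThe second line\n"})
repo.working_tree["hello.txt"] += b"The third line\n"
staged_id = repo.add("hello.txt")
repo.restore_staged("hello.txt")

print(repo.index["hello.txt"] == repo.head["hello.txt"])   # True: index is back to HEAD
print(staged_id in repo.objects)                            # True: the blob is still stored
print(repo.working_tree["hello.txt"].endswith(b"The third line\n"))  # True: edit is kept
```

The point of the model is that unstaging only rewrites an index entry. Nothing is removed from the object store and nothing is touched in the working tree.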

If I modify hello.txt again and stage it, a new object is created because the content is different. git uses SHA-1 to hash the content and uses the hash as the id, so even a one-bit change generates a new hash and therefore a new object.

echo "The fourth line" >> hello.txt
git add .\hello.txt

Check the index to confirm that hello.txt has a different id.

gitDeepDive> git ls-files --stage
100644 a3b6a8ee47c62758ed838e056da40f4c83fdc55a 0       blog.txt
100644 44f41854d770f2a38d368936b14975d280cbd950 0       docs/article.txt
100644 2b366bf2f2784dbf26fcd56e1cedb3afc1345753 0       docs/news.txt
100644 7a218e826670e77d05c1c244b514a7f449056752 0       hello.txt
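The id is the SHA-1 of a short header ("blob", the content size and a NUL byte) followed by the file content. A minimal Python sketch, assuming a helper name git_blob_id of my own choosing; the sample strings are also mine, because the exact ids in this article depend on the bytes PowerShell wrote to disk (encoding and line endings).

```python
import hashlib

def git_blob_id(content):
    """Compute the id git assigns to a blob with the given bytes."""
    # git hashes the header "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

before = git_blob_id(b"hello world\n")
after = git_blob_id(b"hello world!\n")   # one extra character

print(before)            # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(before != after)   # True
```

The first value should match what git hash-object reports for a file containing exactly those bytes.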

Remove files from working tree

To remove files from the working tree, I can use the git rm command. If the file is staged (or "cached" in another terminology), I can choose to delete it from both the index and the working tree, or only from the index with the --cached option. I can also use git clean to delete untracked files from the working tree. Or, I can simply delete the file from the directory without using git.

git rm -f blog.txt

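git rm removes the file from both the working tree and the index. The -f option is needed here because blog.txt has staged content that was never committed, so git would otherwise refuse to remove it.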

gitDeepDive> git ls-files --stage
100644 44f41854d770f2a38d368936b14975d280cbd950 0       docs/article.txt
100644 2b366bf2f2784dbf26fcd56e1cedb3afc1345753 0       docs/news.txt
100644 7a218e826670e77d05c1c244b514a7f449056752 0       hello.txt
gitDeepDive> ls -Name
docs
hello.txt

However, the created blob remains as expected.

gitDeepDive> git cat-file blob a3b6a8ee47c62758ed838e056da40f4c83fdc55a
cool blog

Optimize space

I now have many files in the objects folder. Some of them are orphaned, meaning that no commit references them.

git takes care of these files automatically, but you can also clean them up manually with the git gc and git prune commands.

git gc
git prune

As a result, git removed the unreachable objects and packed the remaining ones. The .git folder now looks like this.

.git
│  COMMIT_EDITMSG
│  config
│  description
│  HEAD
│  index
│  ORIG_HEAD
│  packed-refs
├─info
│      exclude
│      refs
│
├─logs
│  │  HEAD
│  │
│  └─refs
│      └─heads
│              dev
│              master
│              test
├─objects
│  ├─info
│  │      commit-graph
│  │      packs
│  │
│  └─pack
│          pack-32b528fb3da0ed8dd7c96bf4608b5874805561e1.idx
│          pack-32b528fb3da0ed8dd7c96bf4608b5874805561e1.pack
└─refs
    ├─heads
    └─tags

I can list the contents of the pack file with git verify-pack.

gitDeepDive> git verify-pack -v .\.git\objects\pack\pack-32b528fb3da0ed8dd7c96bf4608b5874805561e1.idx
367c2d000be0ffbb640252384c820ce472fe32a4 commit 246 161 12
2adbcacc0047a991956dedb4b16691ba244674b3 commit 259 171 173
16f1fa822d53d12329e9a68c7463c5697bddc7d1 commit 203 132 344
44f41854d770f2a38d368936b14975d280cbd950 blob   14 23 476
2b366bf2f2784dbf26fcd56e1cedb3afc1345753 blob   29 33 499
7a218e826670e77d05c1c244b514a7f449056752 blob   57 50 532
2baf027b74c551817c2a5ef6a3472ccc8e99738c tree   75 80 582
30962e4266975d43d1698bec735caa2e17ba3223 tree   68 78 662
20fe8be9820a49252e2a4dd37a60e678cd5cda14 blob   26 36 740
129b57b6945a4e9e56abaf5b229701565e2c6cdd tree   68 78 776
79a776223b60cb98e81a58d0ec92f00242ca7dcb tree   75 79 854
2b54426c8ded2b5334352e13b3ae62231ab67fee blob   11 20 933
a2cf761ea993127a4aae5762806441cc18d730f5 tree   37 47 953
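For objects that are not stored as deltas, each line of this output has five columns: the object id, the type, the object size, the size inside the pack file, and the offset inside the pack file. A small Python sketch that reads those columns follows. parse_verify_pack is a hypothetical helper of mine, and the sample lines are taken from the output above.

```python
from collections import Counter

def parse_verify_pack(lines):
    """Parse git verify-pack -v lines for non-delta objects."""
    entries = []
    for line in lines:
        fields = line.split()
        # Skip anything that is not a plain five-column object line
        # (delta entries and summary lines have a different shape).
        if len(fields) != 5 or fields[1] not in {"commit", "tree", "blob", "tag"}:
            continue
        object_id, kind, size, packed_size, offset = fields
        entries.append({
            "id": object_id,
            "type": kind,
            "size": int(size),                # uncompressed object size in bytes
            "packed_size": int(packed_size),  # bytes used inside the pack file
            "offset": int(offset),            # byte offset inside the pack file
        })
    return entries

sample = [
    "367c2d000be0ffbb640252384c820ce472fe32a4 commit 246 161 12",
    "44f41854d770f2a38d368936b14975d280cbd950 blob   14 23 476",
    "2baf027b74c551817c2a5ef6a3472ccc8e99738c tree   75 80 582",
]
entries = parse_verify_pack(sample)
print(Counter(entry["type"] for entry in entries))
print(sum(entry["packed_size"] for entry in entries))   # 264
```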

The branch information is packed into the packed-refs file, which is why the refs/heads folder is now empty.
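The packed-refs file is plain text with one "id refname" pair per line, plus a header comment starting with # and optional lines starting with ^ for annotated tags. Here is a rough Python sketch of reading it. The parse_packed_refs name is my own, and the ids in the sample are placeholders, not the real ones from this repository.

```python
def parse_packed_refs(text):
    """Map ref names to object ids from the text of a packed-refs file."""
    refs = {}
    for line in text.splitlines():
        line = line.strip()
        # '#' starts the header comment; '^' lines give the peeled
        # target of the annotated tag on the previous line.
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        object_id, ref_name = line.split(" ", 1)
        refs[ref_name] = object_id
    return refs

# Hypothetical content: placeholder ids, not taken from this repository.
sample = """# pack-refs with: peeled fully-peeled sorted
1111111111111111111111111111111111111111 refs/heads/dev
2222222222222222222222222222222222222222 refs/heads/master
3333333333333333333333333333333333333333 refs/heads/test
"""
refs = parse_packed_refs(sample)
print(sorted(refs))
# ['refs/heads/dev', 'refs/heads/master', 'refs/heads/test']
```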

Summary

I explained the relationships between HEAD, the index and the working tree, and I hope this demystifies some of git's behavior. In the next article, I will explain reset to see how I can play with these three trees more.

Go to next article