Inside the git folder

Jasmin Virdi

Git is a distributed version control system that is widely used in software development. It keeps track of every modification to the code, lets you roll back or revert changes, and lets you compare earlier versions of the code to spot differences while minimizing disruption.

Let's dig deeper into what happens in the background when we initialize a Git repository and run other Git commands.

git init

This is the first command we run. It creates a .git folder inside the project and adds some files to it. These files hold the repository's configuration and history. To look inside, go to your terminal and type

cd .git
ls
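To make the layout concrete, here is a minimal Python sketch that creates, by hand, roughly the smallest .git skeleton Git looks for: a HEAD file plus the objects/ and refs/ folders. It is a simplification under stated assumptions: a real git init also writes config, description, hooks/ and info/, and the default branch name (master here) depends on the init.defaultBranch setting. The helper make_git_skeleton is hypothetical.

```python
import os
import tempfile


def make_git_skeleton(project_dir: str, branch: str = "master") -> str:
    """Create a minimal .git layout by hand (hypothetical helper, not git's own code)."""
    git_dir = os.path.join(project_dir, ".git")
    # objects/ will hold blobs, trees and commits; refs/ holds branch and tag labels
    for sub in ("objects", os.path.join("refs", "heads"), os.path.join("refs", "tags")):
        os.makedirs(os.path.join(git_dir, sub), exist_ok=True)
    # HEAD is a symbolic ref naming the current branch, which has no commits yet
    with open(os.path.join(git_dir, "HEAD"), "w", encoding="utf-8") as f:
        f.write(f"ref: refs/heads/{branch}\n")
    return git_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        print(sorted(os.listdir(make_git_skeleton(project))))  # ['HEAD', 'objects', 'refs']
```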

git add

When the user adds a new file with git add, it has two effects on the .git folder.

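  • It creates a new blob file in the .git/objects directory. The blob file contains the compressed content of the added file. Once the file has been committed, we can check its hash by listing the tree of the current branch (for a file that is only staged, git ls-files --stage shows the blob hash recorded in the index):

git ls-tree <current_branch_name>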

  • The number 100644 is the file mode: 100 marks a regular file and 644 is its Unix permission, so the owner can read and write while the group and others can only read. Next comes the object type, blob, followed by the hash of the file.

  • In the hash, the first two characters are the name of a folder inside .git/objects and the rest is the name of the blob file that holds the content. To read the content of the file, use the following command:

git cat-file -p c6bcc7fc25d25b6e8fbef976b8622cd2079fcc39

In the above example, c6 is the directory inside the .git/objects folder, and the rest of the hash is the name of the blob file inside it.
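Here is a small Python sketch of that storage scheme, based on Git's loose-object format: the content is prefixed with a header of the form "blob <size>" plus a NUL byte, hashed with SHA-1, zlib-compressed and written to .git/objects/<first two characters>/<remaining characters>. It assumes a default SHA-1 repository and ignores packfiles; write_blob, read_blob and object_path are hypothetical names, and read_blob only approximates what git cat-file -p prints for a blob.

```python
import hashlib
import os
import tempfile
import zlib


def object_path(git_dir: str, sha: str) -> str:
    # First two hex characters name the folder, the remaining ones name the file
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])


def write_blob(git_dir: str, content: bytes) -> str:
    """Store content as a loose blob and return its SHA-1 (simplified `git hash-object -w`)."""
    data = b"blob %d\0" % len(content) + content  # header: type, size, NUL byte
    sha = hashlib.sha1(data).hexdigest()
    path = object_path(git_dir, sha)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))  # loose objects are stored zlib-compressed
    return sha


def read_blob(git_dir: str, sha: str) -> bytes:
    """Decompress a loose blob and strip its header."""
    with open(object_path(git_dir, sha), "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")
    if not header.startswith(b"blob "):
        raise ValueError(f"{sha} is not a blob")
    return content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as git_dir:
        sha = write_blob(git_dir, b"test content\n")
        print(sha)  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
        print(os.path.relpath(object_path(git_dir, sha), git_dir))
        print(read_blob(git_dir, sha))  # b'test content\n'
```

Because the folder name is just the first two characters of the hash, Git spreads objects over at most 256 directories instead of one huge folder.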

  • Git also adds an entry to the index file (.git/index). The index maps each file path to the hash of its blob. Only files are listed in the index, not directories. When a file's content changes and the file is added again, a new blob is created with the new content and the file's index entry is updated to point at it.
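The real .git/index is a binary file that also stores timestamps, file sizes and other metadata, but its core job can be modelled as a mapping from path to blob hash. The Python sketch below is a simplified, in-memory model under that assumption (the Index class is hypothetical). It shows why editing a file and adding it again creates a second blob while the index keeps only the latest hash.

```python
from __future__ import annotations

import hashlib


def blob_hash(content: bytes) -> str:
    """Hash content the way Git hashes a blob: SHA-1 over the 'blob <size>\\0' header plus content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


class Index:
    """Simplified in-memory stand-in for .git/index: file path -> blob hash."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}    # only file paths, never directories
        self.objects: dict[str, bytes] = {}  # stand-in for .git/objects

    def add(self, path: str, content: bytes) -> str:
        sha = blob_hash(content)
        self.objects[sha] = content  # new content means a new blob; old blobs stay
        self.entries[path] = sha     # the entry now points at the newest blob
        return sha


if __name__ == "__main__":
    index = Index()
    first = index.add("src/app.txt", b"version 1\n")   # hypothetical file
    second = index.add("src/app.txt", b"version 2\n")
    print(first != second)     # True
    print(len(index.objects))  # 2
    print(index.entries == {"src/app.txt": second})  # True
```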

Making a Commit

Whenever a new commit is made, the following steps are executed:

  • Git builds a tree graph from the index to represent the version of the project being committed. It records the location and content of every file in the project. Git then creates a commit object that points at the root of that tree, and points the current branch at the new commit object.

  • The tree graph is composed of blobs and trees. Tree objects are stored when a commit is made. A tree represents a directory in the working copy: it lists the blobs (files) and trees (subdirectories) inside that directory.

  • In the terminal, type the following command (the FETCH_HEAD file exists only after a git fetch):

    cat FETCH_HEAD

This lists the latest commit hash of each branch fetched from the remote. Pick any of these hashes and pass it to git cat-file -p to see the full commit. The first line of the commit, tree, holds the hash of the root tree, which represents the root of the working copy at that commit.
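Commit and tree objects are hashed the same way as blobs. As a rough Python sketch, assuming a flat project with no subdirectories, mode 100644 for every file, and a hypothetical author and fixed timestamps, the code below serializes a tree and a commit in Git's object format. The commit text it prints has the same kind of tree, parent, author and committer lines that git cat-file -p shows for a real commit.

```python
from __future__ import annotations

import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0<body>': the same scheme for blobs, trees and commits."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def tree_body(files: dict[str, str]) -> bytes:
    """Serialize one directory: '<mode> <name>\\0<20-byte binary hash>' per entry.

    Simplification: no subdirectories, and every file gets mode 100644.
    """
    body = b""
    for name in sorted(files, key=lambda n: n.encode()):
        body += b"100644 " + name.encode() + b"\0" + bytes.fromhex(files[name])
    return body


def commit_body(tree: str, parents: list[str], author: str, timestamp: int, message: str) -> bytes:
    """Serialize a commit: tree line, one line per parent, author, committer, blank line, message."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


if __name__ == "__main__":
    blob = hash_object("blob", b"hello\n")
    tree = hash_object("tree", tree_body({"hello.txt": blob}))
    author = "Jane Doe <jane@example.com>"  # hypothetical author
    first = hash_object("commit", commit_body(tree, [], author, 1700000000, "First commit"))
    # The second commit reuses the same tree and names the first commit as its parent
    print(commit_body(tree, [first], author, 1700000100, "Second commit").decode())
```

Because the second commit changes no files, it points at the very same tree hash, which is how Git reuses objects that have not changed.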

  • HEAD and feature/files_module are both refs. A ref is a label Git uses to identify a specific commit. HEAD points at the current branch, and the branch always points at its newest commit. When the user makes another commit, the new commit object points at the new root tree object, and its second line points at the commit's parent. To find the parent, Git goes to HEAD, follows it to the current branch and reads the hash of the previous commit stored there.

  • A new commit reuses the blob and tree objects created earlier for every file and directory that has not changed since the previous commit. Git uses special refs such as HEAD, FETCH_HEAD and MERGE_HEAD to support commands that manipulate history, such as fetches and merges. We can use git log to see the history of commits.
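The ref-following described above can be sketched in Python. This is a minimal model under stated assumptions: ref files are kept in a dict keyed by their path inside .git (real Git may also store refs in packed-refs), commit hashes are short hypothetical strings, and record_commit, resolve and log are hypothetical helpers. log follows only the first parent, so it covers just a small part of what git log does.

```python
from __future__ import annotations


def resolve(refs: dict[str, str], name: str) -> str:
    """Follow symbolic refs such as HEAD ('ref: refs/heads/...') down to a commit hash."""
    value = refs[name].strip()
    while value.startswith("ref: "):
        value = refs[value[len("ref: "):]].strip()
    return value


def record_commit(refs: dict[str, str], parents_of: dict[str, list[str]], new_commit: str) -> None:
    """Attach new_commit to the branch HEAD points at, using the branch tip as its parent."""
    head = refs["HEAD"].strip()
    if not head.startswith("ref: "):
        raise ValueError("detached HEAD is not handled in this sketch")
    branch = head[len("ref: "):]
    parent = refs.get(branch, "").strip()  # empty for the very first commit
    parents_of[new_commit] = [parent] if parent else []
    refs[branch] = new_commit + "\n"  # the branch now points at the new commit


def log(parents_of: dict[str, list[str]], start: str) -> list[str]:
    """Walk first parents from start, newest first."""
    history = []
    current: str | None = start
    while current is not None:
        history.append(current)
        parents = parents_of[current]
        current = parents[0] if parents else None
    return history


if __name__ == "__main__":
    refs = {"HEAD": "ref: refs/heads/feature/files_module\n"}
    parents_of: dict[str, list[str]] = {}
    for sha in ("a1", "a2", "a3"):  # hypothetical short hashes
        record_commit(refs, parents_of, sha)
    print(resolve(refs, "HEAD"))  # a3
    print(log(parents_of, "a3"))  # ['a3', 'a2', 'a1']
```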

Git Checkout

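When we check out a commit, Git gets that commit and the tree graph it points at. It writes the file entries in the tree graph to the working copy and to the index file. Finally, it sets the content of HEAD to the hash of the checked-out commit; when a branch is checked out instead, HEAD is set to point at that branch.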
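A minimal Python sketch of these checkout steps, assuming an in-memory model (the Repo dataclass is hypothetical) in which each commit's tree graph is already flattened into a path-to-hash mapping and the hashes are placeholders. It removes tracked files that the target tree lacks, writes the tree's files to the working copy and the index, and records the commit in HEAD; untracked files are left alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repo:
    """In-memory stand-in for a repository (hypothetical model, not Git's on-disk layout)."""
    objects: dict[str, bytes] = field(default_factory=dict)         # blob hash -> content
    trees: dict[str, dict[str, str]] = field(default_factory=dict)  # commit -> {path: blob hash}
    index: dict[str, str] = field(default_factory=dict)             # path -> blob hash
    working_copy: dict[str, bytes] = field(default_factory=dict)    # path -> file content
    head: str = ""


def checkout(repo: Repo, commit: str) -> None:
    tree = repo.trees[commit]  # the tree graph the commit points at, flattened to paths
    # Remove tracked files that do not exist in the target tree (untracked files stay)
    for path in repo.index:
        if path not in tree:
            repo.working_copy.pop(path, None)
    # Write every file entry of the tree to the working copy ...
    for path, sha in tree.items():
        repo.working_copy[path] = repo.objects[sha]
    # ... and to the index, then record the checked-out commit in HEAD
    repo.index = dict(tree)
    repo.head = commit


if __name__ == "__main__":
    repo = Repo(
        objects={"h1": b"1\n", "h2": b"2\n"},  # placeholder hashes
        trees={"c1": {"num.txt": "h1"}, "c2": {"num.txt": "h2", "new.txt": "h1"}},
    )
    checkout(repo, "c2")
    checkout(repo, "c1")
    print(repo.working_copy, repo.index, repo.head)
```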

Check out a branch that is incompatible with the working copy

Suppose there are two branches, b1 and b2. On b1 a file reads 1, and on b2 the same file reads 2. If the user changes that file on b1 without committing and then tries to check out b2, Git aborts the checkout instead of overwriting the file, so the working copy and the uncommitted change stay as they are.
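The safety check can be approximated in Python. This sketch uses a simplified rule, which is an assumption rather than Git's full logic: a path blocks the checkout when its working-copy content differs from the index and the target branch also has different content for it. Deleted and untracked files are ignored, and blocked_paths and the file name number.txt are hypothetical.

```python
import hashlib


def blob_hash(content: bytes) -> str:
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def blocked_paths(index: dict, working_copy: dict, target_tree: dict) -> list:
    """Paths whose local edits a checkout would overwrite (simplified version of Git's check)."""
    blocked = []
    for path, staged in index.items():
        current = working_copy.get(path)
        if current is None:
            continue  # deleted files are ignored in this sketch
        edited = blob_hash(current) != staged       # working copy differs from the index
        replaced = target_tree.get(path) != staged  # target branch has other content
        if edited and replaced:
            blocked.append(path)
    return sorted(blocked)


if __name__ == "__main__":
    b1 = {"number.txt": blob_hash(b"1\n")}  # the file on branch b1
    b2 = {"number.txt": blob_hash(b"2\n")}  # the same file on branch b2
    b3 = {"number.txt": blob_hash(b"1\n"), "readme.txt": blob_hash(b"hi\n")}
    index = dict(b1)                        # b1 is checked out
    working_copy = {"number.txt": b"3\n"}   # uncommitted edit on b1
    print(blocked_paths(index, working_copy, b2))  # ['number.txt']: Git would abort
    print(blocked_paths(index, working_copy, b3))  # []: the edit can be carried over
```

The second call shows the other half of the rule: when the target branch has the same version of the file, the uncommitted edit does not conflict with anything.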

Merge an ancestor

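Merging two branches means merging two commits. When the current branch is a descendant of master, for example, merging master into it needs no work: the current branch already contains all of master's changes, so Git simply reports that it is already up to date.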
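Deciding whether a merge is needed comes down to an ancestry test, which Git exposes as git merge-base --is-ancestor. The Python sketch below walks parent links in a hypothetical commit graph (short placeholder hashes) to classify a merge. It also names the reverse case, a fast-forward, where the current branch is the ancestor and can simply move forward.

```python
def is_ancestor(parents_of: dict, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` can be reached from `descendant` by following parent links."""
    stack, seen = [descendant], set()
    while stack:
        sha = stack.pop()
        if sha == ancestor:
            return True
        if sha not in seen:
            seen.add(sha)
            stack.extend(parents_of[sha])
    return False


def merge_kind(parents_of: dict, current: str, other: str) -> str:
    if is_ancestor(parents_of, other, current):
        return "already up to date"  # current already contains every change in other
    if is_ancestor(parents_of, current, other):
        return "fast-forward"        # the branch can simply move forward to other
    return "three-way merge"         # histories diverged, so a merge commit is needed


if __name__ == "__main__":
    # a1 <- a2 <- a3 on one branch, and b1 branching off a2 (hypothetical hashes)
    parents_of = {"a1": [], "a2": ["a1"], "a3": ["a2"], "b1": ["a2"]}
    print(merge_kind(parents_of, "a3", "a2"))  # already up to date
    print(merge_kind(parents_of, "a2", "a3"))  # fast-forward
    print(merge_kind(parents_of, "a3", "b1"))  # three-way merge
```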

Merge two commits

  • Git writes the hash of the commit being merged in (the giver) to the file .git/MERGE_HEAD.
  • It finds the base commit: the most recent ancestor that the c1c5 and c1c6 commits have in common.
  • Git generates indices for the base, c1c5 and c1c6 commits from their tree graphs.
  • Git generates a diff that combines the changes made to the base by the c1c5 and the c1c6 commits.
  • The changes indicated by the entries in the diff are applied to the working copy.
  • The changes indicated by the entries in the diff are applied to the index, and the updated index is committed.

Here c1c5 and c1c6 are the two commits being merged, and c1c4 is their base commit.
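These steps can be sketched in Python as a merge-base search followed by a three-way merge of flattened trees (path to blob hash). Assumptions: the graph reuses the commit names from the example above, the blob ids are placeholders, and a file changed differently on both sides is reported as a conflict outright, whereas Git would first try a line-level merge inside the file. When several best common ancestors exist, this sketch just picks one.

```python
from __future__ import annotations


def ancestors(parents_of: dict[str, list[str]], sha: str) -> set[str]:
    """Every commit reachable from sha, including sha itself."""
    seen: set[str] = set()
    stack = [sha]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents_of[c])
    return seen


def merge_base(parents_of: dict[str, list[str]], a: str, b: str) -> str | None:
    """A common ancestor of a and b that is not an ancestor of another common ancestor."""
    common = ancestors(parents_of, a) & ancestors(parents_of, b)
    for c in sorted(common):
        if not any(o != c and c in ancestors(parents_of, o) for o in common):
            return c
    return None


def three_way_merge(base: dict, ours: dict, theirs: dict) -> tuple[dict, list]:
    """Combine the changes both sides made to base; whole-file granularity only."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:    # both sides agree (or both deleted the file)
            result = o
        elif o == b:  # only theirs changed it
            result = t
        elif t == b:  # only ours changed it
            result = o
        else:         # both changed it differently
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


if __name__ == "__main__":
    parents_of = {"c1c4": [], "c1c5": ["c1c4"], "c1c6": ["c1c4"]}
    trees = {  # path -> placeholder blob ids (hypothetical)
        "c1c4": {"a.txt": "h1", "b.txt": "h2"},
        "c1c5": {"a.txt": "h3", "b.txt": "h2"},
        "c1c6": {"a.txt": "h1", "b.txt": "h2", "c.txt": "h4"},
    }
    base = merge_base(parents_of, "c1c5", "c1c6")
    merged, conflicts = three_way_merge(trees[base], trees["c1c5"], trees["c1c6"])
    print(base, merged, conflicts)  # c1c4 {'a.txt': 'h3', 'b.txt': 'h2', 'c.txt': 'h4'} []
```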

Merge two commits that modify the same file

  • Git writes the hash of the commit being merged in (the giver) to a file at .git/MERGE_HEAD.
  • Git finds the base commit (c1c7) and generates indices for the base commit and for the two commits being merged, c1c8 and c1c9.
  • It also generates a diff that combines the changes made to the base by the c1c8 and the c1c9 commits. This diff is a list of file paths, each pointing to a change: add, remove, modify or conflict.
  • The changes indicated by the entries in the diff are applied to the working copy.
  • The changes indicated by the entries in the diff are applied to the index. Entries in the index are uniquely identified by a combination of their file path and stage. The entry for an unconflicted file has a stage of 0, and before this merge every entry in the index had stage 0.
  • The file that both commits changed in conflicting ways now has three entries in the index, holding the hashes of its base version, the current branch's version and the merged-in version, with stages 1, 2 and 3 respectively. The presence of these three entries tells Git that the file has conflicts.

  • The user resolves the conflict by editing the file and running git add, which replaces the three entries with a single stage 0 entry, and then runs git commit.

  • Git sees .git/MERGE_HEAD in the repository, which tells it that a merge is in progress. It checks the index, finds there are no conflicts, and creates a merge commit whose parents are c1c8 and c1c9.

  • Git points the current branch, master, at the new commit.

Here c1c8 is the current branch's commit, c1c9 is the commit being merged in, and c1c7 is their base commit.
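The stage bookkeeping can be sketched in Python by keying a simplified, in-memory index on (path, stage). The helper names and the placeholder hashes are hypothetical; the sketch assumes c1c8 is the current commit and c1c9 the one recorded in MERGE_HEAD. It shows a conflict replacing the stage 0 entry with stages 1 to 3, git add collapsing them back to stage 0, and the commit then getting two parents.

```python
from __future__ import annotations


def record_conflict(index: dict, path: str, base: str, ours: str, theirs: str) -> None:
    """Replace a path's stage 0 entry with stages 1 (base), 2 (ours) and 3 (theirs)."""
    index.pop((path, 0), None)
    index[(path, 1)] = base
    index[(path, 2)] = ours
    index[(path, 3)] = theirs


def has_conflicts(index: dict) -> bool:
    return any(stage != 0 for _path, stage in index)


def mark_resolved(index: dict, path: str, resolved: str) -> None:
    """Roughly what `git add` does for a conflicted path: collapse stages 1-3 into stage 0."""
    for stage in (1, 2, 3):
        index.pop((path, stage), None)
    index[(path, 0)] = resolved


def merge_commit_parents(index: dict, head: str, merge_head: str | None = None) -> list[str]:
    """Refuse to commit while conflicts remain; a MERGE_HEAD adds the second parent."""
    if has_conflicts(index):
        raise RuntimeError("unmerged paths remain; resolve them and git add first")
    return [head] if merge_head is None else [head, merge_head]


if __name__ == "__main__":
    index = {("a.txt", 0): "h1", ("b.txt", 0): "h2"}  # placeholder hashes
    record_conflict(index, "b.txt", base="h2", ours="h5", theirs="h6")
    print(has_conflicts(index))  # True
    mark_resolved(index, "b.txt", "h7")
    print(merge_commit_parents(index, "c1c8", "c1c9"))  # ['c1c8', 'c1c9']
```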

Reference

Git from the inside out
