Abhay Singh Kathayat

Posted on Jul 1

Git from the Inside Out: A Deep Dive into How Git Really Works

#github #git #versioncontrol #vcs

Most developers use Git every day—committing, branching, merging—but few truly understand what happens under the hood. Git’s design is elegant yet often misunderstood. This article will peel back the layers, explaining Git’s core mechanisms, storage model, and how everyday commands work at a fundamental level.

By the end, you'll understand:

- How Git stores data (blobs, trees, commits, and tags)
- The role of the index (staging area) and HEAD
- How branches and tags really work (they’re just pointers!)
- What happens when you run git commit, git merge, or git rebase
- How Git efficiently stores history (packfiles and delta compression)
- How to recover lost commits using low-level Git commands

1. Git’s Data Model: Blobs, Trees, Commits, and Tags

Git is fundamentally a content-addressable filesystem: every piece of data is stored as an object and referenced by the hash of its content (SHA-1 by default, with SHA-256 available as an alternative object format). There are four core object types:

1.1 Blobs (Binary Large Objects)

Store raw file contents (text, binary, etc.)
Key property: A blob doesn’t store the filename—just the content.
Example: A file hello.txt with text "Hello, Git!" is stored as a blob.
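
To make the "content only, no filename" point concrete, here is a minimal Python sketch of how a blob ID can be computed. It assumes a SHA-1 repository; the function name blob_id and the variable names are made up for illustration.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The filename never enters the hash, so two files with identical
# content share a single blob. (Both variables are hypothetical files.)
hello_txt = b"Hello, Git!\n"
copy_txt = b"Hello, Git!\n"
print(blob_id(hello_txt) == blob_id(copy_txt))  # True
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

If you have Git installed, the result should match what git hash-object prints for a file with the same bytes.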

1.2 Trees (Directory Snapshots)

Represent directories, mapping filenames to blobs or other trees.
Example:

  tree 123abc
  ├── 100644 blob abc123 hello.txt
  └── 040000 tree def456 subdir
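
A tree can be hashed the same way as a blob once its entries are serialized. The sketch below assumes SHA-1 object IDs and uses a hypothetical tree_id helper. Note that directories are stored with mode 40000 inside the object, even though git cat-file -p prints it as 040000.

```python
import hashlib

def tree_id(entries):
    """entries: list of (mode, name, hex_object_id) tuples for one directory."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git sorts directory names as if they ended with "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    header = b"tree " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

# Illustrative entries: an empty file and an empty subdirectory.
entries = [
    ("100644", "hello.txt", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"),
    ("40000", "subdir", "4b825dc642cb6eb9a060e54bf8d69288fbee4904"),
]
print(tree_id(entries))
print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```

Because the filename lives in the tree entry, renaming a file changes the tree's hash but leaves the blob untouched.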

1.3 Commits (Project Snapshots)

Point to a tree (the state of the repo at that commit).
Contain metadata: author, timestamp, commit message, and parent commit(s).
Example:

  commit 789xyz
  tree 123abc
  parent 456def
  author Alice <alice@example.com>
  committer Bob <bob@example.com>

  Add hello.txt
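
As a rough sketch of why a commit's hash depends on its whole history, the following Python builds a commit object's bytes and hashes them. The commit_id function and the sample author line (including its timestamp) are assumptions for illustration; real commits may carry extra headers such as signatures.

```python
import hashlib

def commit_id(tree, parents, author, committer, message):
    """Serialize a commit in Git's text layout and return its SHA-1 ID."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # Headers, a blank line, then the commit message.
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
alice = "Alice <alice@example.com> 1700000000 +0000"  # hypothetical identity

first = commit_id(empty_tree, [], alice, alice, "Initial commit")
second = commit_id(empty_tree, [first], alice, alice, "Add hello.txt")
print(first)
print(second)
```

Since the parent ID is part of the hashed bytes, changing any earlier commit changes the ID of every commit built on top of it.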

1.4 Tags (Named References to Commits)

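A permanent label for a specific commit (e.g., v1.0.0).
An annotated tag (git tag -a) is stored as its own object with a tagger, date, and message; a lightweight tag is only a ref that points directly at the commit.
Unlike branches, tags don’t move when you make new commits.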

2. How Git Stores Objects: The `.git` Directory

Git’s entire database lives in .git/:

.git/
├── objects/       # Stores all objects (blobs, trees, commits, tags)
│   ├── 12/345abc  # Blob
│   ├── ab/cdef12  # Tree
│   └── 34/567def  # Commit
├── refs/          # References (branches, tags)
│   ├── heads/     # Branches (e.g., main, feature)
│   └── tags/      # Tags (e.g., v1.0.0)
├── HEAD           # Points to current branch/commit
└── index          # Staging area (next commit)

Key Insight: Everything is Content-Addressed

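Each object is named by the SHA-1 hash of its content plus a short type-and-size header (e.g., abc123...).
For a loose object, the first two characters are the directory (ab/), the rest is the filename (c123...), and the file itself is zlib-compressed.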

Example:

# Compute the hash of a file and write it to the object database (-w)
$ git hash-object -w hello.txt
ab1234...

# View the object’s content
$ git cat-file -p ab1234
Hello, Git!
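
The two commands above can be approximated in a few lines of Python. This is a sketch under the assumption of a SHA-1 repository with loose objects only; write_object and read_object are hypothetical helper names, and the demo writes into a temporary directory rather than a real .git/objects.

```python
import hashlib
import os
import tempfile
import zlib

def write_object(objects_dir, obj_type, content):
    """Store content as a loose object and return its ID."""
    data = obj_type.encode() + b" " + str(len(content)).encode() + b"\0" + content
    oid = hashlib.sha1(data).hexdigest()
    path = os.path.join(objects_dir, oid[:2], oid[2:])  # ab/cdef... layout
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))
    return oid

def read_object(objects_dir, oid):
    """Load a loose object and return (type, content)."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        data = zlib.decompress(f.read())
    header, _, content = data.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    assert int(size) == len(content)  # the header records the content length
    return obj_type, content

with tempfile.TemporaryDirectory() as objects_dir:
    oid = write_object(objects_dir, "blob", b"Hello, Git!\n")
    print(oid[:2] + "/" + oid[2:])        # relative path of the loose object
    print(read_object(objects_dir, oid))  # ('blob', b'Hello, Git!\n')
```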

3. Branches and HEAD: Just Pointers!

A branch (e.g., main) is just a reference to a commit.
HEAD points to the current branch (or commit, in "detached HEAD" mode).

         main
           ↓
A ← B ← C ← D

If you commit on main, Git:
1. Creates a new commit (E) pointing to D.
2. Moves main forward to E.

         main
           ↓
A ← B ← C ← D ← E

Detached HEAD Mode

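If you check out a commit directly (git checkout abc123), HEAD points to the commit itself, not to a branch.
New commits made here belong to no branch: once you switch away they become unreachable unless you first create a branch (git switch -c <name>) or a tag that points at them.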
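
Both cases (HEAD on a branch and detached HEAD) come down to what the HEAD file contains. The sketch below resolves HEAD by reading plain files. It assumes every ref exists as a loose file (a real repository may also keep refs in packed-refs), and the commit ID is a placeholder.

```python
import os
import tempfile

def resolve_head(git_dir):
    """Return (ref_name_or_None, commit_id) for a repository with loose refs."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]  # e.g. refs/heads/main
        with open(os.path.join(git_dir, *ref.split("/"))) as f:
            return ref, f.read().strip()
    return None, head  # detached: HEAD holds a commit ID directly

with tempfile.TemporaryDirectory() as git_dir:
    commit_d = "d" * 40  # placeholder commit ID
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "main"), "w") as f:
        f.write(commit_d + "\n")

    # Normal case: HEAD names a branch, and the branch names a commit.
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/main\n")
    print(resolve_head(git_dir))

    # Detached HEAD: the file contains the commit ID itself.
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write(commit_d + "\n")
    print(resolve_head(git_dir))
```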

4. The Index (Staging Area): Where Commits Are Prepared

The index (.git/index) is a binary file tracking what goes into the next commit.
When you git add, Git:
1. Creates a blob of the file.
2. Updates the index to point to it.

Key Insight:

git commit takes the index and turns it into a tree, then creates a commit pointing to that tree.
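
Here is a toy model of that flow, assuming the index is simplified to a flat mapping from path to blob ID (the real index is a binary file that also records modes and file stats). The names git_add and build_tree are hypothetical.

```python
import hashlib

def git_add(index, objects, path, content):
    """Toy `git add`: store the content as a blob, then record it in the index."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    blob_id = hashlib.sha1(header + content).hexdigest()
    objects[blob_id] = content  # step 1: create the blob
    index[path] = blob_id       # step 2: point the index entry at it
    return blob_id

def build_tree(index):
    """Turn the flat index into nested dicts, one per directory."""
    root = {}
    for path, blob_id in sorted(index.items()):
        *dirs, filename = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[filename] = blob_id
    return root

index, objects = {}, {}
git_add(index, objects, "hello.txt", b"Hello, Git!\n")
git_add(index, objects, "subdir/notes.txt", b"draft\n")

snapshot = build_tree(index)
print(sorted(snapshot))            # ['hello.txt', 'subdir']
print(sorted(snapshot["subdir"]))  # ['notes.txt']
```

The flat list of paths in the index is what makes staging cheap; the nested tree structure only gets built when you commit.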

5. How Common Commands Work Internally

5.1 `git commit`

Takes the index and creates a tree.
Creates a commit with:
- The tree (current state).
- Parent commit (current HEAD).
- Author/message metadata.
Updates the branch reference (e.g., main) to point to the new commit.
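
The steps above can be sketched with an in-memory repository. ToyRepo is a made-up class, and it hashes a Python repr instead of Git's real commit format, so the IDs will not match real Git; the point is only the order of operations.

```python
import hashlib

class ToyRepo:
    def __init__(self):
        self.objects = {}                      # commit ID -> commit record
        self.refs = {"refs/heads/main": None}  # branch -> commit ID
        self.head = "refs/heads/main"          # HEAD names the current branch

    def commit(self, tree, message):
        parent = self.refs[self.head]          # parent is whatever HEAD resolves to
        record = {"tree": tree, "parent": parent, "message": message}
        oid = hashlib.sha1(repr(record).encode()).hexdigest()  # toy hash
        self.objects[oid] = record             # store the new commit
        self.refs[self.head] = oid             # move the branch forward
        return oid

repo = ToyRepo()
d = repo.commit(tree="tree-of-D", message="D")
e = repo.commit(tree="tree-of-E", message="E")
print(repo.objects[e]["parent"] == d)      # True: E points back to D
print(repo.refs["refs/heads/main"] == e)   # True: main now points at E
```

Notice that HEAD itself does not change during a commit; only the branch it names is updated.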

5.2 `git merge`

Finds the common ancestor (merge base).
Performs a three-way merge:
- Base vs. HEAD (your changes).
- Base vs. other-branch (their changes).
Creates a merge commit with two parents (unless the merge is a fast-forward, in which case the branch pointer simply moves).
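
A simplified sketch of those steps follows. It models history as a dictionary of parent lists and merges whole files rather than lines, so it is far cruder than Git's real merge machinery; merge_base and three_way are illustrative names.

```python
def ancestors(parents, commit):
    """Return the commit plus everything reachable through parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_base(parents, ours, theirs):
    """Return a common ancestor that is not an ancestor of another one."""
    common = ancestors(parents, ours) & ancestors(parents, theirs)
    for c in sorted(common):
        if not any(c != o and c in ancestors(parents, o) for o in common):
            return c
    return None

def three_way(base, ours, theirs):
    """Pick the version of one file, comparing both sides against the base."""
    if ours == theirs:
        return ours    # both sides agree
    if ours == base:
        return theirs  # only they changed it
    if theirs == base:
        return ours    # only we changed it
    raise ValueError("conflict: both sides changed the file differently")

# A <- B <- C (main), with X branching off from B (feature)
parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["B"]}
print(merge_base(parents, "C", "X"))  # B
print(three_way(base="v1", ours="v1", theirs="v2"))  # v2
```

If merge_base returns the current commit itself, the other branch is strictly ahead and a fast-forward is possible.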

5.3 `git rebase`

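Finds the commits on your branch that are not in the target branch.
Replays them one by one on top of the target, creating new commits with new hashes.
Moves your branch pointer to the last replayed commit; the original commits become unreachable.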
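
To see why rebased commits get new hashes, consider this toy model. Each commit is reduced to a parent plus a one-word "change", and the ID is derived from both; all names here are hypothetical and conflicts are ignored.

```python
import hashlib

def make_commit(store, parent, change):
    """Create a toy commit whose ID depends on its parent and its change."""
    oid = hashlib.sha1(f"{parent}|{change}".encode()).hexdigest()[:12]
    store[oid] = {"parent": parent, "change": change}
    return oid

def history(store, tip):
    """List commit IDs from tip back to the root (newest first)."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = store[tip]["parent"]
    return out

def rebase(store, branch_tip, target_tip):
    """Replay the branch's own commits on top of target_tip; return the new tip."""
    in_target = set(history(store, target_tip))
    to_replay = [c for c in history(store, branch_tip) if c not in in_target]
    new_tip = target_tip
    for c in reversed(to_replay):  # oldest first
        new_tip = make_commit(store, new_tip, store[c]["change"])
    return new_tip

store = {}
a = make_commit(store, None, "A")
b = make_commit(store, a, "B")
c = make_commit(store, b, "C")  # main:    A <- B <- C
x = make_commit(store, b, "X")
y = make_commit(store, x, "Y")  # feature: A <- B <- X <- Y

new_tip = rebase(store, y, c)
print([store[i]["change"] for i in history(store, new_tip)])  # ['Y', 'X', 'C', 'B', 'A']
print(new_tip != y)  # True: same change, different parent, different ID
```

The old X and Y are still in the store after the rebase; nothing points at them any more, which is exactly the situation section 7 deals with.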

6. Efficient Storage: Packfiles and Delta Compression

Initially, Git stores objects loosely (one zlib-compressed file per object).
When garbage collection runs (git gc, triggered manually or automatically), Git combines them into packfiles (.git/objects/pack/).
Packfiles use delta compression (storing only the changes between similar objects).

Example:

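If two versions of a file differ slightly, a packfile can store one version in full and the other as a delta against it, rather than two full copies. Loose objects, by contrast, are always complete copies.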
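
The idea behind a delta is a list of "copy this range from the base" and "insert these new bytes" instructions. The sketch below uses Python's difflib to find the matching ranges. It is not Git's binary delta encoding or its matching algorithm, only an illustration of the copy/insert principle, and the function names are made up.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes):
    """Describe target as copy-from-base and insert-literal instructions."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # offset and length in base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # bytes not found in base
        # "delete" needs no instruction: those base bytes are just not copied
    return ops

def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the target from the base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

v1 = b"".join(b"line %d\n" % i for i in range(50))
v2 = v1.replace(b"line 25\n", b"line twenty-five\n")

delta = make_delta(v1, v2)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
print(apply_delta(v1, delta) == v2)  # True
print(literal_bytes, "new bytes stored instead of", len(v2))
```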

7. Recovering Lost Commits

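Git does not delete an object the moment it becomes unreachable; it only removes the reference. Unreachable commits stay in the object database until garbage collection eventually prunes them (by default, only after a grace period), so you can usually recover "lost" commits using: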

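# Check the reflog first: it records where HEAD has pointed recently
$ git reflog

# Find dangling commits that no ref or reflog entry points to
$ git fsck --lost-found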

# View a commit
$ git show abc123

# Recover it by creating a branch
$ git branch recovered abc123
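
Conceptually, finding a lost commit is a reachability problem: walk from every ref and see what is left over. The following sketch models that on a toy history. The dictionaries and function names are assumptions for illustration, and it ignores reflogs, tags, and non-commit objects.

```python
def reachable(parents, tips):
    """Return every commit reachable from the given starting commits."""
    seen, stack = set(), list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def dangling(parents, refs):
    """Unreachable commits that no other unreachable commit points to."""
    unreachable = set(parents) - reachable(parents, refs.values())
    referenced = {p for c in unreachable for p in parents[c]}
    return sorted(unreachable - referenced)

# A <- B <- C (main); X and Y were on a feature branch that got deleted.
parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["B"], "Y": ["X"]}
refs = {"main": "C"}
print(dangling(parents, refs))  # ['Y']

# Creating a branch at Y makes Y, and X through it, reachable again.
refs["recovered"] = "Y"
print(dangling(parents, refs))  # []
```

Only Y is reported in the first call because X is still referenced by Y; recovering the tip brings back the whole chain.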

Conclusion: Why This Matters

Understanding Git’s internals helps you:

Debug issues (merge conflicts, detached HEAD, lost commits).
Optimize workflows (e.g., when to rebase vs. merge).
Write custom Git tools (e.g., scripting with git cat-file).
Appreciate Git’s design (simple but powerful).
