DEV Community

Abhay Singh Kathayat

Git from the Inside Out: A Deep Dive into How Git Really Works

Most developers use Git every day—committing, branching, merging—but few truly understand what happens under the hood. Git’s design is elegant yet often misunderstood. This article will peel back the layers, explaining Git’s core mechanisms, storage model, and how everyday commands work at a fundamental level.

By the end, you'll understand:

  • How Git stores data (blobs, trees, commits, and tags)
  • The role of the index (staging area) and HEAD
  • How branches and tags really work (they’re just pointers!)
  • What happens when you run git commit, git merge, or git rebase
  • How Git efficiently stores history (packfiles and delta compression)
  • How to recover lost commits using low-level Git commands

1. Git’s Data Model: Blobs, Trees, Commits, and Tags

Git is fundamentally a content-addressable filesystem: every piece of data is stored as an object and referenced by a hash of its contents (SHA-1 by default; SHA-256 repositories are also supported). There are four core object types:

1.1 Blobs (Binary Large Objects)

  • Store raw file contents (text, binary, etc.)
  • Key property: A blob doesn’t store the filename—just the content.
  • Example: A file hello.txt with text "Hello, Git!" is stored as a blob.
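
To make the "content only" property concrete, here is a minimal Python sketch of how a blob's ID can be computed, assuming a SHA-1 repository. The function name git_blob_id is made up for this illustration; the header layout ("blob", a space, the size, a NUL byte) is the part that mirrors Git.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# The filename never enters the hash, so identical content always
# yields the same ID no matter what the file is called.
print(git_blob_id(b"hello world\n"))
# → 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Running echo 'hello world' | git hash-object --stdin in any repository should print the same ID, which is a quick way to check the sketch against real Git.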

1.2 Trees (Directory Snapshots)

  • Represent directories, mapping filenames to blobs or other trees.
  • Example:
  tree 123abc
  ├── 100644 blob abc123 hello.txt
  └── 040000 tree def456 subdir
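
The listing above is a human-readable view. On disk a tree is a compact binary record, and the following Python sketch shows roughly how one is serialized, assuming SHA-1 IDs. The helper names object_id and tree_body are hypothetical. Note that inside the object a directory's mode is written as 40000, without the leading zero that git ls-tree prints.

```python
import hashlib

def object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over Git's object header (type, size, NUL byte) plus the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) tuples into a tree object's body."""
    # Git sorts entries by name, comparing a directory as if its
    # name ended with "/".
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry: "<mode> <name>", a NUL byte, then the 20 raw bytes of the ID.
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)
    return body

blob_id = object_id("blob", b"Hello, Git!\n")
body = tree_body([("100644", "hello.txt", blob_id)])
print(object_id("tree", body))
```

Because a tree records only names, modes, and child IDs, two directories with identical contents produce the same tree ID and are stored once.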

1.3 Commits (Project Snapshots)

  • Point to a tree (the state of the repo at that commit).
  • Contain metadata: author, timestamp, commit message, and parent commit(s).
  • Example:
  commit 789xyz
  tree 123abc
  parent 456def
  author Alice <alice@example.com> <timestamp>
  committer Bob <bob@example.com> <timestamp>

  Add hello.txt
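
As a rough illustration of why a commit's ID depends on its whole history, here is a small Python sketch that serializes a commit and hashes it. The function names, the identities, and the timestamp are placeholders invented for the example; the field order (tree, parents, author, committer, blank line, message) follows what git cat-file -p prints for a commit.

```python
import hashlib

def commit_body(tree, parents, author, committer, message) -> bytes:
    """Serialize a commit in the layout `git cat-file -p <commit>` prints."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # zero, one, or several parents
    lines += [f"author {author}", f"committer {committer}"]
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")

def commit_id(body: bytes) -> str:
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # Git's well-known empty tree
alice = "Alice <alice@example.com> 1700000000 +0000"       # hypothetical identity and time
bob = "Bob <bob@example.com> 1700000000 +0000"

root = commit_id(commit_body(EMPTY_TREE, [], alice, bob, "Initial commit"))
child_body = commit_body(EMPTY_TREE, [root], alice, bob, "Add hello.txt")
print(child_body.decode())
print(commit_id(child_body))
```

Since the parent's ID is part of the hashed text, changing any ancestor changes the ID of every commit after it.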

1.4 Tags (Named References to Commits)

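  • A tag gives a fixed name (e.g., v1.0.0) to a specific commit.
  • A lightweight tag is just a ref in .git/refs/tags/. An annotated tag is a separate tag object that stores the tagger, date, and message and points to the commit; only annotated tags count as the fourth object type.
  • Unlike branches, tags don’t move when you make new commits.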

2. How Git Stores Objects: The .git Directory

Git’s entire database lives in .git/:

.git/
├── objects/       # Stores all objects (blobs, trees, commits, annotated tags)
│   ├── 12/345abc  # Blob
│   ├── ab/cdef12  # Tree
│   └── 34/567def  # Commit
├── refs/          # References (branches, tags)
│   ├── heads/     # Branches (e.g., main, feature)
│   └── tags/      # Tags (e.g., v1.0.0)
├── HEAD           # Points to current branch/commit
└── index          # Staging area (next commit)

Key Insight: Everything is Content-Addressed

  • Each object is named by the SHA-1 hash of its type, size, and content (e.g., abc123...).
  • The first two characters are the directory (ab/), the rest is the filename (c123...).

Example:

# Hash a file and write it to the object database (-w)
$ git hash-object -w hello.txt
ab1234...

# View the object’s content
$ git cat-file -p ab1234
Hello, Git!
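
What writing and reading a loose object involves can be sketched in a few lines of Python, assuming SHA-1 IDs. The function names are hypothetical, and the sketch returns a path and bytes instead of touching a real .git directory. Loose objects are zlib-compressed, which is why you cannot simply cat the files under .git/objects/.

```python
import hashlib
import zlib

def encode_loose_object(obj_type: str, content: bytes):
    """Return (path relative to .git, compressed bytes) for a loose object."""
    store = f"{obj_type} {len(content)}".encode("ascii") + b"\0" + content
    oid = hashlib.sha1(store).hexdigest()
    # First two hex characters name the directory, the other 38 the file.
    path = f"objects/{oid[:2]}/{oid[2:]}"
    return path, zlib.compress(store)

def decode_loose_object(compressed: bytes):
    """Inverse of encode_loose_object: return (type, content)."""
    store = zlib.decompress(compressed)
    header, _, content = store.partition(b"\0")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content

path, data = encode_loose_object("blob", b"hello world\n")
print(path)                       # → objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
print(decode_loose_object(data))  # → ('blob', b'hello world\n')
```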

3. Branches and HEAD: Just Pointers!

  • A branch (e.g., main) is just a reference to a commit.
  • HEAD points to the current branch (or commit, in "detached HEAD" mode).
         main
           ↓
A ← B ← C ← D
  • If you commit on main, Git:
    1. Creates a new commit (E) pointing to D.
    2. Moves main forward to E.
         main
           ↓
A ← B ← C ← D ← E

Detached HEAD Mode

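  • If you check out a commit directly (git checkout abc123), HEAD points to the commit, not a branch.
  • New commits made here belong to no branch. Once you switch away they are reachable only through the reflog, so create a branch (git switch -c <name>) if you want to keep them.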
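
The difference between an attached and a detached HEAD is easy to model. The Python sketch below uses a dictionary and a string as stand-ins for .git/refs/heads/ and .git/HEAD; the function names and the single-letter commit IDs are invented for the example. The "ref: " prefix is what the real HEAD file contains when it names a branch.

```python
# Hypothetical in-memory stand-ins for .git/refs/heads/* and .git/HEAD.
refs = {"refs/heads/main": "D"}
head = "ref: refs/heads/main"   # attached: HEAD names a branch

def resolve_head(head, refs):
    """Return the commit HEAD ultimately points to."""
    if head.startswith("ref: "):
        return refs[head[len("ref: "):]]
    return head                 # detached: HEAD holds a commit ID directly

def record_commit(new_commit, head, refs):
    """Advance whatever HEAD points at and return the new HEAD value."""
    if head.startswith("ref: "):
        refs[head[len("ref: "):]] = new_commit   # the branch moves, HEAD follows it
        return head
    return new_commit           # detached: only HEAD moves, no branch follows

head = record_commit("E", head, refs)
print(resolve_head(head, refs), refs)            # main now points at E

detached = "C"                                   # like `git checkout C`
detached = record_commit("F", detached, refs)
print(detached, refs)                            # F is in no branch
```

In the detached case nothing in refs mentions F, which is exactly why such commits become hard to find after you switch away.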

4. The Index (Staging Area): Where Commits Are Prepared

  • The index (.git/index) is a binary file tracking what goes into the next commit.
  • When you git add, Git:
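    1. Writes the file’s content to .git/objects/ as a blob (if that content isn’t stored already).
    2. Updates the file’s index entry to record that blob’s hash and the file mode.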

Key Insight:

git commit takes the index and turns it into a tree, then creates a commit pointing to that tree.
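
Here is a simplified Python model of that flow, assuming the index is just a flat mapping from path to (mode, blob ID) and the object database is a dictionary. The names stage and write_tree are borrowed loosely from git add and git write-tree but are this sketch's own functions; the real index also stores file stat data, which is ignored here.

```python
import hashlib

def object_id(obj_type, body):
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def stage(index, objects, path, content, mode="100644"):
    """Model of `git add`: store the blob, then point the index entry at it."""
    blob_id = object_id("blob", content)
    objects[blob_id] = content
    index[path] = (mode, blob_id)

def write_tree(index, objects, prefix=""):
    """Turn the flat index into nested tree objects; return the root tree ID."""
    entries, subdirs = {}, set()
    for path, (mode, blob_id) in index.items():
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if "/" in rest:
            subdirs.add(rest.split("/", 1)[0])   # handled recursively below
        else:
            entries[rest] = (mode, blob_id)
    for name in subdirs:
        entries[name] = ("40000", write_tree(index, objects, prefix + name + "/"))

    def sort_key(name):   # directories sort as if their name ended with "/"
        return (name + "/" if entries[name][0] == "40000" else name).encode()

    body = b"".join(
        entries[n][0].encode() + b" " + n.encode() + b"\0" + bytes.fromhex(entries[n][1])
        for n in sorted(entries, key=sort_key)
    )
    tree_id = object_id("tree", body)
    objects[tree_id] = body
    return tree_id

index, objects = {}, {}
stage(index, objects, "hello.txt", b"Hello, Git!\n")
stage(index, objects, "subdir/notes.txt", b"Some notes\n")
root = write_tree(index, objects)
print(root, len(objects))   # one root tree ID; 2 blobs + 2 trees stored
```

The index is flat, but the result is one tree object per directory, and a commit then only needs to record the root tree's ID.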


5. How Common Commands Work Internally

5.1 git commit

  1. Takes the index and creates a tree.
  2. Creates a commit with:
    • The tree (current state).
    • Parent commit (current HEAD).
    • Author/message metadata.
  3. Updates the branch reference (e.g., main) to point to the new commit.

5.2 git merge

  1. Finds the common ancestor (merge base).
  2. Performs a three-way merge:
    • Base vs. HEAD (your changes).
    • Base vs. other-branch (their changes).
  3. Creates a merge commit (unless it’s a fast-forward).
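
The steps above can be sketched in Python at whole-file granularity. This is a deliberate simplification: real Git merges line by line inside each file and has extra handling for histories with several merge bases. The function names and the toy commit graph are invented for the example.

```python
def ancestors(commit, parents):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen

def merge_base(a, b, parents):
    """Latest common ancestor of a and b (assumes there is exactly one)."""
    common = ancestors(a, parents) & ancestors(b, parents)
    best = [c for c in common
            if not any(c != d and c in ancestors(d, parents) for d in common)]
    return sorted(best)[0] if best else None

def is_fast_forward(ours, theirs, parents):
    """True if `theirs` already contains `ours`, so no merge commit is needed."""
    return ours in ancestors(theirs, parents)

def three_way(base, ours, theirs):
    """Merge {path: content} snapshots; return (merged, conflicting_paths)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (including both deleting the file)
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both changed it differently
            conflicts.append(path)
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts

parents = {"B": ["A"], "C": ["B"], "D": ["B"]}    # C and D both branch from B
print(merge_base("C", "D", parents))              # → B
base = {"a": "1", "b": "1", "c": "1"}
ours = {"a": "2", "b": "1", "c": "2"}
theirs = {"a": "1", "b": "3", "c": "4"}
print(three_way(base, ours, theirs))              # → ({'a': '2', 'b': '3'}, ['c'])
```

The key point is that the base decides who "changed" a file: a side that still matches the base made no change, so the other side wins without a conflict.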

5.3 git rebase

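  1. Finds the commits on your branch that are not in the target branch.
  2. Replays them one by one on top of the target, creating new commits with new hashes.
  3. Points the branch at the last replayed commit. The original commits are left unreferenced rather than modified.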
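
A toy Python model shows why rebased commits get new IDs. It assumes linear history and represents each commit as a parent plus a text label for its change; the names make_commit, history, and rebase are this sketch's own, and real Git replays actual diffs that may conflict.

```python
import hashlib

commits = {}   # id -> (parent id or None, description of the change)

def make_commit(parent, change):
    """Create a commit whose ID depends on its parent and its change."""
    cid = hashlib.sha1(f"{parent}\n{change}".encode()).hexdigest()
    commits[cid] = (parent, change)
    return cid

def history(tip):
    """Commit IDs from tip back to the root (linear history only)."""
    out = []
    while tip is not None:
        out.append(tip)
        tip = commits[tip][0]
    return out

def rebase(branch_tip, target_tip):
    """Replay commits unique to branch_tip on top of target_tip; return the new tip."""
    on_target = set(history(target_tip))
    to_replay = [c for c in history(branch_tip) if c not in on_target]
    new_tip = target_tip
    for old in reversed(to_replay):                       # oldest first
        new_tip = make_commit(new_tip, commits[old][1])   # same change, new parent, new ID
    return new_tip

a = make_commit(None, "A")
b = make_commit(a, "B")
c = make_commit(b, "C")          # main:    A ← B ← C
x = make_commit(a, "X")
y = make_commit(x, "Y")          # feature: A ← X ← Y
new_tip = rebase(y, c)
print([commits[i][1] for i in history(new_tip)])   # → ['Y', 'X', 'C', 'B', 'A']
print(new_tip != y, x in commits, y in commits)    # → True True True
```

The old X and Y are still in the store after the rebase; only the branch pointer has moved to their replayed copies.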

6. Efficient Storage: Packfiles and Delta Compression

  • Initially, Git stores objects loosely (one file per object).
  • Over time, Git compresses them into packfiles (.git/objects/pack/).
  • Uses delta compression (storing only changes between similar objects).

Example:

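If two versions of a file differ slightly, a packfile can store one version in full and the other as a delta against it, rather than two full copies. (Loose objects are always stored whole, only zlib-compressed.)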
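
The idea behind a delta can be sketched with "copy from base" and "insert literal bytes" instructions, which are the two kinds of instruction a pack delta is built from. The Python below is only an illustration: it uses the standard library's difflib to find matching regions, which is not the algorithm Git uses, and the tuple encoding is invented for the example.

```python
from difflib import SequenceMatcher

def make_delta(base: bytes, target: bytes):
    """Encode target as copy/insert instructions against base."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2 - i1))        # reuse bytes already in base
        elif tag in ("replace", "insert"):
            delta.append(("insert", target[j1:j2]))    # literal new bytes
        # "delete" needs no instruction: those base bytes are just never copied
    return delta

def apply_delta(base: bytes, delta) -> bytes:
    """Rebuild the target from the base and the instructions."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

base = "".join(f"line {i}\n" for i in range(20)).encode()
target = base.replace(b"line 7\n", b"line seven\n")
delta = make_delta(base, target)
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(apply_delta(base, delta) == target)   # → True
print(literal, "literal bytes instead of", len(target))
```

Only the inserted bytes need to be stored alongside a handful of small copy instructions, which is where the space saving comes from.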


7. Recovering Lost Commits

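Git does not delete objects the moment they become unreachable. They stay in the object database until garbage collection (git gc) eventually prunes them, and the reflog usually still records their hashes. Until then, you can recover "lost" commits using: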

# Find dangling commits
$ git fsck --lost-found

# View a commit
$ git show abc123

# Recover it by creating a branch
$ git branch recovered abc123
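
Conceptually, finding dangling commits is a reachability walk. The Python sketch below models it over a toy commit graph; the function names and single-letter IDs are invented, and it ignores the reflog and the index, which real git fsck also treats as starting points by default.

```python
def reachable(refs, parents):
    """Every commit reachable from any ref by following parent links."""
    seen, stack = set(), list(refs.values())
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def dangling_commits(refs, parents):
    """Unreachable commits that no other unreachable commit points to."""
    lost = set(parents) - reachable(refs, parents)
    pointed_to = {p for c in lost for p in parents[c]}
    return sorted(lost - pointed_to)

# A ← B ← C (main), plus X ← Y left behind on a deleted branch off B.
parents = {"A": [], "B": ["A"], "C": ["B"], "X": ["B"], "Y": ["X"]}
refs = {"main": "C"}
print(dangling_commits(refs, parents))   # → ['Y']

refs["recovered"] = "Y"                  # like `git branch recovered Y`
print(dangling_commits(refs, parents))   # → []
```

Only Y is reported because X is still referenced by Y; pointing a new branch at Y makes both reachable again.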

Conclusion: Why This Matters

Understanding Git’s internals helps you:

  • Debug issues (merge conflicts, detached HEAD, lost commits).
  • Optimize workflows (e.g., when to rebase vs. merge).
  • Write custom Git tools (e.g., scripting with git cat-file).
  • Appreciate Git’s design (simple but powerful).
