Most developers use Git every day—committing, branching, merging—but few truly understand what happens under the hood. Git’s design is elegant yet often misunderstood. This article will peel back the layers, explaining Git’s core mechanisms, storage model, and how everyday commands work at a fundamental level.
By the end, you'll understand:
- How Git stores data (blobs, trees, commits, and tags)
- The role of the index (staging area) and HEAD
- How branches and tags really work (they’re just pointers!)
- What happens when you run `git commit`, `git merge`, or `git rebase`
- How Git efficiently stores history (packfiles and delta compression)
- How to recover lost commits using low-level Git commands
1. Git’s Data Model: Blobs, Trees, Commits, and Tags
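Git is fundamentally a content-addressable filesystem: every piece of data is stored as an object whose name is a hash of its contents (SHA-1 by default, or SHA-256 in repositories created with that object format). Identical content always gets the same name. There are four core object types: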
1.1 Blobs (Binary Large Objects)
- Store raw file contents (text, binary, etc.)
- Key property: A blob doesn't store the filename, just the content. The filename lives in the tree that points to the blob.
- Example: A file `hello.txt` with the text `Hello, Git!` is stored as a blob.
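The following bash sketch shows that a blob ID depends only on content. It assumes Git, GNU coreutils' `sha1sum`, and a repository using the default SHA-1 format. The file `copy-with-other-name.txt` is a hypothetical name. The script hashes the header `blob <size>`, a NUL byte, and the file bytes by hand, then compares the result with `git hash-object`.

```shell
#!/usr/bin/env bash
# Sketch: reproduce a blob ID by hand and compare it with Git's.
# Assumes bash, Git (SHA-1 repo), and GNU coreutils' sha1sum (macOS: use shasum).
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q

echo 'Hello, Git!' > hello.txt          # 12 bytes including the trailing newline
cp hello.txt copy-with-other-name.txt   # hypothetical file: same content, other name

size=$(wc -c < hello.txt | tr -d ' ')
# Blob ID = SHA-1 of "blob <size>" + NUL byte + raw file bytes
manual=$( { printf 'blob %s\0' "$size"; cat hello.txt; } | sha1sum | cut -d' ' -f1 )
by_git=$(git hash-object hello.txt)
other=$(git hash-object copy-with-other-name.txt)

echo "manual:  $manual"
echo "git:     $by_git"
echo "renamed: $other"   # same ID: the filename is not part of the blob
```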
1.2 Trees (Directory Snapshots)
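- Represent directories. Each entry maps a filename to a blob (a file) or another tree (a subdirectory) and records its mode, for example `100644` for a regular file or `040000` for a directory.
- Example:
```text
tree 123abc
├── 100644 blob abc123 hello.txt
└── 040000 tree def456 subdir
```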
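You can reproduce a tree like this one with plumbing commands. This sketch assumes bash and Git. `subdir/notes.txt` is a hypothetical file, added only because Git does not track empty directories. `git write-tree` turns the index into tree objects and prints the root tree's ID.

```shell
#!/usr/bin/env bash
# Sketch: build a small tree and inspect its entries.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
echo 'Hello, Git!' > hello.txt
mkdir subdir
echo 'nested' > subdir/notes.txt   # hypothetical file so subdir is not empty
git add hello.txt subdir

tree=$(git write-tree)     # writes the index out as tree objects, prints the root tree ID
git cat-file -t "$tree"    # prints "tree"
git cat-file -p "$tree"    # one line per entry: <mode> <type> <id><TAB><name>
```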
1.3 Commits (Project Snapshots)
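- Point to a tree (the snapshot of the whole project at that commit).
- Contain metadata: the author and the committer (each with a timestamp), the commit message, and zero or more parent commits. A root commit has no parent, and a merge commit has two or more.
- Example: commit `789xyz` (IDs abbreviated) as `git cat-file -p` prints it:
```text
tree 123abc
parent 456def
author Alice <alice@example.com> <unix-time> <tz-offset>
committer Bob <bob@example.com> <unix-time> <tz-offset>

Add hello.txt
```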
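This sketch makes two commits and prints the second one raw, so you can see the real layout: header lines, a blank line, then the message. It assumes bash and Git, and the identity reuses the article's placeholder `Alice <alice@example.com>`. The first commit has no `parent` line because it is a root commit.

```shell
#!/usr/bin/env bash
# Sketch: inspect the raw commit object Git writes.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.name "Alice"                # placeholder identity
git config user.email "alice@example.com"

echo 'Hello, Git!' > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"            # root commit: no parent
echo 'More text' >> hello.txt
git commit -q -am "Extend hello.txt"        # parent = the first commit

# tree, parent, author and committer lines (Unix time + timezone offset),
# a blank line, then the message
git cat-file -p HEAD
```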
1.4 Tags (Named References to Commits)
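- A permanent label for a specific commit (e.g., `v1.0.0`).
- An annotated tag (`git tag -a`) is a real object. It stores the tagger, date, message, and the ID of the object it labels. A lightweight tag is just a ref under `refs/tags/` with no object behind it.
- Unlike branches, tags don't move when you commit. To re-point one you have to force it deliberately with `git tag -f`.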
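To see the difference between the two kinds of tag, this sketch creates one of each and asks Git what type of object each ref points to. It assumes bash and Git, and `v1.0.0-light` is a hypothetical tag name.

```shell
#!/usr/bin/env bash
# Sketch: lightweight tag vs. annotated tag.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.name "Alice"
git config user.email "alice@example.com"
echo 'Hello, Git!' > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"

git tag v1.0.0-light                     # lightweight: a ref only, no new object
git tag -a v1.0.0 -m "Release 1.0.0"     # annotated: writes a tag object

git cat-file -t v1.0.0-light     # "commit": the ref points straight at the commit
git cat-file -t v1.0.0           # "tag": the ref points at a tag object
git cat-file -p v1.0.0           # object, type, tag, tagger, blank line, message
git rev-parse 'v1.0.0^{commit}'  # peel the tag to the commit it labels
```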
2. How Git Stores Objects: The `.git` Directory
Git’s entire database lives in `.git/`:
```text
.git/
├── objects/        # All objects (blobs, trees, commits, annotated tags)
│   ├── 12/345abc   # Blob
│   ├── ab/cdef12   # Tree
│   └── 34/567def   # Commit
├── refs/           # References (branches, tags)
│   ├── heads/      # Branches (e.g., main, feature)
│   └── tags/       # Tags (e.g., v1.0.0)
├── HEAD            # Points to the current branch (or a commit, when detached)
└── index           # Staging area (the next commit)
```
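Key Insight: Everything is Content-Addressed
- Each object is named by the SHA-1 hash of a short header (its type and size) followed by its contents (e.g., `abc123...`).
- The object is stored zlib-compressed under `.git/objects/`. The first two characters of the hash name the directory (`ab/`), and the rest name the file (`c123...`).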
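Example:
```shell
# Compute the blob hash and write the object into .git/objects (-w);
# without -w, hash-object only prints the hash
$ git hash-object -w hello.txt
ab1234...
# View the object's content
$ git cat-file -p ab1234
Hello, Git!
```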
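Here is a slightly fuller version of that example as a bash sketch, assuming a default SHA-1 repository. It writes the object, builds its on-disk path from the hash with bash substring expansion, and checks that the file exists.

```shell
#!/usr/bin/env bash
# Sketch: locate a freshly written object on disk.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
echo 'Hello, Git!' > hello.txt

id=$(git hash-object -w hello.txt)       # -w actually stores the object
path=".git/objects/${id:0:2}/${id:2}"    # first 2 hex chars = directory, rest = filename
ls -l "$path"                            # zlib-compressed, so not human-readable
git cat-file -t "$id"                    # prints "blob"
git cat-file -s "$id"                    # content size in bytes (12 here)
```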
3. Branches and HEAD: Just Pointers!
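- A branch (e.g., `main`) is just a reference to a commit. It is a small file under `.git/refs/heads/` (or an entry in `.git/packed-refs`) that contains a commit ID.
- `HEAD` points to the current branch (normally it contains `ref: refs/heads/main`). In "detached HEAD" mode it points directly to a commit.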
```text
          main
            ↓
A ← B ← C ← D
```
- If you commit on `main`, Git:
  - Creates a new commit (`E`) pointing to `D`.
  - Moves `main` forward to `E`.
```text
              main
                ↓
A ← B ← C ← D ← E
```
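You can observe this pointer behavior directly. The sketch assumes bash, Git 2.28 or later (for `git init -b`), and the default file-based ref storage, in which a freshly created branch is a loose file rather than an entry in `packed-refs`. Empty commits named `D` and `E` stand in for the diagram's commits.

```shell
#!/usr/bin/env bash
# Sketch: a branch is a file holding a commit ID; HEAD names the branch.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main                 # -b needs Git 2.28+
git config user.name "Alice"
git config user.email "alice@example.com"

git commit -q --allow-empty -m "D"
cat .git/HEAD                       # "ref: refs/heads/main"
before=$(cat .git/refs/heads/main)  # the commit ID main points to (D)

git commit -q --allow-empty -m "E"
after=$(cat .git/refs/heads/main)   # main has moved to E
git rev-parse HEAD~1                # E's parent is the old tip, D
echo "main: $before -> $after"
```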
Detached HEAD Mode
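- If you check out a commit directly (`git checkout abc123`), `HEAD` contains that commit's ID instead of a branch name.
- New commits in this state advance only `HEAD`. Once you switch away, only the reflog still references them, so they are eventually garbage-collected unless you create a branch (or tag) that points at them.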
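This sketch detaches `HEAD`, makes a commit, and then creates a branch so the new commit stays reachable after switching back to `main`. It assumes bash and Git 2.28 or later, and the branch name `experiment` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: commit on a detached HEAD, then keep the work with a branch.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git config user.name "Alice"
git config user.email "alice@example.com"
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

git checkout -q HEAD~1                       # detach: HEAD now holds a commit ID
cat .git/HEAD                                # raw commit ID, not "ref: ..."
git symbolic-ref -q HEAD || echo "HEAD is detached"

git commit -q --allow-empty -m "experiment"  # only HEAD moves; no branch points here
git branch experiment                        # hypothetical name; keeps the commit reachable
git checkout -q main
git log --oneline -1 experiment
```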
4. The Index (Staging Area): Where Commits Are Prepared
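- The index (`.git/index`) is a binary file listing every path that will go into the next commit, along with each file's mode and blob ID.
- When you run `git add`, Git:
  - Writes a blob with the file's current content into `.git/objects/`.
  - Updates the index entry for that path to point to the new blob.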
Key Insight: `git commit` takes the index and turns it into a tree, then creates a commit pointing to that tree.
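You can watch the index with `git ls-files --stage`, which prints one entry per tracked path. This sketch assumes bash and Git. It shows three things: `git add` has already written the blob, later edits to the working copy do not change the index until you add the file again, and `git write-tree` captures whatever the index holds at that point.

```shell
#!/usr/bin/env bash
# Sketch: watch `git add` write a blob and record it in the index.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
echo 'Hello, Git!' > hello.txt
git add hello.txt

git ls-files --stage                 # <mode> <blob id> <stage><TAB><path>
staged=$(git ls-files --stage hello.txt | cut -d' ' -f2)
git cat-file -p "$staged"            # blob already exists: prints "Hello, Git!"

echo 'Edited' > hello.txt            # change the working copy without re-adding
git ls-files --stage hello.txt       # index still points at the old blob
tree=$(git write-tree)               # what a commit would snapshot right now
git ls-tree "$tree"
```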
5. How Common Commands Work Internally
5.1 git commit
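- Writes the index out as a tree (plus one subtree per directory).
- Creates a commit with:
  - The tree (the new snapshot).
  - The parent commit (the current `HEAD`).
  - Author, committer, and message metadata.
- Updates the branch that `HEAD` points to (e.g., `main`) so it names the new commit. In detached HEAD mode, Git updates `HEAD` itself.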
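The same three steps can be approximated with plumbing commands. This is a simplified sketch assuming bash and Git 2.28 or later. The real `git commit` also runs hooks and handles the first-commit and detached cases, none of which are shown here.

```shell
#!/usr/bin/env bash
# Sketch: `git commit` rebuilt from plumbing (simplified).
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git config user.name "Alice"
git config user.email "alice@example.com"
echo 'Hello, Git!' > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt"
first=$(git rev-parse HEAD)

echo 'More text' >> hello.txt
git add hello.txt
tree=$(git write-tree)                                       # 1. index -> tree
commit=$(git commit-tree -p HEAD -m "Extend hello.txt" "$tree")  # 2. tree + parent + metadata -> commit
git update-ref refs/heads/main "$commit"                     # 3. move the branch HEAD points to
git log --oneline
```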
5.2 git merge
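- Finds the common ancestor of the two branches (the merge base).
- Performs a three-way merge:
  - Base vs. `HEAD` (your changes).
  - Base vs. `other-branch` (their changes).
  - A change made on only one side is taken as-is. Overlapping changes to the same lines on both sides become a conflict for you to resolve.
- Creates a merge commit with two parents. The exception is a fast-forward: if `HEAD` is itself the merge base, Git just moves the branch pointer to the other branch's tip and creates no new commit.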
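Here is a small end-to-end sketch of a non-fast-forward merge. It assumes bash and Git 2.28 or later, and `letters.txt` and the branch name `feature` are hypothetical. Each branch changes a different line of the same file, so the three-way merge combines both edits without a conflict, and the result has two parents.

```shell
#!/usr/bin/env bash
# Sketch: merge base + three-way merge + two-parent merge commit.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git config user.name "Alice"
git config user.email "alice@example.com"
printf 'a\nb\nc\nd\ne\n' > letters.txt         # hypothetical demo file
git add letters.txt
git commit -q -m "base"
base=$(git rev-parse HEAD)

git checkout -q -b feature
sed 's/^e$/E/' letters.txt > tmp && mv tmp letters.txt   # their change: last line
git commit -q -am "feature: capitalize e"

git checkout -q main
sed 's/^a$/A/' letters.txt > tmp && mv tmp letters.txt   # our change: first line
git commit -q -am "main: capitalize a"

mb=$(git merge-base main feature)      # common ancestor: the "base" commit
echo "merge base: $mb"
git merge -q --no-edit feature         # three-way merge, creates a merge commit
git rev-list --parents -n 1 HEAD       # merge commit ID followed by its two parents
cat letters.txt                        # both edits present
```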
5.3 git rebase
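- Finds the commits on the current branch that are not reachable from the target branch.
- Replays them one by one on top of the target. Each replayed commit is a new commit with a new ID, because its parent changed.
- Moves the current branch pointer to the last replayed commit. The original commits are not modified; the branch simply stops referencing them.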
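This sketch shows that a rebased commit gets a new ID while the original object still exists in the repository. It assumes bash and Git 2.28 or later, and the branch and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: rebase replays commits as new commits with new IDs.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git config user.name "Alice"
git config user.email "alice@example.com"
git commit -q --allow-empty -m "base"

git checkout -q -b feature
echo 'feature work' > feature.txt
git add feature.txt
git commit -q -m "feature commit"
old=$(git rev-parse feature)

git checkout -q main
echo 'main work' > main.txt
git add main.txt
git commit -q -m "main commit"

git checkout -q feature
git rev-list main..feature      # commits to replay: just "feature commit"
git rebase -q main              # replay them on top of main
new=$(git rev-parse feature)
echo "old: $old"
echo "new: $new"                # different ID: new parent means a new commit
git rev-parse feature~1         # the replayed commit's parent is main's tip
```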
6. Efficient Storage: Packfiles and Delta Compression
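- Initially, Git stores objects loosely: one zlib-compressed file per object.
- Over time, Git packs them into packfiles in `.git/objects/pack/`. This happens when you run `git gc`, or when Git runs `git gc --auto` on its own.
- Inside a packfile, Git uses delta compression: an object can be stored as a set of differences against a similar object.
Example:
If two versions of a file differ slightly, the packfile can store one of them in full and the other as a small delta, instead of two full copies.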
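The following sketch commits two nearly identical versions of a file, runs `git gc`, and lists the pack contents with `git verify-pack -v`. It assumes bash, Git, and `seq`, and `numbers.txt` is a hypothetical file. Git decides on its own which objects to store as deltas, so treat the verify-pack listing as something to inspect rather than a guaranteed result. Deltified entries show a depth and a base object ID.

```shell
#!/usr/bin/env bash
# Sketch: loose objects -> packfile, then inspect the pack.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.name "Alice"
git config user.email "alice@example.com"

seq 1 5000 > numbers.txt              # hypothetical file, big enough to be worth deltifying
git add numbers.txt
git commit -q -m "v1"
sed 's/^2500$/two thousand five hundred/' numbers.txt > tmp && mv tmp numbers.txt
git commit -q -am "v2: change one line"

git count-objects -v                  # "count" = loose objects; nothing packed yet
git gc -q                             # pack everything into .git/objects/pack/
git count-objects -v                  # objects now counted under "in-pack"
git verify-pack -v .git/objects/pack/pack-*.idx | head -n 20
```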
7. Recovering Lost Commits
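Git doesn't delete objects as soon as nothing references them. Unreachable objects remain in `.git/objects/` until `git gc` prunes them, which by default happens only once they are more than two weeks old. The reflog also keeps recent branch and `HEAD` positions reachable, for 30 to 90 days by default. Until then, you can recover "lost" commits using: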
```shell
# Find dangling commits (and record them under .git/lost-found/)
$ git fsck --lost-found
# View a commit
$ git show abc123
# Recover it by creating a branch
$ git branch recovered abc123
```
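A common way to lose a commit is `git reset --hard`. `git fsck` treats commits referenced from the reflog as reachable by default, so a commit dropped this way usually shows up in `git reflog` rather than as dangling; `git fsck --no-reflogs` changes that behavior. This sketch recovers such a commit through the reflog and assumes bash and Git 2.28 or later.

```shell
#!/usr/bin/env bash
# Sketch: "lose" a commit with reset --hard, then recover it from the reflog.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q -b main
git config user.name "Alice"
git config user.email "alice@example.com"
git commit -q --allow-empty -m "keep me"
git commit -q --allow-empty -m "important work"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1          # main no longer references "important work"
git reflog show -n 3                # HEAD's reflog still lists it
found=$(git rev-parse 'HEAD@{1}')   # where HEAD was one move ago
git branch recovered "$found"       # same idea as `git branch recovered abc123`
git log --oneline -1 recovered
```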
Conclusion: Why This Matters
Understanding Git’s internals helps you:
- Debug issues (merge conflicts, detached HEAD, lost commits).
- Optimize workflows (e.g., when to rebase vs. merge).
- Write custom Git tools (e.g., scripting with `git cat-file`).
- Appreciate Git’s design (simple but powerful).