Introduction
Git, a powerful distributed version control system, manages project history through snapshots called commits. Operations like merging and rebasing allow developers to combine or reorganize these snapshots to suit collaboration and maintenance needs. This article explores the internal mechanics of Git's commits, merges, and rebases, diving into the object model (commits, trees, and blobs), the three-way merge algorithm, and how rebasing rewrites history. By understanding these concepts, developers can master Git workflows, resolve conflicts, and maintain clean project histories.
What is a Commit in Git?
A commit in Git represents a snapshot of your project's state at a specific point in time. Stored as an immutable commit object, it’s identified by a unique SHA-1 hash (or SHA-256 in newer configurations). A commit object contains:
- A pointer to a tree object, capturing the directory structure and file contents.
- Pointers to zero or more parent commits, forming a directed acyclic graph (DAG): a root commit has none, an ordinary commit has one, and a merge commit has two or more.
- Metadata: author name, email, timestamp, and committer information.
- A commit message describing the changes.
For example:
commit 3f1d2ab273...
tree ab2e3f1c0b...
parent 7ba3f1c0b2...
author Jane Doe <jane@example.com> 1624392390 +0100
committer Jane Doe <jane@example.com> 1624392390 +0100

Add feature X
This immutability ensures the integrity of your project’s history, making Git reliable for collaboration and auditing.
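To make the link between content and identity concrete, here is a minimal Python sketch of how a commit's ID can be derived from its fields. It assumes the standard loose-object scheme (SHA-1 over a `commit <size>\0` header followed by the body); the function name `commit_id` and all field values are hypothetical and chosen only for illustration.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Return the SHA-1 Git would assign to a commit with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    # Git hashes "<type> <size>\0" followed by the object body.
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical field values, for illustration only.
WHO = "Jane Doe <jane@example.com> 1624392390 +0100"
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # the well-known empty tree

first = commit_id(TREE, ["a" * 40], WHO, WHO, "Add feature X")
second = commit_id(TREE, ["b" * 40], WHO, WHO, "Add feature X")
print(first != second)  # True: a different parent yields a different commit id
```

Because the parent IDs are part of the hashed content, changing any ancestor changes every descendant's ID, which is why history cannot be altered silently.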
Understanding Git’s Object Model
Git’s efficiency stems from its object model, consisting of three primary types: commits, trees, and blobs.
Commit Object
The commit object ties together the project’s state and history. It references a tree object, parent commits, and metadata, and is stored in the .git/objects directory, identified by its SHA-1 hash.
Tree Object
A tree object represents a directory snapshot, containing:
- References to blob objects (files).
- References to other tree objects (subdirectories).
- Metadata, such as filenames and permissions (e.g., 100644 for regular files, 100755 for executables).
For example:
100644 blob a789c3d... README.md
100644 blob b4e42f1... index.js
040000 tree b3db2a6... src
Trees allow Git to reconstruct the project’s file structure at any commit.
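As a rough illustration of how a tree is stored, the Python sketch below serializes a list of entries and hashes the result. It assumes the usual binary layout (`<mode> <name>\0` followed by the 20 raw bytes of the entry's SHA-1) and that directories are stored with mode `40000` even though listings display `040000`. The function name `tree_id` and the sample entries are hypothetical.

```python
import hashlib


def tree_id(entries):
    """entries: list of (mode, name, hex_sha1) tuples. Returns the tree's SHA-1."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, treating directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, sha in sorted(entries, key=sort_key):
        body += f"{mode} {name}\0".encode() + bytes.fromhex(sha)
    header = f"tree {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical entries; the 40-character ids are placeholders, not real objects.
entries = [
    ("100644", "README.md", "a" * 40),
    ("100644", "index.js", "b" * 40),
    ("40000", "src", "c" * 40),
]
print(tree_id(entries))
print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904 (the empty tree)
```

Since a tree's ID depends on the IDs of everything beneath it, two commits whose directories are identical share the same tree object.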
Blob Object
A blob (binary large object) stores a file’s raw content, excluding metadata like filenames. Identical file contents share the same blob, enabling deduplication. Blobs are stored in .git/objects, identified by their SHA-1 hash.
This object model ensures efficient storage, deduplication, and fast retrieval, making Git scalable for large projects.
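Deduplication follows directly from how blob IDs are computed. The short Python sketch below reproduces the calculation, assuming the standard `blob <size>\0` header; `blob_id` is a hypothetical helper name.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 Git assigns to a blob holding exactly these bytes."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)

# The filename is not part of the hash, so two files with the same
# bytes map to one blob no matter where they live in the project.
readme_copy_1 = blob_id(b"same bytes\n")
readme_copy_2 = blob_id(b"same bytes\n")
print(readme_copy_1 == readme_copy_2)  # True
```

The result should match what `git hash-object` reports for a file with the same content.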
Merging: Combining Branches
Merging combines changes from two branches, preserving their history. Git typically uses the three-way merge algorithm.
Three-Way Merge
When merging a feature branch into main, Git:
- Identifies the merge base (common ancestor commit).
- Computes differences from the merge base to the heads of both branches.
- Combines these differences into a new snapshot.
- Creates a merge commit with two parents, referencing both branch heads.
For example, consider this DAG:
A --- B --- C (main)
       \
        D --- E (feature)
Running:
git checkout main
git merge feature
Results in:
A --- B --- C --- M (main)
       \        /
        D --- E (feature)
Here, M is the merge commit, with parents C and E, and B is the merge base. If conflicts arise, Git pauses, marks conflicting files, and requires manual resolution before committing.
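Finding the merge base is a graph problem, and a small Python model of the example DAG shows the idea. This is a simplified sketch, not Git's actual implementation: the `PARENTS` mapping and the function names are hypothetical, and when several equally good bases exist (as in criss-cross histories) it simply returns the first one it finds.

```python
from collections import deque

# Hypothetical model of the example graph: each commit maps to its parents.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable through parent links."""
    seen, queue = set(), deque([commit])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def merge_base(a, b, parents):
    """Return a common ancestor that is not an ancestor of another common ancestor."""
    common = ancestors(a, parents) & ancestors(b, parents)
    for candidate in sorted(common):
        others = common - {candidate}
        if not any(candidate in ancestors(o, parents) for o in others):
            return candidate
    return None


print(merge_base("C", "E", PARENTS))  # B
```

In a real repository, `git merge-base main feature` answers the same question.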
Fast-Forward Merge
If main has no unique commits since the merge base (in the example above, if C did not exist and main still pointed at B), Git performs a fast-forward merge by default, moving the main pointer to the feature head without creating a merge commit:
A --- B --- D --- E (main, feature)
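Whether a fast-forward is possible comes down to an ancestry check: the branch being updated must be an ancestor of the branch being merged in. The Python sketch below models that decision under the same toy representation of a commit graph; the graphs and function names are hypothetical.

```python
def is_ancestor(maybe_ancestor, commit, parents):
    """True if maybe_ancestor is reachable from commit (or is commit itself)."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def merge_kind(target_tip, source_tip, parents):
    """Classify what merging source_tip into target_tip would require."""
    if is_ancestor(source_tip, target_tip, parents):
        return "already up to date"
    if is_ancestor(target_tip, source_tip, parents):
        return "fast-forward"
    return "three-way merge"


# Hypothetical graphs: a linear history and the diverged history from earlier.
LINEAR = {"A": [], "B": ["A"], "D": ["B"], "E": ["D"]}
DIVERGED = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}

print(merge_kind("B", "E", LINEAR))    # fast-forward
print(merge_kind("C", "E", DIVERGED))  # three-way merge
```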
Rebasing: Rewriting History
Rebasing rewrites history by replaying commits from one branch onto another, creating a linear history without merge commits. For example:
git checkout feature
git rebase main
Given:
A --- B --- C (main)
       \
        D --- E (feature)
Git:
- Identifies the common ancestor (B).
- Creates patches for each commit in feature (D and E).
- Moves the feature branch to the tip of main (C).
- Applies the patches, creating new commits (D' and E') with new hashes.
- Updates feature to point to E':
A --- B --- C --- D' --- E' (feature)
The original D and E commits are not modified; they become unreachable unless referenced elsewhere, and main still points to C.
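The reason rebased commits get new hashes can be shown with a toy model in Python. This sketch assumes a commit's ID depends only on its parent's ID and its change, which is a deliberate simplification of the real commit format, and it ignores the conflicts a real rebase may hit. The names `toy_commit_id` and `replay` are hypothetical.

```python
import hashlib


def toy_commit_id(parent_id, patch):
    """Toy stand-in for a commit hash: depends on the parent id and the change."""
    return hashlib.sha1(f"{parent_id}\n{patch}".encode()).hexdigest()[:7]


def replay(patches, new_base):
    """Apply patches in order on top of new_base; return the new commit ids."""
    tip, new_ids = new_base, []
    for patch in patches:
        tip = toy_commit_id(tip, patch)
        new_ids.append(tip)
    return new_ids


b = toy_commit_id("A", "change B")
c = toy_commit_id(b, "change C")
patches = ["change D", "change E"]

original = replay(patches, b)  # D and E as first written on top of B
rebased = replay(patches, c)   # D' and E' replayed on top of C
print(original, rebased)
```

The patches are identical in both runs, yet every ID differs because each one incorporates its parent's ID.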
Visualizing Rebase
Before rebase:
A --- B --- C (main)
       \
        D --- E (feature)
After rebase:
A --- B --- C --- D' --- E' (feature)
This linear history is cleaner but rewrites commit hashes, which can complicate collaboration if the branch is already shared.
Three-Way Merge in Detail
The three-way merge algorithm involves these steps:
- Identify the Merge Base: Find the common ancestor (e.g., B).
- Compute Differences:
  - diff(B → C) for main.
  - diff(B → E) for feature.
- Combine Changes: Merge differences into a new snapshot, resolving conflicts if needed.
- Create Merge Commit: Generate a new commit (M) with two parents (C and E) and a tree reflecting the combined state.
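The "combine changes" step can be sketched at the granularity of whole files. The Python function below is a simplified model, not Git's implementation: it treats each snapshot as a hypothetical `{path: content}` dictionary and reports a conflict whenever both sides changed the same file differently, whereas Git goes further and tries to merge such files hunk by hunk.

```python
def three_way_merge(base, ours, theirs):
    """Merge three {path: content} snapshots; return (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (including both deleting the file)
        elif o == b:
            result = t  # only their side changed it
        elif t == b:
            result = o  # only our side changed it
        else:
            conflicts.append(path)  # both changed it, in different ways
            continue
        if result is not None:  # None means the file is absent or deleted
            merged[path] = result
    return merged, conflicts


# Hypothetical snapshots for merge base B, main (C), and feature (E).
base = {"README.md": "v1", "index.js": "v1"}
main = {"README.md": "v2", "index.js": "v1"}
feature = {"README.md": "v1", "index.js": "v2", "new.txt": "x"}

print(three_way_merge(base, main, feature))
# ({'README.md': 'v2', 'index.js': 'v2', 'new.txt': 'x'}, [])
```

The merge base is what lets the algorithm tell "changed on one side" apart from "changed on both sides"; comparing only the two branch heads could not make that distinction.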
Practical Commands for Exploration
To dive into Git’s internals, try these commands:
- View a commit object:
git cat-file -p HEAD
- View a tree object:
git cat-file -p <tree-hash>
- Visualize the commit graph:
git log --oneline --graph --all
- List all objects:
git rev-list --objects --all
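For readers curious what git cat-file is reading, the Python sketch below writes and then reads back a loose object by hand. It assumes the loose-object layout (zlib-compressed `<type> <size>\0<content>` stored under `objects/<first two hex digits>/<remaining 38>`) and does not handle objects that Git has packed. The helper names and the temporary directory standing in for `.git` are hypothetical.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir, kind, content):
    """Store content as a loose object of the given type; return its id."""
    raw = f"{kind} {len(content)}\0".encode() + content
    object_id = hashlib.sha1(raw).hexdigest()
    folder = os.path.join(git_dir, "objects", object_id[:2])
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, object_id[2:]), "wb") as f:
        f.write(zlib.compress(raw))
    return object_id


def read_loose_object(git_dir, object_id):
    """Return (type, content) for a loose object, checking the recorded size."""
    path = os.path.join(git_dir, "objects", object_id[:2], object_id[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\0")
    kind, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return kind, content


# A temporary directory stands in for a repository's .git directory.
with tempfile.TemporaryDirectory() as git_dir:
    oid = write_loose_object(git_dir, "blob", b"hello\n")
    kind, body = read_loose_object(git_dir, oid)
    print(oid, kind, body)
```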
Key Takeaways
- Commits are immutable snapshots with metadata and parent pointers, stored as commit objects.
- Git’s object model (commit, tree, blob) enables efficient storage and deduplication.
- Merge combines branches, preserving history with merge commits or fast-forwarding.
- Rebase rewrites history for a linear look, creating new commits with new hashes.
- The three-way merge algorithm relies on the common ancestor to combine changes.
- Understanding these mechanics improves control over Git workflows, conflict resolution, and history management.
Final Thoughts
Git’s merge and rebase operations, built on its robust object model, provide powerful tools for shaping project history. By mastering commits, merges, and rebases, developers can navigate complex workflows, resolve conflicts efficiently, and maintain clean histories. Commands like git log --graph and git cat-file reveal Git’s elegant design, empowering you to leverage its full potential for collaboration and project management.