Vahid Ghadiri

A Deep Dive into Git: Understanding Commits, Merges, and Rebases

Introduction

Git, a powerful distributed version control system, manages project history through snapshots called commits. Operations like merging and rebasing allow developers to combine or reorganize these snapshots to suit collaboration and maintenance needs. This article explores the internal mechanics of Git's commits, merges, and rebases, diving into the object model (commits, trees, and blobs), the three-way merge algorithm, and how rebasing rewrites history. By understanding these concepts, developers can master Git workflows, resolve conflicts, and maintain clean project histories.

What is a Commit in Git?

A commit in Git represents a snapshot of your project's state at a specific point in time. Stored as an immutable commit object, it’s identified by a unique SHA-1 hash (or SHA-256 in newer configurations). A commit object contains:

  • A pointer to a tree object, capturing the directory structure and file contents.
  • Pointers to zero or more parent commits (none for the initial commit, one for an ordinary commit, two or more for a merge), forming a directed acyclic graph (DAG).
  • Metadata: author name, email, timestamp, and committer information.
  • A commit message describing the changes.

For example:

commit 3f1d2ab273...
tree ab2e3f1c0b...
parent 7ba3f1c0b2...
author Jane Doe <jane@example.com> 1624392390 +0100
committer Jane Doe <jane@example.com> 1624392390 +0100

Add feature X

This immutability ensures the integrity of your project’s history, making Git reliable for collaboration and auditing.
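
To see why a commit is effectively immutable, it helps to note that its ID is a hash of its own content. The following is a minimal sketch, assuming SHA-1 object IDs and Git's `<type> <size>\0<content>` object layout. The function name `commit_id` is hypothetical, and the tree ID is only a placeholder (the ID of the empty tree), not a value from a real project.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Serialize commit fields in Git's text layout and return the SHA-1 object ID."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # no parent lines for a root commit
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    # A blank line separates the header fields from the commit message.
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = f"commit {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


who = "Jane Doe <jane@example.com> 1624392390 +0100"
tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # placeholder: ID of the empty tree

first = commit_id(tree, [], who, who, "Add feature X")
reworded = commit_id(tree, [], who, who, "Add feature Y")
child = commit_id(tree, [first], who, who, "Add feature X")

print(first != reworded)  # True: changing the message changes the ID
print(first != child)     # True: changing the parent list changes the ID
```

Because the parent IDs are part of the hashed content, altering any earlier commit would change the ID of every commit that descends from it.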

Understanding Git’s Object Model

Git’s efficiency stems from its object model, consisting of three primary types: commits, trees, and blobs. (Annotated tags are a fourth object type, not covered here.)

Commit Object

The commit object ties together the project’s state and history. It references a tree object and its parent commits and carries the commit metadata. Like every Git object, it is stored under the .git/objects directory and identified by the hash of its content.

Tree Object

A tree object represents a directory snapshot, containing:

  • References to blob objects (files).
  • References to other tree objects (subdirectories).
  • Metadata, such as filenames and permissions (e.g., 100644 for regular files, 100755 for executables).

For example:

100644 blob a789c3d... README.md
100644 blob b4e42f1... index.js
040000 tree b3db2a6... src

Trees allow Git to reconstruct the project’s file structure at any commit.
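
The listing above is the human-readable form of a tree. As a rough sketch of what gets hashed, assuming SHA-1 and Git's binary tree layout (each entry is `<mode> <name>`, a NUL byte, then the 20-byte binary object ID, with directories using mode `40000`), a tree ID could be computed like this. The function name `tree_id` and the sample entries are illustrative only.

```python
import hashlib

EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # ID of a zero-byte file
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of a tree with no entries


def tree_id(entries):
    """entries: iterable of (mode, name, hex_object_id); returns the tree's SHA-1 ID."""

    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    header = f"tree {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


print(tree_id([]))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904

snapshot = [
    ("100644", "README.md", EMPTY_BLOB),
    ("100644", "index.js", EMPTY_BLOB),
    ("40000", "src", EMPTY_TREE),
]
print(tree_id(snapshot))
```

Since a tree's ID depends on the IDs of everything beneath it, two commits whose trees share an ID are guaranteed to contain the same files.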

Blob Object

A blob (binary large object) stores a file’s raw content, excluding metadata like filenames. Identical file contents share the same blob, enabling deduplication. Blobs are stored in .git/objects, identified by their SHA-1 hash.
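
Deduplication follows directly from how a blob is named. This is a small sketch, assuming SHA-1 object IDs; `blob_id` is a hypothetical helper that mirrors what `git hash-object` computes for a file's bytes.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash "blob <size>\\0<content>"; the filename plays no part in the ID."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


license_in_root = b"MIT License\n"
license_in_docs = b"MIT License\n"  # same bytes, stored under a different path

print(blob_id(license_in_root) == blob_id(license_in_docs))  # True: one blob serves both paths
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```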

This object model ensures efficient storage, deduplication, and fast retrieval, making Git scalable for large projects.
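
To make the storage side concrete, here is a hedged sketch of how a "loose" object could be written and read back, assuming SHA-1 IDs and zlib compression. It writes to a temporary directory standing in for .git/objects, and the helper names are hypothetical. Real repositories also pack objects into packfiles, which this sketch ignores.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir, obj_type, content):
    """Store an object under <first 2 hex chars>/<remaining 38> and return its ID."""
    store = f"{obj_type} {len(content)}\0".encode() + content
    oid = hashlib.sha1(store).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    if not path.exists():  # identical content maps to the same path, so it is stored once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(store))
    return oid


def read_loose_object(objects_dir, oid):
    """Decompress an object and split its header from its content."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, content = raw.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    assert int(size) == len(content)
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"  # stand-in for .git/objects
    first = write_loose_object(objects, "blob", b"console.log('hi');\n")
    second = write_loose_object(objects, "blob", b"console.log('hi');\n")
    stored_files = [p for p in objects.rglob("*") if p.is_file()]
    roundtrip = read_loose_object(objects, first)

print(first == second, len(stored_files))  # True 1
print(roundtrip)  # ('blob', b"console.log('hi');\n")
```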

Merging: Combining Branches

Merging combines changes from two branches, preserving their history. Git typically uses the three-way merge algorithm.

Three-Way Merge

When merging a feature branch into main, Git:

  1. Identifies the merge base (common ancestor commit).
  2. Computes differences from the merge base to the heads of both branches.
  3. Combines these differences into a new snapshot.
  4. Creates a merge commit with two parents, referencing both branch heads.

For example, consider this DAG:

A --- B --- C (main)
       \
        D --- E (feature)

Running:

git checkout main
git merge feature

Results in:

A --- B --- C --- M (main)
       \         /
        D --- E (feature)

Here, M is the merge commit, with parents C and E, and B is the merge base. If conflicts arise, Git pauses, marks conflicting files, and requires manual resolution before committing.
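
Finding the merge base is a graph problem. The sketch below models the example DAG as a plain dictionary of parent pointers and picks the common ancestors that are not themselves ancestors of another common ancestor. The names `PARENTS`, `ancestors`, and `merge_bases` are illustrative; this is a simplified model of the idea, not Git's actual implementation.

```python
# Parent pointers for the example graph: commit -> list of parents.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}


def ancestors(commit, parents):
    """Return the commit itself plus everything reachable through parent pointers."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(ours, theirs, parents):
    """Common ancestors that are not an ancestor of any other common ancestor."""
    common = ancestors(ours, parents) & ancestors(theirs, parents)
    return {
        c for c in common
        if not any(c != other and c in ancestors(other, parents) for other in common)
    }


print(merge_bases("C", "E", PARENTS))  # {'B'}

# The merge commit M simply records both heads as its parents.
PARENTS["M"] = ["C", "E"]
print(sorted(ancestors("M", PARENTS)))  # ['A', 'B', 'C', 'D', 'E', 'M']
```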

Fast-Forward Merge

If main has no unique commits since the merge base, Git performs a fast-forward merge, moving the main pointer to the feature head without a merge commit:

A --- B --- D --- E (main, feature)
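
The choice between a fast-forward and a merge commit comes down to one ancestry check. Here is a minimal sketch, assuming branches are entries in a `refs` dictionary and commits are keys in a `parents` dictionary. Both structures and the `merge` function are hypothetical stand-ins for Git's refs and object store, and conflict handling is left out.

```python
def is_ancestor(maybe_ancestor, commit, parents):
    """True if maybe_ancestor is reachable from commit (or is the same commit)."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def merge(refs, parents, target, source):
    """Merge branch `source` into branch `target` and report what happened."""
    if is_ancestor(refs[source], refs[target], parents):
        return "already up to date"        # target already contains source
    if is_ancestor(refs[target], refs[source], parents):
        refs[target] = refs[source]        # just move the branch pointer
        return "fast-forward"
    merge_commit = f"M({refs[target]},{refs[source]})"
    parents[merge_commit] = [refs[target], refs[source]]
    refs[target] = merge_commit
    return "merge commit"


# main has no commits of its own after B, so it can fast-forward to E.
parents = {"A": [], "B": ["A"], "D": ["B"], "E": ["D"]}
refs = {"main": "B", "feature": "E"}
print(merge(refs, parents, "main", "feature"), refs["main"])  # fast-forward E

# With C on main, the histories have diverged and a merge commit is needed.
parents2 = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
refs2 = {"main": "C", "feature": "E"}
print(merge(refs2, parents2, "main", "feature"), parents2[refs2["main"]])  # merge commit ['C', 'E']
```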

Rebasing: Rewriting History

Rebasing rewrites history by replaying commits from one branch onto another, creating a linear history without merge commits. For example:

git checkout feature
git rebase main

Given:

A --- B --- C (main)
       \
        D --- E (feature)

Git:

  1. Identifies the common ancestor (B).
  2. Collects the commits that are on feature but not on main (D and E).
  3. Starts from the tip of main (C) as the new base.
  4. Replays the changes from D and E there, one commit at a time, creating new commits (D' and E') with new hashes.
  5. Updates feature to point to E', while main still points to C:

A --- B --- C --- D' --- E' (feature)

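The original D and E commits are not deleted immediately. They become unreachable from the branch unless referenced elsewhere, and they remain recoverable through the reflog until Git's garbage collection eventually prunes them.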
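
The replay steps can be modeled in a few lines. This is a toy sketch under strong assumptions: history is linear, each commit carries a `patch` string in place of a real diff, every patch applies cleanly, and IDs are shortened hashes of parent plus patch. All names here (`make_commit`, `commits_since`, `rebase`) are hypothetical.

```python
import hashlib


def make_commit(store, parent, patch):
    """Create a commit whose ID depends on its parent and its change."""
    cid = hashlib.sha1(f"{parent}|{patch}".encode()).hexdigest()[:7]
    store[cid] = {"parent": parent, "patch": patch}
    return cid


def commits_since(store, base, tip):
    """Commits after base up to tip, oldest first (linear history assumed)."""
    chain = []
    while tip != base:
        chain.append(tip)
        tip = store[tip]["parent"]
    return chain[::-1]


def rebase(store, refs, branch, upstream, base):
    """Replay branch's commits on top of upstream, then move the branch pointer."""
    new_tip = refs[upstream]
    for old in commits_since(store, base, refs[branch]):
        # Same change, different parent, therefore a different commit ID.
        new_tip = make_commit(store, new_tip, store[old]["patch"])
    refs[branch] = new_tip


store, refs = {}, {}
a = make_commit(store, None, "add README")
b = make_commit(store, a, "add index.js")
c = make_commit(store, b, "fix typo")        # tip of main
d = make_commit(store, b, "start feature")
e = make_commit(store, d, "finish feature")  # tip of feature
refs.update(main=c, feature=e)

rebase(store, refs, "feature", "main", base=b)
e_new = refs["feature"]
d_new = store[e_new]["parent"]

print(store[d_new]["parent"] == c)                 # True: D' sits on top of C
print(d_new != d and e_new != e)                   # True: replayed commits have new IDs
print(store[e_new]["patch"] == store[e]["patch"])  # True: the change itself is the same
print(d in store and e in store)                   # True: the originals still exist
```

The last line mirrors the point about unreachable commits: nothing removes D and E from the store, but no branch points at them any more.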

Visualizing Rebase

Before rebase:

A --- B --- C (main)
       \
        D --- E (feature)

After rebase:

A --- B --- C --- D' --- E' (feature)

This linear history is cleaner but rewrites commit hashes, which can complicate collaboration if the branch is already shared.

Three-Way Merge in Detail

The three-way merge algorithm involves:

  1. Identify the Merge Base: Find the common ancestor (e.g., B).
  2. Compute Differences:
    • diff(B → C) for main.
    • diff(B → E) for feature.
  3. Combine Changes: Merge differences into a new snapshot, resolving conflicts if needed.
  4. Create Merge Commit: Generate a new commit (M) with two parents (C and E) and a tree reflecting the combined state.
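
The combining rule in step 3 can be sketched at whole-file granularity. This example assumes each snapshot is a dictionary mapping paths to file contents, with a missing key meaning the file does not exist. Real Git applies the same base/ours/theirs comparison within files as well, at the level of changed regions, which this sketch does not attempt. The function name `three_way_merge` and the sample data are illustrative.

```python
def three_way_merge(base, ours, theirs):
    """Merge three {path: content} snapshots; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:      # both sides agree (including both deleting the file)
            result = o
        elif o == b:    # only theirs changed this path
            result = t
        elif t == b:    # only ours changed this path
            result = o
        else:           # both sides changed it in different ways
            conflicts.append(path)
            continue
        if result is not None:  # None means the file is absent in the result
            merged[path] = result
    return merged, conflicts


base = {"README.md": "v1", "index.js": "v1", "old.txt": "x"}         # B
ours = {"README.md": "v2", "index.js": "v1", "old.txt": "x"}         # C (main)
theirs = {"README.md": "v1", "index.js": "v2", "src/app.js": "new"}  # E (feature)

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'README.md': 'v2', 'index.js': 'v2', 'src/app.js': 'new'}
print(conflicts)  # []

# If both branches edit the same file differently, the path is reported as a conflict.
print(three_way_merge({"f": "v1"}, {"f": "ours"}, {"f": "theirs"}))  # ({}, ['f'])
```

The merge base is what makes this work: without B there would be no way to tell which side changed a file and which side left it alone.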

Practical Commands for Exploration

To dive into Git’s internals, try these commands:

  • View a commit object:
  git cat-file -p HEAD
  • View a tree object:
  git cat-file -p <tree-hash>
  • Visualize the commit graph:
  git log --oneline --graph --all
  • List all objects reachable from any ref:
  git rev-list --objects --all

Key Takeaways

  • Commits are immutable snapshots with metadata and parent pointers, stored as commit objects.
  • Git’s object model (commit, tree, blob) enables efficient storage and deduplication.
  • Merge combines branches, preserving history with merge commits or fast-forwarding.
  • Rebase rewrites history for a linear look, creating new commits with new hashes.
  • The three-way merge algorithm relies on the common ancestor to combine changes.
  • Understanding these mechanics improves control over Git workflows, conflict resolution, and history management.

Final Thoughts

Git’s merge and rebase operations, built on its robust object model, provide powerful tools for shaping project history. By mastering commits, merges, and rebases, developers can navigate complex workflows, resolve conflicts efficiently, and maintain clean histories. Commands like git log --graph and git cat-file reveal Git’s elegant design, empowering you to leverage its full potential for collaboration and project management.
