Rahul Yavvari

Posted on Dec 14

Git: A Guide to Mastering Version Control

#git #webdev #programming

Git is a distributed version control system (VCS). In just 5 days, Linus Torvalds, the creator of Linux, wrote his own VCS, Git. Over the years, it spread like wildfire, becoming the go-to tool for version control among developers worldwide.

Before Git, Linus and many other open-source developers used BitKeeper, a proprietary version control system. However, when the relationship between BitKeeper and the open-source community soured, Linus decided to create his own solution. Thus, Git was born, designed to be fast, reliable, and open-source.

Git allows you to search, manipulate, and revert history with ease using commands.

Git’s efficiency and flexibility revolutionized how developers collaborate, manage code, and maintain history in software projects.

Porcelain and Plumbing Commands

Git operates using two types of commands: porcelain and plumbing.

Porcelain commands are high-level, user-friendly commands that are designed to be intuitive for everyday tasks like committing changes, viewing logs, and managing branches. (You'll use these 99% of the time.)
Plumbing commands are low-level commands, typically used by scripts and other tools, offering more granular control over Git's internals.

Examples of Porcelain Commands:

git clone – Create a copy of a repository.
git commit – Record changes to the repository.
git push – Upload local changes to a remote repository.
git status – Check the status of files in the working directory.

Examples of Plumbing Commands:

git cat-file – Show object content in Git’s internal database.
git ls-tree – List objects in a tree (typically used to inspect the repository structure).
git update-index – Update the index (staging area), allowing for low-level changes.

While porcelain commands are used in everyday Git interactions (they are 99% of what you are going to use), plumbing commands give developers and advanced users more control over the system's operations. We'll talk about important ones in this post.

To install Git, visit the official Git website (git-scm.com) and follow the instructions for your operating system.

Once installed, you can check the version of Git with:

git --version

RTFM, or "Read The F*****g/Friendly Manual," is a common phrase in the developer community encouraging you to consult the documentation for detailed information. In Git, you can do this by running the following command to access its manual:

man git

The manual provides comprehensive information about Git commands, options, and usage to help you better understand how to use Git effectively.

Linus created Git because of a licensing dispute. BitKeeper was proprietary, and its license agreement prohibited reverse engineering of the software. In 2005, a developer in the Linux community was accused of violating this agreement by reverse engineering BitKeeper's protocols, which led BitKeeper's owner to revoke the free license the kernel project relied on. In response, Linus created Git as an alternative.

Configuring Git

Git configuration can be done at various levels: global (for all repositories) or local (specific to a repository). The configuration file can either be located globally in the user's home directory (~/.gitconfig) or within the .git folder of a specific repository.

To set up and configure your Git repository, follow these steps:

Check Current Configuration:

Check the currently configured username:
```
 git config --get user.name
```
Check the currently configured email:
```
 git config --get user.email
```

Add Global Configuration (Set username and email):

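Set your name globally (for all repositories). This is the name recorded in your commits; it does not have to match your GitHub username:

 git config --global user.name "Your Name"

Set your email globally (for all repositories):

 git config --global user.email "email@example.com"

Avoid the --add flag here: it appends another value instead of replacing the existing one, which can leave duplicate entries in the file.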

Set Default Branch Name:

Configure Git to use master as the default branch name when initializing new repositories:
```
 git config --global init.defaultBranch master
```

Check Global Configuration File:

To view your global Git configuration, you can open the configuration file:
```
 cat ~/.gitconfig  # Location of the global config file on Linux
```

These commands set up your Git configuration to ensure proper tracking of commits, a consistent user identity across all repositories, and the setup of default behavior for new repositories.

Git Repository

A Git repository is where all your project’s files and version history are stored. The .git directory inside your project folder contains all the internal tracking and version information, like commits, branches, and configuration.

git init

To initialize a new Git repository, use:

git init

This command creates the .git directory, turning your project folder into a Git repository, and allows you to start tracking changes.

Git Status

In Git, a file can be in several states during its lifecycle:

Untracked: Git is not aware of the file yet.
Modified: A tracked file has changed since the last commit, but the change is not staged yet.
Staged: The file is ready to be committed (added to the next snapshot).
Committed: The file's changes are saved in Git's history.

Working Directory
+---------------------+
|                     |
|   Untracked Files   |  <-- Files not added to Git
|                     |
|   Staged Files      |  <-- Files added with `git add`
|                     |
+---------------------+
         |
         v
+---------------------+
|                     |
|   Committed Files   |  <-- Files saved in Git history
|                     |
+---------------------+

To see the current state of your repository, including the status of all files, you can use:

git status

This command shows you which files are untracked, staged, or have changes that need to be committed. It's a useful way to track the progress of your work.
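To make the states above more concrete, here is a minimal sketch in Python. It is a toy model, not how Git is implemented: `head`, `index`, and `worktree` are hypothetical dictionaries mapping a path to its content, and `classify` is a made-up helper that labels each file roughly the way `git status` groups them.

```python
def classify(head, index, worktree):
    """Return {path: state}, grouped roughly like `git status` (simplified)."""
    states = {}
    for path in sorted(set(head) | set(index) | set(worktree)):
        if path not in index:
            # Not in the staging area at all
            states[path] = "untracked" if path in worktree else "deleted"
        elif worktree.get(path) != index[path]:
            states[path] = "modified"    # changed since it was last staged
        elif head.get(path) != index[path]:
            states[path] = "staged"      # staged, waiting for the next commit
        else:
            states[path] = "committed"   # same in HEAD, index and worktree
    return states

head = {"README.md": "v1"}
index = {"README.md": "v1", "app.py": "print('hi')"}
worktree = {"README.md": "v1", "app.py": "print('hi')", "notes.txt": "todo"}
print(classify(head, index, worktree))
```

Here `README.md` comes out as committed, `app.py` as staged, and `notes.txt` as untracked.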

Staging

In Git, staging refers to the process of adding files to the staging area before committing them. The staging area is like a preview of what will be included in the next commit.

To stage a specific file, you can use:

git add i-use-arch.btw

Or, if you want to stage all modified or new files, you can use:

git add .

This prepares the changes for the next commit, allowing you to control which changes are included.

Committing

In Git, committing is the process of saving a snapshot of the repository at a specific point in time. Each commit includes a commit message that describes the changes made.

To create a commit, use:

git commit -m "your message here"

This stores the current state of the staging area as a commit with the provided message.

If you want to change the message of the last commit, you can use:

git commit --amend -m "new message"

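This replaces the last commit with a new one that carries the updated message, so its hash changes. Anything currently staged is included as well, so avoid amending commits you have already pushed.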

You've Learned Half of Git (Kind of)

So far, you've learned the core commands for managing your local repository. These are the essential tools for solo development:

git status – Check the current state of your repository.
git add – Stage changes to be committed.
git commit – Save a snapshot of your work with a message.

These commands are enough for managing a project on your own. However, there’s more to Git that enhances collaboration and version control.

40% of Git focuses on working with others, including handling remotes and pushing or pulling code to/from repositories hosted on platforms like GitHub or GitLab.
The last 10% involves handling mistakes, rolling back changes, and advanced topics like branching, merging, and rebasing. These help you work with more complex scenarios and refine your workflow.

But with what you've learned so far, you’re already well on your way to being proficient with Git for solo development!

Git Log

The git log command is essential for viewing the history of commits in your repository. It allows you to see who made each commit, when it was made, and what changes were introduced. Here’s an overview of the common flags used with git log:

git log

This command shows the full commit history. By default, it uses a pager to display the logs, so you can scroll through the history one screen at a time.

git --no-pager log -n 10

Shows the last 10 commits without using a pager, so the output is printed directly to your terminal without pausing. This is useful when you want a quick look at the most recent commits.

git --no-pager log -n 10 --oneline --parents --graph

- -n 10: Limits the output to the last 10 commits.
- --oneline: Condenses the output to one line per commit, showing only the abbreviated commit hash and the first line of the commit message.
- --parents: Also prints the hashes of each commit's parents, which makes merge commits (two or more parents) easy to spot.
- --graph: Draws a graph of the branch structure to show how commits are related, visually representing branching and merging.

git log --decorate=<short|full|no>

- --decorate=full: Shows references (like branch names or tags) with their full names, such as refs/heads/master.
- --decorate=short: Shows references in a shortened format, without the refs/heads/ or refs/tags/ prefix.
- --decorate=no: Hides reference names altogether, only showing commit details.

git log --oneline -p

- --oneline: Displays each commit on a single line.
- -p: Shows the changes (the patch) introduced in each commit, making it easy to see exactly what was modified in the files.

git log --oneline --graph --all

- --oneline: Condenses each commit to a single line.
- --graph: Visualizes the branch structure as a graph.
- --all: Includes all branches (not just the current one) in the log, so you can see the full history across the entire repository.

git log --oneline --graph --decorate --parents

This combines several options:
- --oneline: One-line summary for each commit.
- --graph: Graphical representation of the commit history.
- --decorate: Shows references (like branch names or tags).
- --parents: Displays the parent commits, especially useful for understanding merges.

Each of these flags allows you to customize the output of git log to suit different needs, making it easier to view commit history and track changes in your project.

Commit Hash (SHA-1)

A commit hash is a unique identifier generated for each commit in Git using the SHA-1 hashing algorithm. It is used to track commits and their changes within the repository.

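A SHA-1 hash is written as 40 hexadecimal characters (0-9 and a-f). For example (illustrative value): 5ba78624d4e5f0ab831c01e71444b9baa2228a4f

In practice, the first 7 characters are usually enough to identify a commit, as long as that prefix is unique within the repository.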

The commit hash is a function of:

The tree hash: The snapshot of the project's files at the time of the commit.
The commit message: The text describing the changes made in the commit.
The author's (and committer's) name and email: The person who made the commit.
The date and time: When the commit was made.
Parent (previous) commit hashes: The commit(s) that preceded the current commit, linking the history together.

Changing any of these inputs produces a different hash. And because SHA-1 produces a 160-bit value, the probability of an accidental hash collision (two different commits having the same hash) is extremely low, so each commit can be uniquely identified.
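Here is a small Python sketch of how those inputs feed the hash. It assumes the usual layout of a commit object (a `commit <size>` header, a NUL byte, then the tree, parent, author, committer, and message lines). The author, timestamp, and messages are made up, and `commit_hash` is a hypothetical helper, not a Git API.

```python
import hashlib

def commit_hash(tree, parents, author, timestamp, message):
    """SHA-1 of a commit object built from the given fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of Git's empty tree
who = "Ada Example <ada@example.com>"                     # hypothetical author

a = commit_hash(EMPTY_TREE, [], who, 1700000000, "Initial commit")
b = commit_hash(EMPTY_TREE, [], who, 1700000000, "Initial commit!")  # new message
c = commit_hash(EMPTY_TREE, [a], who, 1700000000, "Initial commit")  # new parent
print(a, b, c, sep="\n")
```

All three hashes differ, even though only one input changed each time.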

Let's Peek into Plumbing

In Git, plumbing refers to the internal mechanisms that handle the storage and organization of the repository’s data. All Git data is stored in the .git directory, which is hidden within the project folder.

.git/objects: This is where Git stores its data objects, including commits, trees, and blobs. A commit is actually a type of object, and all the versioning information is stored as objects here.

Git's approach to file storage helps ensure efficient access and retrieval while preventing system limitations, such as those found in traditional file systems.

Inodes in Filesystems

Inodes are data structures used by file systems (like those on Linux) to store file metadata (e.g., permissions, timestamps, etc.). When a single directory holds a very large number of files, lookups in that directory can slow down on some file systems, leading to performance issues.

To avoid this, Git employs a clever method:

Git organizes objects by the first two characters of their hash, creating directories that contain files named by the remaining 38 characters. This applies to every object type, not just commits, and it reduces the number of files per directory, improving performance.

Example:

If you look into a specific object, for example:

cat .git/objects/78/asfadfefj8e0r48...

It will print a bunch of compressed, raw byte data. This is the content of the object, which Git stores in a compressed form to save space and make the .git directory smaller.

Git uses this object storage system to efficiently manage and track changes while maintaining a highly optimized file structure.
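The layout described above can be sketched in a few lines of Python. This is a simplified illustration of loose objects only, and `loose_object` is a hypothetical helper name. It hashes a `blob <size>` header plus the content, splits the hash into a two-character directory and a file name, and compresses the result with zlib.

```python
import hashlib
import zlib

def loose_object(data: bytes, kind: str = "blob"):
    """Return (hash, relative path, compressed bytes) for a loose object."""
    store = f"{kind} {len(data)}".encode() + b"\0" + data
    digest = hashlib.sha1(store).hexdigest()
    path = f".git/objects/{digest[:2]}/{digest[2:]}"  # 2-char folder + 38-char file
    return digest, path, zlib.compress(store)

digest, path, packed = loose_object(b"hello world\n")
print(digest)                   # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(path)                     # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
print(zlib.decompress(packed))  # b'blob 12\x00hello world\n'
```

The compressed bytes are what you see when you `cat` an object file directly, which is why the output looks like garbage.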

Built-in Plumbing Command: `git cat-file`

Git provides built-in plumbing commands for accessing the raw internal data of your repository. One such command is git cat-file, which allows you to interact with Git objects by their hash values.

`git cat-file -p <hash>`

This command pretty-prints the content of a Git object given its hash.
You can use an abbreviated hash instead of the full one, as long as the prefix is at least 4 characters long and unique in the repository.
Example:

  git cat-file -p <hash>

This will display the content of the object (commit, tree, or blob) in a readable format.
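To get a feel for what a command like this does under the hood, here is a rough Python sketch. It only handles loose blob objects and uses a temporary directory as a stand-in for .git/objects. The names `write_blob` and `cat_file` are hypothetical and are not part of Git.

```python
import hashlib
import os
import tempfile
import zlib

def write_blob(objects_dir, data):
    """Store `data` as a loose blob object and return its hash."""
    store = b"blob %d\0" % len(data) + data
    digest = hashlib.sha1(store).hexdigest()
    folder = os.path.join(objects_dir, digest[:2])
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, digest[2:]), "wb") as fh:
        fh.write(zlib.compress(store))
    return digest

def cat_file(objects_dir, digest):
    """Return (type, size, content) of a loose object: a tiny cat-file."""
    with open(os.path.join(objects_dir, digest[:2], digest[2:]), "rb") as fh:
        raw = zlib.decompress(fh.read())
    header, _, content = raw.partition(b"\0")  # header is "<type> <size>"
    kind, size = header.decode().split(" ")
    return kind, int(size), content

objects_dir = tempfile.mkdtemp()               # stand-in for .git/objects
digest = write_blob(objects_dir, b"test content\n")
print(cat_file(objects_dir, digest))
```

The real command also resolves abbreviated hashes and reads packed objects, which this sketch leaves out.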

Hex Dump of a File

To see the raw binary content of a file object in Git, you can use xxd to convert it into a hex format:

xxd path/to/file > /tmp/commit_object_hex.txt

This will generate a hex dump of the specified file and save it to the /tmp/commit_object_hex.txt file.

Types of Git Objects

Commit
- A commit object represents a snapshot of your repository at a specific point in time.
- A commit points to a tree object (which represents a directory), and that tree points to blob objects (which represent files).
- Example: A commit might look like:
  - Commit → Tree → Blob → Contents (i.e., a file’s content)

Tree
- A tree object represents a directory in Git.
- It contains references to blob objects (files) or other trees (subdirectories).
- A tree is similar to the structure of directories in the file system, but represented as a Git object.

Blob
- A blob object stores the contents of a file.
- It contains the actual data of the file, without any directory structure.

The Relationship Between Commit, Tree, Blob, and Contents

A commit contains:

A reference to a tree object, which is a representation of the directory structure at the time of the commit.
The tree object points to blob objects, which represent the files in the commit.
The blob objects contain the actual file content.

Example structure:

Commit → Tree → Blob → Contents
(A) → (tree) → (blob) → contents.md
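The chain above can be walked in code. The following Python sketch keeps objects in an in-memory dictionary instead of .git/objects, and the commit body is deliberately stripped down (no author or committer lines), so treat it as an illustration rather than a faithful reimplementation. `put` and `read_file` are hypothetical helpers.

```python
import hashlib

store = {}  # hash -> (type, raw bytes): an in-memory stand-in for .git/objects

def put(kind, body):
    digest = hashlib.sha1(f"{kind} {len(body)}".encode() + b"\0" + body).hexdigest()
    store[digest] = (kind, body)
    return digest

blob = put("blob", b"# Contents\n")
# A tree entry is "<mode> <name>\0" followed by the 20-byte binary hash.
tree = put("tree", b"100644 contents.md\0" + bytes.fromhex(blob))
commit = put("commit", f"tree {tree}\n\nAdd contents.md\n".encode())

def read_file(commit_hash, name):
    """Follow commit -> tree -> blob and return the file's bytes."""
    _, body = store[commit_hash]
    tree_hash = body.split(b"\n")[0].split(b" ")[1].decode()
    _, entries = store[tree_hash]
    while entries:
        meta, _, rest = entries.partition(b"\0")
        entry_name = meta.split(b" ", 1)[1].decode()
        entry_hash, entries = rest[:20].hex(), rest[20:]
        if entry_name == name:
            return store[entry_hash][1]
    raise KeyError(name)

print(read_file(commit, "contents.md"))  # b'# Contents\n'
```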

Parent Hash in Git Log

The parent commit in the Git log is the hash of the previous commit.
It’s important for Git to keep track of parent commits so it can maintain the history and structure of the repository over time.

Plumbing vs. Porcelain Commands

git log is a Porcelain command
git cat-file is a Plumbing command

By understanding how trees and blobs work, you can get a deeper insight into how Git organizes and stores data. This internal structure helps Git manage changes efficiently while keeping the repository lightweight.

Storing Data in Git

Git doesn’t just store changes; it stores entire snapshots of all files in each commit. Each commit captures the full state of the project at that point in time, not just the differences.

Efficiency Through Compression

Git uses compression to minimize the size of the .git directory, which helps reduce storage requirements. Even large files are stored efficiently, thanks to Git’s internal mechanisms.

Tree Objects and Blobs

Each commit stores a unique tree object, which is a snapshot of the directory. While a new tree is created per commit, Git doesn’t store unchanged files again. Instead, it points to existing blobs (files) from previous commits, reducing duplication and making the process efficient.
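A quick way to see why unchanged files cost nothing extra: objects are keyed by the hash of their content, so identical content always lands on the same key. Below is a minimal Python sketch under that assumption. `store_blob` and `snapshot` are hypothetical names, and a plain dictionary stands in for a tree.

```python
import hashlib

objects = {}  # hash -> content: content-addressed storage

def store_blob(data):
    digest = hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()
    objects[digest] = data   # same content -> same key, so it is stored once
    return digest

def snapshot(files):
    """Map each path to the blob hash of its content (a simplified tree)."""
    return {path: store_blob(data) for path, data in files.items()}

first = snapshot({"README.md": b"hello\n", "app.py": b"print(1)\n"})
second = snapshot({"README.md": b"hello\n", "app.py": b"print(2)\n"})

print(first["README.md"] == second["README.md"])  # True: the blob is shared
print(len(objects))                                # 3 blobs for 4 file entries
```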

Deleting Files in Later Commits

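If a file is deleted in a later commit, the new commit's tree simply no longer lists it. Earlier commits still reference the file's blob, so the history stays intact. Objects only become unreachable when nothing references them anymore, for example after a branch is deleted or commits are discarded by a reset or rebase. Such unreachable objects can be pruned to reduce the size of the repository.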

Pruning and Cleaning

You can remove unreachable objects and optimize the repository using the following command:

git gc --prune=now

This helps remove unreferenced objects and keep the repository smaller.

Configuring Git (Again)

Git provides multiple levels of configuration, allowing you to set configuration options globally, locally, or for specific worktrees. Here’s an overview of how you can manage and interact with Git configuration.

Git Config Commands

View Local Configuration: To view the local configuration for a specific repository:

   git config --list --local
   cat .git/config

Get a Specific Configuration Value: To retrieve a specific configuration value:

   git config --get <key>

Unset a Configuration Key: To remove a specific key from the configuration:

   git config --unset <key>

Unset All Occurrences of a Key: To remove all instances of a configuration key:

   git config --unset-all example.key

(Note: Git will only apply the last occurrence of a key in the configuration file, so if you have duplicates, the last one will take precedence.)

Remove a Section from Configuration: To remove an entire section from the configuration:

   git config --remove-section section

Manual Editing: You can also edit the .git/config file directly (or another configuration file, depending on the scope), which is sometimes the easier approach.

Configuration File Locations

There are several places where Git configuration files can exist, each with different scopes:

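System (/etc/gitconfig): This file configures Git for all users on the system.
Global (~/.gitconfig): This file configures Git for all repositories of a user.
Local (.git/config): This file configures Git for a specific repository.
Worktree (.git/config.worktree): This file configures Git for a single working tree of a repository (see git worktree). It is only read when the extensions.worktreeConfig setting is enabled.

When the same key is set in more than one file, the more specific scope wins: worktree overrides local, local overrides global, and global overrides system.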
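As a rough mental model, you can think of the scopes as dictionaries merged from least specific to most specific, with later values overriding earlier ones. The Python sketch below assumes exactly that. The keys and values are made up, and `effective_config` is a hypothetical helper, not something Git ships.

```python
def effective_config(system, global_, local, worktree):
    """Merge scopes from least to most specific; later scopes override."""
    merged = {}
    for scope in (system, global_, local, worktree):
        merged.update(scope)
    return merged

system = {"core.editor": "vi"}
global_ = {"user.name": "Ada Example", "init.defaultBranch": "master"}
local = {"user.name": "Ada (work)"}   # overrides the global name in this repo
worktree = {}

config = effective_config(system, global_, local, worktree)
print(config["user.name"])            # Ada (work)
print(config["core.editor"])          # vi
```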

Git Branches

A branch in Git is essentially a named pointer to a specific commit. When you create a branch, you're creating a new pointer that tracks a particular commit. The commit that the branch points to is called the tip of the branch. Branches are lightweight: they are just pointers and don’t require duplicating the entire project, which makes them cheap to create.

Common Commands

Rename a Branch: You can rename a branch using the following command:

   git branch -m oldname newname

Create a New Branch (without switching): To create a new branch but not switch to it, use:

   git branch my_new_feature

Create and Switch to a New Branch: To create and switch to a new branch in one step, use:

   git switch -c my_new_feature

Switch to an Existing Branch: To simply switch to an existing branch, use:

   git switch my_existing_feature

Or, the old way:

   git checkout my_existing_feature

Branch Information Storage

Git stores all information about branches in files within the .git subdirectory at the root of your project. The "heads" (or "tips") of branches are specifically stored in the .git/refs/heads/ directory.
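To show how little a branch really is, the Python sketch below mimics that layout in a temporary directory: one small text file per branch, containing a commit hash. The directory is a stand-in for .git, the hash is a placeholder, and `set_branch` and `branch_tip` are hypothetical helpers. (Real repositories may also keep refs in a packed-refs file, which this ignores.)

```python
import os
import tempfile

git_dir = tempfile.mkdtemp()              # stand-in for a repository's .git
heads = os.path.join(git_dir, "refs", "heads")
os.makedirs(heads)

def set_branch(name, commit_hash):
    """Point branch `name` at `commit_hash` by writing a one-line file."""
    with open(os.path.join(heads, name), "w") as fh:
        fh.write(commit_hash + "\n")

def branch_tip(name):
    with open(os.path.join(heads, name)) as fh:
        return fh.read().strip()

tip = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # placeholder 40-char hash
set_branch("master", tip)
set_branch("my_new_feature", tip)         # a new branch is one more tiny file
print(sorted(os.listdir(heads)))          # ['master', 'my_new_feature']
print(branch_tip("my_new_feature"))
```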

Merging Branches

When you merge, Git first finds the best common ancestor commit (the merge base) of the two branches, then combines the changes each branch has made since that point. To merge a branch into the one you currently have checked out, run:

git merge my_feature_branch
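The idea of a merge base can be sketched on a toy commit graph. In the Python below, `parents` is a hypothetical dictionary mapping each commit to its parents, and the search is a simple brute-force one. Git's own algorithm is more involved, so read this as an illustration of the concept only.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def merge_base(parents, ours, theirs):
    """A common ancestor that is not an ancestor of another common ancestor."""
    common = ancestors(parents, ours) & ancestors(parents, theirs)
    for candidate in common:
        others = common - {candidate}
        if not any(candidate in ancestors(parents, o) for o in others):
            return candidate
    return None

# main: A - B - C, feature: A - D - E
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": ["D"]}
print(merge_base(parents, "C", "E"))  # A
```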

Amending Commit Messages

If you need to change the message of the last commit, you can use the --amend flag:

git commit --amend -m "Updated commit message"

Fast-Forward Merge

In a fast-forward merge, if the feature branch has all the commits that the base branch has, Git will simply move the pointer of the base branch to the tip of the feature branch. For example:

git merge my_feature_branch
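Here is a small Python sketch of that decision, using a toy history where `parents` maps each commit to its parents and `branches` maps branch names to their tips. Both structures and the `merge` helper are hypothetical. The point is only that a fast-forward moves a pointer and creates no new commit.

```python
def is_ancestor(parents, maybe_ancestor, commit):
    """True if `maybe_ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False

def merge(parents, branches, base, feature):
    """Fast-forward `base` to `feature` when possible; otherwise report it."""
    if is_ancestor(parents, branches[base], branches[feature]):
        branches[base] = branches[feature]   # only the pointer moves
        return "fast-forward"
    return "merge commit needed"

parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
branches = {"main": "B", "my_feature_branch": "D"}
result = merge(parents, branches, "main", "my_feature_branch")
print(result, branches["main"])              # fast-forward D
```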

Deleting a Branch

Once you're done with a branch, you can delete it using:

git branch -d my_feature_branch

This will delete the branch locally if it has been merged. If the branch is not merged, use -D to force the deletion.

Common Git Workflow for Team Development

Create a Branch: Start by creating a new branch for the change.
Make the Change: Work on the change, then commit it once it’s ready.
Merge the Branch into Main: Once the change is complete, merge the branch back into the main branch.
Remove the Branch: After merging, delete the branch to keep the repository clean.
Repeat: Repeat this process for each new change or feature.

Rebase in Git

Rebasing is a way to move or combine a sequence of commits to a new base commit. It helps maintain a cleaner, more linear history compared to merging.

Consider this scenario:

A - B - C    main
   \
    D - E    feature_branch

You're working on feature_branch, and you want to bring in the latest changes from main to avoid working with stale code. You could merge main into feature_branch, but that would introduce a merge commit, which could clutter the history. Instead, rebase re-applies the commits from feature_branch on top of main, creating a linear history.

After running the following rebase command:

git rebase main

The history will look like this:

A - B - C         main
         \
          D' - E' feature_branch

Notice that the commits D and E have been rewritten as D' and E', because the history of feature_branch is now based on commit C from main. This is why the commit hashes change after a rebase—the base for these commits has shifted.
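The hash change is easy to reproduce with a toy model. In the Python sketch below, a "commit hash" is derived from the parent hash plus the change itself. That is a big simplification of what Git hashes, and `make_commit` and `rebase` are hypothetical helpers. Replaying the same changes on a different base yields different hashes.

```python
import hashlib

def make_commit(parent, change):
    """Return a short fake hash derived from the parent hash and the change."""
    return hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()[:7]

def rebase(changes, new_base):
    """Replay `changes` one by one on top of `new_base`."""
    tip, rewritten = new_base, []
    for change in changes:
        tip = make_commit(tip, change)
        rewritten.append(tip)
    return rewritten

a = make_commit("", "A")
b = make_commit(a, "B")
c = make_commit(b, "C")          # main: A - B - C
d = make_commit(a, "D")
e = make_commit(d, "E")          # feature_branch: D - E on top of A

d2, e2 = rebase(["D", "E"], c)   # D' - E' on top of C
print(d, e)                      # original hashes
print(d2, e2)                    # same changes, new parents, new hashes
```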

Why Use Rebase?

Cleaner History: Rebase creates a linear history by avoiding unnecessary merge commits. It makes the project history easier to read and understand, especially when reviewing changes.
Avoid Merge Commits: Unlike merge, which can result in many merge commits, rebase keeps the history cleaner and linear.

However, rebasing rewrites commit history, so it should be used with caution.

Important Notes:

Rebase on Feature Branches: Rebasing is safe on your own feature branches. You can rebase as often as needed to keep your branch up to date with the latest changes from the main branch.
Never Rebase Public Branches: Refrain from rebasing shared branches (like main) that other developers are working on. Changing the commit history of a public branch can cause conflicts and confusion for others who have already based their work on that history.
Preserving History with Merge: While rebase results in a cleaner history, merging preserves the true history of the project, showing exactly when and where branches were merged. If preserving the full history is important, use merge. Otherwise, rebase for a more straightforward commit history.

Example of Switching Branches:

If you need to branch off from a specific commit in your history, you can use the following:

git switch -c new_feature <COMMITHASH>

This creates a new branch called new_feature from the commit identified by <COMMITHASH>.

Just be cautious when using rebase on shared branches, as it alters commit history.

Some terms you might want to know

Index: The staging area where changes are added before committing. Files in this area are ready to be committed but are not yet part of the commit history.
Worktree: The working directory of your project, where all files exist. This includes both staged and unstaged changes, which may or may not be added to the commit history.

Git Reset: Undoing Changes

The git reset command is used to undo changes in your repository. It moves the current branch pointer to a different commit, and can affect the staging area and working directory.

Types of Reset

git reset --soft <COMMITHASH>
- Resets the branch to the specified commit.
- Keeps your changes staged for a new commit.

git reset --hard <COMMITHASH>
- Resets the branch and discards all changes (both staged and unstaged).
- Caution: Irreversible, all uncommitted changes are lost.

Example:

Soft Reset (keep changes staged):

  git reset --soft HEAD~2

Hard Reset (discard changes):

  git reset --hard abc1234

Warning:

Hard reset is irreversible. It will permanently discard any uncommitted changes, so use it with caution. Always make sure you've saved or committed your important work before performing a --hard reset.
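The difference between the two modes can be modelled with three pieces of state: where the branch points, what is staged, and what is on disk. The Python sketch below uses a hypothetical `repo` dictionary for that and only covers --soft and --hard as described in this section.

```python
def reset(repo, target, mode):
    """Move the branch to `target`; "hard" also overwrites index and worktree."""
    if mode not in ("soft", "hard"):
        raise ValueError("this sketch only models --soft and --hard")
    repo["branch_tip"] = target
    if mode == "hard":
        snapshot = repo["commits"][target]
        repo["index"] = dict(snapshot)      # staged changes are discarded
        repo["worktree"] = dict(snapshot)   # unstaged edits are discarded too
    return repo

def new_repo():
    commits = {"c1": {"a.txt": "one"}, "c2": {"a.txt": "two"}}
    return {"commits": commits, "branch_tip": "c2",
            "index": {"a.txt": "two"}, "worktree": {"a.txt": "two, edited"}}

soft = reset(new_repo(), "c1", "soft")
hard = reset(new_repo(), "c1", "hard")
print(soft["index"], soft["worktree"])  # staged and edited content survive
print(hard["index"], hard["worktree"])  # both match commit c1 again
```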

.gitignore

The .gitignore file is a hidden file (it starts with a dot) that tells Git which files or directories to ignore in a project. It's used to prevent unnecessary or sensitive files (such as build artifacts, logs, or configuration files) from being tracked in version control.

How It Works:

Root .gitignore: This is typically located in the root directory of your project and applies to the entire repository.
Subdirectory .gitignore: You can also have .gitignore files in subdirectories. These apply only to files within that subdirectory (and below), allowing for more granular control.

Example:

Root .gitignore (ignores *.log files everywhere):

  *.log

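Subdirectory .gitignore (placed inside the logs/ directory, ignores *.tmp files there):

  *.tmp

Patterns are matched relative to the directory that contains the .gitignore file, so the pattern inside logs/ is written as *.tmp rather than logs/*.tmp. In this case, *.log files are ignored everywhere in the repository, but *.tmp files are ignored only within the logs/ directory.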

Key Points:

You can have .gitignore files in multiple directories, and each one applies to files within that directory and its subdirectories.
.gitignore works based on relative paths. You can specify files or folders to ignore, or use wildcards like *.log or **/temp/.

Example Structure:

/project-root
  .gitignore       # ignores *.log everywhere in the repository
  /logs
    .gitignore     # ignores *.tmp in the logs directory
    error.log      # will be ignored because of the root .gitignore
    temp.tmp       # will be ignored because of the subdirectory .gitignore

This flexibility helps manage which files should be tracked and which should remain local to the development environment.
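Here is a heavily simplified Python sketch of the example structure above. It only understands slash-free wildcard patterns and relies on `fnmatch` from the standard library. Real .gitignore rules (negation with !, patterns containing slashes, **) are richer than this, and `ignore_files` and `is_ignored` are hypothetical names.

```python
import fnmatch
import posixpath

# Directory of each .gitignore -> its patterns ("" is the project root).
ignore_files = {"": ["*.log"], "logs": ["*.tmp"]}

def is_ignored(path):
    """Simplified check: a slash-free pattern matches the file name at any
    depth below the directory that holds the .gitignore."""
    name = posixpath.basename(path)
    for directory, patterns in ignore_files.items():
        inside = directory == "" or path.startswith(directory + "/")
        if inside and any(fnmatch.fnmatchcase(name, p) for p in patterns):
            return True
    return False

for path in ["logs/error.log", "logs/temp.tmp", "src/temp.tmp", "README.md"]:
    print(path, is_ignored(path))
```

The first two paths are ignored, while `src/temp.tmp` and `README.md` are not, because the *.tmp rule only lives inside logs/.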

Always RTFM (Read The Freaking Manual) before asking people with experience. Understanding the fundamentals and referring to the official documentation will not only save you time but also help you grasp concepts more thoroughly.

Revise Git on a weekly basis to keep your skills sharp, and the best way to do this is by integrating Git into the projects you're actively working on. By using it regularly, you'll become more comfortable with advanced features and workflows, which will make version control feel like second nature.

This is part 1 of the series, and part 2 is coming soon! Stay tuned for more insights on mastering Git.

