Git is a distributed version control system. In just 5 days, Linus Torvalds, the creator of Linux, wrote his own VCS, Git. Over the years, it spread like wildfire, becoming the go-to version control tool for developers worldwide.
Before Git, Linus and many other open-source developers used BitKeeper, a proprietary version control system. However, when the relationship between BitKeeper and the open-source community soured, Linus decided to create his own solution. Thus, Git was born, designed to be fast, reliable, and open-source.
Git allows you to search, manipulate, and revert history with ease using commands.
Git’s efficiency and flexibility revolutionized how developers collaborate, manage code, and maintain history in software projects.
Porcelain and Plumbing Commands
Git operates using two types of commands: porcelain and plumbing.
- Porcelain commands are high-level, user-friendly commands that are designed to be intuitive for everyday tasks like committing changes, viewing logs, and managing branches. (You'll use these 99% of the time.)
- Plumbing commands are low-level commands, typically used by scripts and other tools, offering more granular control over Git's internals.
Examples of Porcelain Commands:
- git clone – Create a copy of a repository.
- git commit – Record changes to the repository.
- git push – Upload local changes to a remote repository.
- git status – Check the status of files in the working directory.
Examples of Plumbing Commands:
- git cat-file – Show object content in Git’s internal database.
- git ls-tree – List objects in a tree (typically used to inspect the repository structure).
- git update-index – Update the index (staging area), allowing for low-level changes.
While porcelain commands are used in everyday Git interactions (they are 99% of what you are going to use), plumbing commands give developers and advanced users more control over the system's operations. We'll talk about important ones in this post.
To install Git, visit the official Git website, git-scm.com.
Once installed, you can check the version of Git with:
git --version
RTFM, or "Read The F*****g/Friendly Manual," is a common phrase in the developer community encouraging you to consult the documentation for detailed information. In Git, you can do this by running the following command to access its manual:
man git
The manual provides comprehensive information about Git commands, options, and usage to help you better understand how to use Git effectively.
Linus created Git because BitKeeper was proprietary and its license agreement prohibited reverse engineering of the software. When a developer in the Linux community was accused of doing exactly that, BitKeeper's vendor revoked the free license. In response, Linus created Git as an alternative.
Configuring Git
Git configuration can be done at various levels: global (for all repositories) or local (specific to a repository). The configuration file lives either in the user's home directory (~/.gitconfig) or within the .git folder of a specific repository.
To set up and configure your Git repository, follow these steps:
Check Current Configuration:
- Check the currently configured username:
git config --get user.name
- Check the currently configured email:
git config --get user.email
Add Global Configuration (Set username and email):
- Set your username globally (for all repositories):
git config --global user.name "github_username_here"
- Set your email globally (for all repositories):
git config --global user.email "email@example.com"
Set Default Branch Name:
- Configure Git to use master as the default branch name when initializing new repositories:
git config --global init.defaultBranch master
Check Global Configuration File:
- To view your global Git configuration, you can open the configuration file:
cat ~/.gitconfig # Location of the global config file on Linux
These commands set up your Git configuration to ensure proper tracking of commits, a consistent user identity across all repositories, and the setup of default behavior for new repositories.
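If you'd rather experiment before touching your real settings, here is a minimal sketch that does the same thing at the local level inside a throwaway repository. The scratch directory and the name "Example User" are placeholders, not anything from your setup.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q

# Without --global, git config writes to this repository's .git/config only.
git config user.name "Example User"
git config user.email "user@example.com"

# Read the values back the same way the article does.
name="$(git config --get user.name)"
email="$(git config --get user.email)"
echo "name=$name email=$email"

# The values are stored as plain text in the repository's config file.
grep -A 2 '\[user\]' .git/config
```

Swapping in --global would write the same keys to ~/.gitconfig instead.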
Git Repository
A Git repository is where all your project’s files and version history are stored. The .git directory inside your project folder contains all the internal tracking and version information, like commits, branches, and configuration.
git init
To initialize a new Git repository, use:
git init
This command creates the .git directory, turning your project folder into a Git repository, and allows you to start tracking changes.
Git Status
In Git, a file can be in several stages during its lifecycle:
- Untracked: Git is not aware of the file yet.
- Staged: The file is ready to be committed (added to the next snapshot).
- Committed: The file's changes are saved in Git's history.
Working Directory
+---------------------+
| |
| Untracked Files | <-- Files not added to Git
| |
| Staged Files | <-- Files added with `git add`
| |
+---------------------+
|
v
+---------------------+
| |
| Committed Files | <-- Files saved in Git history
| |
+---------------------+
To see the current state of your repository, including the status of all files, you can use:
git status
This command shows you which files are untracked, staged, or have changes that need to be committed. It's a useful way to track the progress of your work.
Staging
In Git, staging refers to the process of adding files to the staging area before committing them. The staging area is like a preview of what will be included in the next commit.
To stage a specific file, you can use:
git add i-use-arch.btw
Or, if you want to stage all modified or new files, you can use:
git add .
This prepares the changes for the next commit, allowing you to control which changes are included.
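To watch a file move from untracked to staged, you can try a small sketch like the one below in a scratch repository. It uses git status --porcelain, which prints the same two-letter codes as the short status format but is meant to stay stable for scripts.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q

echo "I use Arch, btw" > i-use-arch.btw

# "??" in the first column marks a file Git is not tracking yet.
untracked="$(git status --porcelain)"
echo "$untracked"

git add i-use-arch.btw

# "A" in the first column marks a file added to the staging area.
staged="$(git status --porcelain)"
echo "$staged"
```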
Committing
In Git, committing is the process of saving a snapshot of the repository at a specific point in time. Each commit includes a commit message that describes the changes made.
To create a commit, use:
git commit -m "your message here"
This stores the current state of the staging area as a commit with the provided message.
If you want to change the message of the last commit, you can use:
git commit --amend -m "new message"
This replaces the last commit with a new one that carries the updated message, so the history still shows a single commit, but its hash changes.
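Here is a small sketch of what amending does under the hood, run in a scratch repository with a placeholder identity. It records the hash before and after the amend so you can compare them.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notse"               # typo on purpose
before="$(git rev-parse HEAD)"

git commit -q --amend -m "Add notes"       # fix the message
after="$(git rev-parse HEAD)"

count="$(git rev-list --count HEAD)"       # number of commits in history
subject="$(git log -1 --format=%s)"        # message of the latest commit
echo "before=$before"
echo "after=$after"
echo "commits=$count subject=$subject"
```

The history still contains one commit, but the two hashes differ because the message is part of what gets hashed.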
You've Learned Half of Git (Kind of)
So far, you've learned the core commands for managing your local repository. These are the essential tools for solo development:
- git status – Check the current state of your repository.
- git add – Stage changes to be committed.
- git commit – Save a snapshot of your work with a message.
These commands are enough for managing a project on your own. However, there’s more to Git that enhances collaboration and version control.
- 40% of Git focuses on working with others, including handling remotes and pushing or pulling code to/from repositories hosted on platforms like GitHub or GitLab.
- The last 10% involves handling mistakes, rolling back changes, and advanced topics like branching, merging, and rebasing. These help you work with more complex scenarios and refine your workflow.
But with what you've learned so far, you’re already well on your way to being proficient with Git for solo development!
Git Log
The git log command is essential for viewing the history of commits in your repository. It allows you to see who made each commit, when it was made, and what changes were introduced. Here’s an overview of the common flags used with git log:
- git log
  This command shows the full commit history. By default, it uses a pager to display the logs, so you can scroll through the history one screen at a time.
- git --no-pager log -n 10
  Shows the last 10 commits without using a pager, so the output is printed directly to your terminal without pausing. This is useful when you want a quick look at the most recent commits.
- git --no-pager log -n 10 --oneline --parents --graph
  - -n 10: Limits the output to the last 10 commits.
  - --oneline: Condenses the output to one line per commit, showing only the abbreviated commit hash and the first line of the commit message.
  - --parents: Prints the parent hashes of each commit, which makes merge relationships visible.
  - --graph: Draws a graph of the branch structure to show how commits are related, visually representing branching and merging.
- git log --decorate=full/short/no
  - --decorate=full: Shows references (like branch names or tags) next to commits with their full names, such as refs/heads/master.
  - --decorate=short: Displays references in a shortened format, without the refs/heads/ or refs/tags/ prefix.
  - --decorate=no: Hides reference names altogether, only showing commit details.
- git log --oneline -p
  - --oneline: Displays each commit on a single line.
  - -p: Shows the changes introduced in each commit, making it easy to see exactly what was modified in the files for each commit.
- git log --oneline --graph --all
  - --oneline: Condenses each commit to a single line.
  - --graph: Visualizes the branch structure as a graph.
  - --all: Includes all branches (not just the current one) in the log, so you can see the full history across the entire repository.
- git log --oneline --graph --decorate --parents
  This combines several options:
  - --oneline: One-line summary for each commit.
  - --graph: Graphical representation of the commit history.
  - --decorate: Shows references (like branch names or tags).
  - --parents: Displays the parent commits, especially useful for understanding merges.
Each of these flags allows you to customize the output of git log to suit different needs, making it easier to view commit history and track changes in your project.
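To get a feel for these flags, a sketch like the following builds a tiny three-commit history in a scratch repository and prints the two most recent commits. The file name and commit messages are made up for the example.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

# Create three small commits so there is some history to look at.
for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done

# One line per commit, drawn as a graph, limited to the newest two.
git --no-pager log --oneline --graph -n 2

# --format=%s prints only the subject line, newest commit first.
subjects="$(git --no-pager log -n 2 --format=%s)"
echo "$subjects"
```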
Commit Hash (SHA-1)
A commit hash is a unique identifier generated for each commit in Git using the SHA-1 hashing algorithm. It is used to track commits and their changes within the repository.
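For example (an illustrative value, 40 hexadecimal characters): 5ba78624a4b5acde831c01e71444b9baa2228a4f
In practice, a short prefix is usually enough to identify a commit. Git itself typically shows the first 7 characters, and you only need more when the prefix is ambiguous.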
The commit hash is a function of:
- The tree hash: The snapshot of the project's files and directories at the time of the commit.
- The commit message: The text describing the changes made in the commit.
- The author's and committer's name and email: The people who wrote and recorded the commit.
- The date and time: When the commit was made.
- Parent (previous) commit hashes: The commit(s) that preceded the current commit, linking the history together.
Changing any of these inputs produces a different hash, and because SHA-1 output is 160 bits long, the probability of hash collisions (two different commits having the same hash) is extremely low, so each commit can be uniquely identified.
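You can check that the hash really is computed from the commit's own contents. The sketch below, run in a scratch repository with a placeholder identity, prints the raw commit object and then feeds the same bytes back through git hash-object to recompute the hash.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# The raw commit object: tree, author, committer, then the message.
git cat-file commit HEAD

head_hash="$(git rev-parse HEAD)"

# Hash the exact same bytes again as an object of type "commit".
rehash="$(git cat-file commit HEAD | git hash-object -t commit --stdin)"

echo "HEAD:     $head_hash"
echo "rehashed: $rehash"
```

Both lines print the same value, which is why editing the message, author, date, tree, or parent always yields a new hash.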
Let's Peek into Plumbing
In Git, plumbing refers to the internal mechanisms that handle the storage and organization of the repository’s data. All Git data is stored in the .git directory, which is hidden within the project folder.
- .git/objects: This is where Git stores its data objects, including commits, trees, and blobs. A commit is actually a type of object, and all the versioning information is stored as objects here.
Git's approach to file storage helps ensure efficient access and retrieval while preventing system limitations, such as those found in traditional file systems.
Inodes in Filesystems
- Inodes are data structures used by file systems (like those on Linux) to manage file metadata (e.g., permissions, timestamps, etc.). When a single directory holds a very large number of files, the file system can struggle to manage them all in one place, leading to performance issues. This post calls that situation inode busting.
To avoid inode busting, Git employs a clever method:
- Git organizes objects by the first two characters of their object hash, creating directories that contain files named by the remaining characters of the hash. This reduces the number of files per directory, improving performance.
Example:
If you look into a specific object, for example:
cat .git/objects/78/asfadfefj8e0r48...
It will print a bunch of compressed, raw byte data. This is the content of the object, which Git stores in a compressed form to save space and make the .git directory smaller.
Git uses this object storage system to efficiently manage and track changes while maintaining a highly optimized file structure.
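To see the two-character directory layout for yourself, this sketch writes a single blob into a scratch repository and then looks for it on disk. It assumes the object is still stored loose (not yet packed), which is the case right after writing it.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q

# -w writes the object into .git/objects and prints its hash.
hash="$(printf 'hello\n' | git hash-object -w --stdin)"

# Split the hash: first two characters name the directory, the rest the file.
dir="$(printf '%s' "$hash" | cut -c1-2)"
file="$(printf '%s' "$hash" | cut -c3-)"

echo "hash: $hash"
ls ".git/objects/$dir/$file"

# The stored bytes are compressed, so read them back through Git instead of cat.
content="$(git cat-file -p "$hash")"
echo "content: $content"
```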
Built-in Plumbing Command: git cat-file
Git provides built-in plumbing commands for accessing the raw internal data of your repository. One such command is git cat-file, which allows you to interact with Git objects by their hash values.
git cat-file -p <hash>
- This command is used to pretty-print the content of a Git object given its hash.
- You can use an abbreviated hash instead of the full one. Git accepts as few as the first 4 characters, as long as that prefix is unique in the repository.
- Example:
git cat-file -p <hash>
This will display the content of the object (commit, tree, or blob) in a readable format.
Hex Dump of a File
To see the raw binary content of a file object in Git, you can use xxd to convert it into a hex format:
xxd path/to/file > /tmp/commit_object_hex.txt
This will generate a hex dump of the specified file and save it to /tmp/commit_object_hex.txt.
Types of Git Objects
- Commit
  - A commit object represents a snapshot of your repository at a specific point in time.
  - A commit contains a tree object (which represents a directory) that points to blob objects (which represent files).
  - Example: A commit might look like:
    - Commit → Tree → Blob → Contents (i.e., a file’s content)
- Tree
  - A tree object represents a directory in Git.
  - It contains references to blob objects (files) or other trees (subdirectories).
  - A tree is similar to the structure of directories in the file system, but represented as a Git object.
- Blob
  - A blob object stores the contents of a file.
  - It contains the actual data of the file, without any directory structure.
The Relationship Between Commit, Tree, Blob, and Contents
A commit contains:
- A reference to a tree object, which is a representation of the directory structure at the time of the commit.
- The tree object points to blob objects, which represent the files in the commit.
- The blob objects contain the actual file content.
Example structure:
Commit → Tree → Blob → Contents
(A) → (tree) → (blob) → contents.md
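The chain above can be walked by hand with git cat-file. This is a minimal sketch in a scratch repository with a placeholder identity; it commits one file named contents.md and follows the commit to its tree and then to the blob.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

echo "hello" > contents.md
git add contents.md
git commit -q -m "A"

# Step 1: the commit object names a tree.
git cat-file -p HEAD
tree="$(git rev-parse 'HEAD^{tree}')"

# Step 2: the tree lists its entries (mode, type, hash, file name).
git cat-file -p "$tree"
blob="$(git rev-parse HEAD:contents.md)"

# Step 3: the blob holds the file's contents.
tree_type="$(git cat-file -t "$tree")"
blob_type="$(git cat-file -t "$blob")"
blob_content="$(git cat-file -p "$blob")"
echo "$tree_type -> $blob_type -> $blob_content"
```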
Parent Hash in Git Log
- The parent commit in the Git log is the hash of the previous commit.
- It’s important for Git to keep track of parent commits so it can maintain the history and structure of the repository over time.
Plumbing vs. Porcelain Commands
- git log is a Porcelain command
- git cat-file is a Plumbing command
By understanding how trees and blobs work, you can get a deeper insight into how Git organizes and stores data. This internal structure helps Git manage changes efficiently while keeping the repository lightweight.
Storing Data in Git
Git doesn’t just store changes; it stores entire snapshots of all files in each commit. Each commit captures the full state of the project at that point in time, not just the differences.
Efficiency Through Compression
Git uses compression to minimize the size of the .git directory, which helps reduce storage requirements. Each object is compressed with zlib, and Git can additionally bundle objects into packfiles that store similar objects as deltas of one another.
Tree Objects and Blobs
Each commit stores a unique tree object, which is a snapshot of the directory. While a new tree is created per commit, Git doesn’t store unchanged files again. Instead, it points to existing blobs (files) from previous commits, reducing duplication and making the process efficient.
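A quick way to convince yourself that unchanged files are not stored twice is to compare blob hashes across two commits. The sketch below uses a scratch repository, a placeholder identity, and two made-up files, only one of which changes.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

echo "stable" > a.txt
echo "v1" > b.txt
git add a.txt b.txt
git commit -q -m "First"

echo "v2" > b.txt                          # only b.txt changes
git add b.txt
git commit -q -m "Second"

# Blob hashes of each file as seen from the first and second commit.
a_old="$(git rev-parse HEAD~1:a.txt)"
a_new="$(git rev-parse HEAD:a.txt)"
b_old="$(git rev-parse HEAD~1:b.txt)"
b_new="$(git rev-parse HEAD:b.txt)"

# Each commit has its own tree, even though one blob is shared.
tree_old="$(git rev-parse 'HEAD~1^{tree}')"
tree_new="$(git rev-parse 'HEAD^{tree}')"

echo "a.txt: $a_old / $a_new"
echo "b.txt: $b_old / $b_new"
```

The hashes for a.txt match, so both trees point at the same stored blob, while b.txt gets a new one.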
Deleting Files in Later Commits
If a file is deleted in a later commit, the new commit's tree simply no longer lists it. Earlier commits still reference the file's blob, so nothing in the history breaks. Objects only become unreferenced when no commit, branch, or tag leads to them anymore, for example after amending a commit, resetting a branch, or deleting an unmerged branch.
Pruning and Cleaning
You can remove unreferenced objects and optimize the repository using the following command:
git gc --prune=now
This removes unreferenced objects and keeps the repository smaller.
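Here is a small sketch of pruning in action. It writes one blob that no commit refers to, runs the garbage collector, and checks which objects survive. The scratch repository, identity, and file contents are placeholders.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

echo "kept" > kept.txt
git add kept.txt
git commit -q -m "Keep this file"

# Write a blob that no commit, branch, or tag refers to.
orphan="$(printf 'never committed\n' | git hash-object -w --stdin)"

before="missing"
if git cat-file -e "$orphan" 2>/dev/null; then before="present"; fi

git gc -q --prune=now

after="gone"
if git cat-file -e "$orphan" 2>/dev/null; then after="present"; fi

# Objects reachable from a commit are untouched.
kept="$(git cat-file -p HEAD:kept.txt)"
echo "orphan before gc: $before, after gc: $after, committed file: $kept"
```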
Configuring Git (Again)
Git provides multiple levels of configuration, allowing you to set configuration options globally, locally, or for specific worktrees. Here’s an overview of how you can manage and interact with Git configuration.
Git Config Commands
- View Local Configuration: To view the local configuration for a specific repository:
git config --list --local
cat .git/config
- Get a Specific Configuration Value: To retrieve a specific configuration value:
git config --get <key>
- Unset a Configuration Key: To remove a specific key from the configuration:
git config --unset <key>
- Unset All Occurrences of a Key: To remove all instances of a configuration key:
git config --unset-all example.key
(Note: Git will only apply the last occurrence of a key in the configuration file, so if you have duplicates, the last one will take precedence.)
- Remove a Section from Configuration: To remove an entire section from the configuration:
git config --remove-section section
- Manual Editing: You can also directly edit the .git/config file (or other configuration files depending on the scope) for an easier approach.
Configuration File Locations
There are several places where Git configuration files can exist, each with different scopes:
- System: /etc/gitconfig – This file configures Git for all users on the system.
- Global: ~/.gitconfig – This file configures Git for all repositories of a user.
- Local: .git/config – This file configures Git for a specific repository.
- Worktree: .git/config.worktree – This file configures Git for a single working tree of a repository (relevant when you use git worktree).
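To see how the scopes stack, the sketch below sets the same key at the global and local level and asks Git which value wins. So that your real ~/.gitconfig stays untouched, it runs Git with HOME pointed at a scratch directory; the helper g and both names are made up for this example.

```shell
set -eu
fake_home="$(mktemp -d)"   # stands in for the user's home directory
repo="$(mktemp -d)"        # throwaway repository
cd "$repo"

# Hypothetical helper: run git with a scratch HOME so --global writes land there.
g() { env -u XDG_CONFIG_HOME HOME="$fake_home" git "$@"; }

g init -q

g config --global user.name "Global Name"   # goes to $fake_home/.gitconfig
from_global="$(g config --get user.name)"

g config --local user.name "Local Name"     # goes to .git/config
effective="$(g config --get user.name)"

# --show-origin prints the file each value was read from.
g config --list --show-origin | grep 'user.name'

echo "global only: $from_global"
echo "with local:  $effective"
```

The more specific scope wins: once the local value exists, it overrides the global one for this repository.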
Git Branches
A branch in Git is essentially a named pointer to a specific commit. When you create a branch, you're creating a new pointer that tracks a particular commit. The commit that the branch points to is called the tip of the branch. Branches are lightweight: they are just pointers and don’t require duplicating the entire project, so creating one is cheap.
Common Commands
- Rename a Branch: You can rename a branch using the following command:
git branch -m oldname newname
- Create a New Branch (without switching): To create a new branch but not switch to it, use:
git branch my_new_feature
- Create and Switch to a New Branch: To create and switch to a new branch in one step, use:
git switch -c my_new_feature
- Switch to an Existing Branch: To simply switch to an existing branch, use:
git switch my_existing_feature
Or, the old way:
git checkout my_existing_feature
Branch Information Storage
Git stores all information about branches in files within the .git subdirectory at the root of your project. The "heads" (or "tips") of branches are specifically stored in the .git/refs/heads/ directory.
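You can peek at one of these pointer files directly. The sketch below creates a branch in a scratch repository and compares the file's contents with what Git reports as the branch tip. It assumes the default file-based ref storage and a freshly created (not yet packed) branch.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

echo "hello" > notes.txt
git add notes.txt
git commit -q -m "First commit"

git branch my_new_feature                  # create the branch without switching

# The branch is a small text file holding the hash of its tip commit.
stored="$(cat .git/refs/heads/my_new_feature)"
tip="$(git rev-parse my_new_feature)"

echo "file contents: $stored"
echo "branch tip:    $tip"
```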
Merging Branches
To merge two branches, Git first finds their best common ancestor commit (the merge base) and then combines the changes each branch has made since that point. For example, to merge my_feature_branch into the branch you are currently on:
git merge my_feature_branch
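If you want to see the merge base for yourself, here is a sketch with two branches that have diverged. It runs in a scratch repository with a placeholder identity, forces the first branch to be called main, and uses made-up file names.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "A"
base="$(git rev-parse HEAD)"

git switch -q -c my_feature_branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "B on feature"

git switch -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "C on main"

# The best common ancestor of the two branches.
merge_base="$(git merge-base main my_feature_branch)"

# Both sides have new commits, so Git creates a merge commit.
git merge -q --no-edit my_feature_branch

# rev-list --parents prints the commit followed by its parents.
words="$(git rev-list --parents -n 1 HEAD | wc -w)"
echo "merge base: $merge_base"
git --no-pager log --oneline --graph
```

The merge base is commit A, and the new commit on main has two parents, one from each branch.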
Amending Commit Messages
If you need to change the message of the last commit, you can use the --amend flag:
git commit --amend -m "Updated commit message"
Fast-Forward Merge
In a fast-forward merge, if the feature branch has all the commits that the base branch has, Git will simply move the pointer of the base branch to the tip of the feature branch. For example:
git merge my_feature_branch
Deleting a Branch
Once you're done with a branch, you can delete it using:
git branch -d my_feature_branch
This will delete the branch locally if it has been merged. If the branch is not merged, use -D to force the deletion.
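The fast-forward and cleanup steps can be tried together in a scratch repository. This sketch uses a placeholder identity and adds --ff-only, which makes Git refuse the merge unless a fast-forward is possible, so the behaviour does not depend on your merge settings.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "A"

git switch -q -c my_feature_branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "B"
feature_tip="$(git rev-parse HEAD)"

# main has no new commits, so merging just moves its pointer forward.
git switch -q main
git merge -q --ff-only my_feature_branch
main_tip="$(git rev-parse main)"
count="$(git rev-list --count main)"       # A and B only, no merge commit

# The branch is fully merged, so -d is enough.
git branch -q -d my_feature_branch
remaining="$(git branch --list my_feature_branch)"

echo "main tip: $main_tip (commits: $count)"
```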
Common Git Workflow for Team Development
- Create a Branch: Start by creating a new branch for the change.
- Make the Change: Work on the change, then commit it once it’s ready.
- Merge the Branch into Main: Once the change is complete, merge the branch back into the main branch.
- Remove the Branch: After merging, delete the branch to keep the repository clean.
- Repeat: Repeat this process for each new change or feature.
Rebase in Git
Rebasing is a way to move or combine a sequence of commits to a new base commit. It helps maintain a cleaner, more linear history compared to merging.
Consider this scenario:
A - B - C      main
 \
  D - E        feature_branch
You're working on feature_branch, and you want to bring in the latest changes from main to avoid working with stale code. You could merge main into feature_branch, but that would introduce a merge commit, which could clutter the history. Instead, rebase re-applies the commits from feature_branch on top of main, creating a linear history.
After running the following rebase command:
git rebase main
The history will look like this:
A - B - C            main
         \
          D' - E'    feature_branch
Notice that the commits D and E have been rewritten as D' and E', because the history of feature_branch is now based on commit C from main. This is why the commit hashes change after a rebase: the base for these commits has shifted.
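The scenario above can be reproduced in a scratch repository. In this sketch the helper make_commit, the identity, and the one-file-per-commit layout are all made up to keep the example free of conflicts.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # name the first branch main
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

# Hypothetical helper: one new file per commit, named after the commit.
make_commit() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

make_commit A
git switch -q -c feature_branch
make_commit D
make_commit E
old_e="$(git rev-parse HEAD)"              # hash of E before the rebase

git switch -q main
make_commit B
make_commit C
main_tip="$(git rev-parse main)"

git switch -q feature_branch
git rebase -q main

new_e="$(git rev-parse HEAD)"              # hash of E' after the rebase
new_base="$(git rev-parse HEAD~2)"         # the commit D' now sits on
subjects="$(git --no-pager log --format=%s | tr '\n' ' ')"

git --no-pager log --oneline --graph
```

After the rebase, E has a new hash, D' sits directly on top of C, and the log reads E, D, C, B, A in one straight line.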
Why Use Rebase?
Cleaner History: Rebase creates a linear history by avoiding unnecessary merge commits. It makes the project history easier to read and understand, especially when reviewing changes.
Avoid Merge Commits: Unlike merge, which can result in many merge commits, rebase keeps the history cleaner and linear.
However, rebasing rewrites commit history, so it should be used with caution.
Important Notes:
Rebase on Feature Branches: Rebasing is safe on your own feature branches. You can rebase as often as needed to keep your branch up to date with the latest changes from the main branch.
Never Rebase Public Branches: Refrain from rebasing shared branches (like main) that other developers are working on. Changing the commit history of a public branch can cause conflicts and confusion for others who have already based their work on that history.
Preserving History with Merge: While rebase results in a cleaner history, merging preserves the true history of the project, showing exactly when and where branches were merged. If preserving the full history is important, use merge. Otherwise, rebase for a more straightforward commit history.
Example of Switching Branches:
If you need to branch off from a specific commit in your history, you can use the following:
git switch -c new_feature <COMMITHASH>
This creates a new branch called new_feature from the commit identified by <COMMITHASH>.
Just be cautious when using rebase on shared branches, as it alters commit history.
Some terms you might want to know
- Index: The staging area where changes are added before committing. Files in this area are ready to be committed but are not yet part of the commit history.
- Worktree: The working directory of your project, where all files exist. This includes both staged and unstaged changes, which may or may not be added to the commit history.
Git Reset: Undoing Changes
The git reset command is used to undo changes in your repository. It moves the current branch pointer to a different commit, and can affect the staging area and working directory.
Types of Reset
- git reset --soft <COMMITHASH>
  - Resets the branch to the specified commit.
  - Keeps your changes staged for a new commit.
- git reset --hard <COMMITHASH>
  - Resets the branch and discards all changes (both staged and unstaged).
  - Caution: Irreversible, all uncommitted changes are lost.
Example:
- Soft Reset (keep changes staged):
git reset --soft HEAD~2
- Hard Reset (discard changes):
git reset --hard abc1234
Warning:
- Hard reset is irreversible. It will permanently discard any uncommitted changes, so use it with caution. Always make sure you've saved or committed your important work before performing a --hard reset.
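To compare the two modes safely, you can run something like the sketch below in a scratch repository. The file name, commit messages, and identity are placeholders.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity, local only
git config user.email "user@example.com"

# Three commits, each overwriting the same file.
for v in v1 v2 v3; do
  echo "$v" > notes.txt
  git add notes.txt
  git commit -q -m "Save $v"
done

# Soft reset: the branch moves back one commit, the v3 change stays staged.
git reset -q --soft HEAD~1
soft_subject="$(git log -1 --format=%s)"
soft_staged="$(git diff --cached --name-only)"
soft_content="$(cat notes.txt)"

# Hard reset: the staged v3 change is thrown away as well.
git reset -q --hard HEAD
hard_content="$(cat notes.txt)"
hard_status="$(git status --porcelain)"

echo "after soft: HEAD='$soft_subject' staged='$soft_staged' file='$soft_content'"
echo "after hard: file='$hard_content'"
```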
.gitignore
The .gitignore file is a hidden file (it starts with a dot) that tells Git which files or directories to ignore in a project. It's used to prevent unnecessary or sensitive files (such as build artifacts, logs, or configuration files) from being tracked in version control.
How It Works:
- Root .gitignore: This is typically located in the root directory of your project and applies to the entire repository.
- Nested .gitignore: You can also have .gitignore files in subdirectories. These will apply only to files within that specific subdirectory, allowing for more granular control.
Example:
- Root .gitignore (ignores *.log files everywhere):
*.log
- Subdirectory .gitignore (in a logs/ directory, ignores *.tmp files there):
*.tmp
Patterns are matched relative to the directory that contains the .gitignore file, so the file inside logs/ lists *.tmp rather than logs/*.tmp. In this case, the *.log files will be ignored everywhere in the repository, but only the *.tmp files within the logs/ directory will be ignored.
Key Points:
- You can have .gitignore files in multiple directories, and each one applies to files within that directory and its subdirectories.
- .gitignore works based on relative paths. You can specify files or folders to ignore, or use wildcards like *.log or **/temp/.
Example Structure:
/project-root
  .gitignore     # ignores *.log everywhere in the repository
  /logs
    .gitignore   # ignores *.tmp in the logs directory
    error.log    # will be ignored because of the root .gitignore
    temp.tmp     # will be ignored because of the subdirectory .gitignore
This flexibility helps manage which files should be tracked and which should remain local to the development environment.
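You can ask Git directly whether a path is ignored, and by which rule, with git check-ignore. The sketch below rebuilds a layout like the one above in a scratch repository, with *.tmp written inside logs/.gitignore; the helper ignored and the extra root-level temp.tmp are additions for the example.

```shell
set -eu
repo="$(mktemp -d)"   # throwaway directory for the experiment
cd "$repo"
git init -q

mkdir logs
echo '*.log' > .gitignore          # root rule: applies everywhere
echo '*.tmp' > logs/.gitignore     # nested rule: applies inside logs/ only
touch logs/error.log logs/temp.tmp temp.tmp

# -v prints the file, line number, and pattern that matched each path.
git check-ignore -v logs/error.log logs/temp.tmp

# Hypothetical helper: report yes/no for a single path.
ignored() { if git check-ignore -q "$1"; then echo yes; else echo no; fi; }

log_in_logs="$(ignored logs/error.log)"
tmp_in_logs="$(ignored logs/temp.tmp)"
tmp_in_root="$(ignored temp.tmp)"

echo "logs/error.log ignored: $log_in_logs"
echo "logs/temp.tmp ignored:  $tmp_in_logs"
echo "temp.tmp ignored:       $tmp_in_root"
```

The root-level temp.tmp is not ignored, because the *.tmp rule lives in logs/ and only reaches files under that directory.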
Always RTFM (Read The Freaking Manual) before asking people with experience. Understanding the fundamentals and referring to the official documentation will not only save you time but also help you grasp concepts more thoroughly.
Revise Git on a weekly basis to keep your skills sharp, and the best way to do this is by integrating Git into the projects you're actively working on. By using it regularly, you'll become more comfortable with advanced features and workflows, which will make version control feel like second nature.
This is part 1 of the series, and part 2 is coming soon! Stay tuned for more insights on mastering Git.