Smitter

Posted on Sep 9, 2022

Understand Git And Ease The Rest Of Your Learning

#beginners #git #webdev #github

Learn Git

This was the best advice I ever got. But to use something effectively, you need to understand it.

That's why this article explains Git so that you understand it, rather than just memorizing commands. We explore how Git differs from other tools by examining the key components of its architecture and some important concepts.

Understand Git, so you may have better days

Git has a reputation for being baffling to newcomers. If you are new to Git, you may just copy and paste commands. Git is magical! And because of its magical voodoo... tadaa!

The rest of your team is now glowering at you! You have probably blown a fuse.

Understanding the fundamentals of how Git works will help you learn Git with ease, pick up Git commands quickly, and use Git effectively.

Git works differently from other Version Control Systems (VCS). So when trying to understand Git, set aside what you know about other Version Control Systems. Otherwise, you will have a limited understanding of Git's capabilities.

What is Git?

OK, hmm...!

Firstly, Git is not GitHub, just like Java is NOT JavaScript.

GitHub is a website that hosts your Git projects.

Git is a Version Control System (a software tool that helps software teams manage changes to source code over time).

Ideally, when you edit a file, Git can help you determine exactly what changed, who changed it, and why. You can also go back to the state (version) of your file before you edited it.

That now sounds like controlling versions of your file, right?

And remember, that's why it is called a Version Control System (VCS).

Git is not the only Version Control System. Others include Apache Subversion (SVN), Concurrent Versions System (CVS), Perforce, and ClearCase.
Git is the most commonly used of them. Today it is the de facto standard tool for version control, and understanding it can give a major boost to your resume.

Git stands out from other VCS with features like cheap local branching, a convenient staging area, and multiple workflows.
Additionally, once you understand how Git works, its commands are astonishingly simple to learn. Git is also fast.

In a nutshell, Git tracks the changes you make to files in your project, so you have a record of what has been done (the history of your project), and you can revert (go back) to specific versions should you ever need to. Git also makes collaboration easier, allowing changes by other people to all be combined back into one source.

How Git works

It is a hidden advantage if you have not worked with other VCS.

If you have worked with other VCS, you will find Git's user interface fairly similar. But Git stores and thinks about information in a very different way.

Understanding how Git works as a Version Control System will demystify the apparent magic of Git commands and of Git as a whole.

Let's go over how Git works, and how it is distinct:

Git uses a Three-tree architecture

Pay attention!

What we discuss here is a key concept in Git, and understanding it will make the rest of your learning much smoother.

First, let's look at the two-tree architecture.

We call them trees because they represent a file/folder structure: the main project directory is at the top, and below it are other files and folders.

File tree

Two-tree architecture

This is what a lot of other VCS, such as Subversion, use. They have a working tree and a repository.

Working Tree or Directory and repository

To populate our working tree with files from the repository, we check out. That's the term we use.

So we check out a version from the repository to populate our working tree with the files we want to view or change. When we finish making our changes, we commit them back to the repository.

Second, and finally, let's look at the three-tree architecture.

Three-tree architecture

Git uses a three-tree architecture.

Git has the working tree and the repository, plus a third tree in between: the staging area.

Working directory, staging area, and Git directory

Also note that files in Git can reside in one of three main states:

Modified/Untracked
Staged
Committed

Git views modified and untracked files similarly.

Modified means that you have changed the file but have not committed it to your database yet.

Untracked means a file in your working directory that was not in your last snapshot (commit) and is not in your staging area.

Staged means that you have marked a modified file in its current version to go into your next commit snapshot.

Committed means that the data is safely stored in your local database.

These states correspond to the three trees of a Git project: the working directory, the staging area, and the Git directory (aka your local repository).

The working directory/tree consists of the files you are currently working on, obtained by a single checkout of one version of your project from the repository (Git directory). These files are pulled out of the compressed database in the Git directory and placed on disk for you to use, view, or modify.

The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. Its technical name in Git parlance is the index, but the phrase staging area works just as well.
When you add a file to the staging area (git add file), Git stores the actual contents of the file as a blob object in the repository's object database (under .git/objects) and updates the index file at .git/index to contain a reference to it.

This design makes committing the state of the index (git commit -m "...") very fast, because all the blob objects are already present. Git need only create the tree and commit objects required for the commit.

The repository (Git directory) is where Git stores the metadata and object database for your project. Files, together with the directories and subdirectories that contain them, are treated as the content (Git does not track directories by themselves, only files at a path location). The repository holds all versions of the content. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.

The basic Git workflow goes something like this:

You modify files in your working tree.
You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.
You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

If a file has changed since it was checked out from the Git directory but has not been staged, it is modified.

If it has been modified and was added to the staging area, it is staged.

If it is in the Git directory, it's considered committed.
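To make the three trees and the file states concrete, here is a minimal toy model in Python. It is only a sketch: the class ToyRepo and its methods are hypothetical names invented for illustration, the object IDs are plain SHA-1 hashes of the text rather than Git's real object format, and each file reports a single state (real Git can show a file as both staged and modified).

```python
import hashlib


def object_id(text):
    # Toy content ID: a SHA-1 of the text (not Git's exact object format).
    return hashlib.sha1(text.encode()).hexdigest()


class ToyRepo:
    """Hypothetical toy model of Git's three trees, for illustration only."""

    def __init__(self):
        self.working_tree = {}  # path -> contents currently "on disk"
        self.index = {}         # staging area: path -> object ID for the next commit
        self.objects = {}       # object database: object ID -> contents
        self.commits = []       # history: each commit is a {path: object ID} snapshot

    def write(self, path, text):
        # Editing a file only touches the working tree.
        self.working_tree[path] = text

    def add(self, path):
        # Like `git add`: store the contents as an object, then reference it in the index.
        text = self.working_tree[path]
        oid = object_id(text)
        self.objects[oid] = text
        self.index[path] = oid

    def commit(self):
        # Like `git commit`: record the index as it is right now.
        self.commits.append(dict(self.index))

    def status(self, path):
        head = self.commits[-1] if self.commits else {}
        staged = self.index.get(path)
        if staged is None:
            return "untracked"
        if object_id(self.working_tree[path]) != staged:
            return "modified"
        if head.get(path) != staged:
            return "staged"
        return "committed"


repo = ToyRepo()
repo.write("notes.txt", "first draft")
print(repo.status("notes.txt"))  # untracked
repo.add("notes.txt")
print(repo.status("notes.txt"))  # staged
repo.commit()
print(repo.status("notes.txt"))  # committed
repo.write("notes.txt", "second draft")
print(repo.status("notes.txt"))  # modified
```

Notice that commit() never looks at the working tree. It only copies what is in the index, which is why staging comes first.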

Git stores information as a series of snapshots.

This is a clear-cut difference between Git and other VCS.
Other VCS, such as Perforce, Concurrent Versions System (CVS), and Subversion (SVN), store the commits (versions) of your project as a list of file-based changes. That is, for each commit they keep a set of files and the diffs made to each file over time (the differences between the previous version of the file and the current one, compared line by line).
This type of versioning is commonly referred to as delta-based version control.
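As a rough illustration of the delta-based idea, the sketch below keeps one base version of a file and then only the instructions needed to turn each version into the next. It uses Python's difflib module. The function names make_delta, apply_delta, and checkout are made up for this example, and real systems use their own, more compact delta formats.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Describe `new` as copy/add instructions relative to `old` (lists of lines)."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))     # reuse lines old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", new[j1:j2]))  # store only the new lines
        # "delete" needs no instruction: those old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older one plus its delta."""
    result = []
    for op in delta:
        if op[0] == "copy":
            result.extend(old[op[1]:op[2]])
        else:
            result.extend(op[1])
    return result


v1 = ["alpha", "beta", "gamma"]
v2 = ["alpha", "beta changed", "gamma", "delta"]
v3 = ["alpha", "gamma", "delta"]

# The "repository" holds one base version plus a chain of deltas.
base = v1
deltas = [make_delta(v1, v2), make_delta(v2, v3)]


def checkout(version):
    # To get version N, start at the base and replay N deltas in order.
    lines = base
    for delta in deltas[:version]:
        lines = apply_delta(lines, delta)
    return lines


print(checkout(2))  # ['alpha', 'gamma', 'delta']
```

The cost of this design shows up in checkout: the further a version is from the base, the more deltas have to be replayed to rebuild it.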

Storing data as changes to a base version of each file

Things work differently with Git.
Every time you commit, Git takes a snapshot of what all your staged files look like at that moment and stores a reference to that snapshot in the Git history.

Note that the term commit is used both as a verb for creating a snapshot and as a noun for the resulting snapshot.

Storing snapshot of your Working Directory as new commit

To be efficient, Git does not store unchanged files again when you commit. It just stores a link to the identical file it has already stored. Over successive commits, Git's data can be thought of as a stream of snapshots.
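Here is a small sketch of why snapshots stay cheap. Each file's contents are stored once under an ID derived from the contents, and a snapshot is just a table of paths pointing at those IDs. The names objects, store, and snapshot are hypothetical, and the IDs are plain SHA-1 hashes of the text, not Git's real object format.

```python
import hashlib

objects = {}  # object ID -> file contents, each distinct content stored once


def store(content):
    # Identical content always gets the same ID, so it is never stored twice.
    oid = hashlib.sha1(content.encode()).hexdigest()
    objects.setdefault(oid, content)
    return oid


def snapshot(files):
    # A snapshot maps every path in the project to the ID of its contents.
    return {path: store(text) for path, text in files.items()}


commit1 = snapshot({"README": "hello", "main.py": "print(1)"})
commit2 = snapshot({"README": "hello", "main.py": "print(2)"})

print(commit1["README"] == commit2["README"])  # True: both point at one stored object
print(len(objects))                            # 3 objects, not 4
```

Both snapshots list every file, yet the unchanged README costs nothing extra in the second commit because it is only a link to what is already stored.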

Storing version as snapshot of your project over time

This is a major difference between how Git handles committing to the repository and how nearly all other VCS do it.

In each commit, other VCS store a set of files and their diffs, while Git stores a snapshot of your whole project.

This makes Git more like a mini filesystem.

Storing data this way (as a series of snapshots) gives Git a smooth branching model. In fact, some people refer to Git’s branching model as its "killer feature", and it certainly sets Git apart in the VCS community.

Most Git operations work offline

With Git, most operations can be done locally without the need to access another computer over the network.
If you are working on a project from a central server for the first time, you need to run git clone to copy the history of the project from the remote to your local machine.

Once that is done, you have the entire history of the project right on your local disk. You can browse your project history locally without any network latency, since Git does not need to go out to the server to get the history and display it for you. This makes most Git operations seem almost instantaneous.
It also means you can make changes and commit them to your local copy, then upload them only when you have a network connection.

This design counters a shortcoming of most other VCS. For instance, in Perforce you cannot do much when you are not connected to the server; in Subversion and CVS, you can edit files but cannot commit changes, since your database is offline.

Git uses checksums as identifiers of the data it stores

Everything in Git is checksummed before it is stored and is then referenced by that checksum. Basically, every file and commit is checksummed and retrieved by its checksum when checked back out.
This means you cannot get anything out of Git other than the exact bits you put in without Git noticing (the data has not been tampered with).
It is also impossible to change any file, date, commit message, or any other data in a Git repository without changing the ID used to identify it. The result would be a different commit.
So if you have a commit ID, you can be assured that your project is exactly the same as when it was committed. This is how Git ensures integrity.
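The sketch below shows the idea with a simplified commit. The function commit_id, the field layout, and the sample author and date are assumptions made for illustration; Git's real commit objects carry more fields and a header, so these IDs will not match real Git output.

```python
import hashlib


def commit_id(tree_id, author, date, message):
    # Simplified stand-in for a commit: the ID is a hash of everything in it.
    body = f"tree {tree_id}\nauthor {author} {date}\n\n{message}\n"
    return hashlib.sha1(body.encode()).hexdigest()


tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # placeholder tree ID

original = commit_id(tree, "Ada <ada@example.com>", "2022-09-09", "Add README")
same = commit_id(tree, "Ada <ada@example.com>", "2022-09-09", "Add README")
tampered = commit_id(tree, "Ada <ada@example.com>", "2022-09-09", "Add README!")

print(original == same)      # True: same data, same ID
print(original == tampered)  # False: one extra character gives a different ID
```

Because the ID is computed from the data, nobody can quietly edit a commit and keep its ID.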

Git uses the SHA-1 hash function to name content. Files, directories, and revisions (commits) are referred to by hash values, unlike in other VCS, where files or versions are referred to by sequential numbers.
The hash value is a 40-character string composed of hexadecimal characters (0-9 and a-f). It is calculated from the contents of a file or directory structure, which means that if the file or directory changes, you get a different checksum.
A SHA-1 hash looks something like: 7b437930d993368a25d0e852d3474d88508af554
Run git log in a terminal opened at your Git project to see SHA-1 hashes used as commit IDs.
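If you are curious how such an ID comes about, here is a short Python sketch of how Git names a file's contents (a blob) under its default SHA-1 setting: it hashes a small header, "blob", the size in bytes, and a zero byte, followed by the contents. The function name git_blob_id is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Git hashes a header ("blob <size>\0") followed by the raw file contents.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
print(git_blob_id(b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

You can compare the result with Git itself: running git hash-object on a file containing the same bytes should print the same ID.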

If you liked the knowledge I am sharing with you, or you would like us to connect, you can follow me on Twitter.
