William Otieno

Posted on Oct 17, 2020

Beginner's guide to Git and Open-Source Contribution

#github #git #versioncontrol #opensource

Introduction

There are many skilled developers around the world, ranging from web developers, Android engineers, DevOps engineers, sys admins and IoT devs to simple script kiddies, but one thing always holds them back: getting third-party assistance. That's where open-source contribution comes in.
There are several definitions of open source. Wikipedia defines it as "denoting software for which the original source code is made freely available and may be redistributed and modified", but I see it more as software of the people, for the people and by the people.

So that software developers and engineers can contribute to projects without being physically present, version control systems (VCS) were developed. It's not only about remote contribution, though: as the name suggests, version control also lets you go back to a particular moment in time in case you made a mistake while developing software. Over the years a number of VCSs have been implemented, but Git became the most popular.

This article focuses on absolute beginners. We will go through the fundamentals only, as Git is a very wide topic. But not to worry, you will have mastered the essentials in just a couple of minutes.

What is Version Control?

When you are working on a simple project, such as a single-page HTML file, it is fairly easy to remember the last thing you changed and where the development is headed. But tracking revisions over time, also referred to as version control, quickly becomes more complex when you are working on a large project with multiple files and multiple developers.

You not only want to record changes, but also who made the changes, and when. Managing revisions at this level requires a version control system.

Version Control Systems (VCS) help a software team manage changes to source code over time.

VCS software includes tools for saving the state of a project, viewing the history of changes, and reverting changes.

Developing software without using version control is risky, similar to not having backups.

VCS can also enhance and speed up development. Depending on the version control software used, many developers can work on the same code at the same time.
For example, one developer on the team may be working on a new feature while another developer fixes an unrelated bug, each developer making their changes in several parts of the code base.

VCS even have tools to detect and help resolve conflicts when one developer's changes are incompatible with changes made at the same time by another developer.

Git vs GitHub

Git has become the most popular VCS since its release in 2005. It was born out of necessity, since a lot of developers needed to contribute to the Linux kernel. Its main advantage is that it is distributed - rather than having one single place for the full version history of a project, every developer's working copy of the code is also a repository that can contain the full history of all changes.

Note, however, that Git is somewhat like an engine, while GitHub is a platform that hosts Git repositories online and lets users make the most of Git.

Hands-on Lab

Getting Git

On *nix systems (Mac/Linux), Git is often already installed. If not, there is a quick installation guide for all platforms here.
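A quick way to check is to ask Git for its version. This is just a small sketch; the fallback message is my own wording, not something Git prints.

```shell
# Print the installed Git version, or a note if the git command is not found.
if command -v git >/dev/null 2>&1; then
  git --version
else
  echo "git is not installed yet"
fi
```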

Initializing a Git repository

I'll try to explain how this works in the simplest form. You have a local repository (storage) and a remote repository. The local repo lives on your computer and the remote repo lives online on a particular platform (maybe GitHub or GitLab). You will mostly write code on your local machine, then push those changes to the remote repo, where other members can see what you did and pull those changes from the remote repo into their local repos. It's just like uploading and downloading in a way.

Let's just jump right in for this to make sense.
We will first create a directory (folder), then make it a local Git repository so that Git can track every change happening in it.

Launch your command-line interface (the terminal for *nix users and the Git command line for Windows users).
To create a folder, simply run mkdir foldername
Let's name it devstuff
For now it is just a folder with that name. To make it a Git repository (like Git storage), navigate into the folder using the cd command and execute git init (initialize Git repo).
So in my terminal...
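The steps above can be sketched roughly as follows. The throwaway directory from mktemp is my own assumption so the sketch doesn't touch anything else on disk; in practice you would run this wherever you keep your projects.

```shell
# Work in a throwaway directory (an assumption of this sketch).
cd "$(mktemp -d)"

mkdir devstuff   # create the folder
cd devstuff      # move into it
git init         # turn it into a Git repository
ls -a            # list everything, including the hidden .git directory
```

The git init command prints the path of the .git directory it just created.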

As you can see, we entered the directory, which at first was just a directory. After initializing Git, it becomes a Git repo. What git init does is create a folder named .git inside it, which is hidden by default because its name starts with a dot, as seen in the generated path above.
We now have our local Git repo. Let's make a remote repo on GitHub, then connect it to our local repo.
You should first create an account on GitHub and sign in.
Navigate to the top right corner, click on the + button and select New Repository

For ease, we shall give it the same name as our local repo, devstuff, add a short description and create the repo.

You should encounter a new page with this...

Let's break down what we can see then...

We already did most of the essential stuff on our own, so the only command we need to run is git remote add origin https://github.com/WilliamOtieno/devstuff.git
That simply links our local repo to our remote one. In short, we are telling our local repo about the remote repo and giving it the name origin. That information is stored in the hidden .git directory.
In our terminal...
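Roughly, it looks like this. The scratch repo on the first line is only there so the sketch runs on its own, and git remote -v is an extra step I'm adding to list the configured remotes; replace the URL with that of your own repository.

```shell
# Scratch repo so this sketch is self-contained (an assumption, not part of the walkthrough).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff

# Register the remote repo under the name "origin".
git remote add origin https://github.com/WilliamOtieno/devstuff.git

# List the remotes Git now knows about, with their URLs.
git remote -v
```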

Let's add a file so that we can track the changes. You can use your file manager to do so or in the terminal execute touch dev.py
It will be some simple Python code with a simple print statement.

Running a simple listing ls command shows that our file is present.
We can take it a notch higher by running git status, which shows which files Git considers new, changed or staged.
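Here is a small sketch of those steps. The exact print statement is my own placeholder, and the first line just sets up a scratch repo so the sketch runs on its own.

```shell
# Scratch repo so this sketch is self-contained (an assumption).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff

touch dev.py                            # create the empty file
echo 'print("Hello, Git!")' > dev.py    # placeholder content, not the article's exact code
ls                                      # dev.py is listed
git status                              # dev.py appears under "Untracked files"
```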

Git Tracking

When working with files, Git follows a particular workflow to get things done. For our purposes, a file can be in one of 3 states:

Untracked
Staged
Committed

To find out the current state of each file, we use the git status command as seen earlier.

Untracked means that Git is not yet tracking the file at all. Staged means that the file's current changes have been marked for inclusion in the next commit, whereas committed means that a snapshot of the file's state has been saved in the history together with a short message (the commit message) describing what happened.
Let's add 2 other files so that we can really understand what's going on. For that, use your file manager or simply use the touch [filename] command. I'll add a README.md file and a requirements.txt file.
Now, to move a file from the untracked area to the staging area, we use the git add [filename] command. In your CLI...
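A minimal sketch of staging just one file, assuming a fresh scratch repo (the first line) so it runs on its own:

```shell
# Scratch repo so this sketch is self-contained (an assumption).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff

touch dev.py README.md requirements.txt   # three files, all untracked at first
git add dev.py                            # stage only the Python file
git status                                # dev.py is staged, the other two stay untracked
```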

As we can see, we staged our Python file but the other ones remained untracked. After staging, we should commit our change. We do this by running git commit -m "message". In our case...
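Something along these lines. The commit message and the placeholder name and email are my own assumptions; the identity is set only for this scratch repo because Git needs one before it can commit.

```shell
# Scratch repo with a placeholder identity (assumptions so the sketch runs on its own).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff
git config user.name "Your Name"
git config user.email "you@example.com"

touch dev.py README.md requirements.txt
git add dev.py                 # stage the Python file
git commit -m "Add dev.py"     # snapshot whatever is staged
git log --oneline              # the new commit shows up in the history
git status                     # README.md and requirements.txt are still untracked
```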

Note, however, that a commit only includes files in the staging area. The other files remain untracked.
Most times you will be working with dozens of files, so to add all untracked files in the local Git repo to the staging area, we use git add .
Take note of the position of the dot.
This adds everything in the current working directory, including its subdirectories. A similar command is git add * but there the shell expands the * itself, so files whose names start with a dot are typically skipped. Let's try git add . in our command line and check the status...
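A quick sketch of staging everything at once, again in a scratch repo that I'm assuming just for illustration:

```shell
# Scratch repo so this sketch is self-contained (an assumption).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff

touch dev.py README.md requirements.txt
git add .      # stage everything under the current directory
git status     # all three files are now listed as staged
```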

Let's try writing something to our README.md file, then check the status. You can use your favorite text editor or run this one-line command in bash...
echo "### I love Git." >> README.md
The echo command would normally print the text to the screen, but >> redirects the output and appends it to the README.md file instead.
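To see how Git reacts, here is a sketch that commits an empty README.md first and then edits it. The scratch repo, placeholder identity and commit message are all assumptions for illustration.

```shell
# Scratch repo with a placeholder identity (assumptions so the sketch runs on its own).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff
git config user.name "Your Name"
git config user.email "you@example.com"

touch README.md
git add README.md
git commit -q -m "Add README"          # README.md is now tracked

echo "### I love Git." >> README.md    # append a line to the tracked file
git status                             # README.md shows up as modified
```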

As you can see, the file was already being tracked, so Git automatically detected that it was modified. To record the change, stage the file again with git add and then commit it with git commit.
Now consider a situation where you have staged a file and simply want to unstage it so that Git stops tracking it. For that scenario, use the git rm --cached [file] command, which removes the file from Git's index but leaves it on disk. In our case, let's unstage the requirements.txt file and check the status.
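A minimal sketch of unstaging, assuming a scratch repo where requirements.txt has been staged but never committed:

```shell
# Scratch repo so this sketch is self-contained (an assumption).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff

touch dev.py requirements.txt
git add .                            # stage both files
git rm --cached requirements.txt     # drop it from the index, keep it on disk
git status                           # requirements.txt is untracked again
```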

Now our requirements.txt file is untracked.
But constantly being told that a file is untracked when we want it that way is rather annoying, so let's tell Git not to track some files at all. For that we use a .gitignore file. It is hidden by default and doesn't have a file extension. So let's create and use it. In your CLI, just use the touch .gitignore command. Or in your file manager, create a file and name it .gitignore, without any file extension, and don't forget the leading dot.
After that, open the file and write the name of the file(s) you want Git to ignore. For multiple files, each file name should be on its own line.
Since .gitignore is a new file, stage it, commit the change and then run the status check again.
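Here is roughly how that could look. The scratch repo, placeholder identity and commit message are assumptions for illustration.

```shell
# Scratch repo with a placeholder identity (assumptions so the sketch runs on its own).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff
git config user.name "Your Name"
git config user.email "you@example.com"

touch requirements.txt
echo "requirements.txt" > .gitignore   # one ignored file name per line
git add .gitignore
git commit -m "Add .gitignore"
git status                             # requirements.txt is no longer reported
```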

Pushing to the remote repo

At this point, we are ready to push our changes to our remote repo on GitHub. But before that, I'm sure you have questions about the output from our last command: "On branch master..."
That will be covered in the next article, but basically Git organizes repositories in branches so that one can avoid messing up the production or main branch. By default, the main branch has traditionally been called master and that's why it's there. On newer setups it may be called main instead, in which case use that name wherever master appears below. Not to worry, it's nothing complicated.
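If you are unsure what your branch is called, you can ask Git directly. This is only a sketch in a scratch repo (my assumption); what it prints depends on your Git version and configuration.

```shell
# Scratch repo so this sketch is self-contained (an assumption).
cd "$(mktemp -d)" && git init -q devstuff && cd devstuff

# Print the name of the branch HEAD points to; this works even before the first commit.
git symbolic-ref --short HEAD
```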
So let's push everything we've done so far to our remote repo. But first, we shall verify that we indeed have nothing there. Go to GitHub and search for the repo (we named it devstuff).

As you can see, there is no code, only a guide to get started. Before pushing, we should also set up our identity in the Git command line. Git stamps every commit with this name and email, so ideally this is done before your first commit. You should use the same name and email address as the ones in your GitHub account.
Do so by running...
git config --global user.name "YourName"
git config --global user.email "youremail@yourdomain.com"
Now let's push the code. It will be a simple one-line command...

git push -u origin master

You'll be prompted for your username and password. Nothing is shown in the password field while you type, for security reasons, so don't be alarmed. Note that GitHub has since stopped accepting account passwords for Git operations over HTTPS, so you will need to enter a personal access token in place of the password.
The -u flag (short for --set-upstream) tells Git to remember origin/master as the upstream of our local master branch, so next time a plain git push is enough. Also, origin is the name we gave our remote repo earlier (it is the conventional name), so in essence we are pushing to the remote called origin and into its branch called master.
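To try the push without touching GitHub, here is a sketch that uses a local bare repository as a stand-in for the remote. The stand-in, the placeholder identity, the file content and the commit message are all my assumptions; with a real GitHub repo only the last two commands matter.

```shell
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"        # local stand-in for the GitHub repo (assumption)
git init -q "$work/devstuff" && cd "$work/devstuff"
git symbolic-ref HEAD refs/heads/master      # make sure the branch is named master
git config user.name "Your Name"             # placeholder identity for this repo only
git config user.email "you@example.com"

echo 'print("Hello, Git!")' > dev.py         # placeholder file content
git add dev.py
git commit -q -m "Add dev.py"

git remote add origin "$work/remote.git"     # would be the GitHub URL in practice
git push -u origin master                    # push and remember origin/master as upstream
git status -sb                               # first line shows master tracking origin/master
```

After this, a plain git push from the same branch goes to origin/master without extra arguments.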

Refresh your browser tab and voila! You'll see that the code is now present. The remote repo will contain the commit messages and commit times as well.

You could also click on the individual files so as to view the contents.

Conclusion

Git is not as complicated as people perceive it to be. This is only the 1st part of a 2-part series, but the remaining chunk is not as lengthy. Actually, this is enough to get one started with open-source contribution. For any queries, just follow me and send a message on my GitHub here.
Thank you for your undivided time and attention.

Top comments (9)

Akin C. • Aug 14 '21 • Edited

Hello William Otieno,

thanks for your article.
It's written in great detail. I especially liked the topics "Introduction", "What is Version Control?" And "Git vs Github".
Also, I believe your article will make Git easier for beginners :).