TL;DR
This post provides an in-depth guide to version control in software development and collaborative projects.
You can jump ahead to installing and configuring a version control system (VCS) or to connecting with an online VCS repository service.
Introduction
In a previous post, we were able to successfully set up a Windows virtual machine using VirtualBox. This left us ready to begin working on data science projects.
But data science, like most tech fields, boils down to code and files, so we must consider a number of questions about them, including:
- How do we store them?
- How do we track changes made?
- How can we access them across different machines?
- How can we share them with collaborators?
This is where version control comes to save the day.
Version Control / Source Control
Version control is defined as "the practice of tracking and managing changes to software code" [1]. Essentially, it is a mechanism for keeping track of changes made to computer files, particularly source code files, with the ability to view old versions and, optionally, revert a file to a preferred previous version.
It has been around in some form for as long as humans have worked with computers. In early attempts, collaborators shared files and projects manually through pen drives, emails and shared folders. Most of these approaches were clunky, with no built-in way of detecting who changed which files and folders, or when. This led teams to invent strange conventions, e.g. folders named project, project1, project-latest etc. Accidental overwrites were also common: users often lost their work when others working on the same files and folders posted to the common storage area after them.
Ultimately, all these problems led to the rise of Version Control Systems (VCS). These are software tools used mostly in, but not limited to, software development and collaborative projects to automatically track and manage changes to source code files.
Benefits of such systems include:
- Automated functionality to record and track every update to the code base.
- Enhanced collaboration without fear of accidental permanent overwrites.
- Easy reversion to previous versions of specific files and folders (or the project as a whole).
- Ingrained, structured history which simplifies auditing of the project's progression.
A number of popular VCS tools are used by individuals, teams and enterprises globally, including Mercurial, Subversion, Fossil and CVS. The most popular one, however, is Git, which is both the focus of this article and the VCS in use at LuxDevHQ.
Git
Git is a lightning-fast, free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency [2].
It was originally developed by Linus Torvalds, the creator of Linux, as version control for the development of the Linux kernel. It has since grown to be used by over 93% of developers worldwide. It can be installed on most conventional operating systems and also runs inside container technologies such as Docker.
Installation
To install Git on your machine, head over to the Install page on the Git website for guidance on your specific OS.
For Windows Users
Download the latest version of Git for Windows. It includes Git Bash, a command line interface (CLI) in which you can run Git commands.
Once the download completes, double-click the file to start the installation. Click Yes when prompted for User Account Control (UAC) permission. This will open the Git Setup Wizard.
Most of the default configurations should suffice, so you can click "Next" throughout the process. (Should you encounter a configuration that you wish to change, feel free to change it.) Once done, the installation will run for a few seconds. Upon completion, check the Launch Git Bash option, uncheck View Release Notes, then click Finish. A Git Bash window should then open.
For macOS Users
The easiest way to install Git on macOS is using Homebrew, a tool that simplifies installation of software tools on macOS and Linux. If you don't yet have Homebrew on your computer, download and install it first.
Once Homebrew is ready, run brew install git in Terminal (the macOS CLI). This will install Git and have it ready for use.
For Linux Users
A number of Linux distributions come with Git already installed. If yours doesn't (run git --version in the Terminal to confirm), follow the installation instructions for your particular distribution.
Setup
Before continuing, confirm that Git is properly installed and functional by running
git --version
If the output is a line starting with git version followed by a version number, then you are ready to begin setting up Git.
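If you would rather get a clear message than a "command not found" error, the check can be wrapped in a small conditional. This is only a minimal sketch; the variable name git_status and the wording of the message are our own choices, not part of Git.

```shell
# Check whether the git executable is on the PATH before running it.
if command -v git >/dev/null 2>&1; then
    git_status="installed"
    # Prints the installed version.
    git --version
else
    git_status="missing"
    echo "Git was not found; install it using the steps above."
fi
```

Either branch leaves git_status set, so later commands in the same session can check it before calling Git.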
First off, there needs to be a global user configured to use Git (this can be overridden within individual repositories). To do this, run the following commands:
- User's name
  git config --global user.name "<Your Name>"
- User's email address
  git config --global user.email "<your.email@emaildomain>"
NB: Replace <Your Name> and <your.email@emaildomain> with your actual name and email address, respectively.
Once done, run git config --list to confirm the configurations have been applied correctly.
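To illustrate the point about overriding the global user within an individual repository, here is a small sketch that runs in a throwaway folder. The name "Project Alias" and the address alias@example.com are placeholder values chosen purely for illustration.

```shell
# Work in a throwaway folder so no real project is affected.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# Without --global, the setting is stored in this repository only.
# (Placeholder identity, for illustration.)
git config user.name "Project Alias"
git config user.email "alias@example.com"

# Inside this repository, the local value takes precedence over the global one.
local_name="$(git config --get user.name)"
echo "$local_name"
```

Running git config --global user.name afterwards should still show the global value, since only the repository-level setting was changed.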
Repository Setup
Now, let's cross over to creating and configuring a Git repository. A repository is a central location in which data is stored and managed. Within the context of a software project, it can be thought of as the root folder containing the source files and code.
First, we shall create our project folder.
- Create a folder within the CLI
  mkdir -p test-project
  NB: -p instructs the CLI to create any intermediate parent folders that may be missing.
- Navigate into the new folder
  cd test-project
- Create a sample README
  touch README.md
- Populate the README with an informative message
  echo "This is a test project showing how to set up a Git repository" >> README.md
  NB: You can confirm that the content has been written into the README file using the following command
  cat README.md
- Being a data science project, let's create a sample Python script that simply prints a greeting to the CLI
  touch test.py && echo "print('Hello, World')" >> test.py
  NB: You can combine multiple commands using the && operator.
- Let's now list the contents of the current folder (i.e. test-project) to confirm that all our files are in place
  ls .
  If you see both of our newly created files (i.e. README.md and test.py), then you are good to continue.
  NB: . in the command ls . represents our current working folder.
NB: You can find a cheat sheet of common Linux commands that can run in the CLI here.
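One detail about the && operator used in the steps above is worth spelling out: the command on the right runs only if the command on the left succeeds. The sketch below demonstrates this with the standard true and false commands; the variable names are ours, chosen for illustration.

```shell
# The command after && runs only if the one before it succeeds (exit status 0).
true && first_result="ran"

# false always fails, so the assignment after && is skipped.
second_result="skipped"
false && second_result="ran"

echo "after true:  $first_result"
echo "after false: $second_result"
```

This is why touch test.py && echo ... >> test.py is safe to chain: if creating the file fails, nothing is written.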
Now, let's turn our folder into a Git repository.
- Initialize the repository
  git init
- Stage the created files in preparation for committing
  git add .
- Commit the new files with a short descriptive message
  git commit -m "Initial Commit"
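To confirm that the commit was actually recorded, we can ask Git for the state of the working folder and for its history. The sketch below recreates the example in a temporary folder so that it runs on its own. The "Demo User" identity passed with -c is a placeholder for illustration, and is unnecessary if you have already configured a global user.

```shell
# Recreate the example in a throwaway folder so the check is self-contained.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
echo "This is a test project showing how to set up a Git repository" > README.md
echo "print('Hello, World')" > test.py

git init --quiet
git add .
# -c supplies a placeholder identity for this one command only.
git -c user.name="Demo User" -c user.email="demo@example.com" \
    commit --quiet -m "Initial Commit"

# An empty result here means there is nothing left to stage or commit.
pending_changes="$(git status --porcelain)"

# Count the commits recorded so far, then list them one per line.
commit_count="$(git rev-list --count HEAD)"
git log --oneline
```

In your own test-project folder, running git status and git log --oneline directly is enough; the rest of the sketch only exists to set up a clean example.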
The repository is now ready to be connected to an online Git repository service.
GitHub
Git, while being a distributed VCS, is largely a tool running on the user's local machine. As such, particularly in collaborative projects, there needs to be a mechanism through which a Git repository can be accessed and used remotely by different team members.
Fortunately, there exist a number of services that provide such functionality. They are typically web-based platforms that host Git repositories, providing a centralized location for storing code, tracking changes, and enabling collaboration among developers. Furthermore, these platforms offer additional functionality including:
- Continuous Integration and Continuous Deployment (CI/CD) automation
- Access Control and Security on repositories
- Collaboration Tools such as issue tracking, pull requests etc.
- Integrations with third-party apps and services among others.
Most of them let you create an account and use the core features for free, although certain features typically incur charges. Examples include GitHub, GitLab and Bitbucket.
Of these services, the most widely used is GitHub. It was released in February 2008 and acquired by Microsoft in October 2018 [3]. It has over a billion published repositories and is what will be used for both this article and the entirety of the course.
Account Creation
To get started with GitHub, one needs to have a registered account, which can be achieved by following the steps below:
- Visit GitHub's website and click the Sign up button in the top right corner. This will redirect you to the Account Creation page.
- Fill in the form with your details to proceed. Choose a distinctive username that you can remember. Alternatively, you can create an account using Google or Apple for a simpler but still secure sign-up.
1. Atlassian. (2026, Jan 16). What is version control: Atlassian Git Tutorial. https://www.atlassian.com/git/tutorials/what-is-version-control
2. Git. (2026, Jan 18). Git. https://git-scm.com
3. Git and GitHub Use, Collaboration, and Workflow. (2026, Jan 18). History of GitHub. https://pslmodels.github.io/Git-Tutorial/content/background/GitHubHistory.html




