ondieki frank

Git & GitHub: A Beginner’s Guide to Version Control for Data Professionals

Whether you're a data engineer building pipelines or a data scientist working on models, keeping track of your code changes is crucial. That's where Git and GitHub come in. In this guide, I'll walk you through setting up Git and mastering the basics of version control.

1. Installing Git Bash

Windows Users:
1. Visit the Git download page

2. Download the Windows installer

3. Run the installer with these recommended settings:

  • Select "Use Git from Git Bash only"
  • Choose "Checkout Windows-style, commit Unix-style line endings"
  • Use MinTTY as the terminal emulator
  • Enable file system caching

Mac Users:

brew install git

Linux Users:

sudo apt-get install git  # Debian/Ubuntu
sudo yum install git      # CentOS/Fedora

Verify installation by opening Git Bash/Terminal and typing:

git --version
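If you would rather script this check (for example, at the top of a project setup script), here is a minimal sketch. The function name check_git is a placeholder for illustration, not part of Git.

```shell
# Hypothetical helper: report whether git is available on PATH.
check_git() {
  if command -v git >/dev/null 2>&1; then
    # git is installed, so print the same version string as above
    echo "Found: $(git --version)"
  else
    echo "git not found on PATH; revisit the install steps above" >&2
    return 1
  fi
}

# "|| true" keeps a surrounding script running even if git is missing
check_git || true
```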

2. Connecting Git to Your GitHub Account

Step 1: Configure Your Identity

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
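To confirm that both values were stored, you can read them back. This is a small read-only sketch; show_identity is a placeholder name, and the "<not set>" text is just a hint printed by this script, not Git output.

```shell
# Hypothetical helper: print the globally configured identity.
show_identity() {
  for key in user.name user.email; do
    # --get prints the stored value and exits non-zero when the key is unset
    value="$(git config --global --get "$key" 2>/dev/null || true)"
    echo "$key = ${value:-<not set>}"
  done
}

show_identity
```

If either line shows "<not set>", rerun the matching git config command from Step 1.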

Step 2: Generate SSH Key (Secure Connection)

ssh-keygen -t rsa -b 4096 -C "your.email@example.com"

Press Enter to accept the default file location, then create a passphrase.
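Once the prompts finish, it can be worth confirming that the key landed where expected. The sketch below assumes the default RSA location (~/.ssh/id_rsa.pub); show_key_fingerprint is a placeholder name, not an SSH command.

```shell
# Hypothetical helper: print the fingerprint of a public key if it exists.
show_key_fingerprint() {
  pub="${1:-$HOME/.ssh/id_rsa.pub}"   # default path assumed; pass another as $1
  if [ -f "$pub" ]; then
    # -l lists the fingerprint, -f names the key file to read
    ssh-keygen -lf "$pub"
  else
    echo "No public key at $pub; run ssh-keygen first" >&2
    return 1
  fi
}

show_key_fingerprint || true
```

The fingerprint is a short summary of the key, so it is safe to share, unlike the private key file itself.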

Step 3: Add SSH Key to GitHub

  1. View your public key:
cat ~/.ssh/id_rsa.pub
  2. Copy the entire output

  3. Go to GitHub → Settings → SSH and GPG keys → New SSH key

  4. Paste your key and save

Step 4: Test Connection

ssh -T git@github.com

You should see: "Hi username! You've successfully authenticated..."
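With authentication working, repositories are addressed over SSH using URLs of the form git@github.com:owner/repo.git. Here is a small sketch that builds one; github_ssh_url is a placeholder helper, and "your-username" and "data-pipeline" are made-up names.

```shell
# Hypothetical helper: build the SSH URL for a GitHub repository.
github_ssh_url() {
  # $1 = account or organization name, $2 = repository name
  printf 'git@github.com:%s/%s.git\n' "$1" "$2"
}

# Print (not run) the clone command for a placeholder repository
echo "git clone $(github_ssh_url your-username data-pipeline)"
```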

3. Understanding Version Control: The What & Why

What is Version Control?

Think of it as a time machine for your code. Every change is saved, so you can:

  • Track who made what changes

  • Revert to previous versions

  • Work on features without breaking your main code

  • Collaborate without overwriting others' work

4. Why Data Professionals Need Git

  • Reproducibility: Track exactly which version of code produced which results

  • Collaboration: Multiple team members can work on the same project

  • Experimentation: Try new approaches without fear of breaking working code

  • Documentation: Commit messages explain why changes were made
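To make these benefits concrete, here is a rough sketch of experimenting on a branch while the working version stays intact. It runs in a throwaway directory; the file name model.py, the branch name experiment, and the demo identity are all placeholders chosen for illustration.

```shell
# Work in a throwaway directory so no real project is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Identity for this demo repository only (placeholder values).
git config user.name "Demo User"
git config user.email "demo@example.com"

# First version of a hypothetical analysis script.
echo 'print("baseline model")' > model.py
git add model.py
git commit -q -m "Add baseline model script"

# Remember the starting branch (main or master, depending on your settings).
main_branch="$(git symbolic-ref --short HEAD)"

# Experimentation: try a new approach on its own branch.
git checkout -q -b experiment
echo 'print("experimental model")' > model.py
git commit -q -am "Try experimental model"

# Switch back: the baseline version is untouched.
git checkout -q "$main_branch"
cat model.py

# Reproducibility: record which commit produced a given result.
echo "Results produced by commit $(git rev-parse --short HEAD)"

# Documentation: the commit messages tell the story of both branches.
git log --oneline --all
```

After switching back, cat model.py prints the baseline line, while the experimental change stays available on the experiment branch.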
