You spent three hours on a Python script.
It worked. Everything was perfect.
Then you tried to add one feature. Something broke. You tried to undo it. You could not remember exactly what you changed. You deleted too much. Now nothing works and the working version is gone.
Every programmer has lived this exact moment. Usually more than once.
Git exists so this never happens again.
What Git Actually Is
Git is a version control system. Every time you tell Git to save your work, it takes a snapshot of your entire project at that moment. You can go back to any snapshot at any time.
Not just one undo. Infinite undos, all the way back to the beginning of the project.
But Git is more than undo. It lets multiple people work on the same project without overwriting each other's work. It lets you experiment in a separate branch without touching the main code. It shows you exactly what changed between any two points in time.
Nearly every professional developer uses Git. Every AI project on GitHub uses Git. Most jobs that involve code expect you to know it. Treat it as a required skill, not an optional one.
Installing Git
Check if it is already installed:
git --version
If not:
# Windows: download from git-scm.com
# Mac:
brew install git
# Ubuntu/Debian:
sudo apt install git
Configure your identity. Git tags every commit with your name and email:
git config --global user.name "Your Name"
git config --global user.email "you@email.com"
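git config --global core.editor "code --wait" # use VS Code as editor
git config --global init.defaultBranch main # name the first branch "main" (Git 2.28+)
Run these once. They apply to every Git repository on your machine. The last line matters because older Git versions name the first branch master, and the rest of this post assumes main.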
The Mental Model: Three Areas
Before commands make sense, understand where your files live in Git's world.
Working Directory: your actual files. Where you write and edit code. Git knows about changes here but has not saved them yet.
Staging Area: a holding area. You choose which changes to include in the next save. Think of it as packing a box before sealing it.
Repository: the saved history. Every commit lives here permanently. This is what Git actually tracks.
The flow is always: edit files → stage the changes you want → commit to save them permanently.
Working Directory → git add → Staging Area → git commit → Repository
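The flow above can be sketched as a toy model in Python. This is only an illustration of the three areas, not how Git stores data internally; the ToyRepo class and the notes.txt file are hypothetical names made up for this example.

```python
class ToyRepo:
    """Toy model of Git's three areas (illustration only)."""

    def __init__(self):
        self.working = {}   # working directory: filename -> current content
        self.staging = {}   # staging area: what the next commit will contain
        self.commits = []   # repository: list of (message, snapshot) pairs

    def add(self, filename):
        # Copy the file's current content into the staging area.
        self.staging[filename] = self.working[filename]

    def commit(self, message):
        # Save a snapshot of the staging area, not of the working directory.
        self.commits.append((message, dict(self.staging)))


repo = ToyRepo()
repo.working["main.py"] = "print('Hello, AI')"
repo.working["notes.txt"] = "scratch"          # never staged
repo.add("main.py")
repo.commit("Initial commit")
repo.working["main.py"] = "print('Hello again')"  # edited after the commit

print(repo.commits[-1][1])  # {'main.py': "print('Hello, AI')"}
```

The snapshot contains only what was staged at commit time. The unstaged notes.txt and the later edit to main.py are not part of it.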
Starting a Project
mkdir ai_project
cd ai_project
git init
Output:
Initialized empty Git repository in /home/user/ai_project/.git/
git init creates a hidden .git folder inside your project. That folder is the entire repository. Git history, configuration, everything. Never delete or manually edit it.
Check the current state:
git status
Output:
On branch main
No commits yet
nothing to commit (create/copy files and use "git add" to track)
Empty project. Nothing staged. Nothing committed. Now create a file.
echo "# My AI Project" > README.md
echo "print('Hello, AI')" > main.py
git status
Output:
On branch main
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
README.md
main.py
nothing added to commit but untracked files present (use "git add" to track)
Git sees both files but they are untracked. It will not save them until you tell it to.
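If you ever want to check this state from a script, git status --porcelain prints a stable, machine-readable form: two status characters, a space, then the path. Here is a minimal Python sketch that groups such lines by state. The parse_porcelain function is a hypothetical helper, the sample text is hand-written rather than captured from a real run, and renames (which print as "old -> new") are not handled.

```python
def parse_porcelain(text):
    """Group paths from `git status --porcelain` (v1) output by state."""
    result = {"untracked": [], "staged": [], "unstaged": []}
    for line in text.splitlines():
        if not line:
            continue
        # Column 1 is the staging-area state, column 2 the working-directory state.
        index_state, worktree_state, path = line[0], line[1], line[3:]
        if index_state == "?" and worktree_state == "?":
            result["untracked"].append(path)
            continue
        if index_state != " ":
            result["staged"].append(path)
        if worktree_state != " ":
            result["unstaged"].append(path)
    return result


# Hand-written sample matching the two untracked files above.
sample = "?? README.md\n?? main.py\n"
print(parse_porcelain(sample))
# {'untracked': ['README.md', 'main.py'], 'staged': [], 'unstaged': []}
```

In a real project you could feed it the output of subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True).stdout.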
Staging and Committing
Add specific files to staging:
git add README.md
git add main.py
Or add everything at once:
git add .
Check what is staged:
git status
Output:
On branch main
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: README.md
new file: main.py
Both files are staged. Now commit them:
git commit -m "Initial commit: add README and main script"
Output:
[main (root-commit) a3f9d21] Initial commit: add README and main script
2 files changed, 2 insertions(+)
create mode 100644 README.md
create mode 100644 main.py
The commit is saved. a3f9d21 is the abbreviated commit hash, a unique identifier for this exact snapshot. git log shows the full 40-character version.
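These identifiers are not random. Git computes them by hashing content. A commit's hash covers its snapshot, parent, author, and message, so it is awkward to reproduce by hand. The simplest case is a single file, which Git stores as a "blob". This sketch assumes the default SHA-1 object format; git_blob_id is a hypothetical helper name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the ID Git assigns to a file's content (SHA-1 object format)."""
    # Git hashes a short header ("blob <size>" plus a NUL byte) followed by the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


blob_id = git_blob_id(b"print('Hello, AI')\n")
print(blob_id)      # full 40-character ID
print(blob_id[:7])  # the abbreviated form Git usually displays
```

Because the ID depends only on the content, identical files always get the same ID, and any change to the content produces a different one.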
The commit message matters. Write what changed and why, not just what file you touched. "fix bug" is useless. "fix KeyError when loading CSV with missing headers" is useful.
Viewing History
git log
Output:
commit a3f9d21c4e8b1f2a3d5e6f7890ab12cd34ef5678
Author: Your Name <you@email.com>
Date: Mon Mar 04 14:23:11 2024
Initial commit: add README and main script
Compact version:
git log --oneline
Output:
a3f9d21 Initial commit: add README and main script
See what changed in a commit:
git show a3f9d21
See changes in your working directory that are not staged yet (use git diff HEAD to compare against the last commit instead):
git diff
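git diff prints changes in unified diff format: lines starting with + were added, lines starting with - were removed, and unmarked lines are context. Python's standard difflib module produces the same style of output, which makes the format easy to study without a repository. The two versions of main.py below are hypothetical.

```python
import difflib

# Hypothetical before/after versions of main.py, as lists of lines.
old = ["print('Hello, AI')\n"]
new = ["from data_loader import load_data\n", "print('Hello, AI')\n"]

diff = list(difflib.unified_diff(old, new, fromfile="a/main.py", tofile="b/main.py"))
print("".join(diff), end="")
```

The output marks the new import line with a leading + and shows the unchanged print line as context, the same way git diff would.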
Making More Commits
Add some code:
# data_loader.py
import pandas as pd

def load_data(filepath):
    df = pd.read_csv(filepath)
    print(f"Loaded {len(df)} rows")
    return df
git add data_loader.py
git commit -m "Add data loader function with row count logging"
Change something in main.py:
git add main.py
git commit -m "Import and use data loader in main script"
View the full history:
git log --oneline
Output:
c7e91b3 Import and use data loader in main script
d4f82a1 Add data loader function with row count logging
a3f9d21 Initial commit: add README and main script
Three commits. Three save points. You can go back to any of them.
Going Back in Time
See what the project looked like at any commit:
git checkout a3f9d21
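Your files change to that earlier state. Git will report that you are in a "detached HEAD" state, which is normal when you are only looking at an old commit. To come back to the current version: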
git checkout main
Undo changes to a file before staging:
git restore main.py
This throws away all unstaged changes to main.py and restores the staged version, which is the last committed version if you have not staged anything. The discarded edits cannot be recovered. Use carefully.
Unstage a file you accidentally staged:
git restore --staged main.py
.gitignore: Files Git Should Never Track
Some files should never go into Git. Large data files. Environment variables. API keys. Generated outputs. Python cache files.
Create a .gitignore file in your project root:
# Python
__pycache__/
*.pyc
*.pyo
.env
venv/
.venv/
# Data files
*.csv
*.xlsx
data/raw/
*.db
# Jupyter
.ipynb_checkpoints/
# OS
.DS_Store
Thumbs.db
# IDE
.vscode/
.idea/
# Model files (often large)
*.pkl
*.h5
models/
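Once this file exists, Git ignores untracked files matching these patterns. They will not show up in git status and git add . will not stage them. A file that is already tracked keeps being tracked even if it matches a pattern; stop tracking it with git rm --cached <file>.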
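To get a feel for how these patterns match, here is a simplified Python approximation. It only covers the two pattern shapes used above: plain names such as *.csv, and directory patterns ending in a slash. Real .gitignore rules also support negation with !, the ** wildcard, and more, none of which this sketch handles. The is_ignored function and the sample paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Simplified check: would one of these patterns ignore this path?"""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            name = pattern.rstrip("/")
            if "/" in name:
                # e.g. "data/raw/": anchored at the project root.
                if path.startswith(name + "/"):
                    return True
            elif any(fnmatchcase(part, name) for part in parts[:-1]):
                # e.g. "__pycache__/": a directory with this name at any depth.
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # e.g. "*.csv" or ".env": a matching name at any depth.
            return True
    return False


patterns = ["__pycache__/", "*.pyc", ".env", "*.csv", "data/raw/", "models/"]
for path in ["main.py", "data/sales.csv", "data/raw/notes.txt", ".env"]:
    print(path, is_ignored(path, patterns))
```

Only main.py comes back as not ignored. For a real repository, git check-ignore -v <path> tells you which rule, if any, matches a path.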
Add and commit the .gitignore file itself:
git add .gitignore
git commit -m "Add gitignore for Python, data files, and editor artifacts"
Branches: Experiment Without Breaking Things
A branch is a parallel version of your project. You create one, make changes, and those changes only exist in that branch until you merge them back.
Create and switch to a new branch:
git branch feature-preprocessing
git checkout feature-preprocessing
Or use the shortcut that does both in one step, instead of the two commands above:
git checkout -b feature-preprocessing
Now you are on the feature branch. Any commits you make here do not affect the main branch. Build your feature, break things, experiment freely.
git add preprocessing.py
git commit -m "Add data preprocessing pipeline with outlier removal"
Switch back to main and check that your new file is not there:
git checkout main
ls
The preprocessing.py file is gone from view. It exists on the feature branch. Main is untouched.
Merge the feature branch into main when you are happy with it:
git merge feature-preprocessing
Output:
Updating c7e91b3..f3a84e2
Fast-forward
preprocessing.py | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
The feature is now on main. The feature branch can be deleted:
git branch -d feature-preprocessing
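The "Fast-forward" in that merge output is worth understanding. A branch is just a name pointing at a commit, and when main has no new commits of its own, merging only moves the main pointer forward. Here is a toy Python sketch of that idea, reusing the commit IDs from the sample output above. It assumes a simple linear history where each commit has one parent; real Git commits can have several.

```python
# Toy history: commit ID -> parent ID (None for the first commit).
commits = {
    "a3f9d21": None,
    "d4f82a1": "a3f9d21",
    "c7e91b3": "d4f82a1",
    "f3a84e2": "c7e91b3",  # the commit made on the feature branch
}
# A branch is only a name that points at one commit.
branches = {"main": "c7e91b3", "feature-preprocessing": "f3a84e2"}


def is_ancestor(ancestor, descendant):
    """Walk parent links from descendant and see whether we reach ancestor."""
    current = descendant
    while current is not None:
        if current == ancestor:
            return True
        current = commits[current]
    return False


def merge_fast_forward(target, source):
    """Move the target branch forward if it is strictly behind the source."""
    if not is_ancestor(branches[target], branches[source]):
        raise ValueError("histories diverged; a real merge commit would be needed")
    branches[target] = branches[source]


merge_fast_forward("main", "feature-preprocessing")
print(branches["main"])  # f3a84e2
```

If main had gained its own commits in the meantime, the pointer could not simply move, and Git would create a merge commit that joins the two histories instead.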
The Commands You Use Every Day
git status # what is changed
git add . # stage everything
git add filename.py # stage one file
git commit -m "clear message" # save staged changes
git log --oneline # view history
git diff # see unstaged changes
git diff --staged # see staged changes
git checkout -b branch-name # create and switch to branch
git checkout main # switch to main
git merge branch-name # merge branch into current
git restore filename.py # discard unstaged changes
git restore --staged filename.py # unstage a file
git stash # temporarily save uncommitted work
git stash pop # restore stashed work
git stash is underrated. When you are in the middle of something and need to switch branches quickly, stash saves your work in progress without committing it. git stash pop brings it back.
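The stash behaves like a stack: each git stash pushes your uncommitted changes on top, and git stash pop takes the most recent entry back off. Here is a toy Python model of that behavior. The variable names and file contents are hypothetical, and the real stash records more than this (for example, which changes were staged).

```python
stash_stack = []                                     # most recent entry is last
working_changes = {"model.py": "half-finished edit"}  # uncommitted work


def stash():
    # Set the uncommitted changes aside and leave a clean working directory.
    stash_stack.append(dict(working_changes))
    working_changes.clear()


def stash_pop():
    # Bring the most recently stashed changes back and remove them from the stack.
    working_changes.update(stash_stack.pop())


stash()
print(working_changes)  # {}  -> clean, safe to switch branches
stash_pop()
print(working_changes)  # {'model.py': 'half-finished edit'}
```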
A Common Workflow
This is what a typical working day looks like with Git.
git status # see what changed since yesterday
git add analysis.py data_loader.py # stage the files you worked on
git commit -m "Add feature scaling and handle NaN in loader"
git checkout -b experiment-new-model # try something risky in isolation
# ... work for a few hours ...
git add model.py
git commit -m "WIP: trying LGBM instead of XGBoost"
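# something breaks, want to go back to clean state
git stash # set aside any uncommitted edits (skip if there are none)
git checkout main # jump back to stable code
git stash drop # throw away the stashed edits, if you stashed any
git branch -D experiment-new-model # delete the experiment branch and its commits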
Commit often. Small commits are easier to understand and easier to undo. A commit with twenty file changes and the message "stuff" is useless. Ten commits with clear messages is professional.
A Resource Worth Reading
Scott Chacon and Ben Straub wrote Pro Git, a complete book on Git available free at git-scm.com/book. Chapters 1 through 3 cover everything in this post plus branching and merging in depth. Chapter 7 covers the advanced tools most professionals do not know. The book is the canonical Git reference and it is completely free. Worth bookmarking and reading chapter by chapter over a few weeks.
Atlassian has a tutorial series called "Learn Git with Bitbucket Cloud" at atlassian.com/git/tutorials that is more visual and beginner-friendly than the official docs. Widely recommended as a starting point. Search "Atlassian Git tutorials."
Try This
Create a new folder called ml_practice. Initialize a Git repository inside it.
Create these files: README.md with a project description, data_loader.py with a function that reads a CSV, utils.py with a helper function, .gitignore that ignores .csv files, __pycache__, and .env.
Make three separate commits with clear messages. Each commit should add or change something meaningful, not just touch a file.
Create a branch called feature-eda. On that branch, add an eda.py file with at least three Pandas operations. Commit it with a clear message.
Switch back to main. Verify eda.py is not there. Merge the feature branch into main. Verify it is there now.
Run git log --oneline and show four commits minimum. The history should tell the story of what you built.
What's Next
Your code is version controlled. Next is GitHub, where that version history becomes visible to the world, shareable with collaborators, and the foundation of your public portfolio. Every project you build from this point forward goes on GitHub.