Alan West

Posted on Apr 9

How to Stop Feeling Lost in Unfamiliar Codebases Using Git

#git #productivity #programming #beginners

You just cloned a repo. Maybe you joined a new team, maybe you're reviewing a PR from an open-source contributor, or maybe you're debugging something in a service you haven't touched in six months. The instinct is to open the project in your editor and start reading files.

Don't do that yet.

I used to dive straight into src/ and try to build a mental map by reading code top-down. It's slow, it's overwhelming, and you miss the story of how the code got to its current state. These days, I run a handful of git commands first, and it saves me a ridiculous amount of time.

The Problem: Code Without Context Is Just Text

Reading code without understanding its history is like walking into a movie halfway through. You can see what's on screen, but you don't know why anyone is doing what they're doing. Why is there a weird adapter pattern in the database layer? Why are there three different HTTP clients? Why does this function have seventeen parameters?

The answers are almost always in the git history. The code is just the latest frame — git gives you the whole film.

Step 1: See Who Actually Works Here

# Show the most active contributors, sorted by commit count
git shortlog -sn --no-merges

This tells you who the major contributors are. If one person has 80% of the commits, that's your go-to person for questions. If commits are spread evenly across twenty people, you're dealing with a different kind of project — probably more process, more conventions, more docs.

I also like to scope this to recent history:

# Who's been active in the last 6 months?
git shortlog -sn --no-merges --since="6 months ago"

This is more useful than the all-time leaderboard. The person who wrote 60% of the code three years ago might have left the company. You want to know who's actively maintaining things now.
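
To put a number on that "80% of the commits" check, here is a minimal sketch that prints each author's share of recent commits. The function name contributor_share is made up for illustration, and it assumes awk is installed. The explicit HEAD is deliberate: git shortlog reads from standard input when no revision is given and stdin is not a terminal, which can make it look like it hangs or prints nothing inside a script.

```shell
# Hypothetical helper: recent authors with their share of non-merge commits.
# Usage: contributor_share [since]     (default window: 6 months)
contributor_share() {
  since="${1:-6 months ago}"
  # shortlog -sn prints "<count><TAB><author>", most active first
  git --no-pager shortlog -sn --no-merges --since="$since" HEAD |
    awk -F'\t' '
      { count[NR] = $1 + 0; name[NR] = $2; total += $1 }
      END {
        for (i = 1; i <= NR; i++)
          printf "%5.1f%%  %s\n", 100 * count[i] / total, name[i]
      }'
}
```

Calling contributor_share "1 year ago" widens the window; with no argument it uses the same six months as the command above.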

Step 2: Understand What's Changing (and What's Stable)

# Show the last 20 commits, one line each, with dates
git log --date=short --format="%h %ad %s" -20

This gives you the recent narrative. You'll quickly see patterns: are they shipping features? Fixing bugs? Refactoring? If the last fifteen commits are all bug fixes in the payments module, you know where the pain is.

But here's the command I reach for most often:

# Which files have changed the most in the last 3 months?
# (grep drops the blank separator lines, which would otherwise top the list)
git log --since="3 months ago" --name-only --pretty=format: | grep -v '^$' | sort | uniq -c | sort -rn | head -20

This is gold. The files that change the most frequently usually fall into one of three groups:

The core of the application (important to understand first)
Poorly designed code that keeps needing fixes (important to understand for different reasons)
Configuration or generated files (safe to ignore for now)

Either way, you now know where to focus your reading.
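
Since configuration and generated files tend to crowd the top of that list, it can help to filter them out up front. The sketch below wraps the counting pipeline in a function (hot_files is a hypothetical name) and passes any extra arguments through as git pathspecs, so exclusions can be written with git's `:(exclude)` pathspec syntax. The file names in the example call are placeholders, not paths from any real project.

```shell
# Hypothetical helper: most-changed files in a time window, minus noise.
# Usage: hot_files [since] [pathspec...]
hot_files() {
  since="${1:-3 months ago}"
  [ "$#" -gt 0 ] && shift
  # "." keeps everything under the current directory; "$@" adds exclusions.
  # grep -v '^$' removes the blank lines --pretty=format: leaves per commit.
  git log --since="$since" --name-only --pretty=format: -- . "$@" |
    grep -v '^$' | sort | uniq -c | sort -rn | head -20
}

# Example call (placeholder names):
# hot_files "3 months ago" ':(exclude)package-lock.json' ':(exclude)dist'
```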

Step 3: Find the Architecture in the Commit History

# Look for big structural changes — commits that touched many files
git log --oneline --shortstat | head -60

When you see a commit that changed 47 files with 3,000 insertions, that's usually a major refactor, a migration, or a new feature being added. Read that commit message carefully. These big-bang commits often explain architectural decisions better than any documentation.

You can dig into a specific one:

# See exactly what a specific commit changed
git show <commit-hash> --stat
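
Scanning --shortstat output by eye works for recent history, but it doesn't sort anything. If you'd rather have the largest commits ranked, here is a rough sketch that pairs each commit with its "files changed" count and sorts on it. The name biggest_commits and the "@" marker used to tell subject lines from stat lines are choices made for this example; it assumes awk is available.

```shell
# Hypothetical helper: the ten commits that touched the most files.
# Extra arguments are passed to git log, e.g. biggest_commits --since="1 year ago"
biggest_commits() {
  git log --no-merges --pretty=format:'@%h %s' --shortstat "$@" |
    awk '
      # Subject lines carry our "@" marker; remember the latest one.
      /^@/ { commit = substr($0, 2); next }
      # Stat lines look like " 47 files changed, 3000 insertions(+), ..."
      / files? changed/ { print $1 "\t" commit }' |
    sort -rn | head -10
}
```

Each output line is the file count, a tab, then the short hash and subject, so the hash can go straight into git show.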

Step 4: Understand a Specific File's Story

Once you've identified the important files from Step 2, pick one and read its history:

# Full history of a single file, with diffs
git log -p --follow -- path/to/important/file.ts

The --follow flag is crucial — it tracks the file even if it was renamed. Without it, you'll think the file was created six weeks ago when it was actually just moved from somewhere else.

For a quicker overview without the full diffs:

# Just the commit messages for a specific file
git log --oneline --follow -- path/to/important/file.ts

This is how I figure out why code looks the way it does. "Oh, this weird null check was added in a hotfix for issue #437." Suddenly the code makes sense.
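
When the file has a long history and you only care about one odd line, git's -S option (sometimes called the pickaxe) can narrow the search: it lists commits where the number of occurrences of a string changed. This goes a step beyond the commands above, so treat it as an optional sketch. The function name when_changed and the search string in the usage line are invented for illustration.

```shell
# Hypothetical helper: commits that added or removed a given string in a file.
# Usage: when_changed 'user == null' path/to/important/file.ts
when_changed() {
  needle="$1"
  file="$2"
  # --reverse puts the oldest match first, which is usually the commit
  # that introduced the line.
  git --no-pager log --oneline --reverse -S"$needle" -- "$file"
}
```

The first hash in the output is a good candidate for git show.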

Step 5: Find the Experts for Each Area

# Who has touched this file the most?
git log --format="%an" -- path/to/file.ts | sort | uniq -c | sort -rn

This is basically a per-file version of Step 1. When you inevitably have questions about a specific module, this tells you exactly who to ask. It's way more useful than guessing based on team structure or org charts.
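
The same "recent beats all-time" reasoning from Step 1 applies here. A small sketch, assuming you want to reuse the pipeline for whole directories and optionally limit it to a time window: path_experts is a hypothetical name, and any extra arguments are handed to git log unchanged.

```shell
# Hypothetical helper: authors ranked by commits touching a file or directory.
# Usage: path_experts <path> [git log options...]
path_experts() {
  if [ "$#" -lt 1 ]; then
    echo "usage: path_experts <path> [git log options...]" >&2
    return 1
  fi
  path="$1"
  shift
  git log --no-merges --format='%an' "$@" -- "$path" |
    sort | uniq -c | sort -rn
}

# Example call (placeholder path):
# path_experts src/payments --since="6 months ago"
```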

Step 6: Check for Recent Pain Points

# Find commits that mention "fix", "bug", or "revert" (case-insensitive; multiple --grep patterns are OR'd)
git log --oneline --all -i --grep="fix" --grep="bug" --grep="revert" --since="2 months ago"

This surfaces the parts of the codebase that have been causing trouble. If you're joining a team and want to make a good first impression, understanding where the bugs cluster is genuinely valuable context. You'll ask better questions in your first code review.

You can also look for reverts specifically:

# Reverted commits often tell a cautionary tale
git log --oneline --all --grep="revert" -i

Every revert has a story. Usually it's "we shipped this, it broke production, we rolled it back." Those stories teach you where the landmines are.
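
To see where those fixes actually land, the message search can be combined with the file-counting pipeline from Step 2. This is a sketch under a loose assumption: that commit messages in the repo mention "fix", "bug", or "revert" when they repair something. Plain substring matching will also catch words like "prefix", so read the result as a hint. The function name bug_hotspots is made up for this example.

```shell
# Hypothetical helper: files most often touched by fix-like commits.
# Usage: bug_hotspots [since]     (default window: 2 months)
bug_hotspots() {
  since="${1:-2 months ago}"
  # Several --grep patterns match commits containing any one of them.
  git log --all -i --grep='fix' --grep='bug' --grep='revert' \
      --since="$since" --name-only --pretty=format: |
    grep -v '^$' | sort | uniq -c | sort -rn | head -20
}
```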

Putting It All Together

My full workflow when I land in a new codebase takes about five minutes:

git shortlog -sn --no-merges --since="6 months ago" — who's active
git log --oneline -20 — what's the recent narrative
The frequency command from Step 2 — where's the action
git log --oneline --shortstat | head -40 — find the big structural commits
Pick the 2-3 most-changed files and read their git log --oneline --follow

After those five minutes, I have a mental map that would have taken me an hour of code reading to build. I know who works on what, what's changing, what's stable, and where the problems are.
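
If you find yourself typing these every time, the five steps can be bundled into one shell function. What follows is only a sketch: repo_tour is a hypothetical name, the section headings are arbitrary, and it picks the three most-changed files automatically where you might prefer to choose by hand. HEAD is passed to git shortlog explicitly because that command otherwise reads from standard input when it isn't attached to a terminal.

```shell
# Hypothetical helper: run the five-step tour in the current repository.
repo_tour() {
  echo "== Active contributors (last 6 months) =="
  git --no-pager shortlog -sn --no-merges --since="6 months ago" HEAD

  echo "== Recent commits =="
  git --no-pager log --oneline -20

  echo "== Most-changed files (last 3 months) =="
  git log --since="3 months ago" --name-only --pretty=format: |
    grep -v '^$' | sort | uniq -c | sort -rn | head -20

  echo "== Recent commits with change sizes =="
  git log --oneline --shortstat | head -40

  # Step 5: history of the three most-changed files
  git log --since="3 months ago" --name-only --pretty=format: |
    grep -v '^$' | sort | uniq -c | sort -rn | head -3 |
    while read -r _count file; do
      echo "== History of $file =="
      git --no-pager log --oneline --follow -- "$file"
    done
}
```

Dropping the function into your shell profile and running repo_tour from the root of a fresh clone prints everything in one pass.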

Why This Actually Matters

Here's the thing — reading code is a skill, but reading code efficiently is a different skill. The developers I've worked with who ramp up fastest on new codebases aren't necessarily the ones who read code the fastest. They're the ones who know which code to read first.

Git history is the cheat code for that. It turns a flat directory of files into a narrative with characters, plot points, and drama. And honestly, some of the best drama I've seen has been in commit messages.

Next time you clone a repo, resist the urge to immediately open your editor. Spend five minutes in the terminal first. Your future self will thank you.

Quick Reference

Active contributors: git shortlog -sn --no-merges
Recent activity: git log --oneline -20
Hot files: git log --name-only --pretty=format: | grep -v '^$' | sort | uniq -c | sort -rn | head -20
File history: git log --oneline --follow -- path/to/file
File experts: git log --format="%an" -- path/to/file | sort | uniq -c | sort -rn
Bug clusters: git log --oneline --grep="fix" --since="2 months ago"

DEV Community