Most developers use Git daily without knowing what it actually is under the hood. They memorize commands — commit, push, merge, rebase — and pray nothing breaks.
Here's the thing: Git is not a version control system. Not really. It's a directed acyclic graph (DAG) with a version-control interface bolted on top. Once you see the graph, every confusing Git behavior suddenly makes sense.
Commits Are Snapshots, Not Diffs
This trips up almost everyone. When you run git commit, Git doesn't save "what changed." It saves a snapshot of every tracked file as it stands in the staging area (the index). Every single commit is a complete picture of your project at that moment.
"But that would waste tons of space!" Not really. Git uses content-addressable storage: every object is named by a hash of its contents (SHA-1 by default). If a file didn't change between commits, the new snapshot just points to the same blob. No duplication. The object store is essentially a hash map where identical content always maps to the same key. (On disk, Git also packs objects into packfiles that delta-compress similar content, but that's a storage optimization; the model you work with is still snapshots.)
SVN and older systems store diffs. Git stores snapshots. That difference, combined with branches being nothing more than pointers, is why branching in Git is nearly instant, while a branch in SVN is a copy of a directory in the repository tree.
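Here's a minimal Python sketch of snapshots plus content addressing, assuming the default SHA-1 object format. blob_id follows Git's rule for naming a blob: hash a "blob <size>" header, a NUL byte, then the raw bytes. Everything else is a hypothetical toy. A snapshot here is a flat path-to-blob-ID dictionary and the store is an in-memory dict, whereas real Git uses nested trees and an on-disk object database. The point is that identical content is stored once, and "what changed" is derived by comparing two snapshots.

```python
import hashlib

Snapshot = dict[str, str]  # path -> blob ID


def blob_id(content: bytes) -> str:
    """Return the ID Git gives a blob with these bytes (default SHA-1 format)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def take_snapshot(files: dict[str, bytes], store: dict[str, bytes]) -> Snapshot:
    """Record every file; identical bytes map to the same key, so they're stored once."""
    snap = {}
    for path, data in files.items():
        oid = blob_id(data)
        store.setdefault(oid, data)
        snap[path] = oid
    return snap


def diff_snapshots(old: Snapshot, new: Snapshot) -> dict[str, list[str]]:
    """Derive "what changed" on demand by comparing two complete snapshots."""
    return {
        "added": sorted(p for p in new if p not in old),
        "deleted": sorted(p for p in old if p not in new),
        # Equal IDs mean equal bytes, so unchanged files need no content comparison.
        "modified": sorted(p for p in new if p in old and old[p] != new[p]),
    }


store: dict[str, bytes] = {}
c1 = take_snapshot({"README.md": b"# demo\n", "app.py": b"print('hi')\n"}, store)
c2 = take_snapshot({"README.md": b"# demo\n", "app.py": b"print('bye')\n",
                    "test_app.py": b"assert True\n"}, store)
print(len(store))  # 4: README.md is stored once even though both snapshots list it
print(diff_snapshots(c1, c2))
# {'added': ['test_app.py'], 'deleted': [], 'modified': ['app.py']}
print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The last line prints Git's well-known empty-blob ID. That makes it a quick check that the header format is right.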
The DAG: Four Object Types
Git's internals come down to four objects:
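- Blobs: raw file contents, no filename, no metadata, just bytes
- Trees: directory listings that map filenames (and file modes) to blobs or to other trees
- Commits: a pointer to a tree, plus metadata (author, committer, timestamps, message) and zero or more parent commit references (none for the very first commit, two or more for a merge)
- Tags: an annotated tag is an object that points to another object, usually a commit, and adds a tagger, a message, and an optional signature (a lightweight tag isn't an object at all, just a name for a commit)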
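That's it. The entire commit history forms a DAG because each commit points backward to its parent(s), and cycles are structurally impossible. A commit's hash is computed over its contents, including its parents' hashes, so a commit can only reference commits that already exist. To close a cycle, it would have to reference a hash that can't exist until after the commit itself is written.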
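Here's a Python sketch of how those objects are serialized and named, assuming SHA-1 and a flat tree of regular files only (no subdirectories, every mode 100644). Every object's ID is the SHA-1 of a "<type> <size>" header, a NUL byte, and the object's body. A commit's body contains its parents' IDs as text, which is why a commit can only point at commits that already exist. The author line, file name, and messages are made up for illustration.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Name an object the way Git does: SHA-1 of '<type> <size>\\0' + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries: dict[str, str]) -> bytes:
    """Serialize a flat directory of regular files (name -> blob ID).

    Simplification: mode 100644 only, no subtrees; entries sorted by name.
    Each entry is '<mode> <name>\\0' followed by the 20 raw bytes of the ID.
    """
    out = b""
    for name in sorted(entries):
        out += f"100644 {name}".encode() + b"\0" + bytes.fromhex(entries[name])
    return out


def commit_body(tree: str, parents: list[str], message: str,
                who: str = "A U Thor <author@example.com> 1700000000 +0000") -> bytes:
    """Serialize a commit: tree, zero or more parents, author/committer, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()


blob = hash_object("blob", b"hello\n")
tree = hash_object("tree", tree_body({"hello.txt": blob}))
root = hash_object("commit", commit_body(tree, [], "first commit"))
# The child's bytes embed root's ID, so root had to exist first:
child = hash_object("commit", commit_body(tree, [root], "second commit"))
print(hash_object("tree", b""))  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

That last ID is Git's well-known empty-tree ID. If root's content changed in any way, its ID would change too, and child would no longer point at it. The chain only ever links backward.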
Branches Are Just Sticky Notes
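A branch in Git is a 41-byte file in .git/refs/heads/ containing a commit hash: 40 hex characters plus a newline. That's literally all it is. Git may later fold many of these into a single .git/packed-refs file, but a branch is still just a name and a hash. When you create a new branch, Git writes one of these files. When you switch branches, Git points HEAD at the new branch and updates your working directory to match that branch's commit.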
This is why git branch feature is effectively instant. There's no copying, no cloning, no heavy operation. You're creating a Post-it note that says "feature is at commit abc123."
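To see how little is behind a branch, here's a Python sketch that resolves HEAD by reading those files directly. It assumes the classic files-based ref storage, meaning loose files under refs/ plus an optional packed-refs file. It ignores newer ref backends such as reftable, and it ignores symbolic refs other than HEAD. The demo builds a throwaway directory with made-up commit IDs instead of touching a real repository.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def read_ref(git_dir: Path, refname: str) -> Optional[str]:
    """Return the commit ID a ref points at, or None if it doesn't exist.

    Loose refs are tiny files; packed refs are '<id> <refname>' lines in packed-refs.
    """
    loose = git_dir / refname
    if loose.is_file():
        return loose.read_text().strip()
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if not line or line.startswith(("#", "^")):
                continue  # header comment or peeled-tag line
            oid, _, name = line.partition(" ")
            if name == refname:
                return oid
    return None


def resolve_head(git_dir: Path) -> Tuple[Optional[str], Optional[str]]:
    """Return (branch ref, or None if detached; commit ID that HEAD resolves to)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        refname = head[len("ref: "):]
        return refname, read_ref(git_dir, refname)
    return None, head  # detached HEAD: the file holds a commit ID directly


# Demo on a throwaway directory that mimics .git (made-up commit IDs).
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    heads = git_dir / "refs" / "heads"
    heads.mkdir(parents=True)
    oid = "a" * 40
    (heads / "feature").write_bytes((oid + "\n").encode())
    (git_dir / "HEAD").write_bytes(b"ref: refs/heads/feature\n")
    ref_size = (heads / "feature").stat().st_size
    on_branch = resolve_head(git_dir)
    (git_dir / "HEAD").write_bytes((oid + "\n").encode())  # detach HEAD
    detached = resolve_head(git_dir)
    (git_dir / "packed-refs").write_bytes(
        b"# pack-refs with: peeled fully-peeled sorted\n" + b"b" * 40 + b" refs/heads/old\n"
    )
    packed_oid = read_ref(git_dir, "refs/heads/old")

print(ref_size)  # 41
print(on_branch == ("refs/heads/feature", oid), detached == (None, oid))  # True True
```

If you point it at a real repository with resolve_head(Path(".git")), it should report the current branch and its commit. When HEAD is detached, the branch name comes back as None.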
Merge? You create a new node whose two parent pointers connect the tips of both branches. If your branch is simply behind the other one, Git fast-forwards instead and just moves your sticky note. Rebase? You're replaying commits to create new nodes with different parents, then moving your sticky note to the new chain.
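The same sticky-note model fits in a few dozen lines of Python. This is a toy with loud assumptions. Commit IDs are a hypothetical hash of parent IDs plus message, while real Git also hashes the tree, author, and timestamps. The rebase method also assumes the branch has linear history that shares an ancestor with the target. Here, merge either does nothing, fast-forwards the pointer, or creates a two-parent node. Rebase replays each commit as a new node and then moves the pointer.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy model: commits are graph nodes, branches are movable name -> ID pointers."""
    parents: dict[str, tuple[str, ...]] = field(default_factory=dict)
    messages: dict[str, str] = field(default_factory=dict)
    branches: dict[str, str] = field(default_factory=dict)

    def commit(self, parents: tuple[str, ...], message: str) -> str:
        # Hypothetical ID: hash of parent IDs + message (real Git hashes much more).
        oid = hashlib.sha1(repr((parents, message)).encode()).hexdigest()[:7]
        self.parents[oid] = parents
        self.messages[oid] = message
        return oid

    def ancestors(self, oid: str) -> set[str]:
        """Every commit reachable from oid by following parent pointers (inclusive)."""
        seen: set[str] = set()
        stack = [oid]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return seen

    def merge(self, into: str, other: str) -> None:
        ours, theirs = self.branches[into], self.branches[other]
        if theirs in self.ancestors(ours):
            return  # already up to date
        if ours in self.ancestors(theirs):
            self.branches[into] = theirs  # fast-forward: just move the sticky note
            return
        self.branches[into] = self.commit((ours, theirs), f"Merge {other} into {into}")

    def rebase(self, branch: str, onto: str) -> None:
        base = self.ancestors(self.branches[onto])
        todo = []  # commits only on `branch`, newest first (linear history assumed)
        c = self.branches[branch]
        while c not in base:
            todo.append(c)
            c = self.parents[c][0]
        tip = self.branches[onto]
        for old in reversed(todo):
            tip = self.commit((tip,), self.messages[old])  # new parent -> new ID
        self.branches[branch] = tip


repo = Repo()
root = repo.commit((), "init")
repo.branches["main"] = repo.commit((root,), "main work")
repo.branches["feature"] = repo.commit((root,), "feature work")

# Merge: a new node with two parents joins the two tips.
repo.branches["merged"] = repo.branches["main"]
repo.merge("merged", "feature")
print(len(repo.parents[repo.branches["merged"]]))  # 2

# Rebase: "feature work" is replayed on top of main as a brand-new node.
main_tip = repo.branches["main"]
old_tip = repo.branches["feature"]
repo.rebase("feature", onto="main")
new_tip = repo.branches["feature"]
print(old_tip != new_tip, repo.parents[new_tip] == (main_tip,))  # True True

# Fast-forward: main is now an ancestor of feature, so merging just moves main.
repo.merge("main", "feature")
print(repo.branches["main"] == new_tip)  # True
```

The old "feature work" node is still in repo.parents after the rebase. Nothing was deleted, but no branch points at it anymore.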
Why This Mental Model Matters
Once you think in graphs instead of "versions," hard Git problems become visual:
Detached HEAD: your HEAD pointer isn't attached to any branch sticky note. You're looking directly at a commit node. Any new commits you make here belong to no branch, so once you switch away they become unreachable unless you create a branch that points at them.
Merge conflicts: both branches changed the same part of the same file since their common ancestor (the merge base), so Git can't combine the two versions automatically. Git needs you to decide which content the merged tree should contain.
Rebase vs merge: merge preserves the graph structure (two parents converge). Rebase rewrites it into a straight line (new commits, different parent pointers, same changes).
Cherry-pick: copy a commit's diff and apply it as a new node on your current branch. Different hash, same changes.
git reflog: a log of everywhere HEAD has pointed. Even "deleted" commits stay in the object store for a while. By default, reflog entries for commits no longer reachable from any branch expire after 30 days, and nothing in Git is truly gone until garbage collection prunes it.
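Detached HEAD and the reflog both come down to reachability. This Python sketch uses a made-up three-commit history. It walks parent pointers from every branch tip, and any commit it never visits is orphaned. Real garbage collection also treats unexpired reflog entries and other refs (tags, remotes) as roots, so this only models the core rule.

```python
def reachable(parents: dict[str, list[str]], tips: list[str]) -> set[str]:
    """Walk parent pointers from every tip; anything never visited is unreachable."""
    seen: set[str] = set()
    stack = list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


# Made-up history: A <- B is main; C was committed on a detached HEAD sitting at B.
parents = {"A": [], "B": ["A"], "C": ["B"]}
branches = {"main": "B"}
reflog = ["A", "B", "C", "B"]  # HEAD visited C, then went back to main

orphaned = set(parents) - reachable(parents, list(branches.values()))
print(orphaned)  # {'C'}
# C is unreachable from every branch, yet the reflog still remembers it.
print(orphaned <= set(reflog))  # True
```

To rescue an orphan like C in a real repository, you'd find its hash in git reflog and run git branch rescue <hash>. That puts a sticky note on it again.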
The Practical Payoff
I used to memorize Git commands like magic spells. Now I draw the graph in my head before running anything:
- Where is HEAD?
- Where do I want it?
- What nodes need to exist to get there?
When something goes wrong, git reflog shows me everywhere HEAD has recently been, and I just move pointers around. Most "I broke Git" situations are solved by understanding that you didn't break anything; you just moved a pointer somewhere unexpected.
If you want to see your repo's DAG yourself, try git log --oneline --graph --all. That tree-like output is the data structure itself, drawn as text. Everything else in Git is just manipulating it.
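If you'd rather poke at the DAG as data than as a drawing, git rev-list --all --parents prints one commit per line, followed by its parent IDs. This Python sketch parses that output into a commit-to-parents map and picks out root commits and merge commits. The sample uses shortened, made-up IDs. rev_list_output is a hypothetical helper that assumes git is on your PATH, and the demo never calls it.

```python
import subprocess


def rev_list_output(repo_dir: str = ".") -> str:
    """Run git in repo_dir and return the raw rev-list text (needs git on PATH)."""
    result = subprocess.run(
        ["git", "rev-list", "--all", "--parents"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_rev_list(output: str) -> dict[str, list[str]]:
    """Turn '<commit> <parent> <parent>...' lines into a commit -> parents map."""
    graph: dict[str, list[str]] = {}
    for line in output.splitlines():
        if line.strip():
            commit, *parents = line.split()
            graph[commit] = parents
    return graph


def summarize(graph: dict[str, list[str]]) -> dict[str, list[str]]:
    """Roots have no parents; merge commits have two or more."""
    return {
        "roots": sorted(c for c, ps in graph.items() if not ps),
        "merges": sorted(c for c, ps in graph.items() if len(ps) >= 2),
    }


# Made-up, shortened IDs: d4 merges b2 and c3, which both build on root a1.
sample = """\
d4 b2 c3
c3 a1
b2 a1
a1
"""
print(summarize(parse_rev_list(sample)))  # {'roots': ['a1'], 'merges': ['d4']}
```

Inside a real repository, summarize(parse_rev_list(rev_list_output())) should give the same kind of summary for your own history.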
Stop memorizing commands. Start seeing the graph.