mitali

Posted on Oct 12 • Edited on Oct 17

I Built My Own Git in Rust to Understand Version Control

#git #rust #tutorial #beginners

I've been using Git for years. Committing, pushing, pulling, occasionally panicking when things break. But if you'd asked me what actually happens when I run git commit, I'd have given you some vague answer about "saving changes" and hoped you wouldn't ask more questions.

That bothered me. So I built Veridian, my own version control system in Rust. Not because the world needs another Git, but because I needed to understand the one we already have.

Turns out, Git is way simpler than I thought.

Why Everyone Finds Git Confusing

We learn Git backwards. We memorize commands without understanding what they do. git add stages files. Okay, but what does staging actually mean? git commit saves your work. Cool, but saves it where and how?

I spent years just following commands. I could use Git, but I couldn't understand it. Building Veridian changed that.

Here's What Git Actually Is

Strip away all the commands and features, and Git is just a content-addressable storage system. Sounds complicated, but it's not.

You have a storage system where things are saved by their content, not their name. Put in the same content twice? Same storage location. Change one character? Different location.

That's Git. The "location" is a SHA-1 hash. The "storage" is the .git/objects folder.
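To make the idea concrete, here's a tiny in-memory content-addressable store in Rust. It's a sketch, not Veridian's code, and the type and method names are made up. It also uses the standard library's DefaultHasher (a 64-bit SipHash) as a stand-in for SHA-1 so it runs with no dependencies. That swap is only for illustration: Git uses SHA-1, and a 64-bit hash is nowhere near collision-resistant enough for real storage.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// A toy content-addressable store: every object's key is computed from its bytes.
/// ASSUMPTION: DefaultHasher (64-bit SipHash) stands in for SHA-1 so this runs
/// without crates. It is NOT what Git uses and is not collision-resistant.
pub struct ContentStore {
    objects: HashMap<u64, Vec<u8>>,
}

impl ContentStore {
    pub fn new() -> Self {
        ContentStore { objects: HashMap::new() }
    }

    /// The "address" of some content depends only on the content itself.
    pub fn key_for(content: &[u8]) -> u64 {
        let mut hasher = DefaultHasher::new();
        content.hash(&mut hasher);
        hasher.finish()
    }

    /// Store content and return its key. Identical bytes map to the same key,
    /// so storing them a second time adds nothing.
    pub fn put(&mut self, content: &[u8]) -> u64 {
        let key = Self::key_for(content);
        self.objects.entry(key).or_insert_with(|| content.to_vec());
        key
    }

    /// Look content up by its key.
    pub fn get(&self, key: u64) -> Option<&[u8]> {
        self.objects.get(&key).map(|bytes| bytes.as_slice())
    }

    /// Number of distinct objects stored.
    pub fn len(&self) -> usize {
        self.objects.len()
    }
}
```

Putting b"hello" in twice returns the same key and leaves one stored object. Changing a single byte gives a different key.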

Three Types of Objects

Blobs are file contents. Take your file, prepend a header of the form blob <size>\0 (the content length in bytes, followed by a NUL byte), hash the header plus the content, compress it, store it. Done. Git doesn't care about filenames here. Only content.
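Here's what computing a blob's id looks like in Rust with only the standard library. To avoid pulling in a crate, this sketch includes a small hand-written SHA-1 (a real project would more likely use a SHA-1 crate), and the names blob_bytes, blob_id, and sha1 are hypothetical. Compression and writing to disk are left out. The ids it produces match what git hash-object prints for the same content: for example, every empty file hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.

```rust
/// Minimal SHA-1 (FIPS 180-4), for illustration only.
pub fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0];
    // Padding: 0x80, zeros until length ≡ 56 (mod 64), then the bit length (big-endian u64).
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for chunk in msg.chunks(64) {
        // Expand the 16 input words of this 64-byte block into 80 words.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([chunk[4 * i], chunk[4 * i + 1], chunk[4 * i + 2], chunk[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &wi) in w.iter().enumerate() {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A82_7999),
                20..=39 => (b ^ c ^ d, 0x6ED9_EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1B_BCDC),
                _ => (b ^ c ^ d, 0xCA62_C1D6),
            };
            let temp = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = temp;
        }
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// The exact bytes Git hashes for a blob: "blob <size>\0" followed by the content.
pub fn blob_bytes(content: &[u8]) -> Vec<u8> {
    let mut data = format!("blob {}\0", content.len()).into_bytes();
    data.extend_from_slice(content);
    data
}

/// The blob's id as 40 lowercase hex characters.
pub fn blob_id(content: &[u8]) -> String {
    sha1(&blob_bytes(content)).iter().map(|b| format!("{:02x}", b)).collect()
}
```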

Trees are directory listings. They say "here's what this folder looked like" by listing each file and subfolder with its mode, its name, and its hash. Trees point to blobs and other trees.

Commits are snapshots with context. Each commit points to one tree (what your project looked like), points to zero or more parent commits (what came before: none for the very first commit, two for a merge), and has metadata like author, committer, timestamps, and message.

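Three object types. That's the core of the whole system. (Git also has a fourth, the annotated tag object, but you don't need it to understand commits.)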
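One way to picture the object graph is as a Rust enum. This is a hypothetical model (the Object, TreeEntry, and Commit names are mine, and ids are kept as plain hex strings for readability), but it captures the key point: blobs reference nothing, trees reference blobs and subtrees, and commits reference one tree plus their parents.

```rust
/// One line of a directory listing.
pub struct TreeEntry {
    pub mode: String, // e.g. "100644" for a regular file, "40000" for a directory
    pub name: String,
    pub id: String,   // hash of a blob or a subtree
}

pub struct Commit {
    pub tree: String,
    pub parents: Vec<String>, // empty for the first commit, two for a merge
    pub author: String,
    pub message: String,
}

/// Hypothetical model of Git's three core object types.
pub enum Object {
    Blob(Vec<u8>),        // raw file contents, no filename
    Tree(Vec<TreeEntry>), // directory listing
    Commit(Commit),       // snapshot + history + metadata
}

impl Object {
    /// The type name Git writes in the object header.
    pub fn kind(&self) -> &'static str {
        match self {
            Object::Blob(_) => "blob",
            Object::Tree(_) => "tree",
            Object::Commit(_) => "commit",
        }
    }

    /// The ids of the other objects this one points to.
    pub fn children(&self) -> Vec<&str> {
        match self {
            Object::Blob(_) => Vec::new(),
            Object::Tree(entries) => entries.iter().map(|e| e.id.as_str()).collect(),
            Object::Commit(c) => std::iter::once(c.tree.as_str())
                .chain(c.parents.iter().map(|p| p.as_str()))
                .collect(),
        }
    }
}
```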

Why the Hash System is Smart

Same file in multiple commits? Stored once. Changed one line in a big file? Only the new version gets stored, as a new blob; every unchanged file is reused. Want to check if two files are identical? Compare hashes, instant answer.

Git isn't making copies of your project over and over. It's storing unique pieces and building snapshots from them. That's why repos with hundreds of commits aren't huge. (When Git packs objects, it also delta-compresses similar blobs, which shrinks things further.)

Branches Are Just Files

A branch is literally a file with a commit hash in it.

The file .git/refs/heads/main contains just 40 hex characters (plus a newline): the hash of the latest commit on main. When you make a branch, Git writes a new file with the current commit hash. When you commit, Git updates the current branch's file with the new hash.

No copying. Just updating a small text file. That's why branches are "lightweight."
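Here's a sketch of branch files, assuming a Git-style layout where repo is the repository directory (.git in Git, presumably .veridian here). write_branch and read_branch are hypothetical helpers. Real Git can also keep refs in a single packed-refs file, which this ignores.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Create or move a branch: overwrite refs/heads/<name> with a commit hash.
pub fn write_branch(repo: &Path, name: &str, commit: &str) -> io::Result<()> {
    let path = repo.join("refs").join("heads").join(name);
    if let Some(parent) = path.parent() {
        fs::create_dir_all(parent)?; // branch names like feature/x become subfolders
    }
    fs::write(&path, format!("{}\n", commit))
}

/// Read the commit hash a branch currently points to.
pub fn read_branch(repo: &Path, name: &str) -> io::Result<String> {
    let path = repo.join("refs").join("heads").join(name);
    Ok(fs::read_to_string(path)?.trim_end().to_string())
}
```

Creating a branch and moving a branch are the same operation: overwrite one small file.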

The Compression Part is Cool

Git uses zlib to compress everything before storing it. So your object files aren't just raw content, they're compressed. When I was building Veridian, I had to handle this compression and decompression for every read and write.

Here's what happens: Git takes your blob (with header), compresses it, then stores it in .git/objects/ab/cdef123... where ab is the first two characters of the hash and cdef123... is the rest. The two-character split is just to avoid having thousands of files in one directory, which would slow down file systems.
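Here's the directory split plus a write-if-absent step, as a sketch. It assumes the object has already been hashed and zlib-compressed (the post uses the flate2 crate for that part), so it just takes the hex hash and the compressed bytes as inputs. object_path and write_object are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Map a 40-char hex hash to objects/<first 2 chars>/<remaining 38 chars>.
pub fn object_path(objects_dir: &Path, hash: &str) -> PathBuf {
    let (dir, file) = hash.split_at(2);
    objects_dir.join(dir).join(file)
}

/// Write already-compressed object bytes. If a file with this hash already exists,
/// it holds the same content, so the write is skipped.
pub fn write_object(objects_dir: &Path, hash: &str, compressed: &[u8]) -> io::Result<PathBuf> {
    let path = object_path(objects_dir, hash);
    if !path.exists() {
        fs::create_dir_all(path.parent().expect("object path always has a parent"))?;
        fs::write(&path, compressed)?;
    }
    Ok(path)
}
```

Skipping the write when the file already exists is safe precisely because of content addressing: the same hash means the same content.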

Reading it back means finding the file, decompressing with zlib, parsing the header to check object type and size, then giving you the content. Rust's standard library doesn't have zlib built in, so I used the flate2 crate for this. Took like 5 lines of code.
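After flate2 has inflated the file, you still have to parse the header. Here's a minimal sketch of that step with a hypothetical parse_object function that checks the declared size against the actual content length. It also accepts tag, Git's fourth object type, so real Git objects aren't rejected.

```rust
/// Split a decompressed object ("<type> <size>\0<content>") into its type and content.
pub fn parse_object(raw: &[u8]) -> Result<(&str, &[u8]), String> {
    let nul = raw.iter().position(|&b| b == 0).ok_or("missing NUL after header")?;
    let header = std::str::from_utf8(&raw[..nul]).map_err(|e| e.to_string())?;
    let (kind, size) = header.split_once(' ').ok_or("header has no space")?;
    let size: usize = size.parse().map_err(|_| format!("bad size: {}", size))?;
    let content = &raw[nul + 1..];
    // The size in the header must match what actually follows the NUL byte.
    if content.len() != size {
        return Err(format!("size mismatch: header says {}, found {}", size, content.len()));
    }
    match kind {
        "blob" | "tree" | "commit" | "tag" => Ok((kind, content)),
        other => Err(format!("unknown object type: {}", other)),
    }
}
```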

What Building Veridian Taught Me

I thought building a version control system would be hard. It wasn't.

Building in Rust was interesting because Rust makes you think about memory and ownership. When you're hashing files and building trees, you need to handle errors properly (what if the file doesn't exist?) and manage buffers carefully (you can't just load a 5GB file into memory).

But honestly, the version control logic itself is simple. Most of my code is just reading files, computing SHA-1 hashes, and writing compressed data. The hard part wasn't the algorithm, it was understanding what Git was actually doing.

Init Command

Make a .veridian folder. Add subfolders for objects and refs. Create a HEAD file. Done, you have a repo.
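A sketch of init using std::fs. The exact layout of .veridian is an assumption, modeled on Git's: an objects folder, refs/heads for branches, and a HEAD file that names the current branch (main is assumed as the default). Like git init, it leaves an existing HEAD alone if you run it twice. init is a hypothetical name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Create the repository skeleton inside `work_dir` and return the repo path.
pub fn init(work_dir: &Path) -> io::Result<PathBuf> {
    let repo = work_dir.join(".veridian");
    fs::create_dir_all(repo.join("objects"))?;
    fs::create_dir_all(repo.join("refs").join("heads"))?;
    // HEAD is a symbolic ref: it names the current branch, not a commit.
    let head = repo.join("HEAD");
    if !head.exists() {
        fs::write(&head, "ref: refs/heads/main\n")?;
    }
    Ok(repo)
}
```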

Hash-Object Command

Read file, add header, hash it with SHA-1, compress with zlib, write to .veridian/objects/. Return the hash. That's how files enter the system.

Write-Tree Command

Go through a directory. Hash every file (making blobs) and recurse into every subfolder (making subtrees). Put all the names and hashes into a tree object. Hash that tree. Now you have a snapshot of your directory.

One thing I learned: tree entries need to be sorted by filename. If you don't sort them, the same directory structure produces different hashes depending on the order you process files. Git sorts entries by name, byte by byte (a subfolder's name is compared as if it ended in /), so the same directory always hashes the same way. Small detail, but it matters.
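Here's a sketch of serializing a tree with that sorting rule. The directory detail means a folder called foo sorts after a file called foo.txt, because . comes before / in ASCII. Entry and tree_bytes are hypothetical names, the ids are stored as raw 20-byte values (not hex), and the result still needs to be SHA-1 hashed like a blob to get the tree's id.

```rust
/// One entry in a tree: file mode, name, and the raw 20-byte object id.
pub struct Entry {
    pub mode: &'static str, // "100644" for a regular file, "40000" for a subdirectory
    pub name: String,
    pub id: [u8; 20],
}

/// Sort key: the name's bytes, with a trailing '/' for directories.
fn sort_key(e: &Entry) -> Vec<u8> {
    let mut key = e.name.as_bytes().to_vec();
    if e.mode == "40000" {
        key.push(b'/');
    }
    key
}

/// Serialize entries into tree object bytes: "tree <size>\0" + sorted entries,
/// each written as "<mode> <name>\0" followed by the raw id.
pub fn tree_bytes(mut entries: Vec<Entry>) -> Vec<u8> {
    entries.sort_by_key(sort_key);
    let mut body = Vec::new();
    for e in &entries {
        body.extend_from_slice(format!("{} {}\0", e.mode, e.name).as_bytes());
        body.extend_from_slice(&e.id);
    }
    let mut out = format!("tree {}\0", body.len()).into_bytes();
    out.extend_from_slice(&body);
    out
}
```

Feeding the same entries in a different order produces byte-for-byte the same tree, which is the whole point of sorting.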

Commit-Tree Command

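Take a tree hash. Add the parent commit hash if there is one. Add author info and time. Add the message. Hash it all. Write it to objects. Update the branch pointer. Update HEAD. That's a commit. (In Git itself, the plumbing command git commit-tree only writes the commit object and prints its hash; git commit is what also moves the branch.)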

The implementation is surprisingly small. It works like Git because Git is actually this simple.
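For reference, here's the text format Git uses for a commit object's body, sketched as a Rust function. CommitInfo and commit_body are hypothetical names. The bytes you'd actually hash are this body with a commit <size>\0 header in front, exactly like a blob. Git writes both an author and a committer line; in this sketch they're the same person and time.

```rust
/// Inputs for a commit object (illustrative, not Veridian's actual API).
pub struct CommitInfo<'a> {
    pub tree: &'a str,
    pub parents: &'a [&'a str],
    pub author: &'a str, // "Name <email>"
    pub timestamp: i64,  // Unix seconds
    pub tz: &'a str,     // UTC offset, e.g. "+0530"
    pub message: &'a str,
}

/// Build the commit body in Git's text format.
pub fn commit_body(c: &CommitInfo) -> String {
    let mut s = format!("tree {}\n", c.tree);
    for p in c.parents {
        s.push_str(&format!("parent {}\n", p)); // one line per parent
    }
    let who = format!("{} {} {}", c.author, c.timestamp, c.tz);
    // A blank line separates the headers from the message.
    s.push_str(&format!("author {}\ncommitter {}\n\n{}\n", who, who, c.message));
    s
}
```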

Fun fact: Git stores timestamps as Unix timestamps (seconds since 1970-01-01 UTC) plus a timezone offset. So a commit object has something like 1760211794 +0530, which is the timestamp followed by the offset from UTC (here, 5 hours 30 minutes ahead). When you run git commit, it grabs your system time and timezone. I used Rust's chrono crate for this, but you could do it in any language.
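If you'd rather skip chrono, the formatting itself is easy by hand. This sketch assumes you already know the local offset in minutes (the standard library can't look up the local timezone, which is a good reason to use a crate) and uses std::time::SystemTime for the seconds. now_unix, format_tz, and git_time are hypothetical names.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Current time as Unix seconds (counted from the UTC epoch, whatever your timezone).
pub fn now_unix() -> i64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is set before 1970")
        .as_secs() as i64
}

/// Format a UTC offset given in minutes as Git's +HHMM / -HHMM string.
pub fn format_tz(offset_minutes: i32) -> String {
    let sign = if offset_minutes < 0 { '-' } else { '+' };
    let abs = offset_minutes.abs();
    format!("{}{:02}{:02}", sign, abs / 60, abs % 60)
}

/// The "<seconds> <offset>" pair that follows the name and email on author/committer lines.
pub fn git_time(unix_seconds: i64, offset_minutes: i32) -> String {
    format!("{} {}", unix_seconds, format_tz(offset_minutes))
}
```

For India Standard Time (330 minutes ahead of UTC), git_time(1760211794, 330) gives the 1760211794 +0530 from the example above.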

Things That Finally Clicked

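Why Git is fast: It compares hashes, not file contents. A hash is 20 bytes (40 hex characters), so comparing two of them is instant no matter how big the files are. And if two trees have the same hash, Git knows everything inside them is identical and can skip them entirely.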

Why detached HEAD happens: HEAD normally points to a branch file, which points to a commit. Check out a commit directly, and HEAD holds that commit's hash instead, skipping the branch. You're "detached" because you're not on a branch; you're on a specific commit.
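You can see the difference in the HEAD file itself. Here's a sketch of telling the two cases apart; Head and parse_head are hypothetical names, and real Git does more validation than this.

```rust
#[derive(Debug, PartialEq)]
pub enum Head {
    /// HEAD names a branch, e.g. "refs/heads/main" (the normal case).
    Branch(String),
    /// HEAD holds a commit hash directly: detached HEAD.
    Detached(String),
}

/// Interpret the contents of the HEAD file.
pub fn parse_head(contents: &str) -> Head {
    let line = contents.trim_end();
    match line.strip_prefix("ref: ") {
        Some(refname) => Head::Branch(refname.to_string()),
        None => Head::Detached(line.to_string()),
    }
}
```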

Why you can recover deleted commits: They're still in .git/objects, just unreferenced. Use git reflog, find the hash, and point a branch at it to get it back. Only garbage collection deletes them for real, and by default only after they've been unreachable for a while.

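Why merge conflicts exist: Two branches changed the same part of the same file in different ways since their common ancestor (the merge base). Git compares each side's snapshot against that ancestor, and when both sides changed the same lines differently, it can't pick which one wins. It needs you to decide.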
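Here's a simplified sketch of the per-file decision in a three-way merge. It compares blob ids from the merge base (the common ancestor of both branches) and from each side, where None means the file doesn't exist in that snapshot. It's a simplification: when both sides changed a file, real Git goes on to merge the contents line by line and only reports a conflict where the changes overlap. FileMerge and merge_file are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum FileMerge<'a> {
    /// Take this version (None means the file ends up deleted).
    Take(Option<&'a str>),
    /// Both sides changed the file differently: needs a content-level merge.
    NeedsContentMerge,
}

/// Decide one path's outcome from its blob ids at the merge base, ours, and theirs.
pub fn merge_file<'a>(base: Option<&'a str>, ours: Option<&'a str>, theirs: Option<&'a str>) -> FileMerge<'a> {
    if ours == theirs {
        FileMerge::Take(ours) // both sides agree (or neither changed it)
    } else if ours == base {
        FileMerge::Take(theirs) // only their side changed it
    } else if theirs == base {
        FileMerge::Take(ours) // only our side changed it
    } else {
        FileMerge::NeedsContentMerge // both changed it: this is where conflicts can appear
    }
}
```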

What I Learned

Git isn't hard because it's complicated. It's hard because we learn it wrong.

Once you get that Git is a key-value store where the key is a content hash, and there are three core types of values (blob, tree, commit), everything makes sense. Branches are pointers. Merging creates a commit with two parents whose tree combines both sides' changes. Rebasing replays commits on a different parent.

I used Git for years without getting it. Then I built Veridian in a week and suddenly Git made sense. Not because building is magic, but because it forces you to understand what's happening.

Why You Should Try This

You don't need to build something perfect. Just start.

Make a repo. Store a file as a blob. Build a tree. Make a commit. Do those four things and you'll understand Git better than most developers.

Building Veridian took maybe a week. Now when I use Git, I actually know what's happening. It's just data structures and file operations. Nothing complicated.

Veridian isn't perfect. It's missing features. Probably has bugs. But it taught me how Git works, and that was the point.

If you want to actually learn Git, not just use it, build something. Even if it's small. Even if it breaks. You'll learn more building for a week than reading docs for months.

Check out Veridian on GitHub. Break it, fix it, learn from it. That's how this works.

Top comments (10)

Pavel Gurkov • Oct 15

Two commits have the same parent but different changes to the same file.

More like, two parents have different snapshots of the same file. You won't have any merge conflict if two commits have the same parent - that's essentially branching; and it's incorrect to talk about diffs as, as you know, git does not store diffs.

Kate • Oct 22 • Edited

I love this! Did you really do this in one week? Impressive!!
BTW, a bit off-topic: can you tell me what program you used to create this beautiful diagram of the Veridian architecture?

mitali • Oct 22

I used Excalidraw and Mermaid for the architecture diagram. And honestly, it wasn't a long project, nothing that would take me more than a week. I mostly went through the docs a lot and spent most of my week on this project.