<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Muhammad Abdullah Khaver</title>
    <description>The latest articles on DEV Community by Muhammad Abdullah Khaver (@abdullahkhaver).</description>
    <link>https://dev.to/abdullahkhaver</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3999453%2Fa0c46f1b-b1ea-4c14-ab3f-ad8f883d7546.png</url>
      <title>DEV Community: Muhammad Abdullah Khaver</title>
      <link>https://dev.to/abdullahkhaver</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abdullahkhaver"/>
    <language>en</language>
    <item>
      <title>I Built PyVCS to Understand How Git Works Internally</title>
      <dc:creator>Muhammad Abdullah Khaver</dc:creator>
      <pubDate>Tue, 23 Jun 2026 21:40:28 +0000</pubDate>
      <link>https://dev.to/abdullahkhaver/i-built-pyvcs-to-understand-how-git-works-internally-5cm</link>
      <guid>https://dev.to/abdullahkhaver/i-built-pyvcs-to-understand-how-git-works-internally-5cm</guid>
      <description>&lt;h2&gt;
  
  
  Why I Did It
&lt;/h2&gt;

&lt;p&gt;Like many developers, I use Git every day. I know the commands by heart – &lt;code&gt;add&lt;/code&gt;, &lt;code&gt;commit&lt;/code&gt;, &lt;code&gt;push&lt;/code&gt;, &lt;code&gt;log&lt;/code&gt; – but I realised I had only a vague idea of what actually happens under the hood. What’s a blob? How does a tree differ from a commit? How does the index (staging area) work? And most mysteriously, how does &lt;code&gt;git push&lt;/code&gt; actually send data to a remote server?&lt;/p&gt;

&lt;p&gt;I wanted to demystify this. Instead of reading dry documentation, I decided to build my own simplified Git clone. Not to replace Git, but to understand it.&lt;/p&gt;

&lt;p&gt;So I wrote &lt;strong&gt;PyVCS&lt;/strong&gt; – a pure-Python version control system that implements a subset of Git’s features. The entire codebase is around 500 lines, and it taught me more than any tutorial could.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Set Out to Build
&lt;/h2&gt;

&lt;p&gt;My goals were clear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Object storage&lt;/strong&gt; – store blobs, trees, and commits with SHA‑1 hashing and zlib compression.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A staging area (index)&lt;/strong&gt; – the classic &lt;code&gt;add&lt;/code&gt; and &lt;code&gt;commit&lt;/code&gt; flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Commit history&lt;/strong&gt; – parent pointers, log, and the ability to walk the chain.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Working tree operations&lt;/strong&gt; – &lt;code&gt;status&lt;/code&gt; and &lt;code&gt;diff&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Remote push&lt;/strong&gt; – speak the Git pack protocol over HTTP.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I also wanted it to be &lt;strong&gt;installable&lt;/strong&gt; – so that after I was done, I could actually use it as a command-line tool, even if it’s limited.&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1: Understanding the Object Model
&lt;/h2&gt;

&lt;p&gt;Git’s object model is surprisingly elegant. There are four object types, of which I implemented three: blobs (file contents), trees (directory listings), and commits (snapshots with metadata). The fourth, annotated tags, I left out. Each object is stored as a header (&lt;code&gt;type size\0&lt;/code&gt;) followed by the data. The SHA‑1 hash of that header plus data names the object, and the same bytes are compressed with zlib before being written to disk.&lt;/p&gt;

&lt;p&gt;The first thing I wrote was &lt;code&gt;hash_object()&lt;/code&gt; – it computes the hash, writes the compressed data to &lt;code&gt;.vcs/objects/ab/cdef...&lt;/code&gt; (two‑level directory structure), and returns the hash. I also wrote &lt;code&gt;read_object()&lt;/code&gt; to decompress and parse the object back.&lt;/p&gt;

&lt;p&gt;It was thrilling to see that my handcrafted objects had the exact same format as Git’s – I could even use &lt;code&gt;git cat-file&lt;/code&gt; on them if I renamed &lt;code&gt;.vcs&lt;/code&gt; to &lt;code&gt;.git&lt;/code&gt; (though I didn’t).&lt;/p&gt;
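
&lt;p&gt;To make the storage format concrete, here is a minimal sketch of the two functions described above. It is an illustration rather than PyVCS’s actual code: the signatures and the &lt;code&gt;root&lt;/code&gt; parameter (the path of the &lt;code&gt;.vcs&lt;/code&gt; directory) are assumptions.&lt;/p&gt;

```python
import hashlib
import os
import tempfile
import zlib


def hash_object(data, obj_type, root):
    """Store data as a Git-style object under root and return its SHA-1 hex digest."""
    # Header is "type size" plus a NUL byte, followed by the raw data.
    full = obj_type.encode() + b" " + str(len(data)).encode() + b"\x00" + data
    sha1 = hashlib.sha1(full).hexdigest()
    # Two-level layout: the first two hex digits name the directory.
    path = os.path.join(root, "objects", sha1[:2], sha1[2:])
    if not os.path.exists(path):
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(zlib.compress(full))
    return sha1


def read_object(sha1, root):
    """Return (obj_type, data) for an object written by hash_object."""
    path = os.path.join(root, "objects", sha1[:2], sha1[2:])
    with open(path, "rb") as f:
        full = zlib.decompress(f.read())
    header, _, data = full.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(data):
        raise ValueError("size in header does not match payload")
    return obj_type, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, ".vcs")
        sha1 = hash_object(b"hello\n", "blob", root)
        print(sha1, read_object(sha1, root))
```

&lt;p&gt;For the six bytes &lt;code&gt;hello\n&lt;/code&gt; stored as a blob, this yields &lt;code&gt;ce013625030ba8dba906f756967f9e9ca394464a&lt;/code&gt;, the same ID that &lt;code&gt;git hash-object&lt;/code&gt; reports for that content.&lt;/p&gt;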




&lt;h2&gt;
  
  
  Step 2: The Index – A Sneaky Data Structure
&lt;/h2&gt;

&lt;p&gt;The index (&lt;code&gt;.vcs/index&lt;/code&gt;) is a binary file that tracks which files are staged and their metadata. I reverse‑engineered the Git index format (version 2) – it’s a 12‑byte header, a series of entries (each a fixed‑length block of metadata followed by a variable‑length, NUL‑padded path name), and a trailing SHA‑1 checksum.&lt;/p&gt;

&lt;p&gt;Parsing it was a bit tedious with &lt;code&gt;struct.unpack&lt;/code&gt;, but once I got it right, I could list staged files with &lt;code&gt;ls-files&lt;/code&gt; and compare them to the working tree for &lt;code&gt;status&lt;/code&gt;.&lt;/p&gt;
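
&lt;p&gt;A rough sketch of what that parsing can look like, assuming the version 2 layout without extensions. The function names are hypothetical, and &lt;code&gt;build_index&lt;/code&gt; exists only to produce sample bytes: it zeroes every stat field except the mode.&lt;/p&gt;

```python
import hashlib
import struct

# ctime (s, ns), mtime (s, ns), dev, ino, mode, uid, gid, size, SHA-1, flags
ENTRY_FORMAT = "!LLLLLLLLLL20sH"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 62 bytes


def build_index(entries):
    """Serialize (path, sha1_bytes, mode) tuples into version 2 index bytes."""
    body = struct.pack("!4sLL", b"DIRC", 2, len(entries))
    for path, sha1, mode in sorted(entries):
        name = path.encode()
        # The flags field carries the name length for short names.
        entry = struct.pack(ENTRY_FORMAT, 0, 0, 0, 0, 0, 0, mode, 0, 0, 0,
                            sha1, len(name)) + name
        # Pad with 1 to 8 NUL bytes so the entry length is a multiple of 8.
        padded_len = (len(entry) + 8) // 8 * 8
        body += entry + b"\x00" * (padded_len - len(entry))
    return body + hashlib.sha1(body).digest()


def parse_index(raw):
    """Return a list of (path, sha1_hex, mode) tuples from version 2 index bytes."""
    body, checksum = raw[:-20], raw[-20:]
    if hashlib.sha1(body).digest() != checksum:
        raise ValueError("index checksum mismatch")
    signature, version, count = struct.unpack("!4sLL", body[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("not a version 2 index")
    entries = []
    offset = 12
    for _ in range(count):
        fixed_end = offset + ENTRY_SIZE
        fields = struct.unpack(ENTRY_FORMAT, body[offset:fixed_end])
        path_end = body.index(b"\x00", fixed_end)
        path = body[fixed_end:path_end]
        entries.append((path.decode(), fields[10].hex(), fields[6]))
        # Skip the padding to land on the next entry.
        offset += (ENTRY_SIZE + len(path) + 8) // 8 * 8
    return entries
```

&lt;p&gt;Round‑tripping a couple of entries through &lt;code&gt;build_index&lt;/code&gt; and &lt;code&gt;parse_index&lt;/code&gt; is a quick way to check the padding arithmetic, which is the easiest part to get wrong.&lt;/p&gt;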

&lt;p&gt;The tricky part was building a tree from the index entries. Git requires trees to be sorted and to handle nested directories. I wrote a recursive &lt;code&gt;build_tree_from_entries()&lt;/code&gt; that groups entries by directory, builds subtrees, and creates the final tree object. That was a moment of clarity – I finally understood how Git represents directories.&lt;/p&gt;
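
&lt;p&gt;The recursion is easier to see in code. The following is a sketch under assumptions, not the PyVCS implementation: it reuses the name &lt;code&gt;build_tree_from_entries()&lt;/code&gt; but takes hypothetical &lt;code&gt;(path, mode, sha1_hex)&lt;/code&gt; tuples, and a helper called &lt;code&gt;store&lt;/code&gt; keeps objects in a dictionary instead of writing them to disk.&lt;/p&gt;

```python
import hashlib


def store(objects, obj_type, data):
    """Hash a Git-style object and keep it in a dict (a stand-in for disk storage)."""
    full = obj_type.encode() + b" " + str(len(data)).encode() + b"\x00" + data
    sha1 = hashlib.sha1(full).hexdigest()
    objects[sha1] = full
    return sha1


def build_tree_from_entries(objects, entries):
    """Build nested trees from (path, mode, sha1_hex) entries; return the root tree hash."""
    files = {}    # name: (mode, sha1) for entries directly in this directory
    subdirs = {}  # directory name: entries that live below it
    for path, mode, sha1 in entries:
        head, sep, rest = path.partition("/")
        if sep:
            subdirs.setdefault(head, []).append((rest, mode, sha1))
        else:
            files[head] = (mode, sha1)
    items = [(name, mode, sha1) for name, (mode, sha1) in files.items()]
    for name, children in subdirs.items():
        # Recurse first: a directory entry needs the hash of its finished subtree.
        items.append((name, 0o40000, build_tree_from_entries(objects, children)))
    # Git sorts tree entries by name, comparing directories as if they ended in "/".
    items.sort(key=lambda item: (item[0] + "/" if item[1] == 0o40000 else item[0]).encode())
    # Each entry is "mode name", a NUL byte, then the raw 20-byte hash.
    data = b"".join(
        "{:o} {}".format(mode, name).encode() + b"\x00" + bytes.fromhex(sha1)
        for name, mode, sha1 in items
    )
    return store(objects, "tree", data)
```

&lt;p&gt;Called with no entries, this produces Git’s well‑known empty tree ID, &lt;code&gt;4b825dc642cb6eb9a060e54bf8d69288fbee4904&lt;/code&gt;, which makes a handy sanity check.&lt;/p&gt;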




&lt;h2&gt;
  
  
  Step 3: Commits and the Master Branch
&lt;/h2&gt;

&lt;p&gt;A commit is just a text object with a tree hash, parent hash (if any), author/committer info, timestamp, and a message. I wrote &lt;code&gt;commit()&lt;/code&gt; to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write the tree from the current index.&lt;/li&gt;
&lt;li&gt;Read the current &lt;code&gt;master&lt;/code&gt; pointer (a reference stored in &lt;code&gt;.vcs/refs/heads/master&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;Create the commit object.&lt;/li&gt;
&lt;li&gt;Update the master ref to the new commit hash.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;log&lt;/code&gt; command then simply walks the parent chain, printing each commit’s details. It was like watching a timeline come alive.&lt;/p&gt;
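
&lt;p&gt;As an illustration of how little a commit contains, here is a sketch of the commit and log steps. Everything about the interface is an assumption made for brevity: objects and refs live in plain dictionaries, the tree hash is passed in rather than built from the index, the &lt;code&gt;author&lt;/code&gt; string (a name followed by an email address in angle brackets) is supplied by the caller, and the timezone is fixed to UTC.&lt;/p&gt;

```python
import hashlib
import time


def store(objects, obj_type, data):
    """Hash a Git-style object and keep it in a dict (a stand-in for disk storage)."""
    full = obj_type.encode() + b" " + str(len(data)).encode() + b"\x00" + data
    sha1 = hashlib.sha1(full).hexdigest()
    objects[sha1] = full
    return sha1


def commit(objects, refs, tree_sha1, author, message, timestamp=None):
    """Create a commit for tree_sha1 on master and advance the ref to it."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    parent = refs.get("refs/heads/master")  # None for the very first commit
    lines = ["tree " + tree_sha1]
    if parent:
        lines.append("parent " + parent)
    stamp = "{} {} +0000".format(author, timestamp)
    lines += ["author " + stamp, "committer " + stamp, "", message, ""]
    sha1 = store(objects, "commit", "\n".join(lines).encode())
    refs["refs/heads/master"] = sha1
    return sha1


def log(objects, refs):
    """Yield (sha1, message) pairs from master back to the root commit."""
    sha1 = refs.get("refs/heads/master")
    while sha1:
        body = objects[sha1].split(b"\x00", 1)[1].decode()
        headers, _, message = body.partition("\n\n")
        yield sha1, message.strip()
        parents = [line for line in headers.split("\n") if line.startswith("parent ")]
        sha1 = parents[0].split(" ")[1] if parents else None
```

&lt;p&gt;Because each commit names its parent by hash, walking history is just a loop that follows one header line until it is missing.&lt;/p&gt;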




&lt;h2&gt;
  
  
  Step 4: The Big Challenge – Push
&lt;/h2&gt;

&lt;p&gt;This was the most difficult part. I wanted to push my local commits to a remote Git repository (like GitHub) over HTTP. For a push, Git’s “smart” HTTP protocol has two phases:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Discovery&lt;/strong&gt; – &lt;code&gt;GET /info/refs?service=git-receive-pack&lt;/code&gt; returns the remote’s current refs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upload&lt;/strong&gt; – &lt;code&gt;POST /git-receive-pack&lt;/code&gt; sends a packfile containing the objects that the remote doesn’t have, plus a reference update command.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I had to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Parse the pkt‑line format (each line prefixed with a 4‑digit hex length that counts the prefix itself; &lt;code&gt;0000&lt;/code&gt; is a flush packet).&lt;/li&gt;
&lt;li&gt;Calculate the set of missing objects between local and remote commits (recursive diff of object graphs).&lt;/li&gt;
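&lt;li&gt;Build a packfile – a binary format with a 12‑byte header (the &lt;code&gt;PACK&lt;/code&gt; signature, a version number, and the object count), compressed object data for each object, and a trailing SHA‑1 of everything before it.&lt;/li&gt;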
&lt;li&gt;Encode objects with a variable‑length header (type and size) and compressed data.&lt;/li&gt;
&lt;/ul&gt;
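
&lt;p&gt;The pkt‑line framing from the first bullet is small enough to sketch in full. The names &lt;code&gt;pkt_line&lt;/code&gt;, &lt;code&gt;FLUSH&lt;/code&gt;, and &lt;code&gt;parse_pkt_lines&lt;/code&gt; are hypothetical, chosen for this example.&lt;/p&gt;

```python
FLUSH = b"0000"  # special packet that marks the end of a section


def pkt_line(payload):
    """Frame payload bytes as one pkt-line; the length counts the 4 prefix bytes too."""
    return "{:04x}".format(len(payload) + 4).encode() + payload


def parse_pkt_lines(data):
    """Split a byte string into payloads; a flush packet is returned as None."""
    packets = []
    offset = 0
    while data[offset:offset + 4]:
        length = int(data[offset:offset + 4].decode(), 16)
        if length == 0:
            packets.append(None)
            offset += 4
        elif length in (1, 2, 3):
            # A length cannot be smaller than its own 4-byte prefix.
            raise ValueError("invalid pkt-line length")
        else:
            packets.append(data[offset + 4:offset + length])
            offset += length
    return packets


if __name__ == "__main__":
    print(pkt_line(b"# service=git-receive-pack\n"))
```

&lt;p&gt;The demo line prints &lt;code&gt;b'001f# service=git-receive-pack\n'&lt;/code&gt;: the payload is 27 bytes, plus 4 for the prefix, which is 31, or &lt;code&gt;1f&lt;/code&gt; in hex.&lt;/p&gt;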

&lt;p&gt;The packfile logic was the most intricate – I used the official Git documentation and a lot of trial‑and‑error with &lt;code&gt;xxd&lt;/code&gt; and &lt;code&gt;hexdump&lt;/code&gt;. When I finally saw &lt;code&gt;unpack ok&lt;/code&gt; from the server, I literally cheered.&lt;/p&gt;
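
&lt;p&gt;For readers who want to see the shape of that format, here is a simplified sketch of a packfile writer. It assumes every object is stored whole (no delta compression) and uses hypothetical function names. The arithmetic uses &lt;code&gt;divmod&lt;/code&gt; where bit shifts and masks would be more usual, which is equivalent here.&lt;/p&gt;

```python
import hashlib
import struct
import zlib

OBJ_TYPES = {"commit": 1, "tree": 2, "blob": 3}  # type numbers used inside a pack


def encode_object_header(type_num, size):
    """Encode the type and uncompressed size as a pack entry header."""
    # First byte: type in bits 4 to 6, lowest 4 bits of the size in bits 0 to 3.
    size, low = divmod(size, 16)
    byte = type_num * 16 + low
    out = bytearray()
    while size:
        out.append(byte + 128)  # top bit set: another size byte follows
        size, byte = divmod(size, 128)  # next 7 bits of the size
    out.append(byte)
    return bytes(out)


def build_pack(objects):
    """Build a version 2 pack from (type_name, data) pairs."""
    pack = struct.pack("!4sLL", b"PACK", 2, len(objects))
    for type_name, data in objects:
        # The data here is the raw object content, without the "type size" header.
        pack += encode_object_header(OBJ_TYPES[type_name], len(data))
        pack += zlib.compress(data)
    # The trailer is the SHA-1 of everything that precedes it.
    return pack + hashlib.sha1(pack).digest()
```

&lt;p&gt;A 100‑byte blob, for example, gets the two‑byte header &lt;code&gt;b4 06&lt;/code&gt;: the first byte carries the continuation bit, type 3, and the low four size bits (4), and the second carries the remaining size bits (6 × 16 = 96).&lt;/p&gt;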




&lt;h2&gt;
  
  
  Step 5: Making It Installable
&lt;/h2&gt;

&lt;p&gt;Once the core was working, I wanted to share it. I structured it as a Python package with a &lt;code&gt;pyproject.toml&lt;/code&gt; and a console script entry point. Now anyone can install it with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;pyvcs
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and run &lt;code&gt;pyvcs init&lt;/code&gt;, &lt;code&gt;pyvcs add&lt;/code&gt;, etc. from anywhere.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Git is not magic&lt;/strong&gt; – it’s a set of simple, well‑designed data structures and protocols.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The index is a cache&lt;/strong&gt; – it stores file metadata and hashes to speed up commits and diffs.&lt;/li&gt;
&lt;li&gt;
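&lt;strong&gt;Trees are recursive&lt;/strong&gt; – they’re the key to Git’s fast directory comparisons: two subtrees with the same hash have identical contents, so a comparison can skip them without descending.&lt;/li&gt;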
&lt;li&gt;
&lt;strong&gt;Packfiles are clever&lt;/strong&gt; – they compress and deduplicate objects, but the format is surprisingly straightforward.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;HTTP is just bytes&lt;/strong&gt; – the Git protocol is just a stream of lines and binary chunks; once you understand the framing, it’s not scary.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More than that, I gained a deep appreciation for the design decisions that make Git fast and reliable. Linus Torvalds and the Git community built something truly remarkable.&lt;/p&gt;




&lt;h2&gt;
  
  
  What PyVCS Can (and Can’t) Do
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Can:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;init&lt;/code&gt;, &lt;code&gt;add&lt;/code&gt;, &lt;code&gt;commit&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;diff&lt;/code&gt;, &lt;code&gt;log&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ls-files&lt;/code&gt; and &lt;code&gt;cat-file&lt;/code&gt; for inspection&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;push&lt;/code&gt; to a remote HTTP repository (with basic auth)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Can’t:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Branches or tags (only &lt;code&gt;master&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;pull&lt;/code&gt; or &lt;code&gt;clone&lt;/code&gt; (though push works, so half the story is there)&lt;/li&gt;
&lt;li&gt;Merging or conflict resolution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s not meant to be a production tool – it’s an educational experiment. But it’s functional, and it helped me understand Git from the inside out.&lt;/p&gt;




&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;I’ve open‑sourced PyVCS on &lt;a href="https://github.com/abdullahkhaver/pyvcs" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; under the MIT license. If you’ve ever wondered how version control works, I encourage you to clone it, read the code, and maybe even extend it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/abdullahkhaver/pyvcs.git
&lt;span class="nb"&gt;cd &lt;/span&gt;pyvcs
pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt;
pyvcs init mytest
&lt;span class="nb"&gt;cd &lt;/span&gt;mytest
&lt;span class="c"&gt;# ... play with it&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The joy of building something that works, and the insight gained, makes this project one of the most rewarding I’ve ever done.&lt;/p&gt;

&lt;p&gt;If you decide to write your own VCS, I’d love to hear about it. Happy hacking!&lt;/p&gt;

</description>
      <category>python</category>
      <category>git</category>
      <category>programming</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
