Keerthi Vardhan

Mini-git, Understanding How Files Are Stored in Git Objects

Yesterday, I set out to implement one of Git's core functionalities on my own: how files are stored, what Git objects are, and how hashing and compression work. It took me 4 hours to build, and in this article I'll walk you through my thought process and approach.

What Happens When You Commit a File?

When you commit a file in Git, several important steps occur under the hood:

File Compression:

The content of the file is compressed with zlib (which uses the DEFLATE algorithm) to reduce its size. This compressed content is what gets stored in the Git object database.
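
To make the compression step concrete, here is a minimal Python sketch using the standard zlib module. The function names compress_content and decompress_content are hypothetical and are not taken from the mini-git code; the sample content is made up for illustration.

```python
import zlib


def compress_content(content: bytes) -> bytes:
    """Compress raw file bytes with zlib (DEFLATE)."""
    return zlib.compress(content)


def decompress_content(data: bytes) -> bytes:
    """Restore the original bytes from zlib-compressed data."""
    return zlib.decompress(data)


# Illustrative input: repetitive text compresses well.
original = b"hello git\n" * 100
packed = compress_content(original)
restored = decompress_content(packed)
```

Compression is lossless, so decompressing the stored bytes gives back exactly the content that was committed.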

Hash Calculation:

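A SHA-1 hash is generated to identify the file's content. Note that Git itself computes this hash from a short header ("blob <size>" plus a null byte) followed by the uncompressed file content, and compresses afterwards, rather than hashing the compressed bytes. This hash serves as the identifier for the file in the Git object database.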
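
As a rough illustration of the hashing step, the sketch below follows Git's own convention: SHA-1 over a "blob <size>" header, a null byte, and then the uncompressed content. The function name git_blob_hash is hypothetical, and a mini-git implementation may choose to hash something different.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 hex digest Git would assign to a blob with this content."""
    # Git prefixes the content with "blob <size in bytes>" and a null byte.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


empty_hash = git_blob_hash(b"")
sample_hash = git_blob_hash(b"test content\n")
```

Because the header is included, the result should match what `git hash-object` reports for the same bytes, which is a handy way to check an implementation against real Git.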

Storing the Object:

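The object file is stored in the .mygit/objects directory (Git itself uses .git/objects), inside a subdirectory named after the first two characters of the hash. Splitting objects across subdirectories this way keeps any single directory from accumulating too many files, which makes objects easier to manage and retrieve.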
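
The storage layout can be sketched as follows. This is a minimal example under stated assumptions: it hashes and compresses the content with Git's blob header, and the names store_object and read_object are hypothetical helpers, not the functions used in the mini-git repo.

```python
import hashlib
import zlib
from pathlib import Path


def store_object(content: bytes, objects_dir: Path) -> str:
    """Write content as a compressed object and return its 40-character hash."""
    payload = b"blob %d\0" % len(content) + content
    digest = hashlib.sha1(payload).hexdigest()
    # The first two hash characters name the folder, the remaining 38 name the file.
    folder = objects_dir / digest[:2]
    folder.mkdir(parents=True, exist_ok=True)
    (folder / digest[2:]).write_bytes(zlib.compress(payload))
    return digest


def read_object(digest: str, objects_dir: Path) -> bytes:
    """Load an object by hash and return the original file content."""
    raw = zlib.decompress((objects_dir / digest[:2] / digest[2:]).read_bytes())
    # Drop the "blob <size>" header, which ends at the first null byte.
    _header, _sep, content = raw.partition(b"\0")
    return content
```

A call such as store_object(data, Path(".mygit/objects")) would create the two-character folder on demand and return the hash needed to read the object back later.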

Updating Commit Information:

To demonstrate how files are stored in Git, I implemented commit functionality for a single file:

  1. For the file, I calculated its hash.
  2. Inside the objects folder, a new folder is created, named after the first two characters of the hash.
  3. Inside that folder, a file is created, named after the remaining characters of the hash. This file stores the compressed contents of the committed file.
  4. Changes are detected by comparing the newly calculated hash with the file's previously calculated hash.
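
The hash comparison in the last step can be sketched like this. It is only an illustration: keeping the previous hashes in an in-memory dictionary (last_hashes) is an assumption made for brevity, and the function names are hypothetical rather than taken from the mini-git code.

```python
import hashlib
from typing import Dict


def content_hash(content: bytes) -> str:
    """SHA-1 over Git's blob header plus the uncompressed content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def detect_change(name: str, content: bytes, last_hashes: Dict[str, str]) -> bool:
    """Return True if the file differs from its last recorded version.

    The new hash is recorded so the next call compares against it.
    """
    new_hash = content_hash(content)
    changed = last_hashes.get(name) != new_hash
    last_hashes[name] = new_hash
    return changed
```

Since identical content always yields the same hash, an unchanged file is detected without comparing the contents byte by byte.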

Detecting Changes

I implemented this algorithm based on my own approach; Git itself uses more efficient diff algorithms for this (Myers diff by default).

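  1. Extracted an array of lines from oldContent and from newContent.
  2. Created a Map that stores each line as the key and its index as the value.
  3. Created two new arrays to store the indexes of the lines common to oldContent and newContent.
  4. Treated the indexes missing from those arrays as changes. For example, if oldContent has four lines and its common-line array is [0, 3], then the deleted lines are [1, 2].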
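
The steps above could look roughly like the following in Python. This is a sketch of the idea rather than the original implementation: the function name diff_lines is hypothetical, a dict stands in for the Map, and duplicate lines are handled naively (the last occurrence wins).

```python
from typing import List, Tuple


def diff_lines(old_content: str, new_content: str) -> Tuple[List[int], List[int]]:
    """Return (deleted old line indexes, added new line indexes)."""
    old_lines = old_content.splitlines()
    new_lines = new_content.splitlines()

    # Map each old line to its index; repeated lines keep the last index.
    index_of_old = {line: i for i, line in enumerate(old_lines)}

    # Indexes of lines that appear in both versions.
    old_common = set()
    new_common = set()
    for j, line in enumerate(new_lines):
        if line in index_of_old:
            old_common.add(index_of_old[line])
            new_common.add(j)

    # Anything not common was deleted (old side) or added (new side).
    deleted = [i for i in range(len(old_lines)) if i not in old_common]
    added = [j for j in range(len(new_lines)) if j not in new_common]
    return deleted, added


deleted, added = diff_lines("a\nb\nc\nd", "a\nx\nd")
```

Here the common old indexes are 0 and 3, so lines 1 and 2 of the old content count as deleted, and line 1 of the new content counts as added. The approach ignores line order and moved lines, which is one reason real diff algorithms are more involved.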

GitHub Repo
LinkedIn

Thanks a lot for your time.
