Yesterday, I set out to implement one of Git's core features on my own: how files are stored, what Git objects are, and how hashing and compression work. It took me 4 hours to build, and in this article I'll walk you through my thought process and approach.
What Happens When You Commit a File?
When you commit a file in Git, several important steps occur under the hood:
File Compression:
The file content is compressed with zlib (the DEFLATE algorithm) to reduce its size. This compressed content is what gets stored in the Git object database.
Hash Calculation:
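A SHA-1 hash is generated for the file. Note that Git computes it from the uncompressed content prefixed with a small header ("blob", the content size in bytes, and a null byte), not from the compressed bytes. This hash serves as the identifier for the file in the Git object database.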
Storing the Object:
The object file is stored in the objects directory (.git/objects in Git, .mygit/objects in my implementation), inside a subdirectory named after the first two characters of the hash. This layout keeps individual directories small, which makes objects easier to manage and retrieve.
Updating Commit Information:
To demonstrate how files are stored in Git, I implemented commit functionality for a single file:
- For every file, I calculate its hash.
- Inside the objects folder, a new folder is created, named after the first two characters of the hash.
- A file is created inside that folder, named with the remaining characters of the hash. This file stores the compressed contents of the committed file.
- Changes are detected by comparing the newly calculated hash with the last calculated hash of the file.
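The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than my exact implementation: the function names (hash_blob, store_blob, has_changed) and the repo_dir parameter are made up for this example. It hashes the "blob <size>" header plus the uncompressed content, which is how Git derives blob IDs, and then writes the zlib-compressed bytes under objects/<first two characters>/<remaining characters>.

```python
import hashlib
import zlib
from pathlib import Path
from typing import Optional


def blob_payload(content: bytes) -> bytes:
    """Prefix the content with a Git-style header: b"blob <size>\\0"."""
    return b"blob " + str(len(content)).encode() + b"\0" + content


def hash_blob(content: bytes) -> str:
    """SHA-1 hex digest of the header plus the *uncompressed* content."""
    return hashlib.sha1(blob_payload(content)).hexdigest()


def store_blob(repo_dir: Path, content: bytes) -> str:
    """Write the compressed object to objects/<hash[:2]>/<hash[2:]>.

    repo_dir is a hypothetical repository folder such as Path(".mygit").
    """
    digest = hash_blob(content)
    obj_dir = repo_dir / "objects" / digest[:2]
    obj_dir.mkdir(parents=True, exist_ok=True)
    obj_path = obj_dir / digest[2:]
    if not obj_path.exists():  # identical content is stored only once
        obj_path.write_bytes(zlib.compress(blob_payload(content)))
    return digest


def has_changed(last_hash: Optional[str], content: bytes) -> bool:
    """A file counts as changed when its new hash differs from the last one."""
    return last_hash != hash_blob(content)
```

Calling store_blob(Path(".mygit"), b"hello\n") creates the two-character folder and the object file inside it, and returns the full hash so it can be kept as the "last calculated hash" for the next comparison.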
Detecting Changes
I implemented this algorithm based on my own approach. Git itself uses more efficient algorithms for these operations (its default diff algorithm is Myers).
- Extracted an array of lines from oldContent and from newContent.
- Created a Map with each line as the key and its index as the value.
- Created two new arrays to store the indexes of the common lines in oldContent and newContent.
- For example, if the common-index array for oldContent is [0, 3], then the deleted lines are [1, 2].
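Here is a rough Python sketch of the line comparison described above. The function name diff_lines and the returned (deleted, added) pair are assumptions made for illustration. It also inherits a limitation of the simple map-based idea: a line that appears several times in the old content keeps only its last index, so repeated or reordered lines are not handled the way a real diff algorithm would handle them.

```python
from typing import List, Tuple


def diff_lines(old_content: str, new_content: str) -> Tuple[List[int], List[int]]:
    """Return (deleted old-line indexes, added new-line indexes)."""
    old_lines = old_content.splitlines()
    new_lines = new_content.splitlines()

    # Map each old line to its index (a repeated line keeps its last index).
    old_index = {line: i for i, line in enumerate(old_lines)}

    # Indexes of the lines that appear in both versions.
    old_common: List[int] = []
    new_common: List[int] = []
    for j, line in enumerate(new_lines):
        i = old_index.get(line)
        if i is not None:
            old_common.append(i)
            new_common.append(j)

    # Anything not in the common arrays was deleted (old) or added (new).
    old_seen = set(old_common)
    new_seen = set(new_common)
    deleted = [i for i in range(len(old_lines)) if i not in old_seen]
    added = [j for j in range(len(new_lines)) if j not in new_seen]
    return deleted, added
```

For old content with the lines a, b, c, d and new content with the lines a, d, e, the common indexes in the old content are [0, 3], so the deleted lines are [1, 2] and the only added line is index 2 of the new content.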
Thanks a lot for your time.