lotanna obianefo

Posted on Jun 22

Inside .git, Objects & Hashing

#github #git #ai #devops

Understanding what happens inside the hidden .git directory is one of the most important steps toward mastering Git. It helps explain why Git is fast, reliable, and capable of tracking millions of changes across large software projects.

When you initialize a repository with git init, Git creates a hidden directory called .git/. It contains everything Git needs to track changes, manage versions, store commit history, and maintain the overall state of your project.

Git objects are the building blocks that make up your repository. An object is the basic unit of storage used internally by Git, and together objects allow Git to store and track project history efficiently. Every piece of content and history in a repository is stored as one of these four object types:

Blob: stores file contents
Tree: stores directory structure and filenames
Commit: stores a reference to a snapshot of the repository, plus metadata
Tag: stores an annotated, named reference to another object, typically a release commit

Hashing is the process of converting data into a fixed-length identifier called a hash. Git uses the SHA-1 hashing algorithm by default to generate these identifiers (repositories can optionally be created with SHA-256 instead).
This hash acts like a fingerprint for the content. Identical content always produces the same hash, and if the content changes, even by a single character, Git generates a completely different hash.
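
As a quick illustration of the fingerprint idea, the sketch below hashes two strings that differ by one character, then hashes the first string again. It assumes only that git is installed; the strings are arbitrary examples, and nothing is written to any repository.

```shell
# --stdin makes git hash-object read the content from standard input.
h1=$(printf 'Hello Git\n' | git hash-object --stdin)
h2=$(printf 'Hello Gif\n' | git hash-object --stdin)   # one character changed
h3=$(printf 'Hello Git\n' | git hash-object --stdin)   # same content as h1

echo "original: $h1"
echo "one edit: $h2"
echo "repeated: $h3"
```

The first and third lines of output should match each other, while the second should share no obvious resemblance with them.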

Now let's work through a mini project to see these concepts in practice.

Initialise & Explore the .git Directory

Initialize a new Git repository and examine the hidden .git directory to understand how Git stores and manages commits, branches, configuration files, and other version control data behind the scenes.

1. Initialise a New Repository

This step creates a new working directory, initializes a Git repository using git init, and verifies the repository's state with git status. During initialization, Git generates a hidden .git directory that contains the internal database and configuration files required for version control.

The git init command is the starting point of every Git workflow. It creates the foundational repository structure that Git uses to store objects, track changes, manage branches, and maintain commit history. Without a .git directory, Git cannot record versions, store metadata, or perform any version control operations.

Understanding what git init does behind the scenes helps build a solid foundation for learning Git, as every commit, branch, tag, and repository operation depends on the database structure it creates.

Every software project should begin with a properly initialized repository. Establishing version control from the start promotes consistency, traceability, and effective collaboration throughout the software development lifecycle.

      # Creates a new directory (folder).
       mkdir git-internals-lab

      # Changes the current directory.
       cd git-internals-lab

      # Initializes a new Git repository in the current directory.
       git init

      # Shows the current working state: which files are staged, modified, or untracked.
       git status

2. Tour the .git Directory

This step examines the contents of the hidden .git directory, including key components such as HEAD, config, objects, refs, hooks, and info. It also inspects files like HEAD, which identifies the currently checked-out branch, and config, which stores repository-specific settings.

The .git directory is the core of every Git repository. It contains all the information Git uses to manage version control, including commit history, branches, tags, configuration settings, and object storage. Files such as HEAD act as pointers to the current branch, while the objects directory stores Git's internal database and refs maintains references to branches and tags.

Understanding the structure of the .git directory provides valuable insight into how Git operates behind the scenes. This knowledge helps demystify advanced concepts such as branching, rebasing, reflog inspection, repository recovery, and troubleshooting version control issues.

      # Lists the files and directories inside .git.
       ls .git

      # Displays the contents of Git's HEAD file
       cat .git/HEAD

      # Displays the repository's local configuration file
       cat .git/config

      # lists the contents of the Git refs directory
       ls .git/refs
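
The files read above with cat can also be queried through Git commands, which is often more convenient in scripts. The following is a minimal sketch that runs in a throwaway directory created with mktemp, so it does not touch the lab repository; the variable names are only illustrative.

```shell
cd "$(mktemp -d)"
git init -q

# Same information as `cat .git/HEAD`, without the "ref: " prefix.
head_target=$(git symbolic-ref HEAD)
echo "HEAD points at: $head_target"

# Same settings as `cat .git/config`, printed one key=value pair per line.
git config --local --list

# Location of the repository database relative to the current directory.
git_dir=$(git rev-parse --git-dir)
echo "git dir: $git_dir"
```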

Hashing & Blob Objects

Generate your first SHA hash and observe how Git converts a file into a content-addressed blob object stored in the object database. The SHA hash acts as a unique identifier based on the file's contents, allowing Git to track and retrieve the file efficiently without relying on its filename or location.

1. Create a File and Hash It

Writes "Hello Git" to hello.txt and instructs Git to calculate the file's SHA-1 hash without storing the file as an object. The resulting hash (for example, 9f4d96d5b00d98959ea9960f069585ce42b1349a) serves as a unique content fingerprint that identifies the file based solely on its contents.

Git tracks data by its hash rather than its filename. Identical content always produces the same hash, while even a one-character modification generates an entirely different hash. This content-addressable design makes Git history tamper-evident, as altering a file changes its hash and invalidates every reference that depends on it.

Cryptographic hashing helps ensure the integrity and authenticity of repository data.

       # Writes the text 'Hello Git' (plus a trailing newline) to hello.txt.
        echo 'Hello Git' > hello.txt

       # Computes the hash the file's contents would have as a blob, without storing anything.
        git hash-object hello.txt
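
To see where the hash comes from, it can be reproduced by hand: Git hashes a short header of the form "blob <size>" followed by a NUL byte, and then the raw file bytes. The sketch below assumes a SHA-1 repository format and the sha1sum tool from GNU coreutils (on macOS, shasum -a 1 is the usual substitute); it runs in a temporary directory.

```shell
cd "$(mktemp -d)"
echo 'Hello Git' > hello.txt

# Byte count of the file; the arithmetic expansion strips any padding wc adds.
size=$(($(wc -c < hello.txt)))

# Hash the header "blob <size>\0" followed by the file's bytes.
manual=$({ printf 'blob %s\0' "$size"; cat hello.txt; } | sha1sum | cut -d' ' -f1)

# Ask Git for the same value.
from_git=$(git hash-object hello.txt)

echo "manual: $manual"
echo "git:    $from_git"
```

Both lines should show the same hash, which confirms that the filename plays no part in the calculation.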

2. Write the Blob to the Object Store

The -w flag tells Git to store the file as a blob object in the .git/objects directory. Using find reveals that Git stores the object at .git/objects/first-2-chars/remaining-38-chars, splitting the hash into a directory and filename to prevent a single directory from containing too many files.

An object's hash determines its storage location. Identical files produce the same hash and are stored only once, reducing duplication while allowing Git to efficiently manage millions of objects.

Hash-based directory sharding enables fast object lookups and maintains filesystem performance as repositories grow.

        # Hashes the file and, because of -w, writes it into the object store as a blob.
        git hash-object -w hello.txt

        # Lists every object file under .git/objects.
        find .git/objects -type f

        # Lists the subdirectories of the object store (one per two-character hash prefix, plus info and pack).
        ls .git/objects
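
The storage path can be derived from the hash directly, and writing identical content a second time adds nothing new. The sketch below is one way to check both claims in a temporary repository; copy.txt and the variable names are only illustrative.

```shell
cd "$(mktemp -d)"
git init -q
echo 'Hello Git' > hello.txt
hash=$(git hash-object -w hello.txt)

# The first two hex characters name the directory; the rest name the file.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
path=".git/objects/$dir/$file"
ls -l "$path"

# Storing a copy with a different filename produces the same hash and no new object file.
before=$(($(find .git/objects -type f | wc -l)))
cp hello.txt copy.txt
hash2=$(git hash-object -w copy.txt)
after=$(($(find .git/objects -type f | wc -l)))
echo "object files before: $before, after: $after"
```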

3. Inspect the Blob with cat-file

git cat-file -t displays the object's type (such as blob), while git cat-file -p prints the object's contents in a human-readable format. Replace the example hash with the hash generated in your environment; it will be identical to anyone else's as long as the file contents are byte-for-byte the same.

A blob object stores only the file's contents. It does not contain the filename, directory path, or commit history. This is why multiple files with identical content can reference the same blob object. File names and directory structures are stored separately in tree objects.

Low-level Git commands such as cat-file provide valuable insight into Git's internal data model and can assist with troubleshooting, data recovery, and advanced repository management.

      # Prints the object's type (blob).
      git cat-file -t 9f4d96d5b00d98959ea9960f069585ce42b1349a

      # Pretty-prints the object's contents (the text of hello.txt).
      git cat-file -p 9f4d96d5b00d98959ea9960f069585ce42b1349a
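
If you would rather not copy hashes by hand, the hash can be captured in a shell variable. The sketch below also shows the -s flag (content size in bytes) and that an unambiguous hash prefix is accepted in place of the full hash. It runs in a temporary repository, and restored.txt is just an illustrative filename.

```shell
cd "$(mktemp -d)"
git init -q
echo 'Hello Git' > hello.txt
hash=$(git hash-object -w hello.txt)

type=$(git cat-file -t "$hash")          # object type
size=$(git cat-file -s "$hash")          # content size in bytes
git cat-file -p "$hash" > restored.txt   # content, written back out to a new file

# A short prefix works as long as it matches only one object.
short=$(printf '%s' "$hash" | cut -c1-7)
short_type=$(git cat-file -t "$short")

echo "$type, $size bytes; prefix $short resolves to a $short_type"
```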

Commits, Trees & Snapshots

Stage and commit your file, then observe how Git represents a project snapshot using three core object types: blob, tree, and commit.

In this structure, a blob stores the file contents, a tree represents the directory hierarchy, and a commit ties everything together with metadata such as the author, timestamp, and a reference to the root tree object. This object model is what enables Git’s efficient version tracking and history reconstruction.

1. Stage and Create the First Commit

Configures your Git identity for commit attribution, stages hello.txt, creates the initial commit, and displays the commit history using the log.

During this process, Git builds its internal object graph. A blob stores the file content, a tree represents the directory structure, and a commit object references the tree while also capturing metadata such as author, timestamp, and message.

Specifically, git add writes the file content as a blob and records it in the staging index, while git commit converts the staged index into a tree snapshot and then creates a commit object pointing to that tree. Together, these three objects form a single atomic snapshot of the repository state, which becomes the fundamental unit of Git history.

This supports Operational Excellence by enabling atomic, immutable snapshots that allow safe rollbacks, efficient debugging, and reliable history traversal like bisecting changes.

      # Sets Git configuration email.
      git config --global user.email 'you@example.com'

      # Sets Git configuration name.
      git config --global user.name 'Your Name'

      # Stages the specified file(s) for the next commit.
      git add hello.txt

      # Creates a new commit with all staged changes and the message after -m.
      git commit -m 'Initial commit'

      # Shows the commit history, condensed to one line per commit.
      git log --oneline
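
The division of labour between git add and git commit can be made visible with Git's lower-level plumbing commands. The following is a rough sketch of the same steps performed by hand in a temporary repository, not a description of how git commit is implemented internally. The identity is supplied through environment variables so that no configuration is changed.

```shell
cd "$(mktemp -d)"
git init -q

# Identity for this sketch only.
export GIT_AUTHOR_NAME='Your Name' GIT_AUTHOR_EMAIL='you@example.com'
export GIT_COMMITTER_NAME='Your Name' GIT_COMMITTER_EMAIL='you@example.com'

echo 'Hello Git' > hello.txt
git add hello.txt                                      # writes the blob and records it in the index

tree=$(git write-tree)                                 # index -> tree object
commit=$(git commit-tree "$tree" -m 'Initial commit')  # tree + metadata -> commit object
git update-ref "$(git symbolic-ref HEAD)" "$commit"    # point the current branch at the new commit

git log --oneline
```

After the last step the repository is in the same kind of state that git commit -m 'Initial commit' would have produced: one blob, one tree, one commit, and a branch pointing at the commit.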

2. Inspect the Commit Object

git rev-parse HEAD outputs the SHA-1 (or SHA-256) identifier of the current commit. git cat-file -t verifies the object type as a commit, while git cat-file -p displays its full internal structure: the referenced tree SHA, author and committer metadata with timestamps, any parent commit reference(s), and the commit message. Because this is the first commit in the repository, the output contains no parent line.

A commit object itself does not store file contents. Instead, it contains metadata (author, committer, timestamps, parent commit SHA, and message) and a single reference to a tree object, which represents the project’s directory snapshot. That tree, in turn, references blob objects that store the actual file contents.

This supports Security by providing cryptographic identifiers and immutable history, enabling accurate traceability, forensic auditing, and verification of who made each change and when.

      # Prints the full hash of the commit that HEAD resolves to.
      git rev-parse HEAD

      # Prints the type of the object HEAD resolves to (commit).
      git cat-file -t HEAD

      # Pretty-prints the commit object: tree, author, committer, and message.
      git cat-file -p HEAD
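
A commit's identifier is computed the same way as a blob's, only with a "commit <size>" header, which is why changing any field of a commit changes its hash. The sketch below recomputes the hash of HEAD by hand in a temporary repository. It assumes a SHA-1 repository and GNU sha1sum, and the identity values are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='Your Name' GIT_AUTHOR_EMAIL='you@example.com'
export GIT_COMMITTER_NAME='Your Name' GIT_COMMITTER_EMAIL='you@example.com'

echo 'Hello Git' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

# The "tree" line of the commit names the root tree of the snapshot.
tree_line=$(git cat-file -p HEAD | sed -n 's/^tree //p')

# Recompute the commit hash: header "commit <size>\0" followed by the raw commit body.
size=$(git cat-file -s HEAD)
manual=$({ printf 'commit %s\0' "$size"; git cat-file commit HEAD; } | sha1sum | cut -d' ' -f1)

echo "tree:   $tree_line"
echo "manual: $manual"
echo "git:    $(git rev-parse HEAD)"
```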

3. Inspect the Tree Object

git ls-tree displays the contents of a tree object, which represents a snapshot of a directory in Git. The TREE variable holds the SHA of the root tree referenced by the HEAD commit, and git cat-file -p prints its entries in the format: file mode (e.g., 100644), object type (blob or tree), object SHA, and filename (e.g., hello.txt).

A tree object functions as a mapping between filenames and their corresponding object hashes. It can point either to blob objects (files) or to other tree objects (subdirectories), thereby reconstructing the project’s directory hierarchy. This is the mechanism through which Git restores filenames and directory structure on top of its content-addressable storage. The file mode is one of a small fixed set of values rather than a full set of Unix permissions; the most common are 100644 for a regular file, 100755 for an executable file, 120000 for a symbolic link, and 040000 for a subdirectory.

This supports Operational Excellence by enabling deep inspection of repository structure, which is critical for debugging diffs, analyzing history, and performing precise, targeted history modifications.

      # Lists the entries of the root tree of the HEAD commit.
      git ls-tree HEAD

      # Stores the SHA of HEAD's root tree in the shell variable TREE.
      # Quoting protects ^ and the braces from shells that treat them specially.
      TREE=$(git rev-parse 'HEAD^{tree}')

      # Prints the entries of the tree object.
      git cat-file -p "$TREE"
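
The lab repository holds a single file, so its root tree has only one blob entry. To see a tree pointing at another tree, the sketch below stages a file inside a subdirectory in a temporary repository; the docs directory and notes.txt are made-up names for illustration.

```shell
cd "$(mktemp -d)"
git init -q
echo 'Hello Git' > hello.txt
mkdir docs
echo 'Notes' > docs/notes.txt
git add hello.txt docs/notes.txt

# Build a tree from the index without creating a commit.
root=$(git write-tree)

# The root tree lists one tree entry (docs) and one blob entry (hello.txt).
git ls-tree "$root"

# -r recurses into subtrees and lists the blobs with their full paths.
git ls-tree -r "$root"

# <tree>:<path> resolves an entry inside a tree to its object hash.
sub=$(git rev-parse "$root:docs")
echo "docs is a $(git cat-file -t "$sub")"
```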

4. Observe Snapshots, Not Deltas

Compares the number of Git objects before and after making a small change to a file. After the second commit, Git typically creates three new objects: a blob containing the updated file content, a tree representing the updated directory structure, and a commit object representing the new repository snapshot. As a result, the total object count increases by three.

Git's architecture is based on immutable snapshots rather than modifying existing objects. Each commit references a complete tree that represents the state of the repository at a specific point in time. When changes are made, Git creates new objects while preserving all previous ones unchanged. This design allows you to instantly check out historical commits, reliably reconstruct repository states, and maintain a highly resilient version history.

This supports Reliability by ensuring that every commit is an immutable, independently recoverable snapshot, providing robust protection against data loss and enabling safe rollback to any previous state.

      # Counts the object files currently in the object store.
       find .git/objects -type f | wc -l

      # Appends a line to hello.txt (>> appends, while > would overwrite).
       echo 'Another line' >> hello.txt

      # Stages the specified file(s) for the next commit.
       git add hello.txt

      # Creates a new commit with all staged changes and the message after -m.
       git commit -m 'Updated file'

      # Counts the object files again; expect three more than before.
       find .git/objects -type f | wc -l
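
Storing full snapshots does not mean storing every file again. A file that did not change between two commits is represented by the same blob in both trees. The sketch below illustrates this in a temporary repository; notes.txt is an illustrative second file and the identity values are placeholders.

```shell
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='Your Name' GIT_AUTHOR_EMAIL='you@example.com'
export GIT_COMMITTER_NAME='Your Name' GIT_COMMITTER_EMAIL='you@example.com'

echo 'Hello Git' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

echo 'Second file' > notes.txt
git add notes.txt
git commit -q -m 'Add notes'

# hello.txt did not change, so both snapshots point at the same blob.
old_blob=$(git rev-parse 'HEAD~1:hello.txt')
new_blob=$(git rev-parse 'HEAD:hello.txt')

# The root trees differ because the second one has an extra entry.
old_tree=$(git rev-parse 'HEAD~1^{tree}')
new_tree=$(git rev-parse 'HEAD^{tree}')

echo "blob: $old_blob -> $new_blob"
echo "tree: $old_tree -> $new_tree"
```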

Refs, HEAD & Publishing to GitHub

Trace the HEAD → Branch → Commit → Tree → Blob reference chain to understand how Git locates and reconstructs a project snapshot. Then push the repository to GitHub and observe how Git transfers all required objects (commits, trees, and blobs) to the remote repository, preserving the complete history and structure of your project.

1. Trace HEAD and Branch Refs

Reads the HEAD file, which typically contains a reference such as ref: refs/heads/main, and then reads the corresponding branch reference file, which stores the SHA of the latest commit. Running git rev-parse HEAD and git rev-parse main returns the same commit SHA, demonstrating how Git resolves references through a chain of indirection. If your Git names the initial branch master (controlled by the init.defaultBranch setting), substitute master for main in the commands below.

HEAD is a symbolic reference that points to a branch rather than directly to a commit. The branch itself is a lightweight reference that stores the SHA of the latest commit. When a new commit is created, Git generates a new commit object and updates the branch reference to point to it, while HEAD continues to reference the branch. This design makes branches inexpensive to create and update, and also explains detached HEAD state, where HEAD points directly to a commit instead of a branch.

Understanding how HEAD and branch references work helps prevent common Git issues such as detached HEAD states, lost commits, and accidental branch overwrites.

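      # Shows what HEAD points to (normally a line such as "ref: refs/heads/main").
       cat .git/HEAD

      # Shows the commit SHA stored in the main branch reference.
       cat .git/refs/heads/main

      # Resolves HEAD to a commit SHA.
       git rev-parse HEAD

      # Resolves the main branch to a commit SHA; it matches the previous output.
       git rev-parse main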
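
The behaviour described above can be observed directly: a new commit moves the branch while the HEAD file stays the same, and detaching HEAD replaces the symbolic reference with a raw commit hash. The sketch below assumes the default files-based reference storage and runs in a temporary repository; it looks up the branch name rather than assuming main.

```shell
cd "$(mktemp -d)"
git init -q
export GIT_AUTHOR_NAME='Your Name' GIT_AUTHOR_EMAIL='you@example.com'
export GIT_COMMITTER_NAME='Your Name' GIT_COMMITTER_EMAIL='you@example.com'

echo 'Hello Git' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

branch=$(git symbolic-ref --short HEAD)   # main or master, depending on configuration
first=$(git rev-parse "$branch")
head_before=$(cat .git/HEAD)

echo 'Another line' >> hello.txt
git commit -q -am 'Updated file'
second=$(git rev-parse "$branch")         # the branch now points at the new commit
head_after=$(cat .git/HEAD)               # HEAD still says "ref: refs/heads/<branch>"

# Detach: HEAD now stores a commit hash directly instead of a branch reference.
git checkout -q --detach
head_detached=$(cat .git/HEAD)

echo "HEAD before: $head_before"
echo "HEAD after:  $head_after"
echo "detached:    $head_detached"
```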

2. Set Up SSH Auth and Add a Remote

Generates an Ed25519 SSH key pair and adds the public key to GitHub (Settings → SSH and GPG keys) to enable secure authentication.
Next, registers a GitHub repository as the origin remote. Before doing so, create an empty repository on GitHub and avoid initializing it with a README, .gitignore, or license file to prevent conflicts with your local repository.

A remote is a named reference to another Git repository, allowing Git to exchange commits and objects between locations. Ed25519 SSH keys are a widely recommended choice for Git authentication, providing strong cryptographic security, fast key operations, and password-free access to GitHub. Once the origin remote is configured, Git knows where to send (push) and retrieve (pull) repository data.

SSH key-based authentication replaces password-based workflows with cryptographic identity verification, improving both security and automation compatibility.

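      # Generates a new Ed25519 SSH key pair; -C attaches a comment (here, your email) to the key.
       ssh-keygen -t ed25519 -C 'you@example.com'

      # Prints the public key so you can copy it into GitHub.
       cat ~/.ssh/id_ed25519.pub

      # Registers the GitHub repository as the remote named origin.
       git remote add origin git@github.com:<YOUR_USERNAME>/git-internals-lab.git

      # Lists the configured remotes with their fetch and push URLs.
      git remote -v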
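
Adding a remote contacts no server; it only writes two settings into .git/config. The sketch below shows them in a temporary repository. The username example-user is a placeholder chosen for illustration, not a real account you are expected to use.

```shell
cd "$(mktemp -d)"
git init -q

# Placeholder URL; no network access happens at this point.
git remote add origin git@github.com:example-user/git-internals-lab.git

# The URL used for fetching and pushing.
url=$(git config --get remote.origin.url)

# The refspec that maps the remote's branches to local remote-tracking refs.
fetch=$(git config --get remote.origin.fetch)

echo "url:   $url"
echo "fetch: $fetch"
```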

3. Push Every Object to GitHub

Uploads all reachable Git objects (blobs, trees, and commits) to the remote repository and updates the remote branch reference (e.g., main) to point to your latest commit. The -u flag sets origin/main as the upstream tracking branch, enabling simplified future push and pull operations. git ls-remote can be used to inspect and verify the current references stored on the remote.

A git push operation is fundamentally object replication combined with a reference update. Git ensures that the remote repository stores the exact same objects as your local repository, identified by identical SHA-1 (or SHA-256) hashes. As a result, the commit history, file contents, and directory structure remain fully consistent across environments, meaning GitHub reconstructs the same tree and blob relationships you have locally.

This supports Reliability by leveraging Git’s distributed architecture, where every clone contains a complete, self-contained history, enabling redundancy, integrity, and full recoverability of the repository state.

      # Uploads your local commits to the remote repository.
      git push -u origin main

      # Lists the references (branches and tags) stored on the remote, with the commit SHA each points to.
      git ls-remote origin
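
The same replication can be tried without a GitHub account by pushing to a local bare repository. In the sketch below, remote.git is an assumed stand-in for the GitHub repository; everything else mirrors the push above, and the identity values are placeholders.

```shell
cd "$(mktemp -d)"
git init -q --bare remote.git   # local stand-in for the GitHub repository
git init -q local
cd local
export GIT_AUTHOR_NAME='Your Name' GIT_AUTHOR_EMAIL='you@example.com'
export GIT_COMMITTER_NAME='Your Name' GIT_COMMITTER_EMAIL='you@example.com'

echo 'Hello Git' > hello.txt
git add hello.txt
git commit -q -m 'Initial commit'

branch=$(git symbolic-ref --short HEAD)
git remote add origin ../remote.git
git push -q -u origin "$branch"

# The remote branch now points at the same commit as the local one.
local_sha=$(git rev-parse HEAD)
remote_sha=$(git ls-remote origin "refs/heads/$branch" | cut -f1)

# The blob behind hello.txt also exists in the remote's object database.
blob=$(git rev-parse HEAD:hello.txt)
remote_type=$(git --git-dir=../remote.git cat-file -t "$blob")

echo "local:  $local_sha"
echo "remote: $remote_sha"
echo "remote has $blob as a $remote_type"
```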

Although Git appears simple on the surface, it is powered by a sophisticated object database and hashing system. Every file, directory, and commit is stored as an object identified by a unique hash, allowing Git to efficiently track changes and maintain repository integrity.
