Git Plumbing in Practice: How CI, Review Tools, and AI Agents Build on Git's Primitives

#webdev #tutorial

Run git log and you're using porcelain, which is literally what Git's own manual calls its human-facing commands. Run git cat-file -p HEAD and you've dropped into plumbing: the low-level toolkit that porcelain itself is built from. The split is not trivia. It's the reason a whole generation of developer tools (CI runners, stacked-diff CLIs, code review systems, and now AI coding agents) builds on Git rather than reinventing version control. They all program against the same small set of primitives, and once you can read those primitives, most "magic" tooling behavior becomes legible.

The Object Model Is Smaller Than You Think

Git's entire object model is four object types, stored under .git/objects as loose files or inside packfiles:

Blob: file contents and nothing else. No filename, no permissions.
Tree: one directory listing, storing a mode, a name, and an object ID per entry. git ls-tree also prints each entry's type, which Git derives from the mode.
Commit: a pointer to exactly one tree, zero or more parent commits, an author, a committer, and a message.
Annotated tag: a named (optionally signed) pointer to another object.

Every object is content-addressed: its ID is the SHA-1 hash of its type, size, and bytes (SHA-256 has been an opt-in repository format since Git 2.29). Identical content always hashes to the same ID, which is why a file left unchanged across 500 commits is stored as one blob referenced 500 times, not 500 copies.
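To see the hashing rule in action, the shell sketch below recomputes a blob ID by hand and compares it with git hash-object. It assumes a default SHA-1 repository and either sha1sum (GNU coreutils) or shasum (macOS) on the PATH; the helper names sha1_hex and manual_blob_id are illustrative, not part of Git.

```shell
set -eu

# SHA-1 of stdin as bare hex: GNU coreutils ships sha1sum, macOS ships shasum.
sha1_hex() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi | cut -d' ' -f1
}

# Recompute a blob ID by hand: hash the header "blob <size>\0", then the raw bytes.
manual_blob_id() {
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1_hex
}

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo"
printf 'hello\n' > greeting.txt

manual=$(manual_blob_id greeting.txt)
official=$(git hash-object greeting.txt)   # no -w: compute the ID, write nothing
echo "by hand: $manual"
echo "git:     $official"   # both print ce013625030ba8dba906f756967f9e9ca394464a
```

Change a single byte in greeting.txt and both IDs change together, which is exactly why identical content can be stored once.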

On top of objects sit refs, and a ref is almost embarrassingly simple: as long as it hasn't been packed, cat .git/refs/heads/main prints 40 hex characters (64 in a SHA-256 repository). A branch is a text file containing a commit ID. HEAD is typically a one-line file pointing at one of those refs, such as ref: refs/heads/main.
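A quick way to confirm that refs really are files is to read one without Git's help. This is a sketch for a repository on the default files backend (not reftable); read_branch_from_disk is a hypothetical helper, and it falls back to .git/packed-refs because git pack-refs can move a branch out of its loose file. Real scripts should call git rev-parse instead.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Read a branch's commit ID straight from disk: loose ref file first, then
# packed-refs. Illustration only; it knows nothing about the reftable backend.
read_branch_from_disk() {
  common_dir=$(git rev-parse --git-common-dir)
  ref="refs/heads/$1"
  if [ -f "$common_dir/$ref" ]; then
    cat "$common_dir/$ref"
  elif [ -f "$common_dir/packed-refs" ]; then
    awk -v r="$ref" '$2 == r { print $1 }' "$common_dir/packed-refs"
  fi
}

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo"
git symbolic-ref HEAD refs/heads/main   # name the unborn branch explicitly
git commit -q --allow-empty -m "first"

cat .git/HEAD                 # ref: refs/heads/main
read_branch_from_disk main    # matches `git rev-parse main`
git pack-refs --all           # moves the loose ref into .git/packed-refs
read_branch_from_disk main    # same ID, now found in packed-refs
```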

You can verify all of this in under a minute:

git cat-file -p HEAD          # the raw commit object
git cat-file -p HEAD^{tree}   # the tree it points to
git ls-tree HEAD src/         # one directory's entries
git rev-parse HEAD            # resolve any ref to an ID

These are plumbing commands, and they come with a contract porcelain doesn't offer: their output formats stay stable and script-friendly across Git versions, while git log's formatting is allowed to drift. That stability guarantee is what third-party tools build against.
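As an illustration of scripting against plumbing rather than porcelain, this sketch reads a commit's parents straight from the raw object that git cat-file commit prints. The commit_parents helper and the throwaway repository are made up for the demo; git rev-parse with the ^@ suffix is shown alongside as the built-in way to get the same list.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Parents of a commit, read from the raw object: headers run until the first
# blank line, and each "parent <id>" header names one parent, in order.
commit_parents() {
  git cat-file commit "$1" | sed -n '/^$/q; s/^parent //p'
}

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo"
git commit -q --allow-empty -m "A"
a=$(git rev-parse HEAD)
git commit -q --allow-empty -m "B"
b=$(git rev-parse HEAD)
# Build a two-parent commit directly with plumbing (same empty tree as B).
m=$(git commit-tree "$(git rev-parse 'HEAD^{tree}')" -p "$a" -p "$b" -m "merge A and B")

commit_parents "$m"    # prints $a, then $b
git rev-parse "$m^@"   # the same list via rev-parse's parent shorthand
```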

Reading .git is safe; writing into it by hand is not. Refs get compacted into .git/packed-refs, and repositories on Git 2.45+ can use the reftable backend, where loose files under refs/heads/ are not where Git keeps refs at all. Even on the files backend, a hash you echo into place skips locking and leaves no reflog entry. Go through git update-ref and git symbolic-ref for refs, which handle packing, locks, and reflogs for you, and git hash-object -w for objects.
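Here is roughly what going through git update-ref buys you, sketched in a disposable repository (the branch name topic and the reflog messages are arbitrary). Passing an old value turns the write into a compare-and-swap: an empty old value means the ref must not exist yet, and a stale one makes the update fail instead of clobbering someone else's change.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo"
git commit -q --allow-empty -m "one"
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m "two"
second=$(git rev-parse HEAD)

# Create refs/heads/topic only if it does not exist yet (empty old value).
git update-ref -m "create topic" refs/heads/topic "$first" ""

# Compare-and-swap: advance topic only if it still points at $first.
git update-ref -m "advance topic" refs/heads/topic "$second" "$first"

# A stale expectation is rejected instead of silently overwriting the ref.
if git update-ref refs/heads/topic "$first" "$first" 2>/dev/null; then
  echo "unexpected: stale update went through" >&2
  exit 1
fi

git reflog show --format=%gs refs/heads/topic   # advance topic, create topic
git symbolic-ref HEAD                            # HEAD itself is a symbolic ref
```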

What Real Tools Do With These Primitives

Once you hold the object model in your head, existing tools stop looking like magic and start looking like four primitives composed differently.

CI systems tune object transfer, not checkouts. GitHub Actions' checkout action defaults to fetch-depth: 1, a shallow clone that fetches only the objects reachable from a single commit instead of full history. On a long-lived repository that's the difference between transferring one tree's worth of blobs and every blob ever written. Partial clone (--filter=blob:none) makes a different cut: it keeps the full commit and tree history but defers blob downloads until a checkout or diff actually needs them.
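The effect of a shallow fetch is easy to measure locally. This sketch builds a three-commit repository and clones it twice over a file:// URL (with a plain path, Git copies the repository directly and ignores --depth); object_count is a hypothetical helper that sums loose and packed objects from git count-objects -v. Exact counts depend on the history, so only the comparison matters.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Loose plus packed object count for a repository.
object_count() {
  git -C "$1" count-objects -v | awk '/^(count|in-pack):/ { n += $2 } END { print n }'
}

demo_dir=$(mktemp -d)
src="$demo_dir/src"
git init -q "$src"
for i in 1 2 3; do
  echo "revision $i" > "$src/data.txt"
  git -C "$src" add data.txt
  git -C "$src" commit -q -m "revision $i"
done

# file:// forces Git's transfer protocol so --depth is honored.
git clone -q "file://$src" "$demo_dir/full"
git clone -q --depth 1 "file://$src" "$demo_dir/shallow"

echo "full clone:    $(object_count "$demo_dir/full") objects"
echo "shallow clone: $(object_count "$demo_dir/shallow") objects"
git -C "$demo_dir/shallow" rev-parse --is-shallow-repository   # true
git -C "$demo_dir/shallow" rev-list --count HEAD               # 1
```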

Stacked-diff tools are ref editors. Graphite, ghstack, and git-branchless manage a stack by writing new commit objects and pointing branch refs at them: a restack gives each commit a new parent, and a new tree as well whenever an ancestor's content changed. There's no second storage engine for your code; the stack is a set of refs plus a dependency order the tool tracks.
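A sketch of the simplest restack case: the bottom commit of a two-commit stack gets a new message but the same tree, so the commit above it can be copied onto the new parent unchanged, using only git commit-tree and git update-ref. The branch names part-1 and part-2 and the reparent_commit helper are hypothetical; when a parent's content actually changes, real tools must also compute a rebased tree, which this sketch does not attempt.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: copy a commit onto a new parent, keeping its tree,
# raw message, and authorship. Only valid when the new parent has the same
# content as the old one.
reparent_commit() {
  old=$1
  new_parent=$2
  git cat-file commit "$old" | sed '1,/^$/d' |
    GIT_AUTHOR_NAME=$(git show -s --format=%an "$old") \
    GIT_AUTHOR_EMAIL=$(git show -s --format=%ae "$old") \
    GIT_AUTHOR_DATE=$(git show -s --format=%ad --date=raw "$old") \
    git commit-tree "$(git rev-parse "$old^{tree}")" -p "$new_parent"
}

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo"

# A two-commit stack: part-1, then part-2 on top of it.
echo one > a.txt && git add a.txt && git commit -q -m "part 1"
git update-ref refs/heads/part-1 HEAD
echo two > b.txt && git add b.txt && git commit -q -m "part 2"
git update-ref refs/heads/part-2 HEAD

# Reword part-1: same tree, new message, therefore a new commit ID.
old_p1=$(git rev-parse part-1)
new_p1=$(echo "part 1 (reworded)" | git commit-tree "$(git rev-parse 'part-1^{tree}')")
git update-ref refs/heads/part-1 "$new_p1" "$old_p1"

# Restack: part-2 still sits on the old part-1, so copy it onto the new one.
old_p2=$(git rev-parse part-2)
new_p2=$(reparent_commit "$old_p2" "$new_p1")
git update-ref refs/heads/part-2 "$new_p2" "$old_p2"
git log --oneline part-2   # "part 2" on top of "part 1 (reworded)"
```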

Code review systems use Git as their database. Gerrit stores every patchset under a refs/changes/... namespace, and since its NoteDb migration it keeps review data inside the repository itself: each change has a meta ref whose commits record votes and status, and inline comments are kept in git notes format. Notes attach metadata to a commit without changing the commit's hash. Replicating the review database is a git fetch.
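To see the property that makes notes useful for review metadata, this sketch attaches a note to a commit and checks that the commit ID does not move. The note text imitates a vote but is made up; it is not Gerrit's NoteDb format.

```shell
set -eu
# Throwaway identity so commits (including notes commits) work with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo"
git commit -q --allow-empty -m "feature work"

before=$(git rev-parse HEAD)
# Attach review metadata; the vote text is a hypothetical example.
git notes add -m "Code-Review: +2 (hypothetical reviewer)" HEAD
after=$(git rev-parse HEAD)

git notes show HEAD                 # the note text
git log -1 --format='%H%n%N' HEAD   # commit ID, then its note
git rev-parse refs/notes/commits    # notes live on their own ref
```

Notes live under refs/notes/commits and are not fetched by default, so replicating them takes an explicit refspec such as git fetch origin 'refs/notes/*:refs/notes/*'.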

Alternative frontends keep the Git backend. Jujutsu (jj) replaces the index and branching UX entirely but reads and writes standard Git object storage, so you can run jj locally while collaborators see an ordinary Git repo. Libraries like libgit2 (C), gitoxide (Rust), go-git, and isomorphic-git (JavaScript, runs in the browser) reimplement object and ref access without shelling out to a git binary — which is how browser-based editors clone and diff without a server-side checkout.

AI Coding Agents Treat Git as a Snapshot Engine

The newest tenants on the plumbing are coding agents, and they lean on two properties: cheap isolation and cheap snapshots.

git worktree add gives one repository multiple working directories that share a single object database, each checked out to its own branch. That's the standard isolation move for agents: Claude Code, for one, can run subagents in disposable worktrees so parallel edits can't clobber each other, then remove a worktree that produced no changes. Spinning one up costs a checked-out directory and a little bookkeeping under .git/worktrees, not a second clone.
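The isolation pattern can be sketched in a few lines of shell. Everything here is illustrative: the agent-1 and agent-2 branch names and the discard_if_unchanged helper are hypothetical and do not describe how any particular agent implements cleanup. The sketch treats a worktree as unchanged when git status --porcelain is empty and HEAD has not moved.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Absolute, symlink-free path of the object store a checkout uses.
object_store_of() {
  (cd "$1" && cd "$(git rev-parse --git-common-dir)" && pwd -P)
}

# Hypothetical cleanup: drop a worktree and its branch only if it has no
# uncommitted or untracked files and no commits past `start`.
discard_if_unchanged() {
  repo=$1 wt=$2 branch=$3 start=$4
  if [ -z "$(git -C "$wt" status --porcelain)" ] &&
     [ "$(git -C "$wt" rev-parse HEAD)" = "$start" ]; then
    git -C "$repo" worktree remove "$wt"
    git -C "$repo" update-ref -d "refs/heads/$branch"
    echo "removed $wt"
  else
    echo "kept $wt"
  fi
}

demo_dir=$(mktemp -d)
repo="$demo_dir/repo"
git init -q "$repo"
git -C "$repo" commit -q --allow-empty -m "base"
start=$(git -C "$repo" rev-parse HEAD)

# One worktree per agent, each on its own branch, sharing one object store.
git -C "$repo" worktree add -q -b agent-1 "$demo_dir/agent-1" "$start"
git -C "$repo" worktree add -q -b agent-2 "$demo_dir/agent-2" "$start"
echo "draft" > "$demo_dir/agent-2/notes.txt"   # only agent-2 produced output

object_store_of "$demo_dir/agent-1"   # same path as for the main checkout
discard_if_unchanged "$repo" "$demo_dir/agent-1" agent-1 "$start"   # removed
discard_if_unchanged "$repo" "$demo_dir/agent-2" agent-2 "$start"   # kept
```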

Snapshots fall out of content addressing. git hash-object -w writes any file into the object store; point the GIT_INDEX_FILE environment variable at a scratch index, stage everything into it with git add -A, and git write-tree captures the entire working state as a tree ID, without touching your real index, your branches, or your history. An agent that wants a checkpoint between every edit doesn't need to invent a journaling format. The repository already is one: deduplicated, addressable by hash, and append-only until git gc prunes objects that nothing references.
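A minimal version of that checkpoint, assuming you only need the tree ID and not a commit: snapshot_tree (a hypothetical helper) stages the work tree into a scratch index and prints the tree that git write-tree produces. Running the function body in a subshell keeps GIT_INDEX_FILE from leaking into the caller, and seeding the scratch index from HEAD keeps files that are tracked but also match .gitignore.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Capture the working directory (tracked and untracked, minus ignored files)
# as a tree ID, using a scratch index so the real index is never modified.
snapshot_tree() (
  scratch=$(mktemp -d)
  trap 'rm -rf "$scratch"' EXIT
  GIT_INDEX_FILE="$scratch/index"
  export GIT_INDEX_FILE
  # Seed from HEAD when there is one, so tracked-but-ignored files survive.
  if git rev-parse -q --verify HEAD >/dev/null; then
    git read-tree HEAD
  fi
  git add -A       # writes a blob (as hash-object -w would) for each new or changed file
  git write-tree   # prints the ID of the tree described by the scratch index
)

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo"
echo "v1" > app.txt
git add app.txt
git commit -q -m "v1"

echo "v2" > app.txt        # unstaged edit
echo "scratch" > new.txt   # untracked file
index_before=$(git ls-files --stage)

tree=$(snapshot_tree)
git ls-tree -r --name-only "$tree"   # app.txt, new.txt
git cat-file -p "$tree:app.txt"      # v2
git status --short                   # still " M app.txt" and "?? new.txt"
```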

The practical payoff when you evaluate agentic tools: ask how they isolate work and how they snapshot it. A tool that answers "worktrees and trees" inherits Git's guarantees — git diff works, git fsck works, your existing recovery muscle memory works. A tool that answers with a proprietary sidecar format makes you learn a second recovery model for the day something goes wrong.

Where to Start Building

You don't need libgit2 bindings to get value from the plumbing. Three small projects, in ascending order of effort:

A repo inspector. Pipe git for-each-ref and git cat-file --batch into a script that answers a question your team actually has — say, which branches still contain a leaked config blob. cat-file --batch is built for this: object IDs in on stdin, parsed objects out on stdout, one process for thousands of lookups.
A deploy hook. A bare repository plus GIT_WORK_TREE=/srv/app git checkout -f inside a post-receive hook is a complete push-to-deploy pipeline in roughly five lines of shell. Heroku-style deploys worked this way, and it still holds up for a single server.
A snapshot tool. Combine an alternate GIT_INDEX_FILE, git add -A, git write-tree, and git commit-tree to checkpoint a directory on a timer, recording each checkpoint under refs/snapshots/ with git update-ref. You get deduplicated, diffable backups with zero new dependencies.
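For the repo inspector, a rough starting point: blob_ids_at_path and branches_containing_blob are hypothetical helpers, config.env stands in for whatever file leaked, and the sketch uses git cat-file --batch-check, the headers-only sibling of --batch, because it only needs object types. Paths containing newlines would confuse the line-based parsing.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Every blob ID, reachable from any ref, that was ever stored at PATH.
# rev-list --objects prints "<id> <path>"; cat-file --batch-check adds the
# object type and echoes the path back through %(rest).
blob_ids_at_path() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(rest)' |
    awk -v p="$1" '$1 == "blob" { id = $2; sub(/^[^ ]+ [^ ]+ /, ""); if ($0 == p) print id }' |
    sort -u
}

# Local branches whose history contains the given blob ID.
branches_containing_blob() {
  git for-each-ref --format='%(refname)' refs/heads/ |
    while read -r ref; do
      if git rev-list --objects "$ref" </dev/null | cut -d' ' -f1 | grep -qx "$1"; then
        echo "${ref#refs/heads/}"
      fi
    done
}

demo_dir=$(mktemp -d)
git init -q "$demo_dir/repo"
cd "$demo_dir/repo"
git symbolic-ref HEAD refs/heads/main
echo "app" > app.txt
git add app.txt
git commit -q -m "init"

# A hypothetical leak: committed on one branch, then deleted, but still in history.
git checkout -q -b leaky
echo "API_KEY=not-a-real-key" > config.env
git add config.env
git commit -q -m "add config"
git rm -q config.env
git commit -q -m "remove config"
git checkout -q main

for blob in $(blob_ids_at_path config.env); do
  echo "config.env blob $blob is reachable from:"
  branches_containing_blob "$blob"   # leaky
done
```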
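A sketch of the deploy hook, assuming a bare repository whose pushes to main should land in a deploy directory. install_deploy_hook is a hypothetical helper that writes the hook; the demo deploys into a temporary directory, where a real server would use a path such as /srv/app and probably add a build or restart step after the checkout.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical installer: a post-receive hook that force-checks-out `main`
# into the deploy directory after every push to main. The path is baked in
# at install time and must not contain a single quote.
install_deploy_hook() {
  bare=$1
  deploy_dir=$2
  mkdir -p "$deploy_dir"
  cat > "$bare/hooks/post-receive" <<EOF
#!/bin/sh
# Git runs this inside the bare repo; stdin carries "<old> <new> <refname>" lines.
while read -r old new ref; do
  if [ "\$ref" = "refs/heads/main" ]; then
    GIT_WORK_TREE='$deploy_dir' git checkout -q -f main
  fi
done
EOF
  chmod +x "$bare/hooks/post-receive"
}

demo_dir=$(mktemp -d)
git init -q --bare "$demo_dir/site.git"
install_deploy_hook "$demo_dir/site.git" "$demo_dir/srv-app"

git init -q "$demo_dir/dev"
cd "$demo_dir/dev"
echo "<h1>v1</h1>" > index.html
git add index.html
git commit -q -m "v1"
git push -q "$demo_dir/site.git" HEAD:refs/heads/main
cat "$demo_dir/srv-app/index.html"   # <h1>v1</h1>
```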
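And a sketch of the snapshot tool. checkpoint is a hypothetical helper, refs/snapshots/work an arbitrary ref name, and the scratch index starts empty, so files matched by .gitignore are left out of every snapshot. Each checkpoint is a commit whose parent is the previous one, and an unchanged directory is skipped by comparing tree IDs.

```shell
set -eu
# Throwaway identity so commits work on a machine with no git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical checkpoint: stage the work tree into a scratch index, write a
# tree, and commit it onto refs/snapshots/<name>, skipping no-op checkpoints.
checkpoint() (
  ref="refs/snapshots/$1"
  scratch=$(mktemp -d)
  trap 'rm -rf "$scratch"' EXIT
  GIT_INDEX_FILE="$scratch/index"
  export GIT_INDEX_FILE
  git add -A
  tree=$(git write-tree)
  parent=$(git rev-parse -q --verify "$ref" || true)
  if [ -n "$parent" ] && [ "$(git rev-parse "$parent^{tree}")" = "$tree" ]; then
    exit 0   # same tree ID as the last checkpoint, so nothing changed
  fi
  commit=$(echo "checkpoint $(date -u +%Y-%m-%dT%H:%M:%SZ)" |
    git commit-tree "$tree" ${parent:+-p "$parent"})
  # Empty old value means "must not exist yet"; otherwise compare-and-swap.
  git update-ref -m "checkpoint" "$ref" "$commit" "$parent"
  echo "$commit"
)

demo_dir=$(mktemp -d)
git init -q "$demo_dir/work"
cd "$demo_dir/work"
echo "draft" > notes.txt
first=$(checkpoint work)
second=$(checkpoint work)   # empty: nothing changed
echo "edited" > notes.txt
third=$(checkpoint work)
git log --format='%h %s' refs/snapshots/work   # two checkpoints, newest first
```

To run it on a timer, a loop such as while sleep 300; do checkpoint work >/dev/null; done (or a cron entry) is enough, and git diff refs/snapshots/work~1 refs/snapshots/work shows what changed between the last two checkpoints.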

For the canonical deep dive, Chapter 10 of Pro Git ("Git Internals") is free online and walks the same ground with full examples; the "Low-level commands" section of man git lists every plumbing command, grouped by what it manipulates or interrogates.

Originally published at pickuma.com. Subscribe to the RSS or follow @pickuma.bsky.social for new reviews.