DNS has no git. I built one.
This is a working prototype, not a finished product. The point of the post is to show that a long-standing gap in how DNS handles its own state is real, and that closing it is more tractable than it looks.
The 2 a.m. question
Pick any on-call channel. Wait long enough. Eventually someone types:
"What did
api.checkout.acme.compoint to ten minutes ago?"
There is no good answer. You grep five log files. You SSH into the secondary. You hope someone's terminal scrollback is still alive. You make a guess, ship the rollback, and write a postmortem with the words "lessons learned" in it for the fourth time this quarter.
We have had a working model for this since 2005. Linus called it git. Twenty-one years later the most critical name-resolution layer on the internet still treats its state as a file you overwrite. No history. No diff. No blame. No rollback. No preview.
That gap is what this project is about.
What I built
The project is two binaries written in Go, race-detector clean, and small enough to read end-to-end in an afternoon.
zonegit is a CLI that looks and feels like git, but the verbs operate on DNS records:
zonegit init foo.com.
zonegit import zone.txt -m "initial import"
zonegit set api.foo.com. A 60 9.9.9.9 -m "scale to new endpoint"
zonegit log
zonegit diff HEAD~1 HEAD
zonegit blame api.foo.com. A
zonegit show api.foo.com. A HEAD~3
That last command, "what did this name resolve to three commits ago?", is not something you can ask any DNS tool shipping today, whether that is BIND, Knot, PowerDNS, or Route 53. Some of them keep a change log in one shape or another. None of them keep the queryable historical state.
zonegitd is a reference authoritative DNS server. It speaks UDP and TCP, sets the AA bit, distinguishes NXDOMAIN from NODATA correctly, returns SOA in the authority section, and answers dig in well under a millisecond by walking a Merkle tree at HEAD.
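The NXDOMAIN/NODATA distinction mentioned above is easy to state and frequently fumbled: NXDOMAIN means the name does not exist at all, NODATA means the name exists but has no RRset of the queried type. A minimal sketch of the decision logic (illustrative only, not zonegitd's actual code):

```go
package main

import "fmt"

// rcodeFor sketches the NXDOMAIN vs NODATA decision. Both cases put the
// SOA in the authority section; only the response code and its meaning
// to caches differ. Hypothetical helper, not zonegitd's implementation.
func rcodeFor(nameExists, typeExists bool) string {
	switch {
	case !nameExists:
		return "NXDOMAIN" // the name has no records of any type
	case !typeExists:
		return "NOERROR/NODATA" // the name exists, just not for this type
	default:
		return "NOERROR"
	}
}

func main() {
	fmt.Println(rcodeFor(false, false)) // NXDOMAIN
	fmt.Println(rcodeFor(true, false))  // NOERROR/NODATA
	fmt.Println(rcodeFor(true, true))   // NOERROR
}
```

Getting this wrong poisons negative caches downstream, which is why it is worth calling out in a reference server.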
zonegitd is not trying to replace BIND or Knot or NSD. Those are decades-hardened pieces of infrastructure and ripping them out is nobody's day-one move. The intended deployment shape is zonegit as the control plane, your existing authoritative server as a downstream secondary over standard AXFR/IXFR. The reference server exists so the demo is honest end-to-end and so greenfield deployments have a path. It is not the adoption vector.
Together they form a versioned, content-addressed, branch-aware control plane for authoritative DNS, running on my laptop today.
The shape of the idea
A DNS zone is a tree of names. An RRset (one (name, type) coordinate, e.g. api.foo.com. A) is a leaf.
In other words, it is a filesystem, with somewhat unusual path separators (. instead of /, read right-to-left).
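The mapping is mechanical enough to fit in a few lines. A sketch (the path layout is illustrative; zonegit's internal representation may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// nameToPath maps a fully qualified DNS name onto a filesystem-style
// path: lowercase, drop the root dot, split on ".", reverse the labels.
func nameToPath(fqdn string) string {
	labels := strings.Split(strings.TrimSuffix(strings.ToLower(fqdn), "."), ".")
	for i, j := 0, len(labels)-1; i < j; i, j = i+1, j-1 {
		labels[i], labels[j] = labels[j], labels[i]
	}
	return strings.Join(labels, "/")
}

func main() {
	fmt.Println(nameToPath("api.foo.com.")) // com/foo/api
}
```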
We know how to version a filesystem. The recipe fits on a napkin:
- Hash the contents of every leaf. That is a blob.
- Hash a sorted list of (name, child-hash) pairs. That is a tree.
- Hash a (tree, parent, author, message, time) tuple. That is a commit.
- Branches are mutable pointers to commits. HEAD is a pointer to a branch.
Equal subtrees get the same hash. A billion-record zone with one A-record change writes a few dozen new nodes, not a billion. Diffing two commits is O(changed-subtrees), not O(zone-size). Time travel is free; every commit is already a complete, immutable snapshot.
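The napkin recipe can be sketched in a few dozen lines. This is a toy illustration of the blob/tree hashing scheme, not zonegit's actual object format:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashBlob hashes leaf content with a type prefix, git-style.
func hashBlob(content []byte) string {
	sum := sha256.Sum256(append([]byte("blob\x00"), content...))
	return hex.EncodeToString(sum[:])
}

// hashTree hashes a sorted list of (child name, child hash) pairs.
// The sort is what makes the hash canonical: equal subtrees hash
// equally regardless of insertion order, so structural sharing and
// O(changed-subtrees) diffing fall out for free.
func hashTree(children map[string]string) string {
	names := make([]string, 0, len(children))
	for name := range children {
		names = append(names, name)
	}
	sort.Strings(names)
	h := sha256.New()
	h.Write([]byte("tree\x00"))
	for _, name := range names {
		fmt.Fprintf(h, "%s=%s\n", name, children[name])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := hashTree(map[string]string{"api": hashBlob([]byte("A 60 1.2.3.4"))})
	b := hashTree(map[string]string{"api": hashBlob([]byte("A 60 1.2.3.4"))})
	fmt.Println(a == b) // equal content, equal hash
}
```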
I knew all of this in theory. The interesting question was whether the model fits when you bolt it onto DNS.
It fits well enough that the surprises were on the upside.
The moment it stopped being theory
Two terminals, side by side. Try it yourself.
Left:
while true; do
printf "%s -> " "$(date +%T)"
dig +short @127.0.0.1 -p 15353 api.foo.com. A
sleep 1
done
14:23:01 -> 1.2.3.4
14:23:02 -> 1.2.3.4
14:23:03 -> 1.2.3.4
Right:
$ zonegit set api.foo.com. A 60 9.9.9.9 -m "fail over to new region"
[main 99f98c225a8a] fail over to new region
Left, on the very next tick:
14:23:04 -> 9.9.9.9
14:23:05 -> 9.9.9.9
No SIGHUP, no zone reload, no SOA-serial bump, no 30-second wait for a secondary to catch up. A commit lands and the server sees the new HEAD on the next packet it answers.
Then:
$ zonegit blame api.foo.com. A
99f98c225a8a ckumar3 <ckumar3@host> fail over to new region
$ zonegit diff HEAD~1 HEAD
~ api A
Who changed it, what changed, and when, each available as a single command.
One disclaimer, because experienced DNS operators will rightly flinch at "instant": the point of this demo is not speed. TTLs and downstream caches still exist for excellent reasons and zonegit does not pretend otherwise. The point is that propagation becomes an explicit, auditable, revertible decision instead of an emergent property of reload scripts and SOA-serial bumps. You stage a change on a branch, diff it against main, cut over with a commit, and revert with a reset if it goes wrong. Speed is a side effect; control is the feature.
Want to see this yourself in two minutes?
git clone https://github.com/ckumar392/zonegit && cd zonegit && ./scripts/demo.sh
Two terminals, one dig loop, one zonegit set. The answer changes mid-loop. The rest of the post is what is happening underneath.
The part that surprised me
I expected the DNS server to be the hard bit. It wasn't. The whole serving path (UDP, TCP, NXDOMAIN/NODATA, SOA-in-authority, CNAME chasing) came out small enough that it stopped feeling like the project. The interesting code lives where Git's interesting code lives: canonical encoding, structural-sharing tree updates, and the lockstep diff with the "subtree hashes match, skip the whole subtree" optimization that makes Git diff a 100,000-file repository in 40 ms.
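The subtree-skip optimization is the heart of that diff. A toy sketch of the lockstep walk (node names and output format are illustrative, not zonegit's internals):

```go
package main

import "fmt"

// Node is a toy Merkle node: a hash plus named children. Leaves have
// no children.
type Node struct {
	Hash     string
	Children map[string]*Node
}

func leaf(h string) *Node                        { return &Node{Hash: h} }
func tree(h string, kids map[string]*Node) *Node { return &Node{Hash: h, Children: kids} }

// diff walks two trees in lockstep. The first case is the whole trick:
// if the subtree hashes match, the entire subtree is identical and is
// skipped without descending into it.
func diff(path string, a, b *Node, out *[]string) {
	switch {
	case a == nil && b == nil:
		return
	case a != nil && b != nil && a.Hash == b.Hash:
		return // identical subtree: skip it wholesale
	case a == nil:
		*out = append(*out, "+ "+path)
		return
	case b == nil:
		*out = append(*out, "- "+path)
		return
	}
	if len(a.Children) == 0 && len(b.Children) == 0 {
		*out = append(*out, "~ "+path) // changed leaf
		return
	}
	seen := map[string]bool{}
	for name := range a.Children {
		seen[name] = true
	}
	for name := range b.Children {
		seen[name] = true
	}
	for name := range seen {
		diff(path+"/"+name, a.Children[name], b.Children[name], out)
	}
}

func main() {
	old := tree("r1", map[string]*Node{"com": tree("c1", map[string]*Node{
		"foo": tree("f1", map[string]*Node{"api": leaf("a1"), "www": leaf("w1")})})})
	cur := tree("r2", map[string]*Node{"com": tree("c2", map[string]*Node{
		"foo": tree("f2", map[string]*Node{"api": leaf("a2"), "www": leaf("w1")})})})
	var out []string
	diff("", old, cur, &out)
	fmt.Println(out) // [~ /com/foo/api] — www is never visited
}
```

The same walk costs O(changed subtrees) whether the zone has a hundred records or a billion.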
Three lessons from the build.
Canonical encoding is the entire game. For "equal RRsets produce equal hashes" to hold, you have to be ruthless. Owner names lowercased. Records inside an RRset sorted by their wire-format rdata (the canonical form DNSSEC standardized in RFC 4034). Class normalized. TTL folded into the content hash, which is a deliberate design choice rather than an RFC requirement: changing a TTL is an audit-worthy event that deserves its own commit. Get any of this wrong and dedup does not dedup, structural sharing does not share, and the whole illusion collapses.
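A sketch of those rules in code. This is illustrative only: the rdata here is plain strings standing in for the RFC 4034 wire-format sort, and the field separators are hypothetical, not zonegit's actual encoding:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// canonicalRRsetHash applies the rules from the text: owner lowercased,
// records sorted into a canonical order, TTL deliberately folded into
// the content hash so a TTL change is a visible, commit-worthy change.
func canonicalRRsetHash(owner, rrtype string, ttl uint32, rdata []string) string {
	sorted := append([]string(nil), rdata...)
	sort.Strings(sorted)
	h := sha256.New()
	fmt.Fprintf(h, "%s|%s|%d|%s", strings.ToLower(owner), rrtype, ttl, strings.Join(sorted, ","))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := canonicalRRsetHash("API.foo.com.", "A", 60, []string{"9.9.9.9", "1.2.3.4"})
	b := canonicalRRsetHash("api.foo.com.", "A", 60, []string{"1.2.3.4", "9.9.9.9"})
	fmt.Println(a == b) // case and record order do not matter: true
	c := canonicalRRsetHash("api.foo.com.", "A", 300, []string{"1.2.3.4", "9.9.9.9"})
	fmt.Println(a == c) // a TTL change is a content change: false
}
```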
The apex breaks the filesystem analogy. A zone like foo.com. itself has records (SOA, NS, MX), so the "root directory" is also leaf-bearing. I tried the obvious thing first: a parallel apex blob hanging off the tree object, separate from the children. It worked, but every traversal then had two cases ("check apex blob, then walk children") and every diff had to special-case the apex on both sides. The fix was a literal @ sentinel as a child name, the same trick zonefile syntax has used for forty years. The apex becomes just another leaf, traversal stays uniform, diff stays uniform, and the special case disappears from the code entirely.
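To make the payoff concrete, here is a toy lookup over a zone tree that stores apex records under the literal child name "@". Every query, apex or not, is the same uniform child walk; the names and layout are illustrative, not zonegit's:

```go
package main

import "fmt"

// node is a toy zone tree: child name -> subtree (node) or leaf (string).
type node map[string]interface{}

// lookup walks labels root-to-leaf; when the labels run out, the apex of
// whatever subtree we reached is just the child named "@".
func lookup(zone node, labels []string) (string, bool) {
	cur := zone
	for _, l := range labels {
		next, ok := cur[l]
		if !ok {
			return "", false
		}
		if child, isTree := next.(node); isTree {
			cur = child
			continue
		}
		return next.(string), true
	}
	leafVal, ok := cur["@"].(string)
	return leafVal, ok
}

func main() {
	zone := node{
		"@":   "SOA + NS records", // the apex is just another leaf
		"api": node{"@": "A 60 9.9.9.9"},
	}
	v, _ := lookup(zone, nil) // the zone apex, no special case
	fmt.Println(v)
	v, _ = lookup(zone, []string{"api"})
	fmt.Println(v)
}
```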
Read-mostly DNS plus content-addressed storage equals lockless serving. The server opens the database read-only, walks an immutable tree at an immutable hash, returns bytes. The writer can be in the middle of a five-second commit and it cannot affect the read path. Half-written zones do not exist as commits. It is the same property that lets you git log while you git commit. The DNS read path inherits it for free.
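The shape of that property can be sketched with an atomic pointer swap: the writer builds a complete immutable snapshot, then publishes it in one store, so readers either see the old commit or the new one, never a half-written zone. A sketch of the idea under assumed names (Snapshot, serve, commit are hypothetical), not the daemon's code:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Snapshot is an immutable view of the zone at one commit hash.
// Nothing ever mutates a published Snapshot.
type Snapshot struct {
	Commit  string
	Records map[string]string // "name|type" -> answer, frozen at commit time
}

var head atomic.Pointer[Snapshot]

// serve loads the current head atomically and reads from an immutable
// structure; it never blocks on, or observes, an in-progress commit.
func serve(q string) (string, string) {
	s := head.Load()
	return s.Records[q], s.Commit
}

// commit builds the whole snapshot first, then publishes it in a
// single pointer swap.
func commit(hash string, records map[string]string) {
	head.Store(&Snapshot{Commit: hash, Records: records})
}

func main() {
	commit("abc123", map[string]string{"api.foo.com.|A": "1.2.3.4"})
	ans, _ := serve("api.foo.com.|A")
	fmt.Println(ans) // 1.2.3.4
	commit("99f98c", map[string]string{"api.foo.com.|A": "9.9.9.9"})
	ans, at := serve("api.foo.com.|A")
	fmt.Println(ans, at) // 9.9.9.9 99f98c
}
```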
What this unlocks
A versioned, content-addressed, branch-aware DNS state store does not just close the audit-log gap. It makes a class of things trivial that are currently impossible.
Safe change preview. Make the change on a branch. Run synthetic queries against that branch. Diff against main. Merge if green. The model is so familiar to anyone who has shipped code that it requires zero training.
Canary DNS. Send 5% of queries to branch-experimental, the rest to main. Watch the dashboards. Promote or roll back at one-commit granularity. Today this requires standing up a parallel resolver fleet. With a versioned core, it is a routing decision.
Forensic-grade audit. "Show me the exact state of this zone at 14:23:04 UTC last Tuesday" is one command, answered from immutable history rather than reconstructed from log scraping.
True GitOps for DNS. Every modern infra team is moving to declarative state in Git. DNS has been the awkward stepchild because the runtime knows nothing about Git. This makes the runtime itself a Git repo. The pipeline does not push files; it pushes commits.
Replication with cryptographic integrity. Content-addressed objects are tamper-evident by construction. Two replicas agree on a commit hash, they have byte-for-byte identical state. No "hopefully the AXFR completed" prayers.
I did not build all of this. I built the foundation that reduces each of them to a well-scoped engineering task instead of a research problem.
"But isn't this already solved?"
Fair pushback, worth answering directly. Partial answers exist.
- Route 53 + CloudTrail gives you a change log, but it is an audit feed bolted next to the system, not history baked into the system. You cannot run zonegit show api.foo.com. A HEAD~3 against it.
- PowerDNS on a SQL backend gives you rows you can query, but no commit graph, no branches, no content-addressed snapshots, no diff-by-subtree.
- BIND + zonefiles in Git + an scp pipeline is the closest cultural match, and millions of zones are run that way today. It works. It is also a workflow wrapped around a stateless engine that knows nothing about the Git repo it came from. The runtime cannot tell you what it served at 14:23:04 last Tuesday. Only the repo can, and only if the repo and the runtime never drifted.
So this is not the first time anyone has thought of versioning DNS. It is the first time, that I know of, that the runtime itself is the version-controlled artifact instead of a workflow bolted around a stateless engine. That is a meaningfully different claim, and it is the one that makes branch-aware serving, byte-for-byte replica equivalence, and forensic time-travel queries fall out of the model rather than be chased after the fact.
The honest part
This is not production. What works today is a single zone on a single host, with branches in the CLI and a daemon that serves only main. The interesting work that is genuinely still ahead, in roughly the order it should happen:
- Branch-aware serving and canary cutover. Branches exist in the storage; the daemon needs to learn how to route a fraction of queries to one branch and the rest to another. Small, well-scoped, and the next thing on the bench.
- Speak the standard wire protocol outward. AXFR and IXFR out, so zonegitd can be a primary that feeds existing BIND/Knot/NSD secondaries. Each commit becomes an SOA serial bump; IXFR is the diff between two commits, which is exactly what the storage already computes.
- Shadow mode and migration, together. Run as an AXFR secondary of your existing authoritative server, commit every transferred state (you instantly have a git log for a system you already trust), and serve a sampled fraction of real query traffic in parallel. A continuous-diff process compares answers byte for byte. When the diff has been zero across enough zones for enough days, the cutover stops being a leap. This is also the migration story: no flag day, no rip-and-replace, just a quietly accumulating commit log next to production until it has earned a turn at the front.
- Three-way merge with DNS-aware semantics. Git diffs lines; DNS records are not lines. An RRset is a set of records sharing a single TTL and class. "Both branches added an A record to api.foo.com.": does that union, conflict, or prefer the later commit? Each RR type plausibly wants different semantics. This is a design problem, not a coding problem.
- Distributed consistency across writers and regions. Single-writer CP via Raft is well-trodden ground and the obvious first stop. Multi-writer with conflict-free convergence is genuinely harder and at least partly research territory for this domain. The honest plan is to start with single-writer plus read replicas and earn the right to anything more ambitious.
- DNSSEC stays downstream. The store holds unsigned authoritative state; your existing signer signs on the way out. Online signing as a commit hook is interesting but is a separate concept and should be treated as one.
What exists today is a complete, end-to-end, working v0 with a clear runway. Storage is pluggable (BadgerDB today, Postgres or S3 next). Serving is pluggable. Nothing in the design painted itself into a corner.
It is clean under the race detector, but it is not yet a thing you would put in front of acme.com. on a Tuesday, and the post would be dishonest if it pretended otherwise.
The question that decides everything
A reviewer I respect put the only question that matters into one sentence:
"Why would a production team trust this with their authoritative DNS?"
The honest answer today is: they would not, and they should not. Race-detector-clean correctness and an elegant model are not the same thing as an operational track record. Pretending otherwise would be the worst possible move.
The interesting part of the question is whether there is a path to trust that does not require a leap of faith. The answer is shadow mode, described in the roadmap above, which earns trust the only way infrastructure ever earns trust: by agreeing with what is already trusted, in front of real traffic, for long enough that disagreement would have surfaced.
What is proven so far is "this should exist." What is left to prove is "this is safer than what you are already running." I know which one is harder.
Try it
git clone https://github.com/ckumar392/zonegit
cd zonegit
./scripts/demo.sh
Two terminals. One running while true; do dig …; done. One running zonegit set. Watch the answer change mid-loop.
If the small voice in the back of your head says "wait, that should have been a thing already": yes. That is the project.
If you build infrastructure for a living, fork it, break it, tell me what is wrong. If you operate infrastructure for a living, run the demo and ask yourself which other system in your stack (firewall rules, RBAC, route policies, feature flags) has been quietly missing a git log this whole time.
Code: github.com/ckumar392/zonegit · Apache 2.0.