Why We Split meta/root and coordinator in NoKV

Guocheng Song — Wed, 22 Apr 2026 13:47:30 +0000

Links

Repo: https://github.com/feichai0017/NoKV
Interactive demo: https://demo.eric-sgc.cafe/

When people first look at a distributed KV system, one of the most natural assumptions is:

“There should be one control-plane service that owns the cluster metadata.”

That intuition is understandable. If you’ve looked at systems like TiKV, the first mental model you often get is a PD-style component (PD, the Placement Driver): routing, timestamps, heartbeats, scheduling decisions, and cluster topology, all gathered around one control-plane authority.

We started with a similar intuition.

But as NoKV grew, that model started to feel too coarse.

The problem was not whether we wanted a control plane. We absolutely did. The problem was whether the durable truth of the distributed system should live inside the same process that answers requests, serves views, and reacts to runtime events.

Our answer became: no.

That is why NoKV ended up with a deliberate split between:

meta/root: the rooted truth kernel
coordinator: the control-plane service and rebuildable runtime view

Once that split became explicit, a lot of other design choices started to become cleaner.

The core idea

In NoKV, the “brain” of the distributed system is not the coordinator.

The durable metadata truth lives in meta/root, which is implemented as a typed, append-only committed log plus compact applied state. Coordinator lease changes, allocator fences, region lifecycle, pending peer/range changes: these are not “just some in-memory fields inside the control plane”. They are rooted, replicated, and auditable metadata truth.
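
To make “typed, append-only committed log plus compact applied state” concrete, here is a minimal Go sketch of the pattern. It is not NoKV’s code: the names `Event`, `EventKind`, `Log`, `State`, and `Replay` are hypothetical, replication is left out entirely, and the three event kinds are illustrative stand-ins for the lease, allocator, and region events mentioned above. The point it shows is that the compact state is always derivable by folding the log.

```go
package main

import "fmt"

// EventKind enumerates a few hypothetical rooted event types.
// The names are illustrative and are not NoKV's actual event set.
type EventKind int

const (
	LeaseGranted EventKind = iota
	AllocatorFenceRaised
	RegionCreated
)

// Event is one typed entry in the append-only committed log.
type Event struct {
	Index  uint64 // position in the log, assigned on append
	Kind   EventKind
	Holder string // coordinator ID, used by LeaseGranted
	Value  uint64 // fence value or region ID, depending on Kind
}

// Log is append-only: entries are added at the end and never rewritten.
type Log struct {
	entries []Event
}

// Append assigns the next index and records the event.
func (l *Log) Append(e Event) Event {
	e.Index = uint64(len(l.entries)) + 1
	l.entries = append(l.entries, e)
	return e
}

// State is the compact applied state folded from the log.
type State struct {
	AppliedIndex uint64
	LeaseHolder  string
	AllocFence   uint64
	Regions      map[uint64]bool
}

// Apply folds one committed event into the compact state.
func (s *State) Apply(e Event) {
	switch e.Kind {
	case LeaseGranted:
		s.LeaseHolder = e.Holder
	case AllocatorFenceRaised:
		if e.Value > s.AllocFence { // fences only move forward
			s.AllocFence = e.Value
		}
	case RegionCreated:
		s.Regions[e.Value] = true
	}
	s.AppliedIndex = e.Index
}

// Replay rebuilds the compact state from the log alone.
func Replay(l *Log) State {
	s := State{Regions: map[uint64]bool{}}
	for _, e := range l.entries {
		s.Apply(e)
	}
	return s
}

func main() {
	var rootLog Log
	rootLog.Append(Event{Kind: LeaseGranted, Holder: "coord-1"})
	rootLog.Append(Event{Kind: AllocatorFenceRaised, Value: 1000})
	rootLog.Append(Event{Kind: RegionCreated, Value: 7})

	s := Replay(&rootLog)
	fmt.Println(s.AppliedIndex, s.LeaseHolder, s.AllocFence, len(s.Regions))
}
```

In a design like this, the compact state is a cache of the log, so anything that can be replayed does not need to be trusted from a process’s local memory.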

The coordinator sits above that truth.

It is a service plus a rebuildable view, not the ultimate owner of metadata persistence.

That distinction matters a lot.

The moment you let the control plane also be the sole durable metadata owner, you start coupling together several concerns that have very different failure and evolution properties:

serving RPCs
maintaining routing views
lease competition
allocator windows
scheduling logic
metadata durability
metadata replication

We wanted those boundaries to be explicit instead of implicit.

From a PD-like intuition to a rooted-truth design

A useful way to explain the evolution is this:

the initial intuition was closer to a TiKV / PD-style control-plane concentration
the final direction is closer to a FoundationDB-style role separation, combined with a Delos-like rooted-truth design

Not in the sense of copying another system’s exact implementation, but in the sense of adopting a cleaner architectural boundary:

the log is the truth
services above it are consumers, views, and operators
restart should rebuild from truth, not recover from hidden local authority

That is the key shift.

In other words, we did not want the coordinator to become a giant “metadata brain process” that owns everything and then needs more and more local state to stay alive. We wanted it to be horizontally deployable and operationally replaceable.

So in NoKV:

meta/root owns durable rooted truth
coordinator consumes rooted truth and builds a runtime cluster view
raftstore executes data-plane work and region-level replication

This is also why the repository documentation describes the system as having three planes:

truth plane
control plane
execution plane

Why this split is useful in practice

This is not just a conceptual refinement. It has concrete engineering payoffs.

1. Coordinator becomes much lighter on restart

A coordinator restart is no longer “recover local metadata authority”.

It becomes:

reconnect to rooted truth
rebuild the in-memory view
resume lease competition if appropriate
continue serving

That makes the coordinator much easier to reason about operationally. What differentiates an active coordinator from a standby is not some private local metadata store but the rooted lease state.

2. Durable truth stops being mixed with runtime convenience

Routing caches, heartbeat-derived state, scheduling hints, and local runtime maps are useful, but they are not the same thing as authoritative metadata truth.

The split forces us to say that explicitly.

That reduces a whole category of ambiguity around:

which state is “just a view”
which state must survive as the source of truth

3. Control-plane horizontal scaling becomes more realistic

If the coordinator is “everything”, then horizontal scaling is awkward, because every extra coordinator replica either:

becomes a passive hot standby with hidden state coupling, or
requires reimplementing distributed truth inside the coordinator layer itself

But if the durable metadata truth already lives below, running multiple coordinator processes becomes much simpler:

all consume the same rooted truth
all rebuild the same kind of view
lease determines who is currently active for singleton duties
standby instances are not fake; they are real, warm consumers of the same truth

That is a much cleaner story for scaling and failover.
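
Allocator serving is a good example of a singleton duty that stays safe across failover when its fence lives in rooted truth. The Go sketch below assumes a window-and-fence scheme: the active holder commits a higher fence before handing out IDs from the window below it, and a successor always starts from the committed fence. `Root`, `RaiseFence`, `Allocator`, and the window size are hypothetical names and parameters chosen for illustration, and `Root` is an in-memory stand-in for meta/root rather than a replicated store.

```go
package main

import (
	"errors"
	"fmt"
)

// Root stands in for meta/root and stores only the committed allocator fence.
type Root struct {
	fence uint64
}

// RaiseFence commits a higher fence, compare-and-set style: it fails if the
// caller's view of the current fence is stale.
func (r *Root) RaiseFence(from, to uint64) error {
	if from != r.fence || to <= from {
		return errors.New("stale or non-monotonic fence")
	}
	r.fence = to
	return nil
}

// Allocator hands out IDs from a window [next, limit) that has already
// been reserved in rooted truth.
type Allocator struct {
	root   *Root
	next   uint64
	limit  uint64
	window uint64
}

// NewAllocator starts at the committed fence, never at local memory.
func NewAllocator(root *Root, window uint64) *Allocator {
	return &Allocator{root: root, next: root.fence, limit: root.fence, window: window}
}

// Alloc returns the next ID, reserving a new window first when needed.
func (a *Allocator) Alloc() (uint64, error) {
	if a.next == a.limit {
		if err := a.root.RaiseFence(a.limit, a.limit+a.window); err != nil {
			return 0, err
		}
		a.limit += a.window
	}
	id := a.next
	a.next++
	return id, nil
}

func main() {
	root := &Root{}
	old := NewAllocator(root, 100)
	for i := 0; i < 3; i++ {
		id, _ := old.Alloc()
		fmt.Println("old holder:", id) // 0, then 1, then 2
	}
	// Failover: the new lease holder rebuilds from the committed fence (100).
	successor := NewAllocator(root, 100)
	id, _ := successor.Alloc()
	fmt.Println("new holder:", id, "fence:", root.fence) // 100, fence 200
}
```

The unused tail of the old window (3 to 99) is simply skipped. Windows never overlap, so IDs stay unique, and a stale holder that tries to reserve another window is rejected by the fence check.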

4. Authority handoff becomes auditable

Lease grant, seal, closure, handoff: these become committed rooted events rather than side effects lost inside a single service process.

That matters both for correctness and for understanding the system later.
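
Because lease transitions are committed events, anyone can replay them later and check that authority never overlapped. The Go sketch below is a hypothetical audit, not NoKV’s implementation: the event names mirror the post (grant, seal, closure), but their semantics here are assumptions (seal stops the term from being extended, closure ends it), and a handoff is modeled as seal plus closure of one epoch followed by a grant of the next.

```go
package main

import "fmt"

// LeaseEventKind names hypothetical lease events; the semantics below
// (seal stops renewal, closure ends the term) are assumptions for this sketch.
type LeaseEventKind string

const (
	Grant   LeaseEventKind = "grant"
	Seal    LeaseEventKind = "seal"
	Closure LeaseEventKind = "closure"
)

// LeaseEvent is one committed lease event in the rooted log.
type LeaseEvent struct {
	Epoch  uint64
	Kind   LeaseEventKind
	Holder string // set on Grant
}

// Term is one holder's tenure, reconstructed by replay.
type Term struct {
	Epoch  uint64
	Holder string
	Sealed bool
	Closed bool
}

// Audit replays committed lease events. A handoff is modeled as
// seal + closure of the old epoch followed by a grant of a new one.
func Audit(events []LeaseEvent) ([]Term, error) {
	var terms []Term
	for _, e := range events {
		n := len(terms)
		switch e.Kind {
		case Grant:
			if n > 0 && !terms[n-1].Closed {
				return nil, fmt.Errorf("epoch %d granted before epoch %d closed", e.Epoch, terms[n-1].Epoch)
			}
			if n > 0 && e.Epoch <= terms[n-1].Epoch {
				return nil, fmt.Errorf("epoch %d does not advance", e.Epoch)
			}
			terms = append(terms, Term{Epoch: e.Epoch, Holder: e.Holder})
		case Seal, Closure:
			if n == 0 || terms[n-1].Epoch != e.Epoch {
				return nil, fmt.Errorf("%s for unknown epoch %d", e.Kind, e.Epoch)
			}
			if e.Kind == Seal {
				terms[n-1].Sealed = true
			} else if !terms[n-1].Sealed {
				return nil, fmt.Errorf("epoch %d closed before it was sealed", e.Epoch)
			} else {
				terms[n-1].Closed = true
			}
		default:
			return nil, fmt.Errorf("unknown event kind %q", e.Kind)
		}
	}
	return terms, nil
}

func main() {
	history := []LeaseEvent{
		{Epoch: 1, Kind: Grant, Holder: "coord-1"},
		{Epoch: 1, Kind: Seal},
		{Epoch: 1, Kind: Closure},
		{Epoch: 2, Kind: Grant, Holder: "coord-2"},
	}
	terms, err := Audit(history)
	fmt.Println(terms, err)
}
```

A history in which a second grant appears before the first term is closed fails the audit, which is exactly the kind of overlap that is hard to detect when the handoff is only a side effect inside one process.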

Why we built a dashboard for this

Once you split the system this way, a static architecture diagram is no longer enough.

Because the interesting part is not just “there are three kinds of nodes”.

The interesting part is:

who is currently the meta/root Raft leader
which coordinator currently holds the lease
how region leaders are distributed across stores
how failover changes the live control path
what stays durable truth, and what is only a rebuildable view

That is why we built a live dashboard.

The dashboard is not only there to make the demo prettier. It is there because this architecture is much easier to understand when you can observe it from several angles at once:

truth plane: rooted truth ownership and replication
control plane: lease holder, routing view, coordinator role
execution plane: per-region leadership and store-level state

It turns the system from “a diagram in a README” into something you can actually inspect while it is running.

That is especially useful for a project like NoKV, because one of our goals is not just to build a storage system, but to build a maintainable and extensible distributed storage research platform.

If the architecture cannot be made visible, it is much harder to evolve it rigorously.

If you want to read the code, start here

If you want to understand this split from the source code instead of only from this post, these are the best entry points:

`meta/root/`

The rooted truth kernel:

typed events
compact state
storage backend
remote service/client

`coordinator/`

The control-plane service:

routing
heartbeats
lease handling
allocator serving
rebuildable cluster view

`raftstore/`

The execution plane:

multi-Raft region lifecycle
replicated command execution

The shortest doc path is:

README.md
docs/architecture.md
docs/rooted_truth.md
docs/coordinator.md

Those four together give the cleanest route from:

“What is this repo?”

to:

“Why are these package boundaries the way they are?”

Closing thought

A lot of distributed systems talk about separation of concerns, but in practice still let the control plane quietly accumulate too much hidden authority.

What we wanted in NoKV was a cleaner line:

the durable metadata truth should live in its own rooted substrate
the coordinator should be a service layer on top of that truth
the execution plane should stay separate from both

That separation made the architecture easier to explain, easier to visualize, and, I think, easier to extend.

And that is exactly why the dashboard exists: not as decoration, but as a way to make those boundaries visible.

DEV Community: Guocheng Song