Links
- Repo: https://github.com/feichai0017/NoKV
- Interactive demo: https://demo.eric-sgc.cafe/
When people first look at a distributed KV system, one of the most natural assumptions is:
“There should be one control-plane service that owns the cluster metadata.”
That intuition is understandable. If you’ve looked at systems like TiKV, the first mental model you often get is something like a PD-style component: routing, timestamps, heartbeats, scheduling decisions, cluster topology, all gathered around one control-plane authority.
We started with a similar intuition.
But as NoKV grew, that model started to feel too coarse.
The problem was not whether we wanted a control plane. We absolutely did. The problem was whether the durable truth of the distributed system should live inside the same process that answers requests, serves views, and reacts to runtime events.
Our answer became: no.
That is why NoKV ended up with a deliberate split between:
- `meta/root`: the rooted truth kernel
- `coordinator`: the control-plane service and rebuildable runtime view
Once that split became explicit, a lot of other design choices started to become cleaner.
The core idea
In NoKV, the “brain” of the distributed system is not the coordinator.
The durable metadata truth lives in meta/root, which is implemented as a typed, append-only committed log plus compact applied state. Coordinator lease changes, allocator fences, region lifecycle, pending peer/range changes: these are not “just some in-memory fields inside the control plane”. They are rooted, replicated, and auditable metadata truth.
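To make "typed, append-only committed log plus compact applied state" concrete, here is a minimal Go sketch. The names (`Event`, `AppliedState`, the event kinds) are hypothetical illustrations, not NoKV's actual `meta/root` types; the point is only the shape: truth is an ordered event log, and the compact state is whatever you get by folding it.

```go
package main

import "fmt"

// EventKind enumerates typed rooted events (hypothetical names).
type EventKind int

const (
	CoordinatorLeaseGranted EventKind = iota
	RegionCreated
	AllocatorFenceAdvanced
)

// Event is one entry in the append-only committed log.
type Event struct {
	Index uint64 // position in the committed log
	Kind  EventKind
	Data  string // payload, simplified to a string here
}

// AppliedState is the compact state rebuilt by folding events in order.
type AppliedState struct {
	LastIndex   uint64
	LeaseHolder string
	Regions     map[string]bool
}

// Apply folds one committed event into the compact state.
func (s *AppliedState) Apply(e Event) {
	switch e.Kind {
	case CoordinatorLeaseGranted:
		s.LeaseHolder = e.Data
	case RegionCreated:
		s.Regions[e.Data] = true
	}
	s.LastIndex = e.Index
}

func main() {
	log := []Event{
		{Index: 1, Kind: CoordinatorLeaseGranted, Data: "coord-a"},
		{Index: 2, Kind: RegionCreated, Data: "r1"},
		{Index: 3, Kind: CoordinatorLeaseGranted, Data: "coord-b"},
	}
	state := &AppliedState{Regions: map[string]bool{}}
	for _, e := range log {
		state.Apply(e)
	}
	fmt.Println(state.LeaseHolder, state.LastIndex) // coord-b 3
}
```

Because every consumer folds the same log, any process can reconstruct the same applied state, which is what makes the layers above rebuildable.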
The coordinator sits above that truth.
It is a service + view, not the ultimate owner of metadata persistence.
That distinction matters a lot.
Because the moment you let the control plane also be the sole durable metadata owner, you start coupling together several concerns that actually have very different failure and evolution properties:
- serving RPCs
- maintaining routing views
- lease competition
- allocator windows
- scheduling logic
- metadata durability
- metadata replication
We wanted those boundaries to be explicit instead of implicit.
From a PD-like intuition to a rooted-truth design
A useful way to explain the evolution is this:
- the initial intuition was closer to a TiKV / PD-style control-plane concentration
- the final direction is closer to a FoundationDB-style role separation, combined with a Delos-like rooted-truth design
Not in the sense of copying another system’s exact implementation, but in the sense of adopting a cleaner architectural boundary:
- the log is the truth
- services above it are consumers, views, and operators
- restart should rebuild from truth, not recover from hidden local authority
That is the key shift.
In other words, we did not want the coordinator to become a giant “metadata brain process” that owns everything and then needs more and more local state to stay alive. We wanted it to be horizontally deployable and operationally replaceable.
So in NoKV:
- `meta/root` owns durable rooted truth
- `coordinator` consumes rooted truth and builds a runtime cluster view
- `raftstore` executes data-plane work and region-level replication
This is also why the repository documentation describes the system as having three planes:
- truth plane
- control plane
- execution plane
Why this split is useful in practice
This is not just a conceptual refinement. It has concrete engineering payoffs.
1. Coordinator becomes much lighter on restart
A coordinator restart is no longer “recover local metadata authority”.
It becomes:
- reconnect to rooted truth
- rebuild the in-memory view
- resume lease competition if appropriate
- continue serving
That makes the coordinator much easier to reason about operationally. The only thing that differentiates active and standby coordinators is the rooted lease state, not some private local metadata store.
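The restart sequence above can be sketched in a few lines of Go. Everything here is a hypothetical shape, not NoKV's actual coordinator API; what matters is that recovery touches only rooted truth, never a private local store.

```go
package main

import "fmt"

// TruthClient is a hypothetical client for the rooted truth service.
type TruthClient struct {
	events      []string
	leaseHolder string
}

func (c *TruthClient) CommittedLog() []string { return c.events }

// TryAcquireLease resumes lease competition; here it simply reports
// whether this instance holds (or can take) the rooted lease.
func (c *TruthClient) TryAcquireLease(self string) bool {
	if c.leaseHolder == "" {
		c.leaseHolder = self
	}
	return c.leaseHolder == self
}

// View is the coordinator's rebuildable in-memory cluster view.
type View struct{ entries []string }

// Restart is the whole recovery path: reconnect to rooted truth,
// rebuild the view from the committed log, resume lease competition,
// then serve in whichever role results.
func Restart(c *TruthClient, self string) (*View, bool) {
	v := &View{}
	for _, e := range c.CommittedLog() { // rebuild from truth
		v.entries = append(v.entries, e)
	}
	active := c.TryAcquireLease(self) // resume lease competition
	return v, active                  // serve as active or standby
}

func main() {
	truth := &TruthClient{
		events:      []string{"lease:grant:coord-a", "region:create:r1"},
		leaseHolder: "coord-a",
	}
	// A restarted standby rebuilds the same view as the active holder.
	view, active := Restart(truth, "coord-b")
	fmt.Println(len(view.entries), active) // 2 false
}
```

Note that `Restart` has no branch for "recover my own durable state": a fresh process and a restarted process run the identical path.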
2. Durable truth stops being mixed with runtime convenience
Routing caches, heartbeat-derived state, scheduling hints, and local runtime maps are useful, but they are not the same thing as authoritative metadata truth.
The split forces us to say that explicitly.
That reduces a whole category of ambiguity around:
- which state is “just a view”
- which state must survive as the source of truth
3. Control-plane horizontal scaling becomes more realistic
If the coordinator is “everything”, then horizontal scaling is awkward, because every extra coordinator replica either:
- becomes a passive hot standby with hidden state coupling, or
- requires reimplementing distributed truth inside the coordinator layer itself
But if the durable metadata truth already lives below, then multiple coordinator processes become much simpler:
- all consume the same rooted truth
- all rebuild the same kind of view
- lease determines who is currently active for singleton duties
- standby instances are not fake; they are real, warm consumers of the same truth
That is a much cleaner story for scaling and failover.
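A minimal Go sketch of that story, with hypothetical names (NoKV's real lease and coordinator types will differ): every coordinator consumes the same rooted lease state, and the only thing gated by it is singleton duties.

```go
package main

import "fmt"

// Lease is the rooted lease record; whoever holds it performs
// singleton control-plane duties. Hypothetical shape.
type Lease struct {
	Holder string
	Epoch  uint64
}

// Coordinator instances all share the same rooted truth,
// simplified here to a shared pointer.
type Coordinator struct {
	ID    string
	lease *Lease
}

// IsActive: the only difference between active and standby is the
// rooted lease state, not any private local metadata.
func (c *Coordinator) IsActive() bool { return c.lease.Holder == c.ID }

// Schedule performs a singleton duty only when holding the lease;
// standbys are warm consumers of the same truth, not fake replicas.
func (c *Coordinator) Schedule() string {
	if !c.IsActive() {
		return c.ID + ": standby, skipping singleton duty"
	}
	return c.ID + ": running scheduling pass"
}

func main() {
	lease := &Lease{Holder: "coord-a", Epoch: 7}
	a := &Coordinator{ID: "coord-a", lease: lease}
	b := &Coordinator{ID: "coord-b", lease: lease}
	fmt.Println(a.Schedule()) // coord-a: running scheduling pass
	fmt.Println(b.Schedule()) // coord-b: standby, skipping singleton duty
	lease.Holder = "coord-b"  // failover committed in rooted truth
	fmt.Println(b.Schedule()) // coord-b: running scheduling pass
}
```

Failover is then nothing more than the lease record changing in rooted truth; no coordinator-local state has to migrate.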
4. Authority handoff becomes auditable
Lease grant, seal, closure, handoff: these become committed rooted events rather than side effects lost inside a single service process.
That matters both for correctness and for understanding the system later.
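Because those transitions are log entries, they can be replayed and checked after the fact. A small Go sketch of that idea, again with hypothetical event names rather than NoKV's actual schema: replaying the lease events both reconstructs the current holder and verifies that every grant was properly fenced by a seal.

```go
package main

import "fmt"

// LeaseEvent is a hypothetical committed rooted event; because
// grants and seals are log entries rather than in-process side
// effects, the full authority history can be replayed and audited.
type LeaseEvent struct {
	Index  uint64
	Kind   string // "grant" or "seal"
	Holder string
}

// ReplayLease folds the committed lease events, rejecting
// non-monotonic indices and unfenced transitions: a new grant is
// only valid once the previous lease has been sealed.
func ReplayLease(log []LeaseEvent) (string, error) {
	holder, sealed := "", true
	var last uint64
	for _, e := range log {
		if e.Index <= last {
			return "", fmt.Errorf("non-monotonic index %d", e.Index)
		}
		last = e.Index
		switch e.Kind {
		case "grant":
			if !sealed {
				return "", fmt.Errorf("grant at #%d without prior seal", e.Index)
			}
			holder, sealed = e.Holder, false
		case "seal":
			sealed = true
		}
	}
	return holder, nil
}

func main() {
	log := []LeaseEvent{
		{Index: 10, Kind: "grant", Holder: "coord-a"},
		{Index: 42, Kind: "seal", Holder: "coord-a"},
		{Index: 43, Kind: "grant", Holder: "coord-b"},
	}
	holder, err := ReplayLease(log)
	fmt.Println(holder, err) // coord-b <nil>
}
```

The same replay that rebuilds state doubles as the audit: a handoff that never happened in the log simply cannot be claimed, and a malformed history is detected rather than silently absorbed.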
Why we built a dashboard for this
Once you split the system this way, a static architecture diagram is no longer enough.
Because the interesting part is not just “there are three kinds of nodes”.
The interesting part is:
- who is currently the `meta-root` Raft leader
- which coordinator currently holds the lease
- how region leaders are distributed across stores
- how failover changes the live control path
- what stays durable truth, and what is only a rebuildable view
That is why we built a live dashboard.
The dashboard is not only there to make the demo prettier. It is there because this architecture is much easier to understand when you can observe it from several angles at once:
- truth plane: rooted truth ownership and replication
- control plane: lease holder, routing view, coordinator role
- execution plane: per-region leadership and store-level state
It turns the system from “a diagram in a README” into something you can actually inspect while it is running.
That is especially useful for a project like NoKV, because one of our goals is not just to build a storage system, but to build a maintainable and extensible distributed storage research platform.
If the architecture cannot be made visible, it is much harder to evolve it rigorously.
If you want to read the code, start here
If you want to understand this split from the source code instead of only from this post, these are the best entry points:
`meta/root/`
The rooted truth kernel:
- typed events
- compact state
- storage backend
- remote service/client
`coordinator/`
The control-plane service:
- routing
- heartbeats
- lease handling
- allocator serving
- rebuildable cluster view
`raftstore/`
The execution plane:
- multi-Raft region lifecycle
- replicated command execution
The shortest doc path is:
`README.md` → `docs/architecture.md` → `docs/rooted_truth.md` → `docs/coordinator.md`
Those four together give the cleanest route from:
“What is this repo?”
to:
“Why are these package boundaries the way they are?”
Closing thought
A lot of distributed systems talk about separation of concerns, but in practice still let the control plane quietly accumulate too much hidden authority.
What we wanted in NoKV was a cleaner line:
- the durable metadata truth should live in its own rooted substrate
- the coordinator should be a service layer on top of that truth
- the execution plane should stay separate from both
That separation made the architecture easier to explain, easier to visualize, and, I think, easier to extend.
And that is exactly why the dashboard exists: not as decoration, but as a way to make those boundaries visible.