Farhan Syah for NodeDB


How NodeDB Handles Multi-Model Differently

In the previous article, I explained where NodeDB stands in the current multi-model landscape.

This post is about the next question: how does it actually work?

The short answer is:

one engine cannot do everything

NodeDB splits the problem into core engine families, then builds specialized models on top of the right base, while keeping everything inside one database boundary.

That is the core idea:

  • Some models get their own real engine
  • Some models grow from another strong native engine
  • The planner knows the difference
  • The user does not have to split the stack into more databases just because the workload widened

Start from the core, not from the label

When people say "multi-model database", the phrase usually hides one of several shortcuts.

It can mean:

  • One strong database with extensions
  • One document system with a broader marketing surface
  • Several services under one product name
  • One generalized storage shape being stretched across too many models

That is not the route I wanted.

The starting question for NodeDB was simpler and harder at the same time:

What are the real engine families needed if this database is supposed to handle broad workloads without turning every new requirement into more stitching?

The current answer is:

  • Document
  • Strict
  • Graph
  • Vector
  • Columnar
  • Key-Value
  • Full-Text Search

Then there are models that should be native, but do not need a fully separate engine family if they already have the right base:

  • Time-Series On Columnar
  • Spatial On Columnar

That split matters because it avoids two bad extremes:

  • Pretending every model is the same under the hood
  • Creating a separate mini-database for every capability

The core engine families

The concrete part is what these engines actually are.

  • Document is the flexible record engine. In NodeDB that means schemaless records, MessagePack storage, and CRDT-oriented sync paths for local-first use cases.
  • Strict is the structured record engine. It is closer to row-oriented access, uses a different storage shape, and is meant for workloads that care about predictable fields and faster direct access, not just schema validation on top of documents.
  • Graph is not document plus links. It has its own adjacency structures, traversal path, and graph-native operations.
  • Vector is not "just add embeddings somewhere." It has its own ANN path, quantization path, and distance behavior.
  • Columnar is the analytical base: compression, scan-heavy reads, predicate pushdown, and the kind of layout you want when the workload is closer to analytics than OLTP.
  • Key-Value exists for direct lookup workloads that should be simple and cheap instead of routed through a heavier model.
  • Full-Text Search has its own ranking and tokenization behavior. BM25, stemming, stop-word handling, fuzzy matching, and hybrid retrieval should not be faked with normal filtering.
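
To make the graph point concrete, here is a toy sketch in plain Python (illustrative only, not NodeDB code) of what a native adjacency structure buys: each hop of a traversal is a direct neighbor lookup, rather than a scan over records that happen to carry link fields.

```python
from collections import deque

# Toy illustration, not NodeDB internals: edges stored as a native
# adjacency list make each hop an O(degree) lookup.
adjacency = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

def bfs_hops(start: str, target: str) -> int:
    """Breadth-first search returning the hop count from start to target."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbor in adjacency[node]:  # direct neighbor lookup, no scan
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return -1

print(bfs_hops("a", "d"))  # 2 hops, via b or c
```

Simulating the same traversal over documents with embedded link fields means re-finding the linked records at every hop, which is exactly the cost a graph-native engine avoids.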

This is the part many multi-model products blur. The names are easy. The backing engine choices are the hard part.

Why strict matters so much

Strict is especially important here, because this is where a lot of multi-model systems get vague.

From the outside, Strict can look like document plus schema rules. That is not enough.

In NodeDB, Strict exists because structured workloads need a different path:

  • Fixed field expectations
  • Different storage behavior
  • Different access patterns
  • Different planner assumptions
  • Different performance goals

In the repo direction today, Strict is treated as a row-like storage mode with direct field access, not just a document collection that rejects invalid writes.

That distinction matters. If a multi-model database cannot give structured data a serious path of its own, then the "multi-model" story is already weak for a large class of real applications.
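
A rough way to picture that difference, as a plain-Python sketch rather than NodeDB internals: when the schema is fixed, every field has a known offset, so a single field can be read without parsing the whole record.

```python
import struct

# Toy illustration, not NodeDB code: a fixed schema gives every field a
# known byte offset inside the row.
ROW = struct.Struct("<i8sd")  # id: int32, name: 8 bytes, score: float64

def pack_row(row_id: int, name: str, score: float) -> bytes:
    return ROW.pack(row_id, name.encode().ljust(8, b"\0"), score)

def read_score(buf: bytes) -> float:
    # The offset is known from the schema: 4 (id) + 8 (name) = 12.
    (score,) = struct.unpack_from("<d", buf, offset=12)
    return score

row = pack_row(7, "alice", 99.5)
print(read_score(row))  # 99.5, read without decoding id or name
```

A schemaless document has to be deserialized before any field can be touched; a fixed layout is what lets the planner assume direct field access is cheap.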

Specialized models should grow from the right foundation

This is where Time-Series and Spatial fit.

I do not think every model deserves its own isolated engine family just for the sake of appearances. Sometimes the right design is to start from a strong native base and specialize from there.

In NodeDB, that base is Columnar.

That is why time-series sits there:

  • Columnar layout fits scan-heavy reads
  • Compression matters
  • Aggregation matters
  • Retention and rollups matter more than point-update behavior

The current repo direction already reflects this. Time-series is described as a columnar profile with ILP ingest, continuous aggregation, PromQL support, and time-oriented SQL functions.

Spatial follows the same idea. It belongs near columnar because analytical and geospatial workloads often overlap, but it still has its own native behavior: spatial indexes, geohash/H3-style locality tools, and geometry predicates.
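
A small plain-Python sketch (illustrative only, not NodeDB code) of why the columnar base fits time-series: monotonic timestamps shrink to tiny deltas, which compress well, and a rollup is one tight scan over a single column with no row assembly.

```python
# Toy illustration of the columnar/time-series fit, not NodeDB internals.
timestamps = [1000, 1010, 1020, 1030, 1040, 1050]
values     = [21.5, 21.7, 22.0, 22.4, 22.1, 21.9]

def delta_encode(column):
    """Monotonic timestamps become small repeated deltas, which compress well."""
    return [column[0]] + [b - a for a, b in zip(column, column[1:])]

def rollup_avg(ts, vals, window):
    """Average values per fixed time window: one scan over one column."""
    buckets = {}
    for t, v in zip(ts, vals):
        buckets.setdefault(t // window * window, []).append(v)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

print(delta_encode(timestamps))  # [1000, 10, 10, 10, 10, 10]
print(rollup_avg(timestamps, values, 30))
```

Running the same rollup over row-shaped records means materializing every row just to throw most of it away, which is why simulating time-series on an unsuitable engine stays slow no matter how it is tuned.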

So the pattern is:

  • Separate engine families where the workload is genuinely different
  • Specialized native profiles where another engine already gives the right foundation

How NodeDB keeps coherence

This is the part that decides whether the whole design works or falls apart.

If you only collect engines under one brand, you still have not solved multi-model. You have just moved the boxes closer together.

The harder problem is coherence:

  • How does a mixed workload stay in one system?
  • How does the planner choose the right path for each model?
  • How do you avoid weakening every model just to make the interface look uniform?

For NodeDB, the answer is not one generic query path for everything. That would flatten the models.

The answer is dedicated query and planner treatment where model semantics really differ, while still keeping them in one database boundary.

That is especially important for Strict, because Strict should not be planned like a schemaless document collection. It should give the planner stronger assumptions.

It is also important for Vector, Graph, and Full-Text Search, because each of them has search and ranking behavior that should not be reduced to ordinary filtering.

So coherence here means:

  • One database boundary
  • Shared system-level coordination
  • Model-specific execution paths where needed
  • Fewer cross-system hops when workloads mix

That is a more useful definition than simply saying "one query language" or "one product."
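
A toy dispatch sketch of that idea, in plain Python and not NodeDB's actual planner: one entry point, but each model routes to its own execution path instead of being flattened into a generic scan.

```python
# Toy sketch of model-aware planning, not NodeDB's planner. The point is
# the shape: one boundary, per-model execution paths.
def plan(query: dict) -> str:
    executors = {
        "strict":   lambda q: f"row scan with fixed offsets over {q['target']}",
        "document": lambda q: f"flexible record scan over {q['target']}",
        "vector":   lambda q: f"ANN index probe over {q['target']}",
        "graph":    lambda q: f"adjacency traversal over {q['target']}",
    }
    return executors[query["model"]](query)

print(plan({"model": "vector", "target": "embeddings"}))
print(plan({"model": "strict", "target": "orders"}))
```

A single generic executor would make every branch above collapse into the "flexible record scan" case, which is exactly the flattening the design tries to avoid.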


Why this is different from pseudo multi-model

This is the distinction I care about most.

Pseudo multi-model usually fails in one of two ways:

  • Everything is pushed through one generic core, so the models exist but feel weak
  • Every serious model becomes another extension, service, or external database, so the models stay stronger but the user absorbs the integration cost

NodeDB is trying to stay between those two failures.

That is why the design is split the way it is:

  • Real engine families where the workload needs real specialization
  • Native profiles where a strong base already exists
  • Planner-level respect for the differences
  • One database boundary for mixed workloads

What this means in practice

If NodeDB works the way I want it to work, the practical result should be different in a few concrete ways.

  • Graph queries should follow graph-native structures and algorithms instead of pretending edges are just another document field.
  • Vector search should use a real ANN path with its own indexing and quantization choices, not a thin side feature.
  • Strict collections should behave like a serious structured model with stronger planner assumptions and faster direct field access.
  • Time-series should inherit the strengths of columnar execution instead of being simulated on top of an unsuitable engine.
  • Spatial should get native spatial behavior without forcing the user into another database.
  • A workload that mixes records, vectors, search, and graph should stay inside one system instead of turning into a chain of remote calls.
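
As a concrete flavor of that last point, here is a plain-Python sketch (not NodeDB code) of hybrid retrieval staying in one process: reciprocal rank fusion, a common way to merge a vector result list and a full-text result list without a cross-system hop.

```python
# Toy sketch of in-process hybrid retrieval, not NodeDB internals:
# reciprocal rank fusion (RRF) merges ranked lists from a vector index
# and a full-text index into one ranking.
def rrf(ranked_lists, k=60):
    """Score each doc by the sum of 1/(k + rank) over the lists it appears in."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]  # ANN result, best first
text_hits   = ["doc1", "doc9", "doc3"]  # BM25 result, best first

print(rrf([vector_hits, text_hits]))  # doc1 first: near the top of both lists
```

When the vector and full-text engines live behind separate services, this merge step becomes two network round trips plus client-side glue; inside one boundary it is a function call.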

Those tests are better than asking whether the product can list many model names.


The risk

This approach is harder for obvious reasons.

It is easier to start with fewer engines.
It is easier to flatten more behavior into one generic layer.
It is easier to delay the planner work.
It is easier to turn more models into later add-ons.

The risk is whether NodeDB can keep this depth and coherence once real workloads start pushing every corner of the design.


Why I still prefer this route

Even with that risk, I still think this is the more honest way to build a true multi-model database.

The point is not only to support many models.

The point is to support them with the right engine shape, the right planner behavior, and the right database boundary.

If that holds up, then NodeDB becomes interesting for a real reason, not just because it can claim a long feature list.

If you want to follow how NodeDB works at the engine level, where this design holds up, and where it still needs to prove itself, follow NodeDB. I will keep sharing the architecture, tradeoffs, and deeper implementation decisions as the database evolves.
