Understanding the core mechanics of ThingsDB

#architecture #database #systemdesign

In my previous post, I promised to dive into the client side. However, to truly appreciate how the client interacts with the system, we first need to pull back the curtain on what happens "under the hood" when you execute a query or run a procedure in ThingsDB.

Step 1: The Gateway and "AWAY" Mode

When you send a query to a ThingsDB node, the first thing the node does is check its own status. If it is currently in "AWAY" mode (a specialized state covered in Step 6), it won't process your request locally. Instead, it acts as a traffic controller and forwards the query to another active node in the cluster, so the client's connection is never left "blocked" while the node is busy.
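To make the forwarding idea concrete, here is a minimal toy model in Python. It is not ThingsDB code: the `Node` class, the `handle` method, and the "first active peer" selection rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a cluster node."""
    name: str
    away: bool = False

    def handle(self, query, peers):
        """Run the query locally, or forward it when this node is away."""
        if not self.away:
            return f"{self.name} executed: {query}"
        # Away: hand the query to the first peer that is not away.
        # (How a real node picks the target is not specified here.)
        for peer in peers:
            if not peer.away:
                return peer.handle(query, peers)
        raise RuntimeError("no active node available")


nodes = [Node("node0", away=True), Node("node1"), Node("node2")]
print(nodes[0].handle(".counter;", nodes))  # node1 executed: .counter;
```

The client only ever talks to `node0`, yet the work is done by `node1`.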

Step 2: To Compile or Not to Compile?

Once a node accepts the query, it checks its internal cache. If ThingsDB recognizes the exact same query from earlier, it can skip the compilation phase and use a pre-compiled version from the cache.

If the query is new, ThingsDB begins the compilation process. During this phase, the engine determines a critical factor: Does this query require a "change"?

Read-only queries: If no data changes are required, the query executes immediately on the node that received it. No other nodes are involved, so reads avoid any cluster round trip.
Procedures: This is where procedures shine. Unlike raw queries, procedures are always pre-compiled. ThingsDB knows instantly whether they require a change or not, shaving off valuable microseconds.
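The cache-then-compile flow can be sketched roughly as follows. This is a simplified illustration, not the ThingsDB engine: `QueryCache`, `Compiled`, and especially the rule used to detect a change (looking for an assignment in the query text) are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiled:
    """Hypothetical result of compiling a query."""
    source: str
    needs_change: bool


class QueryCache:
    def __init__(self):
        self._cache = {}
        self.compilations = 0

    def get(self, source):
        """Return the cached compilation, compiling only on a cache miss."""
        compiled = self._cache.get(source)
        if compiled is None:
            self.compilations += 1
            # Crude stand-in for real analysis: treat an assignment as a change.
            compiled = Compiled(source, needs_change=" = " in source)
            self._cache[source] = compiled
        return compiled


cache = QueryCache()
first = cache.get(".counter = 1;")
again = cache.get(".counter = 1;")  # cache hit, no second compilation
read = cache.get(".counter;")
print(first.needs_change, read.needs_change, cache.compilations)  # True False 2
```

A procedure behaves like an entry that is always in the cache: its "needs a change" answer is known before the call arrives.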

Step 3: The Race for Consensus (The Change ID)

If a query needs to modify data, ThingsDB initiates a "battle" for the next Change ID. It looks at the current global Change ID and proposes the next increment (Current + 1).

Through a quorum-based consensus, if the majority of nodes agree on this ID, the change is processed. All modifications to the collection are tied to this specific ID and synchronized across the cluster. Because every node applies changes in the same order of Change IDs, all nodes arrive at the same data state.
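A toy version of the "race" might look like the sketch below. It only illustrates the majority rule described above; the `Peer` class, the `vote` rule, and `claim_change_id` are hypothetical, and the real protocol between ThingsDB nodes is more involved than this.

```python
class Peer:
    """Hypothetical voter that remembers the highest Change ID it accepted."""

    def __init__(self):
        self.highest_seen = 0

    def vote(self, change_id):
        """Accept an ID only if it is newer than anything seen before."""
        if change_id > self.highest_seen:
            self.highest_seen = change_id
            return True
        return False


def claim_change_id(current_id, peers):
    """Propose current_id + 1; return it on majority agreement, else None."""
    proposal = current_id + 1
    votes = sum(peer.vote(proposal) for peer in peers)
    return proposal if votes > len(peers) // 2 else None


peers = [Peer(), Peer(), Peer()]
won = claim_change_id(0, peers)   # the first proposer gets ID 1
lost = claim_change_id(0, peers)  # a competitor proposing 1 again is rejected
print(won, lost)  # 1 None
```

The losing proposer would have to retry with a newer ID, which is what keeps two changes from ever sharing the same position in the order.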

Step 4: Futures and Non-Blocking Changes

ThingsDB handles complex logic using Futures. If a query contains one or multiple futures that require changes, each future can receive its own Change ID. This is a powerful optimization! It prevents the system from locking up during external module calls and allows you to isolate heavy "change" logic from the rest of your procedure or query.

Strategy Tip: If you have a procedure that is mostly read-only but occasionally needs to write data, wrap the "write" portion in a future. This prevents the entire procedure from generating a global change every time it's called (see this example from the ThingsDB book).
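The effect of this tip can be shown with a small Python analogy. It is not ThingsDB syntax: `Collection`, `in_change`, and `get_or_create` are hypothetical names, and `in_change` merely stands in for "a future that requests its own change".

```python
class Collection:
    """Hypothetical collection that counts how many changes were created."""

    def __init__(self):
        self.data = {}
        self.change_id = 0

    def in_change(self, write):
        """Stand-in for a future that requests its own Change ID."""
        self.change_id += 1
        return write(self.data)


def get_or_create(collection, key):
    # Read-only path: the item exists, so no change is created.
    if key in collection.data:
        return collection.data[key]
    # Write path: only now is a change requested.
    return collection.in_change(lambda data: data.setdefault(key, {"name": key}))


col = Collection()
get_or_create(col, "alice")  # creates the item: one change
get_or_create(col, "alice")  # pure read: no change
get_or_create(col, "bob")    # creates the item: one change
print(col.change_id)  # 2
```

Three calls produce only two changes, because the read-only call never enters the write path.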

Step 5: Storage and The "Boot" Sequence

ThingsDB uses a dual mechanism for data persistence:

Full State Dumps: A snapshot of the entire data state at a specific point in time.
Archive Files: A continuous log of every individual change.

When a node boots up, it follows a strict recovery path:

It loads the last Full State Dump.
It identifies the last Change ID from that dump and then replays all subsequent changes from the Archive Files.
It then waits for a peer node to enter "Away" mode. The booted node reports its last processed Change ID, and the "Away" node syncs the missing delta.

If a node is too far out of sync (or is a brand-new node being added to the cluster), a Full Synchronization occurs, where the entire state dump is transferred from scratch.
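The first two recovery steps (load the dump, then replay newer changes) can be sketched as follows. The data layout is an assumption for illustration only: a dump is modeled as a dict with a `change_id` and a `state`, and each archived change as a `(change_id, key, value)` tuple. The real on-disk formats are different.

```python
def recover(dump, archive):
    """Rebuild state from a full dump plus the archived changes after it."""
    state = dict(dump["state"])
    last_id = dump["change_id"]
    # Replay in Change ID order, skipping what the dump already contains.
    for change_id, key, value in sorted(archive, key=lambda change: change[0]):
        if change_id <= last_id:
            continue
        state[key] = value
        last_id = change_id
    return state, last_id


dump = {"change_id": 2, "state": {"counter": 10}}
archive = [
    (1, "counter", 5),
    (2, "counter", 10),
    (3, "counter", 11),
    (4, "name", "demo"),
]
state, last_id = recover(dump, archive)
print(state, last_id)  # {'counter': 11, 'name': 'demo'} 4
```

The returned `last_id` corresponds to the value the booted node reports to the "Away" peer, which then only needs to send changes newer than that ID.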

Step 6: The Magic of "AWAY" Mode

"Away" mode is ThingsDB’s secret weapon for maintenance without downtime. While a node is "Away," it performs heavy lifting such as:

Full State Dumps (Snapshots).
Garbage Collection (Memory management).
Scheduled Backups.
Synchronizing peer nodes.

During this time, the node still talks to clients, but it forwards their queries to keep the system responsive. Changes are collected in the background and applied just before the node leaves Away mode and rejoins the cluster. ThingsDB ensures that only one node at a time is in this mode, keeping the rest of the cluster available for primary tasks.
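The "collect now, apply before rejoining" behavior can be modeled in a few lines. Again, this is a simplified sketch under stated assumptions: `AwayNode` and its methods are hypothetical names, and a change is reduced to a `(change_id, key, value)` tuple.

```python
class AwayNode:
    """Hypothetical node that buffers changes while in away mode."""

    def __init__(self):
        self.state = {}
        self.away = False
        self.pending = []

    def receive_change(self, change_id, key, value):
        """Apply a change right away, or queue it while away."""
        if self.away:
            self.pending.append((change_id, key, value))
        else:
            self.state[key] = value

    def leave_away(self):
        """Apply queued changes in Change ID order, then rejoin."""
        for _, key, value in sorted(self.pending, key=lambda change: change[0]):
            self.state[key] = value
        self.pending.clear()
        self.away = False


node = AwayNode()
node.receive_change(1, "counter", 1)
node.away = True
node.receive_change(3, "counter", 3)  # queued, arrives out of order
node.receive_change(2, "counter", 2)  # queued
print(node.state)  # {'counter': 1}
node.leave_away()
print(node.state)  # {'counter': 3}
```

While away, the node's visible state stays at the last applied change; after `leave_away()` it has caught up with the rest of the cluster.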

Conclusion

By decoupling blocking maintenance tasks from client requests and using a sophisticated consensus for changes, ThingsDB provides a database experience that is both high-performance and incredibly resilient.

Now that you understand the "brain" of ThingsDB, we are finally ready to look at the Client Side in my next post!