Intro to CockroachDB (No exterminator needed!)

Introduction

As more of everyday life moves from traditional, in-person interactions to digital services, the data behind those services must be stored securely and kept consistent. Data integrity and availability are among the most important requirements for an application that wants to build and keep the trust of its audience: developers and users alike expect their information to be there whenever they need it, and to be served promptly. This is where the data-replicating, "invincible" database known as CockroachDB truly lives up to its name.

What is CockroachDB?

CockroachDB is a distributed SQL database designed with "high availability, effortless scale, and control over data placement" as its main goals. In short, it was created to be an extremely reliable database that is as adaptable and resilient as the infamous cockroach, while being a lot easier to manage.

Features

Most of what makes CockroachDB so dependable can be traced to four features that set it apart from other SQL databases: Resiliency, Scalability, Consistency, and Geo-partitioning.

Resiliency

CockroachDB automatically replicates data across multiple nodes so that it stays available even when parts of the cluster fail. Data is split into chunks called ranges, and by default each range is stored as 3 copies (replicas), each on a different node.

This protects against hardware failures and downtime. The replicas are kept in sync with the Raft consensus protocol, which only needs a majority of a range's replicas to be reachable. With the default of 3 replicas, a range stays readable and writable if one of its nodes fails; surviving two simultaneous failures requires raising the replication factor to 5. Because several up-to-date copies exist in different places, losing a single machine does not mean losing data.
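Raft, the consensus protocol CockroachDB uses to keep replicas in sync, can only make progress while a strict majority of a range's replicas are reachable. The short Python sketch below works out what that rule means for a single range. The function names are hypothetical (they are not part of any CockroachDB API), and the sketch models only the majority-counting rule, not Raft itself. Note that it is per range: how many node failures a whole cluster survives also depends on which nodes each range's replicas happen to live on.

```python
def max_failures_tolerated(replication_factor):
    """How many replicas of a range can fail while a majority (quorum) remains."""
    if replication_factor < 1:
        raise ValueError("a range needs at least one replica")
    return (replication_factor - 1) // 2


def range_available(replication_factor, failed_replicas):
    """A range can serve reads and writes only while a strict majority of its replicas is up."""
    alive = replication_factor - failed_replicas
    return alive > replication_factor // 2


for rf in (3, 5):
    statuses = [range_available(rf, failed) for failed in range(rf + 1)]
    print(f"{rf} replicas: tolerates {max_failures_tolerated(rf)} failure(s); available? {statuses}")
# 3 replicas: tolerates 1 failure(s); available? [True, True, False, False]
# 5 replicas: tolerates 2 failure(s); available? [True, True, True, False, False, False]
```

Odd replication factors are the usual choice: going from 3 to 4 replicas still tolerates only one failure, because a majority of 4 is 3.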

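# how to start a node (repeat for each node, changing the store and both ports)
cockroach start \
--insecure \
--store=fault-node1 \
--listen-addr=localhost:26257 \
--http-addr=localhost:8080 \
--join=localhost:26257,localhost:26258,localhost:26259

# once every node is running, initialize the cluster (run only once)
cockroach init --insecure --host=localhost:26257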

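Every node in the cluster is started with the same command, but with its own --store directory, --listen-addr port, and --http-addr port. The --join flag lists the addresses of the cluster's nodes so that each node can find the others and join the same cluster; once they are connected, CockroachDB can replicate data across them and balance the load between them.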
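Starting a local cluster means running the start command once per node, changing only the store directory and the two ports, and then initializing the cluster once with cockroach init. The small Python helper below is hypothetical (start_commands and init_command are not part of CockroachDB); it only prints the commands, following the flag pattern of the example above and the local-cluster setup used in the CockroachDB documentation.

```python
def start_commands(num_nodes, base_sql_port=26257, base_http_port=8080):
    """Build one 'cockroach start' command per local node (hypothetical helper)."""
    # Every node gets the same --join list: the addresses of all nodes in the cluster.
    join = ",".join(f"localhost:{base_sql_port + i}" for i in range(num_nodes))
    commands = []
    for i in range(num_nodes):
        flags = [
            "--insecure",
            f"--store=node{i + 1}",                          # separate data directory per node
            f"--listen-addr=localhost:{base_sql_port + i}",  # unique SQL/RPC port per node
            f"--http-addr=localhost:{base_http_port + i}",   # unique DB Console port per node
            f"--join={join}",
            "--background",                                  # return control to the shell
        ]
        commands.append("cockroach start " + " ".join(flags))
    return commands


def init_command(base_sql_port=26257):
    """The cluster is initialized once, by contacting any one of its nodes."""
    return f"cockroach init --insecure --host=localhost:{base_sql_port}"


for cmd in start_commands(3):
    print(cmd)
print(init_command())
```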

Scalability

CockroachDB handles a growing database through horizontal scaling, also known as scaling out: when more capacity is needed to accommodate increasing amounts of data, more machines (nodes, in this case) are added to the cluster. This differs from vertical scaling (scaling up), the more traditional approach, in which a single machine is replaced with a larger or more powerful one.

In CockroachDB, nodes can be added as needed to spread the workload. A group of nodes working together is called a cluster, and whenever a new node joins, the cluster automatically rebalances data by moving some replicas onto it. Because the work is split across more nodes, the cluster as a whole can take on a heavier workload.
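A rough idea of what automatic rebalancing does can be sketched in Python. This is a hypothetical, simplified model, not CockroachDB's allocator, which weighs more signals than replica counts alone. Here each range maps to the set of nodes holding its replicas, and when a new node joins, replicas are moved from the fullest node to the emptiest one (never placing two replicas of the same range on one node) until every node holds roughly the same number. The names rebalance and replica_counts are illustrative only.

```python
def replica_counts(placement, nodes):
    """Count how many replicas each node currently stores."""
    counts = {node: 0 for node in nodes}
    for replicas in placement.values():
        for node in replicas:
            counts[node] += 1
    return counts


def rebalance(placement, nodes):
    """Move replicas off the fullest node onto the emptiest one until counts differ by at most 1.

    `placement` maps a range id to the set of nodes holding its replicas.
    A node never receives a second replica of a range it already holds.
    """
    while True:
        counts = replica_counts(placement, nodes)
        busiest = max(nodes, key=counts.get)
        idlest = min(nodes, key=counts.get)
        if counts[busiest] - counts[idlest] <= 1:
            return placement
        for range_id in sorted(placement):
            replicas = placement[range_id]
            if busiest in replicas and idlest not in replicas:
                replicas.remove(busiest)  # one replica leaves the busy node
                replicas.add(idlest)      # and is recreated on the idle one
                break


# Three nodes each hold a replica of four ranges (3 replicas per range).
nodes = ["n1", "n2", "n3"]
placement = {f"r{i}": {"n1", "n2", "n3"} for i in range(1, 5)}

nodes.append("n4")  # scale out by adding a fourth node
rebalance(placement, nodes)
print(replica_counts(placement, nodes))  # {'n1': 3, 'n2': 3, 'n3': 3, 'n4': 3}
```

Each move strictly reduces the imbalance, so the loop always terminates, and while two nodes differ by two or more replicas there is always at least one range on the busier node that the idler node does not yet hold.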

Consistency

Cockroach Labs, the company behind CockroachDB, puts a strong emphasis on consistency, describing the database on its official site as "100% ACID" compliant. Changes to data are made through transactions, which are atomic: either every write in a transaction commits, or none of them do. Each write is also replicated with the Raft consensus protocol and is only acknowledged once a majority of the range's replicas have recorded it, so the copies of the data do not drift apart. By default, transactions run at SERIALIZABLE isolation, the strictest of the standard SQL isolation levels. Together, these guarantees avoid many of the discrepancies that could otherwise appear between the data on different nodes.
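Atomicity, the "A" in ACID, means that a transaction's writes take effect all together or not at all. The Python sketch below is a hypothetical, single-machine toy model: TinyStore, run_transaction, and transfer are illustrative names, and the sketch says nothing about replication, isolation, or how CockroachDB actually implements transactions. A transaction works on a private copy of the data and only replaces the committed state if it finishes without error. In CockroachDB, the same guarantee is requested with an ordinary SQL transaction that starts with BEGIN and ends with COMMIT.

```python
class TinyStore:
    """Hypothetical single-node key-value store with all-or-nothing transactions."""

    def __init__(self, data):
        self.data = dict(data)

    def run_transaction(self, txn):
        staged = dict(self.data)  # the transaction works on a private copy
        try:
            txn(staged)
        except Exception:
            return False          # abort: committed data is left untouched
        self.data = staged        # commit: every write becomes visible at once
        return True


def transfer(amount):
    """Build a transaction that moves `amount` from alice to bob."""
    def txn(db):
        db["alice"] -= amount
        if db["alice"] < 0:
            raise ValueError("insufficient funds")  # fails after a partial write
        db["bob"] += amount
    return txn


store = TinyStore({"alice": 100, "bob": 0})
print(store.run_transaction(transfer(30)), store.data)   # True {'alice': 70, 'bob': 30}
print(store.run_transaction(transfer(500)), store.data)  # False {'alice': 70, 'bob': 30}
```

The failed transfer had already subtracted from alice's balance in its private copy, yet that partial write never reaches the committed data.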

Geo-partitioning

Geo-partitioning is another useful feature that CockroachDB has to offer. It lets each row be associated with a region, typically the location of the user the row belongs to, so that the row can be stored close to that user.

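CREATE TABLE houses (
    id UUID NOT NULL DEFAULT gen_random_uuid(),
    user_id UUID REFERENCES users(id),
    purchase_date TIMESTAMPTZ DEFAULT now(),
    region STRING NOT NULL, -- region column used for partitioning
    total_amount DECIMAL,
    PRIMARY KEY (region, id) -- the partition column must be a prefix of the primary key
) PARTITION BY LIST (region) (
    -- example region values (hypothetical)
    PARTITION us_east VALUES IN ('us-east'),
    PARTITION us_west VALUES IN ('us-west'),
    PARTITION other VALUES IN (DEFAULT)
);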

The region is stored as an ordinary column of the table, and each row is placed in a partition according to that column's value.

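This region information is then used to decide where the data lives. Each partition can be pinned, through a zone configuration, to nodes that were started with a matching locality (for example, --locality=region=us-east), so a user's rows are stored on servers close to them. Keeping data near the users who read and write it reduces the latency added as requests travel over the network.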
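To make the placement idea concrete, here is a hypothetical Python model, not CockroachDB code. Each node has a region, as it would when started with the real --locality=region=... flag; LIST partitions map region values to partition names; and a per-partition constraint pins each partition to one region. The function lists which nodes are eligible to store a row with a given region value. It ignores replica counts and everything else the real system considers, and all node names, partition names, and region values are made up for the example.

```python
# Hypothetical cluster: node -> region, as set with `cockroach start --locality=region=...`
NODE_REGIONS = {"n1": "us-east", "n2": "us-east", "n3": "us-west", "n4": "eu-west"}

# Hypothetical LIST partitions: partition name -> region values it contains
PARTITIONS = {"us_east": {"us-east"}, "us_west": {"us-west"}}

# Hypothetical zone constraints: partition -> region its replicas are pinned to
PINNED_REGION = {"us_east": "us-east", "us_west": "us-west"}


def partition_for(region_value):
    """Pick the LIST partition a row falls into, or the default partition."""
    for name, values in PARTITIONS.items():
        if region_value in values:
            return name
    return "default"


def candidate_nodes(region_value):
    """Nodes allowed to store a row with this region value."""
    pinned = PINNED_REGION.get(partition_for(region_value))
    if pinned is None:
        return sorted(NODE_REGIONS)  # unconstrained partition: any node will do
    return sorted(n for n, r in NODE_REGIONS.items() if r == pinned)


print(candidate_nodes("us-east"))  # ['n1', 'n2']
print(candidate_nodes("eu-west"))  # ['n1', 'n2', 'n3', 'n4']
```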

Conclusion

CockroachDB is a distributed SQL database designed for those who need a database that is reliable, consistent, and adaptable. Horizontal scaling lets it handle more demanding requests at higher frequencies simply by adding nodes to the cluster. Combined with automatic replication, which keeps several copies of every piece of data, and transactions, which ensure that only complete, successful changes are committed, these features set CockroachDB apart from the rest.