Background
In the Spring of 2020, I co-developed a distributed key-value store in a restricted dialect of C++ as part of Northeastern University's Software Development course. The course's capstone project, dubbed `eau2`, consisted of creating a series of applications that all used an underlying distributed data storage solution built by us, the students.
The idea of the project was to expose students to designing, building, testing, and iterating on a large codebase. The result was an interesting and complex distributed app that I think did a good job of exposing students to what it's like to maintain real codebases as software engineers.
If you're a student (or even a professional developer), I'd highly recommend trying to build a large program like this layer-by-layer or at least enrolling in a course that offers something similar. The end result is not only impressive but also highly educational! 😃
Design
Language Choice
`Eau2` is built using a restricted version of C++ that we called "C with classes". Essentially, the code looks like C but uses C++ classes to allow for better code organization and object-oriented design. Most other features and libraries of C++ were forbidden.
Beware: this is not an app designed for production use, so these choices were not made with performance in mind. Rather, the idea was to force students to write their own utility classes (i.e. `Object`, `String`, `Array`) and thereby truly understand every part of their code. Staying away from most of C++ also conveniently avoided its inevitable rabbit holes...
Architecture
At a high level, the `eau2` system aims to provide key-value storage over a distributed set of nodes in order to support applications which process large amounts of data. Given a cluster of compute nodes, users can run the `eau2` system to store data across the cluster but interact with it as if it were local. Note, however, that the system does not have any of the consistency or availability guarantees normally featured in distributed systems.
In order to achieve this abstraction, the system is broken up into three layers:
- At the bottom is the key-value storage system. Since the data is ultimately stored across multiple nodes, this layer features a network protocol which allows the key-value stores on each node to interact with each other. Communication between the stores allows the user to view the key-value storage as a single system holding all of the data, without any knowledge of the underlying distribution.
- The layer above provides abstractions like `DistributedColumns` and `DistributedDataFrames`. These utility classes expose an API that allows for easy data aggregation and manipulation across the network.
- At the top sits the `Application` layer. This layer provides all the functionality necessary to run the `eau2` system and interacts with the lower levels. The user builds in this layer to leverage the underlying key-value store as the backbone of their program.
Implementation
The bottom layer is implemented using one main class, the `Store`. This class has two main components, a `Map` and a `Network`.

- The `Map` is the normal key-value storage structure, mapping `Keys` to `Strings`. A `Key` is a tuple of a `String`, the key name, and an `Integer`, the ID of the node the `Key` lives on. Values in the `Map` are stored as `Strings` of serialized blobs of data.
- The `Network`, on the other hand, contains all the necessary logic for registering and communicating with the other nodes in the cluster. The nodes pass `Messages` between each other to both `PUT` and `GET` data.

Overall, the `Store` layer has a simple public API supporting methods such as `get`, `getAndWait`, and `put`.
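To make that shape concrete, here's a single-node sketch of the `Key`/`Store` pair. To keep it self-contained I've used `std::map` and `std::string`, which the real `eau2` explicitly avoided (we rolled our own `String` and `Map`), and the `LocalStore` name and method signatures are my own simplification rather than the actual API:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// A Key is a (name, home-node) pair identifying where a value lives.
struct Key {
    std::string name;
    size_t home;  // index of the node that owns this key

    bool operator<(const Key& o) const {
        return std::tie(name, home) < std::tie(o.name, o.home);
    }
};

// Single-node stand-in for the Store. The real Store consults its
// Network component whenever key.home is a different node; here every
// key is treated as local so the sketch stays self-contained.
class LocalStore {
    std::map<Key, std::string> data_;  // values are serialized blobs
public:
    void put(const Key& k, std::string serialized) {
        data_[k] = std::move(serialized);
    }

    // Returns the blob stored under k, or an empty string if absent.
    std::string get(const Key& k) const {
        std::map<Key, std::string>::const_iterator it = data_.find(k);
        return it == data_.end() ? std::string() : it->second;
    }

    // The real getAndWait blocks until another node puts the key; a
    // single-node sketch has nothing to wait on, so it just forwards.
    std::string getAndWait(const Key& k) const { return get(k); }
};
```

In the real system, `put` on a `Key` homed on another node would trigger a network `PUT` message rather than a local map write.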
In the middle layer we have the useful abstraction classes: specifically, the `DistributedDataFrame` class and the `DistributedColumn` class.
- `DistributedDataFrame` is a queryable table of data which supports aggregate operations such as `map` and `filter`. All data in a `DistributedDataFrame` is stored in columnar format, with all data in each column being of a single type. The supported types are `INT`, `BOOL`, `FLOAT`, and `STRING`. Unlike normal DataFrames, `DistributedDataFrames` access their data as needed from other nodes across the network under the hood.
- `DistributedColumn` has the API of a normal column in a DataFrame but is implemented such that the values it stores are distributed in chunks across the cluster of nodes. This abstraction allows the class to represent much more data than the average column stored in local memory.
For the top layer, we provide a single `Application` class. This class has the `Store` as a class field, providing the user who extends the `Application` class with access to all data stored on the system. Additionally, this class contains useful utility methods, such as `this_node()`, which returns the index of the current node.
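A minimal sketch of what the `Application` base class might look like. Only the `store` field, the index-taking constructor, and `this_node()` come from the description above; the private `idx_` field and the empty virtual `run_()` body are my guesses at the details:

```cpp
#include <cstddef>

struct Store;  // the bottom-layer Store described earlier

// Sketch of the top-layer base class: it holds this node's index and a
// pointer to the node's Store, exposing both to user subclasses.
class Application {
    size_t idx_;  // this node's index in the cluster (assumed field name)
public:
    Store* store;  // the distributed key-value store

    // In the real system, store would be wired up at cluster startup;
    // this sketch leaves it null.
    Application(size_t idx) : idx_(idx), store(nullptr) {}
    virtual ~Application() {}

    // Returns the index of the current node
    size_t this_node() { return idx_; }

    // Entry point each node runs; overridden by user subclasses
    virtual void run_() {}
};
```

A user's program subclasses `Application`, overrides `run_()`, and branches on `this_node()` to give each node its role, as the demo below shows.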
Use Case
Here's an example program that runs on 2 nodes. Node 0 saves some data to the store. Node 1 waits until the data appears in the store, gets it, calculates the max, and stores the max back in the `Store`.
```cpp
class Demo : public Application {
public:
  Key data_key{"data", 0};
  Key max_key{"max", 1};

  Demo(size_t idx) : Application(idx) {}

  void run_() {
    switch (this_node()) {
      case 0: producer(); break;
      case 1: maxer(); break;
    }
  }

  void producer() {
    float* vals = new float[3];
    vals[0] = 1;
    vals[1] = 10;
    vals[2] = 100;
    // Stores 'vals' as an Array in 'store' under the key 'data_key'
    DataFrame::fromArray(&data_key, store, 3, vals);
  }

  void maxer() {
    // Gets the key 'data_key' from the store, blocking until it
    // exists if it's not there yet
    DataFrame* frame = store->getAndWait(&data_key);
    float max = frame->get_float(0, 0);
    for (size_t i = 1; i < frame->nrows(); i++) {
      float f = frame->get_float(0, i);
      if (f > max) max = f;
    }
    // Stores 'max' as a Scalar in 'store' under the key 'max_key'
    DataFrame::fromScalar(&max_key, store, max);
  }
};
```
Results
By the end of the project, we were able to read in and query datafiles up to a few gigabytes in size at reasonable speeds. For reference, it could take up to 5 minutes to read a 1 GB file into the system, depending on the hardware of the machine the node reading in the data was running on. Once loaded into the cluster, querying the data took about as long as a heavy HTTP request, usually < 500ms.
These speeds aren't too impressive at first glance, but because we rolled our own data adapters, variable wrappers, and network protocol (i.e. no HTTP!), there is an inherent overhead to `eau2` that wouldn't be present under proper C++ usage. We're therefore proud of the system's capabilities, even if they're not production-grade.
You can check out the source for our final demo, "Degrees of Linus", here. The demo calculates the number of collaborators who worked on GitHub projects a certain number of degrees away from Linus Torvalds.
Closing Thoughts
Co-developing `eau2` was a great learning experience that exposed me to everything from manual memory management and network socket operations to data serialization and API design. As I mentioned earlier, I hope this project inspires you to try building something similar; the idea isn't to build something perfect (ours certainly isn't), but rather to push yourself to design and iterate on a codebase that is larger and more complex than anything you've worked on before in classes or hobby projects.
Good luck!