CRDTs and Distributed Consistency - Part 1: Building a distributed counter

#crdts #distributed #consistency

Originally published in Streaver's blog.

You are probably reading this because you either, are really curious and asked yourself the question "How does Google Docs work?" or maybe a client you work for had a similar requirement and basically forced you to ask the question 😅. The latter is what happened to me two years ago, and this is how I met CRDTs and how Streaver has been involved with this kind of problems since 2020.

How does Google Docs (or similar apps) work?

I started by asking the question to Google directly, thinking that someone probably already asked and answered the same question but of course I only got answers from the final user's perspective, this was not enough, I wanted technical details, I wanted the internal engineering details.

I had to re-think the question, and make it more general. So I changed it for "How is collaborative document editing implemented?" This yielded better results, but I still had to do a lot of reading, the answer was not easy.

One of the results pointed me to a solution called Operational Transformation (OT) which after a lot of reading and looking at some of the outgoing requests from the Google Docs app it self I confirmed that it was indeed how it was implemented. But I also saw some mentions to the CRDT acronym, which stands for Conflict-free replicated data type, that caught my eye, because it seemed like a newer idea and with some interesting properties that OT had not, for example not needing a central server.

So, the answer is, Google Docs works by making use the Operational Transformation algorithm, which in many ways is similar to Git, but it automatically solves conflicts between editing agents.

Writing your own collaborative editor is a huge task, we have done it, but we will not do this here. I will leave this video from @martinkl which goes through some of the challenges of building this kind of apps.

From this point onwards we will explore the ideas behind CRDTs that will eventually allow you to write your own collaborative apps including rich-text editors.

What are CRDTs?

In distributed computing, a conflict-free replicated data type (CRDT) is a data structure that is replicated across multiple computers in a network, with the following features:

No Coordination: The application can update any replica independently, concurrently and without coordinating with other replicas.

Self-healing: An algorithm (itself part of the data type) automatically resolves any inconsistencies that might occur.

Convergence: Although replicas may have different state at any particular point in time, they are guaranteed to eventually converge.

An example will make it easier to understand and in this post we will do just that, build our own CRDT.

Imagine you have N instances of an app that need to count the amount of times a user clicks on a button, we don't need to store the information, just count it and eventually synchronize the information so everyone is aware of the total amount. There are many ways to solve that:

One way that comes to my mind is to have a server that holds the count and implements the transactions and APIs to ensure that the counter is updating correctly and that the clients can fetch the latest information. Just describing this was hard, it involves a DB, servers, APIs, etc.
Another way involves using a CRDT that implements a distributed counter over a P2P protocol. This type of counter is called G-Counter (Grow-only Counter) and the P2P stuff is just sugar on top, CRDTs don't care about the communication layer at all!

Building a G-Counter

Let's build this G-Counter together. A counter is just a variable that allows us to store a number. First we want to understand what a counter should do for a single user:

let counter = 0; // Creating the counter and setting initial value

counter += 1; // Increase the counter by n=1 (n could be anything);
counter += 3; // Increase the counter by n=3;

console.log(counter); // We need to query the value somehow;

Now that we know the APIs we need let's build a counter but wrapping it's functionality in a class (I am going to use TypeScript), this will prepare us for the G-Counter:

export default class GCounter {
  private count: number;

  constructor(initialValue = 0) {
    this.count = initialValue;
  }

  public query(): number {
    return this.count;
  }

  public increment(num: number): void {
    if (num < 0) {
      throw new Error('Only positive values');
    }

    this.count = this.count + num;
  }
}

The next step is to introduce a way of storing and synchronizing the counter value from other agents. We will need an internal state that allows us to compute the actual value of the counter based on what we know of the outside world. Moreover, in order to synchronize the state between agents we need a rule to decide who is right, basically we need to answer the question: Is my information of the world the latest or is the information I got from someone else newer?. This is easy to answer if we assume that our counter can only be incremented, we just have to use Math.max function and we always keep the largest count as the most recent information. Let's write the code for this:

export default class GCounter {
  private id: number; // The ID of this client
  public state: number[]; // The last known count value for each client

  constructor(id: number, size: number, initialValue = 0) {
    this.id = id;

    // Initialize an array with zeros for everyone except myself
    this.state = Array.from({ length: size }, (_, index) =>
      index === id ? initialValue : 0,
    );
  }

  public query(): number {
    // Return the sum of all counters (last known value)
    return this.state.reduce((acc, value) => acc + value, 0);
  }

  public increment(num: number): void {
    if (num < 0) {
      throw new Error('Only positive values');
    }

    // Increment my local counter by num
    this.state[this.id] += num;
  }

  public merge(counter: GCounter) {
    // Create tuples with local count and received count
    // zipped = [[local1, remote1], [local2, remote2], ...]
    const zipped = this.state.map((count, index) => [
      count,
      counter.state[index],
    ]);

    // Update the state with the max known counter for each client
    this.state = zipped.map((counts) => Math.max(counts[0], counts[1]));
  }
}

We now have a G-Counter, let's see it in action with 2 clients:

import GCounter from './GCounter';

const NUMBER_OF_CLIENTS = 2;

// All clients start
const client1 = new GCounter(0, NUMBER_OF_CLIENTS);
const client2 = new GCounter(1, NUMBER_OF_CLIENTS);

const clients = [client1, client2];

// Each client independently increases their counter
client1.increment(1);
client2.increment(3);
client2.increment(3);
client1.increment(1);
client1.increment(1);

// Before sharing any knowledge they just know about their own count
console.log({
  client1: client1.query(),
  client2: client2.query(),
});
// { client1: 3, client2: 6 }

// At some point in time they synchronize and share their state
clients.forEach((client) => {
  client1.merge(client);
  client2.merge(client);
});

// After sharing the knowledge they agree on the count
console.log({
  client1: client1.query(),
  client2: client2.query(),
});
// { client1: 9, client2: 9 }

Congratulations we have built a distributed conflict-free and replicated counter! No need to do transactions or coordinations.

In this particular case we are not actually sharing the state through the network, just on memory. But you can share the state in whatever way you prefer, HTTP, WebSockets, etc.

Conclusions

As you saw, a CRDTs can be very simple, a simple counter if written properly can be a considered a CRDT. Despite that, these data types are extremely powerful because they ensure eventual consistency without coordination between agents.

In upcoming parts of this series of posts I will build different structures and I will share how to propagate state between agents via peer-to-peer.

You can find the second part here.