Ravern Koh

Posted on Jul 26, 2018 • Updated on Aug 5, 2018

Collaborative text editing with Logoot

#crdt #logoot #collaborativeediting

Recently, I became intrigued with how collaborative text editing works. I was working on a school assignment in a group, and we were using Google Docs to work. It almost seems like magic, how one person can type something, and a few seconds later have it appear on others’ screens without any conflicts. I also recently discovered that the Keynote and Notes apps have collaborative features built in. So I decided to take a deep dive into collaborative text editing.

The problem with collaboration

To edit a purely text-based document, you actually only do 2 basic operations: insert and delete. Any other operations are built on these 2 operations. A single user editing a document might look something like this.

To collaborate in a document, the naive approach would be to send over each operation a user performs to all other users editing the document. This works when the connection is fast, perhaps when the users are connected in a LAN.

However, when network latency becomes significant, it is incredibly easy to achieve race conditions like this one.

When these race conditions occur, the documents on both clients become out of sync, and because of the nature of these operations, it is rare that they will become in sync again. Due to this, we need a better way of structuring these operations, or even structuring the document itself, so that reliable and consistent collaborative editing becomes a reality.

Operational Transformation

Enter Operational Transformation, or OT for short. OT is an algorithm that will keep documents consistent across all clients, no matter the order of the operations that get sent from other clients. Wave (the collaborative engine behind Google Docs), uses OT. I didn’t dive into OT too deeply, as I found other, simpler, and more intuitive solutions that I will explain later on. But for now, here is a quick rundown on how basic OT works.

Quick rundown

Let’s begin by understanding the very basics of OT. As mentioned earlier, a document is simply some content, which can be operated on by either inserting or deleting. The way OT solves the issue of concurrent editing is by transforming each operation using the previous operation, before it is applied to the document. Using the earlier example, after client A performs the insert on their document, client B’s delete operation, which is the next in line to be performed, gets transformed.

After the transformation, we can see that the operation now deletes the correct character from the document, not the wrong one in the naive example. Also, over on client B’s document, client A’s insert operation gets transformed after client B’s delete operation is performed, and the resulting document is the same on both ends.

Downsides to OT

Unfortunately, most of the popular OT algorithms are not always commutative. This means that when applying OT to two separate documents, it is not always guaranteed that they will end up being the same. Although this is rare, it can happen, and this will result in measures put into place to ensure consistency every once in a while. Thus, a different way of doing collaborative editing has surfaced in the recent years, known as CRDTs.

CRDTs

There are actually 2 flavours of CRDTs. They are (bear with me) Commutative Replicated Data Type and Convergent Replicated Data Type. The following will be about the former. CRDTs take a different approach from OT, in that they rely on structuring the document in a certain way, and not really on structuring the operations. This means that most of the time, using the CRDT approach, the document will be stored in a different format then just a simple string. This includes using things like binary trees and special position identifiers.

Downsides to CRDTs

When using CRDTs instead of OT, we are trading time complexity for space complexity. This means that while OT will take more time to process than CRDTs, CRDTs will usually end up using significantly more memory than OT.

Logoot

Logoot is a great (and simple) example of a CRDT. It is one of the easier CRDTs to understand, and extensions/variations have also been built upon it to enable extra functionality, like group undo/redo. It is the main CRDT that I will be focusing on (duh, look at the title), and also implementing it, which I will talk about in the next story.

Basic principles

Going back to the original example with the race condition, you can see that the document got out of sync because the operations were using a position within the string. So when an insertion happens, everything after the inserted character is shifted to the right, which means their position is incremented.

When a second operation is applied onto both content, they are inserted at the same position, as specified in the operation. However, they occur at different characters as a result of the first operation. For example, in the next image, the original intention is to insert a ‘y’ after the ‘c’. However, due to the operation inserting in a fixed position, the ‘y’ is inserted after the ‘x’ instead.

Logoot tries to solve this problem by making the positions move with the text instead. That is, it assigns unique identifiers to each character/group of characters in the document, and performs the insertions and deletions based on the position of these identifiers.

In Logoot, the document is split up into groups of text. A group of text is called an atom. Each atom is given a unique identifier, or a uid, which has to be generated in a special way that will be covered later. Each uid is comparable to other uids. This means that you can do things like ask if uid1 < uid2 and vice versa. You can also check if they are equal. The document is then stored as an array of (uid, atom) pairs, sorted by the uids within the pairs in ascending
order. There will also be two special, blank lines that exist in every document. One at the very start which will have the MIN_uid, and at the very bottom, which will have the MAX_uid.

Phew, that was a lot to take in. Here comes the most important part though. For any two different uids, you should always be able to find a uid in between them. This is the property that makes Logoot work. So, the uids in Logoot need to be represented in a way that adheres to the property.

Representing uids

Let’s start simple. This whole time, we represented the identifiers of each character with its position in the initial string — an integer. However, using integers as uids does not adhere to the property we mentioned. Let’s say we have the uids 4 and 5. It is impossible to find a integer in between 4 and 5, and thus integers cannot be used as uids.

At this point, you might be thinking, what about floats? Now, theoretically, using real numbers as uids would work, as there are an infinite number of real numbers between any two different real numbers. However, in the context of a computer, this isn’t possible. Floating point numbers are simply approximations of real numbers, which means that it’s not always possible to find a floating point number between any two floating point numbers. Simply put, you cannot represent an infinite number of floating point numbers in just 32 or 64-bits.

So how are these uids represented? The answer is lists. Using lists, we can easily adhere to the property. A uid is represented by a list of integers. Let’s say we have the uids [4] and [5] (Take note that these are lists of a single integer element, and not just the integers). In order to find a uid in between [4] and [5], since there is no integer in between [4] and [5], we can simply make a longer list, that starts with a 4 (e.g. [4, 8]). Now, we have the uids [4], [4, 8] and [5]. And this can repeat forever, since lists are infinitely extensible (Of course, the maximum size of a list is limited by the available RAM, but this is big enough that we don’t have to worry.

Therefore, in Logoot, uids are represented by lists of integers. Take note that there are some additional components that make up a uid, _but for the sake of
understanding Logoot they aren’t necessary.

Now that we have covered uids in Logoot, we can move on the constructing a simple Logoot document.

A sample document

In Logoot, a document is made up of an array of pairs. Each pair containing a uid, and an atom. An atom is simply a string, containing some of the content in the document. The pairs in the document are always sorted by their uids. To get the content of the entire document, simply concatenate all the atoms in order. As with all data structures (remember Logoot is a CRDT), some operations can be performed on it.

An insert operation can be performed on a Logoot document. To perform an insert, you simply need to have the pair to insert. To generate the uid for the pair, you simply have to take two existing uids and generate a random uid in between them.

A delete operation can also be performed. To perform a deletion, you only need to know the uid to delete.

Start and end lines

In the insert operation, you can see that to generate a new uid, you need 2 existing uids. However, in a blank document, how would the two existing uids be found? This is where the start and end lines come into play. The start and end lines are two lines that always exist in a Logoot document. They contain an empty atom and their uids are 0 and MAX respectively, where MAX is the maximum size of an integer.

Conclusion

Hopefully this article has given you a good understanding of Logoot, and even just collaborative editing in general. I implemented Logoot in the Go programming language. Check it out here. Be sure to check out my other stories and thanks for reading!

Top comments (3)

Piotrek Koszuliński • Jul 27 '18

Hi! CKEditor 5's developer here.

Very interesting article – I think it's a good introduction to real-time collaboration algorithms. However, I'd like to clarify a couple of things about OT.

You wrote that:

This means that when applying OT to two separate documents, it is not always guaranteed that they will end up being the same.

This isn't true. A correct OT implementation guarantees that both clients have the same content. Without eventual consistency, there's no collaboration. Besides, a CRDT implementation can be buggy as well, so it's not a downside of OT – that's a downside of using a broken and unproven software :)

I agree, though, that OT is hard. We're four years into the project now. It's there, it works, but there's a lot we can still improve inside. But what we learnt during the course of action is that real-time collaboration is about much more than resolving a couple of conflicts. It's the whole system that needs to be coherent and for some reasons, the only proven rich-text editors which support real-time collaboration that I know use OT. Perhaps it's easier in case of simpler text editors (with a flat model; whereas advanced rich-text editors require a tree structure), but I wouldn't cross out OT just like that :).

Ravern Koh • Jul 27 '18

Woops! Guess I got influenced by the articles that I used for research 😅.

After reading up more on OT, I am inclined to agree with you. I think that OT and CRDTs are both pretty cool solutions to realtime collaboration. Perhaps my next deep dive will be into OT 😆.

Pierre Hedkvist • Feb 9 '22

Thanks for the article. Easy to understand how Loogot works from a higher level. This will help me digg into the research papers.

DEV Community