How Real-Time Editing Works: Understanding Event Ordering in Distributed Systems

#distributedsystems #softwareengineering #systemdesign

A single-threaded application runs on a single machine and a single thread, so handling event timing is straightforward. For example, if you use Notepad or Vim on your local computer to edit a document, you can easily make changes. The backend handles the edits in a single-threaded manner, storing the versions according to the local time. Imagine that opening such a file starts a process in the OS, and as soon as the process starts, every event's timestamp is sequentially handled in a linear fashion. When you open the document, the event is recorded at timestamp 0. Then, if you add a header, it is recorded at time 1. Adding more text records the event at time 2. Here, we don’t have to worry much about timing conflicts. This concept is essentially a "logical clock."

Lamport Clock

As seen in the above example, a logical clock is a measure of time passing in terms of logical operations. The simplest logical clock is a counter incremented before an operation executes, giving each operation or event a unique time. This is Lamport clock.

Now imagine editing an online document being accessed by 3 different users. Here, the timestamps of operations across three different processes must be tracked. This way, the document can maintain versions, and the backend can merge edits based on their timestamps (similar to how Google Docs operates). If two edits happen at different times, the later one is considered newer. If two edits occur concurrently, both contents must be retained, requiring an algorithm to resolve conflicts.

Let's try solving this using logical timestamps. Here are the rules we follow while implementing a lamport clock across different processes:

The counter initializes at 0 for each process when it starts editing the document.
Each process increments its counter by 1 before executing an edit.
When a process sends a message (saves an edit), it increments its counter by 1 and sends a copy of it with the message.
When a process receives a message (an edit from another collaborator), it merges the received counter with its local counter by taking the maximum of the two. Finally, it increments the counter by 1.

Try adding timestamps to the above diagram using the mentioned algorithm, and you will arrive at the same results using the logical timestamp approach. The backend will gather all events/operations from all processes and order them by timestamp to create a versioned document. However, in the case of C and E, even if E occurred after C, E might receive a timestamp lower than C. To handle such issues, we use vector clocks.

Vector Clock

The following approach applies to vector clocks:

A process updates its local vector clock based on these rules:

Initially, the counters in the array are set to 0 when a user comes online.
When an operation occurs, the process increments its counter in the array by 1.
When a process sends a message (user saves changes), it increments its counter in the array by 1 and sends a copy of the array with the message.
When a process receives a message (another user's change), it merges the received array with the local one by taking the maximum of each element. Finally, it increments its counter in the array by 1.

With vector clocks, we can compare two timestamps to determine if they occurred concurrently or if one occurred before the other. If all elements in the array for event X are less than or equal to those in event Y, we can say that X occurred before Y (e.g., B and D in the figure). If some elements are greater and others are smaller, the events are considered concurrent (e.g., A and E in the figure). This case represents a conflict when merging changes, and Google Docs might retain both collaborators' changes to handle concurrent edits. Using this approach, the backend of a collaborative notebook can manage versions and logs of all operations performed by all processes with accurate timestamps.

Conclusion

We use logical clocks, like Lamport and vector clocks, to handle event ordering in distributed systems. Unlike physical clocks (e.g., quartz clocks), logical clocks help track the order of events more reliably in such contexts as physical clock in different systems can have errors and the time recorded could not be used to order the events.

In the next blog, I’ll dive into another topic related to distributed systems. Stay tuned!

Here are links to my previous posts on distributed systems:

Feel free to check them out and share your thoughts!

Quadratic AI – The Spreadsheet with AI, Code, and Connections

AI-Powered Insights: Ask questions in plain English and get instant visualizations
Multi-Language Support: Seamlessly switch between Python, SQL, and JavaScript in one workspace
Zero Setup Required: Connect to databases or drag-and-drop files straight from your browser
Live Collaboration: Work together in real-time, no matter where your team is located
Beyond Formulas: Tackle complex analysis that traditional spreadsheets can't handle

Get started for free.

Watch The Demo 📊✨