I remember the first time I tried to build a collaborative editor. I thought it would be simple—just send keystrokes over a WebSocket and merge them on the server. I was wrong. Within minutes, two users typing the same word turned my document into a jumbled mess of missing letters and duplicate characters. That day I learned that real-time collaboration is one of the hardest problems in distributed systems.
Let me show you what I discovered, step by step, so you can understand it too—even if you’ve never written a line of code before.
When you edit a document alone, there is only one sequence of changes. But when two people edit the same document at the same time, two sequences happen in parallel. They conflict. If I delete a word at the same moment you insert a letter inside that word, who wins? The answer isn’t magic—it’s mathematics.
There are two main ways to solve this. The first is called Operational Transformation, or OT. The second is called Conflict-free Replicated Data Types, or CRDTs. Both achieve the same goal: everyone sees the same final document, even though edits happened in different orders on different machines.
Let’s start with OT because it’s older and used in Google Docs. Imagine you and I are typing in the same paragraph. I type the letter “A” at position 5. You type the letter “B” at position 3. If I apply my change first, then yours, the result is different than if yours comes first. OT fixes this by transforming each operation so that both orders produce the same result.
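Here’s a tiny example. I insert “X” at position 2. You insert “Y” at position 4. Each of us applies our own change first. In your document “Y” sits at position 4, and when my “X” arrives at position 2 it simply pushes “Y” along to position 5. In my document “X” is already in place, so everything from position 2 onward has moved right by one; if I applied your insert at position 4 unchanged, “Y” would land one character too far left. The transformer shifts your insert to position 5 before I apply it. That shift is the core of OT.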
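The shift can be checked in a few lines of JavaScript. This is a minimal sketch, assuming single-character inserts on a plain string; `applyInsert` and `transformInsert` are names made up for illustration, and two inserts at the same position are not handled here.

```javascript
// Insert one character into a string at a given position.
function applyInsert(doc, op) {
  return doc.slice(0, op.pos) + op.char + doc.slice(op.pos);
}

// Adjust `op` so it can be applied after `other` has already been applied.
// Ties (same position) are deliberately left out of this sketch.
function transformInsert(op, other) {
  return other.pos < op.pos ? { ...op, pos: op.pos + 1 } : op;
}

const startText = 'abcdef';
const myInsert = { pos: 2, char: 'X' };
const yourInsert = { pos: 4, char: 'Y' };

// My machine: my insert first, then yours shifted past it.
const onMyMachine = applyInsert(
  applyInsert(startText, myInsert),
  transformInsert(yourInsert, myInsert)
);

// Your machine: your insert first, then mine (unchanged, it sits left of yours).
const onYourMachine = applyInsert(
  applyInsert(startText, yourInsert),
  transformInsert(myInsert, yourInsert)
);

console.log(onMyMachine, onYourMachine); // abXcdYef abXcdYef
```

Both orders end at the same string, which is the property the transformer exists to guarantee.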
I wrote a simple OT transformer for a class project. It wasn’t pretty, but it worked. The key function took two operations and returned two transformed operations that could be applied in any order. For inserts, I compared positions. If my insert came before yours, your insert had to move right by one. If they were at the same spot, we used a tiebreaker—like whose site ID was alphabetically smaller. This ensured that even if we both typed at the same position, the characters would end up in a deterministic order.
The tricky part was deletion. If I delete a character at position 3 and you delete a character at position 5, after my deletion your delete should target position 4. But if we both delete the same character, only one deletion should survive. The transformer had to produce a null operation for the second deletion.
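The delete-versus-delete rule might look like the sketch below. It assumes single-character deletes and uses `null` as the “do nothing” operation; `applyDelete` and `transformDelete` are hypothetical names, not taken from any library.

```javascript
// Remove one character; a null operation leaves the text untouched.
function applyDelete(doc, op) {
  if (op === null) return doc;
  return doc.slice(0, op.pos) + doc.slice(op.pos + 1);
}

// Adjust delete `op` so it can be applied after delete `other` has been applied.
function transformDelete(op, other) {
  if (other.pos < op.pos) return { ...op, pos: op.pos - 1 }; // target moved left
  if (other.pos === op.pos) return null; // same character, already gone
  return op; // the other delete was to the right, nothing changes
}

const original = 'abcdefg';
const myDelete = { pos: 3 };   // removes 'd'
const yourDelete = { pos: 5 }; // removes 'f'

const afterMineFirst = applyDelete(
  applyDelete(original, myDelete),
  transformDelete(yourDelete, myDelete)
);
const afterYoursFirst = applyDelete(
  applyDelete(original, yourDelete),
  transformDelete(myDelete, yourDelete)
);

// Both of us deleting the same character: the second delete becomes a no-op.
const sameTarget = applyDelete(
  applyDelete(original, { pos: 3 }),
  transformDelete({ pos: 3 }, { pos: 3 })
);

console.log(afterMineFirst, afterYoursFirst, sameTarget); // abceg abceg abcefg
```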
I spent a whole weekend debugging a case where three concurrent operations tangled. My transformer looped over pending operations, transforming each incoming remote operation against every pending local operation. Then I applied the transformed remote operation to the local document. The pending operations were also transformed so they still applied correctly later.
Here’s what that looked like in JavaScript—notice how I use simple arrays and for loops:
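```javascript
class OTClient {
  constructor(docId) {
    this.docId = docId;
    this.doc = '';
    this.pending = []; // local operations the server has not acknowledged yet
    this.serverVersion = 0;
    this.clientId = Math.random().toString(36).slice(2);
  }

  // Hooks: replace these with your WebSocket send and your UI update.
  sendToServer(op) {}
  render() {}

  applyOperation(op) {
    if (op.type === 'insert') {
      this.doc = this.doc.slice(0, op.pos) + op.char + this.doc.slice(op.pos);
    } else if (op.type === 'delete') {
      this.doc = this.doc.slice(0, op.pos) + this.doc.slice(op.pos + 1);
    }
    this.render();
  }

  localInsert(pos, char) {
    const op = { type: 'insert', pos, char, clientId: this.clientId };
    this.applyOperation(op);
    this.pending.push({ op, version: this.serverVersion });
    this.sendToServer(op);
  }

  receiveRemote(remoteOp, remoteVersion) {
    let transformedOp = remoteOp;
    const newPending = [];
    // Transform the remote operation past every local operation still in flight,
    // and rewrite those local operations so they still apply afterwards.
    for (const pending of this.pending) {
      const [newLocal, newRemote] = this.transform(pending.op, transformedOp);
      if (newLocal) newPending.push({ op: newLocal, version: pending.version });
      transformedOp = newRemote;
      if (!transformedOp) break; // the remote operation was cancelled out
    }
    this.pending = newPending;
    if (transformedOp) this.applyOperation(transformedOp);
    this.serverVersion = remoteVersion;
  }

  // Assumption: the server acknowledges this client's operations in the order
  // they were sent. Without this, `pending` would grow forever.
  receiveAck(version) {
    this.pending.shift();
    this.serverVersion = version;
  }

  // Returns [a', b']: a rewritten to apply after b, and b rewritten to apply after a.
  transform(a, b) {
    if (a.type === 'insert' && b.type === 'insert') {
      if (a.pos < b.pos) return [a, { ...b, pos: b.pos + 1 }];
      if (a.pos > b.pos) return [{ ...a, pos: a.pos + 1 }, b];
      // Same position: the smaller clientId keeps its place.
      if (a.clientId < b.clientId) return [a, { ...b, pos: b.pos + 1 }];
      return [{ ...a, pos: a.pos + 1 }, b];
    }
    // ... similar cases for delete-delete, insert-delete, delete-insert
    return [a, b]; // fallback
  }
}
```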
This code is simplified but captures the essence. I kept it short on purpose—you can see how each insert or delete adjusts the position of the other. The server broadcasts operations in the order it receives them, and each client does this transformation to stay consistent.
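But OT has a weakness: in practice it relies on a central server to put operations into a single order. If you go offline, your edits pile up as pending operations, and every one of them has to be transformed against everything the server accepted while you were away. The longer the absence, the slower and more fragile that becomes. That’s where CRDTs come in.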
CRDTs let every client work independently. You can be offline for hours, make hundreds of changes, and when you reconnect, everything merges automatically. The secret is that each character (or operation) gets a unique identifier that never changes. When two clients merge, they just insert the new characters into a sorted list based on that identifier. No conflict resolution needed.
I built a simple CRDT editor once. Instead of storing a string, I stored an array of items. Each item had an ID (a combination of a site ID and a counter), the ID of the item it was inserted after, a character, and a deleted flag. Sorting by ID alone would order characters by when they were typed, not where, so each item is anchored to its left neighbour. The total order on IDs (first by counter, then by site ID) only decides between concurrent insertions at the same spot.
Here’s the core of it, including the merge function:
```javascript
// Total order on IDs: compare the counter first, then the site ID.
function compareIds(a, b) {
  if (a.seq !== b.seq) return a.seq - b.seq;
  if (a.site === b.site) return 0;
  return a.site < b.site ? -1 : 1;
}

function sameId(a, b) {
  return a.site === b.site && a.seq === b.seq;
}

class CRDT {
  constructor(siteId) {
    this.items = [];
    this.siteId = siteId;
    this.counter = 0;
  }

  insert(pos, char) {
    // Anchor the new item to the visible character on its left (null = start).
    let after = null;
    let visible = 0;
    for (const item of this.items) {
      if (visible === pos) break;
      if (!item.deleted) {
        after = item.id;
        visible++;
      }
    }
    const id = { site: this.siteId, seq: ++this.counter };
    const entry = { id, after, char, deleted: false };
    this.integrate(entry);
    return entry;
  }

  delete(pos) {
    let visible = 0;
    for (const item of this.items) {
      if (item.deleted) continue;
      if (visible === pos) {
        item.deleted = true; // keep the item as a tombstone
        return item;
      }
      visible++;
    }
    return null;
  }

  // `remoteItems` must be in the sender's array order, so that every anchor
  // arrives before the items inserted after it.
  merge(remoteItems) {
    for (const remote of remoteItems) {
      const local = this.items.find(e => sameId(e.id, remote.id));
      if (local) {
        // Merge deleted flag: if either side says deleted, it stays deleted
        local.deleted = local.deleted || remote.deleted;
      } else {
        this.integrate({ ...remote });
      }
    }
  }

  integrate(entry) {
    // Keep the counter ahead of every ID seen so far (a Lamport-style clock).
    this.counter = Math.max(this.counter, entry.id.seq);
    let i = 0;
    if (entry.after !== null) {
      const anchor = this.items.findIndex(e => sameId(e.id, entry.after));
      if (anchor === -1) throw new Error('anchor item has not arrived yet');
      i = anchor + 1;
    }
    // Skip items with a larger ID: they were inserted at this spot concurrently
    // or later, and every replica must skip them in the same way.
    while (i < this.items.length && compareIds(this.items[i].id, entry.id) > 0) i++;
    this.items.splice(i, 0, entry);
  }

  toString() {
    return this.items.filter(e => !e.deleted).map(e => e.char).join('');
  }
}
```
This approach is robust. You never have to “transform” operations; you just insert new items in the right place. The key requirement is that every item’s identifier is globally unique and that identifiers follow a total order, so each replica makes the same placement decision. This is why CRDTs suit peer-to-peer editing and power libraries like Yjs and Automerge.
But CRDTs come with their own challenges. The list of items grows indefinitely, because deleted characters stay behind as tombstones. After a few thousand edits, the array can become huge. Engineers use techniques like run compaction (storing a run of consecutive insertions as one entry) and garbage collection (removing tombstones once every replica is known to have seen the deletion). Still, many developers find a basic CRDT easier to get right than OT.
Now, cursors. When you see another person’s cursor moving in real time, that’s a separate problem. Each cursor position needs to be transformed alongside the document changes. If I insert a character at or before position 5 while your cursor is at position 5, your cursor should move to position 6. If I delete a character before your cursor, your cursor should move left.
I wrote a cursor manager that stored remote cursor positions and adjusted them whenever a local or remote operation was applied. The transformation was the same as for text operations: I used the same transformer functions to shift cursor positions.
```javascript
class CursorTracker {
  constructor(clientId) {
    this.remoteCursors = new Map(); // clientId -> { pos, color, name }
    this.localPos = null;
    this.clientId = clientId;
  }

  // Hook: replace this with your WebSocket send.
  broadcast(msg) {}

  updateLocal(pos) {
    this.localPos = pos;
    this.broadcast({ type: 'cursor', pos, clientId: this.clientId });
  }

  handleRemote(msg) {
    this.remoteCursors.set(msg.clientId, { pos: msg.pos, color: msg.color, name: msg.name });
    this.renderRemoteCursors();
  }

  transformCursor(remoteClientId, op) {
    const cursor = this.remoteCursors.get(remoteClientId);
    if (!cursor) return;
    // Apply the same shifting rules as for text operations
    if (op.type === 'insert' && op.pos <= cursor.pos) {
      cursor.pos += 1;
    } else if (op.type === 'delete' && op.pos < cursor.pos) {
      cursor.pos -= 1;
    }
    this.renderRemoteCursors();
  }

  // Browser DOM example. Assumes #editor holds the document as one text node.
  renderRemoteCursors() {
    const editor = document.getElementById('editor');
    if (!editor) return;
    // Remove old cursor elements, then merge the text nodes they had split apart
    editor.querySelectorAll('.remote-cursor').forEach(el => el.remove());
    editor.normalize();
    const textNode = editor.firstChild;
    if (!textNode || textNode.nodeType !== Node.TEXT_NODE) return;
    // Insert from the highest position down: each insert splits the text node,
    // and the left part (still `textNode`) keeps all smaller offsets valid.
    const cursors = [...this.remoteCursors.values()].sort((a, b) => b.pos - a.pos);
    for (const cursor of cursors) {
      if (cursor.pos > textNode.length) continue;
      const span = document.createElement('span');
      span.className = 'remote-cursor';
      span.style.backgroundColor = cursor.color;
      span.textContent = '|';
      span.title = cursor.name;
      const range = document.createRange();
      range.setStart(textNode, cursor.pos);
      range.collapse(true);
      range.insertNode(span);
    }
  }
}
```
This code runs every time a cursor update arrives or a local change is applied. I had to be careful not to insert cursor spans while the user was typing, because that would move the selection. I also throttled outgoing updates, sending cursor positions at most every 100 milliseconds.
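Limiting updates to one per 100 milliseconds could be done with a small wrapper like the one below. It is a sketch rather than the code from my editor: `throttle` and `sendCursor` are illustrative names, and it assumes that delivering the most recent skipped position at the end of the window is the behaviour you want.

```javascript
// Wrap `fn` so it runs at most once per `intervalMs`. The latest skipped call
// is delivered when the window ends, so the final cursor position is not lost.
function throttle(fn, intervalMs) {
  let lastRun = -Infinity;
  let timer = null;
  let latestArgs = null;
  return (...args) => {
    const now = Date.now();
    const wait = intervalMs - (now - lastRun);
    if (wait <= 0 && timer === null) {
      lastRun = now;
      fn(...args); // outside the window: send immediately
      return;
    }
    latestArgs = args; // inside the window: remember only the newest call
    if (timer === null) {
      timer = setTimeout(() => {
        timer = null;
        lastRun = Date.now();
        fn(...latestArgs);
      }, Math.max(wait, 0));
    }
  };
}

const sentPositions = [];
const sendCursor = throttle(pos => sentPositions.push(pos), 100);

sendCursor(1); // sent immediately
sendCursor(2); // held back
sendCursor(3); // replaces 2 and goes out when the 100 ms window ends
```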
Now let’s talk about the network. Real-time editors use WebSockets for low latency. The client sends operations as JSON messages. The server may broadcast them to all other clients, optionally after validating them. For CRDTs, the server is just a relay—it doesn’t need to understand the content. For OT, the server is the authority that assigns versions.
I built a small server with Node.js and the ws library. For OT, the server kept a copy of the document and a history of the operations it had already accepted. When a new operation arrived, the server transformed it against every operation accepted since the version that client had last seen, applied it to its own document, then broadcast the result.
One lesson I learned the hard way: always include a sequence number or version in every message. Without that, you cannot detect lost messages. If a client misses an operation, the next one will break the document. I added a monotonically increasing version number on the server and made clients request the missing operations whenever they detected a gap.
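The server’s side of that exchange can be pictured with the sketch below. It leaves out the network entirely and handles inserts only. It assumes the server keeps a history of accepted operations, that each client tells the server which version it had seen (`baseVersion`, a name I made up here), and that a client has at most one operation in flight. `OTServer` and `transformAgainst` are illustrative, not part of ws or any OT library.

```javascript
// Shift `op` if an already-accepted insert landed before it.
// Ties at the same position go to the smaller clientId, which keeps its place.
function transformAgainst(op, accepted) {
  const acceptedFirst =
    accepted.pos < op.pos ||
    (accepted.pos === op.pos && accepted.clientId < op.clientId);
  return acceptedFirst ? { ...op, pos: op.pos + 1 } : op;
}

class OTServer {
  constructor() {
    this.doc = '';
    this.history = []; // accepted operations; history[i] produced version i + 1
  }

  // `baseVersion` is the server version the client had seen when it created `op`.
  receive(op, baseVersion) {
    let transformed = op;
    for (const accepted of this.history.slice(baseVersion)) {
      transformed = transformAgainst(transformed, accepted);
    }
    this.doc =
      this.doc.slice(0, transformed.pos) +
      transformed.char +
      this.doc.slice(transformed.pos);
    this.history.push(transformed);
    // This is what would be broadcast to the other clients.
    return { op: transformed, version: this.history.length };
  }
}

// Two clients both type at position 0 of an empty document, unaware of each other.
const server = new OTServer();
const first = server.receive({ type: 'insert', pos: 0, char: 'x', clientId: 'a' }, 0);
const second = server.receive({ type: 'insert', pos: 0, char: 'y', clientId: 'b' }, 0);

console.log(server.doc, second.op.pos, second.version); // xy 1 2
```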
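Gap detection on the client could look something like this. It is a sketch under a few assumptions: every server message carries a `version` that increases by exactly one, and the client has some way to ask for a range of missed versions (the `requestMissing` callback here is hypothetical).

```javascript
class OrderedReceiver {
  constructor(apply, requestMissing) {
    this.apply = apply;                   // called once per message, in version order
    this.requestMissing = requestMissing; // called with (fromVersion, toVersion)
    this.version = 0;                     // last version applied
    this.buffer = new Map();              // version -> message that arrived early
  }

  receive(msg) {
    if (msg.version <= this.version) return; // duplicate, already applied
    this.buffer.set(msg.version, msg);
    // Apply everything that is now contiguous with what we have.
    while (this.buffer.has(this.version + 1)) {
      const next = this.buffer.get(this.version + 1);
      this.buffer.delete(this.version + 1);
      this.apply(next);
      this.version += 1;
    }
    // Anything still buffered sits behind a gap: ask for the missing range.
    if (this.buffer.size > 0) {
      const firstBuffered = Math.min(...this.buffer.keys());
      this.requestMissing(this.version + 1, firstBuffered - 1);
    }
  }
}

const appliedVersions = [];
const requestedRanges = [];
const receiver = new OrderedReceiver(
  msg => appliedVersions.push(msg.version),
  (from, to) => requestedRanges.push([from, to])
);

receiver.receive({ version: 1 });
receiver.receive({ version: 3 }); // version 2 was lost, so 3 is held back
receiver.receive({ version: 2 }); // the resend arrives, and 2 then 3 are applied

console.log(appliedVersions, requestedRanges); // [ 1, 2, 3 ] [ [ 2, 2 ] ]
```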
Conflict resolution doesn’t end at text. Modern editors handle rich content: bold, italics, images, tables. Each of these requires its own CRDT or OT rules. For bold formatting, you might treat it as a separate layer—a set of intervals that toggle the property. CRDT sets work well because they are idempotent: adding the same interval twice does nothing.
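The idempotence is easy to see with a grow-only set of intervals. The sketch below is deliberately naive: it only ever adds bold (no un-bolding), and it uses plain integer offsets, whereas a real editor would anchor intervals to character IDs so they survive concurrent inserts. `BoldIntervals` is an illustrative name.

```javascript
class BoldIntervals {
  constructor() {
    this.intervals = new Map(); // "start:end" -> { start, end }
  }

  add(start, end) {
    this.intervals.set(`${start}:${end}`, { start, end });
  }

  // Set union: merging the same data twice, or in any order, gives the same result.
  merge(other) {
    for (const [key, interval] of other.intervals) {
      this.intervals.set(key, interval);
    }
  }

  isBold(pos) {
    for (const { start, end } of this.intervals.values()) {
      if (pos >= start && pos < end) return true;
    }
    return false;
  }
}

const mineBold = new BoldIntervals();
const yoursBold = new BoldIntervals();
mineBold.add(2, 5);
yoursBold.add(2, 5); // the same interval, added independently
yoursBold.add(7, 9);

mineBold.merge(yoursBold);
mineBold.merge(yoursBold); // merging again changes nothing

console.log(mineBold.intervals.size, mineBold.isBold(3), mineBold.isBold(5)); // 2 true false
```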
Many production systems build on editor frameworks like ProseMirror or Slate, where the document is a tree of nodes. Collaboration then becomes merging operations on a tree structure. Tree CRDTs are more complex because they need to handle move operations, reordering, and hierarchical constraints. Research in this area is still active.
The user experience matters just as much as the algorithms. When a remote user highlights text, you want to see a colored highlight. When they scroll, you may want to scroll alongside them. A “follow mode” is common in tools like Live Share for Visual Studio Code. Implementing scroll sync means tracking scroll positions and adjusting them when another user inserts or deletes content above the viewport.
I added a simple scroll sync by broadcasting scroll offsets relative to the document height. When I received a remote scroll, I animated the scroll smoothly using requestAnimationFrame. To prevent infinite loops, I only broadcast when the scroll position changed by more than 10 pixels.
Performance is another challenge. Typing generates one operation per character, and a single paste or undo can expand into dozens. If you broadcast every character as a separate operation, you’ll flood the network. Most editors batch characters typed in quick succession into a single operation. For example, typing “hello” quickly becomes one insert operation with the string “hello” and a start position.
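Batching can be as simple as merging inserts that land directly after one another. This sketch ignores timing (a real editor would also flush the batch after a short pause) and assumes the single-character operation shape used earlier, with a `text` field on the batched result that I introduce for illustration.

```javascript
// Merge consecutive single-character inserts into string inserts when each
// character lands directly after the previous one. Other operations pass through.
function coalesceInserts(ops) {
  const batched = [];
  for (const op of ops) {
    const last = batched[batched.length - 1];
    if (
      last &&
      op.type === 'insert' &&
      last.type === 'insert' &&
      op.pos === last.pos + last.text.length
    ) {
      last.text += op.char; // extend the current run
    } else if (op.type === 'insert') {
      batched.push({ type: 'insert', pos: op.pos, text: op.char }); // start a new run
    } else {
      batched.push({ ...op });
    }
  }
  return batched;
}

const typed = [
  { type: 'insert', pos: 3, char: 'h' },
  { type: 'insert', pos: 4, char: 'e' },
  { type: 'insert', pos: 5, char: 'l' },
  { type: 'insert', pos: 6, char: 'l' },
  { type: 'insert', pos: 7, char: 'o' },
];

const batchedOps = coalesceInserts(typed);
console.log(batchedOps); // [ { type: 'insert', pos: 3, text: 'hello' } ]
```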
Undo in collaborative systems is surprisingly hard. A simple per-client undo stack is easy: if you undo your last action, you send an inverse operation. But what if you undo after someone else has edited? The undo operation must be transformed against subsequent remote operations. CRDTs handle undo by adding a “counter-operation” that cancels the original. But complex undo histories can lead to unexpected results.
I solved undo by allowing each client to undo only its own last operation, and only if no one else has edited the same region since. This is a compromise, but for many small teams it works.
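A rough sketch of that compromise follows. Several things here are assumptions for the sake of the example: operations are single characters, delete operations carry the character they removed (so they can be inverted), and “same region” means within `radius` positions of the last local edit. `UndoGuard` and `invert` are illustrative names.

```javascript
// Build the operation that cancels `op`.
function invert(op) {
  return op.type === 'insert'
    ? { type: 'delete', pos: op.pos, char: op.char }
    : { type: 'insert', pos: op.pos, char: op.char };
}

class UndoGuard {
  constructor(radius = 1) {
    this.radius = radius; // how close a remote edit must be to block undo
    this.last = null;     // this client's most recent operation
    this.blocked = false;
  }

  recordLocal(op) {
    this.last = { ...op };
    this.blocked = false;
  }

  recordRemote(op) {
    if (!this.last || this.blocked) return;
    if (Math.abs(op.pos - this.last.pos) <= this.radius) {
      this.blocked = true; // someone touched the same region: give up on undo
    } else if (op.pos < this.last.pos) {
      // An edit further left shifts where our own edit now sits.
      this.last.pos += op.type === 'insert' ? 1 : -1;
    }
  }

  // Returns the inverse operation to send, or null when undo is not allowed.
  undo() {
    if (!this.last || this.blocked) return null;
    const inverse = invert(this.last);
    this.last = null;
    return inverse;
  }
}

const guard = new UndoGuard();
guard.recordLocal({ type: 'insert', pos: 5, char: 'a' });
guard.recordRemote({ type: 'insert', pos: 0, char: 'z' }); // far away, shifts us right
const allowedUndo = guard.undo();

guard.recordLocal({ type: 'insert', pos: 5, char: 'b' });
guard.recordRemote({ type: 'delete', pos: 5, char: 'b' }); // same region
const refusedUndo = guard.undo();

console.log(allowedUndo, refusedUndo); // { type: 'delete', pos: 6, char: 'a' } null
```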
Let me talk about offline editing. CRDTs shine here because you can keep a local copy of the document, apply changes, and store the operations in a journal. When you reconnect, you send the journal to the server (or to peers), and they merge. No conflict resolution needed because CRDTs guarantee convergence.
I tested offline editing by disconnecting my laptop’s Wi-Fi, making edits, then reconnecting. The merge happened instantly. The document looked correct. But I noticed one thing: the cursors of other users were gone until the next update. I fixed that by having clients re-broadcast their cursor positions on reconnect.
Now, a personal story. I was building a collaborative whiteboard for a hackathon. We used a CRDT for the shapes (rectangle, circle, text). Each shape had a unique ID. When someone moved a shape, we sent a new position. The CRDT ensured that if two people moved the same shape simultaneously, the final position was deterministic based on the IDs. It felt like magic when we tested it with five people drawing the same line—no lock, no conflicts.
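The deterministic winner can be modelled as a last-writer-wins register per shape. This is a sketch of the general idea, not the hackathon code: it assumes every move carries a logical `clock` and the `site` that made it, and that ties on the clock go to the larger site ID. `ShapeStore` and `wins` are names chosen for the example.

```javascript
// Returns true when write `a` should replace write `b`.
function wins(a, b) {
  if (a.clock !== b.clock) return a.clock > b.clock;
  return a.site > b.site; // tie on the clock: larger site ID wins
}

class ShapeStore {
  constructor() {
    this.shapes = new Map(); // shapeId -> { x, y, clock, site }
  }

  applyMove(shapeId, move) {
    const current = this.shapes.get(shapeId);
    if (!current || wins(move, current)) {
      this.shapes.set(shapeId, move);
    }
  }
}

// Two people drag the same rectangle at the same logical time.
const moveFromAnn = { x: 10, y: 10, clock: 4, site: 'ann' };
const moveFromBob = { x: 50, y: 20, clock: 4, site: 'bob' };

const boardOne = new ShapeStore();
boardOne.applyMove('rect-1', moveFromAnn);
boardOne.applyMove('rect-1', moveFromBob);

const boardTwo = new ShapeStore();
boardTwo.applyMove('rect-1', moveFromBob);
boardTwo.applyMove('rect-1', moveFromAnn);

console.log(boardOne.shapes.get('rect-1').site, boardTwo.shapes.get('rect-1').site); // bob bob
```

Both boards keep the same move no matter which message arrived first, which is all the whiteboard needed.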
But we hit a bug: when a user deleted a shape and another user added a shape with the same ID (because of a bug in ID generation), the shapes got confused. We fixed it by using UUIDs instead of counters. UUIDs are practically unique, so collisions almost never happen.
Lessons learned:
- Always use globally unique IDs for everything.
- Test with more than two concurrent users.
- Simulate network latency and packet loss.
- Log every operation for debugging.
- Keep the user interface responsive—show remote editing with a small delay rather than freezing.
Today, you don’t have to build all this from scratch. Libraries like Yjs, Automerge, and ShareDB handle the hard parts for you. Yjs is a CRDT library with shared types for text, arrays, and maps, plus bindings for popular rich-text editors. Automerge is a CRDT library that models the document as a JSON-like structure and keeps its change history, which works well offline. ShareDB is an OT-based backend for JSON documents that runs on Node.js.
If you want to add collaboration to your app, I recommend starting with one of these libraries. They are battle-tested, handle edge cases you never thought of, and several offer presence or cursor sync. Spend your time on the user interface, not on reinventing the transformation algorithm.
But understanding how they work underneath makes you a better developer. When something breaks—and it will—you’ll know where to look. You’ll know why a delete operation caused a jump, or why a cursor appeared at the wrong position.
The engineering behind real-time collaborative editing is a beautiful blend of distributed systems, data structures, and human interface design. It turns a solitary act—writing—into a shared experience. The next time you watch someone edit a document you’re also editing, remember the math and code that silently keeps everything in sync.