Gabriel Anhaia

Posted on May 24

Design a Real-Time Collaboration Backend (OT vs CRDT, Step by Step)

#systemdesign #interview #realtime #distributedsystems

Book: System Design Pocket Guide: Interviews — 15 Real System Designs, Step by Step
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

"Design Google Docs" is the prompt that exposes whether a candidate has actually thought about real-time collaboration, or whether they've memorized a list of buzzwords. The trap is that OT and CRDT are not interchangeable. Pick the wrong one and your sync engine fights itself for the rest of its life.

Most candidates pick CRDT because it sounds fancier. Most production systems that ship still use OT. Both are right answers depending on the question, and the interviewer is watching to see if you know which question is being asked.

The interview-grade definition of "real-time collaboration"

When the interviewer says "real-time collaboration," they mean four properties at once:

Two or more clients edit the same document. Concurrent edits don't lose each other.
Edits propagate in under ~100ms perceived latency.
Every client converges to the same final state, eventually.
The system survives a client going offline mid-edit and reconnecting later.

That's it. Notice what's missing: nothing about locking, nothing about "the server decides who wins." A collaboration system that locks rows is a CRUD app with extra steps. The hard part is that two users typed at the same position at the same time and both edits have to land.

The two families of algorithms that solve this are Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDT). They solve the same problem with opposite philosophies.

OT: the Google Docs lineage

Operational Transformation came out of academic CSCW research in the late 1980s. Jupiter and the GROVE editor refined it through the 1990s. Google Docs picked it up around 2010 and made it the household name.

The core idea: every edit is an op (operation) with a position. When two ops happen concurrently, a central server transforms one against the other so they apply in a consistent order on every client.

A minimal OT op looks like:

type Op =
  | { type: "insert"; pos: number; char: string }
  | { type: "delete"; pos: number; len: number };

// Client A sees: "hello"
// A inserts "X" at pos 2 → "heXllo"
// Concurrently, B deletes 1 char at pos 0 → "ello"
// Server receives both. It transforms A's op against B's:
//   A.pos -= B.len  →  insert "X" at pos 1
// Final state on every client: "eXllo"

The transform function is the heart of the algorithm. For a real editor you have transforms for every pair: insert × insert, insert × delete, delete × delete, plus formatting ops, attribute changes, and embedded objects. Get one transform wrong and two users with the same op history end up with different text. Debugging that bug in production is the kind of thing that turns engineers into woodworkers.

What OT requires: a central, authoritative server. Every client sends ops to the server, the server orders them, transforms them, and broadcasts the transformed versions back. You can't run OT peer-to-peer without electing one peer as the server, which defeats the point.

What you get in return: small ops on the wire (a single insert is a handful of bytes), simple client-side state (just the current document plus a small pending-op queue), and a decades-deep body of academic work that explains every edge case.

OT in the wild: Google Docs, Google Slides, ShareDB, Apache Wave (RIP), CodeMirror's @codemirror/collab package.

CRDT: the Figma / Linear / Notion direction

A CRDT replaces "transform ops on a central server" with "ops carry enough metadata to be commutative and idempotent." If every replica applies every op exactly once, in any order, the replicas converge. No transform needed. No central server needed.

For text, the most common CRDT family is RGA-style (Replicated Growable Array) or YATA, used by Yjs. Every character gets a unique ID (usually (clientID, clock)) and a parent pointer to the character it was inserted after. Insertion order is reconstructed from the IDs, not from positions.

// Conceptual shape of a Yjs-style insert.
// Every character carries its own identity.
type YItem = {
  id: { client: number; clock: number };  // unique per character
  origin: ItemID | null;                   // the char it was inserted after
  rightOrigin: ItemID | null;              // tie-breaker for concurrent inserts
  content: string;
  deleted: boolean;                        // tombstone, never removed eagerly
};

The wire format and tombstones are bigger than OT ops. But the math works out: any two replicas that have seen the same set of ops are byte-identical. No transform table. No central order.

Yjs is the de-facto library for the JavaScript world. Automerge is the Rust/JS sibling with a JSON-document-shaped API. Both are production-grade.

A side-by-side of the same edit pattern looks like this:

// Yjs: insert "X" at position 2 in a shared text doc.
import * as Y from "yjs";

const doc = new Y.Doc();
const ytext = doc.getText("body");
ytext.insert(0, "hello");
ytext.insert(2, "X");                     // "heXllo"

// The encoded update is what you send on the wire.
// Doesn't matter if 7 peers receive these in different orders.
const update = Y.encodeStateAsUpdate(doc);

// Automerge: same insert, JSON-document API.
import * as A from "@automerge/automerge";

let doc = A.from({ body: "hello" });
doc = A.change(doc, (d) => {
  d.body = d.body.slice(0, 2) + "X" + d.body.slice(2);   // "heXllo"
});

const changes = A.getChanges(A.init(), doc);
// Ship `changes` to any peer; merge order doesn't matter.

The trade: bigger payloads, heavier client memory, but you can ship a peer-to-peer editor over WebRTC, support offline-first natively, and never write a transform function in your life.

CRDT in the wild: Figma (custom CRDT, not Yjs), Linear (custom, ops + snapshots), Notion (CRDT-ish for blocks), Apple Notes sync, Roam Research clones, AppFlowy, the Y-collab ecosystem (Tldraw, Hocuspocus, Liveblocks, ProseMirror with y-prosemirror).

Choosing: three questions

Forget "which is better." The choice falls out of three questions:

1. Server-authoritative or peer-to-peer? If the server has to enforce permissions, redact content, or be the legal source of truth (medical records, financial documents, regulated content), OT fits the mental model. The server already orders everything. With CRDT you can still enforce permissions, but the server is one peer among many and the model leaks.

2. Offline-first? If users edit on a plane for six hours and expect their work to merge back in cleanly, CRDT wins. OT's offline story exists, but the transform queues get hairy after a long disconnect.

3. What's your operational complexity budget? OT means you own the transform function. Every new op type is a quadratic explosion of transform pairs you have to test. CRDT means you adopt Yjs or Automerge and accept their wire format, their memory shape, their idiosyncrasies. Neither is free.

A rough heuristic: if the answer to "where does the document live?" is "in our database, and the client is just a view," OT. If the answer is "on every device, and the server is just one replica," CRDT.

Server architecture: what you actually deploy

Either algorithm, the deployment shape is similar. Three components.

                ┌───────────────────────┐
   clients ──── │  WebSocket gateway    │  (sticky to doc shard)
                └──────────┬────────────┘
                           │
                ┌──────────▼────────────┐
                │  Doc shard (per doc)  │  ── in-memory authoritative state
                │  - applies ops        │     - OT: transform & broadcast
                │  - keeps clients map  │     - CRDT: merge & broadcast
                └──────────┬────────────┘
                           │
              ┌────────────┼────────────┐
              ▼            ▼            ▼
         ops log    snapshots     presence service
         (append)  (periodic)     (Redis pub/sub)

WebSocket gateway. Terminates client connections. Routes by doc_id to the right shard. Sticky session: once a client lands on a shard, all its messages go there. Use consistent hashing on doc_id so adding shards doesn't reshuffle everything.

Doc shard. One process (or actor, or goroutine, or Erlang process) per active document. Holds the current state in memory, applies incoming ops, broadcasts to every connected client for that doc. Dies when nobody's connected and the shard manager evicts it. Cold-start the next time someone opens the doc by replaying the ops log over the last snapshot.

Presence service. Who's online, where their cursor is, what color they get. This is ephemeral and high-frequency (cursor moves at 60Hz). Don't put it through the same path as document ops, or it'll drown the ops log. Redis pub/sub or a separate WebSocket channel.

For OT, the doc shard also holds the transform queue: ops the server has accepted but not yet acknowledged to the originating client. For CRDT, the shard is mostly a relay. It merges incoming ops into its replica and broadcasts to other peers, and the merge math handles the rest.

Persistence: ops log + snapshots

This is the part that's the same for both families, and the part that interviewers almost always want you to draw on the whiteboard.

You never want to write the full document to disk on every keystroke. Even a moderate-size doc is 50KB and you'd be doing 50KB writes 30 times a second per active editor. So:

Append-only ops log. Every accepted op gets written to a log keyed by (doc_id, op_seq). Postgres, DynamoDB, Cassandra, Kafka: anything that does cheap appends and ordered scans by doc. The log is the source of truth. If everything else burns down, you can rebuild any document from its log.

CREATE TABLE doc_ops (
  doc_id      uuid    NOT NULL,
  op_seq      bigint  NOT NULL,
  client_id   text    NOT NULL,
  op          jsonb   NOT NULL,
  created_at  timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (doc_id, op_seq)
);

-- Snapshots: a materialized state at a known op_seq.
CREATE TABLE doc_snapshots (
  doc_id      uuid    NOT NULL,
  op_seq      bigint  NOT NULL,
  state       bytea   NOT NULL,           -- serialized doc (OT) or Y.encodeStateAsUpdate
  created_at  timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (doc_id, op_seq)
);

Periodic snapshots. Every N ops (N = 1000 is a fine starting point), serialize the current document state and write it as a snapshot. Cold-start = load the latest snapshot + replay the ops since then. Without snapshots, a hot doc with a year of history takes minutes to open.

For Yjs specifically, snapshots are just Y.encodeStateAsUpdate(doc), the encoded merge of every update so far. Restore with Y.applyUpdate(newDoc, snapshot). Automerge has A.save(doc) / A.load(bytes) for the same purpose.

A practical pattern: ship snapshots out-of-band on a background job, not in the hot path. The doc shard accepts an op, writes it to the log, broadcasts to clients, and returns. A separate process tails the log and rolls up snapshots every N ops or every M minutes.

Undo/redo

Both OT and CRDT handle undo, but the semantics differ.

In OT, undo is generating an inverse op (delete the inserted character, re-insert the deleted run) and applying it as a fresh op. The transform machinery does the rest. If someone else edited around your op in the meantime, the inverse gets transformed and lands cleanly.

In CRDT (Yjs in particular), undo is done by an UndoManager that tracks which ops belong to the local user and inverts them on demand. The trick is that the inverse has to respect concurrent edits. If someone replied to your sentence, undoing your sentence doesn't take their reply with it.

Both libraries ship reasonable defaults. The interview answer: "Per-user undo stacks. The inverse is generated against the current state, not the original state, so concurrent edits survive."

Conflict UI: what the user actually sees

True conflicts in text are rarer than you'd think. Two people typing at adjacent positions isn't a conflict, since both insertions land and the order is determined by the algorithm. A real conflict in a rich-text doc shows up at the semantic layer:

Two users edit the same cell of a table.
One user deletes a paragraph another user is editing.
Concurrent format changes (one user bolds a run, another italicizes the overlapping run).

The honest UI answer is: most of the time, you don't show anything. The doc converges, the user reads the result. For the cases that matter, like a deleted block someone was typing into, surface a non-blocking notice ("Someone deleted this section while you were editing. Restore?") and offer one-click revert. Don't put a modal in the user's face.

Figma's approach is the gold standard: their conflict-on-property-change leaves both edits applied in op order, and you can scrub the version history to see what happened.

The 90-second answer

When the interviewer asks "design Google Docs," this is the shape:

"I'd start with the algorithm. OT if the server is authoritative: Google Docs, anything with strict permissions or compliance. CRDT if I need offline-first or peer-to-peer: Figma, Linear, anything where the client is a real replica.

Either way the architecture is the same: WebSocket gateway, sticky-routed by doc ID to a per-document shard process that holds in-memory state. The shard accepts ops, applies them (transforms for OT, merges for CRDT), broadcasts to other connected clients, and appends to a persistent ops log. Periodic snapshots roll the log up so cold-start is fast.

Presence (cursors, selections, who's online) goes through a separate channel. Redis pub/sub. Don't pollute the ops log with cursor moves.

Undo is per-user inverse ops generated against the current state. Conflict UI is mostly invisible, since the algorithm converges. For semantic conflicts (deleted-while-editing), surface a non-blocking notice with a restore option.

The gotcha I'd call out: CRDT state grows. Every tombstone is forever unless you have a garbage-collection story. Yjs and Automerge both ship one. I'd budget for it from day one."

That's 90 seconds. It hits algorithm choice, server shape, persistence, presence, undo, conflict UI, and one operational risk. Interviewer follow-ups will dig into whichever of those you said with the least confidence.

The gotcha: CRDT garbage collection

Here's the bug that hits CRDT systems in production around month 18: the document size on disk keeps growing even when the user perception is that the doc shrank.

CRDTs work by keeping tombstones. When you delete a character, the character object stays in the structure with deleted: true. The tombstone has to exist so that concurrent edits referencing that character (an insert "after this character") have something to anchor to. If you eagerly remove the character, a delayed peer with an old reference can't merge.

So your "empty" doc with one paragraph of text is actually carrying every character that's ever been typed, deleted, and re-typed. A doc that's been edited heavily for a year is comically larger than the visible text.

Yjs ships a garbage collector that runs when you call Y.encodeStateAsUpdate(doc): it collapses tombstones whose causal context is no longer reachable. Automerge has A.save which produces a compacted form. Both work, both have caveats:

GC requires that all peers are caught up past the GC point. A peer that's been offline for two weeks and tries to merge against a GC'd state will fail or produce a corrupted merge. The library raises this, but you have to handle it, usually by forcing the lagging peer to re-fetch a fresh snapshot instead of merging old updates.
GC is expensive on big docs. Don't run it in the request path. Run it in a background job, ideally during low-traffic windows.
The compacted form is not byte-identical across runs of the GC. Don't use document bytes as a cache key after compaction.

In Yjs the practical setup looks like this:

import * as Y from "yjs";

// On the server, every N ops or every M minutes:
async function compactDoc(docId: string) {
  const doc = await loadDoc(docId);

  // Encode + decode with GC enabled. This is the compaction step.
  const compacted = Y.encodeStateAsUpdate(doc);

  // Write the compacted bytes as the new snapshot.
  await saveSnapshot(docId, compacted);

  // Truncate the ops log up to the snapshot point.
  // Any peer ahead of this point must be told to refetch.
  await truncateOpsLog(docId, doc.store.clientStates);
}

The interview-flavored version of this gotcha: "CRDT state grows because tombstones are forever. You need a GC pass that runs in the background, and a story for peers that have fallen behind the GC horizon, typically forcing a fresh snapshot pull instead of an incremental merge."

OT doesn't have this problem because the server can prune ops freely once everyone has acknowledged them. The trade is that OT puts the burden on the transform function instead.

If this was useful

If you're prepping for system design interviews and want this same step-by-step treatment for the other 14 designs interviewers actually ask (sharded counters, payment ledgers, rate limiters, news feeds), that's the spine of my System Design Pocket Guide: Interviews — 15 Real System Designs, Step by Step. The collaboration chapter goes deeper on the OT transform table and walks through a Yjs server you can actually run.

Which side do you fall on for your own product: OT because the server is the source of truth, or CRDT because the client has to keep working when the network doesn't?