It all began over a year ago. I hadn't tackled any complex projects for a while. I was just writing small snippets and scripts here and there. My brain needed a jolt, so to speak, or I felt like I was going to fall behind. I was also seriously thinking about transitioning into a full-time dev role, so I decided to build a complete project from scratch to sharpen my skills. I wanted it to be challenging and have some connection to my daily life. Since I take notes every day, why not make a notes app? How hard could it be? "We do this not because it's easy, but because we thought it would be easy." There was so much I had to learn, and that motivated me to dive in.
From the start, I had a few key goals: it needed to be reliable across devices, work offline, and support full undo/redo history. Building a web app made the most sense. But one major obstacle loomed in the back of my mind: how do you resolve conflicts in real-time text editing? I didn't want to use existing libraries like ShareDB, Etherpad, Yjs, or Automerge, since that would've undermined my learning process.
I knew that simple text replace logic wouldn't work. The solution had to involve sending only the changes users made. Trying to come up with my own algorithm out of thin air would've been a huge time sink and probably wouldn't have worked well anyway. After some digging, I found the Etherpad EasySync Technical Manual in the Etherpad GitHub repo, which explains Operational Transformation (OT) in detail. Reading it filled in all the gaps I needed to get the project going. Just to be clear, I didn't look at Etherpad's codebase since I wanted to understand and solve the problem on my own.
This article focuses on the challenges I faced while designing and implementing OT from scratch and explains a bit how OT works in practice.
What is OT? Just shifting indices, right?
Operational Transformation (OT) on its own sounds simple: two users edit a shared document, and you just adjust indices so their changes don't overwrite each other. Easy, right?
That's what I initially thought.
I went into this with only the Etherpad Manual in virtual hand. I didn't use pseudocodes nor libraries. It ended up taking me multiple iterations just to get a correct transform function. But the biggest headache was implementing per-user undo/redo without breaking document consistency or user intent. Add offline editing, WebSocket sync, and caret tracking, and you end up dealing with a lot of moving parts. The complexity is less about the algorithms and more about the system around it.
What began as a "just a side project" turned into the most technically demanding thing I've ever built. But I'm glad I didn't cut corners as it gave me a deep understanding of how collaborative systems really work.
Let's walk through some examples to see OT in action.
Example 1: Handling Concurrent Edits (No Conflict)
When two users make changes at the same time, server must reconcile them, so both end up with the same result.
- Alice and Bob start with
"Hello!"
. - Alice inserts
" world"
before!
. - Bob inserts
" Bob!"
after!
. - Server applies Alice's change first.
- When it processes Bob's change, it transforms it accounting for Alice's change.
- Both users see
"Hello world! Bob!"
.
This sequence is visualized in the diagram below.
Although there was no conflict, Bob's change had to be adjusted because it was made on an outdated document. Both changes originally referenced the same document. Since Alice inserted " world"
before !
, Bob's "Bob!"
needed to be shifted 6 characters to the right to preserve his intent.
Here's how server sees Bob's change:
- Bob's change applies to
"Hello!"
- Alice's change was processed first.
- Bob's change is transformed against Alice's change, shifting position to right by 6 characters.
- Server adds Bob's adjusted change to the list.
By applying this transformation, both Alice and Bob will see the same final document: "Hello world! Bob!"
.
Example 2: Handling Concurrent Edits (Conflict)
Conflict can happen when two users insert text at the same position. Here's a slight variation of the previous example showing this case:
- Alice and Bob start with
"Hello!"
. - Alice inserts
" world"
before!
. - Bob inserts
" cat"
before!
. - Server applies Alice's change first.
- When it processes Bob's change, it transforms it accounting for Alice's change.
- Both users see
"Hello world cat!"
.
This sequence is visualized in the diagram below.
You might be wondering why the result is "Hello world cat!"
instead of "Hello cat world!"
. Both answers are valid and neither reflects the user's intent more accurately. When implementing OT, you must choose a bias for handling simultaneous inserts at the same position. In this case, the insertion that the server processes first is placed to the left. That's called a left bias.
Why Implement OT from Scratch? Challenges and Lessons Learned
I didn't build OT from scratch because I had to. An app about taking notes on its own didn't seem in-depth enough, so learning the complexities of real-time collaboration seemed like a fitting challenge. Using a library was out of the question as it felt too much of like a black box. Instead, I accepted the challenge, not fully aware what I was getting myself into, and chose some constraints to keep the scope manageable:
- Plain text only (no rich formatting - no bold or different font sizes)
- Centralized server (avoiding decentralized complexity)
Lexicographical Bias Was a Mistake
One early misstep was choosing lexicographical bias for simultaneous inserts. It seemed like a good idea at the time as characters have a well-defined order. But it introduced subtle, hard-to-debug inconsistencies. For example, history implementation would break, as operations wouldn't always apply in a predictable order. To add cherry on top, later I found out that some system implementations might handle character ordering differently when dealing with cases such as accented letters or locale specific rules. It was a painful lesson on how badly made critical design decisions can really trip you up. So I went with a deterministic option: left bias.
Mutable State Is a Pain
State management turned out to be another constant source of bugs. My first implementation used mutable state. It made debugging a nightmare. Corrupted state required slogging through so many logs just to find the cause of errors. Switching to immutable state with immer.js made a huge impact. Bugs became easier to isolate, and system behaviour became predictable. While immutability can raise performance concerns, it's a trade-off worth making. If performance ever becomes an issue, you can always opt-out incrementally once you have robust well-tested logic.
Minimal API
The interface exposing OT functionality was badly designed. There were multiple ways to interact with OT which led to inconsistent use, hidden bugs, and it was hard to maintain. I accidentally exposed functionality through internal objects. After several iterations, and with better insight into the system, I slimmed down the API to only essentials, which made the system more maintainable and codebase easier to understand.
Undo/Redo When You Collaborate
Local undo/redo is straightforward: push operations onto a stack and pop them when needed - it's all static and linear. But in a collaborative environment it's not trivial. You must transform those operations against concurrent external changes. There's also a more complex case of restoring older operations from server. This required me to experiment with different history stack designs, which was a process full of trial, error, and careful thought.
Rewriting the Whole Thing
In fact, with all these issues combined, I ended up rewriting the entire OT implementation. This time with an immutable state and a simplified interface. This rewrite turned a fragile prototype into a robust and maintainable system.
Ultimately, building OT from scratch was an intense learning journey that gave me invaluable insight into collaborative editing systems. If you're considering implementing your own OT library, be prepared for a steep learning curve and at the very least please use immutable state. Just a reminder that existing libraries can spare you a lot of pain. For me, doing it from scratch became one of the foundations of my project and is the best showcase of my engineering skills.
Engineering Deep Dive: Composition, Transform, Undo
This section walks through the core building blocks of OT: composing changes, transforming concurrent changes, and handling undo. It's not a replication of the Etherpad Manual, but a practical look at how I understood and applied these concepts.
Composition
Composition applies one change to the other and is denoted A ⋅ B
, meaning B
is applied/composed on A
.
Example: Simple Composition
Let's look at an illustrative example where two changes are composed.
-
A
= insert"Hello"
-
B
= retain"Hello"
, insert" world!"
-
A ⋅ B
= insert"Hello world!"
This is visualized in the diagram below where text is split into characters, contained in boxes. Arrows are decorative that point to the source of the character.
Change A
inserts "Hello"
, and change B
retains all from A
and inserts " world!"
. Their composition results in "Hello world!"
.
Transform Algorithm
The transform function is the core of OT. Given two concurrent changes on the same document, transform adjusts them so they can be applied in any order while still producing the same result. In practice it means shifting indices and replacing inserts with retained characters.
In the Etherpad Manual, and in my code, the function to adjust operations is referred to as follow(A,B)
or f(A,B)
. In this article (and for clarity), I'm using the term transform instead and denoting it with t(A,B)
.
The algorithm is based on these three rules:
- R1. Insertions in A become retained characters in t(A, B)
- R2. Insertions in B become insertions in t(A, B)
- R3. Retain whatever characters are retained in both A and B
— Slightly adjusted. Etherpad and EasySync Technical Manual, AppJet, Inc., with modifications by the Etherpad Foundation
Example: Transform in Action
Let's see these rules used with a complete scenario:
- Base document
T = "is long cute tail"
. - Bob's change
X
replaces"is long "
with"big "
→T ⋅ X = "big cute tail"
. - Alice's change
Y
replaces"is "
with"has "
, and replaces"cute "
with"round "
→T ⋅ Y = "has long round tail"
.
These changes are reflected in the diagram below. Document T
is the starting point. Bob's change X
move upwards, and Alice's change Y
move downwards.
-
t(X, Y)
is Bob's change adjusted by Alice's: insert"has "
, retain"big "
, insert"round "
, delete"cute "
, and retain"tail"
. -
t(Y, X)
is Alice's change adjusted by Bob's: retain"has "
, insert"big "
, and retain"round tail"
.
In the end Alice and Bob will both see "has big round tail"
on their screen. It includes everything inserted by either user and excludes everything either of them deleted.
One subtle detail: server treats newly received insertions as atomic, meaning they can't be split up by other users. For example, when Alice replaced "is"
with "has"
. Bob couldn't have injected "rnes"
in the middle of "has"
to get "harness"
. Instead, it would've result in "hasrnes"
. After all, how could Bob insert something inside text he hasn't seen?
Pseudocode: Transform
This pseudocode shows transform function looping one position at a time, but it can be optimized to walk in ranges to reduce time complexity.
// Use changes A and B with an ordering bias isAfirst,
// resulting in a composable change on A that reflects B's intention
Function transform(A, B, isAfirst)
Initialize an empty list called result
// Walk through both changes one position at a time
For each triple (opA, posA, opB) in walk(A, B) Do
If opA.type is 'retain' and opB.type is 'retain' Then
// R3. Retain since both changes retain
Add posA to result
Else If opA.type is 'insert' and opB.type is 'insert' Then
If isAfirst Then
// R1. Insertions in A become retained characters
Add posA to result
// R2. Insertions in B stay insertions
Add opB to result
Else
// R2. Insertions in B stay insertions
Add opB to result
// R1. Insertions in A become retained characters
Add posA to result
End If
Else If opA.type is 'insert' Then
// R1. Insertions in A become retained characters
Add posA to result
Else If opB.type is 'insert' Then
// R2. Insertions in B stay insertions
Add opB to result
End If
End For
Return result
End Function
Undo Amidst External Changes
Undo seems trivial until you factor in external changes made by other users. You must transform an undo operation against external changes to avoid messing up the document.
Example: Undo After External Changes
Let's walk through an example where Bob undoes his latest change after Alice has sent an external change.
- Start with an empty document.
- Bob types
"abc"
→ Bob's undo stack: [insert "a"
,insert "b"
,insert "c"
]. - Alice inserts
"hi"
at the beginning → Document:"hiabc"
. - Bob presses undo. His inverse of
insert "c"
isdelete at 2
, but if applied directly it would result in"hibc"
(deleting the wrong char). - Instead, we transform
delete at 2
by shifting it right by 2 →delete at 4
→ correctly deleting"c"
, resulting"hiab"
.
Here's a diagram showing how Bob's state is modified when undo is pressed.
Every undo operation can have its own list of external changes. But in practice those changes cascade downward in the stack as undo operations are popped. That way undo is adjusted only on demand.
Undo/Redo Symmetry
Redo and undo are nearly symmetric, with one key difference: no-ops. An operation becomes a no-op if it has no visible effect (e.g., deleting already-removed text).
- No-op undo can still populate the redo stack (when redo is still meaningful).
- No-op redo is always discarded.
It can be useful for restoring text deleted by another user. For example:
- Alice inserts
"abc"
- Bob deletes
"c"
(external change) - Alice now sees
"ab"
- Alice presses undo and sees
"a"
- Alice presses redo and sees
"ab"
- Alice presses redo again and sees
"abc"
(Bob's deletion of"c"
is restored)
Pseudocode: Undo
// Undo the last operation by applying its transformed inverse,
// and update the next undo context with external transformations
Procedure undo()
Remove the last operation from the undo stack and assign it to op
Peek at the next operation on the undo stack and assign it to nextOp
Set applyOp to inverse(op)
For each external in op.externalChanges Do
// Transform external using applyOp and store it for next undo
Add transform(external, applyOp) to nextOp.externalChanges
// Update applyOp to reflect external change
Set applyOp to transform(applyOp, external)
End For
Push applyOp onto the redo stack
Apply applyOp to the document
End Procedure
Closing Thoughts
Implementing OT from scratch turned out to be far more challenging than I initially expected. There were times when I questioned my life choices, but ultimately, it was incredibly rewarding and helped me grow. Sure, existing libraries would've spared me a lot of pain, but building it myself gave me a much deeper understanding of conflict resolution and real-time collaboration.
This project isn't perfect. There are still rough edges and features I might add later. If you're curious, check out the live demo or explore the GitHub repo.
Top comments (0)