Introduction
Picture two doctors updating the same patient record at the same time - one in São Paulo, the other in London. Both are offline. When connectivity returns, whose changes prevail?
This is not a hypothetical. It is the everyday reality of distributed systems: multiple nodes, no shared clock, no guaranteed network. The conventional answer has long been locking - one node waits while another writes. But locks are fundamentally incompatible with availability. When the network is unreliable, waiting is not an option.
Conflict-free Replicated Data Types (CRDTs) offer a different path: let both doctors write freely, and design the data structure so that merging their changes is always mathematically sound - no coordination required, no central authority consulted.
What Are CRDTs?
A CRDT is a data structure purpose-built for replicated, eventually consistent distributed systems. Its central guarantee: if two replicas receive the same set of updates - regardless of order or delay - they will converge to the same state.
This property is known as Strong Eventual Consistency (SEC). It was formally defined by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski at INRIA in 2011. SEC occupies a deliberate middle ground: stronger than plain eventual consistency (which offers no convergence guarantees) and less expensive than strong consistency (which requires every read to reflect the latest write).
CRDTs achieve SEC through algebraic structure. Every merge operation must satisfy three properties:
-
Commutativity -
merge(A, B) = merge(B, A)- the order in which updates arrive does not affect the outcome -
Associativity -
merge(A, merge(B, C)) = merge(merge(A, B), C)- how updates are grouped does not affect the outcome -
Idempotency -
merge(A, A) = A- applying the same update more than once produces no additional effect
When a data structure satisfies all three, replicas can diverge freely across any number of nodes and are guaranteed to reconverge - without any synchronization protocol.
Note: These three properties apply strictly to state-based CRDTs. Operation-based CRDTs relax the idempotency requirement but instead rely on the network to deliver each operation exactly once and in causal order.
Two Primary Families
State-based CRDTs (CvRDTs) work by having nodes periodically exchange their full local state. The merge function computes the least upper bound of the two states - think of a value that can only grow along a defined lattice. The approach is straightforward to reason about, though it can become bandwidth-intensive when state is large.
Operation-based CRDTs (CmRDTs) instead broadcast individual operations between nodes. This is more network-efficient, but shifts the delivery guarantees onto the transport layer: operations must arrive exactly once, in causal order.
A third variant - delta-state CRDTs, introduced by Almeida, Shoker, and Baquero in 2016 - bridges both approaches. Rather than sending full state or individual operations, nodes exchange only the incremental changes since the last synchronization point.
A Concrete Example: The G-Counter
The grow-only counter (G-Counter) is the canonical introductory CRDT. Each node maintains its own slot within a shared vector:
Node A: [3, 0, 0]
Node B: [0, 5, 0]
Node C: [0, 0, 2]
The global total is the sum of all slots: 10. Merging two vectors applies an element-wise maximum: merge([3,0,0], [0,5,0]) = [3,5,0]. Because the counter can only increase, no conflicts are structurally possible.
To support decrement, two G-Counters are combined - one tracking increments, one tracking decrements - producing a PN-Counter. The net value is the difference between the two.
Real-World Applications
CRDTs moved from academic research into production systems with unusual speed. Several widely deployed systems rely on them today.
Collaborative editors. Yjs and Automerge are open-source CRDT libraries used in applications ranging from Jupyter Notebooks to a broad ecosystem of local-first tools. Yjs implements the YATA algorithm; Automerge uses a JSON CRDT. Both address the notoriously difficult problem of concurrent text insertion - ensuring that characters inserted simultaneously on different nodes never interleave in ways that corrupt the document.
Distributed databases. Riak was among the earliest databases to expose CRDTs - G-Counters, PN-Counters, OR-Sets, and Maps - as first-class types available directly to application developers. Azure Cosmos DB integrates CRDTs into its internal replication engine to support multi-master writes across globally distributed regions. Redis Enterprise builds on CRDT-based data types to enable active-active geo-distributed deployments.
Local-first software. The "local-first software" manifesto, authored by Martin Kleppmann, Adam Wiggins, Peter van Hardenberg, and Mark McGranaghan and published by Ink & Switch in 2019, makes the case that applications should treat the user's own device as the primary store, syncing to the network opportunistically rather than requiring it. CRDTs are the synchronization primitive that makes this architecture viable at scale.
Limitations and Trade-offs
CRDTs solve a real and important problem, but they are not a general-purpose answer to distributed systems challenges. Their constraints matter as much as their guarantees.
Not every operation can be made conflict-free. Some state transitions resist CRDT formulation without careful, non-obvious design work. Moving a node within a tree is a well-known example: when two users concurrently relocate the same node to different parents, the system must resolve the conflict deterministically. Kleppmann published a dedicated algorithm for this case in 2020; it is substantially more involved than a simple counter or set.
Metadata overhead. CRDTs frequently carry significant metadata to track causality - version vectors, and tombstones for deleted elements. An OR-Set, for instance, must retain tombstone records for every element ever removed. Production systems invest considerably in compression strategies, such as run-length encoding and columnar storage layouts, to keep this overhead manageable.
Semantic conflicts remain. CRDTs eliminate structural conflicts - two replicas will always produce the same merged state. They do not eliminate semantic conflicts. If two users concurrently schedule the same meeting at different times, the CRDT converges correctly; the application still must decide which time to present. The data structure handles the mechanics; the business logic remains the developer's responsibility.
Incompatible with ACID guarantees. CRDTs are fundamentally optimistic. They trade atomicity and isolation for availability and partition tolerance. They are well suited to counters, sets, registers, and document structures - and poorly suited to contexts requiring strict transactional integrity, such as financial transfers or inventory management where overselling is unacceptable.
Conclusion
CRDTs represent a mathematically grounded, production-proven answer to one of the most persistent problems in distributed systems design: how can multiple nodes write independently and still be guaranteed to reach agreement?
The approach is conceptually clean: rather than preventing divergence, design data structures whose divergence is always resolvable. The algebraic properties underlying that guarantee - commutativity, associativity, idempotency - are not engineering conventions. They are mathematical facts. That is precisely what makes the guarantee unconditional.
For engineers building systems where availability and offline capability take precedence over strict consistency, CRDTs are no longer a research curiosity. They are a mature, production-grade tool - deployed at scale in distributed databases, collaborative applications, and the growing ecosystem of local-first software.
If your work involves collaborative features, multi-region data replication, or offline-capable applications, CRDTs belong in your architectural decision framework.
References
Shapiro, M., Preguiça, N., Baquero, C., & Zawirski, M. (2011). A Comprehensive Study of Convergent and Commutative Replicated Data Types. INRIA Research Report RR-7506. https://inria.hal.science/inria-00555588/document
Preguiça, N., Baquero, C., & Shapiro, M. (2018). Conflict-free Replicated Data Types (CRDTs). arXiv:1805.06358. https://arxiv.org/abs/1805.06358
Kleppmann, M., & Beresford, A. R. (2017). A Conflict-Free Replicated JSON Datatype. IEEE Transactions on Parallel and Distributed Systems, 28(10). https://arxiv.org/abs/1608.03960
Kleppmann, M., Wiggins, A., van Hardenberg, P., & McGranaghan, M. (2019). Local-first software: You own your data, in spite of the cloud. Onward! 2019. https://www.inkandswitch.com/essay/local-first/
Almeida, P. S., Shoker, A., & Baquero, C. (2016). Delta State Replicated Data Types. arXiv:1603.01529. https://arxiv.org/pdf/1603.01529
Kleppmann, M. (2020). CRDTs: The Hard Parts. Hydra Distributed Computing Conference. https://martin.kleppmann.com/2020/07/06/crdt-hard-parts-hydra.html
Top comments (0)