When you hit delete and the data comes right back, the problem isn't a bug. It's a design gap that only shows up at scale.
You've seen this before.
A user gets banned. You delete their entry from the database. The system confirms the deletion. Fifteen minutes later, another server in your cluster still shows them as active. You delete again. It comes back again. You feel like you're in a horror movie.
Or consider a simpler scenario: your team is running a feature-flag system across multiple data centres. Someone deprecates a flag - it needs to die permanently. But your distributed nodes keep merging state, and the deleted flag keeps resurrecting. It's not malicious. It's just physics.
This is one of the most infuriating classes of bugs in distributed systems: deletes that don't stick.
The good news? This has a known cause, a clean mental model, and a data structure purpose-built to solve it - the 2-Phase Set (2P-Set).
Decoding the Problem: Why Deletes Are Hard at Scale
On a single server, deletes are easy. You remove a row. Done. The authoritative truth lives in one place.
The moment you scale horizontally - multiple nodes, multiple replicas, multiple data centres - that single source of truth disappears. Now you have competing sources of truth, and they need to sync up periodically. This sync is called reconciliation or merge.
Here's where things get interesting.
When two nodes merge their state, what happens to a delete? Node A deleted a record. Node B never got that delete message (maybe a network partition, maybe just a delayed sync). Node B still has the record. When A and B merge - Node B's "present" wins. The delete is gone.
This is called the delete wins vs add wins problem, and it's not a theoretical edge case. It's what happens at 3am when your on-call engineer is trying to figure out why a banned user is still sending messages.
The naive fix - timestamps - doesn't help much either. "Most recent write wins" sounds clean until you realise clocks across distributed nodes are never perfectly in sync. You end up with a different class of inconsistency.
The scale-triggered part matters here: on a single node, you never see this. On two nodes with reliable sync, you might not notice. With ten nodes, partial connectivity, and concurrent writes - the problem is unavoidable. This isn't a bug that appears in development. It's a bug that appears in production, under load, after your third customer complaint.
How to Fix It: The Two-Toybox Mental Model
Before we talk algorithms, let me give you a mental model that will make everything else obvious.
Imagine you're a kid with two toyboxes.
The first one is the Add-box. Any toy you want in your collection goes here. You can keep adding things indefinitely.
The second one is the Remove-box. When you're done with a toy - permanently done - you move it here. Not back to the shelf. Not to a friend. Into the Remove-box, and it stays there forever.
The toys you actually "have" at any given moment are: everything in the Add-box that is NOT in the Remove-box.
That's it. That's the 2P-Set.
The magic of this model is what happens when two kids merge their toyboxes. Alice has an Add-box and a Remove-box. Bob has an Add-box and a Remove-box. When they combine:
- Combined Add-box = Alice's Add-box ∪ Bob's Add-box
- Combined Remove-box = Alice's Remove-box ∪ Bob's Remove-box
- Final collection = Combined Add-box − Combined Remove-box
Notice something critical: once a toy lands in any Remove-box across any node, that removal propagates everywhere during the next merge - and it can never be undone. The remove is permanent and irrevocable.
This is why 2P-Set works for the banned-user problem. You don't just delete the user. You move them into the Remove-set. That Remove-set is a grow-only set (just like the G-Set from Episode 4). It only ever gets bigger, never smaller. So when nodes merge, removals always survive.
Real-world rule of thumb: Use a 2P-Set when you have a distributed collection where deletions must be permanent and must survive node merges.
Where 2P-Set Fits Perfectly
Permanent bans and blacklists. Add users to the Allow-set on signup. Move them to the Ban-set on moderation action. Any node that gets the update will propagate the ban everywhere. No re-emergence after sync.
Feature flags with permanent deprecation. Your config system adds feature keys freely. When a flag is killed, it goes into the removed-set. Even if a stale node briefly re-surfaces it, the next merge corrects course.
Service registry tombstones. A microservice is decommissioned. You don't just unregister it - you add it to the removed-set. Any service-discovery node that tries to re-register the old name gets blocked, because the tombstone is present.
Access revocation. A user loses a permission. That revocation needs to hold across all nodes, even if one node was partitioned when the revocation happened. The Remove-set is how you make that hold.
The Trade-off You Need to Know About
2P-Set has one hard constraint that you shouldn't try to work around: once removed, the element is gone forever. You cannot re-add a removed element.
This sounds limiting, but it's actually the source of the guarantee. If you could re-add something after removal, you'd re-introduce the "whose update wins" problem you were trying to escape.
The practical implication: if you need to "re-add" something, model it differently. Don't re-add the same logical identifier - create a new one. A banned user who creates a new account gets a new user ID, not a resurrection of the old one.
This is a clean design constraint that forces your domain model to be explicit about intent.
The Algorithm
The formal structure is straightforward, and once you see it, it clicks immediately.
State: Each node maintains two grow-only sets:
-
A- the Add-set (all elements ever added) -
R- the Remove-set (all elements ever removed)
Operations:
Add(e):
- Precondition:
emust not be inR(can't re-add a removed element) - Effect: Insert
eintoA
Remove(e):
- Precondition:
emust be inA(can only remove what was added) - Effect: Insert
eintoR
Lookup(e):
- Returns
trueife ∈ Aande ∉ R - Equivalently:
e ∈ (A − R)
Value():
- Returns
A − R(all elements in Add-set not in Remove-set)
Merge(local, remote):
merged.A = local.A ∪ remote.A
merged.R = local.R ∪ remote.R
Both A and R are G-Sets under the hood - they only grow. Merging G-Sets via union is always safe, always commutative, always idempotent. Apply the merge in any order, any number of times, you get the same result. That's the CRDT guarantee.
Why this is conflict-free: The only operation that's "destructive" is a remove. But removes go into R, and R only grows. So across any merge, a removal that happened on any node will eventually appear in every node's R-set. The deletion propagates monotonically. There is no conflict to resolve.
Go Implementation
package crdt
import "sync"
// TwoPhaseSet implements a 2P-Set CRDT.
// Once an element is removed, it can never be re-added.
type TwoPhaseSet[T comparable] struct {
mu sync.RWMutex
addSet map[T]struct{}
removeSet map[T]struct{}
}
func NewTwoPhaseSet[T comparable]() *TwoPhaseSet[T] {
return &TwoPhaseSet[T]{
addSet: make(map[T]struct{}),
removeSet: make(map[T]struct{}),
}
}
// Add inserts an element into the set.
// Returns false if the element was previously removed - once removed, it stays removed.
func (s *TwoPhaseSet[T]) Add(item T) bool {
s.mu.Lock()
defer s.mu.Unlock()
if _, removed := s.removeSet[item]; removed {
return false // 2P-Set invariant: removed elements cannot be re-added
}
s.addSet[item] = struct{}{}
return true
}
// Remove moves an element from the add-set to the remove-set.
// The element must have been added first. Returns false otherwise.
func (s *TwoPhaseSet[T]) Remove(item T) bool {
s.mu.Lock()
defer s.mu.Unlock()
if _, exists := s.addSet[item]; !exists {
return false // can only remove what has been added
}
s.removeSet[item] = struct{}{}
return true
}
// Contains returns true if the element is in the add-set but not the remove-set.
func (s *TwoPhaseSet[T]) Contains(item T) bool {
s.mu.RLock()
defer s.mu.RUnlock()
_, inAdd := s.addSet[item]
_, inRemove := s.removeSet[item]
return inAdd && !inRemove
}
// Items returns a snapshot of all live elements (add-set minus remove-set).
func (s *TwoPhaseSet[T]) Items() []T {
s.mu.RLock()
defer s.mu.RUnlock()
items := make([]T, 0, len(s.addSet))
for item := range s.addSet {
if _, removed := s.removeSet[item]; !removed {
items = append(items, item)
}
}
return items
}
// Merge combines another 2P-Set into this one.
// Both add-sets and remove-sets are merged via union - safe, idempotent, commutative.
func (s *TwoPhaseSet[T]) Merge(other *TwoPhaseSet[T]) {
other.mu.RLock()
defer other.mu.RUnlock()
s.mu.Lock()
defer s.mu.Unlock()
for item := range other.addSet {
s.addSet[item] = struct{}{}
}
for item := range other.removeSet {
s.removeSet[item] = struct{}{}
}
}
Walkthrough with the banned-user scenario:
func main() {
nodeA := crdt.NewTwoPhaseSet[string]()
nodeB := crdt.NewTwoPhaseSet[string]()
// Both nodes add the user during normal operation
nodeA.Add("user:42")
nodeB.Add("user:42")
// Node A bans the user - this is the remove
nodeA.Remove("user:42")
// Network partition: Node B hasn't received the ban yet
fmt.Println(nodeB.Contains("user:42")) // true - B doesn't know yet
// Sync happens - B merges A's state
nodeB.Merge(nodeA)
// Now the ban has propagated
fmt.Println(nodeB.Contains("user:42")) // false - ban survived the merge
// Trying to re-add the banned user on Node B fails
fmt.Println(nodeB.Add("user:42")) // false - once removed, gone forever
}
The merge doesn't care about the order of operations or how many times you call it. The ban is permanent.
Java Implementation
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
public class TwoPhaseSet<T> {
private final Set<T> addSet = ConcurrentHashMap.newKeySet();
private final Set<T> removeSet = ConcurrentHashMap.newKeySet();
/**
* Adds an element to the set.
* Returns false if the element was previously removed - 2P-Set invariant.
*/
public boolean add(T item) {
if (removeSet.contains(item)) {
return false; // once removed, cannot be re-added
}
return addSet.add(item);
}
/**
* Removes an element permanently.
* Element must have been added first; returns false otherwise.
*/
public boolean remove(T item) {
if (!addSet.contains(item)) {
return false; // can only remove what was added
}
return removeSet.add(item);
}
/**
* Returns true if the element is in addSet but not in removeSet.
*/
public boolean contains(T item) {
return addSet.contains(item) && !removeSet.contains(item);
}
/**
* Returns a snapshot of all live elements (addSet minus removeSet).
*/
public Set<T> items() {
Set<T> result = new HashSet<>(addSet);
result.removeAll(removeSet);
return Collections.unmodifiableSet(result);
}
/**
* Merges another 2P-Set into this one via union of both add and remove sets.
* Safe to call multiple times - idempotent and commutative.
*/
public void merge(TwoPhaseSet<T> other) {
addSet.addAll(other.addSet);
removeSet.addAll(other.removeSet);
}
}
Walkthrough:
public class Main {
public static void main(String[] args) {
TwoPhaseSet<String> nodeA = new TwoPhaseSet<>();
TwoPhaseSet<String> nodeB = new TwoPhaseSet<>();
// Both nodes add the feature flag during rollout
nodeA.add("flag:new-checkout");
nodeB.add("flag:new-checkout");
// Flag is deprecated - permanently removed on Node A
nodeA.remove("flag:new-checkout");
// Node B is partitioned - it still sees the flag
System.out.println(nodeB.contains("flag:new-checkout")); // true
// Sync: B merges A's state
nodeB.merge(nodeA);
// Deprecation has propagated
System.out.println(nodeB.contains("flag:new-checkout")); // false
// Someone tries to re-add the deprecated flag on Node B
System.out.println(nodeB.add("flag:new-checkout")); // false - tombstone wins
}
}
The Java version uses ConcurrentHashMap.newKeySet() for thread-safe operations without explicit locking - appropriate for read-heavy workloads where you're merging state from background sync threads.
Comparing to What Came Before
You've seen the G-Set in Episode 4 - it only grows, never shrinks. Simple and elegant, but useless if you need to remove anything.
The 2P-Set is a direct evolution: you layer a second G-Set (the Remove-set) on top. The Remove-set is your delete log, and it only ever grows. Deletions become a monotonic write rather than a destructive one. That's the insight.
The One Catch Worth Repeating
The Remove-set never shrinks. In a long-lived system with lots of deletions, this means your Remove-set grows indefinitely. You can't garbage-collect it without coordination - because the whole point is that it outlives any single node.
For most use cases (bans, tombstones, feature deprecations) the Remove-set is small and this is fine. For high-churn systems where elements are added and removed at high frequency, 2P-Set isn't the right tool. That's where the OR-Set comes in - and that's exactly what Episode 6 covers.
What's Next
The next piece steps back from the counters entirely.
What if instead of tracking numbers, you need to track membership - items in a set, users in a group, tags on a document?
The same concurrent update problems apply. And a surprisingly simple grow-only constraint solves the basic case cleanly.
Episode 6: OR sets, solves the re-add problem without breaking conflict-freedom. It does this using a concept called unique tags, and it's the most practically useful set CRDT for general-purpose distributed state.
The full series:
Episode 1: G-Counter - why YouTube's view count is approximate
Episode 2: PN-Counter - Reddit karma and the two-notebook trick
Episode 3: Bounded Counter - inventory floors and distributed permission
Episode 4: G-Set - grow-only sets
Episode 5: 2P-2 Phase sets (you're here)
Episode 6: OR-Set - the fix for "deleted but still there"
Episode 7: The full picture - when to use CRDTs and when to walk away
Subscribe to the newsletter to get each episode as it drops.
If you're building distributed systems and want to think through invariant enforcement, CRDT design, or distributed architecture with someone who has been deep in this space, I do 1:1 sessions on Topmate.

Top comments (0)