Archit Agarwal

Posted on May 26 • Originally published at Medium

You're Writing Conflict Resolution Code for Data That Can Never Conflict

#distributedsystems #go #softwareengineering #java

You've built complex sync logic, last-write-wins strategies, and vector clocks to handle merges between distributed nodes. But for append-only data — events, logs, audit trails — none of that conflict resolution was ever necessary. At scale, that accidental complexity quietly drops data, creates race conditions, and pages you at 3 AM. There's a data structure that makes the entire problem disappear.

Here's a scenario that plays out more often than people admit.

You're running a multi-region analytics service. Writes come in from Singapore and Amsterdam simultaneously. Both nodes accept events independently — which is correct, because you don't want a transatlantic network hiccup to stop recording user activity. Then the nodes sync.

Now what? Someone has to decide what the "merged" state looks like. Last-write-wins? You'll silently drop events that lost the timestamp race. Vector clocks? Now you're maintaining causal history for every key. Operational transforms? You've just signed up to maintain a distributed consensus protocol. Each of these approaches is a moving part that can fail, drift, or produce surprising behavior under weird race conditions.

Here's what makes this painful: the data was never actually conflicting. An event recorded in Singapore and an event recorded in Amsterdam are two different facts about the world. There's no conflict to resolve. You invented the complexity by reaching for a data structure that was never the right fit.

For append-only data, there's a data structure that makes the entire problem structurally impossible: the Grow-only Set, or G-Set. It's one of the simplest CRDTs (Conflict-free Replicated Data Types), and once you understand it, you'll recognize the pattern everywhere — and stop reaching for solutions to problems you don't actually have.

Let's pull this apart properly.

The Problem: Two Replicas, One Truth, Zero Agreement

To understand why G-Sets matter, you need to feel the pain they solve first.

Imagine you're running a user analytics service. You have two data center nodes — call them Node A (Singapore) and Node B (Amsterdam). Both are recording user events. Both accept writes independently so that a network partition between Singapore and Amsterdam doesn't bring your whole system down. That's the right call. That's the whole point of distributed systems.

Here's where it gets interesting.

A user in Singapore clicks "Purchase". Node A records the event. A user in Amsterdam completes a form. Node B records that event. So far, so good.

Now Node A and Node B sync up. In a naive system backed by a regular set or a plain key-value store, you'd ask: who wins? If both nodes added different things with the same key, you'd need a conflict resolution strategy. Last-write-wins? Vector clocks? Operational transforms? Each of these is a moving part that can fail, drift, or produce surprising behavior under weird race conditions.

And here's the subtle, nasty part: the problem doesn't show up when you have two nodes. It shows up when you have twenty. It shows up when network partitions happen mid-sync. It shows up when one node is lagging by 40 seconds and another is lagging by 4 minutes. It shows up when you're merging states that went through three intermediate nodes before reaching the final destination.

At small scale, your ad-hoc conflict resolution seems fine. At scale — multiple data centers, hundreds of nodes, millions of events per minute — the edge cases multiply faster than your ability to reason about them. Eventually, you're not building a feature. You're building a distributed consensus protocol from scratch, badly, while also shipping product.

The Fix, in Plain English

What if you just… didn't allow the conflicts to exist in the first place?

That's the core insight behind G-Sets. Instead of building conflict resolution for a general-purpose mutable set, you restrict the operation set to only what can never conflict: addition only.
Think about it like a physical toy box with a one-way slot — like a ballot box. You can drop things in. You cannot take things out. The slot only goes one direction.

Now imagine two kids (two nodes) each have their own toy box. They both drop toys in independently, all day long, without talking to each other. At the end of the day, you want one combined box with everything both kids put in.

The merge is trivial: you just take everything from both boxes. There is no conflict because there was never a competing operation. Kid A adding a rubber duck doesn't conflict with Kid B adding a LEGO set. Addition + Addition = Union. Always. No exceptions.

That's a G-Set.

The formal guarantee is this: at any point in time, on any replica, the set of elements only grows. It never shrinks. When two replicas merge, they compute the union of their states. The union of two sets is commutative (order doesn't matter), associative (grouping doesn't matter), and idempotent (merging the same thing twice gives the same result). Those three properties together mean you can sync replicas in any order, any number of times, across any topology, and you always converge to the same state.

No locks. No coordination. No conflict resolution. No 3 AM pages.

When You'd Actually Use This

G-Sets shine everywhere you naturally think "append-only". Here are five real-world patterns where this isn't just theoretical — it's the right tool:

User event logs in analytics. Every click, login, page view, or purchase is an event that happened. Events don't un-happen. You record them, and they stay forever. Each node adds events to its local G-Set replica. When replicas sync, the union is the full history. No event is lost, no event is double-counted from merging.
Training data pipelines in ML. Different ingestion nodes might be collecting labeled training samples from different regions. Once a sample is added to the corpus, it stays. The "global training set" is just the union of everything every node has seen. Merging two replicas is just "take all samples from both" — which is exactly what you want.
Config version history. A service that tracks every version of a configuration key — so you can replay history, debug what changed, or roll back — needs to keep all version metadata permanently. You only ever insert version entries; you never delete them. A G-Set gives you that audit trail with automatic merge semantics across replicas.
Social graph: follow edges. In a "follow" relationship model (think Twitter/X, GitHub stars), you add edges to a graph: "user A followed user B". If your system doesn't support unfollow (or handles unfollow via a different mechanism), the follow-graph itself is a G-Set. New follows can be added from any replica without coordination, and they all merge cleanly.
Permission grant audit trail. Every time a user is granted a permission — "can read resource X", "can deploy to environment Y" — that grant is recorded permanently. You don't delete grants; you expire them or supersede them via a newer grant. The history of all grants ever issued is a G-Set across your auth service replicas.

Notice the pattern: G-Sets are perfect wherever the business semantics naturally say "this thing happened and we never want to forget it."

The Algorithm

The G-Set is deliberately minimal. Three operations, all of them simple:

State: A set S of elements.

Add(element): Insert the element into the local set. This is always safe. It never conflicts with anything. Complexity: O(1) average.

Lookup(element): Check if an element is in the local set. Returns true if present. Complexity: O(1) average.

Merge(remote_state): Given another replica's set, compute the union of both sets and store it locally. Formally: S_local = S_local ∪ S_remote. Complexity: O(|S_remote|).

The convergence proof is embarrassingly simple:

Any two replicas that have seen the same set of Add operations will have the same state.
Merge is monotonically non-decreasing — the set can only grow, never shrink.
Because merge is a union operation, it's commutative (A ∪ B = B ∪ A), associative ((A ∪ B) ∪ C = A ∪ (B ∪ C)), and idempotent (A ∪ A = A).
Therefore, regardless of the order in which replicas sync, they will all eventually converge to the union of all elements ever added.

This property — where any number of merges in any order reaches the same final state — is called Strong Eventual Consistency (SEC). It's a stricter and more useful guarantee than plain eventual consistency, because it means you don't just eventually agree; you agree immediately once you've seen the same information.

One important constraint to keep in mind: G-Sets have no Remove operation. That's not a bug — it's the feature. The moment you add removal, you open the door to conflicts (what if one replica removes an element that another replica just added?). If you need removal semantics, you'd graduate to a 2P-Set (Two-Phase Set) or an OR-Set (Observed-Remove Set), which are more complex CRDTs. But if your use case is naturally append-only, don't reach for the complex tool.

Implementation in Go

package crdt

import "sync"

// GSet is a Grow-only Set — elements can be added but never removed.
// Safe for concurrent use via a read-write mutex.
type GSet[T comparable] struct {
    mu   sync.RWMutex
    data map[T]struct{}
}

func NewGSet[T comparable]() *GSet[T] {
    return &GSet[T]{
        data: make(map[T]struct{}),
    }
}

// Add inserts an element into the set.
// Always safe: no conflict is possible with a pure add operation.
func (s *GSet[T]) Add(item T) {
    s.mu.Lock()
    defer s.mu.Unlock()
    s.data[item] = struct{}{}
}

// Contains returns true if the element is present in the set.
func (s *GSet[T]) Contains(item T) bool {
    s.mu.RLock()
    defer s.mu.RUnlock()
    _, ok := s.data[item]
    return ok
}

// Items returns a snapshot of all elements currently in the set.
func (s *GSet[T]) Items() []T {
    s.mu.RLock()
    defer s.mu.RUnlock()
    items := make([]T, 0, len(s.data))
    for item := range s.data {
        items = append(items, item)
    }
    return items
}

// Merge computes the union of this set and another replica's set.
// Commutative, associative, idempotent — safe to call in any order, any number of times.
func (s *GSet[T]) Merge(other *GSet[T]) {
    other.mu.RLock()
    snapshot := make([]T, 0, len(other.data))
    for item := range other.data {
        snapshot = append(snapshot, item)
    }
    other.mu.RUnlock()

    s.mu.Lock()
    defer s.mu.Unlock()
    for _, item := range snapshot {
        s.data[item] = struct{}{}
    }
}

// Example: two analytics nodes recording events independently, then syncing.
func ExampleTwoReplicas() {
    // Node A: Singapore data center
    nodeA := NewGSet[string]()
    nodeA.Add("event:user_123:purchase")
    nodeA.Add("event:user_456:login")

    // Node B: Amsterdam data center
    nodeB := NewGSet[string]()
    nodeB.Add("event:user_789:pageview")
    nodeB.Add("event:user_456:login") // same event recorded on both nodes

    // Sync: merge B into A
    nodeA.Merge(nodeB)

    // nodeA now contains all three unique events.
    // The duplicate "event:user_456:login" is handled automatically — idempotency in action.
    // Result: {"event:user_123:purchase", "event:user_456:login", "event:user_789:pageview"}
    fmt.Println(nodeA.Items())
}

A few things worth pointing out in this implementation:

The mutex strategy matters here. We take a snapshot of other.data under other's read lock first, then release it before acquiring s's write lock. This avoids a classic deadlock scenario where two goroutines simultaneously try to merge into each other, each waiting for the other's lock. Snapshot-then-write is the safe pattern.

The generic constraint [T comparable] is load-bearing — Go's map keys must be comparable, so string, int, UUID types all work. Custom structs work too as long as all fields are comparable.

Implementation in Java

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * GSet — a Grow-only Set CRDT.
 *
 * Elements can only be added, never removed.
 * Merge is a union operation: commutative, associative, and idempotent.
 * Thread-safe via a ReentrantReadWriteLock.
 *
 * @param <T> the type of elements; must implement equals() and hashCode() correctly.
 */
public class GSet<T> {

    private final Set<T> data = new HashSet<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /**
     * Add an element to the set.
     * Always safe — a pure add can never conflict with any other operation.
     */
    public void add(T item) {
        lock.writeLock().lock();
        try {
            data.add(item);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /**
     * Check if an element is present in the set.
     */
    public boolean contains(T item) {
        lock.readLock().lock();
        try {
            return data.contains(item);
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Returns an immutable snapshot of all elements currently in the set.
     */
    public Set<T> items() {
        lock.readLock().lock();
        try {
            return Collections.unmodifiableSet(new HashSet<>(data));
        } finally {
            lock.readLock().unlock();
        }
    }

    /**
     * Merge another replica's state into this set (union operation).
     *
     * Safe to call in any order, any number of times.
     * A snapshot of the other set is taken under the other set's read lock
     * before acquiring this set's write lock — prevents deadlock on mutual merge.
     */
    public void merge(GSet<T> other) {
        // Snapshot first, under other's read lock
        Set<T> snapshot = other.items();

        lock.writeLock().lock();
        try {
            data.addAll(snapshot);
        } finally {
            lock.writeLock().unlock();
        }
    }
}

// Example: simulating two service nodes syncing their event logs
public class GSetsExample {
    public static void main(String[] args) {
        // Node A: US-East
        GSet<String> nodeA = new GSet<>();
        nodeA.add("grant:user_101:read:resource_A");
        nodeA.add("grant:user_202:write:resource_B");

        // Node B: EU-West
        GSet<String> nodeB = new GSet<>();
        nodeB.add("grant:user_303:read:resource_C");
        nodeB.add("grant:user_202:write:resource_B"); // same grant, recorded independently

        // Sync: merge B into A
        nodeA.merge(nodeB);

        // nodeA now contains all three unique grants.
        // Duplicate handled automatically via HashSet semantics.
        nodeA.items().forEach(System.out::println);
        // Output (order may vary):
        // grant:user_101:read:resource_A
        // grant:user_202:write:resource_B
        // grant:user_303:read:resource_C
    }
}

The Java version's items() returning an unmodifiable snapshot does double duty: it protects the internal state from external mutation and serves as the snapshot mechanism that merge() relies on for the deadlock-safe two-phase lock acquisition.

One Java-specific note: make sure your element type T has a correct equals() and hashCode() implementation. HashSet.add() uses these to determine uniqueness. If you're storing custom objects (like a UserEvent record), and you forget to override these, you'll end up with duplicate "equal" events in your set — which defeats the idempotency guarantee. Java records (record UserEvent(String userId, String type)) handle this correctly by default.

The Same Data Structure, Everywhere You Look

Once you internalize G-Sets, you start recognizing the pattern in places you never expected. The question to ask is simple: "Does this data only ever grow? Is removing something not a real operation, just a display concern?" If yes, you have a G-Set waiting to happen.

IoT sensor readings. A fleet of temperature sensors across a warehouse each record readings independently. Readings never get un-taken. The "global history of all readings" is the union of every sensor's local set. G-Set handles the merge across all nodes with zero coordination.

Distributed tag systems. Content tagging in a CMS or knowledge base: articles accumulate tags across different editor nodes. Tags are added freely; "removing" a tag is typically a business-level soft delete handled separately. The growth of tags across replicas is a pure G-Set problem.

Service discovery registries. When microservices register themselves with a discovery service, each new registration is an append. The set of "services that have ever been registered" is a G-Set. Deregistration is a separate concern (and a different CRDT problem).

Feature flag rollout tracking. Which user IDs have had a feature enabled for them — for audit or rollback purposes — is naturally append-only. You enable features for more users over time, and you want every region's view of "who saw this feature" to converge to the same answer.

Distributed deduplication registries. Idempotency keys for payment systems or message queues. Once a key is "seen", it's in the set forever. Multiple nodes can independently record the same key without conflict — the G-Set's idempotency handles duplicates automatically.

Blockchain-style ledgers. Transaction IDs in a distributed ledger. New transactions are appended; nothing is ever removed. Each participating node maintains a local G-Set of confirmed transaction IDs, and merging across nodes gives you the complete global ledger with zero conflict resolution needed.

The unifying mental model: any time your data represents facts that happened, rather than state that changes, you're looking at a G-Set. Events happened. Grants were issued. Sensors fired. Services registered. None of those facts un-happen. Stop treating them like mutable state — and stop writing conflict resolution code for them.

What G-Sets Can’t Do (and That’s Fine)

It would be dishonest to end without stating the obvious limitation: you cannot remove elements. A G-Set grows forever.

This means:

If you need “soft deletes” or element removal, G-Set is the wrong tool. Look at 2P-Sets or OR-Sets.
If your set grows unboundedly and you care about memory, you’ll need a tombstone strategy or a compaction layer on top.
If the “has this element ever been added” semantic doesn’t match your domain (i.e., you need “is this element currently active”), you’re modeling the wrong thing as a G-Set.

But for the patterns where it does fit — event logs, audit trails, append-only graphs, training data — the G-Set gives you something rare in distributed systems: a correctness guarantee that doesn’t require you to think harder. It just works.

And in systems at scale, “it just works” is worth more than almost anything else.

What’s Next

The next piece steps back from the counters entirely.

What if instead of tracking numbers, you need to track membership — items in a set, users in a group, tags on a document? The same concurrent update problems apply. And a surprisingly simple grow-only constraint solves the basic case cleanly.

Episode 5: 2P — 2-Phase sets, where we add the ability to remove elements — and discover why even that simple extension requires careful design.

The full series:

Episode 1: G-Counter — why YouTube’s view count is approximate
Episode 2: PN-Counter — Reddit karma and the two-notebook trick
Episode 3: Bounded Counter — inventory floors and distributed permission
Episode 4: G-Set — grow-only sets (you’re here)
Episode 5: 2P-2 Phase sets
Episode 6: OR-Set — the fix for “deleted but still there”
Episode 7: The full picture — when to use CRDTs and when to walk away

Subscribe to the newsletter to get each episode as it drops.

If you’re building distributed systems and want to think through invariant enforcement, CRDT design, or distributed architecture with someone who has been deep in this space, I do 1:1 sessions on Topmate.

DEV Community