Mehmet TURAÇ

Posted on May 28

Architecture of Chaos Part 2 — CAP Is Dead, Long Live PACELC (and CRDTs That Actually Work)

#architecture #database #distributedsystems #systemdesign

This is Part 2 of the Architecture of Chaos series. Start from Part 1 here.

⚠️ Names, companies, and specific details are composite/fictional. Patterns and code are drawn from real production experience.

Chapter 3: The CAP Theorem Is Garbage, PACELC, and the Speed-of-Light Wall

The CTO's Question: "Which Database?"

Second month. CTO Serkan called me to his office:

"Selim, we need to pick a database for the bid-service. Cassandra, CockroachDB, or Spanner? Let's decide based on the CAP theorem. Do we want Consistency or Availability?"

I smiled internally. Because the CAP theorem, in 2026, is too old and too incomplete to answer that question. I didn't tell Serkan that outright (telling your CTO "your theory is garbage" isn't a great career move). Instead, I went to the whiteboard and drew PACELC.

Why CAP Falls Short

CAP theorem (Eric Brewer, 2000) says: A distributed system can provide at most two of three guarantees:

Consistency: Every read returns the latest write
Availability: Every request gets a response
Partition Tolerance: System works despite network splits

Since network partitions are inevitable (P is mandatory), you practically choose C vs A. Simple, elegant, wrong.

Why wrong? Because CAP only tells you what happens during a network partition. In production, network partitions happen a few hours per year (if at all). 99.99% of the time, there's no partition. And CAP says absolutely nothing about system behavior when there's no partition.

PACELC: Real-World Decisions

Abadi's 2010 PACELC framework (pronounced "pass-else-see") extends CAP:

If Partition (P) → choose Availability (A) vs Consistency (C)
Else (E) → choose Latency (L) vs Consistency (C)

The real critical decision is in the second line. When there's no partition, do you want low Latency or strong Consistency?

In our case:

Auctions last 30 minutes
Tokyo–New York RTT is 140ms
Users expect response within 50ms of bidding
Strong Consistency means Tokyo bids wait for Virginia confirmation = 140ms minimum
That means losing ~40% of customers

Decision was clear: No partition → choose Latency (EL). Partition → choose Availability (PA). This puts us in the PA/EL category. Cassandra, DynamoDB, Riak live here.

But choosing PA/EL doesn't mean "we gave up on consistency." It means "we'll achieve consistency asynchronously." Critical distinction.

The Speed-of-Light Wall (Literally)

Let's talk physics. Because most developers think latency is an "improvable engineering problem." It's not.

New York → Tokyo:        ~140ms RTT (fiber, 67% of light speed)
New York → London:       ~70ms RTT
London → Singapore:      ~170ms RTT
Frankfurt → Mumbai:      ~110ms RTT

These numbers come from the speed of light and the refractive index of fiber optic cables. You cannot reduce them with software. Just as you can't change the speed of light.

If you want Strong Consistency (linearizability), every write needs a Paxos/Raft round-trip. That's minimum 1 RTT. Writing from Tokyo to Virginia = 140ms minimum latency.

But we need to stay under 50ms. So what do we do?

Multi-Active Architecture

The old active-passive setup:

┌─────────────────┐         ┌─────────────────┐
│  us-east-1      │         │  eu-west-1      │
│  [MASTER]       │ ──────► │  [REPLICA]      │
│  Write + Read   │  async  │  Read-Only      │
└─────────────────┘         └─────────────────┘

Tokyo users had to go to us-east-1 for writes. 140ms. They could read from eu-west-1, but then no "read-your-writes" consistency.

New architecture — Multi-Active:

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│  us-east-1   │◄─►│  eu-west-1   │◄─►│ ap-northeast │
│  [MASTER]    │   │  [MASTER]    │   │  [MASTER]    │
│  Write+Read  │   │  Write+Read  │   │  Write+Read  │
└──────────────┘   └──────────────┘   └──────────────┘
        ▲                  ▲                  ▲
        └──────────────────┼──────────────────┘
              Async Event Mesh (Kafka/Pulsar)

Every region serves its own users. Writes accepted locally (0ms latency). Events propagate asynchronously to other regions. Consistency achieved "eventually."

But What About Conflicts?

If Tokyo and Virginia write to the same auction simultaneously, two different "truths" emerge. This is a write-write conflict. And in our business rules, this is unacceptable: one auction can't have two different "highest bids."

This is where mathematics came to the rescue. Enter: CRDTs.

Battle Scar #4

Lesson: Saying "we decided based on CAP" is like buying a car and saying "it should have wheels." The real decisions happen in PACELC's second line. And that decision determines your user experience, cost, and operational complexity. It's not a theoretical choice — it's a strategic one.

Why didn't we use CockroachDB or Spanner for "automatic strong consistency"? Simple but brutal: money. Spanner's TrueTime needs GPS receivers and atomic clocks. Google can do this because of 20 years of infrastructure investment. Replicating it would cost us $2-3M/year extra. CockroachDB is cheaper but still adds 1 RTT on multi-region writes. Too much for our 50ms target.

Chapter 4: Mathematics to the Rescue: CRDTs (Conflict-free Replicated Data Types)

Figma's Secret

Fourth month. One evening I was working on mockups in Figma. Six designers in the same file, simultaneously dragging elements around. Zero conflicts, zero "merge conflicts." Everything synced in real-time across every screen.

"How is this possible?" I wondered. Then I opened Figma's engineering blog and found the answer: CRDTs (Conflict-free Replicated Data Types).

CRDTs were introduced in 2011 by Marc Shapiro et al. The core idea: Some data structures can be designed so that merged in any order, at any time, they always converge to the same result. No central coordinator, no locks, no consensus needed.

The Mathematical Guarantees

For a data structure to qualify as a CRDT, its merge function must satisfy three properties:

Commutative: merge(A, B) = merge(B, A)
Associative: merge(A, merge(B, C)) = merge(merge(A, B), C)
Idempotent: merge(A, A) = A

These three properties guarantee the merge can be performed in any order, any number of times, at any moment. Mathematically, this is a join-semilattice.

Our First CRDT: G-Counter (Grow-only Counter)

Many AeroBid metrics are "grow-only": total bid count, total viewer count, total transaction volume. For these, G-Counter is perfect.

Rust implementation for memory safety and performance:

// crdt/gcounter.rs — Production
use std::collections::HashMap;
use std::sync::RwLock;

/// Grow-only Counter: can only be incremented, never decremented.
/// Each node keeps its own counter. Total = sum of all nodes.
pub struct GCounter {
    node_id: String,
    counts: RwLock<HashMap<String, u64>>,
}

impl GCounter {
    pub fn new(node_id: String) -> Self {
        let mut initial = HashMap::new();
        initial.insert(node_id.clone(), 0);
        GCounter { node_id, counts: RwLock::new(initial) }
    }

    pub fn increment(&self, delta: u64) {
        let mut counts = self.counts.write().unwrap();
        *counts.entry(self.node_id.clone()).or_insert(0) += delta;
    }

    pub fn value(&self) -> u64 {
        self.counts.read().unwrap().values().sum()
    }

    pub fn state(&self) -> HashMap<String, u64> {
        self.counts.read().unwrap().clone()
    }

    /// Merge with another node's state: max(local, remote) per node
    /// This is idempotent and commutative
    pub fn merge(&self, other_state: HashMap<String, u64>) {
        let mut counts = self.counts.write().unwrap();
        for (node_id, count) in other_state {
            let entry = counts.entry(node_id).or_insert(0);
            *entry = (*entry).max(count);
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_concurrent_increments() {
        let node_a = GCounter::new("tokyo".to_string());
        let node_b = GCounter::new("virginia".to_string());

        node_a.increment(5);
        node_b.increment(3);

        assert_eq!(node_a.value(), 5);
        assert_eq!(node_b.value(), 3);

        // Merge both directions
        node_a.merge(node_b.state());
        node_b.merge(node_a.state());

        // BOTH SEE THE SAME VALUE: 8
        assert_eq!(node_a.value(), 8);
        assert_eq!(node_b.value(), 8);
    }

    #[test]
    fn test_idempotency() {
        let node = GCounter::new("tokyo".to_string());
        node.increment(5);

        let other = HashMap::from([("virginia".to_string(), 3)]);
        for _ in 0..10 { node.merge(other.clone()); }

        assert_eq!(node.value(), 8);  // Still 8, not 38!
    }
}

Production Use: Live Viewer Counter

AeroBid has a "live viewers" counter for every auction. This counter goes up when someone opens the page and down when they leave. Up to 50,000+ concurrent viewers.

We used to handle this with Redis INCR/DECR. A single Redis instance handled 100K ops/sec but was a single point of failure. Redis Cluster brought cross-shard inconsistencies.

New solution: Each node maintains its own PN-Counter. Every 5 seconds, all nodes publish their state to a Kafka topic. Each node merges others' states.

Result: We shut down the Redis cluster. Each node manages its own state. Even if the network cuts, local counters keep working. When the network returns, all states merge and the final value is correct. Zero data loss, zero locks, zero coordinators.

Why CRDTs Don't Solve Everything

At this point my team was excited. "Let's make everything a CRDT! Wallet balances, auction results, everything!"

I stopped them. Because CRDTs have critical limitations:

Invariants can't be enforced: "Deduct 50 from A, add 50 to B" as an atomic conditional operation is impossible with CRDTs.
Negative balances: If "balance ≥ 0" is an invariant, G-Counter or PN-Counter can't enforce it.
Conditional updates: "If X then do Y" logic doesn't exist in CRDTs.

Can we manage the "highest bid" field with CRDTs? No. Because there's an invariant: "update only if new bid > current bid." That's beyond CRDT's expressive power.

These limitations pushed us toward a CRDT + Event Sourcing hybrid. And that's the next chapter.

Battle Scar #5

Lesson: CRDTs are not a silver bullet. Use them for "user-facing metrics that don't require financial consistency." Viewer counts, like counts, favorite lists, cart contents — CRDTs are perfect. But wallet balances, auction outcomes, escrow state need different tools. Using the right tool in the right place is the essence of engineering.

Coming up next: Chapter 5 covers why CRDTs fail for financial ledgers and how Event Sourcing saved our audit trail. Then Chapter 6 dives into the nightmare of Distributed Sagas — when a transatlantic fiber cable gets cut and 47 workflows are left hanging mid-transaction.

DEV Community