The CAP Theorem in Practice: Making the Right Trade-offs at Scale

#architecture #database #distributedsystems #systemdesign

Every distributed system you build is already taking a side in the CAP trade-off. The question is whether you made that choice deliberately or discover it during an incident.

CAP states that a distributed system can guarantee at most two of three properties: Consistency, Availability, and Partition Tolerance. The critical insight most teams miss — P is not optional. Networks fail. Pods crash. AZs go dark. You are choosing between CP and AP. Full stop.

CP vs AP: What You're Actually Trading

CP systems (etcd, ZooKeeper, CockroachDB) refuse to serve requests during a partition rather than return stale data. Leader-based consensus ensures correctness. Choose CP for financial ledgers, inventory reservation, distributed locks — any domain where stale reads are more dangerous than errors.

AP systems (Cassandra, DynamoDB, DNS) continue serving requests during a partition, accepting diverging state. Reconciliation happens later. Choose AP for user feeds, shopping carts, session data — any domain where temporary inconsistency is tolerable and availability is a hard SLA.

Neither is universally correct. What is unacceptable is having no defined behavior.

PACELC: The Model That Actually Matches Production

CAP only describes behavior during partitions. Your system spends most of its time healthy. PACELC extends CAP: even during normal operation, you are trading Latency against Consistency.

A CP system with synchronous replication pays a latency tax on every write — all the time, not just during incidents. DynamoDB offers eventual consistency by default (low latency) or strong consistency per read (higher latency). The trade-off is continuous, not just during failures.

Architectural Patterns Shaped by CAP

Saga Pattern — inherently AP. Each local transaction commits immediately (available). Global consistency is eventual. Compensating transactions are your consistency guarantee, not your database.

CQRS + Event Sourcing — assigns CP to commands (strong consistency via transactional aggregate root) and AP to queries (eventual consistency via denormalized projections). You are not picking one model — you are assigning different models per use case.

Tunable Consistency (Cassandra) — CONSISTENCY QUORUM on reads and writes achieves CP behavior. CONSISTENCY ONE maximizes AP. Tune per operation, not per cluster. User profile reads can tolerate eventual consistency. Payment status reads cannot.

Common Mistakes

Treating CAP as a database property — it is a system property. Your retry logic, caching, and timeout behavior all participate in the trade-off.
Assuming strong consistency is always safer — CP under partition returns errors. Cascading timeouts from a blocked write path can cause a larger outage than serving stale data.
One consistency model across the entire system — your order service (CP), product catalog (AP), session store (AP), and audit log (CP) should not share a single strategy.

Read the Full Article

This is a summary of my deep dive into CAP theorem trade-offs. The full article covers CP vs AP with canonical examples, the PACELC model, architectural patterns, and a decision framework:

👉 The CAP Theorem in Practice — Full Article

The full article includes: