
Gabriel Anhaia

The CAP Theorem Question Every Senior System Design Interview Asks


You walk into the system design loop. Forty minutes in, the interviewer draws three letters on the whiteboard: C, A, P. They ask you to pick two. You say "AP, because partitions happen and we need availability." They nod. Then they ask the question that separates the senior candidates from everyone else: "what about latency, when there's no partition?"

That is the question. Not CAP. The follow-up. And if your answer is some variant of "well, eventual consistency is fine," the loop is over and you don't know it yet.

CAP started as Eric Brewer's 2000 PODC keynote conjecture and was proven by Gilbert and Lynch in 2002. It says that a distributed system cannot simultaneously guarantee consistency, availability, and partition tolerance. Fine. The trouble is the framing. Real production systems spend the overwhelming majority of their life with no partition at all, and CAP has nothing to say about what they do during all that normal time.

That is what PACELC fixes. And it is what the senior interviewer is reaching for.

Two of three is misleading

The "pick two" framing reads like a menu. It is not. Partition tolerance is not optional in any system that runs across more than one machine — networks fail, packets drop, GC pauses look like partitions to the rest of the cluster. So the real choice during a partition is between C and A. That part of CAP is honest.

What CAP does not tell you is what happens the other 99% of the time. Two AP databases can behave completely differently under normal operation. On a healthy single-DC ring, Cassandra at consistency level ONE returns the first replica that answers, typically in single-digit milliseconds. Cassandra at consistency level ALL waits for every replica, which lands in the tens of milliseconds and lets any slow node drag the tail. Same database. Same letters on the whiteboard. Two completely different latency-vs-consistency tradeoffs.
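The knob lives at the query level. Here is a minimal sketch with the DataStax Python driver (the contact point, keyspace, and table are placeholders, not anything from a real cluster): the same SELECT runs at ONE or at ALL just by changing the statement's consistency level.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])      # placeholder contact point
session = cluster.connect("shop")    # placeholder keyspace

query = "SELECT * FROM orders WHERE user_id = %s"

# Fast path: return as soon as the first replica answers.
fast = SimpleStatement(query, consistency_level=ConsistencyLevel.ONE)

# Strict path: wait for every replica; one slow node drags the tail.
strict = SimpleStatement(query, consistency_level=ConsistencyLevel.ALL)

rows_fast = session.execute(fast, ["user-42"])
rows_strict = session.execute(strict, ["user-42"])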

Daniel Abadi's PACELC formulation names this. If a Partition occurs, choose between A and C. Else, choose between L (latency) and C (consistency). Two axes, four corners. PA/EL, PA/EC, PC/EL, PC/EC. The first letter is the partition behavior. The second is the steady-state behavior. That is the whole framework.

Where the three usual suspects actually sit

Interviewers love these three because they cover three corners of the matrix.

DynamoDB (PA/EL). Available during a partition (writes are accepted on whichever side can still reach the leader replica for a given key range). Under normal operation, AWS documents "single-digit millisecond" latency for eventually consistent reads, with strongly consistent reads costing extra and adding round-trip overhead. The default is eventually consistent. In other words, it chose latency over consistency. You can flip a flag per request to get strong consistency, paying double the read capacity and noticeably more latency. The tradeoff is exposed.
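That per-request flag is one keyword argument in boto3 (the table name and key below are placeholders):

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("user_profiles")   # placeholder table name

# Default: eventually consistent read. Cheapest, fastest.
eventual = table.get_item(Key={"user_id": "user-42"})

# Per-request override: strongly consistent read.
# Costs double the read capacity units and adds latency.
strong = table.get_item(Key={"user_id": "user-42"}, ConsistentRead=True)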

Spanner (PC/EC). Google built Spanner around TrueTime, an API that returns a small interval rather than a single timestamp. Inside that interval, Spanner waits (literally sleeps) to guarantee external consistency, a commit-wait described in Corbett et al., OSDI 2012. The result is that during a partition, Spanner stays consistent (and may become unavailable on the minority side). During normal operation, it stays consistent at the cost of higher write latency. Spanner is the rare system that picks C on both axes.
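The commit wait is easier to remember as code than as prose. This is a toy sketch of the idea only, not Spanner's API, and the 4 ms uncertainty is an assumption: pick a timestamp at the top of the uncertainty interval, then refuse to acknowledge the commit until that timestamp is provably in the past on every clock.

import time
from dataclasses import dataclass

EPSILON_S = 0.004   # assumed clock uncertainty (a few ms in the paper)

@dataclass
class TTInterval:
    earliest: float
    latest: float

def tt_now() -> TTInterval:
    t = time.time()
    return TTInterval(t - EPSILON_S, t + EPSILON_S)

def commit(apply_mutations) -> float:
    s = tt_now().latest              # timestamp at the top of the interval
    apply_mutations()
    # Commit wait: do not acknowledge until s is definitely in the past,
    # i.e. until even the earliest bound of "now" has moved past s.
    while tt_now().earliest <= s:
        time.sleep(0.0005)
    return s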

Cassandra (tunable). Cassandra is fundamentally PA/EL. But the tunable consistency levels (ONE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM) let you push it toward PC/EC for a specific operation. Write at QUORUM, read at QUORUM, and you have read-your-writes consistency in a single datacenter. Add LOCAL_QUORUM across two datacenters and you keep latency reasonable. Crank to ALL and you have effectively chosen consistency over both availability and latency.
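The reason QUORUM writes plus QUORUM reads behave this way is plain arithmetic: with replication factor N, a read quorum R and a write quorum W are guaranteed to overlap whenever R + W > N, so every read quorum contains at least one replica that saw the latest write. A few lines make the corner cases obvious.

def quorums_overlap(n_replicas: int, write_quorum: int, read_quorum: int) -> bool:
    # Any read quorum intersects any write quorum iff R + W > N.
    return read_quorum + write_quorum > n_replicas

RF = 3
print(quorums_overlap(RF, 1, 1))  # ONE/ONE       -> False: stale reads possible
print(quorums_overlap(RF, 2, 2))  # QUORUM/QUORUM -> True: overlap guaranteed
print(quorums_overlap(RF, 3, 1))  # ALL/ONE       -> True, but writes pay the full tail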

This is the senior-level answer to "where does X sit?" Name the corner. Name the knob. Name what the knob costs.
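If it helps to see the corners side by side, the placements above fit in a lookup table you can redraw on the whiteboard in ten seconds:

# (partition behavior, steady-state behavior) for the three systems above.
PACELC = {
    "DynamoDB (defaults)":       ("PA", "EL"),
    "DynamoDB (strong reads)":   ("PA", "EC"),   # per-request ConsistentRead
    "Spanner":                   ("PC", "EC"),
    "Cassandra (ONE)":           ("PA", "EL"),
    "Cassandra (QUORUM/QUORUM)": ("PA", "EC"),   # leans consistent while healthy
}

for system, (p, e) in PACELC.items():
    print(f"{system:27} {p}/{e}")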

The follow-up: latency under no partition

Here is the trap. A junior candidate hears "AP system" and concludes "eventually consistent, low latency, done." A senior candidate hears "AP system" and asks: AP at what consistency level, with how many replicas, in how many regions, with what read-repair policy?

Because PA/EL is not one point. It is a region of the design space. Three engineers can each build a "PA/EL" system and end up with median read latencies of 2ms, 12ms, and 80ms — depending on quorum settings, replica placement, and whether reads cross AZs. The CAP letters do not tell you which one you have.
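Where does that spread come from? Mostly from where the replicas sit relative to the caller. A back-of-envelope sketch makes the shape of it concrete; every figure below is an assumed ballpark, not a benchmark, and the exact values depend on your network.

# All figures are assumed ballparks for illustration only.
same_az_read_ms     = 2     # one replica in the caller's AZ answers
cross_az_rtt_ms     = 2     # extra round trip to a replica one AZ over
cross_region_rtt_ms = 70    # round trip between distant regions

design_1 = same_az_read_ms                          # CL=ONE, local replica
design_2 = same_az_read_ms + cross_az_rtt_ms        # quorum waits on a cross-AZ replica
design_3 = same_az_read_ms + cross_region_rtt_ms    # read path crosses a region

print(design_1, design_2, design_3)   # 2 4 72 -- an order of magnitude apart, all "PA/EL"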

The follow-up question ("what about latency under no partition?") is checking whether you understand that consistency is a continuum, not a binary: strong consistency, bounded staleness, session consistency, monotonic reads, eventual consistency. Cosmos DB exposes five tiers on that continuum (strong, bounded staleness, session, consistent prefix, eventual) as an account-level default with a per-request override, precisely because the latency cost differs at each level.

The senior answer is to map the requirement (read-your-writes for a user's profile? eventual is fine for a feed?) to a consistency level, then accept the latency that level implies. Not the other way around.

A decision tree that actually helps

When you are sketching a system on the board and the interviewer asks "what database?", you do not have time to enumerate twelve options. You need a fast filter that gets you to the right corner of the PACELC matrix. Here is one I keep in my notes.

from dataclasses import dataclass
from typing import Literal

ConsistencyMode = Literal[
    "strong",          # read-your-writes, no staleness
    "bounded",         # staleness window in seconds
    "session",         # consistent within a single client session
    "eventual",        # converges in finite time, no bound
]

@dataclass
class Requirement:
    needs_partition_tolerance: bool   # multi-AZ or multi-region
    can_tolerate_staleness_ms: int    # 0 = strong, >0 = bounded/eventual
    p99_read_budget_ms: int           # latency SLO for reads
    write_volume_per_sec: int         # rough order of magnitude
    cross_region: bool                # active-active across regions

def pick_mode(req: Requirement) -> ConsistencyMode:
    # Strong consistency only when the budget allows it.
    if req.can_tolerate_staleness_ms == 0:
        if req.cross_region and req.p99_read_budget_ms < 50:
            raise ValueError(
                "Strong consistency across regions costs "
                "50ms+ in network alone. Relax one."
            )
        return "strong"
    # Bounded staleness for things like dashboards and feeds.
    if req.can_tolerate_staleness_ms <= 5_000:
        return "bounded"
    # Session works for single-user views (cart, profile).
    if req.write_volume_per_sec < 10_000:
        return "session"
    return "eventual"

def pick_store(mode: ConsistencyMode, req: Requirement) -> str:
    if mode == "strong":
        if req.cross_region:
            return "Spanner / CockroachDB (PC/EC)"
        return "Postgres primary (PC/EC, single-region) / DynamoDB strong reads (PA/EC for that read)"
    if mode == "bounded":
        return "Cosmos DB bounded / Cassandra QUORUM (PA/EC)"
    if mode == "session":
        return "DynamoDB session / Cosmos DB session (PA/EL)"
    return "Cassandra ONE / DynamoDB eventual (PA/EL)"

Run a sample requirement through it.

req = Requirement(
    needs_partition_tolerance=True,
    can_tolerate_staleness_ms=200,
    p99_read_budget_ms=20,
    write_volume_per_sec=5_000,
    cross_region=False,
)
mode = pick_mode(req)        # "bounded"
store = pick_store(mode, req)  # Cosmos DB bounded / Cassandra QUORUM

The code is not the point. The decision tree is the point. You cannot answer "what database" without first answering "what consistency mode," and you cannot answer "what consistency mode" without first answering "what is the staleness budget and the latency budget." The interviewer wants to see you walk that chain.

What the staleness budget actually buys you

Every millisecond of staleness you can tolerate widens your menu of stores. Zero milliseconds means you are paying coordination latency on every read: Spanner's TrueTime wait, or a quorum round trip across replicas. A few hundred milliseconds means you can read from a local replica and let async replication catch up. A few seconds means you can cache aggressively and your store choice barely matters.
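That last case is worth making concrete: a few seconds of staleness budget is literally a cache TTL. A minimal sketch (the fetch function is a stand-in for whichever store you picked):

import time

CACHE_TTL_S = 5.0                 # the staleness budget, spent as a TTL
_cache: dict[str, tuple[float, object]] = {}

def read_profile(user_id: str, fetch_from_store):
    now = time.monotonic()
    hit = _cache.get(user_id)
    if hit and now - hit[0] < CACHE_TTL_S:
        return hit[1]                     # possibly up to 5 s stale, by design
    value = fetch_from_store(user_id)     # only this path pays store latency
    _cache[user_id] = (now, value)
    return value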

The mistake junior candidates make is to treat "strong consistency" as a feature they want for free. It is never free. In a single-region Postgres primary, strong consistency costs you a synchronous fsync and the latency of the leader's disk. In a multi-region Spanner deployment, it costs you the speed of light between regions plus TrueTime padding. In Cassandra at QUORUM over three replicas, it costs you the slower of the two replicas you are waiting on.

The senior answer reframes the question. The user does not want strong consistency. The user wants something that depends on consistency: read-your-writes after a checkout, monotonic reads on a leaderboard, no double-charge on payment. Each of those has a cheaper local solution than turning on global strong consistency for the whole store.
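One concrete example of a cheaper local solution: no-double-charge is usually an idempotency key on the payment write, not a globally consistent store. Here is a minimal sketch using a DynamoDB-style conditional write; the table, fields, and key format are illustrative, and most stores have an equivalent (a unique constraint, INSERT ... ON CONFLICT DO NOTHING, and so on).

def charge_once(payments_table, user_id: str, amount_cents: int, idempotency_key: str) -> str:
    # The conditional write touches a single item, so it needs no
    # cross-store coordination to prevent a double charge.
    try:
        payments_table.put_item(
            Item={
                "idempotency_key": idempotency_key,
                "user_id": user_id,
                "amount_cents": amount_cents,
            },
            ConditionExpression="attribute_not_exists(idempotency_key)",
        )
        return "charged"
    except payments_table.meta.client.exceptions.ConditionalCheckFailedException:
        return "duplicate ignored"   # a retry arrived; the first charge stands

# The client generates the key once and reuses it on every retry:
#   charge_once(table, "user-42", 1999, idempotency_key="checkout-attempt-123")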

The closing move

When the interviewer asks the latency-under-no-partition question, the move is to draw a small two-by-two on the board. Partition behavior on one axis. Steady-state behavior on the other. Place DynamoDB in one corner, Spanner in another, Cassandra straddling. Then say: the choice is not which letters to keep. The choice is which corner your workload lives in, and how often you cross it.

That is the answer that ends the round. You did not memorize PACELC. You treated consistency as a knob with a price tag, and you can quote the price.

If this was useful

The matrix above is the same one I keep in System Design Pocket Guide: Fundamentals, with worked examples for each corner and the latency budgets you can expect from each store. If you want the deeper "which database for which workload" reasoning — sharding, replication topologies, when to cross from OLTP to OLAP — that lives in the Database Playbook.

System Design Pocket Guide: Fundamentals

Database Playbook
