DEV Community

OSVALDO ALVES
The Architect’s Dilemma — Ep. 1: CAP Theorem in the Real World

Most seasoned architects have been through something like this:

a tense meeting, several stakeholders at the table, deadlines tight — and suddenly someone asks:

“But why can’t the system be consistent and available at the same time?”

It’s in moments like this that we’re reminded that architecture is, above all, about making difficult decisions under non-negotiable constraints.

It’s about trade-offs. It’s about giving things up.

And this series exists to explore exactly these dilemmas.

In the next episodes, we’ll dive into choices that shape modern systems: microservices vs. monoliths, CQRS, event-driven architectures, caching, idempotency, organizational scalability, and more.

Always from a practical, real-world perspective — far from pretty slides and close to the problems that actually hurt.

So let’s begin with a classic.


CAP Theorem in the Real World

The CAP Theorem is no longer an academic curiosity. Introduced by Eric Brewer in 2000 and formally proved by Seth Gilbert and Nancy Lynch in 2002, it states that a distributed system cannot simultaneously guarantee all three of the following properties:

  • Consistency
  • Availability
  • Partition Tolerance

In practice, it shows up when your system grows, traffic increases, and non-functional requirements become stricter. Suddenly, you’re forced to choose which type of pain you prefer.

The CAP Theorem isn’t just theory: it shows up every time you need to decide what happens when the network fails.

And networks always fail.


1. Context: When This Dilemma Appears

CAP becomes relevant whenever you have:

  • Distributed data (replication, multiple nodes, multi-region)
  • Requirements for high availability
  • Pressure for low latency
  • Users operating from multiple locations
  • Concurrent write operations

Common triggers:

  • Traffic growth requiring replicas
  • Geographic expansion requiring multi-region
  • Increased availability requirements
  • Non-centralized persistence

This leads to the big question:

Which pillar are you willing to relax — consistency, availability, or partition tolerance?


2. The Options: What Each Alternative Means

CAP defines three properties, but when a network partition occurs you can guarantee only two of them.

C — Consistency (Strong Consistency)

Every read returns the most recent write or an error, so all nodes appear to hold the same data at the same time.

A — Availability

Every request to a working node receives a non-error response, even if the data it returns is outdated.

P — Partition Tolerance

The system continues operating even if communication between nodes is broken.

⚠️ In real-world distributed systems, Partition Tolerance is mandatory: you cannot opt out of network failures.

So the choice becomes:

➡️ CP vs. AP
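
To make the CP vs. AP choice concrete, here is a minimal Python sketch of a single replica answering a read during a partition. The `Replica` class, its `mode` flag, and the `Unavailable` exception are illustrative names invented for this example, not part of any real database API.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """Raised by a CP replica that cannot confirm it holds the latest value."""


@dataclass
class Replica:
    mode: str                  # "CP" or "AP" (hypothetical labels for this sketch)
    value: str = "v1"          # last value this replica has seen
    partitioned: bool = False  # True when the replica cannot reach its peers

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise Unavailable("cannot reach peers; refusing to answer")
        # AP (or a healthy network): answer with whatever is stored locally.
        return self.value


cp = Replica(mode="CP", partitioned=True)
ap = Replica(mode="AP", partitioned=True)

print(ap.read())  # the AP replica answers, but the value may be stale
try:
    cp.read()
except Unavailable as err:
    print(f"CP replica: {err}")
```

Both replicas face the same broken network; the only difference is which kind of pain they hand to the caller.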


3. Trade-offs

CP (Consistency + Partition Tolerance)

Advantages

  • Strong consistency
  • Predictability in sensitive domains
  • Avoids critical data errors

Disadvantages

  • May reject requests during partitions
  • Higher latency (requests wait for coordination between nodes)
  • Lower perceived availability

AP (Availability + Partition Tolerance)

Advantages

  • Keeps responding even with failures
  • Lower latency
  • Great for real-time UX

Disadvantages

  • Eventual consistency
  • Conflict resolution required
  • Higher cognitive complexity
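
One of the simplest conflict resolution strategies is last-writer-wins (LWW). The sketch below assumes every write carries a comparable timestamp plus a node id used only to break ties; `Versioned` and `lww_merge` are hypothetical names for this example. Note that LWW silently discards the losing write, which is part of the complexity cost listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # time of the write (assumed comparable across nodes)
    node_id: str    # used only to break timestamp ties deterministically


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Return the winner of two conflicting writes under last-writer-wins."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


left = Versioned("shipping=express", timestamp=10, node_id="eu-1")
right = Versioned("shipping=standard", timestamp=12, node_id="us-1")

# The merge gives the same answer regardless of argument order,
# so every replica converges on the same value.
assert lww_merge(left, right) == lww_merge(right, left) == right
```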

4. Decision Factors

4.1. Nature of the data

  • Critical → CP
  • Approximate → AP

4.2. UX latency expectations

  • Can wait → CP
  • Must be instant → AP

4.3. Geographic distribution

  • Single region → flexible
  • Multi-region → often AP

4.4. Acceptable latency

  • Strict low-latency budget → AP
  • Latency budget with slack → CP

4.5. Team maturity

  • Less experience → CP
  • Higher expertise → AP

4.6. Business impact

  • Errors are costly → CP
  • Errors are tolerable → AP

5. Use Cases

Choose CP for:

  • Financial transactions
  • Critical inventory
  • Process orchestration
  • High-auditability requirements

Choose AP for:

  • Social networks
  • Counters, metrics, likes
  • Read-heavy catalogs
  • Global low-latency apps
  • Recommendations
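
Counters and likes suit AP partly because they can be modelled so that replicas never truly conflict. A minimal sketch, assuming a grow-only counter (a G-Counter, one of the simplest CRDTs): each node increments only its own slot, and merging takes the per-node maximum. The class and node names below are illustrative.

```python
class GCounter:
    """Grow-only counter: each node owns one slot; merge takes the max per slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts = {}  # node_id -> number of increments seen from that node

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


eu, us = GCounter("eu"), GCounter("us")
eu.increment()
eu.increment()  # two likes recorded in one region during a partition
us.increment()  # one like recorded in the other region

# Once the partition heals, the replicas exchange state and converge.
eu.merge(us)
us.merge(eu)
assert eu.value() == us.value() == 3
```

Merging is safe to repeat and order-independent, so both regions can keep accepting likes while disconnected.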

6. Warning Signs of a Wrong Choice

In CP:

  • Slowness complaints
  • Traffic spikes overloading nodes
  • Replication bottlenecks

In AP:

  • Inconsistencies generate rework
  • Write conflicts become frequent
  • UX fluctuation (data “jumping”)

7. How to Evolve Between Models

From CP → AP:

  • Prepare data models for divergence between replicas
  • Add conflict resolution
  • Implement read-repair
  • Reduce transactional boundaries
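
Read-repair can be sketched in a few lines: on each read, compare the copies held by the replicas, return the newest, and overwrite any stale or missing copy. The example below is a simplified illustration that models each replica as a plain dict of `key -> (version, value)` and assumes a higher version number always means a newer write; real systems use more robust versioning.

```python
from typing import Dict, List, Tuple

# Hypothetical layout for this sketch: key -> (version, value)
Replica = Dict[str, Tuple[int, str]]


def read_with_repair(replicas: List[Replica], key: str) -> str:
    """Read `key` from every replica, return the newest value, fix stale copies."""
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        raise KeyError(key)
    newest = max(found, key=lambda entry: entry[0])  # highest version wins
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair the stale or missing copy
    return newest[1]


a = {"cart": (2, "book, pen")}
b = {"cart": (1, "book")}  # stale copy
c = {}                     # missed the write entirely

assert read_with_repair([a, b, c], "cart") == "book, pen"
assert a == b == c  # all replicas now hold the newest version
```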

From AP → CP:

  • Identify critical domains
  • Centralize the source of truth
  • Use event logs
  • Reduce distributed surfaces
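
One way to read the event log advice is as a single, ordered, append-only log that acts as the authority for a critical domain, with current state derived by replaying it. The sketch below assumes a single writer and an in-memory list; `EventLog`, `append`, and `replay` are hypothetical names for this example, not a specific product's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class EventLog:
    """Append-only log with a single writer; list position defines the order."""
    events: List[Tuple[str, int]] = field(default_factory=list)  # (sku, delta)

    def append(self, sku: str, delta: int) -> int:
        self.events.append((sku, delta))
        return len(self.events) - 1  # offset of the new event

    def replay(self) -> Dict[str, int]:
        """Rebuild current stock levels by applying every event in order."""
        stock: Dict[str, int] = {}
        for sku, delta in self.events:
            stock[sku] = stock.get(sku, 0) + delta
        return stock


log = EventLog()
log.append("sku-1", 10)  # stock received
log.append("sku-1", -3)  # units sold
assert log.replay() == {"sku-1": 7}
```

Because every change passes through one ordered log, any reader that replays it reaches the same state, and the log doubles as an audit trail.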

Architecture is about balance, not extremes.


Conclusion

The CAP Theorem is a practical reminder:

There is no perfect distributed system.

There is only the system that best fits your context.

Architecture is about choices — and every choice has a cost.

If this helped clarify a decision, leave a like and share your experience in the comments.

Let’s learn together.
