OSVALDO ALVES

Posted on Nov 24

The Architect’s Dilemma — Ep. 1: CAP Theorem in the Real World

#distributedsystems #systemdesign #architecture

Most seasoned architects have been through something like this:

a tense meeting, several stakeholders at the table, deadlines tight — and suddenly someone asks:

“But why can’t the system be consistent and available at the same time?”

It’s in moments like this that we’re reminded that architecture is, above all, about making difficult decisions under non-negotiable constraints.

It’s about trade-offs. It’s about giving things up.

And this series exists to explore exactly these dilemmas.

In the next episodes, we’ll dive into choices that shape modern systems: microservices vs. monoliths, CQRS, event-driven architectures, caching, idempotency, organizational scalability, and more.

Always from a practical, real-world perspective — far from pretty slides and close to the problems that actually hurt.

So let’s begin with a classic.

CAP Theorem in the Real World

The CAP Theorem is no longer an academic curiosity. Introduced by Eric Brewer in 2000, it states that a distributed system cannot simultaneously guarantee all three of the following properties:

Consistency
Availability
Partition Tolerance

In practice, it shows up when your system grows, traffic increases, and non-functional requirements become harder. Suddenly, you’re forced to choose which type of pain you prefer.

The CAP Theorem isn't just theory: it applies every time you need to decide what happens when the network fails.

And networks always fail.

1. Context: When This Dilemma Appears

CAP becomes relevant whenever you have:

Distributed data (replication, multiple nodes, multi-region)
Requirements for high availability
Pressure for low latency
Users operating from multiple locations
Concurrent write operations

Common triggers:

Traffic growth requiring replicas
Geographic expansion requiring multi-region
Increased availability requirements
Non-centralized persistence

This leads to the big question:

Which pillar are you willing to relax — consistency, availability, or partition tolerance?

2. The Options: What Each Alternative Means

CAP defines three properties, but during a network partition you can guarantee only two of them.

C — Consistency (Strong Consistency)

Every read returns the most recent write, so all nodes appear to hold the same data at the same time.

A — Availability

Every request to a working node gets a response, even if the data it returns is outdated.

P — Partition Tolerance

The system continues operating even if communication between nodes is broken.

⚠️ In real-world systems, Partition Tolerance is mandatory.

So the choice becomes:

➡️ CP vs. AP
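To make the CP vs. AP choice concrete, here is a minimal sketch of how one replica might treat a write when it has lost contact with part of the cluster. The `Replica` class, its `mode` setting, and the `Unavailable` exception are hypothetical names invented for illustration, and the majority check is only one common way to decide whether a replica is on the "safe" side of a partition.

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised by a CP replica that cannot reach a majority of the cluster."""


@dataclass
class Replica:
    mode: str              # "CP" or "AP" (hypothetical setting)
    cluster_size: int      # total number of replicas, including this one
    reachable_peers: int   # peers this replica can currently contact
    data: dict = field(default_factory=dict)
    pending_sync: list = field(default_factory=list)

    def has_majority(self) -> bool:
        # This replica counts itself plus every peer it can still reach.
        return (self.reachable_peers + 1) > self.cluster_size // 2

    def write(self, key: str, value: str) -> str:
        if self.mode == "CP" and not self.has_majority():
            # CP: refuse the write rather than diverge from the majority side.
            raise Unavailable("no majority reachable, write rejected")
        self.data[key] = value
        if not self.has_majority():
            # AP: accept locally and remember to reconcile once the partition heals.
            self.pending_sync.append((key, value))
            return "accepted-locally"
        return "committed"
```

With five replicas and only one reachable peer, the CP replica raises `Unavailable` while the AP replica answers and queues the write for later reconciliation. That queue is where the AP costs discussed in the next section come from.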

3. Trade-offs

CP (Consistency + Partition Tolerance)

Advantages

Strong consistency
Predictability in sensitive domains
Avoids critical data errors

Disadvantages

May reject requests during partitions
Higher latency
Lower perceived availability

AP (Availability + Partition Tolerance)

Advantages

Keeps responding even with failures
Lower latency
Great for real-time UX

Disadvantages

Eventual consistency
Conflict resolution required
Higher cognitive complexity
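"Conflict resolution required" can sound abstract, so here is a minimal sketch of one simple policy, last-writer-wins. The `Versioned` type and `merge_lww` function are hypothetical names, and the timestamp is assumed to come from a clock that is comparable across nodes, which is a strong assumption in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int   # time of the write (assumed comparable across nodes)
    node_id: str     # tie-breaker so every replica picks the same winner


def merge_lww(a: Versioned, b: Versioned) -> Versioned:
    """Return the winning version; the result does not depend on argument order."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))
```

Note the cost: the losing write is silently discarded. That is acceptable for a profile picture and unacceptable for an account balance, which is why the nature of the data drives the choice.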

4. Decision Factors

4.1. Nature of the data

Critical → CP
Approximate → AP

4.2. UX latency expectations

Can wait → CP
Must be instant → AP

4.3. Geographic distribution

Single region → flexible
Multi-region → often AP

4.4. Acceptable latency

Low latency required → AP
Higher latency tolerable → CP

4.5. Team maturity

Less experience → CP
Higher expertise → AP

4.6. Business impact

Errors are costly → CP
Errors are tolerable → AP

5. Use Cases

Choose CP for:

Financial transactions
Critical inventory
Process orchestration
High auditability

Choose AP for:

Social networks
Counters, metrics, likes
Read-heavy catalogs
Global low-latency apps
Recommendations
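Counters and likes fit AP well because replicas can merge them without coordination. A minimal sketch of a grow-only counter (a G-Counter, one of the simplest CRDTs) shows why. The class and method names are illustrative, not taken from any particular library.

```python
class GCounter:
    """Grow-only counter: each replica increments only its own slot."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}  # node_id -> number of increments seen from that node

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge no matter how often or in what order they sync.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())
```

Two replicas that each accept likes during a partition agree on the total once they exchange state, and merging the same state twice changes nothing.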

6. Warning Signs of a Wrong Choice

In CP:

Slowness complaints
Traffic spikes compromising nodes
Replication bottlenecks

In AP:

Inconsistencies generate rework
Write conflicts become frequent
UX fluctuation (data “jumping”)

7. How to Evolve Between Models

From CP → AP:

Prepare models for divergence
Add conflict resolution
Implement read-repair
Reduce transactional boundaries
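As an illustration of the read-repair step, here is a minimal sketch in which a read compares the copies held by several replicas, returns the newest, and overwrites the stale ones. Replicas are modeled as plain dictionaries mapping a key to a `(version, value)` pair, and versions are assumed to increase with every write. Both are simplifications chosen for this example.

```python
def read_with_repair(replicas, key):
    """Return the newest value for key and bring stale replicas up to date.

    replicas: list of dicts mapping key -> (version, value).
    """
    found = [replica[key] for replica in replicas if key in replica]
    if not found:
        return None
    newest = max(found, key=lambda entry: entry[0])  # highest version wins
    for replica in replicas:
        if replica.get(key) != newest:
            replica[key] = newest  # repair a stale or missing copy
    return newest[1]
```

The repair happens as a side effect of reading, so frequently read keys converge quickly while rarely read keys may stay stale until a background process catches them.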

From AP → CP:

Identify critical domains
Centralize the source of truth
Use event logs
Reduce distributed surfaces
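One way to read "use event logs" is to route a critical domain through a single append-only log and derive state by replaying it. The sketch below assumes one writer owns the log. `EventLog`, `replay`, and the account/delta event shape are hypothetical and exist only to show the idea.

```python
class EventLog:
    """Append-only log owned by a single writer (the source of truth)."""

    def __init__(self):
        self._events = []

    def append(self, account, delta):
        # The offset gives every event one agreed position in history.
        offset = len(self._events)
        self._events.append({"offset": offset, "account": account, "delta": delta})
        return offset

    def read_from(self, offset=0):
        return list(self._events[offset:])


def replay(log, from_offset=0, state=None):
    """Fold events into balances; the same log always yields the same state."""
    balances = dict(state or {})
    for event in log.read_from(from_offset):
        account = event["account"]
        balances[account] = balances.get(account, 0) + event["delta"]
    return balances
```

Because every consumer replays the same ordered history, they cannot disagree about the outcome. The price is the CP one: if the log's owner is unreachable, writes wait.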

Architecture is about balance, not extremes.

Conclusion

The CAP Theorem is a practical reminder:

There is no perfect distributed system.

There is only the system that best fits your context.

Architecture is about choices — and every choice has a cost.

If this helped clarify a decision, leave a like and share your experience in the comments.

Let’s learn together.

DEV Community
