Mastering CAP & BASE Theory with Gemini: From Distributed Principles to Nacos & Redis Reality

#architecture #computerscience #distributedsystems #systemdesign

Core concepts

The CAP theorem (also known as Brewer’s Theorem) is a cornerstone for understanding distributed system design. It states that a distributed system cannot simultaneously guarantee all three of the following properties:

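Consistency (C): Every read returns the most recent write (or an error), so all nodes present the same data at the same time. For example, checking inventory at any branch returns exactly the same result.
Availability (A): Every request received by a non-failing node gets a non-error response, although that response is not guaranteed to contain the most recent write. In other words, the system stays “online”.
Partition Tolerance (P): The system continues to operate even when network failures split nodes into isolated groups that cannot communicate with each other (a partition).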

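In real networks, partitions (P) are inevitable, so partition tolerance is not optional. The real choice arises when a partition occurs: the system must either refuse some requests to stay consistent (CP) or keep answering and risk returning stale data (AP).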

CP mode

In a CP system, when a network partition occurs, nodes that cannot confirm they hold the latest data refuse requests (or return errors) in order to keep data strictly consistent across nodes.

Idea: It is better to return no result than to return incorrect or stale data.
Example: Bank transfers. If two servers are disconnected, the system must block operations on the account rather than let the same money be withdrawn on both sides of the partition and corrupt the balance.
Cost: The system becomes unavailable during the fault.
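The CP trade-off can be sketched as a toy register replicated on three nodes, assuming a request is served only when a majority (quorum) of replicas is reachable. `CPRegister` and its fields are hypothetical names for illustration; real CP systems rely on a consensus protocol such as Raft or Paxos rather than this single-coordinator shortcut.

```python
from typing import List, Optional, Set


class CPRegister:
    """Toy CP register: reads and writes need a majority of replicas (hypothetical model)."""

    def __init__(self, n_replicas: int) -> None:
        self.values: List[Optional[int]] = [None] * n_replicas
        self.versions: List[int] = [0] * n_replicas
        # Replicas the coordinator can currently reach; shrink this to simulate a partition.
        self.reachable: Set[int] = set(range(n_replicas))

    def _require_majority(self) -> None:
        if len(self.reachable) < len(self.values) // 2 + 1:
            # Consistency over availability: refuse rather than risk stale or split data.
            raise RuntimeError("unavailable: no majority of replicas reachable")

    def write(self, value: int) -> None:
        self._require_majority()
        # Any majority overlaps the previous write's majority, so it contains the newest version.
        version = max(self.versions[i] for i in self.reachable) + 1
        for i in self.reachable:
            self.values[i] = value
            self.versions[i] = version

    def read(self) -> Optional[int]:
        self._require_majority()
        # The highest version seen in a majority is the latest committed write.
        newest = max(self.reachable, key=lambda i: self.versions[i])
        return self.values[newest]
```

When the coordinator can reach only one of the three replicas, both `read()` and `write()` raise instead of answering, which is the unavailability described above.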

AP mode

In an AP system, even if the network is partitioned, the system still prioritizes responding to requests.

Idea: Data might not be the latest, or different users might see different results, but users can still use the service.
Example: Social media likes. If you like a photo during a network partition, your friend might see it a few seconds later. That is acceptable. What matters is that the service does not become unusable because of network instability.
Cost: Sacrifices immediate consistency.
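A minimal sketch of the AP choice for the likes example, assuming each replica stores the set of users who liked a photo. `LikeReplica` is a hypothetical class; it uses a grow-only set merged by union, one simple conflict-free strategy chosen here because union never drops a like. Supporting “unlike” would need a richer structure, which this sketch omits.

```python
from typing import Set


class LikeReplica:
    """One replica of a photo's like set in a toy AP system."""

    def __init__(self) -> None:
        self.likes: Set[str] = set()

    def like(self, user: str) -> None:
        # Accepted locally even if other replicas are unreachable (availability first).
        self.likes.add(user)

    def count(self) -> int:
        # Always answers, but may not yet include likes accepted by other replicas.
        return len(self.likes)

    def merge(self, other: "LikeReplica") -> None:
        # Run after the partition heals. Set union is commutative, associative and
        # idempotent, so replicas converge no matter the order of merges.
        self.likes |= other.likes
```

During the partition, two replicas can report different counts; after they merge with each other, both report the same set of likes.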

Case study: Nacos CP vs AP

1. Ephemeral instances vs persistent instances

This is the key logic behind how Nacos differentiates AP and CP:

AP mode (default): Used for ephemeral instances. After registration, an ephemeral instance keeps itself alive by sending heartbeats to the server; if the heartbeats stop, the instance is eventually removed. During a partition, Nacos prioritizes service availability, and short-term inconsistency between server nodes is acceptable. This mode uses Nacos’s own Distro protocol.
CP mode: Used for persistent instances. Instance data is persisted to disk and must be strongly consistent across server nodes. If a majority of nodes cannot agree because of a network failure, writes are rejected and availability is sacrificed. This mode uses a consensus protocol based on the Raft algorithm.
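The two lifecycles can be sketched with a toy in-memory registry. This is not the Nacos API: `ToyRegistry`, its methods, and the 15-second timeout are hypothetical, and the consensus write for persistent instances is only indicated by a comment. In Nacos itself, the choice is made per instance through an `ephemeral` flag, which defaults to `true`.

```python
from typing import Dict, Set


class ToyRegistry:
    """Hypothetical registry mimicking ephemeral vs persistent instance lifecycles."""

    def __init__(self, heartbeat_timeout: float) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.last_beat: Dict[str, float] = {}  # ephemeral instance -> last heartbeat time
        self.persistent: Set[str] = set()      # persistent instances

    def register(self, instance: str, now: float, ephemeral: bool = True) -> None:
        if ephemeral:
            # Held in memory and kept alive by heartbeats (the AP / Distro side).
            self.last_beat[instance] = now
        else:
            # A CP registry would commit this through a Raft-based log before acknowledging.
            self.persistent.add(instance)

    def heartbeat(self, instance: str, now: float) -> None:
        if instance in self.last_beat:
            self.last_beat[instance] = now

    def deregister(self, instance: str) -> None:
        self.last_beat.pop(instance, None)
        self.persistent.discard(instance)

    def instances(self, now: float) -> Set[str]:
        # Ephemeral instances whose heartbeats stopped are evicted automatically;
        # persistent instances stay until explicitly deregistered.
        expired = [i for i, t in self.last_beat.items() if now - t > self.heartbeat_timeout]
        for i in expired:
            del self.last_beat[i]
        return set(self.last_beat) | self.persistent
```

The sketch ignores health checks on persistent instances and replication between registry nodes; it only shows why ephemeral entries can safely be dropped and rebuilt from heartbeats, while persistent entries need durable, agreed-upon storage.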

2. Why does Nacos support both?

This maps back to the trade-off question:

Service discovery usually leans toward AP. If network jitter makes the registry unavailable, services can no longer look each other up, and calls may fail across the entire system. That impact is too large. A briefly stale instance list is far less harmful, because clients can mask it by retrying another instance.
Configuration management can lean toward CP. When a critical value such as a database password or a rate limit changes, it is usually desirable for every node to receive exactly the same latest value rather than a mix of old and new values.
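How a client can mask a stale service list is easy to sketch: try the instances in order and move on when one refuses the connection. `call_with_failover`, the addresses, and the use of `ConnectionError` as the failure signal are assumptions for illustration; production clients typically add timeouts, backoff, and health-aware instance selection.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_failover(instances: List[str], call: Callable[[str], T]) -> T:
    """Try each instance from a possibly stale registry list until one succeeds."""
    last_error: Exception = RuntimeError("no instances available")
    for instance in instances:
        try:
            return call(instance)
        except ConnectionError as exc:
            # A stale entry (an instance that already went down) costs only one retry.
            last_error = exc
    raise last_error
```

If the registry still lists a dead instance first, the caller pays for one failed attempt instead of seeing an outage.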

BASE theory

Once you understand the CAP trade-off between Consistency (C) and Availability (A), BASE theory can be viewed as a practical compromise for distributed systems.

The core idea: since strong consistency is hard to achieve, we accept weaker consistency guarantees so that the system remains usable most of the time.

BASE is an acronym for:

Basically Available (BA): During failures, the system may lose some availability, but should not completely crash. For example, a page that normally loads in 0.1 seconds might take 2 seconds, or some non-core functionality may be temporarily disabled to protect core services.
Soft State (S): The system’s data is allowed to be in an intermediate state. Replication between nodes may lag, and this temporary divergence is tolerated in exchange for availability.
Eventually Consistent (E): The most important point. The system does not require data to be consistent at every moment, but it guarantees that once new updates stop, all replicas converge to the same final state after some time.
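Soft state and eventual consistency can be made concrete with a toy primary/replica pair, assuming writes are shipped to the replica through an in-order replication log that is applied with some lag. `LaggyReplication` and its methods are hypothetical; the point is that the replica may expose missing or intermediate values, yet once writes stop and the log drains, both copies match.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class LaggyReplication:
    """Toy primary/replica pair with asynchronous, in-order replication."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self.pending: Deque[Tuple[str, str]] = deque()  # log entries not yet applied

    def write(self, key: str, value: str) -> None:
        self.primary[key] = value
        self.pending.append((key, value))  # shipped to the replica later

    def read_replica(self, key: str) -> Optional[str]:
        return self.replica.get(key)

    def replicate(self, max_entries: int = 1) -> None:
        # Apply up to max_entries log entries in order, simulating replication lag.
        for _ in range(min(max_entries, len(self.pending))):
            key, value = self.pending.popleft()
            self.replica[key] = value

    def converged(self) -> bool:
        return self.primary == self.replica
```

Writing `"10"` then `"9"` to `stock` and applying one log entry at a time, the replica first returns nothing, then `"10"` (an intermediate state), and finally `"9"`, at which point `converged()` is true.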

Case study: Redis Cluster

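Redis Cluster (cluster mode) is generally classified as AP (Availability + Partition Tolerance): it favors availability and performance over strong consistency. Strictly speaking, it is not fully available either, because a master on the minority side of a partition stops accepting writes once the node timeout expires.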

BASE in practice in Redis Cluster

Redis Cluster does not pursue strong consistency. Instead, it achieves eventual consistency via:

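Basically Available (BA): Redis Cluster splits data into 16,384 hash slots spread across the master nodes. If a few nodes go down, the cluster keeps serving as long as every slot is still covered, for example because each failed master’s slave has been promoted. By default, if any slot is left uncovered, the cluster stops accepting queries; setting cluster-require-full-coverage to no lets it keep serving the slots that remain covered.
Soft State (S): After a master writes data, it returns success to the client immediately and then replicates to its slaves asynchronously, so the master and its slaves can briefly hold different data.
Eventually Consistent (E): Under normal conditions, slaves typically catch up with the master within milliseconds, depending on network latency and write load.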
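The Redis Cluster specification maps each key to a slot as CRC16(key) mod 16384, using the XMODEM variant of CRC16, and lets a `{...}` hash tag force related keys into the same slot. The following is a minimal re-implementation for illustration; the function names are our own, and on a real cluster `CLUSTER KEYSLOT <key>` returns the server’s own answer.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16,384 hash slots, honoring {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only the text between the braces
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384
```

For example, `key_slot("foo")` returns 12182, and `{user1000}.following` and `{user1000}.followers` land in the same slot because only `user1000` is hashed. Losing a node therefore affects exactly the slots it owns, which is why slot coverage decides whether the cluster stays basically available.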

Why Redis Cluster is not strongly consistent

Consider this scenario:

Step 1: You send SET key1 value1 to master node A.
Step 2: Node A writes to memory and immediately replies “OK”.
Step 3: Before A replicates the data to slave A1, A suddenly loses power and goes down.
Step 4: The cluster promotes slave A1 to become the new master.
Result: The value1 you just wrote is lost, even though the client was told “OK”, because A1 never received it.
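The four steps can be replayed in a toy model of one shard, assuming one master, one slave, and a backlog of acknowledged writes that have not yet been sent. `ToyShard` and `Node` are hypothetical classes, not Redis internals; the comments map each method to the steps above.

```python
from typing import Dict, List, Optional, Tuple


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.alive = True


class ToyShard:
    """One master and one slave with asynchronous replication (hypothetical model)."""

    def __init__(self) -> None:
        self.master = Node("A")
        self.replica = Node("A1")
        self.backlog: List[Tuple[str, str]] = []  # acknowledged but not yet replicated

    def set(self, key: str, value: str) -> str:
        self.master.data[key] = value      # step 1: write reaches master A
        self.backlog.append((key, value))
        return "OK"                        # step 2: acknowledged before replication

    def replicate(self) -> None:
        while self.backlog:
            key, value = self.backlog.pop(0)
            self.replica.data[key] = value

    def crash_master_and_failover(self) -> None:
        self.master.alive = False          # step 3: A loses power
        self.backlog.clear()               # unsent writes die with A
        # Step 4: A1 is promoted. A, once restarted, would rejoin as a slave of A1.
        self.master, self.replica = self.replica, self.master

    def get(self, key: str) -> Optional[str]:
        return self.master.data.get(key)
```

If `replicate()` runs before the crash, the write survives the failover; if the crash comes first, `get("key1")` on the new master returns `None`. Redis’s `WAIT` command can narrow this window by blocking until a given number of replicas acknowledge the write, but the Redis documentation states that it does not make Redis strongly consistent.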