DEV Community

Jaspreet singh
Jaspreet singh

Posted on

HLD Fundamentals #2: CAP Theorem Trade-offs & Scaling

In System Design, there is rarely a "perfect" solution.

Every architecture decision involves trade-offs between speed, consistency, availability, cost, and scalability. Understanding these trade-offs is one of the most important skills during system design interviews.


Latency vs Throughput

What is Latency?

Latency is the time taken to complete a single request.

In simple terms:

How long does one request take?


What is Throughput?

Throughput is the number of requests a system can process in a given time.

In simple terms:

How many requests can the system handle per second?


Example

Food Delivery Restaurant

Suppose a restaurant takes:

1 Order = 2 Minutes
Enter fullscreen mode Exit fullscreen mode

Latency:

2 Minutes
Enter fullscreen mode Exit fullscreen mode

for a single order.

If it completes:

30 Orders / Hour
Enter fullscreen mode Exit fullscreen mode

then throughput is:

30 Orders per Hour
Enter fullscreen mode Exit fullscreen mode

Real World Example

Netflix

When a user clicks Play:

  • Video should start quickly → Low Latency
  • Millions of users should stream simultaneously → High Throughput

Netflix optimizes for both.


Key Points

  • Latency measures response time.
  • Throughput measures system capacity.
  • Lower latency is better.
  • Higher throughput is better.

Interview One-Liner

Latency is the time taken to serve a request, while throughput is the number of requests processed per unit time.


Availability vs Consistency (CAP Theorem)

What is CAP Theorem?

CAP Theorem states that in a distributed system, you can only guarantee two out of the following three properties:

C = Consistency
A = Availability
P = Partition Tolerance
Enter fullscreen mode Exit fullscreen mode

When a network partition occurs, a system must choose between Consistency and Availability.


Consistency

Every user sees the latest data.

After a write operation:

User A updates balance = ₹1000

User B immediately sees ₹1000
Enter fullscreen mode Exit fullscreen mode

All nodes return the same result.


Availability

Every request receives a response.

Even if some servers are down:

Request → Response
Enter fullscreen mode Exit fullscreen mode

The response may not contain the latest data.


Why Can't We Have Both?

Imagine two database nodes:

Node A       X       Node B
Enter fullscreen mode Exit fullscreen mode

(Network failure between them)

Now a write reaches Node A.

You have two choices:

Option 1: Prioritize Consistency

Node B refuses requests until data is synchronized.

Result:

Consistent
Not Available
Enter fullscreen mode Exit fullscreen mode

Option 2: Prioritize Availability

Node B continues serving requests.

Result:

Available
Possibly Stale Data
Enter fullscreen mode Exit fullscreen mode

Real World Examples

Banking Systems

Prefer Consistency.

Account Balance
Payments
Transactions
Enter fullscreen mode Exit fullscreen mode

Wrong data is unacceptable.


Social Media

Prefer Availability.

Instagram Likes
Comments
Followers
Enter fullscreen mode Exit fullscreen mode

Seeing slightly old data is acceptable.


Key Points

  • Partition Tolerance is mandatory in distributed systems.
  • Usually choose between Consistency and Availability.
  • Different applications require different choices.

Interview One-Liner

CAP Theorem states that during a network partition, a distributed system must choose between Consistency and Availability.


Performance vs Scalability

What is Performance?

Performance measures how efficiently a system handles current workload.

Questions like:

How fast?
How much memory?
How much CPU?
Enter fullscreen mode Exit fullscreen mode

are performance-related.


What is Scalability?

Scalability measures how well a system handles increased workload in the future.

Questions like:

Can it handle 10x traffic?
Can it handle 100x users?
Enter fullscreen mode Exit fullscreen mode

are scalability-related.


Example

Imagine a website serving:

1,000 Users
Enter fullscreen mode Exit fullscreen mode

Very fast today.

But if it crashes at:

100,000 Users
Enter fullscreen mode Exit fullscreen mode

then it has:

Good Performance
Poor Scalability
Enter fullscreen mode Exit fullscreen mode

Real World Example

Startup Application

Initially:

10,000 users
Enter fullscreen mode Exit fullscreen mode

Single server works perfectly.

After growth:

10 Million users
Enter fullscreen mode Exit fullscreen mode

Need scalable architecture.


Key Points

  • Performance = Present efficiency.
  • Scalability = Future growth capability.
  • Fast systems are not always scalable.
  • Scalable systems are designed for growth.

Interview One-Liner

Performance measures efficiency today, while scalability measures the ability to handle future growth.


Vertical Scaling

What is Vertical Scaling?

Vertical Scaling means increasing the resources of an existing server.

Also called:

Scale Up
Enter fullscreen mode Exit fullscreen mode

How Does It Work?

Before

CPU: 4 Core
RAM: 8 GB

↓

After

CPU: 32 Core
RAM: 128 GB
Enter fullscreen mode Exit fullscreen mode

Same server, bigger machine.


Real World Example

A MySQL database starts running slowly.

Instead of adding new servers:

More CPU
More RAM
Faster SSD
Enter fullscreen mode Exit fullscreen mode

are added.


Advantages

  • Simple implementation
  • No major architecture changes
  • Easy maintenance

Disadvantages

  • Hardware limit exists
  • Expensive
  • Single point of failure remains

Interview One-Liner

Vertical Scaling increases the capacity of a single machine by adding more hardware resources.


Horizontal Scaling

What is Horizontal Scaling?

Horizontal Scaling means adding more servers to distribute load.

Also called:

Scale Out
Enter fullscreen mode Exit fullscreen mode

How Does It Work?

Instead of:

1 Server
Enter fullscreen mode Exit fullscreen mode

we add:

Server 1
Server 2
Server 3
Server 4
Enter fullscreen mode Exit fullscreen mode

A Load Balancer distributes traffic among them.

Users
   |
   V
Load Balancer
   |
   +---- Server 1
   |
   +---- Server 2
   |
   +---- Server 3
Enter fullscreen mode Exit fullscreen mode

Real World Example

Netflix

Netflix serves millions of users worldwide.

A single server can never handle that traffic.

Therefore thousands of servers work together behind load balancers.


Advantages

  • Nearly unlimited scaling
  • Better fault tolerance
  • High availability

Disadvantages

  • More complex architecture
  • Data synchronization challenges
  • Load balancing required

Vertical vs Horizontal Scaling

Feature Vertical Scaling Horizontal Scaling
Approach Bigger Server More Servers
Cost Expensive Hardware Commodity Hardware
Complexity Low Higher
Scalability Limit Limited Very High
Fault Tolerance Low High
Example Upgrade Database Server Add More Application Servers

Interview One-Liner

Vertical Scaling increases the power of a single server, whereas Horizontal Scaling increases capacity by adding more servers.


Quick Revision

Latency

Time taken for one request
Enter fullscreen mode Exit fullscreen mode

Example: Video starts in 200ms.


Throughput

Requests processed per second
Enter fullscreen mode Exit fullscreen mode

Example: 100,000 requests/sec.


Consistency

Everyone sees latest data
Enter fullscreen mode Exit fullscreen mode

Example: Banking.


Availability

System always responds
Enter fullscreen mode Exit fullscreen mode

Example: Instagram.


Performance

How efficient is the system today?
Enter fullscreen mode Exit fullscreen mode

Scalability

Can the system handle future growth?
Enter fullscreen mode Exit fullscreen mode

Vertical Scaling

Scale Up
More CPU, RAM, SSD
Enter fullscreen mode Exit fullscreen mode

Horizontal Scaling

Scale Out
Add More Servers
Enter fullscreen mode Exit fullscreen mode

Interview Summary

Latency and Throughput measure speed and capacity. CAP Theorem explains the trade-off between Consistency and Availability during network failures. Performance focuses on current efficiency, while Scalability focuses on future growth. Systems can scale vertically by upgrading hardware or horizontally by adding more servers, with horizontal scaling being the preferred approach for large-scale distributed systems.

Top comments (0)