Jaspreet Singh

Posted on Jun 14

HLD Fundamentals #2: CAP Theorem Trade-offs & Scaling

#architecture #interview #performance #systemdesign

In System Design, there is rarely a "perfect" solution.

Every architecture decision involves trade-offs between speed, consistency, availability, cost, and scalability. Understanding these trade-offs is one of the most important skills during system design interviews.

Latency vs Throughput

What is Latency?

Latency is the time taken to complete a single request.

In simple terms:

How long does one request take?

What is Throughput?

Throughput is the number of requests a system can process in a given time.

In simple terms:

How many requests can the system handle per second?

Example

Food Delivery Restaurant

Suppose a restaurant takes:

1 Order = 2 Minutes

Latency:

2 Minutes

for a single order.

If it completes:

30 Orders / Hour

then throughput is:

30 Orders per Hour
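
The arithmetic above can be checked with a short sketch. It assumes each cook handles one order at a time; the `cooks` parameter is hypothetical and is only there to show that throughput can rise while the latency of a single order stays the same.

```python
def throughput_per_hour(latency_minutes: float, cooks: int = 1) -> float:
    """Orders completed per hour when each cook handles one order at a time."""
    orders_per_cook = 60 / latency_minutes
    return orders_per_cook * cooks


# One cook, 2 minutes per order: the 30 orders/hour figure from the example.
single = throughput_per_hour(2)

# Hypothetical: three cooks working in parallel. Each order still takes
# 2 minutes (same latency), but three times as many orders finish per hour.
parallel = throughput_per_hour(2, cooks=3)

print(single, parallel)
```

In this toy model, adding cooks changes throughput but not latency, which is why the two metrics are tracked separately.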

Real World Example

Netflix

When a user clicks Play:

Video should start quickly → Low Latency
Millions of users should stream simultaneously → High Throughput

Netflix optimizes for both.

Key Points

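Latency measures response time.
Throughput measures system capacity.
Lower latency is better.
Higher throughput is better.
Improving one can hurt the other: batching requests raises throughput but adds waiting time to each request.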

Interview One-Liner

Latency is the time taken to serve a request, while throughput is the number of requests processed per unit time.

Availability vs Consistency (CAP Theorem)

What is CAP Theorem?

CAP Theorem states that a distributed system can guarantee at most two of the following three properties at the same time:

C = Consistency
A = Availability
P = Partition Tolerance

When a network partition occurs, a system must choose between Consistency and Availability.

Consistency

Every read returns the most recent write, so every user sees the latest data.

After a write operation:

User A updates balance = ₹1000

User B immediately sees ₹1000

All nodes return the same result.

Availability

Every request that reaches a working node receives a response.

Even if some servers are down:

Request → Response

The response may not contain the latest data.

Why Can't We Have Both?

Imagine two database nodes:

Node A       X       Node B

(Network failure between them)

Now a write reaches Node A.

You have two choices:

Option 1: Prioritize Consistency

Node B refuses requests until data is synchronized.

Result:

Consistent
Not Available

Option 2: Prioritize Availability

Node B continues serving requests.

Result:

Available
Possibly Stale Data
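
The two options can be simulated with a minimal sketch. Everything here is an assumption made for illustration: the `Node` class, the `"CP"`/`"AP"` mode labels, and the `write` and `demo` helpers are hypothetical, and a real database replicates data in far more involved ways.

```python
class Node:
    """A replica holding a single value."""

    def __init__(self, name: str, mode: str):
        self.name = name
        self.mode = mode          # "CP" or "AP" (labels assumed for this sketch)
        self.value = None
        self.partitioned = False  # True when this node cannot reach its peer

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Option 1: refuse the request rather than risk stale data.
            raise RuntimeError(f"{self.name} unavailable during partition")
        # Option 2 (or no partition): answer with the local copy.
        return self.value


def write(primary: Node, replica: Node, value) -> None:
    """Write to the primary; replicate only if the replica is reachable."""
    primary.value = value
    if not replica.partitioned:
        replica.value = value


def demo(mode: str):
    a, b = Node("A", mode), Node("B", mode)
    write(a, b, 500)       # replicated normally
    b.partitioned = True   # network failure between A and B
    write(a, b, 1000)      # this write reaches only Node A
    try:
        return b.read()
    except RuntimeError as err:
        return str(err)


print(demo("CP"))  # Node B refuses to answer
print(demo("AP"))  # Node B answers with the old value, 500
```

In the consistency-first run Node B returns an error instead of data; in the availability-first run it responds, but with the value from before the partition.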

Real World Examples

Banking Systems

Prefer Consistency.

Account Balance
Payments
Transactions

Wrong data is unacceptable.

Social Media

Prefer Availability.

Instagram Likes
Comments
Followers

Seeing slightly old data is acceptable.

Key Points

Partition Tolerance is mandatory in distributed systems.
So during a partition, the practical choice is between Consistency and Availability.
Different applications require different choices.

Interview One-Liner

CAP Theorem states that during a network partition, a distributed system must choose between Consistency and Availability.

Performance vs Scalability

What is Performance?

Performance measures how efficiently a system handles its current workload.

Questions like:

How fast?
How much memory?
How much CPU?

are performance-related.

What is Scalability?

Scalability measures how well a system can handle increased workload in the future, typically by adding resources.

Questions like:

Can it handle 10x traffic?
Can it handle 100x users?

are scalability-related.

Example

Imagine a website serving:

1,000 Users

It responds very quickly today.

But if it crashes at:

100,000 Users

then it has:

Good Performance
Poor Scalability
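
One way to see how a system can be fast at 1,000 users and still fail at 100,000 is a toy queueing model. The sketch below assumes a single server behaving like an M/M/1 queue, where mean response time is 1 / (service rate - arrival rate). The capacity of 1,200 requests/second and the "one request per user per second" load are made-up figures, not measurements.

```python
import math


def mean_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system for an M/M/1 queue, in seconds.

    Returns infinity when arrivals meet or exceed capacity, because the
    queue then grows without bound.
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1 / (service_rate - arrival_rate)


CAPACITY = 1_200  # requests/second one server can process (assumed figure)

for users in (1_000, 100_000):
    load = users  # assume each user sends one request per second
    print(users, "users ->", mean_response_time(load, CAPACITY), "seconds")
```

Under these assumptions the server answers in about 5 ms at 1,000 users, but at 100,000 users the load exceeds capacity and the backlog never clears.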

Real World Example

Startup Application

Initially:

10,000 users

A single server works perfectly.

After growth:

10 Million users

A scalable architecture is now needed.

Key Points

Performance = Present efficiency.
Scalability = Future growth capability.
Fast systems are not always scalable.
Scalable systems are designed for growth.

Interview One-Liner

Performance measures efficiency today, while scalability measures the ability to handle future growth.

Vertical Scaling

What is Vertical Scaling?

Vertical Scaling means increasing the resources of an existing server.

Also called:

Scale Up

How Does It Work?

Before

CPU: 4 Cores
RAM: 8 GB

↓

After

CPU: 32 Cores
RAM: 128 GB

Same server, bigger machine.

Real World Example

A MySQL database starts running slowly.

Instead of adding new servers:

More CPU
More RAM
Faster SSD

are added to the existing server.

Advantages

Simple implementation
No major architecture changes
Easy maintenance

Disadvantages

Hardware limit exists
Expensive, especially for high-end hardware
Single point of failure remains

Interview One-Liner

Vertical Scaling increases the capacity of a single machine by adding more hardware resources.

Horizontal Scaling

What is Horizontal Scaling?

Horizontal Scaling means adding more servers to distribute load.

Also called:

Scale Out

How Does It Work?

Instead of:

1 Server

we add:

Server 1
Server 2
Server 3
Server 4

A Load Balancer distributes traffic among them.

Users
   |
   V
Load Balancer
   |
   +---- Server 1
   |
   +---- Server 2
   |
   +---- Server 3
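
The diagram above can be sketched in code. This is a minimal illustration assuming a round-robin policy, which is only one of several strategies a load balancer might use; the `RoundRobinBalancer` class and its `route` method are hypothetical names.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in turn."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly: 1, 2, 3, 1, 2, 3, ...
        self._servers = cycle(servers)

    def route(self, request: str) -> str:
        server = next(self._servers)
        return f"{server} handled {request}"


balancer = RoundRobinBalancer(["Server 1", "Server 2", "Server 3"])
for i in range(4):
    print(balancer.route(f"request-{i}"))
```

The fourth request wraps around to Server 1, so the load is spread evenly as long as requests cost roughly the same.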

Real World Example

Netflix

Netflix serves millions of users worldwide.

No single server could handle that traffic.

Therefore, thousands of servers work together behind load balancers.

Advantages

Nearly unlimited scaling
Better fault tolerance
High availability

Disadvantages

More complex architecture
Data synchronization challenges
Load balancing required

Vertical vs Horizontal Scaling

Feature	Vertical Scaling	Horizontal Scaling
Approach	Bigger Server	More Servers
Cost	Expensive Hardware	Commodity Hardware
Complexity	Low	Higher
Scalability Limit	Limited	Very High
Fault Tolerance	Low	High
Example	Upgrade Database Server	Add More Application Servers
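
The "Scalability Limit" row can be made concrete with a rough capacity sketch. All numbers below are assumed for illustration, and the model ignores real-world overheads such as coordination between servers.

```python
import math


def servers_needed(total_load: int, per_server_capacity: int) -> int:
    """Horizontal scaling: identical servers required to carry the load."""
    return math.ceil(total_load / per_server_capacity)


def can_scale_up(total_load: int, largest_machine_capacity: int) -> bool:
    """Vertical scaling: works only while one machine can carry the load."""
    return total_load <= largest_machine_capacity


LOAD = 50_000            # requests/second to serve (assumed)
SMALL_SERVER = 2_000     # capacity of one commodity server (assumed)
BIGGEST_SERVER = 16_000  # capacity of the largest machine available (assumed)

print("Fits on one big machine:", can_scale_up(LOAD, BIGGEST_SERVER))
print("Commodity servers needed:", servers_needed(LOAD, SMALL_SERVER))
```

With these figures the load is beyond the largest single machine, while 25 commodity servers behind a load balancer would cover it.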

Interview One-Liner

Vertical Scaling increases the power of a single server, whereas Horizontal Scaling increases capacity by adding more servers.

Quick Revision

Latency

Time taken for one request

Example: Video starts in 200 ms.

Throughput

Requests processed per second

Example: 100,000 requests/sec.

Consistency

Everyone sees the latest data

Example: Banking.

Availability

System always responds

Example: Instagram.

Performance

How efficient is the system today?

Scalability

Can the system handle future growth?

Vertical Scaling

Scale Up
More CPU, RAM, SSD

Horizontal Scaling

Scale Out
Add More Servers

Interview Summary

Latency and Throughput measure speed and capacity. CAP Theorem explains the trade-off between Consistency and Availability during network failures. Performance focuses on current efficiency, while Scalability focuses on future growth. Systems can scale vertically by upgrading hardware or horizontally by adding more servers, with horizontal scaling being the preferred approach for large-scale distributed systems.