In System Design, there is rarely a "perfect" solution.
Every architecture decision involves trade-offs between speed, consistency, availability, cost, and scalability. Understanding these trade-offs is one of the most important skills during system design interviews.
Latency vs Throughput
What is Latency?
Latency is the time taken to complete a single request.
In simple terms:
How long does one request take?
What is Throughput?
Throughput is the number of requests a system can process in a given time.
In simple terms:
How many requests can the system handle per second?
Example
Food Delivery Restaurant
Suppose a restaurant takes:
1 Order = 2 Minutes
Latency:
2 Minutes
for a single order.
If it completes:
30 Orders / Hour
then throughput is:
30 Orders per Hour
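The restaurant numbers can be checked with a little arithmetic. The sketch below is only an illustration: the function name `throughput_per_hour` and the `cooks` parameter are made up for this example, and it assumes each cook handles one order at a time with an order always waiting.

```python
def throughput_per_hour(latency_minutes: float, cooks: int = 1) -> float:
    """Orders completed per hour when each cook handles one order at a time.

    Assumes there is always an order waiting, so no cook sits idle.
    """
    orders_per_cook = 60 / latency_minutes
    return orders_per_cook * cooks


# One cook, 2 minutes per order: matches the 30 orders/hour above.
print(throughput_per_hour(2))           # 30.0
# Three cooks working in parallel (hypothetical): each order still takes
# 2 minutes, but throughput triples.
print(throughput_per_hour(2, cooks=3))  # 90.0
```

Adding cooks raises throughput without changing latency, which is why the two numbers have to be tracked separately.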
Real World Example
Netflix
When a user clicks Play:
- Video should start quickly → Low Latency
- Millions of users should stream simultaneously → High Throughput
Netflix optimizes for both.
Key Points
- Latency measures response time.
- Throughput measures system capacity.
- Lower latency is better.
- Higher throughput is better.
Interview One-Liner
Latency is the time taken to serve a request, while throughput is the number of requests processed per unit time.
Availability vs Consistency (CAP Theorem)
What is CAP Theorem?
CAP Theorem states that a distributed system can guarantee at most two of the following three properties at the same time:
C = Consistency
A = Availability
P = Partition Tolerance
When a network partition occurs, a system must choose between Consistency and Availability.
Consistency
Every read returns the most recently written data, so every user sees the latest value.
After a write operation:
User A updates balance = ₹1000
User B immediately sees ₹1000
All nodes return the same result.
Availability
Every request receives a response.
Even if some servers are down:
Request → Response
The response may not contain the latest data.
Why Can't We Have Both?
Imagine two database nodes:
Node A ---X--- Node B
(Network failure between them)
Now a write reaches Node A.
You have two choices:
Option 1: Prioritize Consistency
Node B refuses requests until data is synchronized.
Result:
Consistent
Not Available
Option 2: Prioritize Availability
Node B continues serving requests.
Result:
Available
Possibly Stale Data
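The two options can be sketched with a toy simulation. This is not how any real database is implemented: the `Node` class, the `simulate` helper, and the "CP"/"AP" mode labels are all hypothetical names chosen for this illustration.

```python
class Node:
    """A toy replica holding one value; not a real database."""

    def __init__(self, name: str, mode: str):
        self.name = name
        self.mode = mode          # "CP" or "AP" (hypothetical labels)
        self.value = 0
        self.partitioned = False  # True when cut off from its peer

    def read(self) -> int:
        if self.partitioned and self.mode == "CP":
            # Consistency first: refuse rather than risk stale data.
            raise RuntimeError(f"{self.name} unavailable during partition")
        # Availability first (or no partition): answer with the local copy.
        return self.value


def simulate(mode: str) -> str:
    node_a, node_b = Node("Node A", mode), Node("Node B", mode)
    node_a.partitioned = node_b.partitioned = True  # the link fails
    node_a.value = 1000  # the write reaches Node A only
    try:
        return f"Node B answered {node_b.read()} (stale)"
    except RuntimeError as err:
        return f"error: {err}"


print(simulate("CP"))  # error: Node B unavailable during partition
print(simulate("AP"))  # Node B answered 0 (stale)
```

In "CP" mode Node B gives up availability to avoid returning the old value; in "AP" mode it stays available and returns data that Node A has already overwritten.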
Real World Examples
Banking Systems
Prefer Consistency.
Account Balance
Payments
Transactions
Wrong data is unacceptable.
Social Media
Prefer Availability.
Instagram Likes
Comments
Followers
Seeing slightly old data is acceptable.
Key Points
- Network partitions cannot be ruled out in a distributed system, so Partition Tolerance is effectively mandatory.
- The practical choice is therefore between Consistency and Availability while a partition lasts.
- Different applications require different choices.
Interview One-Liner
CAP Theorem states that during a network partition, a distributed system must choose between Consistency and Availability.
Performance vs Scalability
What is Performance?
Performance measures how efficiently a system handles its current workload.
Questions like:
How fast?
How much memory?
How much CPU?
are performance-related.
What is Scalability?
Scalability measures how well a system keeps performing as its workload grows.
Questions like:
Can it handle 10x traffic?
Can it handle 100x users?
are scalability-related.
Example
Imagine a website serving:
1,000 Users
Very fast today.
But if it crashes at:
100,000 Users
then it has:
Good Performance
Poor Scalability
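One way to see how a system can be fast at 1,000 users and fall over at 100,000 is a simple queueing estimate. The sketch below uses the textbook single-server (M/M/1) formula, mean response time = 1 / (service rate - arrival rate). The per-user request rate and the server capacity are invented numbers for illustration, not measurements from any real system.

```python
import math


def mean_response_time(users: int,
                       requests_per_user: float = 0.1,
                       capacity_rps: float = 500.0) -> float:
    """Mean response time in seconds under the M/M/1 model: 1 / (mu - lambda).

    Returns math.inf when arrivals meet or exceed capacity, i.e. the queue
    grows without bound and the server effectively falls over.
    """
    arrival_rps = users * requests_per_user
    if arrival_rps >= capacity_rps:
        return math.inf
    return 1.0 / (capacity_rps - arrival_rps)


# Response time stays small at low load, climbs sharply near capacity,
# and becomes unbounded once the load exceeds what one server can do.
for users in (1_000, 4_900, 100_000):
    print(users, mean_response_time(users))
```

With these assumed numbers, 1,000 users see a response time of about 2.5 ms, 4,900 users about 100 ms, and 100,000 users overload the server entirely.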
Real World Example
Startup Application
Initially:
10,000 users
Single server works perfectly.
After growth:
10 Million users
Need scalable architecture.
Key Points
- Performance = Present efficiency.
- Scalability = Future growth capability.
- Fast systems are not always scalable.
- Scalable systems are designed for growth.
Interview One-Liner
Performance measures efficiency today, while scalability measures the ability to handle future growth.
Vertical Scaling
What is Vertical Scaling?
Vertical Scaling means increasing the resources of an existing server.
Also called:
Scale Up
How Does It Work?
Before
CPU: 4 Cores
RAM: 8 GB
↓
After
CPU: 32 Cores
RAM: 128 GB
Same server, bigger machine.
Real World Example
A MySQL database starts running slowly.
Instead of adding new servers:
More CPU
More RAM
Faster SSD
are added.
Advantages
- Simple implementation
- No major architecture changes
- Easy maintenance
Disadvantages
- Hard upper limit on how large a single machine can get
- Expensive: high-end hardware costs rise steeply
- Single point of failure remains
Interview One-Liner
Vertical Scaling increases the capacity of a single machine by adding more hardware resources.
Horizontal Scaling
What is Horizontal Scaling?
Horizontal Scaling means adding more servers to distribute load.
Also called:
Scale Out
How Does It Work?
Instead of:
1 Server
we add:
Server 1
Server 2
Server 3
Server 4
A Load Balancer distributes traffic among them.
    Users
      |
      V
Load Balancer
      |
      +---- Server 1
      |
      +---- Server 2
      |
      +---- Server 3
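The diagram above can be sketched in a few lines of code. Round robin is used here only because it is one common and simple distribution strategy; the original does not say which strategy the load balancer uses, and the class name `RoundRobinBalancer` is hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers: list[str]):
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self, request_id: int) -> str:
        # request_id is unused by round robin; it is kept so the signature
        # resembles strategies that hash on the request.
        return next(self._rotation)


balancer = RoundRobinBalancer(["Server 1", "Server 2", "Server 3"])
assignments = [balancer.route(i) for i in range(6)]
print(assignments)
# ['Server 1', 'Server 2', 'Server 3', 'Server 1', 'Server 2', 'Server 3']
```

Real load balancers typically also track server health and current load, which this sketch leaves out.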
Real World Example
Netflix
Netflix serves millions of users worldwide.
No single server could handle that traffic.
Therefore thousands of servers work together behind load balancers.
Advantages
- Nearly unlimited scaling
- Better fault tolerance
- High availability
Disadvantages
- More complex architecture
- Data synchronization challenges
- Load balancing required
Vertical vs Horizontal Scaling
| Feature | Vertical Scaling | Horizontal Scaling |
|---|---|---|
| Approach | Bigger Server | More Servers |
| Cost | Expensive Hardware | Commodity Hardware |
| Complexity | Low | Higher |
| Scalability Limit | Limited | Very High |
| Fault Tolerance | Low | High |
| Example | Upgrade Database Server | Add More Application Servers |
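The "Scalability Limit" row can be made concrete with a rough capacity estimate. Every number below (peak load, per-server capacity, largest available machine) is a hypothetical value picked for illustration, and the function names are not from any library.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float) -> int:
    """Horizontal scaling: how many identical servers cover the peak load."""
    return math.ceil(peak_rps / per_server_rps)


def fits_on_one_machine(peak_rps: float, largest_machine_rps: float) -> bool:
    """Vertical scaling: only works while the biggest machine is big enough."""
    return peak_rps <= largest_machine_rps


peak = 100_000  # requests/sec, an illustrative peak load
print(fits_on_one_machine(peak, largest_machine_rps=40_000))  # False
print(servers_needed(peak, per_server_rps=5_000))             # 20
```

Once the load outgrows the largest machine you can buy, vertical scaling has no answer, while the horizontal estimate simply returns a larger server count.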
Interview One-Liner
Vertical Scaling increases the power of a single server, whereas Horizontal Scaling increases capacity by adding more servers.
Quick Revision
Latency
Time taken for one request
Example: Video starts in 200 ms.
Throughput
Requests processed per second
Example: 100,000 requests/sec.
Consistency
Everyone sees latest data
Example: Banking.
Availability
System always responds
Example: Instagram.
Performance
How efficient is the system today?
Scalability
Can the system handle future growth?
Vertical Scaling
Scale Up
More CPU, RAM, SSD
Horizontal Scaling
Scale Out
Add More Servers
Interview Summary
Latency measures how fast a single request is served, and Throughput measures how many requests the system can handle per unit time. CAP Theorem explains the trade-off between Consistency and Availability during network partitions. Performance focuses on current efficiency, while Scalability focuses on future growth. Systems can scale vertically by upgrading hardware or horizontally by adding more servers, with horizontal scaling being the preferred approach for large-scale distributed systems.