Sushant Gaurav

Posted on May 19

Scalability in System Design - Vertical vs Horizontal Scaling


There comes a point in every system’s life where things start to break, not because the system is poorly designed, but because it is handling more load than it was ever designed for.

At first, everything feels smooth. Requests are processed quickly, users are satisfied, and the system behaves predictably. But as usage grows, subtle changes begin to appear. Pages take longer to load. APIs respond more slowly. Databases struggle to keep up. Eventually, what once worked effortlessly starts becoming unreliable.

This is not a failure of design.

It is a signal.

The system has reached the limits of its current capacity.

And this is where scalability enters the conversation.

What Does Scalability Really Mean?

Scalability is often misunderstood as simply handling more users. But that definition is incomplete.

A system is considered scalable if it can handle increasing load without a proportional drop in performance or reliability.

Notice the nuance here.

It is not just about handling more requests - it is about doing so efficiently. A system that needs far more than double the resources to handle double the load is not truly scalable; it is simply brute-forcing the problem.
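One rough way to put a number on "efficiently" is to compare how much throughput grew against how much the resources grew. The sketch below is only an illustration: scaling_efficiency is a hypothetical helper, and the server counts and request rates are invented.

```python
def scaling_efficiency(base_resources, base_throughput, new_resources, new_throughput):
    """Ratio of throughput growth to resource growth.

    1.0 means throughput grew in step with resources (linear scaling).
    Values well below 1.0 mean each added unit of resource buys less.
    """
    resource_growth = new_resources / base_resources
    throughput_growth = new_throughput / base_throughput
    return throughput_growth / resource_growth


# Hypothetical measurements: servers and requests per second.
linear = scaling_efficiency(2, 1000, 4, 2000)    # 2x servers, 2x throughput
wasteful = scaling_efficiency(2, 1000, 8, 2000)  # 4x servers, 2x throughput

print(f"linear:   {linear:.2f}")
print(f"wasteful: {wasteful:.2f}")
```

The second case is the brute-force pattern: the system keeps up with the load, but only by spending resources faster than the load is growing.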

True scalability is about growing intelligently.

And to achieve that, systems typically rely on two fundamental approaches:

Scaling up (vertical scaling)
Scaling out (horizontal scaling)

At a high level, both aim to solve the same problem: increasing capacity. But the way they approach it, and the consequences of those choices, are fundamentally different.

Vertical Scaling - Growing Taller

Vertical scaling, often referred to as scaling up, is the simpler and more intuitive approach.

Instead of changing the structure of the system, you make the existing machine more powerful.

You increase:

CPU
RAM
Disk capacity

In essence, you are upgrading the machine so it can handle more work.

From an engineering perspective, vertical scaling feels natural.

There is no need to redesign the system. The application continues to run as it always has, just on better hardware. Databases remain centralised. Communication patterns remain unchanged. There is no need to think about distribution, coordination, or synchronisation.

This simplicity is incredibly valuable - especially in the early stages of a system.

It allows teams to focus on building features rather than solving infrastructure complexity.

Why Vertical Scaling Works So Well (Initially)

In the early lifecycle of a product, vertical scaling often provides the fastest path to growth.

If your database is slowing down, you can upgrade it to a machine with more memory. If your application server is under load, you can increase its CPU capacity.

The system continues to function exactly as before, just with more headroom.

This is why many systems, including those built by companies like Instagram in their early days, start with vertically scaled architectures.

The benefits are clear:

Minimal architectural changes
Lower operational complexity
Faster implementation

For a small team trying to move quickly, this is often the most practical choice.

The Limits of Vertical Scaling

But like everything in system design, vertical scaling has limits.

The first limitation is physical.

A machine can only be upgraded to a certain extent. There is a maximum amount of CPU, memory, and storage you can add. Beyond that point, scaling up is no longer possible.

The second limitation is cost.

As machines become more powerful, their cost increases disproportionately. A machine that is twice as powerful is often significantly more than twice as expensive.

This leads to diminishing returns.
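To see what diminishing returns looks like in numbers, here is a small sketch. The price list is entirely made up for illustration and is not taken from any real provider; the point is only that the cost per unit of capacity climbs as the machine gets bigger.

```python
# Hypothetical price list: (vCPUs, monthly price in dollars).
# These numbers are invented to illustrate superlinear pricing.
machines = [(8, 200), (16, 450), (32, 1100), (64, 2800)]


def cost_per_vcpu(vcpus, price):
    """Monthly price divided by the number of vCPUs."""
    return price / vcpus


per_vcpu = [cost_per_vcpu(vcpus, price) for vcpus, price in machines]

for (vcpus, price), unit_cost in zip(machines, per_vcpu):
    print(f"{vcpus:>3} vCPUs at ${price}/month costs ${unit_cost:.2f} per vCPU")
```

With these assumed prices, every doubling of capacity more than doubles the bill, so each extra vCPU is more expensive than the last.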

The third and perhaps most critical limitation is risk.

When your entire system depends on a single machine, that machine becomes a single point of failure.

If it goes down, the entire system goes down.

No matter how powerful the machine is, it cannot protect you from hardware failures, network issues, or unexpected crashes.

This is where the need for a different approach begins to emerge.

Horizontal Scaling - Growing Wider

Horizontal scaling, or scaling out, takes a fundamentally different approach.

Instead of making a single machine more powerful, you add more machines and distribute the workload among them.

Now, instead of relying on one powerful server, the system relies on multiple smaller servers working together.

This introduces a new concept: distribution of work.

Requests are no longer handled by a single machine. They are spread across multiple nodes, often using a load balancer that decides where each request should go.

At first, this might seem like a straightforward extension of vertical scaling. But in reality, it changes the nature of the system entirely.

Because the moment you introduce multiple machines, you introduce:

Network communication
Data synchronisation
Failure handling across nodes

In other words, you are stepping into the world of distributed systems.

A Shift in Complexity

Vertical scaling keeps complexity low but limits growth.

Horizontal scaling removes those limits but introduces a new kind of complexity.

This is not just an implementation detail; it is a fundamental shift in how systems are designed and reasoned about.

In a vertically scaled system:

There is one source of truth
Communication is local
Failures are simpler to reason about

In a horizontally scaled system:

Data may exist in multiple places
Communication happens over networks
Failures become partial and unpredictable

This is the same shift we saw earlier when moving from monolithic to distributed systems.

Because in many ways, horizontal scaling is what forces systems to become distributed.

Where This Is Heading

At this point, we’ve built the intuition:

Vertical scaling is simple, powerful, and limited
Horizontal scaling is flexible, scalable, and complex

But this is only the surface. To truly understand horizontal scaling, we need to answer a deeper question:

What actually happens to data and traffic when a system scales horizontally?

Because adding more machines is easy.

Making them work together correctly and efficiently is the real challenge.

Distributing Traffic - The Role of Load Balancing

The moment you introduce multiple servers, you need a way to decide:

Which request goes to which machine?

This is the job of a load balancer.

A load balancer sits between users and your servers, acting as a traffic controller. Instead of users directly hitting a specific server, their requests are routed through the load balancer, which distributes them across available machines.

At a surface level, this seems simple: just spread requests evenly. But in practice, it involves subtle decisions:

Should requests be distributed round-robin?
Should they go to the least loaded server?
Should user sessions stick to the same server?

These choices affect both performance and correctness.
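The three questions above map to three common selection strategies. The sketch below is a simplified, in-memory illustration; the class names and the server names are hypothetical, and a real load balancer would also deal with health checks, timeouts, and concurrency.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = cycle(servers)

    def pick(self, request_key=None):
        return next(self._rotation)


class LeastConnections:
    """Pick the server with the fewest in-flight requests."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self, request_key=None):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1  # the caller calls release() when done
        return server

    def release(self, server):
        self.active[server] -= 1


class StickySessions:
    """Send the same session ID to the same server every time."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, request_key):
        # A stable hash of the session ID selects the server.
        digest = hashlib.sha256(request_key.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


servers = ["server-1", "server-2", "server-3"]  # hypothetical names

round_robin = RoundRobin(servers)
print([round_robin.pick() for _ in range(4)])

sticky = StickySessions(servers)
print(sticky.pick("session-abc") == sticky.pick("session-abc"))
```

Note that the sticky strategy keeps a session on one server only while the server list stays the same. Adding or removing a server changes the modulo result and moves most sessions, which is one reason sticky sessions are a weaker guarantee than they first appear.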

And more importantly, they introduce a critical requirement:

Each server should be able to handle requests independently.

This leads to a key design principle in scalable systems:

Statelessness.

Stateless vs Stateful Systems

In a vertically scaled system, state is easy to manage. Since everything runs on a single machine, user sessions, data, and temporary state can be stored locally.

But in a horizontally scaled system, this approach breaks down.

If a user’s request goes to Server 1, and their next request goes to Server 3, that second server must still understand the user’s context.

This is why scalable systems aim to make application servers stateless.

Instead of storing session data locally, they store it in shared systems such as:

Databases
Distributed caches
External storage systems

This allows any server to handle any request, making load balancing effective.
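Here is a minimal sketch of that idea. SharedSessionStore is a hypothetical stand-in for an external store such as a database or distributed cache (here it is just a dictionary in memory), and AppServer is an invented class that keeps no per-user state of its own.

```python
class SharedSessionStore:
    """Stand-in for an external store shared by all application servers."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = session


class AppServer:
    """Holds no session data locally; every request goes through the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {**session, "cart": cart})
        return cart


store = SharedSessionStore()
server_1 = AppServer("server-1", store)
server_3 = AppServer("server-3", store)

# The first request lands on Server 1, the next one on Server 3.
server_1.add_to_cart("session-abc", "book")
cart = server_3.add_to_cart("session-abc", "pen")
print(cart)
```

Server 3 sees the item added through Server 1 because neither server owns the session. The store does, which is exactly where the complexity has moved.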

But this shift pushes complexity elsewhere, into how data is stored and accessed.

Scaling Data - The Real Challenge

Handling more requests is only part of the problem.

The bigger challenge is handling more data.

In a vertically scaled system, data typically lives in a single database. As the load increases, you upgrade the database server. But just like application servers, databases have limits.

This is where horizontal scaling forces a fundamental shift in data strategy.

Two major approaches emerge:

Replication - Copying Data Across Nodes

Replication involves creating multiple copies of the same data across different machines.

This allows:

Multiple servers to read data simultaneously
Improved availability if one node fails

For example, one database node may handle writes, while multiple replicas handle read requests.

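This improves throughput, but introduces consistency challenges: a replica may briefly serve data that is older than what the primary holds. This is the trade-off we explored earlier in the discussion of the CAP theorem.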
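The sketch below models one primary with read replicas, under some loud assumptions: everything lives in memory, the class names are hypothetical, and replication is triggered by hand through replicate() so that the lag between a write and its copies is easy to see. Real databases replicate continuously and in far more sophisticated ways.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class ReplicatedDatabase:
    """One primary takes writes; replicas serve reads in rotation."""

    def __init__(self, replica_count):
        self.primary = Node("primary")
        self.replicas = [Node(f"replica-{i}") for i in range(replica_count)]
        self._pending = []  # writes not yet copied to the replicas
        self._next_replica = 0

    def write(self, key, value):
        self.primary.data[key] = value
        self._pending.append((key, value))

    def read(self, key):
        replica = self.replicas[self._next_replica]
        self._next_replica = (self._next_replica + 1) % len(self.replicas)
        return replica.data.get(key)

    def replicate(self):
        """Copy all pending writes to every replica (asynchronous in spirit)."""
        for key, value in self._pending:
            for replica in self.replicas:
                replica.data[key] = value
        self._pending.clear()


db = ReplicatedDatabase(replica_count=2)
db.write("user:1:name", "Asha")
stale = db.read("user:1:name")  # replicas have not caught up yet
db.replicate()
fresh = db.read("user:1:name")
print(stale, fresh)
```

The read before replicate() returns nothing even though the write already succeeded on the primary. That window is the consistency challenge in miniature.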

Sharding - Splitting Data Across Nodes

Sharding takes a different approach.

Instead of copying data, it divides it.

Each server is responsible for a subset of the data:

User A–M on one server
User N–Z on another

This allows the system to scale almost indefinitely by adding more shards.

But it introduces new complexity:

How do you decide which shard stores which data?
What happens when data needs to move between shards?
How do you handle queries that span multiple shards?

Sharding improves scalability dramatically, but at the cost of operational and architectural complexity.
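To make the A–M / N–Z example concrete, here is a minimal range-based sketch. The class names and shard names are hypothetical, the shards are plain dictionaries, and the routing assumes usernames start with an ASCII letter. It shows both the easy case (a lookup that touches one shard) and the harder one (a query that has to visit every shard).

```python
class Shard:
    def __init__(self, name):
        self.name = name
        self.users = {}


class RangeShardedUsers:
    """Routes each user to a shard by the first letter of the username."""

    def __init__(self):
        self.shards = [
            ("A", "M", Shard("shard-a-m")),
            ("N", "Z", Shard("shard-n-z")),
        ]

    def shard_for(self, username):
        first = username[0].upper()
        for low, high, shard in self.shards:
            if low <= first <= high:
                return shard
        raise ValueError(f"no shard covers {username!r}")

    def put(self, username, profile):
        self.shard_for(username).users[username] = profile

    def get(self, username):
        return self.shard_for(username).users.get(username)

    def count_where(self, predicate):
        # Cross-shard query: ask every shard, then combine the results.
        return sum(
            1
            for _, _, shard in self.shards
            for profile in shard.users.values()
            if predicate(profile)
        )


users = RangeShardedUsers()
users.put("alice", {"plan": "pro"})
users.put("bob", {"plan": "free"})
users.put("nina", {"plan": "pro"})

print(users.shard_for("alice").name, users.shard_for("nina").name)
print(users.count_where(lambda profile: profile["plan"] == "pro"))
```

Range-based routing is easy to reason about, but it can produce uneven shards if names cluster around certain letters. Hashing the key is a common alternative that spreads data more evenly at the cost of making range queries harder.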

Availability - Why Horizontal Scaling Wins

One of the most powerful advantages of horizontal scaling is fault tolerance.

In a vertically scaled system, everything depends on a single machine. If it fails, the system goes down.

In a horizontally scaled system, failure becomes partial.

If one server crashes:

Other servers continue handling requests
The system degrades, but does not collapse

This is the foundation of high-availability systems.

It is also why companies like Netflix design their systems to run across multiple machines, zones, and even regions.

They assume failure will happen and design systems that survive it.
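The "degrades, but does not collapse" behaviour can be sketched with a balancer that simply moves on when a server fails. This is a toy model with hypothetical class names: failure is a boolean flag, and a real system would rely on health checks, timeouts, and retry limits instead.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.healthy = True

    def handle(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class FailoverBalancer:
    """Round-robin over the pool, skipping servers that fail."""

    def __init__(self, servers):
        self.servers = servers
        self._next = 0

    def handle(self, request):
        # Try each server at most once per request.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            try:
                return server.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy servers available")


pool = [Server("server-1"), Server("server-2"), Server("server-3")]
balancer = FailoverBalancer(pool)

pool[1].healthy = False  # simulate a crash on one machine
responses = [balancer.handle(f"request-{i}") for i in range(4)]
for line in responses:
    print(line)
```

Every request is still answered, but the two remaining servers now carry the load that three used to share. That extra pressure on the survivors is what degradation means in practice.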

Cost, Complexity, and Trade-offs

At this point, horizontal scaling may seem like the obvious choice.

But it comes with trade-offs that cannot be ignored.

Cost Dynamics

While horizontal scaling can start with cheaper machines, the total cost can grow quickly as you add more infrastructure, networking, and operational overhead.

Engineering Complexity

You now need to handle:

Distributed communication
Data consistency
Failures across nodes
Monitoring and observability

Debugging Challenges

A single request may pass through multiple machines. Debugging issues becomes significantly harder compared to a single-node system.

This leads to a critical insight:

Horizontal scaling solves scalability problems by introducing distributed systems complexity.

The Hybrid Reality

In practice, most systems do not rely purely on vertical or horizontal scaling.

They combine both.

A common approach is:

Scale vertically first (quick wins, low complexity)
Introduce horizontal scaling as limits are reached

For example:

Start with a powerful database server
Add read replicas as traffic grows
Eventually introduce sharding when needed

This gradual evolution allows systems to grow without unnecessary complexity early on.

The Deeper Insight

At its core, scalability is not about choosing between vertical and horizontal scaling.

It is about understanding when each approach makes sense.

Vertical scaling is about simplicity and speed.
Horizontal scaling is about resilience and long-term growth.

And the transition between them is one of the most important decisions in system design.

Because once you move toward horizontal scaling, you are no longer just scaling a system:

You are designing a distributed system.

Final Thought

Scalability is often seen as a technical challenge.

But in reality, it is a reflection of success.

Systems only need to scale when they are being used, when they are growing, when they matter.

And the way you scale them defines:

Their performance
Their reliability
Their future evolution

Because in the end, scalability is not just about handling more users:

It is about building systems that can grow without breaking.
