Scaling up vs scaling out (vertical vs horizontal scalability)

#architecture #computerscience

Scalability is the ability to adjust a system to the desired capacity, which usually means handling a growing workload cost-efficiently.

When we talk about high-scale products, these workloads are usually users, stored data, transactions or number of requests, and the goal is to grow all of them without affecting the user experience.

Scalability can be evaluated along several dimensions; three of them are explained here:

Measurements of scalability

More volume of data
As your business grows you'll have to handle more and more data: user accounts, location data, product information, images, videos, logs and so on. Handling high volumes of data becomes a challenge because you need to search, sort, read from disk and write updates back efficiently.

Nowadays, with the advent of big data, storing petabytes of information is a common requirement. More stored data is often welcome, since it can improve analysis accuracy and widen the data sample.

Higher concurrency
Concurrency means more connections, threads, messages and data flows being processed in parallel, so that several user sessions are handled at the same time.

In web applications, concurrency means how many users can interact with the system at the same time without affecting their experience.

Achieving high levels of concurrency is difficult because in a web-based application all requests are processed by the same servers, so users share memory, CPU, network and disk.

Higher interaction rate
This dimension is related to concurrency but is a little different: it measures how often the user exchanges data with your servers. The rate of interactions can be higher or lower depending on the type of application you have.

For example, an online multiplayer game has a high interaction rate because it has to exchange messages multiple times per second between multiple nodes in the application.

Usually, the greatest challenge is to keep latency low as the rate of interaction between the application and users around the world increases.
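
To see how concurrency and interaction rate combine, here is a minimal sketch that estimates the request rate a backend has to absorb. The function name and the sample figures (10,000 concurrent users, 6 versus 1,200 interactions per user per minute) are assumptions made up for illustration, not measurements from a real product.

```python
def requests_per_second(concurrent_users: int,
                        interactions_per_user_per_minute: int) -> float:
    """Rough load estimate: every concurrent user sends this many
    interactions per minute, spread evenly over the minute."""
    return concurrent_users * interactions_per_user_per_minute / 60


# Hypothetical figures: same number of concurrent users, very different rates.
chat_app = requests_per_second(10_000, 6)      # one message every 10 seconds
game = requests_per_second(10_000, 1_200)      # 20 messages per second

print(f"chat app:         {chat_app:,.0f} requests/s")
print(f"multiplayer game: {game:,.0f} requests/s")
```

With the same concurrency, the game in this sketch generates 200 times more traffic than the chat application, which is why the two dimensions are worth tracking separately.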

Vertical scale (scale-up)

Scaling up is relatively simple: it is just about adding more resources to the server's hardware, like CPU and memory, or improving disk performance by switching to a faster disk.

This strategy is quick and usually doesn't require any architectural change, especially in cloud computing, where it is possible to increase the capacity of a virtual machine with a few clicks.

However, you can soon reach the limit of the hardware that fits in a single server: you cannot increase the amount of RAM or the number of CPUs indefinitely.

Powerful servers are expensive. For example, an m4.2xlarge AWS EC2 server has 8 CPUs and 32 GB of RAM and costs $302 monthly. If you need to add 16 GB of RAM, you will actually have to double the amount of memory, since the next level of this kind of server on AWS is the m4.4xlarge, with 16 CPUs and 64 GB of RAM, which costs exactly double: $604 / month.
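
The arithmetic behind that example can be sketched as follows. The instance specs and prices are the ones quoted in this post; the `InstanceType` class, the `upgrade_summary` helper and the 48 GB requirement (32 GB plus the extra 16 GB) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpus: int
    ram_gb: int
    monthly_usd: int


# Figures quoted in the text above, not fetched from a price list.
M4_2XLARGE = InstanceType("m4.2xlarge", cpus=8, ram_gb=32, monthly_usd=302)
M4_4XLARGE = InstanceType("m4.4xlarge", cpus=16, ram_gb=64, monthly_usd=604)


def upgrade_summary(current: InstanceType, target: InstanceType,
                    ram_needed_gb: int) -> dict:
    """Compare what you need with what the next instance size forces you to buy."""
    extra_needed = ram_needed_gb - current.ram_gb
    extra_cost = target.monthly_usd - current.monthly_usd
    return {
        "extra_ram_needed_gb": extra_needed,
        "extra_ram_bought_gb": target.ram_gb - current.ram_gb,
        "unused_ram_gb": target.ram_gb - ram_needed_gb,
        "extra_cost_usd": extra_cost,
        "cost_per_needed_gb_usd": extra_cost / extra_needed,
    }


summary = upgrade_summary(M4_2XLARGE, M4_4XLARGE, ram_needed_gb=48)
for key, value in summary.items():
    print(f"{key}: {value}")
```

In this sketch you pay for 32 extra GB to get the 16 GB you needed, so half of the added memory sits idle while the bill doubles.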

Moreover, centralizing processing on a single server is risky because it is not a fault-tolerant strategy: the whole system becomes unavailable if the server crashes.
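
To put a rough number on the fault-tolerance point, the sketch below compares one server with a few redundant ones. It rests on simplifying assumptions that are mine, not measured data: each server is up 99% of the time, servers fail independently, and any single surviving server can carry the whole load.

```python
def availability(per_server: float, servers: int) -> float:
    """Probability that at least one of `servers` independent servers is up."""
    if not 0.0 <= per_server <= 1.0:
        raise ValueError("per_server must be between 0 and 1")
    if servers < 1:
        raise ValueError("servers must be at least 1")
    # The system is down only when every server is down at the same time.
    return 1.0 - (1.0 - per_server) ** servers


HOURS_PER_YEAR = 24 * 365

for count in (1, 2, 3):
    up = availability(0.99, count)  # 0.99 is a hypothetical figure
    downtime_hours = (1.0 - up) * HOURS_PER_YEAR
    print(f"{count} server(s): {up:.6f} available, "
          f"about {downtime_hours:.2f} hours down per year")
```

Real failures are rarely fully independent, so treat the output as an upper bound on what redundancy buys rather than a prediction.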

Horizontal scale (scale-out)

There are several techniques for horizontal scaling, which, put simply, is accomplished by adding more simple servers rather than buying a single powerful machine.

Horizontal scalability is the pot of gold for many global companies, e.g. Netflix, Uber and Amazon, which need to serve a large number of customers around the world with the same user experience.

Buying ever more powerful hardware is not the most cost-effective way to serve an always-growing user base. The alternative, however, requires investing more in software engineering to support high volumes of data, high concurrency levels and a higher interaction rate.

In a high-scale product scenario, investing in software engineering often pays off at a later stage, as it is cheaper to add capacity by distributing the system across several simple server nodes, given a sophisticated software architecture.

When you need to handle more concurrent users on a web application by scaling out, you should add clones of the same service node side by side and distribute requests with a load balancer. This architecture looks like the image below:
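
The same layout can also be sketched in a few lines of code. This is a toy, in-process model that assumes a round-robin policy, which is only one of several ways a load balancer can distribute requests; `make_node` and `RoundRobinBalancer` are hypothetical names, and a real deployment would use a dedicated load balancer in front of separate machines.

```python
from itertools import cycle
from typing import Callable, Iterator, List

Handler = Callable[[str], str]


def make_node(name: str) -> Handler:
    """Create one clone of the service; every clone runs the same logic."""
    def handle(request: str) -> str:
        return f"{name} handled {request}"
    return handle


class RoundRobinBalancer:
    """Hands each incoming request to the next node in a fixed rotation."""

    def __init__(self, nodes: List[Handler]) -> None:
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes: Iterator[Handler] = cycle(nodes)

    def dispatch(self, request: str) -> str:
        node = next(self._nodes)
        return node(request)


# Three identical nodes side by side behind one balancer.
balancer = RoundRobinBalancer([make_node(f"node-{i}") for i in range(1, 4)])
responses = [balancer.dispatch(f"request-{n}") for n in range(1, 6)]
for response in responses:
    print(response)
```

Because the nodes are interchangeable, adding capacity in this sketch only means passing a longer list of nodes to the balancer. That holds as long as the nodes keep no per-user state of their own.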

However, investing in software engineering is usually complex and expensive at the beginning, as you have to hire a larger team and more experienced professionals to achieve the architectural design needed to scale out.

The ability to scale applications horizontally is a great achievement that is only possible with software architecture techniques and technologies designed for this purpose.

In other posts, I'll cover some design patterns, principles, techniques and technologies usually applied to scale out web applications.