The Basic Application
A basic web application is usually just an application running on a single machine:
the client sends a request, the server does some processing, and it responds to that client, as shown in fig(1.0).
The Problem
The problem with this architecture is that the server can handle only a limited number of requests. If the number of requests exceeds the server's capacity, the server will crash,
or clients will wait a long time to get a response. The server is also a single point of failure: if it crashes, the whole application goes down.
This is why we need to scale our application.
Scalability Types
Vertical Scaling:
Vertical scaling means adding more power (CPU, RAM) to an existing machine so your application can handle more requests.
But this approach has hard limits: at some point you can't add more power to the machine, it's expensive, and upgrading usually means downtime.
It also doesn't give you high availability, because you still have a single point of failure.
So vertical scaling is not a good way to scale demanding systems. But if your application handles a limited number of requests and you don't care about high availability, like a personal blog, then vertical scaling is good enough for you.
Horizontal Scaling:
Horizontal scaling means adding more machines to your pool of resources, so instead of a single machine handling all the requests, multiple machines share the load.
This approach scales well because you can add as many machines as you need, and it gives you high availability: if one machine crashes, the other machines keep handling requests.
So horizontal scaling is the right choice when your application handles a large number of requests and you care about high availability, like an e-commerce website.
The catch is that horizontal scaling is not easy to implement: your application has to become distributed, the machines have to stay in sync, and the requests have to be spread evenly between them, all of which adds complexity to your application.

Load Balancer
As we've seen, horizontal scaling is all about adding more machines to your pool of resources. But how do we distribute the requests between these machines? We need some mechanism to do that, and this mechanism is called a Load Balancer.
A load balancer is a server that distributes requests across the machines. Load balancers usually have a public IP address: the client sends its request to the load balancer, and the load balancer forwards it to one of the machines, which have private IP addresses.
Load Balancing Algorithms
Load balancers use different algorithms to distribute the requests between the machines. The most common algorithms are listed below (two of them are sketched in code right after the list):
- Round Robin: the load balancer distributes the requests equally across the machines, in turn.
- Least Connections: the load balancer sends each request to the machine with the fewest active connections.
- Source: the load balancer picks a machine based on the source IP address of the client, so requests from the same client always land on the same machine. This algorithm is often used to distribute requests between machines in different geographical locations.
> There are many other algorithms for load balancing; you can check this link for more information.
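To make the selection logic concrete, here is a minimal TypeScript sketch of the Least Connections and Source strategies. It's an illustration under assumed types, not how any particular load balancer implements them; the Backend type and the addresses are hypothetical.

type Backend = { address: string; activeConnections: number };

const pool: Backend[] = [
  { address: '10.0.0.1:8080', activeConnections: 0 }, // hypothetical private addresses
  { address: '10.0.0.2:8080', activeConnections: 0 },
];

// Least Connections: pick the backend currently serving the fewest requests.
function leastConnections(backends: Backend[]): Backend {
  return backends.reduce((min, b) => (b.activeConnections < min.activeConnections ? b : min));
}

// Source: hash the client IP so the same client always lands on the same backend.
function bySourceIp(backends: Backend[], clientIp: string): Backend {
  const hash = [...clientIp].reduce((h, ch) => (h * 31 + ch.charCodeAt(0)) >>> 0, 0);
  return backends[hash % backends.length];
}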
Use a Well Known Load Balancer
You can build your own load balancer; it's essentially just a reverse proxy server that distributes the requests between the machines. In practice, though, it's better to use a well-known load balancer like Nginx or HAProxy.
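Just to illustrate how little a basic load balancer really is, here is a minimal round-robin reverse proxy in plain Node.js. It's a toy sketch (no health checks, no retries), and the backend addresses are hypothetical:

import http from 'node:http';

const backends = ['127.0.0.1:8081', '127.0.0.1:8082']; // hypothetical backends
let next = 0;

http.createServer((clientReq, clientRes) => {
  // Round robin: advance through the backend list one request at a time.
  const [host, port] = backends[next].split(':');
  next = (next + 1) % backends.length;

  const proxyReq = http.request(
    { host, port: Number(port), path: clientReq.url, method: clientReq.method, headers: clientReq.headers },
    (proxyRes) => {
      clientRes.writeHead(proxyRes.statusCode ?? 502, proxyRes.headers);
      proxyRes.pipe(clientRes); // stream the backend's response to the client
    },
  );
  proxyReq.on('error', () => {
    clientRes.writeHead(502);
    clientRes.end('Bad gateway');
  });
  clientReq.pipe(proxyReq); // forward the request body to the chosen backend
}).listen(8080);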
Here is an example of how to configure Nginx as a load balancer:
daemon off;
error_log /dev/stderr info;

events {
  worker_connections 2048;
}

http {
  access_log /dev/stdout; ## this is where the logs will be written

  upstream my-load-balanced-app { ## here is the list of machines that will handle the requests
    server 129.48.33.130:8081;
    server 127.0.0.1:8082;
  }

  server {
    listen 8080; ## this is the port that the load balancer will listen on

    location / {
      proxy_pass http://my-load-balanced-app; ## here is the name of the upstream
    }
  }
}
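Assuming the two upstream servers are actually running on the addresses listed in the upstream block, you can start Nginx with this file (for example, nginx -c "$(pwd)/nginx.conf", since -c expects an absolute path) and point requests at http://localhost:8080. Repeated curl http://localhost:8080 calls should alternate between the two servers, because Nginx's default upstream algorithm is round robin.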
For Node.js Developers
If you are a Node.js developer, you know that Node.js is single threaded, which means it uses only one CPU core. That's a problem because you can't use the full power of your machine.
So first you scale horizontally on a single machine to use all of its CPU cores, and then you can scale horizontally across multiple machines.
Node.js provides a built-in module called cluster that makes it easy to scale horizontally on a single machine.
Cluster Module
The cluster module allows you to create a cluster of processes that all share the same server ports; the primary (master) process distributes the incoming requests between the worker processes.
Here is an example of how to use the cluster module:
import { NestFactory } from '@nestjs/core';
import cluster from 'cluster';
import { cpus } from 'os';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  await app.listen(process.env.APP_PORT || 3000);
}

if (cluster.isPrimary) { // isPrimary replaced the deprecated isMaster in Node.js v16+
  cluster.schedulingPolicy = cluster.SCHED_RR; // (1)
  console.log(`Primary ${process.pid} is running`);

  const cpuCount = cpus().length; // (2)
  for (let i = 0; i < cpuCount; i++) {
    cluster.fork(); // (3)
  }
} else { // (4)
  console.log(`Worker ${process.pid} started`);
  bootstrap();
}
The code above is a simple Nest.js server that uses the cluster module: it creates a cluster of processes that all share the same server ports, and the primary process distributes the requests between the worker processes.
Explanation
If we are in the primary (master) process, we set the scheduling policy [the load-balancing algorithm] to SCHED_RR => Round Robin and fork one worker per CPU core; if we are in a worker process, we start the server.

- cluster.schedulingPolicy = cluster.SCHED_RR;: sets the scheduling policy [load-balancing algorithm] for the cluster. cluster.SCHED_RR is the round-robin algorithm. cluster.SCHED_NONE is another policy that lets the OS distribute the requests between the worker processes, but it's not recommended: the OS usually wants to minimize context switches, so it will not distribute the requests equally between the workers. cluster.SCHED_RR is the default policy on all operating systems except Windows. [As of Node.js v21.4.0.]
- const cpuCount = cpus().length;: gets the number of CPU cores in the machine; we use this number to create the worker processes.
- cluster.fork();: creates a worker process.
- else { ... }: this block runs in the worker processes, so each worker starts the server.
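One thing the example doesn't handle is worker crashes: when a worker dies, the pool silently shrinks. A common addition (not part of the original example) is to respawn dead workers from the primary process; this listener would go inside the isPrimary branch:

// Respawn a worker when one dies, so the pool stays at full size.
cluster.on('exit', (worker, code, signal) => {
  console.log(`Worker ${worker.process.pid} died (${signal ?? code}); forking a replacement`);
  cluster.fork();
});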
On The Cloud [AWS]
We are talking about scalability, so we also need to talk about how you control the number of machines you have and how you add more machines to your pool of resources.
To be honest, I don't have a lot of experience with cloud services, as I'm not a DevOps engineer, but I will try to explain the idea.
We want to watch the performance of our machines: if performance degrades, we add more machines to the pool, and if the machines are underutilized, we remove some to save cost. I will talk about AWS.
- The compute service in AWS is called EC2, which stands for Elastic Compute Cloud; an EC2 instance is a virtual machine.
- The service that watches the performance of your machines is called CloudWatch, AWS's monitoring service.
- AWS provides a service called Auto Scaling Group (ASG) that watches the performance of your machines: when performance degrades it adds machines to your pool, up to the maximum number you specified, and when the machines are underutilized it removes machines, down to the minimum number you specified.
- AWS also has a load balancer service called ELB, which stands for Elastic Load Balancer; you can use it instead of Nginx or HAProxy together with your Auto Scaling Group. ASG uses CloudWatch to watch the performance of your machines, and it uses EC2 to add or remove machines from your pool of resources. All you need to do is create an ASG and specify the minimum and maximum number of machines you want, and ASG will do the rest for you, as in the sketch below.
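To make that concrete, here is a minimal sketch of the ASG + ELB setup using the AWS CDK in TypeScript. Everything here is an illustrative assumption (the stack and construct names, the t3.micro instances, the 2-10 capacity range, the 50% CPU target), not a definitive setup:

import { App, Stack } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';

const app = new App();
const stack = new Stack(app, 'ScalableAppStack'); // hypothetical stack name

const vpc = new ec2.Vpc(stack, 'AppVpc', { maxAzs: 2 });

// The ASG: EC2 instances kept between a minimum and maximum count.
const asg = new autoscaling.AutoScalingGroup(stack, 'AppAsg', {
  vpc,
  instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MICRO),
  machineImage: ec2.MachineImage.latestAmazonLinux2(),
  minCapacity: 2,
  maxCapacity: 10,
});

// Scale on a CloudWatch metric: add instances when average CPU goes above 50%,
// remove them when it falls back below.
asg.scaleOnCpuUtilization('KeepCpuAround50', { targetUtilizationPercent: 50 });

// The ELB: a public Application Load Balancer in front of the ASG.
const lb = new elbv2.ApplicationLoadBalancer(stack, 'AppLb', { vpc, internetFacing: true });
const listener = lb.addListener('HttpListener', { port: 80 });
listener.addTargets('AppTargets', { port: 8080, targets: [asg] }); // app listens on 8080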
