So far we have mainly discussed a set of “good practices” for building a reliable, scalable and performant API. But those practices may prove insufficient when it comes to APIs that run intensive tasks.
That does not mean the methods I presented in the previous series of articles on “Web API performance” are now out of date just because they might be insufficient on their own. It is also a good idea to combine and apply methods based on what your API actually needs to do.
So we’re going to talk about server-side performance.
Server-side performance refers to the efficiency and responsiveness of a server in processing requests and delivering responses in a web application or service. It is a critical aspect of web development, as a well-optimized server can significantly impact the overall user experience.
I thought it best to articulate the server-side performance topic around three concepts that I find important: Load Balancing, Scaling, and Concurrency & Threading.
Load Balancing: This concept may be new to you, but it is essential.
— Definition:
Load balancing is a networking technique or process used to distribute incoming network traffic or computing workload across multiple servers, resources, or devices.
The primary purpose of load balancing is to ensure that no single resource becomes overwhelmed with traffic or workload, thereby optimizing the performance, reliability, and availability of a network or application.
To prevent a server or an instance from being overloaded with requests, load balancers distribute incoming requests so that each one is handled in time.
Here’s how it works: a load balancer does its job according to a selected load balancing algorithm. This algorithm defines how requests are distributed across our servers or instances.
There are various load balancing algorithms, each with its own characteristics and use cases. Here are some common load balancing algorithms:
Round Robin: This is one of the simplest load balancing algorithms. It distributes traffic evenly to each server in a circular manner. Each incoming request is routed to the next server in the list. Round robin is easy to implement but may not take server load into account, potentially sending traffic to an overloaded server.
— Pros:
If all the available servers have the same capacity, then:
Each server receives roughly the same request workload
Routed requests complete in approximately the same amount of time
Overall response speed improves
Clients receive their responses in time, and we get a responsive, user-friendly app
— Cons:
Distributing requests in round-robin fashion can still overload a server. Suppose our “Server 1” is five times less powerful than our “Server 2”: because of its lower capacity, “Server 1” will not finish its first request in time, yet it will keep receiving new requests, which will pile up in its queue. The result is either very long response times or a server that goes down.
So here we need a round-robin algorithm that takes the capacity of each server into account.
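Before moving on to that variant, here is a minimal sketch of plain round-robin selection in Python; the server names are made up, and a real load balancer would of course work at the network level rather than in application code:

```python
from itertools import cycle

# Hypothetical backend pool; a real load balancer would read this from configuration.
servers = ["server-1:8080", "server-2:8080", "server-3:8080"]
rotation = cycle(servers)  # endlessly iterate over the pool in order

def pick_server_round_robin() -> str:
    """Return the next server in the rotation, regardless of its current load."""
    return next(rotation)

# Each incoming request simply takes the next server in the circle.
for request_id in ("a", "b", "c", "d"):
    print(f"Request {request_id} -> {pick_server_round_robin()}")
```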
Weighted Round Robin: Weighted round robin assigns each server a weight, indicating its capacity or performance. Servers with higher weights receive more traffic than those with lower weights. This allows for proportional distribution of traffic based on server capabilities.
— Pros
Same advantages as Round Robin
So basically it’s a Round Robin that takes server capabilities into account.
— Cons
Neither round-robin algorithm takes into account the number of connections a server has to keep open for a certain period when the load balancer distributes requests.
This means that connections can accumulate on one server in the cluster and overload it, even while other servers in the cluster have capacity to spare.
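Within those limits, though, weighted round robin is easy to sketch. Here is a minimal Python example assuming static, hand-assigned weights (the names and numbers are made up):

```python
from itertools import cycle

# Hypothetical pool: "server-2" is declared five times more powerful than "server-1".
weighted_pool = {"server-1:8080": 1, "server-2:8080": 5}

# Expand the pool so each server appears as many times as its weight,
# then rotate over the expanded list exactly like plain round robin.
rotation = cycle([srv for srv, weight in weighted_pool.items() for _ in range(weight)])

def pick_server_weighted_round_robin() -> str:
    """Return the next server, proportionally to the configured weights."""
    return next(rotation)

# Out of every six requests, five land on server-2 and one on server-1.
for request_id in range(6):
    print(f"Request {request_id} -> {pick_server_weighted_round_robin()}")
```

Production balancers such as nginx use a “smooth” weighted rotation instead, so that consecutive requests to a heavy server are interleaved with the others rather than sent back to back.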
Least Connections: This algorithm directs traffic to the server with the fewest active connections or sessions. It is effective at distributing traffic based on server load, ensuring that connections are evenly spread across the servers.
The image above shows that “Server 1” has 3 active connections, “Server 2” has 4 and “Server 3” has none. So the algorithm sends “Request a” to “Server 3”, the one with 0 active connections, and the operation starts again. Now “Server 1” has 3 active connections, “Server 2” has 4 and “Server 3” has 1, so “Request b” also goes to “Server 3”. The same process repeats and sends “Request c” to “Server 3” as well.
— Pros :
One of the primary advantages of the “Least Connections” algorithm is that it distributes incoming traffic to servers with the fewest active connections. This ensures a relatively even distribution of traffic, preventing any single server from becoming overloaded while others remain underutilized.
— Cons:
The “Least Connections” algorithm only looks at the number of active connections. It does not consider server load based on factors like CPU usage, memory usage, or application-specific metrics, and a raw connection count might not be a comprehensive measure of server load in every scenario.
Weighted Least Connections: Similar to weighted round robin, this algorithm considers server weights but also takes into account the number of active connections. Servers with both lower weights and fewer connections receive priority.
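Here is a minimal sketch of both the plain and the weighted variant, assuming the load balancer itself tracks the number of active connections per server (all names, counts, and weights are illustrative):

```python
# Hypothetical state tracked by the load balancer: active connections and capacity weights.
active_connections = {"server-1": 3, "server-2": 4, "server-3": 0}
weights = {"server-1": 1, "server-2": 2, "server-3": 1}

def pick_least_connections() -> str:
    """Least Connections: pick the server currently holding the fewest active connections."""
    return min(active_connections, key=active_connections.get)

def pick_weighted_least_connections() -> str:
    """Weighted Least Connections: pick the server with the lowest connections-to-weight ratio."""
    return min(active_connections, key=lambda srv: active_connections[srv] / weights[srv])

def dispatch(request_id: str) -> None:
    server = pick_least_connections()
    active_connections[server] += 1  # the chosen server now holds one more connection
    print(f"Request {request_id} -> {server}")
    # In a real balancer the counter is decremented again when the connection closes.

# Mirrors the example above: requests a, b and c all end up on server-3.
for request_id in ("a", "b", "c"):
    dispatch(request_id)
```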
Other Algorithms:
IP Hash
Least Response Time
Random
Least Bandwidth
Chained Failover
Dynamic Algorithms:
(A part 2 will be entirely dedicated to them)
The choice of a load balancing algorithm depends on the specific requirements and complexity of the application at hand, as well as on how the existing infrastructure is set up.
Some applications do perfectly well with a straightforward round-robin approach that simply hands incoming traffic to each server in turn.
Others, in contrast, call for more advanced load balancing algorithms, ones that can take the health and performance metrics of individual servers into account.
Load balancing is also not limited to the predefined algorithms: some load balancers let you define custom algorithms or fine-grained configurations, so you can address very specific operational needs with precision and efficiency.
Scaling: Be prepared to scale your server resources both vertically and horizontally as traffic and demand increase.
What is scaling?
Scaling in computing and technology involves adjusting system capacity and capabilities to accommodate evolving workloads, playing a pivotal role in developing and maintaining applications, networks, and infrastructure to satisfy user and data requirements.
There are two primary types of scaling: vertical & horizontal scaling
Vertical Scaling (Scaling Up):
Vertical scaling involves increasing the resources of a single server or node to handle a higher workload.
This typically means upgrading the server’s CPU, memory (RAM), storage, or other hardware components.
Vertical scaling can be limited by the maximum capacity of a single server and can be costlier as more powerful hardware is often more expensive.
Horizontal Scaling (Scaling Out):
Horizontal scaling involves adding more servers or nodes to a system to distribute the workload.
Each new server is identical to the existing ones and shares the processing load.
Horizontal scaling is more scalable and cost-effective for handling high traffic loads because it can be easily expanded as needed.
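Purely as an illustration of the idea, here is a naive sketch of a horizontal-scaling decision in Python; the thresholds and instance bounds are invented for the example, and real auto-scalers rely on richer metrics, cooldown periods, and the provisioning APIs of your platform:

```python
# Illustrative thresholds; real auto-scalers use richer metrics and cooldown periods.
SCALE_OUT_THRESHOLD = 0.75   # average CPU above 75% -> add an instance
SCALE_IN_THRESHOLD = 0.25    # average CPU below 25% -> remove an instance
MIN_INSTANCES, MAX_INSTANCES = 2, 10

def autoscale(current_instances: int, average_cpu: float) -> int:
    """Return the desired number of instances for the observed average CPU load."""
    if average_cpu > SCALE_OUT_THRESHOLD and current_instances < MAX_INSTANCES:
        return current_instances + 1   # scale out: add one more identical server
    if average_cpu < SCALE_IN_THRESHOLD and current_instances > MIN_INSTANCES:
        return current_instances - 1   # scale in: release an unneeded server
    return current_instances           # load is within bounds: do nothing

print(autoscale(current_instances=3, average_cpu=0.9))  # -> 4
print(autoscale(current_instances=3, average_cpu=0.1))  # -> 2
```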
Here are some common examples of scaling in various technology contexts:
Web Servers: Adding more web servers to distribute incoming web requests across multiple machines to handle increased user traffic. This is horizontal scaling.
Database Scaling: In database systems, scaling can involve adding more database servers or optimizing queries and indexes to handle larger data sets.
Application Scaling: Scaling out application servers to accommodate more concurrent users or application instances to process more tasks simultaneously.
Load Balancing: Distributing incoming network traffic across multiple servers to ensure that no single server is overwhelmed. This can be done using load balancers.
Cloud Computing: Cloud services like AWS, Azure, and Google Cloud offer scalable resources on-demand, allowing you to scale your applications up or down as needed.
Microservices: Breaking down monolithic applications into smaller, independent microservices that can be individually scaled based on demand.
Scalability is a cornerstone of technology and systems engineering: it is what ensures that a system can absorb growth, keep its performance consistent, and respond quickly to shifts in demand.
Its essence lies in giving systems the agility to adapt to evolving workloads and user requirements. This not only preserves a responsive and reliable user experience, it also delivers cost efficiency by optimizing resource utilization and keeping infrastructure expenses under control.
Scalability is therefore a fundamental component of modern technology landscapes, underpinning robust and adaptive systems.
Multi-Threading: Efficiently handling multiple requests simultaneously.
A thread is similar to a process in that it represents the execution of a sequence of machine instructions on a processor; the difference is that several threads can run inside the same process and share its memory.
Multithreading in the context of a web API refers to the ability of the web server to handle multiple incoming HTTP requests concurrently by using multiple threads. This is important for improving the scalability and responsiveness of web applications, as it allows the server to efficiently utilize the available hardware resources and handle a large number of client requests simultaneously.
It helps improve the scalability of a web API. As the number of incoming requests increases, the server can create additional threads to handle the load, up to a certain limit, without significant degradation in performance.
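As a small illustration, here is a sketch using Python's standard library, where the server hands each incoming request to its own thread; this is only meant to show the idea, not to stand in for a production web server:

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
import threading
import time

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(1)  # simulate a slow piece of work
        body = f"handled by thread {threading.current_thread().name}\n".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# ThreadingHTTPServer spawns a new thread per incoming request, so several
# slow requests can be served concurrently instead of queueing behind each other.
if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8000), Handler).serve_forever()
```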
When designing a multithreaded web API, it’s essential to consider both blocking and non-blocking operations. Blocking operations, such as I/O operations, can cause a thread to wait, potentially reducing the server’s overall throughput. Non-blocking operations, like asynchronous programming, can help mitigate this issue by allowing threads to work on other tasks while waiting for I/O to complete.
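Here is a minimal sketch of that non-blocking style with asyncio, where asyncio.sleep stands in for a real I/O call such as a database query or an outbound HTTP request:

```python
import asyncio

async def handle_request(request_id: int) -> str:
    # 'await' releases the event loop while the (simulated) I/O is in flight,
    # so other requests can make progress instead of blocking a thread.
    await asyncio.sleep(1)  # stand-in for a database query or an external API call
    return f"response {request_id}"

async def main() -> None:
    # Three requests "waiting on I/O" at the same time complete in about one second
    # total, not three, because none of them blocks while waiting.
    responses = await asyncio.gather(*(handle_request(i) for i in range(3)))
    print(responses)

asyncio.run(main())
```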
Overall, multithreading is a fundamental concept in the development of high-performance web APIs. It allows web servers to efficiently handle a large number of concurrent client requests, ensuring responsiveness and scalability. However, it also introduces complexity and potential pitfalls that developers need to be aware of and address in their designs and implementations.
Server-side performance is a critical aspect of web and application development focused on optimizing the efficiency and speed of operations that occur on the server. It encompasses various strategies and techniques, including code optimization, database tuning, caching, load balancing, and scalability measures, all aimed at ensuring that server resources are used effectively and that applications can handle increasing user loads while maintaining responsiveness. High server-side performance is crucial for delivering a smooth and responsive user experience, particularly in web and mobile applications.
Follow me there:
linkedin.com/in/laetitia-christelle-ouelle-89373121b/
twitter.com/OuelleLaetitia
dev.to/ouelle
Happy coding! 😏