Let's take a real-world example to cover the basics of designing a distributed system.
Say you have an idea for running a fast-food restaurant.
You start by finding the perfect location in the market, hiring staff, and managing other expenses.
The business goes well, the restaurant becomes popular in its niche market, and you are soon handling hundreds of orders every day. Now it's time to decide how to scale up the business.
Time To Scale
You have two options:
- Pay your current staff more and ask them to work longer hours. This is known as Vertical Scaling, wherein you increase throughput by adding more capacity to the same resources (the same machine, in system terms). You can also optimize the current process by doing some preprocessing before the actual orders arrive, say preparing ingredients during non-peak hours, which maps to running a cron job in system design (a small sketch of this idea follows this section).
- Hire more staff to scale up your business and manage orders effectively at peak hours. This is termed Horizontal Scaling, wherein you add more machines (more staff) to share the load.
In real-world scenarios, a balanced or hybrid approach is used, as there is always an upper limit to how much you can scale a machine vertically. You can even hire a backup chef to make your restaurant/system more resilient.
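To make the cron-job idea concrete, here is a minimal sketch, assuming a hypothetical `prepare_popular_items` job and a made-up menu: it precomputes the expensive work before peak hours so that peak-time requests only do the cheap final step.

```python
# Hypothetical preprocessing job, run before peak hours via cron, e.g.:
#   0 6 * * *  python prepare.py   # every day at 06:00, before the rush
import json

POPULAR_ITEMS = ["margherita_pizza", "veggie_burger", "club_sandwich"]  # assumed menu

def prepare_popular_items(cache_path="prepared_cache.json"):
    """Precompute the expensive part ahead of time so peak-hour requests
    only need the cheap final step."""
    prepared = {item: f"{item} prepped and ready" for item in POPULAR_ITEMS}
    with open(cache_path, "w") as f:
        json.dump(prepared, f)

if __name__ == "__main__":
    prepare_popular_items()
```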
Expanding on Strengths
As a manager, to make sure your business runs smoothly, you brainstorm and divide the current chefs into teams based on their expertise.
Suppose Team A handles pizza orders, Team B handles burgers, Team C handles sandwiches, and so on.
In a way, you have divided responsibilities amongst the teams: you can route each order to a team based on its specialty, check an order's status more easily, and, moving forward, scale up a particular team based on business requirements.
This is the central idea behind microservices in system design: applications become easier to build and maintain when they are broken down into smaller units that work together. Each unit is developed and maintained independently while contributing to the overall product.
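Here is a minimal sketch of that routing idea, with hypothetical service names and URLs (nothing here refers to a real deployment): an entry point forwards each order to the microservice that owns that food type.

```python
# Minimal sketch: route each order to the microservice that owns its food type.
# The service names/URLs below are hypothetical, not a real deployment.
ORDER_ROUTES = {
    "pizza":    "http://pizza-service.local/orders",     # Team A
    "burger":   "http://burger-service.local/orders",    # Team B
    "sandwich": "http://sandwich-service.local/orders",  # Team C
}

def route_order(order: dict) -> str:
    """Return the URL of the service responsible for this order's food type."""
    food_type = order["type"]
    try:
        return ORDER_ROUTES[food_type]
    except KeyError:
        raise ValueError(f"No service handles orders of type {food_type!r}")

# Example: route_order({"type": "pizza", "qty": 2}) -> "http://pizza-service.local/orders"
```

Scaling a particular team then maps to running more instances of just that one service, without touching the others.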
Introducing Distributed Systems
At this stage, your shop is doing well: you have divided the chefs based on their specialty and your system is available at all times.
Now consider the case when there is a power outage or your staff goes on strike; that would mean a loss of business for the day. Or maybe you simply want to expand your business even more.
So you open another shop in a different location that also delivers orders. Initially the number of chefs there may be smaller, but at least you have a backup and your business isn't impacted much. With this approach, you have distributed your system.
A distributed system is a system with multiple components located on different machines that work together, yet appear to the end user as a single computer processing the query. These machines work concurrently and can fail independently without affecting the whole system's uptime.
Apart from scaling, the other advantages of a distributed system are:
- Fault Tolerance: the property that enables a system to continue operating properly in the event of the failure of (or one or more faults within) some of its components.
- Low Latency: latency is a networking term for the total time it takes a data packet to travel from one node to another. With a distributed system we can have multiple nodes and route traffic to the node closest to the user, thus reducing latency (a minimal routing sketch follows this list).
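Here is that latency-aware routing as a minimal sketch, using made-up node names and latency numbers: pick the closest healthy node, and if it fails, the remaining nodes keep serving traffic.

```python
# Hypothetical measured round-trip latencies (ms) from the user to each replica.
NODE_LATENCIES_MS = {"shop-east": 12, "shop-central": 30, "shop-west": 48}
HEALTHY_NODES = set(NODE_LATENCIES_MS)  # nodes currently up

def pick_node() -> str:
    """Pick the healthy node with the lowest latency; if one node fails,
    the remaining nodes keep serving traffic (fault tolerance)."""
    candidates = {n: ms for n, ms in NODE_LATENCIES_MS.items() if n in HEALTHY_NODES}
    if not candidates:
        raise RuntimeError("No healthy nodes available")
    return min(candidates, key=candidates.get)

print(pick_node())                   # -> "shop-east" (closest node)
HEALTHY_NODES.discard("shop-east")   # simulate a failure
print(pick_node())                   # -> "shop-central" (next closest still answers)
```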
With this, the level of complexity in the system has increased, as it now becomes a bit harder to maintain the system and coordinate requests across it.
Handling orders effectively
Now when a customer places an order over a phone call, the responsibility to route the order to a particular shop is on you, based on the customer's location and the order-processing time of each shop.
Balancing the load and making sure not every request is sent to one single shop/server is the job of a load balancer.
A load balancer acts as the “traffic cop” sitting in front of the servers and routing client requests across all servers in a manner that maximizes speed and capacity utilization. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it.
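Here is a toy round-robin balancer that captures the behaviour described above (skip servers that are down, include newly added ones); the class and server names are illustrative sketches, not a real load-balancing library.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy load balancer: spreads requests across servers in turn and
    skips servers that are marked down. Server names are hypothetical."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.down.add(server)

    def add_server(self, server):
        # A newly added server joins the rotation.
        self.servers.append(server)
        self._ring = cycle(self.servers)

    def next_server(self):
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server not in self.down:
                return server
        raise RuntimeError("All servers are down")

lb = RoundRobinBalancer(["shop-1", "shop-2", "shop-3"])
lb.mark_down("shop-2")                       # shop-2 went offline
print([lb.next_server() for _ in range(4)])  # traffic goes only to shop-1 and shop-3
```

Real load balancers add health checks, weighting, and smarter algorithms, but the core idea is the same rotation with failover.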
Tracking metrics
You can keep track of every event happening in each shop, such as order-processing times, the most ordered food items, the most delivered-to locations, etc. These are essentially the success metrics you can analyze and leverage for future use.
Similarly, track failures too, like whenever a machine develops a fault or a delivery agent doesn't show up on time, which can hamper your response time if not handled well (a minimal tracking sketch follows).
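A minimal sketch of such metric tracking, with illustrative metric names not tied to any real monitoring tool: count successes and failures, and record per-shop processing times.

```python
import time
from collections import Counter, defaultdict

class ShopMetrics:
    """Toy metrics tracker: counts events and records order-processing times.
    Metric names below are illustrative only."""

    def __init__(self):
        self.counters = Counter()         # e.g. orders per item, failures per cause
        self.timings = defaultdict(list)  # e.g. processing time per shop

    def incr(self, name, amount=1):
        self.counters[name] += amount

    def record_time(self, name, seconds):
        self.timings[name].append(seconds)

    def avg_time(self, name):
        samples = self.timings[name]
        return sum(samples) / len(samples) if samples else 0.0

metrics = ShopMetrics()
metrics.incr("orders.pizza")                    # a success metric
metrics.incr("failures.delivery_agent_late")    # a failure metric
start = time.time()
# ... process the order ...
metrics.record_time("shop-1.order_processing", time.time() - start)
print(metrics.counters, round(metrics.avg_time("shop-1.order_processing"), 3))
```

In a real system these numbers would be shipped to a monitoring or dashboarding tool instead of kept in memory.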
Apart from these, a system should be extensible, meaning you can easily build something totally new on top of it: maybe start a bakery and take orders for that too.
The approach used above when designing a system is termed High-Level System Design.
I hope the above example gave you some basic insight into system design.