**Oops! Notice some servers die unexpectedly?**
Imagine a celebrity posts your website link. 50,000 people hit your server at the same time, all making requests. Then, in less than 10 minutes, everything goes blank! Ha! The server timed out. That can be frustrating. Let's dive into a better solution so this doesn't keep happening 😂
**The door analogy that explains everything**
Imagine ten thousand people trying to walk through a single door at the same time. That's what happens to a website when concert tickets go on sale, when iPhone pre-orders open, or when your favorite artist drops surprise merch. The door is your server, and it has a maximum capacity before it breaks.
Now imagine that same crowd, but instead of one door, there are fifty doors, all leading into the same venue. A smart security guard directs people evenly across all fifty doors. Nobody waits long, no door gets overwhelmed, everyone gets in smoothly. **That's load balancing.**
**The problem: servers have limits**
Every server can handle a limited number of requests per second. Maybe yours can handle one thousand requests before it starts slowing down. At two thousand requests, it starts timing out. At three thousand, it crashes completely.
For a small blog with a hundred visitors per day, one server is plenty. But what happens when you go viral? What happens when a celebrity tweets your link? What happens when you're selling limited edition sneakers and fifty thousand people hit your site simultaneously at release time?
**One server will collapse.** It's not a question of if, it's when.
You can buy a bigger server (called vertical scaling), but even the biggest servers have limits, and they're expensive. A better solution is to use multiple smaller servers working together (called horizontal scaling with load balancing).
**How load balancing actually works**
Instead of having one server handling all traffic, you have multiple servers – five, ten, fifty, whatever you need – all capable of doing the same work. In front of these servers sits a **load balancer**, which is like a traffic controller. Every incoming request hits the load balancer first, and the load balancer's job is to decide which server should handle that specific request.
Let's say you have five servers behind a load balancer. When a request comes in, the load balancer might:
- Pick the server currently handling the fewest requests (called **least connections**)
- Rotate through servers in a fixed order (called **round-robin**)
- Send the request to the server that has been responding fastest (called **least response time**)
The beautiful part? From the user's perspective, they have no idea this is happening. They visit yoursite.com and get a response. They don't know whether Server 1, Server 3, or Server 5 actually handled their request. **It just works.**
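To make those strategies concrete, here's a minimal Python sketch of how a load balancer might pick a server using round-robin and least connections. The server addresses and the simulated requests are made up for illustration; real load balancers do this in highly optimized code.

```python
import itertools
import random

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5"]

# Round-robin: rotate through the servers in a fixed order.
rotation = itertools.cycle(servers)

def pick_round_robin():
    return next(rotation)

# Least connections: track how busy each server is and pick the least busy one.
active_connections = {server: 0 for server in servers}

def pick_least_connections():
    return min(active_connections, key=active_connections.get)

# Simulate a small burst of incoming requests.
for _ in range(10):
    server = pick_least_connections()
    active_connections[server] += 1      # the request starts on this server
    print("routing request to", server)
    if random.random() < 0.5:            # some requests finish quickly
        active_connections[server] -= 1
```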
**Real-world example: How Shoprite checkout works**
You've been to Shoprite during the weekend rush. There are fifteen checkout counters. Imagine if they only opened one counter and everyone had to queue there. The line would stretch to the back of the store, people would abandon their shopping, and the whole system would collapse.
Instead, they open multiple counters and put someone near the entrance directing people: "Counter five is free, counter eight has a short line." **That person is a human load balancer.** They're distributing the workload (customers) across available resources (cashiers) to prevent any single resource from becoming overwhelmed.
When one cashier is faster than others or when one takes a break, the person directing customers adjusts their strategy. They send more people to the faster cashiers and fewer to the slower ones. This is exactly what digital load balancers do with server traffic.
**What happens when things go wrong**
Remember when everyone tried to register for COVID vaccines on the NCDC portal, or when JAMB registration opened? The sites crashed immediately because they couldn't handle the load. **That's a load balancing failure.**
Compare that to Amazon during Black Friday sales. Millions of people hit their site simultaneously, and it just works. Amazon has thousands of servers behind sophisticated load balancers. When traffic spikes, their system automatically spins up more servers and the load balancer starts directing traffic to them. When traffic drops, they shut down the extra servers to save costs.
*The difference between sites that crash and sites that scale smoothly often comes down to whether they've implemented proper load balancing.*
**Different types of load balancing**
**Hardware load balancers** are physical devices that sit in data centers and route traffic. They're expensive but extremely fast. Big companies with their own data centers use these.
**Software load balancers** run as applications on regular servers. Tools like Nginx and HAProxy, and cloud services like AWS Elastic Load Balancer, fall into this category. They're cheaper and more flexible than hardware options.
**DNS load balancing** uses the domain name system itself to distribute traffic. When someone looks up yoursite.com, the DNS can return different IP addresses for different users, sending some to Server A and others to Server B.
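You can see this from any machine: resolving a domain that uses round-robin DNS returns several addresses for the same name. Here's a small Python check; "example.com" is just a placeholder, substitute any site you want to inspect.

```python
import socket

# Resolve a hostname and collect every IP address the DNS returns.
# Domains behind round-robin DNS (or a CDN) typically return more than one.
addresses = {
    info[4][0]
    for info in socket.getaddrinfo("example.com", 80, proto=socket.IPPROTO_TCP)
}
print(addresses)
```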
**Global load balancing** distributes traffic across different geographic regions. If you have servers in America, Europe, and Asia, a global load balancer can send American users to American servers and Nigerian users to the closest available server, reducing latency.
**The health check mechanism – the secret to reliability**
Here's a critical feature: load balancers constantly check if your servers are actually healthy. Every few seconds, the load balancer sends a small test request to each server (called a **health check**). If a server doesn't respond or responds with an error, the load balancer marks it as unhealthy and stops sending traffic there.
This is why load-balanced systems are more reliable than single-server setups. If one server crashes, the load balancer notices within seconds and routes all traffic to the remaining healthy servers. **Users might not even notice the failure** because their requests just get handled by different servers. Meanwhile, engineers can fix or restart the failed server without taking the whole site offline.
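Here's a rough sketch of that loop in Python. The backend addresses and the /health endpoint are assumptions for illustration; real load balancers like Nginx or HAProxy do this with configurable intervals, timeouts, and failure thresholds.

```python
import time
import urllib.request

# Hypothetical backend servers, each expected to expose a /health endpoint.
servers = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
healthy = set(servers)

def run_health_checks():
    """Probe every server and update the set of servers allowed to receive traffic."""
    for server in servers:
        try:
            with urllib.request.urlopen(server + "/health", timeout=2) as response:
                if response.status == 200:
                    healthy.add(server)
                else:
                    healthy.discard(server)
        except OSError:
            # Timeout, connection refused, HTTP error, etc.: stop routing here.
            healthy.discard(server)

while True:
    run_health_checks()
    print("healthy servers:", sorted(healthy))
    time.sleep(5)  # check every few seconds, just like a real load balancer
```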
**Session persistence – the tricky part**
Load balancing gets complicated when your application needs to remember things about users. If you log into a shopping site and add items to your cart, that information is stored on whichever server handled your requests. But what if your next request goes to a different server? That server doesn't know about your cart.
This is called the **session persistence problem**. Solutions include:
**Sticky sessions:** The load balancer remembers which server you used and always sends your requests there. This works but reduces flexibility.
**Shared session storage:** All servers store session data in a central database or cache (like Redis) that they all access. This way, any server can handle any user's request because the session data isn't tied to a specific server.
**Stateless design:** The best solution is making your servers stateless, meaning they don't store user information locally. Instead, authentication tokens or session data get sent with every request. This way, truly any server can handle any request.
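To illustrate the shared-storage idea, here's a toy Python sketch where two servers read and write the same session store. A plain dictionary stands in for Redis, and the server names and cart operations are invented for the example.

```python
# A stand-in for a central cache such as Redis: every server reads and writes
# here instead of keeping session data in its own memory.
shared_sessions = {}

class Server:
    def __init__(self, name):
        self.name = name

    def add_to_cart(self, session_id, item):
        cart = shared_sessions.setdefault(session_id, [])
        cart.append(item)
        return f"{self.name} added {item}"

    def view_cart(self, session_id):
        return shared_sessions.get(session_id, [])

# The load balancer happens to send the user's two requests to different servers,
# but the cart survives because it lives in the shared store, not on either server.
server_1, server_3 = Server("server-1"), Server("server-3")
print(server_1.add_to_cart("user-42", "sneakers"))
print(server_3.view_cart("user-42"))   # ['sneakers']
```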
**Load balancing at different layers**
You can load balance at different levels of your architecture:
**Application load balancing:** Distributing web requests across multiple web servers. This is what most people mean when they say load balancing.
**Database load balancing:** Distributing read queries across multiple database replicas. Your main database handles writes, but multiple read replicas handle read requests, preventing the main database from being overwhelmed. (There's a small sketch of this read/write split below.)
**Microservices load balancing:** If your app is split into multiple services (user service, payment service, notification service), each service might have multiple instances behind its own load balancer.
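A minimal sketch of the database read/write split described above, assuming one primary and a few read replicas; the connection strings are placeholders.

```python
import random

# Placeholder connection strings for one write primary and several read replicas.
PRIMARY = "postgres://primary.db.internal"
READ_REPLICAS = [
    "postgres://replica-1.db.internal",
    "postgres://replica-2.db.internal",
    "postgres://replica-3.db.internal",
]

def route_query(sql: str) -> str:
    """Send writes to the primary; spread reads across the replicas."""
    if sql.lstrip().upper().startswith(("INSERT", "UPDATE", "DELETE")):
        return PRIMARY
    return random.choice(READ_REPLICAS)

print(route_query("INSERT INTO orders VALUES (...)"))  # -> primary
print(route_query("SELECT * FROM products"))           # -> one of the replicas
```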
**Connecting everything we've learned**
Day 1 taught us about geographic distribution with Netflix's Open Connect. Day 2 showed us efficiency through proven technology with WhatsApp's Erlang. Day 3 explained caching. **Load balancing connects all of these concepts.**
Load balancers can use geographic information to route users to the nearest data center (like Netflix does). They can distribute work efficiently across servers (like WhatsApp's philosophy of doing more with less). They work hand-in-hand with caching because cached content can be served from any server without hitting the database.
**System design isn't about using one technique. It's about combining multiple strategies to build something reliable and fast.**
**What this means for your projects**
If you're building something small, you probably don't need load balancing yet. But understanding it prepares you for scale. Many cloud providers make it easy to add load balancing when you need it. You can start with one server and add a load balancer plus additional servers when traffic grows.
The mental model matters more than the implementation details. When you're designing any system, ask yourself: **What happens if traffic doubles tomorrow? What if it increases ten times?** Load balancing is the answer to those questions.
**Tomorrow (Day 5):** How does Google show you search results in 0.3 seconds when they're searching billions of web pages? The secret isn't speed, it's pre-computation. We're breaking down why fast apps don't work harder; they work smarter.
Join the class and see how load balancing could make your app faster and more reliable: https://ssic.ng
Drop a 🔥 if you finally understand why Amazon stays up on Black Friday while your favorite local site crashes during every sale.