I've been thinking about load balancing for web applications for a while now, and I wanted to share what I think the essence, the end goal, of load balancing is. Let's begin with a legend.
Let's assume we have a computer with 16 available resources, and that it needs 4 resources to respond to a request. Responding includes parsing the request and then doing some work to generate the response.
As we can see from our diagram, our computer can handle 4 requests at once (16 resources divided by 4 resources per request). If we wanted our computer to handle more requests at a time, we would need a bigger computer. This is called scaling up (vertical scaling). However, this system has a few problems.
The web server and the application are both running on the same computer, so if the computer dies, they both die. This is bad because getting an "Unavailable" message is far better than getting an "Unreachable" message: in the first case, something is still up to answer the user.
Your resources may be lying idle. If your computer is hosting an e-commerce application and most people use it only during the day, your resources are lying idle for a majority of the time. (This problem has little to do with load balancing itself, but stay with me.)
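The single-machine arithmetic above can be sketched in a few lines. This is only an illustration of the numbers in the example, not a model of a real server: the function names `max_concurrent_requests` and `idle_resources` are made up for this post, and a "resource" is the same abstract unit used in the diagram.

```python
def max_concurrent_requests(total_resources: int, resources_per_request: int) -> int:
    """How many requests fit on one machine at the same time."""
    return total_resources // resources_per_request


def idle_resources(total_resources: int, resources_per_request: int,
                   active_requests: int) -> int:
    """Resources left unused while `active_requests` are being served."""
    capacity = max_concurrent_requests(total_resources, resources_per_request)
    if active_requests > capacity:
        raise ValueError("more active requests than the machine can hold")
    return total_resources - active_requests * resources_per_request


# The computer from the example: 16 resources, 4 per request.
print(max_concurrent_requests(16, 4))  # 4
# With a single request in flight (say, at night), 12 resources sit idle.
print(idle_resources(16, 4, 1))        # 12
```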
Now let's consider the load-balanced approach.
The load balancer and the worker nodes each have 4 available resources. Let's assume that the load balancer uses only 1 resource to redirect a request to a worker node and send back the response once it's ready. Our system can still handle only 4 requests at once, but it now has some desirable properties.
We can scale the worker nodes and the load balancer independently. By scaling up the load balancer and adding more worker nodes, we can increase the number of requests we can handle. The same holds in the other direction: if there's no traffic, we can scale down the load balancer and remove worker nodes.
If one worker node dies, another worker node can do the work. Our application stays available as long as at least one worker node is still running.
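Both properties can be sketched in code. The example below is a rough illustration under a few assumptions of mine: it uses 4 worker nodes (the count that matches a capacity of 4 requests), it picks workers in round-robin order (the post does not name a strategy), and the names `system_capacity` and `RoundRobinBalancer` are hypothetical, not taken from any real load balancer.

```python
def system_capacity(lb_resources, lb_cost, worker_resources, worker_cost, worker_count):
    """Concurrent requests the whole system can hold.

    The load balancer and the worker pool each impose a limit;
    the smaller of the two is the bottleneck.
    """
    lb_limit = lb_resources // lb_cost
    pool_limit = worker_count * (worker_resources // worker_cost)
    return min(lb_limit, pool_limit)


class RoundRobinBalancer:
    """Hands out workers in turn, skipping any that are marked down."""

    def __init__(self, workers):
        self.workers = list(workers)
        self.healthy = set(self.workers)
        self._next = 0

    def mark_down(self, worker):
        self.healthy.discard(worker)

    def pick(self):
        # Try each worker at most once per call, starting where we left off.
        for _ in range(len(self.workers)):
            worker = self.workers[self._next]
            self._next = (self._next + 1) % len(self.workers)
            if worker in self.healthy:
                return worker
        raise RuntimeError("no healthy worker nodes")


# The example system: the balancer spends 1 of its 4 resources per request,
# and each of the 4 (assumed) workers spends all 4 of its resources on one request.
print(system_capacity(4, 1, 4, 4, 4))  # 4
# Scaling up the balancer and adding workers raises the limit.
print(system_capacity(8, 1, 4, 4, 8))  # 8

balancer = RoundRobinBalancer(["w1", "w2", "w3", "w4"])
balancer.mark_down("w2")  # one worker dies
print([balancer.pick() for _ in range(4)])  # ['w1', 'w3', 'w4', 'w1']
```

Note that scaling only one side does not help: with 8 workers but the original balancer, `system_capacity(4, 1, 4, 4, 8)` is still 4, because the balancer is the bottleneck.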
If we assume that each resource costs $x, we can immediately see that our example load-balanced system costs more than our non-load-balanced system, but only when it's at capacity. The goal of a load-balanced system isn't to reduce costs, although auto-scaling tools make that possible and in fact easier. The goal is to make applications highly available.
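To put numbers on the cost comparison, here is a small sketch. It again assumes 4 worker nodes, and it uses a price of 1 per resource as a stand-in for $x; the function names are hypothetical.

```python
def single_machine_cost(resources, price_per_resource):
    """Cost of the one big computer, which is paid for regardless of traffic."""
    return resources * price_per_resource


def load_balanced_cost(lb_resources, worker_resources, worker_count, price_per_resource):
    """Cost of the load balancer plus however many workers are running."""
    return (lb_resources + worker_resources * worker_count) * price_per_resource


PRICE = 1  # stands in for $x

print(single_machine_cost(16, PRICE))      # 16
# At capacity: 4 resources for the balancer + 4 workers * 4 resources each.
print(load_balanced_cost(4, 4, 4, PRICE))  # 20
# During quiet hours, scaled down to a single worker.
print(load_balanced_cost(4, 4, 1, PRICE))  # 8
```

At capacity the load-balanced system costs 20x against 16x, but once it is scaled down to one worker it costs 8x, while the single machine still costs 16x.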