Client–server architecture is one of the most fundamental models used in modern application development.
It consists of two major components:
- Client — the user or application making the request
- Server — the machine that processes the request and returns the response
When a user wants to access a server, the simplest method is to connect directly to the server's IP address. However, IP addresses are difficult to remember, which makes direct access impractical. To solve this problem, we use the Domain Name System (DNS).
DNS converts user-friendly domain names into IP addresses, allowing users to connect to the server seamlessly.
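As a quick illustration, the snippet below performs the same name-to-IP lookup that DNS does behind the scenes; example.com is only a placeholder domain, not a server from this article.

```python
# Minimal sketch of a DNS lookup: resolve a human-friendly name to the
# IP address the client actually connects to. "example.com" is a placeholder.
import socket

domain = "example.com"
ip_address = socket.gethostbyname(domain)  # name -> IPv4 address via the resolver
print(f"{domain} resolves to {ip_address}")
```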
Server Performance and Scaling
A server with limited hardware (for example, 2 CPUs and 4 GB of RAM) can process only a certain number of requests. When traffic increases significantly, users may experience delays, or the server may even become unresponsive.
To handle this, we can scale the system in two ways:
1. Vertical Scaling (Scaling Up)
This involves increasing the physical resources of the server, such as upgrading to 16 CPUs and 128 GB RAM.
While this increases the server’s capacity, it introduces problems:
- Resource wastage when traffic is low
- Downtime during scaling, since the server needs to be restarted
- Hardware limits — there is a maximum capacity beyond which upgrades are not possible
2. Horizontal Scaling (Scaling Out)
Instead of upgrading one large server, we run multiple small servers with the same configuration (e.g., several 2-CPU, 4-GB-RAM servers); a minimal local sketch follows the list below.
Benefits include:
- No downtime when adding or removing servers
- Better cost efficiency
- High fault tolerance
- Efficient traffic management during peak times
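The sketch below mimics horizontal scaling on a single machine by starting two identical HTTP instances. The ports (8001 and 8002) and the trivial handler are assumptions for illustration only; in a real deployment each instance would run on its own machine or container.

```python
# Sketch: two identical application instances, differing only in address.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Identical application code runs on every instance.
        body = f"served by instance on port {self.server.server_port}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run_instance(port: int) -> None:
    HTTPServer(("0.0.0.0", port), AppHandler).serve_forever()

if __name__ == "__main__":
    for port in (8001, 8002):  # two identical "servers" (assumed ports)
        threading.Thread(target=run_instance, args=(port,), daemon=True).start()
    threading.Event().wait()  # keep the main thread alive
```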
However, when we clone servers, each server gets its own IP address, creating an important question…
How Do We Handle Multiple Server IPs If DNS Points to Only One?
This is where Load Balancers come into play.
A load balancer sits between clients and servers. It exposes a single public IP address, which DNS points to, and intelligently distributes incoming traffic across multiple backend servers.
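A very rough sketch of that idea, assuming the two local backend instances from the earlier example and port 8000 for the load balancer itself, might look like this. Real load balancers (NGINX, HAProxy, cloud load balancers) do far more, but the round-robin core is the same.

```python
# Rough round-robin load balancer sketch. Backend addresses and port 8000
# are assumptions matching the earlier two-instance example.
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

BACKENDS = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
_next_backend = itertools.cycle(BACKENDS)

class LoadBalancerHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(_next_backend)                   # pick the next backend in rotation
        with urlopen(backend + self.path) as upstream:  # forward the request
            body = upstream.read()
            status = upstream.status
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Clients (and DNS) only ever see this single address.
    HTTPServer(("0.0.0.0", 8000), LoadBalancerHandler).serve_forever()
```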
Critical Design Considerations:
- If multiple servers exist, how does the load balancer decide which server receives each request?
- How do we maintain user sessions across multiple servers (e.g., login sessions)?
- How do auto-scaling groups work when traffic increases suddenly?
- How do applications store shared files when servers are distributed?
- How does health checking ensure that traffic is never sent to a failed server? (A simple health-check loop is sketched below.)
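To make the last point concrete, here is a minimal health-check loop. The /health path, the 2-second interval, and the backend addresses are all assumptions for illustration; the idea is that the load balancer routes traffic only to backends this loop reports as healthy.

```python
# Minimal health-check sketch: probe each backend periodically and keep only
# the responsive ones in the routable pool. Paths, ports, and interval are assumed.
import time
from urllib.error import URLError
from urllib.request import urlopen

BACKENDS = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]

def healthy_backends(backends, timeout=1.0):
    alive = []
    for backend in backends:
        try:
            with urlopen(backend + "/health", timeout=timeout) as resp:
                if resp.status == 200:
                    alive.append(backend)
        except (URLError, OSError):
            pass  # failed or unreachable servers are excluded from the pool
    return alive

if __name__ == "__main__":
    while True:
        pool = healthy_backends(BACKENDS)
        print("routable backends:", pool)  # the load balancer would use this pool
        time.sleep(2)
```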


