System design involves understanding various components like load balancers, API gateways, caching, queue systems, and microservices to build robust, scalable, and highly available systems.
This guide provides a beginner-friendly overview of how these components work together in a real-world production environment.
Core Components: Client and Server
A client is any device (e.g., mobile, laptop, IoT device) that uses an application and sends requests to a server.
A server is a machine capable of running 24/7 with a public IP address, which is a unique address on the internet for reaching the server.
Cloud providers like AWS offer virtual machines that act as servers, providing 24/7 availability and public IPs, as it's impractical for personal computers to serve this role.
DNS (Domain Name System)
IP addresses are difficult to remember, prompting the need for user-friendly domain names.
A DNS server is a global directory that maps human-readable domain names (e.g., amazon.com) to their corresponding IP addresses.
When a user types a domain name, the initial request goes to the DNS server, which returns the IP address, a process known as DNS resolution.
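To see DNS resolution in action, here is a minimal sketch using Python's standard library (the resolver details are handled by the operating system; this is only an illustration, not how browsers implement it):

```python
import socket

# Ask the operating system's resolver, which queries DNS servers,
# to translate a human-readable domain name into an IP address.
ip_address = socket.gethostbyname("amazon.com")
print(f"amazon.com resolves to {ip_address}")
```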
Scaling Strategies
Servers can become overwhelmed by increased user traffic, leading to crashes if their physical resources (CPU, RAM) are insufficient.
To handle high loads and ensure fault tolerance, systems must implement scaling.
Vertical Scaling
Increasing the physical resources (CPU, RAM, disk space) of a single existing server or virtual machine.
Pros
- Improves capacity of a single machine to handle more traffic.
Cons
- Requires down time (server restart) to add physical resources, making it unsuitable for applications requiring zero downtime during traffic spikes.
- Can lead to resource waste and increased costs during periods of low traffic, as resources are provisioned for peak load.
- There are ultimate physical limits to how much a single machine can be upgraded.
Horizontal Scaling
Adding more identical servers (replicas) to distribute the workload, rather than increasing resources of a single server.
Pros
- Provides zero down time during scaling, as new machines boot up while existing ones handle traffic.
- Offers greater fault tolerance and resilience, as the failure of one server does not bring down the entire system.
Cons
- Requires an additional component, a load balancer, to effectively distribute traffic among the multiple server replicas, since DNS resolution alone is not designed to balance load evenly across them.
Load Balancers
A load balancer is a critical component placed in front of horizontally scaled servers to distribute incoming requests efficiently.
The DNS maps the domain name to the load balancer's IP address, and the load balancer then intelligently routes requests to individual servers.
Load balancers use various algorithms (e.g., Round Robin, which distributes requests sequentially) to balance the load across servers. They perform health checks to ensure that servers are active and ready to handle traffic before routing requests to them.
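To make the Round Robin idea concrete, here is a minimal sketch (the server addresses and the health-check handling are illustrative placeholders, not a production load balancer):

```python
import itertools

class RoundRobinBalancer:
    """Toy load balancer: cycles through healthy servers in order."""

    def __init__(self, servers):
        self.servers = servers                  # e.g. ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
        self.healthy = set(servers)             # updated by periodic health checks
        self._cycle = itertools.cycle(servers)

    def mark_unhealthy(self, server):
        # A real balancer would call this after failed HTTP/TCP health checks.
        self.healthy.discard(server)

    def next_server(self):
        # Skip servers that failed their last health check.
        for _ in range(len(self.servers)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("No healthy servers available")

lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(lb.next_server())  # 10.0.0.1
print(lb.next_server())  # 10.0.0.2
```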
In cloud environments like Amazon, this service is often referred to as an Elastic Load Balancer (ELB), designed to be highly available and manage its own internal resources effectively.
Microservices and API Gateway
In a microservice architecture, an application is decomposed into smaller, independent services (e.g., authentication, orders, payments).
Each microservice can be independently scaled horizontally and typically has its own dedicated load balancer.
An API Gateway serves as a centralized entry point for all client requests, acting as a reverse proxy that routes requests to the appropriate microservice's load balancer based on defined rules (e.g., /auth requests go to the authentication service).
It can also integrate an authenticator service (e.g., Auth0) to authenticate users before routing their requests.
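Conceptually, the gateway's routing rules are just a mapping from path prefixes to upstream services; the paths and internal hostnames below are illustrative assumptions:

```python
# Map URL path prefixes to the load balancer of each microservice.
ROUTES = {
    "/auth":     "http://auth-service-lb.internal",
    "/orders":   "http://orders-service-lb.internal",
    "/payments": "http://payments-service-lb.internal",
}

def route(path: str) -> str:
    """Return the upstream load balancer for an incoming request path."""
    for prefix, upstream in ROUTES.items():
        if path.startswith(prefix):
            return upstream
    raise ValueError(f"No route configured for {path}")

print(route("/auth/login"))  # http://auth-service-lb.internal
```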
Asynchronous Communication
For tasks like sending bulk emails or processing large data uploads, batch processing and background services are essential, as synchronous calls would overwhelm the system.
Synchronous communication, where a service waits for a response from another service, is not scalable for time-consuming operations.
Asynchronous communication allows services to interact without waiting, significantly improving system scalability and responsiveness.
Queue Systems (e.g., Amazon SQS)
Queue systems provide an asynchronous, First-In-First-Out (FIFO) communication channel between services.
A service pushes events (e.g., order IDs) into the queue, and dedicated worker processes pull and process these events independently.
This approach increases parallelism (multiple workers processing events concurrently) and enables rate limiting to manage dependencies on external APIs (e.g., limiting email sends to 10 per second to avoid hitting API limits).
Worker processes can use pull mechanisms to retrieve messages: short polling (frequent, short-duration checks) or long polling (waiting for a longer period, blocking until messages arrive or a timeout occurs).
Queue systems offer acknowledgment of message processing and support Dead Letter Queues (DLQ) for failed messages, allowing for retries and guaranteed processing.
Standard SQS queues deliver each message at least once, while SQS FIFO queues guarantee exactly-once processing; in both cases, a message being processed is hidden from other consumers until it is acknowledged or its visibility timeout expires.
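Assuming an existing queue (the queue URL below is a placeholder), a worker loop with long polling and explicit acknowledgment might look roughly like this with boto3:

```python
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"  # placeholder

def process(order_id: str) -> None:
    # Placeholder for the real work, e.g. sending a confirmation email.
    print(f"Processing order {order_id}")

while True:
    # Long polling: block for up to 20 seconds waiting for messages.
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for message in response.get("Messages", []):
        process(message["Body"])
        # Acknowledge: delete the message so it is not redelivered.
        sqs.delete_message(
            QueueUrl=QUEUE_URL,
            ReceiptHandle=message["ReceiptHandle"],
        )
```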
Pub/Sub Model (e.g., Amazon SNS)
The Pub/Sub (Publish/Subscribe) model is used when a single event needs to notify multiple services (one-to-many communication).
A service publishes an event (e.g., payment completed) to a notification service (topic), and all subscribed services receive and process it.
This forms the basis of an event-driven architecture.
A key limitation is the lack of inherent acknowledgment or retry mechanisms; if a subscriber fails to process a message, it might be lost without a built-in recovery process.
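Publishing an event to a topic is a single call; the topic ARN and message payload below are placeholders:

```python
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:payment-completed"  # placeholder

# Every subscriber of this topic (email service, analytics, etc.)
# receives its own copy of the event.
sns.publish(
    TopicArn=TOPIC_ARN,
    Message='{"order_id": "1234", "status": "payment_completed"}',
)
```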
Fan-out Architecture
The Fan-out architecture combines the Pub/Sub model (SNS) with Queue systems (SQS) to achieve both multi-service notification and reliable message processing.
An event is published to an SNS topic, which then automatically delivers copies of the message to multiple SQS queues.
Each SQS queue is processed by its dedicated workers, benefiting from SQS's reliable features like acknowledgment, retries, and DLQs, ensuring that each downstream service reliably processes its part of the event.
This pattern is ideal for scenarios where one event triggers multiple independent and reliable actions, such as uploading a video and simultaneously triggering processes for transcription, different resolutions, and thumbnail generation.
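Wiring this up largely amounts to subscribing each service's queue to the shared topic; the ARNs below are placeholders, and the SQS access policy that lets SNS deliver to the queues is omitted for brevity:

```python
import boto3

sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:video-uploaded"  # placeholder

# Each downstream service gets its own SQS queue subscribed to the same topic,
# so every queue receives a copy of each published event.
for queue_arn in [
    "arn:aws:sqs:us-east-1:123456789012:transcription-queue",
    "arn:aws:sqs:us-east-1:123456789012:thumbnail-queue",
]:
    sns.subscribe(TopicArn=TOPIC_ARN, Protocol="sqs", Endpoint=queue_arn)
```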
Rate Limiting
Rate limiting is essential to protect systems from being overwhelmed by excessive requests, including malicious DDoS/DoS attacks.
It defines how many requests a user or system can send within a specific time frame (e.g., 5 requests per second), blocking additional requests once the limit is hit.
Common strategies include Leaky Bucket and Token Bucket algorithms, which control the flow of requests to maintain system stability.
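As an illustration of the Token Bucket idea (the capacity and refill rate are arbitrary example values):

```python
import time

class TokenBucket:
    """Allows bursts up to `capacity`, refilling `rate` tokens per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Add tokens earned since the last check, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # limit hit: reject or delay the request

limiter = TokenBucket(capacity=5, rate=5)  # roughly 5 requests per second
print(limiter.allow_request())  # True while tokens remain
```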
Database Optimization
A single database can become overwhelmed by a high volume of read and write requests from increasing services.
Read Replicas are copies of the primary database that handle read operations, especially for tasks like analytics or logging where slight data delay is acceptable.
All write operations and real-time data reads are directed to the primary (master) node to ensure consistency.
This strategy significantly reduces the load on the primary database, improving overall performance.
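In application code this often looks like a small router that decides which connection to use; the connection objects here are stand-ins for real database clients:

```python
class DatabaseRouter:
    """Toy router: writes go to the primary, reads go to a replica."""

    def __init__(self, primary, replicas):
        self.primary = primary    # connection to the primary (master) node
        self.replicas = replicas  # connections to read replicas
        self._next = 0

    def for_write(self):
        return self.primary

    def for_read(self, needs_fresh_data: bool = False):
        # Real-time reads still hit the primary to avoid replication lag.
        if needs_fresh_data or not self.replicas:
            return self.primary
        conn = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return conn
```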
Caching
Caching stores frequently accessed data or results of complex computations in a faster, in-memory database (e.g., Redis) to reduce direct calls to the main database.
When data is requested, the system first checks the cache. If found, it's returned immediately; otherwise, it's fetched from the main database, stored in the cache, and then returned.
Caching dramatically optimizes system performance and reduces the load on backend databases.
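The typical cache-aside flow, sketched with the redis-py client (the key naming and the `fetch_from_database` helper are hypothetical):

```python
import json
import redis

cache = redis.Redis(host="localhost", port=6379)

def fetch_from_database(product_id: str) -> dict:
    # Placeholder for the real (slow) database query.
    return {"id": product_id, "name": "Example product"}

def get_product(product_id: str) -> dict:
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip the database
    product = fetch_from_database(product_id)
    cache.set(key, json.dumps(product), ex=300)  # cache for 5 minutes
    return product
```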
Content Delivery Network (CDN)
A Content Delivery Network (CDN), such as Amazon CloudFront, deploys edge machines (micro machines) globally to serve content closer to users, minimizing latency.
These edge locations are internally routed to the main load balancer, and a single Anycast IP can be assigned to multiple CDN edges, directing users to the nearest available server.
CDNs cache content (e.g., images, videos) at these edge locations.
If a user requests content that is already cached in their regional edge, it is served directly from the cache, bypassing the origin server. This process significantly improves user experience by reducing latency, saves network bandwidth, and decreases the load on the origin servers.