DEV Community

Prashant Mishra


Common Terminology in System Design

CDN
Load balancing
What is robots.txt
Difference between API Gateway and Load Balancer
Server Sent Events

CDN

A Content Delivery Network (CDN) is a system of geographically distributed servers that serves static content, such as images and videos, to users across the globe.

The servers in a CDN are called edge servers.

Important points to keep in mind:
Geographical distribution: The CDN servers (edge servers) are geographically distributed, so when a user requests content, it can be served from the nearest edge server, resulting in low latency.
Caching: Content is cached on the edge servers. When the same content is requested again, it is served from the cache, resulting in a faster response.
Load balancing: CDNs use load balancing to distribute requests across servers, preventing any single server from being overwhelmed and ensuring reliability and availability.
Dynamic routing: CDNs can dynamically route each request to the best-performing server based on real-time conditions such as server load and network latency.

Example: Imagine a global e-commerce application whose origin servers are in the US but whose edge servers (the CDN) are located across the globe. A request from India is served by the nearest edge server instead of travelling all the way to the origin server in the US. This improves the application's response time, resulting in better user engagement.
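The routing and caching points above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular CDN works: the edge names, region codes, latency numbers, and the in-memory `ORIGIN` dictionary are all hypothetical. Real CDNs usually steer users via DNS or anycast and expire cached objects according to TTLs and Cache-Control headers, both of which this sketch leaves out.

```python
from dataclasses import dataclass, field

# Hypothetical origin content (stands in for the origin servers in the US).
ORIGIN = {"/img/logo.png": b"<png bytes>"}


def fetch_from_origin(path):
    # In reality this is a slow, long-distance request to the origin server.
    return ORIGIN[path]


@dataclass
class EdgeServer:
    name: str
    latency_ms: dict  # region code -> measured latency (hypothetical numbers)
    cache: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def get(self, path):
        if path in self.cache:
            # Cache hit: served locally, no trip to the origin.
            self.hits += 1
            return self.cache[path]
        # Cache miss: fetch once from the origin, then keep a copy at the edge.
        self.misses += 1
        content = fetch_from_origin(path)
        self.cache[path] = content
        return content


def nearest_edge(edges, region):
    # Pick the edge with the lowest latency for the user's region.
    return min(edges, key=lambda e: e.latency_ms.get(region, float("inf")))


edges = [
    EdgeServer("edge-mumbai", {"IN": 20, "US": 220}),
    EdgeServer("edge-virginia", {"IN": 230, "US": 15}),
]
```

A user in India (`nearest_edge(edges, "IN")`) lands on the hypothetical `edge-mumbai`; the first request for `/img/logo.png` is a miss that goes to the origin, and every later request is a hit served from the edge.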


Load balancing

Load balancing distributes incoming requests across multiple servers (N servers) to ensure efficient processing.

Given a request with ID r1, its hash value m1 is calculated using Hash(r1), and then the server is determined by m1 % N, where N is the number of servers. This method directs the request to one of the servers, such as s0, s1, s2, or s3 if there are four servers.

However, this approach has limitations. If the number of servers changes (scaling up or down), the modulo operation produces different server assignments, so most requests are rerouted to different servers. This can invalidate cached or persistent data, because requests no longer reach the servers that hold it.
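The modulo scheme and its weakness can be checked directly. This Python sketch uses MD5 as the `Hash` function (an assumption; any stable hash works, but Python's built-in `hash()` is randomized per process for strings). The names `stable_hash`, `pick_server`, and `fraction_remapped` are illustrative, not from any library.

```python
import hashlib


def stable_hash(key: str) -> int:
    # MD5 gives the same value on every run, unlike Python's built-in hash() for str.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def pick_server(request_id: str, servers: list) -> str:
    # m1 = Hash(r1); server = servers[m1 % N]
    return servers[stable_hash(request_id) % len(servers)]


def fraction_remapped(request_ids, old_servers, new_servers):
    # Share of requests whose server changes when the pool changes.
    moved = sum(
        pick_server(r, old_servers) != pick_server(r, new_servers) for r in request_ids
    )
    return moved / len(request_ids)


ids = [f"r{i}" for i in range(10_000)]
four = ["s0", "s1", "s2", "s3"]
five = four + ["s4"]
print(f"{fraction_remapped(ids, four, five):.0%} of requests change servers")
```

Going from 4 to 5 servers, a request keeps its server only when `h % 4 == h % 5`, which holds for 4 out of every 20 hash values, so roughly 80% of requests are expected to move even though capacity grew by just one server.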

Techniques like consistent hashing address this issue by minimizing the amount of data that has to move when servers are added or removed.
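A common way to implement consistent hashing is a hash ring: servers and requests are hashed onto the same circular space, and each request goes to the first server point clockwise from its hash. The Python sketch below is one such variant, assuming MD5 for hashing and 100 "virtual nodes" per server to spread load evenly; the class and method names are hypothetical.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class ConsistentHashRing:
    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point on the ring, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        # Each server occupies several points (virtual nodes) on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (stable_hash(f"{server}#{i}"), server))

    def remove_server(self, server):
        self._ring = [(point, s) for point, s in self._ring if s != server]

    def get_server(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        h = stable_hash(key)
        # First point at or after h (clockwise); wrap around past the end.
        idx = bisect.bisect_left(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]
```

When a fifth server joins, only the requests that now fall on its points move, and they all move to the new server; with five servers that is expected to be about one fifth of requests, compared with roughly 80% under plain modulo hashing.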

robots.txt

A robots.txt file is a plain text file, placed at the root of a website (/robots.txt), that tells web crawlers such as search engine bots which URLs they may access. It is part of the Robots Exclusion Protocol (REP) and is used to manage crawler traffic and avoid overloading a website with requests.

Example of a robots.txt file:

User-agent: *
Disallow: /private/
Crawl-delay: 10

User-agent: specifies which crawler the rules that follow apply to, such as Google's crawler or another bot; * matches all crawlers.
Disallow: specifies the URL paths that the matching crawlers should not visit or crawl.
Crawl-delay: tells the crawler how many seconds to wait between successive requests to the same site; here, 10 seconds between fetching two pages. It is a non-standard directive, and some crawlers (Googlebot, for example) ignore it.
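A well-behaved crawler reads these rules before fetching pages. Python's standard library ships a parser for them in `urllib.robotparser`; the sketch below feeds it the example file directly through `parse()` instead of downloading it (a real crawler would call `set_url(...)` and `read()`). The crawler name `ExampleBot` and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def build_parser(text):
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
agent = "ExampleBot"  # hypothetical crawler name, matched by "User-agent: *"

for url in ["https://example.com/index.html", "https://example.com/private/report.html"]:
    verdict = "allowed" if parser.can_fetch(agent, url) else "disallowed"
    print(url, verdict)

# Seconds to wait between requests to this site (None if no Crawl-delay applies).
print("crawl delay:", parser.crawl_delay(agent))
```

A crawler loop would typically `time.sleep(parser.crawl_delay(agent) or 0)` between fetches to honor the delay.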

Difference between API Gateway and Load Balancer

API Gateway

  1. Acts as a single entry point for client applications to interact with the various backend services
  2. Handles authentication and authorization
  3. Handles rate limiting and throttling
  4. Performs request/response transformation
  5. Provides caching
  6. Operates at layer 7, i.e. the application layer
  7. Well suited to microservice architectures
  8. More complex than a load balancer, because it handles a wider range of tasks, such as differing API requirements
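To show how several of these responsibilities stack up in one entry point, here is a toy gateway in Python covering authentication (an API-key check), per-client rate limiting with a token bucket, and routing by URL path prefix. It is a sketch under assumptions: the route table, key set, and backend lambdas are hypothetical stand-ins for real services, and transformation and caching are left out. The clock is injectable so the behavior can be tested without waiting.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `refill_per_sec` tokens/second."""

    def __init__(self, capacity, refill_per_sec, clock):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Add tokens for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ApiGateway:
    """Single entry point: authenticate, rate-limit per API key, then route by path prefix."""

    def __init__(self, routes, api_keys, capacity=5, refill_per_sec=1.0, clock=time.monotonic):
        # Check longer prefixes first so more specific routes win.
        self.routes = sorted(routes.items(), key=lambda item: len(item[0]), reverse=True)
        self.api_keys = set(api_keys)
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets = {}  # one token bucket per client key

    def handle(self, path, api_key):
        # 1. Authentication
        if api_key not in self.api_keys:
            return 401, "unauthorized"
        # 2. Rate limiting
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, self.clock)
            self.buckets[api_key] = bucket
        if not bucket.allow():
            return 429, "too many requests"
        # 3. Routing to a backend service
        for prefix, backend in self.routes:
            if path.startswith(prefix):
                return 200, backend(path)
        return 404, "no route"


# Hypothetical backend services, keyed by path prefix.
routes = {
    "/users": lambda path: f"user-service handled {path}",
    "/orders": lambda path: f"order-service handled {path}",
}
```

A production gateway would also forward real HTTP requests, transform payloads, and share rate-limit state across gateway instances; the point here is only the order of the checks.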

Load Balancer

  1. Distributes traffic across multiple instances of a service based on load, location, or availability
  2. Performs health checks on instances
  3. Can operate at layer 7, i.e. the application layer, routing traffic based on the URL (HTTP/HTTPS endpoints)
  4. Can operate at layer 4, i.e. the transport layer, routing traffic based on IP address and port
  5. In many architectures, API gateways and load balancers are used together: a load balancer might distribute traffic across multiple API gateway instances, which in turn route requests to the various backend services
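The core of a load balancer, spreading requests across instances while skipping unhealthy ones, can be sketched with round-robin selection. This Python sketch is a simplification under stated assumptions: `health_check` is a caller-supplied function (hypothetical here), and it is called on each pick, whereas real load balancers usually probe instances periodically in the background and route on cached results.

```python
class RoundRobinBalancer:
    """Cycles through backends in order, skipping any that fail their health check."""

    def __init__(self, backends, health_check):
        self.backends = list(backends)
        self.health_check = health_check  # callable(backend) -> bool
        self._next = 0

    def pick(self):
        n = len(self.backends)
        # Try each backend at most once per call so an all-unhealthy pool fails fast.
        for _ in range(n):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % n
            if self.health_check(backend):
                return backend
        raise RuntimeError("no healthy backends")
```

With three instances where `s1` is failing its health check, successive picks alternate between `s0` and `s2`. Swapping round-robin for least-connections or location-aware selection changes only `pick()`.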

Server Sent Events

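Server-Sent Events (SSE) use a unidirectional, persistent connection from server to client.
If the client needs to send data to the server, it has to do so over a separate channel (for example, a regular HTTP request), because the SSE connection only carries data from the server.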

  • Client Requests Data from Server Using HTTP:
    The client initiates an HTTP request to the server, typically through a GET request targeting an endpoint that supports SSE.

  • The Request Opens a Persistent Connection with the Server:
    This connection is kept open by the server. Unlike regular HTTP requests, which are closed after a response is sent, the SSE connection remains open so that the server can send data continuously.

  • The Server Sends Data to the Client Whenever New Information Is Available:
    The server pushes updates to the client as they become available without the client needing to make additional requests. The server sends data in text/event-stream format, which the client can parse to update the webpage dynamically.
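The three steps above can be sketched with Python's standard `http.server`. The `/events` path, port 8000, the "tick" event name, and the five-message loop are hypothetical choices for illustration. `format_sse` shows the `text/event-stream` wire format: optional `event:` and `id:` fields, one `data:` line per line of payload, and a blank line ending each message. The handler answers a GET with that content type, keeps the connection open, and writes a new message whenever it has one.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def format_sse(data, event=None, event_id=None):
    """Encode one message in the text/event-stream wire format."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    if event_id is not None:
        lines.append(f"id: {event_id}")
    # Each payload line gets its own "data:" field; the client rejoins them with newlines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


class SSEHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/events":  # hypothetical SSE endpoint
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        try:
            # Keep writing to the same open connection instead of closing after one response.
            for i in range(5):
                message = format_sse(f"tick {i}", event="tick", event_id=str(i))
                self.wfile.write(message.encode("utf-8"))
                self.wfile.flush()
                time.sleep(1)
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client went away


def serve(port=8000):
    ThreadingHTTPServer(("localhost", port), SSEHandler).serve_forever()
```

After calling `serve()`, `curl -N http://localhost:8000/events` prints one message per second. In a browser, the `EventSource` API consumes the same stream; when this sketch's handler returns after five messages and the connection closes, `EventSource` will try to reconnect.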

