A Content Delivery Network (CDN) is a globally distributed network of proxy servers used primarily to deliver content from locations closer to the user. CDNs are considered a type of cache.
The main goal of a CDN is to minimize request latency for users who are geographically far from the origin servers.
Function and Components
A CDN is composed of geographically dispersed servers that cache content from the original server (the origin) and deliver it to users from the nearest CDN server. This geographical distribution of servers often includes hundreds of "points of presence".
Content Served by CDNs:
CDNs generally serve static files, such as:
- HTML, CSS, and JavaScript files/bundles
- Images and photos
- Videos
- Static assets
Some CDNs, such as Amazon's CloudFront, also support dynamic content.
CDN Workflow:
- A user requests a file using a URL provided by the CDN provider.
- The request is routed to the CDN server closest to the user.
- If the CDN server has the requested content in its cache (cache hit), it delivers it directly to the user.
- If the content is not in the cache (cache miss), the CDN server requests the file from the origin (which could be a web server or cloud storage like Amazon S3).
- The origin returns the content, optionally with an HTTP caching header (for example, Cache-Control: max-age) that sets the Time-to-Live (TTL), which defines how long the content should be cached.
- The CDN caches the content and delivers it to the user. Subsequent requests for the same content are served from the cache until the TTL expires.
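
The workflow above can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not how any particular CDN is implemented: `EdgeCache`, `fetch_from_origin`, and `default_ttl` are hypothetical names, the origin is modeled as a plain function that returns a body and an optional TTL in seconds, and routing to the nearest edge is left out.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedObject:
    body: bytes
    expires_at: float  # clock value at which the entry becomes stale


class EdgeCache:
    """Hypothetical pull-style edge server: serve from cache, else ask the origin."""

    def __init__(
        self,
        fetch_from_origin: Callable[[str], Tuple[bytes, Optional[float]]],
        default_ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._default_ttl = default_ttl  # used when the origin sends no TTL
        self._clock = clock
        self._store: Dict[str, CachedObject] = {}

    def get(self, path: str) -> Tuple[bytes, str]:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry.expires_at > now:
            return entry.body, "HIT"  # cache hit: the origin is not contacted
        # Cache miss (or expired entry): fetch from the origin and cache the result.
        body, ttl = self._fetch(path)
        ttl = self._default_ttl if ttl is None else ttl
        self._store[path] = CachedObject(body, now + ttl)
        return body, "MISS"


# Usage with a fake clock so that expiry can be shown without sleeping.
fake_now = [0.0]


def demo_origin(path: str) -> Tuple[bytes, Optional[float]]:
    return f"content of {path}".encode(), 30.0  # the origin asks for a 30 s TTL


edge = EdgeCache(demo_origin, clock=lambda: fake_now[0])
print(edge.get("/logo.png")[1])  # MISS (first request goes to the origin)
print(edge.get("/logo.png")[1])  # HIT  (served from the edge cache)
fake_now[0] = 31.0               # move past the 30 s TTL
print(edge.get("/logo.png")[1])  # MISS (entry expired, fetched again)
```

Injecting the clock is a choice made for this sketch so that the TTL behavior is easy to demonstrate and test.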
Types of CDNs
CDNs can generally be classified into two models, push and pull:
Type | Description | Content Management | Best suited for |
---|---|---|---|
Push CDNs | Receive new content whenever changes occur on the origin server. The content is uploaded directly to the CDN by the content owner. | Minimizes network traffic since content is uploaded only when new or changed, but maximizes storage on the CDN. | Sites with a small amount of traffic or content that is not often updated. |
Pull CDNs | Fetch new content from the origin server only when the first user requests it. The content stays on your server, and you rewrite URLs to point to the CDN. | Minimizes storage space on the CDN. However, files may be re-pulled if their TTL expires before they have actually changed, causing redundant traffic. | Sites with heavy traffic, as traffic is spread out more evenly. |
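
To make the push model in the table concrete, here is a minimal sketch, assuming the content owner uploads through a single `publish` call. `PushCDN`, `publish`, and the in-memory edge stores are hypothetical and do not correspond to any vendor's API. It shows the trade-off described above: nothing is transferred when content is unchanged, but every edge stores a full copy.

```python
import hashlib
from typing import Dict, Iterable


class PushCDN:
    """Hypothetical push-style CDN: the content owner uploads to every edge."""

    def __init__(self, edge_names: Iterable[str]) -> None:
        # One in-memory store per edge location (stand-in for real servers).
        self.edges: Dict[str, Dict[str, bytes]] = {name: {} for name in edge_names}
        self._digests: Dict[str, str] = {}  # last published digest per path
        self.uploads = 0  # number of edge uploads performed so far

    def publish(self, path: str, body: bytes) -> bool:
        """Upload body to all edges only if it is new or changed."""
        digest = hashlib.sha256(body).hexdigest()
        if self._digests.get(path) == digest:
            return False  # unchanged content: no network traffic
        for store in self.edges.values():
            store[path] = body  # every edge keeps its own copy (storage cost)
            self.uploads += 1
        self._digests[path] = digest
        return True


cdn = PushCDN(["eu-west", "us-east"])
cdn.publish("/index.html", b"<h1>v1</h1>")  # new content: uploaded to both edges
cdn.publish("/index.html", b"<h1>v1</h1>")  # unchanged: skipped
print(cdn.uploads)  # 2
```

A pull CDN inverts this: edges start empty and fetch from the origin on the first request for each file.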
Benefits of using a CDN
CDNs offer significant advantages for system design, especially for high-scale, user-facing applications:
- Reduced Latency and Improved Performance: Users receive content from data centers close to them, significantly reducing latency and boosting performance. In video streaming, the edge server closest to the user delivers the video with very little latency.
- Reduced Load on Origin Servers: CDNs fulfill requests for cached content, lessening the burden on the origin servers and databases.
- High Availability and Scalability: CDNs are resilient against hardware failures and can handle high traffic loads. Shifting static assets to a CDN therefore helps a system meet its high availability requirements.
- Stateless Architecture: Once static assets are served from the CDN, origin servers no longer need to store or serve those files themselves, which helps keep the origin tier stateless.
CDNs are often incorporated into comprehensive system designs. For example, consistent hashing is a technique used in large-scale systems like the Akamai CDN to distribute load across caches.
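
As a rough illustration of the consistent hashing idea mentioned above, here is a generic textbook hash ring with virtual nodes. It is a sketch, not Akamai's implementation: the `HashRing` class, the `vnodes` parameter, the cache names, and the use of MD5 as the ring hash are all assumptions made for the example.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def ring_hash(key: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of its MD5 digest)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node owns several points so that load spreads more evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [(point, n) for point, n in self._ring if n != node]

    def lookup(self, key: str) -> str:
        """Return the node owning the first point at or after the key's hash."""
        if not self._ring:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._ring, (ring_hash(key), ""))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"/video/{i}.mp4" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.remove("cache-b")
moved = sum(1 for k in keys if ring.lookup(k) != before[k])
print(f"{moved} of {len(keys)} keys moved after removing cache-b")
```

The property that matters for caches is that removing a node only remaps the keys that node owned; keys on the other caches stay where they were, so their cached content remains usable.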
Disadvantages and Considerations
While CDNs are crucial for scaling, several factors must be considered:
- Cost: CDNs are run by third-party providers, and charges are incurred for data transfers in and out. Costs can be substantial for large volumes of data transfer. To save costs, only the most popular content (e.g., 20% of videos) may be served via CDN, with less popular content served from cheaper storage.
- Stale Content: If content is updated on the origin server before the TTL expires, the CDN may serve stale data. Setting an appropriate cache expiry time is critical; if too long, content might not be fresh, and if too short, it causes repeat reloading from the origin.
- Configuration: Using a CDN requires changing URLs for static content to point to the CDN domain.
- CDN Fallback: A strategy is required for when the CDN fails (a temporary outage); clients should be able to detect the issue and request resources directly from the origin.
- Invalidation: Files can be removed from the CDN cache before their TTL expires using vendor-provided APIs or by using object versioning (e.g., adding a version number to the URL).
- Dynamic Content: Dynamic content that changes frequently or tasks requiring complex server-side logic might still need to hit the origin server instead of the CDN.
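
Two of the considerations above, versioned URLs and CDN fallback, can be sketched in a few lines. Everything named here is an assumption for illustration: `cdn.example.com` is a placeholder domain, `versioned_url` and `fetch_with_fallback` are hypothetical helpers, and using a content hash as the version is one option among several (a build number or timestamp would also work).

```python
import hashlib
from typing import Callable

CDN_BASE = "https://cdn.example.com"  # placeholder CDN domain, not a real endpoint


def versioned_url(path: str, content: bytes, base: str = CDN_BASE) -> str:
    """Build a CDN URL whose version changes whenever the content changes.

    A changed file gets a new URL, so stale cached copies are simply never
    requested again and no explicit invalidation call is needed.
    """
    version = hashlib.sha256(content).hexdigest()[:12]
    return f"{base}{path}?v={version}"


def fetch_with_fallback(
    path: str,
    fetch_cdn: Callable[[str], bytes],
    fetch_origin: Callable[[str], bytes],
) -> bytes:
    """Try the CDN first; on a network-level failure, go to the origin directly."""
    try:
        return fetch_cdn(path)
    except OSError:  # covers ConnectionError, TimeoutError, urllib's URLError
        return fetch_origin(path)


def cdn_is_down(path: str) -> bytes:
    raise ConnectionError("simulated CDN outage")


print(versioned_url("/app.js", b"console.log('v1')"))
print(fetch_with_fallback("/app.js", cdn_is_down, lambda p: b"from origin"))
```

The fetchers are passed in as functions so the fallback logic stays independent of any particular HTTP client.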