
Ján Regeš

How to build a CDN (3/3): security, monitoring and practical tips

In the first two articles, you learned which components you can build a CDN from and how to set up the servers and reverse proxies (the CDN cache).

In this third and final article of the series, we add tips and recommendations on how to secure your own CDN, protect it from attacks, monitor it, and develop it further.

At the very end you will find various interesting facts and lessons that implementing our own CDN has taught us, as well as some off-topic information. I also apologize for the very late publication of this third article. Some of the information in the conclusion is no longer completely up to date, but I believe it is still useful. I wish you pleasant reading :)

Security

  • Since your reverse proxy will also forward attackers' requests to the origin servers, deploy one of the available WAFs (Web Application Firewall) so that obviously malicious requests are rejected outright and not sent to the origin unnecessarily. In many situations this can also prevent cache-poisoning attacks. If you choose Nginx, we recommend ModSecurity or Nemesida WAF. Even their basic OWASP Top 10 rule sets will serve you well. The downside of Nemesida is that it also needs RabbitMQ to run; the upside is that a background process continuously updates its rules from a maintained database of known vulnerabilities.
  • If you also want to serve file types from the CDN that are subject to CORS (e.g. fonts), then your CDN needs to return the correct Access-Control-Allow-Origin header for CORS requests that carry the Origin request header. We have this configurable per origin and by default only allow loading from the origin's own domain. The value * is not recommended. The correct approach is to keep a set of trusted origins in your web server or application configuration and return the header only for a trusted origin (see the CORS sketch after this list). It is also good to be aware of what caching of CORS headers can do, so consider sending the Vary: Origin header as well.
  • For CDNs, as with regular application servers, we recommend setting security headers. If your CDN serves mainly static content, then especially the X-Content-Type-Options: nosniff header, and possibly also X-Frame-Options or X-XSS-Protection, which make sense mainly for HTML, but possibly also for SVG or XML. Don't forget HSTS (the Strict-Transport-Security header) either, so that the browser enforces HTTPS internally and does not allow downgrades to HTTP (see the hardening sketch after this list).
  • To make sure your CDN is not vulnerable to cache poisoning, we recommend setting the various buffers and limits much more strictly than origin servers usually do. If you can afford it, it is also better to ignore incoming HTTP headers and forward only a few relevant ones to the origins (e.g. Accept, Accept-Encoding, Origin, Referer, User-Agent). It is also worth considering not caching the HTTP code 400 Bad Request, and definitely not caching e.g. 413 Request Entity Too Large.
  • When deploying TLS 1.3 with 0-RTT (early data), you need to consider the risk of replay attacks. Since our CDN is optimized for and strict about static content only and blocks POST/PUT/PATCH/DELETE requests, the risk of real abuse is close to zero. Furthermore, data modification in an application should never be implemented as a GET request, but at least as a POST with a CSRF token, which should ideally also have one-time validity (a nonce).
  • You can defend against DNS spoofing by pointing Nginx upstreams to the origins via IP addresses, not hostnames. We host projects for most clients on our own clustered solutions that make sites reachable through a primary as well as a secondary datacenter and through multiple different IP addresses, so even a CDN upstream to a single origin load-balances across 2-3 IP addresses at different ISPs. If you do have to use hostnames, we recommend running at least Dnsmasq as a local DNS cache.
  • This will not protect you from a DDoS attack, but you can defend against a DoS attack from a single IP address by setting rate limiting (the maximum number of requests per second or minute from a single IP) and connection limiting (the maximum number of concurrently open connections from a single IP). We recommend studying and understanding the burst and delay or nodelay parameters, which fundamentally affect the behavior once an IP address starts to exceed the limits (see the rate-limiting sketch after this list). We typically use multiple levels of rate limiting on application servers. For POST/PUT/PATCH/DELETE requests we also limit the number of requests per minute as a matter of principle - this effectively prevents brute-force attacks.
  • In addition to the HSTS header, force an immediate redirect from HTTP to HTTPS.
  • If a request arrives for a URL where the host is an IP address or another unsupported domain, use return 444; – Nginx immediately terminates such a connection (see the hardening sketch after this list).
  • Be aware of the risk of request loops and implement at least basic protection against them - for example, refuse to process a URL whose path contains any of the domains that the CDN "listens" on.
  • If you don't want someone to be able to embed content from a specific origin into third-party pages (and thus drain your bandwidth), you can use the valid_referers directive, which sets the $invalid_referer variable according to your rules.
  • Test your HTTPS configuration at SSLLabs.com - you should easily achieve an A+ grade. You can also check your security headers at SecurityHeaders.com.
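
As an illustration of the trusted-origins approach from the CORS point above, here is a minimal Nginx sketch. The origin domains and the /fonts/ location are placeholders; adapt the map to your own list of allowed origins:

```nginx
# Echo back only origins we explicitly trust; unknown origins get no CORS header.
map $http_origin $cors_origin {
    default                   "";
    "https://example.com"     $http_origin;
    "https://www.example.com" $http_origin;
}

server {
    # ...
    location /fonts/ {
        # add_header with an empty value sends nothing, so untrusted origins
        # simply receive no Access-Control-Allow-Origin header at all
        add_header Access-Control-Allow-Origin $cors_origin;
        # make caches keep separate variants per requesting Origin
        add_header Vary Origin;
    }
}
```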
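
A minimal sketch of the rate limiting and connection limiting mentioned above; the zone sizes and numeric limits are purely illustrative and must be derived from your own measured traffic peaks:

```nginx
# http {} context: track request rate and concurrent connections per client IP
limit_req_zone  $binary_remote_addr zone=per_ip_req:10m  rate=30r/s;
limit_conn_zone $binary_remote_addr zone=per_ip_conn:10m;

server {
    # Allow a short burst (a page pulling many assets at once) without delaying
    # it (nodelay), but reject sustained excess with 429 instead of the default 503
    limit_req zone=per_ip_req burst=60 nodelay;
    limit_req_status 429;

    # Maximum number of simultaneously open connections from one IP address
    limit_conn per_ip_conn 40;
    limit_conn_status 429;
}
```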
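
And a hardening sketch combining the default-server rejection, the HTTP-to-HTTPS redirect and the security headers from the points above. The hostname my.cdn.com is the example used elsewhere in this article, and ssl_reject_handshake requires Nginx 1.19.4+:

```nginx
# Anything that does not match a served hostname (incl. bare IP access) is dropped
server {
    listen 80  default_server;
    listen 443 ssl default_server;
    ssl_reject_handshake on;   # no certificate needed for the catch-all vhost
    return 444;                # close the connection without any response
}

# Plain HTTP only redirects to HTTPS
server {
    listen 80;
    server_name my.cdn.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl http2;
    server_name my.cdn.com;

    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
    add_header X-Content-Type-Options "nosniff" always;
    # ... certificates, caching and proxying to origins ...
}
```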

In case you don't have a router in front of the servers that forwards/NATs only selected ports to your server, don't forget to set up an iptables/nftables firewall. By default everything should be denied and only TCP ports 80 and 443 explicitly allowed; beyond that, you can allow IPsec, SSH, etc. from your own IP addresses. Security-wise, it has worked well for us for a long time to bind every service that allows it only to the loopback interface and to expose only selected ports from the outside using DNAT in the firewall. You can apply some high per-IP rate limiting even with DNAT at the network level; it is nicely described in the article Per-IP rate limiting with iptables. We also recommend disabling ICMP almost completely, although you will probably at least allow echo-request because of the various online tools for measuring latencies in different parts of the world, such as CDN Latency Benchmark. A minimal sketch follows.
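
A minimal nftables sketch of the default-deny policy described above, written as a config-file snippet. The management subnet 203.0.113.0/24 is a placeholder, and the ICMP rule keeps echo-request usable (but rate-limited) for the latency-measuring tools mentioned above:

```
table inet filter {
    chain input {
        type filter hook input priority 0; policy drop;

        ct state established,related accept
        iif "lo" accept

        # public services of the CDN
        tcp dport { 80, 443 } accept

        # SSH only from a trusted management range (placeholder)
        ip saddr 203.0.113.0/24 tcp dport 22 accept

        # keep ping working, but rate-limited
        icmp type echo-request limit rate 10/second accept
    }
}
```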

DDoS attack protection

The most expensive and most effective DDoS protection for your CDN would be to use anycast IP addresses (which most commercial CDN providers don't even have) and robust DDoS protection from commercial providers, who have very powerful appliances on the backbone that, when an attack is detected, activate mitigation and "scrubbing" of the traffic for your IP ranges. Some of these solutions manage to clean even the largest DDoS attacks of up to hundreds of Gbps. However, they cost thousands of USD per month, and you definitely cannot afford them at every PoP in the world. From our experience, we can recommend NetScout's Arbor.

Just as a point of interest: from February 25 to 27, 2018, one of our hosted Czech clients was the target of a 230 Gbps DDoS attack built on memcrashed (which amplifies a UDP attack by up to tens of thousands of times, not tens/hundreds as with DNS or NTP amplification attacks). The first big memcrashed attack hit Cloudflare, and just one day later we were the first in the Czech Republic who had to deal with it. If you don't pay for robust DDoS protection, expect that in the event of a massive attack the only call you get will be from your ISP, telling you that in order to protect the entire data center and all its clients, they must completely blackhole your IP subnets on the backbone network until the attack ends.

So if you are serious about a CDN and its high availability, you need to have DDoS protection arranged at least for the main PoPs; at worst, at least a few of them should be able to withstand even the biggest attacks. If you use GeoDNS with auto-failover as I described in the first article, and if you follow the rule of always returning at least 2 IP addresses of independent providers in each world location, CDN users may not even notice some DDoS attacks.

What we based our DDoS protection design on:

  • We have very rich statistics from all our PoPs, and for some of them also from the routers. We therefore have a detailed overview of legitimate traffic and its trends - the number of open connections, packets, unique IP addresses and GEO information about them. For some PoPs we also collect and monitor NetFlow data. Having detailed information about legitimate traffic and its peaks is key - only on that basis is it possible to make correct decisions and propose optimal limits for activating mitigation.
  • From all the DDoS attacks that we have dealt with in the past, we know that more than 99% of the source IP addresses involved were outside the Czech Republic - that is, outside the country of the majority of our visitors.
  • We certainly cannot afford anycast IP addresses for our PoPs. However, there are a few providers on the market that offer physical or virtual servers with anycast IP addresses.
  • Anycast IP addresses and robust DDoS protection are provided by our DNS providers (Constellix, Cloudflare and ClouDNS). Their infrastructure is robust enough that their NS servers should withstand DDoS attacks.
  • We have robust DDoS protection capable of handling hundreds of Gbps at some PoPs in the Czech Republic. The other PoPs have to make do with whatever DDoS protection the given ISP offers for its entire network (most of them have it, at least for an additional fee).
  • Due to the nature of a CDN (different resolved IP addresses in different parts of the world), you might expect higher resistance to DDoS attacks. That is true, but only partially. Finding out all the IP addresses that your CDN domain/hostname resolves to in different corners of the world is a matter of minutes. The attacker therefore needs to direct the attack at multiple IP addresses (which are not anycast either), so attacking the entire CDN network is only several times more difficult (or requires more power), but not impossible. But if they attack only the domain, then attack sources from, for example, Asia will really only affect PoPs in Asia, so in our case the impact on legitimate primary visitors is almost zero.
  • Except for a few exceptions with 10 Gbps, we have at most a 1 Gbps or 2×1 Gbps (bonded) line everywhere. That's a pretty thin pipe; however, most of the world's DDoS attacks are statistically smaller attacks of around 2-5 Gbps, so with the firewalls on our routers or in Linux set up optimally, we can withstand them quite decently.
  • We have GeoDNS with one-minute health checks and automatic failover, so in the event of a successful attack (unavailability of some IP addresses/ports) we can add backup PoPs to the CDN network that the attacker did not know about until then (DNS resolution had never revealed them), or known PoPs that do have robust DDoS protection.
  • We know that at some PoPs, 90% of legitimate traffic comes from IP addresses of a specific country/continent. We can take this into account when setting up geo-based limiting.

A couple of tips on how to handle DDoS protection at the end-server level:

  • When you use a Linux firewall or Linux-based routers, drop all unwanted traffic directly in the raw table (UDP, ICMP, or TCP ports other than 80/443). For UDP, only allow responses from the IP whitelist of the DNS servers you use. This way you protect the end devices (servers or routers) against UDP amplification attacks and ICMP floods as effectively as possible. If you only do it in the standard filter table, which comes after connection tracking, it is already too late - the CPU has had to deal with every connection or packet (prerouting, connection tracking, mangle, nat, filter) and each open connection also allocates memory. See the raw-table sketch after this list.
  • A TCP SYN flood on ports 80/443 can be mitigated with rate limiting (limit in iptables, or dst-limit), where you define how many new connections with the SYN flag per unit of time (typically a second or a minute) you accept (globally, or per source/destination IP address or port). As with Nginx, the key here is to properly understand the meaning of the burst setting and the leaky-bucket algorithm. Be sure to enable SYN cookies.
  • You can protect L7 itself (HTTP/HTTPS traffic) with rate and connection limiting on the firewall and, secondarily, in Nginx (which, however, will never be as effective as the firewall).
  • For PoPs where you know the vast majority of legitimate traffic is local, download the IP subnets of the relevant country/countries (e.g. from ip2location.com). For example, on PoPs in the Czech Republic you can apply more lenient rate limiting to Czech subnets but be significantly stricter for other countries. As long as the size of the attack does not exceed the capacity of your pipe (connectivity), most visitors from the Czech Republic will most likely not even notice the outage, and you will filter out the foreign attacking IP addresses. Good routers let you do this easily, including dynamically building an IP blacklist (which you then drop directly in the raw table). If you only use a firewall in Linux, you can manage these IP lists with ipset. Whichever firewall you use, study how "chains" are defined in order to minimize the number of rules that connections/packets must traverse before their final acceptance or rejection. Use DROP, not REJECT, for rejection. If your firewall allows it and you have plenty of memory, you can also use TARPIT in some TCP situations and slow the attacker down.
  • Extra tip (our non-standard but functional solution for medium-sized DDoS attacks on L7): an L7 DDoS attack is a situation where an attacker sends thousands of HTTP or HTTPS requests per second to your servers from thousands of unique IP addresses all around the world. Usually these L7 attacks are "only" hundreds of Mbps or single-digit Gbps, so you can handle them. To give you an idea: to generate 1 Gbps of inbound traffic with 500 B (byte) HTTP/HTTPS requests, an attacker needs to sustain 250,000 requests per second. The proposed solution is best implemented on the router, or in the software firewall of your server (iptables/nftables and ipset). It consists of defining several levels of connection-limit rules with different limits for differently sized IP subnets (e.g. /3, /8, /16, /24); when the number of open connections from a given IP subnet exceeds its limit, you add the IP address (or, in extreme cases, the entire subnet) to a temporary blacklist (technically, on Linux, an ipset with a timeout), which DROPs all traffic from that source right at the input, in the raw table (see the ipset sketch after this list). Usually, even in a DDoS attack, several requests come from each source IP address at the same time. A /3 subnet will temporarily block an eighth of all global IP addresses, or even all 8 /3 subnets if it is a really extensive DDoS attack. But if you combine this with the previous recommendation and process traffic from Czech IP addresses (or the domestic IP addresses of the given PoP) earlier and with higher limits, the majority of visitors still reach the reverse proxy (cache) and the CDN keeps working, although it will serve content more slowly due to saturated connectivity. From other corners of the world (high-traffic IP subnets), however, you will temporarily drop traffic at the given PoPs, and the attacker will believe they have brought down your servers (= a successful DDoS attack), because ports 80 and 443 will be unreachable for them. Of course, you then need to whitelist your origin servers and the IP addresses through which monitoring, IPsec, DNS, etc. connect to the servers. This solution is a bit unusual and we invented it ourselves, but it works very well even during a real DDoS attack. However, the individual limit levels need to be set with care, based on the maximum number of open TCP connections during peaks, which your monitoring will show you. Then, for example, you can set the limit on the number of open TCP connections for an entire huge /3 subnet to 5-10 times the previous peak. This will not limit legitimate traffic, and you may be able to withstand the DDoS attack.
  • If you have the means, test your DoS and DDoS protections, analyze the behavior and monitor the related load. There are also online tools that, for a fee, can generate quite a lot of traffic from a large number of unique IP addresses - and it is nothing immoral from the dark net.
  • Design some active mechanisms that will immediately notify you of an ongoing attack - for example, by monitoring the size of the blacklist queue.
  • In any case, when designing these protections it is good to know how, why and for how long TCP connections stay open, what governs this, and how it behaves from a TCP point of view with today's predominantly HTTP/2 traffic. When reacting to an active attack, you can automatically and temporarily reduce various timeouts in the TCP stack or on the web server, start sending the Connection: close header, etc.
  • It is also worth mentioning Fail2Ban; however, given the way its detection works, it is not much help during a large-scale DDoS attack where tens or hundreds of thousands of lines per second start appearing in the log. Logging alone then easily writes 10 MB/s to disk, and if you did not have access-log buffering turned on, it causes extreme IOPS.
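
A raw-table sketch for the first two points of this list. The resolver IP 203.0.113.53 and all limit values are placeholders that have to be tuned against your own measured peaks, and hashlimit is just one possible way to cap per-source SYN rates:

```bash
# Drop unwanted UDP/ICMP before connection tracking ever sees it (raw table).
# Allow UDP answers only from the resolvers we actually use.
iptables -t raw -A PREROUTING -p udp -s 203.0.113.53 --sport 53 -j ACCEPT
iptables -t raw -A PREROUTING -p udp -j DROP
iptables -t raw -A PREROUTING -p icmp --icmp-type echo-request -m limit --limit 10/second -j ACCEPT
iptables -t raw -A PREROUTING -p icmp -j DROP

# Cap the rate of new SYNs to 80/443 per source IP
iptables -A INPUT -p tcp -m multiport --dports 80,443 --syn \
  -m hashlimit --hashlimit-name syn_http --hashlimit-mode srcip \
  --hashlimit-above 200/second --hashlimit-burst 400 -j DROP

# SYN cookies as the last line of defence
sysctl -w net.ipv4.tcp_syncookies=1
```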
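
And an ipset sketch of the temporary blacklist from the "extra tip" above, shown for a single per-IP limit level only. In practice you would add further connlimit rules with different --connlimit-mask values (/24, /16, /8, /3) and whitelist your origins, monitoring and IPsec peers before these rules; all numbers are illustrative:

```bash
# Blacklist with automatic expiry; offenders are dropped as early as possible
ipset create cdn_blacklist hash:ip timeout 600
iptables -t raw -A PREROUTING -m set --match-set cdn_blacklist src -j DROP

# One limit level: a single IP holding too many concurrent connections to
# 80/443 gets added to the blacklist for 10 minutes (timeout refreshed on hit)
iptables -A INPUT -p tcp -m multiport --dports 80,443 \
  -m connlimit --connlimit-above 200 --connlimit-mask 32 \
  -j SET --add-set cdn_blacklist src --exist --timeout 600
```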

Monitoring

Regardless of how many servers your CDN consists of, you need to actively and passively monitor them.

We use Nagios for active monitoring and Munin for quick basic graphs of vital signs. In Munin, we can also quickly view trend charts spanning several years. This is simply not possible in the Kibana mentioned below (part of the Elastic Stack) due to the size of the indexes, unless you use transformations/rollups into archive indexes.

For more live statistics we use 2 other tools:

  • We use collectd to collect metrics of all vital functions (CPU, RAM, IOPS, storage, network, Nginx) and send everything to Kibana.
  • Using Filebeat, we ship all access and error logs to another Kibana. From Ansible, we generate the Nginx vhosts so that each origin has its own access and error log.

In the individual Kibanas, we have dashboards summarizing CDN traffic as a whole as well as breakdowns by individual servers (PoPs). Thanks to evaluating absolutely all metrics from the access logs, we have detailed information about, for example:

  • cache hit-ratio
  • statistics of IP addresses and rendering of traffic to the GEO map of the world
  • statistics of HTTP codes
  • statistics of data transfers (we collect the sizes of requests and responses)
  • response time statistics
  • breakdown by servers (PoPs) or individual GEO locations
  • breakdown by origin domains
  • breakdown by content types (JS/CSS/images/fonts/audio/video)
  • breakdown by specific URLs.

We recommend monitoring the DNS resolution of your CDN domains as well, so that you can constantly check whether the GeoDNS providers always return the expected sets of IP addresses. We implemented this monitoring as follows:

  • Nagios runs the checks listed below every minute and immediately notifies us by e-mail and SMS of unexpected states or slow responses from the NS (name servers).
  • We wrote a Nagios plugin which receives the NS server (e.g. ns11.constellix.com, or perhaps 8.8.8.8), the tested domain (e.g. my.cdn.com), a set of expected IP addresses, the minimum number of IP addresses from that set that must appear in the answer and, of course, the maximum allowed response time and timeout of the NS server. If the DNS answer does not contain the expected minimum number of IP addresses from the set, or the domain resolves to some other IP address(es), or resolution takes too long, notifications are sent (a simplified sketch follows after this list).
  • In this way, every minute we test absolutely all authoritative NS servers of our GeoDNS providers (6× NS Constellix and 4× NS ClouDNS).
  • Every minute we also check correct DNS resolution on the popular public recursive NS servers of Google (8.8.8.8) and Cloudflare (1.1.1.1) to make sure there is no hitch on the way between the authoritative and recursive DNS servers.
  • We run this monitoring both from our servers in the Czech Republic and from other countries through NRPE agents, so that, for example, the plugin running on a German server checks that DNS resolves to the IP addresses of our German PoPs.
  • We record the results of all these checks in daily-rotated logs, which serve as a basis for retroactive analysis of problems or anomalies if needed.
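
A heavily simplified sketch of the same idea in bash, using dig, with the script name, arguments and Nagios thresholds reduced to a minimum (all of them are illustrative, not our actual plugin):

```bash
#!/usr/bin/env bash
# Usage: ./check_cdn_dns.sh <ns-server> <domain> <min-matches> <expected-ip> [more-ips...]
NS="$1"; DOMAIN="$2"; MIN="$3"; shift 3

# Ask the given NS server directly, with a short timeout
ANSWER=$(dig +short +time=2 +tries=1 "@${NS}" "${DOMAIN}" A)
if [ -z "${ANSWER}" ]; then
    echo "CRITICAL: ${NS} returned no A records for ${DOMAIN}"
    exit 2
fi

# Count how many of the expected IP addresses appear in the answer
FOUND=0
for ip in "$@"; do
    grep -qx "${ip}" <<< "${ANSWER}" && FOUND=$((FOUND + 1))
done

if [ "${FOUND}" -ge "${MIN}" ]; then
    echo "OK: ${DOMAIN} @ ${NS} -> ${FOUND} expected IP(s) present"
    exit 0
fi
echo "CRITICAL: ${DOMAIN} @ ${NS} -> only ${FOUND}/${MIN} expected IP(s): ${ANSWER//$'\n'/ }"
exit 2
```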

Other useful tools

  • We recommend vnstat for quick network traffic statistics on individual servers. Commands such as vnstat -l for live info, or the statistical vnstat -h, vnstat -d and vnstat -m, are often useful. iptraf-ng serves for a detailed analysis of current traffic. For an overview of TCP connections, use ss -s or e.g. ss -at.
  • For a quick live overview of what the server is currently doing in all important areas, we prefer dstat, specifically with the dstat -ta switches. And of course htop.
  • If you don't have experience with Kibana yet, take a look at Grafana with InfluxDB. We've been using Kibana for years and have hundreds of custom visualizations and dashboards in it (that's why it was our first choice), but our latest experience is that Grafana with InfluxDB is overall faster, especially for long-term dashboards. However, the concept of working with data, creating visualizations and dashboards is quite different.

Tips and highlights from the implementation

  • When implementing some functionality in Nginx, you will sooner or later run into one inconvenience: the add_header directive is not inherited additively. If you set add_header at the server level and then also inside a location, only the headers set in the location are sent in the end, while those set one level higher at the server level are ignored. For that reason it is better to use the headers-more module and its directives, which do behave inheritably (more_set_headers, more_clear_headers, more_set_input_headers, more_clear_input_headers) - see the headers-more sketch after this list.
  • If you use Debian, I recommend using the repository from Ondřej Surý (packages.sury.org) - thank you, Ondřej, for maintaining it - which, in addition to the latest versions of Nginx, also contains compatible builds of the headers-more module.
  • The standard of reliability for us has long been HAProxy, which we have been using for many years for load balancing and various auto-failover scenarios. Moreover, since version 2 it has completely redesigned and improved HTTP request handling and more robust HTTP/2 support. We first tried to use HAProxy instead of Nginx, but unfortunately it has only very limited caching capabilities, which is critical for a CDN. However, we would certainly use HAProxy as a load balancer if we ever had multiple servers behind one PoP.
  • If you want maximum performance, we recommend trying H2O instead of Nginx - https://h2o.examp1e.net/. We have many years of experience with Nginx, so even the more complex scenarios are already fully automated in Ansible; rewriting everything for H2O would definitely be interesting, but also quite time-consuming. In addition, the ratio of roughly 500 open to 650 closed issues on GitHub suggests that it is not yet completely production ready.
  • If you have even greater demands on cache behavior, we recommend Varnish instead of Nginx. Nginx is great and, according to our measurements, slightly faster, but with Varnish you get, for example, cache-tagging support through HTTP headers, so you can selectively invalidate the cache of all URLs carrying a given tag. This can be very useful, e.g. in combination with caching of POST requests (e.g. on a GraphQL API), where after detecting a change of some entity on the backend you could invalidate all relevant caches on the API layer. Today we cache and invalidate this way at the application layer, and our future goal is to cache at the data level in the CDN as well. For future web projects we want to stick to the JAMstack philosophy, where such a CDN with smart options for selective cache invalidation plays a key role. Therefore, we will definitely be using Varnish for our CDN in the future, probably in combination with Nginx.
  • If you want to support HTTP/3 (QUIC), we recommend quiche from Cloudflare, or lsquic, which is part of the OpenLiteSpeed web server. For now, we are just experimenting with HTTP/3. It requires BoringSSL instead of OpenSSL and, in addition, the older Nginx 1.16 branch.
  • UPDATE: The point above was valid at the end of 2021. As of early 2024, HTTP/3 support is available directly in Nginx (see the HTTP/3 sketch after this list). We are still cautious about deploying HTTP/3, mainly because of the risk of DoS/DDoS attacks over UDP, for which we do not yet have sufficient protection mechanisms.
  • If you use virtualized servers and have supported hardware, use SR-IOV and the ixgbevf driver with InterruptThrottleRate=1. The queue of incoming requests will be processed more efficiently and the CPU load will also be reduced.
  • If you have a lot of CPU cores and optimize for hundreds of thousands of requests per second, also focus on RPS (Receive Packet Steering), because usually only one CPU core processes the incoming queue.
  • For those who are also interested in various network details regarding browser requests, HTTP/2 streams or DNS resolution, we recommend studying the tools around Google Chrome - specifically chrome://net-internals/, chrome://net-export/ and the related tool https://netlog-viewer.appspot.com/. They help us understand how the behavior of HTTPS requests influences the rendering of the page itself, and reveal blind spots where something is waiting, etc.
  • If you really want to understand HTTP/2 and optimize the loading speed of your pages, install nghttp2 and understand how HTTP/2 communicates directly with your website. You can try, for example, the command nghttp -nv https://dev.to/.
  • The performance of the server and its connectivity can be easily tested, e.g. using the one-line nench benchmark.
  • When hosting large files, bear in mind that even if the client makes a byte-range request, your CDN must first load the entire file from the origin (and cache it) and only then return the requested chunk from it. That is why it may be better, if you have the option, to push videos and other large files to the CDN before visitors start accessing them. You can also help yourself with the Nginx slice module, which downloads and caches only configurable "chunks" from the origin (see the slice sketch after this list).
  • Beware of the popular and sometimes slightly treacherous public recursive DNS servers of Google (8.8.8.8, 8.8.4.4) or Cloudflare (1.1.1.1). It is not uncommon for them to occasionally resolve requests from Czech visitors to foreign IP addresses. It happens only once every few hours or days and usually lasts just a few minutes.
  • Although the CDN PoPs as such are functionally fully autonomous and independent, you will still need a connection to some central location for management, monitoring or e.g. the distribution of cache-purge requests. Therefore, set up IPsec tunnels using strongSwan or WireGuard, whose configuration can be very nicely automated.
  • When implementing cache purging, you can start from the nginx-cache-purge script, which shows how cache files can be efficiently found by URL or mask. I also recommend the articles Purging cached items from Nginx with Lua and Improving NGINX LUA cache purges. We decided to base ours on this Lua script and just added a few of our own modifications. If you script it in Lua, we recommend creating a vhost listening on a non-standard port that is only reachable through an IPsec tunnel. If you also implement static brotli/gzip compression, don't forget to delete the .br/.gz files, and likewise any .webp/.avif files.
  • If you deploy your own or a commercial CDN in front of your entire domain, be aware of one potential vulnerability that is easy to overlook. Respect the client IP address from the X-Forwarded-For header only if the request reaches you from specific, known public IP addresses of the CDN servers. In Nginx, trusted sources are defined via the realip module and the set_real_ip_from directive (see the realip sketch after this list). Never use something like set_real_ip_from 0.0.0.0/0. If some part of the domain or application functionality is restricted to an IP whitelist, an attacker could otherwise spoof a different IP address via the HTTP header.
  • If you decide to use a commercial CDN, we can recommend the Czech CDN77, because their support can ensure that all requests to your source origin domain come only from a few fixed IP addresses in the Czech Republic (their CDN proxy servers), which you can then set as trusted. CDN providers usually do not give you the complete list of possible IP addresses of their PoPs, and you cannot rely on them sending a header such as Via: cdn-provider in requests - that is simply not safe and can be easily spoofed, even though the support of CDN providers will often recommend exactly such a dangerous solution.
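
A headers-more sketch reflecting the add_header inheritance pitfall described in the first point of this list; the header names and the /fonts/ location are just examples:

```nginx
server {
    # With plain add_header, the location below would silently drop this header,
    # because its own add_header overrides everything from higher levels.
    more_set_headers "X-Content-Type-Options: nosniff";

    location /fonts/ {
        # With headers-more (as described above), both headers are sent
        # for responses from this location.
        more_set_headers "Cache-Control: public, max-age=31536000, immutable";
    }
}
```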
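
An HTTP/3 sketch for native support in Nginx 1.25+ as mentioned in the 2024 update above; remember that UDP/443 must also be opened (and ideally rate-limited) on the firewall:

```nginx
server {
    listen 443 ssl;
    listen 443 quic reuseport;   # HTTP/3 runs over UDP/443
    http2 on;
    http3 on;

    # advertise HTTP/3 to browsers connecting over HTTP/1.1 or HTTP/2
    add_header Alt-Svc 'h3=":443"; ma=86400' always;
}
```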
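
A slice sketch for byte-range friendly caching of large files; the location, chunk size and upstream name are placeholders:

```nginx
location /video/ {
    slice              1m;                              # fetch origin content in 1 MB chunks
    proxy_set_header   Range $slice_range;              # ask origin only for the needed chunk
    proxy_cache_key    $uri$is_args$args$slice_range;   # cache each chunk separately
    proxy_cache_valid  200 206 24h;
    proxy_pass         https://origin_upstream;         # placeholder upstream
}
```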
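
And a realip sketch for the X-Forwarded-For point; 203.0.113.0/24 stands in for the real public ranges of your CDN proxies:

```nginx
# Trust X-Forwarded-For only when the request comes from our own CDN ranges
set_real_ip_from  203.0.113.0/24;
real_ip_header    X-Forwarded-For;
real_ip_recursive on;
```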

Conclusion and practical experience

We hope that this series of 3 articles has helped you and shown how you can build a CDN yourself. We have described what it consists of and how you can assemble and configure the individual components on your own. But carefully consider whether it is really worth it for your needs, and keep in mind that you will have several servers running around the world that need to be paid for, looked after and patched.

Our CDN, built as described in this series, works great. We keep it under close scrutiny and carefully monitor traffic on all servers, both actively and passively. Its performance and speed in browsers is even higher than that of commercial CDNs (thanks to static compression and also to the fact that most of the content fits in RAM, since we don't have thousands of clients). We are gradually deploying it in the projects we develop for our clients. Thanks to this, we cover Europe in particular very well and under our own power; to cover remote corners of the world just as well, we use a commercial CDN in those secondary locations. We know that we provide our clients with a quality service at a good price. And technically, we have another interesting project in our portfolio that is alive and brings real value.

In addition, since the production deployment in 09/2019, not a single problem has appeared - all components work flawlessly. We tried not to underestimate anything: the production deployment was preceded by stress and penetration tests, we looked for post-mortems of various successful attacks on commercial CDNs and tuned our configuration according to them, and we first tested the functionality on various non-production environments of our client projects. Search engines detect the use of our CDN correctly - even though the images are loaded from the CDN domain, they are indexed correctly under the origin domain.

In the future, we will consider splitting the CDN into two parts - one optimized especially for many small, frequently loaded files (e.g. JS/CSS/icons/fonts) and the other for larger files (e.g. audio/video or large images). Such a solution has 2 advantages: the browser can use even more parallelism when rendering the page (assets will be loaded from several different CDN domains/IP addresses depending on their type), and each part can be fine-tuned more precisely to its level of traffic, with more efficient use of the cache or better HW selection.

We are still considering the option of using our CDN as a reverse proxy in front of the entire client domain, i.e. for all requests including POST/PUT/DELETE. This would give us another level of DDoS protection in front of the origin servers, but we would lose other benefits - especially the targeted optimization for static content and the higher parallelism in browsers gained by loading content from several different domains or IP addresses. At the same time, it would be very tempting to use multiple servers for different types of content at each PoP, with load balancing between these servers, e.g. according to the suffix in the URL. We have a lot of such possible improvements in the drawer, and maybe they will prove meaningful and pay off in the coming years.

I'm asking everyone - let's report bugs

Implementing and debugging the CDN also showed us that all technologies have flaws. The more super-features someone ships, the more bugs they introduce - regardless of whether it is developed and tested by one person or by thousands. That's why I have one personal off-topic request: please don't be lazy, and when we encounter a problem, let's report it to the authors and not expect someone else to do it for us. This way we solve the community's problem as well as our own, and at the same time we learn a lot more, because we often have to dig deep. It also teaches us to communicate things to the other side in an understandable form.

I used to not do it myself, thinking "they will surely find out and fix it themselves quickly". A mistake and faulty reasoning, which I admitted to myself over time...

In recent years, however, I have reported or helped fix various bugs and problems myself - for example in Firefox (bugs in behavior and headers around AVIF), Google Chrome (problems with CORS vs. cache vs. prefetching), the Nginx web server (HTTP/2), PHP (OPcache), the ELK stack (UI/UX errors in Kibana and Grok in Logstash), Mikrotik RouterOS and GlusterFS. I also have 13 tickets for MariaDB and the MaxScale proxy. Although I could not help with these technologies as a developer, I at least provided enough comprehensible information so that the developers could quickly understand, reproduce and fix the problems. If you happen to be making some resolutions for 2024, the willingness to open well-described tickets or send PRs could be one of them.

If you are interested in any other CDN-related details, ask in the comments or reach out on X/Twitter @janreges. I will be happy to answer.

Test your websites with my analyzer

In conclusion, I would like to recommend one of my personal open-source projects, with which I would like to help improve the quality of websites around the world. The tool is available as a desktop application and also as a command-line tool usable in CI/CD pipelines, for Windows, macOS and Linux.

SiteOne Crawler - Free Website Analyzer

I launched it at the end of 2023 and I believe it will help a lot of people improve the security, performance, SEO, accessibility and other important aspects of a quality web presence or application. It's called SiteOne Crawler - Free Website Analyzer, and I also wrote an article about it. Below you will find 3 explanatory videos - the last one also shows the report it will generate for your website.

In addition to various analyses, it also offers, for example, exporting the entire website into an offline form, so you can browse it from a local disk without the internet, or generating sitemaps.

Sharing this project with your colleagues and friends will be the greatest reward for me for writing these articles. Thank you and I wish you all the best in 2024.

Desktop Application

Command-line tool

HTML report - analysis results
