
Kurt Mackey for Fly.io

Originally published at fly.io

Build a CDN in about 5 hours

The term "CDN" ("content delivery network") conjures Google-scale companies managing huge racks of hardware, wrangling hundreds of gigabits per second. But CDNs are just web applications. That's not how we tend to think of them, but that's all they are. You can build a functional CDN on an 8-year-old laptop while you're sitting at a coffee shop. I'm going to talk about what you might come up with if you spend the next five hours building a CDN.

It's useful to define exactly what a CDN does. A CDN hoovers up files from a central repository (called an origin) and stores copies close to users. Back in the dark ages, the origin was a CDN's FTP server. These days, origins are just web apps and the CDN functions as a proxy server. So that's what we're building: a distributed caching proxy.
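To make "caching proxy" concrete before we get to the distributed part, here's a minimal sketch of one. The origin hostname is made up, and the paths, sizes, and timings are placeholders rather than recommendations; the directive names are standard NGINX.

```nginx
# Minimal caching proxy (lives inside the http {} block). Origin hostname is hypothetical.
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=cdn_cache:64m
                 max_size=10g inactive=24h use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_cache cdn_cache;
        proxy_cache_key $scheme$host$request_uri;
        proxy_cache_valid 200 301 302 10m;                 # cache "good" responses for a bit
        proxy_cache_use_stale error timeout updating;      # serve stale copies when the origin is unhappy
        add_header X-Cache-Status $upstream_cache_status;  # HIT / MISS / EXPIRED, handy while testing

        proxy_set_header Host $host;
        proxy_pass https://origin.example.com;             # the origin is just a web app
    }
}
```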

Caching proxies

HTTP defines a whole infrastructure of intricate and fussy caching features. It's all very intimidating and complex. So we're going to resist the urge to build from scratch and use the work other people have done for us.

We have choices. We could use Varnish (scripting! edge side includes! PHK blog posts!). We could use Apache Traffic Server (being the only new team this year to use ATS!). Or we could use NGINX (we're already running it!). The only certainty is that you'll come to hate whichever one you pick. Try them all and pick the one you hate the least.

(We kid! Netlify is built on ATS. Cloudflare uses NGINX. Fastly uses Varnish.)

What we're talking about building is not basic. But it's not so bad. All we have to do is take our antique Rails setup and run it in multiple cities. If we can figure out how to get people in Australia to our server in Sydney and people in Chile to our server in Santiago, we'll have something we could reasonably call a CDN.

Traffic direction

Routing people to nearby servers is a solved problem. You basically have three choices:

  1. Anycast: acquire routable address blocks, advertise them in multiple places with BGP4, and then pretend that you have opinions about "communities" and "route reflectors" on Twitter. Let the Internet do the routing for you. Downside: it's harder to do, and the Internet is sometimes garbage. Upside: you might become insufferable.
  2. DNS: Run trick DNS servers that return specific server addresses based on IP geolocation. Downside: the Internet is moving away from geolocatable DNS source addresses. Upside: you can deploy it anywhere without help.
  3. Be like a game server: Ping a bunch of servers and use the best. Downside: gotta own the client. Upside: doesn't matter, because you don't own the client.

You're probably going to use a little of (1) and a little of (2). DNS load balancing is pretty simple. You don't really even have to build it yourself; you can host your DNS with a company like DNSimple and define rules for returning addresses. Off you go!

Anycast is more difficult. We have more to say about this — but not here. In the meantime, you can use us, and deploy an app with an Anycast address in about 2 minutes. This is bias. But also: true.

Boom, CDN. Put an NGINX in each of a bunch of cities, run DNS or Anycast for traffic direction, and you're 90% done. The remaining 10% will take you months.

The Internet is breaking

The briny deeps are filled with undersea cables, crying out constantly to nearby ships: "drive through me"! Land isn't much better, as the old networkers shanty goes: "backhoe, backhoe, digging deep — make the backbone go to sleep". When you run a server in a single location, you don't so much notice this. Run two servers and you'll start to notice. Run servers around the world and you'll notice it to death.

What's cool is: running a single NGINX in multiple cities gives you a lot of ready-to-use redundancy. If one of them dies for some reason, there are a bunch more to send traffic to. When one of your servers goes offline, the rest are still there serving most of your users.

It's tedious but straightforward to make this work. You have health checks (aside: when CDN regions break, they usually break by being slow, so you'd hope your health checks catch that too). They tell you when your NGINX servers fail. You script DNS changes or withdraw BGP routes (perhaps just by stopping your BGP4 service in those regions) in response.

That's server failure, and it's easy to spot. Internet burps are harder to detect. You'll need to run external health checks, from multiple locations. It's easy to get basic, multi-perspective monitoring – we use Datadog and updown.io, and our own half-built homegrown service. You're not asking for much more than what cURL will tell you. Again: the thing you're super wary about in a CDN is a region getting slow, not falling off the Internet completely.
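A sketch of what that looks like, in the spirit of the repo's bash and nothing more: the region hostnames, the /healthz path, and the two-second threshold are all made up, and the "pull the region out of rotation" step is left as a comment because it depends entirely on your DNS or BGP tooling.

```bash
#!/usr/bin/env bash
# Naive multi-region health check. Hostnames, path, and threshold are hypothetical.
set -euo pipefail

REGIONS="syd.cdn.example.com scl.cdn.example.com iad.cdn.example.com"
MAX_SECONDS=2   # broken CDN regions are usually slow, not dead, so treat "slow" as failure

for region in $REGIONS; do
  # -f fails on HTTP >= 400, -m caps total time, -w prints how long the request took
  if total=$(curl -sf -o /dev/null -m "$MAX_SECONDS" -w '%{time_total}' "https://$region/healthz"); then
    echo "ok   $region ${total}s"
  else
    echo "FAIL $region"
    # Here you'd script the DNS change, or withdraw the BGP route for this region,
    # using whatever API your provider gives you. Deliberately left blank.
  fi
done
```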

Quick aside: notice that all those monitoring options work from someone else's data center to your data center. DC-DC traffic is a good start, enough for a lot of jobs. But it isn't representative. Your users aren't in data centers (I hope). When you're really popular, what you want is monitoring from the vantage point of actual clients. For this, you can find hundreds of companies selling RUM (real user monitoring), which usually takes the form of surreptitiously embedded Javascript bugs. There's one rum we like. It's sold by a company called Plantation and it's aged in wine casks. Drink a bunch of it, and then do your own instrumentation with Honeycomb.

Ridiculous Internet problems are the worst. But the good news about them is, everyone is making up the solutions as they go along, so we don't have to talk about them so much. Caching is more interesting. So let's talk about onions.

The Golden Cache Hit Ratio

The figure of merit in cache measurement is "cache ratio". Cache ratio measures how often we're able to serve a request from our cache, versus proxying it to the origin.

A cache ratio of 80% just means "when we get a request, we can serve it from cache 80% of the time, and the remaining 20% of the time we have to proxy the request to the origin". If you're building something that wants a CDN, high cache ratios are good, and low cache ratios are bad.

If you followed the link earlier in the post to the GitHub repository, you might've noticed that our naïve NGINX setup is an isolated single server. Deploying it in twenty places gives us twenty individual servers. It's dead simple. But the simplicity has a cost – there's no per-region redundancy. All twenty servers will need to make requests to the origin. This is brittle, and cache ratios will suffer. We can do better.

The simple way to increase redundancy is to add a second server in each region. But doing that might wreck cache ratios. The single server has the benefit of hosting a single cache for all users; with two, you've got twice the number of requests to the origin, and twice the number of cache misses.

What you want to do is teach your servers to talk to each other, and make them ask their friends for cache content. The simplest way to do this is to create cache shards – split the data up so each server is responsible for a chunk of it, and everyone else routes requests to the cache shard that owns the right chunk.

That sounds complicated, but NGINX's built-in load balancer supports hash-based load balancing. It hashes requests and forwards the "same request" to the same server, assuming that server is available. If you're playing the home version of this blog post, here's a ready-to-go example of an NGINX cluster that discovers its peers, hashes the URL, and serves requests through available servers.
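That linked example does the peer discovery for you; as a rough sketch of its shape, with three hard-coded peer addresses standing in for the discovered ones (addresses, ports, and the origin are all placeholders):

```nginx
# Each node runs both layers: a router that hashes URLs to peers, and a shard that caches.
upstream cache_shards {
    hash $scheme$host$request_uri consistent;   # same URL always lands on the same peer,
                                                # with minimal reshuffling when peers come and go
    server 10.0.0.1:8080;                       # placeholder peers; the real example
    server 10.0.0.2:8080;                       # discovers these dynamically
    server 10.0.0.3:8080;
}

server {
    listen 80;                                  # layer 1: route the request to its shard
    location / {
        proxy_set_header Host $host;
        proxy_pass http://cache_shards;
    }
}

server {
    listen 8080;                                # layer 2: the shard, which caches and hits the origin
    location / {
        proxy_cache cdn_cache;                  # cache zone defined as in the earlier sketch
        proxy_set_header Host $host;
        proxy_pass https://origin.example.com;
    }
}
```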

Consistent hashing diagram

When requests for a.jpg hit our NGINX instances, they will all forward the request to the same server in the cluster. Same for b.jpg. This setup has each server act as both the load-balancing proxy and the storage shard. You can separate these layers, and you might want to if you're building more advanced features into your CDN.

A small, financially motivated aside

Our clustered NGINX example uses Fly-features we think are really cool. Persistent volumes help keep cache ratios high between NGINX upgrades. Encrypted private networking makes secure NGINX to NGINX communications simple and keeps you from having to do complicated mTLS gymnastics. Built in DNS service discovery helps keep the clusters up to date when we add and remove servers. If it sounds a little too perfectly matched, it's because we built these features specifically for CDN-like-workloads.

But of course, you can do all this stuff anywhere, not just on Fly. But it's easy on Fly.

Onions have layers

Two truths: a high cache ratio is good, the Internet is bad. If you like killing birds and conserving stones, you'll really enjoy solving for cache ratios and garbage Internet. The answer to both of those problems involves getting the Internet's grubby hands off our HTTP requests. A simple way to increase cache ratios: bypass the out-of-control Internet and proxy origin requests through networks you trust to behave themselves.

CDNs typically have servers in regions close to their customers' origins. If you put our NGINX example in Virginia, you suddenly have servers close to AWS's largest region. And you definitely have customers on AWS. That's the advantage of existing alongside a giant powerful monopoly!

Proxy through region diagram

You can, with a little NGINX and proxy magic, send all requests through Virginia on their way to the origin servers. This is good. There are fewer Internet bear traps between your servers in Virginia and your customers' servers in us-east-1. And now you have a single, canonical set of servers to handle a specific customer's requests.
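Concretely, that just means the edge regions treat the Virginia tier as their origin, and only Virginia ever talks to the customer's servers. A sketch, with made-up hostnames, and plain HTTP between regions on the assumption that a private network or mTLS is handling encryption for you:

```nginx
# On every edge region (Sydney, Santiago, ...): proxy to the shield, not the real origin.
server {
    listen 80;
    location / {
        proxy_cache cdn_cache;
        proxy_set_header Host $host;
        proxy_pass http://iad-shield.internal:8080;   # placeholder name for the Virginia tier
    }
}

# In Virginia: the only tier that ever talks to the customer's origin in us-east-1.
server {
    listen 8080;
    location / {
        proxy_cache cdn_cache;
        proxy_set_header Host $host;
        proxy_pass https://origin.example.com;
    }
}
```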

Good news. This setup improves your cache ratio AND avoids bad Internet. For bonus points, it's also the foundation for extra CDN features.

If you've ever gone CDN shopping, you've come across things like "Shielding" and "Request Coalescing". Origin shielding typically just means sending all traffic through a known data center. This can minimize traffic to origin servers, and also, because you probably know the IPs your CDN regions use, you can control access with simple L4 firewall rules.

Coalescing requests also minimizes origin traffic, especially during big events when many users are trying to get at the same content. When 100,000 users request your latest cleverly written blog post at once, and it's not yet cached, that could end up meaning 100k concurrent requests to your origin. That's a face-melting level of traffic for most origins. Solving this is a matter of "locking" a specific URL, to ensure that if one NGINX server is making an origin request, the other clients pause until the cache is filled. In our clustered NGINX example, this is a two line configuration.
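The directives in question are NGINX's cache lock settings. Roughly like so (the timeout shown is NGINX's default, not necessarily what the example repo uses):

```nginx
location / {
    proxy_cache cdn_cache;
    proxy_cache_lock on;             # only one request per cache key goes to the origin at a time
    proxy_cache_lock_timeout 5s;     # everyone else waits up to this long for the cache to fill
    proxy_set_header Host $host;
    proxy_pass https://origin.example.com;
}
```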

Oh no, slow

Proxying through a single region to increase cache ratios is a little bit of a cheat. The entire purpose of a CDN is to speed things up for users. Sending requests from Singapore to Virginia will make things barely faster, because a set of NGINX servers with cached content is almost always faster than origin services. But, really, it's slow and undesirable.

You can solve this with more onion layers:

Proxy through more regions diagram

Requests in Australia could run through Singapore on the way to Virginia. Even light is slow over 14,624 kilometers (Australia to Virginia), so Australia to Singapore (4,300 kilometers) with a cache cuts a perceptible amount of latency. It will be a little slower on cache misses. But we're talking about the difference between "irritatingly slow" and "150ms worse than irritatingly slow".

If you are building a general purpose CDN, this is a nice way to do it. You can create a handful of super-regions that aggregate cache data for part of the world.

If you're not building a general purpose CDN, and are instead just trying to speed up your application, this is a brittle solution. You are probably better off distributing portions of your application to multiple regions.

Where are we now?

The basic ideas of a CDN are old, and easy to understand. But building out a CDN has historically been an ambitious team enterprise, not a weekend project for a single developer.

But the building blocks for a capable CDN have been in tools like NGINX for a long time. If you've been playing along at home with the GitHub repo, we hope you've noticed that even the most complicated iteration of the design we're talking about, a design that has per-region redundancy and that allows for rudimentary control of request routing between regions, is mostly just NGINX configuration, and not an especially complicated configuration at that. The "code" we've added is just enough bash to plug in addresses.

So that's a CDN. It'll work just great for simple caching. For complicated apps, it's only missing a few things.

Notably, we didn't address cache expiration at all. One ironclad rule of using a CDN is: you will absolutely put an embarrassing typo on a launch release, notice it too late, and discover that all your cache servers have a copy titled "A Better Amercia". Distributed cache invalidation is a big, hairy problem for a CDN. Someone could write a whole article about it.

The CDN layer is also an exceptionally good place to add app features. Image optimization, WAF, API rate limiting, bot detection, we could go on. Someone could turn these into ten more articles.

One last thing. Like we mentioned earlier: this whole article is bias. We're highlighting this CDN design because we built a platform that makes it very easy to express (you should play with it). Those same platform features that make it trivial to build a CDN on Fly also make it easy to distribute your whole application; an application designed for edge distribution may not need a CDN at all.
