<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rasheed Bustamam</title>
    <description>The latest articles on DEV Community by Rasheed Bustamam (@abustamam).</description>
    <link>https://dev.to/abustamam</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F922058%2F708f4776-95ca-4b22-a066-89da2d1d1876.jpeg</url>
      <title>DEV Community: Rasheed Bustamam</title>
      <link>https://dev.to/abustamam</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/abustamam"/>
    <language>en</language>
    <item>
      <title>NGINX Load Balancing, Failover &amp; TLS on a VPS</title>
      <dc:creator>Rasheed Bustamam</dc:creator>
      <pubDate>Wed, 25 Feb 2026 23:30:48 +0000</pubDate>
      <link>https://dev.to/abustamam/nginx-load-balancing-failover-tls-on-a-vps-414c</link>
      <guid>https://dev.to/abustamam/nginx-load-balancing-failover-tls-on-a-vps-414c</guid>
      <description>&lt;p&gt;&lt;em&gt;Using the tools of titans&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In our previous post, we built an L7 load balancer using Caddy reverse proxy. In this post, we'll migrate that configuration over to nginx so we can compare tradeoffs. But first, what is nginx?&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Nginx?
&lt;/h2&gt;

&lt;p&gt;NGINX, or nginx because I don't like screaming (pronounced engine-x, though some folks will say en-jinx), is a high-performance HTTP server and reverse proxy commonly used for load balancing, TLS termination, and serving static content.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: throughout this guide, "nginx" means nginx OSS, not the enterprise offering, nginx Plus. When reading nginx docs, make sure they cover nginx OSS, typically hosted at &lt;a href="http://nginx.org" rel="noopener noreferrer"&gt;nginx.org&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Where Caddy optimizes for simplicity and automatic TLS, nginx exposes lower-level control over request routing, buffering, and upstream behavior. In our previous setup, Caddy handled reverse proxying and active health checks across two upstream nodes.&lt;/p&gt;

&lt;p&gt;In this migration, we move to nginx to gain explicit control over upstream pools, failure detection, and connection timeouts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preconditions
&lt;/h2&gt;

&lt;p&gt;I'm assuming you read the last post. If not, here are our baseline assumptions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Domain &lt;code&gt;bustamam.tech&lt;/code&gt; A record points to &lt;strong&gt;server-1 public IP&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;server-1 and server-2 are on the same Hetzner private network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;server-2 exposes app on private IP/port: &lt;code&gt;10.0.0.3:3100 -&amp;gt; container:3000&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To confirm, from an SSH session on server-1, run this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://10.0.0.3:3100/api/whoami
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If that returns &lt;code&gt;server-2&lt;/code&gt; (or whatever your &lt;code&gt;SERVER_ID&lt;/code&gt; is), then we can continue.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;curl http://10.0.0.3:3100/api/whoami
&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"message"&lt;/span&gt;:&lt;span class="s2"&gt;"hello from server bustamam-tech-2"&lt;/span&gt;,&lt;span class="s2"&gt;"serverId"&lt;/span&gt;:&lt;span class="s2"&gt;"bustamam-tech-2"&lt;/span&gt;,&lt;span class="s2"&gt;"pid"&lt;/span&gt;:1,&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2026-02-24T22:34:37.462Z"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Basic nginx scaffold
&lt;/h2&gt;

&lt;p&gt;Right now, our app looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internet (HTTPS)
        ↓
     Caddy
        ↓
   App containers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Caddy:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Listens on 80/443 (http/s)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Owns the TLS cert&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Decrypts HTTPS to HTTP&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Forwards HTTP to upstream containers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Does L7 load balancing between them&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
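
&lt;p&gt;For reference, the Caddy side of that probably looked something like this (a rough sketch; your Caddyfile from the previous post may differ in upstream names and health-check details):&lt;/p&gt;

```text
bustamam.tech {
    reverse_proxy bustamam-tech:3000 10.0.0.3:3100 {
        lb_policy round_robin
        health_uri /api/whoami
    }
}
```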

&lt;p&gt;We are going to introduce nginx as the new &lt;strong&gt;edge reverse proxy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That means nginx will do the same exact thing, and we'll remove Caddy from the loop.&lt;/p&gt;

&lt;p&gt;The new architecture becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Internet (HTTPS)
        ↓
     nginx
        ↓
   App containers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We are not:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Changing the app&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changing Docker build&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Changing the private network&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Moving certs to backend servers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Doing TLS passthrough&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We are just replacing Caddy with nginx as the &lt;strong&gt;TLS-terminating L7 proxy&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;TLS termination at nginx does the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It keeps certificates in one place&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It allows HTTP-aware load balancing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It lets nginx inspect requests if needed&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It simplifies backend containers (they only speak HTTP)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a common production pattern.&lt;/p&gt;

&lt;p&gt;In order for all of this to work, we need three things:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. nginx needs config files&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So it knows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;which domain it serves&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;where to proxy traffic&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;where the cert files are&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can check out the documentation on nginx web servers &lt;a href="https://docs.nginx.com/nginx/admin-guide/web-server/web-server/" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. nginx needs certificates&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Let's Encrypt cert + private key must live in a mounted volume.&lt;/p&gt;

&lt;p&gt;Documentation on configuring HTTPS servers in nginx &lt;a href="https://nginx.org/en/docs/http/configuring_https_servers.html" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. nginx needs to expose ports 80 and 443&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Because it becomes the public entrypoint.&lt;/p&gt;

&lt;p&gt;Let's start with the config. On your load balancer server, run the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; nginx/conf.d
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where your nginx site configs will live. The location is arbitrary -- we'll bind-mount it into the container.&lt;/p&gt;

&lt;p&gt;OK, let's spin nginx up. Update your &lt;code&gt;docker-compose.yml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="c1"&gt;# caddy, your app, etc&lt;/span&gt;
  &lt;span class="na"&gt;nginx&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.27-alpine&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;8080:8080"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./nginx/conf.d:/etc/nginx/conf.d:ro&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;bustamam-tech&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, create a file called &lt;code&gt;00-shadow.conf&lt;/code&gt; in your &lt;code&gt;conf.d&lt;/code&gt; directory.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: conf file names are not super important, but nginx loads them in alphabetical order, so it's common practice to prefix them with 00 for sorting purposes.&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;
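
&lt;p&gt;The reason this works: the stock &lt;code&gt;nginx.conf&lt;/code&gt; in the official image pulls in every file in that directory, in lexical order:&lt;/p&gt;

```nginx
# from the default /etc/nginx/nginx.conf in the official nginx image
http {
    # ...
    include /etc/nginx/conf.d/*.conf;
}
```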

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# Shadow nginx: runs on :8080 so we can test without touching Caddy (:80/:443)
&lt;/span&gt;
&lt;span class="n"&gt;upstream&lt;/span&gt; &lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt; {
  &lt;span class="c"&gt;# round robin + retry-on-failure behavior are nginx defaults
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;-&lt;span class="n"&gt;tech&lt;/span&gt;:&lt;span class="m"&gt;3000&lt;/span&gt;;
  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;:&lt;span class="m"&gt;3100&lt;/span&gt;;

  &lt;span class="c"&gt;# Note: we're currently relying on nginx's default passive health checks
&lt;/span&gt;}


&lt;span class="n"&gt;server&lt;/span&gt; {
  &lt;span class="n"&gt;listen&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;;
  &lt;span class="n"&gt;server_name&lt;/span&gt; &lt;span class="err"&gt;_&lt;/span&gt;;

  &lt;span class="n"&gt;location&lt;/span&gt; / {
    &lt;span class="n"&gt;proxy_pass&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;://&lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt;;

    &lt;span class="c"&gt;# Minimum headers to keep apps happy
&lt;/span&gt;    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;Host&lt;/span&gt; $&lt;span class="n"&gt;host&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;Proto&lt;/span&gt; $&lt;span class="n"&gt;scheme&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;For&lt;/span&gt; $&lt;span class="n"&gt;proxy_add_x_forwarded_for&lt;/span&gt;;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What this does:&lt;/strong&gt; nginx is now an L7 proxy and can load balance, but it's intentionally naive.&lt;/p&gt;

&lt;p&gt;Let's spin up our nginx service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up nginx &lt;span class="nt"&gt;-d&lt;/span&gt;

&lt;span class="c"&gt;# after it's running&lt;/span&gt;

docker compose ps

&lt;span class="c"&gt;# you should see your nginx service, as well as any other service that might be running&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can do the loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..10&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; http://bustamam.tech:8080/api/whoami&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: this is HTTP, not HTTPS, and note the port as well; it matches the &lt;code&gt;listen&lt;/code&gt; port in the .conf file.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You should get alternating server IDs. If you don't, double check your config!&lt;/p&gt;
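
&lt;p&gt;The output should look roughly like this (server IDs depend on your &lt;code&gt;SERVER_ID&lt;/code&gt; values, and the full JSON is trimmed here):&lt;/p&gt;

```text
{"message":"hello from server bustamam-tech-1","serverId":"bustamam-tech-1",...}
{"message":"hello from server bustamam-tech-2","serverId":"bustamam-tech-2",...}
{"message":"hello from server bustamam-tech-1","serverId":"bustamam-tech-1",...}
```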

&lt;h2&gt;
  
  
  Testing Failover
&lt;/h2&gt;

&lt;p&gt;Let's pull down server-2 for a second and try this again. Run &lt;code&gt;docker compose down&lt;/code&gt; on server-2, then run the curl loop again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx59alyu0wjfzzaw09lt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx59alyu0wjfzzaw09lt.png" alt="verifying failover" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Uh-oh! We hang whenever round robin tries to send us to server-2! Let's fix that.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: nginx has some pretty long defaults, so while it may feel like forever, it might be something like 60 seconds. While it is said that patience is a virtue, a user won't use an app that takes 60 seconds to load or fetch data! Timeouts are your first production knob.&lt;/p&gt;
&lt;/blockquote&gt;
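
&lt;p&gt;For reference, the proxy timeouts that matter here all default to 60 seconds (per the &lt;code&gt;ngx_http_proxy_module&lt;/code&gt; docs):&lt;/p&gt;

```nginx
# nginx defaults, written out explicitly
proxy_connect_timeout 60s;  # establishing the TCP connection to the upstream
proxy_send_timeout    60s;  # between two successive writes to the upstream
proxy_read_timeout    60s;  # between two successive reads from the upstream
```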

&lt;h2&gt;
  
  
  Handling Failover
&lt;/h2&gt;

&lt;p&gt;Let's update our config so we don't wait forever when a destination server is down.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;  &lt;span class="n"&gt;location&lt;/span&gt; / {
    &lt;span class="n"&gt;proxy_pass&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;://&lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt;;

    &lt;span class="n"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;; &lt;span class="c"&gt;# timeout for connecting to the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_connect_timeout
&lt;/span&gt;    &lt;span class="n"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;; &lt;span class="c"&gt;# timeout for reading the response from the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
&lt;/span&gt;
    &lt;span class="c"&gt;# Minimum headers to keep apps happy
&lt;/span&gt;    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;Host&lt;/span&gt; $&lt;span class="n"&gt;host&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;Proto&lt;/span&gt; $&lt;span class="n"&gt;scheme&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;For&lt;/span&gt; $&lt;span class="n"&gt;proxy_add_x_forwarded_for&lt;/span&gt;;
  }  
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart nginx on server-1&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose restart nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: for the remainder of this post, we will assume that a config change is followed by container restart.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And try it again!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdgtl6fbjt5q7k80r3xd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdgtl6fbjt5q7k80r3xd.png" alt="verifying failover" width="800" height="581"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Great! But wait, if server-2 is down, how long are we waiting before nginx sends the request to server-1? Let's instrument some observability. Update your &lt;code&gt;location&lt;/code&gt; config so we have access to the upstream IP addresses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  location / {
    proxy_pass http://bustamam_upstreams;

    proxy_connect_timeout 1s; # timeout for connecting to the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_connect_timeout
    proxy_read_timeout 5s; # timeout for reading the response from the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout

    # Minimum headers to keep apps happy
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    add_header X-Upstream $upstream_addr always;
  }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And let's try a slightly different bash script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..10&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---- &lt;/span&gt;&lt;span class="nv"&gt;$i&lt;/span&gt;&lt;span class="s2"&gt; ----"&lt;/span&gt;

  curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-D&lt;/span&gt; headers.txt &lt;span class="se"&gt;\&lt;/span&gt;
       &lt;span class="nt"&gt;-w&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;code=%{http_code} time=%{time_total}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
       http://bustamam.tech:8080/api/whoami

  &lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-i&lt;/span&gt; x-upstream headers.txt
  &lt;span class="nb"&gt;echo
&lt;/span&gt;&lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrul8jsywxdtg1zdc5qc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftrul8jsywxdtg1zdc5qc.png" alt="verifying latency" width="800" height="655"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Aha! We're getting 200s and the right server is responding, but look at the third request: we added a whole second of latency, and the X-Upstream header shows that the request first attempted server-2. Even when the request succeeds, failover can still cost you a timeout. Success isn't the same as fast.&lt;/p&gt;

&lt;p&gt;Let's flesh this out a bit more by updating our upstream config. Defaults exist, but we want our system to be able to explain itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# Shadow nginx: runs on :8080 so we can test without touching Caddy (:80/:443)
&lt;/span&gt;
&lt;span class="n"&gt;upstream&lt;/span&gt; &lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt; {
  &lt;span class="c"&gt;# primary server with default settings
&lt;/span&gt;  &lt;span class="c"&gt;# note that because this service lives on this machine, if this server is down, the nginx container will also be down.
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;-&lt;span class="n"&gt;tech&lt;/span&gt;:&lt;span class="m"&gt;3000&lt;/span&gt;;

  &lt;span class="c"&gt;# secondary server with custom settings
&lt;/span&gt;  &lt;span class="c"&gt;# max_fails=1
&lt;/span&gt;  &lt;span class="c"&gt;#   If 1 request fails within the fail_timeout window,
&lt;/span&gt;  &lt;span class="c"&gt;#   mark this upstream as "unavailable".
&lt;/span&gt;  &lt;span class="c"&gt;#
&lt;/span&gt;  &lt;span class="c"&gt;# fail_timeout=10s
&lt;/span&gt;  &lt;span class="c"&gt;#   How long to consider that backend "down" before retrying it.
&lt;/span&gt;  &lt;span class="c"&gt;#
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;:&lt;span class="m"&gt;3100&lt;/span&gt; &lt;span class="n"&gt;max_fails&lt;/span&gt;=&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="n"&gt;fail_timeout&lt;/span&gt;=&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
}

&lt;span class="n"&gt;server&lt;/span&gt; {
  &lt;span class="n"&gt;listen&lt;/span&gt; &lt;span class="m"&gt;8080&lt;/span&gt;;
  &lt;span class="n"&gt;server_name&lt;/span&gt; &lt;span class="err"&gt;_&lt;/span&gt;;

  &lt;span class="n"&gt;location&lt;/span&gt; / {
    &lt;span class="n"&gt;proxy_pass&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;://&lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt;;

    &lt;span class="n"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;; &lt;span class="c"&gt;# timeout for connecting to the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_connect_timeout
&lt;/span&gt;    &lt;span class="n"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;; &lt;span class="c"&gt;# timeout for reading the response from the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
&lt;/span&gt;
    &lt;span class="c"&gt;# Minimum headers to keep apps happy
&lt;/span&gt;    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;Host&lt;/span&gt; $&lt;span class="n"&gt;host&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;Proto&lt;/span&gt; $&lt;span class="n"&gt;scheme&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;For&lt;/span&gt; $&lt;span class="n"&gt;proxy_add_x_forwarded_for&lt;/span&gt;;
    &lt;span class="n"&gt;add_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Upstream&lt;/span&gt; $&lt;span class="n"&gt;upstream_addr&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt;;


    &lt;span class="c"&gt;# Note: this is nginx's default https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream
&lt;/span&gt;    &lt;span class="n"&gt;proxy_next_upstream&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;;

    &lt;span class="c"&gt;# how many retries to attempt before giving up https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream_tries
&lt;/span&gt;    &lt;span class="n"&gt;proxy_next_upstream_tries&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;; &lt;span class="c"&gt;# default is 0, which means unbounded retries!
&lt;/span&gt;  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's quickly talk about &lt;code&gt;max_fails&lt;/code&gt;, &lt;code&gt;fail_timeout&lt;/code&gt;, &lt;code&gt;proxy_next_upstream&lt;/code&gt; and &lt;code&gt;proxy_next_upstream_tries&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;max_fails&lt;/code&gt; / &lt;code&gt;fail_timeout&lt;/code&gt; is upstream-level &lt;strong&gt;passive failure marking&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;proxy_next_upstream&lt;/code&gt; / &lt;code&gt;tries&lt;/code&gt; is request-level &lt;strong&gt;retry routing&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think of this as two layers: upstream marking (which servers are considered eligible) and per-request retries (what nginx does when a request fails mid-flight).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: unbounded does not mean infinite. To be more explicit, 0 means &lt;strong&gt;no explicit limit&lt;/strong&gt; (i.e., not bounded by tries). In practice, retries are still bounded by timeouts and available upstreams, but it's not a safe default if you're trying to reason about worst-case latency.&lt;/p&gt;
&lt;/blockquote&gt;
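
&lt;p&gt;If you want to bound retries by wall-clock time as well as by count, there is also &lt;code&gt;proxy_next_upstream_timeout&lt;/code&gt; (default 0, meaning no time limit):&lt;/p&gt;

```nginx
# cap the total time spent retrying other upstreams for a single request
proxy_next_upstream_timeout 3s;
```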

&lt;p&gt;Feel free to play around with numbers! For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;:&lt;span class="m"&gt;3100&lt;/span&gt; &lt;span class="n"&gt;max_fails&lt;/span&gt;=&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="n"&gt;fail_timeout&lt;/span&gt;=&lt;span class="m"&gt;60&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now if this server fails once, it won't be tried again for another 60 seconds. But that also means that if the server comes back up, no one can reach it for up to 60 seconds. This is where metrics and understanding your system as a whole are important. For a toy project like this, even 10+ minutes would be fine.&lt;/p&gt;

&lt;p&gt;The trade-off is fast recovery (low timeout) vs minimizing probe spikes (high timeout).&lt;/p&gt;

&lt;p&gt;You can also tighten up the connect timeout:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;&lt;span class="n"&gt;ms&lt;/span&gt;;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But this only works if your private network is reliably fast. If the TCP connection ever takes more than 200ms to establish, you may mark an otherwise healthy server as dead due to jitter.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: If you're running into issues with your config, double-check the difference between upstream definition (where servers live) and proxy behavior (how requests fail over). Mixing them up leads to configs that look reasonable but don't load, and the failure mode is "nothing works, and you're not sure why" unless you validate with &lt;code&gt;nginx -t&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
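
&lt;p&gt;Since our nginx runs in Docker, the &lt;code&gt;nginx -t&lt;/code&gt; validation looks like this (assuming the service name and image tag from our compose file):&lt;/p&gt;

```shell
# validate the mounted config inside the running container
docker compose exec nginx nginx -t

# or, before the container is up, test against a throwaway container
docker run --rm -v "$PWD/nginx/conf.d:/etc/nginx/conf.d:ro" nginx:1.27-alpine nginx -t
```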

&lt;p&gt;So, let's summarize what nginx is doing so far.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;nginx tries server-2 (round robin)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;it fails, and nginx marks it "down" for ~10s&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;for the next ~10 seconds, nginx only uses server-1 (fast)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;once the 10s window expires, nginx will &lt;strong&gt;probe&lt;/strong&gt; server-2 again by selecting it for a real request&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;that request pays the 1s connect timeout (your ~1.5s)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nginx retries server-1 and succeeds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;server-2 gets marked down again for another 10 seconds&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;if server-2 ever comes back up, then any probes will mark server-2 back online&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So it seems like we're at parity with Caddy, right? Well, unfortunately, no. We still need TLS termination. Let's handle that next.&lt;/p&gt;

&lt;h2&gt;
  
  
  TLS Termination
&lt;/h2&gt;

&lt;p&gt;Right now:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Caddy&lt;/strong&gt; terminates TLS on &lt;code&gt;:443&lt;/code&gt; and proxies to your backends.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;nginx&lt;/strong&gt; is shadow-testing on &lt;code&gt;:8080&lt;/code&gt; (plain HTTP).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;TLS termination means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The client's HTTPS connection ends at nginx.&lt;/strong&gt; nginx decrypts the request, then forwards it to your upstreams over plain HTTP (usually over a private network/VPC).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Browser ⇄ &lt;strong&gt;HTTPS&lt;/strong&gt; ⇄ nginx (edge)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nginx ⇄ &lt;strong&gt;HTTP&lt;/strong&gt; ⇄ upstreams (private)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's what we mean when we say "terminate TLS at the load balancer."&lt;/p&gt;
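
&lt;p&gt;In nginx terms, a TLS-terminating server block looks roughly like this (a sketch, not our final config; the cert paths assume the standard Let's Encrypt layout under &lt;code&gt;/etc/letsencrypt/live/&lt;/code&gt;):&lt;/p&gt;

```nginx
server {
    listen 443 ssl;
    server_name bustamam.tech;

    ssl_certificate     /etc/letsencrypt/live/bustamam.tech/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/bustamam.tech/privkey.pem;

    location / {
        proxy_pass http://bustamam_upstreams;
    }
}
```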

&lt;p&gt;Our plan is to replace Caddy. We want the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;nginx serves &lt;strong&gt;HTTP on :80&lt;/strong&gt; and handles the Let's Encrypt ACME challenge&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;certbot obtains certs via webroot&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nginx serves &lt;strong&gt;HTTPS on :443&lt;/strong&gt; using those certs and proxies to upstreams&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;shut down Caddy (to free 80/443), bring up nginx+certbot&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Redirect HTTP to HTTPS
&lt;/h3&gt;

&lt;p&gt;Alright, we'll need some new directories for configs and certs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; nginx/www nginx/letsencrypt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your directory structure should look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1inwrvcft12ywalhz8i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy1inwrvcft12ywalhz8i.png" alt="directory structure for configs" width="298" height="138"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have a few extra files from messing around with configs. And again, the directory names are arbitrary; we'll map them into the containers in Docker. It's important to understand that certbot doesn't "talk to" nginx. They just share a filesystem: certbot writes files, nginx serves them. That's it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;nginx/www&lt;/code&gt; is where the ACME challenge files are written. When Let's Encrypt validates your domain, it requests &lt;code&gt;http://bustamam.tech/.well-known/acme-challenge/&amp;lt;token&amp;gt;&lt;/code&gt;. Certbot writes that token file into your &lt;code&gt;www/&lt;/code&gt; directory, and nginx will serve that directory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;code&gt;nginx/letsencrypt&lt;/code&gt; is where certs live (shared with nginx). When certbot succeeds, it writes cert files into &lt;code&gt;/etc/letsencrypt/live/bustamam.tech/&lt;/code&gt;. So whatever local directory maps to &lt;code&gt;/etc/letsencrypt&lt;/code&gt; must also be shared between &lt;code&gt;certbot&lt;/code&gt; (read/write) and nginx (read-only).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: for more information on ACME and other Let's Encrypt challenges, check out their &lt;a href="https://letsencrypt.org/docs/challenge-types/" rel="noopener noreferrer"&gt;documentation&lt;/a&gt; on challenge types&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Let's delete everything in conf.d and start with a fresh config: &lt;code&gt;bustamam.tech.conf&lt;/code&gt; (or whatever you wanna name it)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# ================================
# Upstreams
# ================================
&lt;/span&gt;&lt;span class="n"&gt;upstream&lt;/span&gt; &lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt; {
  &lt;span class="c"&gt;# Primary (local container)
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;-&lt;span class="n"&gt;tech&lt;/span&gt;:&lt;span class="m"&gt;3000&lt;/span&gt;;

  &lt;span class="c"&gt;# Secondary (remote server over private network)
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;:&lt;span class="m"&gt;3100&lt;/span&gt; &lt;span class="n"&gt;max_fails&lt;/span&gt;=&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="n"&gt;fail_timeout&lt;/span&gt;=&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
}

&lt;span class="c"&gt;# ================================
# HTTP (port 80)
# - Serve ACME challenge
# - Redirect everything else to HTTPS
# ================================
&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; {
  &lt;span class="n"&gt;listen&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;;
  &lt;span class="n"&gt;server_name&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;;

  &lt;span class="c"&gt;# Let's Encrypt HTTP-01 challenge files live here
&lt;/span&gt;  &lt;span class="n"&gt;location&lt;/span&gt; /.&lt;span class="n"&gt;well&lt;/span&gt;-&lt;span class="n"&gt;known&lt;/span&gt;/&lt;span class="n"&gt;acme&lt;/span&gt;-&lt;span class="n"&gt;challenge&lt;/span&gt;/ {
    &lt;span class="n"&gt;root&lt;/span&gt; /&lt;span class="n"&gt;var&lt;/span&gt;/&lt;span class="n"&gt;www&lt;/span&gt;/&lt;span class="n"&gt;certbot&lt;/span&gt;;
  }

  &lt;span class="c"&gt;# Everything else goes to HTTPS
&lt;/span&gt;  &lt;span class="n"&gt;location&lt;/span&gt; / {
    &lt;span class="n"&gt;return&lt;/span&gt; &lt;span class="m"&gt;301&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;://$&lt;span class="n"&gt;host&lt;/span&gt;$&lt;span class="n"&gt;request_uri&lt;/span&gt;;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Footgun:&lt;/strong&gt; We are purposely deferring HTTPS until later in this article. If you enable the &lt;code&gt;listen 443 ssl&lt;/code&gt; server block before the certs exist, nginx will fail to start, and port 80 will appear to "hang" because nothing is listening at all. The bootstrap sequence is: HTTP first → obtain cert → enable HTTPS.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;OK, now we need to update our &lt;code&gt;docker-compose.yml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;nginx&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx:1.27-alpine&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nginx&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;80:80"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;443:443"&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./nginx/conf.d:/etc/nginx/conf.d:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./nginx/www:/var/www/certbot:ro&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./nginx/letsencrypt:/etc/letsencrypt:ro&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;bustamam-tech&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;

  &lt;span class="na"&gt;certbot&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;certbot/certbot:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;certbot&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./nginx/www:/var/www/certbot:rw&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./nginx/letsencrypt:/etc/letsencrypt:rw&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important to note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;nginx mounts certs directory &lt;strong&gt;read-only&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;certbot mounts cert directory &lt;strong&gt;read-write&lt;/strong&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
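&lt;p&gt;One habit worth building now: whenever you touch the nginx config, validate it inside the container before reloading. Something like the following (service name assumed from the compose file above) catches syntax errors before they take down the edge:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose exec nginx nginx -t         # parse and validate the config
docker compose exec nginx nginx -s reload  # apply changes without dropping connections
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;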

&lt;p&gt;Now let's bring our creation to life.&lt;/p&gt;

&lt;h3&gt;
  
  
  Bring nginx up on port 80, test http
&lt;/h3&gt;

&lt;p&gt;Caddy is currently occupying ports 80 and 443, so if you have Caddy running, bring it down with &lt;code&gt;docker compose down caddy&lt;/code&gt;. Then bring up nginx with &lt;code&gt;docker compose up -d nginx&lt;/code&gt;. Use the same command even if nginx is already running: unlike &lt;code&gt;docker compose restart&lt;/code&gt;, &lt;code&gt;up -d&lt;/code&gt; recreates the container, so it picks up the new ports and volumes from the compose file.&lt;/p&gt;

&lt;p&gt;Then test http connection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-I&lt;/span&gt; http://bustamam.tech
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a 301 redirect to https, which is exactly what we want.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: if this hangs, check whether anything is actually listening on those ports. On the host machine, run &lt;code&gt;sudo ss -lntp | grep -E ':80|:443'&lt;/code&gt; and start debugging from there.&lt;/p&gt;
&lt;/blockquote&gt;
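&lt;p&gt;While nginx is up on port 80, you can also sanity-check the webroot wiring before involving certbot at all. The token below is a hypothetical throwaway file, and the domain is this article's; swap in your own:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mkdir -p nginx/www/.well-known/acme-challenge
echo ok &gt; nginx/www/.well-known/acme-challenge/test-token
curl http://bustamam.tech/.well-known/acme-challenge/test-token
rm nginx/www/.well-known/acme-challenge/test-token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If the curl prints &lt;code&gt;ok&lt;/code&gt;, certbot's HTTP-01 challenge files should be served the same way.&lt;/p&gt;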

&lt;p&gt;But we don't have https set up. Let's go do that.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up https
&lt;/h3&gt;

&lt;p&gt;Let's update our conf file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# ================================
# Upstreams
# ================================
&lt;/span&gt;&lt;span class="n"&gt;upstream&lt;/span&gt; &lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt; {
  &lt;span class="c"&gt;# Primary (local container)
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;-&lt;span class="n"&gt;tech&lt;/span&gt;:&lt;span class="m"&gt;3000&lt;/span&gt;;

  &lt;span class="c"&gt;# Secondary (remote server over private network)
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;:&lt;span class="m"&gt;3100&lt;/span&gt; &lt;span class="n"&gt;max_fails&lt;/span&gt;=&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="n"&gt;fail_timeout&lt;/span&gt;=&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
}

&lt;span class="c"&gt;# ================================
# HTTP (port 80)
# - Serve ACME challenge
# - Redirect everything else to HTTPS
# ================================
&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; {
  &lt;span class="n"&gt;listen&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;;
  &lt;span class="n"&gt;server_name&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;;

  &lt;span class="c"&gt;# Let's Encrypt HTTP-01 challenge files live here
&lt;/span&gt;  &lt;span class="n"&gt;location&lt;/span&gt; /.&lt;span class="n"&gt;well&lt;/span&gt;-&lt;span class="n"&gt;known&lt;/span&gt;/&lt;span class="n"&gt;acme&lt;/span&gt;-&lt;span class="n"&gt;challenge&lt;/span&gt;/ {
    &lt;span class="n"&gt;root&lt;/span&gt; /&lt;span class="n"&gt;var&lt;/span&gt;/&lt;span class="n"&gt;www&lt;/span&gt;/&lt;span class="n"&gt;certbot&lt;/span&gt;;
  }

  &lt;span class="c"&gt;# Everything else goes to HTTPS
&lt;/span&gt;  &lt;span class="n"&gt;location&lt;/span&gt; / {
    &lt;span class="n"&gt;return&lt;/span&gt; &lt;span class="m"&gt;301&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;://$&lt;span class="n"&gt;host&lt;/span&gt;$&lt;span class="n"&gt;request_uri&lt;/span&gt;;
  }
}

&lt;span class="c"&gt;# ================================
# HTTPS (port 443)
# - Terminate TLS here
# - Reverse proxy to upstreams over HTTP
# ================================
&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; {
  &lt;span class="n"&gt;listen&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt; &lt;span class="n"&gt;ssl&lt;/span&gt;;
  &lt;span class="n"&gt;server_name&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;;

  &lt;span class="c"&gt;# TLS certs (provided by certbot via shared volume)
&lt;/span&gt;  &lt;span class="n"&gt;ssl_certificate&lt;/span&gt;     /&lt;span class="n"&gt;etc&lt;/span&gt;/&lt;span class="n"&gt;letsencrypt&lt;/span&gt;/&lt;span class="n"&gt;live&lt;/span&gt;/&lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;/&lt;span class="n"&gt;fullchain&lt;/span&gt;.&lt;span class="n"&gt;pem&lt;/span&gt;;
  &lt;span class="n"&gt;ssl_certificate_key&lt;/span&gt; /&lt;span class="n"&gt;etc&lt;/span&gt;/&lt;span class="n"&gt;letsencrypt&lt;/span&gt;/&lt;span class="n"&gt;live&lt;/span&gt;/&lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;/&lt;span class="n"&gt;privkey&lt;/span&gt;.&lt;span class="n"&gt;pem&lt;/span&gt;;

  &lt;span class="c"&gt;# A minimal modern TLS posture
&lt;/span&gt;  &lt;span class="n"&gt;ssl_protocols&lt;/span&gt; &lt;span class="n"&gt;TLSv1&lt;/span&gt;.&lt;span class="m"&gt;2&lt;/span&gt; &lt;span class="n"&gt;TLSv1&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;;

  &lt;span class="n"&gt;location&lt;/span&gt; / {
    &lt;span class="n"&gt;proxy_pass&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;://&lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt;;

    &lt;span class="c"&gt;# Fail fast
&lt;/span&gt;    &lt;span class="n"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_send_timeout&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;

    &lt;span class="c"&gt;# Deterministic retry behavior (make defaults explicit)
&lt;/span&gt;    &lt;span class="n"&gt;proxy_next_upstream&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="n"&gt;http_502&lt;/span&gt; &lt;span class="n"&gt;http_503&lt;/span&gt; &lt;span class="n"&gt;http_504&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_next_upstream_tries&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;;

    &lt;span class="c"&gt;# Forwarding headers
&lt;/span&gt;    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;Host&lt;/span&gt; $&lt;span class="n"&gt;host&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;Proto&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;For&lt;/span&gt; $&lt;span class="n"&gt;proxy_add_x_forwarded_for&lt;/span&gt;;

    &lt;span class="c"&gt;# Debug: show which upstream served (or was attempted)
&lt;/span&gt;    &lt;span class="n"&gt;add_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Upstream&lt;/span&gt; $&lt;span class="n"&gt;upstream_addr&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt;;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The http part (port 80) is unchanged. The https block is a barebones skeleton with some sensible defaults. The certificate files referenced by &lt;code&gt;ssl_certificate&lt;/code&gt; don't exist yet, though, so let's create them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Obtain the certificates
&lt;/h3&gt;

&lt;p&gt;Let's start with a test cert. In your host machine, run this command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose run &lt;span class="nt"&gt;--rm&lt;/span&gt; certbot certonly &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webroot&lt;/span&gt; &lt;span class="nt"&gt;-w&lt;/span&gt; /var/www/certbot &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; bustamam.tech &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--test-cert&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--agree-tos&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; rasheed.bustamam@gmail.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-eff-email&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docker will pull the certbot image on first run, and when the command succeeds, you should see a bunch of files appear under your &lt;code&gt;letsencrypt&lt;/code&gt; directory:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8az385f0de19f1fore3x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8az385f0de19f1fore3x.png" alt="certs and other files in letsencrypt dir" width="295" height="530"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If so, rerun the command without the &lt;code&gt;--test-cert&lt;/code&gt; flag.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose run &lt;span class="nt"&gt;--rm&lt;/span&gt; certbot certonly &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--webroot&lt;/span&gt; &lt;span class="nt"&gt;-w&lt;/span&gt; /var/www/certbot &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; bustamam.tech &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--agree-tos&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; rasheed.bustamam@gmail.com &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--no-eff-email&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Certbot may ask whether to reuse your current cert or create a new one. Choose to create a new one: the test cert was issued by Let's Encrypt's staging CA, which browsers don't trust, so it can't be used in production.&lt;/p&gt;

&lt;p&gt;Now let's restart nginx so it can read our new certs!&lt;/p&gt;

&lt;h3&gt;
  
  
  Activate https in nginx
&lt;/h3&gt;

&lt;p&gt;Just run&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose restart nginx
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And test:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-I&lt;/span&gt; https://bustamam.tech
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pgwvtnc94dfclcrd8uo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4pgwvtnc94dfclcrd8uo.png" alt="load balancing still working, and https working as well using curl" width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's test our whoami route too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; https://bustamam.tech/api/whoami
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggzmswmm4ai91xb14kgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fggzmswmm4ai91xb14kgv.png" alt="whoami route correctly load balances as well" width="800" height="662"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now we have https working and our load balancer is still working!&lt;/p&gt;
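&lt;p&gt;If you want to watch the load balancing from the terminal, you can loop over the &lt;code&gt;X-Upstream&lt;/code&gt; debug header we added (domain assumed from this setup); with round robin, the address should alternate between the two upstreams:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;for i in 1 2 3 4; do
  curl -sI https://bustamam.tech | grep -i '^x-upstream'
done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;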

&lt;p&gt;Now, I have to note -- since we are managing our own certs, we're also responsible for renewing them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose run &lt;span class="nt"&gt;--rm&lt;/span&gt; certbot renew &lt;span class="nt"&gt;--webroot&lt;/span&gt; &lt;span class="nt"&gt;-w&lt;/span&gt; /var/www/certbot
docker &lt;span class="nb"&gt;exec &lt;/span&gt;nginx nginx &lt;span class="nt"&gt;-s&lt;/span&gt; reload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can run this on a cron job if you'd like, but automating it is beyond the scope of this article.&lt;/p&gt;
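&lt;p&gt;For the curious, the usual approach is a crontab entry along these lines. The project path &lt;code&gt;/opt/app&lt;/code&gt; is a placeholder for wherever your compose file lives, and running daily is safe because certbot only renews certs that are within 30 days of expiry:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical crontab entry; adjust the path to your compose project
0 3 * * * cd /opt/app &amp;&amp; docker compose run --rm certbot renew --webroot -w /var/www/certbot &amp;&amp; docker exec nginx nginx -s reload
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;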

&lt;h2&gt;
  
  
  Comparison to Caddy
&lt;/h2&gt;

&lt;p&gt;Now that we finally got parity with Caddy, let's compare!&lt;/p&gt;

&lt;p&gt;As a reminder, this was our &lt;code&gt;Caddyfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bustamam.tech {
    reverse_proxy bustamam-tech:3000 10.0.0.3:3100 {
        lb_policy round_robin

        # total retry window across upstreams
        lb_try_duration 3s             

        # how often to retry upstreams within that window
        lb_try_interval 250ms

        # Active health checking
        health_uri /api/healthz
        health_interval 5s
        health_timeout 2s

        # How long to consider a backend "down" after failures (circuit breaker window)
        # duration to keep an upstream marked as unhealthy
        fail_duration 10s              

        # threshold of failures before marking an upstream down
        max_fails 1                    

        # Fail fast when an upstream is unresponsive
        transport http {
            # TCP connect timeout to the upstream
            dial_timeout 1s            

            # slow backend detection (time waiting for first byte)
            response_header_timeout 2s 
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With Caddy, we got active health checking and automatic TLS issuance and renewal. And this was the nginx equivalent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="c"&gt;# ================================
# Upstreams
# ================================
&lt;/span&gt;&lt;span class="n"&gt;upstream&lt;/span&gt; &lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt; {
  &lt;span class="c"&gt;# Primary (local container)
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;-&lt;span class="n"&gt;tech&lt;/span&gt;:&lt;span class="m"&gt;3000&lt;/span&gt;;

  &lt;span class="c"&gt;# Secondary (remote server over private network)
&lt;/span&gt;  &lt;span class="n"&gt;server&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;0&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;:&lt;span class="m"&gt;3100&lt;/span&gt; &lt;span class="n"&gt;max_fails&lt;/span&gt;=&lt;span class="m"&gt;1&lt;/span&gt; &lt;span class="n"&gt;fail_timeout&lt;/span&gt;=&lt;span class="m"&gt;10&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
}

&lt;span class="c"&gt;# ================================
# HTTP (port 80)
# - Serve ACME challenge
# - Redirect everything else to HTTPS
# ================================
&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; {
  &lt;span class="n"&gt;listen&lt;/span&gt; &lt;span class="m"&gt;80&lt;/span&gt;;
  &lt;span class="n"&gt;server_name&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;;

  &lt;span class="c"&gt;# Let's Encrypt HTTP-01 challenge files live here
&lt;/span&gt;  &lt;span class="n"&gt;location&lt;/span&gt; /.&lt;span class="n"&gt;well&lt;/span&gt;-&lt;span class="n"&gt;known&lt;/span&gt;/&lt;span class="n"&gt;acme&lt;/span&gt;-&lt;span class="n"&gt;challenge&lt;/span&gt;/ {
    &lt;span class="n"&gt;root&lt;/span&gt; /&lt;span class="n"&gt;var&lt;/span&gt;/&lt;span class="n"&gt;www&lt;/span&gt;/&lt;span class="n"&gt;certbot&lt;/span&gt;;
  }

  &lt;span class="c"&gt;# Everything else goes to HTTPS
&lt;/span&gt;  &lt;span class="n"&gt;location&lt;/span&gt; / {
    &lt;span class="n"&gt;return&lt;/span&gt; &lt;span class="m"&gt;301&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;://$&lt;span class="n"&gt;host&lt;/span&gt;$&lt;span class="n"&gt;request_uri&lt;/span&gt;;
  }
}

&lt;span class="c"&gt;# ================================
# HTTPS (port 443)
# - Terminate TLS here
# - Reverse proxy to upstreams over HTTP
# ================================
&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt; {
  &lt;span class="n"&gt;listen&lt;/span&gt; &lt;span class="m"&gt;443&lt;/span&gt; &lt;span class="n"&gt;ssl&lt;/span&gt;;
  &lt;span class="n"&gt;server_name&lt;/span&gt; &lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;;

  &lt;span class="c"&gt;# TLS certs (provided by certbot via shared volume)
&lt;/span&gt;  &lt;span class="n"&gt;ssl_certificate&lt;/span&gt;     /&lt;span class="n"&gt;etc&lt;/span&gt;/&lt;span class="n"&gt;letsencrypt&lt;/span&gt;/&lt;span class="n"&gt;live&lt;/span&gt;/&lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;/&lt;span class="n"&gt;fullchain&lt;/span&gt;.&lt;span class="n"&gt;pem&lt;/span&gt;;
  &lt;span class="n"&gt;ssl_certificate_key&lt;/span&gt; /&lt;span class="n"&gt;etc&lt;/span&gt;/&lt;span class="n"&gt;letsencrypt&lt;/span&gt;/&lt;span class="n"&gt;live&lt;/span&gt;/&lt;span class="n"&gt;bustamam&lt;/span&gt;.&lt;span class="n"&gt;tech&lt;/span&gt;/&lt;span class="n"&gt;privkey&lt;/span&gt;.&lt;span class="n"&gt;pem&lt;/span&gt;;

  &lt;span class="c"&gt;# A minimal modern TLS posture
&lt;/span&gt;  &lt;span class="n"&gt;ssl_protocols&lt;/span&gt; &lt;span class="n"&gt;TLSv1&lt;/span&gt;.&lt;span class="m"&gt;2&lt;/span&gt; &lt;span class="n"&gt;TLSv1&lt;/span&gt;.&lt;span class="m"&gt;3&lt;/span&gt;;

  &lt;span class="n"&gt;location&lt;/span&gt; / {
    &lt;span class="n"&gt;proxy_pass&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;://&lt;span class="n"&gt;bustamam_upstreams&lt;/span&gt;;

    &lt;span class="c"&gt;# Fail fast
&lt;/span&gt;    &lt;span class="n"&gt;proxy_connect_timeout&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_read_timeout&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_send_timeout&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;;

    &lt;span class="c"&gt;# Deterministic retry behavior (make defaults explicit)
&lt;/span&gt;    &lt;span class="n"&gt;proxy_next_upstream&lt;/span&gt; &lt;span class="n"&gt;error&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt; &lt;span class="n"&gt;http_502&lt;/span&gt; &lt;span class="n"&gt;http_503&lt;/span&gt; &lt;span class="n"&gt;http_504&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_next_upstream_tries&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;;

    &lt;span class="c"&gt;# Forwarding headers
&lt;/span&gt;    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;Host&lt;/span&gt; $&lt;span class="n"&gt;host&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;Proto&lt;/span&gt; &lt;span class="n"&gt;https&lt;/span&gt;;
    &lt;span class="n"&gt;proxy_set_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Forwarded&lt;/span&gt;-&lt;span class="n"&gt;For&lt;/span&gt; $&lt;span class="n"&gt;proxy_add_x_forwarded_for&lt;/span&gt;;

    &lt;span class="c"&gt;# Debug: show which upstream served (or was attempted)
&lt;/span&gt;    &lt;span class="n"&gt;add_header&lt;/span&gt; &lt;span class="n"&gt;X&lt;/span&gt;-&lt;span class="n"&gt;Upstream&lt;/span&gt; $&lt;span class="n"&gt;upstream_addr&lt;/span&gt; &lt;span class="n"&gt;always&lt;/span&gt;;
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With nginx, we get passive health checking only, and we needed certbot alongside it to manage our certs.&lt;/p&gt;

&lt;p&gt;So you may be asking, "Why is nginx better than Caddy?" and the answer is that it isn't, not necessarily. &lt;strong&gt;Caddy is the better default for small systems. nginx is better when you need explicit control, standardized ops, or you're operating inside a bigger ecosystem.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Using nginx here isn't "because it's better"; it's because it teaches you how the edge actually works when the platform stops holding your hand.&lt;/p&gt;

&lt;p&gt;Caddy can keep a backend out of rotation &lt;strong&gt;before&lt;/strong&gt; a user hits it. nginx usually learns a backend is dead &lt;strong&gt;because&lt;/strong&gt; a user hit it (or because passive marking was configured).&lt;/p&gt;
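&lt;p&gt;You can see the passive behavior for yourself by killing a backend and watching a request pay the price (container and domain names assumed from this article's setup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Simulate a dead backend, then watch nginx fail over to the survivor
docker compose stop bustamam-tech
curl -sI https://bustamam.tech | grep -i '^x-upstream'
docker compose start bustamam-tech
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The first request after the stop should eat the 1s connect timeout before retrying the other upstream; subsequent requests within the &lt;code&gt;fail_timeout&lt;/code&gt; window skip the dead one.&lt;/p&gt;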


&lt;p&gt;It's important to call out that we don't want to be comparing "lines of config" when evaluating tools. It's a matter of what you own vs what you delegate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caddy: batteries included, opinionated defaults
&lt;/h3&gt;

&lt;p&gt;We got, almost for free:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;automatic TLS issuance/renewal&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;active health checks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;nice LB ergonomics (&lt;code&gt;health_uri&lt;/code&gt;, &lt;code&gt;fail_duration&lt;/code&gt;, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;fewer footguns&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;10-ish lines of config&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So for a $20 VPS and learning, Caddy is amazing.&lt;/p&gt;

&lt;h3&gt;
  
  
  nginx OSS: modular and explicit
&lt;/h3&gt;

&lt;p&gt;We had to build the edge out of primitives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;TLS is not automatic (had to use certbot)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;health checks are passive unless you add extra machinery&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;reload behavior and config validation are on you&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;you need to understand contexts (&lt;code&gt;upstream&lt;/code&gt; vs &lt;code&gt;location&lt;/code&gt;) or you break it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;about 60-ish lines of config&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That pain is the point: nginx forces us to learn the contract between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;TCP port binding&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;TLS termination&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;request routing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;retries/timeouts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;failure detection&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;certificate lifecycle&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is the systems knowledge that we're trying to learn in the first place.&lt;/p&gt;

&lt;h3&gt;
  
  
  When to Choose nginx over Caddy
&lt;/h3&gt;

&lt;p&gt;Ideally, your team is already using one and you just need to learn it :)&lt;/p&gt;

&lt;p&gt;But for greenfield projects, or for understanding when to migrate from Caddy to nginx:&lt;/p&gt;

&lt;h4&gt;
  
  
  1) When you need a boring industry standard
&lt;/h4&gt;

&lt;p&gt;nginx is everywhere. If you join a team with existing nginx infra, knowing it is immediate leverage.&lt;/p&gt;

&lt;h4&gt;
  
  
  2) When you need predictable, explicit behavior at the edge
&lt;/h4&gt;

&lt;p&gt;In nginx you can be extremely specific about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;what counts as retryable&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;how many tries&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;timeouts per phase (connect/send/read)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;failure semantics per upstream&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Caddy has knobs too, but nginx's model maps closely to how a lot of production stacks think.&lt;/p&gt;

&lt;h4&gt;
  
  
  3) When the ecosystem around it matters
&lt;/h4&gt;

&lt;p&gt;nginx has deep integration patterns with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;legacy deployments&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;enterprise tooling&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;common security hardening playbooks&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;common debugging muscle memory (every SRE has done &lt;code&gt;nginx -T&lt;/code&gt;, &lt;code&gt;nginx -t&lt;/code&gt;, reloads, etc.)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  4) When performance tuning at massive scale is the job
&lt;/h4&gt;

&lt;p&gt;At large companies, nobody is choosing nginx because "it's faster" per request in isolation. They're choosing it because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;they know how to operate it safely&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;they know how it fails&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;it has predictable resource profiles and instrumentation patterns&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The interesting part isn't that we got it working. It's that we can now &lt;em&gt;explain&lt;/em&gt; worst-case latency: connect timeout + number of tries + fail_timeout window. That's the difference between 'it seems fine' and 'I can predict how it fails.'&lt;/p&gt;
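&lt;p&gt;Concretely, with illustrative values, that worst-case budget is simple arithmetic:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;worst case ≈ proxy_connect_timeout x proxy_next_upstream_tries
e.g. 1s connect timeout x 2 tries = up to ~2s before the request
fails over or errors; the dead backend then sits out fail_timeout (10s)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;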

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;For my $20 VPS and my hobby projects, Caddy is obviously the better tool. It's simpler, safer, and gives me active health checks and automatic TLS with almost no ceremony.&lt;br&gt;&lt;br&gt;
I rebuilt it in nginx anyway because nginx makes the hidden parts visible: TLS bootstrapping, reload semantics, passive vs active failure detection, and how retries interact with timeouts. Those are the concepts that scale, and that's the whole point of this series.&lt;/p&gt;

&lt;p&gt;In the next post, we'll actually go in the opposite direction -- we'll use a managed service to do all of this for us. See you there!&lt;/p&gt;

</description>
      <category>webdev</category>
    </item>
    <item>
      <title>Building a Load Balancer from Scratch on a $20 VPS</title>
      <dc:creator>Rasheed Bustamam</dc:creator>
      <pubDate>Tue, 24 Feb 2026 21:32:28 +0000</pubDate>
      <link>https://dev.to/abustamam/building-a-load-balancer-from-scratch-on-a-20-vps-2neo</link>
      <guid>https://dev.to/abustamam/building-a-load-balancer-from-scratch-on-a-20-vps-2neo</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;The journey to a million users begins with a simple load balancer&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you’ve deployed a web app, you’ve already built the first 10% of a scalable system.&lt;br&gt;&lt;br&gt;
The next 10% is learning what happens when one server isn’t enough, and what failure feels like in production.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you’ve never deployed a web app, start here: &lt;a href="https://www.reddit.com/r/vibecoding/comments/1pj6ngp/how_to_deploy_your_app_to_the_web_for_beginners/" rel="noopener noreferrer"&gt;Reddit Guide&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this post, I’ll recreate the "load balancer" chapter from Alex Xu’s &lt;a href="https://bytes.usc.edu/~saty/courses/docs/data/SystemDesignInterview.pdf" rel="noopener noreferrer"&gt;&lt;em&gt;System Design Interview&lt;/em&gt;&lt;/a&gt; on a $20 VPS, using Caddy as an L7 reverse proxy. The goal isn’t novelty, it’s building intuition for retries, health checks, and failover.&lt;/p&gt;
&lt;h2&gt;
  
  
  Baseline Assumptions
&lt;/h2&gt;

&lt;p&gt;I had my &lt;a href="https://bustamam.tech" rel="noopener noreferrer"&gt;portfolio site&lt;/a&gt; deployed on a Hetzner VPS. The app is a TanStack Start app, but we are assuming you have something similar to the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;An app that runs in a container on port 3000.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A CI process builds/pushes to GHCR.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A server pulls latest via compose.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once you have an app that lives on a server somewhere, we are good to go.&lt;/p&gt;
&lt;h2&gt;
  
  
  Load Balancing
&lt;/h2&gt;

&lt;p&gt;Now what if I post a cool project on Hacker News and it goes viral and I get thousands of people looking at my site? My poor half-CPU server would probably melt. I &lt;em&gt;could&lt;/em&gt; scale vertically and just add more compute. But let's say it went &lt;em&gt;really&lt;/em&gt; viral and I got millions of people looking at my cool project! Well, you can't have a single machine with infinite CPU.&lt;/p&gt;

&lt;p&gt;That's where the load balancer comes in. A load balancer is also a &lt;strong&gt;failure detector&lt;/strong&gt; and a &lt;strong&gt;policy engine&lt;/strong&gt;: it decides where traffic goes and how quickly it gives up when a backend misbehaves. It can do a few things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;It can distribute traffic among several servers. This assumes your app is largely stateless (or that state lives in shared systems like a DB/Redis).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If one server goes down, your app can still work because it runs on the other ones.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without a load balancer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;One server → single point of failure (SPOF)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;One server → limited CPU&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;One server → deployment risk&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Load balancing is the first real step from "app" to "system." Think of it as traffic control: each request gets directed to a healthy server.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ndzcdp2u4j5e3xivbk3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ndzcdp2u4j5e3xivbk3.png" alt="load balance cop metaphor" width="800" height="800"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Of course this can cause some funny behavior! If server 1 runs one version of an app and server 2 runs another, a bug can show up in one user's session but not another's. This is where observability comes into play, but that is out of scope for this blog post.&lt;/p&gt;

&lt;p&gt;There are two types of load balancers: L4 and L7.&lt;/p&gt;
&lt;h3&gt;
  
  
  L4 Load Balancing
&lt;/h3&gt;

&lt;p&gt;L4 load balancing operates at the transport layer. It forwards TCP/UDP connections but is blind to HTTP routes. In other words, it routes requests to different servers without knowing what is in them, so you cannot inspect the MIME type or URL of a request, for example, or any of its contents. That blindness is exactly why it's fast.&lt;/p&gt;
&lt;h3&gt;
  
  
  L7 Load Balancing
&lt;/h3&gt;

&lt;p&gt;L7 operates at the application layer. It understands HTTP, so it can route by path or headers and run HTTP health checks. For example, we can route requests to &lt;code&gt;/api&lt;/code&gt; to our API server while all other requests go to our web app server.&lt;/p&gt;

&lt;p&gt;For the purposes of this article, we will be focusing on L7 load balancing using Caddy reverse proxy. I'm using L7 because it’s the most common "first load balancer" in web stacks and it's where practical concerns like health checks and timeouts show up fast.&lt;/p&gt;
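&lt;p&gt;To make the path-routing idea concrete, a minimal Caddyfile sketch (hostnames and ports are hypothetical) might look like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;example.com {
    # requests to /api/* go to the API server
    handle /api/* {
        reverse_proxy api-server:4000
    }

    # everything else goes to the web app server
    handle {
        reverse_proxy web-server:3000
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;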
&lt;h2&gt;
  
  
  Wiring two backends behind one entrypoint
&lt;/h2&gt;

&lt;p&gt;So, how do you set up a load balancer? Well, I spun up another small Hetzner server in the same network zone (in my case, &lt;code&gt;eu-central&lt;/code&gt;). I copied the bustamam-tech portion of the docker-compose file to the new server. I ran &lt;code&gt;docker compose up -d&lt;/code&gt; to spin up my portfolio site on that server.&lt;/p&gt;

&lt;p&gt;Then, I set up a private network on Hetzner. Go to your Hetzner project and go to Networks. Click on Create Network.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8gpnlly7eumxrhv6wqp8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8gpnlly7eumxrhv6wqp8.png" alt="create network modal" width="629" height="337"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can name your network whatever you want. I called mine load-balancer-test. You can change the name later. Ensure the network zone matches the same network zone of the two servers. Then, for IP range, you can leave that as default.&lt;/p&gt;

&lt;p&gt;In case you're curious about what the IP range means, this &lt;a href="https://stackoverflow.com/questions/76520302/what-is-ipv4-cidr-while-launching-a-custom-vpc-in-aws" rel="noopener noreferrer"&gt;Stack Overflow post&lt;/a&gt; explains it, and you can do more research into it. But suffice it to say, it allocates a bunch of private IP addresses for the resources you put into your network. The &lt;code&gt;/16&lt;/code&gt; means that you get around 65k IP addresses, which ought to be plenty.&lt;/p&gt;

&lt;p&gt;Then, click into your network and click on "Attach Resources." Click on your servers and you should see the private IP addresses your resources have.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas1muq1sbzatw3xz42r1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fas1muq1sbzatw3xz42r1.png" alt="networks" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: I used &lt;code&gt;/24&lt;/code&gt; for my IP range; this gives me 256 unique IP addresses which should suffice for my experimentation&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now your servers can talk to each other! Let's add a basic &lt;code&gt;/api/whoami&lt;/code&gt; route to our app so we know which server we're talking to.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createFileRoute&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@tanstack/react-router&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;json&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@tanstack/react-start&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;getServerId&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;SERVER_ID&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;unknown&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
  &lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;Route&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;createFileRoute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;/api/whoami&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)({&lt;/span&gt;
  &lt;span class="na"&gt;server&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;handlers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;GET&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;serverId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;getServerId&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
          &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`hello from server &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;serverId&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="nx"&gt;serverId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
          &lt;span class="na"&gt;time&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Great, now let's update your &lt;code&gt;docker-compose.yml&lt;/code&gt; files on both of our servers.&lt;/p&gt;

&lt;p&gt;server-1:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;  &lt;span class="na"&gt;bustamam-tech&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/abustamam/bustamam-tech:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bustamam-tech&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;SERVER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bustamam-tech-1&lt;/span&gt;
    &lt;span class="na"&gt;expose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3000"&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the addition of the &lt;code&gt;SERVER_ID&lt;/code&gt; env var.&lt;/p&gt;

&lt;p&gt;server-2:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;bustamam-tech&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ghcr.io/abustamam/bustamam-tech:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bustamam-tech&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;SERVER_ID&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bustamam-tech-2&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;10.0.0.3:3100:3000"&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On server-2, we bind the container to its private IP so other machines in the network can reach it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: 10.0.0.3 is the private IP address of server-2.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, try doing an API call from one server to another (for example, from bustamam-tech-1 I ran &lt;code&gt;curl http://10.0.0.3:3100/api/whoami&lt;/code&gt; and got a response).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;root@bustamam-tech-1: &lt;span class="nv"&gt;$ &lt;/span&gt;curl http://10.0.0.3:3100/api/whoami

&lt;span class="o"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;"message"&lt;/span&gt;:&lt;span class="s2"&gt;"hello from server bustamam-tech-2"&lt;/span&gt;,&lt;span class="s2"&gt;"serverId"&lt;/span&gt;:&lt;span class="s2"&gt;"bustamam-tech-2"&lt;/span&gt;,&lt;span class="s2"&gt;"pid"&lt;/span&gt;:1,&lt;span class="s2"&gt;"time"&lt;/span&gt;:&lt;span class="s2"&gt;"2026-02-24T18:15:13.089Z"&lt;/span&gt;&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hooray! There's a lot of neat stuff you can do just by setting up a private network (shared DBs, etc.), but we'll probably tackle that later in the series.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: For cost, I colocated the load balancer and one backend on the same box. This is a SPOF (single point of failure). In production, the LB should be an independent failure domain (or managed).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To set up your load balancer using Caddy, update your Caddyfile to reference your other server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bustamam.tech {
    reverse_proxy bustamam-tech:3000 10.0.0.3:3100 {
        lb_policy round_robin
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice that we added &lt;code&gt;10.0.0.3:3100&lt;/code&gt; as a target. If we were using a dedicated server for load balancing, we'd need to replace &lt;code&gt;bustamam-tech:3000&lt;/code&gt; with its private IP address in the same fashion, but since Caddy runs on the same server as the &lt;code&gt;bustamam-tech&lt;/code&gt; service, we can refer to it by its container name.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;lb_policy round_robin&lt;/code&gt; means that the load balancer will forward each request to each server once before starting over from the first one. There's a bunch of other algorithms; another commonly used one is &lt;code&gt;least_conn&lt;/code&gt; which will connect the request to the server with the fewest connections. For more information, check out the &lt;a href="https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#lb_policy" rel="noopener noreferrer"&gt;Caddy docs&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;And that's it! Restart your Caddy server &lt;code&gt;docker compose restart caddy&lt;/code&gt; and then from a terminal outside of your Hetzner network (like running locally), run the following bash command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..10&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;curl &lt;span class="o"&gt;{&lt;/span&gt;your_whoami_route&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6n6c8job1fg5jb1cncb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu6n6c8job1fg5jb1cncb.png" alt="alternating requests" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you properly see the requests alternating, congrats! You just set up your own load balancer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Failover: the first real production footgun
&lt;/h2&gt;

&lt;p&gt;Let's test failover. If server 2 goes down, then the load balancer should only ever connect to server 1.&lt;/p&gt;

&lt;p&gt;Run &lt;code&gt;docker compose down&lt;/code&gt; on server 2 to simulate the server being down. Then run the bash script again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..10&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;curl &lt;span class="o"&gt;{&lt;/span&gt;your_whoami_route&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futjqwvtfct2gqsgrs1l2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Futjqwvtfct2gqsgrs1l2.png" alt="no failover" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Uh oh! Notice the 5 seconds of latency? This means failover isn't working. We need a way for our load balancer to know whether a server is up. There are a few ways to do this, but a common solution is a health check: a route that basically says "I'm alive!"&lt;/p&gt;

&lt;p&gt;Why health checks? A health check route doesn't actually do anything, right? It's just a route that returns a static OK.&lt;/p&gt;

&lt;p&gt;Well, this is why we have health checks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Without health checks, the LB only discovers failure after a request fails.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;That means user-facing latency spikes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Health checks convert reactive detection into proactive detection.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Without health checks, failover happens only after a user request times out. That means your users become your monitoring system, which is not an ideal experience for them.&lt;/p&gt;

&lt;p&gt;Well, we already have a &lt;code&gt;/api/whoami&lt;/code&gt; route that doesn't do anything but return an environment variable. Let's use that.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bustamam.tech {
    reverse_proxy bustamam-tech:3000 10.0.0.3:3100 {
        lb_policy round_robin

        # Active health checking
        health_uri /api/whoami
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Note: in many production environments, the health check route is at a route like &lt;code&gt;/api/healthz&lt;/code&gt; or &lt;code&gt;/api/healthcheck&lt;/code&gt; that just returns &lt;code&gt;{ OK: true }&lt;/code&gt; or something like that. I'll leave that as an exercise for the reader to implement if interested.&lt;/p&gt;
&lt;/blockquote&gt;
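&lt;p&gt;If you want a head start on that exercise, a minimal sketch mirroring the &lt;code&gt;whoami&lt;/code&gt; route above might look like this (the final config later in this post points at &lt;code&gt;/api/healthz&lt;/code&gt;):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { createFileRoute } from "@tanstack/react-router";
import { json } from "@tanstack/react-start";

// a static "I'm alive!" endpoint; no app logic, just a 200 with a body
export const Route = createFileRoute("/api/healthz")({
  server: {
    handlers: {
      GET: async () =&amp;gt; json({ OK: true }),
    },
  },
});
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;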

&lt;p&gt;Restart your Caddy service (&lt;code&gt;docker compose restart caddy&lt;/code&gt;), then try it again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="k"&gt;for &lt;/span&gt;i &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;1..10&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do &lt;/span&gt;curl &lt;span class="o"&gt;{&lt;/span&gt;your_whoami_route&lt;span class="o"&gt;}&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;done&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsip0i08zpvsqwa6ag7d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffsip0i08zpvsqwa6ag7d.png" alt="failover success" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Yay! All of our traffic is being routed to server 1!&lt;/p&gt;

&lt;p&gt;So, what happened during the failover process?&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Caddy picked server-2 (round robin)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;server-2 was down&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;the client waited for any connect/response timeouts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;only then did Caddy try the next upstream&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Without health checks, your first failure is paid for by a real user.&lt;/p&gt;

&lt;p&gt;The final config I went with is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;bustamam.tech {
    reverse_proxy bustamam-tech:3000 10.0.0.3:3100 {
        lb_policy round_robin

        # total retry window across upstreams
        lb_try_duration 3s             

        # how often to retry upstreams within that window
        lb_try_interval 250ms

        # Active health checking
        health_uri /api/healthz # note I changed this from /api/whoami
        health_interval 5s
        health_timeout 2s

        # How long to consider a backend “down” after failures (circuit breaker window)
        # duration to keep an upstream marked as unhealthy
        fail_duration 10s              

        # threshold of failures before marking an upstream down
        max_fails 1                    

        # Fail fast when an upstream is unresponsive
        transport http {
            # TCP connect timeout to the upstream
            dial_timeout 1s            

            # slow backend detection (time waiting for first byte)
            response_header_timeout 2s 
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, refer to the &lt;a href="https://caddyserver.com/docs/caddyfile/directives/reverse_proxy#load-balancing" rel="noopener noreferrer"&gt;Caddy docs&lt;/a&gt; for more information on the config. These settings are basically: &lt;strong&gt;bounded retries&lt;/strong&gt;, &lt;strong&gt;active health checks&lt;/strong&gt;, and a &lt;strong&gt;circuit breaker&lt;/strong&gt;, plus aggressive &lt;strong&gt;timeouts&lt;/strong&gt; so failure is detected quickly. Rule of thumb: timeouts first, retries second. Retries without timeouts just turn slow failures into traffic jams.&lt;/p&gt;
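&lt;p&gt;One way to sanity-check the config above is to read each timeout as a latency bound:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dial_timeout 1s        → a dead backend costs at most ~1s per connect attempt
lb_try_interval 250ms  → retries happen quickly within the window
lb_try_duration 3s     → no request spins longer than ~3s total
fail_duration 10s      → a failed backend sits out 10s before being retried
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;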

&lt;p&gt;And there we have it! A complete load balancer in just a few lines of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrapping it up
&lt;/h2&gt;

&lt;p&gt;At this point, you've moved from deploying an app to operating a system.&lt;/p&gt;

&lt;p&gt;We explored L7 load balancing. We set up a second server to host our web app, we used Caddy to implement failover, and we watched it work before our eyes. Best of all, this didn't require a lot of code!&lt;/p&gt;

&lt;p&gt;Notably though, we did not cover:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Database replication&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Session stickiness&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Deployment coordination&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Distributed logging&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Observability&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will tackle these later in the series.&lt;/p&gt;

&lt;p&gt;Caddy is great for reverse proxies and basic L7 load balancing. But many companies will expect you to also know how to set up a load balancer in nginx or HAProxy. Next: nginx or HAProxy, and why teams choose one over the other (operability, observability, failure semantics).&lt;/p&gt;

</description>
      <category>webdev</category>
    </item>
    <item>
      <title>Why Design for Scale?</title>
      <dc:creator>Rasheed Bustamam</dc:creator>
      <pubDate>Tue, 24 Feb 2026 21:24:53 +0000</pubDate>
      <link>https://dev.to/abustamam/why-design-for-scale-6i6</link>
      <guid>https://dev.to/abustamam/why-design-for-scale-6i6</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Or, "can't we just use more compute?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Preface
&lt;/h2&gt;

&lt;p&gt;Hi, I’m Rasheed Bustamam.&lt;/p&gt;

&lt;p&gt;I’ve been a full-stack engineer since 2015. I’ve worked with and consulted for startups, served as a founding engineer, and been part of multiple successful exits. In many circles, that’s considered startup gold.&lt;/p&gt;

&lt;p&gt;But here’s the gap: while I’ve built fast, shipped quickly, and prototyped aggressively, I haven’t had deep exposure to scale.&lt;/p&gt;

&lt;p&gt;Not real scale.&lt;/p&gt;

&lt;p&gt;What does “scale” even mean?&lt;br&gt;&lt;br&gt;
Is it user volume?&lt;br&gt;&lt;br&gt;
Geographic distribution?&lt;br&gt;&lt;br&gt;
Latency under load?&lt;br&gt;&lt;br&gt;
Operational complexity?&lt;/p&gt;

&lt;p&gt;Different companies optimize for different things. And I realized that while I understood how to build features, I didn’t deeply understand how to design systems that hold up under pressure.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Turning Point
&lt;/h2&gt;

&lt;p&gt;Five years ago, I interviewed at Google and was asked to “design the Google search bar.”&lt;/p&gt;

&lt;p&gt;At the time, my mental model was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Isn’t it just a text input that calls &lt;code&gt;GET /api/search&lt;/code&gt;?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Needless to say, I didn’t get the job.&lt;/p&gt;

&lt;p&gt;But five years later, I should be able to answer that question.&lt;/p&gt;

&lt;p&gt;So I decided to fix that.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Plan
&lt;/h2&gt;

&lt;p&gt;I started with &lt;a href="https://bytes.usc.edu/~saty/courses/docs/data/SystemDesignInterview.pdf" rel="noopener noreferrer"&gt;&lt;em&gt;System Design Interview: An Insider’s Guide&lt;/em&gt;&lt;/a&gt; by Alex Xu as a structured entry point into systems thinking.&lt;/p&gt;

&lt;p&gt;The book is high-level and intentionally generic. It discusses concepts like load balancers, caching, replication, and data partitioning -- but not how to actually implement them in a real environment.&lt;/p&gt;

&lt;p&gt;So I’m doing both:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Studying the concepts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Building them myself to solidify the understanding&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I’ll experiment with infrastructure, set up load balancers, configure services, and test failure scenarios -- not just talk about them.&lt;/p&gt;

&lt;p&gt;I may use AI to help synthesize information, but I won’t rely on AI to implement the systems for me. The goal is understanding, not automation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Write This Publicly?
&lt;/h2&gt;

&lt;p&gt;This series is primarily for me.&lt;/p&gt;

&lt;p&gt;But I suspect I’m not alone.&lt;/p&gt;

&lt;p&gt;There are many engineers who:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Ship quickly&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Build great product experiences&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Have deep frontend or application-level expertise&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;But feel underprepared when conversations shift toward distributed systems and scalability&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If that sounds familiar, this series might resonate with you.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Expect
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Clear breakdowns of system design concepts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Practical implementations&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Tradeoffs and failure modes&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Reflections on what “scale” actually means in different contexts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lessons learned from hands-on experiments&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No hype. No pretending to be an expert.&lt;/p&gt;

&lt;p&gt;Just deliberate, structured growth.&lt;/p&gt;




&lt;p&gt;If you have questions, ideas, or critiques -- I’d love to hear them.&lt;/p&gt;

&lt;p&gt;Let’s learn this properly.&lt;/p&gt;

&lt;p&gt;If you're in, leave a comment and say so!&lt;/p&gt;

</description>
      <category>webdev</category>
    </item>
  </channel>
</rss>
