Using the tools of titans
In our previous post, we built an L7 load balancer using Caddy reverse proxy. In this post, we'll migrate that configuration over to nginx so we can compare tradeoffs. But first, what is nginx?
What is Nginx?
NGINX, or nginx because I don't like screaming (pronounced engine-x, though some folks will say en-jinx), is a high-performance HTTP server and reverse proxy commonly used for load balancing, TLS termination, and serving static content.
Note: for the purposes of this guide, "nginx" means nginx OSS, not nginx's enterprise offering, nginx Plus. When reading docs about nginx, make sure you're reading docs about nginx OSS, typically hosted at nginx.org.
Where Caddy optimizes for simplicity and automatic TLS, nginx exposes lower-level control over request routing, buffering, and upstream behavior. In our previous setup, Caddy handled reverse proxying and active health checks across two upstream nodes.
In this migration, we move to nginx to gain explicit control over upstream pools, failure detection, and connection timeouts.
Preconditions
I'm assuming you read the last post. If not, here are our baseline assumptions:
Domain
bustamam.tech A record points to server-1's public IP
server-1 and server-2 are on the same Hetzner private network
server-2 exposes app on private IP/port:
10.0.0.3:3100 -> container:3000
To confirm, from an ssh session in server-1, run this:
curl -s http://10.0.0.3:3100/api/whoami
If that returns server-2 (or whatever your SERVER_ID is) then we can continue.
$ curl http://10.0.0.3:3100/api/whoami
{"message":"hello from server bustamam-tech-2","serverId":"bustamam-tech-2","pid":1,"time":"2026-02-24T22:34:37.462Z"}
Basic nginx scaffold
Right now, our app looks something like this:
Internet (HTTPS)
↓
Caddy
↓
App containers
Caddy:
Listens on 80/443 (http/s)
Owns the TLS cert
Decrypts HTTPS to HTTP
Forwards HTTP to upstream containers
Does L7 load balancing between them
We are going to introduce nginx as the new edge reverse proxy.
That means nginx will do exactly the same thing, and we'll remove Caddy from the loop.
The new architecture becomes:
Internet (HTTPS)
↓
nginx
↓
App containers
We are not:
Changing the app
Changing Docker build
Changing the private network
Moving certs to backend servers
Doing TLS passthrough
We are just replacing Caddy with nginx as the TLS-terminating L7 proxy.
TLS termination at nginx does the following:
It keeps certificates in one place
It allows HTTP-aware load balancing
It lets nginx inspect requests if needed
It simplifies backend containers (they only speak HTTP)
This is a common production pattern.
In order for all of this to work, we need three things:
1. nginx needs config files
So it knows:
which domain it serves
where to proxy traffic
where the cert files are
You can check out the documentation on nginx web servers here.
2. nginx needs certificates
Let's Encrypt cert + private key must live in a mounted volume.
Documentation on nginx certs here.
3. nginx needs to expose ports 80 and 443
Because it becomes the public entrypoint.
Let's start with the config. On your load balancer server, run the following:
mkdir -p nginx/conf.d
This is where your nginx site configs will live. The location is arbitrary -- we will map it to a docker volume.
OK, let's spin nginx up. Update your docker-compose.yml file:
services:
# caddy, your app, etc
nginx:
image: nginx:1.27-alpine
container_name: nginx
ports:
- "8080:8080"
volumes:
- ./nginx/conf.d:/etc/nginx/conf.d:ro
depends_on: [bustamam-tech]
restart: unless-stopped
Then, create a file called 00-shadow.conf in your conf.d directory.
Note: conf file names aren't critical, but nginx does load them in alphabetical order, so it's common practice to prefix them with 00 for sorting purposes.
# Shadow nginx: runs on :8080 so we can test without touching Caddy (:80/:443)
upstream bustamam_upstreams {
# round robin + retry-on-failure behavior are nginx defaults
server bustamam-tech:3000;
server 10.0.0.3:3100;
# Note: we're currently relying on nginx's default passive health checks
}
server {
listen 8080;
server_name _;
location / {
proxy_pass http://bustamam_upstreams;
# Minimum headers to keep apps happy
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
}
What this does: nginx is now an L7 proxy and can load balance, but it's intentionally naive.
Let's spin up our nginx service.
docker compose up nginx -d
# after it's running
docker compose ps
# you should see your nginx service, as well as any other service that might be running
Now you can do the loop:
for i in {1..10}; do curl -s http://bustamam.tech:8080/api/whoami; echo; done
Note: this is http, not https, and note the port as well, it matches the port in the .conf file.
You should get alternating server IDs. If you don't, double check your config!
Testing Failover
Let's pull down server-2 for a second and try this again. docker compose down on server-2. Then try curl again.
Uh-oh! We're hanging where the round robin would have sent us to server-2! Let's fix that.
Note: nginx has some pretty long defaults, so while it may feel like forever, it might be something like 60 seconds. While it is said that patience is a virtue, a user won't use an app that takes 60 seconds to load or fetch data! Timeouts are your first production knob.
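You can feel what a connect timeout does without nginx in the loop at all. This is just a local illustration; 10.255.255.1 is assumed to be a non-routable address from your machine:

```shell
# Attempt a connection that can't succeed, but cap the wait at 2 seconds.
# Without --connect-timeout, this could hang for the OS default (often 60s+).
curl -s -o /dev/null --connect-timeout 2 \
  -w "code=%{http_code} time=%{time_total}\n" \
  http://10.255.255.1/ || true
```

A `code=000` here means curl never got an HTTP response at all; the `time=` value is how long you chose to wait before giving up. That choice is exactly what `proxy_connect_timeout` makes for nginx.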
Handling Failover
Let's update our config so we don't wait forever when a destination server is down.
location / {
proxy_pass http://bustamam_upstreams;
proxy_connect_timeout 1s; # timeout for connecting to the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_connect_timeout
proxy_read_timeout 5s; # timeout for reading the response from the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
# Minimum headers to keep apps happy
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
Restart nginx on server-1
docker compose restart nginx
Note: for the remainder of this post, we will assume that a config change is followed by container restart.
And try it again!
Great! But wait, if server-2 is down, how long are we waiting before nginx sends the request to server-1? Let's instrument some observability. Update your location config so we have access to the upstream IP addresses:
location / {
proxy_pass http://bustamam_upstreams;
proxy_connect_timeout 1s; # timeout for connecting to the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_connect_timeout
proxy_read_timeout 5s; # timeout for reading the response from the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
# Minimum headers to keep apps happy
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
add_header X-Upstream $upstream_addr always;
}
And let's try a slightly different bash script.
for i in {1..10}; do
echo "---- $i ----"
curl -s -D headers.txt \
-w "\ncode=%{http_code} time=%{time_total}\n" \
http://bustamam.tech:8080/api/whoami
grep -i x-upstream headers.txt
echo
done
Aha! Even though we're getting 200s and the right server responds, look at the third request: we added a whole second of latency, and the X-Upstream header shows that an attempt first went to server-2. Even when the request succeeds, failover can still cost you a timeout. Success isn't the same as fast.
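One way to make that visible at a glance: collect the `time=` values and flag anything that paid a timeout. The numbers below are made-up sample values standing in for the loop's output:

```shell
# Sample per-request latencies (seconds); the 1.05 is the failover request
# that burned a ~1s connect timeout before retrying the healthy upstream.
printf '0.04\n0.05\n1.05\n0.04\n' \
  | awk '{ if ($1 > 0.5) slow++ } END { printf "%d slow request(s)\n", slow }'
```

Any threshold works; 0.5s is just comfortably above normal response time and below the 1s connect timeout, so it cleanly separates "served directly" from "paid a failover."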
Let's flesh this out a bit more. Let's update our upstream config. Defaults exist, but we want our system to be able to explain itself:
# Shadow nginx: runs on :8080 so we can test without touching Caddy (:80/:443)
upstream bustamam_upstreams {
# primary server with default settings
# note that because this service lives on this machine, if this server is down, the nginx container will also be down.
server bustamam-tech:3000;
# secondary server with custom settings
# max_fails=1
# If 1 request fails within the fail_timeout window,
# mark this upstream as "unavailable".
#
# fail_timeout=10s
# How long to consider that backend "down" before retrying it.
#
server 10.0.0.3:3100 max_fails=1 fail_timeout=10s;
}
server {
listen 8080;
server_name _;
location / {
proxy_pass http://bustamam_upstreams;
proxy_connect_timeout 1s; # timeout for connecting to the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_connect_timeout
proxy_read_timeout 5s; # timeout for reading the response from the upstream https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout
# Minimum headers to keep apps happy
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
add_header X-Upstream $upstream_addr always;
# Note: this is nginx's default https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream
proxy_next_upstream error timeout;
# how many retries to attempt before giving up https://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_next_upstream_tries
proxy_next_upstream_tries 2; # default is 0, which means unbounded retries!
}
}
Let's quickly talk about max_fails, fail_timeout, proxy_next_upstream and proxy_next_upstream_tries.
max_fails / fail_timeout is upstream-level passive failure marking. proxy_next_upstream / proxy_next_upstream_tries is request-level retry routing.
Think of this as two layers: upstream marking (which servers are considered eligible) and per-request retries (what nginx does when a request fails mid-flight).
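To see the two layers side by side, here's a stripped-down sketch (same values as our config; the pool name is illustrative):

```nginx
# Layer 1: upstream marking -- which servers are eligible at all.
upstream example_pool {
    # one failure => ineligible for the next 10s
    server 10.0.0.3:3100 max_fails=1 fail_timeout=10s;
}

# Layer 2: per-request retries -- what one request does when its pick fails.
server {
    listen 8080;
    location / {
        proxy_pass http://example_pool;
        proxy_next_upstream error timeout; # which failures count as retryable
        proxy_next_upstream_tries 2;       # bound on attempts per request
    }
}
```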
Note: unbounded does not mean infinite. To be more explicit, 0 means no explicit limit (i.e., not bounded by tries). In practice, retries are still bounded by timeouts and available upstreams, but it's not a safe default if you're trying to reason about worst-case latency.
Feel free to play around with numbers! For example:
server 10.0.0.3:3100 max_fails=1 fail_timeout=60s;
Now if this server fails once, it won't be tried again for another 60 seconds. But it also means that if the server came back up, no one could reach it for up to 60 seconds. This is where metrics and understanding your system as a whole are important. For a toy project like this, even 10+ minutes would be fine.
The trade-off is fast recovery (low timeout) vs minimizing probe spikes (high timeout).
You can also tighten up the connect timeout:
proxy_connect_timeout 200ms;
But this only works if your private network is reliably fast. If it ever takes more than 200ms just to establish the TCP connection, you may mark an otherwise healthy server as dead due to network jitter.
Note: If you're running into issues with your config, keep straight the difference between the upstream definition (where servers live) and proxy behavior (how requests fail over). Mixing them up leads to configs that look reasonable but don't load, and the failure mode is "nothing works, and you're not sure why" unless you validate with
nginx -t
In our setup, that's docker exec nginx nginx -t.
So, let's summarize what nginx is doing so far.
nginx tries server-2 (round robin)
it fails, and nginx marks it "down" for ~10s
for the next ~10 seconds, nginx only uses server-1 (fast)
once the 10s window expires, nginx will probe server-2 again by selecting it for a real request
that request pays the 1s connect timeout (your ~1.5s)
nginx retries server-1 and succeeds
server-2 gets marked down again for another 10 seconds
if server-2 ever comes back up, the next successful probe will mark it healthy again
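With those knobs, you can put numbers on the worst case. A back-of-envelope sketch in shell, using our config's values (the healthy response time is an assumption):

```shell
connect_timeout_ms=1000   # proxy_connect_timeout 1s
tries=2                   # proxy_next_upstream_tries 2
healthy_response_ms=50    # assumed typical response from a healthy upstream

# Worst case for a request that probes the dead server: every attempt but
# the last burns the full connect timeout, then the final try succeeds.
worst_case_ms=$(( (tries - 1) * connect_timeout_ms + healthy_response_ms ))
echo "worst case ~ ${worst_case_ms}ms per probing request"
```

And only roughly one request per fail_timeout window pays that price; everything else stays fast.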
So it seems like we're at parity with Caddy, right? Well, unfortunately, no. We still need TLS termination. Let's handle that next.
TLS Termination
Right now:
Caddy terminates TLS on :443 and proxies to your backends.
nginx is shadow-testing on :8080 (plain HTTP).
TLS termination means:
The client's HTTPS connection ends at nginx. nginx decrypts the request, then forwards it to your upstreams over plain HTTP (usually over a private network/VPC).
So:
Browser ⇄ HTTPS ⇄ nginx (edge)
nginx ⇄ HTTP ⇄ upstreams (private)
That's what we mean when we say "terminate TLS at the load balancer."
Our plan is to replace Caddy. We want the following:
nginx serves HTTP on :80 and handles the Let's Encrypt ACME challenge
certbot obtains certs via webroot
nginx serves HTTPS on :443 using those certs and proxies to upstreams
shut down Caddy (to free 80/443), bring up nginx+certbot
Redirect http to https
Alright, we'll need some new directories for configs and certs.
mkdir -p nginx/www nginx/letsencrypt
Your directory structure should now include nginx/conf.d, nginx/www, and nginx/letsencrypt (I have a few extra files from messing around with configs). And again, the directory names are arbitrary; we'll get them mapped in docker. It's important to understand that certbot doesn't "talk to nginx." They just share a filesystem. Certbot writes files. nginx serves them. That's it.
nginx/www is where the ACME challenge files are written. When Let's Encrypt validates your domain, it requests http://bustamam.tech/.well-known/acme-challenge/<token>. Certbot writes that token file into your www/ directory, and nginx serves that directory.
nginx/letsencrypt is where certs live (shared with nginx). When certbot succeeds, it writes cert files into /etc/letsencrypt/live/bustamam.tech/. So whatever local directory maps to /etc/letsencrypt must be shared between certbot (read/write) and nginx (read-only).
Note: for more information on ACME and other Let's Encrypt challenges, check out their documentation on challenge types
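The "shared filesystem, no talking" idea is simple enough to mimic locally. This sketch plays both roles with a made-up token and a temp directory; nothing here is certbot-specific:

```shell
# Role 1 (certbot): write the challenge token under the webroot.
webroot=$(mktemp -d)
mkdir -p "$webroot/.well-known/acme-challenge"
echo "demo-token-contents" > "$webroot/.well-known/acme-challenge/demo-token"

# Role 2 (nginx): serve the file back at the path Let's Encrypt requests.
# Here we just read it, since sharing the directory is the whole trick.
cat "$webroot/.well-known/acme-challenge/demo-token"
```

In the real setup, the only difference is that "read it" is nginx answering an HTTP request for /.well-known/acme-challenge/<token> out of that same directory.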
Let's delete everything in conf.d and start with a fresh config: bustamam.tech.conf (or whatever you wanna name it)
# ================================
# Upstreams
# ================================
upstream bustamam_upstreams {
# Primary (local container)
server bustamam-tech:3000;
# Secondary (remote server over private network)
server 10.0.0.3:3100 max_fails=1 fail_timeout=10s;
}
# ================================
# HTTP (port 80)
# - Serve ACME challenge
# - Redirect everything else to HTTPS
# ================================
server {
listen 80;
server_name bustamam.tech;
# Let's Encrypt HTTP-01 challenge files live here
location /.well-known/acme-challenge/ {
root /var/www/certbot;
}
# Everything else goes to HTTPS
location / {
return 301 https://$host$request_uri;
}
}
Footgun: We are purposely deferring https until later in this article. If you enable the listen 443 ssl server block before certs exist, nginx may fail to start, and you'll see port 80 "hang" because nothing is listening. The bootstrap sequence is: HTTP first → obtain cert → enable HTTPS.
OK, now we need to update our docker-compose.yml file:
nginx:
image: nginx:1.27-alpine
container_name: nginx
ports:
- "80:80"
- "443:443"
volumes:
- ./nginx/conf.d:/etc/nginx/conf.d:ro
- ./nginx/www:/var/www/certbot:ro
- ./nginx/letsencrypt:/etc/letsencrypt:ro
depends_on:
- bustamam-tech
restart: unless-stopped
certbot:
image: certbot/certbot:latest
container_name: certbot
volumes:
- ./nginx/www:/var/www/certbot:rw
- ./nginx/letsencrypt:/etc/letsencrypt:rw
restart: "no"
Important to note:
nginx mounts certs directory read-only
certbot mounts cert directory read-write
Now let's bring our creation to life.
Bring nginx up on port 80, test http
Caddy is currently occupying ports 80 and 443. So if you have Caddy running, bring it down with docker compose down caddy. Then, bring up nginx. If it's already running, run docker compose restart nginx. Otherwise, docker compose up nginx -d.
Then test http connection:
curl -I http://bustamam.tech
You should see a 301 redirect to https, which is exactly what we want.
Note: if this hangs, check whether anything is actually listening on those ports. Try running this on the host machine and start there:
sudo ss -lntp | grep -E ':80|:443'
But we don't have https set up. Let's go do that.
Set up https
Let's update our conf file:
# ================================
# Upstreams
# ================================
upstream bustamam_upstreams {
# Primary (local container)
server bustamam-tech:3000;
# Secondary (remote server over private network)
server 10.0.0.3:3100 max_fails=1 fail_timeout=10s;
}
# ================================
# HTTP (port 80)
# - Serve ACME challenge
# - Redirect everything else to HTTPS
# ================================
server {
listen 80;
server_name bustamam.tech;
# Let's Encrypt HTTP-01 challenge files live here
location /.well-known/acme-challenge/ {
root /var/www/certbot;
}
# Everything else goes to HTTPS
location / {
return 301 https://$host$request_uri;
}
}
# ================================
# HTTPS (port 443)
# - Terminate TLS here
# - Reverse proxy to upstreams over HTTP
# ================================
server {
listen 443 ssl;
server_name bustamam.tech;
# TLS certs (provided by certbot via shared volume)
ssl_certificate /etc/letsencrypt/live/bustamam.tech/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/bustamam.tech/privkey.pem;
# A minimal modern TLS posture
ssl_protocols TLSv1.2 TLSv1.3;
location / {
proxy_pass http://bustamam_upstreams;
# Fail fast
proxy_connect_timeout 1s;
proxy_read_timeout 5s;
proxy_send_timeout 5s;
# Deterministic retry behavior (make defaults explicit)
proxy_next_upstream error timeout http_502 http_503 http_504;
proxy_next_upstream_tries 2;
# Forwarding headers
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# Debug: show which upstream served (or was attempted)
add_header X-Upstream $upstream_addr always;
}
}
The http part (port 80) is the same. https is just a barebones skeleton with some sensible defaults. The files referenced by ssl_certificate don't exist yet, though, so let's make those.
Obtain the certificates
Let's start with a test cert. In your host machine, run this command:
docker compose run --rm certbot certonly \
--webroot -w /var/www/certbot \
-d bustamam.tech \
--test-cert \
--agree-tos \
-m rasheed.bustamam@gmail.com \
--no-eff-email
It'll probably pull the certbot image first, and when it succeeds, you should see a bunch of files appear under your letsencrypt directory.
If so, rerun the command without the --test-cert flag.
docker compose run --rm certbot certonly \
--webroot -w /var/www/certbot \
-d bustamam.tech \
--agree-tos \
-m rasheed.bustamam@gmail.com \
--no-eff-email
It's possible this will ask you to reuse your current cert, or create a new one. Choose to create a new one; you can't use a test cert in production environments.
Now let's restart nginx so it can read our new certs!
Activate https in nginx
Just run
docker compose restart nginx
And test:
curl -I https://bustamam.tech
Let's test our whoami route too:
curl -s https://bustamam.tech/api/whoami
Now we have https working and our load balancer is still working!
Now, I have to note -- since we are managing our own certs, we also have to renew them:
docker compose run --rm certbot renew --webroot -w /var/www/certbot
docker exec nginx nginx -s reload
You can run this on a cronjob if you'd like, but it's not in the scope of this article.
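If you do want to automate it, a hypothetical crontab entry (edit with crontab -e; the project path /opt/myproject is yours to fill in) might look like:

```
# Run twice a day at off-peak times; certbot only renews certs close to
# expiry, and the nginx reload is harmless if nothing changed.
0 3,15 * * * cd /opt/myproject && docker compose run --rm certbot renew --webroot -w /var/www/certbot --quiet && docker exec nginx nginx -s reload
```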
Comparison to Caddy
Now that we finally got parity with Caddy, let's compare!
As a reminder, this was our Caddyfile:
bustamam.tech {
reverse_proxy bustamam-tech:3000 10.0.0.3:3100 {
lb_policy round_robin
# total retry window across upstreams
lb_try_duration 3s
# how often to retry upstreams within that window
lb_try_interval 250ms
# Active health checking
health_uri /api/healthz
health_interval 5s
health_timeout 2s
# How long to consider a backend "down" after failures (circuit breaker window)
# duration to keep an upstream marked as unhealthy
fail_duration 10s
# threshold of failures before marking an upstream down
max_fails 1
# Fail fast when an upstream is unresponsive
transport http {
# TCP connect timeout to the upstream
dial_timeout 1s
# slow backend detection (time waiting for first byte)
response_header_timeout 2s
}
}
}
We got active health checking and automatic TLS issuance and renewal. And then this was nginx:
# ================================
# Upstreams
# ================================
upstream bustamam_upstreams {
# Primary (local container)
server bustamam-tech:3000;
# Secondary (remote server over private network)
server 10.0.0.3:3100 max_fails=1 fail_timeout=10s;
}
# ================================
# HTTP (port 80)
# - Serve ACME challenge
# - Redirect everything else to HTTPS
# ================================
server {
listen 80;
server_name bustamam.tech;
# Let's Encrypt HTTP-01 challenge files live here
location /.well-known/acme-challenge/ {
root /var/www/certbot;
}
# Everything else goes to HTTPS
location / {
return 301 https://$host$request_uri;
}
}
# ================================
# HTTPS (port 443)
# - Terminate TLS here
# - Reverse proxy to upstreams over HTTP
# ================================
server {
listen 443 ssl;
server_name bustamam.tech;
# TLS certs (provided by certbot via shared volume)
ssl_certificate /etc/letsencrypt/live/bustamam.tech/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/bustamam.tech/privkey.pem;
# A minimal modern TLS posture
ssl_protocols TLSv1.2 TLSv1.3;
location / {
proxy_pass http://bustamam_upstreams;
# Fail fast
proxy_connect_timeout 1s;
proxy_read_timeout 5s;
proxy_send_timeout 5s;
# Deterministic retry behavior (make defaults explicit)
proxy_next_upstream error timeout http_502 http_503 http_504;
proxy_next_upstream_tries 2;
# Forwarding headers
proxy_set_header Host $host;
proxy_set_header X-Forwarded-Proto https;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
# Debug: show which upstream served (or was attempted)
add_header X-Upstream $upstream_addr always;
}
}
With nginx, we get passive health checking, and we needed certbot to manage our certs for us.
So you may be asking, "Why is nginx better than Caddy?" The answer is that it isn't, not necessarily. Caddy is the better default for small systems. nginx is better when you need explicit control, standardized ops, or you're operating inside a bigger ecosystem.
Doing nginx here isn't "because it's better," it's because it teaches you how the edge actually works when the platform stops holding your hand.
Caddy can keep a backend out of rotation before a user hits it. nginx usually learns a backend is dead because a user hit it (or because passive marking was configured).
It's important to call out that we don't want to be comparing "lines of config" when evaluating tools. It's a matter of what you own vs what you delegate.
Caddy: batteries included, opinionated defaults
We got, almost for free:
automatic TLS issuance/renewal
active health checks
nice LB ergonomics (health_uri, fail_duration, etc.)
fewer footguns
10-ish lines of config
So for a $20 VPS and learning, Caddy is amazing.
nginx OSS: modular and explicit
We had to build the edge out of primitives:
TLS is not automatic (had to use certbot)
health checks are passive unless you add extra machinery
reload behavior and config validation are on you
you need to understand contexts (upstream vs location) or you break it
about 60-ish lines of config
That pain is the point: nginx forces us to learn the contract between:
TCP port binding
TLS termination
request routing
retries/timeouts
failure detection
certificate lifecycle
This is the systems knowledge that we're trying to learn in the first place.
When to Choose nginx over Caddy
Ideally, your team is already using one and you just need to learn it :)
But for greenfield projects, or for understanding when to migrate from Caddy to nginx:
1) When you need a boring industry standard
nginx is everywhere. If you join a team with existing nginx infra, knowing it is immediate leverage.
2) When you need predictable, explicit behavior at the edge
In nginx you can be extremely specific about:
what counts as retryable
how many tries
timeouts per phase (connect/send/read)
failure semantics per upstream
Caddy has knobs too, but nginx's model maps closely to how a lot of production stacks think.
3) When the ecosystem around it matters
nginx has deep integration patterns with:
legacy deployments
enterprise tooling
common security hardening playbooks
common debugging muscle memory (every SRE has done nginx -T, nginx -t, reloads, etc.)
4) When performance tuning at massive scale is the job
At large companies, nobody is choosing nginx because "it's faster" per request in isolation. They're choosing it because:
they know how to operate it safely
they know how it fails
it has predictable resource profiles and instrumentation patterns
The interesting part isn't that we got it working. It's that we can now explain worst-case latency: connect timeout + number of tries + fail_timeout window. That's the difference between 'it seems fine' and 'I can predict how it fails.'
Conclusion
For my $20 VPS and my hobby projects, Caddy is obviously the better tool. It's simpler, safer, and gives me active health checks and automatic TLS with almost no ceremony.
I rebuilt it in nginx anyway because nginx makes the hidden parts visible: TLS bootstrapping, reload semantics, passive vs active failure detection, and how retries interact with timeouts. Those are the concepts that scale, and that's the whole point of this series.
In the next post, we'll actually go in the opposite direction -- we'll use a managed service to do all of this for us. See you there!






