Mishan Raj Shah

Posted on Jun 27

Why I still use NGINX over everything else

#devops #discuss #infrastructure #performance

People keep recommending Caddy. Cleaner config, automatic HTTPS, three lines and you're done. They're not wrong. For a side project or a small team that doesn't want to think about infrastructure, Caddy is fine.

I still use NGINX. Here's why.

The setup that made it clear

Earlier this year I deployed a production system on a single VPS. Hostinger KVM2, 2 vCPU, 8 GB RAM. The kind of system that sits quiet for months and then gets hit hard in a short window, hundreds of concurrent users at the same moment.

The requirement was 1,000 requests per minute. I load tested to 2,500 requests per second. Zero errors. Zero timeouts. p99 on the result lookup was 308ms.

No autoscaling. No managed load balancer. No cloud bill. One NGINX config that I wrote by hand.

That's the argument.

What NGINX actually does in production

Here's the architecture that handled it:

Browser
    | HTTPS
    v
Cloudflare  (DDoS protection, CDN, TLS to browser)
    | HTTPS
    v
NGINX (host)  (TLS termination, rate limiting, cache)
    |-- /api/*  →  Express + Prisma container (127.0.0.1:5001)
    └-- /*      →  Next.js container (127.0.0.1:3001)
                        |
                 PostgreSQL (internal Docker network only)

Containers bind to localhost only. Nothing is reachable from the internet except through NGINX. That one decision eliminates an entire class of exposure.

The five things NGINX handles that people underestimate

1. Rate limiting before the app sees the request

Rate limiting in Express middleware still costs event-loop time on every rejected request: Node.js has to accept the connection, parse the request, and run the middleware before it can say no. NGINX rejects rate-limited requests before they reach Node.js. No event loop, no DB query, no memory allocation in the app. A 429 from NGINX costs almost nothing.

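# Key each client by its restored address ($real_ip comes from the map in section 3).
limit_req_zone $real_ip zone=api:10m rate=10r/s;

location /api/ {
    # Allow bursts of up to 20 requests over the rate, served without delay.
    limit_req zone=api burst=20 nodelay;
    limit_req_status 429;
    proxy_pass http://127.0.0.1:5001;
}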
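
Before enforcing a limit like this on live traffic, it can help to watch what it would reject. A minimal sketch, assuming NGINX 1.17.6 or newer (for limit_req_dry_run and $limit_req_status). The zone name api_dry, the log format name, and the log path are illustrative, and the key is $binary_remote_addr rather than $real_ip so the fragment stands on its own:

```nginx
# Inside the http {} context.
# 10r/s is enforced as one request per 100 ms per key.
limit_req_zone $binary_remote_addr zone=api_dry:10m rate=10r/s;

# Log the limiter's verdict for each request: PASSED, DELAYED, REJECTED,
# or the *_DRY_RUN variants while dry-run mode is on.
log_format ratelimit '$remote_addr "$request" $status limit=$limit_req_status';

# Inside the server {} block.
location /api/ {
    # burst=20 lets up to 20 requests over the rate through;
    # nodelay forwards them immediately instead of spacing them out.
    limit_req zone=api_dry burst=20 nodelay;
    limit_req_status 429;

    # Dry run: count and log would-be rejections, but reject nothing.
    limit_req_dry_run on;

    access_log /var/log/nginx/ratelimit.log ratelimit;
    proxy_pass http://127.0.0.1:5001;
}
```

Once the log shows REJECTED_DRY_RUN only for traffic you actually want to drop, removing limit_req_dry_run turns enforcement on.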

2. Cache stampede protection

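When the traffic spike hits, hundreds of users land on the same URL at the exact same moment. Without a cache, every request goes Next.js → Node.js → PostgreSQL. With proxy_cache_lock on, only one request at a time is allowed to populate a missing cache entry. Every other concurrent request for that URL waits for that single response, then gets served from cache. The lock only covers entries that are not in the cache yet; for an entry that has expired, proxy_cache_use_stale updating does the equivalent job by serving the stale copy while a single request refreshes it. Without both, every request that arrives during the miss or refresh window hits upstream simultaneously. That's the thundering herd problem, and it's what kills servers in high-traffic moments.

proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=pages:10m
                 max_size=100m inactive=1m use_temp_path=off;

location / {
    proxy_cache pages;
    proxy_cache_valid 200 10s;
    # Cache miss: one request populates the entry, the rest wait for it.
    proxy_cache_lock on;
    proxy_cache_lock_timeout 5s;
    # Expired entry: serve the stale copy while one request refreshes it.
    proxy_cache_use_stale updating;
    proxy_pass http://127.0.0.1:3001;
}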

10 seconds is short enough to feel live, long enough to absorb any realistic spike.
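
One way to check that the cache behaves as intended is to expose NGINX's own cache verdict on each response. A sketch showing only the lines relevant to the check, assuming the pages zone and upstream from the config above; the X-Cache-Status header name is a common convention, not something from the original setup, and the other proxy_cache_* directives stay as they are:

```nginx
# Inside the server {} block, in the same location that enables the cache.
location / {
    proxy_cache pages;   # zone defined by proxy_cache_path in the http {} context

    # $upstream_cache_status reports MISS, HIT, EXPIRED, STALE, UPDATING, and so on.
    add_header X-Cache-Status $upstream_cache_status;

    proxy_pass http://127.0.0.1:3001;
}
```

Under a burst to one URL, most responses should report HIT. A long run of MISS or EXPIRED for the same URL suggests requests are still reaching Next.js.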

3. Cloudflare real-IP restoration

Cloudflare proxies all traffic, which means every request arrives at NGINX from a Cloudflare edge IP. Without fixing this, your rate limiting targets Cloudflare's shared IPs and blocks every user at once instead of individual abusers.

real_ip_header CF-Connecting-IP;

set_real_ip_from 173.245.48.0/20;
set_real_ip_from 103.21.244.0/22;
set_real_ip_from 103.22.200.0/22;
set_real_ip_from 103.31.4.0/22;
set_real_ip_from 141.101.64.0/18;
set_real_ip_from 108.162.192.0/18;
set_real_ip_from 190.93.240.0/20;
set_real_ip_from 188.114.96.0/20;
set_real_ip_from 197.234.240.0/22;
set_real_ip_from 198.41.128.0/17;
set_real_ip_from 162.158.0.0/15;
set_real_ip_from 104.16.0.0/13;
set_real_ip_from 104.24.0.0/14;
set_real_ip_from 172.64.0.0/13;
set_real_ip_from 131.0.72.0/22;

map $http_cf_connecting_ip $real_ip {
    default $http_cf_connecting_ip;
    ""      $remote_addr;
}
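
Because real_ip_header and set_real_ip_from come from NGINX's realip module, $remote_addr itself is already rewritten to the visitor's address for connections that arrive from the trusted ranges. An alternative sketch, assuming the module is available in your build (distribution packages usually include it) and that the full Cloudflare range list above is in place, keys the limiter on the built-in variable instead of the map. The zone name api_by_addr is hypothetical:

```nginx
# Inside the http {} context.
real_ip_header CF-Connecting-IP;
set_real_ip_from 173.245.48.0/20;   # one line per Cloudflare range, as listed above

# $binary_remote_addr is the compact binary form of the (rewritten) client address.
# The zone name is illustrative, chosen to avoid clashing with zone=api.
limit_req_zone $binary_remote_addr zone=api_by_addr:10m rate=10r/s;
```

The difference from the map: the header is honoured only when the connection comes from a set_real_ip_from range, so a client that reaches the origin directly cannot choose its own rate-limit key by sending a CF-Connecting-IP header.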

4. One domain, multiple apps, zero complexity

The entire system runs under one domain. Frontend on port 3001, backend on port 5001, both invisible to the internet, both served through the same NGINX vhost.

server {
    server_name yourdomain.com;

    location /api/ {
        proxy_pass http://127.0.0.1:5001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $real_ip;
    }

    location / {
        proxy_pass http://127.0.0.1:3001;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $real_ip;
    }
}

No separate subdomains. No DNS juggling. One cert. One vhost.
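
If the apps behind the proxy need the original scheme or the full proxy chain (for redirects, secure cookies, or logging), the usual forwarding headers can sit next to X-Real-IP. A sketch for the /api/ location; the extra headers are a common convention, and whether Express or Next.js reads them depends on app settings not shown here:

```nginx
# Inside the server {} block.
location /api/ {
    proxy_pass http://127.0.0.1:5001;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $real_ip;

    # Appends the connecting address to any X-Forwarded-For already present.
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    # "https" when the request reached NGINX over TLS, otherwise "http".
    proxy_set_header X-Forwarded-Proto $scheme;
}
```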

5. Static file serving and compression with no extra layer

Gzip compression built in. Static assets served directly without hitting the app.

gzip on;
gzip_types text/plain text/css application/json
           application/javascript text/xml application/xml
           image/svg+xml;
gzip_min_length 1000;

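location /_next/static/ {
    proxy_pass http://127.0.0.1:3001;
    # proxy_cache_valid only takes effect where proxy_cache is enabled,
    # and the setting in "location /" is not inherited here.
    # Entries idle for longer than the zone's inactive=1m are still evicted.
    proxy_cache pages;
    proxy_cache_valid 200 1y;
    add_header Cache-Control "public, max-age=31536000, immutable";
}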

Next.js static assets are immutable by design (content-hashed filenames). Tell the browser and Cloudflare to cache them for a year. Zero repeat fetches on return visits.

Why not Caddy or Traefik

Caddy's killer feature is automatic HTTPS with zero config. That's genuinely useful if cert management is your problem. But the same server was already running WordPress, MariaDB, and a PM2 process with existing certs. Caddy would mean either migrating all of that or running two proxies. NGINX was already there.

Traefik is the right choice if your services come and go dynamically, containers spinning up and down, labels driving routing automatically. When you own a fixed VPS with a known set of services, that auto-discovery is solving a problem you don't have. You end up with more moving parts for no benefit.

NGINX wins on control. When something goes wrong at 2 AM, I want to read a config file, not debug a label hierarchy.

What happens when none of this is in place

Failure under load is not isolated. It cascades.

The primary endpoint slows down. Users retry. The request queue builds. Database connections spike. Other parts of the app start timing out. Users see errors on pages that have nothing to do with the original problem. One failure triggers another, then another.

Every one of those config blocks above cuts a link in that chain. Rate limiting stops the retry spiral before it reaches the app. Cache lock stops the thundering herd before it reaches the container. Real-IP restoration stops rate limiting from blocking the wrong people entirely. These are not optimizations. They are the difference between a system that survives a spike and one that takes everything else down with it.

The actual numbers

2,500 requests per second, or 150,000 per minute. That is 150 times the required 1,000 per minute. p99 on the primary endpoint at 308ms. Zero errors in 30 seconds of sustained load.

One VPS. One NGINX config. No managed services.

A VPS is not a limitation. It is an exam. Pass it and you understand distributed systems better than most people who have only ever scaled horizontally.

Have you hit a case where NGINX was not enough on a single server? Curious where the ceiling actually is before you have to scale out.
