My server went down three times in one day. The first time I didn't even notice.
That last part is the real problem. No alerts, no health checks, no monitoring. I only found out because I happened to check the dashboard. By then it had already happened twice more.
First lesson: build a health check that lives somewhere else
If your health check runs on the same server that crashes, it crashes with it. Obvious in hindsight. I set up an external uptime monitor that pings a /health endpoint every minute and sends a notification when it does not respond. The free tier of most uptime tools covers this. Do it before anything else.
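The endpoint itself can stay tiny. A minimal sketch, assuming an Express app (the framework, route name, and port are my placeholders, not prescriptions):

import express from "express";

const app = express();

// Liveness endpoint for the external uptime monitor.
// Keep it cheap: no database calls, no auth, just proof the process is up.
app.get("/health", (_req, res) => {
  res.status(200).json({ status: "ok", uptime: process.uptime() });
});

app.listen(3000);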
What was actually causing the crashes
Once I looked at the logs, the cause was simple: a single customer was sending 3,000 requests per second. No malicious intent; they were just running a batch job and did not know there were limits. The API docs said "unlimited" and they took that literally.
I had tried to add rate limiting earlier using the built-in API key management from my auth library. It never worked reliably. The bigger issue was that subscription state got cached, so when a user upgraded their plan the old limits stayed active for a while. It created subtle bugs that were hard to track down, so I eventually ripped it out.
This time I wanted to build something I actually understood.
The solution: two-layer rate limiting with Redis
After some research I landed on a two-layer approach: a daily quota and a per-minute burst limit, both backed by Redis atomic counters.
The logic runs like this. Every request first checks the daily quota. If that passes, it checks the burst limit. If either fails, the request is blocked and returns a 429.
Tier definitions live in one file:
export const RATE_LIMITS = {
  anonymous: { limit: 30, windowSeconds: 86400 },
  free: { limit: 100, windowSeconds: 86400 },
  trial: { limit: 500, windowSeconds: 86400 },
  pro: { limit: 5000, windowSeconds: 86400 },
};

export const BURST_LIMITS = {
  anonymous: { limit: 5, windowSeconds: 60 },
  free: { limit: 10, windowSeconds: 60 },
  trial: { limit: 30, windowSeconds: 60 },
  pro: { limit: 120, windowSeconds: 60 },
};
The daily check uses a Redis key scoped to the current calendar date:
const dailyKey = `ratelimit:daily:${identityKey}:${dateKey}`;
const dailyCurrent = await redis.incr(dailyKey);
if (dailyCurrent === 1) {
  await redis.expire(dailyKey, 48 * 60 * 60);
}
if (dailyCurrent > limit) {
  return { allowed: false, remaining: 0 };
}
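For context, `identityKey` and `dateKey` are built from the request before the check runs. Roughly like this (the exact derivation here is mine, not the project's code):

// Identity: the API key if one was sent, otherwise the client IP.
const identityKey = apiKey ? `key:${apiKey}` : `ip:${clientIp}`;

// Calendar date in UTC, e.g. "2025-01-31", so the daily counter
// rolls over naturally at midnight.
const dateKey = new Date().toISOString().slice(0, 10);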
Only requests that pass the daily check hit the burst counter. The burst key is scoped to a fixed 60-second window:
const burstKey = `ratelimit:burst:${identityKey}:${windowStart}`;
const current = await redis.incr(burstKey);
if (current === 1) {
  await redis.expire(burstKey, windowSeconds + 10);
}
One detail worth keeping in mind: if the burst check fails, the daily counter gets decremented back. A blocked request should not eat into the user's daily budget.
if (!burst.allowed) {
  await redis.decr(dailyKey);
  return { allowed: false, burstLimited: true };
}
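Put together, the whole check fits in one function. This is a sketch in my own naming, not the project's exact code; it assumes an ioredis client and the tier config from above:

import Redis from "ioredis";
import { RATE_LIMITS, BURST_LIMITS } from "./limits"; // the config file shown above

const redis = new Redis();

type Tier = keyof typeof RATE_LIMITS;

async function checkRateLimit(identityKey: string, tier: Tier) {
  const daily = RATE_LIMITS[tier];
  const burst = BURST_LIMITS[tier];

  // Layer 1: daily quota, keyed by UTC calendar date.
  const dateKey = new Date().toISOString().slice(0, 10);
  const resetAt = Math.floor(new Date(dateKey).getTime() / 1000) + 86400;
  const dailyKey = `ratelimit:daily:${identityKey}:${dateKey}`;
  const dailyCount = await redis.incr(dailyKey);
  if (dailyCount === 1) {
    await redis.expire(dailyKey, 48 * 60 * 60);
  }
  if (dailyCount > daily.limit) {
    return { allowed: false, limit: daily.limit, remaining: 0, resetAt, burstLimited: false };
  }

  // Layer 2: per-minute burst, keyed by the start of the fixed window.
  const windowStart = Math.floor(Date.now() / 1000 / burst.windowSeconds) * burst.windowSeconds;
  const burstKey = `ratelimit:burst:${identityKey}:${windowStart}`;
  const burstCount = await redis.incr(burstKey);
  if (burstCount === 1) {
    await redis.expire(burstKey, burst.windowSeconds + 10);
  }
  if (burstCount > burst.limit) {
    // Refund the daily slot: a burst-blocked request should not
    // count against the user's daily budget.
    await redis.decr(dailyKey);
    return { allowed: false, limit: daily.limit, remaining: daily.limit - dailyCount + 1, resetAt, burstLimited: true };
  }

  return { allowed: true, limit: daily.limit, remaining: daily.limit - dailyCount, resetAt, burstLimited: false };
}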
Every response also sends back standard rate limit headers so clients can self-throttle instead of just hitting a 429 and retrying blindly:
"X-RateLimit-Limit": String(result.limit),
"X-RateLimit-Remaining": String(result.remaining),
"X-RateLimit-Reset": String(result.resetAt),
Why this design
Fixed windows are simpler than sliding windows and good enough for this use case. Atomic INCR means no race conditions and no Lua scripts. Daily quota runs before burst so a request that is already over the daily limit does not waste a burst slot.
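The race that atomic INCR avoids is the classic read-modify-write gap. Purely for illustration (not code from the project):

// Racy: two concurrent requests can both read 99, both decide they are
// under a limit of 100, and both write 100 back — the limit is overshot.
const count = Number(await redis.get(key)) || 0;
if (count < limit) {
  await redis.set(key, count + 1);
}

// Atomic: INCR reads and writes in one Redis operation, so every
// concurrent request gets back a distinct, strictly increasing count.
const current = await redis.incr(key);
if (current > limit) {
  // over the limit, reject
}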
The tier is resolved from the user's subscription status, which is cached in Redis with a short TTL. This avoids a database call on every request and also fixes the plan-upgrade bug I had before, because the cache expires fast enough to reflect changes quickly.
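The lookup is a small cache-aside helper. A sketch, assuming the same Redis client; `loadTierFromDatabase` and the 60-second TTL are placeholders:

async function resolveTier(userId: string): Promise<Tier> {
  const cacheKey = `tier:${userId}`;

  // Cache-aside: Redis first, database only on a miss.
  const cached = await redis.get(cacheKey);
  if (cached) return cached as Tier;

  const tier = await loadTierFromDatabase(userId); // hypothetical DB call
  // A short TTL bounds how long a stale tier survives after a plan change.
  await redis.set(cacheKey, tier, "EX", 60);
  return tier;
}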
Testing was harder than building
Getting the implementation right took one failed attempt: the first version had an off-by-one in the window calculation that only showed up under concurrent load. I wrote a small test script that fires requests in parallel across different tiers and checks that the right ones get blocked. Only after that ran cleanly did I trust the implementation in production.
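The script does not need a framework. Something along these lines is enough (the endpoint URL, key, and expected counts are placeholders):

// Fire n parallel requests with one API key and count how many get a 429.
async function hammer(apiKey: string, n: number) {
  const responses = await Promise.all(
    Array.from({ length: n }, () =>
      fetch("https://api.example.com/v1/ping", {
        headers: { Authorization: `Bearer ${apiKey}` },
      })
    )
  );
  const blocked = responses.filter((r) => r.status === 429).length;
  return { sent: n, blocked };
}

// A free-tier key with a burst limit of 10/minute should see
// roughly 40 of 50 parallel requests blocked.
console.log(await hammer(process.env.FREE_TIER_KEY!, 50));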
If you are building something similar: write the test script before you deploy, not after.
Adding Cloudflare in front of everything
Rate limiting at the application layer is good but it still means traffic hits your server first. On a €5 VPS with limited bandwidth, even blocked requests have a cost if they arrive at thousands per second.
I added Cloudflare as a proxy in front of the server and set a request rate rule that drops anything above 50 requests per second per IP before it reaches the VPS. Cloudflare absorbs the traffic spike. The application rate limiting handles the finer-grained per-tier logic.
Since then: zero downtime.
What I would do differently from day one
- External health check with notifications before anything else
- Redis on the same VPS, rate limiting from the start
- Cloudflare proxy from the first public launch
- Write test scripts for the rate limiter, not just unit tests
The crashes were not a scaling problem. The VPS can handle the load fine with proper caching and controlled concurrency. The problem was that nothing was protecting it.