Rate Limiting — What Happens When Too Many Requests Hit Your Server

#backend #performance #security #systemdesign

What if the number of requests coming to your server is more than what your server can actually handle?

Think about this for a second. Your server has a limit — how many requests it can process at a time. If that limit gets crossed, what happens? Your server starts struggling. Response times go up. And in the worst case, your server goes down. Crashes completely.

I started thinking about this — how do production systems actually prevent this from happening? Because this is not a rare situation. One bad actor, or even a sudden spike in traffic, and your server is in trouble.

That's where rate limiting comes in.

What Rate Limiting Actually Does

The idea is simple. You set a limit: how many requests a client is allowed to make in a certain time window. If the number of requests goes beyond that limit, the rate limiter steps in and rejects the extra requests, usually with an HTTP 429 (Too Many Requests) response. It protects your server before things get out of control.
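To make the "limit per time window" idea concrete, here is a minimal sketch of the simplest approach, a fixed-window counter, in plain JavaScript. The names createFixedWindowLimiter and isAllowed are made up for illustration, and the in-memory Map is an assumption that only works for a single process.

```javascript
// Minimal fixed-window counter: at most `limit` requests per `windowMs` per key.
// `createFixedWindowLimiter` and `isAllowed` are hypothetical names for illustration.
function createFixedWindowLimiter({ limit, windowMs }) {
  const counters = new Map(); // key -> { windowStart, count }

  return function isAllowed(key, now = Date.now()) {
    // Align windows to multiples of windowMs so all keys share the same boundaries.
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = counters.get(key);

    if (!entry || entry.windowStart !== windowStart) {
      // First request in a new window: start counting from 1.
      counters.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count < limit) {
      entry.count += 1;
      return true;
    }
    return false; // over the limit for this window
  };
}

// 3 requests per minute; timestamps are passed in so the example is deterministic.
const isAllowed = createFixedWindowLimiter({ limit: 3, windowMs: 60_000 });
const results = [0, 1_000, 2_000, 3_000, 61_000].map((t) => isAllowed("203.0.113.7", t));
console.log(results); // [ true, true, true, false, true ]
```

The fourth request is blocked because it falls in the same one-minute window as the first three. The fifth lands in the next window, so the count starts over.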

While reading about this, I found there isn't just one way to do it. There are multiple algorithms: fixed window counter, leaky bucket, token bucket, sliding window log, and sliding window counter. Each one handles the counting and limiting logic a bit differently. Some are simple but have edge cases at window boundaries: with a fixed window, a client can send a full quota at the end of one window and another full quota at the start of the next, so up to twice the limit gets through in a short span. Some are smoother but a bit more complex to implement. The core idea behind all of them is the same: track how many requests a client has made, compare with the limit, and decide whether to allow or block.
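For contrast with window-based counting, here is a rough sketch of a token bucket, one of the smoother algorithms from that list. The class and method names are hypothetical, and this is a simplified single-process version, not how any particular library implements it.

```javascript
// Token bucket sketch: the bucket holds up to `capacity` tokens and refills at
// `refillPerSecond`. Each request spends one token; an empty bucket means "block".
// Class and method names are hypothetical, chosen for illustration.
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = Date.now() }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefill = now;
  }

  tryRemoveToken(now = Date.now()) {
    // Add the tokens earned since the last call, capped at capacity.
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = now;

    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucket({ capacity: 2, refillPerSecond: 1, now: 0 });
const burst = [0, 0, 0].map((t) => bucket.tryRemoveToken(t)); // two allowed, third blocked
const afterWait = bucket.tryRemoveToken(1_000); // one token has refilled after 1 second
console.log(burst, afterWait); // [ true, true, false ] true
```

The design difference is that there is no window boundary to exploit: a client can burst up to the bucket capacity, and after that it is held to the refill rate.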

The Good Part: You Don't Have to Build This From Scratch

Here's something that made this easier than I expected. The Node.js ecosystem already has a ready-made solution for this: express-rate-limit, a middleware package for Express. You don't need to implement any of these algorithms yourself. Just install the library and configure it.

This is the kind of protection you add specifically for public APIs, authentication endpoints, and expensive endpoints: the ones that are most likely to get hit hard, either by real traffic spikes or by someone trying to abuse your system.
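As a rough sketch of what that looks like in an Express app, the example below puts a general limiter on the API routes and a stricter one on a login route. The route paths, port, and numbers are assumptions picked for illustration, and it assumes express and express-rate-limit are installed from npm.

```javascript
// Assumes: npm install express express-rate-limit
const express = require("express");
const { rateLimit } = require("express-rate-limit");

const app = express();

// General limiter for the public API: 100 requests per 15 minutes per client.
const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100,
});

// Stricter limiter for the authentication endpoint: 5 attempts per minute.
const loginLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 5,
});

// Hypothetical routes: everything under /api shares one limit,
// while /login gets its own, tighter one.
app.use("/api", apiLimiter);
app.post("/login", loginLimiter, (req, res) => {
  res.json({ ok: true }); // placeholder for real authentication logic
});

app.listen(3000);
```

Using separate limiter instances means a client that burns through the login limit is not also locked out of the rest of the API, and the other way around.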

Where Redis Comes In

Now here's a question I had: what if your application is not running on just one server? What if it's running across multiple pods, like in a Kubernetes setup?

If each pod keeps its own request counter, the limit stops meaning what you configured. A user could hit pod 1 and get counted there, then hit pod 2, where the count starts from zero again. With N pods behind a load balancer, the same user can get through up to N times the intended limit.
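A small simulation makes the problem visible. This is a toy model, not real infrastructure: plain Maps stand in for the counters, the round-robin loop stands in for a load balancer, and all the names are made up.

```javascript
// Simulates one client sending 15 requests in a single window, spread
// round-robin across 3 pods, with a configured limit of 5.
// All names here are hypothetical; a plain Map stands in for each counter store.
const LIMIT = 5;

function makeCounter() {
  const counts = new Map();
  // Returns true while the key is still within the limit.
  return (key) => {
    const next = (counts.get(key) ?? 0) + 1;
    counts.set(key, next);
    return next <= LIMIT;
  };
}

function countAllowed(pods, totalRequests) {
  let allowed = 0;
  for (let i = 0; i < totalRequests; i++) {
    const pod = pods[i % pods.length]; // the "load balancer" picks the next pod
    if (pod("client-1")) allowed += 1;
  }
  return allowed;
}

// Case 1: each pod has its own counter.
const isolatedPods = [makeCounter(), makeCounter(), makeCounter()];
// Case 2: every pod consults the same counter.
const shared = makeCounter();
const sharedPods = [shared, shared, shared];

const allowedIsolated = countAllowed(isolatedPods, 15);
const allowedShared = countAllowed(sharedPods, 15);
console.log(allowedIsolated, allowedShared); // 15 5
```

With isolated counters all 15 requests get through, three times the configured limit. With one shared counter only 5 do.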

This is where Redis comes in. Instead of each pod keeping its own count, you set up a central counter using Redis. Every pod checks and updates the same counter. That's what rate-limit-redis is for: it connects express-rate-limit to a shared Redis store so the count stays accurate no matter how many pods your app is running on.
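Here is a sketch of how that wiring might look. It assumes a recent rate-limit-redis version that exports RedisStore by name, the node-redis client (the redis package), and a REDIS_URL environment variable, which is a name I picked for this example. The createSharedLimiter helper is also hypothetical.

```javascript
// Assumes: npm install express-rate-limit rate-limit-redis redis
// and a Redis server reachable at REDIS_URL (the env var name is an assumption).
const { rateLimit } = require("express-rate-limit");
const { RedisStore } = require("rate-limit-redis");
const { createClient } = require("redis");

// Hypothetical helper: connects to Redis first, then builds the limiter.
async function createSharedLimiter() {
  const client = createClient({ url: process.env.REDIS_URL });
  await client.connect();

  return rateLimit({
    windowMs: 15 * 60 * 1000,
    max: 100,
    // Every pod reads and increments the same counters in Redis.
    store: new RedisStore({
      sendCommand: (...args) => client.sendCommand(args),
    }),
  });
}

module.exports = { createSharedLimiter };
```

Each pod would call createSharedLimiter() at startup and mount the result with app.use(), so they all count against the same keys in Redis.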

The Configuration: What Each Option Actually Does

When I looked into express-rate-limit, there were a bunch of config options. At first they looked like just settings, but each one actually controls a specific part of the behavior.

windowMs: the duration of the rate-limiting window, in milliseconds. Basically, the time frame in which requests are counted.

max: the maximum number of requests allowed per client during that window. Cross this, and the limiter kicks in. As far as I can tell, newer versions of the library name this option limit and keep max as an alias.

standardHeaders: adds the newer RateLimit-* headers to the response, so clients know their current limit status.

legacyHeaders: adds the older X-RateLimit-* headers, for backward compatibility.

store: determines where the request counter is actually stored. By default it's in memory, which means each process keeps its own counts. For distributed systems, this is where you'd plug in Redis instead.

keyGenerator: defines how a client is identified. By default it's the IP address, but it can be customized: user ID, API key, whatever makes sense for your system.

handler: a custom response for when the limit is exceeded. By default the library replies with a 429 Too Many Requests status; with a custom handler you can send exactly the response you want.
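Putting the options above together, a configuration might look roughly like this. The numbers, the x-api-key header name, and the error message are assumptions for illustration. To keep the sketch short, requests without an API key all share one "anonymous" bucket, which you probably would not want in a real system.

```javascript
// Assumes: npm install express-rate-limit (used inside an Express app)
const { rateLimit } = require("express-rate-limit");

const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // count requests over a 15-minute window
  max: 100,                 // allow up to 100 requests per key in each window
  standardHeaders: true,    // send the RateLimit-* response headers
  legacyHeaders: false,     // skip the older X-RateLimit-* headers
  // store is omitted here, so the default in-memory store is used.

  // Identify clients by API key. The "x-api-key" header name is an assumption,
  // and keyless requests all fall into a single "anonymous" bucket in this sketch.
  keyGenerator: (req) => req.get("x-api-key") ?? "anonymous",

  // Custom JSON response when the limit is exceeded.
  // options.statusCode is the configured status code (429 unless changed).
  handler: (req, res, next, options) => {
    res.status(options.statusCode).json({
      error: "Too many requests, please try again later.",
    });
  },
});

module.exports = limiter;
```

Mounting it is one line in the app, for example app.use("/api", limiter), where the /api path is again just an example.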

Why This Matters

What I realized going through this: rate limiting is not just about blocking abuse. It's about protecting your server's stability. Without it, one spike in traffic, intentional or not, can take your entire system down.

And because this comes as a configurable, ready-made solution, it's not something you need to over-engineer. You just need to understand what each piece does, and configure it according to how your system is set up.

That's the thing I keep noticing as I learn more about production systems: most of the hard problems already have solutions. The real skill is knowing the problem exists in the first place, and knowing where to look.

If I got something wrong or anything can be improved, please drop it in the comments. I'm still learning and I want to get this right.