Biased: Fixed Window rate limiting algorithm explained

#ratelimiting #webdev #security #tutorial

Fixed Window Rate Limiting Algorithm

The Fixed Window rate limiting algorithm enforces a limit on the number of events allowed within a time window. "Maximum 5 password attempts in 10 minutes" is a classic example.

A common misconception is to describe Fixed Window as "allows N requests per fixed calendar window" and then warn about the "burst at boundary" problem.

Flexible Fixed Window Algorithm

The rate-limiter-flexible Node.js package implements what I call a Flexible Fixed Window Algorithm. I call it flexible because the developer community has somehow come to portray the fixed window algorithm as anchored to calendar time, as if the window started at 12:00 PM and ended at 12:10 PM for all clients. Not necessarily. The window start time can vary for every unique client, IP address, or fingerprint. The time window begins when a client's first request arrives, and the counter expires after a specified duration, e.g. 10 minutes. The window start is always variable.
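To make the per-client window concrete, here is a minimal in-memory sketch. It is an illustration of the idea only, not the rate-limiter-flexible implementation: the class name `FlexibleFixedWindow`, its options, and the injectable `now` clock are all hypothetical, and it never cleans up expired keys.

```javascript
// Minimal in-memory sketch of a per-key ("flexible") fixed window.
// Illustrative only: NOT the rate-limiter-flexible implementation.
class FlexibleFixedWindow {
  constructor({ points, durationMs, now = () => Date.now() }) {
    this.points = points;         // max events per window
    this.durationMs = durationMs; // window length in milliseconds
    this.now = now;               // injectable clock, handy for experiments
    this.windows = new Map();     // key -> { count, expiresAt }
  }

  // Returns true if the event is allowed, false if the key is over its limit.
  consume(key) {
    const t = this.now();
    let w = this.windows.get(key);
    // No window yet, or the previous one expired: this request opens a new one.
    if (!w || t >= w.expiresAt) {
      w = { count: 0, expiresAt: t + this.durationMs };
      this.windows.set(key, w);
    }
    if (w.count >= this.points) return false;
    w.count += 1;
    return true;
  }
}

// "Maximum 5 password attempts in 10 minutes", driven by a fake clock.
let clock = 0;
const limiter = new FlexibleFixedWindow({
  points: 5,
  durationMs: 10 * 60 * 1000,
  now: () => clock,
});

clock = 0; // alice's first attempt opens her window at t = 0
const aliceResults = [];
for (let i = 0; i < 6; i++) aliceResults.push(limiter.consume('alice'));

clock = 3 * 60 * 1000; // bob's window opens 3 minutes later, independently
const bobFirst = limiter.consume('bob');

clock = 10 * 60 * 1000; // alice's window has expired, bob's has not
const aliceAfterExpiry = limiter.consume('alice');
```

In this sketch alice's sixth attempt is rejected, while bob's window runs from minute 3 to minute 13 regardless of what alice does. There is no boundary shared between the two keys.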

The "Burst at Boundary" Myth

Here is a false belief that gets copied from one article to another: Fixed Window leads to bursts at the boundary of windows. Let's take a closer look.

If every unique client has its own window start time, then there is no shared boundary for traffic to pile up on. What's more, in many services you want traffic spikes on a client's own boundary to be allowed. Here's the deeper explanation:

When you use rate-limited services, like AI agents, don't you expect to be able to spend your daily token allowance at the end of the day and then keep working after midnight, consuming tokens from the next day? Of course you do. It's natural to expect that once a limiter resets, tokens become available. Nobody calls this a boundary problem. You'd be frustrated if that spike on the window boundary weren't allowed.
A boundary spike is statistically less probable than it seems. To pull it off, a client would have to try 1 password, wait 9 minutes and 59 seconds, try another 4, and then immediately try 5 more. The probability of that event is quite low.
Different clients send requests at different times. They are not in perfect sync. Even if one client manages to create a spike on a window boundary, it isn't an issue for your application: other clients follow different patterns, so overall requests scatter across time.
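The boundary scenario described above can be replayed in a few lines to see how large the spike actually gets. This is a rough sketch with a hypothetical in-memory limiter and explicit timestamps, not a measurement of any real library. With a per-client window the first attempt has to be spent opening the window, so the densest 10-minute span holds 9 accepted attempts rather than 10.

```javascript
// Sketch: replay the boundary scenario against a per-client fixed window.
const POINTS = 5;
const DURATION_MS = 10 * 60 * 1000;

// Hypothetical helper: returns a consume(key, t) function with its own state.
function createLimiter() {
  const windows = new Map(); // key -> { count, expiresAt }
  return function consume(key, t) {
    let w = windows.get(key);
    if (!w || t >= w.expiresAt) {
      w = { count: 0, expiresAt: t + DURATION_MS };
      windows.set(key, w);
    }
    if (w.count >= POINTS) return false;
    w.count += 1;
    return true;
  };
}

const consume = createLimiter();
const accepted = []; // timestamps of accepted attempts, in ascending order

function attempt(t) {
  if (consume('attacker', t)) accepted.push(t);
}

attempt(0);                                       // 1 attempt opens the window
const late = 9 * 60 * 1000 + 59 * 1000;           // 9 min 59 s later
for (let i = 0; i < 4; i++) attempt(late);        // 4 more use up the window
for (let i = 0; i < 5; i++) attempt(DURATION_MS); // window expired: 5 fresh attempts
attempt(DURATION_MS);                             // an 11th attempt is rejected

// Largest number of accepted attempts inside any span of spanMs.
function maxInAnySpan(timestamps, spanMs) {
  let best = 0;
  for (const start of timestamps) {
    const n = timestamps.filter((t) => t >= start && t < start + spanMs).length;
    best = Math.max(best, n);
  }
  return best;
}

const burst = maxInAnySpan(accepted, DURATION_MS);
console.log(accepted.length, burst); // prints "10 9"
```

Whether 9 attempts in roughly one second is acceptable depends on what the limit protects. The sketch only shows the size of the worst case for a single key.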

What About Token Bucket?

Sure, a malicious user could control 100 accounts and coordinate attacks on boundaries. But does Token Bucket protect against that? No, it doesn't. The same attacker can simply wait for the bucket to fully refill and then unleash a burst. Speaking of bursts, before Token Bucket was introduced in the 1980s, Leaky Bucket was the primary pattern for limiting traffic. Its problem was that it didn't allow traffic bursts at all. Token Bucket does. And that's never mentioned as an issue. It's a feature.
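The refill-then-burst behaviour is easy to see in a small sketch. The `TokenBucket` class below is a simplified, hypothetical implementation with an explicit clock, and the numbers (capacity 5, one token every 2 minutes) are assumptions chosen to resemble the "5 per 10 minutes" example.

```javascript
// Sketch of a token bucket with an explicit clock (times in milliseconds).
class TokenBucket {
  constructor({ capacity, refillPerSecond, startMs = 0 }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // the bucket starts full
    this.lastMs = startMs;
  }

  // Tries to take one token at time nowMs; returns true on success.
  tryRemove(nowMs) {
    // Add tokens earned since the last call, capped at capacity.
    const elapsedSec = (nowMs - this.lastMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Assumed numbers: capacity 5, one token every 2 minutes (5 per 10 minutes).
const bucket = new TokenBucket({ capacity: 5, refillPerSecond: 1 / 120 });

// Fires `attempts` requests at the same instant and counts the allowed ones.
function burstAt(nowMs, attempts) {
  let allowed = 0;
  for (let i = 0; i < attempts; i++) {
    if (bucket.tryRemove(nowMs)) allowed += 1;
  }
  return allowed;
}

const firstBurst = burstAt(0, 10);               // drains the full bucket
const tooEarly = burstAt(60 * 1000, 10);         // 1 minute later: half a token
const secondBurst = burstAt(11 * 60 * 1000, 10); // after a complete refill
```

Both bursts let 5 requests through at a single instant, and only the attempt made before the bucket refilled gets nothing. A patient attacker gets the same burst from this sketch as from a fixed window reset.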

When to Be Careful

There is one case where you should be careful with the Flexible Fixed Window algorithm. If you're limiting requests because of infrastructure constraints and traffic spikes could degrade performance, create two limiters: one per unique client and one for total traffic per second. This approach keeps your application running under pressure, with some users' experience degraded rather than everyone's. That trade-off is inevitable: you either make the user experience worse by disallowing traffic spikes entirely, or you allow them and mitigate the consequences. No rate limiting algorithm can win the fight between your infrastructure limitations (in fact, a limited budget) and an overwhelming volume of malicious requests.
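One way to combine the two limiters is sketched below. Everything here is an assumption made for illustration: the `createFixedWindow` helper, the `hasRoom`/`take` split, and the limits (5 requests per client per 10 seconds, 8 requests in total per second). The point of checking both limits before consuming is that a rejected request does not eat into either budget.

```javascript
// Sketch: a per-client limiter combined with a global per-second limiter.
function createFixedWindow(points, durationMs) {
  const windows = new Map(); // key -> { count, expiresAt }
  return {
    // Reports whether the key still has room, without consuming a point.
    hasRoom(key, t) {
      const w = windows.get(key);
      return !w || t >= w.expiresAt || w.count < points;
    },
    // Consumes one point; call only after hasRoom returned true.
    take(key, t) {
      let w = windows.get(key);
      if (!w || t >= w.expiresAt) {
        w = { count: 0, expiresAt: t + durationMs };
        windows.set(key, w);
      }
      w.count += 1;
    },
  };
}

// Assumed limits: 5 per client per 10 s, 8 in total per second.
const perClient = createFixedWindow(5, 10000);
const total = createFixedWindow(8, 1000);

function allow(clientId, t) {
  // Check both limits first so a rejected request consumes nothing.
  if (!perClient.hasRoom(clientId, t) || !total.hasRoom('all', t)) return false;
  perClient.take(clientId, t);
  total.take('all', t);
  return true;
}

// Three clients each send 5 requests in the same instant.
const allowedByClient = { a: 0, b: 0, c: 0 };
for (const id of ['a', 'b', 'c']) {
  for (let i = 0; i < 5; i++) {
    if (allow(id, 0)) allowedByClient[id] += 1;
  }
}
const cNextSecond = allow('c', 1000); // the global window has reset
```

In this run client `a` gets all 5 requests through, `b` gets 3, and `c` gets none until the next second. Some users are degraded while the total stays within what the infrastructure is assumed to handle.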

To manage spikes even more effectively, take a look at BurstyRateLimiter. It allows spontaneous traffic bursts with finer control.

Conclusion

After years of applying different rate limiting algorithms, I've found that any algorithm can be adapted for specific needs. What I value about the Flexible Fixed Window Algorithm is that it provides clear control over application behavior and the ability to build custom solutions across multiple dimensions of traffic using two or more combined limiters. And it is always predictable in terms of performance.

Never forget to question the basics. Take control over the information you consume.

Happy coding!