Whenever we visit a website or call an API too many times in a short span, we might get blocked, or our requests get rejected. That's a rate limiter at work.
But why is it important?
Let's understand with a real-life scenario.
Think of a coffee shop.
Only 5 customers can be served every minute, because that's how fast the coffee maker can brew.
Now, if 10 customers arrive at once, chaos follows. To deal with that, the shop introduces a token system: only 5 customers are allowed in each minute. The rest wait.
That's a rate limiter - in real life.
What is a Rate Limiter in System Design?
Based on the example above, you might have guessed it: rate limiting in system design is a technique used to control the number of requests a user or client can make within a given time period.
Let's look at the benefits of using an API rate limiter:
1. Prevent resource starvation caused by Denial of Service (DoS) attacks.
Let's understand with a scenario: imagine a login API, POST /login.
Now an attacker writes a script that sends 10,000 login requests per second to your API.
What happens without a rate limiter?
- Server CPU spikes.
- Database connections max out (because each request hits the user table).
- Genuine users can't log in.
- The login service crashes due to resource starvation (too many open threads, high memory usage, DB locks).
So a rate limiter prevents DoS attacks, whether intentional or unintentional, by blocking the excess calls.
2. Reduce cost
Limiting excess requests means fewer servers and more resources available for high-priority APIs. Rate limiting is especially important for companies that use paid third-party APIs.
For example, if a company is charged on a per-call basis for an external API, limiting the number of calls is essential to keep costs down.
3. Prevent servers from being overloaded.
To reduce server load, a rate limiter filters out the excess requests caused by bots or misbehaving users.
Algorithms/Strategies for Rate Limiting
Rate limiting can be implemented with various algorithms/strategies, each with its own pros and cons. We will overview the most common ones.
- Token Bucket: This strategy is widely used for rate limiting. The concept is simple: imagine a bucket that gets filled with tokens at a fixed rate over time - let's say, one token per second.
Each token represents permission to make a request. So, whenever a client makes a request, the system first checks if a token is available in the bucket.
- If a token is available, it is consumed and the request is allowed.
- If the bucket is empty, meaning all tokens have already been used, the request is denied or delayed.
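The token bucket above can be sketched in a few lines of Python. This is a minimal single-process illustration, not a production implementation; the class and parameter names (`TokenBucket`, `capacity`, `refill_rate`) are my own for this example.

```python
import time

class TokenBucket:
    """Sketch of a token bucket: tokens refill at `refill_rate` per
    second up to `capacity`; each allowed request consumes one token."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity          # max tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity            # start with a full bucket
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket's capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # consume a token: request allowed
            return True
        return False                      # bucket empty: request denied


bucket = TokenBucket(capacity=5, refill_rate=1.0)
# The first 5 rapid requests pass; the 6th finds the bucket empty.
print([bucket.allow() for _ in range(6)])
```

A nice property of this approach is that it tolerates short bursts (up to `capacity` requests at once) while still enforcing the average rate over time.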
- Leaking Bucket: The Leaking Bucket is another popular strategy for rate limiting. Just like a real bucket with a small hole at the bottom, it processes requests by "leaking" them at a steady, fixed rate, regardless of how many requests come in at once. Incoming requests are added to the bucket (a queue), and if the bucket is full, excess requests are either dropped or delayed.
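A simplified leaking bucket can be modeled by tracking the current "water level" (queued requests) and draining it at a fixed rate. Again a hedged sketch with names of my choosing (`LeakyBucket`, `leak_rate`); a real implementation would typically process the queued requests asynchronously rather than just counting them.

```python
import time

class LeakyBucket:
    """Sketch of a leaking bucket: requests fill the bucket; it drains
    ("leaks") at a fixed rate. A full bucket rejects new requests."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity      # max requests the queue can hold
        self.leak_rate = leak_rate    # requests drained per second
        self.water = 0.0              # current queue depth
        self.last_check = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Drain whatever leaked out since the last check.
        elapsed = now - self.last_check
        self.water = max(0.0, self.water - elapsed * self.leak_rate)
        self.last_check = now
        if self.water + 1 <= self.capacity:
            self.water += 1           # enqueue the request
            return True
        return False                  # bucket full: drop the request


lb = LeakyBucket(capacity=3, leak_rate=1.0)
# Three rapid requests fit in the bucket; the fourth overflows.
print([lb.allow() for _ in range(4)])
```

The key difference from the token bucket: the leaking bucket smooths the *outflow* to a constant rate, so downstream services never see a burst.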
- Fixed Window Counter: The Fixed Window Counter strategy is a basic rate limiting method where we define a time window (say, 1 minute) and a maximum number of requests (let's say 100). It keeps a counter for each time window, and every request increases the count. If the number of requests exceeds the limit within that window, further requests are blocked until the window resets. Once the minute is over, the counter resets back to zero and a new time window begins.
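The fixed window counter is the simplest of the three to sketch. This illustrative version (class and parameter names are mine) keeps one counter and resets it when a new window starts:

```python
import time

class FixedWindowCounter:
    """Sketch of a fixed window counter: at most `limit` requests per
    `window_seconds`; the counter resets when a new window begins."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        # If the current window has expired, start a fresh one.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1           # request fits in this window
            return True
        return False                  # limit reached: block until reset


fw = FixedWindowCounter(limit=100, window_seconds=60)
print(all(fw.allow() for _ in range(100)))  # first 100 requests pass
print(fw.allow())                           # the 101st is blocked
```

One known weakness of this strategy: a burst straddling a window boundary (e.g. 100 requests at the end of one minute and 100 at the start of the next) can let through up to twice the limit in a short span, which is why sliding-window variants exist.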
Rate limiting is a small topic but big in impact. It helps your system breathe during high traffic, stops abuse, and saves money on cloud bills.