Rate limiting is a mechanism that many developers may have to deal with at some point in their life. It’s useful for a variety of purposes like sharing access to limited resources or limit the number of requests made to an API endpoint and respond with a 429 status code.
Here we’ll explore some rate limiting algorithms using Python and Redis, starting from a naive approach and culminate at an advanced one called Generic Cell Rate Algorithm (GCRA).
A first approach for rate limiting a number of requests within a
period is to use a time-bucketed algorithm, where a counter (initially set to
limit) and expiry time (
period) are stored for each rate-key (something unique like username or IP address) and decremented on subsequent requests.
Using Python and Redis we can implement time-bucket logic in this way:
- check if the rate-key
keydoesn't exist, initialize it to the
limitvalue (Redis SETNX) and an expiration time
- decrement this value on each subsequent requests (Redis DECRBY)
- the request is limited only when
keyvalue drops below zero
- after the given
periodthe key will automatically be deleted
from datetime import timedelta from redis import Redis def request_is_limited(r: Redis, key: str, limit: int, period: timedelta): if r.setnx(key, limit): r.expire(key, int(period.total_seconds())) bucket_val = r.get(key) if bucket_val and int(bucket_val) > 0: r.decrby(key, 1) return False return True
You can see this code in action simulating requests with a limit of
20 requests / 30 seconds (to keep things clear I wrapped the function in a module as you can see in my repository).
import redis from datetime import timedelta from ratelimit import time_bucketed r = redis.Redis(host='localhost', port=6379, db=0) requests = 25 for i in range(requests): if time_bucketed.request_is_limited(r, 'admin', 20, timedelta(seconds=30)): print ('🛑 Request is limited') else: print ('✅ Request is allowed')
Only the first 20 requests won't be limited, you have to wait 30 seconds before being allowed to make a new request again.
The downside to this approach is it actually enforces a limit to the number of requests a user can do within a period rather than a rate. This means that the entire quota can be exhausted immediately, resetting only after the period has expired.
There's an algorithm that can take care of rate limiting and produces a very smooth rate limiting effect: the leaky-bucket approach. The algorithm works similarly to an actual leaky bucket: you can pour water (requests) in the bucket up to a maximum capacity while some amount of water is escaping at a constant rate (usually implemented using a background process). If the amount of water entering the bucket is greater than the amount leaving through the leak, the bucket starts to fill and when the bucket is full new requests are limited.
We can skip over this for a more elegant approach that doesn't require a separate process to simulate a leak: the
Generic Cell Rate Algorithm.
Generic Cell Rate Algorithm (GCRA) comes from a communications technology called
Asynchronous Transfer Mode (ATM) and was used in ATM network’s schedulers to delay or drop cells, small fixed size packets of data, that came in over their rate limit.
GCRA works by tracking remaining limit through a time called the
Theoretical Arrival Time (TAT) on each request
tat = current_time + period
and limit the next request if the arrival time is less than the stored TAT. This works fine if
rate = 1 request / period where each request is separated by
period, anyway in the real world we usually have
rate = limit / period. For example with
rate = 10 requests / 60 seconds a user would be allowed to make 10 requests in the first 6 seconds, with
rate = 1 request / 6 seconds the user would have to wait 6 seconds between requests.
To be able to bunch requests in a short period of time and support
limit requests within
limit > 1, each request should be separated by
period / limit and the next Theoretical Arrival Time (
new_tat) calculated in a different way than before. Indicating with
t the time of the request arrival
new_tat = tat + period / limitif requests bunch (
t <= tat)
new_tat = t + period / limitif requests don’t bunch (
t > tat)
new_tat = max(tat, t) + period / limit.
The request is limited if
new_tat exceeds the current time plus the period,
new_tat > t + period. With
new_tat = tat + period / limit we have
tat + period / limit > t + period
Hence, we should limit requests only when
tat — t > period — period / limit
period — period / limit <-----------------------> --|----------------------------|---> t TAT
In this case TAT should not be updated, because limited requests don’t have a theoretical arrival time.
Now it's time to unveil the final code!
from datetime import timedelta from redis import Redis def request_is_limited(r: Redis, key: str, limit: int, period: timedelta): period_in_seconds = int(period.total_seconds()) t = r.time() separation = round(period_in_seconds / limit) r.setnx(key, 0) tat = max(int(r.get(key)), t) if tat - t <= period_in_seconds - separation: new_tat = max(tat, t) + separation r.set(key, new_tat) return False return True
In this implementation we use Redis TIME because GCRA is dependent on time and it’s crucial to make sure that the current time is consistent between multiple deployments (clock drift between machines could lead to false positives).
In the following script you can see GCRA in action with
rate = 10 requests / 60 seconds
import redis from datetime import timedelta from ratelimit import gcra r = redis.Redis(host='localhost', port=6379, db=0) requests = 10 for i in range(requests): if gcra.request_is_limited(r, 'admin', 10, timedelta(minutes=1)): print ('🛑 Request is limited') else: print ('✅ Request is allowed')
The first 10 requests are allowed by GCRA, but you need to wait at least 6 seconds to make another request. Try to run the script after some time to see how it works and try to change
To keep GCRA implementation clear, I avoided to add a lock between
set calls, but it's needed in case of multiple requests with the same key at the same time. Using
Lua-based lock we can simply add a lock as a context manager.
def request_is_limited(r: Redis, key: str, limit: int, period: timedelta): period_in_seconds = int(period.total_seconds()) t = r.time() separation = round(period_in_seconds / limit) r.setnx(key, 0) try: with r.lock('lock:' + key, blocking_timeout=5) as lock: tat = max(int(r.get(key)), t) if tat - t <= period_in_seconds - separation: new_tat = max(tat, t) + separation r.set(key, new_tat) return False return True except LockError: return True
Check the complete code on GitHub.