Rate Limiting
The term "rate-limiting" is known to almost everyone who has worked with web servers. I think It's one of those subjects that we often just use and don't think about it. But I think knowing how rate limiting works is useful If you have ever used it.
As you may know, "rate-limiting" is basically monitoring the incoming requests and making sure they don't violate a threshold. Basically, we define a rate of requests per time-unit and discard packets of a stream that is sending more packets than the one defined in our rate.
You probably noticed that I used the word "discard" when our threshold is reached. That's not always the case. In a lot of cases, we put the packets in a queue to keep the packets at a constant output rate. (And packets won't be discarded.)
What we'll be discussing doesn't require a queue and I dare say It's probably the simplest way to implement rate-limiting.
The most famous ways of implementing rate-limiting (Traffic Shaping) are:
- Token Bucket
- Leaky Bucket
- (r, t) Traffic Shaping
The Leaky Bucket is somewhat similar to the Token Bucket but right now we don't care about the constant output rate and Token Bucket is the only algorithm we will talk about.
Token Bucket
The Token Bucket analogy is very simple. It's all about a bucket and tokens in it. Let's discuss it step by step.
- Picture a bucket in your mind.
- Fill the buckets with tokens at a constant rate.
- When a packet arrives, check if there is any token in the bucket.
- If there was any token left, remove one from the bucket and forward the packet. If the bucket was empty, simply drop the packet.
That's it. Wasn't it simple?
Implementation
Enough talking. Let's write a simple Token Bucket throttler in Python.
We start by defining a class with 4 arguments when It's being instantiated.
- tokens: number of tokens added to the bucket in each time unit.
- time_unit: the tokens are added in this time frame.
- forward_callback: this function is called when the packet is being forwarded.
- drop_callback: this function is called when the packet should be dropped.
We also prefill the bucket with the given tokens. This allows for a burst of packets when the throttler first starts working.
last_check
is the timestamp that we previously handled a packet. For initialization, we set the time when we created the bucket.
import time
class TokenBucket:
def __init__(self, tokens, time_unit, forward_callback, drop_callback):
self.tokens = tokens
self.time_unit = time_unit
self.forward_callback = forward_callback
self.drop_callback = drop_callback
self.bucket = tokens
self.last_check = time.time()
Then, we define handle()
where the magic happens.
def handle(self, packet): #1
current = time.time() #2
time_passed = current - self.last_check #3
self.last_check = current #4
self.bucket = self.bucket + \
time_passed * (self.tokens / self.time_unit) #5
if (self.bucket > self.tokens): #6
self.bucket = self.tokens
if (self.bucket < 1): #7
self.drop_callback(packet)
else:
self.bucket = self.bucket - 1 #8
self.forward_callback(packet)
-
handle
accepts only 1 parameter: the packet. - The current timestamp is put into
current
. - The time passed between now and the previous
handle
call is put intotime_passed
. - We update the
last_check
to the current timestamp. - This is the most important part. By multiplying the
time_passed
and our rate (which istokens
/time_unit
) we find out how many token needs to be added to the bucket. - Reset the bucket if it has more tokens than the default value.
- If the bucket doesn't have enough token, drop the packet.
- Otherwise, remove one token and forward the packet.
That's all. This is the final working version:
import time
class TokenBucket:
def __init__(self, tokens, time_unit, forward_callback, drop_callback):
self.tokens = tokens
self.time_unit = time_unit
self.forward_callback = forward_callback
self.drop_callback = drop_callback
self.bucket = tokens
self.last_check = time.time()
def handle(self, packet):
current = time.time()
time_passed = current - self.last_check
self.last_check = current
self.bucket = self.bucket + \
time_passed * (self.tokens / self.time_unit)
if (self.bucket > self.tokens):
self.bucket = self.tokens
if (self.bucket < 1):
self.drop_callback(packet)
else:
self.bucket = self.bucket - 1
self.forward_callback(packet)
def forward(packet):
print("Packet Forwarded: " + str(packet))
def drop(packet):
print("Packet Dropped: " + str(packet))
throttle = TokenBucket(1, 1, forward, drop)
packet = 0
while True:
time.sleep(0.2)
throttle.handle(packet)
packet += 1
You should be getting something like this:
Packet Forwarded: 0
Packet Dropped: 1
Packet Dropped: 2
Packet Dropped: 3
Packet Dropped: 4
Packet Forwarded: 5
Packet Dropped: 6
Packet Dropped: 7
Packet Dropped: 8
Packet Dropped: 9
Packet Forwarded: 10
Packet Dropped: 11
Packet Dropped: 12
Packet Dropped: 13
Packet Dropped: 14
Packet Forwarded: 15
As you can see in the code, our rate is 1 packet per second. We also send a packet every 0.2 seconds. When we first send a packet it quickly empties the bucket and it takes some time for the bucket to be filled again and that's why 4 packets are dropped between each successful forward.
Rate limiting and traffic policing, in general, is a very vast subject so If you liked what you just read, there are many materials available online about this subject that you can use to have a much deeper understanding of traffic policing.
Thank you for your time.
Top comments (6)
Hi, at step 7, instead of
if (self.bucket < 1):
it must be:if (self.bucket < self.tokens):
right?if (self.bucket < 1):
This is correct as it checks if the current bucket size if less than 1, i.e. if current bucket size is less than 1, then it means there are not tokens in the bucket, hence drop the packet.
I hope you understood.
I don't know why but it seems the code has slightly incorrect variable name handling. Instead of decrementing the tokens, the code it decrementing the bucket itself which would cause a lot of confusion.
Hi, i go through this [en.wikipedia.org/wiki/Token_bucket....], that every packet has a weight (bytes). When a packet with n bytes is conform (at least n token in bucket), then n tokens will be remove from bucket. But in your algorithm, you default remove only one token, but your packet is only conform when bucket is fully tokens.
Can you explain why and how this affect the output of your alg than the one in wiki?. Thank.
Hey Amir, do you have a guide on integrating this with something like a Flask back end??
you can put this ratelimiter in middleware in Flask.