Amir Keshavarz

Posted on Dec 17, 2020 • Edited on Aug 15, 2021

Rate limiting using the Token Bucket algorithm

#tutorial #computerscience #python #programming

Rate Limiting

The term "rate-limiting" is known to almost everyone who has worked with web servers. I think It's one of those subjects that we often just use and don't think about it. But I think knowing how rate limiting works is useful If you have ever used it.

As you may know, "rate-limiting" is basically monitoring the incoming requests and making sure they don't violate a threshold. Basically, we define a rate of requests per time-unit and discard packets of a stream that is sending more packets than the one defined in our rate.

You probably noticed that I used the word "discard" when our threshold is reached. That's not always the case. In a lot of cases, we put the packets in a queue to keep the packets at a constant output rate. (And packets won't be discarded.)

What we'll be discussing doesn't require a queue and I dare say It's probably the simplest way to implement rate-limiting.

The most famous ways of implementing rate-limiting (Traffic Shaping) are:

Token Bucket
Leaky Bucket
(r, t) Traffic Shaping

The Leaky Bucket is somewhat similar to the Token Bucket but right now we don't care about the constant output rate and Token Bucket is the only algorithm we will talk about.

Token Bucket

The Token Bucket analogy is very simple. It's all about a bucket and tokens in it. Let's discuss it step by step.

Picture a bucket in your mind.
Fill the buckets with tokens at a constant rate.
When a packet arrives, check if there is any token in the bucket.
If there was any token left, remove one from the bucket and forward the packet. If the bucket was empty, simply drop the packet.

That's it. Wasn't it simple?

Implementation

Enough talking. Let's write a simple Token Bucket throttler in Python.

We start by defining a class with 4 arguments when It's being instantiated.

tokens: number of tokens added to the bucket in each time unit.
time_unit: the tokens are added in this time frame.
forward_callback: this function is called when the packet is being forwarded.
drop_callback: this function is called when the packet should be dropped.

We also prefill the bucket with the given tokens. This allows for a burst of packets when the throttler first starts working.

last_check is the timestamp that we previously handled a packet. For initialization, we set the time when we created the bucket.



import time


class TokenBucket:

    def __init__(self, tokens, time_unit, forward_callback, drop_callback):
        self.tokens = tokens
        self.time_unit = time_unit
        self.forward_callback = forward_callback
        self.drop_callback = drop_callback
        self.bucket = tokens
        self.last_check = time.time()

Then, we define handle() where the magic happens.



def handle(self, packet): #1
    current = time.time() #2
    time_passed = current - self.last_check #3
    self.last_check = current #4

    self.bucket = self.bucket + \
        time_passed * (self.tokens / self.time_unit) #5

    if (self.bucket > self.tokens): #6
        self.bucket = self.tokens

    if (self.bucket < 1): #7
        self.drop_callback(packet)
    else:
        self.bucket = self.bucket - 1 #8
        self.forward_callback(packet)

handle accepts only 1 parameter: the packet.
The current timestamp is put into current.
The time passed between now and the previous handle call is put into time_passed.
We update the last_check to the current timestamp.
This is the most important part. By multiplying the time_passed and our rate (which is tokens / time_unit) we find out how many token needs to be added to the bucket.
Reset the bucket if it has more tokens than the default value.
If the bucket doesn't have enough token, drop the packet.
Otherwise, remove one token and forward the packet.

That's all. This is the final working version:



import time


class TokenBucket:

    def __init__(self, tokens, time_unit, forward_callback, drop_callback):
        self.tokens = tokens
        self.time_unit = time_unit
        self.forward_callback = forward_callback
        self.drop_callback = drop_callback
        self.bucket = tokens
        self.last_check = time.time()

    def handle(self, packet):
        current = time.time()
        time_passed = current - self.last_check
        self.last_check = current

        self.bucket = self.bucket + \
            time_passed * (self.tokens / self.time_unit)

        if (self.bucket > self.tokens):
            self.bucket = self.tokens

        if (self.bucket < 1):
            self.drop_callback(packet)
        else:
            self.bucket = self.bucket - 1
            self.forward_callback(packet)


def forward(packet):
    print("Packet Forwarded: " + str(packet))


def drop(packet):
    print("Packet Dropped: " + str(packet))


throttle = TokenBucket(1, 1, forward, drop)

packet = 0

while True:
    time.sleep(0.2)
    throttle.handle(packet)
    packet += 1

You should be getting something like this:



Packet Forwarded: 0
Packet Dropped: 1
Packet Dropped: 2
Packet Dropped: 3
Packet Dropped: 4
Packet Forwarded: 5
Packet Dropped: 6
Packet Dropped: 7
Packet Dropped: 8
Packet Dropped: 9
Packet Forwarded: 10
Packet Dropped: 11
Packet Dropped: 12
Packet Dropped: 13
Packet Dropped: 14
Packet Forwarded: 15

As you can see in the code, our rate is 1 packet per second. We also send a packet every 0.2 seconds. When we first send a packet it quickly empties the bucket and it takes some time for the bucket to be filled again and that's why 4 packets are dropped between each successful forward.

Rate limiting and traffic policing, in general, is a very vast subject so If you liked what you just read, there are many materials available online about this subject that you can use to have a much deeper understanding of traffic policing.

Thank you for your time.

Top comments (6)

AnLuubk • May 27 '23 • Edited

Hi, i go through this [en.wikipedia.org/wiki/Token_bucket....], that every packet has a weight (bytes). When a packet with n bytes is conform (at least n token in bucket), then n tokens will be remove from bucket. But in your algorithm, you default remove only one token, but your packet is only conform when bucket is fully tokens.

Can you explain why and how this affect the output of your alg than the one in wiki?. Thank.

AnLuubk • May 27 '23

Hi, at step 7, instead of if (self.bucket < 1): it must be: if (self.bucket < self.tokens): right?

Mahboob Alam • Jun 8 '24

if (self.bucket < 1):
This is correct as it checks if the current bucket size if less than 1, i.e. if current bucket size is less than 1, then it means there are not tokens in the bucket, hence drop the packet.

I hope you understood.

Mahboob Alam • Jun 8 '24

I don't know why but it seems the code has slightly incorrect variable name handling. Instead of decrementing the tokens, the code it decrementing the bucket itself which would cause a lot of confusion.

Anshuman Kumar • Feb 11 '23

Hey Amir, do you have a guide on integrating this with something like a Flask back end??

Archer Reilly • Feb 17 '23

you can put this ratelimiter in middleware in Flask.