Amir Keshavarz

Posted on Aug 12, 2021 • Edited on Aug 15, 2021

Rate limiting using the Sliding Window algorithm

#tutorial #computerscience #python #programming

In this part of the rate-limiting series, we will introduce the Sliding Window algorithm and implement it in Python.

Even though there are more rate-limiting algorithms out there, I'm going to end the series here since I think These three algorithms are a pretty good gateway to the rate-limiting techniques.

Sliding Window

Unlike the previous Fixed Window algorithm, this algorithm does not limit the requests based on a fixed time unit.

A problem with Fixed Window was that It allowed a huge burst at the edge of windows because you can combine the capacity of the current and the next window to send a burst of requests.

Sliding Window tries to fix that by taking the previous counter into account, causing the flow to be more smooth.

Let's explain it with an example. Imagine that we have a rate limiter with a time unit of 60 seconds and a capacity of 50.

A request has entered at 00:01:20. The current time unit still has capacity left but as we learned the previous counter also affects the outcome.

The previous counter only affects as much as the space it still has occupied in the window. In our example, that would be ~67%.

So with that in mind, we use the following formula to calculate an estimated count.

ec = previous counter * ((time unit - time into the current counter) / time unit) + current counter

ec = 50 * 0.67 + 20 = 53.5

Our capacity was 50 and because of that, this request is dropped since 53.5 > 50.

Now, If another request comes in at 00:01:40, the ec would be:
ec = 50 * 0.34 + 20 = 37

This request is allowed to pass since 37 < 50.

Implementation

In this very simple implementation, We will build a rate-limiter that uses Sliding Window to limit packets in 1-second time frames.

We start by defining a class with 3 arguments when It's being instantiated.

capacity: number of the allowed packets that can pass through in a second.
time_unit: length of time unit in seconds.
forward_callback: this function is called when the packet is being forwarded.
drop_callback: this function is called when the packet should be dropped.



from time import time, sleep


class SlidingWindow:

    def __init__(self, capacity, time_unit, forward_callback, drop_callback):
        self.capacity = capacity
        self.time_unit = time_unit
        self.forward_callback = forward_callback
        self.drop_callback = drop_callback

        self.cur_time = time()
        self.pre_count = capacity
        self.cur_count = 0

cur_time is the edge of our current time unit.
pre_count is the previous counter. We pre-fill this to prevent bursting at the beginning of the rate limiter.
cur_count is the current counter that will get filled as new requests come in.

Then, we define handle() where the magic happens.



def handle(self, packet): #1
    if (time() - self.cur_time) > self.time_unit: #2
        self.cur_time = time()
        self.pre_count = self.cur_count
        self.cur_count = 0

    ec = (self.pre_count * (self.time_unit - (time() - self.cur_time)) / self.time_unit) + self.cur_count #3

    if (ec > self.capacity): #4
        return self.drop_callback(packet)

    self.cur_count += 1 #5
    return self.forward_callback(packet)

handle accepts only 1 parameter: the packet.
If the current time unit is passed, switch over to a new time unit.
Calculate the estimated count based on the given formula
Drop the packet if ec is bigger than the capacity.
Otherwise, add one to the current counter and forward the packet.

Final Code



from time import time, sleep


class SlidingWindow:

    def __init__(self, capacity, time_unit, forward_callback, drop_callback):
        self.capacity = capacity
        self.time_unit = time_unit
        self.forward_callback = forward_callback
        self.drop_callback = drop_callback

        self.cur_time = time()
        self.pre_count = capacity
        self.cur_count = 0

    def handle(self, packet):

        if (time() - self.cur_time) > self.time_unit:
            self.cur_time = time()
            self.pre_count = self.cur_count
            self.cur_count = 0

        ec = (self.pre_count * (self.time_unit - (time() - self.cur_time)) / self.time_unit) + self.cur_count

        if (ec > self.capacity):
            return self.drop_callback(packet)

        self.cur_count += 1
        return self.forward_callback(packet)


def forward(packet):
    print("Packet Forwarded: " + str(packet))


def drop(packet):
    print("Packet Dropped: " + str(packet))


throttle = SlidingWindow(5, 1, forward, drop)

packet = 0

while True:
    sleep(0.1)
    throttle.handle(packet)
    packet += 1

You should be getting something like this:



Packet Forwarded: 0

Packet Forwarded: 1

Packet Dropped: 2

Packet Forwarded: 3

Packet Dropped: 4

Packet Forwarded: 5

Packet Dropped: 6

Packet Forwarded: 7

Packet Dropped: 8

Packet Forwarded: 9

Packet Dropped: 10

Packet Forwarded: 11

Packet Dropped: 12

Packet Forwarded: 13

Packet Dropped: 14

Packet Forwarded: 15

Packet Dropped: 16

Packet Forwarded: 17

Packet Dropped: 18

Packet Forwarded: 19

Packet Dropped: 20

Packet Forwarded: 21

Packet Dropped: 22

Packet Forwarded: 23

Conclusion

In this post, we learned about the Sliding Window algorithm and tried to implement it in Python.

During this series, we covered Token Bucket, Fixed Window, and Sliding Window. Even though there are famous algorithms like Leaky Bucket or Sliding Log, I decided to end this series here since I believe these three covers the basic ideas of most rate-limiting algorithms, and this series was also expected to be short and simple.

Thank you for your time and I hope to see you on my other posts :)

Top comments (4)

miquelvir • Aug 31 '22

Great post! I believe that the last implementation is not right however.

if (time() - self.cur_time) > self.time_unit:
            self.cur_time = time()
            self.pre_count = self.cur_count
            self.cur_count = 0

ec = (self.pre_count * (self.time_unit - (time() - self.cur_time)) / self.time_unit) + self.cur_count

Let time() = self.cur_time + 1.5*self.time_unit. Let self.cur_count = capacity. That is, we are in the middle of the next window. Then, it enters inside the if. According to the logic, we would expect to have an estimated rate of 1/2*previousWindowCount+1/2*currentWindowCount. Nonetheless, since self.cur_time = time(), ec = ((self.pre_count * (self.time_unit - (0)) / self.time_unit) + self.cur_count = (self.pre_count * (self.time_unit / self.time_unit) + self.cur_count = self.pre_count + self.cur_count = capacity + 0 = capacity. The estimated rate will be the capacity for a whole new window.

I believe that the right approach would be setting curTime to the "fixed window" start time, and not the time of the request. That is:

if (time() - self.cur_time) > self.time_unit:
            current_window_start = self.cur_time
            while current_window_start + self.time_unit < time():
                        current_window_start += self.time_unit

            self.cur_time = current_window_start
            self.pre_count = self.cur_count
            self.cur_count = 0

ec = (self.pre_count * (self.time_unit - (time() - self.cur_time)) / self.time_unit) + self.cur_count

What do you think? Am I missing something?

OrenRosen • May 5 '23

Thanks, I also noticed it.
Also, you could just do

current_window_start = math.floor(time() / time_unit) * time_unit

malvarez • Feb 25 '22 • Edited

Hi Amir, this is a great post. I'm networking engineer, studying network programmability and also studying for Cisco Devnet certifications. I read the theory about these rate limiting techniques, but it wasn't until i read this post that i understood. Thank you for sharing this series.

I would like to ask you. In a REST API, what is the best approach to do this rate limiting by user? Any guidance will be appreciate.