DEV Community

Krishna Nayak


Token Bucket - Rate Limiter

The token bucket is one of the most popular rate-limiting algorithms, used to control the number of requests sent to a server. It works by maintaining a bucket that holds a fixed maximum number of tokens and is refilled at a constant rate.

Each request consumes one token in order to be processed.
If the bucket has no tokens left, the request is either delayed or dropped, depending on the implementation.


[Figure: token bucket rate limiter]

Let's start with a high-level understanding.
The bucket starts full, holding its maximum number of tokens (say the capacity is 5). Every time a request comes in, the limiter checks whether there is a token in the bucket. If there is, one token is removed and the request is allowed to proceed. If no token is available, the request is denied or delayed, depending on the implementation.

Meanwhile, the system refills the bucket with new tokens at a fixed rate (for example, 3 tokens per second). The bucket can never hold more than its capacity, so any extra tokens beyond that are discarded.

Consider a scenario in which 3 requests arrive within the first second. Before allowing them to proceed, the rate limiter checks the tokens in the bucket. Since the bucket holds 5 tokens, all three requests are allowed and three tokens are consumed. Within that same second, another 5 requests arrive, so the rate limiter checks for available tokens again. At this point only 2 tokens remain, so only the first two requests are allowed and those tokens are consumed. The remaining three requests are denied or delayed due to insufficient tokens.
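The scenario above can be replayed with a few lines of Java. This is only a minimal sketch: the class name BurstScenario and its methods are made up for illustration, and it deliberately leaves out refilling, since no refill happens inside that first second.

```java
// Hypothetical sketch of the one-second scenario above (no refill inside it).
final class BurstScenario {
    private int tokens;

    BurstScenario(int capacity) {
        this.tokens = capacity; // the bucket starts full
    }

    // Tries to admit `requests` requests one by one; returns how many were allowed.
    int admit(int requests) {
        int allowed = 0;
        for (int i = 0; i < requests; i++) {
            if (tokens > 0) {
                tokens--;   // each allowed request consumes one token
                allowed++;
            }
        }
        return allowed;
    }

    int remaining() {
        return tokens;
    }

    public static void main(String[] args) {
        BurstScenario bucket = new BurstScenario(5);
        int firstBatch = bucket.admit(3);   // 3 requests arrive
        int secondBatch = bucket.admit(5);  // 5 more arrive in the same second
        System.out.println("first batch allowed:  " + firstBatch);         // 3
        System.out.println("second batch allowed: " + secondBatch);        // 2
        System.out.println("second batch denied:  " + (5 - secondBatch));  // 3
        System.out.println("tokens left:          " + bucket.remaining()); // 0
    }
}
```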

[Figure: token bucket mechanism within one second]

Now, no additional requests will be permitted until the bucket is refilled, which happens once every second, based on the defined refill rate.

[Figure: token refill mechanism]

At 1.00 second, the refill mechanism kicks in and adds 3 new tokens to the empty bucket. Immediately after that, 1 new request arrives and is allowed, consuming 1 of the newly added tokens and leaving 2. At 2.00 seconds, no new requests arrive, so the refill adds 3 more tokens and none are consumed. The bucket now reaches its maximum capacity of 5 tokens. At 3.00 seconds, the system attempts another refill, but since the bucket is already full, the new tokens are discarded. No requests arrive at this point either, so no tokens are consumed and no requests are denied. This idle period shows how the token bucket accumulates tokens during inactivity, up to its capacity, so that the system can absorb the next burst of traffic.
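The same timeline can be traced in code. The sketch below is illustrative only: RefillTimeline and its refill helper are hypothetical names, and it assumes one refill tick per second with the bucket starting empty, as in the walkthrough.

```java
// Hypothetical trace of the refill timeline: capacity 5, 3 tokens per second.
final class RefillTimeline {
    // One refill tick: add refillRate tokens, discarding anything above capacity.
    static int refill(int tokens, int capacity, int refillRate) {
        return Math.min(capacity, tokens + refillRate);
    }

    public static void main(String[] args) {
        int capacity = 5;
        int refillRate = 3;
        int tokens = 0;                      // bucket was drained during the first second
        int[] requestsAtSecond = {1, 0, 0};  // requests arriving right after each refill
        for (int s = 0; s < requestsAtSecond.length; s++) {
            tokens = refill(tokens, capacity, refillRate);
            int served = Math.min(tokens, requestsAtSecond[s]);
            tokens -= served;
            System.out.println("t=" + (s + 1) + "s served=" + served + " tokens=" + tokens);
        }
        // prints:
        // t=1s served=1 tokens=2
        // t=2s served=0 tokens=5
        // t=3s served=0 tokens=5
    }
}
```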


Now that we understand how a token bucket rate limiter behaves, it's time to shift our focus to how it is implemented in code.

Core Components of the Token Bucket Algorithm in Code

Before we jump into the Java implementation, let’s break down what needs to be built:

  • Token Bucket Capacity – This defines how many tokens the bucket can hold at any time.
  • Refill Rate – Determines how many tokens to add per second.
  • Token Consumption – Each request checks if a token is available; if so, it’s consumed.
  • Refill Logic – The bucket must be topped up based on time elapsed since the last refill.

The class starts with two key configuration parameters: capacity and refillRate. The capacity is the maximum number of tokens the bucket can hold, while refillRate defines how many tokens are added per second.

Internally, the class maintains a tokens counter to track the number of currently available tokens. This variable is only read and written from the synchronized allowRequest() method (refill() is private and is called only from there), so access to it is thread-safe. The class also keeps track of the last time the bucket was refilled in lastRefillTimestamp, which stores a timestamp in nanoseconds (via System.nanoTime()), allowing for precise elapsed-time calculations.

The refill() method runs whenever a request comes in and calculates how much time has passed since the last refill. If one or more full seconds have passed, it computes how many tokens should be added (tokensToAdd = secondsPassed * refillRate). Then it updates the token count, ensuring it doesn't exceed the bucket's maximum capacity using Math.min(capacity, currentTokens + tokensToAdd). Finally, lastRefillTimestamp is advanced by the whole seconds that were just credited, rather than being set to the current time, so any leftover fraction of a second still counts toward the next refill.
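To make the refill arithmetic concrete, here is a small worked example. It is a sketch with made-up inputs (2.7 seconds elapsed, an empty bucket), and the RefillMath class and its helper names are hypothetical; only the formulas mirror the description above.

```java
// Hypothetical walk-through of the refill arithmetic; the input values are made up.
final class RefillMath {
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    // Whole seconds contained in the elapsed interval (the fraction is truncated).
    static long wholeSeconds(long elapsedNanos) {
        return elapsedNanos / NANOS_PER_SECOND;
    }

    // Token count after a refill, capped at capacity.
    static int refilledTokens(int tokens, int capacity, int refillRate, long elapsedNanos) {
        long tokensToAdd = wholeSeconds(elapsedNanos) * refillRate;
        return (int) Math.min(capacity, tokens + tokensToAdd);
    }

    // Nanoseconds left over once the whole seconds have been credited.
    static long carriedNanos(long elapsedNanos) {
        return elapsedNanos % NANOS_PER_SECOND;
    }

    public static void main(String[] args) {
        long elapsed = 2_700_000_000L; // assumed: 2.7 s since the last refill
        System.out.println("secondsPassed = " + wholeSeconds(elapsed));            // 2
        System.out.println("tokens        = " + refilledTokens(0, 5, 3, elapsed)); // min(5, 0 + 6) = 5
        System.out.println("carried nanos = " + carriedNanos(elapsed));            // 700000000
    }
}
```

The 0.7 seconds carried over means the next whole second is reached only 0.3 seconds later, which is the effect of advancing the timestamp by whole seconds instead of resetting it to the current time.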

The allowRequest() method is synchronized to ensure thread safety. When a request arrives, it first refills the bucket. Then, if there's at least one token available, it decrements the token count and allows the request. Otherwise, the request is denied. This mechanism enforces two limits at once: a burst can never exceed capacity requests, and the sustained rate can never exceed refillRate requests per second.

// TokenBucketRateLimiter.java
public class TokenBucketRateLimiter {

    private final int capacity;
    private final int refillRate; // tokens per second

    private int tokens;
    private long lastRefillTimestamp;

    public TokenBucketRateLimiter(int capacity, int refillRate) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.tokens = capacity;
        this.lastRefillTimestamp = System.nanoTime();
    }

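    // Adds tokens for every whole second elapsed since the last refill.
    // Must only be called while holding this object's lock (see allowRequest()).
    private void refill() {
        long now = System.nanoTime();
        long elapsedTime = now - lastRefillTimestamp;

        long secondsPassed = elapsedTime / 1_000_000_000L;

        if (secondsPassed > 0) {
            // Compute in long so a long idle period cannot overflow int.
            long tokensToAdd = secondsPassed * refillRate;

            // Cap at capacity; surplus tokens are discarded.
            tokens = (int) Math.min(capacity, tokens + tokensToAdd);

            // Advance by whole seconds only, so the leftover fraction
            // of a second counts toward the next refill.
            lastRefillTimestamp += secondsPassed * 1_000_000_000L;
        }
    }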

    public synchronized boolean allowRequest() {
        refill();

        if (tokens > 0) {
            tokens--;
            return true;
        }

        return false;
    }

}

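One way to convince yourself that the synchronized check-then-decrement holds up under contention is to hit a bucket from several threads and count the allowed requests. The sketch below is an assumption-laden illustration, not part of the limiter above: ConcurrentBucketCheck is a hypothetical stripped-down bucket with no refill, so the result does not depend on timing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stripped-down bucket (no refill) used only to exercise the locking.
final class ConcurrentBucketCheck {
    private int tokens;

    ConcurrentBucketCheck(int capacity) {
        this.tokens = capacity;
    }

    // Same check-then-decrement as allowRequest(), minus the refill step.
    synchronized boolean tryConsume() {
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    // Starts `threads` workers that each attempt `attemptsPerThread` requests;
    // returns the total number of requests that were allowed.
    static int hammer(int capacity, int threads, int attemptsPerThread) {
        ConcurrentBucketCheck bucket = new ConcurrentBucketCheck(capacity);
        AtomicInteger allowed = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    start.await(); // release all workers at roughly the same moment
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < attemptsPerThread; i++) {
                    if (bucket.tryConsume()) {
                        allowed.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return allowed.get();
    }

    public static void main(String[] args) {
        // 8 threads x 500 attempts = 4000 attempts against 1000 tokens.
        System.out.println("allowed: " + hammer(1000, 8, 500)); // allowed: 1000
    }
}
```

Because every token is handed out under the lock, the allowed count should equal the capacity exactly whenever the total number of attempts exceeds it. Removing synchronized from tryConsume() would make that count unreliable.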

To see the Token Bucket rate limiter in action, we will simulate a sequence of requests using the Main class. In this simple example, we create an instance of the TokenBucketRateLimiter with a bucket capacity of 5 tokens and a refill rate of 3 tokens per second. We then simulate 10 consecutive requests in a loop, each separated by a 100-millisecond pause using Thread.sleep(100). This delay helps mimic how requests might arrive in a real-world application over a short period of time. Since the whole loop takes roughly one second, you should typically see the first five requests allowed and most or all of the rest denied; the exact outcome for the last request depends on sleep timing.

// Main.java
public class Main
{
    // Simulation of request
    public static void main(String[] args) throws InterruptedException {
        TokenBucketRateLimiter limiter = new TokenBucketRateLimiter(5, 3);
        for (int i = 0; i < 10; i++) {
            boolean allowed = limiter.allowRequest();
            System.out.println("Request " + (i + 1) + " allowed? " + allowed);
            Thread.sleep(100); // simulate requests over time
        }
    }
}

Now it’s your turn - try it out and see for yourself how the Token Bucket rate limiter behaves in different scenarios. You can tweak the capacity, change the refillRate, or even modify the request intervals in the Main class to observe how the system reacts under various traffic patterns.
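If you want repeatable results while experimenting, one option is to replace the real clock with a simulated one so that no sleeping is needed. The following is a minimal sketch under that assumption: ClockedTokenBucket, its LongSupplier clock parameter, and the simulate helper are hypothetical additions that reuse the same whole-second refill rule as the class above.

```java
import java.util.function.LongSupplier;

// Hypothetical variant of the limiter above: same refill rule, injected time source.
final class ClockedTokenBucket {
    private static final long NANOS_PER_SECOND = 1_000_000_000L;

    private final int capacity;
    private final int refillRate;      // tokens per second
    private final LongSupplier clock;  // returns the current time in nanoseconds
    private int tokens;
    private long lastRefillTimestamp;

    ClockedTokenBucket(int capacity, int refillRate, LongSupplier clock) {
        this.capacity = capacity;
        this.refillRate = refillRate;
        this.clock = clock;
        this.tokens = capacity;
        this.lastRefillTimestamp = clock.getAsLong();
    }

    synchronized boolean allowRequest() {
        long secondsPassed = (clock.getAsLong() - lastRefillTimestamp) / NANOS_PER_SECOND;
        if (secondsPassed > 0) {
            tokens = (int) Math.min(capacity, tokens + secondsPassed * refillRate);
            lastRefillTimestamp += secondsPassed * NANOS_PER_SECOND;
        }
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    // Sends `count` requests spaced `gapMillis` apart on a simulated clock
    // and returns how many were allowed.
    static int simulate(int capacity, int refillRate, int count, long gapMillis) {
        long[] now = {0L}; // simulated time in nanoseconds
        ClockedTokenBucket bucket = new ClockedTokenBucket(capacity, refillRate, () -> now[0]);
        int allowed = 0;
        for (int i = 0; i < count; i++) {
            if (bucket.allowRequest()) {
                allowed++;
            }
            now[0] += gapMillis * 1_000_000L; // advance the clock instead of sleeping
        }
        return allowed;
    }

    public static void main(String[] args) {
        System.out.println(simulate(5, 3, 10, 100)); // 5: no full second passes, only the initial tokens
        System.out.println(simulate(5, 3, 20, 100)); // 8: 5 initial tokens plus one refill of 3
        System.out.println(simulate(5, 3, 10, 500)); // 10: 2 requests per second is below the refill rate
    }
}
```

With a simulated clock, changing the request gap or the refill rate gives the same answer on every run, which makes it easier to compare traffic patterns than timing real sleeps.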

If you have any additional insights, know alternative approaches, or spot potential improvements in this implementation, feel free to share them in the comments.

Thanks for reading — and happy coding! 🚀
