Build a Token Bucket Limiter in Go in Under 100 Lines

Implementing the Token Bucket Algorithm in Go

Introduction to Rate Limiting

In the realm of computer science and network engineering, rate limiting stands as a critical mechanism for maintaining system stability, preventing abuse, and ensuring fair resource allocation. At its core, rate limiting controls the number of requests or operations that can be processed within a specific time window. This becomes indispensable in scenarios such as API gateways, distributed systems, and any application where resource consumption must be regulated.

Among the various algorithms designed to implement rate limiting, the Token Bucket Algorithm has emerged as one of the most popular and versatile choices. Its appeal lies in its ability to handle both steady-state traffic and occasional traffic bursts, making it suitable for a wide range of real-world applications. In this comprehensive guide, we will delve into the inner workings of the Token Bucket Algorithm, explore its implementation in the Go programming language, and examine its practical applications and considerations.

Understanding the Token Bucket Algorithm

Core Concepts

The Token Bucket Algorithm operates on a simple yet elegant metaphor: a bucket that holds tokens. Each token represents the right to perform an operation or process a request. The algorithm functions through the following key mechanisms:

  • Token Generation: At fixed intervals, new tokens are added to the bucket. The rate at which tokens are generated determines the long-term average request rate that the system can handle.
  • Token Consumption: To process a request, a token must be removed from the bucket. If a token is available, the request is allowed; if not, the request is either delayed or rejected.
  • Bucket Capacity: The bucket has a maximum capacity, limiting the number of tokens it can hold at any given time. This capacity determines the system's ability to handle traffic bursts.

Visual Representation

+------------------------+
|                        |
|  Token Generator       |  <- Adds 2 tokens per second
|                        |
+------------------------+
            |
            v   tokens flow in, capped at capacity
+------------------------+
|                        |
|  [Token]  [Token]      |
|  [Token]  [Token]      |  <- Bucket with capacity = 5
|  [Token]               |     Currently holding 5 tokens
|                        |
+------------------------+
            ^
            |   each request consumes one token
+------------------------+
|                        |
|  Incoming Requests     |
|                        |
+------------------------+

In this diagram, the bucket can hold a maximum of 5 tokens. The token generator adds 2 tokens to the bucket every second, but the bucket will never exceed its capacity of 5 tokens. When incoming requests arrive, each request takes one token from the bucket. If no tokens are available, the request is either queued for later processing or rejected outright.

Dynamic Behavior

The true power of the Token Bucket Algorithm lies in its dynamic behavior:

  • During periods of low request volume, tokens accumulate in the bucket, up to its maximum capacity. This builds up a "reserve" that can be used to handle sudden traffic spikes.
  • During traffic bursts, the system can process requests at a rate higher than the average token generation rate, as long as there are tokens in the bucket.
  • Once the bucket is empty, the system can only process requests at the token generation rate, effectively throttling traffic to the long-term average.

This behavior makes the Token Bucket Algorithm particularly well-suited for applications that experience variable traffic patterns, such as web servers, API endpoints, and network routers.

Implementing the Token Bucket in Go

Go's concurrency support, from goroutines to the synchronization primitives in the sync package, makes it an excellent language for implementing the Token Bucket Algorithm. Let's walk through a step-by-step implementation.

Basic Token Bucket Structure

First, let's define the structure of our token bucket:

package tokenbucket

import (
    "sync"
    "time"
)

// TokenBucket represents a token bucket rate limiter
type TokenBucket struct {
    capacity   int        // Maximum number of tokens the bucket can hold
    rate       int        // Number of tokens to add per second
    tokens     int        // Current number of tokens in the bucket
    lastRefill time.Time  // Timestamp of the last token refill
    mutex      sync.Mutex // Mutex to protect concurrent access
}

This structure contains:

  • capacity: The maximum number of tokens the bucket can hold
  • rate: The number of tokens to add per second
  • tokens: The current number of tokens in the bucket
  • lastRefill: The timestamp when tokens were last added to the bucket
  • mutex: A synchronization primitive to ensure thread safety

Creating a New Token Bucket

Next, let's implement a function to create a new token bucket:

// NewTokenBucket creates a new token bucket with the specified capacity and rate
func NewTokenBucket(capacity, rate int) *TokenBucket {
    return &TokenBucket{
        capacity:   capacity,
        rate:       rate,
        tokens:     capacity, // Start with a full bucket
        lastRefill: time.Now(),
    }
}

This function initializes a new token bucket with the specified capacity and token generation rate. By starting with a full bucket, we allow for an immediate traffic burst up to the bucket's capacity.

Refilling Tokens

The heart of the Token Bucket Algorithm is the token refilling mechanism. We need to calculate how many tokens should be added to the bucket based on the time elapsed since the last refill:

// refill adds tokens to the bucket based on the elapsed time since the last refill
func (tb *TokenBucket) refill() {
    now := time.Now()
    elapsed := now.Sub(tb.lastRefill)

    // Calculate how many tokens should be added
    tokensToAdd := int(elapsed.Seconds() * float64(tb.rate))

    if tokensToAdd > 0 {
        tb.lastRefill = now
        // Add tokens, but don't exceed the bucket's capacity
        tb.tokens = min(tb.tokens+tokensToAdd, tb.capacity)
    }
}

// min is a helper function to find the minimum of two integers
func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}
A subtlety worth noting: the int() conversion truncates the fractional part of elapsed.Seconds() * rate, and lastRefill only advances when at least one whole token is added. Each refill therefore discards less than one token's worth of accrued time, a small drift that is acceptable for most uses.

Acquiring Tokens

Now, let's implement the method that allows clients to acquire tokens from the bucket:

// Take attempts to take a specified number of tokens from the bucket
// Returns true if successful, false otherwise
func (tb *TokenBucket) Take(tokens int) bool {
    tb.mutex.Lock()
    defer tb.mutex.Unlock()

    // First, refill the bucket with tokens based on elapsed time
    tb.refill()

    // Check if we have enough tokens
    if tb.tokens >= tokens {
        tb.tokens -= tokens
        return true
    }

    // Not enough tokens available
    return false
}

The Take method follows these steps:

  1. It acquires a lock to ensure thread safety, as multiple goroutines may try to access the bucket simultaneously.
  2. It refills the bucket with any tokens that have been generated since the last operation.
  3. It checks if enough tokens are available to satisfy the request.
  4. If enough tokens are available, it removes them from the bucket and returns true; otherwise, it returns false.

Blocking Acquire

In some cases, we might want to block until tokens become available, rather than immediately returning false. Let's implement a TakeWithTimeout method that waits for a specified duration for tokens to become available:

// TakeWithTimeout attempts to take tokens from the bucket, waiting up to the specified timeout
// Returns true if successful, false if the timeout is reached
func (tb *TokenBucket) TakeWithTimeout(tokens int, timeout time.Duration) bool {
    // The deadline after which we stop waiting
    deadline := time.Now().Add(timeout)

    for {
        tb.mutex.Lock()

        // Refill the bucket
        tb.refill()

        // Check if we have enough tokens now
        if tb.tokens >= tokens {
            tb.tokens -= tokens
            tb.mutex.Unlock()
            return true
        }

        // Calculate how long we need to wait for more tokens
        tokensNeeded := tokens - tb.tokens
        timeNeeded := time.Duration(tokensNeeded) * time.Second / time.Duration(tb.rate)

        // If we can get the tokens before the deadline, wait and try again
        if time.Now().Add(timeNeeded).Before(deadline) {
            // Unlock before waiting to allow other operations
            tb.mutex.Unlock()

            // Wait for the required time, but no longer than the remaining timeout
            waitTime := minDuration(timeNeeded, time.Until(deadline))
            time.Sleep(waitTime)
        } else {
            // Not enough time to get the required tokens
            tb.mutex.Unlock()
            return false
        }
    }
}

// minDuration is a helper function to find the minimum of two durations
func minDuration(a, b time.Duration) time.Duration {
    if a < b {
        return a
    }
    return b
}

This method uses a loop to repeatedly check for available tokens until either the required tokens are obtained or the timeout is reached. It calculates the minimum time needed to generate the required tokens and waits for that period (or until the timeout, whichever comes first) before checking again.

Advanced Features and Optimizations

Burst Control

While the basic implementation handles bursts by allowing token accumulation, we might want to add more sophisticated burst control. For example, we could limit the maximum number of tokens that can be taken in a single request:

// TakeWithBurstLimit is similar to Take, but limits the maximum tokens taken in one request
func (tb *TokenBucket) TakeWithBurstLimit(tokens, maxBurst int) bool {
    // Ensure we don't take more than the maximum burst size
    if tokens > maxBurst {
        tokens = maxBurst
    }
    return tb.Take(tokens)
}

This modification caps the cost of any single call at maxBurst tokens, so one request cannot drain the entire bucket; note that the caller then receives at most maxBurst tokens, not necessarily the full amount requested.

Monitoring and Metrics

For production systems, it's often useful to monitor the token bucket's state:

// Metrics returns the current state of the token bucket
func (tb *TokenBucket) Metrics() (capacity, rate, currentTokens int) {
    tb.mutex.Lock()
    defer tb.mutex.Unlock()

    // Refill to get accurate current token count
    tb.refill()

    return tb.capacity, tb.rate, tb.tokens
}

This method provides insight into the bucket's capacity, token generation rate, and current token count, which can be invaluable for performance monitoring and tuning.

Dynamic Rate Adjustment

In some scenarios, we might want to dynamically adjust the token generation rate based on system conditions:

// SetRate updates the token generation rate
func (tb *TokenBucket) SetRate(newRate int) {
    tb.mutex.Lock()
    defer tb.mutex.Unlock()

    // Refill first to account for the time with the old rate
    tb.refill()

    tb.rate = newRate
}

This allows the system to adapt to changing conditions, such as increasing the rate during off-peak hours or decreasing it during periods of high load.

Practical Applications

API Rate Limiting

One of the most common applications of the Token Bucket Algorithm is API rate limiting:

// Example: Using a token bucket to rate limit an API endpoint
func handleAPIRequest(w http.ResponseWriter, r *http.Request, bucket *TokenBucket) {
    // Try to take a token for this request
    if !bucket.Take(1) {
        http.Error(w, "Too many requests", http.StatusTooManyRequests)
        return
    }

    // Process the request...
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("Request processed successfully"))
}

In this example, each API request consumes one token from the bucket. If no tokens are available, the server returns a 429 Too Many Requests response.

Traffic Shaping

The Token Bucket Algorithm can also be used for traffic shaping in network applications:

// Example: Using a token bucket for traffic shaping
func sendPackets(packets [][]byte, bucket *TokenBucket) {
    for _, packet := range packets {
        // Each packet consumes tokens based on its size
        packetSize := len(packet)

        // Wait until we have enough tokens for this packet
        if !bucket.TakeWithTimeout(packetSize, 5*time.Second) {
            fmt.Println("Timeout waiting for tokens")
            return
        }

        // Send the packet...
        fmt.Printf("Sending packet of size %d\n", packetSize)
    }
}

Here, the number of tokens required is proportional to the size of the packet being sent, allowing for more granular control over network bandwidth.

Resource Allocation

Token buckets can also be used to manage resource allocation within a system:

// Example: Using token buckets to prioritize different types of operations
type ResourceManager struct {
    criticalOperationsBucket *TokenBucket
    normalOperationsBucket   *TokenBucket
    lowPriorityBucket        *TokenBucket
}

func NewResourceManager() *ResourceManager {
    return &ResourceManager{
        // Critical operations get more tokens and a larger burst capacity
        criticalOperationsBucket: NewTokenBucket(100, 50),
        // Normal operations get moderate token allocation
        normalOperationsBucket: NewTokenBucket(50, 20),
        // Low priority operations get fewer tokens
        lowPriorityBucket: NewTokenBucket(20, 5),
    }
}

func (rm *ResourceManager) performCriticalOperation() bool {
    return rm.criticalOperationsBucket.Take(1)
}

func (rm *ResourceManager) performNormalOperation() bool {
    return rm.normalOperationsBucket.Take(1)
}

func (rm *ResourceManager) performLowPriorityOperation() bool {
    return rm.lowPriorityBucket.Take(1)
}

In this example, different types of operations have different token buckets with varying capacities and rates, ensuring that critical operations receive priority access to system resources.

Performance Considerations

Concurrency

When implementing the Token Bucket Algorithm in Go, it's important to consider the performance implications of concurrent access. The use of a mutex ensures thread safety but can become a bottleneck in highly concurrent systems.

One way to mitigate this is to use a more fine-grained locking strategy or to shard the token buckets across multiple instances:

// Example: Sharding token buckets for better concurrency
type ShardedTokenBucket struct {
    buckets   []*TokenBucket
    numShards int
}

func NewShardedTokenBucket(capacity, rate, numShards int) *ShardedTokenBucket {
    buckets := make([]*TokenBucket, numShards)
    for i := 0; i < numShards; i++ {
        // Each shard gets an equal share of the total capacity and rate
        buckets[i] = NewTokenBucket(capacity/numShards, rate/numShards)
    }

    return &ShardedTokenBucket{
        buckets:   buckets,
        numShards: numShards,
    }
}

func (stb *ShardedTokenBucket) Take(tokens int) bool {
    // Choose a shard based on a hash (e.g., of the request ID)
    shard := 0 // In practice, use a proper hash function

    return stb.buckets[shard].Take(tokens)
}

By dividing the token bucket into multiple shards, we reduce contention on the mutex, allowing for better performance in concurrent environments.
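One way to fill in the "proper hash function" placeholder above is FNV-1a from the standard library's hash/fnv package. The shardFor helper here is an illustrative sketch, keyed on something stable per client so each client consistently hits the same shard:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a request key (e.g. a client ID or API key) to a shard
// index using an FNV-1a hash, so the same key always hits the same bucket.
func shardFor(key string, numShards int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(numShards))
}

func main() {
	// Keys spread across shards, and the mapping is stable across calls.
	for _, key := range []string{"client-a", "client-b", "client-c"} {
		fmt.Printf("%s -> shard %d\n", key, shardFor(key, 4))
	}
}
```

A stable key-to-shard mapping also gives you per-client fairness for free: each client draws from its own slice of the total rate rather than competing for one global bucket.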

Precision vs. Efficiency

The implementation presented in this article recalculates the token count on each operation, which provides high precision but may have performance implications in very high-throughput systems.

An alternative approach is to use a background goroutine to periodically add tokens to the bucket:

// Example: Using a background goroutine to add tokens
func startRefillGoroutine(bucket *TokenBucket, interval time.Duration) {
    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()

        for range ticker.C {
            bucket.mutex.Lock()
            // Add tokens based on the interval and rate
            tokensToAdd := int(interval.Seconds() * float64(bucket.rate))
            bucket.tokens = min(bucket.tokens+tokensToAdd, bucket.capacity)
            bucket.mutex.Unlock()
        }
    }()
}

This approach can be more efficient, but it trades away some precision: tokens arrive in batches at each tick rather than continuously. If you adopt it, remove the on-demand refill call from Take (and keep lastRefill out of the picture) so that tokens are not counted twice.

Conclusion

The Token Bucket Algorithm provides a flexible and efficient way to implement rate limiting and traffic shaping in a wide range of applications. Its ability to handle both steady traffic and sudden bursts makes it particularly valuable in real-world systems where traffic patterns are often unpredictable.

In this article, we've explored the core concepts of the Token Bucket Algorithm, implemented it in Go, and examined various extensions and optimizations. We've also looked at practical applications, from API rate limiting to network traffic shaping.

When implementing a token bucket in your own applications, remember to consider:

  • The appropriate capacity and token generation rate for your use case
  • How to handle requests when tokens are unavailable (reject, queue, or delay)
  • Thread safety and performance in concurrent environments
  • Monitoring and dynamic adjustment based on system conditions

Leapcell: The Best of Serverless Web Hosting

Finally, we recommend an excellent platform for deploying Go services: Leapcell

🚀 Build with Your Favorite Language

Develop effortlessly in JavaScript, Python, Go, or Rust.

🌍 Deploy Unlimited Projects for Free

Only pay for what you use—no requests, no charges.

⚡ Pay-as-You-Go, No Hidden Costs

No idle fees, just seamless scalability.

📖 Explore Our Documentation

🔹 Follow us on Twitter: @LeapcellHQ
