Production-Ready Rate Limiter in Go: From Side Project to Distributed System

A deep dive into three algorithms, atomic Redis operations, and building a high-performance, flexible library from scratch.

When you're building a new service, rate limiting is one of those things you know you need, but you often start with something simple. Maybe it's a basic in-memory counter. But what happens when your service grows? When you move from a single server to a distributed system, that simple counter breaks down. You're stuck rewriting your rate limiting logic.

Most Go rate limiters I found forced me into a single algorithm (usually token bucket) or locked me into a specific storage backend. This was the problem I set out to solve.

I decided to build rate-limiter-go, a library that scales with you. It provides:

  • Multiple battle-tested algorithms
  • Pluggable storage (in-memory or Redis)
  • Atomic Redis operations for concurrency-safe, production-ready limiting

In this post, I'm going to walk you through the journey of building it: the algorithms I explored, the edge cases I found, and the final high-performance library.


Part 1: The Quest for the "Perfect" Algorithm

Rate limiting seems simple, but there are many ways to do it, each with critical trade-offs.

1. The Naive Start: Fixed Window

This is the most intuitive approach. You divide time into fixed "windows" (e.g., one minute) and allow a certain number of requests in that window.

  • Mental Model:

    • Set a limit (e.g., 100 requests per minute).
    • If the time is 12:24:02, the window is 12:24:00 to 12:24:59.
    • All requests in this period increment a single counter.
    • If counter > 100, reject.
    • At 12:25:00, the counter resets to 0.
  • The Problem: Burst Errors
    This algorithm has a major flaw. Imagine your limit is 100 requests/minute. A user could send 100 requests at 12:24:59 (which are allowed) and then another 100 requests at 12:25:00 (which are also allowed, as it's a new window).

    This user just sent 200 requests in two seconds, effectively doubling your intended rate limit and bypassing your protection.

  • When to Use It: Simple, low-traffic, or single-node setups where absolute precision isn't critical.
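To make the mental model above concrete, here's a toy single-node sketch of the idea (illustrative only; it isn't concurrency-safe and isn't the library's implementation):

package main

import (
    "fmt"
    "time"
)

// fixedWindow keeps one counter per discrete time window.
type fixedWindow struct {
    limit  int
    window time.Duration
    counts map[int64]int // window index -> request count
}

func (f *fixedWindow) allow() bool {
    // Every request in the same window maps to the same index,
    // so they all increment the same counter.
    idx := time.Now().UnixNano() / int64(f.window)
    f.counts[idx]++
    return f.counts[idx] <= f.limit
}

func main() {
    fw := &fixedWindow{limit: 100, window: time.Minute, counts: map[int64]int{}}
    fmt.Println(fw.allow()) // true until the 101st request in the same minute
}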

2. The "Smooth" Approach: Token Bucket

This algorithm is a classic for a reason. It's designed to handle bursts gracefully while maintaining a steady average rate.

  • Mental Model:
    • Each user gets a "bucket" with a maximum capacity (e.g., 100 tokens).
    • The bucket is refilled at a constant rate (e.g., 10 tokens per second).
    • Every request tries to consume one token.
    • If a token is available, the request is allowed.
    • If the bucket is empty, the request is rejected.

This is much better. It allows a user to "save up" tokens to send a short burst (up to the bucket capacity), but they can't exceed the steady-state refill rate over the long term.

  • Implementation Edge Cases:

    • Clock Skew: In a distributed system, different servers will have different clocks, leading to inconsistent refill calculations. Solution: Use Redis server time (TIME command) as the single source of truth.
    • Float Precision: Refill rates are often fractional (e.g., 1.66 tokens/sec), and accumulated floating-point error can make the token count drift. Solution: compare against the limit with a small tolerance, or track tokens as scaled integers instead of raw floats.
  • When to Use It: This is ideal for most public APIs. It provides smooth flow control and allows for legitimate, short-term bursts of traffic.
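Here's a minimal single-node sketch of the refill-and-consume logic described above (for illustration only; the library's distributed version performs this atomically in Redis using server time rather than a local clock):

package main

import (
    "fmt"
    "math"
    "sync"
    "time"
)

// tokenBucket refills at a constant rate and allows a request
// only if at least one token is available.
type tokenBucket struct {
    mu         sync.Mutex
    tokens     float64 // tokens currently in the bucket
    capacity   float64 // maximum burst size
    refillRate float64 // tokens added per second
    lastRefill time.Time
}

func (b *tokenBucket) allow() bool {
    b.mu.Lock()
    defer b.mu.Unlock()

    // Refill based on elapsed time, capped at the bucket capacity.
    now := time.Now()
    elapsed := now.Sub(b.lastRefill).Seconds()
    b.tokens = math.Min(b.capacity, b.tokens+elapsed*b.refillRate)
    b.lastRefill = now

    if b.tokens >= 1 {
        b.tokens--
        return true
    }
    return false
}

func main() {
    b := &tokenBucket{capacity: 100, refillRate: 10, tokens: 100, lastRefill: time.Now()}
    fmt.Println(b.allow()) // true: a token was available
}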

3. The Balanced Approach: Sliding Window Counter

This was the algorithm that struck the best balance for me. It solves the "burst error" of the Fixed Window but is simpler to implement and often more performant than a Token Bucket.

  • Mental Model:
    This algorithm smooths out the rate by considering a weighted average of the previous window and the current window.

    Imagine a 1-minute window (limit 100).

    • It's 12:25:15 (so, we are 25% of the way through the current window).
    • Previous Window (12:24): Had 80 requests.
    • Current Window (12:25): Has 10 requests so far.

    We don't just look at the 10 requests. We calculate a weighted count:

    • Weight of Previous Window: 75% (since 75% of the sliding window overlaps the previous fixed window)
    • Weight of Current Window: 25%

    Weighted Count = (80 requests * 75%) + (10 requests * 25%)
    Weighted Count = 60 + 2.5 = 62.5

    The user's current effective count is 62.5, so they can continue making requests. This approach gracefully "slides" the count from one window to the next, which all but eliminates the boundary burst problem (it's an approximation that assumes requests were spread evenly across the previous window, which is accurate enough in practice).

  • When to Use It: My recommendation for most general-purpose, distributed rate limiting. It provides excellent accuracy and performance without the complexity of token management.
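The weighted-count calculation itself is only a few lines of code. Here's a sketch of the maths from the example above (names are illustrative, not the library's API):

package main

import (
    "fmt"
    "time"
)

// weightedCount blends the previous and current window counters based on
// how far we are into the current window.
func weightedCount(prevCount, currCount float64, elapsed, window time.Duration) float64 {
    frac := float64(elapsed) / float64(window) // fraction of the current window elapsed
    return prevCount*(1-frac) + currCount*frac
}

func main() {
    // 15 seconds into a 1-minute window: 80 requests last window, 10 so far.
    fmt.Println(weightedCount(80, 10, 15*time.Second, time.Minute)) // 62.5
}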


Part 2: From Theory to Production Library

Knowing the algorithms is one thing; implementing them in a production-ready way is another. Here were my core design goals for rate-limiter-go.

1. Storage Backend Abstraction

I wanted to start with in-memory storage for development and scale to Redis in production without changing my application code.

I defined a simple Storage interface:

type Storage interface {
    Get(key string) (interface{}, error)
    Set(key string, value interface{}, ttl time.Duration) error
    Delete(key string) error
    Increment(key string, amount int, ttl time.Duration) (int64, error)
}

Now, I can initialize my limiter with either backend:

// Development: in-memory
store := storage.NewMemoryStorage()

// Production: Redis (same interface)
store := storage.NewRedisStorage("redis-cluster:6379")

// Same limiter code works with both
rateLimiter := limiter.NewSlidingWindowLimiter(store, config)

2. Atomic Redis Operations

In a concurrent system, you can't just GET a value, check it, and then SET it. This is a classic race condition.

# !! RACE CONDITION !!
# Client 1 GETS count (99)
# Client 2 GETS count (99)
# Client 1 increments to 100, SETS 100. (Allowed)
# Client 2 increments to 100, SETS 100. (Also Allowed)
# !! We just allowed 101 requests !!

The solution is to perform all operations atomically. I used Lua scripts, which Redis guarantees will run without interruption.

Here is the (simplified) Lua script for the Fixed Window algorithm. It gets, checks, increments, and sets the expiry all in one atomic step.

-- Fixed Window example (simplified)
local key       = KEYS[1]
local limit     = tonumber(ARGV[1])
local increment = tonumber(ARGV[2])
local ttl       = tonumber(ARGV[3])

local current = tonumber(redis.call('GET', key) or '0')
if current + increment > limit then
    return 0  -- Denied
end
redis.call('INCRBY', key, increment)
redis.call('EXPIRE', key, ttl)
return 1  -- Allowed

No race conditions. No approximate counting. Just correctness.
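For context, here's roughly how a script like this can be invoked from Go with the go-redis client (a sketch under my own naming, not the library's internal code; redis.NewScript runs the script via EVALSHA and falls back to EVAL automatically):

package main

import (
    "context"
    "fmt"

    "github.com/redis/go-redis/v9"
)

// fixedWindowScript mirrors the simplified Lua script above.
var fixedWindowScript = redis.NewScript(`
local key       = KEYS[1]
local limit     = tonumber(ARGV[1])
local increment = tonumber(ARGV[2])
local ttl       = tonumber(ARGV[3])

local current = tonumber(redis.call('GET', key) or '0')
if current + increment > limit then
    return 0
end
redis.call('INCRBY', key, increment)
redis.call('EXPIRE', key, ttl)
return 1
`)

func main() {
    ctx := context.Background()
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

    // Limit 100 per 60-second window; consume 1 from this key.
    allowed, err := fixedWindowScript.Run(ctx, rdb, []string{"ratelimit:user:alice"}, 100, 1, 60).Int()
    if err != nil {
        panic(err)
    }
    fmt.Println("allowed:", allowed == 1)
}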

3. High-Performance In-Memory Storage

For the in-memory backend, the obvious choice is a sync.Mutex wrapping a map[string]int. However, Go's documentation mentions sync.Map is optimized for a specific case: "when a given key is written once but read many times."

A rate limiter cache is the opposite: keys are read and written to on almost every request.

My implementation for in-memory storage uses sync.Map but leverages its CompareAndSwap (CAS) atomic operations to safely increment counters under high concurrency, which performs better than a single, global mutex blocking all goroutines.
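The retry-loop pattern looks roughly like this (a sketch with assumed names, not the library's exact code; sync.Map.CompareAndSwap requires Go 1.20+):

package main

import (
    "fmt"
    "sync"
)

// memoryCounter stores int64 counters in a sync.Map and increments them
// with a CompareAndSwap retry loop instead of a global lock.
type memoryCounter struct {
    counts sync.Map // string -> int64
}

func (m *memoryCounter) Increment(key string, amount int64) int64 {
    for {
        // First writer for this key initialises the counter.
        cur, loaded := m.counts.LoadOrStore(key, amount)
        if !loaded {
            return amount
        }
        old := cur.(int64)
        // CAS succeeds only if no other goroutine updated the key in between;
        // on failure, loop and retry with the fresh value.
        if m.counts.CompareAndSwap(key, old, old+amount) {
            return old + amount
        }
    }
}

func main() {
    var m memoryCounter
    fmt.Println(m.Increment("user:alice", 1)) // 1
    fmt.Println(m.Increment("user:alice", 1)) // 2
}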


Part 3: Putting It All Together

Here's what the final library looks like in practice.

Quick Start: 5 Lines to Rate Limiting

This is all it takes to add rate limiting to any function.

package main

import (
    "fmt"
    "time"

    "github.com/sumedhvats/rate-limiter-go/pkg/limiter"
    "github.com/sumedhvats/rate-limiter-go/pkg/storage"
)

func main() {
    // 1. Create in-memory storage
    store := storage.NewMemoryStorage()

    // 2. Create limiter: 10 requests per minute
    rateLimiter := limiter.NewSlidingWindowLimiter(store, limiter.Config{
        Rate:   10,
        Window: 1 * time.Minute,
    })

    // 3. Check if request is allowed
    allowed, err := rateLimiter.Allow("user:alice")
    if err != nil {
        panic(err)
    }

    // 4. Deny or allow
    if !allowed {
        fmt.Println("Rate limit exceeded!")
        return
    }

    // 5. Allow
    fmt.Println("Request allowed!")
}

The Most Common Use Case: HTTP Middleware

Of course, the most common need is for an HTTP API. I built a middleware that handles everything automatically.

package main

import (
    "net/http"
    "time"

    "github.com/sumedhvats/rate-limiter-go/middleware"
    "github.com/sumedhvats/rate-limiter-go/pkg/limiter"
    "github.com/sumedhvats/rate-limiter-go/pkg/storage"
)

func main() {
    // Use Redis for a distributed system
    store := storage.NewRedisStorage("localhost:6379")

    // 100 requests per minute per IP
    rateLimiter := limiter.NewSlidingWindowLimiter(store, limiter.Config{
        Rate:   100,
        Window: 1 * time.Minute,
    })

    // Apply middleware
    mux := http.NewServeMux()
    mux.HandleFunc("/api/data", dataHandler)

    // The middleware automatically uses IP address as the key
    handler := middleware.RateLimitMiddleware(middleware.Config{
        Limiter: rateLimiter,
    })(mux)

    http.ListenAndServe(":8080", handler)
}

func dataHandler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Data served successfully"))
}

This middleware automatically:

  • Extracts the client IP (handling X-Forwarded-For proxies).
  • Returns a 429 Too Many Requests JSON error.
  • Adds standard rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset).
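A rejected request therefore comes back looking something like this (the header names are the ones listed above; the exact values and JSON body shown here are illustrative):

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1712345678

{"error": "rate limit exceeded"}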

Part 4: The Proof: Does it Scale?

I built this for performance, so I benchmarked it heavily. Here are the results on my 12th Gen Intel i5.

This first test shows a realistic, concurrent load with many different keys (e.g., many different users hitting the API).

Multiple Keys (Realistic Load) - Concurrent

| Algorithm      | Time/op   | Memory/op |
|----------------|-----------|-----------|
| Sliding Window | 68 ns/op  | 100 B/op  |
| Token Bucket   | 76 ns/op  | 160 B/op  |
| Fixed Window   | 130 ns/op | 261 B/op  |

This test shows how the system scales when hammering the cache with 10,000 unique keys.

Scalability (10K Keys) - Concurrent

| Algorithm      | Time/op  | Throughput   |
|----------------|----------|--------------|
| Token Bucket   | 56 ns/op | ~17M ops/sec |
| Sliding Window | 74 ns/op | ~13M ops/sec |
| Fixed Window   | 95 ns/op | ~10M ops/sec |

Key Insights:

  • Sliding Window and Token Bucket are the clear winners, handling roughly 13-17 million operations per second on a single machine.
  • They are extremely lightweight, allocating only 100-160 bytes per operation.
  • Per-operation latency stays essentially flat as the number of tracked keys grows into the tens of thousands.

Conclusion

Building this library was a fantastic journey through algorithm design, concurrency patterns in Go, and atomic database operations with Redis.

I started with a simple goal: create a rate limiter that wouldn't need to be rewritten when a project scaled. The result is a library that lets you choose the right algorithm for the job, scales from a single in-memory instance to a distributed Redis cluster, and operates with atomic, concurrency-safe guarantees.

If you want to check out the code, contribute, or use the library in your own project, you can find it on GitHub.

Thanks for reading!
