Building a Zero-Dependency Rate Limiter in Go (Token Bucket, Leaky Bucket, Sliding Window)

Rate limiting is essential for protecting APIs from abuse, ensuring fair resource allocation, and maintaining system stability. While there are existing solutions, I wanted to build something lightweight, performant, and easy to integrate into any Go project.

Today, I'm sharing kazrl - a zero-dependency rate limiter library that implements three different algorithms and comes with ready-to-use middleware for popular Go web frameworks.

The Problem

Most rate limiting libraries either:

  • Come with heavy dependencies
  • Support only one algorithm
  • Require complex setup for per-client limiting
  • Lack middleware integration

I needed something that:

  • Has zero external dependencies
  • Supports multiple algorithms (Token Bucket, Leaky Bucket, Sliding Window)
  • Works with popular frameworks out of the box
  • Provides flexible per-client rate limiting

Installation

```
go get github.com/Makennsky/kazrl
```

That's it! No transitive dependencies to worry about.

Quick Start

Here's the simplest way to add rate limiting to your HTTP handler:

```go
package main

import (
	"net/http"

	"github.com/Makennsky/kazrl"
	"github.com/Makennsky/kazrl/middleware"
)

func main() {
	// Create a rate limiter: 100 requests per second, burst of 200
	limiter := kazrl.NewTokenBucket(100, 200)

	// Apply middleware
	rateLimitMiddleware := middleware.HTTP(limiter)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello, World!"))
	})

	http.Handle("/api/", rateLimitMiddleware(handler))
	http.ListenAndServe(":8080", nil)
}
```

That's a complete rate-limited server in about 20 lines of code!

Three Algorithms, One Interface

Different use cases need different strategies. kazrl implements three battle-tested algorithms:

1. Token Bucket

Perfect for APIs that need to allow bursts while maintaining average rate limits.

```go
limiter := kazrl.NewTokenBucket(10, 20)
// 10 requests per second, allows bursts up to 20
```

Use case: Public APIs, user-facing endpoints
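
To see the burst behavior concretely, here's a quick sketch, using only the constructor above and the `Allow` method from the shared interface shown below, that fires 30 back-to-back requests:

```go
package main

import (
	"fmt"

	"github.com/Makennsky/kazrl"
)

func main() {
	// 10 req/s sustained rate, burst capacity of 20.
	limiter := kazrl.NewTokenBucket(10, 20)

	allowed := 0
	for i := 0; i < 30; i++ {
		if limiter.Allow() {
			allowed++
		}
	}
	// Expect roughly 20 immediate successes: the burst capacity is
	// spent first, after which tokens refill at only 10 per second.
	fmt.Println("allowed:", allowed)
}
```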

2. Leaky Bucket

Smooths out traffic spikes by processing requests at a constant rate.

```go
limiter := kazrl.NewLeakyBucket(10, 20)
// Processes 10 req/s, queues up to 20
```

Use case: Protecting downstream services, database queries
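
For instance, here's a sketch of shielding a database from a burst of jobs, assuming `Wait` blocks until the request may proceed (per the shared interface shown below):

```go
package main

import (
	"context"
	"log"

	"github.com/Makennsky/kazrl"
)

// queryDatabase is a hypothetical stand-in for a slow downstream call.
func queryDatabase(id int) { log.Println("querying for job", id) }

func main() {
	// Drains at 10 req/s, queues up to 20.
	limiter := kazrl.NewLeakyBucket(10, 20)

	for id := 0; id < 100; id++ {
		// Wait blocks until capacity frees up, turning a burst of
		// 100 jobs into a steady ~10/s stream toward the database.
		if err := limiter.Wait(context.Background()); err != nil {
			log.Fatal(err)
		}
		queryDatabase(id)
	}
}
```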

3. Sliding Window

Provides the most accurate rate limiting, avoiding the classic fixed-window edge case where a client sends a full quota at the very end of one window and another full quota at the start of the next, briefly doubling the effective rate.

```go
limiter := kazrl.NewSlidingWindow(10, 20)
// 10 req/s with a sliding time window
```

Use case: Strict rate enforcement, billing APIs

All three implement the same interface, so switching is trivial:

```go
type RateLimiter interface {
	Allow() bool
	AllowN(n int) bool
	Wait(ctx context.Context) error
	WaitN(ctx context.Context, n int) error
	Reserve() time.Duration
	ReserveN(n int) time.Duration
}
```
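
Code written against the interface never needs to know which algorithm sits behind it. A minimal sketch:

```go
package main

import (
	"fmt"

	"github.com/Makennsky/kazrl"
)

// throttle depends only on the RateLimiter interface, so it works
// unchanged with any of the three algorithms.
func throttle(limiter kazrl.RateLimiter, work func()) {
	if limiter.Allow() {
		work()
		return
	}
	fmt.Println("rate limited, dropping work")
}

func main() {
	limiter := kazrl.NewTokenBucket(10, 20)
	// Swapping algorithms is a one-line change:
	// limiter := kazrl.NewLeakyBucket(10, 20)
	// limiter := kazrl.NewSlidingWindow(10, 20)

	throttle(limiter, func() { fmt.Println("did work") })
}
```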

Per-Client Rate Limiting Made Easy

The real power comes with per-client limiting. Here's how to rate limit by IP address:

```go
rateLimitMiddleware := middleware.HTTPWithKeyFunc(
	func() kazrl.RateLimiter {
		return kazrl.NewTokenBucket(10, 20) // 10 req/s per IP
	},
	middleware.KeyByIP, // Built-in IP extractor
)

http.Handle("/api/", rateLimitMiddleware(handler))
```

Each IP address automatically gets its own rate limiter instance. The library handles X-Forwarded-For and X-Real-IP headers correctly.
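
For the curious, a proxy-aware IP extractor typically looks something like the sketch below. This is an illustrative version, not kazrl's actual code:

```go
package example

import (
	"net"
	"net/http"
	"strings"
)

// clientIP is an illustrative sketch of a proxy-aware key extractor,
// not kazrl's actual implementation: prefer X-Forwarded-For, then
// X-Real-IP, then fall back to the connection's remote address.
func clientIP(r *http.Request) string {
	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
		// X-Forwarded-For can hold a comma-separated chain;
		// the original client is the first entry.
		return strings.TrimSpace(strings.Split(xff, ",")[0])
	}
	if rip := r.Header.Get("X-Real-IP"); rip != "" {
		return rip
	}
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return r.RemoteAddr
	}
	return host
}
```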

Rate Limit by API Key

```go
rateLimitMiddleware := middleware.HTTPWithKeyFunc(
	func() kazrl.RateLimiter {
		return kazrl.NewTokenBucket(100, 200)
	},
	middleware.KeyByAPIKey, // Extracts from Authorization header
)
```

Custom Key Functions

Need something more complex? Write your own key extractor:

```go
customKeyFunc := func(r *http.Request) string {
	// Rate limit by IP + endpoint combination
	return middleware.KeyByIP(r) + ":" + r.URL.Path
}

rateLimitMiddleware := middleware.HTTPWithKeyFunc(
	func() kazrl.RateLimiter {
		return kazrl.NewTokenBucket(5, 10)
	},
	customKeyFunc,
)
```

Framework Integration

kazrl provides native middleware for popular frameworks:

Gin

```go
r := gin.Default()
limiter := kazrl.NewTokenBucket(100, 200)
r.Use(middleware.Gin(limiter))
```

Echo

```go
e := echo.New()
limiter := kazrl.NewTokenBucket(100, 200)
e.Use(middleware.Echo(limiter))
```

Fiber

```go
app := fiber.New()
limiter := kazrl.NewTokenBucket(100, 200)
app.Use(middleware.Fiber(limiter))
```

Chi

```go
r := chi.NewRouter()
limiter := kazrl.NewTokenBucket(100, 200)
r.Use(middleware.Chi(limiter))
```

Multi-Layer Rate Limiting

For advanced scenarios, you can stack multiple rate limiters:

```go
// Global limit: 1000 req/s for all clients
globalLimiter := kazrl.NewTokenBucket(1000, 2000)
globalMiddleware := middleware.HTTP(globalLimiter)

// Per-IP limit: 10 req/s per client
perIPMiddleware := middleware.HTTPWithKeyFunc(
	func() kazrl.RateLimiter {
		return kazrl.NewTokenBucket(10, 20)
	},
	middleware.KeyByIP,
)

// Stack them!
handler := globalMiddleware(perIPMiddleware(yourHandler))
```

This protects against both individual abuse and total system overload.

Performance

Benchmarks on a modern system (Intel i7-1355U):

```
BenchmarkTokenBucketAllow-12       4574689    255.6 ns/op    0 allocs/op
BenchmarkLeakyBucketAllow-12       5218902    208.3 ns/op    0 allocs/op
BenchmarkSlidingWindowAllow-12     6476462    198.6 ns/op    0 allocs/op
```

200-260 nanoseconds per operation with zero allocations. That's fast enough for the most demanding applications.
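
If you want to reproduce numbers like these yourself, they come from Go's standard testing harness. A sketch of such a benchmark, with parameters chosen (an assumption on my part) so that `Allow` never runs dry and we measure the hot path:

```go
package kazrl_test

import (
	"testing"

	"github.com/Makennsky/kazrl"
)

// Run with: go test -bench=. -benchmem
func BenchmarkTokenBucketAllow(b *testing.B) {
	// A very high rate and burst so Allow always succeeds and we
	// benchmark the success path rather than the rejection path.
	limiter := kazrl.NewTokenBucket(1_000_000_000, 1_000_000_000)

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		limiter.Allow()
	}
}
```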

Production-Ready Features

Context Support

All blocking operations support context cancellation:

```go
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()

if err := limiter.Wait(ctx); err != nil {
	// Handle timeout or cancellation
}
```

Reservation API

For advanced use cases, you can reserve tokens and schedule work:

```go
waitDuration := limiter.Reserve()
if waitDuration > 0 {
	// Schedule for later
	time.AfterFunc(waitDuration, processRequest)
} else {
	// Process immediately
	processRequest()
}
```

Thread-Safe

All operations are thread-safe. You can safely use the same limiter instance across multiple goroutines.
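
A quick sketch hammering one shared limiter from several goroutines:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"

	"github.com/Makennsky/kazrl"
)

func main() {
	// One shared limiter; no external locking required.
	limiter := kazrl.NewTokenBucket(100, 200)

	var allowed atomic.Int64
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				if limiter.Allow() {
					allowed.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	// Roughly the burst size succeeds immediately; the rest are rejected.
	fmt.Println("allowed:", allowed.Load())
}
```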

Algorithm Comparison

| Algorithm | Burst Support | Smoothing | Memory | Best For |
|-----------|---------------|-----------|--------|----------|
| Token Bucket | Yes | No | Low | Public APIs, burst tolerance |
| Leaky Bucket | Queued | Yes | Medium | Downstream protection |
| Sliding Window | No | No | High | Strict enforcement |

Implementation Insights

Why Zero Dependencies?

Dependencies are a security and maintenance burden. By keeping kazrl dependency-free:

  • No supply chain attacks via transitive dependencies
  • Faster installation and smaller binaries
  • No version conflicts with your other dependencies
  • Easy to audit (< 2000 lines of code)

Concurrency Design

Each algorithm uses sync.Mutex for thread-safety. While this might seem simple, it's actually the right choice here:

```go
type tokenBucket struct {
	mu         sync.Mutex
	rate       float64
	burst      int
	tokens     float64
	lastUpdate time.Time
}
```

Lock contention is minimal because:

  1. Operations are extremely fast (< 300ns)
  2. The critical section is tiny (just token math)
  3. Per-client limiting distributes the load

For most applications, you'll never see contention. If you're handling millions of requests per second per endpoint, you might need a distributed solution anyway.

Memory Management

The library is designed to minimize allocations:

```go
// No allocations in the hot path
func (tb *tokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	now := time.Now()
	tb.refillTokens(now) // Pure math, no allocations

	if tb.tokens >= 1.0 {
		tb.tokens -= 1.0
		return true
	}
	return false
}
```

The only allocations happen when creating new per-client limiters, which is infrequent.
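
For completeness, the refill referenced above is just elapsed-time arithmetic. A sketch based on the struct fields shown earlier, not necessarily kazrl's exact code:

```go
// Sketch of the refill math, using the struct fields shown above;
// not necessarily kazrl's exact implementation. Caller holds tb.mu.
func (tb *tokenBucket) refillTokens(now time.Time) {
	elapsed := now.Sub(tb.lastUpdate).Seconds()
	tb.tokens += elapsed * tb.rate
	if tb.tokens > float64(tb.burst) {
		tb.tokens = float64(tb.burst) // cap at burst capacity
	}
	tb.lastUpdate = now
}
```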

Real-World Example

Here's a complete example of a production-ready API server:

```go
package main

import (
	"encoding/json"
	"net/http"

	"github.com/Makennsky/kazrl"
	"github.com/Makennsky/kazrl/middleware"
)

func main() {
	// Global rate limit: 10,000 req/s
	globalLimiter := kazrl.NewTokenBucket(10000, 20000)
	globalMiddleware := middleware.HTTP(globalLimiter)

	// Per-IP rate limit: 100 req/s
	perIPMiddleware := middleware.HTTPWithKeyFunc(
		func() kazrl.RateLimiter {
			return kazrl.NewTokenBucket(100, 200)
		},
		middleware.KeyByIP,
	)

	// API handler
	apiHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		response := map[string]string{
			"status":  "ok",
			"message": "Request processed",
		}
		json.NewEncoder(w).Encode(response)
	})

	// Stack middleware
	http.Handle("/api/", globalMiddleware(perIPMiddleware(apiHandler)))

	http.ListenAndServe(":8080", nil)
}
```

When to Use Each Algorithm

Token Bucket - Default choice

  • Public APIs
  • User-facing endpoints
  • Services that need burst capacity
  • Not suitable when strict rate enforcement is needed

Leaky Bucket - Traffic shaping

  • Protecting slow downstream services
  • Database query rate limiting
  • Smoothing traffic spikes
  • Not suitable when you need to allow bursts

Sliding Window - Strict enforcement

  • Billing/metered APIs
  • When accuracy is critical
  • Preventing gaming fixed windows
  • Not suitable when you need burst capacity

Future Plans

Ideas I'm considering:

  • Distributed rate limiting (Redis backend)
  • Prometheus metrics integration
  • Response header injection (X-RateLimit-*)
  • Dynamic rate adjustment based on system load
  • gRPC interceptors

What would you find useful? Let me know in the comments!

Try It Out

Give kazrl a try in your next project! It's production-ready, battle-tested, and takes 2 minutes to integrate.

```
go get github.com/Makennsky/kazrl
```

If you find it useful, please star the repo on GitHub!


What rate limiting challenges have you faced? Share your experiences in the comments below!


Built in Kazakhstan
