Building a Zero-Dependency Rate Limiter in Go (Token Bucket, Leaky Bucket, Sliding Window)

Rate limiting is essential for protecting APIs from abuse, ensuring fair resource allocation, and maintaining system stability. While there are existing solutions, I wanted to build something lightweight, performant, and easy to integrate into any Go project.

Today, I'm sharing kazrl - a zero-dependency rate limiter library that implements three different algorithms and comes with ready-to-use middleware for popular Go web frameworks.

The Problem

Most rate limiting libraries either:

  • Come with heavy dependencies
  • Support only one algorithm
  • Require complex setup for per-client limiting
  • Lack middleware integration

I needed something that:

  • Has zero external dependencies
  • Supports multiple algorithms (Token Bucket, Leaky Bucket, Sliding Window)
  • Works with popular frameworks out of the box
  • Provides flexible per-client rate limiting

Installation

```
go get github.com/Makennsky/kazrl
```

That's it! No transitive dependencies to worry about.

Quick Start

Here's the simplest way to add rate limiting to your HTTP handler:

```go
package main

import (
	"net/http"

	"github.com/Makennsky/kazrl"
	"github.com/Makennsky/kazrl/middleware"
)

func main() {
	// Create a rate limiter: 100 requests per second, burst of 200
	limiter := kazrl.NewTokenBucket(100, 200)

	// Apply middleware
	rateLimitMiddleware := middleware.HTTP(limiter)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("Hello, World!"))
	})

	http.Handle("/api/", rateLimitMiddleware(handler))
	http.ListenAndServe(":8080", nil)
}
```

That's a complete rate-limited server in about 20 lines of code!

Three Algorithms, One Interface

Different use cases need different strategies. kazrl implements three battle-tested algorithms:

1. Token Bucket

Perfect for APIs that need to allow bursts while maintaining average rate limits.

```go
limiter := kazrl.NewTokenBucket(10, 20)
// 10 requests per second, allows bursts up to 20
```

Use case: Public APIs, user-facing endpoints
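
To see the burst behavior concretely, here's a quick sketch, using only the constructor above and the `Allow` method from the shared interface shown below, that fires 30 back-to-back requests:

```go
package main

import (
	"fmt"

	"github.com/Makennsky/kazrl"
)

func main() {
	// 10 req/s sustained rate, burst capacity of 20.
	limiter := kazrl.NewTokenBucket(10, 20)

	allowed := 0
	for i := 0; i < 30; i++ {
		if limiter.Allow() {
			allowed++
		}
	}
	// Expect roughly 20 immediate successes: the burst capacity is
	// spent first, after which tokens refill at only 10 per second.
	fmt.Println("allowed:", allowed)
}
```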

2. Leaky Bucket

Smooths out traffic spikes by processing requests at a constant rate.

```go
limiter := kazrl.NewLeakyBucket(10, 20)
// Processes 10 req/s, queues up to 20
```

Use case: Protecting downstream services, database queries
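
For instance, here's a sketch of shielding a database from a burst of jobs, assuming `Wait` blocks until the request may proceed (per the shared interface shown below):

```go
package main

import (
	"context"
	"log"

	"github.com/Makennsky/kazrl"
)

// queryDatabase is a hypothetical stand-in for a slow downstream call.
func queryDatabase(id int) { log.Println("querying for job", id) }

func main() {
	// Drains at 10 req/s, queues up to 20.
	limiter := kazrl.NewLeakyBucket(10, 20)

	for id := 0; id < 100; id++ {
		// Wait blocks until capacity frees up, turning a burst of
		// 100 jobs into a steady ~10/s stream toward the database.
		if err := limiter.Wait(context.Background()); err != nil {
			log.Fatal(err)
		}
		queryDatabase(id)
	}
}
```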

3. Sliding Window

Provides the most accurate rate limiting, avoiding the classic fixed-window edge case where a client sends a full quota at the very end of one window and another full quota at the start of the next, briefly doubling the effective rate.

```go
limiter := kazrl.NewSlidingWindow(10, 20)
// 10 req/s with a sliding time window
```

Use case: Strict rate enforcement, billing APIs

All three implement the same interface, so switching is trivial:

```go
type RateLimiter interface {
	Allow() bool
	AllowN(n int) bool
	Wait(ctx context.Context) error
	WaitN(ctx context.Context, n int) error
	Reserve() time.Duration
	ReserveN(n int) time.Duration
}
```
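
Code written against the interface never needs to know which algorithm sits behind it. A minimal sketch:

```go
package main

import (
	"fmt"

	"github.com/Makennsky/kazrl"
)

// throttle depends only on the RateLimiter interface, so it works
// unchanged with any of the three algorithms.
func throttle(limiter kazrl.RateLimiter, work func()) {
	if limiter.Allow() {
		work()
		return
	}
	fmt.Println("rate limited, dropping work")
}

func main() {
	limiter := kazrl.NewTokenBucket(10, 20)
	// Swapping algorithms is a one-line change:
	// limiter := kazrl.NewLeakyBucket(10, 20)
	// limiter := kazrl.NewSlidingWindow(10, 20)

	throttle(limiter, func() { fmt.Println("did work") })
}
```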

Per-Client Rate Limiting Made Easy

The real power comes with per-client limiting. Here's how to rate limit by IP address:

```go
rateLimitMiddleware := middleware.HTTPWithKeyFunc(
	func() kazrl.RateLimiter {
		return kazrl.NewTokenBucket(10, 20) // 10 req/s per IP
	},
	middleware.KeyByIP, // Built-in IP extractor
)

http.Handle("/api/", rateLimitMiddleware(handler))
```

Each IP address automatically gets its own rate limiter instance. The library handles X-Forwarded-For and X-Real-IP headers correctly.
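
For the curious, a proxy-aware IP extractor typically looks something like the sketch below. This is an illustrative version, not kazrl's actual code:

```go
package example

import (
	"net"
	"net/http"
	"strings"
)

// clientIP is an illustrative sketch of a proxy-aware key extractor,
// not kazrl's actual implementation: prefer X-Forwarded-For, then
// X-Real-IP, then fall back to the connection's remote address.
func clientIP(r *http.Request) string {
	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
		// X-Forwarded-For can hold a comma-separated chain;
		// the original client is the first entry.
		return strings.TrimSpace(strings.Split(xff, ",")[0])
	}
	if rip := r.Header.Get("X-Real-IP"); rip != "" {
		return rip
	}
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		return r.RemoteAddr
	}
	return host
}
```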

Rate Limit by API Key

```go
rateLimitMiddleware := middleware.HTTPWithKeyFunc(
	func() kazrl.RateLimiter {
		return kazrl.NewTokenBucket(100, 200)
	},
	middleware.KeyByAPIKey, // Extracts from Authorization header
)
```

Custom Key Functions

Need something more complex? Write your own key extractor:

```go
customKeyFunc := func(r *http.Request) string {
	// Rate limit by IP + endpoint combination
	return middleware.KeyByIP(r) + ":" + r.URL.Path
}

rateLimitMiddleware := middleware.HTTPWithKeyFunc(
	func() kazrl.RateLimiter {
		return kazrl.NewTokenBucket(5, 10)
	},
	customKeyFunc,
)
```

Framework Integration

kazrl provides native middleware for popular frameworks:

Gin

```go
r := gin.Default()
limiter := kazrl.NewTokenBucket(100, 200)
r.Use(middleware.Gin(limiter))
```

Echo

```go
e := echo.New()
limiter := kazrl.NewTokenBucket(100, 200)
e.Use(middleware.Echo(limiter))
```

Fiber

```go
app := fiber.New()
limiter := kazrl.NewTokenBucket(100, 200)
app.Use(middleware.Fiber(limiter))
```

Chi

```go
r := chi.NewRouter()
limiter := kazrl.NewTokenBucket(100, 200)
r.Use(middleware.Chi(limiter))
```

Multi-Layer Rate Limiting

For advanced scenarios, you can stack multiple rate limiters:

```go
// Global limit: 1000 req/s for all clients
globalLimiter := kazrl.NewTokenBucket(1000, 2000)
globalMiddleware := middleware.HTTP(globalLimiter)

// Per-IP limit: 10 req/s per client
perIPMiddleware := middleware.HTTPWithKeyFunc(
	func() kazrl.RateLimiter {
		return kazrl.NewTokenBucket(10, 20)
	},
	middleware.KeyByIP,
)

// Stack them!
handler := globalMiddleware(perIPMiddleware(yourHandler))
```

This protects against both individual abuse and total system overload.

Performance

Benchmarks on a modern system (Intel i7-1355U):

```
BenchmarkTokenBucketAllow-12       4574689    255.6 ns/op    0 allocs/op
BenchmarkLeakyBucketAllow-12       5218902    208.3 ns/op    0 allocs/op
BenchmarkSlidingWindowAllow-12     6476462    198.6 ns/op    0 allocs/op
```

200-260 nanoseconds per operation with zero allocations. That's fast enough for the most demanding applications.
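
If you want to reproduce numbers like these yourself, they come from Go's standard testing harness. A sketch of such a benchmark, with parameters chosen (an assumption on my part) so that `Allow` never runs dry and we measure the hot path:

```go
package kazrl_test

import (
	"testing"

	"github.com/Makennsky/kazrl"
)

// Run with: go test -bench=. -benchmem
func BenchmarkTokenBucketAllow(b *testing.B) {
	// A very high rate and burst so Allow always succeeds and we
	// benchmark the success path rather than the rejection path.
	limiter := kazrl.NewTokenBucket(1_000_000_000, 1_000_000_000)

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		limiter.Allow()
	}
}
```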

Production-Ready Features

Context Support

All blocking operations support context cancellation:

```go
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()

if err := limiter.Wait(ctx); err != nil {
	// Handle timeout or cancellation
}
```

Reservation API

For advanced use cases, you can reserve tokens and schedule work:

```go
waitDuration := limiter.Reserve()
if waitDuration > 0 {
	// Schedule for later
	time.AfterFunc(waitDuration, processRequest)
} else {
	// Process immediately
	processRequest()
}
```

Thread-Safe

All operations are thread-safe. You can safely use the same limiter instance across multiple goroutines.
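
A quick sketch hammering one shared limiter from several goroutines:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"

	"github.com/Makennsky/kazrl"
)

func main() {
	// One shared limiter; no external locking required.
	limiter := kazrl.NewTokenBucket(100, 200)

	var allowed atomic.Int64
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				if limiter.Allow() {
					allowed.Add(1)
				}
			}
		}()
	}
	wg.Wait()
	// Roughly the burst size succeeds immediately; the rest are rejected.
	fmt.Println("allowed:", allowed.Load())
}
```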

Algorithm Comparison

| Algorithm | Burst Support | Smoothing | Memory | Best For |
|-----------|---------------|-----------|--------|----------|
| Token Bucket | Yes | No | Low | Public APIs, burst tolerance |
| Leaky Bucket | Queued | Yes | Medium | Downstream protection |
| Sliding Window | No | No | High | Strict enforcement |

Implementation Insights

Why Zero Dependencies?

Dependencies are a security and maintenance burden. By keeping kazrl dependency-free:

  • No supply chain attacks via transitive dependencies
  • Faster installation and smaller binaries
  • No version conflicts with your other dependencies
  • Easy to audit (< 2000 lines of code)

Concurrency Design

Each algorithm uses sync.Mutex for thread-safety. While this might seem simple, it's actually the right choice here:

```go
type tokenBucket struct {
	mu         sync.Mutex
	rate       float64
	burst      int
	tokens     float64
	lastUpdate time.Time
}
```

Lock contention is minimal because:

  1. Operations are extremely fast (< 300ns)
  2. The critical section is tiny (just token math)
  3. Per-client limiting distributes the load

For most applications, you'll never see contention. If you're handling millions of requests per second per endpoint, you might need a distributed solution anyway.

Memory Management

The library is designed to minimize allocations:

```go
// No allocations in the hot path
func (tb *tokenBucket) Allow() bool {
	tb.mu.Lock()
	defer tb.mu.Unlock()

	now := time.Now()
	tb.refillTokens(now) // Pure math, no allocations

	if tb.tokens >= 1.0 {
		tb.tokens -= 1.0
		return true
	}
	return false
}
```

The only allocations happen when creating new per-client limiters, which is infrequent.
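
For completeness, the refill referenced above is just elapsed-time arithmetic. A sketch based on the struct fields shown earlier, not necessarily kazrl's exact code:

```go
// Sketch of the refill math, using the struct fields shown above;
// not necessarily kazrl's exact implementation. Caller holds tb.mu.
func (tb *tokenBucket) refillTokens(now time.Time) {
	elapsed := now.Sub(tb.lastUpdate).Seconds()
	tb.tokens += elapsed * tb.rate
	if tb.tokens > float64(tb.burst) {
		tb.tokens = float64(tb.burst) // cap at burst capacity
	}
	tb.lastUpdate = now
}
```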

Real-World Example

Here's a complete example of a production-ready API server:

```go
package main

import (
	"encoding/json"
	"net/http"

	"github.com/Makennsky/kazrl"
	"github.com/Makennsky/kazrl/middleware"
)

func main() {
	// Global rate limit: 10,000 req/s
	globalLimiter := kazrl.NewTokenBucket(10000, 20000)
	globalMiddleware := middleware.HTTP(globalLimiter)

	// Per-IP rate limit: 100 req/s
	perIPMiddleware := middleware.HTTPWithKeyFunc(
		func() kazrl.RateLimiter {
			return kazrl.NewTokenBucket(100, 200)
		},
		middleware.KeyByIP,
	)

	// API handler
	apiHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		response := map[string]string{
			"status":  "ok",
			"message": "Request processed",
		}
		json.NewEncoder(w).Encode(response)
	})

	// Stack middleware
	http.Handle("/api/", globalMiddleware(perIPMiddleware(apiHandler)))

	http.ListenAndServe(":8080", nil)
}
```

When to Use Each Algorithm

Token Bucket - Default choice

  • Public APIs
  • User-facing endpoints
  • Services that need burst capacity
  • Not suitable when strict rate enforcement is needed

Leaky Bucket - Traffic shaping

  • Protecting slow downstream services
  • Database query rate limiting
  • Smoothing traffic spikes
  • Not suitable when you need to allow bursts

Sliding Window - Strict enforcement

  • Billing/metered APIs
  • When accuracy is critical
  • Preventing gaming fixed windows
  • Not suitable when you need burst capacity

Future Plans

Ideas I'm considering:

  • Distributed rate limiting (Redis backend)
  • Prometheus metrics integration
  • Response header injection (X-RateLimit-*)
  • Dynamic rate adjustment based on system load
  • gRPC interceptors

What would you find useful? Let me know in the comments!

Try It Out

Give kazrl a try in your next project! It's production-ready, battle-tested, and takes 2 minutes to integrate.

```
go get github.com/Makennsky/kazrl
```

If you find it useful, please star the repo on GitHub!


What rate limiting challenges have you faced? Share your experiences in the comments below!


Built in Kazakhstan
