Young Gao

Building a Production Rate Limiter from Scratch in Go

Every API needs rate limiting. Most developers reach for a library or cloud service, but understanding the algorithms lets you pick the right one and debug it when things go wrong. Here's how to build four different rate limiters in Go — from simple to production-grade.

The Four Algorithms

| Algorithm | Behavior | Best For |
| --- | --- | --- |
| Fixed Window | Count requests per time window | Simple API limits |
| Sliding Window | Weighted count across windows | Smoother traffic shaping |
| Token Bucket | Tokens replenish at fixed rate | Burst-tolerant APIs |
| Sliding Window Log | Exact per-request tracking | Strict compliance |

Shared Interface

// limiter.go
package ratelimit

import "time"

type Result struct {
    Allowed   bool
    Remaining int
    ResetAt   time.Time
    RetryAt   time.Time // Zero if allowed
}

type Limiter interface {
    Allow(key string) Result
}

1. Fixed Window Counter

Simplest approach: count requests in fixed time intervals (e.g., 100 requests per minute). At the boundary of each window, the count resets.

// fixed_window.go
package ratelimit

import (
    "sync"
    "time"
)

type FixedWindow struct {
    limit    int
    window   time.Duration
    counters map[string]*counter
    mu       sync.Mutex
}

type counter struct {
    count       int
    windowStart time.Time
}

func NewFixedWindow(limit int, window time.Duration) *FixedWindow {
    return &FixedWindow{
        limit:    limit,
        window:   window,
        counters: make(map[string]*counter),
    }
}

func (fw *FixedWindow) Allow(key string) Result {
    fw.mu.Lock()
    defer fw.mu.Unlock()

    now := time.Now()
    windowStart := now.Truncate(fw.window)

    c, exists := fw.counters[key]
    if !exists || c.windowStart != windowStart {
        c = &counter{count: 0, windowStart: windowStart}
        fw.counters[key] = c
    }

    resetAt := windowStart.Add(fw.window)

    if c.count >= fw.limit {
        return Result{
            Allowed:   false,
            Remaining: 0,
            ResetAt:   resetAt,
            RetryAt:   resetAt,
        }
    }

    c.count++
    return Result{
        Allowed:   true,
        Remaining: fw.limit - c.count,
        ResetAt:   resetAt,
    }
}

Problem: At the boundary between two windows, a client can send limit * 2 requests (e.g., 100 at 12:00:59 and 100 at 12:01:00). The sliding window fixes this.

2. Sliding Window Counter

Approximates a sliding window by weighting the previous window's count based on how far into the current window we are.

// sliding_window.go
package ratelimit

import (
    "sync"
    "time"
)

type SlidingWindow struct {
    limit    int
    window   time.Duration
    counters map[string]*windowPair
    mu       sync.Mutex
}

type windowPair struct {
    prevCount    int
    prevStart    time.Time
    currentCount int
    currentStart time.Time
}

func NewSlidingWindow(limit int, window time.Duration) *SlidingWindow {
    return &SlidingWindow{
        limit:    limit,
        window:   window,
        counters: make(map[string]*windowPair),
    }
}

func (sw *SlidingWindow) Allow(key string) Result {
    sw.mu.Lock()
    defer sw.mu.Unlock()

    now := time.Now()
    currentWindowStart := now.Truncate(sw.window)

    wp, exists := sw.counters[key]
    if !exists {
        wp = &windowPair{currentStart: currentWindowStart}
        sw.counters[key] = wp
    }

    // Rotate windows if needed
    if currentWindowStart != wp.currentStart {
        if currentWindowStart.Sub(wp.currentStart) == sw.window {
            wp.prevCount = wp.currentCount
            wp.prevStart = wp.currentStart
        } else {
            wp.prevCount = 0
        }
        wp.currentCount = 0
        wp.currentStart = currentWindowStart
    }

    // Weight: how far through the current window are we?
    elapsed := now.Sub(currentWindowStart)
    weight := float64(sw.window-elapsed) / float64(sw.window)
    estimated := int(float64(wp.prevCount)*weight) + wp.currentCount

    resetAt := currentWindowStart.Add(sw.window)

    if estimated >= sw.limit {
        return Result{
            Allowed:   false,
            Remaining: 0,
            ResetAt:   resetAt,
            RetryAt:   resetAt, // conservative: the weighted estimate only decays as the window slides
        }
    }

    wp.currentCount++
    return Result{
        Allowed:   true,
        Remaining: sw.limit - estimated - 1,
        ResetAt:   resetAt,
    }
}

This uses only two counters per key — O(1) memory regardless of request volume.

3. Token Bucket

Tokens accumulate at a fixed rate up to a maximum. Each request consumes one token. Allows bursts up to the bucket size while maintaining a long-term average rate.

// token_bucket.go
package ratelimit

import (
    "sync"
    "time"
)

type TokenBucket struct {
    rate       float64 // tokens per second
    bucketSize int
    buckets    map[string]*bucket
    mu         sync.Mutex
}

type bucket struct {
    tokens   float64
    lastFill time.Time
}

func NewTokenBucket(ratePerSecond float64, bucketSize int) *TokenBucket {
    return &TokenBucket{
        rate:       ratePerSecond,
        bucketSize: bucketSize,
        buckets:    make(map[string]*bucket),
    }
}

func (tb *TokenBucket) Allow(key string) Result {
    tb.mu.Lock()
    defer tb.mu.Unlock()

    now := time.Now()

    b, exists := tb.buckets[key]
    if !exists {
        b = &bucket{tokens: float64(tb.bucketSize), lastFill: now}
        tb.buckets[key] = b
    }

    // Add tokens based on elapsed time
    elapsed := now.Sub(b.lastFill).Seconds()
    b.tokens += elapsed * tb.rate
    if b.tokens > float64(tb.bucketSize) {
        b.tokens = float64(tb.bucketSize)
    }
    b.lastFill = now

    if b.tokens < 1 {
        // Calculate when next token arrives
        waitSeconds := (1 - b.tokens) / tb.rate
        return Result{
            Allowed:   false,
            Remaining: 0,
            RetryAt:   now.Add(time.Duration(waitSeconds * float64(time.Second))),
        }
    }

    b.tokens--
    return Result{
        Allowed:   true,
        Remaining: int(b.tokens),
    }
}

Token bucket is what most cloud APIs use (AWS, Stripe, GitHub). It naturally handles bursty traffic.

4. HTTP Middleware

// middleware.go
package ratelimit

import (
    "encoding/json"
    "fmt"
    "net"
    "net/http"
    "strconv"
    "strings"
    "time"
)

func Middleware(limiter Limiter, keyFunc func(r *http.Request) string) func(http.Handler) http.Handler {
    return func(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            key := keyFunc(r)
            result := limiter.Allow(key)

            // Always set rate limit headers. X-RateLimit-* is a de facto
            // convention (a standardized RateLimit header is still an IETF draft).
            w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(result.Remaining))
            if !result.ResetAt.IsZero() {
                w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(result.ResetAt.Unix(), 10))
            }

            if !result.Allowed {
                w.Header().Set("Retry-After", fmt.Sprintf("%.0f", time.Until(result.RetryAt).Seconds()))
                w.Header().Set("Content-Type", "application/json")
                w.WriteHeader(http.StatusTooManyRequests)
                json.NewEncoder(w).Encode(map[string]any{
                    "error":       "rate limit exceeded",
                    "retry_after": time.Until(result.RetryAt).Seconds(),
                })
                return
            }

            next.ServeHTTP(w, r)
        })
    }
}

// Common key functions
func ByIP(r *http.Request) string {
    // Only trust X-Forwarded-For when behind a reverse proxy you control;
    // clients can spoof it otherwise. Use the first (client) entry.
    if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
        return "ip:" + strings.TrimSpace(strings.Split(xff, ",")[0])
    }
    // RemoteAddr is "host:port"; strip the port so one client maps to one key.
    if host, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
        return "ip:" + host
    }
    return "ip:" + r.RemoteAddr
}

func ByAPIKey(r *http.Request) string {
    key := r.Header.Get("X-API-Key")
    if key == "" {
        key = r.URL.Query().Get("api_key")
    }
    if key == "" {
        return ByIP(r) // Fall back to IP
    }
    return "key:" + key
}

Usage

// main.go
package main

import (
    "fmt"
    "log"
    "net/http"
    "time"

    "yourmod/ratelimit"
)

func main() {
    // 100 requests per minute per IP
    limiter := ratelimit.NewSlidingWindow(100, time.Minute)

    mux := http.NewServeMux()
    mux.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
        fmt.Fprintf(w, `{"status": "ok"}`)
    })

    // Apply rate limiting
    handler := ratelimit.Middleware(limiter, ratelimit.ByIP)(mux)

    log.Println("Server starting on :8080")
    log.Fatal(http.ListenAndServe(":8080", handler))
}

Distributed Rate Limiting with Redis

The in-memory implementations above work for single servers. For distributed systems, use Redis:

// redis_limiter.go
package ratelimit

import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

type RedisLimiter struct {
    client *redis.Client
    limit  int
    window time.Duration
}

func NewRedisLimiter(client *redis.Client, limit int, window time.Duration) *RedisLimiter {
    return &RedisLimiter{client: client, limit: limit, window: window}
}

// Lua script ensures the count-and-add is a single atomic step
var slidingWindowScript = redis.NewScript(`
    local key = KEYS[1]
    local limit = tonumber(ARGV[1])
    local window = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])

    -- Remove entries that have slid out of the window
    redis.call('ZREMRANGEBYSCORE', key, 0, now - window)

    -- Count current entries
    local count = redis.call('ZCARD', key)

    if count < limit then
        -- Add this request. math.random is deterministically seeded in
        -- Redis Lua, so use the current count as the uniqueness suffix:
        -- it differs for every member added within the same millisecond.
        redis.call('ZADD', key, now, now .. '-' .. count)
        redis.call('PEXPIRE', key, window)
        return {1, limit - count - 1}
    end

    return {0, 0}
`)

func (rl *RedisLimiter) Allow(key string) Result {
    ctx := context.Background()
    now := time.Now()

    result, err := slidingWindowScript.Run(ctx, rl.client,
        []string{"ratelimit:" + key},
        rl.limit,
        rl.window.Milliseconds(),
        now.UnixMilli(),
    ).Int64Slice()

    if err != nil {
        // On Redis failure, allow the request (fail open)
        return Result{Allowed: true, Remaining: rl.limit}
    }

    return Result{
        Allowed:   result[0] == 1,
        Remaining: int(result[1]),
        RetryAt:   now.Add(rl.window), // worst case; the oldest entry may expire sooner
    }
}

Using a Lua script makes the check-and-increment atomic — no race conditions even under high concurrency.

Choosing the Right Algorithm

  • Fixed Window: Simplest. Use when exact precision at boundaries doesn't matter (internal APIs, development).
  • Sliding Window Counter: Best default. Low memory, smooth limiting, good enough for most production APIs.
  • Token Bucket: When you want to allow bursts. Good for user-facing APIs where occasional spikes are expected.
  • Sliding Window Log (Redis sorted set): When you need exact counting. Higher memory usage but zero approximation error.

Memory Cleanup

For in-memory limiters, run periodic cleanup:

func (fw *FixedWindow) Cleanup() {
    fw.mu.Lock()
    defer fw.mu.Unlock()

    cutoff := time.Now().Add(-fw.window * 2)
    for key, c := range fw.counters {
        if c.windowStart.Before(cutoff) {
            delete(fw.counters, key)
        }
    }
}

// Run every minute
go func() {
    ticker := time.NewTicker(time.Minute)
    for range ticker.C {
        limiter.Cleanup()
    }
}()

Conclusion

Rate limiting isn't complex, but picking the wrong algorithm causes either false rejections (lost revenue) or ineffective protection (outages). Start with sliding window counters for single servers, move to Redis when you scale horizontally. Always set Retry-After headers — good clients will back off automatically.


If this was helpful, you can support my work at ko-fi.com/nopkt


