# Building a Production Rate Limiter from Scratch in Go
Every API needs rate limiting. Most developers reach for a library or cloud service, but understanding the algorithms lets you pick the right one and debug it when things go wrong. Here's how to build four different rate limiters in Go — from simple to production-grade.
## The Four Algorithms
| Algorithm | Behavior | Best For |
|---|---|---|
| Fixed Window | Count requests per time window | Simple API limits |
| Sliding Window | Weighted count across windows | Smoother traffic shaping |
| Token Bucket | Tokens replenish at fixed rate | Burst-tolerant APIs |
| Sliding Window Log | Exact per-request tracking | Strict compliance |
## Shared Interface

```go
// limiter.go
package ratelimit

import "time"

type Result struct {
	Allowed   bool
	Remaining int
	ResetAt   time.Time
	RetryAt   time.Time // Zero if allowed
}

type Limiter interface {
	Allow(key string) Result
}
```
## 1. Fixed Window Counter
Simplest approach: count requests in fixed time intervals (e.g., 100 requests per minute). At the boundary of each window, the count resets.
```go
// fixed_window.go
package ratelimit

import (
	"sync"
	"time"
)

type FixedWindow struct {
	limit    int
	window   time.Duration
	counters map[string]*counter
	mu       sync.Mutex
}

type counter struct {
	count       int
	windowStart time.Time
}

func NewFixedWindow(limit int, window time.Duration) *FixedWindow {
	return &FixedWindow{
		limit:    limit,
		window:   window,
		counters: make(map[string]*counter),
	}
}

func (fw *FixedWindow) Allow(key string) Result {
	fw.mu.Lock()
	defer fw.mu.Unlock()
	now := time.Now()
	windowStart := now.Truncate(fw.window)
	c, exists := fw.counters[key]
	if !exists || c.windowStart != windowStart {
		c = &counter{count: 0, windowStart: windowStart}
		fw.counters[key] = c
	}
	resetAt := windowStart.Add(fw.window)
	if c.count >= fw.limit {
		return Result{
			Allowed:   false,
			Remaining: 0,
			ResetAt:   resetAt,
			RetryAt:   resetAt,
		}
	}
	c.count++
	return Result{
		Allowed:   true,
		Remaining: fw.limit - c.count,
		ResetAt:   resetAt,
	}
}
```
Problem: at the boundary between two windows, a client can burst up to twice the limit in a short span (e.g., 100 requests at 12:00:59 and 100 more at 12:01:00, all accepted within one second). The sliding window counter mitigates this.
## 2. Sliding Window Counter
Approximates a sliding window by weighting the previous window's count based on how far into the current window we are.
```go
// sliding_window.go
package ratelimit

import (
	"sync"
	"time"
)

type SlidingWindow struct {
	limit    int
	window   time.Duration
	counters map[string]*windowPair
	mu       sync.Mutex
}

type windowPair struct {
	prevCount    int
	prevStart    time.Time
	currentCount int
	currentStart time.Time
}

func NewSlidingWindow(limit int, window time.Duration) *SlidingWindow {
	return &SlidingWindow{
		limit:    limit,
		window:   window,
		counters: make(map[string]*windowPair),
	}
}

func (sw *SlidingWindow) Allow(key string) Result {
	sw.mu.Lock()
	defer sw.mu.Unlock()
	now := time.Now()
	currentWindowStart := now.Truncate(sw.window)
	wp, exists := sw.counters[key]
	if !exists {
		wp = &windowPair{currentStart: currentWindowStart}
		sw.counters[key] = wp
	}
	// Rotate windows if needed
	if currentWindowStart != wp.currentStart {
		if currentWindowStart.Sub(wp.currentStart) == sw.window {
			// Adjacent window: the old current becomes the new previous
			wp.prevCount = wp.currentCount
			wp.prevStart = wp.currentStart
		} else {
			// Gap of more than one window: nothing recent to carry over
			wp.prevCount = 0
		}
		wp.currentCount = 0
		wp.currentStart = currentWindowStart
	}
	// Weight: the fraction of the previous window still inside the sliding window
	elapsed := now.Sub(currentWindowStart)
	weight := float64(sw.window-elapsed) / float64(sw.window)
	estimated := int(float64(wp.prevCount)*weight) + wp.currentCount
	resetAt := currentWindowStart.Add(sw.window)
	if estimated >= sw.limit {
		return Result{
			Allowed:   false,
			Remaining: 0,
			ResetAt:   resetAt,
			// Conservative: the weighted estimate has fully decayed by then
			RetryAt: resetAt,
		}
	}
	wp.currentCount++
	return Result{
		Allowed:   true,
		Remaining: sw.limit - estimated - 1,
		ResetAt:   resetAt,
	}
}
```
This uses only two counters per key — O(1) memory per key regardless of request volume. The estimate assumes requests were spread evenly across the previous window, which is accurate enough for most traffic.
## 3. Token Bucket
Tokens accumulate at a fixed rate up to a maximum. Each request consumes one token. Allows bursts up to the bucket size while maintaining a long-term average rate.
```go
// token_bucket.go
package ratelimit

import (
	"sync"
	"time"
)

type TokenBucket struct {
	rate       float64 // tokens per second
	bucketSize int
	buckets    map[string]*bucket
	mu         sync.Mutex
}

type bucket struct {
	tokens   float64
	lastFill time.Time
}

func NewTokenBucket(ratePerSecond float64, bucketSize int) *TokenBucket {
	return &TokenBucket{
		rate:       ratePerSecond,
		bucketSize: bucketSize,
		buckets:    make(map[string]*bucket),
	}
}

func (tb *TokenBucket) Allow(key string) Result {
	tb.mu.Lock()
	defer tb.mu.Unlock()
	now := time.Now()
	b, exists := tb.buckets[key]
	if !exists {
		b = &bucket{tokens: float64(tb.bucketSize), lastFill: now}
		tb.buckets[key] = b
	}
	// Add tokens based on elapsed time
	elapsed := now.Sub(b.lastFill).Seconds()
	b.tokens += elapsed * tb.rate
	if b.tokens > float64(tb.bucketSize) {
		b.tokens = float64(tb.bucketSize)
	}
	b.lastFill = now
	if b.tokens < 1 {
		// Calculate when the next token arrives
		waitSeconds := (1 - b.tokens) / tb.rate
		return Result{
			Allowed:   false,
			Remaining: 0,
			RetryAt:   now.Add(time.Duration(waitSeconds * float64(time.Second))),
		}
	}
	b.tokens--
	return Result{
		Allowed:   true,
		Remaining: int(b.tokens),
	}
}
```
Token bucket is the algorithm behind many cloud APIs (AWS throttling and Stripe's rate limiter are documented examples). It naturally handles bursty traffic.
## 4. HTTP Middleware
```go
// middleware.go
package ratelimit

import (
	"encoding/json"
	"fmt"
	"net"
	"net/http"
	"strconv"
	"strings"
	"time"
)

func Middleware(limiter Limiter, keyFunc func(r *http.Request) string) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			key := keyFunc(r)
			result := limiter.Allow(key)
			// Always set rate limit headers. X-RateLimit-* is a de facto
			// convention; the IETF "RateLimit header fields for HTTP" draft
			// is standardizing unprefixed equivalents.
			w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(result.Remaining))
			if !result.ResetAt.IsZero() {
				w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(result.ResetAt.Unix(), 10))
			}
			if !result.Allowed {
				w.Header().Set("Retry-After", fmt.Sprintf("%.0f", time.Until(result.RetryAt).Seconds()))
				w.Header().Set("Content-Type", "application/json")
				w.WriteHeader(http.StatusTooManyRequests)
				json.NewEncoder(w).Encode(map[string]any{
					"error":       "rate limit exceeded",
					"retry_after": time.Until(result.RetryAt).Seconds(),
				})
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}

// Common key functions

func ByIP(r *http.Request) string {
	// Check X-Forwarded-For first (only trustworthy behind a reverse proxy
	// you control). It is a comma-separated list; the first entry is the client.
	if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
		if i := strings.IndexByte(xff, ','); i >= 0 {
			xff = xff[:i]
		}
		return "ip:" + strings.TrimSpace(xff)
	}
	// RemoteAddr is "host:port"; strip the port so one client maps to one key
	if host, _, err := net.SplitHostPort(r.RemoteAddr); err == nil {
		return "ip:" + host
	}
	return "ip:" + r.RemoteAddr
}

func ByAPIKey(r *http.Request) string {
	key := r.Header.Get("X-API-Key")
	if key == "" {
		key = r.URL.Query().Get("api_key")
	}
	if key == "" {
		return ByIP(r) // Fall back to IP
	}
	return "key:" + key
}
```
## Usage
```go
// main.go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"

	"yourmod/ratelimit"
)

func main() {
	// 100 requests per minute per IP
	limiter := ratelimit.NewSlidingWindow(100, time.Minute)
	mux := http.NewServeMux()
	mux.HandleFunc("/api/data", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, `{"status": "ok"}`)
	})
	// Apply rate limiting
	handler := ratelimit.Middleware(limiter, ratelimit.ByIP)(mux)
	log.Println("Server starting on :8080")
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```
## Distributed Rate Limiting with Redis
The in-memory implementations above work for single servers. For distributed systems, use Redis:
```go
// redis_limiter.go
package ratelimit

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

type RedisLimiter struct {
	client *redis.Client
	limit  int
	window time.Duration
}

func NewRedisLimiter(client *redis.Client, limit int, window time.Duration) *RedisLimiter {
	return &RedisLimiter{client: client, limit: limit, window: window}
}

// Lua script ensures the count check and insert happen atomically
var slidingWindowScript = redis.NewScript(`
local key = KEYS[1]
local limit = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
-- Remove entries older than the window
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
-- Count current entries
local count = redis.call('ZCARD', key)
if count < limit then
  -- Record this request
  redis.call('ZADD', key, now, now .. '-' .. math.random(1000000))
  redis.call('PEXPIRE', key, window)
  return {1, limit - count - 1}
end
return {0, 0}
`)

func (rl *RedisLimiter) Allow(key string) Result {
	ctx := context.Background()
	now := time.Now()
	res, err := slidingWindowScript.Run(ctx, rl.client,
		[]string{"ratelimit:" + key},
		rl.limit,
		rl.window.Milliseconds(),
		now.UnixMilli(),
	).Int64Slice()
	if err != nil {
		// On Redis failure, allow the request (fail open)
		return Result{Allowed: true, Remaining: rl.limit}
	}
	result := Result{
		Allowed:   res[0] == 1,
		Remaining: int(res[1]),
		ResetAt:   now.Add(rl.window),
	}
	if !result.Allowed {
		// Worst case: the oldest tracked request ages out a full window from now
		result.RetryAt = now.Add(rl.window)
	}
	return result
}
```
Using a Lua script makes the check-and-increment atomic — no race conditions even under high concurrency.
## Choosing the Right Algorithm
- Fixed Window: Simplest. Use when exact precision at boundaries doesn't matter (internal APIs, development).
- Sliding Window Counter: Best default. Low memory, smooth limiting, good enough for most production APIs.
- Token Bucket: When you want to allow bursts. Good for user-facing APIs where occasional spikes are expected.
- Sliding Window Log (Redis sorted set): When you need exact counting. Higher memory usage but zero approximation error.
## Memory Cleanup
For in-memory limiters, run a periodic sweep to evict stale keys, or the maps grow without bound (shown here for FixedWindow; the other limiters are analogous):
```go
func (fw *FixedWindow) Cleanup() {
	fw.mu.Lock()
	defer fw.mu.Unlock()
	cutoff := time.Now().Add(-fw.window * 2)
	for key, c := range fw.counters {
		if c.windowStart.Before(cutoff) {
			delete(fw.counters, key)
		}
	}
}
```

```go
// In main: sweep stale keys every minute
go func() {
	ticker := time.NewTicker(time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		limiter.Cleanup()
	}
}()
```
## Conclusion
Rate limiting isn't complex, but picking the wrong algorithm causes either false rejections (lost revenue) or ineffective protection (outages). Start with sliding window counters for single servers, move to Redis when you scale horizontally. Always set Retry-After headers — good clients will back off automatically.
If this was helpful, you can support my work at ko-fi.com/nopkt ☕