Rate limiting is essential for protecting APIs from abuse, ensuring fair resource allocation, and maintaining system stability. While there are existing solutions, I wanted to build something lightweight, performant, and easy to integrate into any Go project.
Today, I'm sharing kazrl - a zero-dependency rate limiter library that implements three different algorithms and comes with ready-to-use middleware for popular Go web frameworks.
## The Problem
Most rate limiting libraries either:
- Come with heavy dependencies
- Support only one algorithm
- Require complex setup for per-client limiting
- Lack middleware integration
I needed something that:
- Has zero external dependencies
- Supports multiple algorithms (Token Bucket, Leaky Bucket, Sliding Window)
- Works with popular frameworks out of the box
- Provides flexible per-client rate limiting
## Installation

```bash
go get github.com/Makennsky/kazrl
```

That's it! No transitive dependencies to worry about.
## Quick Start

Here's the simplest way to add rate limiting to your HTTP handler:
```go
package main

import (
    "log"
    "net/http"

    "github.com/Makennsky/kazrl"
    "github.com/Makennsky/kazrl/middleware"
)

func main() {
    // Create a rate limiter: 100 requests per second, burst of 200
    limiter := kazrl.NewTokenBucket(100, 200)

    // Apply middleware
    rateLimitMiddleware := middleware.HTTP(limiter)

    handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("Hello, World!"))
    })

    http.Handle("/api/", rateLimitMiddleware(handler))
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```
That's a complete, rate-limited server in about twenty lines of code!
## Three Algorithms, One Interface
Different use cases need different strategies. kazrl implements three battle-tested algorithms:
### 1. Token Bucket

Perfect for APIs that need to allow bursts while maintaining average rate limits.

```go
// 10 requests per second, allows bursts up to 20
limiter := kazrl.NewTokenBucket(10, 20)
```

**Use case:** public APIs, user-facing endpoints
### 2. Leaky Bucket

Smooths out traffic spikes by processing requests at a constant rate.

```go
// Processes 10 req/s, queues up to 20
limiter := kazrl.NewLeakyBucket(10, 20)
```

**Use case:** protecting downstream services, database queries
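Because the shared interface (next section) includes a blocking Wait, a leaky bucket pairs naturally with pacing loops. Here's a minimal runnable sketch of pacing a downstream call; `queryDatabase` is my own stand-in, not part of kazrl:

```go
package main

import (
    "context"
    "log"
    "time"

    "github.com/Makennsky/kazrl"
)

func main() {
    // Drains 10 req/s, queues up to 20 (same constructor as above).
    limiter := kazrl.NewLeakyBucket(10, 20)

    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()

    for i := 0; i < 30; i++ {
        // Wait blocks until the bucket admits the next call, so the
        // downstream never sees more than 10 req/s.
        if err := limiter.Wait(ctx); err != nil {
            log.Fatal(err) // context cancelled or deadline exceeded
        }
        queryDatabase(i)
    }
}

// queryDatabase stands in for your own downstream call.
func queryDatabase(i int) { log.Printf("query %d", i) }
```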
### 3. Sliding Window

Provides the most accurate rate limiting without fixed-window edge cases.

```go
// 10 req/s with a sliding time window
limiter := kazrl.NewSlidingWindow(10, 20)
```

**Use case:** strict rate enforcement, billing APIs
All three implement the same interface, so switching is trivial:
```go
type RateLimiter interface {
    Allow() bool
    AllowN(n int) bool
    Wait(ctx context.Context) error
    WaitN(ctx context.Context, n int) error
    Reserve() time.Duration
    ReserveN(n int) time.Duration
}
```
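Anything you write against this interface works unchanged with all three algorithms. As a quick illustration, here's a toy admission wrapper of my own (kazrl ships real middleware, covered below); switching algorithms means touching only the constructor call at the call site:

```go
package guard

import (
    "net/http"

    "github.com/Makennsky/kazrl"
)

// Guard rejects requests once the limiter says no. Because it sees
// only the RateLimiter interface, any of the three algorithms fits.
func Guard(limiter kazrl.RateLimiter, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        if !limiter.Allow() {
            http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
            return
        }
        next.ServeHTTP(w, r)
    })
}
```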
## Per-Client Rate Limiting Made Easy
The real power comes with per-client limiting. Here's how to rate limit by IP address:
```go
rateLimitMiddleware := middleware.HTTPWithKeyFunc(
    func() kazrl.RateLimiter {
        return kazrl.NewTokenBucket(10, 20) // 10 req/s per IP
    },
    middleware.KeyByIP, // Built-in IP extractor
)

http.Handle("/api/", rateLimitMiddleware(handler))
```
Each IP address automatically gets its own rate limiter instance. The library handles X-Forwarded-For and X-Real-IP headers correctly.
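If you're curious what that extraction typically involves, the sketch below shows the usual precedence order. It's an illustration of the technique, not kazrl's actual source:

```go
package keys

import (
    "net"
    "net/http"
    "strings"
)

// ClientIP shows the common precedence: first X-Forwarded-For entry,
// then X-Real-IP, then the raw socket address.
func ClientIP(r *http.Request) string {
    if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
        // The header can carry a comma-separated proxy chain;
        // the first entry is the original client.
        first, _, _ := strings.Cut(xff, ",")
        return strings.TrimSpace(first)
    }
    if xrip := r.Header.Get("X-Real-IP"); xrip != "" {
        return xrip
    }
    host, _, err := net.SplitHostPort(r.RemoteAddr)
    if err != nil {
        return r.RemoteAddr // no port present
    }
    return host
}
```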
### Rate Limit by API Key

```go
rateLimitMiddleware := middleware.HTTPWithKeyFunc(
    func() kazrl.RateLimiter {
        return kazrl.NewTokenBucket(100, 200)
    },
    middleware.KeyByAPIKey, // Extracts from the Authorization header
)
```
### Custom Key Functions

Need something more complex? Write your own key extractor:

```go
customKeyFunc := func(r *http.Request) string {
    // Rate limit by IP + endpoint combination
    return middleware.KeyByIP(r) + ":" + r.URL.Path
}

rateLimitMiddleware := middleware.HTTPWithKeyFunc(
    func() kazrl.RateLimiter {
        return kazrl.NewTokenBucket(5, 10)
    },
    customKeyFunc,
)
```
## Framework Integration

kazrl provides native middleware for popular frameworks:

### Gin

```go
r := gin.Default()
limiter := kazrl.NewTokenBucket(100, 200)
r.Use(middleware.Gin(limiter))
```

### Echo

```go
e := echo.New()
limiter := kazrl.NewTokenBucket(100, 200)
e.Use(middleware.Echo(limiter))
```

### Fiber

```go
app := fiber.New()
limiter := kazrl.NewTokenBucket(100, 200)
app.Use(middleware.Fiber(limiter))
```

### Chi

```go
r := chi.NewRouter()
limiter := kazrl.NewTokenBucket(100, 200)
r.Use(middleware.Chi(limiter))
```
## Multi-Layer Rate Limiting

For advanced scenarios, you can stack multiple rate limiters:

```go
// Global limit: 1000 req/s for all clients
globalLimiter := kazrl.NewTokenBucket(1000, 2000)
globalMiddleware := middleware.HTTP(globalLimiter)

// Per-IP limit: 10 req/s per client
perIPMiddleware := middleware.HTTPWithKeyFunc(
    func() kazrl.RateLimiter {
        return kazrl.NewTokenBucket(10, 20)
    },
    middleware.KeyByIP,
)

// Stack them!
handler := globalMiddleware(perIPMiddleware(yourHandler))
```
This protects against both individual abuse and total system overload.
## Performance

Benchmarks on a modern system (Intel i7-1355U):

```
BenchmarkTokenBucketAllow-12      4,574,689 ops    255.6 ns/op    0 allocs/op
BenchmarkLeakyBucketAllow-12      5,218,902 ops    208.3 ns/op    0 allocs/op
BenchmarkSlidingWindowAllow-12    6,476,462 ops    198.6 ns/op    0 allocs/op
```
200-260 nanoseconds per operation with zero allocations. That's fast enough for the most demanding applications.
## Production-Ready Features

### Context Support

All blocking operations support context cancellation:

```go
ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
defer cancel()

if err := limiter.Wait(ctx); err != nil {
    // Handle timeout or cancellation
}
```
### Reservation API

For advanced use cases, you can reserve tokens and schedule work:

```go
waitDuration := limiter.Reserve()
if waitDuration > 0 {
    // Schedule for later
    time.AfterFunc(waitDuration, processRequest)
} else {
    // Process immediately
    processRequest()
}
```
### Thread-Safe
All operations are thread-safe. You can safely use the same limiter instance across multiple goroutines.
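For example, a single limiter can be shared by a pool of workers with no extra locking. A minimal runnable sketch:

```go
package main

import (
    "fmt"
    "sync"
    "sync/atomic"

    "github.com/Makennsky/kazrl"
)

func main() {
    // One limiter shared by every worker; no extra locking needed.
    limiter := kazrl.NewTokenBucket(100, 200)

    var allowed atomic.Int64
    var wg sync.WaitGroup
    for w := 0; w < 8; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := 0; i < 1000; i++ {
                // Allow is safe to call concurrently.
                if limiter.Allow() {
                    allowed.Add(1)
                }
            }
        }()
    }
    wg.Wait()
    fmt.Println("admitted:", allowed.Load()) // roughly the burst size
}
```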
## Algorithm Comparison
| Algorithm | Burst Support | Smoothing | Memory | Best For |
|---|---|---|---|---|
| Token Bucket | Yes | No | Low | Public APIs, burst tolerance |
| Leaky Bucket | Queued | Yes | Medium | Downstream protection |
| Sliding Window | No | No | High | Strict enforcement |
## Implementation Insights

### Why Zero Dependencies?
Dependencies are a security and maintenance burden. By keeping kazrl dependency-free:
- No supply chain attacks via transitive dependencies
- Faster installation and smaller binaries
- No version conflicts with your other dependencies
- Easy to audit (< 2000 lines of code)
### Concurrency Design

Each algorithm uses sync.Mutex for thread safety. While this might seem simple, it's actually the right choice here:

```go
type tokenBucket struct {
    mu         sync.Mutex
    rate       float64
    burst      int
    tokens     float64
    lastUpdate time.Time
}
```
Lock contention is minimal because:
- Operations are extremely fast (< 300ns)
- The critical section is tiny (just token math)
- Per-client limiting distributes the load
For most applications, you'll never see contention. If you're handling millions of requests per second per endpoint, you might need a distributed solution anyway.
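If you want to check contention on your own hardware, a parallel benchmark along these lines does the job. This is my own sketch, not part of the repo's test suite, and it assumes the constructor tolerates very large rate and burst values:

```go
package kazrl_test

import (
    "testing"

    "github.com/Makennsky/kazrl"
)

// BenchmarkAllowParallel hammers one shared limiter from all
// available Ps, so any mutex contention shows up in ns/op.
func BenchmarkAllowParallel(b *testing.B) {
    limiter := kazrl.NewTokenBucket(1e9, 1e9) // effectively unlimited
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            limiter.Allow()
        }
    })
}
```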
### Memory Management

The library is designed to minimize allocations:

```go
// No allocations in the hot path
func (tb *tokenBucket) Allow() bool {
    tb.mu.Lock()
    defer tb.mu.Unlock()

    now := time.Now()
    tb.refillTokens(now) // Pure math, no allocations
    if tb.tokens >= 1.0 {
        tb.tokens -= 1.0
        return true
    }
    return false
}
```
The only allocations happen when creating new per-client limiters, which is infrequent.
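To make that concrete, here's roughly what a lazily populated per-key registry looks like. This illustrates the pattern; it's not kazrl's actual internals:

```go
package registry

import (
    "sync"

    "github.com/Makennsky/kazrl"
)

// Registry lazily creates one limiter per key: the only allocation
// happens the first time a key is seen; later lookups are map reads.
type Registry struct {
    mu       sync.Mutex
    limiters map[string]kazrl.RateLimiter
    newFunc  func() kazrl.RateLimiter
}

func New(newFunc func() kazrl.RateLimiter) *Registry {
    return &Registry{
        limiters: make(map[string]kazrl.RateLimiter),
        newFunc:  newFunc,
    }
}

func (reg *Registry) Get(key string) kazrl.RateLimiter {
    reg.mu.Lock()
    defer reg.mu.Unlock()
    l, ok := reg.limiters[key]
    if !ok {
        l = reg.newFunc() // the one allocation per client
        reg.limiters[key] = l
    }
    return l
}
```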
## Real-World Example

Here's a complete example of a production-ready API server:

```go
package main

import (
    "encoding/json"
    "log"
    "net/http"

    "github.com/Makennsky/kazrl"
    "github.com/Makennsky/kazrl/middleware"
)

func main() {
    // Global rate limit: 10,000 req/s
    globalLimiter := kazrl.NewTokenBucket(10000, 20000)
    globalMiddleware := middleware.HTTP(globalLimiter)

    // Per-IP rate limit: 100 req/s
    perIPMiddleware := middleware.HTTPWithKeyFunc(
        func() kazrl.RateLimiter {
            return kazrl.NewTokenBucket(100, 200)
        },
        middleware.KeyByIP,
    )

    // API handler
    apiHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.Header().Set("Content-Type", "application/json")
        response := map[string]string{
            "status":  "ok",
            "message": "Request processed",
        }
        json.NewEncoder(w).Encode(response)
    })

    // Stack middleware
    http.Handle("/api/", globalMiddleware(perIPMiddleware(apiHandler)))
    log.Fatal(http.ListenAndServe(":8080", nil))
}
```
## When to Use Each Algorithm

**Token Bucket** - the default choice
- Public APIs
- User-facing endpoints
- Services that need burst capacity
- Not suitable when strict rate enforcement is needed

**Leaky Bucket** - traffic shaping
- Protecting slow downstream services
- Database query rate limiting
- Smoothing traffic spikes
- Not suitable when you need to allow bursts

**Sliding Window** - strict enforcement
- Billing/metered APIs
- When accuracy is critical
- Preventing clients from gaming fixed windows
- Not suitable when you need burst capacity
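Because all three constructors return the same interface, the choice can even come from configuration. A small sketch, assuming the two-argument constructors shown earlier:

```go
package limits

import "github.com/Makennsky/kazrl"

// NewLimiter maps a config value to an algorithm. Since all three
// constructors return the same interface, callers never change.
func NewLimiter(algo string, rate, burst int) kazrl.RateLimiter {
    switch algo {
    case "leaky":
        return kazrl.NewLeakyBucket(rate, burst)
    case "sliding":
        return kazrl.NewSlidingWindow(rate, burst)
    default: // "token" or anything else
        return kazrl.NewTokenBucket(rate, burst)
    }
}
```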
## Future Plans
Ideas I'm considering:
- Distributed rate limiting (Redis backend)
- Prometheus metrics integration
- Response header injection (X-RateLimit-*)
- Dynamic rate adjustment based on system load
- gRPC interceptors
What would you find useful? Let me know in the comments!
## Resources

- GitHub: https://github.com/Makennsky/kazrl
- Documentation: full examples in the README
- Benchmarks: run `go test -bench=. -benchmem`
## Try It Out

Give kazrl a try in your next project! It's production-ready, battle-tested, and takes 2 minutes to integrate.

```bash
go get github.com/Makennsky/kazrl
```
If you find it useful, please star the repo on GitHub!
What rate limiting challenges have you faced? Share your experiences in the comments below!
Built in Kazakhstan