What happens if one user sends 10,000 requests per second to your API?
Your system crashes.
Unless you have a rate limiter.
What is Rate Limiting?
Rate limiting controls how many requests a user can make in a given time.
Example:
100 requests per minute per user
If the limit is exceeded:
- excess requests are rejected (typically with an HTTP 429 response)
- or delayed until capacity frees up
Why It Matters
Without rate limiting:
- APIs get overloaded
- systems crash under traffic spikes
- abuse (spam, brute-force attacks) increases
Common Approaches
1. Fixed Window
Limit requests in a fixed time window.
Example:
100 requests per minute
Problem:
Burst traffic at window edges: a client can send the full limit at the end of one window and the full limit again at the start of the next, doubling the effective rate for a brief period.
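A minimal in-memory sketch of a fixed window counter (class and parameter names are illustrative, not from the post):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Counts requests per user in fixed, non-overlapping time windows."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (user, window_id) -> count

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)  # which window this request falls in
        key = (user, window_id)
        if self.counters[key] >= self.limit:
            return False  # window quota exhausted
        self.counters[key] += 1
        return True

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow("alice", now=t) for t in (0, 1, 2, 3)])
# [True, True, True, False] -- 4th request in the same window is rejected
print(limiter.allow("alice", now=61))
# True -- a new window starts, which is also why edge bursts slip through
```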
2. Sliding Window
Tracks requests over a rolling time window.
More accurate than the fixed window
Slightly more complex to implement
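One common variant is the sliding window log, which stores the timestamp of each request. A small sketch (names are illustrative):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Keeps a per-user log of request timestamps over a rolling window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.log = defaultdict(deque)  # user -> timestamps of recent requests

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        q = self.log[user]
        # evict timestamps that have fallen out of the rolling window
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True

limiter = SlidingWindowLimiter(limit=2, window_seconds=10)
print(limiter.allow("bob", now=0))   # True
print(limiter.allow("bob", now=1))   # True
print(limiter.allow("bob", now=5))   # False: 2 requests in the last 10s
print(limiter.allow("bob", now=11))  # True: both earlier requests expired
```

The extra complexity is the memory cost: one timestamp per recent request, instead of one counter per window.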
3. Token Bucket
Tokens are added at a fixed rate.
Each request consumes a token.
If no tokens:
request is rejected
Good for allowing short bursts while capping the average rate
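The refill-and-spend logic above can be sketched in a few lines (a lazy refill computed on each request; names and parameters are illustrative):

```python
class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; each request costs one."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst is allowed
        self.last = now

    def allow(self, now):
        # refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2, now=0.0)
print(bucket.allow(now=0.0))  # True: burst of up to `capacity` is allowed
print(bucket.allow(now=0.0))  # True
print(bucket.allow(now=0.0))  # False: bucket is empty
print(bucket.allow(now=1.5))  # True: 1.5 tokens refilled in the meantime
```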
4. Leaky Bucket
Requests are processed at a constant rate.
Extra requests are queued or dropped.
Good for smoothing traffic
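A sketch of the leaky bucket as a counter: instead of holding a real queue, it tracks the current queue length, which drains at a constant rate (an assumption of this sketch; a real implementation might queue the requests themselves):

```python
class LeakyBucket:
    """Tracks queued requests as a level that drains at a constant rate."""

    def __init__(self, drain_rate, capacity):
        self.drain_rate = drain_rate  # requests processed per second
        self.capacity = capacity      # max requests waiting in the bucket
        self.level = 0.0
        self.last = 0.0

    def allow(self, now):
        # leak: the bucket drains steadily between arrivals
        self.level = max(0.0, self.level - (now - self.last) * self.drain_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # bucket full: drop the request
        self.level += 1
        return True

bucket = LeakyBucket(drain_rate=1.0, capacity=2)
print(bucket.allow(0))  # True
print(bucket.allow(0))  # True
print(bucket.allow(0))  # False: bucket full, request dropped
print(bucket.allow(1))  # True: one request drained in the last second
```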
How It’s Implemented
Typical setup:
- API Gateway or middleware
- Redis for storing counters
- Key = user/IP
- Value = request count
Redis is used because:
- fast
- supports atomic operations (INCR, EXPIRE)
- works well at scale
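The Redis pattern above is usually INCR on a per-user key plus EXPIRE to start the window. A sketch, using a minimal in-memory stand-in for the two Redis commands so it runs without a server (the `FakeRedis` class and `rate:{user}` key format are illustrative; with redis-py, `redis.Redis()` exposes the same `incr`/`expire` calls):

```python
class FakeRedis:
    """In-memory stand-in for the two Redis commands this sketch needs."""

    def __init__(self):
        self.data = {}     # key -> count
        self.expiry = {}   # key -> expiry timestamp
        self.clock = 0.0   # manually advanced, in place of real time

    def _purge(self, key):
        if key in self.expiry and self.clock >= self.expiry[key]:
            self.data.pop(key, None)
            self.expiry.pop(key, None)

    def incr(self, key):
        self._purge(key)
        self.data[key] = self.data.get(key, 0) + 1
        return self.data[key]

    def expire(self, key, seconds):
        self.expiry[key] = self.clock + seconds

def allow(r, user, limit, window):
    key = f"rate:{user}"
    count = r.incr(key)       # atomic in real Redis
    if count == 1:
        r.expire(key, window) # first request starts the window
    return count <= limit

r = FakeRedis()
print(allow(r, "carol", limit=2, window=60))  # True
print(allow(r, "carol", limit=2, window=60))  # True
print(allow(r, "carol", limit=2, window=60))  # False: over the limit
r.clock = 61  # a minute later the key has expired
print(allow(r, "carol", limit=2, window=60))  # True again
```

In production the INCR/EXPIRE pair is typically wrapped in a pipeline or a Lua script, so a crash between the two calls cannot leave a counter with no TTL.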
Trade-offs
- accuracy vs performance
- memory usage
- handling bursts
No single approach is perfect.
Where It’s Used
- login attempts
- public APIs
- payment systems
- search endpoints
Rate limiting is not just about blocking users.
It’s about protecting your system from overload.
A small feature that prevents big failures.
