🚦 Rate Limiting / Throttling — Complete Explanation
In modern backend systems, thousands or even millions of requests hit your APIs every day. Not every request is friendly — some might be spam, brute-force attempts, or even DDoS attacks. To ensure your server stays stable, secure, and fair for all users, backend engineers implement Rate Limiting, also known as API Throttling.
This mechanism determines how many requests a client is allowed to make within a specific time window, preventing abuse and ensuring system reliability.
✅ 1. What is Rate Limiting?
Rate limiting (a.k.a. throttling) is a mechanism to control how many requests a client can make to a server within a specific time period.
Example:
"Max 100 requests per minute per user."
If a client exceeds the limit → you block or delay the request.
✅ Why do we need it?
Rate limiting is mainly used to:
🔒 1. Protect APIs from abuse
- Bots sending too many requests
- Brute-force login attempts
- Spammers trying to overload your API
🛡️ 2. Prevent DDoS attacks
Even if attackers hit your server hard, rate limiting ensures damage is reduced.
⚖️ 3. Fair usage
Multiple users get a fair share of server resources.
💸 4. Reduce server cost
Less unnecessary load → cheaper infrastructure.
🌍 Real-World Examples
✅ GitHub API
- Allows 5000 requests/hour per token
- Ensures fair usage for all developers.
✅ Cloudflare
- Blocks excessive calls from bad IPs automatically.
✅ Instagram / Facebook APIs
- Strict rate limits to prevent bots/scrapers.
🧱 2. Types of Rate Limiting
There are many styles of implementing rate limits:
1) Fixed Window Counter (Simple & popular)
Rule: Allow X requests per fixed time window.
Example:
100 requests per minute
Working:
- Counter resets every minute.
⚠️ Weakness:
A user can send 100 requests at the very end of one minute and 100 more at the start of the next → a burst of 200 requests slips through.
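To make the idea concrete, here is a minimal in-memory sketch of a fixed window counter (not the internals of any particular library; the function and key names are illustrative):

```javascript
// Fixed window counter: allow `limit` requests per `windowMs` per key.
// Hypothetical sketch — real libraries add TTL cleanup and shared storage.
function createFixedWindowLimiter(limit, windowMs) {
  const counters = new Map(); // key -> { windowStart, count }
  return function isAllowed(key, now = Date.now()) {
    const entry = counters.get(key);
    if (!entry || now - entry.windowStart >= windowMs) {
      counters.set(key, { windowStart: now, count: 1 }); // new window starts
      return true;
    }
    entry.count += 1;
    return entry.count <= limit;
  };
}

// 3 requests per second: the 4th in the same window is rejected.
const allow = createFixedWindowLimiter(3, 1000);
console.log(allow('user-1', 0));    // true
console.log(allow('user-1', 10));   // true
console.log(allow('user-1', 20));   // true
console.log(allow('user-1', 30));   // false (limit hit)
console.log(allow('user-1', 1000)); // true (counter reset)
```

Note that the counter resets abruptly at the window boundary, which is exactly what enables the burst weakness described above.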
2) Sliding Window (Smarter)
Keeps track of the rolling last N seconds.
Example:
Last 60 seconds → only 100 requests allowed.
✅ Prevents the burst problem
✅ More accurate and fair
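A sliding window log can be sketched by keeping the timestamps of recent requests per key and counting only those inside the rolling window (again a simplified illustration, not a production implementation):

```javascript
// Sliding window log: count only requests within the last `windowMs`.
// Hypothetical sketch — stores one timestamp per request, per key.
function createSlidingWindowLimiter(limit, windowMs) {
  const logs = new Map(); // key -> array of request timestamps
  return function isAllowed(key, now = Date.now()) {
    // Drop timestamps that fell out of the rolling window.
    const log = (logs.get(key) || []).filter(t => now - t < windowMs);
    if (log.length >= limit) {
      logs.set(key, log);
      return false;
    }
    log.push(now);
    logs.set(key, log);
    return true;
  };
}

// Limit 2 per rolling second: the boundary burst is now caught.
const allow = createSlidingWindowLimiter(2, 1000);
allow('u', 900);                  // true
allow('u', 950);                  // true
console.log(allow('u', 1100));    // false: 900 and 950 still in window
console.log(allow('u', 2000));    // true: both old requests expired
```

The trade-off is memory: one timestamp per request, which is why many systems approximate this with a "sliding window counter" instead.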
3) Token Bucket (Most used in production)
A bucket is filled at a fixed rate with tokens.
Each request consumes 1 token.
If the bucket is empty → the request is denied or delayed.
✅ Allows small bursts
✅ Smooth flow
✅ Used by AWS, Google Cloud, Nginx
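The token bucket idea fits in a few lines. This is a hypothetical sketch (class and parameter names are illustrative, and the caller passes timestamps explicitly to keep it deterministic):

```javascript
// Token bucket: tokens refill at `refillPerSec`, capped at `capacity`;
// each request spends one token. Starting full allows small bursts.
class TokenBucket {
  constructor(capacity, refillPerSec, now = Date.now()) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastRefill = now;
  }
  tryConsume(now = Date.now()) {
    // Refill proportionally to elapsed time, up to capacity.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity,
      this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Capacity 2, refills 1 token/sec: a burst of 2 passes, the 3rd is denied.
const bucket = new TokenBucket(2, 1, 0);
console.log(bucket.tryConsume(0));    // true
console.log(bucket.tryConsume(0));    // true
console.log(bucket.tryConsume(0));    // false (bucket empty)
console.log(bucket.tryConsume(1000)); // true (1 token refilled)
```

Burst size is tuned by the capacity, steady throughput by the refill rate, which is why this algorithm is so common in production gateways.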
4) Leaky Bucket
Requests enter a bucket and leak at a constant rate.
✅ Very stable output
✅ Perfect for smoothing traffic
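One common variant treats the leaky bucket as a meter: the bucket drains at a fixed rate, and a request is accepted only if it fits without overflowing. A hypothetical sketch under that assumption:

```javascript
// Leaky bucket (meter variant): "water" drains at `leakPerSec`; a request
// adds one unit and is rejected if it would overflow `capacity`.
// Hypothetical sketch — queue-based variants delay instead of rejecting.
class LeakyBucket {
  constructor(capacity, leakPerSec, now = 0) {
    this.capacity = capacity;
    this.leakPerSec = leakPerSec;
    this.level = 0;       // current amount of "water" in the bucket
    this.lastLeak = now;
  }
  tryAdd(now) {
    // Drain proportionally to elapsed time, never below zero.
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSec);
    this.lastLeak = now;
    if (this.level + 1 <= this.capacity) {
      this.level += 1;
      return true;
    }
    return false;
  }
}

// Capacity 2, leaks 1/sec: a steady trickle passes, a burst overflows.
const bucket = new LeakyBucket(2, 1);
console.log(bucket.tryAdd(0));    // true  (level 1)
console.log(bucket.tryAdd(0));    // true  (level 2, full)
console.log(bucket.tryAdd(0));    // false (would overflow)
console.log(bucket.tryAdd(1000)); // true  (1 unit leaked out)
```

Unlike the token bucket, the output rate here never exceeds the leak rate for long, which is what makes it good for smoothing.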
🏷️ 3. What to rate limit ON?
Common ways:
✅ Per IP
Useful for public APIs.
✅ Per API key
Best for developer APIs.
✅ Per user
For authenticated systems.
✅ Per route
For example:
- /login → strict limits
- /products → relaxed limits
✅ Per device
Mobile apps track a device ID.
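These strategies can be combined by deriving the limiter key from the most specific identity available. A hypothetical helper (the `x-api-key` header and `req.user` shape are assumptions, not a fixed convention):

```javascript
// Pick the rate-limit key: API key > authenticated user > client IP.
// Header and field names here are illustrative assumptions.
function rateLimitKey(req) {
  if (req.headers && req.headers['x-api-key']) {
    return 'key:' + req.headers['x-api-key'];  // developer APIs
  }
  if (req.user && req.user.id) {
    return 'user:' + req.user.id;              // authenticated systems
  }
  return 'ip:' + req.ip;                       // anonymous public traffic
}

console.log(rateLimitKey({ headers: {}, ip: '1.2.3.4' })); // "ip:1.2.3.4"
```

A function like this can be plugged into express-rate-limit via its `keyGenerator` option.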
🧪 4. Implementing Rate Limiting in Node.js (Express)
📌 Install the library

```bash
npm install express-rate-limit
```
📌 Basic Middleware

```javascript
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const limiter = rateLimit({
  windowMs: 1 * 60 * 1000, // 1 minute
  max: 100, // limit each IP to 100 requests per window
  message: "Too many requests, please try again later."
});

app.use(limiter);
```

Now all routes are limited to 100 requests/min per IP.
🎯 Per-Route Limit

```javascript
app.post('/login', rateLimit({
  windowMs: 60 * 1000,
  max: 5, // only 5 login attempts per minute
}), loginController);
```
🎯 Custom Handler

```javascript
const limiter = rateLimit({
  windowMs: 60000,
  max: 50,
  handler: (req, res) => {
    res.status(429).json({
      success: false,
      message: "Rate limit exceeded, please slow down."
    });
  }
});
```
🧵 5. Distributed Rate Limiting (Redis)
In-memory counters work only on a single server.
If you run multiple servers behind a load balancer → you need shared storage.
Most common solution:
Use Redis as a central counter.
Popular libraries:
- rate-limiter-flexible
- redis-rate-limiter
Benefits:
✅ Works across multiple nodes
✅ Very fast
✅ Production-grade
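The core idea is that the counter lives in a store with Redis-like atomic `INCR`/`EXPIRE` semantics, so every app server sees the same counts. The sketch below uses a hypothetical in-memory stand-in for Redis purely for illustration; in production you would issue the same two commands through a Redis client:

```javascript
// Stand-in for Redis with INCR/EXPIRE-style methods (illustrative only).
function createMemoryStore() {
  const data = new Map();
  return {
    async incr(key) {
      const value = (data.get(key) || 0) + 1;
      data.set(key, value);
      return value;
    },
    async expire(key, seconds) { /* TTL omitted in this stand-in */ },
  };
}

// Distributed fixed-window check: one shared counter per key per window.
async function isAllowed(store, key, limit, windowSec, now = Date.now()) {
  const windowId = Math.floor(now / 1000 / windowSec);
  const counterKey = `ratelimit:${key}:${windowId}`;
  const count = await store.incr(counterKey); // atomic in real Redis
  if (count === 1) await store.expire(counterKey, windowSec);
  return count <= limit;
}

// Requests hitting different app servers share one combined limit of 2.
(async () => {
  const store = createMemoryStore();
  console.log(await isAllowed(store, 'u1', 2, 60, 0)); // true
  console.log(await isAllowed(store, 'u1', 2, 60, 0)); // true
  console.log(await isAllowed(store, 'u1', 2, 60, 0)); // false
})();
```

Because `INCR` is atomic in Redis, two servers incrementing the same key concurrently can never both see a count under the limit when it has actually been exceeded.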
☁️ 6. How Big Companies Implement It
📌 API Gateways
- AWS API Gateway
- Kong
- NGINX
- Cloudflare Workers
These use algorithms like:
- Token Bucket
- Sliding Window
- Leaky Bucket
📌 When the limit is hit:
- Return 429 Too Many Requests
📊 7. Response Codes for Rate Limiting
429 — Too Many Requests
This is the official code for throttling.
Headers returned (optional):
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 17000000
🧠 8. Best Practices
✅ Use stricter limits on sensitive routes:
- /login
- /password-reset
✅ Allow small bursts (token bucket)
✅ Add Redis when you run multiple servers
✅ Return proper rate-limit headers
✅ Use exponential backoff for client retries
✅ Block abusive IPs automatically
✅ Log and monitor rate-limit hits
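On the client side, exponential backoff means doubling the wait between retries, ideally honoring the server's Retry-After header when present. A hypothetical sketch (function names and defaults are assumptions):

```javascript
// Exponential backoff: wait doubles with each attempt, capped at maxMs.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Prefer the server's Retry-After header (seconds) over the formula.
function retryDelayMs(attempt, retryAfterHeader) {
  const retryAfter = Number(retryAfterHeader);
  if (Number.isFinite(retryAfter) && retryAfter > 0) {
    return retryAfter * 1000;
  }
  return backoffDelayMs(attempt);
}

// Attempts 0..4 wait 1s, 2s, 4s, 8s, 16s; later attempts cap at 30s.
console.log([0, 1, 2, 3, 4, 10].map(n => backoffDelayMs(n)));
console.log(retryDelayMs(0, '30')); // 30000 — server said wait 30s
```

Backing off like this keeps well-behaved clients from hammering an already throttled endpoint.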
📝 Final Summary
| Topic | Explanation |
|---|---|
| What | Controls number of requests within time window |
| Why | Protect from abuse, DDoS, ensure fair usage |
| Types | Fixed window, sliding window, token bucket, leaky bucket |
| Where | IP, user, API key, route |
| Node.js | express-rate-limit or Redis solutions |
| Real world | GitHub, Cloudflare, AWS |
Rate limiting is one of the most essential backend security techniques and should always be included in production-grade APIs. It protects your infrastructure, ensures fair usage, keeps costs down, and significantly reduces the risk of targeted attacks.