
MUHAMMAD USMAN AWAN

Throttle Smart, Scale Safe — Complete Guide to Rate Limiting — Architecture Series: Part 6

🚦 Rate Limiting / Throttling — Complete Explanation

In modern backend systems, thousands or even millions of requests hit your APIs every day. Not every request is friendly — some might be spam, brute-force attempts, or even DDoS attacks. To ensure your server stays stable, secure, and fair for all users, backend engineers implement Rate Limiting, also known as API Throttling.
This mechanism determines how many requests a client is allowed to make within a specific time window, preventing abuse and ensuring system reliability.


✅ 1. What is Rate Limiting?

Rate limiting (a.k.a. throttling) is a mechanism to control how many requests a client can make to a server within a specific time period.

Example:
“Max 100 requests per minute per user.”

If a client exceeds the limit → you block or delay the request.


❓ Why do we need it?

Rate limiting is mainly used to:

🔒 1. Protect APIs from abuse

  • Bots sending too many requests
  • Brute-force login attempts
  • Spammers trying to overload your API

🛡️ 2. Prevent DDoS attacks

Even if attackers hit your server hard, rate limiting caps how much damage any single source can do.

⚙️ 3. Fair usage

Multiple users get a fair share of server resources.

💸 4. Reduce server cost

Less unnecessary load → cheaper infra.


🌍 Real-World Examples

✔ GitHub API

  • Allows 5000 requests/hour per token
  • Ensures fair usage for all developers.

✔ Cloudflare

  • Blocks excessive calls from bad IPs automatically.

✔ Instagram / Facebook APIs

  • Strict rate limits to prevent bots/scrapers.

🧱 2. Types of Rate Limiting

There are many styles of implementing rate limits:

1) Fixed Window Counter (Simple & popular)

Rule: Allow X requests per fixed time window.

Example:
100 requests per minute

Working:

  • Counter resets every minute.

❌ Weakness:
If the user sends 100 requests at the end of one minute + 100 at the start of the next minute → a burst of 200 requests is allowed.
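The fixed-window counter can be sketched in a few lines of plain JavaScript. This is an illustrative in-memory sketch, not code from any library; the class name, parameters, and the `allow(key, now)` signature are all my own choices:

```javascript
// Fixed-window counter: X requests per window, counter resets each window.
class FixedWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // key -> { windowStart, count }
  }

  allow(key, now = Date.now()) {
    const entry = this.counters.get(key);
    // Start a fresh window if none exists or the current one has expired.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counters.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count < this.limit) {
      entry.count += 1;
      return true;
    }
    return false; // limit reached for this window
  }
}
```

Note how the abrupt reset at the window boundary is exactly what enables the burst weakness described above.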

2) Sliding Window (Smarter)

Keeps track of the rolling last N seconds.

Example:
Last 60 seconds → only 100 requests allowed.

✔ Prevents the burst problem
✔ More accurate and fair
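A simple way to implement the sliding window is to keep a log of recent request timestamps per client and count only those inside the rolling window. Again an illustrative sketch with invented names, not a library API:

```javascript
// Sliding-window log: count only requests inside the rolling window.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.log = new Map(); // key -> array of request timestamps
  }

  allow(key, now = Date.now()) {
    const times = this.log.get(key) ?? [];
    // Drop timestamps that have fallen out of the rolling window.
    const recent = times.filter(t => now - t < this.windowMs);
    if (recent.length >= this.limit) {
      this.log.set(key, recent);
      return false; // window is full
    }
    recent.push(now);
    this.log.set(key, recent);
    return true;
  }
}
```

The trade-off versus the fixed window is memory: you store one timestamp per recent request instead of a single counter.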

3) Token Bucket (Most used in production)

A bucket is filled at a fixed rate with tokens.
Each request consumes 1 token.

If the bucket is empty → the request is denied or delayed.

✔ Allows small bursts
✔ Smooth flow
✔ Used by AWS, Google Cloud, Nginx
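The token bucket is also easy to sketch: refill tokens continuously based on elapsed time, cap at the bucket capacity, and spend one token per request. Illustrative code, not any vendor's implementation; names and parameters are my own:

```javascript
// Token bucket: refills at `ratePerSec`, capped at `capacity`.
class TokenBucket {
  constructor(capacity, ratePerSec, now = Date.now()) {
    this.capacity = capacity;
    this.ratePerSec = ratePerSec;
    this.tokens = capacity; // start full, so small bursts pass immediately
    this.lastRefill = now;
  }

  allow(now = Date.now()) {
    // Refill based on time elapsed since the last check.
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // bucket empty: deny (or delay) the request
  }
}
```

The capacity controls how big a burst is tolerated; the refill rate controls the sustained average throughput.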

4) Leaky Bucket

Requests enter a bucket and leak out at a constant rate.

✔ Very stable output
✔ Perfect for smoothing traffic
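A minimal leaky-bucket sketch, under the same caveats (illustrative names and values, not a library): incoming requests add "water" to the bucket, the bucket drains at a constant rate, and a full bucket overflows (rejects).

```javascript
// Leaky bucket: fixed capacity, drains at a constant `leakPerSec` rate.
class LeakyBucket {
  constructor(capacity, leakPerSec, now = Date.now()) {
    this.capacity = capacity;
    this.leakPerSec = leakPerSec;
    this.water = 0; // pending requests currently in the bucket
    this.lastLeak = now;
  }

  allow(now = Date.now()) {
    // Drain the bucket for the time elapsed since the last request.
    const elapsedSec = (now - this.lastLeak) / 1000;
    this.water = Math.max(0, this.water - elapsedSec * this.leakPerSec);
    this.lastLeak = now;
    if (this.water < this.capacity) {
      this.water += 1; // request accepted into the bucket
      return true;
    }
    return false; // bucket full: overflow, reject
  }
}
```

Compared with the token bucket, output here is smoothed to a constant rate rather than allowing bursts through.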

πŸ—‚οΈ 3. What to rate limit ON?

Common ways:

✔ per IP

Useful for public APIs.

✔ per API key

Best for developer APIs.

✔ per user

For authenticated systems.

✔ per route

For example:

  • /login → strict limits
  • /products → relaxed limits

✔ per device

Mobile apps track device ID.

🧪 4. Implementing Rate Limiting in Node.js (Express)

📌 Install the library

npm install express-rate-limit

📌 Basic Middleware

import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const limiter = rateLimit({
  windowMs: 1 * 60 * 1000, // 1 minute
  max: 100, // limit each IP to 100 requests per window
  message: "Too many requests, please try again later."
});

app.use(limiter);

Now all routes are limited to 100 req/min per IP.


🎯 Per Route Limit

app.post('/login', rateLimit({
  windowMs: 60 * 1000,
  max: 5, // only 5 login attempts per minute
}), loginController);

🎯 Custom Handler

const limiter = rateLimit({
  windowMs: 60000,
  max: 50,
  handler: (req, res) => {
    res.status(429).json({
      success: false,
      message: "Rate limit exceeded, chill bro!"
    });
  }
});

🧡 5. Distributed Rate Limiting (Redis)

An in-memory counter works only on a single server.

If you have multiple servers behind a load balancer, each server keeps its own count, so a client can exceed the global limit by spreading requests across servers → you need shared storage.

Most common solution:

Use Redis as a central counter.

Popular libs:

  • rate-limiter-flexible
  • redis-rate-limiter

Benefits:
✔ Works across multiple nodes
✔ Very fast
✔ Production-grade
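The core Redis pattern behind these libraries is simple: INCR a per-window key shared by all servers, set an EXPIRE on the first hit, and reject when the count exceeds the limit. The sketch below illustrates that pattern; `makeFakeRedis` is a tiny in-memory stand-in for a real Redis client (node-redis / ioredis) so the example stays self-contained, and all names and values are my own:

```javascript
// In-memory stand-in mimicking Redis INCR and EXPIRE semantics.
function makeFakeRedis() {
  const store = new Map(); // key -> { value, expiresAt }
  return {
    incr(key, now = Date.now()) {
      const e = store.get(key);
      if (!e || (e.expiresAt !== null && now >= e.expiresAt)) {
        store.set(key, { value: 1, expiresAt: null });
        return 1;
      }
      e.value += 1;
      return e.value;
    },
    expire(key, seconds, now = Date.now()) {
      const e = store.get(key);
      if (e) e.expiresAt = now + seconds * 1000;
    },
  };
}

// Every app server runs this same check against the shared Redis,
// so the counter is global across the whole fleet.
function isAllowed(redis, clientId, limit, windowSec, now = Date.now()) {
  const key = `rl:${clientId}:${Math.floor(now / (windowSec * 1000))}`;
  const count = redis.incr(key, now);
  if (count === 1) redis.expire(key, windowSec, now); // first hit starts the window
  return count <= limit;
}
```

In production you would replace the fake client with a real one and ideally make INCR + EXPIRE atomic (e.g. via a Lua script), which is exactly what the libraries above handle for you.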


βš™οΈ 6. How Big Companies Implement It

🌐 API Gateways

  • AWS API Gateway
  • Kong
  • NGINX
  • Cloudflare Workers

These use algorithms like:

  • Token Bucket
  • Sliding Window
  • Leaky Bucket

🛑 When the limit is hit:

  • Return 429 Too Many Requests

📊 7. Response Codes for Rate Limiting

429 — Too Many Requests

This is the official code for throttling.

Headers returned (optional):

  • Retry-After: 30 (seconds to wait before retrying)
  • X-RateLimit-Limit: 100 (max requests per window)
  • X-RateLimit-Remaining: 0 (requests left in the current window)
  • X-RateLimit-Reset: 1700000000 (Unix timestamp when the window resets)

🧠 8. Best Practices

✔ Use stricter limits on sensitive routes:

  • /login
  • /password-reset

✔ Allow small bursts (Token Bucket)
✔ Add Redis for multiple servers
✔ Return proper headers
✔ Use exponential backoff retry
✔ Block abusive IPs automatically
✔ Log and monitor rate-limit hits
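The "exponential backoff retry" practice can be sketched from the client side: wait longer after each consecutive 429, and honor the server's Retry-After header when present. The function name, defaults, and cap below are illustrative choices, not a standard:

```javascript
// Compute how long a client should wait before retrying after a 429.
function backoffDelayMs(attempt, retryAfterSec = null, baseMs = 500, maxMs = 30000) {
  // A server-provided Retry-After header wins over our own schedule.
  if (retryAfterSec !== null) return retryAfterSec * 1000;
  // Otherwise double the wait each attempt: 500ms, 1s, 2s, 4s, ... capped.
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

A real client would also add random jitter to the delay so that many throttled clients don't all retry at the same instant.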


🎉 Final Summary

| Topic | Explanation |
| --- | --- |
| What | Controls the number of requests within a time window |
| Why | Protect from abuse and DDoS, ensure fair usage |
| Types | Fixed window, sliding window, token bucket, leaky bucket |
| Where | IP, user, API key, route |
| Node.js | express-rate-limit or Redis-based solutions |
| Real world | GitHub, Cloudflare, AWS |

Rate limiting is one of the most essential backend security techniques and should always be included in production-grade APIs. It protects your infrastructure, ensures fair usage, keeps costs down, and significantly reduces the risk of targeted attacks.

Below is an index of the previous detailed JS / backend topics in this series (for quick revision), so your learning stays connected and structured.

📘 Architecture Series – Index

Top comments (0)