MUHAMMAD USMAN AWAN

Throttle Smart, Scale Safe: Complete Guide to Rate Limiting (Architecture Series, Part 6)

🚦 Rate Limiting / Throttling: Complete Explanation

In modern backend systems, thousands or even millions of requests hit your APIs every day. Not every request is friendly: some are spam, brute-force attempts, or even DDoS attacks. To keep your server stable, secure, and fair for all users, backend engineers implement rate limiting, also known as API throttling.
This mechanism determines how many requests a client is allowed to make within a specific time window, preventing abuse and ensuring system reliability.


✅ 1. What is Rate Limiting?

Rate limiting (a.k.a. throttling) is a mechanism to control how many requests a client can make to a server within a specific time period.

Example:
"Max 100 requests per minute per user."

If a client exceeds the limit → you block or delay the request.


โ“ Why do we need it?

Rate limiting is mainly used to:

🔒 1. Protect APIs from abuse

  • Bots sending too many requests
  • Brute-force login attempts
  • Spammers trying to overload your API

๐Ÿ›ก๏ธ 2. Prevent DDoS attacks

Even if attackers hit your server hard, rate limiting ensures damage is reduced.

โš™๏ธ 3. Fair usage

Multiple users get a fair share of server resources.

💸 4. Reduce server cost

Less unnecessary load → cheaper infra.


๐ŸŒ Real-World Examples

โœ” GitHub API

  • Allows 5000 requests/hour per token
  • Ensures fair usage for all developers.

โœ” Cloudflare

  • Blocks excessive calls from bad IPs automatically.

โœ” Instagram / Facebook APIs

  • Strict rate limits to prevent bots/scrapers.

๐Ÿงฑ 2. Types of Rate Limiting

There are many styles of implementing rate limits:

1) Fixed Window Counter (Simple & popular)

Rule: Allow X requests per fixed time window.

Example:
100 requests per minute

Working:

  • Counter resets every minute.

โŒ Weakness:
If the user sends 100 requests at the end of a minute + 100 at start of next minute โ†’ burst allowed (200 requests).
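The counter-and-reset behavior can be sketched in a few lines. This is an illustrative in-memory version, not a library API; `allowFixedWindow` and the constants are made-up names for this sketch.

```javascript
// Illustrative in-memory fixed-window counter (not a library API).
const WINDOW_MS = 60_000; // 1-minute window
const LIMIT = 100;        // max requests per window
const counters = new Map(); // key -> { windowStart, count }

function allowFixedWindow(key, now = Date.now()) {
  const entry = counters.get(key);
  // Start a fresh window if none exists or the old one has expired.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    counters.set(key, { windowStart: now, count: 1 });
    return true;
  }
  if (entry.count < LIMIT) {
    entry.count += 1;
    return true;
  }
  return false; // limit reached for this window
}
```

Note how the counter forgets everything at the window boundary, which is exactly where the burst weakness above comes from.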

2) Sliding Window (Smarter)

Keeps track of the rolling last N seconds.

Example:
Last 60 seconds → only 100 requests allowed.

✔ Prevents the burst problem
✔ More accurate and fair
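A minimal sliding-window log, again illustrative (`allowSlidingWindow` is a made-up name): it keeps per-client timestamps and only counts those inside the rolling window.

```javascript
// Illustrative sliding-window log (not a library API).
const WINDOW_MS = 60_000; // rolling 60-second window
const LIMIT = 100;
const logs = new Map(); // key -> array of request timestamps

function allowSlidingWindow(key, now = Date.now()) {
  // Keep only timestamps still inside the rolling window.
  const stamps = (logs.get(key) ?? []).filter((t) => now - t < WINDOW_MS);
  if (stamps.length >= LIMIT) {
    logs.set(key, stamps);
    return false;
  }
  stamps.push(now);
  logs.set(key, stamps);
  return true;
}
```

Because the window rolls with the clock instead of resetting, a burst at a minute boundary is still counted.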

3) Token Bucket (Most used in production)

A bucket fills with tokens at a fixed rate.
Each request consumes one token.

If the bucket is empty → the request is denied or delayed.

✔ Allows small bursts
✔ Smooth flow
✔ Used by AWS, Google Cloud, Nginx
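A token bucket sketch under the same caveats (`makeTokenBucket` is a hypothetical helper): tokens refill continuously, and a full bucket permits a short burst up to its capacity.

```javascript
// Illustrative token bucket (not a library API).
// capacity: max burst size; refillPerSec: steady refill rate.
function makeTokenBucket(capacity, refillPerSec) {
  let tokens = capacity; // start full
  let last = 0;          // timestamp of the previous call (ms)
  return function allow(nowMs) {
    // Refill proportionally to elapsed time, capped at capacity.
    tokens = Math.min(capacity, tokens + ((nowMs - last) / 1000) * refillPerSec);
    last = nowMs;
    if (tokens >= 1) {
      tokens -= 1; // spend one token per request
      return true;
    }
    return false;
  };
}
```

The burst size and the sustained rate are tuned independently (capacity vs. refill rate), which is why this shape is so common in production gateways.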

4) Leaky Bucket

Requests enter a bucket and leak at a constant rate.

✔ Very stable output
✔ Perfect for smoothing traffic
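A leaky-bucket-as-meter sketch (`makeLeakyBucket` is a hypothetical name): the bucket drains at a constant rate, so sustained throughput is capped at the leak rate even if short bursts fill it.

```javascript
// Illustrative leaky bucket used as a meter (not a library API).
// capacity: how many requests the bucket holds; leakPerSec: drain rate.
function makeLeakyBucket(capacity, leakPerSec) {
  let level = 0; // current "water" in the bucket
  let last = 0;  // timestamp of the previous call (ms)
  return function allow(nowMs) {
    // Drain at a constant rate since the last call.
    level = Math.max(0, level - ((nowMs - last) / 1000) * leakPerSec);
    last = nowMs;
    if (level + 1 <= capacity) {
      level += 1; // this request fits in the bucket
      return true;
    }
    return false; // bucket full: reject (or queue) the request
  };
}
```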

๐Ÿ—‚๏ธ 3. What to rate limit ON?

Common ways:

✔ per IP

Useful for public APIs.

✔ per API key

Best for developer APIs.

✔ per user

For authenticated systems.

✔ per route

For example:

  • /login → strict limits
  • /products → relaxed limits

✔ per device

Mobile apps track a device ID.
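Whatever you key on, the limiter ultimately needs a function that maps a request to a key (express-rate-limit accepts one via its keyGenerator option). Below is a standalone sketch of such a function; the x-api-key header name and the req.user shape are assumptions for illustration.

```javascript
// Derive a rate-limit key from a request: prefer API key, then user,
// then fall back to IP. Header/field names are illustrative assumptions.
function limiterKey(req) {
  if (req.headers['x-api-key']) return 'key:' + req.headers['x-api-key'];
  if (req.user?.id) return 'user:' + req.user.id;
  return 'ip:' + req.ip; // anonymous traffic keyed by client IP
}
```

Prefixing the key ('key:', 'user:', 'ip:') keeps the different namespaces from colliding in a shared store.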

🧪 4. Implementing Rate Limiting in Node.js (Express)

📌 Install the library

npm install express-rate-limit

📌 Basic Middleware

import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const limiter = rateLimit({
  windowMs: 1 * 60 * 1000, // 1 minute
  max: 100, // limit each IP to 100 requests per window
  message: "Too many requests, please try again later."
});

app.use(limiter);

Now all routes are limited to 100 req/min per IP.


🎯 Per-Route Limit

app.post('/login', rateLimit({
  windowMs: 60 * 1000,
  max: 5, // only 5 login attempts per minute
}), loginController);

🎯 Custom Handler

const limiter = rateLimit({
  windowMs: 60000,
  max: 50,
  handler: (req, res) => {
    res.status(429).json({
      success: false,
      message: "Rate limit exceeded, chill bro!"
    });
  }
});

🧵 5. Distributed Rate Limiting (Redis)

In-memory counters work only on a single server.

If you run multiple servers behind a load balancer → you need shared storage.

Most common solution:

Use Redis as a central counter.

Popular libs:

  • rate-limiter-flexible
  • redis-rate-limiter

Benefits:
✔ Works across multiple nodes
✔ Very fast
✔ Production-grade
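The core pattern these libraries build on is an atomic INCR on a shared counter, plus an expiry set on first use. The sketch below captures that logic; the store object stands in for a real Redis client (an assumption for this sketch), so the flow is runnable without a Redis server.

```javascript
// Fixed-window limiting against a shared counter: INCR the key, set an
// expiry on the first increment, allow while count <= limit. `store`
// mimics the tiny subset of a Redis client used here (incr, expire).
async function allowDistributed(store, key, limit, windowSec) {
  const count = await store.incr(key);
  if (count === 1) await store.expire(key, windowSec); // window starts now
  return count <= limit;
}

// In-memory stand-in for Redis so the logic can run locally.
function memoryStore() {
  const data = new Map();
  return {
    async incr(k) {
      const v = (data.get(k) ?? 0) + 1;
      data.set(k, v);
      return v;
    },
    async expire() {
      // A real Redis client would attach a TTL to the key here.
    },
  };
}
```

Because every node increments the same Redis key, the limit holds across the whole fleet instead of per server.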


โš™๏ธ 6. How Big Companies Implement It

๐ŸŒ API Gateways

  • AWS API Gateway
  • Kong
  • NGINX
  • Cloudflare Workers

These use algorithms like:

  • Token Bucket
  • Sliding Window
  • Leaky Bucket

🛑 When the limit is hit:

  • Return 429 Too Many Requests

📊 7. Response Codes for Rate Limiting

429 Too Many Requests

This is the official code for throttling.

Headers returned (optional):

  • Retry-After: 30
  • X-RateLimit-Limit: 100
  • X-RateLimit-Remaining: 0
  • X-RateLimit-Reset: 17000000
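If you set these headers yourself, a small helper keeps them consistent. The header names follow the de-facto X-RateLimit-* convention; `rateLimitHeaders` is a made-up name and the values are illustrative.

```javascript
// Build throttle headers for a 429 response. Times are Unix epoch seconds.
function rateLimitHeaders(limit, remaining, resetEpochSec, nowEpochSec) {
  return {
    'Retry-After': String(Math.max(0, resetEpochSec - nowEpochSec)),
    'X-RateLimit-Limit': String(limit),
    'X-RateLimit-Remaining': String(remaining),
    'X-RateLimit-Reset': String(resetEpochSec),
  };
}
```

Retry-After tells well-behaved clients exactly how long to wait before trying again.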

🧠 8. Best Practices

✔ Use stricter limits on sensitive routes:

  • /login
  • /password-reset

✔ Allow small bursts (token bucket)
✔ Add Redis when running multiple servers
✔ Return proper rate-limit headers
✔ Retry with exponential backoff on the client
✔ Block abusive IPs automatically
✔ Log and monitor rate-limit hits
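For the exponential-backoff point, the usual client-side schedule doubles the wait on each attempt up to a cap; the base and cap values below are illustrative, and `backoffDelayMs` is a made-up name.

```javascript
// Exponential backoff delay for retrying after a 429:
// wait base * 2^attempt milliseconds, capped at capMs.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30_000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

In practice many clients also add random jitter to the delay so retries from different clients do not arrive in synchronized waves.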


🎉 Final Summary

Topic      | Explanation
What       | Controls the number of requests within a time window
Why        | Protects from abuse and DDoS, ensures fair usage
Types      | Fixed window, sliding window, token bucket, leaky bucket
Where      | IP, user, API key, route
Node.js    | express-rate-limit or Redis-based solutions
Real world | GitHub, Cloudflare, AWS

Rate limiting is one of the most essential backend security techniques and should always be included in production-grade APIs. It protects your infrastructure, ensures fair usage, keeps costs down, and significantly reduces the risk of targeted attacks.

Below is an index of the previous JS / backend topics in this series (for quick revision), so your learning stays connected and structured.

📘 Architecture Series – Index
