In today’s API-driven world, applications face thousands—or even millions—of requests every day. While high traffic is great, it can also create challenges:
- 🔒 Security risks (brute force, DDoS attacks)
- ⚖️ Fair usage enforcement (avoid abuse of free tiers)
- 🚀 Performance stability (prevent one user from hogging resources)
This is where Rate Limiting comes in.
Rate limiting ensures that a user, IP, or client can only make a defined number of requests within a specific time window. It’s a cornerstone of modern, scalable APIs.
🧩 Why Do We Need Rate Limiting?
- 🔐 Security → Protect against brute force login attempts, API scraping, and DDoS attacks.
- ⚖️ Fair Usage → Prevent abuse of APIs, especially for freemium services.
- 🚀 Performance → Ensure backend resources are shared fairly among all users.
- 💰 Cost Control → Reduce infrastructure bills by blocking excessive or abusive requests.
⚙️ Common Rate Limiting Algorithms
1️⃣ Token Bucket
- Requests are allowed if tokens are available.
- Tokens refill at a fixed rate.
- Commonly used (flexible + efficient).
🔧 Example: Allow 10 requests per second. If unused, tokens accumulate up to a max limit (burst handling).
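Here is a minimal in-memory sketch of the idea in Go (all names and numbers are illustrative; the golang.org/x/time/rate package used later in this post is a production-ready token bucket):

package ratelimit

import (
    "sync"
    "time"
)

// tokenBucket refills at `rate` tokens per second up to `capacity`.
// Each request spends one token; with an empty bucket it is rejected.
type tokenBucket struct {
    mu       sync.Mutex
    capacity float64 // maximum tokens (burst size)
    rate     float64 // tokens added per second
    tokens   float64
    last     time.Time
}

func (b *tokenBucket) Allow() bool {
    b.mu.Lock()
    defer b.mu.Unlock()
    now := time.Now()
    // Refill lazily, based on the time elapsed since the last call.
    b.tokens += now.Sub(b.last).Seconds() * b.rate
    if b.tokens > b.capacity {
        b.tokens = b.capacity // unused tokens accumulate only up to the cap
    }
    b.last = now
    if b.tokens >= 1 {
        b.tokens--
        return true
    }
    return false
}

For the 10-requests-per-second example above, you would use &tokenBucket{capacity: 10, rate: 10} and call Allow() once per incoming request.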
2️⃣ Leaky Bucket
- Works like water dripping from a bucket at a fixed rate.
- Bursts are smoothed out.
- Useful for evenly distributing traffic.
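A rough Go sketch of the queue-based variant (capacity and drain rate are made-up numbers): requests wait in a buffered channel and leak out at a fixed pace, so bursts get flattened into a steady stream:

package main

import (
    "fmt"
    "time"
)

func main() {
    bucket := make(chan int, 5) // bucket capacity: at most 5 queued requests

    // Leak: drain one request every 200ms (a steady 5 requests/sec).
    go func() {
        for req := range bucket {
            fmt.Println("processing request", req)
            time.Sleep(200 * time.Millisecond)
        }
    }()

    // A burst of 10 arrivals: roughly the bucket's capacity queues up,
    // and the overflow is rejected immediately.
    for i := 1; i <= 10; i++ {
        select {
        case bucket <- i:
            fmt.Println("queued request", i)
        default:
            fmt.Println("rejected request", i) // bucket is full
        }
    }
    time.Sleep(2 * time.Second) // let the queue drain before exiting
}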
3️⃣ Fixed Window Counter
- Count requests in a fixed time window (e.g., 100 requests per minute).
- ❌ Edge case: allows bursts at window boundaries (e.g., 100 requests at the very end of one minute plus 100 more at the start of the next all pass within a few seconds).
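A minimal thread-safe sketch in Go (type and field names are illustrative), used as e.g. &fixedWindow{limit: 100, window: time.Minute}:

package ratelimit

import (
    "sync"
    "time"
)

// fixedWindow allows at most `limit` requests per `window`.
// Windows here are anchored to the first request after expiry,
// not to clock boundaries.
type fixedWindow struct {
    mu          sync.Mutex
    limit       int
    window      time.Duration
    count       int
    windowStart time.Time
}

func (f *fixedWindow) Allow() bool {
    f.mu.Lock()
    defer f.mu.Unlock()
    now := time.Now()
    if now.Sub(f.windowStart) >= f.window {
        f.windowStart = now // a new window starts: reset the counter
        f.count = 0
    }
    if f.count < f.limit {
        f.count++
        return true
    }
    return false
}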
4️⃣ Sliding Window Log
- Keeps timestamps of requests.
- Precise but memory-heavy for large scale.
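A sketch in Go, keeping one global log for brevity (a real deployment would keep one log per client key, which is exactly where the memory cost comes from):

package ratelimit

import (
    "sync"
    "time"
)

// slidingLog allows at most `limit` requests in any rolling `window`.
type slidingLog struct {
    mu     sync.Mutex
    limit  int
    window time.Duration
    log    []time.Time // timestamps of accepted requests, oldest first
}

func (s *slidingLog) Allow() bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    now := time.Now()
    cutoff := now.Add(-s.window)
    // Evict timestamps that have fallen out of the window.
    i := 0
    for i < len(s.log) && s.log[i].Before(cutoff) {
        i++
    }
    s.log = s.log[i:]
    if len(s.log) < s.limit {
        s.log = append(s.log, now)
        return true
    }
    return false
}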
5️⃣ Sliding Window Counter
- Hybrid approach: averages counts across windows.
- More accurate than fixed window, less heavy than logs.
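One way to sketch this in Go: weight the previous window's count by how much of it still overlaps the rolling window (names and the rolling logic are illustrative):

package ratelimit

import (
    "sync"
    "time"
)

// slidingCounter approximates a rolling rate by combining the current
// window's count with a weighted share of the previous window's count.
type slidingCounter struct {
    mu          sync.Mutex
    limit       float64
    window      time.Duration
    prevCount   float64
    currCount   float64
    windowStart time.Time
}

func (s *slidingCounter) Allow() bool {
    s.mu.Lock()
    defer s.mu.Unlock()
    now := time.Now()
    if now.Sub(s.windowStart) >= s.window {
        // Roll the fixed windows forward.
        s.prevCount, s.currCount = s.currCount, 0
        s.windowStart = s.windowStart.Add(s.window)
        if now.Sub(s.windowStart) >= s.window {
            // Idle for more than a full window: start fresh.
            s.prevCount = 0
            s.windowStart = now
        }
    }
    elapsed := now.Sub(s.windowStart)
    // The previous window's weight shrinks as the current window fills up.
    prevWeight := 1 - float64(elapsed)/float64(s.window)
    if s.prevCount*prevWeight+s.currCount < s.limit {
        s.currCount++
        return true
    }
    return false
}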
🔧 Implementing Rate Limiting in Applications
Example in Go (Gin Framework)
package main

import (
    "net/http"

    "github.com/gin-gonic/gin"
    "golang.org/x/time/rate"
)

func main() {
    r := gin.Default()

    // Create a limiter: 5 requests/sec, burst up to 10
    limiter := rate.NewLimiter(5, 10)

    r.GET("/api", func(c *gin.Context) {
        if !limiter.Allow() {
            c.JSON(http.StatusTooManyRequests, gin.H{"error": "Too many requests"})
            return
        }
        c.JSON(http.StatusOK, gin.H{"message": "Request successful"})
    })

    r.Run(":8080")
}
✅ Note: this limiter is shared across all clients; together they get 5 requests per second with a burst allowance of 10, not 5 per client.
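If you want true per-client limits with the same library, a hypothetical variant keeps one rate.Limiter per IP in a map (eviction of idle entries is omitted for brevity):

package main

import (
    "net/http"
    "sync"

    "github.com/gin-gonic/gin"
    "golang.org/x/time/rate"
)

var (
    mu       sync.Mutex
    limiters = make(map[string]*rate.Limiter)
)

// limiterFor returns the limiter for an IP, creating one on first sight.
// A production version should evict idle entries to bound memory use.
func limiterFor(ip string) *rate.Limiter {
    mu.Lock()
    defer mu.Unlock()
    l, ok := limiters[ip]
    if !ok {
        l = rate.NewLimiter(5, 10) // 5 req/s, burst 10, per client IP
        limiters[ip] = l
    }
    return l
}

func main() {
    r := gin.Default()
    r.GET("/api", func(c *gin.Context) {
        if !limiterFor(c.ClientIP()).Allow() {
            c.JSON(http.StatusTooManyRequests, gin.H{"error": "Too many requests"})
            return
        }
        c.JSON(http.StatusOK, gin.H{"message": "Request successful"})
    })
    r.Run(":8080")
}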
🚦 Rate Limiting with NGINX
NGINX can handle rate limiting at the web server or reverse proxy level, making it a powerful tool to protect your backend before requests hit your app.
⚙️ How It Works
NGINX uses two main directives:
- limit_req_zone → Defines a shared memory zone to track requests (by IP or custom key).
- limit_req → Applies the limit to endpoints.
🔧 Basic Example
http {
    # 1. Define rate limit zone (1 request/sec, burst 5)
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=1r/s;

    server {
        location /api/ {
            # 2. Apply rate limit
            limit_req zone=api_limit burst=5 nodelay;
            proxy_pass http://backend_service;
        }
    }
}
🔍 Explanation
- Key: $binary_remote_addr → track requests by client IP.
- Rate: 1r/s → allows 1 request per second per IP.
- Burst: 5 → short bursts of up to 5 requests allowed.
- nodelay: burst requests are processed immediately instead of being delayed to match the rate.
✅ Advantages
- Filters traffic before hitting your backend.
- Can apply limits per endpoint (e.g., /login stricter than /products).
- Lightweight & fast.
📊 Logs & Monitoring
- Exceeded requests → rejected and logged with 503 (Service Unavailable) by default; the limit_req_status directive can switch this to 429.
- Useful for tracking abuse & fine-tuning limits.
⚠️ Considerations
- Avoid overly strict global limits (may block valid traffic).
- When behind a proxy/load balancer, make sure the key is the real client IP (e.g., via the ngx_http_realip_module), not the proxy's address.
- Always test before production rollout.
🔒 Best Practices for Rate Limiting
- 🎯 Apply stricter limits on sensitive endpoints (like /login).
- 🌍 Use different limits per client type (mobile app vs. server-to-server).
- 🧩 Combine server-level (NGINX/API Gateway) + app-level checks.
- 📊 Monitor logs & dashboards for blocked traffic trends.
- 🚀 Use distributed stores (Redis) for shared limits in multi-instance apps.
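As a concrete illustration of that last point, here is a sketch of a fixed-window counter shared across instances via Redis, using the go-redis client (the key format, limit, and window are assumptions):

package main

import (
    "context"
    "fmt"
    "time"

    "github.com/redis/go-redis/v9"
)

// allow increments a per-client counter in Redis and starts the window's
// TTL on the first hit. Note: INCR and EXPIRE are two round trips here;
// a Lua script would make this atomic in production.
func allow(ctx context.Context, rdb *redis.Client, clientID string, limit int64, window time.Duration) (bool, error) {
    key := "ratelimit:" + clientID
    count, err := rdb.Incr(ctx, key).Result()
    if err != nil {
        return false, err
    }
    if count == 1 {
        // First request in this window: start the countdown.
        rdb.Expire(ctx, key, window)
    }
    return count <= limit, nil
}

func main() {
    ctx := context.Background()
    rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
    ok, err := allow(ctx, rdb, "client-123", 100, time.Minute)
    if err != nil {
        panic(err)
    }
    fmt.Println("allowed:", ok)
}

Because the counter lives in Redis rather than process memory, every app instance enforces the same shared limit.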
💡 Bonus Tips
- ✅ 429 Status Code: Always return HTTP 429 Too Many Requests with a helpful message.
- ✅ Retry-After Header: Tell clients when to retry (see the Gin sketch after this list). Example:
HTTP/1.1 429 Too Many Requests
Retry-After: 60
- ✅ Graceful Degradation: Don’t just block—offer reduced functionality for non-critical requests.
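Tying these tips back to the earlier Gin example, a middleware that returns 429 with a Retry-After header might look like this (the one-second hint is an arbitrary value; tune it to your refill rate):

package main

import (
    "net/http"

    "github.com/gin-gonic/gin"
    "golang.org/x/time/rate"
)

// rateLimitMiddleware rejects over-limit requests with 429 + Retry-After.
func rateLimitMiddleware(l *rate.Limiter) gin.HandlerFunc {
    return func(c *gin.Context) {
        if !l.Allow() {
            c.Header("Retry-After", "1") // seconds until the client may retry
            c.AbortWithStatusJSON(http.StatusTooManyRequests, gin.H{"error": "Too many requests"})
            return
        }
        c.Next()
    }
}

func main() {
    r := gin.Default()
    r.Use(rateLimitMiddleware(rate.NewLimiter(5, 10)))
    r.GET("/api", func(c *gin.Context) {
        c.JSON(http.StatusOK, gin.H{"message": "Request successful"})
    })
    r.Run(":8080")
}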
🎯 Final Thoughts
Rate limiting is not just a performance tool: it's a security guard, a cost saver, and a reliability booster.
Whether you implement it in application code, in NGINX, or in an API gateway (Kong, Apigee, AWS API Gateway), proper rate limiting ensures:
✔️ Fair usage
✔️ Secure endpoints
✔️ Scalable systems
🚀 The bottom line? Smart rate limiting makes APIs faster, safer, and fairer—for everyone.
💬 What approach do you use for rate limiting in your backend? Have you tried using Redis or NGINX rules? Let’s discuss in the comments!