Wanda

Posted on • Originally published at apidog.com

How Should You Implement API Rate Limiting?

TL;DR

Implement API rate limiting using token bucket or sliding window algorithms. Return standard IETF rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) and 429 Too Many Requests when limits are exceeded. Modern PetstoreAPI uses per-user quotas and clear error responses for robust rate limiting.

Introduction

A client sends 10,000 requests to your API in one minute. Your database fails, monitoring alerts trigger, and other customers are locked out. Whether it's an attack or a buggy client in a loop, rate limiting is essential.

Rate limiting caps how many requests a client can make within a given time window. When a client exceeds the limit, the API returns 429 Too Many Requests and the client is expected to back off, keeping the service stable for everyone else.

The old Swagger Petstore lacked rate limiting. Modern PetstoreAPI uses standard IETF headers, per-user quotas, and clear error responses.

💡 Tip: If you’re building or testing REST APIs, Apidog helps you test rate limiting, validate headers, and simulate high-volume scenarios.

This guide covers rate limiting algorithms, standard headers, and practical implementation with Modern PetstoreAPI.

Why APIs Need Rate Limiting

Rate limiting protects your API from abuse and ensures fair usage.

Protection Against Abuse

1. Denial-of-Service (DoS) attacks

Flooding your API with requests. Rate limiting caps their impact.

2. Credential stuffing

Slows attackers trying large numbers of credentials.

3. Data scraping

Discourages bots scraping your dataset.

4. Cost control

Prevents excessive costs if your API triggers expensive downstream calls.

Fair Usage

1. Prevent resource monopolization

Stops one client from starving others.

2. Predictable performance

Ensures consistent response times for all clients.

3. Tiered access

Free tier: 100 req/hour. Paid tier: 10,000 req/hour. Rate limiting enforces these plans.

Operational Benefits

1. Capacity planning

You know your maximum API load.

2. Cost predictability

Caps infrastructure costs.

3. Graceful degradation

Prevents cascading failures during heavy load.

Rate Limiting Algorithms

Choose an algorithm based on your requirements.

1. Fixed Window

Count requests in fixed intervals.

How it works:

Window 1 (00:00-00:59): 100 requests allowed
Window 2 (01:00-01:59): 100 requests allowed

Implementation (Python/Redis):

import time

def is_allowed(user_id):
    current_minute = int(time.time() // 60)
    key = f"rate_limit:{user_id}:{current_minute}"
    count = redis.incr(key)
    if count == 1:
        # Set the TTL only on the first request in the window, so it
        # isn't pushed back on every subsequent call.
        redis.expire(key, 60)
    return count <= 100

Pros:

  • Simple
  • Low memory

Cons:

  • Burst problem: a client can send 100 requests at the end of one window and 100 more at the start of the next, i.e. 200 requests within seconds

2. Sliding Window

Counts requests in a rolling time window.

How it works:

At 01:30, count from 00:30 to 01:30.

Implementation:

import time
import uuid

def is_allowed(user_id):
    now = time.time()
    window_start = now - 3600  # 1-hour window
    key = f"rate_limit:{user_id}"

    # Drop entries older than the window.
    redis.zremrangebyscore(key, 0, window_start)

    count = redis.zcard(key)
    if count < 100:
        # Use a unique member so two requests at the same instant don't
        # overwrite each other; the score is the request timestamp.
        redis.zadd(key, {str(uuid.uuid4()): now})
        redis.expire(key, 3600)
        return True
    return False

Pros:

  • No burst problem
  • Accurate

Cons:

  • Higher memory usage (stores timestamps)

3. Token Bucket

Tokens accumulate at a fixed rate; each request consumes a token.

How it works:

Bucket: 100 tokens
Refill: 10 tokens/sec
Each request: -1 token

Implementation:

import time

def is_allowed(user_id):
    now = time.time()
    key = f"rate_limit:{user_id}"

    # Assumes the client was created with decode_responses=True.
    data = redis.hgetall(key)
    tokens = float(data.get('tokens', 100))
    last_refill = float(data.get('last_refill', now))

    # Refill tokens for the elapsed time, capped at the bucket size.
    elapsed = now - last_refill
    tokens = min(100, tokens + elapsed * 10)  # 10 tokens/sec

    if tokens >= 1:
        tokens -= 1
        # Note: this read-modify-write is not atomic; in production,
        # wrap it in a Lua script or transaction.
        redis.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        redis.expire(key, 3600)
        return True
    return False

Pros:

  • Allows bursts (up to bucket size)
  • Smooth limiting
  • Industry standard

Cons:

  • More complex
  • Stores state

4. Leaky Bucket

Requests enter a queue, processed at a fixed rate.

How it works:

Queue: 100 requests
Process: 10 requests/sec

Pros:

  • Smooth output rate
  • Protects downstream services

Cons:

  • Adds latency
  • Complex
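Unlike the Redis-backed examples above, here is a minimal in-memory sketch of a leaky bucket. The class and parameter names (`LeakyBucket`, `capacity`, `leak_rate`) are illustrative, and the injectable `clock` simply makes the drain logic easy to test:

```python
import time

class LeakyBucket:
    """Queue-style limiter: requests drain at a fixed rate.

    capacity is the maximum queue depth; leak_rate is how many
    requests per second are processed.
    """
    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.clock = clock
        self.level = 0.0          # current queue depth
        self.last_leak = clock()

    def allow(self):
        now = self.clock()
        # Drain the bucket for the time elapsed since the last check.
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level < self.capacity:
            self.level += 1       # enqueue this request
            return True
        return False              # bucket full: reject (or queue) the request
```

A real implementation would typically queue rejected requests rather than drop them, which is where the added latency in the cons above comes from.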

Which Algorithm to Use?

Token bucket is the industry standard for most APIs: it allows short bursts while smoothing the sustained rate.

Modern PetstoreAPI uses token bucket with per-user quotas.

Standard Rate Limit Headers

Use IETF standard headers (draft-ietf-httpapi-ratelimit-headers).

Standard Headers

  • RateLimit-Limit: Max requests per window
  RateLimit-Limit: 100
  • RateLimit-Remaining: Requests left in this window
  RateLimit-Remaining: 45
  • RateLimit-Reset: Seconds until the window resets
  RateLimit-Reset: 3600

Example Response

GET /pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3600

{
  "data": [...]
}

Legacy Headers (Deprecated)

Don't use custom or X- prefixed headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1710331200

How Modern PetstoreAPI Implements Rate Limiting

Modern PetstoreAPI uses token bucket rate limiting with standard headers.

Rate Limits by Tier

  • Free:
    • 100 requests/hour
    • 1,000 requests/day
  • Pro:
    • 10,000 requests/hour
    • 100,000 requests/day
  • Enterprise:
    • Custom limits

Implementation

On success:

GET /v1/pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3540

{
  "data": [...]
}

When limit exceeded:

GET /v1/pets
429 Too Many Requests
Content-Type: application/problem+json
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120

{
  "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded the rate limit of 100 requests per hour",
  "instance": "/v1/pets",
  "retryAfter": 120,
  "limit": 100,
  "window": "1h"
}

Per-User vs Per-IP

  • Per-user (authenticated): Rate limit by user ID or API key:
  user_id = get_authenticated_user()
  is_allowed(user_id)
  • Per-IP (unauthenticated): Rate limit by IP address:
  ip_address = request.remote_addr
  is_allowed(ip_address)

Modern PetstoreAPI uses per-user for authenticated, per-IP for public endpoints.
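The two strategies can be combined in a single key-selection helper. A minimal sketch, assuming your framework exposes the API key and client IP (the function and parameter names are illustrative):

```python
def rate_limit_key(api_key, remote_addr):
    """Choose the rate-limit identity: the API key when the request is
    authenticated, otherwise the client IP."""
    if api_key:
        return f"user:{api_key}"
    return f"ip:{remote_addr}"
```

The prefixes keep the two namespaces separate in the shared store, so an IP address can never collide with a user ID.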

Rate Limit Response Format

Return 429 with error details per RFC 9457.

Response Structure

{
  "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded your rate limit. Please try again later.",
  "instance": "/v1/pets",
  "retryAfter": 120,
  "limit": 100,
  "remaining": 0,
  "reset": 120,
  "window": "1h"
}

Headers

429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120

Retry-After: tells clients when to retry (in seconds).
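Putting the headers and the RFC 9457 body together, a framework-agnostic sketch of the 429 response could look like this (the builder function is illustrative, not part of any specific framework):

```python
import json

def rate_limit_exceeded_response(limit, reset_seconds, instance):
    """Build the status, headers, and RFC 9457 body for a 429 response.
    The error-type URL is this article's example; adapt it to your API."""
    headers = {
        "Content-Type": "application/problem+json",
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        "RateLimit-Reset": str(reset_seconds),
        # Same value as RateLimit-Reset: retry once the window resets.
        "Retry-After": str(reset_seconds),
    }
    body = json.dumps({
        "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
        "title": "Rate Limit Exceeded",
        "status": 429,
        "detail": f"You have exceeded the rate limit of {limit} requests per hour",
        "instance": instance,
        "retryAfter": reset_seconds,
        "limit": limit,
    })
    return 429, headers, body
```

Whatever framework you use, the key point is that the machine-readable fields (headers and body) agree with each other.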

Testing Rate Limits with Apidog

Apidog helps automate rate limit testing.

Test Scenarios

1. Normal usage

Send 50 requests → All succeed
Check RateLimit-Remaining decreases

2. Exceed limit

Send 101 requests → 101st returns 429
Verify error response format
Check Retry-After header

3. Reset behavior

Exceed limit → Wait for reset → Verify limit restored

4. Different tiers

Test free tier (100/hour)
Test pro tier (10,000/hour)
Verify limits enforced

Apidog Test Example

// Test rate limit headers
pm.test("Rate limit headers present", () => {
  pm.response.to.have.header("RateLimit-Limit");
  pm.response.to.have.header("RateLimit-Remaining");
  pm.response.to.have.header("RateLimit-Reset");
});

// Test rate limit exceeded. Note: pm.sendRequest is asynchronous and
// does not change pm.response, so exhaust the quota first (e.g. run
// this request 101 times with the collection runner), then assert:
pm.test("Returns 429 when limit exceeded", () => {
  pm.response.to.have.status(429);
  pm.response.to.have.header("Retry-After");
});

Rate Limiting Best Practices

  1. Use standard headers Adopt IETF headers, not X-RateLimit-*.
  2. Return 429, not 403 429 = too many requests. 403 = forbidden.
  3. Include Retry-After Tell clients when to retry.
  4. Document your limits Make rate limits visible in your docs.
  5. Provide different tiers Free: low limits. Paid: higher.
  6. Rate limit by user, not IP Per-user is more accurate.
  7. Allow bursts Token bucket supports short bursts.
  8. Monitor rate limit hits Track how often clients hit limits.
  9. Provide a rate limit status endpoint
   GET /v1/rate-limit
   200 OK
   {
     "limit": 100,
     "remaining": 45,
     "reset": 3540
   }
  10. Test rate limiting Use Apidog before deployment.

Conclusion

Rate limiting protects your API and ensures fair usage. Use the token bucket algorithm with IETF headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset). Return 429 Too Many Requests with RFC 9457 error details when limits are exceeded.

Modern PetstoreAPI enforces correct rate limiting with per-user quotas, standard headers, and clear error responses. See the documentation for more details.

Test your implementation with Apidog to ensure robust behavior under load and edge cases.

FAQ

What rate limits should I set?

Start with 100 requests/hour for free tier, 10,000/hour for paid. Adjust based on real usage and infrastructure.

Should I rate limit by IP or user?

Rate limit by user/API key for authenticated requests. Use IP-based limiting only for public endpoints.

What happens if a client exceeds the rate limit?

Return 429 Too Many Requests with Retry-After header. Don’t block clients permanently—let them retry after reset.
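On the client side, honoring Retry-After can be as simple as the sketch below; the send() callable standing in for an HTTP call is illustrative, not a specific library's API:

```python
import time

def request_with_retry(send, max_retries=3, sleep=time.sleep):
    """Call send() and, on a 429, wait Retry-After seconds then retry.
    send() returns a (status_code, headers) pair."""
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429:
            return status
        if attempt < max_retries:
            # Honor the server's hint; fall back to 1 second if absent.
            sleep(int(headers.get("Retry-After", 1)))
    return status
```

Well-behaved clients that wait out the Retry-After value recover automatically once the window resets.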

How do I handle rate limits for webhooks?

Webhooks are server-to-server, so set higher limits. Consider separate limits for webhooks and API calls.

Should I rate limit internal services?

Yes, but use higher limits. Rate limiting prevents cascading failures even internally.

How do I test rate limiting?

Use Apidog to send multiple requests and verify 429 responses, headers, and reset timing.

What if my API is behind a CDN?

CDN caching reduces load, but you still need rate limiting for cache misses and non-GET requests.

How do I implement rate limiting across multiple servers?

Use a shared data store (Redis, Memcached) to track limits across all servers. Don’t use local memory.
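To illustrate why the store must be shared, here is a toy fixed-window limiter where two "servers" hit one store; SharedStore stands in for Redis (in production you would use INCR on a key with a TTL):

```python
import time

class SharedStore:
    """Stand-in for Redis: one counter store visible to every app server."""
    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

def is_allowed(store, user_id, limit, window=60, now=None):
    # Fixed-window check against the shared store, so "server A" and
    # "server B" both see the same count for the same user and window.
    now = time.time() if now is None else now
    window_id = int(now // window)
    return store.incr(f"rate:{user_id}:{window_id}") <= limit

# Two servers sharing one store: the limit holds across both.
store = SharedStore()
results = [is_allowed(store, "u1", limit=3, now=0) for _ in range(4)]
```

With per-server local counters, each server would independently allow the full quota, multiplying the effective limit by the number of servers.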
