Wanda

Posted on • Originally published at apidog.com

How Should You Implement API Rate Limiting?

TL;DR

Implement API rate limiting using token bucket or sliding window algorithms. Return standard IETF rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) and 429 Too Many Requests when limits are exceeded. Modern PetstoreAPI uses per-user quotas and clear error responses for robust rate limiting.

Introduction

A client sends 10,000 requests to your API in one minute. Your database fails, monitoring alerts trigger, and other customers are locked out. Whether it's an attack or a buggy client in a loop, rate limiting is essential.

Rate limiting caps how many requests a client can make within a given time window. When a client exceeds the limit, the API returns 429 Too Many Requests and the client is expected to back off, keeping the service stable for everyone else.

The old Swagger Petstore lacked rate limiting. Modern PetstoreAPI uses standard IETF headers, per-user quotas, and clear error responses.

💡 Tip: If you’re building or testing REST APIs, Apidog helps you test rate limiting, validate headers, and simulate high-volume scenarios.

This guide covers rate limiting algorithms, standard headers, and practical implementation with Modern PetstoreAPI.

Why APIs Need Rate Limiting

Rate limiting protects your API from abuse and ensures fair usage.

Protection Against Abuse

1. Denial-of-Service (DoS) attacks

Flooding your API with requests. Rate limiting caps their impact.

2. Credential stuffing

Slows attackers trying large numbers of credentials.

3. Data scraping

Discourages bots scraping your dataset.

4. Cost control

Prevents excessive costs if your API triggers expensive downstream calls.

Fair Usage

1. Prevent resource monopolization

Stops one client from starving others.

2. Predictable performance

Ensures consistent response times for all clients.

3. Tiered access

Free tier: 100 req/hour. Paid tier: 10,000 req/hour. Rate limiting enforces these plans.

Operational Benefits

1. Capacity planning

You know your maximum API load.

2. Cost predictability

Caps infrastructure costs.

3. Graceful degradation

Prevents cascading failures during heavy load.

Rate Limiting Algorithms

Choose an algorithm based on your requirements.

1. Fixed Window

Count requests in fixed intervals.

How it works:

Window 1 (00:00-00:59): 100 requests allowed
Window 2 (01:00-01:59): 100 requests allowed

Implementation (Python/Redis):

import time

def is_allowed(user_id):
    current_minute = int(time.time() // 60)
    key = f"rate_limit:{user_id}:{current_minute}"
    count = redis.incr(key)
    if count == 1:
        # Set the TTL only on the first request in the window, so it
        # isn't pushed back on every subsequent call.
        redis.expire(key, 60)
    return count <= 100

Pros:

  • Simple
  • Low memory

Cons:

  • Burst problem: a client can send 100 requests at the end of one window and 100 more at the start of the next, i.e. 200 requests within seconds

2. Sliding Window

Counts requests in a rolling time window.

How it works:

At 01:30, count from 00:30 to 01:30.

Implementation:

import time
import uuid

def is_allowed(user_id):
    now = time.time()
    window_start = now - 3600  # 1-hour window
    key = f"rate_limit:{user_id}"

    # Drop entries older than the window.
    redis.zremrangebyscore(key, 0, window_start)

    count = redis.zcard(key)
    if count < 100:
        # Use a unique member so two requests at the same instant don't
        # overwrite each other; the score is the request timestamp.
        redis.zadd(key, {str(uuid.uuid4()): now})
        redis.expire(key, 3600)
        return True
    return False

Pros:

  • No burst problem
  • Accurate

Cons:

  • Higher memory usage (stores timestamps)

3. Token Bucket

Tokens accumulate at a fixed rate; each request consumes a token.

How it works:

Bucket: 100 tokens
Refill: 10 tokens/sec
Each request: -1 token

Implementation:

import time

def is_allowed(user_id):
    now = time.time()
    key = f"rate_limit:{user_id}"

    # Assumes the client was created with decode_responses=True.
    data = redis.hgetall(key)
    tokens = float(data.get('tokens', 100))
    last_refill = float(data.get('last_refill', now))

    # Refill tokens for the elapsed time, capped at the bucket size.
    elapsed = now - last_refill
    tokens = min(100, tokens + elapsed * 10)  # 10 tokens/sec

    if tokens >= 1:
        tokens -= 1
        # Note: this read-modify-write is not atomic; in production,
        # wrap it in a Lua script or transaction.
        redis.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        redis.expire(key, 3600)
        return True
    return False

Pros:

  • Allows bursts (up to bucket size)
  • Smooth limiting
  • Industry standard

Cons:

  • More complex
  • Stores state

4. Leaky Bucket

Requests enter a queue, processed at a fixed rate.

How it works:

Queue: 100 requests
Process: 10 requests/sec

Pros:

  • Smooth output rate
  • Protects downstream services

Cons:

  • Adds latency
  • Complex
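Unlike the Redis-backed examples above, here is a minimal in-memory sketch of a leaky bucket. The class and parameter names (`LeakyBucket`, `capacity`, `leak_rate`) are illustrative, and the injectable `clock` simply makes the drain logic easy to test:

```python
import time

class LeakyBucket:
    """Queue-style limiter: requests drain at a fixed rate.

    capacity is the maximum queue depth; leak_rate is how many
    requests per second are processed.
    """
    def __init__(self, capacity, leak_rate, clock=time.monotonic):
        self.capacity = capacity
        self.leak_rate = leak_rate
        self.clock = clock
        self.level = 0.0          # current queue depth
        self.last_leak = clock()

    def allow(self):
        now = self.clock()
        # Drain the bucket for the time elapsed since the last check.
        self.level = max(0.0, self.level - (now - self.last_leak) * self.leak_rate)
        self.last_leak = now
        if self.level < self.capacity:
            self.level += 1       # enqueue this request
            return True
        return False              # bucket full: reject (or queue) the request
```

A real implementation would typically queue rejected requests rather than drop them, which is where the added latency in the cons above comes from.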

Which Algorithm to Use?

Token bucket is the industry standard for most APIs: it allows short bursts while smoothing the sustained rate.

Modern PetstoreAPI uses token bucket with per-user quotas.

Standard Rate Limit Headers

Use IETF standard headers (draft-ietf-httpapi-ratelimit-headers).

Standard Headers

  • RateLimit-Limit: Max requests per window
  RateLimit-Limit: 100
  • RateLimit-Remaining: Requests left in this window
  RateLimit-Remaining: 45
  • RateLimit-Reset: Seconds until the window resets
  RateLimit-Reset: 3600

Example Response

GET /pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3600

{
  "data": [...]
}

Legacy Headers (Deprecated)

Don't use custom or X- prefixed headers:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1710331200

How Modern PetstoreAPI Implements Rate Limiting

Modern PetstoreAPI uses token bucket rate limiting with standard headers.

Rate Limits by Tier

  • Free:
    • 100 requests/hour
    • 1,000 requests/day
  • Pro:
    • 10,000 requests/hour
    • 100,000 requests/day
  • Enterprise:
    • Custom limits

Implementation

On success:

GET /v1/pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3540

{
  "data": [...]
}

When limit exceeded:

GET /v1/pets
429 Too Many Requests
Content-Type: application/problem+json
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120

{
  "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded the rate limit of 100 requests per hour",
  "instance": "/v1/pets",
  "retryAfter": 120,
  "limit": 100,
  "window": "1h"
}

Per-User vs Per-IP

  • Per-user (authenticated): Rate limit by user ID or API key:
  user_id = get_authenticated_user()
  is_allowed(user_id)
  • Per-IP (unauthenticated): Rate limit by IP address:
  ip_address = request.remote_addr
  is_allowed(ip_address)

Modern PetstoreAPI uses per-user for authenticated, per-IP for public endpoints.
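The two strategies can be combined in a single key-selection helper. A minimal sketch, assuming your framework exposes the API key and client IP (the function and parameter names are illustrative):

```python
def rate_limit_key(api_key, remote_addr):
    """Choose the rate-limit identity: the API key when the request is
    authenticated, otherwise the client IP."""
    if api_key:
        return f"user:{api_key}"
    return f"ip:{remote_addr}"
```

The prefixes keep the two namespaces separate in the shared store, so an IP address can never collide with a user ID.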

Rate Limit Response Format

Return 429 with error details per RFC 9457.

Response Structure

{
  "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
  "title": "Rate Limit Exceeded",
  "status": 429,
  "detail": "You have exceeded your rate limit. Please try again later.",
  "instance": "/v1/pets",
  "retryAfter": 120,
  "limit": 100,
  "remaining": 0,
  "reset": 120,
  "window": "1h"
}

Headers

429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120

Retry-After: tells clients when to retry (in seconds).
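Putting the headers and the RFC 9457 body together, a framework-agnostic sketch of the 429 response could look like this (the builder function is illustrative, not part of any specific framework):

```python
import json

def rate_limit_exceeded_response(limit, reset_seconds, instance):
    """Build the status, headers, and RFC 9457 body for a 429 response.
    The error-type URL is this article's example; adapt it to your API."""
    headers = {
        "Content-Type": "application/problem+json",
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": "0",
        "RateLimit-Reset": str(reset_seconds),
        # Same value as RateLimit-Reset: retry once the window resets.
        "Retry-After": str(reset_seconds),
    }
    body = json.dumps({
        "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
        "title": "Rate Limit Exceeded",
        "status": 429,
        "detail": f"You have exceeded the rate limit of {limit} requests per hour",
        "instance": instance,
        "retryAfter": reset_seconds,
        "limit": limit,
    })
    return 429, headers, body
```

Whatever framework you use, the key point is that the machine-readable fields (headers and body) agree with each other.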

Testing Rate Limits with Apidog

Apidog helps automate rate limit testing.

Test Scenarios

1. Normal usage

Send 50 requests → All succeed
Check RateLimit-Remaining decreases

2. Exceed limit

Send 101 requests → 101st returns 429
Verify error response format
Check Retry-After header

3. Reset behavior

Exceed limit → Wait for reset → Verify limit restored

4. Different tiers

Test free tier (100/hour)
Test pro tier (10,000/hour)
Verify limits enforced

Apidog Test Example

// Test rate limit headers
pm.test("Rate limit headers present", () => {
  pm.response.to.have.header("RateLimit-Limit");
  pm.response.to.have.header("RateLimit-Remaining");
  pm.response.to.have.header("RateLimit-Reset");
});

// Test rate limit exceeded. Note: pm.sendRequest is asynchronous and
// does not change pm.response, so exhaust the quota first (e.g. run
// this request 101 times with the collection runner), then assert:
pm.test("Returns 429 when limit exceeded", () => {
  pm.response.to.have.status(429);
  pm.response.to.have.header("Retry-After");
});

Rate Limiting Best Practices

  1. Use standard headers Adopt IETF headers, not X-RateLimit-*.
  2. Return 429, not 403 429 = too many requests. 403 = forbidden.
  3. Include Retry-After Tell clients when to retry.
  4. Document your limits Make rate limits visible in your docs.
  5. Provide different tiers Free: low limits. Paid: higher.
  6. Rate limit by user, not IP Per-user is more accurate.
  7. Allow bursts Token bucket supports short bursts.
  8. Monitor rate limit hits Track how often clients hit limits.
  9. Provide a rate limit status endpoint
   GET /v1/rate-limit
   200 OK
   {
     "limit": 100,
     "remaining": 45,
     "reset": 3540
   }
  10. Test rate limiting Use Apidog before deployment.

Conclusion

Rate limiting protects your API and ensures fair usage. Use the token bucket algorithm with IETF headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset). Return 429 Too Many Requests with RFC 9457 error details when limits are exceeded.

Modern PetstoreAPI enforces correct rate limiting with per-user quotas, standard headers, and clear error responses. See the documentation for more details.

Test your implementation with Apidog to ensure robust behavior under load and edge cases.

FAQ

What rate limits should I set?

Start with 100 requests/hour for free tier, 10,000/hour for paid. Adjust based on real usage and infrastructure.

Should I rate limit by IP or user?

Rate limit by user/API key for authenticated requests. Use IP-based limiting only for public endpoints.

What happens if a client exceeds the rate limit?

Return 429 Too Many Requests with Retry-After header. Don’t block clients permanently—let them retry after reset.
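On the client side, honoring Retry-After can be as simple as the sketch below; the send() callable standing in for an HTTP call is illustrative, not a specific library's API:

```python
import time

def request_with_retry(send, max_retries=3, sleep=time.sleep):
    """Call send() and, on a 429, wait Retry-After seconds then retry.
    send() returns a (status_code, headers) pair."""
    for attempt in range(max_retries + 1):
        status, headers = send()
        if status != 429:
            return status
        if attempt < max_retries:
            # Honor the server's hint; fall back to 1 second if absent.
            sleep(int(headers.get("Retry-After", 1)))
    return status
```

Well-behaved clients that wait out the Retry-After value recover automatically once the window resets.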

How do I handle rate limits for webhooks?

Webhooks are server-to-server, so set higher limits. Consider separate limits for webhooks and API calls.

Should I rate limit internal services?

Yes, but use higher limits. Rate limiting prevents cascading failures even internally.

How do I test rate limiting?

Use Apidog to send multiple requests and verify 429 responses, headers, and reset timing.

What if my API is behind a CDN?

CDN caching reduces load, but you still need rate limiting for cache misses and non-GET requests.

How do I implement rate limiting across multiple servers?

Use a shared data store (Redis, Memcached) to track limits across all servers. Don’t use local memory.
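To illustrate why the store must be shared, here is a toy fixed-window limiter where two "servers" hit one store; SharedStore stands in for Redis (in production you would use INCR on a key with a TTL):

```python
import time

class SharedStore:
    """Stand-in for Redis: one counter store visible to every app server."""
    def __init__(self):
        self.counts = {}

    def incr(self, key):
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key]

def is_allowed(store, user_id, limit, window=60, now=None):
    # Fixed-window check against the shared store, so "server A" and
    # "server B" both see the same count for the same user and window.
    now = time.time() if now is None else now
    window_id = int(now // window)
    return store.incr(f"rate:{user_id}:{window_id}") <= limit

# Two servers sharing one store: the limit holds across both.
store = SharedStore()
results = [is_allowed(store, "u1", limit=3, now=0) for _ in range(4)]
```

With per-server local counters, each server would independently allow the full quota, multiplying the effective limit by the number of servers.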
