TL;DR
Implement API rate limiting using token bucket or sliding window algorithms. Return standard IETF rate limit headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset) and 429 Too Many Requests when limits are exceeded. Modern PetstoreAPI uses per-user quotas and clear error responses for robust rate limiting.
Introduction
A client sends 10,000 requests to your API in one minute. Your database fails, monitoring alerts trigger, and other customers are locked out. Whether it's an attack or a buggy client in a loop, rate limiting is essential.
Rate limiting caps how many requests a client can make within a given time window. Exceed the limit, return 429 Too Many Requests, and the client should back off to keep your API stable.
The old Swagger Petstore lacked rate limiting. Modern PetstoreAPI uses standard IETF headers, per-user quotas, and clear error responses.
💡 Tip: If you’re building or testing REST APIs, Apidog helps you test rate limiting, validate headers, and simulate high-volume scenarios.
This guide covers rate limiting algorithms, standard headers, and practical implementation with Modern PetstoreAPI.
Why APIs Need Rate Limiting
Rate limiting protects your API from abuse and ensures fair usage.
Protection Against Abuse
1. Denial-of-Service (DoS) attacks
Flooding your API with requests. Rate limiting caps their impact.
2. Credential stuffing
Slows attackers trying large numbers of credentials.
3. Data scraping
Discourages bots scraping your dataset.
4. Cost control
Prevents excessive costs if your API triggers expensive downstream calls.
Fair Usage
1. Prevent resource monopolization
Stops one client from starving others.
2. Predictable performance
Ensures consistent response times for all clients.
3. Tiered access
Free tier: 100 req/hour. Paid tier: 10,000 req/hour. Rate limiting enforces these plans.
Operational Benefits
1. Capacity planning
You know your maximum API load.
2. Cost predictability
Caps infrastructure costs.
3. Graceful degradation
Prevents cascading failures during heavy load.
Rate Limiting Algorithms
Choose an algorithm based on your requirements.
1. Fixed Window
Count requests in fixed intervals.
How it works:
Window 1 (00:00-00:59): 100 requests allowed
Window 2 (01:00-01:59): 100 requests allowed
Implementation (Python/Redis):
def is_allowed(user_id):
    current_minute = int(time.time() // 60)
    key = f"rate_limit:{user_id}:{current_minute}"
    count = redis.incr(key)
    if count == 1:
        # Set the TTL only when the key is first created; calling expire
        # on every request would keep extending the window.
        redis.expire(key, 60)
    return count <= 100
Pros:
- Simple
- Low memory
Cons:
- Burst problem: a client can send 100 requests at the very end of one window and 100 more at the start of the next, 200 requests in a few seconds
2. Sliding Window
Counts requests in a rolling time window.
How it works:
At 01:30, count from 00:30 to 01:30.
Implementation:
def is_allowed(user_id):
    now = time.time()
    window_start = now - 3600  # 1-hour window
    key = f"rate_limit:{user_id}"
    # Drop entries older than the window
    redis.zremrangebyscore(key, 0, window_start)
    count = redis.zcard(key)
    if count < 100:
        # The member must be unique per request; a bare timestamp can
        # collide under concurrency, so add a UUID suffix.
        redis.zadd(key, {f"{now}:{uuid.uuid4()}": now})
        redis.expire(key, 3600)
        return True
    return False
Pros:
- No burst problem
- Accurate
Cons:
- Higher memory usage (stores timestamps)
3. Token Bucket
Tokens accumulate at a fixed rate; each request consumes a token.
How it works:
Bucket: 100 tokens
Refill: 10 tokens/sec
Each request: -1 token
Implementation:
def is_allowed(user_id):
    now = time.time()
    key = f"rate_limit:{user_id}"
    data = redis.hgetall(key)  # assumes a client with decode_responses=True
    tokens = float(data.get('tokens', 100))
    last_refill = float(data.get('last_refill', now))
    # Refill tokens based on time elapsed since the last request
    elapsed = now - last_refill
    tokens = min(100, tokens + elapsed * 10)  # 10 tokens/sec, capped at 100
    if tokens >= 1:
        tokens -= 1
        redis.hset(key, mapping={'tokens': tokens, 'last_refill': now})
        redis.expire(key, 3600)
        return True
    return False
Pros:
- Allows bursts (up to bucket size)
- Smooth limiting
- Industry standard
Cons:
- More complex
- Stores state
4. Leaky Bucket
Requests enter a queue, processed at a fixed rate.
How it works:
Queue: 100 requests
Process: 10 requests/sec
Pros:
- Smooth output rate
- Protects downstream services
Cons:
- Adds latency
- Complex
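The queue-and-drain behavior can be sketched in memory (no Redis needed). `LeakyBucket` and its parameters are illustrative names for this article, not part of any library:

```python
import time
from collections import deque

class LeakyBucket:
    """Requests join a bounded queue and drain at a fixed rate."""

    def __init__(self, capacity=100, leak_rate=10):
        self.capacity = capacity    # max queued requests
        self.leak_rate = leak_rate  # requests drained per second
        self.queue = deque()
        self.last_leak = time.monotonic()

    def _leak(self):
        # Drain the requests that would have been processed since last check
        now = time.monotonic()
        drained = int((now - self.last_leak) * self.leak_rate)
        if drained > 0:
            for _ in range(min(drained, len(self.queue))):
                self.queue.popleft()
            self.last_leak = now

    def try_enqueue(self, request_id):
        self._leak()
        if len(self.queue) < self.capacity:
            self.queue.append(request_id)
            return True
        return False  # bucket full: reject (e.g. with 429)
```

With `leak_rate=10`, at most 10 requests per second reach the backend no matter how fast they arrive, which is the smoothing property described above.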
Which Algorithm to Use?
Token bucket is industry standard for most APIs—allows bursts, smooths limits.
Modern PetstoreAPI uses token bucket with per-user quotas.
Standard Rate Limit Headers
Use IETF standard headers (draft-ietf-httpapi-ratelimit-headers).
Standard Headers
- RateLimit-Limit: Max requests per window
RateLimit-Limit: 100
- RateLimit-Remaining: Requests left in this window
RateLimit-Remaining: 45
- RateLimit-Reset: Seconds until reset
RateLimit-Reset: 3600
Example Response
GET /pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3600
{
"data": [...]
}
Legacy Headers (Deprecated)
Don't use custom or X- prefixed headers:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 99
X-RateLimit-Reset: 1710331200
How Modern PetstoreAPI Implements Rate Limiting
Modern PetstoreAPI uses token bucket rate limiting with standard headers.
Rate Limits by Tier
- Free:
  - 100 requests/hour
  - 1,000 requests/day
- Pro:
  - 10,000 requests/hour
  - 100,000 requests/day
- Enterprise:
  - Custom limits
Implementation
On success:
GET /v1/pets
200 OK
RateLimit-Limit: 100
RateLimit-Remaining: 99
RateLimit-Reset: 3540
{
"data": [...]
}
When limit exceeded:
GET /v1/pets
429 Too Many Requests
Content-Type: application/problem+json
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120
{
"type": "https://petstoreapi.com/errors/rate-limit-exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "You have exceeded the rate limit of 100 requests per hour",
"instance": "/v1/pets",
"retryAfter": 120,
"limit": 100,
"window": "1h"
}
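A framework-agnostic sketch of how middleware might assemble these two responses from limiter state. The helper name and its parameters are assumptions for illustration, though the header and body shapes follow the examples above:

```python
import json

def build_response(allowed, limit=100, remaining=0, reset_seconds=120,
                   instance="/v1/pets"):
    """Return (status, headers, body) for a rate-limited endpoint."""
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(remaining),
        "RateLimit-Reset": str(reset_seconds),
    }
    if allowed:
        return 200, headers, None
    # Denied: add Retry-After and an RFC 9457 problem document
    headers["Retry-After"] = str(reset_seconds)
    headers["Content-Type"] = "application/problem+json"
    body = json.dumps({
        "type": "https://petstoreapi.com/errors/rate-limit-exceeded",
        "title": "Rate Limit Exceeded",
        "status": 429,
        "detail": f"You have exceeded the rate limit of {limit} requests per hour",
        "instance": instance,
        "retryAfter": reset_seconds,
        "limit": limit,
        "window": "1h",
    })
    return 429, headers, body
```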
Per-User vs Per-IP
- Per-user (authenticated): Rate limit by user ID or API key:
user_id = get_authenticated_user()
is_allowed(user_id)
- Per-IP (unauthenticated): Rate limit by IP address:
ip_address = request.remote_addr
is_allowed(ip_address)
Modern PetstoreAPI uses per-user for authenticated, per-IP for public endpoints.
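One way to sketch that choice as a key-selection helper (the function and its arguments are hypothetical, not from a specific framework):

```python
def rate_limit_key(user_id, remote_addr):
    """Prefer a stable per-user key; fall back to the client IP."""
    if user_id is not None:
        return f"user:{user_id}"
    # Unauthenticated traffic: the IP is the best identity available.
    # Behind a proxy, resolve the real client IP (e.g. from
    # X-Forwarded-For) rather than the proxy's address.
    return f"ip:{remote_addr}"
```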
Rate Limit Response Format
Return 429 with error details per RFC 9457.
Response Structure
{
"type": "https://petstoreapi.com/errors/rate-limit-exceeded",
"title": "Rate Limit Exceeded",
"status": 429,
"detail": "You have exceeded your rate limit. Please try again later.",
"instance": "/v1/pets",
"retryAfter": 120,
"limit": 100,
"remaining": 0,
"reset": 120,
"window": "1h"
}
Headers
429 Too Many Requests
RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 120
Retry-After: 120
Retry-After: tells clients when to retry (in seconds).
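On the client side, a polite consumer reads these headers before retrying. A minimal sketch, with no particular HTTP library assumed (`send` stands in for whatever makes the request):

```python
import time

def retry_delay(headers, default=1.0):
    """Seconds to wait before retrying, honoring Retry-After when present."""
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        # Retry-After may also be an HTTP-date; a real client should parse it.
        return default

def call_with_retry(send, max_attempts=3):
    """send() returns (status, headers, body); retry while we get 429."""
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        time.sleep(retry_delay(headers))
    return status, body
```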
Testing Rate Limits with Apidog
Apidog helps automate rate limit testing.
Test Scenarios
1. Normal usage
Send 50 requests → All succeed
Check RateLimit-Remaining decreases
2. Exceed limit
Send 101 requests → 101st returns 429
Verify error response format
Check Retry-After header
3. Reset behavior
Exceed limit → Wait for reset → Verify limit restored
4. Different tiers
Test free tier (100/hour)
Test pro tier (10,000/hour)
Verify limits enforced
Apidog Test Example
// Test rate limit headers
pm.test("Rate limit headers present", () => {
    pm.response.to.have.header("RateLimit-Limit");
    pm.response.to.have.header("RateLimit-Remaining");
    pm.response.to.have.header("RateLimit-Reset");
});
// Test rate limit exceeded: run this request 101 times (e.g. with a
// collection runner); the assertion checks the final response.
pm.test("Returns 429 when limit exceeded", () => {
    pm.response.to.have.status(429);
});
Rate Limiting Best Practices
1. Use standard headers
Adopt IETF headers, not X-RateLimit-*.
2. Return 429, not 403
429 = too many requests. 403 = forbidden.
3. Include Retry-After
Tell clients when to retry.
4. Document your limits
Make rate limits visible in your docs.
5. Provide different tiers
Free: low limits. Paid: higher.
6. Rate limit by user, not IP
Per-user is more accurate for authenticated traffic.
7. Allow bursts
Token bucket supports short bursts.
8. Monitor rate limit hits
Track how often clients hit limits.
9. Provide a rate limit status endpoint
GET /v1/rate-limit
200 OK
{
"limit": 100,
"remaining": 45,
"reset": 3540
}
10. Test rate limiting
Use Apidog before deployment.
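The status payload can be derived from token-bucket state. A sketch, where the field names follow the example above but the function itself is hypothetical:

```python
import math

def rate_limit_status(tokens, capacity, refill_per_second):
    """Summarize current token-bucket state for a status endpoint."""
    remaining = math.floor(tokens)
    deficit = capacity - tokens
    # Seconds until the bucket is full again
    reset = math.ceil(deficit / refill_per_second) if deficit > 0 else 0
    return {"limit": capacity, "remaining": remaining, "reset": reset}
```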
Conclusion
Rate limiting protects your API and ensures fair usage. Use the token bucket algorithm with IETF headers (RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset). Return 429 Too Many Requests with RFC 9457 error details when limits are exceeded.
Modern PetstoreAPI enforces correct rate limiting with per-user quotas, standard headers, and clear error responses. See the documentation for more details.
Test your implementation with Apidog to ensure robust behavior under load and edge cases.
FAQ
What rate limits should I set?
Start with 100 requests/hour for free tier, 10,000/hour for paid. Adjust based on real usage and infrastructure.
Should I rate limit by IP or user?
Rate limit by user/API key for authenticated requests. Use IP-based limiting only for public endpoints.
What happens if a client exceeds the rate limit?
Return 429 Too Many Requests with Retry-After header. Don’t block clients permanently—let them retry after reset.
How do I handle rate limits for webhooks?
Webhooks are server-to-server, so set higher limits. Consider separate limits for webhooks and API calls.
Should I rate limit internal services?
Yes, but use higher limits. Rate limiting prevents cascading failures even internally.
How do I test rate limiting?
Use Apidog to send multiple requests and verify 429 responses, headers, and reset timing.
What if my API is behind a CDN?
CDN caching reduces load, but you still need rate limiting for cache misses and non-GET requests.
How do I implement rate limiting across multiple servers?
Use a shared data store (Redis, Memcached) to track limits across all servers. Don’t use local memory.