Colin McDermott

API Rate Limiting Cheat Sheet


Gateway-level rate limiting

  • Gateway-level rate limiting is a popular approach in which rate limits are set and enforced at the API gateway, before traffic reaches your application servers.
  • Gateway-level rate limiting is typically implemented in API gateways such as Kong, Google's Apigee, or Amazon API Gateway.
  • Gateway-level rate limiting can provide simple and effective rate limiting, but may not offer as much fine-grained control as other approaches.

Token bucket algorithm


  • The token bucket algorithm is a popular rate limiting algorithm that involves allocating tokens to API requests.
  • The tokens are refilled at a set rate, and when an API request is made, it must consume a token.
  • If there are no tokens available, the request is rejected.
  • The token bucket algorithm is used by many rate limiting libraries and tools, such as rate-limiter, redis-rate-limiter, and Google Cloud Endpoints.

More: Token Bucket vs Bursty Rate Limiter by @animir
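To make the mechanics concrete, here is a minimal in-memory token bucket in plain JavaScript — a toy sketch of the algorithm, not one of the production libraries mentioned above:

```javascript
// Toy token bucket: refills `refillRate` tokens per second, up to `capacity`.
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // maximum tokens the bucket can hold
    this.refillRate = refillRate; // tokens added per second
    this.tokens = capacity;       // start full
    this.lastRefill = Date.now();
  }

  // Try to consume one token; returns true if the request is allowed.
  tryConsume(now = Date.now()) {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // no tokens available: reject the request
  }
}

// A 5-token bucket refilling 1 token/second: the first 5 calls pass, the 6th is rejected.
const bucket = new TokenBucket(5, 1);
const results = Array.from({ length: 6 }, () => bucket.tryConsume());
console.log(results); // [true, true, true, true, true, false]
```

Because the bucket starts full, short bursts of up to `capacity` requests are allowed even though the long-run rate is capped at `refillRate` per second.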

Leaky bucket algorithm


  • The leaky bucket algorithm is similar to the token bucket algorithm, but instead of allocating tokens, API requests are added to a "bucket" at a set rate.
  • If the bucket overflows, the requests are rejected.
  • The leaky bucket algorithm can be useful for smoothing out request bursts, and for ensuring that requests are processed at a consistent rate.
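A minimal in-memory sketch of the leaky bucket in JavaScript, modelling the bucket as a counter that drains at a fixed rate:

```javascript
// Toy leaky bucket: incoming requests fill the bucket, which drains
// (is processed) at a fixed rate; overflow is rejected.
class LeakyBucket {
  constructor(capacity, leakRatePerSecond) {
    this.capacity = capacity;          // how many queued requests fit
    this.leakRate = leakRatePerSecond; // requests drained per second
    this.water = 0;                    // current queue depth
    this.lastLeak = Date.now();
  }

  // Try to add one request; returns false if the bucket would overflow.
  tryAdd(now = Date.now()) {
    const elapsedSeconds = (now - this.lastLeak) / 1000;
    this.water = Math.max(0, this.water - elapsedSeconds * this.leakRate);
    this.lastLeak = now;
    if (this.water + 1 > this.capacity) return false; // overflow: reject
    this.water += 1;
    return true;
  }
}

const bucket = new LeakyBucket(3, 1); // holds 3 requests, drains 1/second
const burst = Array.from({ length: 4 }, () => bucket.tryAdd());
console.log(burst); // [true, true, true, false]
```

Note the difference from the token bucket: here the cap applies to the queue of pending work, so output is smoothed to the leak rate rather than allowing bursts through immediately.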

Sliding window algorithm


  • The sliding window algorithm is a rate limiting approach that involves tracking the number of requests made in a sliding window of time.
  • If the number of requests exceeds a set limit, further requests are rejected.
  • The sliding window algorithm is used by many rate limiting libraries and tools, such as Django Ratelimit, Express Rate Limit, and the Kubernetes API server.

More: Rate limiting using the Sliding Window algorithm by @satrobit
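Here's a toy sliding-window log in JavaScript; real implementations typically keep the timestamps (or bucketed counts) in Redis, but the windowing logic looks like this:

```javascript
// Toy sliding-window limiter: keeps a log of request timestamps and counts
// how many fall inside the last `windowMs` milliseconds.
class SlidingWindowLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.timestamps = [];
  }

  tryRequest(now = Date.now()) {
    const windowStart = now - this.windowMs;
    // Drop timestamps that have slid out of the window
    this.timestamps = this.timestamps.filter((t) => t > windowStart);
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

const limiter = new SlidingWindowLimiter(2, 1000); // 2 requests per second
console.log(limiter.tryRequest(0));    // true
console.log(limiter.tryRequest(100));  // true
console.log(limiter.tryRequest(200));  // false — 2 requests already in the window
console.log(limiter.tryRequest(1200)); // true — the first request slid out
```

Unlike a fixed window, the limit can never be doubled by straddling a window boundary, at the cost of storing one timestamp per recent request.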

Distributed rate limiting

  • For high-traffic APIs, it may be necessary to implement rate limiting across multiple servers.
  • Distributed rate limiting algorithms such as Redis-based rate limiting or Consistent Hashing-based rate limiting can be used to implement rate limiting across multiple servers.
  • Distributed rate limiting can help to ensure that rate limiting is consistent across multiple servers, and can help to reduce the impact of traffic spikes.

Example: Redis-based rate limiting in Next.js

In this example, we'll create a simple Next.js application with a rate-limited API endpoint using Redis and Upstash. Upstash is a serverless Redis provider that makes it easy and cost-effective to use Redis.

First, let's create a new Next.js project:

npx create-next-app redis-rate-limit-example
cd redis-rate-limit-example

Install the required dependencies:

npm install ioredis@4.27.6 express-rate-limit@5.3.0

Create a .env.local file in the project root to store your Upstash Redis credentials:

UPSTASH_REDIS_URL=your_upstash_redis_url_here

Replace your_upstash_redis_url_here with your actual Upstash Redis URL.

Create a new API route in pages/api/limited.js:

import rateLimit from 'express-rate-limit';
import { connectRedis } from '../../lib/redis';
import { RedisStore } from '../../lib/redis-store';

const redisClient = connectRedis();

const rateLimiter = rateLimit({
  store: new RedisStore({
    client: redisClient,
  }),
  windowMs: 60 * 1000, // 1 minute
  max: 5, // limit each IP to 5 requests per minute
  handler: (req, res) => {
    res.status(429).json({ message: 'Too many requests, please try again later.' });
  },
});

// express-rate-limit is Express middleware, so adapt it to a Next.js
// API route by wrapping the (req, res, next) call in a promise.
function runMiddleware(req, res, fn) {
  return new Promise((resolve, reject) => {
    fn(req, res, (result) => (result instanceof Error ? reject(result) : resolve(result)));
  });
}

export default async function handler(req, res) {
  try {
    await runMiddleware(req, res, rateLimiter);
  } catch (error) {
    return res.status(500).json({ message: 'Internal server error' });
  }

  // If the limiter already responded with a 429, stop here.
  if (res.headersSent) return;

  res.status(200).json({ message: 'Success! Your request was not rate-limited.' });
}

export const config = {
  api: {
    bodyParser: false,
  },
};

Create a lib/redis.js file to handle Redis connections:

import Redis from 'ioredis';

let cachedRedis = null;

export function connectRedis() {
  if (cachedRedis) {
    return cachedRedis;
  }

  const redis = new Redis(process.env.UPSTASH_REDIS_URL);
  cachedRedis = redis;
  return redis;
}

Create a new RedisStore class in lib/redis-store.js. express-rate-limit v5 expects a store with incr, decrement, and resetKey methods, so the class implements that interface on top of Redis:

import { connectRedis } from './redis';

// Store implementing the interface express-rate-limit v5 expects:
// incr(key, cb), decrement(key), resetKey(key).
export class RedisStore {
  constructor({ client, windowMs = 60 * 1000 } = {}) {
    this.redis = client || connectRedis();
    this.windowMs = windowMs;
  }

  incr(key, cb) {
    this.redis.incr(key, (err, hits) => {
      if (err) return cb(err);
      if (hits === 1) {
        // First hit in this window: start the expiry timer
        this.redis.expire(key, Math.ceil(this.windowMs / 1000));
      }
      cb(null, hits, new Date(Date.now() + this.windowMs));
    });
  }

  decrement(key) {
    this.redis.decr(key);
  }

  resetKey(key) {
    this.redis.del(key);
  }
}

Now you can test your rate-limited API endpoint by starting the development server:

npm run dev

Visit http://localhost:3000/api/limited in your browser, or use a tool like Postman or curl to make requests. You should see the "Success! Your request was not rate-limited." message. If you make more than 5 requests within a minute, you'll receive the rate limit message:

Too many requests, please try again later.

User-based rate limiting

  • Some APIs may require rate limiting at the user level, rather than the IP address or client ID level.
  • User-based rate limiting involves tracking the number of requests made by a particular user account, and limiting requests if the user exceeds a set limit.
  • User-based rate limiting is common in API frameworks such as Django REST Framework, and can be implemented on top of session-based or token-based authentication.
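A minimal JavaScript sketch, assuming you already have an authenticated user id for each request (the user ids here are made up for illustration):

```javascript
// Toy per-user limiter: one fixed-window counter per user id
// instead of per IP address.
class PerUserLimiter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counters = new Map(); // userId -> { count, windowStart }
  }

  tryRequest(userId, now = Date.now()) {
    let entry = this.counters.get(userId);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      entry = { count: 0, windowStart: now }; // start a new window for this user
      this.counters.set(userId, entry);
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}

const limiter = new PerUserLimiter(2, 60_000); // 2 requests per user per minute
console.log(limiter.tryRequest('alice')); // true
console.log(limiter.tryRequest('alice')); // true
console.log(limiter.tryRequest('alice')); // false — alice hit her limit
console.log(limiter.tryRequest('bob'));   // true — bob has his own counter
```

The key point is only what the counter is keyed on: swap the user id for an IP or client id and the same structure gives you the other variants.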

API key rate limiting

  • For APIs that require authentication with an API key, rate limiting can be implemented at the API key level.
  • API key rate limiting involves tracking the number of requests made with a particular API key, and limiting requests if the key exceeds a set limit.
  • API key rate limiting is supported by many rate limiting libraries, such as Flask-Limiter, and can be implemented using API key-based authentication.
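A toy JavaScript sketch with made-up keys and per-plan quotas, showing how the counter is keyed on the API key rather than the caller's IP, and how different keys can carry different limits:

```javascript
// Hypothetical per-plan quotas (requests per window) for illustration.
const PLAN_LIMITS = { free: 2, pro: 5 };

class ApiKeyLimiter {
  constructor(windowMs, keyPlans) {
    this.windowMs = windowMs;
    this.keyPlans = keyPlans;  // Map of apiKey -> plan name
    this.counters = new Map(); // apiKey -> { count, windowStart }
  }

  tryRequest(apiKey, now = Date.now()) {
    const plan = this.keyPlans.get(apiKey);
    if (!plan) return false; // unknown key: reject outright
    const limit = PLAN_LIMITS[plan];
    let entry = this.counters.get(apiKey);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      entry = { count: 0, windowStart: now };
      this.counters.set(apiKey, entry);
    }
    if (entry.count >= limit) return false;
    entry.count += 1;
    return true;
  }
}

const limiter = new ApiKeyLimiter(
  60_000,
  new Map([['key-free', 'free'], ['key-pro', 'pro']])
);
console.log(limiter.tryRequest('key-free')); // true
console.log(limiter.tryRequest('key-free')); // true
console.log(limiter.tryRequest('key-free')); // false — free plan allows 2
console.log(limiter.tryRequest('key-pro'));  // true — pro key has its own quota
```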

Custom rate limiting

  • Finally, it's worth noting that there are many other rate limiting approaches that can be customized to suit the needs of a particular API.
  • Some examples include adaptive rate limiting, which adjusts the rate limit based on the current traffic load, and request complexity-based rate limiting, which takes into account the complexity of individual requests when enforcing rate limits.
  • Custom rate limiting approaches can be useful for optimizing the rate limiting strategy for a specific API use case.
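As a rough sketch of the adaptive idea in JavaScript — the 0..1 `currentLoad` signal is a stand-in for whatever load metric (CPU, queue depth) you would actually measure:

```javascript
// Toy adaptive limiter: shrinks the allowed rate as measured load rises.
class AdaptiveLimiter {
  constructor(baseLimit, windowMs) {
    this.baseLimit = baseLimit;
    this.windowMs = windowMs;
    this.count = 0;
    this.windowStart = Date.now();
  }

  // Effective limit scales down linearly with load, never below 1.
  effectiveLimit(currentLoad) {
    return Math.max(1, Math.round(this.baseLimit * (1 - currentLoad)));
  }

  tryRequest(currentLoad, now = Date.now()) {
    if (now - this.windowStart >= this.windowMs) {
      this.count = 0; // start a new window
      this.windowStart = now;
    }
    if (this.count >= this.effectiveLimit(currentLoad)) return false;
    this.count += 1;
    return true;
  }
}

const limiter = new AdaptiveLimiter(10, 60_000);
console.log(limiter.effectiveLimit(0));    // 10 — idle: full limit
console.log(limiter.effectiveLimit(0.5));  // 5  — half load: half limit
console.log(limiter.effectiveLimit(0.95)); // 1  — near saturation: minimal limit
```

Request complexity-based limiting follows the same shape, except each request consumes a weight proportional to its cost rather than a flat count of 1.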

For my latest project, Pub Index API, I'm using an API gateway for rate limiting.

More: RESTful API Design Cheatsheet

Top comments (3)

Flimtix

Super interesting and helpful post! 👍
Could you make a continuation where you show the pros and cons?

Colin McDermott

Honestly, I think the best advice is simply: use an API gateway and configure it as needed. All the leaky-bucket theory is good to know, but in practice you're probably just going to set a limit, e.g. x requests per y period.

vineetjadav

Thanks for extraordinary insights on API, for more information do check out Cloud Computing And DevOps Courses.