🚀 Executive Summary
TL;DR: Uncontrolled third-party API usage often results in 429 Too Many Requests errors and service disruptions. This article presents a proactive, centralized rate limit tracking solution using Redis, implemented as middleware in Express.js and a dependency in Python FastAPI, to monitor API usage and gracefully pause before hitting limits.
🎯 Key Takeaways
- Proactive API rate limit tracking using Redis prevents 429 errors by checking available quota before making external calls, rather than reacting to failures.
- The solution leverages Redis as a high-performance, centralized store for shared rate limit status, crucial for maintaining consistency across distributed application instances.
- Rate limit updates are dynamically managed by extracting the x-ratelimit-remaining and x-ratelimit-reset headers from external API responses and setting the remaining count in Redis with an appropriate Time To Live (TTL).
Tracking API Rate Limits: Middleware Solution for Express.js/Python
Introduction
In the world of interconnected services, APIs are the glue that holds our applications together. However, every seasoned engineer has faced the dreaded 429 Too Many Requests error. Relying on third-party APIs without respecting their rate limits is a recipe for disaster, leading to service disruptions, cascading failures, and even temporary account suspension.
The common approach is reactive: you make a call, get a 429 error, and then back off. This is inefficient and makes your application brittle. A far more robust solution is to be proactive. Instead of waiting to be told you’ve hit a limit, what if your application could keep track of its own usage and gracefully pause before ever receiving an error?
In this tutorial, we will build exactly that: a proactive, centralized rate limit tracking solution using Redis. We’ll implement this pattern as a middleware in an Express.js application and as a dependency in a Python FastAPI application. This approach is perfect for distributed systems where multiple instances of your service share the same API quota.
Prerequisites
Before we begin, ensure you have the following tools and knowledge:
- Node.js (v16+) and npm or yarn installed.
- Python (v3.8+) and pip installed.
- Docker and Docker Compose for running a local Redis instance.
- A basic understanding of REST APIs and HTTP headers.
- Familiarity with either Express.js or a Python web framework like FastAPI or Flask.
- Access to a command-line terminal.
Step-by-Step Guide
Step 1: Setting Up the Shared State with Redis
Our tracking mechanism needs a central, fast, and reliable place to store the current rate limit status. Redis is the perfect tool for this job due to its high-performance, in-memory nature and atomic operations. We’ll use Docker Compose to spin up a Redis container quickly.
Create a file named docker-compose.yml in your project directory:
# docker-compose.yml
version: '3.8'
services:
  redis:
    image: "redis:alpine"
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
volumes:
  redis_data:
This configuration defines a service named redis that uses the official lightweight Alpine image. It maps the container’s port 6379 to your local machine’s port 6379 and creates a Docker volume to persist data.
Now, start the container in detached mode from your terminal:
# Bash Command
docker-compose up -d
You can verify that Redis is running by connecting to it with the Redis CLI (if installed locally) or through another Docker command:
# Bash Command
docker exec -it [CONTAINER_ID] redis-cli ping
# Expected output: PONG
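If you prefer checking from the same environment your application will run in, here is a minimal Python sketch using the redis package (install it with pip install redis; it assumes Redis is listening on localhost:6379):
# check_redis.py: quick connectivity check (illustrative sketch)
import redis

# Connect to the local Redis instance started by docker-compose.
client = redis.Redis(host="localhost", port=6379, decode_responses=True)

# ping() returns True when the server is reachable.
print("Redis reachable:", client.ping())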
With our shared state manager running, we’re ready to build the application logic.
Step 2: Implementing the Rate Limit Tracker in Express.js
For our Node.js example, we’ll create an Express middleware. This middleware will intercept requests that need to call an external API. It will first check Redis for the available quota before proceeding. After the external call, it will read the rate limit headers from the response and update Redis accordingly.
First, initialize a Node.js project and install the necessary dependencies:
# Bash Commands
npm init -y
npm install express redis axios
Now, create a file named rateLimitTracker.js for our middleware logic:
// rateLimitTracker.js
const { createClient } = require('redis');

const redisClient = createClient();
redisClient.on('error', (err) => console.error('Redis Client Error', err));

// A unique key for tracking a specific API endpoint.
const GITHUB_API_LIMIT_KEY = 'rate-limit:github-api';

const rateLimitTracker = async (req, res, next) => {
  if (!redisClient.isOpen) {
    await redisClient.connect();
  }

  try {
    const remaining = await redisClient.get(GITHUB_API_LIMIT_KEY);

    // If the key exists and the count is 0, we've hit the limit.
    if (remaining !== null && Number(remaining) <= 0) {
      return res.status(429).json({
        message: 'GitHub API rate limit exceeded. Please try again later.'
      });
    }

    // Attach a function to the response object to update the limit after the request.
    res.updateRateLimit = async (apiResponse) => {
      const newRemaining = apiResponse.headers['x-ratelimit-remaining'];
      const resetTimestamp = apiResponse.headers['x-ratelimit-reset'];

      if (newRemaining !== undefined && resetTimestamp !== undefined) {
        const ttl = Number(resetTimestamp) - Math.floor(Date.now() / 1000);
        // Set the new value with an expiry time (Time To Live).
        await redisClient.set(GITHUB_API_LIMIT_KEY, newRemaining, { EX: ttl > 0 ? ttl : 1 });
        console.log(`Updated Redis: Remaining calls = ${newRemaining}, Resets in ${ttl}s`);
      }
    };

    next();
  } catch (error) {
    console.error('Error in rate limit middleware:', error);
    // Fail open: if Redis fails, let the request proceed.
    next();
  }
};

module.exports = rateLimitTracker;
The key logic here is the res.updateRateLimit function. We attach it to the response object so our main route handler can call it after the external API call succeeds, passing in the response from that call.
Next, let’s integrate this into an Express server. Create a file named server.js:
// server.js
const express = require('express');
const axios = require('axios');
const rateLimitTracker = require('./rateLimitTracker');

const app = express();
const PORT = 3000;

// Apply the middleware to the route that calls the external API.
app.get('/github/user/:username', rateLimitTracker, async (req, res) => {
  try {
    const { username } = req.params;
    const githubResponse = await axios.get(`https://api.github.com/users/${username}`);

    // Call the function attached by our middleware to update the limit.
    if (res.updateRateLimit) {
      await res.updateRateLimit(githubResponse);
    }

    res.json(githubResponse.data);
  } catch (error) {
    // Also update the limit on failure, as a failed request might still count against the quota.
    if (error.response && res.updateRateLimit) {
      await res.updateRateLimit(error.response);
    }

    const status = error.response ? error.response.status : 500;
    const message = error.response ? error.response.data : 'Internal Server Error';
    res.status(status).json({ message });
  }
});

app.listen(PORT, () => {
  console.log(`Express server running on http://localhost:${PORT}`);
});
When you call /github/user/someuser, our middleware checks Redis. If the quota is available, it proceeds. The route handler calls the GitHub API and then uses res.updateRateLimit to save the latest rate limit information back to Redis.
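To confirm the tracker is doing its job, you can peek at the key after a request goes through. Here is a minimal inspection sketch with the redis package, assuming the same local Redis instance and the key name used above:
# inspect_limit.py: peek at the tracked GitHub quota (illustrative sketch)
import redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)
KEY = "rate-limit:github-api"

# GET returns the remaining count last written by the middleware, or None if nothing has run yet.
print("Remaining calls:", client.get(KEY))
# TTL returns the seconds left until the key expires, i.e. until the rate limit window resets.
print("Seconds until reset:", client.ttl(KEY))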
Step 3: Implementing the Rate Limit Tracker in Python (FastAPI)
The principle in Python is identical. We’ll use a dependency injection approach with FastAPI, which provides a clean and reusable way to encapsulate our logic.
First, set up a virtual environment and install the dependencies:
# Bash Commands
python3 -m venv venv
source venv/bin/activate
pip install fastapi "uvicorn[standard]" redis httpx
Now, create a file named main.py:
# main.py
import redis.asyncio as redis
import httpx
import time
from fastapi import FastAPI, Depends, HTTPException, Response

# --- Configuration & Clients ---
app = FastAPI()
redis_client = redis.from_url("redis://localhost:6379", decode_responses=True)

GITHUB_API_LIMIT_KEY = "rate-limit:github-api"

# --- Rate Limiter Logic as a Dependency ---
async def github_rate_limiter(response: Response):
    """
    A FastAPI dependency that checks and updates API rate limits.
    """
    remaining_str = await redis_client.get(GITHUB_API_LIMIT_KEY)

    if remaining_str is not None and int(remaining_str) <= 0:
        raise HTTPException(
            status_code=429,
            detail="GitHub API rate limit exceeded. Please try again later."
        )

    # Yield control to the route handler. The finally block runs afterwards,
    # even if the handler raised an exception, so the limit is updated on errors too.
    try:
        yield
    finally:
        # This part runs after the request has been processed. We read the headers
        # from the response object that FastAPI shares with the route handler.
        api_headers = response.context.get("api_headers") if hasattr(response, "context") else None
        if api_headers:
            new_remaining = api_headers.get("x-ratelimit-remaining")
            reset_timestamp = api_headers.get("x-ratelimit-reset")
            if new_remaining is not None and reset_timestamp is not None:
                ttl = int(reset_timestamp) - int(time.time())
                await redis_client.set(
                    GITHUB_API_LIMIT_KEY,
                    new_remaining,
                    ex=ttl if ttl > 0 else 1
                )
                print(f"Updated Redis: Remaining calls = {new_remaining}, Resets in {ttl}s")

# --- API Route ---
@app.get("/github/user/{username}", dependencies=[Depends(github_rate_limiter)])
async def get_github_user(username: str, response: Response):
    async with httpx.AsyncClient() as client:
        try:
            github_res = await client.get(f"https://api.github.com/users/{username}")
            github_res.raise_for_status()  # Raise an exception for 4xx/5xx responses

            # Store headers in the response context to be read by the dependency later.
            response.context = {"api_headers": github_res.headers}
            return github_res.json()
        except httpx.HTTPStatusError as e:
            # Also update the rate limit on error.
            response.context = {"api_headers": e.response.headers}
            raise HTTPException(
                status_code=e.response.status_code,
                detail=e.response.json()
            )
In this FastAPI example, github_rate_limiter is a “dependency” function. FastAPI executes the code before the yield statement before running the route handler; once the handler finishes (or raises), the code in the finally block after yield is executed. This is a powerful pattern for setup and teardown logic. We pass headers from the route handler back to the dependency through an ad-hoc response.context attribute set on the Response object that FastAPI shares between the dependency and the route handler.
Run the application with Uvicorn:
# Bash Command
uvicorn main:app --reload
Common Pitfalls
1. Race Conditions in a Distributed Environment
Imagine two application instances running in parallel. Both read the same value from Redis (e.g., “1 remaining call”). Both proceed to make an API call, using up two requests when they thought only one was available. While our header-based approach mitigates this by always trusting the server’s response, simpler “decrement-on-call” logic is vulnerable.
Solution: For simple decrementing counters, use Redis’s atomic operations like DECR or DECRBY. These commands guarantee that the decrement operation is performed as a single, uninterruptible step, preventing race conditions. For our header-based solution, the main risk is clock skew. Always add a small buffer (e.g., 5-10 seconds) to the TTL you calculate from the reset timestamp to account for network latency and minor clock differences.
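To make the atomic variant concrete, here is a minimal sketch of a decrement-on-call reservation helper. The key name, the 60-call limit, and the one-hour window are illustrative assumptions, not values taken from any real API:
# atomic_reserve.py: illustrative decrement-on-call counter using atomic DECR
import redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)

KEY = "rate-limit:example-api"   # hypothetical key for a fixed-window quota
WINDOW_SECONDS = 3600            # assumed length of the quota window
LIMIT = 60                       # assumed number of calls allowed per window

def try_reserve_call() -> bool:
    """Atomically reserve one call; returns False once the quota is exhausted."""
    # Seed the counter once per window. NX means "only set if the key is absent",
    # so concurrent instances cannot reset each other's window.
    client.set(KEY, LIMIT, ex=WINDOW_SECONDS, nx=True)

    # DECR is a single atomic step: even if many instances call it at once,
    # each receives a distinct value, so at most LIMIT reservations succeed.
    remaining = client.decr(KEY)
    return remaining >= 0

if try_reserve_call():
    print("Quota reserved, safe to call the external API")
else:
    print("Quota exhausted, back off until the window resets")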
2. Forgetting to Handle the “Limit Reached” Case Gracefully
Our code correctly returns a 429 error when the limit is reached, but what should the client do next? Simply failing the user’s request might not be the best user experience.
Solution: Implement a more sophisticated strategy. Instead of immediately failing, you could push the request into a queue (like RabbitMQ or a Redis list) with a delayed execution time. A separate worker process can then retry these jobs after the rate limit has reset. This turns a hard failure into a temporary delay, creating a much more resilient system.
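One way to sketch that idea with Redis alone is a sorted set whose score is the earliest time a job may run; a worker periodically pops everything whose score has passed. The queue name and job payload below are illustrative assumptions:
# retry_queue.py: illustrative delayed retry queue backed by a Redis sorted set
import json
import time
import redis

client = redis.Redis(host="localhost", port=6379, decode_responses=True)
QUEUE_KEY = "retry-queue:github-api"  # hypothetical queue name

def defer_request(payload, retry_after_seconds):
    """Park a request until the rate limit window has reset."""
    run_at = time.time() + retry_after_seconds
    # The score is the earliest timestamp at which the job may be retried.
    client.zadd(QUEUE_KEY, {json.dumps(payload): run_at})

def pop_due_jobs():
    """Called from a worker loop: fetch and remove jobs whose time has come."""
    now = time.time()
    due = client.zrangebyscore(QUEUE_KEY, 0, now)
    if due:
        client.zrem(QUEUE_KEY, *due)
    return [json.loads(job) for job in due]

# Example: defer a user lookup for 30 minutes, then let a worker retry it.
defer_request({"username": "octocat"}, retry_after_seconds=1800)
print(pop_due_jobs())  # empty until the 30 minutes have elapsed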
Conclusion
By proactively tracking API rate limits, you transform your application from a reactive victim of API constraints to a resilient and well-behaved citizen of the microservices ecosystem. We’ve demonstrated how to build a robust, centralized tracking system using Redis for both Express.js and Python FastAPI applications. This pattern not only prevents 429 errors but also improves reliability and provides a foundation for more advanced features like request queuing and dynamic backoff strategies.
This foundational concept can be extended to track multiple API endpoints, handle different rate limit policies, and integrate into your observability stack for better monitoring. Start implementing this in your services today to build more stable and predictable applications.
👉 Read the original article on TechResolve.blog
☕ Support my work
If this article helped you, you can buy me a coffee:
