DEV Community

Fahim Ahammed Firoz

Building an AI Chat Rate Limiter with Node.js, Express, and Vercel AI SDK

When building AI chat applications, one of the main challenges is controlling costs. Each AI request has a cost, so you cannot allow unlimited usage. A rate limiter helps solve this by restricting how many requests each user can make in a fixed time period.

In this article, we will look at how to build a chatbot backend using Node.js, Express, and the Vercel AI SDK. Our system uses a fixed window rate limiting algorithm to manage usage for different user types: Guest, Free, and Premium.


Why Rate Limiting?

Rate limiting ensures:

  • Fair usage for all users
  • Protection from abuse
  • Cost control for AI services
  • Predictable system performance

For example, if a Guest makes 1,000 requests per hour, your costs can skyrocket. With a limiter, you decide how many requests are allowed.


User Types and Limits

| User Type | Limit (per hour) | Notes                               |
| --------- | ---------------- | ----------------------------------- |
| Guest     | 3                | No login required                   |
| Free      | 10               | Logged-in users with a free plan    |
| Premium   | 50               | Logged-in users with a premium plan |

The system identifies:

  • Guests by IP address
  • Logged-in users by their user.id (from JWT token)
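This identification logic can be sketched in a small helper. The names below (`LIMITS`, `getRateLimitKey`) are illustrative, not taken from the article's repository:

```javascript
// Requests-per-hour limits for each user type (matches the table above).
const LIMITS = { guest: 3, free: 10, premium: 50 };

// Derive the key used to track a request: logged-in users are tracked by
// their user.id (decoded from the JWT), guests by their IP address.
function getRateLimitKey(req, user) {
  if (user && user.id) {
    return { key: `user:${user.id}`, limit: LIMITS[user.plan] ?? LIMITS.free };
  }
  return { key: `ip:${req.ip}`, limit: LIMITS.guest };
}
```

A Premium user with id `u1` would be tracked as `user:u1` with a limit of 50, while an anonymous request from `1.2.3.4` would be tracked as `ip:1.2.3.4` with a limit of 3.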

How the Rate Limiter Works

  1. A request is sent to /api/chat.
  2. If a JWT token is present, it is verified; if it is missing or invalid, the user is treated as a Guest.
  3. The system picks the limit based on user type:
    • Guest = 3
    • Free = 10
    • Premium = 50
  4. The request counter is checked for the current 1-hour window.
  5. If the user still has requests left, the system forwards the query to the AI SDK.
  6. If the limit is exceeded, the server returns a 429 error with a clear message.
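The steps above can be sketched as a handler factory. The helper names (`verifyToken`, `checkRateLimit`, `askAI`) are illustrative placeholders, injected so the routing logic can stand on its own; they are not the article repository's exact API:

```javascript
// Simplified sketch of the /api/chat handler described above.
function createChatHandler({ verifyToken, checkRateLimit, askAI }) {
  return async function chatHandler(req, res) {
    // Steps 1-2: a valid JWT yields a user object; otherwise treat as Guest.
    const user = verifyToken(req.headers.authorization);
    // Step 3: pick the hourly limit by user type.
    const limit = user ? (user.plan === 'premium' ? 50 : 10) : 3;
    const key = user ? `user:${user.id}` : `ip:${req.ip}`;
    // Steps 4-6: check the fixed-window counter; reject with 429 if exhausted.
    if (!checkRateLimit(key, limit)) {
      return res
        .status(429)
        .json({ error: 'Rate limit exceeded. Please try again later.' });
    }
    // Within the limit: forward the query to the AI backend.
    const reply = await askAI(req.body.message);
    return res.json({ reply });
  };
}
```

In a real Express app this would be mounted with something like `app.post('/api/chat', createChatHandler({ ... }))`.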


Fixed Window Algorithm

We use a Fixed Window Algorithm:

  • Keep a count of requests per user (or IP).
  • If the request is within the same window (1 hour), increase the count.
  • If the window has expired, reset the count and start a new window.
  • If the count is above the limit, reject the request.

Example entry in the in-memory store:

{
  "user123": {
    "count": 7,
    "windowStart": 1695206400000
  }
}

This means user123 has made 7 requests in the current window, which started at timestamp 1695206400000 (milliseconds since the Unix epoch).
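A minimal in-memory implementation of this fixed-window counter might look like the following (the store shape mirrors the example above; the function and constant names are illustrative):

```javascript
const WINDOW_MS = 60 * 60 * 1000; // 1-hour fixed window

// In-memory store: key -> { count, windowStart }, as in the example above.
const store = new Map();

// Returns true if the request is allowed (and records it), or false if the
// caller has already exhausted their limit for the current window.
function checkRateLimit(key, limit, now = Date.now()) {
  const entry = store.get(key);
  // New user, or the previous window has expired: start a fresh window.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    store.set(key, { count: 1, windowStart: now });
    return true;
  }
  // Same window: reject if the limit is reached, otherwise count the request.
  if (entry.count >= limit) return false;
  entry.count += 1;
  return true;
}
```

Note that a plain `Map` resets on every server restart and is not shared across processes; a production setup would typically move this counter into Redis or a similar shared store.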


Flow Diagram

The process can be explained with this flow:


  • Check if the request has a JWT token.
  • If valid, assign user type (Free or Premium). Otherwise, assign Guest.
  • Apply the rate limit based on user type.
  • If the user is new, start tracking requests.
  • If the time window has expired, reset the counter.
  • If the count is within the limit, allow the request. Otherwise, block it.

Conclusion

By combining Express, JWT authentication, and the Vercel AI SDK, we built a chatbot backend with different rate limits for Guests, Free users, and Premium users.

This ensures cost control, fair usage, and a better experience for all users.

Source code: GitHub - chatbot-throttle

⭐ If you find this project useful, don’t forget to star the repository on GitHub!
