DEV Community

Fahim Ahammed Firoz

Building an AI Chat Rate Limiter with Node.js, Express, and Vercel AI SDK

When building AI chat applications, one of the main challenges is controlling costs. Each AI request has a cost, so you cannot allow unlimited usage. A rate limiter helps solve this by restricting how many requests each user can make in a fixed time period.

In this article, we will look at how to build a chatbot backend using Node.js, Express, and the Vercel AI SDK. Our system uses a fixed window rate limiting algorithm to manage usage for different user types: Guest, Free, and Premium.


Why Rate Limiting?

Rate limiting ensures:

  • Fair usage for all users
  • Protection from abuse
  • Cost control for AI services
  • Predictable system performance

For example, if a Guest makes 1,000 requests per hour, your costs can skyrocket. With a limiter, you decide how many requests are allowed.


User Types and Limits

| User Type | Limit (per hour) | Notes                               |
| --------- | ---------------- | ----------------------------------- |
| Guest     | 3                | No login required                   |
| Free      | 10               | Logged-in users with a free plan    |
| Premium   | 50               | Logged-in users with a premium plan |

The system identifies:

  • Guests by IP address
  • Logged-in users by their user.id (from JWT token)
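This identification logic can be sketched in a small helper. The names below (`LIMITS`, `getRateLimitKey`) are illustrative, not taken from the article's repository:

```javascript
// Requests-per-hour limits for each user type (matches the table above).
const LIMITS = { guest: 3, free: 10, premium: 50 };

// Derive the key used to track a request: logged-in users are tracked by
// their user.id (decoded from the JWT), guests by their IP address.
function getRateLimitKey(req, user) {
  if (user && user.id) {
    return { key: `user:${user.id}`, limit: LIMITS[user.plan] ?? LIMITS.free };
  }
  return { key: `ip:${req.ip}`, limit: LIMITS.guest };
}
```

A Premium user with id `u1` would be tracked as `user:u1` with a limit of 50, while an anonymous request from `1.2.3.4` would be tracked as `ip:1.2.3.4` with a limit of 3.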

How the Rate Limiter Works

  1. A request is sent to /api/chat.
  2. If a JWT token is present, it is verified; if it is missing or invalid, the user is treated as a Guest.
  3. The system picks the limit based on user type:
    • Guest = 3
    • Free = 10
    • Premium = 50
  4. The request counter is checked for the current 1-hour window.
  5. If the user still has requests left, the system forwards the query to the AI SDK.
  6. If the limit is exceeded, the server returns a 429 error with a clear message.
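The steps above can be sketched as a handler factory. The helper names (`verifyToken`, `checkRateLimit`, `askAI`) are illustrative placeholders, injected so the routing logic can stand on its own; they are not the article repository's exact API:

```javascript
// Simplified sketch of the /api/chat handler described above.
function createChatHandler({ verifyToken, checkRateLimit, askAI }) {
  return async function chatHandler(req, res) {
    // Steps 1-2: a valid JWT yields a user object; otherwise treat as Guest.
    const user = verifyToken(req.headers.authorization);
    // Step 3: pick the hourly limit by user type.
    const limit = user ? (user.plan === 'premium' ? 50 : 10) : 3;
    const key = user ? `user:${user.id}` : `ip:${req.ip}`;
    // Steps 4-6: check the fixed-window counter; reject with 429 if exhausted.
    if (!checkRateLimit(key, limit)) {
      return res
        .status(429)
        .json({ error: 'Rate limit exceeded. Please try again later.' });
    }
    // Within the limit: forward the query to the AI backend.
    const reply = await askAI(req.body.message);
    return res.json({ reply });
  };
}
```

In a real Express app this would be mounted with something like `app.post('/api/chat', createChatHandler({ ... }))`.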


Fixed Window Algorithm

We use a Fixed Window Algorithm:

  • Keep a count of requests per user (or IP).
  • If the request is within the same window (1 hour), increase the count.
  • If the window has expired, reset the count and start a new window.
  • If the count is above the limit, reject the request.

Example entry in the in-memory store:

{
  "user123": {
    "count": 7,
    "windowStart": 1695206400000
  }
}

This means user123 has made 7 requests in the current window, which started at timestamp 1695206400000 (milliseconds since the Unix epoch).
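A minimal in-memory implementation of this fixed-window counter might look like the following (the store shape mirrors the example above; the function and constant names are illustrative):

```javascript
const WINDOW_MS = 60 * 60 * 1000; // 1-hour fixed window

// In-memory store: key -> { count, windowStart }, as in the example above.
const store = new Map();

// Returns true if the request is allowed (and records it), or false if the
// caller has already exhausted their limit for the current window.
function checkRateLimit(key, limit, now = Date.now()) {
  const entry = store.get(key);
  // New user, or the previous window has expired: start a fresh window.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    store.set(key, { count: 1, windowStart: now });
    return true;
  }
  // Same window: reject if the limit is reached, otherwise count the request.
  if (entry.count >= limit) return false;
  entry.count += 1;
  return true;
}
```

Note that a plain `Map` resets on every server restart and is not shared across processes; a production setup would typically move this counter into Redis or a similar shared store.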


Flow Diagram

The process can be explained with this flow:


  • Check if the request has a JWT token.
  • If valid, assign user type (Free or Premium). Otherwise, assign Guest.
  • Apply the rate limit based on user type.
  • If the user is new, start tracking requests.
  • If the time window has expired, reset the counter.
  • If the count is within the limit, allow the request. Otherwise, block it.

Conclusion

By combining Express, JWT authentication, and the Vercel AI SDK, we built a chatbot backend with different rate limits for Guests, Free users, and Premium users.

This ensures cost control, fair usage, and a better experience for all users.

Source code: GitHub - chatbot-throttle

⭐ If you find this project useful, don’t forget to star the repository on GitHub!
