When building AI chat applications, one of the main challenges is controlling costs. Each AI request has a cost, so you cannot allow unlimited usage. A rate limiter helps solve this by restricting how many requests each user can make in a fixed time period.
In this article, we will look at how to build a chatbot backend using Node.js, Express, and the Vercel AI SDK. Our system uses a fixed window rate limiting algorithm to manage usage for different user types: Guest, Free, and Premium.
Why Rate Limiting?
Rate limiting ensures:
- Fair usage for all users
- Protection from abuse
- Cost control for AI services
- Predictable system performance
For example, if a Guest makes 1,000 requests per hour, your costs can skyrocket. With a limiter, you decide how many requests are allowed.
User Types and Limits
| User Type | Limit (per hour) | Notes |
|---|---|---|
| Guest | 3 | No login required |
| Free | 10 | Logged-in users with a free plan |
| Premium | 50 | Logged-in users with a premium plan |
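In code, these tiers can live in one small config object that the rest of the middleware reads from. The names below (`RATE_LIMITS`, `WINDOW_MS`) are illustrative, not taken from the repository:

```js
// Requests allowed per 1-hour window for each user type (illustrative names).
const RATE_LIMITS = {
  guest: 3,
  free: 10,
  premium: 50,
};

// Length of the fixed window: one hour in milliseconds.
const WINDOW_MS = 60 * 60 * 1000;
```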
The system identifies:
- Guests by their IP address
- Logged-in users by their `user.id` (from the JWT token)
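A minimal identification middleware could look like the sketch below. It assumes the `jsonwebtoken` package, a `JWT_SECRET` environment variable, and a token payload with `id` and `plan` fields; the real project may name these differently.

```js
import jwt from "jsonwebtoken";

// Attaches req.user = { key, type } for the rate limiter to use.
// Guests are keyed by IP address; logged-in users by the user id from the JWT.
function identifyUser(req, res, next) {
  const authHeader = req.headers.authorization || "";
  const token = authHeader.startsWith("Bearer ") ? authHeader.slice(7) : null;

  if (!token) {
    req.user = { key: `ip:${req.ip}`, type: "guest" };
    return next();
  }

  try {
    const payload = jwt.verify(token, process.env.JWT_SECRET);
    // The "plan" claim ("free" or "premium") is an assumed payload field.
    req.user = { key: `user:${payload.id}`, type: payload.plan || "free" };
    next();
  } catch (err) {
    // Invalid or expired token: fall back to treating the caller as a Guest.
    req.user = { key: `ip:${req.ip}`, type: "guest" };
    next();
  }
}
```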
How the Rate Limiter Works
- A request is sent to `/api/chat`.
- If there is a JWT token, it is verified. If it is missing, the user is treated as a Guest.
- The system decides the limit based on user type:
  - Guest = 3
  - Free = 10
  - Premium = 50
- The request counter is checked for the current 1-hour window.
- If the user has requests left, the system forwards the query to the AI SDK.
- If the limit is exceeded, the server returns a 429 error with a clear message (see the middleware sketch below).
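Here is a sketch of that limiter as Express middleware, reusing the `RATE_LIMITS` and `WINDOW_MS` values from the config sketch above. The in-memory `Map` and the error message wording are illustrative choices, not the project's exact code:

```js
// In-memory store: key -> { count, windowStart }.
// Enough for a single Node.js process; multiple instances would need a shared
// store such as Redis so all processes see the same counters.
const usage = new Map();

function rateLimiter(req, res, next) {
  const { key, type } = req.user;
  const limit = RATE_LIMITS[type] ?? RATE_LIMITS.guest;
  const now = Date.now();

  let entry = usage.get(key);

  // New key, or the previous 1-hour window has expired: start a fresh window.
  if (!entry || now - entry.windowStart >= WINDOW_MS) {
    entry = { count: 0, windowStart: now };
    usage.set(key, entry);
  }

  if (entry.count >= limit) {
    const retryAfterSeconds = Math.ceil((entry.windowStart + WINDOW_MS - now) / 1000);
    return res.status(429).json({
      error: `Rate limit exceeded: ${type} users get ${limit} requests per hour. Please try again later.`,
      retryAfterSeconds,
    });
  }

  entry.count += 1;
  next();
}
```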
Fixed Window Algorithm
We use a Fixed Window Algorithm:
- Keep a count of requests per user (or IP).
- If the request is within the same window (1 hour), increase the count.
- If the window has expired, reset the count and start a new window.
- If the count is above the limit, reject the request.
Example of the in-memory store:

```json
{
  "user123": {
    "count": 7,
    "windowStart": 1695206400000
  }
}
```
This means user123 has made 7 requests since the last window started.
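To make the decision rule concrete, here is a small standalone helper (names are illustrative) that classifies an entry like the one above as reset, reject, or allow, followed by two example calls:

```js
// Decide what the limiter would do with an existing entry at a given limit.
//   "reset"  -> the window has expired; start a new one with count = 0
//   "reject" -> still inside the window and already at the limit (respond with 429)
//   "allow"  -> still inside the window with requests left (increment and forward)
function evaluateEntry(entry, limit, windowMs, now = Date.now()) {
  if (!entry || now - entry.windowStart >= windowMs) return "reset";
  if (entry.count >= limit) return "reject";
  return "allow";
}

const oneHour = 60 * 60 * 1000;

// An entry like the one above, with a window that started 10 minutes ago:
const entry = { count: 7, windowStart: Date.now() - 10 * 60 * 1000 };

console.log(evaluateEntry(entry, 10, oneHour)); // "allow"  -- a Free user still has 3 requests left
console.log(evaluateEntry(entry, 3, oneHour));  // "reject" -- a Guest would already be over the limit
```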
Flow Diagram
The process can be explained with this flow:
- Check if the request has a JWT token.
- If valid, assign user type (Free or Premium). Otherwise, assign Guest.
- Apply the rate limit based on user type.
- If the user is new, start tracking requests.
- If the time window has expired, reset the counter.
- If the count is within the limit, allow the request. Otherwise, block it.
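Putting the pieces together, the `/api/chat` route can chain the identification and rate-limiting middleware before calling the AI SDK. The `generateText` call, the `openai("gpt-4o-mini")` model, and the `{ message }` request body shape are assumptions based on the AI SDK's documented usage rather than the project's exact code:

```js
import express from "express";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const app = express();
app.use(express.json());

// identifyUser and rateLimiter are the middleware sketched earlier in this article.
app.post("/api/chat", identifyUser, rateLimiter, async (req, res) => {
  try {
    const { text } = await generateText({
      model: openai("gpt-4o-mini"), // assumed model; any provider supported by the AI SDK works
      prompt: req.body.message,     // assumed body shape: { "message": "..." }
    });
    res.json({ reply: text });
  } catch (err) {
    res.status(500).json({ error: "AI request failed" });
  }
});

app.listen(3000, () => console.log("Chatbot backend listening on port 3000"));
```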
Conclusion
By combining Express, JWT authentication, and the Vercel AI SDK, we built a chatbot backend with different rate limits for Guests, Free users, and Premium users.
This ensures cost control, fair usage, and a better experience for all users.
Source code: GitHub - chatbot-throttle
⭐ If you find this project useful, don’t forget to star the repository on GitHub!