Building a Production-Ready Rate Limiter with Redis in NestJS (Part 2: Workspace-Aware Business Logic)
In Part 1, we built the core Redis rate limiter with atomic Lua scripts. Now we'll add the intelligence that makes it production-ready for a multi-tenant SaaS application.
What we're building:
- Workspace-isolated rate limiting (Slack-style)
- Plan-based limits (Free, Pro, Enterprise)
- Security-focused limits for authentication
- Standard HTTP rate limit headers
- Smart identifier generation
This is the business logic layer that sits on top of our Redis implementation. Let's dive in.
The Challenge: Multi-Tenant Rate Limiting
In a SaaS application, you can't treat all requests equally. Consider these scenarios:
- Workspace A on a Free plan shouldn't be able to exhaust the rate limit for Workspace B on an Enterprise plan
- A user in Workspace A has different limits than the same user in Workspace B
- Authentication endpoints need stricter limits regardless of subscription plan
- Different HTTP methods need different limits (writes are more expensive than reads)
Traditional rate limiters use simple identifiers like IP address or user ID. We need something smarter.
Architecture Overview
Our RateLimitHandler service orchestrates the rate limiting strategy:
import { Injectable, HttpStatus, Logger } from '@nestjs/common';
import type { Request, Response } from 'express';
import { RateLimitConfig } from '../interfaces/security.interface';
import { RedisRateLimiter } from './redis-rate-limiter.service';
interface RequestWithWorkspace extends Request {
workspace?: {
id: string;
slug: string;
plan: string;
};
workspaceId?: string;
user?: {
id: string;
_id?: string;
};
}
@Injectable()
export class RateLimitHandler {
private readonly logger = new Logger(RateLimitHandler.name);
constructor(private readonly redisRateLimiter: RedisRateLimiter) {}
}
The RequestWithWorkspace interface extends Express's Request to include workspace context, which your authentication middleware should populate.
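For reference, here's a minimal sketch of middleware that could populate that context. Both WorkspaceService and the x-workspace-slug header are placeholders for your own auth and tenant-resolution logic:
// workspace-context.middleware.ts (illustrative sketch)
import { Injectable, NestMiddleware } from '@nestjs/common';
import type { Response, NextFunction } from 'express';
@Injectable()
export class WorkspaceContextMiddleware implements NestMiddleware {
  // Hypothetical service that resolves a workspace from your database
  constructor(private readonly workspaces: WorkspaceService) {}
  async use(req: RequestWithWorkspace, res: Response, next: NextFunction) {
    // Resolve the tenant however your app does it: subdomain, header, or JWT claim
    const slug = req.headers['x-workspace-slug'] as string | undefined;
    if (slug) {
      const ws = await this.workspaces.findBySlug(slug);
      if (ws) {
        req.workspace = { id: ws.id, slug: ws.slug, plan: ws.plan };
        req.workspaceId = ws.id;
      }
    }
    next();
  }
}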
Smart Identifier Generation
The identifier is crucial - it determines the scope of rate limiting. Here's how we generate workspace-aware identifiers:
private getRateLimitIdentifier(req: RequestWithWorkspace): string {
const ip = this.getClientIP(req);
const userId = req.user?.id || req.user?._id || 'anonymous';
const endpoint = req.route?.path || req.path;
// Global routes (no workspace context needed)
if (this.isGlobalRoute(req.path)) {
return `global:${ip}:${userId}:${endpoint}`;
}
// Workspace routes (include workspace ID)
const workspaceId = req.workspaceId || req.workspace?.id || 'unknown';
return `workspace:${workspaceId}:${ip}:${userId}:${endpoint}`;
}
private isGlobalRoute(path: string): boolean {
const globalRoutes = [
'/auth/login',
'/auth/register',
'/auth/forgot-password',
'/auth/reset-password',
'/health',
'/metrics',
];
return globalRoutes.some((route) => path.startsWith(route));
}
private getClientIP(req: Request): string {
const xfwd = (req.headers['x-forwarded-for'] as string | undefined)
?.split(',')[0]
?.trim();
const xreal = (req.headers['x-real-ip'] as string | undefined)?.trim();
const conn = (req as any).connection?.remoteAddress as string | undefined;
const sock = (req.socket as any)?.remoteAddress as string | undefined;
return xfwd || xreal || conn || sock || '127.0.0.1';
}
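A caveat on getClientIP: x-forwarded-for is client-controlled unless a trusted proxy sets it, so only honor it when your app genuinely sits behind one. With Express under NestJS you can declare that trust explicitly; a sketch assuming a single load-balancer hop:
// main.ts - only trust forwarded headers when behind a known proxy
import { NestFactory } from '@nestjs/core';
import { NestExpressApplication } from '@nestjs/platform-express';
import { AppModule } from './app.module';
async function bootstrap() {
  const app = await NestFactory.create<NestExpressApplication>(AppModule);
  app.set('trust proxy', 1); // trust exactly one proxy hop (your load balancer)
  await app.listen(3000);
}
bootstrap();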
Why this structure matters:
Global routes - Authentication and health checks are identified by the global: prefix. They're not workspace-specific, so a login attempt from Workspace A doesn't affect Workspace B.
Workspace routes - Include the workspace ID. This means:
- Each workspace has independent rate limits
- User Alice in Workspace A has separate limits from Alice in Workspace B
- This prevents "noisy neighbor" problems in multi-tenant systems
IP + User + Endpoint - The combination prevents:
- Single IP exhausting limits for all users (important in corporate networks)
- Single user exhausting limits across multiple IPs
- Different endpoints interfering with each other
Real-world example:
# Same user, different workspaces = different limits
workspace:ws_123:192.168.1.1:user_abc:/api/projects
workspace:ws_456:192.168.1.1:user_abc:/api/projects
# Same workspace, different endpoints = different limits
workspace:ws_123:192.168.1.1:user_abc:/api/projects
workspace:ws_123:192.168.1.1:user_abc:/api/tasks
# Global routes = isolated from workspaces
global:192.168.1.1:anonymous:/auth/login
Context-Aware Rate Limit Configuration
Different routes need different strategies. Authentication endpoints need strict limits for security, while workspace operations vary by subscription plan:
private getRateLimitConfig(req: RequestWithWorkspace): RateLimitConfig {
const path = req.path;
const method = req.method;
const plan = req.workspace?.plan || 'free';
// Authentication routes - STRICT (global, no plan variation)
if (path.includes('/auth/login')) {
return {
windowMs: 15 * 60 * 1000, // 15 minutes
maxRequests: 5, // Only 5 attempts
blockDurationMs: 60 * 60 * 1000, // 1 hour block
};
}
if (path.includes('/auth/register')) {
return {
windowMs: 60 * 60 * 1000, // 1 hour
maxRequests: 3, // Only 3 registrations
blockDurationMs: 24 * 60 * 60 * 1000, // 24 hour block
};
}
if (path.includes('/auth/')) {
return {
windowMs: 5 * 60 * 1000, // 5 minutes
maxRequests: 10,
blockDurationMs: 30 * 60 * 1000, // 30 minutes block
};
}
// Workspace-specific routes - PLAN-BASED
return this.getWorkspaceRateLimitByPlan(plan, method);
}
Security-first design decisions:
Login limits are aggressive - 5 attempts in 15 minutes protects against brute force attacks. After 5 failures, the attacker is blocked for an hour.
Registration is even stricter - 3 registrations per hour prevents automated account creation. 24-hour block discourages abuse.
Authentication limits are global - They don't vary by subscription plan. Security is not a premium feature.
Progressive blocking - The blockDurationMs setting temporarily bans abusive users, giving your system time to recover.
Plan-Based Rate Limits
Here's where subscription tiers translate into actual technical limits:
private getWorkspaceRateLimitByPlan(
plan: string,
method: string,
): RateLimitConfig {
const planLimits = {
free: {
POST: { windowMs: 60 * 1000, maxRequests: 20 },
PUT: { windowMs: 60 * 1000, maxRequests: 20 },
DELETE: { windowMs: 60 * 1000, maxRequests: 10 },
PATCH: { windowMs: 60 * 1000, maxRequests: 20 },
GET: { windowMs: 60 * 1000, maxRequests: 100 },
},
pro: {
POST: { windowMs: 60 * 1000, maxRequests: 100 },
PUT: { windowMs: 60 * 1000, maxRequests: 100 },
DELETE: { windowMs: 60 * 1000, maxRequests: 50 },
PATCH: { windowMs: 60 * 1000, maxRequests: 100 },
GET: { windowMs: 60 * 1000, maxRequests: 500 },
},
enterprise: {
POST: { windowMs: 60 * 1000, maxRequests: 1000 },
PUT: { windowMs: 60 * 1000, maxRequests: 1000 },
DELETE: { windowMs: 60 * 1000, maxRequests: 500 },
PATCH: { windowMs: 60 * 1000, maxRequests: 1000 },
GET: { windowMs: 60 * 1000, maxRequests: 5000 },
},
};
const limits = planLimits[plan as keyof typeof planLimits] || planLimits.free;
const methodLimit = limits[method as keyof typeof limits] || limits.GET;
return {
...methodLimit,
blockDurationMs: 5 * 60 * 1000, // 5 minutes block for all workspace operations
};
}
Design rationale:
Write operations cost more - POST/PUT/PATCH operations are more expensive (database writes, validation, business logic). They get stricter limits.
DELETE is most restricted - Data deletion is sensitive and often irreversible. Even Enterprise gets fewer DELETE operations than other writes.
GET requests are abundant - Read operations are cheaper and more common. Free tier gets 100/min, Enterprise gets 5000/min.
Clear value ladder - Free → Pro = 5x increase, Pro → Enterprise = 10x increase. This creates a tangible reason to upgrade.
Consistent windows - All plans use 60-second windows for predictability. Users can easily reason about their limits.
Real-world comparison:
Free tier: 20 writes/min = 1,200/hour = ~29K/day
Pro tier: 100 writes/min = 6,000/hour = ~144K/day
Enterprise: 1,000 writes/min = 60,000/hour = ~1.4M/day
The Main Rate Limit Check
Now let's implement the method that ties it all together:
async checkRateLimit(
req: RequestWithWorkspace,
res: Response,
): Promise<boolean> {
const identifier = this.getRateLimitIdentifier(req);
const config = this.getRateLimitConfig(req);
this.logger.debug(`Rate limit check - Identifier: ${identifier}`);
const result = await this.redisRateLimiter.checkRateLimit(
identifier,
config,
1, // increment by 1
);
// Set standard rate limit headers
res.setHeader('X-RateLimit-Limit', config.maxRequests.toString());
res.setHeader(
'X-RateLimit-Remaining',
Math.max(0, result.remaining).toString(),
);
res.setHeader('X-RateLimit-Reset', result.resetTime.toISOString());
if (result.isBlocked) {
if (result.retryAfter) {
res.setHeader('Retry-After', result.retryAfter.toString());
}
this.logger.warn(`Rate limit exceeded for identifier: ${identifier}`);
res.status(HttpStatus.TOO_MANY_REQUESTS).json({
success: false,
message: 'Too many requests. Please try again later.',
retryAfter: result.retryAfter,
timestamp: new Date().toISOString(),
});
return false;
}
return true;
}
What's happening here:
- Generate context-aware identifier - Uses workspace, user, IP, and endpoint
- Get appropriate config - Based on route type and subscription plan
- Check with Redis - Our atomic Lua script handles the logic
- Set response headers - Standard rate limit headers (more on this below)
- Return structured error - If blocked, send 429 with retry information
- Return boolean - Calling code can easily check if request should proceed
Standard Rate Limit Headers
Your API clients need to know their rate limit status. The 429 status code (and its use of Retry-After) comes from RFC 6585; the X-RateLimit-* headers are a widely adopted convention rather than a formal standard:
// Always present
X-RateLimit-Limit: 100 // Maximum requests allowed in window
X-RateLimit-Remaining: 73 // Requests remaining before limit
X-RateLimit-Reset: 2026-01-07T15:30:00.000Z // When counter resets
// Present when blocked (429 response)
Retry-After: 300 // Seconds until unblocked
These headers let client applications:
- Display accurate "X requests remaining" to users
- Implement intelligent retry logic with exponential backoff
- Show countdown timers until rate limit reset
- Warn users before hitting limits
Client implementation example:
// Client-side handling
const response = await fetch('/api/projects', {
method: 'POST',
body: JSON.stringify(project),
});
if (response.status === 429) {
const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
console.log(`Rate limited. Retry in ${retryAfter} seconds`);
// Show user-friendly message
toast.error(`Too many requests. Please wait ${retryAfter} seconds.`);
// Automatically retry after delay
setTimeout(() => retryRequest(), retryAfter * 1000);
} else {
const remaining = response.headers.get('X-RateLimit-Remaining');
if (remaining && parseInt(remaining, 10) < 10) {
toast.warning('Approaching rate limit');
}
}
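The setTimeout retry above handles a single 429. For a production client you'd typically layer in exponential backoff, preferring the server's Retry-After hint whenever it's present. A minimal sketch:
// Retry with exponential backoff, honoring Retry-After when the server sends it
async function fetchWithBackoff(
  url: string,
  init: RequestInit,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, init);
    if (response.status !== 429) return response;
    const retryAfter = response.headers.get('Retry-After');
    const delayMs = retryAfter
      ? parseInt(retryAfter, 10) * 1000 // the server knows best
      : Math.min(2 ** attempt * 1000, 30_000); // 1s, 2s, 4s... capped at 30s
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error('Rate limited: retries exhausted');
}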
Integration with NestJS Guards
To use this in your NestJS application, create a guard:
import { Injectable, CanActivate, ExecutionContext } from '@nestjs/common';
import { Reflector } from '@nestjs/core';
import { RateLimitHandler } from './rate-limit-handler.service';
import { SKIP_RATE_LIMIT_KEY } from './decorators/skip-rate-limit.decorator';
@Injectable()
export class RateLimitGuard implements CanActivate {
constructor(
private readonly rateLimitHandler: RateLimitHandler,
private readonly reflector: Reflector,
) {}
async canActivate(context: ExecutionContext): Promise<boolean> {
// Check if rate limiting should be skipped
const skipRateLimit = this.reflector.getAllAndOverride<boolean>(
SKIP_RATE_LIMIT_KEY,
[context.getHandler(), context.getClass()],
);
if (skipRateLimit) {
return true;
}
const request = context.switchToHttp().getRequest();
const response = context.switchToHttp().getResponse();
return await this.rateLimitHandler.checkRateLimit(request, response);
}
}
Custom Decorator for Skipping Rate Limits
For endpoints that shouldn't be rate limited (like health checks or internal admin routes), create a custom decorator:
// decorators/skip-rate-limit.decorator.ts
import { SetMetadata } from '@nestjs/common';
export const SKIP_RATE_LIMIT_KEY = 'skipRateLimit';
export const SkipRateLimit = () => SetMetadata(SKIP_RATE_LIMIT_KEY, true);
The RateLimitGuard shown earlier already reads this metadata through the Reflector, so no further guard changes are needed.
Usage:
@Controller('health')
export class HealthController {
@Get()
@SkipRateLimit() // This endpoint won't be rate limited
check() {
return { status: 'ok' };
}
}
Apply globally:
// main.ts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import { RateLimitGuard } from './security/guards/rate-limit.guard';
async function bootstrap() {
const app = await NestFactory.create(AppModule);
// Apply rate limiting to all routes
app.useGlobalGuards(app.get(RateLimitGuard));
await app.listen(3000);
}
Or selectively:
// controller.ts
@Controller('projects')
@UseGuards(RateLimitGuard) // Apply to entire controller
export class ProjectsController {
@Post()
create(@Body() dto: CreateProjectDto) {
// This endpoint is rate limited
}
@Get()
@SkipRateLimit() // Custom decorator to skip rate limiting
findAll() {
// This endpoint is NOT rate limited
}
}
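If you prefer global registration without touching main.ts, NestJS's APP_GUARD token wires the guard up from inside a module and lets Nest handle the dependency injection. A sketch (the full SecurityModule appears below):
// security.module.ts - global registration via DI
import { APP_GUARD } from '@nestjs/core';
@Module({
  providers: [
    RedisRateLimiter,
    RateLimitHandler,
    { provide: APP_GUARD, useClass: RateLimitGuard },
  ],
})
export class SecurityModule {}
With this approach, skip the useGlobalGuards call in main.ts - registering both would run the guard twice.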
Real-World Production Scenarios
Let's walk through some practical examples:
Scenario 1: Normal User Activity
// User in Free tier workspace making requests
Request 1: POST /api/projects
→ Identifier: workspace:ws_free:192.168.1.1:user_123:/api/projects
→ Limit: 20/min
→ Result: ✅ Allowed (1/20 used)
→ Headers: X-RateLimit-Remaining: 19
Request 2: POST /api/projects (10 seconds later)
→ Same identifier
→ Result: ✅ Allowed (2/20 used)
→ Headers: X-RateLimit-Remaining: 18
Scenario 2: Hitting Rate Limit
// User makes 21st POST request in same minute
Request 21: POST /api/projects
→ Limit: 20/min
→ Result: ❌ Blocked
→ Status: 429 Too Many Requests
→ Headers:
X-RateLimit-Limit: 20
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 2026-01-07T15:31:00.000Z
Retry-After: 37
→ Response: {
success: false,
message: "Too many requests. Please try again later.",
retryAfter: 37
}
Scenario 3: Workspace Isolation
// Same user, different workspaces
Workspace A (Free): POST /api/projects
→ Identifier: workspace:ws_free:192.168.1.1:user_123:/api/projects
→ Result: ❌ Blocked (hit 20/min limit)
Workspace B (Enterprise): POST /api/projects
→ Identifier: workspace:ws_ent:192.168.1.1:user_123:/api/projects
→ Result: ✅ Allowed (1/1000 used)
→ Different workspace = independent rate limit!
Scenario 4: Brute Force Protection
// Attacker trying to brute force login
Attempt 1-5: POST /auth/login (wrong password)
→ Identifier: global:192.168.1.1:anonymous:/auth/login
→ Result: ✅ Allowed (but login fails)
Attempt 6: POST /auth/login
→ Result: ❌ Blocked for 1 hour
→ Status: 429
→ Headers: Retry-After: 3600
// Even if they switch IPs, user-based blocking kicks in
// (if you track user ID in login attempts)
Module Setup
Wire everything together in your NestJS module:
// security.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { RedisRateLimiter } from './services/redis-rate-limiter.service';
import { RateLimitHandler } from './services/rate-limit-handler.service';
import { RateLimitGuard } from './guards/rate-limit.guard';
@Module({
imports: [ConfigModule],
providers: [
RedisRateLimiter,
RateLimitHandler,
RateLimitGuard,
],
exports: [
RedisRateLimiter,
RateLimitHandler,
RateLimitGuard,
],
})
export class SecurityModule {}
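For app.get(RateLimitGuard) in main.ts to resolve, the root module must import SecurityModule. Assuming a conventional AppModule:
// app.module.ts
import { Module } from '@nestjs/common';
import { SecurityModule } from './security/security.module';
@Module({
  imports: [SecurityModule /* ...your other modules */],
})
export class AppModule {}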
Monitoring and Alerting
In production, you need visibility into rate limiting behavior:
// Add a monitoring endpoint (admin only)
import { Controller, Get, Post, Param } from '@nestjs/common';
import { RedisRateLimiter } from './services/redis-rate-limiter.service';
@Controller('admin/rate-limits')
export class RateLimitMonitoringController {
constructor(private readonly redisRateLimiter: RedisRateLimiter) {}
@Get('top-consumers')
async getTopConsumers() {
return await this.redisRateLimiter.getTopConsumers(50);
}
@Get('health')
async getHealth() {
return await this.redisRateLimiter.healthCheck();
}
@Get('metrics')
async getMetrics() {
return await this.redisRateLimiter.getMetrics();
}
@Post('reset/:identifier')
async resetLimit(@Param('identifier') identifier: string) {
await this.redisRateLimiter.resetRateLimit(identifier);
return { success: true, message: 'Rate limit reset' };
}
}
Set up alerts:
// Use your monitoring service (DataDog, New Relic, etc.); this.metrics stands in for your own client
this.logger.warn(`Rate limit exceeded for ${identifier}`, {
workspace: req.workspace?.id,
user: req.user?.id,
endpoint: req.path,
plan: req.workspace?.plan,
});
// Alert if too many 429 responses
if (result.isBlocked) {
this.metrics.increment('rate_limit.blocked', {
plan: req.workspace?.plan,
endpoint: req.path,
});
}
Advanced Patterns
Per-Endpoint Overrides
// Some endpoints need custom limits
if (path.includes('/api/exports')) {
return {
windowMs: 60 * 60 * 1000, // 1 hour
maxRequests: 5, // Only 5 exports/hour
blockDurationMs: 0, // Don't block, just limit
};
}
User-Specific Overrides
// VIP users get higher limits (getUserTier is your own DB or cache lookup)
const userTier = await this.getUserTier(req.user?.id);
if (userTier === 'vip') {
config.maxRequests *= 2; // Double their limit
}
Burst Handling (Future Enhancement)
The current implementation uses fixed windows. For burst handling, you'd need to implement a token bucket algorithm:
// Future enhancement: Token bucket for burst handling
// This would require extending RateLimitConfig:
interface TokenBucketConfig extends RateLimitConfig {
burstLimit: number; // Maximum burst capacity
refillRate: number; // Tokens per second
}
// Example usage (not implemented in current version):
return {
windowMs: 60 * 1000,
maxRequests: 100, // Average rate
burstLimit: 150, // Allow short bursts up to 150
refillRate: 100 / 60, // Refill at average rate
};
Note: This requires modifying the Lua script to implement token bucket logic. The current fixed-window implementation is simpler and sufficient for most use cases.
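For illustration, here's the token-bucket math itself in TypeScript. This is a sketch only - in production the logic belongs inside the atomic Lua script, for the same race-condition reasons covered in Part 1:
// Token bucket sketch: refill continuously, spend one token per request
interface BucketState {
  tokens: number; // current token count
  lastRefillMs: number; // timestamp of the last refill
}
function tryConsume(
  state: BucketState,
  cfg: TokenBucketConfig,
  nowMs: number,
): boolean {
  const elapsedSec = (nowMs - state.lastRefillMs) / 1000;
  // Refill at refillRate tokens/second, capped at the burst capacity
  state.tokens = Math.min(cfg.burstLimit, state.tokens + elapsedSec * cfg.refillRate);
  state.lastRefillMs = nowMs;
  if (state.tokens >= 1) {
    state.tokens -= 1; // spend one token
    return true; // allowed
  }
  return false; // rate limited
}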
Testing Strategies
Unit Tests
describe('RateLimitHandler', () => {
let handler: RateLimitHandler;
beforeEach(() => {
// The identifier and config helpers never touch Redis, so a stub dependency is enough
handler = new RateLimitHandler({} as RedisRateLimiter);
});
it('should use workspace-scoped identifiers', () => {
const req = {
workspace: { id: 'ws_123', plan: 'pro' },
user: { id: 'user_456' },
path: '/api/projects',
} as any;
const identifier = handler['getRateLimitIdentifier'](req);
expect(identifier).toContain('workspace:ws_123');
expect(identifier).toContain('user_456');
});
it('should apply stricter limits to free plans', () => {
const freeConfig = handler['getWorkspaceRateLimitByPlan']('free', 'POST');
const proConfig = handler['getWorkspaceRateLimitByPlan']('pro', 'POST');
expect(freeConfig.maxRequests).toBe(20);
expect(proConfig.maxRequests).toBe(100);
});
});
Integration Tests
describe('Rate Limiting E2E', () => {
it('should enforce plan-based limits', async () => {
// Make 21 requests (Free tier limit is 20)
const requests = Array(21).fill(null).map(() =>
request(app.getHttpServer())
.post('/api/projects')
.set('Authorization', `Bearer ${freeUserToken}`)
.send({ name: 'Test Project' })
);
const responses = await Promise.all(requests);
const blocked = responses.filter(r => r.status === 429);
expect(blocked.length).toBeGreaterThan(0);
});
});
Performance Considerations
Based on production usage:
Latency Impact:
- Rate limit check adds ~3-5ms per request (Redis latency)
- Lua script execution: <1ms
- Total overhead: ~5-7ms per request
Redis Memory:
- ~200 bytes per active rate limit key
- 100,000 active users = ~20MB
- With TTL cleanup: memory stays stable
Throughput:
- Single Redis instance: ~10,000 checks/second
- Redis Cluster: 50,000+ checks/second
- Bottleneck is usually network, not Redis
Common Pitfalls to Avoid
- Don't use client IP alone - NAT and proxies mean multiple users share IPs
- Don't forget X-Forwarded-For - You'll rate limit your load balancer instead of users
- Don't block global routes by workspace - Authentication should be globally scoped
- Don't use the same limits for all HTTP methods - Writes cost more than reads
- Don't fail closed - If Redis is down, allow requests (fail open); see the sketch after this list
- Don't forget about workspace isolation - Multi-tenancy is critical in SaaS
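Here's what fail-open looks like inside checkRateLimit - a sketch of wrapping the Redis call so an outage degrades to "allow" instead of taking your API down with it:
// Fail open: a Redis outage should never block legitimate traffic
let result;
try {
  result = await this.redisRateLimiter.checkRateLimit(identifier, config, 1);
} catch (error) {
  this.logger.error(`Rate limiter unavailable, failing open: ${error}`);
  return true; // allow the request, but alert on it so you notice the outage
}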
Key Takeaways
Building production-grade rate limiting for multi-tenant SaaS taught me:
- Context is everything - Workspace-aware identifiers prevent noisy neighbor problems
- Plan-based limits are a feature - Rate limits differentiate your pricing tiers
- Security doesn't tier - Authentication limits should be strict for everyone
- Headers matter - Standard rate limit headers enable smart client behavior
- Fail open - Availability is more important than perfect rate limiting
- Monitor everything - You need visibility into who's hitting limits and why
This implementation has been running in production for months, handling millions of requests across thousands of workspaces. It's proven reliable, performant, and maintainable.
What's Next?
Potential enhancements:
- Sliding window algorithm for more precise limiting
- Token bucket algorithm for burst handling
- Distributed rate limiting across regions
- Machine learning for anomaly detection
- Dynamic limits based on system load
Have questions or improvements? I'd love to hear how you've implemented rate limiting in your SaaS applications!
Series recap:
- Part 1: Core Redis implementation with Lua scripts
- Part 2 (this article): Workspace-aware business logic
GitHub: Complete implementation with tests available in my repositories.
Tags: #nestjs #redis #ratelimiting #typescript #saas #multitenant #backend #nodejs