Ajao Yussuf
Building a Production-Ready Rate Limiter with Redis in NestJS (Part 2: Multi-Tenant & Plan-Based Logic)

In Part 1, we built the core Redis rate limiter with atomic Lua scripts. Now we'll add the intelligence that makes it production-ready for a multi-tenant SaaS application.

What we're building:

  • Workspace-isolated rate limiting (Slack-style)
  • Plan-based limits (Free, Pro, Enterprise)
  • Security-focused limits for authentication
  • Standard HTTP rate limit headers
  • Smart identifier generation

This is the business logic layer that sits on top of our Redis implementation. Let's dive in.

The Challenge: Multi-Tenant Rate Limiting

In a SaaS application, you can't treat all requests equally. Consider these scenarios:

  1. Workspace A on a Free plan shouldn't be able to exhaust the rate limit for Workspace B on an Enterprise plan
  2. A user in Workspace A has different limits than the same user in Workspace B
  3. Authentication endpoints need stricter limits regardless of subscription plan
  4. Different HTTP methods need different limits (writes are more expensive than reads)

Traditional rate limiters use simple identifiers like IP address or user ID. We need something smarter.

Architecture Overview

Our RateLimitHandler service orchestrates the rate limiting strategy:

import { Injectable, HttpStatus, Logger } from '@nestjs/common';
import type { Request, Response } from 'express';
import { RateLimitConfig } from '../interfaces/security.interface';
import { RedisRateLimiter } from './redis-rate-limiter.service';

interface RequestWithWorkspace extends Request {
  workspace?: {
    id: string;
    slug: string;
    plan: string;
  };
  workspaceId?: string;
  user?: {
    id: string;
    _id?: string;
  };
}

@Injectable()
export class RateLimitHandler {
  private readonly logger = new Logger(RateLimitHandler.name);

  constructor(private readonly redisRateLimiter: RedisRateLimiter) {}
}

The RequestWithWorkspace interface extends Express's Request to include workspace context, which your authentication middleware should populate.

Smart Identifier Generation

The identifier is crucial - it determines the scope of rate limiting. Here's how we generate workspace-aware identifiers:

private getRateLimitIdentifier(req: RequestWithWorkspace): string {
  const ip = this.getClientIP(req);
  const userId = req.user?.id || req.user?._id || 'anonymous';
  const endpoint = req.route?.path || req.path;

  // Global routes (no workspace context needed)
  if (this.isGlobalRoute(req.path)) {
    return `global:${ip}:${userId}:${endpoint}`;
  }

  // Workspace routes (include workspace ID)
  const workspaceId = req.workspaceId || req.workspace?.id || 'unknown';
  return `workspace:${workspaceId}:${ip}:${userId}:${endpoint}`;
}

private isGlobalRoute(path: string): boolean {
  const globalRoutes = [
    '/auth/login',
    '/auth/register',
    '/auth/forgot-password',
    '/auth/reset-password',
    '/health',
    '/metrics',
  ];

  return globalRoutes.some((route) => path.startsWith(route));
}

private getClientIP(req: Request): string {
  // Only trust X-Forwarded-For / X-Real-IP when the app sits behind a proxy
  // you control (load balancer, reverse proxy); clients can spoof these headers.
  const xfwd = (req.headers['x-forwarded-for'] as string | undefined)
    ?.split(',')[0]
    ?.trim();
  const xreal = (req.headers['x-real-ip'] as string | undefined)?.trim();
  const sock = req.socket?.remoteAddress;
  return xfwd || xreal || sock || '127.0.0.1';
}

Why this structure matters:

  1. Global routes - Authentication and health checks are identified by the global: prefix. They're not workspace-specific, so a login attempt from Workspace A doesn't affect Workspace B.

  2. Workspace routes - Include the workspace ID. This means:

    • Each workspace has independent rate limits
    • User Alice in Workspace A has separate limits from Alice in Workspace B
    • This prevents "noisy neighbor" problems in multi-tenant systems
  3. IP + User + Endpoint - The combination prevents:

    • Single IP exhausting limits for all users (important in corporate networks)
    • Single user exhausting limits across multiple IPs
    • Different endpoints interfering with each other

Real-world example:

# Same user, different workspaces = different limits
workspace:ws_123:192.168.1.1:user_abc:/api/projects
workspace:ws_456:192.168.1.1:user_abc:/api/projects

# Same workspace, different endpoints = different limits  
workspace:ws_123:192.168.1.1:user_abc:/api/projects
workspace:ws_123:192.168.1.1:user_abc:/api/tasks

# Global routes = isolated from workspaces
global:192.168.1.1:anonymous:/auth/login

Context-Aware Rate Limit Configuration

Different routes need different strategies. Authentication endpoints need strict limits for security, while workspace operations vary by subscription plan:

private getRateLimitConfig(req: RequestWithWorkspace): RateLimitConfig {
  const path = req.path;
  const method = req.method;
  const plan = req.workspace?.plan || 'free';

  // Authentication routes - STRICT (global, no plan variation)
  if (path.includes('/auth/login')) {
    return {
      windowMs: 15 * 60 * 1000,        // 15 minutes
      maxRequests: 5,                   // Only 5 attempts
      blockDurationMs: 60 * 60 * 1000,  // 1 hour block
    };
  }

  if (path.includes('/auth/register')) {
    return {
      windowMs: 60 * 60 * 1000,         // 1 hour
      maxRequests: 3,                    // Only 3 registrations
      blockDurationMs: 24 * 60 * 60 * 1000, // 24 hour block
    };
  }

  if (path.includes('/auth/')) {
    return {
      windowMs: 5 * 60 * 1000,          // 5 minutes
      maxRequests: 10,
      blockDurationMs: 30 * 60 * 1000,  // 30 minutes block
    };
  }

  // Workspace-specific routes - PLAN-BASED
  return this.getWorkspaceRateLimitByPlan(plan, method);
}

Security-first design decisions:

  1. Login limits are aggressive - 5 attempts in 15 minutes protects against brute force attacks. After 5 failures, the attacker is blocked for an hour.

  2. Registration is even stricter - 3 registrations per hour prevents automated account creation. 24-hour block discourages abuse.

  3. Authentication limits are global - They don't vary by subscription plan. Security is not a premium feature.

  4. Temporary blocking - The blockDurationMs setting temporarily bans abusive clients, giving your system time to recover. Escalating the duration for repeat offenders would make this truly progressive.

Plan-Based Rate Limits

Here's where subscription tiers translate into actual technical limits:

private getWorkspaceRateLimitByPlan(
  plan: string,
  method: string,
): RateLimitConfig {
  const planLimits = {
    free: {
      POST: { windowMs: 60 * 1000, maxRequests: 20 },
      PUT: { windowMs: 60 * 1000, maxRequests: 20 },
      DELETE: { windowMs: 60 * 1000, maxRequests: 10 },
      PATCH: { windowMs: 60 * 1000, maxRequests: 20 },
      GET: { windowMs: 60 * 1000, maxRequests: 100 },
    },
    pro: {
      POST: { windowMs: 60 * 1000, maxRequests: 100 },
      PUT: { windowMs: 60 * 1000, maxRequests: 100 },
      DELETE: { windowMs: 60 * 1000, maxRequests: 50 },
      PATCH: { windowMs: 60 * 1000, maxRequests: 100 },
      GET: { windowMs: 60 * 1000, maxRequests: 500 },
    },
    enterprise: {
      POST: { windowMs: 60 * 1000, maxRequests: 1000 },
      PUT: { windowMs: 60 * 1000, maxRequests: 1000 },
      DELETE: { windowMs: 60 * 1000, maxRequests: 500 },
      PATCH: { windowMs: 60 * 1000, maxRequests: 1000 },
      GET: { windowMs: 60 * 1000, maxRequests: 5000 },
    },
  };

  const limits = planLimits[plan as keyof typeof planLimits] ?? planLimits.free;
  const methodLimit = limits[method as keyof typeof limits] ?? limits.GET;

  return {
    ...methodLimit,
    blockDurationMs: 5 * 60 * 1000, // 5 minutes block for all workspace operations
  };
}

Design rationale:

  1. Write operations cost more - POST/PUT/PATCH operations are more expensive (database writes, validation, business logic). They get stricter limits.

  2. DELETE is most restricted - Data deletion is sensitive and often irreversible. Even Enterprise gets fewer DELETE operations than other writes.

  3. GET requests are abundant - Read operations are cheaper and more common. Free tier gets 100/min, Enterprise gets 5000/min.

  4. Clear value ladder - Free → Pro = 5x increase, Pro → Enterprise = 10x increase. This creates a tangible reason to upgrade.

  5. Consistent windows - All plans use 60-second windows for predictability. Users can easily reason about their limits.

Real-world comparison:

Free tier:    20 writes/min  = 1,200/hour   = ~29K/day
Pro tier:     100 writes/min = 6,000/hour   = ~144K/day  
Enterprise:   1,000 writes/min = 60,000/hour = ~1.4M/day
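The hourly and daily figures above follow directly from the per-minute write limits. A standalone sketch (not part of the handler) makes the arithmetic explicit:

```typescript
// Derive hourly and daily quotas from a per-minute write limit.
function quotas(perMinute: number): { perHour: number; perDay: number } {
  const perHour = perMinute * 60;
  const perDay = perHour * 24;
  return { perHour, perDay };
}

console.log(quotas(20));    // free:       { perHour: 1200, perDay: 28800 }
console.log(quotas(100));   // pro:        { perHour: 6000, perDay: 144000 }
console.log(quotas(1000));  // enterprise: { perHour: 60000, perDay: 1440000 }
```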

The Main Rate Limit Check

Now let's implement the method that ties it all together:

async checkRateLimit(
  req: RequestWithWorkspace,
  res: Response,
): Promise<boolean> {
  const identifier = this.getRateLimitIdentifier(req);
  const config = this.getRateLimitConfig(req);

  this.logger.debug(`Rate limit check - Identifier: ${identifier}`);

  const result = await this.redisRateLimiter.checkRateLimit(
    identifier,
    config,
    1, // increment by 1
  );

  // Set standard rate limit headers
  res.setHeader('X-RateLimit-Limit', config.maxRequests.toString());
  res.setHeader(
    'X-RateLimit-Remaining',
    Math.max(0, result.remaining).toString(),
  );
  res.setHeader('X-RateLimit-Reset', result.resetTime.toISOString());

  if (result.isBlocked) {
    if (result.retryAfter) {
      res.setHeader('Retry-After', result.retryAfter.toString());
    }

    this.logger.warn(`Rate limit exceeded for identifier: ${identifier}`);

    res.status(HttpStatus.TOO_MANY_REQUESTS).json({
      success: false,
      message: 'Too many requests. Please try again later.',
      retryAfter: result.retryAfter,
      timestamp: new Date().toISOString(),
    });
    return false;
  }

  return true;
}

What's happening here:

  1. Generate context-aware identifier - Uses workspace, user, IP, and endpoint
  2. Get appropriate config - Based on route type and subscription plan
  3. Check with Redis - Our atomic Lua script handles the logic
  4. Set response headers - Standard rate limit headers (more on this below)
  5. Return structured error - If blocked, send 429 with retry information
  6. Return boolean - Calling code can easily check if request should proceed

Standard Rate Limit Headers

Your API clients need to know their rate limit status. The X-RateLimit-* headers are a widely adopted de facto convention (RFC 6585 defines the 429 status code itself; a standardized RateLimit header field is still an IETF draft):

// Always present
X-RateLimit-Limit: 100           // Maximum requests allowed in window
X-RateLimit-Remaining: 73        // Requests remaining before limit
X-RateLimit-Reset: 2026-01-07T15:30:00.000Z  // When counter resets

// Present when blocked (429 response)
Retry-After: 300                 // Seconds until unblocked

These headers let client applications:

  • Display accurate "X requests remaining" to users
  • Implement intelligent retry logic with exponential backoff
  • Show countdown timers until rate limit reset
  • Warn users before hitting limits

Client implementation example:

// Client-side handling
const response = await fetch('/api/projects', {
  method: 'POST',
  body: JSON.stringify(project),
});

if (response.status === 429) {
  const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
  console.log(`Rate limited. Retry in ${retryAfter} seconds`);

  // Show user-friendly message
  toast.error(`Too many requests. Please wait ${retryAfter} seconds.`);

  // Automatically retry after delay
  setTimeout(() => retryRequest(), retryAfter * 1000);
} else {
  const remaining = response.headers.get('X-RateLimit-Remaining');
  if (remaining !== null && parseInt(remaining, 10) < 10) {
    toast.warning('Approaching rate limit');
  }
}
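The "exponential backoff" mentioned above can be sketched as a small client-side helper. This is illustrative (the name fetchWithBackoff and the minimal response shape are assumptions, not from the article's codebase); it prefers the server's Retry-After hint and falls back to doubling delays:

```typescript
// Minimal response shape for the sketch; a real client would use fetch's Response.
interface MinimalResponse {
  status: number;
  headers: Map<string, string>;
}

// Retry a request with exponential backoff, honoring Retry-After when present.
async function fetchWithBackoff(
  doFetch: () => Promise<MinimalResponse>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<MinimalResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    if (res.status !== 429 || attempt >= maxRetries) return res;

    // Prefer the server's Retry-After (in seconds); otherwise back off exponentially.
    const retryAfter = res.headers.get('retry-after');
    const delayMs = retryAfter
      ? parseInt(retryAfter, 10) * 1000
      : baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```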

Integration with NestJS Guards

To use this in your NestJS application, create a guard:

import { Injectable, CanActivate, ExecutionContext } from '@nestjs/common';
import { Reflector } from '@nestjs/core';
import { RateLimitHandler } from './rate-limit-handler.service';
import { SKIP_RATE_LIMIT_KEY } from './decorators/skip-rate-limit.decorator';

@Injectable()
export class RateLimitGuard implements CanActivate {
  constructor(
    private readonly rateLimitHandler: RateLimitHandler,
    private readonly reflector: Reflector,
  ) {}

  async canActivate(context: ExecutionContext): Promise<boolean> {
    // Check if rate limiting should be skipped
    const skipRateLimit = this.reflector.getAllAndOverride<boolean>(
      SKIP_RATE_LIMIT_KEY,
      [context.getHandler(), context.getClass()],
    );

    if (skipRateLimit) {
      return true;
    }

    const request = context.switchToHttp().getRequest();
    const response = context.switchToHttp().getResponse();

    return await this.rateLimitHandler.checkRateLimit(request, response);
  }
}

Custom Decorator for Skipping Rate Limits

For endpoints that shouldn't be rate limited (like health checks or internal admin routes), create a custom decorator:

// decorators/skip-rate-limit.decorator.ts
import { SetMetadata } from '@nestjs/common';

export const SKIP_RATE_LIMIT_KEY = 'skipRateLimit';
export const SkipRateLimit = () => SetMetadata(SKIP_RATE_LIMIT_KEY, true);

The guard shown earlier already reads this metadata via the Reflector, so no further guard changes are needed.

Usage:

@Controller('health')
export class HealthController {
  @Get()
  @SkipRateLimit()  // This endpoint won't be rate limited
  check() {
    return { status: 'ok' };
  }
}

Apply globally:

// main.ts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import { RateLimitGuard } from './security/guards/rate-limit.guard';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  // Apply rate limiting to all routes; the guard instance is resolved
  // from SecurityModule's providers so its dependencies are injected.
  app.useGlobalGuards(app.get(RateLimitGuard));

  await app.listen(3000);
}
bootstrap();

Or selectively:

// controller.ts
@Controller('projects')
@UseGuards(RateLimitGuard)  // Apply to entire controller
export class ProjectsController {
  @Post()
  create(@Body() dto: CreateProjectDto) {
    // This endpoint is rate limited
  }

  @Get()
  @SkipRateLimit()  // Custom decorator to skip rate limiting
  findAll() {
    // This endpoint is NOT rate limited
  }
}

Real-World Production Scenarios

Let's walk through some practical examples:

Scenario 1: Normal User Activity

// User in Free tier workspace making requests
Request 1: POST /api/projects
 Identifier: workspace:ws_free:192.168.1.1:user_123:/api/projects
 Limit: 20/min
 Result:  Allowed (1/20 used)
 Headers: X-RateLimit-Remaining: 19

Request 2: POST /api/projects (10 seconds later)
 Same identifier
 Result:  Allowed (2/20 used)
 Headers: X-RateLimit-Remaining: 18

Scenario 2: Hitting Rate Limit

// User makes 21st POST request in same minute
Request 21: POST /api/projects
 Limit: 20/min
 Result:  Blocked
 Status: 429 Too Many Requests
 Headers:
    X-RateLimit-Limit: 20
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 2026-01-07T15:31:00.000Z
    Retry-After: 37
 Response: {
    success: false,
    message: "Too many requests. Please try again later.",
    retryAfter: 37
  }

Scenario 3: Workspace Isolation

// Same user, different workspaces
Workspace A (Free): POST /api/projects
 Identifier: workspace:ws_free:192.168.1.1:user_123:/api/projects
 Result:  Blocked (hit 20/min limit)

Workspace B (Enterprise): POST /api/projects
 Identifier: workspace:ws_ent:192.168.1.1:user_123:/api/projects
 Result:  Allowed (1/1000 used)
 Different workspace = independent rate limit!

Scenario 4: Brute Force Protection

// Attacker trying to brute force login
Attempt 1-5: POST /auth/login (wrong password)
 Identifier: global:192.168.1.1:anonymous:/auth/login
 Result:  Allowed (but login fails)

Attempt 6: POST /auth/login
 Result:  Blocked for 1 hour
 Status: 429
 Headers: Retry-After: 3600

// Even if they switch IPs, user-based blocking kicks in
// (if you track user ID in login attempts)

Module Setup

Wire everything together in your NestJS module:

// security.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { RedisRateLimiter } from './services/redis-rate-limiter.service';
import { RateLimitHandler } from './services/rate-limit-handler.service';
import { RateLimitGuard } from './guards/rate-limit.guard';

@Module({
  imports: [ConfigModule],
  providers: [
    RedisRateLimiter,
    RateLimitHandler,
    RateLimitGuard,
  ],
  exports: [
    RedisRateLimiter,
    RateLimitHandler,
    RateLimitGuard,
  ],
})
export class SecurityModule {}

Monitoring and Alerting

In production, you need visibility into rate limiting behavior:

// Add a monitoring endpoint (admin only)
import { Controller, Get, Post, Param } from '@nestjs/common';
import { RedisRateLimiter } from './services/redis-rate-limiter.service';

@Controller('admin/rate-limits')
export class RateLimitMonitoringController {
  constructor(private readonly redisRateLimiter: RedisRateLimiter) {}

  @Get('top-consumers')
  async getTopConsumers() {
    return await this.redisRateLimiter.getTopConsumers(50);
  }

  @Get('health')
  async getHealth() {
    return await this.redisRateLimiter.healthCheck();
  }

  @Get('metrics')
  async getMetrics() {
    return await this.redisRateLimiter.getMetrics();
  }

  @Post('reset/:identifier')
  async resetLimit(@Param('identifier') identifier: string) {
    await this.redisRateLimiter.resetRateLimit(identifier);
    return { success: true, message: 'Rate limit reset' };
  }
}

Set up alerts:

// Use your monitoring service (DataDog, New Relic, etc.)
this.logger.warn(`Rate limit exceeded for ${identifier}`, {
  workspace: req.workspace?.id,
  user: req.user?.id,
  endpoint: req.path,
  plan: req.workspace?.plan,
});

// Alert if too many 429 responses
if (result.isBlocked) {
  this.metrics.increment('rate_limit.blocked', {
    plan: req.workspace?.plan,
    endpoint: req.path,
  });
}

Advanced Patterns

Per-Endpoint Overrides

// Some endpoints need custom limits
if (path.includes('/api/exports')) {
  return {
    windowMs: 60 * 60 * 1000,  // 1 hour
    maxRequests: 5,             // Only 5 exports/hour
    blockDurationMs: 0,         // Don't block, just limit
  };
}

User-Specific Overrides

// VIP users get higher limits
const userTier = await this.getUserTier(req.user?.id);
if (userTier === 'vip') {
  config.maxRequests *= 2;  // Double their limit
}

Burst Handling (Future Enhancement)

The current implementation uses fixed windows. For burst handling, you'd need to implement a token bucket algorithm:

// Future enhancement: Token bucket for burst handling
// This would require extending RateLimitConfig:
interface TokenBucketConfig extends RateLimitConfig {
  burstLimit: number;  // Maximum burst capacity
  refillRate: number;  // Tokens per second
}

// Example usage (not implemented in current version):
return {
  windowMs: 60 * 1000,
  maxRequests: 100,      // Average rate
  burstLimit: 150,       // Allow short bursts up to 150
  refillRate: 100 / 60,  // Refill at average rate
};

Note: This requires modifying the Lua script to implement token bucket logic. The current fixed-window implementation is simpler and sufficient for most use cases.
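For intuition only, here is what token-bucket semantics look like as a plain in-memory TypeScript sketch. This is not the production approach (a real version would live inside the Lua script so the check stays atomic in Redis), but it shows the burst behavior the config above describes:

```typescript
// In-memory token bucket: `capacity` allows short bursts, `refillRate` sets
// the sustained average rate. Single-process sketch only, not distributed.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,   // burst limit
    private readonly refillRate: number, // tokens per second
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now;
  }

  // Returns true if a token was available; `now` is injectable for testing.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A burst of up to `capacity` requests succeeds immediately; after that, requests are admitted at `refillRate` per second, which is what distinguishes this from the fixed window used in the rest of the article.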

Testing Strategies

Unit Tests

describe('RateLimitHandler', () => {
  it('should use workspace-scoped identifiers', () => {
    const req = {
      workspace: { id: 'ws_123', plan: 'pro' },
      user: { id: 'user_456' },
      path: '/api/projects',
    } as any;

    const identifier = handler['getRateLimitIdentifier'](req);
    expect(identifier).toContain('workspace:ws_123');
    expect(identifier).toContain('user_456');
  });

  it('should apply stricter limits to free plans', () => {
    const freeConfig = handler['getWorkspaceRateLimitByPlan']('free', 'POST');
    const proConfig = handler['getWorkspaceRateLimitByPlan']('pro', 'POST');

    expect(freeConfig.maxRequests).toBe(20);
    expect(proConfig.maxRequests).toBe(100);
  });
});

Integration Tests

describe('Rate Limiting E2E', () => {
  it('should enforce plan-based limits', async () => {
    // Make 21 requests (Free tier limit is 20)
    const requests = Array(21).fill(null).map(() =>
      request(app.getHttpServer())
        .post('/api/projects')
        .set('Authorization', `Bearer ${freeUserToken}`)
        .send({ name: 'Test Project' })
    );

    const responses = await Promise.all(requests);

    const blocked = responses.filter(r => r.status === 429);
    expect(blocked.length).toBeGreaterThan(0);
  });
});

Performance Considerations

Based on production usage:

Latency Impact:

  • Rate limit check adds ~3-5ms per request (Redis latency)
  • Lua script execution: <1ms
  • Total overhead: ~5-7ms per request

Redis Memory:

  • ~200 bytes per active rate limit key
  • 100,000 active users = ~20MB
  • With TTL cleanup: memory stays stable

Throughput:

  • Single Redis instance: ~10,000 checks/second
  • Redis Cluster: 50,000+ checks/second
  • Bottleneck is usually network, not Redis

Common Pitfalls to Avoid

  1. Don't use client IP alone - NAT and proxies mean multiple users share IPs
  2. Don't forget X-Forwarded-For - You'll rate limit your load balancer instead of users
  3. Don't block global routes by workspace - Authentication should be globally scoped
  4. Don't use the same limits for all HTTP methods - Writes cost more than reads
  5. Don't fail closed - If Redis is down, allow requests (fail open)
  6. Don't forget about workspace isolation - Multi-tenancy is critical in SaaS
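Pitfall 5 (fail open) can be enforced with a small wrapper around the backend check. A sketch, with an assumed LimitResult shape that mirrors the handler above:

```typescript
// Assumed minimal result shape, mirroring the handler's usage of isBlocked/remaining.
type LimitResult = { isBlocked: boolean; remaining: number };

// Fail-open wrapper: if the limiter backend (e.g. Redis) errors, allow the
// request rather than turning a Redis outage into a full API outage.
async function checkOrFailOpen(
  check: () => Promise<LimitResult>,
  onError: (err: unknown) => void = () => {},
): Promise<LimitResult> {
  try {
    return await check();
  } catch (err) {
    onError(err); // log/alert: the limiter is degraded, but traffic still flows
    return { isBlocked: false, remaining: Number.MAX_SAFE_INTEGER };
  }
}
```

The onError hook is the place to emit a metric so a degraded limiter pages someone instead of failing silently.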

Key Takeaways

Building production-grade rate limiting for multi-tenant SaaS taught me:

  1. Context is everything - Workspace-aware identifiers prevent noisy neighbor problems
  2. Plan-based limits are a feature - Rate limits differentiate your pricing tiers
  3. Security doesn't tier - Authentication limits should be strict for everyone
  4. Headers matter - Standard rate limit headers enable smart client behavior
  5. Fail open - Availability is more important than perfect rate limiting
  6. Monitor everything - You need visibility into who's hitting limits and why

This implementation has been running in production for months, handling millions of requests across thousands of workspaces. It's proven reliable, performant, and maintainable.

What's Next?

Potential enhancements:

  • Sliding window algorithm for more precise limiting
  • Token bucket algorithm for burst handling
  • Distributed rate limiting across regions
  • Machine learning for anomaly detection
  • Dynamic limits based on system load

Have questions or improvements? I'd love to hear how you've implemented rate limiting in your SaaS applications!

Series recap:

  • Part 1: Core Redis implementation with Lua scripts
  • Part 2 (this article): Workspace-aware business logic

GitHub: Complete implementation with tests available in my repositories.

Tags: #nestjs #redis #ratelimiting #typescript #saas #multitenant #backend #nodejs
