Ajao Yussuf
Building a Production-Ready Rate Limiter with Redis in NestJS (Part 2: Multi-Tenant & Plan-Based Logic)

In Part 1, we built the core Redis rate limiter with atomic Lua scripts. Now we'll add the intelligence that makes it production-ready for a multi-tenant SaaS application.

What we're building:

  • Workspace-isolated rate limiting (Slack-style)
  • Plan-based limits (Free, Pro, Enterprise)
  • Security-focused limits for authentication
  • Standard HTTP rate limit headers
  • Smart identifier generation

This is the business logic layer that sits on top of our Redis implementation. Let's dive in.

The Challenge: Multi-Tenant Rate Limiting

In a SaaS application, you can't treat all requests equally. Consider these scenarios:

  1. Workspace A on a Free plan shouldn't be able to exhaust the rate limit for Workspace B on an Enterprise plan
  2. A user in Workspace A has different limits than the same user in Workspace B
  3. Authentication endpoints need stricter limits regardless of subscription plan
  4. Different HTTP methods need different limits (writes are more expensive than reads)

Traditional rate limiters use simple identifiers like IP address or user ID. We need something smarter.

Architecture Overview

Our RateLimitHandler service orchestrates the rate limiting strategy:

import { Injectable, HttpStatus, Logger } from '@nestjs/common';
import type { Request, Response } from 'express';
import { RateLimitConfig } from '../interfaces/security.interface';
import { RedisRateLimiter } from './redis-rate-limiter.service';

interface RequestWithWorkspace extends Request {
  workspace?: {
    id: string;
    slug: string;
    plan: string;
  };
  workspaceId?: string;
  user?: {
    id: string;
    _id?: string;
  };
}

@Injectable()
export class RateLimitHandler {
  private readonly logger = new Logger(RateLimitHandler.name);

  constructor(private readonly redisRateLimiter: RedisRateLimiter) {}
}

The RequestWithWorkspace interface extends Express's Request to include workspace context, which your authentication middleware should populate.

Smart Identifier Generation

The identifier is crucial - it determines the scope of rate limiting. Here's how we generate workspace-aware identifiers:

private getRateLimitIdentifier(req: RequestWithWorkspace): string {
  const ip = this.getClientIP(req);
  const userId = req.user?.id || req.user?._id || 'anonymous';
  const endpoint = req.route?.path || req.path;

  // Global routes (no workspace context needed)
  if (this.isGlobalRoute(req.path)) {
    return `global:${ip}:${userId}:${endpoint}`;
  }

  // Workspace routes (include workspace ID)
  const workspaceId = req.workspaceId || req.workspace?.id || 'unknown';
  return `workspace:${workspaceId}:${ip}:${userId}:${endpoint}`;
}

private isGlobalRoute(path: string): boolean {
  const globalRoutes = [
    '/auth/login',
    '/auth/register',
    '/auth/forgot-password',
    '/auth/reset-password',
    '/health',
    '/metrics',
  ];

  return globalRoutes.some((route) => path.startsWith(route));
}

private getClientIP(req: Request): string {
  // Only trust X-Forwarded-For / X-Real-IP when the app sits behind a proxy
  // you control (load balancer, reverse proxy); clients can spoof these headers.
  const xfwd = (req.headers['x-forwarded-for'] as string | undefined)
    ?.split(',')[0]
    ?.trim();
  const xreal = (req.headers['x-real-ip'] as string | undefined)?.trim();
  const sock = req.socket?.remoteAddress;
  return xfwd || xreal || sock || '127.0.0.1';
}

Why this structure matters:

  1. Global routes - Authentication and health checks are identified by the global: prefix. They're not workspace-specific, so a login attempt from Workspace A doesn't affect Workspace B.

  2. Workspace routes - Include the workspace ID. This means:

    • Each workspace has independent rate limits
    • User Alice in Workspace A has separate limits from Alice in Workspace B
    • This prevents "noisy neighbor" problems in multi-tenant systems
  3. IP + User + Endpoint - The combination prevents:

    • Single IP exhausting limits for all users (important in corporate networks)
    • Single user exhausting limits across multiple IPs
    • Different endpoints interfering with each other

Real-world example:

# Same user, different workspaces = different limits
workspace:ws_123:192.168.1.1:user_abc:/api/projects
workspace:ws_456:192.168.1.1:user_abc:/api/projects

# Same workspace, different endpoints = different limits  
workspace:ws_123:192.168.1.1:user_abc:/api/projects
workspace:ws_123:192.168.1.1:user_abc:/api/tasks

# Global routes = isolated from workspaces
global:192.168.1.1:anonymous:/auth/login

Context-Aware Rate Limit Configuration

Different routes need different strategies. Authentication endpoints need strict limits for security, while workspace operations vary by subscription plan:

private getRateLimitConfig(req: RequestWithWorkspace): RateLimitConfig {
  const path = req.path;
  const method = req.method;
  const plan = req.workspace?.plan || 'free';

  // Authentication routes - STRICT (global, no plan variation)
  if (path.includes('/auth/login')) {
    return {
      windowMs: 15 * 60 * 1000,        // 15 minutes
      maxRequests: 5,                   // Only 5 attempts
      blockDurationMs: 60 * 60 * 1000,  // 1 hour block
    };
  }

  if (path.includes('/auth/register')) {
    return {
      windowMs: 60 * 60 * 1000,         // 1 hour
      maxRequests: 3,                    // Only 3 registrations
      blockDurationMs: 24 * 60 * 60 * 1000, // 24 hour block
    };
  }

  if (path.includes('/auth/')) {
    return {
      windowMs: 5 * 60 * 1000,          // 5 minutes
      maxRequests: 10,
      blockDurationMs: 30 * 60 * 1000,  // 30 minutes block
    };
  }

  // Workspace-specific routes - PLAN-BASED
  return this.getWorkspaceRateLimitByPlan(plan, method);
}

Security-first design decisions:

  1. Login limits are aggressive - 5 attempts in 15 minutes protects against brute force attacks. After 5 failures, the attacker is blocked for an hour.

  2. Registration is even stricter - 3 registrations per hour prevents automated account creation. 24-hour block discourages abuse.

  3. Authentication limits are global - They don't vary by subscription plan. Security is not a premium feature.

  4. Temporary blocking - The blockDurationMs setting temporarily bans abusive clients, giving your system time to recover. Escalating the duration for repeat offenders would make this truly progressive.

Plan-Based Rate Limits

Here's where subscription tiers translate into actual technical limits:

private getWorkspaceRateLimitByPlan(
  plan: string,
  method: string,
): RateLimitConfig {
  const planLimits = {
    free: {
      POST: { windowMs: 60 * 1000, maxRequests: 20 },
      PUT: { windowMs: 60 * 1000, maxRequests: 20 },
      DELETE: { windowMs: 60 * 1000, maxRequests: 10 },
      PATCH: { windowMs: 60 * 1000, maxRequests: 20 },
      GET: { windowMs: 60 * 1000, maxRequests: 100 },
    },
    pro: {
      POST: { windowMs: 60 * 1000, maxRequests: 100 },
      PUT: { windowMs: 60 * 1000, maxRequests: 100 },
      DELETE: { windowMs: 60 * 1000, maxRequests: 50 },
      PATCH: { windowMs: 60 * 1000, maxRequests: 100 },
      GET: { windowMs: 60 * 1000, maxRequests: 500 },
    },
    enterprise: {
      POST: { windowMs: 60 * 1000, maxRequests: 1000 },
      PUT: { windowMs: 60 * 1000, maxRequests: 1000 },
      DELETE: { windowMs: 60 * 1000, maxRequests: 500 },
      PATCH: { windowMs: 60 * 1000, maxRequests: 1000 },
      GET: { windowMs: 60 * 1000, maxRequests: 5000 },
    },
  };

  const limits = planLimits[plan as keyof typeof planLimits] ?? planLimits.free;
  const methodLimit = limits[method as keyof typeof limits] ?? limits.GET;

  return {
    ...methodLimit,
    blockDurationMs: 5 * 60 * 1000, // 5 minutes block for all workspace operations
  };
}

Design rationale:

  1. Write operations cost more - POST/PUT/PATCH operations are more expensive (database writes, validation, business logic). They get stricter limits.

  2. DELETE is most restricted - Data deletion is sensitive and often irreversible. Even Enterprise gets fewer DELETE operations than other writes.

  3. GET requests are abundant - Read operations are cheaper and more common. Free tier gets 100/min, Enterprise gets 5000/min.

  4. Clear value ladder - Free → Pro = 5x increase, Pro → Enterprise = 10x increase. This creates a tangible reason to upgrade.

  5. Consistent windows - All plans use 60-second windows for predictability. Users can easily reason about their limits.

Real-world comparison:

Free tier:    20 writes/min  = 1,200/hour   = ~29K/day
Pro tier:     100 writes/min = 6,000/hour   = ~144K/day  
Enterprise:   1,000 writes/min = 60,000/hour = ~1.4M/day
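The hourly and daily figures above follow directly from the per-minute write limits. A standalone sketch (not part of the handler) makes the arithmetic explicit:

```typescript
// Derive hourly and daily quotas from a per-minute write limit.
function quotas(perMinute: number): { perHour: number; perDay: number } {
  const perHour = perMinute * 60;
  const perDay = perHour * 24;
  return { perHour, perDay };
}

console.log(quotas(20));    // free:       { perHour: 1200, perDay: 28800 }
console.log(quotas(100));   // pro:        { perHour: 6000, perDay: 144000 }
console.log(quotas(1000));  // enterprise: { perHour: 60000, perDay: 1440000 }
```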

The Main Rate Limit Check

Now let's implement the method that ties it all together:

async checkRateLimit(
  req: RequestWithWorkspace,
  res: Response,
): Promise<boolean> {
  const identifier = this.getRateLimitIdentifier(req);
  const config = this.getRateLimitConfig(req);

  this.logger.debug(`Rate limit check - Identifier: ${identifier}`);

  const result = await this.redisRateLimiter.checkRateLimit(
    identifier,
    config,
    1, // increment by 1
  );

  // Set standard rate limit headers
  res.setHeader('X-RateLimit-Limit', config.maxRequests.toString());
  res.setHeader(
    'X-RateLimit-Remaining',
    Math.max(0, result.remaining).toString(),
  );
  res.setHeader('X-RateLimit-Reset', result.resetTime.toISOString());

  if (result.isBlocked) {
    if (result.retryAfter) {
      res.setHeader('Retry-After', result.retryAfter.toString());
    }

    this.logger.warn(`Rate limit exceeded for identifier: ${identifier}`);

    res.status(HttpStatus.TOO_MANY_REQUESTS).json({
      success: false,
      message: 'Too many requests. Please try again later.',
      retryAfter: result.retryAfter,
      timestamp: new Date().toISOString(),
    });
    return false;
  }

  return true;
}

What's happening here:

  1. Generate context-aware identifier - Uses workspace, user, IP, and endpoint
  2. Get appropriate config - Based on route type and subscription plan
  3. Check with Redis - Our atomic Lua script handles the logic
  4. Set response headers - Standard rate limit headers (more on this below)
  5. Return structured error - If blocked, send 429 with retry information
  6. Return boolean - Calling code can easily check if request should proceed

Standard Rate Limit Headers

Your API clients need to know their rate limit status. The X-RateLimit-* headers are a widely adopted de facto convention (RFC 6585 defines the 429 status code itself; a standardized RateLimit header field is still an IETF draft):

// Always present
X-RateLimit-Limit: 100           // Maximum requests allowed in window
X-RateLimit-Remaining: 73        // Requests remaining before limit
X-RateLimit-Reset: 2026-01-07T15:30:00.000Z  // When counter resets

// Present when blocked (429 response)
Retry-After: 300                 // Seconds until unblocked

These headers let client applications:

  • Display accurate "X requests remaining" to users
  • Implement intelligent retry logic with exponential backoff
  • Show countdown timers until rate limit reset
  • Warn users before hitting limits

Client implementation example:

// Client-side handling
const response = await fetch('/api/projects', {
  method: 'POST',
  body: JSON.stringify(project),
});

if (response.status === 429) {
  const retryAfter = parseInt(response.headers.get('Retry-After') || '60', 10);
  console.log(`Rate limited. Retry in ${retryAfter} seconds`);

  // Show user-friendly message
  toast.error(`Too many requests. Please wait ${retryAfter} seconds.`);

  // Automatically retry after delay
  setTimeout(() => retryRequest(), retryAfter * 1000);
} else {
  const remaining = response.headers.get('X-RateLimit-Remaining');
  if (remaining !== null && parseInt(remaining, 10) < 10) {
    toast.warning('Approaching rate limit');
  }
}
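The "exponential backoff" mentioned above can be sketched as a small client-side helper. This is illustrative (the name fetchWithBackoff and the minimal response shape are assumptions, not from the article's codebase); it prefers the server's Retry-After hint and falls back to doubling delays:

```typescript
// Minimal response shape for the sketch; a real client would use fetch's Response.
interface MinimalResponse {
  status: number;
  headers: Map<string, string>;
}

// Retry a request with exponential backoff, honoring Retry-After when present.
async function fetchWithBackoff(
  doFetch: () => Promise<MinimalResponse>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<MinimalResponse> {
  for (let attempt = 0; ; attempt++) {
    const res = await doFetch();
    if (res.status !== 429 || attempt >= maxRetries) return res;

    // Prefer the server's Retry-After (in seconds); otherwise back off exponentially.
    const retryAfter = res.headers.get('retry-after');
    const delayMs = retryAfter
      ? parseInt(retryAfter, 10) * 1000
      : baseDelayMs * 2 ** attempt; // 1s, 2s, 4s, ...
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```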

Integration with NestJS Guards

To use this in your NestJS application, create a guard:

import { Injectable, CanActivate, ExecutionContext } from '@nestjs/common';
import { Reflector } from '@nestjs/core';
import { RateLimitHandler } from './rate-limit-handler.service';
import { SKIP_RATE_LIMIT_KEY } from './decorators/skip-rate-limit.decorator';

@Injectable()
export class RateLimitGuard implements CanActivate {
  constructor(
    private readonly rateLimitHandler: RateLimitHandler,
    private readonly reflector: Reflector,
  ) {}

  async canActivate(context: ExecutionContext): Promise<boolean> {
    // Check if rate limiting should be skipped
    const skipRateLimit = this.reflector.getAllAndOverride<boolean>(
      SKIP_RATE_LIMIT_KEY,
      [context.getHandler(), context.getClass()],
    );

    if (skipRateLimit) {
      return true;
    }

    const request = context.switchToHttp().getRequest();
    const response = context.switchToHttp().getResponse();

    return await this.rateLimitHandler.checkRateLimit(request, response);
  }
}

Custom Decorator for Skipping Rate Limits

For endpoints that shouldn't be rate limited (like health checks or internal admin routes), create a custom decorator:

// decorators/skip-rate-limit.decorator.ts
import { SetMetadata } from '@nestjs/common';

export const SKIP_RATE_LIMIT_KEY = 'skipRateLimit';
export const SkipRateLimit = () => SetMetadata(SKIP_RATE_LIMIT_KEY, true);

The guard shown earlier already reads this metadata via the Reflector, so no further guard changes are needed.

Usage:

@Controller('health')
export class HealthController {
  @Get()
  @SkipRateLimit()  // This endpoint won't be rate limited
  check() {
    return { status: 'ok' };
  }
}

Apply globally:

// main.ts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';
import { RateLimitGuard } from './security/guards/rate-limit.guard';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);

  // Apply rate limiting to all routes; the guard instance is resolved
  // from SecurityModule's providers so its dependencies are injected.
  app.useGlobalGuards(app.get(RateLimitGuard));

  await app.listen(3000);
}
bootstrap();

Or selectively:

// controller.ts
@Controller('projects')
@UseGuards(RateLimitGuard)  // Apply to entire controller
export class ProjectsController {
  @Post()
  create(@Body() dto: CreateProjectDto) {
    // This endpoint is rate limited
  }

  @Get()
  @SkipRateLimit()  // Custom decorator to skip rate limiting
  findAll() {
    // This endpoint is NOT rate limited
  }
}

Real-World Production Scenarios

Let's walk through some practical examples:

Scenario 1: Normal User Activity

// User in Free tier workspace making requests
Request 1: POST /api/projects
 Identifier: workspace:ws_free:192.168.1.1:user_123:/api/projects
 Limit: 20/min
 Result:  Allowed (1/20 used)
 Headers: X-RateLimit-Remaining: 19

Request 2: POST /api/projects (10 seconds later)
 Same identifier
 Result:  Allowed (2/20 used)
 Headers: X-RateLimit-Remaining: 18

Scenario 2: Hitting Rate Limit

// User makes 21st POST request in same minute
Request 21: POST /api/projects
 Limit: 20/min
 Result:  Blocked
 Status: 429 Too Many Requests
 Headers:
    X-RateLimit-Limit: 20
    X-RateLimit-Remaining: 0
    X-RateLimit-Reset: 2026-01-07T15:31:00.000Z
    Retry-After: 37
 Response: {
    success: false,
    message: "Too many requests. Please try again later.",
    retryAfter: 37
  }

Scenario 3: Workspace Isolation

// Same user, different workspaces
Workspace A (Free): POST /api/projects
 Identifier: workspace:ws_free:192.168.1.1:user_123:/api/projects
 Result:  Blocked (hit 20/min limit)

Workspace B (Enterprise): POST /api/projects
 Identifier: workspace:ws_ent:192.168.1.1:user_123:/api/projects
 Result:  Allowed (1/1000 used)
 Different workspace = independent rate limit!

Scenario 4: Brute Force Protection

// Attacker trying to brute force login
Attempt 1-5: POST /auth/login (wrong password)
 Identifier: global:192.168.1.1:anonymous:/auth/login
 Result:  Allowed (but login fails)

Attempt 6: POST /auth/login
 Result:  Blocked for 1 hour
 Status: 429
 Headers: Retry-After: 3600

// Even if they switch IPs, user-based blocking kicks in
// (if you track user ID in login attempts)

Module Setup

Wire everything together in your NestJS module:

// security.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule } from '@nestjs/config';
import { RedisRateLimiter } from './services/redis-rate-limiter.service';
import { RateLimitHandler } from './services/rate-limit-handler.service';
import { RateLimitGuard } from './guards/rate-limit.guard';

@Module({
  imports: [ConfigModule],
  providers: [
    RedisRateLimiter,
    RateLimitHandler,
    RateLimitGuard,
  ],
  exports: [
    RedisRateLimiter,
    RateLimitHandler,
    RateLimitGuard,
  ],
})
export class SecurityModule {}

Monitoring and Alerting

In production, you need visibility into rate limiting behavior:

// Add a monitoring endpoint (admin only)
import { Controller, Get, Post, Param } from '@nestjs/common';
import { RedisRateLimiter } from './services/redis-rate-limiter.service';

@Controller('admin/rate-limits')
export class RateLimitMonitoringController {
  constructor(private readonly redisRateLimiter: RedisRateLimiter) {}

  @Get('top-consumers')
  async getTopConsumers() {
    return await this.redisRateLimiter.getTopConsumers(50);
  }

  @Get('health')
  async getHealth() {
    return await this.redisRateLimiter.healthCheck();
  }

  @Get('metrics')
  async getMetrics() {
    return await this.redisRateLimiter.getMetrics();
  }

  @Post('reset/:identifier')
  async resetLimit(@Param('identifier') identifier: string) {
    await this.redisRateLimiter.resetRateLimit(identifier);
    return { success: true, message: 'Rate limit reset' };
  }
}

Set up alerts:

// Use your monitoring service (DataDog, New Relic, etc.)
this.logger.warn(`Rate limit exceeded for ${identifier}`, {
  workspace: req.workspace?.id,
  user: req.user?.id,
  endpoint: req.path,
  plan: req.workspace?.plan,
});

// Alert if too many 429 responses
if (result.isBlocked) {
  this.metrics.increment('rate_limit.blocked', {
    plan: req.workspace?.plan,
    endpoint: req.path,
  });
}

Advanced Patterns

Per-Endpoint Overrides

// Some endpoints need custom limits
if (path.includes('/api/exports')) {
  return {
    windowMs: 60 * 60 * 1000,  // 1 hour
    maxRequests: 5,             // Only 5 exports/hour
    blockDurationMs: 0,         // Don't block, just limit
  };
}

User-Specific Overrides

// VIP users get higher limits
const userTier = await this.getUserTier(req.user?.id);
if (userTier === 'vip') {
  config.maxRequests *= 2;  // Double their limit
}

Burst Handling (Future Enhancement)

The current implementation uses fixed windows. For burst handling, you'd need to implement a token bucket algorithm:

// Future enhancement: Token bucket for burst handling
// This would require extending RateLimitConfig:
interface TokenBucketConfig extends RateLimitConfig {
  burstLimit: number;  // Maximum burst capacity
  refillRate: number;  // Tokens per second
}

// Example usage (not implemented in current version):
return {
  windowMs: 60 * 1000,
  maxRequests: 100,      // Average rate
  burstLimit: 150,       // Allow short bursts up to 150
  refillRate: 100 / 60,  // Refill at average rate
};

Note: This requires modifying the Lua script to implement token bucket logic. The current fixed-window implementation is simpler and sufficient for most use cases.
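For intuition only, here is what token-bucket semantics look like as a plain in-memory TypeScript sketch. This is not the production approach (a real version would live inside the Lua script so the check stays atomic in Redis), but it shows the burst behavior the config above describes:

```typescript
// In-memory token bucket: `capacity` allows short bursts, `refillRate` sets
// the sustained average rate. Single-process sketch only, not distributed.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,   // burst limit
    private readonly refillRate: number, // tokens per second
    now: number = Date.now(),
  ) {
    this.tokens = capacity; // start full so an initial burst is allowed
    this.lastRefill = now;
  }

  // Returns true if a token was available; `now` is injectable for testing.
  tryConsume(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillRate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A burst of up to `capacity` requests succeeds immediately; after that, requests are admitted at `refillRate` per second, which is what distinguishes this from the fixed window used in the rest of the article.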

Testing Strategies

Unit Tests

describe('RateLimitHandler', () => {
  it('should use workspace-scoped identifiers', () => {
    const req = {
      workspace: { id: 'ws_123', plan: 'pro' },
      user: { id: 'user_456' },
      path: '/api/projects',
    } as any;

    const identifier = handler['getRateLimitIdentifier'](req);
    expect(identifier).toContain('workspace:ws_123');
    expect(identifier).toContain('user_456');
  });

  it('should apply stricter limits to free plans', () => {
    const freeConfig = handler['getWorkspaceRateLimitByPlan']('free', 'POST');
    const proConfig = handler['getWorkspaceRateLimitByPlan']('pro', 'POST');

    expect(freeConfig.maxRequests).toBe(20);
    expect(proConfig.maxRequests).toBe(100);
  });
});

Integration Tests

describe('Rate Limiting E2E', () => {
  it('should enforce plan-based limits', async () => {
    // Make 21 requests (Free tier limit is 20)
    const requests = Array(21).fill(null).map(() =>
      request(app.getHttpServer())
        .post('/api/projects')
        .set('Authorization', `Bearer ${freeUserToken}`)
        .send({ name: 'Test Project' })
    );

    const responses = await Promise.all(requests);

    const blocked = responses.filter(r => r.status === 429);
    expect(blocked.length).toBeGreaterThan(0);
  });
});

Performance Considerations

Based on production usage:

Latency Impact:

  • Rate limit check adds ~3-5ms per request (Redis latency)
  • Lua script execution: <1ms
  • Total overhead: ~5-7ms per request

Redis Memory:

  • ~200 bytes per active rate limit key
  • 100,000 active users = ~20MB
  • With TTL cleanup: memory stays stable

Throughput:

  • Single Redis instance: ~10,000 checks/second
  • Redis Cluster: 50,000+ checks/second
  • Bottleneck is usually network, not Redis

Common Pitfalls to Avoid

  1. Don't use client IP alone - NAT and proxies mean multiple users share IPs
  2. Don't forget X-Forwarded-For - You'll rate limit your load balancer instead of users
  3. Don't block global routes by workspace - Authentication should be globally scoped
  4. Don't use the same limits for all HTTP methods - Writes cost more than reads
  5. Don't fail closed - If Redis is down, allow requests (fail open)
  6. Don't forget about workspace isolation - Multi-tenancy is critical in SaaS
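Pitfall 5 (fail open) can be enforced with a small wrapper around the backend check. A sketch, with an assumed LimitResult shape that mirrors the handler above:

```typescript
// Assumed minimal result shape, mirroring the handler's usage of isBlocked/remaining.
type LimitResult = { isBlocked: boolean; remaining: number };

// Fail-open wrapper: if the limiter backend (e.g. Redis) errors, allow the
// request rather than turning a Redis outage into a full API outage.
async function checkOrFailOpen(
  check: () => Promise<LimitResult>,
  onError: (err: unknown) => void = () => {},
): Promise<LimitResult> {
  try {
    return await check();
  } catch (err) {
    onError(err); // log/alert: the limiter is degraded, but traffic still flows
    return { isBlocked: false, remaining: Number.MAX_SAFE_INTEGER };
  }
}
```

The onError hook is the place to emit a metric so a degraded limiter pages someone instead of failing silently.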

Key Takeaways

Building production-grade rate limiting for multi-tenant SaaS taught me:

  1. Context is everything - Workspace-aware identifiers prevent noisy neighbor problems
  2. Plan-based limits are a feature - Rate limits differentiate your pricing tiers
  3. Security doesn't tier - Authentication limits should be strict for everyone
  4. Headers matter - Standard rate limit headers enable smart client behavior
  5. Fail open - Availability is more important than perfect rate limiting
  6. Monitor everything - You need visibility into who's hitting limits and why

This implementation has been running in production for months, handling millions of requests across thousands of workspaces. It's proven reliable, performant, and maintainable.

What's Next?

Potential enhancements:

  • Sliding window algorithm for more precise limiting
  • Token bucket algorithm for burst handling
  • Distributed rate limiting across regions
  • Machine learning for anomaly detection
  • Dynamic limits based on system load

Have questions or improvements? I'd love to hear how you've implemented rate limiting in your SaaS applications!

Series recap:

  • Part 1: Core Redis implementation with Lua scripts
  • Part 2 (this article): Workspace-aware business logic

GitHub: Complete implementation with tests available in my repositories.

Tags: #nestjs #redis #ratelimiting #typescript #saas #multitenant #backend #nodejs
