Juan Torchia

Posted on Jun 5 • Originally published at juanchi.dev

Rate limiting in web apps: what to protect before picking a library

#english #typescript #nextjs #railway

I made the mistake of adding rate limiting like it was a convenience dependency — npm install, copy middleware from a tutorial, paste the magic number of 100 requests per minute, and get back to the sprint. I did it because "security" was on the backlog and I wanted to tick the box. The result was predictable: the middleware existed, but it wasn't protecting anything in particular. And the first time I actually looked at the logs with fresh eyes, I realized I had no idea what would have happened if someone had abused the login endpoint.

I'm telling you this because that exact pattern is what I keep seeing recycled in Next.js tutorials: install a library, wrap it as global middleware, call it "security." That's not security. That's security vibes.

My take is concrete: rate limiting isn't a dependency; it's an abuse policy. And a policy without a definition is a rule without a subject.

What rate limiting in Next.js actually is — and what it isn't

Rate limiting is a mechanism to reject or delay requests that exceed a defined threshold within a time window. That's all it is, technically. The real value isn't in the threshold — it's in the decision that led to that threshold.

OWASP, in its Authentication Cheat Sheet, recommends specific defensive controls around authentication: temporary account lockout after failed attempts, progressive throttling, and uniform responses to avoid leaking whether a user exists. What OWASP does not say is "install library X and set 100 req/min on all routes." That's an interpretation, not a prescription.

The difference matters: the OWASP guide gives you the what to protect (authentication, password recovery, endpoints that expose account state). The concrete implementation depends on your stack, your traffic, and the cost you're willing to absorb when you block a legitimate user.
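To make "progressive throttling" concrete, here is a minimal sketch that assumes an exponential schedule. The helper name `throttleDelayMs` and the three constants are illustrative assumptions, not values taken from OWASP; the real schedule is one of the decisions this post argues you have to make yourself.

```typescript
// Hypothetical helper: how long to make a client wait before its next login attempt.
// All three constants are placeholders, not recommendations.
const FREE_ATTEMPTS = 3;            // failures tolerated with no delay
const BASE_DELAY_MS = 1_000;        // delay after the first penalized failure
const MAX_DELAY_MS = 15 * 60_000;   // cap, so any lockout stays temporary

function throttleDelayMs(failedAttempts: number): number {
  if (failedAttempts <= FREE_ATTEMPTS) return 0;
  const penalized = failedAttempts - FREE_ATTEMPTS;
  // Doubles with every additional failure: 1s, 2s, 4s, ...
  const delay = BASE_DELAY_MS * 2 ** (penalized - 1);
  return Math.min(delay, MAX_DELAY_MS);
}
```

The cap is what keeps this a throttle rather than a permanent lockout: a legitimate user who mistyped a password several times is delayed, not banned.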

In Next.js App Router, the natural interception point is Middleware (middleware.ts), which runs at the edge before the request reaches the route — I covered that in detail in this post on authorization patterns in Next.js 16 Middleware. But the execution layer doesn't replace the policy; it just enforces it.

The classic mistake: copying middleware before defining the asset

The typical tutorial starts like this:

// middleware.ts — example of what NOT to do first
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(100, "1 m"), // why 100? why 1 minute?
});

export async function middleware(request: NextRequest) {
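  // request.ip was removed from NextRequest in Next.js 15, so tutorials now read the proxy header
  const ip = request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "127.0.0.1";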
  const { success } = await ratelimit.limit(ip);
  if (!success) return new NextResponse("Too Many Requests", { status: 429 });
  return NextResponse.next();
}

export const config = {
  matcher: ["/((?!_next|favicon.ico).*)"], // applies to EVERYTHING — really?
};

The code works. The problem is the chain of unanswered questions:

Why 100 requests per minute? Based on what measured legitimate behavior?
Does applying it to every route even make sense? A static image and a login endpoint don't share the same abuse profile.
What happens to a legitimate user behind a corporate proxy or a university network where dozens of people share the same IP?
Is there a log of the 429 that lets you tell the difference between a real attack and a false positive?

None of those questions get answered by the library. You answer them before you touch the code.

The decision matrix: four questions before writing a single line

Before choosing an algorithm, a library, or a threshold, these four questions determine whether the rate limiting you're about to implement actually makes sense:

1. What asset are you protecting?

Not "the app." Something concrete:

Asset	Why limiting it matters
Login endpoint	Credential stuffing, brute force
Password recovery endpoint	Account enumeration, email spam
Form submission API	Spam, notification flooding
Expensive search endpoint	Compute resource abuse
Static routes / assets	Probably not — leave it to the CDN

OWASP explicitly calls out the first two. The rest are product decisions that you have to make.
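One way to keep that table honest is to write it down as data before any limiter exists. The following is a sketch under assumptions: the route prefixes, the numbers, and the names `POLICIES` and `policyFor` are all hypothetical placeholders, not recommendations.

```typescript
// Hypothetical per-asset policy table; routes and numbers are placeholders.
type Policy = { limit: number; windowMs: number };

const POLICIES: Array<{ prefix: string; policy: Policy }> = [
  { prefix: "/api/auth/login", policy: { limit: 10, windowMs: 60_000 } },
  { prefix: "/api/auth/recover", policy: { limit: 3, windowMs: 15 * 60_000 } },
  { prefix: "/api/contact", policy: { limit: 5, windowMs: 60_000 } },
  { prefix: "/api/search", policy: { limit: 30, windowMs: 60_000 } },
];

// Returns the policy for a path, or null when the route is deliberately unlimited
// (static assets, for example, which are left to the CDN).
function policyFor(pathname: string): Policy | null {
  const match = POLICIES.find(
    ({ prefix }) => pathname === prefix || pathname.startsWith(prefix + "/")
  );
  return match ? match.policy : null;
}
```

Returning `null` for unlisted routes makes "no limit here" an explicit decision instead of an accident of the matcher.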

2. What abuse are you expecting?

The abuse you want to limit determines the right algorithm:

High-velocity credential stuffing: sliding window with a low per-IP threshold on the auth endpoint.
Slow, distributed scraping: IP alone isn't enough — you need fingerprinting or session tokens.
Automated form spam: CAPTCHA first, rate limiting as a second layer.
Legitimate traffic spikes (launch day, going viral): aggressive rate limiting can hurt you here — consider queueing or backpressure first.

If you don't know what abuse you're expecting, whatever threshold you set is arbitrary. And an arbitrary threshold is just as likely to annoy real users as it is to stop someone with actual malicious intent.
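For reference, this is roughly what a sliding window does under the hood. It is a minimal in-memory sketch of the "sliding window log" variant, assuming a single process; the class name and its API are made up for illustration and are not how any particular library implements it.

```typescript
// Sliding-window log: remembers the timestamps of recent allowed requests per key.
// In-memory and single-process only; meant to illustrate the algorithm.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if a request for `key` is allowed at time `now` (ms since epoch)
  allow(key: string, now: number = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only timestamps that are still inside the window
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(now); // rejected requests are not recorded
    this.hits.set(key, recent);
    return allowed;
  }
}
```

Unlike a fixed window, the count here never resets all at once, so a burst straddling a window boundary cannot double the effective limit. Whether rejected requests should also count against the key is a policy choice; this sketch does not count them.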

3. What does a false positive cost you?

This is the cost tutorials always skip. A 429 hitting a legitimate user has real consequences that depend on context:

In a SaaS with paying customers: lost trust, churn, a support ticket.
In a public app with anonymous users: frustration, abandonment.
In an internal API: it can silently break a critical flow.

The cost of a false positive defines how tight you can make the threshold. If the cost is high, you need a more permissive threshold and better abuse signals (user-agent, behavioral patterns, tokens) instead of just IP.

4. How do you observe that it's working?

If you don't have an answer to this, what you implemented is decorative security. A rate limiter without observability won't tell you whether it's blocking real abuse or legitimate users, whether the threshold is too aggressive or too permissive, or whether someone is already bypassing the control with a different technique.

The useful minimum is logging every 429 with:

Timestamp
IP (or whatever identifier you use as the key)
Affected route
Authentication context if available (authenticated user vs. anonymous)

// middleware.ts — version with minimal observability
import { NextResponse } from "next/server";
import type { NextRequest } from "next/server";

// Reproducible example using in-memory storage (dev only or edge with local state)
// In real production you need Redis or another distributed store
const attempts = new Map<string, { count: number; reset: number }>();

const LIMIT = 10; // only for the login endpoint
const WINDOW_MS = 60_000; // 1 minute

export async function middleware(request: NextRequest) {
  // Only applying to the login route — defined asset, not global
  if (!request.nextUrl.pathname.startsWith("/api/auth/login")) {
    return NextResponse.next();
  }

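  // x-forwarded-for can hold a comma-separated chain; the first entry is the client as reported by the proxy.
  // Only trust it if your platform sets or sanitizes this header.
  const ip = request.headers.get("x-forwarded-for")?.split(",")[0]?.trim() || "unknown";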
  const now = Date.now();
  const record = attempts.get(ip);

  if (!record || now > record.reset) {
    attempts.set(ip, { count: 1, reset: now + WINDOW_MS });
    return NextResponse.next();
  }

  record.count++;

  if (record.count > LIMIT) {
    // Observable log: in production this goes to your logging system
    console.warn(JSON.stringify({
      event: "rate_limit_exceeded",
      ip,
      route: request.nextUrl.pathname,
      attempts: record.count,
      timestamp: new Date().toISOString(),
    }));

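    // Tell the client how long to wait: seconds until this window resets
    const retryAfter = Math.max(1, Math.ceil((record.reset - now) / 1000));

    return new NextResponse(
      JSON.stringify({ error: "Too many attempts. Wait a moment." }),
      {
        status: 429,
        headers: {
          "Content-Type": "application/json",
          "Retry-After": String(retryAfter),
        },
      }
    );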
  }

  return NextResponse.next();
}

export const config = {
  // Only intercept the route we defined as the asset
  matcher: ["/api/auth/login"],
};

⚠️ This example uses an in-memory Map, which doesn't persist across edge runtime instances and doesn't survive a redeploy. For production on Railway or any other distributed environment, you need an external store like Redis (Upstash, Redis Cloud). This example is a decision pattern, not a production recipe.
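One way to keep the move to an external store cheap is to put the counter behind a small interface from day one. The sketch below assumes a hypothetical `RateLimitStore` contract of my own naming; the in-memory class is only there for tests and local development, and a Redis-backed class would implement the same method.

```typescript
// Hypothetical store contract: count one hit and report when the window resets.
interface RateLimitStore {
  hit(key: string, windowMs: number, now: number): Promise<{ count: number; resetAt: number }>;
}

// In-memory implementation, for tests and local development only
class MemoryStore implements RateLimitStore {
  private records = new Map<string, { count: number; resetAt: number }>();

  async hit(key: string, windowMs: number, now: number): Promise<{ count: number; resetAt: number }> {
    const current = this.records.get(key);
    if (!current || now > current.resetAt) {
      // First hit, or the previous window expired: start a new window
      const fresh = { count: 1, resetAt: now + windowMs };
      this.records.set(key, fresh);
      return { ...fresh };
    }
    current.count += 1;
    return { ...current };
  }
}

// The policy check depends only on the interface, not on where the counters live
async function isAllowed(
  store: RateLimitStore,
  key: string,
  limit: number,
  windowMs: number,
  now: number = Date.now()
): Promise<{ allowed: boolean; resetAt: number }> {
  const { count, resetAt } = await store.hit(key, windowMs, now);
  return { allowed: count <= limit, resetAt };
}
```

The interface is asynchronous on purpose: any networked store will be, and it is easier to start that way than to retrofit it.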

Where people get it wrong: three patterns that look right but aren't

Pattern 1: Global rate limiting as a substitute for endpoint security

If a global cap of 100 req/min sits above the request rate a brute-force attack needs, it protects the login endpoint no better than having no limit at all: the attacker stays just under the threshold and keeps going. What actually helps is a low, specific threshold on the right asset, which is closer to what OWASP recommends for authentication.
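A quick back-of-the-envelope calculation shows the size of the gap. The helper below is a hypothetical illustration that computes how many attempts a single key can make per day while never tripping the limiter.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Upper bound on attempts one key can make per day while staying under the limit
function maxAttemptsPerDay(limit: number, windowMs: number): number {
  return Math.floor((limit * MS_PER_DAY) / windowMs);
}

const globalBudget = maxAttemptsPerDay(100, 60_000); // 144000 attempts/day per IP
const loginBudget = maxAttemptsPerDay(10, 60_000);   // 14400 attempts/day per IP
```

Neither number is "safe" on its own, and an attacker with many IPs multiplies both. The point is that the threshold defines a guess budget, and a global 100 req/min hands out ten times more of it than the login-specific example in this post.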

Pattern 2: IP as the only key dimension

A user behind CGNAT (IPv4 shared across thousands of people) has the same IP as everyone else on that network. In corporate or university contexts, rate limiting them all together can block dozens of legitimate people because of one person's behavior. If the asset you're protecting is primarily accessed by authenticated users, the key should be the user or session identifier, not the IP.
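A minimal sketch of that key choice, assuming the caller has already resolved whatever identity is available. The `RequestIdentity` shape and the `rateLimitKey` name are hypothetical; how you obtain a user or session ID depends on your auth setup.

```typescript
// Hypothetical identity bag; fill in whatever your auth layer can provide
type RequestIdentity = { userId?: string; sessionId?: string; ip?: string };

// Prefer the most specific identifier available and fall back to IP last
function rateLimitKey(route: string, id: RequestIdentity): string {
  if (id.userId) return `${route}:user:${id.userId}`;
  if (id.sessionId) return `${route}:session:${id.sessionId}`;
  return `${route}:ip:${id.ip ?? "unknown"}`;
}
```

Including the route in the key keeps limits per asset, so traffic on a search endpoint does not eat into the login budget. On the login endpoint itself there is no authenticated user yet, so IP (possibly combined with the submitted username) is usually all you have.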

Pattern 3: Forgetting the Retry-After header

RFC 6585 defines the 429 status code and says the response may include a Retry-After header indicating how long the client should wait. The header is optional in the spec, but treat it as required in practice: without it, automated clients (SDKs, mobile apps, integrations) have no signal for when to come back and may retry immediately, making the problem you were trying to limit worse. Small detail, concrete consequences.
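On the client side, honoring the header can be as small as the sketch below. The function name `parseRetryAfterMs` is hypothetical, and it only handles the delay-in-seconds form of Retry-After; the HTTP-date form falls back to a default.

```typescript
// Parse the delay-seconds form of Retry-After into milliseconds.
// Anything else (missing header, HTTP-date form, garbage) uses the fallback.
function parseRetryAfterMs(header: string | null, fallbackMs: number): number {
  const value = header?.trim() ?? "";
  if (!/^\d+$/.test(value)) return fallbackMs;
  return Number(value) * 1000;
}
```

A client would call it with `response.headers.get("Retry-After")` after receiving a 429 and wait that long before retrying, instead of retrying in a tight loop.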

Limits of this guide: what you can't conclude without your own data

There are things I'm not going to claim here because they depend on context I don't have:

What threshold to use: there's no universally correct number. The 10 req/min in the login example is illustrative. The real number comes from measured legitimate behavior in production — or a conservative initial decision with room to adjust.
Whether Upstash, Redis Cloud, or something else is better: that depends on latency from where you run the edge, cost per operation, and the operational complexity you're willing to maintain.
Whether rate limiting solves slow, distributed scraping: probably not, at least not alone. That scenario needs other signals. Claiming otherwise without data would be selling an incomplete solution to someone with a real problem.

Before assuming your rate limiting is working, you need real logs. Without them, the control exists on paper but not in practice.
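As a starting point for reading those logs, here is a small sketch that counts 429 events per IP. It assumes log lines in the JSON shape emitted by the middleware example in this post (an `event` field equal to `"rate_limit_exceeded"` and an `ip` field); the function name `countBlockedByIp` is hypothetical.

```typescript
// Counts rate-limit events per IP so outliers stand out from occasional false positives.
// Lines that are not JSON, or that describe other events, are skipped.
function countBlockedByIp(logLines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of logLines) {
    let parsed: { event?: unknown; ip?: unknown } | null;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // not a JSON log line
    }
    const ip = parsed?.ip;
    if (parsed?.event !== "rate_limit_exceeded" || typeof ip !== "string") continue;
    counts.set(ip, (counts.get(ip) ?? 0) + 1);
  }
  return counts;
}
```

A handful of IPs with hundreds of blocks each looks like abuse; many IPs with one or two blocks each looks like a threshold that is too tight for legitimate traffic. Which one you are seeing is something only your own logs can tell you.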

FAQ: rate limiting in Next.js web applications

Where do I implement rate limiting in Next.js App Router — in Middleware or in the Route Handler?

Depends on the asset. If you want to act before the request reaches the route logic (and before compute costs kick in), Middleware is the right place. If you need full authentication context or business logic to decide the limit, the Route Handler has more information. In practice, both layers can coexist: Middleware for coarse IP-level limits, Route Handler for fine-grained per-authenticated-user limits.

Can I use an in-memory Map as the store for the rate limiter?

Only in development or as a pattern demonstration. An in-memory Map isn't shared across instances (Next.js can have multiple workers) and resets on every redeploy. For a distributed environment like Railway or Vercel, you need an external store — Redis is the most common and well-documented option.

Does rate limiting replace CAPTCHA on the login form?

No. They're different controls. CAPTCHA aims to distinguish humans from bots in real time. Rate limiting aims to cap the volume of attempts regardless of whether they're human or bot. OWASP suggests both as complementary layers, not as alternatives.

What happens if a legitimate user hits the limit by accident?

They should get a 429 with a clear Retry-After and an understandable message. If that happens frequently, the threshold is miscalibrated for legitimate traffic — that's a signal to revisit the number, not to remove the control.

Is rate limiting enough to protect a public API from mass scraping?

Probably not, if the scraping is distributed (many different IPs, each with low volume). Per-IP rate limiting only works well against high-frequency concentrated sources. Distributed scraping needs other signals: user-agent fingerprinting, behavioral pattern analysis, or mandatory token-based authentication.

Should I apply rate limiting to static asset routes?

Generally no — that's work for a CDN or the infrastructure layer. Applying rate limiting to /favicon.ico or images from Next.js Middleware adds latency with no real defensive benefit. If asset traffic is the problem, the right control is at the network layer, not the application.

The library doesn't decide the policy. You do.

There's a reason misconfigured rate limiting is worse than none: it gives you the feeling that the asset is protected when it isn't, or it protects something that doesn't matter while leaving the thing that does matter wide open.

My position: before installing any library, answer the four questions in the matrix. If you can't answer "what asset am I protecting" and "what abuse am I expecting" with something more specific than "the app" and "bad people," you're not ready to implement the policy. You're ready to copy a tutorial.

The concrete next step: look at your authentication routes first. They're the ones OWASP flags with the most evidence of needing controls. Define a conservative threshold, add minimal observability (log those 429s), and check those logs in the first 48 hours. That's where you'll get real data to calibrate against.

If you want to understand how Next.js Middleware executes these controls and what happens with race conditions at scale, the post on authorization patterns in Next.js 16 Middleware is the logical next step. And if backend endpoint security is part of the picture, it's worth cross-referencing what I covered on what to expose and what to hide in Spring Boot Actuator.

Primary source:

OWASP Authentication Cheat Sheet — defensive controls around authentication and abuse: https://cheatsheetseries.owasp.org/cheatsheets/Authentication_Cheat_Sheet.html
