10,000 Logins in 1 Second
If your login endpoint is unprotected, an attacker can try thousands of password combinations per second using a simple automated script. This is known as a Brute Force attack, or Credential Stuffing when the attacker replays username/password pairs leaked from other breaches.
Even if the attacker never guesses a correct password, the sheer volume of requests can trigger a Denial of Service (DoS). Password hashes like bcrypt are deliberately expensive to compute, so your servers will struggle to verify thousands of passwords simultaneously, causing the entire application to slow to a crawl for legitimate users.
The Solution: Rate Limiting
Rate limiting is the "Quality of Service" (QoS) layer for your API. It acts as a governor, ensuring that no single user or IP address can monopolize your system's resources.
Beyond security, it is essential for Cost Control (preventing expensive AWS Lambda or third-party API bills) and Stability.
The Algorithms: How to Count
Choosing the right algorithm depends on how "strict" you need to be.
1. Fixed Window Counter
The simplest approach. You allow X requests per window of time (e.g., 60 seconds).
- The Flaw: If the limit is 100 requests/min, a user could send 100 requests at 10:59:59 and another 100 at 11:00:01. This "burst" effectively doubles your allowed limit at the window boundary.
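A minimal in-memory sketch of a fixed window counter (the constants and Map-based store are illustrative; a real deployment needs shared state, covered in the Redis section below):

```ts
// Fixed window counter: allow LIMIT requests per client per window.
const WINDOW_MS = 60_000; // 60-second window
const LIMIT = 100;

const counters = new Map<string, { windowStart: number; count: number }>();

function isAllowed(clientId: string, now: number = Date.now()): boolean {
  const windowStart = Math.floor(now / WINDOW_MS) * WINDOW_MS;
  const entry = counters.get(clientId);

  // New client, or the previous window has expired: start counting fresh.
  if (!entry || entry.windowStart !== windowStart) {
    counters.set(clientId, { windowStart, count: 1 });
    return true;
  }

  if (entry.count >= LIMIT) return false; // over the limit for this window
  entry.count++;
  return true;
}
```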
2. Sliding Window Log / Counter
This is more precise. It tracks the timestamp of each request or uses a weighted average of the previous window. It smooths out the "boundary" issue of the Fixed Window, ensuring that the limit is strictly enforced across any moving time frame.
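A sketch of the log variant, which keeps a timestamp per request and prunes anything older than the moving window (constants illustrative):

```ts
const WINDOW_MS = 60_000; // enforce the limit over any moving 60-second span
const LIMIT = 100;

const logs = new Map<string, number[]>(); // clientId -> request timestamps

function isAllowed(clientId: string, now: number = Date.now()): boolean {
  const cutoff = now - WINDOW_MS;
  // Keep only timestamps that fall inside the moving window.
  const recent = (logs.get(clientId) ?? []).filter((t) => t > cutoff);

  if (recent.length >= LIMIT) {
    logs.set(clientId, recent);
    return false; // already at the limit within the last 60 seconds
  }
  recent.push(now);
  logs.set(clientId, recent);
  return true;
}
```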
3. Token Bucket
Imagine a bucket that fills with "tokens" at a steady rate. Every request "spends" one token.
- The Benefit: It allows for Bursts. If a user hasn't made a request in a while, their bucket fills up, allowing them to send 10 requests quickly, but then they are throttled to the steady "fill rate." This is the industry standard for most public APIs (like Stripe or GitHub).
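A sketch of a token bucket, refilled lazily on each request (the capacity and refill rate are illustrative):

```ts
const CAPACITY = 10;      // maximum burst size
const REFILL_PER_SEC = 1; // steady fill rate: 1 token per second

const buckets = new Map<string, { tokens: number; lastRefill: number }>();

function isAllowed(clientId: string, now: number = Date.now()): boolean {
  const bucket = buckets.get(clientId) ?? { tokens: CAPACITY, lastRefill: now };

  // Refill based on time elapsed since the last request, capped at capacity.
  const elapsedSec = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(CAPACITY, bucket.tokens + elapsedSec * REFILL_PER_SEC);
  bucket.lastRefill = now;

  if (bucket.tokens < 1) {
    buckets.set(clientId, bucket);
    return false; // bucket empty: throttled to the steady refill rate
  }
  bucket.tokens -= 1; // spend one token for this request
  buckets.set(clientId, bucket);
  return true;
}
```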
Where to Implement? (The Architecture Choice)
1. The Perimeter (API Gateway / Nginx)
The most efficient place to rate limit is at the edge. By using Nginx or a Cloud Gateway (AWS API Gateway, Cloudflare), you reject malicious traffic before it ever touches your expensive application servers or database.
2. The Application Layer (Middleware)
For fine-grained control, use middleware like express-rate-limit. This is useful when the limit depends on application logic—for example, giving "Premium" users a higher limit than "Free" users.
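A sketch using express-rate-limit: the windowMs, standardHeaders, and function-valued max options are part of the library's documented API, while the req.user shape and plan names are assumptions standing in for your own auth logic.

```ts
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const apiLimiter = rateLimit({
  windowMs: 60 * 1000, // 1-minute window
  // Dynamic limit: assumes an earlier auth middleware populated req.user (hypothetical shape).
  max: (req) => ((req as any).user?.plan === 'premium' ? 1000 : 100),
  standardHeaders: true, // send the standard RateLimit-* response headers
  legacyHeaders: false,  // disable the older X-RateLimit-* headers
});

app.use('/api', apiLimiter);
```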
3. The Distributed Layer (Redis)
In modern system design, you likely have multiple instances of your server running behind a load balancer. If you use local memory to track requests, a user could bypass the limit by hitting different server instances.
- The Fix: Use Redis as a centralized, lightning-fast counter that all server instances check.
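A sketch of that centralized counter using ioredis with a fixed window per key (the key format is illustrative; a Lua script or MULTI would make the INCR/EXPIRE pair fully atomic under races):

```ts
import Redis from 'ioredis';

const redis = new Redis(); // assumes Redis on localhost:6379
const WINDOW_SEC = 60;
const LIMIT = 100;

async function isAllowed(clientId: string): Promise<boolean> {
  const windowId = Math.floor(Date.now() / 1000 / WINDOW_SEC);
  const key = `ratelimit:${clientId}:${windowId}`;

  // INCR is atomic across all server instances sharing this Redis.
  const count = await redis.incr(key);
  if (count === 1) {
    // First request in this window: expire the key when the window ends.
    await redis.expire(key, WINDOW_SEC);
  }
  return count <= LIMIT;
}
```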
Why this is "System" Design:
1. Graceful Degradation
When a user hits a limit, don't just drop the request. Return a 429 Too Many Requests status code and include a Retry-After header so the client's code knows exactly how many seconds to wait before trying again.
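In Express, the rejection path might look like this (retryAfterSeconds would come from whichever limiter tripped):

```ts
import type { Request, Response } from 'express';

function rejectRateLimited(req: Request, res: Response, retryAfterSeconds: number) {
  res
    .set('Retry-After', String(retryAfterSeconds)) // seconds until the client may retry
    .status(429)
    .json({ error: 'Too many requests. Please retry later.' });
}
```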
2. Strategic Throttling
Not all endpoints are created equal. You should have:
- Strict Limits on /login, /register, and /forgot-password.
- Moderate Limits on search and data-heavy GET requests.
- Loose Limits on static assets or public heartbeats.
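Wiring those tiers up with express-rate-limit might look like the following; the routes come from this article, but the handlers and exact numbers are placeholders:

```ts
import express from 'express';
import rateLimit from 'express-rate-limit';

const app = express();

const strict = rateLimit({ windowMs: 15 * 60 * 1000, max: 5 }); // 5 attempts / 15 min
const moderate = rateLimit({ windowMs: 60 * 1000, max: 30 });   // 30 requests / min

app.post('/login', strict, (req, res) => res.send('login'));       // placeholder handler
app.post('/register', strict, (req, res) => res.send('register')); // placeholder handler
app.get('/search', moderate, (req, res) => res.send('results'));   // placeholder handler
// Static assets get no per-route limiter here (or only a loose one at the edge).
app.use(express.static('public'));
```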
Pro-Tip: The "Tarpit" Strategy
For high-security areas like login pages, don't just block the user. Use a "Tarpit" (or Delay) approach: for every failed attempt, increase the time the server takes to respond (Exponential Backoff). This makes brute-forcing computationally infeasible, turning a 1-second script into a 10-year project for the attacker.
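A sketch of that escalating delay (the in-memory store, base delay, and cap are assumptions; a production system would track failure counts in Redis):

```ts
const failures = new Map<string, number>(); // clientId -> consecutive failed attempts

const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30_000; // cap so connections don't hang forever

function tarpitDelay(clientId: string): number {
  const count = failures.get(clientId) ?? 0;
  // Exponential backoff: 500ms, 1s, 2s, 4s, ... capped at 30s.
  return Math.min(BASE_DELAY_MS * 2 ** count, MAX_DELAY_MS);
}

async function recordFailedLogin(clientId: string): Promise<void> {
  failures.set(clientId, (failures.get(clientId) ?? 0) + 1);
  // Hold the response for the computed delay before replying.
  await new Promise((resolve) => setTimeout(resolve, tarpitDelay(clientId)));
}

function recordSuccessfulLogin(clientId: string): void {
  failures.delete(clientId); // reset the backoff on success
}
```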
Takeaway: Rate limiting isn't just about stopping bad actors; it’s about ensuring your system remains available and predictable for everyone else.