It’s a normal day. Your API is running smoothly. Then suddenly…
- CPU spikes
- Database connections max out
- Latency explodes
- Users start seeing errors
What happened?
Maybe a bot started hammering your endpoint.
Maybe a client had a retry bug.
Maybe you just went viral.
Whatever the reason, your system was not prepared for the flood. What you needed was a way to control how fast requests are allowed in. That’s where rate limiting comes in.
What Is Rate Limiting?
A rate limiter controls how many requests are allowed within a specific time window. It protects your system from overload and ensures fair usage.
Think of it like:
- A toll booth controlling traffic on a highway
- A dam regulating water flow
- A bouncer letting people into a club at a controlled pace
It’s not about blocking users. It’s about keeping your system alive.
Why Do We Need Rate Limiting?
Rate limiting isn’t just a “nice-to-have.” It solves very real problems.
1. Protect Infrastructure
Without limits, a sudden spike can:
- Exhaust database connections
- Overwhelm CPU
- Trigger cascading failures
Rate limiting acts as a safety valve.
2. Prevent Abuse
Attackers and bots don’t politely slow down. Common abuse scenarios include:
- Brute-force login attempts
- Credential stuffing
- Scraping
- DDoS-style request floods
Rate limiting is often your first line of defense.
3. Ensure Fair Usage
If one client sends 10,000 requests per second and another sends 5, should the first client consume everything? Probably not. Rate limiting ensures that no single user can monopolize your resources.
4. Control Costs
Many systems rely on:
- Paid third-party APIs
- Database operations
- Cloud compute
Uncontrolled traffic = higher bills.
Rate limiting protects your wallet too.
So How Do I Add It to My App?
At this point you might be thinking:
“This all makes sense. But how do I actually add rate limiting to my system?”
There are three common approaches. Which one you choose depends on how your app is structured.
1. Add It Inside Your Application
The simplest way is to implement it directly in your backend code.
You keep a counter per user (or IP), track how many requests they’ve made, and reset it every time window.
This works great if:
- You have a single server
- You’re building something small
But the moment you scale to multiple instances behind a load balancer, each instance keeps its own counter. With five instances, your “100 requests per second” limit can quietly turn into 500. That’s usually not what you intended.
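The in-app approach above can be sketched in a few lines of Python. This is a minimal fixed-window counter, the simplest rate-limiting strategy; the class and method names here are illustrative, not from any particular library.

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Allow at most `limit` requests per key in each `window_seconds` window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)  # (key, window number) -> request count

    def allow(self, key):
        # Bucket each request by which window of time it falls into.
        window_number = int(time.time() // self.window)
        bucket = (key, window_number)
        self.counters[bucket] += 1
        return self.counters[bucket] <= self.limit

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
print([limiter.allow("user-42") for _ in range(4)])  # the 4th request is rejected
```

Note the trade-offs this sketch ignores: old buckets are never cleaned up, and a client can burst at a window boundary (up to 2× the limit across two adjacent windows). It also has exactly the flaw described above: the counter lives in this process’s memory, so every instance enforces its own separate limit.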
2. Let Your API Gateway Handle It
In many production systems, rate limiting lives at the gateway level. Tools like NGINX, API Gateway, or Cloudflare can enforce limits before traffic even reaches your application.
This has a big advantage: You don’t touch your code.
You configure rules like:
- 100 requests per minute per IP
- 1,000 requests per minute per API key

And the gateway enforces them consistently.
For many teams, this is the cleanest solution.
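As a concrete illustration, rules like the ones above map onto NGINX’s `limit_req` module. The zone name and upstream here are placeholders; the directives themselves are standard NGINX.

```nginx
# Shared state keyed by client IP: 10 MB of counters, 100 requests/minute.
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=100r/m;

server {
    location /api/ {
        # Enforce the limit, allowing short bursts of up to 20 requests.
        limit_req zone=per_ip burst=20 nodelay;
        proxy_pass http://backend;
    }
}
```

Because this runs in the gateway, every request is counted before it touches your application, and the limit holds no matter how many backend instances you run behind it.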
3. Use a Shared Store (Like Redis)
Once you’re running multiple instances, you need shared state. That’s where something like Redis comes in.
Instead of each server counting independently, they all increment the same counter stored in Redis. Because Redis commands like INCR are atomic, you avoid race conditions and keep your limits accurate. This is the typical solution in distributed systems.
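Here is a sketch of the classic INCR-plus-EXPIRE pattern. The function takes any client object exposing redis-py-style `incr`/`expire` methods (a real `redis.Redis()` instance would plug in); the function and key names are illustrative.

```python
def allow_request(client, key, limit, window_seconds):
    """Shared fixed-window limiter: every app instance increments the same key.

    `client` is any object with redis-py-style incr/expire methods,
    e.g. redis.Redis(). INCR is atomic in Redis, so concurrent instances
    cannot race on the count.
    """
    redis_key = f"ratelimit:{key}"
    count = client.incr(redis_key)
    if count == 1:
        # First request in this window: start the window's countdown.
        client.expire(redis_key, window_seconds)
    return count <= limit

# Usage with a real Redis (requires the redis-py package and a running server):
#   import redis
#   r = redis.Redis()
#   allow_request(r, "user-42", limit=100, window_seconds=60)
```

One caveat worth knowing: if a process crashes between the `incr` and the `expire`, the key could live forever without a TTL. Production implementations typically close that gap by wrapping both steps in a Lua script or a single command, but the core idea, one atomic counter shared by all instances, is exactly this.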
Stay tuned for Part 2!