When we build APIs or services that accept frequent requests — like a website analytics tracker, login endpoint, or public data API — we eventually face one of the oldest scaling problems on the web: what if someone calls it too often?
That’s where rate limiting comes in.
🚦 What Is Rate Limiting?
Rate limiting is the practice of controlling how many times a user (or client) can perform a certain action within a defined time window.
It protects your system from:
- Accidental overloads (like retry storms)
- Misbehaving scripts or bots
- Denial-of-service attempts
- Excessive costs (for APIs that bill per request)
In short: you decide what “too much” means, and then enforce it gently but firmly.
🎛️ Core Idea
Every rate-limiting strategy is built around three questions:
1. Who are we limiting? (A user account, API key, IP address, browser session, etc.)
2. What are we limiting? (Requests per second, login attempts, uploads, messages, etc.)
3. When does the counter reset? (Per second, minute, hour, or day, depending on the use case.)
For example:
“Each visitor can send at most 1 request every 15 seconds to the `/track` endpoint.”
🧱 Common Implementation Patterns
Let’s look at some ways developers enforce these limits, from simplest to most scalable.
1. Application-Layer Throttling
This is logic inside your code (e.g., PHP, Node, Python) that checks when the user last made a request and skips or delays duplicates within a short window.
It’s simple and works well for smaller apps, but the state lives in memory, so it resets whenever the server restarts.
Good for: lightweight trackers, small internal APIs.
Limitation: doesn’t share state across multiple servers.
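To make the idea concrete, here is a minimal PHP sketch that uses the visitor’s session as the per-visitor state. The 15-second window matches the `/track` example above, and the session key name is just an illustration; a long-running runtime like Node would keep an in-process map instead.

```php
<?php
// Minimal per-visitor throttle using the PHP session as state.
// The 15-second window follows the /track example above.
session_start();

$window = 15; // seconds
$now    = time();
$last   = $_SESSION['last_track_hit'] ?? 0;

if ($now - $last < $window) {
    // Too soon: quietly skip the duplicate hit.
    http_response_code(204);
    exit;
}

$_SESSION['last_track_hit'] = $now;
// ...record the hit as usual...
```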
2. Database-Enforced Limits
You can use your database as the source of truth by allowing only one row per user per time window (such as 15 seconds).
This turns your DB into both the log and the rate limiter.
It’s reliable, consistent, and doesn’t require external tools — but slightly slower under heavy traffic.
Good for: apps already writing to SQL tables (e.g., analytics trackers).
Limitation: higher write volume = more DB load.
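A rough sketch of this pattern with PDO and MySQL, assuming a `hits` table with a UNIQUE key on (`visitor_id`, `time_bucket`). The table, the column names, and the MySQL-specific `INSERT IGNORE` are illustrative choices, not requirements:

```php
<?php
// Sketch: a UNIQUE key on (visitor_id, time_bucket) lets the database
// itself reject a second hit within the same 15-second window.
// Connection details and table/column names are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=analytics', 'user', 'pass');

$visitorId  = $_COOKIE['visitor_id'] ?? 'anonymous';
$timeBucket = intdiv(time(), 15);   // same value for 15 seconds

$stmt = $pdo->prepare(
    'INSERT IGNORE INTO hits (visitor_id, time_bucket, created_at)
     VALUES (:visitor, :bucket, NOW())'
);
$stmt->execute(['visitor' => $visitorId, 'bucket' => $timeBucket]);

if ($stmt->rowCount() === 0) {
    // A row for this window already exists: the request is rate limited.
    http_response_code(204);
    exit;
}
```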
3. In-Memory Cache Limits
Technologies like Redis, Memcached, or APCu in PHP let you store short-lived counters in memory.
When a user sends a request, the system increments a counter for that key (like `user:123`) and expires it after a few seconds or minutes.
This approach is blazing fast and easy to scale horizontally.
Good for: production APIs or login endpoints.
Limitation: needs an external cache server (not just plain PHP).
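With APCu (one of the options above), the pattern boils down to “create a counter with a TTL, then increment it”. The key format and the 10-requests-per-minute limit below are arbitrary examples:

```php
<?php
// Fixed-window counter in APCu: the first request creates the key with a
// TTL, later requests increment it until the window expires.
// Key format and the 10-per-60-seconds limit are illustrative.
$userId = $_GET['user'] ?? 'anonymous';
$key    = "ratelimit:user:{$userId}";
$limit  = 10;   // requests
$window = 60;   // seconds

apcu_add($key, 0, $window);   // no-op if the key already exists
$count = apcu_inc($key);      // atomic increment

if ($count > $limit) {
    http_response_code(429);
    exit;
}
// ...handle the request normally...
```

The same increment-and-expire pattern works with Redis (`INCR` plus `EXPIRE`) when the counter needs to be shared across servers.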
4. Token Bucket and Leaky Bucket Algorithms
These are classic algorithms used by large-scale APIs (Twitter, Stripe, Cloudflare).
They model each user as having a “bucket” that fills or empties at a steady rate:
- The token bucket allows bursts (you can send several requests quickly until the bucket empties).
- The leaky bucket ensures a constant steady rate of processing.
These algorithms are used in distributed rate limiters, where multiple servers coordinate via Redis or another shared store.
Good for: large distributed systems.
Limitation: more math and coordination required.
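To make the token bucket concrete, here is a small single-process PHP sketch; a production limiter would keep the token count and timestamp in a shared store like Redis so every server sees the same bucket. The capacity and refill rate below are placeholder values.

```php
<?php
// Token bucket: the bucket refills at $ratePerSec tokens per second up to
// $capacity; each request spends one token, so short bursts are allowed
// while the long-run rate stays capped.
class TokenBucket
{
    private float $tokens;
    private float $lastRefill;

    public function __construct(
        private float $capacity,     // maximum burst size
        private float $ratePerSec    // steady refill rate
    ) {
        $this->tokens     = $capacity;
        $this->lastRefill = microtime(true);
    }

    public function allow(): bool
    {
        $now = microtime(true);
        // Refill in proportion to the time that has passed.
        $this->tokens = min(
            $this->capacity,
            $this->tokens + ($now - $this->lastRefill) * $this->ratePerSec
        );
        $this->lastRefill = $now;

        if ($this->tokens >= 1.0) {
            $this->tokens -= 1.0;
            return true;
        }
        return false;
    }
}

// Allow bursts of up to 5 requests, refilling at 1 token per second.
$bucket = new TokenBucket(5, 1.0);
var_dump($bucket->allow()); // true while tokens remain
```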
5. Reverse Proxy / Gateway Limits
Many production systems don’t even handle rate limiting in app code — they offload it to API gateways or web servers.
Examples:
- Nginx: `limit_req_zone` and `limit_req`
- Cloudflare: configurable per endpoint
- Kong / Traefik / Envoy: API gateway plugins
These tools can drop or delay excessive requests before they even reach your app, saving resources.
Good for: high-traffic, multi-service environments.
Limitation: less flexible logic (can’t easily tie to custom business rules).
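For reference, the Nginx directives mentioned above look roughly like this; the zone name, memory size, and rates are placeholder values you would tune for your own traffic:

```nginx
# In the http block: a shared zone keyed by client IP,
# 10 MB of state, at most 1 request per second per IP.
limit_req_zone $binary_remote_addr zone=track_zone:10m rate=1r/s;

server {
    location /track {
        # Allow short bursts of 5 extra requests, reject the rest.
        limit_req zone=track_zone burst=5 nodelay;
        limit_req_status 429;
    }
}
```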
🧠 Designing a Fair Limit
When setting limits, think about the experience you want to protect:
- Analytics tracker? → 1 hit per visitor per 10–30 seconds.
- Login endpoint? → 5 attempts per minute per IP or user.
- Public API? → 100 requests per minute per API key.
- File uploads? → 10 per hour per account.
A good limit protects the system without frustrating legitimate users.
💬 What Happens When a Limit Is Reached?
There are a few graceful options:
- Ignore the request quietly (used by analytics trackers)
- Respond with HTTP 429 (“Too Many Requests”) and tell the client when to retry
- Queue or delay the request (used in message-based systems)
The key is consistency: your clients should know what to expect.
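A small PHP sketch of the second option, telling the client exactly when it may try again; the 15-second figure is just the window from the earlier `/track` example, and a real limiter would supply it dynamically:

```php
<?php
// Respond with 429 plus a Retry-After header so well-behaved clients
// know how long to back off. $secondsUntilReset would come from the
// rate limiter itself; 15 is a placeholder here.
$secondsUntilReset = 15;

http_response_code(429);
header('Retry-After: ' . $secondsUntilReset);
header('Content-Type: application/json');
echo json_encode([
    'error'       => 'Too Many Requests',
    'retry_after' => $secondsUntilReset,
]);
exit;
```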
🔁 Rate Limiting vs. Idempotency
Rate limiting controls how often something happens.
Idempotency ensures that retries don’t multiply the effect.
They work beautifully together:
- Rate limiting protects performance.
- Idempotency protects data consistency.
For example, a visitor tracker might accept one unique event ID per hit — even if the browser retries three times. That’s idempotency in action.
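Here is one way that tracker behaviour might look in PHP, assuming the client sends a unique event ID with each hit and the `events` table has a UNIQUE index on `event_id` (all names here are hypothetical):

```php
<?php
// Idempotent insert: retries with the same event_id do nothing, because
// the unique index on event_id rejects the duplicate row.
// Connection details and table/column names are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=analytics', 'user', 'pass');

$eventId = $_POST['event_id'] ?? bin2hex(random_bytes(16));

$stmt = $pdo->prepare(
    'INSERT IGNORE INTO events (event_id, created_at) VALUES (:id, NOW())'
);
$stmt->execute(['id' => $eventId]);
// Whether this was the first attempt or the third retry,
// exactly one row exists for this event.
```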
🧩 The Big Picture
Rate limiting isn’t about punishment; it’s about fairness and resilience.
When you apply it thoughtfully, it prevents abuse, saves bandwidth, and keeps every user’s experience smooth.
The best limiters are invisible — they quietly maintain balance behind the scenes.
🧭 Summary Table
| Approach | Scope | Strengths | Best For |
|---|---|---|---|
| App-layer logic | Local (per process) | Easy, no extra tools | Small apps, quick fixes |
| Database uniqueness | Persistent | Reliable, simple | Trackers, logs |
| In-memory cache | Cache-backed | Scalable, low latency | APIs, auth systems |
| Token/Leaky bucket | Algorithmic | Smooth fairness | High-traffic APIs |
| Proxy/gateway | External | Centralized, low overhead | Distributed microservices |
🪶 Closing Thought
The more your application grows, the more requests it will receive.
Rate limiting is how you stay generous without being exploited.
It’s a quiet act of discipline that makes systems — and developers — more resilient.
Written by Cathy Lai — software developer and educator exploring backend architecture, microservices, and AI-assisted app building.