Soma

Posted on Oct 12

How to Design a Rate Limiter on a System Design Interview?

#programming #systemdesign #coding #softwaredevelopment

Disclosure: This post includes affiliate links; I may receive compensation if you purchase products or services from the different links provided in this article.

credit --- ByteByteGo

Hello friends, System design interviews often test your ability to solve problems that balance performance, scalability, and correctness. One of the most common questions I've encountered is:

"How would you design a Rate Limiter?"

I've been asked this exact question multiple times, and each time the interviewer wanted to see how I approached it systematically.

The rate limiter is not just an academic problem; it's at the heart of many real systems. APIs, login attempts, payment systems, and messaging platforms all use rate limiting to prevent abuse, control costs, and ensure fairness among users.

In the past, I have shared common questions like how to design WhatsApp or YouTube, as well as some concept-based questions like the difference between API Gateway vs Load Balancer and Horizontal vs Vertical Scaling, Forward proxy vs reverse proxy.

In this article, I'll walk you through the problem, the key requirements, different design approaches, and show you code examples (including the simple timestamp array method I used in interviews).

What is a Rate Limiter?

A Rate Limiter is a system component that restricts the number of actions a user (or client) can perform in a given timeframe.

Examples:

API Gateway: Only allow 100 requests per user per minute.
Login System: Allow only 5 failed attempts in 10 minutes.
Messaging App: Prevent users from sending more than 20 messages per second.

If users exceed these limits, the system should block their requests (often returning an HTTP status code 429 Too Many Requests).

Here is a nice diagram from ByteByteGo which shows Rate Limiter in action:

Key Requirements in Interviews

When designing a rate limiter, interviewers usually want to see if you can handle:

Correctness --- Ensuring requests beyond the limit are rejected.
Efficiency --- Handling millions of requests per second with low latency.
Scalability --- Working in a distributed system across multiple servers.
Fairness --- Avoiding loopholes where burst traffic is allowed.
Configurability --- Easy to change limits per user, per API, etc.

You can also ask questions to clarify any other requirements the Interview will have, like sometimes they ask you to put a limit on a particular URL and on a particular HTTP method.

Top 4 Rate Limiting Algorithms for Interview

Many different algorithms exist for rate limiting, each with trade-offs. Here are the most popular rate-limiting algorithms, which are also asked on technical interviews:

Fixed Window Counter

Divide time into fixed windows (e.g., every minute). Count requests.
Simple but can allow bursts at window boundaries.

Sliding Window Log

Store timestamps of requests in a log (array/queue). Remove old timestamps.
More accurate but requires memory proportional to the request volume.

Sliding Window Counter

Uses counters for current and previous windows, weighted by time.
Memory efficient, smoother than a fixed window.

Token Bucket / Leaky Bucket

Tokens are added at a fixed rate, and requests consume tokens.
Smooths traffic and is widely used in production systems.

How to design a Rate Limiter on Coding Interviews?

As a Java developer, it's important to not just explain the algorithm but also write clean, interview-ready Java code. In this article, I'll explain the approaches and show you Java implementations for two popular solutions:

Sliding Window Log (array of timestamps) --- the one I personally used in interviews.
Token Bucket --- the production-grade solution widely used in APIs.

1. Sliding Window Log in Java

This method maintains a queue of timestamps for each request. Before processing a new request:

Remove timestamps older than the configured time window.
If the queue size is below the limit, allow the request and insert the new timestamp.
Otherwise, reject it.

Here is how it works:

Now, let's see the implementation in Java code:

import java.util.*;\
public class RateLimiter {\
    private final int maxRequests;\
    private final long windowSizeInMillis;\
    private final Deque<Long> requestTimestamps;\
    public RateLimiter(int maxRequests, int windowSizeInSeconds) {\
        this.maxRequests = maxRequests;\
        this.windowSizeInMillis = windowSizeInSeconds * 1000L;\
        this.requestTimestamps = new ArrayDeque<>();\
    }\
    public synchronized boolean allowRequest() {\
        long now = System.currentTimeMillis();\
        // Remove old timestamps\
        while (!requestTimestamps.isEmpty() &&\
               requestTimestamps.peekFirst() <= now - windowSizeInMillis) {\
            requestTimestamps.pollFirst();\
        }\
        if (requestTimestamps.size() < maxRequests) {\
            requestTimestamps.addLast(now);\
            return true;\
        } else {\
            return false;\
        }\
    }\
    // Demo\
    public static void main(String[] args) throws InterruptedException {\
        RateLimiter limiter = new RateLimiter(5, 10); // 5 requests per 10 seconds\
        for (int i = 1; i <= 7; i++) {\
            if (limiter.allowRequest()) {\
                System.out.println("Request " + i + ": Allowed");\
            } else {\
                System.out.println("Request " + i + ": Blocked");\
            }\
            Thread.sleep(1000);\
        }\
    }\
}

Sample Output
Request 1: Allowed
Request 2: Allowed
Request 3: Allowed
Request 4: Allowed
Request 5: Allowed
Request 6: Blocked
Request 7: Blocked

This solution is perfect for interviews because it's simple, intuitive, and demonstrates your understanding of sliding windows.

2. Token Bucket in Java

The Token Bucket algorithm is widely used in production (e.g., API gateways, microservices).

Tokens are added at a fixed rate.
Each request consumes one token.
If no tokens are available, the request is rejected.

Here is how Tocken Bucket Algorithms work:

Now, let's see the Java code:

public class TokenBucket {\
    private final int capacity;\
    private final double refillRate; // tokens per second\
    private double tokens;\
    private long lastRefillTimestamp;

public TokenBucket(int capacity, double refillRate) {\
        this.capacity = capacity;\
        this.refillRate = refillRate;\
        this.tokens = capacity;\
        this.lastRefillTimestamp = System.nanoTime();\
    }\
    public synchronized boolean allowRequest() {\
        long now = System.nanoTime();\
        double tokensToAdd = ((now - lastRefillTimestamp) / 1e9) * refillRate;\
        tokens = Math.min(capacity, tokens + tokensToAdd);\
        lastRefillTimestamp = now;\
        if (tokens >= 1) {\
            tokens -= 1;\
            return true;\
        } else {\
            return false;\
        }\
    }\
    // Demo\
    public static void main(String[] args) throws InterruptedException {\
        TokenBucket bucket = new TokenBucket(10, 5); // 5 tokens/sec, burst up to 10\
        for (int i = 1; i <= 20; i++) {\
            if (bucket.allowRequest()) {\
                System.out.println("Request " + i + ": Allowed");\
            } else {\
                System.out.println("Request " + i + ": Blocked");\
            }\
            Thread.sleep(200);\
        }\
    }\
}

This implementation is thread-safe and performs well under concurrent loads.

Interview Strategy (for Java Developers)

When asked, "How would you design a rate limiter?" in a Java system design interview:

Start with Fixed Window Counter (simple but has edge cases).
Move to Sliding Window Log (use Deque<Long> in Java).
Mention Token Bucket (useful in production systems).
For distributed systems, bring up Redis-based counters or API Gateway features (e.g., Nginx, Envoy).

This shows both breadth (knowledge of algorithms) and depth (working Java code).

System Design Interview Resources

In order to do well on any interview, resources are very important. Before any System Design and Coding interview, I used to read the following resources

ByteByteGo: click here

I have personally bought their System Design books to speed up my preparation, and joined ByteByteGo for comprehensive preparation.

They are now also giving a 50% discount on their lifetime plan, which is what I have, and I highly recommend that to anyone preparing for the System Design interview.

Join ByteByteGo now for a 50% Discount: click here

Codemia.io : Click here

This is another great platform to practice System design problems for interviews. It has more than 120+ System design problems, many of which are free, and also a proper structure to solve them.

They also have a great platform, editorial solution, and tools to help you practice system design questions online, and the best thing is that they are also offering a 60% discount on their lifetime plan.

I usually combine ByteByteGo (theory), Codemia (practice), and Exponent (mock interview) for a complete prep

Here is the link to get discount --- Join Codemia for 60% Discount

Exponent: Click here
A specialized site for interview prep, especially for FAANG companies like Amazon and Google. They also have a great system design course and many other materials and mock interviews that can help you crack FAANG interviews.

They are also offering a 70% discount now on their annual plan, which makes it a great time to join them.

Here is the link to get discount --- Join Exponent for70% OFF

Conclusion

Rate limiting is one of those interview questions that tests both your algorithm knowledge and system design intuition.

If you just need something clean in an interview, go with the Sliding Window Log approach (with a Deque<Long> in Java).
If you want to demonstrate production-grade knowledge, mention and explain the Token Bucket algorithm.

That way, you cover both the practical coding side and the system design side in one answer.

DEV Community