DEV Community

Anh Trần Tuấn

Posted on • Originally published at tuanh.net

Reasons the Thundering Herd Problem Happens and How to Fix It

1. What is the Thundering Herd Problem?

The Thundering Herd Problem occurs when many processes or threads wake up simultaneously in response to a single event, but only one of them can actually make progress; the rest compete for the same resource and go back to waiting, overloading the system in the process. Imagine a herd of animals running towards a small gate at once: each tries to go through, but their simultaneous arrival causes a bottleneck. The same thing happens in software when many processes are "sleeping" or "waiting" on a resource such as a lock, database connection, or network socket. When that resource becomes available, they all rush in simultaneously and overwhelm the system.


1.1 Causes of the Thundering Herd Problem

The root cause of the Thundering Herd Problem is poor synchronization between threads or processes when accessing a shared resource. Some common scenarios where this issue arises are:

  • Lock contention : When multiple threads are waiting for a lock, and they all wake up when the lock is released.
  • Network timeouts : Many processes waiting for the same external network resource (e.g., a database server) may retry simultaneously when the resource becomes available.
  • Cache invalidation : In distributed systems, multiple services may request the same data simultaneously after a cache entry expires, leading to a surge in requests to the backend service.
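The cache-invalidation case is often called a cache stampede. One common mitigation is request coalescing (sometimes called "single flight"): the first caller triggers the backend load, and every concurrent caller for the same key joins that in-flight result instead of issuing its own query. A minimal sketch, where the hypothetical loadFromBackend() stands in for the expensive query:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleFlightCache {
    private final Map<String, CompletableFuture<String>> inFlight = new ConcurrentHashMap<>();
    final AtomicInteger backendLoads = new AtomicInteger(); // counts backend hits, for illustration

    public String get(String key) {
        // computeIfAbsent is atomic: the first caller creates the future and
        // triggers the load; concurrent callers for the same key join it.
        return inFlight
                .computeIfAbsent(key, k -> CompletableFuture.supplyAsync(() -> loadFromBackend(k)))
                .join();
    }

    private String loadFromBackend(String key) {
        backendLoads.incrementAndGet(); // simulate one expensive backend query
        return "value-for-" + key;
    }
}
```

A production version would also evict completed entries so the value can eventually refresh; the point here is only that N simultaneous misses translate into one backend query.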

1.2 Real-World Example: Connection Pooling


To illustrate this issue, consider a database connection pool in a web application. Suppose multiple web requests are waiting for a connection, but the pool is full. When one connection becomes available, many requests wake up simultaneously to grab the connection. This can overload the pool and degrade performance. Here’s an example in Java:

import java.util.concurrent.Semaphore;

public class ConnectionPoolExample {
    private static final int MAX_CONNECTIONS = 10;
    private final Semaphore pool = new Semaphore(MAX_CONNECTIONS);

    public Connection getConnection() throws InterruptedException {
        pool.acquire(); // Threads block here until a permit is available
        return new Connection(); // Simulate handing out a DB connection
    }

    public void releaseConnection(Connection connection) {
        pool.release(); // Return the permit only once the caller is done with it
    }
}

Note that the permit must be held for as long as the connection is in use; releasing it inside getConnection() would let more callers in than there are connections. (Connection here is a placeholder for a real driver type.)

When multiple threads attempt to acquire a connection, the Thundering Herd Problem can occur if they all wake up when a connection is released.
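One pool-level mitigation is to hand the freed permit to exactly one waiter instead of letting every sleeper race for it. java.util.concurrent.Semaphore supports this via its fairness flag, which queues waiters FIFO; combining that with a bounded wait keeps threads from piling up behind the gate indefinitely. A minimal sketch (connection handling is elided, as above):

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class FairConnectionPool {
    private static final int MAX_CONNECTIONS = 10;
    // 'true' requests a fair (FIFO) semaphore: a release hands the permit to
    // the longest-waiting thread instead of letting all sleepers race.
    private final Semaphore pool = new Semaphore(MAX_CONNECTIONS, true);

    public boolean tryGetConnection(long timeoutMs) throws InterruptedException {
        // Bounded wait: a thread that cannot get a permit in time backs off
        // instead of queuing forever behind the gate.
        return pool.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public void releaseConnection() {
        pool.release(); // wakes at most one queued waiter
    }
}
```

Fairness costs some throughput, but it turns "everyone wakes up and fights" into an orderly hand-off.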

2. Techniques to Prevent the Thundering Herd Problem

Several techniques can be used to avoid or mitigate the Thundering Herd Problem. These methods focus on improving synchronization and distributing the load more evenly.

2.1 Exponential Backoff


Exponential backoff is a technique often used in retry mechanisms for network calls. Instead of all requests retrying simultaneously after a failure or timeout, the wait time between retries increases exponentially with each attempt. This ensures that not all requests flood the server at once. Here’s how you might implement it:

import java.io.IOException;

public class ExponentialBackoff {
    public void retryWithBackoff() throws InterruptedException {
        int retries = 0;
        int maxRetries = 5;
        while (retries < maxRetries) {
            try {
                makeNetworkRequest(); // Placeholder for the real network call
                return;
            } catch (IOException e) {
                retries++;
                long backoffTime = (long) Math.pow(2, retries) * 1000; // 2s, 4s, 8s, ...
                System.out.println("Retrying in " + backoffTime + " ms");
                Thread.sleep(backoffTime);
            }
        }
        throw new RuntimeException("Failed after " + maxRetries + " retries");
    }

    private void makeNetworkRequest() throws IOException {
        // Simulate a network request that may fail
    }
}

In this example, the wait time doubles after each failed attempt, reducing the likelihood that all processes will retry at the same moment.
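Even with exponential growth, clients that failed at the same instant will still retry at the same instants. A common refinement is to add jitter: pick a random delay up to the exponential cap ("full jitter"), so retries spread out in time. A sketch of just the delay calculation:

```java
import java.util.concurrent.ThreadLocalRandom;

public class BackoffWithJitter {
    /**
     * Full jitter: a uniformly random delay in [0, min(maxDelayMs, baseMs * 2^attempt)),
     * so clients that failed together do not retry together.
     * Assumes baseMs > 0 and a small attempt count (the shift overflows past ~62).
     */
    public static long nextDelayMs(int attempt, long baseMs, long maxDelayMs) {
        long cap = Math.min(maxDelayMs, baseMs << attempt); // baseMs * 2^attempt
        return ThreadLocalRandom.current().nextLong(cap);
    }
}
```

Callers would Thread.sleep(nextDelayMs(attempt, 1000, 30_000)) between attempts instead of sleeping for the deterministic backoffTime above.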

2.2 Token Bucket Rate Limiting


Rate limiting is another effective strategy to control how frequently a resource is accessed. The token bucket algorithm is a common rate-limiting mechanism where processes must acquire tokens before proceeding. The tokens are added to the bucket at a fixed rate, ensuring that only a limited number of requests can proceed at any given time.

public class TokenBucketRateLimiter {
    private final int capacity;
    private int tokens;
    private final long refillInterval; // Time in ms to add one token
    private long lastRefillTime;

    public TokenBucketRateLimiter(int capacity, long refillInterval) {
        this.capacity = capacity;
        this.refillInterval = refillInterval;
        this.tokens = capacity;
        this.lastRefillTime = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        refill();
        if (tokens > 0) {
            tokens--;
            return true;
        }
        return false;
    }

    private void refill() {
        long now = System.currentTimeMillis();
        long intervals = (now - lastRefillTime) / refillInterval;
        if (intervals > 0) {
            tokens = (int) Math.min(tokens + intervals, capacity);
            // Advance the clock by whole intervals only, so fractional
            // elapsed time still counts toward the next refill
            lastRefillTime += intervals * refillInterval;
        }
    }
}

The bucket admits at most a fixed number of requests per refill interval, smoothing bursts: when it empties, further requests are rejected until tokens are replenished, preventing an overwhelming number of simultaneous accesses to the resource.
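To see the limiter in action, here is a compact, self-contained demo (the limiter is re-declared inline so the snippet compiles on its own):

```java
public class TokenBucketDemo {
    // Compact copy of the limiter above, kept inline for a runnable demo.
    static class TokenBucket {
        private final int capacity;
        private int tokens;
        private final long refillIntervalMs;
        private long lastRefillTime;

        TokenBucket(int capacity, long refillIntervalMs) {
            this.capacity = capacity;
            this.refillIntervalMs = refillIntervalMs;
            this.tokens = capacity;
            this.lastRefillTime = System.currentTimeMillis();
        }

        synchronized boolean tryAcquire() {
            refill();
            if (tokens > 0) {
                tokens--;
                return true;
            }
            return false;
        }

        private void refill() {
            long now = System.currentTimeMillis();
            long intervals = (now - lastRefillTime) / refillIntervalMs;
            if (intervals > 0) {
                tokens = (int) Math.min(tokens + intervals, capacity);
                lastRefillTime += intervals * refillIntervalMs;
            }
        }
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(3, 1_000);
        for (int i = 1; i <= 5; i++) {
            // The first 3 requests pass; the rest are rejected until a refill.
            System.out.println("request " + i + " allowed: " + bucket.tryAcquire());
        }
    }
}
```

A rejected caller would typically combine this with the backoff from section 2.1 rather than retrying in a tight loop.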

2.3 Bulkhead

To protect your system from the Thundering Herd Problem, you can also apply the Bulkhead fault-tolerance pattern, which isolates resources so that a surge against one dependency cannot exhaust the capacity needed by others. I have an article on the topic right here, feel free to check it out!
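As a taste of the idea, a bulkhead caps how many calls may be in flight against a given dependency and rejects the overflow immediately instead of letting it queue up. A minimal sketch, not tied to any particular resilience library:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class Bulkhead {
    // Each downstream dependency gets its own small pool of permits, so a
    // herd stampeding one dependency cannot starve the others.
    private final Semaphore permits;

    public Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    public <T> T call(Supplier<T> action) {
        if (!permits.tryAcquire()) {
            // Fail fast instead of queuing: the caller can back off or degrade.
            throw new IllegalStateException("bulkhead full: rejecting call");
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }
}
```

You would create one Bulkhead per dependency (database, payment API, and so on), each sized to what that dependency can actually absorb.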

3. Practical Demo: Comparing Techniques

To demonstrate the impact of these solutions, we set up two environments: one with no mitigation and one using exponential backoff. We ran multiple concurrent processes trying to access the same database. Here are the results:

3.1 Without Exponential Backoff

When 50 threads tried to access the database at the same time without exponential backoff, the connection pool was overwhelmed. Most threads failed or timed out, and the server’s response time increased dramatically.

3.2 With Exponential Backoff

Using exponential backoff, the load on the database was distributed over time. Fewer threads timed out, and the server maintained stable performance even under high load.

4. Conclusion

The Thundering Herd Problem can cripple your system if not handled properly. By implementing strategies like exponential backoff, token bucket rate limiting, and other load-distribution techniques, you can ensure that your system remains stable and performant. Whether you're managing connections in a pool or handling a flood of network requests, these methods provide a robust way to prevent bottlenecks.

If you have any questions or need further clarification on handling the Thundering Herd Problem, feel free to leave a comment below!

Read more posts at: Reasons the Thundering Herd Problem Happens and How to Fix It
