DEV Community

Gauri Katara

Posted on • Originally published at gaurikatara.hashnode.dev

Our Spring Boot API Froze Under Load — Here's Exactly How We Fixed It

This article was originally published on my Hashnode blog: https://gaurikatara.hashnode.dev

The API didn't throw an error. It just... stopped responding.

No exceptions in the logs. No 500 errors. No obvious reason. Just requests piling up, response times climbing from milliseconds to 30 seconds, and then timeouts. It was strange. Traffic had picked up slightly, nothing unusual for that time of day, and suddenly our backend service was barely responding. Users were seeing blank screens. This is the story, from one of my typical workdays, of how we diagnosed and fixed the issue in our Spring Boot microservice, and what I learned along the way that I had never found clearly explained in one place online.

I began investigating all the potential causes:

  • Network issues

  • Database metrics

  • CPU and memory usage

  • Thread dumps

  • Load balancer logs

What I discovered surprised me: the problem was in the threads.

The issue stemmed from thread pool exhaustion in production.

WHAT IS THREAD POOL EXHAUSTION — and why is it so sneaky?

Spring Boot uses an embedded Tomcat server by default. Tomcat handles incoming HTTP requests using a fixed pool of threads. By default, this pool has a maximum of 200 threads. Here is how it works normally: a request comes in, Tomcat assigns it a thread, the thread does its work, returns a response, and the thread goes back to the pool. Fast, clean, repeatable.

Now here is the problem: what if a thread gets stuck waiting?

  • Maybe it's calling an external API that is slow.

  • Maybe it's waiting on a database query that is taking too long.

  • Maybe it's doing something blocking that it shouldn't be.

If enough threads get stuck waiting at the same time, the pool runs out. New requests come in but there are no threads available to handle them. So they wait in a queue. The queue fills up. New requests start getting rejected or timing out. Your API is not down. It is not throwing errors. It is just frozen. And the worst part — from the outside it looks exactly like a network issue or a deployment problem. Most developers waste hours looking in the wrong place.

Thread pool exhaustion does not always look like an error. It often looks like extreme slowness or total unresponsiveness — with perfectly clean logs.
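To make the failure mode concrete, here is a minimal pure-JDK sketch (not our production code, and no Spring involved) that reproduces the same mechanics with a tiny two-thread pool and a bounded queue:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolExhaustionDemo {
    public static void main(String[] args) {
        // A tiny stand-in for Tomcat: 2 worker threads, a queue of 2.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(2));

        // Two "requests" block their threads (like a slow external call),
        // and two more fill the queue.
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try { Thread.sleep(5_000); } catch (InterruptedException ignored) {}
            });
        }

        // The fifth request has nowhere to go: the pool is exhausted.
        try {
            pool.submit(() -> {});
            System.out.println("accepted");
        } catch (RejectedExecutionException e) {
            System.out.println("rejected: no threads available");
        }
        pool.shutdownNow();
    }
}
```

Tomcat's behavior under exhaustion differs in the details (requests queue at the connector and eventually time out rather than being rejected immediately), but the arithmetic is the same: blocked threads plus a full queue equals a frozen API.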

HOW WE DIAGNOSED IT

Step 1 — Check the thread pool metrics

The first thing I did after ruling out the obvious causes (no deployment had happened, no database was down) was check our application metrics in AWS CloudWatch. We had Spring Boot Actuator enabled, which exposes metrics including thread pool stats. The active thread count was sitting at exactly 200, the Tomcat default maximum. That was the first real clue.

If your team has Actuator set up, you can check this yourself at /actuator/metrics/tomcat.threads.busy. If that number is at or near your maximum thread count while the API is slow, thread pool exhaustion is almost certainly your problem.

Step 2 — Take a thread dump

To confirm, I took a thread dump while the service was under load. A thread dump shows you exactly what every thread in your JVM is doing at that moment. You can trigger one using:

```shell
# Using jstack (get the PID first with: ps aux | grep java)
jstack <pid> > thread_dump.txt

# Or via Spring Boot Actuator, if enabled (curl, Postman, or a browser)
curl http://localhost:8080/actuator/threaddump
```

When I opened the thread dump, I saw the same pattern repeating across most of the 200 threads. They were all stuck in a WAITING or TIMED_WAITING state, blocked on an HTTP call to an external third-party service we were calling to enrich our response data. That external service had started responding slowly — averaging 25 seconds per call instead of the usual 200ms. Our code was calling it synchronously, blocking the thread the entire time. With enough concurrent requests, every thread in the pool was stuck waiting for that external service. New requests had nowhere to go.
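You can see the same thread states programmatically with the JDK's ThreadMXBean, which is what jstack and the Actuator endpoint read under the hood. A small sketch (the thread name here is made up to mimic a Tomcat worker):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class ThreadStateDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch latch = new CountDownLatch(1);

        // A worker parked on latch.await(), analogous to a Tomcat thread
        // stuck inside a slow synchronous HTTP call.
        Thread stuck = new Thread(() -> {
            try { latch.await(); } catch (InterruptedException ignored) {}
        }, "fake-tomcat-exec-1");
        stuck.start();
        Thread.sleep(200);  // give it time to park

        // The same data a thread dump would show for this thread.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo info = mx.getThreadInfo(stuck.getId());
        System.out.println(info.getThreadName() + " is " + info.getThreadState());

        latch.countDown();
        stuck.join();
    }
}
```

In a real dump, seeing dozens of worker threads in WAITING or TIMED_WAITING with the same stack trace, all pointing at the same external call, is the signature to look for.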

THE FIX

Step 3 — The fix had two parts

We fixed it with two changes. The first was immediate — add a timeout to the external HTTP call so threads don't wait forever. The second was the proper fix — make the call non-blocking.

Part 1 — Add connection and read timeouts immediately

This was a quick fix we could deploy right away. We configured our RestTemplate with explicit timeouts:

```java
@Configuration
public class RestTemplateConfig {

    @Bean
    public RestTemplate restTemplate() {
        HttpComponentsClientHttpRequestFactory factory =
                new HttpComponentsClientHttpRequestFactory();

        // Don't wait more than 3 seconds to connect
        factory.setConnectTimeout(3000);

        // Don't wait more than 5 seconds for a response
        factory.setReadTimeout(5000);

        return new RestTemplate(factory);
    }
}
```

Always set timeouts on any external HTTP call. No timeout means a thread can be stuck waiting forever. This is one of the most common mistakes in Java backend services.
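The same rule applies to any blocking wait, not just RestTemplate. As a pure-JDK illustration of the pattern (a stand-in future rather than a real HTTP call), CompletableFuture.orTimeout bounds the wait and lets you serve a fallback instead of holding the thread forever:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutDemo {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for a call to a slow external service: never completes.
        CompletableFuture<String> slowCall = new CompletableFuture<>();

        try {
            // Bound the wait, the same idea as setReadTimeout above.
            String body = slowCall.orTimeout(200, TimeUnit.MILLISECONDS).get();
            System.out.println("got: " + body);
        } catch (ExecutionException e) {
            if (e.getCause() instanceof TimeoutException) {
                // Free the thread and return a fallback instead of hanging.
                System.out.println("timed out, serving fallback");
            }
        }
    }
}
```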

Part 2 — Increase thread pool size as a short-term buffer

While we worked on the proper async solution, we also increased the Tomcat thread pool size to give us more breathing room under load. This goes in your application.properties:

```properties
# Maximum number of threads to handle requests
server.tomcat.threads.max=400

# Minimum threads always kept alive
server.tomcat.threads.min-spare=20

# Max requests that can wait when all threads are busy
server.tomcat.accept-count=100
```

Increasing the thread count is not a real fix; it just delays the problem. If your threads are blocking, adding more threads means more threads will block. Always fix the root cause.

Part 3 — The proper fix: make the external call async

The real solution was to stop blocking a thread while waiting for the external service. We used Spring's @Async with a dedicated thread pool for external calls, isolating them from the main request-handling threads:

```java
@Configuration
@EnableAsync
public class AsyncConfig {

    @Bean(name = "externalCallExecutor")
    public Executor externalCallExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("external-call-");
        executor.initialize();
        return executor;
    }
}
```

Now the external call runs on its own isolated thread pool. Even if that external service slows down completely, our main Tomcat threads keep handling requests normally. The two concerns are fully separated.
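The same isolation can be sketched without Spring, using a plain CompletableFuture on a dedicated executor (class and thread names here are illustrative, not from our codebase):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BulkheadDemo {
    public static void main(String[] args) {
        // Dedicated pool for external calls, separate from request handling.
        ExecutorService externalPool = Executors.newFixedThreadPool(4, r -> {
            Thread t = new Thread(r);
            t.setName("external-call-" + t.getId());
            return t;
        });

        String result = CompletableFuture
                .supplyAsync(() -> Thread.currentThread().getName(), externalPool)
                .join();

        // The work ran on the isolated pool, not on the caller's thread.
        System.out.println("ran on: " + result);
        externalPool.shutdown();
    }
}
```

If the external service stalls, only this pool saturates; the caller's threads stay free, which is exactly the isolation the @Async configuration gives the Tomcat workers.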

What happened after the fix

After deploying the timeout fix first, the immediate crisis resolved. Threads were no longer getting stuck indefinitely — after 5 seconds they would time out, return an error or fallback response, and free up for the next request. After the async refactor, the problem disappeared entirely. Active thread count during peak traffic dropped from 200 (maxed out) to consistently under 40. Response times went back to normal. The external service could be slow or even temporarily down, and our API kept working.

What I learned from this

  • Always set timeouts on external calls. Every HTTP call your service makes to another service or API must have a connect timeout and a read timeout. No exceptions. This is the single most common cause of thread pool exhaustion.

  • Thread pool exhaustion looks like slowness, not errors. If your API is suddenly unresponsive with no exceptions in the logs, check your active thread count before you do anything else.

  • Increase thread pool size only as a temporary measure. It buys you time but does not fix the root cause. The real fix is to stop blocking threads in the first place.

  • Isolate slow dependencies. Any external call that could be slow or unreliable should run on its own thread pool, completely separate from your main request-handling threads. This is called bulkhead pattern — and it is one of the most important resilience patterns in microservices.

  • Actuator is your best friend in production. If you are running Spring Boot and you do not have Actuator enabled with metrics exposed, you are flying blind. Enable it. It will save you hours the next time something goes wrong.

I hope this walkthrough helped.

Thread pool exhaustion may seem complex, but it is easily fixable once you know the signs. If you've faced this issue or are dealing with it now, share your approach in the comments. If you found this useful, follow me for insights on Java backend engineering, Spring Boot, Kafka, AWS, and real-world production challenges.
