While working on a client's Python FastAPI application, everything looked stable at first: the APIs were responding, health checks were passing, and there were no visible errors.
But once the application went into real usage, a serious issue appeared.
Whenever a dependent API (internal or external) became slow or stopped responding, the entire application froze. Even unrelated endpoints were stuck.
Initial Observations
At first, it looked like:
A network problem
An external API issue
Or something wrong with FastAPI itself
But logs didn’t show crashes or exceptions. The application was running — just not responding.
Root Cause
Analyzing the setup revealed a cause that was simple but critical:
👉 The FastAPI application was running with only ONE worker
This meant:
A single process handled all incoming requests
If one request got blocked waiting on another API, that worker stayed busy
All other requests were forced to wait
So one slow API call could make the entire service appear down.
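Here is a minimal sketch of that failure mode (the routes and upstream URL are hypothetical, invented for illustration): a blocking HTTP call inside an async endpoint ties up the single worker's event loop, so even an unrelated health check stops answering.

```python
# main.py -- minimal reproduction (hypothetical routes and URL)
import requests  # blocking HTTP client
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health():
    return {"status": "ok"}

@app.get("/report")
async def report():
    # If the upstream hangs, this blocking call holds the event loop
    # until the timeout fires -- with one worker, /health hangs too.
    resp = requests.get("https://slow-upstream.example.com/data", timeout=30)
    return resp.json()
```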
Why This Happens
FastAPI applications are usually run using:
uvicorn
Or gunicorn with Uvicorn workers
When started with default settings, the server runs with a single worker process. This is fine for development, but risky for production workloads that depend on other services.
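Concretely, both launchers fall back to a single process unless told otherwise:

```bash
uvicorn main:app                                    # --workers defaults to 1
gunicorn main:app -k uvicorn.workers.UvicornWorker  # -w/--workers defaults to 1
```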
What Was Changed
To improve reliability, the application was configured to run with multiple workers.
For example:
```bash
uvicorn main:app --workers 2
```
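If gunicorn is the process manager (a common production setup), the equivalent, assuming the same main:app module, is:

```bash
gunicorn main:app -k uvicorn.workers.UvicornWorker --workers 2
```

gunicorn also restarts workers that crash, which adds another layer of resilience.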
With multiple workers:
Each worker runs as a separate process
One blocked request doesn’t block the entire application
Other workers can continue serving traffic
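A quick way to see the difference, reusing the hypothetical /report and /health endpoints from the sketch above:

```bash
curl http://localhost:8000/report &   # kicks off the slow upstream call
curl http://localhost:8000/health     # with 2+ workers this typically still returns promptly
```

With a single worker, the second command hangs until the first finishes; with two or more, a free worker picks it up.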
Results After Adding Workers
After increasing the worker count:
The application stopped freezing
Slow external APIs no longer affected all requests
Health checks stayed stable
Overall responsiveness improved
This small configuration change made a significant difference.
Important Trade-Off 🚨
Adding workers improves concurrency, but it comes with a cost.
Each worker:
Is a separate process
Loads the application into memory
Consumes RAM independently
So increasing workers can:
Multiply memory usage roughly in proportion to the worker count
Cause out-of-memory (OOM) kills if the instance is small
Because of this, the worker count must be chosen carefully.
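As a back-of-the-envelope check before raising the count (the figures below are illustrative assumptions, not measurements from this application):

```text
instance RAM:           2 GiB
per-worker footprint:  ~300 MiB  (measure your own app)

2 workers * 300 MiB = ~600 MiB  -> comfortable headroom
4 workers * 300 MiB = ~1.2 GiB  -> fits, but little room left
8 workers * 300 MiB = ~2.4 GiB  -> exceeds RAM, likely OOM kills
```

gunicorn's documentation suggests (2 × num_cores) + 1 as a starting point, but on small instances memory usually becomes the binding constraint before CPU does.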
Key Takeaway
The issue wasn’t with FastAPI itself.
The real lesson:
Production FastAPI apps should not rely on a single worker
Worker count must match workload and available memory
More workers improve resilience, but only up to a point
Final Thoughts
In distributed systems, it’s common for dependent services to slow down or fail.
Running FastAPI with multiple workers helps prevent a single slow request from taking down the entire service — but it should always be balanced against memory limits.
Sometimes, stability improvements come from how the application is run, not from rewriting code.