FastAPI is fast. Clean. Productive.
For MVPs, it’s excellent.
But once traffic increases, the bottlenecks start appearing, and most of them are architectural, not framework-related.
Here are 5 real production issues we’ve seen when FastAPI services start handling real concurrency.
1. Event Loop Blocking (Async Done Wrong)
Just because your endpoint is `async def` doesn’t mean your system is non-blocking.
Common mistakes:
- CPU-heavy operations inside request handlers
- Sync DB calls inside async endpoints
- Large JSON serialization
- Data processing (Pandas, ML inference)
- Blocking third-party SDKs
Under light traffic → everything looks fine.
Under concurrency → latency increases across all endpoints.
Why?
Because the event loop runs on a single thread: one blocking call stalls every other request scheduled on that worker.
What to do instead
- Offload CPU-bound work to worker processes
- Use async-native database drivers
- Push heavy processing to a task queue
- Test under realistic concurrency (Locust / k6)
Async is a tool, not magic.
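A minimal sketch of the offloading point, using only the standard library. `crunch` is a made-up stand-in for CPU-heavy work; the pattern is `asyncio.to_thread` (Python 3.9+) for blocking calls, or a process pool for truly CPU-bound code:

```python
import asyncio
import time


def crunch(n: int) -> int:
    """Stand-in for blocking work (sync DB call, Pandas, ML inference)."""
    time.sleep(0.2)  # simulates a call that would block the event loop
    return n * n


async def handler_blocking(n: int) -> int:
    # BAD: runs on the event loop thread; every other coroutine waits.
    return crunch(n)


async def handler_offloaded(n: int) -> int:
    # BETTER: hands the blocking call to a worker thread.
    return await asyncio.to_thread(crunch, n)


async def main() -> None:
    t0 = time.perf_counter()
    results = await asyncio.gather(*(handler_offloaded(i) for i in range(5)))
    # The five calls overlap in threads instead of serializing on the loop.
    print(results, f"{time.perf_counter() - t0:.2f}s")


asyncio.run(main())
```

For CPU-bound work that holds the GIL, swap the thread for a `ProcessPoolExecutor` via `loop.run_in_executor`.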
2. Database Connection Pool Exhaustion
Default pool configurations are rarely production-ready.
Symptoms under load:
- Requests hang
- Timeout errors
- Increased p95 latency
- DB CPU spikes
The application appears “up” but becomes progressively slower.
Fix
- Explicitly configure pool size
- Monitor active vs idle connections
- Avoid long-running transactions
- Consider read replicas for heavy reads
Connection pools are capacity limits. Treat them as infrastructure to plan, not defaults to inherit.
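As one sketch of explicit pool configuration with async SQLAlchemy, the DSN and numbers here are placeholders to size against your own database:

```python
# Hypothetical engine setup; tune the numbers against your DB's max_connections.
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://app:secret@db/app",
    pool_size=10,        # steady-state connections per process
    max_overflow=5,      # temporary burst headroom beyond pool_size
    pool_timeout=5,      # fail fast instead of hanging when exhausted
    pool_recycle=1800,   # recycle before the DB or load balancer kills idle conns
    pool_pre_ping=True,  # detect stale connections before handing them out
)
```

Remember the real ceiling is `(pool_size + max_overflow) × worker processes`; keep that below the database’s connection limit.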
3. BackgroundTasks ≠ Distributed Queue
FastAPI’s `BackgroundTasks` works for small, quick tasks.
It does not scale well for:
- Bulk email sending
- File processing
- Report generation
- Long-running workflows
Under load, background tasks compete with incoming requests.
This reduces throughput.
Proper solution
Use a real queue:
- Celery
- RQ
- Dramatiq
- Redis / RabbitMQ backed workers
Separate request handling from asynchronous workload processing.
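A minimal sketch of that separation with Celery; the broker URL and task name are placeholders:

```python
# Hypothetical Celery wiring; assumes a Redis broker is running.
from celery import Celery

celery_app = Celery("worker", broker="redis://localhost:6379/0")


@celery_app.task
def send_bulk_email(campaign_id: int) -> None:
    ...  # heavy work runs in a separate worker process, not the web server


# In the FastAPI endpoint: enqueue and return immediately.
# send_bulk_email.delay(campaign_id)
```

The endpoint’s only job becomes validating input and enqueueing; the workers scale independently of the API.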
4. Uvicorn Defaults in Production
Many deployments run something like:
`uvicorn main:app`
Single worker. Default config.
Under traffic:
- CPU saturates
- Requests queue
- Latency spikes
Production approach
Use Gunicorn with Uvicorn workers:
`gunicorn -k uvicorn.workers.UvicornWorker -w 4 main:app`
Tune workers based on CPU cores and workload type.
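A common starting heuristic (not a rule) is `2 × cores + 1` for CPU-bound services; async, IO-bound services often need fewer. A quick sketch:

```python
import multiprocessing

# Starting point only: 2 * cores + 1. Load test and adjust from there.
cores = multiprocessing.cpu_count()
workers = 2 * cores + 1
print(f"gunicorn -k uvicorn.workers.UvicornWorker -w {workers} main:app")
```

Treat the output as a first guess to validate under load, not a final answer.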
Measure:
- p95 latency
- p99 latency
- Request throughput
- Worker restarts
Production tuning is not optional.
5. Memory Growth Under Concurrency
This one is subtle.
Common causes under concurrency:
- Large response objects accumulating in memory
- Inefficient dependency injection patterns
- Misused in-memory caches
- Objects not released promptly
Symptoms:
- Gradual memory increase
- Higher GC pressure
- Container restarts
Mitigation
- Profile memory usage
- Stream large responses
- Keep request-scoped dependencies clean
- Monitor container memory continuously
Scaling amplifies small inefficiencies.
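For the "stream large responses" point, a minimal sketch: yield the payload in chunks instead of building it all in memory. The CSV content here is made up; Starlette’s `StreamingResponse` accepts exactly this kind of generator:

```python
import csv
import io
from typing import Iterator


def csv_rows(n_rows: int, chunk_size: int = 1000) -> Iterator[str]:
    """Yield a large CSV in chunks instead of holding it all in memory."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "value"])
    for i in range(n_rows):
        writer.writerow([i, i * i])
        if (i + 1) % chunk_size == 0:
            yield buf.getvalue()
            buf.seek(0)
            buf.truncate(0)
    if buf.tell():
        yield buf.getvalue()  # flush any remaining rows


# In a FastAPI endpoint this would become:
#   return StreamingResponse(csv_rows(1_000_000), media_type="text/csv")
```

Peak memory stays proportional to one chunk, not the whole response.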
The Core Insight
FastAPI is not the scalability layer.
It’s a framework.
Scalability comes from:
- Architecture decisions
- Load testing
- Capacity planning
- Observability
- Separation of concerns
Most “FastAPI performance issues” are system design issues.
Before You Scale a FastAPI SaaS
Validate:
- Async correctness
- DB pool configuration
- Worker strategy
- Background processing separation
- Load testing under realistic traffic
- p95 / p99 latency tracking
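For the load-testing item, a hypothetical Locust scenario; the endpoint paths and weights are placeholders to replace with your real traffic mix:

```python
# Hypothetical locustfile.py; run with: locust -f locustfile.py --host <staging-url>
from locust import HttpUser, between, task


class ApiUser(HttpUser):
    wait_time = between(0.5, 2)  # think time between requests per simulated user

    @task(3)  # weight: reads happen ~3x as often as writes
    def list_items(self) -> None:
        self.client.get("/items")

    @task(1)
    def create_item(self) -> None:
        self.client.post("/items", json={"name": "load-test"})
```

Ramp users gradually and watch p95/p99, not just averages.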
Production problems don’t show up in development.
They show up when marketing works.
If you're building SaaS systems with FastAPI, we documented deeper production lessons and architectural breakdowns on our engineering blog.
Curious to hear what others have seen under load. What surprised you most?