FastAPI is fast. Clean. Productive.
For MVPs, it’s excellent.
But once traffic increases, the bottlenecks start appearing, and most of them are architectural, not framework-related.
Here are 5 real production issues we’ve seen when FastAPI services start handling real concurrency.
1. Event Loop Blocking (Async Done Wrong)
Just because your endpoint is `async def` doesn’t mean your system is non-blocking.
Common mistakes:
- CPU-heavy operations inside request handlers
- Sync DB calls inside async endpoints
- Large JSON serialization
- Data processing (Pandas, ML inference)
- Blocking third-party SDKs
Under light traffic → everything looks fine.
Under concurrency → latency increases across all endpoints.
Why?
Because the event loop runs on a single thread: one blocking call stalls every other request scheduled on that worker.
What to do instead
- Offload CPU-bound work to worker processes
- Use async-native database drivers
- Push heavy processing to a task queue
- Test under realistic concurrency (Locust / k6)
Async is a tool, not magic.
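A minimal sketch of the offloading point, using only the standard library. `crunch` is a made-up stand-in for CPU-heavy work; the pattern is `asyncio.to_thread` (Python 3.9+) for blocking calls, or a process pool for truly CPU-bound code:

```python
import asyncio
import time


def crunch(n: int) -> int:
    """Stand-in for blocking work (sync DB call, Pandas, ML inference)."""
    time.sleep(0.2)  # simulates a call that would block the event loop
    return n * n


async def handler_blocking(n: int) -> int:
    # BAD: runs on the event loop thread; every other coroutine waits.
    return crunch(n)


async def handler_offloaded(n: int) -> int:
    # BETTER: hands the blocking call to a worker thread.
    return await asyncio.to_thread(crunch, n)


async def main() -> None:
    t0 = time.perf_counter()
    results = await asyncio.gather(*(handler_offloaded(i) for i in range(5)))
    # The five calls overlap in threads instead of serializing on the loop.
    print(results, f"{time.perf_counter() - t0:.2f}s")


asyncio.run(main())
```

For CPU-bound work that holds the GIL, swap the thread for a `ProcessPoolExecutor` via `loop.run_in_executor`.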
2. Database Connection Pool Exhaustion
Default pool configurations are rarely production-ready.
Symptoms under load:
- Requests hang
- Timeout errors
- Increased p95 latency
- DB CPU spikes
The application appears “up” but becomes progressively slower.
Fix
- Explicitly configure pool size
- Monitor active vs idle connections
- Avoid long-running transactions
- Consider read replicas for heavy reads
Connection pools are capacity limits. Treat them as infrastructure to plan, not defaults to inherit.
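As one sketch of explicit pool configuration with async SQLAlchemy, the DSN and numbers here are placeholders to size against your own database:

```python
# Hypothetical engine setup; tune the numbers against your DB's max_connections.
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://app:secret@db/app",
    pool_size=10,        # steady-state connections per process
    max_overflow=5,      # temporary burst headroom beyond pool_size
    pool_timeout=5,      # fail fast instead of hanging when exhausted
    pool_recycle=1800,   # recycle before the DB or load balancer kills idle conns
    pool_pre_ping=True,  # detect stale connections before handing them out
)
```

Remember the real ceiling is `(pool_size + max_overflow) × worker processes`; keep that below the database’s connection limit.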
3. BackgroundTasks ≠ Distributed Queue
FastAPI’s `BackgroundTasks` works for small, quick tasks.
It does not scale well for:
- Bulk email sending
- File processing
- Report generation
- Long-running workflows
Under load, background tasks compete with incoming requests.
This reduces throughput.
Proper solution
Use a real queue:
- Celery
- RQ
- Dramatiq
- Redis / RabbitMQ backed workers
Separate request handling from asynchronous workload processing.
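A minimal sketch of that separation with Celery; the broker URL and task name are placeholders:

```python
# Hypothetical Celery wiring; assumes a Redis broker is running.
from celery import Celery

celery_app = Celery("worker", broker="redis://localhost:6379/0")


@celery_app.task
def send_bulk_email(campaign_id: int) -> None:
    ...  # heavy work runs in a separate worker process, not the web server


# In the FastAPI endpoint: enqueue and return immediately.
# send_bulk_email.delay(campaign_id)
```

The endpoint’s only job becomes validating input and enqueueing; the workers scale independently of the API.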
4. Uvicorn Defaults in Production
Many deployments run something like:
`uvicorn main:app`
Single worker. Default config.
Under traffic:
- CPU saturates
- Requests queue
- Latency spikes
Production approach
Use Gunicorn with Uvicorn workers:
`gunicorn -k uvicorn.workers.UvicornWorker -w 4 main:app`
Tune workers based on CPU cores and workload type.
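A common starting heuristic (not a rule) is `2 × cores + 1` for CPU-bound services; async, IO-bound services often need fewer. A quick sketch:

```python
import multiprocessing

# Starting point only: 2 * cores + 1. Load test and adjust from there.
cores = multiprocessing.cpu_count()
workers = 2 * cores + 1
print(f"gunicorn -k uvicorn.workers.UvicornWorker -w {workers} main:app")
```

Treat the output as a first guess to validate under load, not a final answer.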
Measure:
- p95 latency
- p99 latency
- Request throughput
- Worker restarts
Production tuning is not optional.
5. Memory Growth Under Concurrency
This one is subtle.
Common causes under concurrency:
- Large response objects accumulating in memory
- Inefficient dependency injection patterns
- Misused in-memory caches
- Objects not released promptly
Symptoms:
- Gradual memory increase
- Higher GC pressure
- Container restarts
Mitigation
- Profile memory usage
- Stream large responses
- Keep request-scoped dependencies clean
- Monitor container memory continuously
Scaling amplifies small inefficiencies.
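For the "stream large responses" point, a minimal sketch: yield the payload in chunks instead of building it all in memory. The CSV content here is made up; Starlette’s `StreamingResponse` accepts exactly this kind of generator:

```python
import csv
import io
from typing import Iterator


def csv_rows(n_rows: int, chunk_size: int = 1000) -> Iterator[str]:
    """Yield a large CSV in chunks instead of holding it all in memory."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "value"])
    for i in range(n_rows):
        writer.writerow([i, i * i])
        if (i + 1) % chunk_size == 0:
            yield buf.getvalue()
            buf.seek(0)
            buf.truncate(0)
    if buf.tell():
        yield buf.getvalue()  # flush any remaining rows


# In a FastAPI endpoint this would become:
#   return StreamingResponse(csv_rows(1_000_000), media_type="text/csv")
```

Peak memory stays proportional to one chunk, not the whole response.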
The Core Insight
FastAPI is not the scalability layer.
It’s a framework.
Scalability comes from:
- Architecture decisions
- Load testing
- Capacity planning
- Observability
- Separation of concerns
Most “FastAPI performance issues” are system design issues.
Before You Scale a FastAPI SaaS
Validate:
- Async correctness
- DB pool configuration
- Worker strategy
- Background processing separation
- Load testing under realistic traffic
- p95 / p99 latency tracking
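For the load-testing item, a hypothetical Locust scenario; the endpoint paths and weights are placeholders to replace with your real traffic mix:

```python
# Hypothetical locustfile.py; run with: locust -f locustfile.py --host <staging-url>
from locust import HttpUser, between, task


class ApiUser(HttpUser):
    wait_time = between(0.5, 2)  # think time between requests per simulated user

    @task(3)  # weight: reads happen ~3x as often as writes
    def list_items(self) -> None:
        self.client.get("/items")

    @task(1)
    def create_item(self) -> None:
        self.client.post("/items", json={"name": "load-test"})
```

Ramp users gradually and watch p95/p99, not just averages.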
Production problems don’t show up in development.
They show up when marketing works.
If you're building SaaS systems with FastAPI, we documented deeper production lessons and architectural breakdowns on our engineering blog.
Curious to hear what others have seen under load. What surprised you most?