Understanding Django, Gunicorn, and Database Connections

#architecture #django #database #performance

Today, I was investigating an error that drew my attention to Gunicorn workers and Django database connections. I realized this is an important topic for anyone building scalable Django applications.

Gunicorn Workers and Django Connections

A common misconception is that Gunicorn workers share one set of database connections. They do not: each Gunicorn worker is a separate process forked from the master, so Django's CONN_MAX_AGE parameter applies independently within each worker process.

CONN_MAX_AGE = 0 → A new database connection is created and dropped for every request.
CONN_MAX_AGE = None → Connections are kept open indefinitely, allowing Django to reuse them across all requests handled by that worker.
CONN_MAX_AGE > 0 → Connections are reused by the worker for up to the specified time (in seconds). Django checks the age at the start and end of each request and closes any connection that has exceeded it.

Important: Workers do not share database connections. Each worker maintains its own connection(s) based on CONN_MAX_AGE.
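For reference, CONN_MAX_AGE is not a top-level setting; it is a key inside each entry of DATABASES in settings.py. Here is a minimal sketch, assuming PostgreSQL. The environment variable names (DB_NAME, DB_HOST, DB_CONN_MAX_AGE and so on) and the parse_conn_max_age helper are placeholders invented for this example, not Django conventions:

```python
# settings.py (sketch): CONN_MAX_AGE lives inside each DATABASES entry.
import os


def parse_conn_max_age(raw: str):
    """Map an env string to a CONN_MAX_AGE value: "none" -> None, else seconds."""
    return None if raw.strip().lower() == "none" else int(raw)


DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # Hypothetical environment variable names, chosen for this example.
        "NAME": os.environ.get("DB_NAME", "app"),
        "USER": os.environ.get("DB_USER", "app"),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", "localhost"),
        "PORT": os.environ.get("DB_PORT", "5432"),
        # 0 = close after every request, N = reuse for up to N seconds,
        # None = keep open indefinitely. Applies separately in each worker.
        "CONN_MAX_AGE": parse_conn_max_age(os.environ.get("DB_CONN_MAX_AGE", "0")),
    }
}
```

Reading the value from the environment makes it possible to try the three modes described above without changing code.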

Ideal Configuration by Traffic

Case 1: Low Traffic

CONN_MAX_AGE = 0

Behavior: Each worker creates and drops connections per request.

Advantages:

Minimal number of DB connections
Fresh connection for each request
Works well for low-traffic or development environments

Disadvantages:

Performance overhead due to frequent connection creation
Increased network traffic
Scalability issues under higher load

Case 2: Medium/High Traffic

CONN_MAX_AGE ≈ 600 seconds

Behavior: Each worker reuses its connection for up to 10 minutes, then closes it and opens a new one on the next request.

Advantages:

Reduced latency
Lower database overhead
Better connection management
Increased application resilience, since connections are still recycled periodically instead of living forever

Disadvantages:

Slightly higher memory/resource consumption per worker
Configuration complexity
Less effective on very low-traffic sites

Case 3: Very High Traffic

CONN_MAX_AGE = None

Behavior: Connections are kept open indefinitely. Workers do not create/destroy connections per request.

Advantages:

Reduced overhead on DB server
Improved performance
Efficient resource usage

Disadvantages:

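Risk of long-held locks if a transaction is left open on a long-lived connection
Stale connections (closed by the database or by an idle timeout on the network) can surface as errors on a later request
Scalability challenges if worker count increases significantly, since every worker permanently holds a connection that counts against the database's connection limit
Not a good fit for ASGI or async contexts; the Django documentation advises disabling persistent connections under ASGI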

Worker Example

Suppose you have 2 Gunicorn workers:

Case 1: Each worker opens one connection per request and closes it when the request ends, so connections change constantly with incoming traffic.
Case 2: Each worker keeps one persistent connection, replaced once it is older than CONN_MAX_AGE seconds.
Case 3: Each worker keeps one persistent connection indefinitely, minimizing connection churn.
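The same reasoning scales to a whole deployment. Below is a rough sketch for estimating the upper bound on open connections, assuming each thread holds at most one connection per database alias (which is how Django manages connections). The function names and sample numbers are made up for illustration:

```python
def peak_db_connections(containers: int, workers_per_container: int,
                        threads_per_worker: int = 1, databases: int = 1) -> int:
    """Upper bound on simultaneously open connections.

    Django keeps one connection per thread per database alias, so the
    bound is the product of all four factors.
    """
    return containers * workers_per_container * threads_per_worker * databases


def fits_within_limit(peak: int, max_connections: int, reserved: int = 0) -> bool:
    """True if the peak leaves `reserved` slots free for admin tools, migrations, etc."""
    return peak <= max_connections - reserved


# The two-worker example from above: at most 2 connections in every case.
two_workers = peak_db_connections(containers=1, workers_per_container=2)

# A larger, purely illustrative deployment: 6 containers x 4 workers x 2 threads.
larger = peak_db_connections(containers=6, workers_per_container=4, threads_per_worker=2)

print(two_workers, larger)  # 2 48
print(fits_within_limit(larger, max_connections=100, reserved=10))  # True
```

With CONN_MAX_AGE = 0 the real number fluctuates below this bound; with persistent connections it tends to sit close to it, which is why the bound should be compared against the database's connection limit.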

Common Misconception: More Workers = Better Performance

It is a myth that increasing the number of Gunicorn workers automatically improves performance. In reality:

More workers share the same CPU and memory resources, so each worker is still limited by the host's capacity.
Instead of increasing worker count, it's recommended to increase the number of running containers. This reduces per-container overhead and allows horizontal scaling, which is far more effective in real-world production environments.

The Worker Formula and Its Practical Pitfall

You may have heard the formula for calculating the number of workers:

Number of workers = 2 × N + 1
where N is the number of CPU cores.

For example, if your system has 8 vCPUs, the formula gives 2 × 8 + 1 = 17 workers, and you might assume you can safely run that many.
But in real life, you shouldn't.

Instead of maxing out a single large instance, it's better to use multiple smaller instances, for example servers with 1–2 vCPUs running 3–5 workers each.
If the load grows, scale horizontally by adding more containers or instances, rather than vertically increasing CPU and RAM on one machine.
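One way to express this in a Gunicorn config file is to apply the 2 × N + 1 formula and then clamp it to a small per-container cap. This is a sketch under assumptions: WEB_WORKER_CAP is a hypothetical environment variable and worker_count is a helper written for this example; only `workers` and `bind` are actual Gunicorn settings.

```python
# gunicorn.conf.py (sketch)
import multiprocessing
import os


def worker_count(cores: int, cap: int) -> int:
    """Apply the 2 * N + 1 rule of thumb, then clamp to a per-container cap."""
    return max(1, min(2 * cores + 1, cap))


# WEB_WORKER_CAP is a hypothetical environment variable, not a Gunicorn setting.
_cap = int(os.environ.get("WEB_WORKER_CAP", "5"))

# `workers` and `bind` are Gunicorn settings read from this file.
# Note: cpu_count() reports the host's cores and may not reflect a
# container's CPU limit, which is another reason to keep an explicit cap.
workers = worker_count(multiprocessing.cpu_count(), _cap)
bind = "0.0.0.0:8000"
```

Gunicorn picks the file up when started with `gunicorn -c gunicorn.conf.py myproject.wsgi` (the module name here is a placeholder). With the cap at 5, an 8-vCPU host would run 5 workers instead of 17, and extra capacity comes from adding containers.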

Why Should You Avoid Vertical Scaling?

Vertical scaling (increasing CPU/memory on a single host) has several drawbacks:

It's expensive — larger instances cost disproportionately more.
Even with more CPU and memory, you can still be bottlenecked at the network layer.
Network saturation often limits how many database connections or DNS resolutions can occur simultaneously.

In my recent investigation, we were using a high number of workers with CONN_MAX_AGE = 0, and encountered DNS resolution errors for the database hostname (DB host name not resolved).
The issue wasn't with the database or Django — it was due to network-level bottlenecks caused by too many workers constantly opening and closing connections.

After reducing the number of workers, the system stabilized and handled high load efficiently, with the rest of the configuration unchanged aside from CONN_MAX_AGE.
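To get a feel for the connection churn behind an incident like this, here is a back-of-the-envelope estimate of how many new connections are opened per second in each mode. It is a rough sketch, assuming every request touches the database and all workers stay busy. The function name and the sample numbers are illustrative, not measurements from the incident:

```python
from typing import Optional


def connection_opens_per_second(requests_per_second: float, total_workers: int,
                                conn_max_age: Optional[int]) -> float:
    """Rough steady-state estimate of new DB connections opened per second."""
    if conn_max_age is None:
        # Persistent connections: opened once per worker, then reused.
        return 0.0
    if conn_max_age == 0:
        # One connect per request; each connect typically involves a
        # hostname lookup, a TCP handshake and authentication.
        return float(requests_per_second)
    # Each worker reconnects at most once every conn_max_age seconds.
    return min(float(requests_per_second), total_workers / conn_max_age)


# Illustrative numbers only: 400 requests/s spread over 40 workers.
for age in (0, 600, None):
    print(age, connection_opens_per_second(400, 40, age))
```

Under these assumptions, CONN_MAX_AGE = 0 opens a connection for every request, while any non-zero value reduces that to a small fraction of one connection per second.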

Best Practices

Use connection pooling tools like PgBouncer or AWS RDS Proxy. These handle connection management externally, improving scalability and reducing the overhead on your microservices.
Fine-tune CONN_MAX_AGE based on traffic patterns and worker count.
Monitor Gunicorn worker memory and connection usage to avoid unexpected bottlenecks.
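As an illustration of the first point, a Django settings entry that goes through PgBouncer might look like the sketch below. The hostname, database name and user are placeholders, and setting CONN_MAX_AGE to 0 here is a choice made for this sketch rather than a requirement:

```python
# settings.py (sketch): Django talking to PgBouncer instead of PostgreSQL directly.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",          # placeholder database name
        "USER": "app",          # placeholder user
        "HOST": "pgbouncer",    # hypothetical hostname of the pooler
        "PORT": "6432",         # PgBouncer's default listen port
        # The pooler keeps the server-side connections alive, so the
        # Django-side connection can be short-lived (an assumption of
        # this sketch, not a requirement).
        "CONN_MAX_AGE": 0,
        # Needed when PgBouncer runs in transaction pooling mode, because
        # server-side cursors do not work across pooled transactions.
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}
```

In this setup the workers still open and close connections often, but they do so against the pooler, which holds a smaller and more stable set of connections to the database itself.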