Haripriya Veluchamy

Posted on Mar 17

Why Your PostgreSQL Keeps Running Out of Connections

#postgres #devops #python #cloud

PostgreSQL connection errors are one of those things that look terrifying when they hit production.

I used to think:
"Why is the database refusing connections?"
"Did something crash?"
"Is the server overloaded?" 😅

Recently, while working on a production system, I ran into the classic TooManyConnectionsError. Not once twice. On two different services. Same database, same root cause.

That experience helped me clearly understand why this happens and how to fix it properly.

This post is me breaking that down in a simple way, based on what actually worked.

What does TooManyConnectionsError actually mean?

In simple terms, PostgreSQL has a limit on how many connections it allows at the same time.

This limit is set by max_connections and is usually:

50–100 for small/basic tiers
100–200 for general purpose tiers

When your application tries to open more connections than this limit, PostgreSQL says no. That's the error.

The important thing to understand:

The database isn't down. It just has no room for new connections.

But I'm using a connection pool... why is this still happening?

This is the part that confused me.

I was already using a connection pool. Every tutorial says "use a pool" and I did. So why was I still exhausting connections?

Here's what I found out:

The problem wasn't that I didn't have a pool. The problem was I had too many pools.

Let me explain.

How the bug actually works

Most people write their database client class something like this:

class DatabaseClient:
    def __init__(self):
        self.pool = create_pool(min=2, max=10)

Looks clean right? Every instance gets its own pool. Professional.

But here's what happens in a real application:

Health checker creates a DatabaseClient() → pool of 10
Dependency checker creates a DatabaseClient() → another pool of 10
Each worker/task creates a DatabaseClient() → another pool of 10 each

If you have 6 workers, that's already:

10 + 10 + (6 × 10) = 80 connections from ONE container

Run 2 containers? That's 160.

Your database allows 100.

💥 TooManyConnectionsError

And the worst part? Each pool individually looks reasonable. It's only when you add them all up that it explodes.

The fix is embarrassingly simple

One process, one pool. Everything borrows from it.

Instead of each class creating its own pool, create ONE pool at the module/process level and every instance uses that shared pool.

# Created once at module level
_shared_pool = None

def get_shared_pool():
    if _shared_pool is None:
        _shared_pool = create_pool(min=2, max=5)
    return _shared_pool

class DatabaseClient:
    def get_connection(self):
        return get_shared_pool().acquire()

Now it doesn't matter how many DatabaseClient() instances you create. They all share the same 5 connections.

That's it. That's the fix.

Things I learned the hard way

Here are some gotchas that bit me:

1. The close() trap

If your client class has a close() method that closes the pool, and some other code calls it mid-process congratulations, you just killed the pool for everyone.

Make close() a no-op on individual instances. Only close the shared pool when the entire process shuts down.

2. The cascade effect

When the database runs out of connections, it doesn't just fail your query. It also fails your health check. And when the health check fails, your orchestrator thinks the container is unhealthy and might restart it. Which creates new pools. Which makes things worse.

I literally got two alerts 30 seconds apart. First one: TooManyConnectionsError. Second one: dependency_check_failed. Same container, same root cause. One bug, two pages.

3. Make pool size configurable

Use an environment variable like PG_POOL_MAX_SIZE=5. When you're debugging at 2 AM, you don't want to redeploy just to change a number.

4. Do the napkin math

Before deploying, always calculate:

pool_max_size × max_replicas < max_connections - admin_headroom

Example:

Pool max: 5
Total replicas across all services: 10
Total: 50
Database max_connections: 100
Admin headroom: 20
50 < 80 ✅

If the math doesn't work, either reduce pool sizes or put a connection pooler like PgBouncer in front of the database.

Common mistakes I see (and made myself)

Creating a new pool per class instance instead of sharing one
Not accounting for multiple services hitting the same database
Forgetting that container scaling = more pools = more connections
Not closing pools gracefully on shutdown (idle connections linger)
No rollback plan when the fix itself has a bug 😅

A simple checklist before you deploy

Before pushing connection pool changes to production:

✅ Single shared pool at module/process level
✅ Pool size is configurable via env variable
✅ close() on individual instances is a no-op
✅ Shared pool closes on process shutdown
✅ Napkin math checks out
✅ Code compiles/passes syntax check (trust me on this one 😅)

Final thoughts

Connection pool exhaustion doesn't have to be scary.

If I had to summarize everything in one line:

Many pools = trouble. One shared pool = peace.

The bug is almost never that you forgot to use a pool. It's that you accidentally created too many of them.

Once you understand this, TooManyConnectionsError stops being a 1 AM panic and becomes just another thing you know how to handle.

Hope this helps someone who just got paged for the first time because their database "ran out of connections"... 🙂

DEV Community