PostgreSQL connection errors are one of those things that look terrifying when they hit production.
I used to think:
"Why is the database refusing connections?"
"Did something crash?"
"Is the server overloaded?" π
Recently, while working on a production system, I ran into the classic TooManyConnectionsError. Not once twice. On two different services. Same database, same root cause.
That experience helped me clearly understand why this happens and how to fix it properly.
This post is me breaking that down in a simple way, based on what actually worked.
What does TooManyConnectionsError actually mean?
In simple terms, PostgreSQL has a limit on how many connections it allows at the same time.
This limit is set by max_connections and is usually:
- 50β100 for small/basic tiers
- 100β200 for general purpose tiers
When your application tries to open more connections than this limit, PostgreSQL says no. That's the error.
The important thing to understand:
The database isn't down. It just has no room for new connections.
But I'm using a connection pool... why is this still happening?
This is the part that confused me.
I was already using a connection pool. Every tutorial says "use a pool" and I did. So why was I still exhausting connections?
Here's what I found out:
The problem wasn't that I didn't have a pool. The problem was I had too many pools.
Let me explain.
How the bug actually works
Most people write their database client class something like this:
class DatabaseClient:
def __init__(self):
self.pool = create_pool(min=2, max=10)
Looks clean right? Every instance gets its own pool. Professional.
But here's what happens in a real application:
- Health checker creates a
DatabaseClient()β pool of 10 - Dependency checker creates a
DatabaseClient()β another pool of 10 - Each worker/task creates a
DatabaseClient()β another pool of 10 each
If you have 6 workers, that's already:
10 + 10 + (6 Γ 10) = 80 connections from ONE container
Run 2 containers? That's 160.
Your database allows 100.
π₯ TooManyConnectionsError
And the worst part? Each pool individually looks reasonable. It's only when you add them all up that it explodes.
The fix is embarrassingly simple
One process, one pool. Everything borrows from it.
Instead of each class creating its own pool, create ONE pool at the module/process level and every instance uses that shared pool.
# Created once at module level
_shared_pool = None
def get_shared_pool():
if _shared_pool is None:
_shared_pool = create_pool(min=2, max=5)
return _shared_pool
class DatabaseClient:
def get_connection(self):
return get_shared_pool().acquire()
Now it doesn't matter how many DatabaseClient() instances you create. They all share the same 5 connections.
That's it. That's the fix.
Things I learned the hard way
Here are some gotchas that bit me:
1. The close() trap
If your client class has a close() method that closes the pool, and some other code calls it mid-process congratulations, you just killed the pool for everyone.
Make close() a no-op on individual instances. Only close the shared pool when the entire process shuts down.
2. The cascade effect
When the database runs out of connections, it doesn't just fail your query. It also fails your health check. And when the health check fails, your orchestrator thinks the container is unhealthy and might restart it. Which creates new pools. Which makes things worse.
I literally got two alerts 30 seconds apart. First one: TooManyConnectionsError. Second one: dependency_check_failed. Same container, same root cause. One bug, two pages.
3. Make pool size configurable
Use an environment variable like PG_POOL_MAX_SIZE=5. When you're debugging at 2 AM, you don't want to redeploy just to change a number.
4. Do the napkin math
Before deploying, always calculate:
pool_max_size Γ max_replicas < max_connections - admin_headroom
Example:
- Pool max: 5
- Total replicas across all services: 10
- Total: 50
- Database max_connections: 100
- Admin headroom: 20
- 50 < 80 β
If the math doesn't work, either reduce pool sizes or put a connection pooler like PgBouncer in front of the database.
Common mistakes I see (and made myself)
- Creating a new pool per class instance instead of sharing one
- Not accounting for multiple services hitting the same database
- Forgetting that container scaling = more pools = more connections
- Not closing pools gracefully on shutdown (idle connections linger)
- No rollback plan when the fix itself has a bug π
A simple checklist before you deploy
Before pushing connection pool changes to production:
- β Single shared pool at module/process level
- β Pool size is configurable via env variable
- β
close()on individual instances is a no-op - β Shared pool closes on process shutdown
- β Napkin math checks out
- β Code compiles/passes syntax check (trust me on this one π )
Final thoughts
Connection pool exhaustion doesn't have to be scary.
If I had to summarize everything in one line:
Many pools = trouble. One shared pool = peace.
The bug is almost never that you forgot to use a pool. It's that you accidentally created too many of them.
Once you understand this, TooManyConnectionsError stops being a 1 AM panic and becomes just another thing you know how to handle.
Hope this helps someone who just got paged for the first time because their database "ran out of connections"... π
Top comments (0)