DEV Community

Ugur Aslim
Ugur Aslim

Posted on • Originally published at uguraslim.com

PostgreSQL Connection Pooling in FastAPI: PgBouncer vs. SQLAlchemy QueuePool vs. Render's Native Pools

PostgreSQL Connection Pooling in FastAPI: PgBouncer vs. SQLAlchemy QueuePool vs. Render's Native Pools

Connection pooling sounds boring until your SaaS is down at 2 AM because someone deployed with pool_size=5 and 47 concurrent requests are waiting for a connection. I've been there. I've shipped CitizenApp with three different pooling strategies, and each taught me something painful.

Here's the truth: connection pooling isn't just about performance—it's about preventing silent failures. A misconfigured pool doesn't scream; it quietly drops requests, exhausts your database connection limit, or leaks connections like a rotting pipe. This post is the methodology and configurations I use now.

Why Connection Pooling Matters More Than You Think

FastAPI is async, which means one worker can juggle hundreds of concurrent tasks. But PostgreSQL connections are synchronous and expensive—opening a new connection costs 5-50ms depending on your setup. Without pooling, each database query opens and closes a connection. At scale, you're spending more time handshaking than executing SQL.

Here's the silent killer: idle connections. If you set pool_size=50 and your app idles, you're holding 50 open PostgreSQL connections, each consuming memory on the database server. On Render's shared tier, you get ~100 connections total. Waste 50 of them, and you're one code deploy away from "too many connections" errors.

I prefer SQLAlchemy QueuePool in development because I can see pool exhaustion immediately (it blocks loudly). In production, I use PgBouncer in transaction mode on managed infrastructure, or Render's native pools if available. Each has tradeoffs.

The Three Strategies: Configuration & Tradeoffs

1. SQLAlchemy QueuePool (Development + Small Production Setups)

QueuePool is what you get out of the box with SQLAlchemy. It's not magic—it's a thread-safe queue of connections.

from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

DATABASE_URL = "postgresql://user:pass@localhost/citizenapp"

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=10,  # Connections actively maintained
    max_overflow=20,  # Extra connections if pool exhausted
    pool_recycle=3600,  # Recycle connections after 1 hour
    pool_pre_ping=True,  # Verify connection before use
    echo=False,
)
Enter fullscreen mode Exit fullscreen mode

Why this configuration:

  • pool_size=10: For a small team or single-tenant dev environment, 10 is safe. Each FastAPI worker with 2-4 workers = 20-40 total connections.
  • max_overflow=20: If load spikes, allow 20 extra connections. But they'll be closed after use—this is your escape hatch, not your baseline.
  • pool_recycle=3600: PostgreSQL closes idle connections after 30 minutes by default. Recycling at 1 hour prevents stale connection errors.
  • pool_pre_ping=True: Before handing out a connection, run a SELECT 1. Costs ~1ms per query, but saves cascading failures if the connection died.

When to use this:

  • Development environments (you want loud failures).
  • Single-tenant apps on Vercel/Render with <50 concurrent users.
  • Internal tools where you control load.

When NOT to use:

  • Multi-tenant SaaS with variable load (you'll hit max_overflow and get connection timeouts).
  • Shared database infrastructure with strict connection limits.

2. PgBouncer in Transaction Mode (Multi-Tenant SaaS)

This is my go-to for CitizenApp. PgBouncer sits between FastAPI and PostgreSQL, multiplexing connections. In transaction mode, PgBouncer returns a connection to the pool after each transaction completes—not when the client disconnects.

; pgbouncer.ini
[databases]
citizenapp = host=prod-postgres.render.ondigitalocean.com port=5432 dbname=citizenapp

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3
max_db_connections = 100
max_user_connections = 100
idle_in_transaction_session_timeout = 60000  ; 60s
Enter fullscreen mode Exit fullscreen mode

FastAPI config stays simple:

from sqlalchemy import create_engine

# Connect to PgBouncer, not PostgreSQL directly
DATABASE_URL = "postgresql://user:pass@localhost:6432/citizenapp"

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=5,  # Small—PgBouncer does the real pooling
    max_overflow=5,
    pool_pre_ping=True,
    echo=False,
)
Enter fullscreen mode Exit fullscreen mode

Why this configuration:

  • pool_mode = transaction: After each query, the connection returns to PgBouncer's pool. This lets one PostgreSQL connection serve 10+ concurrent clients.
  • max_client_conn = 1000: FastAPI can open 1000 connections to PgBouncer.
  • default_pool_size = 25: PgBouncer maintains 25 connections to PostgreSQL per database.
  • idle_in_transaction_session_timeout = 60000: If a client opens a transaction and goes silent, kill it after 60s. Prevents zombie connections.
  • pool_size=5 in FastAPI: Why 5? PgBouncer does connection pooling. FastAPI's pool is just a safety buffer for your application layer.

The gotcha with transaction mode: You cannot use PostgreSQL transactions that span multiple queries unless PgBouncer is in session mode (slower, uses more database connections). For CitizenApp, we structure queries to be transaction-less where possible, or use explicit savepoint blocks:

from sqlalchemy import text

async def create_tenant_with_settings(db, tenant_data, settings):
    async with db.begin():  # Explicit transaction
        result = await db.execute(
            insert(Tenant).values(**tenant_data).returning(Tenant.id)
        )
        tenant_id = result.scalar()

        # All queries here are in the transaction
        await db.execute(
            insert(TenantSettings).values(tenant_id=tenant_id, **settings)
        )
    # Transaction commits, connection returns to PgBouncer's pool
Enter fullscreen mode Exit fullscreen mode

When to use PgBouncer:

  • Multi-tenant SaaS with unpredictable load (like CitizenApp).
  • Shared database infrastructure with strict connection limits.
  • When you want to decouple application connection pools from database connection limits.

When NOT to use:

  • If you need long-lived database transactions (PgBouncer transaction mode breaks this).
  • Simple setups where overhead isn't justified.

3. Render's Native Connection Pooling

If you're on Render (I host CitizenApp here), Render offers managed connection pooling built into the PostgreSQL database.

# No special config needed—Render handles it
from sqlalchemy import create_engine

DATABASE_URL = os.getenv("DATABASE_URL")  # Render injects this

engine = create_engine(
    DATABASE_URL,
    poolclass=QueuePool,
    pool_size=5,
    max_overflow=5,
    pool_pre_ping=True,
)
Enter fullscreen mode Exit fullscreen mode

When you create a PostgreSQL instance on Render, you get:

  • Built-in PgBouncer in transaction mode.
  • Connection limit automatically sized based on your plan.
  • Monitoring and metrics in the Render dashboard.

The advantage: zero ops. Render manages PgBouncer for you. The disadvantage: you can't tune it per your needs, and you're trusting Render's defaults (which are usually fine).

Testing & Monitoring: Catch Pool Exhaustion Before It Happens

Here's my testing methodology:

import asyncio
import time
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession

async def load_test_pool(url: str, concurrent_queries: int = 100):
    """Simulate concurrent load and measure pool behavior."""
    engine = create_async_engine(
        url,
        pool_size=10,
        max_overflow=20,
        pool_pre_ping=True,
    )

    async def run_query(session_id: int):
        async with AsyncSession(engine) as session:
            start = time.time()
            await session.execute(text("SELECT 1"))
            await asyncio.sleep(0.01)  # Simulate work
            elapsed = time.time() - start
            return elapsed

    start = time.time()
    tasks = [run_query(i) for i in range(concurrent_queries)]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    total = time.time() - start

    errors = [r for r in results if isinstance(r, Exception)]
    latencies = [r for r in results if not isinstance(r, Exception)]

    print(f"Total time: {total:.2f}s")
    print(f"Successful queries: {len(latencies)}")
    print(f"Errors: {len(errors)}")
    print(f"P50 latency: {sorted(latencies)[len(latencies)//2]:.3f}s")
    print(f"P99 latency: {sorted(latencies)[int(len(latencies)*0.99)]:.3f}s")

    await engine.dispose()

# Run: asyncio.run(load_test_pool("postgresql://...", 100))
Enter fullscreen mode Exit fullscreen mode

Monitoring query for pool exhaustion:


sql
-- On your PostgreSQL server, run this every 30s
SELECT
    datname,
    count(*) as total_connections,
    sum(case when state = 'active' then 1 else 0 end) as active,
    sum(case when state = 'idle' then 1 else 0 end) as idle,
    sum(case when
Enter fullscreen mode Exit fullscreen mode

Top comments (0)