PostgreSQL connection pool exhaustion is one of the most insidious production failures you can face. Your app looks healthy — process is up, CPU is calm, memory is fine — but requests are silently queueing behind a full connection pool. By the time you notice latency spiking, the waiting queue has already grown to hundreds of requests and your database is under maximum load.
The fix isn't complicated. You need to see what your pool is doing before it becomes a problem. That's exactly what pg-pool-monitor was built for.
## The Problem with pg Pool Observability
node-postgres (the pg package) is the default PostgreSQL driver for Node.js. It's battle-tested and does its job well. But pg.Pool gives you exactly three counters:
```js
pool.totalCount   // all connections (active + idle)
pool.idleCount    // connections sitting idle
pool.waitingCount // requests queued waiting for a free connection
```
These three numbers contain everything you need to know about your pool's health — but they're just sitting there on the pool object, not being tracked, not being graphed, not triggering alerts. If waitingCount spikes at 2 AM on a Tuesday, you'll find out about it in a post-mortem.
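Nothing stops you from snapshotting those counters yourself; a minimal sketch of the do-it-yourself approach (the polling interval and log destination in the usage comment are placeholders):

```js
// DIY polling of pg.Pool's counters — works on any pool-shaped object
// that exposes totalCount / idleCount / waitingCount.
function poolSnapshot(pool) {
  return {
    total: pool.totalCount,
    idle: pool.idleCount,
    waiting: pool.waitingCount,
  };
}

// e.g. setInterval(() => console.log(poolSnapshot(pool)), 15000);
```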
pg-pool-monitor takes those three counters and turns them into six Prometheus metrics with a single function call.
## What pg-pool-monitor Exposes

Five gauges and one counter per pool:
| Metric | Type | What it tells you |
|---|---|---|
| `pg_pool_total` | Gauge | Total connections (active + idle) |
| `pg_pool_idle` | Gauge | Connections available right now |
| `pg_pool_acquired` | Gauge | Connections currently in use |
| `pg_pool_waiting` | Gauge | Requests queued — your primary alert metric |
| `pg_pool_utilization` | Gauge | `acquired / total` as a 0.0–1.0 ratio |
| `pg_pool_scrapes_total` | Counter | Total scrapes (for scrape health monitoring) |
pg_pool_waiting is the critical one. Any sustained value above 0 means your pool is saturated. Connections are being requested faster than they're being returned. You're either under-provisioned, have a slow query holding connections, or have a connection leak.
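The derived gauges are simple arithmetic on the three raw counters; a sketch of the presumed relationships (pg itself does not store an acquired count or utilization ratio):

```js
// Derive the extra gauges from pg.Pool's three counters.
// Assumption: acquired = total - idle, utilization = acquired / total.
function deriveGauges(pool) {
  const acquired = pool.totalCount - pool.idleCount;
  return {
    acquired,
    utilization: pool.totalCount > 0 ? acquired / pool.totalCount : 0,
  };
}
```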
## Installation

```bash
npm install pg-pool-monitor
```
Zero production dependencies. pg >= 8.0.0 is a peer dependency (you already have it).
## Basic Setup: Two Lines
The minimal integration is genuinely two lines beyond your existing pool setup:
```js
const express = require('express');
const { Pool } = require('pg');
const { createMonitor } = require('pg-pool-monitor');

const app = express();

// Your existing pool
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Add monitoring
const monitor = createMonitor(pool, { name: 'primary', prefix: 'myapp' });

// Expose to Prometheus
app.get('/metrics', (req, res) => {
  res.set('Content-Type', 'text/plain');
  res.send(monitor.getMetrics());
});
```
The output looks like this:
```
# HELP myapp_pool_total Total number of connections in the pool (active + idle)
# TYPE myapp_pool_total gauge
myapp_pool_total{pool="primary"} 10 1711584000000

# HELP myapp_pool_idle Number of idle connections waiting to be acquired
# TYPE myapp_pool_idle gauge
myapp_pool_idle{pool="primary"} 4 1711584000000

# HELP myapp_pool_acquired Number of connections currently checked out
# TYPE myapp_pool_acquired gauge
myapp_pool_acquired{pool="primary"} 6 1711584000000

# HELP myapp_pool_waiting Number of requests queued waiting for a connection
# TYPE myapp_pool_waiting gauge
myapp_pool_waiting{pool="primary"} 0 1711584000000

# HELP myapp_pool_utilization Pool utilization ratio (acquired / total)
# TYPE myapp_pool_utilization gauge
myapp_pool_utilization{pool="primary"} 0.6000 1711584000000
```
Plug this into a Prometheus scrape config and you're done.
## Health Check Integration
In addition to the Prometheus metrics, pg-pool-monitor provides a `getHealth()` method that returns one of three states:

- `healthy` — idle connections available, no queue
- `degraded` — all connections acquired but nothing queued yet (at capacity, not spilling)
- `saturated` — requests are queuing; this is your alert condition
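The exact thresholds aren't documented here, but the three states map naturally onto the raw counters; a hypothetical sketch of that mapping, not the library's actual code:

```js
// Hypothetical health classification from pool counters —
// my reading of the three states described above.
function classifyHealth({ idle, waiting }) {
  if (waiting > 0) return 'saturated'; // requests are queuing
  if (idle === 0) return 'degraded';   // at capacity, not spilling
  return 'healthy';                    // idle connections available
}
```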
This makes it trivial to build a proper health check endpoint:
```js
app.get('/health', (req, res) => {
  const health = monitor.getHealth();
  const stats = monitor.getStats();
  res.status(health === 'healthy' ? 200 : 503).json({
    status: health,
    pool: {
      total: stats.total,
      idle: stats.idle,
      acquired: stats.acquired,
      waiting: stats.waiting,
      utilization: stats.utilization,
    },
  });
});
```
When your pool is saturated, your load balancer gets a 503 and can route traffic elsewhere. This is the proper way to integrate database pool health into your readiness probe.
## The `getStats()` Object
If you want pool state in a format that's easy to log, serialize, or pass to your own monitoring infrastructure, getStats() returns:
```js
{
  pool: 'primary',
  total: 10,
  idle: 4,
  acquired: 6,
  waiting: 0,
  utilization: 0.6,
  healthy: true,
  highWaterMarks: {
    total: 12,    // peak pool size since process start
    waiting: 2,   // peak queue depth since process start
    acquired: 8   // peak acquired count since process start
  },
  scrapes: 47
}
```
The highWaterMarks are particularly valuable for right-sizing your pool. If your highWaterMarks.waiting is consistently above 0, your pool is too small for your peak load. If highWaterMarks.acquired is always well below total, you're over-provisioned.
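High-water marks are cheap to maintain; a sketch of the bookkeeping (assumed, since the package's internals aren't shown here):

```js
// Track peak values across successive observations.
class HighWaterMarks {
  constructor() {
    this.marks = { total: 0, waiting: 0, acquired: 0 };
  }

  // Raise each mark if the current stats exceed it.
  observe(stats) {
    for (const key of Object.keys(this.marks)) {
      if (stats[key] > this.marks[key]) this.marks[key] = stats[key];
    }
  }
}
```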
## Multi-Pool Setup with PgPoolRegistry
Many production applications run multiple pools: a primary write pool, a read replica pool, an analytics pool. PgPoolRegistry handles all of them:
```js
const { createRegistry } = require('pg-pool-monitor');

const registry = createRegistry();
registry.register(primaryPool, { name: 'primary', prefix: 'myapp' });
registry.register(replicaPool, { name: 'replica', prefix: 'myapp' });
registry.register(analyticsPool, { name: 'analytics', prefix: 'myapp' });

// Single /metrics endpoint for all pools
app.get('/metrics', (req, res) => {
  res.set('Content-Type', 'text/plain');
  res.send(registry.getMetrics());
});

// Overall health — 'healthy' only if ALL pools are healthy
app.get('/health', (req, res) => {
  const health = registry.getOverallHealth();
  res.status(health === 'healthy' ? 200 : 503).json({
    health,
    pools: registry.getAllStats(),
  });
});
```
The registry's getOverallHealth() returns saturated if any pool is saturated, degraded if any is degraded, and healthy only when all pools are clear. This is the correct behavior for a Kubernetes readiness probe — if any database pool is in trouble, stop sending traffic.
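That worst-state-wins rule is easy to express; a sketch of the presumed aggregation:

```js
// Aggregate per-pool health states into an overall state: the worst wins.
const SEVERITY = { healthy: 0, degraded: 1, saturated: 2 };

function overallHealth(states) {
  return states.reduce(
    (worst, state) => (SEVERITY[state] > SEVERITY[worst] ? state : worst),
    'healthy'
  );
}
```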
## Prometheus Scrape Config
Add this to your prometheus.yml:
```yaml
scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['localhost:3000']
    metrics_path: '/metrics'
    scrape_interval: 15s
```
## Grafana Dashboard Queries

With data flowing into Prometheus, these PromQL queries give you instant visibility in Grafana. (They use the unprefixed `pg_pool_*` names from the table above; if you set a `prefix` as in the setup example, substitute it, e.g. `myapp_pool_utilization`.)
```
# Pool utilization over time — graph this for capacity planning
pg_pool_utilization{pool="primary"}

# Connection queue depth — alert if this is > 0 for > 30s
pg_pool_waiting{pool="primary"} > 0

# Idle connections — alert if this hits 0 (pool fully saturated)
pg_pool_idle{pool="primary"} == 0

# Acquired connections over time — see your peak usage patterns
pg_pool_acquired{pool="primary"}

# Across all pools — utilization heatmap
avg by (pool) (pg_pool_utilization)
```
## Alert Rules That Actually Matter
```yaml
# prometheus/alerts.yml
groups:
  - name: postgres-pool
    rules:
      - alert: PgPoolSaturated
        expr: pg_pool_waiting > 0
        for: 30s
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL pool {{ $labels.pool }} is queueing requests"
          description: "{{ $value }} requests waiting — increase pool.max or investigate slow queries"

      - alert: PgPoolHighUtilization
        expr: pg_pool_utilization > 0.9
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "PostgreSQL pool {{ $labels.pool }} utilization above 90%"

      - alert: PgPoolFullyIdle
        expr: pg_pool_total > 0 and pg_pool_idle == pg_pool_total
        for: 5m
        labels:
          severity: info
        annotations:
          summary: "PostgreSQL pool {{ $labels.pool }} may be over-provisioned"
```
The PgPoolSaturated alert is your most important one. Set it to fire after 30 seconds of any queue depth — that's long enough to filter out momentary spikes, but fast enough to catch a real problem before it cascades.
## Right-Sizing Your Pool
The most common question pg-pool-monitor helps you answer: what should pool.max be?
The classic formula: `max = (cores × 2) + effective_spindle_count`

For a 4-core application server talking to an SSD-backed PostgreSQL: `max = (4 × 2) + 1 = 9`.
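As code, with the article's 4-core SSD example (one effective spindle for an SSD; the function name is mine):

```js
// The classic pool-sizing formula: max = (cores × 2) + effective_spindle_count.
function suggestedPoolMax(cores, effectiveSpindles) {
  return cores * 2 + effectiveSpindles;
}

suggestedPoolMax(4, 1); // 9
```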
But formulas are a starting point. Watch your actual metrics:
- If `pg_pool_waiting` ever hits > 0 under normal load → increase `max`
- If `pg_pool_idle` is consistently > 30% of `total` → decrease `max` to free Postgres connections
- If `pg_pool_utilization` peaks at 0.95+ during normal load → you have no headroom; increase before the next traffic spike
## The `.instrument()` Method
For the most accurate high-water mark tracking, call .instrument() on your monitor after creation:
```js
const monitor = createMonitor(pool, { name: 'primary' }).instrument();
```
This attaches acquire and connect event listeners to the pool, updating the high-water marks in real time rather than only on scrape. It adds two event listeners — negligible overhead for the accuracy gain.
## Why Zero Dependencies?
I wanted pg-pool-monitor to be something you can drop into any production Node.js service without dependency risk. No prom-client required, no additional npm installs to audit. The Prometheus text exposition format is simple enough to generate with string concatenation — which is exactly what this package does.
If you're already using prom-client in your service, you can still use pg-pool-monitor and aggregate its output alongside your existing metrics on the same /metrics endpoint — they're both plain text, just concatenate the strings.
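A hypothetical helper for that concatenation (the name is mine; in recent prom-client versions `register.metrics()` returns a Promise, hence the `await` in the usage comment):

```js
// Join Prometheus text-format payloads with exactly one newline between
// them and a single trailing newline.
function joinExpositions(...payloads) {
  return payloads.map((p) => p.replace(/\n+$/, '')).join('\n') + '\n';
}

// Usage alongside prom-client (assumed setup):
// app.get('/metrics', async (req, res) => {
//   res.set('Content-Type', 'text/plain');
//   res.send(joinExpositions(await register.metrics(), monitor.getMetrics()));
// });
```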
## GitHub & npm
- GitHub: github.com/axiom-experiment/pg-pool-monitor
- npm: `npm install pg-pool-monitor` (publishing shortly — token renewal in progress)
- 32/32 tests passing — pool total, idle, acquired, waiting, utilization, health states, multi-pool registry, label formatting, high-water marks, event instrumentation
pg-pool-monitor is part of the AXIOM production toolchain series — a collection of zero-dependency Node.js utilities built alongside the AXIOM autonomous agent experiment. If this saves you a 2 AM incident, consider starring the repo or sponsoring the project on GitHub.