Introduction
Your database is the last thing that should go down. Yet setting up PostgreSQL for high availability remains one of the most under-documented areas in the DevOps world. Most teams run a single Postgres instance until the day it dies, then scramble to set up replication while their users stare at error pages.
This guide walks through a production-ready PostgreSQL HA setup using the tools that have become the industry standard: Patroni for automated failover, pgBouncer for connection pooling, streaming replication for data redundancy, and pgBackRest for backups. We'll cover the architecture, actual configuration files, failure scenarios, and the operational runbooks you'll need.
If you're running Postgres in production and don't have automated failover, this article is for you.
Architecture Overview
A production HA Postgres setup has four layers:
                ┌─────────────┐
                │   HAProxy   │  (Virtual IP / DNS)
                │  Port 5432  │
                └──────┬──────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
  ┌─────┴─────┐  ┌─────┴─────┐  ┌─────┴─────┐
  │ pgBouncer │  │ pgBouncer │  │ pgBouncer │
  │  Node 1   │  │  Node 2   │  │  Node 3   │
  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
        │              │              │
  ┌─────┴─────┐  ┌─────┴─────┐  ┌─────┴─────┐
  │ Patroni + │  │ Patroni + │  │ Patroni + │
  │ Postgres  │  │ Postgres  │  │ Postgres  │
  │ (Primary) │  │ (Replica) │  │ (Replica) │
  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
        │              │              │
  ┌─────┴──────────────┴──────────────┴─────┐
  │          etcd Cluster (3 nodes)         │
  └─────────────────────────────────────────┘
Patroni manages the Postgres instances and handles leader election via a distributed consensus store (etcd, ZooKeeper, or Consul). When the primary fails, Patroni automatically promotes a replica and reconfigures the others to follow the new primary.
pgBouncer sits between your application and Postgres, pooling connections to avoid the expensive fork-per-connection model. Each Postgres node runs its own pgBouncer instance.
HAProxy (or a DNS-based solution) routes traffic to the current primary. Patroni exposes a REST API that HAProxy uses for health checks.
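The Patroni documentation ships an HAProxy template along these lines. Here's a minimal sketch, assuming the node addresses from the diagram above and pgBouncer listening on port 6432; `check port 8008` points the health check at Patroni's REST API, which answers 200 on `/primary` only on the current leader:

```
listen postgres-primary
    bind *:5432
    option httpchk OPTIONS /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-node-1 10.0.1.10:6432 check port 8008
    server pg-node-2 10.0.1.11:6432 check port 8008
    server pg-node-3 10.0.1.12:6432 check port 8008
```

The `on-marked-down shutdown-sessions` option matters: when a node is demoted, HAProxy kills its existing connections so clients reconnect to the new primary instead of sitting on a read-only backend.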
Setting Up Patroni
Install Patroni on each Postgres node:
pip3 install 'patroni[etcd3]'
Here's a production Patroni configuration (/etc/patroni/patroni.yml):
scope: postgres-cluster
name: pg-node-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.10:8008

etcd3:
  hosts: 10.0.1.20:2379,10.0.1.21:2379,10.0.1.22:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1MB
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "on"
        archive_command: pgbackrest --stanza=main archive-push %p
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.0.1.0/24 md5
    - host all all 10.0.1.0/24 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.1.10:5432
  data_dir: /var/lib/postgresql/16/main
  bin_dir: /usr/lib/postgresql/16/bin
  authentication:
    superuser:
      username: postgres
      password: "your-secure-password"
    replication:
      username: replicator
      password: "your-replication-password"
  parameters:
    shared_buffers: 4GB
    effective_cache_size: 12GB
    work_mem: 64MB
    maintenance_work_mem: 512MB
    max_connections: 200
    checkpoint_completion_target: 0.9
    wal_buffers: 64MB
    random_page_cost: 1.1  # SSD storage
Key settings to understand:
- maximum_lag_on_failover: A replica won't be promoted if it's more than 1MB behind the primary. This prevents data loss but means failover might not happen if all replicas are lagging heavily.
- use_pg_rewind: Allows the old primary to rejoin the cluster as a replica after failover without a full base backup. This is critical for fast recovery.
- use_slots: Replication slots prevent the primary from removing WAL segments that replicas still need, avoiding replication breakage during network issues.
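To make maximum_lag_on_failover concrete, here's a toy sketch of the decision it encodes (this is not Patroni's actual code, just the rule): a replica is only eligible for promotion if its WAL lag is within the configured byte threshold.

```python
MAX_LAG_BYTES = 1_048_576  # mirrors maximum_lag_on_failover: 1048576 (1MB)

def promotable(replica_lags, max_lag=MAX_LAG_BYTES):
    """Return the replicas whose WAL lag (bytes behind the last known
    primary position) is within the threshold, i.e. safe to promote."""
    return [name for name, lag in replica_lags.items() if lag <= max_lag]

# pg-node-2 is 512KB behind, pg-node-3 is 5MB behind:
candidates = promotable({"pg-node-2": 524_288, "pg-node-3": 5_242_880})
# Only pg-node-2 is eligible. If the list were empty, no failover would
# happen at all -- the trade-off the bullet above describes.
```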
Start Patroni as a systemd service:
sudo systemctl enable patroni
sudo systemctl start patroni
Check cluster status:
patronictl -c /etc/patroni/patroni.yml list
Output:
+ Cluster: postgres-cluster -------+---------+----+-----------+
| Member    | Host       | Role    | State   | TL | Lag in MB |
+-----------+------------+---------+---------+----+-----------+
| pg-node-1 | 10.0.1.10  | Leader  | running |  1 |           |
| pg-node-2 | 10.0.1.11  | Replica | running |  1 |       0.0 |
| pg-node-3 | 10.0.1.12  | Replica | running |  1 |       0.0 |
+-----------+------------+---------+---------+----+-----------+
Configuring pgBouncer for Connection Pooling
Without connection pooling, each application connection forks a new Postgres backend process (~10MB RAM each). With 500 application connections, that's 5GB just for connection overhead. pgBouncer solves this by multiplexing thousands of application connections over a small pool of actual database connections.
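The savings are easy to quantify. A quick sketch of the arithmetic above (10MB per backend is a rough rule of thumb, not an exact figure; real usage varies with work_mem and workload):

```python
MB = 1024 * 1024
BACKEND_OVERHEAD = 10 * MB  # assumed per-connection cost of a Postgres backend

def connection_memory(connections):
    """Approximate RAM consumed by Postgres backend processes alone."""
    return connections * BACKEND_OVERHEAD

direct = connection_memory(500)  # every app connection gets its own backend
pooled = connection_memory(25)   # pgBouncer multiplexes 500 clients over 25 servers

print(direct // MB, "MB vs", pooled // MB, "MB")  # 5000 MB vs 250 MB
```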
; /etc/pgbouncer/pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Pool settings
pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
; Connection limits
max_client_conn = 1000
max_db_connections = 100
; Timeouts
server_idle_timeout = 300
client_idle_timeout = 0
query_timeout = 60
query_wait_timeout = 30
; Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
Critical decision: pool_mode. Transaction mode is the right default for most applications - it assigns a server connection for the duration of a transaction, then returns it to the pool. Session mode holds the connection for the entire client session (defeats the purpose of pooling). Statement mode is the most aggressive but breaks multi-statement transactions and prepared statements.
Warning: Transaction pooling breaks SET commands, prepared statements that span transactions, and LISTEN/NOTIFY. If your application uses these, you'll need session pooling for those specific connections. (Recent pgBouncer releases, 1.21 and later, can track protocol-level prepared statements in transaction mode via the max_prepared_statements setting, which softens one of these limitations.)
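A toy simulation (pure Python, no pgBouncer involved) of why transaction pooling breaks SET: consecutive transactions from the same client can land on different server connections, so session state established in one transaction is invisible in the next.

```python
import itertools

class ToyTransactionPool:
    """Hands out a server connection per *transaction*, round-robin --
    a deliberately simplified stand-in for pool_mode = transaction."""

    def __init__(self, n_servers):
        # Each "server connection" is modeled as a dict of session settings.
        self.servers = [dict() for _ in range(n_servers)]
        self._next = itertools.cycle(range(n_servers))

    def run_transaction(self, statements):
        conn = self.servers[next(self._next)]  # possibly a different backend each time
        results = []
        for stmt in statements:
            if stmt.startswith("SET "):
                key, value = stmt[4:].split(" = ")
                conn[key] = value               # session state sticks to *this* backend
            elif stmt.startswith("SHOW "):
                results.append(conn.get(stmt[5:], "<default>"))
        return results

pool = ToyTransactionPool(n_servers=2)
pool.run_transaction(["SET search_path = app"])    # lands on server 0
print(pool.run_transaction(["SHOW search_path"]))  # lands on server 1 -> ['<default>']
```

The SET "worked", but the next transaction was served by a backend that never saw it. With session pooling the client would keep one backend for its whole session and the problem disappears.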
Backup Strategy with pgBackRest
pgBackRest is the gold standard for Postgres backups. It supports full, differential, and incremental backups with parallel compression and encryption.
; /etc/pgbackrest/pgbackrest.conf
[global]
repo1-type=s3
repo1-s3-bucket=your-pg-backups
repo1-s3-region=us-east-1
repo1-s3-endpoint=s3.amazonaws.com
repo1-path=/backups
repo1-retention-full=4
repo1-retention-diff=14
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=your-encryption-passphrase
process-max=4
compress-type=zst
compress-level=3
[main]
pg1-path=/var/lib/postgresql/16/main
pg1-port=5432
Create the stanza and run your first backup:
# Initialize the backup repository
pgbackrest --stanza=main stanza-create
# Full backup
pgbackrest --stanza=main --type=full backup
# Differential backup (only changed since last full)
pgbackrest --stanza=main --type=diff backup
# Incremental backup (only changed since last backup of any type)
pgbackrest --stanza=main --type=incr backup
Set up a cron schedule:
# Full backup every Sunday at 1 AM
0 1 * * 0 pgbackrest --stanza=main --type=full backup
# Differential backup every day at 1 AM (except Sunday)
0 1 * * 1-6 pgbackrest --stanza=main --type=diff backup
Point-in-time recovery is where pgBackRest shines. Because Patroni is already archiving WAL segments to pgBackRest, you can restore to any point in time:
pgbackrest --stanza=main --type=time \
--target="2026-04-09 14:30:00" \
--target-action=promote restore
Failover Testing and Operational Runbooks
Setting up HA means nothing if you don't test it. Here are the failure scenarios you must validate:
1. Primary node crash:
# Simulate primary failure
sudo systemctl stop patroni # on the primary node
Expected: Within 30 seconds (TTL), Patroni promotes a replica. HAProxy health checks detect the change and route traffic to the new primary. Application sees a brief connection error, retries, and reconnects.
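When running this test, measure the actual client-visible gap rather than eyeballing it. A minimal sketch of a measurement loop; connect_once is a hypothetical stand-in for whatever connection attempt your client makes (for example, `psql -c 'SELECT 1'` through HAProxy):

```python
import time

def measure_outage(connect_once, timeout=60.0, interval=0.5):
    """Poll until a connection attempt succeeds again; return seconds of
    downtime. connect_once: zero-argument callable returning True on success."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if connect_once():
            return time.monotonic() - start
        time.sleep(interval)
    raise TimeoutError("no primary within %.0fs" % timeout)

# Example with a fake checker that "recovers" on the third attempt:
attempts = iter([False, False, True])
outage = measure_outage(lambda: next(attempts), interval=0.01)
```

Run it on the app side while you stop Patroni on the primary; the returned figure is the number your retry logic and SLOs actually have to tolerate.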
2. Network partition:
# Isolate the primary from etcd
sudo iptables -A OUTPUT -p tcp --dport 2379 -j DROP
# After observing the demotion, heal the partition:
sudo iptables -D OUTPUT -p tcp --dport 2379 -j DROP
Expected: Primary can't reach etcd, Patroni demotes it to read-only. A replica with etcd access gets promoted. When network heals, the old primary uses pg_rewind to rejoin as a replica.
3. Switchover (planned maintenance):
# Graceful switchover to a specific replica
patronictl -c /etc/patroni/patroni.yml switchover \
--master pg-node-1 --candidate pg-node-2 --force
Expected: Zero or near-zero downtime. The primary finishes in-flight transactions, transfers leadership, and becomes a replica.
4. Full cluster restore from backup:
# Stop all Patroni instances
# On the new primary node:
pgbackrest --stanza=main --type=time \
--target="2026-04-09 10:00:00" restore
# Start Patroni - it will bootstrap from the restored data
sudo systemctl start patroni
Run these tests quarterly. Document the results. Your future self at 3 AM will thank you.
Monitoring Your HA Cluster
You need visibility into replication lag, connection pool utilization, and failover events. Key metrics to track:
-- Replication lag (run on primary)
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- Connection counts
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
-- Long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle' AND now() - pg_stat_activity.query_start > interval '30 seconds';
For Prometheus monitoring, use postgres_exporter and set up alerts:
# Prometheus alerting rules
groups:
  - name: postgres
    rules:
      - alert: PostgresReplicationLag
        expr: pg_replication_lag_seconds > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag is {{ $value }}s on {{ $labels.instance }}"

      - alert: PostgresConnectionPoolExhausted
        expr: pgbouncer_pools_server_active / pgbouncer_pools_server_max > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "pgBouncer pool >90% utilized on {{ $labels.instance }}"

      - alert: PatroniNoLeader
        expr: patroni_master == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No Patroni leader detected in cluster"
Need Help with Your DevOps?
Building a production-grade PostgreSQL HA cluster takes time and expertise - and maintaining it takes even more. At InstaDevOps, we design, deploy, and manage database infrastructure alongside your entire DevOps stack, starting at $2,999/month.
Book a free 15-minute consultation to discuss your database reliability needs: https://calendly.com/instadevops/15min