Introduction
Your database is the last thing that should go down. Yet setting up PostgreSQL for high availability remains one of the most under-documented areas in the DevOps world. Most teams run a single Postgres instance until the day it dies, then scramble to set up replication while their users stare at error pages.
This guide walks through a production-ready PostgreSQL HA setup using the tools that have become the industry standard: Patroni for automated failover, pgBouncer for connection pooling, streaming replication for data redundancy, and pgBackRest for backups. We'll cover the architecture, actual configuration files, failure scenarios, and the operational runbooks you'll need.
If you're running Postgres in production and don't have automated failover, this article is for you.
Architecture Overview
A production HA Postgres setup has four layers:
                ┌─────────────┐
                │   HAProxy   │  (Virtual IP / DNS)
                │  Port 5432  │
                └──────┬──────┘
                       │
        ┌──────────────┼──────────────┐
        │              │              │
  ┌─────┴─────┐  ┌─────┴─────┐  ┌─────┴─────┐
  │ pgBouncer │  │ pgBouncer │  │ pgBouncer │
  │  Node 1   │  │  Node 2   │  │  Node 3   │
  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
        │              │              │
  ┌─────┴─────┐  ┌─────┴─────┐  ┌─────┴─────┐
  │ Patroni + │  │ Patroni + │  │ Patroni + │
  │ Postgres  │  │ Postgres  │  │ Postgres  │
  │ (Primary) │  │ (Replica) │  │ (Replica) │
  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
        │              │              │
  ┌─────┴──────────────┴──────────────┴─────┐
  │          etcd Cluster (3 nodes)         │
  └─────────────────────────────────────────┘
Patroni manages the Postgres instances and handles leader election via a distributed consensus store (etcd, ZooKeeper, or Consul). When the primary fails, Patroni automatically promotes a replica and reconfigures the others to follow the new primary.
pgBouncer sits between your application and Postgres, pooling connections to avoid the expensive fork-per-connection model. Each Postgres node runs its own pgBouncer instance.
HAProxy (or a DNS-based solution) routes traffic to the current primary. Patroni exposes a REST API that HAProxy uses for health checks.
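The Patroni documentation ships an HAProxy template along these lines. Here's a minimal sketch, assuming the node addresses from the diagram above and pgBouncer listening on port 6432; `check port 8008` points the health check at Patroni's REST API, which answers 200 on `/primary` only on the current leader:

```
listen postgres-primary
    bind *:5432
    option httpchk OPTIONS /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server pg-node-1 10.0.1.10:6432 check port 8008
    server pg-node-2 10.0.1.11:6432 check port 8008
    server pg-node-3 10.0.1.12:6432 check port 8008
```

The `on-marked-down shutdown-sessions` option matters: when a node is demoted, HAProxy kills its existing connections so clients reconnect to the new primary instead of sitting on a read-only backend.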
Setting Up Patroni
Install Patroni on each Postgres node:
pip3 install 'patroni[etcd3]'
Here's a production Patroni configuration (/etc/patroni/patroni.yml):
scope: postgres-cluster
name: pg-node-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.1.10:8008

etcd3:
  hosts: 10.0.1.20:2379,10.0.1.21:2379,10.0.1.22:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576  # 1MB
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "on"
        archive_command: pgbackrest --stanza=main archive-push %p
  initdb:
    - encoding: UTF8
    - data-checksums
  pg_hba:
    - host replication replicator 10.0.1.0/24 md5
    - host all all 10.0.1.0/24 md5

postgresql:
  listen: 0.0.0.0:5432
  connect_address: 10.0.1.10:5432
  data_dir: /var/lib/postgresql/16/main
  bin_dir: /usr/lib/postgresql/16/bin
  authentication:
    superuser:
      username: postgres
      password: "your-secure-password"
    replication:
      username: replicator
      password: "your-replication-password"
  parameters:
    shared_buffers: 4GB
    effective_cache_size: 12GB
    work_mem: 64MB
    maintenance_work_mem: 512MB
    max_connections: 200
    checkpoint_completion_target: 0.9
    wal_buffers: 64MB
    random_page_cost: 1.1  # SSD storage
Key settings to understand:
- maximum_lag_on_failover: A replica won't be promoted if it's more than 1MB behind the primary. This prevents data loss but means failover might not happen if all replicas are lagging heavily.
- use_pg_rewind: Allows the old primary to rejoin the cluster as a replica after failover without a full base backup. This is critical for fast recovery.
- use_slots: Replication slots prevent the primary from removing WAL segments that replicas still need, avoiding replication breakage during network issues.
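To make maximum_lag_on_failover concrete, here's a toy sketch of the decision it encodes (this is not Patroni's actual code, just the rule): a replica is only eligible for promotion if its WAL lag is within the configured byte threshold.

```python
MAX_LAG_BYTES = 1_048_576  # mirrors maximum_lag_on_failover: 1048576 (1MB)

def promotable(replica_lags, max_lag=MAX_LAG_BYTES):
    """Return the replicas whose WAL lag (bytes behind the last known
    primary position) is within the threshold, i.e. safe to promote."""
    return [name for name, lag in replica_lags.items() if lag <= max_lag]

# pg-node-2 is 512KB behind, pg-node-3 is 5MB behind:
candidates = promotable({"pg-node-2": 524_288, "pg-node-3": 5_242_880})
# Only pg-node-2 is eligible. If the list were empty, no failover would
# happen at all -- the trade-off the bullet above describes.
```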
Start Patroni as a systemd service:
sudo systemctl enable patroni
sudo systemctl start patroni
Check cluster status:
patronictl -c /etc/patroni/patroni.yml list
Output:
+ Cluster: postgres-cluster -------+---------+----+-----------+
| Member    | Host       | Role    | State   | TL | Lag in MB |
+-----------+------------+---------+---------+----+-----------+
| pg-node-1 | 10.0.1.10  | Leader  | running |  1 |           |
| pg-node-2 | 10.0.1.11  | Replica | running |  1 |       0.0 |
| pg-node-3 | 10.0.1.12  | Replica | running |  1 |       0.0 |
+-----------+------------+---------+---------+----+-----------+
Configuring pgBouncer for Connection Pooling
Without connection pooling, each application connection forks a new Postgres backend process (~10MB RAM each). With 500 application connections, that's 5GB just for connection overhead. pgBouncer solves this by multiplexing thousands of application connections over a small pool of actual database connections.
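The savings are easy to quantify. A quick sketch of the arithmetic above (10MB per backend is a rough rule of thumb, not an exact figure; real usage varies with work_mem and workload):

```python
MB = 1024 * 1024
BACKEND_OVERHEAD = 10 * MB  # assumed per-connection cost of a Postgres backend

def connection_memory(connections):
    """Approximate RAM consumed by Postgres backend processes alone."""
    return connections * BACKEND_OVERHEAD

direct = connection_memory(500)  # every app connection gets its own backend
pooled = connection_memory(25)   # pgBouncer multiplexes 500 clients over 25 servers

print(direct // MB, "MB vs", pooled // MB, "MB")  # 5000 MB vs 250 MB
```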
; /etc/pgbouncer/pgbouncer.ini
[databases]
myapp = host=127.0.0.1 port=5432 dbname=myapp
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; Pool settings
pool_mode = transaction
default_pool_size = 25
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
; Connection limits
max_client_conn = 1000
max_db_connections = 100
; Timeouts
server_idle_timeout = 300
client_idle_timeout = 0
query_timeout = 60
query_wait_timeout = 30
; Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60
Critical decision: pool_mode. Transaction mode is the right default for most applications - it assigns a server connection for the duration of a transaction, then returns it to the pool. Session mode holds the connection for the entire client session (defeats the purpose of pooling). Statement mode is the most aggressive but breaks multi-statement transactions and prepared statements.
Warning: Transaction pooling breaks SET commands, prepared statements that span transactions, and LISTEN/NOTIFY. If your application uses these, you'll need session pooling for those specific connections. (Recent pgBouncer releases, 1.21 and later, can track protocol-level prepared statements in transaction mode via the max_prepared_statements setting, which softens one of these limitations.)
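A toy simulation (pure Python, no pgBouncer involved) of why transaction pooling breaks SET: consecutive transactions from the same client can land on different server connections, so session state established in one transaction is invisible in the next.

```python
import itertools

class ToyTransactionPool:
    """Hands out a server connection per *transaction*, round-robin --
    a deliberately simplified stand-in for pool_mode = transaction."""

    def __init__(self, n_servers):
        # Each "server connection" is modeled as a dict of session settings.
        self.servers = [dict() for _ in range(n_servers)]
        self._next = itertools.cycle(range(n_servers))

    def run_transaction(self, statements):
        conn = self.servers[next(self._next)]  # possibly a different backend each time
        results = []
        for stmt in statements:
            if stmt.startswith("SET "):
                key, value = stmt[4:].split(" = ")
                conn[key] = value               # session state sticks to *this* backend
            elif stmt.startswith("SHOW "):
                results.append(conn.get(stmt[5:], "<default>"))
        return results

pool = ToyTransactionPool(n_servers=2)
pool.run_transaction(["SET search_path = app"])    # lands on server 0
print(pool.run_transaction(["SHOW search_path"]))  # lands on server 1 -> ['<default>']
```

The SET "worked", but the next transaction was served by a backend that never saw it. With session pooling the client would keep one backend for its whole session and the problem disappears.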
Backup Strategy with pgBackRest
pgBackRest is the gold standard for Postgres backups. It supports full, differential, and incremental backups with parallel compression and encryption.
; /etc/pgbackrest/pgbackrest.conf
[global]
repo1-type=s3
repo1-s3-bucket=your-pg-backups
repo1-s3-region=us-east-1
repo1-s3-endpoint=s3.amazonaws.com
repo1-path=/backups
repo1-retention-full=4
repo1-retention-diff=14
repo1-cipher-type=aes-256-cbc
repo1-cipher-pass=your-encryption-passphrase
process-max=4
compress-type=zst
compress-level=3
[main]
pg1-path=/var/lib/postgresql/16/main
pg1-port=5432
Create the stanza and run your first backup:
# Initialize the backup repository
pgbackrest --stanza=main stanza-create
# Full backup
pgbackrest --stanza=main --type=full backup
# Differential backup (only changed since last full)
pgbackrest --stanza=main --type=diff backup
# Incremental backup (only changed since last backup of any type)
pgbackrest --stanza=main --type=incr backup
Set up a cron schedule:
# Full backup every Sunday at 1 AM
0 1 * * 0 pgbackrest --stanza=main --type=full backup
# Differential backup every day at 1 AM (except Sunday)
0 1 * * 1-6 pgbackrest --stanza=main --type=diff backup
Point-in-time recovery is where pgBackRest shines. Because Patroni is already archiving WAL segments to pgBackRest, you can restore to any point in time:
pgbackrest --stanza=main --type=time \
--target="2026-04-09 14:30:00" \
--target-action=promote restore
Failover Testing and Operational Runbooks
Setting up HA means nothing if you don't test it. Here are the failure scenarios you must validate:
1. Primary node crash:
# Simulate primary failure
sudo systemctl stop patroni # on the primary node
Expected: Within 30 seconds (TTL), Patroni promotes a replica. HAProxy health checks detect the change and route traffic to the new primary. Application sees a brief connection error, retries, and reconnects.
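When running this test, measure the actual client-visible gap rather than eyeballing it. A minimal sketch of a measurement loop; connect_once is a hypothetical stand-in for whatever connection attempt your client makes (for example, `psql -c 'SELECT 1'` through HAProxy):

```python
import time

def measure_outage(connect_once, timeout=60.0, interval=0.5):
    """Poll until a connection attempt succeeds again; return seconds of
    downtime. connect_once: zero-argument callable returning True on success."""
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        if connect_once():
            return time.monotonic() - start
        time.sleep(interval)
    raise TimeoutError("no primary within %.0fs" % timeout)

# Example with a fake checker that "recovers" on the third attempt:
attempts = iter([False, False, True])
outage = measure_outage(lambda: next(attempts), interval=0.01)
```

Run it on the app side while you stop Patroni on the primary; the returned figure is the number your retry logic and SLOs actually have to tolerate.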
2. Network partition:
# Isolate the primary from etcd
sudo iptables -A OUTPUT -p tcp --dport 2379 -j DROP
# After observing the demotion, heal the partition:
sudo iptables -D OUTPUT -p tcp --dport 2379 -j DROP
Expected: Primary can't reach etcd, Patroni demotes it to read-only. A replica with etcd access gets promoted. When network heals, the old primary uses pg_rewind to rejoin as a replica.
3. Switchover (planned maintenance):
# Graceful switchover to a specific replica
patronictl -c /etc/patroni/patroni.yml switchover \
--master pg-node-1 --candidate pg-node-2 --force
Expected: Zero or near-zero downtime. The primary finishes in-flight transactions, transfers leadership, and becomes a replica.
4. Full cluster restore from backup:
# Stop all Patroni instances
# On the new primary node:
pgbackrest --stanza=main --type=time \
--target="2026-04-09 10:00:00" restore
# Start Patroni - it will bootstrap from the restored data
sudo systemctl start patroni
Run these tests quarterly. Document the results. Your future self at 3 AM will thank you.
Monitoring Your HA Cluster
You need visibility into replication lag, connection pool utilization, and failover events. Key metrics to track:
-- Replication lag (run on primary)
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn,
pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- Connection counts
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
-- Long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle' AND now() - pg_stat_activity.query_start > interval '30 seconds';
For Prometheus monitoring, use postgres_exporter and set up alerts:
# Prometheus alerting rules
groups:
  - name: postgres
    rules:
      - alert: PostgresReplicationLag
        expr: pg_replication_lag_seconds > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Replication lag is {{ $value }}s on {{ $labels.instance }}"

      - alert: PostgresConnectionPoolExhausted
        expr: pgbouncer_pools_server_active / pgbouncer_pools_server_max > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "pgBouncer pool >90% utilized on {{ $labels.instance }}"

      - alert: PatroniNoLeader
        expr: patroni_master == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No Patroni leader detected in cluster"
Need Help with Your DevOps?
Building a production-grade PostgreSQL HA cluster takes time and expertise - and maintaining it takes even more. At InstaDevOps, we design, deploy, and manage database infrastructure alongside your entire DevOps stack, starting at $2,999/month.
Book a free 15-minute consultation to discuss your database reliability needs: https://calendly.com/instadevops/15min