This article was originally published on AI Study Room. For the full version with working code examples and related articles, visit the original post.
Database High Availability: Failover, Standby Types, Health Checks
Database High Availability: Failover, Standby Types, Health Checks
High availability (HA) ensures your database remains accessible despite hardware failures, network partitions, or software crashes. This article covers failover mechanisms, standby types, health monitoring, and automated recovery.
Availability Metrics
Availability is measured in "nines":
| Uptime | Downtime/Year | Classification | |--------|---------------|----------------| | 99% | 87.6 hours | Basic | | 99.9% (three nines) | 8.76 hours | Standard | | 99.99% (four nines) | 52.56 minutes | Enterprise | | 99.999% (five nines) | 5.26 minutes | Mission-critical |
Achieving higher availability requires increasingly sophisticated (and expensive) infrastructure.
Standby Types
Hot Standby
A hot standby accepts read queries while replicating from the primary. PostgreSQL's hot_standby = on enables this:
standby postgresql.conf
hot_standby = on
hot_standby_feedback = on
Failover time : Seconds (already running, just promote).
Resource usage : Full database server resources.
Use case : Production read replicas with fast failover.
Warm Standby
A warm standby is running but does not accept connections until promoted:
warm standby: accept no connections
hot_standby = off
Failover time : Tens of seconds (start postmaster + recovery).
Resource usage : Moderate (receives WAL but no query processing).
Use case : Cost-sensitive environments.
Cold Standby
A cold standby is an offline copy of the data. It must be started from scratch during failover:
Failover time : Minutes to hours (restore from backup, start database).
Resource usage : Minimal (only storage).
Use case : Disaster recovery only.
Failover Strategies
Manual Failover
The operator promotes a standby and redirects applications:
Promote standby
pg_ctl promote -D /var/lib/postgresql/data
Update application DNS or connection string
This may take minutes and requires human intervention
Pros : Simple, operator has full control. Cons : Slow (minutes), error-prone under pressure.
Automated Failover with Patroni
Patroni is the most popular HA solution for PostgreSQL. It uses distributed consensus (etcd, Consul, or Zookeeper) to elect a new leader:
patroni.yml
scope: myapp-db
namespace: /service/
name: pg-node-1
consul:
host: consul.service.consul:8500
register_service: true
postgresql:
listen: 0.0.0.0:5432
connect_address: pg-node-1:5432
data_dir: /var/lib/postgresql/data
bin_dir: /usr/lib/postgresql/16/bin
authentication:
replication:
username: replicator
password: secure_password
superuser:
username: postgres
password: secure_password
parameters:
max_connections: 200
use_pg_rewind: true
use_slots: true
watchdog:
mode: required
device: /dev/watchdog
Health Checks
Patroni performs regular health checks:
- pg_isready : Checks if PostgreSQL is accepting connections.
2\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Replication lag : If lag exceeds a threshold, the standby is not eligible for promotion. 3\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\. Consensus health : The node must be reachable to the DCS (etcd/Consul).
Manual health check
pg_isready -h localhost -p 5432
localhost:5432 - accepting connections
Check replication status
psql -c "SELECT * FROM pg_stat_replication;"
Failover Flow
- Patroni detects primary is down (missed health check + no DCS lock).
2\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
Read the full article on AI Study Room for complete code examples, comparison tables, and related resources.
Found this useful? Check out more developer guides and tool comparisons on AI Study Room.
Top comments (0)