I'm starting a series of blog posts to explore CloudNativePG (CNPG), a Kubernetes operator for PostgreSQL that automates high availability in containerized environments.
PostgreSQL itself supports physical streaming replication, but doesn’t provide orchestration logic — no automatic promotion, scaling, or failover. Tools like Patroni fill that gap by implementing consensus (etcd, Consul, ZooKeeper, Kubernetes, or Raft) for cluster state management. In Kubernetes, databases are often deployed with StatefulSets, which provide stable network identities and persistent storage per instance. CloudNativePG instead defines PostgreSQL‑specific CustomResourceDefinitions (CRDs), which introduce the following resources:
- ImageCatalog: PostgreSQL image catalogs
- Cluster: PostgreSQL cluster definition
- Database: Declarative database management
- Pooler: PgBouncer connection pooling
- Backup: On-demand backup requests
- ScheduledBackup: Automated backup scheduling
- Publication: Logical replication publications
- Subscription: Logical replication subscriptions
Install: control plane for PostgreSQL
Here I’m using CNPG 1.28, the first release to support quorum-based failover. Prior versions promoted the most recently available standby without a guarantee against data loss (fine for disaster recovery, but not for strict high availability).
Install the operator’s components:
kubectl apply --server-side -f https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.28/releases/cnpg-1.28.0.yaml
The manifest installs the CRDs and deploys the controller into the cnpg-system namespace. Check the rollout status:
kubectl rollout status deployment -n cnpg-system cnpg-controller-manager
deployment "cnpg-controller-manager" successfully rolled out
This Deployment defines the CloudNativePG Controller Manager — the control plane component — which runs as a single pod and continuously reconciles PostgreSQL cluster resources with their desired state via the Kubernetes API:
kubectl get deployments -n cnpg-system -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
cnpg-controller-manager 1/1 1 1 11d manager ghcr.io/cloudnative-pg/cloudnative-pg:1.28.0 app.kubernetes.io/name=cloudnative-pg
The pod’s containers listen on ports for metrics (8080/TCP) and webhook configuration (9443/TCP), and interact with CNPG’s CRDs during the reconciliation loop:
kubectl describe deploy -n cnpg-system cnpg-controller-manager
Name: cnpg-controller-manager
Namespace: cnpg-system
CreationTimestamp: Thu, 15 Jan 2026 21:04:25 +0100
Labels: app.kubernetes.io/name=cloudnative-pg
Annotations: deployment.kubernetes.io/revision: 1
Selector: app.kubernetes.io/name=cloudnative-pg
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app.kubernetes.io/name=cloudnative-pg
Service Account: cnpg-manager
Containers:
manager:
Image: ghcr.io/cloudnative-pg/cloudnative-pg:1.28.0
Ports: 8080/TCP (metrics), 9443/TCP (webhook-server)
Host Ports: 0/TCP (metrics), 0/TCP (webhook-server)
SeccompProfile: RuntimeDefault
Command:
/manager
Args:
controller
--leader-elect
--max-concurrent-reconciles=10
--config-map-name=cnpg-controller-manager-config
--secret-name=cnpg-controller-manager-config
--webhook-port=9443
Limits:
cpu: 100m
memory: 200Mi
Requests:
cpu: 100m
memory: 100Mi
Liveness: http-get https://:9443/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get https://:9443/readyz delay=0s timeout=1s period=10s #success=1 #failure=3
Startup: http-get https://:9443/readyz delay=0s timeout=1s period=5s #success=1 #failure=6
Environment:
OPERATOR_IMAGE_NAME: ghcr.io/cloudnative-pg/cloudnative-pg:1.28.0
OPERATOR_NAMESPACE: (v1:metadata.namespace)
MONITORING_QUERIES_CONFIGMAP: cnpg-default-monitoring
Mounts:
/controller from scratch-data (rw)
/run/secrets/cnpg.io/webhook from webhook-certificates (rw)
Volumes:
scratch-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
webhook-certificates:
Type: Secret (a volume populated by a Secret)
SecretName: cnpg-webhook-cert
Optional: true
Node-Selectors: <none>
Tolerations: <none>
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: cnpg-controller-manager-6b9f78f594 (1/1 replicas created)
Events: <none>
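As a quick check of the metrics port listed above, you can port-forward to the deployment and read the Prometheus endpoint. This is a sketch (background the port-forward or run it in a separate terminal, and expect metric names to vary across versions):
kubectl -n cnpg-system port-forward deploy/cnpg-controller-manager 8080:8080 &
curl -s http://localhost:8080/metrics | head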
Deploy: data plane (PostgreSQL cluster)
The control plane handles orchestration logic. The actual PostgreSQL instances — the data plane — are managed via CNPG’s Cluster custom resource.
Create a dedicated namespace:
kubectl delete namespace lab
kubectl create namespace lab
namespace/lab created
Here’s a minimal high-availability cluster spec:
- 3 instances: 1 primary, 2 hot standby replicas
- Synchronous commit to 1 replica
- Quorum-based failover enabled
cat > lab-cluster-rf3.yaml <<'YAML'
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cnpg
spec:
  instances: 3
  postgresql:
    synchronous:
      method: any
      number: 1
      failoverQuorum: true
  storage:
    size: 1Gi
YAML
kubectl -n lab apply -f lab-cluster-rf3.yaml
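While the operator bootstraps the instances, you can watch the cluster converge; the optional cnpg kubectl plugin, if installed, gives a more detailed view:
kubectl -n lab get cluster cnpg -w
kubectl cnpg status cnpg -n lab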
CNPG provisions Pods with stateful semantics, using PersistentVolumeClaims for storage:
kubectl -n lab get pvc -o wide
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE VOLUMEMODE
cnpg-1 Bound pvc-76754ba4-e8bd-4218-837f-36aa0010940f 1Gi RWO hostpath <unset> 42s Filesystem
cnpg-2 Bound pvc-3b231dcc-b973-43f8-a429-80222bd51420 1Gi RWO hostpath <unset> 26s Filesystem
cnpg-3 Bound pvc-b8e4c6a0-bbcb-445d-9267-ffe38a1a8685 1Gi RWO hostpath <unset> 10s Filesystem
These PVCs bind to PersistentVolumes provided by their storage class:
kubectl -n lab get pv -o wide
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE VOLUMEMODE
pvc-3b231dcc-b973-43f8-a429-80222bd51420 1Gi RWO Delete Bound lab/cnpg-2 hostpath <unset> 53s Filesystem
pvc-76754ba4-e8bd-4218-837f-36aa0010940f 1Gi RWO Delete Bound lab/cnpg-1 hostpath <unset> 69s Filesystem
pvc-b8e4c6a0-bbcb-445d-9267-ffe38a1a8685 1Gi RWO Delete Bound lab/cnpg-3 hostpath <unset> 37s Filesystem
The PostgreSQL instances run in pods:
kubectl -n lab get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cnpg-1 1/1 Running 0 3m46s 10.1.0.141 docker-desktop <none> <none>
cnpg-2 1/1 Running 0 3m29s 10.1.0.143 docker-desktop <none> <none>
cnpg-3 1/1 Running 0 3m13s 10.1.0.145 docker-desktop <none> <none>
In Kubernetes, pods are typically interchangeable, but PostgreSQL has a single read-write primary while the other instances serve as read replicas. CNPG identifies which pod runs the primary instance:
kubectl -n lab get cluster
NAME AGE INSTANCES READY STATUS PRIMARY
cnpg 4m 3 3 Cluster in healthy state cnpg-1
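The role is also exposed as a pod label (the same one the services below select on), so it can be displayed directly:
kubectl -n lab get pods -L cnpg.io/instanceRole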
As pod roles can change with a switchover or failover, applications access the database through Services that always route to the right instances:
kubectl -n lab get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
cnpg-r ClusterIP 10.97.182.192 <none> 5432/TCP 4m13s cnpg.io/cluster=cnpg,cnpg.io/podRole=instance
cnpg-ro ClusterIP 10.111.116.164 <none> 5432/TCP 4m13s cnpg.io/cluster=cnpg,cnpg.io/instanceRole=replica
cnpg-rw ClusterIP 10.108.19.85 <none> 5432/TCP 4m13s cnpg.io/cluster=cnpg,cnpg.io/instanceRole=primary
Those are the endpoints used to connect to PostgreSQL:
- cnpg-rw connects to the primary for consistent reads and writes
- cnpg-ro connects to one of the standbys for possibly stale reads
- cnpg-r connects to the primary or a standby for possibly stale reads
Load balancing of read workloads across the matching pods is round-robin, like a client-side host list, so the same workload runs on all replicas.
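For an application, these services are ordinary Kubernetes DNS names. For example, a client in another namespace would use a connection string like this, with the app user and database that CNPG creates by default (the password comes from the cnpg-app secret shown in the next section):
postgresql://app@cnpg-rw.lab.svc.cluster.local:5432/app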
Client access setup
CNPG generated credentials in a Kubernetes Secret named cnpg-app for the user app:
kubectl -n lab get secrets
NAME TYPE DATA AGE
cnpg-app kubernetes.io/basic-auth 11 8m48s
cnpg-ca Opaque 2 8m48s
cnpg-replication kubernetes.io/tls 2 8m48s
cnpg-server kubernetes.io/tls 2 8m48s
When needed, the password can be retrieved with kubectl -n lab get secret cnpg-app -o jsonpath='{.data.password}' | base64 -d.
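The same secret also carries ready-made connection strings (assuming the uri key that recent CNPG versions include among those 11 data entries):
kubectl -n lab get secret cnpg-app -o jsonpath='{.data.uri}' | base64 -d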
Define a shell alias to launch a PostgreSQL client pod with these credentials:
alias pgrw='kubectl -n lab run client --rm -it --restart=Never \
--env PGHOST="cnpg-rw" \
--env PGUSER="app" \
--env PGPASSWORD="$(kubectl -n lab get secret cnpg-app -o jsonpath='{.data.password}' | base64 -d)" \
--image=postgres:18 --'
Use the alias pgrw to run a PostgreSQL client connected to the primary.
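A similar alias pointing at cnpg-ro (a variant I'll call pgro) makes it easy to verify that reads land on a standby:
alias pgro='kubectl -n lab run client --rm -it --restart=Never \
 --env PGHOST="cnpg-ro" \
 --env PGUSER="app" \
 --env PGPASSWORD="$(kubectl -n lab get secret cnpg-app -o jsonpath='{.data.password}' | base64 -d)" \
 --image=postgres:18 --'
pgro psql -c 'select pg_is_in_recovery()'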
PgBench default workload
With the previous alias defined, initialize PgBench tables:
pgrw pgbench -i
dropping old tables...
creating tables...
generating data (client-side)...
vacuuming...
creating primary keys...
done in 0.10 s (drop tables 0.02 s, create tables 0.01 s, client-side generate 0.04 s, vacuum 0.01 s, primary keys 0.01 s).
pod "client" deleted from lab namespace
Run for 10 minutes with progress every 5 seconds:
pgrw pgbench -T 600 -P 5
progress: 5.0 s, 1541.4 tps, lat 0.648 ms stddev 0.358, 0 failed
progress: 10.0 s, 1648.6 tps, lat 0.606 ms stddev 0.154, 0 failed
progress: 15.0 s, 1432.7 tps, lat 0.698 ms stddev 0.218, 0 failed
progress: 20.0 s, 1581.3 tps, lat 0.632 ms stddev 0.169, 0 failed
progress: 25.0 s, 1448.2 tps, lat 0.690 ms stddev 0.315, 0 failed
progress: 30.0 s, 1640.6 tps, lat 0.609 ms stddev 0.155, 0 failed
progress: 35.0 s, 1609.9 tps, lat 0.621 ms stddev 0.223, 0 failed
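While the bench is running, the synchronous/quorum state is visible from the primary; for example, executing psql inside the primary pod (a sketch, assuming local peer authentication in the container):
kubectl -n lab exec cnpg-1 -- psql -c 'select application_name, state, sync_state from pg_stat_replication'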
Simulated failure
In another terminal, I checked which pod is the primary:
kubectl -n lab get cluster
NAME AGE INSTANCES READY STATUS PRIMARY
cnpg 40m 3 3 Cluster in healthy state cnpg-1
From the Docker Desktop GUI, I paused the container in the primary's pod.
PgBench queries hung because the primary they were connected to stopped replying.
The pod then recovered, and PgBench continued without being disconnected.
Kubernetes monitors pod health with liveness/readiness probes and restarts containers when those probes fail. In this case, Kubernetes—not CNPG—restored the service.
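Those probes are visible on the instance pods; for example, to dump the liveness probe of the primary (jsonpath output, first container assumed):
kubectl -n lab get pod cnpg-1 -o jsonpath='{.spec.containers[0].livenessProbe}'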
Meanwhile, CNPG, which monitors PostgreSQL independently, had triggered a failover before Kubernetes restarted the pod:
kubectl -n lab get cluster
NAME AGE INSTANCES READY STATUS PRIMARY
cnpg 3m6s 3 2 Failing over cnpg-1
Kubernetes brought the service back in about 30 seconds, but CNPG had already initiated a failover, so another outage was coming.
A few minutes later, cnpg-1 restarted and PgBench exited with:
WARNING: canceling the wait for synchronous replication and terminating connection due to administrator command
DETAIL: The transaction has already committed locally, but might not have been replicated to the standby.
pgbench: error: client 0 aborted in command 10 (SQL) of script 0; perhaps the backend died while processing
Because cnpg-1 was still there and healthy, it remained the primary, but all connections were terminated.
Observations
This test shows how PostgreSQL and Kubernetes interact under CloudNativePG. Kubernetes pod health checks and CloudNativePG’s failover logic each run their own control loop:
- Kubernetes restarts containers when liveness or readiness probes fail.
- CloudNativePG (CNPG) evaluates database health using replication state, quorum, and instance manager connectivity.
Pausing the container briefly triggered CNPG’s primary isolation check. When the primary loses contact with both the Kubernetes API and other cluster members, CNPG shuts it down to prevent split-brain. Timeline:
- T+0s — Primary paused. CNPG detects isolation.
- T+30s — Kubernetes restarts the container.
- T+180s — CNPG triggers failover.
- T+275s — Primary shutdown terminates client connections.
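This sequence can be reconstructed after the fact from the cluster events and the operator log; the grep filter below is only illustrative:
kubectl -n lab get events --sort-by=.lastTimestamp
kubectl -n cnpg-system logs deploy/cnpg-controller-manager | grep -i failover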
Because CNPG and Kubernetes act on different timelines, the original pod restarted as primary (“self-failover”) when no replica was a better promotion candidate. CNPG prioritizes data integrity over fast recovery and, without a consensus protocol like Raft, relies on:
- Kubernetes API state
- PostgreSQL streaming replication
- Instance manager health checks
This can cause false positives under transient faults but protects against split-brain. Reproducible steps:
https://github.com/cloudnative-pg/cloudnative-pg/discussions/9814
Cloud systems can fail in many ways. In this test, I used docker pause to freeze processes and simulate a primary that stops responding to clients and health checks. This mirrors a previous test I did with Yugabyte: YugabyteDB Recovery Time Objective (RTO) with PgBench: continuous availability with max. 15s latency on infrastructure failure
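For reference, the same pause can be scripted instead of using the GUI; a sketch assuming Docker Desktop's single node, where the pod's container is visible to the host docker CLI:
docker ps --filter name=cnpg-1 --format '{{.ID}} {{.Names}}'
docker pause <container-id>   # freeze all processes in the primary's container
docker unpause <container-id> # resume it later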
This post starts a CNPG series where I will also cover failures like network partitions and storage issues, and the connection pooler.


