Abhishek Gupta
DocumentDB on Kubernetes: Resilient, Highly Available Databases with Automatic Failover

DocumentDB is an open-source MongoDB-compatible database built on PostgreSQL that provides a familiar interface while leveraging PostgreSQL's reliability and extensibility. The DocumentDB Kubernetes Operator brings this database to Kubernetes environments by extending the platform with custom resources. The operator manages DocumentDB clusters declaratively, handling deployment, scaling, upgrades, and high availability scenarios automatically.

The DocumentDB Kubernetes Operator provides multiple levels of high availability, each addressing a different failure domain. Local HA deploys multiple database instances within a single Kubernetes cluster with automatic failover in seconds, protecting against pod and node failures. For further resilience, you can configure availability zone spreading so that replicas land in different AZs, allowing the cluster to survive a full zone outage without manual intervention. Beyond a single cluster, the operator supports multi-region HA across Azure regions (using KubeFleet) and multi-cloud HA across providers like Azure, AWS, and GCP (using Istio). Both use physical WAL replication with manual failover via kubectl documentdb promote. These levels are composable: a production deployment can combine all of them.

This post focuses on local HA, the foundational layer, and walks through automatic failover in action.

Highly Available DocumentDB deployment on Kubernetes

In single-instance database deployments, any failure (such as a pod crash, node issue, or planned upgrade) may result in downtime. Local high availability (HA) solves this problem by deploying multiple database instances within a single Kubernetes cluster. The operator creates one primary instance that handles all client operations, along with multiple replica instances that are kept in sync via asynchronous WAL streaming. When the primary fails, a replica is automatically promoted to become the new primary, so your application experiences minimal disruption.

DocumentDB HA design

Local HA is ideal when you need resilience within a single region without the complexity of multi-region deployments. It's a cost-effective solution for development and staging environments where you want to validate failover behavior without cloud distribution costs, as well as for production workloads that require automatic recovery from infrastructure failures. For cross-region disaster recovery scenarios, the operator also supports multi-cluster replication features.

Architecture Overview

DocumentDB's local HA leverages CloudNativePG (CNPG) as its underlying PostgreSQL foundation. CNPG handles WAL-based streaming replication and automatic failover orchestration, while DocumentDB adds the MongoDB-compatible protocol layer on top.

Here are the key components:

  • CNPG Cluster: Manages PostgreSQL replication (1 primary + N replicas) with WAL-based streaming
  • DocumentDB Gateway: A sidecar container injected into each PostgreSQL pod that translates MongoDB wire protocol to PostgreSQL DocumentDB extension calls
  • Kubernetes Services: Layered service architecture for different access patterns:
    • Internal PostgreSQL services (port 5432): <cluster>-rw (primary), <cluster>-ro (replicas only), <cluster>-r (all instances — primary and replicas) - used for internal operations, metrics, and backups
    • External Gateway service (port 10260): Routes MongoDB client traffic to the current primary by tracking the cnpg.io/instanceRole: primary label

When the primary fails, CNPG automatically detects the failure and promotes the most advanced replica to primary, updating the pod labels. The Kubernetes service automatically follows the new primary, keeping the external IP stable – no manual DNS changes required. The operator currently supports a maximum of 3 instances (instancesPerNode: 3), providing 1 primary + 2 replicas for optimal balance between availability, performance, and operational flexibility.
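Since promotion is just a label change, you can observe a failover live by watching the cnpg.io/instanceRole label on the pods. Below is a minimal sketch using the official Python Kubernetes client; the cnpg.io/cluster label is CNPG's standard cluster label, and the namespace and cluster names mirror this tutorial's manifests, so treat them as assumptions about your own deployment.

```python
# Sketch: watch CNPG role-label changes to observe a failover as it happens.
# Assumes CNPG's standard labels (cnpg.io/cluster, cnpg.io/instanceRole)
# and the namespace/cluster names used in this tutorial.

def role_selector(cluster):
    # Label selector matching every instance of one CNPG cluster
    return f"cnpg.io/cluster={cluster}"

def watch_roles(namespace="documentdb-preview-ns", cluster="documentdb-local-ha"):
    # Imported lazily so the selector helper works without the client installed
    from kubernetes import client, config, watch

    config.load_kube_config()
    v1 = client.CoreV1Api()
    w = watch.Watch()
    for event in w.stream(v1.list_namespaced_pod,
                          namespace=namespace,
                          label_selector=role_selector(cluster)):
        pod = event["object"]
        role = (pod.metadata.labels or {}).get("cnpg.io/instanceRole", "?")
        print(f"{event['type']:8} {pod.metadata.name} -> {role}")
```

Running watch_roles() during the failover test later in this post would show the MODIFIED events as the replica's role label flips to primary.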

Let's see how this works in practice.

Setting Up the Test Environment

You need the following installed:

  • minikube, kubectl, and helm
  • Python (for the test client application)

Start your minikube cluster:

minikube start

Also make sure to clone the GitHub repository:

git clone https://github.com/abhirockzz/documentdb-local-ha-tutorial.git
cd documentdb-local-ha-tutorial

Install DocumentDB Operator

First, install cert-manager (required dependency):

helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true

Install the DocumentDB operator:

helm repo add documentdb https://documentdb.github.io/documentdb-kubernetes-operator

helm install documentdb-operator documentdb/documentdb-operator \
  --namespace documentdb-operator \
  --create-namespace \
  --wait

Verify the operator is running:

kubectl get deployment -n documentdb-operator

To check the operator version, run kubectl get pods -n documentdb-operator -o jsonpath='{.items[*].spec.containers[*].image}' && echo

Deploy Local HA Cluster

In a separate terminal, start the minikube tunnel. This creates a network route that enables LoadBalancer services to receive external IPs in your local minikube environment. The tunnel will assign an IP address to the DocumentDB service and route traffic from your host machine to the cluster, allowing the Python test client to connect:

minikube tunnel

# output:
✅  Tunnel successfully started

📌  NOTE: Please do not close this terminal as this process must stay alive for the tunnel to be accessible ...

The local_ha.yaml file contains the complete configuration including namespace, credentials secret, and the DocumentDB resource with instancesPerNode: 3 to create a 3-instance HA cluster.
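For orientation, the DocumentDB resource inside local_ha.yaml looks roughly like the sketch below. The instancesPerNode: 3 setting is the part that enables local HA; the other field names follow the operator's sample manifests as of this writing and may differ across versions, so treat the repository's local_ha.yaml as the source of truth.

```yaml
# Illustrative sketch only -- field names other than instancesPerNode
# may vary by operator version; defer to the repo's local_ha.yaml.
apiVersion: db.microsoft.com/preview
kind: DocumentDB
metadata:
  name: documentdb-local-ha
  namespace: documentdb-preview-ns
spec:
  nodeCount: 1
  instancesPerNode: 3        # 1 primary + 2 replicas (local HA)
  resource:
    pvcSize: 5Gi
  exposeViaService:
    serviceType: LoadBalancer
```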

Deploy the cluster:

kubectl apply -f local_ha.yaml

Monitor the pod status and wait for all pods to be running (this may take 1-2 minutes). You should see output similar to the following, with all 3 pods eventually reaching Running status:

kubectl get pods -n documentdb-preview-ns -w

# output:

NAME                                 READY   STATUS            RESTARTS   AGE
documentdb-local-ha-1-initdb-ffrjf   0/1     PodInitializing   0          3s
documentdb-local-ha-1-initdb-ffrjf   1/1     Running           0          24s
documentdb-local-ha-1-initdb-ffrjf   0/1     Completed         0          25s
documentdb-local-ha-1-initdb-ffrjf   0/1     Completed         0          27s
documentdb-local-ha-1-initdb-ffrjf   0/1     Completed         0          27s
documentdb-local-ha-1                0/2     Pending           0          0s
documentdb-local-ha-1                0/2     Pending           0          0s
//....
documentdb-local-ha-3                2/2     Running           0          11s

Go to the minikube tunnel terminal, and verify the tunnel is now active for the DocumentDB service:

Starting tunnel for service documentdb-service-documentdb-local-ha.

You can verify the external IP assigned to the service (should be 127.0.0.1 in this case):

kubectl get svc -n documentdb-preview-ns

# output:
NAME                                     TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)           AGE
documentdb-local-ha-r                    ClusterIP      10.108.176.232   <none>        5432/TCP          4m47s
documentdb-local-ha-ro                   ClusterIP      10.111.154.246   <none>        5432/TCP          4m47s
documentdb-local-ha-rw                   ClusterIP      10.101.235.95    <none>        5432/TCP          4m47s
documentdb-service-documentdb-local-ha   LoadBalancer   10.97.24.62      127.0.0.1     10260:31829/TCP   4m57s

Now we are ready to test failover.

Test Failover

The test uses a Python client application that continuously performs write and read operations against the DocumentDB cluster. The client includes retry logic with exponential backoff and tracks metrics such as operation counts, failures, and downtime.

Note that the client uses retryWrites=True, which allows the MongoDB driver to automatically retry failed writes on the new primary — in very fast failovers, you may see zero reported failures as the driver absorbs the disruption transparently.
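The core of the client can be sketched as follows. This is a simplified stand-in for failover_test_read_write.py, not the actual script: the database and collection names mirror the tutorial, while the helper names and backoff parameters are illustrative.

```python
import time

def backoff_delay(attempt, base=1.0, cap=30.0):
    # Exponential backoff: 1s, 2s, 4s, ... capped at `cap` seconds
    return min(base * (2 ** attempt), cap)

def run_loop(uri):
    # pymongo imported lazily so backoff_delay is usable standalone
    from pymongo import MongoClient
    from pymongo.errors import PyMongoError

    client = MongoClient(uri, serverSelectionTimeoutMS=2000, retryWrites=True)
    coll = client.testdb.failover_test
    attempt = 0
    while True:
        try:
            coll.insert_one({"ts": time.time()})   # write op
            coll.find_one(sort=[("ts", -1)])       # read op
            attempt = 0                            # success: reset backoff
            time.sleep(0.5)
        except PyMongoError as exc:
            delay = backoff_delay(attempt)
            print(f"FAILOVER EVENT? {exc} -- retrying in {delay:.1f}s")
            time.sleep(delay)
            attempt += 1
```

run_loop would be called with the same connection string shown later in this post (host 127.0.0.1, port 10260).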

Start the client application:

pip install -r requirements.txt
python failover_test_read_write.py

Let it run for ~15 seconds to establish a baseline. You'll see continuous write and read operations succeeding.

//....
[CLIENT][10:29:28.051] ✓ Connected to DocumentDB
[CLIENT][10:29:28.138] ✓ W#1 (44ms) | R#1 (43ms) | Avg W:44ms R:43ms | Docs: 1
[CLIENT][10:29:28.688] ✓ W#2 (2ms) | R#2 (43ms) | Avg W:23ms R:43ms | Docs: 2
[CLIENT][10:29:29.234] ✓ W#3 (2ms) | R#3 (43ms) | Avg W:16ms R:43ms | Docs: 3
[CLIENT][10:29:29.783] ✓ W#4 (3ms) | R#4 (43ms) | Avg W:13ms R:43ms | Docs: 4
[CLIENT][10:29:30.327] ✓ W#5 (2ms) | R#5 (42ms) | Avg W:11ms R:43ms | Docs: 5
//.....

Trigger Failover

In a new terminal, identify the current primary (the primary/replica roles may vary in your deployment):

kubectl get pods -n documentdb-preview-ns -L cnpg.io/instanceRole

# output:
NAME                    READY   STATUS    RESTARTS   AGE   INSTANCEROLE
documentdb-local-ha-1   2/2     Running   0          14m   primary
documentdb-local-ha-2   2/2     Running   0          14m   replica
documentdb-local-ha-3   2/2     Running   0          14m   replica

Note which pod shows primary role, then delete it to simulate a failure:

kubectl delete pod <primary-pod-name> -n documentdb-preview-ns

For example, if your primary pod is documentdb-local-ha-1, you would run: kubectl delete pod documentdb-local-ha-1 -n documentdb-preview-ns

Automatic Recovery

Watch the client terminal. You should see logs similar to this:

[CLIENT][10:32:32.452] ✗ FAILOVER EVENT DETECTED
[CLIENT][10:32:32.452] ℹ   Error: the database system is shutting down, full error: {'ok': 0.0, 'code': 50463173, 'codeName': 'Error', 'errmsg': 'the database system is shutting down'}
[CLIENT][10:32:32.452] ℹ   Last successful write: #178
[CLIENT][10:32:32.452] ℹ ================================================================================
[CLIENT][10:32:32.500] ✗ ⚠ Write FAILED | ⚠ Read FAILED | Downtime: 0.0s | Failed W: 1 R: 1
[CLIENT][10:32:32.551] ↻ Attempting reconnection (backoff: 1.0s)...
[CLIENT][10:32:32.703] ✗ Connection failed: error connecting to server: Connection refused (os error 111), full error: {'ok': 0.0, 'code': 1, 'codeName': 'Internal Error', 'errmsg': 'error connecting to server: Connection refused (os error 111)'}
[CLIENT][10:32:33.705] ↻ Attempting reconnection (backoff: 2.0s)...
[CLIENT][10:32:33.947] ✓ Connected to DocumentDB
[CLIENT][10:32:33.947] ✓ Reconnection successful!
[CLIENT][10:32:33.998] ℹ ================================================================================
[CLIENT][10:32:33.998] ↻ RECOVERY COMPLETE
//.....

Then the read and write operations should resume successfully:

[CLIENT][10:32:34.044] ✓ W#179 (51ms) | R#179 (46ms) | Avg W:8ms R:39ms | Docs: 284
[CLIENT][10:32:34.593] ✓ W#180 (4ms) | R#180 (42ms) | Avg W:8ms R:39ms | Docs: 285
[CLIENT][10:32:35.143] ✓ W#181 (5ms) | R#181 (44ms) | Avg W:9ms R:39ms | Docs: 286
[CLIENT][10:32:35.693] ✓ W#182 (3ms) | R#182 (43ms) | Avg W:9ms R:39ms | Docs: 287
[CLIENT][10:32:36.238] ✓ W#183 (2ms) | R#183 (42ms) | Avg W:8ms R:39ms | Docs: 288
[CLIENT][10:32:36.787] ✓ W#184 (3ms) | R#184 (43ms) | Avg W:9ms R:39ms | Docs: 289
[CLIENT][10:32:37.333] ✓ W#185 (3ms) | R#185 (42ms) | Avg W:8ms R:39ms | Docs: 290
//....

Let's take a closer look at what happened.

Behind the scenes

These are the key events during failover:

  1. Failure detection: Operations start failing with connection errors
  2. FAILOVER EVENT DETECTED: The client recognizes the disruption
  3. Reconnection attempts: Automatic retry with exponential backoff
  4. RECOVERY COMPLETE: Service automatically resumes

Behind the scenes, CNPG automatically detects the primary pod termination, promotes a healthy replica to primary, and updates the cnpg.io/instanceRole: primary label on the new primary pod. The Kubernetes service automatically routes traffic to the new primary (the external IP remains unchanged).

The key to failover is the service's label selector mechanism. Since the service tracks pods with cnpg.io/instanceRole: primary, when CNPG updates this label during promotion, the service endpoint automatically switches to the new primary without any DNS changes or client reconfiguration.
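Conceptually, the operator-generated gateway Service looks like the sketch below. The instanceRole selector is the load-bearing piece; the cnpg.io/cluster label is CNPG's standard cluster label and is included here as an assumption about the generated manifest, so compare against kubectl get svc documentdb-service-documentdb-local-ha -o yaml in your own cluster.

```yaml
# Conceptual sketch of the operator-generated gateway Service.
apiVersion: v1
kind: Service
metadata:
  name: documentdb-service-documentdb-local-ha
  namespace: documentdb-preview-ns
spec:
  type: LoadBalancer
  selector:
    cnpg.io/cluster: documentdb-local-ha   # assumed: CNPG's cluster label
    cnpg.io/instanceRole: primary          # flips endpoints on promotion
  ports:
    - port: 10260        # MongoDB wire protocol via the gateway sidecar
      targetPort: 10260
```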

Verify the new primary – you should see a different pod as the new primary (it may vary in your deployment). In this example, documentdb-local-ha-3 has become the new primary:

kubectl get pods -n documentdb-preview-ns -L cnpg.io/instanceRole

# output:
NAME                    READY   STATUS    RESTARTS   AGE     INSTANCEROLE
documentdb-local-ha-1   2/2     Running   0          2m17s   replica
documentdb-local-ha-2   2/2     Running   0          18m     replica
documentdb-local-ha-3   2/2     Running   0          18m     primary

Verify manually

You can also connect directly using mongosh to verify. First, get the connection string:

kubectl get documentdb documentdb-local-ha -n documentdb-preview-ns

# output:

NAME                  STATUS                     CONNECTION STRING
documentdb-local-ha   Cluster in healthy state   mongodb://$(kubectl get secret documentdb-credentials -n documentdb-preview-ns -o jsonpath='{.data.username}' | base64 -d):$(kubectl get secret documentdb-credentials -n documentdb-preview-ns -o jsonpath='{.data.password}' | base64 -d)@127.0.0.1:10260/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsAllowInvalidCertificates=true&replicaSet=rs0

Use the connection string to connect with mongosh:

mongosh "mongodb://$(kubectl get secret documentdb-credentials -n documentdb-preview-ns -o jsonpath='{.data.username}' | base64 -d):$(kubectl get secret documentdb-credentials -n documentdb-preview-ns -o jsonpath='{.data.password}' | base64 -d)@127.0.0.1:10260/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsAllowInvalidCertificates=true&replicaSet=rs0"

Once connected, switch to the testdb database and verify the documents:

rs0 [direct: mongos] test> use testdb
switched to db testdb
rs0 [direct: mongos] testdb> db.getCollectionNames()
[ 'failover_test' ]
rs0 [direct: mongos] testdb> db.failover_test.countDocuments()
408
rs0 [direct: mongos] testdb> 

Bonus exercise

Try connecting to each cluster node directly by using kubectl port-forward to each pod. For example, to connect to documentdb-local-ha-1 over port 27017:

kubectl port-forward -n documentdb-preview-ns documentdb-local-ha-1 27017:10260

Now you can connect with mongosh:

mongosh "mongodb://k8s_secret_user:K8sSecret100@localhost:27017/?directConnection=true&authMechanism=SCRAM-SHA-256&tls=true&tlsAllowInvalidCertificates=true"

Experiment with different commands and observe the behavior.

You can also explore advanced scenarios like multi-region or multi-cloud deployments.

Cleanup

To tear down the environment when you're done:

kubectl delete namespace documentdb-preview-ns
helm uninstall documentdb-operator -n documentdb-operator
kubectl delete namespace documentdb-operator
minikube stop

Wrapping Up

The DocumentDB Kubernetes Operator provides local high availability with automatic failover capabilities. You have seen how the operator handles primary failures with minimal manual intervention, making it easier to build resilient database deployments on Kubernetes.

This tutorial demonstrated basic failover recovery using a small dataset on a single-node cluster; the client-observed recovery time was approximately 1-3 seconds. Note that CNPG uses asynchronous replication by default, so transactions committed on the old primary but not yet replicated to a standby can be lost during an unplanned failover. For production workloads or larger datasets, evaluate these trade-offs against your own durability and recovery requirements.

Go ahead, try it out in your own environment, and let us know your feedback!

Check out the documentation for the latest feature updates. If you run into issues or have questions, reach out on Discord or raise an issue on GitHub. Contributions are welcome, whether it's code, documentation improvements, or simply sharing your experience.

Happy building!
