Introduction
Distributed systems expertise remains one of the most sought-after skills in software engineering. Engineers who can design, implement, and scale distributed databases command premium compensation for good reason—these systems form the backbone of modern applications serving millions of users.
In this comprehensive guide, we'll build a highly available, horizontally scalable MongoDB cluster using Kubernetes. You'll learn how to create a production-ready database infrastructure that can grow from a single node to hundreds of nodes, scaling seamlessly to meet demanding workloads.
Technology Stack
Our infrastructure leverages three powerful open-source technologies:
- MongoDB: A distributed NoSQL database designed for horizontal scalability and high availability
- MicroK8s: An ultra-lightweight Kubernetes distribution from Canonical (the creators of Ubuntu), optimized for both development and production environments
- OpenEBS: A cloud-native distributed storage solution for Kubernetes that provides persistent volume management
This combination enables true horizontal scalability—you can expand your cluster's capacity by adding more nodes rather than being limited by vertical scaling constraints.
Prerequisites and Initial Setup
Installing MicroK8s
First, set up your MicroK8s cluster. The installation process is straightforward and well-documented:
MicroK8s Getting Started Guide
Follow the official documentation to install MicroK8s on your nodes. Once complete, verify your installation:
microk8s status --wait-ready
Enabling OpenEBS Storage
OpenEBS integration with MicroK8s is remarkably simple, requiring just two commands:
microk8s enable community
microk8s enable openebs
These commands enable the community addon repository and install OpenEBS components into your cluster.
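Before moving on, it can help to confirm that the OpenEBS components actually started. A quick check (with recent MicroK8s addons the pods land in the openebs namespace, but the exact namespace and pod names may vary by version):

# All OpenEBS control-plane and Jiva CSI pods should eventually reach Running
microk8s kubectl get pods -n openebs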
Configuring Distributed Storage
Installing iSCSI on Every Node
OpenEBS relies on iSCSI (Internet Small Computer Systems Interface) for distributed block storage. This protocol enables nodes to access block-level storage over TCP/IP networks, which is essential for our distributed architecture.
Critical: Install iSCSI on every node in your cluster:
sudo apt update
sudo apt install open-iscsi
sudo systemctl enable open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid
Verify that the iSCSI daemon is running:
systemctl status iscsid
You should see output similar to:
● iscsid.service - iSCSI initiator daemon (iscsid)
Loaded: loaded (/lib/systemd/system/iscsid.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2025-11-18 20:05:03 +03; 1 week 1 day ago
TriggeredBy: ● iscsid.socket
Docs: man:iscsid(8)
Main PID: 9887 (iscsid)
Tasks: 2 (limit: 9298)
Memory: 5.4M
CPU: 9.154s
CGroup: /system.slice/iscsid.service
├─9886 /sbin/iscsid
└─9887 /sbin/iscsid
Verifying Storage Classes
Check that OpenEBS storage classes are available in your cluster:
kubectl get storageclass
Expected output includes multiple OpenEBS storage classes:
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
microk8s-hostpath (default) microk8s.io/hostpath Delete WaitForFirstConsumer false 256d
openebs-device openebs.io/local Delete WaitForFirstConsumer false 7d20h
openebs-hostpath openebs.io/local Delete WaitForFirstConsumer false 7d20h
openebs-jiva jiva.csi.openebs.io Delete Immediate true 44m
openebs-jiva-csi-default jiva.csi.openebs.io Delete Immediate true 7d20h
Configuring High Availability with Replica Count
For production deployments, configure the replication factor for your storage. This ensures data redundancy and high availability:
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-jiva
provisioner: jiva.csi.openebs.io
parameters:
  replicaCount: "3"  # Use 3 for production, 2 minimum for HA
  policy: openebs-policy-default
allowVolumeExpansion: true
EOF
Replication guidelines:
- 3 replicas: Recommended for production (tolerates 1 node failure)
- 2 replicas: Minimum for high availability
- Ensure your cluster has at least as many nodes as your replica count (a quick check is shown below)
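The node-count requirement in the last point is easy to verify up front; a minimal check:

# Confirm you have at least as many Ready nodes as the replicaCount set above
kubectl get nodes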
Deploying MongoDB
Creating the Namespace and Service Account
First, create a dedicated namespace for your databases:
kubectl create namespace databases
Now create a service account with appropriate permissions. MongoDB's sidecar container needs to discover other pods in the replica set:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mongo
  namespace: databases
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: read-pod-service-endpoint
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:serviceaccount:databases:mongo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: read-pod-service-endpoint
subjects:
- kind: ServiceAccount
  name: mongo
  namespace: databases
Save the manifest as service-account.yaml, then apply it:
kubectl apply -f service-account.yaml
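To confirm the binding works, you can ask the API server whether the new service account may read the resources the sidecar needs; a quick sketch using kubectl's built-in authorization check (it requires a caller allowed to impersonate service accounts, which cluster admins normally are):

# Both commands should print "yes"
kubectl auth can-i list pods --as=system:serviceaccount:databases:mongo -n databases
kubectl auth can-i get endpoints --as=system:serviceaccount:databases:mongo -n databases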
Creating Credentials Secret
Before deploying MongoDB, create a secret for authentication:
kubectl create secret generic mongo-secret \
--from-literal=mongo-user=admin \
--from-literal=mongo-password='YourSecurePassword123!' \
-n databases
Security note: In production, use a secrets management solution like HashiCorp Vault or sealed-secrets instead of plain Kubernetes secrets.
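If authentication problems show up later, it is worth confirming what was actually stored; the values can be decoded straight from the secret (a small sketch; avoid doing this on shared terminals):

kubectl get secret mongo-secret -n databases -o jsonpath='{.data.mongo-user}' | base64 -d; echo
kubectl get secret mongo-secret -n databases -o jsonpath='{.data.mongo-password}' | base64 -d; echo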
Deploying the StatefulSet
StatefulSets are designed for stateful applications like databases. Unlike Deployments, they provide:
- Stable, unique network identifiers
- Stable, persistent storage
- Ordered, graceful deployment and scaling
Here's the complete MongoDB StatefulSet configuration:
apiVersion: v1
kind: Service
metadata:
  name: mongo
  namespace: databases
  labels:
    name: mongo
spec:
  selector:
    app: mongo
  ports:
  - protocol: TCP
    port: 27017
    targetPort: 27017
  clusterIP: None  # Headless service for StatefulSet
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
  namespace: databases
  labels:
    app: mongo
spec:
  serviceName: "mongo"
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
        role: mongo
        environment: production
    spec:
      serviceAccountName: mongo
      automountServiceAccountToken: true
      terminationGracePeriodSeconds: 30
      containers:
      - name: mongodb
        image: mongo:5.0
        command:
        - mongod
        - "--replSet=rs0"
        - "--bind_ip=0.0.0.0"
        ports:
        - name: mongodb
          containerPort: 27017
          protocol: TCP
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "2Gi"
            cpu: "2"
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: mongo-user
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: mongo-password
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db
        livenessProbe:
          exec:
            command:
            - mongosh
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - mongosh
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
      - name: mongo-sidecar
        image: morphy/k8s-mongo-sidecar
        env:
        - name: KUBERNETES_POD_LABELS
          value: "app=mongo,role=mongo"
        - name: KUBERNETES_SERVICE_NAME
          value: "mongo"
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MONGODB_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: mongo-user
        - name: MONGODB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: mongo-password
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "openebs-jiva-csi-default"
      resources:
        requests:
          storage: 50Gi
Key configuration details:
- Headless Service (clusterIP: None): Provides stable DNS entries for each pod (mongo-0, mongo-1, mongo-2)
- Volume Claim Templates: Each pod gets its own 50Gi persistent volume
- Health Probes: Liveness and readiness probes ensure pod health
- Sidecar Container: Automatically manages replica set configuration
- Resource Limits: Prevents resource exhaustion and enables proper scheduling
Deploy the StatefulSet:
kubectl apply -f statefulset.yaml
Monitor the deployment:
kubectl get pods -n databases -w
Wait for all pods to reach the Running state.
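Instead of watching manually, you can also block until every replica passes its readiness probe and confirm that OpenEBS provisioned the volumes; a small sketch using the labels and sizes from this guide:

# Wait for all mongo pods to become Ready (up to 5 minutes)
kubectl wait --for=condition=Ready pod -l app=mongo -n databases --timeout=300s
# Each pod should have a bound 50Gi PVC
kubectl get pvc -n databases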
Initializing the Replica Set
Once all pods are running, initialize the MongoDB replica set. Connect to the first pod:
kubectl exec -it mongo-0 -n databases -- mongosh -u admin -p
Enter your password when prompted, then initialize the replica set:
rs.initiate({
_id: "rs0",
version: 1,
members: [
{ _id: 0, host: "mongo-0.mongo.databases.svc.cluster.local:27017" },
{ _id: 1, host: "mongo-1.mongo.databases.svc.cluster.local:27017" },
{ _id: 2, host: "mongo-2.mongo.databases.svc.cluster.local:27017" }
]
})
The DNS names follow the pattern: <pod-name>.<service-name>.<namespace>.svc.cluster.local:27017
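If rs.initiate() reports members as unreachable, the usual culprit is DNS. You can check that the per-pod hostnames resolve from inside a member; a minimal sketch (the official mongo image is Debian/Ubuntu-based, so getent should be present):

# Resolve the stable DNS names of the other members from mongo-0
kubectl exec -n databases mongo-0 -c mongodb -- getent hosts mongo-1.mongo.databases.svc.cluster.local
kubectl exec -n databases mongo-0 -c mongodb -- getent hosts mongo-2.mongo.databases.svc.cluster.local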
Verifying Replica Set Status
Check the replica set configuration:
rs.status()
Look for these key indicators of a healthy cluster:
- "ok": 1 at the end of the output
- One PRIMARY member
- Two SECONDARY members
- All members showing "health": 1
Example output:
{
set: 'rs0',
members: [
{
_id: 0,
name: 'mongo-0.mongo.databases.svc.cluster.local:27017',
health: 1,
state: 1,
stateStr: 'PRIMARY',
...
},
{
_id: 1,
name: 'mongo-1.mongo.databases.svc.cluster.local:27017',
health: 1,
state: 2,
stateStr: 'SECONDARY',
...
},
...
],
ok: 1
}
Testing and Validation
Testing Data Persistence
Let's verify that data persists and replicates correctly across the cluster. Insert a test document (writes always go through the primary, so this example assumes mongo-1 currently holds that role):
kubectl exec -it -n databases mongo-1 -- mongosh -u admin -p --eval '
db = db.getSiblingDB("testdb");
db.testcollection.insertOne({
name: "test-record",
timestamp: new Date(),
message: "Testing OpenEBS Jiva storage",
node: "mongo-1"
});
db.testcollection.find().pretty();
'
Expected output:
[
{
_id: ObjectId('6926f6c7d79787542f544ca7'),
name: 'test-record',
timestamp: ISODate('2025-11-26T12:47:03.263Z'),
message: 'Testing OpenEBS Jiva storage',
node: 'mongo-1'
}
]
Now verify the data is replicated by querying a different pod:
kubectl exec -it -n databases mongo-0 -- mongosh -u admin -p --eval '
db = db.getSiblingDB("testdb");
db.testcollection.find().pretty();
'
You should see the same document, confirming replication is working; if you get an error instead of the document, see the note on read preference below.
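Note that mongosh routes reads to the primary by default, so querying a member that happens to be a secondary returns a NotPrimaryNoSecondaryOk error rather than your document. To read from a secondary explicitly, set a read preference first; a hedged sketch:

kubectl exec -it -n databases mongo-0 -- mongosh -u admin -p --eval '
  // Allow reads from secondary members on this connection
  db.getMongo().setReadPref("secondaryPreferred");
  db.getSiblingDB("testdb").testcollection.find().pretty();
'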
Testing High Availability
The true test of high availability is failover. Let's simulate a node failure by deleting the primary pod.
First, identify the primary:
kubectl exec -n databases mongo-0 -- mongosh -u admin -p --eval "rs.isMaster().primary"
Output example:
mongo-1.mongo.databases.svc.cluster.local:27017
Now delete the primary pod:
kubectl delete pod mongo-1 -n databases
The replica set should automatically elect a new primary. Check the new primary:
kubectl exec -n databases mongo-0 -- mongosh -u admin -p --eval "rs.isMaster().primary"
Output:
mongo-0.mongo.databases.svc.cluster.local:27017
What just happened?
- The primary pod was deleted
- The remaining members detected the failure within seconds
- An automatic election occurred
- A new primary was elected
- Kubernetes recreated the deleted pod
- The recreated pod rejoined as a secondary
This demonstrates true high availability—your application experiences minimal disruption during node failures.
Testing Read Operations During Failover
For a more realistic test, run continuous read operations while deleting a pod:
# In terminal 1, fetch the password from the secret and start continuous reads
MONGO_PASS=$(kubectl get secret mongo-secret -n databases -o jsonpath='{.data.mongo-password}' | base64 -d)
while true; do
  kubectl exec -n databases mongo-0 -- mongosh -u admin -p "$MONGO_PASS" --quiet --eval '
    db.getSiblingDB("testdb").testcollection.findOne()
  ' 2>/dev/null && echo "✓ Read successful" || echo "✗ Read failed"
  sleep 1
done
# In terminal 2, delete the primary
kubectl delete pod mongo-1 -n databases
You'll notice only a brief interruption (typically 5-10 seconds) during the election process.
Scaling Your Cluster
Horizontal Scaling
To scale your MongoDB cluster, simply increase the replica count:
kubectl scale statefulset mongo --replicas=5 -n databases
After the new pods are running, add them to the replica set:
kubectl exec -it mongo-0 -n databases -- mongosh -u admin -p
rs.add("mongo-3.mongo.databases.svc.cluster.local:27017")
rs.add("mongo-4.mongo.databases.svc.cluster.local:27017")
Vertical Scaling
To increase resources for existing pods, update the StatefulSet:
resources:
  requests:
    memory: "2Gi"
    cpu: "2"
  limits:
    memory: "4Gi"
    cpu: "4"
Apply the changes and perform a rolling update:
kubectl apply -f statefulset.yaml
kubectl rollout status statefulset/mongo -n databases
Storage Expansion
OpenEBS Jiva supports volume expansion. To increase storage for an existing pod:
kubectl patch pvc mongo-persistent-storage-mongo-0 -n databases -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
Note: Not all storage classes support volume expansion. Verify with kubectl get storageclass and check the ALLOWVOLUMEEXPANSION column.
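After patching, the resize is handled by the storage layer first and the filesystem usually grows once the pod is restarted; the PVC's status shows how far along it is. A quick check, assuming the PVC name used in the patch above:

# CAPACITY and any conditions such as FileSystemResizePending indicate progress
kubectl get pvc mongo-persistent-storage-mongo-0 -n databases
kubectl describe pvc mongo-persistent-storage-mongo-0 -n databases | grep -A 5 Conditions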
Monitoring and Maintenance
Essential Monitoring Metrics
Deploy monitoring for these critical metrics:
- Replica Set Health
kubectl exec -n databases mongo-0 -- mongosh -u admin -p --eval "rs.status()" | grep -A 3 "stateStr"
- Pod Status
kubectl get pods -n databases -o wide
- Storage Usage
kubectl exec -n databases mongo-0 -- df -h /data/db
- Resource Consumption
kubectl top pods -n databases
Backup Strategy
Implement regular backups using mongodump:
kubectl exec -n databases mongo-0 -- mongodump \
  --username=admin \
  --password='YourSecurePassword123!' \
  --authenticationDatabase=admin \
  --out=/tmp/backup-$(date +%Y%m%d)
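Keep in mind that /tmp inside the container is ephemeral, so copy dumps off the pod (for example with kubectl cp) or write them to durable storage. One way to schedule this is a Kubernetes CronJob; a minimal sketch, assuming the mongo-secret created earlier and a hypothetical pre-provisioned PVC named mongo-backup for the output:

cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongo-backup
  namespace: databases
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: mongodump
            image: mongo:5.0
            command: ["/bin/sh", "-c"]
            args:
            - >
              mongodump --host=mongo-0.mongo.databases.svc.cluster.local
              --username="$MONGO_USER" --password="$MONGO_PASS"
              --authenticationDatabase=admin --out=/backup/backup-`date +%F`
            env:
            - name: MONGO_USER
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: mongo-user
            - name: MONGO_PASS
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: mongo-password
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: mongo-backup  # hypothetical PVC for storing dumps
EOF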
For production environments, consider using:
- Velero: Kubernetes-native backup solution
- MongoDB Ops Manager: MongoDB's enterprise backup solution
- Kanister: Application-level data management platform
Production Considerations
Security Hardening
- Enable TLS/SSL: Encrypt data in transit by adding TLS flags to the mongod command:
  - "--tlsMode=requireTLS"
  - "--tlsCertificateKeyFile=/etc/mongodb/certs/mongodb.pem"
- Network Policies: Restrict pod-to-pod communication:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mongo-netpol
  namespace: databases
spec:
  podSelector:
    matchLabels:
      app: mongo
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: application
    ports:
    - protocol: TCP
      port: 27017
- Pod Security Standards: Apply baseline security policies
- Secrets Management: Use external secrets management (Vault, AWS Secrets Manager)
Performance Optimization
- Anti-Affinity Rules: Distribute pods across nodes by adding this to the pod template spec:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - mongo
      topologyKey: kubernetes.io/hostname
- Resource Tuning: Adjust based on workload patterns
- WiredTiger Cache: Configure based on available memory (see the sketch after this list)
- Connection Pooling: Optimize application connection settings
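For the WiredTiger cache specifically, the size can be pinned with a mongod flag so it stays in line with the container's memory limit; a small sketch of how the StatefulSet command from earlier might be extended (the 1 GB value is only an example, common guidance is roughly half of the memory limit minus 1 GB):

        command:
        - mongod
        - "--replSet=rs0"
        - "--bind_ip=0.0.0.0"
        # Cap the WiredTiger cache rather than letting it size itself from node memory
        - "--wiredTigerCacheSizeGB=1"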
Disaster Recovery
- Multi-Region Deployment: Deploy across availability zones
- Regular Backup Testing: Verify backup integrity and restoration procedures
- Runbook Documentation: Document recovery procedures
- Automated Failover Testing: Regularly test failover mechanisms
Conclusion
You've successfully built a production-grade, horizontally scalable MongoDB cluster on Kubernetes. This setup provides:
- High Availability: Automatic failover with minimal downtime
- Horizontal Scalability: Scale from 3 to hundreds of nodes
- Data Durability: Replicated storage with OpenEBS
- Operational Flexibility: Kubernetes-native management
While this guide makes the deployment appear straightforward, the reality of managing distributed databases in production is complex. This complexity explains why managed database services like MongoDB Atlas command premium pricing—they handle the operational burden of:
- 24/7 monitoring and alerting
- Automated backups and point-in-time recovery
- Performance optimization and query analysis
- Security patches and upgrades
- Multi-region replication and disaster recovery
- Expert support and SLA guarantees
When to self-host vs. use managed services:
Self-hosting makes sense when:
- You have experienced DevOps and database engineers
- You require specific configurations not available in managed services
- Cost optimization is critical at scale (100+ nodes)
- You need on-premises deployment for compliance reasons
- You want complete control over your infrastructure
Managed services make sense when:
- Your team lacks Kubernetes and database operations expertise
- You want to focus on application development, not infrastructure
- You need guaranteed uptime SLAs
- You require enterprise support and consulting
- Your workload doesn't justify a dedicated operations team
The skills you've developed in this guide—Kubernetes orchestration, distributed systems design, and operational excellence—are valuable regardless of your deployment choice. Understanding how these systems work at a fundamental level makes you a better engineer, whether you're managing your own infrastructure or architecting applications on managed platforms.
Remember: the goal isn't to rebuild MongoDB Atlas, but to understand the principles that make distributed databases resilient and scalable. This knowledge will serve you well in designing and operating any distributed system.
Next Steps
To further enhance your deployment:
- Implement Prometheus monitoring with MongoDB exporter
- Deploy Grafana dashboards for visualization
- Set up automated backups with Velero or Kanister
- Configure alerting with AlertManager
- Implement GitOps with ArgoCD or Flux for declarative management
- Explore sharding for extreme scale (10TB+ datasets)
- Test disaster recovery procedures regularly
The journey from a basic deployment to a robust, production-ready system is iterative. Start with this foundation and continuously improve based on your specific requirements and operational experience.