Introduction
Distributed systems expertise remains one of the most sought-after skills in software engineering. Engineers who can design, implement, and scale distributed databases command premium compensation for good reason—these systems form the backbone of modern applications serving millions of users.
In this comprehensive guide, we'll build a highly available, horizontally scalable MongoDB cluster using Kubernetes. You'll learn how to create a production-ready database infrastructure that can grow from a single node to hundreds of nodes, scaling seamlessly to meet demanding workloads.
Technology Stack
Our infrastructure leverages three powerful open-source technologies:
- MongoDB: A distributed NoSQL database designed for horizontal scalability and high availability
- MicroK8s: An ultra-lightweight Kubernetes distribution from Canonical (the creators of Ubuntu), optimized for both development and production environments
- OpenEBS: A cloud-native distributed storage solution for Kubernetes that provides persistent volume management
This combination enables true horizontal scalability—you can expand your cluster's capacity by adding more nodes rather than being limited by vertical scaling constraints.
Prerequisites and Initial Setup
Installing MicroK8s
First, set up your MicroK8s cluster. The installation process is straightforward and well-documented:
MicroK8s Getting Started Guide
Follow the official documentation to install MicroK8s on your nodes. Once complete, verify your installation:
microk8s status --wait-ready
Enabling OpenEBS Storage
OpenEBS integration with MicroK8s is remarkably simple, requiring just two commands:
microk8s enable community
microk8s enable openebs
These commands enable the community addon repository and install OpenEBS components into your cluster.
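Before moving on, it can help to confirm that the OpenEBS components actually started. A quick check (with recent MicroK8s addons the pods land in the openebs namespace, but the exact namespace and pod names may vary by version):

# All OpenEBS control-plane and Jiva CSI pods should eventually reach Running
microk8s kubectl get pods -n openebs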
Configuring Distributed Storage
Installing iSCSI on Every Node
OpenEBS relies on iSCSI (Internet Small Computer Systems Interface) for distributed block storage. This protocol enables nodes to access block-level storage over TCP/IP networks, which is essential for our distributed architecture.
Critical: Install iSCSI on every node in your cluster:
sudo apt update
sudo apt install open-iscsi
sudo systemctl enable open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid
Verify that the iSCSI daemon is running:
systemctl status iscsid
You should see output similar to:
● iscsid.service - iSCSI initiator daemon (iscsid)
Loaded: loaded (/lib/systemd/system/iscsid.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2025-11-18 20:05:03 +03; 1 week 1 day ago
TriggeredBy: ● iscsid.socket
Docs: man:iscsid(8)
Main PID: 9887 (iscsid)
Tasks: 2 (limit: 9298)
Memory: 5.4M
CPU: 9.154s
CGroup: /system.slice/iscsid.service
├─9886 /sbin/iscsid
└─9887 /sbin/iscsid
Verifying Storage Classes
Check that OpenEBS storage classes are available in your cluster:
kubectl get storageclass
Expected output includes multiple OpenEBS storage classes:
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
microk8s-hostpath (default) microk8s.io/hostpath Delete WaitForFirstConsumer false 256d
openebs-device openebs.io/local Delete WaitForFirstConsumer false 7d20h
openebs-hostpath openebs.io/local Delete WaitForFirstConsumer false 7d20h
openebs-jiva jiva.csi.openebs.io Delete Immediate true 44m
openebs-jiva-csi-default jiva.csi.openebs.io Delete Immediate true 7d20h
Configuring High Availability with Replica Count
For production deployments, configure the replication factor for your storage. This ensures data redundancy and high availability:
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-jiva
provisioner: jiva.csi.openebs.io
parameters:
  replicaCount: "3"  # Use 3 for production, 2 minimum for HA
  policy: openebs-policy-default
allowVolumeExpansion: true
EOF
Replication guidelines:
- 3 replicas: Recommended for production (tolerates 1 node failure)
- 2 replicas: Minimum for high availability
- Ensure your cluster has at least as many nodes as your replica count (a quick check is shown below)
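The node-count requirement in the last point is easy to verify up front; a minimal check:

# Confirm you have at least as many Ready nodes as the replicaCount set above
kubectl get nodes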
Deploying MongoDB
Creating the Namespace and Service Account
First, create a dedicated namespace for your databases:
kubectl create namespace databases
Now create a service account with appropriate permissions. MongoDB's sidecar container needs to discover other pods in the replica set:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mongo
  namespace: databases
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: read-pod-service-endpoint
rules:
- apiGroups: [""]
  resources: ["pods", "services", "endpoints"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:serviceaccount:databases:mongo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: read-pod-service-endpoint
subjects:
- kind: ServiceAccount
  name: mongo
  namespace: databases
Save the manifest as service-account.yaml, then apply it:
kubectl apply -f service-account.yaml
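To confirm the binding works, you can ask the API server whether the new service account may read the resources the sidecar needs; a quick sketch using kubectl's built-in authorization check (it requires a caller allowed to impersonate service accounts, which cluster admins normally are):

# Both commands should print "yes"
kubectl auth can-i list pods --as=system:serviceaccount:databases:mongo -n databases
kubectl auth can-i get endpoints --as=system:serviceaccount:databases:mongo -n databases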
Creating Credentials Secret
Before deploying MongoDB, create a secret for authentication:
kubectl create secret generic mongo-secret \
--from-literal=mongo-user=admin \
--from-literal=mongo-password='YourSecurePassword123!' \
-n databases
Security note: In production, use a secrets management solution like HashiCorp Vault or sealed-secrets instead of plain Kubernetes secrets.
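If authentication problems show up later, it is worth confirming what was actually stored; the values can be decoded straight from the secret (a small sketch; avoid doing this on shared terminals):

kubectl get secret mongo-secret -n databases -o jsonpath='{.data.mongo-user}' | base64 -d; echo
kubectl get secret mongo-secret -n databases -o jsonpath='{.data.mongo-password}' | base64 -d; echo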
Deploying the StatefulSet
StatefulSets are designed for stateful applications like databases. Unlike Deployments, they provide:
- Stable, unique network identifiers
- Stable, persistent storage
- Ordered, graceful deployment and scaling
Here's the complete MongoDB StatefulSet configuration:
apiVersion: v1
kind: Service
metadata:
  name: mongo
  namespace: databases
  labels:
    name: mongo
spec:
  selector:
    app: mongo
  ports:
  - protocol: TCP
    port: 27017
    targetPort: 27017
  clusterIP: None  # Headless service for StatefulSet
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
  namespace: databases
  labels:
    app: mongo
spec:
  serviceName: "mongo"
  replicas: 3
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
        role: mongo
        environment: production
    spec:
      serviceAccountName: mongo
      automountServiceAccountToken: true
      terminationGracePeriodSeconds: 30
      containers:
      - name: mongodb
        image: mongo:5.0
        command:
        - mongod
        - "--replSet=rs0"
        - "--bind_ip=0.0.0.0"
        ports:
        - name: mongodb
          containerPort: 27017
          protocol: TCP
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "2Gi"
            cpu: "2"
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: mongo-user
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: mongo-password
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db
        livenessProbe:
          exec:
            command:
            - mongosh
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - mongosh
            - --eval
            - "db.adminCommand('ping')"
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
      - name: mongo-sidecar
        image: morphy/k8s-mongo-sidecar
        env:
        - name: KUBERNETES_POD_LABELS
          value: "app=mongo,role=mongo"
        - name: KUBERNETES_SERVICE_NAME
          value: "mongo"
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: MONGODB_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: mongo-user
        - name: MONGODB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-secret
              key: mongo-password
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "openebs-jiva-csi-default"
      resources:
        requests:
          storage: 50Gi
Key configuration details:
- Headless Service (clusterIP: None): Provides stable DNS entries for each pod (mongo-0, mongo-1, mongo-2)
- Volume Claim Templates: Each pod gets its own 50Gi persistent volume
- Health Probes: Liveness and readiness probes ensure pod health
- Sidecar Container: Automatically manages replica set configuration
- Resource Limits: Prevents resource exhaustion and enables proper scheduling
Deploy the StatefulSet:
kubectl apply -f statefulset.yaml
Monitor the deployment:
kubectl get pods -n databases -w
Wait for all pods to reach the Running state.
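Instead of watching manually, you can also block until every replica passes its readiness probe and confirm that OpenEBS provisioned the volumes; a small sketch using the labels and sizes from this guide:

# Wait for all mongo pods to become Ready (up to 5 minutes)
kubectl wait --for=condition=Ready pod -l app=mongo -n databases --timeout=300s
# Each pod should have a bound 50Gi PVC
kubectl get pvc -n databases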
Initializing the Replica Set
Once all pods are running, initialize the MongoDB replica set. Connect to the first pod:
kubectl exec -it mongo-0 -n databases -- mongosh -u admin -p
Enter your password when prompted, then initialize the replica set:
rs.initiate({
_id: "rs0",
version: 1,
members: [
{ _id: 0, host: "mongo-0.mongo.databases.svc.cluster.local:27017" },
{ _id: 1, host: "mongo-1.mongo.databases.svc.cluster.local:27017" },
{ _id: 2, host: "mongo-2.mongo.databases.svc.cluster.local:27017" }
]
})
The DNS names follow the pattern: <pod-name>.<service-name>.<namespace>.svc.cluster.local:27017
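If rs.initiate() reports members as unreachable, the usual culprit is DNS. You can check that the per-pod hostnames resolve from inside a member; a minimal sketch (the official mongo image is Debian/Ubuntu-based, so getent should be present):

# Resolve the stable DNS names of the other members from mongo-0
kubectl exec -n databases mongo-0 -c mongodb -- getent hosts mongo-1.mongo.databases.svc.cluster.local
kubectl exec -n databases mongo-0 -c mongodb -- getent hosts mongo-2.mongo.databases.svc.cluster.local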
Verifying Replica Set Status
Check the replica set configuration:
rs.status()
Look for these key indicators of a healthy cluster:
- "ok": 1 at the end of the output
- One PRIMARY member
- Two SECONDARY members
- All members showing "health": 1
Example output:
{
set: 'rs0',
members: [
{
_id: 0,
name: 'mongo-0.mongo.databases.svc.cluster.local:27017',
health: 1,
state: 1,
stateStr: 'PRIMARY',
...
},
{
_id: 1,
name: 'mongo-1.mongo.databases.svc.cluster.local:27017',
health: 1,
state: 2,
stateStr: 'SECONDARY',
...
},
...
],
ok: 1
}
Testing and Validation
Testing Data Persistence
Let's verify that data persists and replicates correctly across the cluster. Insert a test document (writes always go through the primary, so this example assumes mongo-1 currently holds that role):
kubectl exec -it -n databases mongo-1 -- mongosh -u admin -p --eval '
db = db.getSiblingDB("testdb");
db.testcollection.insertOne({
name: "test-record",
timestamp: new Date(),
message: "Testing OpenEBS Jiva storage",
node: "mongo-1"
});
db.testcollection.find().pretty();
'
Expected output:
[
{
_id: ObjectId('6926f6c7d79787542f544ca7'),
name: 'test-record',
timestamp: ISODate('2025-11-26T12:47:03.263Z'),
message: 'Testing OpenEBS Jiva storage',
node: 'mongo-1'
}
]
Now verify the data is replicated by querying a different pod:
kubectl exec -it -n databases mongo-0 -- mongosh -u admin -p --eval '
db = db.getSiblingDB("testdb");
db.testcollection.find().pretty();
'
You should see the same document, confirming replication is working; if you get an error instead of the document, see the note on read preference below.
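Note that mongosh routes reads to the primary by default, so querying a member that happens to be a secondary returns a NotPrimaryNoSecondaryOk error rather than your document. To read from a secondary explicitly, set a read preference first; a hedged sketch:

kubectl exec -it -n databases mongo-0 -- mongosh -u admin -p --eval '
  // Allow reads from secondary members on this connection
  db.getMongo().setReadPref("secondaryPreferred");
  db.getSiblingDB("testdb").testcollection.find().pretty();
'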
Testing High Availability
The true test of high availability is failover. Let's simulate a node failure by deleting the primary pod.
First, identify the primary:
kubectl exec -n databases mongo-0 -- mongosh -u admin -p --eval "rs.isMaster().primary"
Output example:
mongo-1.mongo.databases.svc.cluster.local:27017
Now delete the primary pod:
kubectl delete pod mongo-1 -n databases
The replica set should automatically elect a new primary. Check the new primary:
kubectl exec -n databases mongo-0 -- mongosh -u admin -p --eval "rs.isMaster().primary"
Output:
mongo-0.mongo.databases.svc.cluster.local:27017
What just happened?
- The primary pod was deleted
- The remaining members detected the failure within seconds
- An automatic election occurred
- A new primary was elected
- Kubernetes recreated the deleted pod
- The recreated pod rejoined as a secondary
This demonstrates true high availability—your application experiences minimal disruption during node failures.
Testing Read Operations During Failover
For a more realistic test, run continuous read operations while deleting a pod:
# In terminal 1, fetch the password from the secret and start continuous reads
MONGO_PASS=$(kubectl get secret mongo-secret -n databases -o jsonpath='{.data.mongo-password}' | base64 -d)
while true; do
  kubectl exec -n databases mongo-0 -- mongosh -u admin -p "$MONGO_PASS" --quiet --eval '
    db.getSiblingDB("testdb").testcollection.findOne()
  ' 2>/dev/null && echo "✓ Read successful" || echo "✗ Read failed"
  sleep 1
done
# In terminal 2, delete the primary
kubectl delete pod mongo-1 -n databases
You'll notice only a brief interruption (typically 5-10 seconds) during the election process.
Scaling Your Cluster
Horizontal Scaling
To scale your MongoDB cluster, simply increase the replica count:
kubectl scale statefulset mongo --replicas=5 -n databases
After the new pods are running, add them to the replica set:
kubectl exec -it mongo-0 -n databases -- mongosh -u admin -p
rs.add("mongo-3.mongo.databases.svc.cluster.local:27017")
rs.add("mongo-4.mongo.databases.svc.cluster.local:27017")
Vertical Scaling
To increase resources for existing pods, update the StatefulSet:
resources:
  requests:
    memory: "2Gi"
    cpu: "2"
  limits:
    memory: "4Gi"
    cpu: "4"
Apply the changes and perform a rolling update:
kubectl apply -f statefulset.yaml
kubectl rollout status statefulset/mongo -n databases
Storage Expansion
OpenEBS Jiva supports volume expansion. To increase storage for an existing pod:
kubectl patch pvc mongo-persistent-storage-mongo-0 -n databases -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
Note: Not all storage classes support volume expansion. Verify with kubectl get storageclass and check the ALLOWVOLUMEEXPANSION column.
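After patching, the resize is handled by the storage layer first and the filesystem usually grows once the pod is restarted; the PVC's status shows how far along it is. A quick check, assuming the PVC name used in the patch above:

# CAPACITY and any conditions such as FileSystemResizePending indicate progress
kubectl get pvc mongo-persistent-storage-mongo-0 -n databases
kubectl describe pvc mongo-persistent-storage-mongo-0 -n databases | grep -A 5 Conditions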
Monitoring and Maintenance
Essential Monitoring Metrics
Deploy monitoring for these critical metrics:
- Replica Set Health
kubectl exec -n databases mongo-0 -- mongosh -u admin -p --eval "rs.status()" | grep -A 3 "stateStr"
- Pod Status
kubectl get pods -n databases -o wide
- Storage Usage
kubectl exec -n databases mongo-0 -- df -h /data/db
- Resource Consumption
kubectl top pods -n databases
Backup Strategy
Implement regular backups using mongodump:
kubectl exec -n databases mongo-0 -- mongodump \
  --username=admin \
  --password='YourSecurePassword123!' \
  --authenticationDatabase=admin \
  --out=/tmp/backup-$(date +%Y%m%d)
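Keep in mind that /tmp inside the container is ephemeral, so copy dumps off the pod (for example with kubectl cp) or write them to durable storage. One way to schedule this is a Kubernetes CronJob; a minimal sketch, assuming the mongo-secret created earlier and a hypothetical pre-provisioned PVC named mongo-backup for the output:

cat <<'EOF' | kubectl apply -f -
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongo-backup
  namespace: databases
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: mongodump
            image: mongo:5.0
            command: ["/bin/sh", "-c"]
            args:
            - >
              mongodump --host=mongo-0.mongo.databases.svc.cluster.local
              --username="$MONGO_USER" --password="$MONGO_PASS"
              --authenticationDatabase=admin --out=/backup/backup-`date +%F`
            env:
            - name: MONGO_USER
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: mongo-user
            - name: MONGO_PASS
              valueFrom:
                secretKeyRef:
                  name: mongo-secret
                  key: mongo-password
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: mongo-backup  # hypothetical PVC for storing dumps
EOF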
For production environments, consider using:
- Velero: Kubernetes-native backup solution
- MongoDB Ops Manager: MongoDB's enterprise backup solution
- Kanister: Application-level data management platform
Production Considerations
Security Hardening
- Enable TLS/SSL: Encrypt data in transit by adding TLS flags to the mongod command:
  - "--tlsMode=requireTLS"
  - "--tlsCertificateKeyFile=/etc/mongodb/certs/mongodb.pem"
- Network Policies: Restrict pod-to-pod communication:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mongo-netpol
  namespace: databases
spec:
  podSelector:
    matchLabels:
      app: mongo
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: application
    ports:
    - protocol: TCP
      port: 27017
- Pod Security Standards: Apply baseline security policies
- Secrets Management: Use external secrets management (Vault, AWS Secrets Manager)
Performance Optimization
- Anti-Affinity Rules: Distribute pods across nodes by adding this to the pod template spec:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - mongo
      topologyKey: kubernetes.io/hostname
- Resource Tuning: Adjust based on workload patterns
- WiredTiger Cache: Configure based on available memory (see the sketch after this list)
- Connection Pooling: Optimize application connection settings
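For the WiredTiger cache specifically, the size can be pinned with a mongod flag so it stays in line with the container's memory limit; a small sketch of how the StatefulSet command from earlier might be extended (the 1 GB value is only an example, common guidance is roughly half of the memory limit minus 1 GB):

        command:
        - mongod
        - "--replSet=rs0"
        - "--bind_ip=0.0.0.0"
        # Cap the WiredTiger cache rather than letting it size itself from node memory
        - "--wiredTigerCacheSizeGB=1"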
Disaster Recovery
- Multi-Region Deployment: Deploy across availability zones
- Regular Backup Testing: Verify backup integrity and restoration procedures
- Runbook Documentation: Document recovery procedures
- Automated Failover Testing: Regularly test failover mechanisms
Conclusion
You've successfully built a production-grade, horizontally scalable MongoDB cluster on Kubernetes. This setup provides:
- High Availability: Automatic failover with minimal downtime
- Horizontal Scalability: Scale from 3 to hundreds of nodes
- Data Durability: Replicated storage with OpenEBS
- Operational Flexibility: Kubernetes-native management
While this guide makes the deployment appear straightforward, the reality of managing distributed databases in production is complex. This complexity explains why managed database services like MongoDB Atlas command premium pricing—they handle the operational burden of:
- 24/7 monitoring and alerting
- Automated backups and point-in-time recovery
- Performance optimization and query analysis
- Security patches and upgrades
- Multi-region replication and disaster recovery
- Expert support and SLA guarantees
When to self-host vs. use managed services:
Self-hosting makes sense when:
- You have experienced DevOps and database engineers
- You require specific configurations not available in managed services
- Cost optimization is critical at scale (100+ nodes)
- You need on-premises deployment for compliance reasons
- You want complete control over your infrastructure
Managed services make sense when:
- Your team lacks Kubernetes and database operations expertise
- You want to focus on application development, not infrastructure
- You need guaranteed uptime SLAs
- You require enterprise support and consulting
- Your workload doesn't justify a dedicated operations team
The skills you've developed in this guide—Kubernetes orchestration, distributed systems design, and operational excellence—are valuable regardless of your deployment choice. Understanding how these systems work at a fundamental level makes you a better engineer, whether you're managing your own infrastructure or architecting applications on managed platforms.
Remember: the goal isn't to rebuild MongoDB Atlas, but to understand the principles that make distributed databases resilient and scalable. This knowledge will serve you well in designing and operating any distributed system.
Next Steps
To further enhance your deployment:
- Implement Prometheus monitoring with MongoDB exporter
- Deploy Grafana dashboards for visualization
- Set up automated backups with Velero or Kanister
- Configure alerting with AlertManager
- Implement GitOps with ArgoCD or Flux for declarative management
- Explore sharding for extreme scale (10TB+ datasets)
- Test disaster recovery procedures regularly
The journey from a basic deployment to a robust, production-ready system is iterative. Start with this foundation and continuously improve based on your specific requirements and operational experience.