
MongoDB Backups with Kubernetes Jobs

Joseph D. Marhee ・ 3 min read

I recently began running RocketChat on Kubernetes; a key component of this deployment was a MongoDB replica set. Best practice for running MongoDB reliably includes both replication and regular backups, and Kubernetes provides accessible interfaces for both.

In my case, I wanted both regular and one-off backup capability, and the Kubernetes Job resource provided a quick way to do this. I wanted my Job pod to write these dump files out to a persistent data store, so I first set up a PersistentVolume and an accompanying claim to that volume:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: mongo-dump-pv-volume
  labels:
    type: local
    app: mongo-dump
spec:
  storageClassName: manual
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/mnt/kube-data/mongo-dumps"
    type: DirectoryOrCreate
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mongo-dump-pv-claim
  labels:
    app: mongo-dump
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 50Gi

If you use a cloud provider like AWS or Azure, a PersistentVolume can provision block storage, so if your provider makes durability guarantees, your data volume is that much more durable. The example above uses a hostPath volume, so the data will persist only for the lifecycle of the host on which that path resides.

The Job itself has a pod spec, much like other Kubernetes resources such as Deployments, where you make requests for resources, and it'll look like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: mongodb-backup
  labels:
    app: mongo-dump
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: mongodump
        image: mongo
        command: ["mongodump","--host","mongo-service:27017","--db","your_db","--out","/dump"]
        volumeMounts:
          - mountPath: /dump
            name: mongo-dumps
      volumes:
      - name: mongo-dumps
        persistentVolumeClaim:
          claimName: mongo-dump-pv-claim
      restartPolicy: OnFailure

You'll see we attach the volume as we normally would, and then run the mongodump command, which writes its dump files out under the mounted volume.

If you have authentication enabled on your Mongo service, or use a SaaS offering like MongoDB Atlas, you can use Secrets, as you normally would, to pass credentials through securely rather than storing them in the Job spec itself. Note that Kubernetes expands $(VAR) references in an exec-form command using the container's environment; a shell-style $VAR is not expanded, because no shell runs the command:

apiVersion: batch/v1
kind: Job
metadata:
  name: mongodb-backup
  labels:
    app: mongo-dump
spec:
  backoffLimit: 5
  activeDeadlineSeconds: 100
  template:
    spec:
      containers:
      - name: mongodump
        image: mongo
        env:
          - name: MONGO_CONN_STRING
            valueFrom:
              secretKeyRef:
                name: mongo-auth
                key: connstring
          - name: MONGO_DB
            value: "my_db"
        command: ["mongodump","--host","$(MONGO_CONN_STRING)","--db","$(MONGO_DB)","--out","/dump"]
        volumeMounts:
          - mountPath: /dump
            name: mongo-dumps
      volumes:
      - name: mongo-dumps
        persistentVolumeClaim:
          claimName: mongo-dump-pv-claim
      restartPolicy: Never
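
The mongo-auth Secret referenced above can be created out of band; a sketch with a placeholder value (substitute your real host or connection string):

```shell
# "connstring" must match the key referenced by secretKeyRef above.
# The value here is a placeholder; substitute your real host (or, for
# MongoDB Atlas, a full connection string passed to mongodump's --uri flag
# instead of --host).
kubectl create secret generic mongo-auth \
  --from-literal=connstring=mongo-service:27017
```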

After you apply this Job spec, you can monitor your progress:

kubectl get pods -l app=mongo-dump

then monitor the logs for that pod name:

kubectl logs $POD_NAME
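
Rather than polling the logs, you can also block until the Job finishes; a sketch assuming the Job name from the spec above:

```shell
# Returns once the Job reports the Complete condition, or errors after the
# timeout (activeDeadlineSeconds in the spec will also terminate a hung Job):
kubectl wait --for=condition=complete --timeout=120s job/mongodb-backup
```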

If you see your Job failing, the restartPolicy defines behavior in this area; in the declaration above, for example, it will not restart, but you can use OnFailure to attempt a retry, and fields like backoffLimit and activeDeadlineSeconds to control the number of retries, backoff timing, etc.

CronJobs are another type of Job supported in Kubernetes; the spec is similar, but includes your typical cron syntax for defining when you'd like the job run, and related behavior.
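
A minimal sketch of a daily version of the backup Job above; the schedule is an example, and batch/v1beta1 is assumed (newer clusters use batch/v1 for CronJobs):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: mongodb-backup
  labels:
    app: mongo-dump
spec:
  schedule: "0 2 * * *"   # every day at 02:00
  jobTemplate:
    spec:
      backoffLimit: 5
      template:
        spec:
          containers:
          - name: mongodump
            image: mongo
            command: ["mongodump","--host","mongo-service:27017","--db","your_db","--out","/dump"]
            volumeMounts:
              - mountPath: /dump
                name: mongo-dumps
          volumes:
          - name: mongo-dumps
            persistentVolumeClaim:
              claimName: mongo-dump-pv-claim
          restartPolicy: OnFailure
```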
