Puru

Posted on Nov 15, 2020

Schedule MongoDB Backup to S3 using Kubernetes CronJob

#kubernetes #mongodb #s3 #cronjob

Introduction

Kubernetes CronJob makes it very easy to run Jobs on a time-based schedule. These automated jobs run like Cron tasks on a Linux or UNIX system.

In this post, we’ll use a Kubernetes CronJob to schedule a recurring backup of a MongoDB database and upload the backup archive to AWS S3. All the source code is available in the GitHub repository.

tuladhar / k8s-backup-mongodb

Schedules MongoDB Backup to S3 using Kubernetes CronJob.

Get Started

Let’s first create a MongoDB user dedicated to performing backups, with the minimum privileges it needs. Connect to the admin database as root:

mongo admin --host <hostname> --authenticationDatabase admin -u root

Run the following command to create the backup user.

db.createUser({
  user: 'backup_user',
  pwd: 'oO9eV5cG6cF2oM1r',
  roles: [{ role: 'backup', db: 'admin' }]
})

Kubernetes Namespace

Create a dedicated namespace in Kubernetes to deploy the cronjob.

kubectl apply -f https://raw.githubusercontent.com/tuladhar/k8s-backup-mongodb/main/kubernetes/namespace.yaml

The output is similar to this:

namespace/backup-mongodb created

Set this namespace as the default for the current context so that all subsequent kubectl commands run in it.

kubectl config set-context --current --namespace=backup-mongodb

Kubernetes Secrets

Kubernetes Secrets allow us to store and manage sensitive information. Storing confidential information in a Secret is safer and more flexible than putting it verbatim in a Pod definition or in a container image.

Store MongoDB URI

export MONGODB_URI='mongodb://backup_user:oO9eV5cG6cF2oM1r@<mongodb-hostname>:27017'

kubectl create secret generic mongodb-uri --from-literal=MONGODB_URI="$MONGODB_URI"
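If the password contains characters such as @, :, or /, MongoDB expects them to be percent-encoded in the connection string. The sketch below is one possible way to build the URI in a POSIX shell. The helper names urlencode and mongodb_uri are hypothetical (they are not part of the repository), and the encoder assumes ASCII input.

```shell
# Percent-encode every character outside the RFC 3986 "unreserved" set.
# Hypothetical helper; assumes ASCII input. Runs in a subshell so that
# LC_ALL and the temporary variables do not leak into the caller.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}        # everything after the first character
    c=${s%"$rest"}     # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Build a MongoDB URI: $1 = user, $2 = password, $3 = host, $4 = port (default 27017).
mongodb_uri() {
  printf 'mongodb://%s:%s@%s:%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "${4:-27017}"
}
```

For example, mongodb_uri backup_user 'p@ss:w/rd' mongodb.default.svc.cluster.local prints mongodb://backup_user:p%40ss%3Aw%2Frd@mongodb.default.svc.cluster.local:27017 (the password and host here are made-up illustrations), and the result can be passed to kubectl create secret as shown above.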

Store AWS credentials and S3 bucket URI

export AWS_ACCESS_KEY_ID=***
export AWS_SECRET_ACCESS_KEY=***
export BUCKET_URI=s3://bucket-name
export AWS_DEFAULT_REGION=us-east-1

Create a single secret named aws that holds all four values. (Running kubectl create secret generic aws once per value fails after the first command, because the secret already exists.)

kubectl create secret generic aws \
  --from-literal=AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  --from-literal=AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  --from-literal=BUCKET_URI="$BUCKET_URI" \
  --from-literal=AWS_DEFAULT_REGION="$AWS_DEFAULT_REGION"
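Before deploying the cronjob, it can be worth confirming that each key actually landed in the secret. Secret values are stored base64-encoded, so a small wrapper like the one below can read one back. This is only a sketch: secret_value is a hypothetical helper name, and it assumes the key contains no dots (which would need escaping in the JSONPath expression).

```shell
# Print the decoded value of one key in a Secret.
# $1 = secret name, $2 = key inside the secret (assumed to contain no dots).
secret_value() {
  kubectl get secret "$1" --output="jsonpath={.data.$2}" | base64 --decode
}
```

For example, secret_value aws BUCKET_URI should print the bucket URI that was exported earlier. An empty result suggests the key is missing from the secret.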

Deploy CronJob

Now we can go ahead and deploy the MongoDB backup cronjob by running the following command:

kubectl apply -f https://raw.githubusercontent.com/tuladhar/k8s-backup-mongodb/main/kubernetes/cronjob.yaml

The output is similar to this:

cronjob.batch/backup-mongodb created

The default schedule is to run every hour. To adjust the schedule, run the following command and modify the schedule property:

kubectl edit cronjob backup-mongodb
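As a non-interactive alternative to kubectl edit, the schedule can also be changed with kubectl patch, which is easier to script. The following is a minimal sketch: schedule_patch and set_backup_schedule are hypothetical helper names, and the schedule value is assumed to be a plain cron expression without quote characters.

```shell
# Build the JSON patch body for a new cron schedule.
# $1 = cron expression, e.g. '0 2 * * *' (assumed to contain no double quotes).
schedule_patch() {
  printf '{"spec":{"schedule":"%s"}}\n' "$1"
}

# Apply the new schedule to the backup-mongodb CronJob in the current namespace.
set_backup_schedule() {
  kubectl patch cronjob backup-mongodb --patch "$(schedule_patch "$1")"
}
```

For example, set_backup_schedule '0 2 * * *' would switch the backup from hourly to once a day at 02:00.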

After creating the cronjob, you can get its status by running the following command:

kubectl get cronjob

The output is similar to this:

NAME             SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE
backup-mongodb   0 */1 * * *   False     0        <none>

As you can see from the results of the command, the cronjob has not scheduled or run any jobs yet. You can list the jobs by running the following command:

kubectl get jobs
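Rather than waiting for the next scheduled run, a one-off job can be created from the cronjob's template with kubectl create job --from. The sketch below wraps that in two hypothetical helpers (manual_job_name and run_backup_now are illustrative names, and the 600-second timeout is an arbitrary assumption):

```shell
# Generate a unique, DNS-compatible job name such as
# backup-mongodb-manual-20201115-103000 (UTC timestamp).
manual_job_name() {
  printf 'backup-mongodb-manual-%s\n' "$(date -u +%Y%m%d-%H%M%S)"
}

# Create a job from the CronJob template and wait for it to complete.
run_backup_now() {
  name=$(manual_job_name)
  kubectl create job "$name" --from=cronjob/backup-mongodb &&
    kubectl wait --for=condition=complete --timeout=600s "job/$name"
}
```

Calling run_backup_now is a quick way to check that the secrets and the MongoDB URI are correct before relying on the schedule.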

To view the Pod logs for a job, run the following command:

pods=$(kubectl get pods --selector=job-name=<job-name> --output=jsonpath='{.items[*].metadata.name}')

kubectl logs $pods
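When the job name is not known in advance, the most recent job can be looked up first and its logs fetched by job reference instead of by pod name. This is a sketch under the assumption that the current namespace contains only backup jobs; latest_job and latest_job_logs are hypothetical helper names.

```shell
# Print the most recently created job as a resource reference
# (for example job.batch/<job-name>), or nothing if no job exists yet.
latest_job() {
  kubectl get jobs --sort-by=.metadata.creationTimestamp --output=name | tail -n 1
}

# Show the logs of the most recent job.
latest_job_logs() {
  job=$(latest_job)
  [ -n "$job" ] || { echo "no jobs found yet" >&2; return 1; }
  kubectl logs "$job"
}
```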

Conclusion

Top comments (1)

Sudo Bhat • Aug 11 '21 • Edited

Hi,
Thanks for this article!
I get CrashLoopBackOff on the cron job pod, with this error:
Failed: bad option: --oplog mode only supported on full dumps
Removing the oplog ENV from the .yaml works.

However, there is still an error saying that it cannot connect to the server. I am guessing it is a problem with MONGO_URI.
I am using mongodb://backup_user:password@default/mongodb-0:27017 as the URI. I tried removing default/ and also using just the service name, which is mongodb. I have MongoDB deployed as a StatefulSet.
Error in the pod: Failed: can't create session: could not connect to server: server selection error: server selection timeout

Do you happen to know the reason? Thanks!

UPDATE: for those who have similar problems, here is the solution:
The MongoDB host inside the cluster can be reached at <service-name>.<namespace>.svc.cluster.local
For example, the host becomes: mongodb.default.svc.cluster.local