Puru

Schedule MongoDB Backup to S3 using Kubernetes CronJob

Introduction

Kubernetes CronJob makes it very easy to run Jobs on a time-based schedule. These automated jobs run like Cron tasks on a Linux or UNIX system.

In this post, we’ll use a Kubernetes CronJob to schedule a recurring backup of a MongoDB database and upload the backup archive to AWS S3. All the source code is available in the GitHub repository.

GitHub repository: tuladhar/k8s-backup-mongodb

Get Started

Let’s first create a MongoDB user dedicated to performing backups, with only the privileges it needs.

Log in to the MongoDB shell as the root user.

mongo admin --host <hostname> --authenticationDatabase admin -u root

Run the following command to create the backup user.

db.createUser({
  user: 'backup_user',
  pwd: 'oO9eV5cG6cF2oM1r',
  roles: [{ role: 'backup', db: 'admin' }]
})

Kubernetes Namespace

Create a dedicated namespace in Kubernetes to deploy the cronjob.

kubectl apply -f https://raw.githubusercontent.com/tuladhar/k8s-backup-mongodb/main/kubernetes/namespace.yaml

The output is similar to this:

namespace/backup-mongodb created

Let’s make this namespace the default for the current context, so that all subsequent kubectl commands run in it.

kubectl config set-context --current --namespace=backup-mongodb

Kubernetes Secrets

Kubernetes Secrets let us store and manage sensitive information. Storing confidential information in a Secret is safer and more flexible than putting it verbatim in a Pod definition or in a container image.

Store MongoDB URI

export MONGODB_URI='mongodb://backup_user:oO9eV5cG6cF2oM1r@<mongodb-hostname>:27017'

kubectl create secret generic mongodb-uri --from-literal=MONGODB_URI="$MONGODB_URI"
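
If the password contains characters such as @, : or /, it has to be percent-encoded before it goes into the connection string. Inside the cluster, the host is normally the Service DNS name, of the form service-name.namespace.svc.cluster.local. The sketch below shows one way to assemble the URI in POSIX shell. The helper names urlencode and mongodb_uri, the sample password, and the Service name mongodb in the default namespace are assumptions for illustration and are not part of the repository.

```shell
# Percent-encode every character outside the URI "unreserved" set.
# Assumes ASCII input; multi-byte characters are not handled by this sketch.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [a-zA-Z0-9.~_-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Build mongodb://user:password@host:port (port defaults to 27017).
# Usage: mongodb_uri <user> <password> <host> [port]
mongodb_uri() {
  printf 'mongodb://%s:%s@%s:%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3" "${4:-27017}"
}

# Hypothetical values: a Service named "mongodb" in the "default" namespace.
MONGODB_URI=$(mongodb_uri backup_user 'p@ss:w/rd' mongodb.default.svc.cluster.local)
export MONGODB_URI
echo "$MONGODB_URI"
# prints mongodb://backup_user:p%40ss%3Aw%2Frd@mongodb.default.svc.cluster.local:27017
```

The exported value can then be passed to kubectl create secret as before.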

Store AWS credentials and S3 bucket URI

export AWS_ACCESS_KEY_ID=***
export AWS_SECRET_ACCESS_KEY=***
export BUCKET_URI=s3://bucket-name
export AWS_DEFAULT_REGION=us-east-1

# Create a single Secret named "aws" that holds all four keys.
# Running "kubectl create secret generic aws" once per key fails after the
# first call, because the Secret already exists.
kubectl create secret generic aws \
  --from-literal=AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  --from-literal=AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  --from-literal=BUCKET_URI="$BUCKET_URI" \
  --from-literal=AWS_DEFAULT_REGION="$AWS_DEFAULT_REGION"
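
To confirm that a Secret holds what you expect, you can read a key back. Secret values are stored base64-encoded, so they have to be decoded first. This is a minimal sketch; the helper name secret_value is an assumption for illustration.

```shell
# Print the decoded value of one key in a Secret.
# Usage: secret_value <secret-name> <key>
secret_value() {
  kubectl get secret "$1" --output=jsonpath="{.data.$2}" | base64 --decode
}

# Example (requires access to the cluster):
#   secret_value aws BUCKET_URI
```

Avoid running this in a shared terminal or CI log for the credential keys, since it prints the secret in plain text.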

Deploy CronJob

Now we can go ahead and deploy the MongoDB backup cronjob by running the following command:

kubectl apply -f https://raw.githubusercontent.com/tuladhar/k8s-backup-mongodb/main/kubernetes/cronjob.yaml

The output is similar to this:

cronjob.batch/backup-mongodb created

The default schedule is to run every hour. To adjust the schedule, run the following command and modify the schedule property:

kubectl edit cronjob backup-mongodb

Fig: Adjust the schedule
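
If you would rather not open an editor, for example in a script, the schedule can also be changed with kubectl patch. The sketch below assumes the CronJob is named backup-mongodb, as in the output above; the helper name set_backup_schedule and the five-field check are illustrative additions.

```shell
# Change the CronJob schedule without opening an editor.
# Usage: set_backup_schedule '<cron expression>'
set_backup_schedule() {
  schedule=$1
  # Count whitespace-separated fields with globbing off so "*" stays literal.
  # Macros such as @hourly are rejected by this simple check.
  fields=$(set -f; set -- $schedule; echo "$#")
  if [ "$fields" -ne 5 ]; then
    echo "expected a 5-field cron expression, got: $schedule" >&2
    return 1
  fi
  kubectl patch cronjob backup-mongodb --type=merge \
    --patch "{\"spec\":{\"schedule\":\"$schedule\"}}"
}

# Example (requires access to the cluster): run every six hours.
#   set_backup_schedule '0 */6 * * *'
```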

After creating the cronjob, you can get its status by running the following command:

kubectl get cronjob

The output is similar to this:

NAME             SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE
backup-mongodb   0 */1 * * *   False     0        <none>

As the output shows, the cronjob has not yet scheduled or run any jobs. Once it has, you can list the jobs by running the following command:

kubectl get jobs
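
Instead of waiting for the next scheduled run, you can create a one-off Job from the CronJob's template to test the backup straight away. A minimal sketch, assuming the CronJob is named backup-mongodb; the helper name run_backup_now and the generated Job name are hypothetical.

```shell
# Create a one-off Job from the CronJob's template.
# Usage: run_backup_now [job-name]
run_backup_now() {
  # Default to a unique name based on the current Unix timestamp.
  job=${1:-backup-mongodb-manual-$(date +%s)}
  kubectl create job "$job" --from=cronjob/backup-mongodb
}

# Example (requires access to the cluster):
#   run_backup_now
```

The new Job then shows up in the output of kubectl get jobs alongside the scheduled ones.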

To view the Pod logs for a job, run the following command:

pods=$(kubectl get pods --selector=job-name=<job-name> --output=jsonpath='{.items[*].metadata.name}')

kubectl logs $pods
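
A Job that retries after a failure can own more than one Pod, while kubectl logs expects a single Pod name. The sketch below loops over the Pods of a Job and prints the logs of each one; job_logs is a hypothetical helper name.

```shell
# Print the logs of every Pod that belongs to a Job.
# Usage: job_logs <job-name>
job_logs() {
  job=$1
  pods=$(kubectl get pods --selector=job-name="$job" \
    --output=jsonpath='{.items[*].metadata.name}')
  # $pods is intentionally unquoted so it splits into one name per Pod.
  for pod in $pods; do
    echo "==> $pod"
    kubectl logs "$pod"
  done
}

# Example (requires access to the cluster):
#   job_logs <job-name>
```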

Conclusion

Top comments (1)

Sudo Bhat • Edited

Hi,
Thanks for this article!
I get a CrashLoopBackOff on the cron job pod, with this error:
Failed: bad option: --oplog mode only supported on full dumps
Removing the oplog ENV from the .yaml works.

However, there is still an error saying it cannot connect to the server. I am guessing it is a problem with MONGO_URI.
I am using mongodb://backup_user:password@default/mongodb-0:27017 as the URI. I tried removing default/ and also using just the service name, which is mongodb. I have mongo deployed as a StatefulSet.
Error in the pod: Failed: can't create session: could not connect to server: server selection error: server selection timeout

Do you happen to know the reason? Thanks!

UPDATE: For those who have a similar problem, here is the solution.
The MongoDB host inside the cluster can be reached at service-name.namespace.svc.cluster.local
For example, the host becomes: mongodb.default.svc.cluster.local