Kubernetes Cronjob Best Practices

#kubernetes #devops

This post documents the best practices I have picked up over the last few months of using Kubernetes CronJobs for scheduled workloads.

Like most developers, I wrote my first Kubernetes CronJob YAML files by following the example in the official guide, which leaves many useful parameters unspecified. It took me quite some time, and some unexpected execution behavior, to realize the importance of specifying these parameters consciously instead of relying blindly on default values.

Here are the parameters that I think we should always define in a Kubernetes CronJob specification (the emphasis is on always defining these parameters, not necessarily on the values shown below):

concurrencyPolicy
- Set concurrencyPolicy: Forbid if you want Kubernetes to skip a new cron execution while a previous one is still running.
suspend
- Set suspend: true if you want to temporarily disable cronjob executions while you troubleshoot an issue, without having to delete the cronjob itself.
successfulJobsHistoryLimit
- Set successfulJobsHistoryLimit: 2 to retain the history of the last 2 successful cronjob executions.
failedJobsHistoryLimit
- Set failedJobsHistoryLimit: 3 to retain the history of the last 3 failed cronjob executions. The default value is 1, which I think may not be enough to troubleshoot failures.
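
As a rough sketch, here is how the four parameters above could look in a manifest. The name, schedule, image and command are placeholders I made up for illustration, and the values are examples rather than recommendations.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-report              # hypothetical name
spec:
  schedule: "*/5 * * * *"           # hypothetical schedule: every 5 minutes
  concurrencyPolicy: Forbid         # skip a new run while the previous one is still running
  suspend: false                    # flip to true to pause scheduling without deleting the cronjob
  successfulJobsHistoryLimit: 2     # keep the last 2 successful jobs
  failedJobsHistoryLimit: 3         # keep the last 3 failed jobs
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report                     # hypothetical container
              image: busybox:1.36              # placeholder image
              command: ["sh", "-c", "date"]    # placeholder workload
```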

The following parameters belong to the Kubernetes Job, which is the unit of execution of a CronJob, but I think they are equally important when configuring a cronjob:

parallelism
- Even though this defaults to 1, set it explicitly so that your configuration is still easy to understand weeks or months later.
- Set parallelism: 1 so that at most 1 pod of the job runs at any point in time.
completions
- Set completions: 1 to mark the job as successful once 1 pod completes successfully.
- This makes more sense with higher values. Say you have 3 pods running in parallel (parallelism: 3) and want to mark the job as successful only when all 3 pods complete successfully. In that case, set completions: 3.
restartPolicy
- Set restartPolicy: OnFailure at the pod specification level to let Kubernetes restart the pod's container when your workload process exits with a failure.
- Use the backoffLimit parameter along with this to guard against back-to-back restarts during consistent failures.
backoffLimit
- Jobs can fail for a variety of reasons, such as a process failing within the pod or a failure at the Kubernetes controller layer, and such failures can cause automatic retries of the job execution.
- Set backoffLimit: 3 to ensure that the job is retried at most 3 times (the default is 6), after which the job (i.e. the cronjob execution in this article) is marked as failed and stops.
activeDeadlineSeconds
- If your workload process could take longer than expected (for example, when interacting with an external service it depends on), set a time limit for the cron execution so that Kubernetes kills it once the limit is exceeded.
- Set activeDeadlineSeconds: 60 (example value) to let Kubernetes kill the job execution if it takes longer than 60 seconds.
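
To show where the Job-level parameters go, here is a sketch of the jobTemplate portion, which sits under spec in a CronJob manifest. The container name, image and command are placeholders I made up for illustration, and the numbers are the example values from the list above.

```yaml
jobTemplate:
  spec:
    parallelism: 1               # at most 1 pod of the job runs at a time
    completions: 1               # job succeeds once 1 pod completes successfully
    backoffLimit: 3              # give up after 3 retries and mark the job as failed
    activeDeadlineSeconds: 60    # example value: kill the job if it runs longer than 60 seconds
    template:
      spec:
        restartPolicy: OnFailure # restart the container when the process exits with a failure
        containers:
          - name: worker                        # hypothetical container
            image: busybox:1.36                 # placeholder image
            command: ["sh", "-c", "sleep 10"]   # placeholder workload
```

Note that restartPolicy lives one level deeper than the other fields, in the pod spec rather than the Job spec, which is easy to get wrong when indenting by hand.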

I hope this helps someone. Feel free to share your related experiences.
