This post documents the best practices I have picked up over the last few months of using Kubernetes CronJobs for scheduled workloads.
Like most developers, I wrote my first Kubernetes CronJob YAML files by following the example in the official guide, which leaves many useful parameters unspecified. It took me quite some time, and some unexpected CronJob execution behavior, to realize the importance of consciously specifying all parameters rather than relying blindly on default values.
Here are the parameters that I think we should always define in a Kubernetes CronJob specification (the emphasis is on always defining these parameters, not necessarily on the values I am showing below):
- concurrencyPolicy
  - Set concurrencyPolicy: Forbid if you want to prevent a new cron execution from starting while a previous one is still running.
- suspend
  - Set suspend: true if you want to temporarily disable cronjob executions to troubleshoot an issue without having to delete the cronjob itself.
- successfulJobsHistoryLimit
  - Set successfulJobsHistoryLimit: 2 to retain the history of the last 2 successful cronjob executions.
- failedJobsHistoryLimit
  - Set failedJobsHistoryLimit: 3 to retain the history of the last 3 failed cronjob executions. The default value is 1, which I think may not be enough to troubleshoot failures.
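To make the four CronJob-level parameters above concrete, here is a minimal sketch of a manifest that sets all of them explicitly. The name, schedule, image, and command are placeholders I made up for illustration, and the values are examples rather than recommendations.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-report            # hypothetical name
spec:
  schedule: "*/5 * * * *"         # example schedule: every 5 minutes
  concurrencyPolicy: Forbid       # skip a new run while the previous one is still running
  suspend: false                  # flip to true to pause scheduling without deleting the CronJob
  successfulJobsHistoryLimit: 2   # keep the last 2 successful Jobs
  failedJobsHistoryLimit: 3       # keep the last 3 failed Jobs (default is 1)
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: report
              image: busybox:1.36   # placeholder image
              command: ["sh", "-c", "date; echo running scheduled task"]
```

Writing suspend: false explicitly may look redundant, but it leaves an obvious switch in the file for the day you need to pause the schedule.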
The following parameters belong to the Kubernetes Job, which is the unit of execution of a CronJob, but I think they are equally important when configuring a cronjob:
- parallelism
  - Even though this defaults to 1, set it explicitly to help yourself understand your configuration weeks or months later.
  - Set parallelism: 1 so that at most 1 pod of the job runs at any point in time.
- completions
  - Set completions: 1 to mark the job as successful once 1 pod has completed successfully.
  - This makes more sense when, say, you have 3 pods running in parallel and want to mark the job as successful only when all 3 pods successfully complete execution (completions: 3).
- restartPolicy
  - Set restartPolicy: OnFailure at the pod specification level to let Kubernetes restart the pod's container when your workload process exits with a failure.
  - Use the backoffLimit parameter along with this to protect against back-to-back restarts during consistent failures.
- backoffLimit
  - Jobs can fail for a variety of reasons, such as a process failing within a pod or a failure at the Kubernetes controller layer, and these failures can cause automatic retries of job executions.
  - Set backoffLimit: 3 to cap retries at 3 (the default is 6), after which the job (i.e. the cronjob execution in this article) is marked as failed and is not retried further.
- activeDeadlineSeconds
  - If your workload process could take longer than expected (for example, when interacting with an external dependent service), you should set a time limit for the cron execution, after which Kubernetes kills it.
  - Set activeDeadlineSeconds: 60 (example value) to let Kubernetes kill the job execution if it takes longer than 60 seconds.
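As a rough sketch of where the Job-level parameters above live inside a CronJob manifest, consider the example below. The name, schedule, image, and command are again placeholders of mine, and the values mirror the example values used in this post.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-sync              # hypothetical name
spec:
  schedule: "0 * * * *"           # example schedule: at the start of every hour
  jobTemplate:
    spec:                         # Job spec: the Job-level parameters go here
      parallelism: 1              # at most 1 pod running at a time
      completions: 1              # job succeeds after 1 successful pod
      backoffLimit: 3             # give up after 3 retries
      activeDeadlineSeconds: 60   # example value: kill the job after 60 seconds
      template:
        spec:                     # Pod spec: restartPolicy goes here, not in the Job spec
          restartPolicy: OnFailure
          containers:
            - name: sync
              image: busybox:1.36   # placeholder image
              command: ["sh", "-c", "echo syncing; sleep 5"]
```

Note the two nesting levels: parallelism, completions, backoffLimit, and activeDeadlineSeconds sit under spec.jobTemplate.spec, while restartPolicy sits one level deeper in the pod template's spec.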
I hope this helps someone. Feel free to share your related experiences.