quietpulse

Posted on Mar 30

How to monitor cron jobs (and stop silent failures)

#automation #devops #linux #monitoring

Cron jobs feel simple until they fail silently. This guide explains why that happens and how heartbeat monitoring helps you catch missed runs before they turn into real incidents.

The Problem

Cron jobs do not tell you when they fail.

That is the core issue.

Unlike a web server or API, where failures show up as errors for whoever is calling it, a cron job has no caller waiting on it: no built-in alerting, no dashboard, and no visibility by default. If a job crashes, times out, or never runs, you often will not know unless:

You manually check logs
A user reports that something is broken
Data starts looking wrong

Real-World Example

You have a nightly backup job:

0 2 * * * /usr/local/bin/backup.sh

One day, the script starts failing because of a permission issue. Cron keeps triggering it, but nothing actually works.

Two weeks later, you need a backup.

There is not one.

Why It Happens

Cron is intentionally simple.

It just schedules commands. That is it.

It does not:

Track execution success
Retry failed jobs
Notify you on failure
Verify that the job actually completed

Even worse, failures can happen in subtle ways:

The script exits early with no useful error output
Dependencies change, such as an API, database, or file system
Network issues break external calls
Environment variables differ from your interactive shell

Cron will still run the command on schedule, but success is your responsibility.
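To check whether a job depends on your interactive environment, you can run it by hand with an environment about as bare as cron's. This is a minimal sketch. It assumes your cron starts jobs with a minimal environment and a PATH of roughly /usr/bin:/bin, which is a common default; check your cron implementation's documentation. run_like_cron is a hypothetical helper name:

```shell
# Hypothetical helper: run a command in an environment close to cron's.
# `env -i` starts from an empty environment, then only the variables listed
# here are added back. PATH=/usr/bin:/bin is a common cron default (assumption:
# adjust it to match your system).
run_like_cron() {
  env -i HOME="$HOME" SHELL=/bin/sh PATH=/usr/bin:/bin "$@"
}
```

For example, run `run_like_cron /usr/local/bin/backup.sh`. If the script works in your terminal but fails this way, it relies on something your login shell sets up, such as an extra PATH entry or an exported credential.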

Why It Is Dangerous

Silent failures are the worst kind of failures.

Here is what can go wrong:

1. Data Loss

Backups fail quietly. You do not notice until it is too late.

2. Broken Pipelines

ETL jobs stop syncing data, which leads to stale dashboards and bad decisions.

3. Missed Business Logic

Emails, billing tasks, or cleanup scripts stop running.

4. Harder Debugging

You do not know when things broke, only that they are broken now.

The longer a cron job runs without monitoring, the higher the risk.

How to Detect It

To monitor cron jobs effectively, you need external confirmation that they ran successfully.

This is where the idea of a heartbeat comes in.

Heartbeat Monitoring

A heartbeat is a simple signal sent by your cron job when it completes.

If the signal:

Arrives on time -> the job is healthy
Is missing or late -> something is wrong

Instead of checking logs manually, you flip the model:

Tell me when something does not happen.

This is much more reliable: the missing signal itself triggers the alert, whether the job crashed, hung, or never started.
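On the monitoring side, "tell me when something does not happen" comes down to a deadline check: last heartbeat plus expected interval plus a grace period. This is a minimal sketch in shell. It assumes the monitor stores the Unix timestamp of the last heartbeat it received. heartbeat_status and its arguments are hypothetical:

```shell
# Hypothetical monitor-side check: is a heartbeat overdue?
# All arguments are in seconds:
#   $1  Unix time of the last heartbeat received
#   $2  current Unix time
#   $3  expected interval between runs (86400 for a daily job)
#   $4  grace period added on top of the interval
# Prints "healthy" or "late"; returns 1 when late so callers can alert.
heartbeat_status() {
  deadline=$(( $1 + $3 + $4 ))
  if [ "$2" -le "$deadline" ]; then
    echo healthy
  else
    echo late
    return 1
  fi
}
```

Consider a daily job with a 15-minute grace period. A monitor could run `heartbeat_status "$last_seen" "$(date +%s)" 86400 900` on a timer and alert when it returns non-zero, where `last_seen` is wherever it stored the last heartbeat time. The job itself plays no part in this check: the alert depends only on the heartbeat being absent.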

Simple Solution

The simplest way to implement heartbeat monitoring is to send an HTTP request at the end of your cron job.

Example Using `curl`

Suppose you have a monitoring endpoint:

https://example.com/heartbeat/backup-job

Modify your cron job like this:

0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://example.com/heartbeat/backup-job

What this does:

Runs your script
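Sends the heartbeat only if the script succeeds (exits with status 0), because && runs the next command only on success
Uses curl -fsS: -f makes curl exit with an error if the endpoint returns an HTTP error, and -sS hides the progress meter but still prints errors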
Gives your monitoring system a reliable signal that the job completed

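If the script fails, the heartbeat is never sent. This depends on the script exiting with a non-zero status when something goes wrong. A script that prints an error and still exits 0 will look healthy to both cron and your monitor.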
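The && only helps if the script reports failure through its exit status. This is a minimal sketch of a backup step written that way. It assumes a hypothetical `backup_dir` function that archives a directory with `tar`, since the real backup.sh is not shown:

```shell
# Hypothetical backup step: archive SRC into DEST as a dated tarball.
# Every failure path returns a non-zero status, so a crontab entry like
# `... && curl ...` skips the heartbeat when something goes wrong.
backup_dir() {
  src=$1
  dest=$2
  if [ ! -d "$src" ]; then
    echo "backup: source directory $src not found" >&2
    return 1
  fi
  # mkdir returns non-zero on problems such as a parent we cannot write to.
  mkdir -p "$dest" || return 1
  archive="$dest/backup-$(date +%Y%m%d).tar.gz"
  # Propagate tar's failure instead of printing an error and carrying on.
  tar -czf "$archive" -C "$src" . || return 1
  echo "backup written to $archive"
}
```

What matters here is the pattern, not the tar command: no error path ends with a zero exit status.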

Now your monitoring system can:

Expect a signal every day around 2 AM
Alert you if it does not arrive
Optionally track failures too

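You can also send a failure signal. Keep the whole entry on one line, because crontab does not support backslash line continuations. Also use if/else rather than && ... ||, which would send the failure signal whenever the success request itself fails:

0 2 * * * if /usr/local/bin/backup.sh; then curl -fsS https://example.com/heartbeat/backup-job/success; else curl -fsS https://example.com/heartbeat/backup-job/failure; fi

That gives you even more visibility. A failure signal can alert you as soon as the job fails, instead of only after the expected success signal fails to arrive.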
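For anything longer than a one-liner, the same success/failure reporting is easier to read in a small wrapper script. This is a minimal sketch. It assumes the heartbeat URL accepts /success and /failure suffixes, as in the crontab example. run_with_heartbeat is a hypothetical name:

```shell
# Hypothetical wrapper: run a job and report exactly one outcome.
# $1 is the base heartbeat URL; /success or /failure is appended to it.
# The remaining arguments are the job command.
# A failed success ping is only logged; it never triggers the failure ping
# (unlike `job && curl .../success || curl .../failure`).
run_with_heartbeat() {
  url=$1
  shift
  job_status=0
  "$@" || job_status=$?
  if [ "$job_status" -eq 0 ]; then
    curl -fsS --max-time 10 "$url/success" >/dev/null || echo "heartbeat: success ping failed" >&2
  else
    curl -fsS --max-time 10 "$url/failure" >/dev/null || echo "heartbeat: failure ping failed" >&2
  fi
  # Preserve the job's own exit status for cron and for any caller.
  return "$job_status"
}
```

The crontab entry then reduces to calling a script that contains this function, for example with `run_with_heartbeat https://example.com/heartbeat/backup-job /usr/local/bin/backup.sh`. Because the wrapper returns the job's own exit status, cron still sees the real result.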

Common Mistakes

Even when people try to monitor cron jobs, they often get it wrong.

1. Only Checking Logs

Logs are passive. If you are not actively looking, they do not help.

2. Not Handling Failures Explicitly

Using ; instead of && means the heartbeat fires even if the job fails.

Bad:

backup.sh ; curl ...

Good:

backup.sh && curl ...

3. Ignoring Timeouts

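If your job hangs, it never finishes, so it never sends a signal, and you only find out once the expected heartbeat is overdue. Where possible, add a time limit to the job and to the heartbeat request itself.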
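This is a minimal sketch of a time-limited job. It assumes GNU coreutils `timeout` is installed, as it is on most Linux distributions. run_with_timeout and its arguments are hypothetical:

```shell
# Hypothetical wrapper: give the job a hard time limit, then send the heartbeat.
# `timeout` (GNU coreutils) stops the command after $1 (e.g. 30m) and exits
# with status 124 when it had to. curl's --max-time and --retry keep a slow
# or flaky monitoring endpoint from hanging the job in turn.
run_with_timeout() {
  limit=$1
  url=$2
  shift 2
  job_status=0
  timeout "$limit" "$@" || job_status=$?
  if [ "$job_status" -eq 0 ]; then
    curl -fsS --max-time 10 --retry 3 "$url" >/dev/null || echo "heartbeat ping failed" >&2
  elif [ "$job_status" -eq 124 ]; then
    echo "job exceeded $limit and was stopped" >&2
  fi
  return "$job_status"
}
```

In a crontab, the same idea fits on one line. For example, `0 2 * * * timeout 30m /usr/local/bin/backup.sh && curl -fsS --max-time 10 --retry 3 https://example.com/heartbeat/backup-job`. The 30-minute limit there is an arbitrary example; pick one comfortably above the job's normal runtime.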

4. Monitoring the Wrong Thing

Checking whether cron started the job is not enough. You need to know the job completed successfully.

5. No Alerting

Sending a heartbeat is useless if nobody gets alerted when it goes missing.

Alternative Approaches

Heartbeat monitoring is simple and effective, but it is not the only option.

1. Log Monitoring

Tools like ELK or Loki can detect errors in logs.

Pros:

Good for debugging
Works with existing systems

Cons:

Reactive, not proactive
Easy to miss issues

2. Uptime Checks

You can expose an endpoint and have a service ping it.

Pros:

Works well for APIs

Cons:

Not ideal for background jobs
Does not confirm job completion

3. Queue-Based Systems

If your jobs run through a queue processed by workers, you can track success and failure there.

Pros:

More control
Built-in retries

Cons:

Overkill for simple cron jobs

4. Custom Monitoring Scripts

You can build your own system to track execution timestamps.

Pros:

Fully customizable

Cons:

Time-consuming
Reinventing the wheel
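To give a sense of what building your own involves, here is a minimal sketch of a timestamp tracker. It assumes a writable state directory, STATE_DIR, which defaults to a hypothetical /var/tmp/cron-heartbeats. It also uses hypothetical record_run and check_run functions:

```shell
# Hypothetical DIY tracker: the job records when it last succeeded, and a
# separate cron entry checks that the record is fresh enough.
STATE_DIR=${STATE_DIR:-/var/tmp/cron-heartbeats}

# Call after the job succeeds, e.g.: backup.sh && record_run backup-job
record_run() {
  mkdir -p "$STATE_DIR" || return 1
  date +%s > "$STATE_DIR/$1.last"
}

# Returns 1 (and prints why) if job $1 has no record or its last success is
# older than $2 seconds.
check_run() {
  file="$STATE_DIR/$1.last"
  if [ ! -f "$file" ]; then
    echo "$1: no successful run recorded"
    return 1
  fi
  age=$(( $(date +%s) - $(cat "$file") ))
  if [ "$age" -gt "$2" ]; then
    echo "$1: last successful run was ${age}s ago"
    return 1
  fi
  return 0
}
```

Even this small version leaves gaps. The checker needs its own schedule and alerting. It also runs on the same machine as the job, so it goes quiet when that machine does.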

At this point, instead of building and maintaining your own heartbeat system, you can use a purpose-built tool. Tools like QuietPulse let you define expected intervals and get alerted when a cron job misses its heartbeat, with little setup.

FAQ

How do I know if my cron job ran successfully?

The most reliable way is to send a heartbeat after successful execution. If the signal does not arrive, assume failure and alert.

Can cron send emails on failure?

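Partly. Cron can email a job's output to the address set in MAILTO, but:

It depends on a working mail setup on the host
It is often unreliable or ignored
It reports output, not exit status, so a job that fails without printing anything, or never runs at all, sends no email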

What is the best way to monitor cron jobs in production?

Heartbeat monitoring is usually the simplest and most effective approach:

Add a request at the end of the job
Track expected intervals
Alert on missing signals

How often should I expect heartbeats?

It depends on your schedule:

Hourly jobs -> expect hourly signals
Daily jobs -> expect one signal per day

Add a buffer, or grace period, on top of the schedule to avoid false alerts when a job starts late or runs longer than usual.

Conclusion

Cron jobs are deceptively simple, but dangerously invisible.

If you do not actively monitor them, failures will go unnoticed until they hurt.

The easiest way to fix this:

Add a heartbeat signal to every job
Track when it should arrive
Alert when it does not

Once you start doing this, you stop guessing and start knowing.

Originally published at: https://quietpulse.xyz/blog/how-to-monitor-cron-jobs-and-stop-silent-failures

DEV Community
