DEV Community

quietpulse

How to monitor cron jobs (and stop silent failures)

Cron jobs feel simple until they fail silently. This guide explains why that happens and how heartbeat monitoring helps you catch missed runs before they turn into real incidents.


The Problem

Cron jobs do not tell you when they fail.

That is the core issue.

Unlike a web server or API, a cron job has no built-in alerting, no dashboard, and no visibility by default. If a job crashes, times out, or never runs, you often will not know unless:

  • You manually check logs
  • A user reports that something is broken
  • Data starts looking wrong

Real-World Example

You have a nightly backup job:

0 2 * * * /usr/local/bin/backup.sh

One day, the script starts failing because of a permission issue. Cron keeps triggering it, but nothing actually works.

Two weeks later, you need a backup.

There is not one.


Why It Happens

Cron is intentionally simple.

It just schedules commands. That is it.

It does not:

  • Track execution success
  • Retry failed jobs
  • Notify you on failure
  • Verify that the job actually completed

Even worse, failures can happen in subtle ways:

  • The script exits early with no useful error output
  • Dependencies change, such as an API, database, or file system
  • Network issues break external calls
  • Environment variables differ from your interactive shell
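You can reproduce the environment difference before it causes a failure in production. The sketch below runs a command with an almost empty environment. It assumes the defaults documented for Vixie cron and cronie (PATH=/usr/bin:/bin, SHELL=/bin/sh, plus HOME and LOGNAME); other cron implementations may set things differently. The run_like_cron name is hypothetical.

```shell
#!/bin/sh
# run_like_cron COMMAND_STRING
# Runs COMMAND_STRING with an environment close to the Vixie cron / cronie
# defaults (an assumption, check your system): everything is cleared except
# HOME, LOGNAME, SHELL and a short PATH. Anything the job silently relied on
# from your interactive shell (extra PATH entries, exported variables) is
# missing here too, just as it is under cron.
run_like_cron() {
  env -i \
    HOME="$HOME" \
    LOGNAME="$(id -un)" \
    SHELL=/bin/sh \
    PATH=/usr/bin:/bin \
    /bin/sh -c "$1"
}
```

For example, if backup.sh depends on a tool installed outside /usr/bin and /bin, run_like_cron /usr/local/bin/backup.sh should fail the same way the scheduled run does.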

Cron will still run the command on schedule, but success is your responsibility.


Why It Is Dangerous

Silent failures are the worst kind of failures.

Here is what can go wrong:

1. Data Loss

Backups fail quietly. You do not notice until it is too late.

2. Broken Pipelines

ETL jobs stop syncing data, which leads to stale dashboards and bad decisions.

3. Missed Business Logic

Emails, billing tasks, or cleanup scripts stop running.

4. Harder Debugging

You do not know when things broke, only that they are broken now.

The longer a cron job runs without monitoring, the higher the risk.


How to Detect It

To monitor cron jobs effectively, you need external confirmation that they ran successfully.

This is where the idea of a heartbeat comes in.

Heartbeat Monitoring

A heartbeat is a simple signal sent by your cron job when it completes.

If the signal:

  • Arrives on time -> the job is healthy
  • Is missing or late -> something is wrong

Instead of checking logs manually, you flip the model:

Tell me when something does not happen.

This is much more reliable than checking logs, because the absence of a signal is what triggers the alert.
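On the monitoring side, the check comes down to comparing the time of the last heartbeat with the expected interval plus a grace period. Here is a simplified sketch of that decision, assuming Unix timestamps in seconds. heartbeat_status is a hypothetical name, not the logic of any particular service.

```shell
#!/bin/sh
# heartbeat_status LAST INTERVAL GRACE [NOW]
#   LAST     Unix time the last heartbeat arrived
#   INTERVAL expected seconds between heartbeats (86400 for a daily job)
#   GRACE    extra seconds tolerated before the heartbeat counts as late
#   NOW      current Unix time; defaults to `date +%s` (GNU and BSD date)
# Prints "healthy" if the next heartbeat is not yet overdue, "late" otherwise.
heartbeat_status() {
  hs_last=$1
  hs_interval=$2
  hs_grace=$3
  hs_now=${4:-$(date +%s)}
  hs_deadline=$((hs_last + hs_interval + hs_grace))
  if [ "$hs_now" -le "$hs_deadline" ]; then
    echo healthy
  else
    echo late
  fi
}
```

For a daily job with a 30-minute grace period, heartbeat_status "$last" 86400 1800 reports late only once more than 24.5 hours have passed since the last signal.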


Simple Solution

The simplest way to implement heartbeat monitoring is to send an HTTP request at the end of your cron job.

Example Using curl

Suppose you have a monitoring endpoint:

https://example.com/heartbeat/backup-job

Modify your cron job like this:

0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://example.com/heartbeat/backup-job

What this does:

  • Runs your script
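  • Sends the heartbeat only if the script exits with status 0, because of &&
  • Uses curl -fsS, which stays quiet on success but prints an error and exits non-zero if the request fails, including on an HTTP error response
  • Gives your monitoring system a reliable signal that the job completed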

If the script fails, the heartbeat is never sent.
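This relies on the script reporting failure through its exit status. A script that logs an error and carries on can still exit 0, and && will send the heartbeat anyway. The two hypothetical functions below show the difference. Across a whole script, starting it with set -e is a common, if imperfect, way to get the second behavior.

```shell
#!/bin/sh
# Only the exit status matters to `&&`. These hypothetical backup steps
# differ in whether a failed copy is visible to the caller.

# Anti-pattern: the error is hidden and the final `echo` succeeds,
# so the function returns 0 even though nothing was copied.
backup_swallows_errors() {
  cp "$1" "$2" 2>/dev/null
  echo "backup finished"
}

# Better: return a non-zero status as soon as the copy fails,
# so `backup && curl ...` skips the heartbeat.
backup_reports_errors() {
  cp "$1" "$2" || return 1
  echo "backup finished"
}
```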

Now your monitoring system can:

  • Expect a signal every day around 2 AM
  • Alert you if it does not arrive
  • Optionally track failures too

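You can also send a failure signal. Keep the whole entry on one line, because crontab does not support backslash line continuation:

0 2 * * * if /usr/local/bin/backup.sh; then curl -fsS https://example.com/heartbeat/backup-job/success; else curl -fsS https://example.com/heartbeat/backup-job/failure; fi

The if/then/else form also avoids a trap in the a && b || c pattern: if the job succeeds but the success request fails, the failure request fires anyway.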

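Sending a failure signal gives you even more visibility: your monitoring system can alert as soon as the job fails instead of waiting for the success signal to go missing.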
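As these one-liners grow, it can be easier to move the logic into a small wrapper. The sketch below is hypothetical: the function name, the 10-second limit, and the /success and /failure paths (borrowed from the example URLs in this article) are assumptions about your setup. It runs any command, pings the matching endpoint, and hands back the command's own exit status.

```shell
#!/bin/sh
# run_with_heartbeat BASE_URL COMMAND [ARGS...]
# Runs COMMAND, then requests BASE_URL/success or BASE_URL/failure depending
# on its exit status, and returns the command's own status so that anything
# chaining on the wrapper still sees the real result.
run_with_heartbeat() {
  hb_url=$1
  shift
  if "$@"; then
    hb_status=0
    hb_result=success
  else
    hb_status=$?   # exit status of the command tested by `if`
    hb_result=failure
  fi
  # --max-time keeps a slow monitoring endpoint from hanging the job.
  # A failed ping is reported on stderr but does not change the job's status.
  curl -fsS --max-time 10 "$hb_url/$hb_result" >/dev/null \
    || echo "heartbeat ping to $hb_url/$hb_result failed" >&2
  return "$hb_status"
}
```

If you save this as an executable script, for example a hypothetical /usr/local/bin/with-heartbeat whose last line is run_with_heartbeat "$@", the crontab entry shrinks to 0 2 * * * /usr/local/bin/with-heartbeat https://example.com/heartbeat/backup-job /usr/local/bin/backup.sh.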


Common Mistakes

Even when people try to monitor cron jobs, they often get it wrong.

1. Only Checking Logs

Logs are passive. If you are not actively looking, they do not help.

2. Not Handling Failures Explicitly

Using ; instead of && means the heartbeat fires even if the job fails.

Bad:

backup.sh ; curl ...

Good:

backup.sh && curl ...

3. Ignoring Timeouts

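If your job hangs, it never finishes and never sends a signal, and the next scheduled run may start while it is still stuck. Put a time limit on the job, and on the heartbeat request itself.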
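One way to do that, assuming GNU coreutils timeout is available: it kills the command once the limit passes and exits with status 124, which turns a hang into an ordinary failure that && and failure pings already handle. run_bounded is a hypothetical helper around it.

```shell
#!/bin/sh
# run_bounded SECONDS COMMAND [ARGS...]
# Runs COMMAND under GNU coreutils `timeout`. If it is still running after
# SECONDS, `timeout` kills it and returns 124; otherwise the status is
# usually the command's own. A hang therefore becomes a reportable failure.
run_bounded() {
  rb_limit=$1
  shift
  rb_status=0
  timeout "$rb_limit" "$@" || rb_status=$?
  if [ "$rb_status" -eq 124 ]; then
    echo "job exceeded ${rb_limit}s and was killed" >&2
  fi
  return "$rb_status"
}
```

In a crontab you can also call timeout directly and give curl its own limit, for example 0 2 * * * timeout 1h /usr/local/bin/backup.sh && curl -fsS --max-time 10 https://example.com/heartbeat/backup-job, where both limits are placeholders to size for your job.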

4. Monitoring the Wrong Thing

Checking whether cron started is not enough. You need to know the job completed.
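For a backup, "completed" should mean the archive exists and can be read, not only that the script returned. A hypothetical check, assuming the job writes a gzip-compressed tar archive, could run between the job and the heartbeat so the signal is sent only when the output looks usable.

```shell
#!/bin/sh
# verify_archive PATH
# Returns 0 only if PATH is a non-empty gzip-compressed tar archive whose
# contents can be listed; otherwise prints the reason to stderr and returns 1.
verify_archive() {
  if [ ! -s "$1" ]; then
    echo "missing or empty archive: $1" >&2
    return 1
  fi
  if ! tar -tzf "$1" >/dev/null 2>&1; then
    echo "unreadable archive: $1" >&2
    return 1
  fi
}
```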

5. No Alerting

Sending a heartbeat is useless if nobody gets alerted when it goes missing.


Alternative Approaches

Heartbeat monitoring is simple and effective, but it is not the only option.

1. Log Monitoring

Tools like ELK or Loki can detect errors in logs.

Pros:

  • Good for debugging
  • Works with existing systems

Cons:

  • Reactive, not proactive
  • Easy to miss issues

2. Uptime Checks

You can expose an endpoint and have a service ping it.

Pros:

  • Works well for APIs

Cons:

  • Not ideal for background jobs
  • Does not confirm job completion

3. Queue-Based Systems

If your jobs run as workers pulling from a queue, you can track success and failure there.

Pros:

  • More control
  • Built-in retries

Cons:

  • Overkill for simple cron jobs

4. Custom Monitoring Scripts

You can build your own system to track execution timestamps.
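A bare-bones version of that idea, with hypothetical names and paths: the job records the time of its last success in a file, and a separate check complains when that record is too old.

```shell
#!/bin/sh
# A do-it-yourself heartbeat. STAMP_DIR and both function names are
# hypothetical; point STAMP_DIR at a directory the job can write to.
STAMP_DIR=${STAMP_DIR:-/var/lib/cron-stamps}

# record_success JOB: call after a successful run to store the current time.
record_success() {
  mkdir -p "$STAMP_DIR"
  date +%s > "$STAMP_DIR/$1"
}

# check_recent JOB MAX_AGE: return 0 if JOB recorded a success within the
# last MAX_AGE seconds; otherwise print the reason to stderr and return 1.
check_recent() {
  if [ ! -f "$STAMP_DIR/$1" ]; then
    echo "$1: no success recorded" >&2
    return 1
  fi
  cr_last=$(cat "$STAMP_DIR/$1")
  cr_age=$(( $(date +%s) - cr_last ))
  if [ "$cr_age" -gt "$2" ]; then
    echo "$1: last success ${cr_age}s ago (limit ${2}s)" >&2
    return 1
  fi
}
```

This also shows the main weakness of a home-grown setup: if check_recent itself runs from cron on the same machine, a stopped cron daemon or a dead host silences both the job and the check.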

Pros:

  • Fully customizable

Cons:

  • Time-consuming
  • Reinventing the wheel

At this point, instead of building and maintaining your own heartbeat system, you can use a purpose-built tool. Tools like QuietPulse let you define expected intervals and alert you, without much setup, when a cron job misses its heartbeat.


FAQ

How do I know if my cron job ran successfully?

The most reliable way is to send a heartbeat after successful execution. If the signal does not arrive, assume failure and alert.

Can cron send emails on failure?

Yes. Cron emails any output a job produces to the address set in MAILTO, but:

  • It depends on system configuration
  • It is often unreliable or ignored
  • It does not detect silent failures

What is the best way to monitor cron jobs in production?

Heartbeat monitoring is usually the simplest and most effective approach:

  • Add a request at the end of the job
  • Track expected intervals
  • Alert on missing signals

How often should I expect heartbeats?

It depends on your schedule:

  • Hourly jobs -> expect hourly signals
  • Daily jobs -> expect one signal per day

Add a buffer, or grace period, on top of the interval so that a run that finishes a little late does not trigger a false alert.


Conclusion

Cron jobs are deceptively simple, but dangerously invisible.

If you do not actively monitor them, failures will go unnoticed until they hurt.

The easiest way to fix this:

  • Add a heartbeat signal to every job
  • Track when it should arrive
  • Alert when it does not

Once you start doing this, you stop guessing and start knowing.

Originally published at: https://quietpulse.xyz/blog/how-to-monitor-cron-jobs-and-stop-silent-failures
