Cron jobs feel simple until they fail silently. This guide explains why that happens and how heartbeat monitoring helps you catch missed runs before they turn into real incidents.
The Problem
Cron jobs do not tell you when they fail.
That is the core issue.
Unlike a web server or API, a cron job has no dashboard and, beyond optional email output, no built-in alerting or visibility. If a job crashes, times out, or never runs, you often will not know unless:
- You manually check logs
- A user reports that something is broken
- Data starts looking wrong
Real-World Example
You have a nightly backup job:
0 2 * * * /usr/local/bin/backup.sh
One day, the script starts failing because of a permission issue. Cron keeps triggering it on schedule, but every run exits with an error and no backup is written.
Two weeks later, you need a backup.
There is not one.
Why It Happens
Cron is intentionally simple.
It just schedules commands. That is it.
It does not:
- Track execution success
- Retry failed jobs
- Notify you on failure
- Verify that the job actually completed
Even worse, failures can happen in subtle ways:
- The script exits early with no useful error output
- Dependencies change, such as an API, database, or file system
- Network issues break external calls
- Environment variables differ from your interactive shell
Cron will still run the command on schedule, but success is your responsibility.
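One way to make these subtle failures visible is to wrap the job so that its output and exit status always land somewhere you can read later. The sketch below is only an illustration, assuming bash: log_run is a hypothetical helper name, not something cron provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command and append its output and exit
# status to a log file, so a failed cron run leaves a trace.
# Usage: log_run <log-file> <command> [args...]
log_run() {
  local log_file="$1"
  shift
  local status=0

  {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] start: $*"
    # Run the job; remember a non-zero exit status instead of losing it.
    "$@" || status=$?
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] exit status: $status"
  } >> "$log_file" 2>&1

  # Pass the job's own exit status back to the caller.
  return "$status"
}
```

A wrapper script could end with a call such as log_run /var/log/backup.log /usr/local/bin/backup.sh, where the log path is just an example.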
Why It Is Dangerous
Silent failures are the worst kind of failures.
Here is what can go wrong:
1. Data Loss
Backups fail quietly. You do not notice until it is too late.
2. Broken Pipelines
ETL jobs stop syncing data, which leads to stale dashboards and bad decisions.
3. Missed Business Logic
Emails, billing tasks, or cleanup scripts stop running.
4. Harder Debugging
You do not know when things broke, only that they are broken now.
The longer a cron job runs without monitoring, the longer a failure can go unnoticed and the more damage it can do.
How to Detect It
To monitor cron jobs effectively, you need external confirmation that they ran successfully.
This is where the idea of a heartbeat comes in.
Heartbeat Monitoring
A heartbeat is a simple signal sent by your cron job when it completes.
If the signal:
- Arrives on time -> the job is healthy
- Is missing or late -> something is wrong
Instead of checking logs manually, you flip the model:
Tell me when something does not happen.
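This is more reliable because a missing signal catches crashes, hangs, and jobs that never started, without anyone having to look.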
Simple Solution
The simplest way to implement heartbeat monitoring is to send an HTTP request at the end of your cron job.
Example Using curl
Let us say you have a monitoring endpoint:
https://example.com/heartbeat/backup-job
Modify your cron job like this:
0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://example.com/heartbeat/backup-job
What this does:
- Runs your script
- Sends the heartbeat only if the script succeeds, because of &&
- Gives your monitoring system a reliable signal that the job completed
If the script fails, the heartbeat is never sent.
Now your monitoring system can:
- Expect a signal every day around 2 AM
- Alert you if it does not arrive
- Optionally track failures too
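As crontab entries grow, it can be easier to move the same logic into a small wrapper script. Here is a minimal sketch, assuming bash and curl are available. The name run_with_heartbeat is hypothetical, and the --max-time and --retry values are arbitrary choices meant to keep a slow monitoring endpoint from stalling the job.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a job, then ping a heartbeat URL only on success.
# Usage: run_with_heartbeat <heartbeat-url> <command> [args...]
run_with_heartbeat() {
  local url="$1"
  shift
  local status=0

  # Run the job exactly as given and capture its exit status.
  "$@" || status=$?

  if [ "$status" -eq 0 ]; then
    # --max-time bounds a hung request; --retry covers transient errors.
    # "|| true": a failed ping must not change the job's own exit status.
    curl -fsS --max-time 10 --retry 3 -o /dev/null "$url" || true
  fi

  return "$status"
}
```

Called as run_with_heartbeat https://example.com/heartbeat/backup-job /usr/local/bin/backup.sh, it behaves like the && entry above, but the request can no longer hang indefinitely.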
You can also send a failure signal. Keep the entry on one line, because crontab does not support backslash line continuations:
0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://example.com/heartbeat/backup-job/success || curl -fsS https://example.com/heartbeat/backup-job/failure
That gives you even more visibility.
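Chaining && and || has one subtle edge: the failure request also runs if the success request itself fails. An explicit if/else avoids that. The following is a sketch under the assumption that your monitoring endpoint accepts /success and /failure suffixes as in the example above; report_job is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: report success or failure based only on the
# job's exit status, never on whether the curl request itself worked.
# Usage: report_job <base-url> <command> [args...]
report_job() {
  local base_url="$1"
  shift

  if "$@"; then
    curl -fsS --max-time 10 -o /dev/null "$base_url/success" || true
    return 0
  else
    # In the else branch, $? still holds the job's exit status.
    local status=$?
    curl -fsS --max-time 10 -o /dev/null "$base_url/failure" || true
    return "$status"
  fi
}
```

The crontab entry then shrinks to a single call to a script containing this function, and the job's exit status is preserved for anything else that inspects it.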
Common Mistakes
Even when people try to monitor cron jobs, they often get it wrong.
1. Only Checking Logs
Logs are passive. If you are not actively looking, they do not help.
2. Not Handling Failures Explicitly
Using ; instead of && means the heartbeat fires even if the job fails.
Bad:
backup.sh ; curl ...
Good:
backup.sh && curl ...
3. Ignoring Timeouts
If your job hangs, it may never send a signal. Add timeouts where possible.
4. Monitoring the Wrong Thing
Checking whether cron started is not enough. You need to know the job completed.
5. No Alerting
Sending a heartbeat is useless if nobody gets alerted when it goes missing.
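For the timeout advice under Ignoring Timeouts, one option is the timeout command from GNU coreutils. This sketch assumes it is installed on your system; run_with_timeout is a hypothetical helper name, and the time limit is whatever suits your job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop a job that runs longer than a given limit.
# Requires GNU coreutils `timeout`; the command must be an external program.
# Usage: run_with_timeout <limit> <command> [args...]   e.g. limit = 1h
run_with_timeout() {
  local limit="$1"
  shift
  local status=0

  timeout "$limit" "$@" || status=$?

  # GNU timeout exits with status 124 when the time limit was reached.
  if [ "$status" -eq 124 ]; then
    echo "job timed out after $limit: $*" >&2
  fi

  return "$status"
}
```

Because a timed-out job returns a non-zero status, combining this with && means the heartbeat is skipped and the missing signal triggers an alert.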
Alternative Approaches
Heartbeat monitoring is simple and effective, but it is not the only option.
1. Log Monitoring
Tools like ELK or Loki can detect errors in logs.
Pros:
- Good for debugging
- Works with existing systems
Cons:
- Reactive, not proactive
- Easy to miss issues
2. Uptime Checks
You can expose an endpoint and have a service ping it.
Pros:
- Works well for APIs
Cons:
- Not ideal for background jobs
- Does not confirm job completion
3. Queue-Based Systems
If your jobs run through a queue and are processed by workers, you can track success and failure there.
Pros:
- More control
- Built-in retries
Cons:
- Overkill for simple cron jobs
4. Custom Monitoring Scripts
You can build your own system to track execution timestamps.
Pros:
- Fully customizable
Cons:
- Time-consuming
- Reinventing the wheel
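To give a sense of what a custom approach involves, here is a rough sketch that tracks execution timestamps in a state file. The function names and the state file are hypothetical, and a real system would still need scheduling for the check itself, alert delivery, and handling of corrupt state.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a home-grown check based on a timestamp file.

# Record a successful run by writing the current epoch time to a file.
record_run() {
  local state_file="$1"
  date +%s > "$state_file"
}

# Return 0 if the last recorded run is at most max_age seconds old,
# and 1 if it is older or the state file is missing.
is_fresh() {
  local state_file="$1" max_age="$2"
  [ -r "$state_file" ] || return 1
  local last now
  last=$(cat "$state_file")
  now=$(date +%s)
  [ $(( now - last )) -le "$max_age" ]
}
```

The job would call record_run after it succeeds, and a second scheduled script would call is_fresh and send an alert when it returns non-zero. That second script is now another job that can fail silently, which is part of why this approach is time-consuming.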
At this point, instead of building and maintaining your own heartbeat system, you can use a purpose-built tool. Tools like QuietPulse let you define expected intervals and alert you when a cron job misses its heartbeat without much setup.
FAQ
How do I know if my cron job ran successfully?
The most reliable way is to send a heartbeat after successful execution. If the signal does not arrive, assume failure and alert.
Can cron send emails on failure?
Yes, cron can send output to email via MAILTO, but:
- It depends on system configuration
- It is often unreliable or ignored
- It does not detect silent failures, such as a job that never starts and therefore produces no output
What is the best way to monitor cron jobs in production?
Heartbeat monitoring is usually the simplest and most effective approach:
- Add a request at the end of the job
- Track expected intervals
- Alert on missing signals
How often should I expect heartbeats?
It depends on your schedule:
- Hourly jobs -> expect hourly signals
- Daily jobs -> expect one signal per day
Set a buffer, or grace period, to avoid false alerts.
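The arithmetic behind a grace period is simple: a heartbeat is overdue once the current time passes the last signal plus the expected interval plus the grace period. A minimal sketch, with a hypothetical function name and all values in seconds:

```shell
#!/usr/bin/env bash
# Hypothetical check: is a heartbeat overdue?
# Usage: is_overdue <last-seen-epoch> <interval> <grace> <now-epoch>
# Returns 0 (overdue) when now is later than last_seen + interval + grace.
is_overdue() {
  local last_seen="$1" interval="$2" grace="$3" now="$4"
  [ "$now" -gt $(( last_seen + interval + grace )) ]
}
```

For a daily job with a 30-minute grace period, the call would look like is_overdue "$last_seen" 86400 1800 "$(date +%s)". A job that finishes a few minutes late stays inside the window and does not raise an alert.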
Conclusion
Cron jobs are deceptively simple, but dangerously invisible.
If you do not actively monitor them, failures will go unnoticed until they hurt.
The easiest way to fix this:
- Add a heartbeat signal to every job
- Track when it should arrive
- Alert when it does not
Once you start doing this, you stop guessing and start knowing.
Originally published at: https://quietpulse.xyz/blog/how-to-monitor-cron-jobs-and-stop-silent-failures