Your database backup runs every night at 2 AM.
Your invoice generator fires every Monday.
Your cache warmer runs every 5 minutes.
They all work great. Until they don't.
And nobody notices.
The problem with cron jobs
Cron jobs fail the same way they succeed: silently. The daemon doesn't
care. There's no browser to show an error page. No user gets a 500.
The only signal that something went wrong is the absence of something
happening.
Here's a scenario that plays out somewhere every week:
A nightly pg_dump backup job runs at 02:00 UTC. One Tuesday, the Postgres
server moves to a new port after an upgrade. The cron job fails with
"connection refused" — but since nobody redirected stderr anywhere, the
error vanishes. Three weeks later, a developer drops a table by accident,
reaches for the backup, and finds the most recent one is 21 days old.
That's the cost of an unmonitored cron job.
Why traditional monitoring misses this
Most monitoring tools watch for things that ARE happening: high CPU,
slow responses, error rate spikes.
Cron job failures are passive. They're the absence of something happening.
Your APM won't alert you that a script didn't run. Your error tracker
can't capture an exception from a process that never started.
The fix: dead man's switch monitoring
The fix is simple in concept. Instead of watching for failure, you watch
for the absence of success.
Your job sends an HTTP request to a monitoring service every time it
completes successfully. The service expects that ping at the configured
interval. If it doesn't arrive, you get alerted.
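Conceptually, the service side is tiny: remember the last ping per monitor and flag the monitor when that ping is overdue. Here's a minimal sketch of that logic, assuming a fixed expected interval plus a grace window (the `Monitor` class and `grace` parameter are illustrative, not how any particular service is implemented):

```python
import time

class Monitor:
    """Dead man's switch: a monitor is overdue when no ping has
    arrived within interval + grace seconds of the last one."""

    def __init__(self, interval, grace=60):
        self.interval = interval  # expected seconds between pings
        self.grace = grace        # slack before alerting
        self.last_ping = time.monotonic()

    def ping(self):
        # called whenever the job reports success
        self.last_ping = time.monotonic()

    def is_overdue(self, now=None):
        now = time.monotonic() if now is None else now
        return now - self.last_ping > self.interval + self.grace
```

A checker loop would simply call `is_overdue()` for each monitor every few seconds and fire an alert on the first True.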
This catches everything:
- Job crashes before completing
- Job never starts (crontab cleared, crond stopped)
- Job hangs and never finishes
- Disk full, preventing the job from writing its output
- Server rebooted, job never ran during boot sequence
# Before: no monitoring
0 2 * * * /usr/local/bin/backup.sh

# After: 30 seconds to add monitoring
0 2 * * * /usr/local/bin/backup.sh && curl -s https://api.getcronsafe.com/ping/your-monitor-slug
Everything stays on one line because crontab doesn't support backslash line continuation. The && is important: the ping only fires if the job exits with code 0. A failed job won't ping, and the missing ping triggers an alert once the expected interval passes.
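The same exit-code gating works from any language, not just the shell. A quick Python equivalent (`run_then_ping` is a hypothetical helper; in real use the success branch would send the HTTP ping before returning):

```python
import subprocess
import sys

def run_then_ping(cmd):
    """Mimic `job && curl .../ping`: report success only on exit code 0."""
    result = subprocess.run(cmd)
    if result.returncode == 0:
        # here you would send the heartbeat, e.g. requests.get(ping_url, timeout=5)
        return True
    # non-zero exit: no ping sent, so the monitor alerts on the missing ping
    return False
```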
Going further: explicit failure reporting
Basic heartbeat monitoring catches "job didn't run." But what about "job
ran but reported an error"?
You can send an explicit fail ping when your script detects a problem:
#!/bin/bash
set -e
if /usr/local/bin/backup.sh; then
    curl -s "https://api.getcronsafe.com/ping/backup"
else
    curl -s "https://api.getcronsafe.com/ping/backup?status=fail&output=backup_failed"
fi
This fires an immediate alert without waiting for a timeout, and includes
context about what went wrong.
The same pattern in Python
import requests

def run_backup():
    # your backup logic here
    pass

try:
    run_backup()
    requests.get("https://api.getcronsafe.com/ping/nightly-backup", timeout=5)
except Exception as e:
    requests.get(
        "https://api.getcronsafe.com/ping/nightly-backup",
        params={"status": "fail", "output": str(e)},
        timeout=5,
    )
What to monitor
Any scheduled process that runs unattended:
- Database backups — the most common silent failure
- Email queues — stop processing and nobody complains for days
- Data syncs — your dashboard shows stale numbers
- Certificate renewals — the cert expires and your site shows a warning
- Cleanup jobs — when they stop, other services start crashing
Tools for heartbeat monitoring
Several services do this well:
- Healthchecks.io — open source, self-hostable, free for 20 monitors
- Cronitor — mature platform, broader feature set, priced for teams
- CronSafe — what I built: hosted, 20 monitors free, unlimited at €9/month, with overlap detection and job output in alerts
All three use the same core concept. Pick the one that fits your setup.
The pattern is simple. The setup takes 30 seconds per job. The peace of
mind is worth it.
What are you currently using to monitor your cron jobs?
If you want to try heartbeat monitoring without setting anything up yourself,
CronSafe has a free plan with 20 monitors and no credit card required →
getcronsafe.com