DEV Community

Jack
How to Monitor Your Cron Jobs in Production (So They Don't Silently Die)

Every production system has cron jobs. Database backups, report generation, cache warming, email digests — the list grows with your product. But here's the thing: cron jobs fail silently.

Your backup script has been failing for 3 weeks? Nobody knows until you need to restore. Your nightly ETL hasn't run since the last deploy? You'll find out when the CEO asks why the dashboard is stale.

The Dead Man's Switch Pattern

The most reliable way to monitor cron jobs is the dead man's switch (or heartbeat) pattern:

  1. Create a monitor with an expected schedule
  2. Your cron job pings the monitor after completing successfully
  3. If the monitor doesn't receive a ping within the expected window, fire an alert

This is fundamentally different from log monitoring because it catches jobs that never start — not just jobs that start and fail.
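
The three steps above can be sketched as a tiny simulation in shell. This is not how CronPing is implemented. The state file, the function names, and the explicit timestamps are all hypothetical, and exist only to show that the check is driven by the absence of a ping rather than by an error message.

```shell
#!/bin/sh
# Minimal dead man's switch sketch. All names here are hypothetical.

# Step 2: the job calls this after it finishes successfully.
# Args: state_file now_epoch
record_ping() {
    echo "$2" > "$1"
}

# Step 3: the monitor calls this on its own schedule.
# Args: state_file now_epoch window_seconds
# Prints OK if a ping arrived within the window, ALERT otherwise.
check_monitor() {
    if [ ! -s "$1" ]; then
        # No ping has ever been recorded (job never started, crontab deleted, ...).
        echo "ALERT"
        return 1
    fi
    last_ping=$(cat "$1")
    if [ $(( $2 - last_ping )) -gt "$3" ]; then
        # The last ping is older than the allowed window.
        echo "ALERT"
        return 1
    fi
    echo "OK"
}
```

In real use the timestamps would come from something like `date +%s`. They are passed in explicitly here so the logic is easy to follow and to test.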

Implementation

Here's how it works with a simple HTTP endpoint:

# Before: your existing cron job
0 2 * * * /usr/local/bin/backup.sh

# After: the same entry with a ping on success
# (replace the line above; keeping both would run the backup twice)
0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://cronping.anethoth.com/ping/YOUR_TOKEN

The && is critical: curl runs only if backup.sh exits 0. If the script exits non-zero, no ping is sent, and the alert fires once the expected window passes without one.
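
If the one-liner gets unwieldy, the same gating can live in a small wrapper. The sketch below is one possible shape, not something CronPing ships: `run_and_ping`, `PING_URL`, and `PING_CMD` are hypothetical names, the URL is the placeholder from above, and the extra curl flags (`-m 10` to cap the request at 10 seconds, `--retry 3`, `-o /dev/null`) are optional additions.

```shell
#!/bin/sh
# Hypothetical wrapper: run a job and ping only when it exits 0.
# PING_URL uses the article's placeholder token; PING_CMD is overridable
# so the function can be exercised without touching the network.
PING_URL="${PING_URL:-https://cronping.anethoth.com/ping/YOUR_TOKEN}"
PING_CMD="${PING_CMD:-curl -fsS -m 10 --retry 3 -o /dev/null}"

# Args: command [args...]
run_and_ping() {
    if [ "$#" -eq 0 ]; then
        echo "usage: run_and_ping COMMAND [ARGS...]" >&2
        return 2
    fi
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        # Same gating as `job && curl ...`: ping only on success.
        # PING_CMD is left unquoted on purpose so it splits into words.
        $PING_CMD "$PING_URL"
    else
        echo "job failed with exit status $status; not pinging" >&2
    fi
    # Report the job's own exit status, not the ping's.
    return "$status"
}
```

A script that defines this function could end with `run_and_ping /usr/local/bin/backup.sh` and be called from the crontab entry instead of the one-liner.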

Grace Periods

Not every job runs at exactly the same time. A good monitoring system lets you set a grace period — extra time before an alert fires.

For a daily backup that usually takes 10 minutes:

  • Schedule: every 1440 minutes (24 hours)
  • Grace period: 30 minutes
  • Alert fires if no ping received within 24.5 hours of the last one
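
The arithmetic behind that window, as a small sketch. The function names are hypothetical, and treating the exact boundary as "not yet overdue" is an assumption rather than documented CronPing behavior.

```shell
#!/bin/sh
# Sketch of the alert window arithmetic; function names are hypothetical.

# Minutes after the last ping at which the alert would fire.
# Args: schedule_minutes grace_minutes
alert_window_minutes() {
    echo $(( $1 + $2 ))
}

# Succeeds (exit 0) if the monitor is overdue.
# Args: last_ping_epoch now_epoch schedule_minutes grace_minutes
is_overdue() {
    window_seconds=$(( $(alert_window_minutes "$3" "$4") * 60 ))
    [ $(( $2 - $1 )) -gt "$window_seconds" ]
}
```

With the numbers above, `alert_window_minutes 1440 30` gives 1470 minutes, which is the 24.5 hours in the last bullet.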

Failure Modes

| Failure Mode         | Log Monitoring  | Heartbeat Monitoring         |
|----------------------|-----------------|------------------------------|
| Script errors out    | Catches         | Catches (no ping sent)       |
| Script never starts  | Nothing to log  | Catches (no ping)            |
| Server is down       | Can't log       | Catches (no ping)            |
| Script hangs forever | No error logged | Catches (ping never arrives) |
| Crontab deleted      | Nothing happens | Catches (no ping)            |

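Heartbeat monitoring catches every failure mode in this table because it watches for the absence of a signal rather than the presence of an error. The one thing it can't see is a job that exits 0 without actually doing its work, so make sure your script returns a non-zero status when something goes wrong.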
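
For the "hangs forever" row, the missing ping tells you something is wrong, but the stuck process keeps running. One optional complement is to cap the job's runtime so a hang turns into an ordinary failure. A minimal sketch, assuming GNU coreutils `timeout` is available on the host; `run_with_limit` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a job with an upper bound on its runtime.
# Relies on `timeout` from GNU coreutils, which exits with status 124
# when it has to stop the command.
# Args: limit_seconds command [args...]
run_with_limit() {
    limit=$1
    shift
    timeout "$limit" "$@"
}
```

Because a timed-out job exits non-zero, chaining it with && means no ping is sent, exactly as with any other failure. The limit itself is a judgment call and should sit comfortably above the job's normal runtime.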

Setting Up CronPing

I built CronPing to make this dead-simple:

# 1. Sign up
curl -X POST https://cronping.anethoth.com/api/v1/signup \
  -H 'Content-Type: application/json' \
  -d '{ "email": "you@example.com" }'

# 2. Create a monitor
curl -X POST https://cronping.anethoth.com/api/v1/monitors \
  -H 'Authorization: Bearer ch_xxx...' \
  -H 'Content-Type: application/json' \
  -d '{ "name": "nightly-backup", "schedule_minutes": 1440, "grace_minutes": 30 }'

# 3. Add the ping to your cron job
0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://cronping.anethoth.com/ping/xxx

Free tier gives you 3 monitors — enough for most side projects.

Best Practices

  1. Always use && — only ping on success
  2. Use -fsS with curl: -f makes curl exit non-zero on HTTP errors (4xx/5xx), -s hides the progress meter, and -S still prints an error message if the request fails
  3. Set realistic grace periods — too tight causes false alarms
  4. One monitor per job — don't reuse ping tokens

CronPing is free for up to 3 monitors. Try the cron expression helper to build your cron schedules.
