Łukasz Maśląg

How to Monitor Cron Jobs in 2026: A Complete Guide

Cron jobs are the silent workhorses of modern applications. They run backups, clean up data, send emails, sync with APIs, and handle countless other critical tasks. But here's the problem: when they fail, they fail silently.

I learned this the hard way when I discovered a month's worth of database backups had been failing. The cron job was still "running" - it just wasn't doing anything useful. That's when I realized: running a cron job and successfully completing it are two very different things.

The Problem with Traditional Cron

Traditional cron has zero built-in monitoring. You can log output to a file, sure:

0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1

But this means:

  • You have to remember to check the logs
  • Logs grow indefinitely (hello disk space issues)
  • No alerts when something breaks
  • You only notice when you need that backup

Cron's job is to run commands on schedule. It's not designed to tell you if those commands actually succeeded.

What We Need to Monitor

When monitoring cron jobs, we care about several things:

  1. Did it run at all? (The job might be disabled, the server might be down)
  2. Did it complete successfully? (Exit code 0 vs errors)
  3. Did it run on time? (Server overload, resource constraints)
  4. How long did it take? (Performance degradation over time)
  5. What was the output? (Errors, warnings, statistics)
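
None of this requires exotic tooling - it's five pieces of data per run. As a reference point, here's a minimal wrapper that records all of them locally (the path and log format are illustrative, not a standard):

#!/bin/bash
# cron-wrap.sh -- run any job and record when it ran, how it exited,
# how long it took, and what it printed
# crontab usage: 0 2 * * * /usr/local/bin/cron-wrap.sh /usr/local/bin/backup.sh
JOB="$*"
START=$(date -u +%s)
OUTPUT=$($JOB 2>&1)                       # 5. capture output (stdout + stderr)
EXIT_CODE=$?                              # 2. did it complete successfully?
DURATION=$(( $(date -u +%s) - START ))    # 4. how long did it take?
printf '[%s] job="%s" exit=%d duration=%ds\n' \
    "$(date -u -d "@$START" +%FT%TZ)" "$JOB" "$EXIT_CODE" "$DURATION" \
    >> /var/log/cron-status.log           # 1 & 3. the timestamp shows it ran, and when
echo "$OUTPUT" >> /var/log/cron-status.log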

Let's look at different approaches to solving this.

Approach 1: Email Alerts (Basic)

The simplest approach is using cron's built-in email feature:

MAILTO=admin@example.com
0 2 * * * /usr/local/bin/backup.sh

Pros:

  • Zero setup
  • Works out of the box

Cons:

  • Notifies on any output (stdout or stderr), not specifically on failure - a job that fails without printing anything sends no mail at all
  • Requires mail server configuration
  • No success confirmation
  • Email overload from multiple jobs
  • Can't track history or patterns

Verdict: Good for personal projects with 1-2 cron jobs. Not scalable.
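
If email is all you have, you can at least make mail mean "failure". One option is to wrap the job in chronic from the moreutils package, which swallows output unless the command fails, so cron stays quiet on success (a small sketch; moreutils has to be installed separately):

MAILTO=admin@example.com
# chronic only prints the command's output when it exits non-zero,
# so cron only sends mail when the backup actually fails
0 2 * * * chronic /usr/local/bin/backup.sh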

Approach 2: Log Files + Manual Checks

Slightly better - centralized logging:

#!/bin/bash
# backup.sh

LOG_FILE="/var/log/backup/$(date +%Y-%m-%d).log"
mkdir -p "$(dirname "$LOG_FILE")"

echo "[$(date)] Starting backup..." >> "$LOG_FILE"

if pg_dump mydb > "/backups/db-$(date +%Y%m%d).sql"; then
    echo "[$(date)] Backup completed successfully" >> "$LOG_FILE"
    exit 0
else
    echo "[$(date)] ERROR: Backup failed" >> "$LOG_FILE"
    exit 1
fi

Pros:

  • Full control over logging
  • Detailed output
  • Historical record

Cons:

  • Still requires manual checking
  • No real-time alerts
  • Log rotation complexity
  • Disk space management

Verdict: Better, but you'll still miss failures.
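
The rotation and disk space cons at least have a standard answer. A minimal logrotate sketch, assuming the /var/log/backup directory used by the script above:

# /etc/logrotate.d/backup
/var/log/backup/*.log {
    weekly
    rotate 8        # keep roughly two months of logs
    compress
    missingok
    notifempty
}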

Approach 3: Dead Man's Switch Pattern

This is where it gets interesting. Instead of monitoring for failures, we monitor for success. If we don't hear from the job, something's wrong.

The Basic Pattern

#!/bin/bash
# backup.sh

MONITOR_URL="https://cronmonitor.app/ping/your-unique-id"

# Run your backup
if pg_dump mydb > /backups/db-$(date +%Y%m%d).sql; then
    # Signal success
    curl -fsS --retry 3 $MONITOR_URL/success
    exit 0
else
    # Signal failure
    curl -fsS --retry 3 $MONITOR_URL/fail
    exit 1
fi
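
If you'd rather not touch the script at all, the same idea works as a crontab one-liner - ping only when the job exits cleanly:

0 2 * * * /usr/local/bin/backup.sh && curl -fsS --retry 3 https://cronmonitor.app/ping/your-unique-id > /dev/null

The trade-off is that there's no explicit /fail ping, so the monitor only notices a failure once the grace period runs out.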

On the monitoring side, you set up an expected schedule:

  • "This job should ping me every day at 2 AM"
  • "If I don't hear from it by 2:30 AM, alert me"
  • "If it pings /fail, alert me immediately"

Pros:

  • Catches ALL failure modes (job disabled, server down, script errors)
  • Real-time alerts
  • Historical tracking
  • Works from anywhere

Cons:

  • Requires external service
  • Dependency on network connectivity
  • Potential costs (though many free tiers exist)

Verdict: Industry standard for production systems.

Approach 4: Full Monitoring Solution

For enterprise needs, combine monitoring with observability:

#!/bin/bash
# backup.sh with full monitoring

MONITOR_URL="https://cronmonitor.app/ping/your-unique-id"

# Start signal
curl -fsS --retry 3 "$MONITOR_URL/start"

# Capture start time
START_TIME=$(date +%s)

# Run backup: the dump goes to the file, stderr is captured for reporting
# (redirection order matters: 2>&1 before > keeps stderr in $OUTPUT)
OUTPUT=$(pg_dump mydb 2>&1 > "/backups/db-$(date +%Y%m%d).sql")
EXIT_CODE=$?

# Calculate duration
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))

# Report results with context
if [ $EXIT_CODE -eq 0 ]; then
    curl -fsS --retry 3 \
        --data-urlencode "status=success" \
        --data-urlencode "duration=$DURATION" \
        --data-urlencode "output=$OUTPUT" \
        "$MONITOR_URL"
else
    curl -fsS --retry 3 \
        --data-urlencode "status=fail" \
        --data-urlencode "duration=$DURATION" \
        --data-urlencode "output=$OUTPUT" \
        "$MONITOR_URL"
fi

exit $EXIT_CODE

This gives you:

  • Success/failure tracking
  • Execution duration
  • Output logs
  • Failure context
  • Performance trends over time

Real-World Implementation Tips

1. Handle Network Issues

Always add retries and timeouts to your monitoring pings:

curl -fsS --retry 3 --retry-delay 5 --max-time 10 $MONITOR_URL

Use -f to fail on HTTP errors, -s for silent mode, -S to show errors.

2. Don't Let Monitoring Break Your Job

Wrap monitoring in a way that doesn't affect your main task:

# Run the actual job
/usr/local/bin/backup.sh
JOB_EXIT_CODE=$?

# Try to report status, but don't fail if monitoring is down
curl -fsS --retry 3 $MONITOR_URL || true

# Exit with the job's actual exit code
exit $JOB_EXIT_CODE

3. Set Realistic Grace Periods

Jobs don't always run at exactly the same time:

  • Server load varies
  • Some tasks take longer with more data
  • Network latency affects things

Set grace periods accordingly:

  • Fast jobs (< 1 min): 5-10 minute grace period
  • Medium jobs (5-30 min): 15-30 minute grace period
  • Long jobs (hours): 1-2 hour grace period
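
How you express the grace period depends on your tool, but most expose it as a setting right next to the schedule. A hypothetical config entry, in the same spirit as the runbook YAML later in this post (the field names are illustrative, not any specific product's API):

# monitoring-config.yaml (hypothetical)
monitors:
  - name: "Database Backup"
    schedule: "0 2 * * *"     # expected start: daily at 02:00
    grace_period: "30m"       # only alert if no ping has arrived by 02:30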

4. Monitor the Monitors

What happens if your monitoring service goes down? Have a backup:

PRIMARY_MONITOR="https://cronmonitor.app/ping/abc123"
BACKUP_MONITOR="https://backup-service.com/ping/xyz789"

curl -fsS --retry 2 $PRIMARY_MONITOR || \
    curl -fsS --retry 2 $BACKUP_MONITOR

5. Use Environment Variables

Don't hardcode monitoring URLs in scripts:

# /etc/cron.d/backups
MONITOR_URL=https://cronmonitor.app/ping/abc123
0 2 * * * user /usr/local/bin/backup.sh

#!/bin/bash
# backup.sh
set -e

# Only wire up monitoring when a URL is configured; the job runs either way
if [ -n "$MONITOR_URL" ]; then
    trap 'curl -fsS "$MONITOR_URL/fail"' ERR
fi

# Your job here

if [ -n "$MONITOR_URL" ]; then
    curl -fsS "$MONITOR_URL/success"
fi

Timezone Considerations

This is often overlooked but critical. Your server might be in UTC, your team in EST, and your monitoring service in another timezone.

Best Practice: Always think in UTC for cron schedules, translate to local time in your monitoring tool.

# Server in UTC, backup at 2 AM EST (7 AM UTC)
0 7 * * * /usr/local/bin/backup.sh

Configure your monitoring with:

  • Schedule: "Daily at 7:00 UTC" (system time)
  • Display: "2:00 AM EST" (human time)
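
One extra wrinkle: a schedule pinned to UTC drifts an hour against local wall-clock time when daylight saving kicks in. Some cron implementations - cronie, for example - support a CRON_TZ variable that interprets the schedule in a named timezone instead; check your crontab(5) man page before relying on it:

# /etc/cron.d/backups - CRON_TZ is honored by cronie, but not by every cron
CRON_TZ=America/New_York
0 2 * * * user /usr/local/bin/backup.sh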

Common Pitfalls to Avoid

1. Not Monitoring Start Time

Only checking if a job completed misses jobs that hang:

# BAD: Only ping at the end
run_backup
curl $MONITOR_URL

# GOOD: Ping start and end
curl "$MONITOR_URL/start"
run_backup
curl "$MONITOR_URL/end"

2. Ignoring Exit Codes

Your script might "finish" but with errors:

# BAD: Always reports success
backup.sh
curl $MONITOR_URL

# GOOD: Check exit code
if backup.sh; then
    curl "$MONITOR_URL/success"
else
    curl "$MONITOR_URL/fail"
fi

3. Alert Fatigue

Don't alert on every tiny issue:

  • Use grace periods
  • Group related alerts
  • Set up on-call rotations
  • Distinguish critical vs warning

4. No Runbook

When alerts fire at 3 AM, you want answers fast:

# monitoring-config.yaml
monitors:
  - name: "Database Backup"
    schedule: "0 2 * * *"
    runbook: |
      1. Check disk space: df -h /backups
      2. Check database connectivity: psql -c "\l"
      3. Review logs: tail -n 100 /var/log/backup.log
      4. Manual backup: /usr/local/bin/backup.sh
      5. Escalate to: db-team@company.com

Choosing a Monitoring Solution

Self-Hosted Options

Healthchecks.io (Open Source)

  • Free, self-hosted
  • Simple and reliable
  • Python/Django based
  • Good for small teams

Cronitor (Commercial, has open-source version)

  • Feature-rich
  • Beautiful UI
  • Higher cost

SaaS Options

My tool - CronMonitor

  • Dead simple setup
  • Timezone-aware
  • Generous free tier (10 monitors)
  • Built by someone who felt your pain

Others:

  • Cronitor (established, expensive)
  • Better Uptime (includes cron monitoring)
  • Dead Man's Snitch (simple, focused)

Choosing criteria:

  • Number of jobs you need to monitor
  • Budget
  • Need for self-hosting
  • Integration requirements (Slack, PagerDuty, etc.)

Complete Example: Production-Ready Script

Here's a fully instrumented backup script you can adapt:

#!/bin/bash
set -euo pipefail

# Configuration
BACKUP_DIR="/backups"
DB_NAME="production"
MONITOR_URL="${MONITOR_URL:-}"
RETENTION_DAYS=30
ALERT_EMAIL="admin@example.com"

# Setup logging (create the log directory first so tee has somewhere to write)
LOG_FILE="/var/log/backup/$(date +%Y-%m-%d).log"
mkdir -p "$(dirname "$LOG_FILE")"
exec 1> >(tee -a "$LOG_FILE")
exec 2>&1

log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}

# Signal start to monitoring
if [ -n "$MONITOR_URL" ]; then
    curl -fsS --retry 3 "$MONITOR_URL/start" || log "WARNING: Could not signal start"
fi

log "Starting backup process"

# Check prerequisites
if ! command -v pg_dump &> /dev/null; then
    log "ERROR: pg_dump not found"
    exit 1
fi

if [ ! -d "$BACKUP_DIR" ]; then
    log "ERROR: Backup directory $BACKUP_DIR does not exist"
    exit 1
fi

# Check disk space (need at least 10GB)
AVAILABLE=$(df -BG "$BACKUP_DIR" | awk 'NR==2 {print $4}' | sed 's/G//')
if [ "$AVAILABLE" -lt 10 ]; then
    log "ERROR: Insufficient disk space. Available: ${AVAILABLE}GB"
    exit 1
fi

# Run backup
BACKUP_FILE="$BACKUP_DIR/db-$(date +%Y%m%d-%H%M%S).sql.gz"
START_TIME=$(date +%s)

log "Creating backup: $BACKUP_FILE"

if pg_dump "$DB_NAME" | gzip > "$BACKUP_FILE"; then
    END_TIME=$(date +%s)
    DURATION=$((END_TIME - START_TIME))
    SIZE=$(du -h "$BACKUP_FILE" | cut -f1)

    log "Backup completed successfully in ${DURATION}s, size: $SIZE"

    # Verify backup is not empty
    if [ ! -s "$BACKUP_FILE" ]; then
        log "ERROR: Backup file is empty"
        rm "$BACKUP_FILE"
        exit 1
    fi

    # Cleanup old backups
    log "Cleaning up backups older than $RETENTION_DAYS days"
    find "$BACKUP_DIR" -name "db-*.sql.gz" -mtime +$RETENTION_DAYS -delete

    # Signal success
    if [ -n "$MONITOR_URL" ]; then
        curl -fsS --retry 3 \
            --data-urlencode "status=success" \
            --data-urlencode "duration=$DURATION" \
            --data-urlencode "size=$SIZE" \
            "$MONITOR_URL" || log "WARNING: Could not signal success"
    fi

    log "Backup process completed successfully"
    exit 0
else
    log "ERROR: Backup failed"

    # Signal failure
    if [ -n "$MONITOR_URL" ]; then
        curl -fsS --retry 3 \
            --data-urlencode "status=fail" \
            --data-urlencode "error=pg_dump failed" \
            "$MONITOR_URL" || log "WARNING: Could not signal failure"
    fi

    # Send email alert
    if command -v mail &> /dev/null; then
        echo "Backup failed. Check logs at $LOG_FILE" | \
            mail -s "ALERT: Backup Failed" "$ALERT_EMAIL"
    fi

    exit 1
fi

Conclusion

Monitoring cron jobs isn't optional for production systems. The question isn't "should we monitor?" but "how should we monitor?"

Start simple:

  1. Add health check pings to your most critical jobs
  2. Set up alerts for failures
  3. Track patterns over time
  4. Iterate and improve

Remember:

  • Cron jobs fail silently by design
  • You need active monitoring, not passive logging
  • Set realistic grace periods
  • Monitor the monitors
  • Document your runbooks

The peace of mind from knowing your backups actually ran? Worth every minute of setup time.


What's your cron monitoring strategy? Drop a comment - I'd love to hear how others are solving this problem!

If you're looking for a simple solution to get started, I built CronMonitor specifically for this. Free tier includes 10 monitors, no credit card needed.

P.S. - If you found this helpful, follow me for more DevOps and SaaS content!


Cover image: A terminal window showing cron job output with monitoring indicators
