How to Monitor Cron Jobs in 2026: A Complete Guide
Cron jobs are the silent workhorses of modern applications. They run backups, clean up data, send emails, sync with APIs, and handle countless other critical tasks. But here's the problem: when they fail, they fail silently.
I learned this the hard way when I discovered a month's worth of database backups had been failing. The cron job was still "running" - it just wasn't doing anything useful. That's when I realized: running a cron job and successfully completing it are two very different things.
The Problem with Traditional Cron
Traditional cron has zero built-in monitoring. You can log output to a file, sure:
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1
But this means:
- You have to remember to check the logs
- Logs grow indefinitely (hello disk space issues)
- No alerts when something breaks
- You only notice when you need that backup
Cron's job is to run commands on schedule. It's not designed to tell you if those commands actually succeeded.
What We Need to Monitor
When monitoring cron jobs, we care about several things:
- Did it run at all? (The job might be disabled, the server might be down)
- Did it complete successfully? (Exit code 0 vs errors)
- Did it run on time? (Server overload, resource constraints)
- How long did it take? (Performance degradation over time)
- What was the output? (Errors, warnings, statistics)
Let's look at different approaches to solving this.
Approach 1: Email Alerts (Basic)
The simplest approach is using cron's built-in email feature:
MAILTO=admin@example.com
0 2 * * * /usr/local/bin/backup.sh
Pros:
- Zero setup
- Works out of the box
Cons:
- Cron emails whenever a job produces any output (stdout or stderr), not only on failure
- Requires mail server configuration
- No positive confirmation that a job ran - if it silently stops, you hear nothing
- Email overload from multiple jobs
- Can't track history or patterns
Verdict: Good for personal projects with 1-2 cron jobs. Not scalable.
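If you do stick with email, at least cut the noise: cron mails whenever a job prints anything, so discarding stdout means only stderr triggers a message. A minimal tweak to the crontab above:
MAILTO=admin@example.com
# Normal output is discarded; cron only emails when the job writes to stderr
0 2 * * * /usr/local/bin/backup.sh > /dev/null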
Approach 2: Log Files + Manual Checks
Slightly better - have the script write its own log with explicit success and failure entries:
#!/bin/bash
# backup.sh
LOG_FILE="/var/log/backup/$(date +%Y-%m-%d).log"
echo "[$(date)] Starting backup..." >> "$LOG_FILE"
if pg_dump mydb > "/backups/db-$(date +%Y%m%d).sql" 2>> "$LOG_FILE"; then
    echo "[$(date)] Backup completed successfully" >> "$LOG_FILE"
    exit 0
else
    echo "[$(date)] ERROR: Backup failed" >> "$LOG_FILE"
    exit 1
fi
Pros:
- Full control over logging
- Detailed output
- Historical record
Cons:
- Still requires manual checking
- No real-time alerts
- Log rotation complexity
- Disk space management
Verdict: Better, but you'll still miss failures.
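If you keep per-day log files like the one above, also schedule a cleanup so the directory doesn't grow forever. A minimal sketch (the 30-day retention is an assumption; adjust to taste):
# Delete backup logs older than 30 days, every day at 3 AM
0 3 * * * find /var/log/backup -name "*.log" -mtime +30 -delete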
Approach 3: Dead Man's Switch Pattern
This is where it gets interesting. Instead of monitoring for failures, we monitor for success. If we don't hear from the job, something's wrong.
The Basic Pattern
#!/bin/bash
# backup.sh
MONITOR_URL="https://cronmonitor.app/ping/your-unique-id"
# Run your backup
if pg_dump mydb > "/backups/db-$(date +%Y%m%d).sql"; then
    # Signal success
    curl -fsS --retry 3 "$MONITOR_URL/success"
    exit 0
else
    # Signal failure
    curl -fsS --retry 3 "$MONITOR_URL/fail"
    exit 1
fi
On the monitoring side, you set up an expected schedule:
- "This job should ping me every day at 2 AM"
- "If I don't hear from it by 2:30 AM, alert me"
- "If it pings /fail, alert me immediately"
Pros:
- Catches ALL failure modes (job disabled, server down, script errors)
- Real-time alerts
- Historical tracking
- Works from anywhere
Cons:
- Requires external service
- Dependency on network connectivity
- Potential costs (though many free tiers exist)
Verdict: Industry standard for production systems.
Approach 4: Full Monitoring Solution
For enterprise needs, combine monitoring with observability:
#!/bin/bash
# backup.sh with full monitoring
MONITOR_URL="https://cronmonitor.app/ping/your-unique-id"
# Start signal
curl -fsS --retry 3 "$MONITOR_URL/start"
# Capture start time
START_TIME=$(date +%s)
# Run backup; capture stderr for reporting while the dump itself goes to the file
OUTPUT=$(pg_dump mydb 2>&1 > "/backups/db-$(date +%Y%m%d).sql")
EXIT_CODE=$?
# Calculate duration
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))
# Report results with context
if [ $EXIT_CODE -eq 0 ]; then
    curl -fsS --retry 3 \
        --data-urlencode "status=success" \
        --data-urlencode "duration=$DURATION" \
        --data-urlencode "output=$OUTPUT" \
        "$MONITOR_URL"
else
    curl -fsS --retry 3 \
        --data-urlencode "status=fail" \
        --data-urlencode "duration=$DURATION" \
        --data-urlencode "output=$OUTPUT" \
        "$MONITOR_URL"
fi
exit $EXIT_CODE
This gives you:
- Success/failure tracking
- Execution duration
- Output logs
- Failure context
- Performance trends over time
Real-World Implementation Tips
1. Handle Network Issues
Always add retries and timeouts to your monitoring pings:
curl -fsS --retry 3 --retry-delay 5 --max-time 10 "$MONITOR_URL"
Use -f to fail on HTTP errors, -s for silent mode, -S to show errors.
2. Don't Let Monitoring Break Your Job
Wrap monitoring in a way that doesn't affect your main task:
# Run the actual job
/usr/local/bin/backup.sh
JOB_EXIT_CODE=$?
# Try to report status, but don't fail if monitoring is down
curl -fsS --retry 3 "$MONITOR_URL" || true
# Exit with the job's actual exit code
exit $JOB_EXIT_CODE
3. Set Realistic Grace Periods
Jobs don't always run at exactly the same time:
- Server load varies
- Some tasks take longer with more data
- Network latency affects things
Set grace periods accordingly:
- Fast jobs (< 1 min): 5-10 minute grace period
- Medium jobs (5-30 min): 15-30 minute grace period
- Long jobs (hours): 1-2 hour grace period
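How you express the grace period depends on the tool. In a config-driven setup it might look something like this (hypothetical fields, in the same spirit as the monitoring-config.yaml example further down):
monitors:
  - name: "Database Backup"
    schedule: "0 2 * * *"
    grace_period: 30m  # medium job: only alert if no ping by 2:30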
4. Monitor the Monitors
What happens if your monitoring service goes down? Have a backup:
PRIMARY_MONITOR="https://cronmonitor.app/ping/abc123"
BACKUP_MONITOR="https://backup-service.com/ping/xyz789"
curl -fsS --retry 2 "$PRIMARY_MONITOR" || \
    curl -fsS --retry 2 "$BACKUP_MONITOR"
5. Use Environment Variables
Don't hardcode monitoring URLs in scripts:
# /etc/cron.d/backups
MONITOR_URL=https://cronmonitor.app/ping/abc123
0 2 * * * user /usr/local/bin/backup.sh
#!/bin/bash
# backup.sh
if [ -n "${MONITOR_URL:-}" ]; then
    # Report a failure automatically if any command below errors out
    trap 'curl -fsS "$MONITOR_URL/fail"' ERR
fi
# Your job here
if [ -n "${MONITOR_URL:-}" ]; then
    curl -fsS "$MONITOR_URL/success"
fi
Timezone Considerations
This is often overlooked but critical. Your server might be in UTC, your team in EST, and your monitoring service in another timezone.
Best Practice: Always think in UTC for cron schedules, translate to local time in your monitoring tool.
# Server in UTC, backup at 2 AM EST (7 AM UTC)
0 7 * * * /usr/local/bin/backup.sh
Configure your monitoring with:
- Schedule: "Daily at 7:00 UTC" (system time)
- Display: "2:00 AM EST" (human time)
Common Pitfalls to Avoid
1. Not Monitoring Start Time
Only checking if a job completed misses jobs that hang:
# BAD: Only ping at the end
run_backup
curl $MONITOR_URL
# GOOD: Ping start and end
curl "$MONITOR_URL/start"
run_backup
curl "$MONITOR_URL/end"
2. Ignoring Exit Codes
Your script might "finish" but with errors:
# BAD: Always reports success
backup.sh
curl $MONITOR_URL
# GOOD: Check exit code
if backup.sh; then
    curl "$MONITOR_URL/success"
else
    curl "$MONITOR_URL/fail"
fi
3. Alert Fatigue
Don't alert on every tiny issue:
- Use grace periods
- Group related alerts
- Set up on-call rotations
- Distinguish critical vs warning
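If your tool supports it, severity is worth encoding in the monitor definition itself so only truly critical jobs page someone. A hypothetical sketch in the same config style as the runbook example below:
monitors:
  - name: "Database Backup"
    severity: critical  # page the on-call
  - name: "Cache Warmup"
    severity: warning   # chat notification only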
4. No Runbook
When alerts fire at 3 AM, you want answers fast:
# monitoring-config.yaml
monitors:
  - name: "Database Backup"
    schedule: "0 2 * * *"
    runbook: |
      1. Check disk space: df -h /backups
      2. Check database connectivity: psql -c "\l"
      3. Review logs: tail -n 100 /var/log/backup.log
      4. Manual backup: /usr/local/bin/backup.sh
      5. Escalate to: db-team@company.com
Choosing a Monitoring Solution
Self-Hosted Options
Healthchecks.io (Open Source)
- Free, self-hosted
- Simple and reliable
- Python/Django based
- Good for small teams
Cronitor (Commercial, has open-source version)
- Feature-rich
- Beautiful UI
- Higher cost
SaaS Options
My tool - CronMonitor
- Dead simple setup
- Timezone-aware
- Generous free tier (10 monitors)
- Built by someone who felt your pain
Others:
- Cronitor (established, expensive)
- Better Uptime (includes cron monitoring)
- Dead Man's Snitch (simple, focused)
Choosing criteria:
- Number of jobs you need to monitor
- Budget
- Need for self-hosting
- Integration requirements (Slack, PagerDuty, etc.)
Complete Example: Production-Ready Script
Here's a fully instrumented backup script you can adapt:
#!/bin/bash
set -euo pipefail
# Configuration
BACKUP_DIR="/backups"
DB_NAME="production"
MONITOR_URL="${MONITOR_URL:-}"
RETENTION_DAYS=30
ALERT_EMAIL="admin@example.com"
# Setup logging
LOG_FILE="/var/log/backup/$(date +%Y-%m-%d).log"
exec 1> >(tee -a "$LOG_FILE")
exec 2>&1
log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}
# Signal start to monitoring
if [ -n "$MONITOR_URL" ]; then
    curl -fsS --retry 3 "$MONITOR_URL/start" || log "WARNING: Could not signal start"
fi
log "Starting backup process"
# Check prerequisites
if ! command -v pg_dump &> /dev/null; then
    log "ERROR: pg_dump not found"
    exit 1
fi
if [ ! -d "$BACKUP_DIR" ]; then
    log "ERROR: Backup directory $BACKUP_DIR does not exist"
    exit 1
fi
# Check disk space (need at least 10GB)
AVAILABLE=$(df -BG "$BACKUP_DIR" | awk 'NR==2 {print $4}' | sed 's/G//')
if [ "$AVAILABLE" -lt 10 ]; then
    log "ERROR: Insufficient disk space. Available: ${AVAILABLE}GB"
    exit 1
fi
# Run backup
BACKUP_FILE="$BACKUP_DIR/db-$(date +%Y%m%d-%H%M%S).sql.gz"
START_TIME=$(date +%s)
log "Creating backup: $BACKUP_FILE"
if pg_dump "$DB_NAME" | gzip > "$BACKUP_FILE"; then
    END_TIME=$(date +%s)
    DURATION=$((END_TIME - START_TIME))
    SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
    log "Backup completed successfully in ${DURATION}s, size: $SIZE"
    # Verify backup is not empty
    if [ ! -s "$BACKUP_FILE" ]; then
        log "ERROR: Backup file is empty"
        rm "$BACKUP_FILE"
        exit 1
    fi
    # Cleanup old backups
    log "Cleaning up backups older than $RETENTION_DAYS days"
    find "$BACKUP_DIR" -name "db-*.sql.gz" -mtime +$RETENTION_DAYS -delete
    # Signal success
    if [ -n "$MONITOR_URL" ]; then
        curl -fsS --retry 3 \
            --data-urlencode "status=success" \
            --data-urlencode "duration=$DURATION" \
            --data-urlencode "size=$SIZE" \
            "$MONITOR_URL" || log "WARNING: Could not signal success"
    fi
    log "Backup process completed successfully"
    exit 0
else
    log "ERROR: Backup failed"
    # Signal failure
    if [ -n "$MONITOR_URL" ]; then
        curl -fsS --retry 3 \
            --data-urlencode "status=fail" \
            --data-urlencode "error=pg_dump failed" \
            "$MONITOR_URL" || log "WARNING: Could not signal failure"
    fi
    # Send email alert
    if command -v mail &> /dev/null; then
        echo "Backup failed. Check logs at $LOG_FILE" | \
            mail -s "ALERT: Backup Failed" "$ALERT_EMAIL"
    fi
    exit 1
fi
Conclusion
Monitoring cron jobs isn't optional for production systems. The question isn't "should we monitor?" but "how should we monitor?"
Start simple:
- Add health check pings to your most critical jobs
- Set up alerts for failures
- Track patterns over time
- Iterate and improve
Remember:
- Cron jobs fail silently by design
- You need active monitoring, not passive logging
- Set realistic grace periods
- Monitor the monitors
- Document your runbooks
The peace of mind from knowing your backups actually ran? Worth every minute of setup time.
What's your cron monitoring strategy? Drop a comment - I'd love to hear how others are solving this problem!
If you're looking for a simple solution to get started, I built CronMonitor specifically for this. Free tier includes 10 monitors, no credit card needed.
P.S. - If you found this helpful, follow me for more DevOps and SaaS content!