How to Monitor Cron Jobs in 2026: A Complete Guide
Cron jobs are the silent workhorses of modern applications. They run backups, clean up data, send emails, sync with APIs, and handle countless other critical tasks. But here's the problem: when they fail, they fail silently.
I learned this the hard way when I discovered a month's worth of database backups had been failing. The cron job was still "running" - it just wasn't doing anything useful. That's when I realized: running a cron job and successfully completing it are two very different things.
The Problem with Traditional Cron
Traditional cron has zero built-in monitoring. You can log output to a file, sure:
0 2 * * * /usr/local/bin/backup.sh >> /var/log/backup.log 2>&1
But this means:
- You have to remember to check the logs
- Logs grow indefinitely (hello disk space issues)
- No alerts when something breaks
- You only notice when you need that backup
Cron's job is to run commands on schedule. It's not designed to tell you if those commands actually succeeded.
What We Need to Monitor
When monitoring cron jobs, we care about several things:
- Did it run at all? (The job might be disabled, the server might be down)
- Did it complete successfully? (Exit code 0 vs errors)
- Did it run on time? (Server overload, resource constraints)
- How long did it take? (Performance degradation over time)
- What was the output? (Errors, warnings, statistics)
Let's look at different approaches to solving this.
Approach 1: Email Alerts (Basic)
The simplest approach is using cron's built-in email feature:
MAILTO=admin@example.com
0 2 * * * /usr/local/bin/backup.sh
Pros:
- Zero setup
- Works out of the box
Cons:
- Cron emails whenever a job produces any output (stdout or stderr), not only on failure
- Requires mail server configuration
- No positive confirmation that a job ran - if it silently stops, you hear nothing
- Email overload from multiple jobs
- Can't track history or patterns
Verdict: Good for personal projects with 1-2 cron jobs. Not scalable.
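If you do stick with email, at least cut the noise: cron mails whenever a job prints anything, so discarding stdout means only stderr triggers a message. A minimal tweak to the crontab above:
MAILTO=admin@example.com
# Normal output is discarded; cron only emails when the job writes to stderr
0 2 * * * /usr/local/bin/backup.sh > /dev/null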
Approach 2: Log Files + Manual Checks
Slightly better - have the script write its own log with explicit success and failure entries:
#!/bin/bash
# backup.sh
LOG_FILE="/var/log/backup/$(date +%Y-%m-%d).log"
echo "[$(date)] Starting backup..." >> "$LOG_FILE"
if pg_dump mydb > "/backups/db-$(date +%Y%m%d).sql" 2>> "$LOG_FILE"; then
    echo "[$(date)] Backup completed successfully" >> "$LOG_FILE"
    exit 0
else
    echo "[$(date)] ERROR: Backup failed" >> "$LOG_FILE"
    exit 1
fi
Pros:
- Full control over logging
- Detailed output
- Historical record
Cons:
- Still requires manual checking
- No real-time alerts
- Log rotation complexity
- Disk space management
Verdict: Better, but you'll still miss failures.
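If you keep per-day log files like the one above, also schedule a cleanup so the directory doesn't grow forever. A minimal sketch (the 30-day retention is an assumption; adjust to taste):
# Delete backup logs older than 30 days, every day at 3 AM
0 3 * * * find /var/log/backup -name "*.log" -mtime +30 -delete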
Approach 3: Dead Man's Switch Pattern
This is where it gets interesting. Instead of monitoring for failures, we monitor for success. If we don't hear from the job, something's wrong.
The Basic Pattern
#!/bin/bash
# backup.sh
MONITOR_URL="https://cronmonitor.app/ping/your-unique-id"
# Run your backup
if pg_dump mydb > "/backups/db-$(date +%Y%m%d).sql"; then
    # Signal success
    curl -fsS --retry 3 "$MONITOR_URL/success"
    exit 0
else
    # Signal failure
    curl -fsS --retry 3 "$MONITOR_URL/fail"
    exit 1
fi
On the monitoring side, you set up an expected schedule:
- "This job should ping me every day at 2 AM"
- "If I don't hear from it by 2:30 AM, alert me"
- "If it pings /fail, alert me immediately"
Pros:
- Catches ALL failure modes (job disabled, server down, script errors)
- Real-time alerts
- Historical tracking
- Works from anywhere
Cons:
- Requires external service
- Dependency on network connectivity
- Potential costs (though many free tiers exist)
Verdict: Industry standard for production systems.
Approach 4: Full Monitoring Solution
For enterprise needs, combine monitoring with observability:
#!/bin/bash
# backup.sh with full monitoring
MONITOR_URL="https://cronmonitor.app/ping/your-unique-id"
# Start signal
curl -fsS --retry 3 "$MONITOR_URL/start"
# Capture start time
START_TIME=$(date +%s)
# Run backup; capture stderr for reporting while the dump itself goes to the file
OUTPUT=$(pg_dump mydb 2>&1 > "/backups/db-$(date +%Y%m%d).sql")
EXIT_CODE=$?
# Calculate duration
END_TIME=$(date +%s)
DURATION=$((END_TIME - START_TIME))
# Report results with context
if [ $EXIT_CODE -eq 0 ]; then
    curl -fsS --retry 3 \
        --data-urlencode "status=success" \
        --data-urlencode "duration=$DURATION" \
        --data-urlencode "output=$OUTPUT" \
        "$MONITOR_URL"
else
    curl -fsS --retry 3 \
        --data-urlencode "status=fail" \
        --data-urlencode "duration=$DURATION" \
        --data-urlencode "output=$OUTPUT" \
        "$MONITOR_URL"
fi
exit $EXIT_CODE
This gives you:
- Success/failure tracking
- Execution duration
- Output logs
- Failure context
- Performance trends over time
Real-World Implementation Tips
1. Handle Network Issues
Always add retries and timeouts to your monitoring pings:
curl -fsS --retry 3 --retry-delay 5 --max-time 10 "$MONITOR_URL"
Use -f to fail on HTTP errors, -s for silent mode, -S to show errors.
2. Don't Let Monitoring Break Your Job
Wrap monitoring in a way that doesn't affect your main task:
# Run the actual job
/usr/local/bin/backup.sh
JOB_EXIT_CODE=$?
# Try to report status, but don't fail if monitoring is down
curl -fsS --retry 3 "$MONITOR_URL" || true
# Exit with the job's actual exit code
exit $JOB_EXIT_CODE
3. Set Realistic Grace Periods
Jobs don't always run at exactly the same time:
- Server load varies
- Some tasks take longer with more data
- Network latency affects things
Set grace periods accordingly:
- Fast jobs (< 1 min): 5-10 minute grace period
- Medium jobs (5-30 min): 15-30 minute grace period
- Long jobs (hours): 1-2 hour grace period
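How you express the grace period depends on the tool. In a config-driven setup it might look something like this (hypothetical fields, in the same spirit as the monitoring-config.yaml example further down):
monitors:
  - name: "Database Backup"
    schedule: "0 2 * * *"
    grace_period: 30m  # medium job: only alert if no ping by 2:30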
4. Monitor the Monitors
What happens if your monitoring service goes down? Have a backup:
PRIMARY_MONITOR="https://cronmonitor.app/ping/abc123"
BACKUP_MONITOR="https://backup-service.com/ping/xyz789"
curl -fsS --retry 2 "$PRIMARY_MONITOR" || \
    curl -fsS --retry 2 "$BACKUP_MONITOR"
5. Use Environment Variables
Don't hardcode monitoring URLs in scripts:
# /etc/cron.d/backups
MONITOR_URL=https://cronmonitor.app/ping/abc123
0 2 * * * user /usr/local/bin/backup.sh
#!/bin/bash
# backup.sh
if [ -n "${MONITOR_URL:-}" ]; then
    # Report a failure automatically if any command below errors out
    trap 'curl -fsS "$MONITOR_URL/fail"' ERR
fi
# Your job here
if [ -n "${MONITOR_URL:-}" ]; then
    curl -fsS "$MONITOR_URL/success"
fi
Timezone Considerations
This is often overlooked but critical. Your server might be in UTC, your team in EST, and your monitoring service in another timezone.
Best Practice: Always think in UTC for cron schedules, translate to local time in your monitoring tool.
# Server in UTC, backup at 2 AM EST (7 AM UTC)
0 7 * * * /usr/local/bin/backup.sh
Configure your monitoring with:
- Schedule: "Daily at 7:00 UTC" (system time)
- Display: "2:00 AM EST" (human time)
Common Pitfalls to Avoid
1. Not Monitoring Start Time
Only checking if a job completed misses jobs that hang:
# BAD: Only ping at the end
run_backup
curl $MONITOR_URL
# GOOD: Ping start and end
curl "$MONITOR_URL/start"
run_backup
curl "$MONITOR_URL/end"
2. Ignoring Exit Codes
Your script might "finish" but with errors:
# BAD: Always reports success
backup.sh
curl $MONITOR_URL
# GOOD: Check exit code
if backup.sh; then
    curl "$MONITOR_URL/success"
else
    curl "$MONITOR_URL/fail"
fi
3. Alert Fatigue
Don't alert on every tiny issue:
- Use grace periods
- Group related alerts
- Set up on-call rotations
- Distinguish critical vs warning
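If your tool supports it, severity is worth encoding in the monitor definition itself so only truly critical jobs page someone. A hypothetical sketch in the same config style as the runbook example below:
monitors:
  - name: "Database Backup"
    severity: critical  # page the on-call
  - name: "Cache Warmup"
    severity: warning   # chat notification only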
4. No Runbook
When alerts fire at 3 AM, you want answers fast:
# monitoring-config.yaml
monitors:
  - name: "Database Backup"
    schedule: "0 2 * * *"
    runbook: |
      1. Check disk space: df -h /backups
      2. Check database connectivity: psql -c "\l"
      3. Review logs: tail -n 100 /var/log/backup.log
      4. Manual backup: /usr/local/bin/backup.sh
      5. Escalate to: db-team@company.com
Choosing a Monitoring Solution
Self-Hosted Options
Healthchecks.io (Open Source)
- Free, self-hosted
- Simple and reliable
- Python/Django based
- Good for small teams
Cronitor (Commercial, has open-source version)
- Feature-rich
- Beautiful UI
- Higher cost
SaaS Options
My tool - CronMonitor
- Dead simple setup
- Timezone-aware
- Generous free tier (10 monitors)
- Built by someone who felt your pain
Others:
- Cronitor (established, expensive)
- Better Uptime (includes cron monitoring)
- Dead Man's Snitch (simple, focused)
Choosing criteria:
- Number of jobs you need to monitor
- Budget
- Need for self-hosting
- Integration requirements (Slack, PagerDuty, etc.)
Complete Example: Production-Ready Script
Here's a fully instrumented backup script you can adapt:
#!/bin/bash
set -euo pipefail
# Configuration
BACKUP_DIR="/backups"
DB_NAME="production"
MONITOR_URL="${MONITOR_URL:-}"
RETENTION_DAYS=30
ALERT_EMAIL="admin@example.com"
# Setup logging
LOG_FILE="/var/log/backup/$(date +%Y-%m-%d).log"
exec 1> >(tee -a "$LOG_FILE")
exec 2>&1
log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}
# Signal start to monitoring
if [ -n "$MONITOR_URL" ]; then
    curl -fsS --retry 3 "$MONITOR_URL/start" || log "WARNING: Could not signal start"
fi
log "Starting backup process"
# Check prerequisites
if ! command -v pg_dump &> /dev/null; then
    log "ERROR: pg_dump not found"
    exit 1
fi
if [ ! -d "$BACKUP_DIR" ]; then
    log "ERROR: Backup directory $BACKUP_DIR does not exist"
    exit 1
fi
# Check disk space (need at least 10GB)
AVAILABLE=$(df -BG "$BACKUP_DIR" | awk 'NR==2 {print $4}' | sed 's/G//')
if [ "$AVAILABLE" -lt 10 ]; then
    log "ERROR: Insufficient disk space. Available: ${AVAILABLE}GB"
    exit 1
fi
# Run backup
BACKUP_FILE="$BACKUP_DIR/db-$(date +%Y%m%d-%H%M%S).sql.gz"
START_TIME=$(date +%s)
log "Creating backup: $BACKUP_FILE"
if pg_dump "$DB_NAME" | gzip > "$BACKUP_FILE"; then
    END_TIME=$(date +%s)
    DURATION=$((END_TIME - START_TIME))
    SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
    log "Backup completed successfully in ${DURATION}s, size: $SIZE"
    # Verify backup is not empty
    if [ ! -s "$BACKUP_FILE" ]; then
        log "ERROR: Backup file is empty"
        rm "$BACKUP_FILE"
        exit 1
    fi
    # Cleanup old backups
    log "Cleaning up backups older than $RETENTION_DAYS days"
    find "$BACKUP_DIR" -name "db-*.sql.gz" -mtime +$RETENTION_DAYS -delete
    # Signal success
    if [ -n "$MONITOR_URL" ]; then
        curl -fsS --retry 3 \
            --data-urlencode "status=success" \
            --data-urlencode "duration=$DURATION" \
            --data-urlencode "size=$SIZE" \
            "$MONITOR_URL" || log "WARNING: Could not signal success"
    fi
    log "Backup process completed successfully"
    exit 0
else
    log "ERROR: Backup failed"
    # Signal failure
    if [ -n "$MONITOR_URL" ]; then
        curl -fsS --retry 3 \
            --data-urlencode "status=fail" \
            --data-urlencode "error=pg_dump failed" \
            "$MONITOR_URL" || log "WARNING: Could not signal failure"
    fi
    # Send email alert
    if command -v mail &> /dev/null; then
        echo "Backup failed. Check logs at $LOG_FILE" | \
            mail -s "ALERT: Backup Failed" "$ALERT_EMAIL"
    fi
    exit 1
fi
Conclusion
Monitoring cron jobs isn't optional for production systems. The question isn't "should we monitor?" but "how should we monitor?"
Start simple:
- Add health check pings to your most critical jobs
- Set up alerts for failures
- Track patterns over time
- Iterate and improve
Remember:
- Cron jobs fail silently by design
- You need active monitoring, not passive logging
- Set realistic grace periods
- Monitor the monitors
- Document your runbooks
The peace of mind from knowing your backups actually ran? Worth every minute of setup time.
What's your cron monitoring strategy? Drop a comment - I'd love to hear how others are solving this problem!
If you're looking for a simple solution to get started, I built CronMonitor specifically for this. Free tier includes 10 monitors, no credit card needed.
P.S. - If you found this helpful, follow me for more DevOps and SaaS content!