Last month, my backup cron job failed at 3 AM on a Saturday. I didn't notice until Monday morning when I needed to restore data.
Three days of backups — gone.
The job had been failing with a disk-space error, but cron doesn't act on exit codes by default. It just runs the command and moves on (it only emails output if MAILTO is configured and a working mailer exists — which most servers don't have).
The Silent Killer
Here's what most cron setups look like:
# crontab
0 3 * * * /path/to/backup.sh
0 6 * * * /path/to/report.sh
0 */2 * * * /path/to/cleanup.sh
No monitoring. No alerts. No logging. If any of these fail, you won't know until the damage is done.
The Fix: Wrap Every Job
I built a simple Python wrapper that:
- Logs start/end time and exit code
- Sends a Telegram/Slack alert on failure
- Detects missed runs
import subprocess
import json
from datetime import datetime
from pathlib import Path
import urllib.request

# State file tracking every job's last run
DB = Path.home() / '.cron-monitor.json'

def load_db():
    return json.loads(DB.read_text()) if DB.exists() else {'jobs': {}}

def save_db(db):
    DB.write_text(json.dumps(db, indent=2, default=str))

def alert(message, bot_token, chat_id):
    """Send a failure notification via the Telegram Bot API."""
    url = f'https://api.telegram.org/bot{bot_token}/sendMessage'
    data = json.dumps({'chat_id': chat_id, 'text': message}).encode()
    req = urllib.request.Request(url, data=data,
                                 headers={'Content-Type': 'application/json'})
    urllib.request.urlopen(req, timeout=10)

def run_job(name, command, bot_token=None, chat_id=None):
    """Run a command, record the result, and alert on non-zero exit."""
    db = load_db()
    start = datetime.now()
    result = subprocess.run(command, capture_output=True, text=True)
    db['jobs'][name] = {
        'last_run': start.isoformat(),
        'duration': (datetime.now() - start).total_seconds(),
        'exit_code': result.returncode,
        'status': 'success' if result.returncode == 0 else 'failed'
    }
    save_db(db)
    if result.returncode != 0 and bot_token:
        alert(
            f"🔴 CRON FAILED: {name}\n"
            f"Exit code: {result.returncode}\n"
            f"Error: {result.stderr[:200]}",
            bot_token, chat_id
        )

# Usage
run_job("daily-backup", ["bash", "/path/to/backup.sh"],
        bot_token="YOUR_BOT_TOKEN", chat_id="YOUR_CHAT_ID")
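The wrapper only records runs that actually happen — catching a job that never started needs a separate check against the state file. Here's a minimal sketch of how that could work (the `is_missed` name and `expected_interval_hours` parameter are mine, not necessarily what's in the full version):

```python
from datetime import datetime, timedelta

def is_missed(job, expected_interval_hours, now=None):
    """True if the job's last recorded run is older than the expected
    interval plus a small grace period for slow starts."""
    if job is None:
        return True  # never ran at all
    now = now or datetime.now()
    last_run = datetime.fromisoformat(job['last_run'])
    grace = timedelta(minutes=10)
    return now - last_run > timedelta(hours=expected_interval_hours) + grace
```

A separate cron entry can call this for each known job and fire the same `alert()` when it returns True — which also covers the case where the wrapper itself never got to run.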
Updated Crontab
# Before (silent failures)
0 3 * * * /path/to/backup.sh
# After (monitored)
0 3 * * * python3 /path/to/monitor.py --name "daily-backup" -- bash /path/to/backup.sh
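That crontab line assumes monitor.py exposes a small CLI with a `--name` flag and a `--` separator before the wrapped command. This is a sketch of how I'd wire that up with argparse (the exact flags in the GitHub version may differ):

```python
import argparse

def parse_cli(argv):
    """Split 'monitor.py --name X -- cmd args...' into (name, command)."""
    # Split on '--' manually so the wrapped command's own flags
    # never get interpreted as monitor.py options.
    if '--' in argv:
        split = argv.index('--')
        own, command = argv[:split], argv[split + 1:]
    else:
        own, command = argv, []
    parser = argparse.ArgumentParser(description='Run a command and record the result')
    parser.add_argument('--name', required=True, help='job name used as the state-file key')
    args = parser.parse_args(own)
    return args.name, command
```

The parsed pieces then go straight into `run_job(name, command)`.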
What You Get
Every job is now tracked:
=== Cron Job Monitor ===
Job: daily-backup
Last run: 2026-03-25 03:00:01
Status: ✅ Success (exit code 0)
Duration: 4m 23s
Job: db-cleanup
Last run: 2026-03-25 02:00:00
Status: 🔴 Failed (exit code 1)
Error: "connection refused"
Alert sent: Telegram ✅
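That report is just a formatted dump of the JSON state file. A minimal printer could look like this (the duration formatting is my guess at how the full version renders it):

```python
def format_duration(seconds):
    """Render seconds in the 'Xm Ys' style shown in the report."""
    minutes, secs = divmod(int(seconds), 60)
    return f'{minutes}m {secs}s'

def render_report(db):
    """Format the state-file dict as the human-readable status report."""
    lines = ['=== Cron Job Monitor ===']
    for name, job in db['jobs'].items():
        icon = '✅' if job['status'] == 'success' else '🔴'
        lines.append(f'Job: {name}')
        lines.append(f"Last run: {job['last_run']}")
        lines.append(f"Status: {icon} {job['status'].capitalize()} (exit code {job['exit_code']})")
        lines.append(f"Duration: {format_duration(job['duration'])}")
    return '\n'.join(lines)
```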
Why Not Use Existing Tools?
- Healthchecks.io — great service, but it's external. I want self-hosted.
- Cronitor — $20/month. This is free.
- systemd timers — powerful but complex to set up.
- Dead Man's Snitch — SaaS, costs money.
My solution: 50 lines of Python, zero dependencies, instant setup.
3 Bonus Tips for Cron
1. Always redirect output
0 3 * * * /path/to/backup.sh >> /var/log/backup.log 2>&1
2. Use flock to prevent overlapping runs
0 3 * * * flock -n /tmp/backup.lock /path/to/backup.sh
3. Set PATH explicitly
PATH=/usr/local/bin:/usr/bin:/bin
0 3 * * * /path/to/backup.sh
Cron runs with a minimal PATH. Your script works in the terminal but fails under cron? This is usually why.
The full monitor with Telegram, Slack, timeout detection, and missed run alerts is on GitHub.
What's the worst cron failure you've had? I know I'm not the only one who lost backups.
Follow for more DevOps and automation content.