Alex Spinov

Your Cron Jobs Are Failing Silently (Here's a 50-Line Fix)

Last month, my backup cron job failed at 3 AM on a Saturday. I didn't notice until Monday morning when I needed to restore data.

Three days of backups — gone.

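The job had been failing with a disk space error, but cron doesn't act on exit codes. It runs the command and moves on, and unless mail delivery is configured (MAILTO plus a working mail transfer agent), whatever the job printed is lost too.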

The Silent Killer

Here's what most cron setups look like:

# crontab
0 3 * * * /path/to/backup.sh
0 6 * * * /path/to/report.sh
0 */2 * * * /path/to/cleanup.sh

No monitoring. No alerts. No logging. If any of these fail, you won't know until the damage is done.

The Fix: Wrap Every Job

I built a simple Python wrapper that:

  1. Logs start time, duration, and exit code
  2. Sends a Telegram/Slack alert on failure
  3. Detects missed runs

The core of it, with logging and Telegram alerts, looks like this:
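
import json
import subprocess
import sys
import urllib.request
from datetime import datetime
from pathlib import Path

# One JSON file stores the most recent result of every monitored job.
DB = Path.home() / '.cron-monitor.json'

def load_db():
    return json.loads(DB.read_text()) if DB.exists() else {'jobs': {}}

def save_db(db):
    DB.write_text(json.dumps(db, indent=2, default=str))

def alert(message, bot_token, chat_id):
    # Telegram Bot API: POST the message as JSON to the sendMessage method.
    url = f'https://api.telegram.org/bot{bot_token}/sendMessage'
    data = json.dumps({'chat_id': chat_id, 'text': message}).encode()
    req = urllib.request.Request(url, data=data,
        headers={'Content-Type': 'application/json'})
    urllib.request.urlopen(req, timeout=10)

def run_job(name, command, bot_token=None, chat_id=None):
    start = datetime.now()

    # Capture output so stderr can be quoted in the alert.
    result = subprocess.run(command, capture_output=True, text=True)

    # Load the DB only after the job has finished. Loading it before a long
    # run would overwrite results that other jobs saved in the meantime.
    db = load_db()
    db['jobs'][name] = {
        'last_run': start.isoformat(),
        'duration': (datetime.now() - start).total_seconds(),
        'exit_code': result.returncode,
        'status': 'success' if result.returncode == 0 else 'failed'
    }
    save_db(db)

    if result.returncode != 0 and bot_token and chat_id:
        try:
            alert(
                f"🔴 CRON FAILED: {name}\n"
                f"Exit code: {result.returncode}\n"
                f"Error: {result.stderr[:200]}",
                bot_token, chat_id
            )
        except OSError as exc:
            # Network and HTTP errors from urllib are OSError subclasses.
            # A failed alert must not mask the job's own exit code.
            print(f"Alert for {name} failed: {exc}", file=sys.stderr)

    return result.returncode

# Usage: exit with the job's code so the failure stays visible to the caller
sys.exit(run_job("daily-backup", ["bash", "/path/to/backup.sh"],
                 bot_token="YOUR_BOT_TOKEN", chat_id="YOUR_CHAT_ID"))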
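
The listing above records each run but does not check for runs that never happened. Here is a minimal sketch of that third step, not the author's implementation. It assumes a hypothetical `max_age` mapping from job name to the longest acceptable gap between runs; neither that mapping nor `find_missed_runs` appears in the original script.

```python
from datetime import datetime, timedelta

def find_missed_runs(jobs, max_age, now=None):
    """Return the names of jobs whose last recorded run is too old.

    jobs    -- the 'jobs' mapping stored in ~/.cron-monitor.json
    max_age -- hypothetical mapping: job name -> timedelta (interval plus slack)
    """
    now = now or datetime.now()
    missed = []
    for name, limit in max_age.items():
        record = jobs.get(name)
        if record is None:
            # The job has never reported a run at all.
            missed.append(name)
            continue
        last_run = datetime.fromisoformat(record['last_run'])
        if now - last_run > limit:
            missed.append(name)
    return missed
```

Something still has to call this on a schedule, for example a separate cron entry that loads the JSON file and sends an alert for every name returned.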

Updated Crontab

# Before (silent failures)
0 3 * * * /path/to/backup.sh

# After (monitored)
0 3 * * * python3 /path/to/monitor.py --name "daily-backup" -- bash /path/to/backup.sh
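
The monitored crontab line passes `--name` and a command after `--`, while the listing earlier hard-codes both. One way the script could read them is sketched below. `parse_cli` is a hypothetical helper, not part of the original code, and it splits on `--` by hand so the result does not depend on how a given argparse version treats that separator.

```python
import argparse

def parse_cli(argv):
    """Split 'monitor.py --name NAME -- COMMAND...' into (name, command)."""
    if '--' not in argv:
        raise SystemExit('usage: monitor.py --name NAME -- COMMAND [ARGS...]')
    split = argv.index('--')

    # Everything before '--' is an option for the monitor itself.
    parser = argparse.ArgumentParser(prog='monitor.py')
    parser.add_argument('--name', required=True,
                        help='label stored in the JSON log')
    options = parser.parse_args(argv[:split])

    # Everything after '--' is the job to run, passed through untouched.
    command = argv[split + 1:]
    if not command:
        parser.error('no command given after --')
    return options.name, command
```

At the bottom of `monitor.py`, `name, command = parse_cli(sys.argv[1:])` followed by a call to `run_job(name, command, ...)` would replace the hard-coded usage line.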

What You Get

Every job is now tracked:

=== Cron Job Monitor ===

Job: daily-backup
  Last run: 2026-03-25 03:00:01
  Status: ✅ Success (exit code 0)
  Duration: 4m 23s

Job: db-cleanup
  Last run: 2026-03-25 02:00:00
  Status: 🔴 Failed (exit code 1)
  Error: "connection refused"
  Alert sent: Telegram ✅
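
A report like this can be rendered straight from the JSON file. The sketch below is an assumption about how that might look, using only the fields the wrapper stores (`last_run`, `duration`, `exit_code`). The error text and "Alert sent" lines in the sample would need extra fields that the short listing does not save. `format_duration` and `render_report` are hypothetical names.

```python
def format_duration(seconds):
    """Turn a number of seconds into a short '4m 23s' style string."""
    minutes, secs = divmod(int(seconds), 60)
    return f'{minutes}m {secs}s' if minutes else f'{secs}s'

def render_report(jobs):
    """Build the text report from the 'jobs' mapping in the JSON file."""
    lines = ['=== Cron Job Monitor ===', '']
    for name, job in jobs.items():
        ok = job['exit_code'] == 0
        status = '✅ Success' if ok else '🔴 Failed'
        # isoformat() gives '2026-03-25T03:00:01.123456'; keep date and time.
        last_run = job['last_run'].replace('T', ' ')[:19]
        lines += [
            f'Job: {name}',
            f'  Last run: {last_run}',
            f"  Status: {status} (exit code {job['exit_code']})",
            f"  Duration: {format_duration(job['duration'])}",
            '',
        ]
    return '\n'.join(lines)
```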

Why Not Use Existing Tools?

  • Healthchecks.io — great service, but it's external. I want self-hosted.
  • Cronitor — $20/month. This is free.
  • systemd timers — powerful but complex to set up.
  • Dead Man's Snitch — SaaS, costs money.

My solution: 50 lines of Python, zero dependencies, instant setup.
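
Staying with zero dependencies, a Slack alert can follow the same urllib pattern as the Telegram one. This is a sketch rather than the author's code: it assumes a Slack incoming webhook, which accepts a JSON body with a `text` field. `build_slack_request` and `alert_slack` are hypothetical names, and the webhook URL is whatever Slack issues for your workspace.

```python
import json
import urllib.request

def build_slack_request(webhook_url, message):
    """Prepare the POST request for a Slack incoming webhook."""
    data = json.dumps({'text': message}).encode()
    return urllib.request.Request(
        webhook_url, data=data,
        headers={'Content-Type': 'application/json'})

def alert_slack(message, webhook_url):
    """Send the message; raises an OSError subclass on network failure."""
    urllib.request.urlopen(build_slack_request(webhook_url, message),
                           timeout=10)
```

Keeping request construction separate from sending makes the payload easy to check without touching the network.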

3 Bonus Tips for Cron

1. Always redirect output

0 3 * * * /path/to/backup.sh >> /var/log/backup.log 2>&1

2. Use flock to prevent overlapping runs

0 3 * * * flock -n /tmp/backup.lock /path/to/backup.sh
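
If you would rather keep the lock inside a Python wrapper than in the crontab line, the standard library exposes the same mechanism through `fcntl.flock` on Unix-like systems. A minimal sketch, with `try_lock` as a hypothetical helper name:

```python
import fcntl

def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, 'w')
    try:
        # LOCK_NB makes the call fail immediately instead of waiting,
        # which mirrors the -n flag of the flock command.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

The lock lasts as long as the returned file stays open, so the wrapper should hold on to it until the job finishes. It is released when the file is closed or the process exits.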

3. Set PATH explicitly

PATH=/usr/local/bin:/usr/bin:/bin
0 3 * * * /path/to/backup.sh

Cron runs jobs with a minimal PATH (often just /usr/bin:/bin). If your script works in a terminal but fails under cron, this is a likely cause.
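
The same idea can be applied inside a Python wrapper by pinning PATH for the child process. This is a sketch under the assumption that the job is launched with `subprocess.run`. `CRON_SAFE_PATH` and `run_with_path` are hypothetical names, and the value is the one from the crontab example above.

```python
import os
import subprocess

CRON_SAFE_PATH = '/usr/local/bin:/usr/bin:/bin'

def run_with_path(command, path=CRON_SAFE_PATH):
    """Run command with PATH pinned, leaving other variables unchanged."""
    env = dict(os.environ, PATH=path)
    return subprocess.run(command, env=env, capture_output=True, text=True)
```

The job then sees the same PATH whether it is started by cron or by hand, which removes one difference between the two environments.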


The full monitor with Telegram, Slack, timeout detection, and missed run alerts is on GitHub.
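
The GitHub version is not reproduced here, so the following is only a guess at how timeout detection might be done with the standard library. `subprocess.run` accepts a `timeout` argument, kills the child when it expires, and raises `subprocess.TimeoutExpired`. `run_with_timeout` is a hypothetical name.

```python
import subprocess

def run_with_timeout(command, timeout_seconds):
    """Run command and return (exit_code, stderr, timed_out)."""
    try:
        result = subprocess.run(command, capture_output=True, text=True,
                                timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        # The child has already been killed by subprocess.run at this point.
        return None, '', True
    return result.returncode, result.stderr, False
```

A hung job would then be reported as a failure with its own message instead of blocking the wrapper indefinitely.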

What's the worst cron failure you've had? I know I'm not the only one who lost backups.

Follow for more DevOps and automation content.