
Jack

Posted on • Originally published at anethoth.com

Stop Using crontab -l to Debug Cron Jobs (Here's What to Do Instead)

Cron jobs are the duct tape of infrastructure. They hold everything together until they silently fail at 3am and nobody notices until Monday.

I've been building CronPing — a cron job monitoring API — and I've seen every way cron can go wrong. Here are the patterns that actually work.

The Problem with crontab -l

When a cron job fails, most developers run crontab -l to verify the schedule. But the schedule is rarely the problem. The real issues are:

  1. Environment variables — cron doesn't load your .bashrc
  2. PATH differences — /usr/local/bin isn't in cron's PATH
  3. Silent failures — the job ran but the command failed
  4. Timing drift — the job takes longer than the interval
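A quick way to catch the first two failure modes is to have the job itself dump the environment it actually runs with, then diff that against your interactive shell. A minimal Python sketch (the output path and function name are just examples, not part of any tool):

```python
import json
import os
from datetime import datetime, timezone

def dump_env_snapshot(path="/tmp/cron-env.json"):
    """Write the current process environment to a JSON file.

    Call this at the top of a cron job, then compare the file against
    `env` from your login shell to spot missing variables and PATH gaps.
    """
    snapshot = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "path": os.environ.get("PATH", ""),
        "env": dict(os.environ),
    }
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot

if __name__ == "__main__":
    snap = dump_env_snapshot()
    print(f"PATH under this run: {snap['path']}")
```

Run it once from cron and once from your terminal; the difference between the two `path` values is usually the whole bug.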

Pattern 1: The Dead Man's Switch

Instead of checking if a cron job ran, check if it didn't run. This is the dead man's switch pattern:

# Instead of this:
0 * * * * /usr/bin/python3 /app/cleanup.py

# Do this:
0 * * * * /usr/bin/python3 /app/cleanup.py && curl -s https://cronping.anethoth.com/ping/YOUR_TOKEN

If the ping doesn't arrive within the expected window, something went wrong. You don't need to parse logs or check exit codes — the absence of the ping IS the alert.

You can set this up with any monitoring service, or build your own with a simple database table:

CREATE TABLE cron_pings (
    job_name TEXT PRIMARY KEY,
    last_ping TIMESTAMP,
    expected_interval_seconds INTEGER
);

Then add a checker that runs every minute:

import sqlite3

def alert(message):
    # Stub: wire this up to email, Slack, PagerDuty, etc.
    print(f"ALERT: {message}")

def check_overdue():
    conn = sqlite3.connect('cron_monitor.db')
    try:
        cur = conn.cursor()
        cur.execute("""
            SELECT job_name, last_ping, expected_interval_seconds
            FROM cron_pings
            WHERE last_ping < datetime('now', '-' || expected_interval_seconds || ' seconds')
        """)
        for job_name, last_ping, interval in cur.fetchall():
            alert(f"{job_name} is overdue! Last ping: {last_ping}")
    finally:
        conn.close()
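The receiving side of this scheme is just an upsert when a ping arrives. A minimal sketch against the same table (`record_ping` is my name for it, not any library's API; the `ON CONFLICT` upsert syntax needs SQLite 3.24+):

```python
import sqlite3

def record_ping(db_path, job_name, expected_interval_seconds):
    """Upsert a ping: create the row on first sight, bump last_ping after."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS cron_pings (
                job_name TEXT PRIMARY KEY,
                last_ping TIMESTAMP,
                expected_interval_seconds INTEGER
            )
        """)
        conn.execute("""
            INSERT INTO cron_pings (job_name, last_ping, expected_interval_seconds)
            VALUES (?, datetime('now'), ?)
            ON CONFLICT(job_name) DO UPDATE SET
                last_ping = datetime('now'),
                expected_interval_seconds = excluded.expected_interval_seconds
        """, (job_name, expected_interval_seconds))
        conn.commit()
    finally:
        conn.close()
```

Expose this behind whatever endpoint your jobs curl, and the checker above has everything it needs.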

Pattern 2: Wrapper Script with Logging

Never put complex logic directly in crontab. Use a wrapper:

#!/bin/bash
# /usr/local/bin/run-with-logging.sh
set -euo pipefail

JOB_NAME="$1"
shift
LOG_FILE="/var/log/cron/${JOB_NAME}.log"
mkdir -p "$(dirname "$LOG_FILE")"  # under set -e, a missing log dir would kill the script

echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] START ${JOB_NAME}" >> "$LOG_FILE"

if "$@" >> "$LOG_FILE" 2>&1; then
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] SUCCESS ${JOB_NAME}" >> "$LOG_FILE"
    # Ping your monitoring endpoint
    curl -sf "https://cronping.anethoth.com/ping/${CRONPING_TOKEN}" > /dev/null || true
else
    EXIT_CODE=$?
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] FAILED ${JOB_NAME} (exit ${EXIT_CODE})" >> "$LOG_FILE"
    # Don't ping — the monitor will detect the missing ping
fi

Then your crontab becomes:

0 * * * * /usr/local/bin/run-with-logging.sh cleanup /usr/bin/python3 /app/cleanup.py
# Put the job's own redirects inside sh -c, or the wrapper's log capture will swallow them
0 2 * * * /usr/local/bin/run-with-logging.sh backup sh -c '/usr/local/bin/pg_dump -U app mydb > /backups/daily.sql'

Pattern 3: Lock Files for Long-Running Jobs

If your job runs every 5 minutes but sometimes takes 7 minutes, you get overlapping runs:

#!/bin/bash
LOCK_FILE="/tmp/etl-job.lock"

if [ -f "$LOCK_FILE" ]; then
    PID=$(cat "$LOCK_FILE")
    if kill -0 "$PID" 2>/dev/null; then
        echo "Job already running (PID $PID), skipping"
        exit 0
    fi
    echo "Stale lock file found, removing"
fi

echo $$ > "$LOCK_FILE"
trap 'rm -f "$LOCK_FILE"' EXIT

# Your actual job here
python3 /app/etl_pipeline.py

Or use flock, which is built into most Linux distributions:

*/5 * * * * flock -n /tmp/etl.lock python3 /app/etl_pipeline.py
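If the job itself is Python, you can take the same kind of lock from inside the process with the standard library's fcntl module (Unix only). A sketch of the same non-blocking behavior as flock -n, with the nice property that the lock vanishes when the process dies, so there are no stale lock files to clean up:

```python
import fcntl

def run_exclusive(lock_path, job):
    """Run `job()` only if no other process holds the lock; otherwise skip.

    Returns True if the job ran, False if another run was in progress.
    The kernel releases the lock when the file descriptor is closed or
    the process exits, even on a crash.
    """
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another run holds the lock; skip this invocation
        lock_file.close()
        return False
    try:
        job()
        return True
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()
```

`run_exclusive("/tmp/etl.lock", main)` at the bottom of your script replaces the whole lock-file dance above.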

Pattern 4: Cron Expression Validation

Before deploying a new cron schedule, validate it programmatically:

# Free API — no signup required
curl -s 'https://cronping.anethoth.com/api/v1/cron/describe?expr=0+*/2+*+*+1-5' | jq .

Response:

{
  "expression": "0 */2 * * 1-5",
  "description": "At minute 0, every 2 hours, Monday through Friday",
  "fields": {
    "minute": "0",
    "hour": "*/2 (every 2 hours)",
    "day_of_month": "* (every day)",
    "month": "* (every month)",
    "day_of_week": "1-5 (Monday through Friday)"
  },
  "is_valid": true
}

You can also get the next run times:

curl -s 'https://cronping.anethoth.com/api/v1/cron/next?expr=0+*/2+*+*+1-5&count=5' | jq '.next_runs'
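If you'd rather validate offline, a rough sketch of a five-field validator in pure Python (classic crontab syntax only; this is my own throwaway check, and it deliberately skips names like MON and macros like @daily):

```python
import re

# (min, max) allowed values per field, in crontab order:
# minute, hour, day-of-month, month, day-of-week (0 and 7 both mean Sunday)
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def is_valid_cron(expr):
    """Validate a classic five-field cron expression.

    Supports *, single values, ranges (a-b), lists (a,b,c), and
    steps (*/n or a-b/n).
    """
    fields = expr.split()
    if len(fields) != 5:
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        for part in field.split(","):
            # Base value with an optional /step suffix
            m = re.fullmatch(r"(\*|\d+|\d+-\d+)(?:/(\d+))?", part)
            if not m:
                return False
            base, step = m.group(1), m.group(2)
            if step is not None and int(step) == 0:
                return False
            if base == "*":
                continue
            if "-" in base:
                a, b = map(int, base.split("-"))
                if not (lo <= a <= b <= hi):
                    return False
            elif not (lo <= int(base) <= hi):
                return False
    return True
```

Wire this into CI and a typo like `0 0 * *` (four fields) never reaches a server's crontab.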

Pattern 5: Status Badges in Your README

Make cron job health visible to your whole team. If you use CronPing, each monitor gets an SVG badge you can embed in your GitHub README:

## Cron Job Status
![Backup Job](https://cronping.anethoth.com/badge/YOUR_TOKEN)
![ETL Pipeline](https://cronping.anethoth.com/badge/YOUR_TOKEN2)

This creates accountability — when the badge turns red, everyone sees it.

The Bigger Picture

Cron jobs are the most under-monitored piece of infrastructure in most organizations. They fail silently, they overlap, they depend on environment variables that change during deployments.

The fix isn't better cron expressions — it's better observability. Treat cron jobs like you treat HTTP endpoints: monitor them, log them, alert on failures.

