
Jack

Posted on • Originally published at anethoth.com

Stop Using crontab -l to Debug Cron Jobs (Here's What to Do Instead)

Cron jobs are the duct tape of infrastructure. They hold everything together until they silently fail at 3am and nobody notices until Monday.

I've been building CronPing — a cron job monitoring API — and I've seen every way cron can go wrong. Here are the patterns that actually work.

The Problem with crontab -l

When a cron job fails, most developers run crontab -l to verify the schedule. But the schedule is rarely the problem. The real issues are:

  1. Environment variables — cron doesn't load your .bashrc
  2. PATH differences — /usr/local/bin isn't in cron's PATH
  3. Silent failures — the job ran but the command failed
  4. Timing drift — the job takes longer than the interval
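A quick way to catch the first two failure modes is to have the job itself dump the environment it actually runs with, then diff that against your interactive shell. A minimal Python sketch (the output path and function name are just examples, not part of any tool):

```python
import json
import os
from datetime import datetime, timezone

def dump_env_snapshot(path="/tmp/cron-env.json"):
    """Write the current process environment to a JSON file.

    Call this at the top of a cron job, then compare the file against
    `env` from your login shell to spot missing variables and PATH gaps.
    """
    snapshot = {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "path": os.environ.get("PATH", ""),
        "env": dict(os.environ),
    }
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot

if __name__ == "__main__":
    snap = dump_env_snapshot()
    print(f"PATH under this run: {snap['path']}")
```

Run it once from cron and once from your terminal; the difference between the two `path` values is usually the whole bug.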

Pattern 1: The Dead Man's Switch

Instead of checking if a cron job ran, check if it didn't run. This is the dead man's switch pattern:

# Instead of this:
0 * * * * /usr/bin/python3 /app/cleanup.py

# Do this:
0 * * * * /usr/bin/python3 /app/cleanup.py && curl -s https://cronping.anethoth.com/ping/YOUR_TOKEN

If the ping doesn't arrive within the expected window, something went wrong. You don't need to parse logs or check exit codes — the absence of the ping IS the alert.

You can set this up with any monitoring service, or build your own with a simple database table:

CREATE TABLE cron_pings (
    job_name TEXT PRIMARY KEY,
    last_ping TIMESTAMP,
    expected_interval_seconds INTEGER
);

Then add a checker that runs every minute:

import sqlite3

def alert(message):
    # Stub: wire this up to email, Slack, PagerDuty, etc.
    print(f"ALERT: {message}")

def check_overdue():
    conn = sqlite3.connect('cron_monitor.db')
    try:
        cur = conn.cursor()
        cur.execute("""
            SELECT job_name, last_ping, expected_interval_seconds
            FROM cron_pings
            WHERE last_ping < datetime('now', '-' || expected_interval_seconds || ' seconds')
        """)
        for job_name, last_ping, interval in cur.fetchall():
            alert(f"{job_name} is overdue! Last ping: {last_ping}")
    finally:
        conn.close()
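The receiving side of this scheme is just an upsert when a ping arrives. A minimal sketch against the same table (`record_ping` is my name for it, not any library's API; the `ON CONFLICT` upsert syntax needs SQLite 3.24+):

```python
import sqlite3

def record_ping(db_path, job_name, expected_interval_seconds):
    """Upsert a ping: create the row on first sight, bump last_ping after."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("""
            CREATE TABLE IF NOT EXISTS cron_pings (
                job_name TEXT PRIMARY KEY,
                last_ping TIMESTAMP,
                expected_interval_seconds INTEGER
            )
        """)
        conn.execute("""
            INSERT INTO cron_pings (job_name, last_ping, expected_interval_seconds)
            VALUES (?, datetime('now'), ?)
            ON CONFLICT(job_name) DO UPDATE SET
                last_ping = datetime('now'),
                expected_interval_seconds = excluded.expected_interval_seconds
        """, (job_name, expected_interval_seconds))
        conn.commit()
    finally:
        conn.close()
```

Expose this behind whatever endpoint your jobs curl, and the checker above has everything it needs.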

Pattern 2: Wrapper Script with Logging

Never put complex logic directly in crontab. Use a wrapper:

#!/bin/bash
# /usr/local/bin/run-with-logging.sh
set -euo pipefail

JOB_NAME="$1"
shift
LOG_FILE="/var/log/cron/${JOB_NAME}.log"
mkdir -p "$(dirname "$LOG_FILE")"  # under set -e, a missing log dir would kill the script

echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] START ${JOB_NAME}" >> "$LOG_FILE"

if "$@" >> "$LOG_FILE" 2>&1; then
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] SUCCESS ${JOB_NAME}" >> "$LOG_FILE"
    # Ping your monitoring endpoint
    curl -sf "https://cronping.anethoth.com/ping/${CRONPING_TOKEN}" > /dev/null || true
else
    EXIT_CODE=$?
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] FAILED ${JOB_NAME} (exit ${EXIT_CODE})" >> "$LOG_FILE"
    # Don't ping — the monitor will detect the missing ping
fi

Then your crontab becomes:

0 * * * * /usr/local/bin/run-with-logging.sh cleanup /usr/bin/python3 /app/cleanup.py
# Put the job's own redirects inside sh -c, or the wrapper's log capture will swallow them
0 2 * * * /usr/local/bin/run-with-logging.sh backup sh -c '/usr/local/bin/pg_dump -U app mydb > /backups/daily.sql'

Pattern 3: Lock Files for Long-Running Jobs

If your job runs every 5 minutes but sometimes takes 7 minutes, you get overlapping runs:

#!/bin/bash
LOCK_FILE="/tmp/etl-job.lock"

if [ -f "$LOCK_FILE" ]; then
    PID=$(cat "$LOCK_FILE")
    if kill -0 "$PID" 2>/dev/null; then
        echo "Job already running (PID $PID), skipping"
        exit 0
    fi
    echo "Stale lock file found, removing"
fi

echo $$ > "$LOCK_FILE"
trap 'rm -f "$LOCK_FILE"' EXIT

# Your actual job here
python3 /app/etl_pipeline.py

Or use flock, which is built into most Linux distributions:

*/5 * * * * flock -n /tmp/etl.lock python3 /app/etl_pipeline.py
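If the job itself is Python, you can take the same kind of lock from inside the process with the standard library's fcntl module (Unix only). A sketch of the same non-blocking behavior as flock -n, with the nice property that the lock vanishes when the process dies, so there are no stale lock files to clean up:

```python
import fcntl

def run_exclusive(lock_path, job):
    """Run `job()` only if no other process holds the lock; otherwise skip.

    Returns True if the job ran, False if another run was in progress.
    The kernel releases the lock when the file descriptor is closed or
    the process exits, even on a crash.
    """
    lock_file = open(lock_path, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Another run holds the lock; skip this invocation
        lock_file.close()
        return False
    try:
        job()
        return True
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()
```

`run_exclusive("/tmp/etl.lock", main)` at the bottom of your script replaces the whole lock-file dance above.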

Pattern 4: Cron Expression Validation

Before deploying a new cron schedule, validate it programmatically:

# Free API — no signup required
curl -s 'https://cronping.anethoth.com/api/v1/cron/describe?expr=0+*/2+*+*+1-5' | jq .

Response:

{
  "expression": "0 */2 * * 1-5",
  "description": "At minute 0, every 2 hours, Monday through Friday",
  "fields": {
    "minute": "0",
    "hour": "*/2 (every 2 hours)",
    "day_of_month": "* (every day)",
    "month": "* (every month)",
    "day_of_week": "1-5 (Monday through Friday)"
  },
  "is_valid": true
}

You can also get the next run times:

curl -s 'https://cronping.anethoth.com/api/v1/cron/next?expr=0+*/2+*+*+1-5&count=5' | jq '.next_runs'
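If you'd rather validate offline, a rough sketch of a five-field validator in pure Python (classic crontab syntax only; this is my own throwaway check, and it deliberately skips names like MON and macros like @daily):

```python
import re

# (min, max) allowed values per field, in crontab order:
# minute, hour, day-of-month, month, day-of-week (0 and 7 both mean Sunday)
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def is_valid_cron(expr):
    """Validate a classic five-field cron expression.

    Supports *, single values, ranges (a-b), lists (a,b,c), and
    steps (*/n or a-b/n).
    """
    fields = expr.split()
    if len(fields) != 5:
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        for part in field.split(","):
            # Base value with an optional /step suffix
            m = re.fullmatch(r"(\*|\d+|\d+-\d+)(?:/(\d+))?", part)
            if not m:
                return False
            base, step = m.group(1), m.group(2)
            if step is not None and int(step) == 0:
                return False
            if base == "*":
                continue
            if "-" in base:
                a, b = map(int, base.split("-"))
                if not (lo <= a <= b <= hi):
                    return False
            elif not (lo <= int(base) <= hi):
                return False
    return True
```

Wire this into CI and a typo like `0 0 * *` (four fields) never reaches a server's crontab.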

Pattern 5: Status Badges in Your README

Make cron job health visible to your whole team. If you use CronPing, each monitor gets an SVG badge you can embed in your GitHub README:

## Cron Job Status
![Backup Job](https://cronping.anethoth.com/badge/YOUR_TOKEN)
![ETL Pipeline](https://cronping.anethoth.com/badge/YOUR_TOKEN2)

This creates accountability — when the badge turns red, everyone sees it.

The Bigger Picture

Cron jobs are the most under-monitored piece of infrastructure in most organizations. They fail silently, they overlap, they depend on environment variables that change during deployments.

The fix isn't better cron expressions — it's better observability. Treat cron jobs like you treat HTTP endpoints: monitor them, log them, alert on failures.

