Cron jobs are the duct tape of infrastructure. They hold everything together until they silently fail at 3am and nobody notices until Monday.
I've been building CronPing — a cron job monitoring API — and I've seen every way cron can go wrong. Here are the patterns that actually work.
## The Problem with `crontab -l`

When a cron job fails, most developers run `crontab -l` to verify the schedule. But the schedule is rarely the problem. The real issues are:
- Environment variables — cron doesn't load your `.bashrc`
- PATH differences — `/usr/local/bin` isn't in cron's PATH
- Silent failures — the job ran but the command failed
- Timing drift — the job takes longer than the interval
## Pattern 1: The Dead Man's Switch
Instead of checking if a cron job ran, check if it didn't run. This is the dead man's switch pattern:
```bash
# Instead of this:
0 * * * * /usr/bin/python3 /app/cleanup.py

# Do this:
0 * * * * /usr/bin/python3 /app/cleanup.py && curl -s https://cronping.anethoth.com/ping/YOUR_TOKEN
```
If the ping doesn't arrive within the expected window, something went wrong. You don't need to parse logs or check exit codes — the absence of the ping IS the alert.
You can set this up with any monitoring service, or build your own with a simple database table:
```sql
CREATE TABLE cron_pings (
    job_name TEXT PRIMARY KEY,
    last_ping TIMESTAMP,
    expected_interval_seconds INTEGER
);
```
Then a checker that runs every minute:
```python
import sqlite3

def check_overdue():
    conn = sqlite3.connect('cron_monitor.db')
    cur = conn.cursor()
    cur.execute("""
        SELECT job_name, last_ping, expected_interval_seconds
        FROM cron_pings
        WHERE last_ping < datetime('now', '-' || expected_interval_seconds || ' seconds')
    """)
    for job_name, last_ping, interval in cur.fetchall():
        # alert() is whatever notification hook you use (email, Slack, PagerDuty)
        alert(f"{job_name} is overdue! Last ping: {last_ping}")
    conn.close()
```
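The receiving side of this pattern is just an upsert into that table. Here's a minimal sketch (`record_ping` and the in-memory database are illustrative, not any particular service's API):

```python
import sqlite3

def record_ping(conn, job_name, interval_seconds):
    # Upsert: the first ping inserts the row, later pings refresh it
    conn.execute("""
        INSERT INTO cron_pings (job_name, last_ping, expected_interval_seconds)
        VALUES (?, datetime('now'), ?)
        ON CONFLICT(job_name) DO UPDATE SET
            last_ping = excluded.last_ping,
            expected_interval_seconds = excluded.expected_interval_seconds
    """, (job_name, interval_seconds))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cron_pings (
        job_name TEXT PRIMARY KEY,
        last_ping TIMESTAMP,
        expected_interval_seconds INTEGER
    )
""")
record_ping(conn, "cleanup", 3600)
record_ping(conn, "cleanup", 3600)  # a second ping only refreshes the row
print(conn.execute("SELECT COUNT(*) FROM cron_pings").fetchone()[0])  # → 1
```

In a real deployment this sits behind the HTTP endpoint your jobs `curl`, keyed by token rather than job name.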
## Pattern 2: Wrapper Script with Logging
Never put complex logic directly in crontab. Use a wrapper:
```bash
#!/bin/bash
# /usr/local/bin/run-with-logging.sh
set -euo pipefail

JOB_NAME="$1"
shift
LOG_FILE="/var/log/cron/${JOB_NAME}.log"

echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] START ${JOB_NAME}" >> "$LOG_FILE"

if "$@" >> "$LOG_FILE" 2>&1; then
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] SUCCESS ${JOB_NAME}" >> "$LOG_FILE"
    # Ping your monitoring endpoint
    curl -sf "https://cronping.anethoth.com/ping/${CRONPING_TOKEN}" > /dev/null || true
else
    EXIT_CODE=$?
    echo "[$(date -u +%Y-%m-%dT%H:%M:%SZ)] FAILED ${JOB_NAME} (exit ${EXIT_CODE})" >> "$LOG_FILE"
    # Don't ping — the monitor will detect the missing ping
fi
```
Then your crontab becomes:

```bash
0 * * * * /usr/local/bin/run-with-logging.sh cleanup /usr/bin/python3 /app/cleanup.py
0 2 * * * /usr/local/bin/run-with-logging.sh backup sh -c '/usr/local/bin/pg_dump -U app mydb > /backups/daily.sql'
```

Note the `sh -c` on the backup line: without it, the wrapper captures `pg_dump`'s stdout into the log file and `/backups/daily.sql` ends up empty.
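A nice side effect of the fixed START/SUCCESS/FAILED prefixes is that the latest status of any job is trivial to extract from the log. A quick sketch (the sample log lines are made up):

```python
def last_status(log_lines, job_name):
    # Scan backwards for the most recent SUCCESS/FAILED marker for this job
    for line in reversed(log_lines):
        if f"SUCCESS {job_name}" in line:
            return "success"
        if f"FAILED {job_name}" in line:
            return "failed"
    return "unknown"

log = [
    "[2024-01-01T00:00:00Z] START cleanup",
    "[2024-01-01T00:00:05Z] FAILED cleanup (exit 1)",
    "[2024-01-01T01:00:00Z] START cleanup",
    "[2024-01-01T01:00:04Z] SUCCESS cleanup",
]
print(last_status(log, "cleanup"))  # → success
print(last_status(log, "backup"))   # → unknown
```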
## Pattern 3: Lock Files for Long-Running Jobs
If your job runs every 5 minutes but sometimes takes 7 minutes, you get overlapping runs:
```bash
#!/bin/bash
LOCK_FILE="/tmp/etl-job.lock"

if [ -f "$LOCK_FILE" ]; then
    PID=$(cat "$LOCK_FILE")
    if kill -0 "$PID" 2>/dev/null; then
        echo "Job already running (PID $PID), skipping"
        exit 0
    fi
    echo "Stale lock file found, removing"
fi

echo $$ > "$LOCK_FILE"
trap 'rm -f "$LOCK_FILE"' EXIT

# Your actual job here
python3 /app/etl_pipeline.py
```
Or use `flock`, which is built into most Linux distributions:

```bash
*/5 * * * * flock -n /tmp/etl.lock python3 /app/etl_pipeline.py
```
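The same pattern works from inside a Python job via the stdlib `fcntl` module (Unix only; `try_lock` and the lock path here are illustrative):

```python
import fcntl

def try_lock(path):
    # Returns an open handle holding the lock, or None if another run has it.
    # flock locks are advisory and released automatically when the process
    # exits, so there are no stale lock files to clean up.
    f = open(path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return f
    except BlockingIOError:
        f.close()
        return None

lock = try_lock("/tmp/etl-job.lock")
if lock is None:
    print("Job already running, skipping")
else:
    # ... run the job; keep `lock` open until it finishes ...
    pass
```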
## Pattern 4: Cron Expression Validation
Before deploying a new cron schedule, validate it programmatically:
```bash
# Free API — no signup required
curl -s 'https://cronping.anethoth.com/api/v1/cron/describe?expr=0+*/2+*+*+1-5' | jq .
```
Response:

```json
{
  "expression": "0 */2 * * 1-5",
  "description": "At minute 0, every 2 hours, Monday through Friday",
  "fields": {
    "minute": "0",
    "hour": "*/2 (every 2 hours)",
    "day_of_month": "* (every day)",
    "month": "* (every month)",
    "day_of_week": "1-5 (Monday through Friday)"
  },
  "is_valid": true
}
```
You can also get the next run times:
```bash
curl -s 'https://cronping.anethoth.com/api/v1/cron/next?expr=0+*/2+*+*+1-5&count=5' | jq '.next_runs'
```
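If you'd rather sanity-check an expression offline before deploying it, a stdlib-only validator covering the common numeric syntax is short. A sketch (it ignores name aliases like `MON` and macros like `@daily`):

```python
import re

# (min, max) for minute, hour, day-of-month, month, day-of-week
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def valid_field(field, lo, hi):
    # Accepts *, N, N-M, each optionally with /step, and comma lists of those
    for part in field.split(","):
        m = re.fullmatch(r"(\*|\d+|\d+-\d+)(/\d+)?", part)
        if not m:
            return False
        body = m.group(1)
        if body != "*":
            nums = [int(n) for n in body.split("-")]
            if any(n < lo or n > hi for n in nums):
                return False
            if len(nums) == 2 and nums[0] > nums[1]:
                return False
    return True

def valid_cron(expr):
    fields = expr.split()
    return len(fields) == 5 and all(
        valid_field(f, lo, hi) for f, (lo, hi) in zip(fields, FIELD_RANGES))

print(valid_cron("0 */2 * * 1-5"))  # → True
print(valid_cron("0 25 * * *"))     # → False (hour 25 out of range)
```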
## Pattern 5: Status Badges in Your README
Make cron job health visible to your whole team. If you use CronPing, each monitor gets an SVG badge you can embed in your GitHub README:
```markdown
## Cron Job Status
```
This creates accountability — when the badge turns red, everyone sees it.
## The Bigger Picture
Cron jobs are the most under-monitored piece of infrastructure in most organizations. They fail silently, they overlap, they depend on environment variables that change during deployments.
The fix isn't better cron expressions — it's better observability. Treat cron jobs like you treat HTTP endpoints: monitor them, log them, alert on failures.
## Resources
- CronPing — Free cron monitoring API (10 monitors free)
- Free Cron Expression Helper — Parse, validate, and build cron expressions
- Free Uptime Calculator — SLA uptime/downtime calculator
- Cron Schedule Reference — 28 common cron schedule examples