DEV Community

Pytheas

Your Cron Jobs Are Silently Failing. Here's How to Know in 30 Seconds.

"My database backup script broke 11 days before I found out. Credentials got rotated, pg_dump started erroring, and cron just kept running it on schedule like nothing was wrong. No email. No alert. Eleven days of no backups. I only found out because I needed to restore something."

Sound familiar? If you've run cron jobs in production, you've probably been here.

Cron doesn't know your job failed

This is the part that gets people. Cron's job is to start your command at the time you told it to. That's it. If the command exits 1, cron doesn't care. If it hangs forever, cron doesn't care. If the server reboots and the cron daemon doesn't come back up, nobody cares.

You find out when:

  • A customer asks why their data is stale
  • A queue fills up because the consumer job stopped
  • You manually check a dashboard and notice the last run was 9 days ago

The fix is one line

After your job completes successfully, ping an external URL. If the ping stops arriving, you get alerted. That's the whole idea.

# before
0 2 * * * /scripts/backup-db.sh

# after
0 2 * * * /scripts/backup-db.sh && curl -fsS https://cronsignal.io/ping/abc123

The `&&` means the curl only fires if the script exits 0. Script fails? No ping. Script hangs? No ping. Server goes down? No ping. In every case, the missing ping is what triggers the alert.
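This isn't anything CronSignal-specific, just plain POSIX shell semantics, which you can verify for yourself:

```shell
# The right-hand command only runs when the left-hand command exits 0.
sh -c 'true  && echo "ping sent"'   # job succeeded -> prints "ping sent"
sh -c 'false && echo "ping sent"'   # job failed    -> prints nothing
```

If you're worried about a transient network blip eating a ping, `curl --retry 3` adds a few retries without changing the pattern.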

I built CronSignal for this because I wanted something stupid simple. You create a check, tell it how often to expect a ping, and add the curl. If the ping is late, it hits you on email, Slack, Discord, Telegram, or webhook. Setup is maybe 30 seconds.

Three monitors are free. If you need more, it's $5/month flat for unlimited. No per-monitor pricing nonsense.

GitHub Actions has its own version of this problem

If you use schedule triggers in GitHub Actions, you've probably noticed they're... unreliable. GitHub can delay scheduled runs by minutes or hours. If your repo goes 60 days without a push, GitHub silently disables the schedule entirely. No warning.

I made a GitHub Action for this:

name: Nightly Build
on:
  schedule:
    - cron: '0 2 * * *'

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm run build
      - run: npm test

      - name: Ping CronSignal
        if: success()
        uses: CronSignal/ping-action@v1
        with:
          check-id: ${{ secrets.CRONSIGNAL_CHECK_ID }}

If the workflow gets delayed, skipped, or disabled, you know about it the same day instead of weeks later.
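If you'd rather not pull in a marketplace action, a plain curl step does the same thing. A sketch; it assumes the check ID goes in the URL path, as in the crontab example earlier:

```yaml
      - name: Ping CronSignal
        if: success()
        run: curl -fsS "https://cronsignal.io/ping/${{ secrets.CRONSIGNAL_CHECK_ID }}"
```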

Works with basically anything

The curl pattern works anywhere. Not just crontab:

  • systemd timers — an `ExecStartPost=` directive in the service unit
  • Kubernetes CronJobs — final container step
  • Laravel — `$schedule->command('...')->after(function () { Http::get('...'); })`
  • Django/Celery — a `requests.get()` at the end of your task
  • Node — a `fetch()` call after your logic
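For systemd, the success-only behavior comes for free: `ExecStartPost=` commands only run after `ExecStart=` has exited successfully. A sketch of a oneshot service, reusing the script path and example check URL from above:

```ini
# backup.service (paired with a backup.timer) -- a sketch; the script path
# and check URL are the examples from earlier in the post
[Unit]
Description=Nightly database backup

[Service]
Type=oneshot
ExecStart=/scripts/backup-db.sh
# ExecStartPost= only runs if ExecStart= succeeded,
# so a failed backup means no ping, which means an alert
ExecStartPost=/usr/bin/curl -fsS https://cronsignal.io/ping/abc123
```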

The point is: don't monitor the scheduler. Monitor whether the job actually finished. Those are different things.

Anyway

If you're running cron jobs without monitoring, you're going to have a bad time eventually. The fix is one curl command and 30 seconds of setup.

CronSignal if you want to try it. Or use any heartbeat monitoring service — Healthchecks.io, Dead Man's Snitch, whatever. Just use something. The && curl pattern works the same regardless.
