Kriss

Posted on Apr 30 • Edited on May 5

How to add dead man's switch monitoring to any cron job in 2 minutes

#devops #monitoring #opensource #tutorial

The concept is simple: your job checks in when it runs. If it stops checking in, you get alerted.

No agent to install. No SDK to integrate. Just a curl at the end of your script.

The one-liner

curl -fsS https://deadmancheck.io/ping/YOUR-TOKEN > /dev/null

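That's it. Put it on the last line of your cron job's script. If the job stops running (the server dies, the cron daemon crashes, or the script exits before reaching that line), the ping never arrives and you get an alert. This assumes the script stops on errors, for example with set -e; otherwise a failed step can still be followed by a successful ping.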

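The flags: -f makes curl exit with a non-zero status on HTTP errors (400 and above) instead of printing the error page, -s suppresses the progress meter, and -S still prints error messages when -s is set. The redirect to /dev/null discards the response body so it doesn't pollute your logs or cron mail. Consider adding -m 10 as well, so a slow network can't stall the job for more than 10 seconds.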

Setting it up

Sign up at deadmancheck.io (free for up to 5 monitors). Create a monitor, set the expected interval — say, every 24 hours — and copy your unique token.

Then configure the alert window. If you're running a daily job, set it to alert after 25 hours of silence. That gives a 1-hour grace period for slow servers and slight scheduling drift.
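
To make the arithmetic concrete, here is a minimal Python sketch of the check a monitor would perform for that daily job. It is an illustration only, not DeadManCheck's actual implementation; the function name is_overdue and its parameters are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_ping, now,
               interval=timedelta(hours=24),
               grace=timedelta(hours=1)):
    """Return True once more than interval + grace has passed since last_ping."""
    return now - last_ping > interval + grace


# Example: a daily job that last checked in at 02:00 UTC (arbitrary date)
last = datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc)

# 24h30m of silence is still inside the 25-hour window
print(is_overdue(last, last + timedelta(hours=24, minutes=30)))  # False

# 25h01m of silence is past the window, so an alert would fire
print(is_overdue(last, last + timedelta(hours=25, minutes=1)))   # True
```

A grace period that is too short produces false alarms on slow nights; one that is too long delays the alert you actually care about.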

Start/end pattern for longer jobs

The one-liner is fine for quick jobs. For anything that runs more than a few minutes, use the start/end pattern. This also catches jobs that start but hang indefinitely.

# Signal job started
curl -fsS https://deadmancheck.io/ping/YOUR-TOKEN/start > /dev/null

# ... your job logic ...

# Signal job completed
curl -fsS https://deadmancheck.io/ping/YOUR-TOKEN > /dev/null

If the job starts but never pings the end URL within your configured timeout, you get alerted. Useful for ETL jobs that sometimes decide to run for 6 hours when they should take 20 minutes.
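
Conceptually, the monitor only has to compare two timestamps against the timeout. The Python sketch below shows one way that classification could work; it is an assumption about the general technique, not DeadManCheck's server code, and run_state and its arguments are hypothetical names.

```python
from datetime import datetime, timedelta, timezone


def run_state(started_at, finished_at, now, timeout):
    """Classify a run from its start ping, optional end ping, and a timeout."""
    if finished_at is not None and finished_at >= started_at:
        return "ok"        # the end ping arrived after the start ping
    if now - started_at > timeout:
        return "hung"      # started, but no end ping within the timeout
    return "running"       # started and still within the timeout


start = datetime(2024, 5, 1, 2, 0, tzinfo=timezone.utc)
timeout = timedelta(minutes=30)

# An ETL job expected to take 20 minutes, checked 6 hours after it started
print(run_state(start, None, start + timedelta(hours=6), timeout))  # hung
```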

Python

import requests
import os

DEADMANCHECK_TOKEN = os.environ["DEADMANCHECK_TOKEN"]
BASE_URL = f"https://deadmancheck.io/ping/{DEADMANCHECK_TOKEN}"

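def ping(path="", count=None):
    """Send a best-effort ping; POST a row count when one is given."""
    try:
        url = f"{BASE_URL}{path}"
        if count is not None:
            requests.post(url, json={"count": count}, timeout=5)
        else:
            requests.get(url, timeout=5)
    except requests.RequestException:
        pass  # never let monitoring break the job

ping("/start")
try:
    rows = run_export()  # placeholder for your own job; returns the exported rows
    ping(count=len(rows))
except Exception:
    ping("/fail")  # report the failure, then re-raise so the script still exits non-zero
    raise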

The try/except inside ping() is deliberate: a failed monitoring call should never take down your job. The 5-second timeout is there for the same reason, so a slow endpoint can't stall it either.
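
If several jobs share this start/success/fail pattern, it can be wrapped in a context manager. The following is a sketch using only the standard library; the names monitored and http_ping, and the injectable send parameter, are assumptions made for this example rather than part of any DeadManCheck client.

```python
import urllib.request
from contextlib import contextmanager


def http_ping(url):
    """Best-effort GET with a short timeout; errors are swallowed on purpose."""
    try:
        urllib.request.urlopen(url, timeout=5).close()
    except Exception:
        pass  # never let monitoring break the job


@contextmanager
def monitored(base_url, send=http_ping):
    """Ping /start on entry, the base URL on success, and /fail on an exception."""
    send(f"{base_url}/start")
    try:
        yield
    except Exception:
        send(f"{base_url}/fail")
        raise  # the job's own error still propagates
    else:
        send(base_url)
```

A job then becomes `with monitored(BASE_URL): run_export()`. Passing a different `send` callable (for example, one that appends URLs to a list) makes the wrapper easy to test without touching the network.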

Ruby

require 'net/http'
require 'uri'
require 'json'

TOKEN = ENV.fetch('DEADMANCHECK_TOKEN') # fail fast at startup if the token is missing
BASE = "https://deadmancheck.io/ping/#{TOKEN}"

def ping(path = '', count = nil)
  uri = URI("#{BASE}#{path}")
  req = if count
          Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json').tap do |r|
            r.body = JSON.generate({ count: count })
          end
        else
          Net::HTTP::Get.new(uri)
        end
  # Short timeouts so a slow monitoring endpoint cannot stall the job
  Net::HTTP.start(uri.host, uri.port, use_ssl: true, open_timeout: 5, read_timeout: 5) do |http|
    http.request(req)
  end
rescue StandardError
  # don't let monitoring kill the job
end

ping('/start')

begin
  count = run_etl # placeholder for your own job; returns the number of rows processed
  ping('', count)
rescue StandardError
  ping('/fail')
  raise
end

Bash with error handling

For bash scripts, use a trap to ping the fail URL on any error:

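#!/bin/bash
set -euo pipefail

TOKEN="YOUR-TOKEN"
BASE="https://deadmancheck.io/ping/${TOKEN}"

# -m 10 caps each ping at 10 seconds; "|| true" keeps a failed ping
# from aborting the job under set -e
curl -fsS -m 10 "${BASE}/start" > /dev/null || true

# From here on, any command that exits non-zero reports a failure
trap 'curl -fsS -m 10 "${BASE}/fail" > /dev/null || true' ERR

/usr/local/bin/run-backup.sh

ROW_COUNT=$(wc -l < /backups/output.csv)
curl -fsS -m 10 -X POST -H "Content-Type: application/json" \
  -d "{\"count\": ${ROW_COUNT}}" \
  "${BASE}" > /dev/null || true

set -euo pipefail makes the script exit on any unhandled error, and the ERR trap runs just before that exit, pinging the fail endpoint. The pings themselves end in || true so that a monitoring outage neither aborts the backup nor reports a false failure.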
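
For jobs you can't or don't want to edit, the same exit-code logic can live in a small wrapper that runs the command for you. This Python sketch uses only the standard library; run_monitored, best_effort_get, and the send parameter are hypothetical names chosen for this example.

```python
import subprocess
import urllib.request


def best_effort_get(url):
    """GET with a short timeout; errors are swallowed so the job is unaffected."""
    try:
        urllib.request.urlopen(url, timeout=5).close()
    except Exception:
        pass


def run_monitored(cmd, base_url, send=best_effort_get):
    """Run cmd, pinging /start first and then the base URL or /fail by exit code."""
    send(f"{base_url}/start")
    result = subprocess.run(cmd)  # cmd is a list, e.g. ["/usr/local/bin/run-backup.sh"]
    if result.returncode == 0:
        send(base_url)
    else:
        send(f"{base_url}/fail")
    return result.returncode  # hand the job's exit code back to the caller
```

Returning the exit code lets the wrapper pass it on (for example via sys.exit), so cron and any other tooling still see the job's real result.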

What to monitor first

If you're not sure where to start:

Database backups — silent failures here are catastrophic
ETL/data pipeline jobs — wrong data is worse than no data
Invoice/billing jobs — customers notice immediately
Report generation — stakeholders notice next morning
Cache warmers — performance degrades silently

Anything that runs unattended and that you'd be embarrassed to find broken three weeks later.

One token per cron job. If you have 10 jobs, create 10 monitors. DeadManCheck's free tier covers 5 monitors — the $12/mo plan covers 100, which handles most teams.

Two minutes of setup. One less thing to find out about the hard way.
