How to add dead man's switch monitoring to any cron job in 2 minutes
The concept is simple: your job checks in when it runs. If it stops checking in, you get alerted.
No agent to install. No SDK to integrate. Just a curl at the end of your script.
The one-liner
curl -fsS https://deadmancheck.io/ping/YOUR-TOKEN > /dev/null
That's it. Stick that at the end of your cron job. If the job stops running — server dies, cron daemon crashes, script errors out before it gets there — you get an alert.
The flags: -f makes curl exit with an error code on HTTP failures instead of printing the server's error page, -s hides the progress meter, and -S re-enables error messages that -s would otherwise suppress. Redirect to /dev/null so curl's output doesn't pollute your logs — cron mails any stdout to the job's owner.
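In a crontab, the ping just gets chained onto the job (schedule and backup path here are placeholders):

```shell
# Nightly backup at 02:00. The && means the ping only fires if the backup
# exits 0, so a failing backup shows up as silence -> alert.
0 2 * * * /usr/local/bin/backup.sh && curl -fsS https://deadmancheck.io/ping/YOUR-TOKEN > /dev/null
```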
Setting it up
Sign up at deadmancheck.io (free for up to 5 monitors). Create a monitor, set the expected interval — say, every 24 hours — and copy your unique token.
Then configure the alert window. If you're running a daily job, set it to alert after 25 hours of silence. That gives a 1-hour grace period for slow servers and slight scheduling drift.
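Since cron jobs often run on boxes with flaky connectivity, it's also worth hardening the ping itself. These are standard curl flags; the trailing `|| true` is a guard so a monitoring hiccup never changes the job's exit status:

```shell
# Cap the ping at 10 seconds, retry transient failures up to 3 times,
# and never let a failed ping alter the job's exit code.
curl -fsS --max-time 10 --retry 3 https://deadmancheck.io/ping/YOUR-TOKEN > /dev/null || true
```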
Start/end pattern for longer jobs
The one-liner is fine for quick jobs. For anything that runs more than a few minutes, use the start/end pattern. This also catches jobs that start but hang indefinitely.
# Signal job started
curl -fsS https://deadmancheck.io/ping/YOUR-TOKEN/start > /dev/null
# ... your job logic ...
# Signal job completed
curl -fsS https://deadmancheck.io/ping/YOUR-TOKEN > /dev/null
If the job starts but never pings the end URL within your configured timeout, you get alerted. Useful for ETL jobs that sometimes decide to run for 6 hours when they should take 20 minutes.
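The pattern generalizes into a small wrapper so each cron entry stays a one-liner. This is a sketch: `dmc_run` is a made-up helper name, and it assumes the /start and /fail endpoints described above.

```shell
# dmc_run TOKEN command [args...] -- wrap any job in start/end/fail pings.
dmc_run() {
    token="$1"; shift
    base="https://deadmancheck.io/ping/${token}"

    # Pings are best-effort: a monitoring outage must never block the job.
    curl -fsS "${base}/start" > /dev/null 2>&1 || true

    "$@"                # run the real job
    status=$?

    if [ "$status" -eq 0 ]; then
        curl -fsS "${base}" > /dev/null 2>&1 || true
    else
        curl -fsS "${base}/fail" > /dev/null 2>&1 || true
    fi
    return "$status"    # preserve the job's exit code for cron
}
```

A cron entry then just wraps the job, e.g. `dmc_run YOUR-TOKEN /usr/local/bin/run-backup.sh`, with the function sourced from wherever you keep shared shell helpers.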
Python
import os
import requests

DEADMANCHECK_TOKEN = os.environ["DEADMANCHECK_TOKEN"]
BASE_URL = f"https://deadmancheck.io/ping/{DEADMANCHECK_TOKEN}"

def ping(path="", count=None):
    try:
        params = {"count": count} if count is not None else {}
        requests.get(f"{BASE_URL}{path}", params=params, timeout=5)
    except requests.RequestException:
        pass  # never let monitoring break the job

ping("/start")
try:
    rows = run_export()
    ping(count=len(rows))
except Exception:
    ping("/fail")
    raise
The try/except around each ping is deliberate. Your monitoring call should never take down your job.
Ruby
require 'net/http'
require 'uri'
TOKEN = ENV['DEADMANCHECK_TOKEN']
BASE = "https://deadmancheck.io/ping/#{TOKEN}"
def ping(path = '', params = {})
  uri = URI("#{BASE}#{path}")
  uri.query = URI.encode_www_form(params) unless params.empty?
  Net::HTTP.get(uri)
rescue StandardError
  # don't let monitoring kill the job
end

ping('/start')
begin
  count = run_etl
  ping('', { count: count })
rescue StandardError
  ping('/fail')
  raise
end
Bash with error handling
For bash scripts, use a trap to ping the fail URL on any error:
#!/bin/bash
set -euo pipefail
TOKEN="YOUR-TOKEN"
BASE="https://deadmancheck.io/ping/${TOKEN}"
curl -fsS "${BASE}/start" > /dev/null || true
trap 'curl -fsS "${BASE}/fail" > /dev/null' ERR
/usr/local/bin/run-backup.sh
ROW_COUNT=$(wc -l < /backups/output.csv)
curl -fsS "${BASE}?count=${ROW_COUNT}" > /dev/null || true
With set -euo pipefail, any unhandled command failure aborts the script; the ERR trap fires before the exit, pinging the fail endpoint. Note that the trap only catches failures from the point it's installed onward, and the || true on the pings keeps a monitoring outage from being treated as a job failure.
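One gap with ERR: it doesn't fire on an explicit `exit 1`. An EXIT trap covers every exit path and picks the endpoint from the final status. A sketch, assuming the same endpoints — `report_result`, `main`, and the job path are placeholders:

```shell
#!/bin/bash
set -eu
BASE="https://deadmancheck.io/ping/YOUR-TOKEN"

# Fires on every exit path; $? at trap time is the script's final status.
report_result() {
    if [ "$?" -eq 0 ]; then
        curl -fsS "${BASE}" > /dev/null 2>&1 || true
    else
        curl -fsS "${BASE}/fail" > /dev/null 2>&1 || true
    fi
}

main() {
    trap report_result EXIT
    curl -fsS "${BASE}/start" > /dev/null 2>&1 || true
    /usr/local/bin/run-backup.sh   # the real job (placeholder path)
}
```

With this shape, success and failure reporting live in one place instead of being split between the happy path and the trap.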
What to monitor first
If you're not sure where to start:
- Database backups — silent failures here are catastrophic
- ETL/data pipeline jobs — wrong data is worse than no data
- Invoice/billing jobs — customers notice immediately
- Report generation — stakeholders notice next morning
- Cache warmers — performance degrades silently
Anything that runs unattended and that you'd be embarrassed to find broken three weeks later.
One token per cron job. If you have 10 jobs, create 10 monitors. DeadManCheck's free tier covers 5 monitors — the $12/mo plan covers 100, which handles most teams.
Two minutes of setup. One less thing to find out about the hard way.