Heartbeat monitoring: know when your scheduled jobs silently stop working

#devops #go #node #productivity

Uptime monitoring tells you when your server goes down. But some of the worst outages look like this:

The server is fine
The cron scheduler fired
Nothing visibly broke

The job just quietly stopped doing anything useful.

A nightly data sync that hasn't run in four days. A backup that "completed" but wrote zero bytes. A report job that started silently throwing exceptions three weeks ago. These failures are invisible to a traditional HTTP monitor because the endpoint never went down.

This is what heartbeat monitoring solves.

How it works

A heartbeat monitor is a dead-man's switch. Instead of Tickstem polling your endpoint, your job calls Tickstem at the end of every successful run. If no ping arrives within the expected interval plus the grace window, you get an alert.

You configure two things:

interval — how often you expect a ping (e.g. every 24h)
grace window — buffer past the deadline before alerting (e.g. 1h)

No ping within interval + grace window → alert sent. With the example values, that is 25h after the last ping.

Wiring it up

Go:

import (
    "context"
    "log"
    "os"

    "github.com/tickstem/heartbeat"
)

ctx := context.Background()
client := heartbeat.New(os.Getenv("TICKSTEM_API_KEY"))

// create the monitor once: expect a ping every 24h, with 1h of grace
hb, err := client.Create(ctx, heartbeat.CreateParams{
    Name:         "nightly-sync",
    IntervalSecs: 86400,
    GraceSecs:    3600,
})
if err != nil {
    log.Fatalf("creating heartbeat: %v", err)
}

// at the end of every successful run (the token is the credential, no API key needed)
if err := client.Ping(ctx, hb.Token); err != nil {
    log.Println("heartbeat ping failed:", err) // non-fatal
}

Node.js:

import { HeartbeatClient } from "@tickstem/heartbeat"

const hb = new HeartbeatClient(process.env.TICKSTEM_API_KEY)

const heartbeat = await hb.create({ name: "nightly-sync", interval_secs: 86400 })

// at the end of every successful run
await hb.ping(heartbeat.token).catch(err => console.error("ping failed:", err))

Or just curl — no SDK needed:

curl -s -X POST "https://api.tickstem.dev/v1/heartbeats/$HEARTBEAT_TOKEN/ping" || true

The token goes in the URL. No auth header. The trailing || true keeps a failed curl from aborting a script that runs under set -e.

The thing worth noting

The ping only happens on success. Silence means something went wrong: the job crashed, was never scheduled, or finished without doing its actual work (provided the job checks its own result before pinging). That's the point.

Make the ping non-fatal though. A transient network blip shouldn't abort a successful sync.

When to use it

Any job where "it ran" and "it did something useful" are different things: