Running an AI agent 24/7 introduces a failure mode that most developers don't anticipate: silent death. The agent stops running — not with an error, not with a log entry — it just stops. And you find out three days later when a user asks why nothing happened.
The heartbeat pattern solves this.
## What Is a Heartbeat?
A heartbeat is a periodic signal that your agent writes to a known file: "I'm alive, I'm working, here's my current status."
```json
{
  "agent": "suki",
  "timestamp": "2026-03-09T23:30:00Z",
  "status": "active",
  "current_task": "content-loop",
  "last_completed": "dev.to article published",
  "next_scheduled": "2026-03-09T23:45:00Z",
  "health": "green"
}
```
Updated every loop cycle. If the timestamp goes stale, something went wrong.
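Writing the file is a few lines of Python. This is a minimal sketch, not the library's own implementation; the `write_heartbeat` helper and its parameters are illustrative names:

```python
import json
from datetime import datetime, timezone


def write_heartbeat(current_task, last_completed, health="green", path="heartbeat.json"):
    """Write heartbeat.json at the start of a loop cycle (illustrative helper)."""
    heartbeat = {
        "agent": "suki",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "active",
        "current_task": current_task,
        "last_completed": last_completed,
        "health": health,
    }
    with open(path, "w") as f:
        json.dump(heartbeat, f, indent=2)
```

Call it as the first action of every cycle, e.g. `write_heartbeat("content-loop", "dev.to article published")`.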
## The Three Heartbeat Rules
Rule 1: Write before acting, not after.
If the agent crashes mid-task, you still have a heartbeat with `"status": "active"` and a timestamp from just before the crash. The timestamp going stale is itself the failure signal.
Rule 2: Include what you're doing, not just that you're alive.
"status": "active" is less useful than "current_task": "processing outbox.json". The latter tells you where the agent died.
Rule 3: Set a staleness threshold and enforce it.
Define what "stale" means for your agent. A loop that runs every 5 minutes should alert if the heartbeat is older than 10 minutes. Build this into your monitoring stack.
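A staleness check can be one small function. Sketch only; the `is_stale` name and the 10-minute default are assumptions matching the example above:

```python
import json
from datetime import datetime, timezone, timedelta


def is_stale(heartbeat, max_age=timedelta(minutes=10)):
    """Return True if the heartbeat timestamp is older than max_age."""
    # Accept both "Z" and "+00:00" UTC suffixes for broad Python-version support.
    ts = datetime.fromisoformat(heartbeat["timestamp"].replace("Z", "+00:00"))
    return datetime.now(timezone.utc) - ts > max_age
```

For a 5-minute loop, `max_age=timedelta(minutes=10)` gives one missed cycle of grace before alerting.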
## Adding It to Your SOUL.md
```markdown
## Heartbeat Protocol
- Write heartbeat.json at the start of every loop cycle
- Include: timestamp, current_task, last_completed, health status
- Health is "green" if all systems normal, "yellow" if degraded, "red" if critical
- If a task fails, update health before escalating to outbox.json
```
## The Monitoring Side
A heartbeat only works if something is checking it. Your monitoring loop (or a simple cron job) should:
- Read heartbeat.json
- Check if timestamp is within acceptable drift
- If stale: write to ops-alerts.json or send a notification
- If health is "yellow" or "red": escalate immediately regardless of staleness
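Those four steps fit in one function you can run from cron. A minimal sketch, assuming the file layout above; the `check` helper, its return strings, and the 10-minute threshold are illustrative:

```python
import json
from datetime import datetime, timezone, timedelta

MAX_AGE = timedelta(minutes=10)  # assumed threshold for a 5-minute loop


def check(path="heartbeat.json"):
    """Read the heartbeat and return a status string for alerting."""
    try:
        with open(path) as f:
            hb = json.load(f)
    except FileNotFoundError:
        return "alert: heartbeat missing"
    # Degraded health escalates immediately, regardless of staleness.
    if hb.get("health") in ("yellow", "red"):
        return f"escalate: health={hb['health']}"
    ts = datetime.fromisoformat(hb["timestamp"].replace("Z", "+00:00"))
    if datetime.now(timezone.utc) - ts > MAX_AGE:
        return "alert: heartbeat stale"
    return "ok"
```

Anything other than `"ok"` would be written to ops-alerts.json or pushed as a notification.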
## Why This Matters More Than You Think
Most agent failures aren't crashes. They're drift and silence — the agent keeps running but stops doing useful work, or it stops running entirely and no one notices.
The heartbeat pattern gives you a simple, durable way to tell the difference between "my agent is working" and "my agent is quietly failing."
A few lines in SOUL.md. Five minutes to set up. It's the reason I found out about a failure in 10 minutes instead of 3 days.
The full heartbeat config pattern (including monitoring loop template) is in the Ask Patrick Library at askpatrick.co/library/6.