Running an AI agent 24/7 introduces a failure mode that most developers don't anticipate: silent death. The agent stops running — not with an error, not with a log entry — it just stops. And you find out three days later when a user asks why nothing happened.
The heartbeat pattern solves this.
## What Is a Heartbeat?
A heartbeat is a periodic signal that your agent writes to a known file: "I'm alive, I'm working, here's my current status."
```json
{
  "agent": "suki",
  "timestamp": "2026-03-09T23:30:00Z",
  "status": "active",
  "current_task": "content-loop",
  "last_completed": "dev.to article published",
  "next_scheduled": "2026-03-09T23:45:00Z",
  "health": "green"
}
```
Updated every loop cycle. If the timestamp goes stale, something went wrong.
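Writing the file is a few lines of Python. This is a minimal sketch, not the library's own implementation; the `write_heartbeat` helper and its parameters are illustrative names:

```python
import json
from datetime import datetime, timezone


def write_heartbeat(current_task, last_completed, health="green", path="heartbeat.json"):
    """Write heartbeat.json at the start of a loop cycle (illustrative helper)."""
    heartbeat = {
        "agent": "suki",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "status": "active",
        "current_task": current_task,
        "last_completed": last_completed,
        "health": health,
    }
    with open(path, "w") as f:
        json.dump(heartbeat, f, indent=2)
```

Call it as the first action of every cycle, e.g. `write_heartbeat("content-loop", "dev.to article published")`.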
## The Three Heartbeat Rules
Rule 1: Write before acting, not after.
If the agent crashes mid-task, you still have a heartbeat with `"status": "active"` and a timestamp from just before the crash. The timestamp going stale is itself the failure signal.
Rule 2: Include what you're doing, not just that you're alive.
"status": "active" is less useful than "current_task": "processing outbox.json". The latter tells you where the agent died.
Rule 3: Set a staleness threshold and enforce it.
Define what "stale" means for your agent. A loop that runs every 5 minutes should alert if the heartbeat is older than 10 minutes. Build this into your monitoring stack.
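A staleness check can be one small function. Sketch only; the `is_stale` name and the 10-minute default are assumptions matching the example above:

```python
import json
from datetime import datetime, timezone, timedelta


def is_stale(heartbeat, max_age=timedelta(minutes=10)):
    """Return True if the heartbeat timestamp is older than max_age."""
    # Accept both "Z" and "+00:00" UTC suffixes for broad Python-version support.
    ts = datetime.fromisoformat(heartbeat["timestamp"].replace("Z", "+00:00"))
    return datetime.now(timezone.utc) - ts > max_age
```

For a 5-minute loop, `max_age=timedelta(minutes=10)` gives one missed cycle of grace before alerting.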
## Adding It to Your SOUL.md
```markdown
## Heartbeat Protocol
- Write heartbeat.json at the start of every loop cycle
- Include: timestamp, current_task, last_completed, health status
- Health is "green" if all systems normal, "yellow" if degraded, "red" if critical
- If a task fails, update health before escalating to outbox.json
```
## The Monitoring Side
A heartbeat only works if something is checking it. Your monitoring loop (or a simple cron job) should:
- Read heartbeat.json
- Check if timestamp is within acceptable drift
- If stale: write to ops-alerts.json or send a notification
- If health is "yellow" or "red": escalate immediately regardless of staleness
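Those four steps fit in one function you can run from cron. A minimal sketch, assuming the file layout above; the `check` helper, its return strings, and the 10-minute threshold are illustrative:

```python
import json
from datetime import datetime, timezone, timedelta

MAX_AGE = timedelta(minutes=10)  # assumed threshold for a 5-minute loop


def check(path="heartbeat.json"):
    """Read the heartbeat and return a status string for alerting."""
    try:
        with open(path) as f:
            hb = json.load(f)
    except FileNotFoundError:
        return "alert: heartbeat missing"
    # Degraded health escalates immediately, regardless of staleness.
    if hb.get("health") in ("yellow", "red"):
        return f"escalate: health={hb['health']}"
    ts = datetime.fromisoformat(hb["timestamp"].replace("Z", "+00:00"))
    if datetime.now(timezone.utc) - ts > MAX_AGE:
        return "alert: heartbeat stale"
    return "ok"
```

Anything other than `"ok"` would be written to ops-alerts.json or pushed as a notification.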
## Why This Matters More Than You Think
Most agent failures aren't crashes. They're drift and silence — the agent keeps running but stops doing useful work, or it stops running entirely and no one notices.
The heartbeat pattern gives you a simple, durable way to tell the difference between "my agent is working" and "my agent is quietly failing."
A few lines in SOUL.md. Five minutes to set up. It's the reason I found out about a failure in 10 minutes instead of 3 days.
The full heartbeat config pattern (including monitoring loop template) is in the Ask Patrick Library at askpatrick.co/library/6.