Mike Tickstem

Originally published at tickstem.dev

Monitoring AI agents in production

Traditional monitoring asks one question: is the server up? If the endpoint returns 200, everything is fine. AI agents break that assumption. The server can be perfectly healthy while the agent silently produces wrong outputs, skips steps, runs over budget, or stops working entirely — all without triggering a single alert.

Monitoring autonomous agents requires a different mental model. Here's what actually breaks and how to catch it.

What traditional monitoring misses

Uptime monitoring tells you the endpoint responded. It says nothing about what the agent did inside that response. An agent endpoint that returns {"status": "ok"} in 50ms might have skipped the entire task due to a context length limit, a rate limit on the model API, or a malformed tool call that silently failed.

The failure modes specific to AI agents in production:

  • Silent tool failures. A tool call returns an error that the model handles by continuing without it. The task "completes" but with missing data.
  • Context window exhaustion. Long-running agents hit token limits mid-task and truncate their work. The HTTP response is still 200.
  • Model API degradation. The underlying model API is slow or returning degraded outputs. Your endpoint is up; the work is wrong.
  • Drift over time. An agent that worked last week starts producing subtly different outputs as the model is updated. No alert fires — outputs just quietly change.
  • Scheduled run skips. The agent was supposed to run at 06:00. It didn't. Nothing in your existing monitoring catches this because the server never went down.

The three layers of agent monitoring

Layer 1: Uptime monitoring

Still necessary, just not sufficient. Your agent's HTTP endpoint should be monitored for availability and response time. A degraded model API often shows up as increased latency before it causes outright failures.

Set up an uptime monitor on the endpoint your agent exposes. A 30-second check interval catches most outages before users do. Configure timeout alerts — if your agent normally responds in under 10 seconds and starts taking 90, something is wrong even if it's still returning 200.

curl -X POST https://api.tickstem.dev/v1/monitors \
  -H "Authorization: Bearer $TICKSTEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name": "summary-agent-endpoint", "url": "https://your-app.com/agents/summary/health", "interval_secs": 30, "timeout_secs": 15}'
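The latency rule described above can also be checked from any script. This is a minimal sketch, assuming a hypothetical helper named `is_slow` and the illustrative 10-second threshold from the text. It relies on curl's `%{time_total}` write-out variable, which reports the total request time in seconds.

```shell
# is_slow ELAPSED_SECS THRESHOLD_SECS
# Succeeds (exit 0) when the elapsed time exceeds the threshold.
# Hypothetical helper for illustration; not part of any Tickstem tooling.
is_slow() {
  awk -v t="$1" -v max="$2" 'BEGIN { exit !((t + 0) > (max + 0)) }'
}

# Usage sketch, with HEALTH_URL standing in for your agent's health endpoint:
#   elapsed=$(curl -s -o /dev/null -w '%{time_total}' --max-time 120 "$HEALTH_URL")
#   if is_slow "$elapsed" 10; then echo "agent is slow: ${elapsed}s" >&2; fi
```

The comparison runs in awk because POSIX shell arithmetic only handles integers, while curl reports fractional seconds.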

Layer 2: Heartbeat monitoring

Uptime tells you the server is alive. Heartbeat tells you the agent actually did the work.

A heartbeat monitor works as a dead man's switch: your agent sends a ping after each successful completion. If the ping stops arriving within the expected window, you get an alert. The server being up is irrelevant — if the work stopped happening, the heartbeat catches it.
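The alerting rule behind a dead man's switch is simple arithmetic. This is a minimal sketch, assuming the alert fires once the time since the last ping exceeds the expected interval plus a grace period. How Tickstem evaluates this internally is not stated in this article, and `is_overdue` is a hypothetical name.

```shell
# is_overdue LAST_PING_EPOCH NOW_EPOCH INTERVAL_SECS GRACE_SECS
# Succeeds (exit 0) when no ping has arrived within interval + grace.
is_overdue() {
  [ $(( $2 - $1 )) -gt $(( $3 + $4 )) ]
}

# Example: a daily job (86400 s) with one hour of grace (3600 s),
# where LAST_PING holds the epoch time of the most recent ping.
#   if is_overdue "$LAST_PING" "$(date +%s)" 86400 3600; then echo "missed run" >&2; fi
```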

# Create a heartbeat and save the token from the response
curl -X POST https://api.tickstem.dev/v1/heartbeats \
  -H "Authorization: Bearer $TICKSTEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"name":"daily-summary-agent","interval_secs":86400,"grace_secs":3600}'

# At the end of every successful agent run
curl -s -X POST "https://api.tickstem.dev/v1/heartbeats/$HEARTBEAT_TOKEN/ping"

Send the ping only on success, after the agent has verified its own output. Silence then means failure, regardless of what the HTTP response said.
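One way to enforce this is a small wrapper around the agent command. This is a sketch under stated assumptions: `output_ok` is a hypothetical placeholder check (non-empty output with no `"error"` key), and the ping URL is the one from the example above. Swap the check for whatever verification your agent's output actually needs.

```shell
# Hypothetical output check: output must be non-empty and must not
# contain an "error" key. Replace with a check that fits your agent.
output_ok() {
  [ -n "$1" ] && ! printf '%s' "$1" | grep -q '"error"'
}

# Sends the heartbeat ping; -f makes curl fail on HTTP error responses.
send_ping() {
  curl -fsS -X POST "https://api.tickstem.dev/v1/heartbeats/${HEARTBEAT_TOKEN}/ping" >/dev/null
}

# run_with_heartbeat CMD [ARGS...]
# Runs the agent command and pings only if it exits 0 and its output passes the check.
run_with_heartbeat() {
  agent_out=$("$@") || return 1       # agent failed: stay silent
  output_ok "$agent_out" || return 1  # output looks wrong: stay silent
  send_ping
}
```

A scheduled job would then call something like `run_with_heartbeat ./run-summary-agent.sh`, where the script name is illustrative. Any failure path simply returns without pinging, so the missing heartbeat raises the alert.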

Layer 3: Execution history

The most underused layer. Every scheduled agent run should produce a logged record: when it ran, how long it took, whether it succeeded, and what it returned.

Without this, debugging a failure means reconstructing what happened from scattered logs. With it, you open the execution history and see immediately: the run at 06:03 took 4 minutes instead of the usual 45 seconds, returned a 500, and the response body contains a rate limit error from the model API.

If you're using HTTP-based scheduling for your agent, execution history comes for free — every run is logged with the full request and response.
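For agents triggered some other way, such as plain cron, a wrapper can produce the same kind of record. This is a minimal sketch, assuming a hypothetical tab-separated log format and a helper named `log_run`. It records when the run finished, how long it took, the exit status, and the first 200 characters of output.

```shell
# log_run HISTORY_FILE CMD [ARGS...]
# Runs CMD and appends one tab-separated record to HISTORY_FILE:
#   UTC finish time, duration in seconds, exit status, truncated output.
log_run() {
  history_file=$1
  shift
  start=$(date +%s)
  run_out=$("$@" 2>&1) && run_status=0 || run_status=$?
  duration=$(( $(date +%s) - start ))
  # Flatten newlines and tabs so each run stays on one line, then truncate.
  summary=$(printf '%s' "$run_out" | tr '\n\t' '  ' | cut -c1-200)
  printf '%s\tduration=%ss\texit=%s\t%s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$duration" "$run_status" "$summary" \
    >> "$history_file"
  return "$run_status"
}
```

Because the wrapper returns the agent's own exit status, it can sit in front of an existing command without changing how the scheduler sees success or failure.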

A practical rule: any agent task that runs on a schedule and produces output that other systems depend on needs all three layers. Uptime alone is not monitoring — it's a pulse check.

Wiring it up via MCP

If you're building with Claude Code or a similar MCP-compatible agent, you can set up the full monitoring stack from within your editor. The Tickstem MCP server exposes create_monitor, create_heartbeat, and list_executions as native tools.

What good agent monitoring looks like

The goal is to answer three questions at any point in time, without digging through logs:

  • Is the agent endpoint reachable and responding normally? (uptime)
  • Did the agent complete its last scheduled task? (heartbeat)
  • What happened on the last N runs? (execution history)

When all three are in place, debugging shifts from "something might be wrong, let me check everything" to "here's exactly what happened and when."


Tickstem provides uptime monitoring, heartbeat checks, cron scheduling, and email verification under one API key. Free tier at app.tickstem.dev — no credit card required.
