DEV Community

Jamie Cole
Jamie Cole

Posted on • Edited on

Why Your AI Agent Should Be a Cron Job, Not a Server

Most tutorials build agents as long-running processes. Start the server, it runs forever, handles requests in a loop. Makes sense if you're building a chatbot. Makes no sense if you're building an autonomous agent.

I've been building autonomous agents with Claude for the past few months. The pattern I keep coming back to: cron job + state file. It's boring. It's also significantly more reliable than anything else I've tried.

Here's why.

The Problem with Long-Running Agents

Long-running agents fail in three specific ways that are hard to recover from:

1. They accumulate context window debt

Every interaction, every tool call, every observation gets added to the conversation. After a few hours, you're paying for thousands of tokens of history on every request. Worse, the model starts "forgetting" things from the beginning of the context — not because it can't see them, but because attention drifts.

2. They have no natural recovery point

When a long-running process crashes (and it will crash — network timeout, OOM, unexpected API response, whatever), where does it restart from? The beginning, if you're lucky. Nowhere useful, usually.

3. They're harder to inspect

What's the agent doing right now? What did it do an hour ago? With a long-running process, you need to instrument it specifically to answer these questions. With a cron job, you already have that — it's the state file.

The Cron Pattern

The pattern is simple:

every 15 minutes:
  1. read state from disk
  2. observe: what's changed?
  3. decide: what's the most valuable thing to do?
  4. act: do that one thing
  5. write state back to disk
Enter fullscreen mode Exit fullscreen mode

That's it. Each heartbeat is stateless from the process perspective. All state lives in a file (or SQLite DB) that persists between runs.

Here's a minimal implementation:

import json
import anthropic
from pathlib import Path
from datetime import datetime, UTC

STATE_FILE = Path("state.json")
client = anthropic.Anthropic()

def load_state() -> dict:
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text())
    return {"created": datetime.now(UTC).isoformat(), "context": {}, "history": []}

def save_state(state: dict) -> None:
    # Atomic write — avoids corruption on crash
    tmp = STATE_FILE.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(STATE_FILE)

def run_heartbeat():
    state = load_state()

    # Build context for the model
    context = f"""Current state: {json.dumps(state['context'], indent=2)}
Recent history: {json.dumps(state['history'][-10:], indent=2)}"""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": f"You are an autonomous agent. Here is your current state:\n\n{context}\n\nWhat should you do next? Choose ONE action."}
        ]
    )

    action = response.content[0].text

    # Execute action (your domain logic here)
    result = execute_action(action, state)

    # Update state
    state["history"].append({
        "timestamp": datetime.now(UTC).isoformat(),
        "action": action,
        "result": result
    })
    state["last_run"] = datetime.now(UTC).isoformat()

    save_state(state)

if __name__ == "__main__":
    run_heartbeat()
Enter fullscreen mode Exit fullscreen mode

Add this to cron:

*/15 * * * * /usr/bin/python3 /home/user/agent/agent.py >> /var/log/agent.log 2>&1
Enter fullscreen mode Exit fullscreen mode

Done. Your agent now runs every 15 minutes, picks up from exactly where it left off, and if it crashes mid-run you lose at most one heartbeat of work.

The Key Insight: State Is Your Source of Truth

In a long-running agent, state lives in memory — variables, context, whatever the model is "thinking." When the process dies, so does the state.

In the cron pattern, state lives on disk. The process dying doesn't matter. Next cron run, read the file, continue.

This also means you can inspect state at any time:

cat state.json | jq '.context'
Enter fullscreen mode Exit fullscreen mode

No special tooling. No debugger. Just a file.


I've been running an autonomous Claude agent continuously for 6 months using this exact pattern — 290+ heartbeats, real budget, real revenue goal. I wrote up what actually happened: what worked, what looped, what I'd redo. £19 guide: genesisclawbot.github.io/income-guide


When to NOT Use Cron

The cron pattern isn't for everything. Skip it when:

  • You need sub-second response times (serve from a persistent process, then write state for agent functions)
  • You're building a chatbot (user-facing, needs to respond in real-time)
  • Your agent's context needs to accumulate within a session (though consider whether it actually needs to)

For background agents — things that autonomously monitor, research, post, build, or manage — cron is almost always the right call.

One More Thing: Stuck Detection

Here's a pattern I use to prevent the agent from getting stuck in a loop:

def check_stuck(state: dict) -> bool:
    """Returns True if agent appears stuck"""
    recent = state["history"][-5:] if len(state["history"]) >= 5 else state["history"]

    if len(recent) < 3:
        return False

    # If last 3 actions were the same, we're probably stuck
    actions = [h.get("action_type") for h in recent[-3:]]
    if len(set(actions)) == 1:
        return True

    return False

def run_heartbeat():
    state = load_state()

    if check_stuck(state):
        # Escalate instead of continuing to loop
        state["status"] = "stuck"
        save_state(state)
        alert_human("Agent appears stuck — manual review needed")
        return

    # ... rest of heartbeat
Enter fullscreen mode Exit fullscreen mode

This catches the common failure mode where an agent repeatedly tries the same failing action because it can't remember it already tried.

The Full Pattern

To recap:

  1. State lives in a file, not in process memory
  2. Each run is short-lived — read state, act, write state, exit
  3. Cron handles scheduling — you don't need a scheduler library
  4. Atomic writes prevent corruption — write to .tmp, then rename
  5. Stuck detection prevents loops — check action history, escalate when stuck

This approach scales from a single agent on a laptop to dozens of agents on a server. Each one is just a cron job and a state file. Simple to monitor, simple to debug, simple to restart.


I'm Jamie Cole, building autonomous agent tools and writing about what works in practice. I publish a practical guide on Claude agent patterns — check it out on Gumroad if this is the kind of thing you're into.

Follow me on Bluesky — I post short technical notes there daily.

Top comments (1)