DEV Community

wartzar-bee


How to Run Claude as an Autonomous Agent: Loops, Memory, Schedules, and Guardrails

Most people use Claude one prompt at a time. But the interesting territory is unattended operation: an agent that wakes up on a schedule, reads its own notes from last time, does real work with tools, writes down what it learned, and goes back to sleep — for days or weeks, without you babysitting it.

We run Claude this way in production, as a long-lived agent inside a sandboxed container, so this guide describes patterns that have held up in practice rather than a thought experiment. It covers the four things every autonomous Claude setup needs: a loop, memory, a schedule, and guardrails.

TL;DR

  • An autonomous agent is a loop: read state → decide → act with tools → write state → repeat.
  • Memory is the hard part. Each session starts fresh, so the agent must persist state to disk (or a store) and reconstruct context at the start of every run. No memory = an amnesiac that repeats itself.
  • Schedule it with cron, a systemd timer, or a job runner — short scheduled runs beat one giant always-on process.
  • Guardrails are non-negotiable: a sandbox, allow-listed tools, hard stop conditions, and a "no fabrication" rule so the agent reports reality, not vibes.

The core loop

Strip away the buzzwords and an autonomous agent is a while loop with a brain in the middle:

loop:
  1. RECONSTRUCT  — load memory/state from last run
  2. ORIENT       — what's the goal, what's already done, what's next
  3. ACT          — call tools (shell, HTTP, file I/O) to make progress
  4. RECORD       — write decisions + results back to durable storage
  5. CHECK        — stop conditions met? budget exhausted? if not, continue
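
The five steps above can be sketched as a small driver function. This is an illustrative skeleton, not the authors' implementation: `agent_loop`, the one-word state file ("todo" or "done"), and the step command are all hypothetical stand-ins for a real agent call.

```shell
#!/usr/bin/env bash
# Sketch of the five-step loop. The step command is a hypothetical stand-in
# for the real agent call; state is a one-word file: "todo" or "done".

agent_loop() {
  local state_file="$1" max_iters="$2" i=0 state
  shift 2                        # remaining args: the step command to run
  while :; do
    # 1. RECONSTRUCT: reload state from disk; nothing is kept in memory.
    state=$(cat "$state_file" 2>/dev/null || echo todo)
    # 2. ORIENT + 5. CHECK: stop as soon as the goal is recorded as met.
    if [ "$state" = "done" ]; then
      echo "stopped: goal met after $i step(s)"
      return 0
    fi
    # 5. CHECK: stop when the step budget is spent.
    if [ "$i" -ge "$max_iters" ]; then
      echo "stopped: budget of $max_iters step(s) exhausted"
      return 2
    fi
    # 3. ACT + 4. RECORD: the step does the work and rewrites the state file.
    "$@" "$state_file" || { echo "stopped: step failed"; return 1; }
    i=$((i + 1))
  done
}
```

The goal check runs before the budget check, so a run that finishes on its last allowed step is still reported as finished rather than as out of budget.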

You can drive this with the Claude Code CLI in headless mode or the Anthropic SDK. The headless CLI is the lowest-friction option — one command does a full reason-act-observe cycle with tools already wired up:

claude -p "$(cat ./agent/task.md)" \
  --output-format json \
  --max-turns 40 \
  > ./agent/last-run.json

-p runs a single non-interactive prompt; --max-turns caps how many tool-use round-trips it can take (a basic runaway guard); JSON output gives you something a script can parse for the next step.
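
One caveat with redirecting straight into last-run.json: if the run dies halfway, the file is left empty or truncated, and the next step parses garbage. A possible workaround, sketched below, is to write to a temporary file and publish it only on success. `run_and_capture` is a hypothetical helper name, not part of the Claude CLI.

```shell
#!/usr/bin/env bash
# Sketch: run a command and publish its stdout to the target file only if it
# exited cleanly, so the next step never parses a truncated result.
# run_and_capture is a hypothetical helper, not part of the Claude CLI.

run_and_capture() {
  local out_file="$1" tmp_file status=0
  shift                                   # remaining args: command to run
  tmp_file=$(mktemp "${out_file}.XXXXXX") || return 1
  "$@" > "$tmp_file" || status=$?
  if [ "$status" -eq 0 ] && [ -s "$tmp_file" ]; then
    mv "$tmp_file" "$out_file"            # rename is atomic on one filesystem
  else
    mv "$tmp_file" "${out_file}.failed"   # keep the evidence for debugging
    [ "$status" -ne 0 ] || status=1       # empty output counts as a failure
  fi
  return "$status"
}

# Intended use (not executed here):
#   run_and_capture ./agent/last-run.json \
#     claude -p "$(cat ./agent/task.md)" --output-format json --max-turns 40
```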

Memory: the thing that makes it "autonomous" instead of "amnesiac"

Here's the trap. Each agent invocation is a fresh context window. Nothing from yesterday's run carries over automatically. If you don't solve memory, your "autonomous agent" rediscovers the same facts, redoes the same work, and re-makes decisions you already made.

The pattern that works: memory lives on disk, and the agent's first job every run is to read it.

A simple, durable layout:

agent/
  MEMORY.md        # durable facts, decisions, "don't relitigate these"
  state.json       # structured current state (what's done, what's queued)
  log/             # append-only run logs, one file per run
  task.md          # the standing instructions / goal
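
A first run needs this layout to exist before the agent can read it. The sketch below creates any missing pieces and leaves existing files alone. The function name and the seed contents, including the `done`/`queued` keys in state.json, are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: create the memory layout if it is missing, without touching files
# that already exist. Seed contents are placeholders (assumptions).

init_agent_dir() {
  local dir="$1"
  mkdir -p "$dir/log"
  [ -e "$dir/MEMORY.md" ]  || printf '# Durable facts and decisions\n' > "$dir/MEMORY.md"
  [ -e "$dir/state.json" ] || printf '{"done": [], "queued": []}\n' > "$dir/state.json"
  [ -e "$dir/task.md" ]    || printf 'GOAL: <your standing objective>\n' > "$dir/task.md"
}
```

Calling `init_agent_dir ./agent` at the top of the run script is safe to repeat, since it never overwrites memory the agent has already written.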

Then make reconstruction the first instruction in the prompt:

# task.md
Before doing anything else:
1. Read agent/MEMORY.md and agent/state.json.
2. Skim the two most recent .md run logs in agent/log/.
Then continue the goal below, picking up exactly where the last run left off.

GOAL: <your standing objective>

When you finish this run:
- Append what you did + what you learned to agent/log/<timestamp>.md
- Update agent/state.json with the new current state.
- Add any durable decision to agent/MEMORY.md (one fact, one home — don't duplicate).
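
If you would rather not rely on the agent following the read-first instruction, an optional variant is to assemble the same context in the wrapper script and prepend it to the prompt. The sketch below assumes run logs are .md files named with sortable timestamps and no spaces; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: gather memory, state, and the newest run logs into one preamble.
# Assumes logs are named like 20250101T090000.md, so name order is time order.

# Print the names of the N most recent .md run logs, newest first.
recent_logs() {
  local log_dir="$1" count="${2:-2}"
  ls "$log_dir" 2>/dev/null | grep '\.md$' | sort -r | head -n "$count"
}

# Print MEMORY.md, state.json, and the two newest logs with simple headers.
reconstruct_context() {
  local dir="$1" f
  for f in MEMORY.md state.json; do
    printf '=== %s ===\n' "$f"
    cat "$dir/$f"
  done
  for f in $(recent_logs "$dir/log" 2); do
    printf '=== log/%s ===\n' "$f"
    cat "$dir/log/$f"
  done
}
```

The output of `reconstruct_context ./agent` can be placed ahead of the contents of task.md in the prompt. The trade-off is a larger prompt on every run in exchange for not depending on the agent to open the files itself.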

Two principles keep memory from rotting:

  • One fact, one home. Every decision/result has a single canonical place. Scattering the same fact across five files guarantees they drift out of sync.
  • Write down decisions, not just actions. "Killed approach X because Y" is worth more than "ran command Z" — it stops the agent from relitigating settled questions next run.

For larger setups, the same idea scales to a vector store or a notes index the agent can search semantically — but flat markdown files get you remarkably far and stay debuggable.

Scheduling: cron beats always-on

You can run one infinite process, but it's fragile — a crash loses everything, and a long-lived context drifts. The robust pattern is many short scheduled runs, each reconstructing state from disk.

A cron entry that runs the agent every hour:

# minute hour day-of-month month day-of-week: minute 0 of every hour
0 * * * * cd /opt/agent && ./run.sh >> /opt/agent/log/cron.log 2>&1

…where run.sh is the loop body:

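#!/usr/bin/env bash
# Fail fast: abort on errors, unset variables, and failed pipeline stages.
set -euo pipefail

# Run from the script's own directory so relative paths work under cron.
cd "$(dirname "$0")"
mkdir -p log

# One headless run; the JSON result is saved under a timestamped name.
claude -p "$(cat task.md)" \
  --max-turns 40 \
  --output-format json \
  > "log/run-$(date +%Y%m%dT%H%M%S).json"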
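
One hazard with hourly runs is overlap: if a run takes longer than an hour, the next one starts while the first is still writing state. A minimal guard, assuming a lock directory next to the script (the path is a hypothetical choice), could look like this:

```shell
#!/usr/bin/env bash
# Sketch: skip a scheduled run if the previous one is still going.
# mkdir is used as the lock because directory creation is atomic.

acquire_lock() {
  mkdir "$1" 2>/dev/null     # succeeds for exactly one caller
}

release_lock() {
  rmdir "$1" 2>/dev/null
}

# Intended use near the top of run.sh (not executed here):
#   acquire_lock .run.lock || { echo "previous run still active"; exit 0; }
#   trap 'release_lock .run.lock' EXIT
```

A run that is killed hard (for example by a reboot) leaves the lock behind, so a real setup would also want a way to detect and clear stale locks.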

On systems with systemd, a timer is more observable than cron (you get systemctl status, logging, and easy enable/disable):

# /etc/systemd/system/claude-agent.timer
[Unit]
Description=Run the Claude agent hourly

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
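
A timer only schedules; by default systemd starts the service unit with the same base name, here claude-agent.service. The sketch below prints one possible companion unit. The install path passed in and the 30-minute cap are assumptions, and `render_service_unit` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print a companion service unit for claude-agent.timer.
# The agent directory and the 30-minute cap are assumptions.

render_service_unit() {
  local agent_dir="$1"
  cat <<EOF
[Unit]
Description=One run of the Claude agent

[Service]
Type=oneshot
WorkingDirectory=$agent_dir
ExecStart=$agent_dir/run.sh
TimeoutStartSec=30min
EOF
}
```

To install it, something like `render_service_unit /opt/agent | sudo tee /etc/systemd/system/claude-agent.service`, followed by `systemctl daemon-reload` and `systemctl enable --now claude-agent.timer`, should work. Depending on how the CLI is installed, the service may also need `Environment=` lines so that `claude` is on the PATH.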

Because each run is independent and reads state from disk, a single failed run is harmless — the next one picks up where things stand.

Tool use: give it hands, but few

An agent that can only talk is a chatbot. An agent that can act needs tools — shell, HTTP, file I/O, maybe a database client. Claude Code ships with file editing, shell, and search out of the box, and you can extend it with MCP (Model Context Protocol) servers to add custom capabilities (a search index, an internal API, a deployment hook).

The discipline that matters more than the list: expose the fewest tools that get the job done, and allow-list the safe ones so the agent isn't pausing for permission on git status while still being gated on anything destructive.

# Pre-approve read-only / safe commands; everything else still prompts or is denied.
claude -p "$(cat task.md)" \
  --allowedTools "Bash(git status),Bash(npm test),Read,Grep" \
  --max-turns 40
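
As the allow-list grows, an inline string gets hard to review. One option is to keep the entries in a text file under version control and build the flag value from it. This is only a sketch: the file name allowed-tools.txt and the function name are hypothetical, and it assumes the comma-separated form used in the command above.

```shell
#!/usr/bin/env bash
# Sketch: read an allow-list file (one entry per line) and print the
# comma-separated value for --allowedTools.

allowed_tools_from_file() {
  # Drop blank lines and "#" comments, then join the rest with commas.
  grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | paste -sd, -
}

# Intended use (not executed here):
#   claude -p "$(cat task.md)" \
#     --allowedTools "$(allowed_tools_from_file allowed-tools.txt)" \
#     --max-turns 40
```

Every change to what the agent may do then shows up as a one-line diff in review.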

Guardrails: where most autonomous setups go wrong

Autonomy without guardrails isn't bold, it's a liability. The four that have earned their place:

  1. Sandbox the whole thing. Run the agent in a container with a read-only root, only the working directory mounted, and a default-deny network (allow-list the model API and your registries). If you're going to grant autonomy, do it where the blast radius is the container, not your machine. (We wrote a full walkthrough of this in a companion piece on sandboxing Claude Code.)

  2. Hard stop conditions. Caps on turns (--max-turns), wall-clock time, and a budget. An autonomous agent with no off-switch will find a way to run forever.

  3. A "no fabrication" rule, enforced in the prompt. Long-running agents are tempted to report success they didn't achieve — "tests pass" without running them, "deployed" without checking. Bake the opposite into the standing instructions:

   Never claim a result you didn't verify. Every metric must link to its
   source (a command's output, a URL, a log line). If you didn't run it,
   say you didn't run it. Honest "blocked" beats fake "done".
  4. Idempotent, reviewable output. Have the agent propose changes as diffs/PRs you can review, and make actions safe to re-run. An agent that re-runs its last action without doubling the effect is one you can actually trust on a schedule.
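
The hard stops in item 2 can be enforced outside the agent, in the wrapper script. The sketch below shows a wall-clock cap and a daily run budget. It assumes GNU coreutils `timeout` is available; the function names, limits, and counter-file layout are hypothetical, and counting runs is only a rough proxy for a spend budget.

```shell
#!/usr/bin/env bash
# Sketch of two hard stops around a run: a wall-clock cap and a daily run
# budget. Limits and the counter-file layout are assumptions.

# Kill the command if it runs longer than the given number of seconds.
# An exit status of 124 means the cap was hit.
run_with_deadline() {
  local seconds="$1"
  shift
  timeout "$seconds" "$@"
}

# Allow at most max_runs invocations per calendar day; return 1 once spent.
# Counts are kept in one small file per day under budget_dir.
spend_run_budget() {
  local budget_dir="$1" max_runs="$2" file used
  mkdir -p "$budget_dir"
  file="$budget_dir/$(date +%Y%m%d).count"
  used=$(cat "$file" 2>/dev/null || echo 0)
  [ "$used" -lt "$max_runs" ] || return 1
  echo $((used + 1)) > "$file"
}

# Intended use inside run.sh (not executed here):
#   spend_run_budget ./budget 24 || { echo "daily run budget spent"; exit 0; }
#   run_with_deadline 1800 claude -p "$(cat task.md)" --max-turns 40
```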

Watch the cost — autonomy is where token bills hide

The whole appeal of an autonomous agent is that it runs without you watching. That's also exactly how a token bill quietly balloons: dozens of scheduled runs, each with a big reconstructed context and cache misses you never see. Because Claude Code writes JSONL session logs locally (to ~/.claude/projects/), you can audit spend after the fact. We built an open-source CLI, tokenscope, that turns those logs into a per-session, per-model, per-day cost breakdown — including the cache-creation-vs-cache-read split that drives most surprises. For anything running unattended, npx tokenscope is the cheapest insurance you'll buy: it's read-only and offline, so it slots straight into a sandboxed setup.

FAQ

What's the minimum to make Claude "autonomous"?
A loop (a script that invokes Claude in headless mode), a memory file the agent reads first and writes last, and a scheduler (cron or systemd) to fire it. That's it — everything else is refinement.

How does the agent remember things between runs?
It doesn't, automatically — each run is a fresh context. You persist state to disk (markdown + JSON, or a store) and make "read your memory" the first instruction every run. Memory is a discipline you impose, not a feature you toggle.

Cron or an always-on process?
Prefer many short scheduled runs. They're crash-resistant (a failed run is harmless), avoid context drift, and are easier to observe. Reserve always-on for genuinely event-driven work, and even then have it checkpoint to disk.

How do I stop it from running forever or going off the rails?
Hard caps (--max-turns, time limits, budget), a sandbox so the blast radius is contained, an allow-list of safe tools, and a no-fabrication rule so it reports reality. Guardrails first, autonomy second.

Can it use my internal APIs and tools?
Yes — via MCP servers you point Claude at, plus shell/HTTP within the sandbox. Expose the fewest tools needed and allow-list the safe ones.


Written by the team behind tokenscope, an open-source CLI for tracking Claude Code token costs. We run Claude as a long-lived autonomous agent in a sandboxed container — the loop, memory, and guardrail patterns above are the ones we operate on daily.
