I ran an AI coding agent autonomously for a month. A 15-line script showed me it produced almost nothing.

#ai #productivity #programming #opensource

I spent a month making sure my autonomous AI coding agent would never sit idle. I wrapped Claude Code in a restart loop, added a watchdog that nudged it back to work after three minutes of silence, and tuned hooks so it would never stall. It worked. The machine almost never idled.

Then one evening I ran a fifteen-line script over every session log on disk, and it reframed everything I thought I knew about running an agent on its own.

The number I was watching

The fear when you first leave an agent running unattended is obvious: what if it just stops? What if it sits there burning a subscription, waiting for a human who isn't coming? So that is the failure mode I engineered against, for weeks.

By that metric, the month was a triumph. Here is the raw shape of the work, straight from ~/.claude/projects/*/*.jsonl, with no rounding in my favor:

295 sessions retained locally, spanning about 32 days
Median 488 assistant responses per session (an "assistant response" = one real turn: call a tool, read the result, decide, call the next)
85% of sessions did more than 100 responses
Sessions that did literally nothing: 6. Two percent.

The machine was not idle. The machine was a firehose. I had spent a month defending against a failure mode that happened two percent of the time.

The number that wasn't there

Now the other side of the ledger. Same month, same machine — everything that actually left the building:

47 product folders created on disk
36 finished drafts sitting unpublished
0 sales in the outcome ledger I'd started keeping

Forty-seven product folders. Thirty-six drafts that were done — written, edited, ready — and never shipped. That is what 488-responses-times-295-sessions of relentless activity had actually piled up: an enormous inventory of work no reader, user, or buyer had ever seen.

The firehose was pointed at the floor.

Run it on your own logs

This isn't a vibe. It's two minutes of counting. Here is the whole script — it's read-only, it writes nothing:

import json, glob, os, statistics

files = glob.glob(os.path.expanduser('~/.claude/projects/*/*.jsonl'))
counts = []
for f in files:
    a = sum(1 for line in open(f) if '"type":"assistant"' in line)
    counts.append(a)

total = len(counts)
print(f"sessions          : {total}")
print(f"median responses  : {statistics.median(counts):.0f}")
print(f"sessions over 100 : {sum(c>100 for c in counts)} "
      f"({sum(c>100 for c in counts)/total*100:.0f}%)")
print(f"did nothing       : {sum(c==0 for c in counts)} "
      f"({sum(c==0 for c in counts)/total*100:.1f}%)")

If your median is in the hundreds and your outward output — things published, shipped, sold — is near zero, you are where I was. The diagnostic is uncomfortable precisely because the activity looks like health.

Why a tireless agent makes this worse, not better

The failure mode of autonomous AI operation is not idleness. It's motion that never reaches anyone — and a tireless agent generates that comforting motion faster than a lazy one would. Every loop produces a plausible next action. Refactor this. Draft that. Audit the other thing. None of it is wrong, exactly. It just never crosses the line from activity to outcome, and the firehose volume hides the gap behind a wall of green checkmarks.

Three things moved the needle for me, and none of them were "work harder":

An outcome ledger. Redefine "progress" so that busywork structurally cannot count as a win — only a reader/user/buyer seeing something does. If a session can't name what reached a person, it didn't progress.
A gate before the action. Two questions asked before starting any task — who is helped, and which number it moves in 14 days — that kill floor-pointed work before it eats a single token.
A session-boundary rule. One handoff discipline so every restart doesn't re-derive what the last session already settled (a huge silent drain when you run in loops).

If you want to dig in

The script above, a richer read-only audit, and the full free first chapter are in the companion repo: autonomous-claude-ops. Run the audit, read the diagnosis, decide for yourself.
If you operate Claude Code and want the broader free safety tooling (hooks that catch destructive operations, token drain, silent failures), that lives here: cc-safe-setup.

The short book that expands these six mechanisms is coming to Kindle; the repo is the honest, runnable half, free either way. But the script is the part that matters today — go count your own logs before you trust your impression of the month. Mine was wrong by a wide margin.