DEV Community

Yurukusa

My AI agent answered 488 times a session. It shipped nothing.

When you first leave an AI coding agent running on its own, the fear is obvious: what if it just stops?

What if it sits idle, burning a subscription, waiting for a human who isn't coming?

So that is what I built against. I wrapped Claude Code in a restart loop. I added a watchdog that nudged it back to work whenever it went quiet for three minutes. I tuned hooks to keep it from stalling. For about a month, almost all of my effort went into one goal: never let the machine idle.

It worked. The machine almost never idled.

Then one evening I did the thing I should have done on day one. I stopped trusting my impression of how the month had gone, and I counted.

A month of "productive" autonomy, straight from the logs

Every session Claude Code runs leaves a transcript on disk under ~/.claude/projects/*/*.jsonl. I ran a fifteen-line script over all of mine. No rounding in my favor:

  • 295 sessions, spanning about 32 days.
  • Median 488 assistant responses per session. (An "assistant response" is one real turn of work: call a tool, read the result, decide, call the next.)
  • 252 of 295 sessions — 85% — produced more than 100 responses.
  • Sessions that did literally nothing — zero responses: 6. Two percent.

Sit with that median. Four hundred and eighty-eight turns of work in the middle session. By the only metric I had been watching — is it working? — this was a triumph. The machine wasn't idle. The machine was a firehose.

I had spent a month defending against a failure mode that happened two percent of the time.

The other side of the ledger

Same month, same machine. Here is everything that actually left the building:

  • 47 product folders created under ~/products/.
  • 36 finished drafts — written, edited, ready — sitting unpublished.
  • ¥6,144 in cumulative revenue.
  • 0 sales recorded in the outcome ledger I had started keeping.

Forty-seven product folders. Thirty-six drafts that were done and never shipped. That is what 295 sessions at a median of 488 responses each had piled up: an enormous inventory of work no reader, user, or buyer had ever seen.

The firehose was pointed at the floor.

Two kinds of numbers that feel identical from the inside

Here is the trap, stated plainly. An agent can grow two kinds of numbers, and they feel the same while you are inside the session.

The first kind measures motion: responses, tool calls, commits, files touched, drafts written. These are cheap. An agent that never idles grows them without limit, and every increment feels like progress. The session was busy. The commit count went up. Surely something happened.

The second kind measures outward value: things published, things sold, things a real person reacted to. These are expensive, and here is the cruel part: an agent cannot grow them by working harder. Publishing requires a decision to expose work to judgment. Selling requires someone on the other side. No amount of internal motion crosses that gap on its own.

An autonomous agent optimizes whatever you let it count as progress. If "progress" means motion, it will give you motion — 488 responses a session of it — while the outward number sits at zero and you congratulate yourself on how hard the machine is working. The very thing I built to prevent idleness made the trap worse, because a tireless machine generates the comforting motion faster than a lazy one would.

This isn't a Claude Code problem. It's a problem with what you tell any autonomous worker to count. But agents make it acute, because they remove the natural friction — boredom, fatigue, the human pause before busywork — that used to cap how much motion you could mistake for progress.

Count it on your own machine

Don't take my 488 on faith. If you run Claude Code, the same claim is checkable against your own logs. This only reads; it changes nothing.

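python3 - <<'PY'
import glob, os, statistics

counts = []
for path in glob.glob(os.path.expanduser('~/.claude/projects/*/*.jsonl')):
    # One transcript file per session; count lines tagged as assistant turns.
    # The substring match assumes compact JSON with no space after the colon.
    with open(path, encoding='utf-8', errors='replace') as fh:
        counts.append(sum(1 for line in fh if '"type":"assistant"' in line))

if counts:
    print(f"sessions:             {len(counts)}")
    print(f"median responses:     {statistics.median(counts):.0f}")
    print(f"over 100 responses:   {sum(1 for c in counts if c > 100)}")
    print(f"zero-response (idle): {sum(1 for c in counts if c == 0)}")
else:
    print("no transcripts found under ~/.claude/projects")
PY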

Now hold that median next to a number only you can produce: how many things did this machine actually ship to a real person in the same period? Published posts, sales, replies that helped someone. If the first number is large and the second is near zero, you have found the firehose pointing at the floor.
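One way to hold the two numbers side by side is a single ratio: assistant responses spent per item that reached a real person. This is only a minimal sketch, assuming you have already counted both numbers yourself. The function name `responses_per_outcome` and the example figures are illustrative and are not taken from my logs or the repo.

```python
import math


def responses_per_outcome(total_responses: int, shipped: int) -> float:
    """Assistant responses spent per item that reached a real person.

    Returns math.inf when there was motion but nothing shipped, so that
    case shows up explicitly instead of raising ZeroDivisionError.
    """
    if total_responses < 0 or shipped < 0:
        raise ValueError("counts must be non-negative")
    if total_responses == 0:
        return 0.0  # no motion at all, so nothing was spent
    if shipped == 0:
        return math.inf  # motion with no outward result
    return total_responses / shipped


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    print(responses_per_outcome(12_000, 3))  # 4000.0
    print(responses_per_outcome(12_000, 0))  # inf
```

A ratio that keeps climbing from week to week, or sits at `inf`, is the same signal as the two raw numbers, just harder to explain away.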

The fix is a different scoreboard, not a faster machine

The way out is not more hooks or a tighter restart loop. It is to make the agent count the expensive number. The single change that moved mine: an outcome ledger — one append-only file where the only events that count are publish, sell, release, and real reaction. Tool calls don't go in it. Commits don't go in it. Drafts don't go in it. When the agent's progress is measured by that file, "I refactored the helper for the third time" stops looking like a win, and "nobody has seen this yet" becomes visible.
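To make the idea concrete, here is a rough sketch of what such a ledger could look like. It is not the format I actually use: the JSON Lines layout, the field names (`ts`, `kind`, `note`), and the helper names `record` and `tally` are all assumptions made for illustration. Only the four event kinds come from the description above.

```python
import json
import time
from collections import Counter
from pathlib import Path

# Only outward events count. The four kinds mirror the ones named in the text.
ALLOWED = {"publish", "sell", "release", "reaction"}


def record(ledger: Path, kind: str, note: str) -> None:
    """Append one outcome event as a JSON line; refuse anything that is only motion."""
    if kind not in ALLOWED:
        raise ValueError(f"{kind!r} is motion, not an outcome")
    event = {"ts": time.time(), "kind": kind, "note": note}
    # Mode "a" only ever appends; existing lines are never rewritten.
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, ensure_ascii=False) + "\n")


def tally(ledger: Path) -> Counter:
    """Count recorded outcomes by kind; a missing ledger means zero outcomes."""
    if not ledger.exists():
        return Counter()
    with ledger.open(encoding="utf-8") as fh:
        return Counter(json.loads(line)["kind"] for line in fh if line.strip())
```

Calling `record(Path("outcomes.jsonl"), "commit", "refactored helper")` raises `ValueError`, which is the point: the file has no way to store motion, so an agent that reports progress from `tally(...)` cannot pad the score with busywork.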

I wrote up the full version of this — the ledger, the gate that runs before risky actions, the silent-failure detection, and surviving the context-window boundary — as a short, honest book built entirely on real logs. The companion repo is free and runnable (the counter above, an expanded verify.py, and the first chapter in full):

👉 github.com/yurukusa/autonomous-claude-ops

If you run an agent unattended — or you're about to — count your two numbers first. The motion one will flatter you. The outward one is the only one your users ever see.
