I let Claude Code run autonomously for a month. "Productive" was a story my own logs didn't support

#claude #ai #devops #productivity

I ran Claude Code in a near-autonomous loop for weeks — wake it, let it work, check in once a day. Every session looked productive: commits, files touched, tasks "done," long busy transcripts. Then I went back and counted what had actually reached a reader, a user, or a buyer. The number was much smaller than the motion suggested. The agent had been busy at the floor — generating activity that reads as work but doesn't move anything outward.

This isn't a "Claude is bad" post. It's the opposite: the model does what you point it at, and if your loop rewards motion, you get motion. The fix is operator-side, and it's checkable against your own logs. Here's the part you can run today.

Activity is not progress — and the logs will prove it

The trap has a precise shape: an autonomous agent, left to define its own "progress," will count internal busywork — reorganizing notes, re-explaining its plan, re-reading files, "preparing" — as wins. None of it produces an outward outcome. Over a long run, the transcript fills up and the outward ledger stays nearly flat.

You don't have to take my word for it. Claude Code writes every session to ~/.claude/projects/<project>/*.jsonl. Go count, per session: how many turns happened, versus how many produced something a reader/user/buyer could actually see (a publish, a shipped change, a sent reply). The ratio is usually sobering. Mine was.

Two cheap defenses that change the loop

1. An outcome ledger where only outward events earn a line. Keep a record (a flat file is enough) where the only thing that registers as progress is an outward event with a link: published URL, shipped commit that reached users, a reply sent. Internal reorganizing is structurally unable to score. When "progress" can only mean "something left the building," the agent stops rewarding itself for motion.

2. A gate that runs before an action, not after. Before starting any task, force two questions: who specifically benefits from this? and which number moves within 14 days? ("Myself" and "none" are failing answers.) Most floor-pointed work dies at this gate before it consumes a single token — which, in a long session, is also where the real money goes (the re-sent context is billed every turn and grows with conversation length, so a busy-but-pointless session is also an expensive one).

Why this matters more for autonomous runs than for chat

In an interactive session you are the gate — you feel the lack of progress and redirect. In an autonomous loop, nobody is watching turn by turn, so the busy-machine trap compounds silently until you check the ledger a day later and find motion without outcomes. The defenses above put the gate and the ledger into the loop so it self-corrects instead of drifting.

I wrote up the full version of this — seven mechanisms that make an autonomous agent look productive while shipping little, each with something you can run against your own ~/.claude/projects/*.jsonl logs (the busy-machine trap, the outcome ledger, the pre-action gate, silent tool-result failures, the cost drain that grows with session length, carrying state across restarts, and protecting work a subsystem can silently wipe) — in a short book, Autonomous Claude Ops (7 chapters, on Gumroad). Every number in it is measured from real logs, with the verification script, or left out. Free hooks that wire some of these checks in: cc-safe-setup (npx cc-safe-setup, MIT).

If your agent runs while you sleep, the dangerous failure isn't a crash — it's a month of busy transcripts that shipped nothing. Count the outward outcomes in your own logs before you trust the motion.