DEV Community

golproductions
golproductions

Posted on • Originally published at golproductions.com

152 Failed Commands in One AI Agent Session. 1 With Check.

I ran Claude Code on a production project. Deploying workers, editing files, running builds. Afterward I parsed the full transcript. 152 command failures in one session.

The failures themselves weren't surprising. Wrong flags that don't exist. Missing binaries tried over and over. Bash syntax errors. The usual AI agent mistakes.

What hurt was the retries..

The agent tried gh: command not found. Failed. Tried it again in PowerShell. Failed. Tried it three more times across the session. Five attempts to run a binary that was never installed.

Same thing with wrangler deploy. The route config was wrong. The agent adjusted one thing, retried. Still wrong. Adjusted again. Eight times for the same broken config.

Every failure dumps the error output into the context window. Every retry is another inference call carrying that error forward. The context gets heavier and the cost compounds.

I restarted the session with a pre-execution hook. Every command the agent generates passes through validation before it reaches the shell. If the command won't work — wrong flag, missing binary, bad syntax — it gets blocked. The agent never sees the error so it never retries.

Same project. Same kind of work. One failure. That one was a PowerShell syntax edge case from parsing a large file. Not a hallucinated command.

The retries didn't decrease. They pretty much disappeared.

Full breakdown with the numbers: 152 Failed Commands Without Check. 1 With It.

(https://www.golproductions.com/blog/check-if-url-is-reachable-without-visiting.html)

Top comments (0)