Zac
The test I run before every autonomous Claude Code session


Before I start any session that's going to run unsupervised, I run one test. Not a code test — a communication test.

I give the agent a deliberately ambiguous instruction and see what it does.

The test

Something like: "Improve the error handling in the auth module."

If the agent starts writing code immediately: bad sign. It interpreted the ambiguity and ran with it without flagging that interpretation.

If the agent asks "Do you want me to add error handling to specific functions, or review the whole module and identify what's missing?": good sign. It recognized the ambiguity.

If the agent says "I'll do X, Y, and Z — let me know if you want a different scope": acceptable. It made an interpretation and stated it transparently.
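The three outcomes above are concrete enough to mechanize. Here's a minimal sketch of a heuristic that buckets an agent's first reply; the function name and the keyword patterns are my own illustration, not part of any Claude Code API, and a real check would need tuning for your agent's phrasing:

```python
import re

def score_first_reply(reply: str) -> str:
    """Hypothetical heuristic mirroring the three outcomes:
    'bad' (dove straight into code), 'good' (asked a clarifying
    question), 'acceptable' (stated its interpretation)."""
    text = reply.strip()
    # Opening with a code fence or a diff means the agent started
    # writing code without surfacing its interpretation.
    if text.startswith("```") or text.startswith("diff --git"):
        return "bad"
    # Only look at the prose before any code block.
    first_code = text.find("```")
    prose = text if first_code == -1 else text[:first_code]
    # A clarifying question before any code is the best outcome.
    if "?" in prose:
        return "good"
    # A stated plan ("I'll do X, Y, Z -- let me know...") is acceptable.
    if re.search(r"\blet me know\b|\bI'll\b|\bI will\b", prose, re.IGNORECASE):
        return "acceptable"
    return "bad"
```

Keyword matching like this is brittle, but even a rough pre-flight score is enough to decide how precise the session's prompts need to be.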

Why this matters for autonomous sessions

In a supervised session, you catch wrong interpretations in real time. You say "no, I meant the other thing" and it redirects.

In an autonomous session, a wrong interpretation at the start compounds through the entire session. By the time you review the output, the agent has been building on the wrong assumption for an hour.

The test tells me how much precision my prompts need for this session. If the agent flags ambiguities, I can write slightly looser prompts and trust it to ask when uncertain. If the agent doesn't flag ambiguities, I need to be more explicit about everything.

What I do based on the result

Agent flags ambiguities: I write normal prompts with clear outputs but don't stress over every word. The agent will check in when it hits something unclear.

Agent interprets without flagging: I write very precise prompts. Every file listed. Every output defined. Every constraint explicit. The agent will do exactly what I said — including the wrong thing if I said it wrong.
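That "every file, every output, every constraint" discipline can be enforced mechanically. A small sketch, assuming a hypothetical prompt-builder of my own design (nothing here is a Claude Code API; it just refuses to render a prompt with any section missing):

```python
from dataclasses import dataclass, field

@dataclass
class PrecisePrompt:
    """Hypothetical builder that forces a fully specified prompt:
    every file listed, every output defined, every constraint explicit."""
    task: str
    files: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Refuse to produce a loose prompt for an unsupervised session.
        if not (self.files and self.outputs and self.constraints):
            raise ValueError("loose prompt: list files, outputs, and constraints")
        lines = [self.task, "", "Files in scope:"]
        lines += [f"- {f}" for f in self.files]
        lines += ["", "Expected outputs:"]
        lines += [f"- {o}" for o in self.outputs]
        lines += ["", "Constraints:"]
        lines += [f"- {c}" for c in self.constraints]
        return "\n".join(lines)
```

The point of the `ValueError` is the workflow guarantee: when the pre-flight test says the agent won't flag ambiguity, you physically can't hand it an underspecified prompt.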

This isn't a judgment about the model

Behavior varies from session to session, depending on context, recent history, and the contents of CLAUDE.md. A session that flags ambiguities today might not tomorrow, so run the test whenever the outcome matters.

The goal isn't to decide if the agent is "good" or "bad." It's to calibrate how much precision my prompts need for this specific session.


From running Claude Code autonomously on builtbyzac.com.
