Kevin

Claude Certified: Inside the Agentic Loop - How Claude Code Actually Decides What Tool to Call Next

Claude Code looks like magic from the outside. You type a vague request — "add auth to this endpoint" — and it opens three files, runs a grep, writes a patch, runs the test suite, and commits. Somewhere in there it decided, at each step, what tool to call next. How?

If you've built on the Agent SDK or Claude Code for any length of time, you've probably built a mental model for this. But I'll bet that model is fuzzier than you think it is. Mine was, until I sat down to write this.

This post is a practical walkthrough of the agentic loop: what it is, how tool selection actually happens at each turn, where the loop terminates, and one counter-intuitive thing about sub-agent boundaries that I see people get wrong.

Everything here is based on Anthropic's public documentation — the Messages API, the Claude Agent SDK, and the Claude Code reference. No internal secrets, no reverse engineering.

The loop, in one diagram

At its core, the agentic loop is this:

user  →  model  →  tool_use  →  tool_result  →  model  →  ...
                       ↑___________________________|

Written out in full:

  1. The caller sends a Messages API request. The system prompt describes who the model is. The user turn contains the request. The tools parameter lists everything the model is allowed to call.
  2. The model reads the full conversation plus the tool definitions and decides what to do next. If it needs information or wants to perform an action, it emits a tool_use content block with a tool name and arguments. Otherwise it emits text and ends the turn.
  3. The runtime executes the tool and sends a new user turn back, containing a tool_result content block that references the previous tool_use_id.
  4. The model reads the appended conversation again and decides again. Tool call, text, or stop.
  5. Repeat until stop_reason is end_turn (model is done) or a hard step-limit is hit.
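The five steps above can be sketched as a minimal runner. Everything here is a stand-in — `callModel`, the tool map, and the message shapes mimic the Messages API but don't call the real SDK; a production runner would call `messages.create` instead:

```javascript
// Minimal agentic loop sketch. `callModel(conversation)` stands in for one
// Messages API request; `tools` maps tool names to plain functions.
function runLoop(callModel, tools, userText, maxSteps = 10) {
  const conversation = [{ role: "user", content: userText }];
  for (let step = 0; step < maxSteps; step++) {
    // One forward pass: the model decides the next step from the full history
    const reply = callModel(conversation);
    conversation.push({ role: "assistant", content: reply.content });
    if (reply.stop_reason === "end_turn") return conversation; // "I'm done"
    // Execute every tool_use block and append results as a single user turn
    const results = reply.content
      .filter((block) => block.type === "tool_use")
      .map((block) => ({
        type: "tool_result",
        tool_use_id: block.id,
        content: tools[block.name](block.input),
      }));
    conversation.push({ role: "user", content: results });
  }
  throw new Error("step limit reached"); // the hard stop from step 5
}
```

Note that the model never appears outside `callModel` — the loop, the tool execution, and the step limit all live in the runner.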

That's the whole thing. The loop is not running inside the model — the loop is running in your code (or Claude Code's code), and the model is only the "decide next step" function being called in a tight cycle.

This matters, because it changes how you reason about what the model "knows" at step N.

How the model actually picks a tool

There's no rule engine. There's no decision tree. Tool selection is model-driven: at each step, Claude sees the entire conversation so far (system prompt, user turn, all prior tool calls and results), plus the full tool definitions, and produces the next message. Which tool it picks, with what arguments, is an output of a single forward pass.

What this means in practice:

  • Tool descriptions matter far more than tool names. The model reads the description field on each tool like a human reads a function's docstring. If two tools have similar descriptions, the model will flip-flop between them in ways that look like bugs. Fix your descriptions first, blame the model second.
  • Arguments are grounded in context, not in the user's literal words. If the user says "fix the test that's failing," the model may call grep first, then read_file on whatever grep turned up, then edit_file with a patch — even though the user never mentioned any of those filenames. The grounding comes from tool results accumulating in the conversation.
  • The system prompt primes tool selection. A system prompt that says "always use grep to locate before reading" will nudge selection consistently. One that just lists tools leaves selection to vibes.

This is also why tool descriptions are part of your prompt budget. Every step, every description is re-sent. Big tool catalogs cost tokens.
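To make the "descriptions matter" point concrete, here's what a selectable tool definition looks like. The shape follows the Messages API `tools` parameter (name, description, JSON Schema input); the tool itself and its wording are illustrative:

```javascript
// Hypothetical tool definition. The description is the main signal the model
// selects on — it should say what the tool returns and WHEN to prefer it,
// not just restate the name.
const tools = [
  {
    name: "grep",
    // Vague alternative ("search stuff") would be indistinguishable from a
    // read_file tool; this one states ordering and output format explicitly.
    description:
      "Search file contents for a regex pattern. Use this FIRST to locate " +
      "code before reading files. Returns matching lines with file paths.",
    input_schema: {
      type: "object",
      properties: { pattern: { type: "string" } },
      required: ["pattern"],
    },
  },
];
```

Every character of that description is re-sent on every step of the loop, which is the prompt-budget cost mentioned above.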

What goes into the next turn — and what doesn't

Here's the part that confuses people. When a tool runs and returns, the runtime doesn't hand the result to the model as a variable. It appends a new turn to the conversation:

// pseudocode
conversation.push({
  role: "user",
  content: [
    {
      type: "tool_result",
      tool_use_id: previousCall.id,
      content: toolOutput,
    },
  ],
});

So from the model's point of view, tool results look like user messages. The model doesn't "remember" that the tool ran in some side channel — the result is part of the conversation history, and on the next forward pass the model sees it inline with everything else.

Two consequences:

  1. Anything not in the conversation history is invisible to the model. If your runtime does something between tool calls — logs, metrics, side effects — the model has no idea it happened unless you explicitly append it as part of a tool result or an injected message.
  2. Context grows every turn. Long agentic loops burn context fast. This is the tightest constraint on how deep your agents can go, and it's why context management (one of the CCA exam domains) is a load-bearing discipline in production.
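The first consequence has a direct fix: if a runtime event matters to the model, fold it into the appended turn. A sketch, where the `[runtime]` prefix and the helper's name are my own convention, not an SDK one:

```javascript
// Append a tool result, optionally surfacing a runtime event the model would
// otherwise never see (retries, truncation, side effects).
function appendToolResult(conversation, toolUseId, output, runtimeNote) {
  const content = runtimeNote
    ? `${output}\n[runtime] ${runtimeNote}` // now part of the history → visible
    : output;
  conversation.push({
    role: "user",
    content: [{ type: "tool_result", tool_use_id: toolUseId, content }],
  });
  return conversation;
}
```

Anything that doesn't pass through a call like this simply does not exist from the model's point of view.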

When the loop stops

The loop terminates on one of three conditions:

  • The model returns stop_reason: "end_turn" with no tool_use blocks. This is the "I'm done" signal.
  • A hard step-limit configured by the runtime kicks in. Claude Code has one; your own SDK runner should have one too.
  • An error propagates up from a tool and the runtime decides to surface it instead of retrying.

The interesting case is the first one, because the model is the one deciding when to stop. If you see Claude Code continuing to loop when it shouldn't, that's usually a sign that tool results are sending ambiguous signals — a grep with no matches might read as "keep searching" instead of "nothing to find," depending on how the tool formats its output. This is a fixable problem, and it's almost always on the tool, not on the model.
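The grep example is worth spelling out. A tool that returns an empty string for zero matches is ambiguous; a tool that states the terminal condition is not. A sketch of the fix, on the tool side:

```javascript
// Format grep output so "nothing to find" reads as a stop signal rather than
// an invitation to keep searching. The exact phrasing is illustrative.
function formatGrepResult(matches, pattern) {
  if (matches.length === 0) {
    // Explicit terminal signal instead of an empty string
    return (
      `No matches for "${pattern}" anywhere in the searched files. ` +
      `The pattern does not exist; do not search for it again.`
    );
  }
  return matches.map((m) => `${m.file}:${m.line}: ${m.text}`).join("\n");
}
```

The same pattern applies to any tool whose "empty" result could be misread as "try again": say what the emptiness means.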

The sub-agent boundary, and the thing people get wrong

Claude Code can delegate to sub-agents via the Task tool (and the Agent SDK exposes similar primitives). The mental model most people have is: "a sub-agent is just another agentic loop running in parallel."

Close, but missing the important part.

When you launch a sub-agent with Task, it runs in its own isolated conversation. It gets its own system prompt, its own tool set (configurable), its own context window. The parent agent's conversation is not visible to the sub-agent. The only thing that crosses the boundary on the way in is the task description you pass. The only thing that crosses on the way out is the sub-agent's final text response.

The counter-intuitive consequence: the parent cannot see what the sub-agent saw along the way. It only sees the final answer. If the sub-agent read ten files and synthesized a conclusion, the parent gets the conclusion. It cannot ask "which files did you read?" unless the sub-agent explicitly included that in its final response.

This is why the CCA exam hammers on structured handoff protocols. If you want the parent to use intermediate data the sub-agent found, the sub-agent must return that data in its final message, in a format the parent can parse. Not "remember" — there's no remembering across the boundary. Return, or it's gone.
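A minimal handoff protocol can be as simple as JSON in the sub-agent's final message. The field names here are an assumption of mine, not an Anthropic convention — the point is that intermediate data only survives if it's packed into the one message that crosses the boundary:

```javascript
// Sub-agent side: pack intermediate findings into the final text response,
// because that text is the ONLY thing the parent will ever see.
function packHandoff(conclusion, filesRead, evidence) {
  return JSON.stringify({ conclusion, files_read: filesRead, evidence });
}

// Parent side: recover the structured fields, with a fallback for sub-agents
// that answered in plain prose.
function parseHandoff(finalText) {
  try {
    return JSON.parse(finalText);
  } catch {
    return { conclusion: finalText, files_read: [], evidence: [] };
  }
}
```

If the sub-agent's prompt doesn't instruct it to call something like `packHandoff`, the parent gets prose and the file list is gone for good.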

Model-driven vs decision tree — when to use which

Not every workflow should be a model-driven loop. Sometimes you want a rigid decision tree: if X, do Y, else Z. The question is when.

A good heuristic:

  • Model-driven loop when the state space is large, fuzzy, or unknown — code navigation, open-ended research, conversational support. The whole point of an LLM is that it can generalize.
  • Decision tree when the state space is small, well-defined, and the cost of a wrong turn is high — payment flows, compliance checks, anything with a regulator on the other end.

Mixing the two is common and correct. Use a decision tree at the outer layer (is this a payment? is this a compliance query?) and a model-driven loop inside the branches that need flexibility. The CCA exam explicitly tests this framing under Domain 1 — "when to let the model decide vs when to route deterministically" is a recurring question pattern.
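The mixed architecture is a few lines of routing. The route names and the `runAgentLoop` callback below are illustrative, not part of any SDK:

```javascript
// Deterministic outer router: high-stakes, well-defined branches never reach
// the model; everything fuzzy falls through to the agentic loop.
function route(request, runAgentLoop) {
  if (request.type === "payment") {
    return { handler: "payment_state_machine" }; // rigid: no model in the loop
  }
  if (request.type === "compliance") {
    return { handler: "compliance_checklist" };
  }
  // Large/fuzzy state space: let the model decide step by step
  return { handler: "agent", result: runAgentLoop(request.text) };
}
```

The classification itself (`request.type`) can come from a cheap deterministic check or a single non-agentic model call; either way, the regulated branches stay out of the loop.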

Wrapping up

The agentic loop isn't magic. It's a straightforward sequence: model reasons, tool runs, result gets appended as a user turn, model reasons again. Tool selection is a model-driven pick based on the full conversation. Sub-agents are isolated — return what you want the parent to see, or it's gone.

If you're preparing for the Claude Certified Architect exam, this material lives in Domain 1 (Agentic Architecture & Orchestration), which is 27% of the exam — the biggest chunk. There's more detail on the specific knowledge points at claudecertified.io/knowledge/domain1 — a free community-built study site with practice questions organized by domain. It's not affiliated with Anthropic; the questions are community-authored from the public exam guide.

If you're not studying for the exam, understanding the loop this way still pays. Production agent bugs are almost always bugs in how context flows through the loop, not bugs in the model.


Claude Certified is an independent, community-built practice platform and is not affiliated with or endorsed by Anthropic.
