- Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs
- Me: xgabriel.com | GitHub
You have an agent that books travel. A user asks it to find a flight to Lisbon, reserve a hotel near the venue, and add both to their calendar. You wired it as a ReAct loop: the model thinks, calls a tool, reads the result, thinks again. It works in the demo. Then a real user runs it, the flight search returns nothing for their dates, and the model spends four more turns flailing before it gives up halfway through, hotel unbooked, calendar untouched. The loop never knew the shape of the whole job. It only ever saw the next step.
That failure is not a bug in your prompt. It is a property of the loop shape you picked. ReAct and Plan-and-Execute are the two dominant shapes in 2026, and they fail in opposite ways. The choice is an architecture decision, and like most architecture decisions, the wrong one is cheap to make and expensive to live with.
The two shapes
ReAct interleaves reasoning and acting. The pattern comes from the 2022 ReAct paper by Yao et al. The model produces a thought, an action, and then reads the observation from that action before producing the next thought. One model call per step. The plan only exists in the model's head, regenerated from scratch every turn.
def react(model_call, tool_runner, messages, tools):
    # Loop until the model replies without requesting a tool.
    while True:
        resp = model_call(messages, tools)
        messages.append(
            {"role": "assistant", "content": resp.content}
        )
        # Collect the tool calls the model asked for in this turn.
        calls = [
            b for b in resp.content if b.type == "tool_use"
        ]
        if not calls:
            return resp  # model decided it is done
        # Run each tool and feed the observations back as the next turn.
        results = [
            tool_runner(c.name, c.input) for c in calls
        ]
        messages.append(
            {"role": "user", "content": results}
        )
Plan-and-Execute splits the work in two. A planner model call produces an ordered list of steps up front. An executor then runs the steps, usually with a smaller model or no model at all for the mechanical ones. The plan is an explicit data structure you can read, log, and edit.
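def plan_and_execute(planner, executor, goal, tools):
    plan = planner(goal, tools)  # one call, returns steps
    results = []
    for step in plan.steps:
        # The executor sees earlier results so later steps can use them.
        out = executor(step, results, tools)
        results.append(out)
        if out.failed and step.required:
            # replan (not shown) is assumed to hand the partial results
            # back to the planner and continue from a revised plan.
            return replan(planner, goal, plan, results)
    return results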
The difference is where the plan lives. In ReAct it is implicit and recomputed every turn. In Plan-and-Execute it is explicit and computed once. Everything else follows from that.
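To make "explicit data structure" concrete, here is a minimal sketch of what a plan object might look like. The `Step` and `Plan` names and all of their fields are hypothetical, chosen to line up with the `plan.steps` and `step.required` attributes used in the executor above; they do not come from any particular framework.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Step:
    # Hypothetical fields; adapt them to whatever your planner emits.
    id: str
    tool: str
    args: dict
    required: bool = True
    depends_on: list = field(default_factory=list)


@dataclass
class Plan:
    goal: str
    steps: list

    def to_json(self) -> str:
        # Serializable, so the plan can be logged or shown for approval
        # before anything runs.
        return json.dumps(asdict(self), indent=2)


plan = Plan(
    goal="Trip to Lisbon",
    steps=[
        Step("flight", "search_flights", {"to": "LIS"}),
        Step("hotel", "search_hotels", {"near": "venue"}),
        Step("calendar", "add_events", {}, depends_on=["flight", "hotel"]),
    ],
)

# Editing the plan is ordinary attribute assignment: here the hotel
# step is downgraded so a failure there would not abort the run.
plan.steps[1].required = False
print(plan.to_json())
```

Because the plan is plain data, the same object can be diffed between replans or stored next to the trace of what actually ran.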
Latency
ReAct pays for a full model round-trip on every step. A six-step task is six sequential model calls, each one waiting on the previous observation before it can start. The thinking and the doing are welded together, so you cannot overlap them. Latency scales linearly with the number of steps, and each step carries the model's full reasoning cost.
Plan-and-Execute pays for one big planning call, then cheap execution. If the steps are independent, the executor can run them in parallel. If three of your six steps are read-only API lookups with no dependency between them, they go out at once. The planning call is slower than a single ReAct step because it reasons over the whole task, but you make it once instead of six times.
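One way to get that parallelism is to run the plan in waves: every step whose dependencies are already satisfied goes out together. The sketch below uses the standard library's `ThreadPoolExecutor`. The `run_in_waves` function, the shape of `steps`, and the `run_step` callable are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_waves(steps, run_step, max_workers=4):
    """Run steps in dependency order; steps in the same wave run concurrently.

    `steps` maps a step id to the list of step ids it depends on.
    `run_step` is a hypothetical callable: run_step(step_id, results) -> output.
    Returns (results, waves), where waves records which steps ran together.
    """
    results, waves = {}, []
    pending = dict(steps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [s for s, deps in pending.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("cycle or unknown dependency in plan")
            # Snapshot results so concurrent steps see a stable view.
            snapshot = dict(results)
            outputs = list(pool.map(lambda s: run_step(s, snapshot), ready))
            results.update(zip(ready, outputs))
            waves.append(ready)
            for s in ready:
                del pending[s]
    return results, waves


steps = {
    "flight": [],
    "hotel": [],
    "weather": [],
    "calendar": ["flight", "hotel"],
}
results, waves = run_in_waves(steps, lambda s, done: f"{s}: ok")
print(waves)
```

Threads suit this case because the steps are I/O-bound API calls. A ReAct loop has no equivalent move, since it cannot name step three until it has seen the result of step two.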
For short tasks, ReAct often wins on latency because there is no upfront planning tax to amortize. For long tasks with parallelizable steps, Plan-and-Execute wins, sometimes by a lot. The crossover is usually around three to four steps, but measure it on your own traffic rather than trusting that number.
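A rough latency model makes the crossover easier to reason about. Every constant below is a made-up placeholder, and the model assumes fully independent steps and an executor that needs no model call, so treat it as a way to frame your own measurements rather than a prediction.

```python
import math


def react_latency(steps, model_s=2.0, tool_s=0.5):
    # Every step waits for a model call, then a tool call, in sequence.
    return steps * (model_s + tool_s)


def plan_execute_latency(steps, plan_s=5.0, tool_s=0.5, parallelism=3):
    # One planning call up front, then tool calls in parallel batches.
    return plan_s + math.ceil(steps / parallelism) * tool_s


for n in (2, 3, 6):
    print(n, react_latency(n), plan_execute_latency(n))
```

With these placeholder constants, two steps favor ReAct (5.0 s against 5.5 s) and three steps favor Plan-and-Execute (7.5 s against 5.5 s). Change the planning cost or the degree of parallelism and the crossover moves, which is the reason to measure instead of assuming.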
Recoverability
This is where the travel-agent story bites. ReAct recovers from a single failed step naturally, because every step is a fresh decision. The flight search returns nothing, the model sees the empty result, and it adapts on the next turn. No special handling. The loop was always going to look at the observation and decide what to do next, so a bad observation is just another input. What the loop does not carry is the rest of the job: in the travel story the model kept reacting to the empty search and never got back to the hotel or the calendar.
Plan-and-Execute recovers badly by default. The plan was written before any step ran, so it assumes every step succeeds. When step two fails, the executor is holding a plan that no longer matches reality. You need an explicit replanning path: detect the failure, feed the partial results back to the planner, get a revised plan. Without that, the executor either marches on through steps that depend on the failed one or stops dead.
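The replanning path described above might look like the following sketch. The `planner` and `run_step` interfaces are hypothetical and simplified to keep the control flow visible: the planner receives the goal, the results so far, and the failure, and returns the steps still to run.

```python
def execute_with_replan(planner, run_step, goal, max_replans=2):
    """Run a plan; on a failed step, ask the planner for a revised plan.

    Hypothetical interfaces, assumed for this sketch:
      planner(goal, done, failure) -> list of step names still to run
      run_step(step) -> (ok, output)
    """
    done = {}      # step -> output, for steps that succeeded
    replans = 0
    plan = planner(goal, done, None)
    while plan:
        step = plan.pop(0)
        ok, output = run_step(step)
        if ok:
            done[step] = output
            continue
        if replans >= max_replans:
            return {"status": "gave_up", "done": done, "failed": step}
        replans += 1
        # Feed the partial results and the failure back to the planner.
        plan = planner(goal, done, {"step": step, "error": output})
    return {"status": "ok", "done": done, "replans": replans}


def demo_planner(goal, done, failure):
    if failure is None:
        return ["search_flights", "book_hotel", "add_to_calendar"]
    # Revised plan: widen the dates, keep the remaining steps.
    # A real planner would also consult `done` to skip finished work.
    return ["search_flights_flexible", "book_hotel", "add_to_calendar"]


def demo_run_step(step):
    if step == "search_flights":
        return False, "no flights for those dates"
    return True, f"{step} done"


outcome = execute_with_replan(demo_planner, demo_run_step, "Lisbon trip")
print(outcome["status"], outcome["replans"], list(outcome["done"]))
```

When the cap is reached, the function returns what it completed and which step failed instead of raising, so the caller can report a partial result to the user.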
So ReAct is more resilient to surprise, and Plan-and-Execute is more resilient to drift. ReAct adapts to a single bad observation for free but can wander off the goal over many turns, each local decision reasonable, the global trajectory lost. Plan-and-Execute holds the goal steady because the plan is the goal, but it needs deliberate machinery to handle a step that goes sideways.
Token cost
Token cost roughly tracks latency, because both scale with the number of model calls and the size of the context each call carries. ReAct resends the growing conversation on every step. Turn one is small. Turn six carries every thought, action, and observation from turns one through five. The context itself grows linearly with the number of steps, but because each new turn re-reads all prior turns, the total input tokens across the run grow roughly quadratically, and large observations make every one of those re-reads more expensive. A six-step ReAct run over verbose tool outputs can cost more in tokens than the task seems to justify.
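A back-of-the-envelope calculation shows how the resent history adds up. The token figures below are illustrative assumptions, not measurements, and the Plan-and-Execute side assumes each execution step reads only a small, fixed amount of context.

```python
def react_input_tokens(steps, base=1_000, per_turn=800):
    """Total input tokens across a ReAct run.

    Assumes every turn resends a fixed `base` (system prompt plus tool
    definitions) and all earlier turns, each of which added `per_turn`
    tokens of thought, action, and observation. Both figures are made up.
    """
    return sum(base + per_turn * turn for turn in range(steps))


def plan_execute_input_tokens(steps, base=1_000, per_step=300):
    """One planning call reads `base`; each step then reads `per_step`."""
    return base + per_step * steps


for n in (6, 12):
    print(n, react_input_tokens(n), plan_execute_input_tokens(n))
```

Under these assumed figures, six steps cost 18,000 input tokens for ReAct against 2,800 for Plan-and-Execute. Doubling to twelve steps takes ReAct to 64,800, which is 3.6 times as much, while Plan-and-Execute reaches 4,600.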
Plan-and-Execute spends its tokens differently. The planning call reads the goal and the tool definitions once and produces a compact plan. Execution steps each carry only what that step needs, not the entire history. If your executor is a smaller, cheaper model, or a deterministic function for the mechanical steps, the per-step cost drops further. You front-load the expensive reasoning into one call and keep the rest lean.
The trap is replanning. Every replan is another full planning call, and a Plan-and-Execute agent that replans on every step has quietly turned into an expensive ReAct loop with extra steps. If your traffic replans constantly, that is a signal the planner cannot see enough up front, and ReAct is the more honest shape for that workload.
When each one wins
ReAct wins when the task is exploratory and the next step genuinely depends on what the last one returned. Debugging, research, anything where you cannot know step three until you see the result of step two. It wins on short tasks where planning overhead is not worth paying. It wins when you want the simplest possible loop, because there is less to build and less to break.
Plan-and-Execute wins when the task has a knowable structure up front. Multi-step workflows, report generation, anything where the steps are mostly independent and parallelizable. It wins when you need the plan to be auditable, because a regulator, a user, or your own on-call wants to see what the agent intended to do before it did it. It wins when latency and token cost matter at scale and the steps are cheap to execute.
Two practical notes. First, the boundary is not a wall. A common production shape is a planner that produces coarse steps, where each step is itself a small bounded ReAct loop. You get the global structure from the plan and the local adaptability from the inner loops. Second, whichever shape you pick, put a budget on it: a max-iteration cap on ReAct so a wandering loop cannot run forever, and a max-replan cap on Plan-and-Execute so a thrashing planner cannot bill you to death. The loop shape decides how the agent thinks. The budget decides how much that thinking is allowed to cost.
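The max-iteration cap is a small change to the ReAct loop shown earlier: swap `while True` for a bounded `for` and fail loudly when the budget runs out. The `BudgetExceeded` exception and the `looping_model` stub are hypothetical names used only for this sketch.

```python
from types import SimpleNamespace


class BudgetExceeded(Exception):
    """Raised when the loop hits its iteration cap without finishing."""


def bounded_react(model_call, tool_runner, messages, tools, max_steps=8):
    # Same loop as the earlier react(), but with a hard cap on turns.
    for _ in range(max_steps):
        resp = model_call(messages, tools)
        messages.append({"role": "assistant", "content": resp.content})
        calls = [b for b in resp.content if b.type == "tool_use"]
        if not calls:
            return resp  # model decided it is done
        results = [tool_runner(c.name, c.input) for c in calls]
        messages.append({"role": "user", "content": results})
    raise BudgetExceeded(f"no final answer after {max_steps} steps")


def looping_model(messages, tools):
    # Stub that always asks for another tool call, to simulate wandering.
    block = SimpleNamespace(type="tool_use", name="search", input={})
    return SimpleNamespace(content=[block])


messages = []
try:
    bounded_react(looping_model, lambda name, args: "no results",
                  messages, tools=[], max_steps=3)
    hit_cap = False
except BudgetExceeded:
    hit_cap = True
print(hit_cap, len(messages))
```

The same bounded loop can serve as the inner loop for one coarse step of a plan, with the outer planner deciding what to do when a step exhausts its budget.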
The decision in one paragraph
Pick ReAct when you cannot predict the path and the tasks are short. Pick Plan-and-Execute when the path is mostly knowable, the steps parallelize, and someone needs to read the plan. Then instrument both ends so you can see, from real traces, whether your tasks actually have the shape you assumed. Most teams guess wrong on the first build and find out from the latency graph and the token bill. The graph and the bill are the real arbiter, not the paper you read.
The two shapes are not rivals so much as answers to different questions. ReAct answers "what should I do next?" one step at a time. Plan-and-Execute answers "what is the whole job?" once and then carries it out. The right question depends on whether you know the job before you start.
If you are building agents and want the loop patterns laid out with the failure modes attached, this is the territory the AI Agents Pocket Guide walks through: loop shapes, replanning, bounded iteration, and the recovery patterns each shape needs. The chapter on agent loops pairs directly with the trade-offs in this post.
