The Planner-Executor Split: Why One Agent Loop Is Not Enough

#ai #agents #llm #architecture

Book: AI Agents Pocket Guide: Patterns for Building Autonomous Systems with LLMs
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

You give the agent a five-step task. Migrate a table, backfill it, update the read path, flip a flag, verify. The first two steps go fine. Then step three returns an error the model did not expect. The model improvises. By step four it has forgotten step five exists. It declares victory on a job that is half done, and the trace looks confident the whole way through.

This is the single-loop failure mode. One model holds the plan, the current step, the tool results, and the next decision all in the same context window, all in the same call. Every turn it re-derives what it is doing from a conversation history that keeps growing and keeps mutating. Nothing in the loop is responsible for the plan staying intact. The plan is just more tokens, and tokens drift.

The fix is to stop asking one model to do two jobs. Split planning from execution.

What a single loop actually does on every turn

A standard agent loop calls the model, reads the tool calls, runs them, feeds the results back, and calls the model again. The model is doing strategy and tactics in one shot. It decides the overall approach, picks the next concrete action, and interprets the last result, all inside one forward pass.

That works for short tasks. The context is small, the goal is fresh, the model has not had room to lose the thread. It falls apart on long tasks for a reason that has nothing to do with model quality. The plan and the execution state share one context. As the execution history grows, it crowds out the plan. The model attends to the last three tool errors and stops attending to the goal you gave it forty turns ago.

You see it as drift. The agent solves a problem you did not ask about. It fixes the thing that broke instead of the thing you wanted. It quits early because the recent context reads like the work is done.

Two models, two jobs

The planner takes the goal and produces a plan. A list of steps, each with a success condition. It does not call tools. It does not touch the API. It reads the task and the current state and outputs structure.

The executor takes one step at a time and gets it done. It calls tools, reads results, retries, and reports back whether the step succeeded or failed and why. It does not decide what the next step is. It does not reconsider the goal. It owns the small loop for a single step.

The orchestrator sits between them. It hands the executor the next step, collects the result, and decides whether to continue, re-plan, or stop. The plan lives in the orchestrator's state, not inside a model's context window where it can erode.

from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    success_condition: str
    done: bool = False
    result: str = ""


@dataclass
class Plan:
    goal: str
    steps: list[Step] = field(default_factory=list)

The plan is a plain data structure the orchestrator holds. The planner writes it. The executor never sees the whole thing at once, only the one step it is working.

The orchestrator loop

def run(goal, planner, executor, max_replans=3):
    plan = planner.make_plan(goal)
    replans = 0

    while not all(s.done for s in plan.steps):
        step = next(s for s in plan.steps if not s.done)
        outcome = executor.run_step(goal, step)

        if outcome.ok:
            step.done = True
            step.result = outcome.detail
            continue

        if replans >= max_replans:
            return plan, "replan_budget_exhausted"

        plan = planner.replan(goal, plan, step, outcome)
        replans += 1

    return plan, "done"

The executor returns a structured outcome, not prose. The orchestrator reads outcome.ok and branches. A failed step does not crash the run and it does not get improvised around inside the executor. It goes back to the planner, which is the only component allowed to change the plan.

Notice the planner sees the failure when it re-plans. It gets the goal, the plan so far, the step that failed, and why. That is enough to revise. Maybe the failing step needed a prerequisite. Maybe the approach was wrong and three steps need replacing. The planner decides, the orchestrator carries it out.

Re-plan triggers: when to call the planner again

Re-planning is the part teams skip, and it is the part that earns the split. A plan made before any tool ran is a guess. The world answers back. You re-plan when the answer does not match the guess.

Concrete triggers worth wiring in:

A step fails its success condition. The clearest signal. The executor tried and could not meet the bar. The plan assumed something false.
A step succeeds but returns surprising state. The backfill worked, but it touched ten times the rows the plan expected. The next steps were sized for the small number.
The same step fails twice. Not a transient error. The approach is wrong, and retrying the executor will not fix an approach problem.
A step is no longer reachable. A prerequisite resource got deleted out from under the run. The remaining plan is stale.
The user changes the goal mid-run. Rare in batch jobs, common in chat. The whole plan is now answering the wrong question.

def needs_replan(step, outcome, attempts):
    if not outcome.ok and attempts >= 2:
        return True, "repeated_failure"
    if outcome.ok and outcome.surprising:
        return True, "unexpected_state"
    if outcome.blocked:
        return True, "step_unreachable"
    return False, None

Cap the re-plans. A planner that re-plans without a budget is a different runaway loop wearing a strategy hat. Three is a reasonable default. When you hit the cap, stop and return a partial result with the reason, the same way a good single loop stops on its iteration budget.

The cost and latency trade-off

The split is not free. You pay for it in two places.

You add latency on every re-plan. A re-plan is an extra model call that produces no user-visible progress, just a revised list of steps. On a task that re-plans twice, that is two turns of pure planning the user waits through.

You add cost, but less than the naive count suggests, and here is the part that surprises people. The planner and the executor do not have to be the same model. The planner does the hard reasoning over the whole task: use your strongest, most expensive model there, but call it rarely, once per plan and once per re-plan. The executor runs many turns but each one is a narrow, well-scoped step: use a cheaper, faster model, because reading one tool result and deciding one retry is not the job that needs the frontier.

That asymmetry is the real argument for the split on a cost basis. A single loop runs your most expensive model on every turn, including the dozens of turns that are pure mechanical tool-running. The split runs the expensive model on the handful of turns that need judgment and the cheap model on everything else.

planner = Planner(model="strong-reasoning-model")
executor = Executor(model="fast-cheap-model")

Measure it on your own traffic before you commit. If your tasks are short, the single loop wins: the planning overhead buys you nothing because there is no drift to prevent on a three-step job. The split earns its keep on long, multi-step tasks where a single loop loses the thread and you pay for the lost turns in retries and wrong answers.

Where the single loop still wins

Be honest about the counter. Plenty of agent work is short and linear. Answer a question, call one tool, return the result. Wrapping that in a planner, an executor, and an orchestrator adds three moving parts and two extra model calls to a job that needed none of it.

The split is for tasks where the plan and the execution state fight for the same context. The signal is drift: runs that quit early, solve the wrong problem, or forget a late step. If your traces show that, separate the two jobs. If they do not, the single loop is the simpler thing, and simpler ships.

There is also a middle ground. Keep one model, but force it to write an explicit plan to a scratchpad before it acts, and re-read that scratchpad every turn. That recovers some of the discipline without the second model. It does not recover the cost asymmetry, and it does not stop the plan from eroding under a long history, but it is cheaper to build and a fine first step.

The shape to take away

Reach for the split when your single loop drifts. Skip it when your tasks are short. The split moves the plan out into orchestrator state where it cannot erode, scopes the executor to one step at a time, re-plans when reality disagrees with the guess, and lets you run the expensive model only where judgment is needed.

The AI Agents Pocket Guide works through this pattern and the others it pairs with: structured plans, re-plan triggers, and the orchestration that holds them together. The chapter on multi-step agents covers the planner-executor split end to end, including the model-routing trade-off this post sketches.