SapotaCorp

Posted on May 24 • Originally published at sapotacorp.vn

ReAct vs Planning: when your agent stops making progress


A founder pinged us with a debugging request. His engineering team had been working on the same AI agent for three weeks. The agent's job was to generate weekly business reports from internal data. It would start, run a database query, run another database query, run a third one, occasionally call the formatter, then output a half-finished report at the 10-iteration safety limit.

The team had tried a bigger model, more tools, better tool descriptions, longer prompts, and tighter validation. Nothing worked. The agent kept exhibiting the same behavior: lots of motion, no convergence.

The diagnosis took fifteen minutes. The agent was running a ReAct loop on a task that needed a Planning pattern. Switching the architecture took half a day. After the switch, the agent generated each report correctly in a single run, finished in about 18 seconds instead of timing out at 60, and the team shipped to production the next sprint.

This is one of the most common architectural mistakes in agent systems. The fix is not always obvious unless you understand what each pattern does well and where it breaks.

What ReAct actually does

ReAct (Reasoning and Acting) is the default agent pattern. The loop looks like:

Thought: the agent reasons about what to do next
Action: the agent picks a tool and inputs
Observation: the tool returns a result
Repeat until the agent decides to output a final answer
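
The loop above can be sketched in a few lines. This is a minimal illustration, not the founder's code: the tool table and the two scripted policies are hypothetical stand-ins for a real tool set and a real LLM call.

```python
# Hypothetical tool table; a real agent would call a database, an API, etc.
TOOLS = {
    "lookup": lambda query: f"rows for {query}",
}

def scripted_policy(history):
    """Stand-in for the LLM: returns (thought, action, action_input)."""
    if not history:
        return ("I need revenue data", "lookup", "revenue")
    last_observation = history[-1][2]
    return ("I have what I need", "finish", f"Report based on: {last_observation}")

def stuck_policy(history):
    """A policy that never converges, to show the safety limit kicking in."""
    return ("Maybe one more query will help", "lookup", "revenue")

def react(policy, tools, max_iterations=10):
    history = []  # list of (thought, action, observation)
    for _ in range(max_iterations):
        thought, action, action_input = policy(history)
        if action == "finish":
            return action_input  # the agent decided it is done
        observation = tools[action](action_input)
        history.append((thought, action, observation))
    return None  # hit the safety limit without a final answer

answer = react(scripted_policy, TOOLS)
stuck = react(stuck_policy, TOOLS, max_iterations=3)
```

Note that nothing in the loop itself tracks the overall goal. Whether the agent converges depends entirely on what the policy decides from the history it is shown.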

ReAct is reactive by design. Each step depends on the previous observation. It works well when the next move genuinely depends on what just happened: searching for information, debugging a system, navigating a UI, exploring a database schema. In those cases the agent does not know what it will need next until it sees what it has.

ReAct also has a known weakness: on tasks with many steps, the agent loses track of the overall goal. After three or four cycles, the immediate observation dominates the agent's attention. It forgets the master objective and starts making locally reasonable but globally wrong decisions.

This failure mode, which we call loop death, is what the founder's team was seeing. The agent kept reacting to each query result as if it were a fresh task, never zooming out to consider whether the report was actually getting written.

What Planning actually does

The Planning pattern (sometimes called plan-and-execute) splits the work into two phases:

Phase 1: Plan. The agent reads the goal and produces a structured plan: a list of steps with dependencies. No tool calls, no execution. Just the roadmap.

Phase 2: Execute. The agent runs each step in dependency order. Each step might call tools, but the agent is not deciding what to do next. The plan already says what to do next.

The pattern trades flexibility for predictability. The agent cannot adapt mid-flight to unexpected discoveries (well, not without a re-plan step), but it cannot get lost either. It knows what it is supposed to be doing because it wrote the plan.

For multi-step tasks with clear sub-goals, this is dramatically more reliable than ReAct. Generating a report has clear sub-goals: pull data, analyze, structure, format. Planning the sub-goals upfront and executing them in order is much harder to mess up than discovering them one observation at a time.
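
A rough sketch of the two phases, assuming the plan is a mapping from step name to a tool and the steps it depends on. The step names follow the report sub-goals above; the tool names and the run_tool stub are made up for illustration, and in a real system the plan would come from an LLM call rather than a literal.

```python
from graphlib import TopologicalSorter

# Phase 1 output (hypothetical): step id -> (tool name, steps it depends on)
plan = {
    "pull_data": ("query", []),
    "analyze": ("analyze", ["pull_data"]),
    "structure": ("outline", ["analyze"]),
    "format": ("render", ["structure"]),
}

def run_tool(tool, inputs):
    """Stand-in for a real tool call; just records what was called with what."""
    return f"{tool}({', '.join(inputs)})"

def execute(plan):
    """Phase 2: run every step in dependency order. No decisions are made here."""
    graph = {step: deps for step, (_, deps) in plan.items()}
    results = {}
    for step in TopologicalSorter(graph).static_order():
        tool, deps = plan[step]
        results[step] = run_tool(tool, [results[d] for d in deps])
    return results

results = execute(plan)
```

The executor never asks "what next?". The ordering comes from the plan's dependencies, which is why this shape is hard to get lost in.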

When ReAct is the right call

ReAct wins on:

Exploratory tasks: debugging, investigation, research where the path is genuinely unknown
Single-domain Q&A: customer support, FAQ chatbot, simple lookup
Tasks with 1 to 3 steps: short enough that loop death does not happen
Highly dynamic environments: where each observation can change the goal (trading bots, real-time monitoring)

The founder's report agent was none of these. The task was structurally fixed (always: data → analyze → structure → format), the steps were known in advance, and there was nothing exploratory about it. Forcing the agent to discover the steps every time was the bug.

When Planning is the right call

Planning wins on:

Multi-step tasks with clear sub-goals: report generation, data analysis pipelines, content creation flows
Tasks where cost predictability matters: knowing the step count upfront lets you estimate cost before execution
Tasks with parallel sub-steps: the plan can identify independent steps and execute them concurrently
High-stakes outputs: the plan itself is auditable, which is harder with ReAct

The trade-off is the planning step itself. It costs one LLM call upfront, adds 1 to 3 seconds of latency, and the plan can be wrong. But for tasks where the structure is roughly known in advance, the upfront cost pays back many times over in execution reliability.
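
As a sketch of the cost-predictability point: once a plan exists, the number of LLM calls can be counted before anything runs. The plan below, the uses_llm flag, and the per-call price are all assumptions for illustration, not figures from the project.

```python
# Hypothetical plan: each step notes whether it needs an LLM call
# or is a plain tool call (a SQL query, a calculation).
plan = [
    {"id": "revenue", "uses_llm": False},
    {"id": "expenses", "uses_llm": False},
    {"id": "customers", "uses_llm": False},
    {"id": "churn", "uses_llm": True},
    {"id": "comparison", "uses_llm": False},
    {"id": "format", "uses_llm": True},
]

ASSUMED_CENTS_PER_LLM_CALL = 2  # made-up price, in cents

def estimate_llm_calls(plan):
    planner_calls = 1  # the one upfront planning call
    step_calls = sum(1 for step in plan if step["uses_llm"])
    return planner_calls + step_calls

def estimate_cost_cents(plan):
    return estimate_llm_calls(plan) * ASSUMED_CENTS_PER_LLM_CALL

calls = estimate_llm_calls(plan)
cost = estimate_cost_cents(plan)
```

A ReAct loop has no equivalent of this estimate, because the number of iterations is only known after the run ends.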

What we shipped for the founder

The original agent had eight tools (database queries, formatters, validators) and a single ReAct loop. The agent was supposed to figure out which queries to run in which order and how to format the result.

The new architecture:

Planner agent: reads the report request, outputs a structured plan with five to eight steps. Each step specifies which tool, what input, and which previous step's output it depends on.

Executor: runs the plan in dependency order. Independent steps run in parallel. Each step is a focused tool call, not a ReAct loop.

Synthesizer agent: assembles step outputs into the final report.

The plan was usually 6 steps for the standard weekly report (revenue query, expense query, customer count query, churn calculation, comparison to previous week, format). All six ran in roughly 18 seconds total because steps 1-3 ran in parallel.

The previous ReAct agent had been trying to discover this same six-step structure on every request, often getting lost on step 3 or 4 and never finishing. The planner agent figured out the structure once, and the executor just executed.
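
Here is a minimal sketch of an executor that runs independent steps in parallel. The six step names follow the weekly report above, but the dependency edges between them are our assumption for illustration, and run_step is a stub rather than a real query or calculation.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Assumed dependencies: step -> steps whose output it needs
DEPENDENCIES = {
    "revenue": set(),
    "expenses": set(),
    "customers": set(),
    "churn": {"customers"},
    "comparison": {"revenue", "expenses"},
    "format": {"churn", "comparison"},
}

def run_step(name, inputs):
    """Stand-in for one focused tool call."""
    return f"{name}({', '.join(sorted(inputs))})"

def execute_parallel(dependencies, run_step):
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    results, waves = {}, []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # every step whose inputs exist
            waves.append(ready)
            futures = {
                name: pool.submit(
                    run_step, name, {d: results[d] for d in dependencies[name]}
                )
                for name in ready
            }
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)  # unblocks the steps that depend on this one
    return results, waves

results, waves = execute_parallel(DEPENDENCIES, run_step)
```

This version waits for a whole wave to finish before starting the next one, which keeps the code short. A production executor might start each step as soon as its own dependencies are done.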

The hybrid pattern

The pattern Sapota actually ships most often is a hybrid: planning at the high level, ReAct at the leaf level.

For the report agent: the planner outputs six high-level steps. Most steps are direct tool calls (one query, one calculation). But step 4 ("analyze churn drivers") is open-ended and benefits from ReAct exploration. So step 4 spawns a sub-agent with a ReAct loop scoped to that single sub-goal.

The hybrid combines the predictability of planning at the structural level with the flexibility of ReAct at the exploratory level. It is more complex to implement than either pure pattern, but it handles a wider range of real-world tasks.
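
One way to picture the hybrid, as a sketch under assumptions: each plan step carries a mode, and only steps marked "react" get a bounded sub-loop. The step format, the tool names, and the scripted policy standing in for the LLM are all hypothetical.

```python
# Hypothetical tools
TOOLS = {
    "query": lambda table: f"rows for {table}",
    "segment": lambda dimension: f"churn by {dimension}",
}

def scripted_policy(goal, observations):
    """Stand-in for the LLM inside the sub-agent: returns (action, argument)."""
    if len(observations) == 0:
        return ("segment", "plan_tier")
    if len(observations) == 1:
        return ("segment", "region")
    return ("finish", "drivers: " + "; ".join(observations))

def react_subagent(goal, policy, tools, max_iterations=5):
    """A ReAct loop scoped to one sub-goal, with its own small iteration limit."""
    observations = []
    for _ in range(max_iterations):
        action, argument = policy(goal, observations)
        if action == "finish":
            return argument
        observations.append(tools[action](argument))
    return None

def run_hybrid(plan, tools, policy):
    results = {}
    for step in plan:  # assumed to be already in dependency order
        if step["mode"] == "tool":
            results[step["id"]] = tools[step["tool"]](step["input"])
        else:  # open-ended step: hand it to a scoped ReAct sub-agent
            results[step["id"]] = react_subagent(step["goal"], policy, tools)
    return results

plan = [
    {"id": "revenue", "mode": "tool", "tool": "query", "input": "revenue"},
    {"id": "churn_drivers", "mode": "react", "goal": "analyze churn drivers"},
]
results = run_hybrid(plan, TOOLS, scripted_policy)
```

The sub-agent can wander, but only inside one step. If it hits its limit, the damage is one missing result rather than an unfinished report.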

The decision matrix

When designing a new agent, walk through these questions:

Is the task structure known in advance? If yes, lean toward Planning. If no, ReAct.
Does the task have 4+ steps? If yes, Planning will be more reliable.
Are there independent sub-tasks? If yes, Planning can parallelize them. ReAct cannot.
Is the task exploratory or routine? Exploratory wants ReAct. Routine wants Planning.
Is cost predictability important? Planning lets you estimate before execution. ReAct does not.

For a customer support chatbot, ReAct usually wins. The questions are unpredictable, the steps are short, and exploratory tool use matters. For a report generator, content workflow, or data pipeline, Planning usually wins.
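
The five questions can be turned into a rough checklist function. The vote counting and the thresholds below are our own simplification for illustration, not a rule from any library, and the two profiles are guesses at how the examples above would answer each question.

```python
from dataclasses import dataclass

@dataclass
class TaskProfile:
    structure_known: bool
    step_count: int
    has_independent_subtasks: bool
    exploratory: bool
    needs_cost_predictability: bool

def recommend(task):
    """Count how many of the five questions point toward Planning."""
    planning_votes = sum([
        task.structure_known,
        task.step_count >= 4,
        task.has_independent_subtasks,
        not task.exploratory,
        task.needs_cost_predictability,
    ])
    # Thresholds are an assumption: strong agreement picks a pure pattern,
    # a mixed result suggests looking at a hybrid.
    if planning_votes >= 4:
        return "planning"
    if planning_votes <= 1:
        return "react"
    return "hybrid"

report_generator = TaskProfile(True, 6, True, False, True)
support_chatbot = TaskProfile(False, 2, False, True, False)

report_choice = recommend(report_generator)
support_choice = recommend(support_chatbot)
```

Treat the output as a starting point for discussion rather than a verdict. The questions are not equally important for every task.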

The signs your agent needs Planning

If your team is debugging an agent that exhibits any of these behaviors, the issue is likely architectural:

Frequently hits the iteration limit without finishing
Calls the same tool 3+ times with similar inputs
Produces partial outputs and stops
Shows highly variable latency across similar inputs
Sometimes works and sometimes doesn't, with no clear pattern

These are classic loop-death symptoms. Switching to Planning, or a hybrid plan-then-react, usually resolves them in a single sprint.
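
The "same tool 3+ times with similar inputs" symptom is easy to check for in agent logs. This is a sketch that assumes a trace is a list of (tool, input) pairs. The 0.8 similarity threshold and the sample trace are made up for illustration.

```python
from difflib import SequenceMatcher

def repeated_tools(trace, min_repeats=3, threshold=0.8):
    """Return the tools called at least min_repeats times with similar inputs."""
    flagged = set()
    for tool, arg in trace:
        # Count calls to the same tool whose input looks like this one
        # (the call itself is included in the count).
        matches = sum(
            1
            for other_tool, other_arg in trace
            if other_tool == tool
            and SequenceMatcher(None, arg, other_arg).ratio() >= threshold
        )
        if matches >= min_repeats:
            flagged.add(tool)
    return flagged

# Hypothetical trace from a stuck run
trace = [
    ("query", "SELECT * FROM revenue WHERE week = 20"),
    ("query", "SELECT * FROM revenue WHERE week = 21"),
    ("query", "SELECT * FROM revenue WHERE week = 22"),
    ("format", "weekly report"),
]
flagged = repeated_tools(trace)
```

Running a check like this over a week of traces gives a quick read on whether failures cluster around repeated calls, before anyone touches the prompt.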

If your agent is stuck in loop death

If your team has been debugging an agent for weeks and the symptoms above sound familiar, the fastest fix is usually an architectural switch, not more prompt engineering.

Sapota offers a one-week agent architecture audit that takes your current pattern, identifies whether ReAct or Planning fits the task, and ships the migration as a working integration. We have done this for report agents, content workflows, customer support routing, and research synthesis. The patterns are similar; the breakdown into specific steps depends on the domain.

Reach out via the AI engineering page with a description of what your agent is supposed to do and what it is currently doing wrong. The diagnosis is usually clear within thirty minutes.
