
Devanshu Biswas


ReWOO: plan every tool call up front, then call the model only twice

Give an agent a multi-hop question and the usual answer is ReAct: think, act, look at the result, think again, act again. It works, but there is a hidden bill. Every "think" is a full call to the model, and every call re-sends the entire growing transcript. A three-hop question costs you four model calls. A ten-hop one costs eleven. ReWOO — Reasoning WithOut Observation — asks a sharper question: what if the model never had to see the tool results at all while it was reasoning? Then it could plan the whole thing once, hand the plan to plain code, and only come back at the very end to write the answer. Two model calls, no matter how many hops.

The example

Take a genuinely multi-hop question: "What is the population of the capital of the country that won the 2018 World Cup?" You cannot answer it in one lookup. You need the winner (France), then its capital (Paris), then that city's population (about 2.1 million). Three tool hops, each depending on the last.

ReAct walks it like a conversation with itself:

  • Thought 1 (model call): "I need the 2018 World Cup winner." → Search → France
  • Thought 2 (model call): "Now the capital of France." → Search → Paris
  • Thought 3 (model call): "Now Paris's population." → Population → 2.1M
  • Final (model call): "The answer is 2.1 million."

Four LLM calls, and each one re-reads everything that came before. The model is in the loop the entire time.
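To make that count concrete, here is a minimal sketch of the loop. The `fakeModel` function and the `TOOLS` stubs are hypothetical stand-ins with canned answers, not a real LLM or search API; the only goal is to count how often the model gets invoked.

```javascript
// Hypothetical stubs: canned tool results instead of real APIs.
const TOOLS = {
  Search: (q) => (q.includes("World Cup") ? "France" : "Paris"),
  Population: (city) => (city === "Paris" ? "2.1 million" : "unknown"),
};

let modelCalls = 0;

// Stand-in for an LLM: picks the next action from the observations seen so far.
function fakeModel(transcript) {
  modelCalls += 1;
  const obs = transcript
    .filter((t) => t.startsWith("Observation: "))
    .map((t) => t.slice("Observation: ".length));
  const last = obs[obs.length - 1];
  if (obs.length === 0) return { tool: "Search", input: "2018 World Cup winner" };
  if (obs.length === 1) return { tool: "Search", input: `capital of ${last}` };
  if (obs.length === 2) return { tool: "Population", input: last };
  return { final: `The answer is ${last}.` };
}

function runReAct(question) {
  const transcript = [`Question: ${question}`];
  for (;;) {
    const step = fakeModel(transcript); // every thought is a model call
    if (step.final) return step.final;
    const result = TOOLS[step.tool](step.input);
    // The transcript grows, and the next call receives all of it again.
    transcript.push(`Action: ${step.tool}[${step.input}]`, `Observation: ${result}`);
  }
}

const answer = runReAct(
  "What is the population of the capital of the country that won the 2018 World Cup?"
);
console.log(answer, modelCalls); // The answer is 2.1 million. 4
```

Three tool hops, four trips through `fakeModel`.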

ReWOO: plan, work, solve

ReWOO splits the agent into three roles, and only two of them ever touch the model.

1. Planner (one LLM call). Give the model the question and the tool list, and ask for the entire plan up front — as variable assignments:

Plan: find the winner
#E1 = Search[2018 World Cup winner]
Plan: find its capital
#E2 = Search[capital of #E1]
Plan: find that city's population
#E3 = Population[#E2]

Notice the model never saw a single tool result to write this. It doesn't know the winner is France. It only needs to say "the capital of whatever #E1 turns out to be." That is the "without observation" part — the reasoning happens before any observation exists. The dependencies are declared with those #E1, #E2 tokens.
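Before code can execute that plan, the text has to become structured steps. One way to do it is a small line parser, sketched below. The `parsePlan` name and the `{ id, tool, input }` step shape are assumptions, chosen to line up with the Worker loop in the next step; ReWOO itself does not dictate a parser.

```javascript
// Hypothetical helper: turn the Planner's text into { id, tool, input } steps.
function parsePlan(text) {
  const steps = [];
  for (const line of text.split("\n")) {
    // Matches lines like: #E2 = Search[capital of #E1]
    const m = line.trim().match(/^(#E\d+)\s*=\s*(\w+)\[(.*)\]$/);
    if (m) steps.push({ id: m[1], tool: m[2], input: m[3] });
    // "Plan: ..." lines carry the rationale and are skipped here.
  }
  return steps;
}

const plan = `Plan: find the winner
#E1 = Search[2018 World Cup winner]
Plan: find its capital
#E2 = Search[capital of #E1]
Plan: find that city's population
#E3 = Population[#E2]`;

const steps = parsePlan(plan);
console.log(steps[1]); // { id: '#E2', tool: 'Search', input: 'capital of #E1' }
```

The `#E1` token survives parsing untouched. Resolving it is the Worker's job, not the parser's.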

2. Worker (zero LLM calls). Now it's just code. Walk the steps in order. Before each tool call, substitute the tokens with the results collected so far, run the tool, store the result:

const evidence = {};
for (const s of steps) {
  // Match whole tokens so #E1 can never rewrite part of #E10.
  const input = s.input.replace(/#E\d+/g,
    (id) => evidence[id] ?? id);                // #E1 -> "France"
  evidence[s.id] = await TOOLS[s.tool](input);  // real tool, no model
}

#E1 resolves to "France", so #E2's input becomes Search[capital of France] → "Paris", and #E3 becomes Population[Paris] → "2.1 million". The observations are collected but never sent back to the model mid-run. No model is invoked in this whole loop.
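To watch that resolution happen, here is a self-contained sketch you can run. The `TOOLS` entries are hypothetical stubs with canned answers, and `runWorker` is an illustrative name. It swaps tokens in a single regex pass rather than a nested `replaceAll` loop, which gives the same result for this plan.

```javascript
// Hypothetical stubs standing in for real search and population APIs.
const TOOLS = {
  Search: async (q) =>
    q.includes("World Cup") ? "France" : q.includes("France") ? "Paris" : "unknown",
  Population: async (city) => (city === "Paris" ? "2.1 million" : "unknown"),
};

const steps = [
  { id: "#E1", tool: "Search", input: "2018 World Cup winner" },
  { id: "#E2", tool: "Search", input: "capital of #E1" },
  { id: "#E3", tool: "Population", input: "#E2" },
];

async function runWorker(steps, tools) {
  const evidence = {};
  const trace = []; // the concrete tool calls, after substitution
  for (const s of steps) {
    // Replace each #E<n> token with its stored result, if one exists yet.
    const input = s.input.replace(/#E\d+/g, (id) => evidence[id] ?? id);
    trace.push(`${s.tool}[${input}]`);
    evidence[s.id] = await tools[s.tool](input);
  }
  return { evidence, trace };
}

runWorker(steps, TOOLS).then(({ evidence, trace }) => {
  console.log(trace);
  // [ 'Search[2018 World Cup winner]', 'Search[capital of France]', 'Population[Paris]' ]
  console.log(evidence);
  // { '#E1': 'France', '#E2': 'Paris', '#E3': '2.1 million' }
});
```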

3. Solver (one LLM call). Hand the model the plan with the evidence filled in and ask for the final answer. That's it — call number two.
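What "the plan with the evidence filled in" looks like is up to you. The sketch below is one possible layout; `buildSolverPrompt` and the instruction wording are assumptions for illustration, not the prompt from the ReWOO paper.

```javascript
// Hypothetical helper: build the one prompt the Solver sees.
function buildSolverPrompt(question, steps, evidence) {
  const filled = steps.map((s) => {
    // Show each step with its tokens resolved, followed by what the tool returned.
    const input = s.input.replace(/#E\d+/g, (id) => evidence[id] ?? id);
    return `${s.id} = ${s.tool}[${input}] -> ${evidence[s.id]}`;
  });
  return [
    "Answer the question using only the evidence below.",
    `Question: ${question}`,
    "Evidence:",
    ...filled,
    "Answer:",
  ].join("\n");
}

const prompt = buildSolverPrompt(
  "What is the population of the capital of the country that won the 2018 World Cup?",
  [
    { id: "#E1", tool: "Search", input: "2018 World Cup winner" },
    { id: "#E2", tool: "Search", input: "capital of #E1" },
    { id: "#E3", tool: "Population", input: "#E2" },
  ],
  { "#E1": "France", "#E2": "Paris", "#E3": "2.1 million" }
);
console.log(prompt);
```

All the evidence reaches the model exactly once, in this single prompt.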

Why this is cheaper

Count the calls as a function of hops k:

const reactCalls = k + 1;   // k thoughts + 1 final
const rewooCalls = 2;       // plan + solve, always

For 3 hops: ReAct 4, ReWOO 2. For 10 hops: ReAct 11, ReWOO 2. For 30 hops: ReAct 31, ReWOO 2. ReWOO's model-call count is constant: it does not grow with the length of the chain. And it's not only the number of calls. ReAct re-sends the whole transcript on every call, so its total prompt tokens grow roughly quadratically with the number of hops, while ReWOO sends the accumulated evidence to the model only once, in the Solver call. Fewer calls plus no repeated transcript is why the ReWOO paper reports large token savings at similar accuracy on multi-hop benchmarks. Fewer round-trips usually means lower latency too, since each model call adds its own inference and network time. The tool calls themselves still take just as long.
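A back-of-the-envelope model shows how the token bill compounds. The numbers are made up for illustration: assume every prompt carries a fixed `base` of 200 tokens (question plus instructions) and each hop adds `perHop` of 100 tokens to the transcript. Output tokens are ignored.

```javascript
// Toy cost model; `base` and `perHop` are illustrative assumptions, not measurements.
function reactPromptTokens(k, base, perHop) {
  let total = 0;
  // Call number `call` re-sends the base prompt plus every hop recorded so far.
  for (let call = 0; call <= k; call++) total += base + call * perHop;
  return total;
}

function rewooPromptTokens(k, base, perHop) {
  const plannerPrompt = base;              // the Planner sees no evidence
  const solverPrompt = base + k * perHop;  // the Solver sees all evidence, once
  return plannerPrompt + solverPrompt;
}

for (const k of [3, 10, 30]) {
  console.log(k, reactPromptTokens(k, 200, 100), rewooPromptTokens(k, 200, 100));
}
// 3 1400 700
// 10 7700 1400
// 30 52700 3400
```

Under these assumptions the gap widens from 2x at three hops to more than 15x at thirty, because the ReAct total grows with the square of the hop count.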

The catch

Planning blind is exactly why ReWOO is cheap, and exactly where it can hurt. Because the Planner never sees a tool result, it cannot change course if one surprises it — a search returns nothing, an entity is ambiguous, a step errors, or the true answer needs a branch nobody anticipated. ReAct, seeing each observation, can rethink and pivot. So the two techniques sit on a spectrum: ReWOO trades adaptability for efficiency.

Reach for ReWOO when the path is predictable and multi-hop — fixed lookups, known pipelines, RAG retrievals where you already know which sources to hit. Reach for ReAct when the path is uncertain and must respond to what it finds. A nice middle ground is a hybrid: plan like ReWOO, but fire a fresh Planner call to re-plan whenever the Worker hits a surprise. You keep the two-call savings on the common path and the adaptability on the rare one.
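Here is a rough sketch of that hybrid, under some loud assumptions: `planner` and `solver` are hypothetical functions standing in for the two kinds of model call, a "surprise" is simply a tool returning nothing, and a re-plan restarts the Worker from the first step. A real system would likely detect more kinds of surprise and reuse evidence it already has.

```javascript
// All names are hypothetical; this is a sketch, not a reference implementation.
async function runHybrid(question, { planner, solver, tools, maxReplans = 1 }) {
  let modelCalls = 1;
  let steps = await planner(question, null); // the usual single Planner call
  for (let attempt = 0; ; attempt++) {
    const evidence = {};
    let surprise = null;
    for (const s of steps) {
      const input = s.input.replace(/#E\d+/g, (id) => evidence[id] ?? id);
      const result = await tools[s.tool](input);
      if (!result) { surprise = `${s.tool}[${input}] returned nothing`; break; }
      evidence[s.id] = result;
    }
    if (!surprise) {
      modelCalls += 1; // the Solver call
      return { answer: await solver(question, steps, evidence), modelCalls };
    }
    if (attempt >= maxReplans) throw new Error(`gave up: ${surprise}`);
    modelCalls += 1; // a fresh Planner call, told what went wrong
    steps = await planner(question, surprise);
  }
}

// Stub planner: its first plan uses a query the stub search cannot answer.
const planner = async (q, surprise) => [
  { id: "#E1", tool: "Search", input: surprise ? "2018 World Cup winner" : "2018 WC champ" },
  { id: "#E2", tool: "Search", input: "capital of #E1" },
  { id: "#E3", tool: "Population", input: "#E2" },
];
const solver = async (q, steps, evidence) => `The answer is ${evidence["#E3"]}.`;
const tools = {
  Search: async (q) =>
    ({ "2018 World Cup winner": "France", "capital of France": "Paris" })[q] ?? "",
  Population: async (city) => (city === "Paris" ? "2.1 million" : ""),
};

runHybrid("Population of the capital of the 2018 World Cup winner?", { planner, solver, tools })
  .then(console.log); // { answer: 'The answer is 2.1 million.', modelCalls: 3 }
```

One surprise costs one extra model call here: three instead of two. A run with no surprises still finishes in two.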

Play with the interactive version — run both agents side by side and watch the LLM-call counters diverge as you add hops: https://dev48v.infy.uk/prompt/day23-rewoo.html
