Give an agent a multi-hop question and the usual answer is ReAct: think, act, look at the result, think again, act again. It works, but there is a hidden bill. Every "think" is a full call to the model, and every call re-sends the entire growing transcript. A three-hop question costs you four model calls. A ten-hop one costs eleven. ReWOO — Reasoning WithOut Observation — asks a sharper question: what if the model never had to see the tool results at all while it was reasoning? Then it could plan the whole thing once, hand the plan to plain code, and only come back at the very end to write the answer. Two model calls, no matter how many hops.
The example
Take a genuinely multi-hop question: "What is the population of the capital of the country that won the 2018 World Cup?" You cannot answer it in one lookup. You need the winner (France), then its capital (Paris), then that city's population (2.1 million). Three tool hops, each depending on the last.
ReAct walks it like a conversation with itself:
- Thought 1 (model call): "I need the 2018 World Cup winner." → Search → France
- Thought 2 (model call): "Now the capital of France." → Search → Paris
- Thought 3 (model call): "Now Paris's population." → Population → 2.1M
- Final (model call): "The answer is 2.1 million."
Four LLM calls, and each one re-reads everything that came before. The model is in the loop the entire time.
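The loop above can be sketched in a few lines. This is only an illustration under stated assumptions: `fakeModel` is a hypothetical scripted stand-in for an LLM, `TOOLS` holds canned lookups rather than real searches, and `runReact` is an invented name. The point is the shape of the loop: one model call per hop plus one for the final answer, with the whole transcript passed in each time.

```javascript
// Hypothetical stand-ins: a real agent would call an LLM API and real tools here.
const TOOLS = {
  Search: async (q) =>
    ({ "2018 World Cup winner": "France", "capital of France": "Paris" }[q] ?? "unknown"),
  Population: async (city) => ({ Paris: "2.1 million" }[city] ?? "unknown"),
};

// Scripted "thoughts": the action the fake model picks after 0, 1, and 2 observations.
const SCRIPT = [
  { tool: "Search", input: "2018 World Cup winner" },
  { tool: "Search", input: "capital of France" },
  { tool: "Population", input: "Paris" },
];

async function fakeModel(transcript) {
  const seen = transcript.filter((line) => line.startsWith("Observation: ")).length;
  if (seen < SCRIPT.length) return { action: SCRIPT[seen] };
  const last = transcript[transcript.length - 1].replace("Observation: ", "");
  return { final: `The answer is ${last}.` };
}

async function runReact(question) {
  const transcript = [`Question: ${question}`];
  let modelCalls = 0;
  while (true) {
    modelCalls += 1;
    const step = await fakeModel(transcript); // the whole transcript goes in every time
    if (step.final) return { answer: step.final, modelCalls };
    const result = await TOOLS[step.action.tool](step.action.input);
    transcript.push(
      `Action: ${step.action.tool}[${step.action.input}]`,
      `Observation: ${result}`
    );
  }
}

runReact("Population of the capital of the 2018 World Cup winner?").then((r) =>
  console.log(r.answer, "model calls:", r.modelCalls)
);
```

Because the transcript only ever grows, each later call carries every earlier action and observation along with it.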
ReWOO: plan, work, solve
ReWOO splits the agent into three roles, and only two of them ever touch the model.
1. Planner (one LLM call). Give the model the question and the tool list, and ask for the entire plan up front — as variable assignments:
Plan: find the winner
#E1 = Search[2018 World Cup winner]
Plan: find its capital
#E2 = Search[capital of #E1]
Plan: find that city's population
#E3 = Population[#E2]
Notice the model never saw a single tool result to write this. It doesn't know the winner is France. It only needs to say "the capital of whatever #E1 turns out to be." That is the "without observation" part — the reasoning happens before any observation exists. The dependencies are declared with those #E1, #E2 tokens.
2. Worker (zero LLM calls). Now it's just code. Walk the steps in order. Before each tool call, substitute the tokens with the results collected so far, run the tool, store the result:
const evidence = {};
for (const s of steps) {
  // Swap each #E<n> token for the evidence gathered so far. Matching the whole
  // token keeps #E1 from also rewriting the start of #E10 in longer plans.
  const input = s.input.replace(/#E\d+/g, (id) => evidence[id] ?? id); // #E1 -> "France"
  evidence[s.id] = await TOOLS[s.tool](input); // real tool, no model
}
#E1 resolves to "France", so #E2's input becomes Search[capital of France] → "Paris", and #E3 becomes Population[Paris] → "2.1 million". The observations are collected but never sent back to the model mid-run. No model is invoked in this whole loop.
3. Solver (one LLM call). Hand the model the plan with the evidence filled in and ask for the final answer. That's it — call number two.
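The two pieces of glue around the Worker can be sketched as follows: turning the Planner's text into steps, and assembling the prompt for the Solver. This is a minimal sketch, not a reference implementation. `parsePlan` and `buildSolverPrompt` are hypothetical helper names, the parser assumes the model sticks exactly to the `#E<n> = Tool[input]` line format, and the Solver prompt wording is an assumption.

```javascript
// Hypothetical helper: turns the Planner's text into steps the Worker can run.
// Assumes each evidence line looks exactly like "#E1 = Tool[input]".
function parsePlan(planText) {
  const stepPattern = /^(#E\d+)\s*=\s*(\w+)\[(.*)\]$/;
  const steps = [];
  for (const line of planText.split("\n")) {
    const match = line.trim().match(stepPattern);
    if (match) {
      const [, id, tool, input] = match;
      steps.push({ id, tool, input });
    }
    // "Plan: ..." lines are the model's notes to itself; they are skipped here.
  }
  return steps;
}

// Hypothetical helper: builds the single prompt the Solver call receives.
function buildSolverPrompt(question, steps, evidence) {
  const lines = steps.map((s) => {
    // Show each step with its #E tokens resolved, then what the tool returned.
    const resolved = s.input.replace(/#E\d+/g, (id) => evidence[id] ?? id);
    return `${s.id} = ${s.tool}[${resolved}] -> ${evidence[s.id]}`;
  });
  return [
    "Answer the question using only the evidence below.",
    `Question: ${question}`,
    ...lines,
    "Answer:",
  ].join("\n");
}

const steps = parsePlan(`Plan: find the winner
#E1 = Search[2018 World Cup winner]
Plan: find its capital
#E2 = Search[capital of #E1]
Plan: find that city's population
#E3 = Population[#E2]`);

// Evidence as the Worker would have collected it for this question.
const evidence = { "#E1": "France", "#E2": "Paris", "#E3": "2.1 million" };

const prompt = buildSolverPrompt(
  "What is the population of the capital of the country that won the 2018 World Cup?",
  steps,
  evidence
);
console.log(prompt);
```

A production parser would also need to decide what to do with malformed lines or tool names that are not in the tool list; this sketch silently skips anything that does not match.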
Why this is cheaper
Count the calls as a function of hops k:
const reactCalls = k + 1; // k thoughts + 1 final
const rewooCalls = 2; // plan + solve, always
For 3 hops: ReAct 4, ReWOO 2. For 10 hops: ReAct 11, ReWOO 2. For 30 hops: ReAct 31, ReWOO 2. ReWOO's model-call count is constant: it does not grow with the length of the chain. And it's not only the number of calls. ReAct re-sends the whole transcript on every call, so the observation from hop 1 is paid for again on hop 2, hop 3, and every call after that, and total input tokens grow roughly quadratically with the number of hops. Fewer calls plus no repeated transcript is why the ReWOO paper reports large token savings at similar accuracy on multi-hop benchmarks. Fewer model round-trips usually means lower latency too, since each one adds a full inference pass on top of the tool calls that both approaches still have to make.
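To put rough numbers on the token side, here is a back-of-the-envelope model. Everything in it is an illustrative assumption: `base` is the token cost of the question plus instructions, `perHop` is what each hop adds to the transcript, and 200 and 100 are made-up values. It counts input tokens only and ignores output tokens.

```javascript
// Rough input-token model. `base` and `perHop` are illustrative assumptions,
// not measurements from any real model or benchmark.
function reactInputTokens(k, base, perHop) {
  let total = 0;
  for (let call = 0; call <= k; call++) {
    total += base + call * perHop; // call number `call` re-sends that many earlier hops
  }
  return total;
}

function rewooInputTokens(k, base, perHop) {
  const planner = base;             // sees no observations at all
  const solver = base + k * perHop; // sees the filled-in plan exactly once
  return planner + solver;
}

for (const k of [3, 10, 30]) {
  console.log(
    `hops=${k}`,
    `ReAct=${reactInputTokens(k, 200, 100)}`,
    `ReWOO=${rewooInputTokens(k, 200, 100)}`
  );
}
// With these made-up values: 1400 vs 700, 7700 vs 1400, 52700 vs 3400.
```

Under these assumptions the ReAct total is (k + 1) * base + perHop * k * (k + 1) / 2, while the ReWOO total is 2 * base + k * perHop, so the gap widens as the chain gets longer.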
The catch
Planning blind is exactly why ReWOO is cheap, and exactly where it can hurt. Because the Planner never sees a tool result, it cannot change course if one surprises it — a search returns nothing, an entity is ambiguous, a step errors, or the true answer needs a branch nobody anticipated. ReAct, seeing each observation, can rethink and pivot. So the two techniques sit on a spectrum: ReWOO trades adaptability for efficiency.
Reach for ReWOO when the path is predictable and multi-hop — fixed lookups, known pipelines, RAG retrievals where you already know which sources to hit. Reach for ReAct when the path is uncertain and must respond to what it finds. A nice middle ground is a hybrid: plan like ReWOO, but fire a fresh Planner call to re-plan whenever the Worker hits a surprise. You keep the two-call savings on the common path and the adaptability on the rare one.
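One way the hybrid could look, as a sketch only: run the plan as usual, and if a tool comes back empty, spend one extra Planner call on a fresh plan. `runHybrid`, the `maxReplans` limit, the "null means surprise" convention, and the stub `planner` and `tools` are all hypothetical choices made for this example.

```javascript
// Hypothetical hybrid: ReWOO on the happy path, one extra Planner call
// whenever a step returns nothing. `planner` and `tools` are injected so
// any model client or tool set could be plugged in.
async function runHybrid(question, planner, tools, maxReplans = 2) {
  let plannerCalls = 1;
  let steps = await planner(question, {});
  const evidence = {};
  let i = 0;
  while (i < steps.length) {
    const s = steps[i];
    const input = s.input.replace(/#E\d+/g, (id) => evidence[id] ?? id);
    const result = await tools[s.tool](input);
    if (result == null && plannerCalls <= maxReplans) {
      // Surprise: ask for a fresh plan, passing along what is known so far.
      plannerCalls += 1;
      steps = await planner(question, evidence);
      i = 0;
      continue;
    }
    evidence[s.id] = result;
    i += 1;
  }
  return { evidence, plannerCalls };
}

// Stub tool: only knows two exact queries, returns null for anything else.
const tools = {
  Search: async (q) =>
    ({ "2018 World Cup winner": "France", "capital of France": "Paris" }[q] ?? null),
};

// Stub planner: its first plan uses a query the tool cannot answer.
let attempts = 0;
const planner = async () => {
  attempts += 1;
  const firstQuery = attempts === 1 ? "2018 World Cup champion" : "2018 World Cup winner";
  return [
    { id: "#E1", tool: "Search", input: firstQuery },
    { id: "#E2", tool: "Search", input: "capital of #E1" },
  ];
};

const demo = runHybrid("Capital of the 2018 World Cup winner?", planner, tools);
demo.then((r) => console.log("planner calls:", r.plannerCalls, r.evidence));
```

In this run the first plan fails on its first step, so the Planner is called twice. A plan that works the first time costs exactly one Planner call, the same as plain ReWOO.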
Play with the interactive version — run both agents side by side and watch the LLM-call counters diverge as you add hops: https://dev48v.infy.uk/prompt/day23-rewoo.html