WonderLab

Posted on May 24

Agent Series (3): Plan-and-Solve — Think First, Then Act

#langchain #ai #llm #agents

Where Does ReAct Hit a Wall?

The previous article established ReAct's greedy strategy — each step looks at only the current state and decides the next action. This works well most of the time, but there's one class of task where it stumbles.

Imagine you ask an Agent to do this:

Search for the release years of Python, Java, and Go. Sort them chronologically. Calculate how many years apart Python and Go are.

A typical ReAct execution might look like:

Action: web_search("Python release year")
Action: web_search("Java release year")
Action: web_search("Go release year")
Action: calculator("...")
(occasionally repeats a search or takes extra steps)

That's not terrible — but there's a latent problem: ReAct has no global plan before acting. It doesn't know how many steps the task needs, doesn't know which step depends on which, and doesn't know where it is in the overall task. Every step is locally optimal, not globally optimal.

For multi-step tasks with clear dependencies, this is like navigating without a map — you'll eventually arrive, but you'll take detours.

Plan-and-Solve's answer: use the LLM to produce a complete action plan first, then execute step by step.

The Two-Phase Architecture

This paradigm comes from the 2023 paper Plan-and-Solve Prompting. The core idea is two phases:

Phase 1 — Plan: Ask the LLM to analyze the entire task from a bird's-eye view and output an ordered list of steps. No tools are called during this phase — it's pure thinking.

Phase 2 — Solve: Execute each step in the plan, one at a time. Tools can be called at each step. The result of the previous step is injected into the next step's context.

With the production-essential fault-tolerance mechanisms added, the complete architecture looks like this:

Task
 │
 ▼
[Plan Node]     ← LLM generates 3-7 step plan (no execution, just planning)
 │
 ▼
[Execute Node]  ← Execute current step (embedded ReAct, can call tools)
 │
 ├─ Step failed? ─→ [Replan Node] ← Re-plan remaining steps based on progress so far
 │                      │
 │                      └──────────────┐
 │                                     ▼
 ├─ More steps? ─→ back to Execute    Execute (continue)
 │
 └─ All done? ─→ [Finalize Node] ← Output final answer
                       │
                       ▼
                      END

The key difference from ReAct: ReAct is an open-ended loop; Plan-and-Solve is a sequence with a defined endpoint.
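Before bringing in LangGraph, the two phases can be sketched in plain Python. This is only an illustration under stated assumptions: `plan_and_solve`, `fake_planner`, and `fake_executor` are hypothetical names, and the stubs stand in for the LLM planner and the tool-using executor.

```python
from typing import Callable

Planner = Callable[[str], list[str]]        # task -> ordered steps
Executor = Callable[[str, str], str]        # (step, context) -> result


def plan_and_solve(task: str, planner: Planner, executor: Executor) -> list[str]:
    """Phase 1 builds the whole plan; Phase 2 runs the steps in order."""
    plan = planner(task)                    # planning only, no tools
    completed: list[str] = []
    for step in plan:
        context = "\n".join(completed)      # earlier results feed the next step
        result = executor(step, context)
        completed.append(f"{step} -> {result}")
    return completed


# Stubs standing in for the LLM planner and the tool-using executor.
def fake_planner(task: str) -> list[str]:
    return ["look up A", "look up B", "combine A and B"]


def fake_executor(step: str, context: str) -> str:
    return f"done (saw {len(context.splitlines())} earlier results)"


trace = plan_and_solve("demo task", fake_planner, fake_executor)
for line in trace:
    print(line)
```

The loop has a fixed length (the number of planned steps), which is the "defined endpoint" mentioned above. Fault tolerance (Replan) is deliberately left out of this sketch.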

LangGraph Implementation: State + Graph

LangGraph is a good fit for this architecture: it models the Agent as a state machine (StateGraph), with state flowing between nodes.

State Design

from typing import TypedDict

class PlanSolveState(TypedDict):
    task: str                    # original user task
    plan: list[str]              # current plan (list of steps)
    completed_steps: list[str]   # completed steps with result summaries
    current_step_index: int      # which step we're on (0-based)
    step_result: str             # result of the current step
    replan_count: int            # how many times we've replanned
    final_answer: str            # the final answer

State is the "bloodstream" of the entire graph — all nodes read from it and write to it. Design the state well, and you've won half the battle.
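One practical detail: the router shown later reads `state["replan_count"]` directly, so the starting state should probably populate every field. A minimal sketch, assuming a hypothetical `initial_state` helper that is not part of the original code:

```python
from typing import TypedDict


class PlanSolveState(TypedDict):
    task: str
    plan: list[str]
    completed_steps: list[str]
    current_step_index: int
    step_result: str
    replan_count: int
    final_answer: str


def initial_state(task: str) -> PlanSolveState:
    """Hypothetical helper: a fully populated starting state for one run."""
    return {
        "task": task,
        "plan": [],
        "completed_steps": [],
        "current_step_index": 0,
        "step_result": "",
        "replan_count": 0,   # read directly by the router, so it must exist
        "final_answer": "",
    }


state = initial_state("Calculate 2^10 + 3^5")
print(state)
```

With the compiled graph, this dictionary would be what gets passed to `agent.invoke(...)`.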

Plan Node

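from langchain_core.messages import HumanMessage, SystemMessage

# `llm` (the chat model) and `parse_plan` are defined elsewhere in the full code.
def plan_node(state: PlanSolveState) -> dict:
    """Phase 1: ask the LLM for a complete plan. No tools are called here."""
    messages = [
        SystemMessage(content=PLANNER_SYSTEM),  # planner expert prompt
        HumanMessage(content=f"Task: {state['task']}"),
    ]
    response = llm.invoke(messages)
    plan = parse_plan(response.content)  # parse "1. xxx\n2. xxx" format

    # Reset progress so execution starts at the first step of the new plan.
    return {
        "plan": plan,
        "current_step_index": 0,
        "completed_steps": [],
    }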
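The `parse_plan` helper is not shown in this article. One possible version, assuming the planner returns a numbered list in the "1. xxx" format and may wrap a long step onto an indented continuation line, could look like this sketch:

```python
import re

# Matches lines such as "1. Do something" or "2) Do something else".
_STEP_LINE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")


def parse_plan(text: str) -> list[str]:
    """Hypothetical parser: turn a numbered list into a list of step strings."""
    steps: list[str] = []
    for line in text.splitlines():
        match = _STEP_LINE.match(line)
        if match:
            steps.append(match.group(2))
        elif steps and line.strip():
            # A non-numbered line after a step is treated as its continuation.
            steps[-1] += " " + line.strip()
    return steps


raw = "Here is the plan:\n1. Search A\n2. Compute B\n   with care\n3. Answer"
print(parse_plan(raw))
```

Text before the first numbered line is ignored, so a chatty preamble from the model does not end up as a step.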

The Planner system prompt is critical:

PLANNER_SYSTEM = """You are a task planning expert.
Rules:
1. Break the task into 3-7 discrete steps
2. Each step must be concrete and actionable
3. Steps must have clear dependencies (later steps can use earlier results)
4. The final step should be "synthesize all information and deliver the answer"

Output format (only the step list, nothing else):
1. [step description]
2. [step description]
...
"""

Execute Node (Embedded ReAct Sub-Agent)

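from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.prebuilt import create_react_agent

def execute_node(state: PlanSolveState) -> dict: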
    idx = state["current_step_index"]
    current_step = state["plan"][idx]

    # Build execution context (includes results from completed steps)
    system_prompt = EXECUTOR_SYSTEM.format(
        completed_steps=format_completed_steps(state["completed_steps"]),
        current_step=current_step,
    )

    # Use a ReAct sub-agent to execute a single step (may need tools)
    sub_agent = create_react_agent(model=llm, tools=[calculator, web_search])
    result = sub_agent.invoke(
        {"messages": [
            SystemMessage(content=system_prompt),
            HumanMessage(content=f"Execute this step: {current_step}"),
        ]},
        config={"recursion_limit": 8},
    )

    step_result = result["messages"][-1].content
    new_completed = state["completed_steps"] + [
        f"{current_step} → {step_result[:100]}"
    ]

    return {
        "step_result": step_result,
        "completed_steps": new_completed,
        "current_step_index": idx + 1,
    }

There's an important design choice here: the Execute node embeds a ReAct sub-agent. Plan-and-Solve and ReAct aren't mutually exclusive — Plan-and-Solve provides global structure, ReAct handles tool calls within each step.
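The Execute node relies on `EXECUTOR_SYSTEM` and `format_completed_steps`, neither of which appears in this article. The sketch below shows one plausible shape for them; both the prompt wording and the helper body are assumptions, not the original code.

```python
# Hypothetical executor prompt with the two placeholders execute_node fills in.
EXECUTOR_SYSTEM = """You are executing one step of a larger plan.

Results of completed steps:
{completed_steps}

Current step: {current_step}
Use the results above instead of searching for the same data again.
"""


def format_completed_steps(completed: list[str]) -> str:
    """Render completed-step summaries as a numbered block for the prompt."""
    if not completed:
        return "(none yet)"
    return "\n".join(f"{i}. {summary}" for i, summary in enumerate(completed, 1))


prompt = EXECUTOR_SYSTEM.format(
    completed_steps=format_completed_steps(["Search price -> 1299 USD"]),
    current_step="Search the USD to CNY exchange rate",
)
print(prompt)
```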

Routing Function

from typing import Literal

MAX_REPLAN = 2

def should_continue(state: PlanSolveState) -> Literal["execute", "replan", "finalize"]:
    idx = state["current_step_index"]
    total = len(state["plan"])

    if idx >= total:
        return "finalize"  # all steps complete

    # detect step failure
    result = state.get("step_result", "")
    failed = any(kw in result for kw in ["Calculation error", "Search failed", "Error"])

    if failed and state["replan_count"] < MAX_REPLAN:
        return "replan"  # failed, still have retry budget

    return "execute"  # keep going
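To see how the router behaves, the same logic can be restated as a standalone function and run against a few hand-built states. The name `route` and the sample states are illustrative only. Two edge cases are worth noticing: because the Execute node advances the index before routing, a failure on the last step goes straight to "finalize", and plain substring matching on "Error" can flag a result that merely mentions the word.

```python
MAX_REPLAN = 2
FAILURE_KEYWORDS = ("Calculation error", "Search failed", "Error")


def route(state: dict) -> str:
    """Standalone restatement of the routing logic above, for tracing."""
    if state["current_step_index"] >= len(state["plan"]):
        return "finalize"
    failed = any(kw in state.get("step_result", "") for kw in FAILURE_KEYWORDS)
    if failed and state["replan_count"] < MAX_REPLAN:
        return "replan"
    return "execute"


plan = ["a", "b", "c"]
cases = {
    "success mid-plan": {"plan": plan, "current_step_index": 1,
                         "step_result": "ok", "replan_count": 0},
    "failure mid-plan": {"plan": plan, "current_step_index": 1,
                         "step_result": "Search failed", "replan_count": 0},
    "failure, budget used up": {"plan": plan, "current_step_index": 1,
                                "step_result": "Search failed", "replan_count": 2},
    "failure on last step": {"plan": plan, "current_step_index": 3,
                             "step_result": "Search failed", "replan_count": 0},
    "false positive": {"plan": plan, "current_step_index": 1,
                       "step_result": "Error rate is 2%", "replan_count": 0},
}
for name, state in cases.items():
    print(f"{name}: {route(state)}")
```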

Building the Graph

from langgraph.graph import END, START, StateGraph

graph = StateGraph(PlanSolveState)

graph.add_node("plan", plan_node)
graph.add_node("execute", execute_node)
graph.add_node("replan", replan_node)
graph.add_node("finalize", finalize_node)

graph.add_edge(START, "plan")
graph.add_edge("plan", "execute")
graph.add_conditional_edges(
    "execute",
    should_continue,
    {"execute": "execute", "replan": "replan", "finalize": "finalize"},
)
graph.add_conditional_edges(
    "replan", after_replan,
    {"execute": "execute", "finalize": "finalize"},
)
graph.add_edge("finalize", END)

agent = graph.compile()

Full code: agent-02-plan-and-solve/plan_and_solve_agent.py
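The graph references `replan_node` and `after_replan`, which are not listed in this article. As a rough sketch of what they might do, assuming the replanner returns only the remaining steps and that the LLM call is injected as a plain function (`make_replan_node` and `fake_replanner` are hypothetical names):

```python
from typing import Callable

# (task, completed step summaries) -> new list of remaining steps
Replanner = Callable[[str, list[str]], list[str]]


def make_replan_node(replanner: Replanner):
    """Build a replan node around an injected replanning function."""
    def replan_node(state: dict) -> dict:
        remaining = replanner(state["task"], state["completed_steps"])
        return {
            "plan": remaining,            # replace the plan with the new steps
            "current_step_index": 0,      # restart at the top of the new plan
            "step_result": "",            # clear the failed result
            "replan_count": state["replan_count"] + 1,
        }
    return replan_node


def after_replan(state: dict) -> str:
    """Route to finalize when the replanner has nothing left to do."""
    return "execute" if state["plan"] else "finalize"


def fake_replanner(task: str, completed: list[str]) -> list[str]:
    return ["retry the lookup with a different query", "synthesize the answer"]


state = {"task": "demo", "plan": ["a", "b"], "completed_steps": ["a -> Search failed"],
         "current_step_index": 1, "step_result": "Search failed", "replan_count": 0}
update = make_replan_node(fake_replanner)(state)
print(update, after_replan({**state, **update}))
```

Incrementing `replan_count` here is what lets the `MAX_REPLAN` cap in the router take effect.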

Real Execution: Watching the Plans Get Made

Demo 1: Multi-Country Population Data

Task: Search China, US, and India's populations. Calculate the total and China's share.

The Planner's output:

1. Search "China population", "US population", "India population"
   to get the latest figures.
2. Record China, US, and India's population numbers.
3. Add China, US, and India's populations to get the three-country total.
4. Calculate China's population as a percentage of the three-country total.
5. Synthesize all information and deliver the final answer.

Execution trace:

[Step 1] web_search("China population") → 1.40489 billion
         web_search("US population")    → 341 million
         web_search("India population") → 1.451 billion

[Step 2] Record results (no tool call, model consolidates)
         → China 1.40489B, US 341M, India: no data available ← ⚠️

[Step 3] calculator("14048900000.0 + 3400000000.0")
         → 17448900000 ← ⚠️ India missing!

[Step 4] calculator("14.0489 / 17.4489 * 100")
         → 80.5145%

[Final answer] Three-country total: 1.74489B, China's share: 80.5145%

Wait. What happened?

Step 1 successfully found India's population (1.451 billion). But Step 2 said "no data available for India." Step 3's calculation only added China and the US.

This is one of Plan-and-Solve's most common traps: information gets lost in transit between steps.

Step 1's results were stored in completed_steps, but each summary keeps only the first 100 characters of the step result (step_result[:100] in the Execute node), so critical numbers may not have survived the truncation. Step 2 had no tool calls; it relied entirely on the model "remembering" Step 1's results from context, and the model hallucinated "no data available."

This isn't a bug; it's an inherent cost of the design decision: when the information chain is long, summary-style transmission causes information loss. Fixes are covered in the section "Fixing the Information Loss Problem" below.
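The truncation effect is easy to reproduce in isolation. In the sketch below the result text is invented for illustration (it is not the real tool output); only the 100-character slice mirrors the Execute node.

```python
def summarize(step: str, result: str, limit: int = 100) -> str:
    """Mirror the Execute node's summary format: step plus truncated result."""
    return f"{step} → {result[:limit]}"


# Invented result text, ordered China, US, India like the searches in Step 1.
result = (
    "China's population is about 1.40489 billion according to the latest figures; "
    "the US population is about 341 million; "
    "India's population is about 1.451 billion."
)
summary = summarize("Search populations", result)

print(len(result) > 100)      # the full result is longer than the cutoff
print("1.40489" in summary)   # the first number survives
print("1.451" in summary)     # the last number does not
```

Whichever value happens to appear last in the text is the first to be lost, which is consistent with India's figure disappearing in the trace above.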

Demo 2: Dependency Chain Task (iPhone Price in CNY)

Task: Search the latest iPhone's USD price, search the exchange rate, convert to CNY.

The Planner generated a 7-step plan — when 3 steps would suffice (search price, search rate, calculate). This demonstrates the Planner's tendency to over-plan simple tasks, splitting every small action into its own step.

Step 6 produced an interesting tool failure:

[Step 6] Need to round 8836.45
  → calculator("round(8836.45)")
  → Error: unsupported AST node: Call
  → calculator("round(8836.45, 0)")
  → Error: unsupported AST node: Call
  → Result: Sorry, need more steps to process this request.

Our calculator only supports arithmetic — no function calls (by design, to prevent injection). The model tried round() twice, both failed, and gave up with an uncertain response.

But in Step 7 (the final synthesis), the model elegantly worked around it:

1299 USD × 6.8025 = 8836.45 CNY
Rounded to approximately 8836 CNY

It did the "rounding" in natural language, without a tool. Tool failure is not the end — the model's own capabilities can serve as a fallback.
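The calculator's behavior in Step 6 can be approximated with an AST whitelist. This is a sketch of the general technique rather than the tool's actual source: it evaluates only numeric literals and arithmetic operators, so `round(...)` is rejected as a `Call` node.

```python
import ast
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate(node: ast.AST):
    """Recursively evaluate whitelisted nodes; reject everything else."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"unsupported AST node: {type(node).__name__}")


def calculator(expression: str) -> str:
    """Return the result as text, or an 'Error: ...' string the model can read."""
    try:
        return str(_evaluate(ast.parse(expression, mode="eval").body))
    except (ValueError, SyntaxError, ZeroDivisionError, OverflowError) as exc:
        return f"Error: {exc}"


print(calculator("2**10 + 3**5"))
print(calculator("round(8836.45)"))
```

A production version would likely also bound exponent sizes, since an expression such as `9**9**9` is valid arithmetic but very expensive to evaluate.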

Demo 3: Simple Task Planning

Task: Calculate 2^10 + 3^5.

The Planner generated a 4-step plan:

1. Calculate 2 to the power of 10
2. Calculate 3 to the power of 5
3. Add the results of steps 1 and 2
4. Synthesize all information and give the final answer

Compare to ReAct's approach: a single calculator("2**10 + 3**5") call. Done.

Plan-and-Solve is clearly "overkill" here — turning a one-step calculation into 4 steps. This is one of the core trade-offs we need to discuss.

Five Key Findings

After running these demos, here are five observations that matter in real engineering:

Finding 1: Planners tend to over-plan

For simple tasks, LLMs turn every micro-action into its own step. This increases execution rounds and token consumption, which makes the run slower and more expensive. A good Planner prompt should set explicit limits: no more than 3 steps for simple tasks, and split only when there's a genuine dependency.

Finding 2: Information transmission between steps requires careful design

Each step's result is stored as a natural language summary in completed_steps. If the summary is too short, critical numbers get cut off (India's population in Demo 1). Fix: use structured formats (JSON or key-value pairs) to store step results, rather than truncated prose.

Finding 3: Tool failure ≠ step failure

The model can fall back to its own knowledge when tools fail (Demo 2's rounding). Don't immediately trigger Replan on tool failure — let the Execute node handle it first. Only trigger Replan if the model truly cannot produce a reasonable result.
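One way to act on this is to judge the step by its final text rather than by error strings that appeared along the way. The sketch below is an assumption-heavy illustration: `step_failed` and the marker phrases are hypothetical and would need tuning against real outputs.

```python
# Hypothetical phrases that suggest the executor gave up on the step.
GIVE_UP_MARKERS = (
    "sorry, need more steps",
    "unable to complete",
    "could not complete",
)


def step_failed(final_text: str) -> bool:
    """Treat a step as failed only if its final answer is empty or a give-up."""
    text = final_text.strip().lower()
    if not text:
        return True
    return any(marker in text for marker in GIVE_UP_MARKERS)


print(step_failed("Sorry, need more steps to process this request."))
print(step_failed("The calculator rejected round(), so rounding by hand gives 8836 CNY."))
print(step_failed("The error rate is 2%."))
```

With this check, a tool error that the model recovers from inside the step never reaches the router, while a genuine give-up still does.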

Finding 4: Replan is a double-edged sword

Replan gives the system fault tolerance, but also introduces uncertainty: the new plan may conflict with the original or skip necessary steps. Production recommendation: cap Replan at 2 attempts. If that's not enough, degrade gracefully — tell the user the task couldn't be completed.

Finding 5: Plan-and-Solve and ReAct aren't opposites

In our implementation, each Execute step internally uses a ReAct sub-agent. Plan-and-Solve provides "strategic planning," ReAct provides "tactical execution." This layered design is very common in real Agent engineering, and it is the kind of composition LangGraph's graph model handles well.

When to Choose ReAct vs. Plan-and-Solve

This is the core engineering judgment:

Task analysis
│
├─ Fewer than 3 steps?
│   └─ Use ReAct (lightweight, fast)
│
├─ Strong dependencies between steps?
│   (later steps need precise results from earlier steps)
│   └─ Plan-and-Solve (explicit plan enforces dependency order)
│
├─ Clear task boundary, enumerable steps?
│   └─ Plan-and-Solve or even Workflow-Driven
│
├─ Open-ended task, fuzzy boundaries?
│   └─ ReAct (adapts to unknowns)
│
└─ Long-horizon planning (10+ steps)?
    └─ Consider multi-Agent architecture (later article)
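The tree above can be written down as a small heuristic function. This is a sketch only: the function name, the parameters, the top-to-bottom evaluation order, and the fallback to ReAct are all assumptions layered on the diagram, and the step estimate would have to come from somewhere else.

```python
def choose_architecture(
    estimated_steps: int,
    strong_dependencies: bool = False,
    enumerable_steps: bool = False,
    open_ended: bool = False,
) -> str:
    """Walk the decision tree from top to bottom and return the first match."""
    if estimated_steps < 3:
        return "ReAct"
    if strong_dependencies:
        return "Plan-and-Solve"
    if enumerable_steps:
        return "Plan-and-Solve or Workflow-Driven"
    if open_ended:
        return "ReAct"
    if estimated_steps >= 10:
        return "multi-Agent"
    return "ReAct"  # assumed default when no branch applies


print(choose_architecture(1))
print(choose_architecture(5, strong_dependencies=True))
print(choose_architecture(12))
```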

Real-world examples:

Scenario	Recommended	Reason
Search a fact and answer	ReAct	Single step, no planning needed
Multi-source comparative analysis	Plan-and-Solve	Data collection has dependency order
Auto-write code and test	Plan-and-Solve	Clear steps: write → run → fix
Open-ended competitive research	ReAct	Search direction evolves dynamically
Data processing pipeline	Workflow-Driven	Steps fully fixed
Complex fault diagnosis	ReAct + Plan	Hybrid: plan investigation path, then execute dynamically

Fixing the Information Loss Problem

The India population loss in Demo 1 has a few engineering solutions:

Option A: Store step results in structured format

# Instead of natural language summaries:
completed_steps.append(f"Search China population → {step_result[:100]}")

# Use structured data:
step_data = {
    "step_index": idx,
    "description": current_step,
    "result": step_result,          # full result, no truncation
    "extracted_values": {},         # have the model extract key numbers
}

Option B: Dedicated state slot for collected data

from typing import Any, TypedDict

class PlanSolveState(TypedDict):
    # ... other fields ...
    collected_data: dict[str, Any]  # dedicated storage for gathered data

Each Execute step not only writes to completed_steps but also extracts key data into collected_data. Later steps read directly from this dictionary — no relying on the model to "remember" prose.
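A sketch of how Option B might be used, with hypothetical helper names and key names. The figures are the ones the searches returned in Demo 1; the point is that a missing key becomes detectable before any arithmetic runs.

```python
def merge_collected(state: dict, new_values: dict) -> dict:
    """Return a state update that adds new values to collected_data."""
    merged = {**state.get("collected_data", {}), **new_values}
    return {"collected_data": merged}


def missing_keys(collected: dict, required: tuple[str, ...]) -> list[str]:
    """List the required values that no step has produced yet."""
    return [key for key in required if key not in collected]


REQUIRED = ("china_population", "us_population", "india_population")

state: dict = {"collected_data": {}}
state.update(merge_collected(state, {"china_population": 1_404_890_000}))
state.update(merge_collected(state, {"us_population": 341_000_000}))
gap = missing_keys(state["collected_data"], REQUIRED)   # India not stored yet

state.update(merge_collected(state, {"india_population": 1_451_000_000}))
data = state["collected_data"]
total = sum(data[key] for key in REQUIRED)
china_share = data["china_population"] / total * 100
print(gap, total, round(china_share, 2))
```

A later step can refuse to compute the total while `missing_keys` is non-empty, instead of silently adding two countries.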

Option C: Have the Planner specify data flow explicitly

Prompt the Planner to annotate each step with:

"Input: which data from step X"
"Output: what data to produce and store where"

This defines the data flow graph at the planning layer, before any execution begins.
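If the Planner emits such annotations, the data flow can be checked before execution starts. The sketch below assumes the annotations have already been parsed into a hypothetical `PlannedStep` structure; it reports every input that no earlier step produces.

```python
from dataclasses import dataclass, field


@dataclass
class PlannedStep:
    description: str
    inputs: list[str] = field(default_factory=list)    # data this step reads
    outputs: list[str] = field(default_factory=list)   # data this step stores


def undefined_inputs(plan: list[PlannedStep]) -> list[tuple[int, str]]:
    """Return (step number, input name) pairs with no earlier producer."""
    available: set[str] = set()
    problems: list[tuple[int, str]] = []
    for number, step in enumerate(plan, 1):
        for name in step.inputs:
            if name not in available:
                problems.append((number, name))
        available.update(step.outputs)
    return problems


plan = [
    PlannedStep("Search China and US population",
                outputs=["china_population", "us_population"]),
    PlannedStep("Add the three populations",
                inputs=["china_population", "us_population", "india_population"],
                outputs=["total_population"]),
]
print(undefined_inputs(plan))
```

An empty result means every step's inputs are covered by earlier outputs; anything else is a reason to replan before spending tool calls.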

The three options increase in complexity and robustness. In production, match the complexity to the task.

Interview Prep: Explaining the Plan/Execute Separation

Interview question: Does your Agent plan before executing? How does that work?

Many candidates describe ReAct — implicit reasoning during execution, no explicit plan. If you've implemented Plan-and-Solve, this is a strong differentiator:

"We use different architectures for different task types. For tasks with few steps and fuzzy boundaries, ReAct's implicit reasoning is sufficient. For multi-step tasks with clear dependencies — like multi-source comparative analysis — we use Plan-and-Solve.

Concretely: Plan phase uses the LLM to do a complete task decomposition and generate a step list — no tool calls at this stage, pure thinking. Solve phase executes each step sequentially, with an embedded ReAct sub-agent handling tool calls within each step.

This gives us three advantages: the execution path is determined upfront, dependencies are explicit, and debugging is much easier. The Replan mechanism provides fault tolerance.

A real pitfall we hit in production: information transmission between steps needs to be structured. Natural language summaries lose critical data — we moved to structured JSON for step results, so later steps don't rely on the model 'remembering' earlier results."

This answer shows you've moved beyond running examples — you've encountered and thought through production problems.

Summary

Three things from this article:

Plan-and-Solve = plan first, execute second: Compared to ReAct's greedy strategy, Plan-and-Solve generates a complete step list before execution, making dependencies visible and execution paths predictable. Best for structured multi-step tasks.
Information transmission is the biggest pitfall: Passing data between steps via natural language summaries causes information loss. Production systems should use structured formats to store critical intermediate results — don't rely on the model to "remember" previous results.
Plan-and-Solve and ReAct compose naturally: Plan-and-Solve provides global structure; ReAct handles tool calls within each step. This layered design is common in complex Agent systems.

Next up: Agent Series Article 4 — Deep Dive into Tool Calling: Tools Are the Agent's "Hands," But Hand Design Determines What the Agent Can Do. We'll go deep on tool design principles, parameter validation, error handling, and how to prevent tools from becoming security vulnerabilities.

References

Wang et al., Plan-and-Solve Prompting, ACL 2023
LangGraph Documentation: StateGraph
hello-agents Open Tutorial (Chapter 6)
Demo code: agent-02-plan-and-solve
