🤖 What is a ReAct-style agent?

#ai #agents #architecture #learning

TL;DR: "ReAct" is the way all models do their internal "reasoning+acting" loop since 2023. prompt => {loop = evaluate <-> tool call(s)} => final answer It used to be dumber, before a paper from Google.

Disclaimer: this article is a reviewed, lightly edited and enriched conversation with Claude Opus 4.8.
I was learning the subject and found the answer worth sharing.
I'm not taking any credit: It helped me, hopefully it'll help you.

A ReAct-style agent is an LLM agent that interleaves reasoning and acting in a loop. The name comes from "Reasoning + Acting." It was introduced in a October 2022 paper by Google Research's Brain Team, and it became one of the foundational patterns for building tool-using agents.

The paper

ReAct: Synergizing Reasoning and Acting in Language Models by Yao et al. (2022).

arXiv abstract page: https://arxiv.org/abs/2210.03629
- PDF: https://arxiv.org/pdf/2210.03629
mini-site https://react-lm.github.io/
Readable write-up from the authors on the Google Research blog (lighter version): https://research.google/blog/react-synergizing-reasoning-and-acting-in-language-models/

In details

The core idea is that instead of having the model either just think (chain-of-thought) or just call tools blindly, you let it alternate between the two in a structured cycle:

Thought → the model reasons about the current state and what to do next
Action → the model issues a tool call (search, API request, code execution, etc.)
Observation → the result of that action gets fed back in
…and then it loops back to Thought, using the new observation to inform its next step, until it decides it has enough to produce a final answer.

A toy trace looks like this:

Question: What's the population of the capital of France?

Thought: I need to find the capital of France first.
Action: search("capital of France")
Observation: Paris is the capital of France.

Thought: Now I need Paris's population.
Action: search("population of Paris")
Observation: Approximately 2.1 million.

Thought: I have what I need.
Answer: The population of Paris is about 2.1 million.

Why it works well: The reasoning steps keep the model grounded—it plans before acting and adapts when an observation contradicts its expectation. The action steps keep it factual, since it pulls real data from tools rather than hallucinating. The interleaving is the key: reasoning informs which action to take, and observations correct the reasoning. Pure chain-of-thought can reason its way confidently into a wrong answer; pure action-taking can't plan or recover from surprises.

In practice today, the "Thought/Action/Observation" text format from the original paper has largely been absorbed into native tool-calling APIs (function calling), where the model emits structured tool calls and the runtime feeds back results. But the underlying loop is still ReAct — it's the conceptual backbone behind frameworks like LangChain agents, LlamaIndex agents, and most custom agent loops you'd build yourself.

The trade-offs: ReAct loops can be token-hungry and occasionally get stuck in repetitive cycles (re-issuing similar actions), which is why people layer on things like step limits, reflection (e.g., Reflexion), or more structured planning (Plan-and-Execute) on top.

ReAct's core contribution is best understood as a fix for the failure mode of pure chain-of-thought (CoT): When a model reasons via CoT, it isn't grounded in the external world and works only from its own internal representations, which limits its ability to reactively explore, reason, and update what it knows.

The consequence is the familiar pathology: a model reasons fluently and confidently down a chain that's built on a wrong or stale premise, and because nothing external ever contradicts it, the error compounds at every subsequent step. That's the "hallucination and error propagation" problem.

ReAct breaks that closed loop by letting actions interrupt the reasoning. On HotpotQA and Fever, ReAct overcomes the hallucination and error-propagation issues prevalent in chain-of-thought by interacting with a simple Wikipedia API, and it produces human-like task-solving trajectories that are more interpretable than baselines lacking reasoning traces.

The key word is synergy—it's bidirectional. Reasoning traces help the model induce, track, and update action plans and handle exceptions, while actions let it interface with external sources to gather additional information. So reasoning decides what to look up, the lookup injects a real fact, and that fact constrains the next round of reasoning—the model can't drift far from reality because it keeps getting pulled back to ground truth.

Two things from the paper are worth underlining for how you think about agent design:

✅ The gains weren't marginal on the interactive tasks. On ALFWorld and WebShop (two interactive decision-making benchmarks) ReAct outperformed imitation and reinforcement-learning methods by absolute success-rate margins of 34% and 10% respectively, while being prompted with only one or two in-context examples. That few-shot efficiency was a big part of why the pattern caught on: you got RL-beating behavior from a couple of prompt examples, no training loop required.
⚠️ ReAct alone wasn't strictly dominant on the QA tasks the real winner was the hybrid. On HotpotQA and Fever, the best overall approach was a combination of ReAct and CoT that lets the model use both its internal knowledge and externally obtained information during reasoning.

The takeaway: grounding via tools and reasoning from internal knowledge aren't competitors; the strongest setup lets the model lean on its own priors when they're reliable and reach for external sources when they aren't. That tension—when to trust the model's internal knowledge versus when to force a retrieval—is exactly the design knob you're tuning in RAG and agentic pipelines.

The interpretability point is also more than a footnote. Because every step is an explicit thought or a named action with an observation, the trajectory is human-readable, which makes the system far easier to diagnose and steer when it goes wrong—a meaningful operational advantage over an opaque policy that just emits actions.

Top comments (2)

Max Quimby • Jun 14

Nice clean explainer — and you're right that the literal Thought/Action/Observation text format has mostly dissolved into native tool-calling. The mental model still holds, which is why it's worth knowing. The part I'd add from running these loops in production: ReAct's elegance hides its main failure mode, which is context accumulation. Every Observation gets appended, so a 30-step loop drags a growing pile of tool output through every subsequent reasoning step — and somewhere near the limit the model stops planning and just pattern-matches its last action, looping or flailing. The fixes that actually helped us were observation compaction (summarize old tool results instead of carrying them raw) and giving the loop an explicit "should I stop?" check rather than waiting for it to decide it has "enough." The subtler thing: reasoning quality degrades faster than people expect once observations start contradicting each other. Have you played with structured scratchpads or sub-agents to keep the main loop's context lean? That's been a bigger lever for us than the prompt format itself.

Yves Jutard • Jun 15

🤣 the emdash straight in the first sentence. But thanks for commenting, AI.