If you’ve built an LLM agent recently, you’ve probably hit the "autonomy wall."
You give the agent a tool to search the web, a prompt to "be helpful," and a task. For the first two turns, it looks like magic. On turn three, it goes down a Wikipedia rabbit hole. On turn ten, it’s stuck in an infinite loop trying to fix a syntax error on a file it never downloaded.
Most developers try to fix this by cramming more instructions into the system prompt: "Never repeat the same action twice! Think step-by-step!"
But the problem isn’t the prompt. It’s the architecture.
You are forcing a single execution loop to do two completely different jobs: talking/acting (which requires low latency and high bandwidth) and planning (which requires slow, deliberative reasoning).
We need to borrow a concept from human psychology—Daniel Kahneman’s Thinking, Fast and Slow—and build Dual-Process Agents.
The Problem: The Single-Loop Trap
Most standard agents (like a naive ReAct loop) operate in a flat sequence:
Observe -> Think -> Act -> Observe -> Think -> Act
When the agent is "thinking," it is trying to decide what to say to the user and what its long-term strategy should be. Because LLMs are autoregressive, the immediate context (the last thing the user said, or the last API error) overwhelmingly dominates its attention.
If the agent’s only "planner" is the exact same loop that’s doing the work, you get two failure modes:
- Shallow Exploration: It never discovers new subgoals because it's too focused on the immediate task.
- Runaway Exploration: It forgets the original goal entirely and never finishes.
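The trap is easier to see in code. Here is a minimal sketch of the flat loop, where `call_llm` and `run_tool` are hypothetical stand-ins for your model and tool layer; the point is the shape of the loop, not the internals:

```python
# Minimal sketch of a flat single-loop agent (illustrative, not any real framework).
# `call_llm` and `run_tool` are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    # Stand-in: a real model would decide the next action from the prompt.
    return "ACT: search('python syntax error fix')"

def run_tool(action: str) -> str:
    return f"result of {action}"

def single_loop_agent(task: str, max_turns: int = 10) -> list[str]:
    history = [f"TASK: {task}"]
    for _ in range(max_turns):
        # One model call does *everything*: tactics AND strategy.
        thought = call_llm("\n".join(history))
        if thought.startswith("DONE"):
            break
        observation = run_tool(thought)
        history += [thought, f"OBS: {observation}"]
        # Nothing here ever steps back to ask "is this still the right goal?"
    return history

trace = single_loop_agent("summarize the latest agent papers", max_turns=3)
```

Every turn, the strategic question ("is this still the right goal?") competes for attention with the tactical one ("what do I do right now?"), and the tactical one wins.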
The Dual-Process Solution
A dual-process architecture explicitly separates the "doer" from the "planner."
A recent paper out of Stanford (SparkMe, arXiv:2602.21136) demonstrated this brilliantly in the context of AI conducting qualitative interviews. They split their agent into two distinct systems:
System 1: The Executor (Fast)
This is your fast, reactive loop. Its only job is to look at the immediate context and execute the next tactical step. In the interview example, this agent simply asks the next question and decides whether to probe deeper into the current topic or transition to the next one. It does not worry about the global strategy.
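In code, the executor can be as simple as a function that sees only the shared agenda and the last few turns. This is a hypothetical sketch (the names and the crude heuristic are mine, not SparkMe's), with a hard-coded rule standing in for a fast LLM call:

```python
# Sketch of a System 1 executor (hypothetical design, not the paper's code).
# It sees only the shared agenda and a short window of recent turns,
# and picks one tactical move: probe the current topic, or transition.

from collections import deque

def pick_next_move(agenda: list[str], recent_turns: deque) -> str:
    if not agenda:
        return "WRAP_UP"
    current_topic = agenda[0]
    # Crude tactical rule as a stand-in for a fast LLM call:
    # probe while the topic is still fresh, otherwise move on.
    mentions = sum(current_topic in turn for turn in recent_turns)
    if mentions < 2:
        return f"PROBE: {current_topic}"
    return f"TRANSITION: {agenda[1] if len(agenda) > 1 else 'closing'}"

recent = deque(["tell me about onboarding", "onboarding was confusing"], maxlen=4)
move = pick_next_move(["onboarding", "pricing"], recent)
```

Note what is *absent*: no full history, no long-horizon reasoning. That is the point; the executor stays cheap and fast because strategy lives elsewhere.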
System 2: The Planner (Slow & Asynchronous)
This is the deliberative loop. It runs asynchronously in the background (e.g., every k turns). Its job is to look at the entire history, zoom out, and optimize the overarching trajectory.
How does it do this? By simulating rollouts.
The Planner takes the current state and spins up hypothetical futures: "If I steer the agent to ask about X, the user might say Y. If I steer it toward Z, the user might say W." It scores these hypothetical futures against a predefined utility function (e.g., maximizing new information while minimizing token cost).
Once the Planner finds a high-utility trajectory, it quietly updates the shared "Agenda" that System 1 is reading from.
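Here is a toy version of that rollout-and-score loop. Everything below is a hypothetical sketch of the pattern, not the paper's implementation: `simulate_rollout` stands in for LLM-simulated futures, and the utility rewards novel information while penalizing length (a proxy for token cost):

```python
# Sketch of a System 2 planner (hypothetical design, not the paper's code).
# It simulates a short rollout for each candidate agenda ordering,
# scores them with an explicit utility, and returns the best agenda.

import itertools

def simulate_rollout(agenda: tuple, horizon: int) -> list[str]:
    # Stand-in for LLM-simulated futures: pretend each agenda item
    # yields one hypothetical user answer per step.
    return [f"user answers about {topic}" for topic in agenda[:horizon]]

def utility(rollout: list[str], history: list[str]) -> float:
    # Reward novel information, penalize rollout length (token cost).
    seen = set(history)
    new_info = sum(turn not in seen for turn in rollout)
    return new_info - 0.1 * len(rollout)

def plan(topics: list[str], history: list[str], horizon: int = 2) -> list[str]:
    best = max(
        itertools.permutations(topics),
        key=lambda agenda: utility(simulate_rollout(agenda, horizon), history),
    )
    return list(best)

# The user already covered pricing, so the planner pushes it to the end.
history = ["user answers about pricing"]
agenda = plan(["pricing", "onboarding", "support"], history)
```

Brute-forcing permutations obviously doesn't scale; a real planner would sample a handful of candidate trajectories with the LLM instead. But the structure is the same: generate futures, score them with an explicit utility, commit the winner to the shared agenda.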
Why this changes everything
When you decouple execution from planning, you gain actual control knobs over your agent's autonomy:
- How often to plan: You can set the Planner to run every 5 steps, saving massive amounts of compute compared to forcing a deep "Chain of Thought" on every single micro-action.
- How far to look ahead: You can define the simulation horizon (e.g., look 3 steps into the future).
- What to optimize: You can mathematically define what "good" looks like in the Planner's utility function, rather than relying on vibes in a system prompt.
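These three knobs can live in plain configuration instead of prompt prose. A minimal sketch, with hypothetical names and a novelty-based default utility as an assumption:

```python
# The three control knobs as explicit parameters (hypothetical sketch):
# how often to plan, how far to look ahead, and what to optimize.

from dataclasses import dataclass

@dataclass
class PlannerConfig:
    plan_every_k: int = 5   # how often System 2 wakes up
    horizon: int = 3        # how many steps each rollout looks ahead

def novelty_utility(rollout: list[str]) -> float:
    # Example "what to optimize": count distinct pieces of information.
    return float(len(set(rollout)))

def should_plan(turn: int, cfg: PlannerConfig) -> bool:
    # System 1 runs every turn; System 2 only on every k-th turn.
    return turn > 0 and turn % cfg.plan_every_k == 0

cfg = PlannerConfig(plan_every_k=5, horizon=3)
planning_turns = [t for t in range(1, 16) if should_plan(t, cfg)]
```

Tuning autonomy now means editing a dataclass and a utility function, not rewriting a prompt and hoping.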
Sources
- SparkMe: Adaptive Semi-Structured Interviewing for Qualitative Insight Discovery (arXiv:2602.21136) — https://arxiv.org/abs/2602.21136
The Takeaway
Stop trying to build a single "God Prompt" that acts perfectly in the moment while simultaneously playing 4D chess.
Let your fast agents ship actions. Let your slow agents simulate the future.