DEV Community

varun pratap Bhardwaj

Posted on • Originally published at superlocalmemory.com

Agentic Engineering Patterns: Architectural Building Blocks for AI Agent Systems

Building an AI agent is not the same as calling an LLM in a loop. The moment you need an agent to use tools, remember past interactions, revise its own plans, or collaborate with other agents, you enter the domain of systems architecture. The patterns you choose — how the agent reasons, when it retrieves context, how it delegates — determine whether your system is reliable or a stochastic mess. This post breaks down the core architectural patterns that have emerged in agentic AI engineering, explains when each one applies, and shows you how memory layers tie them all together.

What You Will Learn

  • The four foundational agentic patterns: ReAct, Plan-and-Execute, Reflection, and Delegation
  • How each pattern structures the loop between reasoning, action, and observation
  • Where memory (short-term, long-term, episodic) fits into each pattern
  • Concrete Python code implementing each pattern with persistent memory retrieval
  • Trade-offs and failure modes so you know when not to use a given pattern

Conceptual Foundation: What Makes a System "Agentic"

An LLM call is stateless. You send a prompt, you get a completion. An agent, by contrast, operates in a loop: it perceives its environment, decides on an action, executes that action, observes the result, and feeds that observation back into its next decision. This loop is the defining characteristic.

Three capabilities distinguish an agent from a simple chain:

  1. Tool use — the agent can invoke external functions (search, databases, APIs, code execution).
  2. State management — the agent maintains context across multiple steps, including across sessions.
  3. Autonomous decision-making — the agent decides what to do next without human intervention at each step.

The patterns we will examine are different ways of structuring this loop. None of them is universally superior. Each makes a trade-off between autonomy, reliability, latency, and cost.

graph TD
    subgraph "Agentic Loop"
        A[User Query] --> B[Reasoning / Planning]
        B --> C{Select Action}
        C -->|Tool Call| D[Execute Tool]
        C -->|Respond| H[Final Answer]
        D --> E[Observation / Result]
        E --> F[Memory Write]
        F --> B
    end

    subgraph "Memory Layer"
        G[Short-Term Memory<br/>Current conversation] -.-> B
        I[Long-Term Memory<br/>Past sessions, facts] -.-> B
        J[Episodic Memory<br/>Past task outcomes] -.-> B
        F --> G
        F --> I
        F --> J
    end

    style B fill:#4a90d9,color:#fff
    style F fill:#d9a34a,color:#fff

This diagram shows the general shape. Every pattern we discuss is a specific instantiation of this loop with different control flow decisions at the "Reasoning / Planning" and "Select Action" nodes.
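The loop in the diagram can be sketched as plain control flow, independent of any one pattern. This is an illustrative skeleton, with the reasoning step injected as a `policy` callable (an assumption for the sketch; in a real agent this would be an LLM call) so the shape of perceive, act, observe, write-memory is explicit:

```python
from typing import Any, Callable

def agentic_loop(query: str,
                 policy: Callable[[list], dict],
                 tools: dict[str, Callable[..., Any]],
                 memory_write: Callable[[str], None],
                 max_steps: int = 10) -> str:
    """Generic shape of the loop above: reason, select action, execute,
    observe, write memory, feed back. `policy` inspects the transcript and
    returns either {"tool": name, "args": {...}} or {"answer": text}.
    """
    transcript = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        decision = policy(transcript)            # Reasoning / Select Action
        if "answer" in decision:                 # Respond -> Final Answer
            return decision["answer"]
        # Execute Tool -> Observation
        observation = tools[decision["tool"]](**decision.get("args", {}))
        memory_write(str(observation))           # Memory Write
        # Feed the observation back into the next reasoning step
        transcript.append({"role": "tool", "content": str(observation)})
    return "Max steps reached."
```

Every pattern below swaps in a different `policy`: ReAct decides one step at a time, Plan-and-Execute commits to a plan up front, and so on.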

Pattern 1: ReAct (Reason + Act)

ReAct, introduced by Yao et al. (2023), interleaves reasoning traces with actions. At each step, the agent produces a Thought (natural language reasoning), then an Action (tool invocation), then receives an Observation (tool output). This cycle repeats until the agent has enough information to produce a final answer.

ReAct is the simplest agentic pattern and the one you should reach for first.

1. The agent receives a query and generates a Thought

The thought is an explicit reasoning trace: "I need to find the current stock price of AAPL. I should use the stock_price tool."

2. The agent selects and invokes a tool (Action)

Based on the thought, the agent emits a structured action: stock_price(symbol="AAPL").

3. The tool returns a result (Observation)

The tool returns {"price": 187.42, "currency": "USD"}. This is appended to the agent's context.

4. The loop repeats or the agent responds

The agent decides whether it has enough information. If yes, it produces a final answer. If not, it generates another Thought and continues.

Here is a minimal ReAct implementation:

import openai
import json

# Define available tools
TOOLS = {
    "search": lambda query: f"Results for '{query}': [Wikipedia article about {query}]",
    "calculate": lambda expr: str(eval(expr)),  # simplified; use a sandbox in production
}

TOOL_DESCRIPTIONS = """
Available tools:
- search(query: str) -> str: Search the web for information
- calculate(expr: str) -> str: Evaluate a math expression
"""

def react_agent(query: str, max_steps: int = 5) -> str:
    """A minimal ReAct agent loop."""
    messages = [
        {"role": "system", "content": f"""You are a ReAct agent. For each step:
1. Thought: reason about what to do next
2. Action: call a tool using JSON: {{"tool": "name", "args": {{"key": "value"}}}}
3. When ready, respond with: {{"answer": "your final answer"}}
{TOOL_DESCRIPTIONS}"""},
        {"role": "user", "content": query},
    ]

    for step in range(max_steps):
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            temperature=0,
        )
        content = response.choices[0].message.content
        messages.append({"role": "assistant", "content": content})

        # Try to parse the agent's output as JSON
        try:
            parsed = json.loads(content)
        except json.JSONDecodeError:
            # If it's not JSON, treat it as the final answer
            return content

        if "answer" in parsed:
            return parsed["answer"]

        if "tool" in parsed:
            tool_name = parsed["tool"]
            tool_args = parsed.get("args", {})
            # Execute the tool; surface unknown tool names as observations
            # instead of crashing, so the agent can self-correct
            if tool_name in TOOLS:
                observation = TOOLS[tool_name](**tool_args)
            else:
                observation = f"Error: unknown tool '{tool_name}'"
            # Feed observation back into the loop
            messages.append({
                "role": "user",
                "content": f"Observation: {observation}"
            })

    return "Max steps reached without a final answer."

# Usage
result = react_agent("What is the population of France divided by 3?")
print(result)

When to use ReAct: When your task requires interleaving information gathering with reasoning. It works well for question answering, research tasks, and data lookups where the agent needs 2-5 tool calls.

When not to use it: When the task requires a long-horizon plan with 10+ steps. ReAct is greedy — it decides one step at a time, which can lead to wandering.

Pattern 2: Plan-and-Execute

Plan-and-Execute separates planning from execution. First, the agent creates an explicit multi-step plan. Then a separate execution loop carries out each step. After execution, the agent can optionally revise the plan based on what it learned.

This pattern is better suited for complex tasks because the planning step forces the agent to commit to a strategy before spending tokens and tool calls on execution.

import openai
import json

def plan_and_execute(query: str) -> str:
    """Plan-and-Execute pattern: create a plan, then execute each step."""

    # Phase 1: Generate a plan
    plan_response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """Create a step-by-step plan to answer the user's query.
Return a JSON array of step strings. Example: ["Step 1: ...", "Step 2: ..."]
Each step should be a concrete, actionable instruction."""},
            {"role": "user", "content": query},
        ],
        temperature=0,
    )
    raw = plan_response.choices[0].message.content.strip()
    if raw.startswith("```"):  # strip markdown fences the model sometimes adds
        raw = raw.strip("`").removeprefix("json").strip()
    plan = json.loads(raw)
    print(f"Plan: {plan}")

    # Phase 2: Execute each step
    context = ""  # Accumulates results from previous steps
    for i, step in enumerate(plan):
        exec_response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"""You are executing step {i+1} of a plan.
Previous context: {context}
Execute this step and return the result."""},
                {"role": "user", "content": step},
            ],
            temperature=0,
        )
        step_result = exec_response.choices[0].message.content
        context += f"\nStep {i+1} result: {step_result}"

    # Phase 3: Synthesize final answer
    final_response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Synthesize a final answer from the execution results."},
            {"role": "user", "content": f"Original query: {query}\n\nExecution results:{context}"},
        ],
        temperature=0,
    )
    return final_response.choices[0].message.content

The key insight: by separating planning from execution, you can use different models for each phase (a stronger model for planning, a cheaper one for execution), and you can checkpoint and resume the plan.
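Checkpointing is straightforward once the plan is explicit: persist the accumulated step results after every step, and skip already-completed steps on restart. A minimal sketch (the file format and `execute_step` callable are assumptions, not part of the pattern itself):

```python
import json
from pathlib import Path
from typing import Callable

def run_plan_with_checkpoints(plan: list[str],
                              execute_step: Callable[[str], str],
                              checkpoint_path: str = "plan_state.json") -> list[str]:
    """Execute plan steps, saving results to disk after each one so a
    crashed or interrupted run can resume where it left off."""
    path = Path(checkpoint_path)
    # Load prior progress if a checkpoint exists
    state = json.loads(path.read_text()) if path.exists() else {"done": []}
    for i, step in enumerate(plan):
        if i < len(state["done"]):
            continue  # already executed in a previous run; skip
        state["done"].append(execute_step(step))
        path.write_text(json.dumps(state))  # checkpoint after every step
    return state["done"]
```

In the Plan-and-Execute code above, `execute_step` would wrap the per-step LLM call from Phase 2.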

Pattern 3: Reflection

Reflection adds a self-critique step. After the agent produces an output, a second pass evaluates that output for correctness, completeness, and adherence to instructions. If the evaluation fails, the agent revises its output.

This is not a standalone pattern — it layers on top of ReAct or Plan-and-Execute.

def reflect_and_revise(query: str, max_revisions: int = 2) -> str:
    """Generate an answer, then reflect on it and revise if needed."""

    # Initial generation
    draft = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer the user's question thoroughly."},
            {"role": "user", "content": query},
        ],
    ).choices[0].message.content

    for revision in range(max_revisions):
        # Reflection step: critique the draft
        critique = openai.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": """Critique the following answer.
Identify factual errors, missing information, or logical gaps.
If the answer is satisfactory, respond with exactly: APPROVED
Otherwise, list the specific issues."""},
                {"role": "user", "content": f"Query: {query}\n\nAnswer: {draft}"},
            ],
        ).choices[0].message.content

        if "APPROVED" in critique:
            return draft

        # Revision step: fix the issues
        draft = openai.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": "Revise the answer based on the critique."},
                {"role": "user", "content": f"Original query: {query}\n\nDraft: {draft}\n\nCritique: {critique}"},
            ],
        ).choices[0].message.content

    return draft

Reflection Can Be Wasteful

Reflection doubles (or triples) your LLM calls. Do not apply it to every agent response. Reserve it for high-stakes outputs: generated code that will be executed, answers to complex multi-step questions, or content that will be published. For simple lookups, reflection adds cost without meaningful quality improvement.
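One way to operationalize this advice is a cheap gate in front of the reflection loop. The marker list here is purely illustrative (an assumption, not a vetted taxonomy); the point is that the gate costs nothing while reflection costs LLM calls:

```python
# Illustrative high-stakes markers; tune for your domain
HIGH_STAKES_MARKERS = (
    "code", "deploy", "publish", "legal", "medical", "financial",
)

def should_reflect(task_description: str, output_will_execute: bool = False) -> bool:
    """Crude gate: only pay for a reflection pass when the output is
    high-stakes. Generated code that will run always qualifies."""
    if output_will_execute:
        return True
    text = task_description.lower()
    return any(marker in text for marker in HIGH_STAKES_MARKERS)
```

A call to `reflect_and_revise` then becomes conditional: reflect only when `should_reflect(...)` returns True, otherwise return the first draft.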

Pattern 4: Delegation (Multi-Agent Coordination)

Delegation splits a complex task across specialized agents. A supervisor agent breaks the task into subtasks and routes each to a specialist agent — a coder, a researcher, a data analyst. Each specialist has its own tools, system prompt, and potentially its own memory context.

def supervisor_agent(query: str) -> str:
    """A supervisor that delegates to specialist agents."""

    # Decide which specialists to invoke
    routing = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": """You are a supervisor agent.
Available specialists: ["researcher", "coder", "analyst"]
Given a query, return a JSON plan:
[{"agent": "researcher", "task": "..."}, {"agent": "coder", "task": "..."}]"""},
            {"role": "user", "content": query},
        ],
    ).choices[0].message.content

    subtasks = json.loads(routing)
    results = {}

    for subtask in subtasks:
        agent_name = subtask["agent"]
        task = subtask["task"]
        # Each specialist has a different system prompt and tool set
        specialist_result = run_specialist(agent_name, task, context=results)
        results[agent_name] = specialist_result

    # Synthesize results
    synthesis = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Combine the specialist results into a final answer."},
            {"role": "user", "content": f"Query: {query}\nResults: {json.dumps(results)}"},
        ],
    ).choices[0].message.content
    return synthesis

def run_specialist(name: str, task: str, context: dict) -> str:
    """Run a specialist agent with its own system prompt."""
    prompts = {
        "researcher": "You are a research agent. Find factual information.",
        "coder": "You are a coding agent. Write correct, tested code.",
        "analyst": "You are a data analyst. Interpret data and produce insights.",
    }
    response = openai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": prompts[name]},
            {"role": "user", "content": f"Task: {task}\nContext from other agents: {json.dumps(context)}"},
        ],
    )
    return response.choices[0].message.content

The hard part of delegation is not the routing — it is shared state. When the coder agent needs context from the researcher agent, how does it get it? Passing everything in the prompt works at small scale but breaks down quickly. This is where memory becomes critical.

How Memory Ties These Patterns Together

Every pattern above has a shared weakness: context management. ReAct accumulates observations in its message history. Plan-and-Execute passes results between steps as text. Delegation passes context between agents as JSON blobs. None of these approaches scale beyond a single session.

Memory solves this by providing a persistent, queryable store that any agent (or any step within an agent) can read from and write to.

There are three memory layers:

  • Short-term: scoped to the current task or session; lives minutes to hours (e.g., conversation history, intermediate results)
  • Long-term: spans sessions; lives days to permanently (e.g., user preferences, learned facts, past decisions)
  • Episodic: recorded per task; kept permanently (e.g., "Last time I tried approach X, it failed because Y")

Here is how memory integrates into a ReAct agent:

import numpy as np

class AgentMemory:
    """A simple vector-based memory store for agent state."""

    def __init__(self):
        self.entries = []  # List of {"text": str, "embedding": list, "metadata": dict}

    def store(self, text: str, metadata: dict = None):
        """Store a memory entry with its embedding."""
        embedding = get_embedding(text)  # Your embedding function
        self.entries.append({
            "text": text,
            "embedding": embedding,
            "metadata": metadata or {},
        })

    def retrieve(self, query: str, top_k: int = 3) -> list[str]:
        """Retrieve the most relevant memories for a query."""
        query_embedding = get_embedding(query)
        scored = []
        for entry in self.entries:
            # Cosine similarity
            sim = np.dot(query_embedding, entry["embedding"]) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(entry["embedding"])
            )
            scored.append((sim, entry["text"]))
        scored.sort(reverse=True, key=lambda x: x[0])
        return [text for _, text in scored[:top_k]]

def react_agent_with_memory(query: str, memory: AgentMemory, max_steps: int = 5) -> str:
    """ReAct agent augmented with persistent memory retrieval."""

    # Retrieve relevant past memories before starting
    relevant_memories = memory.retrieve(query, top_k=3)
    memory_context = "\n".join(f"- {m}" for m in relevant_memories) if relevant_memories else "None"

    messages = [
        {"role": "system", "content": f"""You are a ReAct agent with access to memory.
Relevant memories from past sessions:
{memory_context}

Use these memories to avoid repeating past mistakes and to build on prior knowledge.
Follow the Thought -> Action -> Observation loop."""},
        {"role": "user", "content": query},
    ]

    # ... standard ReAct loop from earlier ...
    final_answer = run_react_loop(messages, max_steps)

    # Store the outcome as episodic memory
    memory.store(
        f"Task: {query} | Outcome: {final_answer}",
        metadata={"type": "episodic", "task": query}
    )

    return final_answer

The critical detail: memory retrieval happens before the agent starts reasoning, and memory storage happens after the agent finishes. This creates a learning loop where each task execution improves future performance.

Seeing This in Practice

Multi-agent delegation introduces a harder memory problem: trust scoring. When Agent B retrieves a memory that Agent A wrote, how much should it trust that memory? If Agent A's task failed, its stored observations might be misleading.

SuperLocalMemory implements a local agent memory layer with hybrid search (combining vector similarity and keyword matching) that addresses this. It exposes a straightforward API for storing memories with metadata — including agent identity and task outcomes — and retrieving them with configurable scoring:

from superlocalmemory import MemoryStore

store = MemoryStore(path="./agent_memories")

# Agent A stores a research finding
store.add(
    text="The API rate limit for service X is 100 requests/minute as of March 2026.",
    metadata={
        "agent": "researcher",
        "task_id": "task-42",
        "task_outcome": "success",
        "confidence": 0.95,
    }
)

# Agent B (coder) retrieves relevant context, filtered by trust
results = store.search(
    query="rate limits for service X",
    top_k=5,
    filters={"task_outcome": "success"},  # Only trust successful task memories
)

for result in results:
    print(f"[{result.metadata['agent']}] {result.text} (score: {result.score:.3f})")

The hybrid search combines dense vector retrieval with sparse keyword matching, which matters in agentic contexts where queries often contain specific identifiers (API names, error codes) that pure semantic search can miss. You can inspect the full implementation in the GitHub repository to see how the scoring and filtering work under the hood.
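To make the idea concrete, here is a toy hybrid score: a weighted blend of dense cosine similarity and sparse query-term overlap. This is an illustration of the general technique, not SuperLocalMemory's actual scoring formula, and the `alpha` weight is an arbitrary assumption:

```python
import numpy as np

def hybrid_score(query: str, query_vec: np.ndarray,
                 doc: str, doc_vec: np.ndarray,
                 alpha: float = 0.7) -> float:
    """Blend dense (semantic) and sparse (keyword) relevance.
    alpha=1.0 is pure vector search; alpha=0.0 is pure keyword overlap."""
    # Dense component: cosine similarity between embeddings
    dense = float(np.dot(query_vec, doc_vec) /
                  (np.linalg.norm(query_vec) * np.linalg.norm(doc_vec)))
    # Sparse component: fraction of query terms that appear in the document
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    sparse = len(q_terms & d_terms) / max(len(q_terms), 1)
    return alpha * dense + (1 - alpha) * sparse
```

The sparse component is what rescues queries like "error E1234": the identifier may embed poorly, but it matches exactly as a keyword.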

Real-World Considerations

The Abstraction Trap

A recurring concern in the developer community — highlighted in discussions like "The Abstraction Trap: Why Layers Are Lobotomizing Your Model" — is that adding too many layers between the LLM and the task degrades performance. Every abstraction layer (planner, reflector, memory retrieval, routing) adds latency and potential error. Start with the simplest pattern (ReAct) and add complexity only when you have evidence that it helps.

Cost. A single ReAct loop with 4 steps costs 4 LLM calls. Add reflection and that doubles to 8. Add a planner and you are at 9+. Delegation multiplies this by the number of agents. Profile your token usage early.

Debugging. Agentic systems are hard to debug because the LLM's reasoning is non-deterministic. Log every step: the full prompt, the model's response, the tool inputs and outputs, and the memory retrievals. Without these logs, you are flying blind.
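A lightweight way to get that trace is to wrap every tool at registration time. A sketch using a decorator (the wrapper accepts keyword arguments only, matching the structured tool calls in the examples above):

```python
import functools
import json
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

def logged_tool(fn):
    """Wrap a tool so every invocation records its inputs, output,
    and latency as a structured log line."""
    @functools.wraps(fn)
    def wrapper(**kwargs):
        start = time.perf_counter()
        result = fn(**kwargs)
        log.info(json.dumps({
            "tool": fn.__name__,
            "args": kwargs,
            "result": str(result)[:200],  # truncate large outputs
            "ms": round((time.perf_counter() - start) * 1000, 1),
        }))
        return result
    return wrapper

@logged_tool
def search(query: str) -> str:
    return f"Results for '{query}'"
```

Applying the same idea to the earlier `TOOLS` dict is one line: `TOOLS = {name: logged_tool(fn) for name, fn in TOOLS.items()}`. Log the prompts and memory retrievals the same way.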

Failure modes by pattern:

  • ReAct: gets stuck in loops, calls the same tool repeatedly with slightly different arguments
  • Plan-and-Execute: creates plans that are too rigid or too vague; early step failures cascade
  • Reflection: the critic always finds something to complain about, causing infinite revision loops (always cap revision count)
  • Delegation: specialists produce incompatible outputs; the supervisor cannot reconcile them
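The ReAct failure mode at the top of that list is cheap to detect mechanically: track (tool, arguments) pairs and interrupt when the same call repeats. A minimal sketch:

```python
class LoopGuard:
    """Detect an agent stuck in a loop: the same tool invoked with the
    same arguments more than max_repeats times."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.counts: dict[tuple, int] = {}

    def check(self, tool: str, args: dict) -> bool:
        """Record a call; return True if the agent should be interrupted."""
        # Sort args so {"a": 1, "b": 2} and {"b": 2, "a": 1} hash the same
        key = (tool, tuple(sorted(args.items())))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] > self.max_repeats
```

Inside a ReAct loop, call `guard.check(tool_name, tool_args)` before executing each tool; when it returns True, either abort or inject a message telling the agent to try a different approach.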

Tool Execution Safety

If your agent can execute code or write to databases, sandbox it. A ReAct agent calling eval() on untrusted expressions — as in our simplified example above — is a remote code execution vulnerability. Use containers, restricted interpreters (like asteval), or separate execution environments with strict timeouts and resource limits.
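For the arithmetic case specifically, a restricted interpreter is small enough to write by hand: parse the expression with the `ast` module and walk only a whitelist of node types. This sketch covers basic arithmetic only (it is a drop-in for the `calculate` tool in the ReAct example, not a general sandbox, and still needs the timeouts mentioned above):

```python
import ast
import operator

# Whitelisted operators; anything outside this table raises
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expr: str) -> float:
    """Evaluate arithmetic without eval(): walk the AST and allow only
    numeric literals and whitelisted operators."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"Disallowed expression: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval"))
```

Function calls, attribute access, and name lookups all hit the `ValueError` branch, so `__import__('os')` and friends are rejected rather than executed.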

Choosing the Right Pattern

  • ReAct: best for simple tool use and Q&A; 2-5 steps; low LLM-call count; low complexity
  • Plan-and-Execute: best for multi-step tasks and research; 5-15 steps; medium LLM-call count; medium complexity
  • Reflection: best for high-stakes outputs; adds 1-2 steps per revision cycle; medium-high LLM-call count; medium complexity
  • Delegation: best for complex tasks needing specialization; step count varies; high LLM-call count; high complexity

A practical heuristic: start with ReAct. If you find the agent wandering or failing on tasks that require more than 5 steps, move to Plan-and-Execute. If output quality matters more than speed, add Reflection. If the task genuinely requires different expertise domains, use Delegation.

These patterns also compose. A delegation supervisor can use Plan-and-Execute for routing, while each specialist uses ReAct internally, and the final synthesis uses Reflection. The architecture is modular by design.

Further Reading and Sources

  • Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2023) — The original ReAct paper. arXiv:2210.03629
  • Wang et al., "Plan-and-Solve Prompting" (2023) — Formalizes the plan-then-execute approach. arXiv:2305.04091
  • Shinn et al., "Reflexion: Language Agents with Verbal Reinforcement Learning" (2023) — Introduces reflection with episodic memory for self-improvement. arXiv:2303.11366
  • LangGraph Documentation — A framework for building agentic graphs with explicit state management. LangGraph docs
  • AutoGen by Microsoft — A multi-agent conversation framework implementing delegation patterns. GitHub repo
  • "The Abstraction Trap" (Hacker News discussion) — Community perspective on over-engineering agent systems. HN thread

Key Takeaways

  • ReAct (Thought-Action-Observation) is the simplest agentic pattern. Start here.
  • Plan-and-Execute separates strategic planning from tactical execution, enabling longer task horizons and checkpointing.
  • Reflection adds a self-critique loop that improves output quality at the cost of additional LLM calls. Cap your revision count.
  • Delegation splits work across specialist agents. The hard problem is shared state, not routing.
  • Memory (short-term, long-term, episodic) is the connective tissue that makes all patterns work across sessions and across agents. Without it, every agent invocation starts from scratch.
  • Start simple. Add complexity only when you have measured evidence that it helps. Every layer you add is a layer you have to debug.
