Leo Han

Posted on Jun 9


#agents #ai #architecture #llm

AI Agents: Concepts, Principles, and Patterns

What Is an AI Agent?

In 2025–2026, AI Agents have become a dominant paradigm for building applications on top of large language models. Simply put, an Agent is an AI system that autonomously perceives its environment, reasons about next steps, takes actions, and adjusts based on feedback. The fundamental difference from a traditional chatbot is this: an Agent doesn't just answer questions; it uses tools, executes multi-step tasks, and corrects itself when things go wrong.

If a large language model (LLM) is the "brain," then an Agent is that brain equipped with "hands and feet" — it reads and writes files, runs terminal commands, searches the web, calls APIs, and transforms the model's reasoning capabilities into real-world actions.

The Core Principle: The ReAct Paradigm

The conceptual foundation of Agent operation comes from a landmark paper: ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022).

The paper identified a critical problem: before ReAct, the LLM's reasoning abilities (e.g., Chain-of-Thought prompting) and acting abilities (e.g., generating action plans) were studied as two separate topics. Reasoning happened only "in the head," and acting happened only "externally," with no synergy between them.

ReAct's core contribution is interleaving reasoning and acting in a looping fashion. Specifically:

Reasoning helps the model induce, track, and update action plans, and handle exceptions
Acting allows the model to interface with external resources (knowledge bases, code environments, web pages, etc.) and gather additional information

This synergy pays off on benchmarks. On question answering (HotpotQA) and fact verification (FEVER), ReAct interacts with a simple Wikipedia API to reduce the hallucination and error propagation seen in pure Chain-of-Thought reasoning. On the interactive decision-making benchmark ALFWorld, the paper reports that ReAct outperforms imitation and reinforcement learning baselines by an absolute success rate of 34%.

The Agent Work Loop

A typical Agent operates according to the following cycle:

Task
  ↓
Thought: Analyze the current state, decide what to do next
  ↓
Action: Invoke a tool to perform an operation
  ↓
Observation: Receive the tool's output
  ↓
(Repeat until a final answer can be given)
  ↓
Final Answer

This loop may look simple, but it gives Agents tremendous adaptability. If an action doesn't produce the expected result, the Agent can adjust its strategy in the next thought. In essence, it spends extra reasoning at runtime to gain robustness.
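The cycle can be sketched in a few lines of Python. This is a minimal illustration, not a production agent: the "model" is a scripted stand-in that replays fixed replies, `get_height` is a hypothetical tool returning canned data, and the `Action:` / `Final Answer:` line format is an assumption chosen to mirror the trace format used in this article.

```python
import re
from typing import Callable, Dict, List

# Hypothetical tool: returns canned data instead of querying a real source.
def get_height(building: str) -> str:
    heights = {"Empire State Building": "Roof height 381 meters, antenna tip 443 meters"}
    return heights.get(building, f"No height data for {building}")

TOOLS: Dict[str, Callable[[str], str]] = {"get_height": get_height}

# Stand-in for an LLM: replays a fixed script, one reply per call.
def make_scripted_model(replies: List[str]) -> Callable[[str], str]:
    remaining = iter(replies)
    return lambda transcript: next(remaining)

ACTION_RE = re.compile(r'Action: (\w+)\("([^"]*)"\)')
FINAL_RE = re.compile(r"Final Answer: (.*)", re.DOTALL)

def run_agent(task: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        reply = model(transcript)  # a Thought plus an Action or a Final Answer
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action is None:
            transcript += "Observation: no valid action found\n"
            continue
        name, arg = action.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"Unknown tool: {name}"
        # The observation is appended so the next thought can react to it.
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached"

model = make_scripted_model([
    'Thought: I need to look up the height.\nAction: get_height("Empire State Building")',
    "Thought: I have the data.\nFinal Answer: The roof height is 381 meters.",
])
answer = run_agent("How tall is the Empire State Building?", model)
print(answer)  # The roof height is 381 meters.
```

The `max_steps` bound is what keeps a model that never produces a final answer from looping forever.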

A Question-Answering Example

Task: How tall is the Empire State Building?

Thought: I need to look up the height of the Empire State Building.
Action: get_height("Empire State Building")
Observation: Roof height 381 meters, antenna tip 443 meters

Thought: I now have the height data and can answer the user.
Final Answer: The Empire State Building has a roof height of 381 meters
and a total height of 443 meters including the antenna.

A Cooking Task Example

Task: I need to make tomato scrambled eggs

Thought: First, let me find a recipe for tomato scrambled eggs.
Action: find_recipe(dish="tomato scrambled eggs")
Observation: Needs: 3 eggs, 2 tomatoes, salt, sugar, oil, scallions

Thought: I should check if these ingredients are in the fridge.
Action: check_fridge(item="eggs")
Observation: Eggs are in the fridge

Thought: Eggs are confirmed, but the other ingredients still need to be checked.
Final Answer: Key ingredients confirmed: eggs in stock.
Need to verify tomatoes...

The Agent Tool System

An Agent's capabilities are bounded by the tools it can use. In the demonstrations from the source video, a typical Agent toolset includes:

read_file(file_path) — read the contents of a specified file
write_to_file(filename, content) — create or overwrite a file
run_terminal_command(command) — execute a command in the terminal
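As a rough sketch, the two file tools could be backed by plain Python functions like the ones below. The function bodies and return messages are assumptions for illustration; the video does not show how the tools are implemented. The demo writes into a temporary directory so it does not touch real files.

```python
import tempfile
from pathlib import Path

def read_file(file_path: str) -> str:
    """Return the contents of a file."""
    return Path(file_path).read_text(encoding="utf-8")

def write_to_file(filename: str, content: str) -> str:
    """Create or overwrite a file and report what was written."""
    Path(filename).write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {filename}"

# Chaining the two small tools: write, read back, modify, write again.
with tempfile.TemporaryDirectory() as workdir:
    target = str(Path(workdir) / "test.txt")
    write_to_file(target, "a\nb\nc")
    updated = read_file(target) + "\nd"
    report = write_to_file(target, updated)
    final_text = read_file(target)

print(final_text.splitlines())  # ['a', 'b', 'c', 'd']
```

Because each function does one thing and returns a short text result, a caller can combine them freely, and every return value can be fed back to the model as an observation.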

These tools are defined in the System Prompt using XML format, and the Agent issues structured tool calls with XML tags:

<action>write_to_file("test.txt", "a\nb\nc")</action>

By convention, newlines inside string arguments are written as the escape sequence \n, so a multi-line parameter stays on a single line and reaches the tool intact.
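One possible way to turn such a tag into a tool name and arguments is to parse the call with Python's `ast` module, which reads the text without executing it. This is a sketch under the assumption that tool calls use Python-style call syntax with literal arguments; `parse_action` is a hypothetical helper, not something shown in the video.

```python
import ast
import re
from typing import Any, Dict, List, Tuple

ACTION_TAG = re.compile(r"<action>(.*?)</action>", re.DOTALL)

def parse_action(model_output: str) -> Tuple[str, List[Any], Dict[str, Any]]:
    """Extract the tool name and literal arguments from an <action> tag."""
    match = ACTION_TAG.search(model_output)
    if match is None:
        raise ValueError("no <action> tag found")
    # Parse the call as an expression; nothing is executed here.
    node = ast.parse(match.group(1).strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError("action must be a simple call such as tool(arg)")
    if any(keyword.arg is None for keyword in node.keywords):
        raise ValueError("**kwargs unpacking is not supported")
    # literal_eval accepts only literals, so nested calls are rejected.
    args = [ast.literal_eval(arg) for arg in node.args]
    kwargs = {keyword.arg: ast.literal_eval(keyword.value) for keyword in node.keywords}
    return node.func.id, args, kwargs

# The raw string keeps \n as two characters, as the model would emit it.
name, args, kwargs = parse_action(r'<action>write_to_file("test.txt", "a\nb\nc")</action>')
print(name, args)  # write_to_file ['test.txt', 'a\nb\nc']
```

The escaped `\n` matters here: the parser decodes it into real newlines, whereas a raw line break inside the quoted string would be a syntax error.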

Three Agent Construction Patterns

The video demonstrated three mainstream Agent construction patterns through real-world examples:

1. General-Purpose Agent Platform: Manus

Manus is a general-purpose AI Agent capable of autonomously completing complex, multi-step research tasks. The video showed it executing the task "iPhone 15 Pro Max vs Galaxy S24 Ultra vs Pixel 8 Pro comparison report" in its entirety:

Autonomously searched for specifications and performance data for all three phones
Collected visual assets and reference images
Generated a comprehensive comparison website
Produced a structured report (including executive summary, detailed comparison tables, etc.)

Manus's defining characteristic is high autonomy — the user only needs to provide a task description, and the Agent independently plans, searches, organizes, and outputs, with no human intervention required.

2. Code Agent: Claude

Acting as a code agent, Claude built a Snake game with HTML, CSS, and JavaScript:

Received the task: "Write a Snake game using HTML, CSS, and JS"
Planned the file structure: index.html, style.css, script.js
Created files one by one and wrote the code
Delivered a runnable game

Claude's pattern illustrates how Agents can execute well-scoped coding tasks in a controlled environment: each step has clear inputs and outputs, and errors can be detected and corrected promptly.

3. Open-Instruction Agent: DeepSeek

DeepSeek's demonstration focused more on following extremely detailed system instructions. The video showed its Agent-mode prompt structure:

Strictly defined XML tags for <thought>, <action>, <observation>, <final_answer>
Specified the operating system environment (macOS 15.5) and working directory
Provided complete documentation of tool definitions and calling formats
Also executed the Snake game construction task

DeepSeek's case illustrates that, through meticulous prompt engineering, even a general-purpose chat model can be made to follow a specific Agent protocol.
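A prompt with this structure could be assembled programmatically, roughly as follows. The exact wording of the instructions, the `ToolSpec` class, the `build_system_prompt` function, and the working directory are all assumptions for illustration; only the tag names, the tool signatures, and the macOS 15.5 environment come from the video.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ToolSpec:
    signature: str    # e.g. "read_file(file_path)"
    description: str

def build_system_prompt(tools: List[ToolSpec], os_name: str, working_dir: str) -> str:
    """Assemble a ReAct-style system prompt; the wording is illustrative only."""
    tool_docs = "\n".join(f"- {tool.signature}: {tool.description}" for tool in tools)
    return "\n".join([
        "You are an agent that solves tasks step by step.",
        "Reply with <thought>...</thought> followed by exactly one of",
        "<action>tool_name(arguments)</action> or <final_answer>...</final_answer>.",
        "After each action, wait for an <observation>...</observation> before continuing.",
        "",
        f"Operating system: {os_name}",
        f"Working directory: {working_dir}",
        "",
        "Available tools:",
        tool_docs,
    ])

prompt = build_system_prompt(
    tools=[
        ToolSpec("read_file(file_path)", "read the contents of a specified file"),
        ToolSpec("write_to_file(filename, content)", "create or overwrite a file"),
        ToolSpec("run_terminal_command(command)", "execute a command in the terminal"),
    ],
    os_name="macOS 15.5",
    working_dir="/tmp/snake",  # hypothetical path, not taken from the video
)
print(prompt)
```

Generating the tool documentation from one list of specs keeps the prompt and the actual tool registry from drifting apart.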

Key Design Decisions for Building Agents

Drawing from these cases, building an effective Agent involves several critical decisions:

1. Prompt structure design. The Agent's system prompt must precisely describe its role, available tools, output format, and reasoning steps. XML tags may seem tedious, but they provide a structured "grammar" for the model's output, reducing the probability of parsing failures.

2. Tool interface granularity. Tools should be sufficiently atomic (e.g., read_file, write_to_file) so the Agent can flexibly compose them, rather than offering overly monolithic "do-everything" functions.

3. Quality of observation feedback. The Observation returned by tools is the Agent's sole basis for adjusting its next strategy. If the return information is too terse or ambiguous, the Agent's reasoning chain breaks.

4. Termination conditions. The Agent needs a clear stopping signal (<final_answer>), or it may fall into an endless "think-act" loop. In practice, a maximum step limit is typically set as a safety net.

5. Error handling and recovery. The core advantage of the ReAct paradigm is its ability to handle exceptions — when a tool call fails or returns unexpected results, the Agent can reassess the situation in the next thought and try alternative approaches.
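Points 3 and 5 can be sketched together: a tool wrapper that always returns a descriptive observation, even when the call fails, so the next thought has something concrete to reason about. The names `format_observation` and `safe_call`, the 2000-character cap, and the message wording are assumptions for illustration. Note that `shell=True` runs arbitrary commands and would need sandboxing in any real deployment.

```python
import subprocess
from typing import Callable, Dict

MAX_CHARS = 2000  # assumed cap so one noisy command cannot flood the context

def format_observation(returncode: int, stdout: str, stderr: str) -> str:
    """Describe a command result with its status, stdout, and stderr."""
    status = "succeeded" if returncode == 0 else f"failed with exit code {returncode}"
    parts = [f"Command {status}."]
    for label, text in (("stdout", stdout), ("stderr", stderr)):
        text = text.strip()
        if len(text) > MAX_CHARS:
            text = text[:MAX_CHARS] + f"... [truncated, {len(text)} characters total]"
        parts.append(f"{label}: {text if text else '(empty)'}")
    return "\n".join(parts)

def run_terminal_command(command: str, timeout: float = 30.0) -> str:
    try:
        done = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return f"Command timed out after {timeout} seconds."
    return format_observation(done.returncode, done.stdout, done.stderr)

def read_file(file_path: str) -> str:
    with open(file_path, encoding="utf-8") as handle:
        return handle.read()

TOOLS: Dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "run_terminal_command": run_terminal_command,
}

def safe_call(name: str, *args: str) -> str:
    """Run a tool and convert any failure into an observation string."""
    tool = TOOLS.get(name)
    if tool is None:
        return f"Error: unknown tool '{name}'. Available tools: {', '.join(sorted(TOOLS))}"
    try:
        return tool(*args)
    except Exception as exc:  # report the failure instead of crashing the loop
        return f"Error: {name} raised {type(exc).__name__}: {exc}"

print(format_observation(1, "", "file not found"))
print(safe_call("delete_file", "notes.txt"))
```

Listing the available tools in the error message gives the model a direct hint for its next attempt, instead of a bare failure it cannot act on.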

Conclusion

AI Agents represent a critical leap from "language models" to "action models." The ReAct paradigm, by interleaving reasoning and acting, transforms the LLM from a system that can only "speak" into an agent that can "do."

From Manus's autonomous research, to Claude's code generation, to DeepSeek's precise instruction execution, we see three distinct implementation paths for Agents — but they all share the same core philosophy: thinking guides action, and action feeds back into thinking.

As the tool ecosystem matures and model reasoning capabilities advance, Agents are moving from experimental prototypes to production-grade applications. Understanding the ReAct paradigm and mastering Agent construction patterns will become essential skills for engineers in the AI era.

This article is adapted from the video "Agent Concepts, Principles, and Construction Patterns," covering the definition of Agents, the core ideas of the ReAct paper, the Agent work loop, tool system design, and a comparative analysis of three mainstream construction patterns.

DEV Community