WonderLab

Posted on May 21

Agent Series (1): What Is an Agent — It's Not Just an LLM That Can Call Tools

#agents #llm #ai #langchain

You Think You're Using an Agent. You're Not.

In 2023, "AI Agent" became a buzzword overnight. Every company claimed they built an Agent. Every product slapped the Agent label on it.

But ask them: What's the fundamental difference between your Agent and a regular LLM call?

Most people go quiet for three seconds, then say: "...it can call tools."

Is that wrong? No. But it's missing the point. It's like answering "what's the difference between a car and a bicycle" with "a car has four wheels" — technically correct, but you forgot to mention the engine.

This article has one goal: help you understand what an Agent actually is — and why it's fundamentally different from an LLM or a Chatbot. Get this right, and you'll make better technical decisions instead of wrapping an LLM API call and calling it "our Agent system."

Start With a Scenario

Say you want to build an AI tool that analyzes competitors for users. The user types a company name, and the tool generates a competitive analysis report.

Option A: Direct LLM call

User input: Analyze Notion's competitors
↓
LLM generates report directly
↓
Output (based on training data, potentially outdated)

Option B: Chatbot

User input: Analyze Notion's competitors
↓
LLM generates reply, remembers conversation history
User follow-up: Focus on pricing strategy
↓
LLM continues with context
↓
Multi-turn conversation, still based on training data

Option C: Agent

User input: Analyze Notion's competitors
↓
Agent thinks: I need fresh data, let me search first
↓
Calls search tool → gets latest competitor info
↓
Agent thinks: I should compare pricing, let me calculate
↓
Calls calculation tool → gets result
↓
Agent thinks: I have enough information now
↓
Outputs report (grounded in real-time data, with sources)

See the difference? An Agent actively thinks "what do I need to do" and autonomously decides the next action. That's the core — not whether it can call tools, but who decides which tool to call and when.

Three Concepts, Three Levels

LLM: A "Brain" with Language Ability

A Large Language Model is fundamentally a function:

Input: text (prompt)
Output: predicted next token (repeated until done)

Its capabilities come from statistical patterns learned from massive amounts of text. It understands language, it can reason — but it has no memory, no perception, no action capability. Every call is stateless. It has no idea what you talked about last time.
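To make "predicted next token, repeated until done" concrete, here is a toy sketch. The lookup table NEXT_TOKEN and the functions predict_next and generate are made up for illustration. A real LLM scores a whole vocabulary with a neural network and conditions on the full prompt, but the outer loop has roughly this shape.

```python
# Toy stand-in for a model: maps the last token to the single most likely next token.
# (Hypothetical data; a real LLM looks at the entire prompt, not just the last token.)
NEXT_TOKEN = {
    "<start>": "agents",
    "agents": "plan",
    "plan": "and",
    "and": "act",
    "act": "<end>",
}


def predict_next(tokens: list[str]) -> str:
    """Return the predicted next token for the text so far."""
    last = tokens[-1] if tokens else "<start>"
    return NEXT_TOKEN.get(last, "<end>")


def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    """Call predict_next repeatedly until an end marker or the length cap."""
    tokens = list(prompt)  # copy: nothing is stored between calls
    for _ in range(max_tokens):
        token = predict_next(tokens)
        if token == "<end>":
            break
        tokens.append(token)
    return tokens


print(generate(["<start>"]))
print(generate(["<start>", "agents"]))  # a fresh call; it knows nothing about the first one
```

Each call starts from the prompt it is given and nothing else, which is the statelessness described above.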

A standalone LLM is like a brilliant scholar who only answers questions: deeply knowledgeable, but locked in a room with no windows, unaware of what's happening outside, unable to proactively do anything for you.

Chatbot: An LLM with Memory

Chatbot = LLM + conversation history management.

It solves one simple problem: making the LLM "remember" what was said in this conversation. The implementation is also simple — prepend conversation history to every prompt:

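# Core logic of a Chatbot (assumes a LangChain chat model; any chat API works the same way)
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-sonnet-4-6")
messages = []  # the conversation history is the Chatbot's only memory
while True:
    user_input = input("You: ")
    if not user_input:  # an empty line ends the chat
        break
    messages.append({"role": "user", "content": user_input})
    response = llm.invoke(messages)  # send the full history to the LLM
    # Store the reply text, not the message object, so the history stays plain dicts
    messages.append({"role": "assistant", "content": response.content})
    print(response.content)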

The limitation of a Chatbot: it can converse, but it can't act. It can't proactively look up information, call APIs, or run code — it can only answer based on what it already knows.

If the LLM is the brilliant scholar, the Chatbot is that scholar with a phone — you can finally have a conversation, but they're still in their room.

Agent: An Autonomous Actor

An Agent adds two critical capabilities on top of a Chatbot: tool use and an autonomous decision-making loop.

But here's the key point that's often misunderstood: the tools themselves aren't what makes something an Agent. What matters is who decides which tool to use and when.

Chatbot with tools = you tell it "check the weather," it calls the weather API
Agent = it decides on its own that "to answer this question, I need to check the weather," then proactively calls it

This is the difference between passive response and active planning.

The Four Elements of an Agent

The clearest way to understand an Agent is to break it into four components. The split loosely follows how cognitive science describes intelligent behavior (perceive, remember, reason, act), and it matches how most engineering write-ups describe Agents today.

┌────────────────────────────────────────────────────────────────┐
│                             Agent                              │
│                                                                │
│  ┌────────────────┐   ┌────────────────┐   ┌────────────────┐  │
│  │   Perception   │   │     Memory     │   │     Action     │  │
│  │                │   │                │   │                │  │
│  │ · User msgs    │   │ · Chat hist    │   │ · Call tools   │  │
│  │ · Tool results │   │ · Tool results │   │ · Run code     │  │
│  │ · Environment  │   │ · External KB  │   │ · Call APIs    │  │
│  └────────┬───────┘   └────────┬───────┘   └────────▲───────┘  │
│           │                    │                    │          │
│           └──────────┬─────────┘                    │          │
│                      ▼                              │          │
│              ┌────────────────┐                     │          │
│              │   Reasoning    │─────────────────────┘          │
│              │                │                                │
│              │ · Plan steps   │                                │
│              │ · Choose tool  │                                │
│              │ · Decide done  │                                │
│              └────────────────┘                                │
└────────────────────────────────────────────────────────────────┘

Perception: What the Agent can "see." At minimum, user input. More advanced: tool return values, database query results, screenshots, file contents. Perception defines the Agent's awareness — what it can't see, it can't act on.

Memory: What the Agent can "remember." This operates at several levels: current conversation history (short-term memory), past experiences stored in vector databases (long-term memory), and static external knowledge bases (semantic memory). We'll dedicate a full article to memory systems later in this series.

Reasoning: The Agent's "brain," and the most essential difference from a Chatbot. The LLM here acts as a controller, not a "question answerer." Its job: decompose the task, plan the steps, choose which tool to use next, decide when the task is complete.

Action: What the Agent can "do." Tool calls are the most common action — search, query a database, send an email, execute code. The range of actions defines the Agent's capability boundary — more tools means more tasks it can handle, but also higher risk of things going wrong (this is what Harness Engineering, a later topic, addresses).
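One way to see how the four elements fit together is a small class with one method per element. This is only a sketch: MiniAgent, its method names, and the rule inside reason are hypothetical, and a real Agent would ask an LLM in reason instead of checking a string prefix.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class MiniAgent:
    tools: dict[str, Callable[[str], str]]           # Action: what it can do
    memory: list[str] = field(default_factory=list)  # Memory: what it remembers

    def perceive(self, observation: str) -> None:
        """Perception: whatever the agent sees is appended to memory."""
        self.memory.append(observation)

    def reason(self) -> tuple[str, str]:
        """Reasoning: pick the next step from the latest memory entry.
        Hypothetical rule standing in for an LLM call."""
        latest = self.memory[-1]
        if latest.startswith("user:"):
            return "lookup", latest.removeprefix("user:").strip()
        return "finish", latest.removeprefix("tool:").strip()

    def act(self, tool_name: str, argument: str) -> str:
        """Action: run the chosen tool and return its result."""
        return self.tools[tool_name](argument)

    def run(self, user_message: str, max_steps: int = 5) -> str:
        self.perceive(f"user: {user_message}")
        for _ in range(max_steps):
            step, argument = self.reason()
            if step == "finish":
                return argument
            self.perceive(f"tool: {self.act(step, argument)}")
        return "stopped: step limit reached"


# The "lookup" tool is a stub; a real one would query a search API or database.
agent = MiniAgent(tools={"lookup": lambda query: f"notes about {query}"})
print(agent.run("Notion pricing"))
print(agent.memory)
```

Note that the tool result re-enters through perceive, so it shows up under both Perception and Memory, as in the diagram.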

These four elements are the foundation for understanding Agent architecture, and the central thread running through the rest of this series:

Perception + Memory → Article 6: Memory Management
Reasoning → Article 2: The ReAct Paradigm, Article 3: Plan-and-Solve
Action → Article 4: Tool Calling, Article 5: Intent Recognition

Two Agent Paradigms: Assembly Line vs. Expedition Guide

Real-world Agent systems break into two camps based on who controls the execution flow:

Workflow-Driven Agent

Representative tools: Dify, n8n, Coze, Zapier AI

Core idea: The developer draws the flowchart; the LLM is one node in it.

Flowchart (defined by developer in advance):
Receive user question
    ↓
[LLM node] Classify question type
    ↓
If "billing question"  → [Tool node] Query billing system
If "complaint"         → [Tool node] Create support ticket
    ↓
[LLM node] Generate final reply
    ↓
Send to user

The execution path is pre-designed by the developer. The LLM handles natural language understanding and generation, but the "what happens next" logic is hardcoded in the flowchart.
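The flowchart above can be sketched as plain code. Everything here is a stand-in: classify and generate_reply would be LLM calls in a real system, query_billing and create_ticket would hit real backends, and the keyword rules are invented for the example. The part that matters is ROUTES, which the developer writes by hand.

```python
def classify(question: str) -> str:
    """[LLM node] stand-in: a real workflow would ask an LLM for the label."""
    text = question.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "complain" in text or "unhappy" in text:
        return "complaint"
    return "other"


def query_billing(question: str) -> str:
    """[Tool node] stand-in for a billing system lookup."""
    return "billing record found"


def create_ticket(question: str) -> str:
    """[Tool node] stand-in for a ticketing system call."""
    return "support ticket created"


# The routing table is fixed by the developer; the LLM never changes it.
ROUTES = {"billing": query_billing, "complaint": create_ticket}


def generate_reply(label: str, tool_result: str) -> str:
    """[LLM node] stand-in: a real workflow would have an LLM phrase the reply."""
    return f"[{label}] {tool_result}"


def handle(question: str) -> str:
    label = classify(question)
    tool = ROUTES.get(label)
    tool_result = tool(question) if tool else "no tool needed"
    return generate_reply(label, tool_result)


print(handle("Why is there an extra charge on my invoice?"))
print(handle("I want to complain about support"))
```

Every path through handle can be listed and tested up front, which is where the predictability comes from.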

Strengths:

Behavior is predictable; every path can be fully tested before launch
Easy to debug when something goes wrong (the broken node is obvious)
Doesn't require the LLM to understand complex task planning

Best for:

Customer service bots (question types are fixed, processes are known)
Approval flow automation (steps are fixed, conditions are clear)
Form processing, data ETL (structured, predictable)

AI Native Agent

Representative frameworks: LangGraph, AutoGen, CrewAI

Core idea: The LLM is the control center and decides what to do.

User question arrives
    ↓
[LLM reasoning]: I need current data, should search first
    ↓
[Calls search tool] → results returned
    ↓
[LLM reasoning]: A number in the results needs verification
    ↓
[Calls calculation tool] → result returned
    ↓
[LLM reasoning]: I have enough to answer now
    ↓
Final answer output

Every "what to do next" step is dynamically decided by the LLM at runtime. Nobody hardcoded the flow. This is the essence of an AI Native Agent: the LLM isn't a tool; the LLM is the conductor.
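The loop behind that trace can be written by hand in a few lines. In this sketch, scripted_controller stands in for the LLM so the example runs without an API key; it, the canned search result, and the tiny calculate tool are all assumptions made for illustration. In contrast to a hardcoded flowchart, no code outside the controller says which tool comes next.

```python
from typing import Callable


def search(query: str) -> str:
    """Stub search tool with a canned result."""
    return "Plan A costs 8 per seat, plan B costs 10 per seat"


def calculate(expression: str) -> str:
    """Stub calculator that only understands 'a-b' with integers."""
    left, right = expression.split("-")
    return str(int(left) - int(right))


TOOLS: dict[str, Callable[[str], str]] = {"search": search, "calculate": calculate}


def scripted_controller(history: list[str]) -> tuple[str, str]:
    """Stand-in for the LLM: reads the history so far, returns (action, input)."""
    if len(history) == 1:
        return "search", history[0]
    if history[-1].startswith("search ->"):
        return "calculate", "10-8"
    return "finish", f"The price gap is {history[-1].split('-> ')[1]} per seat"


def run_agent(question: str, controller, max_steps: int = 5) -> list[str]:
    history = [question]
    for _ in range(max_steps):  # cap the loop so a confused controller cannot run forever
        action, action_input = controller(history)
        if action == "finish":
            history.append(f"finish -> {action_input}")
            return history
        observation = TOOLS[action](action_input)
        history.append(f"{action} -> {observation}")
    history.append("finish -> step limit reached")
    return history


for line in run_agent("Compare plan A and plan B pricing", scripted_controller):
    print(line)
```

Swapping scripted_controller for a function that prompts a real model is what turns this into an actual Agent; run_agent itself would not need to change.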

Strengths:

Handles open-ended tasks with unclear boundaries
Adapts strategy based on intermediate results
Suited for problems requiring multi-step reasoning

Best for:

Open-ended research (user questions are diverse, impossible to enumerate)
Automated bug fixing (requires dynamic decisions based on code analysis)
Complex data analysis (needs multiple rounds of retrieval and computation)

An Analogy to Remember

Imagine planning a trip:

Workflow-Driven Agent = high-speed rail. Fixed tracks, fixed stops, fixed departure times. Highly efficient, never gets lost — but can only go where the rails go.

AI Native Agent = an experienced travel guide. You say "I want somewhere with historical character," and they ask a few questions, check reviews in real time, adjust the itinerary based on today's weather, and handle "the attraction is temporarily closed" on the fly. Flexible — but they might also take you on a detour.

When to Use an Agent vs. a Plain LLM Call

This is the most important engineering judgment you'll make — and the most common place for over-engineering.

The trap many fall into: Agent sounds sophisticated, so people reach for it regardless of the problem. But Agents have costs — longer response times, higher token consumption, more complex debugging.

Use this decision tree:

Does your task need an Agent?
│
├─ Are the task steps fixed and enumerable?
│   └─ Yes → use LLM + fixed Prompt, or Workflow-Driven Agent
│
├─ Does the task only need a single LLM call (no tools)?
│   └─ Yes → call the LLM API directly, no Agent needed
│
├─ Does the task need to decide the next step based on intermediate results?
│   └─ Yes → needs an Agent
│
├─ Does the task have more than 3 interdependent steps?
│   └─ Yes → needs an Agent
│
└─ Does the task need to handle situations you can't predict in advance?
    └─ Yes → needs an Agent (and specifically AI Native Agent)
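The questions in the tree can also be written as a small function, which makes the order of the checks explicit. The Task fields and the recommend function are hypothetical names, and the check order (simplest option first, first match wins) is an assumption of this sketch, not something the tree itself prescribes.

```python
from dataclasses import dataclass


@dataclass
class Task:
    single_llm_call: bool = False                  # one call, no tools
    steps_fixed: bool = False                      # steps are fixed and enumerable
    depends_on_intermediate_results: bool = False  # next step depends on earlier output
    interdependent_steps: int = 1
    unpredictable: bool = False                    # situations you cannot list in advance


def recommend(task: Task) -> str:
    """Walk the questions from cheapest option to most autonomous one."""
    if task.single_llm_call:
        return "Direct LLM call"
    if task.steps_fixed:
        return "LLM + fixed prompt, or Workflow-Driven Agent"
    if task.unpredictable:
        return "AI Native Agent"
    if task.depends_on_intermediate_results or task.interdependent_steps > 3:
        return "Agent"
    return "Direct LLM call"  # nothing calls for more, so stay simple


print(recommend(Task(single_llm_call=True, steps_fixed=True)))
print(recommend(Task(steps_fixed=True, interdependent_steps=4)))
print(recommend(Task(unpredictable=True, depends_on_intermediate_results=True)))
```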

Real examples:

Scenario                     | Recommended Approach  | Reason
-----------------------------+-----------------------+-----------------------------------------
Article summarization        | Direct LLM call       | Single call, fixed prompt
FAQ chatbot                  | Chatbot               | Multi-turn needed, no tools required
Customer service routing     | Workflow-Driven Agent | Fixed flow, enumerable cases
Automated bug analysis & fix | AI Native Agent       | Dynamic decisions based on code analysis
Competitive research report  | AI Native Agent       | Open-ended, needs multi-round search
Code review                  | AI Native Agent       | Dynamic, depends on code structure

Don't use an Agent just because you can

If your task can be solved with a well-crafted Prompt, use the Prompt. The added complexity of an Agent (harder to debug, higher latency, higher cost) is only worth it when the task genuinely requires dynamic decision-making.

Anthropic's Building Effective Agents guidance makes the same point: look for the simplest solution that works and add complexity only when it demonstrably pays off, which may mean not building an agentic system at all.

What a Minimal AI Native Agent Looks Like

Enough theory — here's real code. Below is a minimal ReAct Agent built with LangGraph (this is the most foundational AI Native Agent paradigm; the next article covers it in depth):

# Dependencies: pip install langchain-anthropic langgraph
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

# 1. Define tools (the Agent's "hands")
@tool
def search_web(query: str) -> str:
    """Search the web for current information"""
    # In real use, connect to a real search API (e.g., Tavily)
    return f"Search results: latest information about '{query}'..."

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression"""
    try:
        result = eval(expression)  # Note: don't use eval in production
        return str(result)
    except Exception as e:
        return f"Calculation error: {e}"

# 2. Create the Agent (LLM is the control center)
llm = ChatAnthropic(model="claude-sonnet-4-6")
tools = [search_web, calculate]

agent = create_react_agent(llm, tools)

# 3. Run
result = agent.invoke({
    "messages": [("user", "What is Apple's current market cap? How much more is that than $1 trillion?")]
})
print(result["messages"][-1].content)

Running this code with a real search backend behind search_web, the Agent will typically:

Decide it needs to search for Apple's market cap
Call search_web
See the result, decide it needs to compute the difference
Call calculate
Combine the results into a final answer

Nobody told it to search first and then calculate. It planned that on its own. That's how an AI Native Agent works.

Notes on the code above

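eval(expression) executes arbitrary Python, so it is unsafe on model-generated or user-supplied input; replace it with an evaluator that only accepts arithmetic (for example, an ast-based whitelist), and check that any math library you swap in, such as numexpr, is documented as safe for untrusted input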
A real search tool requires connecting to a search API like Tavily or SerpAPI
The model claude-sonnet-4-6 is the recommended version as of this article's writing (May 2026); adjust as needed
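For the eval note, one possible replacement is to parse the expression with Python's ast module and allow only numbers and basic arithmetic. This is a sketch, not a hardened sandbox: safe_calculate and the _OPS table are names chosen for this example, and exponentiation is left out on purpose because very large powers can hang the process.

```python
import ast
import operator

# Whitelisted operators; anything else (calls, names, attributes) is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def _evaluate(node: ast.AST):
    """Recursively evaluate a parsed expression, allowing only arithmetic."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if (isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.operand))
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def safe_calculate(expression: str) -> str:
    """Drop-in body for the calculate tool: same string-in, string-out contract."""
    try:
        return str(_evaluate(ast.parse(expression, mode="eval")))
    except (SyntaxError, ValueError, ZeroDivisionError, OverflowError) as e:
        return f"Calculation error: {e}"


print(safe_calculate("2 + 3 * 4"))
print(safe_calculate("__import__('os').getcwd()"))
```

The second call is rejected because a function call is not on the whitelist, whereas eval would have run it.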

How to Explain This in an Interview

Common interview question: Is your system an Agent or a Workflow? What's the difference?

Many candidates stumble here because they've never seriously considered which one they actually built.

A clear response framework:

"Our system uses an AI Native Agent architecture, and the core distinction is who controls the execution flow.

In a Workflow-Driven approach, the developer pre-defines all possible paths, and the LLM is just one processing node — it's more predictable and well-suited for fixed-step scenarios.

We chose AI Native Agent because our tasks (like automated bug analysis) have unclear boundaries — the code might span multiple modules, and we need to dynamically decide what to retrieve next based on each intermediate analysis result. A Workflow-Driven approach can't enumerate all possible code scenarios.

Of course, the more autonomous the Agent, the higher the risk. That's why we added execution boundary controls (Harness Engineering) to ensure it never performs operations beyond its authorized scope."

The key to this answer: don't just say "I used an Agent." Explain why you chose it and show that you're aware of the trade-offs.

Summary

Three things from this article:

The hierarchy of LLM, Chatbot, and Agent: LLM is the brain, Chatbot is the brain with memory, Agent is a complete system that can autonomously plan and act. The core difference isn't "can it call tools" — it's "who decides when to call which tool."
The four elements of an Agent: Perception (what it sees), Memory (what it remembers), Reasoning (what it plans), Action (what it executes). The LLM plays the role of "conductor," not "executor."
The selection logic between the two paradigms: Workflow-Driven suits fixed, predictable tasks; AI Native Agent suits open-ended tasks requiring dynamic decision-making. Don't use an Agent because it sounds impressive — use the right tool for the job.

Next up: Agent Series Article 2 — ReAct: The Most Important Reasoning Paradigm for Agents. We'll dig into the Thought → Action → Observation loop, explore "what the Agent is thinking," and explain why Chain-of-Thought alone isn't enough.

References

hello-agents Open Tutorial (Chapter 1 and Chapter 4)
Anthropic, Building Effective Agents, 2024
OpenAI, A Practical Guide to Building Agents, 2025
LangGraph Documentation: langchain-ai.github.io/langgraph

This is the first article in the Agent Engineering series. If you're just starting out with Agents, start here and read in order. Questions or feedback? Leave a comment below.
