AI Agent vs LLM: Why Just Calling the OpenAI API Doesn't Give You an Agent (And What Does)


"We built an agent, we called the API in a loop." No. You built a chatbot with extra steps. Here's the actual difference.

I want to start with a confession: I've made this mistake myself, early on, with enough confidence that I told a client we'd built an "agent" when what we'd actually built was a chatbot that occasionally called a function.

The mistake is understandable because the surface-level pattern looks similar. You call an LLM. You give it a prompt. You get a response. Loop it a few times, add a tool call, and it feels agentic. But there's a real, technical distinction between an LLM wrapper and an actual agent, and it matters because it determines whether your system can handle the messy, multi-step, failure-prone reality of production tasks.

Let's define it precisely, then build both versions so you can see exactly where the line is.

What Makes Something an Agent (Not Just an LLM Call)

An agent has four properties that a simple LLM wrapper doesn't have.

Autonomous tool use: the system decides which tools to call and when, based on reasoning about the current state, not a hardcoded sequence you wrote.

Planning and replanning: the system forms a plan, executes a step, observes the result, and revises its plan based on what it learns. A single API call with a fixed prompt can't do this. It doesn't see its own output and decide what to do next.

Memory across steps: the system maintains state about what it's already tried, what worked, and what didn't, within a single task execution.

Error recovery: when a tool call fails or returns unexpected output, the system adapts rather than crashing or returning a generic failure.

If your system doesn't do at least the first two, you have an LLM wrapper, not an agent. That's not an insult: LLM wrappers are useful and often the right tool. But calling it an agent when it's not creates a mismatch between what you've built and what stakeholders think they're getting.
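To make that rule concrete, here is a minimal sketch of the checklist as code. The `SystemCapabilities` class and its field names are my own illustration, not part of any library; the only logic it encodes is the rule above, that the first two properties are the ones that decide the label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemCapabilities:
    """Hypothetical checklist: which of the four properties a system has."""
    autonomous_tool_use: bool = False
    planning_and_replanning: bool = False
    memory_across_steps: bool = False
    error_recovery: bool = False

    def classify(self) -> str:
        # Rule from the text: without BOTH of the first two properties,
        # the system is an LLM wrapper, whatever else it does.
        if self.autonomous_tool_use and self.planning_and_replanning:
            return "agent"
        return "llm_wrapper"


# A chatbot that keeps conversation history but follows a fixed flow.
chatbot_with_history = SystemCapabilities(memory_across_steps=True)
# A loop where the model picks tools and revises its plan each step.
react_loop = SystemCapabilities(
    autonomous_tool_use=True, planning_and_replanning=True
)

print(chatbot_with_history.classify())
print(react_loop.classify())
```

It's deliberately blunt: memory and error recovery make an agent better, but on their own they don't turn a wrapper into one.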

The LLM Wrapper (What Most People Build First)

import anthropic

client = anthropic.Anthropic()

def simple_llm_call(user_query: str) -> str:
    """
    This is an LLM call. It is not an agent.
    It has no memory, no planning, no tool use,
    no error recovery.
    """
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[{"role": "user", "content": user_query}]
    )
    return response.content[0].text

# This works fine for simple Q&A
result = simple_llm_call("What's the capital of France?")

This is perfectly fine code for the right job: single-turn question answering. The problem starts when people wrap it in a loop and call it agentic:

def so_called_agent_with_a_loop(tasks: list[str]) -> list[str]:
    """
    This is NOT an agent. It's an LLM call 
    executed multiple times. There's no reasoning
    about what to do next based on previous results.
    """
    results = []
    for task in tasks:
        result = simple_llm_call(task)
        results.append(result)
    return results

This is a batch processing loop. It doesn't reason about whether task 2 should change based on the outcome of task 1. It doesn't use tools. It doesn't recover from failure beyond whatever exception handling you bolt on externally. Calling this an agent is where the confusion starts.

The Actual Agent: ReAct Pattern With Tool Use

A common way to build a real agent is the ReAct pattern: Reasoning and Acting in a loop, where each action produces an observation that informs the next reasoning step.

import anthropic
import json

client = anthropic.Anthropic()

tools = [
    {
        "name": "search_database",
        "description": "Search the customer database by criteria",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "send_email",
        "description": "Send an email to a customer",
        "input_schema": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"}
            },
            "required": ["recipient", "subject", "body"]
        }
    }
]

def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Run the tool the model asked for and return its result as a JSON string."""
    if tool_name == "search_database":
        # Real database call would go here
        return json.dumps({"customer_id": 4521, "status": "active", 
                           "last_order": "2026-05-12"})
    elif tool_name == "send_email":
        # Real email sending would go here
        return json.dumps({"sent": True, "message_id": "msg_8821"})
    return json.dumps({"error": "unknown tool"})


def run_agent(goal: str, max_iterations: int = 8) -> str:
    """
    This IS an agent. It reasons, acts, observes, 
    and replans based on what it learns at each step.
    This is the property a simple LLM call doesn't have.
    """

    messages = [{"role": "user", "content": goal}]

    for iteration in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=2048,
            tools=tools,
            messages=messages
        )

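        # The model decides whether to use a tool or finish.
        # This decision happens at every step based on
        # everything that's happened so far. That's planning.
        # Any stop reason other than "tool_use" (for example
        # "end_turn" or "max_tokens") ends the loop here; otherwise
        # the same request would be re-sent unchanged.
        if response.stop_reason != "tool_use":
            final_text = next(
                (b.text for b in response.content if hasattr(b, 'text')),
                "Task completed."
            )
            return final_text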

        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})

            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  [Agent decided to call] {block.name}({block.input})")

                    # Execute and observe — error recovery happens here
                    try:
                        result = execute_tool(block.name, block.input)
                    except Exception as e:
                        result = json.dumps({"error": str(e), 
                                              "recoverable": True})

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # The observation goes back into context.
            # The NEXT reasoning step sees what happened
            # and decides what to do based on it.
            messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached without completing the goal."

Run this with a goal like "Find customer 4521's order status and email them an update if their last order was more than 30 days ago" and watch what happens. The agent calls search_database, observes the result, reasons about whether the date condition is met, and only then decides whether to call send_email. If the database call returned an error, the agent would see that in its context and could try a different search strategy. That's the replanning property.
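If you want to watch the act, observe, decide cycle without an API key, here is an offline sketch of the same control flow. Everything in it is an assumption for illustration: `scripted_policy` is a hard-coded stand-in for the model's decision, the two tools are stubs, and the email address is made up. By this article's own definition the stand-in is not agentic; the point is only to show how each observation feeds the next decision.

```python
from datetime import date


def search_database(query: str) -> dict:
    # Stub: returns the same record the execute_tool() stub returns.
    return {"customer_id": 4521, "status": "active", "last_order": "2026-05-12"}


def send_email(recipient: str, subject: str, body: str) -> dict:
    # Stub: pretends the email was sent.
    return {"sent": True, "message_id": "msg_8821"}


TOOLS = {"search_database": search_database, "send_email": send_email}


def scripted_policy(history: list[dict], today: date) -> dict:
    """Hard-coded stand-in for the model's decision at each step."""
    seen = {step["tool"]: step["result"] for step in history}
    if "search_database" not in seen:
        return {"tool": "search_database", "input": {"query": "id:4521"}}
    last_order = date.fromisoformat(seen["search_database"]["last_order"])
    days = (today - last_order).days
    if days > 30 and "send_email" not in seen:
        return {"tool": "send_email", "input": {
            "recipient": "customer-4521@example.com",  # hypothetical address
            "subject": "Order update",
            "body": f"Your last order was {days} days ago.",
        }}
    return {"done": f"Finished after {len(history)} tool call(s)."}


def run_trace(today: date, max_iterations: int = 8) -> list[str]:
    """Act, observe, decide again; return the tools called, in order."""
    history: list[dict] = []
    for _ in range(max_iterations):
        action = scripted_policy(history, today)
        if "done" in action:
            break
        result = TOOLS[action["tool"]](**action["input"])
        history.append({"tool": action["tool"], "result": result})
    return [step["tool"] for step in history]


print(run_trace(date(2026, 5, 20)))  # 8 days after the last order
print(run_trace(date(2026, 7, 1)))   # 50 days after the last order
```

The first trace stops after the lookup; the second goes on to send the email. The second action depends entirely on what the first one returned, which is the loop structure run_agent hands over to the model.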

This is the part a fixed-sequence script can't do. A script that calls search_database() then unconditionally calls send_email() is not reasoning about the result. It's executing a predetermined sequence that happens to involve an LLM somewhere in it.

The Test: Could You Replace the LLM Call With an If-Statement?

Here's the practical test I use when someone shows me their "agent" architecture.

If you could replace the LLM call with a deterministic if-statement and get the same behaviour, you don't have agentic reasoning. You have a script with an LLM bolted on for the parts that need natural language generation.

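# This is NOT agentic, even though it calls an LLM.
# search_database() and send_email() here are ordinary Python
# helpers (not shown), called in an order fixed by the code.
def fake_agent(customer_id: int) -> None: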
    customer = search_database(f"id:{customer_id}")
    if customer['days_since_order'] > 30:
        email_body = simple_llm_call(
            f"Write a follow-up email for customer {customer_id}"
        )
        send_email(customer['email'], "Update", email_body)

The LLM here is doing text generation, not reasoning about what action to take. The decision logic is all in your Python if statement. This is a perfectly reasonable architecture for plenty of use cases, but it's not an agent. It's a script that uses an LLM as a writing tool.

The real agent version lets the model decide whether to check the order date, whether the result warrants an email, what the email should say, and whether to retry if the email send fails, all through its own reasoning over the conversation history, not through your hardcoded control flow.
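For the model to decide on a retry, it first has to see the failure. Here is a minimal sketch of one way to do that: package every tool outcome, success or failure, as a tool_result block. The `is_error` flag is an optional field the Anthropic Messages API accepts on tool_result blocks; the helper `to_tool_result` and the failing `flaky_send_email` stub are hypothetical names I'm using for illustration.

```python
import json


def to_tool_result(tool_use_id: str, tool_fn, tool_input: dict) -> dict:
    """Run a tool and package the outcome as a tool_result block."""
    try:
        output = tool_fn(**tool_input)
    except Exception as exc:
        # The failure becomes an observation instead of a crash.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": json.dumps({"error": f"{type(exc).__name__}: {exc}"}),
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": json.dumps(output),
    }


def flaky_send_email(recipient: str, subject: str, body: str) -> dict:
    # Hypothetical stub that always fails, to exercise the error path.
    raise ConnectionError("SMTP server unavailable")


failed = to_tool_result(
    "toolu_example",  # placeholder id; real ids come from the tool_use block
    flaky_send_email,
    {"recipient": "a@example.com", "subject": "Update", "body": "Hello"},
)
print(failed["is_error"], failed["content"])
```

Whether to retry, try another tool, or give up is then the model's call on the next iteration, not a branch in your Python.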

Memory: The Property Most Wrapper Implementations Skip

A genuine agent maintains state across the interaction, which means it can reference earlier observations when making later decisions.

class AgentSession:
    """
    Memory across a task execution. This is what 
    separates a stateful agent from a stateless 
    LLM call repeated in a loop.
    """

    def __init__(self):
        self.messages = []
        self.tool_call_history = []
        self.observations = {}

    def add_observation(self, tool_name: str, result: dict):
        self.tool_call_history.append({
            "tool": tool_name,
            "result": result,
            "step": len(self.tool_call_history)
        })
        # Agent can later ask "did I already check this?"
        self.observations[tool_name] = result

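    def already_attempted(self, tool_name: str) -> bool:
        """
        Coarse check: has this tool been called at all
        during this task? It helps stop the agent from
        repeating the same action indefinitely, a common
        failure mode in naive implementations. It does not
        distinguish failed calls from successful ones.
        """
        return tool_name in self.observations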

Without this, a naive agent loop can call the same failing tool repeatedly because it has no memory of having already tried it. This is one of the most common production bugs in early agent implementations: the agent gets stuck in a loop because nothing tells it "you already tried this and it didn't work."
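A check keyed only on the tool name can be too blunt, since calling the same tool again with different input is often exactly the right recovery. Here is a sketch of a finer-grained variant that counts failures per tool and input. `AttemptTracker`, `max_failures`, and the method names are assumptions of mine, not part of the session class above.

```python
import json
from collections import Counter


class AttemptTracker:
    """Hypothetical helper: counts failed calls per (tool, input) pair."""

    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self.failures: Counter = Counter()

    @staticmethod
    def _key(tool_name: str, tool_input: dict) -> str:
        # sort_keys makes the key independent of dict ordering.
        return f"{tool_name}:{json.dumps(tool_input, sort_keys=True)}"

    def record(self, tool_name: str, tool_input: dict, succeeded: bool) -> None:
        if not succeeded:
            self.failures[self._key(tool_name, tool_input)] += 1

    def should_block(self, tool_name: str, tool_input: dict) -> bool:
        """True once this exact call has failed max_failures times."""
        key = self._key(tool_name, tool_input)
        return self.failures[key] >= self.max_failures


tracker = AttemptTracker(max_failures=2)
tracker.record("search_database", {"query": "id:4521"}, succeeded=False)
tracker.record("search_database", {"query": "id:4521"}, succeeded=False)

print(tracker.should_block("search_database", {"query": "id:4521"}))
print(tracker.should_block("search_database", {"query": "email:a@example.com"}))
```

When `should_block` returns True, one option is to skip the call and return an error observation that says so, so the model is told explicitly that this path is exhausted.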

Why This Distinction Actually Matters

This isn't pedantry. The architectural decision has real consequences.

If your task is single-turn (answer this question, summarise this document, classify this text), an LLM wrapper is simpler, cheaper, faster, and easier to debug. Building agent infrastructure for a task that doesn't need planning or tool orchestration is wasted complexity.

If your task requires multiple steps where each step's outcome affects the next decision (for example, processing an application by checking three systems and deciding what to do based on what you find), you need an actual agentic architecture. An LLM wrapper will fail because it can't adapt mid-task to what it discovers.

The full breakdown of AI agent vs LLM architectures, including when each is the right call for specific production use cases, covers the decision framework in more depth than fits here.

What's Next: RAG vs Agentic Architecture

Now that the LLM vs agent distinction is clear, the next architectural decision most teams face is RAG vs agentic AI, and these get confused almost as often as LLM vs agent. RAG solves the problem of grounding LLM responses in your specific documents. Agentic architecture solves the problem of taking multi-step actions across systems. They're different tools for different problems, and the RAG vs agentic AI comparison we published covers exactly where each one fits, including the hybrid pattern where an agent uses RAG as one of its tools.

Published by Dextra Labs | AI Consulting & Enterprise Agent Development