Agents 101: Reasoning, Actions & Autonomy

This post originally appeared on tokenjam.dev/blog. It's part of a 14-post series on agentic AI basics and the surrounding ecosystem.

TL;DR

  • An AI agent uses an LLM to reason about a goal and decide what actions to take, calling tools and observing results until the goal is reached
  • Agents differ fundamentally from chatbots (which don't act) and workflows (which don't decide)
  • The ReAct pattern (reasoning + acting) is the dominant architecture in modern agent systems
  • Agents range from copilots that suggest actions to fully autonomous systems that run unattended for hours
  • Key components: the LLM (reasoning), tools (actions), context/memory (state), and a control loop (orchestration)

What is an AI agent? An AI agent is a system that uses a large language model to make decisions and take actions in pursuit of a goal. It calls tools, observes what they return, and iterates until the goal is reached. A chatbot waits for the next message; an agent plans and executes its own sequence of steps.

Why it matters

The term entered the mainstream in early 2023, when projects like AutoGPT showed that LLMs could direct their own execution. The concept wasn't new. Researchers had been studying goal-directed autonomous systems for decades. What changed was accessibility: capable base models (GPT-4, Claude) and standardized tool-calling APIs made it practical to build a working agent in a few dozen lines of code.
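To make "standardized tool-calling APIs" concrete, here's a minimal sketch of the schema shape those APIs accept. The tool name and fields are hypothetical; Anthropic's API calls the schema key input_schema, while OpenAI's nests an equivalent object under parameters:

```python
# Hypothetical tool definition in the JSON-schema shape modern chat
# APIs accept. The model reads the "description" strings to decide
# when to call the tool and how to fill in its arguments.
get_weather = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "input_schema": {  # Anthropic's key; OpenAI uses "parameters"
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Austin'"}
        },
        "required": ["city"],
    },
}
```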

The word now gets used loosely. Some vendors call a chatbot with a search feature an agent. Others claim that any LLM inference with retrieval is "agentic." This inflation matters. It obscures what's actually new and what's repackaging. Precision helps you know what you're building or evaluating.

Agents represent a shift in how LLMs are deployed. The old model: user asks a question, system returns an answer, conversation ends. Agents invert that. The system receives a goal, decides on sub-goals, gathers information, corrects itself, and iterates without waiting for permission between steps. New architecture. New error handling. New thinking about safety and observability.

Agents vs. chatbots vs. workflows vs. traditional AI

A quick way to distinguish these four categories is to ask: does it use an LLM to decide what to do next? And can it call tools to act on those decisions?

Chatbots use an LLM to generate text. They don't call tools, and they don't pursue goals across steps. A customer-service chatbot answers your question. It doesn't modify your account or call internal APIs unless you ask. Even then, it tends to suggest options or retrieve data rather than decide and act. The LLM's job is to understand and respond.

Workflows call tools and pursue goals. They don't use an LLM to decide which tool to call or how to interpret the result. A workflow might be: fetch customer data, run a validation rule, log an event, send an email. Each step is predefined. Branching is rule-based. The LLM is not in the loop. Workflows are predictable and cheap. They break when the task is ambiguous or open-ended.

Agents combine both. The LLM observes the current state and decides which tool to call next. It adapts and self-corrects as it goes. If a tool call fails, the agent reasons about why and tries something else. The flexibility costs you something. Agents are less predictable, more expensive per inference, and harder to debug. The reward is open-ended tasks, where the path isn't predetermined.
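The split is easy to see in code. A hedged sketch, with stub functions standing in for real systems and call_llm standing in for any tool-calling chat API:

```python
# Workflow: every step and branch is fixed in code. No LLM decides anything.
def refund_workflow(order_id: str) -> str:
    order = fetch_order(order_id)        # step 1, always runs
    if order["total"] > 100:             # rule-based branch, written in advance
        return "escalated to human"
    issue_refund(order)                  # step 2, always runs
    send_email(order["customer_email"])  # step 3, always runs
    return "refunded"

# Agent: the LLM sees the transcript so far and picks the next tool itself.
def agent_turn(goal: str, history: list[dict]) -> dict:
    # call_llm is a stand-in for any chat API with tool calling enabled;
    # it returns either a final answer or a tool call for the loop to execute.
    return call_llm(
        system="Resolve the customer's goal. Use tools as needed.",
        messages=[{"role": "user", "content": goal}] + history,
        tools=[FETCH_ORDER_TOOL, ISSUE_REFUND_TOOL, SEND_EMAIL_TOOL],
    )
```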

Traditional AI/ML systems (classifiers, regressions, recommenders) optimize a fixed function learned from data. They have no LLM, and they don't pursue multi-step goals. They are specialized and efficient. Generalizing to a new task means retraining.

The table below summarizes the differences:

| Aspect | Chatbot | Workflow | Agent | Traditional ML |
| --- | --- | --- | --- | --- |
| Uses LLM to decide next step? | No (generates text) | No (follows rules) | Yes | No |
| Calls tools? | Rarely; usually retrieval only | Yes; predefined sequence | Yes; chosen by LLM | No |
| Pursues multi-step goal? | No (responds to input) | Yes; fixed path | Yes; adaptive path | No |
| Handles ambiguous tasks? | Moderate (can discuss) | Poor (requires rigid structure) | Good (can reason and adapt) | Poor |

The ReAct pattern and core components

Most agents built since 2023 follow a pattern called ReAct (Reasoning and Acting), introduced in Yao et al.'s 2022 paper from Google Research and Princeton. The idea is straightforward. The LLM produces reasoning steps (thinking aloud about what it needs to do) interleaved with actions (tool calls). It observes the result, then reasons further.

A ReAct loop looks like this:

  1. Observation: The agent observes the current state (the original goal, prior tool results, conversation history).
  2. Reasoning: The LLM thinks through the problem: "I need to fetch the user's account, check their history, then decide whether to approve the request."
  3. Action: The agent calls a tool, say fetch_account(user_id).
  4. Observation: The agent receives the result and feeds it back to the LLM.
  5. Loop: The LLM reasons again, decides on the next action, and repeats until it either reaches the goal or determines that the goal isn't achievable.

The pattern works because the reasoning traces make the LLM's decisions interpretable. You can see why it chose an action. They also enable self-correction: if a tool result is unexpected, the LLM can reason about what went wrong.

An agent's core components are:

  1. The LLM (reasoning engine): Decides what action to take based on the goal and current state. The decision-making layer.

  2. Tools (action layer): Functions the agent can call: APIs, database queries, code execution, web searches, file operations. Tools are how the agent affects the world.

  3. Context and memory (state): Everything the agent knows: the original goal, conversation history, prior tool results, and any persistent state it needs. Without good memory management, agents hallucinate and repeat mistakes.

  4. Control loop (orchestration): The code that runs the loop. It calls the LLM, parses the output for tool calls, executes them, and feeds results back. Modern frameworks (Anthropic's Claude SDK, LangChain, LlamaIndex) handle this. You can also implement it from scratch.
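If you do roll it from scratch, the loop is short. Here's a sketch using the Anthropic Python SDK; the tool, prompts, and model ID are illustrative stand-ins, and a real agent would dispatch across many tools:

```python
# A from-scratch ReAct-style control loop with the Anthropic Python SDK.
# The structure is the point: call the model, execute any tool it
# requests, feed the result back, and repeat until it stops asking.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

TOOLS = [{
    "name": "lookup_order",
    "description": "Fetch an order record by ID. Returns JSON, or 'not found'.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

def lookup_order(order_id: str) -> str:
    # Stub standing in for a real API call. Explicit errors ("not found"
    # rather than null) give the LLM something to reason about.
    if order_id == "A1":
        return json.dumps({"order_id": order_id, "status": "shipped"})
    return "not found"

def run_agent(goal: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": goal}]
    for _ in range(max_steps):  # hard budget so a confused agent can't loop forever
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # substitute whichever model you use
            max_tokens=1024,
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            # No tool requested: the model considers the goal answered.
            return "".join(b.text for b in response.content if b.type == "text")
        # Execute every tool call the model requested and feed results back.
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = lookup_order(**block.input)  # single-tool dispatch
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": output})
        messages.append({"role": "user", "content": results})
    return "stopped: step budget exhausted"

print(run_agent("What is the status of order A1?"))
```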

Levels of autonomy

Agents exist on a spectrum. On one end are suggestion-based copilots that nudge you. On the other are autonomous systems that run unattended for hours.

Copilot mode (suggestion): The agent observes what you're doing and suggests the next action. You approve before it executes. Example: Cursor's autocomplete suggests the next line of code; you hit Tab to accept or Escape to reject. The model is doing some reasoning. You stay in control of execution.

Agentic mode (supervised autonomy): The agent makes and executes decisions within a scope you define. You might say "add tests for this file" and the agent writes tests, runs them, and shows you the result, all without asking permission between steps. You can pause or override at any point. Example: Claude Code in an IDE, or an agent working a bounded coding task. The agent is autonomous within the scope, not globally.

Autonomous agent (unattended): The agent pursues a goal with minimal human oversight. You set a goal ("reduce our average response time by 10%") and the agent decides what to measure, what to try, what to roll back, and what to keep. It might run for days, making changes and watching outcomes. Example: an agent managing an experimentation platform, or optimizing an ad-bidding algorithm. These are rare and tend to be domain-specific. The cost of mistakes is too high for general-purpose deployment.
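In code, the difference between the first two modes can come down to a single gate: whether tool execution waits on a human. A hypothetical sketch (the mode names and prompt are made up for illustration; run_tool stands in for real execution):

```python
# Hypothetical approval gate: the same loop serves copilot and agentic
# modes; the only difference is whether execution pauses for a human.
def run_tool(name: str, args: dict) -> str:
    return f"ran {name}"  # stub for real tool execution

def execute(tool_name: str, args: dict, mode: str = "copilot") -> str:
    if mode == "copilot":
        answer = input(f"Run {tool_name}({args})? [y/N] ")  # human approves each action
        if answer.strip().lower() != "y":
            return "skipped by user"  # fed back so the LLM can plan around the refusal
    # "agentic" mode executes immediately, within whatever scope its tools define
    return run_tool(tool_name, args)
```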

Notable tools

Here are some widely used agent runtimes and frameworks, current as of 2026:

  • Claude Code (anthropic.com/product/claude-code): Anthropic's agentic coding tool in the terminal, IDE, and browser. Understands your codebase, executes tasks, and handles git workflows.

  • Cursor (cursor.com): AI code editor with agent mode. Autonomously explores your codebase, edits files, runs tests, and implements features.

  • OpenHands (openhands.dev): Open-source autonomous agent for software engineering. Runs in a Docker sandbox, can execute complex tasks end-to-end, and publishes pull requests.

  • Aider (aider.chat): Open-source AI pair programmer for the terminal. Works with your git workflow, supports multiple LLM providers, and commits changes automatically.

  • Continue (continue.dev): Open-source IDE extension for VS Code and JetBrains. Offers autocomplete, chat, and agent modes, works with any LLM provider.

  • AutoGPT (agpt.co): Open-source autonomous agent framework, released in 2023. A pioneering example of general-purpose agent architecture, known for demonstrating both the promise and the limitations of autonomous systems.

Common questions

How is an agent different from a chatbot?

A chatbot responds. An agent pursues. Ask a chatbot "book me a flight" and it asks clarifying questions, then waits for you to confirm. Ask an agent and it gathers options, checks your calendar, considers your budget, and books, without asking permission between steps. The chatbot reacts. The agent acts.

What's the difference between an agent and a workflow?

A workflow is a fixed sequence of steps determined in advance. You define "do A, then B, then C, with these rules for branching." A workflow always takes the same path for the same inputs. An agent reasons about which steps to take and in what order, adapting based on intermediate results. Workflows are predictable and efficient. Agents trade predictability for flexibility.

Why does my agent keep calling the same tool five times in a row?

That's a loop, and the LLM probably doesn't recognize what the tool returned as the answer it was looking for. Common causes: the tool returned an error and the agent retried with the same inputs; the response shape was different from what the LLM expected, so it kept trying; the system prompt left the goal vague enough that the LLM thrashes between candidates. Fixes that work: clearer descriptions in your tool schema, explicit error messages from the tool ("not found" rather than null), and a hard call-count budget so the loop terminates rather than burning tokens.
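One concrete version of the budget-plus-error-signal fix is a hypothetical guard wrapped around tool dispatch in a loop like the one sketched earlier:

```python
import json

# Hypothetical repeat guard: if the LLM requests the exact same tool call
# again, return a pointed error instead of silently re-running the tool
# and handing back the same result it already ignored.
seen: set[tuple[str, str]] = set()

def guarded_call(name: str, args: dict) -> str:
    key = (name, json.dumps(args, sort_keys=True))
    if key in seen:
        return (f"ERROR: {name} already ran with these exact arguments. "
                "Change the arguments or pick a different tool.")
    seen.add(key)
    return run_tool(name, args)  # run_tool stands in for the real dispatcher
```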

How autonomous do agents actually get?

Depends on the task and the risk. In low-risk domains (code suggestions, documentation), agents run nearly unsupervised. In higher-risk domains (financial transactions, customer-facing decisions), agents operate under constraints: bounded scope, human review loops, or escalation to a human when confidence is low. Most production agents are supervised autonomy, not full autonomy.

Is it normal for a single Claude Code session to cost $40?

Not typical, but not rare either. A long session that maintains a large context and repeatedly re-reads files piles up tokens fast. Three places to look. First, prompt caching: is the run hitting the cache, or rebuilding the prompt every turn? Second, context bloat: huge system prompts, large repos, and many open files multiply per-call cost. Third, model choice: Opus is meaningfully pricier than Sonnet on the same workload. Set a hard spend cap and watch tokens per turn. Most overruns trace to context size, not call count.
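The context-size point is easy to verify with arithmetic. A back-of-envelope sketch; the per-token prices are placeholders, not current rates:

```python
# Back-of-envelope session cost model. Prices are PLACEHOLDERS, not
# current rates. The shape is the point: re-sending a growing history
# every turn makes cumulative input cost grow quadratically with turns.
IN_PRICE = 3.00 / 1_000_000    # $ per input token (placeholder)
OUT_PRICE = 15.00 / 1_000_000  # $ per output token (placeholder)

def session_cost(turns: int, base_context: int, added_per_turn: int, out_per_turn: int) -> float:
    total, context = 0.0, base_context
    for _ in range(turns):
        total += context * IN_PRICE + out_per_turn * OUT_PRICE
        context += added_per_turn + out_per_turn  # history grows every turn
    return total

# Same 200 turns; only the tokens added per turn (file re-reads) differ.
print(f"heavy re-reads:  ${session_cost(200, 20_000, 5_000, 800):.2f}")
print(f"trimmed context: ${session_cost(200, 20_000, 500, 800):.2f}")
```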

Why do some agents get stuck or make silly mistakes?

Agents inherit their LLM's limitations. An LLM can hallucinate or misinterpret what a tool returned. Across multiple reasoning steps, these errors compound. A bad tool result leads the agent down the wrong path. Confirmation bias makes it ignore contradictory evidence. Good design mitigates the failure modes: clear tool descriptions, explicit error signals from tools, and a memory model that lets the agent backtrack rather than press on with bad state.

Further reading

  • Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022): arxiv.org/abs/2210.03629

Originally published on tokenjam.dev/blog. Part of an ongoing series on the agentic AI ecosystem.
