Omer Farooq
The Anatomy of an AI Agent: Memory, Tools, Planning, and Execution Explained

Everyone's talking about AI agents. But most explanations jump straight to frameworks — LangChain, CrewAI, AutoGen — without explaining what an agent actually is under the hood.

Before you pick a framework, you need to understand the four building blocks every agent is made of: memory, tools, planning, and execution. Get these right in your head and every framework, every paper, every architecture diagram suddenly makes sense.

Let's break it down.


What Makes Something an "Agent"?

A regular LLM call is stateless. You send a prompt, you get a response, it's done. No memory of what came before. No ability to take action in the world. No loop.

An agent is different. At its simplest, an agent is an LLM in a loop — one that can observe its environment, decide what to do next, take an action, and then observe the result of that action before deciding again.

Observe → Think → Act → Observe → Think → Act → ...

That loop is what separates a chatbot from an agent. And the four components below are what make the loop work.


1. Memory — What the Agent Knows

Memory is how an agent maintains context — about the task, about past actions, about the world it's operating in. Without memory, every iteration of the loop starts from scratch.

There are four types of memory in agent systems:

In-context memory is the simplest kind — it's just the conversation history inside the LLM's context window. Everything the agent has seen and done so far, appended as a list of messages. Fast and easy, but limited by the context window size. For long-running tasks, it overflows.

External memory (vector stores) solves the overflow problem. Past interactions, documents, and knowledge are embedded and stored in a vector database like Supabase, Pinecone, or Chroma. When the agent needs information, it retrieves the most relevant chunks by semantic similarity rather than loading everything at once.
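To make "retrieve by semantic similarity" concrete, here is a minimal sketch of the retrieval step. Real systems get the vectors from an embedding model and store them in Supabase, Pinecone, or Chroma; the three-dimensional vectors and the chunk texts below are toy stand-ins so the ranking logic is visible.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy store: (chunk text, embedding). In production these embeddings come
# from a model and live in a vector database, not a Python list.
store = [
    ("Downtown 2BR avg price is AED 2.4M", [0.9, 0.1, 0.0]),
    ("JVC 1BR avg rent is AED 55K/year",   [0.1, 0.9, 0.1]),
    ("RERA caps rent increases by index",  [0.0, 0.2, 0.9]),
]

def retrieve(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query embedding."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# A query vector close to the Downtown chunk retrieves that chunk first
print(retrieve([0.85, 0.15, 0.05]))
```

The point is that only the top-k relevant chunks get injected into the context window, instead of the whole store.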

Episodic memory is a record of what the agent has done before — not the raw text, but structured summaries. "In run #12, the user asked about Dubai real estate and responded positively to AED pricing." This is how agents get better over time.

Semantic memory is domain knowledge — facts, rules, and context baked in via a knowledge base or system prompt. In a real estate agent, this is the RERA regulations, property types, and area guides.

In practice, most production agents use a combination: in-context memory for the current session, a vector store for long-term retrieval, and a structured knowledge base for domain facts.
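As a sketch of how two of these layers fit together, here is a hypothetical `SessionMemory` class: a bounded in-context message window for the current session, plus structured episodic summaries that survive after the session ends. The class name, fields, and summary format are illustrative, not a standard API.

```python
from collections import deque

class SessionMemory:
    """Illustrative hybrid memory: a bounded in-context window plus
    structured episodic summaries that persist across runs."""

    def __init__(self, max_messages: int = 6):
        self.messages = deque(maxlen=max_messages)  # in-context: oldest turns drop off
        self.episodes: list[dict] = []              # episodic: structured run summaries

    def add_message(self, role: str, content: str) -> None:
        self.messages.append({"role": role, "content": content})

    def end_run(self, summary: str, outcome: str) -> None:
        """Compress the finished session into an episodic record."""
        self.episodes.append({"run": len(self.episodes) + 1,
                              "summary": summary, "outcome": outcome})

mem = SessionMemory(max_messages=2)
mem.add_message("user", "Any 2BRs in JVC?")
mem.add_message("assistant", "Yes, eight listings under AED 1.2M.")
mem.add_message("user", "Cheapest one?")
mem.end_run("User asked about JVC 2BRs, price-sensitive", "qualified")

print(len(mem.messages))       # the oldest message has fallen out of context
print(mem.episodes[0]["run"])  # but the run survives as an episodic summary
```

In a production agent the `deque` would be replaced by token-aware truncation or summarisation, and the episodes would go to a database rather than a list.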


2. Tools — What the Agent Can Do

An LLM on its own can only produce text. Tools are what connect that text to the real world — they let the agent read files, search the web, call APIs, write to databases, and send messages.

Every tool has the same basic structure: a name, a description (so the LLM knows when to use it), and a function that actually executes.

Here's a simple example using the Anthropic SDK's tool use format:

tools = [
    {
        "name": "search_properties",
        "description": "Search for available properties in a Dubai community. Use when the user asks about listings, availability, or pricing.",
        "input_schema": {
            "type": "object",
            "properties": {
                "community": {"type": "string", "description": "e.g. Downtown Dubai, JVC, Palm Jumeirah"},
                "bedrooms": {"type": "integer"},
                "max_budget_aed": {"type": "number"}
            },
            "required": ["community"]
        }
    }
]

The LLM reads the tool description and decides — based on the current context — whether to call it and with what arguments. This decision is called tool selection, and it's where the quality of your descriptions matters enormously. A vague description leads to wrong tool calls. A precise one leads to accurate, reliable behaviour.

Common tool categories in real-world agents:

  • Read tools — search, fetch, query (read-only, safe to call freely)
  • Write tools — create, update, delete (need guard rails)
  • Communication tools — send email, send WhatsApp, post to Slack
  • Compute tools — run code, calculate, transform data

A good rule of thumb: start with read-only tools and add write tools only once the agent's decision-making is reliable.
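At runtime, something has to turn the tool name the LLM picked into an actual function call. A common pattern is a registry mapping tool names (exactly as declared to the LLM) to Python callables. The handlers below are hypothetical stubs for this article's real estate tools; in a real agent they would hit a listings API or database.

```python
# Hypothetical handlers standing in for real API calls.
def search_properties(community, bedrooms=None, max_budget_aed=None):
    return {"community": community, "results": 8}

def get_payment_plans(listing_ids):
    return {"plans": {lid: "60/40 post-handover" for lid in listing_ids}}

# Dispatch table: tool name (as declared to the LLM) -> Python callable.
TOOL_REGISTRY = {
    "search_properties": search_properties,
    "get_payment_plans": get_payment_plans,
}

def execute_tool(name: str, tool_input: dict):
    """Resolve a tool call from the model into an actual function call."""
    if name not in TOOL_REGISTRY:
        # Surface the problem to the model instead of crashing the loop
        return {"error": f"unknown tool: {name}"}
    return TOOL_REGISTRY[name](**tool_input)

print(execute_tool("search_properties", {"community": "JVC", "bedrooms": 2}))
```

Returning an error payload for unknown tools (rather than raising) lets the agent see its mistake and recover on the next iteration.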


3. Planning — How the Agent Thinks

Planning is how an agent breaks a complex goal into a sequence of steps. Without planning, an agent can only handle single-shot tasks. With planning, it can handle multi-step workflows that take minutes or hours to complete.

There are two dominant planning patterns:

ReAct (Reason + Act) is the most common. The agent alternates between reasoning steps ("I need to find the user's budget before searching properties") and action steps (calling the search_properties tool). The reasoning is written out in natural language — often called a "scratchpad" or "chain of thought" — before each action.

Thought: The user wants a 2-bedroom in JVC under AED 1.2M. 
         I should search for listings first, then check payment plans.
Action: search_properties(community="JVC", bedrooms=2, max_budget_aed=1200000)
Observation: Found 8 listings. Cheapest is AED 980K at Bloom Heights.
Thought: Now I should get payment plan details for the top 3 listings.
Action: get_payment_plans(listing_ids=["BH-01", "BH-02", "BH-03"])
...

Plan-and-Execute is a two-stage approach. A planner LLM first generates the full sequence of steps as a structured plan. A separate executor LLM then works through each step. This is more reliable for complex tasks because the planning step can be reviewed or modified before execution begins.

Most agents you'll build day-to-day use ReAct. Plan-and-Execute is worth adding when tasks involve more than 5–6 sequential steps or when you need a human approval checkpoint mid-task.
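The control flow of Plan-and-Execute can be sketched in a few lines. In a real system `plan` and `execute_step` would each be LLM calls; here both are stubbed so the two-stage structure, and the natural place for a review checkpoint, are visible. All names and steps below are illustrative.

```python
def plan(goal: str) -> list[str]:
    """Planner stage: turn a goal into an ordered list of steps (stubbed;
    in practice this is a planner LLM returning structured output)."""
    return [
        "search listings matching the brief",
        "fetch payment plans for the top results",
        "draft a summary for the lead",
    ]

def execute_step(step: str, context: dict) -> dict:
    """Executor stage: carry out one step, folding the result into shared
    context (stubbed; in practice an executor LLM plus tool calls)."""
    context[step] = "done"
    return context

def plan_and_execute(goal: str) -> dict:
    steps = plan(goal)
    # Review checkpoint: the full plan exists before anything runs, so a
    # human (or a critic model) can veto or edit it here.
    context: dict = {"goal": goal}
    for step in steps:
        context = execute_step(step, context)
    return context

result = plan_and_execute("find a 2BR in JVC under AED 1.2M")
print(result)
```

Compare this to ReAct, where the plan never exists as a reviewable object: each step is decided only after observing the previous one.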


4. Execution — The Loop That Holds It Together

Execution is the runtime — the code that drives the agent loop, manages state between steps, handles tool call results, decides when to stop, and surfaces output to the user.

A minimal agent loop in Python looks like this:

import anthropic

client = anthropic.Anthropic()

def run_agent(user_message: str, tools: list, system: str):
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            system=system,
            tools=tools,
            messages=messages
        )

        # Agent has finished — return final text response
        if response.stop_reason == "end_turn":
            return response.content[0].text

        # Agent wants to use a tool
        if response.stop_reason == "tool_use":
            tool_use_block = next(b for b in response.content if b.type == "tool_use")

            # Execute the tool (execute_tool is your own dispatcher that maps
            # the tool name to a real Python function)
            tool_result = execute_tool(tool_use_block.name, tool_use_block.input)

            # Append assistant response + tool result to message history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({
                "role": "user",
                "content": [{
                    "type": "tool_result",
                    "tool_use_id": tool_use_block.id,
                    "content": str(tool_result)
                }]
            })
            # Loop continues — agent sees the result and decides next step
            continue

        # Any other stop reason (e.g. "max_tokens") means the loop can't make
        # progress, so fail fast instead of re-sending the same request forever
        raise RuntimeError(f"Unexpected stop reason: {response.stop_reason}")

A few things worth noting in this loop:

  • stop_reason == "end_turn" means the agent decided it's done. This is your exit condition.
  • stop_reason == "tool_use" means the agent wants to call a tool. You execute it and append the result so the agent can continue.
  • The full message history is passed on every iteration. This is how the agent "remembers" what it has done so far in the current session.

The execution layer is also where you handle guardrails — maximum iteration limits (prevent infinite loops), cost controls (stop after N tool calls), and human-in-the-loop checkpoints (pause and ask before writing to a database).
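Those three guardrails can be sketched as a wrapper around an abstract `step` function that stands in for one LLM-call-plus-tool-execution iteration. The tool names, limits, and return shape below are illustrative assumptions, not part of any SDK.

```python
# Hypothetical write tools that should require approval before running.
WRITE_TOOLS = {"create_crm_lead", "send_whatsapp"}

def run_guarded(step, max_iterations: int = 10, max_tool_calls: int = 8,
                approve=lambda tool: True):
    """Run one agent step at a time with three guardrails:
    an iteration cap, a tool-call budget, and human approval for writes."""
    tool_calls = 0
    for _ in range(max_iterations):              # guardrail 1: iteration cap
        outcome = step()
        if outcome["type"] == "done":
            return outcome["text"]
        if outcome["type"] == "tool_use":
            tool_calls += 1
            if tool_calls > max_tool_calls:      # guardrail 2: cost control
                return "Stopped: tool-call budget exhausted."
            if outcome["tool"] in WRITE_TOOLS and not approve(outcome["tool"]):
                return f"Paused: {outcome['tool']} needs human approval."  # guardrail 3
    return "Stopped: iteration limit reached."

# A stub agent that never finishes, to show the iteration cap firing:
print(run_guarded(lambda: {"type": "tool_use", "tool": "search_properties"},
                  max_iterations=3))
```

In practice `step` would wrap the `client.messages.create` call from the loop above, and `approve` would surface a prompt to a human operator instead of returning a boolean.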


Putting It All Together

Here's how the four components work as a system in a real scenario — a real estate lead qualification agent:

  • Memory: holds the lead's messages in-context; retrieves community pricing from a Supabase vector store
  • Tools: search_properties, check_visa_eligibility, send_whatsapp, create_crm_lead
  • Planning: a ReAct loop that reasons about lead intent → searches listings → qualifies budget → decides to escalate or nurture
  • Execution: a Python loop with a 10-iteration cap and a human escalation hook if the confidence score drops below 0.6

None of these components is complicated on its own. The power comes from combining them — and understanding which one to improve when something goes wrong.

Agent giving wrong answers? Usually a memory problem — missing context or bad retrieval.
Agent calling the wrong tool? Usually a tools problem — vague descriptions.
Agent getting stuck in circles? Usually a planning problem — no clear stopping condition.
Agent crashing mid-task? Usually an execution problem — no error handling in the loop.


What's Next

Now that you understand the internals, you have a solid foundation for everything that comes next: multi-agent orchestration, RAG pipelines, streaming responses, and production deployments.

The framework you choose — LangGraph, CrewAI, or plain Python — is just scaffolding around these four components. Build one from scratch first. You'll understand every framework ten times faster once you've felt the loop run under your own hands.


I'm a Dubai-based AI engineer and automation consultant building agentic workflows and RAG pipelines for clients across UAE and Saudi Arabia. Follow me for more practical AI engineering content.
