I've built a lot of AI demos that looked impressive in a notebook and fell apart in production. The usual culprit? Treating an LLM like a search engine, one prompt in, one answer out, instead of what it actually is: a reasoning engine you can wire into real workflows.
This tutorial is about doing it properly. We're going to build a functional AI agent using Anthropic's Claude API from the ground up, not a wrapper around a framework, but the actual mechanics: a ReAct loop, custom tool use, and a structure you can actually deploy. By the end you'll have running code and a mental model that makes every agent tutorial after this one make sense.
Let's get into it.
## What We're Actually Building
The agent we're building will:
- Accept a user query
- Decide which tools it needs to answer
- Call those tools, observe the results
- Reason over the results and either call more tools or return a final answer
This pattern is called ReAct (Reasoning + Acting). It's the backbone of most production agents and it maps cleanly onto how Claude's tool use API works.
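Before any API code, it helps to see that loop as plain Python. Here's a sketch with stand-in `llm` and `run_tool` callables (illustrative names, not the Anthropic SDK); the real version comes in Step 3:

```python
def react_loop(query, llm, run_tool, max_steps: int = 10):
    """Skeleton of the ReAct pattern: reason, act, observe, repeat."""
    history = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        decision = llm(history)                      # reason
        if decision["type"] == "answer":
            return decision["text"]                  # final answer
        observation = run_tool(decision["tool"], decision["input"])  # act
        history.append({"role": "assistant", "content": decision})
        history.append({"role": "user", "content": observation})    # observe
    return "Max steps reached."
```

Everything that follows is this skeleton with Claude as the `llm` and real functions behind `run_tool`.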
## Prerequisites
```bash
pip install anthropic python-dotenv
```
You'll need a Claude API key from console.anthropic.com. Store it safely:
```bash
# .env
ANTHROPIC_API_KEY=your_key_here
```
## Step 1: Basic Claude API Setup
Before building the agent, let's confirm you can talk to Claude.
```python
import os
import anthropic
from dotenv import load_dotenv

load_dotenv()

client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

def ask_claude(prompt: str) -> str:
    message = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {"role": "user", "content": prompt}
        ]
    )
    return message.content[0].text

# Quick test
print(ask_claude("What is 2 + 2? Answer in one word."))
```
This is the foundation. If this runs cleanly, you're ready to build on it. For a deeper breakdown of model selection and API parameters, the how to use Claude API tutorial from Dextra Labs is worth reading before you go further.
## Step 2: Define Your Tools
Tools are the agent's hands. Without them, Claude can only reason; it can't act. We'll define three tools our agent can use: a calculator, a web search simulator, and a file writer.
In Claude's API, tools are defined as JSON schemas. Claude reads these schemas and decides when and how to call them.
```python
tools = [
    {
        "name": "calculator",
        "description": "Performs basic arithmetic. Use this for any math operations.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate, e.g. '15 * 24 + 100'"
                }
            },
            "required": ["expression"]
        }
    },
    {
        "name": "web_search",
        "description": "Searches the web for current information on a topic.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The search query"
                }
            },
            "required": ["query"]
        }
    },
    {
        "name": "save_to_file",
        "description": "Saves text content to a local file.",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["filename", "content"]
        }
    }
]
```
Now let's write the actual Python functions that execute when Claude calls these tools:
```python
import math

def calculator(expression: str) -> str:
    try:
        # Safer eval for math expressions: expose only math functions, no builtins
        allowed = {k: v for k, v in math.__dict__.items()
                   if not k.startswith("__")}
        result = eval(expression, {"__builtins__": {}}, allowed)
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {str(e)}"

def web_search(query: str) -> str:
    # In production, wire this to SerpAPI, Tavily, or Brave Search
    # Simulated response for tutorial purposes
    return (f"Search results for '{query}': "
            f"[Simulated] Top result: Relevant information about {query} "
            f"from authoritative sources. Published 2025.")

def save_to_file(filename: str, content: str) -> str:
    try:
        with open(filename, 'w') as f:
            f.write(content)
        return f"Successfully saved to {filename}"
    except Exception as e:
        return f"Error saving file: {str(e)}"
```
The tool dispatcher routes Claude's calls to these functions:

```python
def execute_tool(tool_name: str, tool_input: dict) -> str:
    if tool_name == "calculator":
        return calculator(tool_input["expression"])
    elif tool_name == "web_search":
        return web_search(tool_input["query"])
    elif tool_name == "save_to_file":
        return save_to_file(tool_input["filename"], tool_input["content"])
    else:
        return f"Unknown tool: {tool_name}"
```
The dispatcher is intentionally simple here. In production you'd want a registry pattern, but for learning, explicit is better than clever.
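If you're curious what that registry pattern might look like, here's one minimal sketch. The `register` decorator and `TOOLS` dict are illustrative names of my own, not anything from the Anthropic SDK:

```python
from typing import Callable

# Registry mapping tool names to their implementations
TOOLS: dict[str, Callable[..., str]] = {}

def register(name: str):
    """Decorator that adds a function to the tool registry under `name`."""
    def wrapper(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return wrapper

@register("echo")
def echo(text: str) -> str:
    # Stand-in tool; calculator, web_search, and save_to_file
    # would be registered the same way
    return f"Echo: {text}"

def execute_tool(tool_name: str, tool_input: dict) -> str:
    """Dispatch by dictionary lookup instead of an if/elif chain."""
    fn = TOOLS.get(tool_name)
    if fn is None:
        return f"Unknown tool: {tool_name}"
    return fn(**tool_input)
```

The payoff is that adding a tool becomes one decorated function plus its schema, with no dispatcher edits.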
## Step 3: Build the ReAct Agent Loop
This is the core of the tutorial. The ReAct loop works like this:
- Send the user query + available tools to Claude
- Claude either returns a final answer OR a tool call request
- If tool call → execute it, send result back to Claude
- Repeat until Claude returns a final answer
```python
def run_agent(user_query: str, max_iterations: int = 10) -> str:
    print(f"\n{'='*50}")
    print(f"User: {user_query}")
    print(f"{'='*50}")

    messages = [
        {"role": "user", "content": user_query}
    ]

    system_prompt = """You are a helpful AI agent with access to tools.
Think step by step. Use tools when you need real data or calculations.
When you have enough information, provide a clear final answer."""

    for iteration in range(max_iterations):
        print(f"\n[Iteration {iteration + 1}]")

        # Call Claude with tools
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            system=system_prompt,
            tools=tools,
            messages=messages
        )

        print(f"Stop reason: {response.stop_reason}")

        # If Claude is done reasoning, return the final answer
        if response.stop_reason == "end_turn":
            final_answer = ""
            for block in response.content:
                if hasattr(block, 'text'):
                    final_answer += block.text
            print(f"\nFinal Answer: {final_answer}")
            return final_answer

        # If Claude wants to use tools
        if response.stop_reason == "tool_use":
            # Add Claude's response to message history
            messages.append({
                "role": "assistant",
                "content": response.content
            })

            # Process each tool call
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    print(f"  Tool: {block.name}")
                    print(f"  Input: {block.input}")

                    # Execute the tool
                    result = execute_tool(block.name, block.input)
                    print(f"  Result: {result}")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            # Send tool results back to Claude
            messages.append({
                "role": "user",
                "content": tool_results
            })

    return "Max iterations reached without a final answer."
```
The key insight here is the message history. Every tool call and result gets appended to messages, so Claude always has full context of what it's already tried. This is what separates a stateful agent from a stateless chatbot.
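To make that concrete, here is roughly what `messages` contains after a single tool round. The IDs and values below are illustrative; the SDK actually returns typed objects carrying these same fields:

```python
# Illustrative snapshot of the messages list after one tool round
messages = [
    {"role": "user", "content": "What is 15 * 24?"},
    # Claude's turn: a tool_use content block (shown here as a plain dict)
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_abc123",
         "name": "calculator", "input": {"expression": "15 * 24"}},
    ]},
    # Our turn: the tool_result, matched to the request by tool_use_id
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_abc123",
         "content": "Result: 360"},
    ]},
]
```

Note that the tool result goes back in a `user` role message; from the API's perspective, we are the ones reporting what the tool returned.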
## Step 4: Run It
```python
if __name__ == "__main__":
    # Test 1: Math + file output
    result = run_agent(
        "Calculate compound interest on $10,000 at 7% for 10 years, "
        "then save the result to 'investment.txt'"
    )

    # Test 2: Research + synthesis
    result = run_agent(
        "Search for information about RAG architecture "
        "and summarize the key components."
    )

    # Test 3: Multi-step reasoning
    result = run_agent(
        "What is the square root of 144 multiplied by the number of days in a leap year?"
    )
```
Run this and watch the agent reason through each step in your terminal. The iteration logs show you exactly how Claude decides which tool to call and when to stop.
## Step 5: Adding Memory (The Production Upgrade)
The agent above is stateless: each `run_agent` call starts fresh. For real applications you need conversation memory. Here's a minimal implementation:
```python
class AgentWithMemory:
    def __init__(self):
        self.conversation_history = []
        self.client = anthropic.Anthropic(
            api_key=os.getenv("ANTHROPIC_API_KEY")
        )

    def chat(self, user_message: str) -> str:
        # Add user message to history (skip when re-entering after a tool call,
        # since the tool results are already the latest user turn)
        if user_message:
            self.conversation_history.append({
                "role": "user",
                "content": user_message
            })

        response = self.client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            system="You are a helpful assistant with memory of our conversation.",
            tools=tools,
            messages=self.conversation_history
        )

        # Handle tool use within persistent history
        if response.stop_reason == "tool_use":
            self.conversation_history.append({
                "role": "assistant",
                "content": response.content
            })

            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result
                    })

            self.conversation_history.append({
                "role": "user",
                "content": tool_results
            })

            # Recursive call to get the final answer
            return self.chat("")

        assistant_message = response.content[0].text
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_message
        })
        return assistant_message
```
```python
# Usage
agent = AgentWithMemory()
print(agent.chat("My budget is $50,000. Calculate 7% annual return over 5 years."))
print(agent.chat("Now do the same calculation but for 10 years."))
# Claude remembers the $50,000 and 7% from the first message
```
The conversation_history list is doing all the heavy lifting here. In production you'd persist this to Redis or a database between sessions.
## What to Build Next
Once this is running, the natural next steps are:
- **Streaming responses**: use `client.messages.stream()` for real-time output in web apps.
- **Error handling and retries**: wrap tool calls in try/except with exponential backoff.
- **Async execution**: parallel tool calls with `asyncio` cut latency significantly on multi-tool queries.
- **Structured outputs**: use Pydantic models to enforce tool input/output schemas.
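As a starting point for the retry item above, here's a generic backoff wrapper (a sketch; the attempt count and delay constants are arbitrary choices):

```python
import time

def with_retries(fn, max_attempts: int = 4, base_delay: float = 1.0,
                 retryable: tuple = (Exception,)):
    """Call fn(); on a retryable error, wait 1s, 2s, 4s, ... then try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * (2 ** attempt))
```

You'd wrap the API call as something like `with_retries(lambda: client.messages.create(...), retryable=(anthropic.RateLimitError,))`, choosing which SDK exceptions count as transient.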
For the full architecture patterns and production deployment strategies, Dextra Labs published an in-depth guide on Claude AI agents architecture and deployment covering containerization, monitoring, and scaling patterns beyond what fits in a single tutorial.
The full repo for this tutorial is available at: github.com/dextralabs/claude-agent-tutorial
## Quick Recap
What you just built is a genuine ReAct agent, not a chatbot with a system prompt, but a reasoning loop that can call real functions, observe results, and chain multiple steps together. The same pattern powers production agents handling customer support, code review, document analysis, and research workflows at scale.
The code here is intentionally minimal. Strip away the frameworks and this is what's underneath all of them.