Learn how LLM-powered agents actually work by building one yourself in under 100 lines of Python.
Introduction
Everyone's talking about AI agents. LangChain, AutoGen, CrewAI — the ecosystem is exploding. But if you've ever tried to debug one of these frameworks, you know the feeling: layers upon layers of abstraction, and you have no idea what's actually happening under the hood.
So let's skip the frameworks entirely.
In this tutorial, we'll build a fully functional AI agent from scratch using nothing but Python and the OpenAI API. By the end, you'll understand exactly how agents work — and you'll never feel lost in a framework again.
What Even Is an AI Agent?
A regular LLM call is stateless: you send a prompt, you get a response, done.
An agent is different. It can:
- Think about what steps to take
- Use tools (like searching the web, running code, reading files)
- Loop — observe results and decide what to do next
The core loop looks like this:
Think → Act → Observe → Repeat
This is often called the ReAct pattern (Reasoning + Acting), and it's the foundation of almost every agent framework out there.
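Stripped to its skeleton, the loop is tiny. Here's a Python-flavoured sketch with hypothetical `think` and `act` stubs standing in for the LLM and the tools (we'll build the real thing below):

```python
# Hypothetical skeleton of the ReAct loop; think() and act() are stand-in
# stubs so the shape of the loop is visible without an LLM.
def think(history):
    # Stub "reasoning": act once, then answer from the last observation.
    if len(history) == 1:
        return "I should look this up", "lookup"
    return f"Answer based on {history[-1]}", None

def act(action):
    # Stub "tool": pretend we ran the action and got a result.
    return "observation:42"

def react_loop(goal: str, max_steps: int = 5) -> str:
    history = [goal]
    for _ in range(max_steps):
        thought, action = think(history)  # Think
        if action is None:                # no action needed -> final answer
            return thought
        history.append(act(action))       # Act + Observe, feed result back
    return "Stopped after max_steps."
```

Everything that follows is this loop with a real LLM as `think` and real Python functions as `act`.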
What We're Building
A simple agent that can answer questions using two tools:
- `calculator` — evaluates math expressions
- `get_weather` — returns (fake) weather data for a city
The agent will figure out which tool to call, call it, observe the result, and keep going until it has a final answer.
Prerequisites
- Python 3.9+
- An OpenAI API key (or swap in any LLM with tool/function calling)
- Basic Python knowledge
Install the dependency:
```bash
pip install openai
```
Step 1: Define Your Tools
Tools are just Python functions. We'll describe them to the LLM using JSON schema so it knows how to call them.
```python
import json
import math

# --- Tool implementations ---

def calculator(expression: str) -> str:
    """Evaluate a math expression with restricted builtins."""
    try:
        result = eval(expression, {"__builtins__": {}}, {"sqrt": math.sqrt, "pow": pow})
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def get_weather(city: str) -> str:
    """Return fake weather data for demo purposes."""
    fake_data = {
        "london": "Cloudy, 15°C",
        "new york": "Sunny, 22°C",
        "tokyo": "Rainy, 18°C",
    }
    return fake_data.get(city.lower(), "Weather data not available for this city.")

# --- Tool registry ---
TOOLS = {
    "calculator": calculator,
    "get_weather": get_weather,
}
```
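A caution on `calculator`: even with `__builtins__` emptied, `eval` is notoriously hard to sandbox (attribute tricks like `(1).__class__` still work). For a demo it's fine, but a stricter approach is to walk the expression's AST and allow only arithmetic nodes. Here's an illustrative sketch (the name `safe_calc` and its whitelist are my own, and this is still not a vetted sandbox):

```python
import ast
import math
import operator

# Whitelisted operators and functions (illustrative, not exhaustive)
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}
_FUNCS = {"sqrt": math.sqrt, "pow": pow}

def safe_calc(expression: str) -> str:
    """Evaluate a math expression by walking its AST, allowing only
    numeric constants, whitelisted operators, and whitelisted calls."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) \
                and node.func.id in _FUNCS:
            return _FUNCS[node.func.id](*[_eval(a) for a in node.args])
        raise ValueError("Disallowed expression")
    try:
        return str(_eval(ast.parse(expression, mode="eval").body))
    except Exception as e:
        return f"Error: {e}"
```

Anything outside the whitelist — names, attributes, imports — raises and comes back as an error string, which the agent can report to the model.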
```python
# --- Tool descriptions for the LLM ---
TOOL_SCHEMAS = [
    {
        "type": "function",
        "function": {
            "name": "calculator",
            "description": "Evaluate a mathematical expression and return the result.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "A valid Python math expression, e.g. '2 + 2' or 'sqrt(16)'",
                    }
                },
                "required": ["expression"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city, e.g. 'London'",
                    }
                },
                "required": ["city"],
            },
        },
    },
]
```
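Hand-writing these schemas gets tedious as the tool list grows. One common shortcut is to derive the schema from the function itself. Here's a minimal sketch (my own helper, with the simplifying assumption that every parameter is a required string — true for both of our tools):

```python
import inspect

def schema_for(fn):
    """Build a minimal tool schema from fn's signature and docstring.
    Assumes every parameter is a required string."""
    params = list(inspect.signature(fn).parameters)
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": params,
            },
        },
    }
```

With that, `TOOL_SCHEMAS = [schema_for(f) for f in TOOLS.values()]` replaces the hand-written list — at the cost of losing the per-parameter descriptions, which do help the model call tools correctly.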
Step 2: Build the Agent Loop
Here's the heart of the agent — a loop that keeps running until the LLM produces a final text answer (no more tool calls).
```python
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from environment

def run_agent(user_query: str):
    print(f"\n🧠 User: {user_query}\n")

    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful assistant. Use tools when needed. "
                "Once you have enough information, provide a final answer."
            ),
        },
        {"role": "user", "content": user_query},
    ]

    # Agent loop
    while True:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOL_SCHEMAS,
            tool_choice="auto",
        )
        message = response.choices[0].message

        # If no tool calls → we have a final answer
        if not message.tool_calls:
            print(f"✅ Agent: {message.content}")
            return message.content

        # Process each tool call
        messages.append(message)  # add assistant message with tool_calls
        for tool_call in message.tool_calls:
            tool_name = tool_call.function.name
            tool_args = json.loads(tool_call.function.arguments)
            print(f"🔧 Calling tool: {tool_name}({tool_args})")

            # Execute the tool
            tool_fn = TOOLS.get(tool_name)
            if tool_fn:
                result = tool_fn(**tool_args)
            else:
                result = f"Error: tool '{tool_name}' not found."
            print(f"📊 Tool result: {result}\n")

            # Feed the result back to the LLM
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": result,
            })
```
Step 3: Run It!
```python
if __name__ == "__main__":
    run_agent("What is the square root of 144, and what's the weather like in Tokyo?")
```
Output:
```
🧠 User: What is the square root of 144, and what's the weather like in Tokyo?

🔧 Calling tool: calculator({'expression': 'sqrt(144)'})
📊 Tool result: 12.0

🔧 Calling tool: get_weather({'city': 'Tokyo'})
📊 Tool result: Rainy, 18°C

✅ Agent: The square root of 144 is **12**. As for Tokyo, it's currently **rainy with a temperature of 18°C**. 🌧️
```
That's it. A working agent in ~80 lines of Python.
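One caveat before moving on: `while True` trusts the model to eventually stop calling tools, and a confused model can loop forever (burning tokens the whole time). Most agent loops cap iterations. A generic sketch of the guard, with a hypothetical `step` callable standing in for one LLM turn plus tool handling:

```python
def run_agent_bounded(step, max_steps: int = 8):
    """Run `step` (one LLM turn; returns final text, or None if the
    model called tools) at most max_steps times before bailing out."""
    for _ in range(max_steps):
        final = step()
        if final is not None:
            return final
    return "Stopped: exceeded max_steps without a final answer."
```

In the real agent, the equivalent change is replacing `while True:` with `for _ in range(MAX_STEPS):` and returning a fallback message after the loop.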
How It Works — The Full Picture
Let's trace what happened:
1. User asks a question with two parts
2. LLM decides it needs two tools and returns `tool_calls` instead of a text answer
3. We execute each tool and append the results to the message history
4. LLM sees the results and synthesizes a final answer
5. Loop ends because there are no more tool calls
The key insight: the LLM never actually runs the tools. It just says "call this function with these arguments." Your code does the actual execution and feeds the result back. The LLM is the brain; your code is the hands.
Taking It Further
Now that you understand the core pattern, here's how to extend it:
| Idea | How |
|---|---|
| Add more tools | Write a function + add its schema to `TOOL_SCHEMAS` |
| Web search | Use the Serper or Tavily API as a tool |
| Memory | Persist messages to a database between sessions |
| Multi-agent | One agent's output becomes another agent's tool |
| Streaming | Use `stream=True` to print tokens as they arrive |
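The first row is the easiest to try. For example, registering a third tool (a hypothetical `word_count`, invented here for illustration) is just a function plus a schema:

```python
def word_count(text: str) -> str:
    """Count the words in a piece of text."""
    return str(len(text.split()))

WORD_COUNT_SCHEMA = {
    "type": "function",
    "function": {
        "name": "word_count",
        "description": "Count the words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "The text to count."}
            },
            "required": ["text"],
        },
    },
}

# In the agent script, wire it into the existing registry and schema list:
#   TOOLS["word_count"] = word_count
#   TOOL_SCHEMAS.append(WORD_COUNT_SCHEMA)
```

The loop itself needs no changes — it dispatches by name through `TOOLS`, so any registered function is immediately callable by the model.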
Why Build from Scratch First?
Frameworks like LangChain are powerful, but they hide the loop. When something breaks — and it will — you need to know what's actually happening.
Building from scratch gives you:
- Full control over every message sent to the LLM
- Easy debugging — no magic, no hidden prompts
- Framework fluency — you'll understand LangChain/AutoGen much faster now
Conclusion
Agents aren't magic. They're just LLMs in a loop with access to tools. The ReAct pattern — Think, Act, Observe, Repeat — is simple once you see it in code.
You now have a working agent and, more importantly, a mental model of how every major agent framework operates under the hood.
What will you build with it? Drop your ideas in the comments 👇