- Book: Observability for LLM Applications — paperback and hardcover on Amazon · Ebook from Apr 22
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
You have called the chat completions endpoint. You sent a list of messages, you got a string back, you printed it. Then someone said the word agent and it started sounding like a different thing, with frameworks and graphs and AgentExecutor classes and a tutorial that imports eleven things before it does any work.
An agent is not a different thing. It is a while-loop around the same chat completions call you already know, with two new pieces bolted on: a list of tools the model is allowed to ask for, and the code that actually runs them when it does.
This post builds one from scratch. Fifty lines of Python, the OpenAI SDK, no framework. The agent answers questions like "what's the weather in Lisbon and what time is it there" by deciding on its own which tools to call, in what order, and when to stop. Once you have read it, the LangChain source will stop looking like magic.
## The whole program, first
Here is the entire agent. Read it once, then we walk through why each piece exists.
```python
# agent.py
import json
from datetime import datetime
from zoneinfo import ZoneInfo

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"

def get_weather(city: str) -> str:
    fake = {"Lisbon": "18C, clear", "Berlin": "7C, rain"}
    return fake.get(city, "no data")

def get_time(tz: str) -> str:
    return datetime.now(ZoneInfo(tz)).strftime("%H:%M")

TOOLS = [
    {"type": "function", "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]}}},
    {"type": "function", "function": {
        "name": "get_time",
        "description": "Current local time for an IANA tz.",
        "parameters": {"type": "object",
                       "properties": {"tz": {"type": "string"}},
                       "required": ["tz"]}}},
]

DISPATCH = {"get_weather": get_weather, "get_time": get_time}

def run(user_msg: str, max_steps: int = 6) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            try:
                result = DISPATCH[call.function.name](**args)
            except Exception as e:
                result = f"error: {e}"
            messages.append({"role": "tool",
                             "tool_call_id": call.id, "content": str(result)})
    return "stopped: step limit reached"

if __name__ == "__main__":
    print(run("Weather in Lisbon and what time is it in Europe/Lisbon?"))
```
Install and run:
```bash
pip install openai
export OPENAI_API_KEY=sk-...
python agent.py
```
Output on a real run:
```
The weather in Lisbon is currently 18C and clear.
The local time in Europe/Lisbon is 14:37.
```
That is an agent. One file. Two tools. One loop. Let's open each piece.
## The tools are just Python functions
`get_weather` and `get_time` are ordinary Python. They take arguments, return a string, and know nothing about LLMs. That matters: the model never runs code. Your process runs code. The model only names a function and supplies arguments; your program decides whether to call it.
The weather function is stubbed with a dict so the example runs offline. Swap it for a real API when you care — the agent loop does not change.
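For reference, here is one possible real `get_weather` against Open-Meteo, which is free and keyless. The city-to-coordinates table is a stand-in (a real version would geocode), and the exact response fields are taken from the Open-Meteo forecast API; network failures become tool results rather than crashes, so the agent loop still does not change.

```python
import json
import urllib.request

# Hypothetical lookup table — a real version would call a geocoding API.
COORDS = {"Lisbon": (38.72, -9.14), "Berlin": (52.52, 13.41)}

def get_weather(city: str) -> str:
    if city not in COORDS:
        return "no data"
    lat, lon = COORDS[city]
    url = (f"https://api.open-meteo.com/v1/forecast"
           f"?latitude={lat}&longitude={lon}&current_weather=true")
    try:
        with urllib.request.urlopen(url, timeout=5) as r:
            cur = json.load(r)["current_weather"]
        return f"{cur['temperature']}C, wind {cur['windspeed']} km/h"
    except Exception as e:
        # Network errors are returned as strings, same as tool errors:
        # the model reads them on the next turn.
        return f"error: {e}"
```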
## The tool schema is the contract
The `TOOLS` list is what the model sees. Each entry is JSON Schema for one function: a name, a natural-language description, and a parameter schema. The description is not decorative — it is the only thing telling the model when to reach for this tool instead of answering directly. Write it like a docstring for a very literal junior developer.
The schema goes on the API call via the `tools=` argument. The model looks at your user message, looks at the tool list, and decides whether to respond with text or with a `tool_calls` array asking you to run one.
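It helps to see the shape of what comes back when the model opts for a tool. A sketch with the response faked as plain dicts — the field names follow the Chat Completions API, the values are made up:

```python
import json

# What an assistant turn with a tool call looks like, roughly.
assistant_msg = {
    "role": "assistant",
    "content": None,                    # no text yet — the model wants a tool first
    "tool_calls": [{
        "id": "call_abc123",            # you echo this id back with the result
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"city": "Lisbon"}',  # a JSON *string*, not a dict
        },
    }],
}

# Your side parses the string before dispatching.
args = json.loads(assistant_msg["tool_calls"][0]["function"]["arguments"])
print(args["city"])  # → Lisbon
```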
## The loop is where the work happens
```python
for _ in range(max_steps):
    resp = client.chat.completions.create(
        model=MODEL, messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        return msg.content
    ...
```
Four things to notice.
One: the loop is capped. `max_steps=6` is the safety rail. Without it, a confused model that keeps calling tools in a circle runs forever and burns your account. A public incident last November cost one team $47,000 because four LangChain agents looped for eleven days. A `for` loop with a bound is the first defense.
Two: the exit condition is negative. You do not ask "is the agent done?" You ask "did the model skip asking for a tool?" When the model responds with plain content and no `tool_calls`, it is answering the user. That is how you know to stop.
Three: the assistant message goes back in. Every response from the model — text and tool calls — is appended to `messages` as-is. The next turn sees the full history. This is why the model knows it has already asked for the weather and does not ask again.
Four: parallel tool calls are free. Modern chat-completion models return a list of `tool_calls`, not one. When the user asks for weather and time, a single assistant turn can request both. The inner `for call in msg.tool_calls` handles that without any extra logic.
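The parallel case can be sketched offline by faking one assistant turn as plain dicts and running the same inner loop over it — the tool stubs and the `call_1`/`call_2` ids here are made up:

```python
import json

# Stubbed tools so the sketch runs offline.
DISPATCH = {
    "get_weather": lambda city: {"Lisbon": "18C, clear"}.get(city, "no data"),
    "get_time": lambda tz: "14:37",
}

# What msg.tool_calls might look like for "weather in Lisbon and the time there":
# both requests arrive in a single assistant turn.
tool_calls = [
    {"id": "call_1", "function": {"name": "get_weather",
                                  "arguments": '{"city": "Lisbon"}'}},
    {"id": "call_2", "function": {"name": "get_time",
                                  "arguments": '{"tz": "Europe/Lisbon"}'}},
]

results = []
for call in tool_calls:
    args = json.loads(call["function"]["arguments"])
    results.append({"role": "tool", "tool_call_id": call["id"],
                    "content": str(DISPATCH[call["function"]["name"]](**args))})

print([r["content"] for r in results])  # → ['18C, clear', '14:37']
```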
## Running the tools and feeding results back
```python
for call in msg.tool_calls:
    args = json.loads(call.function.arguments)
    try:
        result = DISPATCH[call.function.name](**args)
    except Exception as e:
        result = f"error: {e}"
    messages.append({"role": "tool",
                     "tool_call_id": call.id, "content": str(result)})
```
Three details:
- `call.function.arguments` is a JSON string, not a dict. The model writes JSON into a string field. You parse it. If the model produces invalid JSON, `json.loads` raises — which is a tool-call-argument error, distinct from a tool-execution error, and in a production agent you would handle the two differently.
- `DISPATCH` is a name-to-function map. Nothing clever. It is the registry the model's function names resolve against. If the model hallucinates a tool that does not exist, you will get a `KeyError`; wrap it the same way.
- The result goes back as a message with `role="tool"` and the original `tool_call_id`. That ID is how the model matches the answer to the question it asked. Lose the ID, break the conversation.
The try/except around the dispatch is the entire error-handling story. When a tool crashes, you return the error as a string instead of letting the exception escape. The model reads the error on the next turn and typically corrects itself — asks for a different city, retries with different arguments, or gives up and tells the user. An agent that panics on a raised exception stops being useful; an agent that sees the error message adapts.
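One way to separate the failure modes mentioned above — bad arguments, hallucinated tool names, crashing tools — is a small dispatch helper. A sketch, not part of the original agent; the names are illustrative:

```python
import json

def run_tool(dispatch: dict, name: str, raw_args: str) -> str:
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as e:
        return f"argument error: {e}"    # model wrote invalid JSON — let it retry
    fn = dispatch.get(name)
    if fn is None:
        return f"unknown tool: {name}"   # model hallucinated a tool name
    try:
        return str(fn(**args))
    except Exception as e:
        return f"tool error: {e}"        # tool ran and crashed

dispatch = {"get_weather": lambda city: {"Lisbon": "18C, clear"}[city]}
print(run_tool(dispatch, "get_weather", '{"city": "Lisbon"}'))  # → 18C, clear
print(run_tool(dispatch, "get_weather", '{bad json'))           # → argument error: ...
print(run_tool(dispatch, "get_moon", '{}'))                     # → unknown tool: get_moon
```

Every branch returns a string, so the model always gets something to read on the next turn.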
## The termination condition, said plainly
There are three ways this function returns:
- The model emits an assistant message with no tool calls. That is a finished answer; return it.
- The loop hits
max_stepswithout a finished answer. That is a safety stop; return a sentinel. - An unrecoverable error bubbles up — network, auth, something outside the tool dispatch. You did not handle this one and that is correct for a first agent. Let it crash loudly so you notice.
Most agent bugs are failures of termination. The model keeps asking for tools because the instructions are ambiguous, because one tool's result contradicts another, or because the model decided it needs just one more piece of information. The step cap is how you survive that without reading the logs tomorrow.
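One cheap guard against the tool-calling-in-a-circle failure, beyond the step cap, is to stop when the model asks for the exact same call twice. A sketch under the assumption that an identical repeated call is always a bug (it usually is for read-only tools):

```python
import json

def is_repeat(seen: set, name: str, raw_args: str) -> bool:
    # Normalize the argument JSON so key order doesn't defeat the comparison.
    key = (name, json.dumps(json.loads(raw_args), sort_keys=True))
    if key in seen:
        return True
    seen.add(key)
    return False

seen = set()
print(is_repeat(seen, "get_weather", '{"city": "Lisbon"}'))  # → False
print(is_repeat(seen, "get_weather", '{"city": "Lisbon"}'))  # → True, loop detected
```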
## What LangChain adds (and what it costs)
A framework gives you tool decorators, structured output parsing, retry policies, memory abstractions, graph-based routing, and an execution tracer. Useful at scale. But every one of those features is built on top of the fifty lines above. When an agent misbehaves inside a framework, you debug it by mentally unrolling the abstraction until you are back at: what messages went in, what tool calls came out, what results went back.
Write the raw version once. After that, the framework is a convenience, not a black box.
## What is missing from this agent
Things this code does not do, ordered by how badly you want them in production:
- No per-tool timeout. A tool that hangs hangs the whole agent. Wrap each dispatch in `asyncio.wait_for` or a thread with a timeout.
- No cost cap. The step limit bounds the number of turns but not the tokens per turn. Track `resp.usage.total_tokens` across the loop and stop when you cross a budget.
- No tracing. You cannot see what the model asked for, what your tool returned, or how long each step took. That is survivable for one agent on one desk. It is not survivable in production.
- No guardrails on tool arguments. The `**args` expansion trusts the model. If `get_weather` did anything more dangerous than a dict lookup — a database query, a file read, a shell command — you would want strict argument validation before the function sees them.
- No system prompt. For a toy, omitting it is fine. For a real agent, a system prompt that names the tools and the stopping rule cuts wasted turns sharply.
## Try it yourself
Modify one thing at a time:
- Change `get_weather` to call a real API (Open-Meteo is free and keyless).
- Add a third tool — a calculator, a file reader, a shell runner if you are brave.
- Print `resp.usage` every turn and watch what the loop actually costs.
- Break the agent on purpose: make `get_time` raise, make the schema require a field the model forgets, lower `max_steps` to 2. Read the behavior.
Every agent framework you will ever use is a pile of abstractions over the loop above. Once the shape is in your hands, the frameworks become a shopping decision, not a mystery.
## If this was useful
The agent above works. It is also the thing you will spend the next six months trying to see inside of — which tool got called, which argument drifted, which step burned fifteen thousand tokens for no reason. That is observability, and it is what I wrote a book about.
- Book: Observability for LLM Applications — paperback and hardcover now; ebook April 22. Covers OpenTelemetry GenAI semantic conventions, the agent span tree, cost and loop detection, and an incident playbook for the week your loop runs for eleven days.
- Hermes IDE: hermes-ide.com — the IDE for developers shipping with Claude Code and other AI coding tools.
- Me: xgabriel.com · github.com/gabrielanhaia.