- Book: Observability for LLM Applications — paperback and hardcover on Amazon · Ebook from Apr 22
- My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
- Me: xgabriel.com | GitHub
You have called the chat completions endpoint. You sent a list of messages, you got a string back, you printed it. Then someone said the word agent and it started sounding like a different thing, with frameworks and graphs and AgentExecutor classes and a tutorial that imports eleven things before it does any work.
An agent is not a different thing. It is a while-loop around the same chat completions call you already know, with two new pieces bolted on: a list of tools the model is allowed to ask for, and the code that actually runs them when it does.
This post builds one from scratch. Fifty lines of Python, the OpenAI SDK, no framework. The agent answers questions like "what's the weather in Lisbon and what time is it there" by deciding on its own which tools to call, in what order, and when to stop. Once you have read it, the LangChain source will stop looking like magic.
## The whole program, first
Here is the entire agent. Read it once, then we walk through why each piece exists.
```python
# agent.py
import json
from datetime import datetime
from zoneinfo import ZoneInfo

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"

def get_weather(city: str) -> str:
    fake = {"Lisbon": "18C, clear", "Berlin": "7C, rain"}
    return fake.get(city, "no data")

def get_time(tz: str) -> str:
    return datetime.now(ZoneInfo(tz)).strftime("%H:%M")

TOOLS = [
    {"type": "function", "function": {
        "name": "get_weather",
        "description": "Current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string"}},
                       "required": ["city"]}}},
    {"type": "function", "function": {
        "name": "get_time",
        "description": "Current local time for an IANA tz.",
        "parameters": {"type": "object",
                       "properties": {"tz": {"type": "string"}},
                       "required": ["tz"]}}},
]

DISPATCH = {"get_weather": get_weather, "get_time": get_time}

def run(user_msg: str, max_steps: int = 6) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_steps):
        resp = client.chat.completions.create(
            model=MODEL, messages=messages, tools=TOOLS)
        msg = resp.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:
            return msg.content
        for call in msg.tool_calls:
            args = json.loads(call.function.arguments)
            try:
                result = DISPATCH[call.function.name](**args)
            except Exception as e:
                result = f"error: {e}"
            messages.append({"role": "tool",
                             "tool_call_id": call.id, "content": str(result)})
    return "stopped: step limit reached"

if __name__ == "__main__":
    print(run("Weather in Lisbon and what time is it in Europe/Lisbon?"))
```
Install and run:
```bash
pip install openai
export OPENAI_API_KEY=sk-...
python agent.py
```
Output on a real run:
```
The weather in Lisbon is currently 18C and clear.
The local time in Europe/Lisbon is 14:37.
```
That is an agent. One file. Two tools. One loop. Let's open each piece.
## The tools are just Python functions
`get_weather` and `get_time` are ordinary Python. They take arguments, return a string, and know nothing about LLMs. That matters: the model never runs code. Your process runs code. The model only names a function and supplies arguments; your program decides whether to call it.
The weather function is stubbed with a dict so the example runs offline. Swap it for a real API when you care — the agent loop does not change.
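For reference, here is one possible real `get_weather` against Open-Meteo, which is free and keyless. The city-to-coordinates table is a stand-in (a real version would geocode), and the exact response fields are taken from the Open-Meteo forecast API; network failures become tool results rather than crashes, so the agent loop still does not change.

```python
import json
import urllib.request

# Hypothetical lookup table — a real version would call a geocoding API.
COORDS = {"Lisbon": (38.72, -9.14), "Berlin": (52.52, 13.41)}

def get_weather(city: str) -> str:
    if city not in COORDS:
        return "no data"
    lat, lon = COORDS[city]
    url = (f"https://api.open-meteo.com/v1/forecast"
           f"?latitude={lat}&longitude={lon}&current_weather=true")
    try:
        with urllib.request.urlopen(url, timeout=5) as r:
            cur = json.load(r)["current_weather"]
        return f"{cur['temperature']}C, wind {cur['windspeed']} km/h"
    except Exception as e:
        # Network errors are returned as strings, same as tool errors:
        # the model reads them on the next turn.
        return f"error: {e}"
```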
## The tool schema is the contract
The `TOOLS` list is what the model sees. Each entry is JSON Schema for one function: a name, a natural-language description, and a parameter schema. The description is not decorative — it is the only thing telling the model when to reach for this tool instead of answering directly. Write it like a docstring for a very literal junior developer.
The schema goes on the API call via the `tools=` argument. The model looks at your user message, looks at the tool list, and decides whether to respond with text or with a `tool_calls` array asking you to run one.
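It helps to see the shape of what comes back when the model opts for a tool. A sketch with the response faked as plain dicts — the field names follow the Chat Completions API, the values are made up:

```python
import json

# What an assistant turn with a tool call looks like, roughly.
assistant_msg = {
    "role": "assistant",
    "content": None,                    # no text yet — the model wants a tool first
    "tool_calls": [{
        "id": "call_abc123",            # you echo this id back with the result
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": '{"city": "Lisbon"}',  # a JSON *string*, not a dict
        },
    }],
}

# Your side parses the string before dispatching.
args = json.loads(assistant_msg["tool_calls"][0]["function"]["arguments"])
print(args["city"])  # → Lisbon
```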
## The loop is where the work happens
```python
for _ in range(max_steps):
    resp = client.chat.completions.create(
        model=MODEL, messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    messages.append(msg)
    if not msg.tool_calls:
        return msg.content
    ...
```
Four things to notice.
One: the loop is capped. `max_steps=6` is the safety rail. Without it, a confused model that keeps calling tools in a circle runs forever and burns your account. A public incident last November cost one team $47,000 because four LangChain agents looped for eleven days. A `for` loop with a bound is the first defense.
Two: the exit condition is negative. You do not ask "is the agent done?" You ask "did the model skip asking for a tool?" When the model responds with plain content and no `tool_calls`, it is answering the user. That is how you know to stop.
Three: the assistant message goes back in. Every response from the model — text and tool calls — is appended to `messages` as-is. The next turn sees the full history. This is why the model knows it has already asked for the weather and does not ask again.
Four: parallel tool calls are free. Modern chat-completion models return a list of `tool_calls`, not one. When the user asks for weather and time, a single assistant turn can request both. The inner `for call in msg.tool_calls` handles that without any extra logic.
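The parallel case can be sketched offline by faking one assistant turn as plain dicts and running the same inner loop over it — the tool stubs and the `call_1`/`call_2` ids here are made up:

```python
import json

# Stubbed tools so the sketch runs offline.
DISPATCH = {
    "get_weather": lambda city: {"Lisbon": "18C, clear"}.get(city, "no data"),
    "get_time": lambda tz: "14:37",
}

# What msg.tool_calls might look like for "weather in Lisbon and the time there":
# both requests arrive in a single assistant turn.
tool_calls = [
    {"id": "call_1", "function": {"name": "get_weather",
                                  "arguments": '{"city": "Lisbon"}'}},
    {"id": "call_2", "function": {"name": "get_time",
                                  "arguments": '{"tz": "Europe/Lisbon"}'}},
]

results = []
for call in tool_calls:
    args = json.loads(call["function"]["arguments"])
    results.append({"role": "tool", "tool_call_id": call["id"],
                    "content": str(DISPATCH[call["function"]["name"]](**args))})

print([r["content"] for r in results])  # → ['18C, clear', '14:37']
```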
## Running the tools and feeding results back
```python
for call in msg.tool_calls:
    args = json.loads(call.function.arguments)
    try:
        result = DISPATCH[call.function.name](**args)
    except Exception as e:
        result = f"error: {e}"
    messages.append({"role": "tool",
                     "tool_call_id": call.id, "content": str(result)})
```
Three details:
- `call.function.arguments` is a JSON string, not a dict. The model writes JSON into a string field. You parse it. If the model produces invalid JSON, `json.loads` raises — which is a tool-call-argument error, distinct from a tool-execution error, and in a production agent you would handle the two differently.
- `DISPATCH` is a name-to-function map. Nothing clever. It is the registry the model's function names resolve against. If the model hallucinates a tool that does not exist, you will get a `KeyError`; wrap it the same way.
- The result goes back as a message with `role="tool"` and the original `tool_call_id`. That ID is how the model matches the answer to the question it asked. Lose the ID, break the conversation.
The try/except around the dispatch is the entire error-handling story. When a tool crashes, you return the error as a string instead of letting the exception escape. The model reads the error on the next turn and typically corrects itself — asks for a different city, retries with different arguments, or gives up and tells the user. An agent that panics on a raised exception stops being useful; an agent that sees the error message adapts.
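One way to separate the failure modes mentioned above — bad arguments, hallucinated tool names, crashing tools — is a small dispatch helper. A sketch, not part of the original agent; the names are illustrative:

```python
import json

def run_tool(dispatch: dict, name: str, raw_args: str) -> str:
    try:
        args = json.loads(raw_args)
    except json.JSONDecodeError as e:
        return f"argument error: {e}"    # model wrote invalid JSON — let it retry
    fn = dispatch.get(name)
    if fn is None:
        return f"unknown tool: {name}"   # model hallucinated a tool name
    try:
        return str(fn(**args))
    except Exception as e:
        return f"tool error: {e}"        # tool ran and crashed

dispatch = {"get_weather": lambda city: {"Lisbon": "18C, clear"}[city]}
print(run_tool(dispatch, "get_weather", '{"city": "Lisbon"}'))  # → 18C, clear
print(run_tool(dispatch, "get_weather", '{bad json'))           # → argument error: ...
print(run_tool(dispatch, "get_moon", '{}'))                     # → unknown tool: get_moon
```

Every branch returns a string, so the model always gets something to read on the next turn.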
## The termination condition, said plainly
There are three ways this function returns:
- The model emits an assistant message with no tool calls. That is a finished answer; return it.
- The loop hits
max_stepswithout a finished answer. That is a safety stop; return a sentinel. - An unrecoverable error bubbles up — network, auth, something outside the tool dispatch. You did not handle this one and that is correct for a first agent. Let it crash loudly so you notice.
Most agent bugs are failures of termination. The model keeps asking for tools because the instructions are ambiguous, because one tool's result contradicts another, or because the model decided it needs just one more piece of information. The step cap is how you survive that without reading the logs tomorrow.
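One cheap guard against the tool-calling-in-a-circle failure, beyond the step cap, is to stop when the model asks for the exact same call twice. A sketch under the assumption that an identical repeated call is always a bug (it usually is for read-only tools):

```python
import json

def is_repeat(seen: set, name: str, raw_args: str) -> bool:
    # Normalize the argument JSON so key order doesn't defeat the comparison.
    key = (name, json.dumps(json.loads(raw_args), sort_keys=True))
    if key in seen:
        return True
    seen.add(key)
    return False

seen = set()
print(is_repeat(seen, "get_weather", '{"city": "Lisbon"}'))  # → False
print(is_repeat(seen, "get_weather", '{"city": "Lisbon"}'))  # → True, loop detected
```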
## What LangChain adds (and what it costs)
A framework gives you tool decorators, structured output parsing, retry policies, memory abstractions, graph-based routing, and an execution tracer. Useful at scale. But every one of those features is built on top of the fifty lines above. When an agent misbehaves inside a framework, you debug it by mentally unrolling the abstraction until you are back at: what messages went in, what tool calls came out, what results went back.
Write the raw version once. After that, the framework is a convenience, not a black box.
## What is missing from this agent
Things this code does not do, ordered by how badly you want them in production:
- No per-tool timeout. A tool that hangs hangs the whole agent. Wrap each dispatch in `asyncio.wait_for` or a thread with a timeout.
- No cost cap. The step limit bounds the number of turns but not the tokens per turn. Track `resp.usage.total_tokens` across the loop and stop when you cross a budget.
- No tracing. You cannot see what the model asked for, what your tool returned, or how long each step took. That is survivable for one agent on one desk. It is not survivable in production.
- No guardrails on tool arguments. The `**args` expansion trusts the model. If `get_weather` did anything more dangerous than a dict lookup — a database query, a file read, a shell command — you would want strict argument validation before the function sees them.
- No system prompt. For a toy, omitting it is fine. For a real agent, a system prompt that names the tools and the stopping rule cuts wasted turns sharply.
## Try it yourself
Modify one thing at a time:
- Change `get_weather` to call a real API (Open-Meteo is free and keyless).
- Add a third tool — a calculator, a file reader, a shell runner if you are brave.
- Print `resp.usage` every turn and watch what the loop actually costs.
- Break the agent on purpose: make `get_time` raise, make the schema require a field the model forgets, lower `max_steps` to 2. Read the behavior.
Every agent framework you will ever use is a pile of abstractions over the loop above. Once the shape is in your hands, the frameworks become a shopping decision, not a mystery.
## If this was useful
The agent above works. It is also the thing you will spend the next six months trying to see inside of — which tool got called, which argument drifted, which step burned fifteen thousand tokens for no reason. That is observability, and it is what I wrote a book about.
- Book: Observability for LLM Applications — paperback and hardcover now; ebook April 22. Covers OpenTelemetry GenAI semantic conventions, the agent span tree, cost and loop detection, and an incident playbook for the week your loop runs for eleven days.
- Hermes IDE: hermes-ide.com — the IDE for developers shipping with Claude Code and other AI coding tools.
- Me: xgabriel.com · github.com/gabrielanhaia.