Everyone is reaching for a framework the moment they hear "AI agent." LangChain, AutoGen, CrewAI — the ecosystem has exploded, and that's genuinely exciting. But I've watched too many teams spend two weeks wiring up abstractions before writing a single line of business logic, only to hit a wall when something goes wrong and they can't see why.
This post is about building agents from scratch. Not because frameworks are bad — they're not — but because you can't use a tool well if you don't understand what it's doing underneath. By the end, you'll have a working agent loop in ~100 lines of Python, a mental model for tool design, and a clearer instinct for when a framework actually earns its place.
## What even is an agent?
Let's be precise. An agent, in the context of LLMs, is a loop:
observe → think → act → observe → think → act → ...
The model receives a context (observation), decides what to do (think), and either calls a tool or returns a final answer (act). That's it. No magic. No orchestration daemon. Just a loop with a model at the center.
The reason this is powerful is that the model decides how many steps to take. You're not pre-scripting a chain of calls. The model reads the results of each action and figures out what to do next. That emergent flexibility is what makes agents useful for open-ended tasks.
## The minimal agent loop
Here's a barebones agent in Python. No framework, just the Anthropic SDK and a dictionary of tools you define yourself.
```python
import anthropic
import json

client = anthropic.Anthropic()
MODEL = "claude-opus-4-5"

def run_agent(user_message: str, tools: list, tool_map: dict) -> str:
    messages = [{"role": "user", "content": user_message}]

    while True:
        response = client.messages.create(
            model=MODEL,
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )

        # Model is done: return the final text
        if response.stop_reason == "end_turn":
            return next(
                block.text for block in response.content
                if hasattr(block, "text")
            )

        # Model wants to use a tool
        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    fn = tool_map.get(block.name)
                    if fn is None:
                        result = f"Error: unknown tool '{block.name}'"
                    else:
                        try:
                            result = fn(**block.input)
                        except Exception as e:
                            result = f"Error: {e}"
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })
            messages.append({"role": "user", "content": tool_results})
            continue

        # Any other stop reason (e.g. max_tokens) would spin forever; fail loudly
        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
```
That's the whole loop: a few dozen lines, most of it plumbing around a handful of decisions. Let me walk through what's happening:
- We send the user's message with the list of available tools.
- If the model responds with `end_turn`, it's satisfied, and we return the text.
- If the model responds with `tool_use`, it wants to call something. We execute the function, capture the result, and append both the model's tool call and our result to the message history.
- We loop again; the model now sees what happened and decides its next move.
The message history is the entire state of the agent. No hidden state, no magic context managers. Just a list.
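To make that concrete, here's roughly what the history looks like after one round of tool use. This is a simplified sketch: the tool name, IDs, and content are made up, and in practice the assistant content blocks are SDK objects rather than plain dicts.

```python
# Sketch of the message history after one tool round.
# Hypothetical tool and IDs; real assistant blocks are SDK objects.
messages = [
    {"role": "user", "content": "What's the weather in Oslo?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Oslo"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": "4°C, rain"},
    ]},
]
```

Every tool round appends exactly two entries: the assistant's tool call, and a user message carrying the result. Replaying this list is all it takes to resume or debug an agent run.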
## Designing tools the model can actually use
This is where most agents fail — not in the loop, but in the tool design. A poorly described tool is like a function with no docstring: a model (like a human) will misuse it.
### The three rules of good tool design
#### 1. One responsibility per tool
Don't build a manage_database tool. Build query_database, insert_record, and delete_record. Atomic tools give the model precise control. Broad tools create ambiguity about what will happen on a given call.
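As a sketch, here's what the atomic split might look like as tool definitions. The names, descriptions, and schemas here are hypothetical, not from any real API:

```python
# Hypothetical atomic tools replacing one broad manage_database tool.
query_database_tool = {
    "name": "query_database",
    "description": "Run a read-only SQL SELECT and return rows as JSON.",
    "input_schema": {
        "type": "object",
        "properties": {
            "sql": {"type": "string", "description": "A SELECT statement."}
        },
        "required": ["sql"],
    },
}

delete_record_tool = {
    "name": "delete_record",
    "description": "Delete a single record by its integer ID. Irreversible.",
    "input_schema": {
        "type": "object",
        "properties": {
            "record_id": {"type": "integer", "description": "The record's ID."}
        },
        "required": ["record_id"],
    },
}
```

Each tool's blast radius is now obvious from its name alone, which also makes it easy to gate destructive tools behind confirmation later.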
#### 2. Describe the output, not just the input
Most developers describe parameters carefully and ignore what the tool returns. The model needs to know what to expect so it can plan the next step.
```python
# ❌ Vague
{
    "name": "search_docs",
    "description": "Search the documentation.",
    "input_schema": { ... },
}

# ✅ Clear
{
    "name": "search_docs",
    "description": (
        "Full-text search over the product documentation. "
        "Returns up to 5 results, each with a 'title', 'url', and 'excerpt'. "
        "Use this before answering any question about product features."
    ),
    "input_schema": { ... },
}
```
#### 3. Make errors informative
Your tool will fail. The model will retry. Whether it retries intelligently depends entirely on what error message it gets back.
```python
# `db` is a hypothetical wrapper that raises SyntaxError / PermissionError.
def query_database(sql: str) -> str:
    try:
        results = db.execute(sql)
        return json.dumps(results)
    except SyntaxError as e:
        return f"SQL syntax error: {e}. Check your query and try again."
    except PermissionError:
        return "Access denied. Only SELECT queries are permitted."
```
Human-readable errors aren't just good UX for users. They're good UX for models.
## A real example: a docs search agent
Let's put this together with a concrete example. We'll build a small agent that answers questions about an API by searching a documentation index and fetching page content.
### Define the tools
```python
import httpx
from bs4 import BeautifulSoup

def search_docs(query: str) -> str:
    """Search the docs index and return matching pages."""
    # In a real scenario, this calls your search backend (Algolia, Typesense, etc.)
    results = mock_search_index(query)
    if not results:
        return "No results found for that query."
    return json.dumps(results[:5])

def fetch_page(url: str) -> str:
    """Fetch the text content of a documentation page."""
    try:
        resp = httpx.get(url, timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        # Grab the main content area only
        main = soup.find("main") or soup.body
        return main.get_text(separator="\n", strip=True)[:4000]
    except httpx.HTTPError as e:
        return f"Failed to fetch page: {e}"
```
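The `mock_search_index` helper is left undefined above; here's a toy stand-in so the example can run end-to-end. The titles, URLs, and snippets are all made up:

```python
# Toy stand-in for a real search backend (Algolia, Typesense, etc.).
# All entries are hypothetical placeholder data.
DOCS_INDEX = [
    {"title": "Streaming responses",
     "url": "https://example.com/docs/streaming",
     "snippet": "Receive tokens incrementally as server-sent events."},
    {"title": "Authentication",
     "url": "https://example.com/docs/auth",
     "snippet": "Pass your API key in the request headers."},
]

def mock_search_index(query: str) -> list[dict]:
    """Naive substring match over title and snippet."""
    q = query.lower()
    return [
        page for page in DOCS_INDEX
        if q in page["title"].lower() or q in page["snippet"].lower()
    ]
```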
```python
TOOLS = [
    {
        "name": "search_docs",
        "description": (
            "Search the API documentation index. Returns a list of matching pages "
            "with 'title', 'url', and 'snippet'. Use this first to find relevant pages."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The search query."}
            },
            "required": ["query"],
        },
    },
    {
        "name": "fetch_page",
        "description": (
            "Fetch the full text content of a documentation page by URL. "
            "Use this after search_docs to get complete details."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {"type": "string", "description": "The full URL of the page."}
            },
            "required": ["url"],
        },
    },
]

TOOL_MAP = {
    "search_docs": search_docs,
    "fetch_page": fetch_page,
}
```
### Run it
```python
answer = run_agent(
    "How do I stream responses in the Anthropic API?",
    tools=TOOLS,
    tool_map=TOOL_MAP,
)
print(answer)
```
Watch the model search, find relevant pages, fetch the one that looks most useful, and synthesize an answer. All without you scripting which steps to take.
## The failure modes you need to prepare for
Building agents in production means accepting that the model will sometimes do something unexpected. Here are the patterns I see most often — and how to handle them.
### Infinite loops
The model keeps calling tools and never returns `end_turn`. This usually happens when:
- A tool always returns something ambiguous (e.g., always returns "no results")
- The model is stuck trying to satisfy a goal it can't reach
Fix: Add a step counter and bail out after a sensible maximum.
```python
MAX_STEPS = 15

# Inside run_agent:
step = 0
while True:
    step += 1
    if step > MAX_STEPS:
        return "Agent reached maximum steps without completing the task."
    ...
```
### Hallucinated tool calls
The model invents parameter values it couldn't possibly know, especially for IDs or URLs. This happens when the model doesn't receive the right context from earlier tool results.
Fix: Make your tool outputs explicit. Don't return `{"id": "abc123"}` — return `{"record_id": "abc123", "use_this_id_for_subsequent_calls": true}`. Verbose, but models respond to it.
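For instance, a record-creation tool might echo the ID back with an instruction attached. This is a hypothetical sketch: `create_record` and its hard-coded ID are made up for illustration.

```python
import json

# Hypothetical sketch: make IDs in tool output impossible to miss.
def create_record(data: dict) -> str:
    record_id = "abc123"  # would come from the database in practice
    return json.dumps({
        "record_id": record_id,
        "note": "Use this record_id for all subsequent calls about this record.",
    })
```

The model reads tool results as plain text, so redundancy that would be bad API design for humans is exactly what keeps it from inventing IDs.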
### Tool misuse due to poor descriptions
The model calls delete_record when it should call query_record, or passes a string where an integer is expected.
Fix: Schema validation in your tool wrapper, and rejection messages that explain the correct usage:
```python
def delete_record(record_id: int) -> str:
    if not isinstance(record_id, int):
        return f"Invalid input: record_id must be an integer, got {type(record_id).__name__}."
    ...
```
## When should you reach for a framework?
Now that you understand the primitives, here's an honest take on when a framework actually helps:
| Situation | Roll your own | Use a framework |
|---|---|---|
| Single-agent, internal tool | ✅ | Overkill |
| Multi-agent coordination | Maybe | ✅ |
| Complex memory requirements | Maybe | ✅ |
| Rapid prototyping | ✅ | Also fine |
| Production, you own the stack | ✅ | If team knows it |
| Need observability/tracing | Add it yourself | ✅ LangSmith, etc. |
The honest answer: start from scratch until the loop gets complicated enough that a framework's abstractions save you more time than they cost you in debugging. For most internal tools and single-agent workflows, that inflection point never comes.
## What's next
If this sparked something, here are some directions worth exploring:
- Parallel tool calls — the Anthropic API can return multiple `tool_use` blocks in one response. Run them concurrently with `asyncio.gather` and feed back all results in one message.
- Memory patterns — inject a summary of past interactions into the system prompt to give agents long-term context without blowing the context window.
- Human-in-the-loop — pause the agent loop at certain tool calls and ask a human to confirm before proceeding. Especially valuable for write operations.
- Multi-agent handoff — one agent's `end_turn` text becomes another agent's user message. Compose systems from simple agents rather than building one mega-agent.
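The parallel-tool-calls idea can be sketched like this, assuming synchronous tool functions and representing `tool_use` blocks as plain dicts for illustration (real SDK blocks are objects with attributes):

```python
import asyncio

# Sketch: execute all tool_use blocks from one response concurrently.
# Sync tool functions are pushed onto threads via asyncio.to_thread.
async def run_tools_concurrently(tool_blocks: list, tool_map: dict) -> list:
    async def run_one(block):
        fn = tool_map[block["name"]]
        try:
            result = await asyncio.to_thread(fn, **block["input"])
        except Exception as e:
            result = f"Error: {e}"
        return {"type": "tool_result",
                "tool_use_id": block["id"],
                "content": str(result)}
    # gather preserves input order, so results line up with the blocks
    return await asyncio.gather(*(run_one(b) for b in tool_blocks))
```

Because `gather` preserves order, the returned list can be appended to the message history exactly as in the sequential loop.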
The fundamentals don't change as you scale up. Observe, think, act. Keep the loop clear, keep the tools honest, and the model will surprise you.