Diven Rastdus

The Boring Stack That Beats Every AI Agent Framework

Every AI agent framework promises to make agents easy. None of them do. The complexity just moves from your code to their abstractions.

I've shipped production agents at Astraedus using LangChain, AutoGen, and CrewAI. I've also shipped agents with zero frameworks. The zero-framework versions are in production today. The framework versions got rewritten.

Here's what I learned.

The Boring Stack

The boring stack is not a product. It's not a company. It's five things:

  1. A strong model (Claude Sonnet, GPT-4o, or similar)
  2. Well-designed tools (functions your model can call)
  3. A simple orchestration loop (while not done: think, act, observe)
  4. Structured output (Pydantic or JSON schema)
  5. Error handling (retry logic, fallbacks, logging)

That's it. No graph state. No agent personas. No multi-layer memory abstractions. No "agentic frameworks." Just a model, tools, and a loop.
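Item 3, the orchestration loop, is the heart of it. Sketched as runnable pseudocode (the `model` callable and action format here are stand-ins for illustration, not a real provider API):

```python
def agent_loop(task, model, tools, max_steps=10):
    """Think-act-observe until the model signals it's done (illustrative shape)."""
    context = [task]
    for _ in range(max_steps):
        action = model(context)                            # think: decide the next step
        if action.get("done"):
            return action["answer"]                        # done: return the final answer
        result = tools[action["tool"]](**action["args"])   # act: run the chosen tool
        context.append(result)                             # observe: feed the result back
    raise RuntimeError("max_steps reached")
```

The real version later in this post fills in the provider-specific details, but the shape never changes.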

Why Frameworks Add Overhead

Every framework you add to a codebase is a dependency you now maintain. That's not an opinion; it's a fact about software.

Dependency chains: At the time of writing, LangChain's base package has over 50 dependencies. When OpenAI ships a breaking API change, you wait for LangChain to update before you can use it. You are now downstream of their release cadence.

Leaky abstractions: Agent frameworks abstract the tool-call loop. But when your agent misbehaves, you end up reading the framework source code anyway. The abstraction collapses exactly when you need it most.

Magic that breaks: Framework "magic" like automatic memory injection or prompt templating works great in demos. In production, you need to know exactly what's in your model's context window at all times. Magic is the enemy of debuggability.
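Without a framework, keeping that visibility is cheap: log the exact message list before every API call. A minimal sketch (the helper name and truncation width are my own choices):

```python
def log_context(messages, logger=print, width=120):
    """Print exactly what will be sent as the model's context."""
    for i, msg in enumerate(messages):
        content = msg["content"]
        if not isinstance(content, str):
            # Tool-result and other structured content: show a count, not a dump
            content = f"<{len(content)} structured block(s)>"
        logger(f"[{i}] {msg['role']}: {content[:width]}")
```

Call it right before `client.messages.create(...)` and "what's in the context window?" stops being a mystery.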

Documentation lag: Every major model provider ships new features constantly. Context windows grow. Tool call formats change. Native structured output gets added. Framework docs lag 2-4 weeks behind. You either wait or monkey-patch around it.

The overhead compounds. What starts as "faster development" becomes a graveyard of version pins and workarounds by month three.

The Boring Stack in Code

Here's a production agent in under 100 lines of Python. No framework. Just the Anthropic SDK and a tool loop:

import anthropic
import json
from typing import Any

client = anthropic.Anthropic()

# Define your tools as plain Python functions
def search_database(query: str) -> dict:
    # Your actual database logic here
    return {"results": [{"id": 1, "text": f"Result for: {query}"}]}

def create_record(data: dict) -> dict:
    # Your actual write logic here
    return {"success": True, "id": 42}

# Map tool names to functions
TOOLS = {
    "search_database": search_database,
    "create_record": create_record,
}

# Tool definitions for the model
TOOL_SCHEMAS = [
    {
        "name": "search_database",
        "description": "Search the database for records matching a query",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "create_record",
        "description": "Create a new record in the database",
        "input_schema": {
            "type": "object",
            "properties": {
                "data": {"type": "object", "description": "Record data to store"}
            },
            "required": ["data"]
        }
    }
]

def run_agent(user_message: str, max_iterations: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]

    for _ in range(max_iterations):
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=4096,
            tools=TOOL_SCHEMAS,
            messages=messages
        )

        # Add assistant response to history
        messages.append({"role": "assistant", "content": response.content})

        # Done: no tool calls requested
        if response.stop_reason == "end_turn":
            # Return all text blocks in the final response
            return "".join(
                block.text for block in response.content if hasattr(block, "text")
            )
        # Process tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                func = TOOLS.get(block.name)
                if func:
                    try:
                        result = func(**block.input)
                    except Exception as e:
                        result = {"error": str(e)}

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": json.dumps(result)
                    })

        # Feed results back
        if tool_results:
            messages.append({"role": "user", "content": tool_results})

    return "Max iterations reached"

# Usage
result = run_agent("Find any records about Python and create a summary record")
print(result)

Read that top to bottom. You understand exactly what it does. There's no magic. The model thinks, calls tools, observes results, and loops until done.

Add Pydantic for structured output validation, a retry decorator around the API call, and a logger, and you have a production-grade agent that still fits in a single readable file.
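For example, the retry decorator and a bare-bones output check might look like this (a stdlib-only sketch: in practice Pydantic's `model_validate_json` gives you richer validation, and you would catch your SDK's specific transient exceptions rather than bare `Exception`):

```python
import json
import time
from functools import wraps

def with_retries(max_attempts=3, base_delay=1.0):
    """Retry with exponential backoff. Narrow the except clause to your SDK's transient errors."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator

def validate_output(raw: str, required: dict) -> dict:
    """Poor man's structured-output check; `required` maps field name to expected type."""
    data = json.loads(raw)
    for field, typ in required.items():
        if not isinstance(data.get(field), typ):
            raise ValueError(f"bad or missing field: {field}")
    return data
```

Wrap the `client.messages.create` call with `@with_retries()` and run the model's final text through `validate_output` (or a Pydantic model) before trusting it downstream.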

When Frameworks Are Worth It

I'm not telling you frameworks are always wrong. There are real cases where they pay for themselves.

Multi-agent coordination at scale: If you're building a system with 10+ specialized agents that need to hand off tasks, share memory, and operate in parallel, a framework that handles the coordination layer can save weeks. AutoGen's group chat patterns make sense here. Building that coordination layer from scratch is non-trivial.

Complex state machines: Some workflows have branching logic that's genuinely hard to represent in a simple loop. If your agent needs to pause, wait for human approval, resume from a checkpoint, and branch based on approval outcome, a framework with built-in state management can be worth the overhead.
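That said, simple pause-and-resume is within reach of the boring stack too: the agent's entire state is the message list, so a checkpoint is just a JSON file. A sketch (function names and the status field are illustrative):

```python
import json
from pathlib import Path

def save_checkpoint(path: Path, messages: list, status: str) -> None:
    """Persist the agent's full state; for the boring stack, that's the message list."""
    path.write_text(json.dumps({"status": status, "messages": messages}))

def load_checkpoint(path: Path) -> tuple[str, list]:
    """Restore status and message history so the loop can resume where it stopped."""
    state = json.loads(path.read_text())
    return state["status"], state["messages"]
```

When your workflow outgrows this (parallel branches, nested approvals, rollback), that's the signal a state-machine framework may earn its keep.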

Large teams: When you have 8 engineers working on an agent system, consistent abstractions matter. A shared framework creates common vocabulary. Everyone knows what a "chain" or "agent" means in the codebase. The coordination benefit can outweigh the dependency cost.

The question to ask: "Am I using the framework for actual complexity management, or am I using it because the quickstart demo looked good?"

If you can articulate the specific feature of the framework that justifies the dependency, use it. If you're using it because it felt like the right tool without a specific reason, you probably don't need it.

The Test

Here's the simplest test for whether you need a framework:

If you can explain your agent's behavior by reading the code top to bottom, you don't need a framework.

Apply this to your current agent code. Open the entry point. Start reading. Can you follow the execution path without jumping into framework internals? Can you answer "what's in the model's context right now?" without digging through framework source?

If yes, you're already in a good place.

If you find yourself reading langchain/agents/base.py to understand why your agent is hallucinating a tool call, the abstraction has failed you. The complexity hasn't gone away. It's just harder to see.

What the Boring Stack Optimizes For

The boring stack optimizes for debuggability, not developer experience.

That's the right tradeoff. Developer experience matters for the three weeks you spend building the agent. Debuggability matters for the next two years you run it in production.

When an agent misbehaves at 2 AM (and it will), you want to read logs and understand exactly what happened. You want to reproduce it in a Python REPL. You want to patch the behavior with a one-line change.

The boring stack gives you that. Most frameworks do not.

One more thing: the boring stack is portable. The code above runs anywhere Python runs. There's no framework version to pin, no peer dependency conflict to resolve, no migration guide to read when the framework ships a major version.

The model API is the dependency. That's a dependency worth taking.


I build production AI systems at Astraedus using the boring stack. If your agents need fewer dependencies and more reliability, let's talk. astraedus.dev
