Niko Alho

Building Autonomous AI Agents That Actually Do Work

How I built an autonomous SEO agent in Python that perceives data, reasons through strategy, and executes tasks — no babysitting required.

Most LLMs just sit there waiting for your next prompt. You type something, it responds, you type again. It's a glorified autocomplete loop. I wanted something that actually does work while I sleep.

So I built an autonomous agent system in Python. Here's how it works and why the architecture matters more than the model you pick.

The Problem With "AI-Powered" Tools

Every SaaS tool slaps "AI-powered" on their landing page now. But 99% of them are just wrapping an LLM call in a nice UI. You still have to:

  1. Decide what to do
  2. Write the prompt
  3. Review the output
  4. Decide what to do next
  5. Repeat forever

That's not automation. That's you doing all the thinking while a machine does the typing.

What an Agent Actually Looks Like

An agent follows a loop. Not a single prompt-response — a continuous cycle of perceiving, reasoning, and acting. The pattern I use is based on the ReAct framework:

┌─────────────────────────────────────┐
│           AGENT LOOP                │
│                                     │
│  ┌──────────┐                       │
│  │ PERCEIVE │◄── APIs, databases,   │
│  └────┬─────┘    live data sources  │
│       │                             │
│  ┌────▼─────┐                       │
│  │  REASON  │◄── LLM + context      │
│  └────┬─────┘    from RAG store     │
│       │                             │
│  ┌────▼─────┐                       │
│  │   ACT    │──► API calls, DB      │
│  └────┬─────┘    writes, alerts     │
│       │                             │
│       └──────────► loop back ◄──────│
└─────────────────────────────────────┘

Each iteration, the agent observes its environment, thinks about what to do, does it, then observes the results. The key insight: the agent decides when to stop, not you.
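The loop above can be sketched in a few lines of plain Python. This is a minimal skeleton, not the full implementation from later in the post — the `perceive`, `reason`, and `act` callables are placeholders you'd swap for real data fetches, LLM calls, and API writes:

```python
def run_agent(perceive, reason, act, max_steps=20):
    """Minimal perceive-reason-act loop. `reason` returns a dict with
    either a final answer ("done": True) or the next action to take.
    The agent, not the caller, decides when the loop terminates."""
    observation = perceive(None)
    for _ in range(max_steps):
        decision = reason(observation)
        if decision["done"]:
            return decision["answer"]  # agent chose to stop
        # act on the decision, then perceive the result of that action
        observation = perceive(act(decision["action"]))
    raise RuntimeError(f"agent loop exceeded {max_steps} steps")
```

The `max_steps` cap is the safety net: without it, a confused agent loops forever.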

The Architecture: Multi-Agent, Not Monolithic

One big agent that does everything will hallucinate and lose context. Instead, I split responsibilities across specialized sub-agents:

from dataclasses import dataclass
from enum import Enum

class AgentRole(Enum):
    RESEARCHER = "researcher"
    AUDITOR = "auditor"
    STRATEGIST = "strategist"
    EXECUTOR = "executor"

@dataclass
class AgentTask:
    role: AgentRole
    objective: str
    constraints: list[str]
    context: dict

class AgentOrchestrator:
    def __init__(self, agents: dict[AgentRole, Agent]):
        self.agents = agents
        self.shared_state = {}

    async def run_pipeline(self, trigger: dict):
        # Researcher gathers data
        research = await self.agents[AgentRole.RESEARCHER].execute(
            AgentTask(
                role=AgentRole.RESEARCHER,
                objective="Analyze SERP changes for target keywords",
                constraints=["Use DataForSEO API", "Max 500 queries"],
                context=trigger,
            )
        )

        # Auditor validates against guidelines
        audit = await self.agents[AgentRole.AUDITOR].execute(
            AgentTask(
                role=AgentRole.AUDITOR,
                objective="Check findings against brand guidelines",
                constraints=["Flag confidence < 0.7"],
                context={"research": research, **self.shared_state},
            )
        )

        # Strategist formulates the plan
        strategy = await self.agents[AgentRole.STRATEGIST].execute(
            AgentTask(
                role=AgentRole.STRATEGIST,
                objective="Create action plan from research + audit",
                constraints=["Prioritize by estimated impact"],
                context={"research": research, "audit": audit},
            )
        )

        # Executor carries it out
        if strategy.confidence > 0.8:
            await self.agents[AgentRole.EXECUTOR].execute(
                AgentTask(
                    role=AgentRole.EXECUTOR,
                    objective="Implement approved changes",
                    constraints=["Dry-run first", "Log all mutations"],
                    context={"strategy": strategy},
                )
            )

Each agent has a narrow scope. The researcher never writes content. The executor never decides strategy. This separation keeps hallucinations contained.

The ReAct Loop Inside Each Agent

Every sub-agent runs its own reasoning loop using the ReAct pattern — generating a "Thought" before each "Action":

import json

import openai

class Agent:
    def __init__(self, role: AgentRole, tools: list[callable]):
        self.role = role
        self.tools = {t.__name__: t for t in tools}
        self.client = openai.AsyncOpenAI()

    async def execute(self, task: AgentTask, max_steps: int = 10):
        messages = [
            {"role": "system", "content": self._build_system_prompt(task)},
        ]

        for step in range(max_steps):
            response = await self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=self._tool_schemas(),
            )

            message = response.choices[0].message

            if message.tool_calls:
                # Agent decided to use a tool. The assistant message
                # carrying the tool calls must be appended first — the
                # API rejects tool results without it.
                messages.append(message)
                for call in message.tool_calls:
                    result = await self.tools[call.function.name](
                        **json.loads(call.function.arguments)
                    )
                    messages.append({"role": "tool", "content": str(result),
                                     "tool_call_id": call.id})
            else:
                # Agent decided it's done
                return self._parse_final_answer(message.content)

        raise TimeoutError(f"Agent {self.role} exceeded {max_steps} steps")

The trick is setting max_steps. Too low and the agent can't finish complex tasks. Too high and you're burning tokens on loops that lost the plot. I start at 10 and tune per agent.
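The `Agent` class above calls a `_tool_schemas()` helper that isn't shown. Here's one way it could be built — a sketch that derives OpenAI-style function schemas from plain Python functions via `inspect`, assuming tools use simple annotated parameters (`str`, `int`, `float`, `bool`):

```python
import inspect

# Map Python annotations to JSON-schema type names
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schemas(tools: dict) -> list[dict]:
    """Build OpenAI-style tool schemas from plain Python functions,
    using each function's signature and docstring."""
    schemas = []
    for name, fn in tools.items():
        params = {
            p.name: {"type": PY_TO_JSON.get(p.annotation, "string")}
            for p in inspect.signature(fn).parameters.values()
        }
        schemas.append({
            "type": "function",
            "function": {
                "name": name,
                "description": (fn.__doc__ or "").strip(),
                "parameters": {
                    "type": "object",
                    "properties": params,
                    "required": list(params),
                },
            },
        })
    return schemas
```

Deriving schemas from signatures keeps the tool definition and its documentation in one place, so they can't drift apart.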

Real Workflow: Content Decay Detection

Here's a concrete workflow I run on a cron schedule via n8n:

  1. Perceive: Pull last 28 days of GSC data. Flag pages where CTR dropped >15% while impressions stayed flat.
  2. Reason: For each flagged page, fetch the current SERP. Ask the LLM: "Has the search intent shifted? Are competitors using a different content format?"
  3. Act: Generate a refresh brief with specific recommendations. If confidence is high enough, push title tag updates directly via the WordPress API.

The whole thing runs every 24 hours. I get a Slack notification with what it found and what it did. Most days it finds nothing. Some days it catches a decay pattern weeks before I'd have noticed manually.
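The perceive step — flagging pages whose CTR dropped while impressions stayed flat — is deterministic Python, no LLM needed. A sketch, assuming GSC rows arrive as dicts keyed by page URL (the field names here are illustrative, not the exact GSC API response shape):

```python
def flag_decaying_pages(current, previous, ctr_drop=0.15, impression_tolerance=0.10):
    """Compare two 28-day GSC windows and flag pages whose CTR fell by
    more than `ctr_drop` while impressions stayed roughly flat."""
    prev = {row["page"]: row for row in previous}
    flagged = []
    for row in current:
        base = prev.get(row["page"])
        if not base or base["ctr"] == 0 or base["impressions"] == 0:
            continue  # no baseline to compare against
        ctr_delta = (base["ctr"] - row["ctr"]) / base["ctr"]
        impression_delta = abs(row["impressions"] - base["impressions"]) / base["impressions"]
        if ctr_delta > ctr_drop and impression_delta < impression_tolerance:
            flagged.append(row["page"])
    return flagged
```

Only the flagged pages go on to the expensive reason step — cheap filtering first keeps the token bill down.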

What Makes This Different From a Cron Script

A cron script does the same thing every time. An agent reasons about what it sees. When the SERP shifts from listicles to video carousels, a cron script keeps optimizing listicles. An agent notices the format change and adjusts its recommendations.

The distinction matters: you define objectives and constraints, not step-by-step instructions. The agent figures out the steps.

Guardrails That Actually Matter

Autonomy without guardrails is a liability. Here's what I enforce:

  • Dry-run mode: Every mutation gets logged before execution. The agent proposes; a human (or a second agent) approves.
  • Confidence thresholds: Actions below 0.7 confidence get queued for human review instead of auto-executing.
  • Audit trail: Every thought, tool call, and result is logged. No black boxes.
  • Budget caps: Max API calls per run, max tokens per agent, max mutations per cycle.

The goal isn't "human out of the loop" — it's "human on the loop." You set the boundaries, the agent operates within them, and you review the dashboard.
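The budget caps can be enforced with a small tracker that every agent action passes through. A minimal sketch — the cap names and defaults are illustrative:

```python
class BudgetGuard:
    """Per-run caps on API calls, tokens, and mutations. Every agent
    action calls `spend` first; exceeding any cap halts the run."""

    def __init__(self, max_api_calls=500, max_tokens=200_000, max_mutations=20):
        self.caps = {"api_calls": max_api_calls, "tokens": max_tokens,
                     "mutations": max_mutations}
        self.used = {key: 0 for key in self.caps}

    def spend(self, resource: str, amount: int = 1):
        self.used[resource] += amount
        if self.used[resource] > self.caps[resource]:
            raise RuntimeError(
                f"budget exceeded: {resource} "
                f"({self.used[resource]}/{self.caps[resource]})"
            )
```

Raising instead of silently skipping is deliberate: a blown budget usually means the agent is looping, and you want the run to stop loudly so the audit trail shows where.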

Getting Started

You don't need a complex multi-agent system on day one. Start with a single agent that does one thing:

  1. Pick one repetitive task you do weekly
  2. Write a Python script that does the "perceive" step (fetch data)
  3. Add an LLM call for the "reason" step (analyze the data)
  4. Add an API call for the "act" step (do something with the analysis)
  5. Wrap it in a loop with a termination condition

That's an agent. Everything else is optimization.
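The five steps above fit in one function. A sketch of that starter agent — `fetch_data`, `ask_llm`, and `take_action` are whatever your one task needs (an HTTP fetch, an OpenAI call, a webhook), injected so the loop itself stays trivial:

```python
def single_task_agent(fetch_data, ask_llm, take_action, max_steps=5):
    """One-task agent: fetch → analyze → act, looping until the
    analysis returns None (the termination condition)."""
    actions_taken = []
    for _ in range(max_steps):
        data = fetch_data()            # perceive
        verdict = ask_llm(data)        # reason
        if verdict is None:            # agent decides it's done
            break
        actions_taken.append(take_action(verdict))  # act
    return actions_taken
```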


Deep dive with full architecture details: Agentic AI Workflows: Beyond Basic Content Generation
