Niko Alho

Building Autonomous AI Agents That Actually Do Work

How I built an autonomous SEO agent in Python that perceives data, reasons through strategy, and executes tasks — no babysitting required.

Most LLMs just sit there waiting for your next prompt. You type something, it responds, you type again. It's a glorified autocomplete loop. I wanted something that actually does work while I sleep.

So I built an autonomous agent system in Python. Here's how it works and why the architecture matters more than the model you pick.

The Problem With "AI-Powered" Tools

Every SaaS tool slaps "AI-powered" on their landing page now. But 99% of them are just wrapping an LLM call in a nice UI. You still have to:

  1. Decide what to do
  2. Write the prompt
  3. Review the output
  4. Decide what to do next
  5. Repeat forever

That's not automation. That's you doing all the thinking while a machine does the typing.

What an Agent Actually Looks Like

An agent follows a loop. Not a single prompt-response — a continuous cycle of perceiving, reasoning, and acting. The pattern I use is based on the ReAct framework:

┌─────────────────────────────────────┐
│           AGENT LOOP                │
│                                     │
│  ┌──────────┐                       │
│  │ PERCEIVE │◄── APIs, databases,   │
│  └────┬─────┘    live data sources  │
│       │                             │
│  ┌────▼─────┐                       │
│  │  REASON  │◄── LLM + context      │
│  └────┬─────┘    from RAG store     │
│       │                             │
│  ┌────▼─────┐                       │
│  │   ACT    │──► API calls, DB      │
│  └────┬─────┘    writes, alerts     │
│       │                             │
│       └──────────► loop back ◄──────│
└─────────────────────────────────────┘

Each iteration, the agent observes its environment, thinks about what to do, does it, then observes the results. The key insight: the agent decides when to stop, not you.
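The loop above can be sketched in a few lines of plain Python. This is a minimal skeleton, not the full implementation from later in the post — the `perceive`, `reason`, and `act` callables are placeholders you'd swap for real data fetches, LLM calls, and API writes:

```python
def run_agent(perceive, reason, act, max_steps=20):
    """Minimal perceive-reason-act loop. `reason` returns a dict with
    either a final answer ("done": True) or the next action to take.
    The agent, not the caller, decides when the loop terminates."""
    observation = perceive(None)
    for _ in range(max_steps):
        decision = reason(observation)
        if decision["done"]:
            return decision["answer"]  # agent chose to stop
        # act on the decision, then perceive the result of that action
        observation = perceive(act(decision["action"]))
    raise RuntimeError(f"agent loop exceeded {max_steps} steps")
```

The `max_steps` cap is the safety net: without it, a confused agent loops forever.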

The Architecture: Multi-Agent, Not Monolithic

One big agent that does everything will hallucinate and lose context. Instead, I split responsibilities across specialized sub-agents:

from dataclasses import dataclass
from enum import Enum

class AgentRole(Enum):
    RESEARCHER = "researcher"
    AUDITOR = "auditor"
    STRATEGIST = "strategist"
    EXECUTOR = "executor"

@dataclass
class AgentTask:
    role: AgentRole
    objective: str
    constraints: list[str]
    context: dict

class AgentOrchestrator:
    def __init__(self, agents: dict[AgentRole, Agent]):
        self.agents = agents
        self.shared_state = {}

    async def run_pipeline(self, trigger: dict):
        # Researcher gathers data
        research = await self.agents[AgentRole.RESEARCHER].execute(
            AgentTask(
                role=AgentRole.RESEARCHER,
                objective="Analyze SERP changes for target keywords",
                constraints=["Use DataForSEO API", "Max 500 queries"],
                context=trigger,
            )
        )

        # Auditor validates against guidelines
        audit = await self.agents[AgentRole.AUDITOR].execute(
            AgentTask(
                role=AgentRole.AUDITOR,
                objective="Check findings against brand guidelines",
                constraints=["Flag confidence < 0.7"],
                context={"research": research, **self.shared_state},
            )
        )

        # Strategist formulates the plan
        strategy = await self.agents[AgentRole.STRATEGIST].execute(
            AgentTask(
                role=AgentRole.STRATEGIST,
                objective="Create action plan from research + audit",
                constraints=["Prioritize by estimated impact"],
                context={"research": research, "audit": audit},
            )
        )

        # Executor carries it out
        if strategy.confidence > 0.8:
            await self.agents[AgentRole.EXECUTOR].execute(
                AgentTask(
                    role=AgentRole.EXECUTOR,
                    objective="Implement approved changes",
                    constraints=["Dry-run first", "Log all mutations"],
                    context={"strategy": strategy},
                )
            )

Each agent has a narrow scope. The researcher never writes content. The executor never decides strategy. This separation keeps hallucinations contained.

The ReAct Loop Inside Each Agent

Every sub-agent runs its own reasoning loop using the ReAct pattern — generating a "Thought" before each "Action":

import json

import openai

class Agent:
    def __init__(self, role: AgentRole, tools: list[callable]):
        self.role = role
        self.tools = {t.__name__: t for t in tools}
        self.client = openai.AsyncOpenAI()

    async def execute(self, task: AgentTask, max_steps: int = 10):
        messages = [
            {"role": "system", "content": self._build_system_prompt(task)},
        ]

        for step in range(max_steps):
            response = await self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                tools=self._tool_schemas(),
            )

            message = response.choices[0].message

            if message.tool_calls:
                # Agent decided to use a tool. The assistant message
                # carrying the tool calls must be appended first — the
                # API rejects tool results without it.
                messages.append(message)
                for call in message.tool_calls:
                    result = await self.tools[call.function.name](
                        **json.loads(call.function.arguments)
                    )
                    messages.append({"role": "tool", "content": str(result),
                                     "tool_call_id": call.id})
            else:
                # Agent decided it's done
                return self._parse_final_answer(message.content)

        raise TimeoutError(f"Agent {self.role} exceeded {max_steps} steps")

The trick is setting max_steps. Too low and the agent can't finish complex tasks. Too high and you're burning tokens on loops that lost the plot. I start at 10 and tune per agent.
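The `Agent` class above calls a `_tool_schemas()` helper that isn't shown. Here's one way it could be built — a sketch that derives OpenAI-style function schemas from plain Python functions via `inspect`, assuming tools use simple annotated parameters (`str`, `int`, `float`, `bool`):

```python
import inspect

# Map Python annotations to JSON-schema type names
PY_TO_JSON = {int: "integer", float: "number", str: "string", bool: "boolean"}

def tool_schemas(tools: dict) -> list[dict]:
    """Build OpenAI-style tool schemas from plain Python functions,
    using each function's signature and docstring."""
    schemas = []
    for name, fn in tools.items():
        params = {
            p.name: {"type": PY_TO_JSON.get(p.annotation, "string")}
            for p in inspect.signature(fn).parameters.values()
        }
        schemas.append({
            "type": "function",
            "function": {
                "name": name,
                "description": (fn.__doc__ or "").strip(),
                "parameters": {
                    "type": "object",
                    "properties": params,
                    "required": list(params),
                },
            },
        })
    return schemas
```

Deriving schemas from signatures keeps the tool definition and its documentation in one place, so they can't drift apart.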

Real Workflow: Content Decay Detection

Here's a concrete workflow I run on a cron schedule via n8n:

  1. Perceive: Pull last 28 days of GSC data. Flag pages where CTR dropped >15% while impressions stayed flat.
  2. Reason: For each flagged page, fetch the current SERP. Ask the LLM: "Has the search intent shifted? Are competitors using a different content format?"
  3. Act: Generate a refresh brief with specific recommendations. If confidence is high enough, push title tag updates directly via the WordPress API.

The whole thing runs every 24 hours. I get a Slack notification with what it found and what it did. Most days it finds nothing. Some days it catches a decay pattern weeks before I'd have noticed manually.
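The perceive step — flagging pages whose CTR dropped while impressions stayed flat — is deterministic Python, no LLM needed. A sketch, assuming GSC rows arrive as dicts keyed by page URL (the field names here are illustrative, not the exact GSC API response shape):

```python
def flag_decaying_pages(current, previous, ctr_drop=0.15, impression_tolerance=0.10):
    """Compare two 28-day GSC windows and flag pages whose CTR fell by
    more than `ctr_drop` while impressions stayed roughly flat."""
    prev = {row["page"]: row for row in previous}
    flagged = []
    for row in current:
        base = prev.get(row["page"])
        if not base or base["ctr"] == 0 or base["impressions"] == 0:
            continue  # no baseline to compare against
        ctr_delta = (base["ctr"] - row["ctr"]) / base["ctr"]
        impression_delta = abs(row["impressions"] - base["impressions"]) / base["impressions"]
        if ctr_delta > ctr_drop and impression_delta < impression_tolerance:
            flagged.append(row["page"])
    return flagged
```

Only the flagged pages go on to the expensive reason step — cheap filtering first keeps the token bill down.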

What Makes This Different From a Cron Script

A cron script does the same thing every time. An agent reasons about what it sees. When the SERP shifts from listicles to video carousels, a cron script keeps optimizing listicles. An agent notices the format change and adjusts its recommendations.

The distinction matters: you define objectives and constraints, not step-by-step instructions. The agent figures out the steps.

Guardrails That Actually Matter

Autonomy without guardrails is a liability. Here's what I enforce:

  • Dry-run mode: Every mutation gets logged before execution. The agent proposes; a human (or a second agent) approves.
  • Confidence thresholds: Actions below 0.7 confidence get queued for human review instead of auto-executing.
  • Audit trail: Every thought, tool call, and result is logged. No black boxes.
  • Budget caps: Max API calls per run, max tokens per agent, max mutations per cycle.

The goal isn't "human out of the loop" — it's "human on the loop." You set the boundaries, the agent operates within them, and you review the dashboard.
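The budget caps can be enforced with a small tracker that every agent action passes through. A minimal sketch — the cap names and defaults are illustrative:

```python
class BudgetGuard:
    """Per-run caps on API calls, tokens, and mutations. Every agent
    action calls `spend` first; exceeding any cap halts the run."""

    def __init__(self, max_api_calls=500, max_tokens=200_000, max_mutations=20):
        self.caps = {"api_calls": max_api_calls, "tokens": max_tokens,
                     "mutations": max_mutations}
        self.used = {key: 0 for key in self.caps}

    def spend(self, resource: str, amount: int = 1):
        self.used[resource] += amount
        if self.used[resource] > self.caps[resource]:
            raise RuntimeError(
                f"budget exceeded: {resource} "
                f"({self.used[resource]}/{self.caps[resource]})"
            )
```

Raising instead of silently skipping is deliberate: a blown budget usually means the agent is looping, and you want the run to stop loudly so the audit trail shows where.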

Getting Started

You don't need a complex multi-agent system on day one. Start with a single agent that does one thing:

  1. Pick one repetitive task you do weekly
  2. Write a Python script that does the "perceive" step (fetch data)
  3. Add an LLM call for the "reason" step (analyze the data)
  4. Add an API call for the "act" step (do something with the analysis)
  5. Wrap it in a loop with a termination condition

That's an agent. Everything else is optimization.
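The five steps above fit in one function. A sketch of that starter agent — `fetch_data`, `ask_llm`, and `take_action` are whatever your one task needs (an HTTP fetch, an OpenAI call, a webhook), injected so the loop itself stays trivial:

```python
def single_task_agent(fetch_data, ask_llm, take_action, max_steps=5):
    """One-task agent: fetch → analyze → act, looping until the
    analysis returns None (the termination condition)."""
    actions_taken = []
    for _ in range(max_steps):
        data = fetch_data()            # perceive
        verdict = ask_llm(data)        # reason
        if verdict is None:            # agent decides it's done
            break
        actions_taken.append(take_action(verdict))  # act
    return actions_taken
```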


Deep dive with full architecture details: Agentic AI Workflows: Beyond Basic Content Generation
