Six months ago I was spending 12 hours a week on content: researching topics, writing drafts, distributing to platforms, tracking what landed. Now I spend under two. Not because I found a magic tool — because I built one. A five-agent system where each agent has one job, knows its limits, and hands off cleanly to the next.
This is the architecture, the mistakes, and the code that actually runs it.
Why Single-Agent Setups Break Down Fast
Most people start with a single prompt: "Research this topic and write me a post." It works fine for one-offs. At scale, it collapses.
The problem is context pollution. When you ask one agent to research, reason, write, format, and distribute — it's holding too many concerns simultaneously. Quality degrades. Errors compound. You can't debug which step failed.
The fix isn't a better prompt. It's decomposition. Each agent should own one cognitive task with a clear input/output contract.
The 5-Agent Architecture
Here's the stack I landed on after three failed iterations:
[Orchestrator]
│
├─► [Scout] — finds topics, trends, Reddit signals
├─► [Researcher] — deep-dives a topic, extracts key facts
├─► [Writer] — drafts the article from a structured brief
├─► [Editor] — tightens, cuts, checks tone
└─► [Distributor] — formats for each platform, posts, logs
Each agent is a separate Claude API call with a tightly scoped system prompt. They don't share memory directly — they pass structured JSON between steps.
The orchestrator is just a Python script. No framework required for the basics.
def run_pipeline(seed_topic: str):
    signal = scout_agent(seed_topic)   # returns: {topic, angle, sources}
    brief = researcher_agent(signal)   # returns: {facts, outline, hook}
    draft = writer_agent(brief)        # returns: {title, body, tags}
    final = editor_agent(draft)        # returns: {edited_body, score}
    distributor_agent(final, platforms=["devto", "twitter", "linkedin"])
That's the whole loop. When something breaks, you know exactly which agent broke.
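Each agent function is just a scoped Claude call that returns structured JSON. Here's a minimal sketch of the Scout, assuming the official anthropic Python SDK; the model id and system prompt are placeholders, not exactly what I run:

import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def scout_agent(seed_topic: str) -> dict:
    resp = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=1024,
        system="You are a topic scout. Return ONLY JSON with keys: topic, angle, sources.",
        messages=[{"role": "user", "content": f"Seed topic: {seed_topic}"}],
    )
    # Structured JSON out, never free-form prose; the next agent depends on the shape
    return json.loads(resp.content[0].text)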
Coordination: State Machines Beat Conversation Chains
The biggest architectural mistake I made early: treating agents like a chat thread. Passing the entire conversation history forward. Agent 4 shouldn't know what Agent 1 said — it creates noise and burns tokens.
Instead, use explicit state objects with a defined schema:
from pydantic import BaseModel
from typing import Optional

class ContentState(BaseModel):
    seed: str
    signal: Optional[dict] = None
    brief: Optional[dict] = None
    draft: Optional[dict] = None
    final: Optional[dict] = None
    status: str = "pending"
    errors: list = []
Each agent reads only the fields it needs, writes only its output field, and sets status. If it fails, it writes to errors and the orchestrator decides whether to retry or skip.
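In pure Python, that retry decision is a few lines. A sketch of the step wrapper (run_step and its retry count are my naming for illustration, not a library API):

def run_step(state: ContentState, field: str, agent_fn, retries: int = 1) -> ContentState:
    # Run one agent, write only its output field, and record failures for the orchestrator
    for attempt in range(retries + 1):
        try:
            setattr(state, field, agent_fn(state))
            state.status = f"{field}_done"
            return state
        except Exception as exc:
            state.errors.append(f"{field} (attempt {attempt + 1}): {exc}")
    state.status = "failed"
    return state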
If you want graph-based orchestration with built-in retry and conditional branching, LangGraph is worth the learning curve. For simpler linear pipelines, pure Python is faster to debug and cheaper to run.
Memory: The Part Everyone Gets Wrong
Stateless agents are simple but limited. They can't learn that your audience hates listicles, or that Monday posts underperform. For that, you need shared memory.
I use a two-tier approach:
Short-term (per-run): The ContentState object above. Lives in memory, passes between agents, discarded after.
Long-term (cross-run): A SQLite table that stores outcomes — which topics performed, which formats got engagement, which headlines got clicks.
import sqlite3

db = sqlite3.connect("outcomes.db")  # illustrative path; one connection opened at startup

def store_outcome(topic: str, platform: str, metric: str, value: float):
    db.execute("""
        INSERT INTO outcomes (topic, platform, metric, value, created_at)
        VALUES (?, ?, ?, ?, datetime('now'))
    """, (topic, platform, metric, value))
    db.commit()
Before picking a new topic, the Scout queries this table. It's a simple vector search against past topics to avoid repetition and weight toward what's worked. You don't need a vector DB for this at small scale — cosine similarity on cached embeddings in a dict is enough for hundreds of articles.
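That check is a dozen lines. A sketch, assuming the embeddings are already computed and cached in a plain dict (the 0.85 threshold is an assumption, not a tuned value):

import numpy as np

def too_similar(candidate_vec: np.ndarray, embedding_cache: dict, threshold: float = 0.85) -> bool:
    # embedding_cache maps past topic strings to their embedding vectors
    for past_topic, past_vec in embedding_cache.items():
        cos = np.dot(candidate_vec, past_vec) / (
            np.linalg.norm(candidate_vec) * np.linalg.norm(past_vec)
        )
        if cos > threshold:
            return True  # too close to something already published, skip it
    return False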
The Pitfalls That Cost Me Three Weeks
1. No output validation between agents. Agent 2 returned null for hook once. Agent 3 hallucinated a hook. The article was garbage and I didn't know why. Now every agent output is Pydantic-validated before the next agent runs.
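The validation itself is one small schema per hand-off. A sketch for the Researcher-to-Writer step (field names match the brief shape above; the helper name is mine):

from pydantic import BaseModel, ValidationError

class Brief(BaseModel):
    facts: list[str]
    outline: list[str]
    hook: str  # a null hook fails here instead of silently reaching the Writer

def validated_brief(raw: dict) -> Brief:
    try:
        return Brief(**raw)
    except ValidationError as exc:
        raise RuntimeError(f"Researcher output failed validation: {exc}")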
2. Giving agents too much tool access. I gave the Distributor access to the file system, API calls, and a database. It started doing things I didn't expect. Each agent now gets only the tools it needs for that step — principle of least privilege applies here too.
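A sketch of what that scoping looks like in practice, assuming tools are registered by name (the tool names here are illustrative, not my actual set):

# Each agent only ever sees its own allowlist; everything else simply isn't wired up
AGENT_TOOLS = {
    "scout":       {"fetch_rss", "search_reddit"},
    "researcher":  {"fetch_url"},
    "writer":      set(),   # pure generation, no tools
    "editor":      set(),
    "distributor": {"post_devto", "post_twitter", "queue_linkedin"},
}

def allowed(agent_name: str, tool_name: str) -> bool:
    return tool_name in AGENT_TOOLS.get(agent_name, set())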
3. No human-in-the-loop checkpoint. Fully automated is great until the Writer produces something off-brand and it gets posted. I added a single approval step after the Editor — the system writes the final draft to a review queue, I approve or reject in a Slack message, then it distributes. Two seconds of human judgment saves hours of cleanup.
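The checkpoint itself is tiny. A sketch with a local queue table standing in for my actual setup; the Slack handler's only job is to flip the status column to approved or rejected:

import sqlite3

review_db = sqlite3.connect("review_queue.db")  # illustrative path
review_db.execute("""CREATE TABLE IF NOT EXISTS review_queue
                     (id INTEGER PRIMARY KEY, body TEXT, status TEXT DEFAULT 'pending')""")

def queue_for_review(final: dict) -> int:
    # Nothing is distributed until this row is flipped to 'approved'
    cur = review_db.execute("INSERT INTO review_queue (body) VALUES (?)", (final["edited_body"],))
    review_db.commit()
    return cur.lastrowid

def is_approved(row_id: int) -> bool:
    row = review_db.execute("SELECT status FROM review_queue WHERE id = ?", (row_id,)).fetchone()
    return row is not None and row[0] == "approved"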
4. Assuming LLM calls are cheap at scale. A five-agent pipeline running daily with long context windows adds up. I profiled every agent and found the Researcher was passing 8,000 tokens of source material of which only 400 were actually used. Trimming inputs cut costs by 60%.
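Profiling doesn't need anything fancy; the API reports token usage on every response, so a one-line logger per agent call is enough (log_usage is my helper name):

def log_usage(agent_name: str, resp) -> None:
    # resp is the Message object returned by client.messages.create(...)
    print(f"{agent_name}: {resp.usage.input_tokens} tokens in / {resp.usage.output_tokens} tokens out")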
What This Actually Produces
The system runs every weekday at 6am. It scouts trending developer topics across Reddit, Hacker News, and a curated RSS list. By 7am, there's a draft in my review queue. I approve it in Slack, it posts to Dev.to, schedules Twitter threads, and queues a LinkedIn version.
Output: 5 pieces of content per week, consistent for six months. Engagement is up — partly because I'm more consistent, partly because the Scout is better at finding angles than I was when doing it manually.
The system isn't magic. It reflects the thinking I put into the prompts, the state design, and the outcome feedback loop. That's the actual work.
Where to Go From Here
If you want to build this yourself, the architecture above is a solid starting point. The tricky parts are orchestration patterns that scale, memory systems that don't hallucinate history, tool use that doesn't go off the rails, and making it production-stable enough to run unattended.
I compiled everything into a comprehensive guide: Multi-Agent AI Stack: The Complete Builder's Guide — covers orchestration patterns, memory, tool use, and production deployment.
Build the boring infrastructure first. The interesting agents run on top of it.