Łukasz Blania

Posted on May 21

How 6 AI agents write a single blog post (and why

#programming #tutorial #ai #webdev

About a year ago I shipped a one-prompt blog writer. It worked once.

The second article sounded identical to the first. The third article sounded identical to the second. By the tenth article, every blog I was generating could be mistaken for the same writer with mild amnesia.

That was the moment I started ripping the single prompt apart.

What replaced it is a backend that runs six specialized agents per article. Each one has a narrow job. None of them sees the whole article. The output reads like a human wrote it because the steering is human-shaped all the way down, not because any single model magically learned voice.

This is a walk-through of that pipeline. The exact agents, what they do, why they exist, and what each one breaks if you remove it.

TLDR: one giant prompt is the reason your AI articles sound the same. Split the job into agents with their own contracts, and the output starts behaving.

The problem with a single prompt

A naive AI writer looks like this:

const article = await llm.complete({
  prompt: `Write a 1500 word blog post about ${topic} for ${audience}.
           Use these keywords: ${keywords}. Match this tone: ${voice}.`,
});

It runs in one call. The model gets the whole job in one shot. The output gets returned.

It works. It also produces the same article every time, with cosmetic surface changes.

A prompt that asks a single model for 1500 words produces a 1500-word output that follows the model's default essay shape. The intro is always a setup. The conclusion is always a recap.

The body always builds in a smooth gradient. The vocabulary stays in the model's safe band. The sentence rhythm averages out.

You can prompt against all of this. You can ask for variety. You can ask for "no AI tells".

The model nods politely and writes another essay that looks exactly like the last one.

The fix is not a better prompt. The fix is to stop treating one model call as the unit of work.

Agent 1: Research

The first agent does not write. It searches.

Input: topic and a short description from the user.

Output: a research brief with real URLs, real quotes, real numbers, and optional canonical links for the strongest claims.

async function research({ topic, description }) {
  const sources = await Promise.all([
    braveSearch(topic, { count: 10 }),
    wikipediaSummary(topic),
    perplexityQuery(`${topic}. ${description}`),
  ]);

  return rankAndDedupe(sources);
}

Why a dedicated agent: a model writing without grounded research hallucinates names, numbers, and citations. A research pass gives the rest of the pipeline a fact substrate. If the substrate is empty, the post gets flagged as "no sources found" and the user is asked to refine the topic.

What breaks if you remove it: every article reverts to model priors. Same three founders quoted. Same dates wrong. Same fake statistics.
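The rankAndDedupe helper is not shown in the post. Here is a minimal sketch of what it could look like, assuming each search helper resolves to one or more objects shaped like { url, title, snippet, score }. That shape, the score field, the limit option, and the noSources flag are all assumptions for illustration, not the pipeline's real contract.

```javascript
// Hypothetical source shape: { url, title, snippet, score }.
function normalizeUrl(raw) {
  try {
    const u = new URL(raw);
    u.hash = "";
    // Drop common tracking parameters so the same page dedupes cleanly.
    for (const key of [...u.searchParams.keys()]) {
      if (key.startsWith("utm_")) u.searchParams.delete(key);
    }
    const path = u.pathname.replace(/\/+$/, "");
    return `${u.hostname}${path}${u.search}`;
  } catch {
    return raw; // Not a parseable URL: fall back to the raw string.
  }
}

function rankAndDedupe(resultSets, { limit = 10 } = {}) {
  const best = new Map();
  // flat() accepts a mix of arrays and single result objects.
  for (const source of resultSets.flat()) {
    if (!source || !source.url) continue;
    const key = normalizeUrl(source.url);
    const seen = best.get(key);
    // Keep the highest-scoring copy of each URL.
    if (!seen || (source.score ?? 0) > (seen.score ?? 0)) best.set(key, source);
  }
  const sources = [...best.values()]
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .slice(0, limit);
  // An empty list is the "no sources found" case described above.
  return { sources, noSources: sources.length === 0 };
}
```

Returning the noSources flag alongside the list keeps the "ask the user to refine the topic" decision out of the ranking code.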

Agent 2: Structure

The second agent does not write either. It outlines.

Input: the research brief plus the topic.

Output: an H2 outline. Each H2 gets a one-line angle, a target word count, and a list of entities that should appear inside that section.

async function outline({ topic, research }) {
  return await llm.complete({
    system: "You produce blog post outlines. Return JSON.",
    prompt: outlinePrompt(topic, research),
    responseFormat: { type: "json_schema", schema: outlineSchema },
  });
}

The outline is treated as a contract. Downstream agents have to honor it. No section creep. No drifting into new themes. No skipping a planned H2 because the model felt the article was complete.
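The post does not show outlineSchema or how the contract is checked. One plausible shape is sketched here: a JSON Schema object, plus a small check that a finished draft contains exactly the planned H2s. The field names (sections, h2, angle, wordCount, entities) and the checkOutlineContract helper are assumptions, not the author's actual schema.

```javascript
// Hypothetical JSON Schema for the outline; field names are illustrative.
const outlineSchema = {
  type: "object",
  required: ["sections"],
  additionalProperties: false,
  properties: {
    sections: {
      type: "array",
      minItems: 1,
      items: {
        type: "object",
        required: ["h2", "angle", "wordCount", "entities"],
        additionalProperties: false,
        properties: {
          h2: { type: "string" },
          angle: { type: "string" },
          wordCount: { type: "integer", minimum: 50 },
          entities: { type: "array", items: { type: "string" } },
        },
      },
    },
  },
};

// Compare the H2s that were planned with the H2s that were written.
function checkOutlineContract(outline, draftHeadings) {
  const planned = outline.sections.map((s) => s.h2);
  const missing = planned.filter((h) => !draftHeadings.includes(h));
  const unplanned = draftHeadings.filter((h) => !planned.includes(h));
  return { ok: missing.length === 0 && unplanned.length === 0, missing, unplanned };
}
```

A missing entry means a planned H2 was skipped. An unplanned entry means the draft drifted into a new theme.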

Why this agent exists: a single-prompt model picks its own structure mid-paragraph, which is why two articles on the same topic end up structurally identical. An explicit outline forces structural variety to come from the topic and not from the model.

What breaks if you remove it: every article follows the same intro / three-part body / conclusion shape. AI detectors pick this up first.

Agent 3: Section briefer

The brief agent expands each outline item into a per-section brief.

A section brief looks like this:

{
  "h2": "How to set up the webhook",
  "angle": "Concrete walk-through, not a theory primer",
  "wordCount": 280,
  "requiredEntities": ["Stripe", "Express", "raw body parser"],
  "bannedPhrases": ["let us explore", "in this section", "diving in"],
  "codeBlocks": 1,
  "tone": "instructional"
}

Each brief is also a contract. The section writer agent only gets one brief at a time. It does not see the whole outline. It does not see the other sections. It writes for the brief in front of it.
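Because the brief is plain data, a written section can be checked against it in code. The following is a rough sketch that uses the brief fields from the example above. The checkSection name and the 15% word-count tolerance are assumptions, not values from the pipeline.

```javascript
// Check one written section against its brief. The 15% word-count
// tolerance is an arbitrary assumption, not a value from the pipeline.
function checkSection(text, brief, tolerance = 0.15) {
  const lower = text.toLowerCase();
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  const problems = [];
  if (Math.abs(words - brief.wordCount) > brief.wordCount * tolerance) {
    problems.push(`word count ${words}, expected about ${brief.wordCount}`);
  }
  for (const entity of brief.requiredEntities) {
    if (!lower.includes(entity.toLowerCase())) problems.push(`missing entity: ${entity}`);
  }
  for (const phrase of brief.bannedPhrases) {
    if (lower.includes(phrase.toLowerCase())) problems.push(`banned phrase: ${phrase}`);
  }
  return problems; // An empty array means the section honors the brief.
}
```

A non-empty result could trigger a retry of that one section instead of regenerating the whole article.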

Why this agent exists: it strips the model's essay-shape reflex. A model writing one paragraph against a tight brief produces a tight paragraph. A model writing a whole article in one go produces an essay.

What breaks if you remove it: sections lose their distinct voice and collapse back into one homogeneous middle. The intro and conclusion get fatter than they should. Examples get washed out.

Agent 4: Section writer

Now we write. One agent call per section.

async function writeSection({ brief, voiceProfile, prevSection }) {
  return await llm.complete({
    system: voiceProfile.systemPrompt,
    prompt: sectionPrompt(brief, prevSection),
    temperature: 0.7,
  });
}

The writer gets the section brief, the voice profile, and a short summary of the previous section so transitions feel intentional. It does not get the full article so far. It does not get the outline. Its job is small enough to stay focused.
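The sectionPrompt helper is not shown either. A minimal sketch of how a brief could be turned into a prompt follows. It assumes prevSection is a short summary string, or undefined for the first section, and the wording of each instruction is invented for illustration.

```javascript
// Hypothetical prompt builder. `prevSection` is assumed to be a short
// summary string, or undefined for the first section.
function sectionPrompt(brief, prevSection) {
  const lines = [
    `Write one section of a blog post under the H2 "${brief.h2}".`,
    `Angle: ${brief.angle}`,
    `Target length: about ${brief.wordCount} words.`,
    `Tone: ${brief.tone}`,
    `Mention each of: ${brief.requiredEntities.join(", ")}.`,
    `Never use these phrases: ${brief.bannedPhrases.join("; ")}.`,
    `Include exactly ${brief.codeBlocks} code block(s).`,
  ];
  if (prevSection) {
    lines.push(`The previous section covered: ${prevSection}`);
    lines.push("Open with a transition from it, without repeating it.");
  }
  lines.push("Return only the section body. No intro, no recap.");
  return lines.join("\n");
}
```

Nothing in the prompt mentions the rest of the article, which is the point: the writer only ever sees one brief and one summary.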

This is the agent most teams skip. They keep the single-prompt approach for the actual writing step and only wrap research and outline agents around it. The result still sounds like the same writer because the writer step is still doing the bulk of the cognitive work in one shot.

What breaks if you remove it: rhythm uniformity. Vocabulary uniformity. Paragraph length uniformity. The exact signals AI detectors and human readers both pick up.

Agent 5: Voice profile

The voice agent runs offline, before any article generation, against the user's own published writing.

Input: 10 to 20 URLs from the user's existing blog or site.

Output: a JSON voice profile.

{
  "averageSentenceLength": 14,
  "vocabularyTier": "casual-technical",
  "preferredOpeners": ["I", "We", "Last", "Honestly"],
  "avoidWords": ["actually", "basically", "stuff"],
  "punctuationPatterns": {
    "semicolons": "rare",
    "exclamation": "never",
    "questionsInBody": "yes"
  },
  "systemPrompt": "You are writing in the voice of..."
}

Every section writer call uses the profile's systemPrompt field as its system message. Two users with different blogs get two completely different article shapes from the same backend.

Why this agent exists: it is the only mechanism that gives the output a non-generic voice. Brand voice cannot be prompted inline. It has to be baked into the system message, derived from real writing, and applied at every section call.
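Some fields in a profile like this can be measured rather than asked of a model. The sketch below shows one way to derive averageSentenceLength and preferredOpeners from sample text. It assumes the samples are already plain text scraped from the user's URLs. The measureVoice name and the naive sentence splitter are assumptions, and the post does not say how the real agent computes these values.

```javascript
// Naive sentence splitter: breaks after ., ! or ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).map((s) => s.trim()).filter(Boolean);
}

function measureVoice(samples, { topOpeners = 4 } = {}) {
  const sentences = samples.flatMap((sample) => splitSentences(sample));
  const lengths = sentences.map((s) => s.split(/\s+/).length);
  const total = lengths.reduce((sum, n) => sum + n, 0);

  // Count the first word of every sentence, ignoring punctuation.
  const openerCounts = new Map();
  for (const s of sentences) {
    const opener = s.split(/\s+/)[0].replace(/[^\p{L}']/gu, "");
    if (opener) openerCounts.set(opener, (openerCounts.get(opener) ?? 0) + 1);
  }
  const preferredOpeners = [...openerCounts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, topOpeners)
    .map(([word]) => word);

  return {
    averageSentenceLength: sentences.length ? Math.round(total / sentences.length) : 0,
    preferredOpeners,
  };
}
```

Fields that need judgment, such as vocabularyTier, would still need a model call or a human.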

What breaks if you remove it: every article from every user starts sounding like the same model. The product loses its main differentiator.

Agent 6: Polish and artifact strip

The final agent runs a deterministic post-process pass, not a model call.

function polish(markdown, voiceProfile) {
  let out = markdown;
  out = out.replace(/\s*[\u2014\u2013]\s*/g, ", "); // em and en dashes, with surrounding spaces
  out = out.replace(/\u2026/g, ".");          // ellipsis
  out = out.replace(/[\u201C\u201D]/g, '"');  // smart double quotes
  out = out.replace(/[\u2018\u2019]/g, "'");  // smart single quotes
  out = stripBannedOpeners(out, voiceProfile.avoidWords);
  out = enforceSentenceLengthVariance(out);
  return out;
}

This pass strips the obvious AI tells. Em dashes. Smart quotes. Sentence-initial connectors the voice profile banned. It also enforces sentence-length variance so paragraphs do not settle into the model's preferred rhythm.

A model could do this step. A deterministic pass is cheaper, faster, and never reintroduces what it just removed. Use a model when you need judgment. Use code when you need rules.
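The two helpers called in polish are not shown. As a rough sketch of the "rules in code" idea, here is one possible stripBannedOpeners plus a measurement that a variance-enforcing pass could use to find flat paragraphs. Both implementations are assumptions. In particular, sentenceLengthSpread only detects uniform rhythm and does not rewrite anything.

```javascript
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove a banned word when it opens a sentence, then re-capitalize
// the word that follows it.
function stripBannedOpeners(text, bannedWords) {
  if (bannedWords.length === 0) return text;
  const alternatives = bannedWords.map(escapeRegExp).join("|");
  const pattern = new RegExp(
    `(^|[.!?]\\s+|\\n)(?:${alternatives})\\b,?\\s+(\\w)`,
    "gi"
  );
  return text.replace(pattern, (_, lead, next) => lead + next.toUpperCase());
}

// Standard deviation of sentence lengths in words. A low value means
// the paragraph has settled into one rhythm.
function sentenceLengthSpread(paragraph) {
  const lengths = paragraph
    .split(/(?<=[.!?])\s+/)
    .filter(Boolean)
    .map((s) => s.trim().split(/\s+/).length);
  if (lengths.length < 2) return 0;
  const mean = lengths.reduce((a, b) => a + b, 0) / lengths.length;
  const variance = lengths.reduce((a, n) => a + (n - mean) ** 2, 0) / lengths.length;
  return Math.sqrt(variance);
}
```

The opener pattern only fires at the start of a sentence, so a banned word used mid-sentence is left alone.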

When NOT to use this

Six agents per article is expensive. If you are writing one blog post a week for your own site, this is heavyweight engineering for a problem you could solve by editing a single-prompt output yourself.

Use the multi-agent pipeline when:

You ship dozens to thousands of articles a month
The output has to read consistently across many users
Voice has to vary per customer
Detection or ranking risk is real

Skip the pipeline when:

You write one to five articles a month
A human will edit every output anyway
Latency matters more than quality (a one-shot prompt returns in a few seconds, this pipeline takes 25 to 45)

The pipeline is also overkill for short formats. Tweets, captions, ad copy, email subject lines. A single prompt is fine there because the cognitive work fits in one call.

What it actually costs

For reference, one article through the six-agent pipeline runs about:

3 to 8 search calls (research agent)
1 outline call
5 to 10 brief calls (one per H2)
5 to 10 section calls
0 model calls in polish (deterministic)
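The call counts in this list reduce to simple arithmetic: one outline call, plus one brief call and one section call per H2. A small sketch makes it concrete. The estimateCalls name and its parameters are assumptions.

```javascript
// Call counts follow the list above: one outline call, one brief call and
// one section call per H2, and no model call in the polish step.
function estimateCalls({ h2Count, searchCalls }) {
  const modelCalls = 1 + h2Count + h2Count;
  return { searchCalls, modelCalls, totalCalls: searchCalls + modelCalls };
}

const smallArticle = estimateCalls({ h2Count: 5, searchCalls: 3 });
const largeArticle = estimateCalls({ h2Count: 10, searchCalls: 8 });
console.log(smallArticle.modelCalls, largeArticle.modelCalls); // 11 21
```

So an article with 5 to 10 H2s needs between 11 and 21 model calls, on top of the search calls.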

Per-article cost lands around 12 to 25 cents at current GPT-4-class pricing. Total wall-clock time runs 25 to 45 seconds when the section calls run in parallel.
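Running the section calls in parallel needs some care, because each writer also receives a summary of the previous section. One way to reconcile the two is sketched here. It assumes the summary comes from the previous brief's angle instead of from the written text, so no section has to wait for another. The agents bundle, the brief agent's signature, and that assumption are all hypothetical, and the post does not say how the real pipeline handles this.

```javascript
// `agents` is a hypothetical bundle of the async functions from this post.
// Assumption: the "previous section" summary comes from the previous brief's
// angle, so section calls do not have to wait for each other.
async function runPipeline({ topic, description, voiceProfile }, agents) {
  const research = await agents.research({ topic, description });
  const outline = await agents.outline({ topic, research });

  // One brief call per H2, all in flight at once.
  const briefs = await Promise.all(
    outline.sections.map((section) => agents.brief({ section, research }))
  );

  // One section call per brief, also in parallel.
  const sections = await Promise.all(
    briefs.map((brief, i) =>
      agents.writeSection({
        brief,
        voiceProfile,
        prevSection: i > 0 ? briefs[i - 1].angle : undefined,
      })
    )
  );

  // Promise.all preserves input order, so sections join in outline order.
  return agents.polish(sections.join("\n\n"), voiceProfile);
}
```

If the summary had to come from the finished text of the previous section, the section calls would need to run sequentially and the wall-clock time would grow with the number of H2s.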

A single-prompt approach costs around 1 to 2 cents per article and finishes in 5 seconds. The price gap is real. The output gap is also real. Pick which one your product needs.

The article you are reading

Eleven months after I gave up on the single prompt, I have shipped more than 50,000 articles through this pipeline at articfly.com. The articles read like the customers wrote them because the steering profile was extracted from the customers' own writing. The model still does most of the typing. The pipeline does most of the thinking.

The second article does not sound like the first one anymore. Neither does the ten thousandth.

If you are about to build an AI writer and you are starting with a single prompt, you are about to learn this the slow way. Skip the year. Split the prompt.

What is the one agent in your stack that you wish you had split out earlier?
