A real 5-agent Claude pipeline that takes a topic from RSS to a scheduled blog post on raxxo.shop, no human in the loop until the final approval ping
Agent shapes are picker, writer, humanizer, validator, publisher, each with a tight job description and a shared scratchpad for handoff
Coordination uses a single JSON state file so any agent can read what the previous one decided, no message bus, no queue
Failure handling is two-tier: validator can kick a draft back to the writer once, after that it falls back to single-agent mode and pings me on Slack
Honest cost note: a 5-agent ship is roughly 2.4x the tokens of a single-agent ship, payback comes from the validator catching what I used to catch by hand
I am two days into running my blog pipeline as five Claude agents handing work to each other instead of one big prompt. The structure works. The cost is real. Here is the actual shape, what each agent does, and where the whole thing falls apart if you stop watching.
This is a case study, not a tutorial. The pattern is generic, the prompts I use are mine. You will need to write your own. Read it as a worked example of what the new Multi-Agent Orchestration public beta lets a one-person studio actually do.
Why five agents and not one
Before the orchestration beta landed on 2026-05-06, my blog pipeline was one long Claude session that ran top to bottom. Pick a topic, draft it, humanize it, validate it, publish it. That worked. It also meant a single bad early decision (a duplicate topic, a misread cluster) poisoned the rest of the run, and I would not catch it until the validator step at the end. Roll back, restart, eat 8 minutes of wall clock and the tokens the run had already burned.
The five-agent version does not run faster end to end. It runs more recoverably. Each agent has one job, returns a structured output, and writes its decision to a shared state file. If step three fails, I do not lose steps one and two. I rerun step three with corrected input.
The other thing five agents buy you is honest specialization. A writer prompt that is also a validator prompt is a compromise. Splitting them lets each prompt be sharp. The writer optimizes for voice and rhythm. The validator optimizes for finding tells the writer learned to skip past.
Real numbers from this week. Single-agent ship: 9 minutes wall clock, around 32k tokens. Five-agent ship: 11 minutes, around 78k tokens. The token bill is 2.4x. The payback is one fewer manual review round, which used to cost me 12 to 15 minutes. I trade 46k tokens (call it 14 cents on Sonnet) for 12 minutes of focus. Easy math.
The 5 agent shapes
I am keeping the prompts I run inside RAXXO out of this writeup on purpose. The shapes below are generic enough to copy. If you are building your own pipeline, treat this as a skeleton, not a recipe.
1. Topic-picker. Reads an RSS feed list (around 30 sources for me, mostly AI tooling and indie business). Scores each new headline on a viral-fit rubric (novelty, specificity, evergreen vs newsy, fits an existing cluster or extends it). Returns the top 3 candidates with a one-line angle for each. Output is JSON, around 40 lines. This agent is cheap. I let it run on a cron every 4 hours and pile candidates in a queue.
2. Writer. Picks one candidate from the queue, reads the cluster guidance for that topic, and drafts a 1500-word article in my voice. Output is markdown plus a short metadata block (slug, tags, suggested h2 count). This is the expensive agent. Around 35k tokens per ship. Do not let it run on every queued candidate, you will burn budget. I gate it on a manual approve from the topic-picker output.
3. Humanizer. Takes the writer's draft and runs the AI-tells pass. Em dashes, "additionally", false ranges, rule of three, em dash again because it always sneaks back. Output is the rewritten markdown plus a list of patterns it caught, so I can spot which prompts are leaking what. This agent is small and fast.
4. Validator. Reads the humanized draft and scores it against a rubric. Mine has 12 criteria, all checkable by regex or count: TLDR present, h2 count between 4 and 6, word count between 1400 and 1800, no banned voice words, currency format right, internal link count at least 3, brand color hex if any, slug shape, banned legal terms absent, et cetera. Returns a pass/fail plus per-criterion notes. If it fails, the whole pipeline kicks back one step. A sketch of a few of these checks follows the list.
5. Publisher. Reads the validated draft and ships it. For my setup that is a Shopify blog post POST plus a published_at timestamp set in the future, so the article goes live on a cron rather than the moment the agent finishes. Output is the article ID, URL, and a Slack-ready summary line. The publish call itself is also sketched after the list.
That is the whole stack. Five clear shapes, five clear handoffs.
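To make the validator concrete, here is a minimal sketch of three of the twelve checks in Bash. The file name, thresholds, and most of the banned-word list are illustrative stand-ins, not my real rubric ("additionally" is real, see the humanizer above).

# Validator sketch: three of the twelve rubric checks.
draft="humanized.md"   # stand-in path
fail=0

# h2 count between 4 and 6 (markdown h2s are lines starting with "## ")
h2s=$(grep -c '^## ' "$draft")
[ "$h2s" -ge 4 ] && [ "$h2s" -le 6 ] || { echo "FAIL h2_count: $h2s"; fail=1; }

# word count between 1400 and 1800
words=$(wc -w < "$draft")
[ "$words" -ge 1400 ] && [ "$words" -le 1800 ] || { echo "FAIL word_count: $words"; fail=1; }

# banned voice words; prints the offending lines so the notes are actionable
grep -i -n -E 'additionally|furthermore|moreover' "$draft" && { echo "FAIL banned_words"; fail=1; }

exit "$fail"

And the publisher's core call, sketched as a curl against Shopify's REST Admin API articles endpoint. The shop domain, API version, blog ID, and token are placeholders, not my actual script; the future published_at is the part that makes Shopify hold the post until the scheduled time.

# Publisher sketch: create the article with a future publish date.
curl -s -X POST \
  "https://your-shop.myshopify.com/admin/api/2024-01/blogs/<BLOG_ID>/articles.json" \
  -H "X-Shopify-Access-Token: $SHOPIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "article": {
      "title": "...",
      "body_html": "...",
      "published_at": "2026-05-16T09:00:00-00:00"
    }
  }'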
Coordination protocol, the boring part that matters
Here is what people undersell about multi-agent setups. The interesting work is not in the prompts. It is in the handoff format. Get that right and the agents are interchangeable. Get it wrong and you spend your whole day debugging shape mismatches.
I use one JSON state file per ship, written to a tmp directory. Every agent reads the previous state, writes the next one, and never modifies anything earlier. Append-only. This is the same pattern I use for filesystem memory in Claude Managed Agents, just scoped to a single run.
The shape, simplified:
{
  "ship_id": "ship-2026-05-14-001",
  "stage": "validator",
  "topic": {
    "title": "...",
    "cluster": "claude-managed-agents",
    "angle": "..."
  },
  "draft": { "markdown": "...", "word_count": 1623 },
  "humanized": { "markdown": "...", "patterns_caught": [...] },
  "validation": { "score": 0.92, "passed": true, "criteria": [...] },
  "publish": null
}
Each agent gets the whole state object, fills in its own slot, and returns the new object. No message bus, no queue. A coordinator script (50 lines of Bash) reads the file, picks the next agent based on the stage field, and runs it. If you have ever written a state machine, this is the same idea with JSON instead of code.
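A minimal sketch of such a coordinator, assuming jq is on the path and each agent is wrapped in a script that fills its slot in the state file and advances the stage field. The run-*.sh names are placeholders for whatever wraps your Claude calls.

#!/usr/bin/env bash
# Dispatch loop: read the stage field, run the matching agent, repeat.
state="$1"   # path to the ship's JSON state file

while true; do
  stage=$(jq -r '.stage' "$state")
  case "$stage" in
    picker)    ./run-picker.sh    "$state" ;;
    writer)    ./run-writer.sh    "$state" ;;
    humanizer) ./run-humanizer.sh "$state" ;;
    validator) ./run-validator.sh "$state" ;;
    publisher) ./run-publisher.sh "$state" ;;
    done)      exit 0 ;;
    *)         echo "unknown stage: $stage" >&2; exit 1 ;;
  esac
done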
The big advantage is debuggability. When something goes wrong I can cat ship-state.json and see exactly which step failed and what the input to the failing step looked like. No log scraping. No reconstructing context from chat history.
The big disadvantage is that this pattern does not scale to truly parallel work. Five agents running in series with a shared state file is fine. Twenty agents writing to the same file would trample each other's updates immediately. If you need true parallelism (the Lexxa render queue I mentioned earlier in the week, for example), you need real coordination primitives, not just a shared file. Different problem.
When it breaks and how I handle it
Two failure modes I have actually seen.
The validator rejects but the writer cannot fix it. Happened twice in the first ten ships. The validator said "h2 count is 7, target is 4 to 6", the writer rewrote, the new draft had 5 h2s but failed a different criterion, the writer rewrote again, the original problem came back. Loop. I cap the writer-validator cycle at one bounce. On the second failure, the whole pipeline drops to single-agent mode and pings me on Slack with the state file attached. I open it, see the conflict, fix the prompt, rerun. The 90% case stays automated. The 10% case becomes a 3-minute human nudge instead of a 30-minute investigation.
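Mechanically, the cap is just a counter in the state file plus a webhook call. A sketch, assuming jq, a Slack incoming-webhook URL in SLACK_WEBHOOK, and a bounces field added to the state shape for illustration:

# In the coordinator, after a validator fail:
bounces=$(jq -r '.bounces // 0' "$state")
if [ "$bounces" -ge 1 ]; then
  # Second failure: stop the pipeline, hand the state file to a human.
  curl -s -X POST "$SLACK_WEBHOOK" \
    -H "Content-Type: application/json" \
    -d "{\"text\": \"validator failed twice on $(jq -r '.ship_id' "$state"), dropping to single-agent mode\"}"
  exit 1
fi
# First failure: bump the counter and kick the stage back to the writer.
jq '.bounces = (.bounces // 0) + 1 | .stage = "writer"' "$state" > "$state.tmp" && mv "$state.tmp" "$state"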
The topic-picker queues a duplicate. Happened once. I publish a lot in the same cluster, and the picker scored a fresh angle on a topic I had already shipped two weeks earlier. The writer agent does not have access to the full registry, only to cluster guidance. So it wrote a perfectly fine draft of an article that already existed. The validator caught the duplicate slug at publish time and rejected. The fix was to add a duplicate-check tool call to the topic-picker so it queries the blog index before scoring. Cheap. Should have been there from day one.
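The check itself is one tool call worth of Bash. A sketch, assuming published slugs map to article handles on Shopify's REST Admin API; the shop domain, blog ID, and token are placeholders, and pagination is ignored for brevity:

# Topic-picker pre-check: skip any candidate whose slug already exists.
candidate_slug="$1"
existing=$(curl -s \
  "https://your-shop.myshopify.com/admin/api/2024-01/blogs/<BLOG_ID>/articles.json?fields=handle&limit=250" \
  -H "X-Shopify-Access-Token: $SHOPIFY_TOKEN" | jq -r '.articles[].handle')

if echo "$existing" | grep -qx "$candidate_slug"; then
  echo "duplicate: $candidate_slug already published" >&2
  exit 1
fi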
The pattern under both failures: you do not actually want full autonomy. You want the agents to do the boring 90% and surface the weird 10% to a human, fast, with enough context that the human can decide in under a minute.
When to fall back to single-agent
I want to be honest about this, because the orchestration discourse is a bit overheated. Multi-agent is not the answer for everything. Cases where I still run single-agent:
Short pieces. Anything under 800 words is faster as a single shot. The handoff overhead eats whatever the split would have saved.
Highly creative work. A keynote script, a brand voice piece, a rant. The writer-validator split tends to sand off the edges I want to keep. Single agent, longer thinking time, better result.
One-off tasks. Building the multi-agent state machine is a 4-hour investment. Worth it for the blog pipeline I run every day. Not worth it for a thing I am doing once.
Anything where the rubric is not crisp. If I cannot write a JSON criterion that a regex or a count can verify, the validator is going to LLM-judge the output, which is itself error-prone, which means I lose the main reason to split.
Short version. Multi-agent is for repeated, structured work where each step has a checkable output. For everything else, one agent thinking carefully still beats five agents handing each other JSON.
Bottom line
The 5-agent pipeline runs my blog publish flow as a small assembly line, not a single craftsperson. It is more expensive in tokens and more reliable in outcome. The trick is not the prompts. It is the state file and the validator rubric. Get those crisp and the pipeline mostly runs itself, with a Slack ping for the cases that need my judgment.
If you are starting from scratch, build the validator first. A good rubric makes every other agent in the chain better, because you can see what they are missing the moment they miss it. I built this pipeline in the wrong order and lost a day to it. You do not have to.
For the broader context on what the orchestration beta actually changed, see Claude Managed Agents Just Got Dreams, 20-Way Parallelism, and Self-Checking Loops. For a different take on the SDK side, I Built 3 Production Agents With the Claude Agent SDK in One Weekend covers the lower-level path. If you want the full pipeline pattern in one place, the Claude Blueprint bundle is where I put the boring infrastructure pieces.