
RAXXO Studios

Originally published at raxxo.shop

Claude Managed Agents Just Got Dreams, 20-Way Parallelism, and Self-Checking Loops

  • Claude Managed Agents now ship Dreaming, a memory consolidator that learns from session logs without overwriting your data

  • Multi-agent orchestration runs up to 20 specialized agents in parallel, useful for blog cluster ships and inventory sweeps

  • Result loops let agents self-check outputs against a rubric before returning, saving you a manual QA pass

  • Webhooks plug agent runs into Slack, Shopify, or any external system without polling cron jobs

The Anthropic dev conference in San Francisco dropped a quiet bomb on Claude Managed Agents this week. Four features at once. Dreaming in testing, plus three public betas (multi-agent orchestration, result loops, webhooks). I spent yesterday rerouting parts of my one-person studio around them.

Dreaming, or how Claude Managed Agents learn while you sleep

Dreaming is the headline. It is in testing, not public beta yet, so treat this section as a heads-up and not a how-to. The mechanic is simple to describe and hard to build. The agent reads its own session logs, finds repeating patterns, merges duplicate memories, and writes a tighter, optimized memory store. The original logs stay untouched. Your agent learns over time without you cleaning up after it.

If you read Claude Managed Agents Now Have Filesystem Memory when it landed in late April, you already have the mental model. Filesystem memory was step one (give the agent a place to write notes). Dreaming is step two (let it tidy those notes on its own).

In practice, here is what changes for a daily-use agent. Right now my blog-publish agent has a memory file that, after 200 articles, looks like a hoarder's garage. Same affiliate rule, written 14 different ways. Same TLDR shape, restated every other week. Dreaming is supposed to fold those into one canonical entry per concept. The original sessions stay readable. The active memory gets shorter and sharper.

The thing I am watching for is whether Dreaming respects the boundary between rules and observations. A rule like "currency is 5€ never €5" should never get merged with a one-time observation. If the testing build keeps that line clean, this becomes the most useful agent feature shipped all year. If not, expect a wave of "my agent forgot the most important rule" tweets.
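I am not in the testing cohort yet, so take the sketch below as nothing more than the mental model in code. Every detail is made up for illustration (the file layout, the field names, the consolidate helper), but it shows the shape of what Dreaming is described to do, with the one property I care about, rules never merging with observations, made explicit.

```python
import json, re
from collections import defaultdict
from pathlib import Path

# Toy sketch of the Dreaming mental model (hypothetical, not the real feature):
# read raw session notes, collapse near-duplicate observations into one canonical
# entry per concept, and leave anything tagged as a hard rule strictly alone.

def normalize(text: str) -> str:
    """Crude grouping key for near-duplicates: lowercase, strip punctuation."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def consolidate(raw_log_path: Path, memory_path: Path) -> None:
    entries = [json.loads(line) for line in raw_log_path.read_text().splitlines() if line.strip()]

    rules = [e for e in entries if e.get("kind") == "rule"]          # never merged
    observations = [e for e in entries if e.get("kind") != "rule"]   # fair game

    groups = defaultdict(list)
    for obs in observations:
        groups[normalize(obs["text"])].append(obs)

    # Keep the most recent phrasing of each duplicated observation.
    canonical = [max(group, key=lambda e: e["ts"]) for group in groups.values()]

    # Write the tighter store; the raw log stays untouched.
    memory_path.write_text(json.dumps({"rules": rules, "observations": canonical}, indent=2))
```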

I will write up the real test once I get into the testing cohort.

20-agent parallelism, the actual unlock for solo studios

Multi-agent orchestration is now public beta. Up to 20 specialized agents collaborating in parallel, with shared context and a coordinating runner. This is the feature that changes which workflows are realistic for a one-person operation.

Three concrete RAXXO-style use cases I am moving over this week.

Blog cluster ship. I publish multiple long-form articles a day on the lab. Sequential publishing took 40 to 60 minutes per batch (research, draft, humanize, publish, syndicate, OG image). With orchestration, three writer agents work in parallel on three distinct topics, each backed by its own research sub-agent. A coordinator checks for cluster overlap before they all hit publish. End-to-end shipping for three articles drops to roughly the time of one. (Yes, the article you are reading was written this way.)

Shopify inventory watch. I run a small product catalog on Shopify. Last Christmas I lost two days to a stockout I should have caught. The new pattern: a runner spawns one agent per product line. Each watches its own SKUs, checks demand spikes against the last 30 days, and flags drift. The runner aggregates, writes a single morning summary. 12 agents, one report, zero polling cron jobs. This used to need a queue worker and a job scheduler. Now it is a single config file.
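For flavor, here is roughly what I mean by a single config file. The keys and the launch_runner call are invented, and the real beta spec will look different, but the shape is the point: one agent per product line, one aggregator, one place for the summary to land.

```python
# Hypothetical shape of the inventory-watch spec. The keys and the
# launch_runner helper are made up for illustration only.
PRODUCT_LINES = ["posters", "stickers", "hoodies"]  # ...12 lines in the real shop

inventory_watch = {
    "schedule": "daily 06:00",
    "agents": [
        {
            "name": f"watch-{line}",
            "task": f"Check {line} SKUs for stockout risk against the last 30 days of demand",
        }
        for line in PRODUCT_LINES
    ],
    "aggregator": {
        "name": "morning-summary",
        "task": "Merge every agent's flags into one report, highest risk first",
    },
    "notify": "https://hooks.example.com/inventory-summary",  # placeholder URL
}

# launch_runner(inventory_watch)  # hypothetical call
```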

Lipsync render queue. My Lexxa video pipeline cuts long-form YouTube from blog articles. The bottleneck has always been the lipsync render step (each Magnific Speak clip takes 4 to 8 minutes). With orchestration, eight render agents process eight clips in parallel, a stitcher agent assembles the final cut, an audio agent normalizes levels. What used to be a 90-minute serial render is now closer to 15 minutes wall-clock.

The honest caveat. 20 agents in parallel sounds heroic until you realize the coordination problem grows fast. If two agents touch the same memory file you get write conflicts. The orchestration layer handles a lot of this for you, but you still have to design for parallelism. Most workflows do not actually need 20 agents. Most need 3.
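If you want a picture of what designing for parallelism means, here is the generic habit sketched with plain asyncio rather than the actual orchestration API: every agent writes to its own scratch file, and only the coordinator touches the shared report. The run_agent function is a stand-in for whatever actually invokes an agent.

```python
import asyncio
from pathlib import Path

# Generic fan-out habit that avoids write conflicts: one private scratch file
# per agent, a single writer for the shared report. run_agent is a stub.

async def run_agent(name: str, task: str) -> str:
    await asyncio.sleep(0.1)  # pretend the agent is working
    return f"[{name}] result for: {task}"

async def fan_out(tasks: dict[str, str], out_dir: Path, max_parallel: int = 20) -> Path:
    out_dir.mkdir(parents=True, exist_ok=True)
    gate = asyncio.Semaphore(max_parallel)  # cap parallelism; most jobs need 3, not 20

    async def one(name: str, task: str) -> Path:
        async with gate:
            result = await run_agent(name, task)
        scratch = out_dir / f"{name}.md"  # private file per agent, no shared writes
        scratch.write_text(result)
        return scratch

    scratches = await asyncio.gather(*(one(n, t) for n, t in tasks.items()))

    report = out_dir / "report.md"  # single writer: the coordinator
    report.write_text("\n\n".join(p.read_text() for p in scratches))
    return report

# asyncio.run(fan_out({"watch-posters": "check poster SKUs"}, Path("runs/today")))
```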

Result loops, the feature that saves a manual QA pass

Result loops are the public-beta feature I expected least and now use most. The idea is dead simple. The agent ships an output, then reads it back against a rubric you provide, then either accepts or revises. No human in the loop until the final answer.

Before result loops, every blog draft I generated needed a manual pass for the obvious tells. Em dashes that snuck back in. Words like "moreover" and "furthermore". The currency format slipping into "€5" instead of "5€". I caught these by hand or with a brand-check hook. Both have a cost.

Now the publish agent ships, then runs a self-check against a 12-point rubric (TLDR shape, word count, voice rules, affiliate placement, internal links, slug format, you get the idea). If the rubric flags anything, the agent revises and re-checks. Up to 3 loops, then it asks for help.

What makes this work is the rubric. A vague rubric ("is this on-brand?") gets a vague answer ("yes"). A specific rubric ("count em dashes, count instances of €5, flag anything under 1400 words") gets a specific revision. The rule of thumb I am settling on: every rubric line should be checkable by a regex or a count. If a human has to interpret it, the agent will too.
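Here is what that rule of thumb looks like as code. The rubric lines and the revise stub are illustrative, not the beta's actual interface; the point is that every check is a regex or a count, and the loop gives up and escalates after three passes.

```python
import re

# Illustrative rubric: every line is a regex or a count, nothing a human
# would have to interpret. The revise() callable is a stand-in.

RUBRIC = [
    ("no em dashes",            lambda d: d.count("—") == 0),
    ("currency is 5€ never €5", lambda d: not re.search(r"€\s*\d", d)),
    ("no 'moreover'",           lambda d: "moreover" not in d.lower()),
    ("at least 1400 words",     lambda d: len(d.split()) >= 1400),
]

def check(draft: str) -> list[str]:
    """Return the names of every rubric line the draft fails."""
    return [name for name, ok in RUBRIC if not ok(draft)]

def self_check_loop(draft: str, revise, max_loops: int = 3):
    for _ in range(max_loops):
        failures = check(draft)
        if not failures:
            return draft, []              # accepted
        draft = revise(draft, failures)   # each revise is another full output
    return draft, check(draft)            # still failing: hand it to a human
```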

The catch. Result loops cost extra tokens. Each revise is another full output. For high-stakes work (a launch announcement, a paid product page) the savings on the manual pass are worth it. For throwaway internal tasks, leave the loop off and ship.

If you want the deeper context on the agent platform itself, Claude Managed Agents: Build and Deploy AI Agents at Scale covers the deployment story.

Webhooks, plus the thing nobody is mentioning

Webhooks are the fourth public-beta feature, and they are exactly what they sound like. An agent finishes a run, posts a JSON payload to a URL of your choice. Slack channel, Shopify webhook, your own endpoint, whatever you wired up.

This feels obvious until you realize what it replaces. Until last week, knowing when an agent finished meant either polling the API on a cron, or building a status dashboard with a refresh loop. Both work. Both are annoying. Webhooks delete the problem.

The patterns I am wiring up first.

Publish to Slack. My blog-publish agent now posts to a private channel the moment a draft validates. Title, slug, word count, link. If something fails the rubric, I get the error, not just silence.

Schedule to Buffer. A repurposing agent fans every new long-form article into a LinkedIn post and three tweets. The webhook posts the formatted content to my Buffer queue. I review on the phone, hit approve.

Render queue done. The Lexxa pipeline pings a webhook the second a final cut is ready, with the Vercel blob URL of the MP4. I get a phone notification, watch the cut, decide whether to ship.
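For the publish-to-Slack pattern, the receiving end is just a small HTTP endpoint. The payload fields below are my guess at what an agent run sends, not a documented schema; the Slack half is a standard incoming webhook.

```python
import requests
from flask import Flask, request

# Minimal receiver for the publish-to-Slack pattern. Payload fields are
# assumptions; SLACK_WEBHOOK is a standard Slack incoming-webhook URL.
app = Flask(__name__)
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder

@app.post("/agent-finished")
def agent_finished():
    run = request.get_json(force=True)
    if run.get("status") == "published":
        text = f"Shipped: {run['title']} ({run['word_count']} words)\n{run['url']}"
    else:
        text = f"Run failed the rubric: {run.get('error', 'no detail in payload')}"
    requests.post(SLACK_WEBHOOK, json={"text": text}, timeout=10)
    return {"ok": True}
```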

The thing nobody is mentioning. Webhooks plus result loops plus orchestration is the actual story. Each feature on its own is fine. Stacked, they remove the last reasons to babysit a long-running agent. You queue the work, you go for a walk, you come back to a finished result with a Slack ping for anything that needed your judgment. The dev community on X is calling this the autonomous turn. It is overstated, but only barely.

A note on rate limits. None of this is free. Multi-agent orchestration burns tokens roughly linearly with agent count. Result loops add 1 to 3x on every output. Webhooks are cheap but the systems they trigger are not. Budget accordingly. I am running closer to my plan ceiling than I was a week ago. Worth it for me, plan ahead for you.

Bottom Line

Claude Managed Agents went from "nice infrastructure" to "actually changes how I plan a Tuesday" in a single dev conference. Dreaming will, in time, fix the memory-bloat problem that every long-running agent eventually hits. Multi-agent orchestration is the feature that makes solo-studio workflows competitive with small-team output. Result loops save the manual QA pass on anything boring enough to rubric-check. Webhooks close the loop on babysitting.

If you have not touched the platform yet, start with one orchestrated workflow. Pick the most boring repeatable thing you do (mine was the daily blog ship). Wire 3 agents and a webhook. You will know within a week whether the rest of your stack should follow.

If you want to see the rest of how I am running RAXXO Studios as a one-person AI shop, the toolkit lives at /pages/studio. The full archive of agent experiments lives at /blogs/lab.
