On May 6th, Anthropic shipped three new capabilities for Managed Agents. Two of them — Outcomes and multi-agent orchestration — are solid infrastructure upgrades. The third, Dreaming, is the one worth stopping to think about.
Dreaming is a scheduled background process that runs between sessions. The agent reviews its own past conversation transcripts, identifies recurring patterns, and writes learnings into its memory stores. No human prompt required. No explicit instruction to "remember this."
If you've been building with Claude agents, you already know how memory works: you tell the agent something, it stores it, it uses it next time. Passive. Explicit. You're the one deciding what gets remembered.
Dreaming flips that. The agent decides.
How It Actually Works
The process runs on a schedule between sessions. The agent scans past transcripts looking for signal: mistakes it repeated, approaches that worked, edge cases it missed. It then curates its own memory stores based on what it finds. The original session data stays untouched — Dreaming writes to memory, not back to history.
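The internal mechanics aren't public, but the shape of the pipeline can be sketched. Assuming each transcript carries some kind of error tagging (a hypothetical representation — the real internal format isn't documented), the pattern-extraction step reduces to counting recurring signals across sessions and proposing memory entries, without ever mutating the transcripts themselves:

```python
from collections import Counter

def propose_memory_updates(transcripts, min_occurrences=3):
    """Scan past session transcripts for recurring error tags and
    propose memory entries. Transcripts are read-only here; we only
    emit proposals, mirroring how Dreaming writes to memory without
    touching session history. The transcript shape is an assumption."""
    errors = Counter(
        tag
        for session in transcripts
        for tag in session.get("error_tags", [])
    )
    return [
        {"kind": "recurring_error", "tag": tag, "count": n}
        for tag, n in errors.items()
        if n >= min_occurrences
    ]
```

The threshold is the interesting knob: too low and memory fills with noise, too high and slow-burning patterns never surface.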
There are two autonomy modes you can configure:
- Automatic: the agent identifies patterns and writes them to memory directly
- Human review: the agent proposes memory updates, you approve before they take effect
The human review mode is the safer starting point for production systems. You get the cross-session pattern recognition without giving the agent unilateral write access to its own memory.
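Operationally, the difference between the two modes is just where proposals land. A minimal sketch — the mode names and proposal routing below are assumptions, not the real configuration surface:

```python
from enum import Enum

class DreamingMode(Enum):
    AUTOMATIC = "automatic"        # agent writes memory directly
    HUMAN_REVIEW = "human_review"  # proposals wait for approval

def route_proposals(proposals, mode, memory, review_queue):
    """In automatic mode, proposals land in memory immediately;
    in human-review mode, they queue until a human approves them.
    Both containers are plain lists standing in for real stores."""
    target = memory if mode is DreamingMode.AUTOMATIC else review_queue
    target.extend(proposals)
```

Starting in human-review mode and flipping to automatic once you trust the proposals is the low-risk migration path.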
Currently in research preview — not GA yet.
Why This Matters: The Cross-Session Blind Spot
Here's the problem Dreaming solves. Individual sessions can't see cross-session patterns. A support agent that misclassifies a certain type of ticket won't notice it's made the same error 12 times this month. Each session starts fresh. The pattern is invisible.
Dreaming surfaces exactly that kind of signal. It's the difference between an agent that resets every session and one that accumulates operational experience over time.
The practical implication: an agent that's been running for three months has three months of self-curated experience. A freshly deployed agent starts from zero. Over time, these become fundamentally different systems — not because of different prompts, but because of different histories.
Outcomes: The Signal Dreaming Needs
Dreaming needs to know what "doing well" means. That's what Outcomes provides.
You define a success rubric. A separate Claude instance — isolated from the agent's reasoning, running in its own context window — evaluates output against your criteria. If it fails, the grader identifies what needs to change, and the agent iterates until it meets the bar.
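The generate-grade-iterate loop is a general pattern, independent of the specific API. A sketch of the control flow, with `generate` and `grade` as hypothetical callables standing in for the agent and the isolated grader:

```python
def iterate_until_pass(generate, grade, max_rounds=3):
    """Generic generate-grade-revise loop. `grade` returns a
    (passed, feedback) pair; feedback from a failed round feeds
    the next attempt. Returns the best effort after max_rounds."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(feedback)
        passed, feedback = grade(draft)
        if passed:
            return draft
    return draft
```

Note that `grade` never sees how the draft was produced — only the draft itself — which is the same isolation property described above.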
Numbers from Anthropic's internal testing:
- Task success rates improved by up to 10 percentage points over standard prompting
- Structured file generation: +8.4% on .docx, +10.1% on .pptx
- Works for subjective quality — editorial voice, writing style, brand consistency
The isolation model matters here. The grader runs in a separate context window, which means it can't be influenced by the agent's own reasoning. It's evaluating output, not process.
Connect the two: Outcomes identifies failures. Dreaming remembers them. One is the exam. The other is the error notebook.
Multi-Agent Orchestration: Now in Public Beta
The third piece moved from preview to public beta. A coordinator agent decomposes tasks and delegates to up to 20 specialist subagents running in parallel. Each subagent gets its own context window. They share a common filesystem.
Key details for builders:
- Full trace visibility in Claude Console
- Coordinator can send follow-up messages mid-workflow
- Subagents retain context between exchanges
- Orchestration depth limited to one level — no sub-sub-agents
The depth limit is worth noting. If your architecture needs nested orchestration, this isn't the right fit yet.
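The depth-one fan-out itself is a simple pattern. A sketch using threads, with a dict standing in for the shared filesystem — `run_subagent` is a placeholder for whatever actually invokes a subagent, not a real API call:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_SUBAGENTS = 20  # matches the documented fan-out limit

def orchestrate(tasks, run_subagent, shared_store):
    """Depth-one fan-out: the coordinator delegates each task to a
    subagent in parallel; results land in a shared store (standing
    in for the shared filesystem). No nested orchestration."""
    if not tasks:
        return
    if len(tasks) > MAX_SUBAGENTS:
        raise ValueError("fan-out limited to 20 subagents")
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        for task, result in zip(tasks, pool.map(run_subagent, tasks)):
            shared_store[task] = result
```

If a task graph needs subagents that themselves delegate, that second level has to live in your own code, outside the platform's orchestrator.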
Real-world results from early adopters:
- Harvey (legal AI): task completion rates up approximately 6x
- Wisedocs (document verification): review speed improved 50% while maintaining quality
- Netflix: parallel batch analysis across hundreds of build logs
- Spiral by Every: Haiku coordinator + Opus writing subagents + Outcomes grader scoring against editorial principles
Webhooks and Pricing
Webhooks are in public beta. Agents push notifications to your system when tasks complete. For long-running jobs — some sessions run for hours — this is essential. You don't want to poll.
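The receiving side is just an HTTP endpoint that parses the notification body. The payload fields below (`type`, `session_id`, `status`) are assumptions for illustration — check the actual webhook documentation for the real schema:

```python
import json

def handle_webhook(raw_body: bytes):
    """Parse a task-completion notification and return what the
    caller needs to resume work. Field names are assumed, not
    taken from a real schema."""
    event = json.loads(raw_body)
    if event.get("type") != "task.completed":
        return None  # ignore event types we don't handle
    return event["session_id"], event["status"]
```

Whatever framework hosts this handler should return 2xx quickly and do real work out of band, since webhook senders typically retry on slow or failed deliveries.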
Pricing: standard Claude API token rates plus $0.08 per active session hour. Idle time is free. A 30-minute task costs 4 cents in infrastructure fees on top of tokens. Dreaming, Outcomes, and Webhooks don't add separate charges.
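The infrastructure fee is simple to model. A sanity-check helper — token costs are billed separately and deliberately not included:

```python
def session_infra_cost(active_minutes: float, rate_per_hour: float = 0.08) -> float:
    """Per-session infrastructure fee: $0.08 per active hour, idle
    time free. Pass active minutes, not wall-clock time. Token
    costs are separate and not modeled here."""
    return round(active_minutes / 60 * rate_per_hour, 4)
```

So a 30-minute active session is $0.04 of infrastructure, and a job that sits idle for hours between bursts costs only for the bursts.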
Quick Reference
| Feature | Status | What It Does |
|---|---|---|
| Dreaming | Research preview | Agents review past sessions, extract patterns, curate memory |
| Outcomes | Public beta | Automated output grading against developer-defined rubrics |
| Multi-agent orchestration | Public beta | Coordinator + up to 20 parallel subagents, shared filesystem |
| Webhooks | Public beta | Push notifications when agent tasks complete |
| Pricing | Live | $0.08/active session hour + standard token costs |
One Limitation Worth Knowing
Managed Agents runs Claude models exclusively. The orchestration, Dreaming, Outcomes grading — all Claude. If your architecture needs to route between models (cost optimization, specialized capabilities, latency requirements), that's a layer Managed Agents doesn't address.
If you're building multi-model agent systems that need persistent context across providers, EvoLink provides a unified gateway routing across Claude, DeepSeek, GPT, and others from a single API endpoint.
Author: Jessie, COO at EvoLink