Anthropic just shipped a feature called Dreams for Claude Managed Agents. It's in research preview now, gated behind a dreaming-2026-04-21 beta header. The short version: your agent can review its own session history and rebuild its memory into something cleaner and more useful. Automatically.
That's not a small thing.
The problem this actually solves
If you've shipped an agent that uses Claude Managed Agents memory stores, you've probably noticed the same thing I did: the memory degrades. Not dramatically at first. But after a few dozen sessions, you end up with a store that has the same preference saved three times in slightly different ways, some notes that were true six weeks ago and aren't anymore, and a handful of one-off debugging observations that are just noise.
Agents write memory incrementally. Every session adds new entries. Nothing cleans up the old ones. Over time, the thing starts to look less like useful context and more like a log file nobody's trimming.
Dreams is Anthropic's answer to that. Make the agent review the receipts.
How it actually works
A dream is an async job. You give it two inputs: an existing memory store and a list of past sessions (anywhere from 1 to 100). It reads both, then produces a new, separate output memory store.
What happens in between: the model — you choose either claude-opus-4-7 or claude-sonnet-4-6 — goes through the input store and session transcripts together. It merges duplicate entries, replaces stale or contradicted facts with the latest values, and surfaces insights that cut across sessions — patterns that weren't visible within any single run.
The thing I actually appreciate about the design: it's non-destructive. The input store is never touched. What you get back is an entirely new store. You can review it, decide you don't like it, and delete it without having broken anything. That's a reasonable default for a system that's running autonomously over your data.
Two headers are required for everything to work:
anthropic-beta: managed-agents-2026-04-01,dreaming-2026-04-21
The SDK sets these automatically if you're using it.
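If you're hitting the API directly instead of through the SDK, here's a minimal sketch of what kicking off a dream might look like. The two beta headers and the model names are from the docs; the endpoint path, JSON field names, and IDs are my assumptions about the request shape, not confirmed API details.

```python
import requests

API_KEY = "sk-ant-..."  # your Anthropic API key

# Both beta headers are documented; everything else below
# (endpoint path, JSON field names) is an assumed shape.
headers = {
    "x-api-key": API_KEY,
    "anthropic-beta": "managed-agents-2026-04-01,dreaming-2026-04-21",
    "content-type": "application/json",
}

resp = requests.post(
    "https://api.anthropic.com/v1/dreams",  # hypothetical endpoint
    headers=headers,
    json={
        "model": "claude-sonnet-4-6",
        "input_memory_store_id": "mem_abc123",    # the store to consolidate
        "session_ids": ["sess_001", "sess_002"],  # anywhere from 1 to 100
    },
)
dream = resp.json()
print(dream["id"], dream.get("status"))
```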
What this changes architecturally
The naive way to use agent memory is to just let it accumulate. Run sessions, write to memory, read from memory. That works fine until it doesn't.
Dreams adds a third operation to that cycle. Write. Read. Occasionally consolidate.
For agents doing repetitive tasks — code reviews, content pipelines, research runs — you probably want to schedule a dream on a cadence. Maybe after every 10-20 sessions, depending on how fast your memory store grows. The instructions parameter (up to 4,096 characters) lets you focus the consolidation pass: "focus on coding-style preferences, ignore one-off debugging notes" is the example straight from the docs. That's useful. You don't want the model wasting time on stuff you know is ephemeral.
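To make that cadence concrete, here's a sketch of a trigger that queues a dream every N completed sessions and passes a focused instructions string. The counter logic is mine, and the endpoint shape is the same assumption as in the earlier snippet; the instructions parameter and its 4,096-character cap are from the docs.

```python
DREAM_EVERY_N_SESSIONS = 15  # tune to how fast your memory store grows

completed_sessions: list[str] = []

def maybe_dream(session_id: str, store_id: str) -> None:
    """Queue a consolidation pass once enough sessions have piled up."""
    completed_sessions.append(session_id)
    if len(completed_sessions) < DREAM_EVERY_N_SESSIONS:
        return
    requests.post(
        "https://api.anthropic.com/v1/dreams",  # hypothetical endpoint
        headers=headers,  # same two-beta-header dict as above
        json={
            "model": "claude-sonnet-4-6",
            "input_memory_store_id": store_id,
            "session_ids": completed_sessions[:100],  # the API caps input at 100
            # Up to 4,096 characters; this is the docs' own example of a focused pass:
            "instructions": "Focus on coding-style preferences, ignore one-off debugging notes.",
        },
    )
    completed_sessions.clear()
```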
The other architectural implication: dreams take minutes to tens of minutes depending on input size. This is background processing, not something you can wait on in a user-facing request. If you're building a product on top of this, plan for the consolidation to happen asynchronously and apply the output store on a subsequent session, not the current one.
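In practice that means a background poller. A sketch, assuming a GET on the dream resource returns a status field and an output_memory_store_id once it finishes (the field names and status values here are my guesses):

```python
import time

def wait_and_swap(dream_id: str, agent_config: dict) -> None:
    """Poll until the dream hits a terminal state, then point the agent's
    NEXT session at the new store. Never block a user-facing request on this."""
    while True:
        resp = requests.get(
            f"https://api.anthropic.com/v1/dreams/{dream_id}",  # hypothetical
            headers=headers,
        )
        dream = resp.json()
        if dream["status"] in ("completed", "failed"):  # assumed status values
            break
        time.sleep(30)  # dreams run for minutes to tens of minutes

    if dream["status"] == "completed":
        # Applied on a subsequent session, not the current one.
        agent_config["memory_store_id"] = dream["output_memory_store_id"]
```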
One thing worth knowing: while a dream is pending or running, you can't delete or archive its output store. The API returns a 400 if you try. And if you archive or delete an input store mid-run, the dream fails with input_memory_store_unavailable. So don't do that. Stage your cleanup after the dream completes.
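In code, that ordering looks roughly like this: archive the superseded input store only once the dream is terminal, and treat anything else as a reason to leave it alone. The archive endpoint is another assumption on my part; the two failure modes are documented.

```python
def cleanup_after_dream(dream: dict, old_store_id: str) -> None:
    """Archive the old input store only after the dream completes.
    Archiving it mid-run kills the dream with input_memory_store_unavailable,
    and the output store rejects delete/archive (HTTP 400) while the dream
    is still pending or running."""
    if dream["status"] != "completed":
        return  # failed or still running: keep the input store as-is
    requests.post(
        f"https://api.anthropic.com/v1/memory_stores/{old_store_id}/archive",  # hypothetical
        headers=headers,
    )
```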
Watching the pipeline run
This part is underrated in the coverage I've seen so far.
When a dream starts running, its session_id field points at the underlying session executing the pipeline. You can stream that session's events in real time — same SSE streaming you'd use for any other Managed Agents session. You can actually watch what the agent is reading and writing.
That's useful for debugging, but it's also just interesting. It's probably the clearest window you'll get into how the model interprets your agent's history.
The session gets archived (not deleted) when the dream reaches a terminal state, so the transcript stays available afterward if you want to review it.
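Tailing that session is plain SSE. A sketch with requests, assuming the session events endpoint serves text/event-stream (the path is my guess; the session_id field on the dream is documented):

```python
def tail_dream(dream: dict) -> None:
    """Stream the pipeline session's events as the dream executes."""
    session_id = dream["session_id"]  # the session running the pipeline
    resp = requests.get(
        f"https://api.anthropic.com/v1/sessions/{session_id}/events",  # hypothetical
        headers={**headers, "accept": "text/event-stream"},
        stream=True,
    )
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data:"):
            print(line[len("data:"):].strip())  # raw event payloads
```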
The Harvey number
Legal AI company Harvey reported a 6x increase in task completion rates after implementing dreaming. That number is from secondary coverage of Anthropic's Code with Claude 2026 event — I haven't independently verified it.
Take it with the usual grain of salt for launch-week customer data. Task completion rate is also a metric you can move a lot of ways, and we don't know what the baseline looked like or how they defined "task completion." A 6x improvement would be extraordinary. A meaningful improvement in a well-defined task is more realistic.
What I'd actually expect in practice: cleaner memory means fewer confusing contradictions, means fewer hallucinations downstream, means more reliable runs. Whether that translates to 6x or 1.3x depends heavily on how messy your memory had gotten and what kind of tasks you're running. For most teams, the first few dreams on a production agent will probably be clarifying. After that, the marginal benefit per dream decreases unless you're generating a lot of new session data between runs.
Who should actually use this
If you're already building on Claude Managed Agents: Worth setting up even in research preview. Non-destructive, billed at standard token rates, and the downside is basically "I wasted some compute on a consolidation I didn't need." The upside is a cleaner memory store and potentially more reliable agent behavior.
If you're evaluating Claude Managed Agents: This fills in a gap that was previously a legitimate concern. Persistent memory plus periodic consolidation is a more complete story than persistent memory alone. Check the best AI agent platforms roundup for how the overall platform compares.
If you're building agent commerce flows or high-frequency task pipelines: The April commerce agent work is adjacent to this — agents operating autonomously over long periods with real stakes. Dreaming is directly relevant there. Memory that accumulates without cleanup is an operational risk when the agent is making real decisions.
How to get access
It's a research preview, so you need to request access through Anthropic's form. Both beta headers — managed-agents-2026-04-01 and dreaming-2026-04-21 — are required. During the preview, only claude-opus-4-7 and claude-sonnet-4-6 are supported for running the dreaming pipeline itself.
Billing is at standard token rates for whichever model you pick. Anthropic's own note: cost scales roughly linearly with the number and length of input sessions. Start small — a batch of 10-15 sessions — before you run 100 at once.
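Since cost scales roughly linearly with input size, a back-of-envelope estimate before queuing a big batch is cheap insurance. Everything in this sketch is a placeholder you'd fill in yourself; it just does the arithmetic:

```python
def estimate_dream_cost(
    session_token_counts: list[int],
    store_tokens: int,
    usd_per_million_input_tokens: float,  # plug in your model's actual rate
) -> float:
    """Rough input-side cost: the model reads the store plus every session.
    Ignores output tokens, so treat the result as a floor, not a quote."""
    total_input = store_tokens + sum(session_token_counts)
    return total_input / 1_000_000 * usd_per_million_input_tokens

# e.g. 15 sessions of ~40k tokens each plus a 20k-token store:
# estimate_dream_cost([40_000] * 15, 20_000, usd_per_million_input_tokens=3.0)
```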
The thing about Dreams is that it solves a problem most agent builders have quietly been managing by hand. The production workaround has usually been: periodically review your memory store manually, delete the garbage, rewrite the contradictions. That works when you have one agent. It doesn't scale.
An async cleanup pipeline that runs against your own session history is the right abstraction. The design decisions — non-destructive output, observable pipeline, configurable focus — are the kind of thing you want to see in a beta API. They suggest Anthropic actually thought about how developers would use this in production, not just how it would look in a demo.
I'll follow up once I've had time to run it against a real agent workload. For now: if you're on Claude Managed Agents, get on the access list.