I Trained My OpenClaw to Dream. Here's What It Learned Overnight.

#openclaw #ai #agents #productivity

Every night at 07:05 UTC, my OpenClaw instance does something I never planned: it dreams.

Not metaphorically. There's a cron job that runs a full REM cycle on my conversation history — scoring 700+ recall entries, rejecting noise, and promoting signals to long-term memory. It writes the results before I wake up. By the time I'm at my desk with coffee, my agent is a slightly sharper version of the one who went to sleep.

This post is about how that works, what the agent actually does during its unsupervised nightly run, and why I think this pattern (sleep plus consolidation) is the missing piece in most AI agent setups today.

What Most Agents Get Wrong About Memory

The standard agent memory pattern looks like this: append everything to a context file, let it grow until the window overflows, then either truncate or start a new thread. It's a lossy, passive approach. You're not teaching the agent anything — you're just... storing.

My first attempt at "better memory" was no different: daily log files that grew indefinitely. Then weekly summaries. Then a three-tier system (daily → weekly → long-term). But even with the tiering, the underlying problem remained: more storage, less signal. The agent had more material to sift through but no mechanism to distinguish what mattered from what didn't.

The Dream Protocol is my answer to that. It's a nightly cron that treats memory as a learning problem, not a storage problem.

How the Dream Cycle Works

The cron fires at 07:05 UTC every morning. It's an isolated agentTurn that runs a multi-stage pipeline:

Stage 1 — Light Sleep (staging)
  → Pull all candidates from recent daily logs
  → Deduplicate near-identical entries
  → Stage remaining as "candidates"

Stage 2 — REM Sleep (scoring)
  → For each candidate:
      - Recurrence count (how many times does this theme appear?)
      - Query uniqueness (is this from different contexts or the same one?)
      - Truth score (does this contradict established facts?)
  → Threshold gates: minScore=0.8, minRecallCount=3, minUniqueQueries=3

Stage 3 — Promotion
  → Entries that pass all three gates → written to MEMORY.md (long-term)
  → Entries that fail → discarded permanently

The numbers aren't magic. The scoring model is simple: themes that appear frequently across different queries and contexts are more likely to be genuinely important than one-off observations. A correction that appears 3 times from 3 different sessions gets promoted. A passing mention from one conversation gets discarded.
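
To make the gate logic concrete, here is a minimal sketch of the three-gate check. Only the threshold values come from the run described in this post. The Candidate class, the function names, the sample entries, and the choice of inclusive (>=) comparisons are assumptions for illustration, not the contents of the actual script.

```python
from dataclasses import dataclass

# Gate values quoted in the post; everything else is a hypothetical sketch.
MIN_SCORE = 0.8
MIN_RECALL_COUNT = 3
MIN_UNIQUE_QUERIES = 3


@dataclass(frozen=True)
class Candidate:
    text: str
    score: float         # assumed to be a combined score in [0, 1]
    recall_count: int    # how many times the theme was recalled
    unique_queries: int  # how many distinct queries surfaced it


def passes_gates(c: Candidate) -> bool:
    """Return True only if the candidate clears all three gates."""
    return (
        c.score >= MIN_SCORE
        and c.recall_count >= MIN_RECALL_COUNT
        and c.unique_queries >= MIN_UNIQUE_QUERIES
    )


def partition(candidates):
    """Split candidates into (promoted, rejected) lists."""
    promoted = [c for c in candidates if passes_gates(c)]
    rejected = [c for c in candidates if not passes_gates(c)]
    return promoted, rejected


# Made-up sample entries, one per failure mode.
candidates = [
    Candidate("NVIDIA endpoint is dead; use the fallback", 0.91, 3, 3),
    Candidate("one-off remark from a single chat", 0.95, 1, 1),
    Candidate("repeated often, but always from the same query", 0.85, 5, 1),
]
promoted, rejected = partition(candidates)
print(f"Promoted: {len(promoted)} | Rejected: {len(rejected)}")
```

Requiring all three gates at once is what keeps the promotion rate low: a high score alone is not enough if the theme only ever came up in one context.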

Here's what it looks like in practice from last night's run:

Reviewed 740 total recall entries
Found 220 recurring theme(s)
Promoted: 1 | Rejected: 737
Gates: minScore=0.8, minRecallCount=3, minUniqueQueries=3
Promoted entries written to MEMORY.md

737 rejected. 1 promoted. That's the ratio most nights.

What Survives the Gate

I've been running this for three weeks now. Here's what's consistently promoted:

Model configuration corrections — when I fix a broken fallback chain, that correction survives. The agent stops trying to use the dead NVIDIA endpoint.
Tool preference patterns — which tools work reliably vs. which ones fail silently. The agent learns to route around failures.
User preference signals — James prefers concise answers on Telegram, detailed ones on email. That distinction gets reinforced.

What consistently gets rejected:

Contextual one-liners that made sense in the moment but aren't generally useful
Observations that were superseded by later corrections
Duplicate copies of an insight that appeared in multiple sessions (the dedup collapses these into a single candidate)
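
The post does not say how the Stage 1 deduplication decides that two entries are "near-identical". One plausible approach, sketched here with Python's standard difflib, is to normalize each entry and drop anything that is at least 90% similar to an entry already kept. The 0.9 threshold, the function names, and the sample entries are all assumptions.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def dedupe(entries, threshold=0.9):
    """Keep the first entry of each group of near-identical entries.

    This is a simple O(n^2) pass, which is fine for a few hundred entries.
    """
    kept = []
    for entry in entries:
        norm = normalize(entry)
        is_duplicate = any(
            SequenceMatcher(None, norm, normalize(k)).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(entry)
    return kept


# Made-up sample entries: the second is a near-copy of the first.
entries = [
    "Prefer concise answers on Telegram",
    "prefer concise answers on  Telegram.",
    "The NVIDIA endpoint is dead",
]
unique_entries = dedupe(entries)
print(unique_entries)
```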

The 1-promoted-per-night rate is intentional. Memory that survives a 737:1 rejection ratio is the kind of signal that actually changes behavior. If everything gets promoted, nothing matters.

The Config That Runs It

The cron job itself is straightforward — OpenClaw native, fires an isolated agentTurn every morning:

{
  "name": "Dreaming Sweep",
  "schedule": { "kind": "cron", "expr": "5 7 * * *", "tz": "UTC" },
  "sessionTarget": "isolated",
  "payload": {
    "kind": "agentTurn",
    "message": "Run the Dream Protocol on your memory. Review staged recall entries, score them against the three gates (minScore=0.8, minRecallCount=3, minUniqueQueries=3), promote survivors to MEMORY.md, discard the rest. Write a brief dream diary to today's memory file.",
    "timeoutSeconds": 120
  }
}
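
If cron syntax is unfamiliar: "5 7 * * *" means minute 5 of hour 7, every day, which with "tz": "UTC" is 07:05 UTC. The sketch below computes the next fire time for that kind of daily expression. It is only a sanity check written for this post's schedule, not how OpenClaw evaluates cron expressions, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(expr: str, now: datetime) -> datetime:
    """Next fire time (UTC) for a daily 'M H * * *' cron expression.

    `now` must be timezone-aware. Anything other than a plain daily
    schedule is rejected, because this sketch does not parse full cron.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    minute, hour, dom, month, dow = expr.split()
    if (dom, month, dow) != ("*", "*", "*"):
        raise ValueError("only daily 'M H * * *' expressions are supported")
    now = now.astimezone(timezone.utc)
    candidate = now.replace(
        hour=int(hour), minute=int(minute), second=0, microsecond=0
    )
    # If today's slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


before = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
after = datetime(2024, 1, 1, 7, 5, tzinfo=timezone.utc)
print(next_daily_run("5 7 * * *", before).isoformat())
print(next_daily_run("5 7 * * *", after).isoformat())
```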

The prompt is deliberately lightweight. The heavy lifting is done by the scoring logic inside the Dreaming script — ~/.openclaw/workspace/scripts/dreaming-sweep.py — which handles the FTS5 recall queries, deduplication, and gate scoring. The agent just reviews the output and writes the diary.
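
The script itself is not shown here, but the two count-based gate inputs are easy to picture as a SQL aggregation. The sketch below is an assumption-heavy stand-in: it uses an ordinary in-memory SQLite table instead of an FTS5 index so that it runs on any Python build, and the table name, column names, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per time an entry was recalled by a query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recall_events (entry_id TEXT, query TEXT)")
conn.executemany(
    "INSERT INTO recall_events (entry_id, query) VALUES (?, ?)",
    [
        ("fallback-fix", "why did the model call fail"),
        ("fallback-fix", "which endpoint is dead"),
        ("fallback-fix", "fallback chain order"),
        ("typo-note", "typo in readme"),
        ("typo-note", "typo in readme"),
    ],
)


def gate_inputs(conn):
    """Return {entry_id: (recall_count, unique_queries)} for every entry."""
    rows = conn.execute(
        """
        SELECT entry_id,
               COUNT(*)              AS recall_count,
               COUNT(DISTINCT query) AS unique_queries
        FROM recall_events
        GROUP BY entry_id
        ORDER BY entry_id
        """
    ).fetchall()
    return {entry_id: (count, unique) for entry_id, count, unique in rows}


stats = gate_inputs(conn)
print(stats)
```

With gates of minRecallCount=3 and minUniqueQueries=3, the first sample entry would be eligible on counts alone, while the second (recalled twice by the same query) would not.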

Why I Think This Matters for Agent Design

Most AI agent tutorials focus on two things: tools and prompts. Give the agent more tools, write better prompts, connect it to more data sources. That's the expansion phase.

But at some point, every agent hits a plateau. More tools don't help when the agent can't remember which tools work. More context doesn't help when the signal-to-noise ratio collapses. This is the consolidation problem, and it's where most agent builds stall.

The Dream Protocol is my attempt at a general solution: treat memory like a learning system, not a filing cabinet. Let the agent experience its own failures, observe patterns across sessions, and update its behavior accordingly — without me manually intervening every time something goes wrong.

Is it perfect? No. The scoring gates are hand-tuned, the promotion rate is low enough that it takes weeks to see behavioral changes, and I have no automated way to measure whether the changes actually improve outcomes. I'm working on that.

But the core idea is sound: an agent that sleeps is an agent that learns. Even if it's just 1 true thing per night.

Running the Dream Protocol on your own OpenClaw? I'd love to hear what your agent promotes. Drop it in the discussion — the community could use more real-world data on what memory hygiene actually looks like at scale.