Every night at 7:10 PM Eastern, my OpenClaw agent goes to sleep.
It doesn't rest. It processes. For about 60 seconds, a cron job runs a three-stage pipeline against everything I did that day — every task I delegated, every error I logged, every decision I made. By morning, the agent's memory has been quietly edited: noise discarded, signal promoted, patterns surfaced.
I've been running this setup for three weeks. The numbers are honest:
- June 23: 62 candidates staged → 257 recurring themes found → 2 promoted to long-term memory
- June 22: 64 candidates staged → 242 recurring themes → 1 promoted
- June 21: 63 candidates staged → 241 recurring themes → 1 promoted
Most of what the agent sees gets rejected. That's the point.
Why I Built a Dream Protocol
The problem with a long-running AI agent is that context gets compressed. In every session, compaction kicks in: the system summarizes what happened, condensing 40 messages into a few paragraphs. It's efficient, but it's also lossy. Important lessons get averaged away. Corrections fade. Context that's critical for next time gets compacted into vague language.
I needed a way to surface what actually mattered from the daily noise.
The answer was a nightly cron job that I call the Dream Protocol. It's not sophisticated — it's a Python script that runs against my daily memory logs. But it's disciplined, and discipline beats cleverness in memory systems.
The Three-Stage Pipeline
Stage 1: Light Sleep — Staging Candidates
The script scans the day's memory log and stages every "lesson learned" entry — every ## What I learned section, every ## Self-Improvement note, every flagged correction. It also pulls from the previous few days' logs.
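Here's a minimal sketch of what a staging pass like this could look like. It is not the actual Dream Protocol script. It assumes, hypothetically, that each day's log is a markdown file named YYYY-MM-DD.md in a single directory, that lesson sections open with the "## What I learned" or "## Self-Improvement" headings, and that flagged corrections carry an inline "[correction]" tag. The three-day lookback is also an assumed value.

```python
from datetime import date, timedelta
from pathlib import Path

# Hypothetical conventions: adjust to however your daily logs are laid out.
LESSON_HEADINGS = ("## What I learned", "## Self-Improvement")
CORRECTION_TAG = "[correction]"  # assumed inline marker for flagged corrections


def extract_lessons(log_text: str) -> list[str]:
    """Return every line under a lesson heading, plus any line tagged as a correction."""
    lessons: list[str] = []
    in_lesson_section = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # A level-2 heading opens a new section; track whether it's a lesson section.
            in_lesson_section = stripped.startswith(LESSON_HEADINGS)
            continue
        if stripped and (in_lesson_section or CORRECTION_TAG in stripped.lower()):
            # Drop list markers so "- foo" and "* foo" stage as the same text.
            text = stripped.lstrip("-* ").strip()
            if text:
                lessons.append(text)
    return lessons


def stage_candidates(memory_dir: Path, today: date, lookback_days: int = 3) -> list[tuple[str, str]]:
    """Collect (day, lesson) pairs from today's log plus the previous few days."""
    staged: list[tuple[str, str]] = []
    for offset in range(lookback_days + 1):
        day = (today - timedelta(days=offset)).isoformat()
        log_path = memory_dir / f"{day}.md"  # hypothetical naming: YYYY-MM-DD.md
        if log_path.exists():
            for lesson in extract_lessons(log_path.read_text(encoding="utf-8")):
                staged.append((day, lesson))
    return staged
```

Keeping the day next to each lesson matters later: recurrence across days is the signal the scoring stage looks for.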
Before deduplication, this looks like noise: repeated attempts at the same fix, verbose corrections that say the same thing three different ways, stale entries that were already resolved.
The deduplication step removes near-duplicates. This is important — if I tried to fix the same problem three times in a week, that's one lesson, not three.
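One way to do near-duplicate removal with only the standard library is difflib.SequenceMatcher. This sketch assumes the staged input is a list of (day, lesson) pairs, and the 0.85 similarity threshold is an arbitrary placeholder, not a value from the protocol. It keeps the shortest phrasing of each cluster and records every day the lesson showed up, so three attempts at the same fix become one lesson without losing the evidence that it recurred.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.85  # hypothetical cut-off; tune against your own logs


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return " ".join(text.lower().split())


def dedupe(staged: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Merge near-duplicate lessons into {canonical lesson: days it appeared on}."""
    clusters: dict[str, set[str]] = {}
    # Shortest first, so the concise phrasing becomes the canonical one.
    for day, lesson in sorted(staged, key=lambda pair: len(pair[1])):
        key = normalize(lesson)
        for canonical in clusters:
            if SequenceMatcher(None, key, normalize(canonical)).ratio() >= SIMILARITY_THRESHOLD:
                clusters[canonical].add(day)  # same lesson, another sighting
                break
        else:
            clusters[lesson] = {day}  # no close match: start a new cluster
    return clusters
```

The pairwise comparison is quadratic, which is fine for a few hundred entries a night; a much larger store would want hashing or embeddings instead.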
Stage 2: REM Sleep — Scoring and Filtering
This is where the real selection happens.
The script looks at recurrence: how many times does this pattern show up across different days, different sessions, different contexts? A lesson that appears once is noise. A lesson that appears three times across three different query contexts is signal.
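The article doesn't describe the recall store's format, so this sketch assumes a hypothetical RecallEvent record (lesson, day, session, query) for each time a lesson was surfaced. Counting recurrence then comes down to one total plus a few distinct-value counts per lesson.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RecallEvent:
    """One time a lesson was surfaced by a memory search (hypothetical record shape)."""
    lesson: str
    day: str
    session: str
    query: str


def recurrence_stats(events: list[RecallEvent]) -> dict[str, dict[str, int]]:
    """Count total recalls plus distinct days, sessions, and queries per lesson."""
    recalls: dict[str, int] = defaultdict(int)
    days: dict[str, set[str]] = defaultdict(set)
    sessions: dict[str, set[str]] = defaultdict(set)
    queries: dict[str, set[str]] = defaultdict(set)
    for event in events:
        recalls[event.lesson] += 1
        days[event.lesson].add(event.day)
        sessions[event.lesson].add(event.session)
        queries[event.lesson].add(event.query)
    return {
        lesson: {
            "recall_count": recalls[lesson],
            "unique_days": len(days[lesson]),
            "unique_sessions": len(sessions[lesson]),
            "unique_queries": len(queries[lesson]),
        }
        for lesson in recalls
    }
```

A lesson recalled three times by the same query in one session scores high on recall_count but low on everything else, which is exactly the case the distinct counts are there to catch.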
The scoring gates are:
- Minimum recall count: 3 (must appear at least 3 times in the recall store)
- Minimum unique queries: 3 (must be relevant across at least 3 different search contexts)
- Minimum score: 0.8
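As a sketch, the three gates translate directly into a single all-or-nothing check. The thresholds come from the list above; how the score itself is computed isn't described here, so this assumes it arrives precomputed on a 0 to 1 scale.

```python
from dataclasses import dataclass

# Gate thresholds from the REM stage.
MIN_RECALL_COUNT = 3
MIN_UNIQUE_QUERIES = 3
MIN_SCORE = 0.8


@dataclass
class Candidate:
    lesson: str
    recall_count: int    # times it appeared in the recall store
    unique_queries: int  # distinct search contexts that surfaced it
    score: float         # assumed precomputed in [0, 1]; the formula is not specified


def passes_gates(c: Candidate) -> bool:
    """A candidate must clear every gate; failing any one means rejection."""
    return (
        c.recall_count >= MIN_RECALL_COUNT
        and c.unique_queries >= MIN_UNIQUE_QUERIES
        and c.score >= MIN_SCORE
    )


def split_candidates(candidates: list[Candidate]) -> tuple[list[Candidate], list[Candidate]]:
    """Partition candidates into (promoted, rejected)."""
    promoted = [c for c in candidates if passes_gates(c)]
    rejected = [c for c in candidates if not passes_gates(c)]
    return promoted, rejected
```

Using "and" rather than a weighted sum is the point: a very high score can't compensate for a lesson that only ever came up once.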
If a candidate survives all three gates, it gets promoted to MEMORY.md — the agent's long-term knowledge base. Everything else gets rejected.
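Promotion can be as simple as appending to MEMORY.md. This sketch assumes a hypothetical bullet format ("- lesson (promoted YYYY-MM-DD)") and skips any lesson that's already recorded, so re-running the job on the same night doesn't stack duplicates.

```python
from pathlib import Path


def promote(lessons: list[str], memory_path: Path, day: str) -> int:
    """Append surviving lessons to MEMORY.md, skipping any already present.

    The '- lesson (promoted YYYY-MM-DD)' format is a hypothetical convention.
    Returns how many new lessons were written.
    """
    existing = memory_path.read_text(encoding="utf-8") if memory_path.exists() else ""
    new_lines = [
        f"- {lesson} (promoted {day})"
        for lesson in dict.fromkeys(lessons)  # drop repeats within this batch, keep order
        if f"- {lesson} (" not in existing
    ]
    if new_lines:
        with memory_path.open("a", encoding="utf-8") as f:
            if existing and not existing.endswith("\n"):
                f.write("\n")  # don't glue the first new bullet onto the last line
            f.write("\n".join(new_lines) + "\n")
    return len(new_lines)
```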
The rejection rate is brutal. June 23: 824 rejected out of 828 candidates. June 22: 803 rejected out of 806. Most of what the agent learns, the agent forgets. But the stuff that sticks is the stuff that kept appearing — and that's what I actually want the agent to remember.
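To put "brutal" in numbers, here is the arithmetic for those two nights:

```python
def rejection_rate(rejected: int, total: int) -> float:
    """Fraction of candidates that failed the gates."""
    return rejected / total


for night, rejected, total in [("June 23", 824, 828), ("June 22", 803, 806)]:
    print(f"{night}: {rejection_rate(rejected, total):.1%} rejected")
# June 23: 99.5% rejected
# June 22: 99.6% rejected
```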
Stage 3: Dream Diary — The Log of What Didn't Make It
There's a third output: a Dream Diary entry that logs the process without the details. This isn't for the agent — it's for me. It tracks how many candidates were staged, how many themes were found, what gates were applied, and what the top-scoring survivors were.
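A diary entry like that is just a small rendering step. The NightSummary fields below mirror what the entry tracks (candidates staged, themes found, gates applied, top survivors); the layout and field names are hypothetical, and writing the result to a file is left out.

```python
from dataclasses import dataclass


@dataclass
class NightSummary:
    day: str
    staged: int
    themes: int
    gates: dict[str, float]                 # gate name -> threshold
    top_survivors: list[tuple[str, float]]  # (lesson, score), highest first


def diary_entry(summary: NightSummary, top_n: int = 3) -> str:
    """Render a process-only diary entry: counts, gates, and the best-scoring survivors."""
    gate_text = ", ".join(f"{name} >= {value:g}" for name, value in summary.gates.items())
    lines = [
        f"## Dream Diary: {summary.day}",
        f"- Candidates staged: {summary.staged}",
        f"- Recurring themes found: {summary.themes}",
        f"- Gates applied: {gate_text}",
        "- Top survivors:",
    ]
    for lesson, score in summary.top_survivors[:top_n]:
        lines.append(f"  - {lesson} (score {score:.2f})")
    return "\n".join(lines) + "\n"
```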
It's the agent equivalent of waking up and not remembering the dream, but knowing something happened.
What Gets Promoted
The filtering sounds harsh, but it's surprisingly good at finding the right things.
From the last two weeks, what's survived to MEMORY.md:
- MiniMax-M2.7 as the correct compaction model: appeared across 80+ recall entries, confirmed correct by session review data
- Fallback chain failures with free-tier models: kept appearing in cron failure logs; eventually promoted to long-term memory after 3+ distinct failure events
- The /tmp/tmpfile bug pattern: same root cause (hardcoded temp file reference in cron payload) appeared in 3 separate cron sessions before being caught
What's consistently rejected:
- One-off corrections (e.g., "fix typo in prompt X")
- Verbose explanations that say the same thing as a shorter entry
- Stale entries from days when the problem was already resolved
What This Actually Changes
The practical effect after three weeks: the agent's behavior has shifted.
When a new cron job fails the same way a previous one did, the agent recognizes the pattern faster, not because it was explicitly told about it, but because it appears in MEMORY.md with enough weight that it survives compaction. When a new model configuration is proposed, the agent has enough evidence to push back on free-tier fallbacks without being explicitly told to.
The Dream Protocol isn't magic. It's just disciplined noise cancellation.
The alternative — storing everything — produces the opposite effect. A memory full of noise makes it harder for the agent to distinguish what actually matters. The compaction model averages everything together, and signal gets diluted.
The One-Line Summary
Most of what an AI agent learns should be forgotten. The 3% that survives 3 different contexts across 3 different days is what you want in long-term memory.
The Dream Protocol is a 60-second cron job that costs almost nothing to run. After three weeks, it's the reason my agent caught a silent cron crash that would have gone unnoticed for days. It's the reason the agent stopped suggesting free-tier fallbacks for production cron jobs. It's the reason I trust the memory more than I trust my own notes.
If you're running OpenClaw and your agent's memory keeps getting noisier over time, try a nightly deduplication pass. You don't need a sophisticated system. You need a gate that says "appeared 3 times across 3 different days" — and the discipline to actually delete the rest.