이종국 (barw)

Claudelab: a manually gated alternative to Anthropic's agent 'Dreaming'

A disclaimer before anything else: this is not a framework, and not a
product. It's a handful of markdown conventions plus one small Python script —
a personal habit for keeping my own Claude Code setup from rotting. Later in
this post I put it next to Anthropic's "Dreaming"; I want to be clear up front
that the comparison is about a shared problem, not a shared scale. If "a few
markdown files" sounds too small to be worth your time, that's a fair call —
feel free to bail here.

If you run long-lived projects through Claude Code, you've probably noticed its
memory system (a MEMORY.md index plus topic files) is genuinely useful — and
also that it rots.

After a few months of daily use across several projects, I kept hitting the same
three failure modes:

  1. Audit inflation. You start logging decisions ("tried X, didn't work"),
    encoding rules ("always typecheck before push"), recording context. Then one day
    you notice you spend more time documenting decisions than making them. Memory
    files multiply. Cross-references break.

  2. Project pollution. A throwaway meta-experiment — "let me check if this MCP
    server works" — leaks into a real project's commit history and workspace. Six
    months later you can't tell signal from scratchpad.

  3. Rule drift. You write a governance rule on Monday. On Tuesday you bypass it
    with "this time is different." The rule was never wrong; you just stopped
    enforcing it.

I built a small workbench pattern to fight all three. It's called claudelab,
and it's deliberately boring: a few markdown conventions and one Python script.

  • Audit inflation cap — an explicit per-session limit on new memory files. Hit it, and only justified exceptions (a real incident, an external audit) let you add more. It forces you to stop documenting and go build.
  • REJECTED archive — every killed idea is logged with a reason and a reversal trigger (an example entry follows this list). You don't get to re-litigate a dead idea without new evidence.
  • Closed-loop self-audit — a health-check script finds broken cross-references, orphan files, and stale entries (a minimal sketch below). The system stays consistent without a human scanning it.
  • Track-separated governance — real projects, meta-experiments, and research each live under different rules (one possible layout below). Strict where it matters, loose where it doesn't.
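
To make the REJECTED archive concrete, here's the shape of an entry. This is a sketch of the convention, not the repo's actual schema; the heading and field names are mine:

```markdown
## 2026-05: Auto-create memory files from a post-commit hook

- Rejected because: auto-created memory is audit inflation with extra steps.
- Reversal trigger: a concrete incident where a missing note costs real
  debugging time, with a link to the session that proves it.
```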
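The health-check script is the only moving part that's actually code. Here's a minimal sketch of what "closed-loop self-audit" means in practice, assuming a flat memory/ directory, a MEMORY.md index that uses ordinary markdown links, and a 90-day staleness window (all three are assumptions, not the repo's real layout):

```python
# Sketch of a claudelab-style health check. The paths (memory/, MEMORY.md)
# and the 90-day staleness window are illustrative assumptions.
import re
import time
from pathlib import Path

MEMORY_DIR = Path("memory")        # assumed location of topic files
INDEX = MEMORY_DIR / "MEMORY.md"   # assumed index file
STALE_AFTER_DAYS = 90              # assumed staleness threshold

def check() -> list[str]:
    problems = []
    # Collect every .md file the index links to via markdown links.
    indexed = set(re.findall(r"\]\(([^)]+\.md)\)", INDEX.read_text()))

    # Broken cross-references: index links to files that don't exist.
    for ref in indexed:
        if not (MEMORY_DIR / ref).exists():
            problems.append(f"broken link in index: {ref}")

    # Orphans: topic files never referenced from the index.
    for f in MEMORY_DIR.glob("*.md"):
        if f.name != INDEX.name and f.name not in indexed:
            problems.append(f"orphan file: {f.name}")

    # Stale entries: topic files untouched for too long.
    cutoff = time.time() - STALE_AFTER_DAYS * 86400
    for f in MEMORY_DIR.glob("*.md"):
        if f.name != INDEX.name and f.stat().st_mtime < cutoff:
            problems.append(f"stale (> {STALE_AFTER_DAYS}d): {f.name}")

    return problems

if __name__ == "__main__":
    for p in check():
        print(p)
```

Run it at the end of a session; silence means the memory set is internally consistent.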
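Track separation is even less glamorous: directories with different rules attached. One possible layout (the names here are illustrative, not the repo's):

```
workbench/
├── projects/   # real work: strict rules, clean commit history, full audits
├── meta/       # throwaway experiments ("does this MCP server work?")
└── research/   # reading notes and explorations: loose by design
```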

Here's the part that made me write this up.

While building claudelab, I didn't know Anthropic had just shipped Dreaming —
a managed feature, announced in May 2026, that does automated between-session
memory consolidation: it reviews past sessions, prunes stale notes, merges
duplicates, resolves contradictions, and writes structured playbooks for future
runs. The legal-AI company Harvey reportedly saw task completion rates jump ~6x
after adopting it.

When I found out, my first reaction was "well, that's the same problem." And it
is — but the two solutions sit at opposite ends of the automation spectrum:

|                             | Anthropic Dreaming          | claudelab                                       |
| --------------------------- | --------------------------- | ----------------------------------------------- |
| Memory consolidation        | automatic, between sessions | manual, explicitly gated                        |
| Where it runs               | managed / hosted            | your local files, your repo                     |
| Recurring-mistake surfacing | learned, implicit           | REJECTED.md: explicit reason + reversal trigger |
| Stops over-documentation    | not a stated goal           | the audit inflation cap is the core constraint  |
| Auditable                   | yes (plain-text playbooks)  | yes (plain text, in your git)                   |

To be clear about the timeline: Dreaming was announced before I'd finished
claudelab's tooling. This isn't a priority claim — it's independent convergence.
The interesting thing isn't who was first; it's that the problem is real enough
that two people hit it and picked opposite trade-offs.

Dreaming optimizes for scale and autonomy — agents that get better
unsupervised, the longer they run. claudelab optimizes for one human staying
in the loop without drowning in audit overhead. If you run fleets of managed
agents, Dreaming is the answer. If you want a transparent, self-hosted workbench
where every memory change is something you consciously approved, that's
claudelab — and the two aren't mutually exclusive.

One deliberate non-feature: claudelab does not automate memory updates with
hooks. That sounds backwards for a "self-evolving" system, but auto-creating
memory is just audit inflation with extra steps — rule drift happens faster than
a human notices. Manual gating is the feature.

It's MIT, it's alpha, and it's a transferable pattern more than a product. If you
run agents long-term, I'd genuinely like to know whether the audit-inflation cap
resonates — or whether I've built an elaborate solution to a problem only I have.

Repo: https://github.com/whdrnr2583-cmd/claudelab
