Charles Wu for seekdb

MEMORY.md Every Turn? That’s Noise, Not Memory.

Full context feels safe until it isn’t. Here’s the engineering fork in the road — and real numbers from an open-source memory layer on OpenClaw.

The AI memory dilemma: stateless models, long chats, and the “wait — who are you again?” moment

Stateless isn’t a feature. It’s the default bug.

That’s the part nobody puts on the landing page: large language models don’t continue anything on their own. Every turn is a fresh sheet of paper — unless you shovel history back in.

So the industry reached for two comforting reflexes:

  • Crank the context window — pack in everything that might matter. Longer feels safer.

  • Drop a MEMORY.md and paste it every turn — simple, auditable, easy to debug.

Both are great at small scale. Both fall apart at real scale.

Because context isn’t free. You pay three ways: slower, pricier, and muddier. Inference drags. Your token bill climbs linearly. Worst of all, as context grows, attention thins out in the middle — quality drops, contradictions creep in, and you’re not buying “memory.” You’re buying noise.
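To make that concrete, here’s a back-of-envelope sketch in Python. The numbers are invented for illustration (a 2,000-token memory file growing ~50 tokens per turn vs. a flat ~300-token retrieved slice); only the shape of the curves is the point:

def full_paste_total(turns, base=2_000, growth=50):
    # re-send the entire (growing) file every turn: cumulative cost compounds
    return sum(base + growth * t for t in range(turns))

def retrieve_total(turns, slice_tokens=300):
    # inject only a bounded, relevant slice: cost stays flat per turn
    return slice_tokens * turns

print(full_paste_total(200))  # 1,395,000 input tokens over 200 turns
print(retrieve_total(200))    # 60,000 tokens: a ~23x gap under these assumptions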

So the real question isn’t whether to remember. It’s this:

What shape should memory take before it enters the model — a verbatim dump, or a retrieved slice of what matters?

This piece is about that engineering fork — and some numbers we saw hooking an open memory layer to OpenClaw.

Full context is like reciting your entire diary before every sentence

A lot of people hear “long-term memory” and think: store everything anyone ever said.

A better engineering definition is tighter:

Keep only facts that help future decisions — and that retrieval can amplify — and let stale stuff expire on purpose.

Human memory isn’t a bit-perfect disk image. We lose detail; we blur timelines; we still keep actionable residue — “no cilantro,” “last time we were blocked on that dependency.” Flip that around for AI: full retain + full inject often buys you less coherence, more contradictions, and more context pollution.

That’s why I keep coming back to three knobs (sketched as an interface right after this list):

  • Write path: what do you distill from chat into durable memory?

  • Read path: what do you retrieve — and how much — before the model sees it?

  • Lifecycle: how do old facts fade instead of squatting forever?
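Here’s that sketch: the three knobs as one minimal interface. This is a generic shape for reasoning about the problem, not PowerMem’s actual API; all names are mine:

from dataclasses import dataclass
from typing import Protocol

@dataclass
class Fact:
    text: str          # distilled, decision-relevant statement
    created_at: float  # timestamp the lifecycle policy decays against

class MemoryLayer(Protocol):
    def write(self, transcript: str) -> list[Fact]:
        """Write path: distill durable facts out of raw chat."""
        ...

    def read(self, query: str, k: int = 5) -> list[Fact]:
        """Read path: retrieve a small, relevant slice for this turn."""
        ...

    def expire(self, now: float) -> int:
        """Lifecycle: drop or down-rank stale facts; return how many."""
        ...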

What should a “memory system” actually look like?

One sane pattern: a persistent memory layer outside the LLM — think PowerMem (Apache 2.0 from OceanBase) — that extracts salient facts from dialogue (dedupe, conflict update, merge related), recalls on demand, and forgets stale items with an explicit decay policy.
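As a toy illustration of that write path (dedupe, conflict update, merge related), here’s one shape the per-fact decision could take. Word overlap stands in for real similarity; this is my sketch, not PowerMem’s implementation:

def jaccard(a: str, b: str) -> float:
    # crude word-overlap similarity; a real system would use embeddings
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def reconcile(new: str, store: list[str], dup=0.8, related=0.4) -> list[str]:
    """Decide update / merge / add for one freshly extracted fact."""
    for i, old in enumerate(store):
        s = jaccard(new, old)
        if s >= dup:
            store[i] = new                 # near-duplicate: newest wins
            return store
        if s >= related:
            store[i] = f"{old}; {new}"     # related: merge into one record
            return store
    store.append(new)                      # novel: add
    return store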

A few properties that actually matter in production:

  • Hybrid retrieval — vectors + full-text + graph-style links. Fuzzy intent and exact keywords need to hit. “Embedding-only search” ages poorly in real products.

  • Forgetting isn’t a bug — Ebbinghaus-style decay sounds like a psych meme; it’s really a capacity vs. signal-to-noise trade-off you’re engineering on purpose.

  • Multi-agent — private memory and shared memory across agents. Multi-agent isn’t “someday”; it’s now. Single-user, single-session assumptions break fast.

  • Multimodal — text, images, audio. Not for show — workflows are already messy.

If you list those as a feature matrix, it reads like marketing. In engineering terms, they answer one question: how do you put the smallest useful slice of memory in front of the model this turn?
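In code terms, the first two bullets plus decay reduce to a scoring function. Here’s a toy version: reciprocal rank fusion merges a vector ranking with a keyword ranking, then an exponential decay discounts by age. A sketch of the idea, not PowerMem’s actual scorer:

def rrf(rank: int, k: int = 60) -> float:
    # reciprocal rank fusion: a standard way to merge heterogeneous rankers
    return 1.0 / (k + rank)

def decay(age_days: float, half_life_days: float = 30.0) -> float:
    # Ebbinghaus-flavored exponential recency discount
    return 0.5 ** (age_days / half_life_days)

# (vector_rank, keyword_rank, age_days) per candidate fact; ranks start at 1
candidates = {
    "no cilantro": (1, 4, 200),
    "blocked on the libfoo pin last week": (3, 1, 7),
}
scored = {
    fact: (rrf(v) + rrf(kw)) * decay(age)
    for fact, (v, kw, age) in candidates.items()
}
print(max(scored, key=scored.get))  # the recent, doubly-ranked fact wins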

The benchmark doesn’t care about your vibes: LOCOMO vs. “just paste everything”

On LOCOMO (a long-dialogue memory benchmark; Maharana et al., ACL 2024), the gap between PowerMem and a full-context baseline isn’t a rounding error.

The point isn’t “pick a winner.” It’s that retrieval + extraction beats brute-force context on quality, latency, and cost at the same time — which feels backwards until you realize the information shape changed: from “replay the transcript” to structured, retrievable facts.
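To make “information shape” concrete, here’s a hypothetical before/after: three transcript lines collapse into two durable facts. The format is illustrative, not PowerMem’s actual schema:

{
  "transcript": [
    "User: btw I'm allergic to cilantro",
    "User: actually not allergic, I just really dislike it",
    "User: deploy got blocked on the libfoo version pin again"
  ],
  "facts": [
    { "text": "User dislikes cilantro (not an allergy)", "kind": "preference" },
    { "text": "Deploys have been blocked by the libfoo version pin", "kind": "project" }
  ]
}

The contradiction resolves into one current fact instead of two conflicting transcript lines — exactly the coherence that full-context replay can’t give you.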

OpenClaw: the anti-pattern you can actually measure

Out of the box, OpenClaw can ship the entire MEMORY.md into system_prompt every turn, with no retrieval — and the file keeps growing.

That’s full-context thinking in a real toolchain: simple, transparent, explainable — right up until it starts eating you alive.

Same workload, measured in total input tokens: the PowerMem plugin lands at roughly 18% of the default. That’s the same ballpark as “stop reciting the encyclopedia before answering one question.”

The integration model is what you’d want if you designed it on purpose: retrieve before the session, inject only what’s relevant; extract after the session, persist durable facts — instead of mirroring the whole file into the prompt every time.
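Here’s the shape of that loop as a runnable toy. MemoryClient is a hypothetical in-process stand-in for the plugin’s calls to the PowerMem server, not its real client API:

class MemoryClient:
    """Stand-in for the plugin's server calls; not PowerMem's real client."""
    def __init__(self):
        self.facts: list[str] = []

    def search(self, query: str, top_k: int = 5) -> list[str]:
        # toy retrieval via keyword overlap; the real layer does hybrid search
        q = set(query.lower().split())
        return [f for f in self.facts if q & set(f.lower().split())][:top_k]

    def add(self, session: list[str]) -> None:
        # toy extraction: persist turns verbatim; the real layer distills,
        # dedupes, and conflict-updates before writing
        self.facts.extend(session)

def run_turn(client: MemoryClient, user_msg: str, session: list[str]) -> str:
    relevant = client.search(user_msg)                     # retrieve BEFORE the turn
    prompt = f"known facts: {relevant}\nuser: {user_msg}"  # what the LLM would see
    reply = f"(stub reply to {len(prompt)}-char prompt)"   # real code calls the model here
    session.append(user_msg)
    return reply

client, session = MemoryClient(), []
run_turn(client, "remind me what blocked the deploy", session)
client.add(session)                                        # extract AFTER the session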

Make it run: OpenClaw + PowerMem (copy-paste path)

Pick one path: use ClawHub (step 2) or the manual server + JSON (step 3) — you don’t need both.

1 — OpenClaw
https://openclaw.ai/

2 — One-click via ClawHub (recommended)
Install skill: https://clawhub.ai/Teingi/install-powermem-memory
Plugin name: memory-powermem.

3 — Manual path (when you want your own server)

Install and run PowerMem:

pip install powermem
# from a directory with a configured .env
powermem-server --host 0.0.0.0 --port 8000
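Before wiring up the plugin, you can sanity-check that the server is listening. The /dashboard/ path referenced later in this post should answer; expect a 200:

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8000/dashboard/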

Install the OpenClaw plugin:

openclaw plugins install memory-powermem

Point OpenClaw at your server — edit ~/.openclaw/openclaw.json and set the memory slot to this plugin, for example:

{
  "plugins": {
    "slots": { "memory": "memory-powermem" },
    "entries": {
      "memory-powermem": {
        "enabled": true,
        "config": {
          "baseUrl": "http://localhost:8000",
          "autoCapture": true,
          "autoRecall": true,
          "inferOnAdd": true
        }
      }
    }
  }
}

Restart the OpenClaw Gateway, then sanity-check:

openclaw ltm health

If that’s green, you’ve swapped “paste the whole scroll every turn” for retrieve-then-inject + capture-after.

Two layers that don’t negotiate: operation plane + cognitive plane

If you’re going to run memory in production, you usually need both:

  • pmem CLI — humans and agents share the same front door: scriptable, automatable, boring in the good way.

  • Dashboard — distributions, health, “what did we actually memorize?” so humans can govern instead of guessing.

Two-layer design: CLI (operation plane) + dashboard (cognitive plane)

Agents need low-friction execution. Humans need explainability and ops judgment. Skip the operation plane and memory never ships. Skip the cognitive plane and you end up with vectors in a black box — and the only fix left is nuking it from orbit.

Quick local try:

pip install powermem
# or: uv add powermem
pmem --version

With powermem-server running, open the dashboard at http://localhost:8000/dashboard/.

What the open-source argument should actually be about

If you’re building agents, CLIs, or personal automation, move the debate past “do we keep a MEMORY.md?”

  • Is memory a document or a database?

  • Is injection append-only or retrieve-then-inject?

  • Do you have real decay, or do you pretend “never expires” means “always correct”?

Projects like PowerMem are less a billboard and more a reproducible lab bench — hybrid retrieval, extraction, decay, multi-agent, multimodal — turning the context signal-to-noise trade-off into engineering you can argue about in issues instead of vibes.

If you take one line home, make it this:

Long-term memory isn’t about remembering more. It’s about recalling the right thing when it matters.
