Most AI agents are amnesiac. Every new conversation, every new task, they start from zero. You feed them context through a RAG pipeline, stuff their window with documents, and hope retrieval does the heavy lifting. It works, kind of. Until the agent forgets what it learned two sessions ago, or confidently repeats a mistake it already made last week, or cannot distinguish between a fact that was true once and a fact that is still true now.
I kept running into this wall. And eventually I stopped patching around it and started thinking about the actual problem.
The result is Membrane: an open-source, typed, revisable memory substrate for long-lived LLM agents.
The Problem with "Memory" in Agentic Systems Today
There are basically two paradigms for agent memory right now:
1. The context window. Your agent gets a fresh window every session. Everything it "knows" has to fit in those tokens. It is fast, it is simple, and it completely resets when the conversation ends. There is no continuity. There is no learning.
2. Append-only RAG. You dump everything into a vector store and retrieve it later. This is better. You get retrieval across sessions. But here is the problem: facts go stale, procedures drift, and you have no mechanism to revise what is in there safely. You can add more, but you cannot cleanly say "that old thing is wrong, use this instead." The old thing just sits there, waiting to be retrieved.
Neither of these is memory in any meaningful sense. They are storage. Real memory is selective. Real memory is revisable. Real memory decays when it stops being useful and gets reinforced when it keeps working.
That is what I wanted to build.
What Membrane Actually Does
Membrane is a long-lived daemon (or embeddable Go library) that sits between your orchestration layer and your model calls. It gives your agents five distinct memory types, each with its own schema and lifecycle:
- Episodic records capture raw experience: tool calls, observations, errors. They are immutable once ingested. Ground truth.
- Working memory tracks in-flight task state across a session.
- Semantic records are stable facts and preferences, the things the agent "knows" about the world or the user.
- Competence records are learned procedures with tracked success rates. Not just what happened, but how to solve a class of problem.
- Plan graphs are reusable solution structures with dependencies and checkpoints, like a solved map for recurring multi-step tasks.
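To make the typed-record idea concrete, here is a minimal Python sketch of the five memory types. The names and fields here are illustrative assumptions on my part, not Membrane's actual Go schemas:

```python
from dataclasses import dataclass, field
from enum import Enum
import time

class MemoryType(Enum):
    EPISODIC = "episodic"      # immutable raw experience
    WORKING = "working"        # in-flight task state
    SEMANTIC = "semantic"      # stable facts and preferences
    COMPETENCE = "competence"  # learned procedures + success rates
    PLAN = "plan"              # reusable solution graphs

@dataclass
class MemoryRecord:
    record_id: str
    mem_type: MemoryType
    payload: dict
    created_at: float = field(default_factory=time.time)
    salience: float = 0.5      # reinforced on use, decayed otherwise

# An episodic record: a raw tool-call observation, ground truth once ingested.
rec = MemoryRecord("ep-001", MemoryType.EPISODIC,
                   {"tool": "search", "result": "ok"})
```

The point of the typing is that each kind of record gets its own lifecycle: episodic records never mutate, while semantic and competence records can be revised.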
The flow looks like this: you ingest events and observations during execution, background consolidation promotes episodic traces into semantic facts and competence records, and retrieval pulls relevant memory back out filtered by trust level and salience rank.
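The consolidation step in that flow can be pictured as a promotion rule: observations that recur across episodic traces become candidate semantic facts. This is a toy threshold heuristic of my own, not Membrane's actual consolidation algorithm:

```python
from collections import Counter

def consolidate(episodes, min_support=3):
    """Promote observations seen repeatedly across episodic
    traces into candidate semantic facts (toy heuristic)."""
    counts = Counter(ep["observation"] for ep in episodes)
    return [{"fact": obs, "support": n}
            for obs, n in counts.items() if n >= min_support]

episodes = [{"observation": "user prefers JSON output"}] * 3 + \
           [{"observation": "API rate limit is 60/min"}]
facts = consolidate(episodes)
# only the thrice-seen observation is promoted to a semantic fact
```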
The Part That Makes It Different: Revision
This is the thing I am most proud of.
Every other memory system I looked at treated its store as append-only. Membrane does not. It has five explicit revision operations:
- Supersede replaces a record with a newer, corrected version.
- Fork creates a conditional variant of a record for a different context.
- Retract marks a record as no longer valid.
- Merge consolidates multiple records into one.
- Contest flags a record as disputed when conflicting evidence appears.
Every one of these operations produces a full audit trail. You always know what changed, when, and why. The revision chain is part of the record's identity.
This means an agent can actually learn and correct itself over time, without the old bad information silently contaminating future retrievals.
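Here is what supersede-with-audit-trail might look like in miniature. Again, a sketch under my own assumptions about field names, not the real implementation: the old record is tombstoned with a pointer to its replacement, the reason is logged, and retrieval follows the chain to the current version:

```python
from dataclasses import dataclass, field
from typing import Optional
import time

@dataclass
class Record:
    record_id: str
    payload: dict
    superseded_by: Optional[str] = None
    revision_log: list = field(default_factory=list)

def supersede(store, old_id, new_id, new_payload, reason):
    """Replace a record with a corrected version, keeping the
    old one tombstoned and the revision chain auditable."""
    old = store[old_id]
    old.superseded_by = new_id
    old.revision_log.append({"op": "supersede", "to": new_id,
                             "reason": reason, "at": time.time()})
    store[new_id] = Record(new_id, new_payload)
    return store[new_id]

def retrieve(store, record_id):
    """Follow the supersede chain to the current version,
    so stale facts are never served."""
    rec = store[record_id]
    while rec.superseded_by:
        rec = store[rec.superseded_by]
    return rec

store = {"f1": Record("f1", {"fact": "API v1 is current"})}
supersede(store, "f1", "f2", {"fact": "API v2 is current"},
          reason="v1 deprecated")
current = retrieve(store, "f1")  # resolves through f1 to f2
```

The key property is that the correction is structural, not additive: asking for the old record yields the new one, while the audit trail still records exactly what was replaced and why.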
Trust-Aware Retrieval
One thing I did not expect to care this much about when I started was access control. But when you have agents operating in real environments with real sensitive data, you need it.
Membrane has five sensitivity levels for records: public, low, medium, high, and hyper. Retrieval is trust-gated: when you request memory, you pass a trust context that specifies the maximum sensitivity level you are authorized for. Records above your threshold come back redacted, metadata only, no payload.
This makes it possible to run the same agent across different security contexts without a separate memory store for each one.
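The gating behavior described above can be sketched as a filter over the five sensitivity levels: records at or below the caller's trust ceiling come back whole, records above it come back as metadata-only stubs. The shape of the records here is my own illustration, not Membrane's wire format:

```python
SENSITIVITY = {"public": 0, "low": 1, "medium": 2, "high": 3, "hyper": 4}

def trust_gated(records, max_level):
    """Return records at or below the caller's trust level;
    redact payloads (metadata only) for anything above it."""
    ceiling = SENSITIVITY[max_level]
    out = []
    for rec in records:
        if SENSITIVITY[rec["sensitivity"]] <= ceiling:
            out.append(rec)
        else:
            out.append({"id": rec["id"],
                        "sensitivity": rec["sensitivity"],
                        "payload": None, "redacted": True})
    return out

records = [
    {"id": "r1", "sensitivity": "public", "payload": "greeting style"},
    {"id": "r2", "sensitivity": "high", "payload": "deploy credentials hint"},
]
visible = trust_gated(records, max_level="medium")
# r1 passes through intact; r2 comes back redacted, payload stripped
```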
The Numbers
The eval suite is thorough. The current results on a vector-aware end-to-end run:
- Recall@k: 1.000
- Precision@k: 0.267
- MRR@k: 0.956
- NDCG@k: 0.955
These are not hand-tuned benchmarks. They are regression guards. The suite covers memory type handling, revision semantics, decay curves, trust-gated retrieval, competence learning, plan graph operations, episodic consolidation, observability metrics, system invariants, and gRPC endpoint coverage. If something breaks, these catch it.
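For readers less familiar with these metrics, the standard definitions are worth having in front of you. Note that perfect recall alongside modest precision is the expected shape when k is fixed larger than the number of relevant records per query:

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant records found in the top k."""
    return len(set(retrieved[:k]) & relevant) / len(relevant)

def precision_at_k(retrieved, relevant, k):
    """Fraction of the top k that are relevant."""
    return len(set(retrieved[:k]) & relevant) / k

def mrr(retrieved, relevant):
    """Reciprocal rank of the first relevant hit."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["a", "x", "b", "y", "z"]
relevant = {"a", "b"}
# recall@5 = 2/2 = 1.0, precision@5 = 2/5 = 0.4, MRR = 1.0
```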
The Observability You Actually Want
Membrane exposes a GetMetrics endpoint that gives you a point-in-time snapshot of things like retrieval usefulness (how often retrieved records actually get reinforced), competence success rate, plan reuse frequency, and revision rate. You can actually watch your agent get better over time, rather than guessing.
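The retrieval-usefulness metric described above reduces to a simple ratio: of the records that were retrieved, how many were later reinforced. An illustrative computation, not the daemon's internal code:

```python
def retrieval_usefulness(events):
    """Fraction of retrieved records that were later reinforced,
    i.e. retrievals that actually helped."""
    retrieved = {e["record_id"] for e in events if e["kind"] == "retrieved"}
    reinforced = {e["record_id"] for e in events if e["kind"] == "reinforced"}
    if not retrieved:
        return 0.0
    return len(retrieved & reinforced) / len(retrieved)

events = [
    {"kind": "retrieved", "record_id": "r1"},
    {"kind": "retrieved", "record_id": "r2"},
    {"kind": "reinforced", "record_id": "r1"},
]
# one of two retrieved records was reinforced -> 0.5
```

A number like this trending upward is the signal that consolidation and revision are doing their job: retrievals are increasingly things the agent actually uses.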
Who It Is For
If you are building:
- Long-lived agents that need continuity across sessions
- Systems where incorrect facts need to be corrected, not just buried
- Multi-agent setups where trust and access control matter
- Anything where you want the agent to genuinely improve its procedures, not just retrieve more text
...then Membrane is worth a look.
It ships with a gRPC API, a TypeScript client SDK (@gustycube/membrane), and a Python client, and it can run either as a standalone daemon or embedded as a Go library. Encryption at rest (SQLCipher), optional TLS, bearer-token auth, and rate limiting are all there out of the box. This is not a toy.
Try It
The repo is at github.com/GustyCube/membrane. The README covers the full API, architecture, configuration, and eval setup. There is also full documentation available via VitePress in the docs/ directory.
If you are building anything in the agent memory space, I would genuinely love to hear how you think about these problems. Open an issue, open a PR, or just star the repo if you want to follow along.
This is still early. The paradigm is still being figured out across the whole field. But I think selective, revisable, typed memory is the right direction, and Membrane is my bet on that.