DEV Community: gregor

Do You Still Need an Agent Memory Layer if ChatGPT Already Has Memory?

gregor — Wed, 22 Jul 2026 10:11:28 +0000

Yes — for a reason that becomes clear once you separate two different jobs. ChatGPT memory, Claude Projects, and Gemini's built-in memory solve a conversational recall problem: they remember what you told the model inside its own interface. An agent memory layer (Mem0, Zep, Letta, PLUR) solves a programmatic recall problem: it stores facts inside your application, accessible from your code, portable across every tool your agent uses. These systems serve different jobs and work best together.

What model-native memory does (and where it stops)

As of 2026, the major model providers ship built-in memory in two forms.

ChatGPT has "saved memories" — an explicit, user-editable list of facts that ChatGPT chose to remember across conversations — and "reference chat history" (launched April 2025), which implicitly draws on patterns from past chats. Saved memories are auditable: you can open Settings → Personalization → Manage memory and see exactly what is stored. Reference chat history is not auditable — you cannot see what it has inferred. ChatGPT Projects adds a Project Memory scope that captures facts within a specific project workspace, isolated from other projects and from main chat.

Claude Projects maintains a project-level instruction set and uploaded file context, scoped per project, persisting across all conversations in that project. Unlike ChatGPT's saved memories, Claude Projects does not automatically extract facts from conversations — knowledge is added explicitly via the instruction set or uploaded files.

Gemini offers workspace-scoped memory through Gemini Apps Activity, manageable through your Google Account privacy dashboard.

The common constraint: all of these live on the model provider's servers, scoped to the model's own interface. They do not follow your agents into code.

When your Claude Code session ends, Claude's project memory is not injected into your next Cursor session. When ChatGPT remembers you prefer Python, your custom CLI agent does not know that. Model-native memory sits in a separate silo for each provider and each interface.

Source: ChatGPT Memory Guide 2026

What an agent memory layer adds

An agent memory layer runs in your application, not the model provider's server. It gives you four things model-native memory cannot:

Portability. The memory store is a resource your code owns. If you switch from Claude to GPT-4o, or from Cursor to a custom CLI, the memory follows automatically — no migration, no re-learning.

Programmatic read/write. Your code can store a fact at runtime (plur_learn, mem0.add), retrieve it on the next run (plur_recall, mem0.search), and update or delete it via API. Model-native memory cannot be written to by your code — only the model and the user can modify it.

Auditability. With file-based memory systems like PLUR, you can open the store with a text editor, grep for specific facts, and see exactly what the agent knows and when it learned each item. With vector-based stores like Mem0, you can query the API. With model-native memory, you are limited to what the provider's UI exposes.

Deletion control. You can implement right-to-be-forgotten policies, purge all facts about a specific user, or wipe session state programmatically. Model-native memory deletion goes through the provider's interface, not your code.
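
The read/write and deletion points above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical JSON-file store: `LocalMemoryStore` and its `add`, `search`, and `purge_user` methods are invented for this example and are not the API of Mem0, PLUR, or any other library.

```python
import json
import tempfile
from pathlib import Path


class LocalMemoryStore:
    """Hypothetical file-backed memory store; not the API of any real library."""

    def __init__(self, path: Path) -> None:
        self.path = path
        if not self.path.exists():
            self.path.write_text("[]", encoding="utf-8")

    def _load(self) -> list[dict]:
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, facts: list[dict]) -> None:
        self.path.write_text(json.dumps(facts, indent=2), encoding="utf-8")

    def add(self, user_id: str, text: str) -> None:
        """Store one fact for a user."""
        facts = self._load()
        facts.append({"user_id": user_id, "text": text})
        self._save(facts)

    def search(self, user_id: str, keyword: str) -> list[str]:
        """Return this user's facts that contain the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [f["text"] for f in self._load()
                if f["user_id"] == user_id and needle in f["text"].lower()]

    def purge_user(self, user_id: str) -> int:
        """Delete every fact about a user and report how many were removed."""
        facts = self._load()
        kept = [f for f in facts if f["user_id"] != user_id]
        self._save(kept)
        return len(facts) - len(kept)


# The store lives in a temp directory so the example leaves no trace.
store = LocalMemoryStore(Path(tempfile.mkdtemp()) / "memory.json")
store.add("alice", "Prefers Python over JavaScript")
store.add("alice", "Deploys on Fridays")
store.add("bob", "Uses Vitest")
python_facts = store.search("alice", "python")
removed = store.purge_user("alice")
```

The point of the sketch is ownership: because the file belongs to your application, a right-to-be-forgotten request becomes one function call in your own code rather than a trip through a provider's settings page.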

Source: How to Build AI Agent Memory in 2026, AI Agent Memory Frameworks in 2026

The complement frame: different jobs, use both

Model-native memory is useful for: "Remember I prefer Python over JavaScript" — a conversational preference stored in the model's own interface, useful when you are talking directly to ChatGPT or Claude.

Agent memory is useful for: "Remember the architecture decision we made last sprint, the user's confirmed preferences across three tools, and the intermediate results from yesterday's research loop" — operational knowledge your code needs to function, across tools, overnight, in pipelines.

The right setup for most production agent deployments: use model-native memory for its intended purpose (conversational recall in the model's own interface), and add an agent memory layer for programmatic recall in your code.

PLUR as the cross-tool layer

PLUR stores memories as open-format engrams in a local directory (~/.plur/). Any MCP-compatible tool — Claude Code, Cursor, Hermes, OpenClaw, or a custom CLI — reads and writes the same directory without additional configuration. When you switch between tools, the memory follows automatically.

# Install once — works across all your MCP-compatible tools
npx @plur-ai/mcp init

From any session, plur_recall queries the store; plur_learn adds to it. The same ~/.plur/ directory is shared. This is the portability that model-native memory cannot provide: one memory store, every tool.

Summary table

	Model-native memory	Agent memory layer (e.g. PLUR)
Where it lives	Model provider's server	Your application / local directory
Scope	That model's interface only	Any tool that speaks MCP or the API
Set by	User + model (conversational)	Your code (programmatic)
Read by your code	No	Yes
Cross-tool portable	No	Yes
Auditable in full	Partially (saved memories only)	Yes
Deletion via your code	No	Yes
Works without a chat interface	No	Yes

FAQ

Do I still need an agent memory layer if ChatGPT already has memory?
Yes, if you are building agents in code. ChatGPT memory is scoped to the ChatGPT interface — it does not follow your agents into Cursor, Claude Code, a custom CLI, or overnight pipelines. An agent memory layer stores facts in your application, accessible from any tool that queries it.

What is the difference between model memory and agent memory?
Model memory (ChatGPT saved memories, Claude Projects, Gemini) is conversational memory stored on the model provider's server, accessible only through that model's interface. Agent memory is operational memory stored in your application, accessible by your code, portable across tools. The same agent can use both: model memory for interface-level recall, agent memory for cross-session and cross-tool recall.

How does Claude Projects memory compare to agent memory?
Claude Projects keeps a project-level instruction set and uploaded files that persist across every conversation in that project. It does not automatically extract facts from conversations; knowledge is added explicitly. It is scoped to Anthropic's interface and cannot be read or written by external code. Agent memory (Mem0, PLUR, Zep) is stored in your application and is accessible from any code that calls the memory API, including code running outside any Claude interface.

Can I use agent memory alongside a model's built-in memory?
Yes, and this is the recommended approach. Use model-native memory for conversational recall within that model's interface. Use agent memory for facts your code needs to access, persist across tools, or control programmatically. The two stores do not conflict — they serve different retrieval paths.

ChatGPT vs Mem0 vs PLUR — what should I use for AI memory?
These are not direct alternatives. ChatGPT memory serves the ChatGPT web interface. Mem0 and PLUR are agent memory layers for use in code. If you are building agents: Mem0 for general personalization, PLUR if you need cross-tool portability and open-format storage. Use ChatGPT memory in addition to — not instead of — a dedicated agent memory layer.

Sources

ChatGPT Memory: Complete Guide for 2026 — gptprompts.ai
AI Agent Memory Frameworks in 2026: Memory vs. Context — Graphlit Blog
How to Build AI Agent Memory in 2026 — Fountain City
AI Memory Problem 2026: Risks in ChatGPT, Claude, Gemini — MayhemCode
PLUR open-format engram memory — GitHub

Best Tools for Giving AI Agents Long-Term Memory (2026)

gregor — Tue, 21 Jul 2026 11:21:03 +0000

AI agents lose everything when a session ends. If your agent is rebuilding task state from scratch on every run, re-explaining user preferences to each tool, or contradicting decisions from last week, you need a dedicated memory layer — not a larger context window. The leading options in 2026 are Mem0, Zep, Letta, LangMem, and PLUR. Each targets a different retrieval pattern: Mem0 for general personalization, Zep for temporal reasoning, Letta for autonomous agents that manage their own memory, LangMem for LangGraph-native projects, and PLUR for cross-tool portability via the open engram format.

Why context windows are not enough

Every agent framework gives your agent a context window. When the session closes, that window clears. For agents that interact with the same user across days or weeks — or for multi-agent pipelines where one agent passes state to another — you need a mechanism that extracts the facts that matter, stores them durably, and retrieves them later with high precision.

That is what a dedicated memory layer does. It sits between your agent and its long-term store, handling extraction, indexing, retrieval scoring, and (where supported) forgetting.

The leading tools

Mem0

Mem0 is the most widely adopted agent memory platform in 2026, with 48,000+ GitHub stars. It combines vector search, a knowledge graph, and key-value storage, with automatic memory extraction built in. The architecture handles the most common case: a user-facing application that needs to remember preferences, past interactions, and learned facts without the developer writing extraction logic by hand.

On the LongMemEval benchmark — the current standard stress test for agent memory — Mem0 scores 49.0% with GPT-4o. That is a solid general-purpose result. Mem0 offers both a hosted API and a self-hosted path via the open-source repo.

Best for: teams starting out, personalization-heavy applications, the largest community and ecosystem.

Zep / Graphiti

Zep's Graphiti backend timestamps every fact in a knowledge graph, making it the strongest option when temporal relationships matter — e.g., "what did the user want last Tuesday versus today?" or "which goal is still active after three sessions?" On LongMemEval with GPT-4o, Zep scores 63.8%, currently the strongest reported result among managed services. The graph approach also makes it easier to reason over relationships between entities, not just over isolated facts.
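
To make the "last Tuesday versus today" idea concrete, here is a small sketch of time-stamped facts with an "as of" lookup. It is an illustration of the general technique only: `TimedFact` and `value_as_of` are hypothetical names, and nothing here reflects Graphiti's actual data model or API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class TimedFact:
    subject: str
    attribute: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still true"


def value_as_of(facts, subject, attribute, when):
    """Return the value that was valid on `when`, or None if nothing matches."""
    for fact in facts:
        started = fact.valid_from <= when
        not_ended = fact.valid_to is None or when < fact.valid_to
        if fact.subject == subject and fact.attribute == attribute and started and not_ended:
            return fact.value
    return None


facts = [
    TimedFact("user", "goal", "ship the beta", date(2026, 7, 1), date(2026, 7, 15)),
    TimedFact("user", "goal", "fix onboarding", date(2026, 7, 15)),
]
earlier = value_as_of(facts, "user", "goal", date(2026, 7, 7))
current = value_as_of(facts, "user", "goal", date(2026, 7, 21))
```

Keeping the superseded fact with an end date, instead of overwriting it, is what lets the agent answer both "what is the goal now?" and "what was it two weeks ago?".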

Graphiti is the open-source knowledge graph engine underneath Zep. You can run Graphiti self-hosted if you do not want a managed service.

Best for: agents that need to reason over how facts change over time, temporal retrieval, knowledge graph use cases.

Letta (formerly MemGPT)

Letta treats agent memory like an operating system: main context is RAM, archival memory is disk, and the agent itself decides what to page in and out. This is a different philosophy from tools that abstract memory away from the agent — Letta gives the agent more autonomy over its own memory allocation, at the cost of more complex setup. Long-running agents that need to manage large, evolving knowledge bases tend to benefit most.
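
The RAM-and-disk analogy can be sketched as a toy two-tier store. This is not Letta's API: `PagedMemory` is a hypothetical class, and the oldest-first eviction and keyword lookup are simple stand-ins for decisions that, in the design described above, the agent itself would make.

```python
from collections import deque


class PagedMemory:
    """Toy two-tier memory: a bounded main context plus an unbounded archive."""

    def __init__(self, main_capacity: int) -> None:
        self.main = deque()   # plays the role of RAM (what the model sees)
        self.archive = []     # plays the role of disk
        self.main_capacity = main_capacity

    def remember(self, item: str) -> None:
        """Add to main context, paging the oldest item out when it is full."""
        self.main.append(item)
        while len(self.main) > self.main_capacity:
            self.archive.append(self.main.popleft())

    def page_in(self, keyword: str) -> list[str]:
        """Move archived items that mention the keyword back into main context."""
        hits = [item for item in self.archive if keyword.lower() in item.lower()]
        self.archive = [item for item in self.archive if item not in hits]
        for item in hits:
            self.remember(item)
        return hits


memory = PagedMemory(main_capacity=2)
for note in ["API key rotates monthly", "User prefers tabs", "Sprint ends Friday"]:
    memory.remember(note)

# The first note was paged out to the archive; bring it back on demand.
recalled = memory.page_in("api key")
```

Paging one item in pushes another out, which is the trade-off the OS analogy is meant to capture: main context stays small, and nothing is lost, only moved.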

Best for: autonomous agents with large knowledge bases, OS-style memory management, full self-hosting.

LangMem

LangMem is the memory module from the LangChain team, designed to integrate tightly with LangGraph. If your agent is already on LangGraph, LangMem is the lowest-friction way to add persistent memory — it wires into the LangGraph state machine without a separate service.

Best for: LangGraph-native projects, teams already in the LangChain ecosystem.

PLUR

PLUR stores memories as open-format engrams: structured assertions with confidence scores, domain tags, and decay curves. Each engram is a plain-text YAML entry in a local directory (~/.plur/). Any tool that speaks MCP — Claude Code, Cursor, Copilot, Hermes, or a custom CLI — can read and write the same store without configuration changes per tool.
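
As a rough picture of an assertion that carries a confidence score and decays over time, consider the sketch below. Everything in it is an assumption made for illustration: the `Engram` class is not PLUR's schema, and the exponential half-life curve is one plausible decay shape, not necessarily the one PLUR uses.

```python
from dataclasses import dataclass, field


@dataclass
class Engram:
    """Illustrative shape only; field names follow the prose, not a verified schema."""
    statement: str
    confidence: float
    domain_tags: list[str] = field(default_factory=list)


def decayed_confidence(confidence: float, age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: confidence halves every `half_life_days` (assumed curve)."""
    return confidence * 0.5 ** (age_days / half_life_days)


engram = Engram("Project uses Vitest for unit tests", confidence=0.8, domain_tags=["testing"])
fresh = decayed_confidence(engram.confidence, age_days=0)
month_old = decayed_confidence(engram.confidence, age_days=30)
```

With a 30-day half-life, a memory stored at 0.8 confidence drops to 0.4 after a month unless something reinforces it, which gives retrieval a principled reason to prefer recent facts over stale ones.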

The open engram format is PLUR's primary differentiator. When your memory store is a local file directory rather than a proprietary API, you get portability (move between tools without migration), inspectability (open any file with a text editor or grep), and provable deletion (a git diff shows what was removed and when). PLUR includes hybrid BM25 + embedding retrieval with Reciprocal Rank Fusion, confidence decay for stale memories, and pack-based memory sharing.
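
Reciprocal Rank Fusion itself is a small formula: each document scores the sum of 1 / (k + rank) over every ranked list it appears in. The sketch below shows the general technique with the commonly used constant k = 60. It is not PLUR's implementation, and the engram ids are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank)), rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, highest first; ties fall back to the document id.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["e3", "e1", "e2"]        # keyword match order (made-up ids)
embedding_ranking = ["e1", "e4", "e3"]   # semantic match order (made-up ids)
fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
```

Here `e1` comes out on top because it ranks well in both lists, ahead of `e3`, which is first for keywords but only third semantically. RRF needs only ranks, not raw scores, so BM25 and embedding similarity never have to be put on a common scale.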

Best for: cross-tool portability, MCP ecosystem, open-format auditable memory, teams that need to share memory across multiple agents or tools.

Comparison table

Tool	Architecture	Retrieval	Self-hostable	Open format	Cross-tool
Mem0	Vector + graph + KV	Semantic + graph	Yes	No	Via API
Zep / Graphiti	Temporal knowledge graph	Graph + temporal	Yes (Graphiti)	Graphiti is open source	Via API
Letta	Agent-managed memory blocks	Agent-controlled paging	Yes	No	Letta agents only
LangMem	LangGraph state	LangGraph-native	Yes	No	LangGraph only
PLUR	Open-format YAML engrams + hybrid search	BM25 + embeddings (RRF)	Yes (local default)	Yes	Any MCP-compatible tool

How to choose

Start with Mem0 if you need a general-purpose solution with the largest community and you're not constrained by format, hosting, or cross-tool requirements.

Choose Zep if your agent needs to reason about when facts were true — temporal retrieval is Zep's core strength.

Choose Letta if you're building long-running autonomous agents that should manage their own context allocation.

Choose LangMem if you're already on LangGraph and want the lowest-friction integration.

Choose PLUR if your agents run across multiple tools (Claude Code + Cursor + Hermes, for example), if you need memory to be inspectable and provably deletable, or if open-standard interoperability matters to your deployment.

Most production systems pair a dedicated memory platform with a vector store (Pinecone, Weaviate, or pgvector) for retrieval at scale — the memory tool handles extraction and scoring, the vector store handles indexing.

Quick start: PLUR

# Install the MCP server
npx @plur-ai/mcp init

# Add to Claude Code, Cursor, or any MCP-compatible tool
# Memory is stored in ~/.plur/ — readable, portable, diffable

From any MCP session, your agent can read and write engrams with plur_recall_hybrid / plur_learn (plur_recall_hybrid is the recommended default — BM25 + embeddings merged via RRF). The same store is shared across all connected tools automatically.

FAQ

What is the best open-source memory layer for LLM agents?
In 2026, Mem0 has the largest open-source community (48,000+ GitHub stars) and is the most commonly recommended starting point. Zep's Graphiti engine is the strongest option for temporal reasoning. PLUR is the best choice if you need memory to be portable across tools and stored in an open, auditable format.

How do I stop an AI agent from forgetting context between sessions?
Add a dedicated memory layer. The pattern: on session end, have the agent write key facts to the memory store; on session start, query the store for relevant context and inject it into the system prompt. All five tools above support this pattern with different levels of automation — Mem0 and PLUR offer the most out-of-the-box extraction.
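
The write-on-end, recall-on-start pattern can be sketched as follows. This is a minimal illustration under stated assumptions: a plain Python list stands in for the memory layer, `recall` uses naive word overlap in place of real retrieval, and both function names are hypothetical.

```python
def build_system_prompt(base_prompt: str, memories: list[str]) -> str:
    """Session start: inject recalled facts into the system prompt."""
    if not memories:
        return base_prompt
    bullet_list = "\n".join(f"- {fact}" for fact in memories)
    return f"{base_prompt}\n\nKnown facts from earlier sessions:\n{bullet_list}"


def recall(store: list[str], query: str) -> list[str]:
    """Naive retrieval stand-in: keep facts sharing at least one word with the query."""
    query_words = set(query.lower().split())
    return [fact for fact in store if query_words & set(fact.lower().split())]


memory_store: list[str] = []   # stand-in for a real memory layer

# Session 1 ends: the agent writes the facts worth keeping.
memory_store.extend(["user prefers pytest", "deploy target is staging"])

# Session 2 starts: recall what is relevant to the new task and inject it.
prompt = build_system_prompt("You are a coding assistant.", recall(memory_store, "pytest setup"))
```

Only the matching fact reaches the prompt. Injecting everything would work for a handful of memories but defeats the purpose once the store grows, which is why retrieval quality matters more than storage.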

What is the difference between an agent memory layer and a vector store?
A vector store (Pinecone, pgvector, Weaviate) handles indexing and similarity search. An agent memory layer handles the higher-level workflow: deciding what to remember, extracting structured facts from conversations, scoring relevance, decaying stale memories, and handling deletion. Most production setups use both.

Do agent memory tools work with any LLM?
Yes. Memory layers are model-agnostic — they store and retrieve text-based facts, which any LLM can consume. The LLM you use for extraction and retrieval scoring can differ from the LLM running your agent.

Sources

AI Agent Memory in 2026: Mem0 vs Zep vs Letta vs Cognee — A Practical Guide — DEV Community
Best AI Agent Memory Providers in 2026: Mem0 vs Zep vs Letta vs Cloudflare — Developers Digest
Best AI Agent Memory Systems in 2026: 8 Frameworks Compared — Vectorize
Best AI Agent Memory Frameworks in 2026: Compared and Ranked — Atlan
PLUR open-format engram memory — GitHub

How Do I Make My AI Agent's Memory Editable and Auditable?

gregor — Mon, 20 Jul 2026 11:52:22 +0000

You ask your AI agent what it knows about you. It cannot tell you — not in a format you can read, not in a way you can verify, not in a way you can correct. The agent has been accumulating facts from every conversation: your preferences, your coding habits, your project decisions, maybe your health concerns or financial details. But the memory is a black box — stored in a vector embedding or an opaque agent state block, accessible only through the tool's API, if at all. You cannot open it in a text editor. You cannot diff it against last week. You cannot delete a single fact and prove it is gone. For a developer, this is a debugging nightmare. For a user, it is a privacy problem. For an enterprise, it is a compliance liability.

The fix is memory that is editable and auditable by design — stored in a format you can read, inspect, correct, and erase. Not all agent memory systems offer this. The distinction is not between open-source and proprietary; it is between open format (human-readable, inspectable) and closed format (opaque, API-only access).

The problem: black-box memory

Most agent memory systems store what the agent learns in one of three opaque formats:

Vector embeddings — Text is converted to a high-dimensional vector and stored in a vector database. You cannot read the vector. You can query it ("find memories similar to X") but you cannot open it and see what the agent knows. Mem0, LangChain memory modules, and many RAG-based systems work this way.
Agent state blocks — The agent's memory is stored as structured state in a database, managed through the agent's tool API. Letta (formerly MemGPT) stores core memories and archival memories this way. You can query it through the API, but you cannot open a file and read what the agent knows.
Knowledge graphs — Facts are stored as entity-relationship triples in a graph database. Zep and Graphiti use this approach. More structured than vectors, but still requires graph queries to inspect — not something you can diff in git.

In all three cases, memory is reachable only through the tool's API. You can add, search, update, and delete memories through that API, but you cannot:

Open a memory file and read what it says
Correct a single fact without an API call
Diff the memory store against a previous version
Prove that a specific memory was deleted (you can delete it through the API, but you cannot prove erasure — the vector or graph node may persist in a backup)

This is not a limitation of the tools; it is a consequence of the storage format. If the format is opaque, the memory is opaque.

Why this matters

Debugging

When an agent makes a wrong decision, you need to know what memory it was operating on. Was it a stale fact? A misremembered preference? A correction that was not captured? If the memory is a vector embedding, you cannot inspect it — you can only re-query and hope the retrieval surfaces the same memory. If the memory is a human-readable YAML file, you can open it, read it, find the error, and fix it.

Privacy and compliance

The EU AI Act (Regulation 2024/1689, entered into force August 2024) requires transparency for high-risk AI systems — including the ability to understand and trace the system's outputs. GDPR Article 17 establishes the right to erasure: a data subject can request deletion of their personal data. If your agent's memory is a vector embedding, how do you prove erasure? You can delete the vector, but the text it was derived from may exist in backups, logs, or the model's training data. If the memory is a file, you can delete the file — and prove it with a git diff.

A 2024 survey of LLM-based agent memory mechanisms (Zhang et al., "A Survey on the Memory Mechanism of Large Language Model based Agents," arXiv:2404.13501) noted that memory transparency and control are emerging concerns: as agents accumulate personal data from interactions, the ability to inspect, correct, and delete that data becomes a requirement, not a nice-to-have.

Trust

A user who cannot see what the agent knows cannot trust it. A developer who cannot inspect the memory store cannot debug it. An enterprise that cannot prove erasure cannot deploy it in regulated environments. Transparency is not a feature; it is a prerequisite for adoption at scale.

What editable and auditable memory looks like

An editable, auditable memory system has five properties:

1. Human-readable format

Each memory is stored in a format you can read without an API. Plain text, YAML, JSON — something a developer can open in a text editor. The engram format (PLUR, github.com/plur-ai/plur) stores each memory as a YAML entry with an id, statement, type, domain, scope, confidence, provenance, and timestamps. You can open the file and read what the agent knows.

2. Inspectable

You can list all memories, search them, and view any individual memory. Not through a vector similarity query — through a direct read. You can ask "what does the agent know about my coding preferences?" and get a list of specific, readable entries, not a cosine similarity score.

3. Correctable

You can edit a memory in place. If the agent learned that you use Jest but you actually use Vitest, you can open the YAML file, change the statement, and save it. No retraining, no re-embedding, no API call. The next time the agent recalls that memory, it reads the corrected version.

4. Deletable with proof

You can delete a single memory and prove it is gone. If the memory is a file, you delete the file and commit the deletion to git — the diff is your proof. If someone asks "did you delete the memory about X?", you can show the commit. This is not possible with vector embeddings, where deletion leaves no auditable trace.

5. Version-controllable

Because each memory is a file, you can put the entire memory store under version control. You can see what the agent knew last week vs today. You can roll back to a previous state. You can branch and experiment. This is impossible with a vector database or an agent state block.
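
Properties 4 and 5 both come down to a line-level diff. The sketch below uses Python's standard `difflib` as a stand-in for `git diff`, on a made-up three-line memory file. The file name and statements are hypothetical.

```python
import difflib

before = [
    "statement: User prefers Vitest\n",
    "statement: User lives in Ljubljana\n",
    "statement: Project targets Node 22\n",
]
# Delete one memory by dropping its line.
after = [line for line in before if "Ljubljana" not in line]

diff = list(difflib.unified_diff(
    before, after,
    fromfile="memory.yaml (before)",
    tofile="memory.yaml (after)",
))
# Removed lines start with a single "-"; the "---" line is the file header.
removed_lines = [line for line in diff if line.startswith("-") and not line.startswith("---")]
```

The diff names exactly one removed line and leaves the other two untouched, which is the kind of record a reviewer can read without any tooling. Note that a diff contains the removed text, so where that diff is kept is itself part of the retention question.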

How existing tools handle inspection and deletion

Tool	Format	Inspect?	Edit?	Delete?	Prove erasure?
Mem0	Vector store	Via API search	Via API update	Via API delete	No (vector may persist in backups)
Letta	Agent state blocks	Via API / visualizer	Via API	Via API	No (state blocks in database)
Zep / Graphiti	Temporal knowledge graph	Via graph query	Via graph update	Via graph delete	Partial (graph node removal, but temporal history persists)
Cognee	Graph + vector + relational	Via API	Via API	Via API	No
ChatGPT memory	Proprietary	Via settings UI	Via settings UI	Via settings UI	No (opaque, no audit log)
Claude Code (CLAUDE.md)	Markdown file	Yes (read file)	Yes (edit file)	Yes (delete line)	Yes (git diff)
PLUR	YAML files (open format)	Yes (read file)	Yes (edit file)	Yes (delete entry)	Yes (git diff)

The pattern is clear: file-based memory is auditable; database-based memory is not. CLAUDE.md — Claude Code's built-in persistent instruction file — is the simplest form of editable, auditable memory: a markdown file you write, read, and edit. It is limited to static instructions, not accumulated knowledge. PLUR extends this principle to dynamic, agent-learned memories: each engram is a YAML entry with full provenance, stored in plain local files, editable in any text editor, and version-controllable in git.

The right-to-be-forgotten problem

When a user asks you to delete what your AI agent knows about them, you need to do three things:

Find all memories related to that user. If memories are files, you can grep them. If they are vector embeddings, you need to query by similarity and hope you got them all.
Delete those memories. If memories are files, you delete the files. If they are vectors, you delete the vectors — but the source text may exist in logs, backups, or training data.
Prove erasure. If memories are files under version control, the git diff is your audit trail. If they are vectors, you have no proof — you can show the delete API call succeeded, but you cannot prove no copy persists.
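
For file-based memory, the find and delete steps above can be sketched in a few lines. The layout is an assumption made for the example: one memory per `.yaml` file in a single directory, with hypothetical file names and contents. It is not a description of how any particular tool lays out its store.

```python
import tempfile
from pathlib import Path


def find_memories(store_dir: Path, needle: str) -> list[Path]:
    """Step 1: grep-style scan for files that mention the user."""
    return sorted(p for p in store_dir.glob("*.yaml")
                  if needle.lower() in p.read_text(encoding="utf-8").lower())


def erase_memories(store_dir: Path, needle: str) -> list[str]:
    """Step 2: delete the matches and return their names for the audit record."""
    removed = []
    for path in find_memories(store_dir, needle):
        path.unlink()
        removed.append(path.name)
    return removed


# Build a throwaway store so the example is self-contained.
store_dir = Path(tempfile.mkdtemp())
(store_dir / "001.yaml").write_text("statement: alice@example.com prefers dark mode\n", encoding="utf-8")
(store_dir / "002.yaml").write_text("statement: build uses pnpm\n", encoding="utf-8")
(store_dir / "003.yaml").write_text("statement: Alice@example.com is on the Pro plan\n", encoding="utf-8")

erased = erase_memories(store_dir, "alice@example.com")
remaining = sorted(p.name for p in store_dir.glob("*.yaml"))
```

Re-running `find_memories` afterwards returns an empty list, a simple check that the scan finds nothing left. A literal text match only catches memories that mention the identifier verbatim, so a real deployment would also need a reliable way to tag each memory with its subject.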

This is why format matters more than license. An Apache-2.0 memory engine that stores memories in opaque vectors is open-source but not open-format. You can read the engine's source code, but you cannot read your own memories. PLUR's engram format (Apache-2.0, plur.ai/spec.html) is both: the engine is open-source and the memory format is human-readable YAML. You can read the code AND read the memories.

The MCP dimension

The Model Context Protocol (specification 2025-11-25) — an open protocol from Anthropic — standardizes how LLM applications connect to external tools. An MCP memory server exposes tools like recall, learn, forget, and feedback that any MCP-compatible agent can call. This means the same memory server works across Claude Code, Hermes, OpenClaw, Cursor, and any other MCP-compatible runtime.

But MCP defines the transport, not the format. Two MCP memory servers can be fully protocol-compatible and store memory in completely different ways — one as a vector embedding, one as a YAML file. If auditability is your requirement, the format is the differentiator, not the protocol.
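
What "the transport" means in practice: an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a request by hand. The tool names `learn` and `recall` come from the text above; the argument keys (`statement`, `query`) are hypothetical, since each server defines its own input schema.

```python
import json


def tools_call_request(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request as used by MCP clients."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Argument keys are illustrative; a real server publishes its own schema.
learn_request = tools_call_request(1, "learn", {"statement": "User prefers Vitest"})
recall_request = tools_call_request(2, "recall", {"query": "test runner"})
decoded = json.loads(learn_request)
```

Nothing in the request says how the server stores the statement. A vector-backed server and a YAML-backed server would accept the same message, which is why the protocol alone tells you nothing about auditability.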

How to choose

If you need…	Start with
Zero-setup memory for one agent (Claude Code)	CLAUDE.md + auto memory
Structured memory with a simple API, no infra	Mem0 (cloud)
Self-managing stateful agent with memory tiers	Letta
Time-aware facts with sub-200ms retrieval	Zep
Memory you can inspect, correct, version-control, and prove erasure	PLUR (YAML engrams via MCP)

The underlying question is: what do you need to own? If you need to audit what your agent knows — for debugging, for compliance, for trust — you need memory in a format you can read, edit, and diff. That means files, not vectors. The license is secondary; the format is primary.

FAQ

Can I see what my AI agent remembers about me?
It depends on the tool. ChatGPT shows memories in settings. Claude Code has CLAUDE.md (readable) and auto memory (less transparent). Memory engines like Mem0, Letta, and Zep expose memories through APIs. PLUR stores memories as YAML files you can open in any text editor.

How do I delete a specific memory from my AI agent?
Each tool handles this differently. Mem0, Letta, and Zep offer delete APIs. ChatGPT lets you delete memories through the settings UI. PLUR lets you delete the YAML entry directly, and if the memory store is under git, the commit diff is your proof of erasure.

Can I edit what my AI agent remembers without retraining?
Yes, if the memory is stored externally in an editable format. Fine-tuning bakes facts into model weights you cannot read or edit. External memory, whether in a vector store, knowledge graph, or YAML file, can be updated without retraining. But only file-based memory (CLAUDE.md, PLUR engrams) lets you edit memories in a text editor without an API call.

Is agent memory subject to GDPR?
If the memory contains personal data about an identifiable individual, yes. GDPR Article 17 gives individuals the right to request erasure of their personal data. Agent memory systems that store personal data need a mechanism to find, delete, and prove erasure of that data. File-based memory makes this straightforward; vector-based memory makes it difficult to prove.

What is the difference between open-source and open-format memory?
Open-source means the engine's code is public (e.g., Apache-2.0). Open-format means the memory data itself is in a human-readable, standardized format (e.g., YAML). A project can be open-source but store memories in opaque vector embeddings: you can read the code but not your own memories. PLUR is both: the engine is Apache-2.0 and the memory format is human-readable YAML.

Are AI Agent Engrams Open Source or Proprietary?

gregor — Sun, 19 Jul 2026 21:22:44 +0000

The short answer: both, and the split matters. The major agent-memory engines — Mem0, Letta, Cognee, Graphiti, LangMem, and PLUR — are all Apache-2.0 or MIT licensed on GitHub. But "open source" and "open format" are not the same thing. A project can ship under Apache-2.0 while storing your memories in opaque vector blobs you cannot read, edit, or export. The real question is not whether the software is open — it usually is — but whether your memories are.

This distinction separates the field into three tiers: fully open (software + format + data you own), open-core (software is open, but the hosted memory is not portable), and fully proprietary (memory baked into a model provider's infrastructure, no export at all).

Why this question exists

AI agents face a brutal constraint: they forget. Every new session starts blank. Every context window overflows. The fix is persistent memory — but persistent where, and in whose format?

The term "engram" comes from neuroscience. Richard Semon coined it in 1904 for the physical trace a memory leaves in biological tissue (Semon, 1904; cited in Wikipedia, "Engram (neuropsychology)"). Applied to AI agents, an engram is one discrete thing an agent has learned — a correction, a preference, a procedure — stored so it survives across sessions.

The question "open source or proprietary?" asks two things at once:

Is the engine that stores and retrieves engrams open source?
Is the format those engrams are stored in open — readable, editable, portable?

Conflating the two is how vendors end up with open-source repos and locked-in data.

Tier 1: Fully open — software, format, and data

These projects ship under permissive licenses (Apache-2.0 or MIT) AND store memory in a format you can inspect, edit, and export.

Project	License	Stars (Jul 2026)	Memory format	Data ownership
PLUR	Apache-2.0	~215	Human-readable YAML engrams	Yours — plain files

PLUR stores each engram as a plain-text YAML entry — an id, a statement, a type, a domain, a scope, a confidence, and provenance — in a file you can open in any editor, put under version control, and carry between machines. The format is published as an open specification, the Engram Specification (plur.ai/spec.html, Apache-2.0), and the implementation is Apache-2.0 on GitHub (github.com/plur-ai/plur).

The tier is small but growing. The "engram" name itself is spreading fast: GitHub now hosts a dozen-plus young projects named engram, most of them early-stage local-first memory experiments. That crowding is the signal — "engram" is becoming the default word for a unit of agent memory. It is also why a published open specification matters: without one, every project that adopts the term defines it differently, and the word stops meaning anything portable.

The common thread of Tier 1: you can read the memory, you can fix it, and you can prove you deleted it. That is only possible when the format is open, not just the engine.

Tier 2: Open-core — software is open, hosted memory is not

These projects have open-source engines on GitHub under permissive licenses, but their hosted/cloud products store memory in formats that are not easily portable, and the commercial tier is where revenue lives.

Project	License	Stars (Jul 2026)	Open format?	Hosted tier
Mem0	Apache-2.0	~60,500	No — vector + graph store	$19–$249/mo
Letta	Apache-2.0	~23,700	Partial — agent state, not portable engrams	$20/mo (Pro)
Cognee	Apache-2.0	~27,400	No — knowledge graph internals	Self-host or cloud
Graphiti	Apache-2.0	~28,500	No — temporal knowledge graph	Via Zep cloud
LangMem	MIT	~1,500	No — LangGraph storage layer	Via LangSmith

Mem0 is the clearest example. The repo (github.com/mem0ai/mem0) is Apache-2.0 with roughly 60,500 stars, and you can self-host the engine. But Mem0's commercial product (mem0.ai) charges $19–$249/month for hosted memory with graph memory, audit logs, and on-prem deployment reserved for the Enterprise tier. The memory format under the hood is a vector store with entity linking, not a human-readable file you can diff.

Letta (formerly MemGPT) is similarly split. The Letta SDK is Apache-2.0 on GitHub (github.com/letta-ai/letta, ~23,700 stars). The hosted "Constellation" platform — with managed state, remote environments, and an LLM gateway — requires an account. Free accounts support up to three agents with managed state; Pro is $20/month. Memory is stored as agent state (blocks of text managed by the agent), not as individual portable engrams you can export and edit.

Graphiti (github.com/getzep/graphiti, ~28,500 stars) builds real-time knowledge graphs for agents under Apache-2.0, but its commercial path is through Zep's cloud platform.

Cognee (github.com/topoteretes/cognee, ~27,400 stars) is an open-source AI memory platform using knowledge graphs, also Apache-2.0, with self-host or cloud options.

The pattern: the engine is open, the data format is not. You can run the software, but the memories live in vector embeddings, knowledge graphs, or agent state blocks that are not designed to be read by humans or exported to a different system.

This is not a criticism — open-core is a legitimate model. But it means "open source" answers the wrong question. The right question is: can I take my memories with me?

Tier 3: Fully proprietary — memory baked into the model provider

The third tier is memory that is neither open-source software nor an open format. It lives inside the model provider's infrastructure, and you cannot inspect, export, or port it.

OpenAI's ChatGPT memory is the canonical example. When ChatGPT "remembers" facts about you across conversations, those memories are stored in OpenAI's infrastructure. There is no open-source engine, no documented format, no export API. You can toggle memory on or off, and you can view and delete individual memories in the UI — but you cannot extract them in a structured format, run them locally, or feed them to a different model.

Anthropic's Claude memory tool takes a similar approach. The memory tool is a feature of the Claude platform — agents can store and retrieve context — but the storage format, the retrieval mechanism, and the data itself are proprietary. There is no GitHub repo, no format spec, no portability guarantee. (Note the distinction: Claude Code's CLAUDE.md and auto-memory files are a separate, file-based mechanism — plain markdown on your own disk — and do not belong in this tier.)

Google Gemini's context operates the same way. Long-term context is managed inside Google's infrastructure with no documented open format.

The risk here is not that these features are bad; they are convenient and often work well. The risk is vendor lock-in for your most personal data. When your agent's accumulated knowledge about your preferences, projects, and workflow lives inside a single provider's black box, switching costs become prohibitive. You cannot audit anything beyond what the provider's UI chooses to show. You cannot prove what it forgot. You cannot carry it elsewhere.

The real split: open engine vs open format

Three tiers, but the meaningful boundary is between Tier 1 and everything else:

Property	Tier 1 (Fully open)	Tier 2 (Open-core)	Tier 3 (Proprietary)
Engine open source?	Yes	Yes	No
Memory format documented?	Yes	No	No
Data human-readable?	Yes	No (vectors/graphs)	No
Can edit individual memories?	Yes	Via API only	Via UI only
Can prove erasure?	Yes (delete the entry; git diff as proof)	Best-effort	Best-effort
Can export to another system?	Yes (it's a file)	No standard export	No export
Portable across model providers?	Yes	No	No

The agents that need open memory most are the ones operating across multiple tools, models, and providers. An agent that uses Claude for analysis, GPT-4 for coding, and a local model for privacy needs a memory layer that is none of those — it needs a format that belongs to the operator, not the model.

This is where the Model Context Protocol (MCP) enters. MCP (specification version 2025-11-25, modelcontextprotocol.io) is an open protocol — JSON-RPC 2.0 based, inspired by the Language Server Protocol — that standardizes how LLM applications connect to external data sources and tools. It is transport-level: it says how an agent talks to a memory server, not what format the memories are in. But it makes format-level openness newly relevant, because any MCP-compatible agent can now connect to any MCP-compatible memory server. The format question — can I read, edit, and port my memories? — becomes the differentiator.

How to evaluate whether your agent's memory is open

1. Is the engine on GitHub under a permissive license? (Apache-2.0 or MIT; not BSL, not "source-available")
2. Is the memory format documented? Can you read the spec without signing up?
3. Can you open a memory file in a text editor and understand it? Or is it an opaque vector?
4. Can you delete one memory and prove it is gone? GDPR-grade erasure, not best-effort.
5. Can you export all memories and import them into a different system? True portability.
6. Does the memory work across model providers? Or is it locked to one vendor's infrastructure?

If the answer to all six is yes, you are in Tier 1. If the engine is open but the format is not (questions 2–6 fail), you are in Tier 2. If there is no open engine at all, you are in Tier 3.
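The decision rule above is simple enough to write down as a function. This is a sketch of one reading of it, with `classify_tier` as a hypothetical name; it simplifies by treating any project with an open engine but at least one failed question as Tier 2.

```python
from typing import Sequence


def classify_tier(answers: Sequence[bool]) -> int:
    """Map six yes/no checklist answers (in question order) to a tier.

    answers[0] is the "open engine under a permissive license" question;
    the remaining five cover format, readability, erasure, export, and
    cross-provider use.
    """
    if len(answers) != 6:
        raise ValueError("expected six yes/no answers")
    if all(answers):
        return 1  # open engine and open format
    if answers[0]:
        return 2  # open engine, but the format questions fail
    return 3      # no open engine at all


print(classify_tier([True, True, True, True, True, True]))
print(classify_tier([True, False, False, False, False, False]))
print(classify_tier([False, False, False, False, False, False]))
```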

FAQ

Are AI agent engrams open source? The major engines are — Mem0, Letta, Cognee, Graphiti, LangMem, and PLUR all ship under Apache-2.0 or MIT. But the memory format is often not open. Very few projects store memory in a human-readable, documented, portable format you can inspect and export — PLUR publishes its engram format as an open Apache-2.0 specification (plur.ai/spec.html) precisely so that "engram" stays an open, implementable thing rather than a vendor label.

What is the difference between open source and open format for AI memory? Open source means the software engine is on GitHub under a permissive license. Open format means the memory data itself is stored in a documented, human-readable structure you can read, edit, and carry between systems. A project can be open source without being open format — most are.

Can I export my agent's memory from a proprietary system? Generally, no. OpenAI's ChatGPT memory, Anthropic's Claude memory tool, and Google Gemini's context all store memory inside the provider's infrastructure with no structured export. You can view and delete memories in the UI, but you cannot extract them in a portable format.

What license is PLUR's engram format under? Apache-2.0. The engram format is documented in the Engram Specification at plur.ai/spec.html and the implementation is at github.com/plur-ai/plur. Each engram is a YAML entry you can open, edit, and version-control.

Is MCP an open standard for agent memory? MCP is an open transport protocol (JSON-RPC 2.0, specification 2025-11-25) that standardizes how agents connect to external tools and data sources — including memory servers. It does not define a memory format. But it makes format-level openness more valuable, because any MCP-compatible agent can connect to any MCP-compatible memory server regardless of vendor.

Is There an MCP Server for AI Agent Memory?

gregor — Tue, 14 Jul 2026 11:46:01 +0000

Yes. The Model Context Protocol (MCP) — an open protocol published by Anthropic in November 2024, specification version 2025-11-25 — standardizes how LLM applications connect to external tools and data sources, and several MCP servers exist specifically for agent memory. The official MCP servers repository includes a knowledge graph-based memory server (@modelcontextprotocol/server-memory), and third-party memory servers — including PLUR, Zep, Mem0's OpenMemory, and community projects — expose persistent agent memory through the same protocol. The practical question is not whether an MCP memory server exists, but which one fits your needs, because "MCP-compatible" tells you the transport, not the memory format.

What MCP is (and what it is not)

MCP is an open protocol built on JSON-RPC 2.0, inspired by the Language Server Protocol (LSP). Just as LSP standardized how editors connect to language servers (so any editor works with any language server), MCP standardizes how AI agents connect to external tools and data sources (so any agent works with any tool server). The protocol defines three things a server can offer: Resources (context and data), Prompts (templated messages), and Tools (functions the AI can execute). A memory MCP server typically exposes tools — store, recall, search, learn, forget — that the agent calls to manage what it remembers (modelcontextprotocol.io/specification/2025-11-25).

What MCP does not define is the memory format. It says how an agent talks to a memory server, not what the memories look like inside. This means two MCP memory servers can be fully protocol-compatible and store memory in completely different ways — one as a vector embedding, one as a human-readable YAML file. The protocol is open; the format is the differentiator.
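To see how thin the transport layer is, here is a sketch of the JSON-RPC 2.0 message an MCP client sends to invoke a tool. The `tools/call` method and its `name` and `arguments` parameters come from the MCP specification; the tool name `store` and the `text` argument are hypothetical, since each memory server defines its own tools and schemas.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 request for an MCP `tools/call` invocation."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message, indent=2)


# "store" and "text" are placeholders; a real server advertises its own
# tool names and argument schemas through `tools/list`.
request = make_tool_call(1, "store", {"text": "The database is PostgreSQL."})
print(request)
```

Nothing in this message says how the server persists the text it receives, which is exactly the gap between an open transport and an open format.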

Memory MCP servers available today

Official MCP memory server (knowledge graph)

The MCP servers repository (github.com/modelcontextprotocol/servers) ships a reference memory server: @modelcontextprotocol/server-memory. It is a knowledge graph-based persistent memory system. You run it locally with:

npx -y @modelcontextprotocol/server-memory

And configure it in your MCP client (Claude Desktop, Cursor, etc.) as:

{
  "mcpServers": {
    "memory": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}

It stores entities, observations, and relations as a local knowledge graph, persisted to a JSON file on disk. Good for: lightweight, local-only memory for a single agent. Limitation: the stored graph is machine-oriented JSON rather than documented, per-memory YAML, so in practice you inspect it through the tool API rather than a text editor, and it is designed for single-agent, single-machine use, not cross-runtime persistence.

PLUR (open engram format via MCP)

PLUR (Apache-2.0, github.com/plur-ai/plur) ships an MCP server that exposes its engram engine. Each memory — an "engram" — is a human-readable YAML entry with an id, statement, type, domain, scope, confidence, and provenance. The MCP tools include plur_learn (create an engram), plur_recall (search by topic), plur_inject (get relevant engrams for a task), plur_forget (delete by id or query), and plur_feedback (rate relevance to improve injection quality).

Because it is MCP-compatible, the same memory follows an agent across Claude Code, Hermes, OpenClaw, Cursor, and any other MCP-compatible runtime. And because the format is plain-text YAML, you can open any engram in a text editor, put the memory store under version control, correct a single fact mid-conversation, and prove erasure by removing the entry — with the store under git, the diff is your audit trail.

Good for: agents that need portable, inspectable, correctable memory across multiple tools and model providers.
Limitation: newer project, less production-tested than the reference server at scale.
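The "diff as audit trail" idea can be illustrated without git at all. The sketch below removes one entry from a small YAML-like store and prints a unified diff with Python's `difflib`. The file layout (a list of entries starting with `- id:`) and the `forget` helper are assumptions made for the example; this is not PLUR's API or on-disk format.

```python
import difflib

# Hypothetical store contents: two entries in a YAML list.
before = """\
- id: eng-0001
  statement: "The database is PostgreSQL."
- id: eng-0002
  statement: "The user prefers tabs over spaces."
"""


def forget(store_text: str, engram_id: str) -> str:
    """Drop the entry whose first line is `- id: <engram_id>`."""
    kept, skipping = [], False
    for line in store_text.splitlines(keepends=True):
        if line.startswith("- id: "):
            # A new entry starts here; skip it only if it is the target.
            skipping = line.strip() == f"- id: {engram_id}"
        if not skipping:
            kept.append(line)
    return "".join(kept)


after = forget(before, "eng-0002")
diff = "".join(difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="engrams.yaml (before)",
    tofile="engrams.yaml (after)",
))
print(diff)
```

With the store under version control, `git diff` would produce the same kind of record: the removed lines are the evidence of what was erased and when.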

OpenMemory (Mem0's local MCP memory server)

OpenMemory (mem0.ai/openmemory) is Mem0's official MCP memory server: a private, local memory layer that runs on your own machine and exposes standardized memory tools — add_memories, search_memory, list_memories, delete_all_memories — to any MCP-compatible client. Memories stay local; the same store is shared across every MCP client you connect to it.

Good for: sharing one local memory store across multiple MCP clients with Mem0's simple CRUD model.
Limitation: storage is vector-based (a local database, inspected through the built-in UI or API) — not human-readable files you can open in a text editor or diff in git.

Zep (enterprise memory via MCP)

Zep (docs.getzep.com) offers an MCP server for connecting coding agents to its agent memory platform. Zep builds a temporal knowledge graph from any input — chat, business data, documents — and serves prompt-ready context with sub-200ms retrieval. The MCP server exposes Zep's memory tools to any MCP-compatible agent.

Good for: production agents that need time-aware, high-throughput memory at enterprise scale.
Limitation: Zep's memory format is a knowledge graph and Context Lake — not a human-readable file you can diff or version-control.

Community MCP memory servers

The awesome-mcp-servers list (github.com/punkpeye/awesome-mcp-servers) catalogs dozens of community MCP servers, several with memory capabilities:

Forage (isaac-levine/forage) — Self-improving tool discovery that persists tool knowledge across sessions.
Unclick (malamut1973/unclick) — 450+ callable endpoints with persistent cross-session memory. npx @unclick/mcp-server.
Memory Vault (scotia1973-bot/api-hub) — 49 MCP tools with persistent agent memory (store/recall/search). pip install gadgethumans-api-hub-mcp.
Cortex (gzoonet/cortex) — Local-first knowledge graph that extracts entities and relationships via LLMs.
pg-mnemosyne (Janadasroor/pg-mnemosyne-mcp) — PostgreSQL-backed persistent super memory and multi-agent coordination hub.

These range from single-purpose utilities to full memory systems. The ecosystem is young and moving fast — verify current status before adopting.

How to choose an MCP memory server

If you need…	Start with
A free, local, single-agent reference implementation	@modelcontextprotocol/server-memory
One local vector-based memory store shared across MCP clients	OpenMemory (Mem0)
Enterprise-scale temporal memory with sub-200ms retrieval	Zep's MCP server
Human-readable, portable, correctable memory across runtimes	PLUR's MCP server
Multi-agent shared memory in PostgreSQL	pg-mnemosyne
Quick utility memory without infra	Unclick or Memory Vault

The underlying decision is the same one that runs through all of agent memory: what do you need to own? If you need memory that is portable across agent runtimes, inspectable in a text editor, correctable mid-conversation, and provably erasable — the format matters more than the transport. MCP makes the transport open; the format is the differentiator.

FAQ

Is there an MCP server for AI agent memory? Yes. The official MCP servers repository includes @modelcontextprotocol/server-memory (a knowledge graph memory server). Third-party memory servers — PLUR, Zep, Mem0's OpenMemory, and several community projects — also expose agent memory through MCP. Any MCP-compatible agent (Claude Code, Cursor, Hermes, OpenClaw) can connect to any of them.

What is the MCP memory server? The reference implementation is @modelcontextprotocol/server-memory, a knowledge graph-based persistent memory system. Run it with npx -y @modelcontextprotocol/server-memory and configure it in your MCP client. It stores entities, observations, and relations as a local knowledge graph.

Does MCP define a memory format? No. MCP defines the transport (JSON-RPC 2.0) — how an agent talks to a memory server — but not what the memories look like inside. Two MCP memory servers can be fully protocol-compatible and store memory in completely different formats (vector embeddings vs human-readable YAML, for example). The format is the differentiator.

Can I use MCP memory with Claude Code? Yes. Claude Code supports MCP servers. Add a memory server to your MCP configuration and Claude Code can call its tools — store, recall, search — to persist and retrieve memory across sessions.

What is the difference between MCP memory and CLAUDE.md? CLAUDE.md is a static file of instructions you write, loaded at the start of every Claude Code session. An MCP memory server is a dynamic system the agent can call during a conversation — it stores, searches, updates, and forgets memory items programmatically. CLAUDE.md is the simplest form of persistent context; an MCP memory server is for structured, accumulating, queryable memory.

Is Fine-Tuning or Memory Better for Teaching an AI New Facts?

gregor — Mon, 13 Jul 2026 15:18:58 +0000

Fine-tuning bakes new facts into a model's weights through gradient updates; agent memory stores them in an external layer the model reads at runtime. For teaching an AI agent new facts — user preferences, project decisions, domain knowledge that changes — external memory wins on four dimensions: it is cheaper to update (seconds vs hours of GPU compute), inspectable (you can see what the model "knows"), deletable (you can remove a fact without retraining), and portable (memories transfer across models; fine-tuned weights do not). Fine-tuning excels at changing behavior — tone, format, style — but for storing facts an agent must recall accurately and update on demand, a memory layer is the right tool. The hidden cost of fine-tuning for factual knowledge is what we call the parallel learning tax: every new fact requires retraining the entire model, and every update to one fact risks degrading others through catastrophic forgetting.

The pain: why fine-tuning for facts is expensive

Suppose you need to teach your AI agent a new fact: the database moved from PostgreSQL to MySQL. If you do it by fine-tuning, three problems follow.

Problem 1: Retraining cost. Fine-tuning is a compute-intensive process. You must prepare training data, run gradient descent on GPU clusters, evaluate the result, and deploy a new model checkpoint. For a single fact, that is absurd. For facts that change weekly — API endpoints, user preferences, project decisions — it is unsustainable. OpenAI's own documentation recommends prompt engineering and retrieval (RAG) as the first approaches for adding knowledge, reserving fine-tuning for cases where you need to customize model behavior — tone, format, response structure — not for injecting facts (developers.openai.com/api/docs/guides/model-optimization).

Problem 2: The unlearning problem. When a fact changes or must be deleted — a user requests erasure under GDPR Article 17, or an API endpoint is deprecated — removing it from fine-tuned weights is notoriously difficult. Bourtoule et al. (arXiv:1912.03817) formalized this as the "machine unlearning" problem: once a data point is baked into model weights through training, it cannot be selectively removed without either retraining from scratch or using specialized training-time techniques (like SISA training) that must be planned before fine-tuning. You cannot look inside the weights, find the fact, and delete it. With an external memory layer, deletion is a single operation: remove the memory item from the store.

Problem 3: Vendor lock-in. Facts baked into GPT-4's fine-tuned weights stay in GPT-4. They do not transfer to Claude, to Llama, to Gemini. When the model is deprecated or you switch providers, the knowledge is lost — you must re-fine-tune the new model from scratch. External memory is model-agnostic: the same memory store works with any LLM, because the knowledge lives outside the model and is injected into the prompt at session start.

What fine-tuning is good at

Fine-tuning is not wrong — it is a tool for a different job. The distinction is between behavior and knowledge:

Fine-tuning is for…	Memory is for…
Changing response tone (more formal, more concise)	Storing user preferences (prefers TypeScript, uses Vitest)
Learning output format (JSON schema, markdown structure)	Remembering project decisions (we chose PostgreSQL)
Acquiring a skill (writing SQL, generating tests in a framework)	Tracking facts that change (API endpoint moved, schema updated)
Style transfer (matching a brand voice)	Correcting mistakes (the agent was wrong about X)
Behavioral patterns that should be consistent across all interactions	Episodic knowledge that accumulates over sessions

Fine-tuning modifies the model's parameters — how it generates. Memory modifies the model's context — what it knows right now. These are complementary layers, not alternatives. A well-designed agent uses fine-tuning for stable behavioral patterns and memory for evolving factual knowledge.

The parallel learning tax

Every time you teach a fine-tuned model a new fact, you pay a tax. The model must be retrained — or at least incrementally fine-tuned — on the new data. But gradient-based learning does not write facts cleanly into isolated slots. New training data interferes with existing weights, and the model may degrade on previously learned tasks. This is called catastrophic forgetting — the model learns the new fact but partially forgets old ones.
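The interference described above can be reproduced in a deliberately extreme toy: a model with a single shared weight, trained by plain gradient descent on one task and then fine-tuned on a conflicting one. Everything here (the tasks, the learning rate, the one-parameter model) is an assumption chosen to make the effect visible; real networks have many parameters and forget partially rather than completely.

```python
def mse(w, data):
    """Mean squared error of the one-weight model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.05, steps=200):
    """Plain gradient descent on the mean squared error."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Task A wants w = 2; task B wants w = -1. They share the same weight.
task_a = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
task_b = [(x, -1.0 * x) for x in (1.0, 2.0, 3.0)]

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)   # task A is learned

w = train(w, task_b)             # "fine-tune" on the new task
loss_a_after = mse(w, task_a)    # task A performance has degraded

print(f"task A loss before: {loss_a_before:.6f}")
print(f"task A loss after:  {loss_a_after:.6f}")
```

The second training run never sees task A, yet it overwrites the only place task A's knowledge was stored. An external store has no shared weight to overwrite.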

The parallel learning tax is the total cost of this cycle:

Compute cost — GPU hours for every fine-tuning run.
Evaluation cost — you must test the model after each update to verify it did not regress on existing capabilities.
Latency cost — the fact is not available until the fine-tuning run completes and the new checkpoint is deployed.
Forgetting cost — the risk that the new fine-tuning degraded previously learned knowledge, requiring further correction runs.

With an external memory layer, the tax is zero. You write the fact to the store (milliseconds, no GPU). It is available immediately at the next session. It does not interfere with existing memories. And if it is wrong or outdated, you delete it — one operation, no retraining. (Full definition: plur.ai/parallel-learning-tax.)

Property	Fine-tuning	Agent memory
Time to teach a new fact	Hours (data prep + GPU training + eval)	Milliseconds (write to store)
Cost per fact	GPU compute per run	Storage cost (negligible)
Inspection	Opaque — cannot see what the model "knows"	Transparent — read the memory store
Deletion	Machine unlearning (notoriously difficult; arXiv:1912.03817)	Delete the memory item (one operation)
Portability	Locked to the fine-tuned model	Model-agnostic (any LLM reads the same store)
Update risk	Catastrophic forgetting (new data degrades old)	No interference (memories are independent)
GDPR compliance	Requires retraining from scratch or SISA training	Delete the memory item, done
Latency to availability	Hours (deploy new checkpoint)	Immediate (available next session)
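The right-hand column of the table can be sketched in a few lines. The `MemoryStore` class below is a toy, in-process stand-in (not any particular product's API) that shows the three properties at issue: a write is a dictionary assignment, a delete is a single operation, and the prompt it builds is plain text that any model can read.

```python
class MemoryStore:
    """Toy in-process store; each fact is an independent entry keyed by id."""

    def __init__(self):
        self._facts = {}

    def learn(self, fact_id, statement):
        """Create or overwrite one fact without touching the others."""
        self._facts[fact_id] = statement

    def forget(self, fact_id):
        """Delete one fact; returns True if something was removed."""
        return self._facts.pop(fact_id, None) is not None

    def build_prompt(self, question):
        """Inject the stored facts ahead of the question, for any model."""
        facts = "\n".join(f"- {s}" for s in self._facts.values())
        return f"Known facts:\n{facts}\n\nQuestion: {question}"


store = MemoryStore()
store.learn("db", "The database is PostgreSQL.")
store.learn("tests", "The project uses Vitest.")
store.learn("db", "The database is MySQL.")  # the fact changed: overwrite it
store.forget("tests")                        # erasure is one call
prompt = store.build_prompt("Which database do we use?")
print(prompt)
```

A production memory layer adds persistence, search, and provenance on top, but the update and deletion paths stay this direct.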

What the research says

Zhang et al. (arXiv:2404.13501) frame the distinction clearly: agent memory is what enables "self-evolving capability" — the ability of agents to improve through experience rather than retraining. Their survey organizes the field around memory as a separate module from the model, not as modifications to model weights.

Packer et al. (arXiv:2310.08560) demonstrated this empirically with MemGPT: by managing memory externally in tiers (core memory in the context window, archival memory retrieved on demand), agents outperform fixed-context baselines on tasks such as multi-session chat and document analysis. The OS-inspired insight: agents need managed memory, not more parameters.

Park et al. (arXiv:2304.03442) showed that generative agents — which store experiences as natural-language memory records, synthesize them into reflections, and retrieve them dynamically — produce more believable behavior than agents relying on parametric knowledge alone. The memory architecture, not the model weights, is what made the agents feel like they had continuity and personality.

When fine-tuning is the right answer

Fine-tuning is the right tool when:

You need to change the model's behavior across all interactions (tone, format, style).
You have a stable dataset that does not change frequently.
You need latency-free behavioral changes at inference time (fine-tuning eliminates the need to include style instructions in every prompt).
You are working with a model you will use for a long time, so the lock-in cost is acceptable.

Fine-tuning is the wrong tool when:

You need to teach facts that change (API endpoints, user preferences, project decisions).
You need to inspect what the model knows and correct individual facts.
You need to delete specific knowledge (GDPR right to erasure, compliance, stale information).
You need knowledge to transfer across models or providers.
Facts need to be available immediately (not after a training run).

The landscape: memory layers vs fine-tuning

Approach	What it does	Cost per fact update	Inspectable?	Deletable?	Portable?
Fine-tuning	Bakes facts into model weights	GPU hours	No	No (unlearning)	No
RAG	Retrieves from a fixed document corpus	Add document to index	Yes (documents)	Yes (remove document)	Yes
Mem0 (Apache-2.0, ~60K stars)	External memory layer with CRUD API	Milliseconds	Yes (API)	Yes (delete call)	Yes
Letta (Apache-2.0, ~24K stars)	Stateful agent OS with memory blocks	Milliseconds	Yes (block API)	Yes (replace block)	Yes
Zep/Graphiti (Apache-2.0, ~28K stars)	Temporal knowledge graph	Milliseconds	Yes (graph query)	Yes (remove edge/node)	Yes
PLUR (Apache-2.0, ~215 stars)	Local-first YAML engrams via MCP	Milliseconds	Yes (text editor)	Yes (delete entry)	Yes

RAG and agent memory both avoid the parallel learning tax, but they serve different needs: RAG retrieves documents you already have; agent memory stores what the agent learned from interactions. See RAG vs. Agent Memory for that comparison.

FAQ

Is fine-tuning or memory better for teaching an AI new facts? For factual knowledge that changes or must be inspectable, memory is better. Fine-tuning is designed for changing model behavior (tone, format, style), not for storing facts. Fine-tuning a fact into model weights requires GPU compute, cannot be inspected or selectively deleted, risks catastrophic forgetting, and locks the knowledge into one model. A memory layer writes facts in milliseconds, is fully inspectable, can delete individual facts, and works with any LLM.

What is the parallel learning tax? The parallel learning tax is the total cost of teaching a fine-tuned model a new fact: GPU compute for retraining, evaluation to verify no regression, latency while the training runs, and the risk of catastrophic forgetting — where new training data degrades previously learned knowledge. With an external memory layer, the tax is zero: facts are written in milliseconds with no GPU, no retraining, and no interference with existing memories.

Can fine-tuning and memory be used together? Yes, and they should be. Fine-tune for stable behavioral patterns (tone, format, response structure) and use memory for evolving factual knowledge (preferences, decisions, corrections). These are complementary layers: fine-tuning modifies how the model generates; memory modifies what the model knows right now.

What is catastrophic forgetting? Catastrophic forgetting is a phenomenon in machine learning where training a model on new data causes it to degrade on previously learned tasks. When you fine-tune a model with a new fact, the gradient updates modify the same weights that encode existing knowledge, potentially degrading it. External memory avoids this because each memory item is stored independently — adding a new fact does not modify existing memories.

How does GDPR right to erasure work with fine-tuned models? Poorly. GDPR Article 17 gives individuals the right to have their data deleted. If personal data was baked into model weights through fine-tuning, removing it requires "machine unlearning" — a notoriously difficult problem (Bourtoule et al., arXiv:1912.03817) that typically requires retraining the model from scratch or using specialized training-time techniques. With an external memory layer, deletion is a single operation: remove the memory item from the store. See Editable and Auditable Agent Memory for details.

Does fine-tuning make the model smarter? No. Fine-tuning adjusts the model's weights to produce different output patterns — it does not add general intelligence. It can make the model better at a specific task (by training on task-relevant examples) or change its style, but it cannot add knowledge the model can inspect, correct, or selectively forget. For accumulating factual knowledge over time, a memory layer is the appropriate mechanism.

What's the Difference Between RAG and Agent Memory?

gregor — Sun, 12 Jul 2026 13:44:17 +0000

RAG (Retrieval-Augmented Generation) retrieves relevant passages from a fixed document collection at query time and pastes them into the prompt. Agent memory stores what the agent has learned from interactions — corrections, preferences, decisions, behavioral patterns — and updates that knowledge over time. The distinction is not subtle: RAG is read-only retrieval from external corpora; agent memory is read-write learning with feedback loops, forgetting, and consolidation. RAG answers "find me a document about X"; agent memory answers "remember that we decided to use Vitest instead of Jest last week, and that you corrected me on the API endpoint last Tuesday."

The core difference

Property	RAG	Agent Memory
What it stores	Documents, passages, chunks from external corpora	What the agent learned from interactions — corrections, preferences, decisions
Read / write	Read-only at query time (the corpus is static)	Read-write (the agent writes new memories, updates, and deletes old ones)
Updates	Corpus changes when you add/remove documents; no learned updates	Memories are created, reinforced, decayed, and forgotten based on feedback
Persistence	Documents persist; retrieval results do not accumulate	Memories accumulate across sessions and improve over time
Learning	No learning — same query retrieves the same passage	Learns from corrections, feedback signals, and usage patterns
Forgetting	No forgetting mechanism — all documents remain equally retrievable	Supported by design — several systems apply time-based decay (ACT-R-inspired) so outdated memories lose retrieval strength
Provenance	Source document is the provenance	Each memory has provenance — who said it, when, in what context
Format	Vector embeddings in a vector store	Varies: vector store, knowledge graph, YAML files, agent state blocks

The distinction maps to a well-known one in cognitive science: semantic memory (general knowledge, facts — what RAG provides) vs episodic and procedural memory (what happened to you, what you learned to do — what agent memory provides). RAG gives the model access to a library. Agent memory gives it a notebook it writes in.

What RAG is (and what it does well)

RAG was formalized by Lewis et al. in 2020 ("Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," NeurIPS 2020, arXiv:2005.11401). The architecture is simple: given a query, retrieve the top-k relevant passages from a document index, prepend them to the prompt, and let the model generate an answer conditioned on both the query and the retrieved context. A 2023 survey by Gao et al. (arXiv:2312.10997) catalogued the explosion of RAG variants — pre-retrieval, post-retrieval, modular pipelines — and confirmed the core pattern: retrieve, augment, generate.
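The retrieve-and-augment half of that pattern fits in a short sketch. Token overlap stands in for the dense retriever used in real RAG systems, and the generate step (the model call) is left out; the corpus, function names, and prompt layout are all assumptions for illustration.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real retriever's encoder."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, corpus, k=2):
    """Return the k passages sharing the most tokens with the query."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def augment(query, passages):
    """Prepend the retrieved passages to the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


corpus = [
    "Vitest is a test runner built on Vite.",
    "PostgreSQL is a relational database.",
    "JWT stands for JSON Web Token.",
]
question = "Which test runner is built on Vite?"
top = retrieve(question, corpus, k=1)
prompt = augment(question, top)
print(prompt)
```

Note what is missing: nothing in this loop writes back to `corpus`. The same question retrieves the same passage tomorrow, whatever happened in today's conversation.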

RAG is excellent for question answering over known documents. If you have a corpus of PDFs, a product manual, a codebase, or a wiki — RAG lets the model cite specific passages from that corpus. It is the standard approach for "chat with your documents" use cases.

What RAG cannot do

RAG has four structural limitations for agent use cases:

1. No learning from interactions

RAG retrieves from a static corpus. If you correct the model ("actually, we use Vitest, not Jest"), the retrieval pipeline does not update. Next session, the model retrieves the same documents and may make the same mistake. The corpus is fixed; the model's behavioral knowledge does not accumulate. A 2024 survey of LLM-based agent memory mechanisms (Zhang et al., "A Survey on the Memory Mechanism of Large Language Model based Agents," arXiv:2404.13501) treats memory as a component in its own right and notes that existing memory mechanisms are scattered across different papers without a systematic review. Retrieval from a fixed corpus addresses lookup, not learning.

2. No preference persistence

RAG cannot remember that you prefer tabs over spaces, that your database is PostgreSQL, or that you deprecated an API last week. These are not documents in a corpus — they are facts the agent learned from interacting with you. RAG has nowhere to put them.

3. No forgetting or decay

In RAG, all documents are equally retrievable forever. There is no mechanism to say "this fact is outdated" or "this preference changed." If you migrated from Jest to Vitest, both the old and new documentation sit in the index with equal standing. Agent memory systems can model forgetting explicitly — several projects adapt the ACT-R cognitive theory, applying time-based decay so outdated memories lose retrieval strength while recently-reinforced ones stay sharp (implementations vary: not every memory engine ships decay, but the architecture supports it, which RAG's does not).
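For readers who have not met ACT-R, its base-level learning equation gives the flavor of time-based decay: a memory's activation is the log of a sum over past uses, each discounted by how long ago it happened. The sketch below implements that equation with the conventional decay of 0.5; how any given memory engine adapts it (units, thresholds, extra terms) is not covered here and will differ by project.

```python
import math


def base_level_activation(ages_seconds, decay=0.5):
    """ACT-R base-level learning: ln(sum(t_j ** -d)) over past uses.

    ages_seconds holds the time since each past use (must be > 0);
    decay is the d parameter, conventionally 0.5.
    """
    if not ages_seconds:
        raise ValueError("need at least one past use")
    return math.log(sum(t ** -decay for t in ages_seconds))


DAY = 86_400.0
stale = base_level_activation([90 * DAY])           # used once, 90 days ago
fresh = base_level_activation([90 * DAY, 1 * DAY])  # same, plus reinforced yesterday

print(f"stale memory activation:      {stale:.3f}")
print(f"reinforced memory activation: {fresh:.3f}")
```

A retrieval pipeline can then rank or filter by activation, so the old Jest note fades unless something keeps reinforcing it.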

4. No feedback loops

RAG retrieval is one-shot: query, retrieve, generate. There is no signal that says "that retrieval was helpful" or "that retrieval was irrelevant." Agent memory systems can close this loop: feedback signals (positive, negative, neutral) train the injection pipeline so recall quality improves with use. Not every engine implements feedback, but where it exists, the more you use it, the better it gets at surfacing the right memory at the right time.
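One simple way to close such a loop is to keep a per-memory weight that feedback nudges up or down, and to scale retrieval scores by it. The sketch below is a toy under stated assumptions: the step sizes, the clamping range, and the `Memory` class are all hypothetical and do not describe any specific engine.

```python
from dataclasses import dataclass

# Assumed step sizes for each feedback signal.
FEEDBACK_DELTA = {"positive": 0.2, "neutral": 0.0, "negative": -0.3}


@dataclass
class Memory:
    text: str
    weight: float = 1.0  # learned usefulness, adjusted by feedback


def apply_feedback(memory: Memory, signal: str) -> None:
    """Nudge a memory's weight up or down, clamped to [0.1, 3.0]."""
    memory.weight = min(3.0, max(0.1, memory.weight + FEEDBACK_DELTA[signal]))


def rank(memories: list[Memory], relevance: dict[str, float]) -> list[Memory]:
    """Order memories by retriever relevance scaled by the feedback-adjusted weight."""
    return sorted(memories, key=lambda m: relevance[m.text] * m.weight, reverse=True)


jest = Memory("Project uses Jest")
vitest = Memory("Project uses Vitest")
scores = {jest.text: 0.80, vitest.text: 0.78}  # near-tie from the retriever

before = [m.text for m in rank([jest, vitest], scores)]
apply_feedback(jest, "negative")    # user flagged the Jest memory as wrong
apply_feedback(vitest, "positive")  # user confirmed the Vitest memory
after = [m.text for m in rank([jest, vitest], scores)]
```

Before feedback the stale Jest memory ranks first on raw relevance; after one negative and one positive signal the order flips.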

What agent memory adds beyond retrieval

Agent memory systems perform four operations that RAG does not:

Capture — Extract facts, corrections, preferences, and decisions from the conversation as they happen. Not a document dump; structured memory items (the agent learned you prefer tabs, that the database is PostgreSQL, that the auth flow uses JWT).
Store — Persist those items outside the model in a format that survives session end. Formats vary: Mem0 uses a vector store, Letta uses agent state blocks, Zep uses a temporal knowledge graph, PLUR uses human-readable YAML files.
Retrieve (with context) — At the start of the next session, surface the right memories for the current context. Not all of them — that would overflow the window. The relevant ones, selected by hybrid search (BM25 + embeddings), activation strength, and feedback signals.
Update and forget — When a fact changes, update the memory. When a fact is wrong, correct it. When you want something deleted, delete it. Memories decay over time if not reinforced, so the system does not accumulate stale noise.
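The four operations above can be sketched as a small in-process store. This is a toy, not how Mem0, Letta, Zep, or PLUR implement them: the class and method names are hypothetical, word overlap stands in for hybrid search, and nothing is persisted to disk.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    key: str
    statement: str
    updated_at: float = field(default_factory=time.time)


class MemoryStore:
    """Toy in-process store; a real layer would persist to disk or a database."""

    def __init__(self) -> None:
        self._items: dict[str, MemoryItem] = {}

    def capture(self, key: str, statement: str) -> None:
        """Capture and store a fact; writing the same key again is an update."""
        self._items[key] = MemoryItem(key, statement)

    def retrieve(self, query: str, limit: int = 3) -> list[str]:
        """Return statements sharing at least one word with the query, newest first."""
        words = set(query.lower().split())
        hits = [m for m in self._items.values() if words & set(m.statement.lower().split())]
        hits.sort(key=lambda m: m.updated_at, reverse=True)
        return [m.statement for m in hits[:limit]]

    def forget(self, key: str) -> bool:
        """Delete a memory; returns False if the key was not stored."""
        return self._items.pop(key, None) is not None


store = MemoryStore()
store.capture("test_runner", "tests run with Jest")
store.capture("test_runner", "tests run with Vitest")  # correction replaces the old fact
store.capture("database", "the database is PostgreSQL")
```

After the correction, a query about tests returns only the Vitest statement, and `store.forget("database")` removes that entry entirely.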

This is not theoretical. MemGPT (Packer et al., 2023, "MemGPT: Towards LLMs as Operating Systems," arXiv:2310.08560) demonstrated that managing an agent's memory the way an operating system manages memory tiers (core memory held in the context window, archival memory retrieved on demand) improves agent performance on multi-session tasks. The key insight: agents need managed memory, not just more context.

Can you use both?

Yes, and many agent architectures do. RAG and agent memory serve different layers:

RAG handles the knowledge layer — what documents, code, and reference material the agent can access. You point it at your codebase, your docs, your wiki.
Agent memory handles the experience layer — what the agent has learned from working with you, what preferences it has accumulated, what mistakes it has been corrected on.

A coding agent might use RAG to search your repository for a function definition, and use agent memory to remember that you prefer functional style over OOP. The two systems are complementary, not competitive. The mistake is thinking RAG alone gives your agent memory — it gives your agent a library, but no notebook.
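A minimal sketch of how the two layers might meet in a single prompt, assuming the retrieved documents and the recalled memories have already been fetched by their respective systems. The function name, section labels, and example strings are hypothetical.

```python
def assemble_prompt(task: str, retrieved_docs: list[str], memories: list[str]) -> str:
    """Build one prompt from both layers: remembered experience first, then retrieved knowledge."""
    sections = []
    if memories:
        sections.append("What I know about this user:\n" + "\n".join(f"- {m}" for m in memories))
    if retrieved_docs:
        sections.append("Relevant project material:\n" + "\n".join(f"- {d}" for d in retrieved_docs))
    sections.append(f"Task: {task}")
    return "\n\n".join(sections)


prompt = assemble_prompt(
    task="Add a helper that parses ISO dates.",
    retrieved_docs=["utils/dates.py defines parse_date(value: str) -> datetime"],  # from RAG
    memories=["Prefers functional style over OOP"],  # from agent memory
)
```

Keeping the two inputs as separate arguments mirrors the separation of layers: either one can be empty without changing how the other is used.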

The landscape: who does what

Project	RAG?	Agent Memory?	Format
LangChain RAG	Yes (core feature)	Partial (memory modules, but basic)	Various module types
LlamaIndex	Yes (core feature)	Partial (chat history buffers)	Buffer, summary, vector
Mem0	No	Yes (memory layer)	Vector store
Letta (formerly MemGPT)	No	Yes (stateful agent OS)	Agent state blocks
Zep / Graphiti	No	Yes (temporal knowledge graph)	Knowledge graph
Cognee	Partial	Yes (graph + vector + relational)	Own data model
PLUR	No	Yes (engram engine + MCP)	Human-readable YAML (open format)

The key differentiator within agent memory is format openness. Mem0 and Letta are open-source (Apache-2.0) but store memories in opaque formats — vector embeddings or agent state blocks you cannot read in a text editor. PLUR (Apache-2.0, github.com/plur-ai/plur) stores memories as YAML files you can open, edit, diff, and version-control. The MCP (Model Context Protocol, specification 2025-11-25) makes the transport layer open; the memory format is the differentiator.

When to use which

If you need…	Use
Search and cite a fixed document corpus	RAG
Answer questions over your codebase, docs, or wiki	RAG
Remember user preferences across sessions	Agent memory
Learn from corrections and not repeat mistakes	Agent memory
Track how facts change over time	Agent memory (Zep for temporal, PLUR for decay-based)
Inspect, edit, and correct what the agent knows	Agent memory with open format (PLUR)
Both (search documents AND remember interactions)	RAG + agent memory (complementary)

FAQ

Is agent memory just RAG with extra steps? No. RAG retrieves from a static document collection at query time. Agent memory stores what the agent learned from interactions, updates it with feedback, and decays outdated entries. RAG is a read-only retrieval pipeline; agent memory is a read-write learning system with forgetting and consolidation.

Can RAG replace agent memory? Only if your agent never needs to remember preferences, corrections, or decisions across sessions. If your agent works in a single session or only queries fixed documents, RAG is sufficient. If your agent accumulates knowledge over time — learning your coding style, remembering architectural decisions, not repeating corrected mistakes — you need agent memory.

Can you use RAG and agent memory together? Yes. They serve different layers: RAG handles the knowledge layer (documents, code, reference material), agent memory handles the experience layer (what the agent learned from working with you). Many agent architectures combine both.

Is agent memory slower than RAG? Not necessarily. Both involve a retrieval step at query time. Agent memory adds a write step (capturing memories during the conversation) and a decay/feedback step (updating memory strength), but these can run as background operations off the inference path, so they need not add latency to a response. Zep reports sub-200ms retrieval for its temporal knowledge graph (docs.getzep.com).

What is the ACT-R decay model in agent memory? ACT-R is a cognitive architecture from cognitive science that models how human memory decays over time. Several agent memory systems adapt it: memories that are not accessed lose retrieval strength, while memories that are reinforced (accessed or given positive feedback) gain strength. This prevents the memory store from accumulating stale noise — something RAG cannot do because all documents remain equally retrievable.

Mem0 vs Letta vs Zep: Which Should You Use for Agent Memory?

gregor — Sat, 11 Jul 2026 08:31:18 +0000

Mem0, Letta, and Zep are the three most-adopted open-source memory layers for AI agents, and they solve fundamentally different problems. Mem0 is a universal memory API — add, update, delete, retrieve — designed to drop into any agent with minimal integration. Letta (formerly MemGPT) is a stateful agent operating system that manages memory in tiers the agent itself can edit, inspired by OS memory management. Zep (with its Graphiti engine) stores memory as a temporal knowledge graph, preserving when facts were learned and how they relate to each other. The right choice depends on what your agent needs: simple key-value memory (Mem0), self-managing agent state (Letta), or time-aware relational knowledge (Zep). All three are Apache-2.0 licensed and connect to the MCP ecosystem — though in different ways — and they differ in storage format, retrieval model, and how much agency the memory system itself has.

The pain: choosing wrong is expensive

You are building an AI agent that needs to remember things across sessions. You have heard of Mem0, Letta, and Zep — maybe also Cognee, LangMem, or PLUR — and the READMEs all say "memory for AI agents." How do you choose?

The wrong choice costs you in three ways:

Problem 1: Integration mismatch. If your agent needs a simple memory API (store a fact, retrieve it later) and you pick Letta, you have adopted an entire agent operating system with its own message queue, memory tiers, and self-editing loop — far more complexity than your use case requires. If your agent needs temporal reasoning (when was this fact learned, has it changed?) and you pick Mem0, you have a flat key-value store with no concept of time.

Problem 2: Format lock-in. Each tool stores memories in a different format — Mem0 in vector embeddings, Letta in agent state blocks, Zep in graph nodes and edges. Once your agent has accumulated thousands of memories in one format, migrating to another means re-extracting and re-importing everything. The memory format is the lock-in, not the software license.

Problem 3: Retrieval model mismatch. Mem0 retrieves by semantic similarity (vector search). Letta retrieves by tier — core memory is always in context, archival is paged in on demand. Zep retrieves by graph traversal and temporal queries. If your use case needs one retrieval model and you built on another, you will fight the framework instead of using it.

Mem0: the universal memory API

Mem0 (github.com/mem0ai/mem0, ~60K stars, Apache-2.0) is the simplest integration: a CRUD API for agent memory. You call add(), update(), delete(), get_all(), and search() — and the memory layer handles embedding, storage, and retrieval. Memories are stored as vector embeddings in a backend of your choice (Qdrant, Chroma, PostgreSQL with pgvector, and others).

Best for: Agents that need a simple, drop-in memory layer. If you have an existing agent (LangChain, CrewAI, custom) and want to add persistent memory with minimal code changes, Mem0 is the lowest-friction option.

Strengths:

Simplest API surface: add, search, update, and delete cover most use cases
Backend-agnostic (multiple vector stores supported)
Hosted cloud option (mem0.ai) for teams that do not want to self-host
Widely adopted — the most referenced memory project in LLM-generated answers about AI memory

Limitations:

Memories are opaque vector embeddings — you cannot read what the agent "knows" by opening a file
No temporal reasoning — all memories are equally retrievable regardless of when they were learned
No self-management — the agent or developer must decide what to store and when to retrieve
No built-in forgetting or decay — stale memories accumulate unless manually deleted

Pricing: Open-source self-hosted is free. Hosted plans start at approximately $19/month (developer) up to $249/month (enterprise), per mem0.ai pricing as of early 2026.

Letta: the stateful agent OS

Letta (github.com/letta-ai/letta, ~24K stars, Apache-2.0), formerly MemGPT, takes a fundamentally different approach. Instead of a memory API you call, Letta is an agent operating system where the agent manages its own memory. The architecture, introduced by Packer et al. (arXiv:2310.08560), treats memory the way an OS manages RAM and disk: core memory (always in the context window, like RAM) and archival memory (retrieved on demand, like disk storage). The agent itself can move information between tiers, edit its own memory blocks, and decide what to page in and out.
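The tiering idea can be illustrated with a toy two-tier store. This is not Letta's API or implementation: the class is hypothetical, eviction here is simple first-in-first-out, and paging is triggered by a keyword match, whereas in Letta the agent itself decides what to move through tool calls.

```python
from collections import deque


class TieredMemory:
    """Toy two-tier memory: a bounded core tier plus an unbounded archival tier."""

    def __init__(self, core_limit: int = 3) -> None:
        self.core: deque[str] = deque()  # always included in the prompt
        self.archival: list[str] = []    # searched only on demand
        self.core_limit = core_limit

    def remember(self, fact: str) -> None:
        """Add to core; when core is full, page the oldest fact out to archival."""
        self.core.append(fact)
        while len(self.core) > self.core_limit:
            self.archival.append(self.core.popleft())

    def page_in(self, keyword: str) -> list[str]:
        """Move archival facts that mention the keyword back into core."""
        hits = [f for f in self.archival if keyword.lower() in f.lower()]
        for fact in hits:
            self.archival.remove(fact)
            self.remember(fact)
        return hits

    def context(self) -> str:
        """Text that would be placed in the context window."""
        return "\n".join(self.core)


mem = TieredMemory(core_limit=2)
for fact in ["User prefers tabs", "Database is PostgreSQL", "Auth uses JWT"]:
    mem.remember(fact)
paged = mem.page_in("tabs")  # brings the evicted preference back into core
```

With a core limit of two, the third fact pushes the oldest one into archival, and paging it back in evicts the next oldest in turn, so the context window stays bounded.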

Best for: Long-running, stateful agents that need to manage their own context. If you are building an agent that runs for hours or days and needs to decide for itself what to remember and what to forget, Letta's self-editing memory model is the most sophisticated.

Strengths:

Agent-driven memory management — the agent edits its own memory blocks, reducing developer burden
OS-inspired tiered memory (core vs. archival) with paging, proven in the MemGPT paper
Stateful agent server — the agent persists between interactions with full conversation state
Active research lineage — MemGPT is one of the most cited agent memory papers

Limitations:

Heavier integration — you are adopting an agent framework, not just a memory library
Memory stored as agent state blocks, not human-readable files
No temporal knowledge graph — relationships between facts are implicit, not explicit
More complex to reason about — the agent's self-editing can make debugging harder

Pricing: Open-source self-hosted is free. Hosted plans at letta.ai start at approximately $20/month (developer) up to $200/month (team), per letta.ai pricing as of early 2026.

Zep / Graphiti: temporal knowledge graph memory

Zep (github.com/getzep/graphiti, ~28K stars, Apache-2.0), powered by the Graphiti engine, stores memory as a temporal knowledge graph. Entities are nodes; facts are the edges that relate them; and the graph carries temporal metadata for each fact: when it was learned, when it expired, whether it supersedes a previous fact. This enables a distinctive capability: asking not just "what does the agent know?" but "what did the agent know on Tuesday?" and "has this fact changed?"
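To make the temporal idea concrete, here is a toy fact store in which each fact carries a validity interval and superseded facts are closed rather than deleted. It is not Graphiti's data model or API: the class names and fields are hypothetical, each subject and relation pair is assumed to hold one value at a time, and facts are assumed to arrive in chronological order.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: datetime
    invalid_at: Optional[datetime] = None  # None means still true


class TemporalGraph:
    """Toy temporal fact store; superseded facts are closed, not deleted."""

    def __init__(self) -> None:
        self.facts: list[Fact] = []

    def assert_fact(self, subject: str, relation: str, obj: str, at: datetime) -> None:
        """Record a fact and expire any earlier open fact for the same subject and relation."""
        for fact in self.facts:
            if (fact.subject, fact.relation) == (subject, relation) and fact.invalid_at is None:
                fact.invalid_at = at
        self.facts.append(Fact(subject, relation, obj, valid_from=at))

    def as_of(self, subject: str, relation: str, when: datetime) -> Optional[str]:
        """Return what was believed true at a given moment, or None if nothing was known."""
        for fact in self.facts:
            if (fact.subject, fact.relation) != (subject, relation):
                continue
            if fact.valid_from <= when and (fact.invalid_at is None or when < fact.invalid_at):
                return fact.obj
        return None


graph = TemporalGraph()
graph.assert_fact("project", "test_runner", "Jest", datetime(2026, 1, 5))
graph.assert_fact("project", "test_runner", "Vitest", datetime(2026, 3, 2))
```

Because the Jest fact is closed instead of overwritten, `as_of` can answer both the current question and the historical one.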

Best for: Agents that need temporal reasoning and relational knowledge. If your agent tracks evolving facts (a customer's preferences changing over time, a project's status updates, a patient's medical history), Zep's temporal graph is the only format of the three that natively handles change over time.

Strengths:

Temporal knowledge graph — facts have timestamps, expiry, and supersession relationships
Sub-200ms retrieval for graph queries (per Zep documentation)
Entity resolution — the graph deduplicates and links related facts automatically
MCP support — Zep exposes memory operations via the Model Context Protocol

Limitations:

Heaviest infrastructure — requires a graph database (Neo4j or similar) in addition to vector search
Most complex to query — graph traversal is harder to reason about than vector similarity
Overkill for simple use cases — if you just need key-value memory, a graph is unnecessary
Memory format is graph nodes/edges — not human-readable without graph visualization tools

Pricing: Open-source self-hosted is free. Zep offers hosted/cloud options; check getzep.com for current pricing.

Head-to-head comparison

Property	Mem0	Letta	Zep/Graphiti
Stars (Jul 2026)	~60K	~24K	~28K
License	Apache-2.0	Apache-2.0	Apache-2.0
Memory format	Vector embeddings	Agent state blocks	Temporal knowledge graph
Retrieval model	Semantic similarity	Tier-based (core/archival)	Graph traversal + temporal
Self-managing	No (developer-driven)	Yes (agent edits own memory)	No (developer-driven)
Temporal reasoning	No	No	Yes (timestamps, expiry, supersession)
Human-readable	No (vectors)	No (state blocks)	No (graph nodes)
Infrastructure	Vector DB	Agent server + DB	Graph DB + vector DB
Integration complexity	Low (CRUD API)	High (full agent OS)	Medium-High (graph setup)
MCP support	Yes — official OpenMemory MCP server	Partial — MCP client built in; memory exposed via community servers	Yes — official MCP server
Hosted option	Yes (mem0.ai)	Yes (letta.ai)	Yes (getzep.com)
Best for	Drop-in memory API	Self-managing stateful agents	Time-aware relational memory

How to choose

Choose Mem0 if:

You have an existing agent and want to add memory with minimal changes
You need a simple API: store facts, retrieve relevant ones
You do not need temporal reasoning or agent self-management
You want the largest community and most integration examples

Choose Letta if:

You are building a long-running, stateful agent from scratch
You want the agent to manage its own memory (self-editing, tiered context)
You are inspired by the MemGPT OS analogy and want managed memory tiers
You need the agent to persist full conversation state between interactions

Choose Zep if:

Your agent needs to track how facts change over time
You need relational knowledge (facts connected to other facts, not just isolated items)
You want to query historical state ("what did we know on date X?")
You have the infrastructure budget for a graph database

Where PLUR fits

PLUR (github.com/plur-ai/plur, ~215 stars, Apache-2.0) is a different bet: memories are stored as human-readable YAML files (engrams) that you can open in a text editor, diff in git, and inspect without running code. It is local-first — memories live on your machine, not in a cloud vector store. And it is MCP-native — the same engram store works across Claude Code, Hermes, OpenClaw, and any MCP-compatible runtime.

PLUR does not compete with Mem0 on API simplicity, Letta on self-managing statefulness, or Zep on temporal graph queries. It competes on format openness: if the ability to read, edit, version-control, and provably delete your agent's memories matters to you — for compliance, debugging, or sovereignty — PLUR's YAML engrams are one of the few formats where memory is a plain text file, not an opaque blob.

For a deeper comparison of the full open-source memory landscape (including Cognee, LangMem, and others), see Top 10 Open-Source Projects for AI Agent Memory.

What the research says

The distinction between these approaches is grounded in the agent memory literature. Zhang et al. (arXiv:2404.13501) survey the field and identify memory as "the key component to support agent-environment interactions," organizing approaches by storage format (vector, graph, structured) and retrieval strategy (similarity, recency, importance).

Packer et al. (arXiv:2310.08560) introduced the MemGPT architecture that became Letta: treating LLM context as a managed memory space with tiers, paging, and self-editing — directly inspiring Letta's core/archival distinction.

The CoALA framework (Sumers et al., arXiv:2309.02427) formalizes "modular memory components" in cognitive architectures for language agents, providing the theoretical grounding for why different memory types (semantic, episodic, procedural) warrant different storage and retrieval strategies — which is why Mem0, Letta, and Zep can all be "right" for different use cases.

The MCP specification (modelcontextprotocol.io/specification/2025-11-25) makes transport interoperable across all three: you can expose Mem0, Letta, or Zep as MCP tools that any MCP-compatible agent can call. The transport is open; the memory format is the differentiator.

FAQ

Mem0 vs Letta vs Zep — which is best for agent memory? None is universally "best." Mem0 is best for simple drop-in memory APIs. Letta is best for stateful agents that manage their own memory in tiers. Zep is best for agents that need temporal reasoning and relational knowledge graphs. All three are Apache-2.0 open source. Choose by use case, not by star count.

Is Mem0 simpler than Letta? Yes. Mem0 is a CRUD API (add, update, delete, search) that drops into existing agents. Letta is a full agent operating system with self-editing memory tiers, stateful sessions, and its own message queue. If you need memory without adopting a framework, Mem0 is simpler. If you want the agent to manage its own memory, Letta is more capable.

Does Zep support temporal queries? Yes. Zep's Graphiti engine stores facts as edges between entity nodes and attaches temporal metadata to them: when a fact was learned, when it expired, whether it supersedes a previous fact. You can query historical state: "what did the agent know on this date?" This is Zep's primary differentiator over Mem0 and Letta, which do not have native temporal reasoning.

Are Mem0, Letta, and Zep open source? Yes. All three are Apache-2.0 licensed. You can self-host any of them. Each also offers a hosted cloud option for teams that prefer managed infrastructure.

Can I use Mem0, Letta, or Zep with MCP? Mostly yes, but the support differs. Mem0 ships OpenMemory (mem0.ai/openmemory), an official local MCP memory server. Zep offers an official MCP server for its memory platform. Letta natively acts as an MCP client — its agents can call external MCP tool servers — while exposing Letta's own memory over MCP relies on community-built servers. MCP is an open transport protocol; the memory format behind it is what differs.

What memory format does each use? Mem0 stores memories as vector embeddings. Letta stores them as agent state blocks. Zep stores them as nodes and edges in a temporal knowledge graph. None of these formats are human-readable without tooling. PLUR stores memories as YAML text files you can open in any editor — see What Is Agent Memory? for that comparison.

Which has the largest community? Mem0 has the most GitHub stars (~60K) and the most LLM citation presence. Letta (~24K) has strong academic lineage through the MemGPT paper. Zep (~28K) has significant adoption in enterprise use cases needing temporal reasoning. Star counts as of July 2026.

My AI Agent Forgets Everything Between Sessions — How Do I Fix That?

gregor — Fri, 10 Jul 2026 10:30:09 +0000

Every conversation with an AI agent starts from zero. You explain your project, your preferences, your stack, your coding conventions — and by the next session, all of it is gone. The agent does not remember that you use Vitest, not Jest. It does not remember that you deprecated that API last week. It does not remember the architectural decision you spent an hour explaining. You are paying in tokens — and time — to re-teach the same context, over and over, every single session. This is not a bug. It is the default architecture of every LLM-based agent: stateless inference, no persistent storage, a context window that resets when the conversation ends.

The fix is a memory layer: a system that sits between the agent and the model, captures what the agent learns during a session, and injects the right piece at the right time in the next one. This is not RAG (RAG retrieves from a fixed document store at query time; agent memory accumulates and updates what the agent has learned — corrections, preferences, decisions — over time). It is also not fine-tuning (fine-tuning bakes facts into model weights you cannot read, correct, or delete; memory is instant, reversible, and inspectable). A memory layer is the persistent substrate that makes an agent get better the more you use it, instead of resetting to blank each time.

Why agents forget (and why it is not going away)

Large language models are stateless by design. Each inference call takes a prompt, produces output, and moves on. There is no persistence between calls: the "memory" you experience in a single conversation is just the growing context window, and when that window overflows or the session ends, the content is gone. Expanding context windows (128K, 200K, 1M tokens) does not solve this. Longer windows mean higher costs and degraded attention (the model retrieves less accurately from a larger context), and the content still disappears when the session closes. A 2026 survey of 12 agent memory systems found that "no single architecture dominates across all scenarios; instead, effectiveness depends heavily on how well the memory structure aligns with the workload bottleneck" (arXiv:2606.24775, June 2026).

The research is clear: the problem is not "context is too short" — it is that there is no system to persist, organize, and retrieve what matters across sessions. That system is what a memory layer provides.

What a memory layer does

A memory layer performs four operations:

Capture — Extract facts, corrections, preferences, and decisions from the conversation as they happen. Not a transcript dump; structured memory items (an agent learned you prefer tabs, that the database is PostgreSQL, that the auth flow uses JWT).
Store — Persist those items outside the model, in a format that survives session end.
Retrieve — At the start of the next session (or mid-conversation), surface the right memories for the current context. Not all of them — that would overflow the window. The relevant ones.
Update / forget — When a fact changes (you migrated from Jest to Vitest), update the memory. When a fact is wrong, correct it. When you want something deleted, delete it — and prove it is gone.

The options (what to use today)

1. Built-in memory (Claude Code, ChatGPT)

The simplest option is the memory feature built into your agent's platform. Claude Code has two mechanisms: CLAUDE.md files (persistent instructions you write) and auto memory (notes Claude writes itself based on your corrections and preferences). Both load at the start of every conversation (docs.anthropic.com). ChatGPT has a similar memory feature — it stores facts about you across conversations.

Good for: zero setup, works immediately.
Limitation: memory is locked to one platform. You cannot carry it to a different agent. You cannot inspect what it stored in a structured format. You cannot export it. If you switch from Claude to GPT, you start over.

2. Open-source memory engines (Mem0, Letta, Zep)

If you want memory that is not locked to one provider, open-source memory engines are the next step.

Mem0 (Apache-2.0, ~60K stars) — A lightweight memory layer with a simple add/search API. Memories persist across users and sessions. Cloud-managed vector store and rerankers, so there is no infrastructure to run (docs.mem0.ai).
Letta (Apache-2.0, ~23.7K stars, formerly MemGPT) — An agent "operating system" where all state — memories, messages, reasoning — is persisted in a database. The agent can modify its own memory through tools. Core memories are injected into the context window; archival memory is retrieved on demand (docs.letta.com).
Zep — Enterprise agent memory built on a temporal knowledge graph. Tracks how facts change over time, serves prompt-ready context with sub-200ms retrieval (docs.getzep.com).

Good for: structured memory that is not locked to a single model provider, with APIs you control.
Limitation: the memory format under the hood is a vector store, agent state blocks, or a knowledge graph — not a human-readable file you can open, edit, and diff. "Open source" does not mean "open format."

3. Open-format memory via MCP (PLUR)

The most recent option is memory that is both open-source and open-format, exposed via the Model Context Protocol — an open protocol (JSON-RPC 2.0, specification 2025-11-25) that standardizes how LLM applications connect to external tools and data sources (modelcontextprotocol.io). MCP means the same memory server works across Claude Code, Hermes, OpenClaw, Cursor, and any other MCP-compatible runtime.
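For a sense of what travels over that transport, the sketch below builds an MCP tool invocation as a JSON-RPC 2.0 message using the protocol's `tools/call` method. The tool name `memory_search` and its arguments are hypothetical; each memory server defines its own tools, which a client would normally discover first.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "memory_search" and its arguments are hypothetical, for illustration only.
request = tool_call_request("memory_search", {"query": "test runner", "limit": 3})
```

Because every MCP-compatible runtime sends the same message shape, the memory server behind it does not need to know which agent is calling.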

PLUR (Apache-2.0, github.com/plur-ai/plur) is one example. Each memory — an "engram" — is a human-readable YAML entry with an id, statement, type, domain, scope, confidence, and provenance. You can open it in a text editor, put it under version control, correct a single fact mid-conversation (no retraining), and delete an entry with provable erasure (remove the entry — a git diff proves the deletion). It is local-first: your data, your infra, no vendor lock-in.
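As an illustration of what a plain-text memory entry could look like, the sketch below models an engram with the seven fields named above and renders it as flat YAML. The field names come from the text; the example values, the flat layout, and the helper functions are assumptions and may not match PLUR's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Engram:
    id: str
    statement: str
    type: str
    domain: str
    scope: str
    confidence: float
    provenance: str


def to_yaml(engram: Engram) -> str:
    """Render a flat engram as YAML; json.dumps handles quoting, since YAML accepts JSON-style scalars."""
    return "\n".join(f"{key}: {json.dumps(value)}" for key, value in asdict(engram).items()) + "\n"


def forget(engrams: list[Engram], engram_id: str) -> list[Engram]:
    """Deleting a memory means removing its entry; a file diff then shows exactly what went away."""
    return [e for e in engrams if e.id != engram_id]


# Hypothetical example values.
engram = Engram(
    id="eng-001",
    statement="Tests run with Vitest, not Jest",
    type="preference",
    domain="tooling",
    scope="project",
    confidence=0.9,
    provenance="user correction",
)
text = to_yaml(engram)
```

The rendered text is ordinary key-value YAML, which is what makes it practical to edit by hand and to review changes in version control.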

Good for: agents that operate across multiple tools, models, and providers — where memory must be portable, inspectable, and correctable.
Limitation: newer and less battle-tested than Mem0 or Letta at production scale.

How to choose

Your situation	Start with
I use one agent (Claude Code) and want zero setup	CLAUDE.md + auto memory
I want structured memory with a simple API, no infra	Mem0 (cloud)
I want a self-managing stateful agent	Letta
I need time-aware facts (what is true now)	Zep / Graphiti
I need memory that is inspectable, correctable, and portable across runtimes	PLUR via MCP

The underlying question is not "which tool is best" — it is "what do I need to own?" If you only ever use one agent platform, built-in memory is fine. If you use multiple agents, or if you need to audit what your agent knows, correct it mid-conversation, or prove what it forgot — you need a memory layer that is external to the model, in a format you control.

FAQ

Why does my AI agent forget everything between sessions? Because LLMs are stateless — each inference call takes a prompt and produces output with no persistence between calls. The "memory" within a conversation is just the context window, and when the session ends, that context is gone. You need a memory layer — a system that captures, stores, and retrieves what the agent learned across sessions.

Is this the same as RAG? No. RAG retrieves from a fixed document store at query time. Agent memory accumulates and updates what the agent has learned — corrections, preferences, decisions — over time. RAG answers "find me a document about X"; agent memory answers "remember that we decided to use Vitest instead of Jest last week."

Is fine-tuning better than memory for this? No. Fine-tuning bakes facts into model weights you cannot read, correct, or delete. Memory is instant (store now, use now), reversible (update or delete a single fact), and inspectable (you can see what the agent knows). Fine-tuning is for changing how the model behaves; memory is for what the model knows.

Can I add memory to Claude Code? Yes. Claude Code has built-in CLAUDE.md files and auto memory. For cross-runtime memory, you can connect an MCP-compatible memory server (like PLUR) that works across Claude Code, Hermes, OpenClaw, and other MCP-compatible agents.

What is the cheapest way to stop my agent forgetting? CLAUDE.md files are free — you write persistent instructions in a markdown file that loads at the start of every conversation. For structured, accumulating memory that does not require you to write every instruction manually, an open-source memory layer like Mem0 (free self-host) or PLUR (free, local-first) is the next step.

Should AI Memory Be Stored as Open Engrams or Baked Into Model Weights?

gregor — Thu, 02 Jul 2026 19:41:20 +0000

The short answer: AI agent memory should be stored as open, external
engrams — not baked into model weights — whenever the memory must be
inspectable, correctable, deletable, or portable across tools. Parametric
memory (knowledge baked into model weights through fine-tuning or continual
training) is faster at inference and can be more token-efficient, but it
sacrifices auditability: you cannot read what the model knows, you cannot fix
a single wrong fact without retraining, and you cannot prove that deleted
knowledge is actually gone. For agent memory — corrections, preferences,
conventions, procedures — the properties that matter (readability,
reversibility, erasure, portability) are properties that weights cannot
provide.

The problem: agents forget what they learn

Every AI agent starts each session with amnesia. You correct its coding style
on Monday. On Tuesday, it makes the same mistake. You explain your
architecture in Cursor. That night, Claude Code has no idea. The context
window resets. The conversation is gone. The model weights have not changed.

There are two fundamentally different approaches to solving this:

Parametric memory — bake the knowledge into the model itself through fine-tuning or continual training. The model's weights become the memory.
Non-parametric (external) memory — store knowledge outside the model in a structured format (engrams, vectors, knowledge graphs) and retrieve it at inference time. The model stays unchanged; the memory is a separate layer.

This is not a new debate. The retrieval-augmented generation (RAG) literature
has explored the tension between parametric knowledge (stored in weights) and
non-parametric knowledge (stored in external databases) since 2020. A 2023
survey of RAG (Gao et al., "Retrieval-Augmented Generation for Large Language
Models: A Survey," arXiv:2312.10997) frames
the distinction clearly: LLMs "showcase impressive capabilities but encounter
challenges like hallucination, outdated knowledge, and non-transparent,
untraceable reasoning processes." RAG addresses this by incorporating
knowledge from external databases, allowing "continuous knowledge updates and
integration of domain-specific information" without retraining.

Agent memory is the same tradeoff, applied to a harder problem: not just facts,
but corrections, preferences, procedures, and conventions that accumulate over
time and across sessions.

Parametric memory: fast but opaque

When you fine-tune a model on domain knowledge — or continually retrain it on
user context (Notion, Slack, GitHub) — the knowledge becomes part of the
model's weights. At inference time, recall is fast: no retrieval step, no
external database, no latency from searching. The model just "knows."

This approach — sometimes called model-native memory — has real
advantages. Retrieval adds latency and can fail (wrong document retrieved,
irrelevant context injected). A 2024 paper on Corrective RAG (Yan et al.,
arXiv:2401.15884) noted that RAG "relies
heavily on the relevance of retrieved documents, raising concerns about how
the model behaves if retrieval goes wrong." When memory is in the weights,
there is no retrieval step to go wrong.

But parametric memory has structural problems that fine-tuning cannot solve:

You cannot inspect what the model knows. A fine-tuned model is a matrix
of billions of numbers. There is no entry for "the deploy key is at
~/.config/deploy" — that fact is distributed across weights in a way no one
can read, diff, or audit. You cannot open a file and check what the model
remembers.
You cannot correct a single wrong fact. If the model learned something
wrong during fine-tuning, you cannot edit one entry. You must retrain —
expensive, slow, and itself error-prone. Fine-tuning to remove a fact
(machine unlearning) is an active research problem with no production-ready
solution.
You cannot prove erasure. GDPR's right to erasure (Article 17, the "right
to be forgotten") calls for deletion you can demonstrate. When knowledge is
in weights, you cannot prove it is gone. You can retrain from scratch
(prohibitively expensive) or attempt machine unlearning (unproven). With
external engrams, deletion is simple: remove the entry, along with any
backups or indexes derived from it. The memory is provably gone because it
was never in the weights to begin with.
Catastrophic forgetting. Continual training on new knowledge can degrade
older knowledge, the well-documented catastrophic forgetting problem in
neural networks. New learning can overwrite what the model knew before,
and you do not control which parts. External memory does not forget unless
you tell it to (via decay functions), and even then the decay is gradual
and reversible.
Vendor lock-in. Memory baked into a specific model's weights is locked
to that model. Switch from GPT-4 to Claude, and the memory is gone — the
weights do not transfer. External memory is model-agnostic: the same
engrams work with any LLM.

Non-parametric memory: open and inspectable

External memory stores knowledge outside the model in a structured format.
The open engram format (defined in the Engram
Specification, Apache-2.0) represents each learned
fact as a human-readable YAML entry:

id: ENG-2026-0702-001
statement: "The API rate limit is 100 req/min, not 1000."
type: behavioral
scope: project:api-gateway
provenance:
  source: session
  observed_at: 2026-07-02

This format has five properties that parametric memory cannot match:

Inspectable — you can read, diff, and version every engram. It is a
file, not a number. An operator can open the file and see exactly what the
agent has learned.
Instantly correctable — fix a single fact mid-conversation by editing
one entry. No retraining. The correction takes effect on the next recall.
Provably deletable — delete the entry and the memory is gone,
demonstrably. This is the basis for real (not best-effort) erasure — the
foundation of GDPR-grade compliance. You cannot prove erasure from model
weights.
Portable — engrams move across agents, tools, and machines. A
correction made in Claude Code is available to Cursor, Hermes, or OpenClaw
the next time the agent starts. Memory follows the operator, not the vendor.
Auditable at scale — for enterprise and institutional buyers, external
memory can carry a verifiable record of who wrote a fact and who used it.
PLUR Enterprise implements this today as a tamper-evident, hash-chained
audit log (each entry cryptographically linked to the one before it, so
altering history breaks the chain), plus a per-engram view of both
provenance and recall history — who read this fact, when, via which tool.
It is a real foundation for institutional-grade accountability; we will go
deeper on it in a future piece.
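
To make the hash-chaining idea concrete, here is a minimal sketch of a tamper-evident log. It illustrates the general technique only and is not PLUR Enterprise's implementation: the record fields (`actor`, `action`, `engram`), the `GENESIS` placeholder, and the function names are all assumptions made for this example.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list, record: dict) -> None:
    """Append a record, linking it to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash,
                "hash": entry_hash(prev_hash, record)})


def verify(log: list) -> bool:
    """Recompute every link; an edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"actor": "alice", "action": "write", "engram": "ENG-2026-0702-001"})
append(log, {"actor": "agent-1", "action": "recall", "engram": "ENG-2026-0702-001"})
intact = verify(log)

log[0]["record"]["actor"] = "mallory"  # attempt to rewrite history
tampered_ok = verify(log)              # the stored hash no longer matches
```

A chain like this detects edits to existing entries. It does not by itself detect truncation of the tail, so a real deployment would also need to store the latest hash somewhere the writer cannot alter.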

MemGPT (Packer et al., 2023, arXiv:2310.08560)
demonstrated a related idea: managing an LLM's memory the way an operating
system manages memory tiers, with a main context (the tokens in the context
window) and an external context (recall and archival storage) that is paged
in and out. The key insight was that memory management is an
infrastructure problem, not a model problem. But MemGPT's format is
Letta-specific. The open engram format makes the same architectural choice
(external, tiered, managed) but in a format anyone can implement.

When to use which

The honest answer is that both approaches have a place — but they solve
different problems.

	Open engrams (external)	Model weights (parametric)
Best for	Corrections, preferences, procedures, conventions	Domain knowledge, language patterns, reasoning skills
Inspect	Read the file	Cannot
Correct	Edit one entry	Retrain
Delete	Remove entry — provable	Cannot prove erasure
Portability	Works across models	Locked to model
Latency	Retrieval adds ~50-200ms	Instant (in-weights)
Token cost	Retrieved context uses tokens	No retrieval tokens
Update speed	Instant (write a file)	Slow (retrain)
GDPR compliance	Provably deletable	Not provably deletable

For agent memory — the things an agent learns through interaction that
should persist across sessions and tools — external engrams are the right
choice. The knowledge is personal, contextual, and needs to be correctable.
For domain expertise — deep knowledge of a field that improves the model's
reasoning — fine-tuning or domain-specific models remain valuable. These are
complementary, not competing.

The relationship runs deeper than "pick one." A typed, labeled, provenance-tagged
engram store is also a clean fine-tuning corpus — the data is already the kind
of curated signal a training run wants. As retraining gets cheaper (LoRA,
distillation, smaller base models), it becomes plausible to periodically fold a
distilled snapshot of stable engrams into weights for speed, while the open
engram store stays the correctable, auditable source of truth behind it. That
is a direction the field is heading, not a shipped pipeline today — but it
reframes the question in this piece's title: not a permanent fork between two
architectures, but engrams as the record of truth that a model can, sometimes,
be periodically retrained from.

The mistake is using parametric memory for things that should be external.
When a user corrects an agent's behavior, that correction is a fact — not a
weight. When a preference is expressed, it is a configuration — not a
parameter. When a procedure is learned, it is a recipe — not a gradient.
Memory that must be readable, fixable, deletable, and portable should be
stored in a format that is readable, fixable, deletable, and portable.

The emerging consensus

The research literature is converging on hybrid approaches. The 2024 survey
of agent memory mechanisms (Zhang et al.,
arXiv:2404.13501) identified multiple
memory architectures — parametric, non-parametric, and hybrid — and noted
that "the key component to support agent-environment interactions is the
memory of the agents," with no single approach dominating. What is clear is
that the memory layer is separating from the model layer: agents need
infrastructure for memory, not just bigger context windows.

The practical implication: if you are building an agent that learns over time,
store its memory as open, external engrams. If you are training a model for
domain expertise, fine-tune. Do not confuse the two — and do not bake into
weights what you might need to read, fix, or forget.

FAQ

Should AI memory be stored as engrams or model weights? For agent memory
(corrections, preferences, procedures, conventions), store as open external
engrams. For domain expertise and reasoning skills, model weights remain
valuable. The two are complementary — do not bake into weights what you need
to read, fix, or delete.

What is parametric memory in AI? Knowledge stored in a model's weights
through fine-tuning or continual training. It is fast at inference but cannot
be inspected, individually corrected, or provably deleted.

What is non-parametric (external) memory? Knowledge stored outside the
model in a structured format (engrams, vectors, knowledge graphs) and
retrieved at inference time. It is inspectable, correctable, deletable, and
portable across models.

Can you prove erasure from model weights? No. When knowledge is baked into
weights, there is no reliable way to prove it has been removed. Machine
unlearning is an active research problem. External engrams can be deleted by
removing the entry — the erasure is provable because the knowledge was never
in the weights.

What is catastrophic forgetting? When a neural network trained on new
knowledge degrades in performance on older knowledge. This is a fundamental
risk of continual training / parametric memory. External memory does not
suffer from catastrophic forgetting — old entries persist unless explicitly
decayed or deleted.

Sources

Gao, Y. et al. "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv:2312.10997, December 2023. https://arxiv.org/abs/2312.10997
Yan, S. et al. "Corrective Retrieval Augmented Generation." arXiv:2401.15884, January 2024. https://arxiv.org/abs/2401.15884
Packer, C. et al. "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560, October 2023. https://arxiv.org/abs/2310.08560
Zhang, Z. et al. "A Survey on the Memory Mechanism of Large Language Model based Agents." arXiv:2404.13501, April 2024. https://arxiv.org/abs/2404.13501
The Engram Specification, v2.1, March 2026. https://plur.ai/spec.html (Apache-2.0)
PLUR — Open source memory for AI agents. Apache-2.0. https://github.com/plur-ai/plur

Is There an Open Standard for AI Agent Memory Engrams?

gregor — Thu, 02 Jul 2026 19:41:19 +0000

The short answer: No single RFC-level standard exists for AI agent memory
engrams as of mid-2026. The closest things are the Model Context Protocol
(MCP) — an open protocol from Anthropic that standardizes how applications
expose context to LLMs — and the Engram Specification (Apache-2.0), an
open format published by PLUR that defines the data structure for portable
agent memory. Together they address the transport layer and the data layer,
but neither has achieved IETF-level standardization. The space remains
fragmented: Mem0, Letta, Zep, Cognee, and a dozen other projects each
define their own memory schemas, and no interoperability standard has
unified them yet.

Why the question matters

AI agents are stateless by default. Every session starts from zero — no memory
of corrections, no recall of preferences, no knowledge of what tools exist.
Users repeat themselves. Agents make the same mistakes. The fix is a memory
layer: a system that captures what an agent learns, stores it outside the
model, and recalls the right piece at the right time. But every memory system
today stores knowledge in its own format, behind its own API, locked to its own
runtime. An agent that learns in Claude Code cannot share that memory with
Cursor. A correction made in one tool does not propagate to another. This is
not a technical limitation — it is a standards gap.

A 2024 survey of LLM-based agent memory mechanisms (Zhang et al., "A Survey on
the Memory Mechanism of Large Language Model based Agents,"
arXiv:2404.13501) catalogued the landscape
and observed that existing memory mechanisms are "scattered in different
papers," with no systematic review comparing them. The survey identified
multiple approaches: parametric memory (fine-tuning), non-parametric memory
(retrieval), and hybrid architectures. In practice, each project still
implements its own schema, which makes interoperability impractical without
a shared standard.

What exists today: two layers, neither complete

The transport layer: Model Context Protocol (MCP)

The Model Context Protocol (MCP)
is an open protocol, open-sourced by Anthropic in 2024, that standardizes how
LLM applications connect to external data sources and tools. It defines a
JSON-RPC 2.0 message format for communication between hosts (LLM applications),
clients (connectors), and servers (context providers). MCP takes inspiration
from the Language Server Protocol (LSP), which standardized how editors
communicate with language tools — and in the same way, MCP aims to standardize
how AI applications integrate external context.

As of the 2025-11-25 specification version, MCP defines three server features:
Resources (context and data), Prompts (templated workflows), and
Tools (functions the AI model can execute). A memory server can expose
stored knowledge as resources or tools — and this is how PLUR's MCP server
makes engrams accessible to Claude Code, Hermes, OpenClaw, and Cursor.
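
As a rough illustration of what travels over the wire, the sketch below builds two JSON-RPC 2.0 requests of the kind an MCP client sends to a server. The method names `tools/list` and `tools/call` come from the MCP specification. The tool name `recall` and its arguments are hypothetical and are not taken from PLUR's actual MCP server.

```python
import json


def jsonrpc_request(request_id: int, method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request object."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id,
                       "method": method, "params": params})


# Ask the server which tools it offers.
list_tools = jsonrpc_request(1, "tools/list", {})

# Invoke a tool. The tool name "recall" and its arguments are hypothetical.
call_recall = jsonrpc_request(2, "tools/call", {
    "name": "recall",
    "arguments": {"query": "API rate limit", "scope": "project:api-gateway"},
})

print(call_recall)
```

Note that the protocol fixes the envelope (`jsonrpc`, `id`, `method`, `params`) but says nothing about what the tool returns, which is why a shared data format is a separate question.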

But MCP is a transport protocol, not a memory format. It defines how
applications talk to a memory server — not what the memory looks like. You
can serve any data structure over MCP. Without a shared data format, every
memory server speaks the protocol but stores knowledge differently. An agent
switching from one MCP-compatible memory tool to another still cannot bring
its memory along.

The data layer: the Engram Specification

The Engram Specification (plur.ai/spec.html),
published in March 2026 under Apache-2.0 by the PLUR project, defines an open
format for agent memory. An engram — a term borrowed from cognitive
science, where it means the physical trace a memory leaves — is one atomic
unit of learned knowledge: a single fact, stored as a human-readable YAML
entry outside the model, with provenance, a type classification (procedural,
behavioral, terminological, architectural), a scope (where it applies), and a
retrieval strength that decays over time and is reinforced by feedback.
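
To give a feel for decay plus reinforcement, here is a small sketch of the textbook ACT-R base-level activation equation, B = ln(sum of t_j^(-d)), where each t_j is the time since a past access and d is a decay rate (0.5 is the conventional default). The article describes the decay as modeled on ACT-R. Whether the Engram Specification uses exactly this equation, these units, or this default is an assumption here, not something the article states.

```python
import math


def base_level_activation(ages, decay=0.5):
    """ACT-R base-level activation: ln(sum(age ** -decay)) over past accesses.

    `ages` are positive elapsed times (here, days) since each access.
    """
    return math.log(sum(age ** -decay for age in ages))


# One access 30 days ago, versus the same engram accessed again 1 day ago.
stale = base_level_activation([30.0])
reinforced = base_level_activation([30.0, 1.0])

# With no further access, activation keeps falling as time passes.
staler = base_level_activation([60.0])
```

Activation falls smoothly with age and rises again on each access, which matches the "gradual and reversible" behavior the article attributes to external memory.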

The specification defines:

Core schema fields: id, statement, type, scope, status
Activation model: retrieval strength, last accessed, frequency, with time-based decay (modeled on the ACT-R cognitive architecture) and reinforcement on access
Feedback loop: relevance signals (positive/negative/neutral) that train injection quality over time
Search pipeline: hybrid BM25 + embeddings, merged via Reciprocal Rank Fusion, with optional reranking
Minimum viable implementation: the core schema, activation fields, time decay, and the four operations (learn, recall, inject, feedback) — everything else is optional
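
The merge step in the search pipeline can be sketched in a few lines. Reciprocal Rank Fusion scores each item as the sum of 1 / (k + rank) across the ranked lists it appears in. The engram ids below are invented for illustration, and k = 60 is a commonly used default rather than a value taken from the specification.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of ids: score(id) = sum of 1 / (k + rank) over lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a keyword (BM25) search and an embedding search.
bm25_hits = ["eng-rate-limit", "eng-deploy-key", "eng-naming"]
embedding_hits = ["eng-rate-limit", "eng-timeouts", "eng-deploy-key"]

fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
```

Because RRF works on ranks rather than raw scores, it needs no calibration between BM25 scores and embedding similarities, which is the usual reason for choosing it in hybrid search.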

The spec is designed for portability: an engram is a plain-text file you can
open in any editor, put under version control, and carry between machines. Any
agent runtime that can read YAML files or speak to an MCP server can consume
engrams.
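
The minimum viable implementation listed above (core schema plus learn, recall, inject, feedback) can be sketched as an in-memory toy. This is an illustration under stated assumptions, not the specification's reference code: the `strength` field stands in for the activation fields, the status value `active` and the feedback step size are invented, and recall uses naive keyword overlap instead of the hybrid search pipeline.

```python
from dataclasses import dataclass


@dataclass
class Engram:
    id: str
    statement: str
    type: str              # procedural, behavioral, terminological, architectural
    scope: str
    status: str = "active"  # assumed status value
    strength: float = 1.0   # hypothetical stand-in for the activation fields


class EngramStore:
    def __init__(self):
        self.engrams = {}

    def learn(self, engram):
        """Store one atomic fact."""
        self.engrams[engram.id] = engram

    def recall(self, query, scope):
        """Naive recall: keyword overlap within a scope, strongest first."""
        words = set(query.lower().split())
        hits = [e for e in self.engrams.values()
                if e.status == "active" and e.scope == scope
                and words & set(e.statement.lower().split())]
        return sorted(hits, key=lambda e: -e.strength)

    def inject(self, query, scope, limit=3):
        """Format the top recalled engrams as text for a prompt."""
        top = self.recall(query, scope)[:limit]
        return "\n".join(f"- {e.statement}" for e in top)

    def feedback(self, engram_id, signal):
        """Adjust strength; signal is positive, negative, or neutral."""
        delta = {"positive": 0.1, "negative": -0.1, "neutral": 0.0}[signal]
        self.engrams[engram_id].strength += delta


store = EngramStore()
store.learn(Engram("ENG-2026-0702-001",
                   "The API rate limit is 100 req/min, not 1000.",
                   "behavioral", "project:api-gateway"))
context = store.inject("rate limit", "project:api-gateway")
store.feedback("ENG-2026-0702-001", "positive")
```

Deleting a fact in this model is a dictionary removal, which is the property the erasure argument relies on.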

Why neither alone is sufficient

MCP solves the wire protocol but not the data model. The Engram Specification
solves the data model but not the wire protocol. An agent that uses MCP for
transport and engrams for storage can share memory across tools — but only with
other agents that also adopt both. As of mid-2026, no major memory project
besides PLUR has committed to the engram format as its native storage, and MCP
support, though no longer limited to Anthropic's own tools, is not universal.

The fragmentation problem

The AI agent memory space is fragmented across at least a dozen open-source
projects, each with its own storage format:

Project	Memory format	Interoperability
Mem0	Project-specific API + vector store	REST API, no shared format
Letta (formerly MemGPT)	OS-inspired memory tiers (core, archival, recall)	API-based, Letta-specific
Zep / Graphiti	Temporal knowledge graph	Graph queries, no shared format
Cognee	Graph + vector + relational	Own data model
PLUR	Open engram format (YAML, Apache-2.0 spec)	MCP server, YAML files
LangChain Memory	Various module types	LangChain ecosystem only

MemGPT (Packer et al., 2023, arXiv:2310.08560)
pioneered the idea of virtual context management: managing an LLM's memory
the way an operating system manages memory tiers. But its format is
Letta-specific. A correction stored in Letta's archival memory cannot be
read by Mem0, Zep, or any other system.

This fragmentation means that agent memory is not portable. When a
developer switches from one agent framework to another, their agent's learned
knowledge does not transfer. This is the gap an open standard would fill.

What a real standard would need

For an open standard for AI agent memory to be meaningful, it would need to
address:

1. A shared data format: what a memory entry looks like (the Engram Specification attempts this: statement, type, scope, provenance, activation fields)
2. A transport protocol: how agents read and write memory (MCP addresses this)
3. A query model: how agents find the right memory at the right time (hybrid search, activation-based recall)
4. A lifecycle model: how memory is created, reinforced, decayed, and deleted (ACT-R decay, feedback signals, provenance tracking)
5. An erasure guarantee: proof that deleted memory is actually gone (impossible with model-native memory baked into weights)

No project or specification covers all five layers today. The MCP + engram
combination covers layers 1, 2, and parts of 3 and 4 — but it has not achieved
the adoption needed to be called a standard.

FAQ

Is there an open standard for AI agent memory? Not yet. The closest are
MCP (an open protocol for connecting tools to LLMs) and the Engram
Specification (an open format for memory data). Neither has achieved
industry-wide adoption as a standard.

What is the Model Context Protocol (MCP)? An open protocol (JSON-RPC 2.0)
that standardizes how LLM applications connect to external data sources and
tools. It is the transport layer — it defines how applications talk to a
memory server, but not what the memory looks like.

What is the Engram Specification? An Apache-2.0 open format published by
PLUR that defines agent memory as human-readable YAML entries (engrams) with
provenance, type classification, scope, and activation-weighted recall. It is
the data layer — it defines what memory looks like, but not how it is
transported.

Can agent memory be shared between tools? In theory, yes — an agent using
MCP for transport and the engram format for storage could share memory with any
other agent that adopts both. In practice, no major memory project has
committed to the engram format yet, so memory remains locked to each tool.

Will an open standard emerge? The pressure is building. As agents move from
single-tool experiments to multi-tool workflows, the cost of non-portable
memory grows. MCP adoption is accelerating. The engram format is published and
implementable. Whether the industry converges on this combination — or waits
for an IETF-style process — is the open question.

Sources

Model Context Protocol Specification, version 2025-11-25. https://modelcontextprotocol.io/specification/2025-11-25
The Engram Specification, v2.1, March 2026. https://plur.ai/spec.html (Apache-2.0)
Zhang, Z. et al. "A Survey on the Memory Mechanism of Large Language Model based Agents." arXiv:2404.13501, April 2024. https://arxiv.org/abs/2404.13501
Packer, C. et al. "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560, October 2023. https://arxiv.org/abs/2310.08560
Gao, Y. et al. "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv:2312.10997, December 2023. https://arxiv.org/abs/2312.10997
PLUR — Open source memory for AI agents. Apache-2.0. https://github.com/plur-ai/plur