Should AI Memory Be Stored as Open Engrams or Baked Into Model Weights?

#ai #agents #memory #opensource

The short answer: AI agent memory should be stored as open, external
engrams — not baked into model weights — whenever the memory must be
inspectable, correctable, deletable, or portable across tools. Parametric
memory (knowledge baked into model weights through fine-tuning or continual
training) is faster at inference and can be more token-efficient, but it
sacrifices auditability: you cannot read what the model knows, you cannot fix
a single wrong fact without retraining, and you cannot prove that deleted
knowledge is actually gone. For agent memory — corrections, preferences,
conventions, procedures — the properties that matter (readability,
reversibility, erasure, portability) are properties that weights cannot
provide.

The problem: agents forget what they learn

Every AI agent starts each session with amnesia. You correct its coding style
on Monday. On Tuesday, it makes the same mistake. You explain your
architecture in Cursor. That night, Claude Code has no idea. The context
window resets. The conversation is gone. The model weights have not changed.

There are two fundamentally different approaches to solving this:

Parametric memory — bake the knowledge into the model itself through fine-tuning or continual training. The model's weights become the memory.
Non-parametric (external) memory — store knowledge outside the model in a structured format (engrams, vectors, knowledge graphs) and retrieve it at inference time. The model stays unchanged; the memory is a separate layer.

This is not a new debate. The retrieval-augmented generation (RAG) literature
has explored the tension between parametric knowledge (stored in weights) and
non-parametric knowledge (stored in external databases) since 2020. A 2023
survey of RAG (Gao et al., "Retrieval-Augmented Generation for Large Language
Models: A Survey," arXiv:2312.10997) frames
the distinction clearly: LLMs "showcase impressive capabilities but encounter
challenges like hallucination, outdated knowledge, and non-transparent,
untraceable reasoning processes." RAG addresses this by incorporating
knowledge from external databases, allowing "continuous knowledge updates and
integration of domain-specific information" without retraining.

Agent memory is the same tradeoff, applied to a harder problem: not just facts,
but corrections, preferences, procedures, and conventions that accumulate over
time and across sessions.

Parametric memory: fast but opaque

When you fine-tune a model on domain knowledge — or continually retrain it on
user context (Notion, Slack, GitHub) — the knowledge becomes part of the
model's weights. At inference time, recall is fast: no retrieval step, no
external database, no latency from searching. The model just "knows."

This approach — sometimes called model-native memory — has real
advantages. Retrieval adds latency and can fail (wrong document retrieved,
irrelevant context injected). A 2024 paper on Corrective RAG (Yan et al.,
arXiv:2401.15884) noted that RAG "relies
heavily on the relevance of retrieved documents, raising concerns about how
the model behaves if retrieval goes wrong." When memory is in the weights,
there is no retrieval step to go wrong.

But parametric memory has structural problems that fine-tuning cannot solve:

You cannot inspect what the model knows. A fine-tuned model is a matrix
of billions of numbers. There is no entry for "the deploy key is at
~/.config/deploy" — that fact is distributed across weights in a way no one
can read, diff, or audit. You cannot open a file and check what the model
remembers.
You cannot correct a single wrong fact. If the model learned something
wrong during fine-tuning, you cannot edit one entry. You must retrain —
expensive, slow, and itself error-prone. Fine-tuning to remove a fact
(machine unlearning) is an active research problem with no production-ready
solution.
You cannot prove erasure. GDPR's right to be forgotten requires
demonstrable deletion. When knowledge is in weights, you cannot prove it is
gone. You can retrain from scratch (prohibitively expensive) or attempt
machine unlearning (unproven). With external engrams, deletion is trivial:
remove the entry. The memory is provably gone because it was never in the
weights to begin with.
Catastrophic forgetting. Continual training on new knowledge degrades
older knowledge — the well-documented catastrophic forgetting problem in
neural networks. Each new thing the model learns pushes out something it
knew before. External memory does not forget unless you tell it to (via
decay functions), and even then the decay is gradual and reversible.
Vendor lock-in. Memory baked into a specific model's weights is locked
to that model. Switch from GPT-4 to Claude, and the memory is gone — the
weights do not transfer. External memory is model-agnostic: the same
engrams work with any LLM.

Non-parametric memory: open and inspectable

External memory stores knowledge outside the model in a structured format.
The open engram format (defined in the Engram
Specification, Apache-2.0) represents each learned
fact as a human-readable YAML entry:

id: ENG-2026-0702-001
statement: "The API rate limit is 100 req/min, not 1000."
type: behavioral
scope: project:api-gateway
provenance:
  source: session
  observed_at: 2026-07-02

This format has five properties that parametric memory cannot match:

Inspectable — you can read, diff, and version every engram. It is a
file, not a number. An operator can open the file and see exactly what the
agent has learned.
Instantly correctable — fix a single fact mid-conversation by editing
one entry. No retraining. The correction takes effect on the next recall.
Provably deletable — delete the entry and the memory is gone,
demonstrably. This is the basis for real (not best-effort) erasure — the
foundation of GDPR-grade compliance. You cannot prove erasure from model
weights.
Portable — engrams move across agents, tools, and machines. A
correction made in Claude Code is available to Cursor, Hermes, or OpenClaw
the next time the agent starts. Memory follows the operator, not the vendor.
Auditable at scale — for enterprise and institutional buyers, external
memory can carry a verifiable record of who wrote a fact and who used it.
PLUR Enterprise implements this today as a tamper-evident, hash-chained
audit log (each entry cryptographically linked to the one before it, so
altering history breaks the chain), plus a per-engram view of both
provenance and recall history — who read this fact, when, via which tool.
It is a real foundation for institutional-grade accountability; we will go
deeper on it in a future piece.

MemGPT (Packer et al., 2023, arXiv:2310.08560)
demonstrated a related idea: treating memory like an operating system manages
memory tiers — fast (context window), main (working memory), and archival
(long-term storage). The key insight was that memory management is an
infrastructure problem, not a model problem. But MemGPT's format is
Letta-specific. The open engram format makes the same architectural choice —
external, tiered, managed — but in a format anyone can implement.

When to use which

The honest answer is that both approaches have a place — but they solve
different problems.

	Open engrams (external)	Model weights (parametric)
Best for	Corrections, preferences, procedures, conventions	Domain knowledge, language patterns, reasoning skills
Inspect	Read the file	Cannot
Correct	Edit one entry	Retrain
Delete	Remove entry — provable	Cannot prove erasure
Portability	Works across models	Locked to model
Latency	Retrieval adds ~50-200ms	Instant (in-weights)
Token cost	Retrieved context uses tokens	No retrieval tokens
Update speed	Instant (write a file)	Slow (retrain)
GDPR compliance	Provably deletable	Not provably deletable

For agent memory — the things an agent learns through interaction that
should persist across sessions and tools — external engrams are the right
choice. The knowledge is personal, contextual, and needs to be correctable.
For domain expertise — deep knowledge of a field that improves the model's
reasoning — fine-tuning or domain-specific models remain valuable. These are
complementary, not competing.

The relationship runs deeper than "pick one." A typed, labeled, provenance-tagged
engram store is also a clean fine-tuning corpus — the data is already the kind
of curated signal a training run wants. As retraining gets cheaper (LoRA,
distillation, smaller base models), it becomes plausible to periodically fold a
distilled snapshot of stable engrams into weights for speed, while the open
engram store stays the correctable, auditable source of truth behind it. That
is a direction the field is heading, not a shipped pipeline today — but it
reframes the question in this piece's title: not a permanent fork between two
architectures, but engrams as the record of truth that a model can, sometimes,
be periodically retrained from.

The mistake is using parametric memory for things that should be external.
When a user corrects an agent's behavior, that correction is a fact — not a
weight. When a preference is expressed, it is a configuration — not a
parameter. When a procedure is learned, it is a recipe — not a gradient.
Memory that must be readable, fixable, deletable, and portable should be
stored in a format that is readable, fixable, deletable, and portable.

The emerging consensus

The research literature is converging on hybrid approaches. The 2024 survey
of agent memory mechanisms (Zhang et al.,
arXiv:2404.13501) identified multiple
memory architectures — parametric, non-parametric, and hybrid — and noted
that "the key component to support agent-environment interactions is the
memory of the agents," with no single approach dominating. What is clear is
that the memory layer is separating from the model layer: agents need
infrastructure for memory, not just bigger context windows.

The practical implication: if you are building an agent that learns over time,
store its memory as open, external engrams. If you are training a model for
domain expertise, fine-tune. Do not confuse the two — and do not bake into
weights what you might need to read, fix, or forget.

FAQ

Should AI memory be stored as engrams or model weights? For agent memory
(corrections, preferences, procedures, conventions), store as open external
engrams. For domain expertise and reasoning skills, model weights remain
valuable. The two are complementary — do not bake into weights what you need
to read, fix, or delete.

What is parametric memory in AI? Knowledge stored in a model's weights
through fine-tuning or continual training. It is fast at inference but cannot
be inspected, individually corrected, or provably deleted.

What is non-parametric (external) memory? Knowledge stored outside the
model in a structured format (engrams, vectors, knowledge graphs) and
retrieved at inference time. It is inspectable, correctable, deletable, and
portable across models.

Can you prove erasure from model weights? No. When knowledge is baked into
weights, there is no reliable way to prove it has been removed. Machine
unlearning is an active research problem. External engrams can be deleted by
removing the entry — the erasure is provable because the knowledge was never
in the weights.

What is catastrophic forgetting? When a neural network trained on new
knowledge degrades in performance on older knowledge. This is a fundamental
risk of continual training / parametric memory. External memory does not
suffer from catastrophic forgetting — old entries persist unless explicitly
decayed or deleted.

Sources

Gao, Y. et al. "Retrieval-Augmented Generation for Large Language Models: A Survey." arXiv:2312.10997, December 2023. https://arxiv.org/abs/2312.10997
Yan, S. et al. "Corrective Retrieval Augmented Generation." arXiv:2401.15884, January 2024. https://arxiv.org/abs/2401.15884
Packer, C. et al. "MemGPT: Towards LLMs as Operating Systems." arXiv:2310.08560, October 2023. https://arxiv.org/abs/2310.08560
Zhang, Z. et al. "A Survey on the Memory Mechanism of Large Language Model based Agents." arXiv:2404.13501, April 2024. https://arxiv.org/abs/2404.13501
The Engram Specification, v2.1, March 2026. https://plur.ai/spec.html (Apache-2.0)
PLUR — Open source memory for AI agents. Apache-2.0. https://github.com/plur-ai/plur