Krishna
The Personal Small Model (PSM): Memory as a Learned Cognitive Primitive

The Problem With Every Memory System Today

mem0, Zep, Letta, MemPalace — they all make the same foundational assumption:

Memory is a storage problem.

Build a good enough database. Implement a smart enough retrieval mechanism. Inject the results into the LLM’s context. The model consumes the fragments. The model forgets. The cycle repeats.

This post argues that assumption is architecturally wrong, and proposes an alternative.


The Insight: Memory Is a Cognitive Skill, Not a Database

The human brain didn’t solve long-term memory by building a perfect database. It solved it through specialization:

  • 🧠 Hippocampus — fast episodic capture
  • 🧠 Neocortex — slow semantic consolidation
  • 🧠 Prefrontal cortex — relevance gating
  • 🌙 Sleep — consolidation, pruning, replay

No single system tries to do everything. Each has a narrow, trainable job.

The Personal Small Model (PSM) mirrors this exactly.


What is the PSM?

The PSM is a small model (1–3B parameters) trained not to store user content, but to master memory operations:

  • Relevance gating — what’s worth remembering at all?
  • Consolidation — when do episodic events become semantic facts?
  • Recall weighting — how strongly should this memory be surfaced?
  • Interference detection — does new info contradict old beliefs?
  • Decay scheduling — how quickly should different memory types fade?
  • Sleep-time reorganization — background consolidation between sessions

The PSM doesn’t decide what is true. It decides what is worth remembering, how strongly, and for how long.
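These operations can be pictured as a small, fixed vocabulary the PSM emits over an external store. A minimal Python sketch of that contract, using illustrative names (`MemoryOp`, `GateDecision`) that are not from the paper:

```python
from dataclasses import dataclass
from enum import Enum, auto

class MemoryOp(Enum):
    """The narrow operations the PSM is trained to master (names are illustrative)."""
    RELEVANCE_GATE = auto()       # is this worth remembering at all?
    CONSOLIDATE = auto()          # promote episodic events to semantic facts
    WEIGHT_RECALL = auto()        # how strongly to surface a memory
    DETECT_INTERFERENCE = auto()  # does new info contradict old beliefs?
    SCHEDULE_DECAY = auto()       # how fast should this memory type fade?

@dataclass
class GateDecision:
    """Hypothetical output of the relevance gate."""
    remember: bool       # passed the gate at all
    strength: float      # initial memory strength in [0, 1]
    suggested_tier: str  # e.g. "episodic" or "semantic"
```

The point of the enum is its size: the PSM's job is narrow and enumerable, which is what makes a 1–3B model plausible for it.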


The Critical Architectural Insight

```
PSM weights   →  shared, stable, trained once   (the skill of memory)
Memory store  →  per-user, dynamic, personal    (the content of memory)
```

The PSM’s weights never store user content.

This means:

  • ✅ No catastrophic forgetting — user data never enters the weights
  • ✅ No privacy leakage between users — memory stores are fully isolated
  • ✅ No modification to the large LLM — it just receives better context
  • ✅ One model serves all users — only the memory store is personal
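The split is easy to sketch: one shared set of weights, a dictionary of per-user stores, and a hard rule that content only ever flows into the store. All names here (`PSMService`, `MemoryStore`, `gate`) are hypothetical stand-ins, not the paper's API:

```python
from collections import defaultdict

class MemoryStore(list):
    """Toy per-user store: user content lives here, never in the weights."""

def gate(weights, event):
    # Stand-in for the learned relevance gate; here, keep any non-empty event.
    return bool(event)

class PSMService:
    """One set of PSM weights serves every user; only the store is personal."""

    def __init__(self, psm_weights):
        self.weights = psm_weights               # shared, stable, trained once
        self.stores = defaultdict(MemoryStore)   # per-user, dynamic, personal

    def remember(self, user_id, event):
        # The shared skill decides; the personal store holds the content.
        if gate(self.weights, event):
            self.stores[user_id].append(event)
```

Because `remember` never writes to `self.weights`, isolation between users falls out of the structure rather than relying on training-time guarantees.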

The Memory Tier Hierarchy

| Tier | Brain Analogue | Lifespan | PSM Role |
| --- | --- | --- | --- |
| Sensory buffer | Iconic memory | Seconds | Relevance gate |
| Working memory | Active context | Session | Context window |
| Episodic store | Hippocampus | Days–weeks | Consolidation decisions |
| Semantic store | Neocortex | Months–permanent | Pattern abstraction |
| Archival store | Cold storage | Permanent | Compressed, never deleted |

Each memory entry carries PSM-managed metadata — strength, decay rate, recall count, emotional weight, confidence, and provenance tracing back to source episodic events.
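That metadata, together with the decay scheduling described earlier, might look like the following. The field names and the decay formula are illustrative assumptions, not taken from the paper:

```python
import math
import time
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    """A single entry plus the PSM-managed metadata it carries."""
    content: str
    strength: float = 1.0        # recall strength in [0, 1]
    decay_rate: float = 0.05     # per-day exponential decay, set by the PSM
    recall_count: int = 0        # reinforcement: each recall slows forgetting
    emotional_weight: float = 0.0
    confidence: float = 1.0
    provenance: list = field(default_factory=list)  # source episodic event IDs
    created_at: float = field(default_factory=time.time)

    def current_strength(self, now=None):
        """Exponential decay, dampened by how often the memory was recalled."""
        if now is None:
            now = time.time()
        age_days = (now - self.created_at) / 86400
        effective_rate = self.decay_rate / (1 + self.recall_count)
        return self.strength * math.exp(-effective_rate * age_days)
```

The dampening term is one possible way to encode "frequently recalled memories fade more slowly"; the real schedule would be a learned output of the PSM, not a closed-form formula.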


Training the PSM

The PSM is trained on memory operations, not user content. The training signal is downstream utility:

  • Did the LLM perform better when this memory was retrieved? → reinforce
  • Was this retrieved memory irrelevant? → decay its weight
  • Did the user correct the LLM? → strongest negative signal — memory pipeline failed somewhere

This is reinforcement learning on memory utility. The PSM learns what’s worth remembering by observing what actually helped.
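A toy version of that reward mapping, assuming hypothetical outcome fields and made-up reward magnitudes:

```python
def memory_reward(outcome):
    """Map a downstream outcome to a scalar training signal for the PSM.

    The `outcome` keys and the magnitudes below are assumptions for
    illustration, not a published reward scheme.
    """
    if outcome["user_corrected_llm"]:
        return -1.0   # strongest negative: the pipeline failed somewhere
    if not outcome["memory_was_retrieved"]:
        return 0.0    # no memory involved, no credit to assign
    if outcome["memory_was_used_in_answer"]:
        return 1.0    # retrieval helped: reinforce gate and recall weighting
    return -0.2       # retrieved but irrelevant: decay its weight
```

Note the asymmetry: a user correction outweighs an irrelevant retrieval, because it signals a failure anywhere in the gate–consolidate–recall chain rather than a single noisy lookup.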


Sleep-Time Consolidation

Asynchronously, after sessions end, the PSM runs a consolidation loop:

```python
for user_shard in user_shards:
    episodes = user_shard.fetch_recent_episodic(since=last_consolidation)

    patterns = PSM.extract_semantic_patterns(episodes)
    for pattern in patterns:
        if pattern.confidence > threshold:
            semantic_store.upsert(pattern)
            conflicts = semantic_store.find_conflicts(pattern)
            if conflicts:
                semantic_store.flag_for_review(conflicts)

    semantic_store.apply_decay(decay_schedule)
    semantic_store.apply_reinforcement(access_log)
    episodic_store.prune(covered_by=semantic_store)
```

The user’s next session begins with a reorganized, consolidated memory store, at no added retrieval-time cost: all of this work happens offline, between sessions.
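The `apply_decay(decay_schedule)` step can be made concrete with a per-tier half-life table mirroring the tier hierarchy above. The numbers here are invented for illustration:

```python
# Illustrative per-tier decay schedule (half-lives in days; the values are
# made up to mirror the tier lifespans, not taken from the paper).
DECAY_SCHEDULE = {
    "sensory": 10 / 86400,    # half-life of roughly 10 seconds
    "episodic": 7.0,          # half-life of roughly a week
    "semantic": 180.0,        # half-life of roughly six months
    "archival": float("inf"), # never decays: compressed, never deleted
}

def apply_decay(entries, schedule, elapsed_days):
    """Halve each entry's strength once per half-life; drop what fades out."""
    survivors = []
    for tier, strength, content in entries:
        half_life = schedule[tier]
        strength *= 0.5 ** (elapsed_days / half_life)
        if strength > 0.01:  # prune anything below a noise floor
            survivors.append((tier, strength, content))
    return survivors
```

Running this weekly would leave archival entries untouched, halve week-old episodic entries, and silently drop anything still sitting in the sensory buffer.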


How This Differs from Existing Work

| System | Key Difference |
| --- | --- |
| Letta / MemGPT | The LLM manages its own memory via tool calls, so memory operations tax the primary reasoning model. The PSM offloads this entirely. |
| mem0 / Zep | External systems retrieve fragments. The PSM replaces retrieval heuristics with a learned memory-management model. |
| Per-user LoRA adapters | Weights encode user-specific behavior. The PSM explicitly keeps user content out of its weights. |
| Titans (Google) | Neural memory is updated via test-time gradients. The PSM keeps its memory stores separate from any gradient updates. |
| Apple on-device models | The closest architectural analogue, but not explicitly trained on memory operations. |

What’s Still Open

This is a prior art disclosure, not a finished system. The open problems are:

  • Optimal PSM-to-LLM interface via embeddings (requires LLM architecture changes)
  • Cold start problem for new users
  • Exact training curriculum for memory operations
  • Infrastructure for async consolidation at scale

These are tractable engineering problems, not fundamental blockers.


The Core Claim

The field has treated AI memory as a retrieval problem.

This architecture treats it as a cognitive skill problem.

A model that learns the art of remembering — operating on a personal store it curates, running consolidation asynchronously, decaying and strengthening memories based on utility — is architecturally closer to biological memory than any database-backed retrieval system.

That’s not a coincidence. Evolution had a long time to find the right answer.


📄 Full paper (CC0, public domain): https://zenodo.org/records/19647417

