Krishna
The Personal Small Model (PSM): Memory as a Learned Cognitive Primitive

The Problem With Every Memory System Today

mem0, Zep, Letta, MemPalace — they all make the same foundational assumption:

Memory is a storage problem.

Build a good enough database. Implement a smart enough retrieval mechanism. Inject the results into the LLM’s context. The model consumes the fragments. The model forgets. The cycle repeats.

This post argues that assumption is architecturally wrong, and proposes an alternative.


The Insight: Memory Is a Cognitive Skill, Not a Database

The human brain didn’t solve long-term memory by building a perfect database. It solved it through specialization:

  • 🧠 Hippocampus — fast episodic capture
  • 🧠 Neocortex — slow semantic consolidation
  • 🧠 Prefrontal cortex — relevance gating
  • 🌙 Sleep — consolidation, pruning, replay

No single system tries to do everything. Each has a narrow, trainable job.

The Personal Small Model (PSM) mirrors this exactly.


What is the PSM?

The PSM is a small model (1–3B parameters) trained not to store user content, but to master memory operations:

  • Relevance gating — what’s worth remembering at all?
  • Consolidation — when do episodic events become semantic facts?
  • Recall weighting — how strongly should this memory be surfaced?
  • Interference detection — does new info contradict old beliefs?
  • Decay scheduling — how quickly should different memory types fade?
  • Sleep-time reorganization — background consolidation between sessions

The PSM doesn’t decide what is true. It decides what is worth remembering, how strongly, and for how long.
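These operations can be pictured as a small, fixed vocabulary the PSM emits over an external store. A minimal Python sketch of that contract, using illustrative names (`MemoryOp`, `GateDecision`) that are not from the paper:

```python
from dataclasses import dataclass
from enum import Enum, auto

class MemoryOp(Enum):
    """The narrow operations the PSM is trained to master (names are illustrative)."""
    RELEVANCE_GATE = auto()       # is this worth remembering at all?
    CONSOLIDATE = auto()          # promote episodic events to semantic facts
    WEIGHT_RECALL = auto()        # how strongly to surface a memory
    DETECT_INTERFERENCE = auto()  # does new info contradict old beliefs?
    SCHEDULE_DECAY = auto()       # how fast should this memory type fade?

@dataclass
class GateDecision:
    """Hypothetical output of the relevance gate."""
    remember: bool       # passed the gate at all
    strength: float      # initial memory strength in [0, 1]
    suggested_tier: str  # e.g. "episodic" or "semantic"
```

The point of the enum is its size: the PSM's job is narrow and enumerable, which is what makes a 1–3B model plausible for it.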


The Critical Architectural Insight

```
PSM weights   →  shared, stable, trained once   (the skill of memory)
Memory store  →  per-user, dynamic, personal    (the content of memory)
```

The PSM’s weights never store user content.

This means:

  • ✅ No catastrophic forgetting — user data never enters the weights
  • ✅ No privacy leakage between users — memory stores are fully isolated
  • ✅ No modification to the large LLM — it just receives better context
  • ✅ One model serves all users — only the memory store is personal
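The split is easy to sketch: one shared set of weights, a dictionary of per-user stores, and a hard rule that content only ever flows into the store. All names here (`PSMService`, `MemoryStore`, `gate`) are hypothetical stand-ins, not the paper's API:

```python
from collections import defaultdict

class MemoryStore(list):
    """Toy per-user store: user content lives here, never in the weights."""

def gate(weights, event):
    # Stand-in for the learned relevance gate; here, keep any non-empty event.
    return bool(event)

class PSMService:
    """One set of PSM weights serves every user; only the store is personal."""

    def __init__(self, psm_weights):
        self.weights = psm_weights               # shared, stable, trained once
        self.stores = defaultdict(MemoryStore)   # per-user, dynamic, personal

    def remember(self, user_id, event):
        # The shared skill decides; the personal store holds the content.
        if gate(self.weights, event):
            self.stores[user_id].append(event)
```

Because `remember` never writes to `self.weights`, isolation between users falls out of the structure rather than relying on training-time guarantees.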

The Memory Tier Hierarchy

| Tier | Brain Analogue | Lifespan | PSM Role |
| --- | --- | --- | --- |
| Sensory buffer | Iconic memory | Seconds | Relevance gate |
| Working memory | Active context | Session | Context window |
| Episodic store | Hippocampus | Days–weeks | Consolidation decisions |
| Semantic store | Neocortex | Months–permanent | Pattern abstraction |
| Archival store | Cold storage | Permanent | Compressed, never deleted |

Each memory entry carries PSM-managed metadata — strength, decay rate, recall count, emotional weight, confidence, and provenance tracing back to source episodic events.
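That metadata, together with the decay scheduling described earlier, might look like the following. The field names and the decay formula are illustrative assumptions, not taken from the paper:

```python
import math
import time
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    """A single entry plus the PSM-managed metadata it carries."""
    content: str
    strength: float = 1.0        # recall strength in [0, 1]
    decay_rate: float = 0.05     # per-day exponential decay, set by the PSM
    recall_count: int = 0        # reinforcement: each recall slows forgetting
    emotional_weight: float = 0.0
    confidence: float = 1.0
    provenance: list = field(default_factory=list)  # source episodic event IDs
    created_at: float = field(default_factory=time.time)

    def current_strength(self, now=None):
        """Exponential decay, dampened by how often the memory was recalled."""
        if now is None:
            now = time.time()
        age_days = (now - self.created_at) / 86400
        effective_rate = self.decay_rate / (1 + self.recall_count)
        return self.strength * math.exp(-effective_rate * age_days)
```

The dampening term is one possible way to encode "frequently recalled memories fade more slowly"; the real schedule would be a learned output of the PSM, not a closed-form formula.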


Training the PSM

The PSM is trained on memory operations, not user content. The training signal is downstream utility:

  • Did the LLM perform better when this memory was retrieved? → reinforce
  • Was this retrieved memory irrelevant? → decay its weight
  • Did the user correct the LLM? → strongest negative signal — memory pipeline failed somewhere

This is reinforcement learning on memory utility. The PSM learns what’s worth remembering by observing what actually helped.
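A toy version of that reward mapping, assuming hypothetical outcome fields and made-up reward magnitudes:

```python
def memory_reward(outcome):
    """Map a downstream outcome to a scalar training signal for the PSM.

    The `outcome` keys and the magnitudes below are assumptions for
    illustration, not a published reward scheme.
    """
    if outcome["user_corrected_llm"]:
        return -1.0   # strongest negative: the pipeline failed somewhere
    if not outcome["memory_was_retrieved"]:
        return 0.0    # no memory involved, no credit to assign
    if outcome["memory_was_used_in_answer"]:
        return 1.0    # retrieval helped: reinforce gate and recall weighting
    return -0.2       # retrieved but irrelevant: decay its weight
```

Note the asymmetry: a user correction outweighs an irrelevant retrieval, because it signals a failure anywhere in the gate–consolidate–recall chain rather than a single noisy lookup.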


Sleep-Time Consolidation

Asynchronously, after sessions end, the PSM runs a consolidation loop:

```python
for user_shard in user_shards:
    episodes = user_shard.fetch_recent_episodic(since=last_consolidation)

    patterns = PSM.extract_semantic_patterns(episodes)
    for pattern in patterns:
        if pattern.confidence > threshold:
            semantic_store.upsert(pattern)
            conflicts = semantic_store.find_conflicts(pattern)
            if conflicts:
                semantic_store.flag_for_review(conflicts)

    semantic_store.apply_decay(decay_schedule)
    semantic_store.apply_reinforcement(access_log)
    episodic_store.prune(covered_by=semantic_store)
```

The user’s next session begins with a reorganized, consolidated memory store, at no added retrieval-time cost: all of this work happens offline, between sessions.
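The `apply_decay(decay_schedule)` step can be made concrete with a per-tier half-life table mirroring the tier hierarchy above. The numbers here are invented for illustration:

```python
# Illustrative per-tier decay schedule (half-lives in days; the values are
# made up to mirror the tier lifespans, not taken from the paper).
DECAY_SCHEDULE = {
    "sensory": 10 / 86400,    # half-life of roughly 10 seconds
    "episodic": 7.0,          # half-life of roughly a week
    "semantic": 180.0,        # half-life of roughly six months
    "archival": float("inf"), # never decays: compressed, never deleted
}

def apply_decay(entries, schedule, elapsed_days):
    """Halve each entry's strength once per half-life; drop what fades out."""
    survivors = []
    for tier, strength, content in entries:
        half_life = schedule[tier]
        strength *= 0.5 ** (elapsed_days / half_life)
        if strength > 0.01:  # prune anything below a noise floor
            survivors.append((tier, strength, content))
    return survivors
```

Running this weekly would leave archival entries untouched, halve week-old episodic entries, and silently drop anything still sitting in the sensory buffer.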


How This Differs from Existing Work

| System | Key Difference |
| --- | --- |
| Letta / MemGPT | The LLM manages its own memory via tool calls, so memory operations tax the primary reasoning model. The PSM offloads this entirely. |
| mem0 / Zep | External systems retrieve fragments. The PSM replaces retrieval heuristics with a learned memory-management model. |
| Per-user LoRA adapters | Weights encode user-specific behavior. The PSM explicitly keeps user content out of its weights. |
| Titans (Google) | Neural memory is updated via test-time gradients. The PSM keeps its memory stores separate from any gradient updates. |
| Apple on-device models | The closest architectural analogue, but not explicitly trained on memory operations. |

What’s Still Open

This is a prior art disclosure, not a finished system. The open problems are:

  • Optimal PSM-to-LLM interface via embeddings (requires LLM architecture changes)
  • Cold start problem for new users
  • Exact training curriculum for memory operations
  • Infrastructure for async consolidation at scale

These are tractable engineering problems, not fundamental blockers.


The Core Claim

The field has treated AI memory as a retrieval problem.

This architecture treats it as a cognitive skill problem.

A model that learns the art of remembering — operating on a personal store it curates, running consolidation asynchronously, decaying and strengthening memories based on utility — is architecturally closer to biological memory than any database-backed retrieval system.

That’s not a coincidence. Evolution had a long time to find the right answer.


📄 Full paper (CC0, public domain): https://zenodo.org/records/19647417

