<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Krishna</title>
    <description>The latest articles on DEV Community by Krishna (@chkrishna2001).</description>
    <link>https://dev.to/chkrishna2001</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2637676%2F99b46d91-5997-4235-a9a6-9e21531de790.png</url>
      <title>DEV Community: Krishna</title>
      <link>https://dev.to/chkrishna2001</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/chkrishna2001"/>
    <language>en</language>
    <item>
      <title>The Personal Small Model (PSM): Memory as a Learned Cognitive Primitive</title>
      <dc:creator>Krishna</dc:creator>
      <pubDate>Sun, 19 Apr 2026 05:29:17 +0000</pubDate>
      <link>https://dev.to/chkrishna2001/the-personal-small-model-psm-memory-as-a-learned-cognitive-primitive-324f</link>
      <guid>https://dev.to/chkrishna2001/the-personal-small-model-psm-memory-as-a-learned-cognitive-primitive-324f</guid>
<description>&lt;h2&gt;The Problem With Every Memory System Today&lt;/h2&gt;

&lt;p&gt;mem0, Zep, Letta, MemPalace — they all make the same foundational assumption:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Memory is a storage problem.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Build a good enough database. Implement a smart enough retrieval mechanism. Inject the results into the LLM’s context. The model consumes the fragments. The model forgets. The cycle repeats.&lt;/p&gt;

&lt;p&gt;This post argues that assumption is architecturally wrong, and proposes an alternative.&lt;/p&gt;




&lt;h2&gt;The Insight: Memory Is a Cognitive Skill, Not a Database&lt;/h2&gt;

&lt;p&gt;The human brain didn’t solve long-term memory by building a perfect database. It solved it through &lt;strong&gt;specialization&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🧠 &lt;strong&gt;Hippocampus&lt;/strong&gt; — fast episodic capture&lt;/li&gt;
&lt;li&gt;🧠 &lt;strong&gt;Neocortex&lt;/strong&gt; — slow semantic consolidation&lt;/li&gt;
&lt;li&gt;🧠 &lt;strong&gt;Prefrontal cortex&lt;/strong&gt; — relevance gating&lt;/li&gt;
&lt;li&gt;🌙 &lt;strong&gt;Sleep&lt;/strong&gt; — consolidation, pruning, replay&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No single system tries to do everything. Each has a narrow, trainable job.&lt;/p&gt;

&lt;p&gt;The Personal Small Model (PSM) mirrors this exactly.&lt;/p&gt;




&lt;h2&gt;What Is the PSM?&lt;/h2&gt;

&lt;p&gt;The PSM is a &lt;strong&gt;small model (1–3B parameters)&lt;/strong&gt; trained not to store user content, but to master &lt;strong&gt;memory operations&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Relevance gating — what’s worth remembering at all?&lt;/li&gt;
&lt;li&gt;Consolidation — when do episodic events become semantic facts?&lt;/li&gt;
&lt;li&gt;Recall weighting — how strongly should this memory be surfaced?&lt;/li&gt;
&lt;li&gt;Interference detection — does new info contradict old beliefs?&lt;/li&gt;
&lt;li&gt;Decay scheduling — how quickly should different memory types fade?&lt;/li&gt;
&lt;li&gt;Sleep-time reorganization — background consolidation between sessions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The PSM doesn’t decide what is &lt;em&gt;true&lt;/em&gt;. It decides what is &lt;em&gt;worth remembering&lt;/em&gt;, how strongly, and for how long.&lt;/p&gt;
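&lt;p&gt;As a rough sketch (all names and heuristics here are hypothetical, not from the paper), the operation set above amounts to a narrow interface the PSM would implement — in the real design each method is learned, not hand-written:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Episode:
    text: str
    recall_count: int = 0

class PSMInterface:
    """Hypothetical sketch of the PSM's narrow operation surface.
    In the real design these behaviours are learned, not heuristic."""

    def relevance_gate(self, episode):
        # What's worth remembering at all? Score in 0..1.
        return min(1.0, len(episode.text) / 100.0)

    def recall_weight(self, episode, query):
        # How strongly should this memory be surfaced for this query?
        overlap = set(episode.text.lower().split()).intersection(query.lower().split())
        return len(overlap) / max(1, len(query.split()))

    def decay_rate(self, tier):
        # How quickly should different memory types fade? (per-day rates)
        return {"episodic": 0.1, "semantic": 0.01, "archival": 0.0}[tier]
```

&lt;p&gt;The point of the sketch is the shape of the job: every method scores or schedules, and none of them generates answers.&lt;/p&gt;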




&lt;h2&gt;The Critical Architectural Insight&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PSM weights    →  shared, stable, trained once (the skill of memory)
Memory store   →  per-user, dynamic, personal (the content of memory)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The PSM’s weights never store user content.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;✅ No catastrophic forgetting — user data never enters the weights&lt;/li&gt;
&lt;li&gt;✅ No privacy leakage between users — memory stores are fully isolated&lt;/li&gt;
&lt;li&gt;✅ No modification to the large LLM — it just receives better context&lt;/li&gt;
&lt;li&gt;✅ One model serves all users — only the memory store is personal&lt;/li&gt;
&lt;/ul&gt;
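&lt;p&gt;A minimal sketch of this split, with all names hypothetical: one shared PSM instance serves every user, while each user's content lives only in an isolated store the weights never absorb:&lt;/p&gt;

```python
class MemoryStore:
    """Per-user, dynamic, personal: the content of memory."""
    def __init__(self):
        self.entries = []              # user content lives only here

    def add(self, entry):
        self.entries.append(entry)

class SharedPSM:
    """Shared, stable, trained once: the skill of memory."""
    def score(self, entry):
        # Stateless with respect to users: the same weights for everyone.
        return min(1.0, len(str(entry)) / 80.0)

psm = SharedPSM()                      # one instance for all users
stores = {"alice": MemoryStore(), "bob": MemoryStore()}
stores["alice"].add("prefers dark mode")
# bob's store is untouched: isolation gives privacy by construction
```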




&lt;h2&gt;The Memory Tier Hierarchy&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Brain Analogue&lt;/th&gt;
&lt;th&gt;Lifespan&lt;/th&gt;
&lt;th&gt;PSM Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Sensory Buffer&lt;/td&gt;
&lt;td&gt;Iconic memory&lt;/td&gt;
&lt;td&gt;Seconds&lt;/td&gt;
&lt;td&gt;Relevance gate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Working Memory&lt;/td&gt;
&lt;td&gt;Active context&lt;/td&gt;
&lt;td&gt;Session&lt;/td&gt;
&lt;td&gt;Context window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Episodic Store&lt;/td&gt;
&lt;td&gt;Hippocampus&lt;/td&gt;
&lt;td&gt;Days–weeks&lt;/td&gt;
&lt;td&gt;Consolidation decisions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Store&lt;/td&gt;
&lt;td&gt;Neocortex&lt;/td&gt;
&lt;td&gt;Months–permanent&lt;/td&gt;
&lt;td&gt;Pattern abstraction&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Archival Store&lt;/td&gt;
&lt;td&gt;Cold storage&lt;/td&gt;
&lt;td&gt;Permanent&lt;/td&gt;
&lt;td&gt;Compressed, never deleted&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each memory entry carries PSM-managed metadata — strength, decay rate, recall count, emotional weight, confidence, and provenance tracing back to source episodic events.&lt;/p&gt;
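&lt;p&gt;To make that metadata concrete, here is one possible entry shape (fields and the forgetting-curve constants are illustrative assumptions), using exponential decay slowed by successful recalls:&lt;/p&gt;

```python
import math
from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    """Sketch of PSM-managed metadata on a single memory entry."""
    content: str
    strength: float = 1.0        # current retention strength
    decay_rate: float = 0.1      # per-day; PSM assigns this per tier
    recall_count: int = 0
    emotional_weight: float = 0.0
    confidence: float = 0.5
    provenance: list = field(default_factory=list)  # source episode ids

    def retention(self, days_elapsed):
        # Exponential forgetting curve; each successful recall
        # slows the decay, mirroring spaced-repetition intuition.
        boost = 1.0 + 0.2 * self.recall_count
        return self.strength * math.exp(-self.decay_rate * days_elapsed / boost)
```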




&lt;h2&gt;Training the PSM&lt;/h2&gt;

&lt;p&gt;The PSM is trained on &lt;strong&gt;memory operations&lt;/strong&gt;, not user content. The training signal is downstream utility:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Did the LLM perform better when this memory was retrieved? → reinforce&lt;/li&gt;
&lt;li&gt;Was this retrieved memory irrelevant? → decay its weight&lt;/li&gt;
&lt;li&gt;Did the user correct the LLM? → &lt;strong&gt;strongest negative signal&lt;/strong&gt; — memory pipeline failed somewhere&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is reinforcement learning on memory utility. The PSM learns what’s worth remembering by observing what actually helped.&lt;/p&gt;
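&lt;p&gt;The three signals above can be sketched as a single reward function (the magnitudes are illustrative, not from the paper):&lt;/p&gt;

```python
def memory_reward(was_retrieved, improved_answer, user_corrected):
    """Map downstream outcomes to a scalar reward for the PSM."""
    if user_corrected:
        return -1.0       # strongest negative signal: pipeline failed
    if was_retrieved and improved_answer:
        return 1.0        # retrieval helped the LLM: reinforce
    if was_retrieved:
        return -0.2       # irrelevant retrieval: decay its weight
    return 0.0            # memory was not involved in this turn
```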




&lt;h2&gt;Sleep-Time Consolidation&lt;/h2&gt;

&lt;p&gt;Asynchronously, after each session ends, the PSM runs a consolidation loop:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;each&lt;/span&gt; &lt;span class="n"&gt;user_shard&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;episodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_recent_episodic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;since&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;last_consolidation&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PSM&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extract_semantic_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;episodes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;patterns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;confidence&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upsert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;conflicts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_conflicts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pattern&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;conflicts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flag_for_review&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;conflicts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_decay&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;decay_schedule&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;apply_reinforcement&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;access_log&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;episodic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prune&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;covered_by&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;semantic_store&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The user’s next session begins with a reorganized, consolidated memory store — without any increase in retrieval cost.&lt;/p&gt;




&lt;h2&gt;How This Differs from Existing Work&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Key Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Letta / MemGPT&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;LLM manages its own memory via tool calls — memory operations tax the primary reasoning model. PSM offloads this entirely.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;mem0 / Zep&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;External systems retrieve fragments. PSM replaces retrieval with a &lt;em&gt;learned&lt;/em&gt; memory management model.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;LoRA adapters per user&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Weights encode user-specific behavior. PSM explicitly avoids user content in weights.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Titans (Google)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Neural memory updated via test-time gradients. PSM keeps memory stores separate from any gradient updates.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Apple on-device models&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Closest analogue architecturally, but not trained on memory operations explicitly.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;What’s Still Open&lt;/h2&gt;

&lt;p&gt;This is a prior art disclosure, not a finished system. The open problems are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimal PSM-to-LLM interface via embeddings (requires LLM architecture changes)&lt;/li&gt;
&lt;li&gt;Cold start problem for new users&lt;/li&gt;
&lt;li&gt;Exact training curriculum for memory operations&lt;/li&gt;
&lt;li&gt;Infrastructure for async consolidation at scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are tractable engineering problems, not fundamental blockers.&lt;/p&gt;




&lt;h2&gt;The Core Claim&lt;/h2&gt;

&lt;p&gt;The field has treated AI memory as a retrieval problem.&lt;/p&gt;

&lt;p&gt;This architecture treats it as a &lt;strong&gt;cognitive skill problem&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A model that learns the art of remembering — operating on a personal store it curates, running consolidation asynchronously, decaying and strengthening memories based on utility — is architecturally closer to biological memory than any database-backed retrieval system.&lt;/p&gt;

&lt;p&gt;That’s not a coincidence. Evolution had a long time to find the right answer.&lt;/p&gt;




&lt;p&gt;📄 &lt;strong&gt;Full paper (CC0, public domain):&lt;/strong&gt; &lt;a href="https://zenodo.org/records/19647417" rel="noopener noreferrer"&gt;https://zenodo.org/records/19647417&lt;/a&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>agents</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
