<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Amjad Shahzad</title>
    <description>The latest articles on DEV Community by Amjad Shahzad (@pg-amjad).</description>
    <link>https://dev.to/pg-amjad</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3996859%2F422f942d-17ea-4b47-a057-2f01bf72f90d.jpg</url>
      <title>DEV Community: Amjad Shahzad</title>
      <link>https://dev.to/pg-amjad</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pg-amjad"/>
    <language>en</language>
    <item>
      <title>Decay-aware agent memory in one exact Postgres query</title>
      <dc:creator>Amjad Shahzad</dc:creator>
      <pubDate>Mon, 22 Jun 2026 11:59:06 +0000</pubDate>
      <link>https://dev.to/pg-amjad/decay-aware-agent-memory-in-one-exact-postgres-query-1h8l</link>
      <guid>https://dev.to/pg-amjad/decay-aware-agent-memory-in-one-exact-postgres-query-1h8l</guid>
      <description>&lt;p&gt;Most "agent memory" is just a vector search. You embed what the agent said, store it, and at recall time you do a nearest-neighbor lookup. It works, until you notice that a note from three weeks ago ranks exactly the same as one from three minutes ago. My assistant would confidently resurface a preference I had changed months earlier.&lt;/p&gt;

&lt;p&gt;That is not memory. It is a filing cabinet with good search.&lt;/p&gt;

&lt;p&gt;I wanted recall to rank by &lt;strong&gt;similarity x importance x recency&lt;/strong&gt;: a fresh, important memory should beat a slightly-more-similar but stale one, and trivial old memories should fade. This post is about the one idea that made that cheap and exact, and it ended up as a small Postgres extension called &lt;a href="https://github.com/pg-amjad/pgmemai" rel="noopener noreferrer"&gt;pgmemai&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The obvious approach, and why it falls short
&lt;/h2&gt;

&lt;p&gt;The naive version is "over-fetch by similarity, then re-rank":&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
       &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;lambda&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;age_days&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;AS&lt;/span&gt; &lt;span class="n"&gt;score&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;q&lt;/span&gt;      &lt;span class="c1"&gt;-- nearest by cosine&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;                       &lt;span class="c1"&gt;-- grab a big candidate pool&lt;/span&gt;
&lt;span class="c1"&gt;-- ... then re-sort by score in app code, take top 10&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The problem: the memory that &lt;em&gt;should&lt;/em&gt; win on importance and recency is often &lt;strong&gt;not&lt;/strong&gt; in the similarity-top-K at all. So you have to fetch a large candidate pool to even have a chance of seeing it, and you still miss high-importance or recent-but-moderately-similar memories that fell outside the pool. You are fighting your own index.&lt;/p&gt;
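
&lt;p&gt;A minimal sketch of that failure mode, in plain Python rather than SQL. The three rows, the decay rate, and the pool size are all made up for illustration; the only thing taken from the query above is the score formula.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

LAMBDA = 0.05  # decay rate per day (hypothetical value)

# (name, cosine similarity to the query, importance, age in days) - made-up rows
memories = [
    ("old, near-duplicate", 0.95, 0.2, 90.0),
    ("stale but similar", 0.90, 0.5, 60.0),
    ("fresh and important", 0.70, 1.0, 1.0),
]

def score(row):
    _, sim, importance, age_days = row
    return sim * importance * math.exp(-LAMBDA * age_days)

def overfetch_then_rerank(rows, pool_size, k):
    # Step 1: the candidate pool is chosen by similarity alone.
    pool = sorted(rows, key=lambda r: r[1], reverse=True)[:pool_size]
    # Step 2: only the pool is re-ranked by the full score.
    return [r[0] for r in sorted(pool, key=score, reverse=True)[:k]]

def exact_top_k(rows, k):
    # Reference answer: rank every row by the full score.
    return [r[0] for r in sorted(rows, key=score, reverse=True)[:k]]

naive_best = overfetch_then_rerank(memories, pool_size=2, k=1)
exact_best = exact_top_k(memories, k=1)
print(naive_best, exact_best)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a pool of two, the fresh memory never reaches the re-ranker, so the two answers disagree. Growing the pool fixes this toy case, but only by scanning more rows.&lt;/p&gt;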

&lt;h2&gt;
  
  
  The trick: fold the objective into the vector
&lt;/h2&gt;

&lt;p&gt;The score I want is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Watch what happens if I bake importance and recency into the stored vector at insert time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;embedding_wd&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now take the inner product of a normalized query with that folded vector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nf"&gt;unit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;embedding_wd&lt;/span&gt;
  &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cos&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compare that to the score I actually want. They differ only by a factor of &lt;code&gt;exp(-lambda * now)&lt;/code&gt;, and &lt;code&gt;exp(-lambda * now)&lt;/code&gt; is &lt;strong&gt;the same positive constant for every row in a given query&lt;/strong&gt;. Multiplying every score by one positive constant rescales them without changing the top-K ordering.&lt;/p&gt;

&lt;p&gt;Two facts make this hold:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;exp(-lambda * now)&lt;/code&gt; is a per-query constant, so it drops out of the ranking.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;created_at&lt;/code&gt; is immutable, so &lt;code&gt;exp(lambda * created_at)&lt;/code&gt; is computed &lt;strong&gt;once at insert&lt;/strong&gt; and never needs updating.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;So a single plain inner-product nearest-neighbor search over &lt;code&gt;embedding_wd&lt;/code&gt; ranks rows by the full &lt;code&gt;similarity x importance x recency&lt;/code&gt; objective, &lt;strong&gt;exactly&lt;/strong&gt;. No re-ranking pass. No background job re-scoring rows as time passes. No special time-aware index.&lt;/p&gt;
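
&lt;p&gt;The identity is easy to check numerically. The sketch below is pure Python with random data, not the extension's code: &lt;code&gt;fold&lt;/code&gt;, &lt;code&gt;full_score&lt;/code&gt;, the 8-dimensional vectors, and the day-based timestamps are assumptions for illustration. It ranks the same rows twice, once by the full score and once by a plain inner product against the folded vectors.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math
import random

LAMBDA = 0.05  # decay rate per day (hypothetical value)

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fold(embedding, importance, created_day):
    # Insert-time step: scale the unit vector by importance * exp(lambda * created_at).
    w = importance * math.exp(LAMBDA * created_day)
    return [x * w for x in unit(embedding)]

def full_score(query, embedding, importance, created_day, now_day):
    # The objective itself: cosine * importance * exp(-lambda * age).
    cos = dot(unit(query), unit(embedding))
    return cos * importance * math.exp(-LAMBDA * (now_day - created_day))

rng = random.Random(0)
now_day = 400.0
# Each row is (embedding, importance, created_day), all randomly generated.
rows = [
    ([rng.gauss(0, 1) for _ in range(8)], rng.uniform(0.1, 1.0), rng.uniform(0.0, 400.0))
    for _ in range(200)
]
query = [rng.gauss(0, 1) for _ in range(8)]
q = unit(query)

folded = [fold(e, imp, t) for e, imp, t in rows]
by_fold = sorted(range(len(rows)), key=lambda i: dot(q, folded[i]), reverse=True)
by_score = sorted(range(len(rows)), key=lambda i: full_score(query, *rows[i], now_day), reverse=True)

print(by_fold[:10] == by_score[:10])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Note that &lt;code&gt;now_day&lt;/code&gt; appears only in the reference ranking. The folded side never sees the current time, which is why nothing has to be recomputed as rows age.&lt;/p&gt;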

&lt;h2&gt;
  
  
  What it looks like in Postgres
&lt;/h2&gt;

&lt;p&gt;It is built on &lt;a href="https://github.com/pgvector/pgvector" rel="noopener noreferrer"&gt;pgvector&lt;/a&gt;. A &lt;code&gt;BEFORE INSERT&lt;/code&gt; trigger computes the folded vector:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- inside a BEFORE INSERT trigger:&lt;/span&gt;
&lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;importance&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lambda&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;epoch_day&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding_wd&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;l2_normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;NEW&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;   &lt;span class="c1"&gt;-- scale the unit vector by w&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The folded column gets an HNSW index with inner-product ops:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;INDEX&lt;/span&gt; &lt;span class="k"&gt;ON&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt; &lt;span class="k"&gt;USING&lt;/span&gt; &lt;span class="n"&gt;hnsw&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;embedding_wd&lt;/span&gt; &lt;span class="n"&gt;vector_ip_ops&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



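&lt;p&gt;And recall is one indexed top-K (&lt;code&gt;&amp;lt;#&amp;gt;&lt;/code&gt; is pgvector's inner-product operator; it returns the negative inner product, so the default ascending &lt;code&gt;ORDER BY&lt;/code&gt; puts the largest inner products first):&lt;br&gt;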
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;memories&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;agent_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="k"&gt;AND&lt;/span&gt; &lt;span class="n"&gt;superseded_at&lt;/span&gt; &lt;span class="k"&gt;IS&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="k"&gt;ORDER&lt;/span&gt; &lt;span class="k"&gt;BY&lt;/span&gt; &lt;span class="n"&gt;embedding_wd&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;#&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;l2_normalize&lt;/span&gt;&lt;span class="p"&gt;(:&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;LIMIT&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is the whole hot path. One index scan.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one gotcha: overflow
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;exp(lambda * created_at)&lt;/code&gt; grows over time, so left alone it would eventually overflow a float. The fix is a periodic &lt;code&gt;re_center()&lt;/code&gt; that multiplies every folded vector by a single constant to pull the exponent back down. Because it is a global scale, it does not change inner-product ordering, so recall is unchanged. It is a no-op until &lt;code&gt;lambda * (now - t_ref) &amp;gt; 40&lt;/code&gt;, which is years away for typical lambda, and it runs during maintenance.&lt;/p&gt;
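
&lt;p&gt;To put numbers on that, here is a rough sketch in plain Python. It is not the extension's &lt;code&gt;re_center()&lt;/code&gt;: the function bodies, the example rows, and the reading of &lt;code&gt;t_ref&lt;/code&gt; as the reference time the exponent is measured from are my assumptions. For scale, pgvector stores components as single-precision floats, and &lt;code&gt;exp(40)&lt;/code&gt; is about 2.4e17, far below the single-precision limit of about 3.4e38.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

LAMBDA = 0.05      # decay rate per day (hypothetical value)
THRESHOLD = 40.0   # re-center once lambda * (now - t_ref) exceeds this

def fold_weight(importance, created_day, t_ref):
    # Scalar part of the folded vector, with the exponent measured from t_ref.
    return importance * math.exp(LAMBDA * (created_day - t_ref))

def re_center(weights, old_ref, new_ref):
    # One global constant applied to every row; relative order is untouched.
    shift = math.exp(-LAMBDA * (new_ref - old_ref))
    return [w * shift for w in weights]

def order(ws):
    # Row indices from largest weight to smallest.
    return sorted(range(len(ws)), key=lambda i: ws[i], reverse=True)

# Days of headroom before the threshold is reached at this lambda.
days_until_re_center = THRESHOLD / LAMBDA

rows = [(0.9, 10.0), (0.3, 500.0), (0.6, 790.0)]  # (importance, created_day), made up
before = [fold_weight(imp, day, t_ref=0.0) for imp, day in rows]
after = re_center(before, old_ref=0.0, new_ref=800.0)

print(days_until_re_center, order(before) == order(after), max(before), max(after))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At this decay rate the threshold is a bit over two years of headroom. After the shift the largest weight is back below 1, and the ordering is the same as before.&lt;/p&gt;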

&lt;h2&gt;
  
  
  Does it actually return the right memories?
&lt;/h2&gt;

&lt;p&gt;I measured recall@10 against an exact brute-force computation of the same objective. So 1.000 means HNSW returned the same top-10 as the exact answer; it is a statement about index approximation, not "perfect memory":&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;memories&lt;/th&gt;
&lt;th&gt;ef_search=40&lt;/th&gt;
&lt;th&gt;ef_search=100&lt;/th&gt;
&lt;th&gt;ef_search=200&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100k&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;0.945&lt;/td&gt;
&lt;td&gt;0.995&lt;/td&gt;
&lt;td&gt;1.000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;ef_search&lt;/code&gt; is the standard HNSW recall/latency knob. I got the same 1.000 on real &lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt; embeddings, not just synthetic clusters. Latency is about 13 ms per call at 100k memories on a debug build. The benchmark scripts are in the repo if you want to run your own data through them.&lt;/p&gt;
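
&lt;p&gt;For reference, this is the usual definition of recall@k that the table assumes, as a small Python sketch. It is not the repo's benchmark script, and the ids are made up.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def recall_at_k(approx_ids, exact_ids, k):
    # Fraction of the exact top-k that the approximate search also returned.
    exact = set(exact_ids[:k])
    return len(exact.intersection(approx_ids[:k])) / k

exact_top = [7, 3, 9, 1, 4, 8, 2, 6, 5, 0]   # brute-force top-10 (made-up ids)
hnsw_top = [7, 3, 9, 1, 4, 8, 2, 6, 5, 42]   # index result with one miss

print(recall_at_k(hnsw_top, exact_top, k=10))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only membership in the top-k counts here, not position within it, so a result with the right ten rows in a different order still scores 1.000.&lt;/p&gt;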

&lt;h2&gt;
  
  
  The rest of the system
&lt;/h2&gt;

&lt;p&gt;Recall is the interesting part, but a memory store needs more to be usable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Lifecycle:&lt;/strong&gt; memories are range-partitioned by &lt;code&gt;created_at&lt;/code&gt; (immutable membership, so no row movement), with roll-up of old partitions and an opt-in &lt;code&gt;expire(retention_days)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supersession:&lt;/strong&gt; give a changing fact a stable &lt;code&gt;mem_key&lt;/code&gt;. A new value retires the old one for recall but keeps it for a time-travel &lt;code&gt;audit(agent, as_of)&lt;/code&gt; query ("what did the agent know on date X?").&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting:&lt;/strong&gt; memories whose activation &lt;code&gt;importance * exp(-lambda * age)&lt;/code&gt; drops below a floor are evicted.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SDKs:&lt;/strong&gt; Python and TypeScript, plus drop-in LangChain, CrewAI, and AutoGen adapters.&lt;/li&gt;
&lt;/ul&gt;
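
&lt;p&gt;The forgetting rule also tells you roughly how long a memory survives. A quick Python sketch, assuming a decay rate of 0.05 per day and a floor of 0.01 (both values are placeholders, not the extension's defaults):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math

LAMBDA = 0.05   # decay rate per day (hypothetical value)
FLOOR = 0.01    # eviction floor (hypothetical value)

def activation(importance, age_days):
    return importance * math.exp(-LAMBDA * age_days)

def days_until_forgotten(importance):
    # Solve importance * exp(-LAMBDA * age) = FLOOR for age.
    return math.log(importance / FLOOR) / LAMBDA

for importance in (0.1, 0.5, 1.0):
    print(importance, round(days_until_forgotten(importance), 1))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Lifetime grows only with the logarithm of importance: under these assumed values, a memory ten times as important lasts about 46 days longer, not ten times as long.&lt;/p&gt;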

&lt;h2&gt;
  
  
  Honest limitations
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;It is pre-1.0, so minor versions may change the schema.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;lambda&lt;/code&gt; (the decay rate) is fixed per store because it is baked into the index. That is the whole trick, but it means you choose a decay rate up front.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;recall()&lt;/code&gt; writes a little on every call (it bumps an access counter for reinforcement), so it is not a pure read. I think it should be optional, and that is on the list.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;It is Apache-2.0 and runs in the Postgres you already have:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;extension &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; make &lt;span class="nb"&gt;install
&lt;/span&gt;psql &lt;span class="nt"&gt;-d&lt;/span&gt; mydb &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"CREATE EXTENSION pgmemai CASCADE;"&lt;/span&gt;
psql &lt;span class="nt"&gt;-d&lt;/span&gt; mydb &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"SELECT pgmemai.create_store(1536, 0.05);"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Repo: &lt;a href="https://github.com/pg-amjad/pgmemai" rel="noopener noreferrer"&gt;github.com/pg-amjad/pgmemai&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I would genuinely love feedback on the approach and the math, and especially to hear where the decay-fold breaks on a case I have not hit. How are you handling agent memory today?&lt;/p&gt;

</description>
      <category>postgres</category>
      <category>ai</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
