<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Abhishek Chauhan</title>
    <description>The latest articles on DEV Community by Abhishek Chauhan (@ac12644).</description>
    <link>https://dev.to/ac12644</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1036391%2F760d29a0-e504-4450-9a4b-cf0c63b0ca3a.png</url>
      <title>DEV Community: Abhishek Chauhan</title>
      <link>https://dev.to/ac12644</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ac12644"/>
    <language>en</language>
    <item>
      <title>Production Agent Memory: Compaction, Decay, and the Observation Engine</title>
      <dc:creator>Abhishek Chauhan</dc:creator>
      <pubDate>Mon, 18 May 2026 04:58:57 +0000</pubDate>
      <link>https://dev.to/ac12644/production-agent-memory-compaction-decay-and-the-observation-engine-24gf</link>
      <guid>https://dev.to/ac12644/production-agent-memory-compaction-decay-and-the-observation-engine-24gf</guid>
      <description>&lt;p&gt;Most guides on agent memory stop at storage. Pick a vector store, embed your documents, retrieve the top-k. That works for RAG. It does not work for agents that run continuously across weeks and months, accumulating behavioral history about real users making real decisions.&lt;/p&gt;

&lt;p&gt;Production agent memory is a different problem. The questions aren't just "what do I store?" and "how do I retrieve it?" They are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How does the agent learn that a user always makes the same correction, without being explicitly told?&lt;/li&gt;
&lt;li&gt;How do you give the agent three months of behavioral history without flooding the context window?&lt;/li&gt;
&lt;li&gt;What happens when a retrieved memory is wrong — not just irrelevant, but actively contradicted by the user?&lt;/li&gt;
&lt;li&gt;When should an observed pattern become a permanent rule?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post builds a complete architecture for answering those questions — from taxonomy to scoring formula to the nightly maintenance job that keeps it all clean.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Three Failure Modes
&lt;/h2&gt;

&lt;p&gt;Before any architecture decision, name the failure modes you're designing against:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Too much&lt;/strong&gt; — flooding the context with everything you know. The model gets slow, expensive, and loses precision. Ironically, more memory makes the agent worse.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Too little&lt;/strong&gt; — injecting nothing. The agent repeats mistakes, ignores learned rules, asks the user to re-explain preferences they stated last week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wrong type&lt;/strong&gt; — injecting stale, contradicted, or irrelevant memories. Worse than nothing: the agent acts on false information confidently.&lt;/p&gt;

&lt;p&gt;Every design decision in this post traces back to avoiding one of these three.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Four Memory Types
&lt;/h2&gt;

&lt;p&gt;Not all memory is the same. A production system needs four distinct types, each with a different storage backend, lifecycle, and injection strategy.&lt;/p&gt;

&lt;h3&gt;
  
  
  Working Memory
&lt;/h3&gt;

&lt;p&gt;The live context for the current task only. Exists for the duration of one agent run. Discarded when the task ends.&lt;/p&gt;

&lt;p&gt;What it holds: the current task payload, intermediate reasoning steps, partial tool call results, running approval state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key constraint: 4,000 token maximum&lt;/strong&gt;, enforced before every LLM call. If working memory exceeds this, compress intermediate steps to a summary using a cheap model call first. Never silently truncate — truncation loses state. Compress explicitly.&lt;/p&gt;

&lt;p&gt;Storage: in-process object. No database write during execution — only persisted when the task reaches terminal state (done, failed, rolled back).&lt;/p&gt;
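
&lt;p&gt;A minimal sketch of the "compress, never truncate" rule. The &lt;code&gt;Step&lt;/code&gt; shape, the four-characters-per-token estimate, and the &lt;code&gt;summarize&lt;/code&gt; callback (standing in for the cheap model call, shown synchronously to keep the sketch short) are all assumptions for illustration.&lt;/p&gt;

```typescript
// Hypothetical shape for one intermediate step held in working memory.
interface Step {
  label: string
  text: string
}

// Rough estimate (about 4 characters per token). An assumption for
// illustration; a real system would use the model's own tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

function totalTokens(steps: Step[]): number {
  return steps.reduce((sum, s) => sum + estimateTokens(s.text), 0)
}

// Keep the newest `keepLast` steps verbatim and fold everything older into
// one summary step. `summarize` stands in for the cheap model call.
function enforceBudget(
  steps: Step[],
  maxTokens: number,
  keepLast: number,
  summarize: (older: Step[]) => string
): Step[] {
  if (maxTokens >= totalTokens(steps)) return steps
  const cut = Math.max(0, steps.length - keepLast)
  const older = steps.slice(0, cut)
  // Nothing older to fold: return unchanged and let the caller decide,
  // since dropping recent steps would be silent truncation.
  if (older.length === 0) return steps
  const summary: Step = { label: 'summary', text: summarize(older) }
  return [summary, ...steps.slice(cut)]
}
```

&lt;p&gt;The result can still exceed the budget if the retained steps are large on their own, so a caller would re-check &lt;code&gt;totalTokens&lt;/code&gt; before the LLM call.&lt;/p&gt;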




&lt;h3&gt;
  
  
  Episodic Memory
&lt;/h3&gt;

&lt;p&gt;A timestamped log of specific past events. The raw ground truth. "On Friday at 15:04, the agent drafted a reply to a supplier and the user modified it before sending."&lt;/p&gt;

&lt;p&gt;What it holds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every completed agent action (who, what, when, outcome)&lt;/li&gt;
&lt;li&gt;Every approval decision (approved / rejected / modified, with the modification text)&lt;/li&gt;
&lt;li&gt;Every exception the agent surfaced and how it was resolved&lt;/li&gt;
&lt;li&gt;Every user correction — agent did X, user changed it to Y&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storage: relational rows in an &lt;code&gt;episodes&lt;/code&gt; table (append-only: the recorded event is never rewritten, and only lifecycle fields such as status change during compaction) plus a vector embedding per episode for semantic retrieval.&lt;/p&gt;

&lt;p&gt;Retrieval: &lt;strong&gt;hybrid&lt;/strong&gt; — BM25 full-text search for exact matches on names, amounts, dates, combined with cosine similarity on the embedding for conceptually related events. Results merged, rescored with the decay function, top-k injected.&lt;/p&gt;
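
&lt;p&gt;The merge-then-rescore step could be sketched as follows. The equal weighting of the two scores, the half-life form of the decay, and all type names are placeholder assumptions, not the scoring formula this post uses.&lt;/p&gt;

```typescript
// Hypothetical hit from either search path; scores assumed normalised to 0..1.
interface Hit {
  id: string
  score: number
  daysSinceAccess: number
}

interface Merged {
  id: string
  bm25: number
  cosine: number
  daysSinceAccess: number
}

// Union the two result lists by episode id; a missing score stays 0.
function mergeHits(textHits: Hit[], vectorHits: Hit[]): Merged[] {
  const byId: { [id: string]: Merged } = {}
  const slot = (h: Hit): Merged => {
    const found = byId[h.id] || { id: h.id, bm25: 0, cosine: 0, daysSinceAccess: h.daysSinceAccess }
    byId[h.id] = found
    return found
  }
  textHits.forEach((h) => { slot(h).bm25 = h.score })
  vectorHits.forEach((h) => { slot(h).cosine = h.score })
  return Object.keys(byId).map((id) => byId[id])
}

// Exponential decay on recency of access (assumed form, assumed half-life).
function decay(daysSinceAccess: number, halfLifeDays: number): number {
  return Math.pow(0.5, daysSinceAccess / halfLifeDays)
}

function rankHybrid(merged: Merged[], k: number, halfLifeDays: number): Merged[] {
  const score = (m: Merged): number =>
    (0.5 * m.bm25 + 0.5 * m.cosine) * decay(m.daysSinceAccess, halfLifeDays)
  // Copy before sorting so the caller's array is left untouched.
  return [...merged].sort((a, b) => score(b) - score(a)).slice(0, k)
}
```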




&lt;h3&gt;
  
  
  Semantic Memory
&lt;/h3&gt;

&lt;p&gt;Facts independent of any specific event. Stable knowledge about the user, their contacts, their company, their preferences. Changes slowly.&lt;/p&gt;

&lt;p&gt;What it holds:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User profile: name, company, tone preference, industry, CRM type&lt;/li&gt;
&lt;li&gt;Contact profiles: name, relationship, tone preference, known quirks, routing notes&lt;/li&gt;
&lt;li&gt;Company rules: payment terms per supplier, invoice thresholds, routing rules ("technical emails → forward to Lara")&lt;/li&gt;
&lt;li&gt;Agent configuration: which capabilities are enabled, global off-limits (contacts never auto-replied to, folders never touched)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;This type does not need vector search.&lt;/strong&gt; Direct SQL lookups are faster, cheaper, and more precise for structured facts. Before every agent call, the orchestrator fetches the relevant semantic facts for that agent type and injects them as a structured block in the system prompt. Deterministic, synchronous, never misses.&lt;/p&gt;




&lt;h3&gt;
  
  
  Procedural Memory
&lt;/h3&gt;

&lt;p&gt;Learned workflows — rules of thumb the agent has inferred from repeated user corrections and confirmed behavioral patterns. Not facts, not events, but &lt;em&gt;how to behave&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;What it holds (example rules, in plain language):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Never use semicolons — the user always removes them"&lt;/li&gt;
&lt;li&gt;"Emails from Lara: never archive automatically"&lt;/li&gt;
&lt;li&gt;"Quotes above €10,000: don't prepare a draft, the user always rewrites them"&lt;/li&gt;
&lt;li&gt;"Friday afternoon after 14:30: defer everything to Monday"&lt;/li&gt;
&lt;li&gt;"Pelletteria Veneto SRL: always flag, never route autonomously"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Storage: a &lt;code&gt;procedures&lt;/code&gt; table. Each row has: agent, rule text (plain language, injected verbatim), source observation ID (foreign key to the observation that triggered promotion), promoted timestamp, confirmation count, last applied timestamp.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No vector search needed here either.&lt;/strong&gt; All active procedures for the current agent are fetched in full and injected at the start of every system prompt. The conservative promotion threshold keeps noise out, so the set of procedures per agent should stay small enough to inject whole without crowding the context.&lt;/p&gt;
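
&lt;p&gt;As a rough illustration, the row and the injected block might look like this. Field names are assumptions derived from the column list above, and the header line of the block is invented for the example.&lt;/p&gt;

```typescript
// Row shape mirroring the columns described above (names are assumed).
interface Procedure {
  agent: string
  ruleText: string
  sourceObservationId: string
  promotedAt: string
  confirmationCount: number
  lastAppliedAt: string | null
  status: 'active' | 'demoted'
}

// Build the block injected at the start of the system prompt for one agent.
// Rule text is passed through verbatim, as the post describes.
function renderProcedures(rows: Procedure[], agent: string): string {
  const rules = rows
    .filter((r) => r.agent === agent)
    .filter((r) => r.status === 'active')
    .map((r) => '- ' + r.ruleText)
  if (rules.length === 0) return ''
  return ['Learned rules:', ...rules].join('\n')
}
```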




&lt;h2&gt;
  
  
  The Observation Engine
&lt;/h2&gt;

&lt;p&gt;This is the component most memory systems don't have — and its absence is why agents that "remember" things still feel dumb.&lt;/p&gt;

&lt;p&gt;The observation engine is the mechanism that detects behavioral patterns from raw episodic data and promotes them into procedural rules. It's the bridge between "things that happened" and "rules that govern future behavior."&lt;/p&gt;

&lt;h3&gt;
  
  
  Sources of Raw Signals
&lt;/h3&gt;

&lt;p&gt;Every agent in the system feeds signals into a queue. The signals are not interpretations — they're raw behavioral data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;What it captures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;User modifies a draft before sending&lt;/td&gt;
&lt;td&gt;Which part changed, how many times this pattern has repeated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User rejects an approval&lt;/td&gt;
&lt;td&gt;What was rejected, the rejection reason if given&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User routes something differently than predicted&lt;/td&gt;
&lt;td&gt;Where the agent sent it, where the user moved it&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent surfaces an exception&lt;/td&gt;
&lt;td&gt;New contact with no known rule, amount outside threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User correction&lt;/td&gt;
&lt;td&gt;Agent did X, user explicitly changed it to Y&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The observation engine drains this queue nightly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pattern Detection
&lt;/h3&gt;

&lt;p&gt;The nightly job (runs at 02:00) reads the last 30 days of the episodes table and prompts an LLM to identify genuine repeated patterns. The prompt enforces strict constraints:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;observation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;engine.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Analyse&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;user&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;behavioral&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;identify&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;genuine,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;repeated&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;patterns.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;DO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;invent.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;DO&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;NOT&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;generalise&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;single&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;event.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;An&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;observation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;only&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;valid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;it&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;has&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;least&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;consistent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;occurrences.&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Required&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;format&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;observation:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;observation&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"writing-style | rhythm | people | tools | decisions"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"agent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"email | accounting | crm | relay | files | system"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"quote"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"direct statement, max 25 words"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"evidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"concise phrase with supporting numbers, max 40 words"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"occurrences"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"low | medium | high | very-high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"promotion_candidate"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;&amp;lt;/observation&amp;gt;&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;Rules:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;confidence&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;high&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;only&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;occurrences&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;&amp;gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;AND&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;pattern&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="err"&gt;%+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;consistent&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;promotion_candidate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;only&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;observation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;implies&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;clear&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;action&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;rule&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Maximum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;new&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;observations&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;per&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;run&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The 3-observation cap per run prevents the system from flooding the observations table. Patterns that are genuine will recur and be detected in subsequent nightly runs, accumulating evidence over time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deduplication Before Insert
&lt;/h3&gt;

&lt;p&gt;Before any new observation is written, it's checked against existing ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;isDuplicate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newObs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;ObservationCandidate&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;boolean&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;embedDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;newObs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;quote&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="s2"&gt;`
    SELECT o.id, o.quote,
      vec_distance_cosine(ov.embedding, &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;) as dist
    FROM observations_vec ov
    JOIN observations o ON o.id = ov.id
    WHERE o.agent    = &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;newObs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
      AND o.category = &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;newObs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;category&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;
      AND o.status   = 'active'
      AND vec_distance_cosine(ov.embedding, &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;) &amp;lt; 0.15
    LIMIT 1
  `&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If a near-duplicate exists and the new candidate reports more &lt;code&gt;occurrences&lt;/code&gt;, the existing row is updated in place: its evidence is refreshed and its occurrence count incremented. No second row is inserted, so observations grow stronger over time instead of multiplying.&lt;/p&gt;
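
&lt;p&gt;The merge rule can be sketched in memory as follows. The 0.15 cutoff comes from the query above; the type names are hypothetical, and raising the count to the candidate's value is one reading of "incremented", not a confirmed detail.&lt;/p&gt;

```typescript
// Hypothetical in-memory view of an observation row.
interface StoredObservation {
  quote: string
  evidence: string
  occurrences: number
}

// Same threshold as the SQL check: below this distance, treat as one pattern.
const DUPLICATE_DISTANCE = 0.15

// Cosine distance = 1 - cosine similarity. Assumes equal-length, non-zero vectors.
function cosineDistance(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0)
  const norm = (v: number[]): number => Math.sqrt(v.reduce((s, x) => s + x * x, 0))
  return 1 - dot / (norm(a) * norm(b))
}

// Returns null when the candidate is not a duplicate (caller inserts it).
// Otherwise returns the row to keep: strengthened if the candidate has more
// occurrences, unchanged if it does not.
function mergeIfDuplicate(
  existing: StoredObservation,
  existingVec: number[],
  candidate: StoredObservation,
  candidateVec: number[]
): StoredObservation | null {
  if (cosineDistance(existingVec, candidateVec) >= DUPLICATE_DISTANCE) return null
  if (existing.occurrences >= candidate.occurrences) return existing
  return { ...existing, evidence: candidate.evidence, occurrences: candidate.occurrences }
}
```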

&lt;h3&gt;
  
  
  Confidence Thresholds
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;low          → 3-4 occurrences,  pattern &amp;lt; 70% consistent
medium       → 4-5 occurrences,  pattern 70-80% consistent
high         → 5+  occurrences,  pattern &amp;gt; 80% consistent
very-high    → 8+  occurrences,  pattern &amp;gt; 90% consistent, no contradictions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
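
&lt;p&gt;The bands above overlap on occurrence count (4 fits both low and medium, 5 fits both medium and high), so an implementation has to pick a resolution order. One possible sketch, assuming the strongest matching band wins and anything with at least 3 occurrences that misses the higher bands falls back to low:&lt;/p&gt;

```typescript
type Confidence = 'low' | 'medium' | 'high' | 'very-high'

// consistency is a fraction in 0..1; contradictions is a count.
// Returns null below 3 occurrences, where no observation is valid at all.
function classify(
  occurrences: number,
  consistency: number,
  contradictions: number
): Confidence | null {
  if (3 > occurrences) return null
  const all = (checks: boolean[]): boolean => checks.every(Boolean)
  // Check from the strongest band down so overlaps resolve upward.
  if (all([occurrences >= 8, consistency > 0.9, contradictions === 0])) return 'very-high'
  if (all([occurrences >= 5, consistency > 0.8])) return 'high'
  if (all([occurrences >= 4, consistency >= 0.7])) return 'medium'
  return 'low'
}
```

&lt;p&gt;Note the table uses a strict "greater than 80%" for high while the prompt says "80%+"; the sketch follows the table.&lt;/p&gt;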



&lt;h3&gt;
  
  
  Promotion to Procedure
&lt;/h3&gt;

&lt;p&gt;An observation is promoted to a procedural rule when all of the following hold:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Confidence = high or very-high&lt;/li&gt;
&lt;li&gt;&lt;code&gt;promotion_candidate = true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;User has not marked it "wrong" (see feedback loop below)&lt;/li&gt;
&lt;li&gt;User has not dismissed it within 48 hours of creation&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The promotion threshold is deliberately conservative. A rule injected into every agent call for weeks shapes behavior continuously. False positives are more damaging than false negatives.&lt;/p&gt;
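
&lt;p&gt;Read literally, the four conditions reduce to a single predicate. The field names below are hypothetical, and the sketch does not model whether promotion waits for the 48-hour window to close, since the rules above do not say.&lt;/p&gt;

```typescript
// Hypothetical state tracked per observation; timestamps are epoch milliseconds.
interface ObservationState {
  confidence: 'low' | 'medium' | 'high' | 'very-high'
  promotionCandidate: boolean
  markedWrong: boolean
  createdAt: number
  dismissedAt: number | null
}

const DISMISS_WINDOW_MS = 48 * 60 * 60 * 1000

function isPromotable(obs: ObservationState): boolean {
  const confident = obs.confidence === 'high' || obs.confidence === 'very-high'
  // Dismissed inside the first 48 hours after creation blocks promotion.
  const dismissedInWindow =
    obs.dismissedAt === null ? false : DISMISS_WINDOW_MS >= obs.dismissedAt - obs.createdAt
  return [confident, obs.promotionCandidate, !obs.markedWrong, !dismissedInWindow].every(Boolean)
}
```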

&lt;h3&gt;
  
  
  The Feedback Loop
&lt;/h3&gt;

&lt;p&gt;Every observation is shown to the user with two buttons: "You're right" and "You're wrong."&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleFeedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;obsId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;correct&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;wrong&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;feedback&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;correct&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;observations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;confidenceBoost&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="s2"&gt;`confidence_boost + 0.2`&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;observations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;obsId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;maybePromoteToProcedure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;obsId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Set status to rejected — excluded from all future retrieval&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;observations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;rejected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;observations&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;obsId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;// Delete embedding — rejected observations never surface in retrieval&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;deleteEmbedding&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;observation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;obsId&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;// Demote any procedure that was promoted from this observation&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;procedures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;demoted&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;where&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;eq&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;procedures&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;sourceObservationId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;obsId&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c1"&gt;// Write a counter-signal so the nightly job doesn't regenerate&lt;/span&gt;
    &lt;span class="c1"&gt;// the same wrong observation next run&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;insert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;agentSignals&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;signalType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;observation_rejected&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;rejectedQuote&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;obs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;quote&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The counter-signal write is the part most implementations miss. Without it, the nightly pattern detection job will see the same 30 days of data, detect the same pattern, and reinsert the same observation you just rejected. The counter-signal closes the loop.&lt;/p&gt;
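
&lt;p&gt;On the consuming side, the nightly job has to read those counter-signals back. A minimal sketch, assuming exact matching on a normalised quote; a real system might instead reuse the embedding-distance check from deduplication or pass the rejected quotes into the prompt. Type names are hypothetical.&lt;/p&gt;

```typescript
// Hypothetical shapes: a stored signal row and a freshly detected candidate.
interface AgentSignal {
  signalType: string
  payload: string // JSON string, as written by the feedback handler
}

interface CandidateObservation {
  quote: string
}

const normalise = (s: string): string => s.trim().toLowerCase()

// Drop any candidate whose quote matches one the user already rejected.
function dropRejected(
  candidates: CandidateObservation[],
  signals: AgentSignal[]
): CandidateObservation[] {
  const rejected: string[] = signals
    .filter((s) => s.signalType === 'observation_rejected')
    .map((s) => normalise(String(JSON.parse(s.payload).rejectedQuote)))
  return candidates.filter((c) => rejected.indexOf(normalise(c.quote)) === -1)
}
```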




&lt;h2&gt;
  
  
  The No-Delete Principle
&lt;/h2&gt;

&lt;p&gt;This is the most important design decision in the entire architecture, and the one teams get wrong most often.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do not delete episodic memory rows when their decay score falls below a threshold.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The reasoning matters. Decay score measures &lt;em&gt;recency of access&lt;/em&gt; — how long since this episode was retrieved. It does not measure behavioral importance.&lt;/p&gt;

&lt;p&gt;Consider an episode from 95 days ago: "User rejected the Magnani draft three times, never wanted assertive language with this contact." This episode scores low today because it hasn't been retrieved recently. The moment the agent receives a new email from Magnani, that old episode is the most critical thing in memory. If you deleted it on decay grounds, the agent has permanent amnesia about a behaviorally defining pattern — and will repeat the exact mistake the user corrected three times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The correct model: low decay score means retrieve this less often, not destroy it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Episodes are compacted — compressed into summary records that cost far less to store, embed, and retrieve — but the raw rows are never deleted by an automated job.&lt;/p&gt;

&lt;p&gt;The only paths to actual hard deletion:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Explicit user GDPR erasure request&lt;/li&gt;
&lt;li&gt;User marks an observation as wrong (its embedding is deleted and its source episodes are flagged as contradicted)&lt;/li&gt;
&lt;li&gt;Admin-level seat deletion&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Everything else is compaction.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Compaction Pipeline
&lt;/h2&gt;

&lt;p&gt;Compaction is a lossless-to-lossy compression pipeline that preserves behavioral signal in progressively smaller form. It's how you give an agent three months of behavioral history without overflowing the context window.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Tier 0 — Raw episodes
  Individual action records. Full detail. Embedded individually.
  "Mon 09:11, agent drafted reply to Bertelli re: Q3 order.
  User approved without modification."
  → Used for: recent retrieval (&amp;lt; 30 days), audit, rollback.

Tier 1 — Weekly compaction (triggered at 30 days)
  Groups of 5-15 raw episodes from the same agent + contact cluster,
  spanning one week, summarised into a single compact record.
  "Week of 3-9 June: agent handled 8 interactions with Bertelli.
  6 approved without edits (orders, shipping confirmations).
  1 modified: removed semicolon, shortened opening paragraph.
  1 routed to manager (quote request). Pattern confirmed: direct,
  no preamble, fast approval rate."
  Source raw episodes → status='superseded_raw', embeddings deleted.
  Compact record → status='active', fresh embedding from summary.
  Storage ratio: ~8:1.

Tier 2 — Monthly compaction (triggered at 90 days)
  Groups of tier-1 compact records from the same agent + contact + month,
  summarised into a period record.
  "June 2026 — email agent × Bertelli: 32 interactions.
  Approval rate 94%. Established pattern: direct opener, no semicolons,
  route quotes to manager. No exceptions. Tone: formal-concise, confirmed."
  Source tier-1 records → status='superseded_tier1', embeddings deleted.
  Compact record → status='active', fresh embedding.
  Storage ratio from raw: ~32:1.

Tier 2 records never compact further.
They are permanent behavioral summaries.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
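
&lt;p&gt;The grouping step of tier-1 compaction can be sketched as follows. This is an illustration under stated assumptions, not the production job: the &lt;code&gt;RawEpisode&lt;/code&gt; shape is simplified to one contact per episode, &lt;code&gt;planWeeklyCompaction&lt;/code&gt; is a hypothetical name, and weeks are fixed 7-day buckets counted from the Unix epoch rather than calendar weeks.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;const DAY_MS = 86_400_000
const WEEK_MS = 7 * DAY_MS

// Hypothetical, simplified row shape: one contact per episode.
interface RawEpisode {
  id: string
  agent: string
  contact: string
  status: string
  createdAt: number // epoch milliseconds
}

interface CompactionGroup {
  key: string
  sourceIds: string[]
}

// Groups raw episodes older than 30 days by agent + contact + 7-day bucket.
// Buckets are counted from the Unix epoch, so they are not calendar weeks.
function planWeeklyCompaction(rows: RawEpisode[], now: number): CompactionGroup[] {
  const cutoff = now - 30 * DAY_MS
  const groups: { [key: string]: string[] } = {}
  for (const row of rows) {
    if (row.status !== 'raw') continue     // already compacted or superseded
    if (row.createdAt > cutoff) continue   // still inside the 30-day window
    const bucket = Math.floor(row.createdAt / WEEK_MS)
    const key = `${row.agent}|${row.contact}|${bucket}`
    if (groups[key] === undefined) groups[key] = []
    groups[key].push(row.id)
  }
  return Object.keys(groups).map((key) => ({ key, sourceIds: groups[key] }))
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each returned group would then get one summary record, and its source rows would move to &lt;code&gt;superseded_raw&lt;/code&gt;. The sketch does not enforce the 5-15 episode group size; a real job would need to decide what to do with smaller groups.&lt;/p&gt;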



&lt;p&gt;The schema needs to track tier, status, and compact group ID on every episode row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;episodes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sqliteTable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;episodes&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;             &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;primaryKey&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;$defaultFn&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;crypto&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randomUUID&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
  &lt;span class="na"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;          &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;notNull&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="na"&gt;eventType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;event_type&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;notNull&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="c1"&gt;// 'action_completed' | 'approval_approved' | 'approval_rejected' |&lt;/span&gt;
  &lt;span class="c1"&gt;// 'approval_modified' | 'exception_raised' | 'user_correction' | 'compact'&lt;/span&gt;

  &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;summary&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;          &lt;span class="c1"&gt;// raw episode: plain text, max 100 chars&lt;/span&gt;
  &lt;span class="na"&gt;outcome&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;        &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;outcome&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;          &lt;span class="c1"&gt;// done | approved | rejected | modified | exception&lt;/span&gt;
  &lt;span class="na"&gt;entities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;       &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;entities&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;         &lt;span class="c1"&gt;// JSON: [{type: 'contact', name: 'John Smith'}]&lt;/span&gt;

  &lt;span class="na"&gt;compactSummary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;compact_summary&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;  &lt;span class="c1"&gt;// compact record: multi-sentence narrative&lt;/span&gt;
  &lt;span class="na"&gt;compactionTier&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;compaction_tier&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;notNull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;compactGroupId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;compact_group_id&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// ID of the compact record covering this row&lt;/span&gt;

  &lt;span class="na"&gt;status&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;status&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;notNull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="k"&gt;default&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;raw&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="c1"&gt;// 'raw'              → live raw episode, has embedding&lt;/span&gt;
  &lt;span class="c1"&gt;// 'active'           → live compact record (tier-1 or tier-2), has embedding&lt;/span&gt;
  &lt;span class="c1"&gt;// 'superseded_raw'   → absorbed into tier-1, row kept, embedding gone&lt;/span&gt;
  &lt;span class="c1"&gt;// 'superseded_tier1' → absorbed into tier-2, row kept, embedding gone&lt;/span&gt;

  &lt;span class="na"&gt;lastAccessedAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;last_accessed_at&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// updated on each retrieval — feeds decay calc&lt;/span&gt;
  &lt;span class="na"&gt;createdAt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;      &lt;span class="nf"&gt;integer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;created_at&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;notNull&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;$defaultFn&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;episodes_vec&lt;/code&gt; virtual table holds embeddings &lt;strong&gt;only for retrieval-eligible rows&lt;/strong&gt; — &lt;code&gt;status='raw'&lt;/code&gt; and &lt;code&gt;status='active'&lt;/code&gt;. Superseded rows have no embedding. This means the semantic search naturally covers the full timeline at the right level of granularity: recent events as individual rows, older events as compact summaries. No extra filtering needed.&lt;/p&gt;
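
&lt;p&gt;That invariant is small enough to state in code. A minimal sketch, assuming the four status values from the schema comments above; the helper names &lt;code&gt;isRetrievalEligible&lt;/code&gt; and &lt;code&gt;embeddingsToDrop&lt;/code&gt; are hypothetical:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;type EpisodeStatus = 'raw' | 'active' | 'superseded_raw' | 'superseded_tier1'

// Only live rows keep an embedding. Superseded rows stay in the table without one.
function isRetrievalEligible(status: EpisodeStatus): boolean {
  return status === 'raw' || status === 'active'
}

// Returns the IDs whose embeddings should be removed from the vector table.
function embeddingsToDrop(rows: { id: string; status: EpisodeStatus }[]): string[] {
  return rows.filter((r) => !isRetrievalEligible(r.status)).map((r) => r.id)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Running a check like this at the end of each compaction pass is a cheap way to catch embeddings that were left behind for superseded rows.&lt;/p&gt;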




&lt;h2&gt;
  
  
  Tier-Aware Decay Scoring
&lt;/h2&gt;

&lt;p&gt;The retrieval score governs which memories float to the top when assembling the context window. It is computed dynamically at retrieval time — not pre-computed, not stored.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight mathematica"&gt;&lt;code&gt;&lt;span class="nv"&gt;retrieval&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;score&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;episode&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nv"&gt;cosine&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;episode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;×&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;recency&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;episode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;last&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;accessed&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;at&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;×&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;importance&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;episode&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nv"&gt;compaction&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="nv"&gt;recency&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;weight&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;t&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;e&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="err"&gt;−λ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;×&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;days&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;since&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;last&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;access&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;λ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0.04&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nv"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;raw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;episodes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;17&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;day&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;half&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;life&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;λ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0.015&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;compact&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;46&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;day&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;half&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;life&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;λ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0.005&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;compact&lt;/span&gt;&lt;span class="w"&gt;           &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;139&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;day&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;half&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;life&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;λ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="nv"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;procedures&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;decay&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;active&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;rules&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;don&lt;/span&gt;&lt;span class="o"&gt;'&lt;/span&gt;&lt;span class="nv"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;fade&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="err"&gt;λ&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0.02&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nv"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;observations&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;35&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;day&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;half&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;life&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="nv"&gt;importance&lt;/span&gt;&lt;span class="o"&gt;_&lt;/span&gt;&lt;span class="nv"&gt;weight&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nv"&gt;raw&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;episode&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1.0&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;full&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;signal&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;compact&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1.2&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;confirmed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;repeated&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;patterns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;boosted&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nv"&gt;tier&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="m"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;compact&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1.1&lt;/span&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;period&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;summaries&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things to notice in these numbers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier-1 compact records get a higher importance weight (1.2) than raw episodes (1.0).&lt;/strong&gt; This is intentional. A weekly summary exists because 5-15 individual events were similar enough to summarise together. That repetition is itself a signal — these patterns proved their worth. They should rank higher than a single raw event of equivalent semantic similarity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier-2 records decay slower than tier-1 (λ=0.005 vs 0.015)&lt;/strong&gt; because monthly period summaries represent stable, long-running patterns. A summary describing a full month of consistent behavior should remain relevant for much longer than a summary of a single week's activity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Procedures have λ=0.&lt;/strong&gt; A learned rule like "never use semicolons" doesn't become less applicable just because it hasn't been triggered recently. Decay doesn't touch rules.&lt;/p&gt;
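
&lt;p&gt;The half-lives quoted above follow from the decay rate: for a weight of e^(−λ × days), the half-life is ln(2) / λ. The sketch below checks those figures and compares a raw episode with a tier-1 record at equal similarity. The similarity value (0.8) and the 30-day gap are made-up inputs for illustration; the λ and importance values are the ones from the table.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Half-life of exponential decay: the number of days after which e^(-λ·days) = 0.5.
function halfLifeDays(lambda: number): number {
  return Math.LN2 / lambda
}

function recencyWeight(lambda: number, daysSinceAccess: number): number {
  return Math.exp(-lambda * daysSinceAccess)
}

// Decay rates and importance weights copied from the table above.
const raw = { lambda: 0.04, importance: 1.0 }
const tier1 = { lambda: 0.015, importance: 1.2 }

// Illustrative inputs: same semantic similarity, both last accessed 30 days ago.
const similarity = 0.8
const rawScore = similarity * recencyWeight(raw.lambda, 30) * raw.importance
const tier1Score = similarity * recencyWeight(tier1.lambda, 30) * tier1.importance

console.log(halfLifeDays(0.04).toFixed(1))  // 17.3
console.log(halfLifeDays(0.015).toFixed(1)) // 46.2
console.log(halfLifeDays(0.005).toFixed(1)) // 138.6
console.log(rawScore.toFixed(3), tier1Score.toFixed(3)) // 0.241 0.612
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;After 30 days without access, the tier-1 record scores roughly 2.5 times higher than the raw episode at the same similarity. Most of that gap comes from the slower decay rate, not from the 1.2 importance boost.&lt;/p&gt;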

&lt;h3&gt;
  
  
  The Full Scoring Implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;scoreEpisode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;EpisodeRow&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;semanticScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;keywordScore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;now&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;λ&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compaction_tier&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mf"&gt;0.04&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compaction_tier&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mf"&gt;0.015&lt;/span&gt;
          &lt;span class="p"&gt;:&lt;/span&gt;                             &lt;span class="mf"&gt;0.005&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;daysSinceAccess&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;last_accessed_at&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;created_at&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;86&lt;/span&gt;&lt;span class="nx"&gt;_400_000&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;recency&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nx"&gt;λ&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;daysSinceAccess&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;importanceWeight&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compaction_tier&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;row&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;compaction_tier&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mf"&gt;1.2&lt;/span&gt;
  &lt;span class="p"&gt;:&lt;/span&gt;                             &lt;span class="mf"&gt;1.1&lt;/span&gt;

  &lt;span class="c1"&gt;// Semantic similarity weighted higher than keyword match&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;semanticScore&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nx"&gt;keywordScore&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;recency&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;importanceWeight&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hybrid retrieval, which merges BM25 keyword search with semantic similarity, is worth the implementation complexity. Contact names, amounts, and dates don't embed well: "Marco Bertelli" and "Bertelli" produce different vectors, but BM25 matches both on the shared token "Bertelli". For memory systems grounded in real-world entities, keyword recall fills the gaps that pure vector similarity misses.&lt;/p&gt;
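
&lt;p&gt;To show why the keyword side catches entity names, here is a deliberately simple stand-in. It is &lt;strong&gt;not&lt;/strong&gt; BM25: it only measures what fraction of query tokens appear in the document, with no term-frequency or length normalisation. The &lt;code&gt;tokenize&lt;/code&gt;, &lt;code&gt;keywordOverlap&lt;/code&gt;, and &lt;code&gt;hybridScore&lt;/code&gt; helpers are hypothetical, and the ASCII-only tokenizer is an assumption that would need extending for accented names.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Lowercases and splits on anything that is not an ASCII letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0)
}

// Stand-in for BM25: the fraction of query tokens that appear in the document.
// A real system would query a proper keyword index instead.
function keywordOverlap(query: string, doc: string): number {
  const queryTokens = tokenize(query)
  if (queryTokens.length === 0) return 0
  const docTokens = tokenize(doc)
  const hits = queryTokens.filter((t) => docTokens.indexOf(t) >= 0).length
  return hits / queryTokens.length
}

// Same 0.7 / 0.3 blend used in scoreEpisode above.
function hybridScore(semanticScore: number, keywordScore: number): number {
  return semanticScore * 0.7 + keywordScore * 0.3
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A query for "Bertelli" gets a full keyword score against "Drafted reply to Marco Bertelli", whatever the embedding distance between the two strings happens to be.&lt;/p&gt;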




&lt;h2&gt;
  
  
  The Context Budget
&lt;/h2&gt;

&lt;p&gt;One of the most underspecified parts of agent memory design is how much of the context window each memory type should occupy. Without explicit budgets, whichever retrieval path returns the most text wins — which is almost never the right outcome.&lt;/p&gt;

&lt;p&gt;Here's a concrete token budget for a 32,000-token context window:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Slot&lt;/th&gt;
&lt;th&gt;Tokens&lt;/th&gt;
&lt;th&gt;Content&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;System prompt base&lt;/td&gt;
&lt;td&gt;800&lt;/td&gt;
&lt;td&gt;Agent persona, core instructions, behavioral pact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic facts&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;td&gt;User profile + relevant contact profiles + company rules&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Active procedures&lt;/td&gt;
&lt;td&gt;400&lt;/td&gt;
&lt;td&gt;All active rules for this agent (typically 3-8 rules)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieved episodic memories&lt;/td&gt;
&lt;td&gt;1,200&lt;/td&gt;
&lt;td&gt;Top-5 most relevant past events, scored and formatted&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieved observations&lt;/td&gt;
&lt;td&gt;600&lt;/td&gt;
&lt;td&gt;Top-3 most relevant observations for this task type&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Current task / working memory&lt;/td&gt;
&lt;td&gt;4,000&lt;/td&gt;
&lt;td&gt;The actual task payload&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool call history (this session)&lt;/td&gt;
&lt;td&gt;2,000&lt;/td&gt;
&lt;td&gt;Tool calls and results so far&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Response buffer&lt;/td&gt;
&lt;td&gt;2,000&lt;/td&gt;
&lt;td&gt;Reserved for model output&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~11,600&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Leaves substantial headroom for larger payloads&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The key insight: &lt;strong&gt;semantic facts and procedures are the cheapest and most reliable memory.&lt;/strong&gt; 400 tokens of active procedural rules (plain-language behavioral constraints injected verbatim) have more impact on agent behavior than 1,200 tokens of retrieved episodic memories. Procedures are pre-validated, with zero retrieval error and zero semantic ambiguity. Don't underallocate them to make room for more episodes.&lt;/p&gt;
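
&lt;p&gt;Enforcing per-slot budgets can be as simple as filling each slot from its ranked list until the allowance runs out. A minimal sketch, with two loud assumptions: &lt;code&gt;estimateTokens&lt;/code&gt; uses a rough 4-characters-per-token heuristic in place of a real tokenizer, and &lt;code&gt;fillSlot&lt;/code&gt; is a hypothetical helper. The budget numbers are the memory rows from the table above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Rough token estimate: about 4 characters per token. This is an assumption;
// use the model's real tokenizer in production.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Slot budgets copied from the table above (memory slots only).
const MEMORY_BUDGET = {
  semanticFacts: 600,
  procedures: 400,
  episodes: 1200,
  observations: 600,
}

// Takes items already sorted by score (best first) and keeps whole items
// until the next one would exceed the slot budget. Items are never cut mid-text.
function fillSlot(items: string[], budget: number): string[] {
  const kept: string[] = []
  let used = 0
  for (const item of items) {
    const cost = estimateTokens(item)
    if (used + cost > budget) break
    kept.push(item)
    used += cost
  }
  return kept
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;fillSlot(rankedEpisodes, MEMORY_BUDGET.episodes)&lt;/code&gt; per slot means a verbose retrieval path can only spend its own allowance. Stopping at the first item that doesn't fit keeps strict rank order; skipping it and trying smaller items is a reasonable alternative.&lt;/p&gt;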

&lt;h3&gt;
  
  
  Overflow Handling
&lt;/h3&gt;

&lt;p&gt;When the current task payload exceeds its budget (a long email thread, a large invoice batch):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract only the last 3 exchanges from the thread&lt;/li&gt;
&lt;li&gt;Summarise older exchanges in 3 sentences using a cheap model call&lt;/li&gt;
&lt;li&gt;Append the full text as a reference block the agent can query via tool if needed&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Never silently truncate. Truncation removes content without the agent knowing it's missing.&lt;/p&gt;
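
&lt;p&gt;The three overflow steps can be sketched as one function. Everything here is illustrative: the &lt;code&gt;Exchange&lt;/code&gt; and &lt;code&gt;TaskPayload&lt;/code&gt; shapes are hypothetical, &lt;code&gt;summarise&lt;/code&gt; stands in for the cheap model call, and &lt;code&gt;storeFullText&lt;/code&gt; stands in for whatever makes the full thread available to a lookup tool. Both are passed in as plain synchronous callbacks to keep the sketch short; real versions would be asynchronous.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;interface Exchange {
  from: string
  body: string
}

interface TaskPayload {
  summaryOfOlder: string | null // null when nothing was summarised
  recent: Exchange[]            // most recent exchanges, verbatim
  fullThreadRef: string | null  // handle the agent can pass to a lookup tool
}

// `summarise` and `storeFullText` are hypothetical, injected dependencies.
function buildPayload(
  thread: Exchange[],
  fitsBudget: boolean,
  summarise: (older: Exchange[]) => string,
  storeFullText: (thread: Exchange[]) => string
): TaskPayload {
  if (fitsBudget) {
    return { summaryOfOlder: null, recent: thread, fullThreadRef: null }
  }
  const keep = 3 // last 3 exchanges stay verbatim
  const splitAt = Math.max(0, thread.length - keep)
  const older = thread.slice(0, splitAt)
  return {
    summaryOfOlder: older.length > 0 ? summarise(older) : null,
    recent: thread.slice(splitAt),
    fullThreadRef: storeFullText(thread),
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the payload always carries either the full thread or an explicit summary plus a reference, the agent can tell when it is looking at a shortened view.&lt;/p&gt;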




&lt;h2&gt;
  
  
  What Injection Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;Concrete example: the email agent handles an incoming email from a known contact.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic facts injected:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: Maria Rossi, Nico Rossi Ltd, fashion sector, formal-concise tone, Italian
Contact: Marco Bertelli (Bertelli &amp;amp; Co, client): formal tone, no exclamation marks,
  reliable payments, primary contact for autumn/winter orders
Rules: never send without approval · emails containing 'urgent': high priority
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Procedures injected (for the email agent):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- Never use semicolons — the user always removes them
- With technical clients: direct, no opening pleasantries, get to the point in the first line
- Quotes above €10,000: don't prepare a draft, the user always rewrites them
- Friday after 14:30: defer to Monday, don't prepare a response
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Episodes injected (top 5 for this contact — mixed tiers):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[2 days ago] Drafted reply re: Q3 order. Outcome: approved and sent.

[1 week ago] Drafted reply re: samples. Outcome: modified by user
(removed semicolon, shortened central paragraph).

[2 weeks ago] Incoming email: quote request. Outcome: forwarded to
manager with tag "quote-bertelli".

[5 weeks ago · weekly summary] Week of May 2-8: 6 interactions with
Bertelli. 5 approved without edits (orders, shipping confirmations).
1 modified: removed semicolon, shortened opener. No exceptions.
Pattern confirmed: direct tone, brief openings, fast approval rate.

[3 months ago · monthly summary] March 2026 — email agent × Bertelli:
24 interactions, 96% approval rate. Consolidated style: formal-concise,
no semicolons, quotes always forwarded to manager. No significant
exceptions in the month.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Observations injected:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;"With this contact you consistently use formal, concise tone.
No exclamation marks in 18 emails."
  high confidence · 18 occurrences

"Emails from this contact containing quote requests were always
marked 'to-do' — not responded to the same day."
  medium confidence · 4 occurrences
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Total context used: ~1,800 tokens for all memory + ~400 tokens for the actual email.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent has three months of behavioral history about this specific contact — recent events at full fidelity, older patterns as compact summaries — without the context window growing unboundedly. This is what the compaction pipeline earns you.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Nightly Maintenance Job
&lt;/h2&gt;

&lt;p&gt;All compaction, pattern detection, and promotion happen in a single nightly job. It runs at a quiet time (02:14, offset from round hours to avoid resource contention with other scheduled jobs):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;nightlyMemoryMaintenance&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;day30ago&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;86&lt;/span&gt;&lt;span class="nx"&gt;_400_000&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;day90ago&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;90&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;86&lt;/span&gt;&lt;span class="nx"&gt;_400_000&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 1: Tier-1 compaction&lt;/span&gt;
  &lt;span class="c1"&gt;// Find raw episodes older than 30 days, grouped by agent × contact × week&lt;/span&gt;
  &lt;span class="c1"&gt;// Summarise into weekly compact records using a cheap model&lt;/span&gt;
  &lt;span class="c1"&gt;// Source rows → status='superseded_raw', embeddings deleted&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runTier1Compaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;day30ago&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 2: Tier-2 compaction&lt;/span&gt;
  &lt;span class="c1"&gt;// Find tier-1 compact records older than 90 days, grouped by agent × contact × month&lt;/span&gt;
  &lt;span class="c1"&gt;// Summarise into monthly period records&lt;/span&gt;
  &lt;span class="c1"&gt;// Source tier-1 records → status='superseded_tier1', embeddings deleted&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runTier2Compaction&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;day90ago&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 3: Observation consolidation&lt;/span&gt;
  &lt;span class="c1"&gt;// Merge observations with cosine similarity &amp;gt; 0.85 in the same agent + category&lt;/span&gt;
  &lt;span class="c1"&gt;// Winner keeps the richer evidence, loser → status='merged', embedding deleted&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;consolidateObservations&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 4: Pattern detection on last 30 days of active episodes&lt;/span&gt;
  &lt;span class="c1"&gt;// Reads raw + tier-1 compact, outputs up to 3 new observations&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;runPatternDetection&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 5: Promotion check&lt;/span&gt;
  &lt;span class="c1"&gt;// Promote observations that meet the confidence + candidate threshold to procedures&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;checkPromotionCandidates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="c1"&gt;// Step 6: GDPR deletion requests&lt;/span&gt;
  &lt;span class="c1"&gt;// Process any queued erasure requests — all tiers, all memory types&lt;/span&gt;
  &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;processDeletionRequests&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

  &lt;span class="c1"&gt;// Note: no decay score refresh step.&lt;/span&gt;
  &lt;span class="c1"&gt;// Decay is computed dynamically from last_accessed_at at retrieval time.&lt;/span&gt;
  &lt;span class="c1"&gt;// Pre-computing and storing it would add complexity for no benefit.&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The ordering matters. Compaction runs before pattern detection so the pattern detector sees the already-compacted timeline — it reads compact summaries for older data, not thousands of raw rows. This keeps the pattern detection prompt short and cheap.&lt;/p&gt;
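
&lt;p&gt;The job's closing note, that decay is computed at retrieval time rather than refreshed nightly, can be sketched in a few lines of Python. This is a minimal sketch, assuming an exponential decay of the form exp(-λ · age in days). The &lt;code&gt;decayed_score&lt;/code&gt; helper, the tier names, and every λ value except λ=0 for procedures are illustrative assumptions, not parameters taken from this system.&lt;/p&gt;

```python
import math
import time

# Hypothetical per-tier decay rates (per day). Only "procedure": 0.0 reflects
# the article's rule that procedures never decay; the rest are placeholders.
DECAY_LAMBDA = {
    "raw": 0.05,
    "tier1": 0.02,
    "tier2": 0.005,
    "procedure": 0.0,
}

MS_PER_DAY = 86_400_000


def decayed_score(similarity, tier, last_accessed_at_ms, now_ms=None):
    """Scale a retrieval similarity by exp(-lambda * age_in_days).

    Nothing is stored: the decay factor is derived from last_accessed_at
    each time a memory is scored, so there is no nightly refresh to run.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    age_days = max(0.0, (now_ms - last_accessed_at_ms) / MS_PER_DAY)
    return similarity * math.exp(-DECAY_LAMBDA[tier] * age_days)


now = 1_800_000_000_000
ten_days_ago = now - 10 * MS_PER_DAY
print(decayed_score(0.9, "raw", ten_days_ago, now))        # decayed below 0.9
print(decayed_score(0.9, "procedure", ten_days_ago, now))  # stays at 0.9
```

&lt;p&gt;Because the factor depends only on a timestamp already stored on the row, changing a λ takes effect on the next retrieval with no backfill.&lt;/p&gt;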




&lt;h2&gt;
  
  
  Choosing an Embedding Model for Behavioral Memory
&lt;/h2&gt;

&lt;p&gt;For most RAG applications, &lt;code&gt;text-embedding-3-small&lt;/code&gt; is the right default. For behavioral memory systems with multilingual content, you need to think harder about one specific capability: &lt;strong&gt;negation handling&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Consider these two memories:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Never archive emails from Lara"&lt;/li&gt;
&lt;li&gt;"Archive emails from Lara"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A static embedding model — one that averages token embeddings without running attention over the full sentence — will produce nearly identical vectors for these. The negation ("never") is a single low-frequency token whose embedding gets averaged away. In a document retrieval system, this is annoying. In a behavioral memory system where a wrong rule gets injected verbatim into every agent call, this is a correctness failure.&lt;/p&gt;

&lt;p&gt;Contextual encoder models (XLM-RoBERTa family, E5 family) run full attention over the input. They produce meaningfully different embeddings for negated vs non-negated rules because the attention mechanism encodes the relationship between "never" and the rest of the sentence.&lt;/p&gt;
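
&lt;p&gt;The static-averaging failure described above can be reproduced with a toy model. The Python sketch below uses one-hot token vectors as a stand-in for a static embedding table (an assumption made purely for illustration; real static models use dense learned vectors) and shows how little a single "never" token moves the averaged vector.&lt;/p&gt;

```python
import math


def static_embed(sentence, vocab):
    """Average one-hot token vectors: a stand-in for static, non-contextual pooling."""
    tokens = sentence.lower().split()
    vec = [0.0] * len(vocab)
    for tok in tokens:
        vec[vocab[tok]] += 1.0 / len(tokens)
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rule = "Never archive emails from Lara"
opposite = "Archive emails from Lara"

# Build a shared vocabulary from both sentences.
words = sorted(set(rule.lower().split()) | set(opposite.lower().split()))
vocab = {word: i for i, word in enumerate(words)}

similarity = cosine(static_embed(rule, vocab), static_embed(opposite, vocab))
print(round(similarity, 3))  # 0.894
```

&lt;p&gt;Two rules with opposite meanings score about 0.89, which would clear a typical near-duplicate threshold. In this toy setup the gap narrows further as the rule gets longer, since one negation token is averaged against more shared tokens.&lt;/p&gt;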

&lt;p&gt;For local deployment (no data leaving the device), &lt;code&gt;intfloat/multilingual-e5-small&lt;/code&gt; in q8 ONNX quantization is a strong choice:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;384 dimensions, 117M parameters&lt;/li&gt;
&lt;li&gt;~30MB on disk, ~90MB loaded&lt;/li&gt;
&lt;li&gt;12-20ms warm inference on CPU&lt;/li&gt;
&lt;li&gt;Strong multilingual quality including Italian, German, Spanish, French&lt;/li&gt;
&lt;li&gt;Ships a pre-built quantized ONNX via Transformers.js — no compilation step&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The E5 model requires prefixes on its inputs — &lt;code&gt;passage:&lt;/code&gt; for documents being stored, &lt;code&gt;query:&lt;/code&gt; for queries at retrieval time. This is a training requirement, not optional:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// For storing a document (episode summary, observation, procedure rule)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;docEmbedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`passage: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;pooling&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mean&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// For a retrieval query (task description, current agent context)&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;queryEmbedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`query: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;pooling&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mean&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Omitting the prefix degrades retrieval quality measurably. The asymmetric prefixes are how the model was trained — &lt;code&gt;passage:&lt;/code&gt; for longer, self-contained documents; &lt;code&gt;query:&lt;/code&gt; for shorter, lookup-intent strings.&lt;/p&gt;

&lt;p&gt;Run the model in a worker thread to keep the main process event loop unblocked. Embedding inference on CPU takes 12-20ms — tolerable in a background context, but a source of latency jitter if it blocks the main thread during agent execution.&lt;/p&gt;




&lt;h2&gt;
  
  
  GDPR: Memory Systems Have a Compliance Problem
&lt;/h2&gt;

&lt;p&gt;Behavioral agent memory is not just vector embeddings. It's observations about how a person writes, when they work, how they make decisions. Under GDPR, this is personal data. Under the EU AI Act (most obligations apply from August 2026), agents that make consequential decisions using this data may be high-risk systems subject to documentation and traceability requirements.&lt;/p&gt;

&lt;p&gt;The memory architecture choices that matter for compliance:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Namespace every memory type by user from day one.&lt;/strong&gt; Not as an afterthought. If your behavioral data lives in a flat, unnested store, you cannot answer an Article 17 erasure request without a full table scan and potential collateral deletion. User-scoped namespaces make deletion a targeted operation over one user's rows: &lt;code&gt;DELETE FROM episodes WHERE user_id = ?&lt;/code&gt; hits an index on &lt;code&gt;user_id&lt;/code&gt; and, with foreign keys declared, cascades cleanly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Store behavioral signals, not content.&lt;/strong&gt; The episode table should store "user modified draft, removed semicolon from third paragraph" — not the email text itself. Content stays in working memory and is discarded at session end. Behavioral patterns are what matter for memory; content is the medium through which they were expressed. This distinction dramatically reduces your Article 35 DPIA scope.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The deletion pipeline must cover all tiers.&lt;/strong&gt; When a user requests erasure, you must delete: raw episodes, tier-1 compact records, tier-2 compact records, observations, procedures, embeddings in &lt;code&gt;episodes_vec&lt;/code&gt; and &lt;code&gt;observations_vec&lt;/code&gt;, and the source signal queue entries. A spec-level deletion path that covers only the main table and misses the vector tables is a compliance failure.&lt;/p&gt;
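
&lt;p&gt;A deletion path that covers every tier might look like the Python sketch below, using the standard-library &lt;code&gt;sqlite3&lt;/code&gt; module. The schema is hypothetical: apart from &lt;code&gt;episodes_vec&lt;/code&gt; and &lt;code&gt;observations_vec&lt;/code&gt;, all table and column names are assumptions, as is the choice to keep raw, tier-1, and tier-2 rows in one &lt;code&gt;episodes&lt;/code&gt; table. The vector tables are modelled as plain tables for illustration.&lt;/p&gt;

```python
import sqlite3

# Hypothetical schema: table and column names other than episodes_vec and
# observations_vec are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE episodes (id INTEGER PRIMARY KEY, user_id TEXT, tier TEXT, summary TEXT);
CREATE TABLE observations (id INTEGER PRIMARY KEY, user_id TEXT, quote TEXT);
CREATE TABLE procedures (id INTEGER PRIMARY KEY, user_id TEXT, rule TEXT);
CREATE TABLE signal_queue (id INTEGER PRIMARY KEY, user_id TEXT, payload TEXT);
CREATE TABLE episodes_vec (episode_id INTEGER, embedding BLOB);
CREATE TABLE observations_vec (observation_id INTEGER, embedding BLOB);
"""


def erase_user(conn, user_id):
    """Delete every memory type for one user in a single transaction."""
    with conn:  # commits on success, rolls back if any statement fails
        # Embeddings first, while the parent rows still identify the owner.
        conn.execute(
            "DELETE FROM episodes_vec WHERE episode_id IN "
            "(SELECT id FROM episodes WHERE user_id = ?)", (user_id,))
        conn.execute(
            "DELETE FROM observations_vec WHERE observation_id IN "
            "(SELECT id FROM observations WHERE user_id = ?)", (user_id,))
        # In this sketch the episodes table holds raw, tier-1 and tier-2 rows,
        # superseded ones included, so one statement covers all tiers.
        for table in ("episodes", "observations", "procedures", "signal_queue"):
            conn.execute(f"DELETE FROM {table} WHERE user_id = ?", (user_id,))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO episodes VALUES (1, 'maria', 'raw', 'approved draft')")
conn.execute("INSERT INTO episodes VALUES (2, 'other', 'tier1', 'weekly summary')")
conn.execute("INSERT INTO episodes_vec VALUES (1, x'00')")
conn.execute("INSERT INTO episodes_vec VALUES (2, x'01')")
conn.commit()

erase_user(conn, "maria")
print(conn.execute("SELECT COUNT(*) FROM episodes_vec").fetchone()[0])  # 1
```

&lt;p&gt;Deleting the embeddings before their parent rows matters here because the vector tables carry no &lt;code&gt;user_id&lt;/code&gt; of their own. Once the parents are gone, orphaned embeddings can no longer be attributed to a user.&lt;/p&gt;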

&lt;p&gt;&lt;strong&gt;Encrypt exported memory files.&lt;/strong&gt; If you implement a memory export feature (for backup or portability), use AES-256-GCM with a scrypt-derived key from a user-supplied passphrase. The derived key should exist only in memory for the duration of the operation — never written to disk. A stolen backup file should reveal nothing without the passphrase.&lt;/p&gt;
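
&lt;p&gt;The key-derivation half of that scheme can be sketched with Python's standard library alone. The &lt;code&gt;derive_export_key&lt;/code&gt; helper and the scrypt cost parameters below are illustrative assumptions, not values from this system. The AES-256-GCM step itself needs a third-party cryptography library and is left out of this sketch.&lt;/p&gt;

```python
import hashlib
import os


def derive_export_key(passphrase, salt=None):
    """Derive a 32-byte key (AES-256 size) from a passphrase with scrypt.

    The cost parameters n, r, p are illustrative assumptions; tune them for
    your hardware. The salt is not secret and is stored next to the ciphertext.
    """
    if salt is None:
        salt = os.urandom(16)
    key = hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=2**14, r=8, p=1,
        dklen=32,
    )
    return key, salt


key, salt = derive_export_key("correct horse battery staple")
same_key, _ = derive_export_key("correct horse battery staple", salt)
print(len(key), key == same_key)  # 32 True
```

&lt;p&gt;Only the salt is written into the export file. The key is recomputed from the passphrase on import and should be dropped from memory as soon as the operation finishes.&lt;/p&gt;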




&lt;h2&gt;
  
  
  Production Checklist
&lt;/h2&gt;

&lt;p&gt;Before shipping a behavioral memory system:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No-delete rule enforced:&lt;/strong&gt; decay score never triggers row deletion — only compaction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compaction nightly job:&lt;/strong&gt; tier-1 at 30 days, tier-2 at 90 days, never deletes source rows&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Feedback loop complete:&lt;/strong&gt; "wrong" feedback cascades to observation rejection + embedding deletion + procedure demotion + counter-signal write&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Counter-signal on rejection:&lt;/strong&gt; nightly pattern detection reads rejected quotes before inserting, skips near-duplicates of known rejections&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid retrieval:&lt;/strong&gt; BM25 + cosine similarity merged and rescored — not pure vector search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tier-aware decay:&lt;/strong&gt; different λ per compaction tier, λ=0 for procedures&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context budget enforced:&lt;/strong&gt; explicit token caps per memory type, overflow handled by compression not truncation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Procedures injected in full:&lt;/strong&gt; never vector-searched, always fetched entirely and prepended to system prompt&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic facts via direct SQL:&lt;/strong&gt; no vector search for structured relational lookups&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding model handles negation:&lt;/strong&gt; contextual encoder (E5 family), not static averaging&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding prefix discipline:&lt;/strong&gt; &lt;code&gt;passage:&lt;/code&gt; for documents, &lt;code&gt;query:&lt;/code&gt; for retrieval queries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Worker thread for inference:&lt;/strong&gt; never block the main event loop on embedding calls&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GDPR deletion covers all tiers:&lt;/strong&gt; raw episodes + compact records + embeddings + observations + procedures + signal queue&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Behavioral signal only:&lt;/strong&gt; episode table stores metadata and outcomes, not content&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Related Posts
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/agent-memory-mem0-zep-langmem-production"&gt;Adding Memory to Production AI Agents: Mem0, Zep, and LangMem Compared&lt;/a&gt; — when to use external memory layers vs building your own&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/agent-architecture-memory-management-design"&gt;Designing Agent Architecture with Memory: A Framework from Anthropic's Patterns and LangGraph's Primitives&lt;/a&gt; — matching Anthropic's workflow patterns to the right LangGraph memory architecture&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://dev.to/blog/gdpr-compliant-ai-eu-ai-act-guardrails"&gt;GDPR-Compliant AI: Building Guardrails for EU AI Act Readiness&lt;/a&gt; — the full compliance stack for EU-facing AI systems&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agentmemoryarchitecture</category>
      <category>episodicmemoryagents</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Your AI Agent Is Confidently Lying — And It's Your Memory System's Fault</title>
      <dc:creator>Abhishek Chauhan</dc:creator>
      <pubDate>Mon, 06 Apr 2026 20:40:32 +0000</pubDate>
      <link>https://dev.to/ac12644/your-ai-agent-is-confidently-lying-and-its-your-memory-systems-fault-4d82</link>
      <guid>https://dev.to/ac12644/your-ai-agent-is-confidently-lying-and-its-your-memory-systems-fault-4d82</guid>
      <description>&lt;p&gt;Last month, an AI agent I built told a user "As a Senior Engineer at Google, you should consider..." &lt;/p&gt;

&lt;p&gt;The user had been promoted to Staff Engineer three months earlier. The agent had no idea. No error. No warning. Just a confident, wrong answer served from stale memory.&lt;/p&gt;

&lt;p&gt;That's when I realized: &lt;strong&gt;the biggest risk in AI agents isn't hallucination — it's stale memory served with high confidence.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Talks About
&lt;/h2&gt;

&lt;p&gt;AI agents using memory systems (Mem0, Zep, Letta, LangMem) store facts about users, companies, and decisions. Things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"John works as Senior Engineer at Google"&lt;/li&gt;
&lt;li&gt;"Pro plan costs $99/month"
&lt;/li&gt;
&lt;li&gt;"Sarah reports to Mike in Engineering"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These facts get stored once and served forever. No expiration. No re-verification. No staleness check.&lt;/p&gt;

&lt;p&gt;Here's what makes it dangerous: where memory systems do decay facts, they do it by access frequency or TTL timers. But a frequently-retrieved memory about a user's job title is &lt;strong&gt;highly relevant&lt;/strong&gt; until the moment it's wrong, at which point it becomes &lt;em&gt;confidently wrong&lt;/em&gt; rather than just outdated.&lt;/p&gt;

&lt;p&gt;An agent without memory would ask "What do you do?" again. Slightly annoying, but honest. An agent with stale memory states the wrong answer as established fact. That's worse.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Big Is This Problem?
&lt;/h2&gt;

&lt;p&gt;I ran a simple experiment. I stored 24 real-world facts in Mem0 — job titles, pricing, company info, policies, technical details. Then I checked each one against its original source after simulating 90 days:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pricing facts&lt;/strong&gt; — 55% had changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Policy facts&lt;/strong&gt; — 45% had changed
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Job titles&lt;/strong&gt; — 15% had changed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Addresses&lt;/strong&gt; — 5% had changed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;More than a third of stored facts were wrong within 3 months. And agents were retrieving them hundreds of times without knowing.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built: MemGuard
&lt;/h2&gt;

&lt;p&gt;I built an open-source platform that sits &lt;strong&gt;beside&lt;/strong&gt; your memory system (doesn't replace it) and continuously validates whether stored facts are still true.&lt;/p&gt;

&lt;p&gt;Think of it as &lt;strong&gt;Datadog for agent memory&lt;/strong&gt; — it monitors, validates, and alerts, but doesn't own the data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bkqhvjy36p59hqgsqfk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8bkqhvjy36p59hqgsqfk.png" alt="MemGuard Dashboard" width="800" height="811"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How It Works
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. Connect&lt;/strong&gt; — MemGuard plugs into your existing memory system. Native connectors for Mem0, Zep, Letta, LangMem, or any REST API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Validate&lt;/strong&gt; — Five strategies, from simple to AI-powered:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Strategy&lt;/th&gt;
&lt;th&gt;How&lt;/th&gt;
&lt;th&gt;Needs LLM?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Source-Linked&lt;/td&gt;
&lt;td&gt;Re-fetch original source URL, compare values&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cross-Reference&lt;/td&gt;
&lt;td&gt;Check against 2-3 independent sources&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temporal Pattern&lt;/td&gt;
&lt;td&gt;Statistical staleness prediction per fact-type&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Drift&lt;/td&gt;
&lt;td&gt;LLM detects contradictions in recent context&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Causal Chain&lt;/td&gt;
&lt;td&gt;Find dependent facts that break together&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;3. Score&lt;/strong&gt; — Every memory gets a composite trust score (0-100%) based on source reliability, freshness, cross-reference agreement, dependency health, historical accuracy, and retrieval importance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Quarantine&lt;/strong&gt; — Facts below 30% trust are automatically quarantined so agents stop using them. Facts below 50% are flagged for review.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Alert&lt;/strong&gt; — Dashboard, webhooks, or MCP tools so agents can call &lt;code&gt;validate_memory()&lt;/code&gt; before acting on stored facts.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Trust Score
&lt;/h3&gt;

&lt;p&gt;This is the core of MemGuard. Each memory's trust score is a weighted combination of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Trust = 0.20 x source_reliability
      + 0.25 x freshness (exponential decay by fact-type)
      + 0.20 x cross_reference_agreement  
      + 0.10 x dependency_health
      + 0.15 x historical_accuracy
      + 0.10 x retrieval_importance
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The key insight: &lt;strong&gt;retrieval frequency increases urgency, not trust.&lt;/strong&gt; A stale memory retrieved 100 times/day is more dangerous than one retrieved once/month. High retrieval + low trust = highest risk.&lt;/p&gt;
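
&lt;p&gt;As a rough illustration, the weighted sum and the quarantine thresholds from step 4 can be written as the Python sketch below. The weights are copied from the formula above. The function names, the 0.0 to 1.0 scale for each component, and the example component values are assumptions made for this sketch, not MemGuard's actual API or data.&lt;/p&gt;

```python
# Weights copied from the trust formula; they sum to 1.0.
WEIGHTS = {
    "source_reliability": 0.20,
    "freshness": 0.25,
    "cross_reference_agreement": 0.20,
    "dependency_health": 0.10,
    "historical_accuracy": 0.15,
    "retrieval_importance": 0.10,
}


def trust_score(components):
    """Weighted sum of component scores, each expected in the range 0.0 to 1.0."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def triage(trust):
    """Map a trust score to the actions described in the Quarantine step."""
    if trust >= 0.50:
        return "ok"
    if trust >= 0.30:
        return "flag_for_review"
    return "quarantine"


# Made-up inputs for a job title that has not been re-verified recently.
stale_job_title = {
    "source_reliability": 0.8,
    "freshness": 0.1,
    "cross_reference_agreement": 0.0,
    "dependency_health": 0.5,
    "historical_accuracy": 0.5,
    "retrieval_importance": 0.9,
}
score = trust_score(stale_job_title)
print(round(score, 3), triage(score))  # 0.4 flag_for_review
```

&lt;p&gt;With these inputs a reliable source and heavy retrieval are not enough to keep the fact above the review threshold once freshness and cross-reference agreement collapse.&lt;/p&gt;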

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslnu58gz3dyyp4ko75qr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslnu58gz3dyyp4ko75qr.png" alt="Memories with Trust Scores" width="800" height="1048"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  MCP Integration — Agents Validate Before Acting
&lt;/h2&gt;

&lt;p&gt;MemGuard exposes an MCP server so agents can self-check before using memories:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Agent's internal flow
&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_job_title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Before acting on it, validate
&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validate_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;memory_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;trust_score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Safe to use
&lt;/span&gt;    &lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;As a &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Don't trust it, ask the user instead
&lt;/span&gt;    &lt;span class="nf"&gt;respond&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can you confirm your current role?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Four MCP tools available:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;validate_memory&lt;/code&gt; — check a specific fact before using it&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_memory_health&lt;/code&gt; — overall health metrics&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;report_stale_memory&lt;/code&gt; — agent reports suspected staleness&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_trusted_memories&lt;/code&gt; — retrieve only high-trust facts&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;p&gt;Three commands:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ac12644/MemGuard.git
&lt;span class="nb"&gt;cd &lt;/span&gt;MemGuard
docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Dashboard at &lt;code&gt;localhost:3000&lt;/code&gt;. API docs at &lt;code&gt;localhost:8001/docs&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Then: Add Connector -&amp;gt; Pick Mem0/Zep/Letta -&amp;gt; Enter API key -&amp;gt; Sync -&amp;gt; Run Validation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi45dh7thsvufn3crx9j0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi45dh7thsvufn3crx9j0.png" alt="Validation Strategies" width="800" height="656"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Tech Stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Backend:&lt;/strong&gt; Python 3.12, FastAPI, SQLAlchemy 2.0, Celery&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Database:&lt;/strong&gt; PostgreSQL 16, Redis 7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dashboard:&lt;/strong&gt; React 18, Tailwind CSS, Vite, Recharts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LLM:&lt;/strong&gt; Anthropic Claude (optional — core works without it)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MCP:&lt;/strong&gt; Python MCP SDK for agent integration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deploy:&lt;/strong&gt; Docker Compose, Caddy for auto-TLS in production&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  What I Learned Building This
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Fact-type matters more than age.&lt;/strong&gt; Pricing changes every quarter. Addresses change every decade. A blanket TTL is useless — you need per-category staleness curves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. The most dangerous memories are the most useful ones.&lt;/strong&gt; High-retrieval memories are the ones agents rely on most. When they go stale, the blast radius is massive.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Agents should validate, not just retrieve.&lt;/strong&gt; The MCP integration changes the agent's behavior from "retrieve and trust" to "retrieve, validate, then decide." That single change prevents most stale-memory errors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. You don't need an LLM for most validation.&lt;/strong&gt; Source re-fetch and temporal patterns catch 80% of staleness without any LLM cost. Save the AI-powered strategies for edge cases.&lt;/p&gt;
&lt;h2&gt;
  
  
  Open Source — Apache 2.0
&lt;/h2&gt;

&lt;p&gt;The full project is on GitHub:&lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/ac12644" rel="noopener noreferrer"&gt;
        ac12644
      &lt;/a&gt; / &lt;a href="https://github.com/ac12644/MemGuard" rel="noopener noreferrer"&gt;
        MemGuard
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      AI Agent Memory Validation Platform — continuously verify whether facts stored in AI agent memory systems (Mem0, Zep, Letta, LangMem) are still true. Like Datadog for agent memory.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;p&gt;
  &lt;a rel="noopener noreferrer" href="https://github.com/ac12644/MemGuard/memguard_logo/memguard-logo-compact.svg"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fac12644%2FMemGuard%2FHEAD%2Fmemguard_logo%2Fmemguard-logo-compact.svg" alt="MemGuard" width="260"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;strong&gt;AI Agent Memory Validation Platform&lt;/strong&gt;&lt;br&gt;
  Continuously verify whether facts stored in AI agent memory systems are still true
&lt;/p&gt;
&lt;p&gt;
  &lt;a href="https://github.com/ac12644/MemGuard/actions/workflows/ci.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/ac12644/MemGuard/actions/workflows/ci.yml/badge.svg" alt="CI"&gt;&lt;/a&gt;
  &lt;a href="https://github.com/ac12644/MemGuard/blob/main/LICENSE" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/5b60841bea9e11d9d0b0950d690c9bc554e06385634056a7d5d62a15d1a4eabe/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4170616368655f322e302d626c75652e737667" alt="License"&gt;&lt;/a&gt;
  &lt;a href="https://www.python.org/downloads/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/23f5099969070c4ec78f5f80f956edcf95debda0c20c9efb471ca70d66c9356a/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f707974686f6e2d332e31322b2d3337373641422e7376673f6c6f676f3d707974686f6e266c6f676f436f6c6f723d7768697465" alt="Python"&gt;&lt;/a&gt;
  &lt;a href="https://fastapi.tiangolo.com" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/8d2924f345149137c7d295f97160adac3096b87ede7ee14b1431736c14c33850/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f466173744150492d3030393638382e7376673f6c6f676f3d66617374617069266c6f676f436f6c6f723d7768697465" alt="FastAPI"&gt;&lt;/a&gt;
  &lt;a href="https://react.dev" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/10cc78e75896498c7565b96dcf7a0b20a891cccce4e7ca56e08202e6385411ab/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f52656163742d31382d3631444146422e7376673f6c6f676f3d7265616374266c6f676f436f6c6f723d7768697465" alt="React"&gt;&lt;/a&gt;
  &lt;a href="https://github.com/ac12644/MemGuard/pulls" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/dd0b24c1e6776719edb2c273548a510d6490d8d25269a043dfabbd38419905da/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f5052732d77656c636f6d652d627269676874677265656e2e737667" alt="PRs Welcome"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;p&gt;
  &lt;a href="https://github.com/ac12644/MemGuard#quick-start" rel="noopener noreferrer"&gt;Quick Start&lt;/a&gt; ·
  &lt;a href="https://github.com/ac12644/MemGuard#connectors" rel="noopener noreferrer"&gt;Connectors&lt;/a&gt; ·
  &lt;a href="https://github.com/ac12644/MemGuard#validation-strategies" rel="noopener noreferrer"&gt;Strategies&lt;/a&gt; ·
  &lt;a href="https://github.com/ac12644/MemGuard#api-reference" rel="noopener noreferrer"&gt;API&lt;/a&gt; ·
  &lt;a href="https://github.com/ac12644/MemGuard#contributing" rel="noopener noreferrer"&gt;Contributing&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
  &lt;a rel="noopener noreferrer" href="https://github.com/ac12644/MemGuard/docs/screenshots/screenshot-6.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fac12644%2FMemGuard%2FHEAD%2Fdocs%2Fscreenshots%2Fscreenshot-6.png" alt="Dashboard" width="800"&gt;&lt;/a&gt;
&lt;/p&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why MemGuard?&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;AI agents store facts in memory systems — a user's job title, a product's price, a company's address. These facts go stale silently. The agent keeps using them with high confidence, delivering &lt;strong&gt;wrong answers without any warning&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;MemGuard sits &lt;strong&gt;beside&lt;/strong&gt; your memory system (Mem0, Zep, Letta, LangMem, or any REST API) as a sidecar that monitors, validates, and alerts — like Datadog for agent memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Core insight:&lt;/strong&gt; Memory systems decay facts by access frequency or TTL timers. But a frequently-retrieved memory about a user's employer is highly relevant until it's wrong — then it becomes &lt;em&gt;confidently wrong&lt;/em&gt; rather than just outdated. MemGuard detects this proactively.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Screenshots&lt;/h2&gt;
&lt;/div&gt;


&lt;strong&gt;Memories&lt;/strong&gt; — Browse and filter tracked memories with trust scores
&lt;br&gt;
&lt;a rel="noopener noreferrer" href="https://github.com/ac12644/MemGuard/docs/screenshots/screenshot-5.png"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fac12644%2FMemGuard%2FHEAD%2Fdocs%2Fscreenshots%2Fscreenshot-5.png" alt="Memories" width="800"&gt;&lt;/a&gt;


&lt;p&gt;&lt;br&gt;
&lt;strong&gt;Validations&lt;/strong&gt; — Run…&lt;/p&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/ac12644/MemGuard" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


&lt;ul&gt;
&lt;li&gt;5 connectors (Mem0, Zep, Letta, LangMem, Generic REST)&lt;/li&gt;
&lt;li&gt;5 validation strategies&lt;/li&gt;
&lt;li&gt;40 API endpoints&lt;/li&gt;
&lt;li&gt;Dashboard with onboarding&lt;/li&gt;
&lt;li&gt;MCP server for agent integration&lt;/li&gt;
&lt;li&gt;Production-ready with Caddy TLS + automated backups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Contributions welcome. If you're building AI agents with memory systems, I'd love to hear what validation strategies matter most for your use cases.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;If your agent has ever confidently told a user something that was true six months ago but not today — that's the problem MemGuard solves.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Built a Multi-Agent Starter Kit with LangGraph — 6 Patterns, 5 Providers, One Command</title>
      <dc:creator>Abhishek Chauhan</dc:creator>
      <pubDate>Sun, 05 Apr 2026 14:09:08 +0000</pubDate>
      <link>https://dev.to/ac12644/i-built-a-multi-agent-starter-kit-with-langgraph-6-patterns-5-providers-one-command-b8g</link>
      <guid>https://dev.to/ac12644/i-built-a-multi-agent-starter-kit-with-langgraph-6-patterns-5-providers-one-command-b8g</guid>
      <description>&lt;p&gt;If you've built more than one LangGraph project, you know the drill. Supervisor setup. Provider config. Handoff tools. Persistence. Streaming endpoint. Same boilerplate, different repo.&lt;/p&gt;

&lt;p&gt;So I stopped rewriting it and packaged the whole thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;a href="https://github.com/ac12644/langgraph-starter-kit" rel="noopener noreferrer"&gt;LangGraph Starter Kit&lt;/a&gt;
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx create-langgraph-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Interactive CLI. Pick your provider, pick your patterns, get a project that runs.&lt;/p&gt;

&lt;p&gt;Or clone the full kit with everything included.&lt;/p&gt;


&lt;h2&gt;
  
  
  6 Patterns
&lt;/h2&gt;

&lt;p&gt;Each one is a standalone app you can use, modify, or delete:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Supervisor&lt;/strong&gt; — central coordinator routes tasks to worker agents&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Swarm&lt;/strong&gt; — agents hand off to each other with transfer tools, no central brain&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-Loop&lt;/strong&gt; — graph pauses for approval before destructive actions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured Output&lt;/strong&gt; — typed JSON responses validated by Zod&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Research Agent&lt;/strong&gt; — web search + scraping, supervisor coordinates a researcher and writer&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RAG&lt;/strong&gt; — in-memory vector store, semantic retrieval, no external DB&lt;/li&gt;
&lt;/ul&gt;
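&lt;p&gt;To make the RAG pattern concrete, here is a minimal sketch of an in-memory vector store with cosine-similarity retrieval. It is not the kit's code: the &lt;code&gt;InMemoryStore&lt;/code&gt; class, the &lt;code&gt;VOCAB&lt;/code&gt; list, and the word-count &lt;code&gt;embed&lt;/code&gt; function are hypothetical stand-ins, and a real setup would call an embedding model instead.&lt;/p&gt;

```typescript
// Hypothetical sketch of in-memory retrieval; names are illustrative only.
interface Doc {
  id: string;
  text: string;
  vector: number[];
}

// Assumption: a tiny fixed vocabulary replaces a real embedding model.
const VOCAB = ["agent", "supervisor", "swarm", "vector", "search", "approval"];

// Count how often each vocabulary term appears in the text.
function embed(text: string): number[] {
  const words = text.toLowerCase().split(/\W+/);
  return VOCAB.map((term) => words.filter((w) => w === term).length);
}

// Cosine similarity; returns 0 when either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

class InMemoryStore {
  private docs: Doc[] = [];

  // Embed the text once at insert time and keep it in process memory.
  add(id: string, text: string): void {
    this.docs.push({ id, text, vector: embed(text) });
  }

  // Score every stored document against the query and return the top k.
  search(query: string, k: number): Doc[] {
    const q = embed(query);
    return this.docs
      .map((doc) => ({ doc, score: cosine(q, doc.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, k)
      .map((hit) => hit.doc);
  }
}
```

&lt;p&gt;Scanning every document on each query is fine for a demo-sized corpus, which is the trade-off you accept by skipping an external database.&lt;/p&gt;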
&lt;h2&gt;
  
  
  5 Providers
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Two lines. Done.&lt;/p&gt;

&lt;p&gt;OpenAI, Anthropic, Google, Groq, Ollama (local). Each has a sensible default model. Override with &lt;code&gt;LLM_MODEL&lt;/code&gt; if you want.&lt;/p&gt;
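&lt;p&gt;As a rough sketch of how env-driven provider selection like this can work: the &lt;code&gt;resolveProvider&lt;/code&gt; helper and the placeholder model names below are assumptions for illustration, not the kit's actual implementation or its real defaults.&lt;/p&gt;

```typescript
// Hypothetical sketch: helper name and default model ids are placeholders,
// not taken from the starter kit.
type Provider = "openai" | "anthropic" | "google" | "groq" | "ollama";

interface ProviderConfig {
  provider: Provider;
  model: string;
}

// Placeholder defaults; substitute the models you actually use.
const DEFAULT_MODELS: { [P in Provider]: string } = {
  openai: "example-openai-model",
  anthropic: "example-anthropic-model",
  google: "example-google-model",
  groq: "example-groq-model",
  ollama: "example-local-model",
};

// Narrow an arbitrary string to one of the supported provider names.
function isProvider(value: string): value is Provider {
  return Object.prototype.hasOwnProperty.call(DEFAULT_MODELS, value);
}

// Read LLM_PROVIDER (required) and LLM_MODEL (optional override) from env.
function resolveProvider(env: { [key: string]: string | undefined }): ProviderConfig {
  const raw = (env["LLM_PROVIDER"] ?? "").toLowerCase();
  if (!isProvider(raw)) {
    throw new Error(`Unsupported LLM_PROVIDER: "${raw}"`);
  }
  // An unset or empty LLM_MODEL falls back to the provider's default.
  return { provider: raw, model: env["LLM_MODEL"] || DEFAULT_MODELS[raw] };
}
```

&lt;p&gt;Calling something like &lt;code&gt;resolveProvider(process.env)&lt;/code&gt; once at startup and failing fast on an unknown provider keeps a typo in &lt;code&gt;.env&lt;/code&gt; from surfacing later as a confusing runtime error.&lt;/p&gt;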


&lt;h2&gt;
  
  
  Extending It
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;createMyApp&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;makeAgent&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my_agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="cm"&gt;/* your tools */&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="na"&gt;system&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;

  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;makeSupervisor&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="nx"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;outputMode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;last_message&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;supervisorName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;my_supervisor&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Register the app in the server and it gets its own endpoint with streaming, threads, and persistence.&lt;/p&gt;
&lt;h2&gt;
  
  
  Also Ships With
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;MCP tool integration (stdio + HTTP)&lt;/li&gt;
&lt;li&gt;SSE streaming on every endpoint&lt;/li&gt;
&lt;li&gt;LangGraph Studio config&lt;/li&gt;
&lt;li&gt;LangSmith tracing (one env var)&lt;/li&gt;
&lt;li&gt;Docker Compose with Postgres&lt;/li&gt;
&lt;li&gt;25+ tests, GitHub Actions CI&lt;/li&gt;
&lt;li&gt;Railway + Render deploy configs&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx create-langgraph-app
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Or clone the full kit and run it locally:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/ac12644/langgraph-starter-kit.git
&lt;span class="nb"&gt;cd &lt;/span&gt;langgraph-starter-kit
npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cp&lt;/span&gt; .env.example .env
npm run dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://assets.dev.to/assets/github-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/ac12644" rel="noopener noreferrer"&gt;
        ac12644
      &lt;/a&gt; / &lt;a href="https://github.com/ac12644/langgraph-starter-kit" rel="noopener noreferrer"&gt;
        langgraph-starter-kit
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      Boilerplate for building multi-agent AI systems with LangGraph. Includes Swarm and Supervisor patterns, memory, tools, and an HTTP API out of the box.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;LangGraph Starter Kit&lt;/h1&gt;
&lt;/div&gt;


&lt;p&gt;
    The fastest way to build production-ready multi-agent apps with LangGraph&lt;br&gt;
    &lt;strong&gt;6 patterns. 5 providers. One command.&lt;/strong&gt;
  &lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://github.com/ac12644/langgraph-starter-kit/actions/workflows/ci.yml" rel="noopener noreferrer"&gt;&lt;img src="https://github.com/ac12644/langgraph-starter-kit/actions/workflows/ci.yml/badge.svg" alt="CI"&gt;&lt;/a&gt;
  &lt;a href="https://opensource.org/licenses/Apache-2.0" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/5b60841bea9e11d9d0b0950d690c9bc554e06385634056a7d5d62a15d1a4eabe/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d4170616368655f322e302d626c75652e737667" alt="License"&gt;&lt;/a&gt;
  &lt;a href="https://www.typescriptlang.org/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/64b50ca1dffa77f84eac47bf09df7d345e04133cb52baf3c55cb8402f953ec81/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f547970655363726970742d352e392b2d3331373843362e737667" alt="TypeScript"&gt;&lt;/a&gt;
  &lt;a href="https://langchain-ai.github.io/langgraphjs/" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/dd44fd1df30cb117161a3a513349aa38d08ad7d4d0bb2e0a3985ca43b81beaf7/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c616e6747726170682d312e322b2d3743334145442e737667" alt="LangGraph"&gt;&lt;/a&gt;
  &lt;a href="https://github.com/ac12644/langgraph-starter-kit/stargazers" rel="noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/a051ce2f57ff858069ba9c12512f3d54ce8c158487640d90e2394f48460c29d2/68747470733a2f2f696d672e736869656c64732e696f2f6769746875622f73746172732f616331323634342f6c616e6767726170682d737461727465722d6b69743f7374796c653d736f6369616c" alt="Stars"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;
  &lt;a href="https://github.com/ac12644/langgraph-starter-kit#quick-start" rel="noopener noreferrer"&gt;Quick Start&lt;/a&gt; •
  &lt;a href="https://github.com/ac12644/langgraph-starter-kit#agent-patterns" rel="noopener noreferrer"&gt;Patterns&lt;/a&gt; •
  &lt;a href="https://github.com/ac12644/langgraph-starter-kit#llm-providers" rel="noopener noreferrer"&gt;Providers&lt;/a&gt; •
  &lt;a href="https://github.com/ac12644/langgraph-starter-kit#api-reference" rel="noopener noreferrer"&gt;API&lt;/a&gt; •
  &lt;a href="https://github.com/ac12644/langgraph-starter-kit/CONTRIBUTING.md" rel="noopener noreferrer"&gt;Contributing&lt;/a&gt;
&lt;/p&gt;




&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Why This Exists&lt;/h2&gt;

&lt;/div&gt;

&lt;p&gt;Building multi-agent systems with LangGraph means writing the same boilerplate over and over — setting up supervisors, wiring handoff tools, configuring providers, adding persistence. This starter kit gives you all of that out of the box so you can focus on your agent logic, not infrastructure.&lt;/p&gt;

&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;npx create-langgraph-app&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What you get:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pick your LLM provider (OpenAI, Anthropic, Google, Groq, or local Ollama)&lt;/li&gt;
&lt;li&gt;Choose which agent patterns you need&lt;/li&gt;
&lt;li&gt;Get a ready-to-run project with tests, types, and a Fastify server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Or clone the full kit with all 6 patterns included.&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Architecture&lt;/h2&gt;

&lt;/div&gt;

&lt;div class="snippet-clipboard-content notranslate position-relative overflow-auto"&gt;
&lt;pre class="notranslate"&gt;&lt;code&gt;              ┌─────────────────────────────────────────────┐
              │             LangGraph Starter Kit            │
              └──────────────────┬──────────────────────────┘
                                 │
              ┌──────────────────┼──────────────────────┐
              ▼                  ▼                       ▼
       ┌─────────────┐   ┌─────────────┐        ┌─────────────┐
       │  CLI Demo    │   │ HTTP Server │        │  LangGraph  │
       │  npm&lt;/code&gt;&lt;/pre&gt;…&lt;/div&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/ac12644/langgraph-starter-kit" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;



&lt;p&gt;Apache 2.0. PRs welcome.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;What are you building with LangGraph? Curious what patterns people are reaching for.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>ai</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
