<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kaelii</title>
    <description>The latest articles on DEV Community by Kaelii (@kaelbit).</description>
    <link>https://dev.to/kaelbit</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3794803%2Fc6f90c25-040a-451e-99a7-3d78695e1f42.png</url>
      <title>DEV Community: Kaelii</title>
      <link>https://dev.to/kaelbit</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kaelbit"/>
    <language>en</language>
    <item>
      <title>How We Architected a Cognitive Memory Engine for AI Agents (10MB Rust Binary)</title>
      <dc:creator>Kaelii</dc:creator>
      <pubDate>Sun, 01 Mar 2026 20:32:18 +0000</pubDate>
      <link>https://dev.to/kaelbit/how-we-architected-a-cognitive-memory-engine-for-ai-agents-10mb-rust-binary-1m8f</link>
      <guid>https://dev.to/kaelbit/how-we-architected-a-cognitive-memory-engine-for-ai-agents-10mb-rust-binary-1m8f</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/kaelbit/adding-a-lifecycle-to-ai-agent-memory-372i"&gt;The previous article&lt;/a&gt; introduced engram-rs's three-layer memory architecture and design motivation. This one tackles a more specific question: &lt;strong&gt;how does retrieval quality not degrade as memories accumulate?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The answer lives in the scoring algorithms. Here's a visual breakdown of five core mechanisms.&lt;/p&gt;




&lt;h2&gt;
  
  
  1. Use It or Lose It
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl5qx0ua7u6ltgecl9zn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl5qx0ua7u6ltgecl9zn.png" alt="Memory Lifecycle" width="800" height="394"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Left panel: a memory that's never recalled after storage. Importance decays smoothly, sinking to the bottom layer.&lt;/p&gt;

&lt;p&gt;Right panel: a memory that gets periodically recalled. Each retrieval triggers an &lt;strong&gt;activation boost&lt;/strong&gt; (yellow dots), pushing importance back up. The red dashed line shows the unrecalled trajectory for comparison.&lt;/p&gt;

&lt;p&gt;This isn't a feature — it's the system's first principle: &lt;strong&gt;a memory's survival is determined by how often it's used.&lt;/strong&gt; Retrieval isn't just a read operation — it's also a vote telling the system this memory still matters.&lt;/p&gt;

&lt;p&gt;The result? After hundreds of consolidation epochs, frequently used knowledge stays prominent, stale noise naturally sinks, and retrieval quality doesn't degrade as total memory count grows.&lt;/p&gt;
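&lt;p&gt;A toy simulation makes the two panels concrete. This is only a sketch: the decay rate, boost size, and recall cadence below are illustrative constants, not engram-rs's actual ones.&lt;/p&gt;

```python
import math

def simulate(epochs, recall_every=None, decay_rate=0.02, boost=0.15, floor=0.01):
    """Trajectory of one memory's importance: exponential decay each epoch,
    optionally punctuated by an activation boost whenever it is recalled."""
    importance = 1.0
    history = []
    for epoch in range(1, epochs + 1):
        importance = max(floor, importance * math.exp(-decay_rate))
        if recall_every and epoch % recall_every == 0:
            importance = min(1.0, importance + boost)  # activation boost on recall
        history.append(importance)
    return history

never_recalled = simulate(200)             # left panel: sinks steadily
recalled = simulate(200, recall_every=25)  # right panel: boosted back up
```

&lt;p&gt;Run both for the same number of epochs and the periodically recalled memory ends well above the unrecalled one: retrieval acts as the vote described above.&lt;/p&gt;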




&lt;h2&gt;
  
  
  2. Exponential Decay, Not Linear
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxen8aek0lizpoe6ydjx7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxen8aek0lizpoe6ydjx7.png" alt="Ebbinghaus Forgetting Curve" width="800" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The previous article used &lt;code&gt;importance × e^(-decay_rate × idle_hours / 168)&lt;/code&gt; for retrieval-time recency weighting. But how does importance itself decay? That's what actually determines whether a memory lives or dies.&lt;/p&gt;

&lt;p&gt;Three curves show the decay trajectories for each memory kind:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Kind&lt;/th&gt;
&lt;th&gt;Half-life&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;episodic&lt;/td&gt;
&lt;td&gt;~35 epochs&lt;/td&gt;
&lt;td&gt;"Yesterday's debug log" — should fade if unused&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;semantic&lt;/td&gt;
&lt;td&gt;~58 epochs&lt;/td&gt;
&lt;td&gt;"Auth uses OAuth2" — knowledge decays slower&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;procedural&lt;/td&gt;
&lt;td&gt;~173 epochs&lt;/td&gt;
&lt;td&gt;"Deploy steps" — procedures should almost never fade&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The floor is 0.01. Memories never truly reach zero — given a precise enough query, a sunken memory can still be retrieved. This mirrors a human memory property: you think you've forgotten, but the right cue pulls it back.&lt;/p&gt;

&lt;p&gt;Why exponential instead of linear? Linear decay has a fatal flaw: &lt;strong&gt;the cliff.&lt;/strong&gt; The moment importance linearly decrements to zero, the memory is permanently lost with no chance of recovery. Exponential decay never reaches zero — it just gets closer and closer, leaving an infinitely long tail.&lt;/p&gt;
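&lt;p&gt;The half-lives in the table pin down per-epoch decay rates via rate = ln(2) / half_life. A minimal sketch, assuming importance is multiplied by e^(-rate) each consolidation epoch and floored at 0.01:&lt;/p&gt;

```python
import math

HALF_LIVES = {"episodic": 35, "semantic": 58, "procedural": 173}  # in epochs
FLOOR = 0.01  # importance never decays below this

def decay(importance, kind, epochs_idle):
    rate = math.log(2) / HALF_LIVES[kind]  # half-life to per-epoch rate
    return max(FLOOR, importance * math.exp(-rate * epochs_idle))

# After 100 idle epochs, an episodic memory has lost most of its importance,
# while a procedural one is still comfortably above half strength.
```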




&lt;h2&gt;
  
  
  3. Logarithmic Saturation for Reinforcement
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl0jyprfechae6hbptha.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvl0jyprfechae6hbptha.png" alt="Reinforcement Signals" width="800" height="347"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a memory is stored repeatedly or recalled multiple times, its weight increases. But the growth curve is logarithmic, not linear.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;rep_bonus  = 0.17 × ln(1 + repetition_count),  cap 0.7
access_bonus = 0.12 × ln(1 + access_count),    cap 0.55
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why logarithmic?&lt;/p&gt;

&lt;p&gt;Consider a counterexample: if rep_bonus were linear (say, 0.1 × count, cap 0.5), then a memory stored 5 times would max out its bonus. The 6th, 50th, and 500th submission — all identical in effect. You can't distinguish "mentioned a few times" from "repeatedly emphasized."&lt;/p&gt;

&lt;p&gt;Logarithmic growth pushes the saturation point out to ~60 reps and ~100 accesses (where each bonus reaches its cap). The first few interactions matter most, then returns diminish while still contributing. This matches human learning research — spaced repetition works, but each additional review yields less marginal benefit.&lt;/p&gt;
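&lt;p&gt;Both bonus formulas are small enough to sketch verbatim (coefficients and caps as quoted above):&lt;/p&gt;

```python
import math

def rep_bonus(repetition_count):
    # 0.17 * ln(1 + n), capped at 0.7
    return min(0.7, 0.17 * math.log(1 + repetition_count))

def access_bonus(access_count):
    # 0.12 * ln(1 + n), capped at 0.55
    return min(0.55, 0.12 * math.log(1 + access_count))

# Early interactions dominate: going from 1 to 5 repetitions adds more
# than going from 50 to 500, which is mostly absorbed by the cap.
steps = [round(rep_bonus(n), 3) for n in (1, 5, 50, 500)]
```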




&lt;h2&gt;
  
  
  4. Additive Biases Instead of Multiplicative
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69dbbk1q32uu1yfqzxmg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F69dbbk1q32uu1yfqzxmg.png" alt="Kind × Layer Weight Bias" width="800" height="426"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A memory's final weight is also influenced by its kind and layer. The chart shows the weight effect for all nine combinations (3 kinds × 3 layers):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;procedural + core ranks highest (+0.15 + 0.1 = +0.25)&lt;/li&gt;
&lt;li&gt;episodic + buffer ranks lowest (-0.1 - 0.1 = -0.2)&lt;/li&gt;
&lt;li&gt;semantic + working is the baseline (0)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Why emphasize "additive"?&lt;/p&gt;

&lt;p&gt;An earlier version used multiplication: procedural memories ×1.3, core layer ×1.2. Sounds reasonable, but 1.3 × 1.2 = 1.56, while episodic × buffer = 0.8 × 0.8 = 0.64. The gap between the highest and lowest is &lt;strong&gt;2.4×&lt;/strong&gt; — procedural + core would systematically crush everything else, regardless of how relevant the content actually is.&lt;/p&gt;

&lt;p&gt;Additive biases compress this ratio to under 1.6×. Kind and layer still influence ranking, but not enough to override the semantic relevance signal itself.&lt;/p&gt;
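&lt;p&gt;The contrast is easy to check numerically. A sketch assuming a baseline weight of 1.0 before biases are applied (the multiplicative factors are the ones quoted from the earlier version):&lt;/p&gt;

```python
# Earlier (multiplicative) scheme vs the current additive biases.
KIND_MULT = {"procedural": 1.3, "semantic": 1.0, "episodic": 0.8}
LAYER_MULT = {"core": 1.2, "working": 1.0, "buffer": 0.8}

KIND_BIAS = {"procedural": 0.15, "semantic": 0.0, "episodic": -0.1}
LAYER_BIAS = {"core": 0.1, "working": 0.0, "buffer": -0.1}

# Spread between the best (procedural + core) and worst (episodic + buffer) combos:
mult_ratio = (KIND_MULT["procedural"] * LAYER_MULT["core"]) / (
    KIND_MULT["episodic"] * LAYER_MULT["buffer"])         # 1.56 / 0.64 = 2.4375

base = 1.0  # hypothetical baseline weight
add_ratio = (base + KIND_BIAS["procedural"] + LAYER_BIAS["core"]) / (
    base + KIND_BIAS["episodic"] + LAYER_BIAS["buffer"])  # 1.25 / 0.8 = 1.5625
```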




&lt;h2&gt;
  
  
  5. Sigmoid Score Compression
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oidc8hp9kblaqzwlffv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1oidc8hp9kblaqzwlffv.png" alt="Sigmoid Score Compression" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The final ranking score combines semantic relevance, memory weight, and time decay. This raw score is mapped through a sigmoid to the 0–1 range:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;score = 2 / (1 + e^(-2x)) - 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Why not just clamp at 1.0?&lt;/p&gt;

&lt;p&gt;Because clamping destroys information. Say two memories have raw scores of 1.3 and 2.1 — after clamping, both become 1.0, and the system thinks they're "equally good." The sigmoid approaches 1.0 asymptotically but never reaches it, preserving discrimination in the high-score region.&lt;/p&gt;

&lt;p&gt;The shaded area in the chart represents the ranking information that sigmoid preserves — the differences that a hard clamp would flatten.&lt;/p&gt;
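&lt;p&gt;Algebraically this sigmoid is tanh(x). A quick sketch comparing it against a hard clamp shows what survives:&lt;/p&gt;

```python
import math

def squash(raw):
    # 2 / (1 + e^(-2x)) - 1, which is algebraically tanh(raw)
    return 2 / (1 + math.exp(-2 * raw)) - 1

def clamp(raw):
    return min(1.0, raw)

# Clamping makes 1.3 and 2.1 indistinguishable; the sigmoid keeps them apart.
clamped = (clamp(1.3), clamp(2.1))     # both 1.0
squashed = (squash(1.3), squash(2.1))  # roughly 0.86 vs 0.97
```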




&lt;h2&gt;
  
  
  The Full Scoring Formula
&lt;/h2&gt;

&lt;p&gt;Putting all five mechanisms together, a memory's final retrieval score is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;weight = importance + rep_bonus + access_bonus + kind_bias + layer_bias

raw = relevance × (1 + 0.4 × weight + 0.2 × recency)

score = sigmoid(raw)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Where &lt;code&gt;relevance&lt;/code&gt; comes from a hybrid of semantic embeddings and BM25 keyword search, &lt;code&gt;recency&lt;/code&gt; is time-based exponential decay, and &lt;code&gt;importance&lt;/code&gt; is the value after per-epoch exponential decay (counteracted by activation boosts on recall).&lt;/p&gt;
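&lt;p&gt;Assembled end to end, the pipeline reads as follows. The example input values are made up for illustration; the formula itself is the one quoted above:&lt;/p&gt;

```python
import math

def final_score(relevance, importance, rep_bonus, access_bonus,
                kind_bias, layer_bias, recency):
    weight = importance + rep_bonus + access_bonus + kind_bias + layer_bias
    raw = relevance * (1 + 0.4 * weight + 0.2 * recency)
    return 2 / (1 + math.exp(-2 * raw)) - 1  # sigmoid compression

# A relevant, heavily reinforced procedural/core memory...
hot = final_score(0.9, 0.8, 0.5, 0.4, 0.15, 0.1, 0.9)
# ...outranks an equally relevant but decayed episodic/buffer one.
cold = final_score(0.9, 0.05, 0.0, 0.0, -0.1, -0.1, 0.1)
```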

&lt;p&gt;No magic numbers — every coefficient maps to an explainable cognitive mechanism.&lt;/p&gt;




&lt;h2&gt;
  
  
  Specs
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Spec&lt;/th&gt;
&lt;th&gt;Detail&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Language&lt;/td&gt;
&lt;td&gt;Rust, single binary, zero external dependencies&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;~100 MB RSS in production&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;SQLite, one .db file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Search&lt;/td&gt;
&lt;td&gt;Semantic embeddings + BM25 (with CJK tokenization)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Platforms&lt;/td&gt;
&lt;td&gt;Linux, macOS, Windows&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/kael-bit/engram-rs" rel="noopener noreferrer"&gt;github.com/kael-bit/engram-rs&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>rust</category>
      <category>rag</category>
      <category>mcp</category>
    </item>
    <item>
      <title>My AI Agent Ran for a Week — Here's How It Remembers Things</title>
      <dc:creator>Kaelii</dc:creator>
      <pubDate>Sat, 28 Feb 2026 16:37:25 +0000</pubDate>
      <link>https://dev.to/kaelbit/my-ai-agent-ran-for-a-week-heres-how-it-remembers-things-2b65</link>
      <guid>https://dev.to/kaelbit/my-ai-agent-ran-for-a-week-heres-how-it-remembers-things-2b65</guid>
      <description>&lt;p&gt;I run a 24/7 AI agent connected to Telegram. It handles daily tasks, spawns sub-agents, and runs scheduled jobs.&lt;/p&gt;

&lt;p&gt;Most agent frameworks have some kind of memory mechanism — a markdown file that gets loaded at session start, where the agent writes things down and reads them back next time. Basic persistence works fine.&lt;/p&gt;

&lt;p&gt;But after running it for a while, I noticed a problem: &lt;strong&gt;the memory file kept growing.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The agent dumped everything in — debug logs, temporary state, duplicate information, long-outdated decisions. The file grew and grew, useful information buried under noise. And there was no cleanup mechanism — things went in, nothing came out.&lt;/p&gt;

&lt;p&gt;I realized the agent didn't just need "the ability to remember things." It needed a memory system with a lifecycle: what to remember, how long to keep it, and when to forget. So I plugged in an external memory service to replace the markdown file. But having the tool didn't mean the problem was solved — &lt;strong&gt;the hardest part was teaching the AI to store and retrieve correctly.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfall #1: It Doesn't Store Anything
&lt;/h2&gt;

&lt;p&gt;After setting up the memory service, I wrote storage instructions in the system prompt. The first version used a table: what to store, what tags to use, what category. Clean and structured — looked great to me.&lt;/p&gt;

&lt;p&gt;The result? The agent stored almost nothing.&lt;/p&gt;

&lt;p&gt;A 30-minute conversation where I corrected two mistakes, confirmed a technical approach, and set a rule — it didn't store a single one. After the session ended, all of it was gone.&lt;/p&gt;

&lt;p&gt;The reason is simple: an LLM's instinct is to &lt;strong&gt;respond&lt;/strong&gt;, not to &lt;strong&gt;record&lt;/strong&gt;. It'll go all-out answering your question, but it won't spontaneously think "is there something worth remembering in this conversation?" No matter how clean the table is, it won't pause mid-conversation to consult it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pitfall #2: Scattered Instructions
&lt;/h2&gt;

&lt;p&gt;After discovering the storage problem, I switched to more forceful imperative instructions with lots of emphasis markers. "CRITICAL: When I correct you, store first, then reply." "HIGHEST PRIORITY: User feedback." "⚠️ Don't miss storing."&lt;/p&gt;

&lt;p&gt;It worked better, but the rules were scattered across different parts of the prompt. CRITICAL appeared three times, ⚠️ twice, 🚫 once. They competed for attention. When everything screams "I'm the most important," nothing is.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pitfall #3: Over-Compression
&lt;/h2&gt;

&lt;p&gt;Realizing the prompt was too long, I did an aggressive trim. It backfired — shorter, yes, but the crucial guidance on &lt;em&gt;when&lt;/em&gt; to store got cut too. The agent went passive: it only stored things when I explicitly said "remember this," no longer proactively extracting decisions and lessons from conversations.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Structure That Finally Worked
&lt;/h2&gt;

&lt;p&gt;After about two or three weeks of iteration, I converged on a stable structure. Four lines, each with a clear function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Principle&lt;/strong&gt;: Store everything valuable, store it immediately, never batch. Over-storing costs nothing, forgetting costs everything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action rule&lt;/strong&gt;: User corrects you → store FIRST, then reply. If you think "I'll store this later," you're already wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What to store&lt;/strong&gt;: Identity, preferences, decisions, constraints, lessons, milestone recaps.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;What not to store&lt;/strong&gt;: Command output, step-by-step narration, info already in code/config files.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each line gets an emoji prefix (🧠⚠️✅🚫). Not for decoration — they're visual anchors that help the model parse the structure at a glance. All in one compact block, not scattered across multiple sections.&lt;/p&gt;
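&lt;p&gt;A sketch of what such a block might look like (the wording here is hypothetical, reconstructed from the four lines above, not the actual system prompt):&lt;/p&gt;

```text
🧠 MEMORY: Store everything valuable, immediately, never batch.
   Over-storing costs nothing; forgetting costs everything.
⚠️ When the user corrects you: store FIRST, then reply.
✅ Store: identity, preferences, decisions, constraints, lessons, milestone recaps.
🚫 Don't store: command output, step-by-step narration, info already in code/config.
```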

&lt;p&gt;Two specific wording changes made the biggest difference:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;"Over-storing costs nothing, forgetting costs everything" — eliminated the agent's hesitation. It stopped agonizing over "is this worth storing?" because the answer is always "storing it can't hurt."&lt;/li&gt;
&lt;li&gt;"Store FIRST, then reply" — solved the timing problem. After finishing a reply, the agent often forgot to store. Forcing store-before-reply meant corrections actually stuck.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Resume: The First Thing After Waking Up
&lt;/h2&gt;

&lt;p&gt;With storing solved, there was still the retrieval problem. Each new session starts as a blank slate — the agent needs to know what it already knows.&lt;/p&gt;

&lt;p&gt;I wrote a hard rule in the prompt: &lt;strong&gt;the first action of every session must be calling the resume endpoint, no exceptions.&lt;/strong&gt; Before replying to the user, before reading files, before anything.&lt;/p&gt;

&lt;p&gt;Resume doesn't return every memory in full (that would blow up the context). Instead, it returns an &lt;strong&gt;index&lt;/strong&gt; — like a table of contents listing all topics and how many memories each contains. When the agent needs details on a specific topic, it pulls them on demand.&lt;/p&gt;

&lt;p&gt;This design resolves a fundamental tension: the agent needs the confidence of "everything is saved" to be willing to store, but the context window can't actually load all memories. The index gives you both.&lt;/p&gt;
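&lt;p&gt;For illustration, the index might be shaped like this (a hypothetical payload, not engram-rs's actual response format):&lt;/p&gt;

```python
# Topics and counts only; full memories are fetched per topic on demand.
resume_index = {
    "topics": [
        {"name": "project-decisions", "count": 23},
        {"name": "user-preferences", "count": 9},
        {"name": "lessons", "count": 14},
    ],
    "total_memories": 46,
}
```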

&lt;p&gt;But the real pitfall with resume wasn't the design — it was that &lt;strong&gt;it often didn't get triggered&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;A long-running agent continuously accumulates context. The framework periodically compresses it (compaction). The compressed summary preserves the rough outline of the conversation but loses details. The problem: the summary looks "good enough" — the agent reads the compressed context, thinks it knows what's going on, and starts working, &lt;strong&gt;not feeling any need to call resume&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This is a variant of hallucination: the agent gets false confidence from the compressed summary, believing it has enough context, when it's actually lost a mass of specifics — the exact wording of a rule, the reasoning behind a decision, the lesson from last time's mistake.&lt;/p&gt;

&lt;p&gt;Writing "MANDATORY FIRST ACTION" in the prompt wasn't enough. Because the post-compaction context might already contain a seemingly reasonable conversation history, the agent prioritizes responding to that context over following a rule that "doesn't seem urgent."&lt;/p&gt;

&lt;p&gt;My final solution wasn't a prompt rule — it was a &lt;strong&gt;file hook&lt;/strong&gt;. I created a &lt;code&gt;WORKFLOW_AUTO.md&lt;/code&gt; that the framework force-loads after every compaction. The file says one thing: call resume. No matter how the context gets compressed, the agent reads this file and triggers the resume call.&lt;/p&gt;

&lt;p&gt;Moving a critical behavior from "a rule in the prompt" to "a hook in the filesystem" is a completely different level of reliability. Prompts can be ignored. File loading is deterministic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Triggers: Reflexes Before Actions
&lt;/h2&gt;

&lt;p&gt;Once, the agent did something I had explicitly told it not to do. It wasn't being defiant — it had "remembered" the rule (it was in the memory service), but it didn't think to check before executing the action.&lt;/p&gt;

&lt;p&gt;This led me to add a trigger mechanism. When the agent learns a lesson, it stores it with a trigger tag (e.g., &lt;code&gt;trigger:git-push&lt;/code&gt;). Before executing a related action, the prompt instructs it to check for relevant lessons first.&lt;/p&gt;

&lt;p&gt;It's like muscle memory — no conscious recall needed. When the relevant action comes up, the lesson surfaces automatically. Far more reliable than depending on the agent to "remember."&lt;/p&gt;
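&lt;p&gt;A minimal sketch of the mechanism (the data shapes, helper name, and lesson text are illustrative, not engram-rs's actual API; the trigger:git-push tag format is the one from the text above):&lt;/p&gt;

```python
# Lessons carry trigger tags; a pre-action check surfaces matching lessons
# before the corresponding action runs.
MEMORIES = [
    {"text": "Never force-push to main.",
     "tags": ["lesson", "trigger:git-push"]},
    {"text": "Run the staging smoke test before deploying.",
     "tags": ["lesson", "trigger:deploy"]},
]

def lessons_for(action):
    """Return every stored lesson tagged for this action."""
    tag = "trigger:" + action
    return [m["text"] for m in MEMORIES if tag in m["tags"]]

# Before a git push, the agent surfaces its stored lesson automatically:
warnings = lessons_for("git-push")
```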




&lt;h2&gt;
  
  
  After One Week
&lt;/h2&gt;

&lt;p&gt;The agent has been running stably for a week now. Over a hundred memories, automatically clustered into a dozen-plus topics. It remembers who I am, the project's technical decisions, and mistakes it made before. Context restoration after a session restart takes about 300ms.&lt;/p&gt;

&lt;p&gt;Looking back, the biggest lesson isn't technical — it's that &lt;strong&gt;the essence of prompt engineering isn't "what to say" but "how to say it so the model actually listens."&lt;/strong&gt; The same rule, scattered vs. consolidated, passive vs. active voice, with or without explaining why — the difference in effectiveness is night and day.&lt;/p&gt;

&lt;p&gt;The memory system's architecture matters, of course. But if the agent won't use it, the best architecture in the world is useless.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Memory service: &lt;a href="https://github.com/kael-bit/engram-rs" rel="noopener noreferrer"&gt;engram&lt;/a&gt; — single Rust binary, MCP-compatible. Agent framework: &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>mcp</category>
      <category>rag</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Adding a Lifecycle to AI Agent Memory</title>
      <dc:creator>Kaelii</dc:creator>
      <pubDate>Fri, 27 Feb 2026 10:59:22 +0000</pubDate>
      <link>https://dev.to/kaelbit/adding-a-lifecycle-to-ai-agent-memory-372i</link>
      <guid>https://dev.to/kaelbit/adding-a-lifecycle-to-ai-agent-memory-372i</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;This isn't a product pitch. I just want to share some real problems I ran into while building persistent memory for an AI agent, and the approach I ended up with. The code is open source — my approach might not be the best one, and I'd love to hear how others are tackling the same problems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;When building memory for an agent, the most immediate question is: &lt;strong&gt;once you've stored hundreds of memories, how do you make sure the most relevant ones surface during retrieval?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Information has a shelf life. Yesterday's debug log, last week's temporary workaround, last month's architecture decision — they all have very different levels of importance. If every memory is treated equally, retrieval results get flooded with stale noise, and the actually valuable stuff gets buried.&lt;/p&gt;

&lt;p&gt;My approach was to give memories a &lt;strong&gt;lifecycle&lt;/strong&gt; — new information starts in an observation period, valuable stuff gets promoted upward, and outdated entries naturally sink to the bottom.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three-Layer Design: Buffer → Working → Core
&lt;/h2&gt;

&lt;p&gt;I settled on a three-layer structure:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug0gqq9l0nzr0ikgh2cg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fug0gqq9l0nzr0ikgh2cg.png" width="800" height="619"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Most ephemeral information ("just ran a test", "build passed") stays in Buffer and naturally sinks. The genuinely valuable stuff floats up over time. No manual curation needed — the system filters on its own.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decay: Letting Priority Shift Over Time
&lt;/h2&gt;

&lt;p&gt;Decay doesn't delete data. It adjusts &lt;strong&gt;retrieval ranking to reflect recency&lt;/strong&gt;. The longer a memory goes unused, the lower it ranks in search results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;decay_score = importance × e^(−decay_rate × idle_hours / 168)

Buffer:   decay_rate = 5.0   → sinks within days of inactivity
Working:  decay_rate = 1.0   → takes weeks to noticeably drop
Core:     decay_rate = 0.01  → practically permanent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There's one special case — procedural knowledge (deployment steps, coding standards, etc.). These get a decay rate of 0.01 regardless of layer, because process knowledge shouldn't lose priority over time. It doesn't matter if you haven't looked up "how to deploy" in a month — it needs to be there when you need it.&lt;/p&gt;

&lt;p&gt;An early mistake I made: &lt;strong&gt;applying uniform decay to all memories&lt;/strong&gt;. The result was that the agent kept losing track of deployment procedures and had to ask again every time. Once I differentiated by memory type, the problem went away.&lt;/p&gt;
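&lt;p&gt;The fix is a tiny amount of logic. A sketch using the constants above (the function names are illustrative):&lt;/p&gt;

```python
import math

LAYER_RATES = {"buffer": 5.0, "working": 1.0, "core": 0.01}

def decay_rate(layer, kind):
    if kind == "procedural":
        return 0.01  # process knowledge keeps its priority regardless of layer
    return LAYER_RATES[layer]

def decay_score(importance, layer, kind, idle_hours):
    # importance * e^(-decay_rate * idle_hours / 168)
    return importance * math.exp(-decay_rate(layer, kind) * idle_hours / 168)
```

&lt;p&gt;A week-idle procedural memory in Buffer barely moves, while an ordinary episodic entry in the same layer has already sunk.&lt;/p&gt;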

&lt;h2&gt;
  
  
  Repetition = Reinforcement
&lt;/h2&gt;

&lt;p&gt;Human memory has a well-known property: repeated exposure strengthens retention. I mimicked this in the system:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe88kko18v4ij7f1alma1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe88kko18v4ij7f1alma1.png" width="800" height="570"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The more often the same knowledge is mentioned, the more "durable" it becomes — higher importance, still ranks well even after decay. This wasn't part of the original design; it was added after noticing in practice that the agent kept failing to recall things I'd told it multiple times.&lt;/p&gt;

&lt;h2&gt;
  
  
  Retrieval: Semantic + Keyword Hybrid
&lt;/h2&gt;

&lt;p&gt;Storing memories is only half the problem — you also need to find them. Retrieval uses a hybrid strategy:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih62znbch0tfj8fy7ei7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih62znbch0tfj8fy7ei7.png" width="800" height="728"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;One real-world gotcha I ran into: &lt;strong&gt;short CJK queries produce unreliable embeddings&lt;/strong&gt;. For example, searching "部署" (deploy) — the embedding model returns nearly identical similarity scores for all Chinese-language memories, making discrimination impossible. The fix was a special case: for short CJK queries, reduce the weight of semantic search and lean harder on keyword matching.&lt;/p&gt;
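&lt;p&gt;A sketch of that special case (the 0.2/0.8 and 0.6/0.4 weight splits and the length cutoff are illustrative, not the actual values):&lt;/p&gt;

```python
def is_cjk(ch):
    # CJK Unified Ideographs block
    return ord(ch) in range(0x4E00, 0xA000)

def search_weights(query):
    """Return (semantic_weight, keyword_weight) for the hybrid retriever."""
    has_cjk = any(is_cjk(ch) for ch in query)
    if has_cjk and len(query) in range(1, 5):  # short CJK query
        return (0.2, 0.8)  # lean on BM25 keyword matching
    return (0.6, 0.4)      # default hybrid split

semantic_w, keyword_w = search_weights("部署")  # short CJK: keyword-heavy
```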

&lt;h2&gt;
  
  
  Why SQLite
&lt;/h2&gt;

&lt;p&gt;This might be the most controversial choice, but I think it fits the use case well.&lt;/p&gt;

&lt;p&gt;My scenario is single-agent use with hundreds to a few thousand memories. At this scale, SQLite's read/write performance is more than sufficient, and it comes with built-in SQL queries and FTS5 full-text search — no extra dependencies needed.&lt;/p&gt;

&lt;p&gt;The end result: &lt;strong&gt;the entire system compiles to a single binary, runs directly on any machine, and all data lives in one &lt;code&gt;.db&lt;/code&gt; file&lt;/strong&gt;. Backup is &lt;code&gt;cp&lt;/code&gt;. Migration is &lt;code&gt;scp&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Of course, for scenarios with many agents writing concurrently or significantly larger data volumes, the storage choice would need to be reconsidered.&lt;/p&gt;

&lt;h2&gt;
  
  
  Results So Far
&lt;/h2&gt;

&lt;p&gt;It's been running for a few days now, with 80+ memories distributed across the three layers. Ephemeral information in Buffer typically sinks within hours to a day, while valuable entries gradually promote to Working and Core.&lt;/p&gt;

&lt;p&gt;One interesting case: the agent genuinely stops repeating past mistakes — because lesson-type memories are tagged with triggers, and those triggers fire automatically before related operations.&lt;/p&gt;

&lt;h2&gt;
  
  
  Open Questions
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;How to organize memories at scale?&lt;/strong&gt; 80 entries is manageable; what about 800? I've since built a self-organizing topic tree (k-means clustering), but that's a separate discussion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-agent memory sharing&lt;/strong&gt; — the system supports multiple agents on a single instance via namespace isolation, but how agents could safely share subsets of memory is still an open question.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluation metrics&lt;/strong&gt; — how do you quantify "memory quality"? Right now I'm eyeballing logs, which isn't exactly scientific.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Code is on &lt;a href="https://github.com/kael-bit/engram-rs" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Written in Rust, MIT licensed.&lt;/p&gt;

&lt;p&gt;If you're working on agent memory too, I'd love to hear from you — especially around how you handle memory lifecycle management.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>rust</category>
      <category>opensource</category>
    </item>
  </channel>
</rss>
