Kaelii
How We Architected a Cognitive Memory Engine for AI Agents (10MB Rust Binary)

The previous article introduced engram-rs's three-layer memory architecture and design motivation. This one tackles a more specific question: how do you keep retrieval quality from degrading as memories accumulate?

The answer lives in the scoring algorithms. Here's a visual breakdown of five core mechanisms.


1. Use It or Lose It

Memory Lifecycle

Left panel: a memory that's never recalled after storage. Importance decays smoothly, sinking to the bottom layer.

Right panel: a memory that gets periodically recalled. Each retrieval triggers an activation boost (yellow dots), pushing importance back up. The red dashed line shows the unrecalled trajectory for comparison.

This isn't a feature — it's the system's first principle: a memory's survival is determined by how often it's used. Retrieval isn't just a read operation — it's also a vote telling the system this memory still matters.

The result? After hundreds of consolidation epochs, frequently-used knowledge stays prominent, stale noise naturally sinks, and retrieval quality doesn't degrade as total memory count grows.
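The loop described above can be sketched in a few lines. The decay rate and the size of the activation boost here are illustrative assumptions, not engram-rs's actual constants; only the 0.01 floor comes from the article.

```rust
// Minimal sketch of "use it or lose it": importance decays each epoch,
// and every recall votes the memory back up.
// The 0.02 rate and 0.1 boost are assumed values for illustration.
fn decay(importance: f64, rate: f64) -> f64 {
    (importance * (-rate).exp()).max(0.01) // floor: memories never fully vanish
}

fn recall_boost(importance: f64) -> f64 {
    (importance + 0.1).min(1.0) // retrieval doubles as a "still relevant" vote
}

fn main() {
    let mut unused = 0.8;
    let mut recalled = 0.8;
    for epoch in 0..100 {
        unused = decay(unused, 0.02);
        recalled = decay(recalled, 0.02);
        if epoch % 10 == 0 {
            recalled = recall_boost(recalled); // periodic retrieval
        }
    }
    // The recalled memory stays prominent; the unused one sinks.
    println!("unused: {unused:.3}, recalled: {recalled:.3}");
    assert!(recalled > unused);
}
```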


2. Exponential Decay, Not Linear

Ebbinghaus Forgetting Curve

The previous article used importance × e^(-decay_rate × idle_hours / 168) for retrieval-time recency weighting. But how does importance itself decay? That's what actually determines whether a memory lives or dies.

Three curves show the decay trajectories for each memory kind:

| Kind | Half-life | Why |
|------|-----------|-----|
| episodic | ~35 epochs | "Yesterday's debug log": should fade if unused |
| semantic | ~58 epochs | "Auth uses OAuth2": knowledge decays slower |
| procedural | ~173 epochs | "Deploy steps": procedures should almost never fade |

The floor is 0.01. Memories never truly reach zero — given a precise enough query, a sunken memory can still be retrieved. This mirrors a human memory property: you think you've forgotten, but the right cue pulls it back.

Why exponential instead of linear? Linear decay has a fatal flaw: the cliff. The moment importance linearly decrements to zero, the memory is permanently lost with no chance of recovery. Exponential decay never reaches zero — it just gets closer and closer, leaving an infinitely long tail.
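The half-lives above imply a per-epoch decay rate of ln(2) / half_life. That derivation is my reconstruction (the crate may store rates directly), but the floor of 0.01 is from the article:

```rust
// Per-kind exponential decay, reconstructed from the half-life table.
fn decay_rate(half_life_epochs: f64) -> f64 {
    std::f64::consts::LN_2 / half_life_epochs
}

fn importance_after(initial: f64, rate: f64, epochs: f64) -> f64 {
    (initial * (-rate * epochs).exp()).max(0.01) // long tail, never zero
}

fn main() {
    let episodic = decay_rate(35.0);
    let procedural = decay_rate(173.0);
    // After 35 epochs an episodic memory has exactly halved...
    println!("episodic @35:   {:.3}", importance_after(1.0, episodic, 35.0));
    // ...while a procedural memory has barely moved.
    println!("procedural @35: {:.3}", importance_after(1.0, procedural, 35.0));
    // A long-idle memory bottoms out at the floor instead of dying.
    println!("episodic @2000: {:.3}", importance_after(1.0, episodic, 2000.0));
}
```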


3. Logarithmic Saturation for Reinforcement

Reinforcement Signals

When a memory is stored repeatedly or recalled multiple times, its weight increases. But the growth curve is logarithmic, not linear.

rep_bonus  = 0.17 × ln(1 + repetition_count),  cap 0.7
access_bonus = 0.12 × ln(1 + access_count),    cap 0.55

Why logarithmic?

Consider a counterexample: if rep_bonus were linear (say, 0.1 × count, cap 0.5), then a memory stored 5 times would max out its bonus. The 6th, 50th, and 500th submission — all identical in effect. You can't distinguish "mentioned a few times" from "repeatedly emphasized."

Logarithmic growth pushes the saturation point out to ~60 reps (where 0.17 × ln(1 + n) hits the 0.7 cap) and ~100 accesses. The first few interactions matter most, then returns diminish while still contributing. This matches human learning research — spaced repetition works, but each additional review yields less marginal benefit.


4. Additive Biases Instead of Multiplicative

Kind × Layer Weight Bias

A memory's final weight is also influenced by its kind and layer. The chart shows the weight effect for all nine combinations (3 kinds × 3 layers):

  • procedural + core ranks highest (+0.15 + 0.1 = +0.25)
  • episodic + buffer ranks lowest (-0.1 - 0.1 = -0.2)
  • semantic + working is the baseline (0)

Why emphasize "additive"?

An earlier version used multiplication: procedural memories ×1.3, core layer ×1.2. Sounds reasonable, but 1.3 × 1.2 = 1.56, while episodic × buffer = 0.8 × 0.8 = 0.64. The gap between the highest and lowest is 2.4× — procedural + core would systematically crush everything else, regardless of how relevant the content actually is.

Additive biases compress this ratio to under 1.6×. Kind and layer still influence ranking, but not enough to override the semantic relevance signal itself.


5. Sigmoid Score Compression

Sigmoid Score Compression

The final ranking score combines semantic relevance, memory weight, and time decay. This raw score is mapped through a sigmoid to the 0–1 range:

score = 2 / (1 + e^(-2x)) - 1

Why not just clamp at 1.0?

Because clamping destroys information. Say two memories score 1.3 and 2.1 in raw — after clamping, both become 1.0, and the system thinks they're "equally good." Sigmoid approaches 1.0 asymptotically but never reaches it, preserving discrimination in the high-score region.

The shaded area in the chart represents the ranking information that sigmoid preserves — the differences that a hard clamp would flatten.
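The compression function is 2/(1 + e^(-2x)) - 1, which is tanh(x) for non-negative raw scores. A quick check of the clamp-vs-sigmoid argument:

```rust
// score = 2 / (1 + e^(-2x)) - 1, the article's compression function.
fn squash(raw: f64) -> f64 {
    2.0 / (1.0 + (-2.0 * raw).exp()) - 1.0
}

fn main() {
    // Clamping makes 1.3 and 2.1 indistinguishable:
    println!("clamped:  {:.3} vs {:.3}", 1.3_f64.min(1.0), 2.1_f64.min(1.0));
    // The sigmoid preserves their ordering in the high-score region:
    println!("squashed: {:.3} vs {:.3}", squash(1.3), squash(2.1));
    assert!(squash(2.1) > squash(1.3));
    assert!(squash(10.0) < 1.0); // asymptotic: approaches but never hits 1.0
}
```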


The Full Scoring Formula

Putting all five mechanisms together, a memory's final retrieval score is:

weight = importance + rep_bonus + access_bonus + kind_bias + layer_bias

raw = relevance × (1 + 0.4 × weight + 0.2 × recency)

score = sigmoid(raw)

Where relevance comes from a hybrid of semantic embeddings and BM25 keyword search, recency is time-based exponential decay, and importance is the value after per-epoch exponential decay (counteracted by activation boosts on recall).
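Assembling the pieces gives a single scoring function. Only the formulas and coefficients come from the article; the helper names, signature, and the example inputs are mine:

```rust
// The full retrieval score, assembled from the article's formulas.
fn rep_bonus(n: u32) -> f64 { (0.17 * (1.0 + n as f64).ln()).min(0.7) }
fn access_bonus(n: u32) -> f64 { (0.12 * (1.0 + n as f64).ln()).min(0.55) }
fn sigmoid(x: f64) -> f64 { 2.0 / (1.0 + (-2.0 * x).exp()) - 1.0 }

fn score(
    relevance: f64,  // hybrid embedding + BM25 match, 0..1
    importance: f64, // importance after per-epoch decay
    reps: u32,
    accesses: u32,
    kind_bias: f64,  // e.g. +0.15 procedural, -0.1 episodic
    layer_bias: f64, // e.g. +0.1 core, -0.1 buffer
    recency: f64,    // time-based exponential decay, 0..1
) -> f64 {
    let weight = importance + rep_bonus(reps) + access_bonus(accesses)
        + kind_bias + layer_bias;
    let raw = relevance * (1.0 + 0.4 * weight + 0.2 * recency);
    sigmoid(raw)
}

fn main() {
    // A well-reinforced procedural core memory...
    let hot = score(0.8, 0.9, 20, 50, 0.15, 0.1, 0.9);
    // ...vs. an equally relevant but stale episodic buffer memory.
    let cold = score(0.8, 0.05, 1, 1, -0.1, -0.1, 0.1);
    println!("hot: {hot:.3}, cold: {cold:.3}");
    assert!(hot > cold);
}
```

Note that relevance multiplies the whole weight term, so a memory with zero semantic relevance scores zero no matter how heavily reinforced it is.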

No magic numbers — every coefficient maps to an explainable cognitive mechanism.


Specs

| Spec | Detail |
|------|--------|
| Language | Rust, single binary, zero external dependencies |
| Memory | ~100 MB RSS in production |
| Storage | SQLite, one .db file |
| Search | Semantic embeddings + BM25 (with CJK tokenization) |
| Platforms | Linux, macOS, Windows |

GitHub: github.com/kael-bit/engram-rs
