Your AI agent remembers everything. That's actually the problem.
After running autonomous agents 24/7 for 30+ days, we discovered something that broke our entire memory architecture: vector similarity doesn't equal fact accuracy.
The Week 2 Problem
Every agent operator hits the same wall. Your agent works perfectly for the first 10-14 days. Then:
- It starts acting on outdated context
- Retrieval returns high-similarity matches that are factually wrong
- The agent confidently executes based on stale information
- You don't notice until something breaks
Real example from our production agent:
Stored fact: Client prefers email communication
Embedding similarity: 0.94 (high match)
Reality: Client switched to Slack 3 days ago
Result: Agent sends important update via email. Client misses it.
The embedding doesn't know the fact is stale. It just knows it's relevant.
Why Append-Only Memory Breaks
Most agent memory systems (Mem0, Zep, Letta, custom Supabase+pgvector setups) work the same way:
- Store facts as embeddings
- Query by semantic similarity
- Return top-K matches
- Hope they're still accurate
There's no feedback loop. No quality signal. No way to know if a retrieved memory actually helped the agent succeed.
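The four steps above can be sketched in a few lines. This is a toy in-memory version (the vectors, facts, and query are invented for illustration): facts are stored with an embedding, ranked purely by cosine similarity, and returned top-K with no timestamp or outcome signal anywhere in the loop.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy store: each fact is (embedding, text). Note what's missing:
# no timestamps, no success/failure history, no quality signal.
store = [
    ([0.9, 0.1], "Client prefers email communication"),  # stale but well-embedded
    ([0.2, 0.8], "Quarterly report due Friday"),
]

def retrieve(query_vec, k=1):
    # Rank purely by similarity -- the only signal this design has
    ranked = sorted(store, key=lambda f: cosine(query_vec, f[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve([0.95, 0.05]))  # the stale fact wins on similarity alone
```

The stale preference comes back first because similarity is the only axis the store can rank on.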
Retrieval Scoring: The Missing Layer
We built Engram to solve this. The core insight: track whether retrieved memories lead to successful outcomes.
How it works:
Store — Facts go in with metadata (source, category, confidence, tags). Standard.
Retrieve — Facts come back ranked not just by similarity, but by a composite score:
- Recency (when was this last confirmed true?)
- Access frequency (is this actively used?)
- Task relevance (does this match the current context?)
- Execution feedback (did this memory lead to success last time?)
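One way to combine those four signals is a weighted sum. A minimal sketch follows; the weights, the 7-day recency half-life, and the log-saturating frequency term are our illustrative assumptions, not Engram's actual coefficients.

```python
import math
import time

def composite_score(similarity, last_confirmed_ts, access_count,
                    success_rate, now=None, half_life_days=7.0):
    """Blend the four retrieval signals into one rank score.
    All weights here are illustrative assumptions."""
    now = now or time.time()
    age_days = (now - last_confirmed_ts) / 86400
    recency = 0.5 ** (age_days / half_life_days)            # halves every 7 days
    frequency = math.log1p(access_count) / math.log1p(100)  # saturates near 1
    return (0.4 * similarity          # task relevance via embedding match
            + 0.25 * recency          # when was this last confirmed true?
            + 0.15 * min(frequency, 1.0)  # is it actively used?
            + 0.2 * success_rate)     # did it lead to success before?

now = time.time()
# A 0.94-similarity fact, unconfirmed for 10 days, with a 20% success rate...
stale = composite_score(0.94, now - 10 * 86400, 50, 0.2, now=now)
# ...loses to a 0.80-similarity fact confirmed yesterday with a 90% success rate.
fresh = composite_score(0.80, now - 86400, 5, 0.9, now=now)
print(stale < fresh)
```

The point of the blend: a high-similarity match can no longer win on similarity alone once its recency and execution feedback drag the composite down.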
Score — After each task, the agent reports whether the retrieved memories helped:
- Success → memory score increases
- Failure → memory score decays
- Partial → weighted penalty based on retrieval rank
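The three outcome cases map naturally onto a simple update rule. This is a sketch under assumed constants (the 0.2 learning rate and the rank-weighted partial penalty are invented), not Engram's published formula.

```python
def update_score(score, outcome, rank=0, lr=0.2):
    """Move a memory's score based on task outcome.
    rank: position in the retrieval results (0 = top)."""
    if outcome == "success":
        return score + lr * (1.0 - score)   # pull toward 1
    if outcome == "failure":
        return score - lr * score           # decay toward 0
    # Partial: penalty weighted by retrieval rank -- top-ranked memories
    # influenced the task most, so they absorb more of the penalty
    return score - (lr / (rank + 1)) * score * 0.5
```

With this shape, repeated successes push a score asymptotically toward 1, repeated failures toward 0, and a partial outcome penalizes the rank-0 memory more than one retrieved at rank 3.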
Decay — Memories that stop being accessed or start failing tasks drift down automatically. No manual curation needed.
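A decay pass like that can be a periodic sweep: multiply down the score of anything that wasn't accessed, and drop memories that fall below a floor. The daily factor and prune threshold below are illustrative assumptions.

```python
DECAY_PER_DAY = 0.97     # assumed daily decay factor for idle memories
PRUNE_THRESHOLD = 0.1    # assumed floor below which a memory is dropped

def decay_sweep(memories, days_elapsed=1):
    """memories: dicts with 'score' and an 'accessed' flag for the window.
    Returns the surviving memories; idle ones decay, weak ones are pruned."""
    kept = []
    for m in memories:
        if not m["accessed"]:
            m["score"] *= DECAY_PER_DAY ** days_elapsed
        if m["score"] >= PRUNE_THRESHOLD:
            kept.append(m)
    return kept
```

Run on a schedule, this is what replaces manual curation: untouched memories slide down a few percent a day until they either get used again or fall out of the store.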
The drift detection endpoint:
curl -X POST https://engram.cipherbuilds.ai/api/v1/memory/decay \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "your-agent", "dry_run": true}'
This scans your agent's memory and flags facts that are drifting — high similarity but declining execution success.
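That drift condition — still matching well, but failing more often — reduces to a two-threshold check. This is our reading of what the endpoint flags; the 0.85 similarity floor and 0.5 success floor are invented for illustration.

```python
def is_drifting(similarity, recent_success_rate,
                sim_floor=0.85, success_floor=0.5):
    """Flag a fact that retrieval still loves but execution has stopped trusting.
    Thresholds are illustrative assumptions, not Engram's defaults."""
    return similarity >= sim_floor and recent_success_rate < success_floor

# The email-vs-Slack fact from earlier: 0.94 similarity, 30% recent success
print(is_drifting(0.94, 0.3))
```

A low-similarity fact that fails isn't drift — it just won't be retrieved. Drift is specifically the dangerous quadrant: high similarity, low success.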
Results After 30 Days
Running this on our own production agents:
- Retrieval accuracy: up from ~60% to 89% (measured by execution outcome)
- Stale context incidents: down from 4-5/week to under 1/week
- Manual memory curation: eliminated
- 77 scored retrieval events with full outcome tracking
The key property: correct memories self-heal (scores rise with each successful use), while bad ones decay toward the bottom of the ranking. No human in the loop.
Try It
Engram is live with a free tier (1 agent, 10K facts).
Two core endpoints: store and retrieve. Drift detection and decay are built in.
If you're running agents that persist longer than a single session, you need something better than append-only embeddings.
Building Engram at B13 Solutions — the agent operations company where AI runs everything.