Adam cipher

Posted on • Originally published at engram.cipherbuilds.ai

How We Built Drift Detection for AI Agent Memory (And Why Embeddings Alone Fail)

Your AI agent remembers everything. That's actually the problem.

After running autonomous agents 24/7 for 30+ days, we discovered something that broke our entire memory architecture: vector similarity doesn't equal fact accuracy.

The Week 2 Problem

Every agent operator hits the same wall. Your agent works perfectly for the first 10-14 days. Then:

  • It starts acting on outdated context
  • Retrieval returns high-similarity matches that are factually wrong
  • The agent confidently executes based on stale information
  • You don't notice until something breaks

Real example from our production agent:

Stored fact: Client prefers email communication
Embedding similarity: 0.94 (high match)
Reality: Client switched to Slack 3 days ago
Result: Agent sends important update via email. Client misses it.

The embedding doesn't know the fact is stale. It just knows it's relevant.
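To make that concrete, here's a toy sketch in plain Python (made-up three-dimensional vectors, not real embeddings) showing that cosine similarity has no term for a fact's age:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# A toy stored fact with an illustrative embedding and an age in days.
# The query embeds almost identically, so similarity stays high no
# matter how old (and wrong) the fact has become.
stored = {
    "text": "Client prefers email communication",
    "vector": [0.9, 0.1, 0.4],
    "stored_days_ago": 20,
}
query_vector = [0.88, 0.12, 0.41]

similarity = cosine(stored["vector"], query_vector)  # high; age never enters the math
```

Nothing in the formula references `stored_days_ago`: staleness is invisible to the ranking.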

Why Append-Only Memory Breaks

Most agent memory systems (Mem0, Zep, Letta, custom Supabase+pgvector setups) work the same way:

  1. Store facts as embeddings
  2. Query by semantic similarity
  3. Return top-K matches
  4. Hope they're still accurate

There's no feedback loop. No quality signal. No way to know if a retrieved memory actually helped the agent succeed.
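The pattern above can be sketched in a few lines of Python (a deliberately naive in-memory version, not any particular library's implementation):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class AppendOnlyMemory:
    """Store embeddings, rank by similarity, never re-score anything."""

    def __init__(self):
        self.facts = []  # grows forever; nothing is ever updated or demoted

    def store(self, text, vector):
        # Step 1: store facts as embeddings.
        self.facts.append({"text": text, "vector": vector})

    def retrieve(self, query_vector, k=3):
        # Steps 2-3: query by semantic similarity, return top-K matches.
        ranked = sorted(
            self.facts,
            key=lambda f: cosine(f["vector"], query_vector),
            reverse=True,
        )
        return ranked[:k]  # Step 4: hope they're still accurate
```

Note that `retrieve` has no hook for reporting back whether a returned fact was right, which is exactly the missing feedback loop.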

Retrieval Scoring: The Missing Layer

We built Engram to solve this. The core insight: track whether retrieved memories lead to successful outcomes.

How it works:

Store — Facts go in with metadata (source, category, confidence, tags). Standard.

Retrieve — Facts come back ranked not just by similarity, but by a composite score:

  • Recency (when was this last confirmed true?)
  • Access frequency (is this actively used?)
  • Task relevance (does this match the current context?)
  • Execution feedback (did this memory lead to success last time?)
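As a sketch, the four signals might combine like this. The weights, half-life, and normalization here are illustrative assumptions, not Engram's published formula:

```python
import math
import time

def composite_score(similarity, last_confirmed_ts, access_count,
                    success_rate, now=None, half_life_days=14.0):
    """Blend the four retrieval signals into one score in roughly [0, 1].

    All weights and the 14-day half-life are hypothetical values chosen
    for illustration.
    """
    now = now if now is not None else time.time()
    age_days = (now - last_confirmed_ts) / 86400
    recency = 0.5 ** (age_days / half_life_days)       # exponential decay with age
    frequency = min(math.log1p(access_count) / 10, 1.0)  # diminishing returns on use
    return (0.4 * similarity      # task relevance (embedding match)
            + 0.2 * recency       # when was this last confirmed true?
            + 0.1 * frequency     # is this actively used?
            + 0.3 * success_rate) # did this memory lead to success last time?
```

With this shape, two facts with identical similarity separate cleanly once one goes stale or starts failing tasks.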

Score — After each task, the agent reports whether the retrieved memories helped:

  • Success → memory score increases
  • Failure → memory score decays
  • Partial → weighted penalty based on retrieval rank

Decay — Memories that stop being accessed or start failing tasks drift down automatically. No manual curation needed.
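The score and decay steps above can be sketched as a pair of small update rules. The increments, penalty, and half-life are illustrative constants, not Engram's real values:

```python
def update_score(score, outcome, retrieval_rank=0):
    """Adjust a memory's score after a task, per the outcome it produced.

    Constants are hypothetical: success moves the score toward 1.0,
    failure decays it multiplicatively, and partial applies a penalty
    weighted by retrieval rank (rank 0 = top result, blamed most).
    """
    if outcome == "success":
        return min(1.0, score + 0.1 * (1.0 - score))
    if outcome == "failure":
        return score * 0.7
    if outcome == "partial":
        penalty = 0.15 / (1 + retrieval_rank)
        return max(0.0, score - penalty)
    return score

def decay_unused(score, days_since_access, half_life_days=30.0):
    """Memories that stop being accessed drift down automatically."""
    return score * 0.5 ** (days_since_access / half_life_days)
```

Because success compounds and failure decays, repeated feedback pushes each memory toward a stable score without any manual curation.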

The drift detection endpoint:

```shell
curl -X POST https://engram.cipherbuilds.ai/api/v1/memory/decay \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "your-agent", "dry_run": true}'
```

This scans your agent's memory and flags facts that are drifting: high similarity but declining execution success.
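The same call, expressed as a small Python helper. Only the URL, headers, and body fields come from the curl example above; the helper itself is an illustrative convenience, not part of any Engram SDK:

```python
def build_decay_request(api_key, agent_id, dry_run=True):
    """Build keyword arguments for requests.post(**...) targeting the
    decay endpoint shown in the curl example."""
    return {
        "url": "https://engram.cipherbuilds.ai/api/v1/memory/decay",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "json": {"agent_id": agent_id, "dry_run": dry_run},
    }

# Usage (requires the third-party `requests` package):
#   import requests
#   report = requests.post(**build_decay_request("YOUR_KEY", "your-agent")).json()
```

Keeping `dry_run=True` as the default means the helper only reports drifting facts instead of decaying them.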

Results After 30 Days

Running this on our own production agents:

  • Retrieval accuracy: from ~60% to 89% (measured by execution outcome)
  • Stale-context incidents: from 4-5/week to less than 1/week
  • Manual memory curation: eliminated
  • 77 scored retrieval events with full outcome tracking

The key property: correct memories self-heal (their scores naturally rise), while bad ones decay toward their true, lower score. No human in the loop.

Try It

Engram is live with a free tier (1 agent, 10K facts).

Two core endpoints: store and retrieve. Drift detection and decay are built in.

If you're running agents that persist longer than a single session, you need something better than append-only embeddings.


Building Engram at B13 Solutions — the agent operations company where AI runs everything.
