Your AI agent remembers everything. That's actually the problem.
After running autonomous agents 24/7 for 30+ days, we discovered something that broke our entire memory architecture: vector similarity doesn't equal fact accuracy.
The Week 2 Problem
Every agent operator hits the same wall. Your agent works perfectly for the first 10-14 days. Then:
- It starts acting on outdated context
- Retrieval returns high-similarity matches that are factually wrong
- The agent confidently executes based on stale information
- You don't notice until something breaks
Real example from our production agent:
Stored fact: Client prefers email communication
Embedding similarity: 0.94 (high match)
Reality: Client switched to Slack 3 days ago
Result: Agent sends important update via email. Client misses it.
The embedding doesn't know the fact is stale. It just knows it's relevant.
Why Append-Only Memory Breaks
Most agent memory systems (Mem0, Zep, Letta, custom Supabase+pgvector setups) work the same way:
- Store facts as embeddings
- Query by semantic similarity
- Return top-K matches
- Hope they're still accurate
There's no feedback loop. No quality signal. No way to know if a retrieved memory actually helped the agent succeed.
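The four steps above can be sketched in a few lines. This is a toy in-memory version (the vectors, facts, and query are invented for illustration): facts are stored with an embedding, ranked purely by cosine similarity, and returned top-K with no timestamp or outcome signal anywhere in the loop.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy store: each fact is (embedding, text). Note what's missing:
# no timestamps, no success/failure history, no quality signal.
store = [
    ([0.9, 0.1], "Client prefers email communication"),  # stale but well-embedded
    ([0.2, 0.8], "Quarterly report due Friday"),
]

def retrieve(query_vec, k=1):
    # Rank purely by similarity -- the only signal this design has
    ranked = sorted(store, key=lambda f: cosine(query_vec, f[0]), reverse=True)
    return [text for _, text in ranked[:k]]

print(retrieve([0.95, 0.05]))  # the stale fact wins on similarity alone
```

The stale preference comes back first because similarity is the only axis the store can rank on.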
Retrieval Scoring: The Missing Layer
We built Engram to solve this. The core insight: track whether retrieved memories lead to successful outcomes.
How it works:
Store — Facts go in with metadata (source, category, confidence, tags). Standard.
Retrieve — Facts come back ranked not just by similarity, but by a composite score:
- Recency (when was this last confirmed true?)
- Access frequency (is this actively used?)
- Task relevance (does this match the current context?)
- Execution feedback (did this memory lead to success last time?)
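One way to combine those four signals is a weighted sum. A minimal sketch follows; the weights, the 7-day recency half-life, and the log-saturating frequency term are our illustrative assumptions, not Engram's actual coefficients.

```python
import math
import time

def composite_score(similarity, last_confirmed_ts, access_count,
                    success_rate, now=None, half_life_days=7.0):
    """Blend the four retrieval signals into one rank score.
    All weights here are illustrative assumptions."""
    now = now or time.time()
    age_days = (now - last_confirmed_ts) / 86400
    recency = 0.5 ** (age_days / half_life_days)            # halves every 7 days
    frequency = math.log1p(access_count) / math.log1p(100)  # saturates near 1
    return (0.4 * similarity          # task relevance via embedding match
            + 0.25 * recency          # when was this last confirmed true?
            + 0.15 * min(frequency, 1.0)  # is it actively used?
            + 0.2 * success_rate)     # did it lead to success before?

now = time.time()
# A 0.94-similarity fact, unconfirmed for 10 days, with a 20% success rate...
stale = composite_score(0.94, now - 10 * 86400, 50, 0.2, now=now)
# ...loses to a 0.80-similarity fact confirmed yesterday with a 90% success rate.
fresh = composite_score(0.80, now - 86400, 5, 0.9, now=now)
print(stale < fresh)
```

The point of the blend: a high-similarity match can no longer win on similarity alone once its recency and execution feedback drag the composite down.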
Score — After each task, the agent reports whether the retrieved memories helped:
- Success → memory score increases
- Failure → memory score decays
- Partial → weighted penalty based on retrieval rank
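The three outcome cases map naturally onto a simple update rule. This is a sketch under assumed constants (the 0.2 learning rate and the rank-weighted partial penalty are invented), not Engram's published formula.

```python
def update_score(score, outcome, rank=0, lr=0.2):
    """Move a memory's score based on task outcome.
    rank: position in the retrieval results (0 = top)."""
    if outcome == "success":
        return score + lr * (1.0 - score)   # pull toward 1
    if outcome == "failure":
        return score - lr * score           # decay toward 0
    # Partial: penalty weighted by retrieval rank -- top-ranked memories
    # influenced the task most, so they absorb more of the penalty
    return score - (lr / (rank + 1)) * score * 0.5
```

With this shape, repeated successes push a score asymptotically toward 1, repeated failures toward 0, and a partial outcome penalizes the rank-0 memory more than one retrieved at rank 3.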
Decay — Memories that stop being accessed or start failing tasks drift down automatically. No manual curation needed.
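A decay pass like that can be a periodic sweep: multiply down the score of anything that wasn't accessed, and drop memories that fall below a floor. The daily factor and prune threshold below are illustrative assumptions.

```python
DECAY_PER_DAY = 0.97     # assumed daily decay factor for idle memories
PRUNE_THRESHOLD = 0.1    # assumed floor below which a memory is dropped

def decay_sweep(memories, days_elapsed=1):
    """memories: dicts with 'score' and an 'accessed' flag for the window.
    Returns the surviving memories; idle ones decay, weak ones are pruned."""
    kept = []
    for m in memories:
        if not m["accessed"]:
            m["score"] *= DECAY_PER_DAY ** days_elapsed
        if m["score"] >= PRUNE_THRESHOLD:
            kept.append(m)
    return kept
```

Run on a schedule, this is what replaces manual curation: untouched memories slide down a few percent a day until they either get used again or fall out of the store.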
The drift detection endpoint:
curl -X POST https://engram.cipherbuilds.ai/api/v1/memory/decay \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "your-agent", "dry_run": true}'
This scans your agent's memory and flags facts that are drifting — high similarity but declining execution success.
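That drift condition — still matching well, but failing more often — reduces to a two-threshold check. This is our reading of what the endpoint flags; the 0.85 similarity floor and 0.5 success floor are invented for illustration.

```python
def is_drifting(similarity, recent_success_rate,
                sim_floor=0.85, success_floor=0.5):
    """Flag a fact that retrieval still loves but execution has stopped trusting.
    Thresholds are illustrative assumptions, not Engram's defaults."""
    return similarity >= sim_floor and recent_success_rate < success_floor

# The email-vs-Slack fact from earlier: 0.94 similarity, 30% recent success
print(is_drifting(0.94, 0.3))
```

A low-similarity fact that fails isn't drift — it just won't be retrieved. Drift is specifically the dangerous quadrant: high similarity, low success.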
Results After 30 Days
Running this on our own production agents:
- Retrieval accuracy: up from ~60% to 89% (measured by execution outcome)
- Stale context incidents: down from 4-5/week to under 1/week
- Manual memory curation: eliminated
- 77 scored retrieval events with full outcome tracking
The key property: correct memories self-heal (scores rise with each successful use), while bad ones decay toward the bottom of the ranking. No human in the loop.
Try It
Engram is live with a free tier (1 agent, 10K facts).
Two core endpoints: store and retrieve. Drift detection and decay are built in.
If you're running agents that persist longer than a single session, you need something better than append-only embeddings.
Building Engram at B13 Solutions — the agent operations company where AI runs everything.