DEV Community

Vektor Memory

Posted on • Originally published at Medium

AI Memory Is Kind of Broken. A Cambridge Researcher Proved It.


Imagine hiring the sharpest assistant you’ve ever worked with.

Day one: brilliant. They absorb everything — your project context, your preferences, your past decisions, your naming conventions. They ask exactly the right questions. They remember the answers.

Day two: you ask them to build on what you discussed yesterday.

They look at you blankly. Then they ask the same questions again. Same ones. Word for word.

Except it’s worse than that. Because this assistant doesn’t just forget: they misremember. They confidently recall things that never happened, blend old decisions with new ones, treat contradictions as equally valid, and surface three-week-old context that should have been discarded, right alongside the important context you actually need.

This is what every AI agent you’re using right now is doing to your data. Not because the models are bad. Because the memory layer underneath them is architecturally broken.

In March 2026, researchers at Cambridge and an independent AI lab published a paper that proved exactly why — and what the correct fix looks like.

We built that fix…
Part 1 — The Research: Why Your AI’s Memory Is Making Things Worse
arXiv:2603.15994 — “Selective Memory for Artificial Intelligence: Write-Time Gating with Hierarchical Archiving” — Zahn & Chana, March 2026

The paper opens with a clean diagnosis of the two dominant memory paradigms for AI:

Paradigm 1: RAG (Retrieval-Augmented Generation)
→ Stores everything. Every document, utterance, summary.
→ Retrieves by similarity at query time.
→ Problem: no quality filter on write. Noise accumulates.
More data = more degradation, not more accuracy.
Paradigm 2: Parametric (weights / fine-tuning)
→ Compresses knowledge into model weights.
→ Problem: updates require retraining.
You can't selectively correct a fact. You retrain the whole model.
Neither mirrors how biological memory actually works. Your brain doesn’t store everything indiscriminately and sort it out later. It filters at the moment of encoding — gating what gets remembered based on salience, novelty, and relevance. And when something you knew becomes outdated, your brain doesn’t delete it — it archives it, creating a hierarchy where the new knowledge supersedes the old without destroying the chain.

The paper’s core proposition:
Apply the same principles to AI memory. Gate at write time. Archive rather than overwrite.

The Experiment That Changes Everything
The researchers tested three conditions across Wikipedia entities, procedurally generated pharmacology data, and 2026 arXiv papers:

Ungated RAG — store everything, filter at read time
Self-RAG — read-time filtering (the current state of the art)
Write-time gating — filter before storage using salience scores
The baseline results were already stark. Ungated stores achieved 13% accuracy. Write gating: 100%.

Then they scaled the distractors:

Distractor ratio test (noise:signal in the memory store)
────────────────────────────────────────────────────────
Ratio     Ungated RAG     Self-RAG      Write Gating
────────────────────────────────────────────────────────
1:1       13%             —             100%
2:1       —               —             100%
4:1       —               collapses     100%
8:1       —               0%            100%
────────────────────────────────────────────────────────
At 8:1 distractors, Self-RAG hits zero.
Write gating holds at 100%.
This is not a marginal improvement. At realistic noise levels — the kind that accumulate naturally over any long-running agent session — read-time filtering completely collapses. Write-time gating doesn’t degrade at all.

The additional finding: write gating matches Self-RAG accuracy at one-ninth the query-time cost. Filtering once at write time is nine times cheaper than filtering on every read.
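The cost asymmetry is easy to see with a back-of-the-envelope count. A read-time filter re-scores every stored item on every query; a write-time gate scores each item exactly once. The numbers below are purely illustrative (the paper's one-ninth figure comes from its own workload, not from this toy model):

```typescript
// Illustrative filter-call count: write-time vs read-time filtering.
// All quantities here are hypothetical, chosen only to show the shape
// of the asymmetry.

const itemsWritten = 1_000; // memories stored over a session
const queries = 9;          // reads issued against the store

// Read-time filtering: every query re-evaluates every stored item.
const readTimeFilterCalls = itemsWritten * queries;

// Write-time gating: each item is scored once, at ingestion.
const writeTimeFilterCalls = itemsWritten;

console.log(readTimeFilterCalls / writeTimeFilterCalls); // 9
```

The ratio grows with query volume: the more a long-running agent reads from its store, the more a write-time gate pays for itself.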

The Salience Gate
How does write-time gating decide what gets in? The paper proposes a composite salience score built from three signals:

Composite Salience Score
────────────────────────────────────────
Source reputation → who/what produced this?
Novelty → does this add new information?
Reliability → is this consistent with known facts?
────────────────────────────────────────
Below threshold → cold storage (archived, not deleted)
Above threshold → write to active memory graph
Critically: objects below threshold are archived, not discarded. The information still exists — it’s just deprioritized. The system can still answer “what was the previous state?” because the superseded node is retained in cold storage.
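A minimal sketch of such a gate is below. The equal weighting, the 0.5 threshold, and the field names are assumptions made for illustration; they are not values from the paper or from VEKTOR:

```typescript
// Hypothetical salience gate: combines the three signals into one score
// and routes the memory to active storage or cold archive.
// Weights and the 0.5 threshold are illustrative assumptions.

interface IncomingMemory {
  text: string;
  sourceReputation: number; // 0..1 — who/what produced this?
  novelty: number;          // 0..1 — does this add new information?
  reliability: number;      // 0..1 — consistent with known facts?
}

const THRESHOLD = 0.5;

function salience(m: IncomingMemory): number {
  // Equal weighting is a simplification; a real system would tune these.
  return (m.sourceReputation + m.novelty + m.reliability) / 3;
}

function gate(m: IncomingMemory): "active" | "cold_archive" {
  // Below threshold the memory is archived, not deleted.
  return salience(m) >= THRESHOLD ? "active" : "cold_archive";
}

const noise = { text: "interim draft", sourceReputation: 0.1, novelty: 0.2, reliability: 0.3 };
const fact = { text: "EU compliance rules out PayPal", sourceReputation: 0.9, novelty: 0.8, reliability: 0.7 };

console.log(gate(noise)); // "cold_archive"
console.log(gate(fact));  // "active"
```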

Supersession Chains
This is the concept that matters most for long-running agents.

Standard RAG performs overwrites. When something changes — a decision gets revised, a fact becomes stale, a user preference updates — the old value is either kept (creating contradiction) or deleted (destroying history).

The paper proposes supersession chains instead:

Standard RAG update:
  OLD: "Deploy on Vercel" ──────────────────▶ [OVERWRITTEN]
  NEW: "Deploy on Railway" → stored as new fact
  Result: old decision is gone. No version history.
  Agent cannot answer: "what did we decide before?"

Supersession chain:
  OLD: "Deploy on Vercel" ──── superseded_by ──▶ ARCHIVED
  NEW: "Deploy on Railway" → active node            │
                             retrievable by ◀───────┘
                             temporal query
  Result: current state is clear. History is preserved.
  Agent can answer both "what do we use now?"
  and "what did we decide before, and why did we change?"
A system tracking that a CEO changed retains the ability to recall who the previous CEO was. The supersession creates hierarchy rather than replacement.
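The chain can be modelled as nodes linked by a superseded-by pointer. This sketch uses invented field names and node IDs, not VEKTOR's actual schema:

```typescript
// Toy supersession chain: an update archives the old node and links it to
// the new one, so both "current state" and "previous state" stay answerable.
// Field names are illustrative, not a real schema.

interface MemoryNode {
  id: number;
  fact: string;
  active: boolean;
  supersededBy?: number; // temporal edge to the replacing node
}

const nodes = new Map<number, MemoryNode>();

function write(id: number, fact: string, supersedes?: number): void {
  if (supersedes !== undefined) {
    const old = nodes.get(supersedes);
    if (old) {
      old.active = false;     // archived, not deleted
      old.supersededBy = id;  // temporal edge: old → new
    }
  }
  nodes.set(id, { id, fact, active: true });
}

function current(startId: number): MemoryNode {
  // Walk the chain forwards until we reach the active node.
  let n = nodes.get(startId)!;
  while (n.supersededBy !== undefined) n = nodes.get(n.supersededBy)!;
  return n;
}

write(441, "Deploy on Vercel");
write(847, "Deploy on Railway", 441);

console.log(current(441).fact);      // "Deploy on Railway"
console.log(nodes.get(441)!.active); // false — archived, still retrievable
```

Walking the `supersededBy` pointers forwards answers "what do we use now?"; reading the archived node directly answers "what did we decide before?".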

Part 2 — Why This Matters for Real Agent Work
The paper validates something developers using AI agents discover instinctively after a few weeks of serious use: more memory is not better memory.

Every long-running agent session produces noise. Contradictory drafts. Interim decisions that got reversed. Redundant observations. Throwaway context that shouldn’t persist. In a flat, ungated store, all of this accumulates with equal weight — and at real session lengths, the 8:1 distractor ratio the paper tested isn’t a stress test. It’s Tuesday.

The failure modes are specific and recognisable:

Failure 1: Contradiction accumulation
"Use Postgres" stored week 1
"Maybe try MongoDB?" stored week 3
"Actually, stick with Postgres" stored week 3
→ Ungated store now has three equal-weight entries
→ Agent hedges. Asks clarifying questions. Loses confidence.
Failure 2: Stale context wins
"Deploy to staging" (urgent, high-recency)
"Production credentials are X" (old, but critically important)
→ Similarity search surfaces the urgent recent context
→ Old-but-critical context gets pushed below retrieval threshold
→ Agent proceeds with wrong credentials
Failure 3: Decision amnesia
Three weeks ago: "We chose Stripe because our EU compliance
requirements rule out PayPal."
Today: Agent suggests PayPal. Doesn't remember why Stripe was chosen.
→ No supersession chain. No archived reasoning. Just absence.
These aren’t edge cases. They’re the normal operating conditions of any agent doing real work on a real project over real time. The paper’s contribution is showing that these failure modes are structural properties of ungated stores — not fixable by better retrieval algorithms, better prompts, or bigger context windows. The fix has to happen at write time.

Part 3 — How VEKTOR Implements the Architecture the Paper Describes
The terminology is different. The architecture is similar.

AUDN: Write-Time Gating
Every memory that enters VEKTOR passes through the AUDN curation loop before being written to the graph. AUDN evaluates every incoming memory object against the existing graph and makes one of four decisions:

Incoming: "User now prefers Railway over Vercel"


┌────────────────────────┐
│ AUDN Loop │
│ │
│ Check existing graph │
│ "Deploy on Vercel" │
│ exists at node #441 │
│ │
│ Decision: UPDATE │
│ → supersede #441 │
│ → archive old node │
│ → write new node │
│ → create temporal │
│ edge between them │
└────────────────────────┘


Graph: node #441 (archived, superseded_by #847)
node #847 (active: "Deploy on Railway")
temporal edge: #441 → #847, timestamp
The four AUDN decisions map directly to the paper’s framework:

AUDN Decision    Paper equivalent
──────────────────────────────────────────────────────────────
ADD              Above salience threshold → write
UPDATE           Supersession → archive old, write new
DELETE           Below threshold + contradicts known fact
NO_OP            Below threshold + already known → cold archive
Zero contradictions accumulate. Every update preserves its history. The graph stays clean at write time — not at read time, where it’s already too late.
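In control-flow terms, the table above reduces to a four-way decision. This is a guess at the logic from the description in this post, not VEKTOR's implementation; the threshold and the contradiction check are simplified stand-ins:

```typescript
// Sketch of a four-way write-time decision: ADD / UPDATE / DELETE / NO_OP.
// The 0.5 threshold and the boolean contradiction flag are illustrative
// simplifications of a real evaluation against the graph.

type Decision = "ADD" | "UPDATE" | "DELETE" | "NO_OP";

interface Evaluation {
  salience: number;             // composite score, 0..1
  contradictsExisting: boolean; // conflicts with a fact already in the graph?
}

const THRESHOLD = 0.5;

function decide(e: Evaluation): Decision {
  if (e.salience >= THRESHOLD) {
    // High-salience info that conflicts with an existing node supersedes it.
    return e.contradictsExisting ? "UPDATE" : "ADD";
  }
  // Below threshold: discard contradictions, cold-archive the already-known.
  return e.contradictsExisting ? "DELETE" : "NO_OP";
}

console.log(decide({ salience: 0.9, contradictsExisting: true }));  // "UPDATE"
console.log(decide({ salience: 0.9, contradictsExisting: false })); // "ADD"
console.log(decide({ salience: 0.2, contradictsExisting: true }));  // "DELETE"
console.log(decide({ salience: 0.2, contradictsExisting: false })); // "NO_OP"
```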

MAGMA: The Graph That Supersession Chains Live In
The paper proposes that superseded nodes should be retained in cold storage, retrievable via temporal queries. VEKTOR’s MAGMA graph has dedicated architecture for exactly this:

MAGMA — 4 Layer Associative Graph
═══════════════════════════════════════════════════════════════
SEMANTIC │ Similarity between active memory nodes
│ importance-scored · decays over time
────────────┼──────────────────────────────────────────────────
CAUSAL │ Cause → Effect edges
│ why decisions were made · reasoning chains
────────────┼──────────────────────────────────────────────────
TEMPORAL │ Before → After sequences
│ supersession chains live here
│ "what changed, when, and from what"
────────────┼──────────────────────────────────────────────────
ENTITY │ People · projects · events · auto-linked
│ co-occurrence connections across memory
═══════════════════════════════════════════════════════════════
The temporal layer is where supersession chains persist. When AUDN archives a node with an UPDATE decision, the temporal layer records the edge — old node, new node, timestamp, reason for supersession. The agent can then traverse this chain in either direction: forwards to find the current state, backwards to find what was believed before, and why it changed.

The memory.delta() Method
This is the practical interface for supersession chain queries:

// What changed on this topic in the last 30 days?
const changes = await memory.delta("deployment preferences", { days: 30 });
// Returns:
// [
// {
// from: "Deploy on Vercel",
// to: "Deploy on Railway",
// superseded_at: "2026-04-12T14:23:00Z",
// reason: "Vercel pricing changed, Railway better fit for self-hosted"
// }
// ]
The agent doesn’t just know the current state. It knows the history, the sequence, and — if the reason was stored — the why behind each change.

REM: The Nightly Consolidation That Keeps the Gate Clean
Write-time gating handles quality on input. But real agent sessions also produce noise from the session itself — contradictory drafts, interim states, redundant observations that were valid during the session but shouldn’t persist as permanent memory.

VEKTOR’s 7-phase REM consolidation cycle runs while idle:

Raw session nodes (before REM)       After REM: 50:1 compression
──────────────────────────────       ──────────────────────────────────
"considering approach A"             RESOLVED: "Approach B selected.
"approach A has latency issues"      Reason: A added 200ms latency on
"trying approach B"                  cold start. B benchmarked at 12ms.
"B is faster on cold start"          Decision final. A archived."
"A vs B, not sure yet"
"B confirmed, deploying B"
─── 6 raw nodes, 98% noise ───       ─── 1 truth node, full reasoning ───
The noise doesn’t survive REM. The reasoning does. The archived nodes remain accessible via temporal query. The active graph gets sharper overnight, not noisier.
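A crude sketch of what one consolidation pass might do: collapse a run of interim session notes into a single resolved node while archiving the raw ones. The resolution heuristic here (last note wins, earlier notes become the reasoning trail) is invented for illustration and is not VEKTOR's actual 7-phase cycle:

```typescript
// Toy consolidation pass: interim session notes collapse into one resolved
// node; the raw notes are archived, not deleted. The heuristic is invented.

interface SessionNote {
  text: string;
  archived: boolean;
}

interface ResolvedNode {
  conclusion: string;  // the final state of the session
  reasoning: string[]; // the trail that led there
}

function consolidate(notes: SessionNote[]): ResolvedNode {
  // Raw nodes survive in cold storage, out of the active graph.
  for (const n of notes) n.archived = true;
  return {
    conclusion: notes[notes.length - 1].text,
    reasoning: notes.slice(0, -1).map(n => n.text),
  };
}

const session = [
  "considering approach A",
  "approach A has latency issues",
  "trying approach B",
  "B confirmed, deploying B",
].map(text => ({ text, archived: false }));

const truth = consolidate(session);
console.log(truth.conclusion);               // "B confirmed, deploying B"
console.log(session.every(n => n.archived)); // true
```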

Part 4 — The Memory Gaps You’re Actually Living With
The paper describes the theory. Here’s how it maps to what developers experience.

Gap 1: Your agent re-asks questions you’ve already answered.

This is contradiction accumulation. The agent has equal-weight evidence for multiple positions and hedges by asking again. AUDN’s write-time gate prevents this — each update resolves the contradiction rather than adding to it.

Gap 2: Your agent forgets why a decision was made.

Three weeks ago you chose Postgres over MongoDB for specific reasons: EU data residency, your team’s expertise, a specific query pattern. Today the agent suggests MongoDB. It doesn’t remember the reasoning, only the decision, and decisions without reasoning can always be re-litigated.

VEKTOR’s causal layer stores the edge: chose Postgres → because: EU data residency + team expertise. Graph traversal surfaces the reasoning alongside the decision. The agent can apply that logic to new situations.

Gap 3: Your agent treats stale context as current.

Production credentials from six months ago. A naming convention that was revised. An API endpoint that changed. In a flat ungated store, these persist with the same weight as your most recent session’s context. VEKTOR’s temporal decay scoring and REM consolidation progressively deprioritise nodes that haven’t been reinforced — the old credentials don’t disappear, but they don’t compete with current context either.

Gap 4: History is permanently unrecoverable.

Standard RAG overwrites. Once the new state is stored, the prior state is gone. You can’t ask “what did we decide before we changed this?” because there’s no supersession chain — just the current value, no lineage.

VEKTOR’s temporal layer preserves every superseded node. memory.delta() makes the full history queryable. The agent can answer both the current-state question and the history question from the same graph.

The Architecture the Research Points To
The paper by Zahn and Chana establishes something important: the problem with AI memory isn’t retrieval quality, context window size, or model capability. It’s the absence of a write-time gate.

Current AI memory (ungated):
Every input → stored → retrieved by similarity
More data → more noise → worse results
At 8:1 distractor ratio → complete collapse
Correct AI memory (write-time gated):
Every input → salience evaluation → gate decision
Contradictions resolved on write, not on read
Supersession chains preserve history
At 8:1 distractor ratio → 100% accuracy maintained
VEKTOR is this architecture, running locally on your machine, connected to every major AI client, at 8ms recall with zero cloud dependency.

Your agent has been working from a broken memory system. The fix isn’t a prompt. It isn’t a bigger context window. It’s a write-time gate, a graph with supersession chains, and a consolidation cycle that runs while you sleep.

Get VEKTOR Slipstream →
Read the paper →
Read the docs →

VEKTOR Slipstream is a local-first MCP server for persistent AI memory. AUDN write-time curation. MAGMA 4-layer graph. REM consolidation. 8ms recall. One-time purchase. Zero cloud.

Ai Agentic
Ai Memory
Agentic Workflow
Agentic Rag
