Vektor Memory

Posted on Jun 11

Your vector memory database remembers everything. That’s exactly the issue.

#ai #vectordatabase #memory #rag

There is a design assumption baked into almost every vector database and AI memory implementation that sounds reasonable until you watch the node count grow in production: that remembering more is always better.

Testing and refining our AUDN code has shown us that this is not quite true.

After running VEKTOR Slipstream against real development sessions for 99 days, the database held 1,413 stored memories across four namespaces. In the importance score distribution, 83 percent of those memories sat below 0.25 out of 1.0, which the system treats as the noise floor. Of the remaining 17 percent, just 60 memories (about 4 percent of the total) sat above 0.75, and those 60 dominated every recall result.

This is exactly what a curation layer is supposed to produce.

Those 1,154 low-scored memories are accurate. They are not deleted. They are retrievable by direct query. What they are not is important enough to compete with the 60 high-signal entries every time the agent needs context.

AUDN penalised them gradually over hundreds of writes because similar, more specific, or more frequently reinforced memories covered the same ground better. The system created a hierarchy. Without curation, all 1,413 memories would compete equally for every recall slot, and the agent would consistently surface redundant, lower-value context alongside the things that actually matter.

That is what standard vector memory looks like without a curation layer: a slow, invisible degradation that nobody notices until the agent starts confidently giving you answers that are three months out of date.

Every memory node in Vektor carries an importance score between 0 and 1.

When a memory is first stored, it receives a score based on the content’s estimated significance. That score is not fixed. Every time a new memory arrives that is semantically related to an existing one but does not directly contradict it, AUDN returns a Compatible verdict and the existing memory takes a small redundancy penalty.

The penalty is intentionally modest: a factor based on how similar the incoming content is, typically reducing the score by 10 to 15 percent per occurrence. But across hundreds of sessions, the effect compounds. A memory about project tooling that gets reinforced by similar writes across a dozen conversations will have its score driven down steadily until it sits below the noise floor threshold where it no longer competes in active recall.
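To make the compounding concrete, here is a minimal sketch. The article gives only the 10 to 15 percent range and says the factor depends on similarity; the linear interpolation between the 0.72 LLM threshold and the 0.95 No-Op threshold, the starting score of 0.6, and all function names are assumptions for illustration, not VEKTOR's code.

```javascript
// Hypothetical sketch, not VEKTOR's implementation.
// ASSUMPTION: the penalty grows linearly from 10% at the 0.72 similarity
// threshold to 15% at 0.95 (above which a write becomes a No-Op instead).
const NOISE_FLOOR = 0.25;
const SIM_THRESHOLD = 0.72;
const NOOP_THRESHOLD = 0.95;

// Fraction of the existing memory's importance removed by one Compatible write.
function redundancyPenalty(similarity) {
  const s = Math.min(Math.max(similarity, SIM_THRESHOLD), NOOP_THRESHOLD);
  const t = (s - SIM_THRESHOLD) / (NOOP_THRESHOLD - SIM_THRESHOLD);
  return 0.10 + 0.05 * t;
}

function applyPenalty(importance, similarity) {
  return importance * (1 - redundancyPenalty(similarity));
}

// How many related-but-compatible writes push a memory below the noise floor?
function writesUntilNoiseFloor(initialImportance, similarity) {
  let score = initialImportance;
  let writes = 0;
  while (score >= NOISE_FLOOR) {
    score = applyPenalty(score, similarity);
    writes += 1;
  }
  return { writes, score };
}

console.log(writesUntilNoiseFloor(0.6, 0.72).writes); // 9 (10% per write)
console.log(writesUntilNoiseFloor(0.6, 0.95).writes); // 6 (15% per write)
```

Under these assumed numbers, a memory that starts at 0.6 crosses the noise floor after 6 to 9 related writes, well within the dozen conversations described above.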

The noise floor is not a bin for broken or wrong memories. It is where memories go when the system has determined they are not the most important version of what they represent.

They are still stored and still retrievable by direct query; they simply stop competing in recall with the 60 high-signal entries that floated to the top of the distribution. This is the intended behavior: a natural hierarchy where what matters most surfaces first, and everything else remains available without adding noise to every retrieval.
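A minimal sketch of that split between active recall and direct lookup. The article does not describe VEKTOR's recall scoring, so the ranking formula (cosine similarity multiplied by importance), the field names, and the `recall` and `getById` helpers are all hypothetical.

```javascript
// Hypothetical sketch of active recall vs. direct lookup, not VEKTOR's API.
const NOISE_FLOOR = 0.25;

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Active recall: below-floor memories do not compete for the top-k slots.
// ASSUMPTION: rank = similarity x importance.
function recall(memories, queryVec, k) {
  return memories
    .filter((m) => m.importance >= NOISE_FLOOR)
    .map((m) => ({ ...m, rank: cosine(m.vec, queryVec) * m.importance }))
    .sort((x, y) => y.rank - x.rank)
    .slice(0, k);
}

// Direct query: nothing is hidden, every stored memory is reachable by id.
function getById(memories, id) {
  return memories.find((m) => m.id === id) ?? null;
}

const store = [
  { id: 'a', vec: [1, 0], importance: 0.9, text: 'Handover: tests run with pnpm test' },
  { id: 'b', vec: [0.9, 0.1], importance: 0.12, text: 'We use pnpm' },
  { id: 'c', vec: [0, 1], importance: 0.8, text: 'Staging deploys happen on merge' },
];
console.log(recall(store, [1, 0], 2).map((m) => m.id).join(', ')); // a, c
console.log(getById(store, 'b').text); // We use pnpm
```

Memory `b` is close to the query but sits below the floor, so it never takes a recall slot, yet it is still one lookup away.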

The Mechanism Nobody Talks About

Vector databases are extraordinarily good at one thing: storing information and finding other information that is semantically similar to it. That is genuinely useful, but on its own it is not enough for agent memory.

When a user tells your agent “I work in finance” in January, and “I left banking last month” in April, a vector store dutifully records both facts.

The embeddings sit close together in the vector space because they are about the same topic. When you query for professional context in May, you get both back. The agent receives two conflicting truths with no metadata to tell it which one is current, and it does what language models do when given ambiguous context: it synthesises a plausible-sounding answer that may or may not reflect reality.

This is not a retrieval problem. You cannot fix it at recall time by adding better filters or smarter reranking, because by the time you are querying, the contradiction is already in the graph and competing for attention. The only place to fix it is at the write layer, before the conflicting fact is committed.

This is the insight that drove the architecture of the AUDN gate. Below is a real production graph at work, with semantic, causal, temporal, and entity nodes in formation.

(Production graph, temporal nodes only)

What a Write-Layer Curation Gate Actually Does

AUDN runs synchronously on every single memory write before anything touches the database. Every incoming piece of information is compared against the 200 most recent active memories using cosine similarity, which is a pure SQLite operation that completes in under two milliseconds. If nothing similar exists, the memory is committed immediately as a fresh addition. If something similar does exist at a cosine score above 0.72, the gate sends the pair to an LLM for classification.
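The pre-filter step can be sketched as follows. In VEKTOR this runs as a SQLite operation; here an in-memory array stands in for the database, and the function names, field names (`vec`, `createdAt`), and return shape are assumptions for illustration. The 200-memory window and the 0.72 threshold come from the description above.

```javascript
// Hypothetical sketch of the AUDN pre-filter, not VEKTOR's code.
const RECENT_WINDOW = 200;  // compare against the 200 most recent active memories
const LLM_THRESHOLD = 0.72; // above this, the pair goes to the LLM

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Decide whether an incoming memory can be added directly or needs classification.
function prefilter(incoming, activeMemories) {
  const recent = [...activeMemories]
    .sort((x, y) => y.createdAt - x.createdAt)
    .slice(0, RECENT_WINDOW);
  const candidates = recent
    .map((existing) => ({ incoming, existing, similarity: cosine(incoming.vec, existing.vec) }))
    .filter((pair) => pair.similarity > LLM_THRESHOLD)
    .sort((x, y) => y.similarity - x.similarity);
  return candidates.length === 0
    ? { action: 'ADD', candidates }
    : { action: 'CLASSIFY', candidates };
}

const active = [
  { id: 'm1', vec: [1, 0, 0], createdAt: 1 },
  { id: 'm2', vec: [0, 1, 0], createdAt: 2 },
];
console.log(prefilter({ vec: [0.9, 0.1, 0] }, active).action); // CLASSIFY
console.log(prefilter({ vec: [0, 0, 1] }, active).action);     // ADD
```

Only pairs that clear the threshold ever reach the LLM, which is what keeps the common path (nothing similar exists) cheap.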

The LLM used for this test run was llama3-8b-8192 on Groq. It is fast, has generous free-tier limits, and is accurate at the kind of short, constrained classification this requires.

To keep API costs and rate limits manageable, pairs are batched: up to ten candidate pairs are classified in a single call. If the LLM is unavailable for any reason, AUDN falls back to a heuristic where similarity above 0.95 becomes a no-operation and everything else is treated as compatible. A write is never blocked. The fallback trades accuracy for availability, which is the correct tradeoff.
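Batching and the fallback can be sketched like this. `classifyBatch` is an injected function standing in for one LLM call; its signature (an array of pairs in, one verdict string per pair out) and the verdict strings are assumptions, not VEKTOR's or Groq's API. The batch size of ten and the 0.95 fallback rule come from the text.

```javascript
// Hypothetical sketch, not VEKTOR's or Groq's API.
const BATCH_SIZE = 10;      // up to ten pairs per LLM call
const NOOP_FALLBACK = 0.95; // heuristic: near-duplicates become No-Ops

function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Used when the LLM is unavailable: trades accuracy for availability.
function heuristicVerdict(pair) {
  return pair.similarity > NOOP_FALLBACK ? 'NO_OP' : 'COMPATIBLE';
}

async function classifyAll(pairs, classifyBatch) {
  const verdicts = [];
  for (const batch of chunk(pairs, BATCH_SIZE)) {
    try {
      const result = await classifyBatch(batch);
      if (!Array.isArray(result) || result.length !== batch.length) {
        throw new Error('malformed LLM response');
      }
      verdicts.push(...result);
    } catch {
      // Never block the write: fall back to the similarity heuristic.
      verdicts.push(...batch.map(heuristicVerdict));
    }
  }
  return verdicts;
}

// Simulate an outage: every batch falls back to the heuristic.
const pairs = [{ similarity: 0.97 }, { similarity: 0.8 }];
classifyAll(pairs, async () => { throw new Error('rate limited'); })
  .then((v) => console.log(v.join(', '))); // NO_OP, COMPATIBLE
```

Treating a malformed response the same as an outage keeps the guarantee simple: every pair gets a verdict, and no write waits on the LLM.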

The classification is not binary. There are five possible verdicts, and each one produces a different action.

(Entity node advising: supersession chain as explicit edges, AUDN UPDATE)

Compatible means both facts are simultaneously true from different angles. The incoming memory is stored, but the existing memory takes a small redundancy penalty to its importance score. Over time this naturally surfaces the more specific, more frequently accessed, and more recent memories to the top of the priority stack.

Contradictory means the incoming fact directly conflicts with an existing one. The new fact wins, subject to one important condition: the trust matrix. If the incoming memory carries a trust score below 80 percent of the existing memory’s trust score, the verdict is downgraded to Compatible instead. A hedged conversational fragment cannot overwrite a verified session fact, regardless of semantic similarity. When a true contradiction is confirmed, the existing memory is suppressed using an exponential decay function over a 30-day window rather than being deleted outright.

Subsumes means the incoming fact is more general and logically contains the existing one. The existing memory is moved to cold storage, where it is archived but no longer competes in active recall.

Subsumed means the existing fact is more general and the incoming one adds nothing. The new memory is dropped entirely and the existing memory receives a small importance boost instead.

No-Op means the incoming fact is already known at high confidence. At cosine similarity above 0.95, the write is skipped and the existing memory’s access count increments. This is how the system handles the natural tendency to keep storing the same things: the second instance of a fact strengthens the first rather than creating a duplicate.
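The five verdicts map naturally onto a small dispatch function. This is a minimal sketch assuming a plain in-memory store with `active` and `cold` arrays. The 12 percent penalty (picked from the stated 10 to 15 percent range), the additive 0.05 boost, the verdict strings, and a decay curve that leaves 5 percent of a contradicted memory's weight at day 30 are all assumptions, not VEKTOR's published values. The trust-matrix downgrade is assumed to have already been applied to the verdict.

```javascript
// Hypothetical sketch, not VEKTOR's code. Constants are ASSUMPTIONS.
const REDUNDANCY_PENALTY = 0.12;
const SUBSUMED_BOOST = 0.05;
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const DECAY_RATE = Math.log(20) / 30; // per day: 5% of weight left at day 30

function applyVerdict(verdict, incoming, existing, store, now) {
  switch (verdict) {
    case 'COMPATIBLE': // both true: store new, nudge the old one down
      existing.importance *= 1 - REDUNDANCY_PENALTY;
      store.active.push(incoming);
      break;
    case 'CONTRADICTORY': // new fact wins; old one decays but is kept
      existing.suppressedAt = now;
      store.active.push(incoming);
      break;
    case 'SUBSUMES': { // incoming is more general: archive the existing memory
      const i = store.active.indexOf(existing);
      if (i !== -1) store.active.splice(i, 1);
      store.cold.push(existing);
      store.active.push(incoming);
      break;
    }
    case 'SUBSUMED': // incoming adds nothing: drop it, boost the existing one
      existing.importance = Math.min(1, existing.importance + SUBSUMED_BOOST);
      break;
    case 'NO_OP': // already known: skip the write, count the reinforcement
      existing.accessCount = (existing.accessCount ?? 0) + 1;
      break;
    default:
      throw new Error(`Unknown verdict: ${verdict}`);
  }
}

// Effective importance after Contradictory suppression (the row is never deleted).
function effectiveImportance(memory, now) {
  if (memory.suppressedAt == null) return memory.importance;
  const days = Math.max(0, (now - memory.suppressedAt) / MS_PER_DAY);
  return memory.importance * Math.exp(-DECAY_RATE * days);
}

const now = Date.UTC(2025, 3, 1);
const oldFact = { text: 'I work in finance', importance: 0.8 };
const store = { active: [oldFact], cold: [] };
applyVerdict('CONTRADICTORY', { text: 'I left banking last month', importance: 0.8 }, oldFact, store, now);
console.log(store.active.length); // 2
console.log(effectiveImportance(oldFact, now + 30 * MS_PER_DAY).toFixed(2)); // 0.04
```

Suppression works on the ranking weight, not the row, which is why a contradicted fact fades out of recall while remaining available for lineage.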

The 83 Percent Finding

Looking at the actual data from our live development database makes the shape of this problem visible in a way that abstract descriptions do not.

Of 1,413 active memories accumulated over 99 days, 1,154 carry an importance score between 0.10 and 0.25. These are memories that have been through the Compatible path multiple times, accumulating small redundancy penalties each time a related but non-contradictory memory was written nearby. None of them are wrong or contradictory.

They are simply less important than the 60 memories at the top of the importance distribution that have been reinforced, accessed repeatedly, and never penalised.

This is the intended outcome. A flat vector store treats every fact as equally important forever, which means retrieval quality degrades as the graph grows because signal and noise compete on equal terms. A curated graph creates a natural hierarchy where the most meaningful, most reinforced, most current facts rise to the top and everything else stays available but stops dominating recall.

The 60 high-signal memories in that database are session handover notes, confirmed architectural decisions, and key project facts that have been written and rewritten and accessed across dozens of sessions. They float. The rest sinks. Retrieval becomes faster and more accurate as the database grows rather than slower and noisier, which is the opposite of what happens in an uncurated vector store.
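The bucket arithmetic behind these figures is easy to reproduce on your own store. A minimal sketch, assuming you can export importance scores as an array of numbers; the function name and bucket labels are illustrative. The thresholds are the ones used in this article. Note that 60 of 1,413 is about 4 percent, so roughly 13 percent of the memories sit in the middle band between the noise floor and the high-signal tier.

```javascript
// Illustrative helper (hypothetical, not part of VEKTOR): bucket importance
// scores into below the 0.25 noise floor, above 0.75, and everything between.
function importanceBreakdown(scores) {
  const counts = { noise: 0, middle: 0, highSignal: 0 };
  for (const s of scores) {
    if (s < 0.25) counts.noise += 1;
    else if (s > 0.75) counts.highSignal += 1;
    else counts.middle += 1;
  }
  // Percentages rounded to one decimal place.
  const pct = (n) => (scores.length === 0 ? 0 : Math.round((1000 * n) / scores.length) / 10);
  return {
    total: scores.length,
    counts,
    percent: { noise: pct(counts.noise), middle: pct(counts.middle), highSignal: pct(counts.highSignal) },
  };
}

// The high-signal share quoted above: 60 of 1,413 memories.
console.log(Math.round((1000 * 60) / 1413) / 10); // 4.2
console.log(importanceBreakdown([0.1, 0.2, 0.5, 0.9]).counts);
```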

The Trust Matrix in Practice

Seventeen percent of the memories in that database carry a trust score below 0.7. These are predominantly extracted conversation fragments and inferred facts, stored automatically during session ingestion. Another 70 percent carry trust scores above 0.9, representing directly stored and confirmed information.

The trust guard exists because not all writes come from the same source. A user speaking casually in a conversation generates different quality information than an agent explicitly recording a confirmed decision.

When a low-trust fragment arrives with high semantic similarity to a high-trust existing memory, a Contradictory verdict is downgraded to Compatible. The system does not allow speculation to overwrite certainty, even when they are talking about the same thing.

This protects against a failure mode that is easy to trigger without the guard: an agent processing a stream of hedged, uncertain user statements gradually erodes its verified knowledge base because every “I think maybe” and “probably something like” crosses the similarity threshold and overwrites something solid.
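The guard itself is a one-line rule. The 80 percent ratio comes from the description of the trust matrix; the field names, the example memories, and their trust values (0.6 for a hedged fragment, in line with the sub-0.7 group above) are illustrative assumptions.

```javascript
// Sketch of the trust guard: a Contradictory verdict only stands if the
// incoming memory's trust is at least 80% of the existing memory's.
// Field names and example values are illustrative assumptions.
const TRUST_RATIO = 0.8;

function guardVerdict(verdict, incoming, existing) {
  if (verdict === 'CONTRADICTORY' && incoming.trust < TRUST_RATIO * existing.trust) {
    return 'COMPATIBLE'; // speculation cannot overwrite a verified fact
  }
  return verdict;
}

const verified = { text: 'Deploys go through the staging cluster', trust: 0.95 };
const hedged = { text: 'I think maybe we deploy straight to prod?', trust: 0.6 };
const confirmed = { text: 'Decision: deploy straight to prod from now on', trust: 0.9 };

console.log(guardVerdict('CONTRADICTORY', hedged, verified));    // COMPATIBLE
console.log(guardVerdict('CONTRADICTORY', confirmed, verified)); // CONTRADICTORY
```

The hedged statement is still stored, as a Compatible memory, so nothing the user said is lost; it simply cannot displace the verified fact.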

Nothing Disappears Silently

Every AUDN decision is written to an audit log before the underlying action executes. The schema stores the action taken, the memory affected, a 500-character snapshot of the content at the time of decision, the reason the LLM gave for its classification, the cosine similarity score that triggered the call, and the timestamp. This means you can query the full decision history:

```javascript
const decisions = await memory.auditLog({ action: 'CONTRADICTORY', since: '30d' });
// Returns each contradiction resolved in the last 30 days,
// including what was suppressed, what replaced it, and why.
```

The reason field is worth particular attention. When Groq classifies a pair as Contradictory, it returns a brief natural-language explanation alongside the verdict. That explanation is stored verbatim. You can surface it to users, use it to debug unexpected agent behaviour, or build explainability features on top of it. This turns what would otherwise be an opaque curation mechanism into something observable and trustworthy.
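The write-ahead ordering is the important property: the decision is logged before the action runs. A minimal sketch using the fields listed above; VEKTOR's real schema and the internals of `memory.auditLog()` are not shown in the article, so the entry shape, the epoch-millisecond timestamp, and the helper names here are hypothetical.

```javascript
// Hypothetical sketch of a write-ahead audit log, not VEKTOR's schema.
const SNAPSHOT_CHARS = 500;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function makeAuditEntry({ action, memoryId, content, reason, similarity }, now) {
  return {
    action,
    memoryId,
    snapshot: content.slice(0, SNAPSHOT_CHARS), // first 500 UTF-16 code units
    reason, // the LLM's explanation, stored verbatim
    similarity,
    timestamp: now, // epoch milliseconds
  };
}

// Record the decision first, then execute it: even a failed action leaves a trace.
function auditedApply(log, decision, now, action) {
  log.push(makeAuditEntry(decision, now));
  return action();
}

// Filter by action and age, similar in spirit to auditLog({ action, since: '30d' }).
function queryAudit(log, { action, sinceDays }, now) {
  const cutoff = now - sinceDays * MS_PER_DAY;
  return log.filter((e) => (!action || e.action === action) && e.timestamp >= cutoff);
}

const log = [];
const now = Date.now();
auditedApply(log, {
  action: 'CONTRADICTORY',
  memoryId: 'm1',
  content: 'User works in finance',
  reason: 'User says they left banking, which conflicts with working in finance.',
  similarity: 0.81,
}, now, () => { /* suppress m1 here */ });
console.log(queryAudit(log, { action: 'CONTRADICTORY', sinceDays: 30 }, now).length); // 1
```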

Cold-archived memories are also still retrievable via direct query. Nothing is permanently deleted. The lineage of how knowledge evolved is preserved. If you need to understand why the agent believes what it currently believes, you can trace the chain of decisions that got it there.
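Tracing that lineage amounts to walking supersession links backwards. A minimal sketch, assuming each memory records the id of the memory it replaced in a hypothetical `supersedes` field; the article shows supersession chains as explicit graph edges but does not document how they are stored.

```javascript
// Hypothetical sketch: follow `supersedes` links (an assumed field) from the
// current belief back through everything it replaced. Nothing is deleted,
// so the chain stays intact; `seen` guards against accidental cycles.
function lineage(memoriesById, id) {
  const chain = [];
  const seen = new Set();
  let current = memoriesById.get(id);
  while (current && !seen.has(current.id)) {
    seen.add(current.id);
    chain.push(current);
    current = current.supersedes ? memoriesById.get(current.supersedes) : undefined;
  }
  return chain;
}

const byId = new Map([
  ['m1', { id: 'm1', text: 'I work in finance', status: 'suppressed' }],
  ['m2', { id: 'm2', text: 'I left banking last month', supersedes: 'm1', status: 'active' }],
]);
console.log(lineage(byId, 'm2').map((m) => m.text).join(' <- '));
// I left banking last month <- I work in finance
```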

The Deeper Point

Most of the work on improving agent memory has focused on retrieval: better reranking, hybrid search, query expansion, context window management. These are genuinely useful improvements, but they do not solve the underlying problem, because the underlying problem is not about retrieval.

The problem is that vector memory, as typically implemented, is an append-only log. It grows indefinitely. It accumulates contradictions silently. It degrades in signal quality over time while appearing to grow in capability because the database keeps getting larger. By the time the degradation is visible in agent output quality, the problem is months old and deeply embedded in the graph.

The fix is not a better retrieval algorithm. It is a state machine at the write layer that maintains a consistent, curated, non-contradictory representation of what is currently known, with full lineage tracking for how that knowledge evolved.

After several months of node curation, your graph unfolds with deeper insights.

Vector memory that just stores things is a log. Vector memory with a curation gate is an epistemological layer. The difference shows up quietly at first, and then everywhere at once.

“The principle of LLM-arbitrated curation at the write layer is grounded in published research. The Mem0 paper (arXiv:2504.19413) demonstrated that structured memory management consistently outperforms append-only approaches across single-hop, temporal, multi-hop, and open-domain question categories on the LoCoMo benchmark.”

Chhikara, Prateek et al. “Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory.” arXiv:2504.19413 (April 2025).

Link: https://arxiv.org/abs/2504.19413

VEKTOR Slipstream ships AUDN as part of the core write path on every memory.remember() call. The audit log, trust matrix, cold storage, and five-verdict conflict resolution are all active by default. Documentation and source at vektormemory.com.


DEV Community
