Last week, a coding agent in a test repo did something weird: it opened the right files, referenced the wrong API version, and confidently wrote code for a migration we had already rolled back.
Nothing was “broken” in the usual sense. The prompts were fine. The tools were available. The model was good.
The problem was memory drift.
If you’ve built anything with long-running agents, you’ve probably seen it too: the agent starts strong, then gradually retrieves stale facts, outdated decisions, or half-relevant chunks from old work. Over time, its “memory” turns into a confidence amplifier for bad context.
A lot of teams try to solve this with a bigger vector store. That helps… until it doesn’t.
The real issue: vector stores decay quietly
Vector stores are great for fuzzy retrieval. If your agent needs “something similar to this design doc” or “the auth code near this endpoint,” embeddings are useful.
But agent memory is not just similarity search.
It’s often:
- what changed
- what supersedes what
- who approved a decision
- which fact is still valid
- what depends on what
- what should never be forgotten
That’s where vector-only memory starts to decay.
A simple example
Suppose your agent stores these facts over time:
- JWT auth is used for internal APIs
- Moved to mTLS for service-to-service auth
- JWT still used for browser sessions
- Deprecated auth middleware in v3
- Hotfix restored old middleware for admin routes
A vector store can retrieve “similar auth-related stuff,” but it won’t naturally answer:
- which statement is the latest truth?
- which fact overrides another?
- which context applies only to admin routes?
- which decision was temporary?
That’s not an embedding problem. That’s a relationship problem.
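To make that concrete, here's a plain-Node sketch of those same five facts once the relationships are explicit. The fact IDs, the `scope` values, and the `supersededBy` field are all invented for illustration, not a prescribed schema:

```javascript
// Each fact carries a scope, a timestamp, and an explicit pointer to the
// fact that replaced it, instead of living as free text in an index.
const facts = [
  { id: "f1", text: "JWT auth is used for internal APIs", scope: "internal", ts: 1, supersededBy: "f2" },
  { id: "f2", text: "Moved to mTLS for service-to-service auth", scope: "internal", ts: 2, supersededBy: null },
  { id: "f3", text: "JWT still used for browser sessions", scope: "browser", ts: 2, supersededBy: null },
  { id: "f4", text: "Deprecated auth middleware in v3", scope: "internal", ts: 3, supersededBy: "f5" },
  { id: "f5", text: "Hotfix restored old middleware for admin routes", scope: "admin", ts: 4, supersededBy: null },
];

// "Latest truth" for a scope = facts in that scope that nothing supersedes.
function latestTruth(scope) {
  return facts.filter((f) => f.scope === scope && f.supersededBy === null);
}

console.log(latestTruth("internal").map((f) => f.text));
// => [ 'Moved to mTLS for service-to-service auth' ]
```

A similarity search over the raw sentences can't produce that answer, because "latest" lives in the links, not in the text.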
Knowledge graphs don’t replace vectors — they constrain them
The best pattern I’ve seen is:
- vector store for recall
- knowledge graph for truth maintenance
Think of it like this:
```
User query
    |
    v
[Vector Search] ---> finds possibly relevant notes/docs/chunks
    |
    v
[Knowledge Graph] ---> resolves relationships:
                       - supersedes
                       - depends_on
                       - approved_by
                       - valid_for
                       - expires_at
    |
    v
[LLM Context] ---> smaller, fresher, less contradictory
```
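The pipeline above can be sketched in a few lines. Here `vectorSearch` is a stub standing in for a real vector index, and the `supersedes` map stands in for the graph store; the IDs are made up:

```javascript
// Stub: pretend the vector index returned these chunk IDs, ranked by similarity.
function vectorSearch(query) {
  return ["auth_v1", "auth_v2", "old_migration_note"];
}

// Graph knowledge: which fact supersedes which.
const supersedes = new Map([["auth_v2", "auth_v1"]]);

// Stage 2: let the graph veto what similarity search recalled.
function buildContext(query) {
  const hits = vectorSearch(query);
  const replaced = new Set(supersedes.values());
  // Drop anything the graph says has been superseded.
  return hits.filter((id) => !replaced.has(id));
}

console.log(buildContext("internal auth"));
// => [ 'auth_v2', 'old_migration_note' ]
```

The point is the division of labor: the vector side is allowed to over-recall, and the graph side is responsible for removing contradictions before anything reaches the model.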
A knowledge graph gives your system structure around memory:
- entities: services, APIs, users, incidents, tasks
- edges: `supersedes`, `blocked_by`, `owned_by`, `approved_by`
- timestamps: when a fact became true
- scope: where that fact applies
- confidence: whether it’s canonical or provisional
Instead of asking “what text looks similar?”, you can ask:
- “What is the current auth method for internal APIs?”
- “What decision replaced this one?”
- “Which open task depends on this migration?”
- “What facts are stale after last deploy?”
That’s how you stop memory from becoming a junk drawer.
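A sketch of two of those queries over a toy edge list. The node names (`auth_v1`, `task_42`, `mtls_migration`) and edge shape are invented for illustration:

```javascript
// Hypothetical mini-graph: edges stored as { from, type, to } triples.
const edges = [
  { from: "auth_v2", type: "supersedes", to: "auth_v1" },
  { from: "task_42", type: "depends_on", to: "mtls_migration" },
  { from: "auth_v2", type: "approved_by", to: "alice" },
];

// "What decision replaced this one?"
function replacedBy(factId) {
  const e = edges.find((e) => e.type === "supersedes" && e.to === factId);
  return e ? e.from : null;
}

// "Which open tasks depend on this migration?"
function dependents(nodeId) {
  return edges
    .filter((e) => e.type === "depends_on" && e.to === nodeId)
    .map((e) => e.from);
}

console.log(replacedBy("auth_v1"));       // => auth_v2
console.log(dependents("mtls_migration")); // => [ 'task_42' ]
```

These are exact lookups over typed edges, so the answers are deterministic, which is exactly what similarity scores can't give you.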
A practical rule of thumb
Use a vector store when you need:
- semantic search
- fuzzy recall
- document retrieval
- broad context gathering
Use a knowledge graph when you need:
- state over time
- versioned truth
- explicit dependencies
- conflict resolution
- auditable memory
If you only use vectors, your agent will eventually retrieve both the old answer and the new answer and act like they’re equally valid.
A tiny runnable example
Here’s a minimal Node example using a graph to resolve the “latest truth” for a fact.
```shell
npm install graphology
```

```javascript
const Graph = require("graphology");

const graph = new Graph();

// Each node is a fact; a "supersedes" edge points from the newer fact
// to the fact it replaces.
graph.addNode("auth_v1", { value: "JWT for internal APIs", ts: 1 });
graph.addNode("auth_v2", { value: "mTLS for internal APIs", ts: 2 });
graph.addDirectedEdge("auth_v2", "auth_v1", { type: "supersedes" });

// A fact is current if nothing supersedes it, i.e. it has no incoming edge.
function currentFact(nodes) {
  return nodes
    .filter((n) => graph.inDegree(n) === 0)
    .map((n) => graph.getNodeAttribute(n, "value"));
}

console.log(currentFact(["auth_v1", "auth_v2"]));
// => [ 'mTLS for internal APIs' ]
```
Obviously, real systems need more than this. But the core idea matters: memory should encode replacement, not just storage.
What this looks like in production
A useful pattern is:
- Store raw docs, chats, and artifacts in a vector index
- Extract durable facts into a graph
- Mark facts with:
- source
- timestamp
- scope
- confidence
- supersession links
- Retrieve from both systems
- Let the graph filter or rank what the LLM actually sees
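A minimal sketch of what such a durable-fact record and the final graph-side filter might look like. The field names and the `ADR-0042` source ID are assumptions for illustration, not a prescribed schema:

```javascript
// One extracted durable fact, carrying the metadata listed above.
const fact = {
  id: "auth_v2",
  text: "mTLS for service-to-service auth",
  source: "ADR-0042",        // hypothetical source doc ID
  ts: Date.parse("2024-03-01"),
  scope: "internal-apis",
  confidence: "canonical",   // vs "provisional"
  supersedes: ["auth_v1"],
};

// The graph layer decides what the LLM actually sees:
// keep only canonical facts that nothing else supersedes.
function visibleToLLM(facts) {
  const replaced = new Set(facts.flatMap((f) => f.supersedes ?? []));
  return facts.filter((f) => f.confidence === "canonical" && !replaced.has(f.id));
}

console.log(visibleToLLM([fact]).map((f) => f.id));
// => [ 'auth_v2' ]
```

Ranking instead of hard filtering works too; the important part is that freshness and supersession are computed from metadata, not inferred by the model.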
If you already have a policy engine like OPA in your stack, this is also a good place to enforce rules like:
- only approved memories can be treated as canonical
- expired decisions should not be retrieved
- temporary incident workarounds should not leak into normal planning
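In a real deployment those rules would live in Rego and be evaluated by OPA; this is just a plain-JS sketch of the same three checks, with a hypothetical memory shape:

```javascript
// Can this memory be retrieved at all, given the current mode and time?
function retrievable(memory, { mode, now }) {
  // Expired decisions should not be retrieved.
  if (memory.expiresAt != null && memory.expiresAt <= now) return false;
  // Temporary incident workarounds should not leak into normal planning.
  if (memory.kind === "incident-workaround" && mode === "planning") return false;
  return true;
}

// Only approved memories can be treated as canonical.
function canonical(memory) {
  return memory.approvedBy != null;
}
```

Keeping these as explicit policy checks, rather than prompt instructions, means they're enforced the same way on every retrieval and can be audited later.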
That’s usually a better answer than trying to prompt-engineer your way out of stale context.
The trap nobody mentions
The biggest mistake isn’t “using vectors.”
It’s treating all memory as text.
Some memory is text.
Some memory is state.
Some memory is policy.
Some memory is provenance.
If you flatten all of that into embeddings, your agent can retrieve context — but it can’t reliably reason about whether that context is still true.
That’s where drift starts.
Try it yourself
If you’re building agents and want to pressure-test the surrounding security and tooling:
- Want to check your MCP server? Try https://tools.authora.dev
- Run `npx @authora/agent-audit` to scan your codebase
- Add a verified badge to your agent: https://passport.authora.dev
- Check out https://github.com/authora-dev/awesome-agent-security for more resources
My take
Vector stores are still the right tool for retrieval.
But if you want long-lived agents that don’t slowly poison themselves with stale context, you need something that models truth over time.
Usually that means adding a knowledge graph, or at least graph-like relationships, on top of your retrieval layer.
How are you handling agent memory today: pure RAG, graph-backed memory, or something else? Drop your approach below.
-- Authora team
This post was created with AI assistance.