I've been building an open-source epistemic measurement framework called Empirica, and one of the core challenges I ran into early on was memory — not the "stuff vectors in a database and retrieve them" kind, but memory that actually behaves like memory. Things fade. Patterns strengthen with repetition. A dead-end from three weeks ago should still surface when the AI is about to walk into the same wall, but a finding from a one-off debugging session probably shouldn't carry the same weight six months later.
That's where Qdrant comes in, and I want to share how we're using it because it's a fairly different use case from the typical RAG setup.
## The problem with flat retrieval
Most RAG implementations treat memory as a flat store — embed a chunk, retrieve by similarity, done. That works for document Q&A, but it falls apart when you need temporal awareness. An AI agent working across sessions and projects needs to know not just what was discovered, but when, how confident we were, and whether that knowledge is still valid.
Think about how your own memory works — you don't recall every detail of every workday equally. The time you accidentally dropped the production database? That stays vivid. The routine PR you reviewed last Tuesday? Already fading. That asymmetry is functional, not a bug.
## Two memory types, one vector store
We use Qdrant for two distinct memory layers:
**Eidetic memory** — facts with confidence scores. These are discrete epistemic artifacts: findings ("the auth system uses JWT refresh with 15min expiry"), dead-ends ("tried migrating to async but the ORM doesn't support it"), decisions ("chose SQLite over Postgres because single-user, no server needed"), mistakes ("forgot to check null on the config reload path"). Each carries a confidence score that gets challenged when new evidence contradicts it — a finding's confidence drops if a related finding surfaces that undermines it. Think of it as an immune system: findings are antigens, lessons are antibodies.
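To make the challenge mechanism concrete, here's a minimal sketch of how a contradiction might damp a stored finding's confidence. The function name and the specific damping rule are my own illustration, not Empirica's actual implementation:

```python
# Hypothetical sketch of the confidence-challenge step (illustrative only):
# when a contradicting finding surfaces, damp the original finding's
# confidence in proportion to how confident and how semantically close
# the challenger is.

def challenge_confidence(original: float, challenger: float, similarity: float) -> float:
    """Lower a stored finding's confidence after a contradiction.

    original:   confidence of the stored finding, in [0, 1]
    challenger: confidence of the new, contradicting finding, in [0, 1]
    similarity: vector similarity between the two findings, in [0, 1]
    """
    # Damping is strongest when the challenger is both confident and
    # closely related to the original finding.
    penalty = challenger * similarity
    return max(0.0, original * (1.0 - penalty))
```

With this rule, a strongly held finding (0.9) challenged by a confident (0.8), moderately related (0.5) contradiction drops to 0.54, while an unrelated or low-confidence challenger barely moves it.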
**Episodic memory** — session narratives with temporal decay. These capture the arc of a work session: what was the AI investigating, what did it learn, how did its confidence change from start to finish. Episodic memories naturally decay over time — a session from yesterday is more relevant than one from last month, unless the pattern keeps repeating, in which case it strengthens instead of fading.
Both live in Qdrant as separate collections per project, which gives us clean isolation and lets us do cross-project pattern discovery when we need it.
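One simple way to get that isolation is a deterministic naming scheme, one collection per (project, layer) pair. The scheme below is a hypothetical illustration, not Empirica's actual convention:

```python
# Illustrative per-project collection naming (not Empirica's real scheme):
# each project gets one collection per memory layer, which keeps projects
# isolated while still allowing deliberate cross-project queries.

def collection_name(project_id: str, layer: str) -> str:
    if layer not in {"eidetic", "episodic"}:
        raise ValueError(f"unknown memory layer: {layer!r}")
    return f"{project_id}_{layer}"
```

Cross-project pattern discovery then just means querying several collections by name rather than one shared, entangled store.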
## The retrieval side: Noetic RAG
I've been calling this approach "Noetic RAG" — retrieval-augmented generation on the thinking, not just the artifacts. When an AI agent starts a new session, we don't just load documents. We load:
- Dead-ends that match the current task (so it doesn't repeat failed approaches)
- Mistake patterns with prevention strategies
- Decisions and their rationale (so it understands why things are the way they are)
- Episodic arcs from similar sessions (temporal context)
- Cross-project patterns (if the same anti-pattern appeared in project A, surface it in project B)
The similarity search here isn't just cosine distance on the task description — it's filtered by recency, weighted by confidence, and scoped by project (with optional global reach for cross-project learnings).
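Here's a minimal sketch of what that combined ranking could look like. The multiplicative combination, the exponential recency term, and the 14-day half-life are all made-up knobs for illustration, not Empirica's actual weighting:

```python
import time

# Illustrative ranking: combine raw vector similarity with confidence
# and recency, as described above. Weights and half-life are assumptions.

def rank_score(similarity: float, confidence: float,
               created_at: float, now: float,
               half_life_days: float = 14.0) -> float:
    age_days = (now - created_at) / 86_400
    recency = 0.5 ** (age_days / half_life_days)  # exponential decay with age
    return similarity * confidence * recency

# At equal similarity, a high-confidence artifact from yesterday outranks
# a medium-confidence one from last month:
now = time.time()
yesterday = rank_score(0.8, 0.9, now - 1 * 86_400, now)
last_month = rank_score(0.8, 0.5, now - 30 * 86_400, now)
```

Project scoping would sit in front of this as a filter (which collections to search), so the score only has to arbitrate among already-eligible artifacts.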
## What this looks like in practice
```shell
# Focused search: eidetic facts + episodic session arcs
empirica project-search --project-id <ID> --task "auth token rotation"

# Full search: all collections
empirica project-search --project-id <ID> --task "auth token rotation" --type all

# Include cross-project patterns
empirica project-search --project-id <ID> --task "auth token rotation" --global
```
When context compacts (and it will — Claude Code's 200k window fills up fast), the bootstrap reloads ~800 tokens of epistemically ranked context instead of trying to reconstruct everything from scratch. Findings, unknowns, active goals, architectural decisions — weighted by confidence and recency.
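A simple way to picture the bootstrap is greedy packing: take the highest-ranked artifacts that still fit the token budget. The function below is my own sketch of that idea; the field names, scores, and token counts are hypothetical inputs, assumed to come from the ranking step:

```python
# Hypothetical sketch of the compaction bootstrap: greedily pack the
# highest-scored artifacts into a fixed token budget (~800 tokens in
# the setup described above).

def bootstrap_context(artifacts: list[dict], budget: int = 800) -> list[dict]:
    """artifacts: [{'text': ..., 'score': ..., 'tokens': ...}, ...]"""
    selected, used = [], 0
    for art in sorted(artifacts, key=lambda a: a["score"], reverse=True):
        if used + art["tokens"] <= budget:
            selected.append(art)
            used += art["tokens"]
    return selected
```

Greedy packing isn't optimal in general, but for a small budget of short, pre-scored artifacts it's cheap and predictable, which matters when the reload happens mid-session.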
## The temporal dimension
This is the part that makes Qdrant particularly well-suited. We store timestamps and decay parameters as payload fields, and filter on them at query time. A dead-end from yesterday with high confidence outranks a finding from last month with medium confidence. But a pattern that's been confirmed three times across two projects? That climbs in relevance regardless of age.
The decay isn't a fixed curve — it's modulated by reinforcement. Every time a pattern re-emerges, its effective age resets. Qdrant's payload filtering makes this efficient: we can do the temporal math at query time without re-embedding anything.
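The reinforcement-modulated decay described above can be sketched as follows. The half-life and the per-confirmation boost are illustrative parameters, not Empirica's real values:

```python
import time

# Sketch of reinforcement-modulated decay: each confirmation resets the
# pattern's effective age (decay restarts from the latest reinforcement)
# and also boosts its base weight. Parameter values are assumptions.

def effective_weight(last_reinforced_at: float, reinforcements: int,
                     now: float, half_life_days: float = 14.0) -> float:
    age_days = (now - last_reinforced_at) / 86_400
    decay = 0.5 ** (age_days / half_life_days)   # age since last confirmation
    boost = 1.0 + 0.25 * reinforcements          # each confirmation strengthens
    return decay * boost
```

Because both inputs live as payload fields on the stored points, this math runs entirely at query time: no re-embedding, just arithmetic over timestamps and counters.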
## Why this matters beyond the obvious
The real value isn't just "AI remembers things" — it's that the memory is epistemically grounded. Every artifact has uncertainty quantification. Every session has calibration data (how accurate the AI's self-assessment was compared to objective evidence such as test results and code-quality metrics). The memory doesn't just tell you what happened — it tells you how much to trust what happened.
After 5,600+ measured transactions, the calibration data shows AI agents consistently overestimate their own confidence by 20-40%. Having memory that carries that calibration forward means the system gets more honest over time, not just more knowledgeable.
## Try it
Empirica is MIT licensed and open source. If you're building anything where AI agents need to remember across sessions — especially if temporal awareness matters — the prosodic/episodic/eidetic architecture might be worth looking at.
- GitHub: github.com/Nubaeon/empirica
- Website: getempirica.com
Install:

```shell
pip install empirica
```
Happy to answer questions about the Qdrant integration or the broader noetic RAG architecture.