Harshit Kumar
Memory Is Not a Vector Database: Why AI Agents Need Beliefs, Not Storage

Why storage is not the same as remembering

If you've built an AI agent that works with users over multiple sessions, you've probably hit this wall: the agent keeps forgetting things it should know.

You store user preferences. The agent ignores them. You correct it. It makes the same mistake tomorrow. You add more context to the prompt. It works for a while, then breaks again.

So you reach for the obvious solution: a vector database. Store everything, retrieve what's relevant, inject it into the prompt. Problem solved, right?

Not quite.

I've been building agents for a while now, and I keep seeing the same pattern. Vector retrieval gets you 70% of the way there. The last 30% is where things fall apart—and it's the part that actually matters for user experience.


The Pattern That Keeps Breaking

Here's a scenario I've seen repeatedly:

A user tells your agent: "I prefer dark mode." You store it. Great.

A week later, the user says: "Actually, I've switched to light mode—easier on my eyes during the day."

What happens? Your vector store now has two contradictory statements. When the agent retrieves "user display preferences," it might get either one. Or both. It has no way to know which is current, which is outdated, or how confident it should be in either.

The agent ends up flip-flopping, or worse, confidently asserting the wrong preference.

This isn't a retrieval problem. It's a representation problem. We're treating memory as storage when it should be treated as belief.

Here's another failure mode:

A support agent's conversations show that asking clarifying questions before proposing a fix reduces churn. You spot this pattern by reading transcripts. The agent never does.

You can store past conversations. You can retrieve similar ones. But nothing in a vector store turns "this approach worked" into "do this more often."

That's not memory. That's logging.

I call this the storage fallacy: assuming persistence automatically produces understanding.


A List of Embeddings Is Not Memory

Vector databases are excellent at one thing: finding semantically similar content. But similarity isn't the same as relevance, and retrieval isn't the same as remembering.

Human memory doesn't work like a filing cabinet where you pull out documents. It's an active system that:

  • Reinforces things you encounter repeatedly
  • Forgets things you don't use
  • Updates when new information contradicts old information
  • Weighs memories by confidence, not just similarity

When you tell someone the same thing three times, they become more confident it's true. When you contradict yourself, they become less certain about both statements. When you don't mention something for months, it fades.

None of this happens in a vector store. Every embedding sits there with equal weight, forever, until you manually delete it.


Memory as Belief

The shift that changed how I think about this: stop treating memories as facts and start treating them as beliefs.

A belief has properties that a stored fact doesn't:

Confidence. How certain are we that this is true? A preference mentioned once in passing is different from one stated emphatically three times.

Reinforcement. When we encounter similar information again, confidence should increase. "User likes dark mode" and "User prefers dark themes" shouldn't create two entries—they should strengthen one belief.

Decay. Beliefs that aren't accessed or reinforced should fade over time. Not deleted, but deprioritized. The user's preference from two years ago probably matters less than what they said last week.

Contradiction handling. When new information conflicts with existing beliefs, both should be affected. The old belief loses confidence. The new one starts with moderate confidence. The system acknowledges uncertainty rather than pretending it doesn't exist.

Beliefs are functions of time, not static rows.

Here's what this looks like in practice:

"User prefers dark mode" → stored with confidence 0.6

User mentions dark mode again → confidence rises to 0.75

User later says "I prefer light mode" →
  - "dark mode" drops to 0.45
  - "light mode" created at 0.65

Now the agent doesn't see "two facts." It sees uncertainty. It can say "I think you prefer light mode now, though you used to prefer dark" instead of confidently asserting the wrong thing.

Concretely, a belief looks more like:

{
  "content": "User prefers dark mode",
  "confidence": 0.45,
  "last_verified_at": "2024-01-12",
  "reinforcement_count": 3,
  "source": "user_statement"
}

Not a document. Not an embedding. A stateful object.
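
To make that concrete, here's a minimal Python sketch of how such a belief object might update itself. The update factors (0.375 for reinforcement, 0.6 for contradiction) and the 90-day half-life are illustrative assumptions chosen to match the walkthrough above, not a prescribed implementation:

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Belief:
    content: str
    confidence: float = 0.6                      # a single, first mention
    reinforcement_count: int = 1
    last_verified_at: datetime = field(default_factory=datetime.now)

    def reinforce(self) -> None:
        # Similar information seen again strengthens the belief,
        # with diminishing returns as confidence approaches 1.0.
        self.confidence += (1.0 - self.confidence) * 0.375
        self.reinforcement_count += 1
        self.last_verified_at = datetime.now()

    def contradict(self) -> None:
        # Conflicting information weakens the belief instead of deleting it.
        self.confidence *= 0.6

    def current_confidence(self, half_life_days: float = 90.0) -> float:
        # Unreinforced beliefs fade: confidence halves every half_life_days.
        age_days = (datetime.now() - self.last_verified_at).days
        return self.confidence * 0.5 ** (age_days / half_life_days)

pref = Belief("User prefers dark mode")   # confidence 0.6
pref.reinforce()                          # mentioned again -> 0.75
pref.contradict()                         # "I prefer light mode" -> 0.45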


Memory Isn't One Thing

The other realization: I've been lumping together very different types of memory under one label.

Cognitive science has known for decades that human memory isn't monolithic. There are distinct systems serving different purposes. For agents, four types matter most:

Semantic memory is factual knowledge. "The user is a backend engineer." "They work at a fintech company." "They prefer Python for scripting." These are beliefs about the world and the user that persist across sessions.

Episodic memory is experiential. Not just what happened, but the context around it—when, where, what was the emotional tone, what was the outcome. "Last Tuesday, the user was frustrated about a deployment failure. We helped them set up monitoring. They were satisfied." This is richer than extracted facts.

Working memory is the active scratchpad. What's the current goal? What context is relevant right now? This is session-scoped and limited in capacity—you can't hold everything active at once.

Procedural memory is learned skills. Not facts, but patterns of successful action. "When a user says they want to cancel, offering a discount before processing usually leads to retention." This is how agents get better at their jobs over time.

This isn't academic purity. Each type maps cleanly to an engineering responsibility:

  • Semantic → long-term knowledge store
  • Episodic → append-only experience log
  • Working → active context assembler
  • Procedural → policy selection system

Most "memory" implementations I've seen treat everything as semantic memory. They miss the temporal richness of episodes, the active focus of working memory, and the skill accumulation of procedural memory.


Bigger Models Don't Fix Memory

Larger context windows help agents remember more, but they don't help agents remember better.

Without reinforcement, decay, and contradiction handling, bigger context just means more clutter. You can fit 100k tokens in a prompt. You still can't represent "I used to believe X but now I believe Y with moderate confidence."

Many agent failures that look like reasoning problems are actually memory problems. The model reasons fine—it's just reasoning over the wrong context because retrieval gave it stale or contradictory information with no signal about which to trust.


What We're Building

This is what led me to build Engram, a cognitive memory layer for AI agents.

The core idea: memory should behave like memory, not like storage.

Think of it as a cognitive operating layer that sits between your agent and its storage. The current focus is correctness and cognitive behavior, not feature breadth.

When you store a belief, it has confidence. When you encounter it again, it reinforces. When you contradict it, both beliefs adjust. When you don't access it, it decays. When you retrieve, you get not just similarity but a weighted score that accounts for confidence, recency, and relevance.
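
As a rough illustration of that last point, a blended retrieval score could combine the three signals like this. The weights, the 30-day half-life, and the function name are assumptions for the sketch, not Engram's actual scoring formula:

from datetime import datetime

def retrieval_score(similarity: float, confidence: float,
                    last_verified_at: datetime,
                    half_life_days: float = 30.0,
                    weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    # Blend semantic relevance, belief confidence, and recency into one score.
    age_days = (datetime.now() - last_verified_at).days
    recency = 0.5 ** (age_days / half_life_days)   # halves every half_life_days
    w_sim, w_conf, w_rec = weights
    return w_sim * similarity + w_conf * confidence + w_rec * recency

Candidates from the vector index can then be ranked by this blended score instead of raw cosine similarity.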

You can store episodes with full context (entities, emotional valence, outcomes) and have the system extract semantic beliefs automatically. You can record what worked and what didn't, and have procedural patterns emerge that make your agent better over time.

We consistently see agents stop repeating the same errors after a few dozen interactions. That's not because we tuned anything; it's because the memory system does what memory should do.

It's an HTTP API. You plug it into whatever agent framework you're using. It handles the cognitive complexity so your agent code stays clean.


What This Isn't

A few things Engram explicitly doesn't do, because scope matters:

Not an agent framework. We're not competing with LangChain or CrewAI. We're infrastructure they can use.

Not a vector database. We use vectors, but that's an implementation detail. The interface is cognitive, not geometric.

Not long-term context stuffing. We don't build giant prompts. We build a system that knows what matters.

The goal is to be the memory layer—one thing, done well.


Who This Is For

If you're building agents that interact with users over time, and you've been frustrated by:

  • Context limits forcing you to drop important information
  • Agents repeating the same mistakes
  • Preferences that don't stick
  • No sense of learning or improvement

Then this might be useful.

It's early. The API is stabilizing. We're looking for people who want to build with it and give feedback.


What's Next

We're publishing examples and benchmarks showing the learning dynamics in action. There's a demo that shows error rates dropping as an agent accumulates procedural memory, not because we tweaked numbers, but because the memory system is doing what memory should do.

Here's the repo: github.com/Harshitk-cp/engram

If this resonates with how you've been thinking about agent memory, I'd love to hear from you. If you think I'm wrong about something, I'd love to hear that too.

Building in public means being wrong in public. That's the tradeoff for building things that actually work :)

