Arindam Majumder

Posted on May 28

Building AI Agents That Actually Remember: Memory Systems Explained

#agents #memory #ai #rag

You built an AI agent. It calls tools, plans multi-step workflows, and the first time you run it, the demo feels magical. Then you run it again the next day, and it greets you like a stranger. Same clarifying questions. Same mistakes. Same steps reconstructed from scratch.

That is not a tooling problem. That is a memory problem.

Most agents shipping today are stateless. They execute inside a loop, but they do not accumulate knowledge across runs. Every session starts at zero, which means every improvement they appeared to make last time is gone. If agents are going to automate real work, they have to get better the longer you use them. That only happens with memory: not just stored data, but structured, evolving memory that compresses experience into knowledge.

In this article, I will break down what memory really means inside an agent, where the standard implementations fall short, how to design memory as a first-class layer in your agent loop, and finally how a system like Engram handles this in practice. By the end, you should have a clear mental model for adding real long-term memory to whatever you are building.


What "Memory" Actually Means in an Agent

When people say "memory" in AI, they usually mean storing chat history and pulling it back later. That is not memory. That is replay.

An agent already has context during execution. The system prompt, the tools, the intermediate tool calls, the partial outputs, all of it sits inside the model's context window for the duration of a single run. The moment the run ends, that context evaporates.

The common patch is to dump everything into a vector database, then retrieve the chunks that look semantically similar to the next prompt. This keeps the agent informed, but it does not make it smarter. It is the equivalent of handing someone a transcript of a meeting they never attended and hoping they get up to speed.

Real memory does something more interesting. It compresses raw experience into reusable knowledge. Humans do not remember entire conversations word for word. We remember conclusions, preferences, and patterns. Agents need the same transformation layer between raw execution and stored memory, or they will keep drowning in their own logs.

How Most Agents Are Built Today

The default agent loop looks roughly like this:

The user gives the agent a task.
The agent plans a sequence of steps.
It calls tools, observes results, and produces an output.
The whole transcript is dumped into a database, often a vector store.
On the next run, similar chunks are retrieved and injected back into the prompt.

At a glance, this feels like memory. In practice it builds a system that can recall but cannot learn.

The agent never refines what it knows. It never resolves contradictions. It never updates outdated information. It just accumulates. Over time, the store fills with multiple versions of the same idea, conflicting preferences, and abandoned half-thoughts. Retrieval gets noisier, irrelevant context starts crowding the prompt, token usage climbs, and accuracy drops.
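To make the failure mode concrete, here is a toy append-only store. It is not any particular vector database: word overlap stands in for embedding similarity, and `AppendOnlyStore` is a hypothetical name. Because nothing is reconciled, a later correction sits right next to the preference it replaced, and both come back together with unrelated noise.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyStore:
    """Toy 'dump everything' store: every write is appended, nothing is reconciled."""
    chunks: list[str] = field(default_factory=list)

    def store(self, text: str) -> None:
        self.chunks.append(text)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Word overlap stands in for embedding similarity (an assumption for illustration).
        q = set(query.lower().split())
        scored = [(len(q & set(c.lower().split())), i, c) for i, c in enumerate(self.chunks)]
        scored.sort(key=lambda t: (-t[0], t[1]))
        return [c for score, _, c in scored[:k] if score > 0]


store = AppendOnlyStore()
store.store("user prefers mysql for the database")
store.store("user prefers postgres for the database")  # a later correction, appended rather than applied
store.store("ran the test suite twice")
print(store.retrieve("which database does the user prefer"))
```

The query returns both contradictory preferences plus the unrelated test-suite line, which is exactly the noise that ends up crowding the prompt.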

This is the reason so many agents feel stuck at the same level no matter how many sessions you put through them.

The Missing Piece: A Learning Loop

The gap is a learning step between execution and storage.

Most loops look like this:

input → plan → execute → store → end

A real long-term-memory loop should look like this:

input → plan → execute → learn → update memory → next run

That learning step is where transformation happens. Instead of saving raw logs, the system pulls out structured insights and writes those. If a user says they prefer Postgres over MySQL, that is a stable preference, not a line buried in a chat log. If an agent tried three approaches and only one worked, the successful path is a reusable strategy, not noise mixed in with two failures.
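A minimal sketch of that learning step, assuming a regex can stand in for the LLM-based extraction a real system would use. `learn`, `Attempt`, and `Insights` are hypothetical names for illustration; what matters is the shape of the output: structured preferences and successful strategies rather than raw turns.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule-based extractor standing in for LLM-driven extraction.
# It only recognizes statements of the form "prefer X over Y".
PREFERENCE = re.compile(r"\bprefers?\s+(\w+)\s+over\s+(\w+)", re.IGNORECASE)


@dataclass
class Attempt:
    steps: list[str]
    succeeded: bool


@dataclass
class Insights:
    preferences: list[tuple[str, str]] = field(default_factory=list)  # (preferred, rejected)
    strategies: list[list[str]] = field(default_factory=list)         # step sequences that worked


def learn(turns: list[str], attempts: list[Attempt]) -> Insights:
    """The 'learn' step: distill a run into insights instead of saving the raw log."""
    insights = Insights()
    for turn in turns:
        insights.preferences.extend(PREFERENCE.findall(turn))
    # This sketch keeps only approaches that succeeded as reusable strategies.
    insights.strategies = [a.steps for a in attempts if a.succeeded]
    return insights


run = learn(
    turns=["Set up the service.", "I prefer Postgres over MySQL.", "Thanks!"],
    attempts=[
        Attempt(["docker compose up"], succeeded=False),
        Attempt(["pnpm install", "pnpm run migrate"], succeeded=True),
        Attempt(["npm install"], succeeded=False),
    ],
)
print(run.preferences)  # [('Postgres', 'MySQL')]
print(run.strategies)   # [['pnpm install', 'pnpm run migrate']]
```

The "update memory" step would then write these insights, not the three turns and three attempts they came from.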

Without this step, the agent keeps rediscovering the same things on every run, and you pay for it in tokens, latency, and user trust.

The Three Types of Memory Inside an Agent

It helps to split memory into layers, the same way cognitive science does. Most current systems collapse all three into one bucket, which is exactly why they get confused.

Episodic memory: events. Tool calls, inputs, outputs, failures, timestamps. Useful for traceability and debugging, but rarely the right thing to inject back into a prompt.
Semantic memory: distilled knowledge. Preferences, facts, constraints, decisions. This is what the agent should actually carry across sessions.
Procedural memory: how to do things. Sequences of steps that worked before. This is where agents start becoming efficient, because they can reuse solutions instead of recomputing them.

A good memory system treats these differently. Episodes get logged and mostly left alone. Semantic facts get reconciled and updated in place. Procedures get versioned, scored, and promoted when they keep working.
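One way to encode those three policies in a single structure, as a sketch: episodes are appended, facts are overwritten by key, and procedures carry a version and a success rate that decides promotion. The class names and the promotion thresholds (`promote_at=0.8`, `min_runs=3`) are assumptions, not values from any library.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Procedure:
    steps: list[str]
    version: int = 1
    runs: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0


@dataclass
class AgentMemory:
    episodes: list[dict] = field(default_factory=list)              # episodic: append-only log
    facts: dict[str, str] = field(default_factory=dict)             # semantic: updated in place
    procedures: dict[str, Procedure] = field(default_factory=dict)  # procedural: versioned and scored
    promoted: set[str] = field(default_factory=set)                 # procedures trusted for reuse

    def log_episode(self, event: str) -> None:
        self.episodes.append({"event": event, "ts": time.time()})

    def set_fact(self, key: str, value: str) -> None:
        self.facts[key] = value  # the newer value replaces the old one instead of piling up

    def record_procedure(self, name: str, steps: list[str], succeeded: bool,
                         promote_at: float = 0.8, min_runs: int = 3) -> None:
        proc = self.procedures.get(name)
        if proc is None or proc.steps != steps:
            # A changed step sequence starts a new version with fresh statistics.
            proc = Procedure(steps=steps, version=proc.version + 1 if proc else 1)
            self.procedures[name] = proc
        proc.runs += 1
        proc.successes += int(succeeded)
        # Promote once the procedure has a track record; demote if it stops working.
        if proc.runs >= min_runs and proc.success_rate >= promote_at:
            self.promoted.add(name)
        else:
            self.promoted.discard(name)


mem = AgentMemory()
mem.log_episode("tool call: create_table(users)")
mem.set_fact("package_manager", "pnpm")
mem.set_fact("package_manager", "bun")
for _ in range(3):
    mem.record_procedure("add_table", ["write migration", "run migration", "regenerate types"], succeeded=True)
print(mem.facts)     # {'package_manager': 'bun'}
print(mem.promoted)  # {'add_table'}
```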

What a Real Memory System Needs

A working memory layer for agents needs three core operations:

Extraction: take raw input and decide what is worth remembering. Most of what an agent sees is noise. Extraction is the filter that separates a durable fact ("user is on macOS Sonoma, prefers pnpm") from disposable chatter.
Reconciliation: compare new information with what already exists. Update when the facts have changed. Resolve when they conflict. Merge when they are redundant. This is the step that keeps memory clean instead of letting it sprawl.
Retrieval: when the agent runs again, hand it the relevant pieces, not the whole archive. The goal is precision, not volume. A 200-token answer with the right three facts beats a 4,000-token dump every time.

This pipeline is what turns memory into something the agent can rely on instead of something it has to sift through every turn.
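The reconciliation and retrieval halves of that pipeline fit in a few lines, assuming extraction has already produced `Fact` objects. Two further assumptions are worth stating: conflicts resolve as "newest statement wins", and a word count stands in for a real tokenizer when enforcing the retrieval budget. Both are hypothetical policies chosen for clarity, not how any specific library behaves.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    subject: str
    value: str
    importance: int


def reconcile(store: dict[str, Fact], new: Fact) -> str:
    """Compare a candidate fact against the store and return the action taken."""
    old = store.get(new.subject)
    if old is None:
        store[new.subject] = new
        return "add"
    if old.value == new.value:
        # Redundant: merge into the existing fact instead of storing a duplicate.
        old.importance = max(old.importance, new.importance)
        return "merge"
    # Conflict: this sketch assumes the newest statement is authoritative.
    store[new.subject] = new
    return "update"


def retrieve(store: dict[str, Fact], query: str, budget_tokens: int) -> list[str]:
    """Return relevant facts, most relevant first, that fit a rough token budget."""
    query_words = set(query.lower().split())

    def relevance(fact: Fact) -> tuple[int, int]:
        fact_words = set(f"{fact.subject} {fact.value}".replace("_", " ").lower().split())
        return len(query_words & fact_words), fact.importance

    picked: list[str] = []
    used = 0
    for fact in sorted(store.values(), key=relevance, reverse=True):
        line = f"{fact.subject}: {fact.value}"
        cost = len(line.split())  # word count as a crude stand-in for a tokenizer
        if relevance(fact)[0] == 0 or used + cost > budget_tokens:
            continue  # irrelevant, or would exceed the budget
        picked.append(line)
        used += cost
    return picked


facts: dict[str, Fact] = {}
for candidate in [Fact("database", "postgres", 9), Fact("database", "postgres", 7),
                  Fact("package_manager", "pnpm", 6), Fact("package_manager", "bun", 8)]:
    print(candidate.subject, "->", reconcile(facts, candidate))
print(retrieve(facts, "which package manager should I use", budget_tokens=10))  # ['package_manager: bun']
```

The store ends up with two facts instead of four writes, and retrieval hands back one line rather than everything it holds.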

How This Changes Agent Behavior

Once a memory layer like this is in place, the agent's behavior changes in ways the user actually notices:

It stops asking the same clarifying questions because it already has the answers.
It avoids paths it has already tried and failed on.
It adapts to the user because preferences are stored and continuously updated.
It gets faster because it reuses successful strategies instead of recomputing them.

At this point the agent stops being a pure execution engine and starts being something closer to an accumulating knowledge system. That is the line between "impressive demo" and "tool I actually use every day."

Engram: A Memory System for Agents

This brings us to Engram, an open-source memory layer designed to sit alongside your agent rather than inside it. The project bills itself as "persistent cognitive memory for AI agents" and is built around exactly the extract → reconcile → retrieve pipeline above.

Here is what makes Engram worth a look:

Memory pipeline, not a bucket: incoming data flows through an extraction step that classifies items as facts, preferences, events, or decisions and assigns an importance score.
Reconciliation built in: new memories are checked against existing ones. Duplicates are merged, conflicts are resolved, and stale entries get pruned instead of piling up.
Dream Cycle: a background consolidation job that runs on a schedule (the docs describe a nightly cadence). It refreshes scores, dedupes, extracts patterns across memories, and prunes things that have gone stale. The biological analogy is not a coincidence.
Ensemble retrieval: multiple embedding models are queried in parallel and combined with Reciprocal Rank Fusion. Recency is weighted so fresh context wins ties.
Memory pools: shared memory spaces with access control, so multiple agents can read and write into the same knowledge base without trampling each other.
Open and self-hostable: Apache 2.0 licensed, runs locally on Apple Silicon or CUDA, with an optional hybrid mode that mixes in cloud embeddings.
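Reciprocal Rank Fusion itself is simple enough to show. Each retriever contributes 1 / (k + rank) for every memory it returns, the contributions are summed, and k = 60 is the constant commonly used in the literature. This sketch reads "fresh context wins ties" as a timestamp tie-breaker; Engram's actual recency weighting may work differently, and the model rankings below are made-up data.

```python
def rrf_fuse(rankings: list[list[str]], timestamps: dict[str, float], k: int = 60) -> list[str]:
    """Fuse several ranked lists with Reciprocal Rank Fusion.

    Each memory scores sum(1 / (k + rank)) over the rankings it appears in (ranks start at 1).
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, memory_id in enumerate(ranking, start=1):
            scores[memory_id] = scores.get(memory_id, 0.0) + 1.0 / (k + rank)
    # Ties on the fused score go to the more recent memory.
    return sorted(scores, key=lambda m: (-scores[m], -timestamps.get(m, 0.0)))


# Rankings from two hypothetical embedding models for the same query.
model_a = ["m1", "m2", "m3"]
model_b = ["m2", "m1", "m4"]
timestamps = {"m1": 100.0, "m2": 200.0, "m3": 50.0, "m4": 300.0}
print(rrf_fuse([model_a, model_b], timestamps))  # ['m2', 'm1', 'm4', 'm3']
```

Because RRF only uses ranks, it sidesteps the problem that different embedding models produce similarity scores on incompatible scales.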

Engram exposes itself through Python and TypeScript SDKs, a REST API, and an MCP server, which means it slots into Claude Desktop, Cursor, Windsurf, and anything else that speaks the Model Context Protocol.

Installing Engram

The Python core is a single pip install:

pip install engram-core

Optional extras cover the server, the MCP integration, embedding backends, or the full bundle. For the TypeScript SDK, install the client package alongside whatever agent framework you are already using.

The self-hosted version of openengram.ai ships a setup wizard that walks you through account creation and model configuration, and all features unlock locally at no cost. If you want to start with the cloud-hosted control plane, you grab an API key (ek_...) and point the SDK at it.

A Minimal Code Walkthrough

Here is the smallest end-to-end example using the Python SDK:

from engram import Memory

mem = Memory()

mem.store("User prefers Python", type="preference", importance=8)
mem.store("Project uses Postgres, not MySQL", type="fact", importance=9)

results = mem.search("programming language")
context = mem.recall(limit=10)

store is the extraction entry point. You tag each memory with a type and an importance score so the consolidation pass can reason about it later. search runs full-text and semantic retrieval. recall pulls the top-N most relevant memories for prompt injection.

The TypeScript flavor is just as light:

import { Engram } from '@engram/client';

const engram = new Engram({ apiKey: 'ek_...' });

await engram.remember("User prefers dark mode");
const memories = await engram.recall("UI preferences");

Both SDKs share the same mental model: write through remember / store, read through recall / search, and let the consolidation pipeline keep the underlying store clean in the background.

You can also link memories explicitly to build a small knowledge graph the agent can walk:

bug_id = mem.store("Login fails on Safari", type="error_fix", importance=9)
fix_id = mem.store("Added WebKit prefix to CSS", type="error_fix")

mem.link(bug_id, fix_id, "caused_by")
graph = mem.graph(bug_id, max_depth=2)

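That link call connects otherwise isolated memories, here a bug and its fix, into a small graph the agent can walk. It is the first step from a flat collection of facts toward procedural memory the agent can actually follow.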
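Conceptually, a call like `graph(bug_id, max_depth=2)` is a bounded walk over linked memories. Here is a breadth-first version over a plain adjacency dict; the node IDs and relation names are hypothetical, and this is not Engram's implementation.

```python
from collections import deque


def walk(edges: dict[str, list[tuple[str, str]]], start: str, max_depth: int) -> dict[str, int]:
    """Breadth-first walk from `start`, returning each reachable node with its hop distance."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand past the depth limit
        for _relation, neighbor in edges.get(node, []):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return depth


edges = {
    "bug:safari-login": [("fixed_by", "fix:webkit-prefix")],
    "fix:webkit-prefix": [("touches", "file:login.css")],
    "file:login.css": [("owned_by", "team:frontend")],
}
print(walk(edges, "bug:safari-login", max_depth=2))
# {'bug:safari-login': 0, 'fix:webkit-prefix': 1, 'file:login.css': 2}
```

The depth limit matters in practice: it keeps the context the agent pulls in proportional to how closely related the memories are, rather than to the size of the whole graph.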

Project 1: Give a Developer Agent a Real Long-Term Memory

Let us put this together with a concrete scenario. You are building a developer agent that helps scaffold and maintain a project. Without memory, every new session restarts the same conversation about your stack.

Bootstrap a tiny project:

mkdir agent-with-memory && cd agent-with-memory
python -m venv .venv && source .venv/bin/activate
pip install engram-core openai

Wire memory into a basic loop:

from engram import Memory
from openai import OpenAI

mem = Memory()
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat(user_input: str) -> str:
    # Fetch memories relevant to this input, capped at roughly 500 tokens.
    context = mem.context(user_input, max_tokens=500)

    messages = [
        {"role": "system", "content": f"Known about the user:\n{context}"},
        {"role": "user", "content": user_input},
    ]
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
    ).choices[0].message.content or ""  # content can be None

    # Log both turns as low-importance events for Engram to consolidate later.
    mem.store(user_input, type="event", importance=4)
    mem.store(reply, type="event", importance=3)
    return reply

print(chat("Set up a new Node service. I prefer Postgres and pnpm."))
print(chat("Add a users table to the project."))

On the first call, the agent has nothing to recall, so it answers from scratch and writes two events to memory. On the second call, mem.context() should surface the relevant prior decisions (Postgres, pnpm), provided Engram's extraction step has pulled them out of the stored events, and the loop injects them into the system prompt. The agent no longer needs to ask "what package manager do you use," and you no longer have to repeat yourself.

If you then say "actually, switch this project to Bun," Engram's reconciliation step is meant to update the existing preference rather than append a contradicting one. That is the difference between a memory system and a log.

Project 2: Wire Engram Into Your Coding Agent Over MCP

The second project shows how to plug Engram into an existing coding agent without writing client code at all. Most modern coding agents (Claude Desktop, Cursor, Windsurf, the Antigravity CLI, and others) speak MCP, and Engram ships an MCP server out of the box.

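Install the MCP extra and, optionally, start the server by hand once to confirm it runs. Stdio clients such as Claude Desktop launch the server themselves from their config, so you do not need to leave this process running: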

pip install "engram-core[mcp]"
engram mcp serve

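Then add it to your agent's MCP config. For Claude Desktop, edit claude_desktop_config.json. If engram is installed inside a virtualenv, use the full path to the executable as "command", because the app does not inherit your activated shell environment: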

{
  "mcpServers": {
    "engram": {
      "command": "engram",
      "args": ["mcp", "serve"]
    }
  }
}
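If you would rather not hand-edit the file, a small standard-library helper can merge the entry in without clobbering servers that are already configured. `add_mcp_server` is a hypothetical helper, and it assumes the file is plain JSON with a top-level `mcpServers` object, as in the snippet above.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or replace one entry under "mcpServers", leaving other servers untouched."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On macOS the Claude Desktop config typically lives at `~/Library/Application Support/Claude/claude_desktop_config.json`. Passing `shutil.which("engram")` as `command` writes an absolute path, which avoids the virtualenv problem described above.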

Restart the agent. From inside any session you can now ask things like:

Remember that this project uses Drizzle ORM, not Prisma.

What have I told you about my testing setup?

The agent calls Engram's MCP tools to write and read memories, and the consolidation loop keeps the store tidy in the background. Add the same server entry to every MCP-aware agent you use and they all read and write one shared memory store, which is the part most people underestimate the first time they try it.

How to Think About Memory While Building Agents

When you sit down to design an agent, the first question is usually "which model?" or "which tools?" That is the wrong starting point. The first question should be:

What does this agent need to remember, and how should that memory evolve?

Decide what qualifies as durable knowledge for your domain. Decide what should never be stored (raw PII, transient state, things that age out fast). Decide how updates and conflicts are resolved. Then design your loop so that every execution contributes back to that memory, not just consumes from it.
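Deciding what should never be stored works best as an explicit gate in front of every write. Here is a sketch with hypothetical rules: a few regexes for obvious PII and secret-looking tokens, plus phrases that mark transient state. Real policies need tuning for your domain, and regexes like these will miss plenty.

```python
import re

# Hypothetical "never store" rules; tune these for your own domain.
BLOCKLIST = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone number": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "secret-looking token": re.compile(r"\b(?:sk|ghp|ek)_[A-Za-z0-9]{8,}\b"),
}
TRANSIENT_MARKERS = ("right now", "this session", "temporarily")


def should_store(candidate: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a candidate memory before it is written."""
    for reason, pattern in BLOCKLIST.items():
        if pattern.search(candidate):
            return False, reason
    lowered = candidate.lower()
    if any(marker in lowered for marker in TRANSIENT_MARKERS):
        return False, "transient state"
    return True, "ok"


for text in ["User prefers Postgres over MySQL",
             "Contact me at dana@example.com",
             "The dev server is on port 3000 right now"]:
    print(text, "->", should_store(text))
```

Returning a reason alongside the decision makes it easy to log rejected writes and spot-check whether the gate is too strict or too loose.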

A few practical heuristics:

Store conclusions, not transcripts. A summarized decision is worth ten chat logs.
Score importance at write time. Future-you needs a signal for what to keep when the store gets crowded.
Reconcile, do not append. When the same fact shows up twice, the system should update, not duplicate.
Retrieve narrowly. A small, precise context wins over a giant relevant-ish blob.
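Importance scored at write time pays off when the store has to shrink. Here is a sketch of one possible eviction rule, assuming exponential decay with a 30-day half-life (an arbitrary choice): each item keeps `importance * 0.5 ** (age_days / half_life)`, and only the top `capacity` items survive.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    importance: float  # scored at write time, e.g. on a 1-10 scale
    age_days: float


def retention_score(item: Item, half_life_days: float = 30.0) -> float:
    # Exponential decay: the item's weight halves every `half_life_days`.
    return item.importance * 0.5 ** (item.age_days / half_life_days)


def evict(items: list[Item], capacity: int, half_life_days: float = 30.0) -> list[Item]:
    """Keep the `capacity` items with the highest decayed importance."""
    ranked = sorted(items, key=lambda it: retention_score(it, half_life_days), reverse=True)
    return ranked[:capacity]


items = [
    Item("Project uses Postgres, not MySQL", importance=9, age_days=60),
    Item("User said thanks", importance=1, age_days=0),
    Item("User prefers pnpm", importance=8, age_days=10),
    Item("docker compose approach failed", importance=4, age_days=2),
]
print([it.text for it in evict(items, capacity=3)])
# ['User prefers pnpm', 'docker compose approach failed', 'Project uses Postgres, not MySQL']
```

The two-month-old but important Postgres fact outlives fresh, trivial chatter. A real system might also reset an item's age whenever it is retrieved, so facts that are still in use do not fade.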

That is what turns a collection of tools into a system that actually improves over time.

Honest Assessment: Strengths and Limitations

Engram is one of the most thoughtful entries in the agent-memory space right now, but it is worth going in with calibrated expectations.

Where it shines:

The extract → reconcile → retrieve pipeline matches the way memory should work, not just the way it is easiest to ship.
The Dream Cycle is the right idea. Background consolidation is what keeps memory stores from rotting over weeks of use.
Ensemble retrieval with RRF is a meaningful step up from single-model vector search. Fusing several models' rankings should keep recall higher when the query phrasing drifts from how the memory was originally written.
MCP support means it works with the agent you are already using today, not just a bespoke SDK.
Apache 2.0 and self-hostable. Your memories live where you want them to live.

Where to be careful:

Extraction is LLM-driven, which means it can occasionally classify the wrong thing as a durable fact. Importance scoring helps, but you should still spot-check what is being written.
"Memory" is only as good as your write discipline. If you treat it as a dump-everything store, you will recreate the noisy-vector-DB problem inside a nicer wrapper.
The ecosystem around agent memory is moving fast, and Engram is one of several real implementations (some Go, some Rust, some Python). Pick the one whose architecture and license match your deployment, and expect interfaces to keep evolving for the next year.
Hybrid cloud mode is convenient but read the data-handling policy if you are working with sensitive content. Self-hosting is the safer default for regulated workloads.

What to Learn Next

Once you have a memory layer wired in, there are a few directions worth exploring:

Memory pools: share a single store across multiple agents (a researcher, a writer, a reviewer) and watch them coordinate without you writing any glue.
Procedural memory: start storing successful tool-call sequences as reusable strategies, not just facts.
Eval your memory: write tests that check whether your agent remembers the right things across sessions. Memory regressions are real and they are easy to miss without a harness.
MCP everywhere: once memory is exposed as MCP, every agent you use can read and write into the same brain. That is when things actually get interesting.
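A memory eval can be as small as this: write in one session, change something in a second, and assert on what a third, fresh session sees. `FileMemory` below is a hypothetical stand-in that persists to a JSON file. In practice you would swap in your real memory client and keep the same assertions.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class FileMemory:
    """Hypothetical stand-in for a real memory client: facts persisted to a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.facts: dict[str, str] = json.loads(path.read_text()) if path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value
        self.path.write_text(json.dumps(self.facts))

    def recall(self, key: str) -> Optional[str]:
        return self.facts.get(key)


def eval_cross_session(path: Path) -> list[str]:
    """Return a list of failed checks; an empty list means memory behaved across sessions."""
    first = FileMemory(path)
    first.remember("package_manager", "pnpm")
    first.remember("database", "postgres")

    second = FileMemory(path)  # a fresh instance simulates the next run
    second.remember("package_manager", "bun")  # the user changed their mind

    third = FileMemory(path)
    expected = {"package_manager": "bun", "database": "postgres"}
    return [
        f"{key}: expected {want!r}, got {third.recall(key)!r}"
        for key, want in expected.items()
        if third.recall(key) != want
    ]


with tempfile.TemporaryDirectory() as tmp:
    print(eval_cross_session(Path(tmp) / "memory.json"))  # []
```

Returning the list of failures instead of a bare boolean makes regressions readable in CI output.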

The Engram documentation at engram.to and openengram.ai is the best place to go deeper, and the GitHub organizations behind the various implementations are active and worth tracking.

Final Thoughts

Most agents today feel powerful in the moment and forget everything afterward. That is the ceiling on what they can become.

Memory raises that ceiling. It lets agents accumulate knowledge, refine behavior, and adapt with use instead of resetting every session. Engram is one good implementation of this idea, but the larger shift is architectural. Agents are not just reasoning systems anymore. They are learning systems, and memory is the layer that makes the learning stick.

If you are building agents right now, focus less on making them smarter inside a single run. Focus on making them better across many runs.

That is where the real leverage is. Give it a try in your next project and see how quickly the dynamic changes.

DEV Community