DEV Community

ClawBase


I Tested 33 AI Memory Engines — Here's What Actually Works

Six months ago, I asked my AI agent what we'd been working on the week before. It had no idea. The problem wasn't a lack of memory (ChatGPT has memory, Claude has memory); it was that I couldn't see what it stored, couldn't query it, and couldn't tell it what to forget. It was a black box with a toggle that says "memory: on."

So I started testing every memory framework I could find — 33 engines total, running on OpenClaw (350K+ GitHub stars). Most solved one problem well and failed at everything else.

After 6 months, I landed on an architecture that actually works. It's not about one magic engine — it's about layers.

The memory stack your agent actually needs

Before diving into the 33 engines, here's what I learned: agent memory isn't one thing. It's a stack, much as a human brain has short-term memory, long-term memory, and the ability to look things up.

A working agent memory stack has 3 layers:

Layer 1: Conversation compression — remembering what just happened

Every conversation eventually hits the context window limit. Without this layer, your agent literally forgets the beginning of your current conversation. A conversation compressor (like Lossless-Claw) keeps a DAG (directed acyclic graph) of summaries, compacting older turns into condensed summaries while keeping the most recent turns untouched. Your agent never loses mid-session context.
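Here is a minimal sketch of how such a summary DAG can work. It is not Lossless-Claw's implementation: `summarize()` is a hypothetical stand-in for an LLM summarization call (it just truncates and joins text), and `keep_recent` and `fanout` are illustrative parameters. Old turns fold into summary nodes, full groups of summaries fold into higher-level summaries, and the newest turns stay verbatim.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A raw turn (level 0) or a summary of lower-level nodes (level >= 1)."""
    text: str
    level: int = 0
    children: list["Node"] = field(default_factory=list)  # edges of the DAG


def summarize(nodes: list[Node]) -> str:
    # Hypothetical stand-in for an LLM call: keeps the first 20 characters
    # of each child. A real compressor would ask a model for a summary.
    return "summary(" + "; ".join(n.text[:20] for n in nodes) + ")"


class ConversationCompressor:
    def __init__(self, keep_recent: int = 4, fanout: int = 3) -> None:
        self.keep_recent = keep_recent        # newest turns kept verbatim
        self.fanout = fanout                  # nodes folded into one summary
        self.recent: list[Node] = []          # raw turns, oldest first
        self.pending: list[list[Node]] = []   # pending[i]: level-i nodes not yet folded

    def add_turn(self, text: str) -> None:
        self.recent.append(Node(text))
        if len(self.recent) > self.keep_recent:
            # The oldest verbatim turn leaves the window and enters the DAG.
            self._push(self.recent.pop(0), level=0)

    def _push(self, node: Node, level: int) -> None:
        # Cascade like a counter: when a level fills up, fold its nodes into
        # one summary at the next level up, keeping links to the children.
        while len(self.pending) <= level:
            self.pending.append([])
        self.pending[level].append(node)
        if len(self.pending[level]) == self.fanout:
            group = self.pending[level]
            self.pending[level] = []
            summary = Node(summarize(group), level=level + 1, children=group)
            self._push(summary, level + 1)

    def context(self) -> list[str]:
        # Higher levels cover older material, so emit top level first,
        # then lower levels, then the verbatim recent turns.
        older = [n.text for level in reversed(self.pending) for n in level]
        return older + [n.text for n in self.recent]
```

The context stays small (at most `fanout - 1` nodes per level plus the recent window) while every original turn is still reachable by walking `children` down from the top summary.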

Layer 2: Native files + semantic search — the persistent record

Plain markdown files your agent reads and writes: daily journals (2026-05-28.md), a curated MEMORY.md, preference files, project notes. Simple, version-controlled, human-readable. No database, no API, no dependencies — this is the memory layer that survives everything.
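To show how little machinery this layer needs, here is a sketch of an agent appending to a dated daily journal. The `memory/` directory and the `append_journal` helper are assumptions for illustration, not OpenClaw's actual file layout or API.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_journal(root: Path, entry: str, day: Optional[date] = None) -> Path:
    """Append a bullet to memory/YYYY-MM-DD.md (hypothetical layout)."""
    day = day or date.today()
    path = root / "memory" / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not path.exists()
    with path.open("a", encoding="utf-8") as f:
        if is_new:
            f.write(f"# {day.isoformat()}\n\n")  # heading for a fresh journal
        f.write(f"- {entry}\n")
    return path
```

Because the result is plain text, `git diff` shows exactly what the agent remembered on any given day.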

A local embedding model indexes these files and lets your agent search by meaning, not just keywords. "How did we handle the auth migration?" finds the right entry even if that entry never uses the word "auth." QMD, for example, runs a 333MB GGUF model locally: sub-second search, no API costs, and no data leaving your machine. The files are the source of truth; the embeddings make them instantly searchable.
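The index-then-search loop can be sketched in a few lines. To keep it self-contained, `embed()` is a toy normalized bag-of-words vector standing in for a real local embedding model. Unlike a real model it only matches shared words, so it would not find "auth" notes from a query about "login". QMD's actual model and interface are not shown here.

```python
import math
import re
from collections import Counter
from pathlib import Path

Index = list[tuple[str, str, dict[str, float]]]  # (file name, chunk text, vector)


def embed(text: str) -> dict[str, float]:
    # Toy stand-in for an embedding model: a unit-length bag-of-words vector.
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {word: c / norm for word, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(v * b.get(word, 0.0) for word, v in a.items())


def build_index(root: Path) -> Index:
    # Index every non-empty paragraph of every markdown file under root.
    index: Index = []
    for path in sorted(root.rglob("*.md")):
        for chunk in path.read_text(encoding="utf-8").split("\n\n"):
            if chunk.strip():
                index.append((path.name, chunk.strip(), embed(chunk)))
    return index


def search(index: Index, query: str, k: int = 3) -> list[tuple[float, str, str]]:
    # Score every chunk against the query and return the top k.
    q = embed(query)
    scored = [(cosine(q, vec), name, text) for name, text, vec in index]
    return sorted(scored, reverse=True)[:k]
```

Swapping `embed()` for a real model changes the quality of matches but not the shape of the pipeline: files stay the source of truth and the index can be rebuilt from them at any time.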

Layer 3: The long-term intelligence engine — this is where you choose

The first two layers are table stakes. Every serious agent needs them. The third layer is where the 33 engines I tested come in — and where the real differences emerge.

The 33 engines I tested

Here's every memory framework I put through real-world use — not benchmarks, not demos, actual daily agent work. They naturally group into 6 categories, each solving a different type of remembering:

Vector similarity — the foundation layer

These engines store embeddings and retrieve by semantic similarity. They're the building blocks most other memory systems are built on top of.

| # | Engine | What it does |
|---|---|---|
| 1 | ChromaDB | Embedding-based semantic search, lightweight and developer-friendly |
| 2 | Qdrant | High-performance vector similarity search with filtering |
| 3 | Weaviate | Hybrid vector + keyword search with pluggable modules |
| 4 | Milvus | Distributed vector database built for scale |
| 5 | Pinecone | Serverless managed vector search |
| 6 | pgvector | Vector similarity search as a PostgreSQL extension |
| 7 | FAISS | Meta's similarity search library: raw speed, no frills |
| 8 | Redis Vector | Vector similarity on Redis Stack |
| 9 | Supabase Vector | pgvector on managed Postgres with auth and APIs |
| 10 | Marqo | End-to-end tensor search engine |
| 11 | Deep Lake | Vector store optimized for AI dataset versioning |
| 12 | Vespa | Hybrid search + ML serving at scale |

These are excellent at "find me something similar to X" but they don't understand what they're storing. A vector store treats your preferences, your project architecture, and last Tuesday's standup notes the same way — as floating-point arrays. For RAG and document retrieval, they're essential. For agent memory, they're a necessary layer but not sufficient on their own.

Session & conversation memory — remembering the current thread

These keep track of what's been said within and across conversations.

| # | Engine | What it does |
|---|---|---|
| 13 | Zep | Long-term conversation memory with automatic fact extraction |
| 14 | Motorhead | Redis-backed conversation memory server |
| 15 | OpenAI Memory | ChatGPT's native conversation memory |
| 16 | Claude Memory | Anthropic's native conversation memory |

These solve the "I already told you this" problem within a session. Zep stands out here — it goes beyond simple buffer storage and extracts structured facts from conversations. But session memory alone doesn't give your agent a persistent understanding of your world.

Framework memory modules — memory as a feature

These are memory components built into larger agent/RAG frameworks.

| # | Engine | What it does |
|---|---|---|
| 17 | LlamaIndex Memory | Chat memory + knowledge index integration |
| 18 | LangChain Memory | Buffer, summary, and entity memory modules |
| 19 | LangMem | Memory management primitives for LangChain/LangGraph |
| 20 | Haystack Memory | Document store memory in RAG pipelines |
| 21 | txtai | All-in-one embeddings database with workflows |
| 22 | CrewAI Memory | Short/long/entity memory for multi-agent crews |

Good if you're already inside that ecosystem. They give you memory abstractions (buffers, summaries, entity tracking) but they're tightly coupled to their framework. Memory is a feature of these tools, not their core mission.

Agentic & autonomous memory — the agent manages its own memory

These let the agent itself decide what to remember and what to forget.

| # | Engine | What it does |
|---|---|---|
| 23 | Letta (MemGPT) | Self-editing memory with inner/outer monologue |
| 24 | AutoGPT Memory | File + vector memory for autonomous agents |
| 25 | Memary | Knowledge graph memory for autonomous agents |
| 26 | AGiXT | Adaptive memory with chained agent context |
| 27 | BabyAGI | Task-driven memory with priority queues |

Fascinating research direction. Letta/MemGPT in particular pioneered the idea of the model managing its own memory tiers. The challenge in production: you're trusting the LLM to decide what's worth keeping, and that decision quality varies with the model and context.
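To make "the model manages its own memory tiers" concrete, here is a toy sketch in the spirit of MemGPT: a size-limited core memory that always sits in the prompt, an unbounded archive searched on demand, and tool-style methods the model would call to edit them. All class and method names are hypothetical and deliberately not Letta's API. In a real system the LLM chooses when to call these tools, which is exactly where decision quality varies.

```python
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    core_limit: int = 200                               # max characters kept in-context
    core: dict[str, str] = field(default_factory=dict)  # always rendered into the prompt
    archive: list[str] = field(default_factory=list)    # out of context, searched on demand

    def core_size(self) -> int:
        return sum(len(v) for v in self.core.values())

    # --- hypothetical "tools" the model would call ---
    def core_write(self, section: str, text: str) -> str:
        old = self.core.get(section)
        self.core[section] = text
        if self.core_size() > self.core_limit:
            # Roll back: the model must evict something before writing.
            if old is None:
                del self.core[section]
            else:
                self.core[section] = old
            return "error: core memory full; evict something first"
        return "ok"

    def evict_to_archive(self, section: str) -> str:
        # Move a core section out of the prompt into archival storage.
        text = self.core.pop(section)
        self.archive.append(f"[{section}] {text}")
        return "ok"

    def archive_search(self, query: str) -> list[str]:
        # Naive substring match; a real system would use embeddings.
        return [t for t in self.archive if query.lower() in t.lower()]

    def render_prompt_block(self) -> str:
        return "\n".join(f"<{k}>{v}</{k}>" for k, v in self.core.items())
```

The error string is the important design choice: instead of silently dropping data, the full core memory pushes the decision about what to keep back to the model.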

Personal AI & bookmarks — memory for humans, not agents

| # | Engine | What it does |
|---|---|---|
| 28 | Khoj | Self-hosted personal AI with file-based memory |
| 29 | SuperMemory | AI-powered memory for saved content and bookmarks |
| 30 | Vanna | RAG-based memory for database queries |

These are designed more as personal knowledge tools than agent memory layers. They work well for their use case, but they're solving a different problem — helping you remember things, not giving your agent persistent understanding.

Structured memory engines — purpose-built for agent intelligence

These are the engines designed specifically to give agents structured, queryable, persistent memory:

| # | Engine | What it does |
|---|---|---|
| 31 | Mem0 | Intelligent fact extraction, deduplication, contradiction resolution |
| 32 | Cognee | Entity-relationship knowledge graphs with 14 retrieval modes |
| 33 | Graphiti | Temporal knowledge graph with validity windows |

This is where it gets interesting — and where I spent most of my 6 months.

The 3 tiers of long-term memory

After testing all 33, the structured memory engines stood out. But here's the insight that took me months to reach: these three aren't meant to run together. They're evolutionary tiers. Each one supersedes the previous, adding capabilities while covering the lower tier's functionality.

Tier 1: Mem0 — facts and preferences

Mem0 (48K+ GitHub stars, $24M Series A) is the intelligent facts layer. Tell your agent "I prefer TypeScript" on Monday and "use Python for data scripts" on Thursday — Mem0 doesn't store two contradictory entries. It updates: TypeScript for general dev, Python for data. Every fact is categorized, timestamped, and confidence-scored.
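Mem0 relies on an LLM to decide how each new fact relates to what is already stored. The sketch below swaps that judgment for a deterministic rule so the update-instead-of-append idea is visible: facts are keyed by subject, attribute, and scope, so a new value for the same key replaces the old one, while a narrower scope (like "data scripts" in the TypeScript/Python example) is stored alongside it. `FactStore`, its methods, and the ADD/UPDATE/NOOP labels are illustrative, not Mem0's API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Fact:
    value: str
    updated_at: datetime
    history: list[str]  # previous values, kept for auditing


class FactStore:
    """Toy fact store: one current value per (subject, attribute, scope)."""

    def __init__(self) -> None:
        self.facts: dict[tuple[str, str, str], Fact] = {}

    def remember(self, subject: str, attribute: str, value: str,
                 scope: str = "general") -> str:
        key = (subject, attribute, scope)
        now = datetime.now(timezone.utc)
        existing = self.facts.get(key)
        if existing is None:
            self.facts[key] = Fact(value, now, [])
            return "ADD"
        if existing.value == value:
            return "NOOP"  # duplicate statement; nothing new to store
        # Same key, new value: update in place rather than adding a second entry.
        existing.history.append(existing.value)
        existing.value, existing.updated_at = value, now
        return "UPDATE"

    def recall(self, subject: str, attribute: str) -> dict[str, str]:
        # Current value for every scope of this attribute.
        return {scope: f.value for (s, a, scope), f in self.facts.items()
                if s == subject and a == attribute}
```

The hard part in practice is the key: deciding that two sentences are about the same subject and attribute is what the LLM step in a real engine is for.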

Where Zep's fact extraction is a feature bolted onto session memory, Mem0's entire architecture is built around making facts reliable. Your agent starts every session already knowing your preferences, your project's quirks, and your conventions. No re-explaining.

Best for: developers and technical use cases. If your agent mainly needs to remember preferences, conventions, and project details across sessions, Mem0 is the right choice. It's the simplest to set up and the most focused.

Tier 2: Cognee — relationships and reasoning (supersedes Mem0)

Cognee ($7.5M seed, GitHub Secure Open Source graduate, running in 70+ companies) builds a knowledge graph — not isolated facts, but a web of entities, relationships, and semantic connections.

Where Mem0 knows "the client prefers blue branding," Cognee knows that the client's brand guidelines connect to last month's campaign performance, which connects to the audience segments that engaged most, which connects to the content calendar. It ships 14 retrieval modes and a self-improving "memify" feature that strengthens connections the more you use them.
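The difference between isolated facts and a graph shows up as soon as you follow edges. Here is a minimal sketch using a plain adjacency list rather than Cognee's API; the entities and relation names are made up to mirror the example. A breadth-first walk from the client finds everything reachable within a hop limit, along with the chain of relations that connects it.

```python
from collections import deque

# Hypothetical knowledge graph: subject -> list of (relation, object) edges.
GRAPH: dict[str, list[tuple[str, str]]] = {
    "Client": [("has_guidelines", "Brand guidelines"), ("prefers", "Blue branding")],
    "Brand guidelines": [("shaped", "Last month's campaign")],
    "Last month's campaign": [("engaged_most", "Segment: developers")],
    "Segment: developers": [("targeted_by", "Content calendar")],
}


def related(start: str, max_hops: int = 3) -> dict[str, list[str]]:
    """Breadth-first search returning each reachable entity and the path to it."""
    paths: dict[str, list[str]] = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) >= max_hops:
            continue  # do not expand past the hop limit
        for relation, neighbour in GRAPH.get(node, []):
            if neighbour not in paths:  # BFS: first visit is a shortest path
                paths[neighbour] = paths[node] + [f"{node} -{relation}-> {neighbour}"]
                queue.append(neighbour)
    del paths[start]
    return paths
```

A flat fact store can answer "what does the client prefer?"; only the traversal can explain why the content calendar is connected to the client at all.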

Cognee handles everything Mem0 does (facts are just nodes in the graph) plus it maps the relationships between them. That's why it supersedes Tier 1 — you don't need Mem0 if you're running Cognee.

Best for: marketing, content, and multi-project work. If your agent needs to reason across brands, campaigns, audiences, and projects — understanding how things connect, not just what things are — Cognee is the right choice.

Tier 3: Graphiti — temporal reasoning (supersedes Cognee)

Graphiti by Zep is the temporal knowledge graph. Its core insight: knowing the current state isn't enough. You need to know when things changed and what was true before.

Every fact carries validity intervals. When new information conflicts with old, Graphiti doesn't overwrite — it creates a temporal record and invalidates the previous one, preserving full history. "When did this config change?" "What was different before the March deploy?" Graphiti answers directly, no digging through logs.
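Validity windows are easiest to grasp with a toy example. This sketch is not Graphiti's API or schema (the field names `valid_from` and `valid_to` are my own); it only mimics the behavior described: a conflicting new fact closes the old fact's window instead of deleting it, and an as-of query answers "what was true at time t?"

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TemporalFact:
    subject: str
    attribute: str
    value: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still true"


class TemporalStore:
    def __init__(self) -> None:
        self.facts: list[TemporalFact] = []

    def assert_fact(self, subject: str, attribute: str, value: str, at: datetime) -> None:
        # Assumes facts arrive in chronological order. Close the window of the
        # currently valid fact instead of overwriting it, so history survives.
        for f in self.facts:
            if f.subject == subject and f.attribute == attribute and f.valid_to is None:
                f.valid_to = at
        self.facts.append(TemporalFact(subject, attribute, value, at))

    def as_of(self, subject: str, attribute: str, t: datetime) -> Optional[str]:
        # Return the value whose window [valid_from, valid_to) contains t.
        for f in self.facts:
            if (f.subject == subject and f.attribute == attribute
                    and f.valid_from <= t and (f.valid_to is None or t < f.valid_to)):
                return f.value
        return None

    def changes(self, subject: str, attribute: str) -> list[tuple[datetime, str]]:
        # Full history: when each value became true.
        return [(f.valid_from, f.value) for f in self.facts
                if f.subject == subject and f.attribute == attribute]
```

With this shape, "what was different before the March deploy?" becomes a single `as_of` call with a timestamp just before the deploy.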

Zep reports that its Graphiti-based memory outperforms MemGPT on the Deep Memory Retrieval (DMR) benchmark, using a combination of semantic search, keyword matching, and graph traversal.

Graphiti handles facts (like Mem0) and relationships (like Cognee), and also tracks how they change over time. It supersedes both lower tiers, but it's also the heaviest to run (a graph database such as Neo4j or FalkorDB, more compute, more complexity).

Best for: operations, executive, and business use cases. If your agent needs cause-and-effect reasoning across time — "what changed," "when did it break," "what was true before" — Graphiti is the right choice.

Pick one, not all three

| Your use case | Pick this tier | Why |
|---|---|---|
| Developer / DevOps | Mem0 | You need fast, reliable fact recall: preferences, conventions, project details. |
| Marketing / Content | Cognee | You need relationship reasoning: brands, campaigns, audiences, and how they connect. |
| Operations / Executive | Graphiti | You need temporal reasoning: what changed, when, and what broke. |

The common mistake is thinking "more engines = better memory." It's not. Each tier already includes the capabilities of the one below it. Running Mem0 alongside Graphiti is redundant — Graphiti already stores facts. Running all three wastes compute and creates consistency conflicts.

Pick the tier that matches your work. Pair it with the base stack (conversation compression + native files with semantic search) and your agent will remember everything that matters.

The full architecture

Here's what a complete agent memory stack looks like:

[Figure: Agent Memory Architecture. Three layers: conversation compression, native files + semantic search, and a long-term intelligence engine (pick one tier).]

Every layer feeds context to the model. The bottom two are always-on. The top one is your choice based on what kind of reasoning your agent needs.
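One way to picture "every layer feeds context to the model" is a function that assembles the prompt from all three layers, with the long-term engine behind a small interface so whichever tier you pick can be plugged in or left out. Everything here is a hypothetical sketch: `LongTermEngine`, `StaticEngine`, and `build_context` are illustrative names, not OpenClaw, Mem0, Cognee, or Graphiti APIs.

```python
from typing import Callable, Optional, Protocol


class LongTermEngine(Protocol):
    """Whichever tier you pick (facts, graph, or temporal graph) answers a query."""
    def recall(self, query: str) -> list[str]: ...


class StaticEngine:
    # Hypothetical stand-in for a real Mem0 / Cognee / Graphiti client:
    # returns stored memories that share at least one word with the query.
    def __init__(self, memories: list[str]) -> None:
        self.memories = memories

    def recall(self, query: str) -> list[str]:
        words = set(query.lower().split())
        return [m for m in self.memories if words & set(m.lower().split())]


def build_context(
    query: str,
    conversation: list[str],                   # layer 1: summaries + recent turns
    file_search: Callable[[str], list[str]],   # layer 2: search over markdown files
    engine: Optional[LongTermEngine] = None,   # layer 3: optional, pick one tier
) -> str:
    sections = ["## Conversation", *conversation,
                "## Relevant notes", *file_search(query)]
    if engine is not None:
        sections += ["## Long-term memory", *engine.recall(query)]
    return "\n".join(sections)
```

Keeping layer 3 behind one interface is what makes "pick one tier" cheap: moving from a facts engine to a graph engine changes the object you pass in, not the rest of the stack.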

Getting this running

The base stack (layers 1–2) is built into OpenClaw: conversation compression, native memory files, and semantic search work out of the box. The long-term engine (layer 3) requires additional setup: Mem0 needs a vector store, Cognee needs a graph database, and Graphiti needs a graph database such as Neo4j or FalkorDB.

OpenClaw is open source and you can self-host the full stack. If you want to skip the infrastructure work, I've been building ClawBase — managed OpenClaw hosting that pre-configures the right memory stack for your use case. But honestly, even if you self-host, the main takeaway here is the architecture: a 3-layer memory stack where you pick the long-term engine that matches your work.

Whichever way you run it, the memory compounds over time: the longer you use it, the better it gets.

One thing I keep coming back to: once your agent has a real memory stack, it opens the door to something bigger — consistent shared memory across multiple agents. Imagine a team of agents that don't just remember their own context, but share a unified understanding of your projects, preferences, and decisions. That's a different kind of architecture entirely, and one I'll dig into in a future article.
