DEV Community

Andrew

Posted on • Originally published at andrew.ooo

agentmemory Review: Persistent Memory for AI Coding Agents

Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts.

TL;DR

agentmemory is an open-source persistent memory layer for AI coding agents. It silently captures what your agent does, compresses it into searchable memory, and injects the right context into the next session. One server, shared across Claude Code, Cursor, Codex CLI, Gemini CLI, Cline, Windsurf, Roo Code, OpenCode, and anything else that speaks MCP.

  • 9,361 GitHub stars, 6,467 of them gained this week (#1 trending repo as of May 2026)
  • 95.2% R@5 on LongMemEval-S (ICLR 2025, 500 questions) — beats mem0 (68.5%) and Letta/MemGPT (83.2%)
  • ~170K tokens/year vs ~19.5M for paste-full-context approaches — roughly $10/yr running cost (free with local embeddings)
  • 12 auto-capture hooks for Claude Code, 6 for Codex CLI, MCP server for everything else
  • 51 MCP tools (memory_smart_search, memory_save, memory_sessions, memory_governance_delete, etc.)
  • Zero external dependencies — local-first SQLite + iii-engine, no Qdrant or Postgres required
  • License: MIT, npm package @agentmemory/agentmemory

If you've ever re-explained your auth setup, project conventions, or "why we chose X over Y" to your coding agent for the fifth time this week, this is the project to try.


Why This Matters

Every AI coding agent has the same problem: it forgets everything when the session ends. The official answer is a flat file — CLAUDE.md, .cursorrules, AGENTS.md — that caps out around 200 lines and goes stale within a sprint. You either re-paste the same architecture overview every session, or you accept that your agent re-discovers the same bugs and re-asks the same questions forever.

The deeper problem is that flat-file memory doesn't scale across agents. If you use Claude Code for refactoring, Cursor for autocomplete, and Codex CLI for shell tasks, none of them share what they learned. Each agent re-builds context from scratch every time.

agentmemory takes a different bet: run a single persistent memory server on localhost, let every agent read and write through MCP (or hooks, or REST), and use a real retrieval pipeline — BM25 + vectors + a knowledge graph fused with reciprocal rank fusion — instead of grepping a markdown file.

The README's example captures the value cleanly: "Session 1 you set up JWT auth. Session 2 you ask for rate limiting. The agent already knows your auth uses jose middleware in src/middleware/auth.ts, your tests cover token validation, and you chose jose over jsonwebtoken for Edge compatibility."


Benchmarks: What the Numbers Actually Say

agentmemory is one of the few memory layers that ships real benchmark numbers in the README, against published evaluations.

Retrieval Accuracy (LongMemEval-S, 500 questions)

| System | R@5 | R@10 | MRR |
|---|---|---|---|
| agentmemory | 95.2% | 98.6% | 88.2% |
| BM25-only fallback | 86.2% | 94.6% | 71.5% |

Compare that to the LoCoMo numbers the competitors publish:

| System | R@5 | Source |
|---|---|---|
| agentmemory | 95.2% (LongMemEval) | This repo |
| mem0 | 68.5% (LoCoMo) | mem0 paper |
| Letta / MemGPT | 83.2% (LoCoMo) | Letta docs |
| CLAUDE.md grep | N/A (no semantic recall) | N/A |

LongMemEval and LoCoMo aren't identical benchmarks, so this isn't a perfectly clean head-to-head — but the gap is large enough that the directional claim ("more accurate retrieval than the popular alternatives") survives the methodology caveat.

Token Economics

| Approach | Tokens/year | Cost/year |
|---|---|---|
| Paste full context every session | 19.5M+ | Exceeds context window |
| LLM-summarized context | ~650K | ~$500 |
| agentmemory | ~170K | ~$10 |
| agentmemory + local embeddings (all-MiniLM-L6-v2) | ~170K | $0 |

The local-embeddings path matters: it means you can run the whole stack without an OpenAI key and without sending your codebase context to a third party. For anyone doing client work under NDA, that's the difference between "interesting" and "actually usable."
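The savings claim in the table is easy to sanity-check. A minimal sketch, using the README's own token counts (the dollar figures in the table depend on provider pricing, so only the ratio is computed here):

```python
# Rough arithmetic behind the token-economics table above. The per-approach
# token counts come from the README; dollar costs depend on provider pricing,
# so we only verify the relative savings.

FULL_CONTEXT_TOKENS_PER_YEAR = 19_500_000   # paste-everything baseline
AGENTMEMORY_TOKENS_PER_YEAR = 170_000       # retrieval-based injection

savings_ratio = FULL_CONTEXT_TOKENS_PER_YEAR / AGENTMEMORY_TOKENS_PER_YEAR
print(f"~{savings_ratio:.0f}x fewer tokens per year")  # ~115x fewer tokens per year
```

Roughly two orders of magnitude, which is why the full-context row "exceeds context window" while the agentmemory rows stay in single-digit dollars.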


Quick Start

The install is genuinely one command:

```bash
# Terminal 1: start the server (runs on localhost:3111)
npx @agentmemory/agentmemory

# Terminal 2: seed sample data and watch retrieval work
npx @agentmemory/agentmemory demo
```

Open http://localhost:3113 to watch the memory build live in the real-time viewer. The demo command seeds three realistic sessions (JWT auth setup, an N+1 query fix, and a rate-limiting implementation) and then runs semantic searches against them — including queries like "database performance optimization" that should retrieve the N+1 fix purely on semantics, not keyword overlap.

For Claude Code, the integration is two commands inside the agent:

```bash
/plugin marketplace add rohitg00/agentmemory
/plugin install agentmemory
```

The plugin registers 12 lifecycle hooks (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact, Stop, SessionEnd, Notification, TaskCompleted, PostToolUseFailure, Subagent, plus the new filesystem-watcher hook), 4 skills (/recall, /remember, /session-history, /forget), and auto-wires the @agentmemory/mcp stdio server. You get 51 MCP tools without touching any other config.

Codex CLI gets a slimmer 6-hook plugin (Codex doesn't expose Subagent or SessionEnd events yet):

```bash
codex plugin marketplace add rohitg00/agentmemory
codex plugin install agentmemory
```

For Cursor, Windsurf, Cline, Claude Desktop, Gemini CLI, OpenCode, Roo Code, or anything else that speaks MCP, the config is the same block in every host's mcpServers object:

```json
{
  "mcpServers": {
    "agentmemory": {
      "command": "npx",
      "args": ["-y", "@agentmemory/mcp"],
      "env": {
        "AGENTMEMORY_URL": "http://localhost:3111"
      }
    }
  }
}
```

Verify everything is alive with curl http://localhost:3111/agentmemory/health. That's the whole install.


How It Actually Works

Five moving parts do the work. They're worth understanding because most of the failure modes for memory layers live in the seams between these pieces.

1. Auto-capture hooks

When you install the plugin, agentmemory's hooks fire on every prompt, every tool call, every tool result, and every stop. The hook scripts POST events to localhost:3111. There's no manual add() call to forget — capture is the default. This is the biggest practical difference from mem0, which requires you to explicitly call client.add() on the things you want remembered.
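A hook script is conceptually just "serialize the event, POST it, never block the agent." The sketch below illustrates that flow; the `/events` endpoint path and the field names are assumptions for illustration, not the documented wire format:

```python
import json
import urllib.request

# Illustrative sketch of what a PostToolUse hook might send to the local
# server. The endpoint path ("/events") and event field names are
# hypothetical -- check the agentmemory docs for the real wire format.

def build_event(hook: str, tool: str, payload: dict) -> dict:
    return {"hook": hook, "tool": tool, "payload": payload}

def post_event(event: dict, base_url: str = "http://localhost:3111") -> None:
    req = urllib.request.Request(
        f"{base_url}/events",  # hypothetical endpoint
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Fail gracefully: a down server must never block the agent.
    try:
        urllib.request.urlopen(req, timeout=2)
    except OSError:
        pass

event = build_event("PostToolUse", "Edit", {"file": "src/middleware/auth.ts"})
post_event(event)
```

The swallow-the-error behavior matters: capture is best-effort by design, so a crashed memory server degrades to a normal, memoryless session rather than a broken one.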

2. Four-tier memory pipeline

Raw captures go into a working memory buffer. Hourly sweeps compress observations into episodic memories (specific events: "JWT setup happened, here are the files touched"), then into semantic memories (generalized facts: "this project uses jose for JWT"), and finally into procedural memories (reusable workflows: "to add a protected route, edit middleware/auth.ts then add a test in tests/auth/"). Stale or duplicate entries get pruned automatically.
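The tier progression above can be sketched as a simple state machine. This is a toy stand-in for the real hourly sweep, which compresses and summarizes rather than just relabeling, but it shows the working → episodic → semantic → procedural ladder:

```python
from dataclasses import dataclass
from enum import Enum

# Toy model of the four-tier pipeline described above. Tier names match
# the article; the promotion logic is an illustrative stand-in for the
# real consolidation sweep, which compresses content as it promotes.

class Tier(Enum):
    WORKING = 1      # raw captured observations
    EPISODIC = 2     # specific events ("JWT setup happened, files touched")
    SEMANTIC = 3     # generalized facts ("this project uses jose for JWT")
    PROCEDURAL = 4   # reusable workflows ("edit auth.ts, then add a test")

@dataclass
class Memory:
    text: str
    tier: Tier

def promote(mem: Memory) -> Memory:
    """Move a memory one tier up the pipeline (no-op at the top tier)."""
    if mem.tier is Tier.PROCEDURAL:
        return mem
    return Memory(mem.text, Tier(mem.tier.value + 1))

m = Memory("JWT auth set up with jose in src/middleware/auth.ts", Tier.WORKING)
m = promote(promote(m))
print(m.tier)  # Tier.SEMANTIC
```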

3. Hybrid retrieval with reciprocal rank fusion

When the agent fires a memory_smart_search, agentmemory runs three retrievers in parallel:

  • BM25 for keyword and identifier matches
  • Vector search over all-MiniLM-L6-v2 embeddings for semantic matches
  • Knowledge-graph traversal for entity-linked memories (files, functions, decisions)

Results get fused with RRF and reranked on-device. This is what the 95.2% R@5 number is measuring — and it's the architectural reason agentmemory beats vector-only systems like mem0 on retrieval quality.
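Reciprocal rank fusion itself is a few lines: each retriever contributes `1 / (k + rank)` per document, and documents that rank well across several retrievers float to the top. A minimal sketch (the constant `k = 60` is the conventional default, not a value the project documents):

```python
# Minimal reciprocal rank fusion, the scheme used to merge the three
# retrievers. Standard formula: score(d) = sum over rankings of
# 1 / (k + rank(d)), with k = 60 as the conventional default.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["auth.ts", "rate-limit.ts", "db.ts"]
vector = ["db.ts", "auth.ts"]
graph = ["auth.ts", "db.ts"]
fused = rrf([bm25, vector, graph])
print(fused[0])  # auth.ts -- ranked 1st, 2nd, 1st across the retrievers
```

The appeal of RRF is that it needs no score normalization: BM25 scores, cosine similarities, and graph-hop distances live on incomparable scales, but ranks are always comparable.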

4. MCP server with 51 tools

The same memory store is exposed over MCP so any client can read it. Notable tools:

  • memory_smart_search — the main retrieval entry point
  • memory_save — explicit save (most agents won't need this; the hooks handle it)
  • memory_sessions — list and replay past sessions
  • memory_governance_delete — audit-logged deletes (more on this below)
  • memory_export — pull the whole store as JSON for backup or migration
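Under MCP, each of these tools is invoked with the standard `tools/call` JSON-RPC method. A sketch of the request an MCP client would send for the main retrieval tool (the argument names `query` and `limit` are illustrative guesses at `memory_smart_search`'s schema, not its documented parameters):

```python
import json

# Shape of the JSON-RPC 2.0 request an MCP client sends to invoke a tool.
# "tools/call" is the standard MCP method name; the argument names below
# ("query", "limit") are assumed for illustration.

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "memory_smart_search",
        "arguments": {
            "query": "why did we choose jose over jsonwebtoken?",
            "limit": 5,
        },
    },
}
print(json.dumps(request, indent=2))
```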

5. iii-engine and the real-time viewer

Under the hood, everything routes through iii-engine (a Rust runtime the same author maintains). The viewer at localhost:3113 reads the same mem::replay::load and mem::replay::sessions functions the CLI uses — no side-channel servers, no SSE proxy hacks. You can scrub through any past session with play/pause, 0.5×–4× speed, and keyboard shortcuts (space, arrow keys). For debugging "what did the agent see when it made that bad decision?" this is shockingly useful.


vs. The Alternatives

| Feature | agentmemory | mem0 (53K ⭐) | Letta / MemGPT (22K ⭐) | CLAUDE.md (built-in) |
|---|---|---|---|---|
| Type | Memory engine + MCP server | Memory layer API | Full agent runtime | Static file |
| Retrieval R@5 | 95.2% | 68.5% | 83.2% | N/A (grep) |
| Auto-capture | 12 hooks, zero manual effort | Manual add() calls | Agent self-edits | Manual editing |
| Search | BM25 + Vector + Graph (RRF) | Vector + Graph | Vector | Loads everything |
| Multi-agent shared memory | MCP + REST + leases | Per-instance API | Within Letta runtime only | Per-agent files |
| Framework lock-in | None (any MCP client) | None | High (must use Letta) | Per-agent format |
| External deps | None (SQLite + iii) | Qdrant or pgvector | Postgres + vector DB | None |
| Token efficiency | ~1,900 tokens/session | Varies | Core memory in context | 22K+ tokens at 240 obs |
| Self-hosted | Yes (default) | Optional (cloud-first) | Optional | Yes |

The honest comparison: mem0 is more mature as a hosted product and has more existing framework integrations (LangChain, LlamaIndex, etc.). Letta is more powerful if you want a full Postgres-backed multi-tenant agent platform. agentmemory wins on (a) zero-config local-first install, (b) measured retrieval accuracy, and (c) the auto-capture-via-hooks approach, which means you don't need to remember to remember things.


Community Reactions

The project surfaced on Reddit's r/ClaudeAI, r/ClaudeCode, and r/ChatGPT in early May 2026, and the reception has been split in a useful way.

The enthusiastic camp likes that there's finally a memory tool that ships benchmarks. From r/ClaudeCode: "First memory project I've seen that actually publishes LongMemEval numbers instead of vibes." Several commenters cite the multi-agent shared-memory angle — running Claude Code and Cursor against the same memory store — as the killer feature for teams that use multiple agents per project.

The skeptical camp raises three recurring concerns. First, the dependency on iii-engine adds a Rust runtime to your stack that most developers haven't heard of. Second, some early users reported token-burn regressions in v0.7 and v0.8 where the auto-capture was too aggressive; the v0.9.0 release in April 2026 codified an audit policy across every delete path and stopped the health check from flagging memory_critical on tiny Node processes — fixes that suggest the early concerns were real and have now been addressed. Third, there's a "please stop creating memory for your agent" school of thought on r/ClaudeCode that argues CLAUDE.md plus discipline beats any memory framework. That's a defensible position for solo developers on small projects; it stops being defensible the moment you're juggling three agents and a six-month-old codebase.

The Trendshift page tracks the repo at #1 trending across all of GitHub as of May 13, 2026, with the star-history graph showing a near-vertical climb from ~2K to ~9.4K stars in two weeks.


Honest Limitations

A short list of things that are real and not blockers for everyone:

  • Two terminals. You need the memory server running in a separate process. There's no daemon mode yet, so you either run it in tmux/screen or you'll need to remember to start it. A launchd/systemd setup is left as an exercise for the reader.
  • iii-engine is young. The runtime hit v0.11 in 2026 and is maintained by the same author. If you're allergic to single-maintainer dependencies, this is a real risk.
  • No team sync out of the box. Memory is per-machine. There's no built-in "share my project memory with my coworker" path — you'd export with memory_export, commit the JSON, and have the other dev import. For solo projects this is fine; for teams this is a missing feature.
  • SQLite is the storage layer. Great for local-first, but if you've already standardized on pgvector or Qdrant elsewhere, you're adding a new storage system instead of consolidating. There's no first-party Postgres backend.
  • Auto-capture means everything is captured. If you cat a secrets file during a session, that content lands in memory. The memory_governance_delete tool exists for exactly this case, but you have to know to use it. A built-in secret scanner would be welcome.
  • Local embedding model is small. all-MiniLM-L6-v2 is fast and free, but the recall numbers degrade vs. a larger embedding model. For most coding contexts, MiniLM is enough; for cross-language polyglot codebases, you'll probably want to switch to a larger model via the OpenAI embeddings endpoint.
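The export/import workaround for the missing team sync (mentioned in the list above) could look like the sketch below: one dev commits their `memory_export` JSON, a teammate merges it into their own export before importing. The `{"memories": [{"id": ..., "text": ...}]}` schema is a hypothetical stand-in, not the documented export format:

```python
# Sketch of the manual team-sync workaround: merge a teammate's
# memory_export JSON into your own, keeping your local copy on ID
# conflicts. The export schema here is hypothetical.

def merge_exports(mine: dict, theirs: dict) -> dict:
    by_id = {m["id"]: m for m in mine.get("memories", [])}
    for m in theirs.get("memories", []):
        by_id.setdefault(m["id"], m)  # keep the local copy on conflict
    return {"memories": list(by_id.values())}

mine = {"memories": [{"id": "a1", "text": "uses jose for JWT"}]}
theirs = {"memories": [{"id": "a1", "text": "uses jose for JWT"},
                       {"id": "b2", "text": "rate limit is 100 req/min"}]}
merged = merge_exports(mine, theirs)
print(len(merged["memories"]))  # 2
```

Crude, but it makes the point: until team sync is first-class, sharing memory is a JSON-wrangling exercise.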

Should You Use It?

Yes if any of these are true:

  • You use multiple AI coding agents (e.g. Claude Code + Cursor) on the same project and want them to share context.
  • You've felt the pain of re-explaining your architecture to your agent on a long-running project.
  • You want measured retrieval quality instead of trusting that vector search just works.
  • You want local-first memory that doesn't ship your code context to a cloud vendor.
  • You're already running an MCP-capable agent (which is now almost all of them).

Probably no if any of these are true:

  • You're a solo developer on a small, short-lived project where CLAUDE.md actually fits in 200 lines.
  • You need team-shared memory that syncs across developers' machines without manual export/import.
  • You're allergic to adding a Rust runtime (iii-engine) and a new SQLite store to your dev environment.
  • You're already deep on mem0 or Letta and the migration cost outweighs the retrieval gain.

For andrew.ooo's content pipeline — which juggles Claude Code, an OpenClaw bot, and occasional Cursor sessions against the same codebase — agentmemory is a clear win and is going on the eval shortlist for next sprint.


FAQ

Is agentmemory free?

Yes. MIT-licensed, runs entirely on your machine, and works with the local all-MiniLM-L6-v2 embedding model so there's no API key required. If you want stronger embeddings, you can plug in OpenAI's text-embedding-3-large (cost: roughly $10/year at typical coding-agent usage) or any other compatible embedding provider.

Does it work with Claude Code, Cursor, and Codex CLI at the same time?

Yes — that's a headline feature. All three (and anything else MCP-compatible) read and write the same memory store via the server on localhost:3111. Memory captured by Claude Code is retrievable from Cursor in the next session, and vice versa.

How is it different from CLAUDE.md or .cursorrules?

CLAUDE.md and .cursorrules are flat files you load wholesale into context. They cap out around 200 lines, go stale, and don't share across agents. agentmemory is a retrieval pipeline — it stores thousands of memories and injects only the ~5 most relevant ones per query, which means you can have hundreds of accumulated facts about your project without burning your context window.

Will it leak my code to a third party?

Not by default. The default install uses local SQLite for storage and local embeddings (all-MiniLM-L6-v2) for vectors. Nothing leaves your machine unless you explicitly configure an external embedding provider (OpenAI, Voyage, etc.). For NDA work, run with the defaults.

Can I run it on a server for a team?

You can — the server binds to localhost by default, but the README documents how to expose it over a network. You'd need to add auth (it ships without auth, which is fine for localhost-only) before exposing it publicly. The "team-shared memory" use case isn't first-class yet; the recommended pattern is exporting/importing JSON for now.

What happens if the memory server is down?

The MCP client falls back to no-memory mode (it'll work like a regular agent without memory). Hook scripts fail gracefully — they log the error but don't block the agent. This is the right default, but it does mean you'll silently lose new memory captures if you don't notice the server is down. Check localhost:3111/agentmemory/health periodically or wire it into your status bar.
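A tiny liveness probe, suitable for a status bar or cron job, can hit the health endpoint mentioned above and return a boolean instead of raising:

```python
import urllib.request

# Minimal liveness probe for the memory server. Returns False instead of
# raising, so it's safe to poll from a status bar or cron job.

def memory_server_up(url: str = "http://localhost:3111/agentmemory/health",
                     timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

if memory_server_up():
    print("memory server: up")
else:
    print("memory server: DOWN -- new captures are being lost")
```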

How big does the SQLite database get?

The benchmarks page reports roughly 50MB per 1,000 sessions with full transcripts captured. The four-tier consolidation pipeline keeps long-term growth sub-linear because raw observations get compressed into semantic memories and pruned. For most users, you'll never notice the disk usage.
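If you want a back-of-envelope number for your own usage, the 50MB-per-1,000-sessions figure gives a simple linear upper bound (an upper bound precisely because consolidation prunes raw observations over time):

```python
# Naive linear estimate from the ~50MB per 1,000 sessions figure above.
# Treat it as an upper bound: consolidation compresses and prunes raw
# observations, so reported real-world growth is sub-linear.

MB_PER_1000_SESSIONS = 50.0

def estimated_db_mb(sessions: int) -> float:
    return sessions * MB_PER_1000_SESSIONS / 1000.0

print(estimated_db_mb(10_000))  # 500.0 -- worst-case MB for 10k sessions
```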


Bottom Line

agentmemory is the first persistent-memory project for AI coding agents that ships real benchmarks, a one-command install, and works across every major MCP-capable agent without lock-in. The 95.2% R@5 on LongMemEval is the headline number, but the architectural choice that matters most is the auto-capture-via-hooks approach: memory is the default, not something you have to remember to do.

The dependency on iii-engine is the main risk. The missing team-sync story is the main feature gap. Neither is a blocker for an individual developer or a small team who wants their AI agents to actually remember things.

If you've been waiting for the moment when AI coding agents stop being amnesiac, this is the moment.

Try it: npx @agentmemory/agentmemory
Repo: github.com/rohitg00/agentmemory
Site: agent-memory.dev
