DEV Community

Andrew

Posted on • Originally published at andrew.ooo

agentmemory Review: Persistent Memory for AI Coding Agents

Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts.

TL;DR

agentmemory is an open-source persistent memory layer for AI coding agents. It silently captures what your agent does, compresses it into searchable memory, and injects the right context into the next session. One server, shared across Claude Code, Cursor, Codex CLI, Gemini CLI, Cline, Windsurf, Roo Code, OpenCode, and anything else that speaks MCP.

  • 9,361 GitHub stars, 6,467 of them gained this week (#1 trending repo as of May 2026)
  • 95.2% R@5 on LongMemEval-S (ICLR 2025, 500 questions) — beats mem0 (68.5%) and Letta/MemGPT (83.2%)
  • ~170K tokens/year vs ~19.5M for paste-full-context approaches — roughly $10/yr running cost (free with local embeddings)
  • 12 auto-capture hooks for Claude Code, 6 for Codex CLI, MCP server for everything else
  • 51 MCP tools (memory_smart_search, memory_save, memory_sessions, memory_governance_delete, etc.)
  • Zero external dependencies — local-first SQLite + iii-engine, no Qdrant or Postgres required
  • License: MIT, npm package @agentmemory/agentmemory

If you've ever re-explained your auth setup, project conventions, or "why we chose X over Y" to your coding agent for the fifth time this week, this is the project to try.


Why This Matters

Every AI coding agent has the same problem: it forgets everything when the session ends. The official answer is a flat file — CLAUDE.md, .cursorrules, AGENTS.md — that caps out around 200 lines and goes stale within a sprint. You either re-paste the same architecture overview every session, or you accept that your agent re-discovers the same bugs and re-asks the same questions forever.

The deeper problem is that flat-file memory doesn't scale across agents. If you use Claude Code for refactoring, Cursor for autocomplete, and Codex CLI for shell tasks, none of them share what they learned. Each agent re-builds context from scratch every time.

agentmemory takes a different bet: run a single persistent memory server on localhost, let every agent read and write through MCP (or hooks, or REST), and use a real retrieval pipeline — BM25 + vectors + a knowledge graph fused with reciprocal rank fusion — instead of grepping a markdown file.

The README's example captures the value cleanly: "Session 1 you set up JWT auth. Session 2 you ask for rate limiting. The agent already knows your auth uses jose middleware in src/middleware/auth.ts, your tests cover token validation, and you chose jose over jsonwebtoken for Edge compatibility."


Benchmarks: What the Numbers Actually Say

agentmemory is one of the few memory layers that ships real benchmark numbers in the README, against published evaluations.

Retrieval Accuracy (LongMemEval-S, 500 questions)

| System | R@5 | R@10 | MRR |
|---|---|---|---|
| agentmemory | 95.2% | 98.6% | 88.2% |
| BM25-only fallback | 86.2% | 94.6% | 71.5% |

Compare that to the LoCoMo numbers the competitors publish:

| System | R@5 | Source |
|---|---|---|
| agentmemory | 95.2% (LongMemEval) | This repo |
| mem0 | 68.5% (LoCoMo) | mem0 paper |
| Letta / MemGPT | 83.2% (LoCoMo) | Letta docs |
| CLAUDE.md grep | N/A (no semantic recall) | N/A |

LongMemEval and LoCoMo aren't identical benchmarks, so this isn't a perfectly clean head-to-head — but the gap is large enough that the directional claim ("more accurate retrieval than the popular alternatives") survives the methodology caveat.

Token Economics

| Approach | Tokens/year | Cost/year |
|---|---|---|
| Paste full context every session | 19.5M+ | Exceeds context window |
| LLM-summarized context | ~650K | ~$500 |
| agentmemory | ~170K | ~$10 |
| agentmemory + local embeddings (all-MiniLM-L6-v2) | ~170K | $0 |

The local-embeddings path matters: it means you can run the whole stack without an OpenAI key and without sending your codebase context to a third party. For anyone doing client work under NDA, that's the difference between "interesting" and "actually usable."
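The savings claim in the table is easy to sanity-check. A minimal sketch, using the README's own token counts (the dollar figures in the table depend on provider pricing, so only the ratio is computed here):

```python
# Rough arithmetic behind the token-economics table above. The per-approach
# token counts come from the README; dollar costs depend on provider pricing,
# so we only verify the relative savings.

FULL_CONTEXT_TOKENS_PER_YEAR = 19_500_000   # paste-everything baseline
AGENTMEMORY_TOKENS_PER_YEAR = 170_000       # retrieval-based injection

savings_ratio = FULL_CONTEXT_TOKENS_PER_YEAR / AGENTMEMORY_TOKENS_PER_YEAR
print(f"~{savings_ratio:.0f}x fewer tokens per year")  # ~115x fewer tokens per year
```

Roughly two orders of magnitude, which is why the full-context row "exceeds context window" while the agentmemory rows stay in single-digit dollars.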


Quick Start

The install is genuinely one command:

```bash
# Terminal 1: start the server (runs on localhost:3111)
npx @agentmemory/agentmemory

# Terminal 2: seed sample data and watch retrieval work
npx @agentmemory/agentmemory demo
```

Open http://localhost:3113 to watch the memory build live in the real-time viewer. The demo command seeds three realistic sessions (JWT auth setup, an N+1 query fix, and a rate-limiting implementation) and then runs semantic searches against them — including queries like "database performance optimization" that should retrieve the N+1 fix purely on semantics, not keyword overlap.

For Claude Code, the integration is two commands inside the agent:

```bash
/plugin marketplace add rohitg00/agentmemory
/plugin install agentmemory
```

The plugin registers 12 lifecycle hooks (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact, Stop, SessionEnd, Notification, TaskCompleted, PostToolUseFailure, Subagent, plus the new filesystem-watcher hook), 4 skills (/recall, /remember, /session-history, /forget), and auto-wires the @agentmemory/mcp stdio server. You get 51 MCP tools without touching any other config.

Codex CLI gets a slimmer 6-hook plugin (Codex doesn't expose Subagent or SessionEnd events yet):

```bash
codex plugin marketplace add rohitg00/agentmemory
codex plugin install agentmemory
```

For Cursor, Windsurf, Cline, Claude Desktop, Gemini CLI, OpenCode, Roo Code, or anything else that speaks MCP, the config is the same block in every host's mcpServers object:

```json
{
  "mcpServers": {
    "agentmemory": {
      "command": "npx",
      "args": ["-y", "@agentmemory/mcp"],
      "env": {
        "AGENTMEMORY_URL": "http://localhost:3111"
      }
    }
  }
}
```

Verify everything is alive with curl http://localhost:3111/agentmemory/health. That's the whole install.


How It Actually Works

Five moving parts do the work. They're worth understanding because most of the failure modes for memory layers live in the seams between these pieces.

1. Auto-capture hooks

When you install the plugin, agentmemory's hooks fire on every prompt, every tool call, every tool result, and every stop. The hook scripts POST events to localhost:3111. There's no manual add() call to forget — capture is the default. This is the biggest practical difference from mem0, which requires you to explicitly call client.add() on the things you want remembered.
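A hook script is conceptually just "serialize the event, POST it, never block the agent." The sketch below illustrates that flow; the `/events` endpoint path and the field names are assumptions for illustration, not the documented wire format:

```python
import json
import urllib.request

# Illustrative sketch of what a PostToolUse hook might send to the local
# server. The endpoint path ("/events") and event field names are
# hypothetical -- check the agentmemory docs for the real wire format.

def build_event(hook: str, tool: str, payload: dict) -> dict:
    return {"hook": hook, "tool": tool, "payload": payload}

def post_event(event: dict, base_url: str = "http://localhost:3111") -> None:
    req = urllib.request.Request(
        f"{base_url}/events",  # hypothetical endpoint
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Fail gracefully: a down server must never block the agent.
    try:
        urllib.request.urlopen(req, timeout=2)
    except OSError:
        pass

event = build_event("PostToolUse", "Edit", {"file": "src/middleware/auth.ts"})
post_event(event)
```

The swallow-the-error behavior matters: capture is best-effort by design, so a crashed memory server degrades to a normal, memoryless session rather than a broken one.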

2. Four-tier memory pipeline

Raw captures go into a working memory buffer. Hourly sweeps compress observations into episodic memories (specific events: "JWT setup happened, here are the files touched"), then into semantic memories (generalized facts: "this project uses jose for JWT"), and finally into procedural memories (reusable workflows: "to add a protected route, edit middleware/auth.ts then add a test in tests/auth/"). Stale or duplicate entries get pruned automatically.
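The tier progression above can be sketched as a simple state machine. This is a toy stand-in for the real hourly sweep, which compresses and summarizes rather than just relabeling, but it shows the working → episodic → semantic → procedural ladder:

```python
from dataclasses import dataclass
from enum import Enum

# Toy model of the four-tier pipeline described above. Tier names match
# the article; the promotion logic is an illustrative stand-in for the
# real consolidation sweep, which compresses content as it promotes.

class Tier(Enum):
    WORKING = 1      # raw captured observations
    EPISODIC = 2     # specific events ("JWT setup happened, files touched")
    SEMANTIC = 3     # generalized facts ("this project uses jose for JWT")
    PROCEDURAL = 4   # reusable workflows ("edit auth.ts, then add a test")

@dataclass
class Memory:
    text: str
    tier: Tier

def promote(mem: Memory) -> Memory:
    """Move a memory one tier up the pipeline (no-op at the top tier)."""
    if mem.tier is Tier.PROCEDURAL:
        return mem
    return Memory(mem.text, Tier(mem.tier.value + 1))

m = Memory("JWT auth set up with jose in src/middleware/auth.ts", Tier.WORKING)
m = promote(promote(m))
print(m.tier)  # Tier.SEMANTIC
```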

3. Hybrid retrieval with reciprocal rank fusion

When the agent fires a memory_smart_search, agentmemory runs three retrievers in parallel:

  • BM25 for keyword and identifier matches
  • Vector search over all-MiniLM-L6-v2 embeddings for semantic matches
  • Knowledge-graph traversal for entity-linked memories (files, functions, decisions)

Results get fused with RRF and reranked on-device. This is what the 95.2% R@5 number is measuring — and it's the architectural reason agentmemory beats vector-only systems like mem0 on retrieval quality.
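Reciprocal rank fusion itself is a few lines: each retriever contributes `1 / (k + rank)` per document, and documents that rank well across several retrievers float to the top. A minimal sketch (the constant `k = 60` is the conventional default, not a value the project documents):

```python
# Minimal reciprocal rank fusion, the scheme used to merge the three
# retrievers. Standard formula: score(d) = sum over rankings of
# 1 / (k + rank(d)), with k = 60 as the conventional default.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["auth.ts", "rate-limit.ts", "db.ts"]
vector = ["db.ts", "auth.ts"]
graph = ["auth.ts", "db.ts"]
fused = rrf([bm25, vector, graph])
print(fused[0])  # auth.ts -- ranked 1st, 2nd, 1st across the retrievers
```

The appeal of RRF is that it needs no score normalization: BM25 scores, cosine similarities, and graph-hop distances live on incomparable scales, but ranks are always comparable.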

4. MCP server with 51 tools

The same memory store is exposed over MCP so any client can read it. Notable tools:

  • memory_smart_search — the main retrieval entry point
  • memory_save — explicit save (most agents won't need this; the hooks handle it)
  • memory_sessions — list and replay past sessions
  • memory_governance_delete — audit-logged deletes (more on this below)
  • memory_export — pull the whole store as JSON for backup or migration
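Under MCP, each of these tools is invoked with the standard `tools/call` JSON-RPC method. A sketch of the request an MCP client would send for the main retrieval tool (the argument names `query` and `limit` are illustrative guesses at `memory_smart_search`'s schema, not its documented parameters):

```python
import json

# Shape of the JSON-RPC 2.0 request an MCP client sends to invoke a tool.
# "tools/call" is the standard MCP method name; the argument names below
# ("query", "limit") are assumed for illustration.

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "memory_smart_search",
        "arguments": {
            "query": "why did we choose jose over jsonwebtoken?",
            "limit": 5,
        },
    },
}
print(json.dumps(request, indent=2))
```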

5. iii-engine and the real-time viewer

Under the hood, everything routes through iii-engine (a Rust runtime the same author maintains). The viewer at localhost:3113 reads the same mem::replay::load and mem::replay::sessions functions the CLI uses — no side-channel servers, no SSE proxy hacks. You can scrub through any past session with play/pause, 0.5×–4× speed, and keyboard shortcuts (space, arrow keys). For debugging "what did the agent see when it made that bad decision?" this is shockingly useful.


vs. The Alternatives

| Feature | agentmemory | mem0 (53K ⭐) | Letta / MemGPT (22K ⭐) | CLAUDE.md (built-in) |
|---|---|---|---|---|
| Type | Memory engine + MCP server | Memory layer API | Full agent runtime | Static file |
| Retrieval R@5 | 95.2% | 68.5% | 83.2% | N/A (grep) |
| Auto-capture | 12 hooks, zero manual effort | Manual add() calls | Agent self-edits | Manual editing |
| Search | BM25 + Vector + Graph (RRF) | Vector + Graph | Vector | Loads everything |
| Multi-agent shared memory | MCP + REST + leases | Per-instance API | Within Letta runtime only | Per-agent files |
| Framework lock-in | None (any MCP client) | None | High (must use Letta) | Per-agent format |
| External deps | None (SQLite + iii) | Qdrant or pgvector | Postgres + vector DB | None |
| Token efficiency | ~1,900 tokens/session | Varies | Core memory in context | 22K+ tokens at 240 obs |
| Self-hosted | Yes (default) | Optional (cloud-first) | Optional | Yes |

The honest comparison: mem0 is more mature as a hosted product and has more existing framework integrations (LangChain, LlamaIndex, etc.). Letta is more powerful if you want a full Postgres-backed multi-tenant agent platform. agentmemory wins on (a) zero-config local-first install, (b) measured retrieval accuracy, and (c) the auto-capture-via-hooks approach, which means you don't need to remember to remember things.


Community Reactions

The project surfaced on Reddit's r/ClaudeAI, r/ClaudeCode, and r/ChatGPT in early May 2026, and the reception has been split in a useful way.

The enthusiastic camp likes that there's finally a memory tool that ships benchmarks. From r/ClaudeCode: "First memory project I've seen that actually publishes LongMemEval numbers instead of vibes." Several commenters cite the multi-agent shared-memory angle — running Claude Code and Cursor against the same memory store — as the killer feature for teams that use multiple agents per project.

The skeptical camp raises three recurring concerns. First, the dependency on iii-engine adds a Rust runtime to your stack that most developers haven't heard of. Second, some early users reported token-burn regressions in v0.7 and v0.8 where the auto-capture was too aggressive; the v0.9.0 release in April 2026 codified an audit policy across every delete path and stopped the health check from flagging memory_critical on tiny Node processes — fixes that suggest the early concerns were real and have now been addressed. Third, there's a "please stop creating memory for your agent" school of thought on r/ClaudeCode that argues CLAUDE.md plus discipline beats any memory framework. That's a defensible position for solo developers on small projects; it stops being defensible the moment you're juggling three agents and a six-month-old codebase.

The Trendshift page tracks the repo at #1 trending across all of GitHub as of May 13, 2026, with the star-history graph showing a near-vertical climb from ~2K to ~9.4K stars in two weeks.


Honest Limitations

A short list of things that are real and not blockers for everyone:

  • Two terminals. You need the memory server running in a separate process. There's no daemon mode yet, so you either run it in tmux/screen or you'll need to remember to start it. A launchd/systemd setup is left as an exercise for the reader.
  • iii-engine is young. The runtime hit v0.11 in 2026 and is maintained by the same author. If you're allergic to single-maintainer dependencies, this is a real risk.
  • No team sync out of the box. Memory is per-machine. There's no built-in "share my project memory with my coworker" path — you'd export with memory_export, commit the JSON, and have the other dev import. For solo projects this is fine; for teams this is a missing feature.
  • SQLite is the storage layer. Great for local-first, but if you've already standardized on pgvector or Qdrant elsewhere, you're adding a new storage system instead of consolidating. There's no first-party Postgres backend.
  • Auto-capture means everything is captured. If you cat a secrets file during a session, that content lands in memory. The memory_governance_delete tool exists for exactly this case, but you have to know to use it. A built-in secret scanner would be welcome.
  • Local embedding model is small. all-MiniLM-L6-v2 is fast and free, but the recall numbers degrade vs. a larger embedding model. For most coding contexts, MiniLM is enough; for cross-language polyglot codebases, you'll probably want to switch to a larger model via the OpenAI embeddings endpoint.
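The export/import workaround for the missing team sync (mentioned in the list above) could look like the sketch below: one dev commits their `memory_export` JSON, a teammate merges it into their own export before importing. The `{"memories": [{"id": ..., "text": ...}]}` schema is a hypothetical stand-in, not the documented export format:

```python
# Sketch of the manual team-sync workaround: merge a teammate's
# memory_export JSON into your own, keeping your local copy on ID
# conflicts. The export schema here is hypothetical.

def merge_exports(mine: dict, theirs: dict) -> dict:
    by_id = {m["id"]: m for m in mine.get("memories", [])}
    for m in theirs.get("memories", []):
        by_id.setdefault(m["id"], m)  # keep the local copy on conflict
    return {"memories": list(by_id.values())}

mine = {"memories": [{"id": "a1", "text": "uses jose for JWT"}]}
theirs = {"memories": [{"id": "a1", "text": "uses jose for JWT"},
                       {"id": "b2", "text": "rate limit is 100 req/min"}]}
merged = merge_exports(mine, theirs)
print(len(merged["memories"]))  # 2
```

Crude, but it makes the point: until team sync is first-class, sharing memory is a JSON-wrangling exercise.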

Should You Use It?

Yes if any of these are true:

  • You use multiple AI coding agents (e.g. Claude Code + Cursor) on the same project and want them to share context.
  • You've felt the pain of re-explaining your architecture to your agent on a long-running project.
  • You want measured retrieval quality instead of trusting that vector search just works.
  • You want local-first memory that doesn't ship your code context to a cloud vendor.
  • You're already running an MCP-capable agent (which is now almost all of them).

Probably no if any of these are true:

  • You're a solo developer on a small, short-lived project where CLAUDE.md actually fits in 200 lines.
  • You need team-shared memory that syncs across developers' machines without manual export/import.
  • You're allergic to adding a Rust runtime (iii-engine) and a new SQLite store to your dev environment.
  • You're already deep on mem0 or Letta and the migration cost outweighs the retrieval gain.

For andrew.ooo's content pipeline — which juggles Claude Code, an OpenClaw bot, and occasional Cursor sessions against the same codebase — agentmemory is a clear win and is going on the eval shortlist for next sprint.


FAQ

Is agentmemory free?

Yes. MIT-licensed, runs entirely on your machine, and works with the local all-MiniLM-L6-v2 embedding model so there's no API key required. If you want stronger embeddings, you can plug in OpenAI's text-embedding-3-large (cost: roughly $10/year at typical coding-agent usage) or any other compatible embedding provider.

Does it work with Claude Code, Cursor, and Codex CLI at the same time?

Yes — that's a headline feature. All three (and anything else MCP-compatible) read and write the same memory store via the server on localhost:3111. Memory captured by Claude Code is retrievable from Cursor in the next session, and vice versa.

How is it different from CLAUDE.md or .cursorrules?

CLAUDE.md and .cursorrules are flat files you load wholesale into context. They cap out around 200 lines, go stale, and don't share across agents. agentmemory is a retrieval pipeline — it stores thousands of memories and injects only the ~5 most relevant ones per query, which means you can have hundreds of accumulated facts about your project without burning your context window.

Will it leak my code to a third party?

Not by default. The default install uses local SQLite for storage and local embeddings (all-MiniLM-L6-v2) for vectors. Nothing leaves your machine unless you explicitly configure an external embedding provider (OpenAI, Voyage, etc.). For NDA work, run with the defaults.

Can I run it on a server for a team?

You can — the server binds to localhost by default, but the README documents how to expose it over a network. You'd need to add auth (it ships without auth, which is fine for localhost-only) before exposing it publicly. The "team-shared memory" use case isn't first-class yet; the recommended pattern is exporting/importing JSON for now.

What happens if the memory server is down?

The MCP client falls back to no-memory mode (it'll work like a regular agent without memory). Hook scripts fail gracefully — they log the error but don't block the agent. This is the right default, but it does mean you'll silently lose new memory captures if you don't notice the server is down. Check localhost:3111/agentmemory/health periodically or wire it into your status bar.
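A tiny liveness probe, suitable for a status bar or cron job, can hit the health endpoint mentioned above and return a boolean instead of raising:

```python
import urllib.request

# Minimal liveness probe for the memory server. Returns False instead of
# raising, so it's safe to poll from a status bar or cron job.

def memory_server_up(url: str = "http://localhost:3111/agentmemory/health",
                     timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False

if memory_server_up():
    print("memory server: up")
else:
    print("memory server: DOWN -- new captures are being lost")
```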

How big does the SQLite database get?

The benchmarks page reports roughly 50MB per 1,000 sessions with full transcripts captured. The four-tier consolidation pipeline keeps long-term growth sub-linear because raw observations get compressed into semantic memories and pruned. For most users, you'll never notice the disk usage.
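If you want a back-of-envelope number for your own usage, the 50MB-per-1,000-sessions figure gives a simple linear upper bound (an upper bound precisely because consolidation prunes raw observations over time):

```python
# Naive linear estimate from the ~50MB per 1,000 sessions figure above.
# Treat it as an upper bound: consolidation compresses and prunes raw
# observations, so reported real-world growth is sub-linear.

MB_PER_1000_SESSIONS = 50.0

def estimated_db_mb(sessions: int) -> float:
    return sessions * MB_PER_1000_SESSIONS / 1000.0

print(estimated_db_mb(10_000))  # 500.0 -- worst-case MB for 10k sessions
```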


Bottom Line

agentmemory is the first persistent-memory project for AI coding agents that ships real benchmarks, a one-command install, and works across every major MCP-capable agent without lock-in. The 95.2% R@5 on LongMemEval is the headline number, but the architectural choice that matters most is the auto-capture-via-hooks approach: memory is the default, not something you have to remember to do.

The dependency on iii-engine is the main risk. The missing team-sync story is the main feature gap. Neither is a blocker for an individual developer or a small team who wants their AI agents to actually remember things.

If you've been waiting for the moment when AI coding agents stop being amnesiac, this is the moment.

Try it: npx @agentmemory/agentmemory
Repo: github.com/rohitg00/agentmemory
Site: agent-memory.dev
