Originally published on andrew.ooo — visit the original for any updates, code snippets that aged out, or follow-up posts.
TL;DR
agentmemory is an open-source persistent memory layer for AI coding agents. It silently captures what your agent does, compresses it into searchable memory, and injects the right context into the next session. One server, shared across Claude Code, Cursor, Codex CLI, Gemini CLI, Cline, Windsurf, Roo Code, OpenCode, and anything else that speaks MCP.
- 9,361 GitHub stars with 6,467 stars this week (#1 trending repo as of May 2026)
- 95.2% R@5 on LongMemEval-S (ICLR 2025, 500 questions) — beats mem0 (68.5%) and Letta/MemGPT (83.2%)
- ~170K tokens/year vs ~19.5M for paste-full-context approaches — roughly $10/yr running cost (free with local embeddings)
- 12 auto-capture hooks for Claude Code, 6 for Codex CLI, MCP server for everything else
- 51 MCP tools (memory_smart_search, memory_save, memory_sessions, memory_governance_delete, etc.)
- Zero external dependencies — local-first SQLite + iii-engine, no Qdrant or Postgres required
- License: MIT, npm package @agentmemory/agentmemory
If you've ever re-explained your auth setup, project conventions, or "why we chose X over Y" to your coding agent for the fifth time this week, this is the project to try.
Why This Matters
Every AI coding agent has the same problem: it forgets everything when the session ends. The official answer is a flat file — CLAUDE.md, .cursorrules, AGENTS.md — that caps out around 200 lines and goes stale within a sprint. You either re-paste the same architecture overview every session, or you accept that your agent re-discovers the same bugs and re-asks the same questions forever.
The deeper problem is that flat-file memory doesn't scale across agents. If you use Claude Code for refactoring, Cursor for autocomplete, and Codex CLI for shell tasks, none of them share what they learned. Each agent re-builds context from scratch every time.
agentmemory takes a different bet: run a single persistent memory server on localhost, let every agent read and write through MCP (or hooks, or REST), and use a real retrieval pipeline — BM25 + vectors + a knowledge graph fused with reciprocal rank fusion — instead of grepping a markdown file.
The README's example captures the value cleanly: "Session 1 you set up JWT auth. Session 2 you ask for rate limiting. The agent already knows your auth uses jose middleware in src/middleware/auth.ts, your tests cover token validation, and you chose jose over jsonwebtoken for Edge compatibility."
Benchmarks: What the Numbers Actually Say
agentmemory is one of the few memory layers that ships real benchmark numbers in the README, against published evaluations.
Retrieval Accuracy (LongMemEval-S, 500 questions)
| System | R@5 | R@10 | MRR |
|---|---|---|---|
| agentmemory | 95.2% | 98.6% | 88.2% |
| BM25-only fallback | 86.2% | 94.6% | 71.5% |
Compare that to the LoCoMo numbers the competitors publish:
| System | R@5 (LoCoMo) | Source |
|---|---|---|
| agentmemory | 95.2% (LongMemEval) | This repo |
| mem0 | 68.5% | mem0 paper |
| Letta / MemGPT | 83.2% | Letta docs |
| CLAUDE.md grep | N/A (no semantic recall) | — |
LongMemEval and LoCoMo aren't identical benchmarks, so this isn't a perfectly clean head-to-head — but the gap is large enough that the directional claim ("more accurate retrieval than the popular alternatives") survives the methodology caveat.
Token Economics
| Approach | Tokens/year | Cost/year |
|---|---|---|
| Paste full context every session | 19.5M+ | Exceeds context window |
| LLM-summarized context | ~650K | ~$500 |
| agentmemory | ~170K | ~$10 |
| agentmemory + local embeddings (all-MiniLM-L6-v2) | ~170K | $0 |
The local-embeddings path matters: it means you can run the whole stack without an OpenAI key and without sending your codebase context to a third party. For anyone doing client work under NDA, that's the difference between "interesting" and "actually usable."
Quick Start
The install is genuinely one command:
```bash
# Terminal 1: start the server (runs on localhost:3111)
npx @agentmemory/agentmemory

# Terminal 2: seed sample data and watch retrieval work
npx @agentmemory/agentmemory demo
```
Open http://localhost:3113 to watch the memory build live in the real-time viewer. The demo command seeds three realistic sessions (JWT auth setup, an N+1 query fix, and a rate-limiting implementation) and then runs semantic searches against them — including queries like "database performance optimization" that should retrieve the N+1 fix purely on semantics, not keyword overlap.
For Claude Code, the integration is two commands inside the agent:
```
/plugin marketplace add rohitg00/agentmemory
/plugin install agentmemory
```
The plugin registers 12 lifecycle hooks (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, PreCompact, Stop, SessionEnd, Notification, TaskCompleted, PostToolUseFailure, Subagent, plus the new filesystem-watcher hook), 4 skills (/recall, /remember, /session-history, /forget), and auto-wires the @agentmemory/mcp stdio server. You get 51 MCP tools without touching any other config.
Codex CLI gets a slimmer 6-hook plugin (Codex doesn't expose Subagent or SessionEnd events yet):
```bash
codex plugin marketplace add rohitg00/agentmemory
codex plugin install agentmemory
```
For Cursor, Windsurf, Cline, Claude Desktop, Gemini CLI, OpenCode, Roo Code, or anything else that speaks MCP, the config is the same block in every host's mcpServers object:
```json
{
  "mcpServers": {
    "agentmemory": {
      "command": "npx",
      "args": ["-y", "@agentmemory/mcp"],
      "env": {
        "AGENTMEMORY_URL": "http://localhost:3111"
      }
    }
  }
}
```
Verify everything is alive with curl http://localhost:3111/agentmemory/health. That's the whole install.
How It Actually Works
Five moving parts do the work. They're worth understanding because most of the failure modes for memory layers live in the seams between these pieces.
1. Auto-capture hooks
When you install the plugin, agentmemory's hooks fire on every prompt, every tool call, every tool result, and every stop. The hook scripts POST events to localhost:3111. There's no manual add() call to forget — capture is the default. This is the biggest practical difference from mem0, which requires you to explicitly call client.add() on the things you want remembered.
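To make the capture path concrete, here's a minimal sketch of what one of these hooks could look like as a standalone script. This is a hypothetical illustration, not the plugin's actual code: the event shape and the /agentmemory/events path are assumptions (only the port is documented).

```typescript
// Hypothetical PostToolUse capture hook (event shape and endpoint path
// are assumptions for illustration). Claude Code hooks receive event JSON
// on stdin; this forwards it to the local agentmemory server.
async function main(): Promise<void> {
  const chunks: Buffer[] = [];
  for await (const chunk of process.stdin) chunks.push(chunk as Buffer);
  const event = JSON.parse(Buffer.concat(chunks).toString("utf8"));

  try {
    await fetch("http://localhost:3111/agentmemory/events", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ hook: "PostToolUse", capturedAt: Date.now(), event }),
    });
  } catch (err) {
    // Fail gracefully: log and exit 0 so a down server never blocks the agent.
    console.error(`agentmemory capture skipped: ${(err as Error).message}`);
  }
}

main();
```

The fail-open error handling mirrors the documented behavior: capture should never block the agent, even when the server is down.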
2. Four-tier memory pipeline
Raw captures go into a working memory buffer. Hourly sweeps compress observations into episodic memories (specific events: "JWT setup happened, here are the files touched"), then into semantic memories (generalized facts: "this project uses jose for JWT"), and finally into procedural memories (reusable workflows: "to add a protected route, edit middleware/auth.ts then add a test in tests/auth/"). Stale or duplicate entries get pruned automatically.
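A rough TypeScript model of the promotion flow, to make the tiers concrete — the names and fields here are illustrative, not the project's actual schema:

```typescript
// Illustrative model of the four-tier pipeline (names assumed, not the
// project's actual schema). Hourly sweeps promote entries downward and
// prune stale or duplicate rows along the way.
type MemoryTier = "working" | "episodic" | "semantic" | "procedural";

interface MemoryEntry {
  id: string;
  tier: MemoryTier;
  content: string;    // e.g. "this project uses jose for JWT"
  entities: string[]; // linked files/functions/decisions for the graph
  createdAt: number;
}

// Promotion order: raw captures -> specific events -> generalized facts
// -> reusable workflows. Procedural is the terminal tier.
const PROMOTES_TO: Record<MemoryTier, MemoryTier | null> = {
  working: "episodic",
  episodic: "semantic",
  semantic: "procedural",
  procedural: null,
};
```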
3. Hybrid retrieval with reciprocal rank fusion
When the agent fires a memory_smart_search, agentmemory runs three retrievers in parallel:
- BM25 for keyword and identifier matches
- Vector search over all-MiniLM-L6-v2 embeddings for semantic matches
- Knowledge-graph traversal for entity-linked memories (files, functions, decisions)
Results get fused with RRF and reranked on-device. This is what the 95.2% R@5 number is measuring — and it's the architectural reason agentmemory beats vector-only systems like mem0 on retrieval quality.
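The fusion step itself is small. Here's a minimal sketch of reciprocal rank fusion over the three retrievers' ranked result lists — k = 60 is the conventional default from the RRF literature, and any per-retriever weighting agentmemory applies on top is not shown:

```typescript
// Minimal reciprocal rank fusion: score(d) = sum of 1 / (k + rank(d))
// across each retriever's ranked list. k = 60 is the standard default;
// agentmemory's exact weighting may differ.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}

// A memory ranked decently by all three retrievers beats one ranked
// first by a single retriever — that's the point of fusing.
const fused = rrfFuse([
  ["m1", "m7", "m3"], // BM25 ranking
  ["m7", "m2", "m1"], // vector ranking
  ["m3", "m7", "m9"], // knowledge-graph ranking
]);
console.log(fused[0]); // "m7"
```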
4. MCP server with 51 tools
The same memory store is exposed over MCP so any client can read it. Notable tools:
- memory_smart_search — the main retrieval entry point
- memory_save — explicit save (most agents won't need this; the hooks handle it)
- memory_sessions — list and replay past sessions
- memory_governance_delete — audit-logged deletes (more on this below)
- memory_export — pull the whole store as JSON for backup or migration
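For a feel of the tool surface from outside an agent, here's a sketch using the official MCP TypeScript SDK (@modelcontextprotocol/sdk). The arguments passed to memory_smart_search are assumptions — list the tools first and check the declared input schema:

```typescript
// Probe agentmemory's MCP server with the official TypeScript SDK.
// The search arguments below are assumptions; verify them against the
// tool's inputSchema returned by listTools().
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@agentmemory/mcp"],
  env: { AGENTMEMORY_URL: "http://localhost:3111" },
});

const client = new Client({ name: "memory-probe", version: "0.1.0" });
await client.connect(transport);

const { tools } = await client.listTools(); // should list all 51 tools
console.log(tools.map((t) => t.name).join("\n"));

const result = await client.callTool({
  name: "memory_smart_search",
  arguments: { query: "why did we choose jose over jsonwebtoken?", limit: 5 },
});
console.log(JSON.stringify(result.content, null, 2));
await client.close();
```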
5. iii-engine and the real-time viewer
Under the hood, everything routes through iii-engine (a Rust runtime the same author maintains). The viewer at localhost:3113 reads the same mem::replay::load and mem::replay::sessions functions the CLI uses — no side-channel servers, no SSE proxy hacks. You can scrub through any past session with play/pause, 0.5×–4× speed, and keyboard shortcuts (space, arrow keys). For debugging "what did the agent see when it made that bad decision?" this is shockingly useful.
vs. The Alternatives
| Feature | agentmemory | mem0 (53K ⭐) | Letta / MemGPT (22K ⭐) | CLAUDE.md (built-in) |
|---|---|---|---|---|
| Type | Memory engine + MCP server | Memory layer API | Full agent runtime | Static file |
| Retrieval R@5 | 95.2% | 68.5% | 83.2% | N/A (grep) |
| Auto-capture | 12 hooks, zero manual effort | Manual add() calls | Agent self-edits | Manual editing |
| Search | BM25 + Vector + Graph (RRF) | Vector + Graph | Vector | Loads everything |
| Multi-agent shared memory | MCP + REST + leases | Per-instance API | Within Letta runtime only | Per-agent files |
| Framework lock-in | None (any MCP client) | None | High (must use Letta) | Per-agent format |
| External deps | None (SQLite + iii) | Qdrant or pgvector | Postgres + vector DB | None |
| Token efficiency | ~1,900 tokens/session | Varies | Core memory in context | 22K+ tokens at 240 obs |
| Self-hosted | Yes (default) | Optional (cloud-first) | Optional | Yes |
The honest comparison: mem0 is more mature as a hosted product and has more existing framework integrations (LangChain, LlamaIndex, etc.). Letta is more powerful if you want a full Postgres-backed multi-tenant agent platform. agentmemory wins on (a) zero-config local-first install, (b) measured retrieval accuracy, and (c) the auto-capture-via-hooks approach, which means you don't need to remember to remember things.
Community Reactions
The project surfaced on Reddit's r/ClaudeAI, r/ClaudeCode, and r/ChatGPT in early May 2026, and the reception has been split in a useful way.
The enthusiastic camp likes that there's finally a memory tool that ships benchmarks. From r/ClaudeCode: "First memory project I've seen that actually publishes LongMemEval numbers instead of vibes." Several commenters cite the multi-agent shared-memory angle — running Claude Code and Cursor against the same memory store — as the killer feature for teams that use multiple agents per project.
The skeptical camp raises three recurring concerns. First, the dependency on iii-engine adds a Rust runtime to your stack that most developers haven't heard of. Second, some early users reported token-burn regressions in v0.7 and v0.8, where auto-capture was too aggressive; the v0.9.0 release in April 2026 added an audit policy codified across every delete path and a fix so that "health stops flagging memory_critical on tiny Node processes" — changes that suggest the early concerns were real and have now been addressed. Third, there's a "please stop creating memory for your agent" school of thought on r/ClaudeCode that argues CLAUDE.md plus discipline beats any memory framework. That's a defensible position for solo developers on small projects; it stops being defensible the moment you're juggling three agents and a six-month-old codebase.
The Trendshift page tracks the repo at #1 trending across all of GitHub as of May 13, 2026, with the star-history graph showing a near-vertical climb from ~2K to ~9.4K stars in two weeks.
Honest Limitations
A short list of things that are real and not blockers for everyone:
- Two terminals. You need the memory server running in a separate process. There's no daemon mode yet, so you either run it in tmux/screen or you'll need to remember to start it. A launchd/systemd setup is left as an exercise for the reader.
- iii-engine is young. The runtime hit v0.11 in 2026 and is maintained by the same author. If you're allergic to single-maintainer dependencies, this is a real risk.
- No team sync out of the box. Memory is per-machine. There's no built-in "share my project memory with my coworker" path — you'd export with memory_export, commit the JSON, and have the other dev import. For solo projects this is fine; for teams this is a missing feature.
- SQLite is the storage layer. Great for local-first, but if you've already standardized on pgvector or Qdrant elsewhere, you're adding a new storage system instead of consolidating. There's no first-party Postgres backend.
- Auto-capture means everything is captured. If you cat a secrets file during a session, that content lands in memory. The memory_governance_delete tool exists for exactly this case (a hedged cleanup sketch follows this list), but you have to know to use it. A built-in secret scanner would be welcome.
- Local embedding model is small. all-MiniLM-L6-v2 is fast and free, but recall degrades versus a larger embedding model. For most coding contexts, MiniLM is enough; for cross-language polyglot codebases, you'll probably want to switch to a larger model via the OpenAI embeddings endpoint.
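For the leaked-secret case, the cleanup might look something like this. Only the tool name memory_governance_delete comes from the docs — the REST path and payload fields here are hypothetical:

```typescript
// Hypothetical cleanup after cat-ing a secrets file into a session.
// The REST path and payload shape are assumptions; only the tool name
// memory_governance_delete is documented. Deletes are audit-logged.
const res = await fetch(
  "http://localhost:3111/agentmemory/tools/memory_governance_delete",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: ".env.production",                    // match memories citing the file
      reason: "secret leaked into session memory", // recorded in the audit log
    }),
  },
);
console.log(res.status, await res.text());
```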
Should You Use It?
Yes if any of these are true:
- You use multiple AI coding agents (e.g. Claude Code + Cursor) on the same project and want them to share context.
- You've felt the pain of re-explaining your architecture to your agent on a long-running project.
- You want measured retrieval quality instead of trusting that vector search just works.
- You want local-first memory that doesn't ship your code context to a cloud vendor.
- You're already running an MCP-capable agent (which is now almost all of them).
Probably no if any of these are true:
- You're a solo developer on a small, short-lived project where CLAUDE.md actually fits in 200 lines.
- You need team-shared memory that syncs across developers' machines without manual export/import.
- You're allergic to adding a Rust runtime (iii-engine) and a new SQLite store to your dev environment.
- You're already deep on mem0 or Letta and the migration cost outweighs the retrieval gain.
For andrew.ooo's content pipeline — which juggles Claude Code, an OpenClaw bot, and occasional Cursor sessions against the same codebase — agentmemory is a clear win and is going on the eval shortlist for next sprint.
FAQ
Is agentmemory free?
Yes. MIT-licensed, runs entirely on your machine, and works with the local all-MiniLM-L6-v2 embedding model so there's no API key required. If you want stronger embeddings, you can plug in OpenAI's text-embedding-3-large (cost: roughly $10/year at typical coding-agent usage) or any other compatible embedding provider.
Does it work with Claude Code, Cursor, and Codex CLI at the same time?
Yes — that's a headline feature. All three (and anything else MCP-compatible) read and write the same memory store via the server on localhost:3111. Memory captured by Claude Code is retrievable from Cursor in the next session, and vice versa.
How is it different from CLAUDE.md or .cursorrules?
CLAUDE.md and .cursorrules are flat files you load wholesale into context. They cap out around 200 lines, go stale, and don't share across agents. agentmemory is a retrieval pipeline — it stores thousands of memories and injects only the ~5 most relevant ones per query, which means you can have hundreds of accumulated facts about your project without burning your context window.
Will it leak my code to a third party?
Not by default. The default install uses local SQLite for storage and local embeddings (all-MiniLM-L6-v2) for vectors. Nothing leaves your machine unless you explicitly configure an external embedding provider (OpenAI, Voyage, etc.). For NDA work, run with the defaults.
Can I run it on a server for a team?
You can — the server binds to localhost by default, but the README documents how to expose it over a network. You'd need to add auth (it ships without auth, which is fine for localhost-only) before exposing it publicly. The "team-shared memory" use case isn't first-class yet; the recommended pattern is exporting/importing JSON for now.
What happens if the memory server is down?
The MCP client falls back to no-memory mode (it'll work like a regular agent without memory). Hook scripts fail gracefully — they log the error but don't block the agent. This is the right default, but it does mean you'll silently lose new memory captures if you don't notice the server is down. Check localhost:3111/agentmemory/health periodically or wire it into your status bar.
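A trivial watchdog covers the silent-loss gap — the health endpoint is documented; the polling interval and alerting choices below are just one way to do it:

```typescript
// Poll the documented health endpoint and warn when new memories
// would be silently dropped because the server is down.
const HEALTH_URL = "http://localhost:3111/agentmemory/health";

setInterval(async () => {
  try {
    const res = await fetch(HEALTH_URL, { signal: AbortSignal.timeout(2000) });
    if (!res.ok) console.warn(`agentmemory unhealthy: HTTP ${res.status}`);
  } catch {
    console.warn("agentmemory down — new memory captures are being lost");
  }
}, 60_000);
```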
How big does the SQLite database get?
The benchmarks page reports roughly 50MB per 1,000 sessions with full transcripts captured. The four-tier consolidation pipeline keeps long-term growth sub-linear because raw observations get compressed into semantic memories and pruned. For most users, you'll never notice the disk usage.
Bottom Line
agentmemory is the first persistent-memory project for AI coding agents that ships real benchmarks, a one-command install, and works across every major MCP-capable agent without lock-in. The 95.2% R@5 on LongMemEval is the headline number, but the architectural choice that matters most is the auto-capture-via-hooks approach: memory is the default, not something you have to remember to do.
The dependency on iii-engine is the main risk. The missing team-sync story is the main feature gap. Neither is a blocker for an individual developer or a small team who wants their AI agents to actually remember things.
If you've been waiting for the moment when AI coding agents stop being amnesiac, this is the moment.
Try it: npx @agentmemory/agentmemory
Repo: github.com/rohitg00/agentmemory
Site: agent-memory.dev