Jeff Witters

Posted on • Originally published at github.com

Give Your AI Agent Persistent Memory in 30 Seconds

Your AI agent is brilliant in the moment. Then the session ends, and it forgets everything.

Every new conversation starts from zero. It doesn't remember that you prefer TypeScript. It doesn't know the architectural decision you made last week. It doesn't know it already tried that approach and it didn't work.

This is the agent memory problem. Most solutions involve vector databases, API keys, and cloud infrastructure. engram-mcp doesn't.

npx -y @cartisien/engram-mcp

That's it. Persistent semantic memory for Claude Desktop, Cursor, Windsurf, or any MCP client — in 30 seconds, no signup, no cloud.


The Problem with Agent Memory Today

The common approach: dump everything into a vector store. Every message, every fact, every decision — stored with equal confidence, recalled with equal weight.

The result after a few weeks: contradictory facts at similar confidence scores. The agent remembers both "user prefers dark mode" and "user prefers light mode" and doesn't know which is current. It remembers five different attempts at the same problem with no signal about which one worked.

More memory doesn't automatically mean better memory.


How engram-mcp Works

Storage: SQLite. No server to run, no port to expose, no Docker container. The database lives at ~/.engram/memory.db by default. It's a file.

Semantic search: Uses Ollama + nomic-embed-text locally. Embeddings are computed on your machine. No API key, no data leaving your box.

Fallback: If Ollama isn't running, it falls back to keyword search automatically. You never get a crash — you get a slightly less smart search.
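A rough sketch of what that keyword fallback can look like (illustrative only — names like keywordRecall are mine, not the package's actual API):

```typescript
type Memory = { id: number; text: string };

// Naive keyword relevance: the fraction of query tokens found in the text.
function keywordScore(query: string, text: string): number {
  const tokens = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (tokens.length === 0) return 0;
  const haystack = text.toLowerCase();
  return tokens.filter((t) => haystack.includes(t)).length / tokens.length;
}

// When no embedding backend is reachable, rank by keyword overlap instead.
function keywordRecall(memories: Memory[], query: string): Memory[] {
  return memories
    .map((m) => ({ m, score: keywordScore(query, m.text) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((x) => x.m);
}
```

Less precise than semantic search, but it degrades rather than fails — which is the point.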

Sessions: Memories are scoped by sessionId. Your Claude Desktop agent, your Cursor agent, and your personal automation scripts can each have their own isolated memory space — or share one.
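In storage terms, session scoping is just a sessionId attached to every entry that every read filters on. A toy in-memory illustration (a hypothetical class, not the package's actual API):

```typescript
type Entry = { sessionId: string; text: string; createdAt: number };

class MemoryStore {
  private entries: Entry[] = [];

  remember(sessionId: string, text: string): void {
    this.entries.push({ sessionId, text, createdAt: Date.now() });
  }

  // Only entries from the given session are visible to its agent.
  history(sessionId: string): string[] {
    return this.entries
      .filter((e) => e.sessionId === sessionId)
      .map((e) => e.text);
  }
}
```

Two agents using different sessionIds never see each other's entries; two agents sharing one sessionId share a memory.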


Setup: Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (on macOS; on Windows it's %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "engram": {
      "command": "npx",
      "args": ["-y", "@cartisien/engram-mcp"]
    }
  }
}

Restart Claude Desktop. You now have 5 new tools:

• remember: Store a memory with automatic embedding
• recall: Semantic search — "what did I say about auth?"
• history: Recent entries in chronological order
• forget: Delete a memory, a session, or entries before a date
• stats: How much is stored, embedding coverage, etc.

Setup: Cursor / Windsurf

The config block is identical — add it to your client's MCP settings file (for Cursor, ~/.cursor/mcp.json):

{
  "mcpServers": {
    "engram": {
      "command": "npx",
      "args": ["-y", "@cartisien/engram-mcp"]
    }
  }
}

What It Looks Like in Practice

After a few sessions, your agent builds up real context:

recall(sessionId="myproject", query="why did we choose SQLite over postgres?")

→ "Chose SQLite to avoid infra requirements for local-first tools. 
   Postgres adds a server dependency that breaks the zero-config install story."
   (similarity: 0.91)

The agent can ask itself questions about its own history and get back coherent, relevant answers — not a flat list of everything it's ever stored.


Local-First Is a Real Constraint

Most agent memory tools are hosted. That's a fine choice for many teams, but it means:

  • Your agent's memory (and by extension, context about your work) lives on someone else's server
  • There's a network call on every recall
  • There's a subscription or usage cost as memory grows
  • There's a new service to keep running

engram-mcp stores everything in a SQLite file on your machine. The embedding model runs locally via Ollama. The search happens in-process. There's no external service to maintain, authenticate against, or pay for.


Semantic Search Without a Vector Database

This is the part people ask about most.

Traditional approach: run a vector database (Qdrant, Pinecone, Chroma), push embeddings into it, query by cosine similarity. Works great, but requires running and maintaining a separate process.

Our approach: store embeddings as raw floats in SQLite, compute cosine similarity in the application layer at query time. For personal-scale memory (thousands to tens of thousands of entries), this is fast enough — and it eliminates the dependency.
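The ranking step is small enough to inline. A sketch, assuming each row's embedding has already been deserialized from SQLite into a number[]:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored rows against a query vector, most similar first.
function rank(
  query: number[],
  rows: { id: number; embedding: number[] }[],
  topK = 5,
): { id: number; similarity: number }[] {
  return rows
    .map((r) => ({ id: r.id, similarity: cosineSimilarity(query, r.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
```

A linear scan over a few thousand 768-dimensional vectors is milliseconds of work — the vector database only starts paying for itself at a scale personal memory never reaches.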

Ollama generates the embeddings locally. nomic-embed-text is small, fast, and good at semantic similarity on natural-language text.

# One-time setup
ollama pull nomic-embed-text

After that, recall finds semantically similar memories even when the exact words don't match.


Install

# Verify it works
npx -y @cartisien/engram-mcp

Built on @cartisien/engram — the underlying memory SDK if you want to integrate it directly rather than through MCP.


I'm curious what other approaches people are using for agent memory. The certainty/contradiction problem in particular — most tools I've seen treat all stored facts as equally valid, which compounds badly over time.
