Jeff Witters

Posted on • Originally published at github.com

Give Your AI Agent Persistent Memory in 30 Seconds

Your AI agent is brilliant in the moment. Then the session ends, and it forgets everything.

Every new conversation starts from zero. It doesn't remember that you prefer TypeScript. It doesn't know the architectural decision you made last week. It doesn't know it already tried that approach and it didn't work.

This is the agent memory problem. Most solutions involve vector databases, API keys, and cloud infrastructure. engram-mcp doesn't.

npx -y @cartisien/engram-mcp

That's it. Persistent semantic memory for Claude Desktop, Cursor, Windsurf, or any MCP client — in 30 seconds, no signup, no cloud.


The Problem with Agent Memory Today

The common approach: dump everything into a vector store. Every message, every fact, every decision — stored with equal confidence, recalled with equal weight.

The result after a few weeks: contradictory facts at similar confidence scores. The agent remembers both "user prefers dark mode" and "user prefers light mode" and doesn't know which is current. It remembers five different attempts at the same problem with no signal about which one worked.

More memory doesn't automatically mean better memory.


How engram-mcp Works

Storage: SQLite. No server to run, no port to expose, no Docker container. The database lives at ~/.engram/memory.db by default. It's a file.

Semantic search: Uses Ollama + nomic-embed-text locally. Embeddings are computed on your machine. No API key, no data leaving your box.

Fallback: If Ollama isn't running, it falls back to keyword search automatically. You never get a crash — you get a slightly less smart search.
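A rough sketch of what that keyword fallback can look like (illustrative only — names like keywordRecall are mine, not the package's actual API):

```typescript
type Memory = { id: number; text: string };

// Naive keyword relevance: the fraction of query tokens found in the text.
function keywordScore(query: string, text: string): number {
  const tokens = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (tokens.length === 0) return 0;
  const haystack = text.toLowerCase();
  return tokens.filter((t) => haystack.includes(t)).length / tokens.length;
}

// When no embedding backend is reachable, rank by keyword overlap instead.
function keywordRecall(memories: Memory[], query: string): Memory[] {
  return memories
    .map((m) => ({ m, score: keywordScore(query, m.text) }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((x) => x.m);
}
```

Less precise than semantic search, but it degrades rather than fails — which is the point.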

Sessions: Memories are scoped by sessionId. Your Claude Desktop agent, your Cursor agent, and your personal automation scripts can each have their own isolated memory space — or share one.
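In storage terms, session scoping is just a sessionId attached to every entry that every read filters on. A toy in-memory illustration (a hypothetical class, not the package's actual API):

```typescript
type Entry = { sessionId: string; text: string; createdAt: number };

class MemoryStore {
  private entries: Entry[] = [];

  remember(sessionId: string, text: string): void {
    this.entries.push({ sessionId, text, createdAt: Date.now() });
  }

  // Only entries from the given session are visible to its agent.
  history(sessionId: string): string[] {
    return this.entries
      .filter((e) => e.sessionId === sessionId)
      .map((e) => e.text);
  }
}
```

Two agents using different sessionIds never see each other's entries; two agents sharing one sessionId share a memory.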


Setup: Claude Desktop

Add to ~/Library/Application Support/Claude/claude_desktop_config.json (on macOS; on Windows it's %APPDATA%\Claude\claude_desktop_config.json):

{
  "mcpServers": {
    "engram": {
      "command": "npx",
      "args": ["-y", "@cartisien/engram-mcp"]
    }
  }
}

Restart Claude Desktop. You now have 5 new tools:

• remember: Store a memory with automatic embedding
• recall: Semantic search — "what did I say about auth?"
• history: Recent entries in chronological order
• forget: Delete a memory, a session, or entries before a date
• stats: How much is stored, embedding coverage, etc.

Setup: Cursor / Windsurf

The config block is identical — add it to your client's MCP settings file (for Cursor, ~/.cursor/mcp.json):

{
  "mcpServers": {
    "engram": {
      "command": "npx",
      "args": ["-y", "@cartisien/engram-mcp"]
    }
  }
}

What It Looks Like in Practice

After a few sessions, your agent builds up real context:

recall(sessionId="myproject", query="why did we choose SQLite over postgres?")

→ "Chose SQLite to avoid infra requirements for local-first tools. 
   Postgres adds a server dependency that breaks the zero-config install story."
   (similarity: 0.91)

The agent can ask itself questions about its own history and get back coherent, relevant answers — not a flat list of everything it's ever stored.


Local-First Is a Real Constraint

Most agent memory tools are hosted. That's a fine choice for many teams, but it means:

  • Your agent's memory (and by extension, context about your work) lives on someone else's server
  • There's a network call on every recall
  • There's a subscription or usage cost as memory grows
  • There's a new service to keep running

engram-mcp stores everything in a SQLite file on your machine. The embedding model runs locally via Ollama. The search happens in-process. There's no external service to maintain, authenticate against, or pay for.


Semantic Search Without a Vector Database

This is the part people ask about most.

Traditional approach: run a vector database (Qdrant, Pinecone, Chroma), push embeddings into it, query by cosine similarity. Works great, but requires running and maintaining a separate process.

Our approach: store embeddings as raw floats in SQLite, compute cosine similarity in the application layer at query time. For personal-scale memory (thousands to tens of thousands of entries), this is fast enough — and it eliminates the dependency.
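The ranking step is small enough to inline. A sketch, assuming each row's embedding has already been deserialized from SQLite into a number[]:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored rows against a query vector, most similar first.
function rank(
  query: number[],
  rows: { id: number; embedding: number[] }[],
  topK = 5,
): { id: number; similarity: number }[] {
  return rows
    .map((r) => ({ id: r.id, similarity: cosineSimilarity(query, r.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}
```

A linear scan over a few thousand 768-dimensional vectors is milliseconds of work — the vector database only starts paying for itself at a scale personal memory never reaches.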

Ollama generates the embeddings locally. nomic-embed-text is small, fast, and good at semantic similarity on natural-language text.

# One-time setup
ollama pull nomic-embed-text

After that, recall finds semantically similar memories even when the exact words don't match.


Install

# Verify it works
npx -y @cartisien/engram-mcp

Built on @cartisien/engram — the underlying memory SDK if you want to integrate it directly rather than through MCP.


I'm curious what other approaches people are using for agent memory. The certainty/contradiction problem in particular — most tools I've seen treat all stored facts as equally valid, which compounds badly over time.
