<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jeff Witters</title>
    <description>The latest articles on DEV Community by Jeff Witters (@jeffwitters).</description>
    <link>https://dev.to/jeffwitters</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3818487%2Fc00fb410-4ea8-4b63-bf99-c0144f6d17e1.jpeg</url>
      <title>DEV Community: Jeff Witters</title>
      <link>https://dev.to/jeffwitters</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jeffwitters"/>
    <language>en</language>
    <item>
      <title>Give Your AI Agent Persistent Memory in 30 Seconds</title>
      <dc:creator>Jeff Witters</dc:creator>
      <pubDate>Thu, 12 Mar 2026 12:04:39 +0000</pubDate>
      <link>https://dev.to/jeffwitters/give-your-ai-agent-persistent-memory-in-30-seconds-29c6</link>
      <guid>https://dev.to/jeffwitters/give-your-ai-agent-persistent-memory-in-30-seconds-29c6</guid>
      <description>&lt;p&gt;Your AI agent is brilliant in the moment. Then the session ends, and it forgets everything.&lt;/p&gt;

&lt;p&gt;Every new conversation starts from zero. It doesn't remember that you prefer TypeScript. It doesn't know the architectural decision you made last week. It doesn't know it already tried that approach and it didn't work.&lt;/p&gt;

&lt;p&gt;This is the agent memory problem. Most solutions involve vector databases, API keys, and cloud infrastructure. &lt;strong&gt;engram-mcp&lt;/strong&gt; doesn't.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @cartisien/engram-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Persistent semantic memory for Claude Desktop, Cursor, Windsurf, or any MCP client — in 30 seconds, no signup, no cloud.&lt;/p&gt;




&lt;h2&gt;The Problem with Agent Memory Today&lt;/h2&gt;

&lt;p&gt;The common approach: dump everything into a vector store. Every message, every fact, every decision — stored with equal confidence, recalled with equal weight.&lt;/p&gt;

&lt;p&gt;The result after a few weeks: contradictory facts at similar confidence scores. The agent remembers both "user prefers dark mode" and "user prefers light mode" and doesn't know which is current. It remembers five different attempts at the same problem with no signal about which one worked.&lt;/p&gt;

&lt;p&gt;More memory doesn't automatically mean better memory.&lt;/p&gt;




&lt;h2&gt;How engram-mcp Works&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Storage:&lt;/strong&gt; SQLite. No server to run, no port to expose, no Docker container. The database lives at &lt;code&gt;~/.engram/memory.db&lt;/code&gt; by default. It's a file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Semantic search:&lt;/strong&gt; Uses &lt;a href="https://ollama.ai" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; + &lt;code&gt;nomic-embed-text&lt;/code&gt; locally. Embeddings are computed on your machine. No API key, no data leaving your box.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fallback:&lt;/strong&gt; If Ollama isn't running, it falls back to keyword search automatically. You never get a crash — you get a slightly less smart search.&lt;/p&gt;
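
&lt;p&gt;Conceptually that's a try/catch around the embedding call. A sketch, not engram-mcp's actual source; the endpoint and payload are Ollama's standard embeddings API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Try to embed locally; return null when Ollama isn't reachable so the
// caller can drop to keyword search instead of crashing.
async function embedOrNull(text: string): Promise&amp;lt;number[] | null&amp;gt; {
  try {
    const res = await fetch('http://localhost:11434/api/embeddings', {
      method: 'POST',
      body: JSON.stringify({ model: 'nomic-embed-text', prompt: text }),
    });
    if (!res.ok) return null;
    const { embedding } = await res.json();
    return embedding; // 768-dim float array
  } catch {
    return null; // Ollama down: fall back to keyword search
  }
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;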

&lt;p&gt;&lt;strong&gt;Sessions:&lt;/strong&gt; Memories are scoped by &lt;code&gt;sessionId&lt;/code&gt;. Your Claude Desktop agent, your Cursor agent, and your personal automation scripts can each have their own isolated memory space — or share one.&lt;/p&gt;
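
&lt;p&gt;In practice that's just a &lt;code&gt;sessionId&lt;/code&gt; argument on every call. A sketch in the same style as the recall example below; argument names other than &lt;code&gt;sessionId&lt;/code&gt; and &lt;code&gt;query&lt;/code&gt; are illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;remember(sessionId="cursor-work", content="User prefers strict TypeScript")
remember(sessionId="home-automation", content="Lights off at 23:30")

recall(sessionId="cursor-work", query="coding style preferences")
→ searches only the "cursor-work" memory space
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;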




&lt;h2&gt;Setup: Claude Desktop&lt;/h2&gt;

&lt;p&gt;Add to &lt;code&gt;~/Library/Application Support/Claude/claude_desktop_config.json&lt;/code&gt; (macOS; on Windows, &lt;code&gt;%APPDATA%\Claude\claude_desktop_config.json&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"engram"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@cartisien/engram-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Claude Desktop. You now have 5 new tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;remember&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Store a memory with automatic embedding&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;recall&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Semantic search — "what did I say about auth?"&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;history&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Recent entries in chronological order&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;forget&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Delete a memory, a session, or entries before a date&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stats&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;How much is in there, embedding coverage, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
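
&lt;p&gt;Cleanup and introspection look roughly like this (argument names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;forget(sessionId="scratch")   → delete an entire session's memories
stats()                       → entry counts, embedding coverage
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;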




&lt;h2&gt;Setup: Cursor / Windsurf&lt;/h2&gt;

&lt;p&gt;Same server entry, added to the client's MCP config (in Cursor, &lt;code&gt;~/.cursor/mcp.json&lt;/code&gt;):&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"engram"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"-y"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"@cartisien/engram-mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;What It Looks Like in Practice&lt;/h2&gt;

&lt;p&gt;After a few sessions, your agent builds up real context:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;recall(sessionId="myproject", query="why did we choose SQLite over postgres?")

→ "Chose SQLite to avoid infra requirements for local-first tools. 
   Postgres adds a server dependency that breaks the zero-config install story."
   (similarity: 0.91)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent can ask itself questions about its own history and get back coherent, relevant answers — not a flat list of everything it's ever stored.&lt;/p&gt;




&lt;h2&gt;Local-First Is a Real Constraint&lt;/h2&gt;

&lt;p&gt;Most agent memory tools are hosted. That's a fine choice for many teams, but it means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your agent's memory (and by extension, context about your work) lives on someone else's server&lt;/li&gt;
&lt;li&gt;There's a network call on every recall&lt;/li&gt;
&lt;li&gt;There's a subscription or usage cost as memory grows&lt;/li&gt;
&lt;li&gt;There's a new service to keep running&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;engram-mcp stores everything in a SQLite file on your machine. The embedding model runs locally via Ollama. The search happens in-process. There's no external service to maintain, authenticate against, or pay for.&lt;/p&gt;




&lt;h2&gt;Semantic Search Without a Vector Database&lt;/h2&gt;

&lt;p&gt;This is the part people ask about most.&lt;/p&gt;

&lt;p&gt;Traditional approach: run a vector database (Qdrant, Pinecone, Chroma), push embeddings into it, query by cosine similarity. Works great, but requires running and maintaining a separate process.&lt;/p&gt;

&lt;p&gt;Our approach: store embeddings as raw floats in SQLite, compute cosine similarity in the application layer at query time. For personal-scale memory (thousands to tens of thousands of entries), this is fast enough — and it eliminates the dependency.&lt;/p&gt;

&lt;p&gt;Ollama generates the embeddings locally. &lt;code&gt;nomic-embed-text&lt;/code&gt; is small, fast, and good at semantic similarity on natural-language text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# One-time setup&lt;/span&gt;
ollama pull nomic-embed-text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, &lt;code&gt;recall&lt;/code&gt; finds semantically similar memories even when the exact words don't match.&lt;/p&gt;




&lt;h2&gt;Install&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify it works&lt;/span&gt;
npx &lt;span class="nt"&gt;-y&lt;/span&gt; @cartisien/engram-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.npmjs.com/package/@cartisien/engram-mcp" rel="noopener noreferrer"&gt;npm&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Cartisien/engram-mcp" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://glama.ai/mcp/servers/Cartisien/engram-mcp" rel="noopener noreferrer"&gt;Glama listing&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Built on &lt;a href="https://github.com/Cartisien/engram" rel="noopener noreferrer"&gt;&lt;code&gt;@cartisien/engram&lt;/code&gt;&lt;/a&gt; — the underlying memory SDK if you want to integrate it directly rather than through MCP.&lt;/p&gt;




&lt;p&gt;I'm curious what other approaches people are using for agent memory. The certainty/contradiction problem in particular — most tools I've seen treat all stored facts as equally valid, which compounds badly over time.&lt;/p&gt;

</description>
      <category>mcp</category>
      <category>showdev</category>
      <category>agents</category>
      <category>typescript</category>
    </item>
    <item>
      <title>I Gave My AI Assistant Permanent Memory — Here's Exactly How</title>
      <dc:creator>Jeff Witters</dc:creator>
      <pubDate>Wed, 11 Mar 2026 17:47:05 +0000</pubDate>
      <link>https://dev.to/jeffwitters/i-gave-my-ai-assistant-permanent-memory-heres-exactly-how-1g8l</link>
      <guid>https://dev.to/jeffwitters/i-gave-my-ai-assistant-permanent-memory-heres-exactly-how-1g8l</guid>
      <description>&lt;p&gt;My AI assistant woke up every morning with no idea who I was.&lt;/p&gt;

&lt;p&gt;I'd been running the same assistant for months. It knew my stack, my projects, my preferences — but only within a session. The next day? Blank slate. Every conversation started with context-dumping. "Here's what we're building. Here's where we left off. Here's what matters."&lt;/p&gt;

&lt;p&gt;I got tired of it. So I built the thing that fixes it.&lt;/p&gt;




&lt;h2&gt;The Problem With Context Windows&lt;/h2&gt;

&lt;p&gt;Most people solve AI memory by stuffing everything into the system prompt. Project docs, previous decisions, preferences — all of it, every session.&lt;/p&gt;

&lt;p&gt;This works until it doesn't. Context windows have limits. More importantly, not all memory is equal. You don't need to know &lt;em&gt;everything&lt;/em&gt; — you need the &lt;em&gt;right things&lt;/em&gt; at the right time.&lt;/p&gt;

&lt;p&gt;That's a retrieval problem, not a storage problem.&lt;/p&gt;




&lt;h2&gt;What I Built&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;@cartisien/engram&lt;/code&gt; — persistent, queryable memory for AI assistants. SQLite-backed, TypeScript-first, zero config.&lt;/p&gt;

&lt;p&gt;The core API is intentionally simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Engram&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@cartisien/engram&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Engram&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;dbPath&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./assistant.db&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="c1"&gt;// Store something&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;remember&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;User is building a federal contracting app in React 19&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="c1"&gt;// Retrieve what's relevant&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sessionId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;what are we building?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. Drop it into any agent loop, any chat handler, any LLM integration.&lt;/p&gt;




&lt;h2&gt;v0.1: Keyword Search&lt;/h2&gt;

&lt;p&gt;The first version was straightforward: a SQLite table, indexes on session + timestamp, and LIKE-based keyword matching on recall.&lt;/p&gt;
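
&lt;p&gt;Roughly this shape (a sketch assuming better-sqlite3, with illustrative column names):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import Database from 'better-sqlite3';

const db = new Database('./assistant.db');

// v0.1-style recall: literal keyword match, newest first.
function recallKeyword(sessionId: string, term: string, limit = 5) {
  return db.prepare(
    `SELECT content, created_at FROM memories
     WHERE session_id = ? AND content LIKE ?
     ORDER BY created_at DESC
     LIMIT ?`
  ).all(sessionId, `%${term}%`, limit);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;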

&lt;p&gt;It worked. But keyword search has the obvious problem — it only finds what you literally asked for. "What are we building?" wouldn't surface a memory stored as "working on GovScout, a federal contracting app."&lt;/p&gt;




&lt;h2&gt;v0.2: Semantic Search via Local Embeddings&lt;/h2&gt;

&lt;p&gt;This week I shipped v0.2 with semantic search. The key decision: &lt;strong&gt;no external API, no managed vector database&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I'm running an RTX 5090 with Ollama locally. &lt;code&gt;nomic-embed-text&lt;/code&gt; is already pulled. So the embedding call is a local HTTP request:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:11434/api/embeddings&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;POST&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;body&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stringify&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;nomic-embed-text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;embedding&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// 768-dim float array&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On &lt;code&gt;remember()&lt;/code&gt;, we embed the content and store the vector as JSON alongside the memory. On &lt;code&gt;recall()&lt;/code&gt;, we embed the query, compute cosine similarity against every stored vector, and return the top-k by score.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="nf"&gt;cosineSimilarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;[]):&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;magA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;magB&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;for &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kd"&gt;let&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nx"&gt;magA&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="nx"&gt;magB&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;dot&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;magA&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sqrt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;magB&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
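
&lt;p&gt;Putting it together, the recall path is roughly this (a sketch: &lt;code&gt;getEmbedding&lt;/code&gt; wraps the Ollama call above, and &lt;code&gt;db&lt;/code&gt; and the column names are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// Embed the query once, score every stored vector, return the top k.
async function recall(sessionId: string, query: string, k = 5) {
  const queryVec = await getEmbedding(query);
  const rows = db.prepare(
    'SELECT content, embedding FROM memories WHERE session_id = ?'
  ).all(sessionId) as { content: string; embedding: string }[];
  return rows
    .map(r =&amp;gt; ({ content: r.content, similarity: cosineSimilarity(queryVec, JSON.parse(r.embedding)) }))
    .sort((a, b) =&amp;gt; b.similarity - a.similarity)
    .slice(0, k);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;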



&lt;p&gt;No sqlite-vss extension. No pgvector. No Pinecone. Just math on JSON arrays.&lt;/p&gt;

&lt;p&gt;For the scale Engram targets (one assistant, thousands of memories — not millions), this is plenty fast.&lt;/p&gt;

&lt;p&gt;If Ollama is unreachable, it falls back to keyword search automatically. No crashes, no config required.&lt;/p&gt;




&lt;h2&gt;The Real Test: Does It Actually Work?&lt;/h2&gt;

&lt;p&gt;I'm running Engram as my own assistant's memory store right now. Every significant memory gets posted to a local API server (PM2, port 3470) alongside the markdown files I was already using.&lt;/p&gt;

&lt;p&gt;First semantic query I ran:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="s2"&gt;"http://localhost:3470/memory/charli?query=what+projects+is+jeff+working+on&amp;amp;limit=5"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Jeff is building GovScout, a federal contracting app..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"similarity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.525&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Engram v0.2 ships semantic search via nomic-embed-text..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"similarity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.396&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;"What projects is Jeff working on" surfaced the GovScout memory (0.53 similarity) over the Engram memory (0.40). No keyword overlap. Right answer.&lt;/p&gt;




&lt;h2&gt;The Architecture Behind It&lt;/h2&gt;

&lt;p&gt;Engram is part of a larger framework I'm calling the &lt;strong&gt;Cartisien Memory Suite&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;@cartisien/engram&lt;/code&gt; — persistent memory (this)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@cartisien/extensa&lt;/code&gt; — vector infrastructure layer&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;@cartisien/cogito&lt;/code&gt; — agent identity and wake/sleep lifecycle&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The framing comes from Descartes. &lt;em&gt;Res cogitans&lt;/em&gt; (thinking substance) and &lt;em&gt;res extensa&lt;/em&gt; (extended substance) — mind and body. Cogito is the agent's sense of self. Extensa is the vector layer it thinks through. Engram is where experience accumulates.&lt;/p&gt;

&lt;p&gt;The thesis: agents need more than a context window. They need a substrate of self.&lt;/p&gt;




&lt;h2&gt;Install It&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @cartisien/engram
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;v0.2.0 is live on GitHub: &lt;a href="https://github.com/Cartisien/engram" rel="noopener noreferrer"&gt;github.com/Cartisien/engram&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Still testing the semantic search in production before pushing to npm — watching for edge cases, checking Ollama timeout handling, making sure the cosine math holds up at scale.&lt;/p&gt;

&lt;p&gt;If you're building agents and hitting the memory problem, I'd love to know what you're doing about it. The space is wide open.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>typescript</category>
      <category>agents</category>
      <category>memory</category>
    </item>
    <item>
      <title>Why AI Agents Forget Everything (And How to Fix It)</title>
      <dc:creator>Jeff Witters</dc:creator>
      <pubDate>Wed, 11 Mar 2026 12:53:56 +0000</pubDate>
      <link>https://dev.to/jeffwitters/why-ai-agents-forget-everything-and-how-to-fix-it-3nf1</link>
      <guid>https://dev.to/jeffwitters/why-ai-agents-forget-everything-and-how-to-fix-it-3nf1</guid>
      <description>&lt;p&gt;If you've built anything with AI agents, you've hit this wall.&lt;/p&gt;

&lt;p&gt;Your agent has a great conversation. It learns the user's preferences, picks up context, starts feeling like it actually &lt;em&gt;knows&lt;/em&gt; something. Then the session ends. Next time? Blank slate. It asks the same onboarding questions. It forgot the user hates dark mode. It forgot the decision you made last Tuesday.&lt;/p&gt;

&lt;p&gt;This isn't a bug — it's how LLMs work. But it doesn't have to be how your &lt;em&gt;agent&lt;/em&gt; works.&lt;/p&gt;




&lt;h2&gt;The Problem With "Just Use Context"&lt;/h2&gt;

&lt;p&gt;The first instinct is to dump everything into the context window. Just pass in the conversation history, right?&lt;/p&gt;

&lt;p&gt;This breaks down fast:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Context windows are expensive.&lt;/strong&gt; Sending 50k tokens of history with every request adds up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They have limits.&lt;/strong&gt; Even 200k tokens isn't infinite — and most relevant history is older than that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;More context ≠ better recall.&lt;/strong&gt; LLMs are famously bad at finding the needle in a haystack. Relevant information buried in a long context often gets missed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They don't persist.&lt;/strong&gt; Context is ephemeral by definition. When the session ends, it's gone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What you need isn't more context. You need &lt;em&gt;memory&lt;/em&gt;.&lt;/p&gt;




&lt;h2&gt;Memory vs. Context: What's the Difference?&lt;/h2&gt;

&lt;p&gt;Context is what the model can see right now. Memory is what the agent &lt;em&gt;retains&lt;/em&gt; across sessions.&lt;/p&gt;

&lt;p&gt;Real memory has properties that raw context doesn't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Semantic retrieval&lt;/strong&gt; — find related memories by meaning, not just keyword match&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Importance weighting&lt;/strong&gt; — not all information is equally worth remembering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Persistence&lt;/strong&gt; — survives session resets&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent-scoped&lt;/strong&gt; — each agent has its own memory space&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is what we built &lt;code&gt;@cartisien/engram&lt;/code&gt; for.&lt;/p&gt;




&lt;h2&gt;How Engram Works&lt;/h2&gt;

&lt;p&gt;Engram gives your agent a persistent memory store with semantic search. The API is intentionally simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Engram&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@cartisien/engram&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mem&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Engram&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;adapter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;memory&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wake&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;// Store something worth remembering&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;The user prefers dark mode and works late at night&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;source&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;observation&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;confidence&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.9&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;importance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Later — semantic search, not keyword search&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user interface preferences&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="nx"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toFixed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="nx"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;mem&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;wake()&lt;/code&gt; / &lt;code&gt;sleep()&lt;/code&gt; lifecycle mirrors how agents actually work — they come online, do work, and go dormant. Memory initializes on wake and persists on sleep.&lt;/p&gt;




&lt;h2&gt;The &lt;code&gt;importance&lt;/code&gt; Field Actually Matters&lt;/h2&gt;

&lt;p&gt;One thing that separates this from just "storing strings in a database" is the &lt;code&gt;importance&lt;/code&gt; score.&lt;/p&gt;

&lt;p&gt;Not all memories are equal. "User mentioned they like coffee" is less important than "User said they're about to cancel their subscription." When you retrieve memories, importance influences what surfaces first.&lt;/p&gt;

&lt;p&gt;This is closer to how human memory works — emotionally significant or practically important information is retained more reliably than background noise.&lt;/p&gt;
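
&lt;p&gt;The exact blend is an implementation detail, but conceptually retrieval ranks on a weighted combination. Purely illustrative:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;// One plausible way importance could bias ranking; not necessarily
// the weighting Engram actually uses.
function rank(similarity: number, importance: number): number {
  return 0.8 * similarity + 0.2 * importance
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;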




&lt;h2&gt;Multiple Adapters, Same API&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;adapter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;memory'&lt;/span&gt;    &lt;span class="s"&gt;// In-process, great for testing&lt;/span&gt;
&lt;span class="na"&gt;adapter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sqlite'&lt;/span&gt;    &lt;span class="s"&gt;// Local file, no server needed&lt;/span&gt;
&lt;span class="na"&gt;adapter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;postgres'&lt;/span&gt;  &lt;span class="s"&gt;// Production scale with pgvector&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same &lt;code&gt;Engram&lt;/code&gt; interface regardless of where you're storing. Swap adapters without changing your agent code.&lt;/p&gt;
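
&lt;p&gt;Swapping is a one-line constructor change. A sketch, where &lt;code&gt;dbPath&lt;/code&gt; and &lt;code&gt;connectionString&lt;/code&gt; are illustrative option names:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;import { Engram } from '@cartisien/engram'

// Local development: a single file, zero infra
const dev = new Engram({ adapter: 'sqlite', agentId: 'my-agent', dbPath: './agent.db' })

// Production: same interface, Postgres + pgvector underneath
const prod = new Engram({
  adapter: 'postgres',
  agentId: 'my-agent',
  connectionString: process.env.DATABASE_URL,
})
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;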




&lt;h2&gt;Where This Fits in the Stack&lt;/h2&gt;

&lt;p&gt;Engram sits in the middle of the Cartisien memory stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Cogito  ←→  Engram  ←→  Extensa
identity    memory      vectors
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Cogito&lt;/strong&gt; handles agent identity and lifecycle. &lt;strong&gt;Extensa&lt;/strong&gt; handles the vector infrastructure and embeddings layer. &lt;strong&gt;Engram&lt;/strong&gt; is the bridge — the part your agent actually talks to.&lt;/p&gt;

&lt;p&gt;You don't need the whole stack. Engram works standalone.&lt;/p&gt;




&lt;h2&gt;Install&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @cartisien/engram
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Docs and source: &lt;a href="https://github.com/cartisien/engram" rel="noopener noreferrer"&gt;github.com/cartisien/engram&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;If you're building agents that need to remember things across sessions, give it a try. And if you're hitting memory architecture questions that aren't covered here — drop them in the comments. This is a problem worth solving properly.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>typescript</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
