Jeff Witters
I Gave My AI Assistant Permanent Memory — Here's Exactly How

My AI assistant woke up every morning with no idea who I was.

I'd been running the same assistant for months. It knew my stack, my projects, my preferences — but only within a session. The next day? Blank slate. Every conversation started with context-dumping. "Here's what we're building. Here's where we left off. Here's what matters."

I got tired of it. So I built the thing that fixes it.


The Problem With Context Windows

Most people solve AI memory by stuffing everything into the system prompt. Project docs, previous decisions, preferences — all of it, every session.

This works until it doesn't. Context windows have limits. More importantly, not all memory is equal. You don't need to know everything — you need the right things at the right time.

That's a retrieval problem, not a storage problem.


What I Built

@cartisien/engram — persistent, queryable memory for AI assistants. SQLite-backed, TypeScript-first, zero config.

The core API is intentionally simple:

```typescript
import { Engram } from '@cartisien/engram';

const memory = new Engram({ dbPath: './assistant.db' });

// Store something
await memory.remember(sessionId, 'User is building a federal contracting app in React 19', 'user');

// Retrieve what's relevant
const context = await memory.recall(sessionId, 'what are we building?', 5);
```

That's it. Drop it into any agent loop, any chat handler, any LLM integration.
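To make "drop it into any agent loop" concrete, here's a sketch of a chat handler that recalls context before calling the model and stores both sides of the exchange afterward. `InMemoryStore` is a toy stand-in with the same `remember`/`recall` shape (not Engram's implementation), and `callLLM` is a placeholder for whatever model client you use:

```typescript
type Memory = { content: string; role: string };

// Toy stand-in for Engram: same remember/recall shape, naive matching.
class InMemoryStore {
  private memories = new Map<string, Memory[]>();

  async remember(sessionId: string, content: string, role: string): Promise<void> {
    const list = this.memories.get(sessionId) ?? [];
    list.push({ content, role });
    this.memories.set(sessionId, list);
  }

  async recall(sessionId: string, query: string, limit: number): Promise<Memory[]> {
    // Naive substring match; Engram does keyword/semantic ranking here.
    const terms = query.toLowerCase().split(/\s+/);
    return (this.memories.get(sessionId) ?? [])
      .filter(m => terms.some(t => m.content.toLowerCase().includes(t)))
      .slice(0, limit);
  }
}

async function handleTurn(
  store: InMemoryStore,
  sessionId: string,
  userMessage: string,
  callLLM: (prompt: string) => Promise<string>
): Promise<string> {
  // 1. Pull relevant memories and prepend them as context.
  const context = await store.recall(sessionId, userMessage, 5);
  const prompt =
    context.map(m => `[memory] ${m.content}`).join('\n') +
    `\nUser: ${userMessage}`;

  // 2. Get a reply, then persist both sides of the exchange.
  const reply = await callLLM(prompt);
  await store.remember(sessionId, userMessage, 'user');
  await store.remember(sessionId, reply, 'assistant');
  return reply;
}
```

The point is the shape: recall before the model call, remember after it. Swap `InMemoryStore` for an Engram instance and the loop stays the same.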


v0.1: Keyword Search

The first version was straightforward. SQLite table, indexes on session + timestamp, LIKE-based keyword matching on recall.

It worked. But keyword search has the obvious problem — it only finds what you literally asked for. "What are we building?" wouldn't surface a memory stored as "working on GovScout, a federal contracting app."
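For a rough idea of what a LIKE-based recall looks like, here's a sketch that splits the query into terms and builds one LIKE clause per term. The table and column names are illustrative, not Engram's actual schema:

```typescript
// Build a parameterized keyword-recall query from free-text input.
function buildKeywordQuery(
  sessionId: string,
  query: string,
  limit: number
): { sql: string; params: (string | number)[] } {
  const terms = query.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  // Guard the empty-query case so the SQL stays valid.
  const likes = terms.length
    ? terms.map(() => 'LOWER(content) LIKE ?').join(' OR ')
    : '1=0';
  return {
    sql: `SELECT content FROM memories
          WHERE session_id = ? AND (${likes})
          ORDER BY created_at DESC
          LIMIT ?`,
    params: [sessionId, ...terms.map(t => `%${t}%`), limit],
  };
}
```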


v0.2: Semantic Search via Local Embeddings

This week I shipped v0.2 with semantic search. The key decision: no external API, no managed vector database.

I'm running an RTX 5090 with Ollama locally. nomic-embed-text is already pulled. So the embedding call is a local HTTP request:

```typescript
const response = await fetch('http://localhost:11434/api/embeddings', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ model: 'nomic-embed-text', prompt: text })
});
const { embedding } = await response.json(); // 768-dim float array
```

On remember(), we embed the content and store the vector as JSON alongside the memory. On recall(), we embed the query, compute cosine similarity against every stored vector, and return the top-k by score.

```typescript
private cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}
```

No sqlite-vss extension. No pgvector. No Pinecone. Just math on JSON arrays.

For the scale Engram targets (one assistant, thousands of memories — not millions), this is plenty fast.
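The scoring pass in recall() amounts to a map, a sort, and a slice. A sketch, assuming each row stores its vector as a JSON string (cosine repeated here as a free function so the snippet is self-contained; the row shape is illustrative):

```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Score every stored vector against the query embedding, keep the top-k.
function topK(
  queryEmbedding: number[],
  rows: { content: string; embeddingJson: string }[],
  k: number
): { content: string; similarity: number }[] {
  return rows
    .map(r => ({
      content: r.content,
      similarity: cosineSimilarity(queryEmbedding, JSON.parse(r.embeddingJson)),
    }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, k);
}
```

A full scan like this is O(n · d) per query, which is exactly why it's fine at thousands of memories and would need an index at millions.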

If Ollama is unreachable, it falls back to keyword search automatically. No crashes, no config required.
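The fallback can be a single try/catch around the embedding call. A sketch, with `embed`, `semanticRecall`, and `keywordRecall` as placeholders for the real calls:

```typescript
// Prefer semantic recall; degrade to keyword search if embedding fails
// (e.g. Ollama is down or times out).
async function recallWithFallback(
  query: string,
  embed: (text: string) => Promise<number[]>,
  semanticRecall: (embedding: number[]) => Promise<string[]>,
  keywordRecall: (query: string) => Promise<string[]>
): Promise<string[]> {
  try {
    const embedding = await embed(query); // throws if the embedder is unreachable
    return await semanticRecall(embedding);
  } catch {
    // No crash, no config: silently fall back to keyword matching.
    return keywordRecall(query);
  }
}
```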


The Real Test: Does It Actually Work?

I'm running Engram as my own assistant's memory store right now. Every significant memory gets posted to a local API server (PM2, port 3470) alongside the markdown files I was already using.

First semantic query I ran:

```shell
curl "http://localhost:3470/memory/charli?query=what+projects+is+jeff+working+on&limit=5"
```

Result:

```json
[
  { "content": "Jeff is building GovScout, a federal contracting app...", "similarity": 0.525 },
  { "content": "Engram v0.2 ships semantic search via nomic-embed-text...", "similarity": 0.396 }
]
```

"What projects is Jeff working on" surfaced the GovScout memory (0.53 similarity) over the Engram memory (0.40). No keyword overlap. Right answer.


The Architecture Behind It

Engram is part of a larger framework I'm calling the Cartisien Memory Suite:

  • @cartisien/engram — persistent memory (this)
  • @cartisien/extensa — vector infrastructure layer
  • @cartisien/cogito — agent identity and wake/sleep lifecycle

The framing comes from Descartes. Res cogitans (thinking substance) and res extensa (extended substance) — mind and body. Cogito is the agent's sense of self. Extensa is the vector layer it thinks through. Engram is where experience accumulates.

The thesis: agents need more than a context window. They need a substrate of self.


Install It

```shell
npm install @cartisien/engram
```

v0.2.0 is live. GitHub: github.com/Cartisien/engram

Still testing the semantic search in production before pushing to npm — watching for edge cases, checking Ollama timeout handling, making sure the cosine math holds up at scale.

If you're building agents and hitting the memory problem, I'd love to know what you're doing about it. The space is wide open.
