Mnemosy
We Built the First AI Agent Memory System With Zero LLM Calls — Here's the Architecture

Every AI memory system on the market makes the same architectural choice: send your text to an LLM for extraction before storing it.

Mem0 calls GPT-4o. Zep makes multiple async LLM calls. Cognee uses LLMs for knowledge extraction. Letta's entire memory engine is an LLM.

That means every single memory.store() costs ~$0.01, takes 500ms-2s, and produces non-deterministic results. At 100K memories/month, you're paying $1,000-3,000 just to remember things.

We asked: what if you didn't need an LLM at all?

The result is Mnemosyne — the first cognitive memory OS for AI agents with zero LLM calls in the entire ingestion pipeline. 33 features, 5 cognitive layers, $0 per memory stored. MIT licensed.

The Cost Table Nobody Wants You to See

| System | LLM required? | Cost per memory | 100K memories/mo |
|--------|---------------|-----------------|------------------|
| Mnemosyne | No | $0.00 | ~$60 (infra only) |
| Mem0 | Yes (GPT-4o) | ~$0.01 | $1,000-3,000 |
| Zep | Yes (multiple calls) | ~$0.01 | $1,000-2,000 |
| Cognee | Yes (extraction) | ~$0.01 | $1,000-5,000 |
| Letta/MemGPT | Yes (core engine) | ~$0.01 | $1,000-5,000 |

This isn't a criticism of these projects — Mem0 has 41K stars and popularized this entire space. But the LLM-in-the-loop architecture has fundamental trade-offs that nobody talks about.

Three Problems With LLM-Powered Memory

1. Non-deterministic behavior

The same input can produce different extracted facts on different runs. Your memory system's behavior changes when the model updates. In production, you need memory that behaves consistently.

2. Latency floor

Every store() requires an LLM API call — 500ms to 2 seconds minimum. When your agent processes 100 memories per session, that's 50-200 seconds of just waiting.

3. Linear cost scaling

At $0.01 per memory, 1M memories = $10,000. Per month. With no efficiency gains at scale.

How We Eliminated Every LLM Call

Mnemosyne's 12-step ingestion pipeline is 100% algorithmic:

  1. Security Filter — blocks API keys, credentials, secrets (regex patterns)
  2. Embedding — local vectors via Ollama (nomic-embed-text)
  3. Dedup & Merge — cosine similarity ≥0.92 = duplicate
  4. Entity Extraction — people, IPs, technologies, dates (pattern matching)
  5. Type Classification — 7 types: episodic, semantic, procedural, preference, relationship, profile, core
  6. Urgency Detection — critical / important / reference / background
  7. Domain Classification — technical / personal / project / knowledge / general
  8. Priority Scoring — urgency × domain composite
  9. Confidence Rating — 3-signal composite with human-readable tiers
  10. Vector Storage — 23-field metadata per memory
  11. Auto-Linking — bidirectional links to related memories
  12. Mesh Broadcast — published to agent network

Total time: <50ms. LLM calls: 0. Cost: $0.
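Several of these steps are only a few lines of code each. As an illustrative sketch (not Mnemosyne's actual implementation), here is what the security filter (step 1) and cosine-similarity dedup at the ≥0.92 threshold (step 3) could look like in TypeScript; the patterns and function names are assumptions:

```typescript
// Illustrative sketch of two algorithmic pipeline steps: a regex security
// filter and cosine-similarity dedup at the 0.92 threshold.

const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/,                    // OpenAI-style API keys
  /AKIA[0-9A-Z]{16}/,                       // AWS access key IDs
  /-----BEGIN (?:RSA )?PRIVATE KEY-----/,   // PEM private keys
];

// Step 1: reject any memory that looks like it contains a credential.
function passesSecurityFilter(text: string): boolean {
  return !SECRET_PATTERNS.some((p) => p.test(text));
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Step 3: a candidate is a duplicate if any stored vector is >= 0.92 similar.
function isDuplicate(candidate: number[], stored: number[][], threshold = 0.92): boolean {
  return stored.some((v) => cosineSimilarity(candidate, v) >= threshold);
}
```

No model in the loop: both checks are pure functions over the text and its embedding, which is what makes the sub-50ms budget plausible.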

The trade-off is real: LLM-based extraction catches implicit relationships that rule-based extractors miss. But for the vast majority of agent memory — where entities are explicit and speed matters — the algorithmic approach dominates.

But It's Not Just a Vector Wrapper

This is where it gets interesting. Mnemosyne implements 10 cognitive capabilities that previously existed mainly in research papers. Six highlights:

🧠 Activation Decay

Memories fade over time following the Ebbinghaus forgetting curve. Critical memories survive months. Background observations fade in hours. Procedural memories (like runbooks) never decay — just like how you never forget how to ride a bike.
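Assuming a simple exponential model of the forgetting curve, retention R = e^(−t/S) with a per-type stability S, the decay rule can be sketched like this; the type names and constants are illustrative, not Mnemosyne's internals:

```typescript
// Ebbinghaus-style activation decay sketch: retention R = e^(-t/S),
// where S is a per-memory-type stability expressed in hours.

type MemoryType = "core" | "episodic" | "background" | "procedural";

const STABILITY_HOURS: Record<MemoryType, number> = {
  core: 24 * 90,        // critical memories survive months
  episodic: 24 * 7,     // ordinary events fade over a week
  background: 6,        // background observations fade within hours
  procedural: Infinity, // runbooks never decay
};

// Activation in [0, 1]; below some floor a memory drops out of recall.
function activation(type: MemoryType, hoursSinceAccess: number): number {
  const s = STABILITY_HOURS[type];
  return s === Infinity ? 1 : Math.exp(-hoursSinceAccess / s);
}
```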

⚡ Flash Reasoning

Query "why did auth crash?" and get the full chain:

```
Config changed → JWT expiry shortened → Session storm → Rollback fixed it
```

One recall. Complete narrative. BFS through linked memory graphs.
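A minimal sketch of BFS over linked memories, using a hypothetical node shape (`id`, `text`, `links`) rather than Mnemosyne's actual graph schema:

```typescript
// Flash-reasoning sketch: starting from the memory that matched the query,
// breadth-first search follows links to reconstruct the causal chain.

interface MemoryNode { id: string; text: string; links: string[]; }

function traceChain(start: string, graph: Map<string, MemoryNode>): string[] {
  const visited = new Set<string>([start]);
  const chain: string[] = [];
  const queue = [start];
  while (queue.length > 0) {
    const id = queue.shift()!;
    const node = graph.get(id);
    if (!node) continue;
    chain.push(node.text);
    for (const next of node.links) {
      if (!visited.has(next)) { visited.add(next); queue.push(next); }
    }
  }
  return chain;
}

// The auth-crash example as a four-node chain:
const graph = new Map<string, MemoryNode>([
  ["m1", { id: "m1", text: "Config changed", links: ["m2"] }],
  ["m2", { id: "m2", text: "JWT expiry shortened", links: ["m3"] }],
  ["m3", { id: "m3", text: "Session storm", links: ["m4"] }],
  ["m4", { id: "m4", text: "Rollback fixed it", links: [] }],
]);
// traceChain("m1", graph) returns the four events in causal order
```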

🤝 Theory of Mind for Agents

Agent A can ask "what does Agent B know about the database?" without talking to Agent B. Modeled after developmental psychology research (Baron-Cohen, 1985).

📈 Reinforcement Learning

Memories that consistently help → auto-promoted to permanent. Bad memories → flagged. Your memory system gets smarter through use, not manual curation.
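One hedged sketch of how feedback-driven promotion and flagging could work; the score thresholds are invented for illustration:

```typescript
// Reinforcement sketch: each feedback signal nudges a score; sustained
// usefulness promotes a memory to permanent, sustained failure flags it.

interface TrackedMemory { score: number; permanent: boolean; flagged: boolean; }

function applyFeedback(m: TrackedMemory, feedback: "positive" | "negative"): TrackedMemory {
  const score = m.score + (feedback === "positive" ? 1 : -1);
  return {
    score,
    permanent: m.permanent || score >= 5,  // promote after repeated wins
    flagged: !m.permanent && score <= -3,  // flag consistently bad memories
  };
}
```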

🔗 Knowledge Graph (Built-in, Free)

Temporal entity graph with auto-linking, path finding, and timeline reconstruction. Mem0 charges $249/month for their knowledge graph. Ours ships with the MIT license.

🌐 Multi-Agent Mesh

When 3+ agents independently confirm the same fact, it's automatically promoted to "Mesh Fact" — the highest confidence tier. Real distributed consensus for AI knowledge.
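The quorum rule can be sketched as follows; the quorum of 3 mirrors the description above, while the field names and hashing scheme are assumptions:

```typescript
// Mesh-fact promotion sketch: a fact independently confirmed by three or
// more distinct agents is promoted to the highest confidence tier.

interface Confirmation { agentId: string; factHash: string; }

function meshFacts(confirmations: Confirmation[], quorum = 3): Set<string> {
  const agentsByFact = new Map<string, Set<string>>();
  for (const c of confirmations) {
    const agents = agentsByFact.get(c.factHash) ?? new Set<string>();
    agents.add(c.agentId); // a Set counts each agent once per fact
    agentsByFact.set(c.factHash, agents);
  }
  const promoted = new Set<string>();
  for (const [fact, agents] of agentsByFact) {
    if (agents.size >= quorum) promoted.add(fact);
  }
  return promoted;
}
```

Deduplicating by agent is the important detail: one agent repeating itself should not count as independent confirmation.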

33 Features, 5 Layers

```
L5  SELF-IMPROVEMENT
    Reinforcement · Consolidation · Flash Reasoning · ToMA · Synthesis

L4  COGNITIVE
    Activation Decay · Confidence · Priority · Diversity Reranking

L3  KNOWLEDGE GRAPH
    Temporal Graph · Auto-Linking · Path Traversal · Entity Extraction

L2  PIPELINE
    Security Filter · Classify · Dedup · Merge · 12-step Ingestion

L1  INFRASTRUCTURE
    Qdrant · FalkorDB · Redis Cache · Redis Pub/Sub
```

Every feature is independently toggleable. Start with just Qdrant, progressively enable as needed.

Running in Production Right Now

This isn't a demo. It's running on a 10-machine AI agent mesh:

  • 13,000+ memories stored
  • <50ms ingestion (full 12-step pipeline)
  • <200ms recall (multi-signal ranked, graph-enriched)
  • >60% cache hit rate in conversational workloads
  • 10 agents collaborating with shared memory

Quick Start (2 minutes)

```shell
npm install mnemosy-ai
docker run -d -p 6333:6333 qdrant/qdrant
```

```typescript
import { createMnemosyne } from 'mnemosy-ai'

const m = await createMnemosyne({
  vectorDbUrl: 'http://localhost:6333',
  embeddingUrl: 'http://localhost:11434/v1/embeddings',
  agentId: 'my-agent'
})

// Full 12-step pipeline, <50ms, $0
await m.store({ text: "User prefers dark mode and TypeScript" })

// Multi-signal ranked recall
const memories = await m.recall({ query: "user preferences" })

// Memories learn from feedback
await m.feedback("positive")
```

Only hard requirement: Qdrant. Redis and FalkorDB are optional power-ups.
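For intuition, here is one way a multi-signal recall ranking could blend vector similarity, recency, and priority; the weights and decay constant are invented for illustration and are not Mnemosyne's actual formula:

```typescript
// Multi-signal ranking sketch: blend similarity (0..1), recency, and
// priority (0..1) into one score, then sort candidates by it.

interface Candidate { similarity: number; hoursOld: number; priority: number; }

function recallScore(c: Candidate): number {
  const recency = Math.exp(-c.hoursOld / 72); // fades over roughly three days
  return 0.6 * c.similarity + 0.25 * recency + 0.15 * c.priority;
}

function rank(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => recallScore(b) - recallScore(a));
}
```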

Honest Trade-offs

We're not claiming Mnemosyne is better at everything:

  • Mem0 has 41K stars, a great community, and production-hardened cloud offering
  • Cognee builds richer knowledge graphs via LLM extraction
  • Letta's LLM-directed memory management is genuinely innovative

Mnemosyne fills a specific niche: cognitive intelligence, multi-agent collaboration, zero-LLM economics, and self-improving memory in one open-source system. If that combination is what you need, this is currently the only option that exists.

The Bet

We're betting that the future of AI memory is deterministic, local-first, and free at the point of storage. That cognitive capabilities don't require sending every memory through GPT-4. That you can build a brain without renting someone else's.

13,000 memories and counting. Zero LLM calls. The math speaks for itself.


GitHub: github.com/28naem-del/mnemosyne
npm: npm install mnemosy-ai
Website: mnemosy.ai
Discord: discord.gg/Sp6ZXD3X
License: MIT

Mnemosyne — Because intelligence without memory isn't intelligence.
