Mnemosy
We Built the First AI Agent Memory System With Zero LLM Calls — Here's the Architecture

Every AI memory system on the market makes the same architectural choice: send your text to an LLM for extraction before storing it.

Mem0 calls GPT-4o. Zep makes multiple async LLM calls. Cognee uses LLMs for knowledge extraction. Letta's entire memory engine is an LLM.

That means every single memory.store() costs ~$0.01, takes 500ms-2s, and produces non-deterministic results. At 100K memories/month, you're paying $1,000-3,000 just to remember things.

We asked: what if you didn't need an LLM at all?

The result is Mnemosyne — the first cognitive memory OS for AI agents with zero LLM calls in the entire ingestion pipeline. 33 features, 5 cognitive layers, $0 per memory stored. MIT licensed.

The Cost Table Nobody Wants You to See

| System | LLM required? | Cost per memory | 100K memories/mo |
|--------|---------------|-----------------|------------------|
| Mnemosyne | No | $0.00 | ~$60 (infra only) |
| Mem0 | Yes (GPT-4o) | ~$0.01 | $1,000-3,000 |
| Zep | Yes (multiple calls) | ~$0.01 | $1,000-2,000 |
| Cognee | Yes (extraction) | ~$0.01 | $1,000-5,000 |
| Letta/MemGPT | Yes (core engine) | ~$0.01 | $1,000-5,000 |

This isn't a criticism of these projects — Mem0 has 41K stars and popularized this entire space. But the LLM-in-the-loop architecture has fundamental trade-offs that nobody talks about.

Three Problems With LLM-Powered Memory

1. Non-deterministic behavior

The same input can produce different extracted facts on different runs. Your memory system's behavior changes when the model updates. In production, you need memory that behaves consistently.

2. Latency floor

Every store() requires an LLM API call — 500ms to 2 seconds minimum. When your agent processes 100 memories per session, that's 50-200 seconds of just waiting.

3. Linear cost scaling

At $0.01 per memory, 1M memories = $10,000. Per month. With no efficiency gains at scale.

How We Eliminated Every LLM Call

Mnemosyne's 12-step ingestion pipeline is 100% algorithmic:

  1. Security Filter — blocks API keys, credentials, secrets (regex patterns)
  2. Embedding — local vectors via Ollama (nomic-embed-text)
  3. Dedup & Merge — cosine similarity ≥0.92 = duplicate
  4. Entity Extraction — people, IPs, technologies, dates (pattern matching)
  5. Type Classification — 7 types: episodic, semantic, procedural, preference, relationship, profile, core
  6. Urgency Detection — critical / important / reference / background
  7. Domain Classification — technical / personal / project / knowledge / general
  8. Priority Scoring — urgency × domain composite
  9. Confidence Rating — 3-signal composite with human-readable tiers
  10. Vector Storage — 23-field metadata per memory
  11. Auto-Linking — bidirectional links to related memories
  12. Mesh Broadcast — published to agent network

Total time: <50ms. LLM calls: 0. Cost: $0.
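Several of these steps are only a few lines of code each. As an illustrative sketch (not Mnemosyne's actual implementation), here is what the security filter (step 1) and cosine-similarity dedup at the ≥0.92 threshold (step 3) could look like in TypeScript; the patterns and function names are assumptions:

```typescript
// Illustrative sketch of two algorithmic pipeline steps: a regex security
// filter and cosine-similarity dedup at the 0.92 threshold.

const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/,                    // OpenAI-style API keys
  /AKIA[0-9A-Z]{16}/,                       // AWS access key IDs
  /-----BEGIN (?:RSA )?PRIVATE KEY-----/,   // PEM private keys
];

// Step 1: reject any memory that looks like it contains a credential.
function passesSecurityFilter(text: string): boolean {
  return !SECRET_PATTERNS.some((p) => p.test(text));
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Step 3: a candidate is a duplicate if any stored vector is >= 0.92 similar.
function isDuplicate(candidate: number[], stored: number[][], threshold = 0.92): boolean {
  return stored.some((v) => cosineSimilarity(candidate, v) >= threshold);
}
```

No model in the loop: both checks are pure functions over the text and its embedding, which is what makes the sub-50ms budget plausible.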

The trade-off is real: LLM-based extraction catches implicit relationships that rule-based extractors miss. But for the vast majority of agent memory — where entities are explicit and speed matters — the algorithmic approach dominates.

But It's Not Just a Vector Wrapper

This is where it gets interesting. Mnemosyne implements 10 cognitive capabilities that previously existed mainly in research papers. Six highlights:

🧠 Activation Decay

Memories fade over time following the Ebbinghaus forgetting curve. Critical memories survive months. Background observations fade in hours. Procedural memories (like runbooks) never decay — just like how you never forget how to ride a bike.
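Assuming a simple exponential model of the forgetting curve, retention R = e^(−t/S) with a per-type stability S, the decay rule can be sketched like this; the type names and constants are illustrative, not Mnemosyne's internals:

```typescript
// Ebbinghaus-style activation decay sketch: retention R = e^(-t/S),
// where S is a per-memory-type stability expressed in hours.

type MemoryType = "core" | "episodic" | "background" | "procedural";

const STABILITY_HOURS: Record<MemoryType, number> = {
  core: 24 * 90,        // critical memories survive months
  episodic: 24 * 7,     // ordinary events fade over a week
  background: 6,        // background observations fade within hours
  procedural: Infinity, // runbooks never decay
};

// Activation in [0, 1]; below some floor a memory drops out of recall.
function activation(type: MemoryType, hoursSinceAccess: number): number {
  const s = STABILITY_HOURS[type];
  return s === Infinity ? 1 : Math.exp(-hoursSinceAccess / s);
}
```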

⚡ Flash Reasoning

Query "why did auth crash?" and get the full chain:

```
Config changed → JWT expiry shortened → Session storm → Rollback fixed it
```

One recall. Complete narrative. BFS through linked memory graphs.
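A minimal sketch of BFS over linked memories, using a hypothetical node shape (`id`, `text`, `links`) rather than Mnemosyne's actual graph schema:

```typescript
// Flash-reasoning sketch: starting from the memory that matched the query,
// breadth-first search follows links to reconstruct the causal chain.

interface MemoryNode { id: string; text: string; links: string[]; }

function traceChain(start: string, graph: Map<string, MemoryNode>): string[] {
  const visited = new Set<string>([start]);
  const chain: string[] = [];
  const queue = [start];
  while (queue.length > 0) {
    const id = queue.shift()!;
    const node = graph.get(id);
    if (!node) continue;
    chain.push(node.text);
    for (const next of node.links) {
      if (!visited.has(next)) { visited.add(next); queue.push(next); }
    }
  }
  return chain;
}

// The auth-crash example as a four-node chain:
const graph = new Map<string, MemoryNode>([
  ["m1", { id: "m1", text: "Config changed", links: ["m2"] }],
  ["m2", { id: "m2", text: "JWT expiry shortened", links: ["m3"] }],
  ["m3", { id: "m3", text: "Session storm", links: ["m4"] }],
  ["m4", { id: "m4", text: "Rollback fixed it", links: [] }],
]);
// traceChain("m1", graph) returns the four events in causal order
```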

🤝 Theory of Mind for Agents

Agent A can ask "what does Agent B know about the database?" without talking to Agent B. Modeled after developmental psychology research (Baron-Cohen, 1985).

📈 Reinforcement Learning

Memories that consistently help → auto-promoted to permanent. Bad memories → flagged. Your memory system gets smarter through use, not manual curation.
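One hedged sketch of how feedback-driven promotion and flagging could work; the score thresholds are invented for illustration:

```typescript
// Reinforcement sketch: each feedback signal nudges a score; sustained
// usefulness promotes a memory to permanent, sustained failure flags it.

interface TrackedMemory { score: number; permanent: boolean; flagged: boolean; }

function applyFeedback(m: TrackedMemory, feedback: "positive" | "negative"): TrackedMemory {
  const score = m.score + (feedback === "positive" ? 1 : -1);
  return {
    score,
    permanent: m.permanent || score >= 5,  // promote after repeated wins
    flagged: !m.permanent && score <= -3,  // flag consistently bad memories
  };
}
```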

🔗 Knowledge Graph (Built-in, Free)

Temporal entity graph with auto-linking, path finding, and timeline reconstruction. Mem0 charges $249/month for their knowledge graph. Ours ships with the MIT license.

🌐 Multi-Agent Mesh

When 3+ agents independently confirm the same fact, it's automatically promoted to "Mesh Fact" — the highest confidence tier. Real distributed consensus for AI knowledge.
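The quorum rule can be sketched as follows; the quorum of 3 mirrors the description above, while the field names and hashing scheme are assumptions:

```typescript
// Mesh-fact promotion sketch: a fact independently confirmed by three or
// more distinct agents is promoted to the highest confidence tier.

interface Confirmation { agentId: string; factHash: string; }

function meshFacts(confirmations: Confirmation[], quorum = 3): Set<string> {
  const agentsByFact = new Map<string, Set<string>>();
  for (const c of confirmations) {
    const agents = agentsByFact.get(c.factHash) ?? new Set<string>();
    agents.add(c.agentId); // a Set counts each agent once per fact
    agentsByFact.set(c.factHash, agents);
  }
  const promoted = new Set<string>();
  for (const [fact, agents] of agentsByFact) {
    if (agents.size >= quorum) promoted.add(fact);
  }
  return promoted;
}
```

Deduplicating by agent is the important detail: one agent repeating itself should not count as independent confirmation.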

33 Features, 5 Layers

```
L5  SELF-IMPROVEMENT
    Reinforcement · Consolidation · Flash Reasoning · ToMA · Synthesis

L4  COGNITIVE
    Activation Decay · Confidence · Priority · Diversity Reranking

L3  KNOWLEDGE GRAPH
    Temporal Graph · Auto-Linking · Path Traversal · Entity Extraction

L2  PIPELINE
    Security Filter · Classify · Dedup · Merge · 12-step Ingestion

L1  INFRASTRUCTURE
    Qdrant · FalkorDB · Redis Cache · Redis Pub/Sub
```

Every feature is independently toggleable. Start with just Qdrant, progressively enable as needed.

Running in Production Right Now

This isn't a demo. It's running on a 10-machine AI agent mesh:

  • 13,000+ memories stored
  • <50ms ingestion (full 12-step pipeline)
  • <200ms recall (multi-signal ranked, graph-enriched)
  • >60% cache hit rate in conversational workloads
  • 10 agents collaborating with shared memory

Quick Start (2 minutes)

```shell
npm install mnemosy-ai
docker run -d -p 6333:6333 qdrant/qdrant
```

```typescript
import { createMnemosyne } from 'mnemosy-ai'

const m = await createMnemosyne({
  vectorDbUrl: 'http://localhost:6333',
  embeddingUrl: 'http://localhost:11434/v1/embeddings',
  agentId: 'my-agent'
})

// Full 12-step pipeline, <50ms, $0
await m.store({ text: "User prefers dark mode and TypeScript" })

// Multi-signal ranked recall
const memories = await m.recall({ query: "user preferences" })

// Memories learn from feedback
await m.feedback("positive")
```

Only hard requirement: Qdrant. Redis and FalkorDB are optional power-ups.
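For intuition, here is one way a multi-signal recall ranking could blend vector similarity, recency, and priority; the weights and decay constant are invented for illustration and are not Mnemosyne's actual formula:

```typescript
// Multi-signal ranking sketch: blend similarity (0..1), recency, and
// priority (0..1) into one score, then sort candidates by it.

interface Candidate { similarity: number; hoursOld: number; priority: number; }

function recallScore(c: Candidate): number {
  const recency = Math.exp(-c.hoursOld / 72); // fades over roughly three days
  return 0.6 * c.similarity + 0.25 * recency + 0.15 * c.priority;
}

function rank(candidates: Candidate[]): Candidate[] {
  return [...candidates].sort((a, b) => recallScore(b) - recallScore(a));
}
```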

Honest Trade-offs

We're not claiming Mnemosyne is better at everything:

  • Mem0 has 41K stars, a great community, and production-hardened cloud offering
  • Cognee builds richer knowledge graphs via LLM extraction
  • Letta's LLM-directed memory management is genuinely innovative

Mnemosyne fills a specific niche: cognitive intelligence, multi-agent collaboration, zero-LLM economics, and self-improving memory in one open-source system. If that combination is what you need, this is currently the only option that exists.

The Bet

We're betting that the future of AI memory is deterministic, local-first, and free at the point of storage. That cognitive capabilities don't require sending every memory through GPT-4. That you can build a brain without renting someone else's.

13,000 memories and counting. Zero LLM calls. The math speaks for itself.


GitHub: github.com/28naem-del/mnemosyne
npm: npm install mnemosy-ai
Website: mnemosy.ai
Discord: discord.gg/Sp6ZXD3X
License: MIT

Mnemosyne — Because intelligence without memory isn't intelligence.
