We Built the First AI Agent Memory System With Zero LLM Calls
Every AI memory system on the market makes the same architectural choice: send your text to an LLM for extraction before storing it.
Mem0 calls GPT-4o. Zep makes multiple async LLM calls. Cognee uses LLMs for knowledge extraction. Letta's entire memory engine is an LLM.
That means every single memory.store() costs ~$0.01, takes 500ms-2s, and produces non-deterministic results. At 100K memories/month, you're paying $1,000-3,000 just to remember things.
We asked: what if you didn't need an LLM at all?
The result is Mnemosyne — the first cognitive memory OS for AI agents with zero LLM calls in the entire ingestion pipeline. 33 features, 5 cognitive layers, $0 per memory stored. MIT licensed.
The Cost Table Nobody Wants You to See
| System | LLM Required? | Cost per memory | 100K memories/mo |
|---|---|---|---|
| Mnemosyne | No | $0.00 | ~$60 (infra only) |
| Mem0 | Yes (GPT-4o) | ~$0.01 | $1,000-3,000 |
| Zep | Yes (multiple calls) | ~$0.01 | $1,000-2,000 |
| Cognee | Yes (extraction) | ~$0.01 | $1,000-5,000 |
| Letta/MemGPT | Yes (core engine) | ~$0.01 | $1,000-5,000 |
This isn't a criticism of these projects — Mem0 has 41K stars and popularized this entire space. But the LLM-in-the-loop architecture has fundamental trade-offs that nobody talks about.
Three Problems With LLM-Powered Memory
1. Non-deterministic behavior
The same input can produce different extracted facts on different runs. Your memory system's behavior changes when the model updates. In production, you need memory that behaves consistently.
2. Latency floor
Every store() requires an LLM API call — 500ms to 2 seconds minimum. When your agent processes 100 memories per session, that's 50-200 seconds of just waiting.
3. Linear cost scaling
At $0.01 per memory, 1M memories = $10,000. Per month. With no efficiency gains at scale.
How We Eliminated Every LLM Call
Mnemosyne's 12-step ingestion pipeline is 100% algorithmic:
- Security Filter — blocks API keys, credentials, secrets (regex patterns)
- Embedding — local vectors via Ollama (nomic-embed-text)
- Dedup & Merge — cosine similarity ≥0.92 = duplicate
- Entity Extraction — people, IPs, technologies, dates (pattern matching)
- Type Classification — 7 types: episodic, semantic, procedural, preference, relationship, profile, core
- Urgency Detection — critical / important / reference / background
- Domain Classification — technical / personal / project / knowledge / general
- Priority Scoring — urgency × domain composite
- Confidence Rating — 3-signal composite with human-readable tiers
- Vector Storage — 23-field metadata per memory
- Auto-Linking — bidirectional links to related memories
- Mesh Broadcast — published to agent network
Total time: <50ms. LLM calls: 0. Cost: $0.
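To make the dedup step concrete, here is a minimal sketch of cosine-similarity deduplication with the ≥0.92 threshold described above. The function names are illustrative, not Mnemosyne's actual API, and embeddings are assumed to be plain number arrays:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Pipeline rule: similarity >= 0.92 means the candidate is a duplicate.
const DUP_THRESHOLD = 0.92;

function isDuplicate(candidate: number[], existing: number[][]): boolean {
  return existing.some((e) => cosineSimilarity(candidate, e) >= DUP_THRESHOLD);
}
```

Because this is pure arithmetic, it runs in microseconds and always returns the same answer for the same input, which is exactly what the deterministic-pipeline argument depends on.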
The trade-off is real: LLM-based extraction catches implicit relationships that rule-based extractors miss. But for the vast majority of agent memory — where entities are explicit and speed matters — the algorithmic approach dominates.
But It's Not Just a Vector Wrapper
This is where it gets interesting. Mnemosyne implements 10 cognitive capabilities that previously only existed in research papers:
🧠 Activation Decay
Memories fade over time following the Ebbinghaus forgetting curve. Critical memories survive for months; background observations fade within hours. Procedural memories (like runbooks) never decay, much as you never forget how to ride a bike.
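A minimal sketch of what Ebbinghaus-style decay could look like, assuming retention follows `exp(-elapsed / strength)` where strength grows with memory criticality. The tier names mirror the urgency levels above, but the specific strength values are illustrative assumptions, not Mnemosyne's actual constants:

```typescript
type Urgency = 'critical' | 'important' | 'reference' | 'background' | 'procedural';

// Illustrative decay strengths in hours; larger = slower fading.
const STRENGTH_HOURS: Record<Urgency, number> = {
  critical: 24 * 90,    // survives months
  important: 24 * 30,
  reference: 24 * 7,
  background: 12,       // fades within hours
  procedural: Infinity, // never decays (runbooks, skills)
};

// Activation in [0, 1]: exp(-t/S), the Ebbinghaus retention curve.
function activation(urgency: Urgency, elapsedHours: number): number {
  return Math.exp(-elapsedHours / STRENGTH_HOURS[urgency]);
}
```

Setting the procedural strength to `Infinity` makes its activation exactly 1 forever, which is one clean way to encode "never decays."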
⚡ Flash Reasoning
Query "why did auth crash?" and get the full chain:
Config changed → JWT expiry shortened → Session storm → Rollback fixed it
One recall. Complete narrative. BFS through linked memory graphs.
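A BFS over bidirectional memory links is simple to sketch. This is an illustrative traversal under assumed types (`MemoryNode`, `traceChain` are hypothetical names, not the library's API), showing how a causal chain like the one above can be reconstructed from linked memories:

```typescript
interface MemoryNode {
  id: string;
  text: string;
  links: string[]; // ids of linked memories
}

// Breadth-first walk from a starting memory, collecting the chain of
// memory texts in visit order.
function traceChain(start: string, graph: Map<string, MemoryNode>): string[] {
  const visited = new Set<string>([start]);
  const queue: string[] = [start];
  const chain: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift()!;
    const node = graph.get(id);
    if (!node) continue;
    chain.push(node.text);
    for (const next of node.links) {
      if (!visited.has(next)) {
        visited.add(next);
        queue.push(next);
      }
    }
  }
  return chain;
}
```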
🤝 Theory of Mind for Agents
Agent A can ask "what does Agent B know about the database?" without talking to Agent B. Modeled after developmental psychology research (Baron-Cohen, 1985).
📈 Reinforcement Learning
Memories that consistently help → auto-promoted to permanent. Bad memories → flagged. Your memory system gets smarter through use, not manual curation.
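One simple way to implement this is per-memory feedback counters with promotion and flagging thresholds. The thresholds and field names below are hypothetical, chosen only to illustrate the promote/flag rule described above:

```typescript
interface ScoredMemory {
  positive: number;
  negative: number;
  permanent: boolean; // promoted: never decays
  flagged: boolean;   // candidate for review/removal
}

function applyFeedback(m: ScoredMemory, signal: 'positive' | 'negative'): ScoredMemory {
  const next = { ...m };
  if (signal === 'positive') next.positive++;
  else next.negative++;
  // Illustrative thresholds: consistently helpful -> permanent,
  // consistently unhelpful -> flagged.
  if (next.positive >= 5 && next.positive > 3 * next.negative) next.permanent = true;
  if (next.negative >= 3 && next.negative > next.positive) next.flagged = true;
  return next;
}
```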
🔗 Knowledge Graph (Built-in, Free)
Temporal entity graph with auto-linking, path finding, and timeline reconstruction. Mem0 charges $249/month for their knowledge graph. Ours ships with the MIT license.
🌐 Multi-Agent Mesh
When 3+ agents independently confirm the same fact, it's automatically promoted to "Mesh Fact" — the highest confidence tier. Real distributed consensus for AI knowledge.
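The promotion rule itself is a small counting check. This sketch assumes confirmations carry an agent id and a normalized fact string; `isMeshFact` is an illustrative name, not the actual API, and real fact matching would be similarity-based rather than exact string equality:

```typescript
interface Confirmation {
  agentId: string;
  fact: string;
}

// A fact becomes a "Mesh Fact" once 3+ distinct agents have
// independently confirmed it.
function isMeshFact(confirmations: Confirmation[], fact: string): boolean {
  const agents = new Set(
    confirmations.filter((c) => c.fact === fact).map((c) => c.agentId)
  );
  return agents.size >= 3;
}
```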
33 Features, 5 Layers
- L5 Self-Improvement: Reinforcement · Consolidation · Flash Reasoning · ToMA · Synthesis
- L4 Cognitive: Activation Decay · Confidence · Priority · Diversity Reranking
- L3 Knowledge Graph: Temporal Graph · Auto-Linking · Path Traversal · Entity Extraction
- L2 Pipeline: Security Filter · Classify · Dedup · Merge · 12-step Ingestion
- L1 Infrastructure: Qdrant · FalkorDB · Redis Cache · Redis Pub/Sub
Every feature is independently toggleable. Start with just Qdrant, progressively enable as needed.
Running in Production Right Now
This isn't a demo. It's running on a 10-machine AI agent mesh:
- 13,000+ memories stored
- <50ms ingestion (full 12-step pipeline)
- <200ms recall (multi-signal ranked, graph-enriched)
- >60% cache hit rate in conversational workloads
- 10 agents collaborating with shared memory
Quick Start (2 minutes)
```shell
npm install mnemosy-ai
docker run -d -p 6333:6333 qdrant/qdrant
```

```typescript
import { createMnemosyne } from 'mnemosy-ai'

const m = await createMnemosyne({
  vectorDbUrl: 'http://localhost:6333',
  embeddingUrl: 'http://localhost:11434/v1/embeddings',
  agentId: 'my-agent'
})

// Full 12-step pipeline, <50ms, $0
await m.store({ text: "User prefers dark mode and TypeScript" })

// Multi-signal ranked recall
const memories = await m.recall({ query: "user preferences" })

// Memories learn from feedback
await m.feedback("positive")
```
Only hard requirement: Qdrant. Redis and FalkorDB are optional power-ups.
Honest Trade-offs
We're not claiming Mnemosyne is better at everything:
- Mem0 has 41K stars, a great community, and production-hardened cloud offering
- Cognee builds richer knowledge graphs via LLM extraction
- Letta's LLM-directed memory management is genuinely innovative
Mnemosyne fills a specific niche: if you need cognitive intelligence, multi-agent collaboration, zero-LLM economics, and self-improving memory in one open-source system, this is currently the only option.
The Bet
We're betting that the future of AI memory is deterministic, local-first, and free at the point of storage. That cognitive capabilities don't require sending every memory through GPT-4. That you can build a brain without renting someone else's.
13,000 memories and counting. Zero LLM calls. The math speaks for itself.
GitHub: github.com/28naem-del/mnemosyne
npm: npm install mnemosy-ai
Website: mnemosy.ai
Discord: discord.gg/Sp6ZXD3X
License: MIT
Mnemosyne — Because intelligence without memory isn't intelligence.