Why AI Agents Need Brains, Not Just Vector Databases
Every AI agent shipping today has a fundamental problem: amnesia.
Load up any agent framework — LangChain, CrewAI, AutoGen, custom builds — and start a conversation. Ask it about your project. It knows nothing. Give it context across 50 turns. Then watch the context window compact. It knows nothing again.
This isn't a minor UX issue. It's the single biggest bottleneck to autonomous AI. Agents can't learn from mistakes if they don't remember making them. They can't build expertise if every session starts from scratch. They can't collaborate if they can't share what they know.
The industry's response has been to wrap vector databases with LLM-powered extraction layers. Send text to GPT-4, extract key facts, store as vectors, retrieve by similarity. Systems like Mem0, Zep, Cognee, and Letta have raised ~$47M combined doing variations of this approach.
It works for demos. It doesn't work for production. Here's why.
The Problem with LLM-in-the-Loop Memory
When you put an LLM in your memory ingestion pipeline, you inherit three structural problems:
1. Non-deterministic behavior. The same input can produce different extracted facts on different runs. Your memory system's behavior changes when the model version changes, when the prompt drifts, when the temperature fluctuates. In production, you need memory that behaves consistently.
2. Latency floor. Every memory store operation requires an LLM API call — 500ms to 2 seconds minimum. When your agent processes 100 memories per session, that's 50-200 seconds of just waiting for extraction. For real-time agent interactions, this is unacceptable.
3. Linear cost scaling. At approximately $0.01 per memory, storing 100K memories costs $1,000. A million memories costs $10,000. Per month. This scales linearly with no efficiency gains. For production systems processing tens of thousands of interactions daily, the economics are brutal.
These aren't implementation bugs. They're architectural consequences of the LLM-in-the-loop design.
What If Memory Worked Like a Brain?
We spent months running a 10-machine AI agent mesh — 10 agents collaborating on real tasks, 13,000+ memories accumulated, sub-200ms retrieval requirements. The vector-store-plus-LLM approach broke down immediately. We needed something fundamentally different.
So we built Mnemosyne: a 5-layer cognitive memory operating system for AI agents. Not another vector wrapper. An actual memory architecture inspired by how biological memory systems work — from the neural substrate up to metacognition.
```
+----------------------------------------------------------------------+
|                        MNEMOSYNE COGNITIVE OS                        |
|                                                                      |
|  L5  SELF-IMPROVEMENT                                                |
|  [ Reinforcement ] [ Consolidation ] [ Flash Reasoning ] [ ToMA ]    |
|                                                                      |
|  L4  COGNITIVE                                                       |
|  [ Activation Decay ] [ Confidence ] [ Priority ] [ Diversity ]      |
|                                                                      |
|  L3  KNOWLEDGE GRAPH                                                 |
|  [ Temporal Graph ] [ Auto-Linking ] [ Path Traversal ] [ Entities ] |
|                                                                      |
|  L2  PIPELINE                                                        |
|  [ Extraction ] [ Classify ] [ Dedup & Merge ] [ Security Filter ]   |
|                                                                      |
|  L1  INFRASTRUCTURE                                                  |
|  [ Qdrant ] [ FalkorDB ] [ Redis Cache ] [ Redis Pub/Sub ]           |
+----------------------------------------------------------------------+
```
33 features across 5 layers. Every feature independently toggleable. MIT licensed. TypeScript.
Zero LLM Calls: The Core Design Bet
The most controversial architectural decision in Mnemosyne: the entire ingestion pipeline runs without any LLM calls.
Every memory passes through a deterministic 12-step pipeline:
- Security Filter — 3-tier classification blocks API keys, credentials, private keys
- Embedding — 768-dim vectors via any OpenAI-compatible endpoint
- Dedup & Merge — Cosine ≥0.92 = duplicate (merge). 0.70–0.92 = conflict (alert).
- Entity Extraction — People, IPs, technologies, dates, URLs. Algorithmic, not LLM.
- Type Classification — 7 types: episodic, semantic, preference, procedural, relationship, profile, core
- Urgency Detection — 4 levels: critical, important, reference, background
- Domain Classification — 5 domains: technical, personal, project, knowledge, general
- Priority Scoring — Urgency × domain composite (0.0–1.0)
- Confidence Rating — 3-signal composite with 4 human-readable tiers
- Vector Storage — Written to appropriate collection with 23-field metadata
- Auto-Linking — Bidirectional links to related memories (Zettelkasten-style)
- Broadcast — Published to agent mesh via typed channels
Total time: <50ms. LLM calls: 0. Cost: $0.
```typescript
import { createMnemosyne } from 'mnemosy-ai'

const m = await createMnemosyne({
  vectorDbUrl: 'http://localhost:6333',
  embeddingUrl: 'http://localhost:11434/v1/embeddings',
  agentId: 'my-agent'
})

// Full 12-step pipeline, <50ms, $0
await m.store({ text: "CRITICAL: Auth service JWT expiry changed from 1hr to 30min" })
// -> type: semantic, urgency: critical, domain: technical
// -> priority: 1.0, entities: [Auth service, JWT, 1hr, 30min]
// -> auto-linked to 2 existing JWT memories
// -> broadcast to agent mesh with critical priority
```
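The Dedup & Merge thresholds (step 3 of the pipeline) reduce to plain cosine arithmetic. A sketch, using the 0.92 and 0.70 cutoffs from the pipeline description; the function names are illustrative, not Mnemosyne's internal API:

```typescript
// Illustrative sketch of the Dedup & Merge thresholds. The cutoffs
// (0.92, 0.70) come from the pipeline description; names are hypothetical.
type DedupVerdict = 'duplicate' | 'conflict' | 'distinct'

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

function classifyPair(a: number[], b: number[]): DedupVerdict {
  const sim = cosineSimilarity(a, b)
  if (sim >= 0.92) return 'duplicate' // merge into the existing memory
  if (sim >= 0.70) return 'conflict'  // overlapping but not identical: alert
  return 'distinct'                   // store as a new memory
}
```

Because the thresholds are fixed numbers rather than model outputs, the same pair of vectors always yields the same verdict.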
The trade-off is real: LLM-based extraction catches implicit relationships and nuanced semantic structure that algorithmic extraction misses. Cognee's LLM-powered graph construction builds richer knowledge graphs for document corpora. But for the vast majority of agent memory operations — where entities are explicit, facts are stated directly, and you need speed, consistency, and zero cost — the algorithmic approach dominates.
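As a concrete illustration of what "algorithmic, not LLM" extraction looks like, here is a minimal regex-based extractor for a few explicit entity kinds. The patterns and coverage are assumptions for illustration, not Mnemosyne's actual rules:

```typescript
// Minimal sketch of regex-based entity extraction. Patterns are
// illustrative only; a production extractor would cover far more kinds.
function extractEntities(text: string): Record<string, string[]> {
  const patterns: Record<string, RegExp> = {
    ip: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,       // IPv4 addresses
    url: /https?:\/\/[^\s)]+/g,               // http(s) URLs
    isoDate: /\b\d{4}-\d{2}-\d{2}\b/g,        // ISO 8601 dates
  }
  const out: Record<string, string[]> = {}
  for (const [kind, re] of Object.entries(patterns)) {
    out[kind] = text.match(re) ?? []          // match returns null on no hit
  }
  return out
}
```

This is the trade-off in miniature: a regex will never infer that "the database" refers to PostgreSQL, but it runs in microseconds and returns the same answer every time.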
Cognitive Features That Only Exist in Papers
Here's where it gets interesting. Beyond the pipeline, Mnemosyne implements 10 capabilities that previously existed only in academic research:
Activation Decay
Memories fade over time following a logarithmic model inspired by the Ebbinghaus forgetting curve. Critical memories stay active for months. Background observations fade within hours. Procedural memories (runbooks, deployment steps) are immune to decay — like how you never forget how to ride a bike.
```typescript
// Critical memory: stays active for months
await m.store({ text: "CRITICAL: Never deploy to prod on Fridays" })
// -> decay rate: 0.3, baseline: +2.0

// Background memory: fades within hours
await m.store({ text: "User mentioned they had coffee this morning" })
// -> decay rate: 0.8, baseline: -1.0

// Procedural memory: immune to decay forever
await m.store({ text: "To deploy: 1) Run tests 2) Build 3) Push 4) Apply" })
// -> type: procedural, activation: permanent
```
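For intuition, the behavior above can be sketched with the classic exponential form of the Ebbinghaus curve. Note the hedge: the post describes a logarithmic model, and this simpler exponential form, the stability values, and the function name are all assumptions for illustration:

```typescript
// Illustrative forgetting-curve sketch (not Mnemosyne's actual formula):
// retention R(t) = exp(-t / S), where stability S is large for critical
// memories and small for background chatter. Procedural memories never decay.
function retention(
  hoursSinceAccess: number,
  stabilityHours: number,
  procedural = false,
): number {
  if (procedural) return 1 // immune to decay, like riding a bike
  return Math.exp(-hoursSinceAccess / stabilityHours)
}

// Critical memory with high stability: still ~72% after a month
const critical = retention(24 * 30, 24 * 90)
// Background memory with low stability: ~14% after six hours
const background = retention(6, 3)
```

The key property is that decay is a pure function of elapsed time and per-memory parameters, so it costs nothing to evaluate at recall time.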
Multi-Signal Scoring with Intent Detection
Every recall query is scored across 5 independent signals — not just cosine similarity:
| Signal | Weight | What it measures |
|---|---|---|
| Semantic Similarity | 35% | Vector distance |
| Temporal Recency | 20% | Time since last access |
| Importance × Confidence | 20% | Priority score × confidence |
| Access Frequency | 15% | How often retrieved (log scale) |
| Type Relevance | 10% | Memory type vs. query intent |
Mnemosyne auto-detects 5 query intents (factual, temporal, procedural, preference, exploratory) and dynamically adjusts these weights. A temporal query ("what happened recently?") boosts recency to 35%. A procedural query ("how do I deploy?") boosts frequency and type relevance.
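A sketch of how intent-aware weighting could work. The base weights and the 35% recency boost for temporal queries come from the text above; the procedural boost amounts and the rebalancing of the remaining weights are assumptions, chosen so the weights still sum to 1:

```typescript
// Sketch of intent-weighted multi-signal scoring. Base weights match the
// table above; the specific rebalanced values are illustrative assumptions.
interface Signals {
  similarity: number; recency: number; importance: number;
  frequency: number; typeRelevance: number;
}
type Intent = 'factual' | 'temporal' | 'procedural' | 'preference' | 'exploratory'

const BASE_WEIGHTS: Signals = {
  similarity: 0.35, recency: 0.20, importance: 0.20,
  frequency: 0.15, typeRelevance: 0.10,
}

function weightsFor(intent: Intent): Signals {
  const w = { ...BASE_WEIGHTS }
  if (intent === 'temporal') {
    w.recency = 0.35    // from the text: temporal queries boost recency to 35%
    w.similarity = 0.20 // assumption: take the difference out of similarity
  }
  if (intent === 'procedural') {
    // assumed boost amounts for frequency and type relevance
    w.frequency = 0.25; w.typeRelevance = 0.20
    w.similarity = 0.25; w.recency = 0.10
  }
  return w
}

function score(s: Signals, intent: Intent): number {
  const w = weightsFor(intent)
  return (Object.keys(w) as (keyof Signals)[])
    .reduce((sum, k) => sum + w[k] * s[k], 0)
}
```

The point of keeping weights normalized is that scores stay comparable across intents, so a single ranked list can mix results from differently weighted queries.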
Flash Reasoning
BFS traversal through linked memory graphs that reconstructs multi-step logic chains:
```typescript
const results = await m.recall({ query: "why did auth service crash?" })
// Primary: "Auth service crashed after config update"
// Chain: -> (because)   "Config changed JWT expiry from 1hr to 30min"
//        -> (leads_to)  "Short-lived tokens caused session storm"
//        -> (therefore) "Rollback to 1hr expiry resolved the issue"
```
Your agent gets the complete narrative from a single recall.
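Chain reconstruction like this amounts to a bounded breadth-first traversal over linked memories. A minimal sketch, with hypothetical node and link types:

```typescript
// Minimal BFS sketch over linked memories. The node/link shapes are
// hypothetical; Mnemosyne's actual graph types may differ.
interface MemoryNode {
  id: string
  text: string
  links: { to: string; relation: string }[]
}

function reasoningChain(
  graph: Map<string, MemoryNode>,
  startId: string,
  maxDepth = 4, // bound the traversal so chains stay short and relevant
): string[] {
  const chain: string[] = []
  const visited = new Set<string>([startId])
  const queue: { id: string; depth: number }[] = [{ id: startId, depth: 0 }]
  while (queue.length > 0) {
    const { id, depth } = queue.shift()!
    const node = graph.get(id)
    if (!node || depth > maxDepth) continue
    chain.push(node.text)
    for (const link of node.links) {
      if (!visited.has(link.to)) {
        visited.add(link.to)
        queue.push({ id: link.to, depth: depth + 1 })
      }
    }
  }
  return chain
}
```

The visited set prevents cycles (memory graphs are rarely acyclic), and the depth bound keeps the chain from sprawling into loosely related memories.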
Theory of Mind for Agents
In a multi-agent mesh, any agent can model what other agents know:
```typescript
// What does the DevOps agent know about the production database?
const knowledge = await m.toma("devops-agent", "production database")

// Knowledge gap analysis
const gap = await m.knowledgeGap("frontend-agent", "backend-agent", "API contracts")
// -> { onlyFrontendKnows: [...], onlyBackendKnows: [...], bothKnow: [...] }
```
This concept comes from developmental psychology (Baron-Cohen, 1985) and multi-agent systems research (Gmytrasiewicz & Doshi, 2005). To our knowledge, it has not previously been deployed as production infrastructure.
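One plausible way a knowledge-gap result could be computed internally is a set difference over each agent's memories on the topic. A sketch under that assumption (the real implementation may well differ):

```typescript
// Hypothetical internals of a knowledge-gap query: compare two agents'
// memory sets on a topic via plain set operations.
function computeKnowledgeGap(a: Set<string>, b: Set<string>) {
  return {
    onlyA: [...a].filter(x => !b.has(x)), // what only agent A knows
    onlyB: [...b].filter(x => !a.has(x)), // what only agent B knows
    both:  [...a].filter(x => b.has(x)),  // shared knowledge
  }
}
```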
Cross-Agent Synthesis
When 3+ agents independently store corroborating memories about the same fact, it's automatically promoted to "Mesh Fact" — the highest confidence tier. Independent corroboration from separate agents operating in different contexts is strong evidence of factual accuracy.
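The promotion rule can be sketched as counting distinct corroborating agents per fact. The 3-agent threshold comes from the text above; the data shapes and names are illustrative:

```typescript
// Sketch of mesh-fact promotion: a fact is promoted only when at least
// minAgents *distinct* agents have stored a corroborating memory.
interface Corroboration { agentId: string; factKey: string }

function meshFacts(observations: Corroboration[], minAgents = 3): string[] {
  const byFact = new Map<string, Set<string>>()
  for (const { agentId, factKey } of observations) {
    if (!byFact.has(factKey)) byFact.set(factKey, new Set())
    byFact.get(factKey)!.add(agentId) // a Set counts distinct agents, not repeats
  }
  return [...byFact.entries()]
    .filter(([, agents]) => agents.size >= minAgents)
    .map(([fact]) => fact)
}
```

Using a set per fact is the important detail: one agent repeating itself three times is not corroboration.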
Reinforcement Learning on Memory
Feedback closes the loop. Memories that consistently prove useful are promoted to core status (immune to decay). Memories that consistently mislead are flagged for review. Over time, retrieval quality improves without manual curation.
```typescript
await m.recall({ query: "database config" })
// Agent uses the result successfully...
await m.feedback("positive")
// After 3+ retrievals with >70% positive ratio -> auto-promoted to core
```
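That promotion rule (3+ retrievals with a >70% positive ratio) reduces to a small pure function. The function name and signature are illustrative:

```typescript
// Sketch of the core-promotion rule stated above. Thresholds come from
// the text; the name and signature are hypothetical.
function shouldPromoteToCore(positive: number, negative: number): boolean {
  const total = positive + negative
  if (total < 3) return false   // need at least 3 rated retrievals
  return positive / total > 0.7 // and a strictly >70% positive ratio
}
```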
The Knowledge Graph: Built-In, Free, Temporal
Mnemosyne includes a temporal knowledge graph powered by FalkorDB. Every entity extracted from memories becomes a graph node. Relationships carry timestamps. The graph grows automatically as memories are stored.
- Auto-linking: Related memories are bidirectionally connected (Zettelkasten-style)
- Path finding: "How is Alice related to PostgreSQL?" → Alice → deployed auth service → auth service uses → PostgreSQL
- Timeline reconstruction: Chronological history of everything known about any entity
- Temporal queries: "What was server-1 connected to as of January 15th?"
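An "as of" query boils down to filtering relationships by a validity interval. A sketch with a hypothetical edge shape; FalkorDB would answer this with a graph query, so this only shows the semantics:

```typescript
// Sketch of temporal "as of" semantics: keep only edges whose validity
// interval covers the requested instant. Edge shape is an assumption.
interface TemporalEdge {
  from: string; to: string; relation: string;
  validFrom: number;        // epoch millis when the relationship began
  validTo: number | null;   // null = still valid today
}

function connectionsAsOf(edges: TemporalEdge[], node: string, at: number): TemporalEdge[] {
  return edges.filter(e =>
    (e.from === node || e.to === node) &&
    e.validFrom <= at &&
    (e.validTo === null || at <= e.validTo)
  )
}
```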
Mem0 charges $249/month for its knowledge graph. Mnemosyne's ships free under the MIT license.
Cost Comparison at Scale
| Memories/month | Mnemosyne | Mem0 | Zep | Cognee | Letta |
|---|---|---|---|---|---|
| 10K | ~$30 | ~$130-330 | ~$70-220 | ~$140-540 | ~$130-530 |
| 100K | ~$60 | ~$1K-3K | ~$1K-2K | ~$1K-5K | ~$1K-5K |
| 1M | ~$250 | ~$10K-30K | — | ~$10K-50K | ~$10K-50K |
The difference is entirely the per-memory LLM processing cost that Mnemosyne eliminates. Infrastructure costs (Qdrant, Redis, FalkorDB) are roughly equivalent across all systems.
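The table's shape follows from a simple model: a fixed infrastructure cost plus, for LLM-in-the-loop systems, the roughly $0.01 per memory established earlier. The flat $30/month infrastructure figure below is an assumption for illustration:

```typescript
// Back-of-the-envelope model behind the comparison table. The ~$0.01 per
// memory comes from earlier in the post; the $30/month infrastructure
// figure is an illustrative assumption.
function monthlyCost(
  memoriesPerMonth: number,
  usesLlmIngestion: boolean,
  infraUsd = 30,
): number {
  const llmUsd = usesLlmIngestion ? memoriesPerMonth * 0.01 : 0
  return infraUsd + llmUsd
}

monthlyCost(100_000, true)  // 1030: dominated by per-memory LLM calls
monthlyCost(100_000, false) // 30: infrastructure only
```

At 100K memories/month the per-memory LLM cost alone ($1,000) dwarfs the shared infrastructure cost, which is essentially the entire gap in the table.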
Feature Count Comparison
| System | Features | Knowledge Graph | Multi-Agent | Self-Improving | Cost/Memory |
|---|---|---|---|---|---|
| Mnemosyne | 33 | Free (built-in) | Full mesh | Yes (RL + consolidation) | $0 |
| Mem0 | ~5 | $249/mo | Enterprise only | No | ~$0.01 |
| Zep | ~3 | None | None | No | ~$0.01 |
| Cognee | ~5 | Built-in | None | No | ~$0.01 |
| LangMem | ~0 | None | None | No | ~$0.01 |
| Letta | ~4 | None | Basic | No | ~$0.01 |
Getting Started
```shell
npm install mnemosy-ai
docker run -d -p 6333:6333 qdrant/qdrant   # Only hard requirement
```
```typescript
import { createMnemosyne } from 'mnemosy-ai'

const m = await createMnemosyne({
  vectorDbUrl: 'http://localhost:6333',
  embeddingUrl: 'http://localhost:11434/v1/embeddings',
  agentId: 'my-agent'
})

await m.store({ text: "User prefers TypeScript and dark mode" })
const memories = await m.recall({ query: "user preferences" })
await m.feedback("positive")
```
Start with just Qdrant (vector-only mode). Add FalkorDB for the knowledge graph. Add Redis for multi-agent mesh. Every feature is independently toggleable — adopt progressively.
What We Didn't Build
To be honest about scope: Mnemosyne doesn't have a managed cloud offering (you run your own infra). It's TypeScript-only (the AI/ML ecosystem is mostly Python). It doesn't have 41K GitHub stars (Mem0 earned those). And its algorithmic entity extraction won't catch the implicit relationships that Cognee's LLM-powered extraction finds.
These are real trade-offs. Mnemosyne is purpose-built for teams that need cognitive intelligence, multi-agent collaboration, zero-LLM economics, and self-improving memory — and are willing to run their own infrastructure in exchange.
Try It
- GitHub: github.com/28naem-del/mnemosyne
- npm: `npm install mnemosy-ai`
- Website: mnemosy.ai
- Discord: discord.gg/Sp6ZXD3X
- License: MIT
33 features. 5 cognitive layers. $0 per memory stored. The brain your agents are missing.
Mnemosyne — Because intelligence without memory isn't intelligence.