<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Mnemosy </title>
    <description>The latest articles on DEV Community by Mnemosy  (@mnemosybrain).</description>
    <link>https://dev.to/mnemosybrain</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3790381%2F1b47276f-b7c4-47ab-9c2f-d445d18b8c67.png</url>
      <title>DEV Community: Mnemosy </title>
      <link>https://dev.to/mnemosybrain</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mnemosybrain"/>
    <language>en</language>
    <item>
      <title>We Built the First AI Agent Memory System With Zero LLM Calls — Here's the Architecture</title>
      <dc:creator>Mnemosy </dc:creator>
      <pubDate>Tue, 24 Feb 2026 22:04:50 +0000</pubDate>
      <link>https://dev.to/mnemosybrain/we-built-the-first-ai-agent-memory-system-with-zero-llm-calls-heres-the-architecture-5hgc</link>
      <guid>https://dev.to/mnemosybrain/we-built-the-first-ai-agent-memory-system-with-zero-llm-calls-heres-the-architecture-5hgc</guid>
      <description>&lt;h1&gt;
  
  
  Why AI Agents Need Brains, Not Just Vector Databases
&lt;/h1&gt;

&lt;p&gt;Every AI agent shipping today has a fundamental problem: amnesia.&lt;/p&gt;

&lt;p&gt;Load up any agent framework — LangChain, CrewAI, AutoGen, custom builds — and start a conversation. Ask it about your project. It knows nothing. Give it context across 50 turns. Then watch the context window compact. It knows nothing again.&lt;/p&gt;

&lt;p&gt;This isn't a minor UX issue. It's the single biggest bottleneck to autonomous AI. Agents can't learn from mistakes if they don't remember making them. They can't build expertise if every session starts from scratch. They can't collaborate if they can't share what they know.&lt;/p&gt;

&lt;p&gt;The industry's response has been to wrap vector databases with LLM-powered extraction layers. Send text to GPT-4, extract key facts, store as vectors, retrieve by similarity. Systems like Mem0, Zep, Cognee, and Letta have raised ~$47M combined doing variations of this approach.&lt;/p&gt;

&lt;p&gt;It works for demos. It doesn't work for production. Here's why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with LLM-in-the-Loop Memory
&lt;/h2&gt;

&lt;p&gt;When you put an LLM in your memory ingestion pipeline, you inherit three structural problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Non-deterministic behavior.&lt;/strong&gt; The same input can produce different extracted facts on different runs. Your memory system's behavior changes when the model version changes, when the prompt drifts, when the temperature fluctuates. In production, you need memory that behaves consistently.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Latency floor.&lt;/strong&gt; Every memory store operation requires an LLM API call — 500ms to 2 seconds minimum. When your agent processes 100 memories per session, that's 50-200 seconds of just waiting for extraction. For real-time agent interactions, this is unacceptable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Linear cost scaling.&lt;/strong&gt; At approximately $0.01 per memory, storing 100K memories costs $1,000. A million memories costs $10,000. Per month. This scales linearly with no efficiency gains. For production systems processing tens of thousands of interactions daily, the economics are brutal.&lt;/p&gt;

&lt;p&gt;These aren't implementation bugs. They're architectural consequences of the LLM-in-the-loop design.&lt;/p&gt;

&lt;h2&gt;
  
  
  What If Memory Worked Like a Brain?
&lt;/h2&gt;

&lt;p&gt;We spent months running a 10-machine AI agent mesh — 10 agents collaborating on real tasks, 13,000+ memories accumulated, sub-200ms retrieval requirements. The vector-store-plus-LLM approach broke down immediately. We needed something fundamentally different.&lt;/p&gt;

&lt;p&gt;So we built &lt;strong&gt;Mnemosyne&lt;/strong&gt;: a 5-layer cognitive memory operating system for AI agents. Not another vector wrapper. An actual memory architecture inspired by how biological memory systems work — from the neural substrate up to metacognition.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;+----------------------------------------------------------------------+
|                      MNEMOSYNE COGNITIVE OS                          |
|                                                                      |
|  L5  SELF-IMPROVEMENT                                                |
|  [ Reinforcement ] [ Consolidation ] [ Flash Reasoning ] [ ToMA ]    |
|                                                                      |
|  L4  COGNITIVE                                                       |
|  [ Activation Decay ] [ Confidence ] [ Priority ] [ Diversity ]      |
|                                                                      |
|  L3  KNOWLEDGE GRAPH                                                 |
|  [ Temporal Graph ] [ Auto-Linking ] [ Path Traversal ] [ Entities ] |
|                                                                      |
|  L2  PIPELINE                                                        |
|  [ Extraction ] [ Classify ] [ Dedup &amp;amp; Merge ] [ Security Filter ]   |
|                                                                      |
|  L1  INFRASTRUCTURE                                                  |
|  [ Qdrant ] [ FalkorDB ] [ Redis Cache ] [ Redis Pub/Sub ]          |
+----------------------------------------------------------------------+
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;33 features across 5 layers. Every feature independently toggleable. MIT licensed. TypeScript.&lt;/p&gt;

&lt;h2&gt;
  
  
  Zero LLM Calls: The Core Design Bet
&lt;/h2&gt;

&lt;p&gt;The most controversial architectural decision in Mnemosyne: &lt;strong&gt;the entire ingestion pipeline runs without any LLM calls.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Every memory passes through a deterministic 12-step pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Security Filter&lt;/strong&gt; — 3-tier classification blocks API keys, credentials, private keys&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt; — 768-dim vectors via any OpenAI-compatible endpoint&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedup &amp;amp; Merge&lt;/strong&gt; — Cosine ≥0.92 = duplicate (merge). 0.70–0.92 = conflict (alert).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity Extraction&lt;/strong&gt; — People, IPs, technologies, dates, URLs. Algorithmic, not LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type Classification&lt;/strong&gt; — 7 types: episodic, semantic, preference, procedural, relationship, profile, core&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Urgency Detection&lt;/strong&gt; — 4 levels: critical, important, reference, background&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain Classification&lt;/strong&gt; — 5 domains: technical, personal, project, knowledge, general&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Priority Scoring&lt;/strong&gt; — Urgency × domain composite (0.0–1.0)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence Rating&lt;/strong&gt; — 3-signal composite with 4 human-readable tiers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Storage&lt;/strong&gt; — Written to appropriate collection with 23-field metadata&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-Linking&lt;/strong&gt; — Bidirectional links to related memories (Zettelkasten-style)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broadcast&lt;/strong&gt; — Published to agent mesh via typed channels&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Total time: &amp;lt;50ms. LLM calls: 0. Cost: $0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createMnemosyne&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mnemosy-ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createMnemosyne&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;vectorDbUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;embeddingUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:11434/v1/embeddings&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Full 12-step pipeline, &amp;lt;50ms, $0&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CRITICAL: Auth service JWT expiry changed from 1hr to 30min&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// -&amp;gt; type: semantic, urgency: critical, domain: technical&lt;/span&gt;
&lt;span class="c1"&gt;// -&amp;gt; priority: 1.0, entities: [Auth service, JWT, 1hr, 30min]&lt;/span&gt;
&lt;span class="c1"&gt;// -&amp;gt; auto-linked to 2 existing JWT memories&lt;/span&gt;
&lt;span class="c1"&gt;// -&amp;gt; broadcast to agent mesh with critical priority&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The trade-off is real: LLM-based extraction catches implicit relationships and nuanced semantic structure that algorithmic extraction misses. Cognee's LLM-powered graph construction builds richer knowledge graphs for document corpora. But for the vast majority of agent memory operations — where entities are explicit, facts are stated directly, and you need speed, consistency, and zero cost — the algorithmic approach dominates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cognitive Features That Previously Existed Only in Papers
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. Beyond the pipeline, Mnemosyne implements 10 capabilities that previously existed only in academic research:&lt;/p&gt;

&lt;h3&gt;
  
  
  Activation Decay
&lt;/h3&gt;

&lt;p&gt;Memories fade over time following a logarithmic model inspired by the Ebbinghaus forgetting curve. Critical memories stay active for months. Background observations fade within hours. Procedural memories (runbooks, deployment steps) are &lt;strong&gt;immune to decay&lt;/strong&gt; — like how you never forget how to ride a bike.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Critical memory: stays active for months&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;CRITICAL: Never deploy to prod on Fridays&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// -&amp;gt; decay rate: 0.3, baseline: +2.0&lt;/span&gt;

&lt;span class="c1"&gt;// Background memory: fades within hours&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;User mentioned they had coffee this morning&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// -&amp;gt; decay rate: 0.8, baseline: -1.0&lt;/span&gt;

&lt;span class="c1"&gt;// Procedural memory: immune to decay forever&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;To deploy: 1) Run tests 2) Build 3) Push 4) Apply&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// -&amp;gt; type: procedural, activation: permanent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
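&lt;p&gt;The exact decay formula isn't spelled out above, but the numbers in the comments are consistent with a simple logarithmic curve. A minimal sketch — the formula, the initial strength of 2.0, and the zero-activation cutoff are illustrative assumptions; the rates and baselines come from the example:&lt;/p&gt;

```typescript
// Sketch of the activation-decay behavior described above. The formula,
// the initial strength of 2.0, and the zero cutoff are assumptions; the
// decay rates and baselines come from the example comments.
interface DecayParams { rate: number; baseline: number; }

const DECAY: { [urgency: string]: DecayParams } = {
  critical:   { rate: 0.3, baseline: 2.0 },  // stays active for months
  background: { rate: 0.8, baseline: -1.0 }, // fades within hours
};

// Logarithmic, Ebbinghaus-style curve: strength falls off with ln(1 + hours).
function activation(urgency: string, hours: number, procedural = false): number {
  if (procedural) return 1.0;                // procedural memories never decay
  const p = DECAY[urgency];
  return 2.0 + p.baseline - p.rate * Math.log(1 + hours);
}

// A memory counts as active while its activation stays above zero.
function isActive(urgency: string, hours: number): boolean {
  return activation(urgency, hours) > 0;
}
```

&lt;p&gt;Under these assumed parameters, a background memory crosses zero within a few hours while a critical one stays positive for months — matching the behavior in the example above.&lt;/p&gt;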



&lt;h3&gt;
  
  
  Multi-Signal Scoring with Intent Detection
&lt;/h3&gt;

&lt;p&gt;Every recall query is scored across 5 independent signals — not just cosine similarity:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Signal&lt;/th&gt;
&lt;th&gt;Weight&lt;/th&gt;
&lt;th&gt;What it measures&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Semantic Similarity&lt;/td&gt;
&lt;td&gt;35%&lt;/td&gt;
&lt;td&gt;Vector distance&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Temporal Recency&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;Time since last access&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Importance × Confidence&lt;/td&gt;
&lt;td&gt;20%&lt;/td&gt;
&lt;td&gt;Priority score × confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Access Frequency&lt;/td&gt;
&lt;td&gt;15%&lt;/td&gt;
&lt;td&gt;How often retrieved (log scale)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Type Relevance&lt;/td&gt;
&lt;td&gt;10%&lt;/td&gt;
&lt;td&gt;Memory type vs. query intent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mnemosyne auto-detects 5 query intents (factual, temporal, procedural, preference, exploratory) and &lt;strong&gt;dynamically adjusts these weights&lt;/strong&gt;. A temporal query ("what happened recently?") boosts recency to 35%. A procedural query ("how do I deploy?") boosts frequency and type relevance.&lt;/p&gt;
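&lt;p&gt;The weighted composite can be sketched directly from the table. Only the "recency to 35%" adjustment is stated above; the other intent re-weightings are illustrative assumptions, chosen so each weight set still sums to 1.0:&lt;/p&gt;

```typescript
// Sketch of the 5-signal composite from the table above. Signal values
// are assumed pre-normalized to [0, 1].
interface Signals {
  similarity: number;
  recency: number;
  importance: number;
  frequency: number;
  typeRelevance: number;
}

const BASE_WEIGHTS: Signals = {
  similarity: 0.35, recency: 0.20, importance: 0.20,
  frequency: 0.15, typeRelevance: 0.10,
};

function weightsFor(intent: string): Signals {
  const w = { ...BASE_WEIGHTS };
  if (intent === "temporal") {
    w.recency = 0.35;       // "what happened recently?" boosts recency
    w.similarity = 0.20;
  }
  if (intent === "procedural") {
    w.frequency = 0.25;     // "how do I deploy?" boosts frequency...
    w.typeRelevance = 0.20; // ...and memory-type relevance (assumed values)
    w.similarity = 0.25;
    w.recency = 0.10;
  }
  return w;
}

function score(s: Signals, intent: string): number {
  const w = weightsFor(intent);
  return w.similarity * s.similarity + w.recency * s.recency +
    w.importance * s.importance + w.frequency * s.frequency +
    w.typeRelevance * s.typeRelevance;
}
```

&lt;p&gt;With these weights, the same memory ranks differently depending on the detected intent — a recently accessed memory outranks a semantically closer one under a temporal query.&lt;/p&gt;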

&lt;h3&gt;
  
  
  Flash Reasoning
&lt;/h3&gt;

&lt;p&gt;BFS traversal through linked memory graphs that reconstructs multi-step logic chains:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;why did auth service crash?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// Primary: "Auth service crashed after config update"&lt;/span&gt;
&lt;span class="c1"&gt;// Chain: -&amp;gt; (because) "Config changed JWT expiry from 1hr to 30min"&lt;/span&gt;
&lt;span class="c1"&gt;//        -&amp;gt; (leads_to) "Short-lived tokens caused session storm"&lt;/span&gt;
&lt;span class="c1"&gt;//        -&amp;gt; (therefore) "Rollback to 1hr expiry resolved the issue"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your agent gets the complete narrative from a single recall.&lt;/p&gt;

&lt;h3&gt;
  
  
  Theory of Mind for Agents
&lt;/h3&gt;

&lt;p&gt;In a multi-agent mesh, any agent can model what other agents know:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// What does the DevOps agent know about the production database?&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;knowledge&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toma&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;devops-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;production database&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;// Knowledge gap analysis&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;gap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;knowledgeGap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;frontend-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;backend-agent&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;API contracts&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// -&amp;gt; { onlyFrontendKnows: [...], onlyBackendKnows: [...], bothKnow: [...] }&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This concept comes from developmental psychology (Baron-Cohen, 1985) and multi-agent systems research (Gmytrasiewicz &amp;amp; Doshi, 2005). Until now, it had never shipped as production infrastructure.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cross-Agent Synthesis
&lt;/h3&gt;

&lt;p&gt;When 3+ agents independently store corroborating memories about the same fact, it's automatically promoted to "Mesh Fact" — the highest confidence tier. Independent corroboration from separate agents operating in different contexts is strong evidence of factual accuracy.&lt;/p&gt;
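&lt;p&gt;A minimal sketch of that promotion rule — counting distinct corroborating agents by embedding similarity. Reusing the pipeline's 0.92 dedup threshold for "corroborates" is an assumption; the post doesn't specify the corroboration threshold:&lt;/p&gt;

```typescript
// Sketch of Mesh Fact promotion: count distinct agents whose stored
// embeddings corroborate a candidate fact.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  a.forEach((v, i) => {
    dot += v * b[i];
    na += v * v;
    nb += b[i] * b[i];
  });
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface MeshMemory { agentId: string; embedding: number[]; }

// Promote only when 3 or more *distinct* agents independently stored it;
// one agent repeating itself is not corroboration.
function isMeshFact(candidate: number[], mesh: MeshMemory[]): boolean {
  const agents = new Set();
  for (const m of mesh) {
    if (cosine(candidate, m.embedding) >= 0.92) agents.add(m.agentId);
  }
  return agents.size >= 3;
}
```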

&lt;h3&gt;
  
  
  Reinforcement Learning on Memory
&lt;/h3&gt;

&lt;p&gt;Feedback closes the loop. Memories that consistently prove useful are promoted to core status (immune to decay). Memories that consistently mislead are flagged for review. Over time, retrieval quality improves without manual curation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;database config&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;// Agent uses the result successfully...&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;positive&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// After 3+ retrievals with &amp;gt;70% positive ratio → auto-promoted to core&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
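&lt;p&gt;The promotion rule in the comment above — at least 3 retrievals with a positive-feedback ratio above 70% — reduces to a small predicate. The stats shape is an illustrative assumption:&lt;/p&gt;

```typescript
// Sketch of the core-promotion rule: 3 or more retrievals with a
// positive-feedback ratio above 70%. Thresholds come from the example;
// the FeedbackStats shape is assumed.
interface FeedbackStats { retrievals: number; positive: number; }

function shouldPromoteToCore(stats: FeedbackStats): boolean {
  if (stats.retrievals >= 3) {
    if (stats.positive / stats.retrievals > 0.7) return true;
  }
  return false;
}
```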



&lt;h2&gt;
  
  
  The Knowledge Graph: Built-In, Free, Temporal
&lt;/h2&gt;

&lt;p&gt;Mnemosyne includes a temporal knowledge graph powered by FalkorDB. Every entity extracted from memories becomes a graph node. Relationships carry timestamps. The graph grows automatically as memories are stored.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Auto-linking&lt;/strong&gt;: Related memories are bidirectionally connected (Zettelkasten-style)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Path finding&lt;/strong&gt;: "How is Alice related to PostgreSQL?" → Alice → deployed auth service → auth service uses → PostgreSQL&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timeline reconstruction&lt;/strong&gt;: Chronological history of everything known about any entity&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Temporal queries&lt;/strong&gt;: "What was server-1 connected to as of January 15th?"&lt;/li&gt;
&lt;/ul&gt;
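&lt;p&gt;The temporal "as of" query above can be sketched as a timestamp filter over the edge list. The edge shape and field names below are illustrative assumptions; the post only states that relationships carry timestamps:&lt;/p&gt;

```typescript
// Sketch of a temporal "as of" query: keep only edges whose validity
// started on or before the cutoff date.
interface Edge { from: string; to: string; relation: string; validFrom: string; }

function connectedAsOf(edges: Edge[], entity: string, asOf: string): string[] {
  const cutoff = Date.parse(asOf);
  return edges
    .filter((e) => e.from === entity)
    .filter((e) => !(Date.parse(e.validFrom) > cutoff)) // validFrom on or before cutoff
    .map((e) => e.to);
}
```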

&lt;p&gt;Mem0 charges $249/month for their knowledge graph. Mnemosyne's is built in and ships free under the MIT license.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Comparison at Scale
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Memories/month&lt;/th&gt;
&lt;th&gt;Mnemosyne&lt;/th&gt;
&lt;th&gt;Mem0&lt;/th&gt;
&lt;th&gt;Zep&lt;/th&gt;
&lt;th&gt;Cognee&lt;/th&gt;
&lt;th&gt;Letta&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10K&lt;/td&gt;
&lt;td&gt;~$30&lt;/td&gt;
&lt;td&gt;~$130-330&lt;/td&gt;
&lt;td&gt;~$70-220&lt;/td&gt;
&lt;td&gt;~$140-540&lt;/td&gt;
&lt;td&gt;~$130-530&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100K&lt;/td&gt;
&lt;td&gt;~$60&lt;/td&gt;
&lt;td&gt;~$1K-3K&lt;/td&gt;
&lt;td&gt;~$1K-2K&lt;/td&gt;
&lt;td&gt;~$1K-5K&lt;/td&gt;
&lt;td&gt;~$1K-5K&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;~$250&lt;/td&gt;
&lt;td&gt;~$10K-30K&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~$10K-50K&lt;/td&gt;
&lt;td&gt;~$10K-50K&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The difference is entirely the per-memory LLM processing cost that Mnemosyne eliminates. Infrastructure costs (Qdrant, Redis, FalkorDB) are roughly equivalent across all systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Feature Count Comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;Features&lt;/th&gt;
&lt;th&gt;Knowledge Graph&lt;/th&gt;
&lt;th&gt;Multi-Agent&lt;/th&gt;
&lt;th&gt;Self-Improving&lt;/th&gt;
&lt;th&gt;Cost/Memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mnemosyne&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;33&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free (built-in)&lt;/td&gt;
&lt;td&gt;Full mesh&lt;/td&gt;
&lt;td&gt;Yes (RL + consolidation)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mem0&lt;/td&gt;
&lt;td&gt;~5&lt;/td&gt;
&lt;td&gt;$249/mo&lt;/td&gt;
&lt;td&gt;Enterprise only&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zep&lt;/td&gt;
&lt;td&gt;~3&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cognee&lt;/td&gt;
&lt;td&gt;~5&lt;/td&gt;
&lt;td&gt;Built-in&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LangMem&lt;/td&gt;
&lt;td&gt;~0&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Letta&lt;/td&gt;
&lt;td&gt;~4&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;mnemosy-ai
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 6333:6333 qdrant/qdrant  &lt;span class="c"&gt;# Only hard requirement&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createMnemosyne&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mnemosy-ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createMnemosyne&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;vectorDbUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;embeddingUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:11434/v1/embeddings&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;User prefers TypeScript and dark mode&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user preferences&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;positive&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Start with just Qdrant (vector-only mode). Add FalkorDB for the knowledge graph. Add Redis for multi-agent mesh. Every feature is independently toggleable — adopt progressively.&lt;/p&gt;

&lt;h2&gt;
  
  
  What We Didn't Build
&lt;/h2&gt;

&lt;p&gt;To be honest about scope: Mnemosyne doesn't have a managed cloud offering (you run your own infra). It's TypeScript-only (the AI/ML ecosystem is mostly Python). It doesn't have 41K GitHub stars (Mem0 earned those). And its algorithmic entity extraction won't catch the implicit relationships that Cognee's LLM-powered extraction finds.&lt;/p&gt;

&lt;p&gt;These are real trade-offs. Mnemosyne is purpose-built for teams that need cognitive intelligence, multi-agent collaboration, zero-LLM economics, and self-improving memory — and are willing to run their own infrastructure in exchange.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub&lt;/strong&gt;: &lt;a href="https://github.com/28naem-del/mnemosyne" rel="noopener noreferrer"&gt;github.com/28naem-del/mnemosyne&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;npm&lt;/strong&gt;: &lt;code&gt;npm install mnemosy-ai&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Website&lt;/strong&gt;: &lt;a href="https://mnemosy.ai" rel="noopener noreferrer"&gt;mnemosy.ai&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord&lt;/strong&gt;: &lt;a href="https://discord.gg/Sp6ZXD3X" rel="noopener noreferrer"&gt;discord.gg/Sp6ZXD3X&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: MIT&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;33 features. 5 cognitive layers. $0 per memory stored. The brain your agents are missing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Mnemosyne — Because intelligence without memory isn't intelligence.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>We Built the First AI Agent Memory System With Zero LLM Calls — Here's the Architecture</title>
      <dc:creator>Mnemosy </dc:creator>
      <pubDate>Tue, 24 Feb 2026 21:56:23 +0000</pubDate>
      <link>https://dev.to/mnemosybrain/we-built-the-first-ai-agent-memory-system-with-zero-llm-calls-heres-the-architecture-feb</link>
      <guid>https://dev.to/mnemosybrain/we-built-the-first-ai-agent-memory-system-with-zero-llm-calls-heres-the-architecture-feb</guid>
      <description>&lt;h1&gt;
  
  
  We Built the First AI Agent Memory System With Zero LLM Calls
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Every AI memory system on the market makes the same architectural choice: send your text to an LLM for extraction before storing it.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Mem0 calls GPT-4o. Zep makes multiple async LLM calls. Cognee uses LLMs for knowledge extraction. Letta's entire memory engine &lt;em&gt;is&lt;/em&gt; an LLM.&lt;/p&gt;

&lt;p&gt;That means every single &lt;code&gt;memory.store()&lt;/code&gt; costs ~$0.01, takes 500ms-2s, and produces non-deterministic results. At 100K memories/month, you're paying $1,000-3,000 just to &lt;em&gt;remember things&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We asked: what if you didn't need an LLM at all?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The result is &lt;a href="https://github.com/28naem-del/mnemosyne" rel="noopener noreferrer"&gt;Mnemosyne&lt;/a&gt; — the first cognitive memory OS for AI agents with &lt;strong&gt;zero LLM calls&lt;/strong&gt; in the entire ingestion pipeline. 33 features, 5 cognitive layers, $0 per memory stored. MIT licensed.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Table Nobody Wants You to See
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;System&lt;/th&gt;
&lt;th&gt;LLM Required?&lt;/th&gt;
&lt;th&gt;Cost per memory&lt;/th&gt;
&lt;th&gt;100K memories/mo&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mnemosyne&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;No&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;~$60&lt;/strong&gt; (infra only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Mem0&lt;/td&gt;
&lt;td&gt;Yes (GPT-4o)&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;td&gt;$1,000-3,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Zep&lt;/td&gt;
&lt;td&gt;Yes (multiple calls)&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;td&gt;$1,000-2,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cognee&lt;/td&gt;
&lt;td&gt;Yes (extraction)&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;td&gt;$1,000-5,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Letta/MemGPT&lt;/td&gt;
&lt;td&gt;Yes (core engine)&lt;/td&gt;
&lt;td&gt;~$0.01&lt;/td&gt;
&lt;td&gt;$1,000-5,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This isn't a criticism of these projects — Mem0 has 41K stars and popularized this entire space. But the LLM-in-the-loop architecture has fundamental trade-offs that nobody talks about.&lt;/p&gt;

&lt;h2&gt;
  
  
  Three Problems With LLM-Powered Memory
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Non-deterministic behavior
&lt;/h3&gt;

&lt;p&gt;The same input can produce different extracted facts on different runs. Your memory system's behavior changes when the model updates. In production, you need memory that behaves consistently.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Latency floor
&lt;/h3&gt;

&lt;p&gt;Every &lt;code&gt;store()&lt;/code&gt; requires an LLM API call — 500ms to 2 seconds minimum. When your agent processes 100 memories per session, that's 50-200 seconds of just waiting.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Linear cost scaling
&lt;/h3&gt;

&lt;p&gt;At $0.01 per memory, 1M memories = $10,000. Per month. With no efficiency gains at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  How We Eliminated Every LLM Call
&lt;/h2&gt;

&lt;p&gt;Mnemosyne's 12-step ingestion pipeline is 100% algorithmic:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Security Filter&lt;/strong&gt; — blocks API keys, credentials, secrets (regex patterns)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embedding&lt;/strong&gt; — local vectors via Ollama (nomic-embed-text)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dedup &amp;amp; Merge&lt;/strong&gt; — cosine similarity ≥0.92 = duplicate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Entity Extraction&lt;/strong&gt; — people, IPs, technologies, dates (pattern matching)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Type Classification&lt;/strong&gt; — 7 types: episodic, semantic, procedural, preference, relationship, profile, core&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Urgency Detection&lt;/strong&gt; — critical / important / reference / background&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain Classification&lt;/strong&gt; — technical / personal / project / knowledge / general&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Priority Scoring&lt;/strong&gt; — urgency × domain composite&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Confidence Rating&lt;/strong&gt; — 3-signal composite with human-readable tiers&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector Storage&lt;/strong&gt; — 23-field metadata per memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auto-Linking&lt;/strong&gt; — bidirectional links to related memories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mesh Broadcast&lt;/strong&gt; — published to agent network&lt;/li&gt;
&lt;/ol&gt;
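&lt;p&gt;Every step above is plain computation. As a rough sketch (not the library's actual internals), the dedup step (step 3) reduces to a cosine-similarity check of the new embedding against existing ones, using the 0.92 threshold from the list; function and variable names here are illustrative:&lt;/p&gt;

```typescript
// Sketch of the dedup step (step 3): two memories whose embedding
// vectors have cosine similarity of at least 0.92 are treated as
// duplicates and merged instead of being stored twice.
const DUP_THRESHOLD = 0.92;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns true if the candidate embedding is close enough to any
// stored embedding to count as a duplicate.
function isDuplicate(candidate: number[], existing: number[][]): boolean {
  return existing.some(v => cosineSimilarity(candidate, v) >= DUP_THRESHOLD);
}
```

&lt;p&gt;Because this is pure arithmetic, it is deterministic and runs in microseconds, which is exactly what makes a zero-LLM pipeline viable.&lt;/p&gt;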

&lt;p&gt;&lt;strong&gt;Total time: &amp;lt;50ms. LLM calls: 0. Cost: $0.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The trade-off is real: LLM-based extraction catches implicit relationships that rule-based extractors miss. But for the vast majority of agent memory — where entities are explicit and speed matters — the algorithmic approach dominates.&lt;/p&gt;

&lt;h2&gt;
  
  
  But It's Not Just a Vector Wrapper
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting. Mnemosyne implements &lt;strong&gt;10 cognitive capabilities&lt;/strong&gt; that previously only existed in research papers:&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 Activation Decay
&lt;/h3&gt;

&lt;p&gt;Memories fade over time following the Ebbinghaus forgetting curve. Critical memories survive months. Background observations fade in hours. Procedural memories (like runbooks) &lt;strong&gt;never decay&lt;/strong&gt; — just as you never forget how to ride a bike.&lt;/p&gt;
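&lt;p&gt;A minimal sketch of what such a decay rule could look like, assuming an exponential Ebbinghaus-style curve R = e&lt;sup&gt;-t/S&lt;/sup&gt; where the stability S depends on urgency. The constants below are illustrative, not Mnemosyne's real parameters:&lt;/p&gt;

```typescript
// Hypothetical activation-decay sketch: retention R = exp(-t / S),
// where S (stability) depends on urgency, and procedural memories
// are exempt from decay entirely.
type Urgency = "critical" | "important" | "reference" | "background";

// Illustrative stability constants, in hours.
const STABILITY_HOURS: Record<Urgency, number> = {
  critical: 24 * 90,   // survives months
  important: 24 * 14,
  reference: 24 * 3,
  background: 6,       // fades within hours
};

// Activation in [0, 1] given a memory's urgency and age.
function activation(urgency: Urgency, ageHours: number, isProcedural: boolean): number {
  if (isProcedural) return 1; // runbooks never decay
  return Math.exp(-ageHours / STABILITY_HOURS[urgency]);
}
```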

&lt;h3&gt;
  
  
  ⚡ Flash Reasoning
&lt;/h3&gt;

&lt;p&gt;Query "why did auth crash?" and get the full chain:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Config changed → JWT expiry shortened → Session storm → Rollback fixed it
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One recall. Complete narrative. BFS through linked memory graphs.&lt;/p&gt;
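&lt;p&gt;Conceptually, that chain is a breadth-first search over bidirectional memory links. A minimal sketch, using an illustrative data shape rather than the real schema:&lt;/p&gt;

```typescript
// Flash-reasoning sketch: BFS over a linked memory graph, starting
// from one memory and collecting the causal chain in visit order.
type MemoryId = string;

// Illustrative link graph for the auth-crash example above.
const links: Record<MemoryId, MemoryId[]> = {
  "config-changed": ["jwt-expiry-shortened"],
  "jwt-expiry-shortened": ["session-storm"],
  "session-storm": ["rollback-fixed-it"],
  "rollback-fixed-it": [],
};

function flashChain(start: MemoryId): MemoryId[] {
  const chain: MemoryId[] = [];
  const seen = new Set<MemoryId>();
  const queue: MemoryId[] = [start];
  while (queue.length > 0) {
    const id = queue.shift()!;
    if (seen.has(id)) continue;
    seen.add(id);
    chain.push(id);
    for (const next of links[id] ?? []) queue.push(next);
  }
  return chain;
}
```

&lt;p&gt;The &lt;code&gt;seen&lt;/code&gt; set matters: real memory graphs have cycles (links are bidirectional), and BFS without it would loop forever.&lt;/p&gt;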

&lt;h3&gt;
  
  
  🤝 Theory of Mind for Agents
&lt;/h3&gt;

&lt;p&gt;Agent A can ask "what does Agent B know about the database?" without talking to Agent B. Modeled after developmental psychology research (Baron-Cohen, 1985).&lt;/p&gt;

&lt;h3&gt;
  
  
  📈 Reinforcement Learning
&lt;/h3&gt;

&lt;p&gt;Memories that consistently help → auto-promoted to permanent. Bad memories → flagged. Your memory system gets smarter through use, not manual curation.&lt;/p&gt;
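&lt;p&gt;One plausible shape for that loop — thresholds, field names, and the scoring rule are assumptions for illustration, not the actual implementation:&lt;/p&gt;

```typescript
// Hypothetical reinforcement sketch: positive feedback raises a
// memory's score until it is promoted to permanent; negative feedback
// lowers it until the memory is flagged for review.
interface ScoredMemory {
  score: number;
  permanent: boolean;
  flagged: boolean;
}

const PROMOTE_AT = 5;  // illustrative promotion threshold
const FLAG_AT = -3;    // illustrative flagging threshold

function applyFeedback(m: ScoredMemory, signal: "positive" | "negative"): ScoredMemory {
  const score = m.score + (signal === "positive" ? 1 : -1);
  return {
    score,
    permanent: m.permanent || score >= PROMOTE_AT, // promotion is sticky
    flagged: !m.permanent && score <= FLAG_AT,
  };
}
```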

&lt;h3&gt;
  
  
  🔗 Knowledge Graph (Built-in, Free)
&lt;/h3&gt;

&lt;p&gt;Temporal entity graph with auto-linking, path finding, and timeline reconstruction. Mem0 charges $249/month for its knowledge graph. Ours ships under the MIT license.&lt;/p&gt;

&lt;h3&gt;
  
  
  🌐 Multi-Agent Mesh
&lt;/h3&gt;

&lt;p&gt;When 3+ agents independently confirm the same fact, it's automatically promoted to "Mesh Fact" — the highest confidence tier. Real distributed consensus for AI knowledge.&lt;/p&gt;
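&lt;p&gt;The promotion rule itself is simple quorum counting over distinct agents. A hedged sketch (tier names mirror the article; the logic is illustrative):&lt;/p&gt;

```typescript
// Mesh-fact promotion sketch: a fact independently confirmed by three
// or more distinct agents is promoted to the highest confidence tier.
const MESH_QUORUM = 3;

function confidenceTier(confirmingAgents: Set<string>): "mesh-fact" | "unverified" {
  // A Set deduplicates agent IDs, so repeated confirmations from the
  // same agent cannot fake a quorum.
  return confirmingAgents.size >= MESH_QUORUM ? "mesh-fact" : "unverified";
}
```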

&lt;h2&gt;
  
  
  33 Features, 5 Layers
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;L5  SELF-IMPROVEMENT
    Reinforcement · Consolidation · Flash Reasoning · ToMA · Synthesis

L4  COGNITIVE
    Activation Decay · Confidence · Priority · Diversity Reranking

L3  KNOWLEDGE GRAPH
    Temporal Graph · Auto-Linking · Path Traversal · Entity Extraction

L2  PIPELINE
    Security Filter · Classify · Dedup · Merge · 12-step Ingestion

L1  INFRASTRUCTURE
    Qdrant · FalkorDB · Redis Cache · Redis Pub/Sub
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every feature is independently toggleable. Start with just Qdrant, progressively enable as needed.&lt;/p&gt;
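&lt;p&gt;In sketch form: &lt;code&gt;vectorDbUrl&lt;/code&gt; matches the quick-start config key, while &lt;code&gt;graphDbUrl&lt;/code&gt;, &lt;code&gt;redisUrl&lt;/code&gt;, and the layer names are assumptions for illustration, not the library's real config surface:&lt;/p&gt;

```typescript
// Illustrative progressive-enablement sketch: only the vector store is
// mandatory; each optional backend unlocks an additional layer.
interface MnemosyneConfig {
  vectorDbUrl: string;   // required: Qdrant
  graphDbUrl?: string;   // optional: FalkorDB knowledge graph
  redisUrl?: string;     // optional: cache plus pub/sub mesh
}

function enabledLayers(cfg: MnemosyneConfig): string[] {
  const layers = ["pipeline"]; // always on once the vector store is up
  if (cfg.graphDbUrl) layers.push("knowledge-graph");
  if (cfg.redisUrl) layers.push("mesh");
  return layers;
}
```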

&lt;h2&gt;
  
  
  Running in Production Right Now
&lt;/h2&gt;

&lt;p&gt;This isn't a demo. It's running on a 10-machine AI agent mesh:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;13,000+ memories&lt;/strong&gt; stored&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&amp;lt;50ms&lt;/strong&gt; ingestion (full 12-step pipeline)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&amp;lt;200ms&lt;/strong&gt; recall (multi-signal ranked, graph-enriched)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&amp;gt;60%&lt;/strong&gt; cache hit rate in conversational workloads&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;10 agents&lt;/strong&gt; collaborating with shared memory&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Quick Start (2 minutes)
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;mnemosy-ai
docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; 6333:6333 qdrant/qdrant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;createMnemosyne&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;mnemosy-ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;createMnemosyne&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;vectorDbUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:6333&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;embeddingUrl&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;http://localhost:11434/v1/embeddings&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;agentId&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-agent&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Full 12-step pipeline, &amp;lt;50ms, $0&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;store&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;User prefers dark mode and TypeScript&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Multi-signal ranked recall&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recall&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;user preferences&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// Memories learn from feedback&lt;/span&gt;
&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;feedback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;positive&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Only hard requirement: Qdrant. Redis and FalkorDB are optional power-ups.&lt;/p&gt;

&lt;h2&gt;
  
  
  Honest Trade-offs
&lt;/h2&gt;

&lt;p&gt;We're not claiming Mnemosyne is better at everything:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Mem0&lt;/strong&gt; has 41K stars, a great community, and a production-hardened cloud offering&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cognee&lt;/strong&gt; builds richer knowledge graphs via LLM extraction&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Letta's&lt;/strong&gt; LLM-directed memory management is genuinely innovative&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mnemosyne fills a specific niche: if you need cognitive intelligence + multi-agent collaboration + zero-LLM economics + self-improving memory — all in one open-source system — it's currently the only option.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bet
&lt;/h2&gt;

&lt;p&gt;We're betting that the future of AI memory is &lt;strong&gt;deterministic, local-first, and free at the point of storage&lt;/strong&gt;. That cognitive capabilities don't require sending every memory through GPT-4. That you can build a brain without renting someone else's.&lt;/p&gt;

&lt;p&gt;13,000 memories and counting. Zero LLM calls. The math speaks for itself.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/28naem-del/mnemosyne" rel="noopener noreferrer"&gt;github.com/28naem-del/mnemosyne&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;npm:&lt;/strong&gt; &lt;code&gt;npm install mnemosy-ai&lt;/code&gt;&lt;br&gt;
&lt;strong&gt;Website:&lt;/strong&gt; &lt;a href="https://mnemosy.ai" rel="noopener noreferrer"&gt;mnemosy.ai&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Discord:&lt;/strong&gt; &lt;a href="https://discord.gg/Sp6ZXD3X" rel="noopener noreferrer"&gt;discord.gg/Sp6ZXD3X&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;License:&lt;/strong&gt; MIT&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Mnemosyne — Because intelligence without memory isn't intelligence.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>typescript</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
