<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: mariatanbobo</title>
    <description>The latest articles on DEV Community by mariatanbobo (@mariatanbobo).</description>
    <link>https://dev.to/mariatanbobo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3953339%2F8f98e879-6904-455d-bf57-e57ae2955005.jpg</url>
      <title>DEV Community: mariatanbobo</title>
      <link>https://dev.to/mariatanbobo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/mariatanbobo"/>
    <language>en</language>
    <item>
      <title>We Tried 6 Memory Providers for Hermes Agent — Here's What We Learned</title>
      <dc:creator>mariatanbobo</dc:creator>
      <pubDate>Wed, 27 May 2026 00:05:09 +0000</pubDate>
      <link>https://dev.to/mariatanbobo/we-tried-6-memory-providers-for-hermes-agent-heres-what-we-learned-5ehm</link>
      <guid>https://dev.to/mariatanbobo/we-tried-6-memory-providers-for-hermes-agent-heres-what-we-learned-5ehm</guid>
      <description>&lt;p&gt;Giving an AI agent persistent memory sounds simple. Store facts. Recall them later. How hard can it be?&lt;/p&gt;

&lt;p&gt;Three weeks and six providers later, I have opinions.&lt;/p&gt;

&lt;p&gt;This is the story of what broke, what we discarded, and the one thing that finally worked — and why.&lt;/p&gt;




&lt;h2&gt;The Setup&lt;/h2&gt;

&lt;p&gt;I run &lt;a href="https://github.com/nousresearch/hermes-agent" rel="noopener noreferrer"&gt;Hermes Agent&lt;/a&gt; on a headless VPS with 4GB RAM. Nothing exotic. The goal was straightforward: the agent should remember things across sessions — my preferences, environment details, lessons learned — without me repeating myself every conversation.&lt;/p&gt;

&lt;p&gt;Hermes ships with several bundled memory providers and supports third-party ones via plugins. Should be plug-and-play, right?&lt;/p&gt;




&lt;h2&gt;Phase 1: The Ones That Failed Silently&lt;/h2&gt;

&lt;h3&gt;AgentMemory&lt;/h3&gt;

&lt;p&gt;The first provider we had. Node.js runtime, Docker container for the iii-engine, 860 memories at peak. It &lt;em&gt;seemed&lt;/em&gt; fine.&lt;/p&gt;

&lt;p&gt;Then we switched to a different provider to try it out. AgentMemory's ingestion died instantly — but nothing told us. Tools responded normally. No errors in logs. Just… nothing was being stored anymore.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; Hermes supports exactly one active memory provider. The switch disabled AgentMemory's &lt;code&gt;sync_turn()&lt;/code&gt; without a warning. The deadliest failure mode: total silence.&lt;/p&gt;
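
&lt;p&gt;A silent stall like this can be caught with a simple freshness check. The sketch below is hypothetical and not part of Hermes or AgentMemory: it assumes you can obtain the timestamp of the newest stored memory (passed in here as &lt;code&gt;last_write&lt;/code&gt;) and flags ingestion as stale once that timestamp is older than a threshold you choose.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def ingestion_is_stale(last_write: Optional[datetime],
                       now: datetime,
                       max_age: timedelta = timedelta(hours=6)) -> bool:
    """Return True when no memory has been stored within max_age.

    last_write is the timestamp of the newest stored memory, or None if
    the store is empty. How you read it depends on your provider; that
    lookup is deliberately left out of this sketch.
    """
    if last_write is None:
        return True
    return now - last_write > max_age


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    newest = now - timedelta(days=2)  # pretend the last write was two days ago
    if ingestion_is_stale(newest, now):
        print("WARNING: memory ingestion looks stalled")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Run from cron, a check like this would have told us within hours that nothing was being written, instead of us finding out by accident.&lt;/p&gt;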

&lt;h3&gt;YantrikDB&lt;/h3&gt;

&lt;p&gt;Technically, YantrikDB worked. Rust engine, 8 tools, Precision@5 of 0.80. It stored memories. It had a self-maintaining pipeline — deduplication, contradiction detection, recency ranking. We even set up cron jobs to monitor it for updates.&lt;/p&gt;

&lt;p&gt;The problem was qualitative. The hooks were too aggressive — it ingested everything, filling up with noise. And when the agent actually needed a memory? YantrikDB was rarely queried at the right moment. The recall was poorly timed, and the stored information was low-signal. It "worked" but never felt useful.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson #1:&lt;/strong&gt; A memory provider that stores noise and misses the moments that matter is barely better than one that fails silently. Integration quality matters more than feature count.&lt;/p&gt;




&lt;h2&gt;Phase 2: The One That Wouldn't Die (Or Live)&lt;/h2&gt;

&lt;h3&gt;Hindsight&lt;/h3&gt;

&lt;p&gt;This one looked promising on paper. Bundled with Hermes. 91.4% on the LongMemEval benchmark. Knowledge graphs, reflect synthesis — the "power pick."&lt;/p&gt;

&lt;p&gt;It did not go well. But I want to be honest about what was Hindsight's fault and what was ours, because the distinction matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What was our fault:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;We installed the wrong package.&lt;/strong&gt; The Hermes plugin only needs &lt;code&gt;hindsight-client&lt;/code&gt;, a lightweight Python library. We ran &lt;code&gt;pip install hindsight-all&lt;/code&gt;, the "All-in-One Bundle" that ships the full API server, an embedding engine, and an embedded PostgreSQL called &lt;code&gt;pg0&lt;/code&gt;. We hadn't read the &lt;code&gt;plugin.yaml&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;We triggered the pg0 download.&lt;/strong&gt; &lt;code&gt;hindsight-all&lt;/code&gt; pulls in &lt;code&gt;hindsight-api-slim&lt;/code&gt;, whose default database is &lt;code&gt;pg0&lt;/code&gt; (embedded PostgreSQL). On first startup it silently downloads and initializes its own database engine. On our 4GB VPS, that first startup hung for 177 seconds. We could have set &lt;code&gt;HINDSIGHT_API_DATABASE_URL&lt;/code&gt; to point at our existing system PostgreSQL; the docs cover this clearly. We just never read them.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;We didn't check LLM compatibility first.&lt;/strong&gt; Hindsight supports &lt;code&gt;openai&lt;/code&gt;, &lt;code&gt;anthropic&lt;/code&gt;, &lt;code&gt;gemini&lt;/code&gt;, &lt;code&gt;groq&lt;/code&gt;, &lt;code&gt;ollama&lt;/code&gt;, and &lt;code&gt;lmstudio&lt;/code&gt;. We use DeepSeek. We found no &lt;code&gt;HINDSIGHT_API_LLM_BASE_URL&lt;/code&gt;-style setting that would point the &lt;code&gt;openai&lt;/code&gt; provider at DeepSeek's OpenAI-compatible API. We spent time trying to make it work before concluding it was a dead end. If we'd read the docs upfront, we'd have known DeepSeek wasn't supported and might have skipped the whole thing.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;What was Hindsight's fault:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Env var caching bug.&lt;/strong&gt; The daemon cached environment variables across restarts. We'd change &lt;code&gt;HINDSIGHT_API_LLM_API_KEY&lt;/code&gt;, restart the daemon, and the old value stayed in effect. We had to kill the process and start it fresh; the daemon didn't re-read its environment on SIGHUP.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Daemon respawn after uninstall (the big one).&lt;/strong&gt; After full uninstall — pip packages removed, config cleaned, directories deleted, plugin disabled — &lt;code&gt;hindsight-api&lt;/code&gt; daemons kept respawning every 2 minutes. The Hermes gateway cached plugin state at startup and kept spawning processes for software that no longer existed on disk.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Breaking the cycle required renaming &lt;code&gt;plugin.yaml&lt;/code&gt; to &lt;code&gt;plugin.yaml.disabled&lt;/code&gt;, stopping the gateway, killing processes with &lt;code&gt;pkill -9&lt;/code&gt;, then restarting. A clean uninstall should not require process hunting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The bottom line:&lt;/strong&gt; We were sloppy. We dove into installation without reading what the plugin actually needed, picked the heaviest package, and didn't check whether our LLM provider was supported. But even if we'd done everything right, the env var caching bug and the daemon respawn issue were architectural problems — and the lack of DeepSeek support would have been a dealbreaker regardless.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lesson #2:&lt;/strong&gt; Read the plugin.yaml before installing anything. And if uninstallation requires &lt;code&gt;pkill -9&lt;/code&gt;, the architecture has a lifecycle problem.&lt;/p&gt;




&lt;h2&gt;Phase 3: The Evaluation&lt;/h2&gt;

&lt;p&gt;At this point we had criteria. Real criteria, earned through pain:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cannot silently fail&lt;/strong&gt; — if ingestion stops, I need to know&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Simple uninstall&lt;/strong&gt; — no daemon ghosts&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local-first&lt;/strong&gt; — no cloud dependency, no API key expiry taking down memory&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hermes-specific author instructions&lt;/strong&gt; — the #1 predictor of whether integration actually works&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No double token burn&lt;/strong&gt; — I'm not paying for inference twice&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signal over noise&lt;/strong&gt; — if it stores everything, it stores nothing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We surveyed what was available:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;th&gt;Killer Flaw&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Holographic&lt;/strong&gt; (bundled)&lt;/td&gt;
&lt;td&gt;Too simple&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;sync_turn()&lt;/code&gt; is a no-op — no auto-ingestion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Supermemory&lt;/strong&gt; (bundled)&lt;/td&gt;
&lt;td&gt;Cloud-only&lt;/td&gt;
&lt;td&gt;Best benchmarks, but it runs entirely in the cloud, which contradicts local-first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Mem0&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Double token burn&lt;/td&gt;
&lt;td&gt;LLM-Embedded: the agent calls an LLM, Mem0 calls its OWN LLM for fact extraction. Pay twice.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;MemPalace&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Wrong platform&lt;/td&gt;
&lt;td&gt;96.6% LongMemEval, but built for Claude Code — not Hermes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;Phase 4: The One That Worked&lt;/h2&gt;

&lt;h3&gt;Mnemosyne&lt;/h3&gt;

&lt;p&gt;By &lt;a href="https://github.com/AxDSan" rel="noopener noreferrer"&gt;AxDSan&lt;/a&gt;. Posted directly to r/hermesagent by its author. The README literally says: &lt;em&gt;"The Zero-Dependency, Sub-Millisecond AI Memory System for Hermes Agents."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;What makes it different:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;In-process Python + SQLite.&lt;/strong&gt; No separate service. No Docker. No daemon. If the gateway process runs, memory works. There is nothing to fall out of sync &lt;em&gt;with&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sub-millisecond reads.&lt;/strong&gt; 0.076ms. 500x faster than the previous-generation providers. You don't feel it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Three code paths, all verified working:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit remember — the agent calls &lt;code&gt;remember()&lt;/code&gt; when asked&lt;/li&gt;
&lt;li&gt;Auto-ingestion — &lt;code&gt;sync_turn&lt;/code&gt; captures every conversation turn automatically&lt;/li&gt;
&lt;li&gt;Context injection — high-importance memories surface in each turn's system prompt&lt;/li&gt;
&lt;/ul&gt;
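
&lt;p&gt;To make the in-process idea concrete, here is a minimal sketch of a SQLite-backed store using only Python's standard library. It is not Mnemosyne's code or API: the &lt;code&gt;MemoryStore&lt;/code&gt; class, the &lt;code&gt;memories&lt;/code&gt; table, and the &lt;code&gt;remember&lt;/code&gt; and &lt;code&gt;top_memories&lt;/code&gt; methods are hypothetical names chosen for illustration. It covers explicit storage plus selecting the highest-importance entries, which is roughly what context injection needs.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import sqlite3


class MemoryStore:
    """Hypothetical in-process memory store backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        # The database lives inside this process; there is no server to start.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, "
            "content TEXT NOT NULL, "
            "importance REAL NOT NULL)"
        )

    def remember(self, content: str, importance: float = 0.5) -> int:
        """Store one fact and return its row id."""
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute(
                "INSERT INTO memories (content, importance) VALUES (?, ?)",
                (content, importance),
            )
        return cur.lastrowid

    def top_memories(self, limit: int = 3) -> list[str]:
        """Return the most important facts, e.g. for a system prompt."""
        rows = self.conn.execute(
            "SELECT content FROM memories "
            "ORDER BY importance DESC, id ASC LIMIT ?",
            (limit,),
        ).fetchall()
        return [row[0] for row in rows]


store = MemoryStore()
store.remember("User prefers concise answers", importance=0.9)
store.remember("Said hello", importance=0.1)
store.remember("VPS has 4GB RAM", importance=0.8)
print(store.top_memories(limit=2))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the store is just a file opened by the gateway process, it has no second lifecycle to manage: no port, no container, no daemon to restart.&lt;/p&gt;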

&lt;p&gt;&lt;strong&gt;Installation took three commands:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;"mnemosyne-memory[embeddings]"
python &lt;span class="nt"&gt;-m&lt;/span&gt; mnemosyne.install
hermes memory setup  &lt;span class="c"&gt;# interactive picker → select "mnemosyne"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;[all]&lt;/code&gt; — that pulls ctransformers and downloads 1–4GB of GGUF models. On a 4GB machine, that's OOM territory. The &lt;code&gt;[embeddings]&lt;/code&gt; extra adds fastembed (133MB ONNX model) for semantic search, and LLM consolidation routes through your existing API key.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After three weeks of operation:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;362 working memories&lt;/li&gt;
&lt;li&gt;29 episodic summaries (auto-consolidation working)&lt;/li&gt;
&lt;li&gt;27/27 tests passing&lt;/li&gt;
&lt;li&gt;Zero silent failures. Zero daemon hunts. Zero forced kills.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;The Pattern&lt;/h2&gt;

&lt;p&gt;Two of the three providers that failed us shared one architectural decision: &lt;strong&gt;an external runtime with its own lifecycle.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;AgentMemory had its Node.js runtime and Docker container. Hindsight had a separate API server plus daemon. When the runtime and the gateway fell out of sync, we got silent failures, ghost processes, and respawn loops.&lt;/p&gt;

&lt;p&gt;YantrikDB was different — it was in-process (Rust via PyO3), so it didn't have the lifecycle problem. But it showed a subtler failure mode: &lt;strong&gt;hooks that favor quantity over quality.&lt;/strong&gt; If the memory provider hoovers up every turn indiscriminately, the agent learns to ignore it — and the moments that actually matter get buried in noise.&lt;/p&gt;

&lt;p&gt;Mnemosyne's in-process Python + SQLite avoids the lifecycle problem. Its configurable importance scoring and sleep consolidation (summarizing old working memories into episodic ones) avoid the noise problem. It's the simplest thing that could possibly work on both fronts.&lt;/p&gt;
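
&lt;p&gt;As a rough illustration of those two ideas, the sketch below filters incoming turns by an importance score and folds old working memories into a single episodic entry. The &lt;code&gt;Memory&lt;/code&gt; dataclass, the 0.3 threshold, the 7-day cutoff, and the &lt;code&gt;consolidate&lt;/code&gt; function are assumptions made for this example. Mnemosyne's actual scoring and its LLM-based summarization are not reproduced here; a plain string join stands in for the summary.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from dataclasses import dataclass


@dataclass
class Memory:
    content: str
    importance: float
    age_days: int
    kind: str = "working"


def should_store(importance: float, threshold: float = 0.3) -> bool:
    """Drop low-signal turns before they ever reach the store."""
    return importance >= threshold


def consolidate(memories: list[Memory], max_age_days: int = 7) -> list[Memory]:
    """Replace working memories older than max_age_days with one episodic entry.

    A real system would ask an LLM for the summary; joining the texts is a
    placeholder so the example stays self-contained.
    """
    def is_old(m: Memory) -> bool:
        return m.kind == "working" and m.age_days > max_age_days

    old = [m for m in memories if is_old(m)]
    kept = [m for m in memories if not is_old(m)]
    if not old:
        return kept
    summary = Memory(
        content=" | ".join(m.content for m in old),
        importance=max(m.importance for m in old),
        age_days=min(m.age_days for m in old),
        kind="episodic",
    )
    return kept + [summary]


memories = [
    Memory("Deployed on Debian 12", 0.8, age_days=20),
    Memory("Prefers tabs over spaces", 0.6, age_days=15),
    Memory("Asked about cron syntax", 0.4, age_days=1),
]
for m in consolidate(memories):
    print(m.kind, "-", m.content)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The two old entries collapse into one episodic summary while the recent one stays as a working memory, so the store shrinks over time instead of filling up with every turn.&lt;/p&gt;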




&lt;h2&gt;What I'd Tell Someone Starting Today&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Read the plugin.yaml first.&lt;/strong&gt; Before &lt;code&gt;pip install&lt;/code&gt; anything, check what the plugin actually requires. The difference between &lt;code&gt;hindsight-client&lt;/code&gt; and &lt;code&gt;hindsight-all&lt;/code&gt; is the difference between a library and an entire server stack.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local-first, single-process.&lt;/strong&gt; If memory needs a separate service, it will fail in ways you won't notice.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify ingestion before trusting it.&lt;/strong&gt; After installing any memory provider, store a test fact, restart, and ask for it back.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The author matters.&lt;/strong&gt; Does the provider's README mention your agent platform by name? If not, you're doing integration work the author didn't do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check LLM compatibility before installing.&lt;/strong&gt; If the provider doesn't support your model, no amount of configuration will fix it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;[all]&lt;/code&gt; is a trap.&lt;/strong&gt; Read the install extras. On constrained hardware, the "everything" option downloads models and databases you don't need.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clean uninstall is a feature.&lt;/strong&gt; If removing a provider takes more than deleting a directory, the architecture is fragile.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Signal beats volume.&lt;/strong&gt; A provider that stores everything indiscriminately trains the agent to ignore it. Better to store 50 high-signal facts than 5,000 noise entries.&lt;/li&gt;
&lt;/ol&gt;
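
&lt;p&gt;The "verify ingestion" advice above can be automated. The sketch below is a generic stand-in, assuming a store that persists to a SQLite file: it writes a marker fact, closes the connection to simulate a restart, reopens the file, and checks that the fact is still there. The &lt;code&gt;facts&lt;/code&gt; table and the helper functions are hypothetical; with a real provider you would swap in its own store and recall calls.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import os
import sqlite3
import tempfile


def store_fact(db_path: str, fact: str) -> None:
    """Write one fact, then close the connection (simulating shutdown)."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit the table creation and the insert together
            conn.execute("CREATE TABLE IF NOT EXISTS facts (content TEXT)")
            conn.execute("INSERT INTO facts (content) VALUES (?)", (fact,))
    finally:
        conn.close()


def fact_survived(db_path: str, fact: str) -> bool:
    """Open a fresh connection (simulating a restart) and look for the fact."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT 1 FROM facts WHERE content = ? LIMIT 1", (fact,)
        ).fetchone()
    finally:
        conn.close()
    return row is not None


def roundtrip_check() -> bool:
    """Store a marker fact in a temporary database and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "memory.db")
        marker = "canary: stored before restart"
        store_fact(db_path, marker)
        return fact_survived(db_path, marker)


print("ingestion round trip ok:", roundtrip_check())
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The important part is the fresh connection: reading the fact back through the same session only proves the write call returned, not that anything was persisted.&lt;/p&gt;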




&lt;p&gt;&lt;em&gt;I'm &lt;a href="https://x.com/MariaTanBoBo" rel="noopener noreferrer"&gt;@MariaTanBoBo&lt;/a&gt; on X. This article was written with Hermes Agent and published via the DEV.to API — yes, an AI agent can publish articles now. The future is weird.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hermes</category>
      <category>ai</category>
      <category>memory</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
