DEV Community

mariatanbobo

We Tried 6 Memory Providers for Hermes Agent — Here's What We Learned

Giving an AI agent persistent memory sounds simple. Store facts. Recall them later. How hard can it be?

Three weeks and six providers later, I have opinions.

This is the story of what broke, what we discarded, and the one thing that finally worked — and why.


The Setup

I run Hermes Agent on a headless VPS with 4GB RAM. Nothing exotic. The goal was straightforward: the agent should remember things across sessions — my preferences, environment details, lessons learned — without me repeating myself every conversation.

Hermes ships with several bundled memory providers and supports third-party ones via plugins. Should be plug-and-play, right?


Phase 1: The Ones That Failed Silently

AgentMemory

The first provider we had. Node.js runtime, Docker container for the iii-engine, 860 memories at peak. It seemed fine.

Then we switched to a different provider to try it out. AgentMemory's ingestion died instantly — but nothing told us. Tools responded normally. No errors in logs. Just… nothing was being stored anymore.

Root cause: Hermes supports exactly one active memory provider. The switch disabled AgentMemory's sync_turn() without a warning. The deadliest failure mode: total silence.
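A loud warning at switch time would have saved us days. Here is a minimal sketch of the idea, assuming a hypothetical `ProviderRegistry` class; this is not Hermes's actual API, just an illustration of a single-active-provider rule that refuses to fail silently.

```python
import warnings


class ProviderRegistry:
    """Hypothetical registry that allows exactly one active memory provider."""

    def __init__(self):
        self.active = None

    def activate(self, name):
        # Warn loudly instead of silently disabling the previous provider.
        previous = self.active
        if previous is not None and previous != name:
            warnings.warn(
                f"memory provider '{previous}' is being deactivated; "
                f"its ingestion will stop now that '{name}' is active",
                RuntimeWarning,
                stacklevel=2,
            )
        self.active = name
        return previous
```

Returning the previous provider's name lets the caller log or confirm the switch rather than discovering it weeks later.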

YantrikDB

Technically, YantrikDB worked. Rust engine, 8 tools, Precision@5 of 0.80. It stored memories. It had a self-maintaining pipeline — deduplication, contradiction detection, recency ranking. We even set up cron jobs to monitor it for updates.

The problem was qualitative. The hooks were too aggressive — it ingested everything, filling up with noise. And when the agent actually needed a memory? YantrikDB was rarely queried at the right moment. The recall was poorly timed, and the stored information was low-signal. It "worked" but never felt useful.

Lesson #1: A memory provider that stores noise and misses the moments that matter is barely better than one that fails silently. Integration quality matters more than feature count.
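One way to act on that lesson is to gate ingestion behind a crude importance score. The sketch below is purely illustrative: the cue phrases, the weights, and the 0.6 threshold are assumptions I made up for the example, not anything YantrikDB or Hermes ships.

```python
import re

# Hypothetical cue phrases that suggest a turn contains a durable fact.
DURABLE_CUES = re.compile(
    r"\b(i prefer|always|never|remember that|my \w+ is|we use)\b", re.IGNORECASE
)


def importance(turn: str) -> float:
    """Toy score: cue phrases raise it, and so does a turn of six or more words."""
    score = 0.0
    if DURABLE_CUES.search(turn):
        score += 0.7
    if len(turn.split()) >= 6:
        score += 0.3
    return score


def should_ingest(turn: str, threshold: float = 0.6) -> bool:
    # Store only turns that clear the threshold; everything else is dropped.
    return importance(turn) >= threshold
```

With these made-up weights, length alone never clears the threshold, so chatty but fact-free turns stay out of the store.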


Phase 2: The One That Wouldn't Die (Or Live)

Hindsight

This one looked promising on paper. Bundled with Hermes. 91.4% on the LongMemEval benchmark. Knowledge graphs, reflect synthesis — the "power pick."

Reality:

  • Installed the wrong package first (hindsight-all vs hindsight-client)
  • API key caching bugs — daemon held stale env vars across restarts
  • Embedded PostgreSQL (pg0) tried to download itself and hung for 177 seconds
  • After a full uninstall (pip uninstall, config cleaned, directories deleted, plugin disabled), daemons kept respawning every 2 minutes. The gateway cached plugin state at startup and wouldn't let go.

Breaking the cycle required stopping the gateway, hunting processes with pkill -9, and restarting. A hard kill. For a memory plugin.

Lesson #2: If uninstallation requires killing processes by force, the architecture is wrong. A memory provider's lifecycle should not require a process manager.


Phase 3: The Evaluation

At this point we had criteria. Real criteria, earned through pain:

  1. Cannot silently fail — if ingestion stops, I need to know
  2. Simple uninstall — no daemon ghosts
  3. Local-first — no cloud dependency, no API key expiry taking down memory
  4. Hermes-specific author instructions — the #1 predictor of whether integration actually works
  5. No double token burn — I'm not paying for inference twice
  6. Signal over noise — if it stores everything, it stores nothing
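Criterion 1 can be sketched as a small watchdog that notices when turns keep arriving but nothing gets written. The class name and the `on_turn`/`on_write` hooks are hypothetical; a real integration would need to call them from wherever the provider handles turns and confirms writes.

```python
import time


class IngestionWatchdog:
    """Flags the case where turns keep arriving but nothing is being stored."""

    def __init__(self, max_unstored_turns: int = 5):
        self.max_unstored_turns = max_unstored_turns
        self.turns_since_write = 0
        self.last_write_at = None

    def on_turn(self):
        # Call once per conversation turn, before ingestion is attempted.
        self.turns_since_write += 1

    def on_write(self):
        # Call whenever the provider confirms a memory was persisted.
        self.turns_since_write = 0
        self.last_write_at = time.time()

    def is_stalled(self) -> bool:
        return self.turns_since_write >= self.max_unstored_turns
```

The threshold is deliberately configurable: a provider that filters for signal will skip some turns legitimately, so the alarm should only fire after a long run of turns with zero writes.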

We surveyed what was available:

Provider | Verdict | Killer Flaw
Holographic (bundled) | Too simple | sync_turn() is a no-op, so there is no auto-ingestion
Supermemory (bundled) | Cloud-only | All cloud. Best benchmarks, but contradicts local-first
Mem0 | Double token burn | LLM-Embedded: the agent calls an LLM, Mem0 calls its OWN LLM for fact extraction. Pay twice.
MemPalace | Wrong platform | 96.6% LongMemEval, but built for Claude Code, not Hermes

Phase 4: The One That Worked

Mnemosyne

By AxDSan. Posted directly to r/hermesagent by its author. The README literally says: "The Zero-Dependency, Sub-Millisecond AI Memory System for Hermes Agents."

What makes it different:

In-process Python + SQLite. No separate service. No Docker. No daemon. If the gateway process runs, memory works. There is nothing to fall out of sync with.

Sub-millisecond reads. 0.076ms. 500x faster than the previous-generation providers. You don't feel it.

Three code paths, all verified working:

  • Explicit remember — the agent calls remember() when asked
  • Auto-ingestion — sync_turn captures every conversation turn automatically
  • Context injection — high-importance memories surface in each turn's system prompt
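To make the three paths concrete, here is a toy in-process store built on Python's standard `sqlite3` module. It is an illustration only, not Mnemosyne's implementation: the `TinyMemory` class, the table schema, and the importance defaults are all assumptions, and only the names `remember` and `sync_turn` come from the list above.

```python
import sqlite3


class TinyMemory:
    """Illustrative in-process store; not Mnemosyne's actual implementation."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS memories ("
            "id INTEGER PRIMARY KEY, content TEXT NOT NULL, "
            "importance REAL NOT NULL, source TEXT NOT NULL)"
        )

    def remember(self, content: str, importance: float = 0.9) -> None:
        # Path 1: explicit store, called when the user asks the agent to remember.
        self._insert(content, importance, "explicit")

    def sync_turn(self, user_msg: str, agent_msg: str) -> None:
        # Path 2: automatic ingestion of each turn at a low default importance.
        self._insert(f"user: {user_msg}\nagent: {agent_msg}", 0.3, "turn")

    def context_block(self, min_importance: float = 0.8, limit: int = 5) -> str:
        # Path 3: text to prepend to the system prompt on each turn.
        rows = self.db.execute(
            "SELECT content FROM memories WHERE importance >= ? "
            "ORDER BY importance DESC, id DESC LIMIT ?",
            (min_importance, limit),
        ).fetchall()
        return "\n".join(f"- {content}" for (content,) in rows)

    def _insert(self, content: str, importance: float, source: str) -> None:
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO memories (content, importance, source) VALUES (?, ?, ?)",
                (content, importance, source),
            )
```

Because everything lives in one process and one SQLite file, there is no second runtime whose state can drift away from the gateway's.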

Installation took three commands:

pip install "mnemosyne-memory[embeddings]"  # quotes keep shells like zsh from globbing the brackets
python -m mnemosyne.install
hermes memory setup  # interactive picker → select "mnemosyne"

No [all] — that pulls ctransformers and downloads 1–4GB of GGUF models. On a 4GB machine, that's OOM territory. The [embeddings] extra adds fastembed (133MB ONNX model) for semantic search, and LLM consolidation routes through your existing API key.

After three weeks of operation:

  • 362 working memories
  • 29 episodic summaries (auto-consolidation working)
  • 27/27 test suite passing
  • Zero silent failures. Zero daemon hunts. Zero forced kills.

The Pattern

The two providers that failed outright, AgentMemory and Hindsight, shared one architectural decision: an external runtime with its own lifecycle.

AgentMemory's Node.js Docker. Hindsight's pg0 Postgres + daemon. When the runtime and the gateway fell out of sync — silent failure, ghost processes, respawn loops.

YantrikDB was different — it was in-process (Rust via PyO3), so it didn't have the lifecycle problem. But it showed a subtler failure mode: hooks that favor quantity over quality. If the memory provider hoovers up every turn indiscriminately, the agent learns to ignore it — and the moments that actually matter get buried in noise.

Mnemosyne's in-process Python + SQLite avoids the lifecycle problem. Its configurable importance scoring and sleep consolidation (summarizing old working memories into episodic ones) avoid the noise problem. It's the simplest thing that could possibly work on both fronts.
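The consolidation idea can be sketched in a few lines. This is my own simplification under stated assumptions: the `WorkingMemory` dataclass, the 7-day cutoff, and the join-based placeholder summarizer are hypothetical, and a real system would presumably hand the old entries to an LLM instead.

```python
from dataclasses import dataclass


@dataclass
class WorkingMemory:
    content: str
    age_days: int


def consolidate(memories, max_age_days=7, summarize=None):
    """Split memories into (kept, episodic_summary_or_None)."""
    # Placeholder summarizer: joins the old entries. A real one would call an LLM.
    summarize = summarize or (lambda items: " | ".join(m.content for m in items))
    old = [m for m in memories if m.age_days > max_age_days]
    kept = [m for m in memories if m.age_days <= max_age_days]
    summary = summarize(old) if old else None
    return kept, summary
```

The working set stays small because old entries collapse into a single episodic summary instead of accumulating forever.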


What I'd Tell Someone Starting Today

  1. Local-first, single-process. If memory needs a separate service, it will fail in ways you won't notice.
  2. Verify ingestion before trusting it. After installing any memory provider, store a test fact, restart, and ask for it back.
  3. The author matters. Does the provider's README mention your agent platform by name? If not, you're doing integration work the author didn't do.
  4. [all] is a trap. Read the install extras. On constrained hardware, the "everything" option downloads models you don't need.
  5. Clean uninstall is a feature. If removing a provider takes more than deleting a directory, the architecture is fragile.
  6. Signal beats volume. A provider that stores everything indiscriminately trains the agent to ignore it. Better to store 50 high-signal facts than 5,000 noise entries.
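The round-trip check from item 2 is easy to automate. The sketch below uses a plain SQLite file as a stand-in for a provider's store, and closing and reopening the connection as a stand-in for a restart; the function names and the `facts` table are hypothetical, and against a real provider you would call its own store and recall tools instead.

```python
import os
import sqlite3
import tempfile


def store_fact(db_path: str, fact: str) -> None:
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commit the insert
            conn.execute("CREATE TABLE IF NOT EXISTS facts (content TEXT)")
            conn.execute("INSERT INTO facts (content) VALUES (?)", (fact,))
    finally:
        conn.close()  # closing stands in for the process restart


def fact_survived(db_path: str, fact: str) -> bool:
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT 1 FROM facts WHERE content = ? LIMIT 1", (fact,)
        ).fetchone()
        return row is not None
    finally:
        conn.close()


def round_trip_check() -> bool:
    canary = "canary: the test fact for ingestion verification"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "memory.db")
        store_fact(path, canary)
        return fact_survived(path, canary)
```

If a check like this ever returns False after a provider switch or upgrade, you have found the silent failure before it cost you three weeks.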

I'm @MariaTanBoBo on X. This article was written with Hermes Agent and published via the DEV.to API — yes, an AI agent can publish articles now. The future is weird.
