I built a memory plugin for Hermes Agent that takes deletion seriously

#hermesagentchallenge #devchallenge #agents #opensource

Hermes Agent Challenge Submission

This is a submission for the Hermes Agent Challenge

A few days into using Hermes Agent on my own machine I went looking for the file that holds my conversation history. Hermes ships with a clean MemoryProvider ABC and a list of well-known backends (Mem0, Honcho, Hindsight, and friends). They all do the same job well: consolidate in the background, recall fast on the next turn.

The thing I could not get out of my head: once a memory has been baked into a derived summary, deleting the original event does not delete the bake. The summary still encodes the gist.

So I built a memory plugin that flips the model: pull instead of push, real deletes, and a trace file the user can read line by line.

Repo: github.com/MukundaKatta/hermes-agentmemory. MIT.

What it does

Three Hermes hooks, three tools, one trace log.

The provider implements the standard hooks — initialize, prefetch, sync_turn, on_session_switch, on_session_end, shutdown — but with a non-standard discipline: nothing happens in the background. Every write is synchronous. The first turn of a new session pays a 200ms-2s tax to compute the on-demand summary. In exchange the user can delete an event and know the deletion is real.

Tools the agent can call:

agentmemory_recall(query, top_k?) — search past events, return summary plus the event ids used
agentmemory_forget(session_id?, event_id?) — real delete; no tombstone
agentmemory_drift() — rolling-window state of retrieval quality

Audit trail at $HERMES_HOME/agentmemory/trace.jsonl. Each prefetch writes one line: intent, event ids, summary text, and the live drift snapshot. You can tail -f it and watch your agent's memory layer in real time.

How the plugin slots in

Hermes's plugin loader reads plugins/memory/<name>/__init__.py and looks for a class that subclasses agent.memory_provider.MemoryProvider. The whole integration is one file plus a vendored Python sibling of the agentmemory library.

from agent.memory_provider import MemoryProvider
from .agentmemory_py import EpisodicStore, OnDemandSummarizer, MemoryDriftWatcher

class AgentMemoryProvider(MemoryProvider):
    @property
    def name(self) -> str:
        return "agentmemory"

    def initialize(self, session_id: str, **kwargs) -> None:
        self._session_id = session_id
        self._hermes_home = kwargs.get("hermes_home") or os.path.expanduser("~/.hermes")
        self._summarizer = OnDemandSummarizer(
            llm=_make_anthropic_llm(self._model, self._max_tokens),
            max_tokens=self._max_tokens,
        )

    def prefetch(self, query: str, *, session_id: str = "") -> str:
        events = self._store.retrieve(query, top_k=self._top_k)
        if not events:
            return ""
        result = self._summarizer.summarize(events, intent=query)
        self._write_trace({
            "session_id": session_id or self._session_id,
            "intent": query,
            "event_ids": result.trace["event_ids"],
            "summary": result.summary,
            "drift": self._watcher.state(),
        })
        return result.summary

    def sync_turn(self, user_content, assistant_content, *, session_id="") -> None:
        sid = session_id or self._session_id
        if user_content:
            self._store.append(sid, "user", user_content)
        if assistant_content:
            self._store.append(sid, "assistant", assistant_content)

The _make_anthropic_llm factory is intentionally tiny: it returns a function that calls Anthropic().messages.create with the user's existing ANTHROPIC_API_KEY. Same family of model the agent already uses. No second smaller model that quietly degrades quality.

Real deletes, demonstrated

This is the part I care about most. When the user says "forget that conversation," the plugin reaches into the in-memory store and removes the row. There is no is_deleted=True flag, no archive folder, no derived summary still hanging around. The smoke test that proves it:

def test_forget_is_real(tmp_path):
    p = _new_provider(tmp_path)
    p.sync_turn("ephemeral", "noted")
    eid = p._store._events[0].id
    out = p.handle_tool_call("agentmemory_forget", {"event_id": eid})
    assert json.loads(out)["removed"] == 1
    # the event id should be GONE — no tombstone
    assert p._store.get(eid) is None

The test passes. The event id returns None from get because the row is not there to return.

Install in three lines

mkdir -p plugins/memory/agentmemory
cp -r path/to/hermes-agentmemory/* plugins/memory/agentmemory/
hermes config set memory.provider agentmemory

Set ANTHROPIC_API_KEY in your environment so the summarizer can call Claude. Done. Hermes will pick up the plugin on next start, expose the three tools to the model, and start writing trace records.

Why this lives outside Hermes

The Hermes maintainers ship excellent memory backends. I deliberately did not try to upstream this as a default option. Two reasons:

The pull-model trade-off (latency on cold start in exchange for auditability) is a values choice, not a default. People who want the magical version should keep the magical version.
Plugins are how Hermes invites different values into the same agent. The MemoryProvider ABC is well-designed precisely because it lets backends like this exist without forcing them on anyone.

The deeper argument for why this design matters lives in my companion Write-track post. The short version: a self-improving agent that the user cannot audit is just opaque drift. Memory you cannot really delete is the part that breaks first.

What I'd ship next

A hermes memory inspect command that pretty-prints trace.jsonl so the user does not have to grep JSON. A SQLite backend (the agentmemory core has one in the JS version; porting it is a weekend). And a small dashboard that visualizes the drift watcher over time so you can see when your retrieval starts feeling stale.

But the v0.1 plugin is already runnable, already tested, already MIT, and already does the one thing I wanted it to do: when I tell my agent to forget something, it forgets.