LanceDB Ships Semantic Memory Plugin for Hermes Agent: Durable Recall Across Sessions with Four Lifecycle Tools


Anyone who's used a personal AI agent for more than a few sessions hits the same wall: it forgets. You explain your preferences, state your conventions, spell out the caveats — and a few sessions later, you're explaining it all over again. On June 16, 2026, LanceDB shipped an answer: hermes-agent-memory, an official memory provider plugin that gives Hermes Agent durable, semantic recall across sessions.

Four Lifecycle Tools, One Plugin

The plugin exposes four tools directly to the agent:

lancedb_remember — persist a fact into long-term memory
lancedb_recall — semantic search over stored facts (vector ANN by default)
lancedb_read — retrieve the full source context a fact was extracted from
lancedb_forget — preview candidates, then delete by exact ID

Together they cover the full lifecycle of a durable memory: save it, find it, trace it, and remove it when it's wrong.
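To make the lifecycle concrete, here is a minimal in-memory sketch of the same four operations. It is not the plugin's code: the class name ToyMemory, the method signatures, and the word-overlap ranking are all assumptions made for illustration, and the real tools persist to a LanceDB table and rank by vector similarity.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Fact:
    id: str
    text: str
    source: str  # full context the fact was extracted from


@dataclass
class ToyMemory:
    """Hypothetical in-memory stand-in for the plugin's LanceDB-backed store."""
    facts: dict = field(default_factory=dict)

    def remember(self, text: str, source: str) -> str:
        # Persist a fact and return its ID.
        fact_id = uuid.uuid4().hex
        self.facts[fact_id] = Fact(fact_id, text, source)
        return fact_id

    def recall(self, query: str, limit: int = 5) -> list:
        # Placeholder ranking by shared words; the plugin uses vector search instead.
        words = set(query.lower().split())
        scored = [(len(words & set(f.text.lower().split())), f) for f in self.facts.values()]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [f for score, f in ranked if score > 0][:limit]

    def read(self, fact_id: str) -> str:
        # Trace a fact back to the context it came from.
        return self.facts[fact_id].source

    def forget(self, fact_id: str) -> bool:
        # Deletes by exact ID only; callers preview candidates with recall() first.
        return self.facts.pop(fact_id, None) is not None


memory = ToyMemory()
fid = memory.remember("Use tabs in Go files", source="user: in this repo we always use tabs for Go")
candidates = memory.recall("tabs or spaces in Go")   # preview before deleting
source_text = memory.read(candidates[0].id)          # provenance lookup
deleted = memory.forget(candidates[0].id)            # delete by exact ID
```

The two-step forget (preview with recall, then delete by ID) mirrors the description above: nothing is removed on the strength of a fuzzy match alone.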

Hybrid Retrieval Under the Hood

By default, recall uses pure vector ANN over OpenAI text-embedding-3-small (1536-dim). For production workloads, the plugin supports three hybrid modes — all configurable per call or globally:

Mode	What It Does
RRF (default hybrid mode)	Reciprocal Rank Fusion of vector + BM25 results
Linear	Weighted combination of vector and full-text scores
Cross-encoder	Full reranking pass via `cross-encoder/ettin-reranker-17m-v1`

The cross-encoder is the only mode that needs sentence-transformers (and therefore torch, ~2 GB). Everything else runs with just lancedb, openai, and pyyaml.
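For readers unfamiliar with the first two fusion strategies, the following is a rough sketch of how they combine two result lists. It is a generic illustration, not the plugin's implementation: the function names, the memory IDs, and the 0.7 weight are assumptions, and k=60 is simply the constant conventionally used for Reciprocal Rank Fusion.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each list adds 1 / (k + rank) to a document's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def linear_fuse(vector_scores, text_scores, weight=0.7):
    """Weighted sum of two score dicts, assumed already normalized to [0, 1]."""
    ids = set(vector_scores) | set(text_scores)
    combined = {
        d: weight * vector_scores.get(d, 0.0) + (1 - weight) * text_scores.get(d, 0.0)
        for d in ids
    }
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical memory IDs, best match first in each list.
vector_hits = ["m3", "m1", "m2"]
bm25_hits = ["m1", "m4", "m3"]
fused = rrf_fuse([vector_hits, bm25_hits])

blended = linear_fuse({"m1": 0.9, "m3": 1.0}, {"m1": 1.0, "m4": 0.8})
```

In this toy input, m1 ends up ahead of m3 under RRF because both lists rank it near the top, even though the vector search alone preferred m3. RRF needs only ranks, while the linear mode depends on the two score scales being comparable.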

Facts That Survive Context Compression

Hermes already compresses its session context to stay within token budgets. The problem: facts worth keeping can be compressed away before anything has extracted and saved them. The LanceDB plugin hooks into two lifecycle events, on_pre_compress and on_session_end, and uses an auxiliary LLM to distill durable facts at each one, so extraction happens before compression runs and again when the session closes. The result: insights survive the compression boundary, and every stored fact carries a link back to its source conversation for provenance.
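The shape of that hook-based design might look roughly like the sketch below. Only the two hook names come from the plugin description; the class name, the hook arguments, the storage format, and the fake_extractor that stands in for the auxiliary LLM are all hypothetical.

```python
class ExtractingProvider:
    """Hypothetical provider that distills facts whenever a lifecycle hook fires."""

    def __init__(self, extract_facts):
        # extract_facts stands in for the auxiliary LLM call (assumed signature:
        # list of messages in, list of fact strings out).
        self.extract_facts = extract_facts
        self.stored = []

    def _distill(self, session_id, messages):
        known = {row["fact"] for row in self.stored}
        for fact in self.extract_facts(messages):
            if fact not in known:
                # Keep a pointer to the source conversation for provenance.
                self.stored.append({"fact": fact, "source_session": session_id})
                known.add(fact)

    def on_pre_compress(self, session_id, messages):
        # Runs before the context is compressed, so nothing is lost to the summary.
        self._distill(session_id, messages)

    def on_session_end(self, session_id, messages):
        # Catches facts from sessions that never triggered compression.
        self._distill(session_id, messages)


def fake_extractor(messages):
    # Toy rule in place of an LLM: treat "Always ..." statements as durable facts.
    return [m for m in messages if m.lower().startswith("always")]


provider = ExtractingProvider(fake_extractor)
provider.on_pre_compress("s1", ["hi there", "Always run tests with -race"])
provider.on_session_end("s1", ["Always run tests with -race", "thanks"])
```

Because both hooks can see the same message, the sketch deduplicates by fact text; how the actual plugin avoids duplicates is not described in the source.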

In-Process, No External Service

The entire memory store runs inside Hermes's Python process. No external database server, no Docker container. The LanceDB table lives at ~/.hermes/lancedb/memories.lance. Embeddings go to your configured API (OpenAI by default, but any OpenAI-compatible endpoint works). For fully local setups, point it at Ollama or vLLM and nothing leaves the machine.

Auto-compaction runs in the background to prevent table fragmentation from single-row writes.

Benchmarked Against LongMemEval

The plugin ships with a LongMemEval benchmark harness — a challenging test with six question types across single-session and multi-session scenarios. One illustrative example: the agent was told about a theater play in a prior session, then asked about it in a fresh session using language that shared no keywords with the original description. Lexical search failed; semantic recall succeeded.
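A toy illustration of why that case defeats lexical search: the two sentences and the three-dimensional vectors below are made up for this example (real embeddings have 1536 dimensions and come from the embedding model), but they show how a query can share zero words with a stored fact and still sit close to it in vector space.

```python
import math


def keyword_overlap(a, b):
    # Number of distinct words the two strings share.
    return len(set(a.lower().split()) & set(b.lower().split()))


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# Invented sentences; not the actual benchmark item.
stored = "Saw Hamlet at the Globe last Friday"
query = "which stage production did I attend recently"

# Hand-picked stand-ins for embeddings, chosen only to illustrate the idea.
stored_vec = [0.9, 0.1, 0.3]
query_vec = [0.8, 0.2, 0.4]
unrelated_vec = [0.1, 0.9, 0.2]

shared_words = keyword_overlap(stored, query)
related_score = cosine(stored_vec, query_vec)
unrelated_score = cosine(unrelated_vec, query_vec)
```

With no shared words, a keyword index has nothing to match on, while the vector comparison still ranks the stored fact well above the unrelated one.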

Per-type breakdowns show single-session questions are easy for everyone, while multi-session and temporal reasoning questions remain hard across the board — an area where Hermes's extraction lifecycle has room to help.

Five-Minute Install

hermes plugins install lancedb/hermes-agent-memory
uv pip install --python ~/.hermes/hermes-agent/venv/bin/python3 lancedb openai pyyaml
hermes memory setup   # pick "lancedb"

The plugin works in isolated profiles (hermes -p demo ...) so you can test without touching an existing Hermes setup. A full walkthrough in the LanceDB blog post demonstrates saving a project convention in one session and recalling it from a fresh session with built-in memory disabled, which shows that the recall comes from the LanceDB store rather than from Hermes's built-in memory.

Why This Matters

Hermes Agent's plugin architecture was designed for exactly this kind of ecosystem contribution. LanceDB isn't just building a connector — they're shipping a memory provider that integrates at the lifecycle level, with benchmarks, provenance tracking, and a careful design that acknowledges the realities of context compression. It's a signal that the Hermes ecosystem is attracting serious infrastructure players who treat agent memory as a first-class problem.

Sources: LanceDB Blog, GitHub: lancedb/hermes-agent-memory

This article was originally published on The Agent Report.