amionweb

Posted on May 30

Your AI Agent Probably Doesn't Need a Vector Database

#hermesagentchallenge #ai #devchallenge #agents

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent


There's a reflex in agent-building right now. You decide your agent should "remember things," and within ten minutes you're comparing managed vector-DB tiers, picking an embedding model, and quietly arguing with yourself about chunk size and overlap.

Hermes Agent — one of the more capable open agents you can run yourself — skips all of that for its long-term memory. It remembers past sessions with SQLite and full-text search. No embeddings. No vector index. No cosine-similarity threshold to babysit.

The first time I noticed this, I assumed it was a placeholder. The thing you rip out before you're serious. The longer I sat with it, the more it looked like the serious choice, and the reflex looked like the placeholder.

This post is about why that is, where it breaks down, and what it should change about how you build memory into your own agents. If you've never touched embeddings, don't worry — the next two sections get you up to speed in plain language. If you build RAG for a living, skip to "Where keyword search actually loses."

Two different problems hiding under one word

"Memory" gets used for two jobs that have almost nothing in common, and conflating them is where a lot of agent designs go wrong.

The first job is recall: what happened in our past sessions? You asked me to set up a deploy script last Tuesday; can I find that conversation and the script we landed on? This is retrieval over a growing pile of transcripts.

The second job is user modeling: who is this person and how do they like to work? Not a specific event, but a slowly built profile: they prefer terse answers, they're on Windows, they hate when I touch their config without asking.

These want different tools. Recall is a search problem. User modeling is a summarization-and-inference problem. Hermes treats them separately, and that separation is half the reason the design holds together. Recall runs on full-text search. User modeling is handed off to Honcho, which maintains an evolving, "dialectic" model of you across conversations. Two problems, two mechanisms, no forcing one tool to do both.

The rest of this post is about the recall half, because that's the part everyone instinctively reaches for a vector database to solve.

A 90-second primer: vectors vs. full-text search

If you already know this cold, skip ahead. If not, here's the whole idea.

A vector database stores text as embeddings — long lists of numbers a model produces, positioned so that things with similar meaning sit close together. Ask "how do I get a refund," and it can surface a passage about "returning a purchase" even though they share no words. That semantic reach is the superpower. The cost: you run an embedding model on everything going in and every query coming out, you store and index the vectors, and you accept that retrieval is fuzzy and a little opaque. Why did it return that? Because the math said it was near. Good luck reading the math.
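To make "near" concrete: similarity between two embeddings is usually measured as cosine similarity, the cosine of the angle between the two vectors. Below is a minimal Python illustration. The three-dimensional vectors are made up by hand for the example; a real embedding model produces hundreds or thousands of dimensions. Only the mechanics carry over, not the numbers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy "embeddings", purely illustrative (not from any real model).
toy_vectors = {
    "how do I get a refund": [0.9, 0.1, 0.2],
    "returning a purchase": [0.8, 0.2, 0.3],
    "configuring a cron job": [0.1, 0.9, 0.1],
}

query = toy_vectors["how do I get a refund"]
candidates = [text for text in toy_vectors if text != "how do I get a refund"]
ranked = sorted(
    candidates,
    key=lambda text: cosine_similarity(query, toy_vectors[text]),
    reverse=True,
)
print(ranked[0])
```

The refund query lands nearest "returning a purchase" even though the two strings share no words. That is the semantic reach. It is also the opacity: nothing in those numbers tells you *why* they're close.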

Full-text search, which is what SQLite's FTS5 gives you, is the grown-up version of keyword matching. It indexes the actual words, then ranks results with a relevance formula called BM25. Roughly: a document scores higher when it contains your query terms more often, and terms that are rare across the whole collection count for more than common ones. Longer documents are also normalized so they can't win on length alone. Search "refund policy shopify," and you get the sessions where those terms actually show up, ranked best-first. It's fast, it's cheap, and crucially, you can read exactly why a result matched.
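Here's a small, dependency-free Python sketch of the classic Okapi BM25 formula, so "rare terms count for more" isn't just a slogan. It makes a few assumptions:

- tokenization is naive lowercase whitespace splitting;
- k1 = 1.2 and b = 0.75, the same constants FTS5 documents;
- IDF uses the "+1" variant that keeps weights positive.

FTS5's own implementation differs in details such as tokenization and the exact IDF formula. Treat this as an illustration of the idea, not a reimplementation.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (higher is better)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rarer terms (lower df) get a larger IDF weight.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates via k1; b scales the document-length penalty.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "shopify refund policy for annual plans",
    "deploy script for staging",
    "refund refund refund the shopify order",
]
scores = bm25_scores("refund policy shopify", docs)
print(scores)
```

The first document outranks the third even though the third says "refund" three times. Term frequency saturates, and the rare word "policy" carries a bigger IDF weight. The deploy document scores exactly zero because it shares no terms with the query.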

One reaches for meaning. The other reaches for words. The conventional wisdom says agents need meaning, so they need vectors. Hermes quietly bets that for session recall, words are enough, and the places where they aren't can be patched more cheaply than a vector stack costs.

How Hermes actually remembers

Three layers, and none of them are exotic. That's the point.

1. Every session goes into SQLite, indexed by FTS5. All your CLI and messaging conversations land in one local database file (~/.hermes/state.db), full-text indexed. When something from an earlier session might be relevant, the agent searches its own history the way you'd search a codebase: by terms, ranked by relevance. What comes back is the actual messages, not a lossy paraphrase of them. Conceptually, recall looks closer to this than to a similarity query:

```sql
-- not the literal Hermes query, but the shape of it
SELECT session_id, snippet(messages, 0, '[', ']', '…', 12)
FROM   messages
WHERE  messages MATCH 'deploy AND script AND staging'
ORDER  BY rank;          -- BM25 relevance, best first
```

In practice these come back in around 20ms against the local file. There's no network hop, because there's nowhere to hop to.
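If you want to poke at that shape yourself, here's a self-contained Python version using the standard library's sqlite3 module and an in-memory database. The two-column schema (session_id plus content) is my assumption for illustration, not Hermes's actual table layout. The sketch also assumes your Python's SQLite build includes FTS5, which most do. One detail: snippet() takes a column index, so here it points at content, which is column 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: session_id is stored but not indexed; content is full-text indexed.
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [
        ("s1", "Let's write a deploy script that pushes to staging first"),
        ("s2", "The staging database needs a bigger disk"),
        ("s3", "Final deploy script: build, test, then rsync to staging"),
    ],
)

rows = conn.execute(
    """
    SELECT session_id, snippet(messages, 1, '[', ']', '…', 12)
    FROM messages
    WHERE messages MATCH 'deploy AND script AND staging'
    ORDER BY rank  -- rank defaults to BM25; best match first
    """
).fetchall()

for session_id, excerpt in rows:
    print(session_id, excerpt)
```

Session s2 mentions staging but never "deploy" or "script," so the AND query excludes it outright. The bracketed terms in each excerpt show exactly which words matched.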

2. A separate, bounded layer holds curated notes. Next to the raw log, Hermes keeps two small agent-maintained files in ~/.hermes/memories/: a running MEMORY.md of facts worth keeping (capped around 800 tokens) and a USER.md profile (around 500). They're injected into the system prompt as a frozen snapshot when a session starts, and the agent edits them deliberately with add, replace, and remove operations. This is the distilled layer: not everything that was ever said, but the handful of things the agent decided were worth writing down in plain language. It's the difference between your shell history and the README you keep by hand. The condensing work, deciding what to keep, happens here and stays out of the recall path, so search itself never hands back a guess.

3. Deep user modeling is delegated. For a richer, evolving picture of who you are, Hermes can hand off to Honcho, which reasons about each conversation after it ends and accumulates insight about your preferences, habits, and goals. It's optional, and it's deliberately a different system from raw recall.
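The bounded-notes idea from the second layer is easy to model. What the post tells us is that the files are capped and edited through add, replace, and remove. Everything else in this sketch is my own assumption, not Hermes's code:

- the class and its method names;
- the rough 4-characters-per-token estimate;
- the rule that an over-budget write is refused, forcing a replace or remove first.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token, a common rule of thumb for English.
    return max(1, len(text) // 4)


@dataclass
class BoundedNotes:
    """Hypothetical model of a capped, agent-edited notes file such as MEMORY.md."""

    token_cap: int
    entries: list[str] = field(default_factory=list)

    def used_tokens(self) -> int:
        return sum(estimate_tokens(e) for e in self.entries)

    def add(self, entry: str) -> bool:
        # Assumed policy: refuse any write that would push the file over its cap.
        if self.used_tokens() + estimate_tokens(entry) > self.token_cap:
            return False
        self.entries.append(entry)
        return True

    def replace(self, old: str, new: str) -> bool:
        if old not in self.entries:
            return False
        projected = self.used_tokens() - estimate_tokens(old) + estimate_tokens(new)
        if projected > self.token_cap:
            return False
        self.entries[self.entries.index(old)] = new
        return True

    def remove(self, entry: str) -> bool:
        if entry not in self.entries:
            return False
        self.entries.remove(entry)
        return True

    def snapshot(self) -> str:
        # Rendered once when a session starts (the "frozen snapshot").
        return "\n".join(f"- {e}" for e in self.entries)


memory = BoundedNotes(token_cap=800)
memory.add("User deploys to staging with deploy.sh before prod.")
memory.replace(
    "User deploys to staging with deploy.sh before prod.",
    "Deploy flow: deploy.sh to staging, smoke test, then prod.",
)
print(memory.snapshot())
```

The cap is what forces curation. Once the file is full, the only way to record something new is to decide what old fact no longer earns its place.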

Here's the whole shape:

   ┌────────────────────────────────────────────────┐
   │  this conversation (working context)            │
   └───────────────┬────────────────────────────────┘
                   │ when relevant, search the past
                   ▼
   ┌────────────────────────────────────────────────┐
   │  SQLite + FTS5  (~/.hermes/state.db)            │
   │  raw sessions · keyword/BM25 · returns messages │
   └────────────────────────────────────────────────┘
        ▲                                   ▲
        │ distilled by hand                 │ separate track
   ┌──────────────────────────┐   ┌──────────────────────────┐
   │ MEMORY.md / USER.md       │   │ Honcho (optional)        │
   │ agent-curated facts       │   │ evolving user model      │
   └──────────────────────────┘   └──────────────────────────┘

No embedding service anywhere in the recall path. No vector index to keep warm. A database file you could open and read on a plane.

Why this is a better default than it sounds

I want to be fair to vectors later, so let me be specific about what the boring approach buys you first.

It costs almost nothing to run. Every embedding-based memory pays a tax on both ends: you embed what you store, and you embed every query. At low volume that's noise. For an always-on agent that's reading and writing memory all day, it adds up. And unless you host the embedding model yourself, it puts a network dependency on a path that otherwise needs only a local file and zero API calls.

You can debug it. This is the one I underrate every time. When an FTS-backed agent recalls the wrong thing, you can run the query yourself and see precisely why. When a vector-backed agent recalls the wrong thing, you're staring at cosine distances trying to reverse-engineer the embedding model's taste. One of these you fix on a Tuesday afternoon. The other becomes a ticket that says "memory feels off sometimes."
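Concretely, "run the query yourself" can mean asking FTS5 for the raw score and the matched tokens side by side. bm25() and highlight() are real FTS5 auxiliary functions. The table, the rows, and the '<<' / '>>' markers are illustrative assumptions. FTS5 returns bm25() scores negated, so a more negative number means a better match. That's why sorting ascending puts the best hit first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [
        ("s1", "rollback plan for the payments release and how to revert the payments release"),
        ("s2", "payments dashboard shows latency after release"),
        # Unrelated rows, so the query terms are rare enough to carry IDF weight.
        ("s3", "cron job for nightly backups"),
        ("s4", "fix typo in readme"),
        ("s5", "upgrade node to the latest lts"),
    ],
)

rows = conn.execute(
    """
    SELECT session_id, bm25(messages), highlight(messages, 1, '<<', '>>')
    FROM messages
    WHERE messages MATCH 'payments release'
    ORDER BY bm25(messages)  -- FTS5 negates BM25: lower (more negative) is better
    """
).fetchall()

for session_id, score, marked in rows:
    print(f"{session_id}  score={score:.3f}  {marked}")
```

Every number and every marked word in that output is something you can reason about. If s2 should have won, you can see it lost on term frequency, and change the query or the note accordingly.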

It's portable and inspectable. The memory is a file. You can back it up, copy it to another machine, grep it, audit what your agent knows, and delete what it shouldn't. Try explaining to a security reviewer what's inside your vector index. Now hand them a SQLite file they can query.
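Here's what "audit and delete" can look like in practice, sketched in Python against a plain FTS5 table. The schema and rows are assumptions for illustration. Before touching a real agent database, inspect its actual schema (for example with the sqlite3 shell's .schema command) and work on a backup copy. A production setup may use triggers or external-content tables, where deletes work differently.

```python
import sqlite3

# In real use you'd open a backup copy of the agent's database file instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [
        ("s1", "my API key is sk-test-123, please remember it"),
        ("s2", "switch the staging deploy to Tuesdays"),
    ],
)

# Audit: which sessions mention an API key? A quoted phrase matches adjacent tokens.
flagged = [
    row[0]
    for row in conn.execute(
        "SELECT session_id FROM messages WHERE messages MATCH ?", ('"api key"',)
    )
]
print("flagged:", flagged)

# Delete: a plain FTS5 table accepts ordinary DELETEs, and its index updates with them.
conn.execute(
    "DELETE FROM messages WHERE rowid IN "
    "(SELECT rowid FROM messages WHERE messages MATCH ?)",
    ('"api key"',),
)
conn.commit()
remaining = conn.execute("SELECT count(*) FROM messages").fetchone()[0]
print("remaining rows:", remaining)
```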

It doesn't drift under you. Swap your embedding model (a new version, a different provider) and your old vectors live in a different space from your new ones. Strictly, you have to re-embed everything you've stored. Teams paper over this, but it's a real, recurring source of "why did retrieval quietly get worse." Keyword indexes don't have model-version baggage. The word "refund" in 2024 is the word "refund" in 2026.

It works offline and on tiny hardware. The whole pitch of Hermes is an agent that lives on infrastructure you own, possibly something cheap. Full-text search is right at home there. A vector stack wants more: an embedding model to run locally or call over the network, plus an index to build and keep current.

None of these are clever. Added together, for the specific job of "find the relevant past conversation," they're hard to argue with.

Where keyword search actually loses

Now the honest part, because "you never need vectors" would be the wrong lesson.

Full-text search fails exactly where you'd expect: when the words don't match but the meaning does. If three months ago you discussed "rolling back a release" and today you ask about "reverting a deploy," BM25 may shrug. It has no idea those are the same idea. Embeddings would catch it. Other genuine weak spots:

Conceptual or fuzzy queries ("that thing we tried when latency spiked") where you can't supply the keywords because you don't remember them.
Synonym-heavy or jargon-rich domains, where the same concept has ten surface forms.
Cross-lingual recall, where the stored text and the query aren't even in the same language.

This is real, and pretending otherwise would be the kind of advocacy this challenge is already full of. But two things soften it in Hermes's design, and the second one is the interesting part.

First, the curated MEMORY.md layer records the facts that matter in clean, canonical language. So your most important memories don't hang on whether you can reproduce the exact words from a six-week-old transcript.

Second — and this is what a vector-DB reflex misses — the thing doing the searching is an agent, not a dumb retrieval pipeline. If "reverting a deploy" comes back empty, a capable agent can just try "rollback," then "release," then "revert" on its own. A RAG pipeline fires one query and lives with whatever it gets. An agent can rifle through its own filing cabinet, reformulating until it finds the drawer. That recovers a surprising amount of what naive keyword matching would drop, and it costs nothing but a few extra tokens. It won't fully close the semantic gap. It narrows it without an embedding in sight.
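Here's a minimal sketch of that reformulation loop in Python, against an in-memory FTS5 table. In a real agent, the model would propose each new query after seeing the empty result. Here the candidate list is hard-coded and hypothetical. That's enough to show the first phrasing miss and a later one land.

```python
import sqlite3


def search(conn: sqlite3.Connection, query: str) -> list[str]:
    """One FTS5 lookup: session ids for a query, best BM25 match first."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT session_id FROM messages WHERE messages MATCH ? ORDER BY rank",
            (query,),
        )
    ]


def search_with_reformulation(conn: sqlite3.Connection, candidates: list[str]):
    """Try each phrasing in turn and stop at the first one that finds anything."""
    for query in candidates:
        hits = search(conn, query)
        if hits:
            return query, hits
    return None, []


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [
        ("s1", "plan for rolling back a release if the canary fails"),
        ("s2", "notes on the new onboarding flow"),
    ],
)

# Hypothetical phrasings an agent might try after the first one comes back empty.
candidates = ["reverting AND deploy", "revert", "rollback OR rolling", "release"]
query_used, hits = search_with_reformulation(conn, candidates)
print(query_used, hits)
```

Each retry is one cheap local query. The expensive part, guessing a better word, is work the model is already good at.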

And when that isn't enough? FTS5 and a vector index aren't enemies. The mature move is hybrid: keyword search for the bulk of queries, where it's precise and cheap, with embeddings layered on for the conceptual long tail. Hermes is literally built this way. Keyword recall ships by default, and you can run a semantic provider (Mem0, Supermemory, Honcho) alongside it, never replacing the built-in store. The hybrid model isn't a hack you bolt on later. It's the shipped design. Starting with vectors gets the order backwards: you pay for the expensive, fuzzy tool first and add the cheap, precise one afterward, if ever.
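One common way to combine a keyword ranking with a semantic one is reciprocal rank fusion (RRF). Each result earns 1 / (k + rank) from every list it appears in, and the merged list sorts by the total. To be clear, this is a generic technique I'm swapping in for illustration. The post doesn't say how Hermes and its providers combine results. k = 60 is the conventional default, and the session IDs below are made up.

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge best-first result lists; each item earns 1 / (k + rank) per list."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["s3", "s1", "s7"]   # e.g. from an FTS5 MATCH query
semantic_hits = ["s9", "s1", "s3"]  # e.g. from a hypothetical vector search
merged = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(merged)
```

RRF only needs positions, not raw scores. That makes it convenient glue between BM25 scores and cosine similarities, which live on unrelated scales. Here s3 edges out s1 because a first place plus a third place is worth slightly more than two second places.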

A decision guide you can actually use

Strip away the framework and here's the call for your agent's memory:

If you need…	Reach for	Why
"Find the session where we did X"	Full-text search (FTS5)	Cheap, fast, debuggable, exact
Bounded long-term facts	Agent-curated notes (MEMORY.md)	Keeps what matters in clean, canonical language
"Who is this user, how do they work"	A user model (e.g. Honcho)	Different problem from recall
Recall across synonyms/concepts	Add embeddings (hybrid)	The one place vectors clearly earn it
Everything, day one	Start with FTS, measure, then add	You'll likely never hit the wall

The general principle Hermes is quietly demonstrating: default to the boring, legible tool, and add the expensive, opaque one only when you can point at the specific query it failed. That's not a memory rule. That's just engineering. It's easy to forget in a field where every blog post assumes a vector database the way older ones assumed jQuery.

The takeaway

The interesting thing about Hermes's memory isn't that it's clever. It's that it's deliberately not clever, and it works anyway. SQLite, full-text search, a small layer of curated notes, and a separate track for modeling the user. Four ordinary pieces, each doing one job it's genuinely good at.

I came in expecting to find a corner that had been cut. I left thinking most agent projects cut the corner in the other direction — reaching for vectors out of habit, paying for semantic recall they rarely exercise, and trading away the things that actually matter day to day: cost, portability, and being able to answer "why did it remember that?"

If you're about to add memory to an agent, try the boring version first. Index the text. Distill what matters into a few short notes. Search by words. Measure where it genuinely fails before you reach for the embeddings. There's a decent chance that, like me, you'll be surprised how rarely you need to.
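"Measure where it fails" can start very small. Collect a handful of queries labeled with the session that should come back, and check whether that session lands in the top k. This sketch assumes an illustrative two-column FTS5 table (session_id, content) and invents its own tiny data set. The misses it reports are the queries worth studying before you add embeddings.

```python
import sqlite3


def evaluate_recall(conn: sqlite3.Connection, labeled: dict[str, str], k: int = 3):
    """Return (recall@k, misses) for queries labeled with the session that should match."""
    misses = []
    for query, expected in labeled.items():
        top_k = [
            row[0]
            for row in conn.execute(
                "SELECT session_id FROM messages WHERE messages MATCH ? "
                "ORDER BY rank LIMIT ?",
                (query, k),
            )
        ]
        if expected not in top_k:
            misses.append(query)
    return 1 - len(misses) / len(labeled), misses


conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(session_id UNINDEXED, content)")
conn.executemany(
    "INSERT INTO messages (session_id, content) VALUES (?, ?)",
    [
        ("s1", "deploy script for staging"),
        ("s2", "refund policy for the shopify store"),
        ("s3", "rolling back a release after the outage"),
    ],
)

# A tiny hand-labeled set; in practice you'd collect real queries over time.
labeled = {"deploy script": "s1", "shopify refund": "s2", "revert deploy": "s3"}
recall, misses = evaluate_recall(conn, labeled)
print(f"recall@3 = {recall:.2f}, misses = {misses}")
```

The one miss is exactly the synonym case: "revert deploy" shares no words with "rolling back a release." If misses like that dominate your real queries, that's your evidence for adding embeddings. If they're rare, the boring version is doing its job.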

If you've shipped agent memory both ways, I'd like to hear where full-text search ran out of road for you — that boundary is the actually-interesting question, and nobody seems to write it down.

Top comments (1)

Harjot Singh • May 31

Refreshing contrarian take, and mostly right. People reach for a vector DB reflexively when their corpus is small enough that keyword search, or just stuffing it in context, beats the added complexity. Vector search earns its keep at scale and for fuzzy semantic recall, but below some threshold it's infrastructure you maintain for no measurable win. The honest move is to measure recall without it first, then add only if the number demands it. I keep the same bias toward the simplest retrieval that clears the bar in Moonshift. Where's your cutover point: corpus size, query fuzziness, or latency budget?