DEV Community

Cover image for Hermes Agent Has Four Memories — And That's Why It Doesn't Forget You
Manikant Kella
Manikant Kella Subscriber

Posted on

Hermes Agent Has Four Memories — And That's Why It Doesn't Forget You

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent


A cognitive-science tour of the only open-source agent that gets smarter the longer you run it. We map Hermes' file layout onto the four classical memory systems of the human brain — and find an architecture that quietly solves the problem every other agent fakes.

I want to argue something that sounds like marketing but isn't.

Hermes Agent isn't impressive because it's "self-improving." That phrase has been on a thousand landing pages. It's impressive because — possibly by accident, possibly on purpose — the architecture matches how a human brain stores memory. Once you see the mapping, the whole project stops looking like a clever Python wrapper and starts looking like a thesis: if you want an agent that compounds, you have to give it the same memory systems we use.

I'm an ML engineer who builds recommendation and intent systems for a living, and I spend a lot of time thinking about the gap between an agent that's interesting and an agent that holds up after week three. Almost few article write-up of Hermes I've read covers the surface — "it remembers things, it has skills, it has crons." All true. None of them ask the more useful question:

Why this particular decomposition? Why these specific files? Why does a system designed in 2026 look so much like a 1970s cognitive-psychology paper?

That's the post. Grab a coffee.


The goldfish problem (and why "context window" isn't the answer)

Every agent in production today has the same failure mode, and we've all felt it.

Monday: you carefully explain your project, your stack, the three constraints that always trip up the model. The agent crushes the task. You're delighted.

Tuesday: you open a fresh session. The agent has no idea who you are. You re-explain. You re-correct. You re-paste the same context blob. This is what people mean when they say agents "feel like demos." They aren't building anything that lasts; they're paying the same context tax every single morning.

The standard responses to this problem are bad:

  • Bigger context windows — expensive, doesn't scale with how long you actually want the agent to live (months, not turns), and still amnesiac the moment the conversation ends.
  • Vector databases — useful for retrieval, terrible for identity. A RAG store doesn't know who you are; it knows which 8 chunks resemble your last question.
  • Fine-tuning — slow, expensive, and updates an agent's beliefs at the speed of GPU procurement, not the speed of Tuesday.

None of these is memory. They're all variations on "search more text faster." The reason your brain doesn't work this way is that your brain has at least four distinct memory systems, and each one solves a different problem. When cognitive psychologists separated them in the 1970s and 80s — names like Tulving, Squire, Cohen — they did it because patients with brain damage would lose one and keep the others. The systems are genuinely independent.

So here's the simple claim: the reason Hermes Agent works is that it has all four, and they're separate, and they're stored on disk where you can read them.

Let me show you.


A 90-second cognitive-science primer

You don't need a degree to follow this. Four words, four jobs:

System Holds Example
Procedural memory How to do things Riding a bike. Touch-typing. Driving a stick shift.
Semantic memory Facts about the world Paris is the capital of France. Mitochondria are the powerhouse of the cell.
Episodic memory Things that happened to you What you had for breakfast yesterday. The argument you had in 2019.
Working memory What you're holding right now The phone number someone just said out loud.

There's a fifth thing that isn't a memory system but is the substrate the others run inside: your identity — the sense of who you are that makes all four useful instead of just being a pile of disconnected data. Without a self, your "memories" are just other people's footage.

Now look at this directory listing from a fresh Hermes Agent install:

~/.hermes/
├── SOUL.md           # identity (the substrate)
├── memories/
│   ├── USER.md       # who you are (the agent's model of me)
│   └── MEMORY.md     # what's true in my world right now
├── skills/           # how to do things  (one folder per SKILL.md)
├── sessions/         # SQLite + FTS5: every conversation, searchable
└── cron/             # scheduled jobs that fire in fresh sessions
Enter fullscreen mode Exit fullscreen mode

I'm going to take you through these one at a time and show you which cognitive system each one is doing, why the boundaries are drawn where they are, and what each one prevents from going wrong. We'll end with the part nobody on dev.to has written about yet: how a feature called the Curator (added in v0.12) functions as the agent's equivalent of sleep.


SOUL.md — the substrate (identity)

SOUL.md is the first file loaded into the system prompt at every turn. The Hermes architecture docs call it "slot #1." It contains the agent's personality, communication style, role, and constraints. It's about 1,000–3,000 characters in most setups.

"You are a senior research analyst. You write in short paragraphs. You don't apologize. You never use the word 'leverage.'"

If you stop reading here you'll think this is just a system prompt. It isn't. The trick is what SOUL.md doesn't contain: facts about your projects, your name, anything that changes week-to-week. Those go in different files. SOUL.md is the slow-moving part — the part that makes a Telegram bot reading your code reviews behave consistently different from one that drafts your customer support replies.

You can run multiple Hermes profiles on the same machine, each with its own SOUL.md. A finance-ops agent and a research agent share the same underlying model but read as different colleagues. This is exactly how human personality works: same neural hardware, different self-concepts shaping the output.

What this prevents: tone drift. The slow, frustrating phenomenon where an agent that was crisp on day 1 has become a verbose, hedging, em-dash-spamming mess by day 30 because nothing was holding its character in place.


USER.md and MEMORY.md — semantic memory (facts about your world)

Open ~/.hermes/memories/USER.md after a few weeks of use and you'll find something like this (this is illustrative, not from any specific install):

- Name: Manikant
- Role: ML/AI engineer
- Communication: prefers terse responses, no apology preambles
- Working on: prompt-recommendation pipelines, intent taxonomy
- Avoid: marketing language, "leverage", em-dashes in code blocks
Enter fullscreen mode Exit fullscreen mode

And MEMORY.md:

- Active project: CCx personalization pipeline
- Stack: Databricks, FastAPI on AKS, Redis, OneLLM
- p95 target: sub-1s on warm path
- Recent issue: Kong/Okta 403 — fixed by adding personalization_services scope
Enter fullscreen mode Exit fullscreen mode

Two files. One is who you are (slow-changing). One is what's currently true in your world (medium-changing). Both get injected into the system prompt at session start as a frozen snapshot.

This is semantic memory — declarative facts that the agent can state without re-deriving them. The reason it's split into two files instead of one is the same reason your brain separates "I am a software engineer" from "the build is currently broken." One is identity-level. One is operational. Mixing them is how you end up with an agent that introduces itself in every reply.

The character limits matter. The published numbers I've seen are roughly USER.md ≤ 1,375 chars, MEMORY.md ≤ 2,200 chars. That's tiny on purpose. Memory in Hermes is a pointer index, not a data lake. If something needs more space, it doesn't belong here — it belongs in a skill, or in the episodic store (the session DB). This is the single most important design choice in the system and the one most newcomers fight against.

What this prevents: the context-amnesia tax. You stop re-explaining yourself every Monday. The agent walks into the session already knowing the constraints.

The non-obvious failure mode: stale memory. If MEMORY.md still says "the build is broken" three weeks after you fixed it, the agent will keep working around a phantom. Treat these files like sticky notes on your monitor, not like a database. If your agent starts behaving oddly, this is almost always the first place to look.


SKILL.md — procedural memory (how to do things)

This is the part of Hermes that most reviewers fixate on, but they usually describe what it is without explaining why this particular shape.

A skill is a markdown file with YAML front matter:

---
name: deploy-fastapi-aks
description: Deploy a FastAPI service to Azure Kubernetes Service with our standard manifests
version: 1.2.0
metadata:
  hermes:
    tags: [devops, azure, aks]
---

# Deploy FastAPI to AKS

## When to use
The user wants to ship a FastAPI service to our AKS cluster.

## Procedure
1. Confirm the image tag exists in ACR
2. Run `kubectl apply -f manifests/`
3. Watch the rollout with `kubectl rollout status`
4. Hit /healthz from a pod in the cluster to confirm

## Pitfalls
- If the rollout stalls > 60s, check the readiness probe path
- Token expiry: rotate via `az acr login` before applying
Enter fullscreen mode Exit fullscreen mode

This is procedural memory. It's not facts about your stack; it's a recipe for executing a workflow that survives the agent forgetting your conversation. You can install skills from URLs (hermes skills install https://example.com/SKILL.md), browse a community hub at agentskills.io, or write your own — but the most interesting case is the one where the agent writes them itself.

After Hermes solves a complex task, it can call a tool called skill_manage and propose saving the solution as a new skill. You confirm, it writes the file, and from then on the workflow is one slash command away. The next time you ask for the same kind of thing, the agent doesn't re-derive the solution from first principles — it loads the recipe.

Here's the part that took me three readings of the Hermes architecture page to appreciate. Skills don't cost tokens until they're used. The mechanism is called progressive disclosure:

  1. At session start: skills_list() loads a compact list of names and descriptions only. Roughly 3k tokens for ~100 skills.
  2. When the agent decides one is relevant: skill_view(name) reads the full SKILL.md.
  3. If the skill references a deep file: skill_view(name, "references/api-docs.md") reads that one file.

That's a three-tier lazy-load. The agent never pays the full cost of all its procedural knowledge — only the tip of the iceberg you're using right now. This is the unlock that makes the "more skills makes the agent better, not slower" property actually hold. Without progressive disclosure, every skill you add is a tax on every conversation. With it, the cost is bounded by what you actually invoke.

You don't have to take my word for it. The official skills docs (here) lay out the three-tier load explicitly.

What this prevents: workflow drift. The same task done differently every time because the agent is improvising rather than following a known-good procedure. Skills are the difference between "the agent figured it out again" and "the agent did it the way we always do it."


sessions/ + FTS5 — episodic memory (things that happened)

Open ~/.hermes/sessions/ and you'll find a SQLite database. Inside, FTS5 — SQLite's full-text search — indexes every turn, every tool call, every result the agent has ever produced.

This is episodic memory. Not facts ("I prefer dark mode"), not skills ("how to deploy"), but the actual diary of what happened. When does this matter? When you say something like: "that bug we hit two weeks ago with the Redis TTL — what did we end up doing?" Hermes runs a search across the session history, summarizes the relevant turns with the LLM, and brings the answer back.

This is the layer that most people who shrug off "the agent has memory" never actually use. They put facts in MEMORY.md and call it done. The episodic layer is what makes the agent feel like a colleague who was there with you, not just a search engine that remembers your settings.

The reason this is implemented as SQLite + FTS5 instead of a vector database is worth pausing on. Embeddings are great for "find me something semantically similar." But episodic memory is dominated by literal recall: specific error messages, file paths, dates, function names. FTS5 is faster, cheaper, has no extra service to run, and handles literal queries better than embeddings. The trade-off is that fuzzy queries work less well — but that's what the LLM-on-top-of-search layer compensates for. The agent reads the raw matches and synthesizes.

What this prevents: the "did we already try that?" tax. You stop re-running experiments you've already run. The agent can tell you what was tried, what worked, and what was abandoned without you remembering the original session.


Working memory — the running conversation

The current session — your active conversation, the in-flight tool calls, the scratchpad of what the agent has done in the last 20 turns — is working memory. It's bounded by the model's context window, and Hermes has a dedicated component called context_compressor.py that summarizes older turns once the context crosses about 50% of the available window, preserving recent messages and grouping related tool calls together.

This isn't fancy. But it's the unglamorous plumbing that makes long sessions possible without the agent suddenly forgetting what it was just doing. Working memory in humans has the same property: about 7±2 items, and the brain dumps older items into longer-term stores constantly. Hermes does it explicitly because LLMs don't.


The part nobody writes about: the Curator (v0.12+)

If you've read other Hermes write-ups, you've seen the four-pillar story (memory, skills, soul, crons) and the five-pillar story (add the self-improving loop). Almost nobody is talking about what shipped in version 0.12: the autonomous Curator.

The Curator is a cron job — running on a 7-day cycle by default — whose only job is to grade, consolidate, and prune the agent's own skill library. It walks through every skill in ~/.hermes/skills/, scores it, merges near-duplicates, archives skills that haven't been invoked in N weeks, and rewrites descriptions for clarity. The agent does this on itself, while you sleep.

Why does this matter? Because every learning system has the same problem: growth alone isn't intelligence. If your agent adds a skill every time you do something complex, in six months you have 800 skills, half of them redundant, a third of them stale. The skill index gets noisy. The agent picks the wrong skill for a task because the descriptions overlap. The compounding starts to compound backwards.

The Curator is consolidation. In humans, the analogous process is sleep — specifically slow-wave sleep, when the hippocampus replays the day's episodic memories and the cortex selectively keeps, merges, and discards them. You wake up and the things that didn't matter are gone, the things that did are integrated. Memory researchers have been writing about this since the 1990s. Hermes is, as far as I can tell, the first widely-used open-source agent to ship it as a first-class feature.

When I realized this is what the Curator was doing, the architecture story clicked all the way: Hermes isn't a smart loop on top of an LLM. It's an attempt to give a stateless model the same memory-management apparatus a biological system uses. Identity, semantic, procedural, episodic, working — plus consolidation. That's the whole stack.


A diagram, because diagrams help

Here is the mental model I've been building toward:

                    ┌───────────────────────────────────────────┐
                    │            SOUL.md  (identity)            │
                    │  the substrate that makes the rest useful │
                    └────────────────────┬──────────────────────┘
                                         │
        ┌────────────────────┬───────────┼────────────────────────┐
        │                    │           │                        │
   ┌────▼─────┐    ┌─────────▼────┐  ┌───▼───────┐    ┌──────────▼─────────┐
   │ USER.md  │    │  MEMORY.md   │  │ SKILL.md  │    │  sessions/ (FTS5)  │
   │ + USER MODEL    │              │  │           │    │                    │
   │ semantic  │   │  semantic    │  │ procedural│    │      episodic      │
   │ (who I am)│    │ (my world)   │  │ (how-to)  │    │  (what happened)   │
   └───────────┘    └──────────────┘  └─────┬─────┘    └──────────┬─────────┘
                                            │                     │
                                       progressive            FTS5 + LLM
                                        disclosure              recall
                                            │                     │
                                       ┌────▼─────────────────────▼────┐
                                       │   working memory (the turn)   │
                                       │   + context_compressor.py     │
                                       └────────────────┬──────────────┘
                                                        │
                                                ┌───────▼────────┐
                                                │  the Curator   │
                                                │ (weekly cron)  │
                                                │  = "sleep"     │
                                                └────────────────┘
Enter fullscreen mode Exit fullscreen mode

If you've ever stared at a Tulving diagram of memory systems and then stared at this layout, the resemblance is uncanny.


The bonus layer most people miss: trajectory export

There's one more thing I want to flag, because it's the part that makes Hermes interesting if you're an ML engineer rather than just an end user.

Every conversation Hermes has is, by design, exportable in ShareGPT format — the de-facto schema for instruction-tuning data. The repo includes an Atropos integration for reinforcement-learning environments. The agent's own sessions become training data for the next version of the model that runs the agent.

This closes a flywheel that almost no other open-source agent has closed:

  1. You use Hermes. It logs what worked.
  2. The Curator consolidates the useful skills.
  3. The trajectories get exported.
  4. Someone (Nous Research, your team, you) fine-tunes a model on those trajectories.
  5. The fine-tuned model becomes a better Hermes.

This is also why Hermes is described as "built by model trainers." Nous Research isn't a UX shop; they ship the Hermes, Nomos, and Psyche model families. The agent is also their data-collection apparatus. Once you know this, the design choices stop looking arbitrary. Of course they separated procedural from semantic from episodic — they need clean labels to train on.

I'm not making a claim about what fine-tuning runs they're doing internally. The mechanism is there in the codebase; what they do with it is their business. What I'm pointing out is that the architecture is shaped by the data it generates, which is a property almost no other open-source agent has.


What this means if you're building something

If you've read this far you probably want practical takeaways. Here are five.

1. Don't treat MEMORY.md as a dumping ground. It's a sticky note, not a database. If you put everything in it, the agent will get confused about what's currently true. Aggressively prune.

2. Let the agent write skills, then edit them. Hand-written skills from scratch tend to be too generic. Agent-written skills are concrete because they were extracted from a real workflow you just completed. Your job is editorial — tighten the description, add the pitfall you noticed, ship it.

3. The "when to use" section in a SKILL.md is the most important sentence in the file. That's what the agent searches against when deciding to invoke the skill. Treat it like ad copy: specific, keyword-rich, unambiguous.

4. Multiple SOUL.md profiles beat one mega-agent. A finance agent and a research agent with different SOUL.md files and different credentials are cleaner, safer, and easier to debug than one agent with all the skills and all the API keys.

5. The cron + skill combo is the actual moat. A single sentence — "every night at 1am, pull the latest commits and summarize them on Slack" — produces a skill, a scheduled job, and an output destination, all at once. This is what shifts an agent from reactive to proactive. Most "agent" demos can't do this without 200 lines of glue code.


The honest caveats

I'd rather not write a love letter, so:

  • Stale memory is the #1 reason Hermes "starts acting weird." If your agent is misbehaving, open MEMORY.md before anything else.
  • Skill name collisions are real. A skill in your home directory shadows the same name in an external repo, which means "it works on my machine" is a known failure pattern when a team shares skills.
  • The cost story depends on your model choice. Progressive disclosure keeps the agent lean. The model behind it can still be expensive if you point it at a frontier API.
  • "Self-improving" doesn't mean automatic. The official docs say it themselves: the loop works best when you actively correct mistakes, save things to memory, and update skills. Passive use produces some improvement; active use compounds.
  • The 40% research-task speedup number that floats around is from a third-party benchmark I haven't personally re-run. Treat it as directional.

Closing

I started this post with a claim: that Hermes Agent works because its architecture matches how a brain stores memory. I think the mapping is real, and I think it explains why this particular agent has 164,000+ GitHub stars while a dozen frameworks with louder marketing are stuck at 5k.

Identity that doesn't drift. Facts that update. Procedures that get reused. Episodes that can be searched. Consolidation that happens while you sleep.

None of those features is novel on its own. Vector DBs do semantic-ish search. Prompt frameworks do identity-ish anchoring. RAG does episodic-ish retrieval. What's new is putting them all in one place, with clean boundaries, in plain markdown files you can cat. That's the architecture. And once you see it through this lens, the next time you build an agent, you'll find yourself reaching for the same five buckets — because they're the buckets that work.

If you've made it this far, I'd love to hear which of the five layers you'd want to extend first. The Curator is the one I'm most curious about — there's a whole research direction in "how should an agent forget" that nobody's seriously explored yet, and it's sitting right there in ~/.hermes/cron/.


If you found this useful, the conversation is open — drop your own mental model in the comments. And if you spot something I got wrong about the internals, please correct me. Hermes is moving fast and I want to keep this honest.

Top comments (0)