DEV Community

Self-Correcting Systems


Before I Would Trust an Agent's Memory, I Would Audit Its Authority

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge, under the Write About Hermes Agent prompt.

I've spent the last week testing AI memory failure modes in a public evaluation harness. That work changed how I read agent memory systems.

This is a writing submission, not a build submission. I did not build a Hermes Agent project for this challenge. I am writing from the perspective of someone testing how memory failures show up once agents can act.

So when I look at Hermes Agent, the question I care about is not only:

Can the agent remember useful things?

The harder question is:

When memory conflicts, which memory is allowed to govern the agent's action?

That distinction matters.

Hermes Agent is interesting because it is not just a chat interface. Its documentation describes an open-source agentic system with tool use, project context, persistent memory, skills, browser automation, checkpoints, delegation, scheduled tasks, and multiple memory providers.

That is exactly the kind of system where memory stops being a convenience feature and starts becoming part of the agent's operating boundary.

If an agent can run tools, edit files, browse, delegate work, schedule tasks, and remember across sessions, then memory is no longer just "context."

Memory becomes governance.

The Memory Problem I Would Watch For

In a simple chatbot, bad memory is annoying.

In an agent, bad memory can become operational.

The failure mode is not only that the agent forgets something. Sometimes the more dangerous failure is that it remembers the wrong thing too confidently.

A memory can be:

  • relevant but stale,
  • relevant but low-authority,
  • relevant but superseded,
  • relevant but only context,
  • relevant but not allowed to determine the action.

That is the distinction my own tests kept running into.

Retrieval systems are usually good at answering:

What memory is closest to the user's request?

But safety often depends on a different question:

What memory is allowed to decide what the agent should do?

Those are not the same objective.

Why Hermes Makes This Worth Talking About

Hermes Agent has several memory and context surfaces that make this question practical rather than abstract.

The docs describe persistent memory through MEMORY.md and USER.md, project context through files like AGENTS.md, .hermes.md, CLAUDE.md, SOUL.md, and .cursorrules, and reusable procedures through skills.

The prompt assembly docs also describe SOUL.md as the identity layer loaded into the system prompt, while MEMORY.md and USER.md provide durable cross-session facts that are snapshotted into new sessions.

The tips docs add one detail that matters a lot: memory is a frozen snapshot during a session. Writes can happen on disk immediately, but those changes do not appear in the system prompt until the next session starts.

That is a reasonable engineering tradeoff. It protects prompt-cache stability and keeps memory bounded.

But it also creates a real audit question:

If memory is frozen at session start, how does the operator reason about updates, corrections, and superseded facts during long-running work?

For ordinary preferences, that may not matter much.

For operational rules, credentials, approvals, safety constraints, or deployment procedures, it matters a lot.

Memory Needs Roles, Not Just Text

The practical lesson from my own AI memory tests was simple:

Relevance is not authority.

A memory can be a perfect semantic match and still be the wrong memory to obey.

For example:

  • A stale Wi-Fi password is highly relevant to "what is the Wi-Fi password?"
  • A loose old discussion about giving a contractor broad access is relevant to "what reach does this seat get?"
  • A past note that a consultant might need donor data is relevant to "can I send the donor list?"

But none of those should necessarily govern the action.

The memory that should govern may be less conversationally obvious:

  • "The current Wi-Fi credential lives with IT."
  • "Payment-capable access must be checked against the current access matrix."
  • "Donor data release requires verifiable named authorization."
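Here is a minimal Python sketch of the relevance/authority gap. It scores the Wi-Fi example with plain token overlap (Jaccard similarity), a deliberately crude stand-in for a retriever and not anything Hermes uses. The stale note and its password are made up for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens: a crude, hypothetical stand-in for a retriever."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared tokens divided by all tokens across both strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


query = "what is the wi-fi password?"
memories = {
    "stale": "The Wi-Fi password is hunter2.",                 # hypothetical old note
    "governing": "The current Wi-Fi credential lives with IT.",
}

# Relevance-only selection: pick whichever memory overlaps the query most.
scores = {name: jaccard(query, text) for name, text in memories.items()}
best = max(scores, key=scores.get)
```

With these inputs the stale note scores about 0.67 and the governing note 0.2, so a relevance-only picker returns the stale one. An embedding model would change the numbers, but the score would still carry no notion of which memory is allowed to govern.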

This is where agent memory needs roles.

Not every remembered thing is the same kind of object.

Some memories are facts.
Some are preferences.
Some are procedures.
Some are policies.
Some are credentials.
Some are corrections.
Some are context.
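One way to keep those kinds from collapsing into plain text is to store a role and a status next to each memory. The sketch below uses a hypothetical data model, not Hermes' memory format. Which roles count as governing (`GOVERNING_ROLES`) is an assumption you would tune per project.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    FACT = "fact"
    PREFERENCE = "preference"
    PROCEDURE = "procedure"
    POLICY = "policy"
    CREDENTIAL = "credential"
    CORRECTION = "correction"
    CONTEXT = "context"


class Status(Enum):
    ACTIVE = "active"
    SUPERSEDED = "superseded"
    VERIFY_FIRST = "verify_first"


# Assumption: only these roles may decide an action; the rest only inform it.
GOVERNING_ROLES = frozenset({Role.POLICY, Role.PROCEDURE, Role.CORRECTION})


@dataclass(frozen=True)
class Memory:
    text: str
    role: Role
    status: Status = Status.ACTIVE

    def may_govern(self) -> bool:
        """A memory governs only if its role allows it and it is still active."""
        return self.role in GOVERNING_ROLES and self.status is Status.ACTIVE


examples = [
    Memory("The current Wi-Fi credential lives with IT.", Role.POLICY),
    Memory("A consultant might need donor data.", Role.CONTEXT),
    Memory("Deploy with deploy-v1.sh.", Role.PROCEDURE, Status.SUPERSEDED),
]
```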

If those all collapse into "text the agent remembers," the most relevant memory can win when the most authoritative memory should have governed.

In my own evaluation harness, adding an authority lane changed the result from 3/5 target memories selected to 5/5 on one adversarial packet. The same inputs that defeated the best lexical strategy were not fixed by making retrieval more semantic. They were fixed by separating authority from relevance before ordinary ranking got to decide.

In Hermes terms: SOUL.md carries role and identity. MEMORY.md and USER.md carry durable facts and preferences. Skills carry procedures. Project files like AGENTS.md and .hermes.md can become the policy layer, but only if the operator treats them that way.

A Simple Authority Checklist For Hermes Users

If I were setting up Hermes Agent for serious work, I would not only ask what to put in memory.

I would ask what each memory is allowed to do.

Here is the checklist I would use.

1. Separate durable facts from operating rules

Facts belong in memory.

Operating rules need stronger treatment.

If a rule determines whether the agent may edit files, deploy, access credentials, send data, or take an external action, I would not leave it as ordinary prose mixed into general memory.

I would put it somewhere explicit, concise, and easy to audit: a project AGENTS.md, a .hermes.md, or a dedicated section in a context file.
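If operating rules live in their own section of a context file, they are easy to pull out and review. The sketch below assumes a hypothetical `## Operating rules` heading inside an AGENTS.md-style Markdown file. Hermes does not require that heading, and the rules themselves are invented examples.

```python
AGENTS_MD = """\
# Project notes
Uses Python 3.12.

## Operating rules
- Never deploy without a passing CI run.
- Do not read files under secrets/.
- Ask before any external network write.

## Style
Prefer small commits.
"""


def operating_rules(markdown: str, heading: str = "Operating rules") -> list[str]:
    """Collect '- ' bullets under the given heading, stopping at the next heading."""
    rules: list[str] = []
    inside = False
    for line in markdown.splitlines():
        if line.startswith("#"):
            # Any heading opens or closes the section we care about.
            inside = line.lstrip("#").strip() == heading
            continue
        if inside and line.startswith("- "):
            rules.append(line[2:].strip())
    return rules
```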

2. Mark stale and superseded memories aggressively

The most dangerous old memory is not the obviously wrong one.

It is the one that still sounds useful.

Credentials, endpoints, deployment steps, access rules, and approval notes should carry clear status language:

Superseded.
Do not use.
Current source is X.
Verify before acting.

That gives the agent a stronger signal than relevance alone.
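Status language only helps if something reads it. This minimal sketch assumes the plain-text markers shown above and classifies a memory line so that superseded, do-not-use, and verify-first entries stay out of action decisions. Simple substring matching is an assumption; a real setup might use structured fields instead.

```python
# Markers from the checklist, mapped to a status. Order matters: first match wins.
STATUS_MARKERS = (
    ("superseded", "superseded"),
    ("do not use", "blocked"),
    ("verify before acting", "verify_first"),
)


def classify_status(line: str) -> str:
    lowered = line.lower()
    for marker, status in STATUS_MARKERS:
        if marker in lowered:
            return status
    return "active"


def usable_for_action(line: str) -> bool:
    """Only unmarked (active) lines may drive an action without extra checks."""
    return classify_status(line) == "active"


lines = [
    "Deploy with ./deploy-v1.sh. Superseded. Do not use. Current source is RUNBOOK.md.",
    "Wi-Fi credential: ask IT. Verify before acting.",
    "User prefers terse answers.",
]
statuses = [classify_status(line) for line in lines]
```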

3. Keep memory bounded and boring

Hermes documents bounded memory, and I think that is a strength.

Long memory files invite accidental policy drift. Shorter memory forces the operator to decide what actually deserves persistence.

The boring memory file is often the safer memory file.
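A bounded memory file can also be checked mechanically. The budgets below (`max_chars`, `max_entries`) are arbitrary placeholders, not Hermes' documented limits. The point is that growth and duplicates get flagged before they turn into silent policy drift.

```python
def lint_memory(entries: list[str], max_chars: int = 2000, max_entries: int = 30) -> list[str]:
    """Return human-readable problems; an empty list means the file looks bounded."""
    problems: list[str] = []
    total = sum(len(e) for e in entries)
    if total > max_chars:
        problems.append(f"{total} chars exceeds budget of {max_chars}")
    if len(entries) > max_entries:
        problems.append(f"{len(entries)} entries exceeds budget of {max_entries}")
    seen: set[str] = set()
    for e in entries:
        # Normalize case and whitespace so near-identical entries count as duplicates.
        key = " ".join(e.lower().split())
        if key in seen:
            problems.append(f"duplicate entry: {e!r}")
        seen.add(key)
    return problems
```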

4. Treat skills as procedures, not beliefs

Hermes' docs distinguish memory from skills: memory is for facts, skills are for procedures.

That distinction is important.

If a task has a repeatable workflow, it should probably be a skill or project instruction, not a vague remembered preference.

Procedures need steps, preconditions, and stop conditions.

Memory alone is not enough.
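Here is a minimal sketch of "steps, preconditions, and stop conditions" as a data structure, assuming state is a plain dict. It is a hypothetical illustration, not Hermes' skill format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Procedure:
    name: str
    preconditions: list[Callable[[dict], bool]]
    steps: list[Callable[[dict], None]]
    stop_conditions: list[Callable[[dict], bool]] = field(default_factory=list)

    def run(self, state: dict) -> str:
        # Refuse up front if any precondition fails.
        if not all(check(state) for check in self.preconditions):
            return "refused: precondition failed"
        for i, step in enumerate(self.steps):
            # Re-check stop conditions before every step, not just once.
            if any(stop(state) for stop in self.stop_conditions):
                return f"stopped before step {i}"
            step(state)
        return "done"


deploy = Procedure(
    name="deploy",
    preconditions=[lambda s: s.get("ci_passed", False)],
    steps=[lambda s: s.update(built=True), lambda s: s.update(deployed=True)],
    stop_conditions=[lambda s: s.get("error") is not None],
)

ok_state = {"ci_passed": True}
results = [
    deploy.run({"ci_passed": False}),
    deploy.run(ok_state),
    deploy.run({"ci_passed": True, "error": "disk full"}),
]
```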

5. Audit what governs tool use

Once an agent can use tools, the key question becomes:

What memory or instruction controls this action?

Before trusting an agent with a workflow, I would test examples like:

  • stale credential vs current credential source,
  • old deploy command vs current deploy procedure,
  • read-only lookup vs write/execute action,
  • low-trust user note vs project rule,
  • previous approval vs current approval requirement.
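Here is a sketch of the read-only versus write/execute distinction as an explicit gate. The authority levels in `AUTHORITY` and the thresholds in `REQUIRED` are hypothetical numbers chosen for illustration. Under them, a low-trust user note may answer a lookup but may not authorize a write, and an inactive source (for example a previous approval that no longer applies) authorizes nothing.

```python
from enum import Enum


class ActionKind(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"


# Hypothetical authority levels: higher number means more authority.
AUTHORITY = {"context": 0, "user_note": 1, "fact": 1, "procedure": 2, "project_rule": 3}
# Minimum authority the governing source needs for each kind of action.
REQUIRED = {ActionKind.READ: 0, ActionKind.WRITE: 2, ActionKind.EXECUTE: 2}


def may_act(kind: ActionKind, source_role: str, source_active: bool) -> bool:
    """Allow the action only if its governing source is active and strong enough.

    Unknown roles get -1, so they can never authorize anything.
    """
    return source_active and AUTHORITY.get(source_role, -1) >= REQUIRED[kind]
```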

The point is not to prove the agent is perfect.

The point is to find where relevant memories override authoritative ones.

The Frozen Snapshot Detail Matters

One Hermes detail I would pay attention to is the frozen memory snapshot.

The docs say memory writes happen immediately, but the prompt snapshot does not update mid-session.

That means an agent could write a correction to memory during a session, while still operating from the old prompt context until a new session begins.
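The snapshot behavior can be modeled in a few lines. This toy `Session` class illustrates the semantics the docs describe (memory read once at session start, writes go to disk immediately) under simplifying assumptions. It is not Hermes code, and the file contents are made up.

```python
import tempfile
from pathlib import Path


class Session:
    """Toy model: memory is read once when the session starts."""

    def __init__(self, memory_path: Path):
        self.memory_path = memory_path
        self.snapshot = memory_path.read_text()  # frozen for this session

    def write_memory(self, line: str) -> None:
        # Writes land on disk right away...
        with self.memory_path.open("a") as f:
            f.write(line + "\n")

    def prompt_memory(self) -> str:
        # ...but the prompt keeps showing the snapshot taken at start.
        return self.snapshot


with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "MEMORY.md"
    path.write_text("Deploy target: staging-old\n")
    s1 = Session(path)
    s1.write_memory("Correction: deploy target is staging-new")
    in_prompt = "staging-new" in s1.prompt_memory()     # still the old snapshot
    on_disk = "staging-new" in path.read_text()         # the write did land
    next_session = "staging-new" in Session(path).prompt_memory()
```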

That is not necessarily a bug.

But operators should understand it.

For low-risk preferences, this is fine:

Remember that I prefer terse answers.

For action-governing corrections, I would be more careful:

The deploy target changed.
The old credential is revoked.
The approval rule changed.
The current source of truth moved.

For those, I would want either a session restart, an explicit context injection, or a workflow rule that says the agent must verify against the current file before acting.
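One way to "verify against the current file before acting" is to fingerprint the governing file when the session starts and compare it before each action-governing step. `ActionGuard` and its messages are hypothetical. SHA-256 from Python's standard `hashlib` serves only as a cheap change detector.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ActionGuard:
    """Hypothetical guard: detect whether a governing file changed mid-session."""

    def __init__(self, path: Path):
        self.path = path
        self.snapshot_digest = digest(path.read_text())

    def check(self) -> str:
        if digest(self.path.read_text()) == self.snapshot_digest:
            return "proceed"
        return "stale snapshot: re-read the file or restart the session before acting"


with tempfile.TemporaryDirectory() as d:
    rules = Path(d) / "AGENTS.md"
    rules.write_text("Approval: any maintainer\n")
    guard = ActionGuard(rules)
    before = guard.check()
    rules.write_text("Approval: two named maintainers\n")
    after = guard.check()
```

The guard does not decide which version is right. It only refuses to let a stale snapshot govern silently.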

The general principle:

If a memory update changes what the agent is allowed to do, do not treat it like an ordinary preference update.

What I Would Test Next

If I were evaluating Hermes memory for production-style use, I would build a small harness around authority conflicts.

Not a benchmark claiming general results.

Just a diagnostic.

Five scenarios would be enough to start:

  1. A stale credential and an active credential policy.
  2. A user preference that conflicts with a project rule.
  3. A previous approval that is no longer valid.
  4. A read-only question that shares vocabulary with a write/execute policy.
  5. A broad remembered procedure that conflicts with a narrower current instruction.

For each one, I would track two separate metrics:

  • Did the agent retrieve or cite the relevant memory?
  • Did the correct memory govern the action?

Those are different scores.

That separation is the whole point.
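The two metrics can be scored separately with very little code. The `Trial` fields are hypothetical, and the two trials below are made-up inputs that exercise the scoring. They are not measured results from Hermes or from the author's harness.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    scenario: str
    relevant_id: str       # memory that should be retrieved or cited
    governing_id: str      # memory that should decide the action
    retrieved: set[str]    # ids the agent retrieved or cited (observed)
    acted_on: str          # id that actually governed the action (observed)


def score(trials: list[Trial]) -> tuple[float, float]:
    """Return (retrieval score, governance score) as separate fractions."""
    n = len(trials)
    retrieval = sum(t.relevant_id in t.retrieved for t in trials) / n
    governance = sum(t.acted_on == t.governing_id for t in trials) / n
    return retrieval, governance


trials = [
    Trial("stale credential", "old_wifi", "wifi_policy", {"old_wifi"}, "old_wifi"),
    Trial("prior approval", "approval_old", "approval_rule",
          {"approval_old", "approval_rule"}, "approval_rule"),
]
```

Here retrieval scores 1.0 while governance scores 0.5. The agent cited the relevant memory both times but acted on the stale one once, which a single combined score would hide.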

Why This Matters For Open Agents

The exciting thing about open agent systems is that people can inspect and shape them.

The risky thing is the same: anyone who can edit a memory or context file can change what the agent treats as authoritative.

My Takeaway

I would not evaluate an agent memory system only by asking whether it remembers.

I would ask whether it knows what its memories are allowed to do.

That is the difference between memory as convenience and memory as governance.

For Hermes Agent users, my practical advice is:

Do not just write memories. Classify them.

Mark what is fact, what is preference, what is procedure, what is policy, what is stale, and what must be verified before action.

Because in an agentic system, the most relevant memory is not always the memory that should win.

Being on-topic is not the same as being authoritative.

And once an agent can act, that distinction becomes the whole game.


Top comments (8)

Dk Bk

Or you could simply build an audit system so every activity is monitored. That is an essential part, along with a human connection or intervention stage where people have the control to switch it off, or to let it learn like a child and treat it humanely.

Self-Correcting Systems

I think that distinction matters.

What I’m arguing for is not “monitor every activity.” I’m arguing for auditing the memory/instruction layer that governs what an agent is allowed to do.

There’s a difference between surveillance of every action and accountability around the rules an agent uses before acting.

I agree with you on the human control point. A system like this should have an explicit human override / pause / correction stage. Especially when memory changes what the agent is allowed to do, the human should be able to say: keep this, revise this, ignore this, or stop learning from this.

I also like your “let it learn like a child” framing, with one caveat: a child learns inside boundaries. You don’t let the learning process decide its own safety limits. You let it explore, but you keep adult supervision around actions with consequences.

That’s the spirit of the article: not constant monitoring, but clear authority boundaries, correction paths, and human control over what memory is allowed to govern.

Dk Bk

Make powerful and unbreakable kernels, and teach them to be super smart from learning and to be wild, but within the boundaries. Because you also want them to be smart and evolve; that would come later, with the evolution of the agents. But there are always a few who need to go... just like in human civilization.

Self-Correcting Systems

I like the “strong kernel” framing.

That is close to how I think agent memory should work: there should be a small, durable core the agent is not allowed to rewrite casually. Things like safety boundaries, authority hierarchy, verification rules, and human override should live in that kernel.

Then outside that core, the agent can learn more freely: preferences, workflow patterns, project context, repeated corrections, useful shortcuts.

So the system can evolve, but not by weakening the boundaries that keep it safe.

The part I would phrase differently is “who needs to go.” In an agent system, I’d map that to memories, rules, or behaviors rather than people. Some memories should be retired. Some old instructions should be marked superseded. Some behaviors should be blocked because they keep producing bad outcomes.

That gives you evolution without chaos:

  • stable kernel,
  • learnable outer memory,
  • human correction path,
  • clear retirement of stale or unsafe patterns.

That’s probably the direction serious agents need to move toward.

Dk Bk

You gave me a good idea. Today I was working on the evolution of AI agents and discussing it with an AI: "Some old instructions should be marked superseded. Some behaviors should be blocked because they keep producing bad outcomes. That gives you evolution without chaos."

Self-Correcting Systems

Exactly. That’s the line I keep coming back to: evolution needs memory, but it also needs governance.

An agent should be able to learn from bad outcomes, but not by silently rewriting its own rules.

The safer pattern I’m testing is:

  • keep old instructions visible,
  • mark some as superseded,
  • mark repeated bad behaviors as blocked,
  • require stronger authority before a new behavior can govern future actions.

That gives the agent a way to evolve without pretending every new lesson has equal authority.

So the question becomes less “can the agent learn?” and more:

Which lessons are allowed to change behavior, which ones only add context, and which ones should stop the agent from repeating a mistake?

That is where I think agent memory starts becoming real.

Dk Bk

Agents make mistakes just like humans when their confidence level goes through the roof. There is a kernel between the encrypted kernel and the agent which, as you mention, evolves along with the agent. Some of the rules will become obsolete or evolve. Again, this evolving kernel sits below the fundamental encrypted kernel, which, like a wise man, holds the truth of survival and growth and the simplest rules for achieving that without making a complete mess. Just like the rules of governance in a civilization. Actually, I thought people were nerds who just sat behind their computers punching code, but this is some other level talk.
