Uuhi Reddy

Posted on May 24

30 Days With Hermes Agent: The Only AI That Learned From My Mistakes

#hermesagentchallenge #devchallenge #agents #ai

Hermes Agent Challenge Submission: Write About Hermes Agent

Why Your AI Agent Forgets Everything — And What I Found After 30 Days With Hermes

Picture this: You spend two hours getting your AI assistant up to speed on your project. The database schema, the API conventions, the weird edge cases in your auth flow. It clicks. You ship features together.

The next morning, new session.

"Hey, can you help me add a new endpoint following our API conventions?"

"Sure! What conventions would you like me to follow?"

Gone. All of it.

If you've used GitHub Copilot, Claude, or ChatGPT for coding, you know this. It's not a bug — it's an architectural choice most AI tools make: no persistent memory between sessions.

After living with this frustration long enough, I went looking for something different. I found Hermes Agent, an open-source agent built by Nous Research, and spent 30 days putting it through real work. Here's what I actually found — the good, the limitations, and the things that surprised me.

The Problems I Was Actually Trying to Solve

Before getting into Hermes, let me be clear about the problems I was hitting day-to-day. These aren't invented pain points — they're friction I ran into constantly.

Repetition tax. Every new session meant re-explaining context. Preferences, project structure, conventions I'd already corrected — all of it had to be restated. Across a busy week, this added up to a surprising amount of wasted time.

Token bloat. Long conversations got expensive fast. As chat history grew, every new message meant sending the entire conversation back to the model. The longer I worked, the higher the per-message cost.

The one-bad-prompt problem. One confusing or ambiguous prompt could send a session sideways. Once the agent latched onto a wrong assumption, recovering mid-conversation was painful. Starting fresh felt safer than correcting course.

Workspace lock-in. IDE-based assistants only see what's open right now. "How did we handle auth in the last project?" — not possible. Each new codebase meant starting from zero, even for patterns I'd solved before.

Siloed knowledge. When I'd spent time "training" an agent on a codebase — correcting it, teaching it conventions — that knowledge lived only on my machine. Sharing it with a teammate meant re-explaining everything from scratch.

None of these are exotic problems. If you use AI assistants regularly, you've probably hit several of them.

What Hermes Agent Actually Is

Hermes Agent is an open-source autonomous agent built by Nous Research, released in February 2026. It's not an IDE plugin or a chatbot wrapper. It runs on your own infrastructure — a VPS, a Docker container, or serverless backends — and you talk to it from wherever you are: CLI, Telegram, Discord, Slack, and 20+ other platforms.

The core bet Hermes makes is that an agent's value should compound over time through what it learns, not just what it can do on day one. That sounds like marketing copy, and I was skeptical too. Here's how it actually plays out.

The Architecture That Makes It Different

Persistent Memory

Hermes maintains two curated files across all sessions:

MEMORY.md — project knowledge, conventions, important decisions, recurring patterns
USER.md — your preferences, working style, environment details, things it's learned about how you operate

Both are stored in ~/.hermes/memories/ and injected into the system prompt as a snapshot at the start of each session. The agent manages these itself — it can add, update, or remove entries via a memory tool. When memory fills up, it consolidates rather than just appending indefinitely.

One nuance worth understanding: memory changes during a session are written to disk immediately but don't appear in the system prompt until the next session starts. This is intentional — it preserves the prefix cache for performance. You'll see the effect next time you open a conversation, not mid-session.
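To make the snapshot timing concrete, here's a minimal Python sketch of the behavior described above. It is not Hermes's implementation: `MemoryStore`, `Session`, and the example entries are hypothetical names, and the "disk" is just an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Stands in for the on-disk memory file (hypothetical, in-memory here)."""
    entries: list[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        # A write lands in the store immediately.
        self.entries.append(entry)


class Session:
    """Freezes a copy of memory into the system prompt when the session starts."""

    def __init__(self, store: MemoryStore) -> None:
        self.store = store
        # Snapshot taken once; later writes do not change this string.
        self.system_prompt = "\n".join(store.entries)

    def remember(self, entry: str) -> None:
        self.store.add(entry)


store = MemoryStore(["API routes use kebab-case"])
first = Session(store)
first.remember("Prefer TypeScript over JavaScript")

second = Session(store)  # the next session picks up the new entry
```

Because `first.system_prompt` never changes after construction, the prompt prefix stays identical for the whole session, which is the property a prefix cache depends on.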

What this solves in practice: after a few days with a project, I stopped having to re-explain our API conventions. After a week, it was proactively noting when something I was doing diverged from patterns we'd established. That shift — from tool to context-aware collaborator — is real, but it takes time to accumulate.

The prompt caching piece matters too. Built-in cross-session prefix caching (for Claude on Anthropic, OpenRouter, and Nous Portal) means memory file costs stay low even as they grow. You're not paying to re-read everything — you're loading cached, curated snapshots.

Skills System

After complex tasks (typically those involving 5+ tool calls), Hermes can automatically extract the workflow as a reusable skill — a structured markdown document following the agentskills.io open standard. Skills include step-by-step procedures, tool usage patterns, and error handling strategies.

The part that initially sounded gimmicky but turned out to be genuinely useful: skills self-improve during use. When Hermes uses a skill and encounters a better approach or an edge case the original didn't cover, it patches the skill in place. Next run starts with the improved version.

My honest experience: the auto-generated skills are inconsistent in quality. Some are excellent distillations of a workflow. Others are too generic to be useful. I've learned to review and occasionally manually clean them up, especially early on. Hermes has an autonomous Curator that consolidates overlap and archives stale skills, but you still want to check in periodically.
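Since skills are markdown documents, reviewing a patch can be as simple as diffing two versions. Here's a rough sketch using Python's standard `difflib`. The `skill_diff` helper and the sample skill text are my own illustration, not part of Hermes, and it assumes you keep a copy of the previous version (for example, in git).

```python
import difflib


def skill_diff(before: str, after: str, name: str = "skill.md") -> str:
    """Return a unified diff between two versions of a skill document."""
    lines = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{name} (before)",
        tofile=f"{name} (after)",
        lineterm="",
    )
    return "\n".join(lines)


# Made-up skill contents, only to show the shape of the output.
old = "1. Run the test suite\n2. Deploy to staging"
new = "1. Run the test suite\n2. Check migrations\n3. Deploy to staging"
print(skill_diff(old, new))
```

An empty result means the skill did not change, so only non-empty diffs need a human look.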

SOUL.md — Consistent Personality Across Sessions

Most agents reset their communication style with each session. Hermes uses a SOUL.md file at the top of the system prompt to define how it communicates, its defaults, and its domain focus. You can swap personalities per session using /personality presets, but the base SOUL.md ensures consistent behavior across all interactions. It's a small thing that matters more than I expected — particularly when you're not in the mood to re-calibrate the agent's tone every morning.

Seven Terminal Backends

Hermes runs on seven execution backends: local, Docker, SSH, Daytona, Singularity, Modal, and Vercel Sandbox. Daytona and Modal offer serverless persistence — the environment hibernates when idle and wakes on demand, so a 24/7 agent doesn't require a 24/7 bill.

For most developers, local or Docker is fine to start. The serverless backends become interesting when you want the agent running scheduled tasks overnight without keeping a machine on.

Scheduled Automations (Cron)

Hermes includes a built-in cron scheduler. You describe a job in natural language:

/cron add "Every Monday at 9am, summarize my GitHub notifications and send to Telegram"

Or use standard cron expressions for precision. Jobs can run with full tool access and deliver results to any connected messaging platform.
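For reference, "every Monday at 9am" in standard five-field cron syntax is `0 9 * * 1`. The Python sketch below checks a datetime against such an expression. It is a deliberately simplified matcher (only `*` and single integers) written for illustration, and says nothing about how Hermes parses schedules internally.

```python
from datetime import datetime


def cron_matches(expr: str, when: datetime) -> bool:
    """Check a five-field cron expression against a datetime.

    Supports only '*' and single integers, which is enough for this example.
    """
    minute, hour, day, month, weekday = expr.split()
    # Cron counts Sunday as 0; Python's weekday() counts Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    actual = [when.minute, when.hour, when.day, when.month, cron_weekday]
    fields = [minute, hour, day, month, weekday]
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


MONDAY_9AM = "0 9 * * 1"  # minute hour day-of-month month day-of-week
```

Real cron implementations also handle ranges, lists, and step values, so treat this as a reading aid for the field order rather than a scheduler.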

This is where the "agent that works while you sleep" promise becomes tangible. Once you have a few scheduled tasks running, Hermes starts feeling less like a chat interface and more like an autonomous system.

Event Hooks

Beyond cron, Hermes supports event hooks that fire on specific triggers — agentStop, promptSubmit, fileEdited, and others. I used this to maintain an automatic project history log:

{
  "name": "Project History Logger",
  "when": { "type": "agentStop" },
  "then": { "type": "runCommand", "command": "append-to-history.sh" }
}

Every conversation gets archived automatically. Weeks later, searching for why a particular architectural decision was made takes seconds. This was unexpectedly valuable for projects spanning multiple sprints.
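The hook above only names the script, so here's a sketch in Python of the kind of work a history logger could do: append one timestamped entry per finished conversation. The function name, log location, and entry format are all assumptions for illustration, and the summaries are placeholder strings.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_history(log_path: Path, summary: str) -> None:
    """Append one timestamped entry to a plain-text project history log."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"## {stamp}\n{summary.strip()}\n\n")


# Demo against a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "history" / "project.md"
    append_history(log_file, "first session summary")
    append_history(log_file, "second session summary")
    history = log_file.read_text(encoding="utf-8")
```

Opening the file in append mode keeps earlier entries intact, which is what makes the log useful weeks later.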

What I Actually Built and Tested

Long-Running Research (3 weeks)

I used Hermes as the primary assistant for researching distributed consensus algorithms — reading papers, synthesizing findings, tracking open questions. With any other tool, each session would have started cold.

By week two, the experience was noticeably different. The agent had learned my research preferences, remembered which papers I'd flagged as important, and — without prompting — started connecting things I'd read in week one to new findings. The cross-session recall using FTS5 full-text search worked reliably for queries like "what did we cover on Raft last Tuesday?"

This isn't magic. It's curated memory plus search. But it meaningfully reduces the overhead of long research projects.
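If you haven't used FTS5 before, here's a small Python sketch of the idea using the standard `sqlite3` module. The table name, columns, and sample notes are invented for the example and are not Hermes's actual schema. It also assumes your SQLite build ships with FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'day' is stored but not indexed; only 'content' is searchable.
conn.execute("CREATE VIRTUAL TABLE sessions USING fts5(day UNINDEXED, content)")
conn.executemany(
    "INSERT INTO sessions (day, content) VALUES (?, ?)",
    [
        ("monday", "Compared Paxos and Raft leader election."),
        ("tuesday", "Raft log replication and commit index rules."),
        ("wednesday", "Notes on vector clocks."),
    ],
)


def search(query: str) -> list[str]:
    """Return the days whose notes match an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT day FROM sessions WHERE sessions MATCH ? ORDER BY rank",
        (query,),
    )
    return [day for (day,) in rows]
```

Multiple terms in an FTS5 query are combined with an implicit AND, so `search("raft replication")` narrows the results to notes containing both words.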

Automated Log Monitoring

I set up a cron job to check production logs hourly and alert me on Telegram for critical errors. After correcting the agent a few times on what "critical" actually meant in our context (not every warning, not retries), it wrote that definition to memory and stopped over-alerting.

The feedback loop — correct once, remember always — is exactly what makes this kind of automation practical. With stateless tools, you're correcting forever.
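Written down as code, the corrected definition might look something like the Python sketch below. The `LogEvent` fields and the level names are assumptions, and treating every warning as non-critical is a simplification of "not every warning".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEvent:
    level: str            # e.g. "WARNING", "ERROR"
    message: str
    is_retry: bool = False


def is_critical(event: LogEvent) -> bool:
    """Apply the corrected definition: errors only, and never retries."""
    if event.is_retry:
        return False
    return event.level in {"ERROR", "FATAL"}
```

The point is less the rule itself than where it lives: once it is stored in memory, it applies to every future run instead of being restated each session.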

Cross-Platform Continuity

I started conversations in the CLI during coding sessions, continued them on Telegram during commutes, and picked them back up in Discord. Same memory, same skills, zero re-orientation needed.

This sounds simple. In practice it's one of the more useful things about Hermes, especially for tasks that span multiple days and contexts.

Cross-Project Memory

This is something IDE-bound assistants genuinely cannot match. Working on a new codebase, I asked: "What was the error handling pattern we used in the auth service last month?" Hermes searched its session history, found the relevant conversation, and applied the pattern to the new context.

For developers who rotate between projects or frequently start new codebases, this is the killer feature. You're not manually documenting patterns or trying to remember past solutions — the agent has them.

Sharing a Trained Agent With a Teammate

After several weeks building up context — teaching Hermes our team's conventions, error handling patterns, deployment workflows — a new developer joined. With any other tool, I'd have to walk them through it all manually, or they'd spend days re-teaching their own agent.

With Hermes, I packaged the trained agent using Profile Distributions — a git-based format that bundles personality, skills, cron jobs, MCP connections, and config into a single installable profile:

hermes profile install github.com/our-org/team-agent --alias

The new developer had our full skills library, SOUL.md, and cron setup immediately. Their own memories, API keys, and session history remained private — only the agent's accumulated knowledge transferred.
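One way to picture that boundary is as an allowlist: only named categories are bundled, and everything else stays local by default. This Python sketch is purely illustrative. The category keys and `split_profile` are hypothetical and do not reflect the real profile format.

```python
# Categories the article describes as bundled into a shared profile.
SHARED = {"personality", "skills", "cron_jobs", "mcp_connections", "config"}


def split_profile(items: dict[str, object]) -> tuple[dict, dict]:
    """Separate what would be bundled from what stays on the local machine."""
    bundled = {k: v for k, v in items.items() if k in SHARED}
    # Anything not explicitly shared is treated as private.
    local = {k: v for k, v in items.items() if k not in SHARED}
    return bundled, local


bundled, local = split_profile(
    {"skills": ["deploy"], "api_keys": ["placeholder"], "memories": ["note"]}
)
```

Defaulting unknown categories to private means a new kind of data never leaks into a shared profile by accident.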

When I updated the agent later, they ran:

hermes profile update team-agent

One command. No re-explaining.

This felt genuinely new to me. IDE agents are personal tools, and the knowledge stays on your machine. Hermes is the first framework I've used where team knowledge is actually transferable.

The Honest Assessment

What works well

Memory persistence is reliable. USER.md and MEMORY.md genuinely carry forward, and the agent's decisions about what to remember improve over time.
FTS5 session search works. Finding past conversations by topic or keyword is fast and accurate.
Profile Distributions work exactly as advertised.
Scheduled tasks run reliably across sessions.
Model flexibility is real — switching between Nous Portal, OpenRouter, Anthropic, or a local endpoint is one command.

What requires active supervision

Skill quality is uneven early on. Auto-generated skills need periodic review, especially in the first week. The Curator helps, but it's not set-and-forget.
Memory curation is the agent's job, but you should verify it. Check USER.md and MEMORY.md occasionally to make sure what's been stored is accurate. I found a few outdated preferences that had stuck around longer than they should have.
Skill self-improvement has no automatic verification. When a skill gets patched during use, the update isn't validated — it's just applied. Review changes, especially for critical workflows.
Setup takes real effort. Full value requires self-hosting. For non-technical users, the barrier is high.

The payoff curve

Hermes is slower to show value than tools like Cursor or Claude Code. Those deliver productivity immediately. Hermes delivers compounding productivity — day 1 is setup, day 30 is where it starts to feel different, day 90 is where the accumulated context becomes genuinely hard to replicate elsewhere.

Neither approach is wrong. They solve different problems. The honest answer for most developers is: use both. Cursor or Claude Code for focused coding sessions; Hermes for everything that spans sessions, projects, and weeks.

Getting Started

Install (Linux / macOS / WSL2):

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

First steps:

Configure your model provider:

   hermes model

Start a session:

   hermes

Try the memory system: "Remember that I prefer TypeScript over JavaScript" → then next session: "What do you know about me?"
Create a skill: Complete a complex task and ask "Can you save this workflow as a skill?"
Schedule something: /cron add "Every day at 9am, summarize my GitHub notifications"

Pro tips from 30 days:

Create a .hermes.md file in your project root with project-specific context. It's loaded automatically and saves a lot of repetition.
Correct the agent explicitly when it gets something wrong, and ask it to remember the correction. One clear correction is worth ten implicit signals.
Set up conversation logging via event hooks early. You'll thank yourself in three weeks when you need to reference a past discussion.
Review memory files weekly at first. Once the agent's model of you stabilizes, you can check less often.

Should You Use Hermes Agent?

Yes, if:

You work on projects that span days, weeks, or multiple codebases
You want an agent that learns from corrections rather than repeating mistakes
You're comfortable with self-hosting or serverless deployment
You need an agent accessible across platforms, not just your IDE
You work in a team and want to share accumulated knowledge without re-explaining from scratch

Probably not yet, if:

You want immediate productivity with zero setup
You're a non-technical user — the terminal-first setup is a real barrier
You need a large pre-built skill ecosystem (OpenClaw's marketplace is broader, with trade-offs)

The bottom line:

Hermes Agent is the most architecturally serious attempt I've found at building an agent that actually gets better the longer you use it. The learning loop is real. The cross-session memory works. The team knowledge-sharing via Profile Distributions is something no IDE agent can match.

It asks more of you upfront than a coding copilot does. But if you're willing to invest in setup and spend a week or two teaching it your context, what you get back is an agent that stops feeling like a tool and starts feeling like a collaborator that actually knows your work.

That's a different thing entirely.

Hermes Agent is open-source (MIT License) and built by Nous Research. All data stays on your machine — no telemetry, no tracking. You can find the project at github.com/NousResearch/hermes-agent and the full documentation at hermes-agent.nousresearch.com.
