Jangwook Kim

Posted on Apr 21 • Originally published at effloow.com

Hermes Agent Review: Self-Improving Open-Source AI Agent

#hermesagent #nousresearch #aiagents #opensource

Overall score: 8.1 / 10

Category	Score
Learning Loop	9.5
Memory System	9.0
Developer Experience	8.0
Ecosystem	7.5
Stability	6.5

What Is Hermes Agent?

Hermes Agent is Nous Research's open-source AI agent framework, released February 25, 2026. In the seven weeks following its debut, it accumulated 95,600 GitHub stars, making it the fastest-growing agent framework of 2026. The framework is built around a single idea that most agent tools have ignored: an agent should get better at your specific workflow over time, not just execute instructions more reliably. Hermes Agent does this through a closed learning loop that automatically generates reusable skills from experience, refines those skills during continued use, and builds a persistent model of you across sessions. It ships under an MIT license, runs on Python 3.11+, and connects to any LLM backend you already use.

The Self-Improving Learning Loop

The core mechanism that distinguishes Hermes from frameworks like Goose or LangGraph is its autonomous skill creation system. After any task that involves five or more tool calls — retrieving files, running searches, calling APIs — Hermes generates a skill document in Markdown. This document captures the approach taken, the edge cases encountered, and the domain-specific knowledge discovered during that interaction.
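To make the trigger concrete, here is a minimal sketch of threshold-based skill generation. It is not Hermes's actual implementation: `TaskTrace`, `maybe_write_skill`, and the Markdown layout are hypothetical names invented for illustration. Only the five-tool-call threshold and the three kinds of captured content come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional

SKILL_THRESHOLD = 5  # tool calls per task, as described in the article


@dataclass
class TaskTrace:
    """Hypothetical record of one finished task (not a Hermes type)."""
    title: str
    tool_calls: list = field(default_factory=list)
    edge_cases: list = field(default_factory=list)
    notes: list = field(default_factory=list)


def maybe_write_skill(trace: TaskTrace) -> Optional[str]:
    """Return a Markdown skill document, or None if the task was too small."""
    if len(trace.tool_calls) < SKILL_THRESHOLD:
        return None
    lines = [f"# Skill: {trace.title}", "", "## Approach"]
    lines += [f"{i}. {call}" for i, call in enumerate(trace.tool_calls, 1)]
    lines += ["", "## Edge cases"]
    lines += [f"- {case}" for case in trace.edge_cases]
    lines += ["", "## Domain notes"]
    lines += [f"- {note}" for note in trace.notes]
    return "\n".join(lines)
```

A task with one or two tool calls returns `None` and leaves no skill behind, which is how a threshold keeps trivial interactions from cluttering the skill store.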

The next time a similar task comes up, the agent loads the relevant skill instead of reasoning from scratch. In Nous Research's own benchmarks, agents with 20 or more self-created skills completed research tasks 40% faster than a fresh instance with no prior skills — without any manual prompt tuning. This improvement is domain-specific; a skill built from summarizing GitHub pull requests does not automatically transfer to planning a database migration. Hermes does not claim to solve cross-domain generalization. What it does claim — and appears to deliver — is genuine compounding value within a defined workflow area.

The closed loop works in both directions. Skills created automatically get refined as Hermes uses them. If a skill misses edge cases on the second use, the agent updates it. Over weeks, a developer who runs Hermes daily for code review tasks ends up with a highly specialized code review agent that knows their codebase's conventions, their preferred response style, and the failure modes their team runs into repeatedly.

Three-Layer Memory Architecture

Most agent frameworks treat memory as a flat conversation history. Hermes uses three layers:

Session context is the standard in-context working memory for the current conversation — fast, temporary, cleared between sessions.

Persistent store is a SQLite database with FTS5 full-text search indexing. This is where skills, past task summaries, and extracted user preferences get written. Retrieval stays sub-10ms even across 10,000+ skill documents, which means the system does not degrade as you use it more.
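For readers who have not used FTS5, the sketch below shows what full-text skill lookup in SQLite can look like. The table name, columns, and `find_skills` helper are assumptions made for illustration, not Hermes's real schema, and the example assumes your Python build of SQLite includes the FTS5 extension (most do).

```python
import sqlite3

# In-memory database for the demo; a real store would live on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one FTS5 virtual table indexing skill title and body.
conn.execute("CREATE VIRTUAL TABLE skills USING fts5(title, body)")
conn.executemany(
    "INSERT INTO skills (title, body) VALUES (?, ?)",
    [
        ("Summarize pull requests",
         "Fetch the diff, group changes by module, flag missing tests."),
        ("Plan database migration",
         "Snapshot the schema, write a reversible migration, test the rollback."),
    ],
)


def find_skills(query: str, limit: int = 3) -> list:
    """Return titles of the best-matching skills, best match first."""
    # Note: FTS5 has its own query syntax, so raw user input with
    # quotes or operators would need escaping in real code.
    rows = conn.execute(
        "SELECT title FROM skills WHERE skills MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [title for (title,) in rows]


print(find_skills("migration"))  # ['Plan database migration']
```

Because the index sits in the same file as the data, nothing beyond the standard library is needed, which matches the zero-infrastructure point made about this layer.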

User model is a drift-adjusting representation of who you are: your communication style, the domains you work in, the tools you prefer, and the decisions you tend to make. The drift-adjusting component is important — it actively updates as your behavior changes rather than locking in early assumptions.
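The article does not say how the drift adjustment is computed. One common way to get that behavior is an exponential moving average over observed choices, sketched below. The function name, the `alpha` weight, and the tool names are all hypothetical, and this is a stand-in technique rather than Hermes's documented method.

```python
def update_preferences(prefs: dict, observed: str, alpha: float = 0.2) -> dict:
    """Decay every existing score, then boost the option just observed.

    With scores that start out summing to 1, they keep summing to 1,
    so each value can be read as a running preference share.
    """
    updated = {name: (1 - alpha) * score for name, score in prefs.items()}
    updated[observed] = updated.get(observed, 0.0) + alpha
    return updated


# The user initially always picked pytest, then switches to unittest.
prefs = {"pytest": 1.0}
for _ in range(10):
    prefs = update_preferences(prefs, "unittest")

print(max(prefs, key=prefs.get))  # unittest
```

After ten observations the early assumption has decayed to roughly 0.11, so the newer habit dominates without the old one being erased outright.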

Together these layers let Hermes answer "what did we decide about the deployment pipeline two weeks ago?" with the same speed as "what did I just ask?" The SQLite approach is deliberately boring compared to vector database architectures, but it avoids the cold-start problem, works offline, and requires zero infrastructure beyond the agent process itself.

Installation and Quick Start

Getting Hermes running takes under five minutes on a standard developer machine. The one-liner install handles Python version management via uv automatically:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

Alternatively, clone the repository directly:

git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
./setup-hermes.sh

The setup script installs uv, creates a virtual environment, and installs all dependencies. It detects what's missing on your system without requiring sudo for Python version management.

Once installed, the interactive terminal UI starts with:

hermes

The full setup wizard — which configures your model provider, messaging gateway platforms, and tool integrations — runs with:

hermes setup

Model selection is handled via hermes model. Hermes supports Nous Portal, OpenRouter (200+ models), NVIDIA NIM (Nemotron), Xiaomi MiMo, z.ai/GLM, Kimi/Moonshot, MiniMax, Hugging Face, and OpenAI endpoints. Switching providers requires no code changes — just re-run hermes model and select a different backend.

Messaging Gateway: All Your Channels, One Process

Hermes ships with a messaging gateway that routes conversations from six external platforms to the same agent backend: Telegram, Discord, Slack, WhatsApp, Signal, and Email. The gateway runs as a single background process that handles session management, cron-scheduled tasks, and voice message transcription across all connected channels simultaneously.

Starting the gateway is a single command:

hermes gateway

Platform-specific configuration lives in hermes gateway setup, which walks through OAuth tokens and webhook configuration per channel. Once running, you can send a task to your Hermes agent from Telegram, get the response in Slack if you switch devices, and the agent maintains session continuity across the switch.
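Session continuity across channels comes down to keying the session on the person rather than on the platform. The sketch below illustrates that idea only. The identity map, account handles, and `handle_message` function are hypothetical and do not reflect how the Hermes gateway is actually built.

```python
from collections import defaultdict

# Hypothetical identity map: (platform, platform account) -> one user id.
IDENTITIES = {
    ("telegram", "@jk"): "user-1",
    ("slack", "U123"): "user-1",
}

# One shared message history per user, regardless of channel.
sessions = defaultdict(list)


def handle_message(platform: str, account: str, text: str) -> list:
    """Append the message to the user's shared session and return its history."""
    user_id = IDENTITIES[(platform, account)]
    sessions[user_id].append(f"[{platform}] {text}")
    return sessions[user_id]


handle_message("telegram", "@jk", "start the code review")
history = handle_message("slack", "U123", "any findings yet?")
print(history)
```

The Slack message lands in the same history as the Telegram one, so the agent sees a single conversation even though the transport changed.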

The v0.10.0 release added the Tool Gateway for Nous Portal subscribers: automatic access to Firecrawl (web search), FAL/FLUX 2 Pro (image generation), OpenAI TTS (text-to-speech), and Browser Use (browser automation) without managing separate API keys. The agent selects the appropriate tool at inference time.

The 118 Bundled Skills

Hermes v0.10.0 ships with 118 pre-built skills covering developer workflows, research tasks, writing, data processing, and system administration. These are starting-point skills written by the Nous Research team: useful immediately, but designed to be replaced by the domain-specific versions that Hermes generates from your actual usage patterns.

Unlike OpenClaw's ClawHub marketplace (which distributes community-written skills for download), Hermes does not have a centralized third-party skill repository. Skills in Hermes are generated from your sessions and live locally. This is a deliberate security tradeoff: ClawHub's community marketplace was found to contain approximately 824 malicious packages in an April 2026 audit, roughly 20% of the catalog. Hermes has zero agent-specific CVEs to date.

Strengths
<ul>
  <li>Genuine compounding improvement: 40% faster on research tasks with 20+ self-created skills, per Nous Research's benchmarks</li>
  <li>Three-layer memory with sub-10ms retrieval across 10,000+ skill documents</li>
  <li>Zero agent-specific CVEs to date; local skill generation avoids supply chain risks from marketplaces</li>
  <li>Supports 200+ models via OpenRouter, plus other backends; no lock-in, switch with one command</li>
  <li>Six messaging platforms in a single background gateway process</li>
  <li>MIT license; free forever, self-hosted, no SaaS dependency</li>
</ul>


Weaknesses
<ul>
  <li>Only v0.x — API stability between minor releases is not guaranteed</li>
  <li>No community skill marketplace; you start from 118 bundled skills and build up</li>
  <li>Self-improvement is domain-specific; cross-task generalization remains limited</li>
  <li>Nous Portal subscription required for the full Tool Gateway (Firecrawl, image gen, browser use)</li>
  <li>Heavy agentic use on frontier models is expensive: $131/day on Claude Opus 4.6 at high volume</li>
</ul>

Pricing Breakdown

Hermes Agent itself is free. The framework has no subscription cost, usage fee, or seat limit. You self-host it on any machine that runs Python 3.11+.

The real cost is LLM API usage. On budget models (Gemini Flash, Qwen3, DeepSeek V3.2), typical interactive sessions cost $0.30 per complex task or $1–3 per day for moderate use. On frontier models, costs scale quickly: heavy agentic use on Claude Opus 4.6 can reach $131/day because each message sends the full conversation history to the API.
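The reason costs climb so fast is arithmetic: if every request resends all prior turns, input tokens grow with the square of the conversation length. The sketch below works that out under simplifying assumptions (a fixed number of tokens per turn, no output tokens, no prompt caching). The price used is a placeholder, not a quoted rate for any real model.

```python
def cumulative_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed when each request resends all prior turns."""
    # Request k carries k turns of history, so the total is
    # tokens_per_turn * (1 + 2 + ... + turns).
    return tokens_per_turn * turns * (turns + 1) // 2


def session_cost(turns: int, tokens_per_turn: int, usd_per_million_input: float) -> float:
    """Input-side cost of one session at a given per-million-token price."""
    tokens = cumulative_input_tokens(turns, tokens_per_turn)
    return tokens * usd_per_million_input / 1_000_000


print(cumulative_input_tokens(10, 1_000))   # 55000
print(cumulative_input_tokens(100, 1_000))  # 5050000
print(session_cost(100, 1_000, 10.0))       # 50.5 (placeholder price)
```

A session ten times longer bills roughly ninety times as many input tokens, which is why long agentic runs on expensive models dominate the bill.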

Practical cost estimates for a solo developer:

Usage Pattern	Model	Est. Monthly Cost
Light (1-2 hours/day)	Qwen3 / DeepSeek	$15–30
Moderate (4-6 hours/day)	Claude Sonnet 4.6	$60–120
Heavy agentic (8+ hours/day)	Claude Sonnet 4.6	$150–300
Always-on server (VPS)	Any	+$5–10/mo

The Nous Portal subscription (required for Tool Gateway auto-access) has separate pricing not disclosed publicly at time of writing; tools can also be wired manually with individual API keys at no cost.

Who Should Use Hermes Agent?

The ideal Hermes user is a solo developer or small team that runs the same classes of tasks repeatedly over weeks or months. A backend engineer who does daily code review, weekly architecture documentation, and regular incident post-mortems will see compounding improvement: the agent gets faster and more accurate on those specific tasks as self-created skills accumulate. The learning loop delivers its 40% speed improvement most clearly after consistent use in a narrow domain — not in one-off varied sessions.

Hermes also makes sense if security posture matters to you. The local-only skill system means you are not downloading execution instructions from a community marketplace where malware detection is reactive, not preventive. If you are running Hermes on a machine with production credentials, that matters.

Who should skip Hermes for now? First, teams that need deployment stability and semantic versioning guarantees: at v0.x, breaking changes between minor releases are possible, and the API surface is actively evolving. Second, organizations that want 24+ messaging platform integrations out of the box, where OpenClaw's gateway breadth is currently unmatched. Third, developers looking for a one-week experiment: Hermes's core value only shows up with sustained use, and a short evaluation won't reveal what the learning loop actually produces.

Hermes Agent vs. OpenClaw

OpenClaw and Hermes Agent are the two most-discussed open-source agent frameworks of 2026, and they reflect fundamentally different product philosophies.

OpenClaw (345,000+ GitHub stars) was built gateway-first: it maximizes integration breadth with 24+ messaging platforms and a community marketplace of 13,000+ skills. Hermes Agent (95,600 stars in 7 weeks) was built learning-first: it maximizes depth of improvement through a self-generating skill loop and three-layer memory.

The practical difference: if you need an agent that works seamlessly across twenty platforms from day one, OpenClaw wins on breadth. If you need an agent that gets significantly better at your specific workflows over three months, Hermes wins on depth. For developers choosing between them, the deciding question is: do you need ecosystem reach, or compounding improvement?

Hermes supports 200+ models through LiteLLM-style multi-provider routing: a provider-agnostic layer lets you switch backends without touching application code.
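As a rough illustration of provider-agnostic routing, the sketch below hides each backend behind the same callable interface so the calling code never changes. The `Provider` and `Agent` classes and the two stub backends are hypothetical. They stand in for real API clients and are not taken from Hermes or LiteLLM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    """Hypothetical backend wrapper: a name plus a prompt -> text function."""
    name: str
    complete: Callable[[str], str]


# Stub backends for the demo; real ones would call an LLM API.
PROVIDERS = {
    "echo": Provider("echo", lambda prompt: prompt),
    "upper": Provider("upper", lambda prompt: prompt.upper()),
}


class Agent:
    def __init__(self, provider_name: str) -> None:
        self.provider = PROVIDERS[provider_name]

    def switch(self, provider_name: str) -> None:
        """Swap the backend; callers of ask() are unaffected."""
        self.provider = PROVIDERS[provider_name]

    def ask(self, prompt: str) -> str:
        return self.provider.complete(prompt)


agent = Agent("echo")
print(agent.ask("hello"))  # hello
agent.switch("upper")
print(agent.ask("hello"))  # HELLO
```

The application only ever calls `ask()`, so changing the backend is a configuration decision rather than a code change.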

Frequently Asked Questions

Q: Does Hermes Agent work with local models?

Yes. Hermes supports Ollama endpoints, Hugging Face local inference, and any provider that exposes an OpenAI-compatible API. Switch with hermes model — no code changes needed.

Q: How long does it take to see real improvement from the learning loop?

The 40% speed improvement reported by Nous Research appears after approximately 20 self-generated skills in a given domain. For a developer running 2–3 complex tasks daily, that typically takes 2–3 weeks of regular use in the same workflow area.

Q: Is Hermes Agent stable enough for a production environment?

Not yet, if "production" means automated pipelines with zero tolerance for breaking changes. At v0.x, the Nous Research team is actively iterating on the API surface. For interactive developer tooling where you manage the upgrade cycle manually, v0.10.0 is solid for daily use.

Q: Can I use Hermes Agent without a Nous Portal subscription?

Yes. The 118 bundled skills, messaging gateway, and self-improving learning loop all work without any Nous subscription. The Nous Portal subscription only adds the Tool Gateway auto-routing feature (Firecrawl, FAL image gen, OpenAI TTS, Browser Use). You can wire any of those tools manually with individual API keys at no cost.

Q: How does skill generation work behind the scenes?

After a task that involves five or more tool calls, Hermes generates a Markdown file capturing the approach, edge cases, and domain knowledge from that interaction. The file is stored in the persistent SQLite store with FTS5 indexing for fast retrieval. On subsequent similar tasks, the agent queries the store, loads the relevant skill, and uses it as working context.

Key Takeaways

Hermes Agent is the most technically interesting open-source agent framework released in 2026, and it earns that distinction by solving a problem that most frameworks ignore: making agents genuinely better at your specific tasks over time, not just more capable in general. The three-layer memory architecture and autonomous skill generation are the real innovations here — not the 95,600 GitHub stars.

The meaningful caveat is timing. At v0.10.0, Hermes is two months old. Its API will change, its documentation is a moving target, and the compounding learning value requires weeks of consistent use to materialize. If you can invest that time — and you work in defined workflow areas rather than random one-off tasks — it is the most rewarding agent framework available today.

Bottom Line

Hermes Agent delivers on its core promise: a self-improving agent that compounds in value with use. At v0.x it is not a production infrastructure tool, but for a developer who runs the same classes of tasks daily, no other open-source framework currently comes close to what the learning loop produces after 30 days. Install it, give it your real workflow, and evaluate it in 3 weeks — not 3 hours.
