Hermes Agent: The Complete Guide to the Self-Improving AI Agent (Setup, Skills, ofox Integration)
TL;DR — Hermes Agent v0.14 is the first widely-adopted AI agent built around a closed learning loop: it writes its own reusable skills as it works, persists a three-layer memory across sessions, and runs as either a local CLI or a messaging gateway covering 20+ platforms. Most AI agents reset to zero every conversation — Hermes doesn't, and that single design choice changes what "using an agent" actually means. This guide covers install, connecting it to ofox.ai in five minutes, the skill and memory systems, and where Hermes does and doesn't beat Claude Code or Codex CLI.
What Hermes Agent Actually Is
Hermes Agent is an open-source, model-agnostic conversational agent from Nous Research, first released February 25, 2026. Seven weeks later it cleared 95,000 GitHub stars — the fastest-growing agent framework of the year so far.
Three things make it different from the agent CLIs you've probably used:
It learns across sessions. When Hermes figures out a non-trivial workflow (say, the exact sequence of grep + sed + git commands you use to backfill a config across a repo), it can save that as a markdown skill via the skill_manage tool. On the next similar task, it loads the skill first and acts on it. Skills are stored in SQLite with FTS5 full-text search, so retrieval is fast even after hundreds accumulate.
Memory is structured, not just a context window. Hermes uses a layered memory model rooted in two markdown files under ~/.hermes/memories/ — MEMORY.md for general facts and USER.md for who you are and how you work — plus a Honcho dialectic layer that builds a deepening psychological model from your messages. All of them feed the system prompt of every session.
It lives where you do. The same agent process can run as a terminal TUI, as a Telegram bot, a Discord bot, a Slack bot, a WhatsApp/Signal/Matrix bridge, or a scheduled cron worker — 20+ platforms from one binary. You can yell at the same agent from your laptop and your phone.
The v0.14.0 release (May 16, 2026) ships a refined setup wizard that auto-detects existing OpenClaw installs at ~/.openclaw and offers to migrate them via hermes claw migrate, alongside an updated bundled-skills catalog. v0.14 also added xAI Grok OAuth, a Microsoft Teams gateway, an X/Twitter search tool, and an OpenAI-compatible local proxy mode.
Install in 60 Seconds
On Linux, macOS, WSL2, or Termux:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
On Windows PowerShell:
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
The installer drops a hermes binary on your PATH and creates ~/.hermes/ for config, the SQLite database, skills, and memory files. There's no GPU requirement — the LLM lives behind an API, so a $5/month VPS handles a 24/7 messaging gateway deployment without strain.
After install, run:
hermes setup
The wizard walks you through provider selection, model choice, and (optionally) messaging platform tokens. Skip everything you don't need; you can rerun hermes setup or edit config files directly later.
Wiring Hermes to ofox.ai
Hermes reads credentials from ~/.hermes/.env. The cleanest way to route everything through ofox is:
# ~/.hermes/.env
OPENAI_BASE_URL=https://api.ofox.ai/v1
OPENAI_API_KEY=sk-your-ofox-key
Then switch model interactively:
hermes model
This opens a picker over your provider's catalog. For ofox, the model IDs follow the vendor/model-name convention used across the OpenAI-compatible endpoint. Pick what fits the work:
| Use case | Recommended model | Why |
|---|---|---|
| Code-heavy agent runs | `anthropic/claude-opus-4.7` | 87.6% on SWE-bench Verified and 64.3% on the harder SWE-bench Pro — second only to the (non-GA) Claude Mythos Preview; well-suited to multi-file refactors |
| Long autonomous loops | `anthropic/claude-sonnet-4.6` | Pays for itself when token volume is high — meaningfully cheaper than Opus 4.7 at close coding quality |
| Cost-sensitive default | `deepseek/deepseek-v4-pro` | Strong reasoning and tool use at a fraction of US-vendor pricing |
| Frontier reasoning experiments | `openai/gpt-5.5` | Top GA model on SWE-bench Verified at 88.7% (Claude Mythos Preview scores higher but is not generally available); useful as a second opinion against Claude |
One key, one base URL, and Hermes can switch between any of them at will. No separate billing per vendor, no juggling API keys. The same OpenAI-compatible pattern that's documented in the ofox SDK migration guide is exactly what Hermes expects.
This is also why the gateway model fits agents in particular. A single agent task chain fans out into dozens of model calls, often spread across multiple providers — see the API aggregation rationale for the longer argument.
The Skill System: How It Actually Learns
The mental model that confused me at first: skills are not code. They're short markdown files the agent writes for itself.
A skill looks roughly like:
---
name: backfill-config-across-repo
when_to_use: "User wants to add the same config key to multiple files in a repo"
---
1. Use `git grep -l <existing-key>` to enumerate target files
2. For each file, locate the config block via the surrounding context
3. Insert the new key preserving the file's indentation style (detect from siblings)
4. Stage with `git add -p` so the user can review chunk-by-chunk before commit
The agent retrieves skills via FTS5 search keyed on the when_to_use line, then injects the matched skill into its context before tool-calling. The retrieval prompt is part of the system prompt — you don't trigger skills manually, but you can list them with hermes skills list (or /skills list from inside a chat) and audit individual files in ~/.hermes/skills/.
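The FTS5 retrieval described above is easy to reproduce in miniature. The sketch below is a toy demonstration of the pattern, assuming the `sqlite3` CLI with FTS5 compiled in; the table and column names are assumptions for illustration, not Hermes's actual schema:

```shell
# Toy reproduction of FTS5-based skill retrieval. Table and column names
# are assumptions, not the real Hermes schema.
sqlite3 /tmp/skills-demo.db <<'SQL'
CREATE VIRTUAL TABLE IF NOT EXISTS skills USING fts5(name, when_to_use, body);
INSERT INTO skills VALUES
  ('backfill-config-across-repo',
   'User wants to add the same config key to multiple files in a repo',
   '1. git grep -l ... 2. insert key ... 3. git add -p'),
  ('rotate-api-credentials',
   'User wants to rotate an API key across environments',
   '1. generate new key ... 2. update secrets store ...');
-- All query terms must match; best-ranked skill first
SELECT name FROM skills
WHERE skills MATCH 'config repo'
ORDER BY rank LIMIT 1;
SQL
```

Under these assumptions the query surfaces `backfill-config-across-repo`, because both terms appear in its `when_to_use` text and neither appears in the other skill's.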
What changes after a few weeks of use: the agent stops re-deriving workflows it has already done. Nous Research's internal benchmarks show that once an agent has accumulated 20+ self-created skills, similar future tasks finish about 40% faster than a fresh instance. The underlying model didn't get better; the agent just doesn't have to plan from scratch every time.
The honest caveat: skills only help when the new task pattern-matches an old one. Truly novel work doesn't benefit, and skill quality degrades if you use the agent for too wide a range of unrelated tasks without curation. v0.12's autonomous Curator was added precisely to rewrite or retire underperforming skills on a weekly schedule.
Three-Layer Memory: Where State Lives
Memory in Hermes is intentionally explicit, not a magic black box:
- `~/.hermes/memories/MEMORY.md` — general facts the agent should know across all conversations. Project names, conventions, ongoing work. You can edit it directly.
- `~/.hermes/memories/USER.md` — who you are and how you communicate. The agent updates this from observation via its `memory` tool; you can override.
- Honcho dialectic layer — a structured representation of you and recent conversations, retrievable by similarity. Runs behind an HTTP API; can be hosted or pointed at a local instance.
All of them are pulled into the system prompt at session start. The memory files are plain markdown — cat, $EDITOR, or git will all do the right thing. For provider configuration there are CLI helpers:
hermes memory setup # configure external memory providers
hermes memory status # show which providers are active
hermes sessions list # browse past sessions
Because everything is local SQLite + plain markdown by default, nothing leaves your machine unless you point Hermes at a hosted memory provider. The agent's persistent state is yours to grep, version-control, or back up.
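Because that state is plain files, version-controlling it takes a few commands. A minimal sketch, assuming `git` is installed (the `memories/` and `skills/` paths follow the article; the committer identity is a placeholder):

```shell
# Snapshot the agent's persistent state in a local git repo.
# Paths follow the article; adjust if your install differs.
mkdir -p ~/.hermes/memories ~/.hermes/skills   # no-op on an existing install
cd ~/.hermes
git init -q
git add memories skills
# Placeholder identity so the commit works in a fresh environment;
# "|| true" tolerates runs where nothing changed since the last snapshot.
git -c user.name=hermes-backup -c user.email=backup@localhost \
    commit -q -m "snapshot: memory and skills" || true
```

Rerun it (or cron it) after notable sessions and you get a diffable history of what the agent learned.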
Running Hermes as a Messaging Gateway
This is the deployment mode that makes Hermes feel structurally different from a CLI agent like Claude Code or Codex CLI.
A single Hermes process can listen on every configured platform simultaneously. The minimum Telegram bridge looks like:
# ~/.hermes/.env
TELEGRAM_BOT_TOKEN=your-token-from-botfather
TELEGRAM_ALLOWED_USERS=123456789
Then run the gateway in the foreground:
hermes gateway
Use hermes gateway setup for the interactive wizard that walks you through credentials for each platform you want enabled. The same agent, the same skills, the same memory — now reachable from your phone. The currently supported gateway platforms are CLI, Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, DingTalk, Feishu, WeCom, Weixin, QQ Bot, Yuanbao, BlueBubbles, Home Assistant, Microsoft Teams, and Google Chat — 20+ in one process.
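On a VPS you will want the gateway supervised so it survives crashes and reboots. A minimal systemd sketch, with the caveat that the unit name, service user, and binary path are assumptions; point `ExecStart` at wherever the installer put the `hermes` binary:

```shell
# Hypothetical systemd unit for a 24/7 gateway; paths and user are assumptions.
sudo tee /etc/systemd/system/hermes-gateway.service >/dev/null <<'EOF'
[Unit]
Description=Hermes Agent messaging gateway
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=hermes
ExecStart=/usr/local/bin/hermes gateway
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now hermes-gateway
```

`Restart=on-failure` gives you the always-on behavior the gateway mode implies without a separate process manager.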
The closest analog in mainstream tooling doesn't really exist. Claude Code and Codex are terminal-bound. Cursor is editor-bound. Hermes is the first widely-used agent that treats "where the user is" as a runtime parameter rather than a deployment decision.
When Hermes Wins, When It Doesn't
Let's be honest about this: Hermes is a different shape of tool, not strictly better.
Hermes wins when:
- You want one agent that follows you across surfaces (laptop, phone, group chats)
- Your workflow has high repetition the agent can capture as skills
- You want vendor independence and can pair it with a model gateway like ofox
- You'd benefit from persistent context across days or weeks
- You want full local control of memory, skills, and conversation history
Claude Code or Codex CLI win when:
- You're doing tightly editor-coupled work where IDE integration matters more than cross-session memory
- You want vendor-tuned reasoning behavior — Anthropic's Opus 4.7 in Claude Code or OpenAI's GPT-5.5 in Codex is more predictable than the same model called naively through a generic harness
- Your tasks are short and one-shot; skill accumulation is dead weight
- You don't want to maintain a persistent service
For most developers who've already invested in Claude Code, the answer is "use both." Hermes for the long-running, cross-surface, messaging-style work; Claude Code for tight editor-coupled coding sessions. Same ofox key powers both.
Cost Picture
The cost equation is simple in a way that's rare in this space:
| Line item | Cost |
|---|---|
| Hermes Agent software | Free (MIT licensed) |
| VPS to host messaging gateway | $5–$10/month (Hetzner CX11, DigitalOcean basic, etc.) |
| Local install | $0 |
| API tokens | Whatever you'd pay otherwise |
The agent itself adds no overhead beyond the prompt context for loaded skills (typically 500–1500 tokens). Routing through ofox keeps per-token costs as low as your model picks allow — the API cost reduction playbook applies directly here, especially the routing-by-task-difficulty pattern.
For an individual developer using Hermes a few hours a day on a mix of Sonnet 4.6 and DeepSeek V4 Pro through ofox, monthly token spend tends to land in the $15–$60 range. The gateway VPS is usually the smaller line item.
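That $15–$60 range is easy to sanity-check with arithmetic. The figures below are illustrative assumptions for the calculation, not ofox's actual rate card or your real usage:

```shell
# Back-of-envelope monthly spend. Every rate and volume here is an
# illustrative assumption, not actual ofox pricing.
awk 'BEGIN {
  days      = 30
  in_tok    = 1.5e6   # input tokens per day (assumed)
  out_tok   = 2.0e5   # output tokens per day (assumed)
  price_in  = 0.50    # $ per 1M input tokens, cheap-model blend (assumed)
  price_out = 2.00    # $ per 1M output tokens (assumed)
  total = days * (in_tok * price_in + out_tok * price_out) / 1e6
  printf "estimated spend: $%.2f/month\n", total
}'
```

Under these assumptions the estimate lands at $34.50/month, comfortably inside the article's range; double the volume or switch the blend toward Opus-class pricing and you climb toward the top of it.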
What's Coming Next
The companion project to watch is hermes-agent-self-evolution, released alongside v0.14. It uses GEPA — a technique accepted as an ICLR 2026 Oral — to read execution traces and propose targeted improvements to skills and system prompts, rather than just retrying failed attempts. The integration is still optional, but it's a credible path to agents that don't just learn what to do, but learn how to learn.
If you've been waiting for agent frameworks to stop feeling like wrappers over a chat completion call, this is the first one I'd actually point to. Self-improving agents aren't a 2027 problem anymore — Hermes shipped one, and 95K stars in seven weeks is the developer community saying it noticed.
Pair it with the right model for your agent workload, route through ofox so you can swap providers without re-plumbing, and give it a few weeks to build up a skill library. The gains compound faster than you'd expect.
Originally published on ofox.ai/blog.