Hermes Agent: The Complete Guide to the Self-Improving AI Agent (Setup, Skills, ofox Integration)
TL;DR — Hermes Agent v0.14 is the first widely-adopted AI agent built around a closed learning loop: it writes its own reusable skills as it works, persists a three-layer memory across sessions, and runs as either a local CLI or a messaging gateway covering 20+ platforms. Most AI agents reset to zero every conversation — Hermes doesn't, and that single design choice changes what "using an agent" actually means. This guide covers install, connecting it to ofox.ai in five minutes, the skill and memory systems, and where Hermes does and doesn't beat Claude Code or Codex CLI.
What Hermes Agent Actually Is
Hermes Agent is an open-source, model-agnostic conversational agent from Nous Research, first released February 25, 2026. Seven weeks later it cleared 95,000 GitHub stars — the fastest-growing agent framework of the year so far.
Three things make it different from the agent CLIs you've probably used:
It learns across sessions. When Hermes figures out a non-trivial workflow (say, the exact sequence of grep + sed + git commands you use to backfill a config across a repo), it can save that as a markdown skill via the skill_manage tool. On the next similar task, it loads the skill first and acts on it. Skills are stored in SQLite with FTS5 full-text search, so retrieval is fast even after hundreds accumulate.
Memory is structured, not just a context window. Hermes uses a layered memory model rooted in two markdown files under ~/.hermes/memories/ — MEMORY.md for general facts and USER.md for who you are and how you work — plus a Honcho dialectic layer that builds a deepening psychological model from your messages. All of them feed the system prompt of every session.
It lives where you do. The same agent process can run as a terminal TUI, as a Telegram bot, a Discord bot, a Slack bot, a WhatsApp/Signal/Matrix bridge, or a scheduled cron worker — 20+ platforms from one binary. You can yell at the same agent from your laptop and your phone.
The v0.14.0 release (May 16, 2026) ships a refined setup wizard that auto-detects existing OpenClaw installs at ~/.openclaw and offers to migrate them via hermes claw migrate, alongside an updated bundled-skills catalog. v0.14 also added xAI Grok OAuth, a Microsoft Teams gateway, an X/Twitter search tool, and an OpenAI-compatible local proxy mode.
Install in 60 Seconds
On Linux, macOS, WSL2, or Termux:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
On Windows PowerShell:
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
The installer drops a hermes binary on your PATH and creates ~/.hermes/ for config, the SQLite database, skills, and memory files. There's no GPU requirement — the LLM lives behind an API, so a $5/month VPS handles a 24/7 messaging gateway deployment without strain.
After install, run:
hermes setup
The wizard walks you through provider selection, model choice, and (optionally) messaging platform tokens. Skip everything you don't need; you can rerun hermes setup or edit config files directly later.
Wiring Hermes to ofox.ai
Hermes reads credentials from ~/.hermes/.env. The cleanest way to route everything through ofox is:
# ~/.hermes/.env
OPENAI_BASE_URL=https://api.ofox.ai/v1
OPENAI_API_KEY=sk-your-ofox-key
Then switch model interactively:
hermes model
This opens a picker over your provider's catalog. For ofox, the model IDs follow the vendor/model-name convention used across the OpenAI-compatible endpoint. Pick what fits the work:
| Use case | Recommended model | Why |
|---|---|---|
| Code-heavy agent runs | `anthropic/claude-opus-4.7` | 87.6% on SWE-bench Verified and 64.3% on the harder SWE-bench Pro — second only to the (non-GA) Claude Mythos Preview; well-suited to multi-file refactors |
| Long autonomous loops | `anthropic/claude-sonnet-4.6` | Pays for itself when token volume is high — meaningfully cheaper than Opus 4.7 at close coding quality |
| Cost-sensitive default | `deepseek/deepseek-v4-pro` | Strong reasoning and tool use at a fraction of US-vendor pricing |
| Frontier reasoning experiments | `openai/gpt-5.5` | Top GA model on SWE-bench Verified at 88.7% (Claude Mythos Preview scores higher but is not generally available); useful as a second opinion against Claude |
One key, one base URL, and Hermes can switch between any of them at will. No separate billing per vendor, no juggling API keys. The same OpenAI-compatible pattern that's documented in the ofox SDK migration guide is exactly what Hermes expects.
This is also why the gateway model fits agents in particular. A single agent task chain fans out into dozens of model calls, often spread across multiple providers — see the API aggregation rationale for the longer argument.
The Skill System: How It Actually Learns
The mental model that confused me at first: skills are not code. They're short markdown files the agent writes for itself.
A skill looks roughly like:
---
name: backfill-config-across-repo
when_to_use: "User wants to add the same config key to multiple files in a repo"
---
1. Use `git grep -l <existing-key>` to enumerate target files
2. For each file, locate the config block via the surrounding context
3. Insert the new key preserving the file's indentation style (detect from siblings)
4. Stage with `git add -p` so the user can review chunk-by-chunk before commit
The agent retrieves skills via FTS5 search keyed on the when_to_use line, then injects the matched skill into its context before tool-calling. The retrieval prompt is part of the system prompt — you don't trigger skills manually, but you can list them with hermes skills list (or /skills list from inside a chat) and audit individual files in ~/.hermes/skills/.
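The FTS5 retrieval described above is easy to reproduce in miniature. The sketch below is a toy demonstration of the pattern, assuming the `sqlite3` CLI with FTS5 compiled in; the table and column names are assumptions for illustration, not Hermes's actual schema:

```shell
# Toy reproduction of FTS5-based skill retrieval. Table and column names
# are assumptions, not the real Hermes schema.
sqlite3 /tmp/skills-demo.db <<'SQL'
CREATE VIRTUAL TABLE IF NOT EXISTS skills USING fts5(name, when_to_use, body);
INSERT INTO skills VALUES
  ('backfill-config-across-repo',
   'User wants to add the same config key to multiple files in a repo',
   '1. git grep -l ... 2. insert key ... 3. git add -p'),
  ('rotate-api-credentials',
   'User wants to rotate an API key across environments',
   '1. generate new key ... 2. update secrets store ...');
-- All query terms must match; best-ranked skill first
SELECT name FROM skills
WHERE skills MATCH 'config repo'
ORDER BY rank LIMIT 1;
SQL
```

Under these assumptions the query surfaces `backfill-config-across-repo`, because both terms appear in its `when_to_use` text and neither appears in the other skill's.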
What changes after a few weeks of use: the agent stops re-deriving workflows it has already done. Nous Research's internal benchmarks show that once an agent has accumulated 20+ self-created skills, similar future tasks finish about 40% faster than a fresh instance. The underlying model didn't get better; the agent just doesn't have to plan from scratch every time.
The honest caveat: skills only help when the new task pattern-matches an old one. Truly novel work doesn't benefit, and skill quality degrades if you use the agent for too wide a range of unrelated tasks without curation. v0.12's autonomous Curator was added precisely to rewrite or retire underperforming skills on a weekly schedule.
Three-Layer Memory: Where State Lives
Memory in Hermes is intentionally explicit, not a magic black box:
- `~/.hermes/memories/MEMORY.md` — general facts the agent should know across all conversations. Project names, conventions, ongoing work. You can edit it directly.
- `~/.hermes/memories/USER.md` — who you are and how you communicate. The agent updates this from observation via its `memory` tool; you can override.
- Honcho dialectic layer — a structured representation of you and recent conversations, retrievable by similarity. Runs behind an HTTP API; can be hosted or pointed at a local instance.
All of them are pulled into the system prompt at session start. The memory files are plain markdown — cat, $EDITOR, or git will all do the right thing. For provider configuration there are CLI helpers:
hermes memory setup # configure external memory providers
hermes memory status # show which providers are active
hermes sessions list # browse past sessions
Because everything is local SQLite + plain markdown by default, nothing leaves your machine unless you point Hermes at a hosted memory provider. The agent's persistent state is yours to grep, version-control, or back up.
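Because that state is plain files, version-controlling it takes a few commands. A minimal sketch, assuming `git` is installed (the `memories/` and `skills/` paths follow the article; the committer identity is a placeholder):

```shell
# Snapshot the agent's persistent state in a local git repo.
# Paths follow the article; adjust if your install differs.
mkdir -p ~/.hermes/memories ~/.hermes/skills   # no-op on an existing install
cd ~/.hermes
git init -q
git add memories skills
# Placeholder identity so the commit works in a fresh environment;
# "|| true" tolerates runs where nothing changed since the last snapshot.
git -c user.name=hermes-backup -c user.email=backup@localhost \
    commit -q -m "snapshot: memory and skills" || true
```

Rerun it (or cron it) after notable sessions and you get a diffable history of what the agent learned.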
Running Hermes as a Messaging Gateway
This is the deployment mode that makes Hermes feel structurally different from a CLI agent like Claude Code or Codex CLI.
A single Hermes process can listen on every configured platform simultaneously. The minimum Telegram bridge looks like:
# ~/.hermes/.env
TELEGRAM_BOT_TOKEN=your-token-from-botfather
TELEGRAM_ALLOWED_USERS=123456789
Then run the gateway in the foreground:
hermes gateway
Use hermes gateway setup for the interactive wizard that walks you through credentials for each platform you want enabled. The same agent, the same skills, the same memory — now reachable from your phone. The currently supported gateway platforms are CLI, Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Email, SMS, DingTalk, Feishu, WeCom, Weixin, QQ Bot, Yuanbao, BlueBubbles, Home Assistant, Microsoft Teams, and Google Chat — 20+ in one process.
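On a VPS you will want the gateway supervised so it survives crashes and reboots. A minimal systemd sketch, with the caveat that the unit name, service user, and binary path are assumptions; point `ExecStart` at wherever the installer put the `hermes` binary:

```shell
# Hypothetical systemd unit for a 24/7 gateway; paths and user are assumptions.
sudo tee /etc/systemd/system/hermes-gateway.service >/dev/null <<'EOF'
[Unit]
Description=Hermes Agent messaging gateway
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=hermes
ExecStart=/usr/local/bin/hermes gateway
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now hermes-gateway
```

`Restart=on-failure` gives you the always-on behavior the gateway mode implies without a separate process manager.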
The closest analog in mainstream tooling doesn't really exist. Claude Code and Codex are terminal-bound. Cursor is editor-bound. Hermes is the first widely-used agent that treats "where the user is" as a runtime parameter rather than a deployment decision.
When Hermes Wins, When It Doesn't
Let's be honest about this: Hermes is a different shape of tool, not strictly better.
Hermes wins when:
- You want one agent that follows you across surfaces (laptop, phone, group chats)
- Your workflow has high repetition the agent can capture as skills
- You want vendor independence and can pair it with a model gateway like ofox
- You'd benefit from persistent context across days or weeks
- You want full local control of memory, skills, and conversation history
Claude Code or Codex CLI win when:
- You're doing tightly editor-coupled work where IDE integration matters more than cross-session memory
- You want vendor-tuned reasoning behavior — Anthropic's Opus 4.7 in Claude Code or OpenAI's GPT-5.5 in Codex is more predictable than the same model called naively through a generic harness
- Your tasks are short and one-shot; skill accumulation is dead weight
- You don't want to maintain a persistent service
For most developers who've already invested in Claude Code, the answer is "use both." Hermes for the long-running, cross-surface, messaging-style work; Claude Code for tight editor-coupled coding sessions. Same ofox key powers both.
Cost Picture
The cost equation is simple in a way that's rare in this space:
| Line item | Cost |
|---|---|
| Hermes Agent software | Free (MIT licensed) |
| VPS to host messaging gateway | $5–$10/month (Hetzner CX11, DigitalOcean basic, etc.) |
| Local install | $0 |
| API tokens | Whatever you'd pay otherwise |
The agent itself adds no overhead beyond the prompt context for loaded skills (typically 500–1500 tokens). Routing through ofox keeps per-token costs as low as your model picks allow — the API cost reduction playbook applies directly here, especially the routing-by-task-difficulty pattern.
For an individual developer using Hermes a few hours a day on a mix of Sonnet 4.6 and DeepSeek V4 Pro through ofox, monthly token spend tends to land in the $15–$60 range. The gateway VPS is usually the smaller line item.
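That $15–$60 range is easy to sanity-check with arithmetic. The figures below are illustrative assumptions for the calculation, not ofox's actual rate card or your real usage:

```shell
# Back-of-envelope monthly spend. Every rate and volume here is an
# illustrative assumption, not actual ofox pricing.
awk 'BEGIN {
  days      = 30
  in_tok    = 1.5e6   # input tokens per day (assumed)
  out_tok   = 2.0e5   # output tokens per day (assumed)
  price_in  = 0.50    # $ per 1M input tokens, cheap-model blend (assumed)
  price_out = 2.00    # $ per 1M output tokens (assumed)
  total = days * (in_tok * price_in + out_tok * price_out) / 1e6
  printf "estimated spend: $%.2f/month\n", total
}'
```

Under these assumptions the estimate lands at $34.50/month, comfortably inside the article's range; double the volume or switch the blend toward Opus-class pricing and you climb toward the top of it.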
What's Coming Next
The companion project to watch is hermes-agent-self-evolution, released alongside v0.14. It uses GEPA — a technique accepted as an ICLR 2026 Oral — to read execution traces and propose targeted improvements to skills and system prompts, rather than just retrying failed attempts. The integration is still optional, but it's a credible path to agents that don't just learn what to do, but learn how to learn.
If you've been waiting for agent frameworks to stop feeling like wrappers over a chat completion call, this is the first one I'd actually point to. Self-improving agents aren't a 2027 problem anymore — Hermes shipped one, and 95K stars in seven weeks is the developer community saying it noticed.
Pair it with the right model for your agent workload, route through ofox so you can swap providers without re-plumbing, and give it a few weeks to build up a skill library. The gains compound faster than you'd expect.
Originally published on ofox.ai/blog.