
Aditya


The AI Agent That Learns While It Works — A Complete Guide to Hermes Agent

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge


Most AI Agents Are Goldfish. Hermes Is Different.

Let me describe the standard agentic experience of 2024–2025.

You open a terminal, give an agent a task, watch it spin through a dozen steps, and feel genuinely impressed — right up until you close the session. Tomorrow, you start completely from scratch. The agent has no memory of what it learned, no idea who you are, and zero awareness that it made the same mistake three sessions ago.

You're its first user. Every single time.

"We've been shipping amnesia as a feature and calling it 'stateless architecture.' The emperor has no clothes, and the emperor can't remember what clothes were."

This is the problem that Hermes Agent, built by Nous Research, is genuinely trying to solve. Not with a wrapper around an existing API. Not with a clever prompt. With a fundamentally different architecture: a closed learning loop baked into the agent itself.

The longer Hermes runs, the more capable it becomes — at working with you, specifically.


What Is Hermes Agent, Really?

Before we get into setup, the architecture, or how it compares to other frameworks — it's worth being clear about what Hermes actually is, because it doesn't fit neatly into existing categories.

It's not a coding copilot tethered to an IDE.
It's not a chatbot wrapper around a single API.
It's not a rigid workflow automation engine.

It's an autonomous agent that lives wherever you put it — a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. You can talk to it from Telegram while it works on a cloud VM you never SSH into yourself. It runs on your infrastructure. You own the runtime and the data.

Here's a quick snapshot of what ships in the box:

Stat | Number
Built-in tools | 70+
Messaging platform integrations | 20+
Terminal execution backends | 6
License | MIT

Part 1: Getting Started — From Zero to Running in Under 5 Minutes

Step 1 — Install

One command. Works on Linux, macOS, WSL2, and Android via Termux.

# Linux / macOS / WSL2 / Android (Termux)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Reload your shell after install:
source ~/.bashrc   # or source ~/.zshrc

# Windows (PowerShell, early beta):
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex

That's the whole install step. No dependency hunting. No pip install hell. The script handles everything and places the agent at ~/.hermes/hermes-agent.

Step 2 — Choose Your AI Provider

This is the most important setup step. Run hermes model for an interactive selection menu. Here are the main options worth knowing:

Provider | Best For | Setup Method
Nous Portal | Zero-config start, subscription-based | OAuth via hermes model
OpenRouter | Multi-model experimentation | API key
Anthropic | Claude models | OAuth (Max plan) or API key
GitHub Copilot | Using your existing subscription | OAuth via hermes model
Custom Endpoint | Local models via Ollama / vLLM / llama.cpp | Base URL + API key

⚠️ Critical: Hermes requires a model with at least 64,000 tokens of context. Most hosted models (Claude, GPT, Gemini, Qwen, DeepSeek) meet this easily. If you're running locally, set your context size to at least 64K (e.g., --ctx-size 65536 in llama.cpp).

You can switch providers at any time with hermes model — no lock-in.

Step 3 — Your First Session

hermes          # classic CLI
hermes --tui    # modern TUI with overlays and mouse support (recommended)

You'll see a welcome banner showing your provider, model, available tools, and loaded skills. Start with something easy to verify:

Summarize this repo in 5 bullets and tell me the main entrypoint.
Check my current directory and tell me what looks like the main project file.

What success looks like:

  • Banner shows your chosen model and provider
  • Hermes replies without error
  • It uses a tool when needed (terminal, file read, web search)
  • The conversation continues normally for more than one turn

Step 4 — Verify Sessions Work

This matters more than most tutorials mention:

hermes --continue   # Resume the most recent session
hermes -c           # Short form

If this works, you have persistent sessions. That's the foundation everything else is built on.

Step 5 — Add the Next Layer

Only after the base chat works. Don't skip ahead.

Messaging platforms (Telegram, Discord, Slack, WhatsApp, Signal, Email, and 15+ more):

hermes gateway setup

Skills — structured knowledge documents the agent loads on demand:

hermes skills search kubernetes
hermes skills install openai/skills/k8s

MCP servers — add to ~/.hermes/config.yaml:

mcp_servers:
  github:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_xxx"
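
A typo in this block is an easy way to end up with a server that silently never starts, so a quick sanity check before launching can help. The following is a minimal sketch, not part of Hermes: `check_mcp_servers` is a hypothetical helper, and it works on the mapping as a plain Python dict (what the YAML above would look like once parsed), so it needs no YAML library.

```python
# Hypothetical pre-flight check for an mcp_servers mapping; not part of Hermes.
# The dict mirrors the YAML above after parsing (parsing itself is out of scope).


def check_mcp_servers(servers: dict) -> list[str]:
    """Return human-readable problems found in an mcp_servers mapping."""
    problems = []
    for name, spec in servers.items():
        command = spec.get("command")
        if not isinstance(command, str) or not command:
            problems.append(f"{name}: 'command' must be a non-empty string")
        args = spec.get("args", [])
        if not isinstance(args, list) or not all(isinstance(a, str) for a in args):
            problems.append(f"{name}: 'args' must be a list of strings")
        for key, value in spec.get("env", {}).items():
            # Catch placeholder secrets such as "ghp_xxx" copied from a tutorial.
            if str(value).lower().endswith("xxx"):
                problems.append(f"{name}: env var {key} still holds a placeholder")
    return problems


example = {
    "github": {
        "command": "npx",
        "args": ["-y", "@modelcontextprotocol/server-github"],
        "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_xxx"},
    }
}

if __name__ == "__main__":
    for problem in check_mcp_servers(example):
        print(problem)
```

Run against the snippet above, the check flags only the placeholder token, which is the one thing you do need to replace before the GitHub server can authenticate.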

Docker sandbox — for safety on production work:

hermes config set terminal.backend docker

Voice mode:

cd ~/.hermes/hermes-agent
uv pip install -e ".[voice]"
# Then inside a session: /voice on  (Ctrl+B to record)

💡 Pro tip: Run hermes doctor any time something feels off. It diagnoses config problems and tells you exactly what to fix. Don't add features until hermes doctor is clean.


Part 2: How the Learning Loop Actually Works

This is the part that separates Hermes from the rest of the field. Let me walk through what actually happens under the hood when you use Hermes over time.

The Five-Stage Learning Loop

Stage 1 — Context Loading
Before the agent responds to anything, it loads MEMORY.md (persistent facts about you and your projects) and USER.md (a behavioral model of who you are). It also discovers and loads any context files in your project directory — .hermes.md, AGENTS.md, CLAUDE.md, SOUL.md. The agent starts every session knowing your history.
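
To make the discovery step concrete, here is a rough sketch of what "find the context files in a project directory" could look like. It is my own illustration, not Hermes's code: the function names, the lookup order, and the way the files are joined are all assumptions. Only the four file names come from the description above.

```python
import tempfile
from pathlib import Path

# File names taken from the paragraph above; the lookup order is an assumption.
CONTEXT_FILE_NAMES = (".hermes.md", "AGENTS.md", "CLAUDE.md", "SOUL.md")


def discover_context_files(project_dir: str) -> list[Path]:
    """Return the context files present directly inside project_dir."""
    root = Path(project_dir)
    return [root / name for name in CONTEXT_FILE_NAMES if (root / name).is_file()]


def build_context(project_dir: str) -> str:
    """Concatenate the discovered files into one labeled block of text."""
    sections = []
    for path in discover_context_files(project_dir):
        text = path.read_text(encoding="utf-8").strip()
        sections.append(f"## {path.name}\n{text}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing on disk is touched.
    with tempfile.TemporaryDirectory() as demo_dir:
        (Path(demo_dir) / "AGENTS.md").write_text("Use pytest.", encoding="utf-8")
        print(build_context(demo_dir))
```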

Stage 2 — Tool Selection and Multi-Step Planning
From 70+ built-in tools, Hermes selects what the task needs. It can spawn subagents via delegate_task — up to 3 concurrent child agents by default, each with isolated context, restricted toolsets, and their own terminal sessions. This is how it parallelizes complex work without the threads stepping on each other.
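
The "up to 3 concurrent child agents" cap is a classic bounded-concurrency pattern. The sketch below shows the general idea with an asyncio semaphore. It is not how delegate_task is implemented (I haven't verified that); `run_children` and the sleeping stand-in child are hypothetical, and only the default of 3 comes from the text above.

```python
import asyncio

MAX_CONCURRENT_CHILDREN = 3  # the default quoted above


async def run_children(
    tasks: list[str], limit: int = MAX_CONCURRENT_CHILDREN
) -> tuple[list[str], int]:
    """Run one stand-in child per task, never more than `limit` at a time.

    Returns the results in task order plus the peak number of children
    that were running simultaneously.
    """
    limiter = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def child(task: str) -> str:
        nonlocal running, peak
        async with limiter:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stands in for the child's real work
            running -= 1
            return f"done: {task}"

    results = await asyncio.gather(*(child(t) for t in tasks))
    return list(results), peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_children([f"task-{i}" for i in range(7)]))
    print(results, "peak concurrency:", peak)
```

Seven tasks go in, but no more than three ever run at once; the rest wait for a slot.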

Stage 3 — Skill Creation for Novel Tasks
When Hermes successfully completes a task it hasn't done before, it can write a reusable Skill document. Next time it faces a similar problem — even in a different session — it loads the Skill and executes efficiently without rediscovering the approach from scratch.

Stage 4 — Memory Consolidation
After sessions, Hermes uses FTS5 full-text search with LLM summarization to curate what's worth keeping. It doesn't dump raw logs into memory — it actively decides what to remember. This keeps memory bounded and useful even after hundreds of sessions. It also uses Honcho's dialectic user modeling to build a deepening picture of who you are across time.
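
If you haven't used FTS5 before, it is SQLite's built-in full-text search extension. The sketch below shows only the retrieval half of the idea, searching stored session summaries by keyword, and leaves the LLM summarization step out. The table name, column names, and sample notes are all invented for illustration; Hermes's real schema may look nothing like this. It assumes your Python's SQLite was built with FTS5, which is true of most distributions.

```python
import sqlite3

# Table and column names are illustrative only, not Hermes's schema.


def build_index(notes: list[tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index over (session_id, summary) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE notes USING fts5(session_id UNINDEXED, summary)"
    )
    conn.executemany(
        "INSERT INTO notes (session_id, summary) VALUES (?, ?)", notes
    )
    return conn


def search(conn: sqlite3.Connection, query: str, limit: int = 3) -> list[str]:
    """Return the session ids whose summaries match, best match first."""
    rows = conn.execute(
        "SELECT session_id FROM notes WHERE notes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_index(
        [
            ("s1", "User prefers pytest over unittest"),
            ("s2", "Deployed the blog to a VPS with Docker"),
            ("s3", "Fixed a flaky Docker build caused by a stale cache"),
        ]
    )
    print(search(conn, "docker"))
```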

Stage 5 — Self-Improvement
Skills created in previous sessions are eligible for improvement. Hermes can notice when an old Skill isn't working optimally and update it mid-use. The agent gets measurably better at your specific workflows over weeks and months.

This is a closed loop. Most agents have none of it.

The execute_code Power Move

This is the feature that surprised me most. The execute_code tool lets Hermes write Python scripts that call its own tools programmatically, via sandboxed RPC execution:

# Instead of: search → wait → read → wait → summarize → wait → write
# Hermes collapses this into a single LLM turn.
# Illustrative only: `tools` stands for whatever tool bindings the sandbox exposes.

async def research_pipeline(topic):
    results = await tools.web_search(query=topic, n=10)
    pages = [await tools.read_url(r.url) for r in results[:3]]
    summary = await tools.summarize(pages, style="technical")
    await tools.write_file("research.md", summary)
    return summary

This dramatically reduces the token cost of multi-step pipelines. Instead of burning inference tokens on "I will now do step 3 of 7," Hermes writes the whole pipeline as code and runs it. The LLM is only involved at the decision point, not at every mechanical step.
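
To see the shape of that saving without a live agent, here is a self-contained version of the same pipeline running against stub tools. Everything in `StubTools` is a stand-in I made up so the script runs anywhere; the method names simply mirror the snippet above and are not a documented Hermes API.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class SearchHit:
    url: str


@dataclass
class StubTools:
    """Stand-in for the tool namespace; every method here is hypothetical."""

    calls: list[str] = field(default_factory=list)
    files: dict[str, str] = field(default_factory=dict)

    async def web_search(self, query: str, n: int = 10) -> list[SearchHit]:
        self.calls.append("web_search")
        return [SearchHit(url=f"https://example.com/{query}/{i}") for i in range(n)]

    async def read_url(self, url: str) -> str:
        self.calls.append("read_url")
        return f"contents of {url}"

    async def summarize(self, pages: list[str], style: str = "technical") -> str:
        self.calls.append("summarize")
        return f"{style} summary of {len(pages)} pages"

    async def write_file(self, path: str, text: str) -> None:
        self.calls.append("write_file")
        self.files[path] = text


async def research_pipeline(tools: StubTools, topic: str) -> str:
    """Same steps as the snippet above, with the tools passed in explicitly."""
    results = await tools.web_search(query=topic, n=10)
    pages = [await tools.read_url(r.url) for r in results[:3]]
    summary = await tools.summarize(pages, style="technical")
    await tools.write_file("research.md", summary)
    return summary


if __name__ == "__main__":
    tools = StubTools()
    print(asyncio.run(research_pipeline(tools, "agents")))
    print(f"{len(tools.calls)} tool calls ran inside one script")
```

Six tool calls happen here, and in a step-by-step loop each one would be its own model round trip. As a script, the model only has to write the pipeline once.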

Terminal Backends: It's Not Tied to Your Laptop

Hermes ships six terminal backends, which let you separate where you talk to it from where it actually runs. The ones this guide touches on:

  • Local: direct execution, fast, simple
  • Docker: sandboxed isolation, the right choice for production work
  • SSH: talk locally, execute on a remote server
  • Daytona / Modal: serverless; the environment hibernates when idle and costs nearly nothing

The Modal and Daytona backends are worth understanding. You can talk to your Hermes agent from your phone via Telegram while the agent's actual work runs on a cloud VM. The environment sleeps when you're not using it. For a personal always-on assistant this changes the economics completely.

The Skills System and agentskills.io

Skills are structured knowledge documents — procedures the agent loads on demand. They follow progressive disclosure: Hermes loads just the skill index first, then drills into full skill content only if the task requires it. Token usage stays low even in long sessions.
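
Progressive disclosure is easier to picture in code. The toy library below keeps a one-line index entry per skill and hands out the full body only when asked. The class and field names are mine, and a real implementation would read skill files from disk rather than hold them in memory, so treat this as a sketch of the pattern and not of Hermes internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str  # short line that goes into the index
    body: str  # full procedure, loaded only on demand


class SkillLibrary:
    """Toy library; a real one would read skill files from disk."""

    def __init__(self, skills: list[Skill]) -> None:
        self._skills = {s.name: s for s in skills}
        self.bodies_loaded = 0

    def index(self) -> str:
        """Cheap listing that is always placed in context."""
        return "\n".join(
            f"{s.name}: {s.description}" for s in self._skills.values()
        )

    def load(self, name: str) -> str:
        """Full content, fetched only when a task needs this skill."""
        self.bodies_loaded += 1
        return self._skills[name].body


if __name__ == "__main__":
    library = SkillLibrary(
        [
            Skill("k8s-rollout", "Roll out a Deployment safely", "step\n" * 200),
            Skill("pg-backup", "Back up a Postgres database", "step\n" * 200),
        ]
    )
    print(library.index())
    print(len(library.index()), "chars in index vs",
          len(library.load("k8s-rollout")), "chars in one full skill")
```

The index costs a few dozen characters per skill no matter how long the procedures get, which is why context stays small until a skill is actually needed.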

Skills are compatible with the open agentskills.io standard — meaning skills you write for Hermes are portable and shareable with the community, and community skills work with your setup without any conversion.


Part 3: How Hermes Stacks Up Against Other Agentic Frameworks

There are serious alternatives in the open-source agent ecosystem. Here's an honest look at where Hermes fits.

Feature Hermes Agent AutoGen CrewAI OpenDevin
Persistent cross-session memory ✅ Native ⚠️ Limited
Autonomous skill creation ✅ Built-in
Multi-step tool use
Messaging platform gateway ✅ 20+ platforms
Runs on $5 VPS / serverless ✅ Yes ⚠️ Possible ⚠️ Possible ❌ GPU needed
Multi-agent delegation ✅ Subagents ✅ Core feature ✅ Crews / Flows ⚠️
Local / self-hosted LLM ✅ Any endpoint
Voice mode ✅ CLI + messaging
MCP server support ⚠️ Via plugins ⚠️ Via plugins
Primary focus Personal autonomous agent Multi-agent orchestration Role-based agent teams Software engineering

Plain-English Breakdown

AutoGen is brilliant if you want fine-grained control over agent-to-agent communication patterns. But it's a framework you orchestrate — not a ready-to-run agent. You write the coordination logic yourself.

CrewAI makes multi-agent teamwork feel intuitive — define crews with roles, let them coordinate. Great for structured pipelines. Less suited for the kind of open-ended "figure it out" sessions where Hermes excels.

OpenDevin is purpose-built for software engineering tasks: browsing, code execution, file editing. Excellent in its lane. That lane is narrower than Hermes's.

Hermes is the agent you'd deploy as your actual daily assistant — one that knows your name, remembers your projects, and gets measurably better at helping you over weeks and months. That's a genuinely different product from the others.

When NOT to use Hermes: If you need a complex multi-agent pipeline with dozens of specialized roles that coordinate on a strict workflow, AutoGen or CrewAI give you more structured control. If your only use case is automated software engineering on repos, Claude Code or OpenDevin are sharper tools. Hermes shines brightest as a personal, persistent, do-everything agent — not a single-purpose workflow engine.


Part 4: What Open Agentic Systems at This Capability Level Actually Mean

I want to say something that might be obvious, but I haven't seen it written plainly.

When your agent genuinely knows you — your preferences, your projects, your quirks, your bad habits — you need to be the one in control of that knowledge. Hermes runs on your infrastructure. You own what it learns about you.

That changes the trust calculus entirely.

Most AI products in this space are cloud-hosted services with "open-source" labels slapped on for marketing. You're renting access to an agent that lives on their servers, stores its state in their database, and disappears if they change their pricing. Hermes runs on a $5 VPS you control, hibernates when idle via Modal or Daytona, and costs nearly nothing when you're not actively using it. The economics and the ownership model are completely different.

The second thing worth saying: agents that compound matter more than agents that are capable on day one.

The tools with the most features in their initial release rarely win. The tools that get measurably better at working with you over time, that reduce the friction of repeated patterns, that remember what you learned together last Tuesday — those are the ones that stick.

Hermes is one of the very few systems being built with that end state explicitly in mind. Not as a future roadmap item. As a first-class architectural property, shipping today, MIT license, running on your hardware.

That's worth paying attention to.


Quick-Start Checklist

Before you go — here are the six things to actually do today:

  • [ ] Run the one-line installer
  • [ ] Run hermes model and pick a provider
  • [ ] Launch hermes --tui and complete your first conversation
  • [ ] Test session resume with hermes --continue
  • [ ] Find a skill with hermes skills search ... and install it with hermes skills install ...
  • [ ] Share what you built on DEV.to for the challenge 🎉

Resources


Built during the Hermes Agent Challenge, May 2026. If you found this useful, I'd love to see what you build — drop it in the comments.
