Truong Phung

🔮 Hermes Agent 🤖: Deep Dive & Build-Your-Own Guide 📘

A practical, end-to-end walkthrough of Nous Research's Hermes Agent: the principles it's built on, the architecture that makes it work, and a concrete checklist for building a similar self-improving agent yourself.




🤖 1. What Hermes Actually Is (in one paragraph)

Hermes is a model-agnostic, self-improving conversational agent that runs locally as a CLI/TUI, on a server as a messaging gateway (Telegram/Discord/Slack/WhatsApp/Signal), or as a scheduled cron worker. Its key differentiator is a closed learning loop: while solving problems with tools, it writes reusable "skill" documents and curates a persistent memory file, so the agent quite literally gets more capable the longer it runs. Everything is pluggable: model, tools, skills, memory backend, execution environment, UI.

Two ideas to internalize before you build anything:

  1. One agent, many surfaces. A single AIAgent class powers every interface. Surfaces (CLI, gateway, cron, batch, API) are thin entry points that construct an agent and call run_conversation().
  2. Procedural memory > clever prompting. Most "smart agent" behavior comes not from prompt engineering but from the agent owning a folder of markdown documents (skills + memory + persona) it can read, write, and grow over time.

🧭 2. Core Principles

These are the design rules Hermes follows. Keep them in mind for your own build; most "weird" decisions in the codebase trace back to one of these.

2.1 ๐ŸŒ Platform-agnostic core

The agent doesn't know whether it's running in a terminal, a Telegram chat, or a cron job. All platform specifics live in adapters that translate platform events → agent.run_conversation(...) and translate the response back. If you find yourself adding a Telegram-specific if branch inside core agent code, you've drifted from the architecture.

2.2 🔒 Prompt stability (cache-friendly)

The system prompt is assembled once at session start and does not mutate mid-conversation. This isn't aesthetic; it's economic. Anthropic and OpenAI prompt caches require a stable prefix to get hits. Mid-conversation toolset changes, memory reloads, or skill swaps invalidate the cache and 10× your cost. Defer changes to "next session" by default.

2.3 ๐Ÿ” Progressive disclosure

Don't load every skill, every memory, every tool's full docs into the system prompt. Load descriptions (Level 0). Let the agent pull in full content (Level 1) only when it actually needs that skill. Load referenced files (Level 2) only when the skill itself requests them. This is how Hermes can ship 47 tools and dozens of skills while staying under context limits.

2.4 ๐Ÿ“ Self-registration over central lists

Tools and plugins should register themselves at import time (registry.register(...)) rather than being added to a hand-maintained __all__ list. New tool = one new file, no edits elsewhere.

2.5 🧱 Profile isolation

Multiple independent agent instances coexist by each owning a HERMES_HOME directory (default ~/.hermes/, override via env var). Every filesystem path in the codebase goes through get_hermes_home(); never hard-code ~/.hermes.

2.6 🎒 The agent owns its own learning artifacts

Skills are not added by humans editing source code. The agent writes them via a tool called skill_manage after solving a non-trivial task. Memory is not curated by humans; the agent edits MEMORY.md and USER.md between turns. This is the loop.


๐Ÿ—๏ธ 3. High-Level Architecture

```
┌──────────────────────────────────────────────────────────────────┐
│                          ENTRY POINTS                            │
│  CLI / TUI / Gateway (TG, Discord, Slack) / Cron / Batch / API   │
└───────────────────┬──────────────────────────────────────────────┘
                    │   each entry point builds an AIAgent
                    ▼
┌──────────────────────────────────────────────────────────────────┐
│                       AIAgent (core loop)                        │
│  build_system_prompt → call model → dispatch tool calls → repeat │
└─────┬─────────────┬────────────────┬────────────────┬────────────┘
      │             │                │                │
      ▼             ▼                ▼                ▼
┌──────────┐  ┌──────────┐    ┌────────────┐   ┌─────────────┐
│  Tools   │  │  Skills  │    │   Memory   │   │  Providers  │
│ Registry │  │  Loader  │    │  Manager   │   │ (model API) │
└──────────┘  └──────────┘    └────────────┘   └─────────────┘
      │
      ▼
┌──────────────────────────────────────────────────────────────────┐
│  Execution Environments: local / Docker / SSH / Modal / Daytona  │
└──────────────────────────────────────────────────────────────────┘
```

Three tiers, in plain English:

  • Tier 1 (Surfaces): how a human or system talks to the agent (CLI, chat platforms, cron).
  • Tier 2 (Agent core): the loop, plus the four pluggable subsystems (tools, skills, memory, model).
  • Tier 3 (Execution backends): where shell/code-running tools actually run. Local laptop today, sandboxed Docker tomorrow, Modal cloud in production.

🔄 4. The Agent Loop (the heart of everything)

This is the single most important piece. The whole AIAgent class is essentially this loop:

```
1. Receive input            → from CLI / gateway / cron / ACP / web
2. Build system prompt      → persona + memory + skills + tools (ONCE per session)
3. Resolve provider         → which API key + endpoint for the chosen model
4. Call model               → one of FOUR API modes (auto-detected by endpoint/model):
                              chat_completions | codex_responses |
                              anthropic_messages | bedrock_converse
5. Parse response
   ├─ if tool calls present → dispatch each via registry → append results → GOTO 4
   └─ else                  → final assistant message → display → persist → done
6. Persist                  → SQLite SessionDB (WAL mode + FTS5 index)
```

A few non-obvious details that matter:

  • Iteration budget: more nuanced than a simple counter. A thread-safe IterationBudget is shared across the parent agent and any subagents it spawns. execute_code refunds iterations on completion so a programmatic tool-loop doesn't drain the budget. On exhaustion: one warning message is injected (_budget_exhausted_injected), exactly one final API call is allowed (_budget_grace_call), then summarization is forced. No intermediate warnings; that's deliberate, to prevent the model from giving up early.
  • Reasoning content is stored separately from the visible assistant message (OpenAI o-series and Anthropic extended thinking both produce hidden "reasoning" tokens). Keep them in their own field; they're needed for cache validity but shouldn't be displayed. Callbacks: stream_delta_callback, interim_assistant_callback, thinking_callback, reasoning_callback.
  • Streaming with stateful scrubbing. A _stream_context_scrubber strips <memory-context> spans even when they're split across chunks; don't underestimate how fiddly this gets when tags straddle network boundaries.
  • Compression, not truncation. When context fills, a context_compressor summarizes middle turns rather than dropping them. The summary itself becomes a message. Lossy is fine; lossless will OOM.
  • Interrupts. Ctrl-C mid-tool-call must cleanly cancel the in-flight tool, append a "user interrupted" tool result to history, and return control. Don't kill the whole loop โ€” let the agent see the interruption and respond.
  • Session resumption. --continue / --resume flags load prior history via SessionDB.get_messages(). SQLite WAL mode + a custom retry layer (20–150 ms jitter, BEGIN IMMEDIATE) handle multi-process write contention. A recap is shown to the user before continuing.

🧩 5. System Prompt Assembly

A prompt_builder.build_system_prompt() function concatenates these sections, in this order:

  1. Persona: SOUL.md / DEFAULT_AGENT_IDENTITY. Identity, voice, values.
  2. Platform hints: PLATFORM_HINTS. Tells the model whether it's running in CLI, Telegram, Slack, etc., which changes formatting rules (no MarkdownV2 in CLI, no nested code blocks in Telegram, …).
  3. Memory guidance: MEMORY_GUIDANCE. Embeds a frozen snapshot of MEMORY.md + USER.md as a single block (separated by a § delimiter). Size-capped (~2200 chars MEMORY, ~1375 chars USER).
  4. Session search guidance: SESSION_SEARCH_GUIDANCE. Tells the agent it can search prior sessions via FTS5, with a small example.
  5. Skills guidance: SKILLS_GUIDANCE. The Level-0 skills index plus the heuristic prose nudging the agent to create skills after solving hard tasks.
  6. Context files: AGENTS.md and .hermes.md from the working directory.
  7. Tool-use enforcement: TOOL_USE_ENFORCEMENT_GUIDANCE. Hard rules about parallel calls, error recovery, etc.
  8. Tool schemas: JSON schemas for all enabled tools.

Then prompt_caching.py inserts cache breakpoints (Anthropic cache_control: {type: ephemeral}; equivalents for other providers). The whole assembled prefix becomes the cacheable region.
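To make the ordering concrete, here's a minimal sketch of an assembly function in this spirit. The section sources follow the list above; the helper names, the skills INDEX.md file, and the exact guidance strings are illustrative, not Hermes's actual API:

```python
from pathlib import Path

def build_system_prompt(home: Path, cwd: Path, tool_schemas_json: str) -> str:
    """Assemble the frozen prefix ONCE per session, in the order above."""
    def read(p: Path) -> str:
        return p.read_text(encoding="utf-8") if p.exists() else ""

    # Frozen memory snapshot: size-capped, § delimiter between the two files.
    memory_block = read(home / "MEMORY.md")[:2200] + "\n§\n" + read(home / "USER.md")[:1375]
    sections = [
        read(home / "SOUL.md"),                                    # 1. persona
        "Platform: CLI. Plain text, no MarkdownV2.",               # 2. platform hints
        memory_block,                                              # 3. memory snapshot
        "You may search prior sessions with session_search.",      # 4. session search
        "Available skills (Level 0):\n" + read(home / "skills" / "INDEX.md"),  # 5
        read(cwd / "AGENTS.md") + read(cwd / ".hermes.md"),        # 6. context files
        "Tool rules: parallel calls where possible; recover from errors.",  # 7
        tool_schemas_json,                                         # 8. tool schemas
    ]
    return "\n\n".join(s for s in sections if s.strip())
```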

The frozen-snapshot pattern (this is the trick). MEMORY.md and USER.md are read once at session start and embedded immutably in the system prompt for the rest of the session. The agent can still write to those files on disk during the session, but the system prompt does not change. Result: cache stays valid across the whole conversation, and the new memory takes effect next session. Skip this and you destroy your prefix cache.

Memory security scan. Before injection, MEMORY/USER content is scanned for prompt-injection patterns, exfiltration attempts (curl/wget referencing env vars), persistence backdoors, and invisible Unicode. A poisoned memory file is the agent's prion disease; scan defensively.

Key rule: sections 1–8 are frozen for the session. User messages and tool results are appended to history; they don't go into the system prompt.


๐Ÿ› ๏ธ 6. Tools System

6.1 📦 Self-registering registry

A central tools/registry.py exposes:

```python
registry.register(
    name="read_file",
    toolset="filesystem",
    schema={...JSON schema...},
    handler=read_file_handler,
    available=lambda ctx: True,   # gating predicate
)
```

Every tool file calls this at module import. The registry handles:

  • Schema collection for the system prompt.
  • Dispatch by name when the model emits a tool call.
  • Availability filtering (per-user, per-platform, per-toolset).
  • Error wrapping: any exception in a handler is converted into a tool result the model can see and react to. Never let a tool crash the loop.

All handlers return JSON strings, not Python objects. The model only ever sees text.
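For example, a handler under that contract might look like this (a sketch; Hermes's real handlers and signatures differ):

```python
import json

def read_file_handler(args: dict) -> str:
    """Returns a JSON string in every case; never raises into the loop."""
    try:
        with open(args["path"], "r", encoding="utf-8") as f:
            return json.dumps({"ok": True, "content": f.read()})
    except Exception as e:
        # Error wrapping: the model sees the failure as a tool result
        # and can react to it instead of the exception killing the loop.
        return json.dumps({"ok": False, "error": f"{type(e).__name__}: {e}"})
```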

6.2 ๐Ÿ—‚๏ธ Toolsets

Tools group into logical sets (filesystem, web, browser, code, mcp, vision, audio, …). Hermes ships roughly 40-plus tools; the docs say "47 built-in" in some places and "40+" in others, and AGENTS.md treats the filesystem as the canonical source because counts shift constantly, so don't hard-code numbers in your own version. Users enable/disable by toolset rather than tool-by-tool. Disabled toolsets are completely absent from the system prompt, which saves tokens and prevents the model from even knowing about them.

6.3 ๐Ÿ–ฅ๏ธ Execution environments

Tools that run shell commands or code go through an environment abstraction (tools/environments/):

| Backend | Use case |
| --- | --- |
| local | Dev on your laptop. Fastest. Zero isolation. |
| docker | Shared dev box. One container per session. |
| ssh | Remote VM. Treat the VM as the agent's "computer". |
| daytona / modal | Serverless sandboxes for production. Auto-spin-up. |
| singularity | HPC clusters. |

Same tool, different blast radius. The agent doesn't know; only the environment changes.

6.4 🤖 Agent-level tools

A few tools (todo_*, memory_*, skill_manage, skills_list, skill_view) are intercepted before the generic tool dispatch and handled by the agent itself, because they mutate agent state (memory, skills, todo list) rather than the outside world. Keep this category small and explicit.

6.5 🔗 MCP integration

Model Context Protocol servers can be plugged in as additional tool sources. Hermes treats each MCP server as a virtual toolset, lets users filter individual tools, and dispatches calls through the same registry. This is how you get a long tail of integrations (GitHub, Slack, Linear, ...) without writing them yourself.

6.6 ๐Ÿ›ก๏ธ Tool approval & safety (the layered defense)

Shell tools are dangerous. Hermes layers four mechanisms:

  1. Tirith: an external Rust-based scanner with auto-install + SHA-256 verification. Detects homograph URLs, terminal-injection attacks (ANSI escapes that hide commands), and known dangerous patterns.
  2. Regex dangerous-command detection: runs on a normalized command string (case-insensitive, whitespace-collapsed) so attackers can't bypass it via RM -RF.
  3. Smart Approval: an LLM risk-rates each command. Low-risk commands auto-approve; medium/high-risk commands block for human approval.
  4. Approval scopes: when a human approves, they pick Once / Session / Permanent. Trust accumulates instead of asking on every call.

When the agent is running on a messaging gateway and needs approval, it uses a threading.Event to block until the human responds in chat. A /yolo command bypasses approval entirely for trusted sessions. Sandboxed backends auto-bypass approval (the Docker/Modal sandbox is the safety boundary; double-prompting is just friction).
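The blocking pattern itself is simple; a minimal sketch (class and method names are illustrative, not Hermes's):

```python
import threading

class ApprovalGate:
    """Parks the tool thread until a human answers in chat."""
    def __init__(self) -> None:
        self._event = threading.Event()
        self._approved = False

    def request(self, command: str, send_to_chat) -> bool:
        send_to_chat(f"Approve this command? `{command}` (yes/no)")
        self._event.wait(timeout=300)   # tool thread blocks here
        return self._approved

    def resolve(self, approved: bool) -> None:
        """Called from the chat handler when the human replies."""
        self._approved = approved
        self._event.set()
```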


🧠 7. Skills System (the killer feature)

7.1 📄 What a skill is

A skill is a markdown document with YAML frontmatter that teaches the agent how to do one thing well. Not code. Not a config. A runbook the agent reads.

```markdown
---
name: deploy-staging
description: Push current branch to staging via Vercel and verify health.
version: 1.2.0
platforms: [macos, linux]
requires_toolsets: [shell, web]
fallback_for_toolsets: []
required_environment_variables: [VERCEL_TOKEN]
tags: [deploy, vercel]
category: devops
---

## When to Use
The user asks to "ship", "deploy to staging", or "preview this branch".

## Procedure
1. Run `git status` — abort if dirty.
2. Run `vercel --token=$VERCEL_TOKEN`.
3. Poll `/healthz` until 200 OR 60s timeout.
4. Report the preview URL.

## Pitfalls
- Don't deploy from `main` — only feature branches.
- If the build fails, fetch logs via `vercel logs <deployment>`.

## Verification
The healthz endpoint returns `{"status":"ok"}`.
```

7.2 ๐Ÿ“ Where skills live

```
~/.hermes/skills/
├── devops/deploy-staging/
│   ├── SKILL.md              ← the file above
│   ├── references/           ← extra docs the skill can pull in
│   ├── templates/            ← file templates
│   ├── scripts/              ← helper scripts the agent can run
│   └── assets/               ← images, etc.
├── .hub/                     ← installed from skills hub
└── .bundled_manifest         ← what shipped with Hermes
```

7.3 ๐Ÿ” Progressive disclosure (3 levels)

This is what keeps token usage sane:

| Level | What loads | When |
| --- | --- | --- |
| 0 | name, description, category | Always, in the system prompt |
| 1 | Full SKILL.md content | When the agent decides to use the skill |
| 2 | Files in references/, scripts/ | When the skill body says "see references/foo.md" |

The agent calls a read_skill (or equivalent) tool to escalate from L0 to L1 to L2.
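A sketch of the Level-0 scan, assuming PyYAML and the frontmatter fields from the example above (the index format is illustrative):

```python
from pathlib import Path
import yaml  # PyYAML, assumed available

def level0_index(skills_root: Path) -> str:
    """Emit only name/description/category into the system prompt."""
    lines = []
    for skill_md in sorted(skills_root.glob("**/SKILL.md")):
        # SKILL.md = "---\n<frontmatter>\n---\n<body>"; body stays on disk (Level 1).
        _, frontmatter, _body = skill_md.read_text(encoding="utf-8").split("---", 2)
        meta = yaml.safe_load(frontmatter)
        lines.append(f"- {meta['name']} [{meta.get('category', 'misc')}]: {meta['description']}")
    return "\n".join(lines)
```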

7.4 ⚡ Triggering

Three ways a skill activates:

  1. Slash command: user types /deploy-staging please ship #123.
  2. Natural language: "deploy this to staging"; the agent matches against L0 descriptions and pulls in L1.
  3. Programmatic: cron jobs explicitly attach skills.

7.5 ๐ŸŽ›๏ธ Conditional activation

Frontmatter fields gate visibility:

  • platforms: [linux] hides the skill on macOS.
  • fallback_for_toolsets: [web] makes it visible only if no premium web tool is enabled (e.g., a DuckDuckGo skill that fills in when Brave Search isn't configured).
  • requires_toolsets: [shell] hides it if the shell tool is disabled.

This makes the skill catalog adapt to the deployment.

7.6 ๐Ÿ” Self-improvement: the skill_manage tool

The agent uses two complementary tools:

  • Read path: skills_list (browse Level-0 index) and skill_view (escalate to Level-1/2 content).
  • Write path: skill_manage, a meta-tool with sub-operations:
| Action | Effect |
| --- | --- |
| create | New skill from scratch |
| patch | Surgical text replacement (preferred for updates) |
| edit | Full rewrite |
| delete | Remove skill (restricted to user/agent-created skills; bundled ones can't be deleted) |

Note: file management within a skill (references/, scripts/) goes through generic write_file / remove_file tools scoped to the skill's directory.

The SKILLS_GUIDANCE block in the system prompt explicitly nudges the agent to create a skill after:

  • Solving a task that took 5+ tool calls.
  • Finding a non-obvious workaround.
  • Discovering a workflow it might repeat.

Skill installation from the hub is user-driven only (security). The agent never installs untrusted skills on its own; it can only skill_manage create from its own experience.

This is the closed learning loop. The agent writes its own playbooks while it works.

7.7 ๐ŸŒ Skills hub & sharing

Skills are portable markdown; they're trivially shareable. Hermes integrates with multiple sources (official/, skills-sh/, github/, well-known/, url, clawhub, lobehub). On install, each skill is security-scanned for prompt injection, data exfiltration, and destructive commands before being trusted. Trust tiers: builtin > official > community.

The format is the open agentskills.io standard, meaning skills written for Hermes work in other compatible agents.


💾 8. Memory System

Three independent mechanisms working together (the "3-layer" framing is a teaching simplification; in the code they're orthogonal):

🧊 Mechanism 1: Frozen-snapshot persistent memory

Two markdown files, both agent-curated:

  • MEMORY.md: facts. "Project ships every Tuesday." "Test DB password is in 1Password vault X." (~2200 char cap)
  • USER.md: user model. "Prefers terse answers." "Senior Go engineer, new to React." (~1375 char cap)

A MemoryStore reads them once at session start and embeds them in the system prompt as a single immutable block (delimited by §). The agent can write to those files mid-session (and the writes go to disk), but the system prompt's copy doesn't change until next session. This is what keeps the prefix cache valid.
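A sketch of the freeze semantics (file names and caps follow the article; everything else is illustrative):

```python
from pathlib import Path

class MemoryStore:
    """Read once at session start; mid-session disk writes don't flow back."""
    def __init__(self, home: Path) -> None:
        def read(name: str, cap: int) -> str:
            p = home / name
            return p.read_text(encoding="utf-8")[:cap] if p.exists() else ""
        # This snapshot goes into the system prompt and never changes.
        self.snapshot = read("MEMORY.md", 2200) + "\n§\n" + read("USER.md", 1375)

    def append_fact(self, home: Path, fact: str) -> None:
        """Write path still works mid-session; takes effect next session."""
        with open(home / "MEMORY.md", "a", encoding="utf-8") as f:
            f.write(f"\n- {fact}")
```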

๐Ÿ—ƒ๏ธ Mechanism 2 โ€” Cross-session recall via SessionDB

A SessionDB (SQLite, WAL mode, FTS5 full-text index) stores every prior conversation turn. On demand, the agent uses a session_search tool to query it; an LLM summarizer condenses hits into a paragraph that fits in context. Multi-process write contention is handled with BEGIN IMMEDIATE + a custom retry loop (20–150 ms jitter).

🔌 Mechanism 3: Pluggable provider (Honcho / mem0 / supermemory)

This is a swap-in, not an additional layer. A single MemoryProvider ABC (agent/memory_provider.py); orchestration via agent/memory_manager.py. Lifecycle hooks: prefetch() (before model call), sync_turn() (after turn), shutdown().

Provider knobs that matter:

  • Recall mode: hybrid / context / tools. Tools-mode lets the model decide when to query; context-mode just injects relevant memories every turn.
  • Write frequency: async / turn / session / numeric (every N turns).
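
A sketch of the provider interface (the hook names follow the article; the constructor knobs are illustrative):

```python
from abc import ABC, abstractmethod

class MemoryProvider(ABC):
    def __init__(self, recall_mode: str = "hybrid", write_frequency: str = "turn"):
        self.recall_mode = recall_mode          # hybrid | context | tools
        self.write_frequency = write_frequency  # async | turn | session | N

    @abstractmethod
    def prefetch(self, user_id: str, query: str) -> str:
        """Before the model call: return memories to inject (context mode)."""

    @abstractmethod
    def sync_turn(self, user_id: str, user_msg: str, assistant_msg: str) -> None:
        """After a turn: persist whatever was learned."""

    @abstractmethod
    def shutdown(self) -> None:
        """Flush pending writes when the session ends."""
```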

Honcho's "dialectic" deserves a note because it sounds mystical and isn't: it runs three sequential reasoning passes (Initial Assessment → Self-Audit → Reconciliation), with depth controlled by dialecticDepth (1–3). It's effectively self-critique chained for higher-quality user modeling.

You only have one active provider at a time. Pick the right abstraction for your use case (Honcho for deep user modeling, mem0/supermemory for vector recall, none if files+FTS5 are enough).


🔌 9. Plugin System

A PluginManager discovers plugins from three places:

  1. ~/.hermes/plugins/ (user-level)
  2. ./.hermes/plugins/ (project-level)
  3. pip entry points (hermes.plugins)

Each plugin defines a register(ctx) function and can hook into lifecycle events:

  • pre_tool / post_tool
  • pre_llm / post_llm
  • session_start / session_end

…and can register new tools, new CLI commands, or replace memory providers.

Iron rule: plugins must NEVER modify core files. If a plugin needs something the framework doesn't expose, the framework grows a generic hook, not a special-case import. This keeps the plugin surface stable.
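
A sketch of what a user-level plugin might look like under that contract (the ctx API shown here is illustrative, not Hermes's actual one):

```python
# ~/.hermes/plugins/shout.py (hypothetical example plugin)
import json

def register(ctx):
    def post_llm(response: str) -> str:
        # Lifecycle hook: runs after every model call.
        return response.rstrip() + "\n"

    ctx.add_hook("post_llm", post_llm)
    ctx.register_tool(
        name="shout",
        toolset="demo",
        schema={"type": "object", "properties": {"text": {"type": "string"}}},
        handler=lambda args: json.dumps({"ok": True, "text": args["text"].upper()}),
    )
```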


📋 9b. The COMMAND_REGISTRY Pattern (worth stealing)

A single COMMAND_REGISTRY constant in hermes_cli/commands.py is the source of truth for every slash command. From this one structure, the codebase auto-derives:

  • CLI dispatch
  • Gateway hooks (so /skill foo works in Telegram)
  • Telegram inline menu entries
  • Slack slash subcommands
  • prompt_toolkit autocomplete
  • /help text

Adding a new slash command is one new CommandDef entry plus a handler. Zero scattered edits. This is the same pattern as the tools registry, applied to UI commands. Steal it for your own build; it's how Hermes scales surface area without scaling maintenance.
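
A minimal sketch of the idea (field names are illustrative):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class CommandDef:
    name: str                   # "/skill"
    help: str                   # shown in /help and autocomplete
    surfaces: tuple[str, ...]   # ("cli", "telegram", "slack")
    handler: Callable[[str], str]

COMMAND_REGISTRY: dict[str, CommandDef] = {}

def register_command(defn: CommandDef) -> None:
    COMMAND_REGISTRY[defn.name] = defn

# Everything downstream is derived, never hand-maintained:
def help_text() -> str:
    return "\n".join(f"{c.name}: {c.help}" for c in COMMAND_REGISTRY.values())
```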


🎨 9c. Skin Engine (theming as data)

Skins are YAML files in ~/.hermes/skins/ (with inheritance from default). One YAML controls 18 named colors, spinner faces and verbs, agent name and greeting/farewell, prompt symbols, tool emojis, and ASCII banners with Rich markup. Ten built-in skins ship (default, daylight, mono, poseidon, charizard, …). Hermes Mod ships a web editor with live preview and image-to-ASCII conversion.

The takeaway is architectural: branding lives in YAML, not code. A user can fork the look without touching Python. This matters more than you'd think for an agent users live inside for hours.


📡 9d. Multimodal & Streaming

  • Vision: vision_analyze tool. Anthropic image-to-text fallback caching via _anthropic_image_fallback_cache (when a model can't see images natively, the cache avoids re-describing them).
  • Audio out: text_to_speech tool.
  • Audio in: voice-memo transcription on the input side.
  • Browser tool: injects multimodal context (screenshots + DOM + extracted text).
  • Streaming: _stream_callback, _current_streamed_assistant_text, plus the stateful _stream_context_scrubber that strips <memory-context> spans even across chunk boundaries.

🎓 9e. RL / Atropos Training Integration (environments/)

This is arguably the point of the project for Nous Research, not a side-feature. The environments/ directory wraps Hermes for reinforcement-learning training:

  • HermesAgentBaseEnv: abstracts tool resolution and sandbox wiring.
  • HermesAgentLoop: runs the tool-call loop in a way RL rollouts can drive.
  • ToolContext: exposes the sandbox to reward functions (so a reward can grep the filesystem to verify the agent did the work).
  • resize_tool_pool: prevents thread-pool deadlocks during parallel rollouts.
  • Two-phase training pipeline:
    • Phase 1: VLLM/SGLang native tool-call parsing.
    • Phase 2: ManagedServer raw-token parsing, needed for Hermes's XML-style tool tags and DeepSeek's Unicode delimiters.
  • Three-layer tool-result budgeting: per-tool truncation → sandbox spillover with previews → per-turn budget. Without this, a single ls / blows out the context window of a training rollout.
  • Pre-integrated benchmarks: TerminalBench 2.0, YC-Bench, WebResearch.

You probably don't need this for a v1 of your own agent. Just know the hooks are there if you ever want to fine-tune your model on its own traces.


๐Ÿ–ฅ๏ธ 10. Surfaces โ€” How the Agent Reaches Users

The same AIAgent powers six distinct surfaces. Each is a thin adapter, not a re-implementation.

10.1 💻 CLI (classic)

cli.py (~11k lines). Rich-based panels, prompt_toolkit input with autocompletion, animated spinners (KawaiiSpinner), activity feeds during API calls.

10.2 ๐Ÿ–ผ๏ธ TUI (hermes --tui) โ€” genuinely novel

Not just a fancier CLI. Architecture:

  • Frontend: Node.js + React Ink.
  • Backend: Python tui_gateway/server.py.
  • Wire format: newline-delimited JSON-RPC 2.0 over stdio.

The Python side redirects print to stderr so stdout stays clean for the protocol. A persistent _SlashWorker subprocess runs slash commands, and slow handlers route through a ThreadPoolExecutor so interrupts stay responsive. Distinctive features: streaming chain-of-thought with braille spinners, a ToolTrail tree visualization, virtual-history viewport (only render visible rows), mouse selection.

Design rule from AGENTS.md: Do not re-implement the chat surface in React. The transcript, composer, and slash-command behavior belong to the embedded TUI. Sidebars and inspectors are fine โ€” replacement views are not.

10.3 📨 Gateway (messaging platforms)

Telegram, Discord, Slack, WhatsApp, Signal. Each adapter:

  1. Connects to the platform (websocket / long-poll / webhook).
  2. On incoming message: authorizes the user, derives a stable session_key, looks up the session in SessionDB, instantiates an AIAgent with that history.
  3. Calls agent.run_conversation().
  4. Formats and sends the response back (Telegram's MarkdownV2 vs Discord's flavor vs Slack's mrkdwn; this lives in the adapter).

10.4 🔗 ACP (Agent Client Protocol): for AI-native editors

ACP is the standard protocol Zed and emerging VS Code integrations use to talk to agents. Hermes implements HermesACPAgent. ACP sessions are tied to the editor's cwd and persist in the same shared SessionDB. Hermes tools map to ACP semantic types (e.g. read_file → read), and the IDE can register MCP servers that the agent then sees as additional toolsets.

10.5 ๐ŸŒ Web UI (hermes web)

React SPA in web/ + FastAPI in hermes_cli/web_server.py. Tabs: Status, Sessions (FTS5 search UI), Config (form + raw YAML), Cron, Skills. Security: ephemeral session tokens, DNS rebinding protection, CORS, rate limiting. EN/中文 localization.

10.6 โฐ Cron scheduler (~/.hermes/cron/)

Not APScheduler. A custom scheduler with a 60-second tick() loop running on a background thread inside the gateway process. Jobs are stored as JSON in ~/.hermes/cron/jobs.json (not SQLite). Outputs persist to ~/.hermes/cron/output/{job_id}/{timestamp}.md.

Job definition supports:

  • Intervals (every 30m), 5-field cron, one-shot durations, ISO timestamps.
  • A prompt field (the user message to send).
  • An optional skills list to attach before execution (so a "review-PRs" cron job can pre-load a pr-review skill).
  • Delivery target: local (write only), origin (back to where the job was created), or platform:chat_id (post to a specific Telegram/Slack chat).

Each tick: a fresh AIAgent with no history is created, attached skills load, the prompt runs, output is delivered, job state updates.
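
A sketch of that tick loop (interval-only and illustrative; the real scheduler also parses 5-field cron expressions, one-shots, and delivery targets):

```python
import json, threading, time
from pathlib import Path

def start_cron(jobs_path: Path, run_job) -> None:
    """run_job(job) should build a fresh agent, attach skills, run the prompt."""
    def tick() -> None:
        while True:
            jobs = json.loads(jobs_path.read_text()) if jobs_path.exists() else []
            now = time.time()
            for job in jobs:
                if job.get("next_run", 0) <= now:
                    run_job(job)  # fresh AIAgent, no history
                    job["next_run"] = now + job.get("interval_seconds", 1800)
            jobs_path.write_text(json.dumps(jobs, indent=2))
            time.sleep(60)

    threading.Thread(target=tick, daemon=True).start()
```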

10.7 ๐Ÿญ Batch runners (the training data pipeline)

Two siblings in the repo root:

  • batch_runner.py: BatchRunner over multiprocessing.Pool, one isolated AIAgent per worker. toolset_distributions.py samples toolsets per prompt by independent inclusion probabilities. Checkpointing in checkpoint.json is keyed on prompt text, not index (so prompt-list edits don't invalidate the checkpoint). Outputs trajectories formatted for HuggingFace; reasoning is detected via <REASONING_SCRATCHPAD> tags or native thinking tokens, and trajectories without reasoning are discarded.
  • mini_swe_runner.py: a sibling runner for SWE-style benchmark runs.

These are how Nous generates training data from real agent runs.


👤 11. Profiles & Multi-Instance

You want to run a "personal" agent and a "work" agent on the same machine without their memories crossing? Profiles.

Implementation is dead simple but the timing is critical:

  • Each profile has its own HERMES_HOME directory.
  • _apply_profile_override() in hermes_cli/main.py sets HERMES_HOME before any other module imports run. If you set it after imports, modules that read paths at import time will use the wrong home (a minimal sketch follows this list).
  • Every path lookup goes through get_hermes_home(). Hardcoded ~/.hermes paths anywhere in the codebase break profile isolation.
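
The whole mechanism fits in a few lines; a sketch (the profile-to-directory naming here is an assumption, not Hermes's scheme):

```python
import os
from pathlib import Path

def get_hermes_home() -> Path:
    """The single choke point every filesystem path must go through."""
    return Path(os.environ.get("HERMES_HOME", str(Path.home() / ".hermes")))

def apply_profile_override(profile: str | None) -> None:
    # Must run BEFORE importing modules that compute paths at import time;
    # anything already imported has baked in the old HERMES_HOME.
    if profile:
        os.environ["HERMES_HOME"] = str(Path.home() / f".hermes-{profile}")
```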

Things to get right:

  • Tests must mock both Path.home() and the HERMES_HOME env var; getting one but not the other leads to flaky failures.
  • Gateway adapters acquire a per-profile token lock so two profiles can't both try to consume the same Telegram bot token.
  • Honcho identities (and other memory provider IDs) are profile-scoped; don't share them across profiles or you'll cross-pollute user models.

โš™๏ธ 12. Configuration & Secrets

Three knobs, three places:

| What | Where | Why |
| --- | --- | --- |
| Model, toolsets, terminal backend, skin | config.yaml | Non-secret, version-controlled with the profile |
| API keys, tokens | .env | Secrets, never logged |
| Per-skill settings | each skill's config.yaml | Skill-local |

Three config loaders (load_cli_config, load_config, direct YAML) exist because the CLI, tool, and gateway runtimes have subtly different needs. Don't merge them prematurely; the duplication is intentional.


💰 13. Prompt Caching (the cost story)

This is the single biggest reason your agent will be cheap or expensive in production.

Do:

  • Build the system prompt once per session.
  • Insert provider-specific cache breakpoints (Anthropic: cache_control: {type: "ephemeral"} on the last static message in the prefix).
  • Use the frozen-snapshot pattern for memory: read MEMORY/USER files once at session start and embed them immutably even if they change on disk later.
  • Defer config changes ("toolset on/off", "switch model") to next session. Slash commands that mutate state should accept an optional --now flag if invalidation is truly required, but default to deferred.

Don't:

  • Reload memory mid-conversation.
  • Add/remove tools mid-conversation.
  • Mutate the system prompt because the user "switched topic".
  • Make the system prompt depend on the current time, random ID, or anything that changes per turn.

A cached prefix is ~10× cheaper to read than to write. With a stable prefix, a 10-turn conversation costs roughly 1.5× a single turn. With an unstable prefix, it costs 10×.
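
With the Anthropic SDK, marking the breakpoint looks roughly like this (the model name and surrounding scaffolding are illustrative; cache_control on a system content block is the real API):

```python
import anthropic

frozen_system_prompt = "...the prefix assembled once at session start..."
history: list[dict] = []

client = anthropic.Anthropic()
reply = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": frozen_system_prompt,
        "cache_control": {"type": "ephemeral"},  # cacheable prefix ends here
    }],
    messages=history + [{"role": "user", "content": "hello"}],
)
```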


๐Ÿ—บ๏ธ 14. Build-Your-Own โ€” Concrete Checklist

Here's what to build, in the order I'd build it. Each step is independently shippable.

🌱 Phase 1: The loop (Day 1–2)

  1. Pick a language (Python is what Hermes uses; Go works too; see your repo's backend-go/).
  2. Implement AIAgent.run_conversation(messages) -> messages (a minimal sketch follows this list):
    • Call the model.
    • If the response has tool calls, dispatch each, append a tool result message, loop.
    • Else return the final assistant message.
  3. Add an IterationBudget (default: 25 tool calls per user turn). One grace turn on exhaustion.
  4. Wrap each tool call in try/except โ€” return errors as tool results, never raise out of the loop.
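
A minimal, provider-agnostic sketch of steps 2–4 (message shapes and names are illustrative):

```python
import json

class IterationBudget:
    def __init__(self, limit: int = 25) -> None:
        self.remaining = limit
        self.grace_used = False

    def spend(self) -> bool:
        self.remaining -= 1
        return self.remaining >= 0

def run_conversation(messages, call_model, dispatch_tool, budget=None):
    budget = budget or IterationBudget()
    while True:
        reply = call_model(messages)            # your provider adapter
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            return messages                     # final assistant message
        if not budget.spend():
            if budget.grace_used:
                return messages                 # grace call already happened
            budget.grace_used = True            # one warning, one final call
            messages.append({"role": "user",
                             "content": "Budget exhausted: summarize and stop."})
            continue
        for call in tool_calls:
            try:
                result = dispatch_tool(call)
            except Exception as e:              # never raise out of the loop
                result = json.dumps({"error": str(e)})
            messages.append({"role": "tool",
                             "tool_call_id": call.get("id"),
                             "content": result})
```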

You now have a "tool-using chatbot". This is 80% of any agent.

💻 Phase 2: The CLI (Day 3)

  1. Build a thin CLI: read line, call run_conversation, print response, repeat.
  2. Add Ctrl-C interrupt handling that cancels the in-flight tool gracefully.
  3. Persist sessions to SQLite. Add an FTS5 virtual table on the message text column (sketched below).
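
A sketch of step 3 with a standalone FTS5 table (the simplest version that works; the article's SessionDB adds a retry layer and a richer schema):

```python
import sqlite3

db = sqlite3.connect("sessions.db")
db.execute("PRAGMA journal_mode=WAL")  # friendlier multi-process writes

db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS messages "
           "USING fts5(session_id, role, content)")

def save(session_id: str, role: str, content: str) -> None:
    with db:
        db.execute("INSERT INTO messages VALUES (?, ?, ?)",
                   (session_id, role, content))

def search(query: str) -> list[tuple]:
    return db.execute(
        "SELECT session_id, content FROM messages WHERE messages MATCH ?",
        (query,)).fetchall()
```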

๐Ÿ› ๏ธ Phase 3 โ€” Tools registry (Day 4โ€“5)

  1. Create a Registry class with register(name, toolset, schema, handler, available).
  2. Auto-import every file under tools/ so tool modules can self-register.
  3. Implement 5 starter tools:
    • read_file, write_file, list_dir
    • run_shell (start with local-only)
    • web_fetch
  4. Add a terminal.backend config that swaps run_shell between local / docker / ssh.

💾 Phase 4: Memory & persona (Day 6)

  1. Add ~/.youragent/{SOUL.md, MEMORY.md, USER.md} files.
  2. In build_system_prompt(), embed them in that order.
  3. Add agent-level tools memory_append, memory_replace, memory_delete so the agent can update them.

🧠 Phase 5: Skills (Day 7–10) ← the magic

  1. Define the SKILL.md frontmatter spec (copy Hermes's; it's already an open standard).
  2. On startup, scan ~/.youragent/skills/**/SKILL.md and emit Level-0 entries (name + description) into the system prompt.
  3. Add tools:
    • read_skill(name) → returns the full SKILL.md (Level 1).
    • read_skill_file(name, path) → returns referenced files (Level 2).
    • skill_manage(action, name, ...) → create | patch | edit | delete.
  4. In your system prompt, add explicit nudges: "When you finish a hard task, write a skill so you don't have to figure it out again." This single sentence is what turns a chatbot into a self-improving agent.

💰 Phase 6: Prompt caching (Day 11)

  1. Pick a primary provider (Anthropic's caching is the most generous).
  2. Mark the end of the system prompt as a cache breakpoint.
  3. Audit every code path that touches agent.system_prompt โ€” make sure none of them fire mid-conversation.
  4. Add a CI test that asserts the system prompt is byte-identical at turn 1 and turn 10 (a sketch follows).
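
Step 4 can be as small as this (pytest-style sketch; agent_factory is an assumed fixture that builds your agent):

```python
def test_system_prompt_is_frozen(agent_factory):
    agent = agent_factory()
    before = agent.system_prompt
    for _ in range(10):
        agent.run_conversation([{"role": "user", "content": "hi"}])
    # Any drift here silently busts the prefix cache in production.
    assert agent.system_prompt == before
```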

📨 Phase 7: Gateways (Day 12+)

  1. Build one adapter (Telegram is easiest: python-telegram-bot or equivalent).
  2. Adapter responsibilities: auth, session-key derivation, attachment download, response formatting.
  3. Verify: same agent, same skills, same memory, accessed from CLI and Telegram, sees the same memory updates.

🔗 Phase 8: MCP (Day 14+)

  1. Add an MCP client. Each MCP server becomes a virtual toolset in your registry.
  2. You now get GitHub, Slack, Linear, Postgres, Notion, etc. for free.

✨ Phase 9: Profiles & polish

  1. Route every filesystem path through a get_home() helper. Add a --profile flag that sets the home dir before imports.
  2. Add a context compressor (LLM-summarize middle turns when the conversation exceeds N tokens).
  3. Add a cron runner that loads jobs from ~/.youragent/cron.yaml and runs them with no history.

That's the whole product. About 2–3 weeks of focused work for one engineer.


⚡ 15. Recommended Tech Stack

What Hermes uses, and what I'd swap.

| Concern | Hermes choice | Reasonable alternative |
| --- | --- | --- |
| Language | Python 3.11+ | Go (faster CLI startup, single binary), TypeScript (web-native) |
| CLI rendering | rich + prompt_toolkit | bubbletea (Go), ink (TS) |
| TUI | Node.js + Ink, JSON-RPC to Python | Same, or a single-language stack |
| Storage | SQLite + FTS5 | Same. Don't get fancy here. |
| Vector memory (optional) | Honcho / mem0 / supermemory | pgvector, Chroma, Qdrant |
| Sandbox | Docker / Modal / Daytona | Firecracker, gVisor, E2B |
| MCP | Python MCP SDK | Anthropic's official SDK |
| Config | YAML | TOML if you prefer |

The boring choices (SQLite, markdown files, JSON schemas) are not accidents. Resist the urge to "upgrade" them: every place Hermes uses something boring, it's because it integrates trivially with the agent's own tools (the agent can cat MEMORY.md and reason about it).


โš ๏ธ 16. Pitfalls You Will Hit

Listed in the order you'll hit them:

  1. Tool errors crashing the loop. Wrap every handler in try/except, return the error to the model.
  2. Cache-busting prompts. A datetime.now() in the system prompt will quietly destroy your cost model. Audit early.
  3. Infinite tool loops. Without an iteration budget, a model will happily call list_dir 400 times. Hard-cap it.
  4. Unbounded shell access. Local backend is fine for dev; in prod, use Docker with read-only root and an explicit writable workspace.
  5. Skills that lie. The agent will write skills that reference tools or env vars that don't exist. Make requires_toolsets and required_environment_variables validation strict at install time.
  6. Memory file rot. The agent will append to MEMORY.md forever. Add a periodic compaction nudge in the system prompt: "If MEMORY.md exceeds 500 lines, consolidate."
  7. Profile leakage in tests. A test that creates files in ~/.youragent/ because you forgot to mock the home dir. Mock Path.home() AND your home env var.
  8. Mid-conversation toolset toggles. The user types /tools and changes settings. Tell them "applies next session"; don't break the cache.
  9. Race conditions across gateways. Two Telegram messages arrive in 100ms. Use per-session locks.
  10. Reasoning content lost. OpenAI o-series and Claude extended thinking emit reasoning blocks that must be preserved in history (or dropped by the same rule on every turn); inconsistency breaks the cache.

💡 17. The Mental Model in One Sentence

A Hermes-style agent is a loop that fills its own filing cabinet: it reads from skills/ and MEMORY.md to do its job, and writes back to those same files when it learns something; every other system (tools, gateways, providers, plugins, profiles) exists to make that loop faster, safer, and reachable from more places.

Build the loop, build the filing cabinet, give it a tool to edit the filing cabinet, and tell it (in the system prompt) that it's allowed to. Everything else is scaffolding.




If you found this helpful, let me know by leaving a 👍 or a comment. And if you think this post could help someone, feel free to share it! Thank you very much! 😃
