A practical, end-to-end walkthrough of Nous Research's Hermes Agent: the principles it's built on, the architecture that makes it work, and a concrete checklist for building a similar self-improving agent yourself.
Table of Contents

- 1. What Hermes Actually Is (in one paragraph)
- 2. Core Principles
- 3. High-Level Architecture
- 4. The Agent Loop (the heart of everything)
- 5. System Prompt Assembly
- 6. Tools System
- 7. Skills System (the killer feature)
- 8. Memory System
- 9. Plugin System
- 9b. The COMMAND_REGISTRY Pattern (worth stealing)
- 9c. Skin Engine (theming as data)
- 9d. Multimodal & Streaming
- 9e. RL / Atropos Training Integration (`environments/`)
- 10. Surfaces – How the Agent Reaches Users
  - 10.1 CLI (classic)
  - 10.2 TUI (`hermes --tui`) – genuinely novel
  - 10.3 Gateway (messaging platforms)
  - 10.4 ACP (Agent Client Protocol) – for AI-native editors
  - 10.5 Web UI (`hermes web`)
  - 10.6 Cron scheduler (`~/.hermes/cron/`)
  - 10.7 Batch runners (the training data pipeline)
- 11. Profiles & Multi-Instance
- 12. Configuration & Secrets
- 13. Prompt Caching (the cost story)
- 14. Build-Your-Own – Concrete Checklist
  - Phase 1 – The loop (Day 1–2)
  - Phase 2 – The CLI (Day 3)
  - Phase 3 – Tools registry (Day 4–5)
  - Phase 4 – Memory & persona (Day 6)
  - Phase 5 – Skills (Day 7–10) – the magic
  - Phase 6 – Prompt caching (Day 11)
  - Phase 7 – Gateways (Day 12+)
  - Phase 8 – MCP (Day 14+)
  - Phase 9 – Profiles & polish
- 15. Recommended Tech Stack
- 16. Pitfalls You Will Hit
- 17. The Mental Model in One Sentence
- 18. References
1. What Hermes Actually Is (in one paragraph)
Hermes is a model-agnostic, self-improving conversational agent that runs locally as a CLI/TUI, on a server as a messaging gateway (Telegram/Discord/Slack/WhatsApp/Signal), or as a scheduled cron worker. Its key differentiator is a closed learning loop: while solving problems with tools, it writes reusable "skill" documents and curates a persistent memory file, so the agent quite literally gets more capable the longer it runs. Everything – model, tools, skills, memory backend, execution environment, UI – is pluggable.
Two ideas to internalize before you build anything:
- One agent, many surfaces. A single `AIAgent` class powers every interface. Surfaces (CLI, gateway, cron, batch, API) are thin entry points that construct an agent and call `run_conversation()`.
- Procedural memory > clever prompting. Most "smart agent" behavior comes not from prompt engineering but from the agent owning a folder of markdown documents (skills + memory + persona) it can read, write, and grow over time.
2. Core Principles
These are the design rules Hermes follows. Keep them in mind for your own build – most "weird" decisions in the codebase trace back to one of these.
2.1 Platform-agnostic core
The agent doesn't know whether it's running in a terminal, a Telegram chat, or a cron job. All platform specifics live in adapters that translate platform events → `agent.run_conversation(...)` and translate the response back. If you find yourself adding a Telegram-specific `if` branch inside core agent code, you've drifted from the architecture.
2.2 Prompt stability (cache-friendly)
The system prompt is assembled once at session start and does not mutate mid-conversation. This isn't aesthetic – it's economic. Anthropic and OpenAI prompt caches require a stable prefix to get hits. Mid-conversation toolset changes, memory reloads, or skill swaps invalidate the cache and 10× your cost. Defer changes to "next session" by default.
2.3 Progressive disclosure
Don't load every skill, every memory, every tool's full docs into the system prompt. Load descriptions (Level 0). Let the agent pull in full content (Level 1) only when it actually needs that skill. Load referenced files (Level 2) only when the skill itself requests them. This is how Hermes can ship 47 tools and dozens of skills while staying under context limits.
2.4 Self-registration over central lists
Tools and plugins should register themselves at import time (`registry.register(...)`) rather than being added to a hand-maintained `__all__` list. New tool = one new file, no edits elsewhere.
2.5 Profile isolation
Multiple independent agent instances coexist by each owning a `HERMES_HOME` directory (default `~/.hermes/`, override via env var). Every filesystem path in the codebase goes through `get_hermes_home()` – never hard-code `~/.hermes`.
2.6 The agent owns its own learning artifacts
Skills are not added by humans editing source code. The agent writes them via a tool called `skill_manage` after solving a non-trivial task. Memory is not curated by humans – the agent edits `MEMORY.md` and `USER.md` between turns. This is the loop.
3. High-Level Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                           ENTRY POINTS                           │
│  CLI / TUI / Gateway (TG, Discord, Slack) / Cron / Batch / API   │
└──────────────────┬───────────────────────────────────────────────┘
                   │ each entry point builds an AIAgent
                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                       AIAgent (core loop)                        │
│ build_system_prompt → call model → dispatch tool calls → repeat  │
└────┬─────────────┬──────────────┬───────────────┬────────────────┘
     │             │              │               │
     ▼             ▼              ▼               ▼
┌──────────┐  ┌──────────┐  ┌────────────┐  ┌─────────────┐
│  Tools   │  │  Skills  │  │   Memory   │  │  Providers  │
│ Registry │  │  Loader  │  │  Manager   │  │ (model API) │
└──────────┘  └──────────┘  └────────────┘  └─────────────┘
     │
     ▼
┌──────────────────────────────────────────────────────────────────┐
│  Execution Environments: local / Docker / SSH / Modal / Daytona  │
└──────────────────────────────────────────────────────────────────┘
```
Three tiers, in plain English:
- Tier 1 – Surfaces: how a human or system talks to the agent (CLI, chat platforms, cron).
- Tier 2 – Agent core: the loop, plus the four pluggable subsystems (tools, skills, memory, model).
- Tier 3 – Execution backends: where shell/code-running tools actually run. Local laptop today, sandboxed Docker tomorrow, Modal cloud in production.
4. The Agent Loop (the heart of everything)
This is the single most important piece. The whole `AIAgent` class is essentially this loop:
1. Receive input – from CLI / gateway / cron / ACP / web
2. Build system prompt – persona + memory + skills + tools (ONCE per session)
3. Resolve provider – which API key + endpoint for the chosen model
4. Call model – one of FOUR API modes (auto-detected by endpoint/model): `chat_completions` | `codex_responses` | `anthropic_messages` | `bedrock_converse`
5. Parse response:
   - if tool calls are present → dispatch each via registry → append results → GOTO 4
   - else → final assistant message → display → persist → done
6. Persist – SQLite SessionDB (WAL mode + FTS5 index)
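Steps 4–5 reduce to a short loop. Here is a minimal sketch, not Hermes's code: `call_model` and `dispatch_tool` are assumed stand-ins for the provider call and the tools registry, and the message dict shapes are illustrative.

```python
import json

def run_conversation(messages, call_model, dispatch_tool, max_iterations=25):
    """Minimal agent loop: call the model, dispatch any tool calls,
    append their results, and repeat until a plain assistant message."""
    for _ in range(max_iterations):
        reply = call_model(messages)          # {"content": ..., "tool_calls": [...]}
        messages.append(reply)
        if not reply.get("tool_calls"):
            return messages                   # final assistant message: done
        for call in reply["tool_calls"]:
            try:
                result = dispatch_tool(call["name"], call["arguments"])
            except Exception as exc:          # never let a tool crash the loop
                result = json.dumps({"error": str(exc)})
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": result})
    messages.append({"role": "assistant",
                     "content": "Iteration budget exhausted; summarizing."})
    return messages

# Demo with a stub model: one tool call, then a final answer.
_replies = iter([
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "1", "name": "echo", "arguments": {"x": 2}}]},
    {"role": "assistant", "content": "done", "tool_calls": []},
])
history = run_conversation([{"role": "user", "content": "hi"}],
                           call_model=lambda msgs: next(_replies),
                           dispatch_tool=lambda name, args: json.dumps(args))
```

Everything else in this section – budgets, compression, interrupts – is refinement layered onto this skeleton.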
A few non-obvious details that matter:
- Iteration budget – more nuanced than a simple counter. A thread-safe `IterationBudget` is shared across the parent agent and any subagents it spawns. `execute_code` refunds iterations on completion so a programmatic tool-loop doesn't drain the budget. On exhaustion: one warning message is injected (`_budget_exhausted_injected`), exactly one final API call is allowed (`_budget_grace_call`), then summarization is forced. No intermediate warnings – deliberate, to prevent the model from giving up early.
- Reasoning content is stored separately from the visible assistant message (OpenAI o-series and Anthropic extended thinking both produce hidden "reasoning" tokens). Keep them in their own field; they're needed for cache validity but shouldn't be displayed. Callbacks: `stream_delta_callback`, `interim_assistant_callback`, `thinking_callback`, `reasoning_callback`.
- Streaming with stateful scrubbing. A `_stream_context_scrubber` strips `<memory-context>` spans even when they're split across chunks – don't underestimate how fiddly this gets when tags straddle network boundaries.
- Compression, not truncation. When context fills, a `context_compressor` summarizes middle turns rather than dropping them. The summary itself becomes a message. Lossy is fine; lossless will OOM.
- Interrupts. Ctrl-C mid-tool-call must cleanly cancel the in-flight tool, append a "user interrupted" tool result to history, and return control. Don't kill the whole loop – let the agent see the interruption and respond.
- Session resumption. `--continue` / `--resume` flags load prior history via `SessionDB.get_messages()`. SQLite WAL mode + a custom retry layer (20–150 ms jitter, `BEGIN IMMEDIATE`) handle multi-process write contention. A recap is shown to the user before continuing.
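The budget behavior described above – shared across threads, one warning, exactly one grace call, refunds – can be sketched as follows. This is an illustrative reconstruction; the field and method names are not Hermes's.

```python
import threading

class IterationBudget:
    """Thread-safe iteration budget shared by a parent agent and its subagents."""
    def __init__(self, limit=25):
        self._lock = threading.Lock()
        self._remaining = limit
        self._warned = False       # warning message injected exactly once
        self._grace_used = False   # exactly one final API call after exhaustion

    def consume(self) -> bool:
        with self._lock:
            if self._remaining > 0:
                self._remaining -= 1
                return True
            return False

    def refund(self, n=1):
        """execute_code-style refund so programmatic loops don't drain the budget."""
        with self._lock:
            self._remaining += n

    def should_warn(self) -> bool:
        with self._lock:
            if self._remaining == 0 and not self._warned:
                self._warned = True
                return True
            return False

    def grace_call(self) -> bool:
        with self._lock:
            if self._remaining == 0 and not self._grace_used:
                self._grace_used = True
                return True
            return False

budget = IterationBudget(limit=2)   # demo instance
```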
5. System Prompt Assembly
A `prompt_builder.build_system_prompt()` function concatenates these sections, in this order:
1. Persona – `SOUL.md` / `DEFAULT_AGENT_IDENTITY`. Identity, voice, values.
2. Platform hints – `PLATFORM_HINTS`. Tells the model whether it's running in CLI, Telegram, Slack, etc. – this changes formatting rules (no MarkdownV2 in CLI, no nested code blocks in Telegram, ...).
3. Memory guidance – `MEMORY_GUIDANCE`. Embeds a frozen snapshot of `MEMORY.md` + `USER.md` as a single block (separated by a `§` delimiter). Size-capped (~2200 chars MEMORY, ~1375 chars USER).
4. Session search guidance – `SESSION_SEARCH_GUIDANCE`. Tells the agent it can search prior sessions via FTS5, with a small example.
5. Skills guidance – `SKILLS_GUIDANCE`. The Level-0 skills index plus the heuristic prose nudging the agent to create skills after solving hard tasks.
6. Context files – `AGENTS.md` and `.hermes.md` from the working directory.
7. Tool-use enforcement – `TOOL_USE_ENFORCEMENT_GUIDANCE`. Hard rules about parallel calls, error recovery, etc.
8. Tool schemas – JSON schemas for all enabled tools.
Then `prompt_caching.py` inserts cache breakpoints (Anthropic `cache_control: {type: ephemeral}`; equivalents for other providers). The whole assembled prefix becomes the cacheable region.
The frozen-snapshot pattern (this is the trick). `MEMORY.md` and `USER.md` are read once at session start and embedded immutably in the system prompt for the rest of the session. The agent can still write to those files on disk during the session – but the system prompt does not change. Result: the cache stays valid across the whole conversation, and the new memory takes effect next session. Skip this and you destroy your prefix cache.
Memory security scan. Before injection, MEMORY/USER content is scanned for prompt-injection patterns, exfiltration attempts (curl/wget referencing env vars), persistence backdoors, and invisible Unicode. A poisoned memory file is the agent's prion disease – scan defensively.
Key rule: sections 1–8 are frozen for the session. User messages and tool results are appended to history; they don't go into the system prompt.
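The frozen-snapshot pattern is small enough to show whole. A minimal sketch, assuming the file names and size caps described above; the `<memory-context>` wrapper and function name are illustrative:

```python
import tempfile
from pathlib import Path

def build_memory_block(home: Path, mem_cap=2200, user_cap=1375) -> str:
    """Read MEMORY.md / USER.md ONCE at session start, cap their size, and
    join them with a § delimiter. The returned string is embedded in the
    system prompt and never re-read mid-session."""
    memory = (home / "MEMORY.md").read_text()[:mem_cap]
    user = (home / "USER.md").read_text()[:user_cap]
    return f"<memory-context>{memory}\n§\n{user}</memory-context>"

# Demo: the snapshot stays fixed even if the agent rewrites the file later.
home = Path(tempfile.mkdtemp())
(home / "MEMORY.md").write_text("Project ships every Tuesday.")
(home / "USER.md").write_text("Prefers terse answers.")
SNAPSHOT = build_memory_block(home)
(home / "MEMORY.md").write_text("CHANGED mid-session")   # write goes to disk only
```

The write lands on disk immediately, but the prompt's copy is the string captured at session start – exactly the cache-validity property the section describes.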
6. Tools System
6.1 Self-registering registry
A central `tools/registry.py` exposes:
```python
registry.register(
    name="read_file",
    toolset="filesystem",
    schema={...JSON schema...},
    handler=read_file_handler,
    available=lambda ctx: True,  # gating predicate
)
```
Every tool file calls this at module import. The registry handles:
- Schema collection for the system prompt.
- Dispatch by name when the model emits a tool call.
- Availability filtering (per-user, per-platform, per-toolset).
- Error wrapping – any exception in a handler is converted into a tool result the model can see and react to. Never let a tool crash the loop.
All handlers return JSON strings, not Python objects. The model only ever sees text.
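A registry with all four responsibilities fits in a few dozen lines. This is a sketch of the pattern, not Hermes's implementation; the class name and the demo `read_file` tool are illustrative.

```python
import json

class ToolRegistry:
    """Self-registering tool registry: schema collection, dispatch by name,
    availability gating, and error wrapping."""
    def __init__(self):
        self._tools = {}

    def register(self, name, toolset, schema, handler, available=lambda ctx: True):
        self._tools[name] = dict(toolset=toolset, schema=schema,
                                 handler=handler, available=available)

    def schemas(self, ctx):
        """Schemas for the system prompt, filtered by the gating predicate."""
        return [t["schema"] for t in self._tools.values() if t["available"](ctx)]

    def dispatch(self, name, args) -> str:
        """Always returns a JSON string; exceptions become visible tool results."""
        tool = self._tools.get(name)
        if tool is None:
            return json.dumps({"error": f"unknown tool: {name}"})
        try:
            return tool["handler"](**args)
        except Exception as exc:      # never let a tool crash the loop
            return json.dumps({"error": type(exc).__name__, "detail": str(exc)})

registry = ToolRegistry()

# A tool module would call this at import time:
registry.register(
    name="read_file",
    toolset="filesystem",
    schema={"name": "read_file", "parameters": {"path": "string"}},
    handler=lambda path: json.dumps({"content": open(path).read()}),
)
```

Note how a failing handler (say, a missing file) comes back as an error payload the model can react to, rather than an exception that kills the loop.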
6.2 Toolsets
Tools group into logical sets (filesystem, web, browser, code, mcp, vision, audio, ...) – Hermes ships ~40+ tools (the docs say "47 built-in" in some places, "40+" in others; `AGENTS.md` says the filesystem is the canonical source because counts shift constantly – don't hard-code numbers in your own version). Users enable/disable by toolset rather than tool-by-tool. Disabled toolsets are completely absent from the system prompt – that saves tokens and prevents the model from even knowing about them.
6.3 Execution environments
Tools that run shell commands or code go through an environment abstraction (`tools/environments/`):
| Backend | Use case |
|---|---|
| `local` | Dev on your laptop. Fastest. Zero isolation. |
| `docker` | Shared dev box. One container per session. |
| `ssh` | Remote VM. Treat the VM as the agent's "computer". |
| `daytona` / `modal` | Serverless sandboxes for production. Auto-spin-up. |
| `singularity` | HPC clusters. |
Same tool, different blast radius. The agent doesn't know – only the environment changes.
6.4 Agent-level tools
A few tools (`todo_*`, `memory_*`, `skill_manage`, `skills_list`, `skill_view`) are intercepted before the generic tool dispatch and handled by the agent itself, because they mutate agent state (memory, skills, todo list) rather than the outside world. Keep this category small and explicit.
6.5 MCP integration
Model Context Protocol servers can be plugged in as additional tool sources. Hermes treats each MCP server as a virtual toolset, lets users filter individual tools, and dispatches calls through the same registry. This is how you get a long tail of integrations (GitHub, Slack, Linear, ...) without writing them yourself.
6.6 Tool approval & safety (the layered defense)
Shell tools are dangerous. Hermes layers four mechanisms:
- Tirith – an external Rust-based scanner with auto-install + SHA-256 verification. Detects homograph URLs, terminal-injection attacks (ANSI escapes that hide commands), and known dangerous patterns.
- Regex dangerous-command detection – runs on a normalized command string (case-insensitive, whitespace-collapsed) so attackers can't bypass via `RM -RF`.
- Smart Approval – an LLM risk-rates each command. Low-risk auto-approves; medium/high blocks for human approval.
- Approval scopes – when a human approves, they pick Once / Session / Permanent. Trust accumulates instead of asking on every call.
When the agent is running on a messaging gateway and needs approval, it uses a `threading.Event` to block until the human responds in chat. A `/yolo` command bypasses approval entirely for trusted sessions. Sandboxed backends auto-bypass approval (the Docker/Modal sandbox is the safety boundary; double-prompting is just friction).
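The block-until-human-responds mechanism is just an `Event` shared between the agent thread and the gateway thread. A minimal sketch under that assumption; the class and method names are illustrative:

```python
import threading

class GatewayApproval:
    """Block a tool call until a human approves (or denies) it in chat."""
    def __init__(self, timeout=300):
        self._event = threading.Event()
        self._approved = False
        self._timeout = timeout

    def wait_for_human(self) -> bool:
        """Called from the agent thread; blocks until resolve() or timeout.
        A timeout leaves _approved False, i.e. deny by default."""
        self._event.wait(self._timeout)
        return self._approved

    def resolve(self, approved: bool):
        """Called from the gateway thread when the human replies in chat."""
        self._approved = approved
        self._event.set()

# Demo: a second thread plays the human and approves after 50 ms.
approval = GatewayApproval(timeout=5)
threading.Timer(0.05, approval.resolve, args=(True,)).start()
result = approval.wait_for_human()
```

Denying on timeout (rather than approving) is the safe default for anything that reached the approval layer in the first place.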
7. Skills System (the killer feature)
7.1 What a skill is
A skill is a markdown document with YAML frontmatter that teaches the agent how to do one thing well. Not code. Not a config. A runbook the agent reads.
```markdown
---
name: deploy-staging
description: Push current branch to staging via Vercel and verify health.
version: 1.2.0
platforms: [macos, linux]
requires_toolsets: [shell, web]
fallback_for_toolsets: []
required_environment_variables: [VERCEL_TOKEN]
tags: [deploy, vercel]
category: devops
---

## When to Use
The user asks to "ship", "deploy to staging", or "preview this branch".

## Procedure
1. Run `git status` – abort if dirty.
2. Run `vercel --token=$VERCEL_TOKEN`.
3. Poll `/healthz` until 200 OR 60s timeout.
4. Report the preview URL.

## Pitfalls
- Don't deploy from `main` – only feature branches.
- If the build fails, fetch logs via `vercel logs <deployment>`.

## Verification
The healthz endpoint returns `{"status":"ok"}`.
```
7.2 Where skills live
```
~/.hermes/skills/
├── devops/deploy-staging/
│   ├── SKILL.md           ← the file above
│   ├── references/        ← extra docs the skill can pull in
│   ├── templates/         ← file templates
│   ├── scripts/           ← helper scripts the agent can run
│   └── assets/            ← images, etc.
├── .hub/                  ← installed from skills hub
└── .bundled_manifest      ← what shipped with Hermes
```
7.3 Progressive disclosure (3 levels)
This is what keeps token usage sane:
| Level | What loads | When |
|---|---|---|
| 0 | name, description, category | Always – in system prompt |
| 1 | full SKILL.md content | When the agent decides to use the skill |
| 2 | files in `references/`, `scripts/` | When the skill body says "see references/foo.md" |

The agent calls a `read_skill` (or equivalent) tool to escalate from L0 to L1 to L2.
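Building the Level-0 index is a startup scan that keeps only frontmatter fields. A sketch with a deliberately naive frontmatter parser (real frontmatter is YAML – use a YAML library in practice); function and tool names are illustrative:

```python
def level0_index(skill_files: dict) -> str:
    """Emit the Level-0 skills index (name + description only) from a
    mapping of skill name -> SKILL.md content. Bodies stay on disk."""
    lines = []
    for content in skill_files.values():
        # "---\n<frontmatter>\n---\n<body>" -> keep only the frontmatter
        _, frontmatter, _body = content.split("---", 2)
        fields = dict(line.split(":", 1)
                      for line in frontmatter.strip().splitlines() if ":" in line)
        lines.append(f"- {fields['name'].strip()}: {fields['description'].strip()}")
    return "Available skills (call read_skill to load one):\n" + "\n".join(lines)

EXAMPLE_SKILL = """---
name: deploy-staging
description: Push current branch to staging and verify health.
category: devops
---
## Procedure
1. Run git status.
"""
INDEX = level0_index({"deploy-staging": EXAMPLE_SKILL})
```

The point of the exercise: the system prompt pays for one line per skill, not for the whole runbook.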
7.4 Triggering
Three ways a skill activates:
- Slash command – user types `/deploy-staging please ship #123`.
- Natural language – "deploy this to staging"; the agent matches against L0 descriptions and pulls in L1.
- Programmatic – cron jobs explicitly attach skills.
7.5 Conditional activation
Frontmatter fields gate visibility:
- `platforms: [linux]` – hidden on macOS.
- `fallback_for_toolsets: [web]` – only visible if no premium web tool is enabled (e.g., a DuckDuckGo skill that fills in when Brave Search isn't configured).
- `requires_toolsets: [shell]` – hidden if the shell toolset is disabled.
This makes the skill catalog adapt to the deployment.
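The three gates above compose into one visibility filter. A sketch of that logic under the frontmatter fields shown; the function name is illustrative:

```python
def visible_skills(skills, platform, enabled_toolsets):
    """Filter a list of skill frontmatter dicts down to the ones that
    should appear in this deployment's Level-0 index."""
    out = []
    for skill in skills:
        # platforms: [linux] -> hidden elsewhere
        if skill.get("platforms") and platform not in skill["platforms"]:
            continue
        # requires_toolsets must all be enabled
        if not set(skill.get("requires_toolsets", [])) <= enabled_toolsets:
            continue
        # fallback skills hide when the toolset they substitute for is present
        fallback = set(skill.get("fallback_for_toolsets", []))
        if fallback and fallback <= enabled_toolsets:
            continue
        out.append(skill["name"])
    return out

SKILLS = [
    {"name": "linux-only", "platforms": ["linux"]},
    {"name": "needs-shell", "requires_toolsets": ["shell"]},
    {"name": "ddg-fallback", "fallback_for_toolsets": ["web"]},
]
```

Run once at session start (not per turn), so the resulting index stays prompt-stable.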
7.6 Self-improvement: the `skill_manage` tool
The agent uses two complementary tools:
- Read path: `skills_list` (browse the Level-0 index) and `skill_view` (escalate to Level-1/2 content).
- Write path: `skill_manage`, a meta-tool with sub-operations:
| Action | Effect |
|---|---|
| `create` | New skill from scratch |
| `patch` | Surgical text replacement (preferred for updates) |
| `edit` | Full rewrite |
| `delete` | Remove skill (restricted to user/agent-created skills – can't delete bundled ones) |

Note: file management within a skill (`references/`, `scripts/`) goes through generic `write_file` / `remove_file` tools scoped to the skill's directory.
The `SKILLS_GUIDANCE` block in the system prompt explicitly nudges the agent to create a skill after:
- Solving a task that took 5+ tool calls.
- Finding a non-obvious workaround.
- Discovering a workflow it might repeat.
Skill installation from the hub is user-driven only (security). The agent never installs untrusted skills on its own – it can only `skill_manage create` from its own experience.
This is the closed learning loop. The agent writes its own playbooks while it works.
7.7 Skills hub & sharing
Skills are portable markdown – they're trivially shareable. Hermes integrates with multiple sources (`official/`, `skills-sh/`, `github/`, `well-known/`, `url`, clawhub, lobehub). On install, each skill is security-scanned for prompt injection, data exfiltration, and destructive commands before being trusted. Trust tiers: builtin > official > community.
The format is the open agentskills.io standard – meaning skills written for Hermes work in other compatible agents.
8. Memory System
Three independent mechanisms working together (the "3-layer" framing is a teaching simplification โ in the code they're orthogonal):
Mechanism 1 – Frozen-snapshot persistent memory
Two markdown files, both agent-curated:
- `MEMORY.md` – facts. "Project ships every Tuesday." "Test DB password is in 1Password vault X." (~2200 char cap)
- `USER.md` – user model. "Prefers terse answers." "Senior Go engineer, new to React." (~1375 char cap)
A `MemoryStore` reads them once at session start and embeds them in the system prompt as a single immutable block (delimited by `§`). The agent can write to those files mid-session (and the writes go to disk), but the system prompt's copy doesn't change until the next session. This is what keeps the prefix cache valid.
Mechanism 2 – Cross-session recall via SessionDB
A SessionDB (SQLite, WAL mode, FTS5 full-text index) stores every prior conversation turn. On demand, the agent uses a `session_search` tool to query it; an LLM summarizer condenses hits into a paragraph that fits in context. Multi-process write contention is handled with `BEGIN IMMEDIATE` + a custom retry loop (20–150 ms jitter).
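The storage half of this mechanism is plain stdlib `sqlite3`. A sketch with an illustrative schema (not Hermes's); it shows the WAL pragma and an FTS5 virtual table over message text:

```python
import sqlite3

def open_session_db(path=":memory:"):
    """Open a session store with WAL journaling and an FTS5 index."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode=WAL")   # friendlier multi-process writes
    db.execute("CREATE TABLE IF NOT EXISTS messages(session_id, role, text)")
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(text)")
    return db

def add_message(db, session_id, role, text):
    db.execute("INSERT INTO messages VALUES (?, ?, ?)", (session_id, role, text))
    db.execute("INSERT INTO messages_fts VALUES (?)", (text,))

def session_search(db, query):
    """What a session_search tool would call before summarizing hits."""
    return [row[0] for row in
            db.execute("SELECT text FROM messages_fts WHERE messages_fts MATCH ?",
                       (query,))]

# Demo usage:
db = open_session_db()
add_message(db, "s1", "user", "deploy the staging branch to Vercel")
add_message(db, "s1", "assistant", "done, preview URL attached")
```

(The retry-with-jitter layer for write contention would wrap the `INSERT`s; it's omitted here.)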
Mechanism 3 – Pluggable provider (Honcho / mem0 / supermemory)
This is a swap-in, not an additional layer. A single `MemoryProvider` ABC (`agent/memory_provider.py`); orchestration via `agent/memory_manager.py`. Lifecycle hooks: `prefetch()` (before the model call), `sync_turn()` (after the turn), `shutdown()`.
Provider knobs that matter:
- Recall mode: `hybrid` / `context` / `tools`. Tools-mode lets the model decide when to query; context-mode just injects relevant memories every turn.
- Write frequency: `async` / `turn` / `session` / numeric (every N turns).
Honcho's "dialectic" deserves a note because it sounds mystical and isn't: it runs three sequential reasoning passes – Initial Assessment → Self-Audit → Reconciliation – with depth controlled by `dialecticDepth` (1–3). It's effectively self-critique chained for higher-quality user modeling.
You only have one active provider at a time. Pick the right abstraction for your use case (Honcho for deep user modeling, mem0/supermemory for vector recall, none if files+FTS5 are enough).
9. Plugin System
A `PluginManager` discovers plugins from three places:
- `~/.hermes/plugins/` (user-level)
- `./.hermes/plugins/` (project-level)
- pip entry points (`hermes.plugins`)
Each plugin defines a `register(ctx)` function and can hook into lifecycle events:
- `pre_tool` / `post_tool`
- `pre_llm` / `post_llm`
- `session_start` / `session_end`
...and can register new tools, new CLI commands, or replace memory providers.
Iron rule: plugins must NEVER modify core files. If a plugin needs something the framework doesn't expose, the framework grows a generic hook – not a special-case import. This keeps the plugin surface stable.
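The hook surface a plugin sees can be sketched as a small context object. Class and hook-payload shapes are illustrative; only the `register(ctx)` entry point and the event names mirror the description above:

```python
class PluginContext:
    """The object handed to each plugin's register() function."""
    def __init__(self):
        self.hooks = {"pre_tool": [], "post_tool": [], "pre_llm": [],
                      "post_llm": [], "session_start": [], "session_end": []}
        self.tools = {}

    def on(self, event, fn):
        self.hooks[event].append(fn)

    def add_tool(self, name, handler):
        self.tools[name] = handler

    def fire(self, event, payload):
        """Core calls this at each lifecycle point; hooks may transform payload."""
        for fn in self.hooks[event]:
            payload = fn(payload)
        return payload

# A plugin module exposes exactly one entry point:
def register(ctx: PluginContext):
    ctx.on("pre_tool", lambda call: {**call, "audited": True})
    ctx.add_tool("hello", lambda: "hi")

ctx = PluginContext()
register(ctx)
```

Note the core never imports the plugin by name – it only fires generic events, which is what keeps the iron rule enforceable.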
9b. The COMMAND_REGISTRY Pattern (worth stealing)
A single `COMMAND_REGISTRY` constant in `hermes_cli/commands.py` is the source of truth for every slash command. From this one structure, the codebase auto-derives:
- CLI dispatch
- Gateway hooks (so `/skill foo` works in Telegram)
- Telegram inline menu entries
- Slack slash subcommands
- prompt_toolkit autocomplete
- `/help` text
Adding a new slash command is one new `CommandDef` entry plus a handler. Zero scattered edits. This is the same pattern as the tools registry, applied to UI commands. Steal it for your own build – it's how Hermes scales surface area without scaling maintenance.
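The pattern in miniature – one registry, with dispatch and `/help` both derived from it. A sketch: the `CommandDef` fields and handlers here are illustrative, not Hermes's actual structure.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class CommandDef:
    """One slash command; every surface derives its view from this."""
    name: str
    help: str
    handler: Callable[[str], str]

COMMAND_REGISTRY = {
    "skill": CommandDef("skill", "Manage skills", lambda args: f"skill {args}"),
    "help": CommandDef("help", "Show this help", lambda args: help_text()),
}

def help_text() -> str:          # derived, never hand-maintained
    return "\n".join(f"/{c.name} - {c.help}" for c in COMMAND_REGISTRY.values())

def dispatch(line: str) -> str:  # shared by CLI, Telegram, Slack, ...
    name, _, args = line.lstrip("/").partition(" ")
    cmd = COMMAND_REGISTRY.get(name)
    return cmd.handler(args) if cmd else f"unknown command: {name}"
```

Autocomplete lists and inline menus would iterate the same dict; no surface carries its own command table.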
9c. Skin Engine (theming as data)
YAML files in `~/.hermes/skins/` (with inheritance from `default`). One YAML controls 18 named colors, spinner faces and verbs, agent name and greeting/farewell, prompt symbols, tool emojis, and ASCII banners with Rich markup. Ten built-in skins (default, daylight, mono, poseidon, charizard, ...). Hermes Mod ships a web editor with live preview and image-to-ASCII conversion.
The takeaway is architectural: branding lives in YAML, not code. A user can fork the look without touching Python. This matters more than you'd think for an agent users live inside for hours.
9d. Multimodal & Streaming
- Vision: `vision_analyze` tool. Anthropic image-to-text fallback caching via `_anthropic_image_fallback_cache` (when a model can't see images natively, the cache avoids re-describing them).
- Audio out: `text_to_speech` tool.
- Audio in: voice-memo transcription on the input side.
- Browser tool: injects multimodal context (screenshots + DOM + extracted text).
- Streaming: `_stream_callback`, `_current_streamed_assistant_text`, plus the stateful `_stream_context_scrubber` that strips `<memory-context>` spans even across chunk boundaries.
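The cross-chunk scrubbing is worth seeing concretely: the trick is holding back a tail of un-emitted text that could be the start of a tag. A sketch modeled on the behavior described for `_stream_context_scrubber`; the class name is illustrative.

```python
class StreamScrubber:
    """Strip <memory-context>...</memory-context> spans from streamed text,
    even when a tag is split across network chunks."""
    OPEN, CLOSE = "<memory-context>", "</memory-context>"

    def __init__(self):
        self.buf = ""          # text not yet safe to emit
        self.inside = False    # currently inside a span being dropped

    def feed(self, chunk: str) -> str:
        self.buf += chunk
        out = []
        while True:
            if self.inside:
                i = self.buf.find(self.CLOSE)
                if i == -1:
                    # keep only a tail that could be a partial close tag
                    self.buf = self.buf[-(len(self.CLOSE) - 1):]
                    break
                self.buf = self.buf[i + len(self.CLOSE):]
                self.inside = False
            else:
                i = self.buf.find(self.OPEN)
                if i == -1:
                    # emit all but a tail that could start a partial open tag
                    safe = len(self.buf) - (len(self.OPEN) - 1)
                    if safe > 0:
                        out.append(self.buf[:safe])
                        self.buf = self.buf[safe:]
                    break
                out.append(self.buf[:i])
                self.buf = self.buf[i + len(self.OPEN):]
                self.inside = True
        return "".join(out)

    def flush(self) -> str:
        out = "" if self.inside else self.buf
        self.buf = ""
        return out

# Demo: tags deliberately split across three chunks.
s = StreamScrubber()
OUTPUT = (s.feed("Hello <memo") + s.feed("ry-context>secret</memory-")
          + s.feed("context> world") + s.flush())
```

The held-back tail is at most `len(tag) - 1` characters, so latency cost is bounded and constant.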
9e. RL / Atropos Training Integration (`environments/`)
This is arguably the point of the project for Nous Research, not a side feature. The `environments/` directory wraps Hermes for reinforcement-learning training:
- `HermesAgentBaseEnv` – abstracts tool resolution and sandbox wiring.
- `HermesAgentLoop` – runs the tool-call loop in a way RL rollouts can drive.
- `ToolContext` – exposes the sandbox to reward functions (so a reward can grep the filesystem to verify the agent did the work).
- `resize_tool_pool` – prevents thread-pool deadlocks during parallel rollouts.
- Two-phase training pipeline:
  - Phase 1: VLLM/SGLang native tool-call parsing.
  - Phase 2: `ManagedServer` raw-token parsing – needed for Hermes's XML-style tool tags and DeepSeek's Unicode delimiters.
- Three-layer tool-result budgeting: per-tool truncation → sandbox spillover with previews → per-turn budget. Without this, a single `ls /` blows out the context window of a training rollout.
- Pre-integrated benchmarks: TerminalBench 2.0, YC-Bench, WebResearch.
You probably don't need this for a v1 of your own agent. Just know the hooks are there if you ever want to fine-tune your model on its own traces.
10. Surfaces – How the Agent Reaches Users
The same `AIAgent` powers six distinct surfaces. Each is a thin adapter, not a re-implementation.
10.1 CLI (classic)
`cli.py` (~11k lines). Rich-based panels, prompt_toolkit input with autocompletion, animated spinners (`KawaiiSpinner`), activity feeds during API calls.
10.2 TUI (`hermes --tui`) – genuinely novel
Not just a fancier CLI. Architecture:
- Frontend: Node.js + React Ink.
- Backend: Python `tui_gateway/server.py`.
- Wire format: newline-delimited JSON-RPC 2.0 over stdio.
The Python side redirects `print` to stderr so stdout stays clean for the protocol. A persistent `_SlashWorker` subprocess runs slash commands, and slow handlers route through a `ThreadPoolExecutor` so interrupts stay responsive. Distinctive features: streaming chain-of-thought with braille spinners, a `ToolTrail` tree visualization, a virtual-history viewport (only visible rows are rendered), and mouse selection.
Design rule from `AGENTS.md`: Do not re-implement the chat surface in React. The transcript, composer, and slash-command behavior belong to the embedded TUI. Sidebars and inspectors are fine – replacement views are not.
10.3 Gateway (messaging platforms)
Telegram, Discord, Slack, WhatsApp, Signal. Each adapter:
- Connects to the platform (websocket / long-poll / webhook).
- On incoming message: authorizes the user, derives a stable `session_key`, looks up the session in SessionDB, and instantiates an `AIAgent` with that history.
- Calls `agent.run_conversation()`.
- Formats and sends the response back (Telegram's MarkdownV2 vs Discord's flavor vs Slack's mrkdwn – this lives in the adapter).
10.4 ACP (Agent Client Protocol) – for AI-native editors
ACP is the standard protocol Zed and emerging VS Code integrations use to talk to agents. Hermes implements `HermesACPAgent`. ACP sessions are tied to the editor's cwd and persist in the same shared SessionDB. Hermes tools map to ACP semantic types (e.g. `read_file` → `read`), and the IDE can register MCP servers that the agent then sees as additional toolsets.
10.5 Web UI (`hermes web`)
React SPA in `web/` + FastAPI in `hermes_cli/web_server.py`. Tabs: Status, Sessions (FTS5 search UI), Config (form + raw YAML), Cron, Skills. Security: ephemeral session tokens, DNS rebinding protection, CORS, rate limiting. EN/中文 localization.
10.6 Cron scheduler (`~/.hermes/cron/`)
Not APScheduler. A custom scheduler with a 60-second `tick()` loop running on a background thread inside the gateway process. Jobs are stored as JSON in `~/.hermes/cron/jobs.json` (not SQLite). Outputs persist to `~/.hermes/cron/output/{job_id}/{timestamp}.md`.
Job definition supports:
- Intervals (`every 30m`), 5-field cron, one-shot durations, ISO timestamps.
- A `prompt` field (the user message to send).
- An optional `skills` list to attach before execution (so a "review-PRs" cron job can pre-load a `pr-review` skill).
- Delivery target: `local` (write only), `origin` (back to where the job was created), or `platform:chat_id` (post to a specific Telegram/Slack chat).
Each tick: a fresh `AIAgent` with no history is created, attached skills load, the prompt runs, output is delivered, and job state updates.
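A tick boils down to "which jobs are due?". A sketch covering only the interval form (`every 30m`); the job-dict fields mirror the list above but are illustrative, and the other schedule forms (5-field cron, one-shots, ISO timestamps) are omitted.

```python
def parse_interval(spec: str) -> int:
    """Turn 'every 30m' / 'every 2h' / 'every 45s' into seconds."""
    token = spec.split()[1]                    # e.g. "30m"
    amount, unit = int(token[:-1]), token[-1]
    return amount * {"s": 1, "m": 60, "h": 3600}[unit]

def due_jobs(jobs, now):
    """One scheduler tick: jobs whose interval has elapsed since last_run."""
    return [j for j in jobs
            if now - j.get("last_run", 0) >= parse_interval(j["schedule"])]

JOBS = [
    {"id": "review-prs", "schedule": "every 30m", "prompt": "Review open PRs",
     "skills": ["pr-review"], "deliver": "platform:123", "last_run": 0},
    {"id": "fresh", "schedule": "every 2h", "prompt": "...", "last_run": 7000},
]
DUE = [j["id"] for j in due_jobs(JOBS, now=7200)]
```

The runner would then build a fresh history-less agent per due job, attach `skills`, send `prompt`, deliver the output, and update `last_run`.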
10.7 Batch runners (the training data pipeline)
Two siblings in the repo root:
- `batch_runner.py` – `BatchRunner` over `multiprocessing.Pool`, one isolated `AIAgent` per worker. `toolset_distributions.py` samples toolsets per prompt by independent inclusion probabilities. Checkpointing in `checkpoint.json` is keyed on prompt text, not index (so prompt-list edits don't invalidate the checkpoint). Outputs trajectories formatted for HuggingFace; reasoning is detected via `<REASONING_SCRATCHPAD>` tags or native thinking tokens – trajectories without reasoning are discarded.
- `mini_swe_runner.py` – sibling runner for SWE-style benchmark runs.
These are how Nous generates training data from real agent runs.
11. Profiles & Multi-Instance
You want to run a "personal" agent and a "work" agent on the same machine without their memories crossing? Profiles.
Implementation is dead simple but the timing is critical:
- Each profile has its own `HERMES_HOME` directory.
- `_apply_profile_override()` in `hermes_cli/main.py` sets `HERMES_HOME` before any other module imports run. If you set it after imports, modules that read paths at import time will use the wrong home.
- Every path lookup goes through `get_hermes_home()`. Hardcoded `~/.hermes` paths anywhere in the codebase break profile isolation.
Things to get right:
- Tests must mock both `Path.home()` and the `HERMES_HOME` env var – getting one but not the other leads to flaky failures.
- Gateway adapters acquire a per-profile token lock so two profiles can't both try to consume the same Telegram bot token.
- Honcho identities (and other memory provider IDs) are profile-scoped – don't share them across profiles or you'll cross-pollute user models.
12. Configuration & Secrets
Three knobs, three places:
| What | Where | Why |
|---|---|---|
| Model, toolsets, terminal backend, skin | `config.yaml` | Non-secret, version-controlled with the profile |
| API keys, tokens | `.env` | Secrets, never logged |
| Per-skill settings | each skill's `config.yaml` | Skill-local |

Three config loaders (`load_cli_config`, `load_config`, direct YAML) exist because the CLI, tool, and gateway runtimes have subtly different needs. Don't merge them prematurely – the duplication is intentional.
13. Prompt Caching (the cost story)
This is the single biggest reason your agent will be cheap or expensive in production.
Do:
- Build the system prompt once per session.
- Insert provider-specific cache breakpoints (Anthropic: `cache_control: {type: "ephemeral"}` on the last static message in the prefix).
- Use the frozen-snapshot pattern for memory: read MEMORY/USER files once at session start and embed them immutably, even if they change on disk later.
- Defer config changes ("toolset on/off", "switch model") to the next session. Slash commands that mutate state should accept an optional `--now` flag if invalidation is truly required, but default to deferred.
Don't:
- Reload memory mid-conversation.
- Add/remove tools mid-conversation.
- Mutate the system prompt because the user "switched topic".
- Make the system prompt depend on the current time, random ID, or anything that changes per turn.
A cached prefix is ~10× cheaper to read than to write. With a stable prefix, a 10-turn conversation costs roughly 1.5× a single turn. With an unstable prefix, it costs 10×.
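Concretely, the breakpoint is a `cache_control` marker on the last block of the static prefix. The `cache_control` field is real Anthropic Messages API surface; the wrapper function and model id below are illustrative.

```python
def anthropic_request(system_prefix: str, messages: list) -> dict:
    """Build a Messages API body whose system prefix is the cacheable region."""
    return {
        "model": "claude-sonnet-4-5",      # illustrative model id
        "max_tokens": 1024,
        "system": [
            {"type": "text", "text": system_prefix,
             # breakpoint: everything up to and including this block is cached
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": messages,   # turns live here, never in the cached prefix
    }

req = anthropic_request("You are Hermes. <persona, memory, skills, tools...>",
                        [{"role": "user", "content": "hi"}])
```

Because `system_prefix` is built once per session and never mutated, every subsequent turn re-reads the same prefix bytes and hits the cache.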
14. Build-Your-Own – Concrete Checklist
Here's what to build, in the order I'd build it. Each step is independently shippable.
Phase 1 – The loop (Day 1–2)
- Pick a language (Python is what Hermes uses; Go works too – see your repo's `backend-go/`).
- Implement `AIAgent.run_conversation(messages) -> messages`:
  - Call the model.
  - If the response has tool calls, dispatch each, append a tool result message, and loop.
  - Else return the final assistant message.
- Add an `IterationBudget` (default: 25 tool calls per user turn). One grace turn on exhaustion.
- Wrap each tool call in try/except – return errors as tool results, never raise out of the loop.
You now have a "tool-using chatbot". This is 80% of any agent.
Phase 2 – The CLI (Day 3)
- Build a thin CLI: read a line, call `run_conversation`, print the response, repeat.
- Add Ctrl-C interrupt handling that cancels the in-flight tool gracefully.
- Persist sessions to SQLite. Add an FTS5 virtual table on the message text column.
Phase 3 – Tools registry (Day 4–5)
- Create a `Registry` class with `register(name, toolset, schema, handler, available)`.
- Auto-import every file under `tools/` so tool modules can self-register.
- Implement 5 starter tools:
  - `read_file`, `write_file`, `list_dir`
  - `run_shell` (start with local-only)
  - `web_fetch`
- Add a `terminal.backend` config that swaps `run_shell` between local / docker / ssh.
Phase 4 – Memory & persona (Day 6)
- Add `~/.youragent/{SOUL.md, MEMORY.md, USER.md}` files.
- In `build_system_prompt()`, embed them in that order.
- Add agent-level tools `memory_append`, `memory_replace`, `memory_delete` so the agent can update them.
🧠 Phase 5 — Skills (Day 7–10) — the magic
- Define the SKILL.md frontmatter spec (copy Hermes's — it's already an open standard).
- On startup, scan `~/.youragent/skills/**/SKILL.md` and emit Level-0 entries (name + description) into the system prompt.
- Add tools:
  - `read_skill(name)` — returns the full SKILL.md (Level 1).
  - `read_skill_file(name, path)` — returns referenced files (Level 2).
  - `skill_manage(action, name, ...)` — `create | patch | edit | delete`.
- In your system prompt, add an explicit nudge: "When you finish a hard task, write a skill so you don't have to figure it out again." This single sentence is what turns a chatbot into a self-improving agent.
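A rough sketch of the startup scan that builds the Level-0 index; the frontmatter parser is deliberately naive (`---`-delimited `key: value` lines only), and a real build would use a YAML library:

```python
from pathlib import Path

def parse_frontmatter(text):
    """Naive frontmatter parser: '---' delimited 'key: value' lines."""
    meta = {}
    if text.startswith("---"):
        for line in text.split("---", 2)[1].strip().splitlines():
            if ":" in line:
                k, v = line.split(":", 1)
                meta[k.strip()] = v.strip()
    return meta

def level0_index(skills_dir):
    """Emit one 'name: description' line per skill for the system prompt."""
    lines = []
    for f in sorted(Path(skills_dir).glob("**/SKILL.md")):
        meta = parse_frontmatter(f.read_text())
        lines.append(f"- {meta.get('name', f.parent.name)}: {meta.get('description', '')}")
    return "\n".join(lines)
```

Only these one-liners are always in context; the full SKILL.md bodies are pulled on demand via `read_skill`, which is what keeps the prompt small.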
💰 Phase 6 — Prompt caching (Day 11)
- Pick a primary provider (Anthropic's caching is the most generous).
- Mark the end of the system prompt as a cache breakpoint.
- Audit every code path that touches `agent.system_prompt` — make sure none of them fire mid-conversation.
- Add a CI test that asserts the system prompt is byte-identical at turn 1 and turn 10.
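That CI test can be as blunt as this sketch; `StubAgent` stands in for your real agent factory, and `handle_turn` is a hypothetical turn entry point:

```python
class StubAgent:
    """Stand-in for your real agent object (hypothetical shape)."""
    def __init__(self):
        self.system_prompt = "You are a helpful agent.\n\n## MEMORY.md\n..."
    def handle_turn(self, text):
        # A cache-safe implementation never mutates system_prompt here.
        return "ok"

def test_system_prompt_is_cache_stable(make_agent=StubAgent):
    agent = make_agent()
    first = agent.system_prompt.encode()
    for _ in range(10):
        agent.handle_turn("hello")
    # Byte-identical, not just equal-after-normalization: caches compare bytes.
    assert agent.system_prompt.encode() == first
```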
📨 Phase 7 — Gateways (Day 12+)
- Build one adapter (Telegram is easiest — `python-telegram-bot` or equivalent).
- Adapter responsibilities: auth, session-key derivation, attachment download, response formatting.
- Verify: the same agent, with the same skills and memory, sees the same memory updates whether it's reached from the CLI or from Telegram.
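Session-key derivation is worth pinning down early, since every gateway funnels through it. A hedged sketch, treating `(platform, chat_id, profile)` as the assumed identity tuple:

```python
import hashlib

def session_key(platform, chat_id, profile="default"):
    """Derive a stable per-conversation session key across gateways."""
    raw = f"{platform}:{chat_id}:{profile}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]
```

Deterministic keys mean the same Telegram chat always maps to the same stored session, while CLI and Telegram sessions stay distinct.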
🔌 Phase 8 — MCP (Day 14+)
- Add an MCP client. Each MCP server becomes a virtual toolset in your registry.
- You now get GitHub, Slack, Linear, Postgres, Notion, etc. for free.
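The mapping is mechanical: each MCP tool becomes a namespaced entry in your registry. A simplified synchronous sketch — the real MCP Python SDK is async, and `list_tools`/`call_tool` here are assumed method names on a stub-shaped client:

```python
def mount_mcp_server(server_name, client, registry):
    """Map each tool from one MCP server into a flat tool registry (a dict here)."""
    for tool in client.list_tools():
        qualified = f"{server_name}.{tool['name']}"
        # Bind the tool name at definition time with a default argument
        registry[qualified] = lambda args, _n=tool["name"]: client.call_tool(_n, args)
    return registry
```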
✨ Phase 9 — Profiles & polish
- Route every filesystem path through a `get_home()` helper. Add a `--profile` flag that sets the home dir before imports.
- Add a context compressor (LLM-summarize middle turns when the conversation exceeds N tokens).
- Add a cron runner that loads jobs from `~/.youragent/cron.yaml` and runs them with no history.
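The `get_home()` helper is tiny but load-bearing; here's one sketch, where the `YOURAGENT_HOME` and `YOURAGENT_PROFILE` env var names are made up for illustration:

```python
import os
from pathlib import Path

def get_home(profile=None):
    """Every filesystem path goes through here, so --profile (or an env var,
    or a test fixture) can redirect the whole agent in one place."""
    profile = profile or os.environ.get("YOURAGENT_PROFILE", "default")
    base = os.environ.get("YOURAGENT_HOME", str(Path.home() / ".youragent"))
    return Path(base) if profile == "default" else Path(f"{base}-{profile}")
```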
That's the whole product. About 2–3 weeks of focused work for one engineer.
⚡ 15. Recommended Tech Stack
What Hermes uses, and what I'd swap.
| Concern | Hermes choice | Reasonable alternative |
|---|---|---|
| Language | Python 3.11+ | Go (faster CLI startup, single binary), TypeScript (web-native) |
| CLI rendering | `rich` + `prompt_toolkit` | `bubbletea` (Go), `ink` (TS) |
| TUI | Node.js + Ink, JSON-RPC to Python | Same, or a single-language stack |
| Storage | SQLite + FTS5 | Same. Don't get fancy here. |
| Vector memory (optional) | Honcho / mem0 / supermemory | pgvector, Chroma, Qdrant |
| Sandbox | Docker / Modal / Daytona | Firecracker, gVisor, E2B |
| MCP | Python MCP SDK | Anthropic's official SDK |
| Config | YAML | TOML if you prefer |
The boring choices (SQLite, markdown files, JSON schemas) are not accidents. Resist the urge to "upgrade" them — every place Hermes uses something boring, it's because it integrates trivially with the agent's own tools (the agent can `cat MEMORY.md` and reason about it).
⚠️ 16. Pitfalls You Will Hit
Listed in the order you'll hit them:
- Tool errors crashing the loop. Wrap every handler in try/except and return the error to the model.
- Cache-busting prompts. A `datetime.now()` in the system prompt will quietly destroy your cost model. Audit early.
- Infinite tool loops. Without an iteration budget, a model will happily call `list_dir` 400 times. Hard-cap it.
- Unbounded shell access. The local backend is fine for dev; in prod, use Docker with a read-only root and an explicit writable workspace.
- Skills that lie. The agent will write skills that reference tools or env vars that don't exist. Make `requires_toolsets` and `required_environment_variables` validation strict at install time.
- Memory file rot. The agent will append to `MEMORY.md` forever. Add a periodic compaction nudge to the system prompt: "If MEMORY.md exceeds 500 lines, consolidate."
- Profile leakage in tests. A test creates files in `~/.youragent/` because you forgot to mock the home dir. Mock `Path.home()` AND your home env var.
- Mid-conversation toolset toggles. The user types `/tools` and changes settings. Tell them "applies next session" — don't break the cache.
- Race conditions across gateways. Two Telegram messages arrive in 100ms. Use per-session locks.
- Reasoning content lost. OpenAI o-series and Claude extended thinking emit reasoning blocks that must be preserved in history (or dropped by the same rule on every turn) — inconsistency breaks the cache.
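The per-session lock from the race-condition pitfall is a one-liner with asyncio; a sketch, assuming your gateway handlers are async (`handle_message` and `run_turn` are illustrative names):

```python
import asyncio
from collections import defaultdict

session_locks = defaultdict(asyncio.Lock)  # one lock per session key

async def handle_message(session_key, text, run_turn):
    # Serialize turns within a session; different sessions still run concurrently.
    async with session_locks[session_key]:
        return await run_turn(session_key, text)
```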
💡 17. The Mental Model in One Sentence
A Hermes-style agent is a loop that fills its own filing cabinet: it reads from `skills/` and `MEMORY.md` to do its job, and writes back to those same files when it learns something — and every other system (tools, gateways, providers, plugins, profiles) exists to make that loop faster, safer, and reachable from more places.
Build the loop, build the filing cabinet, give it a tool to edit the filing cabinet, and tell it (in the system prompt) that it's allowed to. Everything else is scaffolding.
📚 18. References
- Repo: github.com/nousresearch/hermes-agent
- Docs: hermes-agent.nousresearch.com/docs
- Architecture page: hermes-agent.nousresearch.com/docs/developer-guide/architecture
- Skills format spec: agentskills.io
- Skills hub: agentskills.io
- DeepWiki overview: deepwiki.com/NousResearch/hermes-agent
If you found this helpful, let me know by leaving a like or a comment. And if you think this post could help someone, feel free to share it! Thank you very much! 🙏