TL;DR
Hermes Agent is an MIT-licensed AI agent framework by Nous Research that genuinely learns from experience. Auto-generates skills after 5+ repeated tasks, maintains 5-layer persistent memory, supports 200+ models via OpenRouter, and can self-evolve its own prompts using GEPA (ICLR 2026 Oral). Just hit v0.4.0 with 300 PRs merged in one week.
The Problem: AI Agents That Forget Everything
Every AI agent I've used has the same fundamental issue: session ends, memory gone.
You spend 2 hours teaching it your project structure, coding conventions, and deployment pipeline. Next morning? Clean slate.
Hermes Agent solves this with a genuinely different architecture.
Self-Improving Loop: How It Works
The core innovation is a 4-step cycle:
Step 1 - Auto Skill Generation:
When you repeat a tool call 5+ times, the agent automatically synthesizes the procedure into a Python-based skill.
Step 2 - Skill Nudge:
Periodic prompts suggest saving completed workflows as reusable skills.
Step 3 - Skill Refinement:
When a skill fails or runs inefficiently, the agent iteratively improves it.
Step 4 - Persistent Storage:
Skills are saved to ~/.hermes/skills/ in the open agentskills.io format.
```bash
# Your skills grow over time
ls ~/.hermes/skills/
# deploy-staging.py
# git-feature-branch.py
# db-migration-check.py
```
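The threshold-and-synthesize behavior of Step 1 can be sketched in a few lines. This is a minimal illustration, not Hermes's internals: the `SkillTracker` class, the threshold constant, and the filename scheme are all hypothetical stand-ins for "5+ repeated tool calls trigger skill generation."

```python
from collections import Counter

SKILL_THRESHOLD = 5  # the article's "5+ repeated tasks" trigger; name is illustrative

class SkillTracker:
    """Counts repeated tool calls and emits a skill stub once a call recurs often enough."""

    def __init__(self, threshold: int = SKILL_THRESHOLD):
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.generated: set = set()

    def record(self, tool: str, args_signature: str):
        """Record one tool invocation; return a skill filename when the threshold is crossed."""
        key = f"{tool}:{args_signature}"
        self.counts[key] += 1
        if self.counts[key] >= self.threshold and key not in self.generated:
            self.generated.add(key)
            # In the real system this is where the procedure would be
            # synthesized into a Python skill under ~/.hermes/skills/.
            return f"{tool.replace('_', '-')}.py"
        return None

tracker = SkillTracker()
skill = None
for _ in range(5):
    skill = tracker.record("git_feature_branch", "base=main") or skill
print(skill)  # the fifth repeat crosses the threshold
```

Once generated, the skill is only emitted once; later repeats of the same call hit the `generated` set and return nothing new.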
5-Layer Memory System
| Layer | Mechanism | Persistence |
|---|---|---|
| MEMORY.md | Searchable markdown | Permanent |
| USER.md | User model (preferences, coding style) | Permanent |
| Honcho | AI-native dual-peer memory | Cross-session |
| SessionDB | SQLite + FTS5 full-text search | Permanent |
| Conversation | Messages + compression | Session |
The Honcho integration is particularly interesting: it builds both a "user peer" (a model of your goals and communication style) and an "AI peer" (the agent's own knowledge representation).
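The SessionDB layer is the easiest one to picture concretely: SQLite plus an FTS5 virtual table gives ranked full-text search over past sessions. The table and column names below are illustrative, not Hermes's actual schema.

```python
import sqlite3

# Minimal sketch of a SessionDB-style store: SQLite with an FTS5 virtual table.
con = sqlite3.connect(":memory:")
con.execute("CREATE VIRTUAL TABLE messages USING fts5(role, content)")
con.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        ("user", "deploy the staging environment"),
        ("assistant", "running deploy-staging skill"),
        ("user", "check the db migration status"),
    ],
)

# FTS5 MATCH performs tokenized full-text search; ORDER BY rank sorts by relevance.
rows = con.execute(
    "SELECT role, content FROM messages WHERE messages MATCH ? ORDER BY rank",
    ("deploy",),
).fetchall()
print(rows)
```

Because FTS5 tokenizes on punctuation, a query for `deploy` matches both "deploy the staging environment" and "deploy-staging skill", which is exactly the fuzziness you want when recalling old sessions.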
Self-Evolution via GEPA + DSPy
This is the wildcard feature. A separate repo (hermes-agent-self-evolution) provides genetic prompt evolution:
```bash
# Optimize a prompt for code review quality
python evolve.py --target "code review quality" --budget 10
```
How it works:
- Collects execution traces (errors, profiling, reasoning logs)
- Diagnoses why things failed
- Generates candidate prompt variants (genetic algorithm)
- Evaluates each variant
- Auto-creates a PR with the best performer
No GPU training. API calls only. ~$2-10 per optimization cycle.
Based on GEPA (Genetic-Pareto prompt evolution), presented as an oral paper at ICLR 2026.
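The mutate-evaluate-select loop above can be sketched as a toy. In the real optimizer, mutation and scoring are LLM API calls (diagnosis and evaluation); here both are stand-in functions so the selection logic is visible. Everything below is illustrative, not the repo's code.

```python
import random

random.seed(0)

# Stand-in mutations; the real system generates prompt variants via an LLM.
MUTATIONS = [
    " Cite the specific line for every issue.",
    " Prioritize correctness bugs over style nits.",
    " Suggest a concrete fix for each finding.",
]

def score(prompt: str) -> int:
    # Stand-in evaluator: reward prompts that contain more review directives.
    # The real evaluator would run the prompt and measure output quality.
    return sum(m.strip() in prompt for m in MUTATIONS)

def evolve(seed_prompt: str, budget: int = 10) -> str:
    best = seed_prompt
    for _ in range(budget):
        candidate = best + random.choice(MUTATIONS)   # mutation step
        if score(candidate) > score(best):            # evaluation + selection
            best = candidate
    return best

result = evolve("Review this diff for defects.")
print(score(result))
```

The `--budget` flag maps onto the loop count: each iteration costs evaluation calls, which is where the ~$2-10 per cycle comes from.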
Quick Start
```bash
# One-line install (Linux, macOS, WSL2)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

# Setup (model selection, API keys)
hermes setup

# Start
hermes
```
Only prerequisite: git. The install script handles Python, Node.js, and dependencies.
200+ Models, Zero Lock-in
```bash
# Switch models with one command
hermes model set openrouter/anthropic/claude-3.5-sonnet
hermes model set openai/gpt-4o
hermes model set ollama/llama3.1  # Local
```
Works with OpenRouter (200+ models), OpenAI, Anthropic (via proxy), Ollama, vLLM, llama.cpp.
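Provider-prefixed model strings like the ones above can route to different OpenAI-compatible endpoints with a trivial lookup. The base URLs below are the providers' published defaults; the routing code itself is a sketch, not Hermes's implementation.

```python
from dataclasses import dataclass

# Published OpenAI-compatible endpoints for each provider prefix.
BASE_URLS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "openai": "https://api.openai.com/v1",
    "ollama": "http://localhost:11434/v1",  # Ollama's local OpenAI-compatible server
}

@dataclass
class ModelTarget:
    base_url: str
    model: str

def resolve(model_string: str) -> ModelTarget:
    """Split 'provider/model...' at the first slash and look up the endpoint."""
    provider, _, model = model_string.partition("/")
    if provider not in BASE_URLS:
        raise ValueError(f"unknown provider: {provider}")
    return ModelTarget(BASE_URLS[provider], model)

target = resolve("openrouter/anthropic/claude-3.5-sonnet")
print(target.base_url, target.model)
```

This is the whole trick behind "zero lock-in": every provider speaks the same chat-completions protocol, so switching is just swapping `base_url` and the model id.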
6 Terminal Backends
| Backend | Use Case |
|---|---|
| Local | Direct host execution |
| Docker | Isolated, reproducible environments |
| SSH | Remote server management |
| Daytona | Serverless with hibernation |
| Singularity | HPC containers |
| Modal | Serverless (~$0 when idle) |
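All six backends can sit behind one execution interface, so the agent doesn't care where a command actually runs. The `Protocol` and class names below are hypothetical; they only illustrate the shape such an abstraction takes.

```python
import subprocess
from typing import Protocol

class TerminalBackend(Protocol):
    """One run() surface, regardless of where the command executes."""
    def run(self, command: str) -> str: ...

class LocalBackend:
    """Direct host execution via a subprocess shell."""
    def run(self, command: str) -> str:
        return subprocess.run(
            command, shell=True, capture_output=True, text=True, check=True
        ).stdout

class SSHBackend:
    """Remote execution: identical interface, command wrapped in ssh."""
    def __init__(self, host: str):
        self.host = host

    def run(self, command: str) -> str:
        return subprocess.run(
            ["ssh", self.host, command], capture_output=True, text=True, check=True
        ).stdout

backend: TerminalBackend = LocalBackend()
print(backend.run("echo hello").strip())
```

Docker, Daytona, Singularity, and Modal would each be one more class satisfying the same protocol, which is why adding a backend doesn't touch the agent loop.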
Honest Comparison
| Feature | Hermes Agent | Claude Code | Cursor |
|---|---|---|---|
| Self-evolution | Yes (GEPA) | No | No |
| Open source | MIT | Partial | No |
| Data privacy | Fully self-hosted | Cloud | Cloud |
| Model diversity | 200+ | Claude only | Multi |
| Persistent memory | 5 layers | Limited | Limited |
| Code quality | Model-dependent | Excellent | Excellent |
Where Hermes wins: Self-evolution, data privacy, model flexibility, cost ($5/mo VPS).
Where others win: Code output quality (Claude Code), community size (Cursor), polished UX.
v0.4.0 Highlights (March 23, 2026)
- OpenAI-compatible API server
- 6 new messaging adapters (Signal, DingTalk, SMS, Mattermost, Matrix, Webhook)
- MCP server management + OAuth 2.1
- Prompt caching + streaming by default
- 200+ bug fixes
- 300 PRs merged in one week
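The OpenAI-compatible API server means any existing OpenAI SDK or raw HTTP client can talk to Hermes. The request shape below is the standard chat-completions payload; the localhost port is an assumption on my part, so check the docs for the real bind address.

```python
import json

BASE_URL = "http://localhost:8000/v1"  # assumed default port, not confirmed

# Standard OpenAI chat-completions payload, as any SDK would serialize it.
payload = {
    "model": "openrouter/anthropic/claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Summarize my open PRs"}],
    "stream": True,  # v0.4.0 enables streaming by default
}
body = json.dumps(payload)
print(f"POST {BASE_URL}/chat/completions\n{body}")
```

Pointing an OpenAI SDK at this server is just `base_url=BASE_URL`; no Hermes-specific client is needed.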
Links:
- GitHub (10k+ stars)
- Documentation
- Self-Evolution Repo
What's your experience with self-improving AI agents? Have you tried Hermes or something similar? Would love to hear about your setup.