DEV Community

정상록

Hermes Agent v0.11.0: How a Self-Improving Open-Source AI Agent Hit 105K GitHub Stars in 7 Weeks

If you missed it: Nous Research dropped Hermes Agent v0.11.0 on April 23, 2026, and the project crossed 105,000 GitHub stars in just 7 weeks since its February 25 launch. That's faster than AutoGPT, faster than CrewAI, and arguably the most significant release in the open-source agent space this year.

I spent the weekend digging into the actual code and benchmarks. Here's what I found, and why I think the GEPA self-improvement loop is the part most articles are underselling.

Why this matters more than typical "another agent framework" launches

Most "agent frameworks" published in 2024-2025 were orchestration layers — they call LLM APIs in sequence and manage state. Hermes Agent does that, but adds something new: the agent literally rewrites its own prompts and skills as it works.

This isn't a marketing claim. It's GEPA (Genetic-Pareto), a reflective prompt-evolution method accepted as an Oral paper at ICLR 2026.

The GEPA Loop in Detail

Task complete (5+ tool calls)
      ↓
Trace analysis (which tools, in what order, with what context)
      ↓
Skill file auto-generated (.md format, human-readable)
      ↓
System prompt auto-tuned (small deltas)
      ↓
SQLite FTS5 index updated for retrieval
      ↓
Next similar task → 40% faster (after ~20 skills accumulated)
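The retrieval step at the end of the loop can be sketched with Python's built-in sqlite3 module, assuming your Python build includes FTS5 (most do). The table layout, the `build_index` and `find_skills` helpers, and the second sample skill are illustrative assumptions, not Hermes Agent's actual schema or API.

```python
import sqlite3

def build_index(skills):
    """Create an in-memory FTS5 table with one row per skill (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE skills USING fts5(name, keywords, body)")
    conn.executemany(
        "INSERT INTO skills (name, keywords, body) VALUES (?, ?, ?)", skills
    )
    conn.commit()
    return conn

def find_skills(conn, query, limit=3):
    """Return names of the best-matching skills for a free-text request."""
    words = query.split()
    if not words:
        return []
    # Quote every word so punctuation is not parsed as FTS5 query syntax,
    # and join with OR so a single matching word is enough.
    match_expr = " OR ".join('"' + w.replace('"', '""') + '"' for w in words)
    rows = conn.execute(
        "SELECT name FROM skills WHERE skills MATCH ? ORDER BY rank LIMIT ?",
        (match_expr, limit),
    ).fetchall()
    return [row[0] for row in rows]

# Sample rows: the first mirrors the skill discussed in this post,
# the second is invented purely for contrast.
SKILLS = [
    ("market-research-pipeline",
     "market trends research competitive analysis",
     "Run web-search with a date filter, extract, summarize, report."),
    ("release-notes-writer",
     "changelog release notes",
     "Collect merged pull requests and draft release notes."),
]

conn = build_index(SKILLS)
print(find_skills(conn, "competitive analysis for Q3"))
```

Full-text search keeps the index inspectable with any SQLite client, which fits the "no opaque embeddings" design.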

The skill files end up in ~/.hermes/skills/ and look like this:

---
name: market-research-pipeline
trigger_keywords: ["market trends", "research", "competitive analysis"]
tool_sequence: [web-search, extract-content, summarize, translate, image-gen]
---

# Market Research Pipeline

When user asks for market research:
1. Run web-search with date filter (last 30 days)
2. Extract from top 5 results
3. Summarize in target language
4. Generate infographic if data is quantitative
5. Format as markdown report

Crucially, you can edit these files manually. The agent's "memory" is just a directory of markdown files. No proprietary vector store, no opaque embeddings.
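Because the format is plain text, a few lines of code are enough to read a skill file. Here is a minimal Python sketch, assuming the front matter only ever uses the simple `key: value` and `[a, b]` forms shown above. `parse_skill` and `matches` are hypothetical helpers; Hermes Agent's own loader may work differently.

```python
def parse_skill(text):
    """Split a skill file into (metadata dict, markdown body)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' front matter line")
    end = lines.index("---", 1)  # closing front matter delimiter
    meta = {}
    for line in lines[1:end]:
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list: split on commas and drop surrounding quotes
            items = [v.strip().strip("\"'") for v in value[1:-1].split(",")]
            meta[key.strip()] = [v for v in items if v]
        else:
            meta[key.strip()] = value
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body

def matches(meta, request):
    """True if any trigger keyword appears in the request (case-insensitive)."""
    request = request.lower()
    return any(k.lower() in request for k in meta.get("trigger_keywords", []))

SKILL = """---
name: market-research-pipeline
trigger_keywords: ["market trends", "research", "competitive analysis"]
tool_sequence: [web-search, extract-content, summarize, translate, image-gen]
---

# Market Research Pipeline

When user asks for market research:
1. Run web-search with date filter (last 30 days)
"""

meta, body = parse_skill(SKILL)
print(meta["name"], meta["tool_sequence"])
print(matches(meta, "Can you research EV battery pricing?"))
```

Substring matching on keywords is the simplest possible trigger rule; it is used here only to show that the files are easy to work with by hand.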

Benchmarks: Where GEPA Actually Wins

| Benchmark | Hermes Agent (GEPA) | Comparison Point |
| --- | --- | --- |
| MATH | 93% | Base CoT on same model: 67% (+26pt) |
| AIME-2025 | +12% | vs MIPROv2, the leading prompt optimizer |
| GEPA vs GRPO | avg +6%, max +20% | with 35x fewer rollouts |
| RefusalBench (Hermes 4.3 36B) | 57%+ | GPT-4o / Claude: ~17% |

The RefusalBench result is the one I'd verify independently if I were betting production budget on this. A score roughly 3.4x that of the major closed models (57% vs ~17%) is a big claim. But if it holds, the enterprise implications are significant.

What's New in v0.11 Specifically

React/Ink TUI v2 (Complete Rewrite)

The terminal interface was completely rebuilt on React + Ink. New capabilities:

  • Sticky composer: Input area stays at the bottom even with long output streams
  • OSC-52 clipboard: Click any code block to copy to system clipboard
  • Live streaming: Tool call results render in real-time with progress indicators

If you've used the TUI in v0.10, this feels like a different product.

/steer — Mid-Execution Intervention

This one's underrated. Traditional agent frameworks make you wait for the entire run to complete before you can correct course. With /steer, you intervene right before the next tool call:

Agent: "I'll translate all 10 articles next."
You: /steer only translate 5
Agent: [adjusts plan, translates 5]

The implementation uses a queue inspection at the tool dispatch boundary. Clean.
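One way to picture a queue check at the dispatch boundary is the Python sketch below. It illustrates the idea rather than Hermes Agent's code: `run_plan`, the `(tool, argument)` plan format, and modeling a steer as a function that rewrites the remaining plan are all assumptions. In the real agent, the `/steer` text would presumably be interpreted by the model.

```python
import queue

def run_plan(plan, steer_queue, tools):
    """Run tool calls in order, applying queued steering before each dispatch."""
    results = []
    pending = list(plan)
    while pending:
        # Dispatch boundary: drain any steering that arrived since the last call
        while True:
            try:
                steer = steer_queue.get_nowait()
            except queue.Empty:
                break
            pending = steer(pending)  # a steer rewrites the remaining plan
        if not pending:
            break
        tool_name, arg = pending.pop(0)
        results.append(tools[tool_name](arg))
    return results

# Hypothetical plan mirroring the exchange above: translate 10 articles
plan = [("translate", f"article-{i}") for i in range(1, 11)]
tools = {"translate": lambda doc: f"translated {doc}"}

steer_queue = queue.Queue()
steer_queue.put(lambda remaining: remaining[:5])  # "/steer only translate 5"

results = run_plan(plan, steer_queue, tools)
print(len(results), results[-1])
```

Checking the queue only between tool calls means a running tool is never interrupted halfway, which keeps the control flow simple.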

Unlimited Sub-Agent Recursion

Sub-agents can spawn sub-agents to arbitrary depth and breadth. Example pipeline:

main-agent
  ├── researcher
  │   ├── web-scraper-agent
  │   └── pdf-extractor-agent
  ├── analyst
  │   └── data-validator-agent
  └── writer
      └── editor-agent

In v0.10 this was capped. v0.11 removes the cap.
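The difference between a capped and an uncapped tree can be sketched in a few lines of Python. The `Agent` class, its `run` method, and the `max_depth` parameter are hypothetical names for illustration; only the tree shape comes from the example above.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    children: list = field(default_factory=list)

    def run(self, task, depth=0, max_depth=None):
        """Visit this agent, then its sub-agents; return an indented call log."""
        # max_depth=None models the uncapped behavior; an integer models a cap
        if max_depth is not None and depth > max_depth:
            raise RecursionError(f"{self.name} exceeds max depth {max_depth}")
        log = ["  " * depth + self.name]
        for child in self.children:
            log.extend(child.run(task, depth + 1, max_depth))
        return log

TREE = Agent("main-agent", [
    Agent("researcher", [Agent("web-scraper-agent"), Agent("pdf-extractor-agent")]),
    Agent("analyst", [Agent("data-validator-agent")]),
    Agent("writer", [Agent("editor-agent")]),
])

log = TREE.run("write a market report")
print("\n".join(log))
```

Calling `TREE.run("write a market report", max_depth=1)` raises `RecursionError` as soon as a second-level sub-agent is reached. Even without a framework cap, you may want a guard like this in your own deployments to bound cost.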

Five New Model Providers

| Provider | Use Case |
| --- | --- |
| GPT-5.5 (Codex OAuth) | OpenAI's latest coding-specialized model |
| AWS Bedrock (Converse API) | Enterprise AWS infrastructure integration |
| NVIDIA NIM | NVIDIA inference containers |
| Arcee AI | Small specialized models |
| Vercel ai-gateway | Multi-provider routing |

The AWS Bedrock integration is the one I'd watch closely. It enables truly private deployments inside enterprise VPCs — which is what most regulated industries need.

QQBot (17th Messaging Platform)

Tencent QQ is the newly added platform. Combined with Discord, Slack, Telegram, LINE, WhatsApp, WeChat, and others, Hermes Agent now covers nearly every major chat surface globally.

The Cost Story That Should Make You Pay Attention

Per Nous Research's published numbers, equivalent enterprise tasks cost:

| Volume | Hermes Agent (Local 4.3 36B) | GPT-5.5 / Claude (API) |
| --- | --- | --- |
| Single 5-tool task | $0.001 | $0.02 - $0.09 |
| 1,000 tasks/day | $1 | $20 - $90 |
| Monthly | $30 | $600 - $2,700 |

That's a 20-90x cost differential. For a 10-person engineering team running agents 24/7, the math becomes obvious quickly. The catch: you need a 24GB+ GPU for local Hermes 4.3 36B. If you're running on API providers, you lose most of the cost advantage.
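To check the arithmetic and get a feel for the GPU catch, here is a small Python sketch. The per-task prices are the figures quoted above; `GPU_PRICE` is a made-up placeholder, not a quoted price, and the break-even estimate ignores everything except those numbers.

```python
# Per-task figures quoted above (Nous Research's published numbers)
LOCAL_PER_TASK = 0.001
API_PER_TASK_LOW, API_PER_TASK_HIGH = 0.02, 0.09

def cost(per_task, tasks_per_day=1000, days=1):
    """Total spend for a per-task price, a daily volume, and a number of days."""
    return per_task * tasks_per_day * days

def break_even_days(hardware_cost, api_per_task, tasks_per_day=1000):
    """Days until a one-off hardware purchase is repaid by per-task savings."""
    daily_saving = cost(api_per_task, tasks_per_day) - cost(LOCAL_PER_TASK, tasks_per_day)
    return hardware_cost / daily_saving

GPU_PRICE = 2000.0  # hypothetical placeholder for a 24GB+ GPU; substitute your own

for label, per_task in [
    ("local", LOCAL_PER_TASK),
    ("API low", API_PER_TASK_LOW),
    ("API high", API_PER_TASK_HIGH),
]:
    print(f"{label:9s} ${cost(per_task):8.2f}/day  ${cost(per_task, days=30):9.2f}/month")

fastest = break_even_days(GPU_PRICE, API_PER_TASK_HIGH)
slowest = break_even_days(GPU_PRICE, API_PER_TASK_LOW)
print(f"GPU pays for itself in roughly {fastest:.0f} to {slowest:.0f} days")
```

At lower volumes the daily saving shrinks proportionally, so pass your own `tasks_per_day` before drawing conclusions.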

Architecture Highlights

┌─────────────────────────────────────────┐
│  Messaging Platforms (17)               │
│  Telegram / Discord / Slack / LINE / ... │
└─────────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│  Hermes Agent Core                       │
│                                          │
│  ┌────────────────────────────────────┐ │
│  │  Memory (3-tier)                    │ │
│  │  • Short: in-memory context         │ │
│  │  • Medium: SQLite FTS5 (cross-session) │
│  │  • Long: skills/personas/profiles   │ │
│  └────────────────────────────────────┘ │
│                                          │
│  ┌────────────────────────────────────┐ │
│  │  GEPA Loop                          │ │
│  │  • Trace capture                    │ │
│  │  • Skill generation                 │ │
│  │  • Prompt tuning                    │ │
│  └────────────────────────────────────┘ │
│                                          │
│  ┌────────────────────────────────────┐ │
│  │  Tool Gateway (v0.10+)              │ │
│  │  • Web search (Firecrawl)           │ │
│  │  • Image gen (FAL FLUX 2 Pro)       │ │
│  │  • TTS (OpenAI)                     │ │
│  │  • Browser (Browser Use)            │ │
│  └────────────────────────────────────┘ │
└─────────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│  Model Providers (multiplexed)          │
│  Local 4.3 / GPT-5.5 / Claude / Gemini  │
└─────────────────────────────────────────┘

Getting Started in 30 Minutes

# Install
npm install -g @nousresearch/hermes-agent

# Initialize
hermes init my-agent
cd my-agent

# Choose provider (run exactly one; alternatives are commented out)
hermes config set provider local          # Free, needs GPU
# hermes config set provider openai       # Easy, costs money
# hermes config set provider anthropic
# hermes config set provider google

# Connect Telegram (easiest messaging platform)
hermes connect telegram --token "$TELEGRAM_BOT_TOKEN"

# Start
hermes start

# Now message your bot on Telegram

For Tool Gateway (web search, image gen, TTS, browser):

hermes login nous-portal
hermes config set tool-gateway nous-portal

Or BYO API keys:

hermes config set tools.web-search.firecrawl-key "$FIRECRAWL_KEY"
hermes config set tools.image.fal-key "$FAL_KEY"

What I'm Still Watching

  • Independent benchmark verification: GEPA numbers come from Nous Research themselves. Would be valuable to see third-party reproduction.
  • GEPA in production: Does the 40% speedup on repeat tasks materialize after 1-3 months of real usage, or is it a benchmark artifact?
  • Tool Gateway availability: Nous Portal is currently the easiest path. Is it stable enough for production SLAs?

Verdict

Hermes Agent v0.11.0 is the most significant open-source agent release of 2026 so far. The GEPA self-improvement loop is genuinely novel, the 20-90x cost advantage opens up agent use cases that didn't pencil out before, and the 17-platform messaging integration makes consumer-facing deployments trivial.

If you're building agents in 2026, you owe it to yourself to spend a weekend with this.

