Michael Egberts

Posted on May 28

I Connected Hermes Agent to a Live MCP Server with 59 Tools and Here's What It Actually Built

#hermesagentchallenge #devchallenge #agents #mcp


This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

The Problem Nobody Talks About

Every AI can generate HTML. Give Claude, ChatGPT, or Gemini a prompt and they'll produce a beautiful landing page in seconds. But here's the thing nobody mentions in the demos:

The HTML has nowhere to go.

You copy it. You find hosting. You configure DNS. You set up SSL. You build a form backend. You connect a payment provider. You do this every single time, for every single client. The AI is fast. Everything around it is slow.

That's why we built WebsitePublisher.ai — an AI web platform where the AI doesn't just generate HTML, it publishes a live website through MCP tools. Pages, forms, data, payments, visual editing — 59 tools that turn any AI conversation into a working website.

But we hit a different problem.

The Skill Problem

We wrote a SKILL.md — 1,800 lines of documentation teaching AI assistants how to use our tools correctly. Patching rules, fragment conventions, design context, form integration patterns. Everything an AI needs to know.

And every AI interpreted it differently.

Claude sometimes forgot to re-fetch pages after patching. ChatGPT would skip fragments and hardcode headers into every page. Gemini ignored design context entirely. We rewrote the skill dozens of times. Added more examples. Simplified. Restructured. Still, each platform had its own blind spots.

Then we discovered Hermes Agent.

What Hermes Agent Actually Is

I'll be honest — I initially thought Hermes was an AI model. It's not.

Hermes Agent is an agent framework. It's the orchestration layer — tools, memory, and self-improving skills — that sits between you and any LLM. You plug in Claude, GPT-4o, Gemini, or a local model as the "brain." Hermes handles the rest.

Think of it like this:

Framework	Brain
Claude Code	Claude
Cursor	Claude / GPT
Hermes Agent	Your choice

The key differentiator: self-improving skills. Hermes learns from its own sessions. It builds reusable knowledge documents that persist across conversations. The more it works, the better it gets.

For us, this was the missing piece. Instead of rewriting our SKILL.md for every AI platform, what if we put Hermes in front — as an enforcement layer that learns our tool patterns once and applies them correctly, regardless of which LLM is doing the thinking?

We had to test it.

Setting Up the Connection

The setup was surprisingly smooth. On a Mac:

1. Install Hermes:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

2. Install MCP support:

cd ~/.hermes/hermes-agent
uv pip install -e ".[mcp]" --python ~/.hermes/hermes-agent/venv/bin/python

3. Configure Claude as the brain (in ~/.hermes/config.yaml):

provider: anthropic
model: claude-sonnet-4-20250514

4. Add WebsitePublisher as MCP server:

mcp_servers:
  websitepublisher:
    url: "https://mcp.websitepublisher.ai/mcp"
    auth: oauth
    timeout: 60

5. Authenticate:

hermes mcp login websitepublisher

The browser opened, OAuth 2.1 + PKCE completed in seconds, and:

✓ Authenticated — 59 tool(s) available

59 tools discovered. No session ID issues. No configuration headaches. Total setup time: under 10 minutes.
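For readers who haven't met PKCE before, here is a minimal sketch of the piece the client contributes to that login flow: a random code verifier and its S256 challenge, as defined in RFC 7636. This is not Hermes' actual implementation, which we haven't inspected; the function names are ours, for illustration only.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Return a random URL-safe verifier (43 characters for 32 random bytes)."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def make_code_challenge(verifier: str) -> str:
    """S256 challenge: BASE64URL(SHA256(verifier)) with '=' padding removed."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


# The client sends the challenge with the authorization request and later
# reveals the verifier when exchanging the authorization code for a token.
verifier = make_code_verifier()
challenge = make_code_challenge(verifier)
```

The server only issues a token if the verifier hashes to the challenge it saw earlier, which is why an intercepted authorization code is useless on its own.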

The Tests

We ran six structured tests to see how Hermes handles a real-world MCP integration. No cherry-picking — these are the actual results.

Test 1: Skill Loading + Page Creation

Prompt: "Call get_skill to load the WebsitePublisher skill, then build a simple landing page for an AI-powered website builder called 'Hermes Web'. Use a modern dark theme with a hero section, 3 feature cards, and a CTA button."

Hermes did something interesting. Without being told the specific workflow, it:

1. Called get_skill twice (main skill and design guidelines)
2. Called get_project_status to check for existing design context
3. Set a design context via execute_integration to persist the color palette
4. Created the page with create_page

That workflow (skill → context → build) is exactly what our SKILL.md recommends. Hermes absorbed it and followed it.
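The order of those calls is the whole point, so here is a rough sketch of the sequence. The tool names are the ones Hermes called; everything else is an assumption for illustration: the `call_tool` callable stands in for whatever MCP client you use, and the argument dictionaries are hypothetical, not our real tool schemas.

```python
from typing import Any, Callable

# Hypothetical signature for "invoke an MCP tool by name with arguments".
ToolCaller = Callable[[str, dict[str, Any]], Any]


def build_landing_page(call_tool: ToolCaller, html: str) -> list[str]:
    """Run the skill -> context -> build sequence and return the call order."""
    order: list[str] = []

    def step(name: str, args: dict[str, Any]) -> Any:
        order.append(name)
        return call_tool(name, args)

    # 1. Load the skill twice: main document, then design guidelines.
    step("get_skill", {})
    step("get_skill", {"section": "design"})  # argument name is an assumption
    # 2. Check whether a design context already exists.
    step("get_project_status", {})
    # 3. Persist the palette before any page is written (hypothetical payload).
    step("execute_integration", {"action": "set_design_context", "theme": "dark"})
    # 4. Only now create the page.
    step("create_page", {"slug": "index", "html": html})
    return order


# Usage with a recording stub instead of a live server:
recorded: list[tuple[str, dict[str, Any]]] = []
sequence = build_landing_page(lambda n, a: recorded.append((n, a)), "<h1>Hermes Web</h1>")
```

Swapping the stub for a real client is the only change needed to run this against a live server, assuming the payloads are adjusted to the actual tool schemas.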

Result: A professional dark-themed landing page, built in 1 minute 27 seconds. Gradient text, glass-morphism cards, responsive design, meta descriptions set, SEO enabled.

👉 See it live: hermes-mcp.websitepublisher.ai

Test 2: Patching

Prompt: "Change the CTA button text to 'Launch Your Site Now' and update the hero subtitle."

Hermes correctly chose patch_page over update_page — a targeted edit instead of replacing the entire page. It found both CTA buttons (hero and footer) and updated them in a single call.

Time: 17 seconds. No mismatches, no broken HTML.
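To make the patch-versus-replace distinction concrete, here is a minimal sketch of a targeted patch as a checked find-and-replace. It is not how patch_page is implemented on our side; the function, the sample markup, and the original button text "Get Started" are all made up for illustration.

```python
def apply_patch(html: str, old: str, new: str) -> tuple[str, int]:
    """Replace every occurrence of `old` and report how many were changed.

    Raises ValueError on a mismatch so a stale patch never fails silently.
    """
    count = html.count(old)
    if count == 0:
        raise ValueError(f"patch target not found: {old!r}")
    return html.replace(old, new), count


# Hypothetical page with the same CTA in the hero and in the footer.
page = (
    '<section class="hero"><a class="cta">Get Started</a></section>'
    '<footer><a class="cta">Get Started</a></footer>'
)
patched, replaced = apply_patch(page, ">Get Started<", ">Launch Your Site Now<")
print(replaced)  # 2
```

A full update would resend the entire page instead, which costs more tokens and risks dropping markup the model did not reproduce exactly.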

Test 3: Fragments

This is where most AI assistants struggle. Fragments are reusable components (headers, footers) shared across pages. They use a different tool (create_fragment) than pages, and they can't be patched, only fully replaced.

Prompt: "Create a reusable header fragment with navigation and the same dark theme."

Hermes correctly used create_fragment (not create_page), named it site-header, maintained the exact same color palette and fonts, added a mobile hamburger menu, and documented the include tag.

Time: 44 seconds.

Test 4: Data Entities

Prompt: "Create a 'features' entity with title, description, icon, and sort_order fields. Set public read. Add 3 records."

Hermes created the entity, enabled public read, and added 3 records in separate calls with correct sort ordering. Then, without being asked, it documented both the SSR template syntax (with mustache tags) and the JavaScript fetch endpoint.

That SSR syntax comes directly from our SKILL.md. Hermes learned it and applied it correctly.

Time: 32 seconds.
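As a rough picture of what "records rendered through mustache tags" means, here is a generic sketch: sort the records by sort_order and substitute `{{field}}` placeholders in a card template. This is not our SSR engine and it does not show our actual template wrapper syntax; the template, the record values, and the `render_card` helper are assumptions for illustration. Only the field names come from the prompt above.

```python
import re
from html import escape


def render_card(template: str, record: dict) -> str:
    """Fill {{field}} placeholders with HTML-escaped values from one record."""
    def substitute(match: re.Match) -> str:
        return escape(str(record.get(match.group(1), "")))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical card template and records for a 'features' entity.
CARD = '<div class="card"><h3>{{title}}</h3><p>{{description}}</p></div>'
records = [
    {"title": "Forms", "description": "Collect leads", "icon": "mail", "sort_order": 3},
    {"title": "Pages", "description": "Publish instantly", "icon": "rocket", "sort_order": 1},
    {"title": "Data", "description": "Store records", "icon": "database", "sort_order": 2},
]

# Render in sort_order, the same ordering the entity records were given.
cards = [render_card(CARD, r) for r in sorted(records, key=lambda r: r["sort_order"])]
```

The practical benefit is that editing a record changes the page without anyone touching the HTML again.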

Test 5: Form Integration

The SAPI form system has a specific pattern: configure the form server-side first, then add the HTML. The honeypot spam protection is automatic — if an AI manually adds a honeypot field, it breaks the form.

Prompt: "Add a contact form. Do NOT manually add a honeypot field."

Hermes called configure_form first (correct order), then patch_page to add the HTML. No manual honeypot. Dark theme styling maintained. Rate limiting configured.

Time: 1 minute 6 seconds.
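Both rules in this test (configure first, no manual honeypot) are easy to check mechanically. Below is a small sketch of such a check over a recorded call log. It is a hypothetical helper, not part of our platform or of Hermes, and the honeypot detection is a deliberately naive substring match.

```python
def check_form_workflow(calls: list[tuple[str, str]]) -> list[str]:
    """Inspect (tool_name, html_payload) pairs in call order; return problems."""
    problems: list[str] = []
    configured = False
    for tool, payload in calls:
        if tool == "configure_form":
            configured = True
        elif tool == "patch_page" and "<form" in payload.lower():
            if not configured:
                problems.append("form HTML added before configure_form")
            # Naive heuristic: any mention of 'honeypot' in the form markup.
            if "honeypot" in payload.lower():
                problems.append("manual honeypot field found in form HTML")
    return problems


# The order Hermes used in this test passes cleanly.
good = [
    ("configure_form", ""),
    ("patch_page", '<form><input name="email"></form>'),
]
# The failure mode we see from other assistants trips both rules.
bad = [
    ("patch_page", '<form><input name="honeypot"></form>'),
    ("configure_form", ""),
]
print(check_form_workflow(good))  # []
print(len(check_form_workflow(bad)))  # 2
```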

Test 6: Self-Reflection

Prompt: "What have you learned? What would you do differently?"

This is where it got interesting. Hermes identified its own mistakes:

"I created the features entity but then didn't actually use it in the page! Should have replaced the static feature cards with SSR."

"Should have immediately updated the landing page to use the fragment instead of the inline header."

And a quote that captures exactly why this matters:

"The platform remembers so the AI doesn't have to."

The Results

Test	Status	Time	Key Observation
Skill + Page Creation	✅	1m 27s	Followed SKILL.md workflow without being told
Patching	✅	17s	Correct method, found both CTAs
Fragments	✅	44s	Right tool, consistent design
Data Entities	✅	32s	SSR syntax from skill applied
Forms	✅	1m 6s	No manual honeypot, correct order
Self-Reflection	✅	41s	Identified own mistakes

6 tests. 0 failures. 9% context window used. ~5 minutes total tool execution time.

That context efficiency deserves emphasis. After loading a 1,800-line skill document, creating a full page, patching it, building a fragment, setting up a data entity with 3 records, configuring a form, and reflecting on the session, Hermes had used just 9% of its context window. If context use grows roughly linearly, that leaves room for about 10x this workload in a single session before hitting the limit.
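The arithmetic behind those two numbers is simple enough to check. The durations below are the ones from the results table, converted to seconds; the linear-growth assumption for context use is ours and is only a back-of-the-envelope estimate.

```python
# Durations from the results table, in seconds.
durations = {
    "Skill + Page Creation": 87,  # 1m 27s
    "Patching": 17,
    "Fragments": 44,
    "Data Entities": 32,
    "Forms": 66,                  # 1m 6s
    "Self-Reflection": 41,
}

total_seconds = sum(durations.values())
minutes, seconds = divmod(total_seconds, 60)
print(f"{minutes}m {seconds}s")  # 4m 47s

# Assumption: each repeat of this workload costs the same 9% of context.
context_used_percent = 9
full_runs = 100 // context_used_percent
print(full_runs)  # 11
```

So "about 5 minutes" and "roughly 10x" both hold up, with a little headroom on the second.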

There's another thing we didn't expect: Sonnet performed like Opus. Through Hermes, Claude Sonnet 4 produced output quality we normally associate with Opus — structured reasoning, correct workflow ordering, self-criticism. It's as if the agent layer acts as a pre-processor that elevates the underlying model's performance by providing the right context at the right time. The skill system doesn't just teach the LLM what to do — it makes a mid-tier model punch above its weight.

The Bigger Insight

The test results are nice. But here's what actually matters:

Hermes Agent can be a skill enforcement layer.

Right now, when we support a new AI platform, we have to test whether it interprets our SKILL.md correctly. Does it follow the patching rules? Does it use fragments? Does it remember to set design context? Every platform has different blind spots, and we end up rewriting the skill over and over.

With Hermes in the middle, the equation changes:

Before: Every AI must understand the skill → different interpretations → inconsistent results

After: Hermes understands the skill → any LLM can be the brain → consistent results

The self-improving skill system means Hermes gets better at using our tools over time. It builds up patterns, learns from mistakes ("I should have used the entity SSR instead of static HTML"), and applies those lessons in future sessions. The underlying LLM doesn't matter — the agent layer enforces quality.

This isn't theoretical. We ran it against a live MCP server with real OAuth authentication, and the result is a published website you can visit right now.

What's Next

We're exploring Hermes Agent as a permanent part of our platform architecture — not just as another supported AI, but as the orchestration layer that sits in front of all of them. One skill to learn, one agent to enforce it, any brain to power it.

The code and configuration are open source: github.com/megberts/mcp-hermes-integration

The live test result: hermes-mcp.websitepublisher.ai

WebsitePublisher.ai is an AI web platform where AI assistants build and publish complete websites via MCP. 59 tools, OAuth 2.1, works with Claude, ChatGPT, Gemini, Cursor, Copilot, Grok, Mistral, and now — Hermes Agent.
