DEV Community

Cover image for Engineering of Small Things: Hermes Plugin
ShatilKhan
ShatilKhan

Posted on

Engineering of Small Things: Hermes Plugin

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

Most "give your agent a new model provider" tutorials are stories of bravely subclassing things. You inherit from a base class, you override three methods, you read the wire-format docs of two vendors, you write an adapter, you handle the streaming chunks, you wire it into a settings page somewhere. By the time you've shipped, you've forgotten what you were trying to do.

Hermes Agent's provider-plugin SDK refuses to play that game. The whole thing is a declarative dataclass and one registry call. If the thing you want to add already speaks OpenAI on the wire which most modern gateways and aggregators do — you can be done in twenty-six lines.

This post walks you through the pattern with a real working example: the omnizen-provider plugin I shipped to expose Omnizen's gateway to Hermes. You don't have to care about Omnizen specifically the same shape works for any OpenAI-compatible gateway (Together, Groq, Fireworks, your home-grown vLLM, your in-house router). Omnizen just happens to be a convenient real example because the URL is the only thing you'd swap out.

The shape Hermes wants

A Hermes provider plugin lives at plugins/model-providers/<your-name>/ and ships two files:

  1. plugin.yaml — a short manifest so Hermes can discover and version the plugin.
  2. __init__.py — instantiates a ProviderProfile, then calls register_provider() on it.

That's it. There is no adapter class to subclass, no chat-completions method to implement, no streaming-chunk handler. The ProviderProfile is declarative: you describe what the provider is, and Hermes's existing OpenAI-compatible call path handles all the actual work.

The fields ProviderProfile cares about:

Field What it's for
name Internal identifier; the key shown in hermes model
aliases Short aliases users can type instead of the full name
env_vars Tuple of env vars the plugin reads — Hermes uses this for "is this provider configured?" detection
display_name Human-friendly name in hermes model
description One-line pitch under the name in the picker
signup_url Where Hermes sends users who don't have a key yet
base_url The OpenAI-compatible chat-completions endpoint
default_aux_model Model Hermes uses for internal calls (planning probes, tool description embeddings) when no model is specified
fallback_models Models Hermes tries in order when the primary fails or isn't picked

That's the whole API surface. If you can fill in those fields, you have a working provider.

My Plugin

Here's the entire __init__.py for the Omnizen plugin — no abbreviation, no "…and so on":

Initial omnizen plugin for hermes

And the manifest next to it:

manifest write for hermese plugin

You can checkout the code here:
Hermes-Omnizen

What happens at runtime

Hermes Agent -> Omnizen AI integration

The flow is symmetric on both ends:

  1. At Hermes startup, the plugin's __init__.py executes once. The register_provider(omnizen) call drops the ProviderProfile into Hermes's in-memory registry. From Hermes's point of view, the provider now exists; nothing more is needed.
  2. The user runs hermes model, picks the provider from the menu, and Hermes stores their choice.
  3. The user runs hermes chat — or invokes a tool, or kicks off a multi-step plan, or hops to another agent through the Agent Communication Protocol. Hermes builds a standard OpenAI chat-completions request, reads OMNIZEN_API_KEY from the env, and POSTs to base_url. The gateway answers in OpenAI's SSE envelope. Hermes's existing parser handles the stream and the tool-call frames. None of that code knows the difference between Omnizen and OpenAI proper — the wire format is identical, so the same call path serves both.

The reason this works is the OpenAI Chat Completions API has become the lingua franca for "talk to an LLM." If the thing you're building a Hermes plugin for already speaks OpenAI — and most modern gateways do — your job is describe the gateway, not implement an adapter. The runtime does the rest.

Why the pattern matters (even if you skip the rest of the post)

A few lessons that generalise beyond Hermes:

  1. Treat aggregators as a single "provider" in your agent's mental model. It keeps the pluggable-model interface clean — one wire format, one auth, one place to swap. Don't try to make your agent multiplex five providers itself if a gateway is already doing that work.
  2. OpenAI-compatible is the lingua franca, even when the backend isn't OpenAI. If the thing you're calling speaks OpenAI on the wire, your agent doesn't need to know what's behind it — Anthropic, MiniMax, your home-grown Llama, doesn't matter.
  3. Provider plugins should be declarative, not procedural. Hermes gets this right: I described what the provider is, I didn't write any code about what to do. The runtime knows what to do because the wire format is fixed.
  4. You shouldn't have to maintain a fork to ship a vendor. When your provider SDK is this small (ProviderProfile + register_provider), adding a new vendor is a PR-sized commit, not a months-long integration. Hermes ships fifteen or so of these out of the box — the cost is so low it might as well be free.

If you're building anything that wants to be "model-agnostic," this is the seam to expose.

Gotchas

The things you only learn by shipping one of these:

  1. The default_aux_model matters more than you think. Hermes uses it for any internal call where the user didn't pick a model — planning probes, tool-description embeddings, name-it. If you set it to a heavyweight model, every interaction feels slow and twice as expensive before the user has even said anything. Pick a cheap-fast model for the aux; let the user spend on the chat call. I default to kimi-k2 here because it's roughly the speed of a thought.
  2. Fallbacks are silent. fallback_models swap in automatically when the primary fails (rate limit, 5xx). Great — until you're debugging "why does my answer have a different vibe today?" and realise Hermes quietly fell back two models down the chain. Log the actual responding model: the usage.model field on the SSE stream tells you the truth; the picker only tells you the intent.
  3. The plugin registers at import time. Which means if the plugin module isn't on Hermes's discovery path, Hermes won't see it. Symptom: hermes model doesn't list your provider. Cause: missing __init__.py in a parent directory, wrong working directory at launch, or the plugin folder is in ~/Documents/cool-hermes-stuff and Python can't see it. Fix: install the plugin as a module so it's on sys.path, don't just copy-paste it into a random folder and hope.
  4. OpenAI-compatible ≠ identical to OpenAI. Most gateways disagree with the OpenAI SDK on at least one streaming-chunk shape — usually the final usage frame, sometimes the role on the first delta. Hermes is forgiving here, but if you build your own provider plugin and watch your assistant's last token vanish into thin air, this is where to start looking. Send one real chat through and watch the stream end-to-end before you ship.

References

The gateway-side of this — how the Omnizen API actually fans one virtual key out to multiple model brands behind the curtain — is a separate post for another day. For this tutorial, all you need to know is it speaks OpenAI on the wire. The pattern works the same for any gateway that does.

Top comments (0)