Hermes already has an image_generate tool. In many installs, that tool points at a built-in provider such as FAL or an OAuth-backed image backend.
But there is a common exception: the image model is available through a third-party or internal OpenAI-compatible API instead of the default Hermes image backend.
For example, an organization might already expose GPT Image 2 from the same service that handles chat completions:
POST https://your-api.example.com/v1/images/generations
In that case, the cleanest path is not to teach every agent session a custom curl command. It is to add a small Hermes image-generation backend plugin, then let the normal image_generate tool call that provider.
This keeps image generation inside the agent workflow:
agent prompt → image_generate → image provider plugin → custom API → local image path
The API shape
The custom API only needs to behave like an image-generation endpoint.
A typical request looks like this:
{
  "model": "gpt-image-2",
  "prompt": "A realistic portrait image of an orange tabby cat sitting on a windowsill",
  "size": "1024x1536",
  "n": 1,
  "quality": "high"
}
The response can contain either:
- b64_json, which the plugin should save to a local image file
- url, which the plugin can return directly
Hermes expects the provider result to contain a usable image reference, usually a local file path or a URL.
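To make the two response shapes concrete, here is a minimal standalone sketch of turning a parsed response into an image reference. The function name extract_image_reference, the image.png file name, and the temp-directory fallback are assumptions for illustration; inside a real plugin, the Hermes save_b64_image helper does the file saving.

```python
import base64
import tempfile
from pathlib import Path
from typing import Optional


def extract_image_reference(payload: dict, out_dir: Optional[str] = None) -> str:
    """Return a local file path for b64_json responses, or the URL for url responses."""
    data = payload.get("data") or []
    if not data:
        raise ValueError("response contains no image data")
    item = data[0]

    b64 = item.get("b64_json")
    if b64:
        # Decode the base64 payload and write the bytes to disk.
        # The .png extension is an assumption; the API may return another format.
        target_dir = Path(out_dir) if out_dir else Path(tempfile.mkdtemp())
        target_dir.mkdir(parents=True, exist_ok=True)
        path = target_dir / "image.png"
        path.write_bytes(base64.b64decode(b64))
        return str(path)

    url = item.get("url")
    if url:
        return url

    raise ValueError("response item has neither b64_json nor url")
```

Either branch ends with a plain string, which is the kind of reference the provider result needs.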
Why a backend plugin is better than a script
A one-off script works for testing, but it is awkward inside a real agent session.
A Hermes backend plugin gives you:
- the existing image_generate tool interface
- config-based provider selection
- reusable support for low, medium, and high quality modes
- local file saving for base64 responses
- the ability to switch image providers without rewriting prompts
The agent does not need to know where the image model lives. It just calls image_generate.
Plugin layout
Hermes user plugins live under $HERMES_HOME/plugins/.
For an image generation backend, use this shape:
$HERMES_HOME/plugins/image_gen/<provider-name>/
├── plugin.yaml
└── __init__.py
Example:
/data/hermes/plugins/image_gen/custom-gpt-image/
├── plugin.yaml
└── __init__.py
A minimal plugin.yaml:
name: custom-gpt-image
version: 1.0.0
description: GPT Image 2 generation through a custom OpenAI-compatible API
kind: backend
requires_env: []
The important part is kind: backend. This tells Hermes the plugin extends an existing tool category instead of registering a separate standalone tool.
For the broader Hermes plugin/config model, see the Hermes Agent documentation.
Provider class
The plugin registers an ImageGenProvider.
A simplified version:
from agent.image_gen_provider import (
    DEFAULT_ASPECT_RATIO,
    ImageGenProvider,
    error_response,
    resolve_aspect_ratio,
    save_b64_image,
    success_response,
)

# Model name sent to the upstream API.
API_MODEL = "gpt-image-2"

# Hermes-facing model IDs, each mapped to an API quality setting.
MODELS = {
    "gpt-image-2-low": {"quality": "low"},
    "gpt-image-2-medium": {"quality": "medium"},
    "gpt-image-2-high": {"quality": "high"},
}

# Aspect ratio names mapped to the pixel sizes sent to the API.
SIZES = {
    "landscape": "1536x1024",
    "square": "1024x1024",
    "portrait": "1024x1536",
}


class CustomGPTImageProvider(ImageGenProvider):
    @property
    def name(self):
        return "custom-gpt-image"

    def generate(self, prompt: str, aspect_ratio: str = DEFAULT_ASPECT_RATIO, **kwargs):
        import os

        import requests

        prompt = (prompt or "").strip()
        aspect = resolve_aspect_ratio(aspect_ratio)
        model_id = kwargs.get("model") or "gpt-image-2-medium"
        # Unknown model IDs fall back to medium quality.
        quality = MODELS.get(model_id, MODELS["gpt-image-2-medium"])["quality"]

        if not prompt:
            return error_response(
                error="Prompt is required",
                error_type="invalid_argument",
                provider=self.name,
                model=model_id,
                aspect_ratio=aspect,
            )

        base_url = os.environ["CUSTOM_IMAGE_BASE_URL"].rstrip("/")
        api_key = os.environ["CUSTOM_IMAGE_API_KEY"]

        response = requests.post(
            f"{base_url}/images/generations",
            headers={"Authorization": f"Bearer {api_key}"},
            json={
                "model": API_MODEL,
                "prompt": prompt,
                "size": SIZES[aspect],
                "n": 1,
                "quality": quality,
            },
            timeout=240,
        )
        response.raise_for_status()

        item = response.json()["data"][0]
        if item.get("b64_json"):
            # Save base64 payloads to disk and return the file path.
            path = save_b64_image(item["b64_json"], prefix=f"custom_{model_id}")
            image = str(path)
        else:
            image = item["url"]

        return success_response(
            image=image,
            model=model_id,
            prompt=prompt,
            aspect_ratio=aspect,
            provider=self.name,
            extra={"quality": quality, "size": SIZES[aspect]},
        )


def register(ctx):
    ctx.register_image_gen_provider(CustomGPTImageProvider())
Do not hard-code the token. Read it from environment variables, config.yaml, or the same provider configuration already used by the Hermes model endpoint.
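One way to read the credentials defensively is sketched below. It uses the same CUSTOM_IMAGE_BASE_URL and CUSTOM_IMAGE_API_KEY variables as the provider example, but the helper name load_image_credentials and the MissingImageConfig exception are hypothetical, not part of Hermes.

```python
import os
from typing import Mapping, Optional, Tuple

REQUIRED_VARS = ("CUSTOM_IMAGE_BASE_URL", "CUSTOM_IMAGE_API_KEY")


class MissingImageConfig(Exception):
    """Raised when a required environment variable is unset or blank."""


def load_image_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (base_url, api_key) from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise MissingImageConfig("missing environment variables: " + ", ".join(missing))
    base_url = env["CUSTOM_IMAGE_BASE_URL"].strip().rstrip("/")
    api_key = env["CUSTOM_IMAGE_API_KEY"].strip()
    return base_url, api_key
```

A provider could catch MissingImageConfig inside generate and turn it into an error result, so a misconfigured install reports which variable is missing instead of failing with a bare KeyError.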
Reusing an existing custom model endpoint
Some Hermes installs already use a custom OpenAI-compatible provider for chat:
model:
  provider: custom
  default: my-model
  base_url: https://your-api.example.com/v1
  api_key: ${YOUR_API_KEY}
If the same upstream service also exposes image generation, the image plugin can read model.base_url and model.api_key, then call:
{model.base_url}/images/generations
That avoids adding a second secret just for images.
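As a rough sketch, deriving the image endpoint from the chat model settings could look like this. It assumes the plugin already has the config as a parsed dictionary and that ${NAME} placeholders may still need expanding; how Hermes actually hands config to a plugin, and whether it expands placeholders first, is not covered here. The function names are hypothetical.

```python
import os
import re
from typing import Mapping, Optional, Tuple

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(value: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} with env[NAME]; unknown placeholders are left untouched."""
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(0)), value)


def image_endpoint_from_model_config(
    config: Mapping, env: Optional[Mapping[str, str]] = None
) -> Tuple[str, str]:
    """Return (images_url, api_key) derived from the model section of the config."""
    env = os.environ if env is None else env
    model = config.get("model") or {}
    base_url = expand_placeholders(str(model.get("base_url") or ""), env).rstrip("/")
    api_key = expand_placeholders(str(model.get("api_key") or ""), env)
    if not base_url or not api_key:
        raise ValueError("model.base_url and model.api_key must both be set")
    if _PLACEHOLDER.search(base_url) or _PLACEHOLDER.search(api_key):
        raise ValueError("unresolved ${...} placeholder in model config")
    return f"{base_url}/images/generations", api_key
```

Failing on an unresolved placeholder keeps a literal ${YOUR_API_KEY} string from being sent upstream as if it were a token.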
Enable the provider
Enable the plugin and point image_gen at it:
plugins:
  enabled:
    - image_gen/custom-gpt-image
image_gen:
  provider: custom-gpt-image
  model: gpt-image-2-high
Then restart the Hermes session or gateway so plugin discovery reloads.
You can also set the config through the Hermes CLI when appropriate:
hermes config set image_gen.provider custom-gpt-image
hermes config set image_gen.model gpt-image-2-high
Test it
Ask Hermes for an image:
Generate a realistic portrait image of an orange tabby cat on a windowsill.
A successful result should look roughly like this:
{
  "success": true,
  "image": "/data/hermes/cache/images/custom_gpt-image-2-high_20260630_134735.png",
  "model": "gpt-image-2-high",
  "provider": "custom-gpt-image",
  "size": "1024x1536",
  "quality": "high"
}
The key field is image. If it is a local path, messaging gateways can deliver the file as an actual image attachment. If it is a URL, the client can render or fetch it directly.
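If you are writing a client or gateway that consumes this result, the path-versus-URL distinction can be checked with a few lines. This is only an illustrative sketch; image_reference_kind is a hypothetical helper, and it treats anything that is not an http or https URL as a local path.

```python
from urllib.parse import urlparse


def image_reference_kind(image: str) -> str:
    """Classify a provider result as a remote "url" or a local "path"."""
    scheme = urlparse(image).scheme.lower()
    return "url" if scheme in ("http", "https") else "path"
```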
Common gotchas
The provider does not appear
Check that:
- plugin.yaml exists
- kind: backend is set
- the plugin is listed in plugins.enabled
- Hermes was restarted after the plugin was added
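The file-level items on that checklist can be verified with a small script. The sketch below is a hypothetical helper, not a Hermes command: it scans plugin.yaml line by line instead of using a YAML parser, and it does not check plugins.enabled or whether Hermes was restarted.

```python
from pathlib import Path
from typing import List


def check_plugin_dir(plugin_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means the layout looks right."""
    root = Path(plugin_dir)
    problems = []

    manifest = root / "plugin.yaml"
    if not manifest.is_file():
        problems.append("plugin.yaml is missing")
    else:
        # Naive scan: strip trailing comments and look for a top-level "kind:" key.
        kinds = []
        for line in manifest.read_text(encoding="utf-8").splitlines():
            line = line.split("#", 1)[0].strip()
            if line.startswith("kind:"):
                kinds.append(line.split(":", 1)[1].strip().strip("'\""))
        if "backend" not in kinds:
            problems.append("plugin.yaml does not set kind: backend")

    if not (root / "__init__.py").is_file():
        problems.append("__init__.py is missing")

    return problems
```

For the example layout, you would call check_plugin_dir("/data/hermes/plugins/image_gen/custom-gpt-image") and expect an empty list.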
The tool still uses another backend
image_generate is dispatched through image_gen.provider, not through the normal chat model name.
Check:
image_gen:
  provider: custom-gpt-image
The API returns base64
Save the base64 payload with save_b64_image() and return the resulting path. Do not pass a large base64 string back into the chat.
High quality requests time out
gpt-image-2-high can take longer than text generation. Use a realistic request timeout and make sure the surrounding gateway or worker allows enough time.
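One option is to scale the timeout with the quality mode instead of using a single value. In this sketch only the 240-second figure comes from the provider example above; the other numbers and the request_timeout name are assumptions to tune for your own API.

```python
from typing import Tuple

# Illustrative read timeouts in seconds. Only the 240 s value comes from the
# provider example; the others are assumptions to adjust per deployment.
READ_TIMEOUTS = {"low": 60.0, "medium": 120.0, "high": 240.0}
CONNECT_TIMEOUT = 10.0


def request_timeout(quality: str) -> Tuple[float, float]:
    """Return a (connect, read) timeout pair for the given quality mode."""
    read = READ_TIMEOUTS.get(quality, READ_TIMEOUTS["medium"])
    return (CONNECT_TIMEOUT, read)


# Usage inside the provider: requests.post(..., timeout=request_timeout(quality))
```

The requests library accepts a (connect, read) tuple for timeout, so a slow generation gets a long read window while an unreachable host still fails fast.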
The general pattern
This is not specific to one hosted API.
Any image model that can be called from Python can be wrapped as a Hermes image backend:
Hermes tool contract on one side, your image API on the other side.
That is the useful boundary. The agent keeps using one stable tool, while the backend can be OpenAI-compatible, internal, self-hosted, or replaced later.
If someone wants this without setting up the plugin path themselves, an option is to use a ready-to-use agent environment such as ClawMama, where image generation can be used directly from Telegram or WhatsApp. The manual plugin setup is still worth understanding when you need to connect your own image backend.