DEV Community

Eli

Use GPT Image 2 in Hermes with a custom image API

Hermes already has an image_generate tool. In many installs, that tool points at a built-in provider such as FAL or an OAuth-backed image backend.

But one situation comes up often: the image model is available through a third-party or internal OpenAI-compatible API instead of the default Hermes image backend.

For example, an organization might already expose GPT Image 2 from the same service that handles chat completions:

POST https://your-api.example.com/v1/images/generations

In that case, the cleanest path is not to teach every agent session a custom curl command. It is to add a small Hermes image-generation backend plugin, then let the normal image_generate tool call that provider.

This keeps image generation inside the agent workflow:

agent prompt → image_generate → image provider plugin → custom API → local image path

The API shape

The custom API only needs to behave like an image-generation endpoint.

A typical request looks like this:

{
  "model": "gpt-image-2",
  "prompt": "A realistic portrait image of an orange tabby cat sitting on a windowsill",
  "size": "1024x1536",
  "n": 1,
  "quality": "high"
}

The response can contain either:

  • b64_json, which the plugin should save to a local image file
  • url, which the plugin can return directly

Hermes expects the provider result to contain a usable image reference, usually a local file path or a URL.
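
The two response shapes above can be handled by one small helper. This is a minimal sketch using only the standard library. `extract_image_reference`, its `out_dir` parameter, and the output file name are hypothetical names chosen for illustration. Inside a Hermes plugin, the `save_b64_image` helper used in the provider example plays this role.

```python
import base64
import tempfile
from pathlib import Path


def extract_image_reference(payload: dict, out_dir: str) -> str:
    """Return a local file path (for b64_json) or a URL from an images response.

    Assumes the OpenAI-style shape {"data": [{"b64_json": ...}]} or
    {"data": [{"url": ...}]}.
    """
    items = payload.get("data") or []
    if not items:
        raise ValueError("response contains no image data")
    item = items[0]

    if item.get("b64_json"):
        # Decode the base64 payload and write it to disk, so that only a
        # short path travels back to the agent.
        target = Path(out_dir) / "generated.png"  # hypothetical file name
        target.write_bytes(base64.b64decode(item["b64_json"]))
        return str(target)

    if item.get("url"):
        return item["url"]

    raise ValueError("response item has neither b64_json nor url")


if __name__ == "__main__":
    # Demo with a fake payload; the bytes are a placeholder, not a real PNG.
    with tempfile.TemporaryDirectory() as tmp:
        fake = {"data": [{"b64_json": base64.b64encode(b"placeholder").decode()}]}
        print(extract_image_reference(fake, tmp))
```

Checking `b64_json` first means that a response carrying both fields produces a local file, which is usually the more durable reference.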

Why a backend plugin is better than a script

A one-off script works for testing, but it is awkward inside a real agent session.

A Hermes backend plugin gives you:

  • the existing image_generate tool interface
  • config-based provider selection
  • reusable support for low, medium, and high quality modes
  • local file saving for base64 responses
  • the ability to switch image providers without rewriting prompts

The agent does not need to know where the image model lives. It just calls image_generate.

Plugin layout

Hermes user plugins live under $HERMES_HOME/plugins/.

For an image generation backend, use this shape:

$HERMES_HOME/plugins/image_gen/<provider-name>/
├── plugin.yaml
└── __init__.py

Example:

/data/hermes/plugins/image_gen/custom-gpt-image/
├── plugin.yaml
└── __init__.py

A minimal plugin.yaml:

name: custom-gpt-image
version: 1.0.0
description: GPT Image 2 generation through a custom OpenAI-compatible API
kind: backend
requires_env: []

The important part is kind: backend. This tells Hermes the plugin extends an existing tool category instead of registering a separate standalone tool.

For the broader Hermes plugin/config model, see the Hermes Agent documentation.

Provider class

The plugin registers an ImageGenProvider.

A simplified version:

from agent.image_gen_provider import (
    DEFAULT_ASPECT_RATIO,
    ImageGenProvider,
    error_response,
    resolve_aspect_ratio,
    save_b64_image,
    success_response,
)

API_MODEL = "gpt-image-2"

MODELS = {
    "gpt-image-2-low": {"quality": "low"},
    "gpt-image-2-medium": {"quality": "medium"},
    "gpt-image-2-high": {"quality": "high"},
}

SIZES = {
    "landscape": "1536x1024",
    "square": "1024x1024",
    "portrait": "1024x1536",
}

class CustomGPTImageProvider(ImageGenProvider):
    @property
    def name(self):
        return "custom-gpt-image"

    def generate(self, prompt: str, aspect_ratio: str = DEFAULT_ASPECT_RATIO, **kwargs):
        import os
        import requests

        prompt = (prompt or "").strip()
        aspect = resolve_aspect_ratio(aspect_ratio)
        model_id = kwargs.get("model") or "gpt-image-2-medium"
        quality = MODELS.get(model_id, MODELS["gpt-image-2-medium"])["quality"]

        if not prompt:
            return error_response(
                error="Prompt is required",
                error_type="invalid_argument",
                provider=self.name,
                model=model_id,
                aspect_ratio=aspect,
            )

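        # Read credentials from the environment and fail with a structured
        # error (not a KeyError) when either one is missing.
        base_url = os.environ.get("CUSTOM_IMAGE_BASE_URL", "").rstrip("/")
        api_key = os.environ.get("CUSTOM_IMAGE_API_KEY", "")
        if not base_url or not api_key:
            return error_response(
                error="CUSTOM_IMAGE_BASE_URL and CUSTOM_IMAGE_API_KEY must be set",
                error_type="missing_config",  # illustrative label
                provider=self.name,
                model=model_id,
                aspect_ratio=aspect,
            )

        try:
            response = requests.post(
                f"{base_url}/images/generations",
                headers={"Authorization": f"Bearer {api_key}"},
                json={
                    "model": API_MODEL,
                    "prompt": prompt,
                    "size": SIZES[aspect],
                    "n": 1,
                    "quality": quality,
                },
                timeout=240,  # high-quality renders can take minutes
            )
            response.raise_for_status()
            item = response.json()["data"][0]
        except (requests.RequestException, KeyError, IndexError, ValueError) as exc:
            # Report HTTP, network, and malformed-response failures to the agent.
            return error_response(
                error=f"Image API request failed: {exc}",
                error_type="api_error",  # illustrative label
                provider=self.name,
                model=model_id,
                aspect_ratio=aspect,
            )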

        if item.get("b64_json"):
            path = save_b64_image(item["b64_json"], prefix=f"custom_{model_id}")
            image = str(path)
        else:
            image = item["url"]

        return success_response(
            image=image,
            model=model_id,
            prompt=prompt,
            aspect_ratio=aspect,
            provider=self.name,
            extra={"quality": quality, "size": SIZES[aspect]},
        )


def register(ctx):
    ctx.register_image_gen_provider(CustomGPTImageProvider())

Do not hard-code the token. Read it from environment variables, config.yaml, or the same provider configuration already used by the Hermes model endpoint.

Reusing an existing custom model endpoint

Some Hermes installs already use a custom OpenAI-compatible provider for chat:

model:
  provider: custom
  default: my-model
  base_url: https://your-api.example.com/v1
  api_key: ${YOUR_API_KEY}

If the same upstream service also exposes image generation, the image plugin can read model.base_url and model.api_key, then call:

{model.base_url}/images/generations

That avoids adding a second secret just for images.
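
As a rough sketch, deriving the image endpoint from the chat settings could look like the code below. It assumes the `model` block has already been parsed into a plain dict and that `${VAR}` placeholders are expanded from the environment. How Hermes itself loads and expands config.yaml is not covered here, so `expand_env` and `resolve_image_endpoint` are hypothetical helpers, not Hermes APIs.

```python
import os
import re
from typing import Mapping

# Matches ${VAR_NAME} placeholders such as ${YOUR_API_KEY}.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str]) -> str:
    """Replace ${VAR} placeholders with values from env; fail if one is unset."""
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_sub, value)


def resolve_image_endpoint(
    config: dict, env: Mapping[str, str] = os.environ
) -> tuple[str, str]:
    """Return (images_url, api_key) derived from the chat model settings."""
    model = config.get("model") or {}
    base_url = expand_env(str(model.get("base_url", "")), env).rstrip("/")
    api_key = expand_env(str(model.get("api_key", "")), env)
    if not base_url or not api_key:
        raise ValueError("model.base_url and model.api_key are required")
    return f"{base_url}/images/generations", api_key


if __name__ == "__main__":
    config = {
        "model": {
            "provider": "custom",
            "base_url": "https://your-api.example.com/v1",
            "api_key": "${YOUR_API_KEY}",
        }
    }
    url, _key = resolve_image_endpoint(config, {"YOUR_API_KEY": "sk-test"})
    print(url)
```

Passing the environment as a parameter keeps the helper easy to test without touching real secrets.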

Enable the provider

Enable the plugin and point image_gen at it:

plugins:
  enabled:
    - image_gen/custom-gpt-image

image_gen:
  provider: custom-gpt-image
  model: gpt-image-2-high

Then restart the Hermes session or gateway so plugin discovery reloads.

You can also set the config through the Hermes CLI when appropriate:

hermes config set image_gen.provider custom-gpt-image
hermes config set image_gen.model gpt-image-2-high

Test it

Ask Hermes for an image:

Generate a realistic portrait image of an orange tabby cat on a windowsill.

A successful result should look roughly like this:

{
  "success": true,
  "image": "/data/hermes/cache/images/custom_gpt-image-2-high_20260630_134735.png",
  "model": "gpt-image-2-high",
  "provider": "custom-gpt-image",
  "size": "1024x1536",
  "quality": "high"
}

The key field is image. If it is a local path, messaging gateways can deliver the file as an actual image attachment. If it is a URL, the client can render or fetch it directly.
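
When testing, it can help to confirm which kind of reference came back and whether a local file really exists. The following is a small sketch for that check. `describe_image_reference` is a hypothetical helper, and treating only http and https as URLs is an assumption.

```python
from pathlib import Path
from urllib.parse import urlparse


def describe_image_reference(image: str) -> str:
    """Classify the `image` field of a result as "url", "file", or "missing"."""
    scheme = urlparse(image).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if Path(image).is_file():
        return "file"
    # Neither a web URL nor an existing file on this machine.
    return "missing"


if __name__ == "__main__":
    print(describe_image_reference("https://your-api.example.com/images/cat.png"))
```

A "missing" result for a path usually means the file was written on a different host or container than the one running the check.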

Common gotchas

The provider does not appear

Check that:

  • plugin.yaml exists
  • kind: backend is set
  • the plugin is listed in plugins.enabled
  • Hermes was restarted after the plugin was added
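
The file-level items in this checklist can be verified with a short script. This is a sketch only: `check_plugin_dir` is a hypothetical helper, it uses a plain text match for `kind: backend` instead of a YAML parser, and it does not inspect plugins.enabled or whether Hermes was restarted.

```python
import re
import tempfile
from pathlib import Path


def check_plugin_dir(plugin_dir: str) -> list[str]:
    """Return a list of problems found in an image_gen backend plugin directory."""
    root = Path(plugin_dir)
    problems = []

    manifest = root / "plugin.yaml"
    if not manifest.is_file():
        problems.append("plugin.yaml is missing")
    elif not re.search(
        r"^kind:\s*backend\s*$", manifest.read_text(encoding="utf-8"), re.MULTILINE
    ):
        problems.append("plugin.yaml does not set kind: backend")

    if not (root / "__init__.py").is_file():
        problems.append("__init__.py is missing")

    return problems


if __name__ == "__main__":
    # Demo against a throwaway directory that mimics the layout from this article.
    with tempfile.TemporaryDirectory() as tmp:
        plugin = Path(tmp) / "image_gen" / "custom-gpt-image"
        plugin.mkdir(parents=True)
        (plugin / "plugin.yaml").write_text(
            "name: custom-gpt-image\nkind: backend\n", encoding="utf-8"
        )
        print(check_plugin_dir(str(plugin)))  # __init__.py is still missing here
```

An empty list means the directory layout matches the shape described under "Plugin layout".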

The tool still uses another backend

image_generate is dispatched through image_gen.provider, not through the normal chat model name.

Check:

image_gen:
  provider: custom-gpt-image

The API returns base64

Save the base64 payload with save_b64_image() and return the resulting path. Do not pass a large base64 string back into the chat.

High quality requests time out

gpt-image-2-high can take longer than text generation. Use a realistic request timeout and make sure the surrounding gateway or worker allows enough time.
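
One way to make the timeout explicit is to scale it with the quality mode. The `requests` library accepts a `(connect, read)` tuple for its `timeout` argument, so a short connect timeout can be paired with a longer read timeout. In the sketch below, the 240-second value matches the provider example above. The other numbers and the `timeout_for_quality` helper are illustrative assumptions to tune against your own API.

```python
# Illustrative read timeouts in seconds, keyed by quality mode.
READ_TIMEOUTS = {"low": 60.0, "medium": 120.0, "high": 240.0}

# Time allowed for establishing the connection itself.
CONNECT_TIMEOUT = 10.0


def timeout_for_quality(quality: str) -> tuple[float, float]:
    """Return a (connect, read) pair suitable for requests.post(timeout=...).

    Unknown quality values fall back to the longest read timeout.
    """
    return (CONNECT_TIMEOUT, READ_TIMEOUTS.get(quality, READ_TIMEOUTS["high"]))


if __name__ == "__main__":
    print(timeout_for_quality("medium"))
```

In the provider, this would replace the fixed value, for example `requests.post(url, json=payload, timeout=timeout_for_quality(quality))`. Any gateway or worker in front of Hermes still needs a limit at least as long as the read timeout.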

The general pattern

This is not specific to one hosted API.

Any image model that can be called from Python can be wrapped as a Hermes image backend:

Hermes tool contract on one side, your image API on the other side.

That is the useful boundary. The agent keeps using one stable tool, while the backend can be OpenAI-compatible, internal, self-hosted, or replaced later.

If you want this without setting up the plugin path yourself, one option is a ready-to-use agent environment such as ClawMama, where image generation can be used directly from Telegram or WhatsApp. The manual plugin setup is still worth understanding when you need to connect your own image backend.
