Mohamed Abdallah

Routing 30+ image models with one MCP server

Most image-model wrappers pick one model and call it. DALL-E, Imagen, Stable Diffusion, Flux — pick your favorite, ship an API. The trade-off is fixed: one model's strengths become your whole tool's strengths, and its weaknesses become yours too.

prompt-to-asset takes a different angle. It's an MCP server (Model Context Protocol, the open standard Claude and a growing list of clients use for tool integration) that routes each request to the most suitable of 30+ image models for the task. The routing decisions live in a JSON table; the hard part, and the subject of this post, is how that table got built.

Why routing at all

Image models have wildly different strengths. A short list of specifics:

  • Text rendering. Imagen and Flux Pro are decent. Stable Diffusion and most Midjourney-clones produce garbled letterforms. If your prompt involves in-image text, routing matters.
  • Transparent backgrounds. Only a subset of models produce clean alpha. The rest force you to matte after generation.
  • Style adherence. For "flat vector editorial illustration, no 3D," some models comply on first try. Others need 3-4 regenerations.
  • Aspect ratios. Models have preferred training resolutions. Requesting 2.4:1 when the model was trained at 1:1 produces low-quality output or silent crops (see the ratio-snapping sketch after this list).
  • Cost. Free tiers (Pollinations, Stable Horde, HuggingFace Inference) work for drafts. Paid tiers (Imagen 3, Flux Pro, DALL-E 3) for finals. Picking the wrong tier for the use case wastes either money or quality.
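
To make the aspect-ratio point concrete, here's a minimal sketch of snapping a requested ratio to the closest one a model supports. The types and function name are illustrative, not code from the repo.

// Illustrative only: snap a requested aspect ratio to the nearest ratio a
// model was trained on, instead of silently asking a 1:1 model for 2.4:1.
type Ratio = { w: number; h: number };

function closestSupportedRatio(requested: Ratio, supported: Ratio[]): Ratio {
  const target = requested.w / requested.h;
  return supported.reduce((best, r) =>
    Math.abs(r.w / r.h - target) < Math.abs(best.w / best.h - target) ? r : best
  );
}

// Example: a 2.4:1 request against a model that supports 1:1, 16:9, and 4:3
const snapped = closestSupportedRatio(
  { w: 12, h: 5 },
  [{ w: 1, h: 1 }, { w: 16, h: 9 }, { w: 4, h: 3 }]
); // -> 16:9, the nearest supported ratio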

Given all that, one model per tool is a compromise. Routing lets you respect the compromise instead of pretending it doesn't exist.

The routing table

The table lives in data/routing-table.json. Each entry looks roughly like this (simplified):

{
  "task": "app_icon",
  "aspect_ratio": "1:1",
  "constraints": {
    "no_text": true,
    "transparent_background_optional": true
  },
  "preferred_models": [
    {"model": "imagen-3", "tier": "paid", "score": 9},
    {"model": "flux-pro-1.1", "tier": "paid", "score": 8},
    {"model": "pollinations-flux", "tier": "free", "score": 6}
  ],
  "never": ["dall-e-2", "stable-diffusion-1.5"],
  "post_processing": ["center_crop", "alpha_matte_if_needed"]
}

Three things about that structure took me time to get right (a selection sketch follows the list):

  1. "Never" is explicit. Routing by best-fit is fine until a user requests "app icon" and the router picks a model that can't handle square aspect. Listing the models to exclude per task prevents the correct-on-average, wrong-on-edge-cases failure mode.
  2. Scores are 1-10, not best-only. If the top-scored model is down or rate-limited, the router falls through. Fallback paths matter more than I initially thought.
  3. Post-processing is part of the route. "Generate a 1024x1024 icon, then center-crop, then alpha-matte" is a single logical route even though it involves multiple steps. Separating model selection from post-processing prematurely made the table harder to reason about.
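
Put together, selection over an entry like the one above is straightforward: drop anything on the never list, sort by score, take the first model that is actually reachable, and carry the post-processing steps along with the result. A minimal sketch with hypothetical types; isAvailable() stands in for whatever health or rate-limit check the server actually does.

// Hypothetical shapes mirroring the routing-table entry shown above.
interface ModelChoice { model: string; tier: "free" | "paid"; score: number; }
interface RouteEntry {
  task: string;
  preferred_models: ModelChoice[];
  never: string[];
  post_processing: string[];
}

async function pickModel(
  entry: RouteEntry,
  isAvailable: (model: string) => Promise<boolean>,
) {
  const candidates = entry.preferred_models
    .filter((c) => !entry.never.includes(c.model)) // "never" is explicit
    .sort((a, b) => b.score - a.score);            // scores, not best-only

  for (const candidate of candidates) {
    if (await isAvailable(candidate.model)) {
      // Post-processing travels with the route, not as a separate concern.
      return { ...candidate, post_processing: entry.post_processing };
    }
  }
  throw new Error(`No available model for task "${entry.task}"`);
}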

How the table got built

Honestly? Mostly by running the same ~30 prompts through each of the 30+ models and eyeballing the outputs. There's no automated scoring for "this looks like a good app icon." You see it or you don't.

I tried LLM-based scoring for a while: generate an image, have GPT-4-vision rate it. It was 60% accurate and introduced its own biases (it favored photorealism when the target was flat illustration, and so on). Faster than manual, but noisier. I ended up with a hybrid: LLM pre-screening to narrow the field to a top 3, then a human pick.

What I'd do differently if starting over: build the eval harness first, not the router. An eval harness is a script that takes a set of reference prompts and runs them through a model, spitting out a gallery of results. Even without automated scoring, having the gallery on disk for 30 models × 30 prompts = 900 images laid out in a grid makes routing decisions fast. Without the harness, I was regenerating the same images over and over.
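
The harness doesn't need to be more than a loop. Here's a sketch of the shape I mean; the generate callback is a placeholder for whichever image API you're evaluating, not the repo's actual code.

import { mkdir, writeFile } from "node:fs/promises";

// Run every reference prompt through every model once and lay the results out
// as gallery/<model>/<index>.png, so routing decisions come from one big grid
// on disk instead of endless regeneration.
async function buildGallery(
  models: string[],
  prompts: string[],
  generate: (model: string, prompt: string) => Promise<Buffer>, // your API call
) {
  for (const model of models) {
    await mkdir(`gallery/${model}`, { recursive: true });
    for (const [i, prompt] of prompts.entries()) {
      const image = await generate(model, prompt);
      await writeFile(`gallery/${model}/${String(i).padStart(3, "0")}.png`, image);
    }
  }
}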

Three execution modes

The server exposes three ways to use it:

  1. inline_svg — the host LLM writes SVG directly. No external model call. For simple logos and wordmarks this is fastest, free, and the LLM often does better than a diffusion model at pure-vector tasks.
  2. external_prompt_only — the server returns a structured prompt you paste into another tool (e.g., dev.to's cover generator, Midjourney, whatever). For when you want to run the model yourself.
  3. api — the server routes to an actual image-generation API, returns the image. Default mode for most use cases.

Mode 1 surprised me. I built it last, thinking it was niche. It turns out "have Claude write the SVG" is correct for probably 40% of icon/logo requests. The LLM writes cleaner, more editable output than any diffusion model can.
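
As a rough picture of how the three modes differ at the code level (the names and result shapes here are mine, not the server's actual interfaces):

type Mode = "inline_svg" | "external_prompt_only" | "api";

interface AssetResult { kind: "svg_instruction" | "prompt" | "image_url"; value: string; }

// Hypothetical dispatch over the three modes described above.
async function handleRequest(
  mode: Mode,
  prompt: string,
  routeAndGenerate: (prompt: string) => Promise<string>, // the table-driven API path
): Promise<AssetResult> {
  switch (mode) {
    case "inline_svg":
      // No external model call: hand the task back to the host LLM as an SVG-writing instruction.
      return { kind: "svg_instruction", value: `Write a clean, editable SVG for: ${prompt}` };
    case "external_prompt_only":
      // Return a structured prompt the user can paste into another tool.
      return { kind: "prompt", value: `Flat vector illustration, no 3D. Subject: ${prompt}` };
    case "api":
      // Route through the table, call the chosen model, return the image URL.
      return { kind: "image_url", value: await routeAndGenerate(prompt) };
  }
}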

Free-tier first

One design decision I'd recommend to anyone building similar tooling: make zero-key usage the first-class path. prompt-to-asset defaults to Pollinations for drafts, falling through to HuggingFace Inference and Stable Horde if Pollinations is down. No API key required to get a first result.

The paid-model routes exist for quality runs. But the first-time user who just wants to see what the tool does shouldn't have to get a Google Cloud billing account set up to see anything.
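
The fallthrough itself is nothing clever. A minimal sketch, assuming each provider exposes some callable client that throws when it's down; the provider names match the chain above, everything else is illustrative.

// Ordered free-tier chain: Pollinations first, then the backups.
const freeProviders = ["pollinations", "huggingface-inference", "stable-horde"];

async function generateFreeDraft(
  prompt: string,
  callProvider: (provider: string, prompt: string) => Promise<Buffer>, // placeholder client
): Promise<Buffer> {
  let lastError: unknown;
  for (const provider of freeProviders) {
    try {
      return await callProvider(provider, prompt); // first success wins
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw new Error(`All free-tier providers failed: ${String(lastError)}`);
}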

What's next

The routing table grows as the model landscape changes. Every month there's a new Flux version or a new open-source diffusion model worth evaluating. The tool's longevity depends on the table staying current, not on any single routing decision being optimal.

If you hit a task where the routing picks the wrong model, open an issue with the task description and the output you got. Those reports are how the table stays honest.


Repo: github.com/MohamedAbdallah-14/prompt-to-asset. MIT licensed. MCP server, installable in Claude Code + compatible clients.

