Mohamed Abdallah

Routing 30+ image models with one MCP server

Most image-model wrappers pick one model and call it. DALL-E, Imagen, Stable Diffusion, Flux — pick your favorite, ship an API. The trade-off is fixed: one model's strengths become your whole tool's strengths, and its weaknesses become yours too.

prompt-to-asset takes a different angle. It's an MCP server (Model Context Protocol, the open standard Claude and a growing list of clients use for tool integration) that routes each request to the most suitable of 30+ image models for the task. The routing decisions live in a JSON table; the hard part, and the subject of this post, is how that table got built.

Why routing at all

Image models have wildly different strengths. A short list of specifics:

  • Text rendering. Imagen and Flux Pro are decent. Stable Diffusion and most Midjourney-clones produce garbled letterforms. If your prompt involves in-image text, routing matters.
  • Transparent backgrounds. Only a subset of models produce clean alpha. The rest force you to matte after generation.
  • Style adherence. For "flat vector editorial illustration, no 3D," some models comply on first try. Others need 3-4 regenerations.
  • Aspect ratios. Models have preferred training resolutions. Requesting 2.4:1 when the model was trained at 1:1 produces low-quality output or silent crops (see the ratio-snapping sketch after this list).
  • Cost. Free tiers (Pollinations, Stable Horde, HuggingFace Inference) work for drafts. Paid tiers (Imagen 3, Flux Pro, DALL-E 3) for finals. Picking the wrong tier for the use case wastes either money or quality.
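
To make the aspect-ratio point concrete, here's a minimal sketch of snapping a requested ratio to the closest one a model supports. The types and function name are illustrative, not code from the repo.

// Illustrative only: snap a requested aspect ratio to the nearest ratio a
// model was trained on, instead of silently asking a 1:1 model for 2.4:1.
type Ratio = { w: number; h: number };

function closestSupportedRatio(requested: Ratio, supported: Ratio[]): Ratio {
  const target = requested.w / requested.h;
  return supported.reduce((best, r) =>
    Math.abs(r.w / r.h - target) < Math.abs(best.w / best.h - target) ? r : best
  );
}

// Example: a 2.4:1 request against a model that supports 1:1, 16:9, and 4:3
const snapped = closestSupportedRatio(
  { w: 12, h: 5 },
  [{ w: 1, h: 1 }, { w: 16, h: 9 }, { w: 4, h: 3 }]
); // -> 16:9, the nearest supported ratio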

Given all that, one model per tool is a compromise. Routing lets you respect the compromise instead of pretending it doesn't exist.

The routing table

The table lives in data/routing-table.json. Each entry looks roughly like this (simplified):

{
  "task": "app_icon",
  "aspect_ratio": "1:1",
  "constraints": {
    "no_text": true,
    "transparent_background_optional": true
  },
  "preferred_models": [
    {"model": "imagen-3", "tier": "paid", "score": 9},
    {"model": "flux-pro-1.1", "tier": "paid", "score": 8},
    {"model": "pollinations-flux", "tier": "free", "score": 6}
  ],
  "never": ["dall-e-2", "stable-diffusion-1.5"],
  "post_processing": ["center_crop", "alpha_matte_if_needed"]
}

Three things about that structure took me time to get right (a selection sketch follows the list):

  1. "Never" is explicit. Routing by best-fit is fine until a user requests "app icon" and the router picks a model that can't handle square aspect. Listing the models to exclude per task prevents the correct-on-average, wrong-on-edge-cases failure mode.
  2. Scores are 1-10, not best-only. If the top-scored model is down or rate-limited, the router falls through. Fallback paths matter more than I initially thought.
  3. Post-processing is part of the route. "Generate a 1024x1024 icon, then center-crop, then alpha-matte" is a single logical route even though it involves multiple steps. Separating model selection from post-processing prematurely made the table harder to reason about.
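
Put together, selection over an entry like the one above is straightforward: drop anything on the never list, sort by score, take the first model that is actually reachable, and carry the post-processing steps along with the result. A minimal sketch with hypothetical types; isAvailable() stands in for whatever health or rate-limit check the server actually does.

// Hypothetical shapes mirroring the routing-table entry shown above.
interface ModelChoice { model: string; tier: "free" | "paid"; score: number; }
interface RouteEntry {
  task: string;
  preferred_models: ModelChoice[];
  never: string[];
  post_processing: string[];
}

async function pickModel(
  entry: RouteEntry,
  isAvailable: (model: string) => Promise<boolean>,
) {
  const candidates = entry.preferred_models
    .filter((c) => !entry.never.includes(c.model)) // "never" is explicit
    .sort((a, b) => b.score - a.score);            // scores, not best-only

  for (const candidate of candidates) {
    if (await isAvailable(candidate.model)) {
      // Post-processing travels with the route, not as a separate concern.
      return { ...candidate, post_processing: entry.post_processing };
    }
  }
  throw new Error(`No available model for task "${entry.task}"`);
}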

How the table got built

Honestly? Mostly by running the same ~30 prompts through each of the 30+ models and eyeballing the outputs. There's no automated scoring for "this looks like a good app icon." You see it or you don't.

I tried LLM-based scoring for a while: generate an image, have GPT-4-vision rate it. It was 60% accurate and introduced its own biases (it favored photorealism when the target was flat illustration, and so on). Faster than manual, but noisier. I ended up with a hybrid: LLM pre-screening to narrow the field to a top 3, then a human pick.

What I'd do differently if starting over: build the eval harness first, not the router. An eval harness is a script that takes a set of reference prompts and runs them through a model, spitting out a gallery of results. Even without automated scoring, having the gallery on disk for 30 models × 30 prompts = 900 images laid out in a grid makes routing decisions fast. Without the harness, I was regenerating the same images over and over.
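
The harness doesn't need to be more than a loop. Here's a sketch of the shape I mean; the generate callback is a placeholder for whichever image API you're evaluating, not the repo's actual code.

import { mkdir, writeFile } from "node:fs/promises";

// Run every reference prompt through every model once and lay the results out
// as gallery/<model>/<index>.png, so routing decisions come from one big grid
// on disk instead of endless regeneration.
async function buildGallery(
  models: string[],
  prompts: string[],
  generate: (model: string, prompt: string) => Promise<Buffer>, // your API call
) {
  for (const model of models) {
    await mkdir(`gallery/${model}`, { recursive: true });
    for (const [i, prompt] of prompts.entries()) {
      const image = await generate(model, prompt);
      await writeFile(`gallery/${model}/${String(i).padStart(3, "0")}.png`, image);
    }
  }
}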

Three execution modes

The server exposes three ways to use it:

  1. inline_svg — the host LLM writes SVG directly. No external model call. For simple logos and wordmarks this is fastest, free, and the LLM often does better than a diffusion model at pure-vector tasks.
  2. external_prompt_only — the server returns a structured prompt you paste into another tool (e.g., dev.to's cover generator, Midjourney, whatever). For when you want to run the model yourself.
  3. api — the server routes to an actual image-generation API, returns the image. Default mode for most use cases.

Mode 1 surprised me. I built it last, thinking it was niche. It turns out "have Claude write the SVG" is correct for probably 40% of icon/logo requests. The LLM writes cleaner, more editable output than any diffusion model can.
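
As a rough picture of how the three modes differ at the code level (the names and result shapes here are mine, not the server's actual interfaces):

type Mode = "inline_svg" | "external_prompt_only" | "api";

interface AssetResult { kind: "svg_instruction" | "prompt" | "image_url"; value: string; }

// Hypothetical dispatch over the three modes described above.
async function handleRequest(
  mode: Mode,
  prompt: string,
  routeAndGenerate: (prompt: string) => Promise<string>, // the table-driven API path
): Promise<AssetResult> {
  switch (mode) {
    case "inline_svg":
      // No external model call: hand the task back to the host LLM as an SVG-writing instruction.
      return { kind: "svg_instruction", value: `Write a clean, editable SVG for: ${prompt}` };
    case "external_prompt_only":
      // Return a structured prompt the user can paste into another tool.
      return { kind: "prompt", value: `Flat vector illustration, no 3D. Subject: ${prompt}` };
    case "api":
      // Route through the table, call the chosen model, return the image URL.
      return { kind: "image_url", value: await routeAndGenerate(prompt) };
  }
}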

Free-tier first

One design decision I'd recommend to anyone building similar tooling: make zero-key usage the first-class path. prompt-to-asset defaults to Pollinations for drafts, falling through to HuggingFace Inference and Stable Horde if Pollinations is down. No API key required to get a first result.

The paid-model routes exist for quality runs. But the first-time user who just wants to see what the tool does shouldn't have to get a Google Cloud billing account set up to see anything.
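
The fallthrough itself is nothing clever. A minimal sketch, assuming each provider exposes some callable client that throws when it's down; the provider names match the chain above, everything else is illustrative.

// Ordered free-tier chain: Pollinations first, then the backups.
const freeProviders = ["pollinations", "huggingface-inference", "stable-horde"];

async function generateFreeDraft(
  prompt: string,
  callProvider: (provider: string, prompt: string) => Promise<Buffer>, // placeholder client
): Promise<Buffer> {
  let lastError: unknown;
  for (const provider of freeProviders) {
    try {
      return await callProvider(provider, prompt); // first success wins
    } catch (err) {
      lastError = err; // fall through to the next provider
    }
  }
  throw new Error(`All free-tier providers failed: ${String(lastError)}`);
}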

What's next

The routing table grows as the model landscape changes. Every month there's a new Flux version or a new open-source diffusion model worth evaluating. The tool's longevity depends on the table staying current, not on any single routing decision being optimal.

If you hit a task where the routing picks the wrong model, open an issue with the task description and the output you got. Those reports are how the table stays honest.


Repo: github.com/MohamedAbdallah-14/prompt-to-asset. MIT licensed. MCP server, installable in Claude Code + compatible clients.

