Most image-model wrappers pick one model and call it. DALL-E, Imagen, Stable Diffusion, Flux — pick your favorite, ship an API. The trade-off is fixed: one model's strengths become your whole tool's strengths, and its weaknesses become yours too.
prompt-to-asset takes a different angle. It's an MCP server (Model Context Protocol — the open standard Claude and a growing list of clients use for tool integration) that routes each request to the best-suited of 30+ image models. The routing decisions live in a JSON table; the hard question this post is about is how that table got built.
## Why routing at all
Image models have wildly different strengths. A short list of specifics:
- Text rendering. Imagen and Flux Pro are decent. Stable Diffusion and most Midjourney clones produce garbled letterforms. If your prompt involves in-image text, routing matters.
- Transparent backgrounds. Only a subset of models produce clean alpha. The rest force you to matte the image after generation.
- Style adherence. For "flat vector editorial illustration, no 3D," some models comply on first try. Others need 3-4 regenerations.
- Aspect ratios. Models have preferred training resolutions. Requesting 2.4:1 when the model was trained at 1:1 produces low-quality output or silent crops.
- Cost. Free tiers (Pollinations, Stable Horde, HuggingFace Inference) work for drafts. Paid tiers (Imagen 3, Flux Pro, DALL-E 3) for finals. Picking the wrong tier for the use case wastes either money or quality.
Given all that, one model per tool is a compromise. Routing lets you respect the compromise instead of pretending it doesn't exist.
## The routing table
The table lives in `data/routing-table.json`. Each entry looks roughly like this (simplified):
```json
{
  "task": "app_icon",
  "aspect_ratio": "1:1",
  "constraints": {
    "no_text": true,
    "transparent_background_optional": true
  },
  "preferred_models": [
    {"model": "imagen-3", "tier": "paid", "score": 9},
    {"model": "flux-pro-1.1", "tier": "paid", "score": 8},
    {"model": "pollinations-flux", "tier": "free", "score": 6}
  ],
  "never": ["dall-e-2", "stable-diffusion-1.5"],
  "post_processing": ["center_crop", "alpha_matte_if_needed"]
}
```
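For readers who think in types, here is the same shape as a TypeScript interface (a sketch inferred from the example above, not the repo's actual source):

```typescript
// Shape of one routing-table entry, inferred from the JSON example above.
// Field names come from that example; the repo's actual types may differ.
interface RouteEntry {
  task: string;                       // e.g. "app_icon"
  aspect_ratio: string;               // e.g. "1:1"
  constraints: Record<string, boolean>;
  preferred_models: Array<{
    model: string;
    tier: "free" | "paid";
    score: number;                    // 1-10, higher is preferred
  }>;
  never: string[];                    // models excluded outright for this task
  post_processing: string[];          // steps run after generation
}
```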
Three things about that structure that took me time to get right:
- "Never" is explicit. Routing by best-fit is fine until a user requests "app icon" and the router picks a model that can't handle square aspect. Listing the models to exclude per task prevents the correct-on-average, wrong-on-edge-cases failure mode.
- Scores are 1-10, not best-only. If the top-scored model is down or rate-limited, the router falls through to the next one (see the sketch after this list). Fallback paths matter more than I initially thought.
- Post-processing is part of the route. "Generate a 1024x1024 icon, then center-crop, then alpha-matte" is a single logical route even though it involves multiple steps. Prematurely separating model selection from post-processing made the table harder to reason about.
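To make the "never" list and score-ordered fallback concrete, here is a minimal selection sketch. It reuses the `RouteEntry` shape from the earlier sketch; `isAvailable` and `generate` are hypothetical stand-ins for real health checks and provider calls:

```typescript
// Hypothetical stand-ins for provider health checks and API calls.
declare function isAvailable(model: string): Promise<boolean>;
declare function generate(model: string, prompt: string): Promise<Uint8Array>;

async function routeAndGenerate(entry: RouteEntry, prompt: string): Promise<Uint8Array> {
  const candidates = entry.preferred_models
    .filter((m) => !entry.never.includes(m.model)) // defensive: honor the "never" list
    .sort((a, b) => b.score - a.score);            // best score first

  for (const { model } of candidates) {
    if (!(await isAvailable(model))) continue;     // model down: skip it
    try {
      return await generate(model, prompt);
    } catch {
      // rate limit or transient failure: fall through to the next model
    }
  }
  throw new Error(`no available model for task "${entry.task}"`);
}
```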
## How the table got built
Honestly? Mostly by running the same ~30 prompts through each of the 30+ models and eyeballing the outputs. There's no automated scoring for "this looks like a good app icon." You see it or you don't.
I tried LLM-based scoring for a while — generate an image, have GPT-4-vision rate it. It was 60% accurate and introduced its own biases (it favored photorealism when the target was flat illustration, etc.). Faster than manual but noisier. I ended up with a hybrid: LLM pre-screening to narrow the field to a top 3, then a human pick.
What I'd do differently if starting over: build the eval harness first, not the router. An eval harness is a script that takes a set of reference prompts and runs them through a model, spitting out a gallery of results. Even without automated scoring, having the gallery on disk for 30 models × 30 prompts = 900 images laid out in a grid makes routing decisions fast. Without the harness, I was regenerating the same images over and over.
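A minimal version of that harness is just a nested loop and a filename convention. In this sketch, `MODELS` and `PROMPTS` are illustrative and `generate` is the same hypothetical provider call as above; one image lands on disk per (model, prompt) cell, grouped by prompt so competing models sit side by side in a file browser:

```typescript
import { mkdir, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical provider call, same shape as in the routing sketch.
declare function generate(model: string, prompt: string): Promise<Uint8Array>;

const MODELS = ["imagen-3", "flux-pro-1.1", "pollinations-flux"];
const PROMPTS: Record<string, string> = {
  app_icon: "flat vector app icon of a paper plane, 1:1",
  wordmark: "minimal wordmark logo reading 'Acme', white background",
};

async function runHarness(outDir: string): Promise<void> {
  for (const [name, prompt] of Object.entries(PROMPTS)) {
    const dir = join(outDir, name);
    await mkdir(dir, { recursive: true });
    for (const model of MODELS) {
      // One file per (model, prompt) cell: <outDir>/<prompt>/<model>.png
      const image = await generate(model, prompt);
      await writeFile(join(dir, `${model}.png`), image);
    }
  }
}
```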
## Three execution modes
The server exposes three ways to use it:
- `inline_svg` — the host LLM writes SVG directly. No external model call. For simple logos and wordmarks this is fastest and free, and the LLM often does better than a diffusion model at pure-vector tasks.
- `external_prompt_only` — the server returns a structured prompt you paste into another tool (e.g., dev.to's cover generator, Midjourney, whatever). For when you want to run the model yourself.
- `api` — the server routes to an actual image-generation API and returns the image. The default mode for most use cases.
Mode 1 surprised me. I built it last, thinking it was niche. It turns out "have Claude write the SVG" is correct for probably 40% of icon/logo requests. The LLM writes cleaner, more editable output than any diffusion model can.
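Conceptually, mode dispatch is a three-way switch. A hedged sketch, with `buildStructuredPrompt` and `lookupRoute` as hypothetical helpers (and `routeAndGenerate` from the routing sketch above), not the server's actual handler:

```typescript
type ExecutionMode = "inline_svg" | "external_prompt_only" | "api";

// Hypothetical helpers; names are illustrative only.
declare function buildStructuredPrompt(task: string, prompt: string): string;
declare function lookupRoute(task: string): RouteEntry;

async function handle(mode: ExecutionMode, task: string, prompt: string) {
  switch (mode) {
    case "inline_svg":
      // No external call: ask the host LLM to write the SVG itself.
      return { type: "svg_request", instructions: `Write clean SVG for: ${prompt}` };
    case "external_prompt_only":
      // Hand back a structured prompt for the user to run elsewhere.
      return { type: "prompt", prompt: buildStructuredPrompt(task, prompt) };
    case "api":
      // Route through the table and return the generated image.
      return { type: "image", data: await routeAndGenerate(lookupRoute(task), prompt) };
  }
}
```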
## Free-tier first
One design decision I'd recommend to anyone building similar tooling: make zero-key usage the first-class path. prompt-to-asset defaults to Pollinations for drafts, falling through to HuggingFace Inference and Stable Horde if Pollinations is down. No API key required to get a first result.
The paid-model routes exist for quality runs. But the first-time user who just wants to see what the tool does shouldn't have to set up a Google Cloud billing account to get anything back.
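The zero-key path is easy to sketch because Pollinations exposes a public URL endpoint that needs no key. The fallback helpers below mirror the chain described above but are hypothetical, not the server's actual code:

```typescript
// Hypothetical fallback helpers for the other free providers.
declare function tryHuggingFace(prompt: string): Promise<Uint8Array | null>;
declare function tryStableHorde(prompt: string): Promise<Uint8Array | null>;

async function draftImage(prompt: string): Promise<Uint8Array> {
  // Pollinations' public endpoint: no API key, just a URL.
  const url = `https://image.pollinations.ai/prompt/${encodeURIComponent(prompt)}`;
  try {
    const res = await fetch(url);
    if (res.ok) return new Uint8Array(await res.arrayBuffer());
  } catch {
    // network error: fall through to the next free provider
  }
  for (const fallback of [tryHuggingFace, tryStableHorde]) {
    const image = await fallback(prompt);
    if (image) return image;
  }
  throw new Error("all free-tier providers failed");
}
```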
## What's next
The routing table grows as the model landscape changes. Every month there's a new Flux version or a new open-source diffusion model worth evaluating. The tool's longevity depends on the table staying current, not on any single routing decision being optimal.
If you hit a task where the routing picks the wrong model, open an issue with the task description and the output you got. Those reports are how the table stays honest.
Repo: github.com/MohamedAbdallah-14/prompt-to-asset. MIT licensed. It's an MCP server, installable in Claude Code and compatible clients.