Why AI-Generated UIs Are All the Same Color — And the Data to Prove It
Have you ever noticed that AI-generated websites all look... blue-purple? It's not your imagination. I ran 480 experiments to find out why.
The Experiment
I showed 40 color swatches to 4 Vision-Language Models (GPT-4o, Claude 3.5 Sonnet, Claude Sonnet 4, LLaVA 7B) and asked: "What's the HEX code of this color?" Three trials per color, per model. 480 data points total, evaluated using CIEDE2000 — the industry standard for perceptual color difference.
Full paper: AI Blue: Systematic Color Recognition Bias in Vision-Language Models (DOI: 10.5281/zenodo.19159702)
Code & data: github.com/kenimo49/ai-blue-color-bias
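The paper scores errors with CIEDE2000, whose full formula is lengthy. As a minimal, self-contained sketch of the scoring idea, here's a HEX-to-Lab conversion plus the simpler CIE76 distance (plain Euclidean distance in Lab) standing in for ΔE₀₀ — the HEX values in the test are illustrative, not from the experiment:

```python
import math

def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for one 0-255 channel."""
    c /= 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def hex_to_lab(hex_code):
    """Convert a #RRGGBB string to CIE L*a*b* (D65 white point)."""
    h = hex_code.lstrip("#")
    r, g, b = (_srgb_to_linear(int(h[i:i + 2], 16)) for i in (0, 2, 4))
    # Linear sRGB -> XYZ (D65)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    def f(t):
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e76(hex_a, hex_b):
    """CIE76 color difference: Euclidean distance in Lab space."""
    la, lb = hex_to_lab(hex_a), hex_to_lab(hex_b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(la, lb)))
```

CIE76 overstates differences in saturated regions compared with CIEDE2000, but the workflow is the same: parse the model's reported HEX, convert both colors to Lab, and measure the perceptual distance.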
Three Patterns Emerged
Pattern 1: Pure colors are fine. In-between colors break.
Red, blue, green, yellow — models nail these almost perfectly. But teal, lime green, chartreuse, mauve? Accuracy drops dramatically.
Why? Web text is full of "red" and "blue" but rarely mentions "chartreuse." The models default to colors they've seen named most often.
Commercial models (GPT-4o, Claude):
- Mean ΔE₀₀: 2.51 – 3.33 (near-human accuracy for pure colors)
- But mid-tones showed 2–3x higher error
Open-source (LLaVA 7B):
- Mean ΔE₀₀: 24.63 (essentially guessing)
Pattern 2: Pastel colors are the worst
Lower saturation = worse accuracy. The irony: modern UI design heavily uses pastels and muted tones. The exact color space designers want is the one AI handles worst.
Pattern 3: 95.4% of AI-generated UI colors are blue-purple
I analyzed pixel-level color distribution across AI-generated UI samples. The result:
- 95.4% clustered around 240° (blue-purple)
- 4.6% at 210° (azure)
- 0% outside the 210°–270° range
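The hue clustering above is easy to check yourself with the standard library. A sketch, using a hypothetical palette sampled from an AI-generated mockup (the HEX values below are my examples, not the study's data):

```python
import colorsys

def hue_deg(hex_code):
    """HSV hue in degrees [0, 360) for a #RRGGBB color."""
    h = hex_code.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    return colorsys.rgb_to_hsv(r, g, b)[0] * 360.0

def in_blue_purple_band(hex_code, lo=210.0, hi=270.0):
    """True if the color's hue falls in the 210-270 degree band."""
    return lo <= hue_deg(hex_code) <= hi

# Hypothetical swatch pulled from an AI-generated UI mockup
palette = ["#6366F1", "#8B5CF6", "#3B82F6", "#0EA5E9"]
share = sum(in_blue_purple_band(c) for c in palette) / len(palette)
print(f"{share:.0%} of sampled colors sit in the 210-270 degree band")
```

Run this over the dominant colors of any AI-generated mockup and you'll see the same clustering: most hues land within a few degrees of 240°.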
That's not a design choice. That's a systematic bias.
The Feedback Loop
This is where it gets concerning:
- VLMs recognize pure colors (especially blue) most accurately
- During generation, they default to "easily recognizable" colors → blue-purple
- AI-generated UIs (full of blue-purple) enter training data
- Next-gen models amplify the bias
- Repeat
Left unchecked, AI-generated design will converge to an ever-narrower color palette.
Statistical Validation
The differences aren't random noise:
| Comparison | Result |
|---|---|
| Kruskal-Wallis (all 4 models) | H=110.15, p<.001 |
| GPT-4o vs Claude 3.5 Sonnet | p=.133 (not significant) |
| GPT-4o vs Claude Sonnet 4 | p=.060 (not significant) |
| Commercial vs LLaVA 7B | p<.001, d=-1.75 (massive gap) |
Commercial models perform similarly to each other — but the gap between commercial and open-source is enormous (Cohen's d = -1.75).
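These tests are straightforward to reproduce with SciPy. Here's a sketch on synthetic ΔE arrays — the means and spreads below are made up to mimic the reported magnitudes, NOT the paper's raw data (the real per-trial scores are in the GitHub repo):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for per-trial Delta-E00 scores, one array per model
gpt4o  = rng.normal(2.5, 1.0, 120)   # ~ commercial-model accuracy
claude = rng.normal(3.3, 1.2, 120)
llava  = rng.normal(24.6, 8.0, 120)  # ~ open-source accuracy

# Kruskal-Wallis: nonparametric test that the groups share a distribution
H, p = stats.kruskal(gpt4o, claude, llava)

def cohens_d(a, b):
    """Pooled-SD Cohen's d; negative when mean(a) < mean(b)."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                     / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

commercial = np.concatenate([gpt4o, claude])
d = cohens_d(commercial, llava)  # strongly negative: commercial errors are far smaller
```

Kruskal-Wallis is the right choice here because ΔE₀₀ errors are heavily skewed, so an ANOVA's normality assumption wouldn't hold.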
What You Can Do Right Now
- Specify HEX codes, not color names. Don't say "blue-ish." Say #0EA5E9.
- Always verify mid-tones visually. Teal, lime, mauve: check these manually.
- Add design token validation to your pipeline.
- Intentionally choose non-blue-purple palettes. That alone differentiates your AI-assisted design from 95% of the crowd.
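As one possible shape for the "design token validation" step, here's a CI-style check that flags a palette when too many of its saturated tokens cluster in the blue-purple band. The token names, thresholds, and HEX values are my assumptions, not part of the study:

```python
import colorsys

def validate_tokens(tokens, band=(210.0, 270.0), max_share=0.5):
    """Warn when too many saturated design tokens share one hue band.

    tokens: dict of token name -> #RRGGBB. Near-neutral colors
    (HSV saturation < 0.15) are ignored. Returns a list of warnings.
    """
    hues = {}
    for name, hex_code in tokens.items():
        h = hex_code.lstrip("#")
        r, g, b = (int(h[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
        hue, sat, _ = colorsys.rgb_to_hsv(r, g, b)
        if sat >= 0.15:                 # skip grays / near-whites
            hues[name] = hue * 360.0
    if not hues:
        return []
    in_band = [n for n, deg in hues.items() if band[0] <= deg <= band[1]]
    if len(in_band) / len(hues) > max_share:
        return [f"{len(in_band)}/{len(hues)} saturated tokens fall in "
                f"{band[0]:.0f}-{band[1]:.0f} deg (blue-purple): {in_band}"]
    return []

# Hypothetical token set: two blue-purples, a neutral, and a green
tokens = {"primary": "#6366F1", "accent": "#8B5CF6",
          "bg": "#F8FAFC", "success": "#22C55E"}
for warning in validate_tokens(tokens):
    print("palette check:", warning)
```

Wire a check like this into your pipeline and AI-suggested palettes that drift toward 240° get caught before they ship.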
The Bigger Picture: AI Slop
This color bias is one symptom of a larger phenomenon called AI Slop — the tendency of AI-generated content to converge on the same patterns. Purple gradients, Inter font, equal-spaced card grids. Merriam-Webster named "slop" its 2025 Word of the Year.
I've written a comprehensive guide covering the full escape route: typography, color, motion, spatial composition, and backgrounds — with Before/After experiments for each axis.
📖 AI Slop Escape Guide (Zenn Book, Japanese)
📄 AI Blue Paper (DOI: 10.5281/zenodo.19159702)
💻 Experiment Code & Data (GitHub)
Ken Imoto — Software Engineer, Propel-Lab. Building at the intersection of AI agents, WebRTC, and human-AI interaction.
kenimoto.dev · LinkedIn · GitHub