TL;DR:
- AI-built sites look uncannily similar because they share the same defaults — Tailwind + shadcn/ui + Lucide + the same gradients. It's not a placeholder problem; it's a visual-stack problem. Real, project-specific images are the cheapest way out.
- I wrote a small Claude Code skill that wraps Codex CLI's `gpt-image-2` and triggers on natural-language asks. Drop a `DESIGN.md` at the project root, tell Claude to insert images, and you get a coherent, on-brand set across the site.
- Biggest win for solo developers shipping without a designer. Repo: github.com/JunSeo99/claude-skill-codex-imagegen — install takes 30 seconds (or just hand the URL to Claude Code itself).
Claude Code can build a working site in one session. The structure, the routing, the component library — it all comes together fine on the first pass. The problem is more subtle: most of these sites end up looking like each other.
The reason is the stack. Claude reaches for the same defaults every time — Tailwind, shadcn/ui, Lucide icons, a slate-or-zinc palette, a hero with a soft purple-to-blue gradient, cards with a 1px border and rounded-2xl corners, an abstract SVG blob somewhere in the header. None of those choices are bad. But across hundreds of vibe-coded sites, the cumulative effect is that someone landing on one feels like they've been on this site before — even when they haven't. Visitors don't say "this is shadcn." They say "this feels AI-generated." And the surface they're reacting to is mostly visual: the same component library, the same icon language, the same illustration-less spaces.
The cheapest way out of that uniformity, I've found, is real images. Not stock. Not Unsplash. Project-specific, style-consistent images generated to match a brand voice. Three or four of them placed where default vibe-coded sites would have left a Lucide icon over a gradient, and the "feels AI-generated" reaction collapses. The site stops reading as a template.
I wanted that to stop being a manual step.
In April, OpenAI shipped gpt-image-2 and bundled an $imagegen skill into Codex CLI. That gave me what I needed: a real image model I could shell out to from inside Claude Code. So I wrote a Claude Code skill that triggers on natural-language asks like "make a hero image for this landing page" and dispatches the actual generation to Codex.
Then I spent a weekend learning why nobody had a clean solution yet.
## gpt-image-2 has three sharp edges, and none of them are documented loudly
These are the things I hit, in order, on the first day (a combined workaround sketch follows the list):

1. **Size requests are advisory, not enforced.** I asked for 256×256. Got 1254×1254. Asked for 1024×1024 — also 1254×1254. The model picks its own dimensions based on what it thinks the prompt needs. If you actually need a specific size for a CSS slot, you resize after, not before. You can't prompt your way out of it.
2. **Transparent PNGs aren't supported.** gpt-image-2 will not emit alpha; only gpt-image-1.5 does, and that's buried in the OpenAI image-generation guide. The first time I asked for an icon "on transparent background," I got a perfectly nice icon sitting on a solid white square. The workaround is to generate on a flat removable background — green or pure white — and chroma-key it out locally. Fine, but you need to know that going in.
3. **The PNG doesn't land where you asked.** It lands at `~/.codex/generated_images/<session-uuid>/ig_*.png`. Telling Codex "save to assets/hero.png" doesn't move the file there. You move it yourself afterwards.
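Here's a minimal shell sketch of the three workarounds, assuming the defaults described above; the filenames and the fuzz threshold are illustrative, not from the skill:

```bash
# 1) Size is advisory: resize after generation instead of fighting the prompt.
sips -z 900 1600 assets/hero.png                             # macOS (height width)
convert assets/hero.png -resize '1600x900!' assets/hero.png  # Linux / ImageMagick

# 2) No alpha channel: generate on flat pure white, then chroma-key it out.
convert assets/icon.png -fuzz 8% -transparent white assets/icon-alpha.png

# 3) Output lands in the Codex session dir: grab the newest file and move it.
src=$(ls -t ~/.codex/generated_images/*/ig_*.png | head -n 1)
cp "$src" assets/hero.png
```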
Each of those is a 20-minute debug session if you don't know them. Stacked, they make image generation feel "kind of broken" when it's actually working as designed, just badly documented.
## And then there's the prompt itself
Even if you handle all three edges above, your output is only as good as your prompt. And gpt-image-2 punishes keyword soup.
The "stunning cinematic 8K masterpiece volumetric lighting" energy that worked on Midjourney v5 produces visibly worse output here. The OpenAI cookbook recommends a five-part structure — Scene → Subject → Details → Use case → Constraints — and front-loading the first 50 words because the model weights the opening more heavily. This is real. I A/B'd it. The five-part one wins every time.
For text in images (logos, banners, posters), wrap the literal text in double quotes or ALL CAPS so the model knows what's literal vs. descriptive. gpt-image-2 is genuinely strong here — short labels, signs, and UI mockups land at near-perfect spelling across Latin and CJK scripts, which is a meaningful jump from older models. Where it still wobbles is (a) long multi-line paragraphs baked into the image, (b) brand names and uncommon spellings, and (c) very small text inside dense layouts. For brand names, the OpenAI prompting guide recommends spelling the tricky word out letter-by-letter in the prompt ("the word ACME spelled A-C-M-E"). For paragraph-length text, render it as an HTML/CSS overlay over the generated image instead of asking the model to bake it in — that's the workflow gpt-image-2's own docs recommend.
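An illustrative prompt that uses both tricks at once (the product name is made up):

```text
A minimal launch banner with the headline "SHIP WEEKLY" in a bold sans-serif.
Below it, the brand name ACME, spelled A-C-M-E, in small capitals.
Flat warm off-white background, single terracotta underline, no other text.
```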
For edits, the trick is "change only X, keep everything else identical." The model preserves what you don't mention only loosely — but it preserves what you explicitly tell it to keep very well.
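The edit phrasing, spelled out the same way (again illustrative):

```text
Change only the crane's paper color from terracotta to slate blue.
Keep the lighting, shadows, camera angle, background, and framing identical.
```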
None of this lives in the recipes that just say "run codex exec and you're done." So I baked all of it into the skill's playbook.
## What the skill actually does
One `SKILL.md` plus two reference files (`prompting-guide.md`, `cli-reference.md`) that Claude Code auto-loads from `~/.claude/skills/codex-imagegen/`. No Node, no install step beyond `git clone && ln -s`.
When you say something like "make a hero image of an origami crane for the landing page, save to assets/hero.png at 1600×900," the skill:
- Rewrites your request into the five-part structure (Scene → Subject → Details → Use case → Constraints), front-loaded.
- Runs `codex exec --sandbox workspace-write '$imagegen <prompt>. Print only the absolute path on the last line.'` — Codex generates, doesn't move.
- Parses the path from stdout. Runs `cp` and `sips -z 900 1600` (macOS) or `convert -resize 1600x900` (Linux) to land the file where you actually asked.
- Prints the final path. Done.
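That whole pipeline fits in a few lines of shell. This is a sketch, not the repo's actual helper: it assumes Codex honors the "print only the absolute path on the last line" instruction, and the argument handling is deliberately minimal.

```bash
#!/usr/bin/env bash
set -euo pipefail

prompt="$1"            # five-part image prompt
dest="$2"              # e.g. assets/hero.png
w="${3:-1600}"; h="${4:-900}"

# Generate via Codex; ask it to print the output path on the last line.
out="$(codex exec --sandbox workspace-write \
  "\$imagegen $prompt. Print only the absolute path on the last line.")"
src="$(printf '%s\n' "$out" | tail -n 1)"

# Move the file where the user asked, then force the requested dimensions.
cp "$src" "$dest"
if command -v sips >/dev/null 2>&1; then
  sips -z "$h" "$w" "$dest"                     # macOS: height before width
else
  convert "$dest" -resize "${w}x${h}!" "$dest"  # Linux / ImageMagick
fi
echo "$dest"
```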
The natural-language trigger is the part that matters most to my actual goal. I want Claude Code, mid-build, to decide on its own that this <section> needs a hero image, and just generate one. Not "user types a special slash command." The skill fires from phrases like "generate an image," "make an icon," "create a banner," "OG image," "hero illustration." Claude calls it the same way it calls anything else in its toolkit.
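For reference, a minimal sketch of that trigger surface as `SKILL.md` frontmatter, assuming the standard Claude Code skill format; the actual file in the repo is more detailed:

```markdown
---
name: codex-imagegen
description: Generate project images by shelling out to Codex CLI's $imagegen.
  Use when the user asks to "generate an image", "make an icon", "create a
  banner", an "OG image", or a "hero illustration".
---
```

Claude Code matches skills on the description field, which is why the trigger phrases live there rather than behind a slash command.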
That's the whole point. The site shouldn't end up looking like every other vibe-coded site because the agent never broke out of its default visual stack. The agent building the site should be reaching for project-specific imagery on its own.
## The trick that changes everything: DESIGN.md
Here's the bit I didn't expect to matter as much as it does.
If you drop a DESIGN.md at the root of your project — palette, type, illustration style, tone — and then ask Claude Code:
> Using DESIGN.md as the style reference, insert images that fit the site.
…it just works. Really well.
Claude reads DESIGN.md, decides which slots in the codebase need imagery, writes prompts that incorporate the palette and tone, calls the skill, and inserts the resulting paths into the right <img> tags. The hero image, the empty-state illustration, the OG card, and the favicon all end up looking like they belong to the same product. Without DESIGN.md it still works, but each image drifts a little — palette, mood, lighting are all slightly off across slots, and you can feel it even if you can't immediately name what's wrong.
DESIGN.md doesn't have to be fancy. Here's a trimmed version of one I'm using right now:
```markdown
# Design

## Concept
Calm, considered, modern. The kind of feel that gets out of the user's
way instead of demanding attention.

## Palette
- Surface (main): #F4F1ED — warm off-white
- Surface (cards): #FFFFFF
- Text: #1A1A1A — near-black, not pure
- Accent / CTA: #C46A4E — soft terracotta, used sparingly

## Typography
- Inter, system-ui sans-serif

## Illustration style
- Single subject, plenty of whitespace, no busy backgrounds
- Soft natural light from upper left, gentle shadows
- Hand-folded paper / origami feel where applicable
- No text inside images unless explicitly asked
- Avoid stock-photo vibes and over-saturated colors
```
That's 20-ish lines. But Claude treats it as a hard constraint when writing prompts, and the visual consistency across a 4–5 page site is night and day vs. asking for each image cold. The "Illustration style" block is doing about 80% of the work — palette obviously matters, but the qualitative instructions ("hand-folded paper feel," "no busy backgrounds") are what stop each image from feeling like it came from a different stock-image library.
## Why this is the year vibe-coded sites stop looking vibe-coded
A year ago this would have been a different post. Back then, even if you wanted to break out of the shadcn-default look, generated images weren't the answer. The available models produced output that screamed AI louder than the layout did — slightly melted typography, off-axis lighting, the same handful of obvious tells. So the fastest path was usually "just don't add an image," and the result was a sea of sites that all leaned on the same component library to do all the visual work.
gpt-image-2 changes the math. With a tight DESIGN.md and a five-part prompt, generated images now look like they came from a brand, not from a model. Text spells correctly. Light angles agree across slots. Subject framing is intentional. They're not hand-crafted illustrations from an agency, but they no longer carry the "AI tell" that earlier generations did. And once those images sit alongside the shadcn cards and the Lucide icons, they shift where the eye lands. A visitor reads the hero illustration, the OG card, the empty-state graphic — slots that on a default vibe-coded site were either missing or generic — and the site registers as a product instead of a template.
The interesting part isn't any individual image. It's that the gap between "site built by a small team with a designer on call" and "site built solo with Claude Code overnight" is mostly carried by image quality and visual specificity. The structure is solved. The components are solved. What's left, and what was carrying most of the "feels AI-generated" signal, was the image layer — and that's the slot this skill fills.
If you ship as a solo developer — no designer on call, no illustration budget, no Figma file from a teammate — this is the part of the workflow that used to force a compromise. Either you paid for stock images that didn't quite match the rest of the site, or you pulled an SVG from Heroicons and called it a hero. With gpt-image-2 plus a DESIGN.md, that compromise mostly goes away. The same person who writes the code can produce custom, on-brand visuals in the same session, without leaving the editor and without commissioning anyone. That's the audience I built this skill for, and the audience it changes the most for. Designers will always have an edge on intentional taste — I'm not pretending otherwise — but for the long tail of side projects, landing pages, and internal tools that were never going to get a designer in the first place, the bar just moved.
Once you have this loop — Claude builds the site, reads DESIGN.md, decides where images belong, generates them with consistent style, drops them in place — visitors stop registering that AI built the site. Which is the bar.
## Caveats
- `gpt-image-2` turns burn 3–5× the Codex usage of a plain text turn. If you're iterating a lot, set `OPENAI_API_KEY` and switch to per-image API billing.
- macOS is primary. Linux works via ImageMagick. Windows is not on the roadmap.
- The skill is around 200 lines of markdown plus a small shell helper. If you don't like a default, edit it. There's no framework to wrestle.
- For small text or dense multi-font layouts, bump quality to medium or high — gpt-image-2 is honest about which slots benefit from extra compute.
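The billing switch in that first caveat is just an environment variable (the key value here is a placeholder):

```bash
export OPENAI_API_KEY=sk-...   # per-image API billing instead of plan usage limits
```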
## Repo
github.com/JunSeo99/claude-skill-codex-imagegen
```bash
git clone https://github.com/JunSeo99/claude-skill-codex-imagegen \
  ~/.claude/skills/codex-imagegen
```
If even that feels like effort, just hand the repo URL to Claude Code itself and tell it to install the skill — something like "install this Claude Code skill: https://github.com/JunSeo99/claude-skill-codex-imagegen". It'll read the README, run the clone-and-symlink, and the next session will just have it. Mildly recursive — using Claude Code to install something Claude Code is going to use — but it works, and honestly it's how I install most of my own skills these days.
Once it's installed, restart Claude Code. Drop a DESIGN.md at the root of your project. Build your site. Then say: "Using DESIGN.md as the style reference, insert images that fit the site."
Curious if anyone else is doing the DESIGN.md-as-style-anchor pattern for AI-generated assets — I'd love to compare notes on which fields actually move the needle and which are noise. The "Illustration style" block is doing 80% of the work in my setup, but I haven't tested it across enough projects to call it.
And feedback on the skill itself is genuinely welcome — issues, PRs, "this default is wrong," "this caveat is missing," "this prompt pattern didn't work for me." It's still early, and I plan to keep iterating on it as people actually run it in their own projects. If you try it and something breaks or feels off, please tell me — that's the fastest way I'll make it better.