The feature was done. Everything worked. I opened the browser, looked at the page, and thought: it's functional, but it's just... off.
So I did what felt natural — I asked AI to help fix the styling.
That's when things got weird.
When I was building features, AI understood me perfectly. "Add a loading state to this button." Done. "Validate the form before submission." Done. The collaboration had a rhythm to it.
But the moment I switched to UI feedback, the rhythm broke.
"The interaction doesn't feel user-friendly — users can't find the main action." AI responded: "Got it, I'll optimize that for you." It changed some things. Not the right things. I tried again: "The overall flow feels rough, like users will get stuck here." Another round of changes. Still not right.
And somewhere in the back of my mind: what if it accidentally breaks the feature while adjusting the styles?
I spent weeks on this. 20+ revision rounds. The UI kept feeling generic, like it came from a template. I kept assuming I just needed to get better at writing prompts.
I was wrong about that.
First, let's actually define "looks good"
Before diagnosing the problem, I had to stop and ask: what do I even mean when I say "looks better"?
Turns out "make it look better" isn't one instruction — it's pointing at one of at least three distinct layers:
Perception layer — What does the user notice first? Where does their eye land? This is contrast, visual weight, color hierarchy, spatial density. "Users can't find the main action" is usually a perception problem: the primary action isn't visually differentiated from secondary elements.
Interaction layer — Does the user get feedback when they act? Do steps feel connected? This is feedback states, transitions, confirmation moments. "The flow feels rough" is usually an interaction problem: clicks feel unresponsive, state changes feel abrupt, next steps are unclear.
Consistency layer — Does the whole interface feel like it speaks one language? Unified corner radius, coherent color logic, predictable spacing rhythm. When this layer breaks, users don't say "it's inconsistent." They say "it feels cheap" or "something feels off" — without being able to name it.
These layers stack. The perception layer is the foundation. The consistency layer is the roof. When I told AI "the overall feeling isn't good," it had no idea which layer I was pointing at — so it guessed, and usually guessed wrong.
Getting specific with prompts helped. Until it didn't.
The natural response to "AI doesn't understand vague instructions" is: be more specific.
Instead of "make it look better," I'd say "increase the contrast between the primary button and the background to meet WCAG AA." Instead of "make the flow smoother," I'd say "add a 200ms ease-out transition between these two states."
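A target like "meet WCAG AA" has the advantage that it can be checked instead of eyeballed. Here is a minimal sketch of the WCAG 2.x contrast-ratio formula. The function names are my own, and pairing white label text with a blue button is an assumption made for illustration.

```typescript
// Convert one 8-bit sRGB channel to its linear-light value (WCAG 2.x formula).
function channelToLinear(channel8bit: number): number {
  const c = channel8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#RRGGBB" color: 0 for black, 1 for white.
function relativeLuminance(hex: string): number {
  const r = parseInt(hex.slice(1, 3), 16);
  const g = parseInt(hex.slice(3, 5), 16);
  const b = parseInt(hex.slice(5, 7), 16);
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio between two colors, from 1 (identical) up to 21 (black on white).
function contrastRatio(hexA: string, hexB: string): number {
  const la = relativeLuminance(hexA);
  const lb = relativeLuminance(hexB);
  return (Math.max(la, lb) + 0.05) / (Math.min(la, lb) + 0.05);
}

// WCAG AA asks for 4.5:1 on normal-size text; large text and
// non-text UI components need 3:1, so the threshold is a parameter.
function meetsAA(foreground: string, background: string, minimum = 4.5): boolean {
  return contrastRatio(foreground, background) >= minimum;
}

// Assumed pairing: white label text on two candidate button blues.
console.log(contrastRatio("#FFFFFF", "#2563EB").toFixed(2), meetsAA("#FFFFFF", "#2563EB"));
console.log(contrastRatio("#FFFFFF", "#3B82F6").toFixed(2), meetsAA("#FFFFFF", "#3B82F6"));
```

With white text, the darker blue clears 4.5:1 (roughly 5.2) and the lighter one does not (roughly 3.7). A prompt can't settle that on its own; a number can.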
This helped. Noticeably. But it introduced a new problem I hadn't anticipated.
Design decisions operate at different scopes:
- Global decisions: What's the primary brand color? What's the type scale? What corner radius do we use throughout?
- Component decisions: Should this button be the primary or secondary variant?
- Instance decisions: Should this specific button say "Submit" or "Save changes"?
Writing detailed prompts solves instance-level problems well. But global decisions have to be restated every single conversation.
Here's what actually happened across three different pages:
- Page A: "use blue for the button"
- Page B: "use the accent color here, make it blue"
- Page C: "this should use the brand primary color"
AI executed all three instructions correctly. I ended up with #2563EB, #3B82F6, and #60A5FA — three different blues, each reasonable in isolation, fighting each other on the page.
AI didn't misunderstand me. The problem was that I had never defined what "blue" meant in this product. I just picked a blue when I needed one and moved on.
The real problem: three layers deeper
I kept thinking "AI has bad memory." But thinking about it more carefully — even if it remembered, what would it be remembering?
"The blue you used last time."
But that blue was an improvised decision. I hadn't actually decided what blue meant for this product. I'd just picked one in the moment.
What AI can't remember isn't just your instructions. It's a decision you never made.
This led me to three realizations:
1. AI is stateless across conversations. Statelessness is great for code: same input, same output, predictable, debuggable. But design requires continuity. This button's corner radius should follow the same logic as that card's corner radius. The color here should be consistent with the product's visual hierarchy. Starting every conversation from zero means AI reconstructs its understanding of "this product's design language" each time. No accumulated context, no shared reference point.
2. Without constraints, AI outputs the statistical mean. It's not making taste decisions — it's doing probability estimation. "What output most commonly follows this kind of input?" Whatever dominated its training data, it gravitates toward. This is where the term "AI design slop" comes from: gradient backgrounds, rounded everything, emoji headers, inconsistent spacing. Not bad exactly, just... characterless. Like it came from a template, because in a sense it did. UXPin's research put it directly: without structured design data as a reference, AI has no choice but to guess.
3. The deepest layer: your design judgments have never been externalized. You have a real aesthetic model for your product in your head. "This blue should feel technical but not cold." "The spacing should feel open but not loose." "The headings should have weight without feeling oppressive." These judgments are real — they developed as you built the product.
But where are they? Nowhere written down. They exist only inside you.
This is the actual root: it's not just that AI doesn't know. You've never articulated it clearly yourself. Every time you ask AI to adjust the UI, you're simultaneously describing what you want and improvising a global design decision — without realizing you're doing the second thing.
You don't need better prompts. You need an external memory for your design decisions.
What the community is already doing
People have been working on this problem. A few approaches have emerged:
Style guide files — Drop a .cursorrules, CLAUDE.md, or AGENTS.md in the project root with your color rules, type scale, component behavior. AI reads it before generating code. One widely-cited post put it directly: "Spend 30 minutes on a style guide before writing any code and save hours fixing inconsistencies later."
Figma MCP — Figma's official MCP Server lets AI coding tools read your Figma file directly: variables, component structure, token definitions, auto-layout rules — all passed as context. For the first time, the design-to-code pipeline has a standardized interface.
Code Connect — Maps design components to code components. Instead of generating new components from scratch, AI reuses what already exists. Figma calls it "the #1 approach for consistent component reuse in code."
These three approaches look different, but they're doing the same thing: moving design decisions from wherever they live vaguely to somewhere AI can read precisely.
Style guide moves them in prose. Figma MCP moves them in variables. Code Connect moves them in component mappings.
The transport method differs. The cargo is the same — your design judgments about this product.
But here's the problem: what if the judgments themselves are still fuzzy?
A style guide that says "the blue should feel technical" gives AI another thing to interpret. If the Figma variables are named color-1, color-2, color-3, MCP passes them along faithfully and AI still can't tell what each one is for. If every Code Connect component uses a different color, mapping them to code doesn't produce consistency.
These approaches solve "how to transmit." They don't solve "what to transmit."
Design tokens are the missing layer
This is where design tokens actually belong — not competing with those three approaches, but as the foundation all of them depend on.
Tokens aren't "storing colors as CSS variables." That's syntax. Tokens store decisions — with names, semantics, and usage context:
```css
/* Raw value: AI knows the color, nothing else */
/* #3B82F6 */

/* Prose description: AI still has to interpret */
/* "blue, techy feel" */

/* Design token: color + intent + where it belongs */
:root {
  /* Brand primary: use for main actions and visual emphasis */
  --color-brand-primary: #3B82F6;
}
```
Same value. Completely different information.
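One way to keep the intent attached to the value is to store each token as a small record and generate everything else from it. This is a sketch, not a standard format: the `DesignToken` shape and both helper functions are hypothetical names. Only the token name, value, and description come from the example above.

```typescript
// A token records a decision: what it is called, what it resolves to,
// and when it should be used.
interface DesignToken {
  name: string; // CSS custom property name
  value: string; // raw value
  intent: string; // usage guidance
}

const brandTokens: DesignToken[] = [
  {
    name: "--color-brand-primary",
    value: "#3B82F6",
    intent: "Brand primary: use for main actions and visual emphasis",
  },
];

// Emit the tokens as CSS custom properties, keeping the intent as a comment.
function toCssVariables(tokenList: DesignToken[]): string {
  const declarations = tokenList.map(
    (t) => `  /* ${t.intent} */\n  ${t.name}: ${t.value};`
  );
  return `:root {\n${declarations.join("\n")}\n}`;
}

// Emit the same tokens as bullet lines that could be pasted into a
// style guide file for an AI tool to read.
function toStyleGuideLines(tokenList: DesignToken[]): string {
  return tokenList
    .map((t) => `- ${t.name} (${t.value}): ${t.intent}`)
    .join("\n");
}

console.log(toCssVariables(brandTokens));
console.log(toStyleGuideLines(brandTokens));
```

The point of a single source is that the stylesheet and the style guide cannot drift apart: both are generated from the same decision.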
With a token system, the style guide stops being prose and becomes a set of executable rules. Figma MCP no longer reads arbitrarily named variables; it reads a complete index of design decisions. Code Connect components have semantic backing, so reuse actually produces consistency.
What this looked like in practice
My type scale used clamp() for responsive sizing, with six levels stored as CSS custom properties. Four of them:

```css
:root {
  --text-h1: clamp(2.5rem, 5vw, 4rem);
  --text-h2: clamp(1.875rem, 3.5vw, 2.5rem);
  --text-body: clamp(0.9375rem, 1.2vw, 1rem);
  --text-overline: 0.6875rem;
}
```
Before the token system, I asked AI to "adjust the heading rhythm on this page." It wrote font-size: 3.2rem — a hardcoded number. Looked reasonable at 1440px. Crowded at 768px. Overflowing at 375px.
Not because AI doesn't understand responsive design. Because it didn't know the project had already defined what "heading" means. It had to estimate.
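The gap between a fixed size and a clamp() token is easy to see with a little arithmetic. This sketch resolves the --text-h1 token at the three viewport widths mentioned above and compares it with the hardcoded 3.2rem. It assumes a 16px root font size, and it assumes the heading in question was an h1, which the story above does not actually say. The function name is my own.

```typescript
const ROOT_FONT_PX = 16; // assumption: browser default root font size

// Resolve clamp(min, preferred, max) to pixels, where min and max are in rem
// and the preferred value is in vw. CSS defines clamp as max(min, min(val, max)).
function resolveClampPx(
  minRem: number,
  preferredVw: number,
  maxRem: number,
  viewportPx: number
): number {
  const minPx = minRem * ROOT_FONT_PX;
  const maxPx = maxRem * ROOT_FONT_PX;
  const preferredPx = (preferredVw / 100) * viewportPx;
  return Math.max(minPx, Math.min(preferredPx, maxPx));
}

const viewports = [375, 768, 1440];
const hardcodedPx = 3.2 * ROOT_FONT_PX; // font-size: 3.2rem, the same at every width

for (const width of viewports) {
  const h1Px = resolveClampPx(2.5, 5, 4, width); // --text-h1: clamp(2.5rem, 5vw, 4rem)
  console.log(`${width}px viewport: --text-h1 = ${h1Px}px, hardcoded = ${hardcodedPx}px`);
}
```

Under those assumptions the token resolves to 40px at 375px and 768px and to 64px at 1440px, while the hardcoded value stays at 51.2px everywhere: too large on narrow screens and smaller than the system intends on wide ones.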
After the token system, the prompt became: "use --text-h2 for this subtitle, check if it's already applied." The output was predictable: find the component, verify the token, replace if missing. No magic numbers, no overriding the system.
Color worked the same way. Without tokens, AI would write #3B82F6 in one file and rgba(59,130,246,0.8) in another, each with a slightly different opacity, scattered across a dozen files. Changing the brand color meant hunting through the whole codebase.
With --color-brand-primary, AI knows the semantics, knows where the color belongs, and a change propagates from one definition.
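Once the token exists, stray literals can be found mechanically. Below is a rough sketch of such a check. It is a line-based heuristic rather than a real CSS parser, and the `tokenByValue` lookup and function name are hypothetical.

```typescript
interface Finding {
  line: number; // 1-based line number in the stylesheet
  literal: string; // the hardcoded color as written
  suggestion: string | null; // token to use instead, if one matches exactly
}

// Hypothetical lookup from a raw value to the token that owns it.
const tokenByValue: { [value: string]: string } = {
  "#3b82f6": "var(--color-brand-primary)",
};

function findHardcodedColors(css: string): Finding[] {
  const findings: Finding[] = [];
  const colorLiteral = /#[0-9a-fA-F]{3,8}\b|rgba?\([^)]*\)/g;

  css.split("\n").forEach((text, index) => {
    // A token definition is the one place a raw value is allowed.
    if (/^\s*--[\w-]+\s*:/.test(text)) return;

    const matches = text.match(colorLiteral) || [];
    for (const literal of matches) {
      findings.push({
        line: index + 1,
        literal,
        suggestion: tokenByValue[literal.toLowerCase()] || null,
      });
    }
  });
  return findings;
}

const sample = [
  ":root {",
  "  --color-brand-primary: #3B82F6;",
  "}",
  ".button { background: #3B82F6; }",
  ".badge { background: rgba(59,130,246,0.8); }",
].join("\n");

for (const f of findHardcodedColors(sample)) {
  console.log(`line ${f.line}: ${f.literal} -> ${f.suggestion || "no matching token, needs a decision"}`);
}
```

The second finding is the interesting one. The rgba() value has no token, which means someone still has to decide whether "brand blue at 80%" is part of the system or an accident.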
Tokens enable reasoning about components
Once color tokens exist, component variants stop being arbitrary choices. They become derived:
The primary button consumes --color-brand-primary. The ghost button consumes the same token at low opacity. The chip background comes from --color-surface-subtle. These appearances aren't guessed — they're derived from the token system.
When I needed to add a new chip component, AI didn't need to pick colors. It needed to know which semantic layer the chip lived in, which token that implied, and the form followed. Not a guess. A derivation.
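The derivation can be written down explicitly. In this sketch the three variants come from the paragraph above. The text tokens (--color-on-brand, --color-text), the 12% opacity, and the use of color-mix() for "the same token at low opacity" are assumptions added for illustration.

```typescript
type Variant = "primary" | "ghost" | "chip";

interface VariantStyle {
  background: string;
  color: string;
}

// Each variant references tokens; none of them picks a color of its own.
function deriveVariant(variant: Variant): VariantStyle {
  switch (variant) {
    case "primary":
      return {
        background: "var(--color-brand-primary)",
        color: "var(--color-on-brand)", // hypothetical token
      };
    case "ghost":
      // Same brand token at low opacity; the 12% figure is an assumption.
      return {
        background: "color-mix(in srgb, var(--color-brand-primary) 12%, transparent)",
        color: "var(--color-brand-primary)",
      };
    case "chip":
      return {
        background: "var(--color-surface-subtle)",
        color: "var(--color-text)", // hypothetical token
      };
  }
}

// Render one variant as a CSS rule for a given selector.
function toRule(selector: string, variant: Variant): string {
  const style = deriveVariant(variant);
  return `${selector} { background: ${style.background}; color: ${style.color}; }`;
}

console.log(toRule(".button--primary", "primary"));
console.log(toRule(".button--ghost", "ghost"));
console.log(toRule(".chip", "chip"));
```

Adding a new component then becomes a question of which case it belongs to, not which hex value looks right today.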
That's the real gift a design system gives AI: not just constraints, but a language it can reason with.
The bigger realization
After finishing this project, I started thinking about "AI can't improve design" differently.
I used to think it was a prompt problem. But even a perfectly specific prompt only describes what you want once. Next conversation, AI starts over.
The actual problem was that my design decisions had never been systematized.
Without AI, this is invisible. Design decisions live in individual adjustments, held together by human intuition, maintained by experience. Looks fine from the outside. When AI enters the loop, it amplifies whatever looseness already exists in the system — every underdefined place becomes a place where AI fills in a guess.
"AI design slop" isn't AI's fault. Your design system was always like that. AI just made it visible.
Building a token system, on the surface, is about "getting AI to generate more consistent UI." But underneath, it's doing something else: turning "good-looking" from a personal aesthetic sense into a set of rules that can actually be executed. Colors get names. Type sizes get names. Every design decision gets a reference point that can be cited, verified, and passed along.
Worth doing with or without AI. AI just makes it feel more urgent.
Figma's MCP Server, Code Connect — the whole industry is moving in the same direction: making design decisions machine-readable. Not because of AI, but because that's what it means for design to be properly defined.
I built this token system after the fact — after the chaos, after the 20+ rounds. It would have been faster to start there.
What's your experience been with AI and design consistency? Have you found other approaches that help AI stay within a system rather than drifting toward the generic?
