AI design system output is approximate by default. Wrong border radii, raw hex values, inconsistent tokens across 60 components. The fix isn't better prompts. Here's the structural change that made it exact using Figma's REST API.
The fourth time I manually corrected the same border radius mistake in an AI-generated component, I stopped and asked why this kept happening.
Not "what prompt would fix this?" The deeper question: why does every AI tool I tried get the structure right and the values wrong?
The button was correct. The variants were there. The layout matched the Figma spec. But borderRadius: 8 when it should be borderRadius: '8px'. A spacing gap of 8 when the spec said 6. The color #3B82F6 sitting in the file where semantic.button.primary should be.
None of it wrong in a way that breaks the build. All of it wrong in a way that breaks the design system.
After hitting this wall enough times, I realized the problem wasn't the AI. It was the question I was asking it.
Why AI keeps generating the wrong Figma design tokens
When you give an AI tool a Figma screenshot and ask it to produce a component, it does something reasonable: it interprets what it sees.
The structure, the layout, the hierarchy - it gets most of that right. What it cannot get right is the token mapping.
The AI doesn't know your semantic token file. It doesn't know that #3B82F6 maps to semantic.button.primary in your codebase. It doesn't know that MUI's sx prop multiplies numeric border radii by theme.shape.borderRadius (4 by default), which means borderRadius: 8 renders at 32px instead of 8px.
So it approximates. Here's what that looks like in practice:
| What AI produces | What the spec requires | Why it's wrong |
|---|---|---|
| borderRadius: 8 | borderRadius: '8px' | MUI multiplies numeric values by 4 |
| gap: 8 | gap: 6 | Spacing value not extracted from Figma |
| color: '#3B82F6' | semantic.button.primary | Raw hex instead of semantic token |
| fontSize: 14 | variant="MD_Medium" | Typography token not resolved |
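The border radius row is easiest to see as arithmetic. The sketch below mimics, in plain TypeScript, how MUI's sx prop treats borderRadius: numbers are multiplied by theme.shape.borderRadius (4 in the default theme) and strings pass through as written. resolveBorderRadius is an illustrative helper I made up for this post, not something MUI exports.

```typescript
// Illustrative re-implementation of how MUI's `sx` prop treats borderRadius.
// `resolveBorderRadius` is a hypothetical helper, not part of MUI's API.
const DEFAULT_SHAPE_BORDER_RADIUS = 4; // theme.shape.borderRadius in the default MUI theme

function resolveBorderRadius(
  value: number | string,
  shapeBorderRadius: number = DEFAULT_SHAPE_BORDER_RADIUS
): string {
  // Numbers are treated as multipliers of the theme's base radius.
  if (typeof value === "number") {
    return `${value * shapeBorderRadius}px`;
  }
  // Strings are handed to CSS exactly as written.
  return value;
}

const radiusFromAi = resolveBorderRadius(8);       // what `borderRadius: 8` renders as
const radiusFromSpec = resolveBorderRadius("8px"); // what `borderRadius: '8px'` renders as
```

With the default theme, radiusFromAi comes out as 32px and radiusFromSpec as 8px. Both compile, and only one matches the spec.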
Across one component, these deviations are small. Across 60 components, they mean your design system exists in two versions: what the designer built and what the code implements.
This isn't a prompt engineering problem. A better prompt doesn't tell the AI your semantic token file. The problem is structural: the input is wrong.
How to fix AI design token generation: read Figma's API, not a screenshot
The insight that fixed this for me: design system components have two completely different kinds of decisions.
Deterministic decisions have exact correct answers already defined somewhere: the token for this fill, the typography variant for this size/weight combination, the exact spacing value. These are not judgment calls. They have right answers that live in your Figma file and your token file.
Judgment decisions require actual design thinking: which variant is the default, how the component behaves in edge cases. These genuinely benefit from AI reasoning.
The mistake I kept making was asking AI to handle both at once. Once I separated them, everything changed.
Instead of giving the AI a screenshot to interpret, I started reading Figma's REST API directly. The API returns exact values: fills as precise RGBA colors that convert directly to hex codes, typography as specific size/weight/line-height combinations, spacing as pixel measurements. No interpretation. Exact data.
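A minimal sketch of that extraction, assuming the node JSON has already been fetched from Figma's GET /v1/files/:file_key/nodes?ids=... endpoint. Figma reports fill colors as RGBA channels between 0 and 1, so the hex code comes from a small conversion. The FigmaNode type covers only the fields read here, and extractFacts and sampleNode are illustrative names, not part of any tool.

```typescript
// Subset of a Figma node as returned by GET /v1/files/:file_key/nodes?ids=...
// Only the fields this sketch reads are declared; real nodes carry many more.
interface FigmaColor { r: number; g: number; b: number; a: number }
interface FigmaPaint { type: string; color?: FigmaColor }
interface FigmaNode {
  fills?: FigmaPaint[];
  cornerRadius?: number;
  itemSpacing?: number; // auto-layout gap, in pixels
  style?: { fontSize: number; fontWeight: number };
}

interface ExtractedFacts {
  fillHex: string | null;
  cornerRadius: number | null;
  gap: number | null;
  fontSize: number | null;
  fontWeight: number | null;
}

// Convert one 0..1 color channel to a two-digit uppercase hex pair.
function channelToHex(channel: number): string {
  const hex = Math.round(channel * 255).toString(16).toUpperCase();
  return hex.length === 1 ? "0" + hex : hex;
}

function colorToHex(color: FigmaColor): string {
  return "#" + channelToHex(color.r) + channelToHex(color.g) + channelToHex(color.b);
}

// Hypothetical helper: copy exact values off the node, with no interpretation.
function extractFacts(node: FigmaNode): ExtractedFacts {
  let fillHex: string | null = null;
  for (const paint of node.fills ?? []) {
    if (paint.type === "SOLID" && paint.color) {
      fillHex = colorToHex(paint.color); // first solid fill wins
      break;
    }
  }
  return {
    fillHex,
    cornerRadius: node.cornerRadius ?? null,
    gap: node.itemSpacing ?? null,
    fontSize: node.style ? node.style.fontSize : null,
    fontWeight: node.style ? node.style.fontWeight : null,
  };
}

// Made-up sample standing in for a real API response. It is flattened for
// brevity: in a real file the text style lives on a child TEXT node.
const sampleNode: FigmaNode = {
  fills: [{ type: "SOLID", color: { r: 59 / 255, g: 130 / 255, b: 246 / 255, a: 1 } }],
  cornerRadius: 8,
  itemSpacing: 8,
  style: { fontSize: 14, fontWeight: 500 },
};

const extractedFacts = extractFacts(sampleNode);
```

Nothing in this step involves a model. The values are read, converted, and passed along unchanged.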
Here's what the fixed pipeline looks like:
```
# Step 1: Read exact values from Figma REST API (not a screenshot)
figment scan --node 87YQbb7f33GYUHSOogYGjH:397:23320

# Output: token patch with classified fills
✓ semantic.button.primary #3B82F6 reachable
✓ semantic.surface.pressed #1E3A5F reachable
⚠ spacing.gap 8px → resolves to tokens.space.2

# Step 2: Deterministic resolvers run before AI sees anything
# Typography: 14px/500 → MD_Medium
# Corner radius: 8 → '8px' (MUI string literal)
# Gap: 8px → tokens.space.2

# Step 3: AI generates from facts, not interpretations
figment generate --name Badge --node 87YQbb7f33GYUHSOogYGjH:397:23320
```
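The resolvers in step 2 can be as plain as lookup tables. Here is a sketch under that assumption. The table names and function names are hypothetical, and only the mappings quoted in the output above are filled in. The one design choice worth noting is that a missing entry throws instead of falling back to a guess.

```typescript
// Hypothetical lookup tables. In practice these would be generated from the
// token file; only the entries quoted in this article are filled in.
const COLOR_TOKENS: Record<string, string> = {
  "#3B82F6": "semantic.button.primary",
  "#1E3A5F": "semantic.surface.pressed",
};
const SPACE_TOKENS: Record<string, string> = { "8": "tokens.space.2" };
const TYPOGRAPHY_VARIANTS: Record<string, string> = { "14/500": "MD_Medium" };

// Look a key up, and fail loudly instead of guessing when it is missing.
function lookup(table: Record<string, string>, key: string, kind: string): string {
  if (!Object.prototype.hasOwnProperty.call(table, key)) {
    throw new Error(`No ${kind} mapping for "${key}"`);
  }
  return table[key] as string;
}

function resolveColor(hex: string): string {
  return lookup(COLOR_TOKENS, hex.toUpperCase(), "color");
}

function resolveGap(px: number): string {
  return lookup(SPACE_TOKENS, String(px), "spacing");
}

function resolveTypography(fontSize: number, fontWeight: number): string {
  return lookup(TYPOGRAPHY_VARIANTS, `${fontSize}/${fontWeight}`, "typography");
}

// MUI needs the unit spelled out so the value is not treated as a multiplier.
function resolveCornerRadius(px: number): string {
  return `${px}px`;
}
```

A value with no mapping is a question for the designer or the token file, so the resolver stops there and does not hand the model a gap to fill.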
The prompt no longer says "generate a button component based on this design."
It says "generate a button component where the background is semantic.button.primary, the corner radius is '8px' as a string literal, the gap is tokens.space.2, and the typography variant is MD_Medium."
The AI received facts. It produced code from them. It never had to guess at a token name because I had already resolved every single one before the model saw anything.
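To make the shape of that prompt concrete, here is a small sketch that assembles it from already-resolved values. ResolvedFacts and buildPrompt are illustrative names, and the wording of the prompt is my own approximation, not the exact text any tool sends.

```typescript
// Values that were resolved deterministically before the model is involved.
interface ResolvedFacts {
  componentName: string;
  background: string;
  cornerRadius: string;
  gap: string;
  typographyVariant: string;
}

// Hypothetical prompt builder: every design value arrives as a stated fact.
function buildPrompt(facts: ResolvedFacts): string {
  return [
    `Generate a ${facts.componentName} component with these fixed values.`,
    `- background: ${facts.background}`,
    `- corner radius: '${facts.cornerRadius}' (string literal)`,
    `- gap: ${facts.gap}`,
    `- typography variant: ${facts.typographyVariant}`,
    "Do not introduce any color, spacing, or typography value not listed above.",
  ].join("\n");
}

const buttonPrompt = buildPrompt({
  componentName: "button",
  background: "semantic.button.primary",
  cornerRadius: "8px",
  gap: "tokens.space.2",
  typographyVariant: "MD_Medium",
});
```

The prompt carries token names only. No hex code appears in it, so there is nothing for the model to copy into the output by mistake.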
The problem generation doesn't solve: design system drift in CI
Getting values correct at generation time is necessary. I learned it's not sufficient.
One month in, a developer renamed a token in a PR that looked completely unrelated. The rename was correct and it was a necessary cleanup. What nobody checked, including me, was which components used the old name. During the design review, the designer flagged that three buttons in production no longer matched the Figma spec. Not dramatically. Just slightly off.
That's the thing about design system drift. It's invisible until someone looks closely enough to notice.
The fix I landed on: a verification script that runs on every pull request. It fetches the live Figma data for each component, re-runs the same deterministic extractors I used at generation time, and compares the results against the current component source.
```
# Runs on every pull request automatically
npm run verify-figma -- --component Badge --node 87YQbb7f33GYUHSOogYGjH:397:23320

✓ Typography variant="MD_Medium" PASS
✓ Spacing gap: tokens.space.2 PASS
✓ Colors no raw hex values PASS
✓ Border-radius '8px' string literal PASS

Exit code: 0 — no drift detected
```
If anything has drifted from the Figma spec, the script fails. The pull request doesn't merge.
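The comparison step can be sketched as a handful of checks over the component source. This version uses plain string matching to stay short; a sturdier check would parse the file. verifySource, ExpectedSpec, and the two sample sources are hypothetical, and the token names follow this article's notation, not valid property access.

```typescript
interface ExpectedSpec {
  typographyVariant: string;
  gapToken: string;
  borderRadius: string;
}

interface Check {
  name: string;
  pass: boolean;
}

// Hypothetical verifier: compare the source text against resolved Figma values.
function verifySource(source: string, spec: ExpectedSpec): Check[] {
  const rawHex = /#[0-9a-fA-F]{3,8}\b/;
  return [
    { name: "Typography", pass: source.indexOf(`variant="${spec.typographyVariant}"`) !== -1 },
    { name: "Spacing", pass: source.indexOf(`gap: ${spec.gapToken}`) !== -1 },
    { name: "Colors", pass: !rawHex.test(source) },
    { name: "Border-radius", pass: source.indexOf(`borderRadius: '${spec.borderRadius}'`) !== -1 },
  ];
}

// A non-zero exit code is what blocks the pull request in CI.
function exitCodeFor(checks: Check[]): number {
  return checks.every((check) => check.pass) ? 0 : 1;
}

const badgeSpec: ExpectedSpec = {
  typographyVariant: "MD_Medium",
  gapToken: "tokens.space.2",
  borderRadius: "8px",
};

const cleanSource = `
<Box sx={{ gap: tokens.space.2, borderRadius: '8px', color: semantic.button.primary }}>
  <Typography variant="MD_Medium">Badge</Typography>
</Box>`;

const driftedSource = `
<Box sx={{ gap: tokens.space.2, borderRadius: 8, color: '#3B82F6' }}>
  <Typography variant="MD_Medium">Badge</Typography>
</Box>`;

const cleanExit = exitCodeFor(verifySource(cleanSource, badgeSpec));
const driftedChecks = verifySource(driftedSource, badgeSpec);
const driftedExit = exitCodeFor(driftedChecks);
```

The clean source exits with 0. The drifted one fails the Colors and Border-radius checks and exits with 1, which is all the CI job needs to refuse the merge.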
The design system no longer depends on the memory of whoever is reviewing the PR. It depends on the Figma file, verified continuously on every merge.
What production-ready AI-generated components actually look like
When you put these two things together (deterministic pre-resolution and CI drift detection), the output is structurally different from what most AI tools produce.
Every generated component includes:
- ✅ Zero raw hex values — every color is a semantic token
- ✅ Correct border radii — string literals where MUI requires them
- ✅ A .figment.json spec file recording exact Figma values at generation time
- ✅ A spec-lock test suite running against the current source on every CI build
- ✅ An overrides file documenting every intentional deviation with written justification
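The spec file, the spec-lock tests, and the overrides file fit together roughly like this. The sketch assumes a flat property-to-value map for the recorded spec and a list of overrides with a written justification. Those shapes, and the sample values, are my simplification for illustration and not the actual .figment.json schema.

```typescript
// Assumed shapes, simplified for illustration.
type SpecValues = Record<string, string>;

interface Override {
  property: string;
  value: string;
  justification: string;
}

// Hypothetical spec-lock check: every recorded value must still match the
// current one, unless an override with a written justification covers it.
function specLock(recorded: SpecValues, current: SpecValues, overrides: Override[]): string[] {
  const failures: string[] = [];
  for (const property of Object.keys(recorded)) {
    const expected = recorded[property];
    const actual = current[property];
    if (actual === expected) continue;
    const covering = overrides.filter(
      (o) => o.property === property && o.value === actual && o.justification.trim() !== ""
    );
    if (covering.length > 0) continue; // intentional, documented deviation
    failures.push(`${property}: expected ${expected}, found ${actual}`);
  }
  return failures;
}

const recordedSpec: SpecValues = { borderRadius: "8px", gap: "tokens.space.2" };
const currentValues: SpecValues = { borderRadius: "4px", gap: "tokens.space.2" };

const withoutOverride = specLock(recordedSpec, currentValues, []);
const withOverride = specLock(recordedSpec, currentValues, [
  { property: "borderRadius", value: "4px", justification: "Placeholder reason for this example" },
]);
```

An override with an empty justification does not count, which keeps the overrides file from turning into a silent escape hatch.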
This approach shipped more than 60 components with 3,077 tests in 35 business days against an original estimate of 120 engineer-days. The reason cleanup time dropped to near zero was the pre-resolution step. There was nothing to fix because the values had never been wrong.
Why the constraint-first pattern works for any AI code generation
AI output is approximate by default. Making it exact requires constraining what AI is allowed to decide.
I've come to think of this as a general principle, not just a design system trick. Any workflow where AI generates code that needs to be production-correct, not just production-close, benefits from the same structure.
Resolve the deterministic parts upstream. Delegate the judgment parts to the model. Scan the output for violations before writing any file. Verify against the source of truth on every pull request.
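The ordering matters more than any single check, so here is a sketch of the gate itself: generate, scan, and only then write. The generate, scan, and write functions are injected stand-ins (hypothetical hooks for the model call, the violation checks, and the file system), which keeps the example runnable without a model.

```typescript
type ResolvedValues = Record<string, string>;

interface GateResult {
  written: boolean;
  violations: string[];
}

// Hypothetical gate: output that fails the scan never reaches disk.
function generateBehindGate(
  values: ResolvedValues,
  generate: (values: ResolvedValues) => string,
  scan: (code: string) => string[],
  write: (code: string) => void
): GateResult {
  const code = generate(values);
  const violations = scan(code);
  if (violations.length > 0) {
    return { written: false, violations };
  }
  write(code);
  return { written: true, violations: [] };
}

// Example scan: reject any raw hex color.
function scanForRawHex(code: string): string[] {
  const matches: string[] = code.match(/#[0-9a-fA-F]{3,8}\b/g) || [];
  return matches.map((m) => `raw hex value ${m}`);
}

// In-memory stand-in for the file system.
const writtenFiles: string[] = [];

const cleanRun = generateBehindGate(
  { background: "semantic.button.primary" },
  (v) => `const sx = { backgroundColor: ${v.background} };`,
  scanForRawHex,
  (code) => { writtenFiles.push(code); }
);

const dirtyRun = generateBehindGate(
  { background: "semantic.button.primary" },
  () => "const sx = { backgroundColor: '#3B82F6' };", // simulates a model that ignored the facts
  scanForRawHex,
  (code) => { writtenFiles.push(code); }
);
```

Only the clean run ends up in writtenFiles. The dirty run comes back with its violation listed and nothing written.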
Most teams skip the constraints because they seem like overhead. Then they wonder why every AI-generated component needs a round of manual cleanup before it's usable.
That cleanup is the cost of asking AI to make decisions it was never designed to make well. Once I stopped asking AI those questions, it stopped giving me wrong answers.
By Amrutha Kollu, Software Engineer.
Part 1: How I Shipped 60 Design System Components in 5 Weeks Using Figma as the Single Source of Truth