Amrutha Kollu

Posted on Jun 8

Why AI Keeps Generating the Wrong Design Tokens and How I Fixed It with Figma's API

#figma #ai #designsystem #claude

AI design system output is approximate by default. Wrong border radii, raw hex values, inconsistent tokens across 60 components. The fix isn't better prompts. Here's the structural change that made it exact using Figma's REST API.

The fourth time I manually corrected the same border radius mistake in an AI-generated component, I stopped and asked why this kept happening.

Not "what prompt would fix this?" The deeper question: why does every AI tool I tried get the structure right and the values wrong?

The button was correct. The variants were there. The layout matched the Figma spec. But borderRadius: 8 when it should be borderRadius: '8px'. A spacing gap of 8 when the spec said 6. The color #3B82F6 sitting in the file where semantic.button.primary should be.

None of it wrong in a way that breaks the build. All of it wrong in a way that breaks the design system.

After hitting this wall enough times, I realized the problem wasn't the AI. It was the question I was asking it.

Why AI keeps generating the wrong Figma design tokens

When you give an AI tool a Figma screenshot and ask it to produce a component, it does something reasonable: it interprets what it sees.

The structure, the layout, the hierarchy - it gets most of that right. What it cannot get right is the token mapping.

The AI doesn't know your semantic token file. It doesn't know that #3B82F6 maps to semantic.button.primary in your codebase. It doesn't know that MUI multiplies numeric border radii by theme.shape.borderRadius (4 by default), which means borderRadius: 8 renders at 32px instead of 8px.

So it approximates. Here's what that looks like in practice:

What AI produces	What the spec requires	Why it's wrong
`borderRadius: 8`	`borderRadius: '8px'`	MUI multiplies numeric values by 4
`gap: 8`	`gap: 6`	Spacing value not extracted from Figma
`color: '#3B82F6'`	`semantic.button.primary`	Raw hex instead of semantic token
`fontSize: 14`	`variant="MD_Medium"`	Typography token not resolved
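The first row of the table is easier to see with a small model of the rule. The sketch below is a simplified stand-in for MUI's border radius handling, not MUI's actual source; the function name resolveSxBorderRadius is made up for illustration, and it assumes the default theme where theme.shape.borderRadius is 4.

```typescript
// Simplified model of how MUI resolves a borderRadius value.
// Assumption: default theme, where theme.shape.borderRadius is 4.
type RadiusInput = number | string;

function resolveSxBorderRadius(
  value: RadiusInput,
  themeShapeBorderRadius: number = 4
): string {
  // Strings are passed through to CSS untouched.
  if (typeof value === "string") return value;
  // Numbers are treated as multiples of theme.shape.borderRadius.
  return `${value * themeShapeBorderRadius}px`;
}

console.log(resolveSxBorderRadius(8));     // "32px"
console.log(resolveSxBorderRadius("8px")); // "8px"
```

This is why the string literal matters: '8px' survives as written, while the bare number 8 is scaled by the theme.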

Across one component, these deviations are small. Across 60 components, they mean your design system exists in two versions: what the designer built and what the code implements.

This isn't a prompt engineering problem. A better prompt doesn't tell the AI your semantic token file. The problem is structural: the input is wrong.

How to fix AI design token generation: read Figma's API, not a screenshot

The insight that fixed this for me: design system components have two completely different kinds of decisions.

Deterministic decisions have exact correct answers that are already defined somewhere: the token for this fill, the typography variant for this size/weight combination, the exact spacing value. These are not judgment calls. They have right answers that live in your Figma file and your token file.

Judgment decisions require actual design thinking: which variant is the default, how the component behaves in edge cases. These genuinely benefit from AI reasoning.

The mistake I kept making was asking AI to handle both at once. Once I separated them, everything changed.

Instead of giving the AI a screenshot to interpret, I started reading Figma's REST API directly. The API returns exact values: fills as precise color values (RGBA channels that convert directly to hex codes), typography as specific size/weight/line-height combinations, spacing as pixel measurements. No interpretation. Exact data.
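As a rough sketch of what reading the API involves: node data is available from Figma's file nodes endpoint (GET /v1/files/:file_key/nodes), and solid fill colors arrive as channels in the 0..1 range, so they need converting before they can be matched against hex values in a token file. The helper names below are hypothetical, and FILE_KEY is a placeholder rather than a real file.

```typescript
// Figma's REST API describes a solid fill color as channels in the 0..1 range.
interface FigmaColor {
  r: number;
  g: number;
  b: number;
  a: number;
}

// Convert one 0..1 channel to a two-digit uppercase hex pair.
function channelToHex(channel: number): string {
  const clamped = Math.min(1, Math.max(0, channel));
  const byte = Math.round(clamped * 255);
  return ("0" + byte.toString(16)).slice(-2).toUpperCase();
}

// Convert a Figma color to "#RRGGBB" (alpha is ignored in this sketch).
function figmaColorToHex(color: FigmaColor): string {
  return "#" + channelToHex(color.r) + channelToHex(color.g) + channelToHex(color.b);
}

// Build the URL for the file nodes endpoint for a single node id.
function buildNodesUrl(fileKey: string, nodeId: string): string {
  return (
    "https://api.figma.com/v1/files/" +
    encodeURIComponent(fileKey) +
    "/nodes?ids=" +
    encodeURIComponent(nodeId)
  );
}

console.log(buildNodesUrl("FILE_KEY", "397:23320"));
console.log(figmaColorToHex({ r: 59 / 255, g: 130 / 255, b: 246 / 255, a: 1 }));
```

The request itself would be sent with a personal access token in the X-Figma-Token header. That part is left out here so the sketch stays free of network calls.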

Here's what the fixed pipeline looks like:

# Step 1: Read exact values from Figma REST API (not a screenshot)
figment scan --node 87YQbb7f33GYUHSOogYGjH:397:23320

# Output: token patch with classified fills
✓  semantic.button.primary  #3B82F6  reachable
✓  semantic.surface.pressed  #1E3A5F  reachable
⚠  spacing.gap  8px  → resolves to tokens.space.2

# Step 2: Deterministic resolvers run before AI sees anything
# Typography: 14px/500 → MD_Medium
# Corner radius: 8 → '8px' (MUI string literal)
# Gap: 8px → tokens.space.2

# Step 3: AI generates from facts, not interpretations
figment generate --name Badge --node 87YQbb7f33GYUHSOogYGjH:397:23320
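The resolvers in Step 2 can be pictured as plain lookups that either return an exact answer or refuse. This is a minimal sketch under assumptions, not the tool's actual code: the table contents hold only the mappings named above, and the function and type names are invented for illustration.

```typescript
// Hypothetical lookup tables. The entries mirror the mappings named in the
// pipeline output; a real project would derive them from its own token file.
const TYPOGRAPHY_VARIANTS: Record<string, string> = { "14/500": "MD_Medium" };
const SPACING_TOKENS: Record<string, string> = { "8": "tokens.space.2" };

// A resolver either returns an exact value or an explicit failure.
type Resolved = { ok: true; value: string } | { ok: false; reason: string };

function resolveTypography(fontSize: number, fontWeight: number): Resolved {
  const variant = TYPOGRAPHY_VARIANTS[`${fontSize}/${fontWeight}`];
  return variant !== undefined
    ? { ok: true, value: variant }
    : { ok: false, reason: `no typography variant for ${fontSize}px/${fontWeight}` };
}

function resolveSpacing(px: number): Resolved {
  const token = SPACING_TOKENS[String(px)];
  return token !== undefined
    ? { ok: true, value: token }
    : { ok: false, reason: `no spacing token for ${px}px` };
}

// Corner radii are always emitted as quoted string literals so MUI
// does not scale them by the theme multiplier.
function resolveCornerRadius(px: number): Resolved {
  return { ok: true, value: `'${px}px'` };
}

console.log(resolveTypography(14, 500)); // resolves to MD_Medium
console.log(resolveSpacing(6));          // flagged: no matching token in this table
```

The useful property is that a miss is reported rather than papered over. A value with no exact match never reaches the model as a guess.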

The prompt no longer says "generate a button component based on this design."

It says "generate a button component where the background is semantic.button.primary, the corner radius is '8px' as a string literal, the gap is tokens.space.2, and the typography variant is MD_Medium."

The AI received facts. It produced code from them. It never had to guess at a token name because I had already resolved every single one before the model saw anything.
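To make the shift concrete, here is one way the fact-based prompt could be assembled from resolved values. The ResolvedFacts shape and the wording of the prompt are assumptions for illustration; the article does not show the real template.

```typescript
// Hypothetical container for values resolved before the model is called.
interface ResolvedFacts {
  componentName: string;
  background: string;        // semantic token path
  cornerRadius: string;      // already formatted, e.g. "'8px'"
  gap: string;               // spacing token path
  typographyVariant: string; // typography variant name
}

// Build a prompt that states facts instead of asking for interpretation.
function buildPrompt(facts: ResolvedFacts): string {
  return [
    `Generate a ${facts.componentName} component.`,
    "Use exactly these values and do not substitute others:",
    `- background: ${facts.background}`,
    `- corner radius: ${facts.cornerRadius} (string literal)`,
    `- gap: ${facts.gap}`,
    `- typography variant: ${facts.typographyVariant}`,
  ].join("\n");
}

console.log(
  buildPrompt({
    componentName: "button",
    background: "semantic.button.primary",
    cornerRadius: "'8px'",
    gap: "tokens.space.2",
    typographyVariant: "MD_Medium",
  })
);
```

Nothing in the resulting prompt is a raw hex code or a bare number, so there is nothing left for the model to map.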

The problem generation doesn't solve: design system drift in CI

Getting values correct at generation time is necessary. I learned it's not sufficient.

One month in, a developer renamed a token in a PR that looked completely unrelated. The rename was correct and it was a necessary cleanup. What nobody checked, including me, was which components used the old name. During the design review, the designer flagged that three buttons in production no longer matched the Figma spec. Not dramatically. Just slightly off.

That's the thing about design system drift. It's invisible until someone looks closely enough to notice.

The fix I landed on: a verification script that runs on every pull request. It fetches the live Figma data for each component, re-runs the same deterministic extractors I used at generation time, and compares the results against the current component source.

# Runs on every pull request automatically
npm run verify-figma -- --component Badge --node 87YQbb7f33GYUHSOogYGjH:397:23320

✓  Typography    variant="MD_Medium"         PASS
✓  Spacing       gap: tokens.space.2         PASS  
✓  Colors        no raw hex values           PASS
✓  Border-radius '8px' string literal        PASS
Exit code: 0 — no drift detected

If anything has drifted from the Figma spec, the script fails. The pull request doesn't merge.
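A stripped-down version of that comparison might look like the following. It is a sketch under assumptions: the ComponentSpec shape is invented, and plain string matching stands in for whatever source analysis the real script performs.

```typescript
// Hypothetical spec produced by re-running the extractors on live Figma data.
interface ComponentSpec {
  typographyVariant: string; // e.g. "MD_Medium"
  gapToken: string;          // e.g. "tokens.space.2"
  borderRadius: string;      // e.g. "8px"
}

interface CheckResult {
  check: string;
  pass: boolean;
}

// Compare component source text against the spec, one check per category.
// String matching is crude; it is only meant to show the shape of the checks.
function verifyAgainstSpec(source: string, spec: ComponentSpec): CheckResult[] {
  const rawHex = /#[0-9a-fA-F]{3,8}\b/;
  return [
    { check: "Typography", pass: source.indexOf(`variant="${spec.typographyVariant}"`) !== -1 },
    { check: "Spacing", pass: source.indexOf(`gap: ${spec.gapToken}`) !== -1 },
    { check: "Colors", pass: !rawHex.test(source) },
    { check: "Border-radius", pass: source.indexOf(`borderRadius: '${spec.borderRadius}'`) !== -1 },
  ];
}

// A non-zero exit code is what makes CI block the pull request.
function exitCodeFor(results: CheckResult[]): number {
  return results.every((r) => r.pass) ? 0 : 1;
}
```

In CI, the value from exitCodeFor would be handed to the process exit call, so a single failed check is enough to block the merge.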

The design system no longer depends on the memory of whoever is reviewing the PR. It depends on the Figma file, verified on every pull request before anything merges.

What production-ready AI-generated components actually look like

When you put these two things together, deterministic pre-resolution and CI drift detection, the output is structurally different from what most AI tools produce.

Every generated component includes:

✅ Zero raw hex values — every color is a semantic token
✅ Correct border radii — string literals where MUI requires them
✅ A .figment.json spec file recording exact Figma values at generation time
✅ A spec-lock test suite running against the current source on every CI build
✅ An overrides file documenting every intentional deviation with written justification
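The last three items work together, and a small sketch shows how. The article does not show the .figment.json schema or the overrides format, so every shape and name below is hypothetical: the recorded spec is modeled as a flat property map, and an override only counts if it carries a written justification.

```typescript
// Hypothetical flat map of property name to recorded Figma value.
interface RecordedSpec {
  [property: string]: string;
}

// Hypothetical override entry: an intentional deviation plus its reason.
interface Override {
  property: string;
  value: string;
  justification: string;
}

// Return one message per property that deviates from the recorded spec
// without a justified override.
function specLockFailures(
  actual: RecordedSpec,
  recorded: RecordedSpec,
  overrides: Override[]
): string[] {
  const failures: string[] = [];
  for (const property of Object.keys(recorded)) {
    const override = overrides.filter((o) => o.property === property)[0];
    if (override !== undefined && override.justification.trim() === "") {
      failures.push(`${property}: override has no justification`);
      continue;
    }
    const expected = override !== undefined ? override.value : recorded[property];
    if (actual[property] !== expected) {
      failures.push(`${property}: expected ${expected}, found ${actual[property]}`);
    }
  }
  return failures;
}
```

The design choice worth noting is that a deviation is not forbidden, only undocumented deviations are. That keeps the spec strict without forcing every component to match Figma when there is a stated reason not to.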

This approach shipped more than 60 components with 3,077 tests in 35 business days against an original estimate of 120 engineer-days. The reason cleanup time dropped to near zero was the pre-resolution step. There was nothing to fix because the values had never been wrong.

Why the constraint-first pattern works for any AI code generation

AI output is approximate by default. Making it exact requires constraining what AI is allowed to decide.

I've come to think of this as a general principle, not just a design system trick. Any workflow where AI generates code that needs to be production-correct, not just production-close, benefits from the same structure.

Resolve the deterministic parts upstream. Delegate the judgment parts to the model. Scan the output for violations before writing any file. Verify against the source of truth on every pull request.
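The "scan before writing" step is the one most often skipped, so here is a minimal sketch of it as a gate. The model call and the file write are passed in as plain functions (both hypothetical, and synchronous only to keep the example short), and the two scan rules are examples rather than a complete rule set.

```typescript
// The model call and the file write are injected so the gate can be tested.
type Generate = (prompt: string) => string;
type WriteFile = (code: string) => void;

interface GenerationResult {
  written: boolean;
  violations: string[];
}

// Example rules only: raw hex colors and numeric border radii.
function scanForViolations(code: string): string[] {
  const violations: string[] = [];
  if (/#[0-9a-fA-F]{3,8}\b/.test(code)) violations.push("raw hex color");
  if (/borderRadius:\s*\d/.test(code)) violations.push("numeric borderRadius");
  return violations;
}

// Generate, scan, and write only when the scan comes back clean.
function generateWithConstraints(
  prompt: string,
  generate: Generate,
  writeFile: WriteFile
): GenerationResult {
  const code = generate(prompt);
  const violations = scanForViolations(code);
  if (violations.length === 0) writeFile(code);
  return { written: violations.length === 0, violations };
}
```

Because nothing is written when a violation is found, a bad generation costs a retry instead of a cleanup pass.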

Most teams skip the constraints because they seem like overhead. Then they wonder why every AI-generated component needs a round of manual cleanup before it's usable.

That cleanup is the cost of asking AI to make decisions it was never designed to make well. Once I stopped asking AI those questions, it stopped giving me wrong answers.

By Amrutha Kollu, Software Engineer.

Part 1: How I Shipped 60 Design System Components in 5 Weeks Using Figma as the Single Source of Truth

Top comments (2)

Alexander • Jul 2

Feeding a vision model raw pixels and expecting semantic token mapping is a losing game since it just guesses the closest hex. I started using a Figma plugin called DS Sync for this exact reason, as its code gen reads the bound variable aliases directly from the node, outputting the proper token syntax rather than flattened values. Did you end up piping that REST API payload through Style Dictionary to enforce the specific MUI unit formatting?

Amrutha Kollu • Jul 2

Exactly right on the vision model problem. That's the core reason I went with the REST API over screenshot approaches. On Style Dictionary, I didn't pipe through it directly. Instead I built a deterministic extraction step that resolves typography, spacing, and border radius values algorithmically before the AI sees anything. The token file acts as the lookup table: fills get classified as reachable, primitive-only, or new before generation starts. So the AI never has to guess a token value. It either finds an exact match or the generation gets flagged. DS Sync sounds interesting! Does it handle drift detection after the component is merged, or just at generation time?