This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
Figma vs build is a Next.js app that compares a Figma frame export to a screenshot of the shipped UI. You upload both images, hit Run check, and get a structured list of mismatches (spacing, typography, color, radius and shadows, alignment, missing or extra elements, copy, icons) with low / medium / high severity and filters so you can skim fast.
The problem it tries to help with: "looks off compared to Figma" bugs usually devolve into eyeballing two screenshots in Slack. This gives you a machine-generated first pass, so you have concrete bullets to triage rather than a final sign-off.
How Gemma fits in the stack: the browser only sends multipart images to POST /api/analyze. That route (Node runtime) calls OpenRouter with a multimodal chat payload, then validates the JSON with zod before anything renders. Images go as data URLs inside image_url parts, so there is no separate file host. A tiny parser handles models that wrap JSON in markdown fences or chatter around the object.
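The lenient parser mentioned above could look something like this sketch (function name and exact regex are my assumptions, not the repo's code): strip a markdown fence if present, then fall back to the first balanced-looking `{...}` span before calling `JSON.parse`.

```typescript
// Hypothetical sketch of a fence-tolerant JSON extractor for model output.
// Models sometimes wrap JSON in ```json fences or add chatter around it.
function extractJson(raw: string): unknown {
  // Prefer the contents of a markdown code fence if one exists
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const text = fenced ? fenced[1] : raw;
  // Otherwise take the span from the first "{" to the last "}"
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  return JSON.parse(text.slice(start, end + 1));
}
```

This is deliberately dumb: it assumes one top-level object per response, which the system prompt enforces anyway.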
Honest limits: bad zoom, crops, or light/dark theme mismatches cause false positives. There is a ~6MB per-image cap in the handler, and serverless hosts can choke on huge request bodies, so heavy @2x PNGs may need a resize before upload.
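The per-image cap can be checked client-side before the upload ever hits the route. A minimal sketch, assuming a 6MB limit and a hypothetical helper name:

```typescript
// Hypothetical client-side guard for the ~6MB per-image cap described above.
const MAX_IMAGE_BYTES = 6 * 1024 * 1024; // assumed limit, mirrors the server cap

// Returns an error message, or null when the file is acceptable
function checkImageSize(file: { size: number; name: string }): string | null {
  if (file.size > MAX_IMAGE_BYTES) {
    const mb = (file.size / 1024 / 1024).toFixed(1);
    return `${file.name} is ${mb}MB; resize below 6MB before upload`;
  }
  return null;
}
```

Failing fast in the browser avoids a round trip that the serverless host would reject anyway.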
Demo
Video walkthrough
Images used
Deployed app: Vercel 🚀
Code
Repository: gemma4-figma-vs-ui
Quick run
git clone https://github.com/chilupa/gemma4-figma-vs-ui
cd gemma4-figma-vs-ui
cp .env.example .env.local
Add OPENROUTER_API_KEY from OpenRouter, then:
npm install
npm run dev
How I Used Gemma 4
What powers the product: every comparison is a single multimodal completion. The model sees two images in fixed order (reference first, implementation second) plus a system prompt that forces one JSON object with a mismatches array. No pixel diff engine does the judging. If parsing or schema validation fails, the API returns an error and a short raw text preview for debugging.
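The fixed image order and single-completion shape could be sketched like this. This is an assumption-heavy illustration: the system prompt text, helper name, and default model string are mine, not lifted from the repo; the message structure follows OpenRouter's standard multimodal chat format with `image_url` parts carrying data URLs.

```typescript
// Hypothetical builder for the one-shot multimodal request:
// reference image first, implementation second, both as data URLs.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

function buildPayload(
  referenceDataUrl: string,
  implDataUrl: string,
  model = "google/gemma-4-31b-it" // assumed default
) {
  return {
    model,
    messages: [
      {
        role: "system",
        content:
          "Compare the two screenshots. Respond with exactly one JSON object: " +
          '{"mismatches": [{"category": "...", "severity": "low"|"medium"|"high", "description": "..."}]}',
      },
      {
        role: "user",
        content: [
          { type: "text", text: "Image 1 is the Figma reference; image 2 is the shipped UI." },
          { type: "image_url", image_url: { url: referenceDataUrl } },
          { type: "image_url", image_url: { url: implDataUrl } },
        ] as ContentPart[],
      },
    ],
  };
}
```

Keeping the order fixed matters: the prompt tells the model which image is ground truth, so swapping them would invert every "missing vs extra" judgment.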
Which model I chose: google/gemma-4-31b-it on OpenRouter, i.e. the 31B Dense multimodal line the challenge calls out.
Why 31B Dense and not E2B / E4B for this build: E2B and E4B are the right story when you care about ultra-mobile, edge, browser, or on-device speed and cost. I want fewer missed diffs on subtle layout and hierarchy when both inputs are full screenshots, so I traded latency and API cost for stronger vision and judgment in one shot. If I rewrote the product for offline or Pixel-class usage, I would move the default to a small Gemma 4 and probably simplify prompts or accept noisier output.
MoE note (not in the template checklist but real on OpenRouter): google/gemma-4-26b-a4b-it is there if I later optimize for throughput or cost with a different quality bar. Same env var swap (OPENROUTER_MODEL), no code fork required.
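The swap is a one-line change in local config. Both variable names come from the post; the layout of `.env.local` below is just an illustrative sketch:

```shell
# .env.local — swap models without touching code
OPENROUTER_API_KEY=sk-or-...        # from openrouter.ai
OPENROUTER_MODEL=google/gemma-4-26b-a4b-it   # or google/gemma-4-31b-it
```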
Bottom line: for pairwise UI screenshot QA, 31B Dense was the intentional fit. Small sizes and MoE stay valid alternatives for other deployment and economics stories.