This is a submission for the Gemma 4 Challenge: Build with Gemma 4
What I Built
Figma vs build is a Next.js app that compares a Figma frame export to a screenshot of the shipped UI. You upload both images, hit Run check, and get a structured list of mismatches (spacing, typography, color, radius and shadows, alignment, missing or extra elements, copy, icons) with low / medium / high severity and filters so you can skim fast.
The problem it tries to help with: "looks off compared to Figma" bugs usually devolve into eyeballing two screenshots in Slack. This gives you a machine-generated first pass, so you have concrete bullets to triage rather than a final sign-off.
How Gemma fits in the stack: the browser only sends multipart images to POST /api/analyze. That route (Node runtime) calls OpenRouter with a multimodal chat payload, then validates the JSON with zod before anything renders. Images go as data URLs inside image_url parts, so there is no separate file host. A tiny parser handles models that wrap JSON in markdown fences or chatter around the object.
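The lenient parser mentioned above could look something like this sketch (function name and exact regex are my assumptions, not the repo's code): strip a markdown fence if present, then fall back to the first balanced-looking `{...}` span before calling `JSON.parse`.

```typescript
// Hypothetical sketch of a fence-tolerant JSON extractor for model output.
// Models sometimes wrap JSON in ```json fences or add chatter around it.
function extractJson(raw: string): unknown {
  // Prefer the contents of a markdown code fence if one exists
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/);
  const text = fenced ? fenced[1] : raw;
  // Otherwise take the span from the first "{" to the last "}"
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model output");
  }
  return JSON.parse(text.slice(start, end + 1));
}
```

This is deliberately dumb: it assumes one top-level object per response, which the system prompt enforces anyway.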
Honest limits: bad zoom, crops, or light/dark theme mismatches cause false positives. There is a ~6MB per-image cap in the handler, and serverless hosts can choke on huge request bodies, so heavy @2x PNGs may need a resize before upload.
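The per-image cap can be checked client-side before the upload ever hits the route. A minimal sketch, assuming a 6MB limit and a hypothetical helper name:

```typescript
// Hypothetical client-side guard for the ~6MB per-image cap described above.
const MAX_IMAGE_BYTES = 6 * 1024 * 1024; // assumed limit, mirrors the server cap

// Returns an error message, or null when the file is acceptable
function checkImageSize(file: { size: number; name: string }): string | null {
  if (file.size > MAX_IMAGE_BYTES) {
    const mb = (file.size / 1024 / 1024).toFixed(1);
    return `${file.name} is ${mb}MB; resize below 6MB before upload`;
  }
  return null;
}
```

Failing fast in the browser avoids a round trip that the serverless host would reject anyway.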
Demo
Video walkthrough
Images used
Deployed app: Vercel 🚀
Code
Repository: gemma4-figma-vs-ui
Quick run
git clone https://github.com/chilupa/gemma4-figma-vs-ui
cd gemma4-figma-vs-ui
cp .env.example .env.local
Add OPENROUTER_API_KEY from OpenRouter, then:
npm install
npm run dev
How I Used Gemma 4
What powers the product: every comparison is a single multimodal completion. The model sees two images in fixed order (reference first, implementation second) plus a system prompt that forces one JSON object with a mismatches array. No pixel diff engine does the judging. If parsing or schema validation fails, the API returns an error and a short raw text preview for debugging.
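The fixed image order and single-completion shape could be sketched like this. This is an assumption-heavy illustration: the system prompt text, helper name, and default model string are mine, not lifted from the repo; the message structure follows OpenRouter's standard multimodal chat format with `image_url` parts carrying data URLs.

```typescript
// Hypothetical builder for the one-shot multimodal request:
// reference image first, implementation second, both as data URLs.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

function buildPayload(
  referenceDataUrl: string,
  implDataUrl: string,
  model = "google/gemma-4-31b-it" // assumed default
) {
  return {
    model,
    messages: [
      {
        role: "system",
        content:
          "Compare the two screenshots. Respond with exactly one JSON object: " +
          '{"mismatches": [{"category": "...", "severity": "low"|"medium"|"high", "description": "..."}]}',
      },
      {
        role: "user",
        content: [
          { type: "text", text: "Image 1 is the Figma reference; image 2 is the shipped UI." },
          { type: "image_url", image_url: { url: referenceDataUrl } },
          { type: "image_url", image_url: { url: implDataUrl } },
        ] as ContentPart[],
      },
    ],
  };
}
```

Keeping the order fixed matters: the prompt tells the model which image is ground truth, so swapping them would invert every "missing vs extra" judgment.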
Which model I chose: google/gemma-4-31b-it on OpenRouter, i.e. the 31B Dense multimodal line the challenge calls out.
Why 31B Dense and not E2B / E4B for this build: E2B and E4B are the right story when you care about ultra-mobile, edge, browser, or on-device speed and cost. I want fewer missed diffs on subtle layout and hierarchy when both inputs are full screenshots, so I traded latency and API cost for stronger vision and judgment in one shot. If I rewrote the product for offline or Pixel-class usage, I would move the default to a small Gemma 4 and probably simplify prompts or accept noisier output.
MoE note (not in the template checklist but real on OpenRouter): google/gemma-4-26b-a4b-it is there if I later optimize for throughput or cost with a different quality bar. Same env var swap (OPENROUTER_MODEL), no code fork required.
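The swap is a one-line change in local config. Both variable names come from the post; the layout of `.env.local` below is just an illustrative sketch:

```shell
# .env.local — swap models without touching code
OPENROUTER_API_KEY=sk-or-...        # from openrouter.ai
OPENROUTER_MODEL=google/gemma-4-26b-a4b-it   # or google/gemma-4-31b-it
```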
Bottom line: for pairwise UI screenshot QA, 31B Dense was the intentional fit. Small sizes and MoE stay valid alternatives for other deployment and economics stories.