<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hyohyeok Jeong</title>
    <description>The latest articles on DEV Community by Hyohyeok Jeong (@happinessee).</description>
    <link>https://dev.to/happinessee</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3881111%2Fa93bf08f-fc77-449f-b90e-e9ba5c2a5f0f.png</url>
      <title>DEV Community: Hyohyeok Jeong</title>
      <link>https://dev.to/happinessee</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/happinessee"/>
    <language>en</language>
    <item>
      <title>From 4–6 Revisions to 0–2: Adding a TDD Loop to AI-driven Flutter UI</title>
      <dc:creator>Hyohyeok Jeong</dc:creator>
      <pubDate>Sat, 18 Apr 2026 20:35:04 +0000</pubDate>
      <link>https://dev.to/happinessee/from-4-6-revisions-to-0-2-adding-a-tdd-loop-to-ai-driven-flutter-ui-29lb</link>
      <guid>https://dev.to/happinessee/from-4-6-revisions-to-0-2-adding-a-tdd-loop-to-ai-driven-flutter-ui-29lb</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;After wiring a &lt;strong&gt;golden test + Figma screenshot diff&lt;/strong&gt; loop into my UI workflow, my average number of manual correction rounds dropped from 4–6 to 0–2. Here is how I got there.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;The problem I wanted to solve&lt;/h2&gt;

&lt;p&gt;The &lt;a href="https://velog.io/@1984/%EB%82%98%EC%98%A8-%EA%B2%B0%EA%B3%BC%EB%AC%BCdev-cycle-SKILL.md" rel="noopener noreferrer"&gt;dev-cycle skill&lt;/a&gt; (Korean), which had been working great for backend and infra work, did not carry over well to the frontend. With tests, lints, and &lt;code&gt;CLAUDE.md&lt;/code&gt; in place, I could reliably get &lt;strong&gt;well-formed code that followed project patterns and passed behavior checks&lt;/strong&gt; — but the AI was not actually building the UI to match what Figma MCP was handing it.&lt;/p&gt;

&lt;p&gt;On top of that, I was running tasks in parallel across git worktrees. Launching each worktree as a running app, eyeballing it, and iterating by hand had become the obvious bottleneck.&lt;/p&gt;

&lt;p&gt;At some point I caught myself wondering: should I just hand the AI business logic and API integration, and keep UI manual? But I wanted to preserve the productivity gains I had come to rely on, so I sat down and asked what concretely had to change.&lt;/p&gt;

&lt;p&gt;Boiling it down, I had two problems:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I want the implementation to match the design file as closely as possible.&lt;/li&gt;
&lt;li&gt;I want an easy way to verify and iterate on the implemented design.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;Why it was failing: Figma MCP mapping limits + fuzzy "done" criteria&lt;/h2&gt;

&lt;p&gt;For &lt;strong&gt;"1. match the design file closely,"&lt;/strong&gt; two root causes stood out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Figma MCP exports design context as React + Tailwind CSS. Mapping that into Flutter/Dart + Material is inherently lossy, and it is not something I can fix at the user level. Better to compensate elsewhere.&lt;/li&gt;
&lt;li&gt;For well-formed code I have a concrete spec — tests and lints. Pass them and you are done. &lt;strong&gt;For UI, there is no equivalent.&lt;/strong&gt; The AI has no objective "when is this done?" signal. I suspected this was fixable, though no obvious approach presented itself at first.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For &lt;strong&gt;"2. easy verification and iteration,"&lt;/strong&gt; I figured I could automate my way out:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Inspect simple components with the &lt;code&gt;widgetbook&lt;/code&gt; package&lt;/li&gt;
&lt;li&gt;Use golden tests to capture composed, logic-bound UI as images&lt;/li&gt;
&lt;li&gt;Verify actual behavior before CI review&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;The fix: wrap the UI in a verification loop&lt;/h2&gt;

&lt;p&gt;The core idea is simple: &lt;strong&gt;give the AI a concrete, TDD-style spec for the UI too.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Once I decided to lean on golden tests, the next question was how to define "passing." I fed the AI both the Figma screenshot (via Figma MCP) and the golden screenshot of the current implementation, and had it build its own diff checklist, review the gap, and patch it.&lt;/p&gt;

&lt;p&gt;My first instinct was a pixel diff for a quantitative score. But rendering differences between Figma (web) and Flutter (app) are unavoidable, and chasing them can easily wreck the code — the AI itself pointed that out — so I kept the evaluation &lt;strong&gt;qualitative&lt;/strong&gt;.&lt;/p&gt;
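
&lt;p&gt;To see why a raw pixel score is brittle, here is a minimal sketch in Python (used purely for illustration; it is not part of the Flutter setup). The 6x6 one-bit grids and the helper names &lt;code&gt;vertical_bar&lt;/code&gt; and &lt;code&gt;mismatch_ratio&lt;/code&gt; are hypothetical. The same one-pixel-wide bar, shifted by a single pixel, already registers as a one-third mismatch even though a reviewer would call the two renders equivalent.&lt;/p&gt;

```python
# Hypothetical 6x6 one-bit "renders": 1 = ink, 0 = background.
def vertical_bar(column, size=6):
    """Return a size x size grid with a single vertical bar at `column`."""
    return [[1 if x == column else 0 for x in range(size)] for _ in range(size)]


def mismatch_ratio(a, b):
    """Fraction of pixels whose values differ between two same-sized grids."""
    total = len(a) * len(a[0])
    differing = sum(
        1
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if pixel_a != pixel_b
    )
    return differing / total


figma_render = vertical_bar(column=2)
flutter_render = vertical_bar(column=3)  # same bar, shifted by one pixel

print(mismatch_ratio(figma_render, figma_render))    # identical grids: 0.0
print(mismatch_ratio(figma_render, flutter_render))  # 12 of 36 pixels differ
```

&lt;p&gt;Real renders differ in subtler ways (anti-aliasing, font hinting, sub-pixel layout), but the effect is the same: the score punishes differences that do not matter to the design.&lt;/p&gt;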

&lt;h2&gt;Results&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Original design&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqrhwdrd2kt7wl0ng7wq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmqrhwdrd2kt7wl0ng7wq.png" alt="Figma source — customer registration screen" width="800" height="869"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kdvmsl5oamefq4ffu41.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7kdvmsl5oamefq4ffu41.png" alt="Design comparison — customer registration screen" width="800" height="455"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Baseline approach&lt;/strong&gt; (plan → build). Colors, fonts, and spacing drifted from the design. Components were not reused. Initial output was fast, but I averaged 4–6 correction rounds.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Baseline&lt;/th&gt;
&lt;th&gt;Golden test + Figma MCP screenshot diff&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Token cost (initial plan/build + revisions)&lt;/td&gt;
&lt;td&gt;~160k tokens + (revision cost × 5)&lt;/td&gt;
&lt;td&gt;~250k tokens + (revision cost × 1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Avg. manual correction rounds&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;4–6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0–2&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Visual fidelity (color/font/spacing)&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Component reuse&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;High*&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;* Component reuse rules already live in &lt;code&gt;CLAUDE.md&lt;/code&gt;, but the extra review pass in the new skill seems to enforce them a second time, and the effect compounds.&lt;/p&gt;
&lt;/blockquote&gt;
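
&lt;p&gt;The token row of the table implies a break-even point. The sketch below, in Python for illustration only, takes the table's rough figures at face value (about 160k vs 250k initial tokens, 5 vs 1 revision rounds) and assumes every revision round costs the same. The 40k-token revision cost in the example is a hypothetical value, not a measurement.&lt;/p&gt;

```python
# Rough figures from the table above; the x5 and x1 multipliers are the table's own.
BASELINE_INITIAL = 160_000
BASELINE_ROUNDS = 5
LOOP_INITIAL = 250_000
LOOP_ROUNDS = 1


def total_tokens(initial, rounds, revision_cost):
    """Initial plan/build cost plus a flat cost per revision round."""
    return initial + rounds * revision_cost


def break_even_revision_cost():
    """Per-round cost at which both approaches spend the same total."""
    extra_initial = LOOP_INITIAL - BASELINE_INITIAL
    rounds_saved = BASELINE_ROUNDS - LOOP_ROUNDS
    return extra_initial / rounds_saved


print(break_even_revision_cost())  # 22500.0

# Hypothetical revision cost, chosen only to show the comparison.
revision_cost = 40_000
print(total_tokens(BASELINE_INITIAL, BASELINE_ROUNDS, revision_cost))  # 360000
print(total_tokens(LOOP_INITIAL, LOOP_ROUNDS, revision_cost))          # 290000
```

&lt;p&gt;Under these assumptions, the loop costs fewer tokens overall whenever a single revision round costs more than roughly 22.5k tokens. The saving in human attention comes on top of that.&lt;/p&gt;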

&lt;h2&gt;How the comparison actually works&lt;/h2&gt;

&lt;p&gt;The reason this produces a measurable delta is that the AI is now handed a &lt;strong&gt;concrete, spec-style rubric for the UI&lt;/strong&gt;. When a UI task completes, it builds two checklists on its own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;① Spec-level diff — Figma vs rendered golden&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each component attribute (typography, spacing, borders, colors, shadow, etc.) is compared &lt;strong&gt;one item at a time&lt;/strong&gt; between the Figma source and the rendered golden, and each is marked as &lt;code&gt;✅ match / ⚠️ approximation / ❌ mismatch&lt;/code&gt;. A simplified excerpt looks like this (the real checklist has 18+ rows):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Attribute&lt;/th&gt;
&lt;th&gt;Figma&lt;/th&gt;
&lt;th&gt;Rendered (golden)&lt;/th&gt;
&lt;th&gt;Verdict&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Label typography&lt;/td&gt;
&lt;td&gt;Pretendard Medium 16, #1A1A1A&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;LabelLarge(16/500)&lt;/code&gt;, &lt;code&gt;onSurface&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;✅ match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Focused border&lt;/td&gt;
&lt;td&gt;2px primary (#597D2E)&lt;/td&gt;
&lt;td&gt;2px &lt;code&gt;AppColors.primary&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;✅ exact match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Field radius&lt;/td&gt;
&lt;td&gt;8px&lt;/td&gt;
&lt;td&gt;8px (&lt;code&gt;fieldRadius&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;✅ exact match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Disabled fill&lt;/td&gt;
&lt;td&gt;#F5F5F5 (neutral10)&lt;/td&gt;
&lt;td&gt;&lt;code&gt;AppColors.surfaceContainerLow&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;⚠️ approximation (token-semantic mapping)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77jt30mreu56e32y35bn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F77jt30mreu56e32y35bn.png" alt="AI review 2b — attribute-level diff between the Figma source and the rendered golden" width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;
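
&lt;p&gt;In the workflow the AI writes this checklist as a table, not as code. Still, the decision it supports can be sketched as data. The Python below is an illustration only: the rows are copied from the excerpt above, while the &lt;code&gt;summarize&lt;/code&gt; helper and the rule "patch every mismatch, surface every approximation for human review" are assumptions of this sketch rather than the skill's actual logic.&lt;/p&gt;

```python
from collections import Counter

# Rows copied from the excerpt above: (attribute, figma, rendered, verdict).
CHECKLIST = [
    ("Label typography", "Pretendard Medium 16, #1A1A1A", "LabelLarge(16/500), onSurface", "match"),
    ("Focused border", "2px primary (#597D2E)", "2px AppColors.primary", "match"),
    ("Field radius", "8px", "8px (fieldRadius)", "match"),
    ("Disabled fill", "#F5F5F5 (neutral10)", "AppColors.surfaceContainerLow", "approximation"),
]


def summarize(rows):
    """Tally verdicts and split out the rows that need follow-up."""
    counts = Counter(verdict for _, _, _, verdict in rows)
    return {
        "counts": dict(counts),
        # Assumed rule: mismatches are patched by the AI before hand-off.
        "to_patch": [attr for attr, _, _, verdict in rows if verdict == "mismatch"],
        # Assumed rule: approximations are surfaced to the human reviewer.
        "to_review": [attr for attr, _, _, verdict in rows if verdict == "approximation"],
    }


summary = summarize(CHECKLIST)
print(summary["counts"])     # {'match': 3, 'approximation': 1}
print(summary["to_patch"])   # []
print(summary["to_review"])  # ['Disabled fill']
```

&lt;p&gt;The point of the three-way verdict is that an approximation is not a failure: a semantic token that maps closely to the Figma value can be accepted as long as someone has looked at it.&lt;/p&gt;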

&lt;p&gt;&lt;strong&gt;② Intent check — golden variants vs design intent&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Each per-state golden variant (&lt;code&gt;default&lt;/code&gt;, &lt;code&gt;focused&lt;/code&gt;, &lt;code&gt;disabled&lt;/code&gt;, &lt;code&gt;no_label_enabled&lt;/code&gt;, &lt;code&gt;no_label_focused&lt;/code&gt;, …) is checked against the &lt;strong&gt;design intent&lt;/strong&gt; of that state. Here the bar is not pixel parity but "does this state convey what the design meant?"&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fld7sdy8h8d269nr1k6pj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fld7sdy8h8d269nr1k6pj.png" alt="AI review 2a — checking whether the 5 golden variants reflect the design intent of each state" width="800" height="148"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AI runs both passes, patches the gaps it finds, and then hands the result to me for approval. These checklists are triggered automatically at the &lt;strong&gt;UI Review Gate&lt;/strong&gt; stage of the broader workflow.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe20t88rf2hp0vih6ymud.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe20t88rf2hp0vih6ymud.png" alt="Overall workflow progress" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The flow, end to end&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create a worktree&lt;/li&gt;
&lt;li&gt;Pull metadata and screenshots through Figma MCP&lt;/li&gt;
&lt;li&gt;Write widget tests (TDD RED)&lt;/li&gt;
&lt;li&gt;Implement until widget tests pass (TDD GREEN)&lt;/li&gt;
&lt;li&gt;Add golden tests&lt;/li&gt;
&lt;li&gt;Compare Figma screenshots against golden screenshots, then patch&lt;/li&gt;
&lt;li&gt;Human review&lt;/li&gt;
&lt;li&gt;Simplify the code&lt;/li&gt;
&lt;li&gt;Iterate review + patch&lt;/li&gt;
&lt;li&gt;Done&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;Wrap-up&lt;/h2&gt;

&lt;p&gt;The sample size is small, but in practice this has been working well for me. What I noticed most is that &lt;strong&gt;the wait time inside the correction loop shrank, and I had to step in less often.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A few common-sense rules that pair well with it:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Break tasks into smaller chunks than you normally would&lt;/li&gt;
&lt;li&gt;Build component-first, then compose&lt;/li&gt;
&lt;li&gt;Lean on &lt;code&gt;widgetbook&lt;/code&gt;, a component-catalog tool analogous to Storybook in the web ecosystem&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you regularly build UI from design files, this approach is worth trying on your next task. The full setup is open-sourced here: &lt;a href="https://github.com/happinessee/flutter-golden-cycle" rel="noopener noreferrer"&gt;flutter-golden-cycle&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;References&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flutter &amp;amp; Figma MCP (live session) — &lt;a href="https://www.youtube.com/live/d7qrvytOxSA" rel="noopener noreferrer"&gt;https://www.youtube.com/live/d7qrvytOxSA&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Figma implement skill — &lt;a href="https://github.com/openai/skills/blob/main/skills/.curated/figma-implement-design/SKILL.md" rel="noopener noreferrer"&gt;figma-implement-design&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;This post was originally written in Korean by a human. The English translation was produced by an AI model.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>flutter</category>
      <category>ai</category>
      <category>tdd</category>
      <category>webdev</category>
    </item>
  </channel>
</rss>
