Frontend automation has been getting pretty wild lately.
A few months ago, this comparison would have been much easier to frame.
On one side, y...
One of the very few tools I've actually stuck with. I've been using it for more than a year now.
Such bliss for frontend engineers, and even for someone like me who rarely touches frontend.
All in all, it's my go-to for frontend.
Never-trust-always-verify covers who's calling, but agents broke it on a different axis for me. The call is authorized and still wrong. An agent with valid creds that confidently does the wrong thing passes every auth check, so I ended up gating on the output being worth paying for, not on the identity making the request.
Totally. Valid creds don't mean much if the agent still builds the wrong thing. The result matters more.
The 'frontend AI agents used to be much easier to frame' line is the whole story in one sentence. Kombai going from Figma-to-code to design-to-iterate-to-ship is a category shift, not a feature add. Tests like this are how I decide which one stays in my toolchain.
Exactly, that's what stood out to me too. The bigger shift isn't just "better Figma to code," it's moving closer to an actual frontend workflow. That's why I wanted the tests to be closer to real product work.
Was this tested using the Opus API? And what were the criteria for picking the open-source projects?
No specific criteria. I just picked them at random from GitHub Explore.
What's up with the "design engineer" tag on Kombai? And why Opus 4.6?
With the recent release of Design Mode, Kombai 2.0 is now tagged as an AI design engineer.
The reason I used Opus 4.6 is that I had planned this blog a few months ago and had already run the test, but somehow forgot to share it publicly.
You can also try the same test with the newer 4.8 or the newer models from OpenAI. :)
the "does it preserve functionality or just match the visual" test is the right frame. we have burned time with general purpose agents on component rewrites that looked correct in isolation but quietly broke state bindings 2 layers up.
the gap most comparisons miss: does the output hold under the actual interaction patterns the component was designed for, not just "does it render." that distinction changes the verdict.
how did you handle cases where the figma spec and existing interaction patterns contradicted each other? did either tool pick the right winner?