DEV Community

Claude Opus vs Kombai in 3 Real-World Frontend AI Tests 🚀

Shrijal Acharya on June 02, 2026

Frontend automation has been getting pretty wild lately. 🫠 A few months ago, this comparison would have been much easier to frame. On one side, y...

Shrijal Acharya • Jun 2

One of the very few tools I’ve actually stuck with. I’ve been using it for more than a year now.
It's such bliss for frontend engineers, and even for someone like me who rarely touches frontend.

All in all, it’s my go-to for frontend. ✅

Andrii Krugliak • Jun 3

Never-trust-always-verify covers who's calling, but agents broke it on a different axis for me. The call is authorized and still wrong. An agent with valid creds that confidently does the wrong thing passes every auth check, so I ended up gating on the output being worth paying for, not on the identity making the request.

Shrijal Acharya • Jun 3

Totally. Valid creds don’t mean much if the agent still builds the wrong thing. The result matters more.

Echo • Jun 2

The 'frontend AI agents used to be much easier to frame' line is the whole story in one sentence. Kombai going from Figma-to-code to design-to-iterate-to-ship is a category shift, not a feature add. Tests like this are how I decide which one stays in my toolchain.

Shrijal Acharya • Jun 3

Exactly, that’s what stood out to me too. The bigger shift isn’t just “better Figma to code,” it’s moving closer to an actual frontend workflow. That’s why I wanted the tests to be closer to real product work.

Shekhar Rajput • Jun 2

Was this tested with Opus API usage? And what were the criteria for picking the open-source projects?

Shrijal Acharya • Jun 2

No specific criteria. I just picked them at random from GitHub Explore.

Nabin Bhardwaj • Jun 2

What's up with the "design engineer" tag on Kombai? And why Opus 4.6?

Shrijal Acharya • Jun 2

With the recent release of Design Mode, Kombai 2.0 is now tagged as an AI design engineer.

The reason I used Opus 4.6 is that I had planned this blog a few months ago and had already run the test, but somehow forgot to share it publicly.

You can also try the same test with the newer 4.8 or the newer models from OpenAI. :)

Mudassir Khan • Jun 8

the "does it preserve functionality or just match the visual" test is the right frame. we have burned time with general purpose agents on component rewrites that looked correct in isolation but quietly broke state bindings 2 layers up.

the gap most comparisons miss: does the output hold under the actual interaction patterns the component was designed for, not just "does it render." that distinction changes the verdict.

how did you handle cases where the Figma spec and the existing interaction patterns contradicted each other? did either tool pick the right winner?