Peter Tamas

AI Field Notes #001 | Is AI frontend development finally getting good? Our Opus 4.6 test says yes. (And no.)

In December 2025, I wrote about an earlier, much smaller-scope attempt to build a full page with AI, and it didn't go well.

At that time, my conclusion was that while implementing a simple, small UI component with AI and Figma MCP worked quite well, it was surprising how badly it handled the implementation of a full page. The small UI component generation wasn't perfect either: I could get a ~90% "close enough" output that I could quickly align to the requirements by hand. But when I asked AI to implement a simple login page that contained only already-existing components, even with Figma MCP, the result was disappointing. The layout was far from the design, and it hallucinated elements that weren't in the design at all. No matter how I prompted, it just produced different hallucinations, which I still don't understand, because Figma MCP provides a structured description of the design. In the end, I spent far more time experimenting with AI than the few minutes it would have taken me to puzzle the components into place by hand.

My current experience is still not flawless, but I'm amazed by the improvement in this area over the past 3 months. I managed to implement a whole complex page, with existing and new components, that I had estimated at 48 hours, in just 8 hours. Not in one iteration, not 100% AI-generated, not without refactoring and human code reviews, but the velocity is impressive.

Some Context on the Comparison

After having satisfying experiences with Opus 4.6 UI component implementation, I was eager to retry a full-page AI implementation experiment. When you don't have a strict specification, it's easy to vibe-code a fair-looking result, but it's hard to evaluate how well the output matches the client's needs. That's why I chose a project where we had clear requirements:

  • Figma designs that we need to implement
  • An OpenAPI specification of the backend API

These are strict, structural anchors that provide clear and easy verification of the result.
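To make the anchor idea concrete, here is the kind of OpenAPI fragment that serves as one. The endpoint and schema names are invented for illustration; the real spec belongs to the client project:

```yaml
# Hypothetical excerpt of an OpenAPI spec used as a structural anchor.
paths:
  /orders:
    get:
      operationId: listOrders
      responses:
        "200":
          description: List of orders
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: "#/components/schemas/Order"
components:
  schemas:
    Order:
      type: object
      required: [id, status]
      properties:
        id: { type: string }
        status: { type: string, enum: [active, archived] }
```

Anything the AI generates can be checked directly against a contract like this: either the field exists in the schema with that type, or it's wrong.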

The state of the project when I ran my experiment:

  • The project was already "Claude-ready". It had a well-set-up, project-specific Claude.md that my colleagues had been using for months.
  • We already had an API client, but none of the endpoints that this page uses were defined in it yet.
  • The site layout, design system, and some of the UI components that the page needed were already in place, but the design also contained new, complex UI components.

Unfortunately, this was a client project we built at Bobcats Coding, so screenshots, product details, and the repository stay private. But I'm going to write about everything else.

The first iteration

The better you specify, the better outcome you can expect. This isn't a new directive; it was true before AI coding as well. But with agentic engineering, specification is the new code.

So I spent ~1 hour specifying the task as my initial prompt. I gave general context about the page we were building and linked the Figma design of the whole page and of each component one by one. I gave a clear specification for each element of the page: which API endpoint it gets its data from, what it represents, how it should work. I specified all the page actions as well: what should happen when a button is clicked, when a dropdown option is selected, and so on. I also instructed Claude to generate every new UI component in a reusable way within our UI library, test them, and provide Storybook stories and documentation.

I asked AI to create an implementation plan that multiple agents could work on in parallel (because I was curious how this would work). I required a contract-first approach so that the results of the asynchronously working agents could be integrated at the end.
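"Contract-first" here means pinning down shared types and method signatures before any agent writes implementation code. A minimal sketch of such a contract (all names are my own invention, not the project's):

```typescript
// Hypothetical contract agreed on up front. Agent 1 implements it;
// Agent 3 codes against it from the start.
export interface OrderDto {
  id: string;
  status: 'active' | 'archived';
}

export interface OrdersApiClient {
  listOrders(): Promise<OrderDto[]>;
}

// Agent 3 can build and test the page against a stub
// while Agent 1 builds the real client in parallel.
export const stubClient: OrdersApiClient = {
  listOrders: async () => [{ id: 'order-1', status: 'active' }],
};
```

The stub is what lets the agents work asynchronously: the page composition never blocks on the API layer being finished.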

Opus 4.6 worked for 9 minutes to create the plan. It correctly found all the files in the project that it needed to modify and the workspaces and folders where it should create the new files.

It separated the work into 4 agents with clear responsibilities, tasks, and restrictions:

Agent 1: API Layer

  • Update the API schema
  • Generate API types from the schema
  • Create and export DTO types
  • Add mapping functions
  • Implement new API client methods based on the given interface
  • Add unit tests
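The DTO-plus-mapping pattern Agent 1 was responsible for looks roughly like this. The wire shape and all names are invented for illustration:

```typescript
// Raw shape as it comes over the wire (generated from the OpenAPI schema).
interface ApiOrder {
  id: string;
  created_at: string; // ISO date string from the API
  status: 'active' | 'archived';
}

// DTO the UI consumes: camelCase fields, parsed dates.
export interface Order {
  id: string;
  createdAt: Date;
  status: 'active' | 'archived';
}

// Mapping function: the only place that knows both shapes.
export function toOrder(raw: ApiOrder): Order {
  return {
    id: raw.id,
    createdAt: new Date(raw.created_at),
    status: raw.status,
  };
}
```

Keeping the mapping in one function is what makes it cheap to unit-test, and it isolates the rest of the code from API shape changes.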

Agent 2: UI Components

  • Implement the discovered new UI components (the plan listed their names, dependencies, and functional descriptions)
  • Add unit tests
  • Create Storybook stories
  • Export the components from the UI library
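For the Storybook part, the stories Agent 2 produced follow Component Story Format. A stripped-down sketch, assuming a made-up "StatusBadge" component (a real story would import the component and the `Meta`/`StoryObj` types from `@storybook/react`; they are inlined here to keep the sketch self-contained):

```typescript
// Hypothetical CSF story for an assumed StatusBadge component.
interface StatusBadgeProps {
  label: string;
  status: 'active' | 'archived';
}

const meta = {
  title: 'UI/StatusBadge',
  // component: StatusBadge, // the real story references the component itself
};
export default meta;

// One named export per visual state worth documenting.
export const Active: { args: StatusBadgeProps } = {
  args: { label: 'Subscription', status: 'active' },
};

export const Archived: { args: StatusBadgeProps } = {
  args: { label: 'Old plan', status: 'archived' },
};
```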

Agent 3: Page Composition

  • Replace the current placeholder component on the page (server component)
  • Call the proper API client methods for data
  • Feed the data to the created client component
  • Implement the layout and state management of the client component
  • Place the required UI components on the page
  • Implement the page actions

Agent 4: E2E Tests

  • Write the necessary BDD-style E2E tests for the page (the BDD features were also included in the plan for quick human verification)
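The BDD features included in the plan were of the kind shown below. The scenario is invented, since the real features describe the client's page, but the shape is what made quick human verification possible:

```gherkin
Feature: Orders page
  Scenario: Archiving an order from the list
    Given I am on the orders page
    And the list shows an order with status "active"
    When I select "Archive" from the order's dropdown
    Then the order's status changes to "archived"
```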

It created an execution order as well:

```
Phase 1 (parallel): Agent 1 (API) + Agent 2 (UI Components)
Phase 2 (after Phase 1): Agent 3 (Page Composition)
Phase 3 (after Phase 2): Agent 4 (E2E Tests)

Agents 1 and 2 have zero dependencies and run fully in parallel.
Agent 3 depends on both but can start skeleton code immediately.
Agent 4 runs last as it needs rendered DOM.
```

The execution of the plan took 29 minutes and 25 seconds.

The result at first glance was a bit odd. It clearly contained all the required elements and showed the data correctly from the API, but the layout was broken, and the component designs were only ~80–90% faithful to the Figma designs. No hallucinations, though!

All in all, it was not great, not terrible for a first iteration.

Refining the design

I asked Claude to use Playwright MCP to verify its result: find the differences from the Figma design and fix them.

Using Playwright MCP as a feedback loop in frontend development works surprisingly well. Claude opens the page in a browser, takes screenshots, analyzes them, finds the problems, fixes them, verifies the fix with Playwright again, and iterates until it's solved.
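That feedback loop has roughly the following shape. This is a sketch with stand-ins for the actual MCP tool calls: `takeScreenshot`, `findDifferences`, and `applyFix` are assumptions for illustration, not real Playwright MCP APIs:

```typescript
// Sketch of the screenshot-verify-fix loop. Bounding the iterations matters:
// past some point, a human should take over.
type Diff = { element: string; issue: string };

export async function refineUntilMatching(
  takeScreenshot: () => Promise<string>,               // stand-in: capture the page
  findDifferences: (shot: string) => Promise<Diff[]>,  // stand-in: compare to Figma
  applyFix: (diff: Diff) => Promise<void>,             // stand-in: edit the code
  maxIterations = 5,
): Promise<Diff[]> {
  for (let i = 0; i < maxIterations; i++) {
    const diffs = await findDifferences(await takeScreenshot());
    if (diffs.length === 0) return []; // matches the design
    for (const diff of diffs) await applyFix(diff);
  }
  // Still diverging after maxIterations: report what's left and hand over.
  return findDifferences(await takeScreenshot());
}
```

The iteration cap is the important design choice: without it, the agent can churn indefinitely on the last few percent.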

However, my prompt was too vague, and the use of Figma MCP is still far from perfect, so the result was also disappointing. What worked much better was creating screenshots of both the UI implementation and the expected design, then describing the problems. Most of the design issues could be solved this way.

A 100% pixel-perfect implementation of a design is still not something an LLM is capable of.

You need to recognize the point when the agent gets stuck in a loop: every iteration changes the problem, but you don't get any closer to the solution. That's the point when you need to take the keyboard and finish the job yourself.

In the case of pixel-perfect design implementation, in my experience with current models and tools, you can usually reach a ~90–95% state with AI.

Refining the code

Opus 4.6 generates relatively decent code, but most of the time it needs some refactoring. In the case of this experiment, here's what I found during code review:

  • It didn't create components for all the UI elements it should have. This led to unnecessary duplication that would have been difficult to maintain.
  • It didn't always use the tokens from the design system or our SASS mixins (e.g., for typography).
  • I found some overcomplicated, mutating logic that could have been written more simply.
  • It didn't make some of the components as reusable as I expected.
  • It hardcoded some constants that shouldn't have been hardcoded.
  • It wasn't forward-thinking enough to extract functionality we could reuse later into a hook or utility function.
  • It used far more useMemo than necessary.

But when I pointed these issues out to Claude, it could fix them much faster than I would have. With the proper permissions, it can even read PR comments from GitHub, so you don't necessarily need to prompt these fixes manually: you can just review on GitHub, then give a short "fix my reviews on the PR" instruction.

What didn’t work

Figma MCP still surprisingly underperforms compared to a simple screenshot.

The multi-agent implementation was fun to try, but resulted in a 3,000+ line PR, which is far from optimal. Next time, after I have the multi-agent implementation plan and the contracts (types, interfaces) in the code, I'd try to solve the task on separate branches using worktrees.

What worked

With all the refinements, I could ship the module in ~8 hours instead of the estimated 48 hours. Fully tested and documented. Two things stood out from the start:

1. Playwright MCP is a MUST in frontend development.
2. Creating the API schema mapping, types, and API client based on the given OpenAPI specification worked perfectly, even on the first iteration.

I don't think AI frontend development is solved, but for the first time, the velocity feels real. One thing's for sure: I'll keep testing, and I'll keep writing when something interesting happens.
