DEV Community

LeeTaihe
How I Built a Design Review Tool for AI-Generated Frontends

If you've used Claude Code, Cursor, Codex, or other AI coding agents to generate front-end code, you probably know this feeling: the page works, but it still doesn't look good enough.

The layout is technically correct. The sections are there. The responsive behavior mostly works. But the visual result often feels cramped, flat, and generic — more like a tutorial project than a polished product.

I kept running into this problem in my own workflow. I looked around for tools that focused specifically on reviewing the visual quality of AI-generated frontends, but I didn't find much that matched what I wanted. So I built AetherPane.

GitHub: https://github.com/lihytaihe-lang/aetherpane

The Gap Between "Working" and "Polished"

AI coding agents are getting very good at structure.

They can generate layouts, components, responsive sections, and even reasonably clean code. In many cases, they can get a page from idea to implementation surprisingly quickly.

But visual quality is a different layer of the problem.

A polished interface depends on a lot of small decisions that are easy to underestimate:

  • visual hierarchy
  • spacing rhythm
  • surface depth
  • typography contrast

These details are exactly what make the difference between "this technically works" and "this feels production-ready."

And in my experience, this is also where AI-generated frontends still struggle most often.

That made me realize something: the missing piece wasn't just better prompting. It was a more systematic way to review design quality.

Why Prompting Alone Wasn't Enough

When an AI-generated UI looks mediocre, the default reaction is usually to keep prompting:

  • "make it more premium"
  • "improve the spacing"
  • "make it feel more polished"
  • "add more depth"

Sometimes that helps. Often it doesn't.

The problem is that these prompts are vague, and the feedback loop is weak. If the result improves, it is hard to tell exactly why. If it gets worse, there is no structured way to understand what regressed.

I wanted something more concrete than intuition and repeated prompting.

I wanted a way to look at an AI-generated page and ask:

  • Is the hierarchy clear?
  • Does the spacing breathe?
  • Do the surfaces have enough depth?
  • Does the typography create enough contrast?

That idea became AetherPane.

What AetherPane Is

AetherPane is a design review and scoring tool for AI-generated frontends.

It is not a component library, and it is not just a collection of style rules. The goal is to act more like a design intelligence layer: review what an AI agent produced, score it across a few design dimensions, and make the refinement step more structured.

Instead of only saying "make it prettier," I wanted a workflow that could say why a page feels weak and where the problems are likely to be.

The Four Design Dimensions

Right now, AetherPane evaluates UI across four dimensions:

  • Visual Hierarchy: whether important elements actually stand out
  • Breathing Space: whether spacing feels open or cramped
  • Glass Quality: whether surfaces have enough depth, layering, and refinement
  • Typography: whether text has enough contrast, scale, and hierarchy

These dimensions are not meant to replace human design judgment. They are meant to make design review more explicit and easier to iterate on.

In practice, this gives me a much clearer way to inspect AI-generated frontend output than simply eyeballing the page and throwing another round of prompts at it.
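As a sketch of how explicit dimensions make review more legible, here is a minimal way one might combine per-dimension scores into a single overall number. The dimension names mirror the four above, but the 0–100 scale, the equal default weights, and the function itself are illustrative assumptions, not AetherPane's actual internals:

```javascript
// Hypothetical sketch: aggregating per-dimension design scores.
// Scale (0-100) and weights are illustrative assumptions.
const DIMENSIONS = ["visualHierarchy", "breathingSpace", "glassQuality", "typography"];

function overallScore(scores, weights = {}) {
  let total = 0;
  let weightSum = 0;
  for (const dim of DIMENSIONS) {
    const w = weights[dim] ?? 1;        // equal weighting by default
    total += (scores[dim] ?? 0) * w;
    weightSum += w;
  }
  return Math.round(total / weightSum); // overall score on the same 0-100 scale
}

console.log(overallScore({
  visualHierarchy: 72,
  breathingSpace: 55,
  glassQuality: 60,
  typography: 68,
})); // → 64
```

Even a toy aggregate like this beats a single gut reaction: the per-dimension inputs tell you *where* the page is weak (here, spacing), not just that it is.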

A Minimal Workflow

The basic workflow is simple:

  1. let an AI coding agent generate a page
  2. run AetherPane against that page
  3. inspect the score and review report
  4. refine the design based on clearer feedback

A minimal example looks like this:

node skills/web-ui-polish/tools/cli.cjs critique your-page.html

That produces a score breakdown and a review report highlighting issues such as weak spacing, missing typography scaling, or flat surface treatment.
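To make a finding like "missing typography scaling" concrete, here is the kind of heuristic such a report could be built on. This is an illustrative sketch of one possible check, not AetherPane's actual rule set; the threshold (a ~1.25 "major third" type-scale step) and the function are my own assumptions:

```javascript
// Illustrative heuristic (not AetherPane's actual rules): flag weak
// typography contrast when the heading/body font-size ratio falls
// below a conventional type-scale step (~1.25, a "major third").
function checkTypographyContrast(headingPx, bodyPx, minRatio = 1.25) {
  const ratio = Number((headingPx / bodyPx).toFixed(2));
  const pass = ratio >= minRatio;
  return {
    ratio,
    pass,
    note: pass
      ? "heading scale creates adequate contrast"
      : "heading is too close to body size; increase the type scale",
  };
}

console.log(checkTypographyContrast(18, 16)); // ratio 1.13: too flat
console.log(checkTypographyContrast(28, 16)); // ratio 1.75: clear hierarchy
```

A real tool needs many such checks, plus judgment about when they apply, but each one turns a vague "this feels off" into a specific, fixable observation.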

The point is not to automate taste completely. The point is to make the review loop more legible.

What I Found Valuable While Building It

The most useful shift for me was moving from vague aesthetic reactions to more structured critique.

Before, my loop looked like this:

  • generate UI with an agent
  • feel that it looked "off"
  • try more prompts
  • hope it got better

Now the loop is closer to this:

  • generate UI with an agent
  • review it across specific design dimensions
  • identify where it is weak
  • iterate with more targeted changes

That change sounds small, but in practice it makes refinement much less random.

I also found that this kind of tooling becomes more useful as AI coding gets better. The stronger agents become at generating functional code, the more obvious the remaining weakness becomes: visual quality.

Why I Think This Category Needs to Exist

I think we are heading toward a real need for AI design quality tooling.

Right now, a lot of attention goes to code generation speed, model benchmarks, and agent workflows. But for frontend work, there is a very real gap between:

  • the page exists
  • the page looks finished

That gap matters.

As more developers use AI agents to build interfaces, I think more people will run into the same frustration: the frontend is functional, but it still needs another pass before it feels good enough.

That is the category I wanted to explore with AetherPane.

Not a replacement for designers.
Not a magic button for instant taste.
Just a more structured way to evaluate and improve AI-generated UI.

Current Status

AetherPane is open source and MIT licensed. It is still early, but it is functional.

GitHub: https://github.com/lihytaihe-lang/aetherpane

I'd especially love feedback from people using AI coding workflows for frontend work, because that's the exact context this was built for.