Aditya P Dixit

Posted on Jun 30

I Think Browser Agents Are Built on the Wrong Abstraction (So I Built a Compiler)

#ai #programming #computerscience #api

Every browser agent starts the same way.

It downloads HTML.

Builds a DOM.

Searches for CSS selectors.

Finds buttons.

Waits for JavaScript.

Clicks something.

Reads more HTML.

Repeats.

We keep making LLMs dramatically smarter...

...yet we're still asking them to reason over one of the lowest-level representations on the web.

That felt wrong.

So I started asking a different question:

What if websites could be compiled into semantic interfaces instead of being rediscovered every time an AI agent visits them?

That question eventually became an open-source project called Shiny Fishstick.

Yes.

That's actually the name.

The Problem

Imagine asking an AI agent to buy a laptop.

Today, its internal reasoning looks something like this:

Find login button.

Click login.

Wait.

Find email field.

Fill email.

Find password field.

Fill password.

Click submit.

Wait for navigation.

Search for "Laptop".

Find Add to Cart.

Click Add to Cart.

Now imagine doing that...

Every.

Single.

Time.

The website hasn't changed.

The workflow hasn't changed.

Yet the agent keeps rediscovering it.

HTML Is A Great Human Interface

HTML was designed for browsers.

It tells browsers:

where text goes
what buttons exist
how pages should render

An AI agent doesn't care about any of that.

It doesn't care whether the button is blue.

It doesn't care whether the developer swapped one HTML element for another.

It cares about actions.

Those are semantic concepts.

HTML doesn't represent them very well.

Thinking Like A Compiler

Compiler design has an interesting idea.

You don't execute source code directly.

You first transform it into an Intermediate Representation (IR).

Everything else builds from there.

I wondered if websites could work the same way.

Instead of repeatedly reasoning over HTML...

Compile the website once.

Generate a reusable semantic representation.

Enter Preflight

The compiler produces a specification called preflight.yaml.

A simplified example:

version: 1.0.0

actions:
  login:
    parameters:
      - email
      - password

  add_to_cart:
    action_type: api
    api:
      method: POST
      url: /api/cart/add

Notice something.

There's no HTML.

No XPath.

No brittle selectors.

Only actions.
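
To make the shape of that specification concrete, here is a minimal Python sketch of how a consumer might model it. The class names (ApiBinding, Action), the parse_spec helper, the login action name, and the "browser" default for action_type are assumptions for illustration, not the project's actual schema or loader. The spec is written as a plain dict so the sketch runs without a YAML library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ApiBinding:
    method: str
    url: str


@dataclass
class Action:
    name: str
    action_type: str = "browser"  # assumed default when no API is recorded
    parameters: list[str] = field(default_factory=list)
    api: Optional[ApiBinding] = None


def parse_spec(spec: dict) -> dict[str, Action]:
    """Turn the raw spec mapping into Action objects keyed by action name."""
    actions = {}
    for name, body in spec.get("actions", {}).items():
        api = body.get("api")
        actions[name] = Action(
            name=name,
            action_type=body.get("action_type", "browser"),
            parameters=list(body.get("parameters", [])),
            api=ApiBinding(**api) if api else None,
        )
    return actions


# Hypothetical spec mirroring the simplified example above.
SPEC = {
    "version": "1.0.0",
    "actions": {
        "login": {"parameters": ["email", "password"]},
        "add_to_cart": {
            "action_type": "api",
            "api": {"method": "POST", "url": "/api/cart/add"},
        },
    },
}

ACTIONS = parse_spec(SPEC)
```

Everything an agent needs to know about add_to_cart lives in ACTIONS["add_to_cart"]: its name, its parameters, and how it is executed. No selector ever enters the picture.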

API Disco

One of my favorite modules is called API Disco.

(Yes, that's the real filename.)

While the crawler performs browser interactions, it watches every network request.

If it discovers that an action is actually backed by a reusable API...

The compiler upgrades that action automatically.

Instead of generating browser automation...

It generates an API-backed SDK method.

If no API exists?

No problem.

The generated SDK simply falls back to resilient browser execution.

The developer never has to think about the difference.
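
The upgrade step can be sketched roughly like this. It is a simplified illustration, not the real API Disco module: ObservedRequest, is_reusable, and upgrade_actions are hypothetical names, and the "URL starts with /api/" rule is a stand-in for whatever heuristic the project actually uses to decide that an endpoint is reusable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObservedRequest:
    action: str  # action the crawler was performing when the request fired
    method: str
    url: str


def is_reusable(req: ObservedRequest) -> bool:
    """Assumed heuristic: endpoints under /api/ count as reusable."""
    return req.url.startswith("/api/")


def upgrade_actions(action_names, observed):
    """Return one spec entry per action: API-backed if possible, else browser."""
    result = {}
    for name in action_names:
        match: Optional[ObservedRequest] = next(
            (r for r in observed if r.action == name and is_reusable(r)), None
        )
        if match:
            result[name] = {
                "action_type": "api",
                "api": {"method": match.method, "url": match.url},
            }
        else:
            # No reusable API seen for this action: keep browser execution.
            result[name] = {"action_type": "browser"}
    return result


# Hypothetical network log captured while crawling two actions.
observed = [
    ObservedRequest("add_to_cart", "POST", "/api/cart/add"),
    ObservedRequest("login", "GET", "/static/login.css"),
]
upgraded = upgrade_actions(["login", "add_to_cart"], observed)
```

In this toy log, add_to_cart is upgraded to an API call, while login only triggered a stylesheet request and stays a browser action.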

One Specification, Multiple Outputs

Once the compiler has generated preflight.yaml, everything else becomes code generation.

Today it produces:

Python SDKs
TypeScript SDKs
Rust SDKs
MCP Servers

Tomorrow it could just as easily generate:

Go SDKs
Java SDKs
C# SDKs

The compiler doesn't change.

Only the backend generator does.
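
As a rough illustration of what a backend generator does, here is a toy Python emitter that turns a spec into SDK source code. The generated class name SiteClient and the transport.request and browser.run interfaces are assumptions made up for this sketch. The real generators are certainly more involved.

```python
def generate_python_method(name: str, action: dict) -> str:
    """Emit Python source for one SDK method from one action entry."""
    params = action.get("parameters", [])
    signature = ", ".join(["self", *params])
    payload = "{" + ", ".join(f"{p!r}: {p}" for p in params) + "}"
    if action.get("action_type") == "api":
        api = action["api"]
        body = f"return self.transport.request({api['method']!r}, {api['url']!r}, {payload})"
    else:
        # No API recorded: delegate to a (hypothetical) browser runner.
        body = f"return self.browser.run({name!r}, {payload})"
    return f"def {name}({signature}):\n    {body}\n"


def generate_python_sdk(spec: dict, class_name: str = "SiteClient") -> str:
    """Emit a whole client class, one method per action in the spec."""
    lines = [
        f"class {class_name}:",
        "    def __init__(self, transport, browser):",
        "        self.transport = transport",
        "        self.browser = browser",
        "",
    ]
    for name, action in spec["actions"].items():
        method_src = generate_python_method(name, action)
        lines += ["    " + line if line else "" for line in method_src.splitlines()]
        lines.append("")
    return "\n".join(lines)


# Hypothetical spec mirroring the simplified preflight.yaml example.
SPEC = {
    "actions": {
        "login": {"parameters": ["email", "password"]},
        "add_to_cart": {
            "action_type": "api",
            "api": {"method": "POST", "url": "/api/cart/add"},
        },
    }
}

source = generate_python_sdk(SPEC)
```

A TypeScript or Rust backend would walk the same spec["actions"] mapping and only change the strings it emits.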

Why I Think This Is Interesting

The thing I'm most excited about isn't actually the compiler.

It's the specification.

Imagine a future where multiple tools understand the same semantic website format.

Different compilers.

Different validators.

Different SDK generators.

Different execution engines.

All sharing the same representation.

That feels much more powerful than another browser automation library.

Benchmarks

I also wanted to avoid hand-wavy performance claims.

So the repository includes a public benchmark methodology that measures:

Token reduction
Execution speed
Reliability
Memory usage
Self-healing capability
Developer implementation effort

The goal wasn't to "win benchmarks."

The goal was to make every claim reproducible.

Is This A Browser Automation Replacement?

No.

Some websites simply don't expose reusable APIs.

Some authentication flows require browser interaction.

Some workflows are inherently visual.

Browser automation isn't going away.

The idea is to separate:

What an agent wants to do

from

How that action gets executed

Sometimes that's an API.

Sometimes it's a browser.

The interface stays the same.
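
In code, that separation might look something like the sketch below. The perform function, the two backend stubs, and the product_id parameter are hypothetical. The stubs return a description of what they would do instead of touching a network or a browser.

```python
from typing import Any, Callable, Dict

Backend = Callable[[str, dict, dict], Any]


def api_backend(name: str, action: dict, params: dict) -> str:
    """Stub: describe the HTTP call a real backend would make."""
    api = action["api"]
    return f"{api['method']} {api['url']} {sorted(params)}"


def browser_backend(name: str, action: dict, params: dict) -> str:
    """Stub: describe the browser flow a real backend would drive."""
    return f"browser:{name} {sorted(params)}"


BACKENDS: Dict[str, Backend] = {"api": api_backend, "browser": browser_backend}


def perform(spec: dict, name: str, **params: Any) -> Any:
    """What the agent wants (name, params) is decoupled from how it runs."""
    action = spec["actions"][name]
    backend = BACKENDS[action.get("action_type", "browser")]
    return backend(name, action, params)


# Hypothetical spec: one API-backed action, one browser-only action.
SPEC = {
    "actions": {
        "login": {"parameters": ["email", "password"]},
        "add_to_cart": {
            "action_type": "api",
            "api": {"method": "POST", "url": "/api/cart/add"},
        },
    }
}

api_result = perform(SPEC, "add_to_cart", product_id=1)
browser_result = perform(SPEC, "login", email="a@b.c", password="pw")
```

Both calls go through the same perform(...) entry point. Only the spec decides which backend runs.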

What's Next?

The roadmap currently includes:

More SDK targets (Go & Java)
Better API discovery
Plugin architecture
Visual regression support
Improved compatibility across modern web frameworks

But more importantly...

I'd like to hear what other developers think.

Am I solving the wrong problem?

Is there a better abstraction?

Could something like preflight.yaml actually become useful outside this project?

I'd genuinely love the discussion.

Links

🌐 Website

https://adityapdixit.me/shiny-fishstick/

⭐ GitHub

https://github.com/Hootsworth/shiny-fishstick

If nothing else, I hope the name made you curious enough to click.

Thanks for reading.
