Why Teaching AI to Click Buttons Is a Broken Abstraction

#ai #machinelearning #automation #agents

We've all experienced it.

You automate a repetitive workflow, only for everything to break because someone moved a button or redesigned a page. Surprisingly, many AI agents suffer from the exact same problem.

Instead of understanding the underlying task, they're often forced to interact with software the same way humans do: by clicking buttons, reading screens, and navigating menus.

This approach works.
But it raises an important question:

Are we teaching AI systems to solve problems, or are we simply teaching them to use software designed for humans?

In this article, we'll explore why UI navigation has become one of the biggest bottlenecks for AI agents and why many developers believe the industry is solving the wrong problem.

A Simple Task Isn't Really Simple

Imagine someone wants to check the status of an order.

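For a browser-based AI agent, the process might look something like this: open the store's website, sign in, navigate to the orders page, find the right order, and read its status off the screen.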

What seems like a simple request actually requires multiple steps. And every step introduces opportunities for failure.
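
To put a rough number on how failure opportunities add up, here is a minimal sketch. The step names and the 95% per-step success rate are illustrative assumptions, not measurements, and the steps are treated as independent. The only point is that per-step reliability multiplies across the chain.

```python
from math import prod

# Hypothetical UI steps for "check order status". The success rates are
# made-up placeholders chosen for illustration, not measured values.
UI_STEPS = {
    "open site": 0.95,
    "sign in": 0.95,
    "open orders page": 0.95,
    "find the order": 0.95,
    "read status from screen": 0.95,
}


def chain_success(step_rates):
    """Probability that every step succeeds, assuming independent steps."""
    return prod(step_rates)


if __name__ == "__main__":
    rate = chain_success(UI_STEPS.values())
    print(f"{len(UI_STEPS)} steps -> {rate:.1%} end-to-end success")
```

Under these assumed numbers, five steps that each work 95% of the time succeed together only about 77% of the time. Each additional step lowers that figure further.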

The UI Was Built For Humans

Graphical user interfaces exist because humans need visual representations of information. Buttons, menus, and forms are abstractions created for us.

Computers themselves don't need buttons.

Yet many modern AI agents spend enormous amounts of computation trying to interpret screenshots, locate interface elements, and simulate mouse clicks.
They're essentially pretending to be humans.

We've Seen This Problem Before

It happened with Robotic Process Automation (RPA). Large enterprises invested heavily in automation workflows that depended on screen layouts and specific button locations.

Then a website redesign happened.
Or a menu moved.
Or a label changed.

And suddenly entire workflows stopped working.

Fragile automation became one of the biggest challenges facing RPA systems.
Many AI agents are repeating the same pattern—only with language models replacing traditional scripts.
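
The failure mode is easy to reproduce in a toy model. The sketch below uses a made-up page representation (plain dictionaries, not a real automation library) to show how a script tied to coordinates or exact labels breaks when a page is redesigned. All names and values are hypothetical.

```python
# A toy "page": each element has a visible label, a screen position, and
# the action it triggers. This is a hypothetical stand-in, not a real API.
PAGE_V1 = [
    {"label": "Track Order", "x": 120, "y": 300, "action": "order_status"},
    {"label": "Contact Us", "x": 120, "y": 340, "action": "contact"},
]

# The same page after a redesign: the button moved and was relabeled.
PAGE_V2 = [
    {"label": "Contact Us", "x": 120, "y": 300, "action": "contact"},
    {"label": "Order Tracking", "x": 400, "y": 80, "action": "order_status"},
]


def click_at(page, x, y):
    """Layout-bound script: trigger whatever sits at fixed coordinates."""
    for element in page:
        if (element["x"], element["y"]) == (x, y):
            return element["action"]
    return None


def click_label(page, label):
    """Label-bound script: trigger the element with an exact label."""
    for element in page:
        if element["label"] == label:
            return element["action"]
    return None


if __name__ == "__main__":
    print(click_at(PAGE_V1, 120, 300))          # works on the old layout
    print(click_at(PAGE_V2, 120, 300))          # now hits the wrong button
    print(click_label(PAGE_V2, "Track Order"))  # label no longer exists
```

On the redesigned page, the coordinate-based script silently triggers the wrong action and the label-based script finds nothing, even though the underlying capability (looking up an order's status) never changed.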

Browser Agents Pay a Heavy Price

Research from companies building browser agents has highlighted challenges such as:

Slow execution
Increased token costs
Fragility
Ambiguous actions

The Industry Is Optimizing The Wrong Layer

Today's AI systems are incredibly powerful.

Yet some of them spend most of their time performing tasks like:

Finding buttons.
Waiting for pages to load.
Reading screenshots.
Navigating menus.

These aren't the user's actual goals.
Users don't want a button clicked.
Users want a problem solved.
There's a difference.
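
One way to picture that difference is to put the two layers side by side. This is only a sketch: `ORDERS`, `ui_plan`, and `get_order_status` are hypothetical names invented for illustration, and the in-memory dictionary stands in for whatever backend a real system would have.

```python
from dataclasses import dataclass

# Hypothetical in-memory order store standing in for a real backend.
ORDERS = {"A-1001": "shipped", "A-1002": "processing"}


@dataclass
class UIAction:
    kind: str    # e.g. "click", "type", "wait", "read"
    target: str  # the on-screen element the action is aimed at


def ui_plan(order_id: str) -> list[UIAction]:
    """What a screen-driving agent has to plan: many low-level actions."""
    return [
        UIAction("click", "Sign in"),
        UIAction("type", "credentials"),
        UIAction("click", "My Orders"),
        UIAction("wait", "orders table"),
        UIAction("click", order_id),
        UIAction("read", "status label"),
    ]


def get_order_status(order_id: str) -> str:
    """Task-level call: a single step that maps directly to the goal."""
    return ORDERS.get(order_id, "unknown")


if __name__ == "__main__":
    print(len(ui_plan("A-1001")), "UI actions")
    print(get_order_status("A-1001"))
```

The UI plan describes how a person would get the answer. The task-level function describes what the user actually asked for.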

The Irony

Humans navigate interfaces because we have no alternative.

Machines do.

And yet we're investing enormous amounts of compute teaching AI systems how to behave like users instead of asking whether user interfaces are even the right abstraction for machines.

The result is an interesting paradox: