Browser Automation for AI Agents: What Actually Works

Dylan Worrall — Thu, 18 Jun 2026 23:16:36 +0000

Originally published at dylanworrall.com.

Most agent demos that involve a browser are shot in one take for a reason. The moment you try to make browser automation reliable — running unattended, across sites you don't control, hundreds of times — it stops being a demo and starts being an engineering problem. I've spent a lot of time on that problem building the browser layer inside Froots, and a handful of patterns made the difference between "works in the video" and "works at 3am while I'm asleep."

Prefer structured verbs over raw `eval`

It's tempting to give the agent one giant escape hatch: run arbitrary JavaScript in the page and parse whatever comes back. It works right up until it doesn't, and when it fails it fails opaquely.

A small vocabulary of structured commands beats one omnipotent one:

navigate <url>
click <selector>
fill <selector> <value>
type <selector> <value>      # contenteditable-safe; rich-text composers ignore a plain fill
text <selector>              # read innerText back
wait_selector <selector>     # poll until it exists

The point isn't that eval is useless — it's the fallback, not the default. Structured verbs give you predictable error messages ("selector not found" beats a stack trace from inside a minified bundle), and they make the agent's intent legible.
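Here is a minimal sketch of that vocabulary as a dispatcher, in Python. It assumes a hypothetical in-memory `FakePage` in place of a real browser driver. `run_command`, `CommandError`, and `VERB_ARITY` are illustrative names, not Froots' implementation. It shows the shape of the idea: a fixed set of verbs, arguments checked up front, and failures that read like "selector not found" instead of a stack trace.

```python
from dataclasses import dataclass, field
from typing import Dict


class CommandError(Exception):
    """A predictable, human-readable failure such as 'selector not found'."""


@dataclass
class FakePage:
    """Hypothetical in-memory stand-in for a real browser page."""

    url: str = "about:blank"
    elements: Dict[str, str] = field(default_factory=dict)  # selector -> text

    def require(self, selector: str) -> str:
        if selector not in self.elements:
            raise CommandError(f"selector not found: {selector}")
        return selector


# Number of arguments each verb takes; anything else is rejected up front.
VERB_ARITY = {"navigate": 1, "click": 1, "fill": 2, "type": 2, "text": 1, "wait_selector": 1}


def run_command(page: FakePage, line: str) -> str:
    """Parse one structured command and dispatch it to the fixed verb set."""
    # maxsplit=2 keeps a fill/type value intact even if it contains spaces
    # (selectors themselves must not contain spaces in this sketch).
    parts = line.split(maxsplit=2)
    if not parts:
        raise CommandError("empty command")
    verb, args = parts[0], parts[1:]
    if verb not in VERB_ARITY:
        raise CommandError(f"unknown verb: {verb}")
    if len(args) != VERB_ARITY[verb]:
        raise CommandError(f"{verb} expects {VERB_ARITY[verb]} argument(s), got {len(args)}")

    if verb == "navigate":
        page.url = args[0]
        return "ok"

    selector = page.require(args[0])  # every other verb targets an element
    if verb in ("fill", "type"):
        # A real `type` would send keystrokes so contenteditable editors see
        # input events; the fake page simply stores the value.
        page.elements[selector] = args[1]
        return "ok"
    if verb == "text":
        return page.elements[selector]
    # click / wait_selector: the fake page renders synchronously, so existence
    # is all there is to check; a real driver would click or poll here.
    return "ok"


page = FakePage(elements={"#email": "", "#submit": "Sign in"})
run_command(page, "navigate https://example.com/login")
run_command(page, "fill #email ada@example.com")
print(run_command(page, "text #email"))  # ada@example.com
```

Because every action goes through one choke point, adding an `eval` escape hatch later becomes a deliberate, visible decision instead of the default path.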

Kill the `sleep` instinct — wait on conditions

The single biggest source of flakiness is `sleep(2000)`. Too short and you act before the element exists; too long and every run wastes seconds. Replace time with conditions: poll until the element exists, until the spinner is gone, or until navigation lands. An agent that waits on the thing it actually needs is both faster and dramatically more reliable than one that guesses at timing.
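A condition-based wait is a small polling loop. This sketch assumes the condition is any zero-argument callable (for example a hypothetical `lambda: page.query("#results")`) and returns whatever truthy value it produced, so a caller can wait for an element and get it back in one step. `wait_for` and `WaitTimeout` are illustrative names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class WaitTimeout(Exception):
    """Raised when a condition never became true within the timeout."""


def wait_for(
    condition: Callable[[], Optional[T]],
    timeout: float = 10.0,
    interval: float = 0.1,
    what: str = "condition",
) -> T:
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    The truthy value is returned, so a caller can wait for an element and
    receive it in the same step.
    """
    # monotonic() is unaffected by wall-clock adjustments, unlike time.time().
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"timed out after {timeout}s waiting for {what}")
        time.sleep(interval)


# Usage: a simulated spinner that disappears on the third poll.
polls = {"count": 0}


def spinner_gone() -> bool:
    polls["count"] += 1
    return polls["count"] >= 3


wait_for(spinner_gone, timeout=2.0, interval=0.01, what="spinner to disappear")
```

Most automation libraries ship a built-in wait for the common cases (Playwright's `wait_for_selector`, for example), and those are usually the better choice; a generic loop like this earns its place for conditions the library can't express directly.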

Always read something back

This is the lesson I learned the hard way. A command would return success and I'd assume the work was done — then find the agent had been talking to a pane that wasn't there. Every call "succeeded" by doing nothing.

The fix is a discipline: a write should be confirmed by a read. After you fill a field, read it back. After you click submit, wait for the URL or a success node. Silent success is not the same as success.
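A minimal sketch of "confirm every write with a read", assuming the driver's fill and text-read operations can be passed in as plain functions. `fill_verified` and `VerificationError` are hypothetical names. The second usage case reproduces the failure described above: a fill that reports success while doing nothing.

```python
from typing import Callable, Dict


class VerificationError(Exception):
    """A write reported success, but reading it back disagrees."""


def fill_verified(
    fill: Callable[[str, str], None],
    read: Callable[[str], str],
    selector: str,
    value: str,
) -> None:
    """Write `value` into `selector`, then read it back and compare."""
    fill(selector, value)
    observed = read(selector)
    if observed != value:
        raise VerificationError(
            f"wrote {value!r} to {selector} but read back {observed!r}"
        )


# Usage 1: a fake page where writes actually land.
fields: Dict[str, str] = {}
fill_verified(fields.__setitem__, lambda s: fields.get(s, ""), "#name", "Ada")

# Usage 2: a fill that "succeeds" by doing nothing, like a command sent to a
# pane that isn't there. The read-back catches it instead of trusting it.
try:
    fill_verified(lambda s, v: None, lambda s: fields.get(s, ""), "#city", "Paris")
except VerificationError as err:
    print(err)  # wrote 'Paris' to #city but read back ''
```

Exact equality is deliberately strict. Some inputs reformat what you type (phone numbers, currency), so a real check may need to normalize both sides before comparing.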

Use the session's own cookies for reads

A lot of useful data sits behind a login. Rather than fighting a login wall, run an in-page `fetch` with `credentials: 'include'` from a page already on the right origin: the browser attaches the existing session cookies, so you reuse the session instead of re-authenticating or storing credentials. Probe for a login cookie before you reach for authenticated data, so you can ask the human to sign in rather than silently scraping an error page.
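A sketch of the probe-then-fetch pattern, assuming cookies arrive as dicts with `name` and `domain` keys (the shape many drivers return) and that the JavaScript string is handed to your driver's evaluate call, such as Playwright's `page.evaluate`. The function names, the cookie name `session_id`, and the `/api/me` endpoint are hypothetical placeholders; every site names these differently. The probe ignores expiry and path for brevity.

```python
import json
from typing import Iterable, Mapping, Tuple


def has_login_cookie(cookies: Iterable[Mapping[str, str]], host: str, name: str) -> bool:
    """True if a cookie called `name` applies to `host` (exact or parent domain)."""
    for cookie in cookies:
        # Drivers often report domain cookies with a leading dot (".example.com").
        domain = cookie.get("domain", "").lstrip(".")
        if cookie.get("name") != name or not domain:
            continue
        if host == domain or host.endswith("." + domain):
            return True
    return False


def in_page_fetch_js(path: str) -> str:
    """JavaScript for the driver to evaluate in a page already on the target origin.

    A relative path keeps the request on that origin, and credentials: 'include'
    tells the browser to attach the session's cookies.
    """
    return (
        "async () => {"
        f" const r = await fetch({json.dumps(path)}, {{ credentials: 'include' }});"
        " if (!r.ok) throw new Error('HTTP ' + r.status);"
        " return await r.text();"
        " }"
    )


def plan_authenticated_read(
    cookies: Iterable[Mapping[str, str]], host: str, cookie_name: str, path: str
) -> Tuple[str, str]:
    """Decide between asking the human to sign in and running the in-page fetch."""
    if not has_login_cookie(cookies, host, cookie_name):
        return ("ask_human", f"Please sign in to {host}, then tell me to retry.")
    return ("evaluate", in_page_fetch_js(path))


# Usage with placeholder values: "session_id" and "/api/me" are hypothetical.
jar = [{"name": "session_id", "domain": ".example.com", "value": "abc123"}]
action, payload = plan_authenticated_read(jar, "app.example.com", "session_id", "/api/me")
```

The probe reads the driver's cookie jar rather than `document.cookie`, because session cookies are often `HttpOnly` and therefore invisible to page scripts.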

Screenshots are the honest fallback

When the DOM is hostile — shadow roots, canvas UIs, obfuscated class names — stop fighting selectors and take a screenshot. A vision model reading a picture of the page is sometimes the most robust path.
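A minimal sketch of a DOM-first, screenshot-second fallback. `read_text`, `screenshot`, and `ask_vision` are hypothetical callables standing in for your driver's text read, its screenshot capture, and whatever vision model you call; none of them is a specific library's API. The sketch also falls back when the element exists but yields no text, which is typical of canvas-rendered UIs.

```python
from typing import Callable, Tuple


class SelectorError(Exception):
    """Raised by the (hypothetical) DOM reader when a selector matches nothing."""


def read_with_fallback(
    read_text: Callable[[str], str],
    screenshot: Callable[[], bytes],
    ask_vision: Callable[[bytes, str], str],
    selector: str,
    question: str,
) -> Tuple[str, str]:
    """Prefer the DOM; fall back to a screenshot read by a vision model.

    Returns (source, answer) so the caller knows how the answer was obtained.
    """
    try:
        text = read_text(selector)
        # An element that exists but yields no text (e.g. a canvas) counts as a miss.
        if text.strip():
            return ("dom", text)
    except SelectorError:
        pass
    return ("vision", ask_vision(screenshot(), question))


# Usage with stand-ins: the selector is missing, so the vision path answers.
def missing(selector: str) -> str:
    raise SelectorError(f"selector not found: {selector}")


source, answer = read_with_fallback(
    missing,
    screenshot=lambda: b"<png bytes>",
    ask_vision=lambda image, q: "Order total: $42.00",
    selector="#order-total",
    question="What is the order total?",
)
print(source, answer)  # vision Order total: $42.00
```

Returning the source alongside the answer keeps the agent honest about how it knows what it knows, which matters when you later confirm the result.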

The meta-lesson

Reliable browser automation is less about clever selectors and more about closing the loop: act, observe, confirm, and never trust a result you didn't verify.

I write more about agent architecture — reliable memory, agents you can watch work, and building toward a one-person company — over on my blog.

— Dylan Worrall, founder of Froots

DEV Community: Dylan Worrall