龙虾牧马人

Posted on Jun 10

Your Browser Is the API: Why Browser-Control Agents Need a Sandbox

#security

The uncomfortable idea

Most automation tools still depend on APIs. But many useful websites either do not expose an API, limit it heavily, or make it hard to access the exact state a human sees in the browser.

Browser-control agents take a different route: treat the real browser as the API.

I tested this idea in a sandbox with bb-browser, an open-source TypeScript project whose pitch is, roughly: connect to Chrome, reuse the current browser context, and let agents read or operate websites through site adapters.

At the time I checked it, the GitHub repository had about 5.7k stars and an MIT license, and the latest npm release was bb-browser@0.14.2.

I did not connect it to my main browser profile. I did not let it post, delete, comment, DM, or touch payment flows. I only ran read-only tests on public pages.

What worked

In a sandbox environment, I tested two read-only flows:

searching arXiv for browser automation / AI agent topics;
searching Dev.to for AI agent automation articles.

The result was promising: the tool returned structured results from pages that normally require browser context or custom scraping logic.

But there was also a practical catch: some site adapters expect a tab on the target domain to be open already. If an adapter runs in the wrong page context, its relative URLs resolve against the wrong site and its fetch calls can fail.
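To make the page-context problem concrete: a relative URL is resolved against the page the code runs in, so the same adapter request can land on a different site depending on which tab is active. This is a minimal sketch under assumptions: the `AdapterSpec` shape and its `expectedHost` field are hypothetical and not bb-browser's real adapter format. It shows the resolution behavior and a guard that refuses to run when the active tab is on the wrong host.

```typescript
// Hypothetical adapter shape for illustration only; not bb-browser's real adapter format.
interface AdapterSpec {
  name: string;
  expectedHost: string; // the host the adapter was written for
  relativePath: string; // a path the adapter requests relative to the current page
}

type ContextCheck = { ok: true } | { ok: false; reason: string };

// A relative path is resolved against the page it runs in, so the same
// adapter code can end up requesting a different site.
function resolveInContext(relativePath: string, pageUrl: string): string {
  return new URL(relativePath, pageUrl).toString();
}

// Refuse to run an adapter unless the active tab is on the host it expects.
function checkContext(adapter: AdapterSpec, activeTabUrl: string): ContextCheck {
  const host = new URL(activeTabUrl).hostname;
  if (host !== adapter.expectedHost) {
    return {
      ok: false,
      reason: `${adapter.name} expects a tab on ${adapter.expectedHost}, but the active tab is on ${host}`,
    };
  }
  return { ok: true };
}

const devtoSearch: AdapterSpec = {
  name: "devto-search",
  expectedHost: "dev.to",
  relativePath: "/search?q=ai+agent",
};

console.log(resolveInContext(devtoSearch.relativePath, "https://dev.to/"));
// https://dev.to/search?q=ai+agent
console.log(resolveInContext(devtoSearch.relativePath, "https://arxiv.org/"));
// https://arxiv.org/search?q=ai+agent  (same code, wrong site)
console.log(checkContext(devtoSearch, "https://arxiv.org/").ok); // false
```

A failed check is a signal to open the right tab or stop, not to retry blindly.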

That matters because it is a reminder that this is not magic. It is still browser automation, with all the fragility and permission risk that come with it.

The real risk: logged-in context

The powerful part is also the dangerous part.

If an agent can access your already logged-in browser, it may see what you see:

dashboards;
drafts;
private messages;
internal tools;
account settings;
potentially sensitive network requests.

If you then allow write actions, the agent may also do what you can do:

publish;
delete;
comment;
message;
change settings;
click the wrong button at the wrong time.

This is why I think browser-control agents should be treated as high-permission tools, not just fancy scrapers.

My operating rules

For now, my rules are simple:

Do not connect the tool to a main browser profile with sensitive accounts.
Start with read-only public-page tasks.
Do not inspect network requests or responses that may contain tokens, cookies, or private data.
Bind local daemons to 127.0.0.1 only.
Block write adapters by default: no posting, deleting, commenting, DMs, payments, or settings changes.
Use the tool for verification before using it for action.
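Several of these rules can be enforced as a default-deny check that runs before any adapter call, instead of relying on discipline. This is a minimal sketch under assumptions: the `ActionRequest` and `Policy` types, the action categories, and the domain allowlist are hypothetical and not part of bb-browser. Reads on allowlisted hosts pass; every write category is blocked unless explicitly enabled; and a small helper checks that a local daemon's bind address is loopback.

```typescript
// Hypothetical policy types for illustration; not part of bb-browser.
type ActionKind = "read" | "post" | "delete" | "comment" | "dm" | "payment" | "settings";

interface ActionRequest {
  adapter: string;
  kind: ActionKind;
  url: string;
}

interface Policy {
  allowedHosts: Set<string>;
  allowedWrites: Set<ActionKind>; // empty by default, so every write is blocked
}

type Decision = { allow: true } | { allow: false; reason: string };

// Default deny: unknown hosts and non-read actions are rejected.
function evaluate(req: ActionRequest, policy: Policy): Decision {
  const host = new URL(req.url).hostname;
  if (!policy.allowedHosts.has(host)) {
    return { allow: false, reason: `host ${host} is not on the allowlist` };
  }
  if (req.kind !== "read" && !policy.allowedWrites.has(req.kind)) {
    return { allow: false, reason: `write action "${req.kind}" is blocked by default` };
  }
  return { allow: true };
}

// A local daemon should listen on 127.0.0.1 only; "0.0.0.0" would expose it to the network.
function isLoopbackOnly(bindHost: string): boolean {
  return bindHost === "127.0.0.1";
}

const policy: Policy = {
  allowedHosts: new Set<string>(["arxiv.org", "dev.to"]),
  allowedWrites: new Set<ActionKind>(),
};

console.log(evaluate({ adapter: "devto-search", kind: "read", url: "https://dev.to/search?q=agents" }, policy).allow); // true
console.log(evaluate({ adapter: "devto-comment", kind: "comment", url: "https://dev.to/" }, policy).allow); // false
console.log(isLoopbackOnly("0.0.0.0")); // false
```

Keeping the write allowlist empty means enabling any write is a deliberate, visible config change rather than a side effect of adding a new adapter.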

A good first use case is not:

Let the agent run my account.

A better first use case is:

After I manually publish something, let the agent read the dashboard and verify the title, status, URL, and timestamp.
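A sketch of that verification step, assuming the agent has already read the dashboard row into a plain object. The `PublishedRecord` fields mirror the four things listed above; the field names, the 10-minute timestamp tolerance, and the example values are hypothetical, and the scraping step itself is left out. The function only compares and reports; it never acts.

```typescript
// Hypothetical record shape; the scraping step that fills `observed` is out of scope.
interface PublishedRecord {
  title: string;
  status: string;
  url: string;
  timestamp: string; // ISO 8601
}

// Compare what I published with what the dashboard shows; returns a list of mismatches.
function verify(expected: PublishedRecord, observed: PublishedRecord, maxSkewMs = 10 * 60 * 1000): string[] {
  const problems: string[] = [];
  if (observed.title.trim() !== expected.title.trim()) {
    problems.push(`title: expected "${expected.title}", got "${observed.title}"`);
  }
  if (observed.status.toLowerCase() !== expected.status.toLowerCase()) {
    problems.push(`status: expected "${expected.status}", got "${observed.status}"`);
  }
  if (observed.url !== expected.url) {
    problems.push(`url: expected ${expected.url}, got ${observed.url}`);
  }
  const e = Date.parse(expected.timestamp);
  const o = Date.parse(observed.timestamp);
  // An unparseable timestamp counts as a mismatch rather than being silently ignored.
  if (Number.isNaN(o) || Math.abs(o - e) > maxSkewMs) {
    problems.push(`timestamp: ${observed.timestamp} is not within ${maxSkewMs / 60000} min of ${expected.timestamp}`);
  }
  return problems;
}

const mine: PublishedRecord = {
  title: "Example post title",
  status: "published",
  url: "https://dev.to/example-user/example-post", // hypothetical URL
  timestamp: "2024-01-01T09:00:00Z",
};
const seen: PublishedRecord = { ...mine, status: "Draft", timestamp: "2024-01-01T09:03:00Z" };

console.log(verify(mine, seen));
// [ 'status: expected "published", got "Draft"' ]
```

An empty list means the dashboard agrees with what I did by hand; anything else goes back to a human.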

Where this is going

I think browser-as-API will become a major pattern for AI agents.

APIs are clean, but the web is messy. A lot of real work still happens inside browser sessions, dashboards, and admin panels. Agents will increasingly need to interact with that world.

But the right architecture is not blind autonomy.

It is:

sandbox first;
read-only first;
allowlist first;
human confirmation for irreversible actions.
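The last point can be enforced in code rather than by convention: irreversible actions pass through a gate that requires an explicit human yes. This is a minimal sketch, assuming a hypothetical `Action` type and an injected `confirm` callback (in practice a terminal prompt or a UI dialog); neither is part of bb-browser. Reversible actions run straight through, and a declined action is never executed.

```typescript
// Hypothetical action type for illustration; not bb-browser's API.
interface Action<T> {
  description: string;
  irreversible: boolean;
  run: () => Promise<T>;
}

// Injected so the gate can be a terminal prompt, a UI dialog, or a test stub.
type Confirm = (description: string) => Promise<boolean>;

async function runWithConfirmation<T>(action: Action<T>, confirm: Confirm): Promise<T> {
  if (action.irreversible) {
    const approved = await confirm(action.description);
    if (!approved) {
      // The action's run() is never called when the human declines.
      throw new Error(`Declined by human: ${action.description}`);
    }
  }
  return action.run();
}

// A stub that declines everything is a safe default.
const denyAll: Confirm = async () => false;

const readDashboard: Action<string> = {
  description: "read dashboard status",
  irreversible: false,
  run: async () => "published",
};

const deletePost: Action<string> = {
  description: "delete post",
  irreversible: true,
  run: async () => "deleted",
};

runWithConfirmation(readDashboard, denyAll).then((s) => console.log(s)); // logs: published
runWithConfirmation(deletePost, denyAll).catch((e: Error) => console.log(e.message)); // logs: Declined by human: delete post
```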

Browser-control agents are useful. They may become essential.

But if the browser is the API, your logged-in session is the permission boundary.

Treat it like production access.

DEV Community