Every browser automation tool makes the same mistake: it hands you a browser and says "figure it out." You write selectors, handle popups, manage sessions, retry failures — all before you've done anything useful.
What if the internet itself could tell your agent what's possible?
That's the idea behind AIR (Agent Internet Runtime). Instead of hardcoding selectors, your agent asks a site what capabilities it supports, gets execution guidance, and reports what worked — building a shared knowledge layer that every agent benefits from.
Here's how it works in practice.
The Three-Step Pattern
Every interaction follows the same flow:
Step 1 — Browse. Ask a domain what capabilities exist. You get back a structured list: search, login, add-to-cart, fill-form, etc. Each has confidence scores and selector hints.
Step 2 — Execute. Pick a capability and pass your parameters. AIR returns the optimal execution path: sometimes an API shortcut, sometimes verified macro steps, sometimes selector guidance for uncharted territory.
Step 3 — Report. After your agent acts, it reports what happened — which selectors worked, what failed, what it observed. This is where the flywheel turns. Every report improves guidance for the next agent.
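The text doesn't pin down what these responses look like on the wire. As a rough sketch (the field names and types here are my assumptions, not the real AIR schema), the browse and execute payloads could be modeled like this:

```python
from dataclasses import dataclass, field

@dataclass
class Capability:
    # One entry in the step-1 (browse) response: something the site
    # supports, scored by how reliably prior agents pulled it off.
    name: str
    confidence: float                      # 0.0-1.0, from aggregated reports
    selector_hints: list[str] = field(default_factory=list)

@dataclass
class ExecutionPlan:
    # Step-2 (execute) response: the runtime's suggested path.
    # kind: "api" (shortcut), "macro" (verified steps),
    # or "selectors" (guidance for uncharted territory).
    kind: str
    steps: list[dict] = field(default_factory=list)

search = Capability(name="search_stories", confidence=0.97,
                    selector_hints=["input[name='q']"])
plan = ExecutionPlan(kind="selectors",
                     steps=[{"action": "type",
                             "selector": "input[name='q']",
                             "value": "browser automation"}])
print(search.confidence, plan.kind)  # → 0.97 selectors
```

An agent would sort capabilities by `confidence` and prefer `api`-kind plans when the runtime offers one.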
A Real Example: Searching Hacker News
Instead of writing:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://news.ycombinator.com")
search_box = driver.find_element(By.CSS_SELECTOR, "input[name='q']")
search_box.send_keys("browser automation")
search_box.submit()
```
You do:
```python
# Step 1: What can I do here?
caps = air.browse_capabilities("news.ycombinator.com")
# Returns: search_stories, view_comments, submit_story...

# Step 2: How do I search?
plan = air.execute_capability(
    domain="news.ycombinator.com",
    capability="search_stories",
    params={"query": "browser automation"},
)
# Returns: execution steps with verified selectors

# Step 3: Report what happened
air.report_outcome(
    domain="news.ycombinator.com",
    capability="search_stories",
    steps=[...],  # what you actually did
    success=True,
)
```
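One subtlety the example glosses over: failure reports are as valuable to the knowledge layer as successes, so the report call belongs in a `finally`-style path, not just the happy path. A minimal sketch of that wrapper, using a stub client because the real AIR client isn't shown here (`StubAIR` and `perform` are stand-ins, not SDK names):

```python
class StubAIR:
    """Stand-in for the real AIR client, just to make the sketch runnable."""
    def __init__(self):
        self.reports = []

    def execute_capability(self, domain, capability, params):
        return {"steps": [{"action": "type", "selector": "input[name='q']"}]}

    def report_outcome(self, domain, capability, steps, success):
        self.reports.append(success)

def run_capability(air, domain, capability, params, perform):
    """Fetch the plan, act on it, and always report the outcome --
    broken selectors feed the knowledge layer too."""
    plan = air.execute_capability(domain=domain, capability=capability,
                                  params=params)
    try:
        result = perform(plan)  # your agent's actual browser actions
        air.report_outcome(domain=domain, capability=capability,
                           steps=plan["steps"], success=True)
        return result
    except Exception:
        air.report_outcome(domain=domain, capability=capability,
                           steps=plan["steps"], success=False)
        raise

air = StubAIR()
run_capability(air, "news.ycombinator.com", "search_stories",
               {"query": "browser automation"}, perform=lambda plan: "ok")
print(air.reports)  # → [True]
```

The re-raise matters: reporting a failure shouldn't swallow it, since the calling agent still needs to decide whether to retry.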
The difference: your code doesn't break when HN changes its DOM. The collective knowledge layer already has updated selectors from other agents who visited recently.
Why This Matters
Traditional browser automation is O(n) — every new site requires new code. AIR makes it O(1) amortized — once any agent figures out a site, every agent benefits.
The numbers back this up. In our benchmarks, agents using AIR's collective intelligence layer use 7,000x fewer tokens per successful action compared to agents that screenshot-and-reason their way through pages.
Try It
The AIR SDK is open and available as an MCP server. If you're building agents that touch the web, the browse → execute → report pattern is worth trying. The SDK handles the knowledge layer; you handle the logic.
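Wiring an MCP server into a client generally means adding an entry to the client's config. As a sketch only — the server name, command, and package here are assumptions, not the published AIR distribution — a typical MCP client config entry looks like:

```json
{
  "mcpServers": {
    "air": {
      "command": "npx",
      "args": ["-y", "air-mcp-server"]
    }
  }
}
```

Once registered, the client exposes the server's tools (browse, execute, report) to the agent automatically.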
GitHub: github.com/anthropics/anthropic-cookbook has MCP examples. AIR's approach is similar — structured capability discovery instead of raw DOM wrestling.
The web wasn't built for agents. But with the right abstraction layer, it doesn't have to be rebuilt either.