DEV Community

Arcede

Your Browser Agent Costs More Than Your Intern — Here's the Math

Every week, a new "AI browser agent" demo goes viral. It fills out a form! It books a flight! It navigates a checkout flow!

And every week, engineering teams try to deploy these agents in production — and hit the same wall.

The Hidden Cost Nobody Talks About

A typical browser agent exploring an unfamiliar website burns through this loop:

  1. Take screenshot (~1,500 tokens)
  2. Reason about what it sees (~500 tokens)
  3. Decide on action (~300 tokens)
  4. Execute action
  5. Take another screenshot
  6. Repeat 15-40 times per task

That's 35,000-90,000 tokens per task. At current model prices, a single "search for flights on Kayak" costs $0.30-0.80. Run that across 1,000 customers and you're spending $300-800 on something a human does in 45 seconds.

The problem isn't the model. The problem is that every agent starts from zero, every time.
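The arithmetic above can be checked with a back-of-envelope model. The per-step token counts and step counts are the article's estimates; the per-token price is an assumed blended rate, not a quote for any specific model.

```python
# Back-of-envelope cost model for the explore loop described above.
# TOKENS_PER_STEP comes from the article (screenshot + reasoning + decision);
# PRICE_PER_1K_TOKENS is an assumed blended rate, not any vendor's price.

TOKENS_PER_STEP = 1_500 + 500 + 300   # screenshot + reasoning + action decision
PRICE_PER_1K_TOKENS = 0.0085          # assumed blended input/output rate, USD

def task_cost(steps: int) -> float:
    """Rough dollar cost for one task that loops `steps` times."""
    return steps * TOKENS_PER_STEP * PRICE_PER_1K_TOKENS / 1_000

low, high = task_cost(15), task_cost(40)
print(f"15-step task: ~${low:.2f}")    # ~$0.29
print(f"40-step task: ~${high:.2f}")   # ~$0.78
print(f"1,000 tasks: ${1000 * low:,.0f}-{1000 * high:,.0f}")
```

With those assumptions, 15 steps lands near the bottom of the article's $0.30-0.80 range and 40 steps near the top, and 1,000 tasks costs roughly $293-782.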

What If Agents Could Share What They Learn?

Imagine the first agent that visits kayak.com discovers the search form, the date picker pattern, and the results selector. Now imagine every subsequent agent already knows this — zero exploration, straight to execution.

That's not hypothetical. That's what a shared capability layer does:

  • First visit: Agent explores, maps the site, reports what it found
  • Second visit: Agent gets a pre-verified execution plan, skips exploration entirely
  • 100th visit: Execution is near-instant, costs drop 95%+

The economics flip. Instead of each agent paying the full exploration tax, the cost is amortized across every agent that touches that domain.

The Numbers That Matter

| Metric | Solo Agent | Shared Intelligence |
| --- | --- | --- |
| Tokens per task | 35,000-90,000 | 2,000-5,000 |
| Cost per task | $0.30-0.80 | $0.01-0.04 |
| Time to execute | 30-90 seconds | 2-5 seconds |
| Failure rate | 30-50% | <5% |

This isn't marginal improvement. It's the difference between "browser agents are a cool demo" and "browser agents are a production tool."

The Compounding Effect

Here's what makes shared intelligence different from caching or hard-coded scrapers:

Every agent that reports its outcomes makes the system smarter. A failed selector gets flagged. A new site layout gets mapped. An API fast-path gets discovered. The system improves because agents use it, not in spite of it.

This is the same pattern that made Google Maps accurate (every phone contributing GPS data) and Waze useful (every driver reporting traffic). Individual contribution, collective benefit.
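One concrete version of "a failed selector gets flagged" is a simple outcome tracker: agents report pass/fail per selector, and a selector that fails too often gets marked for re-exploration. This is an illustrative pattern with made-up thresholds, not Arcede's actual implementation:

```python
from collections import defaultdict

class SelectorHealth:
    """Toy outcome tracker: agents report pass/fail per selector, and
    selectors that fail too often get flagged for re-exploration.
    Thresholds and names are illustrative, not a real product API."""

    def __init__(self, failure_threshold: float = 0.5, min_reports: int = 5):
        self.stats = defaultdict(lambda: {"ok": 0, "fail": 0})
        self.failure_threshold = failure_threshold
        self.min_reports = min_reports  # don't flag on thin evidence

    def report(self, selector: str, success: bool) -> None:
        self.stats[selector]["ok" if success else "fail"] += 1

    def needs_reexploration(self, selector: str) -> bool:
        s = self.stats[selector]
        total = s["ok"] + s["fail"]
        return total >= self.min_reports and s["fail"] / total > self.failure_threshold

health = SelectorHealth()
for _ in range(4):
    health.report("#results-list", success=False)  # site layout changed
health.report("#results-list", success=True)       # one stale success
print(health.needs_reexploration("#results-list"))  # prints: True
```

The `min_reports` floor is what separates this from naive caching: a single flaky run doesn't invalidate a plan, but a consistent failure signal across agents does.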

What This Means for Your Stack

If you're building agents that touch the web, the question isn't "which model should I use?" It's "how do I avoid paying the exploration tax on every single request?"

The teams that figure this out first will ship agents that are 10-100x cheaper and faster than their competitors. The teams that don'''t will keep burning tokens on screenshots of loading spinners.


We're building this shared intelligence layer at Arcede. If you're shipping browser agents and the cost-per-task math keeps you up at night, the AIR SDK is open for early integrators.
