Skila AI

Posted on • Originally published at news.skila.ai

Google Just Shipped a Web Agent That Runs 10 Tabs at Once. It Beat OpenAI's Score.

Everyone said Google was cooked on agents. OpenAI had the Responses API. Anthropic had computer use. Google had a 2024 research preview that nobody remembered.

Then on April 22, 2026, Sundar Pichai walked on stage at Cloud Next and shipped Project Mariner. 83.5% on WebVoyager. 10 concurrent browser tasks. Generally available today.

That number matters. WebVoyager is a widely cited public benchmark for autonomous web agents: it tests real websites, multi-step tasks, and error recovery. 83.5% puts Mariner ahead of every publicly reported score from OpenAI's Computer Use and Anthropic's computer-use tooling at comparable task difficulty.

And that is not even the headline.

What Project Mariner Actually Does

Mariner is a web-browsing agent built on Gemini 2.0 (up from the Gemini 1.5 research preview shown in late 2024). You give it a goal — "book the cheapest Tuesday flight from SFO to Tokyo, under $1,200, no Basic Economy" — and it opens a browser, navigates, clicks, types, and completes the task.

Three things set it apart:

  1. It runs on Google's cloud, not your laptop. OpenAI's and Anthropic's computer-use tools typically drive a browser or desktop that you host and secure yourself. Mariner spins up isolated VMs in Google Cloud, so your machine stays free while the agent works and your local cookies are never exposed.
  2. Ten tabs at once. You can dispatch 10 parallel tasks. One Mariner instance can be comparing flights while another drafts an email while another scrapes three competitor websites. This is the first web agent where parallelism is a product feature, not a hack.
  3. It's GA, not a waitlist. If you have a Gemini Enterprise subscription, you can use it right now.
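
To make the ten-at-once idea concrete, here is a minimal sketch of capped parallel dispatch in Python. It is not Mariner's API: run_browser_task is a hypothetical stand-in that only sleeps, and the function names and the limit parameter are assumptions for illustration. The one thing it shows is the pattern of queueing more goals than slots while never running more than ten at a time.

```python
import asyncio

MAX_PARALLEL_TASKS = 10  # the concurrency figure quoted in the article


async def run_browser_task(goal: str) -> str:
    """Hypothetical stand-in for one remote agent task; not a real Mariner call."""
    await asyncio.sleep(0.01)  # simulate the agent working remotely
    return f"done: {goal}"


async def dispatch(goals: list[str], limit: int = MAX_PARALLEL_TASKS) -> list[str]:
    """Run every goal, never more than `limit` at once, keeping input order."""
    slots = asyncio.Semaphore(limit)

    async def guarded(goal: str) -> str:
        async with slots:  # wait here until one of the slots is free
            return await run_browser_task(goal)

    return await asyncio.gather(*(guarded(g) for g in goals))


if __name__ == "__main__":
    goals = [f"compare vendor {i}" for i in range(1, 13)]  # 12 goals, 10 slots
    results = asyncio.run(dispatch(goals))
    print(len(results), results[0])
```

With twelve goals and ten slots, the last two simply wait for a slot to free up, which is roughly what a hard concurrency cap would feel like from the caller's side.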

The Benchmark Number Nobody Is Disputing

83.5% on WebVoyager. Independent agent teams in 2025 reported numbers in the 60-75% range on updated splits. Google claims Mariner's GA build lands at 83.5% on the current public benchmark.

Bloomberg's Mark Bergen confirmed the number came straight from the public leaderboard run, not an internal eval. The product demo showed Mariner completing a Kayak booking, a Workday expense submission, and a Salesforce lead-capture flow in parallel — all three finished without human intervention.

Gemini Enterprise: The Rebrand That Actually Matters

Vertex AI is dead. Long live Gemini Enterprise Agent Platform.

Google consolidated five products into one agent control plane. What shipped:

  • 200+ models in the Model Garden, including Anthropic Claude Opus 4.6 and Sonnet 4.5
  • Managed MCP servers across every Google Cloud service — BigQuery, Spanner, GKE, Cloud Run, Firestore. No install. No token rotation. OAuth 2.1 baked in.
  • ADK 1.0 (Agent Development Kit) for Python or TypeScript
  • A2A v1.0 (Agent-to-Agent protocol) — the official spec for agents to talk to each other across vendors. Salesforce, Workday, Box, and ServiceNow all adopted it at launch.
  • Workspace Studio no-code builder for non-engineers

The Real Story: Partner Agents on Day One

Google shipped Mariner with pre-built partner agents from Box, Workday, Salesforce, and ServiceNow. These are not generic API integrations. They are full agents that speak A2A v1.0 and can be invoked by Mariner as subroutines.

Translation: your enterprise stack is already wired. If your company runs Workday for HR, Salesforce for CRM, and ServiceNow for IT, Mariner can hand off tasks to those systems' agents without you writing a single line of glue code.
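
The hand-off idea can be sketched as a simple skill lookup. This is an illustration only, not the A2A wire format or an ADK type: PartnerAgent, route, the skill strings, and the agent names are all hypothetical, and the handlers are plain local functions rather than remote agents.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PartnerAgent:
    """Hypothetical record for a partner agent; not an A2A or ADK type."""
    name: str
    skills: set[str]
    handle: Callable[[str], str]


def route(task_skill: str, payload: str, agents: list[PartnerAgent]) -> str:
    """Hand the payload to the first agent that advertises the needed skill."""
    for agent in agents:
        if task_skill in agent.skills:
            return agent.handle(payload)
    raise LookupError(f"no partner agent advertises skill {task_skill!r}")


# Illustrative registry; names and skills are made up for this sketch.
AGENTS = [
    PartnerAgent("hr-agent", {"expense.submit"}, lambda p: f"hr-agent accepted: {p}"),
    PartnerAgent("crm-agent", {"lead.create"}, lambda p: f"crm-agent accepted: {p}"),
]

if __name__ == "__main__":
    print(route("lead.create", "Acme Corp, inbound demo request", AGENTS))
```

The point of the pattern is that the calling agent only needs to know which skill it wants, not how the partner system implements it.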

The 10-Tab Parallelism Changes Agent Economics

Ten concurrent tabs means a research task that takes a human analyst 45 minutes could finish in about 4 to 5, provided it splits cleanly into ten independent subtasks. A procurement specialist comparing SaaS vendors across five websites can run all five comparisons simultaneously.
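
The arithmetic behind that estimate is worth spelling out. The sketch below assumes equal-sized, fully independent subtasks and an optional serial portion (planning, merging results) that cannot be parallelized. The function name and parameters are assumptions for illustration, not a measured model of Mariner.

```python
import math


def wall_clock_minutes(total_minutes: float, subtasks: int, tabs: int = 10,
                       serial_minutes: float = 0.0) -> float:
    """Estimate elapsed time when equal-sized subtasks run in waves of `tabs`."""
    per_subtask = (total_minutes - serial_minutes) / subtasks
    waves = math.ceil(subtasks / tabs)  # each wave fills up to `tabs` slots
    return serial_minutes + waves * per_subtask


if __name__ == "__main__":
    print(wall_clock_minutes(45, subtasks=10))                    # 4.5
    print(wall_clock_minutes(45, subtasks=5))                     # 9.0
    print(wall_clock_minutes(45, subtasks=10, serial_minutes=5))  # 9.0
```

Two things fall out of the numbers: a job that only splits five ways gets a 5x speedup no matter how many tabs are free, and even five minutes of unavoidable serial work doubles the best-case time.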

This is the kind of capability that justifies the $85/user/month Pro tier on a single use case. For ops-heavy teams, the payback is measured in days, not quarters.

Security Posture: The Quiet Enterprise Win

Mariner runs in an isolated Google Cloud VM with a fresh browser session per task. Credentials are injected through a managed secrets vault — never stored in the agent's memory. Audit logs capture every page visit, every click, every form submission.
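
Here is a rough sketch of what "inject the credential, log the action, never log the secret" can look like. Everything in it is hypothetical: VAULT is an in-memory dictionary standing in for a managed secrets vault, and the event names and record fields are assumptions, not Google's audit log schema.

```python
import json
import time
from typing import Callable

# Hypothetical in-memory stand-in for a managed secrets vault.
VAULT = {"crm-login": "s3cret-value"}

AUDIT_LOG: list[str] = []


def audit(task_id: str, event: str, target: str) -> None:
    """Append one JSON line per action; secret values are never passed in."""
    AUDIT_LOG.append(json.dumps(
        {"ts": time.time(), "task": task_id, "event": event, "target": target}
    ))


def run_with_credential(task_id: str, secret_name: str, url: str,
                        action: Callable[[str], str]) -> str:
    """Fetch the secret just in time, use it once, and log only its name."""
    audit(task_id, "page_visit", url)
    secret = VAULT[secret_name]  # held only in this local scope
    audit(task_id, "credential_injected", secret_name)
    return action(secret)


if __name__ == "__main__":
    result = run_with_credential(
        "task-1", "crm-login", "https://crm.example.com/login",
        lambda s: "logged in" if s else "failed",
    )
    print(result)
    print(all("s3cret-value" not in line for line in AUDIT_LOG))
```

The design choice to notice is that the audit trail records which credential was used and where, while the value itself only ever exists inside the call that needs it.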

If your compliance team rejected computer-use agents in 2025 on data-exfiltration grounds, Mariner is the first architecture that answers their objections.

Verdict

Google was never behind on agents. It was behind on shipping agents. That changed on April 22.

The 83.5% number is real. The 10-tab parallelism is real. The managed MCP servers across every GCP service are the quiet kill shot.

For the next 90 days, Google owns the enterprise agent narrative.


Read the full analysis with pricing breakdown and FAQ at news.skila.ai
