tumf

Posted on Feb 6 • Originally published at blog.tumf.dev

Vibium: A Browser Automation Tool Optimized for AI Agents Over Playwright

#mcp #ai

Originally published on 2026-01-05
Original article (Japanese): Vibium: PlaywrightよりAIエージェントに最適化されたブラウザ自動化ツール

Jason Huggins, the creator of Selenium, has announced Vibium, a new browser automation tool arriving roughly 20 years after Selenium. This article covers Vibium's design philosophy, how it differs from Playwright and Puppeteer, and why a new tool was needed in the era of AI agents.

What is Vibium?

Vibium is browser automation infrastructure designed for AI agents. All of the following features are packed into a single binary of about 10MB:

Browser Lifecycle Management: Detection and launching of Chrome
WebDriver BiDi Proxy: Communication with the browser
MCP Server: Integration with LLM agents (like Claude Code)
Automatic Waiting: Polling until elements appear
Screenshots: PNG captures of the viewport

The standout feature is that integration with Claude Code can be completed with a single command:

claude mcp add vibium -- npx -y vibium

With this single line, Claude Code can directly manipulate the browser. Chrome is automatically downloaded, eliminating the need for manual setup.

From Selenium to Vibium: 20 Years of Evolution

Jason Huggins created Selenium in 2004, paving the way for browser automation. Since then, the industry has moved through Selenium WebDriver, Puppeteer, and Playwright, so what prompted Huggins to build yet another tool?

Challenges with Existing Tools

Selenium WebDriver (2011 onwards) is mature but has the following issues:

Complex setup (driver management, browser version compatibility)
Boilerplate code required for element waiting
Lack of consideration for integration with AI agents

Playwright (2020 onwards) and Puppeteer (2018 onwards) addressed these issues using the Chrome DevTools Protocol (CDP). However:

CDP is a Chrome-specific protocol (not standardized)
Additional abstraction layers are needed to support multiple browsers
MCP server functionality needs to be implemented separately

The Choice of WebDriver BiDi

Vibium adopts the WebDriver BiDi protocol. This is a next-generation protocol being developed as a W3C standard, combining the best aspects of Selenium WebDriver and CDP:

Bidirectional Communication: Real-time reception of events from the browser
Standardization: Specified to work across Chrome, Firefox, and Safari (actual implementation status varies by browser)
Low-Level Access: Direct access to network, console, and DOM
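To make "bidirectional" concrete, here is a minimal sketch of the JSON messages exchanged over a WebDriver BiDi WebSocket. The client sends commands with an `id`, a `method`, and `params`. The browser replies with a `success` or `error` message carrying the same `id`, and it can also push `event` messages at any time without a request. The helper names (`makeCommand`, `classifyMessage`) and the context ID are illustrative assumptions. This is not Vibium's internal code, and the event payload is abbreviated.

```javascript
// Sketch of WebDriver BiDi message shapes (illustrative, not Vibium's code).
let nextId = 1;

// Client -> browser: every command gets a unique numeric id.
function makeCommand(method, params) {
  return { id: nextId++, method, params };
}

// Browser -> client: a response to a command, an error, or an
// unsolicited event pushed by the browser.
function classifyMessage(raw) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case 'success':
      return { kind: 'response', id: msg.id, result: msg.result };
    case 'error':
      return { kind: 'error', id: msg.id, error: msg.error, message: msg.message };
    case 'event':
      return { kind: 'event', method: msg.method, params: msg.params };
    default:
      throw new Error(`Unknown BiDi message type: ${msg.type}`);
  }
}

// Hypothetical context id; a real one comes from the browser
// (for example via browsingContext.getTree).
const navigate = makeCommand('browsingContext.navigate', {
  context: 'CONTEXT_ID',
  url: 'https://example.com',
  wait: 'complete',
});
const wire = JSON.stringify(navigate);

// An event the browser might push (console output), with no matching request.
const event = classifyMessage(JSON.stringify({
  type: 'event',
  method: 'log.entryAdded',
  params: { level: 'info', text: 'hello' },
}));
```

The `id` is what lets a client match responses to requests even while events are interleaved on the same socket.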

Huggins stated in an interview on the TestGuild Podcast:

WebDriver BiDi is a protocol that has learned all the lessons from CDP that made Puppeteer and Playwright great.

Why Create Vibium Instead of Using Playwright?

So why build a new tool instead of using Playwright? The primary reason is a difference in design philosophy.

1. AI Agent-First Design

Vibium is optimized for AI agents:

Built-in MCP Server: Instant integration with Claude Code, Gemini, and local LLMs
stdio Communication: Uses standard input/output, one of MCP's standard transports for local servers
Simple API: Minimal methods that are easy for AI to understand

In contrast, Playwright is designed for human test engineers. Its MCP integration is provided separately (by the Playwright MCP server) rather than built into the library itself.
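As a sketch of what "stdio communication" means in practice: under MCP's stdio transport, the client launches the server as a subprocess and exchanges JSON-RPC 2.0 messages over stdin/stdout, one message per line. The helpers below (`encodeMessage`, `createLineDecoder`) are hypothetical illustrations. They are not part of Vibium or an MCP SDK.

```javascript
// Sketch of MCP's stdio framing: one JSON-RPC 2.0 message per line.

// Encode: JSON.stringify escapes newlines inside strings as "\n",
// so a trailing newline safely delimits each message.
function encodeMessage(msg) {
  return JSON.stringify(msg) + '\n';
}

// Decode: stdin arrives in arbitrary chunks, so buffer until a newline
// and emit each complete line as a parsed message.
function createLineDecoder(onMessage) {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, newline);
      buffer = buffer.slice(newline + 1);
      if (line.trim() !== '') onMessage(JSON.parse(line));
    }
  };
}

// Demo: a tools/list request arriving split across two stdin chunks.
const received = [];
const push = createLineDecoder((msg) => received.push(msg));
const wire = encodeMessage({ jsonrpc: '2.0', id: 1, method: 'tools/list' });
push(wire.slice(0, 10));
push(wire.slice(10));
```

In a real process, you would call `process.stdin.setEncoding('utf8')` so that multi-byte characters are not split across chunks, and then pass the decoder to `process.stdin.on('data', ...)`.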

2. Zero Setup Philosophy

Vibium's design goal is to be an "invisible binary," one that works without the user having to install or manage anything by hand:

// Just running npm install vibium makes this work
const { browserSync } = require('vibium')

const vibe = browserSync.launch()
vibe.go('https://example.com')
vibe.find('a').click()
vibe.quit()

Downloading the browser, placing drivers, and setting paths are all automated. This reflects a developer experience built around "get it running first," which suits the AI era.

3. Simplicity of a Single Binary

Vibium achieves everything with a single Go binary of about 10MB:

┌─────────────────────────────────────────────────────────────┐
│                         LLM / Agent                         │
│          (Claude Code, Codex, Gemini, Local Models)         │
└─────────────────────────────────────────────────────────────┘
                      ▲
                      │ MCP Protocol (stdio)
                      ▼
           ┌─────────────────────┐
           │   Vibium Clicker    │
           │                     │
           │  ┌───────────────┐  │
           │  │  MCP Server   │  │
           │  └───────▲───────┘  │         ┌──────────────────┐
           │          │          │         │                  │
           │  ┌───────▼───────┐  │WebSocket│                  │
           │  │  BiDi Proxy   │  │◄───────►│  Chrome Browser  │
           │  └───────────────┘  │  BiDi   │                  │
           │                     │         │                  │
           └─────────────────────┘         └──────────────────┘

Playwright consists of multiple npm packages and browser binaries, leading to complex dependencies. Vibium chose simplicity.
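To illustrate the "MCP Server → BiDi Proxy" hop in the diagram, here is a hypothetical sketch that translates an MCP tool call into a WebDriver BiDi command. The tool names come from Vibium's tool list. The argument names (`url`, `selector`) and the mapping itself are assumptions about how such a proxy could work, not a description of Vibium's code. The BiDi methods (`browsingContext.navigate`, `browsingContext.locateNodes`, `browsingContext.captureScreenshot`) are defined in the WebDriver BiDi specification.

```javascript
// Hypothetical translation of an MCP tools/call into a BiDi command.
// Argument names (url, selector) are assumptions, not Vibium's schema.
function toolCallToBidi(call, contextId, id) {
  const args = call.arguments ?? {};
  switch (call.name) {
    case 'browser_navigate':
      return {
        id,
        method: 'browsingContext.navigate',
        params: { context: contextId, url: args.url, wait: 'complete' },
      };
    case 'browser_find':
      // CSS selector lookup, matching the tool's "find by CSS selector" role
      return {
        id,
        method: 'browsingContext.locateNodes',
        params: { context: contextId, locator: { type: 'css', value: args.selector } },
      };
    case 'browser_screenshot':
      return {
        id,
        method: 'browsingContext.captureScreenshot',
        params: { context: contextId },
      };
    default:
      // Clicking and typing need input actions; left out of this sketch.
      throw new Error(`No BiDi mapping in this sketch for ${call.name}`);
  }
}
```

Keeping this translation inside one process is what lets a single binary sit between the agent and the browser without extra packages.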

Practical Use Cases for Vibium

Using as an AI Agent

After integration with Claude Code, you can issue commands in natural language:

"Go to example.com and click the first link"

Claude Code will automatically invoke the following MCP tools:

Tool	Description
`browser_launch`	Launch the browser (visible by default)
`browser_navigate`	Navigate to a URL
`browser_find`	Find elements using CSS selectors
`browser_click`	Click an element
`browser_type`	Input text
`browser_screenshot`	Take a screenshot
`browser_quit`	Close the browser
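Behind the scenes, each tool invocation is an MCP `tools/call` JSON-RPC request that carries the tool `name` and its `arguments`. The sketch below builds the requests an agent might send for "Go to example.com and click the first link." The helper `toToolCalls` and the argument names (`url`, `selector`) are hypothetical. A real client reads each tool's input schema from the server's `tools/list` response.

```javascript
// Build MCP tools/call requests from a list of [toolName, arguments] steps.
// Argument names are assumptions; check the server's tools/list schema.
function toToolCalls(steps, firstId = 1) {
  return steps.map(([name, args], i) => ({
    jsonrpc: '2.0',
    id: firstId + i,
    method: 'tools/call',
    params: { name, arguments: args },
  }));
}

// "Go to example.com and click the first link"
const requests = toToolCalls([
  ['browser_launch', {}],
  ['browser_navigate', { url: 'https://example.com' }],
  ['browser_find', { selector: 'a' }],
  ['browser_click', { selector: 'a' }],
]);
```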

Using as a JavaScript Library

You can also use it directly as an npm package:

Synchronous API (REPL Friendly)

const fs = require('fs')
const { browserSync } = require('vibium')

const vibe = browserSync.launch()
vibe.go('https://example.com')

const png = vibe.screenshot()
fs.writeFileSync('screenshot.png', png)

const link = vibe.find('a')
link.click()
vibe.quit()

Asynchronous API

// Top-level await: run this as an ES module (e.g. a .mjs file) or in the Node.js REPL
const fs = await import('fs/promises')
const { browser } = await import('vibium')

const vibe = await browser.launch()
await vibe.go('https://example.com')

const png = await vibe.screenshot()
await fs.writeFile('screenshot.png', png)

const link = await vibe.find('a')
await link.click()
await vibe.quit()

Automatic Waiting Mechanism

Vibium automatically waits until elements are displayed:

// This will automatically poll until the element appears
const button = vibe.find('button.submit')
button.click()

Selenium typically requires explicit wait code (such as WebDriverWait), whereas Vibium waits for elements by default, which keeps the code short. (Playwright also auto-waits for elements before acting on them.)
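In Vibium, this polling happens inside `find()`. The sketch below shows only the general technique, not Vibium's implementation: re-run a query at a fixed interval until it returns something or a deadline passes. The `waitFor` name and its timeout and interval defaults are illustrative assumptions.

```javascript
// Generic "poll until the element appears" helper (not Vibium's code).
// `query` returns the element (or a promise of it), or null if not found yet.
async function waitFor(query, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await query();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    // Sleep before the next attempt instead of busy-looping.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Putting the loop in the library rather than in each script is what removes the explicit-wait boilerplate described above.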

Platform Support

Vibium supports the following platforms:

Platform	Architecture	Status
Linux	x64	✅ Supported
macOS	x64 (Intel)	✅ Supported
macOS	arm64 (Apple Silicon)	✅ Supported
Windows	x64	✅ Supported

During installation, the binary for the current platform is selected automatically, and the downloaded Chrome is cached in the following locations:

Linux: ~/.cache/vibium/
macOS: ~/Library/Caches/vibium/
Windows: %LOCALAPPDATA%\vibium\
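For scripts that need to locate or clean up that cache, the listed paths map directly onto Node.js platform names. This is a minimal sketch: `vibiumCacheDir` is a hypothetical helper that only mirrors the three locations above, and Vibium itself may honor overrides this sketch does not.

```javascript
// Hypothetical helper mirroring the cache locations listed above.
// platform/env/home are passed in so the function is easy to test.
function vibiumCacheDir(platform, env, home) {
  switch (platform) {
    case 'linux':
      return `${home}/.cache/vibium/`;
    case 'darwin':
      return `${home}/Library/Caches/vibium/`;
    case 'win32':
      if (!env.LOCALAPPDATA) throw new Error('LOCALAPPDATA is not set');
      return `${env.LOCALAPPDATA}\\vibium\\`;
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}

// Usage in Node.js:
// vibiumCacheDir(process.platform, process.env, require('os').homedir())
```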

Roadmap: v2 and Beyond

The current v1 focuses on "integration of AI and browsers." The v2 roadmap lists the following features, some already released and others planned:

Python Client: Released in December 2025 (pip install vibium)
Java Client: Planned for enterprise use
Cortex: Memory and navigation layer
Retina: Recording extension
Video Recording: Capturing test execution
AI Locator: Smarter element searching

All of these are extensions of the vision of "AI agents handling browsers more naturally."

Why Vibium Now?

Jason Huggins built a new tool roughly 20 years after Selenium because of the paradigm shift brought about by the emergence of AI agents.

From Testing Tools to AI Tools

Selenium was created for test automation, as were Playwright and Puppeteer. AI agents like Claude Code, Gemini, and ChatGPT, however, use the browser differently:

Instead of executing human-written test scripts, AI makes dynamic judgments
Instead of fixed selectors, elements are identified based on visual information and context
Browser operations are part of task achievement, not the ultimate goal

A tool optimized for this new usage was needed. That is Vibium.

The Rise of the MCP Ecosystem

The Model Context Protocol (MCP) is an integration standard for AI agents and tools introduced by Anthropic. Vibium was designed from the ground up as an MCP server, allowing instant integration with Claude Code, Cursor, and other MCP-compatible clients.

This represents a shift in thinking from "creating a tool and then figuring out how to integrate it" to "designing a tool with integration in mind."

A Return to Simplicity

Over 20 years, browser automation tools have become feature-rich but also complex. Vibium regains simplicity by focusing on "only the truly necessary features":

Single binary
Zero setup
Minimal API
Automatic waiting

This philosophy aligns with the recent trend of "reducing complexity" seen in tools like exo and dotenvx.

Token Efficiency Comparison Experiment Between Playwright and Vibium

For AI agents, a key factor is how many tokens a task consumes. We ran the same task with both tools and measured token consumption.

Experimental Conditions

Task: "Access example.com and take a screenshot of the page"

We used OpenCode's token counter to measure actual token consumption.

Measured Results: Token Consumption

In the Case of Vibium

# Executed tool calls
1. browser_launch         # Launch the browser
2. browser_navigate       # Navigate to https://example.com
3. browser_screenshot     # Take a screenshot
4. browser_quit          # Close the browser

Consumed Tokens: 240 tokens

Examples of responses from each tool:

browser_launch: "Browser launched (headless: false)" (7 words)
browser_navigate: "Navigated to https://example.com/" (4 words)
browser_screenshot: "Screenshot saved to /path/to/file.png" (5 words)
browser_quit: "Browser session closed" (3 words)

In the Case of Playwright (Playwright MCP)

# Executed tool calls
1. browser_navigate       # Navigate to https://example.com
2. browser_take_screenshot # Take a screenshot

Consumed Tokens: 2,061 tokens

Examples of responses from each tool:

browser_navigate:
- Executed code snippet: await page.goto('https://example.com');
- Page information (URL, title)
- Entire accessibility tree (in YAML format, hundreds to thousands of words)
browser_take_screenshot:
- Executed code snippet
- Screenshot image data (consumed tokens via Vision API)

Surprising Result: Vibium is 8.6 Times More Efficient

Playwright: 2,061 tokens (2 tool calls)
Vibium:       240 tokens (4 tool calls)

Efficiency Ratio: Vibium achieves the same task at about 1/8.6 the tokens of Playwright

[Figure: Token Efficiency Comparison: Playwright vs. Vibium Race]

Why Such a Difference?

Reasons Playwright Consumes More Tokens:

Automatic Sending of Accessibility Tree: browser_navigate returns a snapshot of the page's accessibility tree in YAML format on every call, not just a status message.

   - generic [ref=e2]:
     - heading "Example Domain" [level=1] [ref=e3]
     - paragraph [ref=e4]: This domain is for use...
     - paragraph [ref=e5]:
       - link "Learn more" [ref=e6]:
         - /url: https://iana.org/domains/example

This data alone consumes hundreds of tokens.

Executed Code: Each response also includes the Playwright code that was run, for debugging purposes.

   await page.goto('https://example.com');
   await page.screenshot({...});

Image Data: Screenshots are returned to the model as image data, which it consumes as vision input tokens.

Reasons Vibium is Efficient:

Minimal Responses: Only success/failure messages (averaging fewer than 5 words).
Images Stored Locally: Screenshots are saved to disk and only the file path is returned, so the image itself consumes no tokens.
No Code Display: Only simple status messages are returned.
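The screenshot difference comes down to what goes into the `tools/call` result. MCP results carry a `content` array whose items can be `text` or `image` (base64 `data` plus a `mimeType`). The two hypothetical helpers below contrast the approaches. The base64 payload is a stand-in, not a real PNG, and the helpers do not come from either tool's code.

```javascript
// Two ways an MCP server could return a screenshot (illustrative only).
function inlineImageResult(pngBase64) {
  // The whole image travels to the model as vision input.
  return { content: [{ type: 'image', data: pngBase64, mimeType: 'image/png' }] };
}

function filePathResult(path) {
  // Only a short status string reaches the model; the PNG stays on disk.
  return { content: [{ type: 'text', text: `Screenshot saved to ${path}` }] };
}

const placeholderPng = 'A'.repeat(12000); // stand-in for ~9 KB of base64 image data
const inline = JSON.stringify(inlineImageResult(placeholderPng));
const byPath = JSON.stringify(filePathResult('/path/to/file.png'));
```

The trade-off is that, with the file-path approach, the model cannot see the image unless it is explicitly loaded later.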

Impact on Complex Tasks

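Even for a simple page like example.com, an 8.6x difference emerges. On real web applications (e.g., dashboards, admin panels), whose accessibility trees are far larger, the gap is likely to widen further. The figures below are rough estimates; only the example.com row was measured: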

Page Complexity	Playwright Consumption	Vibium Consumption	Ratio
Simple (example.com)	2,061 tokens	240 tokens	8.6x
Medium (blog post)	Estimated 5,000–10,000	240–300	16–33x
Complex (admin panel)	Estimated 10,000–50,000	240–400	25–125x
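The Ratio column can be reproduced with a few lines of arithmetic. For the estimated rows, the ranges match dividing both Playwright bounds by the upper end of the Vibium estimate (300 and 400 tokens), which is the conservative choice. The variable names are illustrative.

```javascript
// Ratio = Playwright tokens / Vibium tokens.
function ratio(playwright, vibium) {
  return playwright / vibium;
}

const measured = ratio(2061, 240);                      // ≈ 8.59 (measured row)
const medium = [ratio(5000, 300), ratio(10000, 300)];   // ≈ [16.7, 33.3]
const complex = [ratio(10000, 400), ratio(50000, 400)]; // [25, 125]
const savedShare = 1 - 240 / 2061;                      // ≈ 0.884, i.e. ~88% fewer tokens
```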

Conclusion:

Vibium is designed to return "only the necessary information for AI."
Playwright returns "information for humans to debug."
Vibium's approach can substantially reduce operational costs for AI agents.

These results illustrate why Vibium was created separately from Playwright. In the era of AI agents, "how efficient" matters more than "how feature-rich."

Differentiating Between Playwright and Vibium

Both tools are excellent, but the optimal choice varies depending on the use case.

Cases Where Playwright is More Suitable

Playwright is better suited for scenarios such as:

1. Human-Written E2E Tests

Fixed test scripts executed in CI/CD pipelines
Existing Playwright test suites
Need for debugging information (accessibility tree, executed code)

2. Cross-Browser Testing

Running the same code across Chromium, Firefox, and WebKit (the engine behind Safari)
Verifying differences in behavior between browsers
Emulating mobile browsers

3. Advanced DOM Manipulation

Complex operations with Shadow DOM or iframes
Intercepting and mocking network requests
Fine control over browser contexts

4. Integration with Existing Ecosystems

Official tools like Playwright Test Runner, Playwright Inspector
Benefits of TypeScript type definitions (auto-completion, type checking)
Official support and community from Microsoft

Example: Complex E2E Test Scenario

// Playwright's strength: Detailed control
import { test, expect } from '@playwright/test';

test('Complex payment flow', async ({ page, context }) => {
  // (Navigation to the page under test is omitted here)

  // Mocking network requests: answer the payment API with canned JSON
  await page.route('**/api/payment', route =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ success: true }),
    })
  );

  // Operations across multiple tabs: start listening before the click opens the tab
  const [newPage] = await Promise.all([
    context.waitForEvent('page'),
    page.click('a[target="_blank"]')
  ]);
  await newPage.waitForLoadState();

  // Manipulating elements within Shadow DOM: CSS locators pierce open shadow roots
  const shadowButton = page.locator('custom-element button');
  await expect(shadowButton).toBeVisible();
});

Cases Where Vibium is More Suitable

Conversely, Vibium is optimal for scenarios such as:

1. Automation by AI Agents

LLMs (Claude, Gemini, etc.) operating the browser
Executing tasks based on natural language instructions
Emphasizing token cost efficiency in operations

2. Dynamic Browser Operations

Tasks where steps are not predetermined
Situations requiring actions to change based on user input
Prioritizing "reaching the goal" over fixed procedures

3. Simple Scripts and REPL

Interactively operating the browser in Node.js REPL
Direct invocation from Python scripts
Writing simply with synchronous API

4. Zero Setup is Essential

Minimizing dependencies in CI environments
Keeping Docker container sizes small
Quick prototyping

Example: AI Agent Task

// Vibium's strength: Simplicity and AI integration
const { browserSync } = require('vibium');

// REPL-friendly synchronous API
const vibe = browserSync.launch();
vibe.go('https://example.com');

// AI determines the next step
// "Find the login button and click it"
// → Automatically waits if the element is not found
// → If still not found, requests AI to reassess
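The last comment in the example describes a loop on the agent's side: try a selector, and if the element never appears, ask the model for a different one. Below is a minimal sketch of that loop. `find` and `suggestSelector` are injected stand-ins, not Vibium or LLM APIs, and the sketch assumes that `find` throws once its own auto-wait gives up.

```javascript
// Hypothetical agent-side retry loop around an element lookup.
function findWithReassessment(find, suggestSelector, firstSelector, maxAttempts = 3) {
  let selector = firstSelector;
  const tried = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    tried.push(selector);
    try {
      return { element: find(selector), selector, tried };
    } catch (err) {
      if (attempt === maxAttempts) break;
      // "Reassess": hand the failure back to the model for a new selector.
      selector = suggestSelector(selector, String(err.message ?? err));
    }
  }
  throw new Error(`Gave up after trying: ${tried.join(', ')}`);
}
```

Capping the attempts keeps a confused agent from looping forever (and from burning tokens) on a page that simply lacks the element.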

Criteria for Differentiation

Criteria	Playwright	Vibium
Executor	Human-written scripts	AI agents
Nature of Tests	Deterministic (fixed steps)	Dynamic (context-dependent judgments)
Need for Debugging	High (detailed information needed)	Low (results-focused)
Token Cost	Not a concern	Important
Cross-Browser	Essential	Chrome-centric is fine
Existing Assets	Playwright code available	Zero start

Practical Suggestion: Use Both

In many projects, the ideal approach is to split the work as follows:

Fixed tests in CI/CD → Playwright (stability-focused)
Exploratory testing and demos → Vibium (flexibility-focused)
AI assistant integration → Vibium (token efficiency-focused)

Both are excellent tools, and the decision should be based on "which is more suitable" rather than "which is better."

[Figure: Tool Selection Crossroads: Differentiating Playwright and Vibium]

Conclusion

The reasons Vibium was created anew rather than using Playwright can be summarized in the following three points:

AI Agent-First Design: Built-in MCP server, stdio communication, simple API
Adoption of WebDriver BiDi: A standardized next-generation protocol
Zero Setup Philosophy: Single binary, automatic browser downloads, immediate functionality

The evolution from Selenium to Playwright aimed at creating "better testing tools." In contrast, Vibium pioneers a new category as "infrastructure for AI to operate browsers."

As the token efficiency experiment shows, Vibium keeps tool responses minimal. This fits its vision of AI agents that understand the web visually and contextually, in contrast to traditional DOM-manipulation-centric tools.

If you're interested in browser automation in the AI agent era, be sure to try out Vibium.

DEV Community