Originally published on 2026-01-05
Original article (Japanese): Vibium: PlaywrightよりAIエージェントに最適化されたブラウザ自動化ツール
Jason Huggins, the creator of Selenium, has announced a new browser automation tool, Vibium, which comes approximately 20 years after Selenium. In this article, we will discuss Vibium's design philosophy, its differences from Playwright and Puppeteer, and why a new tool was necessary in the era of AI agents.
What is Vibium?
Vibium is browser automation infrastructure designed for AI agents. All of the following features are integrated into a single binary of about 10MB:
- Browser Lifecycle Management: Detection and launching of Chrome
- WebDriver BiDi Proxy: Communication with the browser
- MCP Server: Integration with LLM agents (like Claude Code)
- Automatic Waiting: Polling until elements appear
- Screenshots: PNG captures of the viewport
The standout feature is that integration with Claude Code can be completed with a single command:
claude mcp add vibium -- npx -y vibium
With this single line, Claude Code can directly manipulate the browser. Chrome is automatically downloaded, eliminating the need for manual setup.
From Selenium to Vibium: 20 Years of Evolution
Jason Huggins created Selenium in 2004, paving the way for browser automation. The field has since evolved through Selenium WebDriver, Puppeteer, and Playwright. So what prompted Huggins to build yet another tool?
Challenges with Existing Tools
Selenium WebDriver (2011 onwards) is mature but has the following issues:
- Complex setup (driver management, browser version compatibility)
- Boilerplate code required for element waiting
- Lack of consideration for integration with AI agents
Playwright (2020 onwards) and Puppeteer (2018 onwards) addressed these issues using the Chrome DevTools Protocol (CDP). However:
- CDP is a Chrome-specific protocol (not standardized)
- Additional abstraction layers are needed to support multiple browsers
- MCP server functionality needs to be implemented separately
The Choice of WebDriver BiDi
Vibium adopts the WebDriver BiDi protocol. This is a next-generation protocol being developed as a W3C standard, combining the best aspects of Selenium WebDriver and CDP:
- Bidirectional Communication: Real-time reception of events from the browser
- Standardization: Works across Chrome, Firefox, and Safari (by specification)
- Low-Level Access: Direct access to network, console, and DOM
Huggins stated in an interview on the TestGuild Podcast:
WebDriver BiDi is a protocol that has learned all the lessons from CDP that made Puppeteer and Playwright great.
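To make the bidirectional model described above concrete, here is a minimal sketch of the JSON messages WebDriver BiDi exchanges over a WebSocket. The helper names (`makeCommand`, `classifyMessage`) and the context id are illustrative assumptions and are not part of Vibium's API; the message fields follow the W3C draft as I understand it.

```javascript
// Sketch of WebDriver BiDi message framing (helper names are hypothetical).
let nextId = 0;

// Commands carry a numeric id so the matching response can be correlated.
function makeCommand(method, params) {
  nextId += 1;
  return { id: nextId, method, params };
}

// Incoming messages are tagged with "type": success, error, or event.
// Events have no id: the browser pushes them without being asked.
function classifyMessage(raw) {
  const msg = JSON.parse(raw);
  if (msg.type === 'event') return { kind: 'event', method: msg.method };
  if (msg.type === 'success') return { kind: 'response', id: msg.id, ok: true };
  if (msg.type === 'error') return { kind: 'response', id: msg.id, ok: false };
  return { kind: 'unknown' };
}

const navigate = makeCommand('browsingContext.navigate', {
  context: 'example-context-id', // placeholder; real ids come from the browser
  url: 'https://example.com',
  wait: 'complete',
});
```

The unsolicited `event` messages are what "bidirectional" refers to: console output or network activity can arrive without the client polling for it.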
Why Create Vibium Instead of Using Playwright?
So, why create a new tool instead of using Playwright? The primary reason is a difference in design philosophy.
1. AI Agent-First Design
Vibium is optimized for AI agents:
- Built-in MCP Server: Instant integration with Claude Code, Gemini, and local LLMs
- stdio Communication: Conforms to the standard communication protocol for LLM agents
- Simple API: Minimal methods that are easy for AI to understand
In contrast, Playwright is designed primarily for human test engineers, and its MCP integration comes from a separate server package rather than being built in.
2. Zero Setup Philosophy
The design goal of Vibium is to be an "invisible binary":
// Just running npm install vibium makes this work
const { browserSync } = require('vibium')
const vibe = browserSync.launch()
vibe.go('https://example.com')
vibe.find('a').click()
vibe.quit()
Downloading the browser, placing drivers, and setting paths are all automated. This emphasizes a developer experience that prioritizes "getting it running first" in the AI era.
3. Simplicity of a Single Binary
Vibium achieves everything with a single Go binary of about 10MB:
┌─────────────────────────────────────────────────────────────┐
│ LLM / Agent │
│ (Claude Code, Codex, Gemini, Local Models) │
└─────────────────────────────────────────────────────────────┘
▲
│ MCP Protocol (stdio)
▼
┌─────────────────────┐
│ Vibium Clicker │
│ │
│ ┌───────────────┐ │
│ │ MCP Server │ │
│ └───────▲───────┘ │ ┌──────────────────┐
│ │ │ │ │
│ ┌───────▼───────┐ │WebSocket│ │
│ │ BiDi Proxy │ │◄───────►│ Chrome Browser │
│ └───────────────┘ │ BiDi │ │
│ │ │ │
└─────────────────────┘ └──────────────────┘
Playwright consists of multiple npm packages and browser binaries, leading to complex dependencies. Vibium chose simplicity.
Practical Use Cases for Vibium
Using as an AI Agent
After integration with Claude Code, you can issue commands in natural language:
"Go to example.com and click the first link"
Claude Code will automatically invoke the following MCP tools:
| Tool | Description |
|---|---|
| browser_launch | Launch the browser (visible by default) |
| browser_navigate | Navigate to a URL |
| browser_find | Find elements using CSS selectors |
| browser_click | Click an element |
| browser_type | Input text |
| browser_screenshot | Take a screenshot |
| browser_quit | Close the browser |
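For readers curious what such a tool invocation looks like on the wire, the following is a minimal sketch of the JSON-RPC 2.0 messages an MCP client exchanges with a server over stdio. The function names are hypothetical, and the `url` argument name is an assumption about Vibium's tool schema, not something confirmed by the source.

```javascript
// Sketch of MCP "tools/call" framing over stdio (helper names are hypothetical).
function toolCall(id, name, args) {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// The stdio transport sends each message as one line of JSON.
function encodeLine(message) {
  return JSON.stringify(message) + '\n';
}

// Pull the plain-text parts out of a tool result.
function resultText(response) {
  return response.result.content
    .filter(part => part.type === 'text')
    .map(part => part.text)
    .join('\n');
}

// `url` is an assumed argument name for browser_navigate.
const request = toolCall(1, 'browser_navigate', { url: 'https://example.com' });
const wire = encodeLine(request);

// A reply carrying the kind of short status text this article reports.
const reply = {
  jsonrpc: '2.0',
  id: 1,
  result: { content: [{ type: 'text', text: 'Navigated to https://example.com/' }] },
};
```

Calling `resultText(reply)` yields the short status string, which is all the model has to read for that step.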
Using as a JavaScript Library
You can also use it directly as an npm package:
Synchronous API (REPL Friendly)
const fs = require('fs')
const { browserSync } = require('vibium')
const vibe = browserSync.launch()
vibe.go('https://example.com')
const png = vibe.screenshot()
fs.writeFileSync('screenshot.png', png)
const link = vibe.find('a')
link.click()
vibe.quit()
Asynchronous API
const fs = await import('fs/promises')
const { browser } = await import('vibium')
const vibe = await browser.launch()
await vibe.go('https://example.com')
const png = await vibe.screenshot()
await fs.writeFile('screenshot.png', png)
const link = await vibe.find('a')
await link.click()
await vibe.quit()
Automatic Waiting Mechanism
Vibium automatically waits until elements are displayed:
// This will automatically poll until the element appears
const button = vibe.find('button.submit')
button.click()
Selenium typically requires explicit wait code. Playwright also auto-waits before performing actions, and Vibium likewise waits by default, which keeps scripts short.
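The idea of polling until an element appears can be sketched in a few lines. This is a generic illustration, not Vibium's implementation: `waitFor`, `makeFakeProbe`, and the timeout and interval defaults are all assumptions chosen for the example.

```javascript
// Minimal polling helper (hypothetical; not Vibium's implementation).
// `probe` returns a value once the element is available, or null/undefined otherwise.
async function waitFor(probe, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const found = await probe();
    if (found !== null && found !== undefined) return found;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    // Sleep before the next attempt so the loop does not spin.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Stand-in for a DOM query: the "element" appears on the given attempt.
function makeFakeProbe(appearsOnAttempt) {
  let attempts = 0;
  return () => {
    attempts += 1;
    return attempts >= appearsOnAttempt
      ? { selector: 'button.submit', attempts }
      : null;
  };
}
```

For example, `await waitFor(makeFakeProbe(3), { intervalMs: 10 })` resolves on the third probe, while a probe that never succeeds rejects once the timeout elapses.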
Platform Support
Vibium supports the following platforms:
| Platform | Architecture | Status |
|---|---|---|
| Linux | x64 | ✅ Supported |
| macOS | x64 (Intel) | ✅ Supported |
| macOS | arm64 (Apple Silicon) | ✅ Supported |
| Windows | x64 | ✅ Supported |
During installation, the appropriate binary for the platform is automatically selected, and Chrome's cache is stored in the following locations:
- Linux: ~/.cache/vibium/
- macOS: ~/Library/Caches/vibium/
- Windows: %LOCALAPPDATA%\vibium\
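If you need to locate or clean that cache from a script, the mapping above can be expressed as a small helper. `vibiumCacheDir` is a hypothetical function written for this article; it simply mirrors the three listed paths and makes no claim about how Vibium itself resolves them.

```javascript
const path = require('path');

// Hypothetical helper mirroring the cache locations listed above.
// `platform` uses Node's process.platform values.
function vibiumCacheDir(platform, env, homeDir) {
  switch (platform) {
    case 'linux':
      return path.posix.join(homeDir, '.cache', 'vibium');
    case 'darwin':
      return path.posix.join(homeDir, 'Library', 'Caches', 'vibium');
    case 'win32':
      return path.win32.join(env.LOCALAPPDATA, 'vibium');
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

In a real script you would call it as `vibiumCacheDir(process.platform, process.env, require('os').homedir())`.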
Roadmap: Plans Beyond v2
The current v1 focuses on "integration of AI and browsers," but the v2 roadmap outlines the following features that are released or planned:
- Python Client: Released in December 2025 (pip install vibium)
- Java Client: Planned for enterprise use
- Cortex: Memory and navigation layer
- Retina: Recording extension
- Video Recording: Capturing test execution
- AI Locator: Smarter element searching
All of these are extensions of the vision of "AI agents handling browsers more naturally."
Why Vibium Now?
Jason Huggins created a new tool roughly 20 years after Selenium because of the paradigm shift brought about by the emergence of AI agents.
From Testing Tools to AI Tools
Selenium was created for test automation. The same goes for Playwright and Puppeteer. However, AI agents like Claude Code, Gemini, and ChatGPT use different approaches:
- Instead of executing human-written test scripts, AI makes dynamic judgments
- Instead of fixed selectors, elements are identified based on visual information and context
- Browser operations are part of task achievement, not the ultimate goal
A tool optimized for this new usage was needed. That is Vibium.
The Rise of the MCP Ecosystem
The Model Context Protocol (MCP) is an integration standard for AI agents and tools proposed by Anthropic. Vibium was designed from the ground up as an MCP server, allowing for instant integration with Claude Code, Cursor, and other MCP-compliant AI editors.
This represents a shift in thinking from "creating a tool and then figuring out how to integrate it" to "designing a tool with integration in mind."
A Return to Simplicity
Over 20 years, browser automation tools have become feature-rich but also complex. Vibium regains simplicity by focusing on "only the truly necessary features":
- Single binary
- Zero setup
- Minimal API
- Automatic waiting
This philosophy aligns with the recent trend of "reducing complexity" seen in tools like exo and dotenvx.
Token Efficiency Comparison Experiment Between Playwright and Vibium
For AI agents, a key factor is the number of tokens consumed to complete a task. We executed the same task with both tools and measured token efficiency.
Experimental Conditions
Task: "Access example.com and take a screenshot of the page"
We used OpenCode's token counter to measure actual token consumption.
Measured Results: Token Consumption
In the Case of Vibium
# Executed tool calls
1. browser_launch # Launch the browser
2. browser_navigate # Navigate to https://example.com
3. browser_screenshot # Take a screenshot
4. browser_quit # Close the browser
Consumed Tokens: 240 tokens
Examples of responses from each tool:
- browser_launch: "Browser launched (headless: false)" (7 words)
- browser_navigate: "Navigated to https://example.com/" (4 words)
- browser_screenshot: "Screenshot saved to /path/to/file.png" (5 words)
- browser_quit: "Browser session closed" (3 words)
In the Case of Playwright
# Executed tool calls
1. browser_navigate # Navigate to https://example.com
2. browser_take_screenshot # Take a screenshot
Consumed Tokens: 2,061 tokens
Examples of responses from each tool:
- browser_navigate:
  - Executed code snippet: await page.goto('https://example.com');
  - Page information (URL, title)
  - Entire accessibility tree (in YAML format, hundreds to thousands of words)
- browser_take_screenshot:
  - Executed code snippet
  - Screenshot image data (consumed tokens via Vision API)
Surprising Result: Vibium is 8.6 Times More Efficient
Playwright: 2,061 tokens (2 tool calls)
Vibium: 240 tokens (4 tool calls)
Efficiency Ratio: Vibium achieves the same task at about 1/8.6 the tokens of Playwright
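The arithmetic behind these figures is easy to reproduce. The snippet below uses only the two token counts reported above; the variable names are mine.

```javascript
// Token counts reported in the experiment above.
const playwrightTokens = 2061;
const vibiumTokens = 240;

// How many times more tokens Playwright used for the same task.
const ratio = playwrightTokens / vibiumTokens;

// Absolute and relative savings per task when using Vibium.
const savedPerTask = playwrightTokens - vibiumTokens;
const savedPercent = (savedPerTask / playwrightTokens) * 100;

console.log(ratio.toFixed(1));        // prints "8.6"
console.log(savedPerTask);            // prints 1821
console.log(savedPercent.toFixed(1)); // prints "88.4"
```

In other words, for this one task Vibium used roughly 88% fewer tokens than Playwright.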
{{< figure-desc src="/images/vibium-selenium-creator-browser-automation/token-comparison-race.png" alt="Token Efficiency Comparison: Playwright vs. Vibium Race" >}}
Why Such a Difference?
Reasons Playwright Consumes More Tokens:
- Automatic Sending of Accessibility Tree: browser_navigate returns the page's entire structure as an accessibility tree in YAML format every time.
    - generic [ref=e2]:
      - heading "Example Domain" [level=1] [ref=e3]
      - paragraph [ref=e4]: This domain is for use...
      - paragraph [ref=e5]:
        - link "Learn more" [ref=e6]:
          - /url: https://iana.org/domains/example
  This data alone consumes hundreds of tokens.
- Displaying Executed Code: Displays actual Playwright code for debugging purposes.
    await page.goto('https://example.com');
    await page.screenshot({...});
- Image Data: Screenshots are returned as images and processed by the Vision API.
Reasons Vibium is Efficient:
- Minimal Responses: Only success/failure messages (averaging fewer than 5 words).
- Images Stored Locally: Screenshots return only the file path (no token consumption).
- No Code Display: Only simple status messages are returned.
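To get a feel for why response size matters, the sketch below compares the two kinds of text response using a rough rule of thumb of about four characters per token. That heuristic is an assumption for illustration only; it is not a real tokenizer, it ignores image tokens, and it will not reproduce the measured figures above.

```javascript
// Rough heuristic (an assumption, not a real tokenizer): ~4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Terse status message, as quoted from Vibium's browser_navigate response.
const statusMessage = 'Navigated to https://example.com/';

// Excerpt of the YAML snapshot quoted above for the same navigation.
const snapshotExcerpt = [
  '- generic [ref=e2]:',
  '  - heading "Example Domain" [level=1] [ref=e3]',
  '  - paragraph [ref=e4]: This domain is for use...',
  '  - paragraph [ref=e5]:',
  '    - link "Learn more" [ref=e6]:',
  '      - /url: https://iana.org/domains/example',
].join('\n');

const statusEstimate = estimateTokens(statusMessage);
const snapshotEstimate = estimateTokens(snapshotExcerpt);
console.log({ statusEstimate, snapshotEstimate });
```

Even this six-line excerpt is several times larger than the status message, and a real page's snapshot grows with every element on it.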
Impact on Complex Tasks
Even for a simple page like example.com, an 8.6x difference emerges. In actual web applications (e.g., dashboards, admin panels), this difference will widen even further:
| Page Complexity | Playwright Consumption | Vibium Consumption | Ratio |
|---|---|---|---|
| Simple (example.com) | 2,061 tokens | 240 tokens | 8.6x |
| Medium (blog post) | Estimated 5,000–10,000 | 240–300 | 16–33x |
| Complex (admin panel) | Estimated 10,000–50,000 | 240–400 | 25–125x |
Conclusion:
- Vibium is designed to return "only the necessary information for AI."
- Playwright returns "information for humans to debug."
- Vibium's approach is overwhelmingly advantageous for reducing operational costs for AI agents.
These experimental results clearly illustrate why Vibium was created separately from Playwright. In the era of AI agents, "how efficient" is more important than "how feature-rich."
Choosing Between Playwright and Vibium
Both tools are excellent, but the optimal choice varies depending on the use case.
Cases Where Playwright is More Suitable
Playwright is better suited for scenarios such as:
1. Human-Written E2E Tests
- Fixed test scripts executed in CI/CD pipelines
- Existing Playwright test suites
- Need for debugging information (accessibility tree, executed code)
2. Cross-Browser Testing
- Running the same code across Chrome, Firefox, and Safari
- Verifying differences in behavior between browsers
- Emulating mobile browsers
3. Advanced DOM Manipulation
- Complex operations with Shadow DOM or iframes
- Intercepting and mocking network requests
- Fine control over browser contexts
4. Integration with Existing Ecosystems
- Official tools like Playwright Test Runner, Playwright Inspector
- Benefits of TypeScript type definitions (auto-completion, type checking)
- Official support and community from Microsoft
Example: Complex E2E Test Scenario
// Playwright's strength: detailed control
import { test, expect } from '@playwright/test';
test('Complex payment flow', async ({ page, context }) => {
  // Mock the payment API so the test never reaches the real backend
  await page.route('**/api/payment', route =>
    route.fulfill({ status: 200, contentType: 'application/json', body: '{"success": true}' })
  );
  // Operations across multiple tabs: wait for the page opened by the link
  const [newPage] = await Promise.all([
    context.waitForEvent('page'),
    page.click('a[target="_blank"]')
  ]);
  // Manipulating elements within Shadow DOM (locator() is synchronous, so no await)
  const shadowHost = page.locator('custom-element');
  const shadowButton = await shadowHost.evaluateHandle(
    el => el.shadowRoot.querySelector('button')
  );
});
Cases Where Vibium is More Suitable
Conversely, Vibium is optimal for scenarios such as:
1. Automation by AI Agents
- LLMs (Claude, Gemini, etc.) operating the browser
- Executing tasks based on natural language instructions
- Emphasizing token cost efficiency in operations
2. Dynamic Browser Operations
- Tasks where steps are not predetermined
- Situations requiring actions to change based on user input
- Prioritizing "reaching the goal" over fixed procedures
3. Simple Scripts and REPL
- Interactively operating the browser in Node.js REPL
- Direct invocation from Python scripts
- Writing simply with synchronous API
4. Zero Setup is Essential
- Minimizing dependencies in CI environments
- Keeping Docker container sizes small
- Quick prototyping
Example: AI Agent Task
// Vibium's strength: Simplicity and AI integration
const { browserSync } = require('vibium');
// REPL-friendly synchronous API
const vibe = browserSync.launch();
vibe.go('https://example.com');
// AI determines the next step
// "Find the login button and click it"
// → Automatically waits if the element is not found
// → If still not found, requests AI to reassess
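The "wait, then ask the AI to reassess" flow in the comments above could be structured roughly as follows. This is a hypothetical control loop, not Vibium code: it assumes `find` throws when its wait times out, and `reassess` stands in for asking the model for a different selector.

```javascript
// Hypothetical control loop: `find` throws when the element never appears,
// `reassess` returns a new selector to try (or null to give up).
function findWithReassess(find, firstSelector, reassess, maxAttempts = 3) {
  let selector = firstSelector;
  const tried = [];
  for (let attempt = 1; attempt <= maxAttempts && selector; attempt += 1) {
    try {
      return { element: find(selector), selector, tried };
    } catch (err) {
      tried.push(selector);
      // Only consult the model if another attempt is still allowed.
      selector = attempt < maxAttempts ? reassess(tried, err) : null;
    }
  }
  throw new Error(`No element found after trying: ${tried.join(', ')}`);
}

// Stand-in for a browser lookup: only 'a.login' exists on this fake page.
function fakeFind(selector) {
  if (selector !== 'a.login') throw new Error(`not found: ${selector}`);
  return { tag: 'a', selector };
}
```

With the fake page, `findWithReassess(fakeFind, 'button.login', () => 'a.login')` fails once, asks for a new selector, and succeeds on the second attempt.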
Selection Criteria
| Criteria | Playwright | Vibium |
|---|---|---|
| Executor | Human-written scripts | AI agents |
| Nature of Tests | Deterministic (fixed steps) | Dynamic (context-dependent judgments) |
| Need for Debugging | High (detailed information needed) | Low (results-focused) |
| Token Cost | Not a concern | Important |
| Cross-Browser | Essential | Chrome-centric is fine |
| Existing Assets | Playwright code available | Starting from scratch |
Practical Suggestion: Use Both
In many projects, the ideal approach is to split the work as follows:
- Fixed tests in CI/CD → Playwright (stability-focused)
- Exploratory testing and demos → Vibium (flexibility-focused)
- AI assistant integration → Vibium (token efficiency-focused)
Both are excellent tools, and the decision should be based on "which is more suitable" rather than "which is better."
{{< figure-desc src="/images/vibium-selenium-creator-browser-automation/tool-selection-crossroads.png" alt="Tool Selection Crossroads: Differentiating Playwright and Vibium" >}}
Conclusion
The reasons Vibium was created anew rather than using Playwright can be summarized in the following three points:
- AI Agent-First Design: Built-in MCP server, stdio communication, simple API
- Adoption of WebDriver BiDi: A standardized next-generation protocol
- Zero Setup Philosophy: Single binary, automatic browser downloads, immediate functionality
The evolution from Selenium to Playwright aimed at creating "better testing tools." In contrast, Vibium pioneers a new category as "infrastructure for AI to operate browsers."
As demonstrated by the token efficiency experiment, Vibium adopts a new approach where "AI agents visually understand the web." This contrasts with the traditional DOM manipulation-centric tools.
If you're interested in browser automation in the AI agent era, be sure to try out Vibium.