tumf

Posted on • Originally published at blog.tumf.dev

Vibium: A Browser Automation Tool Optimized for AI Agents Over Playwright

Originally published on 2026-01-05
Original article (Japanese): Vibium: PlaywrightよりAIエージェントに最適化されたブラウザ自動化ツール

Jason Huggins, the creator of Selenium, has announced Vibium, a new browser automation tool that arrives roughly 20 years after Selenium. This article covers Vibium's design philosophy, how it differs from Playwright and Puppeteer, and why a new tool was needed in the era of AI agents.

What is Vibium?

Vibium is a browser automation infrastructure designed for AI agents. All of the following features are integrated into a single binary of about 10MB:

  • Browser Lifecycle Management: Detection and launching of Chrome
  • WebDriver BiDi Proxy: Communication with the browser
  • MCP Server: Integration with LLM agents (like Claude Code)
  • Automatic Waiting: Polling until elements appear
  • Screenshots: PNG captures of the viewport

The standout feature is that integration with Claude Code can be completed with a single command:

claude mcp add vibium -- npx -y vibium

With this single line, Claude Code can directly manipulate the browser. Chrome is automatically downloaded, eliminating the need for manual setup.
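For readers who prefer a checked-in configuration over the CLI, the same registration can be sketched as a config file. This is a minimal sketch, assuming the `mcpServers` layout that Claude Code reads from a project-level `.mcp.json`; the helper name `buildMcpConfig` is hypothetical, and only the `npx -y vibium` command comes from the article.

```javascript
// Hypothetical helper: builds an MCP client config entry for a stdio server.
// Assumption: the client reads a top-level "mcpServers" map (as in .mcp.json).
function buildMcpConfig(name, command, args) {
  return { mcpServers: { [name]: { command, args } } };
}

// Same command as `claude mcp add vibium -- npx -y vibium`
const config = buildMcpConfig('vibium', 'npx', ['-y', 'vibium']);
console.log(JSON.stringify(config, null, 2));
```

The printed JSON could be saved as the config file; whether a given MCP client accepts exactly this layout should be checked against that client's documentation.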

From Selenium to Vibium: 20 Years of Evolution

Jason Huggins created Selenium in 2004, paving the way for browser automation. Since then, the field has evolved through Selenium WebDriver, Puppeteer, and Playwright. So what prompted Huggins to build another tool?

Challenges with Existing Tools

Selenium WebDriver (2011 onwards) is mature but has the following issues:

  • Complex setup (driver management, browser version compatibility)
  • Boilerplate code required for element waiting
  • Lack of consideration for integration with AI agents

Puppeteer (2018 onwards) and Playwright (2020 onwards) addressed these issues, largely by building on the Chrome DevTools Protocol (CDP). However:

  • CDP is a Chrome-specific protocol (not standardized)
  • Additional abstraction layers are needed to support multiple browsers
  • MCP server functionality is not built in and has to be added separately

The Choice of WebDriver BiDi

Vibium adopts the WebDriver BiDi protocol. This is a next-generation protocol being developed as a W3C standard, combining the best aspects of Selenium WebDriver and CDP:

  • Bidirectional Communication: Real-time reception of events from the browser
  • Standardization: Works across Chrome, Firefox, and Safari (by specification)
  • Low-Level Access: Direct access to network, console, and DOM
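To make "bidirectional" concrete, here is a minimal sketch of how a BiDi client might handle messages. The message shapes follow the W3C WebDriver BiDi draft (commands carry `id`, `method`, `params`; replies carry `type` and `id`; events carry `type: "event"`). The `BidiSession` class and the in-memory transport are hypothetical illustrations, not Vibium's code.

```javascript
// Minimal sketch of WebDriver BiDi message handling (not Vibium's implementation).
class BidiSession {
  constructor(send) {
    this.send = send;            // function that writes a JSON string to the socket
    this.nextId = 1;
    this.pending = new Map();    // command id -> { resolve, reject }
    this.listeners = new Map();  // event method -> array of callbacks
  }

  // Sends a command and returns a promise that settles when the reply arrives.
  command(method, params) {
    const id = this.nextId++;
    const promise = new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    this.send(JSON.stringify({ id, method, params }));
    return promise;
  }

  on(method, callback) {
    const list = this.listeners.get(method) || [];
    this.listeners.set(method, list.concat(callback));
  }

  // Call this with every raw message received from the browser.
  receive(raw) {
    const msg = JSON.parse(raw);
    if (msg.type === 'event') {
      (this.listeners.get(msg.method) || []).forEach(cb => cb(msg.params));
      return;
    }
    const waiter = this.pending.get(msg.id);
    if (!waiter) return;
    this.pending.delete(msg.id);
    if (msg.type === 'success') waiter.resolve(msg.result);
    else waiter.reject(new Error(`${msg.error}: ${msg.message}`));
  }
}

// Usage with a fake transport: the browser pushes an event, then answers command 1.
const sent = [];
const logs = [];
const session = new BidiSession(raw => sent.push(JSON.parse(raw)));
session.on('log.entryAdded', entry => logs.push(entry.text));
const navigation = session.command('browsingContext.navigate', {
  context: 'ctx-1', url: 'https://example.com', wait: 'complete',
});
session.receive(JSON.stringify({ type: 'event', method: 'log.entryAdded', params: { text: 'hello' } }));
session.receive(JSON.stringify({ type: 'success', id: 1, result: { navigation: null, url: 'https://example.com/' } }));
navigation.then(result => console.log(result.url, logs));
```

The point of the sketch is that events arrive without a matching request, which classic request/response WebDriver could not express.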

Huggins stated in an interview on the TestGuild Podcast:

WebDriver BiDi is a protocol that has learned all the lessons from CDP that made Puppeteer and Playwright great.

Why Create Vibium Instead of Using Playwright?

So, why create a new tool instead of using Playwright? The primary reason is differences in design philosophy.

1. AI Agent-First Design

Vibium is optimized for AI agents:

  • Built-in MCP Server: Instant integration with Claude Code, Gemini, and local LLMs
  • stdio Communication: Conforms to the standard communication protocol for LLM agents
  • Simple API: Minimal methods that are easy for AI to understand

In contrast, Playwright is designed for human test engineers, and MCP integration comes from a separate server (the Playwright MCP used later in this article) rather than being built in.

2. Zero Setup Philosophy

The design goal of Vibium is to be an "invisible binary":

// Just running npm install vibium makes this work
const { browserSync } = require('vibium')

const vibe = browserSync.launch()
vibe.go('https://example.com')
vibe.find('a').click()
vibe.quit()

Downloading the browser, placing drivers, and setting paths are all automated. This emphasizes a developer experience that prioritizes "getting it running first" in the AI era.

3. Simplicity of a Single Binary

Vibium achieves everything with a single Go binary of about 10MB:

┌─────────────────────────────────────────────────────────────┐
│                         LLM / Agent                         │
│          (Claude Code, Codex, Gemini, Local Models)         │
└─────────────────────────────────────────────────────────────┘
                      ▲
                      │ MCP Protocol (stdio)
                      ▼
           ┌─────────────────────┐
           │   Vibium Clicker    │
           │                     │
           │  ┌───────────────┐  │
           │  │  MCP Server   │  │
           │  └───────▲───────┘  │         ┌──────────────────┐
           │          │          │         │                  │
           │  ┌───────▼───────┐  │WebSocket│                  │
           │  │  BiDi Proxy   │  │◄───────►│  Chrome Browser  │
           │  └───────────────┘  │  BiDi   │                  │
           │                     │         │                  │
           └─────────────────────┘         └──────────────────┘

Playwright consists of multiple npm packages and browser binaries, leading to complex dependencies. Vibium chose simplicity.

Practical Use Cases for Vibium

Using as an AI Agent

After integration with Claude Code, you can issue commands in natural language:

"Go to example.com and click the first link"

Claude Code will automatically invoke the following MCP tools:

| Tool | Description |
| --- | --- |
| browser_launch | Launch the browser (visible by default) |
| browser_navigate | Navigate to a URL |
| browser_find | Find elements using CSS selectors |
| browser_click | Click an element |
| browser_type | Input text |
| browser_screenshot | Take a screenshot |
| browser_quit | Close the browser |
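Under the hood, each tool invocation is a JSON-RPC 2.0 `tools/call` request sent over stdio. The sketch below shows what that traffic might look like for the prompt above. The tool names come from the table; the argument names (`url`, `selector`) are assumptions for illustration and not Vibium's documented schema.

```javascript
// Sketch of the JSON-RPC 2.0 requests an MCP client sends over stdio.
// Argument names are assumed; only the tool names come from the article.
function toolCall(id, name, args = {}) {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

// "Go to example.com and click the first link" as a sequence of tool calls
const requests = [
  toolCall(1, 'browser_launch'),
  toolCall(2, 'browser_navigate', { url: 'https://example.com' }),
  toolCall(3, 'browser_find', { selector: 'a' }),
  toolCall(4, 'browser_click', { selector: 'a' }),
  toolCall(5, 'browser_quit'),
];

// The stdio transport frames each message as one line of JSON.
const wire = requests.map(r => JSON.stringify(r)).join('\n') + '\n';
process.stdout.write(wire);
```

In practice the model decides which calls to make and in what order; the fixed list here only illustrates the wire format.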

Using as a JavaScript Library

You can also use it directly as an npm package:

Synchronous API (REPL Friendly)

const fs = require('fs')
const { browserSync } = require('vibium')

const vibe = browserSync.launch()
vibe.go('https://example.com')

const png = vibe.screenshot()
fs.writeFileSync('screenshot.png', png)

const link = vibe.find('a')
link.click()
vibe.quit()

Asynchronous API

const fs = await import('fs/promises')
const { browser } = await import('vibium')

const vibe = await browser.launch()
await vibe.go('https://example.com')

const png = await vibe.screenshot()
await fs.writeFile('screenshot.png', png)

const link = await vibe.find('a')
await link.click()
await vibe.quit()

Automatic Waiting Mechanism

Vibium automatically waits until elements are displayed:

// This will automatically poll until the element appears
const button = vibe.find('button.submit')
button.click()

Selenium typically needs explicit wait code for this. Like Playwright, whose actions also auto-wait for elements, Vibium waits by default, which keeps the code short.
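The polling idea can be sketched in a few lines. This is a generic helper under assumed defaults (`timeoutMs`, `intervalMs`), not Vibium's actual implementation; `waitFor` and `fakeQuery` are hypothetical names.

```javascript
// Generic polling helper: retries `probe` until it returns a value
// or the timeout elapses. Defaults are arbitrary, chosen for illustration.
async function waitFor(probe, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await probe();
    if (result !== null && result !== undefined) return result;
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Stand-in for a DOM lookup: the "element" only exists from the third attempt.
let attempts = 0;
const fakeQuery = () => (++attempts >= 3 ? { selector: 'button.submit' } : null);

const found = waitFor(fakeQuery, { timeoutMs: 1000, intervalMs: 10 });
found.then(el => console.log(`found ${el.selector} after ${attempts} attempts`));
```

Folding this loop into `find` is what lets the calling code stay a plain two-liner.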

Platform Support

Vibium supports the following platforms:

| Platform | Architecture | Status |
| --- | --- | --- |
| Linux | x64 | ✅ Supported |
| macOS | x64 (Intel) | ✅ Supported |
| macOS | arm64 (Apple Silicon) | ✅ Supported |
| Windows | x64 | ✅ Supported |

During installation, the appropriate binary for the platform is selected automatically, and the downloaded Chrome is cached in the following locations:

  • Linux: ~/.cache/vibium/
  • macOS: ~/Library/Caches/vibium/
  • Windows: %LOCALAPPDATA%\vibium\
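A script that needs to locate or clear this cache could resolve the directory per platform. The sketch below only encodes the three paths listed above; the function name `vibiumCacheDir` is hypothetical, the Windows fallback is an assumption, and whether Vibium honors overrides such as `XDG_CACHE_HOME` is not covered by the article.

```javascript
const os = require('node:os');
const path = require('node:path');

// Maps a Node.js platform name to the cache directory listed above.
// `env` and `home` are parameters so the function is easy to test.
function vibiumCacheDir(platform = process.platform, env = process.env, home = os.homedir()) {
  switch (platform) {
    case 'linux':
      return path.posix.join(home, '.cache', 'vibium');
    case 'darwin':
      return path.posix.join(home, 'Library', 'Caches', 'vibium');
    case 'win32': {
      // Assumed fallback when LOCALAPPDATA is unset
      const base = env.LOCALAPPDATA || path.win32.join(home, 'AppData', 'Local');
      return path.win32.join(base, 'vibium');
    }
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}

console.log(vibiumCacheDir('linux', {}, '/home/dev'));
```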

Roadmap: Plans for v2 and Beyond

The current v1 focuses on "integration of AI and browsers." The v2 roadmap lists the following features, some already released and others planned:

  • Python Client: Released in December 2025 (pip install vibium)
  • Java Client: Planned for enterprise use
  • Cortex: Memory and navigation layer
  • Retina: Recording extension
  • Video Recording: Capturing test execution
  • AI Locator: Smarter element searching

All of these are extensions of the vision of "AI agents handling browsers more naturally."

Why Vibium Now?

Jason Huggins created a new tool about 20 years after Selenium because of the paradigm shift brought about by the emergence of AI agents.

From Testing Tools to AI Tools

Selenium was created for test automation. The same goes for Playwright and Puppeteer. However, AI agents like Claude Code, Gemini, and ChatGPT use different approaches:

  • Instead of executing human-written test scripts, AI makes dynamic judgments
  • Instead of fixed selectors, elements are identified based on visual information and context
  • Browser operations are part of task achievement, not the ultimate goal

A tool optimized for this new usage was needed. That is Vibium.

The Rise of the MCP Ecosystem

The Model Context Protocol (MCP) is an integration standard for AI agents and tools proposed by Anthropic. Vibium was designed from the ground up as an MCP server, allowing for instant integration with Claude Code, Cursor, and other MCP-compliant AI editors.

This represents a shift in thinking from "creating a tool and then figuring out how to integrate it" to "designing a tool with integration in mind."

A Return to Simplicity

Over 20 years, browser automation tools have become feature-rich but also complex. Vibium regains simplicity by focusing on "only the truly necessary features":

  • Single binary
  • Zero setup
  • Minimal API
  • Automatic waiting

This philosophy aligns with the recent trend of "reducing complexity" seen in tools like exo and dotenvx.

Token Efficiency Comparison Experiment Between Playwright and Vibium

For AI agents, the important factor is the amount of token consumption required to achieve a task. We executed the same task with both tools and measured token efficiency.

Experimental Conditions

Task: "Access example.com and take a screenshot of the page"

We used OpenCode's token counter to measure actual token consumption.

Measured Results: Token Consumption

In the Case of Vibium

# Executed tool calls
1. browser_launch         # Launch the browser
2. browser_navigate       # Navigate to https://example.com
3. browser_screenshot     # Take a screenshot
4. browser_quit          # Close the browser

Consumed Tokens: 240 tokens

Examples of responses from each tool:

  • browser_launch: "Browser launched (headless: false)" (4 words)
  • browser_navigate: "Navigated to https://example.com/" (3 words)
  • browser_screenshot: "Screenshot saved to /path/to/file.png" (4 words)
  • browser_quit: "Browser session closed" (3 words)

In the Case of Playwright

# Executed tool calls
1. browser_navigate       # Navigate to https://example.com
2. browser_take_screenshot # Take a screenshot

Consumed Tokens: 2,061 tokens

Examples of responses from each tool:

  • browser_navigate:
    • Executed code snippet: await page.goto('https://example.com');
    • Page information (URL, title)
    • Entire accessibility tree (in YAML format, hundreds to thousands of words)
  • browser_take_screenshot:
    • Executed code snippet
    • Screenshot image data (consumed tokens via Vision API)

Surprising Result: Vibium is 8.6 Times More Efficient

Playwright: 2,061 tokens (2 tool calls)
Vibium:       240 tokens (4 tool calls)

Efficiency Ratio: Vibium achieves the same task at about 1/8.6 the tokens of Playwright

[Figure: Token Efficiency Comparison: Playwright vs. Vibium Race]

Why Such a Difference?

Reasons Playwright Consumes More Tokens:

  1. Automatic Sending of Accessibility Tree: browser_navigate returns a snapshot of the page's entire accessibility tree in YAML format every time.
   - generic [ref=e2]:
     - heading "Example Domain" [level=1] [ref=e3]
     - paragraph [ref=e4]: This domain is for use...
     - paragraph [ref=e5]:
       - link "Learn more" [ref=e6]:
         - /url: https://iana.org/domains/example

This data alone consumes hundreds of tokens.

  2. Displaying Executed Code: Displays actual Playwright code for debugging purposes.
   await page.goto('https://example.com');
   await page.screenshot({...});
  3. Image Data: Screenshots are returned as images and processed by the Vision API.

Reasons Vibium is Efficient:

  1. Minimal Responses: Only success/failure messages (averaging fewer than 5 words).
  2. Images Stored Locally: Screenshots return only the file path (no token consumption).
  3. No Code Display: Only simple status messages are returned.
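The difference between the two screenshot styles can be illustrated by comparing the size of what each tool reply carries. This is a sketch with hypothetical function names and placeholder image bytes; it measures reply size in characters only, since the actual token cost of an inline image depends on how the model processes images.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Style A: save the image to disk and return a short status line.
function screenshotAsPath(pngBytes, dir = os.tmpdir()) {
  const file = path.join(dir, `screenshot-${Date.now()}.png`);
  fs.writeFileSync(file, pngBytes);
  return `Screenshot saved to ${file}`;
}

// Style B: return the image inline, base64-encoded, in the reply itself.
function screenshotInline(pngBytes) {
  return pngBytes.toString('base64');
}

// 50 KiB of placeholder bytes standing in for a real PNG
const fakePng = Buffer.alloc(50 * 1024, 1);
const pathReply = screenshotAsPath(fakePng);
const inlineReply = screenshotInline(fakePng);

console.log({ pathReplyChars: pathReply.length, inlineReplyChars: inlineReply.length });
```

The path-style reply stays a few dozen characters regardless of image size, while the inline reply grows with every pixel captured.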

Impact on Complex Tasks

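Even for a simple page like example.com, an 8.6x difference emerges. In real web applications (e.g., dashboards, admin panels), where the accessibility tree is far larger, the gap is likely to widen further. Only the first row below was measured; the rest are estimates: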

| Page Complexity | Playwright Consumption | Vibium Consumption | Ratio |
| --- | --- | --- | --- |
| Simple (example.com) | 2,061 tokens | 240 tokens | 8.6x |
| Medium (blog post) | Estimated 5,000–10,000 | 240–300 | 16–33x |
| Complex (admin panel) | Estimated 10,000–50,000 | 240–400 | 25–125x |
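The ratios in the table can be reproduced with simple arithmetic. The sketch below assumes one reading that matches the published numbers: both ends of the Playwright range are divided by the upper end of the Vibium range. The function and field names are illustrative.

```javascript
// Ratio of Playwright to Vibium token consumption, rounded to one decimal.
function ratio(playwrightTokens, vibiumTokens) {
  return Math.round((playwrightTokens / vibiumTokens) * 10) / 10;
}

// Measured row: 2,061 vs. 240 tokens
console.log(ratio(2061, 240)); // 8.6

// Estimated rows: divide each end of the Playwright range by the
// upper end of the Vibium range (an assumption that matches the table).
const rows = [
  { page: 'Medium', playwright: [5000, 10000], vibiumMax: 300 },
  { page: 'Complex', playwright: [10000, 50000], vibiumMax: 400 },
];
for (const { page, playwright: [low, high], vibiumMax } of rows) {
  console.log(page, Math.floor(low / vibiumMax), Math.floor(high / vibiumMax));
}
```

Dividing by the lower Vibium figure instead would give even larger ratios, so the table's ranges sit on the conservative side of these estimates.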

Conclusion:

  • Vibium is designed to return "only the necessary information for AI."
  • Playwright returns "information for humans to debug."
  • Vibium's approach is overwhelmingly advantageous for reducing operational costs for AI agents.

These experimental results clearly illustrate why Vibium was created separately from Playwright. In the era of AI agents, "how efficient" is more important than "how feature-rich."

Choosing Between Playwright and Vibium

Both tools are excellent, but the optimal choice varies depending on the use case.

Cases Where Playwright is More Suitable

Playwright is better suited for scenarios such as:

1. Human-Written E2E Tests

  • Fixed test scripts executed in CI/CD pipelines
  • Existing Playwright test suites
  • Need for debugging information (accessibility tree, executed code)

2. Cross-Browser Testing

  • Running the same code across Chrome, Firefox, and Safari
  • Verifying differences in behavior between browsers
  • Emulating mobile browsers

3. Advanced DOM Manipulation

  • Complex operations with Shadow DOM or iframes
  • Intercepting and mocking network requests
  • Fine control over browser contexts

4. Integration with Existing Ecosystems

  • Official tools like Playwright Test Runner, Playwright Inspector
  • Benefits of TypeScript type definitions (auto-completion, type checking)
  • Official support and community from Microsoft

Example: Complex E2E Test Scenario

// Playwright's strength: Detailed control
import { test, expect } from '@playwright/test';

test('Complex payment flow', async ({ page, context }) => {
  // Mock the payment API so the test never hits the real backend
  await page.route('**/api/payment', async route => {
    await route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: '{"success": true}',
    });
  });

  // Operations across multiple tabs: wait for the page opened by the link
  const [newPage] = await Promise.all([
    context.waitForEvent('page'),
    page.click('a[target="_blank"]'),
  ]);
  await newPage.waitForLoadState();

  // Elements within open Shadow DOM: CSS locators pierce shadow roots by default
  const shadowButton = page.locator('custom-element button');
  await expect(shadowButton).toBeVisible();
});

Cases Where Vibium is More Suitable

Conversely, Vibium is optimal for scenarios such as:

1. Automation by AI Agents

  • LLMs (Claude, Gemini, etc.) operating the browser
  • Executing tasks based on natural language instructions
  • Emphasizing token cost efficiency in operations

2. Dynamic Browser Operations

  • Tasks where steps are not predetermined
  • Situations requiring actions to change based on user input
  • Prioritizing "reaching the goal" over fixed procedures

3. Simple Scripts and REPL

  • Interactively operating the browser in Node.js REPL
  • Direct invocation from Python scripts
  • Writing simply with synchronous API

4. Zero Setup is Essential

  • Minimizing dependencies in CI environments
  • Keeping Docker container sizes small
  • Quick prototyping

Example: AI Agent Task

// Vibium's strength: Simplicity and AI integration
const { browserSync } = require('vibium');

// REPL-friendly synchronous API
const vibe = browserSync.launch();
vibe.go('https://example.com');

// AI determines the next step
// "Find the login button and click it"
// → Automatically waits if the element is not found
// → If still not found, requests AI to reassess
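The "requests AI to reassess" step in the comments can be sketched as turning a failed lookup into a short message the model can act on. Everything here is hypothetical: `findWithWait` stands in for a find call that polls and eventually gives up, and the `page` object is a fake, not Vibium's API.

```javascript
// Hypothetical stand-in for a find call that has already polled and timed out.
function findWithWait(page, selector) {
  const element = page.elements[selector];
  if (!element) throw new Error(`Timeout: no element matches "${selector}"`);
  return element;
}

// Clicks the element, or reports a compact failure instead of throwing.
function clickOrReport(page, selector) {
  try {
    findWithWait(page, selector).clicked = true;
    return { ok: true, message: `Clicked ${selector}` };
  } catch (err) {
    // Short, structured failure the agent can use to choose another selector
    return { ok: false, message: err.message };
  }
}

const page = { elements: { 'a.login': { clicked: false } } };
console.log(clickOrReport(page, 'button.login')); // first guess misses
console.log(clickOrReport(page, 'a.login'));      // reassessed guess succeeds
```

Keeping the failure message short matters for the same reason as the status replies: it goes back into the model's context.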

Selection Criteria

| Criteria | Playwright | Vibium |
| --- | --- | --- |
| Executor | Human-written scripts | AI agents |
| Nature of Tests | Deterministic (fixed steps) | Dynamic (context-dependent judgments) |
| Need for Debugging | High (detailed information needed) | Low (results-focused) |
| Token Cost | Not a concern | Important |
| Cross-Browser | Essential | Chrome-centric is fine |
| Existing Assets | Playwright code available | Zero start |

Practical Suggestion: Use Both

In many projects, the ideal approach is to split the work as follows:

  • Fixed tests in CI/CD → Playwright (stability-focused)
  • Exploratory testing and demos → Vibium (flexibility-focused)
  • AI assistant integration → Vibium (token efficiency-focused)

Both are excellent tools, and the decision should be based on "which is more suitable" rather than "which is better."

[Figure: Tool Selection Crossroads: Differentiating Playwright and Vibium]

Conclusion

The reasons Vibium was created anew rather than using Playwright can be summarized in the following three points:

  1. AI Agent-First Design: Built-in MCP server, stdio communication, simple API
  2. Adoption of WebDriver BiDi: A standardized next-generation protocol
  3. Zero Setup Philosophy: Single binary, automatic browser downloads, immediate functionality

The evolution from Selenium to Playwright aimed at creating "better testing tools." In contrast, Vibium pioneers a new category as "infrastructure for AI to operate browsers."

As the token efficiency experiment suggests, Vibium returns only short status messages and leaves it to the AI agent to understand the page, for example visually through screenshots. This contrasts with traditional DOM manipulation-centric tools.

If you're interested in browser automation in the AI agent era, be sure to try out Vibium.
