Introduction
Starting with Bun v1.3.12, a new experimental API called Bun.WebView was introduced. It enables simple browser automation and can partially replace tools like Playwright. Pretty exciting, so I gave it a try.
For macOS users, Bun.WebView can directly use the system’s native WebKit as the backend. On Windows and Linux, Chrome can be used as the backend via:
const view = new Bun.WebView({ backend: "chrome" });
According to the Bun documentation, Bun searches for Chrome in the following order:
- The path provided in backend: { type: "chrome", path: "..." }
- The BUN_CHROME_PATH environment variable
- $PATH (google-chrome-stable, google-chrome, chromium-browser, chromium, brave-browser, microsoft-edge, chrome)
- Common installation directories
- Playwright cache (~/Library/Caches/ms-playwright or ~/.cache/ms-playwright) for chrome-headless-shell
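If automatic discovery fails, you can also point Bun at a specific binary using the path form from the list above. For example, targeting Edge on Windows (adjust the path to your install):
const view = new Bun.WebView({
  backend: {
    type: "chrome",
    path: "C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe",
  },
});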
Integrating the Chrome Backend
In practice, I found that on Windows, no matter how I set BUN_CHROME_PATH, Bun often failed to correctly locate and launch Chrome, even when Chrome, Chromium, or Edge was installed.
I found a related issue in the Bun GitHub repository, which suggests this is still an early-stage limitation and will likely improve in future releases.
So I switched to another approach: manually launching Chrome with remote debugging enabled.
Chromium-based browsers support a remote debugging mode. For Chrome, it's available via chrome://inspect/#remote-debugging, and for Edge via edge://inspect/#remote-debugging.
In theory, enabling “Allow remote debugging” starts a server at 127.0.0.1:9222. However, on my laptop, although the server started, all expected endpoints returned 404—which was odd.
Eventually, I resolved it by manually launching Edge from the command line:
"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe" --remote-debugging-port=9222
Now the debugging server works correctly.
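You can verify it responds by querying the version endpoint Chromium exposes:
curl http://127.0.0.1:9222/json/version
The JSON response includes webSocketDebuggerUrl, which we will need in a moment.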
Next, we need to connect Bun.WebView to this browser instance. We can fetch the WebSocket debugging URL like this:
import axios from "axios";
async function getBrowserDebuggingURL(): Promise<string> {
try {
const response = await axios.get("http://localhost:9222/json/version");
return response.data.webSocketDebuggerUrl;
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
console.error(`Failed to get browser debugging URL: ${message}`);
throw new Error("Failed to get browser debugging URL");
}
}
export { getBrowserDebuggingURL };
Then pass it into Bun.WebView:
const view = new Bun.WebView({
backend: {
type: "chrome",
url: await getBrowserDebuggingURL(),
},
headless: true,
});
At this point, Bun should successfully connect to the Chrome backend.
Web Scraping & Formatting
The Bun.WebView API is similar to Playwright. We can extract page data like this:
const title = await view.evaluate(`
document.title
|| document.querySelector('meta[property="og:title"]')?.content
|| document.querySelector('meta[name="twitter:title"]')?.content
|| document.querySelector('h1')?.textContent?.trim()
|| document.querySelector('h2')?.textContent?.trim()
|| ""
`);
const html = await view.evaluate("document.documentElement.outerHTML");
const text = await view.evaluate("document.documentElement.innerText");
For processing, I built a custom parser using cheerio to clean up the DOM, removing unnecessary tags like script and style, keeping only the body, and then converting HTML into Markdown using @mizchi/readability.
This helps reduce token usage (and yes—save money 😄):
import { extract, toMarkdown } from "@mizchi/readability";
import * as cheerio from "cheerio";
function normalizeHtml(html: string) {
try {
const $ = cheerio.load(html);
// Strip tags that only add noise (and tokens) to the extracted content
$("script, style, noscript").remove();
// Keep only the body content
return $("body").html() ?? "";
} catch (error) {
console.warn("Failed to normalize HTML:", error);
return html;
}
}
async function htmlParser(url: string, html: string): Promise<string> {
try {
const normalizedHtml = normalizeHtml(html);
const extracted = extract(normalizedHtml, {
charThreshold: 100,
});
if (!extracted?.root) {
console.warn(`No root element found: ${url}`);
return "";
}
const parsed = toMarkdown(extracted.root);
if (typeof parsed !== "string" || parsed.trim().length === 0) {
console.warn(`Markdown conversion empty: ${url}`);
return "";
}
return parsed;
} catch (error) {
const message = error instanceof Error ? error.message : String(error);
console.error(`HTML parsing failed: ${message}`);
return "";
}
}
export { htmlParser };
Unfortunately, Markdown conversion often fails. My guess is that the library is designed for readability-mode pages, and some sites don’t have that structure. In such cases, I fall back to innerText.
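A minimal sketch of that fallback, reusing the html and text values from the evaluate snippets above (url is whatever page address you pass along for logging):
const markdown = await htmlParser(url, html);
// Fall back to the page's innerText when Markdown conversion comes back empty
const content = markdown.trim().length > 0 ? markdown : text;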
A quick test via curl (the /read endpoint comes from the Bun + Hono wrapper described in the Deployment section below):
curl -X POST http://localhost:9233/read \
-H "Content-Type: application/json" \
-H "Authorization: Bearer my-magic-access-token" \
-d '{"url":"https://www.gengyue.site"}'
Example output:
---
title: gengyue
url: https://www.gengyue.site
---
# Hi 👋!
...
User-Agent & Plugin System
Time to test real-world scraping: Zhihu, Xiaohongshu, and WeChat.
Interestingly, Zhihu and Xiaohongshu worked fine, but WeChat triggered anti-scraping protection.
So I tried a trick: spoofing the User-Agent.
Here are some presets:
export const UA_PRESETS = {
iPhone_WebView: "...MicroMessenger/8.0.49",
iPhone_Safari: "...Safari/604.1",
Android_WebView: "...MicroMessenger/8.0.49",
Android_Chrome: "...",
Desktop_Chrome: "...",
Desktop_Safari: "...",
Baidu_Spider: "...",
Googlebot: "...",
} as const;
Surprisingly, using an iPhone WebView UA bypassed WeChat restrictions.
We define a plugin:
const wechatPlugin = {
name: "wechat",
match(url: string): boolean {
// Exact-match the hostname; endsWith alone would also accept e.g. "xmp.weixin.qq.com"
try {
return new URL(url).hostname === "mp.weixin.qq.com";
} catch {
return false;
}
},
getUserAgent() {
return UA_PRESETS.iPhone_WebView;
},
};
Then apply it:
await view.cdp("Network.setUserAgentOverride", {
userAgent: matchedUA,
});
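Here matchedUA comes from a plugin lookup. A minimal dispatcher sketch (the plugins array and the structural view type are mine, not from the article):
const plugins = [wechatPlugin];

async function applyUserAgent(
  view: { cdp(method: string, params: unknown): Promise<unknown> },
  targetUrl: string,
) {
  // Use the first plugin whose match() claims this URL
  const plugin = plugins.find((p) => p.match(targetUrl));
  if (!plugin) return; // no override: keep the browser's default UA
  await view.cdp("Network.setUserAgentOverride", {
    userAgent: plugin.getUserAgent(),
  });
}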
Deployment
I originally built this for integration with a QQ bot, so I wrapped it into a simple HTTP backend using Bun + Hono.
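The core of the wrapper might look like this. This is a sketch, not the exact code from the repo; the ./browser and ./parser imports, the ACCESS_TOKEN variable, and the CDP-based navigation are my assumptions, since the article only demonstrates view.evaluate and view.cdp:
import { Hono } from "hono";
import { bearerAuth } from "hono/bearer-auth";
import { getBrowserDebuggingURL } from "./browser"; // assumed module layout
import { htmlParser } from "./parser"; // assumed module layout

const app = new Hono();
app.use("/read", bearerAuth({ token: Bun.env.ACCESS_TOKEN ?? "my-magic-access-token" }));

app.post("/read", async (c) => {
  const { url } = await c.req.json<{ url: string }>();
  const view = new Bun.WebView({
    backend: { type: "chrome", url: await getBrowserDebuggingURL() },
    headless: true,
  });
  // Navigate via CDP; a real version should also wait for the page load event
  await view.cdp("Page.navigate", { url });
  const title = await view.evaluate("document.title");
  const html = await view.evaluate("document.documentElement.outerHTML");
  const markdown = await htmlParser(url, html);
  // Fall back to innerText when Markdown conversion comes back empty
  const content = markdown.trim().length > 0
    ? markdown
    : await view.evaluate("document.documentElement.innerText");
  return c.json({ title, url, content });
});

// Bun picks this up as the HTTP server (port matches the curl examples)
export default { port: 9233, fetch: app.fetch };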
Deployment is straightforward: clone the repo, install dependencies, and run with pm2.
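For example, assuming the entry point is src/index.ts and bun is installed at ~/.bun/bin/bun:
bun install
pm2 start src/index.ts --name fig --interpreter ~/.bun/bin/bun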
On Ubuntu, install Chromium:
sudo apt update
sudo apt install -y ca-certificates fonts-liberation fonts-noto-cjk
sudo apt install -y chromium-browser
To make Chromium behave more like a real browser, I also installed xvfb so it can run in non-headless mode.
A systemd service:
[Unit]
Description=Chromium Browser
After=network.target
[Service]
ExecStart=/usr/bin/xvfb-run --auto-servernum --server-args="-screen 0 1920x1080x24" \
/usr/bin/chromium-browser \
--no-sandbox \
--remote-debugging-port=9222 \
--user-data-dir=/tmp/chrome-debug-profile \
https://www.google.com
Restart=always
RestartSec=10
[Install]
WantedBy=default.target
Enable it:
systemctl --user daemon-reload
systemctl --user enable chromium.service
systemctl --user start chromium.service
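One note: a --user service only starts at boot if lingering is enabled for your account, so you may also need:
loginctl enable-linger $USER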
Test again:
curl -X POST http://localhost:9233/read \
-H "Content-Type: application/json" \
-H "Authorization: Bearer my-magic-access-token" \
-d '{"url":"https://www.gengyue.site"}'
Everything works 🎉
Closing Thoughts
This is still a rough experimental setup. It’s not production-grade, but it works surprisingly well as a lightweight scraping backend and can already power bots or automation workflows.
Source code: https://github.com/gengyue2468/fig
The original article was published by me in Chinese (简体中文):
https://www.gengyue.site/blog/build-fig-via-bun-webview/