DEV Community

Custodia-Admin

Posted on • Originally published at pagebolt.dev

WebMCP just shipped in Chrome 146. Here's what it means for screenshot APIs.

Google shipped WebMCP in Chrome 146 Canary last week. The feature is real, the coverage is accurate, and if you build anything that touches browser automation or AI agents, you should understand what it does.

Here is the short version: WebMCP is a proposed W3C standard, built jointly by Google and Microsoft, that lets websites expose structured callable tools directly to AI agents through a new browser API (navigator.modelContext). Instead of an agent scraping the DOM or simulating clicks, the website declares what it can do and the agent calls it directly. Think of it as MCP, but running inside the browser tab rather than on a separate server.

The early preview is Chrome 146 Canary only, behind a flag. Stable Chrome is months away. Meaningful site adoption is further out than that.


What WebMCP actually covers

The canonical example from the spec: a travel site exposes a searchFlights() tool. An AI agent calls it with structured parameters, gets structured data back, and books a flight without ever having to parse a DOM, click a button, or handle a flaky selector.
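To make the searchFlights() idea concrete, here is a minimal sketch of a tool declaration and an agent-style call. Everything in it is an assumption made for illustration: the Tool shape, the InMemoryModelContext class (a stand-in so the sketch runs outside Chrome), the flight fields, and the made-up inventory. The real navigator.modelContext API is still in preview and may look different.

```typescript
// Hypothetical shapes for illustration only; the real WebMCP API may differ.
interface FlightQuery {
  origin: string;      // airport code, e.g. "SFO"
  destination: string; // airport code, e.g. "JFK"
  date: string;        // ISO date, e.g. "2026-03-01"
}

interface Flight {
  flightNumber: string;
  origin: string;
  destination: string;
  date: string;
  priceUsd: number;
}

interface Tool<Args, Result> {
  name: string;
  description: string;
  execute(args: Args): Promise<Result>;
}

// Stand-in for the browser-side registry, so the sketch runs without Chrome.
class InMemoryModelContext {
  private tools = new Map<string, Tool<any, any>>();

  registerTool<A, R>(tool: Tool<A, R>): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Tool already registered: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // What an agent would do: call a declared tool by name with structured arguments.
  async callTool(name: string, args: unknown): Promise<unknown> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.execute(args);
  }
}

// Made-up inventory standing in for the site's own search logic.
const inventory: Flight[] = [
  { flightNumber: "XX100", origin: "SFO", destination: "JFK", date: "2026-03-01", priceUsd: 320 },
  { flightNumber: "XX200", origin: "SFO", destination: "LAX", date: "2026-03-01", priceUsd: 120 },
  { flightNumber: "XX300", origin: "SFO", destination: "JFK", date: "2026-03-02", priceUsd: 290 },
];

const searchFlights: Tool<FlightQuery, Flight[]> = {
  name: "searchFlights",
  description: "Search available flights by origin, destination, and date.",
  async execute({ origin, destination, date }) {
    return inventory.filter(
      (f) => f.origin === origin && f.destination === destination && f.date === date,
    );
  },
};

const modelContext = new InMemoryModelContext();
modelContext.registerTool(searchFlights);
```

An agent-side call would then be modelContext.callTool("searchFlights", { origin: "SFO", destination: "JFK", date: "2026-03-01" }), which resolves to structured flight records rather than markup that has to be parsed.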

That is genuinely useful for the sites that implement it: structured data, no scraping fragility, and actions that run in the context of the user's existing auth session.

The key phrase: the sites that implement it.

WebMCP requires each website to explicitly opt in. The site wraps its client-side JavaScript functions as agent-callable tools. The spec recommends fewer than 50 tools per page. Right now, the number of sites with WebMCP support is effectively zero outside of demos. That will change, but the change takes years, not months. Adoption of web standards follows a long tail: robots.txt was proposed in 1994, and there are still major sites that don't implement it correctly.
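The opt-in step could look roughly like the sketch below: the page checks whether the browser exposes a model context before registering anything, so visitors on browsers without WebMCP are unaffected. The AgentTool and ModelContextLike shapes, the exposeTools helper, and the addToCart function are all hypothetical names chosen for this example, and the registerTool method is an assumption about the preview API rather than a confirmed signature.

```typescript
// Assumed minimal shapes; the preview API may use different names or fields.
interface AgentTool {
  name: string;
  description: string;
  execute(args: Record<string, unknown>): Promise<unknown>;
}

interface ModelContextLike {
  registerTool(tool: AgentTool): void;
}

// Returns the host's model context if it looks usable, otherwise undefined.
// In a page, `host` would be `navigator`.
function getModelContext(host: unknown): ModelContextLike | undefined {
  if (typeof host !== "object" || host === null) return undefined;
  const candidate = (host as { modelContext?: unknown }).modelContext;
  if (typeof candidate !== "object" || candidate === null) return undefined;
  const registerTool = (candidate as { registerTool?: unknown }).registerTool;
  return typeof registerTool === "function" ? (candidate as ModelContextLike) : undefined;
}

// Registers the tools when supported and returns how many were registered.
function exposeTools(host: unknown, tools: AgentTool[]): number {
  const ctx = getModelContext(host);
  if (!ctx) return 0; // No WebMCP support: the page keeps working as before.
  for (const tool of tools) ctx.registerTool(tool);
  return tools.length;
}

// A client-side function the site already has (hypothetical).
const cart: string[] = [];
function addToCart(sku: string): number {
  cart.push(sku);
  return cart.length;
}

// The same function wrapped as a tool with validated, structured input.
const addToCartTool: AgentTool = {
  name: "addToCart",
  description: "Add a product to the cart by SKU.",
  async execute(args) {
    const sku = args["sku"];
    if (typeof sku !== "string") throw new Error("sku must be a string");
    return { itemsInCart: addToCart(sku) };
  },
};
```

In a page this would be called as exposeTools(navigator, [addToCartTool]). Passing the host in as a parameter, instead of reading the global directly, keeps the check easy to exercise outside a browser.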


What WebMCP does not cover

WebMCP is about structured interaction with cooperating websites. It is not a visual capture protocol. Here is what falls outside it entirely:

Video recording. No WebMCP tool returns an MP4 of a browser session with cursor effects, AI voice narration, and step-by-step annotations. This capability doesn't exist in the spec and isn't related to what the spec is trying to do. If you want a narrated video demo of a checkout flow to post in a PR comment, WebMCP is not the tool.

PDF generation. Rendering a URL or raw HTML to PDF is a visual pipeline — it captures the page as rendered, including fonts, images, and layout. WebMCP exposes data; it does not render documents.

OG image generation. Same category. Generating a social card image from a title and description is a visual task. Structured tool calling has nothing to offer here.

Sites that don't implement WebMCP. Today that is effectively the entire web. Legacy sites, competitor sites, third-party tools you want to capture, and any page that hasn't opted in are all still just HTML and pixels. Screenshot-based capture is not going anywhere; it is the fallback for everything that doesn't cooperate, which is most things.

Archival and visual regression. A screenshot records what something looked like at a specific moment, at a specific viewport, on a specific device. A structured API call records what data a site chose to expose. These are different things for different purposes.
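The split described in this section can be summarized as a small routing function. This is only a sketch of the reasoning, not part of any spec or product: the Task and Route types and the chooseRoute name are made up for illustration, and "dom-automation" stands for the pre-WebMCP approach of scraping the DOM or simulating clicks.

```typescript
// Hypothetical task and route names, chosen for this example.
type Task =
  | { kind: "action"; tool: string } // structured call, e.g. searchFlights
  | { kind: "screenshot" | "pdf" | "og-image" | "video" | "visual-regression" };

type Route = "webmcp" | "visual-capture" | "dom-automation";

// siteTools lists the tool names a site has declared; it is empty for sites
// that have not opted in to WebMCP.
function chooseRoute(task: Task, siteTools: readonly string[]): Route {
  // Visual outputs are outside WebMCP's scope no matter what the site exposes.
  if (task.kind !== "action") return "visual-capture";
  // Structured actions use WebMCP only when the site declared the tool.
  return siteTools.includes(task.tool) ? "webmcp" : "dom-automation";
}
```

Note that the visual branch never consults siteTools: even a site with full WebMCP support still needs a rendering pipeline for PDFs, images, and video.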


How PageBolt fits

PageBolt handles the visual side of web capture. Screenshots with 25+ device presets, cookie banner removal, ad blocking, and custom CSS injection. PDFs from URLs or raw HTML. OG images with templates or custom HTML. Multi-step browser sequences. And video recording with AI voice narration — you define browser steps, add notes to each one, and get back an MP4 with cursor effects, browser chrome, and a narrated voice that reads each step.

The CI/CD workflow: push a PR, a GitHub Action calls PageBolt, a narrated demo video of your changes posts to the PR comment automatically. The reviewer watches a 30-second video instead of reading a description of a diff. WebMCP has nothing to say about any of that.

The honest read: WebMCP and screenshot APIs are not substitutes. They are aimed at different problems. WebMCP is for structured agent interaction with sites that have opted in. Screenshot APIs are for visual capture, document rendering, and any site that hasn't opted in and won't.


What to do now

If you are building a site and want to stay ahead: watch the WebMCP spec. It is worth implementing when stable Chrome support arrives, if your site has actions that benefit from structured agent access.

If you need screenshots, PDFs, OG images, or narrated video recordings today: PageBolt has a free tier with 100 requests per month. No credit card required.

The agentic web is being built in layers. WebMCP is one layer. Visual capture is a different one. Both will be part of the stack.
