DEV Community

Astrodevil

Posted on • Originally published at insforge.dev

WebMCP: A Browser-Native Execution Model for AI Agents

On February 13, Google announced the Early Preview of WebMCP, introducing a browser-native way for AI agents to interact with websites. To understand why this matters, consider how agents operate today.

AI agents interpret interfaces by parsing the DOM, inspecting accessibility trees, analyzing rendered pages, and then simulating clicks or inputs. Each action depends on inference over presentation layers. This increases token usage, adds latency, and often leads to brittle execution.

The limitation is structural. The web was designed for people navigating interfaces. Agents, however, require clearly defined capabilities they can invoke programmatically.

WebMCP addresses this gap by allowing websites to register structured JavaScript functions that agents can call directly within the browser runtime. These tools execute under existing session state and same-origin constraints, exposing only what the site explicitly defines.

The result is a more direct model of interaction that aligns frontend systems with the deterministic tool patterns already established in backend MCP integrations.

In this article, we examine WebMCP’s architecture, how it compares to traditional MCP, and what it signals for agent-driven web infrastructure.

Model Context Protocol (MCP): Current State and Browser Constraints

Model Context Protocol (MCP) established a structured model for how AI agents interact with external systems. Tools are defined with clear schemas, agents invoke them with structured inputs, and responses return in predictable formats. This ensures deterministic execution rather than relying on free-form reasoning.

The architecture is typically client–server. An agent connects to an MCP server that exposes tools wrapping APIs, databases, or internal services. This model fits naturally in backend environments where execution happens outside the browser.
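
For context, a call to a traditional MCP server travels as a JSON-RPC 2.0 message. The sketch below builds a `tools/call` request of the kind an agent might send; the tool name `query_orders` and its arguments are hypothetical.

```javascript
// Build a JSON-RPC 2.0 request asking an MCP server to run a tool.
// "tools/call" is the MCP method name; the tool name and arguments
// used in the example call are hypothetical.
function buildToolCallRequest(id, toolName, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

const request = buildToolCallRequest(1, "query_orders", { status: "open" });
console.log(JSON.stringify(request, null, 2));
```

The message has to cross a transport to reach the server, which is why the server does not see the browser's session by default.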

Web applications operate under different assumptions. User identity, session state, and much of the application logic live inside the browser. Authentication flows depend on cookies and federated login systems tied to that session. An external MCP server does not automatically inherit this context, which complicates authorization and state management.

Because of this separation, agents interacting with web applications often end up controlling the interface itself instead of invoking structured capabilities.

WebMCP Technical Overview

WebMCP is a browser-native API that allows websites to expose structured, agent-callable tools directly within the page runtime. It adapts the conceptual model of Model Context Protocol (schema-defined tools invoked by agents) but implements it specifically for client-side execution inside the browser.

At its core, WebMCP introduces a new browser surface:

navigator.modelContext

This interface allows a web page to register capabilities that AI agents can discover and invoke. Each tool consists of:

  • A name
  • A description
  • An input schema (structured definition of parameters)
  • An execution handler
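
As a rough illustration, the four parts can be written as one plain object. The property names `inputSchema` and `execute`, and the tool itself (`add_to_cart`), are assumptions made for this sketch; the draft specification is the authority on the exact shape.

```javascript
// Hypothetical tool descriptor with the four parts listed above.
// The property names inputSchema and execute are assumptions.
const addToCartTool = {
  name: "add_to_cart",
  description: "Add a product to the current user's shopping cart.",
  // JSON Schema describing the parameters the agent must supply.
  inputSchema: {
    type: "object",
    properties: {
      productId: { type: "string", description: "Catalog ID of the product" },
      quantity: { type: "integer", minimum: 1 },
    },
    required: ["productId", "quantity"],
  },
  // Handler: runs in the page and returns a structured result.
  async execute({ productId, quantity }) {
    return { ok: true, productId, quantity };
  },
};
```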

Unlike traditional MCP, WebMCP does not rely on a separate JSON-RPC server. The web page itself becomes the tool provider. Execution occurs in the same JavaScript environment as the application logic.

The formal specification is being developed under the W3C Web Machine Learning Community Group and is available at: https://webmachinelearning.github.io/webmcp/

Tool Exposure and Execution Model

WebMCP defines how capabilities are exposed and how agents invoke them inside the browser runtime. It supports two ways of exposing tools:

1. Declarative API (HTML-based)

Forms can be annotated with metadata that enables automatic tool registration. The browser derives the tool definition from form inputs, enabling simple actions to be agent-callable without additional JavaScript.
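
To make the idea concrete, the sketch below shows one way a tool definition could be derived from a form's fields. It is a conceptual model only: the function `deriveToolFromForm` and the field-description format are hypothetical, and the actual HTML annotations and derivation rules are defined by the browser, not by this code.

```javascript
// Conceptual sketch: map a plain description of form fields to a tool
// definition. deriveToolFromForm and the field format are hypothetical.
function deriveToolFromForm(form) {
  const properties = {};
  const required = [];
  for (const field of form.fields) {
    // Number inputs become numeric parameters; everything else is a string.
    properties[field.name] = {
      type: field.type === "number" ? "number" : "string",
    };
    if (field.required) required.push(field.name);
  }
  return {
    name: form.toolName,
    description: form.toolDescription,
    inputSchema: { type: "object", properties, required },
  };
}

const searchTool = deriveToolFromForm({
  toolName: "search_flights",
  toolDescription: "Search flights by destination and passenger count.",
  fields: [
    { name: "destination", type: "text", required: true },
    { name: "passengers", type: "number", required: false },
  ],
});
console.log(JSON.stringify(searchTool, null, 2));
```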

2. Imperative API (JavaScript-based)

Developers can programmatically register tools using:

navigator.modelContext.registerTool({...})

This method provides full control over input schemas and execution logic, enabling dynamic, state-aware, or complex capabilities.
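
A minimal sketch of imperative registration follows. It assumes the descriptor takes `name`, `description`, `inputSchema`, and an `execute` callback, which may differ from the current draft; the tool `get_cart_total` and the `cart` object are hypothetical. The navigator object is passed in as a parameter so the function can be exercised outside a browser.

```javascript
// Register a state-aware tool if the given navigator supports WebMCP.
// Assumed descriptor shape: { name, description, inputSchema, execute }.
function registerCartTotalTool(nav, cart) {
  if (!nav || !nav.modelContext) return false; // WebMCP not available
  nav.modelContext.registerTool({
    name: "get_cart_total",
    description: "Return the total price of the items in the cart.",
    inputSchema: { type: "object", properties: {} },
    // The handler reads live application state at call time.
    execute: async () => ({
      total: cart.items.reduce((sum, item) => sum + item.price * item.qty, 0),
    }),
  });
  return true;
}

// In a page this would be called as: registerCartTotalTool(navigator, cart);
```

Because the handler closes over `cart`, each invocation reflects the cart as it is at that moment rather than at registration time.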

When an agent loads a WebMCP-enabled page:

  1. The browser exposes the registered tools.
  2. The agent inspects available capabilities.
  3. The agent invokes a selected tool with structured parameters.
  4. The handler executes inside the page runtime.
  5. A structured response is returned to the agent.
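
The five steps can be modeled with a small in-memory registry. This is a simulation of the flow, not the browser's implementation; `ToolRegistry`, `listTools`, and `invoke` are hypothetical names.

```javascript
// Toy model of the discovery-and-invocation flow described above.
// ToolRegistry, listTools, and invoke are hypothetical names.
class ToolRegistry {
  constructor() {
    this.tools = new Map();
  }
  // The page registers a tool so it can be exposed to agents.
  registerTool(tool) {
    this.tools.set(tool.name, tool);
  }
  // Steps 1-2: the agent sees names, descriptions, and schemas only.
  listTools() {
    return [...this.tools.values()].map(({ name, description, inputSchema }) => ({
      name,
      description,
      inputSchema,
    }));
  }
  // Steps 3-5: route structured input to the handler and return its result.
  async invoke(name, input) {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    return tool.execute(input);
  }
}

const registry = new ToolRegistry();
registry.registerTool({
  name: "echo",
  description: "Return the message that was sent.",
  inputSchema: { type: "object", properties: { message: { type: "string" } } },
  execute: async ({ message }) => ({ message }),
});

registry.invoke("echo", { message: "hello" }).then((result) => {
  console.log(result); // structured response handed back to the agent
});
```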

The defining characteristic of WebMCP is locality. Tool execution happens inside the browser session, inheriting:

  • Current authentication state
  • Session cookies
  • Same-origin boundaries

This removes the need for an external transport layer or a separate authorization stack.

WebMCP focuses specifically on schema-defined tool invocation optimized for browser environments, adapting MCP concepts to client-side execution.

Core Architectural Components

WebMCP introduces a browser-mediated architecture that connects agents directly to application capabilities without external transport layers.

[Figure: WebMCP Architecture]

The execution path runs through four components:

  • AI Agent: The agent discovers registered tools, selects one based on user intent, sends structured input that conforms to the declared schema, then receives structured output. Interaction occurs through explicit capabilities rather than direct interface manipulation.
  • Browser Runtime Control Plane: The browser exposes navigator.modelContext, which maintains the tool registry, validates inputs against schemas, routes invocations to the appropriate handler, enforces same-origin boundaries, and executes handlers within the active page context. This removes the need for an external transport layer or separate MCP server.
  • Tool Layer Capability Surface: Each tool defines a named capability, its expected input schema, and an execution handler. These tools form a contract between the application and the agent. Only declared capabilities are accessible.
  • Application Execution Layer: Handlers run in the same JavaScript environment as the web application. They can access session cookies, rely on existing authentication state, call internal services, and update application state. Execution remains within the active browser session.
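
As an illustration of session inheritance, a handler can call one of the site's own endpoints and let the browser attach the session cookies. The endpoint `/api/orders` and the function names are hypothetical; `fetchImpl` is injected so the sketch can also run outside a browser.

```javascript
// Build a handler that lists orders through a same-origin endpoint.
// /api/orders is a hypothetical route; fetchImpl defaults to the
// global fetch when one exists.
function makeListOrdersHandler(fetchImpl = globalThis.fetch) {
  return async function listOrders({ status }) {
    const url = `/api/orders?status=${encodeURIComponent(status)}`;
    // "same-origin" sends cookies only with requests to the page's own origin.
    const response = await fetchImpl(url, { credentials: "same-origin" });
    if (!response.ok) {
      return { ok: false, error: `Request failed with status ${response.status}` };
    }
    return { ok: true, orders: await response.json() };
  };
}
```

No token is passed to the handler: under this assumption the existing login session is the only credential involved.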

The overall flow is direct. The page loads and registers tools. The agent inspects available capabilities and invokes one with structured input. The browser validates the request, executes the handler inside the page runtime, and returns structured output to the agent.

Comparison with Traditional MCP and Browser Automation

WebMCP sits between backend MCP servers and browser automation frameworks. The differences become clearer when compared across architecture, execution model, and capability exposure.

| Capability | Traditional MCP | Browser Automation (Selenium / Playwright) | WebMCP |
| --- | --- | --- | --- |
| Execution Location | External server | Inside browser via UI control | Inside browser via declared tools |
| Transport Layer | JSON-RPC or similar | WebDriver or DevTools protocol | Browser-native API |
| Interaction Surface | Structured tools | DOM elements and selectors | Schema-defined tools |
| Session Inheritance | Requires coordination | Native to browser session | Native to browser session |
| Authentication Handling | Separate from browser | Uses active browser state | Uses active browser state |
| Dependency on UI Layout | None | High | None |
| Token Overhead | Low | High due to DOM inspection | Low due to structured schemas |
| Determinism | High | Medium, selector-dependent | High |

Traditional MCP provides structured invocation but operates outside the browser context. Browser automation preserves session state but relies on interface manipulation. WebMCP combines structured schemas with in-browser execution, exposing declared capabilities without depending on layout or selectors.

Security Model and Execution Boundaries

WebMCP narrows the interaction surface between agents and web applications by constraining execution to explicitly declared tools.

  • Explicit Capability Exposure: Only registered tools are visible to the agent. The agent cannot arbitrarily traverse the DOM or trigger undocumented behaviors unless those capabilities are intentionally exposed.
  • Same-Origin Enforcement: Tool execution occurs under the browser’s same-origin policy. A page can expose capabilities only within its own origin boundary. Cross-site execution is not permitted by default.
  • Session Inheritance: Tools execute within the active browser session. They inherit authentication state, cookies, and user context already established in the page. There is no additional credential exchange layer introduced by WebMCP itself.
  • Controlled Invocation Surface: Input parameters must conform to declared schemas. The browser validates structured inputs before routing execution, limiting malformed or unexpected calls.
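
A deliberately simplified sketch of the validation step follows. It checks required fields and primitive types only, not full JSON Schema, and `validateInput` is a hypothetical helper rather than a browser API.

```javascript
// Simplified input check: required keys and primitive types only.
// This is NOT a full JSON Schema validator; validateInput is hypothetical.
function validateInput(schema, input) {
  const errors = [];
  const properties = schema.properties || {};
  for (const key of schema.required || []) {
    if (!(key in input)) errors.push(`Missing required field: ${key}`);
  }
  for (const [key, value] of Object.entries(input)) {
    const declared = properties[key];
    if (!declared) continue; // undeclared fields are ignored in this sketch
    if (declared.type === "integer") {
      if (!Number.isInteger(value)) errors.push(`Field ${key} must be an integer`);
    } else if (typeof value !== declared.type) {
      // Covers the "string", "number", and "boolean" schema types.
      errors.push(`Field ${key} must be of type ${declared.type}`);
    }
  }
  return { valid: errors.length === 0, errors };
}

const schema = {
  type: "object",
  properties: { productId: { type: "string" }, quantity: { type: "integer" } },
  required: ["productId", "quantity"],
};
console.log(validateInput(schema, { productId: "p1", quantity: 2 }));
console.log(validateInput(schema, { productId: 42 }));
```

Rejecting a call before the handler runs keeps malformed input out of application code, which is the point of the controlled invocation surface.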

WebMCP reduces the attack surface compared to interface-level automation by limiting what the agent can access to declared functions. It does not eliminate broader risks, such as prompt injection within tool logic, but it constrains execution to defined capability boundaries enforced by the browser runtime.

Chrome Early Preview and Built-In AI Strategy

WebMCP is available through Chrome’s Early Preview Program and can be enabled in experimental Chromium builds. The preview allows developers to test tool registration via navigator.modelContext and evaluate structured agent interaction inside the browser.

WebMCP complements Chrome’s Built-In AI APIs, which support on-device model execution. While Built-In AI enables local inference, WebMCP defines how agents interface with web applications through declared tools.

Together, these initiatives position the browser as both an AI execution environment and a structured capability surface for external agents.

InsForge and Model Context Protocol

InsForge is an open-source backend-as-a-service platform built for AI-assisted development. It provides core backend infrastructure, including database management, authentication, storage, serverless functions, and AI integrations. Its APIs are structured to support deterministic agent execution.

At its core, InsForge exposes a Model Context Protocol server that allows AI agents to interact with backend resources through schema-defined tools. Agents can inspect database schemas, execute queries, manage authentication, perform storage operations, and invoke backend functions using structured inputs and predictable responses.

This MCP-based design enables agents to complete backend workflows with clearer execution paths and reduced ambiguity. By exposing explicit capability contracts, InsForge supports reliable multi-step operations without relying on interface-level automation.

Summary

WebMCP gives AI agents a defined way to interact with web apps inside the browser. Instead of scraping the DOM or simulating clicks, agents call explicitly declared functions with typed schemas.

Those functions execute within the user’s active session and respect normal browser security boundaries. This makes agent behavior more predictable and easier to reason about.

InsForge leverages Model Context Protocol (MCP) to provide structured, schema-defined backend capabilities for AI agents, enabling deterministic execution and more reliable infrastructure for AI-native applications.

