Introduction
AI agents are becoming increasingly capable of assisting users in their daily web activities. However, we cannot yet let them act completely autonomously: there is still a risk that they click the wrong element, for instance.
In theory, these agents are capable of performing impressive tasks, provided they are guided step-by-step through the interface. The challenge here is not a lack of intelligence in the model, nor a shortage of web APIs to expose data to the agent. The core issue lies in the fact that the agent must currently "guess" its way through applications that were designed exclusively for humans.
This is precisely the problem that WebMCP is here to solve.
It is important to note that WebMCP tools are not intended to replace standard APIs as access points for an application. Instead, they give a web application a structured way to tell the AI agent running in the browser how to navigate its interface. This results in:
- Fewer misplaced clicks.
- Less trial-and-error when interacting with the UI.
When used to its full potential, WebMCP could redefine the user experience in the coming years.
What is WebMCP?
As you may have guessed, WebMCP is a browser-side "guide/standard" for exposing tools to an AI agent directly from an active web page.
During Google I/O, this new feature was introduced as a way for web applications to describe how a page functions—and what actions can be performed—to various AI agents. As a result, agents can execute these described actions faster, more efficiently, and with greater precision.
Unsurprisingly, the syntax for creating these descriptions relies on JavaScript functions. These functions take natural language descriptions as parameters, along with structured schemas directly exposed from the web page.
This is exactly where the power of WebMCP lies.
Today, while we have Playwright (designed for end-to-end testing of web applications) and Playwright MCP (which extends this model to LLMs), WebMCP sits right at the intersection. It allows a page to effectively communicate to the agent: "Here are the actions you can perform on this page, how it works, and how you can trigger them."
Differences between WebMCP, MCP, and browser automation
At first glance, it may seem like all these approaches are similar and perform the same functions. In reality, each serves a well-defined purpose, and they are highly complementary. To better understand the distinctions between them, let’s look at things from an AI integration perspective.
Backend MCP
At its core, an MCP server acts as a specialized bridge between AI models and external systems. It functions by providing a direct connection to APIs, effectively removing the friction typically involved in integrating new tools. By establishing a structured framework for data access, it ensures that the information exposed to the agent is organized, predictable, and secure. This architecture makes it an ideal solution for executing data-driven actions, allowing AI agents to perform complex, precise tasks with the reliability required for production-grade applications.
Browser Automation
Traditional automation concepts rely on mimicking user behavior to interact with the web. They reproduce human-like interactions, such as clicking buttons or scrolling through pages, and can access any element within the UI by targeting it directly in the DOM. A key characteristic of these methods is that they do not require a predefined data structure, as they operate by observing the visual state of the application. However, because they are so tightly coupled to the underlying HTML structure, the main flow can be quite brittle; even minor UI changes often cause the automation to break, requiring constant maintenance.
WebMCP
By design, WebMCP shares the current application context directly with the AI, moving beyond what is merely visible on the screen. It structures and describes UI actions, providing the agent with a clear map of what is possible rather than forcing it to guess. This results in a highly reliable, context-aware interaction model that significantly reduces errors. Furthermore, the architecture is designed with human-in-the-loop capabilities at its core, ensuring that users maintain oversight and can intervene whenever necessary for sensitive or complex workflows.
When to use what: rules to apply.
To ensure your AI agent performs at its best, it is important to select the right integration strategy based on your specific requirements:
- When an agent requires direct, structured access to data without interacting with a live web page, an MCP server is the ideal solution.
- When you need to interact with a web interface exactly like a human would—whether for automated testing, task execution, or full application usage—browser automation is the right choice.
- When a user is already actively navigating your web application and requires an agent that can interact with the entire page with high precision, WebMCP is the optimal path forward.
Different layers, different jobs.
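As a rough illustration, the three rules above can be condensed into a small decision helper. This is only a sketch: `chooseIntegration` and its two flags are hypothetical names introduced here, and real projects will usually weigh more factors than these.

```javascript
// Hypothetical helper that encodes the three rules listed above.
// needsLivePage: the task requires interacting with a rendered web page.
// userIsOnPage:  a user is already navigating your application in the browser.
function chooseIntegration({ needsLivePage, userIsOnPage }) {
  // Structured data access with no live page involved: backend MCP server.
  if (!needsLivePage) return 'MCP server';
  // A user is present on your page and wants precise assistance: WebMCP.
  if (userIsOnPage) return 'WebMCP';
  // Human-like interaction with the interface (testing, task execution).
  return 'browser automation';
}

console.log(chooseIntegration({ needsLivePage: true, userIsOnPage: true }));
```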
How to use WebMCP
It is worth noting that, for now, WebMCP is only available in Chrome starting from version 149 (with a strong push to bring the feature into Baseline).
To enable this functionality in Chrome today, you need to use a feature flag:
- Navigate to chrome://flags in your browser.
- Search for and enable the enable-webmcp-testing flag.
- Relaunch your browser to apply the changes.
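Because the feature sits behind a flag, a page may want to check for it before registering anything. The following is a minimal sketch, assuming the API is exposed as `document.modelContext` as in the imperative example later in this article. `supportsWebMCP` is a hypothetical helper name, and it takes the document as a parameter so it can also be exercised outside a browser.

```javascript
// Hypothetical helper: returns true when the given document-like object
// exposes a WebMCP entry point with a registerTool function.
// Assumption: the API lives at `document.modelContext`, as used in this article.
function supportsWebMCP(doc) {
  return Boolean(
    doc &&
    doc.modelContext &&
    typeof doc.modelContext.registerTool === 'function'
  );
}

// In a browser, pass the real document; elsewhere (tests, SSR) this is false.
const available = supportsWebMCP(typeof document !== 'undefined' ? document : undefined);
console.log(available ? 'WebMCP available' : 'WebMCP not available, skipping tool registration');
```

Guarding registration this way lets the page behave normally in browsers where the flag is off.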
There are two primary ways to utilize the WebMCP APIs to configure the tools exposed by your page:
- The Declarative API: This method enables you to create WebMCP tools simply by adding annotations to your standard HTML forms, making the integration process more streamlined and semantic.
- The Imperative API: This approach allows you to define custom tools such as form submission, navigation, or other specialized actions directly using standard JavaScript functions.
Defining tools using the Imperative API
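// Placeholder endpoint for illustration; replace with your own search API.
const url = '/api/products';

document.modelContext.registerTool({
  name: 'search_product',
  description: 'Search products based on a user search input',
  inputSchema: {
    type: 'object',
    properties: {
      search: { type: 'string' }
    },
    required: ['search']
  },
  outputSchema: { type: 'string', description: 'List of products corresponding to the search' },
  execute: async ({ search }) => {
    // Return a JSON string in every case, to match the outputSchema.
    if (!search) return JSON.stringify([]);
    // Await the response first, then parse its JSON body.
    const response = await fetch(`${url}?search=${encodeURIComponent(search)}`);
    const products = await response.json();
    return JSON.stringify(products);
  }
});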
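To make a tool like this easier to test outside the browser, the descriptor can be built by a small factory that receives its dependencies. This is a minimal sketch, not a prescribed pattern: `createSearchTool`, the stub fetch, and the `/api/products` endpoint are hypothetical, and only the descriptor fields (`name`, `description`, `inputSchema`, `execute`) come from the example above.

```javascript
// Hypothetical factory: builds the tool descriptor with an injected fetch
// implementation and endpoint, so `execute` can run without a browser.
function createSearchTool(fetchImpl, endpoint) {
  return {
    name: 'search_product',
    description: 'Search products matching a user-provided query.',
    inputSchema: {
      type: 'object',
      properties: { search: { type: 'string' } },
      required: ['search'],
    },
    execute: async ({ search }) => {
      // Always return a JSON string so callers get a consistent type.
      if (!search) return JSON.stringify([]);
      const response = await fetchImpl(`${endpoint}?search=${encodeURIComponent(search)}`);
      if (!response.ok) throw new Error(`Search failed with status ${response.status}`);
      return JSON.stringify(await response.json());
    },
  };
}

// Usage with a stub in place of the real fetch (the endpoint is a placeholder).
const stubFetch = async (requestUrl) => ({
  ok: true,
  status: 200,
  json: async () => [{ id: 1, name: 'Blue mug', requestUrl }],
});
const searchTool = createSearchTool(stubFetch, '/api/products');
searchTool.execute({ search: 'blue mug' }).then((result) => console.log(result));
```

In a page, the same factory could be called with the real `fetch` and its result passed to `document.modelContext.registerTool`, assuming the API shape used in this article.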
Defining a form tool with the Declarative API
As previously mentioned, the Declarative API allows you to create form-based tools using annotations. More specifically, it leverages standard HTML attributes on your forms to define and manage how the tool is created.
<form toolname="supportRequestTool"
      tooldescription="Submit a request for support."
      action="/submit">
  <label for="firstName">First Name</label>
  <input type="text" id="firstName" name="firstName">
  <label for="lastName">Last Name</label>
  <input type="text" id="lastName" name="lastName">
  <select name="select" required
          toolparamdescription="Determines what team this request is routed to.">
    <option value="Customer happiness team">Return my purchase.</option>
    <option value="Distribution team">Check where my package is.</option>
    <option value="Website support team">Get help on the website.</option>
  </select>
  <button type="submit">Submit</button>
</form>
The declaration process relies on specific attributes to define your tools:
- toolname: Assigns a specific name to your tool, reflecting its purpose.
- tooldescription: Provides a clear explanation of the action the tool performs.
- toolparamdescription: This optional attribute maps a specific element to a detailed description. If it is omitted, the agent defaults to using the field's label as the description.
In practice, when the agent invokes the supportRequestTool, the browser brings the corresponding form to the foreground and fills out the fields while keeping the form visible to the user. Note that if you remove either the toolname or tooldescription attribute, the tool will be automatically unregistered and become inaccessible to the agent.
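One way to picture what these annotations describe is to write out the tool description a form could correspond to. The sketch below is purely illustrative: `describeFormTool` and the plain field objects are hypothetical, and the browser's actual mapping from form to schema may differ. It applies the fallback rule mentioned above, using the label when no `toolparamdescription` is given.

```javascript
// Illustrative only: derive a tool description from plain objects that mirror
// the annotated form. This is NOT the browser's algorithm, just a mental model.
function describeFormTool({ toolname, tooldescription, fields }) {
  // Per the text above, both attributes are needed for the tool to exist.
  if (!toolname || !tooldescription) return null;

  const properties = {};
  const required = [];
  for (const field of fields) {
    properties[field.name] = {
      type: 'string',
      // Fall back to the label when toolparamdescription is omitted.
      description: field.toolparamdescription || field.label || '',
    };
    if (field.options) properties[field.name].enum = field.options;
    if (field.required) required.push(field.name);
  }
  return {
    name: toolname,
    description: tooldescription,
    inputSchema: { type: 'object', properties, required },
  };
}

// Plain-object mirror of the support request form shown earlier.
const supportTool = describeFormTool({
  toolname: 'supportRequestTool',
  tooldescription: 'Submit a request for support.',
  fields: [
    { name: 'firstName', label: 'First Name' },
    { name: 'lastName', label: 'Last Name' },
    {
      name: 'select',
      required: true,
      toolparamdescription: 'Determines what team this request is routed to.',
      options: ['Customer happiness team', 'Distribution team', 'Website support team'],
    },
  ],
});
console.log(JSON.stringify(supportTool, null, 2));
```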
Security
As a reminder, LLMs process all text—including instructions and user data—as a single sequence of tokens. Consequently, once your application implements the WebMCP feature, it becomes susceptible to indirect prompt injection, where malicious instructions can be hidden within the content.
To mitigate these risks, here are several security recommendations for implementing WebMCP:
- untrustedContentHint: Use this attribute to signal to the agent that the data originates from an external source, prompting it to exercise greater vigilance.
- readOnlyHint: Apply this attribute to tools that should not modify data states, ensuring they require explicit agent confirmation before execution.
- Character budgets: Define strict character limits for your tool names and descriptions to prevent prompt-injection attempts via long, malicious strings.
- exposedTo: Use this property to restrict access to your tools, limiting them to specific, trusted domains.
It is important to remember that by default, tools are only exposed to the AI agent and are not accessible to other websites or iframes.
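The character-budget recommendation is the easiest of these to enforce in application code before a tool is ever registered. Here is a minimal sketch, assuming hypothetical limits (the values 64 and 300 are placeholders, not values prescribed by WebMCP) and a hypothetical `checkCharacterBudgets` helper.

```javascript
// Hypothetical limits; no specific values are prescribed in this article.
const BUDGETS = { name: 64, description: 300 };

// Returns a list of budget violations for a tool descriptor (empty when it fits).
function checkCharacterBudgets(tool, budgets = BUDGETS) {
  const problems = [];
  for (const [field, limit] of Object.entries(budgets)) {
    const value = tool[field];
    if (typeof value !== 'string' || value.length === 0) {
      problems.push(`${field} is missing`);
    } else if (value.length > limit) {
      problems.push(`${field} is ${value.length} characters, limit is ${limit}`);
    }
  }
  return problems;
}

const okTool = { name: 'search_product', description: 'Search products matching a user-provided query.' };
const longTool = { name: 'search_product', description: 'x'.repeat(301) };
console.log(checkCharacterBudgets(okTool));   // []
console.log(checkCharacterBudgets(longTool)); // [ 'description is 301 characters, limit is 300' ]
```

Running a check like this at build time or just before registration keeps oversized strings, for example ones assembled from user-generated content, from reaching the agent.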
Conclusion
WebMCP represents a fundamental shift in how AI agents interact with the web. We are moving beyond the era of "computer-use stunts", where agents must guess intent from messy interfaces, toward a model of direct, structured cooperation between sites and assistants.
By exposing structured actions and context, WebMCP allows the browser to share what it already knows: the current page, the user's session, and the precise moment help is needed. While backend integrations remain the best fit for bulk data processing, WebMCP is the optimal path for sites that want to provide a reliable, context-aware experience during live navigation.
We have only scratched the surface. In our next article, we will move from theory to practice by diving deep into implementation. Join us as we explore how to integrate WebMCP into Angular applications.