Daniel Balcarek

Posted on Jul 1

Build a Minimal WebMCP Agent with Playwright and Gemini

#agents #ai #tutorial #webdev

Bypasses extension limits for stronger models

WebMCP lets a web page expose tools that AI agents can discover and execute inside the browser. That sounds simple until you want to test those tools with a model outside the Model Context Tool Inspector Chrome extension.

A while ago, I built a small puzzle game that exposes WebMCP tools. I tested and debugged those tools using the Model Context Tool Inspector, which is great for quick experiments. The limitation is that it only gives access to a small set of lightweight Gemini models, and I wanted to test the same WebMCP tools with stronger ones.

My first idea was to build another Chrome extension, but that felt like overkill. WebMCP tools need a real browser context: the browser must open the page directly, discover the tools and execute them inside the page. So instead of building another extension, I looked for something that could simply open Chrome and control the page.

And that is where Playwright fits nicely.

So in this article, I will show how to create a simple agent that wires up the Gemini API with WebMCP through Playwright. Gemini requests a tool call and Playwright executes the matching WebMCP tool inside a real Chrome browser.

Prerequisites
Prepare the Solution
Check if modelContext Exists
Read Exposed WebMCP Tools
Execute a WebMCP Tool
Create a Minimal Agent Proof of Concept
- Gen AI SDK
- Agent Creation
What This Proves
- Repositories
Summary

Prerequisites

For this example, you need:

Node.js 20+
Google Chrome
A Gemini API key

Prepare the Solution

The first thing we need to do is enable WebMCP in Chrome. WebMCP is still experimental, so for local development it must be enabled through a Chrome flag:

Open Chrome and navigate to chrome://flags/#enable-webmcp-testing
Set the flag to Enabled.
Relaunch Chrome to apply the changes.

After that, we can create a small Node.js project:

mkdir custom-agent
cd custom-agent
npm init -y

Next, install Playwright as a development dependency. I also use tsx to run TypeScript files directly and dotenv to read environment variables from a .env file:

npm install -D playwright tsx dotenv typescript @types/node

This gives us everything we need to run TypeScript code, open Chrome and access environment variables.

Because the agent will also call an AI model, we need to install the Gemini SDK. For this example, I use @google/genai:

npm install @google/genai

The last preparation step is to add a script to package.json:

{
  "scripts": {
    "agent": "tsx agent.ts"
  }
}

This command will run the agent.ts file, where we will put the main logic.

Check if modelContext Exists

Now that the project is prepared, let’s create the first version of agent.ts. At this stage, I only want to check whether modelContext is available inside the browser page.

import { chromium } from "playwright";

const gameUrl = process.argv[2] ?? "http://localhost:5173";

async function main() {
  const context = await chromium.launchPersistentContext(
    "./.chrome-agent-profile",
    {
      channel: "chrome",
      headless: false,
      args: ["--enable-experimental-web-platform-features"],
    },
  );

  const page = await context.newPage();

  await page.goto(gameUrl, { waitUntil: "networkidle" });

  const result = await page.evaluate(() => ({
    userAgent: navigator.userAgent,
    hasNavigatorModelContext: "modelContext" in navigator,
    hasDocumentModelContext: "modelContext" in document,
  }));

  console.log(result);
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});

This code opens Chrome, navigates to the game page, and checks if modelContext exists on navigator or document.

One important detail is that I am not using the bundled Chromium from Playwright. Instead, I am opening the real Chrome installed on my machine by using launchPersistentContext with channel: "chrome". This matters because WebMCP is still experimental. In my case, the isolated Chromium browser did not discover the WebMCP tools correctly, while real Chrome with the enabled flag worked.

Note: Because launchPersistentContext creates a local Chrome profile, do not forget to add this folder to .gitignore:
.chrome-agent-profile/
The profile can contain local browser data such as cache, cookies, and other Chrome state. It should not be committed to the repository.

Read Exposed WebMCP Tools

The first check only tells us whether modelContext exists. The next step is to read the tools exposed by the page.

We can do that by calling modelContext.getTools() inside the page.evaluate() method:

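  const result = await page.evaluate(async () => {
    // modelContext is experimental and not in the TypeScript DOM typings yet,
    // so it is accessed through an `any` cast.
    const modelContext = (navigator as any).modelContext;

    if (!modelContext) {
      return {
        hasModelContext: false,
        tools: [],
      };
    }

    const tools = await modelContext.getTools();

    return {
      hasModelContext: true,
      // Keep only the serializable metadata we want to inspect.
      tools: tools.map((tool: any) => ({
        name: tool.name,
        description: tool.description,
        inputSchema: tool.inputSchema,
        origin: tool.origin,
      })),
    };
  });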

This code returns the list of tools exposed by the current page. For each tool, I keep only basic metadata such as the name, description, input schema and origin.

At this point, it is useful to print the result as formatted JSON:

console.log(JSON.stringify(result, null, 2));

This makes it easier to verify that Chrome discovered the WebMCP tools correctly.
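If you already know which tools the page should expose, the check can also be automated instead of reading the JSON by eye. The following is only a minimal sketch: findMissingTools and the DiscoveredTool type are hypothetical names that are not part of the original project, and the input is assumed to have the same shape as the tools array returned above.

```typescript
// Hypothetical helper, not part of the original project.
// Assumes each discovered tool has at least a `name`, as in the result above.
type DiscoveredTool = { name: string; description?: string };

// Returns the expected tool names that are absent from the discovered list.
function findMissingTools(
  discovered: DiscoveredTool[],
  expectedNames: string[],
): string[] {
  const discoveredNames = new Set(discovered.map((tool) => tool.name));
  return expectedNames.filter((toolName) => !discoveredNames.has(toolName));
}

// Example with made-up discovery output: only one of two expected tools was found.
const missingTools = findMissingTools(
  [{ name: "getGameState" }],
  ["getGameState", "submitPlan"],
);
console.log("Missing tools:", missingTools);
```

An empty array means every expected tool was discovered. Anything else is a hint that the flag is not active or the page has not registered its tools yet.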

Execute a WebMCP Tool

Reading tools is useful, but the real goal is to execute them. In my game, one of the exposed tools is called getGameState. It returns the current state of the puzzle, including the map, remaining moves and collected wood. For the first test, I can find this tool by name and execute it directly:

const gameState = await page.evaluate(async () => {
  const modelContext = (navigator as any).modelContext;

  if (!modelContext) {
    throw new Error("modelContext is not available");
  }

  const tools = await modelContext.getTools();

  const getGameStateTool = tools.find((tool: any) => tool.name === "getGameState");

  if (!getGameStateTool) {
    throw new Error("getGameState tool not found");
  }

  return await modelContext.executeTool(getGameStateTool, "{}");
});

This proves that Playwright can open the page, access modelContext, find a WebMCP tool and execute it inside the browser context.

However, hardcoding the tool execution like this is not ideal. The agent should be able to execute any tool by name, so I extracted the logic into a reusable helper function:

import type { Page } from "playwright";

export async function executeWebMcpTool<T>(
  page: Page,
  toolName: string,
  args: unknown,
): Promise<T> {
  return await page.evaluate(
    async ({ toolName, args }) => {
      const modelContext =
        (document as any).modelContext ?? (navigator as any).modelContext;
      if (!modelContext) {
        throw new Error("Model Context API is not available");
      }

      const tools = await modelContext.getTools();

      const tool = tools.find((tool: any) => tool.name === toolName);

      if (!tool) {
        throw new Error(`Tool not found: ${toolName}`);
      }

      const result = await modelContext.executeTool(
        tool,
        JSON.stringify(args),
      );

      return result;
    },
    { toolName, args },
  );
}

This function receives a Playwright Page, the tool name and arguments. It then evaluates code inside the browser page, finds the matching WebMCP tool, serializes the arguments and executes the tool. With this helper, the Node.js code does not need to know the internal implementation of the page. It only needs the tool name and arguments.

That is the important bridge: Playwright controls Chrome, Chrome sees the WebMCP tools and our Node.js code can execute them.

Note: In my setup, navigator.modelContext worked reliably, but WebMCP is still experimental, so in the reusable helper I check both document.modelContext and navigator.modelContext.

Create a Minimal Agent Proof of Concept

Now we can connect the WebMCP tool execution with an AI model.

For this article, I want to keep the example small. The goal is not to build the full game-playing agent here. The goal is to prove the basic flow:

Send tool definitions to Gemini.
Let Gemini decide which tool it wants to call.
Execute that tool through WebMCP.
Print the result.

The full agent can build on top of this by sending the tool result back to the model and continuing the loop.

Gen AI SDK

For this example, I use the @google/genai package. We already installed it earlier, so now we can create a small service for communicating with Gemini.

Create a new file called genai.service.ts:

import "dotenv/config";
import {
  GoogleGenAI,
  type Content,
  type GenerateContentConfig,
  type GenerateContentResponse,
} from "@google/genai";

export type GenerateRequest = {
  contents: Content[];
  config?: GenerateContentConfig;
};

export class GenaiService {
  private readonly ai: GoogleGenAI;
  private readonly model: string;

  constructor(model: string = "gemini-2.5-flash-lite") {
    this.model = model;
    const apiKey = process.env.GEMINI_API_KEY;
    if (!apiKey) {
      throw new Error("Missing GEMINI_API_KEY in .env");
    }

    this.ai = new GoogleGenAI({ apiKey });
  }

  public async generateContentAsync(
    request: GenerateRequest,
  ): Promise<GenerateContentResponse> {
    const response = await this.ai.models.generateContent({
      model: this.model,
      contents: request.contents,
      config: request.config,
    });

    return response;
  }
}

The implementation is straightforward. The service reads GEMINI_API_KEY from the .env file, creates an instance of GoogleGenAI and exposes one method called generateContentAsync.

I also created a small GenerateRequest type. The reason is simple: I only want to expose the properties that this example needs. The original SDK request type contains more options, and for this proof of concept that would make the code harder to read.

You also need to create a .env file:

GEMINI_API_KEY=your-api-key

Do not forget to add the .env file to .gitignore, so you do not commit your API key to the repository.

Agent Creation

Now we can put everything together in agent.ts.

In this example, the tool definition is hardcoded. That keeps the proof of concept simple and easier to understand. In a more generic version, we could read WebMCP tools from the page and map them into Gemini tool declarations automatically. But that would add more code and I want this article to stay focused on the core idea.

import { chromium, type Page } from "playwright";
import { GenaiService } from "./genai.service";
import {
  FunctionCallingConfigMode,
  type Content,
  type Tool,
} from "@google/genai";

export const tools: Tool[] = [
  {
    functionDeclarations: [
      {
        name: "getGameState",
        description:
          "Get the current board. visibleMap rows run top-to-bottom; each character is x=0 onward. P=player, .=land, W=tree, ~=water, B=bridge, R=rock, and G=goal.",
        responseJsonSchema: {
          type: "object",
          properties: {
            remainingMoves: { type: "number" },
            wood: { type: "number" },
            visibleMap: {
              type: "array",
              items: { type: "string" },
            },
          },
          required: ["remainingMoves", "wood", "visibleMap"],
        },
      },
    ],
  },
];

const gameUrl = process.argv[2] ?? "https://tower-before-dusk.gramli.workers.dev";

async function main() {
  const aiService = new GenaiService();
  const context = await chromium.launchPersistentContext(
    "./.chrome-agent-profile",
    {
      channel: "chrome",
      headless: false,
      args: ["--enable-experimental-web-platform-features"],
    },
  );

  const page = await context.newPage();
  await page.goto(gameUrl, { waitUntil: "networkidle" });

  const contents: Content[] = [
    {
      role: "user",
      parts: [
        {
          text: "Inspect the current Tower Before Dusk game state.",
        },
      ],
    },
  ];

  const response = await aiService.generateContentAsync({
    contents,
    config: {
      tools,
      toolConfig: {
        functionCallingConfig: {
          mode: FunctionCallingConfigMode.ANY,
          allowedFunctionNames: ["getGameState"],
        },
      },
    },
  });

  const functionCall = response.functionCalls?.[0];
  if (!functionCall?.name) {
    throw new Error("Gemini did not return a tool call");
  }
  if (functionCall.name !== "getGameState") {
    throw new Error(`Gemini requested an unknown tool: ${functionCall.name}`);
  }

  console.log("Gemini tool call:", functionCall);

  const gameState = await executeWebMcpTool(
    page,
    functionCall.name,
    functionCall.args ?? {},
  );

  console.log("Tool result:", gameState);
}

main().catch((error) => {
  console.error(error);
  process.exitCode = 1;
});

export async function executeWebMcpTool<T>(
  page: Page,
  toolName: string,
  args: unknown,
): Promise<T> {
  return await page.evaluate(
    async ({ toolName, args }) => {
      const modelContext =
        (document as any).modelContext ?? (navigator as any).modelContext;
      if (!modelContext) {
        throw new Error("Model Context API is not available");
      }

      const tools = await modelContext.getTools();

      const tool = tools.find((tool: any) => tool.name === toolName);

      if (!tool) {
        throw new Error(`Tool not found: ${toolName}`);
      }

      const result = await modelContext.executeTool(tool, JSON.stringify(args));

      return result;
    },
    { toolName, args },
  );
}

The flow is simple:

First, the script opens Chrome and navigates to the game page. Then it sends a prompt to Gemini together with the available tool definition. In this example, Gemini is allowed to call only one function: getGameState.

After Gemini returns a function call, the script validates that the requested function is really getGameState. This is important because the application should never blindly execute arbitrary tool names returned by the model. Then the script passes the function name and arguments to executeWebMcpTool. The tool is executed inside the browser page through WebMCP and the result is printed to the console.
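Once there is more than one tool, the two if checks can be folded into a small allowlist guard. This is a sketch under assumptions: ALLOWED_TOOL_NAMES, ModelToolCall and assertAllowedToolCall are hypothetical names introduced here for illustration, and the call shape only mirrors the name and args fields used in the code above.

```typescript
// Hypothetical names for illustration; not part of the original project.
const ALLOWED_TOOL_NAMES: ReadonlySet<string> = new Set(["getGameState"]);

// Mirrors the two fields of the model's function call that the agent uses.
type ModelToolCall = { name?: string; args?: Record<string, unknown> };

// Throws unless the model requested a tool that the agent is willing to execute.
function assertAllowedToolCall(
  call: ModelToolCall | undefined,
  allowed: ReadonlySet<string> = ALLOWED_TOOL_NAMES,
): { name: string; args: Record<string, unknown> } {
  if (!call?.name) {
    throw new Error("Model did not return a tool call");
  }
  if (!allowed.has(call.name)) {
    throw new Error(`Tool is not on the allowlist: ${call.name}`);
  }
  // Default to empty arguments, as the agent code above does.
  return { name: call.name, args: call.args ?? {} };
}

// Example: a call to an allowed tool passes through with normalized args.
const checkedCall = assertAllowedToolCall({ name: "getGameState" });
console.log("Allowed call:", checkedCall.name, checkedCall.args);
```

The point of the design is that the model's output is treated as untrusted input: the agent, not the model, decides which tool names may reach the page.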

And that is the proof of concept.

Our Node.js script does not call the game directly. It opens the game in Chrome, lets Chrome discover the WebMCP tools, lets Gemini request a function call and then executes that function call against the page.
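The hardcoded tool definition could in principle be generated from the tools that the page exposes. The sketch below shows one possible mapping and rests on several assumptions: WebMcpToolInfo is a guess at the shape returned by getTools(), inputSchema is assumed to arrive either as an object or as a JSON string, and parametersJsonSchema is assumed to be the SDK field for input parameters (the article itself only uses responseJsonSchema). FunctionDeclarationSketch is a local stand-in type so that the example has no dependencies.

```typescript
// Assumed shape of a tool returned by getTools(); the experimental API may differ.
type WebMcpToolInfo = {
  name: string;
  description?: string;
  inputSchema?: unknown; // assumed: a schema object or a JSON string
};

// Local stand-in for the SDK's function declaration type (an assumption).
type FunctionDeclarationSketch = {
  name: string;
  description: string;
  parametersJsonSchema?: unknown;
};

// Accepts a schema object or a JSON string; returns undefined when there is no schema.
function parseInputSchema(inputSchema: unknown): unknown {
  if (inputSchema === undefined || inputSchema === null) return undefined;
  if (typeof inputSchema !== "string") return inputSchema;
  return inputSchema.trim() === "" ? undefined : JSON.parse(inputSchema);
}

// Maps discovered WebMCP tools to model-facing function declarations.
function toFunctionDeclarations(
  discovered: WebMcpToolInfo[],
): FunctionDeclarationSketch[] {
  return discovered.map((tool) => {
    const schema = parseInputSchema(tool.inputSchema);
    return {
      name: tool.name,
      description: tool.description ?? "",
      // Only attach a schema when the page actually provided one.
      ...(schema !== undefined ? { parametersJsonSchema: schema } : {}),
    };
  });
}

// Example with a made-up tool description.
const declarations = toFunctionDeclarations([
  { name: "getGameState", description: "Get the current board." },
]);
console.log(JSON.stringify(declarations, null, 2));
```

This passes the schema through unchanged. A mapper meant for arbitrary pages would likely also need to validate or normalize the schema before sending it to the model.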

What This Proves

This small example proves that Playwright can be used as a bridge between an AI model and WebMCP tools.

The browser still owns the WebMCP context. The page still exposes the tools, but our external Node.js process can orchestrate the flow and connect those tools to a stronger model.

This is useful when the existing browser-based tooling is too limited, or when you want to experiment with your own agent loop.

The example in this article only executes one tool call. A real agent would need a loop:

Ask the model what to do.
Execute the requested WebMCP tool.
Send the tool result back to the model.
Let the model decide the next step.
Repeat until the task is finished.
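The steps above can be sketched as a small loop with the model and the tool executor passed in as functions. This is not the implementation from the full agent repository. All names here (runAgentLoop, AskModel, ExecuteTool, HistoryEntry) are hypothetical, the history format is simplified to plain strings, and the maxSteps limit is an added safety assumption.

```typescript
// Hypothetical types for illustration; a real agent would use the SDK's Content type.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { toolCall?: ToolCall; text?: string };
type HistoryEntry = { role: "user" | "model" | "tool"; content: string };

type AskModel = (history: HistoryEntry[]) => Promise<ModelTurn>;
type ExecuteTool = (
  toolName: string,
  args: Record<string, unknown>,
) => Promise<unknown>;

async function runAgentLoop(
  task: string,
  askModel: AskModel,
  executeTool: ExecuteTool,
  maxSteps = 10,
): Promise<string> {
  const history: HistoryEntry[] = [{ role: "user", content: task }];

  for (let step = 0; step < maxSteps; step++) {
    // 1. Ask the model what to do.
    const turn = await askModel(history);

    // 5. No tool call means the model considers the task finished.
    if (!turn.toolCall) {
      return turn.text ?? "";
    }

    // 2. Execute the requested tool.
    const { name: toolName, args } = turn.toolCall;
    history.push({ role: "model", content: JSON.stringify({ toolName, args }) });
    const result = await executeTool(toolName, args);

    // 3. Send the tool result back so the model can decide the next step (4).
    history.push({ role: "tool", content: JSON.stringify(result) });
  }

  throw new Error(`Agent did not finish within ${maxSteps} steps`);
}
```

In this article's setup, executeTool would wrap executeWebMcpTool with the Playwright page, and askModel would wrap the Gemini call and apply the allowlist check before returning a tool call.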

That full implementation would make this article much longer, so I kept the article focused on the proof of concept.

You can find the source code here:

Repositories

Proof of Concept agent - the minimal implementation used in this article.
Full agent repository - the extended implementation for Tower Before Dusk. It can already play through the first two levels.

Summary

In this article, I showed how to use Playwright to create a custom proof of concept agent for WebMCP. First, I checked whether modelContext is available, then I discovered the exposed tools, executed one of them and finally connected the flow with Gemini function calling.

Of course, this is not a fully autonomous agent yet, but it is the foundation for one.

WebMCP is still experimental and the Model Context Tool Inspector is great for debugging. However, the available models can feel limiting for some types of web apps. I hope this approach can help others test WebMCP tools with stronger models without the need to create another Chrome extension.

Top comments (43)

Web Developer Hyper • Jul 1

Interesting post! 😀 WebMCP + Playwright + AI seems like a really powerful combination. I'm also experimenting with AI and Playwright for E2E testing right now, and I'm surprised by how much Playwright can do. It makes many tasks much easier than I expected!

Daniel Balcarek • Jul 1

Thanks! 🙌

Nice, AI + Playwright for E2E testing sounds really interesting. I’m curious how you are using it, because the first thing that comes to my mind is high token consumption. 😅

Web Developer Hyper • Jul 1

I'm mainly using AI to create tests rather than run them, so the token usage isn't exceptionally high. 😊 If I gather enough useful content and experience, I'd like to write a DEV post about it.

Daniel Balcarek • Jul 2

Nice! That sounds like a great topic. Looking forward to the post! 🚀

Web Developer Hyper • Jul 2

Thank you! 🕺

Mykola Kondratiuk • Jul 5

playwright for MCP tool testing is a clever pivot - usually reach for it for browser automation not agent harness work. curious if gemini calls stay within context limits or if you needed chunking on the tool responses.

Daniel Balcarek • Jul 5

Thanks! For the Gemini free tier, it was surprisingly okay.

I think I only hit the limit with Gemini 3.5 Flash. The other models were fine without chunking, at least for this experiment, because the tool responses were still quite small.

Mykola Kondratiuk • Jul 5

makes sense - flash is the one that runs into rate limits fastest on free tier even with small payloads. the response size being small probably helped though - once tool calls start returning large blobs it gets painful fast.

Hemapriya Kanagala • Jul 1

Really interesting approach, Daniel. I hadn't really thought about using Playwright as the bridge here instead of building another extension, but it makes a lot of sense after reading this.

Also appreciate how you broke it down step by step. Made it much easier to follow.

Daniel Balcarek • Jul 1

Thanks for the kind words, Hemapriya! I’m glad you found it interesting and easy to follow. That was exactly what I hoped for with the breakdown.

Nazar Boyko • Jul 1

Quick question on the safety check before executing a tool. Right now you allow only getGameState and reject anything else, which is a clean guard while there's just one tool. Once this grows into the full loop where Gemini can call whatever the page exposes, how are you thinking about that check? Do you keep an allowlist you maintain by hand, or trust the page to only ever expose tools that are safe to run? I ask because the second the model picks the name, the page becomes the real trust boundary, and I'm curious where you'd draw the line.

Daniel Balcarek • Jul 1

Great question, and yes, I would definitely not blindly trust the model here.

For this proof of concept, I allow only getGameState because I wanted to prove the bridge first. In the full loop, I still keep an allowlist on the agent side, something like const PLANNING_TOOL_NAMES = ["getGameState", "checkPlan", "submitPlan"];. The page can expose multiple WebMCP tools, but the external agent should decide which of them it is willing to execute.

So for me, the page is one trust boundary, but the agent still needs its own execution policy. You have to trust the page/tool provider at least a little, because the tools are implemented there, but model-selected tool calls should still be treated as untrusted input, especially for state-changing tools like submitPlan.

xulingfeng • Jul 1

Hey Daniel, I've been building almost the same thing on our end — Playwright + DeepSeek V4 instead of Gemini, Python instead of TS — but the idea is the same: skip the middleware, plug a capable LLM straight into the browser via Playwright. Your executeWebMcpTool helper is cleaner than what I cobbled together though. Definitely stealing that pattern next time 😄
Also, smart call going with Gemini 2.5 Flash Lite for this — low-latency function calling fits tool orchestration really well. Did the free tier give you any rate limit headaches?

Daniel Balcarek • Jul 1

Nice! I’d love to see your results.🙌 Are you planning to write a post about it?

Feel free to use any of the code! At first, I also wanted to create a generic mapper that would map WebMCP tool definitions to Gemini API tools, but that felt like overkill for this proof of concept and also for the game agent. 😅

There are actually different rate limits for different models. I hit the limit with Gemini 3 Flash multiple times, especially when the AI started playing level two. But Gemini 2.5 Flash and Gemini 2.5 Flash Lite were fine. And when I reached the limit, I just switched the model, so overall it was okay.

xulingfeng • Jul 1

Not planning to write about it just yet — the 36 Stratagems series is eating all my brain cells 😅 Might open source the framework down the road though, but I wanna put it through its paces first. Real-world usage always surfaces the kind of bugs you'd never catch in a demo. Need a few more rounds of that before I'd feel good about putting it out there.

Daniel Balcarek • Jul 1

I saw that you started the series, and I’m definitely planning to read it!

Oh, so you have a framework around WebMCP? Cool! 🔥 That’s a much higher level than proof of concept. 😂

And yeah, a demo and a real app are like a seed and a plant. Real-world usage always shows what actually grows.

xulingfeng • Jul 1

"Framework" is generous 😂 It's more of a glue script that grew legs. But it's been catching real bugs so far, and that's what matters.
Looking forward to hearing what you think of the series when you get to it!

Daniel Balcarek • Jul 1

That actually sounds good, even for a script. If it’s already catching real bugs, then it’s doing its job.

Definitely! I like your writing style, so I’m looking forward to reading it too.

Kartik N V J K • Jul 1

Using Playwright as the execution layer instead of another Chrome extension is the part I keep coming back to, because it means one agent can drive tools across pages that never shipped an inspector. The failure I would watch for is the model calling a WebMCP tool whose page state changed after discovery, so the tool signature and the live DOM disagree. Did you hit any drift between the discovered tool list and what actually executed once the page had been interacted with?

Daniel Balcarek • Jul 1

That is a good point. In my case, I did not really hit this problem because the WebMCP tool list was stable. Tools like getGameState, checkPlan, and submitPlan kept the same signatures during the whole session.

What changed was only the game state returned by getGameState, not the tool schema itself. So I did not see any drift between discovered tools and executed tools in this example.

I think this highly depends on the design and architecture of the page. If the page can expose different tools depending on UI state, navigation or lifecycle state, the agent should probably rediscover tools or validate them before execution. The page should also be designed with that in mind and keep tool contracts stable where possible.

Kartik N V J K • Jul 3

Using Playwright to get a real browser context instead of building a second Chrome extension is a clean call, since WebMCP tools only make sense when the page itself is live. The part I always end up wrestling with is the discovery step: when modelContext exposes a dozen tools, do you hand Gemini all of them every turn, or filter the schema down first so the model does not misfire on the wrong tool call?

Daniel Balcarek • Jul 3

Thanks for the comment!

Yes, I think it makes sense to pre-filter a large number of tools before sending them to the model. It helps with token consumption, and it can also make it easier for the model to choose the right tool.

In my full game-agent implementation, I only have three tools, but I still do some simple filtering based on the current state. At the start of a level, I send only the getGameState tool, because that is always the first step. After that, in the planning loop, I send only the tools that make sense for the current phase, without getGameState.

So yes, I would say the agent should not blindly send every available tool every time. It should send only the tools that make sense for the current state.

Theo Valmis • Jul 3

Nice minimal build. The thing worth flagging once an agent can drive a browser: the interesting risk stops being can it do the task and becomes what stops it from doing the wrong one. A Playwright agent with a goal will find a way to that goal, including paths you didn't intend. Minimal is the right way to learn it; the moment it touches anything real, the bounds on what it's allowed to click become the actual design work.

Daniel Balcarek • Jul 4

Thanks for the comment!

Yes I agree. Once an agent is interacting with a real prod app, the key concern isn’t just whether it can complete a task, but also what it’s permitted to do along the way.

For this article, I intentionally kept things minimal to focus on the bridge itself. In a real-world scenario, though, the agent would need well-defined boundaries, such as restricted tools and actions, validation of page state and likely user confirmation before performing any sensitive or state-changing operations.

Yusuke kimura • Jul 3

Thank you post good article,
This article is very interesting because as I am a software developer, I have a little experience with AI.
I only use the Cursor but I do not know this program how to run it correctly.
I hope that you will post the next article, and I want to communicate with you.
thank.

Daniel Balcarek • Jul 3

Thank you for the kind words!

Cursor is a nice way to start using AI for coding.

I’m glad you found it interesting, and I’ll try to share more articles like this. Happy to connect here on DEV!

Wren Calloway • Jul 1

The bridge works, but the schema round-trip is where this pattern quietly breaks. You're reading tool.inputSchema from getTools(), but then in the agent you hardcode a Gemini functionDeclarations entry by hand. The moment you make this generic — mapping the page's inputSchema straight into Gemini's parameters — you'll hit the fact that Gemini's function-calling schema is a constrained subset of JSON Schema (OpenAPI-flavored). WebMCP tools can expose things Gemini won't accept: oneOf/anyOf, $ref, tuple-style items arrays, unbounded additionalProperties. A page author writing a rich Zod-to-JSON-Schema tool has no idea their schema won't survive the trip, and the failure shows up as an opaque 400 from generateContent, not a clear "unsupported keyword" error.

Worth adding a normalization/validation layer between getTools() and the model rather than passing the schema through raw — even just stripping unsupported keywords and logging what you dropped. That's the difference between "plays two levels" and "works on any WebMCP page you throw at it," which is the actual promise of going generic here.

Daniel Balcarek • Jul 2

Thanks for actually checking the code and for the thoughtful comment!

You are right. In this proof of concept, I hardcoded the Gemini function declarations instead of building a generic mapper between getTools() from WebMCP and the tools object expected by the Gemini API.

I also mentioned this limitation in the article. The reason was mostly scope. A proper mapper/normalization layer would probably be larger than the proof of concept itself, especially if it needs to handle unsupported schema features, strip or transform keywords, and log what was changed.

For a real generic WebMCP agent, I agree that passing the schema through raw would not be enough. There should be a validation/normalization layer between WebMCP tool discovery and the model-specific function-calling schema.

For this article, I wanted to prove the bridge first: discover tools, execute them through Playwright, and connect that loop to Gemini. For my game agent implementation, I intentionally skipped the generic mapper because the agent is designed specifically for the game, so it was much easier to hardcode the tools while everything is still experimental.

But yes, making it work reliably across arbitrary WebMCP pages would require exactly the kind of schema handling you described.