DEV Community

Shada Daab

Browser Automation Workflows with Playwright MCP and JavaScript

Browser automation just got smarter — and conversational. With the Model Context Protocol (MCP) and Playwright MCP, developers can now let AI agents control browsers: navigating pages, clicking elements, capturing screenshots, and running end-to-end tests — all through natural language.

This post walks through using Playwright MCP to set up an AI-driven browser automation workflow in JavaScript or TypeScript.


Why Playwright MCP?

The Playwright MCP server wraps Playwright (Microsoft’s browser automation library) in the Model Context Protocol, an open protocol that gives AI systems a structured way to discover and call external tools.

That means an LLM can now:

  • Navigate to URLs (goto)
  • Click and type on elements
  • Capture screenshots or full-page renders
  • Extract data or verify page content

All without writing traditional test code — but still fully inspectable and structured.


Step 1: Install Playwright MCP

If you already have Node.js (v18+), there is nothing to install up front; npx fetches the package on first run:

npx @playwright/mcp@latest

This downloads and starts the Playwright MCP server over stdio, which is how most MCP clients launch it. The server exposes its browser actions as MCP tools (opening pages, clicking elements, filling fields, taking screenshots).

To reach the server over HTTP instead, pass a port:

npx @playwright/mcp@latest --port 4000

With that flag, the server listens at http://localhost:4000/mcp.


Step 2: Explore Available Tools

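Once the server is live, it exposes its browser actions as MCP tools. Names and argument schemas vary between server versions (recent releases of the official server prefix them with browser_, as in browser_navigate and browser_click), so check your server's tool list before relying on them. This post uses the following names: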

Tool Name        Description
open_page        Opens a given URL
click_element    Clicks a DOM element by selector
fill_field       Types text into input fields
get_content      Retrieves visible text or HTML
screenshot       Takes a screenshot of the current page

These structured tool calls allow an AI assistant (like Claude or GPT-5) to reason about page layouts, element trees, and accessibility data rather than raw pixels.
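
Because tool names and argument schemas can differ between servers, it helps to summarize what a server actually reports before hard-coding any calls. The sketch below assumes the result shape returned by the MCP SDK's client.listTools() (a tools array whose entries carry name, description, and a JSON Schema inputSchema). The summarizeTools helper and the sample schemas are illustrative and not part of any library.

```javascript
// Summarize an MCP tools listing as "name(requiredArgs): description" lines.
// Assumes the shape returned by client.listTools():
//   { tools: [{ name, description, inputSchema }] }
function summarizeTools(listing) {
  return listing.tools.map((tool) => {
    const required = (tool.inputSchema && tool.inputSchema.required) || [];
    return `${tool.name}(${required.join(', ')}): ${tool.description || ''}`.trim();
  });
}

// Sample data using the tool names from this post (hypothetical schemas).
const sampleListing = {
  tools: [
    {
      name: 'open_page',
      description: 'Opens a given URL',
      inputSchema: { type: 'object', properties: { url: { type: 'string' } }, required: ['url'] },
    },
    {
      name: 'screenshot',
      description: 'Takes a screenshot of the current page',
      inputSchema: { type: 'object', properties: {} },
    },
  ],
};

console.log(summarizeTools(sampleListing).join('\n'));
```

With a connected client (see Step 3), summarizeTools(await client.listTools()) would print the live listing instead of the sample.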


Step 3: Connect from a JavaScript Client

Let’s write a small Node.js client that interacts with the Playwright MCP server.

npm install @modelcontextprotocol/sdk

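Then, create a file called browserClient.js. The script uses ES module imports and top-level await, so either set "type": "module" in package.json or name the file browserClient.mjs: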

import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

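// The SDK expects both a name and a version for the client
const client = new Client({ name: 'playwright-demo', version: '1.0.0' });
// The transport takes a URL object rather than a plain string
const transport = new StreamableHTTPClientTransport(new URL('http://localhost:4000/mcp'));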

await client.connect(transport);

// 1. Open GitHub login page
await client.callTool({
  name: 'open_page',
  arguments: { url: 'https://github.com/login' }
});

// 2. Fill in username & password fields
await client.callTool({
  name: 'fill_field',
  arguments: { selector: '#login_field', value: 'demo_user' }
});

await client.callTool({
  name: 'fill_field',
  arguments: { selector: '#password', value: 'secret_pass' }
});

// 3. Click the login button
await client.callTool({
  name: 'click_element',
  arguments: { selector: 'input[type="submit"]' }
});

// 4. Capture a screenshot after login
const result = await client.callTool({ name: 'screenshot', arguments: {} });

// Tool results carry a content array; structuredContent is optional and server-specific
console.log('Screenshot result:', result.structuredContent ?? result.content);

// Close the connection when done
await client.close();

This script drives the browser through the same structured tool calls an AI agent would issue, which keeps every step auditable and automatable.
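
One way to make a sequence like this auditable is to run the calls through a small wrapper that records each outcome and stops at the first failure. The sketch below is illustrative: runSteps and fakeClient are hypothetical names, and the only protocol detail it relies on is that MCP tool results mark tool-level failures with isError.

```javascript
// Run tool calls in order, keep an audit log, and stop at the first failure.
// `client` is anything with an async callTool({ name, arguments }) method,
// such as a connected MCP SDK Client.
async function runSteps(client, steps) {
  const log = [];
  for (const step of steps) {
    let ok = true;
    let detail;
    try {
      const result = await client.callTool(step);
      // MCP tool results flag tool-level failures with isError: true.
      ok = !result.isError;
      detail = result.content;
    } catch (err) {
      // Protocol or transport errors surface as rejected promises.
      ok = false;
      detail = String(err);
    }
    log.push({ name: step.name, ok, detail });
    if (!ok) break;
  }
  return log;
}

// Stand-in client for demonstration: fails when asked to click (hypothetical behaviour).
const fakeClient = {
  async callTool({ name }) {
    if (name === 'click_element') {
      return { isError: true, content: [{ type: 'text', text: 'element not found' }] };
    }
    return { content: [{ type: 'text', text: 'ok' }] };
  },
};

const demoSteps = [
  { name: 'open_page', arguments: { url: 'https://github.com/login' } },
  { name: 'click_element', arguments: { selector: 'input[type="submit"]' } },
  { name: 'screenshot', arguments: {} },
];

runSteps(fakeClient, demoSteps).then((log) => {
  for (const entry of log) console.log(entry.name, entry.ok ? 'ok' : 'FAILED');
});
```

Passing the real client from browserClient.js in place of fakeClient would give the same log for a live run, with the screenshot step skipped whenever an earlier step fails.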


Step 4: Natural Language Control with VS Code or Claude

If you use Claude Desktop or an MCP-capable VS Code setup, you can register your Playwright MCP server so that the assistant performs automation through conversation.

Try prompting:

“Open my dashboard at https://example.com/dashboard and tell me how many new messages I have.”

or

“Navigate to the contact form and fill out the name and email fields.”

The AI uses MCP tools like open_page, get_content, and fill_field to execute these steps, and each call is a structured request that you can inspect and replay.
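
For Claude Desktop, linking the server usually means adding an entry under mcpServers in claude_desktop_config.json. The sketch below builds that entry in JavaScript and prints the JSON to merge into the file. The "playwright" key is an arbitrary label, and the entry assumes Claude Desktop launches the server itself through npx over stdio rather than connecting to the HTTP endpoint.

```javascript
// Build an MCP server entry for Claude Desktop's claude_desktop_config.json.
// The "playwright" key is just a label; the client starts the command itself
// and talks to the server over stdio.
const claudeDesktopConfig = {
  mcpServers: {
    playwright: {
      command: 'npx',
      args: ['@playwright/mcp@latest'],
    },
  },
};

// Print as JSON, ready to merge into the existing config file.
console.log(JSON.stringify(claudeDesktopConfig, null, 2));
```

After restarting the client, the server's tools should appear in the assistant's tool list; if they do not, checking the client's MCP logs is a reasonable first step.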


Step 5: Example — AI-Powered Website Testing

You can use the same setup to perform automated end-to-end (E2E) tests. For example:

// Continues with the connected client from Step 3
await client.callTool({
  name: 'open_page',
  arguments: { url: 'https://example.com/login' }
});

await client.callTool({
  name: 'fill_field',
  arguments: { selector: '#email', value: 'user@example.com' }
});

await client.callTool({
  name: 'fill_field',
  arguments: { selector: '#password', value: 'mypassword' }
});

await client.callTool({ name: 'click_element', arguments: { selector: 'button[type="submit"]' } });

const content = await client.callTool({ name: 'get_content', arguments: { selector: 'h1' } });
// structuredContent is optional; fall back to the standard content array
console.log('Page heading after login:', content.structuredContent ?? content.content);

You can even let the AI generate these steps itself — by describing the goal instead of coding it.
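
To turn a flow like this into an actual test, the tool result has to become a pass or fail decision. The sketch below assumes only the standard MCP result shape (a content array of typed items, with text items carrying a text field). The helper names textOf and expectText and the sample heading are hypothetical.

```javascript
// Collect the text parts of an MCP tool result (content is an array of typed items).
function textOf(result) {
  return (result.content || [])
    .filter((item) => item.type === 'text')
    .map((item) => item.text)
    .join('\n');
}

// Throw with a readable message when the call failed or the expected text is missing.
function expectText(result, expected) {
  const actual = textOf(result);
  if (result.isError || !actual.includes(expected)) {
    throw new Error(`Expected "${expected}" but got "${actual}"`);
  }
  return actual;
}

// Hypothetical get_content result for the heading shown after login.
const sampleResult = {
  content: [{ type: 'text', text: 'Welcome back, user@example.com' }],
};

console.log(expectText(sampleResult, 'Welcome back'));
```

In the test above, expectText(content, 'Welcome back') would replace the final console.log, so a wrong heading fails the run instead of only being printed.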


Step 6: Accessibility Snapshots for Safer Automation

Unlike brittle pixel-based automation, Playwright MCP works from accessibility snapshots and semantic element trees. A command like “click the ‘Submit’ button” therefore maps to an actual labeled UI element, which improves reliability and safety.

This approach is crucial for integrating AI into production-grade testing environments.
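
The idea of mapping “click the ‘Submit’ button” to a labeled element can be sketched as a lookup by role and accessible name. The node shape used here ({ role, name, ref, children }) is a simplified, hypothetical stand-in for an accessibility snapshot and not the server's actual format; findByRole is likewise an illustrative helper.

```javascript
// Depth-first search for the first node with a given role and accessible name.
// The node shape ({ role, name, ref, children }) is a simplified, hypothetical
// stand-in for an accessibility snapshot.
function findByRole(node, role, name) {
  if (node.role === role && node.name === name) return node;
  for (const child of node.children || []) {
    const match = findByRole(child, role, name);
    if (match) return match;
  }
  return null;
}

// A tiny example tree: a contact form with one textbox and one button.
const snapshot = {
  role: 'form',
  name: 'Contact',
  ref: 'e1',
  children: [
    { role: 'textbox', name: 'Email', ref: 'e2' },
    { role: 'button', name: 'Submit', ref: 'e3' },
  ],
};

const target = findByRole(snapshot, 'button', 'Submit');
console.log(target ? target.ref : 'not found');
```

Matching on role and name rather than on coordinates or CSS classes means the lookup keeps working when the layout or styling changes, as long as the button stays labeled.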


Final Thoughts

Playwright MCP transforms the way developers and AI systems interact with the web. By exposing structured browser-control tools, it bridges the gap between conversational AI and real-world automation.

🔧 Key Benefits:

  • Seamless LLM integration with browser automation.
  • Declarative and safe access to page content.
  • Perfect for automated QA, scraping, or RAG data pipelines.

If you’ve ever wished your AI assistant could click that button for you — now it can.

