Mani Movassagh

Posted on Mar 26

MCP Explained: How AI Plus MCP Controls a Real Browser

#ai #mcp #claude #playwright

The Problem

AI models like Claude are powerful — but they live inside a text box.

By default, I can read your message, think, and reply. That's it. I cannot open a browser, click a button, fill a form, or take a screenshot. My "hands" stop at the conversation window.

Developers worked around this by building custom integrations — glue code that connected AI to specific tools. Every team reinvented the wheel. There was no standard.

Without MCP:
  Claude ── custom glue ──► Playwright
  Claude ── custom glue ──► Supabase
  Claude ── custom glue ──► Gmail

Every integration was one-off, fragile, and hard to maintain.

What MCP Is

MCP stands for Model Context Protocol. It is an open standard created by Anthropic that defines one consistent way for any AI to communicate with any external tool or service.

Think of it like USB. Before USB, every device had its own connector. After USB, one standard plug worked with everything. MCP does the same for AI tools.

With MCP:
  Claude ── MCP ──► Playwright MCP Server ──► Browser
  Claude ── MCP ──► Supabase MCP Server   ──► Database
  Claude ── MCP ──► Gmail MCP Server      ──► Email

Same protocol. Different servers. Infinite tools.

The Language: JSON

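MCP communicates using JSON messages, specifically JSON-RPC 2.0: structured text passed back and forth between Claude and an MCP server. The examples in this article are simplified to show only the tool name and its parameters.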

When I want to navigate to a URL, I send this:

{
  "method": "browser_navigate",
  "params": {
    "url": "https://google.com"
  }
}

The server executes the action and sends back a result:

{
  "status": "navigated",
  "url": "https://google.com"
}

That's it. No magic. Just JSON flying between two programs.
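The messages above leave out the JSON-RPC 2.0 envelope that MCP uses on the wire. There, a tool invocation is a tools/call request whose params carry the tool name and its arguments. A minimal sketch in Python of building such a request; the helper name make_tool_call is hypothetical, chosen only for illustration:

```python
import json
from itertools import count

_next_id = count(1)  # JSON-RPC requests carry an id so replies can be matched


def make_tool_call(name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = make_tool_call("browser_navigate", {"url": "https://google.com"})
print(json.dumps(request, indent=2))
```

The id field is what lets the client pair each reply with the request that caused it.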

The MCP Server: The Translator

Raw Playwright has no idea what to do with my JSON. It doesn't speak MCP natively.

That's where the Playwright MCP Server comes in. It is a small program that:

1. Listens for JSON messages from Claude
2. Translates them into real Playwright API calls
3. Controls the browser
4. Sends results back to Claude

Claude ──JSON──► Playwright MCP Server ──► Playwright ──► Browser

The MCP Server is the middleman. It bridges the gap between my structured messages and the actual browser automation library.
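The translator role can be sketched as a lookup table from tool names to functions. This is a toy model, not the actual Playwright MCP Server: FakeBrowser, make_handlers, and handle are hypothetical names, and the fake browser only records what it was asked to do.

```python
class FakeBrowser:
    """Stand-in for a real automation library; it only records calls."""

    def __init__(self) -> None:
        self.url = None
        self.log = []

    def goto(self, url: str) -> None:
        self.url = url
        self.log.append(("goto", url))

    def click(self, selector: str) -> None:
        self.log.append(("click", selector))


def make_handlers(browser: FakeBrowser) -> dict:
    """Map tool names to functions that perform the matching browser action."""

    def navigate(args: dict) -> dict:
        browser.goto(args["url"])
        return {"status": "navigated", "url": browser.url}

    def click(args: dict) -> dict:
        browser.click(args["selector"])
        return {"status": "clicked"}

    return {"browser_navigate": navigate, "browser_click": click}


def handle(message: dict, handlers: dict) -> dict:
    """Translate one incoming message into a browser call and a reply."""
    handler = handlers.get(message["method"])
    if handler is None:
        return {"error": f"unknown tool: {message['method']}"}
    return handler(message.get("params", {}))


browser = FakeBrowser()
handlers = make_handlers(browser)
reply = handle(
    {"method": "browser_navigate", "params": {"url": "https://google.com"}},
    handlers,
)
print(reply)
```

Adding a new tool means adding one entry to the table, which is why a single server can expose many actions without the client changing.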

Where It Runs: Your Machine, Not the Cloud

A common misconception is that the Playwright MCP Server runs in the cloud. With the setup described here, it does not run on AWS, Anthropic's servers, or anywhere else on the internet.

It runs as a local process on your machine, right alongside Claude Code:

Your Machine
├── Claude Code
├── Playwright MCP Server  ← a Node.js process, running locally
└── Chrome / Firefox

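Claude Code launches it as a child process and talks to it over that process's standard input and output (the stdio transport), so no network port is involved. The browser stays on your machine too; only the pages it loads come from the internet.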
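For a locally launched server, MCP's stdio transport exchanges newline-delimited JSON over the child process's stdin and stdout. The sketch below shows that plumbing with a throwaway Python child standing in for the real server; the child's echo behavior and the ask_child helper are assumptions made for the demo.

```python
import json
import subprocess
import sys

# A stand-in "server": reads one JSON line from stdin, answers on stdout.
CHILD_SOURCE = """
import json, sys
request = json.loads(sys.stdin.readline())
reply = {"jsonrpc": "2.0", "id": request["id"], "result": {"echo": request["method"]}}
sys.stdout.write(json.dumps(reply) + "\\n")
sys.stdout.flush()
"""


def ask_child(request: dict) -> dict:
    """Spawn the child locally and exchange one newline-delimited JSON message."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # communicate() writes the request, closes stdin, and collects stdout.
    out, _ = proc.communicate(json.dumps(request) + "\n")
    return json.loads(out)


response = ask_child({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
print(response)
```

A real client keeps the process alive and sends many messages over the same pipes; this demo sends one and lets the child exit.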

How Claude Discovers Tools: The Handshake

When Claude Code starts, it reads a config file (.mcp.json or settings.json) that lists available MCP servers:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp"]
    }
  }
}

Claude Code starts the process, completes an initialization exchange with it, and then asks:

"What tools do you offer?"

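The Playwright MCP Server replies with a list of its tools and their parameters. The reply below is simplified; in the protocol itself, each tool describes its parameters as a JSON Schema object under inputSchema, and the exact tool names and parameters depend on the server version: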

{
  "tools": [
    {
      "name": "browser_navigate",
      "description": "Navigate to a URL",
      "parameters": {
        "url": { "type": "string", "required": true }
      }
    },
    {
      "name": "browser_click",
      "description": "Click an element on the page",
      "parameters": {
        "selector": { "type": "string", "required": true }
      }
    },
    {
      "name": "browser_take_screenshot",
      "description": "Take a screenshot of the current page",
      "parameters": {}
    }
  ]
}

Claude reads this list and now knows exactly what it can ask Playwright to do.
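In the MCP specification, a tools/list result describes each tool's parameters with a JSON Schema object under inputSchema. The sketch below indexes such a result and checks a call for missing required arguments before sending it. The two tool entries are illustrative, and index_tools and missing_arguments are hypothetical helper names.

```python
# Shape follows an MCP tools/list result; the tool entries are illustrative.
tools_list_result = {
    "tools": [
        {
            "name": "browser_navigate",
            "description": "Navigate to a URL",
            "inputSchema": {
                "type": "object",
                "properties": {"url": {"type": "string"}},
                "required": ["url"],
            },
        },
        {
            "name": "browser_take_screenshot",
            "description": "Take a screenshot of the current page",
            "inputSchema": {"type": "object", "properties": {}},
        },
    ]
}


def index_tools(result: dict) -> dict:
    """Build a name -> schema lookup from a tools/list result."""
    return {tool["name"]: tool["inputSchema"] for tool in result["tools"]}


def missing_arguments(schema: dict, arguments: dict) -> list:
    """Return required argument names that the caller did not supply."""
    return [name for name in schema.get("required", []) if name not in arguments]


catalog = index_tools(tools_list_result)
print(sorted(catalog))
print(missing_arguments(catalog["browser_navigate"], {}))
```

This checks only for missing names; a full client would validate argument types against the schema as well.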

A Full Real Example: End to End

Let's trace one complete action. You ask Claude:

"Take a screenshot of google.com"

Here is every step that happens:

1. You ─────────────────────► Claude
        "screenshot google.com"

2. Claude ──────────────────► Playwright MCP Server
        {
          "method": "browser_navigate",
          "params": { "url": "https://google.com" }
        }

3. MCP Server ──────────────► Browser
        page.goto("https://google.com")

4. MCP Server ──────────────► Claude
        { "status": "navigated" }

5. Claude ──────────────────► Playwright MCP Server
        {
          "method": "browser_take_screenshot"
        }

6. MCP Server ──────────────► Claude
        { "image": "<base64 encoded image data>" }

7. Claude ──────────────────► You
        [displays the screenshot]

Two JSON calls. One visible result. Claude is orchestrating each step, deciding what to call next based on the previous response.
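The seven steps above reduce to two tool calls in which the second depends on the first succeeding. A minimal sketch, assuming a hypothetical call_tool function that sends one tool call and returns the simplified reply shapes used in this article; fake_call_tool stands in for the real server so the example runs without a browser.

```python
import base64

# Placeholder bytes standing in for real PNG data (hypothetical payload).
FAKE_IMAGE = base64.b64encode(b"not-a-real-png").decode("ascii")


def fake_call_tool(name: str, arguments: dict) -> dict:
    """Hypothetical stand-in for sending a tool call to the MCP server."""
    if name == "browser_navigate":
        return {"status": "navigated", "url": arguments["url"]}
    if name == "browser_take_screenshot":
        return {"image": FAKE_IMAGE}
    raise ValueError(f"unknown tool: {name}")


def screenshot_site(url: str, call_tool) -> bytes:
    """Navigate first, and only ask for a screenshot if navigation succeeded."""
    nav = call_tool("browser_navigate", {"url": url})
    if nav.get("status") != "navigated":
        raise RuntimeError(f"navigation failed: {nav}")
    shot = call_tool("browser_take_screenshot", {})
    # The image arrives as base64 text and is decoded back into raw bytes.
    return base64.b64decode(shot["image"])


image_bytes = screenshot_site("https://google.com", fake_call_tool)
print(len(image_bytes))
```

Passing call_tool in as a parameter keeps the orchestration logic separate from the transport, so the same function could run against a real server.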

How Claude Finds Elements: Visual Reasoning

Simple selectors work fine when the HTML is clean:

{ "method": "browser_click", "params": { "selector": "#submit-btn" } }

But real pages are rarely that clean:

<div class="x7k2p" data-v="3">Submit</div>

No useful selector. So Claude takes a different approach:

1. Calls browser_take_screenshot and receives a base64 image
2. Visually inspects the image (Claude is multimodal, so it can see)
3. Locates the button by its appearance and position
4. Clicks using pixel coordinates

{
  "method": "browser_click",
  "params": { "coordinate": [842, 560] }
}

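No CSS selector needed. Claude reasons the same way a human would: it sees a blue "Submit" button in the bottom right and clicks there. Whether coordinate clicks are available depends on the server; in Playwright MCP they are an opt-in capability, and the default tools target elements through references taken from the page snapshot.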

This is the loop that makes MCP powerful:

Claude ──► screenshot ──► [sees, thinks] ──► click ──► [sees, thinks] ──► next step
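The loop above can be sketched as observe, decide, act, repeated until there is nothing left to do. In this toy version the decide function is a hard-coded stand-in for the model's reasoning, and run_loop, observe, decide, and act are all hypothetical names.

```python
from typing import Callable, Optional

Action = tuple  # (tool_name, arguments)


def run_loop(
    observe: Callable[[], str],
    decide: Callable[[str], Optional[Action]],
    act: Callable[[str, dict], None],
    max_steps: int = 10,
) -> int:
    """Repeat observe -> decide -> act until decide returns None.

    Returns the number of actions taken. max_steps guards against a loop
    that never reaches its goal.
    """
    for step in range(max_steps):
        action = decide(observe())
        if action is None:
            return step
        act(*action)
    return max_steps


# Toy page: a form that needs one click before it shows "done".
state = {"page": "form"}


def observe() -> str:
    return state["page"]


def decide(page: str) -> Optional[Action]:
    # Stand-in for the model's reasoning: click Submit while the form is shown.
    if page == "form":
        return ("browser_click", {"selector": "#submit-btn"})
    return None


def act(name: str, arguments: dict) -> None:
    # A real implementation would send the tool call; here it just flips state.
    state["page"] = "done"


steps = run_loop(observe, decide, act)
print(steps)
```

The step cap matters in practice: without it, a page that never changes would keep the loop running indefinitely.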

Snapshot vs Screenshot: Choosing the Right Tool

Taking a screenshot on every step is expensive — images are large and slow to process.

Playwright MCP offers a smarter alternative: browser_snapshot. Instead of an image, it returns the accessibility tree of the page as text:

button "Submit" [id=submit-btn] at (842, 560)
input "Email" [id=email] at (400, 300)
link "Forgot password?" at (400, 400)

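This gives Claude element roles, names, and identifiers it can act on, without processing a full image. The listing above is illustrative; the server's actual output is a nested tree in which each element carries a reference that later tool calls can target.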
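Text is useful here because it can be searched by role and name rather than by pixels. The sketch below parses the example listing shown above. The regular expression is tied to that illustrative format and is an assumption, not a parser for the server's real output; parse_snapshot and find are hypothetical helpers.

```python
import re
from typing import Optional

SNAPSHOT = '''button "Submit" [id=submit-btn] at (842, 560)
input "Email" [id=email] at (400, 300)
link "Forgot password?" at (400, 400)'''

# role "name" [id=...] at (x, y); the [id=...] part is optional.
LINE = re.compile(
    r'^(?P<role>\w+) "(?P<name>[^"]*)"'
    r'(?: \[id=(?P<id>[^\]]+)\])?'
    r' at \((?P<x>\d+), (?P<y>\d+)\)$'
)


def parse_snapshot(text: str) -> list:
    """Turn each snapshot line into a dict; lines that do not match are skipped."""
    elements = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            item = match.groupdict()
            item["x"], item["y"] = int(item["x"]), int(item["y"])
            elements.append(item)
    return elements


def find(elements: list, role: str, name: str) -> Optional[dict]:
    """Look an element up by role and accessible name instead of by pixels."""
    return next(
        (e for e in elements if e["role"] == role and e["name"] == name), None
    )


elements = parse_snapshot(SNAPSHOT)
print(find(elements, "button", "Submit"))
```

A lookup like this is cheap compared with image analysis, which is the efficiency argument for preferring snapshots.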

Situation                                  Use
Need to see visual layout or styling       browser_take_screenshot
Need to find and interact with elements    browser_snapshot
Debugging a visual bug                     browser_take_screenshot
Fast, efficient automation                 browser_snapshot

MCP vs Writing Your Own Playwright Script

Both can automate a browser. So when do you use each?

Situation                                Use                 Why
One-off task: "grab this data for me"    MCP                 no code needed
Repeatable test in CI/CD                 Playwright script   version controlled
Debugging a flaky test interactively     MCP                 Claude reasons step by step
Production test suite                    Playwright script   no AI dependency

The core difference:

MCP              = Claude drives the browser live, reasoning at each step
Playwright script = code drives the browser, no AI at runtime

But here is the most powerful pattern — use both together:

Use MCP to explore the page, figure out the right selectors and flow, then ask Claude to write the Playwright script based on what it just learned.

You get the best of both worlds: AI-assisted discovery, production-grade output.
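One way to picture the handoff from exploration to script is to treat the explored tool calls as a recording and render them as Playwright code. The sketch below is a hypothetical generator, not something Claude or Playwright MCP provides. It maps the simplified tool names from this article onto Playwright's Python sync API (page.goto, page.click, page.screenshot), and the recorded steps and selector are made up for the example.

```python
def to_playwright_script(steps: list) -> str:
    """Render recorded (tool, arguments) steps as a Playwright Python script."""
    body = []
    for tool, args in steps:
        if tool == "browser_navigate":
            body.append(f"page.goto({args['url']!r})")
        elif tool == "browser_click":
            body.append(f"page.click({args['selector']!r})")
        elif tool == "browser_take_screenshot":
            body.append("page.screenshot(path='screenshot.png')")
        else:
            # Unknown tools are kept as a comment so nothing is silently lost.
            body.append(f"# TODO: no mapping for {tool}")
    indented = "\n".join("    " + line for line in body)
    return (
        "from playwright.sync_api import sync_playwright\n\n"
        "with sync_playwright() as p:\n"
        "    browser = p.chromium.launch()\n"
        "    page = browser.new_page()\n"
        f"{indented}\n"
        "    browser.close()\n"
    )


# Hypothetical recording of an exploration session.
steps = [
    ("browser_navigate", {"url": "https://github.com"}),
    ("browser_click", {"selector": "#submit-btn"}),
    ("browser_take_screenshot", {}),
]
script = to_playwright_script(steps)
print(script)
```

The generated file has no AI dependency at runtime, so it can be committed and run in CI like any other test.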

Setting It Up Yourself

Getting Playwright MCP running takes three steps.

1. Install the MCP server (optional if you launch it with npx, which fetches the package on first use)

npm install -g @playwright/mcp

2. Add it to your Claude Code config

Create or edit .mcp.json in your project root:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp"]
    }
  }
}

3. Restart Claude Code

Claude Code will automatically start the server, perform the handshake, and add all Playwright tools to its capabilities.

Now try it:

"Go to github.com and take a screenshot"

Claude will navigate, capture, and show you the result — no code written, no script run.
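Step 2 can also be scripted. The sketch below adds the playwright entry to a .mcp.json file while keeping any servers already listed there. The add_mcp_server helper and the pre-existing "other" server are hypothetical, and the demo runs in a temporary directory so no real project file is touched.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list) -> dict:
    """Add one server entry to a .mcp.json file, keeping any existing entries."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / ".mcp.json"
    # Pretend the project already configures another (hypothetical) server.
    existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
    path.write_text(json.dumps(existing))

    add_mcp_server(path, "playwright", "npx", ["@playwright/mcp"])
    written = json.loads(path.read_text())

print(sorted(written["mcpServers"]))
```

Merging rather than overwriting matters once a project lists more than one server in the same file.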

What You Now Know

MCP is a standard protocol — the USB of AI tools
It uses simple JSON messages over a local connection
The MCP Server is a translator between Claude and the real tool
Everything runs locally on your machine
Claude discovers tools automatically via a handshake at startup
Claude reasons visually between steps — it sees, thinks, then acts
MCP is for interactive exploration; scripts are for production automation
The killer combo: use MCP to discover, then generate a script
