DEV Community

Cover image for MCP Explained: How AI Plus MCP Controls a Real Browser
Mani Movassagh
Mani Movassagh

Posted on

MCP Explained: How AI Plus MCP Controls a Real Browser

The Problem

AI models like Claude are powerful — but they live inside a text box.

By default, I can read your message, think, and reply. That's it. I cannot open a browser, click a button, fill a form, or take a screenshot. My "hands" stop at the conversation window.

Developers worked around this by building custom integrations — glue code that connected AI to specific tools. Every team reinvented the wheel. There was no standard.

Without MCP:
  Claude ── custom glue ──► Playwright
  Claude ── custom glue ──► Supabase
  Claude ── custom glue ──► Gmail
Enter fullscreen mode Exit fullscreen mode

Every integration was one-off, fragile, and hard to maintain.


What MCP Is

MCP stands for Model Context Protocol. It is an open standard created by Anthropic that defines one consistent way for any AI to communicate with any external tool or service.

Think of it like USB. Before USB, every device had its own connector. After USB, one standard plug worked with everything. MCP does the same for AI tools.

With MCP:
  Claude ── MCP ──► Playwright MCP Server ──► Browser
  Claude ── MCP ──► Supabase MCP Server   ──► Database
  Claude ── MCP ──► Gmail MCP Server      ──► Email
Enter fullscreen mode Exit fullscreen mode

Same protocol. Different servers. Infinite tools.


The Language: JSON

MCP communicates using JSON messages — structured text passed back and forth between Claude and an MCP server.

When I want to navigate to a URL, I send this:

{
  "method": "browser_navigate",
  "params": {
    "url": "https://google.com"
  }
}
Enter fullscreen mode Exit fullscreen mode

The server executes the action and sends back a result:

{
  "status": "navigated",
  "url": "https://google.com"
}
Enter fullscreen mode Exit fullscreen mode

That's it. No magic. Just JSON flying between two programs.


The MCP Server: The Translator

Raw Playwright has no idea what to do with my JSON. It doesn't speak MCP natively.

That's where the Playwright MCP Server comes in. It is a small program that:

  1. Listens for JSON messages from Claude
  2. Translates them into real Playwright API calls
  3. Controls the browser
  4. Sends results back to Claude
Claude ──JSON──► Playwright MCP Server ──► Playwright ──► Browser
Enter fullscreen mode Exit fullscreen mode

The MCP Server is the middleman. It bridges the gap between my structured messages and the actual browser automation library.


Where It Runs: Your Machine, Not the Cloud

This is a common misconception. The Playwright MCP Server does not run on AWS, Anthropic's servers, or anywhere on the internet.

It runs as a local process on your machine, right alongside Claude Code:

Your Machine
├── Claude Code
├── Playwright MCP Server  ← a Node.js process, running locally
└── Chrome / Firefox
Enter fullscreen mode Exit fullscreen mode

Claude Code connects to it through a local socket or pipe. Your browser never leaves your machine unless you navigate to an external URL.


How Claude Discovers Tools: The Handshake

When Claude Code starts, it reads a config file (.mcp.json or settings.json) that lists available MCP servers:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp"]
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Claude Code starts the process, then immediately asks:

"What tools do you offer?"

The Playwright MCP Server replies with a full list of available tools and their parameters:

{
  "tools": [
    {
      "name": "browser_navigate",
      "description": "Navigate to a URL",
      "parameters": {
        "url": { "type": "string", "required": true }
      }
    },
    {
      "name": "browser_click",
      "description": "Click an element on the page",
      "parameters": {
        "selector": { "type": "string", "required": true }
      }
    },
    {
      "name": "browser_take_screenshot",
      "description": "Take a screenshot of the current page",
      "parameters": {}
    }
  ]
}
Enter fullscreen mode Exit fullscreen mode

Claude reads this list and now knows exactly what it can ask Playwright to do.


A Full Real Example: End to End

Let's trace one complete action. You ask Claude:

"Take a screenshot of google.com"

Here is every step that happens:

1. You ─────────────────────► Claude
        "screenshot google.com"

2. Claude ──────────────────► Playwright MCP Server
        {
          "method": "browser_navigate",
          "params": { "url": "https://google.com" }
        }

3. MCP Server ──────────────► Browser
        playwright.goto("https://google.com")

4. MCP Server ──────────────► Claude
        { "status": "navigated" }

5. Claude ──────────────────► Playwright MCP Server
        {
          "method": "browser_take_screenshot"
        }

6. MCP Server ──────────────► Claude
        { "image": "<base64 encoded image data>" }

7. Claude ──────────────────► You
        [displays the screenshot]
Enter fullscreen mode Exit fullscreen mode

Two JSON calls. One visible result. Claude is orchestrating each step, deciding what to call next based on the previous response.


How Claude Finds Elements: Visual Reasoning

Simple selectors work fine when the HTML is clean:

{ "method": "browser_click", "params": { "selector": "#submit-btn" } }
Enter fullscreen mode Exit fullscreen mode

But real pages are rarely that clean:

<div class="x7k2p" data-v="3">Submit</div>
Enter fullscreen mode Exit fullscreen mode

No useful selector. So Claude takes a different approach:

  1. Calls browser_take_screenshot — receives a base64 image
  2. Visually inspects the image — Claude is multimodal, it can see
  3. Locates the button by its appearance and position
  4. Clicks using pixel coordinates
{
  "method": "browser_click",
  "params": { "coordinate": [842, 560] }
}
Enter fullscreen mode Exit fullscreen mode

No CSS selector needed. Claude reasons the same way a human would — it sees a blue "Submit" button in the bottom right and clicks there.

This is the loop that makes MCP powerful:

Claude ──► screenshot ──► [sees, thinks] ──► click ──► [sees, thinks] ──► next step
Enter fullscreen mode Exit fullscreen mode

Snapshot vs Screenshot: Choosing the Right Tool

Taking a screenshot on every step is expensive — images are large and slow to process.

Playwright MCP offers a smarter alternative: browser_snapshot. Instead of an image, it returns the accessibility tree of the page as text:

button "Submit" [id=submit-btn] at (842, 560)
input "Email" [id=email] at (400, 300)
link "Forgot password?" at (400, 400)
Enter fullscreen mode Exit fullscreen mode

This gives Claude exact element references, names, and positions — without processing a full image.

Situation Use
Need to see visual layout or styling browser_take_screenshot
Need to find and interact with elements browser_snapshot
Debugging a visual bug browser_take_screenshot
Fast, efficient automation browser_snapshot

MCP vs Writing Your Own Playwright Script

Both can automate a browser. So when do you use each?

Situation Use
One-off task: "grab this data for me" MCP — no code needed
Repeatable test in CI/CD Playwright script — version controlled
Debugging a flaky test interactively MCP — Claude reasons step by step
Production test suite Playwright script — no AI dependency

The core difference:

MCP              = Claude drives the browser live, reasoning at each step
Playwright script = code drives the browser, no AI at runtime
Enter fullscreen mode Exit fullscreen mode

But here is the most powerful pattern — use both together:

Use MCP to explore the page, figure out the right selectors and flow, then ask Claude to write the Playwright script based on what it just learned.

You get the best of both worlds: AI-assisted discovery, production-grade output.


Setting It Up Yourself

Getting Playwright MCP running takes three steps.

1. Install the MCP server

npm install -g @playwright/mcp
Enter fullscreen mode Exit fullscreen mode

2. Add it to your Claude Code config

Create or edit .mcp.json in your project root:

{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp"]
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

3. Restart Claude Code

Claude Code will automatically start the server, perform the handshake, and add all Playwright tools to its capabilities.

Now try it:

"Go to github.com and take a screenshot"

Claude will navigate, capture, and show you the result — no code written, no script run.


What You Now Know

  • MCP is a standard protocol — the USB of AI tools
  • It uses simple JSON messages over a local connection
  • The MCP Server is a translator between Claude and the real tool
  • Everything runs locally on your machine
  • Claude discovers tools automatically via a handshake at startup
  • Claude reasons visually between steps — it sees, thinks, then acts
  • MCP is for interactive exploration; scripts are for production automation
  • The killer combo: use MCP to discover, then generate a script

Top comments (0)