The Problem
AI models like Claude are powerful — but they live inside a text box.
By default, I can read your message, think, and reply. That's it. I cannot open a browser, click a button, fill a form, or take a screenshot. My "hands" stop at the conversation window.
Developers worked around this by building custom integrations — glue code that connected AI to specific tools. Every team reinvented the wheel. There was no standard.
Without MCP:
Claude ── custom glue ──► Playwright
Claude ── custom glue ──► Supabase
Claude ── custom glue ──► Gmail
Every integration was one-off, fragile, and hard to maintain.
What MCP Is
MCP stands for Model Context Protocol. It is an open standard created by Anthropic that defines one consistent way for any AI to communicate with any external tool or service.
Think of it like USB. Before USB, every device had its own connector. After USB, one standard plug worked with everything. MCP does the same for AI tools.
With MCP:
Claude ── MCP ──► Playwright MCP Server ──► Browser
Claude ── MCP ──► Supabase MCP Server ──► Database
Claude ── MCP ──► Gmail MCP Server ──► Email
Same protocol. Different servers. Infinite tools.
The Language: JSON
MCP communicates using JSON messages — structured text passed back and forth between Claude and an MCP server.
When I want to navigate to a URL, I send this:
{
"method": "browser_navigate",
"params": {
"url": "https://google.com"
}
}
The server executes the action and sends back a result:
{
"status": "navigated",
"url": "https://google.com"
}
That's it. No magic. Just JSON flying between two programs.
The MCP Server: The Translator
Raw Playwright has no idea what to do with my JSON. It doesn't speak MCP natively.
That's where the Playwright MCP Server comes in. It is a small program that:
- Listens for JSON messages from Claude
- Translates them into real Playwright API calls
- Controls the browser
- Sends results back to Claude
Claude ──JSON──► Playwright MCP Server ──► Playwright ──► Browser
The MCP Server is the middleman. It bridges the gap between my structured messages and the actual browser automation library.
Where It Runs: Your Machine, Not the Cloud
This is a common misconception. The Playwright MCP Server does not run on AWS, Anthropic's servers, or anywhere on the internet.
It runs as a local process on your machine, right alongside Claude Code:
Your Machine
├── Claude Code
├── Playwright MCP Server ← a Node.js process, running locally
└── Chrome / Firefox
Claude Code connects to it through a local socket or pipe. Your browser never leaves your machine unless you navigate to an external URL.
How Claude Discovers Tools: The Handshake
When Claude Code starts, it reads a config file (.mcp.json or settings.json) that lists available MCP servers:
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp"]
}
}
}
Claude Code starts the process, then immediately asks:
"What tools do you offer?"
The Playwright MCP Server replies with a full list of available tools and their parameters:
{
"tools": [
{
"name": "browser_navigate",
"description": "Navigate to a URL",
"parameters": {
"url": { "type": "string", "required": true }
}
},
{
"name": "browser_click",
"description": "Click an element on the page",
"parameters": {
"selector": { "type": "string", "required": true }
}
},
{
"name": "browser_take_screenshot",
"description": "Take a screenshot of the current page",
"parameters": {}
}
]
}
Claude reads this list and now knows exactly what it can ask Playwright to do.
A Full Real Example: End to End
Let's trace one complete action. You ask Claude:
"Take a screenshot of google.com"
Here is every step that happens:
1. You ─────────────────────► Claude
"screenshot google.com"
2. Claude ──────────────────► Playwright MCP Server
{
"method": "browser_navigate",
"params": { "url": "https://google.com" }
}
3. MCP Server ──────────────► Browser
playwright.goto("https://google.com")
4. MCP Server ──────────────► Claude
{ "status": "navigated" }
5. Claude ──────────────────► Playwright MCP Server
{
"method": "browser_take_screenshot"
}
6. MCP Server ──────────────► Claude
{ "image": "<base64 encoded image data>" }
7. Claude ──────────────────► You
[displays the screenshot]
Two JSON calls. One visible result. Claude is orchestrating each step, deciding what to call next based on the previous response.
How Claude Finds Elements: Visual Reasoning
Simple selectors work fine when the HTML is clean:
{ "method": "browser_click", "params": { "selector": "#submit-btn" } }
But real pages are rarely that clean:
<div class="x7k2p" data-v="3">Submit</div>
No useful selector. So Claude takes a different approach:
- Calls
browser_take_screenshot— receives a base64 image - Visually inspects the image — Claude is multimodal, it can see
- Locates the button by its appearance and position
- Clicks using pixel coordinates
{
"method": "browser_click",
"params": { "coordinate": [842, 560] }
}
No CSS selector needed. Claude reasons the same way a human would — it sees a blue "Submit" button in the bottom right and clicks there.
This is the loop that makes MCP powerful:
Claude ──► screenshot ──► [sees, thinks] ──► click ──► [sees, thinks] ──► next step
Snapshot vs Screenshot: Choosing the Right Tool
Taking a screenshot on every step is expensive — images are large and slow to process.
Playwright MCP offers a smarter alternative: browser_snapshot. Instead of an image, it returns the accessibility tree of the page as text:
button "Submit" [id=submit-btn] at (842, 560)
input "Email" [id=email] at (400, 300)
link "Forgot password?" at (400, 400)
This gives Claude exact element references, names, and positions — without processing a full image.
| Situation | Use |
|---|---|
| Need to see visual layout or styling | browser_take_screenshot |
| Need to find and interact with elements | browser_snapshot |
| Debugging a visual bug | browser_take_screenshot |
| Fast, efficient automation | browser_snapshot |
MCP vs Writing Your Own Playwright Script
Both can automate a browser. So when do you use each?
| Situation | Use |
|---|---|
| One-off task: "grab this data for me" | MCP — no code needed |
| Repeatable test in CI/CD | Playwright script — version controlled |
| Debugging a flaky test interactively | MCP — Claude reasons step by step |
| Production test suite | Playwright script — no AI dependency |
The core difference:
MCP = Claude drives the browser live, reasoning at each step
Playwright script = code drives the browser, no AI at runtime
But here is the most powerful pattern — use both together:
Use MCP to explore the page, figure out the right selectors and flow, then ask Claude to write the Playwright script based on what it just learned.
You get the best of both worlds: AI-assisted discovery, production-grade output.
Setting It Up Yourself
Getting Playwright MCP running takes three steps.
1. Install the MCP server
npm install -g @playwright/mcp
2. Add it to your Claude Code config
Create or edit .mcp.json in your project root:
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp"]
}
}
}
3. Restart Claude Code
Claude Code will automatically start the server, perform the handshake, and add all Playwright tools to its capabilities.
Now try it:
"Go to github.com and take a screenshot"
Claude will navigate, capture, and show you the result — no code written, no script run.
What You Now Know
- MCP is a standard protocol — the USB of AI tools
- It uses simple JSON messages over a local connection
- The MCP Server is a translator between Claude and the real tool
- Everything runs locally on your machine
- Claude discovers tools automatically via a handshake at startup
- Claude reasons visually between steps — it sees, thinks, then acts
- MCP is for interactive exploration; scripts are for production automation
- The killer combo: use MCP to discover, then generate a script
Top comments (0)