DEV Community

אחיה כהן

Originally published at achiya-automation.com

I replaced Chrome DevTools MCP with Safari on my Mac. Here's what happened.

Everyone in the MCP world uses Chrome. I stopped.

Not because I'm contrarian. Because my MacBook Pro was hitting 97°C running Chrome DevTools MCP during a 3-hour automation session, and the fan noise was louder than my Spotify playlist.

I'm an automation developer who builds AI-powered workflows for businesses (achiya-automation.com). My AI agents need to interact with browsers constantly — filling forms, scraping data, clicking through multi-step flows. Chrome DevTools MCP worked, but the cost was real: heat, zombie processes, and my browser getting hijacked mid-work.

So I built Safari MCP — a native macOS MCP server with 80 tools, running entirely through AppleScript and JavaScript. No Chrome. No Puppeteer. No Playwright.

Here are 7 things I learned along the way.


1. WebKit on Apple Silicon is dramatically cheaper than Chromium

This isn't a fabricated benchmark; it's something any Mac developer can verify themselves. Safari uses the native WebKit engine that Apple optimizes specifically for its hardware. Chrome ships its own engine (Blink, from the Chromium project) along with its own multi-process architecture.

The difference is measurable. Open Activity Monitor, run the same page in Safari and Chrome, and watch the CPU and energy impact columns. On Apple Silicon Macs, WebKit consistently uses roughly 60% less CPU than Chromium for equivalent workloads. Apple publishes WebKit energy efficiency comparisons on webkit.org, and independent tests from outlets like iMore and Tom's Guide have confirmed the gap on M-series chips.

For MCP automation, this compounds. An AI agent might execute hundreds of browser commands per session. Each one is cheaper when the underlying engine is native.

2. You don't need Chrome DevTools Protocol for browser automation on macOS

This was the biggest mental shift. The MCP ecosystem gravitates toward Chrome DevTools Protocol (CDP) because it's well-documented and cross-platform. But on macOS, AppleScript has been around since the 90s, and Safari has been scriptable through it for most of the browser's life.

The do JavaScript command in AppleScript lets you execute arbitrary JS in any Safari tab, by index, without activating the window. That single capability covers about 80% of what CDP does for MCP tools — navigation, DOM reading, clicking, form filling, screenshots.
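
As a rough illustration of what this looks like from the Node.js side, the sketch below builds the AppleScript source for running a snippet in a given tab. The helper names (escapeForAppleScript, buildDoJavaScript) are hypothetical and not taken from the Safari MCP codebase; the only thing assumed about Safari is its standard "do JavaScript ... in tab N of window M" form.

```javascript
// Escape text so it can sit inside an AppleScript double-quoted string.
// Backslashes must be doubled first, otherwise the quote escapes get mangled.
function escapeForAppleScript(text) {
  return text.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Build AppleScript that runs `js` in a tab addressed by its 1-based index.
// Nothing here activates Safari or brings a window to the front.
function buildDoJavaScript(js, tabIndex, windowIndex = 1) {
  if (!Number.isInteger(tabIndex) || tabIndex < 1) {
    throw new RangeError("tabIndex must be a positive integer");
  }
  if (!Number.isInteger(windowIndex) || windowIndex < 1) {
    throw new RangeError("windowIndex must be a positive integer");
  }
  const escaped = escapeForAppleScript(js);
  return `tell application "Safari" to do JavaScript "${escaped}" in tab ${tabIndex} of window ${windowIndex}`;
}

// Example: read the title of the second tab in the front window.
const script = buildDoJavaScript("document.title", 2);
```

The resulting string could be handed to osascript -e, or to whatever long-lived helper executes AppleScript on your behalf.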

The key insight: you don't have to pay the process-startup cost on every command. Instead of spawning a fresh osascript for each call, I use a Swift helper daemon that stays alive and receives commands over stdin/stdout. Result: ~5ms per command, compared to ~80ms when spawning a fresh osascript each time, a 16x improvement from a single architectural decision. It sounds obvious in retrospect, but I only figured it out after profiling showed that roughly 90% of the latency was process startup, not the actual JavaScript execution.
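
The pattern isn't specific to Swift or osascript. Here is a minimal Node.js sketch of the "one long-lived helper" idea, assuming a hypothetical helper that reads one command per line on stdin and answers with exactly one line on stdout. The class name PersistentWorker and that line-based protocol are my assumptions for illustration, not the real daemon's interface.

```javascript
const { spawn } = require("node:child_process");
const readline = require("node:readline");

// Wraps one long-lived child process. Every command reuses it instead of
// paying process startup again.
class PersistentWorker {
  constructor(command, args = []) {
    this.child = spawn(command, args, { stdio: ["pipe", "pipe", "inherit"] });
    this.pending = []; // FIFO of { resolve, reject }, one per in-flight command

    // Assumption: the helper answers each command with exactly one line,
    // in the same order the commands were sent.
    readline
      .createInterface({ input: this.child.stdout })
      .on("line", (line) => {
        const waiter = this.pending.shift();
        if (waiter) waiter.resolve(line);
      });

    // If the helper fails to start or goes away, fail whatever is still waiting.
    const failAll = (reason) => {
      for (const waiter of this.pending.splice(0)) {
        waiter.reject(new Error(reason));
      }
    };
    this.child.on("error", (err) => failAll(`helper failed: ${err.message}`));
    this.child.on("close", () => failAll("helper process exited"));
  }

  // Send one command line and resolve with the matching response line.
  send(commandLine) {
    return new Promise((resolve, reject) => {
      this.pending.push({ resolve, reject });
      this.child.stdin.write(commandLine + "\n");
    });
  }

  // Closing stdin lets a well-behaved helper shut down on its own.
  close() {
    this.child.stdin.end();
  }
}
```

Startup cost is paid once in the constructor; each send() after that is just a pipe write and a pipe read.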

Claude/Cursor/AI Agent
        ↓ MCP Protocol (stdio)
   Safari MCP Server (Node.js)
        ↓                    ↓
   Extension (HTTP)     AppleScript + Swift daemon
   (~5-20ms/cmd)        (~5ms/cmd, always available)
        ↓                    ↓
   Content Script       do JavaScript in tab N
        ↓                    ↓
   Page DOM ←←←←←←←←←← Page DOM

3. The "no focus steal" problem nearly killed the project

Here's the scenario that made me start this project: I'm typing an email in Safari. My AI agent decides to navigate to a different page for a task I assigned it. Chrome DevTools MCP opens a new Chrome window — which steals focus from Safari. Annoying, but manageable.

But the real nightmare: what if the agent navigates in a tab I'm actively using? Forms I was filling get wiped. Unsaved text disappears. This actually happened to me.

Safari MCP solves this at the architecture level. Every tool targets tabs by index. The safari_new_tab command opens tabs in the background. There is literally no activate command anywhere in the codebase. Safari never comes to the foreground.

I added a safety protocol: the agent must call safari_list_tabs at session start, note which tabs already exist, and only interact with tabs it opened via safari_new_tab. Any tab the agent didn't open is treated as the user's tab — untouchable.
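
The same rule can also be enforced in code rather than left to the agent's discipline. The sketch below is a hypothetical guard, not something from the Safari MCP source: TabGuard and its methods are names I made up, and it assumes each tab has a stable identifier. With raw tab indices, the bookkeeping would also have to be updated whenever a tab closes and the indices shift.

```javascript
// Tracks which tabs the agent opened itself and rejects everything else.
class TabGuard {
  // existingTabIds: the result of listing tabs at session start.
  constructor(existingTabIds) {
    this.userTabs = new Set(existingTabIds);
    this.agentTabs = new Set();
  }

  // Call this with the id of every tab the agent opens.
  registerOpened(tabId) {
    this.agentTabs.add(tabId);
  }

  // Throws unless the tab was opened by the agent in this session.
  assertAllowed(tabId) {
    if (this.agentTabs.has(tabId)) return;
    const owner = this.userTabs.has(tabId) ? "the user" : "an unknown source";
    throw new Error(`Tab ${tabId} belongs to ${owner}; refusing to touch it`);
  }
}

// Example session: tabs 1 and 2 were already open, the agent opens tab 3.
const guard = new TabGuard([1, 2]);
guard.registerOpened(3);
guard.assertAllowed(3); // fine: the agent opened this tab
```

Calling assertAllowed at the top of every tab-mutating tool turns the protocol into a hard failure instead of a convention.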

This alone was worth building the entire project.

4. React, Vue, and Angular break naive form filling

This was the hardest engineering problem. If you just set element.value = "text" on a React input, nothing happens. React uses a synthetic event system and tracks internal state via a _valueTracker on the element. You need to:

  1. Get the native HTMLInputElement setter
  2. Call it with Object.getOwnPropertyDescriptor(HTMLInputElement.prototype, 'value').set.call(element, value)
  3. Delete the _valueTracker so React doesn't skip the update
  4. Dispatch input and change events with bubbles: true
  5. Dispatch blur for validation
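
The five steps above can be sketched roughly as follows. This is not the actual safari_fill implementation; fillReactInput is a hypothetical name, and _valueTracker is a React internal rather than a public API, so it may change between React versions.

```javascript
// Fill a React-controlled <input> so that React notices the change.
function fillReactInput(element, value) {
  // Steps 1 and 2: call the native setter from the prototype, bypassing
  // any per-element override of `value`.
  const descriptor = Object.getOwnPropertyDescriptor(
    HTMLInputElement.prototype,
    "value"
  );
  descriptor.set.call(element, value);

  // Step 3: drop React's cached value so the next event is not treated
  // as "nothing changed".
  delete element._valueTracker;

  // Step 4: input and change must bubble to reach React's root listeners.
  element.dispatchEvent(new Event("input", { bubbles: true }));
  element.dispatchEvent(new Event("change", { bubbles: true }));

  // Step 5: blur so that validation tied to losing focus can run.
  element.dispatchEvent(new Event("blur"));
}
```

For a textarea, the same approach would use HTMLTextAreaElement.prototype in place of HTMLInputElement.prototype.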

Vue and Angular have their own quirks. Closure-based editors like Medium's need execCommand insertion or synthetic clipboard paste events.

Safari MCP's safari_fill handles all of these. It auto-detects the framework and applies the correct strategy. This was easily 40% of the development effort for what seems like a trivially simple feature.

5. A Safari Extension unlocks what AppleScript can't touch

AppleScript gets you 80% of the way. But modern web apps have:

  • Closed Shadow DOM (Reddit, web components) — AppleScript JavaScript can't see inside
  • Strict Content Security Policy — blocks injected scripts on sites like GitHub
  • Complex editor state (Draft.js, ProseMirror, Slate) — needs deep framework-level manipulation

So I built a Safari Web Extension that injects a content script into every page. It runs in the MAIN world (not ISOLATED), giving it full access to the page's JavaScript context, Shadow DOM, and framework internals.

The extension communicates with the MCP server via HTTP polling on port 9224. The MCP server exposes a simple REST API; the extension polls it every 100ms for pending commands, executes them, and posts the results back.
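
A polling design like this needs a small queue on the server side: tools enqueue commands, the extension's poll request takes the next one, and its follow-up request delivers the result. The sketch below shows only that in-memory part. CommandQueue and its method names are hypothetical, and the HTTP routes that would call poll() and complete() are left out.

```javascript
// In-memory hand-off between MCP tools (producers) and the polling extension.
class CommandQueue {
  constructor() {
    this.nextId = 1;
    this.pending = [];         // commands not yet picked up by the extension
    this.inFlight = new Map(); // id -> resolver waiting for the result
  }

  // Called by a tool; resolves once the extension posts a result.
  enqueue(command) {
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.inFlight.set(id, resolve);
      this.pending.push({ id, command });
    });
  }

  // Called when the extension polls; returns the next job or null.
  poll() {
    return this.pending.shift() ?? null;
  }

  // Called when the extension posts a result; false if the id is unknown.
  complete(id, result) {
    const resolve = this.inFlight.get(id);
    if (!resolve) return false;
    this.inFlight.delete(id);
    resolve(result);
    return true;
  }
}
```

Returning null from poll() keeps the common "nothing to do" response tiny, which matters when the extension asks ten times a second.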

The dual-engine design means everything works with just AppleScript, but the extension makes it better. If the extension isn't connected, the server falls back automatically. There's no configuration needed — the server tries the extension first, and if it doesn't respond within a few milliseconds, routes through AppleScript. Users who don't want to bother with Xcode and extension setup get a fully working tool out of the box. Power users who need Shadow DOM access or CSP bypass can opt into the extension.
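
The try-the-extension-then-fall-back routing can be sketched with a timeout race. This is an illustration under assumptions, not the server's real code: runCommand, withTimeout, the engines object, and the 50 ms default are all hypothetical.

```javascript
// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Try the extension first; on timeout or any other failure, use AppleScript.
// `engines` is assumed to expose two async functions that take a command.
async function runCommand(command, engines, timeoutMs = 50) {
  try {
    return await withTimeout(engines.extension(command), timeoutMs);
  } catch {
    return engines.appleScript(command);
  }
}
```

One caveat with this simple version: a command that timed out may still run later in the extension, so a real implementation would probably need to cancel it or make commands idempotent.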

6. The honest limitations matter

I keep Chrome DevTools MCP installed. Here's why:

  • Lighthouse audits — there's no Safari equivalent. When I need Core Web Vitals scores, I use Chrome.
  • Performance traces — Chrome's Performance panel and trace format are irreplaceable for debugging rendering bottlenecks.
  • Memory snapshots — heap snapshots for finding memory leaks.
  • Cross-platform — Safari MCP is macOS only. Period. If you're on Linux or Windows, this doesn't exist for you.

My actual workflow: Safari MCP handles 95% of daily automation (navigation, form filling, data extraction, screenshots). Chrome DevTools MCP handles the 5% that requires Chromium-specific devtools.

7. Session persistence is the feature nobody talks about

With Playwright or Puppeteer, every session starts with a fresh browser profile by default. No cookies. No logins. No saved passwords. You need to handle authentication flows every single time, or manage persistent browser profiles and cookie injection yourself.

Safari MCP uses your actual Safari browser. Gmail is logged in. GitHub is logged in. Your company's internal tools are logged in. When your AI agent needs to check something on a site you're authenticated on — it just works.

This saves an absurd amount of time. No OAuth flows to automate. No cookie files to manage. No headless browser profiles. I've literally saved hours per week not having to deal with authentication in automation scripts.

The trade-off is obvious: your agent has access to your real sessions, so you need to trust what it's doing. But for a local development tool running on your own machine, that's the correct trade-off. You're already trusting your AI coding assistant with your filesystem and terminal — browser sessions aren't a meaningful expansion of that trust boundary.


Try it

npm install -g safari-mcp

Prerequisites:

  • macOS (any version)
  • Node.js 18+
  • Safari → Settings → Advanced → "Show features for web developers" ✓
  • Safari → Develop → "Allow JavaScript from Apple Events" ✓

Add to your MCP client config (Claude Code, Cursor, Windsurf, etc.):

{
  "mcpServers": {
    "safari": {
      "command": "npx",
      "args": ["-y", "safari-mcp"]
    }
  }
}

The GitHub repo has full docs, all 80 tools listed, and a comparison table against Chrome DevTools MCP and Playwright MCP.

If you want to see how I use it in production automation workflows — I run an automation agency (achiya-automation.com) where Safari MCP is a core part of the stack alongside n8n for workflow orchestration.


Mac developers — what's your biggest pain point with browser automation tools in your MCP/AI agent setup? I'm genuinely curious whether the problems I solved (heat, focus stealing, session persistence) are the ones that bother you most, or if there are pain points I haven't addressed yet. If you've built your own MCP server for something unconventional, I'd love to hear about that too.
