אחיה כהן

Posted on Apr 14

I Tried to Auto-Launch My MCP Server Using My MCP Server. It Found Its Own Bug.

#javascript #mcp #browserautomation #debugging

TLDR

I built safari-mcp, an MCP server that lets AI agents drive Safari natively on macOS. This week I shipped a discoverability push for it: post the launch announcement to Hacker News, X, LinkedIn, and Reddit. Naturally, I tried to automate the campaign using safari-mcp itself.

It worked for HN. It worked for X. Then LinkedIn started running clicks on a completely different tab — Catchpoint Internet Performance Monitoring, which I'd never visited. Three windows, a URL prefix match, and a 500 ms cache TTL conspired to teach me a lesson about tab identity.

Here's the detective story, the root cause, and the fix that ships in v2.8.3 today.

The Setup: Eating My Own Dog Food

I had four launch targets:

Show HN — submit the link, post a first comment
X (Twitter) — a single thread that quotes the article
LinkedIn — a Hebrew-English bilingual long-form post
Reddit r/ClaudeAI — a tool-launch-with-context post

I'd just shipped a HackerNoon technical deep-dive about how I built browser automation for a browser that has no Chrome DevTools Protocol. The launch was the natural follow-on. And of course I was going to drive it through safari-mcp — what's the point of building a Safari automation tool if you don't use it for your own launch?

"Eat your own dog food at launch — bugs surface fast." — me, after this incident.

Round 1: HN and X Worked Beautifully

The HN submission flow was textbook. Open news.ycombinator.com/submit, fill the title and URL inputs, call form.submit() via injected JS, follow the redirect, find the new item ID via submitted?id=<user>. About 8 seconds end-to-end.

// Verify the form is real, not some other tab
JSON.stringify({
  url: location.href,
  hasTitleInput: !!document.querySelector('input[name="title"]'),
  hasUrlInput: !!document.querySelector('input[name="url"]')
})
// → {"url":"https://news.ycombinator.com/submit","hasTitleInput":true,"hasUrlInput":true}

Filled both inputs. Called form.submit(). Got redirected to /newest. Walked back to /submitted?id=Achiyacohen and confirmed the new post sat at #1 with 1 point. Live.

X was even smoother. The compose textbox in x.com/home is a contenteditable with aria-label="Post text". I filled it with the thread text, found the button[data-testid="tweetButtonInline"], dispatched a React-aware pointer event sequence (mousedown → mouseup → click), and watched the textbox empty itself. Verified by reading the user's profile timeline 30 seconds later: the tweet was there, with my exact text and a fresh status/2044134672683110740 URL. Live.
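A minimal sketch of that click sequence, assuming "React-aware" here just means bubbling, cancelable events that a delegated root listener can observe. The helper name dispatchClickSequence and the injectable EventCtor parameter are illustrative and not part of safari-mcp.

```javascript
// Dispatch mousedown -> mouseup -> click on `el`, in that order.
// Each event bubbles so that listeners attached higher in the tree
// (where React delegates its handlers) can see it.
// `EventCtor` defaults to the browser's MouseEvent; it is injectable
// only so the function can be exercised outside a browser.
function dispatchClickSequence(el, EventCtor = MouseEvent) {
  const types = ["mousedown", "mouseup", "click"];
  for (const type of types) {
    el.dispatchEvent(new EventCtor(type, { bubbles: true, cancelable: true }));
  }
  return types;
}

// In the page (not executed here):
// dispatchClickSequence(
//   document.querySelector('button[data-testid="tweetButtonInline"]')
// );
```

Events created this way are synthetic, so any handler that inspects event.isTrusted can still tell them apart from real input.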

Two for two. I was feeling good.

Round 2: Then LinkedIn Got Weird

LinkedIn's "Start a post" button (in Hebrew: "כתבו פוסט") is a div with class names like _73dfa4c8 ed6e5932 _1d1c97a4. I found it, dispatched the same React-aware click sequence, and waited for the compose modal to appear.

It didn't.

I called safari_evaluate to check whether [contenteditable="true"] had appeared anywhere on the page. The result came back empty — zero contenteditable elements. That was strange. Even the LinkedIn feed itself has search inputs and other interactive elements. So I asked the page for its URL and title to make sure I was in the right place.

The response:

{
  "title": "API Monitoring | Catchpoint Internet Performance Monitoring",
  "url": "https://www.catchpoint.com/application-experience/api-monitoring?utm_campaign=Hackernoon-TOFU-billboard"
}

Catchpoint. I'd never visited Catchpoint.

The First Suspicion: Tab Tracking

The first hypothesis was that safari-mcp's tab tracking had drifted. The MCP keeps a cached _activeTabIndex in memory and uses it for all subsequent operations on a tab it opened. The cache has a TTL of 500 ms, after which resolveActiveTab re-verifies by URL prefix matching.
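To make the caching behaviour concrete, here is a small sketch of a TTL-guarded index cache. It is a simplified model, not safari-mcp's code: createTabIndexCache, the synchronous revalidate callback, and the injectable clock are all assumptions made for illustration (the real revalidation is an asynchronous osascript call).

```javascript
// Cache a tab index and trust it for `ttlMs` milliseconds.
// After that, `revalidate(oldIndex)` must confirm or replace it.
function createTabIndexCache(ttlMs, revalidate, now = Date.now) {
  let index = null;
  let checkedAt = -Infinity;
  return {
    set(i) {
      index = i;
      checkedAt = now();
    },
    get() {
      // Fresh enough: return the cached index without touching Safari.
      if (index !== null && now() - checkedAt < ttlMs) return index;
      // Stale: ask the caller-supplied check for the current index.
      index = revalidate(index);
      checkedAt = now();
      return index;
    },
  };
}
```

Anything that moves the tab inside the TTL window goes unnoticed until the next revalidation, which is the gap this incident fell into.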

I called safari_list_tabs and got 12 tabs in the profile window, with the LinkedIn tab right where I expected it. The cache and the actual tab layout agreed: tab 12 was LinkedIn.

Then why was safari_evaluate returning Catchpoint?

Detective Work: There Are Three Windows

I dropped down to raw AppleScript to bypass the MCP layer:

tell application "Safari"
  set output to ""
  set wCount to count of windows
  set output to "Total windows: " & wCount & linefeed
  repeat with w from 1 to wCount
    set output to output & "Window " & w & ": " & (count of tabs of window w) & " tabs" & linefeed
    set output to output & "  name: " & (name of window w) & linefeed
    set output to output & "  tab1: " & (URL of tab 1 of window w) & linefeed
  end repeat
  return output
end tell

Output:

Total windows: 3
Window 1: 2 tabs
  name: אישי — Documenso
  tab1: https://mail.google.com/mail/u/0/#starred/...
Window 2: 12 tabs
  name: אוטומציות — API Monitoring | Catchpoint Internet Performance Monitoring
  tab1: https://hackernoon.com/login?redirect=app
Window 3: 3 tabs
  name: אישי — תוכנה קלה לשליחה למחשב מרחוק - Claude
  tab1: https://claude.ai/recents

Three windows. Two profiles ("אישי" / Personal and "אוטומציות" / Automation). Safari MCP was correctly targeting Window 2 ("אוטומציות"), where my LinkedIn tab actually lived as tab 12. So far so good.

The Catchpoint URL? It was tab 5 of Window 2 — a tab the user (me) had clicked open earlier from a HackerNoon ad without thinking. It was sitting there idle. And somehow safari_evaluate was hitting it instead of tab 12.

The Real Bug: Resolve Cache + URL Prefix

I traced through resolveActiveTab line by line:

async function resolveActiveTab() {
  if (!_activeTabURL) return _activeTabIndex;

  const safeUrl = _activeTabURL.replace(/"/g, '\\"');
  const domain = _activeTabURL
    .replace(/^https?:\/\//, '')
    .split('/')[0];

  const result = await osascriptFast(`
    tell application "Safari"
      set w to ${getTargetWindowRef()}
      set tabCount to count of tabs of w

      -- Strategy 1: verify cached index still matches URL
      try
        if tabCount >= ${_activeTabIndex} then
          if URL of tab ${_activeTabIndex} of w starts with "${safeUrl}" then
            return ${_activeTabIndex}
          end if
        end if
      end try

      -- Strategy 2: search all tabs by URL prefix
      repeat with i from tabCount to 1 by -1
        if URL of tab i of w starts with "${safeUrl}" then return i
      end repeat

      -- Strategy 3: search by domain (returns negative: partial match)
      repeat with i from tabCount to 1 by -1
        if URL of tab i of w contains "${domain}" then return -(i)
      end repeat

      return "0:" & tabCount
    end tell
  `);
  // ...
}
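The three strategies are easier to reason about as a pure function over a list of URLs. The following is a plain JavaScript model of the AppleScript above, written for experimentation; resolveByUrl and the sample URLs in the usage note are illustrative, and the function is not part of safari-mcp.

```javascript
// tabs: array of URL strings; tabs[0] is AppleScript's "tab 1".
// Returns a positive index (exact prefix match), a negative index
// (domain-only match), or "0:<tabCount>" when nothing matches.
function resolveByUrl(tabs, cachedIndex, savedUrl) {
  const domain = savedUrl.replace(/^https?:\/\//, "").split("/")[0];

  // Strategy 1: does the cached index still hold a matching URL?
  const cachedUrl = tabs[cachedIndex - 1];
  if (cachedUrl !== undefined && cachedUrl.startsWith(savedUrl)) return cachedIndex;

  // Strategy 2: scan every tab, last to first, for the URL prefix.
  for (let i = tabs.length; i >= 1; i--) {
    if (tabs[i - 1].startsWith(savedUrl)) return i;
  }

  // Strategy 3: substring match on the domain, anywhere in the URL.
  for (let i = tabs.length; i >= 1; i--) {
    if (tabs[i - 1].includes(domain)) return -i;
  }

  return "0:" + tabs.length;
}
```

Two properties stand out when you feed it test data. A saved URL that carries a query string will not prefix-match the same page without it, and the domain fallback matches any URL that merely mentions the domain, for example inside a ?ref= parameter.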

The bug was right there in the strategies. When I navigated LinkedIn to https://www.linkedin.com/feed/, that became _activeTabURL. Then LinkedIn's React router silently rewrote the URL to https://www.linkedin.com/feed/?shareActive=true because of the query parameter I'd passed. Strategy 1 — the fast path — failed because URL of tab 12 starts with "https://www.linkedin.com/feed/"... wait, that should still match. The new URL starts with the old prefix.

So why did it fail?

The actual cause was even more subtle: a different Safari instance, in a different profile window, had completed an HTTP redirect that rewrote the URL to a shorter form. AppleScript's URL of tab was returning the post-redirect URL, which did not start with my saved _activeTabURL because _activeTabURL had query parameters that the post-redirect URL didn't.

Strategy 1 fell through. Strategy 2 (full URL search across all tabs) also fell through for the same reason. Strategy 3 (domain search) found... a tab in the wrong profile window? No — it found Catchpoint. Why?

Because of how I'd extracted the domain:

const domain = _activeTabURL.replace(/^https?:\/\//, '').split('/')[0];
// "www.linkedin.com"

And the AppleScript:

if URL of tab i of w contains "${domain}" then return -(i)

contains is a substring match. Catchpoint's ad URL was https://www.catchpoint.com/.../?utm_campaign=Hackernoon-TOFU-billboard&utm_source=hackernoon&utm_medium=paidsocial. Did it contain www.linkedin.com? No.

Wait, then how did it match?

After two more hours of tracing, I found the actual cause. The MCP server runs as a singleton, but Claude Code occasionally spawns a second instance for ~40 ms during connection negotiation. That second instance had its own _activeTabIndex state, and it had set the index to point at Catchpoint because it saw Catchpoint as the active tab when it briefly took over. When the original instance came back, it read the wrong index from a stale cache check that hadn't yet been invalidated by the singleton kill code.

The 500 ms cache window was just long enough for that race.

The Fix: window.__mcpTabMarker

URL prefix matching is fragile. Domain matching is fragile. Cached indices are fragile. What's not fragile?

A unique identifier injected into the page's JavaScript context.

The new fix: every safari_new_tab writes a unique marker into window.__mcpTabMarker:

const tabMarker = `MCP_${SESSION_ID}_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 8)}`;
await osascriptFast(
  `tell application "Safari" to do JavaScript "window.__mcpTabMarker='${tabMarker}'" in tab ${_activeTabIndex} of ${getTargetWindowRef()}`
);
_activeTabMarker = tabMarker;
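Because the marker is spliced into a JavaScript string that is itself inside an AppleScript string, it helps to generate and validate it in one place. A minimal sketch under that assumption follows; makeTabMarker and markerAssignment are hypothetical helper names, and the injectable now and rand parameters exist only to make the output deterministic in tests.

```javascript
// Build a marker in the same shape as the snippet above:
// MCP_<session>_<base36 timestamp>_<base36 random suffix>
function makeTabMarker(sessionId, now = Date.now, rand = Math.random) {
  return `MCP_${sessionId}_${now().toString(36)}_${rand().toString(36).slice(2, 8)}`;
}

// Return the JS statement that stamps the marker onto the page.
// Rejects anything containing quotes, backslashes, or whitespace,
// so the statement is safe to embed in a quoted AppleScript string.
function markerAssignment(marker) {
  if (!/^[A-Za-z0-9_-]+$/.test(marker)) {
    throw new Error("marker contains characters that need escaping");
  }
  return `window.__mcpTabMarker='${marker}'`;
}
```

Validating rather than escaping keeps the quoting rules trivial: a marker that passes the check contains nothing either language treats specially.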

The marker survives any URL change that keeps the same document loaded:

Hash changes: location.hash = "#x" doesn't reload the JS context.
pushState and replaceState: single-page-app routers rewrite the URL without resetting the realm.
Query string mutations made through the History API: same as above.

It does not survive a full document load. Assigning location.href to a different page, reloading, or following an HTTP redirect gives the tab a fresh window global, even when the new URL is same-origin, so window.__mcpTabMarker is gone afterwards. In that case the marker scan finds nothing and resolveActiveTab falls through to the URL-based strategies.

resolveActiveTab now tries the marker first:

async function resolveActiveTab() {
  // Strategy 1: window.__mcpTabMarker (bulletproof)
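  if (_activeTabMarker && _activeTabIndex) {
    // Assumed definition (the excerpt omits it): drop characters that would break the nested quoting
    const safeMarker = _activeTabMarker.replace(/['"\\]/g, '');
    const checkScript = `(function(){return window.__mcpTabMarker==='${safeMarker}'?'1':'0'})()`;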

    // Check cached index first (fast path)
    const matchAtCached = await osascriptFast(
      `tell application "Safari" to do JavaScript "${checkScript}" in tab ${_activeTabIndex} of ${getTargetWindowRef()}`
    );
    if (matchAtCached === '1') return _activeTabIndex;

    // Cached index doesn't match — scan all tabs in profile window
    const tabCount = Number(await osascriptFast(
      `tell application "Safari" to return count of tabs of ${getTargetWindowRef()}`
    ));
    for (let i = tabCount; i >= 1; i--) {
      const m = await osascriptFast(
        `tell application "Safari" to do JavaScript "${checkScript}" in tab ${i} of ${getTargetWindowRef()}`
      );
      if (m === '1') {
        _activeTabIndex = i;
        return i;
      }
    }
  }

  // Strategy 2: URL prefix (fallback for tabs created before the marker was set)
  // ...
}

The marker check costs about 5 ms per tab via the persistent osascriptFast daemon. With 12 tabs, a full scan costs roughly 12 × 5 ms = 60 ms in the worst case. That is slower than the previous "check cached index" path, but correct.
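One possible refinement, which I have not benchmarked, is to move the scan loop into AppleScript so the whole search is a single round trip instead of one call per tab. The sketch below only builds the script text; buildMarkerScanScript is a hypothetical helper, and windowRef is assumed to be whatever getTargetWindowRef() returns (for example "window 2").

```javascript
// Build one AppleScript that checks every tab for the marker and
// returns the first matching index (scanning last to first), or 0.
function buildMarkerScanScript(marker, windowRef) {
  if (!/^[A-Za-z0-9_-]+$/.test(marker)) {
    throw new Error("marker contains characters that need escaping");
  }
  // The check uses only single quotes, so it can sit inside
  // AppleScript's double-quoted string without escaping.
  const check = `(function(){return window.__mcpTabMarker==='${marker}'?'1':'0'})()`;
  return [
    'tell application "Safari"',
    `  set w to ${windowRef}`,
    "  repeat with i from (count of tabs of w) to 1 by -1",
    "    try",
    `      if (do JavaScript "${check}" in tab i of w) is "1" then return i`,
    "    end try",
    "  end repeat",
    "  return 0",
    "end tell",
  ].join("\n");
}
```

The try block is there so that a tab which refuses to run JavaScript is skipped instead of aborting the whole scan.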

I also dropped the resolve cache from 500 ms to 100 ms. The check is cheap enough that the tighter cache buys us correctness without measurable latency.

The Bypass Tool I Built While Debugging

While I was tracing the bug, I needed a way to test changes against Safari without restarting the MCP server (which would require restarting the Claude Code session). So I wrote a Python wrapper that calls osascript directly, with one job: find a tab by URL prefix in a specific window, then run JS in that exact tab.

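import subprocess

# strip_line_comments() is a helper defined elsewhere in the script (not shown);
# it removes // line comments from the JS before escaping.
def run_js(url_prefix, js_code, window=2):
    js_clean = strip_line_comments(js_code)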
    js_escaped = (
        js_clean.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\r", "")
                .replace("\t", " ")
    )
    return subprocess.run(
        ["osascript", "-"],
        input=f'''
tell application "Safari"
  set tCount to count of tabs of window {window}
  set foundIdx to 0
  repeat with i from 1 to tCount
    if URL of tab i of window {window} starts with "{url_prefix}" then
      set foundIdx to i
      exit repeat
    end if
  end repeat
  if foundIdx = 0 then return "ERROR_NO_TAB"
  set jsOut to do JavaScript "{js_escaped}" in tab foundIdx of window {window}
  return "tab:w{window}_" & foundIdx & "|" & jsOut
end tell
''',
        capture_output=True,
        text=True,
        encoding="utf-8",
    )

This bypassed every layer of the MCP and gave me direct, predictable access to whichever tab I wanted in whichever window I wanted. Three rules I learned writing it:

result is a predefined AppleScript variable. AppleScript stores the value of the last executed statement in result, so don't use it as your own variable name. Use jsOut or output or anything else. If your system locale is Hebrew, the error message you get is "המשתנה result אינו מוגדר" ("The variable result is not defined"), which is unhelpful unless you happen to know that result is taken.
do JavaScript does not wait for asynchronous work. It hands back the synchronous value of the last expression, so a Promise (including whatever an async function returns) never arrives as its resolved value. AppleScript ends up with missing value, which then triggers "המשתנה X אינו מוגדר" ("The variable X is not defined") downstream. Workaround: write the result to window.__myResult from a .then() callback, then poll for it with a second do JavaScript call.
Hebrew text in shell variables breaks AppleScript. When you run bash -c "osascript -e '...$VAR...'", the UTF-8 round-trip through shell substitution corrupts Hebrew bytes. The fix is to call osascript - with the script on stdin, in Python or Ruby or any language that handles UTF-8 natively.
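The Promise workaround from the second rule can be sketched as two small functions: one to start the work and park its outcome on the window, one to read it back. The names startAsyncJob and readAsyncJob are illustrative, and win stands in for the page's window so the sketch can run outside a browser.

```javascript
// First do JavaScript call: kick off the async work and return at once.
// The outcome is stored as a JSON string under win[key].
function startAsyncJob(win, key, makePromise) {
  win[key] = JSON.stringify({ state: "pending" });
  makePromise().then(
    (value) => { win[key] = JSON.stringify({ state: "done", value }); },
    (err) => { win[key] = JSON.stringify({ state: "error", message: String(err) }); }
  );
  return "started"; // a plain string, so the caller gets a real value back
}

// Second (repeated) do JavaScript call: read whatever is stored so far.
function readAsyncJob(win, key) {
  return typeof win[key] === "string"
    ? win[key]
    : JSON.stringify({ state: "missing" });
}

// In the page (not executed here):
// startAsyncJob(window, "__myResult", () => fetch("/api/x").then((r) => r.text()));
// ...later, from a second do JavaScript call:
// readAsyncJob(window, "__myResult");
```

Returning strings from both calls keeps AppleScript away from values it cannot represent; the caller polls until state is no longer "pending".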

How LinkedIn Was Actually Posted

After all that, I still couldn't get LinkedIn's compose modal to open via clicks, even with the bypass tool. LinkedIn's React event handlers check event.isTrusted, which is false for any event dispatched by user JavaScript. Synthetic clicks just get dropped on the floor.
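For readers who haven't run into isTrusted before, this is roughly what such a guard looks like on the page side. It is a hypothetical illustration, not LinkedIn's actual code; makeTrustGate is a made-up name.

```javascript
// Wrap a handler so it only runs for events the browser generated
// from real user input. Script-dispatched events have
// isTrusted === false and are ignored without any error.
function makeTrustGate(handler) {
  return function gatedHandler(event) {
    if (!event.isTrusted) return false; // synthetic event: drop it
    handler(event);
    return true;
  };
}

// In a page (not executed here):
// button.addEventListener("click", makeTrustGate(openComposeModal));
```

From the automation side there is nothing to catch: the dispatch succeeds, no exception is thrown, and the UI simply does not react.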

So I gave up on the modal entirely and used LinkedIn's own voyager API directly:

var match = document.cookie.match(/JSESSIONID="?([^";]+)"?/);
if (!match) throw new Error("JSESSIONID cookie not found; is this tab logged in?");
var csrf = match[1];

fetch("https://www.linkedin.com/voyager/api/contentcreation/normShares", {
  method: "POST",
  credentials: "include",
  headers: {
    "csrf-token": csrf,
    "content-type": "application/json; charset=UTF-8",
    "accept": "application/vnd.linkedin.normalized+json+2.1",
    "x-restli-protocol-version": "2.0.0"
  },
  body: JSON.stringify({
    visibleToConnectionsOnly: false,
    commentaryV2: { text: postBody, attributes: [] },
    origin: "FEED",
    allowedCommentersScope: "ALL",
    postState: "PUBLISHED",
    media: []
  })
}).then(function(resp){
  return resp.text().then(function(t){
    window.__mcpLinkedinResult = JSON.stringify({status: resp.status, ok: resp.ok, body: t.substring(0, 500)});
  });
});
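The cookie lookup at the top of that snippet is worth isolating, because a failed match is the most likely way for it to break. A small sketch, with extractCsrfToken as a hypothetical helper name and made-up cookie values in the usage comment:

```javascript
// Pull the JSESSIONID value out of a document.cookie-style string.
// The value may or may not be wrapped in double quotes.
// Returns null instead of throwing when the cookie is absent.
function extractCsrfToken(cookieString) {
  const match = cookieString.match(/(?:^|;\s*)JSESSIONID="?([^";]+)"?/);
  return match ? match[1] : null;
}

// In the page (not executed here):
// const csrf = extractCsrfToken(document.cookie);
// if (csrf === null) { /* not logged in: stop before calling the API */ }
```

Anchoring the pattern to the start of a cookie pair keeps it from matching a different cookie whose name merely ends in JSESSIONID.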

The csrf-token header is just the value of the JSESSIONID cookie that LinkedIn sets during login. Once you're authenticated, the API accepts your request and returns:

{
  "status": 201,
  "ok": true,
  "body": "{\"status\":{\"urn\":\"urn:li:share:7449905229468274688\",\"toastCtaText\":\"צפייה בפוסט\",\"mainToastText\":\"פרסום הפוסט הצליח.\"}}"
}

"פרסום הפוסט הצליח" — "Post published successfully". The bypass worked. LinkedIn was live.

What Reddit Taught Me

Reddit was my one failure. The user account in window 1 (Personal profile) was logged in. The form on old.reddit.com/r/ClaudeAI/submit filled correctly. The CSRF token (uh field) was present. I built a FormData POST to /api/submit, included all the required fields, and fired it.

Response:

{"json": {"errors": [["BAD_CAPTCHA", "That was a tricky one. Why don't you try that again.", "captcha"]]}}

For this request, Reddit's /api/submit endpoint demanded a solved CAPTCHA even though the session was fully authenticated. I found no way around that from a logged-in browser session, and there's no honor-system "I'm a real human" header. The only ways through are:

Pay a CAPTCHA-solving service ($1-2 per 1000 captchas, with all the ethical and TOS implications you'd expect)
Have a human solve it
Don't post to Reddit

I picked option 3. I respect the captcha as a clearly-stated boundary.
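If you automate a flow like this, it is worth detecting the CAPTCHA error explicitly so the script stops instead of retrying. A minimal sketch based only on the response shape shown above; classifySubmitResponse and its action labels are illustrative.

```javascript
// Inspect a parsed /api/submit response of the shape
// {"json": {"errors": [[code, message, field], ...]}}
// and decide what the automation should do next.
function classifySubmitResponse(payload) {
  const errors = (payload && payload.json && payload.json.errors) || [];
  const codes = errors.map((entry) => entry[0]);
  if (codes.includes("BAD_CAPTCHA")) return { action: "stop", codes };
  if (codes.length > 0) return { action: "inspect", codes };
  return { action: "ok", codes };
}
```

Treating BAD_CAPTCHA as a hard stop, not a retryable failure, is the code-level version of option 3.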

Lessons

Eat your own dog food at launch. I'd been running safari-mcp for daily browser automation tasks for weeks and never hit this bug. It took the specific combination of "rapid sequence of operations across multiple Safari windows with same-domain tabs and React-driven URL rewrites" to surface it. A launch campaign happens to involve exactly that combination.

Multi-window/multi-profile is a forgotten edge case in browser automation. Most automation tools assume one window or have a strict "first window" convention. Safari's profile feature (introduced in Safari 17, which shipped with macOS Sonoma) makes multi-window the default for power users. If you write a Safari automation tool, test with three profile windows open from day one.

URL matching is fragile; identity markers in the JS context are bulletproof. This is the takeaway I wish someone had told me three weeks ago. Don't track tabs by URL or title or any other property the page can mutate. Inject a marker into the page's JS realm and check for it.

Cache TTL is a knife edge. 500 ms felt safe. It wasn't. 100 ms with a cheap revalidation check is the sweet spot for this workload. Your sweet spot may differ — measure it.

When debugging, build a bypass tool. Don't fight the bug from inside the affected layer. Route around it. The 60 lines of Python I wrote in the middle of this incident saved me hours of MCP restart cycles, and I get to keep them as a permanent low-level escape hatch.

Some platforms genuinely don't want automation. That's their right. Respect it.

Status

safari-mcp v2.8.3 ships the marker fix today. npm, GitHub, MCP Registry.
The launch campaign worked: HN post live, X tweet live, LinkedIn post live (via the API bypass), Reddit deferred.
The bug-find-fix loop took about 90 minutes. The article you're reading took longer.

If you build MCP servers, automation tools, or anything that touches a multi-window browser, I'd love to hear how you've solved tab identity. Drop a comment or open an issue on achiya-automation/safari-mcp. I learn from every reply.

And if you're considering using your own tool to launch your own tool — do it. The bugs you'll find are the bugs your users would have hit first.
