אחיה כהן

Posted on May 7 • Originally published at dev.to

An AI agent overwrote two of my browser tabs. The fix took three releases.

#ai #opensource #debugging #javascript

I was eating dinner when my AI agent ate my tabs.

I had Safari open with a Chatwoot Meta dashboard in one tab and an n8n executions view in another, both with unsaved state, both in the middle of real work. In a third tab, one set aside for the agent, my agent was supposed to be testing a new feature in the MCP server I maintain (Safari MCP).

I came back to the laptop and both real-work tabs had been navigated to URLs the agent picked. The Chatwoot tab was now showing some test page. The n8n tab was on a Reddit comment thread the agent had been debugging an unrelated module against.

The agent hadn't gone rogue. The MCP server had a state-tracking bug — and instead of failing loudly, it had silently fallen back to "use whatever tab the user is on."

This is a postmortem. The fix took three releases — v2.10.0, v2.10.1, and v2.10.3 — and the iteration is the interesting part.

The shape of the bug

Safari MCP exposes a safari_new_tab(url) tool. Internally, it tracks "the tab MCP owns" via:

A tab index (_activeTabIndex) — Safari's positional handle.
A DOM marker (window.__mcpTabMarker) — injected JS that lets future calls verify "yes, this is still our tab."

Every subsequent safari_navigate, safari_click, safari_fill etc. resolves "where to act" by:

if marker still in current tab → use it
else if _activeTabIndex still valid → switch to it, re-verify
else → fall back to "front document of front window"

That last branch is the catastrophe. "Front document of front window" is, by definition, whatever the user is looking at right now.
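To make the three-branch order concrete, here is a minimal synchronous sketch against an in-memory stand-in for Safari. The names `makeBrowser`, `resolveTarget`, and the per-tab `marker` field are assumptions made for illustration; the real server talks to Safari and is asynchronous.

```javascript
// Hypothetical in-memory stand-in for Safari: a list of tabs plus the index
// of the tab the user is currently looking at. Not the real Safari MCP code.
function makeBrowser(tabs, frontIndex) {
  return { tabs, frontIndex };
}

// Returns the tab a mutating tool call would act on,
// following the three-branch order described above.
function resolveTarget(browser, state) {
  const front = browser.tabs[browser.frontIndex];
  // 1. Marker still present in the current tab: use it.
  if (front && front.marker === state.marker) {
    return { index: browser.frontIndex, via: "marker" };
  }
  // 2. Tracked index still valid: switch to it and re-verify the marker.
  const tracked = browser.tabs[state.activeTabIndex];
  if (tracked && tracked.marker === state.marker) {
    return { index: state.activeTabIndex, via: "index" };
  }
  // 3. Tracking lost: fall back to whatever the user is looking at.
  return { index: browser.frontIndex, via: "front-document" };
}

const browser = makeBrowser(
  [
    { url: "https://chatwoot.example/dashboard", marker: null }, // user's tab
    { url: "about:blank", marker: null }, // MCP's tab, marker wiped
  ],
  0 // the user is looking at tab 0
);
const state = { activeTabIndex: 1, marker: "mcp-123" };

const target = resolveTarget(browser, state);
console.log(target); // { index: 0, via: 'front-document' }
```

With the marker gone from both tabs, branches 1 and 2 fail and the call lands on index 0, which is the user's dashboard.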

So why did the fallback fire? Three different reasons, across three releases.

v2.10.0 — the original failure mode

The original safari_new_tab(url) did exactly what its name said: open a new tab and navigate it to url in one call.

async function newTab(url) {
  const idx = await openBlankTab();     // creates blank tab
  await navigate(idx, url);              // navigates immediately
  await injectMarker(idx);               // marker for future calls
  _activeTabIndex = idx;
}

Spot the bug? It's about what happens when navigate(idx, url) fails to load — file:// blocked by Safari, network error, an app:// scheme that Safari doesn't understand. The new tab stays at about:blank. The marker injection runs, but then the next user-driven navigation in any tab can wipe it. By the time the next safari_navigate arrives, our marker check fails. Our _activeTabIndex still points at a tab, but Safari's real DOM in that tab has been replaced.

The "front document" fallback fires. We navigate the user's current tab.

I shipped this. I tested it on a clean Safari with one window. I never hit the bug because, in a clean state, the user's tab is my tab.

v2.10.1 — the grace window (almost-fix)

The first fix was a NEW_TAB_GRACE_MS = 30_000 window. For 30 seconds after safari_new_tab, ANY mutating operation that would fall back to the user's tab now throws a clear error:

if (Date.now() - _lastNewTabAt < NEW_TAB_GRACE_MS && !markerOk) {
  throw new Error(
    "tab tracking lost shortly after new_tab — call safari_new_tab again instead of letting MCP touch your active tab"
  );
}

Plus a fix for the marker wipe — safari_navigate now re-injects window.__mcpTabMarker after every successful navigation, so JS-context resets don't lose tracking.
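The re-injection idea can be sketched with a fake tab whose `window` object is replaced on every navigation. This is an illustration only: `makeTab`, `rawNavigate`, and `navigateAndRemark` are hypothetical names, and the real marker value and injection mechanism are not shown in this post.

```javascript
// Hypothetical fake tab: navigating replaces the page's JS context,
// which is modeled here by resetting `window` to a fresh object.
function makeTab() {
  return { url: "about:blank", window: {} };
}

function rawNavigate(tab, url) {
  tab.url = url;
  tab.window = {}; // new document, so every injected global is gone
}

function injectMarker(tab, marker) {
  tab.window.__mcpTabMarker = marker;
}

// Navigate, then re-inject the marker so the next call can still verify the tab.
function navigateAndRemark(tab, url, marker) {
  rawNavigate(tab, url);
  injectMarker(tab, marker);
}

const tab = makeTab();
injectMarker(tab, "mcp-123");

rawNavigate(tab, "https://example.com/a");
const markerAfterRawNavigate = tab.window.__mcpTabMarker; // undefined: marker was wiped

navigateAndRemark(tab, "https://example.com/b", "mcp-123");
const markerAfterRemark = tab.window.__mcpTabMarker; // "mcp-123"
```

The design point is that the marker's lifetime is tied to the document, not the tab, so anything that replaces the document has to be followed by a fresh injection.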

This passed all my tests. It also worked correctly for ~95% of real sessions.

The 5% it missed: sessions longer than 30 seconds where the tab-ghost recovery path nullified _activeTabIndex mid-session.
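The limitation is easy to see once the guard's condition is pulled out into a pure function. The sketch below reuses the `NEW_TAB_GRACE_MS` value and the condition from the snippet above; the function name `graceGuardBlocks` and the explicit timestamp arguments are assumptions added so the behaviour can be checked without waiting on a real clock.

```javascript
const NEW_TAB_GRACE_MS = 30_000; // value from the v2.10.1 guard above

// Returns true when the v2.10.1-style guard would throw instead of
// falling back to the user's tab. Timestamps are passed in explicitly
// (rather than read from Date.now()) so the behaviour is easy to test.
function graceGuardBlocks(nowMs, lastNewTabAtMs, markerOk) {
  return nowMs - lastNewTabAtMs < NEW_TAB_GRACE_MS && !markerOk;
}

const openedAt = 0;
const blockedAt5s = graceGuardBlocks(5_000, openedAt, false); // true
const blockedAt31s = graceGuardBlocks(31_000, openedAt, false); // false
const blockedAt30min = graceGuardBlocks(30 * 60_000, openedAt, false); // false
```

Five seconds in, a lost marker is caught. One second past the window, the same lost marker is let through to the fallback.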

v2.10.3 — the permanent guard

runJS (the workhorse for every JS-driven tool) has a tab-ghost recovery path. If a JavaScript evaluate fails because the tab Safari thinks is at index N has been closed/replaced, runJS nullifies _activeTabIndex so the next call resolves cleanly.

The intent: avoid using a stale index after Safari shuffles tabs.

The unintended consequence: 30+ minutes into a session, after a routine ghost-recovery, _activeTabIndex is null. The grace window from v2.10.1 has long expired. The marker check fails (the agent has navigated several times since). Fallback fires. User's current tab gets clobbered.

The bug pattern: a "safe" recovery path created the exact failure mode the grace window was designed to prevent.
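A small sketch of that recovery path, under stated assumptions: `session`, `openTabs`, `evaluateInTab`, and `runJSWithGhostRecovery` are hypothetical stand-ins, and returning `null` on failure is a simplification rather than the real runJS contract.

```javascript
// Hypothetical session state and fake tab list; the real runJS talks to Safari.
const session = { activeTabIndex: 2 };
const openTabs = ["https://n8n.example/executions", "https://chatwoot.example"];

// Stand-in for evaluating JS in a tab: throws if the index no longer exists.
function evaluateInTab(index, code) {
  if (index === null || index >= openTabs.length) {
    throw new Error(`no tab at index ${index}`);
  }
  return `ran ${code} in ${openTabs[index]}`;
}

// Ghost recovery as described above: on failure, drop the stale index
// so the next call re-resolves its target.
function runJSWithGhostRecovery(code) {
  try {
    return evaluateInTab(session.activeTabIndex, code);
  } catch (err) {
    session.activeTabIndex = null; // tracking is now gone
    return null;
  }
}

const result = runJSWithGhostRecovery("document.title");
```

After this call `session.activeTabIndex` is `null`. Nothing went wrong from the recovery path's point of view, yet the next target resolution has no index to try and will reach the last branch.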

The permanent fix is a one-line change in spirit:

let _hasOwnedTab = false;  // session-scoped, sticky

async function newTab(url) {
  // ... existing logic ...
  _hasOwnedTab = true;     // ← set once, never reset
}

function _assertNotFallingBackToUserTab() {
  if (_hasOwnedTab) {
    throw new Error(
      "MCP previously owned a tab in this session, but tracking was lost. " +
      "Refusing to fall back to the user's current tab. Call safari_new_tab to re-establish."
    );
  }
  // sessions that never called new_tab can still use front-document fallback
}

The flag is set the first time safari_new_tab succeeds, and it never resets for the lifetime of the MCP process. The four entry points that can target a tab (navigate, navigateAndRead, runJS's tab-ghost fallback path, and runJSLarge) all call _assertNotFallingBackToUserTab before falling back to the user's current tab.

If the assertion throws, the agent gets a clear error pointing back at safari_new_tab. The user's tab is untouched.

Sessions that never call safari_new_tab (e.g. tools that explicitly read the user's current tab) are unaffected — _hasOwnedTab stays false, and the front-document fallback still works for them.
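Both kinds of session can be exercised side by side in a short sketch. The `createSession` factory and its method names are assumptions made so each example gets fresh state; the real server keeps `_hasOwnedTab` as process-level state, as shown above.

```javascript
// Session-scoped state for one MCP process. The factory wrapper is an
// assumption made for illustration so each session starts clean.
function createSession() {
  let hasOwnedTab = false;
  return {
    markTabOwned() {
      hasOwnedTab = true; // set once, never reset
    },
    // Called at the point where tracking is lost and a fallback is needed.
    resolveFallback(frontTab) {
      if (hasOwnedTab) {
        throw new Error(
          "Tracking lost; refusing to fall back to the user's tab. Call safari_new_tab."
        );
      }
      return frontTab; // sessions that never owned a tab may use the front tab
    },
  };
}

// A session that never opened its own tab keeps the fallback.
const readOnlySession = createSession();
const fallbackTab = readOnlySession.resolveFallback("user-tab"); // "user-tab"

// A session that owned a tab gets an error instead.
const agentSession = createSession();
agentSession.markTabOwned();
let refusal = null;
try {
  agentSession.resolveFallback("user-tab");
} catch (err) {
  refusal = err.message;
}
```

There is deliberately no method that sets the flag back to `false`: the only way to get a target again is to open a new owned tab.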

What I'd take away if I were writing my own MCP server

1. The fallback you don't notice is the fallback that bites.
"Use the user's current tab" looks like a reasonable degraded mode in isolation. In context — an autonomous agent acting on the user's real, logged-in browser — it's the worst possible default. The fix wasn't "make the fallback work better." It was "the fallback should not exist in this branch of the state machine."

2. State-tracking bugs aren't subtle. They're catastrophic.
A misidentified tab is a misidentified action. The class of bug here — I think I'm acting on X but I'm actually acting on Y — is the same class as a deployment script targeting prod instead of staging, or a Git rebase rewriting the wrong branch. There's no "minor version" of this bug. Engineering effort should be priced accordingly.

3. "Sticky" flags beat "windowed" flags for invariants you actually need.
The v2.10.1 grace window was time-bounded. That made sense for the failure mode I'd seen. But sessions are unbounded. Anything that can happen during the session can happen after the grace window expires. If the property "MCP has owned a tab in this session" is the actual thing protecting the user, that property must hold for the whole session — not 30 seconds.

4. Tests on clean state miss the bugs that matter.
I tested v2.10.0 on a fresh Safari with no other tabs. The user-tab-clobber bug is invisible in that environment, because the user's tab and MCP's tab are the same tab. Real users have eight tabs open and were just clicking around in tab six. If your tool drives a user's real browser, your test environment must have unrelated, in-progress tabs — and your failure modes must be loud when you brush against them.

5. Errors are a feature.
The replacement for "silent fallback to user tab" is a thrown error with a remediation message: "Call safari_new_tab to re-establish." That error is better than the original happy path — because the original happy path was sometimes a disaster. A loud, fixable error is always better than a quiet, irreversible mistake.
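Takeaways 4 and 5 combine naturally into a regression test: start from a browser that already has unrelated tabs, lose tracking on purpose, and check that the failure is loud and the user's tabs are untouched. The sketch below uses a fake browser and a toy driver; `makeDirtyBrowser`, `makeDriver`, and `loseTracking` are hypothetical names, not part of Safari MCP.

```javascript
// Hypothetical fake browser with unrelated, in-progress user tabs.
function makeDirtyBrowser() {
  const tabs = Array.from({ length: 8 }, (_, i) => ({
    url: `https://work.example/tab-${i + 1}`,
    owner: "user",
  }));
  return { tabs, frontIndex: 5 }; // the user was just clicking around in tab six
}

// Minimal agent-side driver with the sticky guard; not the real Safari MCP API.
function makeDriver(browser) {
  let activeTabIndex = null;
  let hasOwnedTab = false;
  return {
    newTab(url) {
      browser.tabs.push({ url, owner: "mcp" });
      activeTabIndex = browser.tabs.length - 1;
      hasOwnedTab = true;
    },
    loseTracking() {
      activeTabIndex = null; // simulates ghost recovery dropping the index
    },
    navigate(url) {
      let target = activeTabIndex;
      if (target === null) {
        if (hasOwnedTab) throw new Error("tracking lost; call safari_new_tab");
        target = browser.frontIndex; // fallback only if no tab was ever owned
      }
      browser.tabs[target].url = url;
    },
  };
}

const dirty = makeDirtyBrowser();
const before = dirty.tabs.map((t) => t.url);
const driver = makeDriver(dirty);

driver.newTab("https://test.example/feature");
driver.loseTracking();

let threw = false;
try {
  driver.navigate("https://reddit.example/thread");
} catch (err) {
  threw = true; // the loud failure we want
}

// Every user-owned tab must still be where the user left it.
const userTabsUntouched = before.every((url, i) => dirty.tabs[i].url === url);
```

Removing the `hasOwnedTab` check from `navigate` makes this test fail on tab six, which is exactly the clobber a single-tab test environment cannot show.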

The diff that mattered

+ let _hasOwnedTab = false;

  async function newTab(url) {
    const idx = await openBlankTab();
    await navigate(idx, url);
    await injectMarker(idx);
    _activeTabIndex = idx;
+   _hasOwnedTab = true;
  }

  function getFallbackTarget() {
+   if (_hasOwnedTab) {
+     throw new Error("…re-establish via safari_new_tab");
+   }
    return frontDocumentOfFrontWindow();
  }

That's the load-bearing part. Everything else in v2.10.3 is plumbing.

If you're building an MCP server (or any tool that drives a user's real browser/editor/database), the question I'd ask in code review is:

"What's the worst thing this fallback can do, and does the fallback's existence buy enough to be worth that worst case?"

For tab fallback in Safari MCP, the answer was: no, it doesn't.

Source: github.com/achiya-automation/safari-mcp
Install: npx safari-mcp
Releases discussed: v2.10.0, v2.10.1, v2.10.3

Have you shipped a state-tracking bug that ate user data? What was the failure mode, and what flag/invariant ended up being the real fix?
