אחיה כהן

Posted on May 28

An MCP server can vanish from your AI agent mid-conversation. Here's the 30-second timeout that did it to me.

#devtools #ai #debugging #mcp

The bug report was: "the browser tools are gone."

I'd been running the same Claude Code session for an hour, calling safari_navigate, safari_click, safari_read_page — the usual flow. Then I opened a new conversation in the same project and the safari tools weren't in the catalog at all. The agent didn't say "I tried to use safari-mcp and it's not available." It just… didn't use them. It re-implemented half of what I needed with Bash and curl.

That second part is the worst part. The agent doesn't know that the tool catalog is incomplete. It only knows what's in front of it. If a tool is missing, it makes do with what it has — and the user has no idea their last release broke discoverability.

This post is about the 30-second timeout that caused it, the diagnosis path, and the one-line fix. But more than that, it's about a failure mode in stdio MCP that I think every MCP author needs to know about and most don't.

The setup

safari-mcp is an MCP server that drives the real macOS Safari. When the user wants their agent to use a separate browser profile (e.g. "Work" vs "Personal"), they launch the server with SAFARI_PROFILE=work and the server scopes every tool call to that profile's window. That means at startup the server has to find the window — call AppleScript, enumerate Safari's open windows, match by profile name, cache the window ref.

Here's what the startup code used to do:

if (SAFARI_PROFILE) {
  await new Promise(r => setTimeout(r, 50));
  await refreshTargetWindow(true);   // <-- this line
  if (_targetWindowRef) {
    _logProfile(`Startup: Profile "${SAFARI_PROFILE}" → ${_targetWindowRef}`);
  } else {
    _logProfile(`WARNING: Profile "${SAFARI_PROFILE}" window NOT found`);
  }
}

ES module top-level await. Looks fine. Profile detection runs once, the server knows which window to target, life is good.

In testing this took ~50–200ms. In production it sometimes took longer than 30 seconds.

What "longer than 30 seconds" means in stdio MCP

When Claude Code launches an MCP server it expects an initialize response within 30 seconds. That's the handshake — the server announces its protocol version and tool catalog, the client says "ok, here's my session." Until that handshake completes, the server's tools don't enter the conversation's tool catalog.

If your top-level await runs >30s before the stdio loop gets a chance to respond, the handshake misses the deadline. The client gives up. The server is killed. No retry. No warning surfaced to the user, just a log entry deep in the Claude Code internals that says "MCP server failed to initialize in 30s."

And critically: the conversation continues. The agent's tool catalog is whatever responded in time. Safari tools just aren't there. The agent has no way to know they were supposed to be there.

I want to underline this: the failure was completely invisible to me as a user. I didn't see a stack trace. I didn't see a "your tools didn't load." I saw an agent that didn't reach for the tools I'd just shipped a fix for.

Why AppleScript stalls past 30 seconds

refreshTargetWindow(true) calls into a Swift helper that runs:

tell application "Safari"
  return name of every window
end tell

On a fresh Safari with three tabs this returns in 12ms. On a real user's Safari, it does any of the following:

Spotlight is reindexing. AppleScript's mach port is fighting for the same lock the indexer holds. 15–60s pause.
The user has 80+ tabs across 12 windows, half of which are loading. name of every window waits for each window's title to settle. 5–20s.
The user just closed and reopened Safari. The Apple Event Manager queues your request behind Safari's own startup activations. 10–30s.
Some background process touched ~/Library/Containers/com.apple.Safari/. The TCC privacy subsystem reverifies your bundle's automation permission. Anywhere from instant to "until the user moves their mouse."

None of those are bugs. They are normal macOS behavior. They were not in the 99th percentile when I tested — they showed up in the 99.9th percentile when the server hit my actual user base.

The naive diagnosis path (mine)

The first thing I did was assume the bug was in the MCP protocol layer. I went and looked at the stdio framing code, the JSON-RPC parser, the request dispatcher. None of it was the problem.

The second thing I did was look at the refreshTargetWindow call and think "well, it works in my testing." Which is the most expensive sentence in software.

The actual diagnostic, which took me about 20 minutes to find, was to read the Claude Code MCP debug logs:

[MCP] safari-mcp: spawned (pid 47192)
[MCP] safari-mcp: sending initialize request
[MCP] safari-mcp: initialize timed out after 30000ms, killing process

That's it. That's the only signal. The MCP client doesn't tell you what the server was doing. It doesn't ask the server "are you stuck?" It just kills it.

Once I had that line, the rest was obvious: the only thing that runs before stdio is refreshTargetWindow. If refreshTargetWindow is slow, stdio never gets a chance. Therefore: don't block stdio on it.

The fix

if (SAFARI_PROFILE) {
  (async () => {
    await new Promise(r => setTimeout(r, 50));
    await refreshTargetWindow(true);
    if (_targetWindowRef) {
      _logProfile(`Startup: Profile "${SAFARI_PROFILE}" → ${_targetWindowRef}`);
    } else {
      _logProfile(`WARNING: Profile "${SAFARI_PROFILE}" window NOT found`);
    }
  })();
}

Wrap the whole startup probe in a fire-and-forget IIFE. Module init returns immediately. Stdio loop binds. Initialize handshake responds in ~5ms. By the time the first safari_* tool call arrives, the profile window probe has usually finished — and if it hasn't, getTargetWindowRef() already has a lazy-refresh path that handles a missing cache by running the probe inline.

The correctness story is: the probe is a cache warm-up, not a hard prerequisite. The tool call path already knows how to handle a cold cache. So there is no reason to make module init wait.

Three lines changed. The bug is gone.

The lesson I want every MCP author to take from this

If your MCP server does anything at startup that touches an external process, an external API, the filesystem outside your bundle, or a system service, you cannot let it block initialize.

Spawning a daemon? Spawn it, don't wait for it.
Reading user config? Do it in tools/list or first tool call, not at import.
Validating credentials? Lazy. First call to the protected tool.
Calling AppleScript / xdotool / win32 / xprop? You have no SLA on those. Defer.
Loading a ~/.config/<your-tool> file? Pretty safe. But still: if it's gone or corrupted, log and continue; don't crash module init.

The asymmetric cost matters here. If your slow probe blocks initialize, the failure mode is the worst possible kind: silent absence of your tools, no error surfaced, agent doesn't know to retry. If your slow probe runs in the background and a tool call arrives before it finishes, the failure mode is at most a single slow tool call — and you can return a clear error message that the agent can see and act on.

There is no version of this trade where blocking is the right answer.

What I wish stdio MCP did differently

This bug shouldn't be possible to ship. Some specific changes I'd love to see:

initialize should not block on tool catalog completeness. Servers should be able to say "I'm up, my catalog is tools/list?ttl=lazy, ask me later."
MCP clients should surface failed server boots in-conversation. Even one line: "Note: safari-mcp failed to start. Tools listed in its package.json will be unavailable this session." The agent reads that, the user reads that, everyone can act on it.
initialize timeouts should generate a retry, not a kill. The current behavior — kill on first timeout, never look again — assumes the server is broken. Often it's just busy.

None of those are the server author's problem to fix, but all of them would have caught this bug for me before a user did.

Why this matters for the agent ecosystem broadly

We are heading toward agents that ship with dozens of MCP servers. The probability that at least one of them silently fails to initialize on any given launch goes from "small" to "nearly certain" once you're stacking 10+ servers. If the failure mode is "agent silently lacks tools it should have," the user experience for AI agents becomes "sometimes the AI is dumber than usual and we don't know why."

That's not a future I want. It's a future where users blame the model for missing capabilities that the harness didn't even surface.

If you're shipping an MCP server: audit your top-level await. If it touches anything that could stall, move it off the critical path. Today. Before someone files the bug report I just filed against myself.

The fix shipped in safari-mcp v2.11.9 today. The full diff is here. The project is on GitHub if you want to see how an MCP server scopes itself to a Safari profile, or to file the next bug.

Top comments (2)

Harjot Singh • May 31

"The agent doesn't know the tool catalog is incomplete" is the genuinely scary part, and it's a whole class of failure people aren't watching for. A missing tool isn't a clean error the model can report, it's an absence, and absence is invisible to something that only knows what's in front of it. So it does the rational thing and re-implements with Bash and curl, quietly, confidently, worse. That's the same shape as a lot of agent failures: the model can't reason about what it can't see, so silent capability loss becomes silent quality loss. A 30-second timeout silently dropping a server from the catalog is exactly the kind of infra flake that turns into "why did the agent suddenly get dumber" with no log line pointing at the cause. The defense I'd want is the agent treating its tool catalog as something to verify, not assume, if safari_* was there an hour ago and a task clearly needs it, the absence should surface as a problem, not a shrug. This is squarely the reliability surface I think about for Moonshift. Did you end up adding catalog health-checks, or pinning the connection to survive the timeout?

אחיה כהן • Jun 2

Exactly — it's the silent-degradation class. The system doesn't error, it just quietly does less, and nothing surfaces "you're now running with 60 tools instead of 80." MCP has no "tool catalog changed" signal back to the model mid-session, so the agent literally can't notice the gap — it just stops reaching for the tools that vanished and never explains why.

The 30s init timeout was only the trigger I happened to hit; any transport hiccup at initialize lands you in the same blind spot. Appreciate you reading it closely.