The bug report was three words long: "navigate_and_read returns junk."
safari_navigate_and_read is supposed to navigate Safari to a URL and hand the page's text back to the AI agent driving it. So I called it. It didn't throw. It didn't time out. It returned. The content field said:
[object Promise]
Not the page title. Not the body text. The literal string [object Promise], as if the page I'd just loaded contained nothing but the stringified shadow of a JavaScript object.
And here's the part that kept me up: the agent that had been calling this tool all week never complained. It couldn't. As far as it was concerned, it asked for a page and got a string back. Strings are what pages look like. It just kept building on it.
So let me say it plainly, before any code, because it's the whole point of this postmortem: a tool that returns the wrong answer with no error is more dangerous than a tool that crashes. A crash would have been a gift. A crash is a signpost. This was a forgery. The agent has no exception to catch, no status code to check, nothing to pattern-match against — it summarizes [object Promise], decides the page is empty or broken, and acts on that conclusion. The failure is silent, and it propagates.
This is a story about a single architectural constraint — one I'd already solved once and then failed to apply everywhere — and the eleven tools it silently broke. If you've never touched AppleScript in your life, stay with me anyway. You almost certainly have a version of this bridge somewhere in your own stack.
The setup
safari-mcp is an MCP server that drives the real macOS Safari — the user's actual logged-in browser — so AI coding agents can navigate, read, fill forms, take screenshots, all 80-some tools of it. The way it talks to Safari is AppleScript's do JavaScript command. You hand AppleScript a string of JavaScript, it runs it inside the page, and gives you back the result. safari-mcp wraps that in a runJS(...) helper.
Here's the one sentence this entire bug hinges on:
AppleScript's do JavaScript is synchronous. It returns the moment the synchronous portion of your JavaScript finishes running.
Read that again with an async function in mind. Because some of these tools needed to wait — wait for a page to load, wait for navigation to settle, loop until an element appeared — and the natural way to express "wait" in JavaScript is await. So the old code did the natural thing. It wrapped the work in an async IIFE and handed it across the bridge:
// The broken shape — an async IIFE handed to a synchronous bridge.
(async () => {
  history.back();
  await waitForLoad(); // ← control returns to AppleScript HERE
  return document.body.innerText;
})()
Walk through what AppleScript actually sees:
- The IIFE starts running.
- It runs its synchronous prologue, hits the first await, and (as every JS engine must) immediately returns a pending Promise to its caller. That caller is AppleScript.
- AppleScript is synchronous. It has no concept of a Promise, no event loop to drain, nothing to .then() onto. So it does the only thing it can do with an object it was handed: it stringifies it. String(promise) returns "[object Promise]" (Object.prototype.toString reads the Promise's Symbol.toStringTag). That string travels back across the bridge, up through Node, and out to the agent as "the result."
- Meanwhile the actual async work (the await waitForLoad(), the return document.body.innerText) does eventually run inside the page's event loop. Its return value resolves into the void milliseconds later, with nobody listening. Fire-and-forget.
The await never happened on the side that mattered. The bridge can't see across the sync/async boundary, so it grabbed the only thing that existed synchronously: the Promise object itself.
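The walkthrough above can be reproduced in plain Node without Safari. This is a minimal sketch: syncBridge is a hypothetical stand-in for a synchronous caller (it is not safari-mcp code), and it simply stringifies whatever value exists once the synchronous part of the snippet has run.

```javascript
// Hypothetical stand-in for a synchronous bridge: call the snippet, then
// stringify whatever it handed back at that instant.
function syncBridge(snippet) {
  const value = snippet();
  return String(value);
}

const log = [];

const bridged = syncBridge(async () => {
  log.push('prologue ran');        // synchronous part: runs before the bridge returns
  await new Promise((resolve) => setTimeout(resolve, 10));
  log.push('body finished');       // runs later, after the bridge has already returned
  return 'page text';              // resolves with nobody listening
});

console.log(bridged); // "[object Promise]"
console.log(log);     // [ 'prologue ran' ]
```

The snippet's return value, 'page text', is never seen by the caller; only the stringified Promise is.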
The diagnosis
Here's the part that stung. This constraint was not news. safari_evaluate and safari_wait_for had handled it correctly for a long time — they never await inside the page; they poll observable page state from the Node side instead, where Node actually can await. The right pattern already existed in the codebase. It was load-bearing. It worked.
The bug was that eleven other tools had never been migrated to it. And the moment I understood why navigate_and_read failed, the satisfaction curdled into dread. Because the cause wasn't a typo. It was a shape. And shapes get copied.
So I went looking for the shape: any tool passing an async IIFE into do JavaScript and expecting a meaningful return. That one search surfaced ten more. Eleven tools, one root cause — and they failed in two distinct flavors.
Mode 1 — returned the Promise as data. These are the trust-betrayal cases, because the bad value flows straight into the agent's context:
- safari_navigate_and_read: fully broken, returned [object Promise] instead of the page content.
- safari_get_indexed_db / safari_list_indexed_dbs: same, [object Promise] where structured data should be.
- safari_screenshot_element (canvas path): returned a Promise and so never fell through to the reliable screencapture-and-crop fallback. The safari_screenshot canvas fallback and the safari_upload_file drop-fallback had the identical flaw.
Mode 2 — fired-and-forgot a side effect, then acted on stale state. Arguably worse, because nothing wrong shows up in this call at all. These performed a visible action — so they looked fine in a demo — but the await that was supposed to wait for the result evaporated:
- safari_go_back / safari_go_forward: navigated, but never updated the tracked URL, so the next operation could target the wrong tab.
- safari_reload: reloaded, but never waited for load or refreshed the URL.
- safari_click_and_wait: clicked, but never waited.
- safari_fill_and_submit: submitted, but didn't wait for the result page.
- safari_scroll_to (text mode): scrolled once instead of looping.
- safari_emulate / safari_reset_emulation: didn't wait for the reload.
Mode 2 is the genuinely dangerous one. The agent calls go_back, gets a clean success, then calls fill — and the fill lands on a page the engine thinks it's on but isn't, because the back-navigation hadn't settled and the tracked URL was stale. There is no error anywhere in that chain. There are just two states quietly diverging.
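To make that divergence concrete, here is a small simulation. Everything in it is hypothetical: createFakePage, goBackFireAndForget, and the example.test URLs are illustrative stand-ins, not safari-mcp code. The fake page settles its navigation 50 ms after back() is called, while the tool reports success immediately.

```javascript
// Hypothetical fake page: back() starts a navigation that settles 50 ms later.
function createFakePage() {
  const page = {
    url: 'https://example.test/form',
    readyState: 'complete',
    back() {
      page.readyState = 'loading';
      setTimeout(() => {
        page.url = 'https://example.test/list';
        page.readyState = 'complete';
      }, 50);
    },
  };
  return page;
}

// The broken shape: trigger the navigation, report success, never wait.
// The tracked URL is never refreshed because the wait was lost.
function goBackFireAndForget(page, tracked) {
  page.back();
  return 'ok';
}

const fakePage = createFakePage();
const trackedState = { url: fakePage.url };
const status = goBackFireAndForget(fakePage, trackedState);

// Compare the two views once the navigation has had time to settle.
const comparison = new Promise((resolve) => {
  setTimeout(() => resolve({ status, tracked: trackedState.url, actual: fakePage.url }), 100);
});

comparison.then((snapshot) => console.log(snapshot));
```

The status is 'ok', yet the tracked URL still points at the form page while the page itself has moved to the list page. Nothing in the call chain raised an error.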
The fix
The repair isn't to make do JavaScript async — you can't; the constraint is real. The fix is to stop asking it to be. The async loop moves to the side that can await — Node. Fire the synchronous trigger into the page, then poll an observable signal — document.readyState — from Node until the page settles, then read.
Here's the shared helper that the navigation tools now route through:
// Poll document.readyState from the Node side and return {title,url[,text]} once the
// page settles. `do JavaScript` returns immediately and never awaits an async IIFE
// (see _evaluateAsync), so any in-page `await` loop is fire-and-forget — page-load
// waits MUST be driven from Node. Shared by goBack/goForward/reload/navigateAndRead.
async function _pollReadyAndRead(navIndex, { maxLength } = {}) {
  const readExpr = maxLength != null
    ? `JSON.stringify({title:document.title,url:location.href,text:document.body?document.body.innerText.substring(0,${Number(maxLength)}):''})`
    : `JSON.stringify({title:document.title,url:location.href})`;
  let result = '{}';
  for (let poll = 0; poll < 60; poll++) {
    await new Promise(r => setTimeout(r, poll < 10 ? 200 : 500));
    try {
      const state = await runJS('document.readyState', { tabIndex: navIndex, timeout: 5000 });
      if (state === 'complete' || state === 'interactive') {
        result = await runJS(readExpr, { tabIndex: navIndex, timeout: 5000 });
        if (state === 'complete') break;
        if (poll > 10) break; // interactive after ~2s is good enough
      }
    } catch { /* page still loading, retry */ }
  }
  return result;
}
Notice where the await lives now. Every await is in Node. Each call into the page through runJS is a single synchronous, immediately-resolving expression — document.readyState returns a string, JSON.stringify({...}) returns a string. Nothing crosses the boundary in a pending state. The asynchronicity — the waiting, the looping, the backoff from 200ms to 500ms — is entirely Node's job, on the side of the bridge that has an event loop.
And the call site, goBack, shows the two-step rhythm: fire the synchronous trigger, then poll from Node.
export async function goBack() {
  await refreshTargetWindow();
  const navIndex = _activeTabIndex;
  // history.back() is synchronous; the page-load wait is polled from Node (see _pollReadyAndRead).
  await runJS("history.back()", { tabIndex: navIndex, timeout: 5000 });
  const result = await _pollReadyAndRead(navIndex);
  try { const p = JSON.parse(result); if (p.url) _activeTabURL = p.url; } catch {}
  return result;
}
history.back() is synchronous, so it survives the bridge fine. Then _pollReadyAndRead does the waiting in Node. And critically, the JSON.parse line just before the return is the fix for Mode 2: now that we actually have the settled result, _activeTabURL gets updated. The tracked state and the real state stay in sync, and the next operation targets the page that's actually there.
The two [object Promise] IndexedDB tools and the genuinely-async cases got routed through a sibling helper, _evaluateAsync, which kicks the async work into a page global and polls that from Node: same principle, same side of the boundary. All of it shipped in safari-mcp v2.12.0.
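The real _evaluateAsync isn't shown in this post, so the following is only a sketch of the "park the result in a page global, poll it from Node" idea under stated assumptions. Node's vm module stands in for the page, and runJSSketch, evaluateAsyncSketch, and the __asyncSlot naming are all hypothetical; none of this is safari-mcp's actual implementation.

```javascript
const vm = require('node:vm');

// Stand-in for the page: an isolated context. This is an analogy for a
// synchronous bridge, not how safari-mcp reaches Safari.
const pageContext = vm.createContext({ setTimeout });

// Hypothetical synchronous bridge: evaluate a string, hand back a string.
function runJSSketch(code) {
  return String(vm.runInContext(code, pageContext));
}

let slotCounter = 0;

// Assumed shape: start the async work in the page, park its outcome in a
// page global, then poll that global from Node until it reports done.
// Only JSON-serializable results survive the trip.
async function evaluateAsyncSketch(asyncBody, { intervalMs = 20, maxPolls = 100 } = {}) {
  const slot = `__asyncSlot${slotCounter++}`;

  // Step 1: a synchronous trigger. The script's final value is a plain string.
  runJSSketch(`
    globalThis.${slot} = { done: false };
    (async () => { ${asyncBody} })().then(
      (value) => { globalThis.${slot} = { done: true, value: value }; },
      (error) => { globalThis.${slot} = { done: true, error: String(error) }; }
    );
    "started";
  `);

  // Step 2: every await lives on the Node side; each probe is synchronous.
  for (let poll = 0; poll < maxPolls; poll++) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const state = JSON.parse(runJSSketch(`JSON.stringify(globalThis.${slot})`));
    if (state.done) {
      runJSSketch(`delete globalThis.${slot}; "cleaned";`);
      if (state.error !== undefined) throw new Error(state.error);
      return state.value;
    }
  }
  throw new Error(`async work did not settle after ${maxPolls} polls`);
}

evaluateAsyncSketch('await new Promise((r) => setTimeout(r, 30)); return 6 * 7;')
  .then((value) => console.log(value)); // 42
```

Passing an async IIFE straight to runJSSketch returns "[object Promise]", the same failure as before. The slot indirection is what turns the pending Promise into state that a synchronous probe can read.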
The lesson every bridge author should steal
AppleScript is incidental. The shape of this bug is not.
Any bridge that crosses a sync/async boundary cannot observe a Promise that lives on the other side. AppleScript's do JavaScript is one example. A synchronous FFI call into a JS runtime is another. An embedded scripting engine whose eval returns immediately is another. A C extension calling into your async-first language is another. In every case the rule is identical: hand async work to a synchronous caller and you get back the handle — the Promise, the future, the task object — not the result. The work still runs, into the void.
The failure is silent by construction. No exception fires, because nothing went wrong from the bridge's perspective — it returned exactly what it had the moment the sync portion finished. You get either a stringified [object Promise] masquerading as your answer, or a fire-and-forget side effect that runs but reports nothing.
The robust pattern is the inversion: keep the async loop on the side that can actually await, and poll observable state across the boundary instead of awaiting across it. Don't ask the bridge to wait. Ask it, repeatedly, "are you done yet?" — and do the waiting yourself.
For agent tooling specifically, here are the tells to grep your own codebase for tonight, because silent failures don't announce themselves. Raise your suspicion the moment a tool "succeeds" but its output is:
- a result that is literally [object Promise], [object Object], or undefined stringified;
- a follow-up operation that quietly acts on stale state.
And if you find one — don't fix the one. Find its siblings first. The shape was copied once; it was almost certainly copied again.
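One way to turn the first tell into a loud failure is a guard at the tool boundary. This is a sketch, not something safari-mcp is known to ship: assertMeaningfulResult and SUSPICIOUS_OUTPUTS are hypothetical names, and the check is an exact match, so a page whose entire text is one of these strings would be flagged too.

```javascript
// Strings a synchronous bridge tends to produce when it stringifies
// something that was never a real result.
const SUSPICIOUS_OUTPUTS = new Set(['[object Promise]', '[object Object]', 'undefined']);

// Hypothetical guard: throw instead of passing a forged result to the agent.
function assertMeaningfulResult(toolName, output) {
  const text = String(output).trim();
  if (SUSPICIOUS_OUTPUTS.has(text)) {
    throw new Error(
      `${toolName} returned ${JSON.stringify(text)}; ` +
      'this usually means async work was handed to a synchronous bridge'
    );
  }
  return output;
}

// A real result passes through untouched.
console.log(assertMeaningfulResult('safari_navigate_and_read', 'Example Domain'));

// A stringified Promise becomes an error the agent can actually see.
try {
  assertMeaningfulResult('safari_navigate_and_read', String(Promise.resolve('late')));
} catch (error) {
  console.log(error.message);
}
```

The guard only covers the first tell. Stale state from a fire-and-forget side effect produces no suspicious string, so it still has to be caught by reading the settled state back.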
One symptom. One grep. Eleven tools. The one-off patch would have fixed navigate_and_read and left ten landmines. The systemic fix — one helper, one rule, the await lives in Node — disarmed all of them. That's the difference between fixing a bug and fixing the bug's family.
I'd rather a tool blow up loudly a hundred times than hand my agent a confident wrong answer once.