DEV Community: אחיה כהן

My agent could see the dropdown. It just couldn't pick anything.

אחיה כהן — Sun, 24 May 2026 08:41:42 +0000

The agent had a list. I asked it to pick an item. It refused.

Element not found

Refresh. Same.

So I opened DevTools and pasted in:

document.querySelector('select[name="status"]')
// null

null. On a page that obviously had a dropdown. I could see it. I could click it. I could expand it with the mouse. But for some reason document.querySelector insisted it didn't exist.

This is a story about three layers of DOM that don't talk to each other, and what safari_select_option had to learn in v2.11.3 of Safari MCP to reach across them.

The setup

The page was a Salesforce/Lightning support form embedded in a customer portal. The portal is the parent document. The form is in an <iframe> that Lightning serves as a separate (but same-origin) document. Inside that iframe, Lightning composes its UI from a graph of custom elements, each one with its own shadow root, each one with its own internal layout.

So when a developer writes a "Status" dropdown in Lightning, the actual <select> element ends up rendered inside something like:

top document
└── <iframe src="...lightning host...">
    └── <support-form>
        └── #shadow-root
            └── <lightning-status-field>
                └── #shadow-root
                    └── <select>   ← the element my agent needed

document.querySelector('select[name="status"]'), called from the top document, traverses none of that. Not the iframe, not the shadow roots. To it, the <select> simply doesn't exist.

What was confusing

Two things made this hard to spot:

safari_snapshot saw it. When the agent took an accessibility snapshot of the page, the dropdown was right there — with a ref, a label, an aria-expanded state, options. Nothing felt missing.
safari_click worked. I'd been clicking deep-DOM elements for weeks without thinking about it. The button that opened this same form was inside a different shadow root, and click resolved it just fine.

So the agent kept asking itself: "I just clicked into this form. The dropdown is right there. Why can't I select anything?"

The answer, embarrassingly, was that click and select_option were using different finders.

Two finders, one tool that hadn't been told

Safari MCP ships two element-resolution paths inside the page:

mcpFindRef(ref) — given a ref from safari_snapshot, walk the document, every same-origin iframe, and every reachable shadow root to find the element that ref points to.
mcpQuerySelectorDeep(selector) — given a CSS selector, do the same deep walk, but match by selector instead of ref.
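A deep walk of this kind can be sketched roughly as follows. This is a minimal illustration, not Safari MCP's actual mcpQuerySelectorDeep: the name querySelectorDeep is hypothetical, and it relies only on standard DOM members (querySelector, querySelectorAll, shadowRoot, contentDocument). Closed shadow roots and cross-origin iframes stay unreachable either way.

```javascript
// Hypothetical sketch of a deep selector lookup (not Safari MCP's real code).
// `root` is a Document or ShadowRoot; pass `document` in a browser.
function querySelectorDeep(selector, root) {
  // 1. Cheap path: the element is directly reachable from this root.
  const direct = root.querySelector(selector);
  if (direct) return direct;

  // 2. Otherwise descend into every open shadow root and same-origin iframe.
  for (const el of root.querySelectorAll('*')) {
    if (el.shadowRoot) {
      const hit = querySelectorDeep(selector, el.shadowRoot);
      if (hit) return hit;
    }
    if (el.tagName === 'IFRAME') {
      let doc = null;
      try {
        doc = el.contentDocument; // null for cross-origin frames
      } catch (_) {
        doc = null;
      }
      if (doc) {
        const hit = querySelectorDeep(selector, doc);
        if (hit) return hit;
      }
    }
  }
  return null;
}
```

Called as querySelectorDeep('select[name="status"]', document), a walker like this would reach the nested <select> that a plain document.querySelector cannot see.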

Click had been using both of these for a long time. That's why click "just worked" on Lightning forms and React component libraries and modal dialogs that render into portals.

safari_select_option, meanwhile, was still doing this:

var el = document.querySelector(sel);

One line. Top frame only. No iframes, no shadow roots, no nothing. On any normal <select> on a normal page, that line is fine — and it had been fine since the day the tool was written, which is exactly why nobody had touched it.

But once a single user dropped Safari MCP into a real Salesforce portal, that line was wrong on every call.

The fix

The v2.11.3 patch is small. It teaches safari_select_option what safari_click already knew:

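let finder;
if (ref) {
  // ref from safari_snapshot: always resolved by the deep walker
  finder = `mcpFindRef('${ref}')`;
} else if (selector) {
  // assumption: `sel` is `selector` escaped for the single-quoted string below
  // cheap top-frame query first; deep walker only when that returns null
  finder = `(document.querySelector('${sel}')||mcpQuerySelectorDeep('${sel}'))`;
}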

Two paths:

ref path: the one to use for anything found via safari_snapshot. It resolves through the same deep walker as click, so iframes and shadow roots are both covered.
selector path: start with the cheap top-frame query (still right 95% of the time), and fall through to the deep walker only when that returns null.

The rest of the tool — the _valueTracker reset that wakes up React's controlled-input bookkeeping, the value-then-text-then-substring matching for selects whose visible text doesn't equal their value, the input/change/blur event sequence — is unchanged. That part wasn't broken. The element lookup was.
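The value-then-text-then-substring order mentioned above can be sketched as a small pure function. This is a hedged illustration: pickOptionIndex is a hypothetical name, and the case-insensitive substring step and "first hit wins" tie-break are assumptions, not necessarily what Safari MCP does.

```javascript
// Hypothetical sketch of value-then-text-then-substring option matching.
// `options` is any array-like of { value, text }, e.g. Array.from(select.options).
function pickOptionIndex(options, wanted) {
  const list = Array.from(options);
  const needle = String(wanted).trim();

  // 1. Exact match on the option's value attribute.
  let i = list.findIndex((o) => o.value === needle);
  if (i !== -1) return i;

  // 2. Exact match on the visible text.
  i = list.findIndex((o) => o.text.trim() === needle);
  if (i !== -1) return i;

  // 3. Case-insensitive substring of the visible text (assumed: first hit wins).
  const lower = needle.toLowerCase();
  if (lower === '') return -1;
  return list.findIndex((o) => o.text.toLowerCase().includes(lower));
}
```

In a page, the returned index would be assigned to select.selectedIndex before the input/change/blur sequence is dispatched.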

Full v2.11.3 release notes here.

What I should have done earlier

The honest version of this is: I had two finders. I used one of them in click. I forgot it existed when I wrote select_option. The fix took an hour. The bug had been there for three months.

The lesson I keep relearning while building this tool is that DOM tools should not have a "happy path." Every tool that resolves an element should resolve it the same way as every other tool, because the page doesn't know which tool you're going to call next. Click into a shadow root, then try to fill a field, and the fill tool had better look in the same place click did.

safari_fill, safari_get_element, safari_hover, safari_get_computed_style — they all went through this same migration earlier in the v2.x series, one bug report at a time. safari_select_option was the last one I hadn't audited. v2.11.3 closes that gap.

If you're building any kind of browser-automation tool — MCP or otherwise — the question worth asking your codebase tonight is: do all my element-resolution paths agree on what "the element" is? Because the day they don't, the agent will be the one to find out.

Safari MCP is open source (MIT) at github.com/achiya-automation/safari-mcp. Native browser automation for AI agents on macOS — your real Safari, your real logins, no Chrome.

Buying a WhatsApp Bot in 2026? Five Traps to Avoid

אחיה כהן — Sun, 17 May 2026 16:09:29 +0000

I build WhatsApp automation for small businesses, and in 2026 the first thing you notice about the market is how similar every offer looks. A dozen vendors, the same promises — 24/7 replies, appointment booking, lead capture — and quotes that all cluster in the same range.

After several years of these projects, the pattern is clear: the expensive mistakes are almost never in the build itself. They are in the buying decision — the questions nobody asked before signing. Here are the five that account for most of the regret.

Trap 1 — Judging the quote by its setup price

The visible number on a bot quote is the one-time build fee. In the market I work in (Israel), that is broadly ₪3,500 for a basic bot, around ₪6,500 for a mid-tier with CRM integration, and ₪12,000+ for a full AI agent. Buyers line those numbers up and pick the lowest.

But the setup fee varies least between serious vendors. What actually decides whether the bot is cheap or expensive over three years is the recurring cost: per-message template fees, language-model usage, hosting, maintenance. A bot with a low setup fee and an unexamined monthly tail can cost more over three years than a pricier build with a lean tail. Ask for the three-year total, not the sticker.

Trap 2 — Not asking which API the bot runs on

This is the single most consequential question, and most buyers never ask it.

A WhatsApp bot connects in one of two ways: through Meta's official WhatsApp Business API, or through an unofficial route that automates the regular WhatsApp app (the best-known open-source project here is WAHA). The unofficial route is genuinely cheaper — no per-message fee — and at low volume it works.

It also runs against Meta's terms of service, and the number it runs on can be suspended without warning or appeal. For most businesses the WhatsApp number is the customer database and the primary contact channel. Losing it is not an inconvenience; it is a small catastrophe. The unofficial route is reasonable for narrow cases — internal tools, low-stakes notifications — but you should know which one you are buying, and why. A quote that doesn't specify is a quote to question.

Trap 3 — Paying for AI the bot will never use

"AI-powered" is the headline tier on most 2026 quotes. It is also the tier most often sold and least often delivered — not because vendors cut corners, but because an AI bot is only as good as what it is given.

A language model wired to WhatsApp but never fed the business's real price list, policies, hours, and tone will underperform a plain, well-built menu bot. The capability is real; it just isn't automatic. Before paying the AI premium, ask the concrete question: what exactly will the model be given to work from, who assembles that, and who keeps it current. If the answer is vague, you're buying a label.

Trap 4 — Treating the bot as build-once

A WhatsApp bot is not an appliance. It is software sitting on top of three things that move constantly — Meta's platform, the automation tooling, and the AI models underneath.

The pace is not theoretical. Inside one 90-day window in 2026, Meta repriced its message templates, n8n shipped native AI-agent capability, the major model vendors cut prices on their cheapest production models by 40–60%, and Meta's WhatsApp Calling API reached general availability. A bot specified in early 2025, on early-2025 assumptions, is already running on stale economics. Either budget for maintenance or budget, unknowingly, for a rebuild.

Trap 5 — No exit ramp to a human

The last trap is the most human one. A bot with no clean handoff to a person frustrates customers more than no bot at all — everyone has been stuck in a loop with an automated system that won't let them out.

The measure of a well-built bot is not how many conversations it handles end to end. It is how gracefully it recognizes the ones it cannot, and how fast it puts a real person in the chat. Ask any vendor to walk through exactly what happens when the bot doesn't know the answer.

What good looks like

A strong vendor's quote answers all five of these before you think to ask — it states the API, breaks out the recurring costs, is specific about what the AI is fed, includes a maintenance path, and treats the human handoff as a feature, not an afterthought.

If you want the longer version — the bot types, the official-vs-unofficial API tradeoff in detail, and current pricing — I keep a full guide here: WhatsApp Bot for Business: the complete guide.

The market is crowded, and from the outside the offers really do look alike. They stop looking alike the moment you ask the five questions.

I run an automation consultancy focused on WhatsApp chatbots, n8n workflows, and CRM integrations for small businesses.

Why element.click() Isn't a Click

אחיה כהן — Sun, 17 May 2026 13:58:14 +0000

My AI agent had a checkbox to tick. A multi-step form: tick two boxes, hit Next. It ticked the boxes. It hit Next. The form rejected it and snapped back to step one — every time.

The boxes were visibly checked. The DOM said checked = true. The agent had done everything right. The form still didn't believe it.

Over the next day I shipped four patch releases of safari-mcp — v2.10.6 through v2.10.9 — and still didn't completely win. Here's what each layer of the stack taught me about why a programmatic click isn't a click.

The tell: isTrusted

When software clicks, it calls element.click() or dispatches a MouseEvent. The handler runs — but the event carries isTrusted: false. That flag is the browser stating, on the record, that no human did this.

Most code never checks. But the modern stack has at least four layers that do, and each rejects a forged click in its own way. I met all four.
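To make the flag concrete, here is a minimal sketch of a handler that checks it. guardedHandler is a hypothetical name; isTrusted itself is a standard read-only Event property that is false for any event created and dispatched by script.

```javascript
// Hypothetical guard: run `onRealClick` only for events the browser itself generated.
function guardedHandler(onRealClick) {
  return function (event) {
    if (!event.isTrusted) return false; // script-made event: ignore it
    onRealClick(event);
    return true;
  };
}

// A script-dispatched event carries isTrusted === false, so the guard drops it.
const target = new EventTarget();
let handled = 0;
let sawUntrusted = false;
target.addEventListener('click', (e) => { sawUntrusted = !e.isTrusted; });
target.addEventListener('click', guardedHandler(() => { handled += 1; }));
target.dispatchEvent(new Event('click'));
```

The listener still runs, which is why a forged click looks like it worked from the outside; it is the code inside the handler that declines to act.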

Layer 1 — the component library

The form used react-select. My tool opened the dropdown by clicking the chevron, then clicked the option. Fine — for the first few rows. Past row four, clicking the chevron did nothing. No menu. The element still had a live React fiber; its pointer handler had simply, silently, stopped responding.

So I stopped driving the UI. The v2.10.6 fix walks the React fiber up from the target node, finds the Select component, and calls its onChange directly — with the same { action: 'select-option', option, name } payload react-select dispatches internally. No menu, no chevron, no click.
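A rough sketch of that fiber walk, under stated assumptions: the __reactFiber$ key prefix is an undocumented React internal (React 17 and later) that may change, and recognizing the Select component by "has an options array and an onChange function" is my own heuristic for illustration, not necessarily how safari-mcp identifies it. The function names are hypothetical.

```javascript
// React attaches its fiber to DOM nodes under a key like "__reactFiber$<random>".
function findFiber(node) {
  const key = Object.keys(node).find((k) => k.startsWith('__reactFiber$'));
  return key ? node[key] : null;
}

// Walk up via fiber.return until some component's props look like a Select.
function findSelectProps(node, maxDepth = 50) {
  let fiber = findFiber(node);
  for (let i = 0; fiber && i < maxDepth; i += 1, fiber = fiber.return) {
    const p = fiber.memoizedProps;
    if (p && Array.isArray(p.options) && typeof p.onChange === 'function') return p;
  }
  return null;
}

// Call onChange with the (value, actionMeta) pair react-select passes to it.
function selectViaFiber(node, wanted) {
  const props = findSelectProps(node);
  if (!props) return false;
  const option = props.options.find((o) => o.label === wanted || o.value === wanted);
  if (!option) return false;
  props.onChange(option, { action: 'select-option', option, name: props.name });
  return true;
}
```

The boolean return value lets the caller fall back to driving the visible menu when no matching component or option is found.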

Lesson: when a component's visible affordance gets flaky, its internal API usually isn't. Reach for the fiber.

Layer 2 — the framework

Next layer: Vue 3. My tool clicked the checkbox. The DOM .checked flipped to true. Vue's reactive v-model proxy did not.

So the box looked checked, but Vue's internal state still held the old value — and the next form submission read Vue's state, not the DOM. That was the snap-back.

The v2.10.7 fix is belt-and-suspenders: after the click, redispatch input and change with composed: true so they cross Shadow DOM and Vue Teleport portals, and reset React's _valueTracker for the shared React-checkbox case. Now the reactive layer hears the change — not just the DOM.
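The redispatch step can be sketched like this. It is a simplified illustration, not the shipped v2.10.7 code: announceChange is a hypothetical name, and _valueTracker is an undocumented React internal, so the reset is guarded and should be read as an assumption about how that internal behaves.

```javascript
// Hypothetical helper: after changing el.checked / el.value from script,
// re-announce the change so framework listeners hear it.
function announceChange(el) {
  // React remembers the last value it saw; if that already matches the DOM,
  // it drops the event. Resetting the tracker is a common workaround.
  const tracker = el._valueTracker;
  if (tracker && typeof tracker.setValue === 'function') {
    tracker.setValue('');
  }
  const fired = [];
  for (const type of ['input', 'change']) {
    // bubbles: reach delegated listeners; composed: cross shadow-root boundaries.
    el.dispatchEvent(new Event(type, { bubbles: true, composed: true }));
    fired.push(type);
  }
  return fired;
}
```

It would be called right after the programmatic click, for example announceChange(checkbox), so the framework's listeners see the same events a real interaction would produce.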

Lesson: flipping a DOM property is not the same as telling the framework you flipped it.

Layer 3 — the browser's own geometry

Safari MCP can also do real clicks — actual OS-level CGEvent mouse events at screen coordinates — for cases synthetic clicks can't reach.

To turn a page coordinate into a screen coordinate, you need the height of everything above the web content: title bar, toolbar, tab bar. I had hardcoded it at 74 px.

On modern Safari the chrome above the content is closer to 90 px, so every native click landed ~16 px high. Often that's still inside the target row, so it sort of worked. But for a button near whitespace, 16 px is the difference between a hit and a silent miss.

The v2.10.8 fix: stop guessing. Compute outerHeight - innerHeight in JavaScript at click time, with a sanity range and a fallback. The browser already knows how tall its own chrome is. Ask it.
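A minimal sketch of that calculation, assuming illustrative bounds: the 40 to 200 px sanity range and the 90 px fallback are placeholders I picked for the example, and chromeHeight and pageToScreen are hypothetical names. It also assumes all chrome sits above the content and that page zoom is 100%.

```javascript
// Measure the browser chrome instead of hardcoding it.
// `win` is a window-like object; pass `window` in a browser.
function chromeHeight(win, { min = 40, max = 200, fallback = 90 } = {}) {
  const measured = win.outerHeight - win.innerHeight;
  // Reject NaN and implausible values (e.g. fullscreen, odd window states).
  if (Number.isFinite(measured) && measured >= min && measured <= max) {
    return measured;
  }
  return fallback;
}

// Convert viewport coordinates to screen coordinates for an OS-level click.
function pageToScreen(win, clientX, clientY) {
  return {
    x: win.screenX + clientX,
    y: win.screenY + chromeHeight(win) + clientY,
  };
}
```

Measuring at click time rather than at startup means a toolbar or tab-bar change between calls is picked up automatically.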

Lesson: never hardcode a number the platform will hand you for free.

Layer 4 — the operating system

Those OS-level clicks need macOS Accessibility permission. macOS stores that grant in its TCC database, keyed to the code-signing identifier of the binary asking.

My helper binary was ad-hoc signed with a hash-based identifier — a new string on every rebuild. So every npm install produced a binary macOS had never seen. The Accessibility grant from yesterday was bound to yesterday's identifier; today's binary inherited nothing.

The symptom was maddening: the helper reported success, no clicks reached the page, and System Settings showed the permission as granted — for the stale identifier.

The v2.10.9 fix: postinstall re-signs the helper with a stable identifier so the grant survives upgrades.

Lesson: if a permission keeps "randomly" resetting, check whether the thing requesting it has a stable identity.

Layer 5 — the one I haven't beaten

Four releases, four layers, one day. And then, on macOS 26, a click still didn't land.

With everything above fixed — right coordinates, stable permission, valid target — CGEvent.postToPid reports a successful injection and the page receives nothing. No isTrusted event at all. The private window-targeting fields the call needs are still present in the macOS 26 SDK; the event simply never crosses into Safari's sandboxed WebContent process.

I can't yet prove it's an OS change rather than something I'm still missing — so it's tracked in the open as issue #29, with the full repro and everything ruled out. If you've automated macOS UI and have a theory, that thread's the place.

"Just click the button"

A click looks atomic. It isn't. Before a real finger reaches a checkbox, an event has to satisfy a component library, a reactive framework, the browser's coordinate math, and the OS permission model — all in one motion — and on a new OS release, the OS can quietly change the rules underneath all of it.

element.click() skips the finger and asks four contracts to take its word for it. Some of them won't. If you're building automation for AI agents, budget for every layer — and keep your release numbers cheap. Some days you'll spend four.

safari-mcp is open source — native Safari automation for AI agents on macOS, no Chrome, no headless. github.com/achiya-automation/safari-mcp

My AI agent saved the first paragraph and the last. It dropped 41 in between.

אחיה כהן — Tue, 12 May 2026 15:07:43 +0000

I asked an AI agent to cross-post a 7,000-character article from dev.to to Hashnode.

The Submit click succeeded. Hashnode returned a draft URL. I clicked through.

The draft had 446 characters: the first paragraph, then 41 empty paragraphs, then the last paragraph.

This is a postmortem of how I got there, why my first three diagnoses were wrong, and what fixed it. If you're shipping any kind of browser automation that touches modern rich-text editors, this one is worth the read.

The setup

Safari MCP is the macOS-native browser automation tool I maintain. One of the things it has to do is fill rich-text editors — Quill, ProseMirror, Lexical, the React-controlled stuff Featured.com uses, and a dozen variations.

For the cross-posting flow specifically, the agent does this:

Opens the Hashnode "new draft" page in my real Safari (already signed in).
Drops a 7,000-character markdown body into the editor.
Clicks Publish.

That's it. It worked for years on dev.to, Medium, X, LinkedIn (after their 2026 Quill migration), Featured.com. Hashnode was supposed to be the easy one — they sell themselves as "developer-friendly".

The symptom

After Submit, the draft saved with this structure:

<p>First paragraph (intact, ~280 chars).</p>
<p></p>
<p></p>
<p></p>
... 41 empty <p> tags ...
<p></p>
<p>Last paragraph (intact, ~166 chars).</p>

Total saved: 446 chars. Total sent: 6,808 chars. The middle 93% was silently dropped.

The agent had no error. The fill call returned successfully. The Submit click returned successfully. Hashnode's own draft view showed the broken structure as if it were intentional.

First diagnosis (wrong): paste race condition

My first guess was a paste-event timing issue. I'd recently fixed a similar bug on X.com where the synthetic ClipboardEvent('paste') was racing with their React useEffect cycle. The fix had been an explicit execCommand('delete') before the paste.

I tried the same thing here. No change.

I added a safari_verify_state call between fill and submit. The verifier confirmed the editor's .textContent matched what I sent — at the moment I checked. But by the time Submit fired ~100ms later, the editor state had reverted.

So whatever was eating the middle paragraphs, it was doing it after the fill returned. The agent's "success" signal was lying.

Second diagnosis (wrong): markdown auto-conversion

Hashnode's editor does auto-format certain characters at the start of a line:

> → blockquote
** → bold marker
[ → link helper
# → heading

I noticed the body had several paragraphs starting with these characters. So I theorized: the editor was rejecting paragraphs whose first characters tripped auto-format prompts, leaving them empty until the user manually accepted.

Fix: I escaped the leading characters with a zero-width space. Re-ran. Result: still 446 chars saved.

So that wasn't it either.

Third diagnosis (wrong): React reconciliation order

Hashnode's editor wraps ProseMirror in a React component. I suspected that the multiple beforeinput events I was dispatching were getting batched and only the first + last applied.

I switched from beforeinput to composing text events with intermediate setTimeout(0) calls to give React's render cycle a chance to flush.

Still 446 chars.

At this point I was four hours in and getting irrationally angry at a contenteditable div.

The real bug

I read the Safari MCP fill pipeline. For ProseMirror editors, it does this:

1. Walk the DOM looking for .pmViewDesc (ProseMirror's view marker).
2. If found, call view.dispatch(tr.replaceWith(...)), the canonical way.
3. If not found, walk React Fiber for memoizedProps.view.
4. If still not found, fall through to char-by-char beforeinput + execCommand('insertText') per line.

Path 4 is the fallback for editors that don't expose ProseMirror's internals. It had worked everywhere I'd tested it, including Hashnode earlier in development.

But it has one assumption baked in: that the editor will accept the text it's told to insert. If the editor rejects an insert silently — for any reason — the fill pipeline never finds out. The function returns success. The DOM has empty paragraphs.

Hashnode's ProseMirror configuration has an "input rules" plugin that runs on every paragraph start. The plugin's job is to handle markdown shortcuts. But its implementation aborts the insert if the matched text doesn't form a valid command — and just doesn't insert anything.

So > blockquote text doesn't become a blockquote. It also doesn't become a regular paragraph. It becomes nothing.

The fill pipeline is char-by-char per line. It walks down, fires beforeinput, fires execCommand. The input rule fires on each >, kills the line silently. Pipeline moves to next line. Same thing.

Only paragraphs whose first character doesn't trip a rule survive. In my article, that was paragraph 1 and the final paragraph.

The fix

The fix is straightforward once you see it:

// After char-by-char fill, verify what actually landed.
const actual = editor.textContent.length;
const expected = value.length;

if (actual < expected * 0.6) {
  // More than 40% missing. The editor ate it.
  // Clear via DOM replacement, then re-fill via insertHTML with paragraph-wrapped HTML.
  while (editor.firstChild) editor.removeChild(editor.firstChild);
  const html = value
    .split('\n\n')
    .map(p => `<p>${escapeHtml(p)}</p>`)
    .join('');
  document.execCommand('insertHTML', false, html);
  return `Filled CE (ProseMirror insertHTML fallback, ${editor.textContent.length}/${expected})`;
}

insertHTML bypasses the input-rules plugin because the rules only fire on character-level input events. A bulk HTML insert is treated as a paste and goes through a different code path — one that doesn't run the markdown-shortcut interceptor.

Important: the values going through escapeHtml and insertHTML here come from the agent's own controlled context — text the agent itself is trying to place into a logged-in editor in the user's own browser session. This isn't a server rendering untrusted user input, so the escape step is purely to preserve the literal characters, not to harden against an attacker.
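The snippet above calls escapeHtml without showing it. A plausible minimal version, together with the paragraph-wrapping step, might look like the sketch below. Both functions are hypothetical stand-ins rather than the project's actual helpers, and unlike the snippet above toParagraphHtml also skips blank paragraphs.

```javascript
// Hypothetical stand-in for the `escapeHtml` helper referenced above.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // must run first so later entities aren't double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Wrap each blank-line-separated paragraph in <p>, preserving literal characters.
function toParagraphHtml(value) {
  return String(value)
    .split(/\n{2,}/)
    .filter((p) => p.trim() !== '')
    .map((p) => `<p>${escapeHtml(p)}</p>`)
    .join('');
}
```

With this, a paragraph that starts with > or # reaches the editor as ordinary escaped text inside a <p>, with its leading character intact.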

Verify-after-fill is the part that took me too long to add. Trust the framework to tell you what happened, not the call that just returned.

This shipped in v2.10.4 of Safari MCP.

The lesson

The deeper issue isn't ProseMirror. It's the assumption that a successful tool call means the action succeeded.

Browser automation has a recurring failure mode: the framework keeps its own state separately from the DOM, and the framework's state is what gets submitted on form post. Your synthetic events change one or the other, sometimes both, sometimes neither. The DOM looks right. The submit ships the wrong thing.

I added a tool called safari_verify_state in v2.10.0 specifically for this. It checks framework state (ProseMirror view, Lexical editor state, React _valueTracker desync, Closure component values) and returns whether the framework agrees with the DOM. The Hashnode ProseMirror case is now a built-in check.

If you're building any agent that touches a serious editor — Quill, ProseMirror, Lexical, Slate, Tiptap — assume the fill happened, then verify what landed before you click Submit. The 5 ms it takes is the cheapest insurance you'll buy this year.

Postscript

I cross-posted this article using the fixed code path. It saved as 41 intact paragraphs.

The agent didn't notice anything was different. That's the point.

Safari MCP is open source (MIT) and runs on macOS. It's used by Claude Code, Cursor, and other MCP clients to drive a real, logged-in Safari instead of spinning up a headless Chromium.

An AI agent overwrote two of my browser tabs. The fix took three releases.

אחיה כהן — Thu, 07 May 2026 12:45:23 +0000

I was eating dinner when my AI agent ate my tabs.

I had Safari open with a Chatwoot Meta dashboard in one tab and an n8n executions view in another, both with unsaved state, both in the middle of real work. In a third tab, its own tab, my agent was supposed to be testing a new feature in the MCP server I maintain (Safari MCP).

I came back to the laptop and both real-work tabs had been navigated to URLs the agent picked. The Chatwoot tab was now showing some test page. The n8n tab was on a Reddit comment thread the agent had been debugging an unrelated module against.

The agent hadn't gone rogue. The MCP server had a state-tracking bug — and instead of failing loudly, it had silently fallen back to "use whatever tab the user is on."

This is a postmortem. The fix took three releases — v2.10.0, v2.10.1, and v2.10.3 — and the iteration is the interesting part.

The shape of the bug

Safari MCP exposes a safari_new_tab(url) tool. Internally, it tracks "the tab MCP owns" via:

A tab index (_activeTabIndex) — Safari's positional handle.
A DOM marker (window.__mcpTabMarker) — injected JS that lets future calls verify "yes, this is still our tab."

Every subsequent safari_navigate, safari_click, safari_fill etc. resolves "where to act" by:

if marker still in current tab → use it
else if _activeTabIndex still valid → switch to it, re-verify
else → fall back to "front document of front window"

That last branch is the catastrophe. "Front document of front window" is, by definition, whatever the user is looking at right now.

So why did the fallback fire? Three different reasons, across three releases.

v2.10.0 — the original failure mode

The original safari_new_tab(url) did exactly what its name said: open a new tab and navigate it to url in one call.

async function newTab(url) {
  const idx = await openBlankTab();     // creates blank tab
  await navigate(idx, url);              // navigates immediately
  await injectMarker(idx);               // marker for future calls
  _activeTabIndex = idx;
}

Spot the bug? It's about what happens when navigate(idx, url) fails to load — file:// blocked by Safari, network error, an app:// scheme that Safari doesn't understand. The new tab stays at about:blank. The marker injection runs, but then the next user-driven navigation in any tab can wipe it. By the time the next safari_navigate arrives, our marker check fails. Our _activeTabIndex still points at a tab, but Safari's real DOM in that tab has been replaced.

The "front document" fallback fires. We navigate the user's current tab.

I shipped this. I tested it on a clean Safari with one window. I never hit the bug because in clean state, the user's tab is my tab.

v2.10.1 — the grace window (almost-fix)

The first fix was a NEW_TAB_GRACE_MS = 30_000 window. For 30 seconds after safari_new_tab, ANY mutating operation that would fall back to the user's tab now throws a clear error:

if (Date.now() - _lastNewTabAt < NEW_TAB_GRACE_MS && !markerOk) {
  throw new Error(
    "tab tracking lost shortly after new_tab — call safari_new_tab again instead of letting MCP touch your active tab"
  );
}

Plus a fix for the marker wipe — safari_navigate now re-injects window.__mcpTabMarker after every successful navigation, so JS-context resets don't lose tracking.

This passed all my tests. It also worked correctly for ~95% of real sessions.

The 5% it missed: sessions longer than 30 seconds where the tab-ghost recovery path nullified _activeTabIndex mid-session.

v2.10.3 — the permanent guard

runJS (the workhorse for every JS-driven tool) has a tab-ghost recovery path. If a JavaScript evaluation fails because the tab Safari thinks is at index N has been closed or replaced, runJS nullifies _activeTabIndex so the next call resolves cleanly.

The intent: avoid using a stale index after Safari shuffles tabs.

The unintended consequence: 30+ minutes into a session, after a routine ghost-recovery, _activeTabIndex is null. The grace window from v2.10.1 has long expired. The marker check fails (the agent has navigated several times since). Fallback fires. User's current tab gets clobbered.

The bug pattern: a "safe" recovery path created the exact failure mode the grace window was designed to prevent.

The permanent fix is a one-line change in spirit:

let _hasOwnedTab = false;  // session-scoped, sticky

async function newTab(url) {
  // ... existing logic ...
  _hasOwnedTab = true;     // ← set once, never reset
}

function _assertNotFallingBackToUserTab() {
  if (_hasOwnedTab) {
    throw new Error(
      "MCP previously owned a tab in this session, but tracking was lost. " +
      "Refusing to fall back to the user's current tab. Call safari_new_tab to re-establish."
    );
  }
  // sessions that never called new_tab can still use front-document fallback
}

The flag is set the first time safari_new_tab succeeds, and it never resets for the lifetime of the MCP process. The four entry points that can fall back to the user's current tab (navigate, navigateAndRead, runJS's tab-ghost fallback path, and runJSLarge) all call _assertNotFallingBackToUserTab before doing so.

If the assertion throws, the agent gets a clear error pointing back at safari_new_tab. The user's tab is untouched.

Sessions that never call safari_new_tab (e.g. tools that explicitly read the user's current tab) are unaffected — _hasOwnedTab stays false, and the front-document fallback still works for them.
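The interaction between ghost recovery and the sticky flag is easier to see in a self-contained model. The sketch below is a toy, not the server's code: createTabTracker and its method names are hypothetical, and the marker check is left out to keep the state machine small.

```javascript
// Toy model of the sticky guard (illustrative names, no real Safari calls).
function createTabTracker(frontDocument) {
  let activeTabIndex = null;
  let hasOwnedTab = false; // sticky: set once, never reset

  return {
    newTab(index) {
      activeTabIndex = index;
      hasOwnedTab = true;
    },
    // Models runJS's tab-ghost recovery: the index is dropped, the flag is not.
    ghostRecovery() {
      activeTabIndex = null;
    },
    resolveTarget() {
      if (activeTabIndex !== null) return { kind: 'owned', index: activeTabIndex };
      if (hasOwnedTab) {
        throw new Error('Tracking lost; call safari_new_tab to re-establish.');
      }
      // Only sessions that never owned a tab may use the user's front document.
      return { kind: 'front', doc: frontDocument };
    },
  };
}
```

A tracker that has called newTab and then ghostRecovery throws from resolveTarget, while a tracker that never called newTab still resolves to the front document, which mirrors the two cases described above.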

What I'd take away if I were writing my own MCP server

1. The fallback you don't notice is the fallback that bites.
"Use the user's current tab" looks like a reasonable degraded mode in isolation. In context — an autonomous agent acting on the user's real, logged-in browser — it's the worst possible default. The fix wasn't "make the fallback work better." It was "the fallback should not exist in this branch of the state machine."

2. State-tracking bugs aren't subtle. They're catastrophic.
A misidentified tab is a misidentified action. The class of bug here — I think I'm acting on X but I'm actually acting on Y — is the same class as a deployment script targeting prod instead of staging, or a Git rebase rewriting the wrong branch. There's no "minor version" of this bug. Engineering effort should be priced accordingly.

3. "Sticky" flags beat "windowed" flags for invariants you actually need.
The v2.10.1 grace window was time-bounded. That made sense for the failure mode I'd seen. But sessions are unbounded. Anything that can happen during the session can happen after the grace window expires. If the property "MCP has owned a tab in this session" is the actual thing protecting the user, that property must hold for the whole session — not 30 seconds.

4. Tests on clean state miss the bugs that matter.
I tested v2.10.0 on a fresh Safari with no other tabs. The user-tab-clobber bug is invisible in that environment, because the user's tab and MCP's tab are the same tab. Real users have eight tabs open and were just clicking around in tab six. If your tool drives a user's real browser, your test environment must have unrelated, in-progress tabs — and your failure modes must be loud when you brush against them.

5. Errors are a feature.
The replacement for "silent fallback to user tab" is a thrown error with a remediation message: "Call safari_new_tab to re-establish." That error is better than the original happy path — because the original happy path was sometimes a disaster. A loud, fixable error is always better than a quiet, irreversible mistake.

The diff that mattered

+ let _hasOwnedTab = false;

  async function newTab(url) {
    const idx = await openBlankTab();
    await navigate(idx, url);
    await injectMarker(idx);
    _activeTabIndex = idx;
+   _hasOwnedTab = true;
  }

  function getFallbackTarget() {
+   if (_hasOwnedTab) {
+     throw new Error("…re-establish via safari_new_tab");
+   }
    return frontDocumentOfFrontWindow();
  }

That's the load-bearing part. Everything else in v2.10.3 is plumbing.

If you're building an MCP server (or any tool that drives a user's real browser/editor/database), the question I'd ask in code review is:

"What's the worst thing this fallback can do, and does the fallback's existence buy enough to be worth that worst case?"

For tab fallback in Safari MCP, the answer was: no, it doesn't.

Source: github.com/achiya-automation/safari-mcp
Install: npx safari-mcp
Releases discussed: v2.10.0, v2.10.1, v2.10.3

Have you shipped a state-tracking bug that ate user data? What was the failure mode, and what flag/invariant ended up being the real fix?

LinkedIn Quietly Migrated From ProseMirror to Quill — and Broke Every Browser Automation Tool That Touched the Composer

אחיה כהן — Sun, 03 May 2026 07:34:07 +0000

I shipped a fix to my MCP server last week for LinkedIn's ProseMirror composer. It worked. Two days later, every LinkedIn post automation broke.

This is the post-mortem of what changed, how I figured it out, and why "automate the platform" stories almost always end this way.

The crash

The symptom was specific. My MCP server's safari_fill tool — which dutifully filled ProseMirror by walking React Fiber and calling editor.commands.setContent(html) — was now crashing the helper daemon and dismissing the composer dialog the instant it touched the contenteditable.

Same composer URL. Same DOM tree at first glance. Same selectors. Different editor underneath.

The DOM tells the truth

I dropped into the browser console and ran the usual probe:

const el = document.querySelector('[contenteditable="true"]');
el.editor // -> undefined
el.closest('.ProseMirror') // -> null
el.closest('.ql-editor') // -> <div class="ql-editor">

There it was. .ql-editor is the canonical Quill class name. LinkedIn had swapped the post composer from ProseMirror to Quill at some point in early 2026 with no announcement I can find.
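The probe above generalizes into a small classifier. This is a sketch, not the shipped detection code: detectEditorKind and fakeElement are hypothetical names, and the Lexical selector is borrowed from the data-lexical-editor attribute that Lexical puts on its root element.

```javascript
// Sketch: classify a contenteditable by the class signature of its ancestors.
// `el` only needs a DOM-like closest(selector) method.
function detectEditorKind(el) {
  if (!el || typeof el.closest !== 'function') return 'unknown';
  if (el.closest('.ProseMirror')) return 'prosemirror';
  if (el.closest('.ql-editor')) return 'quill';
  if (el.closest('[data-lexical-editor="true"]')) return 'lexical';
  return 'unknown';
}

// Tiny stand-in for a DOM element so the sketch runs outside a browser.
function fakeElement(matchingSelectors) {
  return { closest: (sel) => (matchingSelectors.includes(sel) ? {} : null) };
}

const kind = detectEditorKind(fakeElement(['.ql-editor']));
console.log(kind); // quill
```

In a real page you would pass document.querySelector('[contenteditable="true"]') instead of the stand-in, and log the result so an editor swap shows up in your own telemetry.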

Why it was crashing

Quill, like ProseMirror, doesn't let you "just" stuff text into the contenteditable. Both editors hold an internal model — Quill calls it a Delta — and the DOM is downstream of that model.

If you bypass the model and write to the DOM directly, two things happen:

The model and DOM disagree.
The next user-driven event (a keystroke, a save) triggers a re-render that throws because the diff is incoherent.

That's what was killing the composer. My fill was writing to innerText, the Delta state thought the editor was still empty, the React tree tried to reconcile, and the dialog evaporated. The Swift daemon caught the cascading exception and crashed itself for good measure.

The fix: drive Quill the way it expects to be driven

Quill exposes a programmatic API. You just need a reference to the instance. The lookup order I landed on:

Walk up to find an ancestor with class .ql-container.
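Try .__quill: Quill 1.x sets this property on the container directly. Quill 2.x tracks instances internally and exposes them through Quill.find(node), so try that too if the page exposes the Quill constructor.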
Fall back to React Fiber: walk up the fiber chain looking for memoizedProps.quill or stateNode.quill (LinkedIn wraps Quill in a React component that holds the instance in props).
If still nothing, fall back to a real CGEvent Cmd+V paste — Quill respects clipboard events with isTrusted: true.
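The lookup order above can be sketched as one function. Treat it as an approximation: findQuillInstance is a hypothetical name, the "__reactFiber$" key prefix is an undocumented React internal, and the final step only reports that a native paste is needed rather than performing one.

```javascript
// Sketch of the lookup order. Works on any object graph shaped like the DOM:
// parentElement, classList.contains, and React's "__reactFiber$<suffix>" key.
function findQuillInstance(el) {
  // 1. Walk up to the .ql-container ancestor.
  let container = el;
  while (container && !(container.classList && container.classList.contains('ql-container'))) {
    container = container.parentElement;
  }
  // 2. Direct reference on the container, when present.
  if (container && container.__quill) {
    return { quill: container.__quill, via: '__quill' };
  }
  // 3. React Fiber: climb fiber.return looking for a held instance.
  const start = container || el;
  const fiberKey = start
    ? Object.keys(start).find((key) => key.startsWith('__reactFiber$'))
    : undefined;
  let fiber = fiberKey ? start[fiberKey] : null;
  while (fiber) {
    const held = (fiber.memoizedProps && fiber.memoizedProps.quill) ||
                 (fiber.stateNode && fiber.stateNode.quill);
    if (held) return { quill: held, via: 'fiber' };
    fiber = fiber.return;
  }
  // 4. Nothing found: the caller should fall back to a real OS-level paste.
  return { quill: null, via: 'native-paste' };
}
```

Returning the `via` label alongside the instance makes it cheap to log which path succeeded, which is useful the next time the platform changes editors.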

Once you have the instance, the actual fill is one line:

quill.setContents([{ insert: text + '\n' }], 'api');

The 'api' source flag is the part that matters. It tells Quill "this came from your own API, update your model and the DOM together." The text commits, the Delta stays consistent, and the React parent doesn't try to reconcile against a corrupted model.
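A fill is only done once the model agrees, so it is worth reading the text back. The sketch below assumes a Quill-like object: setContents and getText are real Quill methods, while fillQuill and StubQuill are hypothetical names used so the example runs without a browser.

```javascript
// Sketch: fill through the editor API, then confirm the model took the text.
function fillQuill(quill, text) {
  quill.setContents([{ insert: text + '\n' }], 'api');
  const actual = quill.getText();
  return { ok: actual === text + '\n', actual };
}

// Minimal stand-in that mimics only the two methods used above.
class StubQuill {
  constructor() {
    this.ops = [{ insert: '\n' }];
  }
  setContents(ops) {
    this.ops = ops;
  }
  getText() {
    return this.ops
      .map((op) => (typeof op.insert === 'string' ? op.insert : ''))
      .join('');
  }
}

const result = fillQuill(new StubQuill(), 'Hello from the API');
console.log(result.ok); // true
```

Checking getText() instead of the DOM keeps the verification on the editor's side of the model boundary.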

What this taught me about platform automation

Two lessons, both old, both worth re-learning:

Editors aren't a stable interface. ProseMirror and Quill have different APIs, different state models, and different rules for "what counts as a real edit." Targeting one of them only works until the platform decides it doesn't anymore. LinkedIn made this swap with zero changelog. The only way I knew was that my code broke.

The DOM is the lowest common denominator. The editor model is the actual one. Every automation tool that synthesizes events on the contenteditable is operating one layer below the truth. Sometimes that works (because the editor reconciles). Sometimes it doesn't (because the editor crashes or silently discards the input). The robust path is always to find the editor instance and call its API.

There's a third lesson, which is more uncomfortable: I couldn't fully verify my fix on LinkedIn, because LinkedIn's modal-opening behavior in headless contexts is independently broken right now. The composer button accepts clicks, the dialog DOM materializes, but it never visually opens. So the Quill detection is in place — and verified on test pages — but the LinkedIn-specific live path is still gated on a separate modal issue I haven't cracked.

This is the texture of platform automation. Two unrelated bugs, same week, same target. Each one looks like the other. You ship a fix for one and the other one masquerades as a regression.

The takeaway

If you're building anything that types into a third-party rich text editor — Slack, LinkedIn, Discord, Medium, Notion — the editor identity is part of your contract with the platform, and the platform doesn't owe you stability there. Detect the editor type at runtime. Have a fallback for the unknown case (real clipboard events, ideally). Log what you found, so when it changes you find out from your own telemetry instead of from a Slack message at 11pm.

And read the contenteditable's class list before you touch it. ProseMirror and Quill have different class signatures and the DOM will tell you what you're dealing with — if you ask.

The fix shipped in safari-mcp@2.10.2. Source on GitHub.

When GitHub Actions Goes Silent: The Pending-Forever Bug I Hit Shipping My MCP Server to npm

אחיה כהן — Tue, 28 Apr 2026 19:22:06 +0000

I have an open-source MCP server. I tag a release, push, GitHub Actions builds, npm publishes, MCP Registry updates. That's the contract. It worked for v2.7.6 through v2.8.4.

Then v2.8.5 didn't publish. Neither did v2.8.6. Or v2.9.0. Or v2.9.1. Or v2.9.2. Or v2.9.3.

Six releases stuck. Not failing — stuck. Yellow dot. Forever.

Here's what was actually happening. And how I got the releases out without GitHub Actions.

The symptom that doesn't match any docs

Every release event triggered the workflow. Every workflow showed up in the runs list. None of them ever started a job.

$ gh run view 25001890100 --json status,conclusion,jobs
{
  "status": "queued",
  "conclusion": null,
  "jobs": []
}

No conclusion. No jobs. Empty pending_deployments. Not "waiting for approval". Not "in_progress". Not "failure". Just queued (the UI shows it as pending) with no work scheduled, for 125 hours.

If you search "GitHub Actions stuck pending", you'll find a hundred forum posts. Every answer assumes one of:

You hit the runner concurrency limit (5 concurrent macOS jobs on the Free plan)
You have a deployment environment requiring approval
Your runs-on: label is unreachable
You're using self-hosted runners with no online agents

None of those applied. My workflow was simple, no environments with required reviewers, runs-on: macos-latest, no self-hosted runners.
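The "queued, no jobs, no conclusion" signature is easy to check for mechanically. Here is a sketch that takes the JSON printed by gh run view --json status,conclusion,jobs,createdAt; the function name, the one-hour threshold, and the sample timestamps are assumptions for illustration.

```javascript
// Sketch: flag runs that are waiting, have no jobs, and are older than a
// threshold. The one-hour default is an arbitrary assumption.
function isSilentlyStuck(run, nowMs, maxQueuedMs = 60 * 60 * 1000) {
  const neverScheduled = Array.isArray(run.jobs) && run.jobs.length === 0;
  const waiting = run.status === 'queued' || run.status === 'pending';
  const ageMs = nowMs - Date.parse(run.createdAt);
  // `!run.conclusion` covers both null and an empty string.
  return waiting && !run.conclusion && neverScheduled && ageMs > maxQueuedMs;
}

const stuckRun = {
  status: 'queued',
  conclusion: null,
  jobs: [],
  createdAt: '2026-04-23T10:00:00Z', // illustrative timestamp, not the real run
};
const now = Date.parse('2026-04-28T15:00:00Z'); // 125 hours later
const stuck = isSilentlyStuck(stuckRun, now);
console.log(stuck); // true
```

A check like this can run on a schedule elsewhere (a laptop cron job, for instance) and point you at the billing page before six releases pile up.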

The thing GitHub doesn't tell you in the run UI

The runs list shows pending. The run detail page shows pending. The job list shows nothing. The "deployment" tab shows nothing.

But if you look at your billing dashboard, there's a different story:

Your account has used 100% of included macOS minutes for this billing period.

That's it. That's the entire diagnostic. There is no banner on the run page. The workflow doesn't fail with a clear error. It just sits in the queue forever — because the runner that would pick it up doesn't exist, and the queue doesn't time out events.

The minutes counter resets monthly. Until it does, every release event becomes another silent pending row.

Two facts that surprised me

Fact 1: macOS runners cost 10x more than Linux runners. Both runs-on: macos-latest and runs-on: macos-13 charge against your Actions minutes at a 10x multiplier. The free 2,000 minutes/month gets you 200 minutes of macOS — about 20 release builds if each takes 10 minutes.
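The arithmetic behind that estimate, as a quick sketch. The function name is made up; the inputs are the figures quoted above (2,000 included minutes, a 10x macOS multiplier, 10-minute builds).

```javascript
// Worked arithmetic: how many builds fit in the included minutes.
function buildsPerMonth(includedMinutes, multiplier, buildMinutes) {
  const effectiveMinutes = includedMinutes / multiplier; // wall-clock minutes on that runner
  return { effectiveMinutes, builds: Math.floor(effectiveMinutes / buildMinutes) };
}

const macos = buildsPerMonth(2000, 10, 10);
const linux = buildsPerMonth(2000, 1, 10);
console.log(macos); // { effectiveMinutes: 200, builds: 20 }
console.log(linux); // { effectiveMinutes: 2000, builds: 200 }
```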

Fact 2: Switching to Linux didn't fix it. I changed runs-on: macos-latest to runs-on: ubuntu-latest. Same symptom. 0 jobs queued, status "pending". Why?

The macOS minutes meter is one bucket. The Linux meter is another. When the macOS bucket emptied, my pending macOS runs were still in the queue, blocking new runs. Even after switching the workflow to ubuntu, the concurrency group in the YAML serialized everything:

concurrency:
  group: publish
  cancel-in-progress: false

So new ubuntu runs queued behind old stuck macOS runs and never started.

The two-part fix

Part 1: workflow_dispatch with tag input

Adding a manual trigger lets you re-publish a tag whose release-event run got stuck, without deleting and recreating the GitHub Release:

on:
  release:
    types: [published]
  workflow_dispatch:
    inputs:
      tag:
        description: "Tag to publish (e.g. v2.9.3)"
        required: true
        type: string

In every step that needs the tag, fall back through both event types:

- uses: actions/checkout@v6
  with:
    ref: ${{ github.event.inputs.tag || github.ref_name }}

That alone isn't enough — if the runner pool is still empty, the dispatched run also stalls. But it gives you a clean re-trigger path the moment runners are back.

Part 2: portable runner OS

The workflow downloaded mcp-publisher_darwin_${ARCH}.tar.gz — hardcoded "darwin". Switching to ubuntu broke that step. Generalize:

- name: Download mcp-publisher
  run: |
    OS=$(uname -s | tr '[:upper:]' '[:lower:]')
    ARCH=$(uname -m)
    if [ "$ARCH" = "x86_64" ]; then ARCH=amd64; fi
    if [ "$ARCH" = "aarch64" ]; then ARCH=arm64; fi
    curl -sL "https://github.com/modelcontextprotocol/registry/releases/latest/download/mcp-publisher_${OS}_${ARCH}.tar.gz" -o mcp-publisher.tar.gz
    tar -xzf mcp-publisher.tar.gz mcp-publisher

Now the same step works on macOS-arm64, macOS-x86_64, and ubuntu-x86_64, and on any other runner for which a matching release asset is published.
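The same mapping is easy to unit-test if you pull it out of the shell step. A sketch, with publisherAsset as a hypothetical helper name and the asset naming pattern taken from the download URL above:

```javascript
// Sketch: the uname-to-asset mapping from the shell step, as a pure function.
function publisherAsset(unameS, unameM) {
  const os = unameS.toLowerCase();
  const archMap = { x86_64: 'amd64', aarch64: 'arm64' };
  const arch = Object.prototype.hasOwnProperty.call(archMap, unameM)
    ? archMap[unameM]
    : unameM;
  return `mcp-publisher_${os}_${arch}.tar.gz`;
}

console.log(publisherAsset('Darwin', 'arm64'));  // mcp-publisher_darwin_arm64.tar.gz
console.log(publisherAsset('Linux', 'x86_64'));  // mcp-publisher_linux_amd64.tar.gz
```

Whether an asset exists for a given OS and architecture still depends on what the registry project publishes, so an unknown pair should fail loudly at download time.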

The manual workaround that actually shipped the release

While the workflow stays stuck, here's how I got v2.9.3 to npm and the MCP Registry from my laptop:

npm: the easy part

git checkout v2.9.3
npm publish --provenance --access public

--provenance requires a valid OIDC token, which is only available inside a supported CI environment such as GitHub Actions. Skip it locally:

npm publish --access public

You lose the provenance attestation, but the package ships. Provenance is a nice-to-have, not a publish blocker.

MCP Registry: the trickier part

The MCP Registry's CLI authenticates interactively:

mcp-publisher login github
# Opens a browser, asks you to paste a code, etc.

That's fine for humans. For a script — or for a Claude session running headless — you need non-interactive auth. The mcp-publisher binary accepts -token:

GH_TOKEN=$(gh auth token)
mcp-publisher login github -token "$GH_TOKEN"
mcp-publisher publish

The gh CLI you already use for everything else? Its token works as your GitHub PAT for mcp-publisher. No browser, no copy-paste.

After running these, the MCP Registry's io.github.achiya-automation/safari-mcp v2.9.3 went from "stuck on v2.7.6 for 3 weeks" to isLatest: true in about 15 seconds.

What I'd tell past-me

Check the billing dashboard first when an Actions run sits pending with no error. The run UI does not surface "you're out of minutes". The billing page does.
Don't trust runs-on: ubuntu-latest to "just be cheaper" — it is, but if you've burned your macOS minutes on stalled runs, the queue can still serialize new ones behind dead ones via your concurrency: group.
Keep a manual publish path documented. Both npm and the MCP Registry have non-interactive auth options. Write the bash one-liners somewhere your future self can find them at 2am.
workflow_dispatch with a tag input is cheap insurance. It costs you 6 lines of YAML and saves you from needing to delete-and-recreate GitHub Releases when the release-event run gets corrupted.

FAQ

Why didn't a timeout-minutes: rescue me?
That's a job-level timeout. It applies once a job starts. A run that never starts a job has nothing to time out.

Couldn't I have used a self-hosted runner?
Yes — and that's the right answer for high-volume projects. For an OSS hobby project, self-hosted is operationally heavier than the manual publish path.

Doesn't --provenance matter for supply-chain security?
For widely-installed packages, yes. For an OSS project's own emergency-publish workaround, the trade-off is "ship the release without provenance" vs "ship nothing". Pick the first one and re-publish with provenance on the next clean release.

Could I have known about the billing limit before hitting it?
GitHub does send an email when you cross 75% of your minutes. The email goes to the address on your billing account, which may not be the address you watch. Worth setting up a filter.

What about Actions minutes for OSS public repos?
GitHub gives unlimited minutes to public repos using GitHub-hosted runners — but that's only for repos owned by organizations on the Free plan, with the runner type matching the included unlimited tier. For personal accounts and certain runner combinations, the standard quota applies. Check the actual numbers under Settings → Billing → Plans for your specific account type.

If you've hit a similar stuck-pending pattern with no error in the run UI — that's the bug. Check your minutes. Then ship from your laptop.

The repo (with the workflow that handles all this now) is safari-mcp on GitHub.

The 3 isTrusted:false Bugs That Made LinkedIn Posts Impossible From My MCP Server

אחיה כהן — Wed, 22 Apr 2026 14:36:56 +0000

TL;DR

I couldn't post to LinkedIn from my MCP server. Not "sometimes fails" — never works. I assumed one bug. I was wrong. I found three, stacked, and each one looked like success to every automation tool I tried. Here is the anatomy of why your agent's "I posted it!" lies to you when a rich-text editor sits inside a dialog.

The symptom

I ship Safari MCP — an MCP server that drives the Safari you are already logged into. 80 tools. safari_fill is the most-used one. For three months it worked everywhere — Gmail, GitHub, Ahrefs, Google Docs, Shopify admin.

Then I tried posting to LinkedIn from an agent.

> agent: safari_fill({text: "Shipping v2.9.0 — modal detection in snapshot!"})
< result: "Filled. 67 chars."

Except the LinkedIn composer was empty. And closed. And I had a cursor in my address bar.

Three hours later I had a list. Three separate boundaries, each silently sabotaging the one before it.

Boundary 1: `focusout` dismisses the dialog

LinkedIn's composer is a <div role="dialog">. Specifically, its share composer listens for focusout on any descendant and closes the modal — the UX intent is "clicked outside → close."

My fill path did this at the end:

// Old: "polite" contenteditable fill (pseudo-code)
setEditableContent(editableEl, text);
editableEl.dispatchEvent(new Event('input', { bubbles: true }));
editableEl.blur();  // ← here's the assassin

The blur() call was there for a reason: some React form libraries only commit state on blur. Perfectly reasonable on a standalone textarea. Inside a dialog? The focusout listener takes the blur, concludes the user clicked away, and runs the dismiss animation.

My fill worked. For ~40ms. Then the dialog DOM disappeared and the text with it.

Fix: Remove the blur(). React commits state from input alone on any modern contenteditable. If a site truly requires blur to persist, it is broken for keyboard users anyway.

But removing blur was not enough. The next run showed the text finally landing, the Post button enabling — and then the button click did nothing. Why?

Boundary 2: ProseMirror's `isTrusted:false` paste rejection

LinkedIn's composer was ProseMirror when I started debugging. (They have since migrated to Lexical. We will get there.) ProseMirror has a paste handler. That handler is strict:

// ProseMirror source, paraphrased
handlePaste(view, event) {
  if (!event.isTrusted) {
    // Synthetic paste events don't reflect real user intent.
    // Reject them — the editor state must only change from real input.
    return false;
  }
  ...
}

This is a security decision, not a UX one. event.isTrusted is only true when the browser itself dispatches the event — a real keystroke, a real paste, a real click. JavaScript new Event() or dispatchEvent() produces isTrusted:false every time.

My fill was dispatching new ClipboardEvent('paste', { clipboardData: ... }). The editor reached its paste handler, saw isTrusted:false, and bailed. The execCommand('insertText') fallback went the same way.

The character-by-character beforeinput dispatch? Also isTrusted:false. Also rejected.
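You can see the isTrusted behavior without a browser. The sketch below uses the standard Event and EventTarget globals (available in Node 16+ and every browser); guardedPasteTarget is a hypothetical name, and the handler only imitates the kind of guard described above rather than any editor's real code.

```javascript
// Sketch: a handler that ignores untrusted events, plus a synthetic dispatch.
function guardedPasteTarget() {
  const target = new EventTarget();
  const state = { accepted: 0, rejected: 0 };
  target.addEventListener('paste', (event) => {
    if (!event.isTrusted) {
      state.rejected += 1; // synthetic event: leave editor state alone
      return;
    }
    state.accepted += 1;
  });
  return { target, state };
}

const { target, state } = guardedPasteTarget();
const synthetic = new Event('paste', { bubbles: true, cancelable: true });
target.dispatchEvent(synthetic);

console.log(synthetic.isTrusted); // false
console.log(state);               // { accepted: 0, rejected: 1 }
```

The dispatch itself succeeds and the listener runs, which is exactly why this failure looks like success from the automation side.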

Fix that worked (and broke in Boundary 3): Route through a real OS paste. I already had a _nativeTypeViaClipboard path — uses AppleScript to set the system clipboard, then dispatches a real Cmd+V via macOS CGEvent. The browser sees it as a real user paste. isTrusted is true. Editor accepts it.

Boundary 3: CGEvent Cmd+V steals focus, triggers Boundary 1

Remember Boundary 1, "focusout dismisses the dialog"? Well...

The CGEvent Cmd+V path delivers the keystroke to the frontmost window. To be the frontmost window, Safari has to be active. When I programmatically activate Safari via NSApplication activateIgnoringOtherApps, the previously focused window loses focus for a brief moment. Chrome's "focus stealing" behavior is a documented pet peeve of every automation tool; Safari is no different.

So the sequence was:

CGEvent fires Cmd+V
Safari gets activated (taking focus briefly)
The composer editor sees focusout during the ~10ms activation window
Dialog dismisses
Paste lands — but on the feed underneath the now-closed dialog

Cool.

First fix attempt: Use a background-activation variant that does not foreground Safari. This worked but required the user's Safari to already be the active app (fragile — the point of MCP is the user is doing other work).

Second fix attempt — the one that stuck: Bypass the OS keyboard entirely. Drive the editor through its own internal API.

The actual fix: editor-native API access

LinkedIn's composer (as of 2026-04) is Lexical, not ProseMirror. Lexical is Meta's successor to Draft.js, and is also used in Shopify admin, some Meta apps, and newer Notion surfaces.

Lexical exposes the editor instance on its DOM root element:

const editorEl = document.querySelector('[data-lexical-editor="true"]');
const editor = editorEl.__lexicalEditor;  // the actual LexicalEditor instance

// Build a minimal root → paragraph → text document.
// `value` is the plain-text string to insert.
const newState = editor.parseEditorState(JSON.stringify({
  root: {
    children: [{
      children: [{ detail: 0, format: 0, mode: 'normal', style: '', text: value, type: 'text', version: 1 }],
      direction: 'ltr', format: '', indent: 0, type: 'paragraph', version: 1
    }],
    direction: 'ltr', format: '', indent: 0, type: 'root', version: 1
  }
}));
editor.setEditorState(newState);

Zero synthetic events. Zero focus shift. Zero clipboard. The editor updates its own state directly. Lexical's internal invariants hold. React re-renders the contenteditable tree through its normal diff path. The Post button observes the state change and enables itself.

For ProseMirror (which LinkedIn used to use), the equivalent is:

const pmView = editorEl.pmViewDesc?.view;  // ProseMirror's EditorView
const tr = pmView.state.tr.insertText(value, pmView.state.selection.from);
pmView.dispatch(tr);

Same principle: do not pretend to be a user. Be a caller.
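For the Lexical path, the serialized state is just data, so it can be built by a plain function and tested without an editor. A sketch under a few assumptions: buildLexicalState is a hypothetical helper, one paragraph per line is my choice rather than something Lexical requires, and the explicit empty style on text nodes reflects my understanding of Lexical's serialized text format.

```javascript
// Sketch: build the serialized state for plain text, one paragraph per line.
function buildLexicalState(text) {
  const paragraphs = text.split('\n').map((line) => ({
    // An empty line becomes an empty paragraph rather than an empty text node.
    children: line === '' ? [] : [
      { detail: 0, format: 0, mode: 'normal', style: '', text: line, type: 'text', version: 1 },
    ],
    direction: 'ltr', format: '', indent: 0, type: 'paragraph', version: 1,
  }));
  return {
    root: { children: paragraphs, direction: 'ltr', format: '', indent: 0, type: 'root', version: 1 },
  };
}

const built = buildLexicalState('First line\n\nThird line');
console.log(built.root.children.length); // 3
```

In the page you would hand the result to the editor with editor.setEditorState(editor.parseEditorState(JSON.stringify(buildLexicalState(value)))).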

The cascade of falsified "success"

Here is what is unsettling: every stage of every failed attempt returned success to my agent.

Setting the editable element's content → value set, DOM mutation event fires, "success"
dispatchEvent(new ClipboardEvent('paste')) → handler called, preventDefault returned, "looks like paste fired, success"
_nativeTypeViaClipboard → Cmd+V fired, clipboard had the content, "success"

The only honest verification is: did the editor state update? Not the DOM. Not the visible text. Not the event log. The editor's own source of truth.

For Lexical: editor.getEditorState().toJSON(). Compare to what you expected. Now you know.

This is why your agent's "I posted it" lies. Every layer of the automation stack reports local success. None of them verified the editor's internal state matched the intent.
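Verification against the editor's own state can be as small as this. The sketch walks the object returned by editor.getEditorState().toJSON(); lexicalPlainText and verifyFill are hypothetical names, and the walk deliberately ignores non-text nodes such as line breaks.

```javascript
// Sketch: pull plain text out of a serialized Lexical state and compare it
// with what the fill was supposed to write.
function lexicalPlainText(stateJson) {
  const collect = (node) =>
    node.type === 'text' ? node.text : (node.children || []).map(collect).join('');
  return stateJson.root.children.map(collect).join('\n');
}

function verifyFill(stateJson, expected) {
  const actual = lexicalPlainText(stateJson);
  return { ok: actual === expected, actual };
}

const filled = {
  root: { type: 'root', children: [
    { type: 'paragraph', children: [{ type: 'text', text: 'Hello' }] },
  ] },
};
const empty = { root: { type: 'root', children: [{ type: 'paragraph', children: [] }] } };

console.log(verifyFill(filled, 'Hello').ok); // true
console.log(verifyFill(empty, 'Hello').ok);  // false
```

Returning the actual text alongside the boolean gives the agent something honest to report when the fill did not land.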

Generalizations

Blur is radioactive in dialogs. Audit every automation tool's fill path. If it calls .blur(), it will close some modal somewhere.
isTrusted:false is a one-way door. Real-world rich-text editors audit it. Your synthetic paste/input/keydown will not cross. Either use a native OS path (Cmd+V via CGEvent/winuser) or drive the editor API directly.
Native OS paste moves focus. Which is fine — unless the target is inside a dialog that listens for focus loss. In that case, drive the editor API directly.
Editor API access is undocumented but stable. __lexicalEditor, pmViewDesc.view, Draft.js's internal store: these have all been in production for years because the editors themselves are stable. They are not public, but they are not moving.
Trust nothing downstream of the editor. The rendered DOM, text content, visible interface — any of these can be right while the editor's internal state is wrong. Verify editor state, not DOM state.

What this means if you use or build MCP servers

Most MCP browser tools today use page.type() or element.fill() — thin wrappers over DOM events. They will work for 80% of forms and silently fail for rich editors inside dialogs (which is roughly: every post/comment/share UI on every major social site, Notion, Google Docs, JIRA, Salesforce rich notes, Shopify description fields).

If you are evaluating browser-automation MCP servers for agent workflows that involve content creation, test this specifically:

Can it post to LinkedIn?
Can it type a multi-line comment on GitHub?
Can it fill a Notion page with formatted text?

If any of those fail silently (returns "success" but the target app shows nothing), the tool has one of these three bugs.

Safari MCP v2.9.4 ships the Lexical-native path. If you are on macOS and want to try it:

npx safari-mcp

MIT, 80 tools, github.com/achiya-automation/safari-mcp.

Is there a fourth boundary I missed? Drop a comment — I will buy the bug report with a merch sticker if it forces a v2.9.5.

WhatsApp Bot for Business 2026 — $1K-$4K (50+ Real Builds)

אחיה כהן — Mon, 20 Apr 2026 19:54:12 +0000

WhatsApp has over 2 billion users worldwide. If your customers are on WhatsApp (and they probably are), a bot can handle inquiries 24/7, book appointments, and qualify leads — while you sleep.

But there's a catch: do it wrong, and Meta will restrict or ban your number. I've seen businesses lose their primary WhatsApp number because they used the wrong tool, sent messages to people who didn't opt in, or scaled too aggressively.

This guide covers how to build a WhatsApp bot properly — which API to use, how to avoid bans, and what a realistic setup looks like.

TL;DR

Official API (via BSP) — safe, verified, but costs $50-100/month + per-message fees. Best for established businesses
WAHA (unofficial, open-source) — free, flexible, but not endorsed by Meta. Risk of account restrictions. Best for small businesses starting out
Ban prevention — get opt-in before messaging, don't send bulk unsolicited messages, respond to conversations (don't just broadcast)
Realistic cost — $1,000-4,000 setup + $5-100/month ongoing, depending on your approach

Two Ways to Connect: Official API vs. WAHA

This is the first decision you'll make, and it affects everything else — cost, reliability, features, and risk.

Option 1: Official WhatsApp Business API

The WhatsApp Business API is Meta's official solution for businesses. You access it through a Business Solution Provider (BSP) like Twilio, 360dialog, or MessageBird. (If you want a full breakdown of BSPs, fees, and the onboarding process, see our WhatsApp Business API guide.)

How it works:

Sign up with a BSP
Verify your business with Meta
Get a dedicated phone number (or use an existing one)
Send and receive messages through the API

Pricing (as of March 2026):

BSP monthly fee: $50-100/month (varies by provider)
Per-conversation fees (set by Meta):
- Marketing conversations: ~$0.035/conversation (varies by country)
- Utility conversations (order updates, etc.): ~$0.005/conversation
- Service conversations (customer-initiated): free for the first 1,000/month
Template messages must be pre-approved by Meta

Pros:

Officially supported — no risk of account bans for API usage
Green checkmark verification available
Template messages for outbound messaging
Higher rate limits
Multi-device support built in

Cons:

Per-message costs add up at scale
BSP adds another vendor and monthly cost
Template approval process can be slow (24-72 hours)
Less flexibility — you can only do what the API allows

Option 2: WAHA (Unofficial WhatsApp API)

Important: WAHA is NOT an official WhatsApp product. It's an open-source project that provides API access to WhatsApp by connecting through WhatsApp Web's protocol. Meta does not endorse or support it.

How it works:

Self-host WAHA on your server (Docker)
Scan a QR code with your WhatsApp number (like WhatsApp Web)
Send and receive messages through WAHA's REST API

Pricing:

WAHA Core: free and open-source
WAHA Plus: paid version with additional features
Your only cost: server hosting ($5-20/month)

Pros:

No per-message fees
No template approval process
Full flexibility — send any message type
Open-source — you can inspect and modify the code
No BSP middleman

Cons:

Not endorsed by Meta — using it technically violates WhatsApp's Terms of Service
Risk of account restrictions if you trigger spam detection
Relies on WhatsApp Web protocol — can break when WhatsApp updates
No green checkmark
Phone must stay connected (though WAHA handles multi-device well)

Which Should You Choose?

Scenario	Recommendation
Established business, customer communications	Official API
Marketing campaigns and broadcasts	Official API (with opt-in)
Small business, responding to incoming messages	WAHA can work well
Testing and prototyping	WAHA (lower cost to experiment)
Highly regulated industry (healthcare, finance)	Official API
Budget-conscious startup	WAHA to start, migrate to official later

In our experience, many small businesses start with WAHA because the barrier to entry is lower, then migrate to the official API as they scale.

How to NOT Get Banned (Critical)

Whether you use the official API or WAHA, these rules apply:

The #1 Rule: Get Opt-In First

Never send the first message to someone who hasn't explicitly asked to hear from you. This is both a WhatsApp policy requirement and common sense.

Good:

Customer fills out a form and checks "Contact me on WhatsApp"
Customer sends you a message first and you respond
Customer explicitly asks to receive updates via WhatsApp

Bad:

You bought a list of phone numbers and blast them all
You scrape numbers from websites and send cold messages
You add everyone in your phone contacts to a broadcast list

Rate Limiting

Don't send hundreds of messages per minute. WhatsApp's detection algorithms look for:

High volume in short time — sending 500 messages in 5 minutes is a red flag
Identical messages — sending the exact same text to many numbers looks like spam
High block rate — if many recipients block you, your quality rating drops fast
Messaging numbers that don't have you saved — this is a strong spam signal, especially at volume

For a deeper breakdown of the exact thresholds and how WhatsApp's four-layer detection system works in 2026, see the WhatsApp spam detection guide.

Safe practices:

Space out bulk messages (add 2-5 second delays between sends)
Personalize messages (use the recipient's name, reference their specific inquiry)
Keep your block rate under 2-3%
Start slow — send to 50 people first, monitor for blocks, then scale gradually
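The spacing and block-rate guidelines above translate into two small helpers. This is a sketch only: the function names are made up, the 2% ceiling is the low end of the 2-3% guideline, and nothing here talks to WhatsApp.

```javascript
// Sketch: plan send times with a randomized 2-5 second gap between messages.
// Returns offsets in milliseconds from the start of the batch.
function planSendOffsets(count, options = {}) {
  const { minDelayMs = 2000, maxDelayMs = 5000, random = Math.random } = options;
  const offsets = [];
  let t = 0;
  for (let i = 0; i < count; i += 1) {
    if (i > 0) t += Math.round(minDelayMs + random() * (maxDelayMs - minDelayMs));
    offsets.push(t);
  }
  return offsets;
}

// Block-rate gate: only scale up while blocks stay at or under the ceiling.
function blockRateOk(blocked, delivered, ceiling = 0.02) {
  return delivered > 0 && blocked / delivered <= ceiling;
}

// Fixed "random" value so the example output is predictable.
const offsets = planSendOffsets(3, { random: () => 0.5 });
console.log(offsets);             // [ 0, 3500, 7000 ]
console.log(blockRateOk(1, 50));  // true  (2%)
console.log(blockRateOk(3, 50));  // false (6%)
```

A pilot batch of 50 that passes blockRateOk is the signal to widen the next batch; one that fails is the signal to stop and rework the message.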

WAHA-Specific Precautions

If you're using WAHA (unofficial API):

Use a dedicated number — don't risk your primary business number
Don't blast — WAHA is best for responding to incoming messages, not mass outbound campaigns
Monitor your quality — if you notice messages not delivering, stop and investigate
Have a backup plan — if the number gets restricted, you need to be able to switch to the official API or a new number
Keep sessions stable — frequent disconnections/reconnections can trigger flags

Building Your Bot: A Practical Walkthrough

Here's how a typical WhatsApp bot setup looks using n8n (our preferred automation platform) and WAHA.

Architecture Overview

Customer sends WhatsApp message
        ↓
    WAHA (receives message via WhatsApp Web)
        ↓
    Webhook → n8n (processes the message)
        ↓
    Logic: FAQ? Appointment? Lead? → Route accordingly
        ↓
    Response sent back through WAHA
        ↓
    Customer receives reply on WhatsApp

Step 1: Set Up WAHA

WAHA runs as a Docker container. Basic setup:

# docker-compose.yml
services:
  waha:
    image: devlikeapro/waha
    ports:
      - "3000:3000"
    environment:
      - WHATSAPP_DEFAULT_ENGINE=WEBJS
      - WAHA_DASHBOARD_ENABLED=true
    volumes:
      - waha_data:/app/.sessions

volumes:
  waha_data:

After starting it (docker compose up -d), open http://your-server:3000/dashboard, start a session, and scan the QR code with your phone.

Step 2: Connect to n8n

In n8n, create a webhook node that WAHA will call when messages arrive:

Add a Webhook node — this receives incoming messages
Configure WAHA to send webhooks to your n8n webhook URL
Add a Switch node to route messages based on content
Add response nodes to send replies back through WAHA's API

Step 3: Build Your Logic

A basic FAQ bot might look like this in n8n:

Webhook (incoming message)
    → Switch node:
        - Contains "hours" or "open" → Send business hours
        - Contains "price" or "cost" → Send pricing info
        - Contains "appointment" or "book" → Start booking flow
        - Default → "Thanks for reaching out! A team member will reply shortly."
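If you prefer code to a Switch node, the same routing fits in a few lines of JavaScript, for example inside an n8n Code node. The route names and the ROUTES table are placeholders for illustration.

```javascript
// Sketch of the keyword routing as a plain function.
const ROUTES = [
  { name: 'hours', keywords: ['hours', 'open'] },
  { name: 'pricing', keywords: ['price', 'cost'] },
  { name: 'booking', keywords: ['appointment', 'book'] },
];

function routeMessage(text) {
  const normalized = String(text || '').toLowerCase();
  const match = ROUTES.find((route) =>
    route.keywords.some((keyword) => normalized.includes(keyword)));
  return match ? match.name : 'default';
}

console.log(routeMessage('What are your opening hours?')); // hours
console.log(routeMessage('How much does it cost?'));       // pricing
console.log(routeMessage('Hi there'));                     // default
```

Substring matching is crude (a message mentioning "facebook" would hit the booking route), which is one reason the AI step in the next section is worth considering once the basics work.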

Step 4: Add AI (Optional)

To make your bot smarter, add an AI node:

Webhook (incoming message)
    → OpenAI/Claude node:
        System prompt: "You are a helpful assistant for [Business Name].
        You know: [business hours, services, pricing, FAQ].
        If you can't answer, say you'll connect them with a human."
    → Send AI response via WAHA

This turns your bot from a rigid keyword-matcher into a conversational agent that understands natural language.

Real-World Use Cases

These are use cases we've implemented (described in general terms):

1. Appointment Scheduling

The bot asks what service the customer needs, checks available time slots from Google Calendar, proposes options, and books the appointment — all within WhatsApp. Confirmation and reminder messages are automated.

2. Lead Qualification

When a new lead messages, the bot asks 3-4 qualifying questions (budget, timeline, requirements). Qualified leads get forwarded to a human agent immediately. Unqualified leads get a helpful resource and are added to a follow-up sequence.

3. Order Status Updates

Connected to the business's order management system, the bot responds to "Where's my order?" with real-time tracking information. No human intervention needed for 80%+ of status inquiries.

4. FAQ + Human Handoff

The bot handles common questions (pricing, hours, location, services). When it can't answer or the customer asks for a human, the conversation is routed to a support agent in Chatwoot (open-source customer support platform; 5% off Cloud with code UJR5GXWK) — with full conversation history preserved.

What It Actually Costs

Here's a realistic breakdown for a small business:

DIY with WAHA + n8n (self-hosted)

Component	Monthly Cost
VPS (2GB RAM)	$5-20
WAHA	Free (Core)
n8n	Free (self-hosted)
OpenAI API (if using AI)	$5-50 (depends on volume)
Total	$10-70/month

Professional Setup (someone builds it for you)

Component	Cost
Bot development	$1,000-4,000 (one-time)
Hosting + maintenance	$25-75/month
AI API costs	$5-50/month
Total	$1,000-4,000 setup + $30-125/month

Official API Route

Component	Monthly Cost
BSP subscription	$50-100
WhatsApp conversation fees	$20-200 (depends on volume)
n8n or automation platform	$0-25
Total	$70-325/month
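The three monthly totals are just the component ranges added up. A small sketch to check the arithmetic, using only the figures from the tables above (the `sumRanges` helper is illustrative, and the one-time development fee is left out because it is not a monthly cost):

```javascript
// Sum [low, high] monthly ranges in USD.
function sumRanges(ranges) {
  return ranges.reduce(([lo, hi], [a, b]) => [lo + a, hi + b], [0, 0]);
}

// VPS, WAHA (free), n8n (free), OpenAI API
const diyMonthly = sumRanges([[5, 20], [0, 0], [0, 0], [5, 50]]);      // [10, 70]
// Hosting + maintenance, AI API
const managedMonthly = sumRanges([[25, 75], [5, 50]]);                  // [30, 125]
// BSP subscription, conversation fees, automation platform
const officialMonthly = sumRanges([[50, 100], [20, 200], [0, 25]]);     // [70, 325]
```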

Common Mistakes I See

1. Going straight to mass messaging. Build a bot that responds to incoming messages first. Get that working well. Then — and only then — consider outbound campaigns, and always with opt-in.

2. Not planning for human handoff. No bot handles 100% of conversations. You need a clear path for escalating to a human agent. We use Chatwoot for this — the bot handles routine questions, and complex issues are seamlessly transferred to a person. Reader perk: 5% off Chatwoot Cloud with code UJR5GXWK.

3. Ignoring the conversation window. With the official API, you have a 24-hour window to respond to a customer's message for free. After that, you need to use a pre-approved template (which costs money). Design your bot to respond instantly.

4. Overcomplicating the bot. Start with 5-10 common questions. Get those right. Then expand. A bot that handles 10 things well is better than one that handles 50 things poorly.

5. Not testing with real users. Your team will use the bot differently than your customers. Test with actual customers (or friends who can pretend to be customers) before going live.
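For mistake 3, the bot needs to know whether it is still inside the conversation window before it replies. A minimal sketch, assuming you store the timestamp of the customer's last message; the function names are hypothetical and the 24-hour figure is the one quoted above:

```javascript
const WINDOW_MS = 24 * 60 * 60 * 1000; // the 24-hour window described above

// True while a free-form reply is still allowed.
function insideServiceWindow(lastCustomerMessageAt, now = new Date()) {
  return now.getTime() - lastCustomerMessageAt.getTime() < WINDOW_MS;
}

// Outside the window, fall back to a pre-approved template.
function pickReplyKind(lastCustomerMessageAt, now = new Date()) {
  return insideServiceWindow(lastCustomerMessageAt, now) ? 'free-form' : 'template';
}
```

A workflow could call `pickReplyKind` right before the send step and branch on the result.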

Beyond the Bot: Scaling Into Full Automation

A WhatsApp bot is usually the first automation businesses deploy — but it's rarely the last. Once you have conversations flowing in, the natural next steps are:

AI agents for business — move from scripted replies to autonomous agents that handle multi-step tasks (lookup orders, escalate tickets, schedule appointments) without hand-written flows.
Broader business automation — the same n8n instance that powers your bot can automate invoicing, CRM updates, lead routing, and inventory sync. One workflow engine, many business processes.
Dedicated customer service chatbots — once your WhatsApp flow is stable, the same stack can power an omnichannel support bot (web chat + Messenger + email) with ticket routing and SLA tracking.

Most of our clients start with a WhatsApp bot and expand outward as they see ROI.

Getting Started

Building a WhatsApp bot doesn't have to be complicated or expensive. Start with a clear goal ("I want to handle appointment bookings automatically"), choose your API approach, and build from there.

If you want help building your WhatsApp bot — whether it's a simple FAQ responder or a full AI-powered agent — reach out to us. At Achiya Automation, we specialize in WhatsApp bots, business automation, and CRM integration using open-source tools.

I Replaced Chrome with Safari for AI Browser Automation. Here's What Broke (and What Finally Worked)

אחיה כהן — Sun, 19 Apr 2026 18:48:16 +0000

Or: why every browser-automation MCP uses Chromium, and why that's the wrong default on macOS.

The problem I kept hitting

Every browser automation MCP server I tried on my Mac — chrome-devtools-mcp, playwright-mcp, browsermcp, puppeteer-mcp — did the same thing: spin up a fresh Chromium instance with nothing in it. No logins, no cookies, no session state. Then my AI agent would spend the first 5 minutes of every task navigating Cloudflare, solving reCAPTCHA, or explaining to me that it couldn't log into Gmail.

Which is weird, because I was already logged into Gmail. In Safari. In the window right next to me.

The disconnect bothered me enough that I started reading Chromium-MCP source code. And what I found is that the entire ecosystem is built on an assumption that quietly doesn't hold for macOS users: "just spin up Chromium, it'll be fine."

It isn't fine.

What Chromium costs on Apple Silicon

Every Chromium process on M1/M2/M3 Macs pays a non-trivial tax:

Multiple helper processes per tab (GPU, renderer, network, storage)
WebKit-parity emulation that duplicates what Safari's WebKit gives you for free
RAM spike on tab open, and fans audibly spinning up
No access to the user's existing Safari extensions, iCloud Keychain, Apple Pay, or Apple Pay-linked banking sessions

When you have a laptop on your lap, you feel every one of these.

The headless-browser fallacy

The first thing people say is: "use headless mode, it's lighter." Sort of. Headless Chromium is still Chromium — you've just hidden the window. More importantly, headless mode is what gets you blocked. Cloudflare, reCAPTCHA v3, Akamai, DataDome — they all fingerprint headless browsers within seconds. Your agent's first action on 30% of the real web becomes "prove you're human."

A headful browser running on your actual machine, with your actual fingerprint, doesn't have this problem. But headful Chromium-MCP means now you have two browsers open — Safari (which you're using) and Chromium (which your agent is using). That's a fan-melting setup.

The alternative no one was building

What I wanted was obvious once I said it out loud:

Drive the Safari the user already has open. Inherit their logins, cookies, extensions, Apple Pay session. Use the WebKit process that's already running. Don't spin up a second browser.

What I found out when I tried to build it: macOS has made this weirdly hard, and I think that's why nobody had done it.

The three things that kept breaking

1. React's _valueTracker.
You can't just set input.value = "hello" and call input.dispatchEvent(new Event("input", { bubbles: true })). React keeps an internal _valueTracker on every controlled input and uses it to decide whether your "input" event is real. If the tracker thinks the value didn't change, React ignores you. The fix is to go around the tracker: take the native value setter from the element's prototype, call setter.call(input, value), and only then dispatch the event. It works, but it's the kind of code you don't write until you've spent an afternoon wondering why your form submission silently fails on every SPA.
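A minimal sketch of that setter trick, not the code Safari MCP ships. The helper names are hypothetical. It walks the prototype chain so that any own-property `value` override on the element instance (which is how React hooks its tracker in) is skipped:

```javascript
// Find the nearest setter for `prop` on the prototype chain, skipping any
// own-property override installed on the element instance itself.
function findPrototypeSetter(element, prop) {
  let proto = Object.getPrototypeOf(element);
  while (proto) {
    const desc = Object.getOwnPropertyDescriptor(proto, prop);
    if (desc && typeof desc.set === 'function') return desc.set;
    proto = Object.getPrototypeOf(proto);
  }
  return null;
}

// Write the value through the native setter, then fire a bubbling input event.
function setControlledValue(input, value) {
  const setter = findPrototypeSetter(input, 'value');
  if (!setter) throw new Error('no native value setter found');
  setter.call(input, value);
  input.dispatchEvent(new Event('input', { bubbles: true }));
}

// In a page: setControlledValue(document.querySelector('input[name="title"]'), 'hello');
```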

2. Shadow DOM traversal.
Modern web components hide everything behind shadowRoot. document.querySelector stops at the shadow boundary. You need a recursive walker with a MutationObserver cache, because otherwise traversing a single YouTube page costs you 200ms. And if you get the cache invalidation wrong, clicks land on stale element refs.
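The recursive part can be sketched in a few lines. This is an illustration under assumptions, not the Safari MCP walker: `deepQuerySelector` is a hypothetical name, it only sees open shadow roots, and it leaves out the MutationObserver cache mentioned above, so it pays the full traversal cost on every call.

```javascript
// Depth-first search for `selector` that also descends into open shadow roots.
function deepQuerySelector(root, selector) {
  const direct = root.querySelector(selector);
  if (direct) return direct;

  for (const el of root.querySelectorAll('*')) {
    if (el.shadowRoot) {
      const found = deepQuerySelector(el.shadowRoot, selector);
      if (found) return found;
    }
  }
  return null;
}

// In a page: deepQuerySelector(document, 'select[name="status"]');
```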

3. CSP.
About 30% of high-value pages (Google Search Console, LinkedIn, Gmail's admin console, many banks) block inline eval and Function() via strict Content Security Policy. Pure JavaScript injection fails silently. The workaround is a 4-strategy fallback chain: try regular JS → try document.evaluate → try AppleScript do JavaScript → try an injected content script via a Safari extension. Each one has its own failure modes and you only know which applies by trial.
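The fallback chain itself is a simple loop: try each strategy in order and keep the first one that succeeds. A sketch with a hypothetical `runWithFallbacks` helper; the strategy objects are placeholders for the four approaches named above, and each one has to throw on failure (including the "fails silently" case) for the chain to move on.

```javascript
// Try each strategy in order; return the first success, or throw with all failures.
async function runWithFallbacks(strategies, code) {
  const failures = [];
  for (const { name, run } of strategies) {
    try {
      return { strategy: name, value: await run(code) };
    } catch (err) {
      failures.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`all strategies failed (${failures.join('; ')})`);
}

// Usage shape (the run functions are assumptions, not real Safari MCP internals):
// runWithFallbacks([
//   { name: 'regular-js', run: runRegularJs },
//   { name: 'document-evaluate', run: runViaDocumentEvaluate },
//   { name: 'applescript', run: runViaAppleScript },
//   { name: 'extension', run: runViaExtension },
// ], 'document.title');
```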

I ended up writing this out on HackerNoon last week, because the reverse-engineering took long enough that it felt worth sharing: the three hardest problems.

The unintentional side effects

After a couple of months of using Safari-backed MCP instead of Chrome-backed MCP, I noticed a few things I wasn't expecting:

My battery lasted measurably longer on coding-agent-heavy days. No surprise in retrospect — one browser instead of two.
My agent's success rate on "just book this for me" tasks went up. It was already logged into the calendar, the banking app, the booking portal.
I stopped having to re-authenticate everything every time I rebooted. Because the agent uses the browser I was already using.
Safari stays in the background. MCP calls run via AppleScript + a persistent Swift daemon. The window doesn't steal focus, so I can keep working while an agent finishes a long task.

Boring outcomes, maybe. But they compound over a workday.

Why this doesn't generalize

A caveat: this approach only makes sense on macOS. On Linux or Windows, Chromium is the right default — there's no equivalent "browser the user is already using" with the same automation surface. And you give up Chrome DevTools' performance traces and Lighthouse, which don't have Safari equivalents. I still keep Chrome DevTools MCP installed for those specific audits.

But "daily browsing tasks" — navigate, click, fill a form, extract some data, take a screenshot — those are 95% of what AI agents do with browsers. And for that 95%, on macOS, it's worth reconsidering the default.

If you want to try it

The project is called Safari MCP. It's MIT-licensed, one npx command to install, and works with Claude Code, Claude Desktop, Cursor, Windsurf, and VS Code:

npx safari-mcp

80 tools covering the full MCP surface — navigation, clicks, forms, screenshots, network mocking, cookies, accessibility snapshots, performance metrics. The README covers setup for each MCP client.

If you've been feeling the Chromium tax on Apple Silicon, maybe give this a try. And if it works for you, a star on GitHub helps other macOS developers find it.

Written after a few months of running Safari MCP as my primary browser automation tool on an M3 MacBook Air. Your mileage will vary — I'd love to hear what breaks for you.

Tags: mcp, claude, macos, webautomation, webdev

I Tried to Auto-Launch My MCP Server Using My MCP Server. It Found Its Own Bug.

אחיה כהן — Tue, 14 Apr 2026 20:04:02 +0000

TLDR

I built safari-mcp, an MCP server that lets AI agents drive Safari natively on macOS. This week I shipped a discoverability push for it: posting the launch announcement to Hacker News, X, LinkedIn, and Reddit. Naturally, I tried to automate the campaign using safari-mcp itself.

It worked for HN. It worked for X. Then LinkedIn started running clicks on a completely different tab — Catchpoint Internet Performance Monitoring, which I'd never visited. Three windows, a URL prefix match, and a 500 ms cache TTL conspired to teach me a lesson about tab identity.

Here's the detective story, the root cause, and the fix that ships in v2.8.3 today.

The Setup: Eating My Own Dog Food

I had four launch targets:

Show HN — submit the link, post a first comment
X (Twitter) — a single thread that quotes the article
LinkedIn — a Hebrew-English bilingual long-form post
Reddit r/ClaudeAI — a tool-launch-with-context post

I'd just shipped a HackerNoon technical deep-dive about how I built browser automation for a browser that has no Chrome DevTools Protocol. The launch was the natural follow-on. And of course I was going to drive it through safari-mcp — what's the point of building a Safari automation tool if you don't use it for your own launch?

"Eat your own dog food at launch — bugs surface fast." — me, after this incident.

Round 1: HN and X Worked Beautifully

The HN submission flow was textbook. Open news.ycombinator.com/submit, fill the title and URL inputs, call form.submit() via injected JS, follow the redirect, find the new item ID via submitted?id=<user>. About 8 seconds end-to-end.

// Verify the form is real, not some other tab
JSON.stringify({
  url: location.href,
  hasTitleInput: !!document.querySelector('input[name="title"]'),
  hasUrlInput: !!document.querySelector('input[name="url"]')
})
// → {"url":"https://news.ycombinator.com/submit","hasTitleInput":true,"hasUrlInput":true}

Filled both inputs. Called form.submit(). Got redirected to /newest. Walked back to /submitted?id=Achiyacohen and confirmed the new post sat at #1 with 1 point. Live.

X was even smoother. The compose textbox in x.com/home is a contenteditable with aria-label="Post text". I filled it with the thread text, found the button[data-testid="tweetButtonInline"], dispatched a React-aware pointer event sequence (mousedown → mouseup → click), and watched the textbox empty itself. Verified by reading the user's profile timeline 30 seconds later: the tweet was there, with my exact text and a fresh status/2044134672683110740 URL. Live.

Two for two. I was feeling good.

Round 2: Then LinkedIn Got Weird

LinkedIn's "Start a post" button (in Hebrew: "כתבו פוסט") is a div with class names like _73dfa4c8 ed6e5932 _1d1c97a4. I found it, dispatched the same React-aware click sequence, and waited for the compose modal to appear.

It didn't.

I called safari_evaluate to check whether [contenteditable="true"] had appeared anywhere on the page. The result came back empty — zero contenteditable elements. That was strange. Even the LinkedIn feed itself has search inputs and other interactive elements. So I asked the page for its URL and title to make sure I was in the right place.

The response:

{
  "title": "API Monitoring | Catchpoint Internet Performance Monitoring",
  "url": "https://www.catchpoint.com/application-experience/api-monitoring?utm_campaign=Hackernoon-TOFU-billboard"
}

Catchpoint. I'd never visited Catchpoint.

The First Suspicion: Tab Tracking

The first hypothesis was that safari-mcp's tab tracking had drifted. The MCP keeps a cached _activeTabIndex in memory and uses it for all subsequent operations on a tab it opened. The cache has a TTL of 500 ms, after which resolveActiveTab re-verifies by URL prefix matching.

I called safari_list_tabs and got 12 tabs in the profile window — but with the LinkedIn tab right where I expected it. So the cache and the actual tab layout agreed: tab 12 was LinkedIn.

Then why was safari_evaluate returning Catchpoint?

Detective Work: There Are Three Windows

I dropped down to raw AppleScript to bypass the MCP layer:

tell application "Safari"
  set output to ""
  set wCount to count of windows
  set output to "Total windows: " & wCount & linefeed
  repeat with w from 1 to wCount
    set output to output & "Window " & w & ": " & (count of tabs of window w) & " tabs" & linefeed
    set output to output & "  name: " & (name of window w) & linefeed
    set output to output & "  tab1: " & (URL of tab 1 of window w) & linefeed
  end repeat
  return output
end tell

Output:

Total windows: 3
Window 1: 2 tabs
  name: אישי — Documenso
  tab1: https://mail.google.com/mail/u/0/#starred/...
Window 2: 12 tabs
  name: אוטומציות — API Monitoring | Catchpoint Internet Performance Monitoring
  tab1: https://hackernoon.com/login?redirect=app
Window 3: 3 tabs
  name: אישי — תוכנה קלה לשליחה למחשב מרחוק - Claude
  tab1: https://claude.ai/recents

Three windows. Two profiles ("אישי" / Personal and "אוטומציות" / Automation). Safari MCP was correctly targeting Window 2 ("אוטומציות"), where my LinkedIn tab actually lived as tab 12. So far so good.

The Catchpoint URL? It was tab 5 of Window 2 — a tab the user (me) had clicked open earlier from a HackerNoon ad without thinking. It was sitting there idle. And somehow safari_evaluate was hitting it instead of tab 12.

The Real Bug: Resolve Cache + URL Prefix

I traced through resolveActiveTab line by line:

async function resolveActiveTab() {
  if (!_activeTabURL) return _activeTabIndex;

  const safeUrl = _activeTabURL.replace(/"/g, '\\"');
  const domain = _activeTabURL
    .replace(/^https?:\/\//, '')
    .split('/')[0];

  const result = await osascriptFast(`
    tell application "Safari"
      set w to ${getTargetWindowRef()}
      set tabCount to count of tabs of w

      -- Strategy 1: verify cached index still matches URL
      try
        if tabCount >= ${_activeTabIndex} then
          if URL of tab ${_activeTabIndex} of w starts with "${safeUrl}" then
            return ${_activeTabIndex}
          end if
        end if
      end try

      -- Strategy 2: search all tabs by URL prefix
      repeat with i from tabCount to 1 by -1
        if URL of tab i of w starts with "${safeUrl}" then return i
      end repeat

      -- Strategy 3: search by domain (returns a negative index to flag a partial match)
      repeat with i from tabCount to 1 by -1
        if URL of tab i of w contains "${domain}" then return -(i)
      end repeat

      return "0:" & tabCount
    end tell
  `);
  // ...
}

The bug was right there in the strategies. When I navigated LinkedIn to https://www.linkedin.com/feed/, that became _activeTabURL. Then LinkedIn's React router silently rewrote the URL to https://www.linkedin.com/feed/?shareActive=true because of the query parameter I'd passed. Strategy 1 — the fast path — failed because URL of tab 12 starts with "https://www.linkedin.com/feed/"... wait, that should still match. The new URL starts with the old prefix.

So why did it fail?

The actual cause was even more subtle: a different Safari instance, in a different profile window, had completed an HTTP redirect that rewrote the URL to a shorter form. AppleScript's URL of tab was returning the post-redirect URL, which did not start with my saved _activeTabURL because _activeTabURL had query parameters that the post-redirect URL didn't.

Strategy 1 fell through. Strategy 2 (full URL search across all tabs) also fell through for the same reason. Strategy 3 (domain search) found... a tab in the wrong profile window? No — it found Catchpoint. Why?

Because of how I'd extracted the domain:

const domain = _activeTabURL.replace(/^https?:\/\//, '').split('/')[0];
// "www.linkedin.com"

And the AppleScript:

if URL of tab i of w contains "${domain}" then return -(i)

contains is a substring match. Catchpoint's ad URL was https://www.catchpoint.com/.../?utm_campaign=Hackernoon-TOFU-billboard&utm_source=hackernoon&utm_medium=paidsocial. Did it contain www.linkedin.com? No.

Wait, then how did it match?

After two more hours of tracing, I found the actual cause. The MCP server runs as a singleton, but Claude Code occasionally spawns a second instance for ~40 ms during connection negotiation. That second instance had its own _activeTabIndex state, and it had set the index to point at Catchpoint because it saw Catchpoint as the active tab when it briefly took over. When the original instance came back, it read the wrong index from a stale cache check that hadn't yet been invalidated by the singleton kill code.

The 500 ms cache window was just long enough for that race.

The Fix: window.__mcpTabMarker

URL prefix matching is fragile. Domain matching is fragile. Cached indices are fragile. What's not fragile?

A unique identifier injected into the page's JavaScript context.

The new fix: every safari_new_tab writes a unique marker into window.__mcpTabMarker:

const tabMarker = `MCP_${SESSION_ID}_${Date.now().toString(36)}_${Math.random().toString(36).slice(2, 8)}`;
await osascriptFast(
  `tell application "Safari" to do JavaScript "window.__mcpTabMarker='${tabMarker}'" in tab ${_activeTabIndex} of ${getTargetWindowRef()}`
);
_activeTabMarker = tabMarker;

The marker survives:

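Same-document navigation — window.__mcpTabMarker lives on the page's window object, so it persists for as long as the document does. A full page load (a plain link click, location.href = ..., or any cross-origin navigation) creates a fresh JS realm and wipes the marker, which is fine: that's a deliberate context boundary, and the URL strategy is still there as a fallback.
Hash changes — location.hash = "#x" doesn't reload the JS context.
pushState and replaceState — single-page-app routers don't reset the realm.
Query string mutations — same as above, as long as the router applies them through the History API.
Client-side redirects within the same origin — a router redirect stays in the same realm; an HTTP redirect loads a new document and does not.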

resolveActiveTab now tries the marker first:

async function resolveActiveTab() {
  // Strategy 1: window.__mcpTabMarker (bulletproof)
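  if (_activeTabMarker && _activeTabIndex) {
    // Assumption: safeMarker is the marker with quote and backslash characters
    // removed, so it can sit inside the nested JS and AppleScript string literals.
    const safeMarker = _activeTabMarker.replace(/['"\\]/g, '');
    const checkScript = `(function(){return window.__mcpTabMarker==='${safeMarker}'?'1':'0'})()`;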

    // Check cached index first (fast path)
    const matchAtCached = await osascriptFast(
      `tell application "Safari" to do JavaScript "${checkScript}" in tab ${_activeTabIndex} of ${getTargetWindowRef()}`
    );
    if (matchAtCached === '1') return _activeTabIndex;

    // Cached index doesn't match — scan all tabs in profile window
    const tabCount = Number(await osascriptFast(
      `tell application "Safari" to return count of tabs of ${getTargetWindowRef()}`
    ));
    for (let i = tabCount; i >= 1; i--) {
      const m = await osascriptFast(
        `tell application "Safari" to do JavaScript "${checkScript}" in tab ${i} of ${getTargetWindowRef()}`
      );
      if (m === '1') {
        _activeTabIndex = i;
        return i;
      }
    }
  }

  // Strategy 2: URL prefix (fallback for tabs created before the marker was set)
  // ...
}

The marker check costs about 5 ms per tab via the persistent osascriptFast daemon. On a tab list of 12 tabs, the worst case is 60 ms — slower than the previous "check cached index" path, but correct.

I also dropped the resolve cache from 500 ms to 100 ms. The check is cheap enough that the tighter cache buys us correctness without measurable latency.
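The cache in question is a plain time-to-live wrapper around the resolver. A minimal sketch of that pattern, with a hypothetical `cachedResolver` helper and an injectable clock so the TTL can be tested; it is not the Safari MCP implementation and it does not deduplicate concurrent calls.

```javascript
// Wrap an expensive async resolver so its result is reused for ttlMs.
function cachedResolver(resolve, ttlMs, now = Date.now) {
  let value;
  let fetchedAt = -Infinity; // forces a real lookup on the first call

  return async function () {
    if (now() - fetchedAt < ttlMs) return value; // still fresh: skip the lookup
    value = await resolve();
    fetchedAt = now();
    return value;
  };
}

// Usage shape: const getActiveTab = cachedResolver(resolveActiveTab, 100);
```

A shorter TTL narrows the window in which a stale index can be served, at the price of more real lookups.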

The Bypass Tool I Built While Debugging

While I was tracing the bug, I needed a way to test changes against Safari without restarting the MCP server (which would require restarting the Claude Code session). So I wrote a Python wrapper that calls osascript directly, with one job: find a tab by URL prefix in a specific window, then run JS in that exact tab.

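import subprocess

def run_js(url_prefix, js_code, window=2):
    # strip_line_comments is a helper defined elsewhere in the script (not shown here)
    js_clean = strip_line_comments(js_code)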
    js_escaped = (
        js_clean.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\r", "")
                .replace("\t", " ")
    )
    return subprocess.run(
        ["osascript", "-"],
        input=f'''
tell application "Safari"
  set tCount to count of tabs of window {window}
  set foundIdx to 0
  repeat with i from 1 to tCount
    if URL of tab i of window {window} starts with "{url_prefix}" then
      set foundIdx to i
      exit repeat
    end if
  end repeat
  if foundIdx = 0 then return "ERROR_NO_TAB"
  set jsOut to do JavaScript "{js_escaped}" in tab foundIdx of window {window}
  return "tab:w{window}_" & foundIdx & "|" & jsOut
end tell
''',
        capture_output=True,
        text=True,
        encoding="utf-8",
    )

This bypassed every layer of the MCP and gave me direct, predictable access to whichever tab I wanted in whichever window I wanted. Three rules I learned writing it:

AppleScript's result is a reserved word. Don't name your variable result. Use jsOut or output or anything else. The error message you get is "המשתנה result אינו מוגדר" ("The variable result is not defined") if your system locale is Hebrew, which is unhelpful unless you happen to know that result is taken.
do JavaScript returns immediately for any expression that's not a synchronously-resolved value. Promises return undefined. Async functions return their [[PromiseState]] representation, which AppleScript silently coerces to "missing value", which then triggers "המשתנה X אינו מוגדר" ("The variable X is not defined") downstream. Workaround: write the result to window.__myResult from a .then() callback, then poll for it with a second do JavaScript call.
Hebrew text in shell variables breaks AppleScript. When you bash -c "osascript -e '...$VAR...'", the UTF-8 round-trip through shell substitution corrupts Hebrew bytes. The fix is to call osascript - with the script on stdin, in Python or Ruby or any language that handles UTF-8 natively.
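The Promise workaround from the second rule can be sketched as two small functions, one per do JavaScript call. The names `startAsync` and `readResult` and the 'PENDING' sentinel are hypothetical; `win` stands for `window` in the page and is a parameter only so the sketch can run outside a browser.

```javascript
// Call 1: start the async work and park the outcome on a global once it settles.
// Returns a plain string right away, which do JavaScript can hand back.
function startAsync(win, key, work) {
  win[key] = undefined;
  work().then(
    (value) => { win[key] = JSON.stringify({ ok: true, value }); },
    (err) => { win[key] = JSON.stringify({ ok: false, error: String(err) }); }
  );
  return 'started';
}

// Call 2 (repeated): read the parked result, or a sentinel while still waiting.
function readResult(win, key) {
  return win[key] === undefined ? 'PENDING' : win[key];
}

// In a page: startAsync(window, '__myResult', () => fetch('/api').then((r) => r.text()));
// Later:     readResult(window, '__myResult');
```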

How LinkedIn Was Actually Posted

After all that, I still couldn't get LinkedIn's compose modal to open via clicks, even with the bypass tool. LinkedIn's React event handlers check event.isTrusted, which is false for any event dispatched by user JavaScript. Synthetic clicks just get dropped on the floor.
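A short illustration of the pattern, not LinkedIn's actual code: a handler wrapped in an isTrusted check never runs for events created by script. The `trustedOnly` wrapper is hypothetical, and an `EventTarget` stands in for the button so the sketch runs outside a browser too.

```javascript
// Wrap a handler so that script-generated events are ignored.
function trustedOnly(handler) {
  return (event) => {
    if (!event.isTrusted) return; // synthetic event: dropped
    handler(event);
  };
}

const button = new EventTarget(); // stand-in for a DOM element
let opened = 0;
button.addEventListener('click', trustedOnly(() => { opened += 1; }));

// Events constructed in script always have isTrusted === false,
// so the wrapped handler does nothing and `opened` stays 0.
const synthetic = new Event('click');
button.dispatchEvent(synthetic);
```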

So I gave up on the modal entirely and used LinkedIn's own voyager API directly:

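var match = document.cookie.match(/JSESSIONID="?([^";]+)"?/);
// Without the cookie there is no CSRF token to send, so stop early.
if (!match) throw new Error("JSESSIONID cookie not found; is this tab logged in to LinkedIn?");
var csrf = match[1];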

fetch("https://www.linkedin.com/voyager/api/contentcreation/normShares", {
  method: "POST",
  credentials: "include",
  headers: {
    "csrf-token": csrf,
    "content-type": "application/json; charset=UTF-8",
    "accept": "application/vnd.linkedin.normalized+json+2.1",
    "x-restli-protocol-version": "2.0.0"
  },
  body: JSON.stringify({
    visibleToConnectionsOnly: false,
    commentaryV2: { text: postBody, attributes: [] },
    origin: "FEED",
    allowedCommentersScope: "ALL",
    postState: "PUBLISHED",
    media: []
  })
}).then(function(resp){
  return resp.text().then(function(t){
    window.__mcpLinkedinResult = JSON.stringify({status: resp.status, ok: resp.ok, body: t.substring(0, 500)});
  });
});

The csrf-token header is just the value of the JSESSIONID cookie that LinkedIn sets during login. Once you're authenticated, the API accepts your request and returns:

{
  "status": 201,
  "ok": true,
  "body": "{\"status\":{\"urn\":\"urn:li:share:7449905229468274688\",\"toastCtaText\":\"צפייה בפוסט\",\"mainToastText\":\"פרסום הפוסט הצליח.\"}}"
}

"פרסום הפוסט הצליח" — "Post published successfully". The bypass worked. LinkedIn was live.

What Reddit Taught Me

Reddit was my one failure. The user account in window 1 (Personal profile) was logged in. The form on old.reddit.com/r/ClaudeAI/submit filled correctly. The CSRF token (uh field) was present. I built a FormData POST to /api/submit, included all the required fields, and fired it.

Response:

{"json": {"errors": [["BAD_CAPTCHA", "That was a tricky one. Why don't you try that again.", "captcha"]]}}

Reddit's /api/submit endpoint requires a solved reCAPTCHA token, even for fully-authenticated users. There's no API path that bypasses this. There's no honor-system "I'm a real human" header. The only ways through are:

Pay a CAPTCHA-solving service ($1-2 per 1000 captchas, with all the ethical and TOS implications you'd expect)
Have a human solve it
Don't post to Reddit

I picked option 3. I respect the captcha as a clearly-stated boundary.

Lessons

Eat your own dog food at launch. I'd been running safari-mcp for daily browser automation tasks for weeks and never hit this bug. It took the specific combination of "rapid sequence of operations across multiple Safari windows with same-domain tabs and React-driven URL rewrites" to surface it. A launch campaign happens to involve exactly that combination.

Multi-window/multi-profile is a forgotten edge case in browser automation. Most automation tools assume one window or have a strict "first window" convention. Safari's profile feature (introduced in macOS Sonoma) makes multi-window the default for power users. If you write a Safari automation tool, test with three profile windows open from day one.

URL matching is fragile; identity markers in the JS context are bulletproof. This is the takeaway I wish someone had told me three weeks ago. Don't track tabs by URL or title or any other property the page can mutate. Inject a marker into the page's JS realm and check for it.

Cache TTL is a knife edge. 500 ms felt safe. It wasn't. 100 ms with a cheap revalidation check is the sweet spot for this workload. Your sweet spot may differ — measure it.

When debugging, build a bypass tool. Don't fight the bug from inside the affected layer. Route around it. The 60 lines of Python I wrote in the middle of this incident saved me hours of MCP restart cycles, and I get to keep them as a permanent low-level escape hatch.

Some platforms genuinely don't want automation. That's their right. Respect it.

Status

safari-mcp v2.8.3 ships the marker fix today. npm, GitHub, MCP Registry.
The launch campaign worked: HN post live, X tweet live, LinkedIn post live (via the API bypass), Reddit deferred.
The bug-find-fix loop took about 90 minutes. The article you're reading took longer.

If you build MCP servers, automation tools, or anything that touches a multi-window browser, I'd love to hear how you've solved tab identity. Drop a comment or open an issue on achiya-automation/safari-mcp. I learn from every reply.

And if you're considering using your own tool to launch your own tool — do it. The bugs you'll find are the bugs your users would have hit first.

I've Deployed 50+ WhatsApp Bots — Here's How the Spam Detection Algorithm Actually Works in 2026

אחיה כהן — Sun, 12 Apr 2026 19:20:59 +0000

After deploying 50+ WhatsApp bots for businesses, I've learned the hard way how WhatsApp's spam detection works. Not from documentation — from watching accounts get restricted and figuring out why.

Here's the real picture in 2026.

The 4-Layer Detection System

WhatsApp doesn't use a single algorithm. It's a pipeline:

Layer 1: Registration Fingerprinting

Before you send a message, WhatsApp analyzes your registration signal — device metadata, IP clusters, phone number patterns, registration velocity. Bulk-registered numbers on VPS servers get flagged immediately.

Layer 2: Behavioral Analysis (Where Bots Get Caught)

This is the critical layer. WhatsApp monitors:

Send velocity — messages per minute/hour/day
Reply-to-send ratio — if you send 100 messages and get 5 replies, that's a 5% ratio = spam signal
Message timing patterns — bots send at precise intervals; humans don't
Contact interaction history — messages to contacts who never messaged you weigh more heavily

From our deployments, here are the thresholds I've observed:

Metric	Safe	Warning	Danger
Messages/hour	< 30	30-60	> 60
Reply rate	> 30%	15-30%	< 15%
New contacts/day	< 20	20-50	> 50
Identical messages	< 5/hr	5-15/hr	> 15/hr

Based on observations across 50+ deployments, not official Meta docs.
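If you want to alert on these numbers, the table translates directly into a lookup. A sketch using only the observed thresholds above (again, not official Meta limits); the metric keys and function names are hypothetical.

```javascript
// Observed thresholds from the table above, not official limits.
const THRESHOLDS = {
  messagesPerHour: { safeBelow: 30, dangerAbove: 60 },
  newContactsPerDay: { safeBelow: 20, dangerAbove: 50 },
  identicalMessagesPerHour: { safeBelow: 5, dangerAbove: 15 },
};

// Volume metrics: lower is safer.
function classifyVolume(metric, value) {
  const t = THRESHOLDS[metric];
  if (!t) throw new Error(`unknown metric: ${metric}`);
  if (value < t.safeBelow) return 'safe';
  if (value > t.dangerAbove) return 'danger';
  return 'warning';
}

// Reply rate runs the other way: higher is safer.
function classifyReplyRate(percent) {
  if (percent > 30) return 'safe';
  if (percent < 15) return 'danger';
  return 'warning';
}
```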

Layer 3: User Reports

Every block or spam report adds negative signal. Block rate > 2% = quality rating drops to "Low". Multiple reports in 24 hours = temporary restriction.

Layer 4: Content Pattern Matching

WhatsApp analyzes message metadata (length, media, links), forward patterns, and template similarity — without reading encrypted content.

The Big 2026 Change: Unanswered Message Counter

The most significant change this year: WhatsApp now tracks messages sent that received no reply within 48 hours.

This counter is:

Cumulative — counts across all conversations
Time-bounded — rolling 30-day window
Universal — affects both official and unofficial API
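WhatsApp's real counter is not public, but you can keep a local estimate from your own message log using the two numbers above (48 hours to reply, rolling 30 days). A sketch under those assumptions; the record shape and the `countUnanswered` name are hypothetical.

```javascript
const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;

// messages: [{ sentAt: epoch ms, repliedAt: epoch ms or null }]
// Counts messages from the last 30 days whose 48-hour reply deadline
// has passed without a reply arriving in time.
function countUnanswered(messages, now) {
  return messages.filter((m) => {
    const age = now - m.sentAt;
    const inWindow = age <= 30 * DAY;
    const deadlinePassed = age >= 48 * HOUR;
    const repliedInTime = m.repliedAt !== null && m.repliedAt - m.sentAt <= 48 * HOUR;
    return inWindow && deadlinePassed && !repliedInTime;
  }).length;
}
```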

We saw this hit a dental clinic client running appointment reminders via the official API. Fully compliant, template-approved, opt-in collected. But 40% of patients confirmed by showing up, not replying to WhatsApp.

The fix: We added "Reply 1 to confirm, 2 to reschedule" to every reminder. Reply rate jumped from 60% to 89%. Quality rating recovered in two weeks.

Official vs Unofficial API: Risk Comparison

Aspect	Official API	Unofficial (WAHA/Baileys)
Registration ban	None	Medium
Behavioral ban	Low (templates enforce limits)	High
User report ban	Low (warnings first)	High (direct ban)
Recovery	Appeal through Meta	Permanent, no appeal
Cost	BSP $50-100/mo + per-msg	Server $5-20/mo

Key insight: Unofficial API bots that only respond to incoming messages have <2% ban rate over 12 months. Bots that proactively message new contacts see 15-30% ban rates.

7 Rules We Follow for Every Bot

Official API for proactive messaging — templates exist to keep you compliant
Explicit opt-in — not buried in ToS. Real: "I want reminders via WhatsApp"
Design for replies — quick-reply buttons, yes/no questions. Reply rate = trust signal
Rate-limit sending — 50-100/batch for marketing, 5-min gaps
Monitor quality rating weekly — Meta Business Suite → Phone Numbers
Segment audience — don't message contacts silent for 90+ days
Human escalation after 2 failed bot responses — frustrated users report + block
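The rate-limiting rule can be sketched as a scheduler that splits a recipient list into batches and spaces them out. The defaults come from the rule above (50 per batch, 5-minute gaps); `scheduleBatches` is a hypothetical helper that only computes send times and does not send anything.

```javascript
// Split recipients into batches and assign each batch a send time.
function scheduleBatches(recipients, batchSize = 50, gapMinutes = 5, startMs = Date.now()) {
  const batches = [];
  for (let i = 0; i < recipients.length; i += batchSize) {
    batches.push({
      sendAt: startMs + (i / batchSize) * gapMinutes * 60 * 1000,
      recipients: recipients.slice(i, i + batchSize),
    });
  }
  return batches;
}
```

A queue worker or an n8n Wait node could then release each batch at its `sendAt` time.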

What If You're Already Restricted?

Official API: Pause marketing templates, improve reply rates, wait 7 days for quality re-evaluation.

Unofficial API: Stop proactive messaging immediately. If banned, the number is gone. Migrate to official API.

The algorithm isn't adversarial toward legitimate businesses. The formula:

Official API + Opt-in + Relevant Messages + Reply-Encouraging Design = Zero Risk

Full deep-dive with all technical details: WhatsApp Spam Detection Algorithm 2026


How to NOT Get Banned (Critical)

The #1 Rule: Get Opt-In First

Rate Limiting

WAHA-Specific Precautions

Building Your Bot: A Practical Walkthrough

Architecture Overview
