<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Leon</title>
    <description>The latest articles on DEV Community by Leon (@leonting1010).</description>
    <link>https://dev.to/leonting1010</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3851369%2F7a1d458f-c463-4b40-8bb1-69e5711e435d.png</url>
      <title>DEV Community: Leon</title>
      <link>https://dev.to/leonting1010</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/leonting1010"/>
    <language>en</language>
    <item>
      <title>Playwright MCP burns 114k tokens for one workflow. Here's why, and what to do about it.</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Wed, 22 Apr 2026 04:50:04 +0000</pubDate>
      <link>https://dev.to/leonting1010/playwright-mcp-burns-114k-tokens-for-one-workflow-heres-why-and-what-to-do-about-it-57k8</link>
      <guid>https://dev.to/leonting1010/playwright-mcp-burns-114k-tokens-for-one-workflow-heres-why-and-what-to-do-about-it-57k8</guid>
      <description>&lt;p&gt;A recent r/ClaudeAI post measured a single Playwright MCP workflow at &lt;strong&gt;114,000 tokens&lt;/strong&gt;. Not a complex task — a 7-step navigation + form submission that ran in under a minute. Same workflow as a compiled &lt;code&gt;tap.run&lt;/code&gt;: zero tokens.&lt;/p&gt;

&lt;p&gt;This isn't "Playwright MCP is bad." It's a structural property of running an LLM at runtime versus compile time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the tokens go
&lt;/h2&gt;

&lt;p&gt;Each Playwright MCP call sends back to the model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The current page's accessibility tree (~5-15K tokens for a typical SPA)&lt;/li&gt;
&lt;li&gt;A screenshot encoded as base64 (~2-8K tokens depending on quality)&lt;/li&gt;
&lt;li&gt;The console output since last call&lt;/li&gt;
&lt;li&gt;The action result + any error context&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The model needs all of that to decide the next action. So a 7-step workflow costs roughly 7 × ~11K ≈ 78K tokens of page state. Add the schema injection at session start (~1.3K per tool, ~28 tools loaded eagerly ≈ 36K) and you're at the ~114K observed.&lt;/p&gt;
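
&lt;p&gt;A back-of-envelope model of that arithmetic (every figure is an estimate from this post, not a measurement):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Rough cost model for an LLM-at-runtime browser workflow.
// Every figure here is an estimate from the text, not a measurement.
const steps = 7;
const perStepTokens = 11_000;  // accessibility tree + screenshot + console + result
const toolCount = 28;
const perToolSchema = 1_300;   // schema injected at session start

const runtimeTokens = steps * perStepTokens + toolCount * perToolSchema;
console.log(runtimeTokens);    // 113,400 tokens, the ballpark of the observed 114K
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The compiled alternative pays a bill of the same order once, at forge time; every run after that adds zero.&lt;/p&gt;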

&lt;p&gt;The optimisations help — DOM compression, accessibility-only modes, smaller screenshots — but the per-step cost is still proportional to page complexity. Add interaction depth and the cost goes up linearly.&lt;/p&gt;

&lt;h2&gt;
  
  
  The compiler alternative
&lt;/h2&gt;

&lt;p&gt;The insight behind &lt;code&gt;tap forge&lt;/code&gt;: most browser automation is a known workflow. You're not exploring; you're executing the same task on the same site, repeatedly. The LLM is needed to &lt;strong&gt;figure out how&lt;/strong&gt; the first time. After that, it's overhead.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First time — LLM authors the program&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap forge https://example.com/login → submit
✓ Inspected: form#login, 3 fields
✓ Verified: redirect to /dashboard, status 200
✓ Saved: example/login.tap.js   &lt;span class="o"&gt;(&lt;/span&gt;47 lines of JavaScript&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Forever after — no LLM, no tokens&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap example login &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;alice &lt;span class="nv"&gt;pass&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xxx   &lt;span class="c"&gt;# 200ms, $0.00&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap example login &lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;alice &lt;span class="nv"&gt;pass&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;xxx   &lt;span class="c"&gt;# 200ms, $0.00&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For the workflow that cost 114K tokens with Playwright MCP, the equivalent &lt;code&gt;.tap.js&lt;/code&gt; file is ~80 lines. It runs in 200ms. Token cost: &lt;strong&gt;0&lt;/strong&gt; (after the one-time forge).&lt;/p&gt;

&lt;h2&gt;
  
  
  When each makes sense
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Playwright MCP wins when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The workflow is unique each time (agent exploration)&lt;/li&gt;
&lt;li&gt;The site changes structure between runs (no stable program possible)&lt;/li&gt;
&lt;li&gt;You're prototyping and don't yet know what you want to extract&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Compiled taps win when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You run the same workflow more than ~5 times&lt;/li&gt;
&lt;li&gt;The site's structural pattern is stable (~95% of sites — most A/B tests don't change DOM, just CSS)&lt;/li&gt;
&lt;li&gt;You need monitoring (deterministic output = row count is a signal)&lt;/li&gt;
&lt;li&gt;You need offline execution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The break-even is low. Even at $5/MTok, one Playwright MCP run of 100K tokens = $0.50. Eighteen runs and you've spent the entire $9/mo Hacker tier of a compiler-based tool.&lt;/p&gt;
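
&lt;p&gt;The same arithmetic, as a snippet you can re-run with your own provider's rates:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Break-even between per-run LLM cost and a flat monthly tier.
const tokensPerRun = 100_000;
const pricePerMTok = 5;        // dollars per million tokens
const costPerRun = (tokensPerRun / 1_000_000) * pricePerMTok;  // $0.50
const monthlyTier = 9;         // flat monthly price of the compiled tool
console.log(Math.ceil(monthlyTier / costPerRun));  // 18 runs to match the tier
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;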

&lt;h2&gt;
  
  
  Two structural differences worth understanding
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Output consistency.&lt;/strong&gt; When the same site/same prompt produces slightly different extractions across runs (the LLM is non-deterministic), monitoring is structurally hard. Row count fluctuation carries no information — it's just the model. With a compiled tap, row count fluctuation IS information, and you can alert on it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Failure detection.&lt;/strong&gt; Playwright MCP detects failure reactively — the tool call returns an error, the LLM sees it, retries with a different approach. By the time you notice, tokens are spent and time is lost. Compiled taps detect failure proactively via fingerprint diffing — &lt;code&gt;tap doctor&lt;/code&gt; checks if the page structure changed BEFORE the run fires. If drifted, the run doesn't even start.&lt;/p&gt;
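
&lt;p&gt;The fingerprint format isn't shown in this post, but the idea is easy to sketch: reduce the page to a structural skeleton, hash it, and compare against the hash saved at forge time. Everything below (the helper name, the selector handling) is illustrative, not Tap's actual code:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Sketch: proactive drift detection via a structural fingerprint.
const crypto = require('crypto');

function fingerprint(document, selectors) {
  // The skeleton records which selectors still match and what tag
  // each one resolves to; content changes alone don't affect it.
  const skeleton = selectors.map(sel =&gt; {
    const el = document.querySelector(sel);
    return sel + ':' + (el ? el.tagName : 'MISSING');
  }).join('|');
  return crypto.createHash('sha256').update(skeleton).digest('hex');
}

// Before a run: if the fresh fingerprint differs from the one saved
// at forge time, abort with a diff instead of burning retry tokens.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;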

&lt;h2&gt;
  
  
  The benchmark question
&lt;/h2&gt;

&lt;p&gt;Honest comparison: Playwright MCP has been the most flexible browser-agent setup for the past year. That 114K-token bill is the price of the flexibility. If your workflow is varied enough that you need it, pay it. If your workflow is the same automation run 1,000 times, paying it is leaving money on the table.&lt;/p&gt;

&lt;p&gt;The broader pattern: every browser-agent tool faces this LLM-at-runtime vs LLM-at-compile-time tradeoff. The question isn't "which tool is better" — it's "does my workload repeat enough to amortize a one-time compile?"&lt;/p&gt;

&lt;p&gt;For most production scrapers, the answer is yes.&lt;/p&gt;

</description>
      <category>playwright</category>
      <category>mcp</category>
      <category>ai</category>
      <category>automation</category>
    </item>
    <item>
      <title>rtrvr.ai vs Taprun: cheaper LLM-at-runtime still isn't zero tokens</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Wed, 22 Apr 2026 04:40:03 +0000</pubDate>
      <link>https://dev.to/leonting1010/rtrvrai-vs-taprun-cheaper-llm-at-runtime-still-isnt-zero-tokens-1ljf</link>
      <guid>https://dev.to/leonting1010/rtrvrai-vs-taprun-cheaper-llm-at-runtime-still-isnt-zero-tokens-1ljf</guid>
      <description>&lt;p&gt;&lt;a href="https://www.rtrvr.ai/" rel="noopener noreferrer"&gt;rtrvr.ai&lt;/a&gt; is a polished entrant in the browser-agent space. Their architecture is genuinely interesting — "DOM-native" processing with "Smart DOM Compression", 25× cheaper than vision-based alternatives, 81% SOTA accuracy on their reported benchmark. They ship Chrome extension, Cloud dashboard, API, MCP server, CLI, and even a WhatsApp bot. Their landing page lists 10 named competitors. The pricing mirrors Taprun almost exactly — $9.99 / $29.99 / $99.99 / $499.99 per month.&lt;/p&gt;

&lt;p&gt;So when I first read their docs, the obvious question was: do they do what Taprun does? Because if they do, Taprun is in trouble.&lt;/p&gt;

&lt;p&gt;They don't. And the reason sits on a single architectural line every browser-agent tool has to pick a side of.&lt;/p&gt;

&lt;h2&gt;
  
  
  The line: is the LLM at authoring, or at runtime?
&lt;/h2&gt;

&lt;p&gt;There are two fundamentally different ways to point an LLM at a browser:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM at runtime.&lt;/strong&gt; The model is called every time the automation executes. Each run is an inference pass. Optimisations make that pass smaller — DOM compression, accessibility trees instead of screenshots, smaller models — but the pass is still there.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LLM at authoring.&lt;/strong&gt; The model is called once, during setup. It reads the site, figures out the structure, and emits a deterministic program. From then on, the program runs. The LLM never gets called again.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Browser Use, Stagehand, Playwright MCP, and rtrvr.ai all sit on the first side. They differ in &lt;em&gt;how&lt;/em&gt; they call the LLM — vision vs DOM, big model vs small model, whole page vs compressed — but not in &lt;em&gt;whether&lt;/em&gt; they call it.&lt;/p&gt;

&lt;p&gt;Taprun sits on the second. &lt;code&gt;tap forge&lt;/code&gt; runs the LLM once to author a &lt;code&gt;.tap.js&lt;/code&gt; file. &lt;code&gt;tap.run&lt;/code&gt; executes that file forever with zero inference.&lt;/p&gt;

&lt;p&gt;This distinction isn't marketing. It's Python vs compiled C. Both evaluate expressions; one evaluates at runtime, the other at compile time. You pick based on whether the workload repeats.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where rtrvr.ai is genuinely strong
&lt;/h2&gt;

&lt;p&gt;Credit where it's due. rtrvr gets a lot right:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Form-factor breadth.&lt;/strong&gt; Chrome extension, Cloud dashboard, Sheets integration, REST API, MCP server, CLI, embeddable widget, WhatsApp bot. Each one hits a different user at a different moment. This is good distribution design.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;"No CDP = No Failures" framing.&lt;/strong&gt; Chrome DevTools Protocol automation is bot-detectable; their architecture avoids it. This is a real reliability argument.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;25× cheaper vs vision.&lt;/strong&gt; Replacing screenshots (~114K tokens) with accessibility trees and DOM compression (~26K tokens) is a meaningful improvement. Their claimed $0.12 per task is genuine progress.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;BYOK path.&lt;/strong&gt; "Bring your own key or local endpoint" collapses their cost to roughly your LLM provider's invoice. Clever.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Explicit competitive comparison.&lt;/strong&gt; They list 10 competitors by name on their landing page. It's confident. It invites comparison. That's a good sign.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your use case is agentic exploration — new sites, unknown tasks, one-off interactions — rtrvr is a serious tool. I'd reach for it myself.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the LLM-at-runtime model hits a ceiling
&lt;/h2&gt;

&lt;p&gt;The ceiling isn't quality. It's structural.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Per-run cost scales linearly with runs.&lt;/strong&gt; 26K tokens per task × 1,000 runs/day = 26M tokens/day. At Gemini Flash Lite rates that's real money; at Gemini Pro rates it's ~$260/day. rtrvr's own pricing acknowledges this: the Basic tier is 1,500 credits/month, which at "5 credits/task" is ~300 tasks. A single production workflow running every 5 minutes eats that budget by its second day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Output variance is by design.&lt;/strong&gt; When the same page, same prompt, same task produces slightly different extractions across runs, you can't build monitoring around it. Row count fluctuation isn't "a bug" when the system is designed to re-interpret the page every time. The 81% SOTA accuracy number is a fine benchmark result, but it means 19% of invocations are wrong in some way, and you don't know which 19%.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Self-healing" still pays tokens to heal.&lt;/strong&gt; Every browser-agent tool in this category markets "self-healing". What they mean is: when the selector breaks, the LLM re-runs to figure out the new one. That's real, and it's useful — but it is &lt;em&gt;reactive&lt;/em&gt;. The task has already failed (or silently returned garbage) before healing kicks in, and every heal is another inference pass.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Taprun does differently — structurally, not just cheaper
&lt;/h2&gt;

&lt;p&gt;Taprun moves the LLM to authoring time. Once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Authoring: LLM inspects once, emits deterministic code&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap forge https://reddit.com/r/programming
✓ Inspected: REST API detected at oauth.reddit.com
✓ Verified: 25 rows, score 95/100
✓ Saved: reddit/hot.tap.js  &lt;span class="o"&gt;(&lt;/span&gt;pure JavaScript, on your disk&lt;span class="o"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# Runtime: no LLM, no tokens, same output every time&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap reddit hot     &lt;span class="c"&gt;# 25 rows, ~200 ms, $0.00&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap reddit hot     &lt;span class="c"&gt;# 25 rows, ~200 ms, $0.00&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap reddit hot     &lt;span class="c"&gt;# 25 rows, ~200 ms, $0.00&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the output is deterministic, monitoring is tractable. Because execution is deterministic, row count &lt;em&gt;is&lt;/em&gt; a health signal. Because the program is on your disk, it works offline and doesn't depend on anyone's cloud.&lt;/p&gt;
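
&lt;p&gt;That turns monitoring into a one-line rule. A minimal sketch (the helper name and threshold are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Deterministic execution makes row count a real health signal.
// With an LLM at runtime, this same check would fire on model noise.
function checkRowCount(rows, expected, tolerance = 0) {
  const drift = Math.abs(rows.length - expected);
  if (drift &gt; tolerance) {
    throw new Error('row count ' + rows.length + ', expected ' + expected);
  }
  return rows;
}

// checkRowCount(tapOutput, 25);  // 25 rows every run, or the alert fires
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;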

&lt;p&gt;And the "self-healing" axis flips from reactive to proactive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;tap doctor &lt;span class="nt"&gt;--auto&lt;/span&gt; reddit hot
✗ selector div.thing — gone since last run
⚠ fingerprint diff: ↑ 2 structural changes
✓ heal bundle ready — current code + git &lt;span class="nb"&gt;history&lt;/span&gt; + page snapshot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;tap doctor&lt;/code&gt; checks a structural fingerprint &lt;em&gt;before&lt;/em&gt; the run fires. If the site drifted, the run doesn't even start — you get a diff of what changed and a bundle your AI agent can patch offline. No retry tokens. No silent bad data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where the numbers actually land
&lt;/h2&gt;

&lt;p&gt;Take a workflow that runs every 5 minutes — 288 runs/day, ~8,640 runs/month. Not extreme; this is a single production scraper.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser Use:&lt;/strong&gt; 8,640 × $0.50 = $4,320/month (lower bound)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;rtrvr.ai Basic&lt;/strong&gt; ($9.99): 1,500 credits / 5 per task = 300 tasks. You're over budget by day two. Need Scale ($499.99) — 60,000 credits covers ~12,000 tasks. BYOK path lowers the number but your Gemini bill replaces it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Taprun Free:&lt;/strong&gt; 8,640 runs × $0 = &lt;strong&gt;$0/month&lt;/strong&gt;. The $9 tier only comes in if you want AI to forge new taps for you; the $29 tier if you want auto-repair on cron.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 10 runs a day, none of this matters. At 10 runs a minute, it's the only thing that matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to pick each
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Pick rtrvr.ai when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're doing agentic exploration — new sites, undefined tasks, high variance in what you're extracting&lt;/li&gt;
&lt;li&gt;You want a polished cloud dashboard and don't mind hosted state&lt;/li&gt;
&lt;li&gt;You need WhatsApp or embeddable widget form factors&lt;/li&gt;
&lt;li&gt;Your per-task count stays under ~300/month, or you're comfortable with a $499/mo scale tier&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Pick Taprun when:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You run the same automation more than once — and want to know the output is the same every time&lt;/li&gt;
&lt;li&gt;You want the program on your disk, version-controlled, not saved in someone's dashboard&lt;/li&gt;
&lt;li&gt;You want structural fingerprint diffs, not retry loops, as your breakage story&lt;/li&gt;
&lt;li&gt;Your scale is "every 5 minutes forever" and you want the bill to stay $9/mo&lt;/li&gt;
&lt;li&gt;You want to keep working offline and in sandboxed environments where outbound LLM calls aren't allowed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;They're not really competitors — they're different tools for different moments. Use rtrvr to figure out &lt;em&gt;what&lt;/em&gt; you want to extract. Use Taprun once you know.&lt;/p&gt;

&lt;h2&gt;
  
  
  The one-line summary
&lt;/h2&gt;

&lt;p&gt;rtrvr made LLM-at-runtime 25× cheaper than the vision-based baseline. Taprun made it zero. Those aren't points on the same line.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>automation</category>
      <category>browser</category>
      <category>comparison</category>
    </item>
    <item>
      <title>MCP is the authoring layer. Execution should cost zero tokens.</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Wed, 22 Apr 2026 04:15:26 +0000</pubDate>
      <link>https://dev.to/leonting1010/mcp-is-the-authoring-layer-execution-should-cost-zero-tokens-4gdf</link>
      <guid>https://dev.to/leonting1010/mcp-is-the-authoring-layer-execution-should-cost-zero-tokens-4gdf</guid>
      <description>&lt;p&gt;Two posts on Reddit this month independently measured MCP's token overhead. Both reached the same number: &lt;strong&gt;30–40% more tokens than the CLI equivalent.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I added Notion, Sentry and Shortcut MCPs and was surprised to see every session starting off with 40% of the context used."&lt;br&gt;
— NoSlicedMushrooms (28 upvotes), r/ClaudeAI&lt;/p&gt;

&lt;p&gt;"A batch job with 4 MCP servers blew through our token budget in 2 hours. The schema injection on every turn is the killer."&lt;br&gt;
— tom_mathews, r/ClaudeAI&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The "MCP is dead, just use CLI" take followed immediately. But three independent users — in three different threads, on three different subreddits — arrived at the same conclusion: &lt;strong&gt;the problem isn't MCP. It's using MCP for the wrong job.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"MCP for the main orchestrator, CLI for sub-agents. Both hit the same backend."&lt;br&gt;
— raphasouthall, r/mcp (48 upvotes)&lt;/p&gt;

&lt;p&gt;"MCP makes sense for discovery, not for known workflows."&lt;br&gt;
— tom_mathews, r/ClaudeAI&lt;/p&gt;

&lt;p&gt;"Development Tool versus Production Tool. MCP the shit you serve to clients and CLI while building."&lt;br&gt;
— mat8675, r/ClaudeAI&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;They're all describing the same architecture. And it's the architecture Tap has used from day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Two-Layer Model
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Layer 1: MCP (Authoring)
forge.inspect    → AI analyzes the site
forge.verify     → AI tests the program
forge.save       → program saved to disk

AI participates. Tokens consumed. One-time cost.

─────────────────────────────────────────────

Layer 2: CLI (Execution)
tap.run          → program executes

Zero AI. Zero tokens. Deterministic. Forever.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;MCP is the authoring layer. It's where AI discovers what the site looks like, what API endpoints are available, which selectors match the data, and how to structure the extraction. This is a one-time process — forge — that produces a &lt;code&gt;.tap.js&lt;/code&gt; file.&lt;/p&gt;

&lt;p&gt;After that, &lt;code&gt;tap.run&lt;/code&gt; executes the program directly. No MCP. No schema injection. No token overhead. The program is JavaScript. It runs in less than a second.&lt;/p&gt;
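
&lt;p&gt;The post doesn't reproduce a &lt;code&gt;.tap.js&lt;/code&gt; file, so here is a hypothetical sketch of the shape such a program could take; field names and the &lt;code&gt;http&lt;/code&gt; helper are illustrative, not Tap's documented schema:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Hypothetical shape of a forged program. Field names are
// illustrative; this is not Tap's documented schema.
module.exports = {
  name: 'reddit/hot',
  async run({ http }) {
    // Forge found a JSON endpoint, so runtime needs no browser at all.
    const res = await http.get('https://www.reddit.com/r/programming/hot.json');
    return res.data.children.map(post =&gt; ({
      title: post.data.title,
      score: post.data.score,
      url: post.data.url,
    }));
  },
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the artifact is plain code, it is diffable and version-controllable, which is what the rest of this post leans on.&lt;/p&gt;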

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;raphasouthall measured MCP overhead precisely for a 21-tool server:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;MCP&lt;/th&gt;
&lt;th&gt;CLI / tap.run&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Upfront cost&lt;/td&gt;
&lt;td&gt;~1,300 tokens (schema injection)&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Per-call cost&lt;/td&gt;
&lt;td&gt;~800 tokens&lt;/td&gt;
&lt;td&gt;~750 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;After 10 calls&lt;/td&gt;
&lt;td&gt;~880 tokens/call (amortized)&lt;/td&gt;
&lt;td&gt;750 tokens/call&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For a single forge session (one-time), ~1,300 tokens of overhead is nothing. For 1,000 daily executions? It's the difference between $0 and $135/month.&lt;/p&gt;

&lt;p&gt;Tap's architecture makes this explicit: &lt;strong&gt;pay the MCP overhead once during forge, then run at zero overhead forever.&lt;/strong&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  How Tap's 40 MCP Tools Don't Blow Up Your Context
&lt;/h2&gt;

&lt;p&gt;The obvious concern: Tap ships 40 MCP tools. With 21 tools costing ~1,300 tokens of schema, 40 tools should cost ~2,500+. That's over 1% of a 200k context window before you even ask a question.&lt;/p&gt;

&lt;p&gt;Tap uses &lt;strong&gt;deferred tool loading&lt;/strong&gt;. Only 12 core tools load at session start (~600 tokens). The other 28 — forge, fix, trace, watch, explain — load on demand, only when the agent actually needs them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# What loads at session start (Tier 1 — always available)
tap.list   tap.run    tap.doctor   tap.nav
tap.click  tap.type   tap.eval     tap.find
tap.screenshot  tap.runtime  tap.pressKey  tap.upload

# What loads on demand (Tier 2 — disclosed via hints)
tap.fix    tap.explain   tap.trace    tap.watch
tap.refresh   tap.cookies   tap.wait   ...

# Forge tools — only load during forge sessions
forge.inspect   forge.draft   forge.verify   forge.save
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the same pattern the community arrived at independently:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Splitting tools into a tiny default set and a second on-demand pack, because dumping every possible tool into session start was where the waste really showed up."&lt;br&gt;
— Organic-Bid-8298, r/mcp&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why Not Just Use CLI for Everything?
&lt;/h2&gt;

&lt;p&gt;Because authoring &lt;em&gt;requires&lt;/em&gt; tool discovery. When AI is figuring out how to scrape a site it's never seen before, it needs typed parameters, rich descriptions, and structured responses. That's what MCP does well.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The one thing MCP does well is when it's tightly integrated (like Claude Code's built-in tools) — that feels natural because they control both sides."&lt;br&gt;
— SmartYogurtcloset715 (8 upvotes), r/ClaudeAI&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Tap controls both sides. The MCP server and the CLI are the same binary. The MCP tools call the same functions the CLI calls. The difference is &lt;em&gt;when&lt;/em&gt; each is used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Forge (one-time): MCP tools, because AI needs to discover and iterate&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run (every time): CLI, because the program already exists&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Doctor (periodic): either — MCP for interactive diagnosis, CLI for scheduled health checks&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The Implication for Browser Automation
&lt;/h2&gt;

&lt;p&gt;Most browser MCP tools are execution-layer tools. They run in the browser on every call. That's where the token cost comes from — not just schema overhead, but the entire page state (accessibility tree, screenshot bytes, console output) flowing into the context window on every interaction.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Every &lt;code&gt;browser_navigate&lt;/code&gt; + &lt;code&gt;browser_snapshot&lt;/code&gt; call costs ~1,500 tokens in JSON schema framing — even though the actual useful output is just a few lines of text."&lt;br&gt;
— BagNervous, r/ClaudeAI (Browser CLI author)&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Tap's browser tools exist in MCP for authoring only. During forge, AI uses &lt;code&gt;tap.nav&lt;/code&gt;, &lt;code&gt;tap.eval&lt;/code&gt;, &lt;code&gt;tap.screenshot&lt;/code&gt; to understand the page. After forge produces a &lt;code&gt;.tap.js&lt;/code&gt;, execution calls the browser directly — no MCP framing, no token overhead, no context window pollution.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The 1,500-token-per-call problem doesn't exist for tap.run. It's not an MCP call. It's a function call.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Related
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Health Contracts Catch What Pydantic Can't — semantic validation for scraper output&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Programs Beat Prompts — why AI should write code, not run it&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Interface Protocol — 8 operations that replace every browser automation SDK&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>ai</category>
      <category>automation</category>
      <category>llm</category>
    </item>
    <item>
      <title>Facebook scrambles author names with Flexbox order — here's the 5-line diagnostic that proves it isn't custom fonts</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Wed, 22 Apr 2026 04:06:36 +0000</pubDate>
      <link>https://dev.to/leonting1010/facebook-scrambles-author-names-with-flexbox-order-heres-the-5-line-diagnostic-that-proves-it-ig2</link>
      <guid>https://dev.to/leonting1010/facebook-scrambles-author-names-with-flexbox-order-heres-the-5-line-diagnostic-that-proves-it-ig2</guid>
      <description>&lt;p&gt;A potential client posted on Reddit asking for a Facebook keyword-post scraper. Their budget: &lt;strong&gt;$500&lt;/strong&gt;. My first instinct after looking at the page was to say no.&lt;/p&gt;

&lt;p&gt;Here's what a naive scraper saw when it grabbed the first &lt;code&gt;[role="article"]&lt;/code&gt; on the search results page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;oSodnprmmlffgfi1c3mSg0so0d000c0uh1l40llhe09n2991imm38opar · Shared with Public
In 24 months every serious website will talk. Get in before it's crowded.… See more
0:00 / 0:00
SNOWIE.AI
$67 Lifetime Deal!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;· Shared with Public&lt;/code&gt; and the post body are readable. But that first line — the one that should be the author's name — is gibberish. &lt;code&gt;Snowie.Ai&lt;/code&gt; rendered visually. &lt;code&gt;oSodnprm…&lt;/code&gt; returned by &lt;code&gt;textContent&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I jumped to &lt;em&gt;"Facebook ships custom-font character remapping at scale — this is unscrapeable, decline the gig."&lt;/em&gt; &lt;strong&gt;I was wrong.&lt;/strong&gt; Here's what the actual answer turned out to be, and why I had to write a diagnostic before I could see it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The seven ways rendered text can escape textContent
&lt;/h2&gt;

&lt;p&gt;Before declaring a site unscrapeable, you have to know what you're looking at. There are seven main mechanisms by which what a human sees on screen can diverge from what &lt;code&gt;Node.textContent&lt;/code&gt; returns:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;Mechanism&lt;/th&gt;
&lt;th&gt;Defeat cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Selector mismatch (not actually anti-scraping — you grabbed the wrong node)&lt;/td&gt;
&lt;td&gt;Minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;CSS &lt;code&gt;::before&lt;/code&gt; / &lt;code&gt;::after&lt;/code&gt; content rules&lt;/td&gt;
&lt;td&gt;Low — read computed style&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Flexbox &lt;code&gt;order&lt;/code&gt; reordering (DOM scrambled, CSS re-sorts visually)&lt;/td&gt;
&lt;td&gt;Low — sort children by computed &lt;code&gt;order&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Custom font glyph remapping (&lt;code&gt;.woff2&lt;/code&gt; rebinds codepoints)&lt;/td&gt;
&lt;td&gt;High — OCR pixels or reverse each session's font table&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Unicode homoglyph substitution&lt;/td&gt;
&lt;td&gt;Low — NFKC + confusable normalize&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Canvas pixel rendering (no DOM text at all)&lt;/td&gt;
&lt;td&gt;High — OCR only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;WebAssembly runtime decryption&lt;/td&gt;
&lt;td&gt;Extreme — reverse the WASM module, track session keys&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each requires a different defeat strategy with wildly different economics. #1 is free (fix your selector). #4 and #6 start at ~$15K/year to maintain. #7 is measured in tens of thousands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;So the only useful question is: which one does this site use?&lt;/strong&gt; Without a diagnostic you're guessing — and guessing wrong costs you either a scraping contract you could have fulfilled, or a contract you over-promised on.&lt;/p&gt;
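
&lt;p&gt;Mechanism #3 is the one this article's title gives away, and its defeat really is minutes of work: read each child's computed &lt;code&gt;order&lt;/code&gt; and re-sort before extracting text. A browser-console sketch (the container you pass in is site-specific):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;// Recover visual reading order when the DOM order is scrambled
// and flexbox `order` re-sorts it for human eyes.
function visualText(container) {
  return Array.from(container.children)
    .map(child =&gt; ({
      child,
      order: parseInt(getComputedStyle(child).order, 10) || 0,
    }))
    .sort((a, b) =&gt; a.order - b.order)  // sort by computed order, not DOM order
    .map(entry =&gt; entry.child.textContent)
    .join('');
}

// visualText(authorNameNode);  // authorNameNode: the scrambled name container
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;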

&lt;h2&gt;
  
  
  The 5-minute diagnostic
&lt;/h2&gt;

&lt;p&gt;I wrote a throwaway script that walks the first &lt;code&gt;[role="article"]&lt;/code&gt; and dumps the signals that separate the seven mechanisms:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;[role="article"]&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;   &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;innerText&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;substring&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;font_family&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;fontFamily&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;has_canvas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="o"&gt;!!&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;canvas&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="c1"&gt;// Also check DevTools → Network → filter .wasm&lt;/span&gt;
  &lt;span class="na"&gt;child_sample&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tagName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;order&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;text_len&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;})),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran it. The result killed every hypothesis except one:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;font_family&lt;/code&gt; = &lt;code&gt;system-ui, -apple-system, sans-serif&lt;/code&gt; — Facebook is using the OS default font. &lt;strong&gt;Mechanism #4 ruled out&lt;/strong&gt; (no custom &lt;code&gt;.woff2&lt;/code&gt;, no glyph remapping).&lt;/li&gt;
&lt;li&gt;No &lt;code&gt;&amp;lt;canvas&amp;gt;&lt;/code&gt; element. &lt;strong&gt;#6 ruled out.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;No WASM in network traffic. &lt;strong&gt;#7 ruled out.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;innerText&lt;/code&gt; = &lt;code&gt;"Snowie.Ai\no\ns\no\ne\nt\nS\nd\nn\np\nr\n…"&lt;/code&gt;. Character-per-line. &lt;strong&gt;Flex-column newlines — that's the giveaway.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Child elements of the scrambled container: each contained 1–2 characters with a non-zero &lt;code&gt;order&lt;/code&gt; value like &lt;code&gt;order: 17&lt;/code&gt;, &lt;code&gt;order: 4&lt;/code&gt;, &lt;code&gt;order: 23&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the signature of &lt;strong&gt;mechanism #3 — Flexbox &lt;code&gt;order&lt;/code&gt; reordering&lt;/strong&gt;. Facebook splits author display names into individual single-character spans and gives each a scrambled &lt;code&gt;order&lt;/code&gt; value. The browser's flexbox layout re-sorts them for visual rendering. &lt;code&gt;textContent&lt;/code&gt; returns DOM order, which is randomized per render.&lt;/p&gt;

&lt;p&gt;And &lt;strong&gt;only the author name&lt;/strong&gt; gets this treatment. Post body, engagement counts, &lt;code&gt;aria-label&lt;/code&gt;s, and timestamps are plain text.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fix is ten lines
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// When children are all single-character and at least one has a non-zero CSS order,&lt;/span&gt;
&lt;span class="c1"&gt;// sort by order, concat — that's the real text as the browser would paint it.&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;unscramble&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;kids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Array&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;children&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;kids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;
      &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;kids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;every&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;length&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;kids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;some&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;kids&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nf"&gt;parseInt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;getComputedStyle&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nx"&gt;order&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="k"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="dl"&gt;''&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With this helper wired in, &lt;code&gt;author_name&lt;/code&gt; extraction went from &lt;code&gt;"oSodnprm…"&lt;/code&gt; to &lt;code&gt;"Snowie.Ai"&lt;/code&gt;. Everything else — &lt;code&gt;text&lt;/code&gt;, &lt;code&gt;like_count&lt;/code&gt;, &lt;code&gt;lang&lt;/code&gt; — was already plain. No OCR, no WASM reversal, no font-table reverse engineering. Ten lines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Two honest caveats
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No native post permalink.&lt;/strong&gt; Facebook search result cards don't expose a &lt;code&gt;/posts/&amp;lt;id&amp;gt;/&lt;/code&gt; href — the visible links are profile URLs with encrypted &lt;code&gt;__cft__&lt;/code&gt; tracking params. When no native ID is found, my scraper emits an &lt;code&gt;fb_&amp;lt;hash&amp;gt;&lt;/code&gt; id that is stable across runs for the same author+body combination. Downstream deduplication still works; you just can't deep-link back to the post.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lazy-loaded pagination.&lt;/strong&gt; The search page initially renders 1–3 results; the rest arrive as the user scrolls. A production scraper drives scroll events in a loop until &lt;code&gt;limit&lt;/code&gt; is satisfied or no new articles appear.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The generalizable lesson
&lt;/h2&gt;

&lt;p&gt;Every time I've been asked &lt;em&gt;"can you scrape site X?"&lt;/em&gt; and said no without running a diagnostic, I was wrong at least half the time. The reflex is understandable — the DOM returns garbage, you assume the worst — but the cost asymmetry is severe. &lt;strong&gt;Five minutes of running the seven-factor diagnostic versus walking away from a paying contract.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The protocol is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Extract &lt;code&gt;textContent&lt;/code&gt; of the element the human can read. If it matches visible text → mechanism #1, fix your selector.&lt;/li&gt;
&lt;li&gt;Check &lt;code&gt;getComputedStyle(el).fontFamily&lt;/code&gt;. Points to a custom &lt;code&gt;.woff2&lt;/code&gt;? Suspect #4.&lt;/li&gt;
&lt;li&gt;Walk the children. Are many of them single-character with a non-zero &lt;code&gt;order&lt;/code&gt;? Mechanism #3; unscramble with ten lines.&lt;/li&gt;
&lt;li&gt;Check for &lt;code&gt;&amp;lt;canvas&amp;gt;&lt;/code&gt; siblings and WASM network requests. Both absent? #6 and #7 are ruled out.&lt;/li&gt;
&lt;li&gt;Only after this do you get to say "un-scrapable."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Facebook is not un-scrapable for keyword search. They've applied a low-cost obfuscation to one high-value field (the author name you might use for audience targeting) and left everything else alone. That's a reasonable product decision — enough friction to discourage casual scrapers, not enough to break accessibility tooling that depends on rendered text. As a side effect, someone who runs the diagnostic wins.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Question for the room: anyone here actually shipped against mechanism #4 (custom font glyph remap) or #7 (WASM decryption) in production? Curious what the maintenance cost looks like once you account for font table rotation / WASM session-key changes. Drop your war stories in the comments.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Original post with full JSON-LD metadata + Tap reference: &lt;a href="https://taprun.dev/blog/facebook-anti-scraping-flexbox-order.html" rel="noopener noreferrer"&gt;taprun.dev/blog/facebook-anti-scraping-flexbox-order&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>javascript</category>
      <category>webdev</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>We Ran 15,000 Browser Automations. The Failure That Matters Most Is Invisible to Your Monitoring.</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Mon, 20 Apr 2026 01:37:47 +0000</pubDate>
      <link>https://dev.to/leonting1010/we-ran-15000-browser-automations-the-failure-that-matters-most-is-invisible-to-your-monitoring-34jo</link>
      <guid>https://dev.to/leonting1010/we-ran-15000-browser-automations-the-failure-that-matters-most-is-invisible-to-your-monitoring-34jo</guid>
      <description>&lt;p&gt;Half of our YouTube automation runs return 0 rows. Status: &lt;code&gt;ok&lt;/code&gt;. No exception thrown. No error logged. The program finishes in about 20 seconds and hands back an empty array, silently.&lt;/p&gt;

&lt;p&gt;We didn't know this until we looked at the traces.&lt;/p&gt;

&lt;p&gt;Over the past few months, Tap has executed 15,455 automation programs across real websites — Reddit, GitHub, Bilibili, Xiaohongshu, YouTube, Twitter, and more. The traces are structured JSON: site, tap name, status, rows returned, duration, error message if any. We analyzed all of them. What we found disagrees with the conventional mental model of how browser automations break.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Reliability Table Nobody Publishes
&lt;/h2&gt;

&lt;p&gt;Here are the actual numbers. Each row is a real platform. &lt;strong&gt;Hard error rate&lt;/strong&gt; is the fraction of runs that threw an exception. &lt;strong&gt;Silent empty rate&lt;/strong&gt; is the fraction of &lt;em&gt;successful&lt;/em&gt; runs (status: ok) that returned zero rows.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Total runs&lt;/th&gt;
&lt;th&gt;Hard error %&lt;/th&gt;
&lt;th&gt;Silent empty %&lt;/th&gt;
&lt;th&gt;Effective failure %&lt;/th&gt;
&lt;th&gt;Avg duration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Twitter / X&lt;/td&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;154 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GitHub&lt;/td&gt;
&lt;td&gt;437&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;0.2%&lt;/td&gt;
&lt;td&gt;0.2%&lt;/td&gt;
&lt;td&gt;3,644 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reddit&lt;/td&gt;
&lt;td&gt;688&lt;/td&gt;
&lt;td&gt;13.8%&lt;/td&gt;
&lt;td&gt;6.4%&lt;/td&gt;
&lt;td&gt;19.4%&lt;/td&gt;
&lt;td&gt;4,075 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Xiaohongshu&lt;/td&gt;
&lt;td&gt;361&lt;/td&gt;
&lt;td&gt;15.8%&lt;/td&gt;
&lt;td&gt;6.6%&lt;/td&gt;
&lt;td&gt;21.5%&lt;/td&gt;
&lt;td&gt;9,054 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bilibili&lt;/td&gt;
&lt;td&gt;259&lt;/td&gt;
&lt;td&gt;30.1%&lt;/td&gt;
&lt;td&gt;18.2%&lt;/td&gt;
&lt;td&gt;43.1%&lt;/td&gt;
&lt;td&gt;2,666 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weibo&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;36.8%&lt;/td&gt;
&lt;td&gt;0%&lt;/td&gt;
&lt;td&gt;36.8%&lt;/td&gt;
&lt;td&gt;4,644 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;YouTube&lt;/td&gt;
&lt;td&gt;49&lt;/td&gt;
&lt;td&gt;30.6%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;50.0%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;65.3%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;20,273 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;GitHub and Twitter are near-zero failure. YouTube is the opposite: two out of three runs either throw an error or return nothing. The 50% silent empty rate is more alarming than the 30.6% hard error rate — at least hard errors are visible.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Failure Mode You're Not Tracking
&lt;/h2&gt;

&lt;p&gt;Here's the part that surprised us most. We expected "element not found" to be the dominant failure. The conventional model: selector breaks, automation throws, you fix the selector. Obvious, visible, actionable.&lt;/p&gt;

&lt;p&gt;The actual numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Element not found&lt;/strong&gt; (explicit selector failure): &lt;strong&gt;5 occurrences&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cannot read properties of undefined (reading 'url')&lt;/strong&gt; (implicit structural failure): &lt;strong&gt;176 occurrences&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The ratio is 35:1 in favor of the failure mode your monitoring doesn't catch.&lt;/p&gt;

&lt;p&gt;What does &lt;code&gt;Cannot read properties of undefined (reading 'url')&lt;/code&gt; actually mean? The selector found something. The extraction ran. The automation didn't crash during navigation. It returned data — a list of objects — but the objects no longer have a &lt;code&gt;url&lt;/code&gt; field. The downstream code hits undefined and throws.&lt;/p&gt;

&lt;p&gt;This is a &lt;strong&gt;structural drift failure&lt;/strong&gt;, not a selector failure. The DOM element is there. The page loaded. The program traversed the right nodes. But the shape of the data those nodes return has changed — a field that was always present quietly stopped being present.&lt;/p&gt;
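&lt;p&gt;The crash mode in miniature — and a drift-tolerant read (the &lt;code&gt;jumpUrl&lt;/code&gt; fallback is one real variant; chaining fallbacks like this is a sketch, not Tap's actual handling):&lt;/p&gt;

```javascript
// Structural drift in miniature: the selector matched, extraction ran,
// but the field moved. Naive code crashes; a tolerant read degrades to null.
const cards = [{ jumpUrl: '/video/1' }]; // 'url' silently became 'jumpUrl'

// cards.map(c => c.url.trim()) would throw: Cannot read properties of undefined
const readUrl = (card) => card.url ?? card.jumpUrl ?? null;
```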

&lt;p&gt;The sites affected, in order of frequency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bilibili (videos, articles, analytics, benchmark, content-ideas, stats, trending)&lt;/li&gt;
&lt;li&gt;Algora bounties&lt;/li&gt;
&lt;li&gt;IssueHunt bounties&lt;/li&gt;
&lt;li&gt;Douyin (search, hot)&lt;/li&gt;
&lt;li&gt;Zhihu search&lt;/li&gt;
&lt;li&gt;X/Twitter (notifications, trending)&lt;/li&gt;
&lt;li&gt;Xiaohongshu search&lt;/li&gt;
&lt;li&gt;Weibo search&lt;/li&gt;
&lt;li&gt;Baidu hot&lt;/li&gt;
&lt;li&gt;Hacker News hot&lt;/li&gt;
&lt;li&gt;ProductHunt forum comments&lt;/li&gt;
&lt;li&gt;TechCrunch latest&lt;/li&gt;
&lt;li&gt;Ars Technica news&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That list spans Chinese platforms, Western platforms, social networks, news sites, and developer bounty boards. The failure mode is not platform-specific. It's inherent to how browser automation interacts with any site that changes its rendering.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Your Monitoring Doesn't See This
&lt;/h2&gt;

&lt;p&gt;Consider what's happening at the infrastructure layer when this failure occurs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;HTTP response: 200&lt;/li&gt;
&lt;li&gt;Page loaded successfully: yes&lt;/li&gt;
&lt;li&gt;Navigation completed: yes&lt;/li&gt;
&lt;li&gt;Automation process exited: 0&lt;/li&gt;
&lt;li&gt;Exception thrown: eventually — but only after the extraction, when downstream code accesses the malformed object&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most monitoring stacks see a successful process exit followed by an application exception. But the harder version is when the object &lt;em&gt;does&lt;/em&gt; have a &lt;code&gt;url&lt;/code&gt; field — it just points to something different. A related item section. A sponsored result. A pagination link that got included in the data array.&lt;/p&gt;

&lt;p&gt;In those cases: status &lt;code&gt;ok&lt;/code&gt;, rows returned, no exception, wrong data. Pydantic passes. Row count checks pass. Prometheus reports a healthy process. OTel has nothing to report. The only signal is semantic: &lt;em&gt;these URLs aren't the URLs you wanted.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Platform Reliability Pattern
&lt;/h2&gt;

&lt;p&gt;GitHub and Twitter have published APIs that their web UIs reflect. A GitHub repository page structure is stable because it's owned by the same team that maintains the underlying data model.&lt;/p&gt;

&lt;p&gt;Bilibili, Douyin, Xiaohongshu, and Weibo run aggressive A/B experiments on their rendering layer — sometimes multiple experiments simultaneously for different user cohorts. The same page, loaded twice in the same session, can return different DOM structures. The &lt;code&gt;url&lt;/code&gt; field on a video card might be in &lt;code&gt;item.url&lt;/code&gt; in one experiment variant and &lt;code&gt;item.jumpUrl&lt;/code&gt; in another.&lt;/p&gt;

&lt;p&gt;YouTube fails for a different reason: aggressive anti-bot measures that return empty results instead of blocking requests. A request that would return a 429 or CAPTCHA on a naive scraper returns 200 with an empty content container on a logged-out browser session. Status: ok. Rows: 0. Duration: 20 seconds of wasted compute.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Catches This, and What Doesn't
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Catches hard error?&lt;/th&gt;
&lt;th&gt;Catches silent empty?&lt;/th&gt;
&lt;th&gt;Catches wrong data (right shape)?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Process monitoring&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pydantic / type validation&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;Sometimes&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Row count threshold&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Health contracts (range + pattern + drift)&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structural fingerprinting&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;Signals change, not interpretation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The only layer that catches all three failure classes is a contract that validates semantics — not just shape. A &lt;code&gt;min_rows&lt;/code&gt; check catches silent empties. A &lt;code&gt;pattern&lt;/code&gt; check on URLs catches wrong-source data. A &lt;code&gt;drift&lt;/code&gt; check catches distribution shifts that look valid but represent changed behavior.&lt;/p&gt;
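&lt;p&gt;A toy version of the first two checks (field names here are illustrative, not Tap's actual contract schema; the drift check is omitted):&lt;/p&gt;

```javascript
// Illustrative health contract: min_rows catches silent empties,
// a URL pattern catches wrong-source rows that pass shape validation.
const checkContract = (rows, contract) => {
  const failures = [];
  const enoughRows = rows.length >= contract.minRows;
  if (!enoughRows) failures.push('min_rows: got ' + rows.length);
  const badRows = rows.filter((r) => !contract.urlPattern.test(r.url || ''));
  if (badRows.length > 0) failures.push('pattern: ' + badRows.length + ' bad urls');
  return { ok: failures.length === 0, failures };
};
```

&lt;p&gt;The third case in the table — right shape, wrong data — is exactly what the pattern check catches: a sponsored or related-item URL has a valid-looking &lt;code&gt;url&lt;/code&gt; field but fails the source pattern.&lt;/p&gt;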

&lt;h2&gt;
  
  
  What We'd Do Differently
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Treat silent empties as first-class failures.&lt;/strong&gt; A run returning zero rows should be suspicious by default. Legitimate zero-row results are rare edge cases; a run that returns zero rows unexpectedly is almost always broken. The difference is detectable with a &lt;code&gt;min_rows&lt;/code&gt; contract.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Fingerprint before running, not after.&lt;/strong&gt; The structural drift that causes &lt;code&gt;Cannot read properties of undefined&lt;/code&gt; is detectable in the DOM before you run your extraction logic. A fingerprint check is cheaper than a full tap execution.&lt;/p&gt;
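&lt;p&gt;One cheap way to sketch that pre-flight check (the tree shape and the depth cutoff are assumptions, not Tap's implementation):&lt;/p&gt;

```javascript
// Hypothetical structural fingerprint: a short string describing the shape of
// the nodes an extraction depends on. Compare against the last known-good
// fingerprint before paying for a full tap execution.
const fingerprint = (node, depth = 0) => {
  const children = node.children || [];
  if (children.length === 0 || depth >= 2) return node.tag;
  return node.tag + '(' + children.map((c) => fingerprint(c, depth + 1)).join(',') + ')';
};
```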

&lt;p&gt;&lt;strong&gt;Treat Chinese platforms as a separate reliability tier.&lt;/strong&gt; The A/B experiment cadence is genuinely different. A tap targeting Bilibili needs shorter contract drift windows and more frequent health checks than one targeting GitHub.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duration is a signal.&lt;/strong&gt; Our YouTube taps average 20 seconds per run and fail 65% of the time. That's not slow extraction — that's waiting for content that's not coming. A timeout contract that fires at 8 seconds would catch most of these early.&lt;/p&gt;
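&lt;p&gt;The duration contract can be as small as a race (the 8-second budget comes from our trace data; the wrapper itself is a sketch):&lt;/p&gt;

```javascript
// Sketch of a duration contract: give up on a run that exceeds its budget
// instead of waiting 20 seconds for content that is never coming.
const withTimeout = (promise, ms) =>
  Promise.race([
    promise,
    new Promise((resolve, reject) =>
      setTimeout(() => reject(new Error('duration contract exceeded')), ms)),
  ]);
```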




&lt;p&gt;The trace data from 15,455 runs is the most honest answer we have to "what actually breaks in browser automation?"&lt;/p&gt;

&lt;p&gt;The answer: silent structural drift, not explicit selector failure. The sites that change fastest break most. The failures that matter most are the ones that look like success.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built with &lt;a href="https://taprun.dev" rel="noopener noreferrer"&gt;Tap&lt;/a&gt; — browser automation programs that run forever.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>automation</category>
      <category>monitoring</category>
      <category>webdev</category>
      <category>webscraping</category>
    </item>
    <item>
      <title>Search arXiv in One Command — No API Key, No Tokens</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Tue, 07 Apr 2026 09:46:59 +0000</pubDate>
      <link>https://dev.to/leonting1010/search-arxiv-in-one-command-no-api-key-no-tokens-54ib</link>
      <guid>https://dev.to/leonting1010/search-arxiv-in-one-command-no-api-key-no-tokens-54ib</guid>
      <description>&lt;p&gt;Keeping up with AI research is exhausting. New papers drop daily. Most "paper discovery" tools require an account, burn API tokens on every search, or give you a bloated UI when all you wanted was a list.&lt;/p&gt;

&lt;p&gt;Here's what I use instead:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli arxiv search &lt;span class="nt"&gt;--keyword&lt;/span&gt; &lt;span class="s2"&gt;"LLM"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;--field&lt;/span&gt; published &lt;span class="se"&gt;\&lt;/span&gt;
  | npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output: 20 papers sorted newest-first, with title, authors, published date, abstract, and URL — in under 2 seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No account. No API key. No AI tokens consumed.&lt;/strong&gt; First run downloads a ~30MB binary and caches it; every subsequent call is instant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;The arXiv Atom API has been public and stable for 15 years. &lt;code&gt;arxiv/search&lt;/code&gt; is a &lt;a href="https://github.com/LeonTing1010/tap-skills" rel="noopener noreferrer"&gt;Tap skill&lt;/a&gt; — a 20-line deterministic program that calls it directly. AI wrote it once. It runs forever at $0.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Unix Pipeline Model
&lt;/h2&gt;

&lt;p&gt;Every Tap skill is a composable Unix filter. Data flows as JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Search only&lt;/span&gt;
npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli arxiv search &lt;span class="nt"&gt;--keyword&lt;/span&gt; &lt;span class="s2"&gt;"RAG"&lt;/span&gt;

&lt;span class="c"&gt;# Search → sort by date → filter recent → display&lt;/span&gt;
npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli arxiv search &lt;span class="nt"&gt;--keyword&lt;/span&gt; &lt;span class="s2"&gt;"RAG"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;--field&lt;/span&gt; published &lt;span class="se"&gt;\&lt;/span&gt;
  | npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli filter &lt;span class="nt"&gt;--field&lt;/span&gt; published &lt;span class="nt"&gt;--gt&lt;/span&gt; &lt;span class="s2"&gt;"2025-01-01"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each command reads JSON from stdin, writes JSON to stdout. Exactly how Unix tools work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Use It in CI
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# GitHub Actions — daily paper digest&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Paper digest&lt;/span&gt;
  &lt;span class="na"&gt;run&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;npx -y @taprun/cli arxiv search --keyword "LLM agents" \&lt;/span&gt;
      &lt;span class="s"&gt;| npx -y @taprun/cli sort --field published \&lt;/span&gt;
      &lt;span class="s"&gt;| npx -y @taprun/cli limit --n 5 \&lt;/span&gt;
      &lt;span class="s"&gt;&amp;gt; papers.json&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  200+ Skills, Same Pattern
&lt;/h2&gt;

&lt;p&gt;arXiv is one of 200+ community skills that follow the same pattern: call an API, return structured rows, compose with any other skill.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Skill&lt;/th&gt;
&lt;th&gt;Returns&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;arxiv/search --keyword X&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Papers matching keyword&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;github/trending&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Trending repos today&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;reddit/search --keyword X&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Posts matching keyword&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;stackoverflow/hot&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hot questions&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Browse and contribute: &lt;a href="https://github.com/LeonTing1010/tap-skills" rel="noopener noreferrer"&gt;github.com/LeonTing1010/tap-skills&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Try it now — no install required, works on any machine with Node.js:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli arxiv search &lt;span class="nt"&gt;--keyword&lt;/span&gt; &lt;span class="s2"&gt;"your topic"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli &lt;span class="nb"&gt;sort&lt;/span&gt; &lt;span class="nt"&gt;--field&lt;/span&gt; published &lt;span class="se"&gt;\&lt;/span&gt;
  | npx &lt;span class="nt"&gt;-y&lt;/span&gt; @taprun/cli table
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;→ &lt;a href="https://taprun.dev" rel="noopener noreferrer"&gt;taprun.dev&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cli</category>
      <category>showdev</category>
      <category>tooling</category>
    </item>
    <item>
      <title>16 Comments, 6 Insights: Using HN and Reddit as a Positioning Lab</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Mon, 06 Apr 2026 05:18:09 +0000</pubDate>
      <link>https://dev.to/leonting1010/16-comments-6-insights-using-hn-and-reddit-as-a-positioning-lab-46oo</link>
      <guid>https://dev.to/leonting1010/16-comments-6-insights-using-hn-and-reddit-as-a-positioning-lab-46oo</guid>
      <description>&lt;p&gt;I spent an afternoon writing 16 comments across Hacker News and Reddit. Not to promote anything — to &lt;strong&gt;test which pain points actually resonate with developers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The result: 6 content principles I now use to decide what to build, what to write about, and how to position my product. Here's the method.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Method: Comments as Micro-Experiments
&lt;/h2&gt;

&lt;p&gt;The premise is simple: &lt;strong&gt;a comment is the cheapest possible A/B test.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Writing a blog post takes hours. A landing page rewrite takes days. A comment takes 2 minutes. If it gets upvoted, the angle works. If it's ignored, you saved yourself a blog post nobody would read.&lt;/p&gt;

&lt;p&gt;The process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Find hot posts in your domain (automation, scraping, developer tools, AI)&lt;/li&gt;
&lt;li&gt;Write a comment that tests a specific angle — one pain point, one insight&lt;/li&gt;
&lt;li&gt;Track which angles get traction&lt;/li&gt;
&lt;li&gt;Turn validated angles into blog posts and landing page copy&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I posted across 10 subreddits and HN front-page posts spanning AI, infrastructure, open source, and developer tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 6 Insights
&lt;/h2&gt;

&lt;h3&gt;
  
  
  #1 Silent failure is the universal pain
&lt;/h3&gt;

&lt;p&gt;I commented on Gallery-dl's DMCA move (HN front page) and r/webscraping's "endgame for scraping" (104 upvotes). Both times, the angle that resonated was: &lt;strong&gt;the hard part isn't writing a scraper — it's knowing when it breaks.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Most scrapers fail silently — they return empty arrays for days before anyone notices."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Principle:&lt;/strong&gt; Lead with the maintenance problem, not the creation problem. Everyone can build a scraper. Nobody can keep it running.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Health contracts. Every program defines what "working" means: minimum rows, required fields. &lt;code&gt;tap doctor&lt;/code&gt; checks all programs in one command. When something breaks, you know in seconds — not weeks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;health&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;min_rows&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="nv"&gt;5&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;non_empty&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt; &lt;span class="pi"&gt;}&lt;/span&gt;

&lt;span class="s"&gt;$ tap doctor&lt;/span&gt;
&lt;span class="s"&gt;hackernews/hot    ✔ ok     30 rows&lt;/span&gt;
&lt;span class="s"&gt;reddit/hot        ✘ fail   0 rows — selector changed&lt;/span&gt;
  &lt;span class="s"&gt;↳ auto-healing...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  #2 Cost anxiety is real and specific
&lt;/h3&gt;

&lt;p&gt;Caveman hit 727 points — a post about reducing LLM token usage. Nanocode (177 points) was about self-hosting Claude Code to understand the real cost. Developers aren't just curious about AI costs — they're &lt;strong&gt;anxious&lt;/strong&gt; about them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Principle:&lt;/strong&gt; Use exact numbers. "$1.05 per run" and "300x cheaper" land. "More affordable" doesn't. Developers think in math, not adjectives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; The compiler model. AI runs once at authoring time (~$0.15), produces a deterministic program, and every subsequent execution is $0. Fifty daily automations: $18,000/year with AI agents vs ~$60/year with compiled programs.&lt;/p&gt;

&lt;h3&gt;
  
  
  #3 "Open-source alternative" is not a value prop
&lt;/h3&gt;

&lt;p&gt;The Modo post ("open-source alternative to Cursor and Windsurf") had only 2 comments despite being on the front page. My feedback: &lt;strong&gt;users don't switch tools for ideology — they switch for workflow improvements.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The README should lead with a concrete before/after: 'In Cursor you do X in 5 steps, in Modo you do it in 1.' That's what converts users."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Principle:&lt;/strong&gt; Show the delta, not the category. "I'm like X but open source" tells users nothing about why they should switch.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Concrete comparison. Browser Use: $0.50–$2.00/run, 60–95% reliability, 30–120s. Tap: $0/run, 100% deterministic, 1–5s. Same task, measurable difference.&lt;/p&gt;

&lt;h3&gt;
  
  
  #4 Local-first is having a moment
&lt;/h3&gt;

&lt;p&gt;Three unrelated posts all trended around the same theme:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemma 4 on iPhone&lt;/strong&gt; (496 points) — on-device AI inference&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Zero-dependency browser IDE&lt;/strong&gt; (r/opensource) — works offline, no npm&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nomad offline media server&lt;/strong&gt; (r/selfhosted) — works without internet&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Developers are increasingly allergic to tools that phone home.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Principle:&lt;/strong&gt; "Runs on your machine, works offline" is a feature worth highlighting, not an implementation detail to bury.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Tap programs are plain &lt;code&gt;.tap.js&lt;/code&gt; files that execute locally. No API calls at runtime, no data leaving your device, no cloud dependency. They work on a plane, in a cabin, wherever your laptop goes.&lt;/p&gt;

&lt;h3&gt;
  
  
  #5 Legal pressure on scraping is accelerating
&lt;/h3&gt;

&lt;p&gt;Gallery-dl's DMCA notice trended on &lt;em&gt;both&lt;/em&gt; HN and r/programming simultaneously. The pattern: &lt;strong&gt;open-source scraping tools face increasing legal pressure.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Principle:&lt;/strong&gt; API-first data access is both technically superior and legally safer. Position accordingly.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; &lt;code&gt;tap.fetch()&lt;/code&gt; calls site APIs directly — structured JSON, stable endpoints, no DOM parsing. Only falls back to browser rendering when no API exists. Less breakage, less legal surface area.&lt;/p&gt;

&lt;h3&gt;
  
  
  #6 Infrastructure beats features
&lt;/h3&gt;

&lt;p&gt;Switzerland's 25 Gbit internet (315 points, 249 comments) wasn't about speed — it was about &lt;strong&gt;structural fairness&lt;/strong&gt;. Open fiber access vs. local monopolies.&lt;/p&gt;

&lt;p&gt;The parallel to automation tooling: AI agents at $1/run create a cost barrier. Deterministic programs at $0 are infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Principle:&lt;/strong&gt; Frame your tool as infrastructure people own, not a service they rent.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Every &lt;code&gt;.tap.js&lt;/code&gt; is a file you own. Git-versionable, diffable, composable. Cancel your subscription and your programs keep running. No vendor lock-in, no API keys required at runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Playbook
&lt;/h2&gt;

&lt;p&gt;If you're building a developer tool and struggling with positioning:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Don't start with a landing page.&lt;/strong&gt; Start with 10 comments on relevant posts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Each comment tests one angle.&lt;/strong&gt; One pain point, one insight, one framing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Upvotes = validation.&lt;/strong&gt; High-scoring posts where your comment resonates = confirmed pain point.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Silence = signal too.&lt;/strong&gt; If nobody engages with your angle, it's not a pain point.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Turn validated angles into content.&lt;/strong&gt; Blog post from the best angle. Landing page copy from the specific phrases that worked.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Never link your product in comments.&lt;/strong&gt; Share expertise. Build credibility. The product link lives on your profile.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Comments are conversations. Conversations reveal what people actually care about. That's worth more than any amount of competitor analysis.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The tool I used to validate these insights: &lt;a href="https://taprun.dev" rel="noopener noreferrer"&gt;Tap&lt;/a&gt; turns AI into a compiler for browser automation. AI writes a program once, then it runs forever at $0. The positioning came from the comments. The product came from the pain.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>marketing</category>
      <category>product</category>
      <category>startup</category>
      <category>writing</category>
    </item>
    <item>
      <title>Programs Beat Prompts: Why AI Should Write Code, Not Run It</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Mon, 06 Apr 2026 05:04:19 +0000</pubDate>
      <link>https://dev.to/leonting1010/programs-beat-prompts-why-ai-should-write-code-not-run-it-3gde</link>
      <guid>https://dev.to/leonting1010/programs-beat-prompts-why-ai-should-write-code-not-run-it-3gde</guid>
      <description>&lt;p&gt;Every AI browser agent works the same way: send a prompt, burn tokens, get a result. Next time you need the same result? Send the prompt again. Burn more tokens.&lt;/p&gt;

&lt;p&gt;This is the &lt;strong&gt;prompt loop&lt;/strong&gt;, and it's why AI automation is expensive, unreliable, and slow.&lt;/p&gt;

&lt;p&gt;There's a better model: &lt;strong&gt;AI writes a program once, then the program runs forever at zero cost.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Prompt Loop Problem
&lt;/h2&gt;

&lt;p&gt;AI browser agents like Browser Use, Stagehand, and computer-use tools are impressive demos. Point an LLM at a website, tell it what to do, watch it click around. Magic.&lt;/p&gt;

&lt;p&gt;Until you need it to work reliably. At scale. Every day.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The program cost $1.05 to run. So doing it at any scale quickly becomes a little bit silly."&lt;br&gt;
— rozap, Hacker News&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The math is brutal. An AI agent that scrapes one site costs ~$1 per run. Run it daily across 50 sites and you're spending &lt;strong&gt;$1,500/month&lt;/strong&gt; on data collection that a deterministic script would do for free.&lt;/p&gt;

&lt;p&gt;But cost isn't even the worst part. &lt;strong&gt;Reliability is.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If each step has a .95 chance of completing successfully, after not very many steps you have a pretty small overall probability of success."&lt;br&gt;
— rozap, Hacker News&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;95% per step sounds good. A 10-step workflow? 60% overall. Twenty steps? 36%. Every AI call is a coin flip weighted slightly in your favor — but over enough steps, the house always wins.&lt;/p&gt;
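&lt;p&gt;The compounding is worth computing directly. A quick illustrative sketch (&lt;code&gt;overallSuccess&lt;/code&gt; is just a hypothetical helper, not part of any library):&lt;/p&gt;

```javascript
// Probability that an n-step workflow succeeds end to end,
// assuming each step independently succeeds with probability p.
function overallSuccess(p, steps) {
  return Math.pow(p, steps);
}

console.log(overallSuccess(0.95, 1));   // 0.95
console.log(overallSuccess(0.95, 10));  // ~0.60
console.log(overallSuccess(0.95, 20));  // ~0.36
```

&lt;p&gt;A deterministic program sidesteps this entirely: there is no per-step probability to compound.&lt;/p&gt;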

&lt;h2&gt;
  
  
  The Compiler Model
&lt;/h2&gt;

&lt;p&gt;There's a pattern from 60 years of computer science that solves this: &lt;strong&gt;compilation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;A compiler reads high-level code once, produces optimized machine code, and that machine code runs billions of times at zero marginal cost. The compiler is expensive. The output is free.&lt;/p&gt;

&lt;p&gt;Apply this to browser automation:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Prompt model (interpreter)&lt;/th&gt;
&lt;th&gt;Program model (compiler)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AI cost per run&lt;/td&gt;
&lt;td&gt;$0.50 – $2.00&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$0.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability per run&lt;/td&gt;
&lt;td&gt;60 – 95%&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;100% deterministic&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;30 – 120s (LLM thinking)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1 – 5s (direct execution)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI usage&lt;/td&gt;
&lt;td&gt;Every run&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Only at authoring time&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The insight: &lt;strong&gt;AI is the compiler, not the runtime.&lt;/strong&gt; Use AI to understand the interface, write a deterministic program, and then execute that program forever without AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Looks Like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Step 1: AI observes the interface (the "compile" step)&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap forge &lt;span class="s2"&gt;"get trending repos from GitHub"&lt;/span&gt;
Inspecting github.com...
Found API: api.github.com/search/repositories
Verifying: 25 rows, all fields present
✔ Saved: github/trending.tap.js

&lt;span class="c"&gt;# Step 2: Program runs forever at $0 (the "execute" step)&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap github trending
25 rows &lt;span class="o"&gt;(&lt;/span&gt;890ms&lt;span class="o"&gt;)&lt;/span&gt; Cost: &lt;span class="nv"&gt;$0&lt;/span&gt;.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first command uses AI. Every subsequent run is pure code — no LLM, no tokens, no variability.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Programs Win
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Deterministic means debuggable
&lt;/h3&gt;

&lt;p&gt;When a prompt fails, you get "the AI didn't understand the page." When a program fails, you get a stack trace, a line number, and a selector that changed. One is a mystery. The other is a bug you can fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Programs compose
&lt;/h3&gt;

&lt;p&gt;Prompts are isolated. Each one starts fresh, with no memory of what came before. Programs call other programs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// This tap calls two other taps and combines the results&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;repos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;trending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stars&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;stars&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;me&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;repos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;stars&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Composition is free. No extra tokens. No prompt engineering to maintain context across steps.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Programs have contracts
&lt;/h3&gt;

&lt;p&gt;A prompt returns whatever the AI decides. A program has a health contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="nx"&gt;health&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;min_rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;// must return at least 10 results&lt;/span&gt;
  &lt;span class="nx"&gt;non_empty&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;title&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;// title field can't be empty&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the site changes and the program breaks, the contract catches it immediately. No silent failure. No stale data for days.&lt;/p&gt;
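&lt;p&gt;For intuition, the check behind a contract like this fits in a few lines. An illustrative sketch, not Tap's actual implementation (&lt;code&gt;checkHealth&lt;/code&gt; is a hypothetical name; the field names follow the example above):&lt;/p&gt;

```javascript
// Minimal health-contract check: returns a list of violations.
// An empty list means the run is healthy.
function checkHealth(rows, contract) {
  const violations = [];
  if (contract.min_rows > rows.length) {
    violations.push("expected " + contract.min_rows + " rows, got " + rows.length);
  }
  for (const field of contract.non_empty || []) {
    if (rows.some(r => !r[field])) {
      violations.push("field '" + field + "' is empty in some rows");
    }
  }
  return violations;
}

// A silent failure becomes a loud one:
checkHealth([], { min_rows: 10, non_empty: ["title"] });
// → ["expected 10 rows, got 0"]
```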

&lt;h3&gt;
  
  
  4. Programs version naturally
&lt;/h3&gt;

&lt;p&gt;A &lt;code&gt;.tap.js&lt;/code&gt; file is just JavaScript. It goes in Git. You get diffs, blame, history, rollback. Try doing that with a prompt chain stored in a vector database.&lt;/p&gt;

&lt;h2&gt;
  
  
  When You Do Need AI Again
&lt;/h2&gt;

&lt;p&gt;Programs aren't magic. Websites change. When they do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;tap doctor
github/trending   ✔ ok     25 rows
reddit/hot        ✘ fail   0 rows  — selector changed

&lt;span class="c"&gt;# Doctor detected the break. Re-forge just that one:&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap doctor &lt;span class="nt"&gt;--auto&lt;/span&gt;
Re-forging reddit/hot...
✔ Fixed: reddit/hot.tap.js &lt;span class="o"&gt;(&lt;/span&gt;new selectors&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AI is called once to fix the program. Then it runs at $0 again until the next change. You pay for intelligence only when the world changes — which is 1% of the time, not 100%.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;For a real workload of 50 automations running daily:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Prompt-per-run&lt;/th&gt;
&lt;th&gt;Programs&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Daily AI cost&lt;/td&gt;
&lt;td&gt;$50&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monthly AI cost&lt;/td&gt;
&lt;td&gt;$1,500&lt;/td&gt;
&lt;td&gt;~$5 (occasional re-forge)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Annual AI cost&lt;/td&gt;
&lt;td&gt;$18,000&lt;/td&gt;
&lt;td&gt;~$60&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;60 – 95%&lt;/td&gt;
&lt;td&gt;100% (until site changes)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed per run&lt;/td&gt;
&lt;td&gt;30 – 120s&lt;/td&gt;
&lt;td&gt;1 – 5s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;300x cost reduction. 20x faster. Deterministic. The math doesn't require a sales pitch.&lt;/p&gt;
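&lt;p&gt;The annual figures reduce to one line of arithmetic each, using this post's assumptions of ~$1.00 per agent run and ~$5/month of occasional re-forging:&lt;/p&gt;

```javascript
// Annual figures from the table, derived. Assumptions (from this post):
// 50 daily automations, ~$1.00 per agent run, ~$5/month of re-forging.
const agentAnnual = 50 * 1.00 * 365;  // every run pays for tokens: $18,250
const programAnnual = 5 * 12;         // AI runs only when a site changes: $60
const ratio = Math.round(agentAnnual / programAnnual);  // 304, i.e. the ~300x claim
```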




&lt;p&gt;&lt;strong&gt;Try it:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install (macOS / Linux)&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://taprun.dev/install.sh | sh

&lt;span class="c"&gt;# Forge your first program&lt;/span&gt;
tap forge &lt;span class="s2"&gt;"get top stories from Hacker News"&lt;/span&gt;

&lt;span class="c"&gt;# Run it forever at $0&lt;/span&gt;
tap hackernews hot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://taprun.dev/getting-started.html" rel="noopener noreferrer"&gt;Getting started&lt;/a&gt; · &lt;a href="https://github.com/LeonTing1010/tap" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; · 200+ community taps included&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>automation</category>
      <category>programming</category>
    </item>
    <item>
      <title>Your AI Browser Agent Costs $3,600/month. Here's How to Make It $0</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Sun, 05 Apr 2026 15:08:47 +0000</pubDate>
      <link>https://dev.to/leonting1010/your-ai-browser-agent-costs-3600month-heres-how-to-make-it-0-4228</link>
      <guid>https://dev.to/leonting1010/your-ai-browser-agent-costs-3600month-heres-how-to-make-it-0-4228</guid>
      <description>&lt;p&gt;A developer recently documented burning through &lt;strong&gt;180 million tokens per month&lt;/strong&gt; — $3,600 — running AI browser agents. That's not a typo.&lt;/p&gt;

&lt;p&gt;The browser-use community (78K GitHub stars) is full of users asking the same question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I have a recurring task meant for webscraping to be done every 5 min. I do not want to use too many tokens. Is it possible to repeat the tasks?" — &lt;a href="https://github.com/browser-use/browser-use/discussions/494" rel="noopener noreferrer"&gt;browser-use #494&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"My business scenario requires solidifying the agent's execution process into a tool. I noticed &lt;code&gt;save_as_playwright_script&lt;/code&gt; is commented out." — &lt;a href="https://github.com/browser-use/browser-use/discussions/4519" rel="noopener noreferrer"&gt;browser-use #4519&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;"Running the default task took &lt;strong&gt;12 minutes&lt;/strong&gt; on M3 Max, 36GB RAM" — &lt;a href="https://github.com/browser-use/browser-use/discussions/957" rel="noopener noreferrer"&gt;browser-use #957&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The problem is architectural: &lt;strong&gt;every run uses AI tokens&lt;/strong&gt;, even when you're doing the exact same thing for the 1,000th time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Interpreter vs. Compiler Model
&lt;/h2&gt;

&lt;p&gt;Today's browser agents work like interpreters — AI reasons about every click, every scroll, every form fill, every single time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Interpreter (browser-use, Stagehand, Operator):
  Run 1:    AI reads page → decides action → executes    ($0.01)
  Run 2:    AI reads page → decides action → executes    ($0.01)
  Run 100:  AI reads page → decides action → executes    ($0.01)
  Run 1000: AI reads page → decides action → executes    ($0.01)
  Total: $10.00 (and growing)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But what if AI could &lt;strong&gt;compile&lt;/strong&gt; the workflow once, then replay it forever?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Compiler approach:
  Run 1:    AI inspects page → generates program          ($0.04, one-time)
  Run 2:    Program runs deterministically                 ($0.00)
  Run 100:  Program runs deterministically                 ($0.00)
  Run 1000: Program runs deterministically                 ($0.00)
  Total: $0.04 (forever)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't hypothetical. &lt;a href="https://github.com/LeonTing1010/tap" rel="noopener noreferrer"&gt;Tap&lt;/a&gt; implements this exact pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;forge inspect&lt;/code&gt;&lt;/strong&gt; — Analyzes the page (framework, SSR state, APIs, DOM structure). Zero AI tokens.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI generates a &lt;code&gt;.tap.js&lt;/code&gt; program&lt;/strong&gt; — One-time cost (~$0.04).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;tap run&lt;/code&gt;&lt;/strong&gt; — Executes the program forever. $0.00 per run.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Why API-First Beats DOM Replay
&lt;/h2&gt;

&lt;p&gt;Most record-and-replay tools (including browser-use's &lt;a href="https://github.com/browser-use/workflow-use" rel="noopener noreferrer"&gt;workflow-use&lt;/a&gt;) capture DOM interactions — clicks, typing, scrolling. This breaks when the UI changes.&lt;/p&gt;

&lt;p&gt;The better approach: &lt;strong&gt;extract via API when possible, DOM only as fallback.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Most modern websites have internal APIs (Next.js &lt;code&gt;__NEXT_DATA__&lt;/code&gt;, Nuxt SSR state, REST endpoints). Calling the API directly is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;100x more reliable than simulating clicks&lt;/li&gt;
&lt;li&gt;Immune to UI redesigns&lt;/li&gt;
&lt;li&gt;Faster (no rendering needed)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, getting Hacker News front page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DOM approach (fragile):&lt;/span&gt;
&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.athing&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;forEach&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;row&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;// API approach (robust):&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://hacker-news.firebaseio.com/v0/topstories.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Real Numbers
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;AI Agent (per run)&lt;/th&gt;
&lt;th&gt;Compiled Program (per run)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Cost&lt;/td&gt;
&lt;td&gt;$0.003–0.01&lt;/td&gt;
&lt;td&gt;$0.00&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Speed&lt;/td&gt;
&lt;td&gt;12 min (reported)&lt;/td&gt;
&lt;td&gt;5 seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;Varies (AI hallucinations)&lt;/td&gt;
&lt;td&gt;Deterministic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tokens&lt;/td&gt;
&lt;td&gt;1K–10K per action&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;At 100 runs/day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI agent: &lt;strong&gt;$9–30/month&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Compiled program: &lt;strong&gt;$0.04 total&lt;/strong&gt; (one-time forge cost)&lt;/li&gt;
&lt;/ul&gt;
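&lt;p&gt;Those monthly figures follow directly from the per-run costs in the table:&lt;/p&gt;

```javascript
// Monthly spend at 100 runs/day, using the per-run costs from the table above.
const runsPerMonth = 100 * 30;
const lowEnd = runsPerMonth * 0.003;  // about $9/month
const highEnd = runsPerMonth * 0.01;  // about $30/month
// Compiled program: one forge at $0.04, then $0.00 per run.
```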

&lt;h2&gt;
  
  
  The Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're running the same browser task more than once, you're overpaying by 100–1000x. The future isn't smarter agents — it's agents that are &lt;strong&gt;smart once&lt;/strong&gt; and produce deterministic programs.&lt;/p&gt;

&lt;p&gt;Token prices are falling 10x/year. But $0 will always beat any price.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://github.com/LeonTing1010/tap" rel="noopener noreferrer"&gt;Tap&lt;/a&gt; is open source. 208 pre-built programs across 77 sites. One binary, zero dependencies.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Try it: &lt;a href="https://taprun.dev" rel="noopener noreferrer"&gt;taprun.dev&lt;/a&gt; | &lt;a href="https://github.com/LeonTing1010/tap" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Your Scraper Is Broken Right Now. You Just Don't Know It Yet.</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Sat, 04 Apr 2026 13:18:29 +0000</pubDate>
      <link>https://dev.to/leonting1010/your-scraper-is-broken-right-now-you-just-dont-know-it-yet-38f5</link>
      <guid>https://dev.to/leonting1010/your-scraper-is-broken-right-now-you-just-dont-know-it-yet-38f5</guid>
      <description>&lt;p&gt;Somewhere in your infrastructure, a scraper is returning empty arrays. Your dashboard shows stale numbers. A report your team relies on has been wrong since Tuesday.&lt;/p&gt;

&lt;p&gt;You won't find out until someone complains.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Silent Failure Problem
&lt;/h2&gt;

&lt;p&gt;Most scrapers don't crash loudly. They fail quietly.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Instead of throwing an error when a page structure changes, they return empty arrays... A scraper that fails silently poisons your data for days or weeks before anyone notices."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This happens because scrapers have no &lt;strong&gt;health contract&lt;/strong&gt;. They extract whatever they find — and when the site changes, "whatever they find" is nothing. No error. No alert. Just empty data flowing downstream.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Maintenance Tax
&lt;/h2&gt;

&lt;p&gt;When you do notice, you're back to fixing selectors. Again.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Every time a website redesigns or updates their layout, I'm manually fixing selectors and rewriting parts of the workflow. It's eating up hours every month."&lt;/p&gt;

&lt;p&gt;"Maintaining tests can take up to 50% of the time for QA test automation engineers."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The loop is always the same: build automation → site changes → selectors break → spend hours fixing → repeat.&lt;/p&gt;

&lt;h2&gt;
  
  
  The AI Agent Tax
&lt;/h2&gt;

&lt;p&gt;AI browser agents promise to solve this by re-interpreting the page every run. But they introduce two new problems:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Cost compounds.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"The program cost $1.05 to run. So doing it at any scale quickly becomes a little bit silly."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;2. Reliability degrades at each step.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"If each step has a .95 chance of completing successfully, after not very many steps you have a pretty small overall probability of success."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;95% per step sounds great, but failures compound: a 10-step workflow succeeds only 0.95&lt;sup&gt;10&lt;/sup&gt; ≈ 60% of the time. AI agents trade one problem (brittle selectors) for another (probabilistic failure).&lt;/p&gt;
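&lt;p&gt;The compounding is easy to verify yourself:&lt;/p&gt;

```javascript
// Per-step reliability compounds multiplicatively across a workflow.
function workflowSuccessRate(perStep, steps) {
  return Math.pow(perStep, steps);
}

console.log(workflowSuccessRate(0.95, 10).toFixed(2)); // "0.60"
console.log(workflowSuccessRate(0.95, 20).toFixed(2)); // "0.36"
```

&lt;p&gt;At 20 steps you are already below a coin flip.&lt;/p&gt;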

&lt;h2&gt;
  
  
  A Different Approach: Health Contracts
&lt;/h2&gt;

&lt;p&gt;What if your automation had a &lt;strong&gt;contract&lt;/strong&gt; that defined what "healthy" looks like?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Built into every program&lt;/span&gt;
&lt;span class="nx"&gt;health&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nl"&gt;min_rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;           &lt;span class="c1"&gt;// must return at least 5 results&lt;/span&gt;
  &lt;span class="nx"&gt;non_empty&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;title&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;   &lt;span class="c1"&gt;// "title" field must never be empty&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now instead of silently returning empty arrays, the system &lt;em&gt;knows&lt;/em&gt; when something is wrong:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;tap doctor
&lt;span class="go"&gt;hackernews/hot    ✓ ok     30 rows  (245ms)
google/trends     ✗ fail   0 rows   min_rows: expected ≥5, got 0
github/trending   ✓ ok     25 rows  (1.2s)
bbc/news          ✗ fail   3 rows   min_rows: expected ≥5, got 3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two failures caught. Before your data went bad. Before anyone complained.&lt;/p&gt;

&lt;h2&gt;
  
  
  Watch: Real-Time Change Detection
&lt;/h2&gt;

&lt;p&gt;Health checks catch breakage. But what about legitimate changes?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"I built Site Spy after missing a visa appointment slot because a government page changed and I didn't notice for two weeks."&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;tap watch hackernews hot &lt;span class="nt"&gt;--every&lt;/span&gt; 10m
&lt;span class="go"&gt;2026-04-04T10:00  +added   "Show HN: Tap"  score=342
2026-04-04T10:10  +added   "Rust 2.0 announced"  score=128
2026-04-04T10:10  -removed "Old post fell off"  score=12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run your program on an interval, diff the results, output only what changed. Pipe it to a file, Slack webhook, or another program.&lt;/p&gt;
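&lt;p&gt;The diff step itself is ordinary set arithmetic. A minimal sketch (illustrative only, not Tap's internals), assuming each run returns an array of rows and you pick a key such as the title:&lt;/p&gt;

```javascript
// Key each row, then report rows that appeared or disappeared between runs.
function diffRows(prevRows, nextRows, key) {
  const prevKeys = new Set(prevRows.map(key));
  const nextKeys = new Set(nextRows.map(key));
  return {
    added: nextRows.filter((r) => !prevKeys.has(key(r))),
    removed: prevRows.filter((r) => !nextKeys.has(key(r))),
  };
}
```

&lt;p&gt;Anything in &lt;code&gt;added&lt;/code&gt; or &lt;code&gt;removed&lt;/code&gt; becomes a log line, a webhook payload, or input to the next program in the pipe.&lt;/p&gt;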

&lt;h2&gt;
  
  
  The Self-Healing Loop
&lt;/h2&gt;

&lt;p&gt;Put it all together:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. AI writes a deterministic program once&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap forge &lt;span class="s2"&gt;"scrape Hacker News top stories"&lt;/span&gt;
✓ Saved: hackernews/hot.tap.js

&lt;span class="c"&gt;# 2. Run forever at $0&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap hackernews hot
30 rows &lt;span class="o"&gt;(&lt;/span&gt;245ms&lt;span class="o"&gt;)&lt;/span&gt; Cost: &lt;span class="nv"&gt;$0&lt;/span&gt;.00

&lt;span class="c"&gt;# 3. Watch for data changes&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap watch hackernews hot &lt;span class="nt"&gt;--every&lt;/span&gt; 1h

&lt;span class="c"&gt;# 4. Daily health check&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap doctor &lt;span class="nt"&gt;--schedule&lt;/span&gt; &lt;span class="s2"&gt;"0 6 * * *"&lt;/span&gt;

&lt;span class="c"&gt;# 5. Auto-heal when something breaks&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;tap doctor &lt;span class="nt"&gt;--auto&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The loop: &lt;strong&gt;forge → run → watch → doctor → heal → run&lt;/strong&gt;. You sleep. Your automations don't stop.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;The key insight: &lt;strong&gt;AI should run at authoring time, not at runtime.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forge&lt;/strong&gt; uses AI once to write a deterministic program&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run&lt;/strong&gt; executes it with zero AI — $0 per execution, 100% deterministic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Doctor&lt;/strong&gt; detects breakage via health contracts — no AI needed&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Heal&lt;/strong&gt; re-invokes AI only when the site actually changes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;99% of runs need zero AI. You only pay for intelligence when the world changes.&lt;/p&gt;
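&lt;p&gt;The heal-on-failure policy is the whole trick. A sketch of the control flow (with hypothetical &lt;code&gt;healthy&lt;/code&gt; and &lt;code&gt;forgeAgain&lt;/code&gt; helpers, assumed here for illustration):&lt;/p&gt;

```javascript
// AI is invoked only when the health contract fails, so the
// steady state is deterministic replay at zero token cost.
async function runWithHealing(tap, forgeAgain) {
  const rows = await tap.run();         // deterministic replay, $0
  if (tap.healthy(rows)) return rows;   // the common case: contract holds
  const healed = await forgeAgain(tap); // AI re-forges only on breakage
  return healed.run();
}
```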




&lt;p&gt;&lt;a href="https://github.com/LeonTing1010/tap" rel="noopener noreferrer"&gt;Tap&lt;/a&gt; is open source. 195+ pre-built automations included. &lt;a href="https://taprun.dev/getting-started.html" rel="noopener noreferrer"&gt;Getting started takes 2 minutes&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://taprun.dev/blog/your-scraper-is-broken.html" rel="noopener noreferrer"&gt;taprun.dev/blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>automation</category>
      <category>ai</category>
      <category>scraping</category>
    </item>
    <item>
      <title>Programs Beat Prompts: How Tap Turns AI into a Compiler for Browser Automation</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Thu, 02 Apr 2026 07:06:51 +0000</pubDate>
      <link>https://dev.to/leonting1010/programs-beat-prompts-how-tap-turns-ai-into-a-compiler-for-browser-automation-oab</link>
      <guid>https://dev.to/leonting1010/programs-beat-prompts-how-tap-turns-ai-into-a-compiler-for-browser-automation-oab</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Every time you ask an AI agent to do something in a browser, it costs money and time. Click here, type there, extract that — the AI figures it out from scratch every single time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if AI only had to figure it out once?&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Tap: The Compiler Approach
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://leonting1010.github.io/tap/" rel="noopener noreferrer"&gt;Tap&lt;/a&gt; is a protocol + toolchain that turns AI's interface operations into deterministic programs (&lt;code&gt;.tap.js&lt;/code&gt; files):&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Forge&lt;/strong&gt; — AI observes the page (network, DOM, a11y tree) and writes a tap program&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; — Test the tap with different inputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Run forever&lt;/strong&gt; — The tap replays deterministically. Zero AI cost.
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;First run:  AI inspects → writes .tap.js     ($0.50)
Every run:  .tap.js replays deterministically ($0.00)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;8 core operations + 17 built-in operations = a complete browser-control protocol.&lt;/p&gt;

&lt;p&gt;A tap program is plain JavaScript:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;site&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;github&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;trending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;tap&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;tap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nav&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://github.com/trending&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;tap&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;eval&lt;/span&gt;&lt;span class="p"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[...&lt;/span&gt;&lt;span class="nb"&gt;document&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelectorAll&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;article.Box-row&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)].&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;el&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;h2 a&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="na"&gt;stars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;el&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;querySelector&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;.octicon-star&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)?.&lt;/span&gt;&lt;span class="nx"&gt;parentElement&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;textContent&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
      &lt;span class="p"&gt;}))&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Multi-Runtime
&lt;/h2&gt;

&lt;p&gt;The same tap runs on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chrome Extension&lt;/strong&gt; — Uses &lt;code&gt;chrome.scripting&lt;/code&gt; (undetectable)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Playwright&lt;/strong&gt; — Headless capable, CI/CD friendly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;macOS&lt;/strong&gt; — Native apps via Accessibility API&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A new runtime implements the 8 core methods and gets the 17 built-in operations for free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Community Skills
&lt;/h2&gt;

&lt;p&gt;119 community skills across 55 sites — &lt;a href="https://github.com/LeonTing1010/tap-skills" rel="noopener noreferrer"&gt;tap-skills&lt;/a&gt; (open source).&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Not Just Use Playwright?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Playwright&lt;/th&gt;
&lt;th&gt;Tap&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Who writes scripts?&lt;/td&gt;
&lt;td&gt;You&lt;/td&gt;
&lt;td&gt;AI forges them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost per run&lt;/td&gt;
&lt;td&gt;$0 (manual scripting)&lt;/td&gt;
&lt;td&gt;$0.00 (deterministic)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runtimes&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;3+ (Chrome, Playwright, macOS)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Community scripts&lt;/td&gt;
&lt;td&gt;No ecosystem&lt;/td&gt;
&lt;td&gt;119 skills&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Get Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://leonting1010.github.io/tap/install.sh | sh
tap github trending
tap hackernews hot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Homepage: &lt;a href="https://leonting1010.github.io/tap/" rel="noopener noreferrer"&gt;leonting1010.github.io/tap&lt;/a&gt;&lt;br&gt;
Skills (open source): &lt;a href="https://github.com/LeonTing1010/tap-skills" rel="noopener noreferrer"&gt;github.com/LeonTing1010/tap-skills&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Programs beat prompts. AI forges once, programs run forever.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Try it: &lt;a href="https://taprun.dev" rel="noopener noreferrer"&gt;taprun.dev&lt;/a&gt; | &lt;a href="https://github.com/LeonTing1010/tap" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
    </item>
    <item>
      <title>Programs Beat Prompts: AI Forges Deterministic Interface Programs That Run Forever</title>
      <dc:creator>Leon</dc:creator>
      <pubDate>Wed, 01 Apr 2026 08:20:35 +0000</pubDate>
      <link>https://dev.to/leonting1010/programs-beat-prompts-ai-forges-deterministic-interface-programs-that-run-forever-j9e</link>
      <guid>https://dev.to/leonting1010/programs-beat-prompts-ai-forges-deterministic-interface-programs-that-run-forever-j9e</guid>
      <description>&lt;p&gt;Every time my AI agent automated a website interaction, it was burning tokens to solve the same problem it had already solved last run. Find the API. Locate the selector. Compose the steps. Re-solved, re-paid, every single time.&lt;/p&gt;

&lt;p&gt;I built &lt;strong&gt;Tap&lt;/strong&gt; to fix this. The core idea: &lt;strong&gt;operating an interface is a solved problem the moment you figure out how.&lt;/strong&gt; So separate the figuring-out (AI's job, done once) from the executing (a deterministic program's job, done forever).&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradigm: Forging
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;forge_inspect → forge_verify → forge_save → tap.run
    AI analyzes    AI tests      AI saves     runs forever, zero AI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;AI analyzes the live site, creates a &lt;code&gt;.tap.js&lt;/code&gt; file, and that file runs forever — no LLM calls, no prompts, no API keys. Runs in &amp;lt;1s. Returns structured data. Same result every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost model: ~$0.50 in tokens to forge. Then $0.00 at runtime, forever.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Protocol
&lt;/h2&gt;

&lt;p&gt;Tap defines a minimal, complete contract for interface automation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;8 core operations&lt;/strong&gt; (irreducible atoms):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;eval · pointer · keyboard · nav · wait · screenshot · run · capabilities
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;17 built-in operations&lt;/strong&gt; (composed from core):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;click · type · fill · hover · scroll · pressKey · select · upload · dialog
fetch · find · cookies · download · waitFor · waitForNetwork · ssrState · storage
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;8 + 17 = every interaction a human can perform on any interface.&lt;/p&gt;

&lt;p&gt;A new runtime implements 8 methods → gets 17 built-ins free. Write a tap once, run it on Chrome, Playwright, macOS native apps — same protocol.&lt;/p&gt;
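&lt;p&gt;To see why the built-ins come for free, here is how a &lt;code&gt;click&lt;/code&gt; could be composed from just the core &lt;code&gt;eval&lt;/code&gt; and &lt;code&gt;pointer&lt;/code&gt; ops. This is an illustrative sketch with assumed signatures, not Tap's actual source:&lt;/p&gt;

```javascript
// A built-in composed purely from core ops: any runtime that supplies
// `eval` and `pointer` inherits this behaviour without extra code.
async function click(core, selector) {
  // Locate the element's centre via the core `eval` op
  // (passing an argument into eval is assumed here)...
  const box = await core.eval((sel) => {
    const r = document.querySelector(sel).getBoundingClientRect();
    return { x: r.x + r.width / 2, y: r.y + r.height / 2 };
  }, selector);
  // ...then press at that point through the core `pointer` op.
  return core.pointer({ type: "click", x: box.x, y: box.y });
}
```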

&lt;h2&gt;
  
  
  106 Ready-to-Use Skills
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/LeonTing1010/tap/master/install.sh | sh
tap update  &lt;span class="c"&gt;# Pulls 106 skills across 50 sites&lt;/span&gt;

&lt;span class="c"&gt;# Use&lt;/span&gt;
tap github trending &lt;span class="nt"&gt;--limit&lt;/span&gt; 5
tap hackernews hot
tap xiaohongshu hot

&lt;span class="c"&gt;# Compose with Unix pipes&lt;/span&gt;
tap github trending | tap tap/filter &lt;span class="nt"&gt;--field&lt;/span&gt; stars &lt;span class="nt"&gt;--gt&lt;/span&gt; 1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Skills cover GitHub, Reddit, YouTube, Hacker News, X/Twitter, Medium, arXiv, Bilibili, Zhihu, Xiaohongshu, Weibo, and 40+ more sites.&lt;/p&gt;
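&lt;p&gt;The pipe in the compose example works because taps read and write JSON rows on stdin/stdout, so a filter stage reduces to a predicate. A sketch of the core of such a stage (hypothetical helper, not the real &lt;code&gt;tap/filter&lt;/code&gt; source):&lt;/p&gt;

```javascript
// Keep rows whose numeric field exceeds a threshold, e.g.
// --field stars --gt 1000 on the rows piped in from the previous tap.
function filterRows(rows, field, gt) {
  return rows.filter((r) => Number(r[field]) > gt);
}
```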

&lt;h2&gt;
  
  
  What a .tap.js looks like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// API-first: fetch data directly&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;site&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hackernews&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;hot&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Hacker News top stories&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;health&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;min_rows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;non_empty&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;title&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;

  &lt;span class="na"&gt;extract&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://hacker-news.firebaseio.com/v0/topstories.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;ids&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
      &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`https://hacker-news.firebaseio.com/v0/item/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.json`&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;then&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;r&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;score&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;i&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;by&lt;/span&gt; &lt;span class="p"&gt;}))&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Pure JavaScript. Zero AI at runtime. The &lt;code&gt;health&lt;/code&gt; contract means &lt;code&gt;tap doctor&lt;/code&gt; can self-verify this tap automatically.&lt;/p&gt;
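&lt;p&gt;Checking that contract needs no AI at all. A sketch of what a doctor-style check could look like, assuming the &lt;code&gt;{ min_rows, non_empty }&lt;/code&gt; shape shown above (illustrative, not Tap's actual implementation):&lt;/p&gt;

```javascript
// Evaluate a health contract against one run's rows and
// return a list of human-readable failures (empty = healthy).
function checkHealth(rows, health) {
  const failures = [];
  if (health.min_rows > rows.length) {
    failures.push(`min_rows: expected ≥${health.min_rows}, got ${rows.length}`);
  }
  for (const field of health.non_empty) {
    if (rows.some((r) => !r[field])) {
      failures.push(`non_empty: "${field}" has empty values`);
    }
  }
  return failures;
}
```

&lt;p&gt;An empty list means the tap is healthy; a non-empty list is the kind of thing &lt;code&gt;tap doctor&lt;/code&gt; can surface per tap.&lt;/p&gt;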

&lt;h2&gt;
  
  
  MCP Native
&lt;/h2&gt;

&lt;p&gt;Works with Claude Code, Cursor, Windsurf, and any MCP-compatible agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tap"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"mcp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;43 tools across 6 categories — run taps, forge new ones, inspect pages, intercept network traffic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Security
&lt;/h2&gt;

&lt;p&gt;Community taps are untrusted code, so Tap defends in three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Sandbox&lt;/strong&gt; — Deno Worker, zero permissions (no filesystem, network, env)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Static analysis&lt;/strong&gt; — 7 CI checks on every PR to tap-skills&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Data isolation&lt;/strong&gt; — secrets and sessions never leave your machine&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  How It Compares
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Tap&lt;/th&gt;
&lt;th&gt;Browser-Use / Stagehand&lt;/th&gt;
&lt;th&gt;Playwright&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;AI at runtime&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No (forge once)&lt;/td&gt;
&lt;td&gt;Yes (every step)&lt;/td&gt;
&lt;td&gt;No (manual scripts)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Detection risk&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Undetectable&lt;/td&gt;
&lt;td&gt;Detectable (CDP)&lt;/td&gt;
&lt;td&gt;Detectable (headless)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost model&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~$0.50 once, then $0&lt;/td&gt;
&lt;td&gt;Tokens per session&lt;/td&gt;
&lt;td&gt;Free (manual)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reusable artifacts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;.tap.js (shareable)&lt;/td&gt;
&lt;td&gt;None (ephemeral)&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Skills ecosystem&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;106 across 50 sites&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  GitHub
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/LeonTing1010/tap" rel="noopener noreferrer"&gt;https://github.com/LeonTing1010/tap&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;106 skills, 3 runtimes, ~2,000 lines of code on Deno, zero dependencies.&lt;/p&gt;




&lt;p&gt;Try it: &lt;a href="https://taprun.dev" rel="noopener noreferrer"&gt;taprun.dev&lt;/a&gt; | &lt;a href="https://github.com/LeonTing1010/tap" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
    </item>
  </channel>
</rss>
