lipski

I built a browser daemon for AI coding agents because Playwright wasn't enough

The problem

I use Claude Code on Windows — Anthropic's coding-agent CLI that runs alongside my editor. It drives my codebase through a pipeline of exec-style tool calls.

One day I wanted Claude to open a web UI, log in to my account on a SaaS dashboard, export a CSV, and diff it against yesterday's. Simple task. I reached for Playwright.

It didn't work. Here's why.

Claude Code's Bash tool can start a long-running process with a run_in_background flag and read its stdout later. But it cannot write to that process's stdin after launch. So any stdin-based REPL (Playwright's Inspector, a Node readline loop, anything interactive) dies the moment the first tool call returns. The browser session closes, the cookies are gone, and the next Claude turn has to log in again.

I needed a persistent browser. One login. Many turns. Live for days.

The architecture

The solution is obvious in hindsight: HTTP-RPC. Instead of a stdin REPL, run the browser-holding daemon as a small HTTP server bound to 127.0.0.1 on a local port, and have each agent tool call fire a one-shot CLI client that POSTs a JSON command to it.


The daemon binds to an ephemeral port (server.listen(0)), writes the chosen port into a lockfile at %LOCALAPPDATA%\NativeWright\daemon.json (or the platform equivalent), and handles POST /cmd requests with a minimal command table — goto, click, fill, type, shot, etc.
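A minimal sketch of that daemon loop, not NativeWright's actual source. The request shape ({ cmd, args }), the lockfile fields (port, pid), and the function names parseCommand and startDaemon are all assumptions made for illustration; only the ephemeral port, the lockfile, and POST /cmd come from the description above.

```javascript
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Parse and validate one request body against the command table.
// The { cmd, args } shape is an assumed wire format.
function parseCommand(rawBody, table) {
  let msg;
  try {
    msg = JSON.parse(rawBody);
  } catch {
    return { error: 'invalid JSON' };
  }
  if (!msg || typeof msg.cmd !== 'string' || !Object.hasOwn(table, msg.cmd)) {
    return { error: 'unknown command' };
  }
  return { name: msg.cmd, args: Array.isArray(msg.args) ? msg.args : [] };
}

// Start the HTTP server on an ephemeral loopback port and record it in a lockfile.
function startDaemon(table, lockfile) {
  const server = http.createServer((req, res) => {
    if (req.method !== 'POST' || req.url !== '/cmd') {
      res.writeHead(404).end();
      return;
    }
    let body = '';
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', async () => {
      const parsed = parseCommand(body, table);
      let status = 200;
      let payload;
      if (parsed.error) {
        status = 400;
        payload = { ok: false, error: parsed.error };
      } else {
        try {
          payload = { ok: true, result: await table[parsed.name](...parsed.args) };
        } catch (err) {
          status = 500;
          payload = { ok: false, error: String(err && err.message) };
        }
      }
      res.writeHead(status, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(payload));
    });
  });
  // Port 0 asks the OS for any free port; the client finds it via the lockfile.
  server.listen(0, '127.0.0.1', () => {
    const { port } = server.address();
    fs.mkdirSync(path.dirname(lockfile), { recursive: true });
    fs.writeFileSync(lockfile, JSON.stringify({ port, pid: process.pid }));
  });
  return server;
}
```

In this sketch the command table would be a plain object such as { goto: (url) => page.goto(url) }, with page coming from the persistent browser context the daemon holds open.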

The CLI client reads the lockfile to find the port, posts the command as JSON, prints the response, and exits. Each agent tool call is a one-shot. The browser stays.
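The client side can be sketched the same way. This assumes Node 18+ (for the global fetch), the same hypothetical { cmd, args } payload, and a lockfile containing a port field; buildPayload and sendCommand are illustrative names, not NativeWright's API.

```javascript
const fs = require('node:fs');

// Turn CLI arguments like ['goto', 'https://example.com'] into a JSON payload.
function buildPayload(argv) {
  const [cmd, ...args] = argv;
  if (!cmd) throw new Error('usage: <command> [args...]');
  return { cmd, args };
}

// Read the port from the lockfile, POST one command, and return the parsed reply.
async function sendCommand(lockfile, argv) {
  const { port } = JSON.parse(fs.readFileSync(lockfile, 'utf8'));
  const res = await fetch(`http://127.0.0.1:${port}/cmd`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildPayload(argv)),
  });
  return res.json();
}
```

A one-shot entry point would call sendCommand(lockfile, process.argv.slice(2)), print the result, and exit, which is what keeps each agent tool call short-lived while the daemon keeps the browser.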

Why Patchright, not Playwright

My target sites include Google (Gemini), LinkedIn, ChatGPT — anything with non-trivial anti-bot detection. Vanilla Playwright gets flagged immediately: navigator.webdriver === true, weird Runtime.enable CDP events leaking into the page, obvious CLI flags like --enable-automation.

Patchright is a drop-in Playwright fork with the detection vectors patched out. Chromium-only, maintained by the community. Same API as Playwright. I just require it instead.

const { chromium } = require('patchright');

const context = await chromium.launchPersistentContext(userDataDir, {
  channel: 'chrome',
  headless: false,
  viewport: null,
  acceptDownloads: true,
});

That's the entire browser setup. One call.

Why a human-behavior layer

A stealth-patched browser is half the story. The other half is input.

Modern anti-bot systems don't just fingerprint the browser; they fingerprint the input stream. Mouse events at exact pixel coordinates with zero jitter. Keystrokes at uniform 50 ms intervals. Scroll events with identical deltas. Login forms filled in 5 ms flat. These are all easy tells.

A sophisticated stealth-patched browser with robotic input is worse than vanilla Playwright with patient human input. The input layer has to match the browser layer.

So I wrote src/human.js. It's ~500 lines, has zero dependencies beyond Node built-ins, and runs entirely on Patchright's existing page.mouse.* and page.keyboard.* primitives. Every interaction command in NativeWright — click, fill, type, hover, press, scroll — routes through it by default. Opt out per-call with --raw=true when you need robotic precision (invisible elements, programmatic fills, scripted logins where slow typing would itself look suspicious).

Here's what it covers:

Mouse paths

Real human mouse movement is curved (your wrist rotates, your arm has momentum) and non-uniform in velocity (slow start, fast middle, slow precise arrival). NativeWright generates cubic Bézier paths between two points:

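// buildControlPoints, pickEasing, bezier and gaussian are helpers from src/human.js.
// steps and jitter were not declared in the original excerpt; the defaults
// below are illustrative placeholders, not NativeWright's real values.
function samplePath(from, to, rng, { steps = 30, jitter = 0.8 } = {}) {
  const { p1, p2 } = buildControlPoints(from, to, rng);
  const easing = pickEasing(rng); // one velocity profile per move
  const path = [];
  for (let i = 1; i <= steps; i++) {
    const t = easing(i / steps); // eased progress along the curve
    const pt = bezier(from, p1, p2, to, t);
    pt.x += gaussian(rng) * jitter; // per-sample Gaussian jitter, in px
    pt.y += gaussian(rng) * jitter;
    path.push(pt);
  }
  return path;
}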

Each point gets per-sample Gaussian jitter. One of three easing curves (ease-in-out, ease-out, near-linear) is picked at random per move, so detectors can't fingerprint a single velocity profile. Longer moves have a ~20% chance of overshoot-and-correct (fling past, pause, snap back), which is what real humans do with a mouse they're flinging across the screen.
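samplePath leans on four helpers that the excerpt doesn't show. Here is a rough sketch of what they could look like. The bow factor, the exact easing formulas, and the Box-Muller sampler are my assumptions for illustration, not the code in src/human.js; rng is assumed to be a function returning a uniform number in [0, 1).

```javascript
// Point on a cubic Bézier curve at parameter t in [0, 1].
function bezier(p0, p1, p2, p3, t) {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rng) {
  const u1 = 1 - rng(); // shift to (0, 1] so Math.log never sees 0
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Three velocity profiles; each maps 0 -> 0 and 1 -> 1.
const EASINGS = [
  (t) => (t < 0.5 ? 2 * t * t : 1 - Math.pow(-2 * t + 2, 2) / 2), // ease-in-out
  (t) => 1 - (1 - t) * (1 - t),                                   // ease-out
  (t) => 0.9 * t + 0.1 * t * t,                                   // near-linear
];

function pickEasing(rng) {
  return EASINGS[Math.floor(rng() * EASINGS.length)];
}

// Place both control points off the straight line, on the same side,
// so the path bows like an arc. The 0.3 bow factor is an assumption.
function buildControlPoints(from, to, rng) {
  const dx = to.x - from.x;
  const dy = to.y - from.y;
  const len = Math.hypot(dx, dy) || 1;
  const nx = -dy / len; // unit normal to the direct line
  const ny = dx / len;
  const bow = (rng() - 0.5) * 0.3 * len;
  return {
    p1: { x: from.x + dx / 3 + nx * bow, y: from.y + dy / 3 + ny * bow },
    p2: { x: from.x + (2 * dx) / 3 + nx * bow, y: from.y + (2 * dy) / 3 + ny * bow },
  };
}
```

Passing rng in explicitly, rather than calling Math.random directly, makes the paths reproducible from a seed when debugging.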

Fitts-law timing

The total duration of a mouse move scales with distance and target width per Fitts's law:

duration ≈ 80 + 90 * log2(distance / width + 1) ms (log-normal-jittered)

Tiny distant targets take longer. Big close targets are fast. Same asymmetry real users have.
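The formula above translates almost directly into code. A sketch, assuming rng returns a uniform number in [0, 1) and using a hypothetical sigma of 0.2 for the log-normal jitter (the real spread isn't stated here):

```javascript
// Standard normal sample via the Box-Muller transform.
function gaussian(rng) {
  const u1 = 1 - rng(); // shift to (0, 1] so Math.log never sees 0
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Fitts-style move duration in ms: 80 + 90 * log2(distance / width + 1),
// multiplied by a log-normal factor. sigma = 0.2 is an assumed value.
function moveDurationMs(distance, width, rng = Math.random, sigma = 0.2) {
  const base = 80 + 90 * Math.log2(distance / width + 1);
  return base * Math.exp(sigma * gaussian(rng));
}
```

For a 300 px move onto a 100 px wide target, the base term is 80 + 90 * log2(4) = 260 ms before jitter; a 700 px move onto the same target gives 80 + 90 * log2(8) = 350 ms.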

Keystroke cadence

Inter-key delays follow a log-normal distribution, not a uniform one. The per-character base mean varies:

function charBaseMs(ch) {
  if (ch === ' ') return 140;
  if (/[.,!?;:]/.test(ch)) return 170;
  if (/[0-9]/.test(ch)) return 130;
  if (/[A-Z]/.test(ch)) return 150;  // Shift hold
  if (/[a-z]/.test(ch)) return 95;
  return 140;
}

Post-space and post-punctuation pauses are extended (cognitive break). Occasional mid-sentence hesitations of 700-2200ms simulate thought. Per-key down-up dwell times are independently log-normal.
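To make the cadence concrete, here is a sketch of how a per-key delay could be drawn from charBaseMs. The 1.4x post-space/post-punctuation multiplier and sigma of 0.35 are assumed values, and keyDelayMs is an illustrative name rather than the function in src/human.js.

```javascript
// Per-character base delay in ms (copied from the excerpt above).
function charBaseMs(ch) {
  if (ch === ' ') return 140;
  if (/[.,!?;:]/.test(ch)) return 170;
  if (/[0-9]/.test(ch)) return 130;
  if (/[A-Z]/.test(ch)) return 150; // Shift hold
  if (/[a-z]/.test(ch)) return 95;
  return 140;
}

// Standard normal sample via the Box-Muller transform.
function gaussian(rng) {
  const u1 = 1 - rng(); // shift to (0, 1] so Math.log never sees 0
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// Delay before typing `ch`, given the previously typed character `prev`.
// The base value acts as the median of the log-normal distribution.
function keyDelayMs(ch, prev, rng = Math.random, sigma = 0.35) {
  let base = charBaseMs(ch);
  if (prev === ' ' || /[.,!?;:]/.test(prev)) {
    base *= 1.4; // assumed "cognitive break" multiplier
  }
  return base * Math.exp(sigma * gaussian(rng));
}
```

A log-normal factor keeps every delay positive and gives the long right tail that real typing has: most keys land near the base value, and a few take much longer.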

Bonus: a typo simulator inserts the wrong neighboring QWERTY key ~0.8% of the time, pauses (the "noticing" beat), presses Backspace, corrects. Auto-disabled for password/OTP/secret/CVV fields via attribute sniffing — no one wants a typo in their API token.
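A sketch of how the typo step and the sensitive-field check might fit together. The neighbor computation from staggered QWERTY rows, the attribute list (type, name, id, autocomplete), and the names neighbors, planKeys and isSensitiveField are assumptions for illustration; only the ~0.8% rate, the Backspace correction, and the password/OTP/secret/CVV exclusion come from the description above.

```javascript
const ROWS = ['qwertyuiop', 'asdfghjkl', 'zxcvbnm'];

// Physically adjacent letter keys, accounting for the row stagger.
function neighbors(ch) {
  if (ch.length !== 1) return [];
  const r = ROWS.findIndex((row) => row.includes(ch));
  if (r === -1) return [];
  const i = ROWS[r].indexOf(ch);
  const out = [ROWS[r][i - 1], ROWS[r][i + 1]];
  if (r > 0) out.push(ROWS[r - 1][i], ROWS[r - 1][i + 1]); // row above
  if (r < 2) out.push(ROWS[r + 1][i - 1], ROWS[r + 1][i]); // row below
  return out.filter(Boolean); // drop off-keyboard positions
}

// Key sequence to press for one intended character.
function planKeys(ch, rng, { rate = 0.008, sensitive = false } = {}) {
  const near = neighbors(ch);
  if (sensitive || near.length === 0 || rng() >= rate) return [ch];
  const wrong = near[Math.floor(rng() * near.length)];
  return [wrong, 'Backspace', ch]; // the caller adds the "noticing" pause
}

// Attribute sniffing: the attribute names checked here are assumed.
const SENSITIVE = /password|otp|one-time-code|secret|cvv/i;
function isSensitiveField(attrs) {
  return ['type', 'name', 'id', 'autocomplete'].some((k) => SENSITIVE.test(attrs[k] || ''));
}
```

Uppercase letters, digits, and punctuation have no entry in ROWS, so in this sketch they are always typed correctly.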

Scroll physics

Scrolling goes through page.mouse.wheel() with variable tick magnitude (60-180 px), log-normal inter-tick gaps, and occasional longer "reading" pauses. Edge detection polls window.scrollY and stops after two consecutive ticks without progress, so there is no infinite wheeling on short pages.
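The edge-detection part is the piece most worth seeing in code. A sketch for downward scrolling only, assuming a Playwright-style page object; the uniform 40-160 ms gap is a simplification of the log-normal gaps described above, and tickPx, makeStallDetector and humanScroll are illustrative names.

```javascript
// Variable tick size in the 60-180 px range.
function tickPx(rng) {
  return 60 + rng() * 120;
}

// Returns a function that reports true once `limit` consecutive
// scrollY readings show no change from the previous reading.
function makeStallDetector(limit = 2) {
  let last = null;
  let stalled = 0;
  return (scrollY) => {
    stalled = scrollY === last ? stalled + 1 : 0;
    last = scrollY;
    return stalled >= limit;
  };
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Scroll down by roughly totalPx, stopping early at the bottom of the page.
async function humanScroll(page, totalPx, rng = Math.random) {
  const isStalled = makeStallDetector(2);
  isStalled(await page.evaluate(() => window.scrollY)); // baseline reading
  let remaining = totalPx;
  while (remaining > 0) {
    const delta = Math.min(tickPx(rng), remaining);
    await page.mouse.wheel(0, delta);
    await sleep(40 + rng() * 120); // assumed gap, simplified to uniform
    remaining -= delta;
    if (isStalled(await page.evaluate(() => window.scrollY))) break;
  }
}
```

Keeping the stall check as a small stateful function makes it easy to test without a browser.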

Tradeoffs I accepted

Stealth-patched Chrome runs with --disable-blink-features=AutomationControlled to hide navigator.webdriver. Newer Chrome versions show a yellow "unsupported flag" warning bar when this flag is set. I tried stripping the flag, and webdriver immediately becomes visible. For navigator.webdriver, that flag is the entire stealth. The warning is visible only to the human looking at the Chrome window; pages inside can't read it.

Patchright also includes --no-sandbox in its default args. That's a bot telltale, and it triggers its own yellow warning bar. I strip it by default on Windows, macOS, and non-root Linux (where the kernel sandbox works fine without it). It stays on root Linux (Docker, CI), where Chromium refuses to boot without it.
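One way to express that rule is through Playwright's ignoreDefaultArgs launch option, which Patchright should accept since it keeps the Playwright API. This is a sketch of the decision logic only, and sandboxOverrides is a hypothetical helper name, not necessarily how NativeWright does it.

```javascript
// Launch-option overrides that drop --no-sandbox everywhere except root Linux.
// process.getuid does not exist on Windows, hence the guard.
function sandboxOverrides(
  platform = process.platform,
  uid = typeof process.getuid === 'function' ? process.getuid() : undefined,
) {
  const rootLinux = platform === 'linux' && uid === 0;
  // On root Linux, keep the default args so Chromium can start.
  return rootLinux ? {} : { ignoreDefaultArgs: ['--no-sandbox'] };
}
```

The result can be spread into the launch call, for example chromium.launchPersistentContext(userDataDir, { channel: 'chrome', headless: false, ...sandboxOverrides() }).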

How agents actually use it

After installing (npm install -g nativewright && npx patchright install chrome), an agent workflow looks like:

# Check if daemon running
nativewright status

# Start (in background)
nativewright start &
nativewright wait-ready --timeout=30000

# Drive
nativewright goto https://gemini.google.com/app
nativewright type ".ql-editor.textarea" "generate a sunset photo"
nativewright press Enter
nativewright wait-for "img[alt*=generated i]"
nativewright click "button[aria-label='Download full size image']"

# Stop when done (flush cookies to disk)
nativewright stop

The daemon preserves state between commands. The human-behavior layer kicks in for type, press, and click. The next agent turn, even an hour later, gets the same profile, same login, same cookies.

What's in the package

  1. Node.js daemon + CLI + REPL (one binary, three modes)
  2. 30 browser-automation commands
  3. Two Claude Code skills pre-packaged in claude-skills/
  4. Cross-platform: Windows, macOS, Linux
  5. Zero runtime deps beyond patchright
  6. CI-tested on all three OSes
  7. 47 KB npm package, 149 KB unpacked
  8. Apache-2.0

Links

If any of this sounds useful, a ⭐ on the repo means a lot. PRs welcome.