Aditya Mitra

Posted on May 22

I Tested 5 Stealth Browsers Against Bot Detectors — Here's the Ranking

#webdev #opensource #tooling #buildinpublic

Basically, I made Claude Code run 5 browsers that claim to be undetectable through 20 different bot-detection tests, then studied the report on how each one performed.

I'm going to cover only the fun parts in this post. The full report, with every browser's results and the scripts used, is available in this GitHub gist.

How It Worked

I gave Claude Code a task list of 20 bot-detection checks, starting easy (the navigator.webdriver flag) and escalating all the way up to live Cloudflare challenges, behavioral mouse tracking, and CreepJS trust scores. The agent installed each browser from its GitHub release and drove it through every task autonomously. Same tests, same URLs, no human in the loop.

The Leaderboard

Rank	Browser	Score	Hardest Task Cleared
🥇 1	Patchwright	18 / 20	Composite honeypot + realistic keystroke timing
🥈 2	CloakBrowser	14 + 3 partial	Live Cloudflare Turnstile — no challenge page
🥉 3	Camoufox	13 / 20	BrowserScan "Normal", all sub-checks green
4	Lightpanda	11 / 20	Honeypot form + realistic keystroke timing
5	Obscura	10 + 4 unknown	BrowserScan "Normal"

The Fun Parts

🥇 Patchwright — 18/20

Patchwright demolished the list. JS flags, headless detection, canvas/audio fingerprints, behavioral mouse simulation, TLS/JA3, live honeypot forms — all green.

The two failures? Both had the exact same cause: no GPU on my test server. WebGL leaked SwiftShader, which is basically a neon sign that says "this is running in a container." Patchwright couldn't hide it because no browser can. It's a hardware gap, not a software one.
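To make the SwiftShader leak concrete, here is a minimal sketch of the kind of check a detector could run on the unmasked WebGL renderer string. The function name, the hint list, and the sample strings are illustrative assumptions, not taken from the test report or from any specific detector.

```python
# Substrings that usually indicate a software rasterizer.
# Assumed, non-exhaustive list for illustration only.
SOFTWARE_RENDERER_HINTS = ("swiftshader", "llvmpipe", "softpipe")


def looks_software_rendered(unmasked_renderer: str) -> bool:
    """Return True if the WebGL renderer string names a known software rasterizer."""
    lowered = unmasked_renderer.lower()
    return any(hint in lowered for hint in SOFTWARE_RENDERER_HINTS)


if __name__ == "__main__":
    # Illustrative renderer strings, not captured from the actual test run.
    headless = "ANGLE (Google, Vulkan 1.3.0 (SwiftShader Device (Subzero)), SwiftShader driver)"
    desktop = "ANGLE (NVIDIA, NVIDIA GeForce RTX 3060 Direct3D11 vs_5_0 ps_5_0, D3D11)"
    print(looks_software_rendered(headless))
    print(looks_software_rendered(desktop))
```

A check this simple is why the gap is hard to paper over in software: the renderer string comes from the graphics stack itself.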

🥈 CloakBrowser — 14 + 3 partial

CloakBrowser did something no other browser in this test managed: it passed a live Cloudflare challenge. Loaded nowsecure.nl, got a clean 200, no CAPTCHA, no block. That's the hardest thing on this list and it's the only one that pulled it off.

Where it fell apart: the test machine was arm64, and userAgentData.getHighEntropyValues() kept leaking architecture: "arm" despite the UA string claiming Windows 10 x64. A cross-check catches that in milliseconds. This isn't a browser bug — it's a "don't run Windows-spoofing software on an ARM Mac" problem.
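A rough sketch of that cross-check, assuming the detector already has the UA string and the dictionary returned by userAgentData.getHighEntropyValues(). The helper names and token lists are hypothetical and only there to show the idea.

```python
from typing import Optional


def ua_claimed_arch(user_agent: str) -> Optional[str]:
    """Guess the CPU family a UA string claims: 'x86', 'arm', or None if unclear."""
    ua = user_agent.lower()
    if any(token in ua for token in ("win64", "x64", "x86_64", "wow64", "amd64")):
        return "x86"
    if any(token in ua for token in ("arm64", "aarch64", "armv")):
        return "arm"
    return None


def arch_mismatch(user_agent: str, high_entropy: dict) -> bool:
    """Return True when the UA string and the reported architecture disagree."""
    claimed = ua_claimed_arch(user_agent)
    reported = str(high_entropy.get("architecture", "")).lower()
    if claimed is None or not reported:
        # Not enough information to call it a mismatch.
        return False
    return claimed != reported


if __name__ == "__main__":
    ua = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
    print(arch_mismatch(ua, {"architecture": "arm"}))
```

The comparison is a couple of string lookups, which is why a mismatch like this gets caught almost immediately.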

🥉 Camoufox — 13/20

Being Firefox-based is Camoufox's biggest strength and its most persistent headache. The TLS fingerprint is genuinely Firefox's — great for evading network-layer checks. But Chrome-targeting scanners look for window.chrome, Firefox doesn't have it, and that's an automatic flag.

The highlight reel moment: Task 11 (AudioContext) got marked a FAIL because the test site did not work in that browser. Yeah, some sites don't work on Firefox the way they do on Chromium-based browsers.

4th: Lightpanda — 11/20

Lightpanda is written in Zig, it's very new, and it shows. Basic DOM stuff works. The honeypot form test — surprisingly — works.

Then CreepJS ran, hit a cascade of unimplemented APIs, and crashed. The crash itself is a detection signal. I'd call it a strong one: the stealth browser got caught because it broke the very test that was checking it.

5th: Obscura — 10/20 + 4 unknown

Obscura has one genuinely good trick: TLS impersonation. It spoofs its TLS fingerprint to match real Chrome at the network layer, and the project says so in its README as well.

Everything else went sideways. The most self-defeating move: canvas over-randomization. The fingerprint hash changed on every single page reload. Real browsers are deterministic on the same hardware. A hash that mutates every run isn't stealth — it's a louder bot signal than doing nothing at all. I think trying too hard to hide made it more detectable.
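The stability check described above can be sketched in a few lines. This is a toy model under stated assumptions: BASE_PIXELS stands in for real canvas pixel data, and render_with_noise is a hypothetical imitation of a per-load randomizer, not Obscura's actual code.

```python
import hashlib
import random
from typing import Sequence

# Stand-in for real canvas pixel data (assumption for illustration).
BASE_PIXELS = bytes(range(256)) * 4


def fingerprint(pixels: bytes) -> str:
    """Hash the pixel buffer, as a canvas fingerprinting script would."""
    return hashlib.sha256(pixels).hexdigest()


def render_with_noise(pixels: bytes, rng: random.Random) -> bytes:
    """Flip the lowest bit of a few bytes, imitating fresh noise on every load."""
    noisy = bytearray(pixels)
    for index in rng.sample(range(len(noisy)), k=8):
        noisy[index] ^= 1
    return bytes(noisy)


def hash_is_stable(hashes: Sequence[str]) -> bool:
    """Return True when every reload produced the same (single) hash."""
    return len(set(hashes)) == 1


if __name__ == "__main__":
    rng = random.Random()
    plain = [fingerprint(BASE_PIXELS) for _ in range(3)]
    noisy = [fingerprint(render_with_noise(BASE_PIXELS, rng)) for _ in range(3)]
    print(hash_is_stable(plain), hash_is_stable(noisy))
```

A detector only needs to collect the hash over a few reloads and compare; the unstable case stands out without any knowledge of what the "right" hash would be.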

The Patterns That Keep Showing Up

Looking across all five results, the same failure modes appear over and over:

GPU hardware is the biggest differentiator — and you can't software-patch it. Serious automation infrastructure needs real GPU hardware, or at least GPU passthrough in a VM.

Timezone ↔ IP mismatch is the most widespread misconfiguration. The fix would have been simple: match the configured timezone to the proxy region. I think this can be implemented programmatically when writing the script.
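One way the timezone fix could look in a script, as a sketch only: derive the browser timezone from the proxy region instead of setting the two independently. The region keys, the mapping, the config keys, and the proxy URL below are all hypothetical and not tied to any of the five browsers' APIs.

```python
# Hypothetical mapping from proxy region labels to IANA timezone names.
PROXY_REGION_TIMEZONES = {
    "us-east": "America/New_York",
    "us-west": "America/Los_Angeles",
    "de": "Europe/Berlin",
    "in": "Asia/Kolkata",
}


def timezone_for_proxy(region: str) -> str:
    """Look up the timezone that matches a proxy region, failing loudly if unknown."""
    try:
        return PROXY_REGION_TIMEZONES[region]
    except KeyError:
        raise ValueError(f"no timezone mapping for proxy region {region!r}") from None


def build_launch_config(proxy_region: str, proxy_url: str) -> dict:
    """Build a launch config where the timezone always follows the proxy region."""
    return {"proxy": proxy_url, "timezone_id": timezone_for_proxy(proxy_region)}


if __name__ == "__main__":
    # The proxy URL is a placeholder.
    print(build_launch_config("de", "http://proxy.example:8080"))
```

Raising on an unknown region is deliberate in this sketch: a missing mapping should stop the run rather than silently fall back to the host's timezone.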

Fonts were basically empty. A real desktop browser always has a rich OS font set. I guess apt install fonts-liberation fonts-dejavu fonts-noto (those are Debian/Ubuntu package names, so apt rather than brew) is the minimum viable fix.

Behavioral simulation is now required. This one is interesting. The browsers that passed this had Bezier-curve mouse paths and variable scroll velocity. So automation needs to produce human-shaped events.
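For a sense of what a Bezier-curve mouse path means in practice, here is a minimal sketch: a cubic Bezier between two points with randomized control points and an ease-in-out timing curve. The function names and default parameters are assumptions for illustration, not how any of the tested browsers implement it.

```python
import random
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def ease_in_out(t: float) -> float:
    """Smoothstep: slow start, faster middle, slow finish."""
    return t * t * (3.0 - 2.0 * t)


def mouse_path(start: Point, end: Point, steps: int = 30, spread: float = 100.0,
               rng: Optional[random.Random] = None) -> List[Point]:
    """Return `steps` points along a curved path from start to end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if rng is None:
        rng = random.Random()
    dx, dy = end[0] - start[0], end[1] - start[1]
    # Control points sit roughly 30% and 70% along the line, pushed off it at random.
    c1 = (start[0] + 0.3 * dx + rng.uniform(-spread, spread),
          start[1] + 0.3 * dy + rng.uniform(-spread, spread))
    c2 = (start[0] + 0.7 * dx + rng.uniform(-spread, spread),
          start[1] + 0.7 * dy + rng.uniform(-spread, spread))
    return [cubic_bezier(start, c1, c2, end, ease_in_out(i / (steps - 1)))
            for i in range(steps)]


if __name__ == "__main__":
    for x, y in mouse_path((0.0, 0.0), (400.0, 300.0), steps=5):
        print(round(x, 1), round(y, 1))
```

The easing step matters as much as the curve: evenly spaced points along a perfect arc still look mechanical, while points that bunch up near the start and end resemble a hand accelerating and then settling on a target.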

Takeaway

The gap between 1st and 5th was bigger than I expected. An 8-point difference doesn't come from clever JS patches. It comes down to GPU hardware, behavioral simulation quality, and environment hygiene.

Using an agent to run the tests also turned out to be genuinely useful beyond saving time. It caught things a scripted harness would've silently logged as FAIL — like the 502 on the AudioContext test, or recognizing that Patchwright's failures were hardware issues, not browser weaknesses. Contextual judgment is hard to get from a script.

Full task list, raw results, and all the scripts the agent wrote are in the GitHub gist. If you've tested any of these browsers, drop your results in the comments.

I initially thought of letting Claude Code write this blog, but figured that writing it myself would deepen my understanding by forcing me to simplify things. Re-reading my blog and seeing the silly mistakes I made is also funny!
