DEV Community

Double CHEN

Selenium keeps getting blocked by Cloudflare? Here's what the fingerprint actually catches (and how to stop triggering it)

A post on r/webscraping this week asked a question that keeps coming up: "I'm using Selenium through Chrome, need to scrape ~1M pages at ~1s/page, but every request hangs 7-8s on a Cloudflare challenge."

At 7 seconds per page, 1M pages takes 81 days. That's not a rate limit problem. That's a detection problem — and you can't fix it with more threads.
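The back-of-the-envelope math, as a quick sanity check:

```python
# 1M pages with a ~7s Cloudflare challenge on every request:
pages = 1_000_000
seconds_per_page = 7            # challenge overhead alone, before the page itself
total_days = pages * seconds_per_page / 86_400
print(round(total_days))        # 81
```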

The replies to that post are a goldmine of advice that's mostly right but doesn't explain why. Let's do that.

The actual thing Cloudflare catches

Selenium's ChromeDriver leaks the automation flag in at least three observable ways:

  1. navigator.webdriver === true — exposed by design, WebDriver spec requires it
  2. CDP client signature — ChromeDriver wraps Chrome's DevTools Protocol with a specific RPC pattern that's detectable via timing and order of Target.* calls
  3. Missing browser UI signals — Selenium launches Chrome without certain accessibility/window events that real users always generate
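To make the first signal concrete, here's a minimal sketch of the kind of in-page check a challenge script can run. In a real browser it would read the global `navigator`; the stub object below (with `webdriver: true` and zero plugins, typical of a stock ChromeDriver session) is an assumption so the snippet runs standalone:

```javascript
// Stub of what `navigator` looks like in a Selenium-driven Chrome.
const nav = { webdriver: true, plugins: { length: 0 } };

function looksAutomated(n) {
  // Signal 1: the spec-mandated WebDriver flag
  if (n.webdriver === true) return true;
  // A classic secondary tell: headless sessions often expose no plugins
  if (n.plugins && n.plugins.length === 0) return true;
  return false;
}

console.log(looksAutomated(nav)); // true
```

Real detection scripts check dozens of such properties, but the WebDriver flag alone is enough to flag a stock Selenium session.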

One of the top comments on that Reddit thread summarized it well:

"Selenium operates using a ChromeDriver or a GeckoDriver binary, which any respectable company that doesn't want bots on its website can fingerprint. That doesn't mean Selenium is broken — it just means it was not made for what you're trying to do. Selenium's purpose is automated testing."

That's the right read. Selenium was designed for QA, where you want the site to know you're an automated test. Cloudflare's Bot Management scores those same signals against a human baseline, and the score tanks fast.
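A hypothetical illustration of that scoring idea (not Cloudflare's actual model — the signal names and weights here are invented for the sketch): each leaked signal pushes the session's score away from the human baseline, and past a threshold the request gets challenged.

```python
# Invented weights for illustration; a stock Selenium session leaks all three.
SIGNALS = {
    "navigator_webdriver": 0.5,       # spec-mandated flag, strongest tell
    "chromedriver_cdp_pattern": 0.3,  # RPC timing/order of Target.* calls
    "missing_ui_events": 0.2,         # no accessibility/window activity
}

def bot_score(observed: set[str]) -> float:
    """Sum the weights of every automation signal observed in a session."""
    return sum(w for name, w in SIGNALS.items() if name in observed)

score = bot_score({"navigator_webdriver", "chromedriver_cdp_pattern", "missing_ui_events"})
print(score >= 0.5)  # past the challenge threshold
```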

What the comments recommend (and what actually works)

Filtered through real use:

| Tool | What it does | Catch |
| --- | --- | --- |
| undetected-chromedriver | Patches the WebDriver flag + CDP strings | Cloudflare pushes updates that re-detect it every few months |
| SeleniumBase CDP mode | Skips ChromeDriver, talks CDP directly to Chrome | Works on most CF sites; still one process per browser |
| curl_cffi | Impersonates a browser's TLS JA3 fingerprint | No JS execution — breaks on sites that hydrate with React |
| nodriver / zendriver | Non-headless Chrome with patched CDP | Good for low scale; resource-heavy at 1M pages |
| Real Chrome + stealth profile | Actual Chrome binary, persistent profile, cookies survive | What most anti-bot services assume |

The last row is what I'll show below — and it's what I've been using.

The actual result

The two captures are from the same browser process, same machine, same IP. The only variable was the fingerprint config.

How I'm doing it now

I've been using browser-act — a CLI that drives a real Chrome with a persistent stealth profile. A few commands:

```shell
# Install (uses the skills package registry):
npx skills add browser-act/skills --skill browser-act

# Open a Cloudflare-protected page in a stealth session:
browser-act --session scrape browser open <profile-id> https://target.site
browser-act --session scrape get markdown > out.md
```

The profile persists cookies and storage between runs, so the "warm browser" signals (history, localStorage, prior CF cookies) look human. For the r/webscraping OP's scale question (~1M pages), you'd run this with a pool of profile IDs and rotate — but that's a separate post.
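A minimal sketch of that rotation, using stdlib only. The profile IDs and URLs are placeholders; in practice each iteration would shell out to the CLI instead of printing:

```python
import itertools

# Each profile carries its own warm cookies/storage, so rotating spreads
# request volume without presenting a cold fingerprint on every page.
profiles = ["warm-profile-a", "warm-profile-b", "warm-profile-c"]
rotation = itertools.cycle(profiles)

urls = [f"https://target.site/page/{i}" for i in range(5)]
for url in urls:
    profile = next(rotation)
    # In practice: subprocess.run(["browser-act", "--session", profile, ...])
    print(profile, url)
```

The real version also needs per-profile pacing (so no single profile hammers the site) and a way to retire profiles whose CF cookies go stale — both out of scope here.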

Things worth arguing about

  • If your target is a Cloudflare Turnstile specifically (not the full JS challenge), you're in a different regime — curl_cffi + an injected widget can work, as one of the r/webscraping replies showed
  • undetected-chromedriver is the cheapest entry point if you already have Selenium code and low volume
  • Residential proxies matter almost as much as the browser fingerprint. If your IP is a datacenter ASN, nothing in the browser layer saves you

If you're fighting this problem right now, I'd love to hear what site you're on and what's been rejected — happy to compare notes. The full discussion is at r/webscraping's original thread.
