DEV Community

Double CHEN

Selenium keeps getting blocked by Cloudflare? Here's what the fingerprint actually catches (and how to stop triggering it)

A post on r/webscraping this week asked a question that keeps coming up: "I'm using Selenium through Chrome, need to scrape ~1M pages at ~1s/page, but every request hangs 7-8s on a Cloudflare challenge."

At 7 seconds per page, 1M pages takes 81 days. That's not a rate limit problem. That's a detection problem — and you can't fix it with more threads.
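The back-of-the-envelope math, as a quick sanity check:

```python
# 1M pages with a ~7s Cloudflare challenge on every request:
pages = 1_000_000
seconds_per_page = 7            # challenge overhead alone, before the page itself
total_days = pages * seconds_per_page / 86_400
print(round(total_days))        # 81
```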

The replies to that post are a goldmine of advice that's mostly right but doesn't explain why. Let's do that.

The actual thing Cloudflare catches

Selenium's ChromeDriver leaks the automation flag in at least three observable ways:

  1. navigator.webdriver === true — exposed by design, WebDriver spec requires it
  2. CDP client signature — ChromeDriver wraps Chrome's DevTools Protocol with a specific RPC pattern that's detectable via timing and order of Target.* calls
  3. Missing browser UI signals — Selenium launches Chrome without certain accessibility/window events that real users always generate
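To make the first signal concrete, here's a minimal sketch of the kind of in-page check a challenge script can run. In a real browser it would read the global `navigator`; the stub object below (with `webdriver: true` and zero plugins, typical of a stock ChromeDriver session) is an assumption so the snippet runs standalone:

```javascript
// Stub of what `navigator` looks like in a Selenium-driven Chrome.
const nav = { webdriver: true, plugins: { length: 0 } };

function looksAutomated(n) {
  // Signal 1: the spec-mandated WebDriver flag
  if (n.webdriver === true) return true;
  // A classic secondary tell: headless sessions often expose no plugins
  if (n.plugins && n.plugins.length === 0) return true;
  return false;
}

console.log(looksAutomated(nav)); // true
```

Real detection scripts check dozens of such properties, but the WebDriver flag alone is enough to flag a stock Selenium session.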

One of the top comments on that Reddit thread summarized it well:

"Selenium operates using a ChromeDriver or a GeckoDriver binary, which any respectable company that doesn't want bots on its website can fingerprint. That doesn't mean Selenium is broken — it just means it was not made for what you're trying to do. Selenium's purpose is automated testing."

That's the right read. Selenium was designed for QA, where you want the site to know you're an automated test. Cloudflare's Bot Management scores those same signals against a human baseline, and the score tanks fast.
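A hypothetical illustration of that scoring idea (not Cloudflare's actual model — the signal names and weights here are invented for the sketch): each leaked signal pushes the session's score away from the human baseline, and past a threshold the request gets challenged.

```python
# Invented weights for illustration; a stock Selenium session leaks all three.
SIGNALS = {
    "navigator_webdriver": 0.5,       # spec-mandated flag, strongest tell
    "chromedriver_cdp_pattern": 0.3,  # RPC timing/order of Target.* calls
    "missing_ui_events": 0.2,         # no accessibility/window activity
}

def bot_score(observed: set[str]) -> float:
    """Sum the weights of every automation signal observed in a session."""
    return sum(w for name, w in SIGNALS.items() if name in observed)

score = bot_score({"navigator_webdriver", "chromedriver_cdp_pattern", "missing_ui_events"})
print(score >= 0.5)  # past the challenge threshold
```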

What the comments recommend (and what actually works)

Filtered through real use:

| Tool | What it does | Catch |
| --- | --- | --- |
| undetected-chromedriver | Patches the WebDriver flag + CDP strings | Cloudflare pushes updates that re-detect it every few months |
| SeleniumBase CDP mode | Skips ChromeDriver, talks CDP directly to Chrome | Works on most CF sites; still one process per browser |
| curl_cffi | Impersonates a browser's TLS JA3 fingerprint | No JS execution — breaks on sites that hydrate with React |
| nodriver / zendriver | Non-headless Chrome with patched CDP | Good for low scale; resource-heavy at 1M pages |
| Real Chrome + stealth profile | Actual Chrome binary, persistent profile, cookies survive | What most anti-bot services assume |

The last row is what I'll show below — and it's what I've been using.

The actual result

The two captures are from the same browser process, same machine, same IP. The only variable was the fingerprint config.

How I'm doing it now

I've been using browser-act — a CLI that drives a real Chrome with a persistent stealth profile. A few commands:

```shell
# Install (uses the skills package registry):
npx skills add browser-act/skills --skill browser-act

# Open a Cloudflare-protected page in a stealth session:
browser-act --session scrape browser open <profile-id> https://target.site
browser-act --session scrape get markdown > out.md
```

The profile persists cookies and storage between runs, so the "warm browser" signals (history, localStorage, prior CF cookies) look human. For the r/webscraping OP's scale question (~1M pages), you'd run this with a pool of profile IDs and rotate — but that's a separate post.
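A minimal sketch of that rotation, using stdlib only. The profile IDs and URLs are placeholders; in practice each iteration would shell out to the CLI instead of printing:

```python
import itertools

# Each profile carries its own warm cookies/storage, so rotating spreads
# request volume without presenting a cold fingerprint on every page.
profiles = ["warm-profile-a", "warm-profile-b", "warm-profile-c"]
rotation = itertools.cycle(profiles)

urls = [f"https://target.site/page/{i}" for i in range(5)]
for url in urls:
    profile = next(rotation)
    # In practice: subprocess.run(["browser-act", "--session", profile, ...])
    print(profile, url)
```

The real version also needs per-profile pacing (so no single profile hammers the site) and a way to retire profiles whose CF cookies go stale — both out of scope here.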

Things worth arguing about

  • If your target is a Cloudflare Turnstile specifically (not the full JS challenge), you're in a different regime — curl_cffi + an injected widget can work, as one of the r/webscraping replies showed
  • undetected-chromedriver is the cheapest entry point if you already have Selenium code and low volume
  • Residential proxies matter almost as much as the browser fingerprint. If your IP is a datacenter ASN, nothing in the browser layer saves you

If you're fighting this problem right now, I'd love to hear what site you're on and what's been rejected — happy to compare notes. The full discussion is at r/webscraping's original thread.
