Someone on r/webscraping this week hit the wall I've seen a dozen projects hit:
"When I try to use multiple pages (tabs) within a single browser instance, Turnstile doesn't load properly on background or non-focused pages. Because of that, I'm forced to run one browser instance per page... I can do like 5 or 6 browsers simultaneously before throttling my CPU, avg about 30+ solves a minute."
5-6 browsers max. 30 solves/min. And — critically — the OP can't go up by using tabs, because tabs break the Turnstile flow.
This is a design constraint hiding as a performance problem. Let's name it.
## Why tabs break Turnstile (and other CF challenges)
Cloudflare's Turnstile widget does two things that make it hostile to multi-tab scraping in one Chrome process:
1. It checks `document.visibilityState`. A backgrounded tab reports `hidden`, and the widget's challenge scripts bail out or stall waiting for a `visible` transition. This is what the OP observed as "Turnstile doesn't load properly on non-focused pages."
2. Cookies are shared across the entire browser profile. When two tabs both start a CF challenge on the same origin, they race for the same `cf_clearance` cookie slot. Whichever tab gets focused first writes its token; the other tab's challenge sees a mismatched state and blocks.
One of the replies on the thread nailed this in passing: "cookies are shared in a browser between all tabs, multiple challenges can block each other." That's the full story. Focus matters only as a symptom; the cookie race is the actual collision.
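If you must run concurrent flows in a single profile, one way around the collision is to serialize the challenge solves so only one is ever in flight. Here's a minimal sketch using `flock(1)` from Linux util-linux; the lock path and wrapper function are my own illustration, not something from the thread:

```shell
# Serialize CF challenge solves: a file lock ensures two concurrent flows
# never race for the same cf_clearance cookie slot.
LOCK="${LOCK:-/tmp/cf-challenge.lock}"
touch "$LOCK"

solve_serialized() {
  # flock blocks until the lock file is free, then runs the given command;
  # "the solve" (open the page, wait for clearance) is whatever you pass in.
  flock "$LOCK" -c "$*"
}
```

Each flow would wrap only the challenge step in `solve_serialized`; the actual page fetches can still run in parallel once clearance exists.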
## Why "5-6 browsers max" is the wrong number
If you profile what those 5-6 Chrome processes are doing, you'll see:
- ~200MB RSS each, most of it heap from V8 and the renderer
- Two or three render threads spinning waiting for page load / paint
- A pool of worker threads for networking and crypto
On a 16GB / 8-core machine you're CPU-bound, not memory-bound, because every page load triggers full Chromium rendering + JS execution for the challenge script — which is deliberately expensive (that's the "work" part of proof-of-work).
So the real ceiling isn't "how many Chrome binaries fit in RAM" — it's "how many concurrent CF challenges can your CPU solve in parallel." At the OP's 30 solves/min on 5-6 browsers, that's ~5 solves/min per browser, or about one every 10-12 seconds. That matches what the challenge takes on a cold profile.
## The profile-isolation fix
The escape is not more threads or tabs. It's profile isolation with cached clearance.
The idea:
- Pre-warm a pool of Chrome profiles (say, 20 of them) by letting each one solve a CF challenge once and storing the resulting `cf_clearance` cookie
- For each scrape request, pick a profile whose clearance hasn't expired (CF clearances typically last ~30 min)
- Run the scrape as that profile. Because the clearance is already present, no challenge runs — you skip the 10-second proof-of-work
- When a profile's clearance expires, quietly re-warm it in the background
With this architecture the bottleneck shifts from "CF challenge compute per browser" to "network latency per page," and you can fan out to dozens of concurrent requests.
Real numbers from what I've been running with this pattern:
- 20 profiles, pre-warmed
- ~1.5s avg page load on CF-protected targets (from warm clearance)
- ~8× throughput vs "one browser per page with fresh challenges"
## Doing it with browser-act
browser-act is a CLI that manages the profile pool for you — each browser-id is an isolated profile with its own cookies and storage.
```shell
# Install:
npx skills add browser-act/skills --skill browser-act

# Create 20 profiles:
for i in $(seq 1 20); do
  browser-act browser create --profile-name "scrape-$i"
done

# Warm each profile by opening the target site once (runs the CF challenge):
browser-act --session warmup browser open <profile-id> https://target.site

# Later, from your scraper, pick a warm profile and run:
browser-act --session scrape browser open <profile-id> https://target.site/page-N
browser-act --session scrape get markdown > page-N.md
```
The cookies persist on disk per profile, so restarting the scraper doesn't lose clearance.
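Putting those pieces together, here's a sketch of a scrape loop that rotates through the pool and re-warms stale profiles. It's a sketch under assumptions: the stamp directory, the 30-minute TTL, and using the profile name as the id in `browser open` are my choices for illustration, not browser-act's documented behavior.

```shell
#!/usr/bin/env bash
# Sketch: rotate scrapes across pre-warmed profiles, re-warming stale ones.
# WARM_DIR holds one timestamp file per profile; TTL mirrors the ~30 min
# cf_clearance lifetime. All names here are illustrative assumptions.

WARM_DIR="${WARM_DIR:-/tmp/warm-stamps}"
TTL_SECS=1800
mkdir -p "$WARM_DIR"

# True if the profile's clearance stamp is younger than TTL_SECS.
is_warm() {
  local stamp="$WARM_DIR/$1"
  [ -f "$stamp" ] && [ $(( $(date +%s) - $(stat -c %Y "$stamp") )) -lt "$TTL_SECS" ]
}

# Re-warm: open the target once so the CF challenge runs, then stamp it.
warm_profile() {
  browser-act --session warmup browser open "$1" "$2"
  touch "$WARM_DIR/$1"
}

# Scrape one page with a given profile, warming it first if needed.
scrape_page() {
  local profile="$1" url="$2" out="$3"
  is_warm "$profile" || warm_profile "$profile" "$url"
  browser-act --session scrape browser open "$profile" "$url"
  browser-act --session scrape get markdown > "$out"
}

# Round-robin pages across the pool; only runs if the CLI is installed.
if command -v browser-act >/dev/null 2>&1; then
  profiles=(scrape-1 scrape-2 scrape-3)
  for n in 1 2 3 4 5 6; do
    p="${profiles[$(( (n - 1) % ${#profiles[@]} ))]}"
    scrape_page "$p" "https://target.site/page-$n" "page-$n.md"
  done
fi
```

Because the stamps live on disk next to the profiles, a restarted scraper picks up exactly where it left off: anything stamped within the TTL skips the challenge entirely.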
## Things worth arguing about
- This doesn't help if your target site rotates challenge policy per request (some bank / gambling sites do). That's a different regime — you need a JS solver loop
- 20 profiles is about where you hit diminishing returns on a single machine. Past that, put them on separate instances with separate IPs; the `cf_clearance` cookie is IP-bound
- If you only need ~30 pages/minute, the OP's 5-6 browser setup is fine. This matters past ~100 pages/min
The full r/webscraping thread is "optimised chrome? for multi threading". If you're fighting the same ceiling, drop a comment with what you've tried — happy to compare.