Alex Chen

Posted on Mar 22

Cloudflare Turnstile: What It Is, How It Works, and How to Handle It in Your Scraper

If you've been scraping the web in 2025, you've probably noticed reCAPTCHA showing up less and Cloudflare Turnstile showing up more. Major sites are switching — and it changes how your scraper needs to work.

Here's everything I've learned about Turnstile after dealing with it across dozens of projects.

What Is Turnstile?

Cloudflare Turnstile is a CAPTCHA alternative that aims to verify humans without showing puzzles. Unlike reCAPTCHA (which often shows image challenges), Turnstile runs a series of browser challenges in the background and returns a token.

From a user's perspective: you see a brief loading spinner, then a checkmark. No clicking fire hydrants.

From a developer's perspective: it's an invisible challenge that generates a cf-turnstile-response token you must submit with your form.

How to Detect Turnstile on a Page

Look for these indicators in the HTML source:

def has_turnstile(html: str) -> bool:
    # "data-sitekey" is deliberately left out: reCAPTCHA and hCaptcha use the
    # same attribute, so matching on it alone gives false positives.
    indicators = [
        "challenges.cloudflare.com/turnstile",
        "cf-turnstile",
        "turnstile.min.js",
    ]
    html_lower = html.lower()
    return any(ind in html_lower for ind in indicators)

The key difference from reCAPTCHA: the script source includes challenges.cloudflare.com/turnstile instead of google.com/recaptcha.
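If a scraper meets more than one kind of CAPTCHA, the same script-source check can tell the vendors apart. The following is a minimal sketch, not a definitive detector: detect_captcha_vendor is a hypothetical helper, and the hcaptcha.com marker is my assumption rather than something covered above.

```python
from typing import Optional


def detect_captcha_vendor(html: str) -> Optional[str]:
    """Guess which CAPTCHA vendor a page embeds, based on script URLs."""
    # Substring -> vendor label. The first two markers come from the text
    # above; the hCaptcha one is an assumption added for completeness.
    markers = {
        "challenges.cloudflare.com/turnstile": "turnstile",
        "google.com/recaptcha": "recaptcha",
        "hcaptcha.com": "hcaptcha",
    }
    html_lower = html.lower()
    for marker, vendor in markers.items():
        if marker in html_lower:
            return vendor
    return None
```

Matching on the script URL rather than on data-sitekey keeps the three vendors from being confused with each other.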

Extracting the Sitekey

Every Turnstile widget has a sitekey. You need it to solve the challenge:

import re
from bs4 import BeautifulSoup

def extract_turnstile_sitekey(html: str) -> str | None:
    # Method 1: data-sitekey attribute
    soup = BeautifulSoup(html, "html.parser")
    widget = soup.find(attrs={"class": "cf-turnstile"})
    if widget and widget.get("data-sitekey"):
        return widget["data-sitekey"]

    # Method 2: Regex fallback (some sites render dynamically).
    # The character class also allows "_" in case a key uses a URL-safe alphabet.
    match = re.search(r'sitekey["\s:=]+["\']([0-9A-Za-z_\-]+)', html)
    if match:
        return match.group(1)

    return None

Sitekeys look like 0x4AAAAAAA..., a different format from reCAPTCHA keys.
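Because the formats differ, a quick sanity check can catch the case where the regex fallback picked up some other vendor's key. This is only a heuristic sketch: looks_like_turnstile_sitekey is a hypothetical helper, the "0x" prefix is taken from the example above, and the minimum length of 10 is an arbitrary assumption.

```python
import re

# Heuristic only: "0x" prefix as in the example above, followed by
# letters, digits, "_" or "-". The minimum length is an arbitrary guess.
_TURNSTILE_KEY = re.compile(r"0x[0-9A-Za-z_\-]{10,}")


def looks_like_turnstile_sitekey(key: str) -> bool:
    """Return True if `key` has the general shape of a Turnstile sitekey."""
    return _TURNSTILE_KEY.fullmatch(key) is not None
```

Cloudflare's documented dummy keys for testing start with 1x, 2x, or 3x, so this check would reject them. Treat a False result as a warning to log, not as proof that the key is wrong.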

Approach 1: Full Browser Automation

The brute-force approach — load the page in a real browser and wait for Turnstile to auto-solve:

from playwright.async_api import async_playwright

async def solve_with_browser(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto(url)

        # Wait for Turnstile to auto-solve
        await page.wait_for_function("""
            () => {
                const input = document.querySelector(
                    'input[name="cf-turnstile-response"]'
                );
                return input && input.value.length > 0;
            }
        """, timeout=30000)

        token = await page.eval_on_selector(
            'input[name="cf-turnstile-response"]',
            'el => el.value'
        )

        await browser.close()
        return token

Problems with this approach:

Requires a real browser (headless often gets detected)
Slow — each solve needs a new page load
Resource-heavy — memory and CPU per browser instance
Cloudflare can still detect automation via browser fingerprinting
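If you do go the browser route, the memory and CPU cost can at least be bounded by capping how many solves run at once. Here's a rough sketch using asyncio.Semaphore. solve_many is a hypothetical helper, and the solve argument stands in for any coroutine that takes a URL and returns a token, such as solve_with_browser above.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple


async def solve_many(
    urls: Iterable[str],
    solve: Callable[[str], Awaitable[str]],
    max_concurrent: int = 2,
) -> Dict[str, str]:
    """Run `solve` for every URL with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(url: str) -> Tuple[str, str]:
        # Each task waits here until a slot is free.
        async with semaphore:
            return url, await solve(url)

    pairs = await asyncio.gather(*(guarded(u) for u in urls))
    return dict(pairs)
```

This doesn't make an individual solve faster; it only stops ten pages from launching ten browsers at the same moment.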

Approach 2: API-Based Solving (Recommended)

Send the sitekey to a solving service and get back a valid token:

import httpx
import os

async def solve_turnstile(sitekey: str, page_url: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.passxapi.com/solve",
            json={
                "type": "turnstile",
                "sitekey": sitekey,
                "url": page_url,
            },
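            # os.environ[...] fails fast with KeyError if the key is unset,
            # instead of sending a None header value
            headers={"x-api-key": os.environ["PASSXAPI_KEY"]},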
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["token"]

Or with the SDK:

import os

from passxapi import AsyncClient

solver = AsyncClient(api_key=os.getenv("PASSXAPI_KEY"))

async def solve(sitekey, url):
    result = await solver.solve(
        captcha_type="turnstile",
        sitekey=sitekey,
        url=url
    )
    return result["token"]
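Remote solves can fail for transient reasons such as timeouts or a busy service, so it can help to wrap the call in a retry. The following is a sketch with exponential backoff. solve_with_retry is a hypothetical helper that doesn't depend on any particular solving service, and the attempt count and delays are arbitrary assumptions.

```python
import asyncio
from typing import Awaitable, Callable


async def solve_with_retry(
    solve: Callable[[], Awaitable[str]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call `solve` until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return await solve()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

With the function from the first example, usage would look like: token = await solve_with_retry(lambda: solve_turnstile(sitekey, url)). In real code you'd probably narrow the except clause to the errors you expect, for example httpx.HTTPError.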

Injecting the Token

Once you have the token, submit it with the form:

With httpx (no browser):

async def submit_form_with_turnstile(url, form_data, sitekey):
    token = await solve_turnstile(sitekey, url)

    form_data["cf-turnstile-response"] = token

    async with httpx.AsyncClient() as client:
        resp = await client.post(url, data=form_data)
        return resp

With Playwright (browser automation):

async def inject_turnstile_token(page, token):
    # Pass the token as an argument rather than interpolating it into the
    # script, so quotes or backslashes in the token can't break the JavaScript.
    await page.evaluate("""
        (token) => {
            // Set the hidden input
            const input = document.querySelector(
                'input[name="cf-turnstile-response"]'
            );
            if (input) input.value = token;

            // If the widget names a success callback via data-callback,
            // call it with the token (turnstile.execute would only
            // re-run the challenge, not report success)
            document.querySelectorAll('.cf-turnstile').forEach(w => {
                const name = w.getAttribute('data-callback');
                if (name && typeof window[name] === 'function') {
                    window[name](token);
                }
            });
        }
    """, token)

Turnstile Token Details

A few things to know about Turnstile tokens:

Property	Value
Token TTL	~300 seconds (5 minutes)
Token length	Up to 2,048 characters
Single use	Yes — each token works once
IP binding	Weak — some flexibility
Solve time (API)	~3-8 seconds

The 300-second TTL is more forgiving than reCAPTCHA's 120 seconds, but don't pre-solve and reuse tokens: each one is valid for a single submission, so solve shortly before you submit.
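One way to make those two rules hard to break is to wrap the token in a small object that tracks its age and whether it has been spent. A minimal sketch follows, assuming the 300-second TTL from the table. TurnstileToken and the 30-second safety margin are my own illustrative choices.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

TOKEN_TTL_SECONDS = 300  # TTL from the table above


@dataclass
class TurnstileToken:
    value: str
    issued_at: float = field(default_factory=time.monotonic)
    used: bool = False

    def is_usable(self, now: Optional[float] = None, margin: float = 30.0) -> bool:
        """True if the token is unspent and not within `margin` seconds of expiry."""
        now = time.monotonic() if now is None else now
        age = now - self.issued_at
        return not self.used and age < TOKEN_TTL_SECONDS - margin

    def consume(self, now: Optional[float] = None) -> str:
        """Return the token value once; raise if it is spent or too old."""
        if not self.is_usable(now):
            raise RuntimeError("token is spent or expired; solve again")
        self.used = True
        return self.value
```

Calling consume() right before the POST means a stale or reused token fails loudly in your own code instead of as a confusing rejection from the site.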

Complete Scraping Example

Here's a full example that ties the pieces together. It reuses extract_turnstile_sitekey and solve_turnstile from the sections above:

import httpx
import asyncio
import os
from bs4 import BeautifulSoup

async def scrape_protected_page(url: str) -> dict:
    async with httpx.AsyncClient(follow_redirects=True) as client:
        # Step 1: Load the page
        resp = await client.get(url)

        # Step 2: Check for Turnstile
        if "cf-turnstile" not in resp.text:
            return parse_content(resp.text)

        # Step 3: Extract sitekey
        sitekey = extract_turnstile_sitekey(resp.text)
        if not sitekey:
            raise ValueError("Found Turnstile but couldn't extract sitekey")

        # Step 4: Solve
        token = await solve_turnstile(sitekey, url)

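        # Step 5: Submit with token. This assumes the form posts back to the
        # same URL; real forms usually need their other fields as well.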
        resp = await client.post(url, data={
            "cf-turnstile-response": token,
        })

        return parse_content(resp.text)

def parse_content(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.title.string if soup.title else None,
        "text": soup.get_text()[:500],
    }

# Usage
async def main():
    result = await scrape_protected_page("https://example.com/protected")
    print(result)

asyncio.run(main())

Turnstile vs reCAPTCHA vs hCaptcha

	Turnstile	reCAPTCHA v2	hCaptcha
User friction	None (invisible)	Checkbox + images	Checkbox + images
Token TTL	300s	120s	120s
Detection	Browser challenges	Behavior analysis	Image puzzles
API solve time	~5s	~5s	~5-10s
Growing or shrinking	Growing fast	Stable	Growing

Wrapping Up

Turnstile is becoming the default CAPTCHA for Cloudflare-protected sites. The good news: it's simpler to handle than reCAPTCHA because there are no image puzzles — just token-based solving.

Full SDK with Turnstile support: passxapi-python on GitHub

Have you run into Turnstile on any unexpected sites? I'd love to hear about it in the comments.

DEV Community