Alex Chen

Cloudflare Turnstile: What It Is, How It Works, and How to Handle It in Your Scraper

If you've been scraping the web in 2025, you've probably noticed reCAPTCHA showing up less and Cloudflare Turnstile showing up more. Major sites are switching — and it changes how your scraper needs to work.

Here's everything I've learned about Turnstile after dealing with it across dozens of projects.

What Is Turnstile?

Cloudflare Turnstile is a CAPTCHA alternative that aims to verify humans without showing puzzles. Unlike reCAPTCHA (which often shows image challenges), Turnstile runs a series of browser challenges in the background and returns a token.

From a user's perspective: you see a brief loading spinner, then a checkmark. No clicking fire hydrants.

From a developer's perspective: it's an invisible challenge that generates a cf-turnstile-response token you must submit with your form.
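It helps to know what the site does with that token. Its backend posts the token, together with a secret key, to Cloudflare's siteverify endpoint and only accepts the form if the response reports success. Here is a minimal sketch of that server-side step using only the standard library. The helper names are my own, and the sample response bodies are illustrative, not captured from a real call.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

SITEVERIFY_URL = "https://challenges.cloudflare.com/turnstile/v0/siteverify"


def build_siteverify_request(
    secret: str, token: str, remote_ip: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST request a backend would send to validate a token."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    body = urllib.parse.urlencode(fields).encode()
    return urllib.request.Request(SITEVERIFY_URL, data=body, method="POST")


def is_token_accepted(response_body: str) -> bool:
    """Interpret the JSON body returned by siteverify."""
    payload = json.loads(response_body)
    return bool(payload.get("success"))


# Illustrative response bodies (hand-written for this sketch)
ok = '{"success": true, "error-codes": []}'
rejected = '{"success": false, "error-codes": ["timeout-or-duplicate"]}'
print(is_token_accepted(ok), is_token_accepted(rejected))
```

This is why an expired or reused token fails: the rejection happens on the site's server, not in your scraper.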

How to Detect Turnstile on a Page

Look for these indicators in the HTML source:

def has_turnstile(html: str) -> bool:
    # "data-sitekey" is deliberately not listed: reCAPTCHA and hCaptcha use
    # the same attribute, so matching on it gives false positives.
    indicators = [
        "challenges.cloudflare.com/turnstile",
        "cf-turnstile",
        "turnstile.min.js",
    ]
    html_lower = html.lower()
    return any(ind in html_lower for ind in indicators)

The key difference from reCAPTCHA: the script source includes challenges.cloudflare.com/turnstile instead of google.com/recaptcha.
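If your scraper hits more than one CAPTCHA vendor, the same substring idea can tell them apart. The sketch below is a heuristic: the function name is hypothetical and the marker lists are the commonly seen script hosts and widget class names, not an exhaustive set.

```python
from typing import Optional

# Vendor -> substrings that usually appear in the page source (assumed markers)
CAPTCHA_MARKERS = {
    "turnstile": ("challenges.cloudflare.com/turnstile", "cf-turnstile"),
    "recaptcha": ("google.com/recaptcha", "recaptcha.net/recaptcha", "g-recaptcha"),
    "hcaptcha": ("hcaptcha.com", "h-captcha"),
}


def identify_captcha(html: str) -> Optional[str]:
    """Return the first vendor whose markers appear in the HTML, else None."""
    html_lower = html.lower()
    for vendor, needles in CAPTCHA_MARKERS.items():
        if any(needle in html_lower for needle in needles):
            return vendor
    return None


sample = '<script src="https://challenges.cloudflare.com/turnstile/v0/api.js"></script>'
print(identify_captcha(sample))
```

Note that "data-sitekey" is not used as a marker here, since all three vendors use that attribute.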

Extracting the Sitekey

Every Turnstile widget has a sitekey. You need it to solve the challenge:

import re
from bs4 import BeautifulSoup

def extract_turnstile_sitekey(html: str) -> str | None:
    # Method 1: data-sitekey attribute
    soup = BeautifulSoup(html, "html.parser")
    widget = soup.find(attrs={"class": "cf-turnstile"})
    if widget and widget.get("data-sitekey"):
        return widget["data-sitekey"]

    # Method 2: Regex fallback (some sites render dynamically)
    match = re.search(r'sitekey["\s:=]+["\']([0-9A-Za-z_\-]+)', html)
    if match:
        return match.group(1)

    return None

Sitekeys look like 0x4AAAAAAA..., a different format from reCAPTCHA keys.
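Because the regex fallback can pick up another vendor's key, a quick shape check before paying for a solve can save a wasted request. This is a rough sketch based only on the observed key shape (a digit, an "x", then a run of URL-safe characters). Cloudflare does not promise this format, so treat the pattern and the function name as assumptions.

```python
import re

# Heuristic shape: one digit, "x", then at least 10 URL-safe characters
_TURNSTILE_KEY_SHAPE = re.compile(r"^[0-9]x[0-9A-Za-z_-]{10,}$")


def looks_like_turnstile_sitekey(key: str) -> bool:
    """Return True if the key has the shape Turnstile sitekeys usually have."""
    return bool(_TURNSTILE_KEY_SHAPE.match(key))


# Cloudflare's documented always-pass testing sitekey
print(looks_like_turnstile_sitekey("1x00000000000000000000AA"))
```

A key that fails this check is more likely a reCAPTCHA or hCaptcha key that the fallback matched by accident.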

Approach 1: Full Browser Automation

The brute-force approach — load the page in a real browser and wait for Turnstile to auto-solve:

from playwright.async_api import async_playwright

async def solve_with_browser(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        try:
            page = await browser.new_page()
            await page.goto(url)

            # Wait until Turnstile has filled its hidden response input
            await page.wait_for_function("""
                () => {
                    const input = document.querySelector(
                        'input[name="cf-turnstile-response"]'
                    );
                    return input && input.value.length > 0;
                }
            """, timeout=30000)

            return await page.eval_on_selector(
                'input[name="cf-turnstile-response"]',
                'el => el.value'
            )
        finally:
            # Close the browser even if the wait times out
            await browser.close()

Problems with this approach:

  • Requires a real browser (headless often gets detected)
  • Slow — each solve needs a new page load
  • Resource-heavy — memory and CPU per browser instance
  • Cloudflare can still detect automation via browser fingerprinting
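If you do go the browser route, the resource problem is at least manageable by capping how many browsers run at once. The sketch below uses asyncio.Semaphore. The fake_browser_solve stub stands in for a real browser-based solver, and the limit of 2 is an arbitrary assumption you would tune to your machine.

```python
import asyncio

MAX_BROWSERS = 2  # assumed cap; tune to available memory and CPU


async def fake_browser_solve(url: str) -> str:
    """Stand-in for a real browser solve (hypothetical stub)."""
    await asyncio.sleep(0.01)
    return f"token-for-{url}"


async def solve_all(urls, solve=fake_browser_solve, limit=MAX_BROWSERS):
    """Solve every URL, never running more than `limit` solves at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        async with semaphore:
            return await solve(url)

    # gather keeps results in the same order as the input URLs
    return await asyncio.gather(*(guarded(u) for u in urls))


tokens = asyncio.run(
    solve_all(["https://example.com/a", "https://example.com/b", "https://example.com/c"])
)
print(len(tokens))
```

This does nothing about detection or per-solve latency; it only keeps memory use bounded.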

Approach 2: API-Based Solving (Recommended)

Send the sitekey to a solving service and get back a valid token:

import httpx
import os

async def solve_turnstile(sitekey: str, page_url: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.passxapi.com/solve",
            json={
                "type": "turnstile",
                "sitekey": sitekey,
                "url": page_url,
            },
            # os.environ raises KeyError if the key is unset, instead of sending None
            headers={"x-api-key": os.environ["PASSXAPI_KEY"]},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["token"]

Or with the SDK:

import os

from passxapi import AsyncClient

solver = AsyncClient(api_key=os.getenv("PASSXAPI_KEY"))

async def solve(sitekey, url):
    result = await solver.solve(
        captcha_type="turnstile",
        sitekey=sitekey,
        url=url
    )
    return result["token"]
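Whichever client you use, solves occasionally fail or time out, so it is worth wrapping the call in a retry. The sketch below assumes only that the solver is an async callable taking (sitekey, url). The retry count and delays are illustrative defaults, and flaky_solver is a stub for the demo.

```python
import asyncio


async def solve_with_retry(solve, sitekey, url, attempts=3, base_delay=1.0):
    """Call solve(sitekey, url), retrying with exponential backoff on errors."""
    last_error = None
    for attempt in range(attempts):
        try:
            return await solve(sitekey, url)
        except Exception as exc:  # in real code, catch the client's specific errors
            last_error = exc
            if attempt < attempts - 1:
                # Waits base_delay, then 2x, then 4x, ...
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"solver failed after {attempts} attempts") from last_error


calls = {"count": 0}


async def flaky_solver(sitekey, url):
    """Stub that fails twice, then returns a token (hypothetical)."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated timeout")
    return "demo-token"


print(asyncio.run(solve_with_retry(flaky_solver, "demo-key", "https://example.com", base_delay=0)))
```

Keep the attempt count low: every retry is another billed solve on most services.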

Injecting the Token

Once you have the token, submit it with the form:

With httpx (no browser):

async def submit_form_with_turnstile(url, form_data, sitekey):
    token = await solve_turnstile(sitekey, url)

    # Build a new dict so the caller's form_data is not modified
    payload = {**form_data, "cf-turnstile-response": token}

    async with httpx.AsyncClient() as client:
        resp = await client.post(url, data=payload)
        return resp
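In practice the form usually carries other fields too (CSRF tokens, hidden IDs), and the server rejects the POST if they are missing. Here is a small sketch that collects input fields with the standard library's HTMLParser and adds the token. The class and function names are hypothetical, and it gathers every named input on the page, so on multi-form pages you would want to narrow it to the right form.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name:
            # Inputs without a value attribute are sent as empty strings
            self.fields[name] = attr.get("value") or ""


def build_payload(form_html: str, token: str) -> dict:
    """Return the page's input fields plus the Turnstile response token."""
    collector = FormFieldCollector()
    collector.feed(form_html)
    payload = dict(collector.fields)
    payload["cf-turnstile-response"] = token
    return payload


sample_form = """
<form method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="email" value="">
  <div class="cf-turnstile"></div>
</form>
"""
print(build_payload(sample_form, "demo-token"))
```

The resulting dict can be passed as form_data to the submit function above.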

With Playwright (browser automation):

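async def inject_turnstile_token(page, token):
    # Pass the token as an argument instead of formatting it into the script,
    # so quotes or backslashes in the token cannot break the JavaScript.
    await page.evaluate("""
        (token) => {
            // Set every hidden response input on the page
            document.querySelectorAll(
                'input[name="cf-turnstile-response"]'
            ).forEach(input => { input.value = token; });

            // If the widget declares a success callback, call it with the token.
            // turnstile.execute() is not used here: it starts a new challenge
            // rather than accepting an injected token.
            document.querySelectorAll('.cf-turnstile').forEach(w => {
                const name = w.getAttribute('data-callback');
                if (name && typeof window[name] === 'function') {
                    window[name](token);
                }
            });
        }
    """, token)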

Turnstile Token Details

A few things to know about Turnstile tokens:

| Property | Value |
| --- | --- |
| Token TTL | ~300 seconds (5 minutes) |
| Token length | Up to 2,048 characters |
| Single use | Yes, each token works once |
| IP binding | Weak, some flexibility |
| Solve time (API) | ~3-8 seconds |

The 300-second TTL is more forgiving than reCAPTCHA's 120 seconds, but don't pre-solve a batch and reuse entries: each token is single-use, and anything solved too early may expire before you submit it.
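One way to keep those two rules straight in code is to wrap each token in a small object that tracks its age and whether it has been spent. This is a sketch, not part of any SDK: the class name is hypothetical and the 30-second safety margin is an assumption to allow for network latency.

```python
import time
from dataclasses import dataclass, field

TOKEN_TTL_SECONDS = 300      # from the table above
SAFETY_MARGIN_SECONDS = 30   # assumed buffer for request latency


@dataclass
class TurnstileToken:
    value: str
    issued_at: float = field(default_factory=time.monotonic)
    used: bool = False

    def is_usable(self, now=None) -> bool:
        """True if the token is unspent and comfortably inside its TTL."""
        now = time.monotonic() if now is None else now
        age = now - self.issued_at
        return not self.used and age < TOKEN_TTL_SECONDS - SAFETY_MARGIN_SECONDS

    def consume(self, now=None) -> str:
        """Return the token value once; raise if it is stale or already spent."""
        if not self.is_usable(now):
            raise ValueError("token expired or already used; solve again")
        self.used = True
        return self.value


token = TurnstileToken("demo-token")
print(token.consume())
```

Calling consume() a second time raises, which turns an accidental reuse into an obvious error instead of a silent rejection by the site.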

Complete Scraping Example

Here's a full example that scrapes a page behind a Turnstile-protected form. It reuses extract_turnstile_sitekey and solve_turnstile from the sections above:

import httpx
import asyncio
import os
from bs4 import BeautifulSoup

async def scrape_protected_page(url: str) -> dict:
    async with httpx.AsyncClient(follow_redirects=True) as client:
        # Step 1: Load the page
        resp = await client.get(url)

        # Step 2: Check for Turnstile
        if "cf-turnstile" not in resp.text:
            return parse_content(resp.text)

        # Step 3: Extract sitekey
        sitekey = extract_turnstile_sitekey(resp.text)
        if not sitekey:
            raise ValueError("Found Turnstile but couldn't extract sitekey")

        # Step 4: Solve
        token = await solve_turnstile(sitekey, url)

        # Step 5: Submit with token
        resp = await client.post(url, data={
            "cf-turnstile-response": token,
        })

        return parse_content(resp.text)

def parse_content(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.title.string if soup.title else None,
        "text": soup.get_text()[:500],
    }

# Usage
async def main():
    result = await scrape_protected_page("https://example.com/protected")
    print(result)

asyncio.run(main())

Turnstile vs reCAPTCHA vs hCaptcha

| Feature | Turnstile | reCAPTCHA v2 | hCaptcha |
| --- | --- | --- | --- |
| User friction | Usually none (spinner, then checkmark) | Checkbox + images | Checkbox + images |
| Token TTL | 300s | 120s | 120s |
| Detection | Browser challenges | Behavior analysis | Image puzzles |
| API solve time | ~5s | ~5s | ~5-10s |
| Adoption trend | Growing fast | Stable | Growing |

Wrapping Up

Turnstile is becoming the default CAPTCHA for Cloudflare-protected sites. The good news: it's simpler to handle than reCAPTCHA because there are no image puzzles — just token-based solving.

Full SDK with Turnstile support: passxapi-python on GitHub


Have you run into Turnstile on any unexpected sites? I'd love to hear about it in the comments.
