If you've been scraping the web in 2025, you've probably noticed reCAPTCHA showing up less and Cloudflare Turnstile showing up more. Major sites are switching — and it changes how your scraper needs to work.
Here's everything I've learned about Turnstile after dealing with it across dozens of projects.
What Is Turnstile?
Cloudflare Turnstile is a CAPTCHA alternative that aims to verify humans without showing puzzles. Unlike reCAPTCHA (which often shows image challenges), Turnstile runs a series of browser challenges in the background and returns a token.
From a user's perspective: you see a brief loading spinner, then a checkmark. No clicking fire hydrants.
From a developer's perspective: it's an invisible challenge that generates a cf-turnstile-response token you must submit with your form.
How to Detect Turnstile on a Page
Look for these indicators in the HTML source:
def has_turnstile(html: str) -> bool:
    # "data-sitekey" is deliberately left out: reCAPTCHA and hCaptcha use the
    # same attribute, so matching on it alone would give false positives.
    indicators = [
        "challenges.cloudflare.com/turnstile",
        "cf-turnstile",
        "turnstile.min.js",
    ]
    html_lower = html.lower()
    return any(ind in html_lower for ind in indicators)
The key difference from reCAPTCHA: the script source includes challenges.cloudflare.com/turnstile instead of google.com/recaptcha.
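If a scraper has to deal with more than one CAPTCHA vendor, the script-source difference can drive a small dispatcher. The sketch below is only an illustration: the function name and the marker lists are my own assumptions, and the reCAPTCHA and hCaptcha markers are commonly seen script URLs rather than an exhaustive set.

```python
from typing import Optional

# Substrings of each vendor's script URL. These are commonly seen values,
# not an exhaustive list; extend them for the sites you work with.
VENDOR_MARKERS = {
    "turnstile": ("challenges.cloudflare.com/turnstile",),
    "recaptcha": ("google.com/recaptcha", "recaptcha.net/recaptcha"),
    "hcaptcha": ("hcaptcha.com/1/api.js",),
}


def detect_captcha_vendor(html: str) -> Optional[str]:
    """Return the first vendor whose script marker appears in the HTML."""
    html_lower = html.lower()
    for vendor, markers in VENDOR_MARKERS.items():
        if any(marker in html_lower for marker in markers):
            return vendor
    return None
```

Matching on the script URL rather than on data-sitekey keeps the three vendors apart, since all of them use that attribute name.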
Extracting the Sitekey
Every Turnstile widget has a sitekey. You need it to solve the challenge:
import re
from bs4 import BeautifulSoup

def extract_turnstile_sitekey(html: str) -> str | None:
    # Method 1: data-sitekey attribute
    soup = BeautifulSoup(html, "html.parser")
    widget = soup.find(attrs={"class": "cf-turnstile"})
    if widget and widget.get("data-sitekey"):
        return widget["data-sitekey"]
    # Method 2: Regex fallback (some sites render dynamically)
    match = re.search(r'sitekey["\s:=]+["\']([0-9x\-A-Za-z]+)', html)
    if match:
        return match.group(1)
    return None
Sitekeys look like 0x4AAAAAAA..., a different format from reCAPTCHA keys.
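Some pages carry more than one widget, and a widget can also declare data-action and data-cdata next to its sitekey. Here is a rough stdlib-only sketch that collects all of them. The class and function names are hypothetical, and whether your solver needs the action or cdata values is an assumption to check against its documentation.

```python
from html.parser import HTMLParser


class TurnstileWidgetParser(HTMLParser):
    """Collect attributes from every element with the cf-turnstile class."""

    def __init__(self) -> None:
        super().__init__()
        self.widgets: list[dict] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # A valueless class attribute is reported as None, hence the `or ""`.
        classes = (attr_map.get("class") or "").split()
        if "cf-turnstile" in classes:
            self.widgets.append({
                "sitekey": attr_map.get("data-sitekey"),
                "action": attr_map.get("data-action"),
                "cdata": attr_map.get("data-cdata"),
            })


def find_turnstile_widgets(html: str) -> list[dict]:
    parser = TurnstileWidgetParser()
    parser.feed(html)
    return parser.widgets
```

Like Method 1 above, this only sees widgets present in the static HTML, so dynamically rendered widgets still need the regex fallback or a browser.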
Approach 1: Full Browser Automation
The brute-force approach — load the page in a real browser and wait for Turnstile to auto-solve:
from playwright.async_api import async_playwright

async def solve_with_browser(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()
        await page.goto(url)
        # Wait for Turnstile to auto-solve
        await page.wait_for_function("""
            () => {
                const input = document.querySelector(
                    'input[name="cf-turnstile-response"]'
                );
                return input && input.value.length > 0;
            }
        """, timeout=30000)
        token = await page.eval_on_selector(
            'input[name="cf-turnstile-response"]',
            'el => el.value'
        )
        await browser.close()
        return token
Problems with this approach:
- Requires a real browser (headless often gets detected)
- Slow — each solve needs a new page load
- Resource-heavy — memory and CPU per browser instance
- Cloudflare can still detect automation via browser fingerprinting
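One way to contain the memory and CPU cost listed above is to cap how many browser solves run at once. The helper below is a minimal sketch, not part of Playwright: solve_many and its parameters are names I made up, and it accepts any async function that maps a URL to a token.

```python
import asyncio
from typing import Awaitable, Callable


async def solve_many(
    urls: list[str],
    solve: Callable[[str], Awaitable[str]],
    max_concurrent: int = 2,
) -> dict[str, str]:
    """Run `solve` for each URL with at most `max_concurrent` in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def solve_one(url: str) -> tuple[str, str]:
        # The semaphore blocks here once max_concurrent solves are running.
        async with semaphore:
            return url, await solve(url)

    pairs = await asyncio.gather(*(solve_one(url) for url in urls))
    return dict(pairs)
```

With the function from this section it would be called as `await solve_many(urls, solve_with_browser)`. Each solve still launches its own browser, so this limits peak resource use without making individual solves any faster.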
Approach 2: API-Based Solving (Recommended)
Send the sitekey to a solving service and get back a valid token:
import httpx
import os

async def solve_turnstile(sitekey: str, page_url: str) -> str:
    async with httpx.AsyncClient() as client:
        resp = await client.post(
            "https://api.passxapi.com/solve",
            json={
                "type": "turnstile",
                "sitekey": sitekey,
                "url": page_url,
            },
            # os.environ[...] fails fast with a KeyError if the key is unset,
            # instead of passing None as a header value
            headers={"x-api-key": os.environ["PASSXAPI_KEY"]},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["token"]
Or with the SDK:
import os
from passxapi import AsyncClient

solver = AsyncClient(api_key=os.getenv("PASSXAPI_KEY"))

async def solve(sitekey, url):
    result = await solver.solve(
        captcha_type="turnstile",
        sitekey=sitekey,
        url=url
    )
    return result["token"]
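Solving services can fail or time out now and then, so it may be worth wrapping whichever solver you use in a retry. This is a generic sketch with hypothetical names (solve_with_retry, attempts, base_delay). It assumes nothing about the service beyond "an async callable that returns a token or raises".

```python
import asyncio
from typing import Awaitable, Callable


async def solve_with_retry(
    solve: Callable[[], Awaitable[str]],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> str:
    """Call `solve` until it returns a token, doubling the delay each time."""
    last_error: Exception | None = None
    for attempt in range(attempts):
        try:
            return await solve()
        except Exception as exc:  # narrow this to your client's error types
            last_error = exc
            if attempt < attempts - 1:
                await asyncio.sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"solver failed after {attempts} attempts") from last_error
```

Usage with the httpx version above would look like `await solve_with_retry(lambda: solve_turnstile(sitekey, page_url))`.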
Injecting the Token
Once you have the token, submit it with the form:
With requests/httpx (no browser):
async def submit_form_with_turnstile(url, form_data, sitekey):
    token = await solve_turnstile(sitekey, url)
    form_data["cf-turnstile-response"] = token
    async with httpx.AsyncClient() as client:
        resp = await client.post(url, data=form_data)
        return resp
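Real forms usually carry other fields next to the Turnstile one, such as CSRF tokens, and the server tends to reject a POST that omits them. Here is a stdlib-only sketch for building form_data from the page itself. The names are hypothetical, and for simplicity it collects every named input on the page rather than scoping to a single form.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from <input> elements that have a name."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        if name:
            # Inputs without a value attribute are sent as empty strings.
            self.fields[name] = attr_map.get("value") or ""


def build_form_payload(html: str, token: str) -> dict[str, str]:
    parser = FormFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)
    # Overwrite the (usually empty) Turnstile field with the solved token.
    payload["cf-turnstile-response"] = token
    return payload
```

The result can be passed as form_data after filling in any user-facing fields. Select boxes, textareas, and unchecked checkboxes are not handled here.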
With Playwright (browser automation):
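async def inject_turnstile_token(page, token):
    # Pass the token as an argument rather than interpolating it into the
    # script, so quotes in the token cannot break or alter the JavaScript.
    await page.evaluate(
        """
        (token) => {
            // Set the hidden input
            const input = document.querySelector(
                'input[name="cf-turnstile-response"]'
            );
            if (input) input.value = token;
            // Also try the callback approach: a widget can name a global
            // success callback in data-callback. (turnstile.execute() would
            // start a new challenge instead of reporting success.)
            document.querySelectorAll('.cf-turnstile').forEach(w => {
                const name = w.getAttribute('data-callback');
                if (name && typeof window[name] === 'function') {
                    window[name](token);
                }
            });
        }
        """,
        token,
    )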
Turnstile Token Details
A few things to know about Turnstile tokens:
| Property | Value |
|---|---|
| Token TTL | ~300 seconds (5 minutes) |
| Token length | Up to 2,048 characters |
| Single use | Yes — each token works once |
| IP binding | Weak — some flexibility |
| Solve time (API) | ~3-8 seconds |
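The 300-second TTL is more forgiving than reCAPTCHA's 120 seconds, but don't pre-solve and cache a pool of tokens. Each token works once, so every submission needs its own fresh solve, and anything solved too early can expire before you use it.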
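To make the TTL and single-use rules hard to violate by accident, a token can be wrapped in a small object that tracks its age and whether it has been spent. This is a sketch under assumptions: the class name, the 30-second safety margin, and the injectable clock are my own choices, and only the 300-second TTL comes from the table above.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TurnstileToken:
    value: str
    ttl_seconds: float = 300.0  # TTL from the table above
    safety_margin: float = 30.0  # assumed buffer for network latency
    clock: Callable[[], float] = time.monotonic
    issued_at: float = field(default=0.0, init=False)
    used: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        # Record when the token was obtained, using the injected clock.
        self.issued_at = self.clock()

    def is_usable(self) -> bool:
        age = self.clock() - self.issued_at
        return not self.used and age < self.ttl_seconds - self.safety_margin

    def consume(self) -> str:
        """Return the token once; raise if it is stale or already spent."""
        if not self.is_usable():
            raise ValueError("token expired or already used; solve again")
        self.used = True
        return self.value
```

Calling `consume()` right before the POST means a stale or reused token raises locally instead of turning into a confusing rejection from the target site.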
Complete Scraping Example
Here's a complete example that scrapes a Turnstile-protected page. It reuses extract_turnstile_sitekey and solve_turnstile from the sections above:
import httpx
import asyncio
import os
from bs4 import BeautifulSoup

async def scrape_protected_page(url: str) -> dict:
    async with httpx.AsyncClient(follow_redirects=True) as client:
        # Step 1: Load the page
        resp = await client.get(url)
        # Step 2: Check for Turnstile
        if "cf-turnstile" not in resp.text:
            return parse_content(resp.text)
        # Step 3: Extract sitekey
        sitekey = extract_turnstile_sitekey(resp.text)
        if not sitekey:
            raise ValueError("Found Turnstile but couldn't extract sitekey")
        # Step 4: Solve
        token = await solve_turnstile(sitekey, url)
        # Step 5: Submit with token
        resp = await client.post(url, data={
            "cf-turnstile-response": token,
        })
        return parse_content(resp.text)

def parse_content(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": soup.title.string if soup.title else None,
        "text": soup.get_text()[:500],
    }

# Usage
async def main():
    result = await scrape_protected_page("https://example.com/protected")
    print(result)

asyncio.run(main())
Turnstile vs reCAPTCHA vs hCaptcha
| | Turnstile | reCAPTCHA v2 | hCaptcha |
|---|---|---|---|
| User friction | None (invisible) | Checkbox + images | Checkbox + images |
| Token TTL | 300s | 120s | 120s |
| Detection | Browser challenges | Behavior analysis | Image puzzles |
| API solve time | ~5s | ~5s | ~5-10s |
| Growing or shrinking | Growing fast | Stable | Growing |
Wrapping Up
Turnstile is becoming the default CAPTCHA for Cloudflare-protected sites. The good news: it's simpler to handle than reCAPTCHA because there are no image puzzles — just token-based solving.
Full SDK with Turnstile support: passxapi-python on GitHub
Have you run into Turnstile on any unexpected sites? I'd love to hear about it in the comments.