Cloudflare doesn't just block scrapers by IP. In 2026, the primary detection mechanism is TLS fingerprinting — analyzing how your HTTPS connection is established before any HTTP request is sent.
Here's how it works and what you actually need to do to bypass it.
What TLS Fingerprinting Is
When you make an HTTPS request, your HTTP client and the server perform a TLS handshake. During this handshake, the client announces:
- Which TLS versions it supports
- Which cipher suites it prefers (and in what order)
- Which extensions it includes (SNI, ALPN, etc.)
- Which elliptic curves it supports
This combination creates a fingerprint. Different clients produce different fingerprints:
- Python's requests library produces a known fingerprint
- Chrome 120 on Windows produces a different fingerprint
- Firefox 121 on macOS produces another
Cloudflare and similar services maintain databases of these fingerprints. When your scraper connects with Python's default fingerprint, Cloudflare knows before seeing any HTTP headers that this is a programmatic client, not a browser.
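To get a feel for what a Python client announces, the standard library can list part of it locally without opening a connection. This is a minimal sketch: it only shows the TLS version bounds and cipher order that Python's default context would offer, not the extensions or curves, and the helper name is made up for illustration.

```python
import ssl


def describe_default_context() -> dict:
    """Summarize what Python's default TLS context is configured to offer."""
    ctx = ssl.create_default_context()
    return {
        "min_version": ctx.minimum_version.name,
        "max_version": ctx.maximum_version.name,
        # get_ciphers() returns the enabled suites in priority order
        "ciphers": [c["name"] for c in ctx.get_ciphers()],
    }


summary = describe_default_context()
print(summary["min_version"], summary["max_version"])
print(summary["ciphers"][:5])
```

The cipher order printed here is one of the inputs a fingerprinting service sees, and it differs from the order a browser sends.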
The Standard Python Fingerprint Problem
import requests
r = requests.get("https://cloudflare-protected-site.com")
# Cloudflare sees: the TLS fingerprint of Python's ssl module (OpenSSL), not a browser's
# Blocked: often immediately, sometimes after first request
Python's built-in SSL implementation (via OpenSSL) produces a fingerprint that looks nothing like a browser. Specifically:
- Python uses a different cipher suite order
- Python typically doesn't include browser-specific TLS extensions
- Python's ALPN negotiation differs from Chrome's
This is detectable during the TLS handshake (in the ClientHello message), before any HTTP request is sent.
The curl_cffi Solution
curl_cffi is a Python binding for curl-impersonate, a libcurl build compiled against BoringSSL (the TLS library Chrome uses) and configured to send the same handshake parameters as a real browser.
from curl_cffi import requests as curl_requests
# Impersonate Chrome 120
session = curl_requests.Session(impersonate="chrome120")
response = session.get("https://cloudflare-protected-site.com")
print(response.status_code) # 200 instead of 403
Available impersonation targets in 2026:
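- chrome120, chrome119, chrome110 (recommended)
- firefox120, firefox110
- safari17_0, safari16_5
- edge99
The exact set of targets depends on the installed curl_cffi version, so check its documentation before relying on a specific name.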
When to use which: chrome120 is the default choice. If a site specifically blocks Chrome patterns, try safari17_0 — Safari's fingerprint is less commonly targeted in blocklists.
Advanced: httpx with TLS Customization
For more control, httpx with a custom TLS configuration can get you closer to browser fingerprints:
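import asyncio
import ssl

import httpx

# Custom SSL context that matches Chrome's behavior more closely
def create_chrome_ssl_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()
    # Chrome's preferred cipher order (simplified).
    # Note: per the Python ssl docs, set_ciphers() only governs TLS 1.2 and
    # earlier suites, so the TLS_* (TLS 1.3) names below have no effect.
    ctx.set_ciphers(
        "TLS_AES_128_GCM_SHA256:"
        "TLS_AES_256_GCM_SHA384:"
        "TLS_CHACHA20_POLY1305_SHA256:"
        "ECDH+AESGCM:"
        "ECDH+CHACHA20:"
        "DHE+AESGCM"
    )
    return ctx

async def main(url: str) -> None:
    # This alone isn't sufficient for modern Cloudflare, but helps with basic fingerprinting
    async with httpx.AsyncClient(verify=create_chrome_ssl_context()) as client:
        r = await client.get(url)
        print(r.status_code)

asyncio.run(main("https://cloudflare-protected-site.com"))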
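In practice: for serious Cloudflare bypassing, curl_cffi is more reliable than manual SSL context configuration. The TLS fingerprint involves dozens of parameters (cipher order, extension order, supported groups, signature algorithms, ALPN), and Python's ssl module exposes only a few of them. curl_cffi sets them together because it runs on the same TLS library Chrome uses.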
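A quick local check shows how little set_ciphers() actually changes. The sketch below compares a default context with one restricted to the TLS 1.2 part of the cipher string used above. The helper name is hypothetical, and the assumption (from the Python ssl documentation) is that TLS 1.3 suites are not affected by set_ciphers().

```python
import ssl


def cipher_names(ctx: ssl.SSLContext, tls13: bool) -> list:
    """Return enabled cipher names, either only TLS 1.3 suites or only older ones."""
    return [
        c["name"]
        for c in ctx.get_ciphers()
        if (c["protocol"] == "TLSv1.3") == tls13
    ]


default_ctx = ssl.create_default_context()
custom_ctx = ssl.create_default_context()
# Only the TLS 1.2 portion of the simplified Chrome-like list
custom_ctx.set_ciphers("ECDH+AESGCM:ECDH+CHACHA20:DHE+AESGCM")

print("TLS 1.3 (default):", cipher_names(default_ctx, tls13=True))
print("TLS 1.3 (custom): ", cipher_names(custom_ctx, tls13=True))
print("TLS 1.2 (custom): ", cipher_names(custom_ctx, tls13=False))
```

If the TLS 1.3 lists come out identical, that part of the handshake is untouched by the customization, and extensions and curve order are not reachable from this API at all.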
JA3 and JA3N Fingerprinting
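The fingerprint format most commonly associated with this technique is JA3 (developed at Salesforce). It builds a string from five ClientHello fields:
- SSLVersion
- Ciphers
- Extensions
- EllipticCurves
- EllipticCurvePointFormats
The fields are joined with commas, and the values inside each field are joined with hyphens. The MD5 hash of that string is compared against a database. Python requests produces a stable, recognizable hash (for example 7dc465e28e1a62b68be994b34ae9eb24, although the exact value depends on the Python and OpenSSL versions in use).
JA3N is a normalized variant that sorts the extension list before hashing, so the hash stays stable even when a client randomizes extension order, as recent Chrome versions do. Matching it still means getting the full set of extensions right, which is hard without using the actual client TLS stack.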
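The hashing step itself is small enough to sketch. The function names below are illustrative, the input values are arbitrary sample numbers rather than a captured handshake, and JA3N is assumed here to mean the variant that sorts extensions before hashing. GREASE values are assumed to have been filtered out already.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Join the five fields with commas and the values inside each field with hyphens."""
    fields = [ciphers, extensions, curves, point_formats]
    joined = ["-".join(str(v) for v in field) for field in fields]
    return ",".join([str(version)] + joined)


def ja3_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """MD5 of the JA3 string, as a hex digest."""
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


def ja3n_hash(version, ciphers, extensions, curves, point_formats) -> str:
    """Same as ja3_hash, but with the extension list sorted first (assumed JA3N behavior)."""
    return ja3_hash(version, ciphers, sorted(extensions), curves, point_formats)


# Arbitrary sample values, not taken from a real client
sample = dict(version=771, ciphers=[4865, 4866], curves=[29, 23], point_formats=[0])
print(ja3_string(extensions=[0, 10, 11], **sample))
print(ja3_hash(extensions=[0, 10, 11], **sample))
```

Reordering the extensions changes the JA3 hash but leaves the sorted variant unchanged, which is the point of the normalization.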
Check your current fingerprint:
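import json
import subprocess

# Check what fingerprint the curl binary on this machine produces.
# -s keeps progress output out of stdout so the body parses as JSON.
# To measure your scraper instead, request the same URL from its HTTP client.
result = subprocess.run(
    ["curl", "-s", "--tls-max", "1.3", "https://tls.peet.ws/api/all"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)
# Field names assumed from the service's response format
print(report.get("tls", {}).get("ja3_hash"))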
Or visit tls.peet.ws in a browser vs. from your scraper to see the difference.
What curl_cffi Doesn't Fix
TLS fingerprinting is one layer of bot detection. Even with a perfect TLS fingerprint, you'll still be detected if:
1. Your HTTP headers are wrong:
# Wrong: missing or out-of-order headers
headers = {"User-Agent": "Mozilla/5.0", "Accept": "*/*"}
# Right: match browser header order and values exactly
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "DNT": "1",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
}
2. Your behavior patterns are robotic:
- Zero delay between requests
- Perfect regularity (no human-like variation)
- Missing referer headers on subsequent page loads
- No cookie handling
3. JavaScript challenges aren't solved:
Cloudflare's highest protection level (Under Attack Mode) serves a JavaScript challenge that must be solved before you see any content. This requires a real browser (for example, Playwright), not just TLS spoofing.
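The behavioral signals in point 2 can be softened with two small helpers. This is a sketch with hypothetical names: one draws a randomized pause, the other adds a Referer and a matching Sec-Fetch-Site value for follow-up page loads. The origin comparison is simplified and ignores the "same-site" case.

```python
import random
from typing import Optional
from urllib.parse import urlsplit


def human_delay(rng: random.Random, low: float = 0.5, high: float = 2.0) -> float:
    """Pick a pause length in seconds with some variation between requests."""
    return rng.uniform(low, high)


def navigation_headers(base: dict, url: str, previous_url: Optional[str]) -> dict:
    """Copy base headers and describe how the browser would have reached `url`."""
    headers = dict(base)
    if previous_url is None:
        # First page load, as if the URL was typed into the address bar
        headers["Sec-Fetch-Site"] = "none"
        return headers
    headers["Referer"] = previous_url
    prev, cur = urlsplit(previous_url), urlsplit(url)
    same_origin = (prev.scheme, prev.netloc) == (cur.scheme, cur.netloc)
    headers["Sec-Fetch-Site"] = "same-origin" if same_origin else "cross-site"
    return headers


rng = random.Random()
print(round(human_delay(rng), 2))
print(navigation_headers({}, "https://example.com/page2", "https://example.com/page1"))
```

Passing the previously fetched URL on each call keeps the Referer chain consistent with an actual click path.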
Practical Decision Tree
Is the site behind Cloudflare?
├── No → requests or httpx works fine
└── Yes → check protection level
    ├── Basic (static content loads) → curl_cffi with chrome impersonation
    ├── Anti-bot (5-second check) → curl_cffi + proper headers + cookie handling
    └── Under Attack Mode (JS challenge) → playwright with stealth mode
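The same tree can be written as a lookup, which is handy when a scraper has to pick a strategy per target. The enum and function names are hypothetical, and deciding which level a site is actually at is left to the caller.

```python
from enum import Enum


class Protection(Enum):
    NONE = "not behind Cloudflare"
    BASIC = "static content loads"
    ANTI_BOT = "5-second check"
    UNDER_ATTACK = "JS challenge"


# One entry per leaf of the decision tree
STRATEGY = {
    Protection.NONE: "requests or httpx",
    Protection.BASIC: "curl_cffi with chrome impersonation",
    Protection.ANTI_BOT: "curl_cffi + proper headers + cookie handling",
    Protection.UNDER_ATTACK: "playwright with stealth mode",
}


def choose_tool(level: Protection) -> str:
    """Return the cheapest approach the tree suggests for this protection level."""
    return STRATEGY[level]


for level in Protection:
    print(f"{level.name}: {choose_tool(level)}")
```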
curl_cffi in Production
from curl_cffi import requests as curl_requests
import random
import time

session = curl_requests.Session(impersonate="chrome120")

def scrape_cloudflare_site(url: str) -> str:
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Accept-Encoding": "gzip, deflate, br",
        "Referer": "https://www.google.com/",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "cross-site",
    }
    # Small random delay
    time.sleep(random.uniform(0.5, 2.0))
    response = session.get(url, headers=headers, timeout=20)
    if response.status_code == 403:
        raise Exception(f"Blocked: {response.status_code}")
    return response.text

# The session maintains cookies across requests automatically
Key notes:
- Session() reuses the connection (faster, and maintains cookies)
- Include Referer: google.com for first page load (natural navigation pattern)
- Random delays are important: constant 0-latency requests are a detection signal
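The function above raises on the first 403. One possible extension is a retry wrapper with exponential backoff and jitter. This is a sketch under assumptions: `fetch` is any zero-argument callable returning a (status, body) pair, the set of retryable status codes is a guess, and all names are made up for illustration.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {403, 429, 503}  # assumed "blocked or throttled" statuses


class Blocked(Exception):
    """Raised when every attempt came back with a retryable status."""


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it succeeds, waiting longer after each blocked attempt."""
    status = None
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in RETRYABLE:
            return body
        if attempt < max_attempts - 1:
            # 2s, 4s, 8s, ... plus up to one second of jitter
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise Blocked(f"Still blocked after {max_attempts} attempts (last status {status})")
```

It could wrap the session call with something like `fetch_with_backoff(lambda: (r := session.get(url)).status_code and (r.status_code, r.text))`, though a small named function is easier to read. Injecting `sleep` keeps the wrapper testable without real waiting.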
The Arms Race
Bot detection evolves continuously. What works in April 2026 may not work in October 2026. The current state:
- Pure Python requests: blocked on most Cloudflare-protected sites
- curl_cffi with chrome impersonation: works on 70-80% of Cloudflare sites
- playwright + stealth: works on ~90% but 5-10x slower
- Residential proxies + playwright: works on 95%+ but costs $5-15/GB
The progression from free to expensive matches the anti-bot sophistication you're dealing with.
Production Anti-Bot Ready Scrapers
If you need scrapers that already handle Cloudflare, I maintain 35 Apify actors with built-in anti-bot handling — proxy rotation, browser fingerprinting, and retry logic are included.
Apify Scrapers Bundle — €29 — one-time download. All actors run on Apify's infrastructure (no server needed).