DEV Community

Vhub Systems

TLS Fingerprinting: How Cloudflare Identifies Scrapers (And How to Bypass It)

Cloudflare doesn't just block scrapers by IP. In 2026, one of its main detection signals is TLS fingerprinting: analyzing how your HTTPS connection is established before any HTTP request is sent.

Here's how it works and what you actually need to do to bypass it.

What TLS Fingerprinting Is

When you make an HTTPS request, your HTTP client and the server perform a TLS handshake. During this handshake, the client announces:

  • Which TLS versions it supports
  • Which cipher suites it prefers (and in what order)
  • Which extensions it includes (SNI, ALPN, etc.)
  • Which elliptic curves it supports

This combination creates a fingerprint. Different clients produce different fingerprints:

  • Python's requests library produces a known fingerprint
  • Chrome 120 on Windows produces a different fingerprint
  • Firefox 121 on macOS produces another

Cloudflare and similar services maintain databases of these fingerprints. When your scraper connects with Python's default fingerprint, Cloudflare knows before seeing any HTTP headers that this is a programmatic client, not a browser.

The Standard Python Fingerprint Problem

import requests

r = requests.get("https://cloudflare-protected-site.com")
# Cloudflare sees the TLS fingerprint of Python's ssl module (OpenSSL), not a browser's
# Blocked: often immediately, sometimes after the first request

Python's built-in SSL implementation (via OpenSSL) produces a fingerprint that looks nothing like a browser. Specifically:

  • Python uses a different cipher suite order
  • Python typically doesn't include browser-specific TLS extensions
  • Python's ALPN negotiation differs from Chrome's

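This is visible in the TLS ClientHello, the first message your client sends after the TCP connection opens, so the server can classify the client before any HTTP request arrives.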
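To see part of what Python would offer in its ClientHello, the standard `ssl` module can describe its default context. This is a minimal sketch: the helper name `describe_default_client` is illustrative, the output depends on your Python and OpenSSL builds, and HTTP libraries may adjust the context further before connecting.

```python
import ssl


def describe_default_client() -> dict:
    """Summarize the default TLS settings Python starts from."""
    ctx = ssl.create_default_context()
    return {
        "openssl": ssl.OPENSSL_VERSION,
        "min_version": ctx.minimum_version.name,
        "max_version": ctx.maximum_version.name,
        # Enabled cipher names, in the context's priority order
        "ciphers": [c["name"] for c in ctx.get_ciphers()],
    }


if __name__ == "__main__":
    info = describe_default_client()
    print(info["openssl"], info["min_version"], info["max_version"])
    for name in info["ciphers"][:5]:
        print(name)
```

Comparing this list with what a browser reports on a fingerprint-echo service makes the difference in cipher order concrete.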

The curl_cffi Solution

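curl_cffi is a Python binding for curl-impersonate, a libcurl build patched to reproduce browser TLS handshakes (its Chrome targets use BoringSSL, the TLS library Chrome itself uses). The result is a TLS fingerprint that matches a real browser.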

from curl_cffi import requests as curl_requests

# Impersonate Chrome 120
session = curl_requests.Session(impersonate="chrome120")
response = session.get("https://cloudflare-protected-site.com")
print(response.status_code)  # 200 instead of 403

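Impersonation targets (the supported names change between curl_cffi releases, so check this list against the documentation for your installed version):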

  • chrome120, chrome119, chrome110 (recommended)
  • firefox120, firefox110
  • safari17_0, safari16_5
  • edge99

When to use which: chrome120 is the default choice. If a site specifically blocks Chrome patterns, try safari17_0 — Safari's fingerprint is less commonly targeted in blocklists.
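The "try another target if one is blocked" advice can be sketched as a small fallback loop. The helper name `fetch_with_fallback` and its `get` parameter are illustrative, not part of curl_cffi. By default it calls `curl_cffi.requests.get`, and it treats only a 403 as "blocked", which is a simplifying assumption.

```python
from typing import Callable, Iterable, Optional


def fetch_with_fallback(url: str, targets: Iterable[str],
                        get: Optional[Callable] = None):
    """Try each impersonation target in order.

    Returns (target, response) for the first response that is not a 403.
    """
    if get is None:
        # Imported lazily so the helper can be exercised with a stand-in `get`
        from curl_cffi import requests as curl_requests
        get = curl_requests.get

    last_status = None
    for target in targets:
        response = get(url, impersonate=target, timeout=20)
        if response.status_code != 403:
            return target, response
        last_status = response.status_code
    raise RuntimeError(
        f"All impersonation targets were blocked (last status {last_status})"
    )


# Usage (performs real network requests):
# target, response = fetch_with_fallback(
#     "https://cloudflare-protected-site.com", ["chrome120", "safari17_0"]
# )
```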

Advanced: httpx with TLS Customization

For more control, httpx with a custom TLS configuration can get you closer to browser fingerprints:

import asyncio
import ssl

import httpx

# Custom SSL context that matches Chrome's behavior more closely
def create_chrome_ssl_context() -> ssl.SSLContext:
    ctx = ssl.create_default_context()
    # Chrome-like TLS 1.2 cipher preference (simplified). The TLS 1.3 suites
    # (TLS_AES_128_GCM_SHA256 etc.) cannot be reordered through set_ciphers().
    ctx.set_ciphers("ECDH+AESGCM:ECDH+CHACHA20:DHE+AESGCM")
    return ctx

# This alone isn't sufficient for modern Cloudflare, but helps with basic fingerprinting
async def fetch(url: str) -> httpx.Response:
    async with httpx.AsyncClient(verify=create_chrome_ssl_context()) as client:
        return await client.get(url)

r = asyncio.run(fetch("https://cloudflare-protected-site.com"))

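In practice: for serious Cloudflare bypassing, curl_cffi is more reliable than manual SSL context configuration. The TLS fingerprint involves dozens of parameters (extension order, supported groups, signature algorithms, ALPN, and more), most of which Python's ssl module does not expose. curl_cffi reproduces them by building on curl-impersonate's browser-matched TLS stack.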

JA3 and JA3N Fingerprinting

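The fingerprint format most often cited is JA3 (developed at Salesforce). It builds a string from five ClientHello fields, in this order:

  1. SSLVersion
  2. Ciphers
  3. Extensions
  4. EllipticCurves
  5. EllipticCurvePointFormats

The fields are joined with commas, and the values inside each field are joined with hyphens. The MD5 hash of that string is compared against a database. Python requests produces a hash such as 7dc465e28e1a62b68be994b34ae9eb24 (the exact value depends on the Python and OpenSSL versions in use), which blocklists can recognize as a scraper fingerprint.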
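The hashing step is simple enough to sketch directly. Following the published JA3 description, the five fields are joined with commas, the decimal values inside each field are joined with hyphens, and GREASE placeholder values are dropped. The function names and the sample values below are illustrative, not captured from a real client.

```python
import hashlib
from typing import Iterable, Tuple


def is_grease(value: int) -> bool:
    """GREASE placeholders (0x0A0A, 0x1A1A, ... 0xFAFA) are excluded from JA3."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(version: int, ciphers: Iterable[int], extensions: Iterable[int],
        curves: Iterable[int], point_formats: Iterable[int]) -> Tuple[str, str]:
    """Return (ja3_string, md5_hex) for the given ClientHello values."""
    def field(values: Iterable[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    ja3_string = ",".join([
        str(version),
        field(ciphers),
        field(extensions),
        field(curves),
        field(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Sample values only; 0x0A0A is a GREASE cipher and is skipped
    text, digest = ja3(771, [0x0A0A, 4865, 4866], [0, 10, 11], [29, 23], [0])
    print(text)    # 771,4865-4866,0-10-11,29-23,0
    print(digest)
```

Because order is preserved inside each field, two clients that support the same ciphers but list them in a different order get different hashes.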

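JA3N is a normalized variant: it sorts the extension list before hashing, so the fingerprint stays stable for browsers that randomize extension order, as recent Chrome versions do. The newer JA4 format goes further and includes additional parameters. Both are hard to spoof without using the actual client TLS stack.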
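A rough sketch of the normalization idea, assuming JA3N is computed as a JA3 string whose extension IDs are sorted numerically before hashing. The function name `ja3n_hash` is illustrative and the code is not taken from any vendor's implementation.

```python
import hashlib


def ja3n_hash(ja3_string: str) -> str:
    """Hash a JA3 string after sorting its extension field numerically."""
    version, ciphers, extensions, curves, formats = ja3_string.split(",")
    sorted_ext = ""
    if extensions:
        sorted_ext = "-".join(sorted(extensions.split("-"), key=int))
    normalized = ",".join([version, ciphers, sorted_ext, curves, formats])
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Same client, extensions sent in a different order on each connection
    first = "771,4865-4866,0-10-11,29-23,0"
    second = "771,4865-4866,11-0-10,29-23,0"
    print(ja3n_hash(first) == ja3n_hash(second))  # True
```

Plain JA3 would give these two connections different hashes, which is why extension shuffling defeats naive JA3 matching but not the normalized form.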

Check your current fingerprint:

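import json
import subprocess

# Fetch the fingerprint report for the system curl binary.
# To measure a Python client instead, request the same URL with that client.
result = subprocess.run(
    ["curl", "-s", "--tls-max", "1.3", "https://tls.peet.ws/api/all"],
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)
# Field names below are assumed from the service's JSON layout; adjust if it differs
print(report.get("tls", {}).get("ja3_hash"))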

Or visit tls.peet.ws in a browser vs. from your scraper to see the difference.

What curl_cffi Doesn't Fix

TLS fingerprinting is one layer of bot detection. Even with a perfect TLS fingerprint, you'll still be detected if:

1. Your HTTP headers are wrong:

# Wrong: a minimal header set that no real browser sends
headers = {"User-Agent": "Mozilla/5.0", "Accept": "*/*"}

# Better: a fuller, Firefox-style set. Values and order must match the browser
# your TLS fingerprint claims to be (add the matching User-Agent as well).
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "DNT": "1",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
}

2. Your behavior patterns are robotic:

  • Zero delay between requests
  • Perfect regularity (no human-like variation)
  • Missing referer headers on subsequent page loads
  • No cookie handling
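One way to avoid perfectly regular timing is to draw each delay from a range and occasionally add a longer pause. This is a sketch only: the function name `jittered_delays` and every number in it are assumptions to tune per site, not values known to evade any particular detector.

```python
import random
from typing import List, Optional


def jittered_delays(count: int, low: float = 0.5, high: float = 2.0,
                    long_pause_chance: float = 0.1,
                    seed: Optional[int] = None) -> List[float]:
    """Return `count` irregular delays in seconds."""
    rng = random.Random(seed)
    delays = []
    for _ in range(count):
        delay = rng.uniform(low, high)
        if rng.random() < long_pause_chance:
            # Occasional longer pause, as if the user stopped to read the page
            delay += rng.uniform(high, 3 * high)
        delays.append(delay)
    return delays


# Usage: sleep for the next delay before each request
# for url, delay in zip(urls, jittered_delays(len(urls))):
#     time.sleep(delay)
#     session.get(url)
```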

3. JavaScript challenges aren't solved:
Cloudflare's highest protection level (Under Attack Mode) serves a JavaScript challenge that must be solved before you see any content. Solving it requires a real browser (for example, Playwright), not just TLS spoofing.
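Before escalating to a browser, it helps to tell a challenge page apart from an ordinary block. The sketch below checks the `cf-mitigated: challenge` response header that Cloudflare documents for challenge responses, then falls back to a looser heuristic (status code plus the "Just a moment" page title) that is an assumption and may produce false positives. The function name is illustrative.

```python
from typing import Mapping


def looks_like_cloudflare_challenge(status_code: int,
                                    headers: Mapping[str, str],
                                    body: str) -> bool:
    """Best-effort check for a Cloudflare challenge response."""
    normalized = {k.lower(): v.lower() for k, v in headers.items()}
    if normalized.get("cf-mitigated") == "challenge":
        return True
    # Fallback heuristic (assumption): challenge pages usually come back
    # as 403 or 503 with a "Just a moment..." title
    return status_code in (403, 503) and "just a moment" in body.lower()


# Usage with a response object from requests, httpx, or curl_cffi:
# if looks_like_cloudflare_challenge(r.status_code, r.headers, r.text):
#     ...  # hand the URL to a browser-based fetcher instead
```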

Practical Decision Tree

Is the site behind Cloudflare?
├── No → requests or httpx works fine
└── Yes → check protection level
    ├── Basic (static content loads) → curl_cffi with chrome impersonation
    ├── Anti-bot (5-second check) → curl_cffi + proper headers + cookie handling
    └── Under Attack Mode (JS challenge) → playwright with stealth mode
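The same tree can be written as a small lookup, which is handy when a scraper has to pick a fetcher per site. The function name and the level labels ("basic", "anti-bot", "under-attack") are hypothetical names for the three branches above.

```python
def choose_tool(behind_cloudflare: bool, protection: str = "basic") -> str:
    """Map the decision tree's branches to a recommended tool."""
    if not behind_cloudflare:
        return "requests or httpx"
    choices = {
        "basic": "curl_cffi with chrome impersonation",
        "anti-bot": "curl_cffi + proper headers + cookie handling",
        "under-attack": "playwright with stealth mode",
    }
    try:
        return choices[protection]
    except KeyError:
        raise ValueError(f"Unknown protection level: {protection!r}") from None


if __name__ == "__main__":
    print(choose_tool(False))
    print(choose_tool(True, "under-attack"))
```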

curl_cffi in Production

from curl_cffi import requests as curl_requests
import random
import time

session = curl_requests.Session(impersonate="chrome120")

def scrape_cloudflare_site(url: str) -> str:
    # Header values should match the impersonated browser (Chrome here)
    headers = {
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        # Suits a first page load; later requests should send the previous page instead
        "Referer": "https://www.google.com/",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "cross-site",
    }

    # Small random delay
    time.sleep(random.uniform(0.5, 2.0))

    response = session.get(url, headers=headers, timeout=20)

    if response.status_code == 403:
        raise RuntimeError(f"Blocked: {response.status_code} for {url}")

    return response.text

# The session maintains cookies across requests automatically

Key notes:

  • Session() reuses the connection (faster, and maintains cookies)
  • Include Referer: google.com for first page load (natural navigation pattern)
  • Random delays are important — constant 0-latency requests are a detection signal
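For requests after the first page load, the Referer should follow the navigation instead of always pointing at a search engine. A minimal sketch, assuming the browser's default referrer policy (full URL for same-origin navigation, origin only for cross-origin). The helper name `navigation_headers` is illustrative, and it does not distinguish the `same-site` case.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def navigation_headers(url: str, previous_url: Optional[str] = None) -> Dict[str, str]:
    """Build Referer and Sec-Fetch-Site headers for the next page load."""
    if previous_url is None:
        # First page load: arrive as if from a search result
        return {"Referer": "https://www.google.com/", "Sec-Fetch-Site": "cross-site"}

    current, previous = urlsplit(url), urlsplit(previous_url)
    same_origin = (current.scheme, current.netloc) == (previous.scheme, previous.netloc)
    if same_origin:
        return {"Referer": previous_url, "Sec-Fetch-Site": "same-origin"}
    # Cross-origin navigation: send only the origin of the previous page
    origin = f"{previous.scheme}://{previous.netloc}/"
    return {"Referer": origin, "Sec-Fetch-Site": "cross-site"}


# Usage: merge into the per-request headers
# headers.update(navigation_headers(next_url, previous_url=last_url))
```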

The Arms Race

Bot detection evolves continuously. What works in April 2026 may not work in October 2026. Rough estimates of the current state:

  • Pure Python requests: blocked on most Cloudflare-protected sites
  • curl_cffi with chrome impersonation: works on 70-80% of Cloudflare sites
  • playwright + stealth: works on ~90% but 5-10x slower
  • Residential proxies + playwright: works on 95%+ but costs $5-15/GB

The progression from free to expensive matches the anti-bot sophistication you're dealing with.


Production Anti-Bot Ready Scrapers

If you need scrapers that already handle Cloudflare, I maintain 35 Apify actors with built-in anti-bot handling — proxy rotation, browser fingerprinting, and retry logic are included.

Apify Scrapers Bundle — €29 — one-time download. All actors run on Apify's infrastructure (no server needed).
