Lalit Mishra

HTTP/2 and Header Consistency: The Holy Grail of Stealth

1. Introduction: The Stealth Gap

In the previous analysis of TLS fingerprinting, we established that the handshake is the first hurdle in modern scraping. By adopting tools like curl_cffi, engineers can successfully mimic the cryptographic signature (JA3/JA4) of a legitimate browser. Yet, many sophisticated scrapers utilizing perfect TLS impersonation are still blocked by Cloudflare, Akamai, and Datadome.

The reason lies in the stealth gap between the Transport Layer (TLS) and the Application Layer (Content).

When a connection is established, the WAF doesn't stop analyzing. It begins inspecting the structure of the traffic itself. We are transitioning from the era of "HTTP Header Spoofing" (simply copying User-Agent strings) to "Protocol Consistency." Modern anti-bot systems analyze the HTTP/2 frames, the precise ordering of pseudo-headers, and the correlation between your declared identity and your TCP/IP behavior.

If your TLS fingerprint says "Chrome 120" but your HTTP/2 SETTINGS frame looks like Go's net/http, or your headers arrive alphabetized (an artifact of some Python HTTP stacks), you are immediately flagged. This article explores the mechanics of HTTP/2 fingerprinting and explains why header consistency is the final frontier of stealth engineering.

2. HTTP/2 Fundamentals: The Binary Shift

To understand how you are being detected, you must understand how HTTP/2 differs from HTTP/1.1.

In HTTP/1.1, a request is a plaintext blob. You send:

GET /resource HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0...


In this text-based world, order mattered slightly, but the protocol was lenient.

HTTP/2 is a binary protocol. It does not send text; it sends frames.

  • Stream Multiplexing: Multiple requests are sent over a single TCP connection in parallel streams.
  • Header Compression (HPACK): Headers are compressed using a dynamic dictionary.
  • SETTINGS Frame: Upon connection, the client and server exchange a SETTINGS frame defining parameters like MAX_CONCURRENT_STREAMS or INITIAL_WINDOW_SIZE.

Crucially, the values in the SETTINGS frame are specific to the client implementation. Chrome, Firefox, and httpx (Python) all use distinct default values. A WAF simply reads these binary parameters. If you claim to be Chrome in your User-Agent but send the SETTINGS frame of a Python library, the game is over.
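
To see why this check is so cheap for the defender, consider a sketch of the server-side logic. This is a toy model: the reference profile below is an illustrative assumption, not a verified capture, and real WAFs maintain measured profiles per browser release.

# Toy SETTINGS-frame fingerprint check, as a WAF might implement it.
# The profile values are illustrative assumptions; production systems
# use values captured from each real browser release.
CHROME_LIKE_SETTINGS = {
    "HEADER_TABLE_SIZE": 65536,
    "ENABLE_PUSH": 0,
    "MAX_CONCURRENT_STREAMS": 1000,
    "INITIAL_WINDOW_SIZE": 6291456,
}

def settings_match(observed: dict, claimed_ua: str) -> bool:
    """Return False when the SETTINGS frame contradicts the User-Agent."""
    if "Chrome" not in claimed_ua:
        return True  # this sketch only polices the Chrome claim
    return all(observed.get(k) == v for k, v in CHROME_LIKE_SETTINGS.items())

# A Python client claiming to be Chrome is caught immediately:
observed = {"HEADER_TABLE_SIZE": 4096, "ENABLE_PUSH": 0,
            "MAX_CONCURRENT_STREAMS": 100, "INITIAL_WINDOW_SIZE": 65535}
print(settings_match(observed, "Mozilla/5.0 ... Chrome/120.0"))  # False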

[Figure: a technical comparison diagram split down the middle, contrasting HTTP/1.1 plaintext requests with HTTP/2 binary frames.]

3. The Pseudo-Header Minefield

HTTP/2 removed the "Request Line" (GET / HTTP/1.1) and replaced it with Pseudo-Headers. These are special headers prefixed with a colon (:) that must be sent before any regular headers.

The four critical pseudo-headers are:

  1. :method: The HTTP verb (e.g., GET, POST).
  2. :authority: The domain name (replaces the Host header).
  3. :scheme: The protocol (https or http).
  4. :path: The resource path (e.g., /search?q=scraping).

The Ordering Trap

This is where Python scrapers fail most often. The HTTP/2 specification requires pseudo-headers to be sent first, but it does not mandate a specific order among them. However, browsers are rigid.

  • Google Chrome typically sends: :method, :authority, :scheme, :path.
  • Firefox is commonly reported to send: :method, :path, :authority, :scheme, an order that is itself a distinct fingerprint.

Standard Python libraries (like httpx or legacy adapters) often treat headers as a dictionary (hash map). When those headers are serialized to the wire, the order might be arbitrary, alphabetized, or consistent-but-wrong (e.g., :path before :method).
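
If you need order control in pure Python, the hyper-h2 package (the same h2 library httpx builds on) accepts headers as an ordered list of tuples, so the wire order is exactly what you write. A minimal sketch, assuming h2 is installed:

import h2.connection

# h2 serializes headers in the exact order of this list, so the
# pseudo-header order is under our control, unlike a dict passed
# to a high-level client.
conn = h2.connection.H2Connection()
conn.initiate_connection()

chrome_like_order = [
    (":method", "GET"),
    (":authority", "example.com"),
    (":scheme", "https"),
    (":path", "/"),
    ("user-agent", "Mozilla/5.0 ..."),
]
conn.send_headers(stream_id=1, headers=chrome_like_order, end_stream=True)
wire_bytes = conn.data_to_send()  # HPACK-encoded HEADERS frame, in order

The catch: h2's own default SETTINGS frame is a fingerprint in itself, so controlling header order this way does not, on its own, make the client stealthy.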

A WAF rule is trivial to write:
IF User-Agent contains "Chrome" AND Pseudo-Header-Order != (:method, :authority, :scheme, :path) THEN BLOCK.

This detection logic requires zero CPU power for the WAF; it's a static pattern match against the first few bytes of the request stream.
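
Expressed as code, the rule really is that small. A minimal sketch, assuming the WAF has already HPACK-decoded the HEADERS frame into an ordered list of field names:

# Pseudo-header order check, as a WAF might implement it.
CHROME_PSEUDO_ORDER = [":method", ":authority", ":scheme", ":path"]

def is_suspicious(header_names: list[str], user_agent: str) -> bool:
    pseudo = [h for h in header_names if h.startswith(":")]
    return "Chrome" in user_agent and pseudo != CHROME_PSEUDO_ORDER

# An alphabetized serialization is flagged instantly:
print(is_suspicious(
    [":authority", ":method", ":path", ":scheme", "user-agent"],
    "Mozilla/5.0 ... Chrome/120.0",
))  # True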

4. Header Consistency and Implementation Leaks

Beyond pseudo-headers, the ordering of standard headers (Accept, Accept-Language, Cache-Control) acts as a secondary fingerprint.

Browsers optimize their networking stacks heavily. Chrome has a specific "header train" it sends for navigation requests: it will almost always send sec-ch-ua (Client Hints) near the top, followed by sec-ch-ua-mobile, then sec-ch-ua-platform.

If your scraper manually constructs headers like this:

headers = {
    "User-Agent": "Mozilla/5.0...",
    "Accept": "*/*",
    "Referer": "https://google.com"
}


You are likely creating a "Frankenstein" fingerprint, for three reasons (a more faithful header train is sketched after this list):

  1. Missing Headers: You forgot sec-fetch-site, sec-fetch-mode, or sec-fetch-dest (Fetch Metadata Request Headers), which all modern browsers send.
  2. Wrong Case: HTTP/2 requires all header names to be lowercase. User-Agent is invalid; it must be user-agent. While some libraries handle this conversion, manually setting headers in raw socket implementations can leak uppercase characters, triggering an instant block.
  3. Inconsistent Order: If you claim to be Chrome, but your Accept-Encoding header is at the bottom (Chrome usually places it early), you create a statistical anomaly.
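
For contrast, here is roughly what a Chrome navigation header train looks like, written as an ordered Python list. Treat both the ordering and the values as approximations to be verified against a live capture; they shift between Chrome releases:

# Approximate Chrome navigation "header train" (order matters).
# Values are abbreviated placeholders; capture real traffic for
# the exact strings of your target Chrome version.
chrome_like_headers = [
    ("sec-ch-ua", '"Not_A Brand";v="8", "Chromium";v="120"'),
    ("sec-ch-ua-mobile", "?0"),
    ("sec-ch-ua-platform", '"Windows"'),
    ("upgrade-insecure-requests", "1"),
    ("user-agent", "Mozilla/5.0 ..."),
    ("accept", "text/html,application/xhtml+xml,..."),
    ("sec-fetch-site", "none"),
    ("sec-fetch-mode", "navigate"),
    ("sec-fetch-dest", "document"),
    ("accept-encoding", "gzip, deflate, br"),
    ("accept-language", "en-US,en;q=0.9"),
]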


5. Passive OS Fingerprinting (p0f)

We must also consider the layer below HTTP/2: TCP/IP. Passive OS Fingerprinting (p0f) analyzes the TCP SYN packet to determine the operating system of the client.

Key metrics include:

  • TTL (Time To Live): Windows usually defaults to 128; Linux/Android to 64.
  • Window Size: The size of the receive window.
  • TCP Options: The order of options like MSS, SACK, and Timestamps.

The Scenario: You run your scraper in a Docker container (Linux) but spoof a User-Agent string for "Windows 10 Chrome" (a toy version of the resulting check is sketched after these bullets).

  • WAF sees: "I am Windows" (User-Agent).
  • WAF measures: TTL=64 (Linux).
  • Result: Mismatch detected. Bot score increases.
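
A toy version of that check in Python. The initial-TTL values are the well-known OS defaults; a real p0f deployment also weighs window size and TCP option ordering:

# Toy p0f-style check: infer the OS family from the observed TTL,
# then compare it against the OS claimed in the User-Agent.
def likely_os_from_ttl(observed_ttl: int) -> str:
    # Each router hop decrements TTL by 1, so an observed 57
    # implies an initial value of 64 (the Linux default).
    for initial, os_family in [(64, "linux"), (128, "windows"), (255, "other")]:
        if observed_ttl <= initial:
            return os_family
    return "unknown"

def ttl_contradicts_ua(observed_ttl: int, user_agent: str) -> bool:
    claims_windows = "Windows" in user_agent
    return claims_windows and likely_os_from_ttl(observed_ttl) != "windows"

# Docker/Linux egress (TTL near 64) with a Windows User-Agent:
print(ttl_contradicts_ua(57, "Mozilla/5.0 (Windows NT 10.0; ...) Chrome/120.0"))  # True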

This is why network consistency is harder than it looks: you cannot easily change the OS-level TCP/IP stack from inside a Python script. In practice, many commercial WAFs weight TLS and HTTP/2 fingerprints above p0f, because NAT and proxies routinely rewrite TTL and window sizes. For the highest-tier targets, though, this discrepancy still matters.

6. The Solution: curl_cffi and Consistency

The requests library is built on urllib3, which does not support HTTP/2. httpx does support HTTP/2, but its framing comes from the pure-Python h2 library, which produces a recognizable "httpx fingerprint" of its own.

curl_cffi solves this by wrapping a custom-patched version of libcurl (curl-impersonate).

When you run:

from curl_cffi import requests

response = requests.get(
    "https://example.com",
    impersonate="chrome120"
)


You are not just changing the TLS ciphers. You are triggering a preset configuration in the underlying C library that:

  1. Sets HTTP/2 Frames: Sends the exact SETTINGS frame values (Window Size, Max Streams) that Chrome 120 uses.
  2. Enforces Pseudo-Header Order: Ensures :method comes before :authority (or whatever the target browser version does).
  3. Orders Standard Headers: Re-orders the headers you provide to match the browser's statistical profile, and injects default headers (like sec-ch-ua) if they are missing.

This is impersonation consistency: the TLS handshake, the HTTP/2 framing, and the header order all align with the story told by the User-Agent.

[Figure: an architectural diagram of the curl_cffi impersonation stack.]

7. Practical Configuration Patterns

Using curl_cffi effectively requires discipline.

The "Header Pollution" Mistake

A common error is manually adding headers that the impersonator already handles.

Bad Practice:

# Don't do this!
headers = {
    "User-Agent": "Mozilla/5.0...", 
    "Accept-Encoding": "gzip, deflate", 
    "sec-ch-ua": "..."
}
requests.get(url, headers=headers, impersonate="chrome120")


By manually defining User-Agent or Accept-Encoding, you might override the perfect values curl_cffi generates, or worse, introduce duplicates.

Best Practice:
Only provide the headers that contain state or context (like Referer, Authorization, or custom cookies). Let the impersonator handle the static browser fingerprints.

# Do this
requests.get(
    url, 
    headers={"Referer": "https://google.com"}, 
    impersonate="chrome120"
)


Handling "Silent" Blocks

If you are using curl_cffi and still get blocked:

  1. Check IP Reputation: No amount of header consistency fixes a flagged IP.
  2. Check Browser Version: "chrome120" might be deprecated. If the real world is on Chrome 130, an impersonation of Chrome 120 looks suspicious. Keep the library updated (a fallback sketch follows this list).
  3. Cookies: Ensure you aren't sending a "fresh" request to an endpoint that expects a session cookie (e.g., a search API).
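
One pragmatic response to point 2 is to treat the impersonation target as a variable rather than a constant. A minimal sketch; the target strings below are assumptions that must be checked against the curl_cffi version you actually have installed:

from curl_cffi import requests

# Hypothetical target list, newer targets first. Verify these names
# against your installed curl_cffi release, as targets come and go.
CANDIDATE_TARGETS = ["chrome120", "chrome119", "chrome110"]

def fetch_with_fallback(url: str):
    """Try each impersonation target, falling back on 403 responses."""
    last = None
    for target in CANDIDATE_TARGETS:
        last = requests.get(url, impersonate=target)
        if last.status_code != 403:
            return last
    return last  # still blocked: suspect IP reputation or missing cookies

response = fetch_with_fallback("https://example.com")
print(response.status_code)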

[Figure: troubleshooting flowchart.]

8. Conclusion: The Reality of Modern Scraping

The "Holy Grail" of stealth is not a single tool; it is consistency across the stack.

  1. Transport (TLS): Matches the browser (JA3/JA4).
  2. Protocol (HTTP/2): Matches the browser (Frames, Settings).
  3. Application (Headers): Matches the browser (Order, Case, Pseudo-headers).

If any of these layers contradicts the others, the WAF wins. Traditional Python libraries like requests and httpx simply cannot coordinate these layers with the precision required in 2025. By moving to browser-impersonating network bindings like curl_cffi, you align your scraper's DNA with that of legitimate users, turning an architectural vulnerability into a strength.
