Emir

Posted on • Originally published at howtocenterdiv.com

Why Headless Browsers Get Detected: A Technical Breakdown

howtocenterdiv.com — "Software engineering is more than just centering a div."


Puppeteer rats itself out in at least 11 different ways the moment it starts up — and that's before it's even loaded a single page. Scraping tutorials almost never bring this up, then act shocked when the same script that ran perfectly on localhost gets hammered in production.


Here's what people get wrong: bot detection isn't a single if-statement checking one flag. It's a scoring system. Each signal you leak adds weight to a total, and once that total crosses a threshold, you're done — blocked, CAPTCHAed, or worst of all, quietly served garbage data so you don't even know it happened.

| Layer | What It Checks | When |
| --- | --- | --- |
| TLS Fingerprint (JA3) | Cipher suite order, extensions | TLS handshake, before HTTP |
| HTTP/2 Fingerprint | Frame settings, header order | First request |
| navigator properties | webdriver, plugins, languages | JS runtime |
| Canvas / WebGL | Rendering entropy, GPU string | JS runtime |
| Mouse & keyboard | Movement patterns, timing | Behavioral |
| IP reputation | ASN, datacenter range | DNS / IP layer |

Most developers fixate on the navigator layer. They patch webdriver, maybe fake the user agent, and call it a day. They have no idea TLS fingerprinting has already clocked them before a single line of JavaScript ran.

Detection is cumulative and concurrent. Failing one minor check usually won't get you blocked. Getting blocked happens because a handful of small failures push the score over the threshold together. You can dodge navigator.webdriver perfectly and still get caught, because your JA3, canvas fingerprint, and plugin list aren't telling the same story.


Signal #1 — navigator.webdriver

console.log(navigator.webdriver);
// Automated Chrome (default Puppeteer launch) → true   (instant detection)
// Regular Chrome → false

The value itself isn't the whole story. Detectors also inspect where the property is defined and what its descriptor looks like, which is a fingerprint of its own.

// Looks like a fix, still detectable
Object.defineProperty(navigator, 'webdriver', { get: () => false });

// What a detector actually sees
Object.getOwnPropertyDescriptor(navigator, 'webdriver');
// → { get: ƒ, set: undefined, enumerable: false, configurable: false }
// In real Chrome this returns undefined: the getter lives on
// Navigator.prototype, so navigator has no own 'webdriver' property

The classic mistake is patching properties after the page loads instead of injecting before. The clock doesn't wait.
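The descriptor check described above can be sketched from the detector's side. This is a minimal sketch, not any vendor's actual code: `inspectWebdriver` and `FakeNavigator` are hypothetical names, and `FakeNavigator` only stands in for the browser's `Navigator` so the snippet runs outside a browser.

```javascript
// Hypothetical detector-side check; `nav` is any navigator-like object.
function inspectWebdriver(nav) {
  const own = Object.getOwnPropertyDescriptor(nav, 'webdriver');
  const proto = Object.getPrototypeOf(nav);
  const inherited = proto
    ? Object.getOwnPropertyDescriptor(proto, 'webdriver')
    : undefined;
  return {
    value: nav.webdriver,
    // An own property means something redefined it on the instance.
    ownProperty: own !== undefined,
    // The native attribute is a getter defined on the prototype.
    getterOnPrototype: inherited !== undefined && typeof inherited.get === 'function',
  };
}

// Stand-in for the browser's Navigator (assumption for this demo).
class FakeNavigator {
  get webdriver() { return false; }
}

const clean = new FakeNavigator();
const patched = new FakeNavigator();
Object.defineProperty(patched, 'webdriver', { get: () => false });

console.log(inspectWebdriver(clean));   // ownProperty: false
console.log(inspectWebdriver(patched)); // ownProperty: true
```

Both objects report `false` as the value, but only the patched one carries an own property on the instance, which is the difference a detector can key on.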


Signal #2 — The plugins Array

navigator.plugins.length;
// Real Chrome → 5 on current versions (a fixed PDF-viewer list); older builds varied, roughly 3–7
// Legacy headless → 0  ← one-line detection

navigator.mimeTypes.length;
// Real Chrome → 2+
// Headless    → 0

Any halfway-decent detection script checks navigator.plugins.length === 0 and stops right there. But stuffing in fake plugins isn't a real fix either. The names, descriptions, and mime types inside each plugin object all have to be internally consistent — and they have to match the user-agent you claimed. If your UA says Chrome 120 on macOS but your plugin list looks like Chrome on Windows, that mismatch is itself a signal.
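To make the internal-consistency point concrete, here is a small sketch of the kind of cross-check a script could run. It works on plain-data copies of `navigator.plugins` and `navigator.mimeTypes`; the function name, the finding labels, and the simplification of `enabledPlugin` to a plugin name are all assumptions for illustration.

```javascript
// Hypothetical consistency check (not a real detector's logic).
// plugins:   [{ name }]
// mimeTypes: [{ type, enabledPlugin }] where enabledPlugin is a plugin name
function checkPlugins(plugins, mimeTypes) {
  const findings = [];
  if (plugins.length === 0) findings.push('no-plugins');
  if (mimeTypes.length === 0) findings.push('no-mime-types');

  const names = new Set(plugins.map((p) => p.name));
  for (const m of mimeTypes) {
    // Every MIME type should point back at a plugin that is actually listed.
    if (!names.has(m.enabledPlugin)) findings.push(`orphan-mime:${m.type}`);
  }
  return findings;
}

const consistent = checkPlugins(
  [{ name: 'PDF Viewer' }],
  [{ type: 'application/pdf', enabledPlugin: 'PDF Viewer' }],
);
const empty = checkPlugins([], []);
const faked = checkPlugins(
  [{ name: 'Fake Plugin' }],
  [{ type: 'application/pdf', enabledPlugin: 'PDF Viewer' }],
);

console.log(consistent); // []
console.log(empty);      // [ 'no-plugins', 'no-mime-types' ]
console.log(faked);      // [ 'orphan-mime:application/pdf' ]
```

A hand-built plugin list passes the length check and still fails the cross-reference, which is why padding the array is not enough.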


Signal #3 — Canvas Fingerprinting

const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
ctx.font = '11pt "Times New Roman"';
ctx.fillText('Cwm fjordbank', 2, 15);
const fingerprint = canvas.toDataURL();
// Real machine   → unique hash, varies by hardware
// Headless Chrome → identical hash, every single time

Headless Chrome with software rendering produces near-identical output for identical code across machines that share a build and font set, because there is no GPU variance. Detection systems maintain databases of known headless canvas hashes, and the hash of a stock configuration is very likely already in one.
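The database lookup itself is simple. A rough sketch, assuming the detector hashes the `toDataURL()` string with SHA-256 and keeps known headless hashes in a set; the function names and the placeholder strings are hypothetical, and Node's `crypto` module is used here only so the snippet runs outside a browser.

```javascript
const { createHash } = require('node:crypto');

// Hash the canvas.toDataURL() string (SHA-256 is an assumption).
function canvasHash(dataUrl) {
  return createHash('sha256').update(dataUrl).digest('hex');
}

// Hypothetical blocklist lookup against hashes seen from headless sessions.
function isKnownHeadless(dataUrl, knownHashes) {
  return knownHashes.has(canvasHash(dataUrl));
}

// Placeholder strings stand in for real toDataURL() output.
const headlessOutput = 'data:image/png;base64,PLACEHOLDER_HEADLESS';
const otherOutput = 'data:image/png;base64,PLACEHOLDER_OTHER';
const known = new Set([canvasHash(headlessOutput)]);

console.log(isKnownHeadless(headlessOutput, known)); // true
console.log(isKnownHeadless(otherOutput, known));    // false
```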


Signal #4 — WebGL Renderer String

const gl = document.createElement('canvas').getContext('webgl');
const info = gl.getExtension('WEBGL_debug_renderer_info');
// The renderer string, not the vendor string, is what names the GPU
gl.getParameter(info.UNMASKED_RENDERER_WEBGL);
// Real machine → names the actual GPU (Intel, NVIDIA, AMD, Apple)
// Headless     → "Google SwiftShader"  ← widely blocklisted

SwiftShader is Google's software renderer, built for display-less environments. That string has been identified and blacklisted across detection systems everywhere. If SwiftShader shows up, you're flagged — doesn't matter what else you've cleaned up.
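On the detector side this tends to be a plain string match. A minimal sketch, assuming a pattern list that covers SwiftShader plus llvmpipe (Mesa's software rasterizer, added here as an assumption); the sample GPU string is made up for the demo.

```javascript
// Hypothetical pattern list; real detectors maintain their own.
const SOFTWARE_RENDERERS = [/swiftshader/i, /llvmpipe/i];

function isSoftwareRenderer(rendererString) {
  return SOFTWARE_RENDERERS.some((re) => re.test(rendererString));
}

console.log(isSoftwareRenderer('Google SwiftShader'));      // true
console.log(isSoftwareRenderer('NVIDIA GeForce GTX 1080')); // false
```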


Signal #5 — TLS / JA3 Fingerprint

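The first move in a TLS handshake is the client sending a ClientHello. Inside it: the list of cipher suites, extensions, and elliptic curves the client supports. The ordering of those items is dictated by the underlying TLS library, not the user-agent string you set. One caveat: Chrome 110 and later shuffle the extension order on each connection, so a Chrome hash like the one below is a snapshot rather than a stable identifier, and detectors often normalize the order before hashing.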

JA3 = md5(SSLVersion, Ciphers, Extensions, EllipticCurves, ECPointFormats)

| Client | TLS Library | JA3 Hash |
| --- | --- | --- |
| Chrome 120 / macOS | BoringSSL | cd08e31494f9531f560d64c695473da9 |
| Node.js 20 (axios/got) | OpenSSL | b32309a26951912be7dba376398abc3b |
| Python requests | Python ssl | 3b5074b1b5d032e5620f69f9f700ff0e |

You can set User-Agent: Chrome/120 all you want. The TLS handshake already announced Node.js before any JavaScript touched the page. There is no JS-layer fix for this one.
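For reference, the JA3 formula can be sketched in a few lines. The five fields are joined with commas, the values inside each field with dashes, and the resulting string is hashed with MD5. The `ja3` function name and the ClientHello values below are made up purely to show that ordering alone changes the hash.

```javascript
const { createHash } = require('node:crypto');

// Build the JA3 string and its MD5 hash from decimal ClientHello values.
function ja3(hello) {
  const text = [
    String(hello.version),
    hello.ciphers.join('-'),
    hello.extensions.join('-'),
    hello.curves.join('-'),
    hello.pointFormats.join('-'),
  ].join(',');
  return { text, hash: createHash('md5').update(text).digest('hex') };
}

// Illustrative values only, not captured from any real client.
const a = ja3({
  version: 771, // 0x0303, TLS 1.2
  ciphers: [4865, 4866, 4867],
  extensions: [0, 10, 11],
  curves: [29, 23],
  pointFormats: [0],
});
// Same cipher suites as `a`, offered in a different order.
const b = ja3({
  version: 771,
  ciphers: [4866, 4865, 4867],
  extensions: [0, 10, 11],
  curves: [29, 23],
  pointFormats: [0],
});

console.log(a.text);            // 771,4865-4866-4867,0-10-11,29-23,0
console.log(a.hash === b.hash); // false
```

Two clients that support the same suites but list them differently get different hashes, which is why the library underneath matters more than the headers on top.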


Signal #6 — HTTP/2 Fingerprint

Real Chrome and Node's http2 module send different SETTINGS frames:

Chrome 120:  HEADER_TABLE_SIZE=65536, ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=6291456
Node.js:     HEADER_TABLE_SIZE=4096,  ENABLE_PUSH=1, INITIAL_WINDOW_SIZE=65535

This gets extracted at the load balancer level, well before any application logic sees the request.
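A sketch of how those SETTINGS frames could be reduced to a comparable string. The numeric identifiers are the ones assigned by the HTTP/2 specification; the `id:value;id:value` string format and the function name are assumptions for this example, and the values are the ones listed above.

```javascript
// SETTINGS identifiers as assigned by the HTTP/2 specification.
const SETTING_IDS = {
  HEADER_TABLE_SIZE: 1,
  ENABLE_PUSH: 2,
  MAX_CONCURRENT_STREAMS: 3,
  INITIAL_WINDOW_SIZE: 4,
  MAX_FRAME_SIZE: 5,
  MAX_HEADER_LIST_SIZE: 6,
};

// Serialize settings in the order they were sent (format is an assumption).
function settingsFingerprint(entries) {
  return entries
    .map(([name, value]) => `${SETTING_IDS[name]}:${value}`)
    .join(';');
}

const chromeFp = settingsFingerprint([
  ['HEADER_TABLE_SIZE', 65536],
  ['ENABLE_PUSH', 0],
  ['INITIAL_WINDOW_SIZE', 6291456],
]);
const nodeFp = settingsFingerprint([
  ['HEADER_TABLE_SIZE', 4096],
  ['ENABLE_PUSH', 1],
  ['INITIAL_WINDOW_SIZE', 65535],
]);

console.log(chromeFp); // 1:65536;2:0;4:6291456
console.log(nodeFp);   // 1:4096;2:1;4:65535
```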


Signal #7 — Behavioral Entropy

// Bot movement
mousemove: (100,200)  (400,200)  (400,500)  // perfect L-shapes, instant

// Human movement
mousemove: (100,200)  (138,213)  (201,228)  ...  // curved, variable speed

Mouse movement is just one piece. Detection systems also profile keystroke timing (real humans: 50–200ms between keystrokes), scroll behavior, and how long someone spends on a page before doing anything. A bot that clicks 80 milliseconds after page load is a bot.

Math.random() delays don't fix this. A straight-line mouse path with randomized timing is still a straight-line mouse path.

Entropy scores accumulate across full sessions, not just individual events. That's why some bots clear the first checkpoint and get flagged 30 seconds later — the score built up over time, not all at once.
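To see why random delays don't help, consider two geometric features that ignore timestamps entirely. This is a toy sketch, not how any particular vendor scores movement: `pathFeatures`, the two features, and the shortened human path are assumptions, with coordinates taken from the example above. It expects at least two points.

```javascript
// Geometry-only features of a mouse path given as [x, y] pairs.
function pathFeatures(points) {
  let totalLength = 0;
  let axisAligned = 0;
  for (let i = 1; i < points.length; i++) {
    const dx = points[i][0] - points[i - 1][0];
    const dy = points[i][1] - points[i - 1][1];
    totalLength += Math.hypot(dx, dy);
    // Perfectly horizontal or vertical segments are rare in human input.
    if (dx === 0 || dy === 0) axisAligned += 1;
  }
  const segments = points.length - 1;
  return {
    meanStep: totalLength / segments,
    axisAlignedRatio: axisAligned / segments,
  };
}

const bot = pathFeatures([[100, 200], [400, 200], [400, 500]]);
const human = pathFeatures([[100, 200], [138, 213], [201, 228]]);

console.log(bot);   // { meanStep: 300, axisAlignedRatio: 1 }
console.log(human); // meanStep is roughly 52, axisAlignedRatio is 0
```

No timestamps go in, so no amount of jitter between events changes what comes out.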


The Mistake Matrix

| Mistake | Why It Fails |
| --- | --- |
| Only patching navigator.webdriver | 10+ signals still leak |
| Using got/axios with spoofed headers | JA3 still says Node.js |
| No --disable-blink-features=AutomationControlled | navigator.webdriver stays true |
| Datacenter proxies (AWS/GCP/Azure) | ASN is blacklisted before fingerprint checks even run |
| User-Agent without matching sec-ch-ua | Header contradiction, caught immediately |
| Math.random() delays only | Timing variance isn't behavioral entropy |

How the Score Actually Adds Up

All checks run simultaneously, scores stack:

IP reputation:       +0.1  (clean, residential)
JA3 mismatch:        +0.6  (Node.js TLS on Chrome UA)
navigator.webdriver: +0.0  (patched correctly)
Canvas hash:         +0.4  (known headless hash)
Plugin count:        +0.3  (empty plugins)
Mouse entropy:       +0.5  (straight-line movement)
─────────────────────────────────────────────
Total: 1.9  →  Block threshold: 1.5

You nailed navigator.webdriver. Doesn't matter: JA3, canvas, plugins, and mouse entropy add up to 1.8 on their own, well past the threshold.
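The same arithmetic as a small sketch. The weights and the 1.5 threshold are the illustrative numbers from the breakdown above, not values from any real detection product, and `scoreSession` is a hypothetical name. Treating "crosses" as strictly greater than the threshold is also an assumption.

```javascript
const BLOCK_THRESHOLD = 1.5;

// Sum per-signal weights and compare against the threshold.
function scoreSession(signals) {
  const raw = Object.values(signals).reduce((sum, weight) => sum + weight, 0);
  const total = Math.round(raw * 100) / 100; // trim floating-point noise
  return { total, blocked: total > BLOCK_THRESHOLD };
}

const session = {
  ipReputation: 0.1,
  ja3Mismatch: 0.6,
  webdriver: 0.0,
  canvasHash: 0.4,
  pluginCount: 0.3,
  mouseEntropy: 0.5,
};

const asIs = scoreSession(session);
const tlsFixed = scoreSession({ ...session, ja3Mismatch: 0 });

console.log(asIs);     // { total: 1.9, blocked: true }
console.log(tlsFixed); // { total: 1.3, blocked: false }
```

In this toy model, fixing the single heaviest signal is what brings the session back under the line, while the already-patched `webdriver` entry contributes nothing either way.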


The sec-ch-ua Problem

Chrome 90+ attaches client hints to every HTTPS request. A real Chrome session looks like:

sec-ch-ua: "Chromium";v="120", "Google Chrome";v="120", "Not-A.Brand";v="99"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...

A Puppeteer session that sets a modern user-agent but sends no client hints looks like:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...
# sec-ch-ua: missing entirely

A Chrome 90+ user-agent arriving over HTTPS without sec-ch-ua does not match any real Chrome build, so it is flagged on arrival. Being present isn't enough either: the brand list and version numbers have to be consistent with the full UA string.
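A server-side version of that check might look like the sketch below. It assumes lower-cased header names, takes the Chrome 90 cutoff from the text above, and uses a hypothetical `checkClientHints` function with made-up result labels. The full user-agent string is an illustrative Chrome 120 value. Real detectors also compare platform and brand tokens.

```javascript
// Compare the Chrome major version in User-Agent with sec-ch-ua.
function checkClientHints(headers) {
  const ua = headers['user-agent'] || '';
  const match = ua.match(/Chrome\/(\d+)/);
  if (!match) return 'not-chrome';

  const uaMajor = Number(match[1]);
  const hints = headers['sec-ch-ua'];
  if (hints === undefined) {
    return uaMajor >= 90 ? 'missing-hints' : 'ok';
  }
  // Collect every version listed in sec-ch-ua, e.g. v="120".
  const versions = [...hints.matchAll(/v="(\d+)"/g)].map((m) => Number(m[1]));
  return versions.includes(uaMajor) ? 'ok' : 'version-mismatch';
}

const UA =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

const real = checkClientHints({
  'user-agent': UA,
  'sec-ch-ua': '"Chromium";v="120", "Google Chrome";v="120", "Not-A.Brand";v="99"',
});
const missing = checkClientHints({ 'user-agent': UA });
const stale = checkClientHints({
  'user-agent': UA,
  'sec-ch-ua': '"Chromium";v="118", "Google Chrome";v="118", "Not-A.Brand";v="99"',
});

console.log(real);    // ok
console.log(missing); // missing-hints
console.log(stale);   // version-mismatch
```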


Tools vs. What They Actually Cover

| Tool | Fixes | Doesn't Fix |
| --- | --- | --- |
| puppeteer-extra-plugin-stealth | JS-layer signals | TLS/HTTP2 fingerprint |
| rebrowser-puppeteer | CDP leaks, runtime injection | TLS, behavioral |
| Go + CycleTLS | JA3 fingerprint | Behavioral, canvas |
| Real Chrome via CDP | TLS, canvas, GPU | Proxy/IP reputation |

The only client that passes every layer by default is a real Chrome browser, running on consumer hardware, behind a residential IP.

Most guides don't even touch this. They treat detection as a checklist: knock off each item, done. But detection systems are probabilistic. They don't need certainty, just confidence. Fix 9 out of 10 signals and you can still get blocked if that last signal carries enough weight. In a model like the one above, a JA3 mismatch alone might score 0.5–0.7 against a threshold of 1.5, so one heavy leak plus a couple of light ones can be all it takes.
