Emir

Posted on Mar 13 • Originally published at howtocenterdiv.com

Why Headless Browsers Get Detected: A Technical Breakdown

#security #javascript #webdev #devops

howtocenterdiv.com — "Software engineering is more than just centering a div."

Puppeteer rats itself out in at least 11 different ways the moment it starts up — and that's before it's even loaded a single page. Scraping tutorials almost never bring this up, then act shocked when the same script that ran perfectly on localhost gets hammered in production.

Here's what people get wrong: bot detection isn't a single if-statement checking one flag. It's a scoring system. Each signal you leak adds weight to a total, and once that total crosses a threshold, you're done — blocked, CAPTCHAed, or worst of all, quietly served garbage data so you don't even know it happened.

Layer	What It Checks	When
TLS Fingerprint (JA3)	Cipher suite order, extensions	TLS handshake, before HTTP
HTTP/2 Fingerprint	Frame settings, header order	First request
`navigator` properties	`webdriver`, `plugins`, `languages`	JS runtime
Canvas / WebGL	Rendering entropy, GPU string	JS runtime
Mouse & keyboard	Movement patterns, timing	Behavioral
IP reputation	ASN, datacenter range	DNS / IP layer

Most developers fixate on the navigator layer. They patch webdriver, maybe fake the user agent, and call it a day. They have no idea TLS fingerprinting has already clocked them before a single line of JavaScript ran.

Detection is cumulative and concurrent. A single minor failure rarely gets you blocked on its own. Blocks usually happen because a handful of small failures push the score over the threshold together. You can dodge navigator.webdriver perfectly and still get caught, because your JA3, canvas fingerprint, and plugin list aren't telling the same story.

Signal #1 — `navigator.webdriver`

console.log(navigator.webdriver);
// Headless (automation-controlled) → true   (instant detection)
// Real Chrome → false   (some older versions returned undefined)

The value itself isn't even the whole story. Detectors also inspect where the property is defined and what its descriptor looks like, and that's a fingerprint of its own.

// Looks like a fix, still detectable
Object.defineProperty(navigator, 'webdriver', { get: () => false });

// What a detector actually sees
Object.getOwnPropertyDescriptor(navigator, 'webdriver');
// → { get: [Function], set: undefined, enumerable: false, configurable: false }
// In real Chrome this call returns undefined: the getter lives on Navigator.prototype, not on the navigator object itself

The classic mistake is patching properties after the page loads instead of injecting the patch before any page script runs. Detection code that executes first has already read the original values.
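
Here's a minimal sketch of how a detector might pair the value check with the descriptor check. `inspectWebdriver`, `FakeNavigator`, and the returned field names are hypothetical, and the sketch assumes the native property is a getter on the prototype. It accepts any navigator-like object so it can run in Node as well as in a browser console.

```javascript
// Hypothetical helper: reports what `webdriver` returns and where it is defined.
function inspectWebdriver(nav) {
  const own = Object.getOwnPropertyDescriptor(nav, 'webdriver');
  const proto = Object.getPrototypeOf(nav);
  const inherited = proto
    ? Object.getOwnPropertyDescriptor(proto, 'webdriver')
    : undefined;
  return {
    value: nav.webdriver,
    // An own property on the instance suggests someone redefined it.
    patchedOnInstance: own !== undefined,
    // Assumption: the native property is an accessor on the prototype.
    nativeLikeAccessor: inherited !== undefined && typeof inherited.get === 'function',
  };
}

// Stand-ins for a real navigator object (illustrative only).
class FakeNavigator {
  get webdriver() { return false; }
}
const untouched = new FakeNavigator();
const patched = new FakeNavigator();
Object.defineProperty(patched, 'webdriver', { get: () => false });

console.log(inspectWebdriver(untouched).patchedOnInstance); // false
console.log(inspectWebdriver(patched).patchedOnInstance);   // true
```

Both objects report `webdriver === false`, yet only the patched one carries an own property. That difference is the leak.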

Signal #2 — The `plugins` Array

navigator.plugins.length;
// Real Chrome → 3–7
// Headless    → 0  ← one-line detection

navigator.mimeTypes.length;
// Real Chrome → 2+
// Headless    → 0

Any halfway-decent detection script checks navigator.plugins.length === 0 and stops right there. But stuffing in fake plugins isn't a real fix either. The names, descriptions, and mime types inside each plugin object all have to be internally consistent — and they have to match the user-agent you claimed. If your UA says Chrome 120 on macOS but your plugin list looks like Chrome on Windows, that mismatch is itself a signal.
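
The internal-consistency part can be sketched roughly like this. `pluginSignals` and its signal names are hypothetical, and the check covers only the plugin/mime-type cross-reference, not the user-agent match. It takes a navigator-like object so the logic can be tried outside a browser.

```javascript
// Hypothetical check: flags empty lists and mime types that point at unlisted plugins.
function pluginSignals(nav) {
  const plugins = Array.from(nav.plugins);
  const mimeTypes = Array.from(nav.mimeTypes);
  const signals = [];

  if (plugins.length === 0) signals.push('no-plugins');
  if (mimeTypes.length === 0) signals.push('no-mime-types');

  // Every mime type should reference a plugin that actually appears in the list.
  const pluginNames = new Set(plugins.map((p) => p.name));
  for (const mime of mimeTypes) {
    if (!mime.enabledPlugin || !pluginNames.has(mime.enabledPlugin.name)) {
      signals.push(`orphan-mime:${mime.type}`);
    }
  }
  return signals;
}

// A faked list where the mime type references a plugin that was never added.
const sloppyFake = {
  plugins: [{ name: 'PDF Viewer' }],
  mimeTypes: [{ type: 'application/pdf', enabledPlugin: { name: 'Chrome PDF Plugin' } }],
};
console.log(pluginSignals(sloppyFake)); // [ 'orphan-mime:application/pdf' ]
```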

Signal #3 — Canvas Fingerprinting

const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
ctx.font = '11pt "Times New Roman"';
ctx.fillText('Cwm fjordbank', 2, 15);
const fingerprint = canvas.toDataURL();
// Real machine   → unique hash, varies by hardware
// Headless Chrome → identical hash, every single time

Headless Chrome produces pixel-perfect identical output for identical code, no matter what machine it's running on. There's no GPU variance. Detection systems maintain databases of known headless canvas hashes. Yours is already in there.

Signal #4 — WebGL Renderer String

const gl = document.createElement('canvas').getContext('webgl');
const info = gl.getExtension('WEBGL_debug_renderer_info');
gl.getParameter(info.UNMASKED_RENDERER_WEBGL);
// Real machine → a string naming the actual GPU (Intel, NVIDIA, AMD, Apple)
// Headless     → "Google SwiftShader", or an ANGLE string containing "SwiftShader"  ← banned everywhere

SwiftShader is Google's software renderer, built for display-less environments. That string has been identified and blacklisted across detection systems everywhere. If SwiftShader shows up, you're flagged — doesn't matter what else you've cleaned up.
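
A rough sketch of the check, with null guards the short snippet above skips. `webglSignal`, the reason strings, and the software-renderer pattern are assumptions for illustration. Real detectors likely use longer lists.

```javascript
// Hypothetical check: flags contexts whose renderer string names a software rasterizer.
function webglSignal(gl) {
  // Assumption: a browser with no WebGL at all is treated as suspicious.
  if (!gl) return { flagged: true, reason: 'no-webgl' };

  const info = gl.getExtension('WEBGL_debug_renderer_info');
  if (!info) return { flagged: false, reason: 'renderer-info-unavailable' };

  const vendor = String(gl.getParameter(info.UNMASKED_VENDOR_WEBGL));
  const renderer = String(gl.getParameter(info.UNMASKED_RENDERER_WEBGL));

  // SwiftShader and llvmpipe are software renderers; the list is illustrative.
  const software = /swiftshader|llvmpipe/i.test(renderer);
  return { flagged: software, reason: software ? 'software-renderer' : 'ok', vendor, renderer };
}

// In a browser:
// webglSignal(document.createElement('canvas').getContext('webgl'));
```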

Signal #5 — TLS / JA3 Fingerprint

The first move in a TLS handshake is the client sending a ClientHello. Inside it: the list of cipher suites, extensions, and elliptic curves the client supports. The ordering of those items is dictated by the underlying TLS library — not the user-agent string you set.

JA3 = md5(SSLVersion, Ciphers, Extensions, EllipticCurves, ECPointFormats)
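
To make the formula concrete, here's a small Node.js sketch that assumes the ClientHello has already been parsed into numeric fields. The `hello` object shape and the function names are hypothetical. The string layout (commas between fields, dashes between values, GREASE values dropped, MD5 of the result) follows the published JA3 method.

```javascript
const { createHash } = require('node:crypto');

// GREASE placeholders (0x0a0a, 0x1a1a, ... 0xfafa) are excluded from JA3.
const isGrease = (v) => (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);

// Builds the pre-hash JA3 string from a parsed ClientHello (hypothetical shape).
function ja3String(hello) {
  const join = (values) => values.filter((v) => !isGrease(v)).join('-');
  return [
    hello.version,
    join(hello.ciphers),
    join(hello.extensions),
    join(hello.curves),
    join(hello.pointFormats),
  ].join(',');
}

function ja3Hash(hello) {
  return createHash('md5').update(ja3String(hello)).digest('hex');
}

// Made-up handshake values, only to show the format.
const example = {
  version: 771, // 0x0303
  ciphers: [0x1a1a, 4865, 4866, 4867],
  extensions: [0, 23, 65281, 10, 11],
  curves: [0x0a0a, 29, 23, 24],
  pointFormats: [0],
};
console.log(ja3String(example)); // 771,4865-4866-4867,0-23-65281-10-11,29-23-24,0
console.log(ja3Hash(example));   // 32-character hex digest
```

Reorder the cipher list and the hash changes, even though the set of supported ciphers is the same. That's why the library, not the user-agent, decides the fingerprint.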

Client	TLS Library	JA3 Hash
Chrome 120 / macOS	BoringSSL	`cd08e31494f9531f560d64c695473da9`
Node.js 20 (axios/got)	OpenSSL	`b32309a26951912be7dba376398abc3b`
Python requests	Python ssl	`3b5074b1b5d032e5620f69f9f700ff0e`

You can set User-Agent: Chrome/120 all you want. The TLS handshake already announced Node.js before any JavaScript touched the page. There is no JS-layer fix for this one.

Signal #6 — HTTP/2 Fingerprint

Real Chrome and Node's http2 module send different SETTINGS frames:

Chrome 120:  HEADER_TABLE_SIZE=65536, ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=6291456
Node.js:     HEADER_TABLE_SIZE=4096,  ENABLE_PUSH=1, INITIAL_WINDOW_SIZE=65535

This gets extracted at the load balancer level, well before any application logic sees the request.
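
A minimal sketch of that comparison, using only the values quoted above. `EXPECTED_SETTINGS`, the `chrome120` key, and `settingsMismatches` are hypothetical names, and a real system would track far more parameters than these three.

```javascript
// Expected SETTINGS for a claimed client, taken from the values quoted above.
const EXPECTED_SETTINGS = {
  chrome120: { HEADER_TABLE_SIZE: 65536, ENABLE_PUSH: 0, INITIAL_WINDOW_SIZE: 6291456 },
};

// Returns the names of settings that contradict the claimed client.
function settingsMismatches(claimedClient, observed) {
  const expected = EXPECTED_SETTINGS[claimedClient];
  if (!expected) return []; // no profile for this client, nothing to compare
  return Object.keys(expected).filter((key) => observed[key] !== expected[key]);
}

// A request claiming Chrome 120 but sending Node's defaults.
const observedFromNode = { HEADER_TABLE_SIZE: 4096, ENABLE_PUSH: 1, INITIAL_WINDOW_SIZE: 65535 };
console.log(settingsMismatches('chrome120', observedFromNode));
// [ 'HEADER_TABLE_SIZE', 'ENABLE_PUSH', 'INITIAL_WINDOW_SIZE' ]
```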

Signal #7 — Behavioral Entropy

// Bot movement
mousemove: (100,200) → (400,200) → (400,500)  // perfect L-shapes, instant

// Human movement
mousemove: (100,200) → (138,213) → (201,228) → ...  // curved, variable speed

Mouse movement is just one piece. Detection systems also profile keystroke timing (real humans: 50–200ms between keystrokes), scroll behavior, and how long someone spends on a page before doing anything. A bot that clicks 80 milliseconds after page load is a bot.

Math.random() delays don't fix this. A straight-line mouse path with randomized timing is still a straight-line mouse path.

Entropy scores accumulate across full sessions, not just individual events. That's why some bots clear the first checkpoint and get flagged 30 seconds later — the score built up over time, not all at once.
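
One way to see why randomized delays don't help: a shape check never looks at timestamps. The sketch below is a deliberately crude, hypothetical heuristic (`isAxisAlignedPath` is not any vendor's actual algorithm) run against the two paths shown earlier.

```javascript
// Direction (in radians) of each segment between consecutive [x, y] points.
function headings(points) {
  const result = [];
  for (let i = 1; i < points.length; i++) {
    const [x0, y0] = points[i - 1];
    const [x1, y1] = points[i];
    result.push(Math.atan2(y1 - y0, x1 - x0));
  }
  return result;
}

// Hypothetical heuristic: every segment is exactly horizontal or vertical.
function isAxisAlignedPath(points) {
  const hs = headings(points);
  return hs.length > 0 && hs.every((h) => {
    const quarterTurns = h / (Math.PI / 2);
    return Math.abs(quarterTurns - Math.round(quarterTurns)) < 1e-9;
  });
}

const botPath = [[100, 200], [400, 200], [400, 500]];
const humanPath = [[100, 200], [138, 213], [201, 228]];
console.log(isAxisAlignedPath(botPath));   // true
console.log(isAxisAlignedPath(humanPath)); // false
```

No amount of jitter in when the events fire changes the first result, because the geometry is what gets scored.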

The Mistake Matrix

Mistake	Why It Fails
Only patching `navigator.webdriver`	10+ signals still leak
Using `got`/`axios` with spoofed headers	JA3 still says Node.js
No `--disable-blink-features=AutomationControlled`	`navigator.webdriver` stays `true`
Datacenter proxies (AWS/GCP/Azure)	ASN is blacklisted before fingerprint checks even run
`User-Agent` without matching `sec-ch-ua`	Header contradiction — caught immediately
`Math.random()` delays only	Timing variance isn't behavioral entropy

How the Score Actually Adds Up

All checks run simultaneously, scores stack:

IP reputation:       +0.1  (clean, residential)
JA3 mismatch:        +0.6  (Node.js TLS on Chrome UA)
navigator.webdriver: +0.0  (patched correctly)
Canvas hash:         +0.4  (known headless hash)
Plugin count:        +0.3  (empty plugins)
Mouse entropy:       +0.5  (straight-line movement)
─────────────────────────────────────────────
Total: 1.9  →  Block threshold: 1.5

You nailed navigator. Doesn't matter: JA3, canvas, and mouse entropy alone add up to 1.5, and the empty plugin list and IP score carry the total past the threshold.
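
The tally above can be sketched in a few lines. The weights and the 1.5 threshold are the illustrative numbers from this example, not values from any real detection product, and `scoreSession` is a hypothetical name.

```javascript
// Threshold from the illustrative breakdown above.
const BLOCK_THRESHOLD = 1.5;

// Sums per-signal weights and compares the total with the threshold.
function scoreSession(signals) {
  const total = Object.values(signals).reduce((sum, weight) => sum + weight, 0);
  // Round for display only; the comparison uses the unrounded sum.
  return { total: Number(total.toFixed(2)), blocked: total > BLOCK_THRESHOLD };
}

const session = {
  ipReputation: 0.1,       // clean, residential
  ja3Mismatch: 0.6,        // Node.js TLS on Chrome UA
  navigatorWebdriver: 0.0, // patched correctly
  canvasHash: 0.4,         // known headless hash
  pluginCount: 0.3,        // empty plugins
  mouseEntropy: 0.5,       // straight-line movement
};
console.log(scoreSession(session)); // { total: 1.9, blocked: true }
```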

The `sec-ch-ua` Problem

Chrome 90+ attaches low-entropy client hints to every HTTPS request. A real Chrome session looks like:

sec-ch-ua: "Chromium";v="120", "Google Chrome";v="120", "Not-A.Brand";v="99"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...

A Puppeteer session that sets a modern user-agent but sends no client hints looks like:

User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...
# sec-ch-ua: missing entirely

A Chrome 90+ user-agent on an HTTPS request without sec-ch-ua is something a stock browser never sends. Flagged on arrival. And just being present isn't enough: the brand token order and version numbers have to be consistent with the full UA string.
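
A server-side version of that contradiction check might look something like this. `clientHintSignals`, the signal names, and the platform-to-UA-token map are assumptions for illustration. The sketch assumes an HTTPS request with lowercased header names, and it only compares major version and platform, not brand order.

```javascript
// Assumed mapping from sec-ch-ua-platform values to tokens found in the UA string.
const PLATFORM_UA_TOKENS = { macOS: 'Macintosh', Windows: 'Windows NT', Linux: 'Linux' };

// Hypothetical check: compares User-Agent claims against client hint headers.
function clientHintSignals(headers) {
  const ua = headers['user-agent'] || '';
  const uaMatch = ua.match(/Chrome\/(\d+)/);
  if (!uaMatch) return []; // not claiming to be Chrome, nothing to compare

  const hints = headers['sec-ch-ua'];
  if (!hints) return ['missing-sec-ch-ua'];

  // Parse brand/version pairs such as "Google Chrome";v="120".
  const brands = new Map();
  for (const m of hints.matchAll(/"([^"]+)";\s*v="(\d+)"/g)) brands.set(m[1], m[2]);

  const signals = [];
  const hintedMajor = brands.get('Google Chrome') ?? brands.get('Chromium');
  if (hintedMajor !== uaMatch[1]) signals.push('version-mismatch');

  const platform = (headers['sec-ch-ua-platform'] || '').replace(/"/g, '');
  const token = PLATFORM_UA_TOKENS[platform];
  if (token && !ua.includes(token)) signals.push('platform-mismatch');
  return signals;
}

// Example UA string in the usual Chrome format (illustrative).
const exampleUa =
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 ' +
  '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';
console.log(clientHintSignals({ 'user-agent': exampleUa })); // [ 'missing-sec-ch-ua' ]
```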

Tools vs. What They Actually Cover

Tool	Fixes	Doesn't Fix
`puppeteer-extra-plugin-stealth`	JS-layer signals	TLS/HTTP2 fingerprint
`rebrowser-puppeteer`	CDP leaks, runtime injection	TLS, behavioral
Go + CycleTLS	JA3 fingerprint	Behavioral, canvas
Real Chrome via CDP	TLS, canvas, GPU	Proxy/IP reputation

The only client that passes every layer by default is a real Chrome browser, running on consumer hardware, behind a residential IP.

Most guides don't even touch this. They treat detection as a checklist — knock off each item, done. But detection systems are probabilistic. They don't need certainty, just confidence. Fix 9 out of 10 signals and you can still get blocked if that last signal carries enough weight. JA3 mismatches typically score 0.5–0.7. One leak can be all it takes.
