Puppeteer rats itself out in at least 11 different ways the moment it starts up — and that's before it's even loaded a single page. Scraping tutorials almost never bring this up, then act shocked when the same script that ran perfectly on localhost gets hammered in production.
Here's what people get wrong: bot detection isn't a single if-statement checking one flag. It's a scoring system. Each signal you leak adds weight to a total, and once that total crosses a threshold, you're done — blocked, CAPTCHAed, or worst of all, quietly served garbage data so you don't even know it happened.
| Layer | What It Checks | When |
|---|---|---|
| TLS Fingerprint (JA3) | Cipher suite order, extensions | TCP handshake — before HTTP |
| HTTP/2 Fingerprint | Frame settings, header order | First request |
| navigator properties | webdriver, plugins, languages | JS runtime |
| Canvas / WebGL | Rendering entropy, GPU string | JS runtime |
| Mouse & keyboard | Movement patterns, timing | Behavioral |
| IP reputation | ASN, datacenter range | DNS / IP layer |
Most developers fixate on the navigator layer. They patch webdriver, maybe fake the user agent, and call it a day. They have no idea TLS fingerprinting has already clocked them before a single line of JavaScript ran.
Detection is cumulative and concurrent. Failing one check won't get you blocked. Getting blocked happens because a handful of small failures push the score over the threshold together. You can dodge navigator.webdriver perfectly and still get caught — because your JA3, canvas fingerprint, and plugin list aren't telling the same story.
Signal #1 — navigator.webdriver
console.log(navigator.webdriver);
// Headless / automated → true (instant detection)
// Real Chrome → false (older Chrome returned undefined)
The value itself isn't even the whole story. Detectors also inspect the property descriptor's configurability — that's a fingerprint of its own.
// Looks like a fix, still detectable
Object.defineProperty(navigator, 'webdriver', { get: () => false });
// What a detector actually sees
Object.getOwnPropertyDescriptor(navigator, 'webdriver');
// → { get: ƒ, set: undefined, enumerable: false, configurable: false }
// Real Chrome → undefined — the property lives on Navigator.prototype,
// so there is no own descriptor to find
The classic mistake is patching properties after the page loads instead of injecting before. The clock doesn't wait.
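In Puppeteer the injection point is `page.evaluateOnNewDocument`, which runs before any page script; the other half of the fix is leaving no own property behind. A sketch with plain mock objects standing in for the browser's (an assumption about how real Chrome structures the property — verify against a live browser):

```javascript
// Mock stand-ins: real Chrome defines webdriver as an accessor on
// Navigator.prototype, not as an own property of navigator.
const NavigatorProto = { get webdriver() { return true; } }; // automated browser
const patched = Object.create(NavigatorProto);

// The naive "fix": defines an OWN property — something real Chrome never has
Object.defineProperty(patched, 'webdriver', { get: () => false });

console.log(patched.webdriver);                                     // false — looks fixed
console.log(Object.getOwnPropertyDescriptor(patched, 'webdriver')); // { get: ƒ, ... } — leak

// Closer to reality: the override lives on the prototype, no own property
const clean = Object.create({ get webdriver() { return false; } });
console.log(clean.webdriver);                                       // false
console.log(Object.getOwnPropertyDescriptor(clean, 'webdriver'));   // undefined — matches real Chrome
```

The value comes out identical either way; only the descriptor shape tells the two apart, which is exactly what a detector probes.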
Signal #2 — The plugins Array
navigator.plugins.length;
// Real Chrome → 5 (a fixed set of PDF-viewer entries in modern builds)
// Headless → 0 ← one-line detection
navigator.mimeTypes.length;
// Real Chrome → 2+
// Headless → 0
Any halfway-decent detection script checks navigator.plugins.length === 0 and stops right there. But stuffing in fake plugins isn't a real fix either. The names, descriptions, and mime types inside each plugin object all have to be internally consistent — and they have to match the user-agent you claimed. If your UA says Chrome 120 on macOS but your plugin list looks like Chrome on Windows, that mismatch is itself a signal.
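A sketch of that detector-side consistency check. The expected list is the fixed plugin set modern Chrome ships (all PDF-viewer entries); treat the exact names as an assumption to verify against a real browser:

```javascript
// Plugin names modern Chrome reports (assumed list — verify in a live browser)
const CHROME_PLUGINS = [
  'PDF Viewer', 'Chrome PDF Viewer', 'Chromium PDF Viewer',
  'Microsoft Edge PDF Viewer', 'WebKit built-in PDF',
];

function pluginsLookLikeChrome(userAgent, pluginNames) {
  if (!/Chrome\//.test(userAgent)) return true;  // only checking the Chrome case here
  if (pluginNames.length === 0) return false;    // the headless giveaway
  return pluginNames.every((name) => CHROME_PLUGINS.includes(name));
}

console.log(pluginsLookLikeChrome('Mozilla/5.0 ... Chrome/120.0.0.0', []));             // false
console.log(pluginsLookLikeChrome('Mozilla/5.0 ... Chrome/120.0.0.0', CHROME_PLUGINS)); // true
```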
Signal #3 — Canvas Fingerprinting
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
ctx.font = '11pt "Times New Roman"';
ctx.fillText('Cwm fjordbank', 2, 15);
const fingerprint = canvas.toDataURL();
// Real machine → unique hash, varies by hardware
// Headless Chrome → identical hash, every single time
Headless Chrome produces pixel-perfect identical output for identical code, no matter what machine it's running on. There's no GPU variance. Detection systems maintain databases of known headless canvas hashes. Yours is already in there.
Signal #4 — WebGL Renderer String
const gl = document.createElement('canvas').getContext('webgl');
const info = gl.getExtension('WEBGL_debug_renderer_info');
gl.getParameter(info.UNMASKED_VENDOR_WEBGL);
// Real machine → "Intel Inc." / "NVIDIA Corporation"
// Headless → "Google SwiftShader" ← banned everywhere
SwiftShader is Google's software renderer, built for display-less environments. That string has been identified and blacklisted across detection systems everywhere. If SwiftShader shows up, you're flagged — doesn't matter what else you've cleaned up.
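The detector side is nothing more than a string match against known software renderers (llvmpipe is Mesa's equivalent; the list here is illustrative, not exhaustive):

```javascript
// Flag any vendor/renderer pair that names a known software rasterizer
function isSoftwareRenderer(vendor, renderer) {
  return /swiftshader|llvmpipe|software/i.test(`${vendor} ${renderer}`);
}

console.log(isSoftwareRenderer('Google Inc.', 'Google SwiftShader'));       // true → flagged
console.log(isSoftwareRenderer('NVIDIA Corporation', 'GeForce RTX 3070'));  // false
```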
Signal #5 — TLS / JA3 Fingerprint
The first move in a TLS handshake is the client sending a ClientHello. Inside it: the list of cipher suites, extensions, and elliptic curves the client supports. The ordering of those items is dictated by the underlying TLS library — not the user-agent string you set.
JA3 = md5(SSLVersion, Ciphers, Extensions, EllipticCurves, ECPointFormats)
| Client | TLS Library | JA3 Hash |
|---|---|---|
| Chrome 120 / macOS | BoringSSL | cd08e31494f9531f560d64c695473da9 |
| Node.js 20 (axios/got) | OpenSSL | b32309a26951912be7dba376398abc3b |
| Python requests | Python ssl | 3b5074b1b5d032e5620f69f9f700ff0e |
You can set User-Agent: Chrome/120 all you want. The TLS handshake already announced Node.js before any JavaScript touched the page. There is no JS-layer fix for this one.
Signal #6 — HTTP/2 Fingerprint
Real Chrome and Node's http2 module send different SETTINGS frames:
Chrome 120: HEADER_TABLE_SIZE=65536, ENABLE_PUSH=0, INITIAL_WINDOW_SIZE=6291456
Node.js: HEADER_TABLE_SIZE=4096, ENABLE_PUSH=1, INITIAL_WINDOW_SIZE=65535
This gets extracted at the load balancer level, well before any application logic sees the request.
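A sketch of matching observed SETTINGS against a known-client profile, using the values from the comparison above (real fingerprints also fold in frame order and pseudo-header order):

```javascript
// Known-good profile for Chrome 120 (values from the comparison above)
const CHROME_120 = { headerTableSize: 65536, enablePush: 0, initialWindowSize: 6291456 };

function settingsMatch(profile, observed) {
  return Object.keys(profile).every((key) => profile[key] === observed[key]);
}

// What Node's http2 module actually sends
const observed = { headerTableSize: 4096, enablePush: 1, initialWindowSize: 65535 };
console.log(settingsMatch(CHROME_120, observed)); // false — this is Node, whatever the UA claims
```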
Signal #7 — Behavioral Entropy
// Bot movement
mousemove: (100,200) → (400,200) → (400,500) // perfect L-shapes, instant
// Human movement
mousemove: (100,200) → (138,213) → (201,228) → ... // curved, variable speed
Mouse movement is just one piece. Detection systems also profile keystroke timing (real humans: 50–200ms between keystrokes), scroll behavior, and how long someone spends on a page before doing anything. A bot that clicks 80 milliseconds after page load is a bot.
Math.random() delays don't fix this. A straight-line mouse path with randomized timing is still a straight-line mouse path.
Entropy scores accumulate across full sessions, not just individual events. That's why some bots clear the first checkpoint and get flagged 30 seconds later — the score built up over time, not all at once.
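What a curved, variable path looks like in code: a quadratic Bézier with per-point jitter, as opposed to the three-point L-shape above. The control-point offset and jitter amplitude are arbitrary choices, not values any detector publishes:

```javascript
// Sample a jittered quadratic Bézier from (x0,y0) to (x1,y1)
function curvedPath(x0, y0, x1, y1, steps = 20) {
  const cx = (x0 + x1) / 2 + 80; // bow the curve sideways off the straight line
  const cy = (y0 + y1) / 2 - 60;
  const points = [];
  for (let i = 0; i <= steps; i++) {
    const t = i / steps;
    const u = 1 - t;
    points.push([
      u * u * x0 + 2 * u * t * cx + t * t * x1 + (Math.random() - 0.5) * 3,
      u * u * y0 + 2 * u * t * cy + t * t * y1 + (Math.random() - 0.5) * 3,
    ]);
  }
  return points;
}

console.log(curvedPath(100, 200, 400, 500).length); // 21 sampled points
```

Feeding these points to the mouse with variable inter-point delays addresses the path shape; it still does nothing about session-level timing, which is scored separately.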
The Mistake Matrix
| Mistake | Why It Fails |
|---|---|
| Only patching navigator.webdriver | 10+ signals still leak |
| Using got/axios with spoofed headers | JA3 still says Node.js |
| No --disable-blink-features=AutomationControlled | window.chrome exposes automation flag |
| Datacenter proxies (AWS/GCP/Azure) | ASN is blacklisted before fingerprint checks even run |
| User-Agent without matching sec-ch-ua | Header contradiction — caught immediately |
| Math.random() delays only | Timing variance isn't behavioral entropy |
How the Score Actually Adds Up
All checks run simultaneously, scores stack:
IP reputation: +0.1 (clean, residential)
JA3 mismatch: +0.6 (Node.js TLS on Chrome UA)
navigator.webdriver: +0.0 (patched correctly)
Canvas hash: +0.4 (known headless hash)
Plugin count: +0.3 (empty plugins)
Mouse entropy: +0.5 (straight-line movement)
─────────────────────────────────────────────
Total: 1.9 → Block threshold: 1.5
You nailed navigator. Doesn't matter — TLS and canvas alone already pushed it over.
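The same arithmetic as a weighted sum. The weights and threshold are the illustrative numbers from this breakdown, not anything a real vendor publishes:

```javascript
const BLOCK_THRESHOLD = 1.5; // illustrative, per the breakdown above

const signals = [
  { name: 'ip-reputation',       weight: 0.1, leaked: true  },
  { name: 'ja3-mismatch',        weight: 0.6, leaked: true  },
  { name: 'navigator.webdriver', weight: 0.0, leaked: false }, // patched correctly
  { name: 'canvas-hash',         weight: 0.4, leaked: true  },
  { name: 'plugin-count',        weight: 0.3, leaked: true  },
  { name: 'mouse-entropy',       weight: 0.5, leaked: true  },
];

const score = signals
  .filter((s) => s.leaked)
  .reduce((sum, s) => sum + s.weight, 0);

console.log(score.toFixed(1), score >= BLOCK_THRESHOLD ? 'blocked' : 'allowed'); // 1.9 blocked
```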
The sec-ch-ua Problem
Chrome 90+ attaches client hints to every request. A real Chrome session looks like:
sec-ch-ua: "Chromium";v="120", "Google Chrome";v="120", "Not-A.Brand";v="99"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...
A Puppeteer session that sets a modern user-agent but sends no client hints looks like:
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)...
# sec-ch-ua: missing entirely
A Chrome 90+ user-agent without sec-ch-ua is a physical impossibility. Flagged on arrival. And just being present isn't enough — the brand token order and version numbers have to be consistent with the full UA string.
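A sketch of that contradiction check: a Chrome 90+ user-agent must arrive with sec-ch-ua, and the version tokens must agree. Header strings below are illustrative:

```javascript
function clientHintsConsistent(headers) {
  const ua = headers['user-agent'] || '';
  const match = ua.match(/Chrome\/(\d+)/);
  if (!match || Number(match[1]) < 90) return true; // client hints not expected
  const hint = headers['sec-ch-ua'];
  if (!hint) return false;                          // Chrome 90+ always sends it
  return hint.includes(`v="${match[1]}"`);          // versions must agree
}

console.log(clientHintsConsistent({
  'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ... Chrome/120.0.0.0',
})); // false — modern UA, no client hints

console.log(clientHintsConsistent({
  'user-agent': 'Mozilla/5.0 ... Chrome/120.0.0.0',
  'sec-ch-ua': '"Chromium";v="120", "Google Chrome";v="120", "Not-A.Brand";v="99"',
})); // true
```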
Tools vs. What They Actually Cover
| Tool | Fixes | Doesn't Fix |
|---|---|---|
| puppeteer-extra-plugin-stealth | JS-layer signals | TLS/HTTP2 fingerprint |
| rebrowser-puppeteer | CDP leaks, runtime injection | TLS, behavioral |
| Go + CycleTLS | JA3 fingerprint | Behavioral, canvas |
| Real Chrome via CDP | TLS, canvas, GPU | Proxy/IP reputation |
The only client that passes every layer by default is a real Chrome browser, running on consumer hardware, behind a residential IP.
Most guides don't even touch this. They treat detection as a checklist — knock off each item, done. But detection systems are probabilistic. They don't need certainty, just confidence. Fix 9 out of 10 signals and you can still get blocked if that last signal carries enough weight. JA3 mismatches typically score 0.5–0.7. One leak can be all it takes.