The Detection Arms Race
You launch your Puppeteer script. It works perfectly in testing, then fails in production. The site knows you are a bot. But how?
Modern bot detection goes far beyond checking user agents. Let's dive into exactly how sites detect headless browsers and how to defend against each technique.
Detection Method 1: The WebDriver Flag
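The simplest check. Browsers under automation (Selenium and other WebDriver clients, or Puppeteer and Playwright with default Chromium settings) report navigator.webdriver as true: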
// What sites check
if (navigator.webdriver) {
  // Block this visitor
}
Defense in Python with Playwright: hide the flag before any page script can read it, either with Chromium's AutomationControlled launch switch or with an init script that redefines navigator.webdriver.
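A minimal sketch of that defense with Playwright's Python sync API, assuming Playwright and its Chromium build are installed. It combines two common measures: the --disable-blink-features=AutomationControlled Chromium switch and an init script that redefines navigator.webdriver on Navigator.prototype. HIDE_WEBDRIVER_JS, CHROMIUM_ARGS, and visit are hypothetical names for illustration; only the Playwright calls are real API.

```python
# Hypothetical helper names; launch(), new_context(), add_init_script(),
# goto() and evaluate() are real Playwright sync API calls.
HIDE_WEBDRIVER_JS = """
Object.defineProperty(Navigator.prototype, 'webdriver', {
  get: () => undefined,
});
"""

# Chromium switch that stops Blink from flagging the session as automated.
CHROMIUM_ARGS = ["--disable-blink-features=AutomationControlled"]


def visit(url: str):
    """Open url headlessly and return what the page sees as navigator.webdriver."""
    # Imported here so this module loads even where Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=CHROMIUM_ARGS)
        try:
            context = browser.new_context()
            # Runs in every new document before the site's own scripts.
            context.add_init_script(HIDE_WEBDRIVER_JS)
            page = context.new_page()
            page.goto(url)
            return page.evaluate("navigator.webdriver")
        finally:
            browser.close()
```

With either measure in place, visit("https://example.com") should no longer return True. Keep in mind that a redefined property can itself be inspected, so this hides the flag from simple checks only.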
Detection Method 2: Chrome DevTools Protocol
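Sites also look for traces of the automation driver. ChromeDriver, which controls Chrome through the Chrome DevTools Protocol (CDP), injects helper variables with a telltale cdc_ prefix: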
// Check for ChromeDriver artifacts
if (window.cdc_adoQpoasnfa76pfcZLmcfl_Array ||
    window.cdc_adoQpoasnfa76pfcZLmcfl_Promise) {
  // Automated browser detected
}
This is why undetected-chromedriver exists: it patches the ChromeDriver binary to strip these artifacts.
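import undetected_chromedriver as uc
driver = uc.Chrome(headless=True)  # patched ChromeDriver, launched headless
try:
    driver.get("https://nowsecure.nl")  # Bot detection test site
finally:
    driver.quit()  # always shut down the browser and driver process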
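On the detecting side, hard-coding one exact key is brittle: it misses any driver whose token differs. A minimal Python sketch of matching the general shape instead, assuming a client script has collected the page's global property names (for example with Object.getOwnPropertyNames(window)) and posted them to the server. The pattern and the find_cdc_artifacts name are assumptions for illustration.

```python
import re

# Assumed shape: "cdc_" + alphanumeric token + "_" + a JS built-in name,
# e.g. "cdc_adoQpoasnfa76pfcZLmcfl_Array". Matching the shape rather than
# one exact key keeps working if only the token in the middle changes.
CDC_PATTERN = re.compile(r"^cdc_[A-Za-z0-9]+_[A-Za-z]+$")


def find_cdc_artifacts(global_names):
    """Return the property names that look like ChromeDriver injections."""
    return [name for name in global_names if CDC_PATTERN.match(name)]


reported = ["document", "cdc_adoQpoasnfa76pfcZLmcfl_Array", "fetch",
            "cdc_adoQpoasnfa76pfcZLmcfl_Promise"]
print(find_cdc_artifacts(reported))
```

This only catches drivers that keep the cdc_ prefix. A driver patched to remove or rename the variables, which is the point of undetected-chromedriver, passes.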
Detection Method 3: Browser Fingerprinting
Sites build a fingerprint from dozens of browser properties:
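// Canvas fingerprinting
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
ctx.textBaseline = 'top';
ctx.font = '14px Arial';
ctx.fillText('Hello', 2, 2);
const fingerprint = canvas.toDataURL();
// Fonts, GPU, and rasterizer all affect the pixels, so headless
// browsers often produce different canvas renders
// WebGL fingerprinting: use a fresh canvas, because getContext('webgl')
// returns null on a canvas that already has a 2d context
const gl = document.createElement('canvas').getContext('webgl');
// The real GPU strings are exposed through this debug extension
const debugInfo = gl && gl.getExtension('WEBGL_debug_renderer_info');
const renderer = debugInfo
  ? gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL)
  : null;
// Headless Chrome: "Google SwiftShader"
// Real Chrome: "ANGLE (Intel, ...)"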
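The individual readings are then usually reduced to one stable identifier. A minimal Python sketch of that step, assuming the client has posted its measurements as a JSON-style dictionary; the field names below are hypothetical. Serializing with sorted keys makes the hash independent of the order in which properties were collected.

```python
import hashlib
import json


def fingerprint_id(signals: dict) -> str:
    """Hash a dict of browser readings into a stable hex identifier."""
    # sort_keys plus fixed separators give one canonical string per set of
    # values, so the same browser yields the same ID regardless of key order.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical readings reported by a client script.
headless = {"webgl_renderer": "Google SwiftShader", "plugins": 0,
            "languages": ["en-US", "en"]}
print(fingerprint_id(headless))
```

This sketch weighs every field equally; a production system would likely treat unstable fields (window size, for example) differently so the ID survives small changes.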
Defense:
# Inject realistic WebGL values (context is a Playwright BrowserContext)
context.add_init_script("""
for (const proto of [WebGLRenderingContext.prototype,
                     WebGL2RenderingContext.prototype]) {
  const getParameter = proto.getParameter;
  proto.getParameter = function(parameter) {
    // 37445 = UNMASKED_VENDOR_WEBGL, 37446 = UNMASKED_RENDERER_WEBGL
    if (parameter === 37445) return 'Intel Inc.';
    if (parameter === 37446) return 'Intel Iris OpenGL Engine';
    return getParameter.call(this, parameter);
  };
}
""")
Detection Method 4: Missing Browser APIs
Headless browsers, especially Chrome's old headless mode, lack or misreport APIs that real browsers provide:
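// Notification API check
if (!window.Notification) {
  // Probably headless
}
// Permission consistency check: old headless Chrome reported
// Notification.permission as 'denied' while the Permissions API said 'prompt'
navigator.permissions.query({name: 'notifications'}).then(perm => {
  if (Notification.permission === 'denied' && perm.state === 'prompt') {
    // Contradictory answers: likely headless
  }
});
// Plugin check
if (navigator.plugins.length === 0) {
  // Old headless Chrome exposed no plugins
}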
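Checks like these often run in a small client script that posts its results back for a decision. A minimal server-side sketch, assuming a hypothetical report format with the keys shown. It combines the webdriver flag from Method 1 with the API checks above and returns reasons rather than a single verdict.

```python
def automation_signals(report: dict) -> list[str]:
    """List the automation tells found in a client-side report (hypothetical schema)."""
    reasons = []
    if report.get("webdriver") is True:
        reasons.append("navigator.webdriver is true")
    if not report.get("has_notification_api", True):
        reasons.append("Notification API missing")
    # The contradiction, not either value alone, is the headless tell.
    if (report.get("notification_permission") == "denied"
            and report.get("permission_query_state") == "prompt"):
        reasons.append("Notification.permission contradicts the Permissions API")
    if report.get("plugin_count") == 0:
        reasons.append("no plugins")
    return reasons


headless = {"webdriver": True, "has_notification_api": True,
            "notification_permission": "denied",
            "permission_query_state": "prompt", "plugin_count": 0}
print(automation_signals(headless))
```

Returning reasons instead of a yes/no answer lets the caller weigh them, since a single signal such as an empty plugin list can have innocent causes (some mobile browsers report one).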
Defense:
context.add_init_script("""
// Fake plugins (crude: a plain array fails an instanceof PluginArray check)
Object.defineProperty(navigator, 'plugins', {
  get: () => [1, 2, 3, 4, 5]  // Non-empty
});
// Report a typical language list
Object.defineProperty(navigator, 'languages', {
  get: () => ['en-US', 'en']
});
""")
Detection Method 5: Behavioral Analysis
Advanced detection tracks how users interact:
// Mouse movement patterns
let movements = [];
document.addEventListener('mousemove', (e) => {
  movements.push({x: e.clientX, y: e.clientY, t: Date.now()});
});
// Bots move in straight lines or not at all
// Humans have natural curves and micro-movements
// Timing analysis (Navigation Timing Level 2; performance.timing is deprecated)
const [nav] = performance.getEntriesByType('navigation');
const loadTime = nav.domContentLoadedEventEnd - nav.startTime;
// Bots often load pages unnaturally fast
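The "straight lines" heuristic can be made concrete with a straightness ratio: the direct distance between the first and last samples divided by the distance actually travelled. Exactly 1.0 means a perfectly straight path; human traces usually come out lower. A minimal Python sketch over the {x, y, t} samples collected above; the 0.99 threshold and the looks_scripted rule are arbitrary assumptions, not published values.

```python
import math


def straightness(points: list[dict]) -> float:
    """Direct distance / travelled distance for a list of {'x', 'y'} samples."""
    if len(points) < 2:
        return 1.0  # nothing to judge
    travelled = sum(
        math.dist((a["x"], a["y"]), (b["x"], b["y"]))
        for a, b in zip(points, points[1:])
    )
    if travelled == 0:
        return 1.0  # the pointer never moved
    direct = math.dist((points[0]["x"], points[0]["y"]),
                       (points[-1]["x"], points[-1]["y"]))
    return direct / travelled


def looks_scripted(points: list[dict], threshold: float = 0.99) -> bool:
    """Hypothetical rule: flag paths that are almost perfectly straight (or static)."""
    return len(points) >= 3 and straightness(points) >= threshold
```

A stationary pointer also scores 1.0 here, which matches the "or not at all" half of the heuristic.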
Defense: behave like a person. Move the mouse along curved, slightly irregular paths instead of jumping straight to each target, and add randomized pauses between clicks and keystrokes.
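To imitate the natural curves and micro-movements mentioned above, one option is to sample a quadratic Bézier curve between the start and end points, bending it with a randomly offset control point and adding a little per-point jitter. A minimal sketch; the function name, step count, bend range, and jitter size are all illustrative assumptions.

```python
import random


def human_path(start, end, steps=25, jitter=1.5, rng=None):
    """Return steps + 1 (x, y) points on a bent, slightly noisy path from start to end."""
    rng = rng or random.Random()
    (x0, y0), (x2, y2) = start, end
    # Control point: the midpoint pushed sideways (perpendicular to the
    # start-end line) by a random fraction of its length bends the path.
    bend = rng.uniform(-0.3, 0.3)
    cx = (x0 + x2) / 2 - (y2 - y0) * bend
    cy = (y0 + y2) / 2 + (x2 - x0) * bend
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Quadratic Bezier: B(t) = (1-t)^2 * P0 + 2(1-t)t * C + t^2 * P2
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        if 0 < i < steps:
            # Small Gaussian noise stands in for micro-movements;
            # the endpoints stay exact.
            x += rng.gauss(0, jitter)
            y += rng.gauss(0, jitter)
        points.append((x, y))
    return points
```

With Playwright, each point can be passed to page.mouse.move(x, y), with a short randomized sleep between calls so the timing is uneven as well.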
Detection Method 6: TLS Fingerprinting
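Plain HTTP clients give themselves away before any JavaScript runs. The TLS ClientHello (cipher suites, extensions, and their order) differs between clients, and Python requests produces a fingerprint that matches no mainstream browser and screams "bot".
Defense: send requests through a client that reproduces a real browser's handshake, such as curl_cffi with its impersonate option, or load pages in a real browser.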
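One widely used way to turn a handshake into a comparable fingerprint is JA3. It joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) into a comma-separated string of dash-joined decimal values and takes its MD5. A minimal Python sketch of that calculation, assuming the fields have already been parsed from a captured ClientHello; the sample values are made up. GREASE values (RFC 8701), which browsers insert to keep servers tolerant of unknown values, are dropped, as JA3 implementations do.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values look like 0x0A0A, 0x1A1A, ..., 0xFAFA (RFC 8701)."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(version, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, ja3_md5) for already-parsed ClientHello fields."""
    def field(values):
        return "-".join(str(v) for v in values if not is_grease(v))

    ja3_string = ",".join([
        str(version),
        field(ciphers),
        field(extensions),
        field(curves),
        field(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Made-up fields: TLS 1.2 record version (771), GREASE entries included.
s, h = ja3(771, [0x0A0A, 4865, 4866], [0x1A1A, 0, 23, 65281], [29, 23, 24], [0])
print(s)  # 771,4865-4866,0-23-65281,29-23-24,0
```

Chrome has randomized the order of its ClientHello extensions since roughly version 110, so its JA3 hash varies between connections; newer schemes such as JA4 sort the extensions before hashing.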
Detection Method 7: IP Reputation
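Datacenter IP ranges are easy to identify: cloud providers publish their own ranges, and IP reputation databases classify the rest, so sites can flag a datacenter address with a single lookup.
Defense: route traffic through residential or mobile IPs, which reputation databases associate with ordinary users, and rotate them so no single address builds up a bad history.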
ThorData provides residential proxies that pass IP reputation checks.
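A minimal sketch of the lookup side with Python's standard ipaddress module. The ranges below are reserved documentation networks (RFC 5737) standing in for a real list; in practice the data would come from sources such as the ranges cloud providers publish (AWS, for example, publishes ip-ranges.json) or a commercial reputation feed. DATACENTER_RANGES and is_datacenter_ip are hypothetical names.

```python
import ipaddress

# Placeholder "datacenter" ranges: RFC 5737 documentation networks, NOT real data.
DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip: str) -> bool:
    """True if ip falls inside any listed datacenter network."""
    addr = ipaddress.ip_address(ip)
    # Membership across IP versions (v6 address, v4 network) is simply False.
    return any(addr in net for net in DATACENTER_RANGES)
```

For lists with many thousands of ranges, a sorted or prefix-tree structure would replace the linear scan; the scan keeps the sketch short.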
The Complete Stealth Stack
For maximum evasion, combine all of the defenses above. Each layer answers a different check, and the layers must agree with one another: user agent, WebGL renderer, locale, timezone, and proxy location should all describe the same plausible machine.
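Combining layers also means keeping them consistent. For example, "Intel Iris OpenGL Engine" from the WebGL defense is a macOS-style renderer string, so pairing it with a Windows user agent trades one tell for another. One approach is to describe each identity once and derive every setting from it. A minimal sketch, assuming Playwright's browser.new_context() keyword arguments user_agent, locale, timezone_id, and proxy. The Profile class, its deliberately crude consistency rules, and the proxy address are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """One browser identity; every field should describe the same machine."""
    user_agent: str
    locale: str
    timezone_id: str
    webgl_renderer: str  # the value a WebGL init script will report
    proxy_server: str

    def problems(self) -> list[str]:
        """Crude self-checks for contradictions between the layers."""
        issues = []
        ua = self.user_agent
        gpu = self.webgl_renderer.lower()
        if "HeadlessChrome" in ua:
            issues.append("user agent advertises HeadlessChrome")
        if "swiftshader" in gpu:
            issues.append("software WebGL renderer")
        if "Windows" in ua and ("opengl engine" in gpu or "apple" in gpu):
            issues.append("Windows user agent with a macOS-style renderer")
        if self.locale == "en-US" and not self.timezone_id.startswith("America/"):
            issues.append("en-US locale with a non-American timezone")
        return issues

    def context_kwargs(self) -> dict:
        """Keyword arguments for Playwright's browser.new_context()."""
        return {
            "user_agent": self.user_agent,
            "locale": self.locale,
            "timezone_id": self.timezone_id,
            "proxy": {"server": self.proxy_server},
        }
```

Usage: check that profile.problems() is empty, then call browser.new_context(**profile.context_kwargs()) and inject the same webgl_renderer through the init script.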
Or Just Use an API
All of the above is a lot of work to maintain. ScraperAPI handles detection bypass on its side and keeps up with the arms race so you do not have to.
Monitor your detection rates with ScrapeOps to know when your evasion techniques stop working.
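Whatever tool does the reporting, the underlying metric is simple: the share of recent requests that were blocked. A minimal, tool-agnostic sketch of a rolling block-rate monitor; it does not use ScrapeOps' own API. Treating HTTP 403 and 429 or a "captcha" marker in the body as "blocked", and the 20% default alert threshold, are assumptions to tune per site.

```python
from collections import deque

# Assumed "blocked" statuses; many sites also serve CAPTCHA pages with 200.
BLOCK_STATUSES = {403, 429}


class BlockRateMonitor:
    """Track the fraction of blocked responses over the last `window` requests."""

    def __init__(self, window: int = 100, alert_at: float = 0.2):
        self.results = deque(maxlen=window)  # True = blocked
        self.alert_at = alert_at

    def record(self, status: int, body: str = "") -> None:
        blocked = status in BLOCK_STATUSES or "captcha" in body.lower()
        self.results.append(blocked)

    @property
    def block_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def should_alert(self) -> bool:
        return self.block_rate >= self.alert_at
```

A sustained rise in block_rate after a target site updates is usually the first sign that one of the evasion layers has stopped working.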
Conclusion
Bot detection is a multi-layered system. No single technique catches every bot, and no single evasion defeats every detector. The key is understanding what each layer checks and ensuring your scraper passes all of them. For production use, a managed proxy service is almost always more cost-effective than maintaining your own stealth infrastructure.