DEV Community

agenthustler

Headless Browser Detection: How Sites Know You're a Bot

The Detection Arms Race

You launch your Puppeteer script, it works perfectly in testing, then fails in production. The site knows you are a bot. But how?

Modern bot detection goes far beyond checking user agents. Let's dive into exactly how sites detect headless browsers and how to defend against each technique.

Detection Method 1: The WebDriver Flag

The simplest check. Browsers driven through WebDriver, Puppeteer, or Playwright normally expose navigator.webdriver as true:

// What sites check
if (navigator.webdriver) {
    // Block this visitor
}

Defense in Python with Playwright:

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
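One common way to approach this, sketched here with Playwright's sync API and Chromium, is to combine a launch switch with an init script that overrides the property. The helper name read_webdriver_flag is made up for this example, and whether the two mitigations are enough depends on the detector you are facing:

```python
# Init script that runs before any page script. Real Chrome reports
# navigator.webdriver as false, so the getter returns false rather than undefined.
WEBDRIVER_PATCH = """
Object.defineProperty(Navigator.prototype, 'webdriver', {
    get: () => false,
    configurable: true,
});
"""

# Chromium switch that stops Blink from exposing the automation flag.
LAUNCH_ARGS = ["--disable-blink-features=AutomationControlled"]


def read_webdriver_flag(url: str) -> str:
    """Open url with both mitigations and return navigator.webdriver as a string."""
    # Imported lazily so the constants above can be reused without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, args=LAUNCH_ARGS)
        context = browser.new_context()
        context.add_init_script(WEBDRIVER_PATCH)
        page = context.new_page()
        page.goto(url)
        flag = page.evaluate("String(navigator.webdriver)")
        browser.close()
        return flag
```

Calling read_webdriver_flag on a page you control is a quick way to confirm what a site's script would see.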

Detection Method 2: Chrome DevTools Protocol

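Sites look for traces of the automation channel that drives the browser over CDP (Chrome DevTools Protocol). ChromeDriver, for example, injects helper globals whose names start with cdc_:

// Check for ChromeDriver artifacts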
if (window.cdc_adoQpoasnfa76pfcZLmcfl_Array ||
    window.cdc_adoQpoasnfa76pfcZLmcfl_Promise) {
    // Automated browser detected
}

This is why undetected-chromedriver exists. It patches ChromeDriver so these artifacts are no longer exposed:

import undetected_chromedriver as uc

driver = uc.Chrome(headless=True)
driver.get("https://nowsecure.nl")  # Bot detection test site
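To verify that a patched driver really hides these globals, you can scan the window's property names yourself. This is a rough sketch: the regular expression is an assumption based on the two names shown above, and find_driver_artifacts is a hypothetical helper, not part of Selenium or undetected-chromedriver.

```python
import re

# ChromeDriver's injected globals look like cdc_<random>_Array, sometimes with a leading "$".
CDC_PATTERN = re.compile(r"^\$?cdc_[A-Za-z0-9]+_")


def find_driver_artifacts(global_names):
    """Return the names that match the ChromeDriver cdc_ naming pattern."""
    return [name for name in global_names if CDC_PATTERN.match(name)]


# Sample key list standing in for Object.keys(window) from a live session.
sample_keys = ["document", "cdc_adoQpoasnfa76pfcZLmcfl_Array", "location"]
leaked = find_driver_artifacts(sample_keys)
print(leaked)
```

With a live Selenium session, the key list could come from driver.execute_script("return Object.keys(window)"). An empty result only means this one pattern is absent, not that the browser is undetectable.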

Detection Method 3: Browser Fingerprinting

Sites build a fingerprint from dozens of browser properties:

// Canvas fingerprinting
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
ctx.textBaseline = 'top';
ctx.font = '14px Arial';
ctx.fillText('Hello', 2, 2);
const fingerprint = canvas.toDataURL();
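// Rendering varies with GPU, driver, and fonts, so headless setups often stand out

// WebGL fingerprinting
// A canvas that already has a 2D context cannot return a WebGL one, so use a new canvas
const gl = document.createElement('canvas').getContext('webgl');
// The real GPU string is only exposed through this debug extension
const ext = gl && gl.getExtension('WEBGL_debug_renderer_info');
const renderer = ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL) : null;
// Headless Chrome often reports "Google SwiftShader"
// Real Chrome: "ANGLE (Intel, ...)"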
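On the server side, the collected properties are usually reduced to a single identifier. Here is a minimal sketch of that step, assuming the signals arrive as a flat dictionary. The field names and values are invented for illustration, and real detectors weigh signals individually rather than relying on one hash.

```python
import hashlib
import json


def fingerprint_id(signals: dict) -> str:
    """Hash a dict of collected browser signals into one stable identifier."""
    # sort_keys makes the serialization independent of insertion order.
    canonical = json.dumps(signals, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only; a real collector would gather many more signals.
desktop = {
    "webgl_renderer": "ANGLE (Intel, ...)",
    "canvas": "data:image/png;base64,AAA",
    "plugins": 5,
}
headless = dict(desktop, webgl_renderer="Google SwiftShader", plugins=0)

print(fingerprint_id(desktop) != fingerprint_id(headless))
```

Changing any one signal changes the identifier, which is why spoofing a single value rarely helps if the rest of the profile stays the same.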

Defense:

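# context is a Playwright BrowserContext.
# 37445 and 37446 are UNMASKED_VENDOR_WEBGL and UNMASKED_RENDERER_WEBGL.
# Inject realistic WebGL values (this patches WebGL1 contexts only)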
context.add_init_script("""
    const getParameter = WebGLRenderingContext.prototype.getParameter;
    WebGLRenderingContext.prototype.getParameter = function(parameter) {
        if (parameter === 37445) return 'Intel Inc.';
        if (parameter === 37446) return 'Intel Iris OpenGL Engine';
        return getParameter.call(this, parameter);
    };
""")

Detection Method 4: Missing Browser APIs

Headless browsers, especially older builds, can lack APIs or return values that a regular browser session would not:

// Notification API check
if (!window.Notification) {
    // Probably headless
}

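// Permission check: the two APIs should agree with each other
navigator.permissions.query({name: 'notifications'}).then(perm => {
    if (Notification.permission === 'denied' && perm.state === 'prompt') {
        // Contradictory answers, a known tell of older headless Chrome
    }
});

// Plugin check
if (navigator.plugins.length === 0) {
    // Older headless Chrome builds report no plugins
}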

Defense:

context.add_init_script("""
    // Fake plugins (a plain array is a crude stand-in, not a real PluginArray)
    Object.defineProperty(navigator, 'plugins', {
        get: () => [1, 2, 3, 4, 5]  // Non-empty
    });

    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en']
    });
""")

Detection Method 5: Behavioral Analysis

Advanced detection tracks how users interact:

// Mouse movement patterns
let movements = [];
document.addEventListener('mousemove', (e) => {
    movements.push({x: e.clientX, y: e.clientY, t: Date.now()});
});

// Bots move in straight lines or not at all
// Humans have natural curves and micro-movements

// Timing analysis
const loadTime = performance.timing.domContentLoadedEventEnd -
                 performance.timing.navigationStart;
// Bots often load pages unnaturally fast

Defense:

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
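One way to avoid straight-line movement, sketched below, is to move the pointer along a curved path with small random jitter and uneven pauses. The curve shape, jitter size, and delay range are arbitrary assumptions, and human_like_path and move_mouse are hypothetical helpers. The page argument is assumed to be a Playwright sync Page.

```python
import random


def human_like_path(start, end, steps=25, jitter=1.5, rng=None):
    """Return points along a quadratic Bezier curve from start to end with small jitter."""
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    # A control point offset from the midpoint bends the path into a curve.
    cx = (x0 + x1) / 2 + rng.uniform(-100, 100)
    cy = (y0 + y1) / 2 + rng.uniform(-100, 100)
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        # Jitter only the interior points so the path still starts and ends exactly.
        if 0 < i < steps:
            x += rng.uniform(-jitter, jitter)
            y += rng.uniform(-jitter, jitter)
        points.append((x, y))
    return points


def move_mouse(page, start, end, rng=None):
    """Replay the path on a Playwright sync page with uneven pauses between moves."""
    rng = rng or random.Random()
    for x, y in human_like_path(start, end, rng=rng):
        page.mouse.move(x, y)
        page.wait_for_timeout(rng.uniform(5, 30))  # milliseconds
```

A call such as move_mouse(page, (20, 20), (640, 360)) before clicking gives the mousemove listener something that looks less mechanical, though detectors that model velocity and acceleration may still flag it.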

Detection Method 6: TLS Fingerprinting

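The TLS ClientHello your HTTP library sends (its cipher suites, extensions, and their order) forms a distinctive fingerprint. Python requests produces one that does not match any mainstream browser: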

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
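A widely used way to express this fingerprint is JA3, which joins five ClientHello fields into a string and takes its MD5 digest. The sketch below shows the calculation only. The ClientHello values are made up for illustration, not captured from a real client.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE placeholders (0x0a0a, 0x1a1a, ...) are ignored by JA3."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version, ciphers, extensions, curves, point_formats) -> str:
    """Build the JA3 string: five comma-separated fields, values joined by dashes."""
    fields = [[version], ciphers, extensions, curves, point_formats]
    return ",".join(
        "-".join(str(v) for v in field if not is_grease(v)) for field in fields
    )


def ja3_hash(*hello_fields) -> str:
    """JA3 is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*hello_fields).encode("ascii")).hexdigest()


# Made-up ClientHello values for illustration, not captured from a real client.
client_a = (771, [0x0A0A, 4865, 4866, 49195], [0, 23, 65281], [29, 23], [0])
client_b = (771, [49195, 4865, 4866], [0, 23, 65281], [29, 23], [0])

print(ja3_string(*client_a))
print(ja3_hash(*client_a) != ja3_hash(*client_b))
```

Because the order of cipher suites is part of the string, two clients offering the same suites in a different order hash differently. Changing the User-Agent header does nothing here, so the usual mitigation is an HTTP client that mimics a browser's handshake (curl_cffi is one such library) or a real browser.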

Detection Method 7: IP Reputation

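Datacenter IP ranges are easy to identify: cloud providers publish theirs, and IP-intelligence databases catalogue the rest. A site can classify your address at request time: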

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
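The lookup itself is simple. Here is a minimal sketch of what the site-side check might look like using Python's standard ipaddress module. The CIDR blocks are placeholders taken from the reserved documentation ranges, not real datacenter ranges.

```python
import ipaddress

# Placeholder CIDR blocks from the reserved documentation ranges. A real check
# would load published cloud-provider ranges or a commercial IP database.
DATACENTER_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(address: str) -> bool:
    """Return True if the address falls inside any known datacenter range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in DATACENTER_RANGES)


print(is_datacenter_ip("203.0.113.42"))  # inside the first placeholder range
print(is_datacenter_ip("192.0.2.10"))    # not in the list
```

Since the check costs almost nothing per request, no amount of browser patching helps if the address itself is flagged.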

Residential proxy providers such as ThorData route traffic through consumer IP ranges, which generally fare better on IP reputation checks.

The Complete Stealth Stack

For maximum evasion, combine all defenses:

# Implementation is proprietary (that IS the moat).
# Skip the build — use our ready-made Apify actor:
# see the CTA below for the link (fpr=yw6md3).
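As a rough idea of how the browser-side pieces could be wired together, here is a sketch that gathers launch switches, init scripts, locale, and an optional proxy into one object for Playwright's sync API. StealthPlan and open_context are hypothetical names, the defaults are illustrative, and the sketch does not cover TLS fingerprints or behavioral signals.

```python
from dataclasses import dataclass, field


@dataclass
class StealthPlan:
    """Settings gathered from the layers above; all values here are illustrative."""
    launch_args: list = field(
        default_factory=lambda: ["--disable-blink-features=AutomationControlled"]
    )
    init_scripts: list = field(default_factory=list)
    locale: str = "en-US"
    proxy_server: str = ""  # e.g. a residential proxy URL; empty means direct

    def launch_kwargs(self) -> dict:
        """Keyword arguments for chromium.launch()."""
        kwargs = {"headless": True, "args": list(self.launch_args)}
        if self.proxy_server:
            kwargs["proxy"] = {"server": self.proxy_server}
        return kwargs

    def context_kwargs(self) -> dict:
        """Keyword arguments for browser.new_context()."""
        return {"locale": self.locale}


def open_context(playwright, plan: StealthPlan):
    """Launch Chromium via a Playwright sync handle and apply every init script."""
    browser = playwright.chromium.launch(**plan.launch_kwargs())
    context = browser.new_context(**plan.context_kwargs())
    for script in plan.init_scripts:
        context.add_init_script(script)
    return browser, context
```

Keeping the settings in one place makes it easier to change a single layer when a detector updates, without touching the scraping logic.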

Or Just Use an API

All of the above is a lot of work to maintain. A managed service such as ScraperAPI handles detection bypass for you and keeps up with the arms race so you do not have to.

Monitor your detection rates with ScrapeOps to know when your evasion techniques stop working.

Conclusion

Bot detection is a multi-layered system. No single technique catches every bot, and no single evasion defeats every detector. The key is understanding what each layer checks and ensuring your scraper passes all of them. For production use, a managed proxy service is almost always more cost-effective than maintaining your own stealth infrastructure.
