DEV Community

agenthustler

Headless Browser Detection: How Sites Know You're a Bot

The Detection Arms Race

You launch your Puppeteer script, it works perfectly in testing, then fails in production. The site knows you are a bot. But how?

Modern bot detection goes far beyond checking user agents. Let's dive into exactly how sites detect headless browsers and how to defend against each technique.

Detection Method 1: The WebDriver Flag

The simplest check. Browsers driven through WebDriver or a similar automation interface report navigator.webdriver as true:

// What sites check
if (navigator.webdriver) {
    // Block this visitor
}

Defense in Python with Playwright:

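from playwright.sync_api import sync_playwright

def create_stealth_browser():
    # The caller should call browser.close() and then p.stop() when finished.
    p = sync_playwright().start()
    browser = p.chromium.launch(
        headless=True,
        # Stops Chromium from setting navigator.webdriver to true
        args=["--disable-blink-features=AutomationControlled"]
    )
    context = browser.new_context()

    # Fallback patch. A regular Chrome reports false rather than undefined,
    # so return false to avoid introducing a new tell.
    context.add_init_script("""
        Object.defineProperty(navigator, 'webdriver', {
            get: () => false
        });
    """)

    # Return the Playwright handle too, so it can be stopped later
    return p, browser, context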

Detection Method 2: Chrome DevTools Protocol

Sites also look for artifacts that automation drivers leave in the page. Selenium's ChromeDriver, for example, injects globals with a cdc_ prefix:

// Check for ChromeDriver artifacts
if (window.cdc_adoQpoasnfa76pfcZLmcfl_Array ||
    window.cdc_adoQpoasnfa76pfcZLmcfl_Promise) {
    // Automated browser detected
}

This is why undetected-chromedriver exists: it patches these artifacts out of the driver.

import undetected_chromedriver as uc

driver = uc.Chrome(headless=True)
driver.get("https://nowsecure.nl")  # Bot detection test site

Detection Method 3: Browser Fingerprinting

Sites build a fingerprint from dozens of browser properties:

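// Canvas fingerprinting
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');
ctx.textBaseline = 'top';
ctx.font = '14px Arial';
ctx.fillText('Hello', 2, 2);
const fingerprint = canvas.toDataURL();
// Headless browsers often produce different canvas renders

// WebGL fingerprinting. This needs its own canvas: a canvas that already
// has a 2D context returns null for getContext('webgl').
const glCanvas = document.createElement('canvas');
const gl = glCanvas.getContext('webgl');
const debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
const renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
// Headless Chrome: typically mentions "SwiftShader"
// Real Chrome: "ANGLE (Intel, ...)"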

Defense:

# Inject realistic WebGL values
# Note: this patches WebGL1 only; WebGL2RenderingContext is left untouched.
context.add_init_script("""
    const getParameter = WebGLRenderingContext.prototype.getParameter;
    WebGLRenderingContext.prototype.getParameter = function(parameter) {
        // 37445 = UNMASKED_VENDOR_WEBGL, 37446 = UNMASKED_RENDERER_WEBGL
        if (parameter === 37445) return 'Intel Inc.';
        if (parameter === 37446) return 'Intel Iris OpenGL Engine';
        return getParameter.call(this, parameter);
    };
""")
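
On the server side, the collected properties are usually collapsed into a single identifier. The following is a minimal sketch of that step, assuming the page script reports its values as a flat dictionary; the function name `fingerprint_id` and every sample value are hypothetical, not taken from any particular detection vendor.

```python
import hashlib
import json


def fingerprint_id(properties: dict) -> str:
    """Collapse collected browser properties into one stable identifier."""
    # Sort keys so the same properties always serialize identically.
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical values a page script might report back to the server.
headless = {
    "webgl_vendor": "Google Inc.",
    "webgl_renderer": "Google SwiftShader",
    "canvas_hash": "a1b2c3",
    "timezone": "UTC",
}
# Same browser after applying only the WebGL patch.
spoofed = dict(
    headless,
    webgl_vendor="Intel Inc.",
    webgl_renderer="Intel Iris OpenGL Engine",
)

print(fingerprint_id(headless) == fingerprint_id(spoofed))  # False
```

The spoofed values change the identifier, but the remaining fields (here `canvas_hash`) still describe the headless renderer, which is why the patches need to stay consistent with each other.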

Detection Method 4: Missing Browser APIs

Headless browsers may lack APIs that regular browsers expose, or answer them inconsistently:

// Notification API check
if (!window.Notification) {
    // Probably headless
}

// Permission check
navigator.permissions.query({name: 'notifications'}).then(perm => {
    if (Notification.permission === 'denied' && perm.state === 'prompt') {
        // Contradictory answers: a known headless Chrome tell
    }
});

// Plugin check
if (navigator.plugins.length === 0) {
    // Older headless Chrome reports no plugins
}

Defense:

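context.add_init_script("""
    // Fake plugins. A plain array is not a real PluginArray, so this only
    // defeats a simple length check.
    Object.defineProperty(navigator, 'plugins', {
        get: () => [1, 2, 3, 4, 5]  // Non-empty
    });

    // Report a typical language list
    Object.defineProperty(navigator, 'languages', {
        get: () => ['en-US', 'en']
    });
""")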
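
Detectors rarely rely on one of these checks alone; they cross-check the answers against each other. Below is a rough sketch of such a consistency pass, assuming the signals arrive as a dictionary. The function name, the dictionary keys, and the Windows/ANGLE heuristic are illustrative assumptions rather than any vendor's actual rules.

```python
def consistency_flags(signals: dict) -> list[str]:
    """Return the reasons a set of reported signals looks automated."""
    flags = []
    if signals.get("webdriver"):
        flags.append("navigator.webdriver is true")
    if signals.get("plugins_count", 0) == 0:
        flags.append("no plugins reported")
    # A denied Notification permission while the Permissions API still
    # answers "prompt" is a contradiction.
    if (signals.get("notification_permission") == "denied"
            and signals.get("permissions_query_state") == "prompt"):
        flags.append("notification permission mismatch")
    # Assumed heuristic: Chrome on Windows renders through ANGLE, so its
    # WebGL renderer string is expected to start with "ANGLE".
    ua = signals.get("user_agent", "")
    renderer = signals.get("webgl_renderer", "")
    if "Windows" in ua and "Chrome" in ua and not renderer.startswith("ANGLE"):
        flags.append("Windows Chrome user agent without an ANGLE renderer")
    return flags


# Hypothetical report from a partially patched headless browser.
report = {
    "webdriver": False,
    "plugins_count": 5,
    "notification_permission": "denied",
    "permissions_query_state": "prompt",
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0",
    "webgl_renderer": "Intel Iris OpenGL Engine",
}
for reason in consistency_flags(report):
    print(reason)
```

In this sketch the webdriver and plugin patches pass, yet the report is still flagged twice, because the spoofed values contradict other signals.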

Detection Method 5: Behavioral Analysis

Advanced detection tracks how users interact:

// Mouse movement patterns
let movements = [];
document.addEventListener('mousemove', (e) => {
    movements.push({x: e.clientX, y: e.clientY, t: Date.now()});
});

// Bots move in straight lines or not at all
// Humans have natural curves and micro-movements

// Timing analysis (performance.timing is deprecated, so read the
// navigation entry instead)
const [nav] = performance.getEntriesByType('navigation');
const loadTime = nav.domContentLoadedEventEnd - nav.startTime;
// Bots often load pages unnaturally fast

Defense:

import random
import asyncio

async def human_like_interaction(page):
    # Random mouse movements
    for _ in range(random.randint(3, 7)):
        x = random.randint(100, 800)
        y = random.randint(100, 600)
        await page.mouse.move(x, y, steps=random.randint(10, 25))
        await asyncio.sleep(random.uniform(0.1, 0.5))

    # Scroll naturally
    for _ in range(random.randint(2, 5)):
        delta = random.randint(100, 400)
        await page.mouse.wheel(0, delta)
        await asyncio.sleep(random.uniform(0.5, 1.5))
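
One caveat: the steps argument of page.mouse.move interpolates in a straight line between two points, which is exactly the pattern the detector above looks for. A possible refinement, sketched here under the assumption that a quadratic Bezier curve is "curved enough", is to generate the intermediate points yourself. The helper names `bezier_path` and `straightness` are hypothetical, and `straightness` is only a toy version of what a detector might measure.

```python
import math
import random


def bezier_path(start, end, steps=20, spread=100.0, rng=random):
    """Return points along a quadratic Bezier curve from start to end."""
    (x0, y0), (x2, y2) = start, end
    # Offset the control point from the midpoint so the path bends.
    cx = (x0 + x2) / 2 + rng.uniform(-spread, spread)
    cy = (y0 + y2) / 2 + rng.uniform(-spread, spread)
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x2
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y2
        points.append((x, y))
    return points


def straightness(points):
    """Direct distance divided by travelled distance; 1.0 is a straight line."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    direct = math.dist(points[0], points[-1])
    return direct / travelled if travelled else 1.0


path = bezier_path((100, 100), (800, 600))
print(f"{len(path)} points, straightness {straightness(path):.4f}")
```

Inside human_like_interaction, each point could then be passed to page.mouse.move(x, y) in turn, with a short random sleep between calls.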

Detection Method 6: TLS Fingerprinting

Every HTTP client negotiates TLS in a characteristic way, and detectors hash the handshake parameters into a fingerprint such as JA3. The fingerprint produced by Python requests does not match any mainstream browser:

url = "https://example.com"  # placeholder target

# Standard requests - detectable TLS fingerprint
import requests
resp = requests.get(url)  # JA3 hash identifies this as a Python client

# curl_cffi - mimics Chrome's TLS fingerprint
from curl_cffi import requests as cf_requests
resp = cf_requests.get(url, impersonate="chrome120")  # Presents Chrome 120's handshake
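
To make the JA3 idea concrete: the fingerprint is an MD5 digest over five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats), written as decimal numbers joined by dashes within a field and commas between fields. The sketch below follows that layout; the numeric values are illustrative and do not reproduce the handshake of any real client.

```python
import hashlib


def ja3_hash(tls_version, ciphers, extensions, curves, point_formats):
    """Build a JA3 string from ClientHello fields; return (string, MD5 hex)."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative values only.
client_a = ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
# Same cipher suites offered in a different order.
client_b = ja3_hash(771, [4867, 4866, 4865], [0, 23, 65281, 10, 11], [29, 23, 24], [0])

print(client_a[0])
print(client_a[1] == client_b[1])  # False
```

Because the order of the offered values is part of the string, two clients that support the same cipher suites can still hash differently, and changing the User-Agent header has no effect on the result.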

Detection Method 7: IP Reputation

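IP ranges that belong to hosting providers are listed in public and commercial databases, so a site can classify an address on the very first request:

import requests

url = "https://example.com"  # placeholder target

# Datacenter IP = often blocked or challenged outright
# Residential IP = usually given more trust

# Use residential proxies from ThorData
proxies = {"https": "http://user:pass@residential.thordata.com:9000"}
resp = requests.get(url, proxies=proxies)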

ThorData provides residential proxies that pass IP reputation checks.
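
On the detection side, the lookup can be as simple as a range membership test. This is a minimal sketch using the standard library; the `DATACENTER_RANGES` list is hypothetical and uses RFC 5737 documentation blocks in place of real hosting-provider allocations.

```python
import ipaddress

# Hypothetical datacenter ranges. These are RFC 5737 documentation blocks,
# not real hosting-provider allocations.
DATACENTER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(address: str) -> bool:
    """Return True if the address falls inside a known datacenter range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in DATACENTER_RANGES)


print(is_datacenter_ip("192.0.2.44"))   # True
print(is_datacenter_ip("203.0.113.9"))  # False
```

A real deployment would load the ranges from an IP intelligence feed and would likely weigh the result alongside the other signals rather than block on it alone.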

The Complete Stealth Stack

For maximum evasion, combine the browser-level defenses:

from playwright.async_api import async_playwright
import random
import asyncio

# Relies on human_like_interaction() from the behavioral analysis section.

async def stealth_scrape(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch(
            headless=True,
            args=[
                "--disable-blink-features=AutomationControlled",
                "--no-sandbox",
                "--disable-dev-shm-usage",
            ]
        )
        try:
            context = await browser.new_context(
                viewport={"width": 1920, "height": 1080},
                user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
                locale="en-US",
                timezone_id="America/New_York",
                color_scheme="light",
            )

            # Apply all stealth patches
            await context.add_init_script("""
                Object.defineProperty(navigator, 'webdriver', {get: () => false});
                Object.defineProperty(navigator, 'plugins', {get: () => [1,2,3]});
                Object.defineProperty(navigator, 'languages', {get: () => ['en-US','en']});

                const origGetParameter = WebGLRenderingContext.prototype.getParameter;
                WebGLRenderingContext.prototype.getParameter = function(p) {
                    if (p === 37445) return 'Intel Inc.';
                    if (p === 37446) return 'Intel Iris OpenGL Engine';
                    return origGetParameter.call(this, p);
                };
            """)

            page = await context.new_page()
            await page.goto(url, wait_until="networkidle")

            # Simulate human behavior
            await human_like_interaction(page)

            return await page.content()
        finally:
            # Close the browser even if navigation or interaction fails
            await browser.close()

Or Just Use an API

All of the above is a lot of work to maintain. A managed service such as ScraperAPI handles detection bypass for you and keeps up with the arms race so you do not have to.

Monitor your detection rates with ScrapeOps to know when your evasion techniques stop working.

Conclusion

Bot detection is a multi-layered system. No single technique catches every bot, and no single evasion defeats every detector. The key is understanding what each layer checks and ensuring your scraper passes all of them. For production use, a managed proxy service is often more cost-effective than maintaining your own stealth infrastructure.
