You launch your Playwright scraper in headless mode. It works on page 1. By page 3, you're staring at a CAPTCHA. Switch to headed mode — no CAPTCHA for 50 pages.
What's going on? Browser fingerprinting. Anti-bot systems don't just check your IP — they analyze dozens of browser properties to decide if you're human. Let's break down exactly what they check and how to handle it.
What Is Browser Fingerprinting?
Every browser exposes properties through JavaScript APIs. Anti-bot services collect these into a "fingerprint": a combination of values that is often distinctive enough to recognize your browser across requests, even without cookies:
// Just a sample of what gets collected. The get*() calls stand in for
// helper functions that each fingerprinting script implements itself.
const fingerprint = {
  userAgent: navigator.userAgent,
  platform: navigator.platform,
  languages: navigator.languages,
  hardwareConcurrency: navigator.hardwareConcurrency,
  deviceMemory: navigator.deviceMemory,
  screenResolution: [screen.width, screen.height],
  colorDepth: screen.colorDepth,
  timezone: Intl.DateTimeFormat().resolvedOptions().timeZone,
  webglVendor: getWebGLVendor(),
  webglRenderer: getWebGLRenderer(),
  canvas: getCanvasFingerprint(),
  audioContext: getAudioFingerprint(),
  fonts: getInstalledFonts(),
  plugins: navigator.plugins.length,
  touchSupport: navigator.maxTouchPoints,
};
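Once collected, the values are combined into a single identifier. Here is a minimal Python sketch of that idea, assuming the simplest possible scheme: serialize the properties in a canonical order and hash them. The fingerprint_id helper is hypothetical; commercial services use their own signal sets and matching logic, which is usually fuzzier than an exact hash.

```python
import hashlib
import json
from typing import Any, Dict


def fingerprint_id(fp: Dict[str, Any]) -> str:
    """Hash collected properties into a short, stable identifier.

    sort_keys makes the result independent of the order in which the
    collection script emitted the properties.
    """
    canonical = json.dumps(fp, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Made-up values: same properties in a different order, then one change
a = {"platform": "Win32", "hardwareConcurrency": 8, "languages": ["en-US", "en"]}
b = {"languages": ["en-US", "en"], "platform": "Win32", "hardwareConcurrency": 8}
c = dict(a, hardwareConcurrency=4)

print(fingerprint_id(a) == fingerprint_id(b))  # True
print(fingerprint_id(a) == fingerprint_id(c))  # False
```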
How Headless Chrome Gets Detected
Headless Chrome has several telltale signs:
1. The navigator.webdriver Flag
// Automation-controlled Chrome (headless, and headed under Playwright too):
navigator.webdriver // true ← BUSTED
// Real Chrome:
navigator.webdriver // false (undefined in very old versions)
2. Missing Plugins
// Real Chrome: has PDF viewer, etc.
navigator.plugins.length // 3-5
// Headless Chrome:
navigator.plugins.length // 0 ← suspicious
3. WebGL Renderer
// Real Chrome:
getWebGLRenderer() // "ANGLE (NVIDIA GeForce...)"
// Headless Chrome:
getWebGLRenderer() // "Google SwiftShader" ← dead giveaway
4. Chrome Object
// Real Chrome:
window.chrome // {runtime: {...}, ...}
// Headless Chrome:
window.chrome // undefined ← missing
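5. Permissions API Behavior
// Real Chrome: the two notification APIs agree
Notification.permission // "default"
navigator.permissions.query({name: "notifications"})
.then(p => p.state) // "prompt"
// Older headless Chrome:
Notification.permission // "denied"
// ...while permissions.query still resolves to "prompt" ← contradiction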
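To make these tells concrete, here is a toy Python scorer that runs over a fingerprint dictionary using the same keys the audit script later in this post returns (webdriver, pluginCount, webglRenderer, hasChrome, userAgent). It is a hypothetical illustration, not how any particular anti-bot vendor works; real systems weight many more signals. It skips the permissions check (which needs two async calls) and adds one extra tell: older headless Chrome puts "HeadlessChrome" in its default User-Agent.

```python
from typing import Any, Dict, List


def headless_signals(fp: Dict[str, Any]) -> List[str]:
    """Return the headless tells present in a collected fingerprint.

    Missing keys are treated as "unknown" and never flagged.
    """
    signals = []
    if fp.get("webdriver") is True:
        signals.append("navigator.webdriver is true")
    if fp.get("pluginCount") == 0:
        signals.append("no plugins")
    if "SwiftShader" in str(fp.get("webglRenderer") or ""):
        signals.append("software WebGL renderer (SwiftShader)")
    if fp.get("hasChrome") is False:
        signals.append("window.chrome is missing")
    if "HeadlessChrome" in str(fp.get("userAgent") or ""):
        signals.append("HeadlessChrome in the User-Agent")
    return signals


# Made-up sample values for illustration
headless = {
    "webdriver": True,
    "pluginCount": 0,
    "webglRenderer": "Google SwiftShader",
    "hasChrome": False,
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) HeadlessChrome/120.0.0.0",
}
real = {
    "webdriver": False,
    "pluginCount": 5,
    "webglRenderer": "ANGLE (NVIDIA GeForce RTX 3060)",
    "hasChrome": True,
    "userAgent": "Mozilla/5.0 (X11; Linux x86_64) Chrome/120.0.0.0",
}
print(len(headless_signals(headless)))  # 5
print(headless_signals(real))  # []
```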
Detecting These Leaks in Your Scraper
Before trying to fix things, find out what's leaking:
# fingerprint_audit.py
from playwright.sync_api import sync_playwright

DETECTION_SCRIPT = """
() => {
    const results = {};
    // Test 1: webdriver flag
    results.webdriver = navigator.webdriver;
    // Test 2: plugins
    results.pluginCount = navigator.plugins.length;
    // Test 3: languages
    results.languages = navigator.languages;
    // Test 4: chrome object
    results.hasChrome = !!window.chrome;
    results.hasChromeRuntime = !!(
        window.chrome && window.chrome.runtime
    );
    // Test 5: WebGL
    try {
        const canvas = document.createElement('canvas');
        const gl = canvas.getContext('webgl');
        const debugInfo = gl.getExtension(
            'WEBGL_debug_renderer_info'
        );
        results.webglVendor = gl.getParameter(
            debugInfo.UNMASKED_VENDOR_WEBGL
        );
        results.webglRenderer = gl.getParameter(
            debugInfo.UNMASKED_RENDERER_WEBGL
        );
    } catch(e) {
        results.webglError = e.message;
    }
    // Test 6: Permissions
    results.permissionsAPI = !!navigator.permissions;
    // Test 7: Screen dimensions
    results.screen = {
        width: screen.width,
        height: screen.height,
        availWidth: screen.availWidth,
        availHeight: screen.availHeight,
        colorDepth: screen.colorDepth,
    };
    // Test 8: Hardware
    results.hardwareConcurrency = navigator.hardwareConcurrency;
    results.deviceMemory = navigator.deviceMemory;
    results.maxTouchPoints = navigator.maxTouchPoints;
    // Test 9: Headless indicators
    results.userAgent = navigator.userAgent;
    results.platform = navigator.platform;
    return results;
}
"""

def audit_fingerprint():
    with sync_playwright() as p:
        # Test headless
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        headless_fp = page.evaluate(DETECTION_SCRIPT)
        browser.close()
        # Compare with headed
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        headed_fp = page.evaluate(DETECTION_SCRIPT)
        browser.close()
    # Show differences
    print("=== Fingerprint Differences ===")
    for key in headless_fp:
        if headless_fp[key] != headed_fp.get(key):
            print(f"  {key}:")
            print(f"    Headless: {headless_fp[key]}")
            print(f"    Headed:   {headed_fp.get(key)}")

audit_fingerprint()
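The audit loop compares top-level values only, so a change inside screen prints the whole nested object, and keys that exist only in the headed result are never shown. A recursive diff, sketched here as a hypothetical drop-in for the printing loop, reports the exact nested field. The sample dictionaries are made up for illustration.

```python
from typing import Any, List, Tuple


def diff_fingerprints(a: Any, b: Any, path: str = "") -> List[Tuple[str, Any, Any]]:
    """Recursively compare two fingerprint results.

    Returns (path, value_in_a, value_in_b) for every leaf that differs,
    including keys present on only one side (reported as None there).
    """
    if isinstance(a, dict) and isinstance(b, dict):
        diffs = []
        for key in sorted(set(a) | set(b)):
            child = f"{path}.{key}" if path else key
            diffs.extend(diff_fingerprints(a.get(key), b.get(key), child))
        return diffs
    return [] if a == b else [(path, a, b)]


# Made-up results standing in for headless_fp / headed_fp
headless_fp = {"webdriver": True, "screen": {"width": 1280, "height": 720}}
headed_fp = {"webdriver": True, "screen": {"width": 1280, "height": 800}}

for key, left, right in diff_fingerprints(headless_fp, headed_fp):
    print(f"  {key}: headless={left!r} headed={right!r}")
# prints:   screen.height: headless=720 headed=800
```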
Fixing the Fingerprint Leaks
Approach 1: Playwright Stealth (Quick Fix)
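Use playwright-stealth to patch common detection points. The example uses stealth_sync, the entry point from the package's 1.x releases; later releases changed the API, so check the README for the version you install: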
# pip install playwright-stealth
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # Apply stealth patches before navigating
    stealth_sync(page)
    page.goto("https://target-site.com")
    # navigator.webdriver no longer reports true,
    # plugins are spoofed, etc.
    browser.close()
Approach 2: Manual Patches (More Control)
from playwright.sync_api import sync_playwright

def apply_stealth(page):
    """Apply individual stealth patches (init scripts run before page scripts)."""
    # 1. Report webdriver as false, like real Chrome
    #    (current Chrome returns false, so undefined would be its own tell)
    page.add_init_script("""
        Object.defineProperty(navigator, 'webdriver', {
            get: () => false
        });
    """)
    # 2. Fake plugins. This is a plain array, not a real PluginArray,
    #    so `navigator.plugins instanceof PluginArray` still gives it away.
    page.add_init_script("""
        Object.defineProperty(navigator, 'plugins', {
            get: () => [
                {
                    name: 'Chrome PDF Plugin',
                    description: 'Portable Document Format',
                    filename: 'internal-pdf-viewer',
                    length: 1
                },
                {
                    name: 'Chrome PDF Viewer',
                    description: '',
                    filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai',
                    length: 1
                },
                {
                    name: 'Native Client',
                    description: '',
                    filename: 'internal-nacl-plugin',
                    length: 2
                }
            ]
        });
    """)
    # 3. Fake chrome object (only if missing, so a real one isn't clobbered)
    page.add_init_script("""
        if (!window.chrome) {
            window.chrome = {
                runtime: {
                    onConnect: null,
                    onMessage: null,
                },
                loadTimes: function() {},
                csi: function() {},
                app: {}
            };
        }
    """)
    # 4. Make permissions.query agree with Notification.permission.
    #    Wrapped in an IIFE so originalQuery doesn't leak into page globals.
    page.add_init_script("""
        (() => {
            const originalQuery = navigator.permissions.query;
            navigator.permissions.query = (params) => {
                if (params.name === 'notifications') {
                    // Notification uses 'default' where Permissions uses 'prompt'
                    const state = Notification.permission === 'default'
                        ? 'prompt'
                        : Notification.permission;
                    return Promise.resolve({ state });
                }
                // Keep `this` bound, or Chrome throws "Illegal invocation"
                return originalQuery.call(navigator.permissions, params);
            };
        })();
    """)
    # 5. Fake languages
    page.add_init_script("""
        Object.defineProperty(navigator, 'languages', {
            get: () => ['en-US', 'en']
        });
    """)

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        # Keeps Chromium from setting navigator.webdriver = true natively
        args=["--disable-blink-features=AutomationControlled"],
    )
    context = browser.new_context(
        viewport={"width": 1920, "height": 1080},
        # Keep the Chrome major version in sync with browser.version;
        # a UA that doesn't match the actual engine is another tell.
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/120.0.0.0 Safari/537.36"
        ),
        locale="en-US",
        timezone_id="America/New_York",
    )
    page = context.new_page()
    apply_stealth(page)
    page.goto("https://target-site.com")
    browser.close()
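The launch code hardcodes Chrome/120, which drifts out of sync as Playwright updates its bundled Chromium. A small hypothetical helper can derive the UA from the real engine version instead, assuming Chrome's reduced User-Agent format: only the major version is reported (the rest frozen to 0.0.0) and the OS tokens are frozen to the values below. It does not touch navigator.userAgentData or the Sec-CH-UA client hints, which can still disagree.

```python
from typing import Dict

# Frozen OS tokens used by Chrome's reduced User-Agent
PLATFORM_TOKENS: Dict[str, str] = {
    "windows": "Windows NT 10.0; Win64; x64",
    "mac": "Macintosh; Intel Mac OS X 10_15_7",
    "linux": "X11; Linux x86_64",
}


def build_user_agent(browser_version: str, platform: str = "windows") -> str:
    """Build a Chrome User-Agent whose major version matches the engine.

    "120.0.6099.28" becomes "Chrome/120.0.0.0", as in the reduced UA.
    """
    major = browser_version.split(".")[0]
    if not major.isdigit():
        raise ValueError(f"unexpected version string: {browser_version!r}")
    return (
        f"Mozilla/5.0 ({PLATFORM_TOKENS[platform]}) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        f"Chrome/{major}.0.0.0 Safari/537.36"
    )


print(build_user_agent("120.0.6099.28"))
# Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
```

With Playwright you would pass the browser's own version, which the Browser object exposes as browser.version: user_agent=build_user_agent(browser.version).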
Approach 3: CDP Connection to Real Browser
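The most reliable approach: connect to a real Chrome that you launched yourself, rather than one Playwright launched for automation:
from playwright.sync_api import sync_playwright

# First launch a real Chrome with remote debugging, e.g.:
#   chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-profile
# (Chrome 136+ ignores the debugging flag on the default profile,
#  so a separate --user-data-dir is required.)
with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp(
        "http://localhost:9222"
    )
    context = browser.contexts[0]  # the browser's existing default context
    page = context.new_page()
    # A real browser: no automation flags or headless tells to patch,
    # though some anti-bot scripts probe for side effects of CDP itself
    page.goto("https://target-site.com")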
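Starting Chrome by hand gets tedious, and connect_over_cdp fails if it runs before the debugging port is open. Here is a hypothetical pair of helpers: one builds the launch command, the other polls Chrome's /json/version DevTools endpoint until it answers. Chrome binary names and profile paths differ per OS, so those arguments are assumptions you fill in.

```python
import json
import time
import urllib.request
from typing import List


def chrome_debug_command(chrome_path: str, port: int, profile_dir: str) -> List[str]:
    """Build the command line for a normal Chrome with a CDP port open."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        # Separate profile: recent Chrome ignores remote debugging on the
        # default profile, and it keeps scraping state out of your own.
        f"--user-data-dir={profile_dir}",
        # Skip first-run dialogs that would otherwise block the first tab
        "--no-first-run",
        "--no-default-browser-check",
    ]


def wait_for_cdp(port: int, timeout: float = 15.0) -> str:
    """Poll Chrome's /json/version endpoint until it answers.

    Returns the browser's WebSocket debugger URL; raises TimeoutError if
    nothing answers on the port within `timeout` seconds.
    """
    url = f"http://127.0.0.1:{port}/json/version"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=1) as resp:
                return json.load(resp)["webSocketDebuggerUrl"]
        except (OSError, ValueError, KeyError):
            # Connection refused, not ready yet, or a bad response: retry
            time.sleep(0.25)
    raise TimeoutError(f"no CDP endpoint on port {port} after {timeout}s")
```

A possible launch sequence: subprocess.Popen(chrome_debug_command("google-chrome", 9222, "/tmp/chrome-profile")), then wait_for_cdp(9222), then the connect_over_cdp call from the example above.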
When Stealth Isn't Enough: Solving CAPTCHAs
Even with perfect fingerprinting, some sites will still show CAPTCHAs — especially on:
- First visit from a new IP
- Login/signup flows
- After N requests in a session
- High-value pages (checkout, pricing)
That's when you need a solving service:
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

# extract_sitekey, detect_type, solve_captcha and extract_data are
# placeholders for your own helpers or your solving service's client
# (e.g. one built on httpx).

CAPTCHA_SELECTOR = (
    '[class*="captcha"], [data-sitekey], '
    '.g-recaptcha, .h-captcha, .cf-turnstile'
)

def scrape_with_stealth_and_solving(url: str):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        stealth_sync(page)
        page.goto(url)

        # Check if we got a CAPTCHA despite stealth
        captcha_el = page.query_selector(CAPTCHA_SELECTOR)
        if captcha_el:
            print("CAPTCHA detected despite stealth, solving...")
            sitekey = (
                captcha_el.get_attribute("data-sitekey")
                or extract_sitekey(page.content())
            )
            captcha_type = detect_type(captcha_el)
            # Solve via your provider's API
            token = solve_captcha(
                captcha_type=captcha_type,
                sitekey=sitekey,
                url=url,
            )
            # Inject the token. Passing it as an argument (instead of an
            # f-string) keeps quotes in the token from breaking the JS.
            page.evaluate(
                """(token) => {
                    document.querySelectorAll(
                        'textarea[name*="captcha-response"], '
                        + 'input[name="cf-turnstile-response"]'
                    ).forEach(el => { el.value = token; });
                }""",
                token,
            )
            page.click('button[type="submit"]')  # site-specific selector
            page.wait_for_load_state("networkidle")

        # Now scrape the actual content
        data = extract_data(page)
        browser.close()
        return data
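The solving example leaves extract_sitekey and detect_type up to you. One possible sketch, assuming the sitekey sits in a data-sitekey attribute in the raw HTML and the vendor can be read from the widget's standard class (g-recaptcha, h-captcha, cf-turnstile). detect_type_from_class is a hypothetical variant that takes the element's class string rather than the element handle. Neither covers reCAPTCHA v3, which is typically loaded through a script URL rather than a marked-up widget.

```python
import re
from typing import Optional

# Matches data-sitekey="..." or data-sitekey='...'
SITEKEY_RE = re.compile(r"""data-sitekey=["']([^"']+)["']""")

# Standard widget classes, checked in order
TYPE_BY_CLASS = [
    ("g-recaptcha", "recaptcha"),
    ("h-captcha", "hcaptcha"),
    ("cf-turnstile", "turnstile"),
]


def extract_sitekey(html: str) -> Optional[str]:
    """Return the first data-sitekey value in the page HTML, or None."""
    match = SITEKEY_RE.search(html)
    return match.group(1) if match else None


def detect_type_from_class(class_attr: str) -> str:
    """Guess the CAPTCHA vendor from an element's class attribute."""
    classes = class_attr.split()
    for css_class, captcha_type in TYPE_BY_CLASS:
        if css_class in classes:
            return captcha_type
    return "unknown"


html = '<div class="g-recaptcha" data-sitekey="6Lc-example-key"></div>'
print(extract_sitekey(html))  # 6Lc-example-key
print(detect_type_from_class("cf-turnstile dark"))  # turnstile
```

In the Playwright flow you would call it as detect_type_from_class(captcha_el.get_attribute("class") or "").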
Fingerprint Consistency Checklist
When setting up your scraper, make sure these are all consistent:
# ✅ Consistent setup
context = browser.new_context(
    # Match the User-Agent to the viewport/platform
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)...",
    viewport={"width": 1920, "height": 1080},  # Desktop
    locale="en-US",
    timezone_id="America/New_York",
    # Screen must be at least as large as the viewport
    screen={"width": 1920, "height": 1080},
    # Match color scheme to majority of users
    color_scheme="light",
)
# ❌ Inconsistent (will get flagged)
context = browser.new_context(
    user_agent="...iPhone...",  # Says mobile
    viewport={"width": 1920, ...},  # But desktop viewport!
    locale="zh-CN",  # Chinese locale
    timezone_id="America/New_York",  # But US timezone!
)
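These checks can be partly automated before a run. Below is a hypothetical Python checker for a new_context() config, covering mobile UA vs. desktop viewport, viewport vs. screen, and UA OS vs. the navigator.platform the page actually reports (read it with page.evaluate("navigator.platform")). The platform check assumes Playwright's user_agent option leaves navigator.platform at the host's value; confirm that with the audit script on your setup. Locale/timezone pairing is not covered, and the 1024px cut-off is an arbitrary heuristic.

```python
import re
from typing import Any, Dict, List

# Rough heuristic for phone/tablet User-Agents
MOBILE_UA = re.compile(r"iPhone|iPad|Android")

# navigator.platform values each desktop UA OS normally reports
EXPECTED_PLATFORM = {
    "Windows": "Win32",  # also on 64-bit Windows
    "Macintosh": "MacIntel",  # also on Apple Silicon
    "Linux": "Linux x86_64",
}


def consistency_problems(cfg: Dict[str, Any], host_platform: str) -> List[str]:
    """Flag obvious mismatches in a new_context() config.

    host_platform is what navigator.platform actually reports in the page.
    """
    problems = []
    ua = cfg.get("user_agent", "")
    viewport = cfg.get("viewport", {})
    screen = cfg.get("screen", viewport)
    is_mobile = bool(MOBILE_UA.search(ua))

    if is_mobile and viewport.get("width", 0) > 1024:
        problems.append("mobile User-Agent with a desktop-width viewport")
    if (viewport.get("width", 0) > screen.get("width", 0)
            or viewport.get("height", 0) > screen.get("height", 0)):
        problems.append("viewport larger than screen")
    if not is_mobile:
        for token, expected in EXPECTED_PLATFORM.items():
            if token in ua and host_platform != expected:
                problems.append(
                    f"UA claims {token} but navigator.platform is {host_platform!r}"
                )
                break
    return problems


good = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "viewport": {"width": 1920, "height": 1080},
    "screen": {"width": 1920, "height": 1080},
}
print(consistency_problems(good, "Win32"))  # []
print(consistency_problems(good, "Linux x86_64"))
# ["UA claims Windows but navigator.platform is 'Linux x86_64'"]
```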
Testing Your Stealth Setup
Run your browser against common detection sites:
DETECTION_SITES = [
    "https://bot.sannysoft.com/",
    "https://arh.antoinevastel.com/bots/areyouheadless",
    "https://infosimples.github.io/detect-headless/",
    "https://fingerprintjs.github.io/fingerprintjs/",
]

def test_stealth(page):
    for site in DETECTION_SITES:
        page.goto(site)
        page.wait_for_timeout(3000)
        page.screenshot(
            path=f"stealth_test_{site.split('/')[2]}.png"
        )
        print(f"Screenshotted: {site}")
The Decision Tree
Start
├─ Can you use a real browser (CDP)?
│ └─ Yes → Best stealth, highest resource cost
├─ Is the site moderately protected?
│ └─ playwright-stealth + good fingerprint config
├─ Is the site heavily protected?
│ └─ Stealth + CAPTCHA solving API for fallback
└─ Is it an API endpoint (no JS)?
└─ Skip browser entirely, just solve CAPTCHAs
via HTTP and submit tokens
Key Takeaways
- Headless detection is fingerprint-based — not just User-Agent
- WebGL renderer and navigator.webdriver are the biggest tells
- Consistency matters — a mobile UA with a desktop viewport is obvious
- Stealth plugins fix the most common tells, but determined anti-bot systems need more
- Always have a CAPTCHA solver fallback — perfect stealth is impossible
For solving CAPTCHAs when stealth isn't enough, check out passxapi-python — it handles reCAPTCHA, hCaptcha, Turnstile, and FunCaptcha with a unified API.
What's your stealth setup? Have you found detection vectors I didn't cover? Let me know in the comments.