When your Playwright script gets blocked after 10 requests, it's usually not your IP.
Modern anti-bot systems like Cloudflare, PerimeterX, and DataDome use browser fingerprinting to identify automated traffic. They collect 40-60 signals from your browser and score them against a model trained on millions of real user sessions.
Here's what they're actually checking.
Tier 1: The Instant Giveaways
These signals are binary — you either pass or fail:
navigator.webdriver
When Chrome is launched by WebDriver (Playwright, Selenium, Puppeteer), it sets navigator.webdriver = true. Every serious anti-bot system checks this first.
Fix: Patch it at the JS level before any page code runs:
```javascript
await page.addInitScript(() => {
  // Runs before any page script executes, so the site never sees the flag
  Object.defineProperty(navigator, 'webdriver', { get: () => false });
});
```
Note: This alone isn't enough — other signals will still expose you.
Headless Chrome Detection
Headless Chrome leaks its nature through:
- navigator.plugins.length = 0 (real browsers have 3-7 plugins)
- navigator.languages not set correctly
- Missing Chrome-specific objects (window.chrome, window.chrome.runtime)
- screen.colorDepth = 24 by default in headless (real users have varied values)
Fix: Use playwright-stealth or puppeteer-extra-plugin-stealth to patch all of these.
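To see what those stealth plugins are doing under the hood, here is a minimal sketch of the kind of init script they install, shown for Playwright's Python API. The specific values (plugin count, language list) are illustrative assumptions, not a complete bypass:

```python
# Illustrative subset of the patches stealth plugins apply.
# The property values below are assumptions for the sketch, not tuned values.
HEADLESS_PATCHES = """
// Give navigator.plugins a realistic non-zero length
Object.defineProperty(navigator, 'plugins', {
    get: () => [1, 2, 3, 4, 5],
});
// Ensure navigator.languages is populated
Object.defineProperty(navigator, 'languages', {
    get: () => ['en-US', 'en'],
});
// Provide the window.chrome object that headless builds omit
window.chrome = window.chrome || { runtime: {} };
"""

# Usage with Playwright for Python (sync API), e.g.:
#   page.add_init_script(HEADLESS_PATCHES)
```

In practice the plugins rewrite dozens of properties and also make the patched getters look native, which is why hand-rolling this rarely holds up.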
Tier 2: Behavioral Signals
These are harder to fake because they require time and interaction:
Mouse Movement Entropy
Real users don't move their mouse in straight lines. Anti-bots measure:
- Path curvature (Bezier curves vs straight lines)
- Velocity variation (acceleration/deceleration)
- Micro-movements (natural hand tremor)
- Time-on-page before first interaction
Fix: Libraries like ghost-cursor generate realistic mouse paths. Still imperfect — trained models can distinguish synthetic curves.
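The core idea behind ghost-cursor-style path generation can be sketched in a few lines: a Bezier curve with a randomized control point, plus per-step jitter that fades out at the endpoints. The step count and jitter ranges here are illustrative assumptions:

```python
import math
import random

def mouse_path(start, end, steps=25):
    """Generate a curved, jittered mouse path from start to end.

    Sketch of the ghost-cursor idea: a quadratic Bezier curve with a
    random control point, plus micro-jitter to mimic hand tremor.
    """
    (x0, y0), (x1, y1) = start, end
    # Random control point pulls the path off the straight line
    cx = (x0 + x1) / 2 + random.uniform(-100, 100)
    cy = (y0 + y1) / 2 + random.uniform(-100, 100)
    points = []
    for i in range(steps + 1):
        t = i / steps
        # Quadratic Bezier: B(t) = (1-t)^2 * P0 + 2(1-t)t * C + t^2 * P1
        x = (1 - t) ** 2 * x0 + 2 * (1 - t) * t * cx + t ** 2 * x1
        y = (1 - t) ** 2 * y0 + 2 * (1 - t) * t * cy + t ** 2 * y1
        # Micro-jitter that fades to zero at start and end
        jitter = math.sin(t * math.pi)
        points.append((x + random.uniform(-1.5, 1.5) * jitter,
                       y + random.uniform(-1.5, 1.5) * jitter))
    return points
```

You would replay these points through page.mouse.move() with small, variable delays between steps. Trained models can still flag synthetic curves, as noted above, so treat this as raising the bar, not clearing it.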
Click Timing
Real users don't click buttons at exactly 0ms after page load. Anti-bots check:
- Time between page load and first interaction
- Time between mousedown and mouseup events
- Click coordinates (too perfectly centered = bot)
Fix: Add random delays (300-2000ms), use page.mouse.move() to approach the element before clicking.
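A small helper can bundle these three fixes together: an off-center click point, a pre-click delay, and a non-zero mousedown-to-mouseup hold. The ranges are illustrative assumptions, not tuned values:

```python
import random

def humanized_click_plan(box):
    """Pick a click point and delays that avoid the obvious bot tells.

    `box` is an element bounding box dict with "x", "y", "width", "height"
    keys (the shape returned by Playwright's bounding_box()).
    """
    # Aim near the center, but never exactly at it
    x = box["x"] + box["width"] * random.uniform(0.35, 0.65)
    y = box["y"] + box["height"] * random.uniform(0.35, 0.65)
    return {
        "x": x,
        "y": y,
        # Wait a human-plausible time before the first interaction
        "pre_click_delay_ms": random.uniform(300, 2000),
        # Hold the button briefly: mousedown -> mouseup is never 0ms
        "hold_ms": random.uniform(40, 130),
    }
```

Feed the result into page.mouse.move(), mouse.down(), a sleep for hold_ms, then mouse.up(), rather than calling element.click() directly.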
Scroll Patterns
Real users scroll, stop, read, scroll more. Scripts that scroll to 100% of the page in one smooth motion are flagged.
Fix: Simulate reading — scroll to 30%, pause 2-8 seconds, scroll to 60%, pause, etc.
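That read-pause-scroll schedule is easy to generate up front. The stop fractions and 2-8 second pauses below are illustrative assumptions matching the pattern described above:

```python
import random

def reading_scroll_plan(page_height, stops=(0.3, 0.6, 0.85, 1.0)):
    """Build a scroll schedule that imitates read-pause-scroll behaviour.

    Returns (target_pixel, pause_seconds) pairs to replay with
    incremental scrolls and sleeps.
    """
    plan = []
    for fraction in stops:
        target = int(page_height * fraction)
        pause = random.uniform(2.0, 8.0)  # simulated reading time
        plan.append((target, pause))
    return plan
```

Replaying each step with mouse-wheel or keyboard scrolling (rather than one window.scrollTo() jump) keeps the motion itself from looking scripted.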
Tier 3: Hardware & Environment Signals
These are the hardest to fake:
WebGL Fingerprint
WebGL renders a test image using your GPU. The exact output varies by GPU model, driver version, and OS. Anti-bots hash this and compare against known real device profiles.
Virtual machines and cloud environments render WebGL differently than physical hardware — this is a strong signal.
Fix: Inject a static WebGL hash that matches a common real GPU. Libraries like FingerprintJS Spoofing can help, but cloud VM rendering is hard to fully mask.
Canvas Fingerprint
Similar to WebGL — renders text and shapes, hashes the result. Cloud VMs use Mesa/LLVMpipe rendering which produces distinctive outputs.
Fix: Override HTMLCanvasElement.prototype.getContext to return modified pixel data.
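One common variant of this override adds low-bit noise before the canvas is read back, so the hash changes without visibly altering the image. This sketch patches only toDataURL; real stealth plugins cover more surfaces (getImageData, toBlob) and keep the noise stable across calls:

```python
# Illustrative sketch: perturb canvas readbacks so the fingerprint hash
# changes. This is the core idea only, not a complete implementation.
CANVAS_NOISE_PATCH = """
const origToDataURL = HTMLCanvasElement.prototype.toDataURL;
HTMLCanvasElement.prototype.toDataURL = function (...args) {
    const ctx = this.getContext('2d');
    if (ctx) {
        const img = ctx.getImageData(0, 0, this.width, this.height);
        // Flip the low bit of a sparse set of pixel bytes
        for (let i = 0; i < img.data.length; i += 997) {
            img.data[i] = img.data[i] ^ 1;
        }
        ctx.putImageData(img, 0, 0);
    }
    return origToDataURL.apply(this, args);
};
"""

# Usage with Playwright for Python, e.g.:
#   page.add_init_script(CANVAS_NOISE_PATCH)
```

Note the trade-off: random per-call noise defeats hashing but makes your fingerprint unstable, which some systems also flag.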
Audio Fingerprint
Processes audio through the Web Audio API. The output varies by OS/hardware and is used as a stable identifier.
Fix: Override AudioContext.prototype.createOscillator and related methods.
Tier 4: Network Signals
TLS Fingerprint (JA3)
Even before your browser sends an HTTP request, the TLS handshake reveals your browser type. JA3 hashing of the TLS ClientHello is a strong signal — Python's requests library has a completely different JA3 signature than Chrome.
Fix: Use curl_cffi instead of requests — it impersonates Chrome's TLS fingerprint.
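Usage is nearly a drop-in replacement for requests; this minimal sketch assumes a recent curl_cffi version, and the exact impersonation target names can vary between releases:

```python
from curl_cffi import requests

# The impersonate argument makes the TLS ClientHello (and HTTP/2 settings)
# match the named browser instead of libcurl's defaults.
resp = requests.get(
    "https://example.com",
    impersonate="chrome",
)
print(resp.status_code)
```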
HTTP/2 Fingerprint
HTTP/2 connection parameters (header ordering, stream priorities, window sizes) vary by browser. Python's httpx has a different HTTP/2 fingerprint than Chrome.
Fix: curl_cffi also handles HTTP/2 fingerprint spoofing.
The Practical Checklist
For light scraping (no anti-bot, just rate limits):
- curl_cffi with Chrome impersonation
- Rotate IPs
- Random delays between requests
For moderate anti-bot (basic fingerprinting):
- Playwright + playwright-stealth
- Realistic user agents
- Random delays + scroll simulation
For heavy anti-bot (Cloudflare, PerimeterX):
- Playwright + stealth + ghost-cursor
- Residential proxy rotation
- Real browser profiles with history
- Budget for CAPTCHA solving services ($2-5/1K)
For enterprise anti-bot (DataDome, Kasada, Akamai Bot Manager):
- Specialized tools (ZenRows, ScrapingBee, Bright Data Scraping Browser)
- Or use a pre-built actor that handles this for you
When to Stop Fighting Anti-Bot
Beyond a certain point, bypassing enterprise-grade anti-bot systems costs more than using a pre-built solution.
For most scraping use cases — B2B contacts, product prices, job listings, social stats — there are pre-built Apify actors that already handle fingerprinting and proxy rotation. The Apify Scrapers Bundle ($29) includes 30 actors covering the most common data types, each configured with appropriate anti-detection settings.
Build the fingerprinting defense yourself when you have a unique target site. Use pre-built actors when the data type is common.
What anti-bot system are you fighting? Drop the domain in the comments and I'll tell you which tier it falls in and the practical fix.