I've spent the last 18 months scraping sites protected by all three major enterprise bot detection systems. Here's an honest technical assessment of what each actually does, where they fail, and what bypasses still work in April 2026.
This is for security researchers, penetration testers, and developers building compliant data pipelines who need to understand what they're dealing with.
## How Enterprise Bot Detection Works (The 2026 Stack)
All three systems — Imperva (formerly Incapsula), PerimeterX (now HUMAN Security), and DataDome — operate on the same core architecture:
- JavaScript challenge: A script runs in the browser collecting behavioral signals
- Fingerprinting: TLS fingerprint, HTTP/2 settings, browser API enumeration
- Behavioral scoring: Mouse movement, scroll patterns, timing between requests
- Risk score → decision: Block, challenge (CAPTCHA), or allow
- Reputation lookups: IP reputation, ASN, datacenter detection
The difference between them is which of these layers they invest most heavily in.
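The layered pipeline above can be sketched as a toy scorer. All signal names, weights, and thresholds below are illustrative inventions for this sketch, not any vendor's actual model:

```python
from dataclasses import dataclass

@dataclass
class RequestSignals:
    """Signals a detection system might collect for one request (illustrative)."""
    ip_reputation: float      # 0.0 (clean) .. 1.0 (known-bad)
    tls_fp_is_browser: bool   # does the JA3/JA4 hash match a real browser?
    webdriver_flag: bool      # is navigator.webdriver exposed?
    behavior_score: float     # 0.0 (robotic) .. 1.0 (human-like)

def decide(sig: RequestSignals) -> str:
    """Combine the layers into one risk score, then map it to a decision."""
    risk = 0.0
    risk += 0.4 * sig.ip_reputation                  # reputation lookups
    risk += 0.0 if sig.tls_fp_is_browser else 0.3    # TLS fingerprinting
    risk += 0.2 if sig.webdriver_flag else 0.0       # JS challenge result
    risk += 0.3 * (1.0 - sig.behavior_score)         # behavioral scoring
    if risk >= 0.6:
        return "block"
    if risk >= 0.3:
        return "challenge"  # e.g. serve a CAPTCHA
    return "allow"
```

The real systems differ mainly in how the weights are distributed, which is the point the sections below make vendor by vendor.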
## Imperva (Incapsula) — 2026 Assessment
Strength: IP reputation and ASN blocking. Imperva maintains one of the most comprehensive IP blocklists in the industry. If you're using datacenter IPs — AWS, GCP, Azure, DigitalOcean — you'll get blocked on first request, before any JavaScript runs.
Weakness: JavaScript challenge quality. Imperva's JS fingerprinting is less sophisticated than PerimeterX's. It primarily collects `navigator.webdriver`, the canvas fingerprint, the WebGL renderer, and basic timing signals. It does not collect detailed font enumeration or audio API fingerprinting at the level HUMAN does.
What works against it:
- Residential proxies (any reputable provider) — the IP blocklist is the main gate
- Standard Playwright with the webdriver flag patched (`Object.defineProperty(navigator, 'webdriver', {get: () => undefined})`)
- Slowing the request rate to under 20 req/min per IP
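A minimal sketch of applying that webdriver patch as a Playwright init script, so it runs before any detection JS executes (assumes `pip install playwright` plus browser binaries; the URL handling is illustrative):

```python
# The patch string itself; injected before any page script runs,
# so detection JS sees the patched value from the first check.
WEBDRIVER_PATCH = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
)

def fetch_with_patched_playwright(url: str) -> str:
    """Open `url` in Chromium with navigator.webdriver hidden (sketch)."""
    from playwright.sync_api import sync_playwright  # deferred import

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context()
        context.add_init_script(WEBDRIVER_PATCH)  # applies to every new page
        page = context.new_page()
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
        return html
```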
What doesn't work:
- Any datacenter IP (blocked at TCP level by Imperva's edge network)
- Selenium default config (webdriver flag detected)
Current bypass difficulty: Medium. Residential proxy + patched Playwright handles ~85% of Imperva-protected targets.
## PerimeterX / HUMAN Security — 2026 Assessment
Strength: Behavioral analysis and JavaScript sophistication. HUMAN's sensor data collection is the most comprehensive of the three. Their JS agent collects 200+ signals including:
- Keystroke timing and pressure patterns (when form fields exist)
- Mouse movement velocity and curvature
- Scroll behavior patterns
- WebRTC leak detection
- Battery API status
- Audio context fingerprinting
- Canvas font measurement (not just rendering)
- CSS media query responses
- Hardware concurrency vs actual CPU behavior
Weakness: The behavioral model requires time to score. Sites with HUMAN protection often let through first requests (score = neutral) and only block after 2-3 pages when behavioral signals accumulate. This means: if you rotate sessions after each URL, you can often stay under the block threshold.
What works against it:
- Session rotation: new context per URL (slower but effective)
- Playwright-extra with `puppeteer-extra-plugin-stealth` equivalents
- Human-like timing: random delays of 1.5s-4s between actions
- Residential proxies (HUMAN also blocks datacenter, but their behavioral layer still catches bad actors on residential)
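The session-rotation approach can be sketched as follows: one fresh browser context per URL plus a jittered pause between pages, so behavioral signals never accumulate past the scoring threshold (assumes Playwright is installed; the delay band mirrors the 1.5s-4s figure above, and the URL handling is illustrative):

```python
import random
import time

def jittered_delay(low: float = 1.5, high: float = 4.0) -> float:
    """Random pause length within the 1.5s-4s band mentioned above."""
    return random.uniform(low, high)

def scrape_with_rotation(urls):
    """One fresh context per URL, discarding the session each time (sketch)."""
    from playwright.sync_api import sync_playwright  # deferred import

    results = {}
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        for url in urls:
            context = browser.new_context()  # clean cookies/storage each time
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded")
            results[url] = page.content()
            context.close()                  # drop the session before next URL
            time.sleep(jittered_delay())     # human-like pause between pages
        browser.close()
    return results
```

Slower than reusing one session, but it resets HUMAN's behavioral score to neutral on every page.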
What doesn't work:
- Headless Chrome with default settings (detected via AudioContext and font metrics)
- High-speed scraping even on residential IPs (behavioral patterns flag it)
- Any consistent timing pattern (even random delays if the RNG distribution is uniform)
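One way to avoid the flat histogram a uniform RNG produces is to draw pauses from a right-skewed distribution instead: mostly short gaps, occasionally long ones. The log-normal parameters below are illustrative guesses, not values validated against any detector:

```python
import math
import random

def humanlike_delay(median: float = 2.0, sigma: float = 0.5,
                    floor: float = 0.8, ceiling: float = 8.0) -> float:
    """Draw a pause from a log-normal distribution rather than a uniform one.

    Human inter-action gaps tend to be right-skewed; a uniform RNG yields a
    flat delay histogram that a behavioral model can distinguish over time.
    """
    delay = random.lognormvariate(math.log(median), sigma)
    return min(max(delay, floor), ceiling)  # clamp the extreme tails
```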
Current bypass difficulty: High. HUMAN is the hardest of the three for most scraping use cases.
## DataDome — 2026 Assessment
Strength: Real-time ML scoring with very low latency. DataDome processes bot decisions in under 2ms at the edge, which means there's no "warm-up" window you can exploit. Every request is scored in real time.
Weakness: Their ML model is trained primarily on behavior, not environment. If you can provide genuinely human-like behavioral signals — even in automation — their environment checks (TLS, webdriver, etc.) are less exhaustive than HUMAN's.
What works against it:
- Playwright with full stealth patching (they test fewer environment flags)
- Consistent realistic typing speed if forms are involved (DataDome tracks form interaction heavily)
- Rotating user agents with matching Accept-Language and platform signals
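Rotating whole header profiles, rather than individual fields, is what keeps User-Agent, language, and platform hints mutually consistent. The profiles below are illustrative examples, not an exhaustive or current list:

```python
import random

# Each profile keeps User-Agent, Accept-Language, and platform hint
# internally consistent; mixing fields across profiles creates mismatches.
HEADER_PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/124.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Sec-CH-UA-Platform": '"Windows"',
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                      "AppleWebKit/537.36 (KHTML, like Gecko) "
                      "Chrome/124.0.0.0 Safari/537.36",
        "Accept-Language": "en-US,en;q=0.9",
        "Sec-CH-UA-Platform": '"macOS"',
    },
]

def pick_headers() -> dict:
    """Rotate whole profiles, never individual fields."""
    return dict(random.choice(HEADER_PROFILES))
```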
What doesn't work:
- Default Playwright/Selenium configs
- Rapid-fire requests (DataDome's rate limiting triggers faster than competitors)
Current bypass difficulty: Medium-High. More bypassable than HUMAN if your behavioral simulation is good, but stricter than basic Imperva.
## TLS Fingerprinting — The Layer All Three Use
One layer none of the three prominently advertise but all use: TLS fingerprinting (JA3/JA4).
Python's requests library produces a distinctive TLS handshake. So does vanilla aiohttp. Modern bot detection systems fingerprint the TLS Client Hello to identify non-browser clients before any HTTP layer is reached.
The fix: use `curl_cffi`, which impersonates Chrome's TLS fingerprint:

```python
from curl_cffi import requests

session = requests.Session(impersonate="chrome124")
response = session.get("https://target-site.com", headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
})
```
This handles Imperva and basic DataDome instances without needing a browser at all. For HUMAN, you still need a real Playwright context because their JavaScript challenge requires actual browser APIs.
## The Proxy Question
For all three systems:
| Proxy type | Imperva | PerimeterX/HUMAN | DataDome |
|---|---|---|---|
| Datacenter | ❌ Blocked | ❌ Blocked | ❌ Blocked |
| Datacenter (rotated) | ❌ | ❌ | ⚠️ Some pass |
| Residential | ✅ Works | ✅ Needed | ✅ Works |
| Mobile (4G/LTE) | ✅ Best | ✅ Best | ✅ Best |
Mobile residential proxies have the highest bypass rates across all three because:
- Mobile IPs are almost never on blocklists
- 4G NAT means many users share a single IP (unusual traffic less suspicious)
- Mobile User Agents + mobile IPs are internally consistent
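Many residential and mobile proxy providers encode session stickiness in the proxy username, pinning one exit IP per session tag. The exact format is provider-specific, so the pattern below (including the `-session-` tag and the example hostname) is purely illustrative:

```python
def sticky_proxy_url(user: str, password: str, host: str, port: int,
                     session_id: str) -> str:
    """Build a proxy URL with a session tag embedded in the username.

    Format varies by provider; this username convention is illustrative only.
    """
    return f"http://{user}-session-{session_id}:{password}@{host}:{port}"

# Hypothetical usage with a curl_cffi session:
# session.proxies = {
#     "https": sticky_proxy_url("u123", "pw", "proxy.example.com", 8000, "a1"),
# }
```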
## Current State of Captcha Solving
When bot detection escalates to a visual CAPTCHA (Cloudflare Turnstile, hCaptcha, reCAPTCHA v3), the options in 2026:
- 2captcha / Anti-Captcha: Human solvers, 5-30 second delay, $1-2 per 1000 solves. Still works.
- CapSolver: AI-based, faster (2-8s) and more expensive; its Turnstile-specific models beat human solvers on accuracy.
- Nopecha: Browser extension approach that intercepts and solves in-browser. Works for some Cloudflare variants.
reCAPTCHA v3 doesn't produce visible challenges — it scores silently. If your score is too low, the site returns a challenge page or blocks. The only fix for reCAPTCHA v3: better behavioral simulation, not captcha solving.
## Practical Decision Framework
Which approach to use for a given target:
- Try `curl_cffi` first — if the site uses Imperva or basic DataDome, this may work without a browser
- Add residential proxies — if the TLS fingerprint passes but the IP is blocked
- Upgrade to Playwright + stealth — if JS challenge is detected
- Session rotation + human timing — if passing individual pages but getting blocked after 3-5 pages (HUMAN behavioral scoring)
- Reconsider the target — if the site is HUMAN-protected with strict behavioral scoring, the cost of bypass may exceed the value of the data
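The framework above can be written down as a simple escalation helper; the flag names and return strings are invented for this sketch:

```python
def next_step(tls_passes: bool, ip_blocked: bool,
              js_challenge: bool, blocked_after_pages: bool) -> str:
    """Walk the escalation ladder from the framework above (illustrative)."""
    if not tls_passes:
        return "try curl_cffi (browser TLS impersonation)"
    if ip_blocked:
        return "add residential proxies"
    if js_challenge:
        return "upgrade to Playwright + stealth patches"
    if blocked_after_pages:
        return "rotate sessions and randomize timing"
    return "current setup is sufficient, or reconsider the target"
```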
## Production-Ready Scrapers With Anti-Bot Built In
Each of the 35 actors in the bundle handles the anti-bot layer for its specific target — rotating proxies, patched browser context, and rate limiting baked in. No manual configuration needed.
Apify Scrapers Bundle — €29 — includes contact info, SERP, LinkedIn, Amazon, TikTok, Reddit, and 29 more.
Each actor uses PAY_PER_EVENT pricing: €0.002–€0.005 per result.