Vhub Systems

How to Bypass Akamai Bot Detection in 2026: curl-cffi + Residential Proxies

If you've hit an Akamai-protected site and watched your scraper go from 200 OK to a wall of CAPTCHAs and 403s in under 30 seconds, you already know: Akamai is not Cloudflare.

Cloudflare checks your TLS handshake and browser cookies. Akamai runs sensor.js — a 50KB+ JavaScript fingerprinting engine that probes your browser's GPU rendering, audio context, WebRTC stack, and hundreds of passive signals, then assigns you a bot score within the first page load.

Standard tools fail hard:

  • Selenium with a vanilla Chrome profile: ~80% detection rate against Akamai in recent tests
  • Python requests with a User-Agent header: ~100% detection within the first 5 requests
  • Playwright default: Still gets flagged at high volumes

The combination that actually works: curl-cffi with Chrome impersonation plus fresh residential proxies. Here's the full picture.


Why Akamai Is Harder Than Cloudflare

Cloudflare's primary detection vector is the TLS fingerprint (JA3/JA4 hash) and whether your browser completes a challenge correctly. Fix the TLS fingerprint, and you're largely through.

Akamai layers multiple systems:

  1. sensor.js behavioral fingerprinting: Executes on page load, probes browser internals (canvas rendering speed, WebGL vendor, font enumeration, navigator.hardwareConcurrency, etc.)
  2. Passive client hints: Reads HTTP/2 and HTTP/3 connection characteristics your client sends
  3. IP velocity scoring: Tracks requests per IP across their network, not just the target site
  4. Session integrity checks: Validates that subsequent requests from the same session behave like a real browser

The key difference: Akamai detects who you are (browser fingerprint + IP history), not just what you sent (one request's headers).
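A quick way to see the layer-1 difference for yourself is to compare what plain requests and curl-cffi present on the wire. The sketch below assumes a third-party TLS-echo service (tls.browserleaks.com) and a ja3_hash field in its JSON response — both are assumptions that may change, so treat this as illustrative:

```python
def ja3_differs(plain_json, impersonated_json):
    """True when two TLS-echo responses report different JA3 hashes."""
    return plain_json.get('ja3_hash') != impersonated_json.get('ja3_hash')

def compare_live(echo='https://tls.browserleaks.com/json'):
    """Hit the echo service with both clients (requires network access)."""
    import requests
    from curl_cffi import requests as curl_requests
    plain = requests.get(echo, timeout=30).json()
    chrome = curl_requests.get(echo, impersonate='chrome124', timeout=30).json()
    return plain.get('ja3_hash'), chrome.get('ja3_hash')

# compare_live() should return two different hashes when impersonation works.
# Matching Chrome's JA3 only clears layer 1 — sensor.js and IP scoring still apply.
```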


What Actually Works

1. curl-cffi with Chrome Impersonation

The curl-cffi library uses libcurl's impersonation mode to replicate Chrome's TLS fingerprint exactly — including HTTP/2 settings, cipher suites, and ALPN protocols. Combined with browser-like default headers, it slips past Akamai's TLS checks.

from curl_cffi.requests import Session

def create_browser_session(proxy=None):
    """Create a curl-cffi session that impersonates Chrome 124."""
    session = Session(
        impersonate='chrome124',
        timeout=30,
        proxies={'http': proxy, 'https': proxy} if proxy else None,
        headers={
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
            'Sec-Fetch-Dest': 'document',
            'Sec-Fetch-Mode': 'navigate',
            'Sec-Fetch-Site': 'none',
            'Sec-Fetch-User': '?1',
        }
    )
    return session

Test it against a site running Akamai:

session = create_browser_session()
response = session.get('https://www.target-akamai-site.com/')
print(f'Status: {response.status_code}')
print(f'URL: {response.url}')
# If redirected to CAPTCHA or blocked page, status_code will be 200 but content will have bot indicators
if 'captcha' in response.text.lower() or 'blocked' in response.text.lower():
    print('DETECTED — try a different proxy')
else:
    print('Access OK')

On simple Akamai-protected sites (no sensor.js challenge), this alone gets you through 80-90% of the time.
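Session integrity (layer 4 above) is also why a fresh connection per page can hurt you on sites where curl-cffi works: Akamai expects its cookies to persist across the visit. `_abck` and `bm_sz` are the cookie names Akamai Bot Manager commonly sets — treat the exact names as an assumption for your target:

```python
def has_akamai_cookies(cookie_names):
    """Heuristic: Akamai Bot Manager typically sets _abck and bm_sz.
    Seeing both in the jar suggests the session is being scored."""
    return {'_abck', 'bm_sz'}.issubset(set(cookie_names))

# Usage with curl-cffi (requires network; the URL is a placeholder):
# from curl_cffi.requests import Session
# session = Session(impersonate='chrome124', timeout=30)
# session.get('https://www.target-akamai-site.com/')
# print(has_akamai_cookies(session.cookies.keys()))
# Reuse this same session for follow-up pages so the cookies persist.
```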

2. Residential Proxy Rotation

For pages where sensor.js fires and scores you as a bot, you need:

  • Residential proxies (not data center — Akamai flags DC IPs aggressively)
  • Fresh IP per request or per session, not per page batch
  • Geographic consistency — rotating through IPs in the same country as your target site's expected audience

import random
import time
from curl_cffi.requests import Session

# Example proxy list (replace with your provider's credentials)
PROXIES = [
    'http://user1:pass1@residential-proxy-1.example.com:8080',
    'http://user2:pass2@residential-proxy-2.example.com:8080',
    'http://user3:pass3@residential-proxy-3.example.com:8080',
]

def get_fresh_session():
    proxy = random.choice(PROXIES)
    session = Session(
        impersonate='chrome124',
        proxies={'http': proxy, 'https': proxy},
        timeout=30,
    )
    return session

def scrape_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        session = get_fresh_session()
        resp = session.get(url)

        if resp.status_code == 200 and 'captcha' not in resp.text.lower():
            return resp

        # Detected — back off briefly; the next iteration picks a fresh proxy
        time.sleep(1)
    return None

Provider options (no affiliation): Bright Data, Oxylabs, SmartProxy. Expect to pay $15-30/GB for residential traffic. A single 5-page scrape run might use 10-50MB depending on the site.
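Those numbers make budgeting straightforward. A back-of-envelope helper — the per-page sizes and $/GB rates here are illustrative, so plug in your provider's actual pricing:

```python
def estimate_run_cost(pages, mb_per_page, usd_per_gb):
    """Rough residential-proxy cost for a scrape run, in USD."""
    gb = pages * mb_per_page / 1024
    return round(gb * usd_per_gb, 2)

# 1,000 pages at ~10 MB each, $20/GB:
print(estimate_run_cost(1000, 10, 20))  # 195.31
```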

3. The Selenium/Playwright Fallback (When curl-cffi Isn't Enough)

If curl-cffi keeps getting flagged even with fresh proxies, the site is running sensor.js with active behavioral checks. In that case, you need a real browser — but configured properly:

import time

from selenium import webdriver
from selenium_stealth import stealth
from selenium.webdriver.chrome.options import Options

def create_stealth_driver(proxy=None):
    options = Options()
    options.add_argument('--headless=new')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--disable-blink-features=AutomationControlled')
    options.add_argument('--disable-gpu')
    options.add_argument('--window-size=1920,1080')

    if proxy:
        options.add_argument(f'--proxy-server={proxy}')

    driver = webdriver.Chrome(options=options)

    stealth(driver,
        languages=['en-US', 'en'],
        vendor='Google Inc.',
        webgl_vendor='Intel Inc.',
        renderer='Intel Iris OpenGL Engine',
        fix_hairline=True,
    )
    return driver

# Usage
driver = create_stealth_driver('http://user:pass@proxy:8080')
driver.get('https://www.target-akamai-site.com/')
time.sleep(3)  # Let sensor.js run
html = driver.page_source
driver.quit()

The selenium-stealth package patches the most common navigator.webdriver and automation flags, but detection rates vary by Akamai configuration. In practice: Selenium as a fallback gets you through maybe 60-70% of the time where curl-cffi fails, at 10-20x the latency cost.
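A common way to wire the two tiers together is a fallback wrapper: try curl-cffi first, and only pay Selenium's latency when the response looks like a block page. The callables here are hypothetical stand-ins for your own curl-cffi and Selenium wrappers:

```python
def fetch_with_fallback(url, proxy, fast_fetch, slow_fetch, looks_blocked):
    """Try the fast HTTP client first; escalate to a real browser only
    when the response looks like a block/CAPTCHA page."""
    html = fast_fetch(url, proxy)
    if html is not None and not looks_blocked(html):
        return html, 'curl-cffi'
    # Escalate: 10-20x slower, but runs sensor.js like a real visitor
    return slow_fetch(url, proxy), 'selenium'

# Wire in your own wrappers, e.g. fast_fetch built on scrape_with_retry
# and slow_fetch on create_stealth_driver from the sections above.
```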


What Doesn't Work

Method                           Detection Rate   Notes
requests + User-Agent            ~100%            Dead in seconds
Selenium (vanilla)               ~80%             navigator.webdriver=true flags immediately
Playwright (default)             ~60-70%          Better, but still detectable at volume
curl-cffi + datacenter proxy     ~50-70%          TLS passes but IP is flagged
curl-cffi + residential proxy    ~10-20%          Works on most sites without JS challenges

Handling Dynamic Content and JavaScript Challenges

Many Akamai-protected pages render critical content after initial page load via JavaScript. For these:

  1. Check if the data is available in the initial HTML — look at the raw response before assuming JS is needed
  2. If JS is required, use Selenium with stealth + residential proxy
  3. Consider a headless browser API (e.g., Apify actors built for this) instead of running your own — they handle the browser infrastructure and proxy rotation at scale
# Quick check: is the data you need already in the initial HTML?
# (Nearly every page contains <script> tags, so test for your actual
# target data rather than for the presence of scripts.)
session = create_browser_session()
resp = session.get(url)
if 'expected-data-marker' in resp.text:  # e.g. a product name you know is on the page
    print("Static page — curl-cffi sufficient")
else:
    print("Dynamic content — Selenium recommended")

Complete Working Example

import random
import time
import json
from curl_cffi.requests import Session
from bs4 import BeautifulSoup

PROXIES = [
    'http://user:pass@residential-proxy-1.example.com:8080',
    'http://user:pass@residential-proxy-2.example.com:8080',
]

IMPERSONATE = 'chrome124'

def scrape_akamai_page(url, use_proxy=True):
    """Scrape an Akamai-protected page with automatic proxy rotation."""
    # Pick one proxy per call so http and https traffic exit via the same IP
    proxy = random.choice(PROXIES) if use_proxy else None
    session = Session(
        impersonate=IMPERSONATE,
        proxies={'http': proxy, 'https': proxy} if proxy else None,
        timeout=30,
        headers={
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept-Encoding': 'gzip, deflate, br',
        }
    )

    resp = session.get(url)

    # Check for bot detection
    if resp.status_code == 200:
        soup = BeautifulSoup(resp.text, 'html.parser')
        title = soup.title.string if soup.title else ''
        if any(kw in resp.text.lower() for kw in ['captcha', 'blocked', 'access denied', 'security check']):
            return {'success': False, 'error': 'bot_detected', 'status': resp.status_code}
        return {'success': True, 'status': resp.status_code, 'title': title, 'content': resp.text[:500]}

    return {'success': False, 'error': f'http_{resp.status_code}'}

# Test against a known Akamai site
result = scrape_akamai_page('https://www.example-akamai-site.com/')
print(json.dumps(result, indent=2))

Key Takeaways

  1. Start with curl-cffi — it's 10-50x faster than Selenium and passes TLS checks that requests cannot
  2. Residential proxies are mandatory for serious Akamai scraping — datacenter IPs get flagged quickly
  3. Selenium is a fallback, not a primary tool — use it when curl-cffi gets blocked after multiple proxy rotations
  4. Monitor your bot score — if you're hitting CAPTCHAs on 2+ consecutive requests, rotate your proxy and add delays
  5. Consider managed solutions if the cat-and-mouse game costs more than your time is worth
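Takeaway 4 is easy to automate. A minimal monitor sketch — the threshold and rotation policy are yours to tune:

```python
class BotScoreMonitor:
    """Track consecutive detections and signal when to rotate + back off."""
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.consecutive_hits = 0

    def record(self, detected):
        # A clean response resets the streak
        self.consecutive_hits = self.consecutive_hits + 1 if detected else 0

    def should_rotate(self):
        return self.consecutive_hits >= self.threshold

# Call monitor.record('captcha' in resp.text.lower()) after each request;
# when monitor.should_rotate() is True, switch proxy and add a delay.
```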

The arms race is continuous. Akamai updates sensor.js regularly. Test against your specific target — what works on one site may need tuning on another.


Related Tools

Pre-built actors for this use case:

n8n AI Automation Pack ($39) — 5 production-ready workflows

Skip the setup

Pre-built scrapers with Akamai bypass built in:

Apify Scrapers Bundle — $29 one-time

35+ actors, instant download. Handle anti-bot automatically.
