Alex Chen

Posted on Mar 23

hCaptcha vs hCaptcha Enterprise: What Changes for Your Scraper

If you've scraped enough sites, you've seen hCaptcha pop up — the "select all images with bicycles" challenge that replaced reCAPTCHA on many platforms. But not all hCaptcha implementations are the same.

Let's break down standard vs Enterprise, what it means for your automation, and how to handle both.

How hCaptcha Works (The Basics)

hCaptcha serves two purposes:

Bot detection — fingerprinting, behavioral analysis
Data labeling — those image challenges actually train ML models

When a page loads hCaptcha, it:

Loads a JS script from hcaptcha.com
Renders an iframe inside the element that carries data-sitekey
Collects browser signals (mouse movement, screen size, WebGL hash)
Decides: show a challenge or pass silently

<div class="h-captcha" data-sitekey="abc123-site-key"></div>
<script src="https://js.hcaptcha.com/1/api.js" async defer></script>

Standard vs Enterprise: Key Differences

Feature	Standard	Enterprise
Challenge type	Image selection	Image + risk scoring
Passive mode	Limited	Full (invisible)
Difficulty	Fixed	Adaptive per-session
Typical solve time	~3s	~5-8s (more signals)
rqdata field	Not used	Required for some sites

The biggest difference: Enterprise uses adaptive difficulty. If your session looks suspicious (headless browser, datacenter IP, no mouse movement), the challenges get harder — more rounds, obscure image categories.

Detecting Which Version You're Facing

Check the page source for the widget and the script URL it loads:

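import requests
from bs4 import BeautifulSoup

target_url = "https://example.com/login"  # placeholder: the page you are checking

resp = requests.get(target_url, timeout=10)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Find the hCaptcha container (only visible here if it is in the static HTML)
hcaptcha_div = soup.find("div", class_="h-captcha")
if hcaptcha_div:
    sitekey = hcaptcha_div.get("data-sitekey")

    # Heuristic: look for "enterprise" in the hCaptcha script URL
    script_tags = soup.find_all("script", src=True)
    is_enterprise = any(
        "enterprise" in s["src"]
        for s in script_tags
        if "hcaptcha" in s["src"]
    )

    print(f"Sitekey: {sitekey}")
    print(f"Enterprise: {is_enterprise}")
else:
    print("No hCaptcha widget found in the static HTML")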

Enterprise sites often load from js.hcaptcha.com/1/api.js?endpoint=enterprise or include additional config in the page source.
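The same check can be tried offline against a saved page. Here is a minimal sketch using only the standard library; `HCaptchaScanner` and `detect_hcaptcha` are hypothetical names, and the Enterprise flag is the same URL heuristic described above, so treat it as a best guess rather than proof.

```python
from html.parser import HTMLParser


class HCaptchaScanner(HTMLParser):
    """Collects the sitekey and hCaptcha script URLs from raw HTML."""

    def __init__(self):
        super().__init__()
        self.sitekey = None
        self.script_urls = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Valueless attributes such as "async" arrive as None, hence the "or".
        classes = (attr.get("class") or "").split()
        if "h-captcha" in classes and attr.get("data-sitekey"):
            self.sitekey = attr["data-sitekey"]
        src = attr.get("src") or ""
        if tag == "script" and "hcaptcha" in src:
            self.script_urls.append(src)


def detect_hcaptcha(html: str) -> dict:
    """Return the sitekey and a heuristic Enterprise flag for a page."""
    scanner = HCaptchaScanner()
    scanner.feed(html)
    return {
        "sitekey": scanner.sitekey,
        # Heuristic only: "enterprise" appears in an hCaptcha script URL.
        "enterprise": any("enterprise" in url for url in scanner.script_urls),
    }


sample = (
    '<div class="h-captcha" data-sitekey="abc123-site-key"></div>'
    '<script src="https://js.hcaptcha.com/1/api.js?endpoint=enterprise"></script>'
)
print(detect_hcaptcha(sample))
```

Keeping the detection in a pure function makes it easy to run against HTML captured from a real browser session, which matters when the widget is injected by JavaScript and never shows up in a plain HTTP response.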

Solving Standard hCaptcha

For standard hCaptcha, the flow is straightforward:

import time

import httpx

def solve_hcaptcha(sitekey: str, page_url: str) -> str:
    # The context manager closes the connection pool when we are done
    with httpx.Client(base_url="https://www.passxapi.com") as client:
        # Step 1: Submit the task
        resp = client.post("/api/v1/task", json={
            "type": "hcaptcha",
            "sitekey": sitekey,
            "pageurl": page_url
        })
        resp.raise_for_status()
        task_id = resp.json()["task_id"]

        # Step 2: Poll every 2 seconds, for up to about 2 minutes
        for _ in range(60):
            result = client.get(f"/api/v1/task/{task_id}").json()
            if result["status"] == "completed":
                return result["token"]
            time.sleep(2)

    raise TimeoutError("Solve took too long")
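The submit-then-poll loop is worth pulling out so it can be tested without touching the network. A minimal sketch, assuming the "completed" status and "token" field from the snippet above; `poll_until_complete` is a hypothetical helper, and the "pending" and "failed" status values are assumptions, not something the API is confirmed to return.

```python
import time
from typing import Callable


def poll_until_complete(
    fetch_status: Callable[[], dict],
    max_attempts: int = 60,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it reports a completed task, then return the token."""
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "completed":
            return result["token"]
        if status == "failed":  # assumed status value, stop early instead of waiting
            raise RuntimeError(f"Solve failed: {result}")
        sleep(delay)
    raise TimeoutError(f"No result after {max_attempts} attempts")


# Usage with a fake status source, so the example runs without network access.
responses = iter([
    {"status": "pending"},  # assumed in-progress value
    {"status": "pending"},
    {"status": "completed", "token": "demo-token"},
])
token = poll_until_complete(lambda: next(responses), sleep=lambda _: None)
print(token)
```

Injecting both the fetch function and the sleep function keeps the loop independent of any HTTP client, so the same helper can wrap an httpx call in production and a list of canned responses in tests.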

Handling Enterprise hCaptcha

Enterprise requires extra parameters. The critical one is rqdata, a site-specific payload sent along with the challenge request. On sites that enforce it, a token solved without the matching rqdata gets rejected; on others it is optional.

import time
from typing import Optional

import httpx

def solve_hcaptcha_enterprise(
    sitekey: str,
    page_url: str,
    rqdata: Optional[str] = None
) -> str:
    payload = {
        "type": "hcaptcha",
        "sitekey": sitekey,
        "pageurl": page_url,
        "enterprise": True
    }

    # rqdata is optional but improves success rate
    if rqdata:
        payload["rqdata"] = rqdata

    with httpx.Client(base_url="https://www.passxapi.com") as client:
        resp = client.post("/api/v1/task", json=payload)
        resp.raise_for_status()
        task_id = resp.json()["task_id"]

        for _ in range(90):  # Enterprise takes longer: up to about 3 minutes
            result = client.get(f"/api/v1/task/{task_id}").json()
            if result["status"] == "completed":
                return result["token"]
            time.sleep(2)

    raise TimeoutError("Enterprise solve timeout")
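Since the standard and Enterprise request bodies differ only in the enterprise flag and rqdata, one option is a single payload builder. A minimal sketch, assuming the field names used in the snippets above; `build_task_payload` is a hypothetical helper, not part of any published client.

```python
from typing import Optional


def build_task_payload(
    sitekey: str,
    page_url: str,
    enterprise: bool = False,
    rqdata: Optional[str] = None,
) -> dict:
    """Build the task request body for either hCaptcha variant."""
    payload = {"type": "hcaptcha", "sitekey": sitekey, "pageurl": page_url}
    if enterprise:
        payload["enterprise"] = True
        # Only attach rqdata when the caller actually captured one.
        if rqdata:
            payload["rqdata"] = rqdata
    return payload


print(build_task_payload("abc123-site-key", "https://example.com/login"))
print(build_task_payload(
    "abc123-site-key", "https://example.com/login",
    enterprise=True, rqdata="captured-rqdata",
))
```

In this sketch rqdata is ignored for standard tasks, which matches the comparison table above where the standard version does not use it.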

Extracting rqdata

Some sites embed rqdata in their JavaScript. You can intercept it:

from typing import Optional
from urllib.parse import urlparse, parse_qs

from playwright.sync_api import sync_playwright

def extract_rqdata(url: str) -> Optional[str]:
    """Return the rqdata value seen in hCaptcha requests, or None if none appeared."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        rqdata = None

        def handle_request(request):
            nonlocal rqdata
            if "hcaptcha.com/checksiteconfig" in request.url:
                # Assumes rqdata is sent as a query parameter on this request
                params = parse_qs(urlparse(request.url).query)
                if "rqdata" in params:
                    rqdata = params["rqdata"][0]

        page.on("request", handle_request)
        page.goto(url)
        page.wait_for_timeout(5000)  # give the widget time to fire its requests
        browser.close()

        return rqdata
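The URL parsing step can be checked on its own before wiring it into a browser. A small sketch with a hypothetical helper, `rqdata_from_url`; the example URL and its parameter values are made up for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def rqdata_from_url(url: str) -> Optional[str]:
    """Return the decoded rqdata query parameter, or None if it is absent."""
    values = parse_qs(urlparse(url).query).get("rqdata")
    return values[0] if values else None


# parse_qs percent-decodes values, so %2B comes back as a literal "+".
example = "https://hcaptcha.com/checksiteconfig?sitekey=abc&rqdata=payload%2B123"
print(rqdata_from_url(example))
print(rqdata_from_url("https://hcaptcha.com/checksiteconfig?sitekey=abc"))
```

Note that `parse_qs` returns decoded values, so if a downstream API expects the raw percent-encoded string you would need to keep the original query text instead.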

Injecting the Solved Token

Once you have the token, inject it the same way for both versions:

from playwright.sync_api import Page

def inject_hcaptcha_token(page: Page, token: str):
    # Pass the token as an argument instead of interpolating it into the
    # script, so quotes or backslashes in it cannot break the JavaScript.
    page.evaluate("""(token) => {
        // Set the response textarea
        const textarea = document.querySelector(
            'textarea[name="h-captcha-response"]'
        );
        if (textarea) textarea.value = token;

        // If the site registered a success callback via data-callback,
        // call it with the token so the page reacts as it would after a solve
        const widget = document.querySelector('.h-captcha');
        const callbackName = widget && widget.getAttribute('data-callback');
        if (callbackName && typeof window[callbackName] === 'function') {
            window[callbackName](token);
        }
    }""", token)
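A token should never be pasted raw into JavaScript source, because a stray quote or backslash would break the script. When a script has to be built as a string, JSON-encoding the token is one way to stay safe. A minimal sketch; `build_injection_script` is a hypothetical helper, and it only covers the textarea step.

```python
import json


def build_injection_script(token: str) -> str:
    """Return JavaScript source that writes the token into the response textarea."""
    # json.dumps produces a quoted, escaped literal that is also valid JavaScript.
    literal = json.dumps(token)
    return (
        "(() => {"
        " const t = document.querySelector('textarea[name=\"h-captcha-response\"]');"
        f" if (t) t.value = {literal};"
        " })()"
    )


# A token containing quotes and a backslash stays inside its string literal.
print(build_injection_script("abc'def\"ghi\\jkl"))
```

The returned string is a self-invoking function expression, so it can be handed to anything that accepts JavaScript source.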

Real-World Pattern: Login Form with hCaptcha

Putting it all together for a login page:

import httpx
from playwright.sync_api import sync_playwright

def login_with_hcaptcha(url, username, password):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(url)

        # Fill credentials
        page.fill('input[name="username"]', username)
        page.fill('input[name="password"]', password)

        # Extract sitekey
        sitekey = page.get_attribute(
            ".h-captcha", "data-sitekey"
        )

        # Solve via API
        token = solve_hcaptcha(sitekey, url)

        # Inject and submit
        inject_hcaptcha_token(page, token)
        page.click('button[type="submit"]')

        page.wait_for_url("**/dashboard**", timeout=10000)
        print("Login successful!")
        browser.close()

Tips for Better Success Rates

Use residential proxies: datacenter IPs get harder challenges
Send a real User-Agent: it should match the browser you claim to be
Include rqdata for Enterprise: even when it is not strictly required, it helps
Don't reuse tokens: each token is single-use and expires in ~120 seconds
Monitor solve times: if the average goes above 10s, your IP might be flagged
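The last two tips are easy to enforce in code. Here is a rough sketch with two hypothetical helpers: `SolvedToken` refuses reuse and expiry, and `SolveTimeMonitor` keeps a rolling average of solve durations. The 120-second lifetime and 10-second threshold are the approximate figures from the tips, not guaranteed values.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SolvedToken:
    """A token that can be taken once, and only within its lifetime."""
    value: str
    created_at: float
    ttl: float = 120.0  # approximate lifetime from the tips above
    used: bool = False

    def take(self, now: float) -> str:
        if self.used:
            raise ValueError("token already used")
        if now - self.created_at >= self.ttl:
            raise ValueError("token expired")
        self.used = True
        return self.value


class SolveTimeMonitor:
    """Rolling average of the most recent solve durations, in seconds."""

    def __init__(self, window: int = 20, threshold: float = 10.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def record(self, seconds: float) -> None:
        self.samples.append(seconds)

    def looks_flagged(self) -> bool:
        if not self.samples:
            return False
        return sum(self.samples) / len(self.samples) > self.threshold


token = SolvedToken("demo-token", created_at=0.0)
print(token.take(now=30.0))  # within the lifetime, first use

monitor = SolveTimeMonitor()
for duration in (12.0, 11.5, 13.0):
    monitor.record(duration)
print(monitor.looks_flagged())
```

Timestamps are passed in explicitly so the behaviour is easy to test; in a real scraper you would likely feed both classes from `time.monotonic()`.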

Wrapping Up

hCaptcha (both standard and Enterprise) is solvable with the right approach. The key differences are adaptive difficulty and the rqdata parameter.

For a Python client that handles both versions, check out passxapi-python — it abstracts away the polling and parameter differences so you can focus on your actual scraping logic.

Have questions about handling specific hCaptcha implementations? Drop a comment below.

DEV Community