DEV Community

Alex Chen

hCaptcha vs hCaptcha Enterprise: What Changes for Your Scraper

If you've scraped enough sites, you've seen hCaptcha pop up — the "select all images with bicycles" challenge that replaced reCAPTCHA on many platforms. But not all hCaptcha implementations are the same.

Let's break down standard vs Enterprise, what it means for your automation, and how to handle both.

How hCaptcha Works (The Basics)

hCaptcha serves two purposes:

  1. Bot detection — fingerprinting, behavioral analysis
  2. Data labeling — those image challenges actually train ML models

When a page loads hCaptcha, it:

  1. Loads api.js from js.hcaptcha.com
  2. Renders the challenge iframe inside the element that carries data-sitekey
  3. Collects browser signals (mouse movement, screen size, WebGL hash)
  4. Decides whether to show a challenge or let the visitor pass silently

A typical embed looks like this:
<div class="h-captcha" data-sitekey="abc123-site-key"></div>
<script src="https://js.hcaptcha.com/1/api.js" async defer></script>

Standard vs Enterprise: Key Differences

| Feature | Standard | Enterprise |
|---|---|---|
| Challenge type | Image selection | Image + risk scoring |
| Passive mode | Limited | Full (invisible) |
| Difficulty | Fixed | Adaptive per session |
| Typical solve time | ~3 s | ~5-8 s (more signals) |
| rqdata field | Not used | Required by some sites |

The biggest difference is adaptive difficulty. If your session looks suspicious (headless browser, datacenter IP, no mouse movement), Enterprise makes the challenges harder: more rounds and more obscure image categories.

Detecting Which Version You're Facing

Check the page source for the widget container and the hCaptcha script URL it loads:

import requests
from bs4 import BeautifulSoup

target_url = "https://example.com/login"  # placeholder: the page you are scraping

resp = requests.get(target_url, timeout=15)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Find the hCaptcha container element
hcaptcha_div = soup.find("div", class_="h-captcha")
if hcaptcha_div:
    sitekey = hcaptcha_div.get("data-sitekey")

    # Enterprise heuristic: an hCaptcha script URL that mentions "enterprise"
    script_tags = soup.find_all("script", src=True)
    is_enterprise = any(
        "enterprise" in s["src"]
        for s in script_tags
        if "hcaptcha" in s["src"]
    )

    print(f"Sitekey: {sitekey}")
    print(f"Enterprise: {is_enterprise}")
else:
    # The widget may be injected later by JavaScript, which static HTML won't show
    print("No static hCaptcha container found")

Enterprise sites often load from js.hcaptcha.com/1/api.js?endpoint=enterprise or include additional config in the page source.
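The substring check above also matches any script URL that merely contains "hcaptcha" somewhere. A slightly stricter version of the same heuristic, sketched below, only trusts scripts served from hcaptcha.com (or a subdomain) and then looks for "enterprise" in the path or query string. It is still only a heuristic built on the `endpoint=enterprise` pattern mentioned above, and `looks_enterprise` is a hypothetical helper, not part of any library.

```python
from urllib.parse import parse_qs, urlparse


def looks_enterprise(script_src: str) -> bool:
    """Heuristic: does an hCaptcha script URL hint at an Enterprise setup?"""
    parsed = urlparse(script_src)
    host = parsed.hostname or ""
    # Only consider scripts served from hcaptcha.com or one of its subdomains
    if host != "hcaptcha.com" and not host.endswith(".hcaptcha.com"):
        return False
    # Look for "enterprise" in the path or in any query-string value
    query_values = [v for vals in parse_qs(parsed.query).values() for v in vals]
    return "enterprise" in parsed.path.lower() or any(
        "enterprise" in v.lower() for v in query_values
    )


print(looks_enterprise("https://js.hcaptcha.com/1/api.js?endpoint=enterprise"))  # True
print(looks_enterprise("https://js.hcaptcha.com/1/api.js"))  # False
print(looks_enterprise("https://evil.example/hcaptcha.com/enterprise.js"))  # False
```

In the BeautifulSoup snippet, `any(looks_enterprise(s["src"]) for s in script_tags)` would replace the inline generator.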

Solving Standard hCaptcha

For standard hCaptcha the flow is straightforward: submit a task with the sitekey and page URL, then poll until the service returns a token:

import time

import httpx


def solve_hcaptcha(sitekey: str, page_url: str) -> str:
    # Request shape as used in this article; authentication is not shown,
    # so check the service's documentation for how to pass your API key
    with httpx.Client(base_url="https://www.passxapi.com", timeout=30) as client:
        # Step 1: Submit the task
        resp = client.post("/api/v1/task", json={
            "type": "hcaptcha",
            "sitekey": sitekey,
            "pageurl": page_url,
        })
        resp.raise_for_status()
        task_id = resp.json()["task_id"]

        # Step 2: Poll for the result (60 polls x 2 s, roughly 2 minutes)
        for _ in range(60):
            result = client.get(f"/api/v1/task/{task_id}").json()
            if result["status"] == "completed":
                return result["token"]
            time.sleep(2)

    raise TimeoutError("Solve took too long")
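The solver above uses a fixed attempt count (60 polls, 2 s apart). A deadline-based loop is easier to reason about when solve times vary, and keeping the network call out of the loop makes it testable offline. This is a sketch: `poll_until` is a hypothetical helper, and the `"completed"` status simply mirrors the responses used in this article.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 2.0,
    timeout: float = 120.0,
) -> T:
    """Call fetch() until is_done(result) is true or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        # Stop before sleeping past the deadline
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"not done after {timeout} s")
        time.sleep(interval)


# Usage with a fake fetch that completes on the third call
responses = iter([
    {"status": "processing"},
    {"status": "processing"},
    {"status": "completed", "token": "P1_example"},
])
done = poll_until(lambda: next(responses),
                  lambda r: r["status"] == "completed",
                  interval=0.01, timeout=5)
print(done["token"])  # P1_example
```

In `solve_hcaptcha`, `fetch` would wrap `client.get(f"/api/v1/task/{task_id}").json()`, and the timeout becomes a single number rather than attempts times sleep.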

Handling Enterprise hCaptcha

Enterprise requires extra parameters. The critical one is rqdata, a site-specific payload that the page passes to the hCaptcha widget. On sites that use it, a token solved without the matching rqdata gets rejected.

import time
from typing import Optional

import httpx


def solve_hcaptcha_enterprise(
    sitekey: str,
    page_url: str,
    rqdata: Optional[str] = None,
) -> str:
    with httpx.Client(base_url="https://www.passxapi.com", timeout=30) as client:
        payload = {
            "type": "hcaptcha",
            "sitekey": sitekey,
            "pageurl": page_url,
            "enterprise": True,
        }

        # Send rqdata whenever the site uses it; tokens solved without it are rejected there
        if rqdata:
            payload["rqdata"] = rqdata

        resp = client.post("/api/v1/task", json=payload)
        resp.raise_for_status()
        task_id = resp.json()["task_id"]

        for _ in range(90):  # Enterprise takes longer: up to ~3 minutes
            result = client.get(f"/api/v1/task/{task_id}").json()
            if result["status"] == "completed":
                return result["token"]
            time.sleep(2)

    raise TimeoutError("Enterprise solve timeout")
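The standard and Enterprise solvers differ only in the task body, so one option is to build that body in a single place. The field names (`type`, `sitekey`, `pageurl`, `enterprise`, `rqdata`) are the ones this article sends; `build_task_payload` is a hypothetical helper, and rejecting `rqdata` on standard tasks is a design choice based on the comparison table, not a documented rule of the service.

```python
from typing import Any, Dict, Optional


def build_task_payload(
    sitekey: str,
    page_url: str,
    enterprise: bool = False,
    rqdata: Optional[str] = None,
) -> Dict[str, Any]:
    """Build the task body used in this article for either hCaptcha version."""
    if rqdata and not enterprise:
        # The comparison table lists rqdata as unused for standard hCaptcha
        raise ValueError("rqdata is only meaningful for Enterprise tasks")
    payload: Dict[str, Any] = {
        "type": "hcaptcha",
        "sitekey": sitekey,
        "pageurl": page_url,
    }
    if enterprise:
        payload["enterprise"] = True
    if rqdata:
        payload["rqdata"] = rqdata
    return payload


print(build_task_payload("abc123-site-key", "https://example.com/login"))
print(build_task_payload("abc123-site-key", "https://example.com/login",
                         enterprise=True, rqdata="example-rqdata"))
```

With this in place, both solvers can share one submit-and-poll function that takes the payload as an argument.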

Extracting rqdata

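Some sites hand rqdata to the widget from their own JavaScript, so you can capture it by watching the widget's network traffic in a real browser. The snippet below assumes it shows up as a query parameter on the checksiteconfig request; confirm that in DevTools for the site you are targeting: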

from typing import Optional
from urllib.parse import parse_qs, urlparse

from playwright.sync_api import Request, sync_playwright


def extract_rqdata(url: str) -> Optional[str]:
    rqdata: Optional[str] = None

    def handle_request(request: Request) -> None:
        nonlocal rqdata
        if "hcaptcha.com/checksiteconfig" in request.url:
            # Assumes rqdata is sent as a query parameter on this request
            params = parse_qs(urlparse(request.url).query)
            if "rqdata" in params:
                rqdata = params["rqdata"][0]

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.on("request", handle_request)
            page.goto(url)
            # Give the widget time to load and contact hcaptcha.com
            page.wait_for_timeout(5000)
        finally:
            browser.close()

    return rqdata  # None if nothing was captured
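Pulling the URL parsing out of the request handler lets you test it without launching a browser. This sketch keeps the article's assumption that rqdata travels as a query parameter on checksiteconfig, and additionally checks that the host really is hcaptcha.com rather than any URL containing that text; `rqdata_from_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def rqdata_from_url(url: str) -> Optional[str]:
    """Return the rqdata query parameter of an hCaptcha checksiteconfig URL, if any."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    is_hcaptcha = host == "hcaptcha.com" or host.endswith(".hcaptcha.com")
    # Ignore anything that is not a checksiteconfig call on an hCaptcha host
    if not is_hcaptcha or not parsed.path.endswith("/checksiteconfig"):
        return None
    values = parse_qs(parsed.query).get("rqdata")
    return values[0] if values else None


print(rqdata_from_url("https://api.hcaptcha.com/checksiteconfig?v=1&rqdata=abc"))  # abc
print(rqdata_from_url("https://example.com/checksiteconfig?rqdata=abc"))  # None
```

Inside `handle_request`, the body then reduces to `rqdata = rqdata_from_url(request.url) or rqdata`.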

Injecting the Solved Token

Once you have the token, inject it the same way for both versions:

from playwright.sync_api import Page


def inject_hcaptcha_token(page: Page, token: str) -> None:
    # Pass the token as an argument instead of formatting it into the
    # script, so quotes or backslashes in it cannot break the JavaScript
    page.evaluate(
        """(token) => {
            // hCaptcha writes the token to h-captcha-response and mirrors it
            // to g-recaptcha-response for reCAPTCHA compatibility
            for (const name of ['h-captcha-response', 'g-recaptcha-response']) {
                document
                    .querySelectorAll(`textarea[name="${name}"]`)
                    .forEach((el) => { el.value = token; });
            }

            // If the site registered a success callback via data-callback,
            // call it the way the widget would after a successful solve
            const container = document.querySelector('.h-captcha[data-callback]');
            const cbName = container && container.getAttribute('data-callback');
            if (cbName && typeof window[cbName] === 'function') {
                window[cbName](token);
            }
        }""",
        token,
    )

Real-World Pattern: Login Form with hCaptcha

Putting it all together for a login page:

from playwright.sync_api import sync_playwright

# Reuses solve_hcaptcha() and inject_hcaptcha_token() from the sections above


def login_with_hcaptcha(url: str, username: str, password: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        try:
            page = browser.new_page()
            page.goto(url)

            # Fill credentials
            page.fill('input[name="username"]', username)
            page.fill('input[name="password"]', password)

            # Extract the sitekey once the widget container is in the DOM
            page.wait_for_selector(".h-captcha", state="attached")
            sitekey = page.get_attribute(".h-captcha", "data-sitekey")
            if not sitekey:
                raise RuntimeError("No data-sitekey found on .h-captcha")

            # Solve via API
            token = solve_hcaptcha(sitekey, url)

            # Inject and submit promptly: tokens are single-use and short-lived
            inject_hcaptcha_token(page, token)
            page.click('button[type="submit"]')

            page.wait_for_url("**/dashboard**", timeout=10000)
            print("Login successful!")
        finally:
            browser.close()

Tips for Better Success Rates

  1. Use residential proxies — datacenter IPs get harder challenges
  2. Send real User-Agent — match the browser you claim to be
  3. Include rqdata for Enterprise — even if not strictly required, it helps
  4. Don't reuse tokens — each token is single-use and expires in ~120 seconds
  5. Monitor solve times — if average goes above 10s, your IP might be flagged
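Tip 4 is easy to violate when solving and submitting happen in different parts of a scraper. A small wrapper that records when the token was obtained and refuses reuse makes the mistake loud instead of silent. This is a sketch: `SolvedToken` is a hypothetical helper, the 120 s lifetime is the approximate figure from the tip above, and the 15 s safety margin is an arbitrary assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

TOKEN_LIFETIME_S = 120.0  # approximate lifetime quoted in the tips above


@dataclass
class SolvedToken:
    value: str
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    obtained_at: float = field(default=0.0, init=False)
    used: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        # Record when the token arrived from the solver
        self.obtained_at = self.clock()

    def take(self, safety_margin: float = 15.0) -> str:
        """Return the token once, refusing if it is reused or close to expiry."""
        if self.used:
            raise RuntimeError("token already used; solve a new one")
        age = self.clock() - self.obtained_at
        if age > TOKEN_LIFETIME_S - safety_margin:
            raise RuntimeError(f"token is {age:.0f} s old; solve a new one")
        self.used = True
        return self.value


# Usage with a fake clock so the example runs instantly
fake_now = [0.0]
token = SolvedToken("P1_example", clock=lambda: fake_now[0])
fake_now[0] = 30.0  # 30 s after solving
print(token.take())  # P1_example
```

In `login_with_hcaptcha`, you would wrap the solver's return value in `SolvedToken` and call `take()` right before injecting it.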

Wrapping Up

hCaptcha (both standard and Enterprise) is solvable with the right approach. The differences that matter for a scraper are Enterprise's adaptive difficulty and the rqdata parameter.

For a Python client that handles both versions, check out passxapi-python — it abstracts away the polling and parameter differences so you can focus on your actual scraping logic.


Have questions about handling specific hCaptcha implementations? Drop a comment below.
