If you've scraped enough sites, you've seen hCaptcha pop up — the "select all images with bicycles" challenge that replaced reCAPTCHA on many platforms. But not all hCaptcha implementations are the same.
Let's break down standard vs Enterprise, what it means for your automation, and how to handle both.
How hCaptcha Works (The Basics)
hCaptcha serves two purposes:
- Bot detection — fingerprinting, behavioral analysis
- Data labeling — those image challenges actually train ML models
When a page loads hCaptcha, it:
- Loads a JS script from `hcaptcha.com`
- Renders an iframe inside the element that carries `data-sitekey`
- Collects browser signals (mouse movement, screen size, WebGL hash)
- Decides: show a challenge or pass silently
```html
<div class="h-captcha" data-sitekey="abc123-site-key"></div>
<script src="https://js.hcaptcha.com/1/api.js" async defer></script>
```
Standard vs Enterprise: Key Differences
| Feature | Standard | Enterprise |
|---|---|---|
| Challenge type | Image selection | Image + risk scoring |
| Passive mode | Limited | Full (invisible) |
| Difficulty | Fixed | Adaptive per-session |
| Response time | ~3s solve | ~5-8s (more signals) |
| `rqdata` field | Not used | Required for some sites |
The biggest difference: Enterprise uses adaptive difficulty. If your session looks suspicious (headless browser, datacenter IP, no mouse movement), the challenges get harder — more rounds, obscure image categories.
Detecting Which Version You're Facing
Check the page source for the widget and for the URL of the hCaptcha script it loads:
```python
import requests
from bs4 import BeautifulSoup

target_url = "https://example.com/login"  # placeholder: the page you want to inspect

resp = requests.get(target_url, timeout=15)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# Find the hCaptcha widget container (sitekey stays None if there isn't one)
hcaptcha_div = soup.find("div", class_="h-captcha")
sitekey = hcaptcha_div.get("data-sitekey") if hcaptcha_div else None

# Check for enterprise indicators in any hCaptcha script URL
script_tags = soup.find_all("script", src=True)
is_enterprise = any(
    "enterprise" in s["src"]
    for s in script_tags
    if "hcaptcha" in s["src"]
)

print(f"Sitekey: {sitekey}")
print(f"Enterprise: {is_enterprise}")
```
Enterprise sites often load from js.hcaptcha.com/1/api.js?endpoint=enterprise or include additional config in the page source.
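If you'd rather skip BeautifulSoup, or want the sitekey and the Enterprise hint from a single pass, here's a minimal stdlib-only sketch built on `html.parser`. It applies the same heuristic described above (an `endpoint=enterprise` query parameter, or "enterprise" in the script path); that's an assumption carried over from this post, not an official detection method. `HCaptchaScanner`, `looks_enterprise`, and `scan` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class HCaptchaScanner(HTMLParser):
    """Collects hCaptcha sitekeys and hCaptcha script URLs from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.sitekeys: list[str] = []
        self.script_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # valueless attributes like `async` map to None
        classes = (attr.get("class") or "").split()
        if "h-captcha" in classes and attr.get("data-sitekey"):
            self.sitekeys.append(attr["data-sitekey"])
        src = attr.get("src") or ""
        if tag == "script" and "hcaptcha" in src:
            self.script_urls.append(src)


def looks_enterprise(script_url: str) -> bool:
    # Heuristic only: "enterprise" in the path, or ?endpoint=enterprise
    parsed = urlparse(script_url)
    endpoints = parse_qs(parsed.query).get("endpoint", [])
    return "enterprise" in parsed.path or "enterprise" in endpoints


def scan(html: str) -> dict:
    scanner = HCaptchaScanner()
    scanner.feed(html)
    return {
        "sitekeys": scanner.sitekeys,
        "enterprise": any(looks_enterprise(u) for u in scanner.script_urls),
    }
```

Feed it `resp.text` from the snippet above. A `False` result only means none of these hints were found; some Enterprise sites configure the widget entirely from JavaScript.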
Solving Standard hCaptcha
For standard hCaptcha, the flow is straightforward:
```python
import time

import httpx


def solve_hcaptcha(sitekey: str, page_url: str) -> str:
    with httpx.Client(base_url="https://www.passxapi.com", timeout=30) as client:
        # Step 1: Submit the task
        resp = client.post("/api/v1/task", json={
            "type": "hcaptcha",
            "sitekey": sitekey,
            "pageurl": page_url,
        })
        resp.raise_for_status()
        task_id = resp.json()["task_id"]

        # Step 2: Poll for the result (60 attempts x 2s, about 2 minutes)
        for _ in range(60):
            result = client.get(f"/api/v1/task/{task_id}").json()
            if result["status"] == "completed":
                return result["token"]
            time.sleep(2)

    raise TimeoutError("Solve took too long")
```
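The solver functions in this post repeat the same poll loop and differ only in how many attempts they allow. Here's a minimal sketch of factoring that loop out, assuming the same `status` and `token` response fields used above; `poll` is a hypothetical helper, and the injectable `sleep` lets you test it without actually waiting.

```python
import time
from typing import Callable


def poll(
    fetch: Callable[[], dict],
    max_attempts: int = 60,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it reports a completed task, then return its token."""
    for attempt in range(max_attempts):
        result = fetch()
        if result.get("status") == "completed":
            return result["token"]
        # No point sleeping after the final attempt
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"No result after {max_attempts} attempts")
```

Inside either solver, the loop becomes `return poll(lambda: client.get(f"/api/v1/task/{task_id}").json())`, with `max_attempts=90` for Enterprise.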
Handling Enterprise hCaptcha
Enterprise requires extra parameters. The critical one is `rqdata`, a site-specific payload that the page supplies to the hCaptcha script. On sites that use it, a token solved without the matching `rqdata` gets rejected.
```python
import time
from typing import Optional

import httpx


def solve_hcaptcha_enterprise(
    sitekey: str,
    page_url: str,
    rqdata: Optional[str] = None,
) -> str:
    payload = {
        "type": "hcaptcha",
        "sitekey": sitekey,
        "pageurl": page_url,
        "enterprise": True,
    }
    # rqdata is only required on some sites, but send it whenever you have it
    if rqdata:
        payload["rqdata"] = rqdata

    with httpx.Client(base_url="https://www.passxapi.com", timeout=30) as client:
        resp = client.post("/api/v1/task", json=payload)
        resp.raise_for_status()
        task_id = resp.json()["task_id"]

        for _ in range(90):  # Enterprise takes longer: up to about 3 minutes
            result = client.get(f"/api/v1/task/{task_id}").json()
            if result["status"] == "completed":
                return result["token"]
            time.sleep(2)

    raise TimeoutError("Enterprise solve timeout")
```
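The only real difference between the two request bodies is the `enterprise` flag and the optional `rqdata`. A minimal sketch of building both from one function, mirroring the field names used in the two solvers above (they belong to this solving API, not to hCaptcha itself); `build_task_payload` is a hypothetical name.

```python
from typing import Optional


def build_task_payload(
    sitekey: str,
    page_url: str,
    enterprise: bool = False,
    rqdata: Optional[str] = None,
) -> dict:
    """Build the task body for either the standard or the Enterprise solver."""
    payload = {"type": "hcaptcha", "sitekey": sitekey, "pageurl": page_url}
    if enterprise:
        payload["enterprise"] = True
        # Only attach rqdata when the page actually provided one
        if rqdata:
            payload["rqdata"] = rqdata
    return payload
```

Keeping the payload logic in one place means a single solver function can handle both versions, with only the attempt count varying.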
Extracting rqdata
Some sites embed rqdata in their JavaScript. You can intercept it:
```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

from playwright.sync_api import sync_playwright


def extract_rqdata(url: str) -> Optional[str]:
    rqdata = None

    def handle_request(request):
        nonlocal rqdata
        if "hcaptcha.com/checksiteconfig" in request.url:
            # Assumes rqdata is sent as a query parameter on this request
            params = parse_qs(urlparse(request.url).query)
            if "rqdata" in params:
                rqdata = params["rqdata"][0]

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.on("request", handle_request)
            page.goto(url)
            # Give the widget time to load and fire its requests
            page.wait_for_timeout(5000)
        finally:
            browser.close()

    return rqdata
```
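The URL check inside `handle_request` is easier to test on its own, without launching a browser. Here's a minimal sketch that pulls it into a pure function, keeping the same assumption as above that `rqdata` shows up as a query parameter on the `checksiteconfig` request; `rqdata_from_url` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def rqdata_from_url(url: str) -> Optional[str]:
    """Return the rqdata query parameter from an hCaptcha request URL, if any."""
    if "hcaptcha.com/checksiteconfig" not in url:
        return None
    values = parse_qs(urlparse(url).query).get("rqdata")
    return values[0] if values else None
```

`handle_request` then shrinks to `rqdata = rqdata_from_url(request.url) or rqdata`, which also stops a later request without the parameter from overwriting a value you already captured.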
Injecting the Solved Token
Once you have the token, inject it the same way for both versions:
```python
from playwright.sync_api import Page


def inject_hcaptcha_token(page: Page, token: str) -> None:
    # Pass the token as an argument instead of formatting it into the
    # script, so quotes or backslashes in it can't break the JS
    page.evaluate(
        """(token) => {
            // Set the response textarea that the form submits
            const textarea = document.querySelector(
                'textarea[name="h-captcha-response"]'
            );
            if (textarea) textarea.value = token;

            // Also call the site's callback if the widget declares one;
            // data-callback holds the name of a global function
            const widget = document.querySelector('.h-captcha');
            const callbackName = widget && widget.getAttribute('data-callback');
            if (callbackName && typeof window[callbackName] === 'function') {
                window[callbackName](token);
            }
        }""",
        token,
    )
```
Real-World Pattern: Login Form with hCaptcha
Putting it all together for a login page:
```python
from playwright.sync_api import sync_playwright

# Uses solve_hcaptcha() and inject_hcaptcha_token() from the sections above


def login_with_hcaptcha(url: str, username: str, password: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        try:
            page = browser.new_page()
            page.goto(url)

            # Fill credentials
            page.fill('input[name="username"]', username)
            page.fill('input[name="password"]', password)

            # Extract sitekey
            sitekey = page.get_attribute(".h-captcha", "data-sitekey")
            if not sitekey:
                raise RuntimeError("No hCaptcha widget found on the page")

            # Solve via API
            token = solve_hcaptcha(sitekey, url)

            # Inject and submit
            inject_hcaptcha_token(page, token)
            page.click('button[type="submit"]')
            page.wait_for_url("**/dashboard**", timeout=10000)
            print("Login successful!")
        finally:
            browser.close()
```
Tips for Better Success Rates
- Use residential proxies: datacenter IPs get harder challenges
- Send a real User-Agent: match the browser you claim to be
- Include `rqdata` for Enterprise: even if not strictly required, it helps
- Don't reuse tokens: each token is single-use and expires in ~120 seconds
- Monitor solve times: if the average climbs above 10s, your IP might be flagged
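The single-use and ~120-second rules are easy to break by accident, for example when a solve finishes early and the token sits around while the page loads. A minimal sketch of a wrapper that enforces both; the 120 s default is the approximate figure from the tip above, not a documented constant, and `SolvedToken` is a hypothetical name. The injectable `clock` lets you test expiry without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SolvedToken:
    """Wraps a solved token and refuses to hand it out twice or after expiry."""

    value: str
    ttl: float = 120.0  # approximate lifetime, per the tip above
    clock: Callable[[], float] = time.monotonic
    issued_at: float = field(default=0.0, init=False)
    used: bool = field(default=False, init=False)

    def __post_init__(self) -> None:
        # Start the expiry timer when the token is wrapped
        self.issued_at = self.clock()

    def is_valid(self) -> bool:
        return not self.used and self.clock() - self.issued_at < self.ttl

    def take(self) -> str:
        """Return the token once; raise if it's already used or stale."""
        if not self.is_valid():
            raise ValueError("token already used or expired; solve a new one")
        self.used = True
        return self.value
```

Wrap the solver's return value right away (`SolvedToken(solve_hcaptcha(sitekey, url))`) and call `take()` at injection time, so a stale token fails loudly instead of producing a confusing form rejection.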
Wrapping Up
hCaptcha (both standard and Enterprise) is solvable with the right approach. The key differences are adaptive difficulty and the rqdata parameter.
For a Python client that handles both versions, check out passxapi-python — it abstracts away the polling and parameter differences so you can focus on your actual scraping logic.
Have questions about handling specific hCaptcha implementations? Drop a comment below.