Add captcha solving to a Python scraper

#python #scraping #webscraping #captcha

Your requests scraper runs fine until a target drops a reCAPTCHA on the login or search page, and then every response is the challenge instead of the data. This post wires CapBypass into a plain Python scraper: detect the captcha, solve it through the API, inject the token, and retry - no browser.

when your scraper hits a captcha

A captcha shows up as a page that is not your data: a reCAPTCHA widget (g-recaptcha div, a grecaptcha.execute call), an hCaptcha frame, or an AWS WAF challenge. With requests/httpx you cannot run the widget's JavaScript, so the form never gets a valid g-recaptcha-response and the server rejects the submit.

The fix is to get that token from a solving API and submit it yourself.

detecting it

Recognise the challenge deterministically before you waste a submit:

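import re

def needs_recaptcha(html: str) -> bool:
    # Either marker means the page carries a reCAPTCHA widget or script.
    return "g-recaptcha" in html or "grecaptcha.execute" in html

def site_key(html: str) -> str | None:
    # Accept single or double quotes around the attribute value.
    m = re.search(r'data-sitekey=["\']([^"\']+)["\']', html)
    return m.group(1) if m else None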

If needs_recaptcha is true, pull the data-sitekey and solve before submitting.
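
The helpers above treat every match as reCAPTCHA v2. A minimal sketch of a slightly finer classifier follows. The function name challenge_kind and its return labels are illustrative and not part of any SDK, and the string markers are heuristics: for example, invisible reCAPTCHA v2 also calls grecaptcha.execute, so treat the "recaptcha_v3" label as a hint to check the page rather than a verdict.

```python
import re
from typing import Optional


def challenge_kind(html: str) -> Optional[str]:
    """Return a coarse label for the challenge found in html, or None.

    The labels are illustrative; the markers are heuristics, not guarantees.
    """
    # hCaptcha widgets use an "h-captcha" class on the container element.
    if re.search(r'class="[^"]*\bh-captcha\b', html):
        return "hcaptcha"
    # A programmatic grecaptcha.execute(...) call usually means reCAPTCHA v3.
    if "grecaptcha.execute" in html:
        return "recaptcha_v3"
    # The classic checkbox widget is a div with class "g-recaptcha".
    if "g-recaptcha" in html:
        return "recaptcha_v2"
    return None
```

Branching on the label before solving avoids sending a v2 task for a page that needs a different task type.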

solving via capbypass

Send the site URL and key to CapBypass and read the token back. Use the Python SDK:

import os
from capbypass import CapBypass

solver = CapBypass(api_key=os.environ["CAPBYPASS_API_KEY"])

def solve_recaptcha_v2(url: str, key: str) -> str:
    result = solver.solve({
        "type": "ReCaptchaV2Task",
        "websiteURL": url,
        "websiteKey": key,
        "proxy": "host:port:user:pass",  # placeholder: replace with your own proxy
    })
    return result["solution"]["gRecaptchaResponse"]

solve() handles the create-task / poll loop for you and returns once the token is ready.
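
For intuition about what a create-task / poll loop involves, here is a generic sketch. It is not the CapBypass SDK's implementation: create_task and get_result are hypothetical callables you would supply, and the "status" field with its "ready" and "failed" values is an assumption made for illustration.

```python
import time
from typing import Callable


def wait_for_solution(
    create_task: Callable[[], str],
    get_result: Callable[[str], dict],
    timeout: float = 120.0,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Create a task, then poll until it is ready, fails, or times out."""
    task_id = create_task()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = get_result(task_id)
        status = result.get("status")  # assumed field name
        if status == "ready":
            return result["solution"]
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {result!r}")
        sleep(interval)  # still processing; wait before asking again
    raise TimeoutError(f"task {task_id} not solved within {timeout}s")
```

The deadline matters in a scraper: without it, one stuck task would stall the whole run.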

injecting the token and retrying

reCAPTCHA tokens go into the form field named g-recaptcha-response. Add it to the POST body you were already sending:

import requests

session = requests.Session()

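def login(url: str, username: str, password: str):
    page = session.get(url).text
    payload = {"username": username, "password": password}

    if needs_recaptcha(page):
        key = site_key(page)
        if key is None:
            # e.g. a v3 page that passes the key to grecaptcha.execute instead
            raise ValueError("captcha detected but no data-sitekey found")
        # Solve last, then submit straight away: tokens expire quickly.
        payload["g-recaptcha-response"] = solve_recaptcha_v2(url, key)

    return session.post(url, data=payload)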

For challenge-based protection like AWS WAF the solution is a cookie, not a form field - set it on session.cookies and reuse the returned userAgent instead of adding a body field.
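
A minimal sketch of the cookie path, under stated assumptions: the "cookie" and "userAgent" keys are assumed names for the solver's output (check the actual response shape), and apply_waf_solution is a hypothetical helper. AWS WAF keeps its token in a cookie named aws-waf-token. The session argument only needs to behave like requests.Session, so the block itself imports nothing.

```python
def apply_waf_solution(session, solution: dict, domain: str) -> None:
    """Copy a solved WAF cookie and its user agent onto an HTTP session.

    session is expected to behave like requests.Session: it needs
    cookies.set(name, value, domain=...) and a mutable headers mapping.
    The "cookie" and "userAgent" keys are assumed names for the solver output.
    """
    # AWS WAF stores its token in a cookie named "aws-waf-token".
    session.cookies.set("aws-waf-token", solution["cookie"], domain=domain)
    # Reuse the user agent the solver reported, as recommended above.
    session.headers["User-Agent"] = solution["userAgent"]
```

After calling it, repeat the original request on the same session; no extra body field is added.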

a retry pattern

Tokens are single-use and short-lived (a reCAPTCHA response token is valid for about two minutes and can be verified only once), so solve as late as possible, right before the submit, and retry once on a captcha response rather than reusing a stale token:

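def submit_with_retry(url, payload, attempts=2):
    payload = dict(payload)  # do not mutate the caller's dict
    for attempt in range(attempts):
        resp = session.post(url, data=payload)
        if not needs_recaptcha(resp.text) or attempt == attempts - 1:
            return resp  # success, or out of attempts: skip a wasted solve
        key = site_key(resp.text)
        if key is None:
            return resp  # challenged, but no v2 site key to solve against
        payload["g-recaptcha-response"] = solve_recaptcha_v2(url, key)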
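
If solving and submitting are separated by other work, it can help to record when a token was obtained and refuse to send one that is too old. The sketch below is one way to do that. TimedToken is a hypothetical helper, and the 100-second default is an assumed safety margin under reCAPTCHA's two-minute validity window, not a value taken from any API.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TimedToken:
    """A solved token plus the monotonic time at which it was obtained."""

    value: str
    obtained_at: float = field(default_factory=time.monotonic)

    def is_fresh(self, max_age: float = 100.0, now: Optional[float] = None) -> bool:
        """Return True while the token is younger than max_age seconds."""
        current = time.monotonic() if now is None else now
        return (current - self.obtained_at) < max_age
```

Wrap the solver's return value in TimedToken and check is_fresh() right before the POST; if it returns False, solve again instead of submitting.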

things that go wrong

Stale token. Solving at the top of the run and submitting minutes later fails: the site's verification call rejects the token with timeout-or-duplicate, and you get the challenge page back. Solve right before the submit.
Wrong site key. Read data-sitekey from the live page each run; do not hardcode it.
v3 instead of v2. If the page uses grecaptcha.execute(...) with an action, it is reCAPTCHA v3 - switch to ReCaptchaV3Task and pass the matching pageAction. See the reCAPTCHA v3 docs.
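
For the v3 case, the action can often be read out of the page's own grecaptcha.execute call. A minimal sketch follows, assuming the call appears inline in the HTML with a literal action string. page_action is a hypothetical helper, and the regex is a heuristic that will miss actions built at runtime or loaded from a separate script file.

```python
import re
from typing import Optional

# Matches grecaptcha.execute('KEY', {action: 'login'}) and the quoted-key
# variant {"action": "login"}; captures the action string.
_ACTION_RE = re.compile(
    r"""grecaptcha\.execute\([^)]*?action['"]?\s*:\s*['"]([^'"]+)['"]"""
)


def page_action(html: str) -> Optional[str]:
    """Return the action passed to grecaptcha.execute, or None if not found."""
    m = _ACTION_RE.search(html)
    return m.group(1) if m else None
```

The returned string is what you would pass as pageAction in the ReCaptchaV3Task payload; a None result means you need to find the action by inspecting the site's scripts.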

faq

Does this work with httpx instead of requests?
Yes. The flow is identical - swap requests.Session() for httpx.Client(); you still inject g-recaptcha-response into the POST data or set the cookie on the client.

Do I need a real browser?
No. The solving API runs the challenge for you; your scraper stays a plain HTTP client and just submits the returned token or cookie.

Where do I put the token?
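In the form field g-recaptcha-response for reCAPTCHA. hCaptcha's own field is h-captcha-response, although its widget also fills g-recaptcha-response for compatibility, so check which one the target form posts. For AWS WAF set the returned cookie on your session instead.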

How do I avoid solving on every request?
Only solve when needs_recaptcha is true. Most requests in a session will not be challenged once you are past the gated page.

Originally published on capbypass.pro. CapBypass is an AI-powered captcha-solving API for reCAPTCHA, hCaptcha, Cloudflare Turnstile and AWS WAF.