Alex Chen

Posted on Mar 23

reCAPTCHA v2 vs v3: What's Different, What Breaks Your Scraper, and How to Handle Both

reCAPTCHA v2 shows a checkbox. reCAPTCHA v3 is invisible. But the differences go much deeper than that — and understanding them determines whether your scraper works or silently fails.

Let's break down how each version works, how sites implement them, and how to handle both in your automation.

The Fundamental Difference

reCAPTCHA v2: Binary. You either pass or fail. The user sees a challenge (checkbox, image selection), and the server verifies the resulting token to get a pass/fail answer.

reCAPTCHA v3: Score-based. Every request gets a risk score from 0.0 (bot) to 1.0 (human). The site owner decides the threshold. No user interaction.

v2 Flow:
User → Checkbox → (maybe image puzzle) → Token → Server verifies → Pass/Fail

v3 Flow:
Page loads → JS collects signals → Score assigned (0.0-1.0) → Token → Server checks score
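
To make the two flows concrete, here is a minimal sketch of how a backend might turn a verification response into a decision. The function name `decide` and the 0.5 default threshold are illustrative assumptions; the only response fields used are `success` and `score`.

```python
def decide(resp: dict, threshold: float = 0.5) -> str:
    """Map a siteverify-style response dict to 'pass', 'fail', or 'challenge'.

    Assumption: a response carrying a "score" key came from v3,
    and one without it came from v2.
    """
    if not resp.get("success", False):
        return "fail"  # invalid, expired, or reused token (either version)
    if "score" not in resp:
        return "pass"  # v2: success alone is the verdict
    # v3: the site compares the score with its own threshold
    return "pass" if resp["score"] >= threshold else "challenge"


print(decide({"success": True}))                # pass
print(decide({"success": True, "score": 0.3}))  # challenge
print(decide({"success": False}))               # fail
```

The "challenge" outcome is where a site would typically show a v2 widget or block the request outright.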

How v2 Works

The Checkbox ("I'm not a robot")

When a user clicks the checkbox, Google's JavaScript:

Analyzes mouse movement patterns to the checkbox
Checks browser fingerprint
Checks for existing Google cookies and other session signals
Makes a risk decision

If low risk → green checkmark, token issued.
If uncertain → image challenge appears.

<!-- v2 HTML embed -->
<div class="g-recaptcha" 
     data-sitekey="6Le-wvkSAAAAAPBM..."
     data-callback="onCaptchaSuccess">
</div>
<script src="https://www.google.com/recaptcha/api.js" 
        async defer></script>

The Hidden Input

After solving, v2 populates a hidden textarea:

<textarea name="g-recaptcha-response" 
          style="display:none">
03AGdBq24P...long-token...
</textarea>
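
If you have a DOM snapshot of the page (for example from a browser automation tool), you can check whether that textarea already holds a token. The sketch below uses only the standard library; the names `ResponseTokenParser` and `read_v2_token` are hypothetical. Because the textarea is filled by JavaScript, raw HTML fetched without a browser will usually show it empty.

```python
from html.parser import HTMLParser
from typing import Optional


class ResponseTokenParser(HTMLParser):
    """Collect the text inside <textarea name="g-recaptcha-response">."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.token = ""

    def handle_starttag(self, tag, attrs):
        if tag == "textarea" and dict(attrs).get("name") == "g-recaptcha-response":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "textarea":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.token += data


def read_v2_token(html: str) -> Optional[str]:
    """Return the token found in the response textarea, or None if empty."""
    parser = ResponseTokenParser()
    parser.feed(html)
    parser.close()
    token = parser.token.strip()
    return token or None
```

A `None` result means the widget has not been solved yet in that snapshot.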

Server-Side Verification

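# What the site's backend does
import os

import httpx

# Assumption: the secret key is supplied via an environment variable
RECAPTCHA_SECRET_KEY = os.environ.get("RECAPTCHA_SECRET_KEY", "")

def verify_recaptcha_v2(token: str) -> bool:
    resp = httpx.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET_KEY,
            "response": token,
        }
    ).json()

    # That's it: a binary result, True or False
    return resp.get("success", False)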
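
When verification fails, the response also carries an `error-codes` list that explains why. The sketch below maps the codes documented for the siteverify endpoint to short hints; the helper name `explain_failure` is an assumption. The `timeout-or-duplicate` code matters most for scrapers, since it means the token expired or was already verified once.

```python
# Error codes documented for the siteverify endpoint
ERROR_HINTS = {
    "missing-input-secret": "secret parameter is missing",
    "invalid-input-secret": "secret parameter is invalid or malformed",
    "missing-input-response": "response (token) parameter is missing",
    "invalid-input-response": "token is invalid or malformed",
    "bad-request": "request is invalid or malformed",
    "timeout-or-duplicate": "token expired or was already verified",
}


def explain_failure(resp: dict) -> list[str]:
    """Return one readable line per error code; empty list on success."""
    if resp.get("success"):
        return []
    codes = resp.get("error-codes", [])
    return [f"{code}: {ERROR_HINTS.get(code, 'unknown error code')}" for code in codes]


print(explain_failure({"success": False, "error-codes": ["timeout-or-duplicate"]}))
```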

How v3 Works

Invisible Scoring

v3 runs entirely in the background. The site loads a script and calls grecaptcha.execute() on user actions; its server then receives a score when it verifies the resulting token:

<!-- v3 HTML embed -->
<script src="https://www.google.com/recaptcha/api.js?render=SITE_KEY">
</script>
<script>
  grecaptcha.ready(function() {
    // Called on every important action
    grecaptcha.execute('SITE_KEY', {action: 'login'})
      .then(function(token) {
        // Send token to your server
        document.getElementById('captcha-token')
          .value = token;
      });
  });
</script>

The Score Response

# v3 server-side verification
def verify_recaptcha_v3(
    token: str,
    expected_action: str = "login"
) -> dict:
    resp = httpx.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET_KEY,
            "response": token,
        }
    ).json()

    action = resp.get("action", "")
    return {
        "success": resp.get("success", False),
        "score": resp.get("score", 0.0),
        # 1.0 = very likely human
        # 0.0 = very likely bot
        "action": action,
        "action_ok": action == expected_action,
        "hostname": resp.get("hostname", ""),
    }

# Site owner decides the threshold
result = verify_recaptcha_v3(token, expected_action="login")
if not result["success"] or not result["action_ok"]:
    # Invalid or expired token, or action mismatch: possible replay
    block_request()
elif result["score"] < 0.5:
    # Too risky: show v2 fallback or block
    show_recaptcha_v2()

Side-by-Side Comparison

Feature	v2	v3
User interaction	Checkbox + puzzle	None (invisible)
Result type	Pass/Fail	Score (0.0-1.0)
Threshold	Fixed	Site-configurable
Action parameter	No	Yes
Multiple calls/page	Usually once	Multiple (per action)
Sitekey location	`data-sitekey` attribute	Script URL or JS call
Token field	`g-recaptcha-response`	Custom (varies)
Fallback	Image challenge	Often falls back to v2

Detecting Which Version You're Facing

import re
from bs4 import BeautifulSoup

def detect_recaptcha_version(html: str) -> dict | None:
    soup = BeautifulSoup(html, "html.parser")

    # Check for v2 — explicit checkbox widget
    v2_div = soup.find("div", class_="g-recaptcha")
    if v2_div:
        return {
            "version": "v2",
            "sitekey": v2_div.get("data-sitekey"),
            "type": (
                "invisible" 
                if v2_div.get("data-size") == "invisible"
                else "checkbox"
            ),
        }

    # Check for v3 — render parameter in script
    for script in soup.find_all("script", src=True):
        src = script["src"]
        if "recaptcha" in src and "render=" in src:
            match = re.search(r'render=([^&"]+)', src)
            if match and match.group(1) != "explicit":
                return {
                    "version": "v3",
                    "sitekey": match.group(1),
                    "type": "score",
                }

    # Check for v3 — grecaptcha.execute in inline JS
    for script in soup.find_all("script"):
        if script.string and "grecaptcha.execute" in script.string:
            match = re.search(
                r"grecaptcha\.execute\(['\"]([^'\"]+)",
                script.string
            )
            if match:
                return {
                    "version": "v3",
                    "sitekey": match.group(1),
                    "type": "score",
                }

    return None
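
One case the detector above does not cover is v2 rendered explicitly from JavaScript, where the page loads api.js with render=explicit and calls grecaptcha.render() instead of using a g-recaptcha div. A rough, regex-based sketch for that case follows; the helper name `find_explicit_v2_sitekey` is hypothetical, and it only catches sitekeys written as string literals in inline script.

```python
import re
from typing import Optional

# Matches sitekey: '...' or "sitekey": "..." after a grecaptcha.render( call.
# Limitation: a sitekey passed as a variable will not be found, and the lazy
# match could pick up an unrelated later "sitekey" if the call has none.
_RENDER_CALL = re.compile(
    r"grecaptcha\.render\s*\(.*?['\"]?sitekey['\"]?\s*:\s*['\"]([^'\"]+)['\"]",
    re.DOTALL,
)


def find_explicit_v2_sitekey(html: str) -> Optional[str]:
    """Return the sitekey from an explicit grecaptcha.render() call, if any."""
    match = _RENDER_CALL.search(html)
    return match.group(1) if match else None


sample = """
<script>
  grecaptcha.render('captcha-box', {
    'sitekey': 'SITE_KEY_PLACEHOLDER',
    'callback': onDone
  });
</script>
"""
print(find_explicit_v2_sitekey(sample))
```

You could call this as an extra step when the main detector returns None.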

Solving reCAPTCHA v2

v2 is straightforward — you need a valid token:

import httpx
import time

def solve_recaptcha_v2(
    sitekey: str, 
    page_url: str,
    invisible: bool = False
) -> str:
    payload = {
        "type": "recaptcha_v2",
        "sitekey": sitekey,
        "pageurl": page_url,
    }
    if invisible:
        payload["invisible"] = True

    # The context manager closes the connection pool when we are done
    with httpx.Client(
        base_url="https://www.passxapi.com"
    ) as client:
        # Submit task
        task = client.post(
            "/api/v1/task", json=payload
        ).json()
        task_id = task["task_id"]

        # Poll for result
        for _ in range(60):
            result = client.get(
                f"/api/v1/task/{task_id}"
            ).json()
            if result["status"] == "completed":
                return result["token"]
            time.sleep(2)

    raise TimeoutError("v2 solve timed out")
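
The polling loop is the same for every task type, so it can be pulled into a small helper that is easy to test without a network. This is a sketch under assumptions: `poll_task` is a hypothetical name, and the "failed" status value is a guess at how a solver API might report errors, so check your provider's documentation.

```python
import time
from typing import Callable


def poll_task(
    fetch: Callable[[], dict],
    max_attempts: int = 60,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() until it reports a completed task, then return the token."""
    for _ in range(max_attempts):
        result = fetch()
        status = result.get("status")
        if status == "completed":
            return result["token"]
        if status == "failed":  # assumed status value, not confirmed by any API
            raise RuntimeError(result.get("error", "solver reported failure"))
        sleep(delay)
    raise TimeoutError(f"no result after {max_attempts} attempts")


# Usage with canned responses and a fake sleep, so nothing actually waits
responses = iter([
    {"status": "pending"},
    {"status": "completed", "token": "demo-token"},
])
waits: list[float] = []
print(poll_task(lambda: next(responses), sleep=waits.append))
```

Injecting `fetch` and `sleep` keeps the helper independent of any particular HTTP client.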

Injecting the v2 Token

# With Playwright
async def inject_v2_token(page, token: str):
    # Pass the token as an argument instead of interpolating it
    # into the script, so its contents cannot break the JS
    await page.evaluate("""(token) => {
        // Set every response textarea on the page
        // (there may be more than one widget)
        document.querySelectorAll(
            'textarea[name="g-recaptcha-response"]'
        ).forEach(el => { el.value = token; });

        // Trigger the callback if it exists
        const callback = document.querySelector(
            '.g-recaptcha'
        )?.getAttribute('data-callback');
        if (callback && typeof window[callback] === 'function') {
            window[callback](token);
        }
    }""", token)

# Without browser — just POST
def submit_with_v2_token(
    url: str, token: str, form_data: dict
):
    form_data["g-recaptcha-response"] = token
    return httpx.post(url, data=form_data)
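
Whichever path you use, submit the token promptly: a reCAPTCHA token is valid for two minutes and can be verified only once. The sketch below tracks both conditions on the client side. `SolvedToken` and the 10-second safety margin are assumptions, and the clock starts when your code receives the token, which may be slightly after it was issued.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

TOKEN_TTL_SECONDS = 120.0  # reCAPTCHA tokens are valid for two minutes


@dataclass
class SolvedToken:
    value: str
    obtained_at: float = field(default_factory=time.monotonic)
    used: bool = False

    def is_usable(self, now: Optional[float] = None, margin: float = 10.0) -> bool:
        """True if the token is unused and comfortably inside its lifetime."""
        current = time.monotonic() if now is None else now
        age = current - self.obtained_at
        return (not self.used) and age < TOKEN_TTL_SECONDS - margin

    def consume(self) -> str:
        """Return the token value once; raise if it is stale or already used."""
        if not self.is_usable():
            raise ValueError("token expired or already used")
        self.used = True
        return self.value


token = SolvedToken("demo-token")
print(token.consume())    # demo-token
print(token.is_usable())  # False
```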

Solving reCAPTCHA v3

v3 requires an extra parameter — the action string:

def solve_recaptcha_v3(
    sitekey: str,
    page_url: str,
    action: str = "submit",
    min_score: float = 0.7,
) -> str:
    # The context manager closes the connection pool when we are done
    with httpx.Client(
        base_url="https://www.passxapi.com"
    ) as client:
        task = client.post("/api/v1/task", json={
            "type": "recaptcha_v3",
            "sitekey": sitekey,
            "pageurl": page_url,
            "action": action,
            "min_score": min_score,
        }).json()
        task_id = task["task_id"]

        for _ in range(60):
            result = client.get(
                f"/api/v1/task/{task_id}"
            ).json()
            if result["status"] == "completed":
                return result["token"]
            time.sleep(2)

    raise TimeoutError("v3 solve timed out")

Finding the Action String

The action string must match what the site expects. Find it in the page source:

import re

def extract_v3_action(html: str) -> str | None:
    """Find the action parameter from page JS."""
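    patterns = [
        # grecaptcha.execute(KEY, {action: '...'}), key optionally quoted
        r"grecaptcha\.execute\([^,]+,\s*\{\s*['\"]?action['\"]?\s*:\s*['\"]([^'\"]+)",
        # Looser fallback: any action: '...' pair in the page
        r"['\"]?action['\"]?\s*:\s*['\"]([^'\"]+)['\"]",
    ]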

    for pattern in patterns:
        match = re.search(pattern, html)
        if match:
            return match.group(1)

    return None

# Common action strings
COMMON_ACTIONS = [
    "submit", "login", "register", "signup",
    "contact", "checkout", "search", "homepage",
    "validate", "verify",
]
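
Because v3 can be called several times per page, a page may contain more than one action string. The following sketch collects all of them in order of appearance, so you can pick the one that belongs to the form you are submitting. The name `extract_all_v3_actions` is hypothetical, and the regex only sees actions written as string literals.

```python
import re

# Matches the action value inside a single grecaptcha.execute(...) call.
# [^)]*? keeps the match from running past the call's closing parenthesis.
_EXECUTE_ACTION = re.compile(
    r"grecaptcha\.execute\s*\([^)]*?action['\"]?\s*:\s*['\"]([^'\"]+)['\"]"
)


def extract_all_v3_actions(html: str) -> list[str]:
    """Return every distinct action string, in first-seen order."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for match in _EXECUTE_ACTION.finditer(html):
        seen.setdefault(match.group(1), None)
    return list(seen)


sample = (
    "grecaptcha.execute('KEY', {action: 'login'});"
    "grecaptcha.execute('KEY', { 'action': \"checkout\" });"
    "grecaptcha.execute('KEY', {action: 'login'});"
)
print(extract_all_v3_actions(sample))  # ['login', 'checkout']
```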

Injecting the v3 Token

v3 tokens are often sent differently — in headers, hidden fields, or via AJAX:

# Method 1: Hidden input
async def inject_v3_hidden_input(page, token):
    # The token is passed as an argument rather than
    # interpolated into the JS source
    await page.evaluate("""(token) => {
        // Find the hidden input for the token
        let input = document.querySelector(
            'input[name="recaptcha_token"], '
            + 'input[name="g-recaptcha-response"], '
            + 'input[name="captcha_token"]'
        );
        if (!input) {
            // Create it if it doesn't exist
            const form = document.querySelector('form');
            if (!form) {
                throw new Error('no form found for the token input');
            }
            input = document.createElement('input');
            input.type = 'hidden';
            input.name = 'g-recaptcha-response';
            form.appendChild(input);
        }
        input.value = token;
    }""", token)

# Method 2: Direct POST with token
def submit_v3_form(
    url: str, token: str, 
    form_data: dict, 
    token_field: str = "g-recaptcha-response"
):
    form_data[token_field] = token
    return httpx.post(url, data=form_data)

# Method 3: API call with token in headers
def api_call_with_v3(
    url: str, token: str, payload: dict
):
    return httpx.post(
        url,
        json=payload,
        headers={"X-Recaptcha-Token": token}
    )

The v3 → v2 Fallback Pattern

Many sites use v3 first, then show v2 if the score is low:

async def handle_recaptcha_adaptive(
    page, sitekey: str, url: str
):
    """Handle sites that use v3 with v2 fallback."""

    # Try v3 first
    v3_token = solve_recaptcha_v3(
        sitekey=sitekey,
        page_url=url,
        action=extract_v3_action(
            await page.content()
        ) or "submit",
        min_score=0.9,
    )

    await inject_v3_hidden_input(page, v3_token)
    await page.click('button[type="submit"]')
    await page.wait_for_timeout(2000)

    # Check if v2 appeared as fallback
    v2_widget = await page.query_selector(
        ".g-recaptcha:visible, "
        "#recaptcha-anchor:visible"
    )

    if v2_widget:
        print("v3 score too low — falling back to v2")
        v2_sitekey = await v2_widget.get_attribute(
            "data-sitekey"
        ) or sitekey

        v2_token = solve_recaptcha_v2(
            sitekey=v2_sitekey,
            page_url=url,
        )

        await inject_v2_token(page, v2_token)
        await page.click('button[type="submit"]')
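
One thing to watch in async code like this: the solver functions are synchronous and can poll for up to two minutes, which stalls the event loop while they run. A possible workaround is to run them in a worker thread with asyncio.to_thread. In the sketch below, `blocking_solver` is a stand-in for a real solver call, not part of any library.

```python
import asyncio
import time


def blocking_solver(sitekey: str, page_url: str) -> str:
    """Stand-in for a synchronous solver call (hypothetical)."""
    time.sleep(0.1)  # simulates waiting on network polling
    return f"token-for-{sitekey}"


async def solve_without_blocking(sitekey: str, page_url: str) -> str:
    # Run the blocking function in a worker thread so the event loop
    # (and any async browser operations) can keep running meanwhile.
    return await asyncio.to_thread(blocking_solver, sitekey, page_url)


async def main() -> list[str]:
    # Two solves proceed concurrently instead of back to back
    return await asyncio.gather(
        solve_without_blocking("KEY_A", "https://example.com/a"),
        solve_without_blocking("KEY_B", "https://example.com/b"),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

asyncio.to_thread requires Python 3.9 or later.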

Common Pitfalls

1. Wrong Action String for v3

# ❌ Generic action
token = solve_recaptcha_v3(sitekey, url, action="submit")

# ✅ Match the site's actual action
action = extract_v3_action(page_html) or "login"
token = solve_recaptcha_v3(sitekey, url, action=action)

If the action doesn't match, a site that checks it will reject the token even if the score is high.

2. Confusing v2 Invisible with v3

<!-- v2 invisible: still has a widget, just auto-triggered.
     Note data-size="invisible" on the div -->
<div class="g-recaptcha"
     data-sitekey="..."
     data-size="invisible">
</div>

<!-- v3: loaded via render parameter, no div at all -->
<script src="...recaptcha/api.js?render=SITEKEY"></script>

v2 invisible needs to be solved as v2 (with invisible=True), not v3.

3. Token Field Names Vary

# Common field names for v3 tokens
V3_FIELD_NAMES = [
    "g-recaptcha-response",
    "recaptcha_token",
    "captcha_token",
    "recaptchaToken",
    "token",
    "grecaptcha",
]

def find_token_field(html: str) -> str:
    for field in V3_FIELD_NAMES:
        if f'name="{field}"' in html:
            return field
    return "g-recaptcha-response"  # Default

Unified Solver

Handle both versions with a single function:

async def solve_recaptcha(
    html: str, 
    url: str
) -> dict | None:
    info = detect_recaptcha_version(html)
    if not info:
        return None

    if info["version"] == "v2":
        token = solve_recaptcha_v2(
            sitekey=info["sitekey"],
            page_url=url,
            invisible=info["type"] == "invisible",
        )
        return {
            "token": token,
            "field": "g-recaptcha-response",
            "version": "v2",
        }

    elif info["version"] == "v3":
        action = extract_v3_action(html) or "submit"
        token = solve_recaptcha_v3(
            sitekey=info["sitekey"],
            page_url=url,
            action=action,
        )
        field = find_token_field(html)
        return {
            "token": token,
            "field": field,
            "version": "v3",
            "action": action,
        }
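
The dict returned above tells you both the token and the field it belongs in. A small helper can apply it to a form payload; `build_submission` is a hypothetical name, and this sketch copies the form data rather than mutating it, so the same base payload can be reused for a retry with a fresh token.

```python
def build_submission(form_data: dict, solution: dict) -> dict:
    """Return a copy of form_data with the token placed in the right field."""
    payload = dict(form_data)  # leave the caller's dict untouched
    payload[solution["field"]] = solution["token"]
    return payload


base = {"email": "user@example.com"}
solution = {"token": "demo-token", "field": "recaptcha_token", "version": "v3"}
print(build_submission(base, solution))
# {'email': 'user@example.com', 'recaptcha_token': 'demo-token'}
```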

Key Takeaways

v2 is binary, v3 is scored — different handling required
v3 needs the action string — extract it from page JS, don't guess
v2 invisible ≠ v3 — same appearance, different protocol
Many sites use v3→v2 fallback — handle both in sequence
Token field names vary — especially for v3, check the form
Test with Google's test keys (v2: 6LeIxAcTAAAA..., always passes)

For a Python client that handles both v2 and v3 with a unified API, check out passxapi-python — it abstracts away the version differences so your code stays clean.

Which version gives you more trouble — v2 or v3? Share your experience in the comments.
