
Alex Chen

reCAPTCHA v2 vs v3: What's Different, What Breaks Your Scraper, and How to Handle Both

reCAPTCHA v2 shows a checkbox. reCAPTCHA v3 is invisible. But the differences go much deeper than that — and understanding them determines whether your scraper works or silently fails.

Let's break down how each version works, how sites implement them, and how to handle both in your automation.

The Fundamental Difference

reCAPTCHA v2: Binary. You either pass or fail. The user sees a challenge (checkbox, image selection), and the site's server verifies the resulting token for a pass/fail answer.

reCAPTCHA v3: Score-based — every request gets a risk score from 0.0 (bot) to 1.0 (human). The site owner decides the threshold. No user interaction.

v2 Flow:
User → Checkbox → (maybe image puzzle) → Token → Server verifies → Pass/Fail

v3 Flow:
Page loads → JS collects signals → Score assigned (0.0-1.0) → Token → Server checks score

How v2 Works

The Checkbox ("I'm not a robot")

When a user clicks the checkbox, Google's JavaScript:

  1. Analyzes mouse movement patterns to the checkbox
  2. Checks browser fingerprint
  3. Reviews Google cookies from earlier sessions
  4. Makes a risk decision

If low risk → green checkmark, token issued.
If uncertain → image challenge appears.

<!-- v2 HTML embed -->
<div class="g-recaptcha" 
     data-sitekey="6Le-wvkSAAAAAPBM..."
     data-callback="onCaptchaSuccess">
</div>
<script src="https://www.google.com/recaptcha/api.js" 
        async defer></script>

The Hidden Input

After solving, v2 populates a hidden textarea:

<textarea name="g-recaptcha-response" 
          style="display:none">
03AGdBq24P...long-token...
</textarea>

Server-Side Verification

# What the site's backend does
import httpx

def verify_recaptcha_v2(token: str) -> bool:
    resp = httpx.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET_KEY,
            "response": token,
        }
    ).json()

    return resp["success"]  # True or False
    # That's it — binary result
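The siteverify response carries more than the `success` flag, and a scraper author debugging rejected tokens benefits from knowing what the backend sees. Here is a minimal sketch that condenses an already-parsed response. It assumes the documented JSON fields `success`, `hostname`, and `error-codes`; `summarize_siteverify` is a hypothetical helper name, not part of any library.

```python
def summarize_siteverify(resp: dict) -> dict:
    """Condense a parsed siteverify JSON response (hypothetical helper)."""
    # "error-codes" is only present when verification failed
    errors = list(resp.get("error-codes", []))
    return {
        "ok": bool(resp.get("success", False)),
        "hostname": resp.get("hostname", ""),
        "errors": errors,
        # An expired or already-used token is reported as timeout-or-duplicate
        "retry_with_new_token": "timeout-or-duplicate" in errors,
    }
```

When `retry_with_new_token` is true, the token was too old or had already been submitted once, so the fix is a fresh solve rather than a different sitekey.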

How v3 Works

Invisible Scoring

v3 runs entirely in the background. The site loads a script, calls grecaptcha.execute() on user actions, and receives a token that its server exchanges for a score:

<!-- v3 HTML embed -->
<script src="https://www.google.com/recaptcha/api.js?render=SITE_KEY">
</script>
<script>
  grecaptcha.ready(function() {
    // Called on every important action
    grecaptcha.execute('SITE_KEY', {action: 'login'})
      .then(function(token) {
        // Send token to your server
        document.getElementById('captcha-token')
          .value = token;
      });
  });
</script>

The Score Response

# v3 server-side verification
def verify_recaptcha_v3(
    token: str, 
    expected_action: str = "login"
) -> dict:
    resp = httpx.post(
        "https://www.google.com/recaptcha/api/siteverify",
        data={
            "secret": RECAPTCHA_SECRET_KEY,
            "response": token,
        }
    ).json()

    return {
        "success": resp["success"],
        "score": resp.get("score", 0),
        # 1.0 = very likely human
        # 0.0 = very likely bot
        "action": resp.get("action", ""),
        "hostname": resp.get("hostname", ""),
    }

# Site owner decides the threshold
result = verify_recaptcha_v3(token)
if not result["success"] or result["action"] != "login":
    # Invalid token or action mismatch (possible replay)
    block_request()
elif result["score"] < 0.5:
    # Too risky: show v2 fallback or block
    show_recaptcha_v2()
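The backend decision can be isolated into a pure function, which makes it easier to see which of the three outcomes a given token will hit. This is a sketch under assumptions: `decide_v3` and its return labels are hypothetical, and the 0.5 default only mirrors the threshold used in the example above. Real sites choose their own.

```python
def decide_v3(resp: dict, expected_action: str, threshold: float = 0.5) -> str:
    """Return "allow", "challenge", or "block" for a parsed v3 response."""
    # A failed verification never reaches the score check
    if not resp.get("success", False):
        return "block"
    # A token minted for another action is treated as a possible replay
    if resp.get("action", "") != expected_action:
        return "block"
    # A low score triggers the fallback (for example, a v2 challenge)
    if float(resp.get("score", 0.0)) < threshold:
        return "challenge"
    return "allow"
```

Checking the action before the score means a high-scoring token with the wrong action is still refused.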

Side-by-Side Comparison

Feature             | v2                      | v3
User interaction    | Checkbox + puzzle       | None (invisible)
Result type         | Pass/Fail               | Score (0.0-1.0)
Threshold           | Fixed                   | Site-configurable
Action parameter    | No                      | Yes
Multiple calls/page | Usually once            | Multiple (per action)
Sitekey location    | data-sitekey attribute  | Script URL or JS call
Token field         | g-recaptcha-response    | Custom (varies)
Fallback            | Image challenge         | Often falls back to v2

Detecting Which Version You're Facing

import re
from bs4 import BeautifulSoup

def detect_recaptcha_version(html: str) -> dict | None:
    soup = BeautifulSoup(html, "html.parser")

    # Check for v2 — explicit checkbox widget
    v2_div = soup.find("div", class_="g-recaptcha")
    if v2_div:
        return {
            "version": "v2",
            "sitekey": v2_div.get("data-sitekey"),
            "type": (
                "invisible" 
                if v2_div.get("data-size") == "invisible"
                else "checkbox"
            ),
        }

    # Check for v3 — render parameter in script
    for script in soup.find_all("script", src=True):
        src = script["src"]
        if "recaptcha" in src and "render=" in src:
            match = re.search(r'render=([^&"]+)', src)
            if match and match.group(1) != "explicit":
                return {
                    "version": "v3",
                    "sitekey": match.group(1),
                    "type": "score",
                }

    # Check for v3 — grecaptcha.execute in inline JS
    for script in soup.find_all("script"):
        if script.string and "grecaptcha.execute" in script.string:
            match = re.search(
                r"grecaptcha\.execute\(['\"]([^'\"]+)",
                script.string
            )
            if match:
                return {
                    "version": "v3",
                    "sitekey": match.group(1),
                    "type": "score",
                }

    return None

Solving reCAPTCHA v2

v2 is straightforward — you need a valid token:

import httpx
import time

def solve_recaptcha_v2(
    sitekey: str, 
    page_url: str,
    invisible: bool = False
) -> str:
    client = httpx.Client(
        base_url="https://www.passxapi.com"
    )

    payload = {
        "type": "recaptcha_v2",
        "sitekey": sitekey,
        "pageurl": page_url,
    }
    if invisible:
        payload["invisible"] = True

    # Submit task
    task = client.post(
        "/api/v1/task", json=payload
    ).json()
    task_id = task["task_id"]

    # Poll for result
    for _ in range(60):
        result = client.get(
            f"/api/v1/task/{task_id}"
        ).json()
        if result["status"] == "completed":
            return result["token"]
        time.sleep(2)

    raise TimeoutError("v2 solve timed out")
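The fixed 60-iteration loop works, but a wall-clock deadline is easier to reason about and to test. Below is a sketch of the polling step on its own, with the HTTP call injected as a `fetch` callable. The `"failed"` status and the `error` key are assumptions for illustration; the only status shown in this article is `"completed"`, so check the solver's actual response format before relying on them.

```python
import time
from typing import Callable


def poll_until_done(
    fetch: Callable[[], dict],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call fetch() until it reports a token or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        status = result.get("status")
        if status == "completed":
            return result["token"]
        # Assumed status value: stop early instead of waiting out the timeout
        if status == "failed":
            raise RuntimeError(result.get("error", "solve failed"))
        if clock() >= deadline:
            raise TimeoutError("solve timed out")
        sleep(interval_s)
```

With the client from the example above, `fetch` could be `lambda: client.get(f"/api/v1/task/{task_id}").json()`. Injecting `sleep` and `clock` keeps the function testable without real delays.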

Injecting the v2 Token

# With Playwright
async def inject_v2_token(page, token: str):
    # Pass the token as an argument instead of interpolating it into the JS source
    await page.evaluate("""(token) => {
        // Set every response textarea (a page can hold several widgets)
        document.querySelectorAll(
            'textarea[name="g-recaptcha-response"]'
        ).forEach(el => { el.value = token; });

        // Trigger the callback if it exists
        const callback = document.querySelector(
            '.g-recaptcha'
        )?.getAttribute('data-callback');
        if (callback && typeof window[callback] === 'function') {
            window[callback](token);
        }
    }""", token)

# Without browser — just POST
def submit_with_v2_token(
    url: str, token: str, form_data: dict
):
    form_data["g-recaptcha-response"] = token
    return httpx.post(url, data=form_data)

Solving reCAPTCHA v3

v3 requires an extra parameter — the action string:

def solve_recaptcha_v3(
    sitekey: str,
    page_url: str,
    action: str = "submit",
    min_score: float = 0.7,
) -> str:
    client = httpx.Client(
        base_url="https://www.passxapi.com"
    )

    task = client.post("/api/v1/task", json={
        "type": "recaptcha_v3",
        "sitekey": sitekey,
        "pageurl": page_url,
        "action": action,
        "min_score": min_score,
    }).json()
    task_id = task["task_id"]

    for _ in range(60):
        result = client.get(
            f"/api/v1/task/{task_id}"
        ).json()
        if result["status"] == "completed":
            return result["token"]
        time.sleep(2)

    raise TimeoutError("v3 solve timed out")

Finding the Action String

The action string must match what the site expects. Find it in the page source:

import re

def extract_v3_action(html: str) -> str | None:
    """Find the action parameter from page JS."""
    patterns = [
        r"grecaptcha\.execute\([^,]+,\s*\{action:\s*['\"]([^'\"]+)",
        r"action\s*:\s*['\"]([^'\"]+)['\"]",
    ]

    for pattern in patterns:
        match = re.search(pattern, html)
        if match:
            return match.group(1)

    return None

# Common action strings
COMMON_ACTIONS = [
    "submit", "login", "register", "signup",
    "contact", "checkout", "search", "homepage",
    "validate", "verify",
]

Injecting the v3 Token

v3 tokens are often sent differently — in headers, hidden fields, or via AJAX:

# Method 1: Hidden input
async def inject_v3_hidden_input(page, token):
    # Pass the token as an argument instead of interpolating it into the JS source
    await page.evaluate("""(token) => {
        // Find the hidden input for the token
        let input = document.querySelector(
            'input[name="recaptcha_token"], '
            + 'input[name="g-recaptcha-response"], '
            + 'input[name="captcha_token"]'
        );
        if (!input) {
            // Create it on the first form if it doesn't exist
            const form = document.querySelector('form');
            if (!form) return;
            input = document.createElement('input');
            input.type = 'hidden';
            input.name = 'g-recaptcha-response';
            form.appendChild(input);
        }
        input.value = token;
    }""", token)

# Method 2: Direct POST with token
def submit_v3_form(
    url: str, token: str, 
    form_data: dict, 
    token_field: str = "g-recaptcha-response"
):
    form_data[token_field] = token
    return httpx.post(url, data=form_data)

# Method 3: API call with token in headers
def api_call_with_v3(
    url: str, token: str, payload: dict
):
    return httpx.post(
        url,
        json=payload,
        headers={"X-Recaptcha-Token": token}
    )

The v3 → v2 Fallback Pattern

Many sites use v3 first, then show v2 if the score is low:

async def handle_recaptcha_adaptive(
    page, sitekey: str, url: str
):
    """Handle sites that use v3 with v2 fallback."""

    # Try v3 first
    v3_token = solve_recaptcha_v3(
        sitekey=sitekey,
        page_url=url,
        action=extract_v3_action(
            await page.content()
        ) or "submit",
        min_score=0.9,
    )

    await inject_v3_hidden_input(page, v3_token)
    await page.click('button[type="submit"]')
    await page.wait_for_timeout(2000)

    # Check if v2 appeared as fallback
    v2_widget = await page.query_selector(
        ".g-recaptcha:visible, "
        "#recaptcha-anchor:visible"
    )

    if v2_widget:
        print("v3 score too low — falling back to v2")
        v2_sitekey = await v2_widget.get_attribute(
            "data-sitekey"
        ) or sitekey

        v2_token = solve_recaptcha_v2(
            sitekey=v2_sitekey,
            page_url=url,
        )

        await inject_v2_token(page, v2_token)
        await page.click('button[type="submit"]')

Common Pitfalls

1. Wrong Action String for v3

# ❌ Generic action
token = solve_recaptcha_v3(sitekey, url, action="submit")

# ✅ Match the site's actual action
action = extract_v3_action(page_html) or "login"
token = solve_recaptcha_v3(sitekey, url, action=action)

If the action doesn't match what the site expects, a backend that checks it will reject the token even if the score is high.

2. Confusing v2 Invisible with v3

<!-- v2 invisible: still has a widget, just auto-triggered -->
<!-- data-size="invisible" on the div -->
<div class="g-recaptcha" 
     data-sitekey="..." 
     data-size="invisible">
</div>

<!-- v3: loaded via render parameter, no div at all -->
<script src="...recaptcha/api.js?render=SITEKEY"></script>

v2 invisible needs to be solved as v2 (with invisible=True), not v3.

3. Token Field Names Vary

# Common field names for v3 tokens
V3_FIELD_NAMES = [
    "g-recaptcha-response",
    "recaptcha_token",
    "captcha_token",
    "recaptchaToken",
    "token",
    "grecaptcha",
]

def find_token_field(html: str) -> str:
    for field in V3_FIELD_NAMES:
        if f'name="{field}"' in html:
            return field
    return "g-recaptcha-response"  # Default
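The substring check in `find_token_field` only recognizes `name="..."` with double quotes, so markup that uses single quotes falls through to the default. Here is a sketch that parses the form fields with the standard library's `html.parser` instead. The candidate list repeats the names above; `find_token_field_parsed` is a hypothetical name.

```python
from html.parser import HTMLParser

CANDIDATE_NAMES = (
    "g-recaptcha-response", "recaptcha_token", "captcha_token",
    "recaptchaToken", "token", "grecaptcha",
)


class _FieldNameCollector(HTMLParser):
    """Collect the name attribute of every <input> and <textarea>."""

    def __init__(self) -> None:
        super().__init__()
        self.names: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "textarea"):
            name = dict(attrs).get("name")
            if name:
                self.names.append(name)


def find_token_field_parsed(
    html: str, default: str = "g-recaptcha-response"
) -> str:
    """Return the first candidate field name present in the markup."""
    collector = _FieldNameCollector()
    collector.feed(html)
    for candidate in CANDIDATE_NAMES:
        if candidate in collector.names:
            return candidate
    return default
```

Fields that are created by JavaScript at submit time will not appear in the static HTML either way, so the default remains a guess in that case.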

Unified Solver

Handle both versions with a single function:

async def solve_recaptcha(
    html: str, 
    url: str
) -> dict | None:
    info = detect_recaptcha_version(html)
    if not info:
        return None

    if info["version"] == "v2":
        token = solve_recaptcha_v2(
            sitekey=info["sitekey"],
            page_url=url,
            invisible=info["type"] == "invisible",
        )
        return {
            "token": token,
            "field": "g-recaptcha-response",
            "version": "v2",
        }

    elif info["version"] == "v3":
        action = extract_v3_action(html) or "submit"
        token = solve_recaptcha_v3(
            sitekey=info["sitekey"],
            page_url=url,
            action=action,
        )
        field = find_token_field(html)
        return {
            "token": token,
            "field": field,
            "version": "v3",
            "action": action,
        }

Key Takeaways

  1. v2 is binary, v3 is scored — different handling required
  2. v3 needs the action string — extract it from page JS, don't guess
  3. v2 invisible ≠ v3 — same appearance, different protocol
  4. Many sites use v3→v2 fallback — handle both in sequence
  5. Token field names vary — especially for v3, check the form
  6. Test with Google's test keys: the v2 key 6LeIxAcTAAAA... always passes

For a Python client that handles both v2 and v3 with a unified API, check out passxapi-python — it abstracts away the version differences so your code stays clean.


Which version gives you more trouble — v2 or v3? Share your experience in the comments.
