DEV Community

Jonathan Blake

How to Solve AWS WAF Challenges with CapSolver: 10 Practical Solutions

Introduction

AWS WAF is a powerful tool for protecting your web applications from common web exploits. However, it can also present a significant challenge for web scraping and data extraction. This guide delves into the intricacies of AWS WAF CAPTCHA and presents a comprehensive approach to overcoming these obstacles. We’ll explore why these CAPTCHAs appear, how specialized services can help, and provide practical Python examples to integrate these solutions into your workflow. By the end of this article, you’ll have a clear understanding of how to solve AWS WAF challenges and be able to implement these solutions in your own projects.

Understanding AWS WAF Challenges

AWS WAF (Web Application Firewall) acts as a protective layer for web applications, filtering and monitoring HTTP and HTTPS requests. Its primary function is to guard against common web exploits that could affect application availability, compromise security, or consume excessive resources. While vital for security, WAFs frequently impede legitimate web scraping by deploying challenges intended to distinguish human users from automated bots.

These challenges can manifest in several forms, including:

  • CAPTCHAs: These involve image-based puzzles, text-based challenges, or interactive verification steps.
  • JavaScript Challenges: These require the execution of complex JavaScript code to generate a token or cookie.
  • IP Rate Limiting: This blocks requests from IP addresses that exceed a predefined threshold.
  • Header and Fingerprinting Analysis: This detects unusual browser headers or unique browser fingerprints that indicate bot activity.

Overcoming these barriers is essential for anyone engaged in data collection, market research, or competitive analysis. This guide will focus on practical, actionable solutions, specifically leveraging CapSolver’s capabilities, to effectively navigate these AWS WAF challenges.

CapSolver: Your Ally Against AWS WAF

CapSolver is an AI-powered CAPTCHA solving service designed to automate the resolution of various CAPTCHA types, including those implemented by AWS WAF. It provides a robust API that integrates seamlessly into existing scraping workflows, offering solutions for both image recognition and token-based challenges. CapSolver’s continuous updates ensure its effectiveness against evolving WAF defenses, making it a reliable choice for maintaining uninterrupted data streams.

Redeem Your CapSolver Bonus Code
Visit the CapSolver Dashboard and use the bonus code CAP25 when topping up your CapSolver account to receive an extra 5% bonus on each recharge.

10 Detailed Solutions to AWS WAF Challenges with CapSolver

Here are ten comprehensive solutions, ranging from basic integration to advanced scenarios, to help you solve AWS WAF challenges using CapSolver:

Solution 1: Basic AWS WAF Token Solving (ProxyLess)
This is the most common scenario where AWS WAF presents a JavaScript challenge, and you need to obtain an aws-waf-token cookie. CapSolver's AntiAwsWafTaskProxyLess task type is ideal for this.

Steps:

  • Make an initial request to the target URL protected by AWS WAF.
  • Parse the HTML response to extract critical parameters: key, iv, context, and challengeJS.
  • Send these parameters to CapSolver using the createTask endpoint with AntiAwsWafTaskProxyLess.
  • Poll the getTaskResult endpoint until the task is ready.
  • Extract the aws-waf-token cookie from CapSolver's solution.
  • Use this cookie in subsequent requests to access the protected content.

Code Example (Python):

import requests
import re
import time

CAPSOLVER_API_KEY = "YOUR_CAPSOLVER_API_KEY"
CAPSOLVER_CREATE_TASK_ENDPOINT = "https://api.capsolver.com/createTask"
CAPSOLVER_GET_TASK_RESULT_ENDPOINT = "https://api.capsolver.com/getTaskResult"

WEBSITE_URL = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest" # Example URL

def solve_aws_waf_captcha_proxyless(website_url, capsolver_api_key):
    client = requests.Session()
    response = client.get(website_url)
    script_content = response.text

    key_match = re.search(r'"key":"([^"]+)"', script_content)
    iv_match = re.search(r'"iv":"([^"]+)"', script_content)
    context_match = re.search(r'"context":"([^"]+)"', script_content)
    # Match the challenge.js script specifically rather than the first <script src> on the page.
    jschallenge_match = re.search(r'<script[^>]*src="([^"]*challenge\.js)"', script_content)

    key = key_match.group(1) if key_match else None
    iv = iv_match.group(1) if iv_match else None
    context = context_match.group(1) if context_match else None
    jschallenge = jschallenge_match.group(1) if jschallenge_match else None

    if not all([key, iv, context, jschallenge]):
        print("Error: AWS WAF parameters not found in the page content.")
        return None

    task_payload = {
        "clientKey": capsolver_api_key,
        "task": {
            "type": "AntiAwsWafTaskProxyLess",
            "websiteURL": website_url,
            "awsKey": key,
            "awsIv": iv,
            "awsContext": context,
            "awsChallengeJS": jschallenge
        }
    }

    create_task_response = client.post(CAPSOLVER_CREATE_TASK_ENDPOINT, json=task_payload).json()
    task_id = create_task_response.get('taskId')

    if not task_id:
        print(f"Error creating CapSolver task: {create_task_response.get('errorId')}, {create_task_response.get('errorCode')}")
        return None

    print(f"CapSolver task created with ID: {task_id}")

    for _ in range(10):
        time.sleep(5)
        get_result_payload = {"clientKey": capsolver_api_key, "taskId": task_id}
        get_result_response = client.post(CAPSOLVER_GET_TASK_RESULT_ENDPOINT, json=get_result_payload).json()

        if get_result_response.get('status') == 'ready':
            aws_waf_token_cookie = get_result_response['solution']['cookie']
            print("CapSolver successfully solved the CAPTCHA.")
            return aws_waf_token_cookie
        elif get_result_response.get('status') == 'failed':
            print(f"CapSolver task failed: {get_result_response.get('errorId')}, {get_result_response.get('errorCode')}")
            return None

    print("CapSolver task timed out.")
    return None

# Example usage:
# aws_waf_token = solve_aws_waf_captcha_proxyless(WEBSITE_URL, CAPSOLVER_API_KEY)
# if aws_waf_token:
#     print(f"Received AWS WAF Token: {aws_waf_token}")
#     final_response = requests.get(WEBSITE_URL, cookies={"aws-waf-token": aws_waf_token})
#     print(final_response.text)
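
The polling step can also be factored into a small helper that is testable without network access. The sketch below is one possible shape, not part of CapSolver's client: `poll_task_result` and its `fetch_result` callable are hypothetical names, while the `status`, `solution`, and `cookie` fields are the ones read in the example above.

```python
import time
from typing import Callable, Optional


def poll_task_result(
    fetch_result: Callable[[], dict],
    max_attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch_result until the task is ready, failed, or attempts run out.

    fetch_result is a hypothetical zero-argument callable that returns the
    parsed JSON body of one getTaskResult request.
    """
    for _ in range(max_attempts):
        sleep(delay_seconds)
        result = fetch_result()
        status = result.get("status")
        if status == "ready":
            # Same field the example above reads the token from.
            return result["solution"]["cookie"]
        if status == "failed":
            return None
    # Every attempt came back without a final status: treat as a timeout.
    return None
```

In real use, `fetch_result` would wrap the POST to the getTaskResult endpoint, for example `lambda: client.post(CAPSOLVER_GET_TASK_RESULT_ENDPOINT, json=get_result_payload).json()`. Passing `sleep` as a parameter is only there so the loop can be exercised without waiting.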

Solution 2: AWS WAF Token Solving with Proxies
For more robust scraping operations, especially when dealing with aggressive WAFs or IP-based restrictions, using proxies with CapSolver is essential. This solution is similar to Solution 1 but incorporates proxy usage.

Steps:

  • Follow steps 1 and 2 from Solution 1 to extract WAF parameters.
  • Send these parameters to CapSolver using the createTask endpoint with AntiAwsWafTask and include your proxy details.
  • Poll the getTaskResult endpoint until the task is ready.
  • Extract the aws-waf-token cookie.
  • Use this cookie with your proxy in subsequent requests.

Code Example (Python — Task Payload modification):

# ... (previous code for imports and parameter extraction)

    task_payload = {
        "clientKey": capsolver_api_key,
        "task": {
            "type": "AntiAwsWafTask", # Use AntiAwsWafTask for proxy support
            "websiteURL": website_url,
            "awsKey": key,
            "awsIv": iv,
            "awsContext": context,
            "awsChallengeJS": jschallenge,
            "proxy": "http:user:pass@ip:port" # Example: "http:your_user:your_pass@192.168.1.1:8080"
        }
    }

# ... (rest of the code for creating task and getting result remains the same)
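
Since the two task types differ only in the `type` value and the `proxy` field, the choice can be made in one place. This is a minimal sketch under that assumption; `build_waf_task_payload` is a hypothetical helper, and the field names are the ones used in the payloads above.

```python
from typing import Optional


def build_waf_task_payload(
    client_key: str,
    website_url: str,
    aws_fields: dict,
    proxy: Optional[str] = None,
) -> dict:
    """Build a createTask body, picking the task type from whether a proxy is set."""
    task = {
        "type": "AntiAwsWafTask" if proxy else "AntiAwsWafTaskProxyLess",
        "websiteURL": website_url,
    }
    # Copy only the AWS parameters that were actually found on the page.
    task.update({name: value for name, value in aws_fields.items() if value})
    if proxy:
        task["proxy"] = proxy
    return {"clientKey": client_key, "task": task}
```

A call such as `build_waf_task_payload(api_key, url, {"awsKey": key, "awsIv": iv, "awsContext": context, "awsChallengeJS": jschallenge}, proxy=my_proxy)` would then yield the proxy variant, and omitting `proxy` yields the ProxyLess one.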

Solution 3: Handling 405 Response Codes with Key, IV, Context
Sometimes, the initial request to an AWS WAF protected page might return a 405 status code, and the necessary key, iv, and context parameters are embedded directly in the HTML. This scenario requires careful parsing.

Steps:

  • Make an HTTP GET request to the websiteURL.
  • If the response status code is 405, parse the HTML content to find window.gokuProps = {"key":"AQID...","iv":"A6we...","context":"rGXm.."} or similar structures to extract key, iv, and context.
  • Submit these parameters to CapSolver using AntiAwsWafTask or AntiAwsWafTaskProxyLess.
  • Retrieve the aws-waf-token and proceed.

Code Example (Python — Parameter Extraction):

import requests
import re

WEBSITE_URL = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest"

response = requests.get(WEBSITE_URL)
script_content = response.text

if response.status_code == 405:
    key_match = re.search(r'"key":"([^"]+)"', script_content)
    iv_match = re.search(r'"iv":"([^"]+)"', script_content)
    context_match = re.search(r'"context":"([^"]+)"', script_content)
    # ... (extract jschallenge if present)

    key = key_match.group(1) if key_match else None
    iv = iv_match.group(1) if iv_match else None
    context = context_match.group(1) if context_match else None
    # ... (use these parameters with CapSolver)
else:
    print(f"Unexpected status code: {response.status_code}")
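
When the page carries a `window.gokuProps = {...}` assignment like the one shown in the steps, the object can be parsed as JSON instead of running three separate regexes. The sketch below assumes the object is flat (no nested braces, no `}` inside values) and is valid JSON; `parse_goku_props` is a hypothetical helper name.

```python
import json
import re
from typing import Optional

# Assumes a flat object literal: the match stops at the first closing brace.
GOKU_PROPS_RE = re.compile(r"window\.gokuProps\s*=\s*(\{.*?\})", re.DOTALL)


def parse_goku_props(html: str) -> Optional[dict]:
    """Return key, iv, and context from a window.gokuProps assignment, or None."""
    match = GOKU_PROPS_RE.search(html)
    if not match:
        return None
    try:
        props = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    wanted = ("key", "iv", "context")
    # All three values are needed for the task, so a partial object counts as a miss.
    if not all(props.get(name) for name in wanted):
        return None
    return {name: props[name] for name in wanted}
```

If the assignment is missing or incomplete, the function returns `None`, which is a natural point to fall back to the regex approach above or to re-fetch the page.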

Solution 4: Handling 202 Response Codes with awsChallengeJS
In other cases, an AWS WAF protected page might return a 202 status code, and only the awsChallengeJS parameter is required. The key, iv, and context can be ignored in this specific scenario.

Steps:

  • Make an HTTP GET request to the websiteURL.
  • If the response status code is 202, parse the HTML content to find the challenge.js link.
  • Submit websiteURL and awsChallengeJS to CapSolver.
  • Retrieve the aws-waf-token and proceed.

Code Example (Python — Parameter Extraction):

import requests
import re

WEBSITE_URL = "https://efw47fpad9.execute-api.us-east-1.amazonaws.com/latest"

response = requests.get(WEBSITE_URL)
script_content = response.text

if response.status_code == 202:
    # Only the challenge.js URL is needed here; key, iv, and context can be ignored.
    jschallenge_match = re.search(r'<script[^>]*src="([^"]*challenge\.js)"', script_content)
    jschallenge = jschallenge_match.group(1) if jschallenge_match else None
    # ... (submit websiteURL and awsChallengeJS to CapSolver)
else:
    print(f"Unexpected status code: {response.status_code}")
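
The 405 and 202 cases from Solutions 3 and 4 can be handled by one function that decides which task fields to send. This is a sketch of that idea rather than a complete client: `waf_fields_for_response` is a hypothetical name, and the regexes assume the page layouts described in the two solutions.

```python
import re
from typing import Optional

CHALLENGE_JS_RE = re.compile(r'<script[^>]*\bsrc="([^"]*challenge\.js)"')


def _field(html: str, name: str) -> Optional[str]:
    """Pull one "name":"value" pair out of the page, if present."""
    match = re.search(r'"%s":"([^"]+)"' % name, html)
    return match.group(1) if match else None


def waf_fields_for_response(status_code: int, html: str) -> Optional[dict]:
    """Map a challenge response to the CapSolver task fields it calls for."""
    script = CHALLENGE_JS_RE.search(html)
    challenge_js = script.group(1) if script else None
    if status_code == 405:
        fields = {
            "awsKey": _field(html, "key"),
            "awsIv": _field(html, "iv"),
            "awsContext": _field(html, "context"),
        }
        if not all(fields.values()):
            return None
        if challenge_js:
            fields["awsChallengeJS"] = challenge_js
        return fields
    if status_code == 202:
        # Only the challenge.js URL matters for this status code.
        return {"awsChallengeJS": challenge_js} if challenge_js else None
    return None  # Not a challenge response this sketch recognizes.
```

The returned dictionary can be merged straight into the `task` object next to `type` and `websiteURL`.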

Solution 5: AWS WAF Image Recognition (Grid Type)
When AWS WAF presents an image-based CAPTCHA, specifically a grid-type challenge (e.g., “Choose all the beds”), CapSolver’s AwsWafClassification task type can solve it.

Steps:

  • Identify that the AWS WAF challenge is an image recognition task, specifically a grid type.
  • Extract the base64 encoded images from the challenge page.
  • Determine the question (e.g., aws:grid:bed).
  • Send the websiteURL, images (as a list of base64 strings), and question to CapSolver using the createTask endpoint with AwsWafClassification.
  • CapSolver will directly return the solution, which includes the objects (indices of the correct images) for the grid type, or box (coordinates) for the toy car city type.

Code Example (Python — Image Recognition):

import capsolver
import base64
import requests
import re

capsolver.api_key = "YOUR_CAPSOLVER_API_KEY"

WEBSITE_URL = "https://example.com/aws-waf-image-challenge" # Example URL with image challenge

def solve_aws_waf_image_captcha(website_url, capsolver_api_key):
    # This part would involve scraping the page to get the base64 images and the question
    # For demonstration, let's assume we have them:
    # In a real scenario, you'd use a headless browser or advanced parsing to get these.
    # Example: response = requests.get(website_url)
    #          images_base64 = re.findall(r'data:image/png;base64,([a-zA-Z0-9+/=]+)', response.text)
    #          question_match = re.search(r'"question":"(aws:grid:[a-zA-Z]+)"', response.text)
    #          question = question_match.group(1) if question_match else "aws:grid:bed"

    # Placeholder for actual scraped data
    images_base64 = ["/9j/4AAQSkZJRgABAgAA...", "/9j/2wCEAAoHBwgH..."] # Replace with actual base64 images
    question = "aws:grid:bed" # Replace with actual question from the page

    if not images_base64 or not question:
        print("Error: Image data or question not found.")
        return None

    try:
        solution = capsolver.solve({
            "type": "AwsWafClassification",
            "websiteURL": website_url,
            "images": images_base64,
            "question": question
        })
        print("CapSolver successfully solved the image CAPTCHA.")
        return solution
    except Exception as e:
        print(f"CapSolver image task failed: {e}")
        return None

# Example usage:
# image_solution = solve_aws_waf_image_captcha(WEBSITE_URL, capsolver.api_key)
# if image_solution:
#     print(f"Received Image Solution: {image_solution}")
#     # The solution will contain 'objects' for grid type, indicating which images to select.
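
The commented-out scraping lines in the example can be turned into a small standalone extractor. The sketch below assumes the images appear as inline `data:image/...;base64,` URIs and the question as a `"question":"aws:grid:..."` pair in the page source, as those comments suggest; real challenge pages may load them differently, and `extract_grid_challenge` is a hypothetical name. JPEG is accepted alongside PNG because the placeholder strings above start with a JPEG header.

```python
import re
from typing import List, Optional, Tuple

IMAGE_RE = re.compile(r"data:image/(?:png|jpeg);base64,([A-Za-z0-9+/=]+)")
QUESTION_RE = re.compile(r'"question":"(aws:grid:[a-zA-Z]+)"')


def extract_grid_challenge(page_source: str) -> Optional[Tuple[List[str], str]]:
    """Return (base64 images, question) for a grid challenge, or None if either is missing."""
    images = IMAGE_RE.findall(page_source)
    question = QUESTION_RE.search(page_source)
    if not images or not question:
        return None
    return images, question.group(1)
```

The two returned values map directly onto the `images` and `question` fields of the AwsWafClassification task.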

Solution 6: AWS WAF Image Recognition (Toy Car City Type)
Another common image recognition challenge is the “toy car city” type, where you need to place a dot at the end of a car’s path. CapSolver also supports this with AwsWafClassification.

Steps:

  • Identify the challenge as a “toy car city” type.
  • Extract the base64 encoded image.
  • Use the question aws:toycarcity:carcity.
  • Send the websiteURL, images (a list containing the single base64 string), and question to CapSolver.
  • CapSolver will return the box coordinates (x, y) where the dot should be placed.

Code Example (Python — Toy Car City Recognition):

import capsolver
import base64

capsolver.api_key = "YOUR_CAPSOLVER_API_KEY"

WEBSITE_URL = "https://example.com/aws-waf-toycar-challenge" # Example URL

def solve_aws_waf_toycar_captcha(website_url, capsolver_api_key):
    # Placeholder for actual scraped data
    image_base64 = "/9j/4AAQSkZJRgABAgAA..." # Replace with actual base64 image
    question = "aws:toycarcity:carcity"

    if not image_base64:
        print("Error: Image data not found.")
        return None

    try:
        solution = capsolver.solve({
            "type": "AwsWafClassification",
            "websiteURL": website_url,
            "images": [image_base64],
            "question": question
        })
        print("CapSolver successfully solved the toy car city CAPTCHA.")
        return solution
    except Exception as e:
        print(f"CapSolver toy car city task failed: {e}")
        return None

# Example usage:
# toycar_solution = solve_aws_waf_toycar_captcha(WEBSITE_URL, capsolver.api_key)
# if toycar_solution:
#     print(f"Received Toy Car City Solution: {toycar_solution}")
#     # The solution will contain 'box' with x, y coordinates.
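
Before acting on the result, it helps to validate the coordinates. The sketch below assumes `box` arrives as a two-element sequence `[x, y]`; that shape is an assumption based on the description above, so check it against the actual response. `dot_position` is a hypothetical helper.

```python
from typing import Optional, Tuple


def dot_position(solution: dict) -> Optional[Tuple[float, float]]:
    """Return (x, y) from a toy car city solution, or None if 'box' is missing or malformed."""
    box = solution.get("box")
    # Assumed shape: a list or tuple holding exactly two numbers.
    if not isinstance(box, (list, tuple)) or len(box) != 2:
        return None
    x, y = box
    if not all(isinstance(value, (int, float)) for value in (x, y)):
        return None
    return float(x), float(y)
```

A `None` result signals that the response should be logged and the challenge retried rather than clicked blindly.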

Solution 7: Real-time Parameter Parsing for Expired Tokens
AWS WAF tokens can expire quickly. If CapSolver returns an error such as "timeout metering, your parameters have expired", the awsKey, awsIv, awsContext, or awsChallengeJS values are no longer valid. The solution is to parse these parameters in real time for each request.

Steps:

  • Implement a robust parsing mechanism to extract key, iv, context, and challengeJS immediately before sending the task to CapSolver.
  • Ensure your scraping logic retries the process with newly extracted parameters if an expiration error occurs.
  • This approach minimizes the window for token expiration, enhancing the reliability of your AWS WAF solve.

Code Example (Python — Real-time Parsing Strategy):

import re
import time

import requests

def get_aws_waf_params(website_url):
    client = requests.Session()
    response = client.get(website_url)
    script_content = response.text

    key_match = re.search(r'"key":"([^"]+)"', script_content)
    iv_match = re.search(r'"iv":"([^"]+)"', script_content)
    context_match = re.search(r'"context":"([^"]+)"', script_content)
    # Match the challenge.js script specifically rather than the first <script src> on the page.
    jschallenge_match = re.search(r'<script[^>]*src="([^"]*challenge\.js)"', script_content)

    return {
        "key": key_match.group(1) if key_match else None,
        "iv": iv_match.group(1) if iv_match else None,
        "context": context_match.group(1) if context_match else None,
        "jschallenge": jschallenge_match.group(1) if jschallenge_match else None
    }

def solve_aws_waf_with_retry(website_url, capsolver_api_key, max_retries=3):
    for attempt in range(max_retries):
        print(f"Attempt {attempt + 1} to solve AWS WAF challenge...")
        params = get_aws_waf_params(website_url)
        if not all(params.values()):
            print("Failed to extract all AWS WAF parameters. Retrying...")
            time.sleep(2) # Wait before retrying extraction
            continue

        # Construct task_payload using params and send to CapSolver
        # ... (similar to Solution 1, but using the dynamically fetched params)

        # Placeholder for CapSolver call and result retrieval
        # For example:
        # aws_waf_token = call_capsolver_api(website_url, capsolver_api_key, params)
        # if aws_waf_token:
        #     return aws_waf_token
        # else:
        #     print("CapSolver failed to return token. Retrying...")
        #     time.sleep(5) # Wait before retrying CapSolver call

    print("Failed to solve AWS WAF challenge after multiple retries.")
    return None
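
The placeholder in the loop above can be filled in by passing the two steps in as callables, which also makes the retry logic testable offline. This is a sketch of the strategy, not CapSolver's API: `solve_with_fresh_params`, `fetch_params`, and `solve` are hypothetical names.

```python
import time
from typing import Callable, Optional


def solve_with_fresh_params(
    fetch_params: Callable[[], dict],
    solve: Callable[[dict], Optional[str]],
    max_retries: int = 3,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Re-extract parameters before every attempt so they are never stale."""
    for attempt in range(max_retries):
        params = fetch_params()
        # Only call the solver when every parameter was found on the page.
        if params and all(params.values()):
            token = solve(params)
            if token:
                return token
        # Pause between attempts, but not after the last one.
        if attempt < max_retries - 1:
            sleep(delay_seconds)
    return None
```

Here `fetch_params` could be `lambda: get_aws_waf_params(website_url)`, and `solve` a function that builds the task payload, creates the task, and polls for the cookie as in Solution 1.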

Solution 8: Using awsChallengeJS when Key, IV, Context are Absent
Sometimes, the key, iv, and context parameters might not be present on the page, but a challenge.js link is available. In such cases, passing awsChallengeJS to CapSolver is sufficient.

Steps:

  • Scrape the target page and check for the presence of challenge.js.
  • If found, extract the URL of challenge.js.
  • Submit the websiteURL and the extracted awsChallengeJS to CapSolver.
  • CapSolver will process the challenge and return the aws-waf-token.

Code Example (Python — awsChallengeJS only):

# ... (imports and API key setup)

WEBSITE_URL = "https://example.com/challenge-js-only"

def solve_aws_waf_challenge_js(website_url, capsolver_api_key):
    client = requests.Session()
    response = client.get(website_url)
    script_content = response.text

    # Escape the dot and stay inside one <script> tag so an earlier script's src is not captured.
    jschallenge_match = re.search(r'<script[^>]*src="([^"]*challenge\.js)"', script_content)
    jschallenge = jschallenge_match.group(1) if jschallenge_match else None

    if not jschallenge:
        print("Error: awsChallengeJS not found.")
        return None

    task_payload = {
        "clientKey": capsolver_api_key,
        "task": {
            "type": "AntiAwsWafTaskProxyLess",
            "websiteURL": website_url,
            "awsChallengeJS": jschallenge
        }
    }

    # ... (rest of the code for creating task and getting result remains the same as Solution 1)

Solution 9: Utilizing awsApiJs for Dynamic challenge.js
In more complex scenarios, the challenge.js URL might not be directly visible but is assembled from the code within jsapi.js. CapSolver can handle this by accepting awsApiJs.

Steps:

  • Scrape the target page and look for jsapi.js.
  • Extract the URL of jsapi.js.
  • Submit the websiteURL and the extracted awsApiJs to CapSolver.
  • CapSolver will then internally resolve the challenge.js and solve the AWS WAF challenge.

Code Example (Python — awsApiJs):

# ... (imports and API key setup)

WEBSITE_URL = "https://example.com/jsapi-challenge"

def solve_aws_waf_api_js(website_url, capsolver_api_key):
    client = requests.Session()
    response = client.get(website_url)
    script_content = response.text

    # Escape the dot and stay inside one <script> tag so an earlier script's src is not captured.
    jsapi_match = re.search(r'<script[^>]*src="([^"]*jsapi\.js)"', script_content)
    jsapi = jsapi_match.group(1) if jsapi_match else None

    if not jsapi:
        print("Error: awsApiJs not found.")
        return None

    task_payload = {
        "clientKey": capsolver_api_key,
        "task": {
            "type": "AntiAwsWafTaskProxyLess",
            "websiteURL": website_url,
            "awsApiJs": jsapi
        }
    }

    # ... (rest of the code for creating task and getting result remains the same as Solution 1)
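
Because Solutions 8 and 9 differ only in which script the page exposes, a single scan can decide which task field to send. The sketch below prefers a direct challenge.js URL and falls back to jsapi.js; that ordering is an assumption drawn from the descriptions above, and `waf_script_field` is a hypothetical helper.

```python
import re
from typing import Optional

SCRIPT_SRC_RE = re.compile(r'<script[^>]*\bsrc="([^"]+)"', re.IGNORECASE)


def waf_script_field(html: str) -> Optional[dict]:
    """Return {"awsChallengeJS": url} or {"awsApiJs": url}, preferring challenge.js."""
    sources = SCRIPT_SRC_RE.findall(html)
    for suffix, field in (("challenge.js", "awsChallengeJS"), ("jsapi.js", "awsApiJs")):
        for src in sources:
            # Ignore any query string when comparing the file name.
            if src.split("?", 1)[0].endswith("/" + suffix):
                return {field: src}
    return None
```

The returned dictionary can be merged into the `task` object alongside `type` and `websiteURL`.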

Solution 10: Advanced awsProblemUrl for Visual Challenges
For highly dynamic visual challenges where key, iv, context, and challenge.js are absent, but a problem endpoint URL is present, CapSolver can use awsProblemUrl.

Steps:

  • Scrape the page to find the problem endpoint URL, which typically contains keywords like problem and num_solutions_required.
  • This URL can often be found by searching for visualSolutionsRequired in the page HTML.
  • Submit the websiteURL and the extracted awsProblemUrl to CapSolver.
  • CapSolver will interact with this endpoint to solve the visual AWS WAF challenge.

Code Example (Python — awsProblemUrl):

# ... (imports and API key setup)

WEBSITE_URL = "https://example.com/problem-url-challenge"

def solve_aws_waf_problem_url(website_url, capsolver_api_key):
    client = requests.Session()
    response = client.get(website_url)
    script_content = response.text

    # Example of how to find awsProblemUrl (this might vary)
    problem_url_match = re.search(r'"problemUrl":"(https://.*?problem\?.*?)"', script_content)
    problem_url = problem_url_match.group(1) if problem_url_match else None

    if not problem_url:
        print("Error: awsProblemUrl not found.")
        return None

    task_payload = {
        "clientKey": capsolver_api_key,
        "task": {
            "type": "AntiAwsWafTaskProxyLess",
            "websiteURL": website_url,
            "awsProblemUrl": problem_url
        }
    }

    # ... (rest of the code for creating task and getting result remains the same as Solution 1)
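
If the page does not expose a `"problemUrl"` key, the keyword hint from the steps can be used instead: scan every URL in the page and keep the first one that mentions both `problem` and `num_solutions_required`. This is a rough sketch under that assumption; `find_problem_url` is a hypothetical helper, and URLs embedded with JSON-escaped slashes would need unescaping first.

```python
import re
from typing import Optional

# Stop at whitespace, quotes, angle brackets, or a backslash.
URL_RE = re.compile(r"""https://[^\s"'<>\\]+""")


def find_problem_url(html: str) -> Optional[str]:
    """Return the first URL that mentions both keywords, or None."""
    for url in URL_RE.findall(html):
        if "problem" in url and "num_solutions_required" in url:
            return url
    return None
```

Requiring both keywords keeps unrelated links that merely contain the word "problem" from being picked up.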

Application Scenarios and Case Studies

CapSolver’s versatility in handling AWS WAF challenges makes it invaluable across various applications. Here are a few scenarios:

Case Study 1: E-commerce Price Monitoring
A data analytics company specializing in e-commerce price monitoring faced constant disruptions due to AWS WAF challenges on major retail websites. Their existing scrapers were frequently blocked, leading to incomplete data and delayed insights. By integrating CapSolver’s AntiAwsWafTaskProxyLess, they automated the token generation process. This allowed their bots to consistently solve the WAF challenge, ensuring real-time price updates and competitive intelligence. The solution significantly reduced manual intervention and improved data accuracy by 90%.

Case Study 2: Digital Lending Platform for Enhanced Credit Scoring
A digital lending platform aimed to enhance its credit scoring models by incorporating alternative data sources, such as public financial records, social media activity, and online behavioral patterns. Many of these crucial data points were hosted on websites protected by AWS WAF, presenting frequent JavaScript challenges and CAPTCHAs. By integrating CapSolver’s AntiAwsWafTaskProxyLess and AntiAwsWafTask with sophisticated proxy management, the platform successfully automated the bypass of these WAF protections. This enabled the real-time collection of diverse data, leading to more accurate and dynamic credit risk assessments, reduced fraud rates, and more informed lending decisions.

Case Study 3: Public Legal Data Access
A compliance-focused SaaS company needed to collect publicly available legal and regulatory data, such as corporate filings, intellectual property records, and case updates. These platforms, while offering open access, deployed AWS WAF, which interrupted automated access.

By integrating CapSolver’s AntiAwsWafTaskProxyLess, the company ensured stable and automated access to these datasets without manual intervention. This allowed them to provide real-time alerts and analytics for their clients in law, finance, and compliance.

The result was a more reliable data pipeline and faster delivery of critical legal insights, helping their customers stay compliant and competitive.

Conclusion

Navigating AWS WAF challenges is an unavoidable part of modern web scraping. However, with the right tools and strategies, these obstacles can be effectively overcome. CapSolver provides a powerful, flexible, and reliable solution for solving both token-based and image-recognition AWS WAF challenges. By understanding the different scenarios and implementing the detailed solutions outlined in this guide, you can ensure your data collection efforts remain uninterrupted and efficient.
