This blog was initially posted to Crawlbase Blog
This guide shows you how to get past Amazon proxy blocks and scrape data reliably using Crawlbase’s Smart Proxy, giving you a complete walkthrough for scraping Amazon effectively.
Setting Up Your Coding Environment
Before building your Amazon proxy unblocker, you'll need to set up a basic Python environment. Here's how to get started:
- Install Python 3 on your computer
- Install the requests module, which makes it easy to send HTTP requests in Python:

python -m pip install requests
Note: You can write and run your code using any text editor, but using an IDE can speed things up. Tools like PyCharm or VS Code are great for writing Python code, especially for beginners, as they include helpful features like syntax highlighting, error checking, and debugging tools.
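To confirm that everything is installed correctly, you can run a quick check like the following. This is a minimal sketch; the printed version numbers will vary on your machine.

import sys
import requests

print("Python:", sys.version.split()[0])    # expect a 3.x version
print("requests:", requests.__version__)    # confirms the module is importable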
Obtaining Credentials
- Sign up for a Crawlbase account and log in to receive your 5,000 free requests
- Get your Smart Proxy Private token
Making Your First Successful Request
At this point, your coding environment should be ready. Let’s try sending your first request.
In this example code, we'll fetch the HTML content of an Amazon product details page. You are free to copy this code, but make sure to replace the <Normal requests token> placeholder with the actual token obtained from your Crawlbase account.
import requests
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

url_to_crawl = "https://www.amazon.com/Apple-iPhone-Silicone-Case-MagSafe/dp/B0CHX2XFLN"
crawlbase_normal_token = "<Normal requests token>"
crawlbase_smart_proxy_url = (
    f"https://{crawlbase_normal_token}:@smartproxy.crawlbase.com:8013"
)

try:
    response = requests.get(
        url=url_to_crawl,
        proxies={
            "http": crawlbase_smart_proxy_url,
            "https": crawlbase_smart_proxy_url,
        },
        verify=False,
        timeout=30,
    )
    response.raise_for_status()

    print("Response Code:", response.status_code)
    print("Response Body:", response.text)
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
You may refer to our GitHub repository for the source code.
Key Things to Know
- Smart Proxy URL: The format https://<TOKEN>:@smartproxy.crawlbase.com:8013 is how authentication is handled. Your token is used as the username in the proxy connection; a reusable Session-based setup is sketched after this list.
- verify=False: This disables SSL verification on the client side, which is required here because SSL is handled by the proxy itself, as noted in the Smart Proxy documentation.
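If you plan to send more than one request, a requests.Session lets you configure the proxy and SSL settings once and reuse them for every call. This is a minimal sketch, not from the original example; the token placeholder and product URL mirror the code above.

import requests
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

token = "<Normal requests token>"  # same placeholder as in the example above
proxy_url = f"https://{token}:@smartproxy.crawlbase.com:8013"

session = requests.Session()
session.proxies = {"http": proxy_url, "https": proxy_url}
session.verify = False  # SSL is terminated by the proxy, per the Smart Proxy docs

urls = [
    "https://www.amazon.com/Apple-iPhone-Silicone-Case-MagSafe/dp/B0CHX2XFLN",
    # ...add more product URLs here
]
for url in urls:
    response = session.get(url, timeout=30)
    print(url, "->", response.status_code)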
Once you run this code, you should see a 200 response code along with the full HTML of the Amazon product page.
Unblocking Amazon with Smart Proxy: A Practical Use Case
Now, let’s put what you’ve learned into practice. We’ll show you how to extract a list of reviews from an Amazon product page and save the data to a CSV file.
Extracting Specific Data
We'll use Crawlbase's Data Scraper feature, specifically the Amazon-product-details scraper, by passing it via the CrawlbaseAPI-Parameters header. This allows our code to automatically parse the Amazon page and return clean, structured JSON data.
import requests
import json
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

url_to_crawl = "https://www.amazon.com/Apple-iPhone-Silicone-Case-MagSafe/dp/B0CHX2XFLN"
crawlbase_normal_token = "<Normal requests token>"
crawlbase_crawling_api_parameters = "scraper=amazon-product-details"
crawlbase_smart_proxy_url = (
    f"https://{crawlbase_normal_token}:@smartproxy.crawlbase.com:8013"
)

try:
    response = requests.get(
        url=url_to_crawl,
        headers={"CrawlbaseAPI-Parameters": crawlbase_crawling_api_parameters},
        proxies={"http": crawlbase_smart_proxy_url, "https": crawlbase_smart_proxy_url},
        verify=False,
        timeout=30,
    )
    response.raise_for_status()

    json_data = json.loads(response.text)
    product_reviews = json_data["body"]["reviews"]

    for review in product_reviews:
        # TODO: save the values to a CSV file; print to the console for now
        print("--------------------")
        print("Author: ", review["reviewerName"])
        print("Rating: ", review["reviewRating"])
        print("  Date: ", review["reviewDate"])
        print("Review: ", review["reviewText"])
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
You may refer to our GitHub repository for the source code.
How This Works
- CrawlbaseAPI-Parameters: The scraper=amazon-product-details parameter tells Crawlbase to analyze the product page and return structured JSON that includes reviews, ratings, product info, etc.
- Print JSON response: We extract the list of reviews from json_data["body"]["reviews"] and loop through them. For each product review, we print the Author, Rating, Date, and Review text; a more defensive variant is sketched after this list.
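Note that if a review happens to be missing one of these fields, indexing with review["..."] raises a KeyError. A slightly more defensive loop uses dict.get() with a fallback value. This is a minimal sketch, assuming product_reviews has been populated as in the code above:

for review in product_reviews:
    print("--------------------")
    # .get() returns the fallback instead of raising when a key is absent
    print("Author: ", review.get("reviewerName", "N/A"))
    print("Rating: ", review.get("reviewRating", "N/A"))
    print("  Date: ", review.get("reviewDate", "N/A"))
    print("Review: ", review.get("reviewText", ""))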
Compiling Extracted Data into CSV
Lastly, you could easily modify the code to save the reviews to a CSV file for later analysis. Here's an example of how to save the data.
import requests
import json
import csv  # newly added code
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

url_to_crawl = "https://www.amazon.com/Apple-iPhone-Silicone-Case-MagSafe/dp/B0CHX2XFLN"
crawlbase_normal_token = "<Normal requests token>"
crawlbase_crawling_api_parameters = "scraper=amazon-product-details"
crawlbase_smart_proxy_url = (
    f"https://{crawlbase_normal_token}:@smartproxy.crawlbase.com:8013"
)

try:
    response = requests.get(
        url=url_to_crawl,
        headers={"CrawlbaseAPI-Parameters": crawlbase_crawling_api_parameters},
        proxies={"http": crawlbase_smart_proxy_url, "https": crawlbase_smart_proxy_url},
        verify=False,
        timeout=30,
    )
    response.raise_for_status()

    json_data = json.loads(response.text)
    product_reviews = json_data["body"]["reviews"]

    # start of newly replaced code
    with open("product_reviews.csv", "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["Author", "Rating", "Date", "Review"])  # Header
        for review in product_reviews:
            writer.writerow(
                [
                    review["reviewerName"],
                    review["reviewRating"],
                    review["reviewDate"],
                    review["reviewText"],
                ]
            )
    # end of newly replaced code
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
You may refer to our GitHub repository for the source code.
This simple snippet writes the reviews to a new CSV file named product_reviews.csv.
This is a basic example of how to interact with Amazon’s product pages, and you can adapt the script for different tasks, such as extracting other product details like prices, ASIN values, and descriptions; a hedged sketch follows below.
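For instance, the same JSON body that contains the reviews also carries other product fields. The key names below (name, price, asin, description) are assumptions shown for illustration only; inspect the JSON returned for your product page to confirm the exact keys.

# Hypothetical key names; verify them against the actual scraper output.
product = json_data["body"]  # same JSON body the reviews came from
for field in ("name", "price", "asin", "description"):
    print(f"{field}: {product.get(field, '<not present in this response>')}")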
We've published the complete code for this solution on GitHub. You can view it here.