This blog was initially posted to Crawlbase Blog
This guide shows you how to get past Amazon proxy blocks and scrape data reliably using Crawlbase’s Smart Proxy, giving you a complete walkthrough for scraping Amazon effectively.
Setting Up Your Coding Environment
Before building your Amazon proxy unblocker, you'll need to set up a basic Python environment. Here's how to get started:
- Install Python 3 on your computer
- Install the requests module, which makes it easy to send HTTP requests in Python:

python -m pip install requests
Note: You can write and run your code using any text editor, but using an IDE can speed things up. Tools like PyCharm or VS Code are great for writing Python code, especially for beginners, as they include helpful features like syntax highlighting, error checking, and debugging tools.
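To confirm that everything is installed correctly, you can run a quick check like the following. This is a minimal sketch; the printed version numbers will vary on your machine.

import sys
import requests

print("Python:", sys.version.split()[0])    # expect a 3.x version
print("requests:", requests.__version__)    # confirms the module is importable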
Obtaining Credentials
- Sign up for a Crawlbase account and log in to receive your 5,000 free requests
- Get your Smart Proxy Private token
Making Your First Successful Request
At this point, your coding environment should be ready. Let’s try sending your first request.
In this example code, we'll fetch the HTML content of an Amazon product details page. You are free to copy this code, but make sure to replace the <Normal requests token> placeholder with the actual token obtained from your Crawlbase account.
import requests
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

url_to_crawl = "https://www.amazon.com/Apple-iPhone-Silicone-Case-MagSafe/dp/B0CHX2XFLN"
crawlbase_normal_token = "<Normal requests token>"
crawlbase_smart_proxy_url = (
    f"https://{crawlbase_normal_token}:@smartproxy.crawlbase.com:8013"
)

try:
    response = requests.get(
        url=url_to_crawl,
        proxies={
            "http": crawlbase_smart_proxy_url,
            "https": crawlbase_smart_proxy_url,
        },
        verify=False,
        timeout=30,
    )
    response.raise_for_status()

    print("Response Code:", response.status_code)
    print("Response Body:", response.text)
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
You may refer to our GitHub repository for the source code.
Key Things to Know
- Smart Proxy URL: The format https://<TOKEN>:@smartproxy.crawlbase.com:8013 is how authentication is handled. Your token is used as the username in the proxy connection; a reusable Session-based setup is sketched after this list.
- verify=False: This disables SSL verification on the client side, which is required here because SSL is handled by the proxy itself, as noted in the Smart Proxy documentation.
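If you plan to send more than one request, a requests.Session lets you configure the proxy and SSL settings once and reuse them for every call. This is a minimal sketch, not from the original example; the token placeholder and product URL mirror the code above.

import requests
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

token = "<Normal requests token>"  # same placeholder as in the example above
proxy_url = f"https://{token}:@smartproxy.crawlbase.com:8013"

session = requests.Session()
session.proxies = {"http": proxy_url, "https": proxy_url}
session.verify = False  # SSL is terminated by the proxy, per the Smart Proxy docs

urls = [
    "https://www.amazon.com/Apple-iPhone-Silicone-Case-MagSafe/dp/B0CHX2XFLN",
    # ...add more product URLs here
]
for url in urls:
    response = session.get(url, timeout=30)
    print(url, "->", response.status_code)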
Once you run this code, you should see a 200 response code along with the full HTML of the Amazon product page.
Unblocking Amazon with Smart Proxy: A Practical Use Case
Now, let’s put what you’ve learned into practice. We’ll show you how to extract a list of reviews from an Amazon product page and save the data to a CSV file.
Extracting Specific Data
We'll use Crawlbase's Data Scraper feature, specifically the Amazon-product-details scraper, by passing it via the CrawlbaseAPI-Parameters header. This allows our code to automatically parse the Amazon page and return clean, structured JSON data.
import requests
import json
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

url_to_crawl = "https://www.amazon.com/Apple-iPhone-Silicone-Case-MagSafe/dp/B0CHX2XFLN"
crawlbase_normal_token = "<Normal requests token>"
crawlbase_crawling_api_parameters = "scraper=amazon-product-details"
crawlbase_smart_proxy_url = (
    f"https://{crawlbase_normal_token}:@smartproxy.crawlbase.com:8013"
)

try:
    response = requests.get(
        url=url_to_crawl,
        headers={"CrawlbaseAPI-Parameters": crawlbase_crawling_api_parameters},
        proxies={"http": crawlbase_smart_proxy_url, "https": crawlbase_smart_proxy_url},
        verify=False,
        timeout=30,
    )
    response.raise_for_status()

    json_data = json.loads(response.text)
    product_reviews = json_data["body"]["reviews"]

    for review in product_reviews:
        # TODO: save the values to a CSV file; print to the console for now
        print("--------------------")
        print("Author: ", review["reviewerName"])
        print("Rating: ", review["reviewRating"])
        print("  Date: ", review["reviewDate"])
        print("Review: ", review["reviewText"])
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
You may refer to our GitHub repository for the source code.
How This Works
- CrawlbaseAPI-Parameters: The scraper=amazon-product-details parameter tells Crawlbase to analyze the product page and return structured JSON that includes reviews, ratings, product info, etc.
- Print JSON response: We extract the list of reviews from json_data["body"]["reviews"] and loop through them. For each product review, we print the Author, Rating, Date, and Review text; a more defensive variant is sketched after this list.
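Note that if a review happens to be missing one of these fields, indexing with review["..."] raises a KeyError. A slightly more defensive loop uses dict.get() with a fallback value. This is a minimal sketch, assuming product_reviews has been populated as in the code above:

for review in product_reviews:
    print("--------------------")
    # .get() returns the fallback instead of raising when a key is absent
    print("Author: ", review.get("reviewerName", "N/A"))
    print("Rating: ", review.get("reviewRating", "N/A"))
    print("  Date: ", review.get("reviewDate", "N/A"))
    print("Review: ", review.get("reviewText", ""))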
Compiling Extracted Data into CSV
Lastly, you could easily modify the code to save the reviews to a CSV file for later analysis. Here's an example of how to save the data.
import requests
import json
import csv  # newly added code
from urllib3.exceptions import InsecureRequestWarning

requests.packages.urllib3.disable_warnings(category=InsecureRequestWarning)

url_to_crawl = "https://www.amazon.com/Apple-iPhone-Silicone-Case-MagSafe/dp/B0CHX2XFLN"
crawlbase_normal_token = "<Normal requests token>"
crawlbase_crawling_api_parameters = "scraper=amazon-product-details"
crawlbase_smart_proxy_url = (
    f"https://{crawlbase_normal_token}:@smartproxy.crawlbase.com:8013"
)

try:
    response = requests.get(
        url=url_to_crawl,
        headers={"CrawlbaseAPI-Parameters": crawlbase_crawling_api_parameters},
        proxies={"http": crawlbase_smart_proxy_url, "https": crawlbase_smart_proxy_url},
        verify=False,
        timeout=30,
    )
    response.raise_for_status()

    json_data = json.loads(response.text)
    product_reviews = json_data["body"]["reviews"]

    # start of newly replaced code
    with open("product_reviews.csv", "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(["Author", "Rating", "Date", "Review"])  # Header
        for review in product_reviews:
            writer.writerow(
                [
                    review["reviewerName"],
                    review["reviewRating"],
                    review["reviewDate"],
                    review["reviewText"],
                ]
            )
    # end of newly replaced code
except requests.exceptions.RequestException as e:
    print(f"An error occurred: {e}")
You may refer to our GitHub repository for the source code.
This simple snippet writes the reviews to a new CSV file named product_reviews.csv.
This is a basic example of how to interact with Amazon’s product pages, and you can adapt the script for different tasks, such as extracting other product details like prices, ASIN values, and descriptions; a hedged sketch follows below.
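For instance, the same JSON body that contains the reviews also carries other product fields. The key names below (name, price, asin, description) are assumptions shown for illustration only; inspect the JSON returned for your product page to confirm the exact keys.

# Hypothetical key names; verify them against the actual scraper output.
product = json_data["body"]  # same JSON body the reviews came from
for field in ("name", "price", "asin", "description"):
    print(f"{field}: {product.get(field, '<not present in this response>')}")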
We've published the complete code for this solution on GitHub. You can view it here.