Amazon is the world's largest e-commerce platform with over 350 million products. Whether you're tracking competitor prices, monitoring product reviews, or building a price comparison tool, Amazon data is incredibly valuable.
But scraping Amazon in 2026 is harder than ever. Let's break down what works, what doesn't, and how to do it without getting blocked or sued.
Why Scraping Amazon Is Challenging
Anti-Bot Detection
Amazon runs one of the most sophisticated anti-bot systems on the internet. Their defenses include:
- CAPTCHA challenges triggered by unusual browsing patterns
- IP fingerprinting that tracks request frequency per IP
- Browser fingerprinting detecting headless browsers and automation tools
- Dynamic page structures where CSS classes and HTML IDs change regularly
- Rate limiting that throttles or blocks IPs making too many requests
If you send more than a handful of requests per minute from the same IP, expect to see CAPTCHAs or outright blocks.
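Spotting a block early matters, because hammering a CAPTCHA page only gets your IP banned faster. A minimal detection heuristic might look like the sketch below; the marker strings are assumptions based on commonly reported Amazon block pages, not an official list:

```python
# Heuristic check for Amazon block/CAPTCHA responses.
# The marker strings below are assumptions based on commonly
# reported block pages, not an official list.
BLOCK_MARKERS = (
    "Enter the characters you see below",   # CAPTCHA interstitial
    "To discuss automated access to Amazon data",
)

def looks_blocked(status_code: int, body: str) -> bool:
    """Return True if a response looks like a block or CAPTCHA page."""
    if status_code in (403, 503):  # blocks are often served as 503
        return True
    return any(marker in body for marker in BLOCK_MARKERS)

print(looks_blocked(503, ""))                                    # True
print(looks_blocked(200, "Enter the characters you see below"))  # True
print(looks_blocked(200, "<html>normal product page</html>"))    # False
```

When this returns True, the right move is to back off, not retry immediately.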
Legal Considerations
Amazon's robots.txt explicitly disallows crawling most product pages, and their Terms of Service prohibit automated data collection. The legal landscape around web scraping has evolved (the hiQ v. LinkedIn rulings held that scraping publicly accessible data does not violate the CFAA, though hiQ ultimately lost on breach-of-contract grounds), but Amazon has actively pursued legal action against scrapers.
The safest approach? Use official APIs where possible, and if you must scrape, be respectful — low volume, reasonable delays, and never scrape behind login walls.
Method 1: Amazon Product Advertising API (Official)
The cleanest and most legal approach is Amazon's own Product Advertising API (PA-API 5.0).
What You Get
- Product titles, descriptions, and images
- Current prices and availability
- Customer ratings (aggregate, not individual reviews)
- Browse node categories
- Search results for keywords
What You Don't Get
- Individual customer reviews (text)
- Seller information
- Historical pricing data
- Real-time ranking data beyond bestseller lists
Setup
import requests
import json
import hashlib
import hmac
from datetime import datetime

# You need an Amazon Associates account to get API access
ACCESS_KEY = "your-access-key"
SECRET_KEY = "your-secret-key"
PARTNER_TAG = "your-partner-tag"

def search_amazon(keywords, category="All"):
    payload = {
        "Keywords": keywords,
        "Resources": [
            "ItemInfo.Title",
            "Offers.Listings.Price",
            "Images.Primary.Large",
            "BrowseNodeInfo.BrowseNodes"
        ],
        "SearchIndex": category,
        "ItemCount": 10,
        "PartnerTag": PARTNER_TAG,
        "PartnerType": "Associates",
        "Marketplace": "www.amazon.com"
    }
    # Sign and send the request using PA-API 5.0 signing
    # (Full signing code omitted for brevity — see AWS docs)
    response = sign_and_send(payload)  # placeholder for your SigV4-signed POST
    return response.json()
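PA-API 5.0 requests are signed with AWS Signature Version 4. The full canonical-request construction is covered in the AWS docs, but the core step, deriving the signing key, is just a chain of HMAC-SHA256 digests. A sketch (the region and service values are illustrative, taken from typical PA-API examples):

```python
import hashlib
import hmac

def sigv4_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
    """Derive the AWS SigV4 signing key via chained HMAC-SHA256."""
    key = ("AWS4" + secret_key).encode()
    for part in (date, region, service, "aws4_request"):
        key = hmac.new(key, part.encode(), hashlib.sha256).digest()
    return key

# date is YYYYMMDD; region/service here follow common PA-API examples
key = sigv4_signing_key("your-secret-key", "20260101", "us-east-1", "ProductAdvertisingAPI")
print(len(key))  # 32 — a SHA-256 digest
```

This key is then used to sign the string-to-sign for each request; the derivation is deterministic, so the same inputs always produce the same key.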
Limitations
PA-API has strict rate limits (1 request per second for new associates, scaling up based on your earnings). You also need an active Amazon Associates account with qualifying sales to maintain access.
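Staying under that quota is easy to enforce client-side. A minimal throttle sketch (the one-second default matches the entry-level limit mentioned above):

```python
import time

class Throttle:
    """Enforce a minimum interval between calls (e.g. PA-API's 1 req/s)."""
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        # Sleep just long enough to keep min_interval between calls
        now = time.monotonic()
        sleep_for = self._last + self.min_interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

throttle = Throttle(min_interval=1.0)
# for keywords in ["laptop", "headphones"]:
#     throttle.wait()
#     search_amazon(keywords)
```

Calling `wait()` before every request keeps you at or below the quota regardless of how fast the rest of your code runs.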
Method 2: Third-Party APIs
If the official API doesn't cover your needs, third-party scraping APIs handle the hard parts for you.
Rainforest API
Rainforest API is purpose-built for Amazon data. It handles proxy rotation, CAPTCHA solving, and returns structured JSON:
import requests

params = {
    "api_key": "YOUR_RAINFOREST_KEY",
    "type": "product",
    "asin": "B0BSHF7WHW",
    "amazon_domain": "amazon.com"
}

response = requests.get(
    "https://api.rainforestapi.com/request",
    params=params
)

product = response.json()["product"]
print(f"Title: {product['title']}")
print(f"Price: {product['buybox_winner']['price']['value']}")
print(f"Rating: {product['rating']}")
ScraperAPI
ScraperAPI is a more general-purpose solution that works great for Amazon. It handles proxies, browsers, and CAPTCHAs automatically:
import requests

url = "https://api.scraperapi.com"
params = {
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "url": "https://www.amazon.com/dp/B0BSHF7WHW",
    "render": "true"  # JavaScript rendering for dynamic content
}

response = requests.get(url, params=params)
# Parse the HTML response with BeautifulSoup
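Completing that last step: once ScraperAPI returns the rendered HTML, parsing works the same as it would against Amazon directly. A sketch using the commonly cited Amazon selectors (these are assumptions and can change as Amazon updates its markup):

```python
from bs4 import BeautifulSoup

def parse_product_page(html: str) -> dict:
    """Pull title and price out of a rendered Amazon product page."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("#productTitle")           # assumed selector
    price = soup.select_one(".a-price .a-offscreen")   # assumed selector
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
    }

# With the ScraperAPI response above:
# print(parse_product_page(response.text))
```

Returning `None` for missing fields instead of raising keeps a batch job alive when Amazon serves a page variant your selectors don't match.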
ScraperAPI is especially useful if you're scraping multiple sites beyond just Amazon, since the same API works for any website.
Method 3: DIY Scraping With Proxies
If you want full control, you can build your own scraper. But you'll need serious proxy infrastructure to avoid blocks.
Basic Setup
import requests
from bs4 import BeautifulSoup
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml"
}

def scrape_product(asin, proxy=None):
    url = f"https://www.amazon.com/dp/{asin}"
    proxies = {"http": proxy, "https": proxy} if proxy else None
    response = requests.get(
        url,
        headers=HEADERS,
        proxies=proxies,
        timeout=15
    )
    if response.status_code != 200:
        print(f"Blocked or error: {response.status_code}")
        return None
    soup = BeautifulSoup(response.text, "html.parser")
    title = soup.select_one("#productTitle")
    price = soup.select_one(".a-price .a-offscreen")
    rating = soup.select_one("#acrPopover span.a-size-base")
    return {
        "title": title.text.strip() if title else None,
        "price": price.text.strip() if price else None,
        "rating": rating.text.strip() if rating else None
    }

# Always add delays between requests
for asin in ["B0BSHF7WHW", "B0CHX3QBCH"]:
    result = scrape_product(asin)
    print(result)
    time.sleep(random.uniform(3, 7))  # Random delay
The Proxy Problem
Without proxies, you'll get blocked after 10-20 requests. Residential proxies are essential for Amazon scraping at any scale.
ThorData provides residential proxies that work well for e-commerce scraping. Their rotating proxy pool helps distribute requests across different IPs:
# Using ThorData residential proxies
proxy = "http://user:pass@proxy.thordata.com:9090"
result = scrape_product("B0BSHF7WHW", proxy=proxy)
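Many residential providers rotate IPs at the gateway for you, but if you hold a pool of distinct endpoints, client-side rotation is a one-liner with `itertools.cycle`. The URLs below are placeholders:

```python
import itertools

# Placeholder endpoints — substitute your provider's gateway URLs
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:9090",
    "http://user:pass@proxy2.example.com:9090",
    "http://user:pass@proxy3.example.com:9090",
]
proxy_cycle = itertools.cycle(PROXY_POOL)

# Each request goes out through the next proxy in the pool:
# for asin in asins:
#     scrape_product(asin, proxy=next(proxy_cycle))
print(next(proxy_cycle))  # first proxy in the pool
print(next(proxy_cycle))  # second proxy in the pool
```

Cycling spreads request frequency evenly across IPs, which directly counters the per-IP rate limiting described earlier.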
What to Watch Out For
- Rotate User-Agents — don't use the same one for every request
- Randomize delays — fixed intervals are a fingerprinting signal
- Handle CAPTCHAs gracefully — back off when you hit them, don't hammer
- Monitor your success rate — if it drops below 80%, slow down
- Don't scrape while logged in — that violates ToS more clearly
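The "back off when you hit a CAPTCHA" advice can be made concrete with exponential backoff plus jitter; the base delay and cap below are illustrative, not tuned values:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 5.0, cap: float = 300.0) -> float:
    """Exponential backoff with jitter: ~5s, ~10s, ~20s... capped at 5 min."""
    delay = min(cap, base * (2 ** attempt))
    # Jitter avoids the fixed-interval pattern that fingerprinting flags
    return delay * random.uniform(0.5, 1.5)

# Sketch of a retry loop around the scraper above:
# for attempt in range(5):
#     result = scrape_product(asin)
#     if result is not None:
#         break
#     time.sleep(backoff_delay(attempt))
```

Doubling the wait after each failure gives Amazon's rate limiter time to cool off, while the jitter keeps your traffic from looking machine-generated.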
Method 4: Platform-Based Scraping
Platforms like Apify provide ready-made scraping actors that run in the cloud. While we don't currently have a dedicated Amazon scraper, we do offer eBay and AliExpress scrapers as alternatives for e-commerce data. Apify's marketplace has several Amazon-specific actors built by the community.
The advantage of platform-based scraping is that you don't manage proxies, servers, or browser infrastructure — it's all handled for you.
What Data Can You Actually Get?
| Data Point | Official API | Third-Party API | DIY Scraping |
|---|---|---|---|
| Product title & description | ✅ | ✅ | ✅ |
| Current price | ✅ | ✅ | ✅ |
| Customer ratings (aggregate) | ✅ | ✅ | ✅ |
| Individual reviews | ❌ | ✅ | ✅ |
| Seller info | ❌ | ✅ | ✅ |
| Historical prices | ❌ | Some | ❌ |
| Search rankings | Limited | ✅ | ✅ |
| Images | ✅ | ✅ | ✅ |
Which Method Should You Choose?
Use the official PA-API if:
- You only need basic product data (title, price, images, ratings)
- You're building an affiliate site and already have an Associates account
- You want zero legal risk
Use a third-party API (ScraperAPI, Rainforest) if:
- You need review text, seller data, or search rankings
- You want structured data without parsing HTML
- You're scraping at moderate scale (thousands of products)
Build your own scraper if:
- You need full control over what data you collect
- You have proxy infrastructure already
- You're comfortable maintaining scrapers as Amazon changes their HTML
Final Thoughts
Amazon scraping in 2026 is a cat-and-mouse game. The official API is limited but safe. Third-party APIs are the best balance of data coverage and reliability. DIY scraping gives you the most flexibility but requires ongoing maintenance.
Whatever approach you choose, respect rate limits, add delays between requests, and stay within legal boundaries. The data is valuable — but not worth a lawsuit or a permanent IP ban.
Building scrapers for e-commerce data? Check out our Apify actors for eBay and AliExpress scraping, or use ScraperAPI and ThorData proxies to build your own.