Amazon is the most scraped website on the internet, and also one of the most actively defended. Whether you're building a price monitoring tool, competitor analysis system, or product research pipeline, choosing the wrong scraping API can mean the difference between 100% success and a wall of CAPTCHAs.
This article uses benchmark data from ScrapeOps, which independently tests scraping APIs against real websites under identical conditions. For the raw numbers on Amazon specifically, see scrapeops.io/websites/amazon/ — the source readers should check for up-to-date benchmark results.
Let's look at what actually works.
Why Amazon Is Hard to Scrape
Amazon's difficulty score in the ScrapeOps benchmark suite sits at around 40 out of 100 — moderate, not extreme. But that number is deceptive. The volume of people scraping Amazon means Amazon's defenses are tuned to a razor's edge. You're not up against generic bot detection — you're up against a system that sees millions of scraper attempts per day and has learned to fingerprint all of them.
Here's what you're actually dealing with:
1. Behavioral Fingerprinting
Amazon doesn't just check your IP or user agent — it watches how you behave. Request timing, scroll velocity, mouse movement patterns, click intervals. Scrapers are too consistent. They click at exactly the same speed every time, never pause to read, never hover over images. Amazon's JavaScript detects this and silently flags your session before you've even hit page two.
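One cheap mitigation on the scraper side is to avoid perfectly uniform request timing. A minimal sketch, not Amazon-specific, with illustrative interval values:

```python
import random
import time


def human_like_delay(base: float = 2.0, jitter: float = 1.5) -> float:
    """Sleep for a randomized interval so request timing isn't perfectly uniform."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay


# Each call waits a different amount of time within [base, base + jitter],
# instead of the fixed interval that timing analysis flags:
# for asin in asins:
#     human_like_delay()
#     fetch(asin)
```

Randomized delays won't defeat mouse-movement or scroll fingerprinting, but they do remove the single most obvious timing signature.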
2. Dynamic Content Loading
Product prices, stock status, "Buy Box" winner data, and frequently bought together sections all load via JavaScript after the initial HTML response. A naive requests + BeautifulSoup pipeline won't see any of it. You'll scrape a shell of the page and wonder why your price fields are empty.
3. CAPTCHA Challenges
Once Amazon suspects a bot, it serves a CAPTCHA. After a few CAPTCHA failures, your IP gets a temporary block — typically 24 to 72 hours. With residential proxies, you can rotate around this, but cheap proxy pools are already burned. Amazon maintains reputation databases for known proxy IP ranges.
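In practice you want to detect the CAPTCHA interstitial before parsing, so a blocked response isn't silently counted as a success. A minimal sketch; the marker strings below reflect Amazon's current CAPTCHA page and may change without notice:

```python
# Strings that appear in Amazon's CAPTCHA interstitial (subject to change)
CAPTCHA_MARKERS = (
    "validateCaptcha",              # CAPTCHA form action path
    "Type the characters you see",  # prompt text on the interstitial
)


def looks_like_captcha(html: str) -> bool:
    """Return True if the response body appears to be Amazon's CAPTCHA page."""
    return any(marker in html for marker in CAPTCHA_MARKERS)
```

Run this check on every response body and treat a hit as a failure to retry through a different IP, not as data to parse.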
4. Rate Limiting and IP Reputation
Even with clean IPs and realistic headers, hammering Amazon at high volume triggers IP-level throttling. The safe request rate for a single IP is surprisingly low — somewhere in the range of 1 request every 2-5 seconds for sustained scraping. Anything faster and you start seeing 503s.
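That per-IP budget is straightforward to enforce with a small limiter that spaces requests per exit IP. A sketch using the 2-5 second range above:

```python
import time


class PerIPRateLimiter:
    """Enforce a minimum interval between requests made through the same IP."""

    def __init__(self, min_interval: float = 2.0):
        self.min_interval = min_interval
        self._last_request: dict[str, float] = {}  # ip -> monotonic timestamp

    def wait(self, ip: str) -> None:
        """Block until at least min_interval has passed since this IP's last request."""
        now = time.monotonic()
        last = self._last_request.get(ip)
        if last is not None:
            remaining = self.min_interval - (now - last)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request[ip] = time.monotonic()
```

With a pool of N IPs each throttled to one request per 2 seconds, your sustained ceiling is N/2 requests per second, which is the arithmetic that makes large proxy pools necessary at volume.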
5. Geofencing and Localized Content
Amazon serves different prices, availability, and even product listings by geography. If you're scraping from EU IPs for US Amazon data, you may get regional redirects or localized content that doesn't match what your users see. Most enterprise scraping APIs handle geo-targeting, but you need to configure it explicitly.
6. Cookie and Session State
Amazon uses cookies to track session behavior. Scrapers that rotate IPs without rotating corresponding cookies, or that reuse session state across unrelated requests, get flagged quickly. Proper session management is a surprisingly common failure mode.
Benchmark Results: How APIs Perform on Amazon
The following data is drawn from the ScrapeOps Amazon benchmark — an independent test suite that measures success rates, latency, and actual cost per successful request across providers. Check the source for the latest numbers, as these are updated periodically.
| Provider | Success Rate | Avg Latency | Cost per 1K Successful Requests |
|---|---|---|---|
| ScraperAPI | ~100% | ~5.4s | ~$745 |
| Scrapfly | ~100% | ~17.4s | ~$800 |
| Scrapingant | ~93% | ~12.3s | ~$190 |
| Scrape.do | ~87% | ~5.8s | ~$290 |
| Zyte API | ~73% | ~10.2s | ~$230 |
| Scrapingdog | ~67% | ~17.3s | ~$200 |
| ZenRows | ~40% | ~5.5s | ~$276 |
What the numbers tell you:
- Top performers on success rate are ScraperAPI and Scrapfly, both hitting near-100% on Amazon. The gap between these and the third tier (87% and below) is significant at scale — a 13% failure rate on 100,000 requests means 13,000 wasted calls you still pay for.
- Latency varies 3x between providers. ScraperAPI's ~5.4s average versus Scrapfly's ~17.4s matters when you're polling thousands of ASINs. At 100% success with 17s latency, you can process ~3 ASINs per minute per concurrent thread. At 5.4s, you get roughly 11.
- Cost efficiency matters more than sticker price. ZenRows looks cheap per request, but at 40% success, your effective cost per successful request is higher than providers charging more per call. Always calculate cost-per-success, not cost-per-request.
- Scrape.do is the speed-value balance point. 87% success at ~5.8s latency and ~$290/1K successful requests makes it competitive for teams where budget is a constraint but reliability still matters.
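The cost-per-success arithmetic is worth automating before comparing providers. A small helper; the sticker prices in the comments are illustrative, not quoted from any provider:

```python
def cost_per_success(price_per_1k_requests: float, success_rate: float) -> float:
    """Effective cost per 1,000 *successful* requests, given a sticker price
    per 1,000 attempted requests and a success rate in (0, 1]."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return price_per_1k_requests / success_rate


# A provider at a hypothetical $110/1K attempts with 40% success is pricier
# per result than one at $200/1K attempts with 93% success:
# cost_per_success(110, 0.40) -> 275.0
# cost_per_success(200, 0.93) -> ~215.05
```

The division is trivial, but running it on every quote you receive prevents the classic mistake of picking the lowest per-request price.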
Source: ScrapeOps Amazon Benchmark
Practical Python: Scraping Amazon Product Pages
Here's a working pattern for scraping Amazon product pages using ScraperAPI. The same structure applies to any proxy-based scraping API — swap the endpoint and API key.
```python
import requests
import time
from bs4 import BeautifulSoup

SCRAPERAPI_KEY = "your_api_key_here"
SCRAPERAPI_URL = "http://api.scraperapi.com"


def scrape_amazon_product(asin: str, country: str = "us") -> dict:
    """
    Scrape an Amazon product page by ASIN.
    Returns a dict with title, price, rating, review_count.
    """
    target_url = f"https://www.amazon.com/dp/{asin}"
    params = {
        "api_key": SCRAPERAPI_KEY,
        "url": target_url,
        "country_code": country,
        "render": "true",   # JavaScript rendering
        "premium": "true",  # Residential IP pool
    }
    try:
        response = requests.get(SCRAPERAPI_URL, params=params, timeout=60)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        print(f"Request failed for ASIN {asin}: {e}")
        return {}

    soup = BeautifulSoup(response.text, "html.parser")

    # Extract product title
    title_el = soup.select_one("#productTitle")
    title = title_el.get_text(strip=True) if title_el else None

    # Extract price (handles split dollar/cents format)
    price = None
    price_whole = soup.select_one(".a-price-whole")
    price_fraction = soup.select_one(".a-price-fraction")
    if price_whole and price_fraction:
        price = f"${price_whole.get_text(strip=True)}{price_fraction.get_text(strip=True)}"
    elif price_whole:
        price = f"${price_whole.get_text(strip=True)}"

    # Extract rating (primary selector, then fallback)
    rating_el = soup.select_one("span[data-hook='rating-out-of-text']")
    if not rating_el:
        rating_el = soup.select_one(".a-icon-alt")
    rating = rating_el.get_text(strip=True) if rating_el else None

    # Extract review count
    review_el = soup.select_one("#acrCustomerReviewText")
    review_count = review_el.get_text(strip=True) if review_el else None

    return {
        "asin": asin,
        "title": title,
        "price": price,
        "rating": rating,
        "review_count": review_count,
        "url": target_url,
    }


def batch_scrape(asins: list, delay: float = 1.0) -> list:
    """
    Scrape a list of ASINs with a delay between requests.
    Returns list of product dicts.
    """
    results = []
    for i, asin in enumerate(asins):
        print(f"Scraping {i + 1}/{len(asins)}: {asin}")
        data = scrape_amazon_product(asin)
        if data:
            results.append(data)
        time.sleep(delay)
    return results


# Example usage
if __name__ == "__main__":
    test_asins = ["B09G9FPHY6", "B0BDHX8Z63", "B07XJ8C8F7"]
    products = batch_scrape(test_asins, delay=1.5)
    for p in products:
        title = (p["title"] or "")[:50]  # guard against missing titles
        print(f"{p['asin']}: {title}... | {p['price']} | {p['rating']}")
```
Key things this code handles:
- `"render": "true"` triggers JavaScript execution so dynamic content (prices, stock) loads before the page is returned.
- `"premium": "true"` routes through residential IP pools rather than datacenter IPs, reducing CAPTCHA frequency significantly.
- `country_code` pins the geo so you get consistent US Amazon results regardless of where your server is located.
- The `batch_scrape` function adds a delay between requests — even through a proxy API, hammering requests too fast can exhaust your credits on retries.
For production use, wrap this in a retry loop with exponential backoff. Even at 100% success rates, network errors happen.
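A sketch of that retry wrapper, written generically so it can wrap the scraper above or any fetch function that returns an empty dict on failure; the backoff parameters are illustrative:

```python
import time
from typing import Callable


def with_backoff(fn: Callable[[], dict], max_attempts: int = 4,
                 base_delay: float = 2.0) -> dict:
    """Call fn until it returns a non-empty dict, sleeping base_delay * 2**attempt
    between failures (e.g. 2s, 4s, 8s). Returns {} if every attempt fails."""
    for attempt in range(max_attempts):
        data = fn()
        if data:
            return data
        if attempt < max_attempts - 1:
            time.sleep(base_delay * (2 ** attempt))
    return {}


# Hypothetical wiring with the scraper defined earlier:
# product = with_backoff(lambda: scrape_amazon_product("B09G9FPHY6"))
```

The exponential schedule matters: back-to-back retries against a throttled endpoint tend to fail identically, while spaced retries often land on a fresh IP and succeed.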
Choosing the Right API for Your Use Case
There is no single right answer — it depends on your volume, budget, and tolerance for failure.
High-volume price monitoring (100K+ requests/month): Prioritize success rate above everything else. At scale, a 13% failure rate costs you real money in credits and missed data. ScraperAPI or Scrapfly are worth the premium.
Research and one-off scraping (under 10K requests): Cost-efficiency matters more than marginal success rate improvements. Scrapingant's combination of reasonable success rate and low cost per request makes it a solid choice for periodic jobs.
Speed-sensitive applications (real-time price comparison): Latency is your bottleneck. ScraperAPI's ~5.4s average is significantly better than Scrapfly's ~17.4s. If you're serving live pricing to end users, that difference is user-visible.
Budget-constrained teams: Scrape.do offers a workable balance — not the highest success rate, but fast and reasonably priced. Use it when you need to get something working without a large monthly commitment.
What the Benchmark Doesn't Tell You
ScrapeOps benchmark data is excellent but it's a snapshot. A few things to account for in practice:
Specific page types vary. Amazon product pages (PDPs) behave differently from search result pages (SERPs), category pages, and review pages. A provider that hits 100% on product pages may struggle on search. Check the benchmark for the specific page type you're targeting.
Results change over time. Amazon periodically updates its bot detection. A provider's success rate can drop 20 points in a month after Amazon pushes a detection update. Providers scramble to adapt. Track your own success rates in production and be ready to switch.
Geographic variation matters. If you need UK, DE, or JP Amazon data alongside US, test each region. Not all providers perform equally across geographies.
Get Started
- ScraperAPI — Use code `SCRAPE13833889` for 50% off your first month.
- Scrape.do — Fast response times with competitive pricing for mid-volume scraping.
- ScrapeOps — The benchmark source, plus proxy aggregator and monitoring tools. Worth using for tracking your own scraper health in production.
Want the full breakdown — including code for Google, LinkedIn, Zillow, and Reddit scraping, plus a proxy selection guide and cost calculator? Get the full guide: The Complete Web Scraping Playbook 2026 — 48 pages, $9.