agenthustler

How to Scrape Amazon in 2026: Products, Reviews, Prices, and Rankings

Amazon is the world's largest e-commerce platform with over 350 million products. Whether you're tracking competitor prices, monitoring product reviews, or building a price comparison tool, Amazon data is incredibly valuable.

But scraping Amazon in 2026 is harder than ever. Let's break down what works, what doesn't, and how to do it without getting blocked or sued.

Why Scraping Amazon Is Challenging

Anti-Bot Detection

Amazon runs one of the most sophisticated anti-bot systems on the internet. Their defenses include:

  • CAPTCHA challenges triggered by unusual browsing patterns
  • IP fingerprinting that tracks request frequency per IP
  • Browser fingerprinting detecting headless browsers and automation tools
  • Dynamic page structures where CSS classes and HTML IDs change regularly
  • Rate limiting that throttles or blocks IPs making too many requests

If you send more than a handful of requests per minute from the same IP, expect to see CAPTCHAs or outright blocks.

Legal Considerations

Amazon's robots.txt explicitly disallows crawling most product pages, and their Terms of Service prohibit automated data collection. While the legal landscape around web scraping has evolved (hiQ v. LinkedIn established that scraping publicly accessible data does not by itself violate the Computer Fraud and Abuse Act), Amazon has actively pursued legal action against scrapers.

The safest approach? Use official APIs where possible, and if you must scrape, be respectful — low volume, reasonable delays, and never scrape behind login walls.
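
If you do scrape, it's worth checking paths against robots.txt programmatically before fetching them. A minimal sketch using Python's standard library — the sample rules below are illustrative only, not Amazon's actual file (fetch https://www.amazon.com/robots.txt for the real rules):

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, path: str, agent: str = "*") -> bool:
    """Check a path against robots.txt rules without hitting the network."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, f"https://www.amazon.com{path}")

# Illustrative rules only -- not Amazon's real robots.txt
SAMPLE = """\
User-agent: *
Disallow: /gp/cart
Disallow: /dp/product-availability
"""

print(is_allowed(SAMPLE, "/dp/B0BSHF7WHW"))  # True
print(is_allowed(SAMPLE, "/gp/cart/view"))   # False
```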

Method 1: Amazon Product Advertising API (Official)

The cleanest and legally safest approach is Amazon's own Product Advertising API (PA-API 5.0).

What You Get

  • Product titles, descriptions, and images
  • Current prices and availability
  • Customer ratings (aggregate, not individual reviews)
  • Browse node categories
  • Search results for keywords

What You Don't Get

  • Individual customer reviews (text)
  • Seller information
  • Historical pricing data
  • Real-time ranking data beyond bestseller lists

Setup

import requests

# You need an Amazon Associates account to get API access
ACCESS_KEY = "your-access-key"
SECRET_KEY = "your-secret-key"
PARTNER_TAG = "your-partner-tag"

ENDPOINT = "https://webservices.amazon.com/paapi5/searchitems"

def search_amazon(keywords, category="All"):
    payload = {
        "Keywords": keywords,
        "Resources": [
            "ItemInfo.Title",
            "Offers.Listings.Price",
            "Images.Primary.Large",
            "BrowseNodeInfo.BrowseNodes"
        ],
        "SearchIndex": category,
        "ItemCount": 10,
        "PartnerTag": PARTNER_TAG,
        "PartnerType": "Associates",
        "Marketplace": "www.amazon.com"
    }
    # PA-API 5.0 requires AWS Signature Version 4 signing on every
    # request. Full signing code omitted for brevity -- see the
    # PA-API docs, or use an official SDK that signs for you.
    headers = sign_request(payload, ACCESS_KEY, SECRET_KEY)  # your signing helper
    response = requests.post(ENDPOINT, json=payload, headers=headers)
    return response.json()

Limitations

PA-API has strict rate limits (1 request per second for new associates, scaling up based on your earnings). You also need an active Amazon Associates account with qualifying sales to maintain access.
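
To stay under that 1-request-per-second cap, it helps to throttle on the client side rather than hoping your loop runs slowly enough. A minimal sketch (the `fetch` call in the usage comment is a placeholder for your own PA-API call):

```python
import time

class RateLimiter:
    """Client-side throttle: at most one call per `interval` seconds."""

    def __init__(self, interval: float = 1.0):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        # Sleep just long enough that calls are spaced by `interval`
        now = time.monotonic()
        sleep_for = self._last + self.interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self._last = time.monotonic()

limiter = RateLimiter(interval=1.0)
# for asin in asins:
#     limiter.wait()
#     fetch(asin)  # your PA-API call
```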

Method 2: Third-Party APIs

If the official API doesn't cover your needs, third-party scraping APIs handle the hard parts for you.

Rainforest API

Rainforest API is purpose-built for Amazon data. It handles proxy rotation, CAPTCHA solving, and returns structured JSON:

import requests

params = {
    "api_key": "YOUR_RAINFOREST_KEY",
    "type": "product",
    "asin": "B0BSHF7WHW",
    "amazon_domain": "amazon.com"
}

response = requests.get(
    "https://api.rainforestapi.com/request",
    params=params
)
product = response.json()["product"]
print(f"Title: {product['title']}")
print(f"Price: {product['buybox_winner']['price']['value']}")
print(f"Rating: {product['rating']}")

ScraperAPI

ScraperAPI is a more general-purpose solution that works great for Amazon. It handles proxies, browsers, and CAPTCHAs automatically:

import requests

url = "https://api.scraperapi.com"
params = {
    "api_key": "YOUR_SCRAPERAPI_KEY",
    "url": "https://www.amazon.com/dp/B0BSHF7WHW",
    "render": "true"  # JavaScript rendering for dynamic content
}

response = requests.get(url, params=params)
# Parse the HTML response with BeautifulSoup

ScraperAPI is especially useful if you're scraping multiple sites beyond just Amazon, since the same API works for any website.
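
Since ScraperAPI returns raw HTML, you still parse it yourself. A small sketch with BeautifulSoup — the selectors are examples that match Amazon's current markup and will break when the page structure changes:

```python
from bs4 import BeautifulSoup

def parse_product_html(html: str) -> dict:
    """Pull a few fields out of an Amazon product page.

    Selectors are illustrative and fragile -- verify them against
    the live page before relying on them.
    """
    soup = BeautifulSoup(html, "html.parser")
    title = soup.select_one("#productTitle")
    price = soup.select_one(".a-price .a-offscreen")
    return {
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
    }
```

Returning `None` for missing fields (instead of raising) makes it easier to log partial failures when Amazon serves a variant page layout.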

Method 3: DIY Scraping With Proxies

If you want full control, you can build your own scraper. But you'll need serious proxy infrastructure to avoid blocks.

Basic Setup

import requests
from bs4 import BeautifulSoup
import time
import random

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "text/html,application/xhtml+xml"
}

def scrape_product(asin, proxy=None):
    url = f"https://www.amazon.com/dp/{asin}"
    proxies = {"http": proxy, "https": proxy} if proxy else None

    response = requests.get(
        url,
        headers=HEADERS,
        proxies=proxies,
        timeout=15
    )

    if response.status_code != 200:
        print(f"Blocked or error: {response.status_code}")
        return None

    soup = BeautifulSoup(response.text, "html.parser")

    title = soup.select_one("#productTitle")
    price = soup.select_one(".a-price .a-offscreen")
    rating = soup.select_one("#acrPopover span.a-size-base")

    return {
        "title": title.text.strip() if title else None,
        "price": price.text.strip() if price else None,
        "rating": rating.text.strip() if rating else None
    }

# Always add delays between requests
for asin in ["B0BSHF7WHW", "B0CHX3QBCH"]:
    result = scrape_product(asin)
    print(result)
    time.sleep(random.uniform(3, 7))  # Random delay

The Proxy Problem

Without proxies, you'll get blocked after 10-20 requests. Residential proxies are essential for Amazon scraping at any scale.

ThorData provides residential proxies that work well for e-commerce scraping. Their rotating proxy pool helps distribute requests across different IPs:

# Using ThorData residential proxies
proxy = "http://user:pass@proxy.thordata.com:9090"
result = scrape_product("B0BSHF7WHW", proxy=proxy)

What to Watch Out For

  1. Rotate User-Agents — don't use the same one for every request
  2. Randomize delays — fixed intervals are a fingerprinting signal
  3. Handle CAPTCHAs gracefully — back off when you hit them, don't hammer
  4. Monitor your success rate — if it drops below 80%, slow down
  5. Don't scrape while logged in — that violates ToS more clearly
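
The points above can be tied together in one fetch loop. A minimal sketch, assuming placeholder proxy URLs and User-Agent strings (substitute your own):

```python
import random
import time

import requests

# Hypothetical pools -- replace with your own UA strings and proxies
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
]
PROXIES = [
    "http://user:pass@proxy1.example.com:9090",
    "http://user:pass@proxy2.example.com:9090",
]

def backoff_delay(attempt: int) -> float:
    """Exponential backoff with jitter: ~2-4s, ~4-8s, ~8-16s, ..."""
    return (2 ** attempt) * random.uniform(2, 4)

def fetch(url: str, max_retries: int = 3):
    """Rotate User-Agent and proxy per attempt; back off on blocks."""
    for attempt in range(max_retries):
        proxy = random.choice(PROXIES)
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            resp = requests.get(url, headers=headers,
                                proxies={"http": proxy, "https": proxy},
                                timeout=15)
        except requests.RequestException:
            resp = None
        # Treat CAPTCHAs and non-200 responses as blocks and retry
        if (resp is not None and resp.status_code == 200
                and "captcha" not in resp.text.lower()):
            return resp.text
        time.sleep(backoff_delay(attempt))
    return None
```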

Method 4: Platform-Based Scraping

Platforms like Apify provide ready-made scraping actors that run in the cloud. Apify's marketplace includes several Amazon-specific actors built by the community, alongside scrapers for other e-commerce sites such as eBay and AliExpress.

The advantage of platform-based scraping is that you don't manage proxies, servers, or browser infrastructure — it's all handled for you.

What Data Can You Actually Get?

| Data Point | Official API | Third-Party API | DIY Scraping |
|---|---|---|---|
| Product title & description | ✅ | ✅ | ✅ |
| Current price | ✅ | ✅ | ✅ |
| Customer ratings (aggregate) | ✅ | ✅ | ✅ |
| Individual reviews | ❌ | ✅ | ✅ |
| Seller info | ❌ | ✅ | ✅ |
| Historical prices | ❌ | Some | ❌ |
| Search rankings | Limited | ✅ | ✅ |
| Images | ✅ | ✅ | ✅ |

Which Method Should You Choose?

Use the official PA-API if:

  • You only need basic product data (title, price, images, ratings)
  • You're building an affiliate site and already have an Associates account
  • You want zero legal risk

Use a third-party API (ScraperAPI, Rainforest) if:

  • You need review text, seller data, or search rankings
  • You want structured data without parsing HTML
  • You're scraping at moderate scale (thousands of products)

Build your own scraper if:

  • You need full control over what data you collect
  • You have proxy infrastructure already
  • You're comfortable maintaining scrapers as Amazon changes their HTML

Final Thoughts

Amazon scraping in 2026 is a cat-and-mouse game. The official API is limited but safe. Third-party APIs are the best balance of data coverage and reliability. DIY scraping gives you the most flexibility but requires ongoing maintenance.

Whatever approach you choose, respect rate limits, add delays between requests, and stay within legal boundaries. The data is valuable — but not worth a lawsuit or a permanent IP ban.


Building scrapers for e-commerce data? Check out our Apify actors for eBay and AliExpress scraping, or use ScraperAPI and ThorData proxies to build your own.
