agenthustler
How to Scrape Amazon Reviews in 2026: Product Intelligence with Python

Amazon reviews are one of the most valuable datasets in e-commerce. Whether you're doing product research, competitor analysis, or sentiment tracking, knowing how to programmatically access this data gives you a significant edge.

This guide covers everything you need to scrape Amazon reviews in 2026 — from understanding the protections to writing production-ready code.

Why Amazon Reviews Are Hard to Scrape

Amazon runs some of the most sophisticated bot detection in the world:

  • Dynamic HTML — review content is loaded via JavaScript in many cases
  • TLS fingerprinting — Amazon checks your TLS hello packet for browser signatures
  • Behavioral analysis — too many requests in sequence triggers CAPTCHA
  • IP reputation scoring — datacenter IPs are flagged immediately

A basic requests.get() call won't work. You'll get blocked within the first few requests.
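One practical consequence: detect the block page instead of feeding it to your parser. Here's a minimal heuristic — the marker strings are taken from Amazon's CAPTCHA interstitial as it has looked in practice, and may change at any time:

```python
def is_blocked(html: str) -> bool:
    """Heuristic: does this response look like Amazon's CAPTCHA/block page?"""
    markers = (
        'Enter the characters you see below',   # CAPTCHA interstitial text
        'api-services-support@amazon.com',      # automated-access notice
    )
    return any(marker in html for marker in markers)

# A blocked response should short-circuit your pipeline, not get parsed
blocked = is_blocked('Sorry, Enter the characters you see below to continue')
ok = is_blocked('<div data-hook="review">Great product</div>')
```

Checking this before parsing also gives you a clean signal for retry logic later in the pipeline.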

Approach 1: Direct Scraping with Proxy Rotation

The most reliable approach for production use is combining a residential proxy with a stealth browser. ScraperAPI handles this for you automatically:

import requests
from urllib.parse import quote
from bs4 import BeautifulSoup

API_KEY = 'YOUR_SCRAPERAPI_KEY'

def scrape_amazon_reviews(asin: str, page: int = 1, star_filter: str = 'all'):
    """Scrape Amazon product reviews by ASIN."""
    # Amazon review URL format
    url = f'https://www.amazon.com/product-reviews/{asin}/?pageNumber={page}&filterByStar={star_filter}&reviewerType=all_reviews'

    # ScraperAPI handles all anti-bot measures
    api_url = f'http://api.scraperapi.com?api_key={API_KEY}&url={quote(url)}&render=true&country_code=us'

    response = requests.get(api_url, timeout=60)
    return parse_reviews(response.text)

def parse_reviews(html: str) -> list[dict]:
    """Parse review HTML into structured data."""
    soup = BeautifulSoup(html, 'html.parser')
    reviews = []

    for card in soup.select('[data-hook="review"]'):
        # Rating
        rating_el = card.select_one('[data-hook="review-star-rating"]')
        rating = None
        if rating_el:
            # Classes like 'a-star-5' encode the star rating
            for cls in rating_el.get('class', []):
                if cls.startswith('a-star-'):
                    try:
                        rating = int(cls.split('-')[-1])
                    except ValueError:
                        pass
                    break

        # Review text
        text_el = card.select_one('[data-hook="review-body"] span')
        text = text_el.get_text(strip=True) if text_el else ''

        # Reviewer name
        author_el = card.select_one('.a-profile-name')
        author = author_el.get_text(strip=True) if author_el else 'Anonymous'

        # Date
        date_el = card.select_one('[data-hook="review-date"]')
        date = date_el.get_text(strip=True) if date_el else ''

        # Helpful votes
        helpful_el = card.select_one('[data-hook="helpful-vote-statement"]')
        helpful = helpful_el.get_text(strip=True) if helpful_el else '0 people found this helpful'

        # Verified purchase
        verified_el = card.select_one('[data-hook="avp-badge"]')
        verified = verified_el is not None

        reviews.append({
            'rating': rating,
            'text': text,
            'author': author,
            'date': date,
            'helpful': helpful,
            'verified_purchase': verified
        })

    return reviews

# Usage
asin = 'B08N5WRWNW'  # Replace with your target ASIN
reviews = scrape_amazon_reviews(asin, page=1)
for r in reviews[:3]:
    stars = '★' * (r['rating'] or 0) + '☆' * (5 - (r['rating'] or 0))
    print(f"{stars} | {r['author']}: {r['text'][:100]}")
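Before spending API credits, it's worth sanity-checking the selectors against a small inline snippet. The markup below is a hand-written approximation of Amazon's review structure, not a captured page — treat it as a fixture for the parsing logic only:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

SAMPLE = '''
<div data-hook="review">
  <i data-hook="review-star-rating" class="a-icon a-star-4"><span>4.0 out of 5 stars</span></i>
  <span class="a-profile-name">Jane</span>
  <span data-hook="review-date">Reviewed in the United States on January 2, 2026</span>
  <span data-hook="review-body"><span>Solid product, minor quirks.</span></span>
</div>
'''

soup = BeautifulSoup(SAMPLE, 'html.parser')
card = soup.select_one('[data-hook="review"]')

# Same extraction logic as parse_reviews, applied to the fixture
rating = next(
    (int(cls.rsplit('-', 1)[-1])
     for cls in card.select_one('[data-hook="review-star-rating"]')['class']
     if cls.startswith('a-star-')),
    None,
)
author = card.select_one('.a-profile-name').get_text(strip=True)
text = card.select_one('[data-hook="review-body"] span').get_text(strip=True)
```

If Amazon changes its `data-hook` attributes, this fixture is the first thing that breaks — which is exactly what you want from a cheap regression check.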

Approach 2: Amazon's Customer Reviews API

If you're a seller or brand, Amazon's Product Advertising API v5 gives you structured access to your own product data. But it has major limitations:

  • Only accessible to Amazon Associates members with recent sales
  • Rate limited to 1 request/second (max 8,640/day)
  • Cannot access competitor reviews — only your own
  • Response doesn't include full review text in all cases

For competitive intelligence or research purposes, you need to scrape.

Building a Review Intelligence System

Here's a complete pipeline that collects reviews, analyzes sentiment, and alerts on rating drops:

import sqlite3
from datetime import datetime
import statistics
from textblob import TextBlob  # pip install textblob
import requests
from urllib.parse import quote
from bs4 import BeautifulSoup

DB_PATH = 'reviews.db'
API_KEY = 'YOUR_SCRAPERAPI_KEY'

def init_db():
    conn = sqlite3.connect(DB_PATH)
    conn.execute('''
        CREATE TABLE IF NOT EXISTS reviews (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asin TEXT,
            rating INTEGER,
            text TEXT,
            author TEXT,
            date TEXT,
            verified INTEGER,
            sentiment REAL,
            scraped_at TEXT
        )
    ''')
    conn.commit()
    return conn

def analyze_sentiment(text: str) -> float:
    """Returns polarity score: -1 (negative) to +1 (positive)."""
    return TextBlob(text).sentiment.polarity

def collect_reviews(asin: str, max_pages: int = 5):
    """Collect and store reviews for an ASIN."""
    conn = init_db()
    all_reviews = []

    for page in range(1, max_pages + 1):
        print(f'Scraping page {page}...')

        url = f'https://www.amazon.com/product-reviews/{asin}/?pageNumber={page}&reviewerType=all_reviews'
        api_url = f'http://api.scraperapi.com?api_key={API_KEY}&url={quote(url)}&render=true&country_code=us'

        resp = requests.get(api_url, timeout=60)
        reviews = parse_reviews(resp.text)

        if not reviews:
            break

        now = datetime.now().isoformat()
        for r in reviews:
            sentiment = analyze_sentiment(r['text'])
            conn.execute(
                'INSERT INTO reviews (asin, rating, text, author, date, verified, sentiment, scraped_at) VALUES (?, ?, ?, ?, ?, ?, ?, ?)',
                (asin, r['rating'], r['text'], r['author'], r['date'], 1 if r['verified_purchase'] else 0, sentiment, now)
            )

        all_reviews.extend(reviews)
        conn.commit()

    conn.close()
    return all_reviews

def generate_report(asin: str):
    """Generate intelligence report for a product."""
    conn = sqlite3.connect(DB_PATH)
    cursor = conn.execute(
        'SELECT rating, sentiment, date FROM reviews WHERE asin = ? ORDER BY scraped_at DESC LIMIT 1000',
        (asin,)
    )
    rows = cursor.fetchall()
    conn.close()

    if not rows:
        return None

    ratings = [r[0] for r in rows if r[0]]
    sentiments = [r[1] for r in rows]

    report = {
        'asin': asin,
        'total_reviews': len(rows),
        'average_rating': round(statistics.mean(ratings), 2) if ratings else 0,
        'average_sentiment': round(statistics.mean(sentiments), 3),
        'five_star_pct': round(ratings.count(5) / len(ratings) * 100, 1) if ratings else 0,
        'one_star_pct': round(ratings.count(1) / len(ratings) * 100, 1) if ratings else 0,
        'recommendation': 'Investigate' if ratings and statistics.mean(ratings) < 3.5 else 'Normal'
    }
    return report

# Example
asin = 'B08N5WRWNW'
collect_reviews(asin, max_pages=3)
report = generate_report(asin)
if report:
    print(f"Product {asin}:")
    print(f"  Average rating: {report['average_rating']}/5")
    print(f"  Sentiment score: {report['average_sentiment']}")
    print(f"  5-star reviews: {report['five_star_pct']}%")
    print(f"  1-star reviews: {report['one_star_pct']}%")
    print(f"  Status: {report['recommendation']}")

Use Cases That Pay

1. Review Monitoring SaaS

Brands pay $200-$500/month for tools that alert them to negative review spikes. Build it with this pipeline + Slack webhook.
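The alerting half is the easy part. Here's a sketch using Slack's incoming-webhook message format — the webhook URL and the 3.5 threshold are placeholders you'd configure per customer:

```python
import json
from urllib import request

SLACK_WEBHOOK_URL = 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder

def build_alert(asin: str, avg_rating: float, threshold: float = 3.5):
    """Return a Slack payload if the average rating fell below threshold, else None."""
    if avg_rating >= threshold:
        return None
    return {'text': f'Rating alert: {asin} dropped to {avg_rating:.2f} (threshold {threshold})'}

def send_alert(payload: dict) -> None:
    """POST the payload to a Slack incoming webhook."""
    req = request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(payload).encode(),
        headers={'Content-Type': 'application/json'},
    )
    request.urlopen(req, timeout=10)

alert = build_alert('B08N5WRWNW', 3.2)   # below threshold: payload built
quiet = build_alert('B08N5WRWNW', 4.6)   # healthy rating: no alert
```

Separating payload construction from delivery keeps the trigger logic testable without hitting the network.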

2. Competitor Intelligence

E-commerce brands want to know when competitors' ratings drop (sales opportunity). Agencies charge $1,000+/month for this data.

3. Review-Based SEO

Product content teams use review data to find keyword gaps and answer common customer questions in their copy.
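One cheap way to surface those gaps is phrase counting over the review corpus. This bigram counter is a minimal sketch — the stopword list is a placeholder, and a production pipeline would reach for something like spaCy or NLTK instead:

```python
import re
from collections import Counter

STOPWORDS = {'the', 'a', 'an', 'and', 'is', 'it', 'this', 'to', 'of',
             'for', 'with', 'was', 'my', 'but'}

def top_phrases(texts, n=5):
    """Count the most frequent non-stopword bigrams across review texts."""
    counts = Counter()
    for text in texts:
        words = [w for w in re.findall(r'[a-z]+', text.lower())
                 if w not in STOPWORDS]
        counts.update(zip(words, words[1:]))
    return counts.most_common(n)

phrases = top_phrases([
    'Battery life is amazing, battery life lasts days',
    'Great battery life but the strap broke',
])
```

Phrases that dominate reviews but never appear in the product copy are the keyword gaps worth writing into.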

4. Market Research

Venture funds and private equity firms pay for systematic review analysis before acquisitions.

ScrapeOps Alternative

ScrapeOps is another proxy service worth testing for Amazon. It offers:

  • Amazon-specific proxies optimized for the site
  • SERP scraping included in plans
  • Starts at $9/month for 1,000 requests

Both ScraperAPI and ScrapeOps offer free tiers to test before committing.

Performance Optimization

  1. Scrape only changed reviews — sort by recency and stop when you hit a known date
  2. Cache product pages — Amazon product details change rarely, cache for 24h
  3. Rate limiting — even with a proxy, keep to roughly 10 requests/minute so you don't trip rate-based detection
  4. Error handling — implement exponential backoff for 429/503 responses
import time
from functools import wraps

def retry_with_backoff(max_retries=3):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise
                    wait = 2 ** attempt  # 1s, 2s, 4s
                    print(f'Retry {attempt+1}/{max_retries} after {wait}s: {e}')
                    time.sleep(wait)
        return wrapper
    return decorator

@retry_with_backoff(max_retries=3)
def safe_scrape(asin, page):
    return scrape_amazon_reviews(asin, page)
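Tip #1 deserves a concrete shape. If you sort by recency and remember the newest review you've already stored, you can stop paging the moment you see it again. `pages` here is a hypothetical iterable of parsed review lists, newest-first, and matching on the raw date string is a rough heuristic — a review ID, if you capture one, is a sturdier key:

```python
def collect_new_reviews(pages, last_seen_date):
    """Walk newest-first pages, stopping at the first already-stored review."""
    fresh = []
    for reviews in pages:
        for r in reviews:
            if r['date'] == last_seen_date:
                return fresh   # everything older is already in the DB
            fresh.append(r)
    return fresh

pages = [
    [{'date': 'January 5, 2026'}, {'date': 'January 4, 2026'}],
    [{'date': 'January 3, 2026'}],
]
new = collect_new_reviews(pages, 'January 4, 2026')
```

On a mature product with thousands of reviews, this turns a full re-scrape into one or two pages per run.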

Conclusion

Amazon review scraping is powerful but requires the right tools. ScraperAPI handles the hardest parts — TLS fingerprinting, IP rotation, JavaScript rendering — so you can focus on building the intelligence layer on top.

Start with the basic parser, add sentiment analysis, then build the monitoring layer as your needs grow. The commercial use cases are real and businesses pay for them.


What are you building with Amazon data? Share in the comments below.
