Amazon is one of the most valuable datasets on the internet. Real-time prices, competitor stock levels, customer reviews, ratings trends, historical price changes — all of it is there, waiting to be harvested. The problem: Amazon knows this and has spent billions making scraping as hard as possible.
If you've ever tried to scrape Amazon with requests and BeautifulSoup, you know the story. You send one request, get a 503 error. Send two, get blocked for an hour. Try rotating user agents? Now you hit a CAPTCHA. This isn't an accident — Amazon employs sophisticated bot detection: IP reputation scoring, JavaScript-rendered pages, and behavioral fingerprinting. Scraping Amazon in 2026 isn't like scraping Reddit. It's a real arms race.
But it's not impossible. This guide covers five practical methods: the official API (limited but safe), the naive Python approach (instructive but fragile), dedicated scraping APIs (expensive but reliable), pre-built Apify actors (fastest to deploy), and hybrid approaches for price monitoring at scale.
Why Scrape Amazon (And What You'll Actually Get)
Before we dive into methods, let's be honest about what you can realistically extract and why it matters:
Price Monitoring: Track competitor pricing on identical products. Most valuable for e-commerce businesses undercutting larger sellers or aggregators watching market shifts.
Historical Price Data: Amazon doesn't publicly expose price history, but by polling every 6-12 hours you can build your own dataset. Valuable for research, market analysis, or arbitrage strategies.
Competitor Analysis: Track which products your competitors are selling, review counts, ratings, stock status. Less dense than price data but useful for strategic decisions.
Product Discovery: Identify trending products, new categories, high-volume ASIN clusters. Good for dropshipping research, market validation, or content creation (e.g., "Best budget X products").
Review Sentiment: Scrape review text and ratings (though Amazon aggressively obfuscates review pages). Most scrapers focus on metadata (count, average rating) rather than full text.
Stock/Availability: Real-time "In Stock" vs "Out of Stock" signals. Useful for arbitrage or identifying sudden demand spikes.
What you won't get: customer personal data, private seller information, or anything behind authentication. Amazon's anti-scraping is primarily about protecting merchant data and controlling access to their search/recommendation algorithms.
Amazon's Anti-Scraping Arsenal (What You're Up Against)
Amazon doesn't just block scrapers — it fingerprints them. Here's the defensive stack:
1. User-Agent Filtering: Generic curl or requests user agents are immediately flagged. Amazon expects a real browser.
2. Behavioral Analysis: Amazon tracks:
- Request patterns (too many requests to the same product in a short window)
- Scroll velocity (are you hovering on product images like a human?)
- Mouse movement (yes, JavaScript can detect if you move the mouse at realistic speeds)
- Timing between clicks (robots are too consistent; humans waver)
3. JavaScript Rendering: Modern Amazon product pages load prices, stock status, and reviews via JavaScript. A simple HTTP GET won't capture dynamic data.
4. CAPTCHA and Challenges: If Amazon suspects a bot, you hit a CAPTCHA. After 3-5 failures, your IP gets temporarily blocked (24-72 hours).
5. IP Reputation: Amazon maintains a database of known proxy/VPN IP ranges. Cheap residential proxies are often burned (already flagged). Premium proxy providers (Bright Data, Luminati) rotate to avoid this, but are expensive.
6. Rate Limiting: Even with perfect headers, hammering Amazon with 100s of requests per minute will trigger IP-level throttling.
7. Cookies and Session Management: Amazon uses cookies to detect scraper patterns. Rotating cookies or failing to maintain session state flags your client as a bot.
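Points 2 and 6 above boil down to pacing: a fixed `time.sleep(3)` loop is trivially fingerprintable, while humans waver. A minimal sketch of randomized delays (the helper name and the base/spread values are illustrative, not Amazon-derived thresholds):

```python
import random

def human_delay(base=3.0, spread=1.2, floor=0.8):
    """Return a randomized wait time in seconds.

    Sampling from a distribution instead of sleeping a fixed interval
    avoids the metronome-like timing that behavioral analysis flags.
    """
    delay = random.gauss(base, spread)  # jitter around the base interval
    return max(floor, delay)           # never go faster than the floor

# Each call yields a different wait, unlike a constant sleep(3)
waits = [round(human_delay(), 2) for _ in range(5)]
```

This doesn't defeat fingerprinting on its own, but it removes the easiest timing signal to detect.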
The upshot: naive scraping (plain requests) will fail immediately. You need to either:
- Use Amazon's official API (limited scope, safe)
- Use a dedicated scraping service that handles the above (costs money, extremely reliable)
- Use pre-built actors (Apify, same reliability, better for teams)
- Accept that you'll get some blocked requests and build error handling (fragile, research-only)
Method 1: Amazon Product Advertising API (Official, Limited)
Amazon's official API is the safest, most legal option. But it's also the most constrained.
What It Does
The Product Advertising API (PA-API 5.0, offered through the Amazon Associates program) lets you:
- Search for products by keyword
- Get product details (ASIN, title, price, ratings, image URLs)
- Track price changes over time (if you poll regularly)
- Get links for monetization (Amazon Associates affiliates)
What It Doesn't Do
- Full historical price data (no price history endpoint)
- Review text or detailed review metadata
- Real-time stock tracking (only on some items)
- Competitor seller information
- Search ranking or trending products
- Access to private seller metrics
Setup and Requirements
- Sign up for Amazon Associates: Go to https://associates.amazon.com/
- Verify your account: Takes 1-3 days. Amazon will ask where you plan to send traffic.
- Get API credentials: In your Associates dashboard, request API access. You'll get:
  - Access Key ID
  - Secret Access Key
  - Tracking ID (for affiliate links)
- Install the SDK:
```bash
pip install amazon-product-advertising-api
```
Code Example: Search Products
```python
from amazon_product_advertising_api import get_api

# Initialize the API client
api = get_api(
    access_key='YOUR_ACCESS_KEY',
    secret_key='YOUR_SECRET_KEY',
    partner_tag='YOUR_TRACKING_ID',
    region='US'  # or 'GB', 'DE', 'FR', 'JP', etc.
)

def search_products(keyword, max_results=10):
    """Search Amazon for products by keyword"""
    try:
        results = api.search_items(
            keywords=keyword,
            resources=['ItemInfo.Title', 'ItemInfo.ByLineInfo',
                       'Offers.Summaries.Price', 'CustomerReviews.StarRating']
        )
        products = []
        for item in results['SearchResult']['Items'][:max_results]:
            # Extract product data
            product = {
                'asin': item['ASIN'],
                'title': item['ItemInfo']['Title']['DisplayValue'],
                'price': None,
                'currency': None,
                'rating': None,
                'reviews': None,
                'url': item.get('DetailPageURL', '')
            }

            # Price (if available)
            if 'Offers' in item and 'Summaries' in item['Offers']:
                price_obj = item['Offers']['Summaries'][0]['Price']
                product['price'] = float(price_obj.get('Amount', 0))
                product['currency'] = price_obj.get('Currency', 'USD')

            # Rating
            if 'CustomerReviews' in item:
                rating = item['CustomerReviews'].get('StarRating', {}).get('DisplayValue')
                product['rating'] = float(rating) if rating else None

            products.append(product)
        return products
    except Exception as e:
        print(f"Error: {e}")
        return []

# Example usage
products = search_products('wireless headphones', max_results=5)
for p in products:
    print(p['title'][:60])
    print(f"  ASIN: {p['asin']}")
    print(f"  Price: ${p['price']} {p['currency']}" if p['price'] else "  Price: N/A")
    print(f"  Rating: {p['rating']}/5.0" if p['rating'] else "  Rating: N/A")
    print()
```
Limits & Reality Check
- Rate limit: 1 request per second by default; exceed it and the API returns throttling errors
- Quota: roughly 8,640 requests/day at the default rate; Amazon raises both limits as your affiliate revenue grows
- Cost: Free (no charge per API call; you earn money through affiliate commissions)
- Data freshness: Prices update hourly, but you're dependent on Amazon's API cache
- Search scope: Limited to products Amazon's search index covers; some products excluded
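Given the 1 request/second ceiling, it's cheaper to throttle client-side than to let Amazon throttle you. A minimal sketch (the `Throttle` class is my own helper, not part of any SDK, and isn't thread-safe):

```python
import time

class Throttle:
    """Enforce at most one call per `interval` seconds."""

    def __init__(self, interval=1.0):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        # Sleep just long enough to keep `interval` between consecutive calls
        remaining = self._last + self.interval - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()

# Usage: call throttle.wait() before each API request
# throttle = Throttle(interval=1.0)
# for keyword in ['headphones', 'keyboards']:
#     throttle.wait()
#     results = search_products(keyword)
```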
When to Use This
Production services that need legal cover, small monitoring operations (< 10K lookups/month), or anything where you need to demonstrate compliance.
Method 2: Python + Requests + BeautifulSoup (Educational; Gets Blocked Fast)
This is how 95% of people start with web scraping. It rarely works for more than a few requests on Amazon, but it's instructive.
Why This Will Fail
```python
import requests
from bs4 import BeautifulSoup

# This approach will fail. Do not use in production.
response = requests.get('https://www.amazon.com/s?k=wireless+headphones')
soup = BeautifulSoup(response.content, 'html.parser')
```
Within 1-3 requests, you'll hit:
- `403 Forbidden` (IP blocked)
- A CAPTCHA challenge
- A `<title>Robot Check</title>` HTML page instead of product data

The reason: Amazon's servers can detect that:
- The User-Agent is `python-requests/2.x.x` (screams "bot")
- There are no browser headers (Accept-Language, Referer, etc.)
- There are no cookies (raw HTTP requests are stateless; browsers maintain cookie jars)
- The IP is recognized as a known scraper IP or data center
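One practical consequence: Amazon often serves its CAPTCHA page with HTTP 200, so status codes alone won't tell you you've been blocked. A small heuristic detector (the marker strings are commonly reported ones, not an exhaustive or guaranteed list):

```python
def looks_blocked(status_code, html):
    """Best-effort check for Amazon block/CAPTCHA responses."""
    if status_code in (403, 503):
        return True
    lowered = html.lower()
    markers = (
        'robot check',                         # classic block-page title
        'enter the characters you see below',  # CAPTCHA prompt
        'api-services-support@amazon.com',     # contact line on block pages
    )
    return any(marker in lowered for marker in markers)
```

Check every response with something like this before parsing, and back off instead of retrying immediately.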
The "Better" Version (Still Fails, But Slower)
```python
import requests
from bs4 import BeautifulSoup
import time
import random

def scrape_amazon_naive(keyword):
    """
    This improved version lasts a little longer but still gets blocked.
    Use only for educational purposes or small one-off scrapes.
    """
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Referer': 'https://www.amazon.com/',
    }
    session = requests.Session()
    session.headers.update(headers)
    url = f'https://www.amazon.com/s?k={keyword}'
    try:
        # Add random delay to seem more human
        time.sleep(random.uniform(2, 5))
        response = session.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')

        # Amazon's HTML structure for product listings:
        products = []
        for item in soup.find_all('div', {'data-component-type': 's-search-result'}):
            title_elem = item.find('span', class_='a-size-medium a-color-base a-text-normal')
            price_elem = item.find('span', class_='a-price-whole')
            if title_elem:
                products.append({
                    'title': title_elem.get_text(strip=True),
                    'price': price_elem.get_text(strip=True) if price_elem else 'N/A'
                })
        return products
    except Exception as e:
        print(f"Scraping failed: {e}")
        return []

# Try it
products = scrape_amazon_naive('wireless headphones')
print(f"Found {len(products)} products")
```
Honest assessment: This will work for maybe 1-5 requests before Amazon blocks you. The added headers and session management help, but are insufficient. Amazon's detection is sophisticated enough to flag behavioral patterns — consistent timing, lack of JavaScript execution, missing cookies from browsing history, etc.
When to Use This
Never in production. Educational purposes only, or genuine one-off scrapes where you don't mind a manual CAPTCHA solve every few requests.
Method 3: Scraping APIs (ScraperAPI, Bright Data, etc.)
If you want reliable Amazon scraping without building a complex proxy rotation system yourself, dedicated scraping APIs are the solution.
How They Work
Services like ScraperAPI and Bright Data maintain massive networks of residential proxies (IPs from real homes/mobile networks, not data centers). When you send a request, they:
- Route it through a residential IP
- Handle CAPTCHAs automatically (with human workers or ML)
- Render JavaScript if needed
- Return clean HTML or JSON
The cost is higher than raw API or web scraping, but you get reliability.
ScraperAPI Example
```python
import requests
from bs4 import BeautifulSoup
from datetime import datetime

SCRAPERAPI_KEY = 'YOUR_SCRAPERAPI_KEY'

def scrape_amazon_with_scraperapi(asin):
    """
    Use ScraperAPI to reliably scrape an Amazon product page.
    ScraperAPI handles proxies, CAPTCHAs, and JavaScript rendering.
    """
    url = f'https://www.amazon.com/dp/{asin}'
    payload = {
        'api_key': SCRAPERAPI_KEY,
        'url': url,
        'render': 'true',  # Enable JavaScript rendering (costs more)
    }
    try:
        # ScraperAPI's endpoint
        response = requests.get('http://api.scraperapi.com', params=payload, timeout=30)
        response.raise_for_status()
        html = response.text

        # Now parse with BeautifulSoup (HTML is clean, real)
        soup = BeautifulSoup(html, 'html.parser')

        # Extract product data. Amazon changes its markup regularly, so
        # treat these selectors as a starting point, not gospel.
        product = {
            'asin': asin,
            'title': None,
            'price': None,
            'rating': None,
            'num_reviews': None,
            'in_stock': False,
            'scraped_at': datetime.now().isoformat()
        }

        # Title
        title_elem = soup.find('span', id='productTitle')
        if title_elem:
            product['title'] = title_elem.get_text(strip=True)

        # Price
        price_elem = soup.find('span', class_='a-price-whole')
        if price_elem:
            price_text = price_elem.get_text(strip=True)
            # Remove $ and commas, convert to float
            product['price'] = float(price_text.replace('$', '').replace(',', ''))

        # Rating (e.g. "4.5 out of 5 stars")
        rating_elem = soup.find('span', class_='a-icon-alt')
        if rating_elem:
            rating_text = rating_elem.get_text(strip=True)
            try:
                product['rating'] = float(rating_text.split()[0])
            except (ValueError, IndexError):
                pass

        # Number of reviews (e.g. "12,345 ratings")
        reviews_elem = soup.find('span', id='acrCustomerReviewText')
        if reviews_elem:
            reviews_text = reviews_elem.get_text(strip=True)
            try:
                product['num_reviews'] = int(reviews_text.split()[0].replace(',', ''))
            except (ValueError, IndexError):
                pass

        # Stock status
        stock_elem = soup.find(class_=lambda x: x and 'availability' in x.lower())
        if stock_elem:
            stock_text = stock_elem.get_text(strip=True).lower()
            product['in_stock'] = 'in stock' in stock_text

        return product
    except Exception as e:
        print(f"ScraperAPI scrape failed: {e}")
        return None

# Example usage (illustrative ASIN)
product = scrape_amazon_with_scraperapi('B0C3SJG2D2')
if product:
    print(f"Product: {product['title']}")
    print(f"Price: ${product['price']}")
    print(f"Rating: {product['rating']}/5.0 ({product['num_reviews']} reviews)")
    print(f"In Stock: {product['in_stock']}")
```
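One fragile spot in scrapers like this is the price conversion: `a-price-whole` can yield fragments like `1,299.` (the fraction lives in a separate `a-price-fraction` span), and a bare `float()` call chokes on anything unexpected. A more defensive parser for US-format prices (`parse_price` is my own helper, not a ScraperAPI feature):

```python
import re

def parse_price(text):
    """Parse a US-format price string ('$1,299.99', '1,299.') into a float.

    Returns None instead of raising when no number is present.
    """
    if not text:
        return None
    cleaned = text.replace('$', '').replace(',', '').strip()
    match = re.search(r'\d+(?:\.\d+)?', cleaned)
    return float(match.group()) if match else None

# parse_price('$1,299.99') returns 1299.99; parse_price('N/A') returns None
```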
Cost Comparison
| Service | Cost | Features | Best For |
|---|---|---|---|
| ScraperAPI | $10/month (5K calls) - $300/month (500K calls) | Proxy rotation, CAPTCHA solving, JS rendering | Small to medium scale scraping |
| Bright Data | $100/month (25GB bandwidth) and up | Residential proxies, mobile proxies, dedicated IP pools | Large-scale commercial scraping |
| Apify | Pay-per-run + infrastructure ($0.05-0.20/1K items) | Pre-built actors, cloud execution, built-in monitors | Teams, no maintenance overhead |
Limits & Reality Check
- Cost: $0.02-0.05 per successful scrape (varies by provider)
- Speed: 2-10 seconds per request (JavaScript rendering adds latency)
- Reliability: 95-99% success rate; failed requests need retry logic
- Rate limit: Depends on your tier; most allow 100-1000 requests/minute
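Since 95-99% success still means a few failures per hundred requests, wrap calls in retry logic with exponential backoff. A sketch (`with_retries` is my own helper; pair it with the `scrape_amazon_with_scraperapi` function above):

```python
import time

def with_retries(fn, max_attempts=3, base_delay=2.0):
    """Call `fn` until it succeeds, backing off 2s, 4s, 8s... between tries."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            time.sleep(base_delay * (2 ** attempt))

# Usage (hypothetical):
# product = with_retries(lambda: scrape_amazon_with_scraperapi('B0C3SJG2D2'))
```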
When to Use This
Any production scraping service, ecommerce monitoring, price aggregation platforms, or data for resale. The cost is worth it for reliability and legal safety.
Method 4: Pre-Built Scrapers (Apify Actors for Amazon)
Apify is a platform of pre-built, tested web scrapers ("actors"). The Amazon actors handle all the hard stuff: anti-scraping logic, error handling, retries, structured data export.
Why Use Apify
You get 80% of the way there in 5 minutes, with no code. The actor runs in Apify's cloud infrastructure (you don't manage proxies or IPs). Results are validated and structured. You pay by usage, not by infrastructure.
Using the Apify Amazon Scraper
Without code (via web UI):
- Go to https://apify.com/apify/amazon-product-scraper
- Click "Try now"
- Enter your parameters (ASINs, keywords, number of items)
- Start the run; results stream to your browser
- Export as CSV, JSON, or webhook
With code (Apify SDK):
```python
from apify_client import ApifyClient

def scrape_amazon_with_apify(search_keywords, max_items=100):
    """
    Use Apify's Amazon Product Scraper actor to reliably scrape products.
    """
    client = ApifyClient('YOUR_APIFY_TOKEN')

    # Run the Amazon Product Scraper actor
    run = client.actor('apify/amazon-product-scraper').call(
        run_input={
            'searchKeywords': search_keywords,
            'maxResults': max_items,
            'startPage': 0,
        }
    )

    # Retrieve structured results
    dataset_client = client.dataset(run['defaultDatasetId'])
    results = dataset_client.list_items()['items']
    return results

# Example
products = scrape_amazon_with_apify(['wireless headphones'], max_items=50)
for product in products:
    print(product.get('title', 'N/A')[:60])
    print(f"  Price: ${product.get('price', 'N/A')}")
    print(f"  Reviews: {product.get('reviewsCount', 0)}")
    print()
```
Install the Apify SDK
```bash
pip install apify-client
```
Apify Pricing
- Free tier: $5 in monthly platform credits — enough for small test runs
- Paid: usage-based; the Amazon actor works out to roughly $0.25 per 1K items scraped
Limits & Reality Check
- Speed: 100-300 items per minute (faster than manual API calls)
- Cost: ~$0.02-0.10 per 100 items
- Structure: Returns clean, validated JSON with all key fields
- Reliability: 99%+ success on valid ASINs
- Maintenance: Zero — Apify updates the actor when Amazon changes
When to Use This
Teams without dedicated infrastructure, one-off data collection projects, or regular monitoring services where you need quick iteration.
Building a Price Monitoring Pipeline
Let's say you want to track the price of 10 products every 6 hours and alert when prices drop. Here's a hybrid approach using ScraperAPI or Apify + scheduled runs + a simple database:
```python
import requests
import sqlite3
from datetime import datetime
from bs4 import BeautifulSoup
import time

# Database setup
def init_db():
    conn = sqlite3.connect('/data/price_monitor.db')
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS products (
            asin TEXT PRIMARY KEY,
            title TEXT,
            last_price REAL,
            lowest_price REAL,
            highest_price REAL,
            last_checked TIMESTAMP
        )
    ''')
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS price_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asin TEXT,
            price REAL,
            timestamp TIMESTAMP,
            in_stock BOOLEAN,
            FOREIGN KEY(asin) REFERENCES products(asin)
        )
    ''')
    conn.commit()
    return conn

def monitor_products(asin_list, scraper_api_key):
    """
    Check current prices for a list of ASINs and log to database.
    Run this function every 6 hours via a scheduler (APScheduler, cron, etc.)
    """
    conn = init_db()
    cursor = conn.cursor()
    for asin in asin_list:
        print(f"Checking {asin}...")
        # Scrape via ScraperAPI
        payload = {
            'api_key': scraper_api_key,
            'url': f'https://www.amazon.com/dp/{asin}',
            'render': 'true',
        }
        try:
            response = requests.get('http://api.scraperapi.com', params=payload, timeout=30)
            html = response.text

            # Parse (simplified for brevity)
            soup = BeautifulSoup(html, 'html.parser')

            # Extract price, title, stock
            title = soup.find('span', id='productTitle')
            price_elem = soup.find('span', class_='a-price-whole')
            if not title or not price_elem:
                print(f"  Failed to extract data for {asin}")
                continue

            title_text = title.get_text(strip=True)
            price_str = price_elem.get_text(strip=True).replace('$', '').replace(',', '')
            price = float(price_str)
            in_stock = 'in stock' in html.lower()

            # Create the row on first sight of this ASIN
            cursor.execute('''
                INSERT OR IGNORE INTO products (asin, title, last_price, lowest_price, highest_price, last_checked)
                VALUES (?, ?, ?, ?, ?, ?)
            ''', (asin, title_text, price, price, price, datetime.now()))

            # Update current price
            cursor.execute('''
                UPDATE products
                SET last_price = ?, last_checked = ?
                WHERE asin = ?
            ''', (price, datetime.now(), asin))

            # Track in history
            cursor.execute('''
                INSERT INTO price_history (asin, price, timestamp, in_stock)
                VALUES (?, ?, ?, ?)
            ''', (asin, price, datetime.now(), in_stock))

            # Update lowest/highest
            cursor.execute('SELECT lowest_price, highest_price FROM products WHERE asin = ?', (asin,))
            row = cursor.fetchone()
            if row:
                lowest = min(row[0], price)
                highest = max(row[1], price)
                cursor.execute('''
                    UPDATE products SET lowest_price = ?, highest_price = ?
                    WHERE asin = ?
                ''', (lowest, highest, asin))
            print(f"  {title_text[:50]}... = ${price}")
        except Exception as e:
            print(f"  Error: {e}")
        time.sleep(1)  # Polite delay
    conn.commit()
    conn.close()

def get_price_drops(threshold_percent=5):
    """Find products whose current price is more than threshold_percent below their recorded high"""
    conn = sqlite3.connect('/data/price_monitor.db')
    cursor = conn.cursor()
    cursor.execute('''
        SELECT
            p.asin, p.title, p.highest_price, p.last_price,
            ((p.highest_price - p.last_price) / p.highest_price * 100) AS drop_percent
        FROM products p
        WHERE p.last_price < p.highest_price
          AND ((p.highest_price - p.last_price) / p.highest_price * 100) > ?
        ORDER BY drop_percent DESC
    ''', (threshold_percent,))
    drops = cursor.fetchall()
    conn.close()
    return drops

# Usage: Run this every 6 hours
asin_list = [
    'B0C3SJG2D2',  # AirPods Pro
    'B08CX5Z9QN',  # Echo Dot
    'B07FKR6KXF',  # Fire Stick 4K
    # ... add more
]
# monitor_products(asin_list, scraper_api_key='YOUR_KEY')

# Check for price drops
# drops = get_price_drops(threshold_percent=5)
# for drop in drops:
#     print(f"Price drop: {drop[1]} from ${drop[2]} to ${drop[3]} ({drop[4]:.1f}%)")
```
Scheduling: Use APScheduler to run this every 6 hours:
```python
from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
scheduler.add_job(lambda: monitor_products(asin_list, 'YOUR_KEY'), 'interval', hours=6)
scheduler.start()
```
Or deploy to a cloud function (AWS Lambda, Google Cloud Functions) and trigger via cron.
Legal Considerations: What You Actually Need to Know
Like Reddit, Amazon's terms of service explicitly forbid scraping. But again, TOS violations aren't laws.
Amazon's Terms of Service
Violating Amazon's TOS can result in:
- Account ban (if you use your personal account)
- IP ban (temporary, 24-72 hours usually)
- Legal cease-and-desist (rare, unless you're republishing data)
It's not a criminal issue.
The Computer Fraud and Abuse Act (CFAA)
The same hiQ v. LinkedIn litigation often cited for Reddit scraping applies here too: the Ninth Circuit held (in 2019, reaffirmed in 2022 after a Supreme Court remand) that scraping publicly available data likely doesn't violate the CFAA, even when the TOS forbids it. The case ultimately settled, so treat this as persuasive precedent, not settled law.
But Amazon is more aggressive than LinkedIn was about enforcement.
- Scraping prices, reviews, and product metadata is in a legal gray zone.
- If you republish Amazon data without adding value (e.g., copying product descriptions wholesale), you infringe on copyright.
- If your scraping causes measurable damage (DDoS-level load, stealing compute), you could face CFAA charges.
The Safe Path
- Use the official API when possible (Product Advertising API). It's endorsed and carries zero legal risk.
- Use dedicated scraping APIs (ScraperAPI, Bright Data). These companies absorb the legal risk and have done the vetting.
- Use pre-built actors (Apify). Same as above — they maintain the tool and assume legal liability.
- Scrape responsibly: Add delays (1-2 seconds between requests), don't scrape the same product 100 times per day, don't republish raw data.
- Don't pretend to be a different service: No faking user agents as mobile browsers if you're a server.
- Document your purpose: "Price monitoring for my ecommerce business" is defensible. "Scraping to republish on my aggregator" is not.
For anything commercial: Use a paid service (ScraperAPI, Apify) so there's a clear audit trail and the service provider assumes liability.
What Changed Since 2023?
- Amazon killed most unofficial scrapers (2024-2025): CAPTCHA became mandatory on the majority of requests. Cheap proxies no longer work.
- Apify's Amazon actor improved dramatically: Now handles JavaScript rendering, stock tracking, and review counts. High reliability (99%+).
- ScraperAPI added Amazon-specific routing: They now pre-route Amazon requests through premium residential proxies, CAPTCHA solve built-in.
- The official API stayed stable: No price increases, rate limits remain consistent. Still the safest option for non-commercial use.
- Mobile Amazon became harder to scrape: Amazon mobile site (m.amazon.com) is more resilient to automated access.
Real-World Example: Dropshipping Price Monitoring
You run a dropshipping store with 50 products sourced from Amazon. You want to alert when your suppliers' prices drop so you can adjust your margins or restock.
```python
# dropship_monitor.py
import sqlite3
from datetime import datetime
import smtplib
from email.mime.text import MIMEText
from apify_client import ApifyClient

def monitor_dropship_products(product_asins, alert_threshold=0.10):
    """
    Monitor supplier prices and send alerts when they drop.
    Threshold = 0.10 means alert if price drops by 10% or more.
    """
    client = ApifyClient('YOUR_APIFY_TOKEN')

    # Scrape all products in one batch run (more efficient)
    run = client.actor('apify/amazon-product-scraper').call(
        run_input={
            'asinListUrl': product_asins,  # List of ASINs (check the actor's input schema for the exact field name)
            'maxResults': 1,               # Just get current price
        }
    )

    # Retrieve results
    dataset = client.dataset(run['defaultDatasetId'])
    items = dataset.list_items()['items']

    # Check database for price history
    conn = sqlite3.connect('/data/dropship.db')
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS price_log (
            asin TEXT,
            title TEXT,
            price REAL,
            timestamp TIMESTAMP
        )
    ''')
    alerts = []
    for item in items:
        asin = item['asin']
        current_price = item['price']

        # Get last recorded price
        cursor.execute('SELECT price FROM price_log WHERE asin = ? ORDER BY timestamp DESC LIMIT 1', (asin,))
        result = cursor.fetchone()
        if result:
            last_price = result[0]
            drop_percent = (last_price - current_price) / last_price
            if drop_percent >= alert_threshold:
                alerts.append({
                    'asin': asin,
                    'title': item['title'],
                    'old_price': last_price,
                    'new_price': current_price,
                    'drop_percent': drop_percent
                })

        # Log current price
        cursor.execute('''
            INSERT INTO price_log (asin, title, price, timestamp)
            VALUES (?, ?, ?, ?)
        ''', (asin, item['title'], current_price, datetime.now()))

    conn.commit()
    conn.close()

    # Send email alerts
    if alerts:
        send_alert_email(alerts)
    return alerts

def send_alert_email(alerts):
    """Send email with price drop alerts"""
    subject = f"[Price Alert] {len(alerts)} products dropped in price"
    body = "Price drops detected:\n\n"
    for alert in alerts:
        body += f"• {alert['title'][:60]}\n"
        body += f"  Was: ${alert['old_price']:.2f} → Now: ${alert['new_price']:.2f}\n"
        body += f"  Drop: {alert['drop_percent']*100:.1f}%\n"
        body += f"  ASIN: {alert['asin']}\n\n"

    # Send via SMTP (Gmail, AWS SES, etc.)
    msg = MIMEText(body)
    msg['Subject'] = subject
    msg['From'] = 'alerts@yourstore.com'
    msg['To'] = 'you@yourstore.com'
    # (Implement SMTP sending here; depends on your email provider)
    print(f"Email alert: {subject}")

# Run every 6 hours
# monitor_dropship_products(['B0C3SJG2D2', 'B08CX5Z9QN', ...], alert_threshold=0.10)
```
This is a real, revenue-generating use case. You automate margin optimization and rebuy signals.
Comparison Table: Which Method to Use
| Method | Speed | Cost | Legal Risk | Best For | Effort |
|---|---|---|---|---|---|
| Official API | 1 req/sec | Free | Lowest | Small monitoring, research | Low |
| Raw Python | 10 req/min (before block) | $0 | Highest | Educational, one-offs | Low (but unreliable) |
| ScraperAPI | 1-5 sec/req | $0.02/req | Low | Small-to-medium scale | Low |
| Bright Data | 1-3 sec/req | $100+/month | Low | Large-scale commercial | Medium |
| Apify Actor | 1-3 sec/req | $0.02/100 items | Low | Teams, no DevOps | Very low |
Resources
- Amazon Product Advertising API docs: https://webservices.amazon.com/paapi5/documentation/
- Apify Amazon Scraper: https://apify.com/apify/amazon-product-scraper
- ScraperAPI: https://www.scraperapi.com/
- Bright Data: https://brightdata.com/
- BeautifulSoup Documentation: https://www.crummy.com/software/BeautifulSoup/
TL;DR
- For small monitoring or research: Use the official Amazon Product Advertising API. It's free, safe, and legal.
- For reliable scraping: Use ScraperAPI or Bright Data. They handle anti-bot, pay them, and avoid legal friction.
- For teams or one-offs: Use Apify's pre-built Amazon actor. Deploy in 5 minutes, zero maintenance.
- For learning: Raw BeautifulSoup works on a few requests. Expect blocks. Don't use in production.
- For monitoring at scale: Build a pipeline with scheduled scrapes, price history tracking, and alerts.
- Always be respectful: Add delays, don't republish raw data, use appropriate tools for your scale.
- Track pricing trends: Historical price data is valuable. Build it once, monetize it later.
Amazon's prices are transparent — they're just guarded. Use the right tool for your use case and you'll get there.
Want to stay ahead of data extraction trends? Subscribe to The Data Collector for working code, legal updates, and practical scraping strategies that actually work in 2026. No fluff — just what works.
Disclosure: This post contains affiliate links. I may earn a commission if you sign up through my links, at no extra cost to you.
Compare web scraping APIs:
- ScraperAPI — 5,000 free credits, 50+ countries, structured data parsing
- Scrape.do — From $29/mo, strong Cloudflare bypass
- ScrapeOps — Proxy comparison + monitoring dashboard
Need custom web scraping? Email hustler@curlship.com — fast turnaround, fair pricing.