agenthustler
How to Scrape Amazon Product Prices in 2026 (Python Guide)

If you're building a price tracker, running a dropshipping business, or doing competitive intelligence, you've probably wanted to scrape Amazon prices at some point. It's one of the most common web scraping tasks -- and one of the trickiest.

In this guide, I'll walk through practical approaches to extracting product pricing data from Amazon in 2026, with working Python code.

Why Scrape Amazon Prices?

A few common use cases:

  • Dropshipping: Monitor supplier prices to keep your margins healthy
  • Competitive intelligence: Track competitor pricing on the same products
  • Deal hunting: Build a personal price alert system
  • Market research: Analyze pricing trends across categories

Understanding Amazon's Product Page Structure

Before writing any code, you need to know where the data lives. Amazon product pages have several data sources you can tap into.

1. JSON-LD Structured Data

Amazon embeds application/ld+json blocks in product pages. This is the cleanest source -- structured data meant for search engines:

from bs4 import BeautifulSoup
import json

def extract_jsonld_price(html):
    soup = BeautifulSoup(html, 'html.parser')
    scripts = soup.find_all('script', type='application/ld+json')

    for script in scripts:
        if not script.string:
            continue
        try:
            data = json.loads(script.string)
            if isinstance(data, dict) and data.get('@type') == 'Product':
                offers = data.get('offers', {})
                if isinstance(offers, list):
                    offers = offers[0]
                return {
                    'name': data.get('name'),
                    'price': offers.get('price'),
                    'currency': offers.get('priceCurrency'),
                    'availability': offers.get('availability'),
                }
        except (json.JSONDecodeError, KeyError):
            continue
    return None

2. The __NEXT_DATA__ Object

Amazon has been progressively migrating pages to a Next.js-based frontend. Some product pages now include a __NEXT_DATA__ script tag with the full page payload:

def extract_next_data(html):
    soup = BeautifulSoup(html, 'html.parser')
    script = soup.find('script', id='__NEXT_DATA__')

    if script and script.string:
        try:
            data = json.loads(script.string)
        except json.JSONDecodeError:
            return None
        props = data.get('props', {}).get('pageProps', {})
        product = props.get('product', {})
        return {
            'title': product.get('title'),
            'price': product.get('price', {}).get('value'),
            'currency': product.get('price', {}).get('currency'),
        }
    return None

3. HTML Parsing (Fallback)

When structured data isn't available, you fall back to parsing the HTML directly. This is more fragile but works on legacy pages:

def extract_price_html(html):
    soup = BeautifulSoup(html, 'html.parser')

    price_span = soup.select_one('span.a-price span.a-offscreen')
    if price_span:
        return price_span.get_text(strip=True)

    deal_price = soup.select_one('#dealprice_feature span.a-offscreen')
    if deal_price:
        return deal_price.get_text(strip=True)

    kindle_price = soup.select_one('#kindle-price')
    if kindle_price:
        return kindle_price.get_text(strip=True)

    return None

Putting It Together: A Basic Amazon Price Scraper

Here's a complete script that tries all three extraction methods:

import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS_LIST = [
    {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml',
    },
    {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.2 Safari/605.1.15',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml',
    },
    {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml',
    },
]

def scrape_amazon_price(asin):
    url = f'https://www.amazon.com/dp/{asin}'
    headers = random.choice(HEADERS_LIST)

    response = requests.get(url, headers=headers, timeout=15)
    response.raise_for_status()
    html = response.text

    # Try JSON-LD first (most reliable)
    result = extract_jsonld_price(html)
    if result and result.get('price'):
        result['source'] = 'json-ld'
        return result

    # Try __NEXT_DATA__
    result = extract_next_data(html)
    if result and result.get('price'):
        result['source'] = 'next-data'
        return result

    # Fall back to HTML parsing
    price = extract_price_html(html)
    if price:
        return {'price': price, 'source': 'html'}

    return {'error': 'Could not extract price', 'source': 'none'}


def monitor_prices(asins, interval_minutes=60):
    while True:
        for asin in asins:
            try:
                result = scrape_amazon_price(asin)
            except requests.RequestException as exc:
                # Network errors and HTTP 4xx/5xx shouldn't kill the loop
                result = {'error': str(exc), 'source': 'exception'}
            print(f"[{asin}] {result}")
            time.sleep(random.uniform(3, 8))

        print(f"--- Sleeping {interval_minutes} minutes ---")
        time.sleep(interval_minutes * 60)

Handling Amazon's Anti-Bot Measures

Amazon is aggressive about blocking scrapers. Here's what you'll run into and how to deal with it.

Rotate User Agents

Never use the same User-Agent for every request. The HEADERS_LIST approach above helps, but for production use you'll want a larger pool. The key is making each request look like it comes from a real browser.
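A minimal sketch of what a rotation helper might look like. The two profiles here are just illustrative (a production pool would be much larger and kept current with real browser versions); the shuffled-cycle approach ensures no profile is reused until the whole pool has been exhausted:

```python
import random

# Illustrative pool -- maintain many more profiles in practice
HEADER_PROFILES = [
    {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                      'Chrome/131.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
    },
    {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:134.0) '
                      'Gecko/20100101 Firefox/134.0',
        'Accept-Language': 'en-US,en;q=0.9',
    },
]

def header_cycle(profiles):
    """Yield header profiles in a shuffled, repeating cycle so every
    profile gets used once before any is repeated."""
    pool = list(profiles)
    while True:
        random.shuffle(pool)
        for profile in pool:
            yield dict(profile)  # yield a copy so callers can't mutate the pool
```

You'd then call `next(headers_iter)` instead of `random.choice(HEADERS_LIST)` wherever a request is made.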

Add Random Delays

Hammering Amazon with rapid-fire requests is the fastest way to get blocked:

time.sleep(random.uniform(3, 10))

# For large batches, add longer pauses every N requests
if request_count % 20 == 0:
    time.sleep(random.uniform(30, 60))

Handle CAPTCHAs Gracefully

When Amazon suspects automation, it returns a CAPTCHA page. Detect it and back off:

def is_captcha_page(html):
    # Match Amazon's actual CAPTCHA interstitial ("Robot Check") rather
    # than the bare word "robot", which would false-positive on product
    # pages for robot vacuums and the like
    lowered = html.lower()
    return 'robot check' in lowered or 'validatecaptcha' in lowered

if is_captcha_page(response.text):
    print("CAPTCHA detected -- backing off for 5 minutes")
    time.sleep(300)
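One way to tie detection and back-off together is an exponential-backoff wrapper. This is a sketch, not Amazon-specific code: `fetch` is any zero-argument callable that returns page HTML, the delay constants are illustrative, and the CAPTCHA check is a minimal inline version of the helper above:

```python
import time
import random

def is_captcha_page(html):
    # Minimal version of the detection helper above
    return 'robot check' in html.lower()

def fetch_with_backoff(fetch, max_attempts=4, base_delay=60.0):
    """Call fetch() repeatedly, backing off exponentially (with jitter)
    each time a CAPTCHA page comes back. Returns the HTML on success,
    or None if every attempt was blocked."""
    for attempt in range(max_attempts):
        html = fetch()
        if not is_captcha_page(html):
            return html
        delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
        time.sleep(delay)
    return None
```

Usage would look like `fetch_with_backoff(lambda: requests.get(url, headers=headers).text)`; in a real scraper you'd probably also rotate headers or proxies between attempts rather than just waiting.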

Use a Session with Cookies

Creating a requests.Session and letting cookies accumulate makes your requests look more natural:

session = requests.Session()
session.get('https://www.amazon.com', headers=random.choice(HEADERS_LIST))
time.sleep(2)
response = session.get(product_url, headers=random.choice(HEADERS_LIST))

Using a Proxy Service for Reliable Scraping

If you're scraping more than a handful of products, you'll hit blocks quickly with residential IPs. This is where proxy services become essential.

ScraperAPI is worth looking at here -- they have a dedicated Amazon scraping endpoint that handles proxy rotation, CAPTCHA solving, and header management automatically. Instead of building all that infrastructure yourself, you send one API call:

import requests

API_KEY = 'your_scraperapi_key'

def scrape_with_scraperapi(asin):
    url = 'https://api.scraperapi.com/structured/amazon/product'
    params = {
        'api_key': API_KEY,
        'asin': asin,
        'country': 'us',
    }
    response = requests.get(url, params=params, timeout=60)
    return response.json()

This returns clean JSON with the price, title, reviews, and availability -- no parsing needed. They offer 5,000 free API credits to start, which is enough to test whether this approach works for your use case.

For smaller projects (under ~100 products/day), the DIY approach with rotating headers and delays works fine. For anything larger, a managed proxy service will save you a lot of headaches.

Storing Price History

Once you're collecting prices, you need somewhere to put them. SQLite is perfect for small-to-medium projects:

import sqlite3
from datetime import datetime, timezone

def init_db(db_path='prices.db'):
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS prices '
        '(asin TEXT, price REAL, currency TEXT, timestamp TEXT, source TEXT)'
    )
    conn.commit()
    return conn

def save_price(conn, asin, price, currency='USD', source='html'):
    # price may arrive as a float (from JSON-LD) or a string like "$29.99" (from HTML)
    conn.execute(
        'INSERT INTO prices VALUES (?, ?, ?, ?, ?)',
        (asin, float(str(price).replace('$', '').replace(',', '')),
         currency, datetime.now(timezone.utc).isoformat(), source)
    )
    conn.commit()
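Once history accumulates, reading it back enables the price-alert use case from the intro. A sketch against the same table -- the helper names and the 10% threshold are my own choices, not anything Amazon-specific:

```python
import sqlite3

def latest_two_prices(conn, asin):
    """Return the two most recent recorded prices for an ASIN,
    newest first (fewer if there isn't enough history yet)."""
    rows = conn.execute(
        'SELECT price FROM prices WHERE asin = ? '
        'ORDER BY timestamp DESC LIMIT 2',
        (asin,)
    ).fetchall()
    return [r[0] for r in rows]

def price_dropped(conn, asin, threshold=0.10):
    """True if the newest price is at least `threshold` (10% by
    default) below the previous recorded price."""
    prices = latest_two_prices(conn, asin)
    if len(prices) < 2:
        return False
    newest, previous = prices
    return newest <= previous * (1 - threshold)
```

From here, hooking `price_dropped` into the monitoring loop and sending an email or webhook notification is straightforward.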

Things to Keep in Mind

  • Respect robots.txt -- Amazon's robots.txt disallows most scraping. Understand the legal and ethical implications.
  • Rate limit aggressively -- Even if you can go faster, don't. 1 request every 5-10 seconds is reasonable.
  • Amazon's structure changes -- CSS selectors break. JSON-LD availability varies. Build in fallback logic and monitor for failures.
  • Consider the API first -- Amazon's Product Advertising API (PA-API) is the official way to get product data. If you qualify for an Associates account, it's more reliable than scraping.

Wrapping Up

Building an Amazon price scraper in 2026 is a balancing act between multiple extraction methods and staying under Amazon's radar. Start with JSON-LD parsing (it's the cleanest), fall back to HTML selectors, rotate your headers, and add generous delays.

For production workloads, seriously consider a proxy service like ScraperAPI rather than managing infrastructure yourself -- the time savings usually outweigh the cost.

The code in this guide should give you a solid starting point. Adapt it to your specific use case, and always be respectful of the sites you're scraping.
