If you're building a price tracker, running a dropshipping business, or doing competitive intelligence, you've probably wanted to scrape Amazon prices at some point. It's one of the most common web scraping tasks -- and one of the trickiest.
In this guide, I'll walk through practical approaches to extracting product pricing data from Amazon in 2026, with working Python code.
Why Scrape Amazon Prices?
A few common use cases:
- Dropshipping: Monitor supplier prices to keep your margins healthy
- Competitive intelligence: Track competitor pricing on the same products
- Deal hunting: Build a personal price alert system
- Market research: Analyze pricing trends across categories
Understanding Amazon's Product Page Structure
Before writing any code, you need to know where the data lives. Amazon product pages have several data sources you can tap into.
1. JSON-LD Structured Data
Amazon embeds application/ld+json blocks in product pages. This is the cleanest source -- structured data meant for search engines:
```python
from bs4 import BeautifulSoup
import json

def extract_jsonld_price(html):
    soup = BeautifulSoup(html, 'html.parser')
    scripts = soup.find_all('script', type='application/ld+json')
    for script in scripts:
        try:
            data = json.loads(script.string)
            if isinstance(data, dict) and data.get('@type') == 'Product':
                offers = data.get('offers', {})
                if isinstance(offers, list):
                    offers = offers[0]
                return {
                    'name': data.get('name'),
                    'price': offers.get('price'),
                    'currency': offers.get('priceCurrency'),
                    'availability': offers.get('availability'),
                }
        except (json.JSONDecodeError, KeyError, TypeError):
            # TypeError covers empty script tags (script.string is None)
            continue
    return None
```
2. The __NEXT_DATA__ Object
Amazon has been progressively migrating pages to a Next.js-based frontend. Some product pages now include a __NEXT_DATA__ script tag with the full page payload:
```python
def extract_next_data(html):
    soup = BeautifulSoup(html, 'html.parser')
    script = soup.find('script', id='__NEXT_DATA__')
    if script and script.string:
        data = json.loads(script.string)
        props = data.get('props', {}).get('pageProps', {})
        product = props.get('product', {})
        return {
            'title': product.get('title'),
            'price': product.get('price', {}).get('value'),
            'currency': product.get('price', {}).get('currency'),
        }
    return None
```
3. HTML Parsing (Fallback)
When structured data isn't available, you fall back to parsing the HTML directly. This is more fragile but works on legacy pages:
```python
def extract_price_html(html):
    soup = BeautifulSoup(html, 'html.parser')

    # Standard buy-box price
    price_span = soup.select_one('span.a-price span.a-offscreen')
    if price_span:
        return price_span.get_text(strip=True)

    # Deal price
    deal_price = soup.select_one('#dealprice_feature span.a-offscreen')
    if deal_price:
        return deal_price.get_text(strip=True)

    # Kindle/ebook price
    kindle_price = soup.select_one('#kindle-price')
    if kindle_price:
        return kindle_price.get_text(strip=True)

    return None
```
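Note that the HTML fallback returns a display string like "$1,299.99" rather than a number. Downstream you'll usually want a float, so a small normalizer helps (a sketch assuming US-style formatting; the function name is my own):

```python
import re

def parse_price_string(text):
    """Convert a display price like '$1,299.99' to a float.
    Assumes US-style separators; returns None if no number is found."""
    match = re.search(r'[\d,]+(?:\.\d{1,2})?', text)
    if not match:
        return None
    return float(match.group(0).replace(',', ''))
```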
Putting It Together: A Basic Amazon Price Scraper
Here's a complete script that tries all three extraction methods:
```python
import requests
from bs4 import BeautifulSoup
import json
import time
import random

HEADERS_LIST = [
    {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml',
    },
    {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/18.2 Safari/605.1.15',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml',
    },
    {
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml',
    },
]

def scrape_amazon_price(asin):
    url = f'https://www.amazon.com/dp/{asin}'
    headers = random.choice(HEADERS_LIST)
    response = requests.get(url, headers=headers, timeout=15)
    response.raise_for_status()
    html = response.text

    # Try JSON-LD first (most reliable)
    result = extract_jsonld_price(html)
    if result and result.get('price'):
        result['source'] = 'json-ld'
        return result

    # Try __NEXT_DATA__
    result = extract_next_data(html)
    if result and result.get('price'):
        result['source'] = 'next-data'
        return result

    # Fall back to HTML parsing
    price = extract_price_html(html)
    if price:
        return {'price': price, 'source': 'html'}

    return {'error': 'Could not extract price', 'source': 'none'}

def monitor_prices(asins, interval_minutes=60):
    while True:
        for asin in asins:
            result = scrape_amazon_price(asin)
            print(f"[{asin}] {result}")
            time.sleep(random.uniform(3, 8))
        print(f"--- Sleeping {interval_minutes} minutes ---")
        time.sleep(interval_minutes * 60)
```
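Timeouts and transient 503s are routine at this cadence, so in practice you'll want to wrap scrape_amazon_price in a retry loop rather than let one bad response kill the monitor. A sketch (fetch_with_retries is my own helper, not part of any library):

```python
import random
import time

def fetch_with_retries(fetch_fn, max_attempts=4, base_delay=5):
    """Call fetch_fn() until it succeeds, sleeping with jittered
    exponential backoff between attempts; re-raise after the last one."""
    for attempt in range(max_attempts):
        try:
            return fetch_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # 5s, 10s, 20s... plus jitter so retries don't look clockwork
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Usage would look like `fetch_with_retries(lambda: scrape_amazon_price(asin))`.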
Handling Amazon's Anti-Bot Measures
Amazon is aggressive about blocking scrapers. Here's what you'll run into and how to deal with it.
Rotate User Agents
Never use the same User-Agent for every request. The HEADERS_LIST approach above helps, but for production use you'll want a larger pool. The key is making each request look like it comes from a real browser.
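One way to scale the pool: keep a flat list of UA strings and assemble the full header dict per request, so you only maintain the list in one place (a sketch; build_headers is my own helper):

```python
import random

# A production pool should be larger and refreshed as browser versions age out
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/18.2 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:134.0) Gecko/20100101 Firefox/134.0',
]

def build_headers():
    """Assemble a browser-like header set around a random User-Agent."""
    return {
        'User-Agent': random.choice(USER_AGENTS),
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
        'Connection': 'keep-alive',
        'Upgrade-Insecure-Requests': '1',
    }
```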
Add Random Delays
Hammering Amazon with rapid-fire requests is the fastest way to get blocked:
```python
time.sleep(random.uniform(3, 10))

# For large batches, add longer pauses every N requests
if request_count % 20 == 0:
    time.sleep(random.uniform(30, 60))
```
Handle CAPTCHAs Gracefully
When Amazon suspects automation, it returns a CAPTCHA page. Detect it and back off:
```python
def is_captcha_page(html):
    return 'captcha' in html.lower() or 'robot' in html.lower()

if is_captcha_page(response.text):
    print("CAPTCHA detected -- backing off for 5 minutes")
    time.sleep(300)
```
Use a Session with Cookies
Creating a requests.Session and letting cookies accumulate makes your requests look more natural:
```python
session = requests.Session()

# Visit the homepage first so the session picks up Amazon's cookies
session.get('https://www.amazon.com', headers=random.choice(HEADERS_LIST))
time.sleep(2)

response = session.get(product_url, headers=random.choice(HEADERS_LIST))
```
Using a Proxy Service for Reliable Scraping
If you're scraping more than a handful of products, you'll hit blocks quickly from a single IP address, no matter how carefully you rotate headers. This is where proxy services become essential.
ScraperAPI is worth looking at here -- they have a dedicated Amazon scraping endpoint that handles proxy rotation, CAPTCHA solving, and header management automatically. Instead of building all that infrastructure yourself, you send one API call:
```python
import requests

API_KEY = 'your_scraperapi_key'

def scrape_with_scraperapi(asin):
    url = 'https://api.scraperapi.com/structured/amazon/product'
    params = {
        'api_key': API_KEY,
        'asin': asin,
        'country': 'us',
    }
    response = requests.get(url, params=params, timeout=60)
    return response.json()
```
This returns clean JSON with the price, title, reviews, and availability -- no parsing needed. They offer 5,000 free API credits to start, which is enough to test whether this approach works for your use case.
For smaller projects (under ~100 products/day), the DIY approach with rotating headers and delays works fine. For anything larger, a managed proxy service will save you a lot of headaches.
Storing Price History
Once you're collecting prices, you need somewhere to put them. SQLite is perfect for small-to-medium projects:
```python
import sqlite3
from datetime import datetime

def init_db(db_path='prices.db'):
    conn = sqlite3.connect(db_path)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS prices '
        '(asin TEXT, price REAL, currency TEXT, timestamp TEXT, source TEXT)'
    )
    conn.commit()
    return conn

def save_price(conn, asin, price, currency='USD', source='html'):
    # Price may arrive as a float (JSON-LD) or a display string ('$19.99')
    conn.execute(
        'INSERT INTO prices VALUES (?, ?, ?, ?, ?)',
        (asin, float(str(price).replace('$', '').replace(',', '')),
         currency, datetime.utcnow().isoformat(), source)
    )
    conn.commit()
```
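Once rows are accumulating, a couple of read helpers turn the table into a simple alert system. A sketch against the schema above (the function names and the 5% threshold are my own choices):

```python
import sqlite3

def latest_price(conn, asin):
    """Most recent (price, timestamp) row recorded for an ASIN, or None."""
    return conn.execute(
        'SELECT price, timestamp FROM prices WHERE asin = ? '
        'ORDER BY timestamp DESC LIMIT 1', (asin,)
    ).fetchone()

def price_dropped(conn, asin, threshold_pct=5.0):
    """True if the newest price is at least threshold_pct below the previous one."""
    rows = conn.execute(
        'SELECT price FROM prices WHERE asin = ? '
        'ORDER BY timestamp DESC LIMIT 2', (asin,)
    ).fetchall()
    if len(rows) < 2:
        return False
    newest, previous = rows[0][0], rows[1][0]
    return newest <= previous * (1 - threshold_pct / 100)
```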
Things to Keep in Mind
- Respect robots.txt -- Amazon's robots.txt disallows most scraping. Understand the legal and ethical implications.
- Rate limit aggressively -- Even if you can go faster, don't. 1 request every 5-10 seconds is reasonable.
- Amazon's structure changes -- CSS selectors break. JSON-LD availability varies. Build in fallback logic and monitor for failures.
- Consider the API first -- Amazon's Product Advertising API (PA-API) is the official way to get product data. If you qualify for an Associates account, it's more reliable than scraping.
Wrapping Up
Building an Amazon price scraper in 2026 is a balancing act between multiple extraction methods and staying under Amazon's radar. Start with JSON-LD parsing (it's the cleanest), fall back to HTML selectors, rotate your headers, and add generous delays.
For production workloads, seriously consider a proxy service like ScraperAPI rather than managing infrastructure yourself -- the time savings usually outweigh the cost.
The code in this guide should give you a solid starting point. Adapt it to your specific use case, and always be respectful of the sites you're scraping.