agenthustler
How to Monitor Amazon Product Prices in Python (Without Getting Blocked in 2026)

Price tracking is a billion-dollar industry. From Keepa to CamelCamelCamel, hundreds of tools monitor Amazon prices — and behind each one is a scraper that needs to survive Amazon's anti-bot defenses.

In this guide, I'll show you how to build a working Amazon price monitor in Python, why naive approaches fail, and how to actually keep it running in 2026.

The Basic Approach (And Why It Breaks)

Let's start with the simplest possible scraper:

import requests
from bs4 import BeautifulSoup

def scrape_amazon_product(url):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept': 'text/html,application/xhtml+xml'
    }

    response = requests.get(url, headers=headers)

    if response.status_code != 200:
        print(f'Blocked! Status: {response.status_code}')
        return None

    soup = BeautifulSoup(response.text, 'html.parser')

    title = soup.select_one('#productTitle')
    price_whole = soup.select_one('.a-price-whole')
    price_fraction = soup.select_one('.a-price-fraction')
    rating = soup.select_one('span.a-icon-alt')
    review_count = soup.select_one('#acrCustomerReviewText')

    return {
        'title': title.text.strip() if title else None,
        'price': f"{price_whole.text}{price_fraction.text}" if price_whole and price_fraction else None,
        'rating': rating.text.strip() if rating else None,
        'reviews': review_count.text.strip() if review_count else None
    }

# Test it
product = scrape_amazon_product('https://www.amazon.com/dp/B0D1XD1ZV3')
print(product)

Run this a few times and it works. Run it 20 times in a row and you'll get hit with a CAPTCHA page or a 503 response. Here's why.

Why Amazon Blocks You

Amazon's anti-bot system checks several signals:

  1. User-Agent consistency — sending the same UA on every request is a red flag.
  2. Request rate — more than roughly 10 requests per minute from one IP tends to trigger detection.
  3. IP reputation — datacenter IPs are flagged almost immediately; residential IPs get more leeway.
  4. TLS fingerprint — Python's requests library has a recognizable TLS fingerprint that differs from real browsers.
  5. Missing headers — real browsers send 15+ headers; the script above sends three.

The basic approach above will work for occasional one-off checks. For continuous monitoring, you need a different strategy.
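One practical detail before moving on: Amazon often serves its CAPTCHA interstitial with an HTTP 200 status, so checking the status code alone isn't enough to know you've been blocked. A small helper makes that check explicit — the sentinel strings below are assumptions based on commonly observed Amazon block pages, not an official list:

```python
def is_blocked(status_code, html):
    """Heuristically detect an Amazon block page.

    Amazon sometimes returns its CAPTCHA page with HTTP 200,
    so the body has to be inspected as well as the status code.
    """
    if status_code in (403, 429, 503):
        return True

    # Phrases commonly seen on Amazon's CAPTCHA / robot-check pages
    block_markers = (
        'api-services-support@amazon.com',
        'enter the characters you see below',
        'to discuss automated access',
    )
    lowered = html.lower()
    return any(marker in lowered for marker in block_markers)
```

Calling `is_blocked(response.status_code, response.text)` before parsing saves you from silently storing garbage when the "product page" is actually a robot check.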

Adding Rotation and Delays

A slightly more robust version with random delays and User-Agent rotation:

import requests
from bs4 import BeautifulSoup
import random
import time

USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Chrome/131.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/605.1.15 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:132.0) Gecko/20100101 Firefox/132.0',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 Edg/131.0.0.0',
]

def scrape_with_retry(url, max_retries=3):
    for attempt in range(max_retries):
        headers = {
            'User-Agent': random.choice(USER_AGENTS),
            'Accept-Language': 'en-US,en;q=0.9',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Encoding': 'gzip, deflate, br',
            'DNT': '1',
            'Connection': 'keep-alive',
            'Upgrade-Insecure-Requests': '1',
        }

        time.sleep(random.uniform(2, 5))  # Random delay between requests
        response = requests.get(url, headers=headers, timeout=10)

        if response.status_code == 200 and 'captcha' not in response.text.lower():
            return parse_product(response.text)

        print(f'Attempt {attempt + 1} blocked, retrying...')
        time.sleep(random.uniform(10, 30))  # Back off on failure

    return None

def parse_product(html):
    soup = BeautifulSoup(html, 'html.parser')

    price_whole = soup.select_one('.a-price-whole')
    price_fraction = soup.select_one('.a-price-fraction')

    return {
        'title': _text(soup.select_one('#productTitle')),
        'price': f"${price_whole.text.strip()}{price_fraction.text.strip()}" if price_whole and price_fraction else None,
        'rating': _text(soup.select_one('span.a-icon-alt')),
        'reviews': _text(soup.select_one('#acrCustomerReviewText')),
        'availability': _text(soup.select_one('#availability span')),
    }

def _text(el):
    return el.text.strip() if el else None

This buys you more time, but for any serious monitoring (100+ products, daily checks), you'll still get blocked within a day or two. The problem is your IP — Amazon sees all requests from the same address.
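Before paying for a service, you can squeeze a bit more runway out of a pool of proxies you already have. The sketch below shows only the rotation bookkeeping — the proxy URLs are placeholders, not real endpoints — and each returned dict is what you'd pass to `requests.get(url, proxies=...)`:

```python
from itertools import cycle

# Placeholder proxy URLs — substitute your own pool
PROXIES = [
    'http://proxy1.example.com:8000',
    'http://proxy2.example.com:8000',
    'http://proxy3.example.com:8000',
]

_proxy_pool = cycle(PROXIES)

def next_proxy():
    """Return the next proxy in round-robin order, in the dict
    format that requests' `proxies=` argument expects."""
    proxy = next(_proxy_pool)
    return {'http': proxy, 'https': proxy}
```

Round-robin is the simplest policy; in practice you'd also want to drop proxies that start failing, which is exactly the bookkeeping a paid service does for you.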

The Proxy Approach: ScraperAPI

The most reliable way to scrape Amazon at scale is to route requests through rotating proxies that handle the anti-bot layer for you. ScraperAPI does exactly this — it manages proxy rotation, CAPTCHA solving, and header management automatically.

Here's the same scraper using ScraperAPI:

import requests
from bs4 import BeautifulSoup

SCRAPER_API_KEY = 'your_api_key_here'

def scrape_amazon_via_proxy(asin):
    url = f'https://www.amazon.com/dp/{asin}'
    params = {
        'api_key': SCRAPER_API_KEY,
        'url': url,
        'country_code': 'us',
    }

    # Passing params lets requests URL-encode the target URL properly
    response = requests.get('https://api.scraperapi.com/', params=params, timeout=60)

    if response.status_code != 200:
        return None

    return parse_product(response.text)

# Monitor a list of products
asins = ['B0D1XD1ZV3', 'B0DGHRF8TN', 'B0C8Y7WC2M']

for asin in asins:
    product = scrape_amazon_via_proxy(asin)
    if product:
        print(f"{product['title'][:50]}... — {product['price']}")

This handles rotation, retries, and CAPTCHA solving on their end. You get clean HTML back. Their free tier gives you 5,000 API credits to test with.

Building a Price Monitor

Now let's combine everything into an actual price tracker that alerts you on drops:

import json
from datetime import datetime
from pathlib import Path

PRICE_HISTORY_FILE = 'price_history.json'

def load_history():
    if Path(PRICE_HISTORY_FILE).exists():
        return json.loads(Path(PRICE_HISTORY_FILE).read_text())
    return {}

def save_history(history):
    Path(PRICE_HISTORY_FILE).write_text(json.dumps(history, indent=2))

def check_price_drop(asin, current_price, threshold_pct=5):
    history = load_history()

    if asin not in history:
        history[asin] = []

    prices = history[asin]
    prices.append({'price': current_price, 'date': datetime.now().isoformat()})

    # Keep last 90 days of data
    history[asin] = prices[-90:]
    save_history(history)

    if len(prices) < 2:
        return False

    previous = prices[-2]['price']
    if previous and current_price:
        drop_pct = ((previous - current_price) / previous) * 100
        return drop_pct >= threshold_pct

    return False

def monitor_products(asins):
    for asin in asins:
        product = scrape_amazon_via_proxy(asin)  # or scrape_with_retry()
        if not product or not product['price']:
            continue

        price = float(product['price'].replace('$', '').replace(',', ''))
        dropped = check_price_drop(asin, price)

        status = ' ⬇ PRICE DROP!' if dropped else ''
        print(f"[{datetime.now():%H:%M}] {product['title'][:40]}... ${price:.2f}{status}")

# Run every hour via cron:
# 0 * * * * cd /path/to/project && python monitor.py
monitor_products(['B0D1XD1ZV3', 'B0DGHRF8TN', 'B0C8Y7WC2M'])
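Printing to stdout is fine while you're watching, but for hands-off monitoring you'll want the alert pushed to you. A minimal email sketch using the standard library's smtplib — the SMTP host, credentials, and addresses are placeholders you'd replace with your provider's details:

```python
import smtplib
from email.mime.text import MIMEText

# Placeholder settings — replace with your SMTP provider's details
SMTP_HOST = 'smtp.example.com'
SMTP_PORT = 587
SMTP_USER = 'alerts@example.com'
SMTP_PASS = 'your_password_here'
ALERT_TO = 'you@example.com'

def build_alert(title, old_price, new_price):
    """Build the price-drop alert email."""
    drop_pct = (old_price - new_price) / old_price * 100
    msg = MIMEText(
        f'{title}\n\nPrice dropped {drop_pct:.1f}%: '
        f'${old_price:.2f} -> ${new_price:.2f}'
    )
    msg['Subject'] = f'Price drop: {title[:50]}'
    msg['From'] = SMTP_USER
    msg['To'] = ALERT_TO
    return msg

def send_alert(msg):
    """Send the alert over STARTTLS. Network call — swap in your server."""
    with smtplib.SMTP(SMTP_HOST, SMTP_PORT) as server:
        server.starttls()
        server.login(SMTP_USER, SMTP_PASS)
        server.send_message(msg)
```

Hook `build_alert` + `send_alert` into `monitor_products` where the drop is detected; a webhook to Slack or Discord works the same way if email feels heavyweight.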

Scaling Up: When Code Isn't Enough

Once you're monitoring 500+ products, managing your own scraper becomes a full-time job — proxy failures, selector changes, rate limit tuning. At that point, consider a managed solution.

Amazon product scrapers on Apify handle the infrastructure side: automatic proxy rotation, scheduled runs, and structured JSON output. You write zero scraping code — just feed in ASINs and get back clean data via API or webhook.

This makes sense when your time is better spent building the application layer (alerts, dashboards, arbitrage logic) rather than maintaining scraping infrastructure.

Key Takeaways

  1. Basic requests + BeautifulSoup works for small-scale, occasional scraping
  2. Rotate User-Agents and add delays to extend your runway
  3. Use a proxy service like ScraperAPI when you need reliability at scale
  4. Store price history in JSON or SQLite for trend analysis
  5. Schedule with cron for hands-off monitoring
  6. Consider managed solutions when maintaining scrapers costs more than paying for one

The code above is a starting point. For production use, add SQLite instead of JSON, proper error handling, and structured logging. But the core pattern — fetch, parse, compare, alert — stays the same.
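Swapping the JSON file for SQLite is mostly a matter of replacing load_history/save_history. A minimal sketch using the standard library's sqlite3 — the table and column names here are my own choice, not any standard schema:

```python
import sqlite3
from datetime import datetime

def init_db(path='price_history.db'):
    """Open the database and create the history table if needed."""
    conn = sqlite3.connect(path)
    conn.execute('''
        CREATE TABLE IF NOT EXISTS price_history (
            asin TEXT NOT NULL,
            price REAL NOT NULL,
            checked_at TEXT NOT NULL
        )
    ''')
    conn.commit()
    return conn

def record_price(conn, asin, price):
    """Append one price observation for an ASIN."""
    conn.execute(
        'INSERT INTO price_history (asin, price, checked_at) VALUES (?, ?, ?)',
        (asin, price, datetime.now().isoformat()),
    )
    conn.commit()

def latest_prices(conn, asin, limit=2):
    """Most recent prices first — enough to compute a drop."""
    rows = conn.execute(
        'SELECT price FROM price_history WHERE asin = ? '
        'ORDER BY checked_at DESC, rowid DESC LIMIT ?',
        (asin, limit),
    ).fetchall()
    return [r[0] for r in rows]
```

Unlike the JSON version, this keeps full history with no 90-entry cap, and trend questions like "lowest price in the last 30 days" become a single SQL query instead of a list comprehension.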


What's your Amazon scraping setup? Drop a comment if you've found better selectors or anti-detection tricks that work in 2026.
