Mai Vy Ly
How to Use Python to Scrape Amazon?

Scraping Amazon can help you monitor prices, track reviews, and analyze product listings—but you must do it responsibly and within the site’s Terms of Service. If you’re exploring a starter approach, this GitHub project is a handy reference: amazon-scraper-python.

Quick Overview

At a high level, you’ll:

  1. Send an HTTP request with realistic headers.

  2. Parse the HTML to extract product data (title, price, rating, reviews).

  3. Handle pagination and anti-bot measures (rotating user agents/proxies).

  4. Store results (CSV/JSON/DB).

The sample structure in the amazon-scraper-python repo illustrates these steps with clean, beginner-friendly code.

Minimal Example (Educational Use Only)

import requests
from bs4 import BeautifulSoup

# Realistic headers reduce the chance of an immediate robot check.
headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9"
}

url = "https://www.amazon.com/s?k=wireless+earbuds"
resp = requests.get(url, headers=headers, timeout=20)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
items = []

# Each search result is wrapped in a div with this data attribute.
for card in soup.select("div[data-component-type='s-search-result']"):
    title = card.select_one("h2 a span")
    price = card.select_one(".a-price .a-offscreen")  # full price text, e.g. "$29.99"
    rating = card.select_one("i span.a-icon-alt")     # e.g. "4.5 out of 5 stars"
    link = card.select_one("h2 a")

    # Guard every field: any selector can miss if the layout changes.
    items.append({
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "rating": rating.get_text(strip=True) if rating else None,
        "url": f"https://www.amazon.com{link['href']}" if link else None
    })

print(items[:5])
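Step 4 of the overview, storing results, can be sketched with the standard library alone. The `items` list here is a hypothetical stand-in for the parsed results above:

```python
import csv
import json

# Hypothetical parsed results, in the same shape as the `items` list above.
items = [
    {"title": "Example Earbuds", "price": "$29.99",
     "rating": "4.5 out of 5 stars", "url": "https://www.amazon.com/dp/EXAMPLE"},
]

# Write to CSV: one row per product, columns taken from the dict keys.
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price", "rating", "url"])
    writer.writeheader()
    writer.writerows(items)

# Or dump to JSON for downstream analysis.
with open("products.json", "w", encoding="utf-8") as f:
    json.dump(items, f, ensure_ascii=False, indent=2)
```

CSV is handy for spreadsheets; JSON keeps the nested-friendly shape if you later add fields like review lists.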

For a fuller implementation with utilities, see the examples in the amazon-scraper-python repository: https://github.com/maivyly52-gif/amazon-scraper-python
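Pagination (step 3 of the overview) can be sketched by walking Amazon's `page` query parameter. `parse_page` is a hypothetical placeholder for the BeautifulSoup extraction shown earlier, and the delay range is illustrative:

```python
import random
import time

def parse_page(html):
    # Placeholder: extract product dicts from one results page
    # (see the BeautifulSoup loop in the minimal example).
    return []

def scrape_pages(session, base_url, max_pages=3):
    """Fetch successive result pages, stopping on the first non-200 response."""
    all_items = []
    for page in range(1, max_pages + 1):
        resp = session.get(f"{base_url}&page={page}", timeout=20)
        if resp.status_code != 200:
            break  # likely a robot check or the end of results
        all_items.extend(parse_page(resp.text))
        time.sleep(random.uniform(2, 5))  # human-like pause between pages
    return all_items
```

Passing a `requests.Session` (rather than calling `requests.get` directly) reuses cookies and connections across pages, which also looks more like a real browser.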

Practical Tips to Reduce Blocks

  • Rotate headers & delays: Randomize User-Agent and add human-like waits.

  • Use residential/mobile proxies: Avoid sending many requests from a single IP.

  • Retry logic & backoff: Handle 503/robot checks gracefully.

  • CSS selectors change: Keep selectors resilient; prefer stable attributes.

  • Respect robots & ToS: Prefer official APIs where possible.

The project at https://github.com/maivyly52-gif/amazon-scraper-python shows a sensible baseline you can extend with rotating proxies, session handling, and structured outputs.
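The first three tips can be combined into a single helper. This is a sketch, not part of the repo: the User-Agent strings are illustrative, and `fetch_with_retries` is a hypothetical name.

```python
import random
import time

import requests

# A small pool of User-Agent strings to rotate through (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def fetch_with_retries(url, max_retries=4, base_delay=1.0):
    """GET a URL, retrying on 429/503 with exponential backoff and a fresh UA."""
    for attempt in range(max_retries):
        headers = {
            "User-Agent": random.choice(USER_AGENTS),
            "Accept-Language": "en-US,en;q=0.9",
        }
        resp = requests.get(url, headers=headers, timeout=20)
        if resp.status_code not in (429, 503):
            return resp
        # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")
```

The jitter matters: fixed retry intervals are a recognizable bot signature, while randomized backoff looks closer to organic traffic.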

What Data Can You Extract?

With selectors like those in the example above, typical fields include:

  • Product title and URL

  • Price

  • Star rating

  • Review counts

Legal & Ethical Notes

  • Check Amazon’s Terms of Service and your local laws.

  • Use data you’re authorized to access; avoid personal data.

  • Cache results and keep request rates low.
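The caching advice above can be sketched as a small file-based cache. `cached_fetch` and the `cache/` directory are hypothetical, and the `fetcher` callable stands in for whatever request function you use:

```python
import hashlib
import pathlib
import time

CACHE_DIR = pathlib.Path("cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_fetch(url, fetcher, max_age_seconds=3600):
    """Return cached HTML for `url` if fresh; otherwise call `fetcher(url)` and cache it."""
    key = hashlib.sha256(url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    # Serve from disk if the cached copy is younger than max_age_seconds.
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return path.read_text(encoding="utf-8")
    html = fetcher(url)
    path.write_text(html, encoding="utf-8")
    return html
```

A cache like this means re-running your parser during development costs zero extra requests to Amazon.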

Ready to try it? Explore the code, run the examples, and adapt it for your use case here: https://github.com/maivyly52-gif/amazon-scraper-python
