Mai Vy Ly
How to Use Python to Scrape Amazon?

Scraping Amazon can help you monitor prices, track reviews, and analyze product listings—but you must do it responsibly and within the site’s Terms of Service. If you’re exploring a starter approach, this GitHub project is a handy reference: amazon-scraper-python.

Quick Overview

At a high level, you’ll:

  1. Send an HTTP request with realistic headers.

  2. Parse the HTML to extract product data (title, price, rating, reviews).

  3. Handle pagination and anti-bot measures (rotating user agents/proxies).

  4. Store results (CSV/JSON/DB).

The sample structure in the amazon-scraper-python repo illustrates these steps with clean, beginner-friendly code.

Minimal Example (Educational Use Only)

import requests
from bs4 import BeautifulSoup

# Realistic headers reduce the chance of an immediate robot check.
headers = {
    "User-Agent": "Mozilla/5.0",
    "Accept-Language": "en-US,en;q=0.9"
}

url = "https://www.amazon.com/s?k=wireless+earbuds"
resp = requests.get(url, headers=headers, timeout=20)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
items = []

# Each search result is wrapped in a div with this data attribute.
for card in soup.select("div[data-component-type='s-search-result']"):
    title = card.select_one("h2 a span")
    price = card.select_one(".a-price .a-offscreen")  # full price text, e.g. "$29.99"
    rating = card.select_one("i span.a-icon-alt")     # e.g. "4.5 out of 5 stars"
    link = card.select_one("h2 a")

    # Guard every field: any selector can miss if the layout changes.
    items.append({
        "title": title.get_text(strip=True) if title else None,
        "price": price.get_text(strip=True) if price else None,
        "rating": rating.get_text(strip=True) if rating else None,
        "url": f"https://www.amazon.com{link['href']}" if link else None
    })

print(items[:5])
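Step 4 of the overview, storing results, can be sketched with the standard library alone. The `items` list here is a hypothetical stand-in for the parsed results above:

```python
import csv
import json

# Hypothetical parsed results, in the same shape as the `items` list above.
items = [
    {"title": "Example Earbuds", "price": "$29.99",
     "rating": "4.5 out of 5 stars", "url": "https://www.amazon.com/dp/EXAMPLE"},
]

# Write to CSV: one row per product, columns taken from the dict keys.
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price", "rating", "url"])
    writer.writeheader()
    writer.writerows(items)

# Or dump to JSON for downstream analysis.
with open("products.json", "w", encoding="utf-8") as f:
    json.dump(items, f, ensure_ascii=False, indent=2)
```

CSV is handy for spreadsheets; JSON keeps the nested-friendly shape if you later add fields like review lists.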

For a fuller implementation with utilities, see the examples in the amazon-scraper-python repository: https://github.com/maivyly52-gif/amazon-scraper-python
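Pagination (step 3 of the overview) can be sketched by walking Amazon's `page` query parameter. `parse_page` is a hypothetical placeholder for the BeautifulSoup extraction shown earlier, and the delay range is illustrative:

```python
import random
import time

def parse_page(html):
    # Placeholder: extract product dicts from one results page
    # (see the BeautifulSoup loop in the minimal example).
    return []

def scrape_pages(session, base_url, max_pages=3):
    """Fetch successive result pages, stopping on the first non-200 response."""
    all_items = []
    for page in range(1, max_pages + 1):
        resp = session.get(f"{base_url}&page={page}", timeout=20)
        if resp.status_code != 200:
            break  # likely a robot check or the end of results
        all_items.extend(parse_page(resp.text))
        time.sleep(random.uniform(2, 5))  # human-like pause between pages
    return all_items
```

Passing a `requests.Session` (rather than calling `requests.get` directly) reuses cookies and connections across pages, which also looks more like a real browser.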

Practical Tips to Reduce Blocks

  • Rotate headers & delays: Randomize User-Agent and add human-like waits.

  • Use residential/mobile proxies: Avoid sending many requests from a single IP.

  • Retry logic & backoff: Handle 503/robot checks gracefully.

  • CSS selectors change: Keep selectors resilient; prefer stable attributes.

  • Respect robots & ToS: Prefer official APIs where possible.

The project at https://github.com/maivyly52-gif/amazon-scraper-python shows a sensible baseline you can extend with rotating proxies, session handling, and structured outputs.
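The first three tips can be combined into a single helper. This is a sketch, not part of the repo: the User-Agent strings are illustrative, and `fetch_with_retries` is a hypothetical name.

```python
import random
import time

import requests

# A small pool of User-Agent strings to rotate through (illustrative values).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def fetch_with_retries(url, max_retries=4, base_delay=1.0):
    """GET a URL, retrying on 429/503 with exponential backoff and a fresh UA."""
    for attempt in range(max_retries):
        headers = {
            "User-Agent": random.choice(USER_AGENTS),
            "Accept-Language": "en-US,en;q=0.9",
        }
        resp = requests.get(url, headers=headers, timeout=20)
        if resp.status_code not in (429, 503):
            return resp
        # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))
    raise RuntimeError(f"Giving up on {url} after {max_retries} attempts")
```

The jitter matters: fixed retry intervals are a recognizable bot signature, while randomized backoff looks closer to organic traffic.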

What Data Can You Extract?

With selectors like those in the example above, typical fields include:

  • Product title and URL

  • Price

  • Star rating

  • Review counts

Legal & Ethical Notes

  • Check Amazon’s Terms of Service and your local laws.

  • Use data you’re authorized to access; avoid personal data.

  • Cache results and keep request rates low.
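The caching advice above can be sketched as a small file-based cache. `cached_fetch` and the `cache/` directory are hypothetical, and the `fetcher` callable stands in for whatever request function you use:

```python
import hashlib
import pathlib
import time

CACHE_DIR = pathlib.Path("cache")
CACHE_DIR.mkdir(exist_ok=True)

def cached_fetch(url, fetcher, max_age_seconds=3600):
    """Return cached HTML for `url` if fresh; otherwise call `fetcher(url)` and cache it."""
    key = hashlib.sha256(url.encode()).hexdigest()
    path = CACHE_DIR / f"{key}.html"
    # Serve from disk if the cached copy is younger than max_age_seconds.
    if path.exists() and time.time() - path.stat().st_mtime < max_age_seconds:
        return path.read_text(encoding="utf-8")
    html = fetcher(url)
    path.write_text(html, encoding="utf-8")
    return html
```

A cache like this means re-running your parser during development costs zero extra requests to Amazon.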

Ready to try it? Explore the code, run the examples, and adapt it for your use case here: https://github.com/maivyly52-gif/amazon-scraper-python
