DEV Community

Mai Vy Ly

How to Web Scrape Amazon with Python?

Want a fast, practical guide to scraping Amazon product data with Python? Here’s a concise walkthrough using requests + BeautifulSoup, with anti-bot tips, pagination, and clean parsing. For a working reference, check the GitHub repo: https://github.com/maivyly52-gif/amazon-web-scraper-python

What You’ll Learn

  • Send realistic HTTP requests (headers, delays)

  • Parse titles, prices, ratings, URLs with BeautifulSoup

  • Handle pagination safely

  • Reduce blocks with rotating user agents/proxies

  • Know ethical & legal guardrails

Explore the full example code here: https://github.com/maivyly52-gif/amazon-web-scraper-python

pip install requests beautifulsoup4 fake-useragent

(Proxy support? Add httpx/requests[socks] or a provider SDK.)
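If you do add proxy support, requests accepts a proxies mapping per request. The sketch below assumes a hypothetical provider endpoint (proxy.example.com and the credentials are placeholders, not real values); substitute your provider's gateway URL.

```python
import requests

# Hypothetical proxy gateway -- replace host/port/credentials with
# whatever your provider hands you.
PROXIES = {
    "http": "http://user:pass@proxy.example.com:8000",
    "https": "http://user:pass@proxy.example.com:8000",
}

def fetch_via_proxy(url, headers=None):
    """Route a single GET through the configured proxy."""
    resp = requests.get(url, headers=headers, proxies=PROXIES, timeout=20)
    resp.raise_for_status()
    return resp.text
```

The same `proxies=` argument works with a rotating-gateway provider, where one hostname hands out a different exit IP per connection.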

Core Steps

1) Build a “human-like” request

import time, random, requests
from fake_useragent import UserAgent

ua = UserAgent()
headers = {
    "User-Agent": ua.random,
    "Accept-Language": "en-US,en;q=0.9",
}

def fetch(url, *, retries=3, backoff=2):
    for i in range(retries):
        headers["User-Agent"] = ua.random  # rotate the UA on every attempt
        try:
            resp = requests.get(url, headers=headers, timeout=20)
        except requests.RequestException:
            resp = None  # network errors count as a failed attempt, not a crash
        if resp is not None and resp.status_code == 200 and "Robot Check" not in resp.text:
            return resp.text
        # linear backoff plus jitter so retries don't look machine-timed
        time.sleep(backoff * (i + 1) + random.uniform(0.2, 1.1))
    return None


2) Parse product cards

from bs4 import BeautifulSoup

def parse_search(html):
    soup = BeautifulSoup(html, "html.parser")
    items = []
    # Each organic result carries a data-asin attribute on its card div.
    for card in soup.select("div.s-main-slot div[data-asin][data-component-type='s-search-result']"):
        asin = card.get("data-asin")
        title_el = card.select_one("h2 a span")
        # a-offscreen holds the full formatted price, e.g. "$29.99"
        price_el = card.select_one("span.a-price > span.a-offscreen")
        rating = card.select_one("span.a-icon-alt")
        link_el = card.select_one("h2 a")
        if not (asin and title_el and link_el):
            continue
        items.append({
            "asin": asin,
            "title": title_el.get_text(strip=True),
            "price": price_el.get_text(strip=True) if price_el else None,
            "rating": rating.get_text(strip=True) if rating else None,
            "url": f"https://www.amazon.com{link_el['href'].split('?')[0]}",
        })
    return items


3) Walk pagination (carefully)

from urllib.parse import urlencode

def search_amazon(query, pages=1):
    base = "https://www.amazon.com/s"
    results = []
    for page in range(1, pages + 1):
        params = {"k": query, "page": page}
        html = fetch(f"{base}?{urlencode(params)}")
        if not html:
            break
        results.extend(parse_search(html))
        time.sleep(random.uniform(1.2, 3.1))  # be gentle
    return results

if __name__ == "__main__":
    data = search_amazon("wireless earbuds", pages=2)
    for row in data[:5]:
        print(row)


Prefer a ready-to-run example? See the repo’s code paths and notes: https://github.com/maivyly52-gif/amazon-web-scraper-python

Anti-Bot Tips (Reduce Blocks)

  • Rotate User-Agents per request (fake-useragent or a maintained list).

  • Respectful delays (1–5s jitter) and low concurrency.

  • Proxies: residential/mobile work best; rotate IPs and subnets.

  • Fewer parameters in URLs; avoid suspicious patterns.

  • Fallback strategies: try different storefronts or narrower filters when you hit captchas.
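The first two tips can be sketched without any network calls: keep a small UA pool, hand out fresh headers per request, and jitter the delay. The UA strings below are illustrative examples, not a maintained list.

```python
import itertools
import random

# Small hand-picked UA pool -- in practice keep this fresh,
# or use fake-useragent as shown earlier.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

_ua_cycle = itertools.cycle(USER_AGENTS)

def next_headers():
    """Fresh headers for each request: rotated UA, stable Accept-Language."""
    return {
        "User-Agent": next(_ua_cycle),
        "Accept-Language": "en-US,en;q=0.9",
    }

def polite_delay(low=1.0, high=5.0):
    """Jittered sleep interval in seconds (1-5s, per the tips above)."""
    return random.uniform(low, high)
```

Calling `requests.get(url, headers=next_headers())` then `time.sleep(polite_delay())` inside the pagination loop gives you both rotation and jitter with two lines of change.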

You’ll find a compact starter you can adapt in the GitHub project: https://github.com/maivyly52-gif/amazon-web-scraper-python

Data You Can Extract (Typical)

  • Title, price, list price, rating, review count

  • ASIN, product URL, image URL

  • Badges (e.g., “Best Seller”, “Amazon’s Choice”)

  • Availability snippets
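One way to keep these fields consistent across pages is a small dataclass; the schema below is a suggestion (field names beyond those in `parse_search` above, and the sample ASIN, are made up for illustration).

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class Product:
    """One scraped search-result row; core fields mirror parse_search()."""
    asin: str
    title: str
    url: str
    price: Optional[str] = None
    list_price: Optional[str] = None
    rating: Optional[str] = None
    review_count: Optional[str] = None
    image_url: Optional[str] = None
    badge: Optional[str] = None        # e.g. "Best Seller"
    availability: Optional[str] = None

# Example record (fabricated ASIN, for illustration only):
row = Product(
    asin="B0EXAMPLE1",
    title="Wireless Earbuds",
    url="https://www.amazon.com/dp/B0EXAMPLE1",
    price="$29.99",
    rating="4.5 out of 5 stars",
)
```

`asdict(row)` converts each record straight into the dict shape the parser already emits, so the two representations interoperate.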

Legal & Ethical Notes

  • Check Amazon’s Terms of Use and your local laws before scraping.

  • Prefer official APIs when possible (e.g., Amazon Product Advertising API) for reliability.

  • Don’t overload servers; throttle requests and cache results.

  • Use scraped data only where you have the right to use it.

Next Steps

  • Turn results into CSV/JSON for analysis.

  • Add retry with CAPTCHA detection and proxy rotation.

  • Expand parsing to product detail pages (features, bullets, specs).
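For the first item, the dicts returned by `search_amazon()` can be dumped to both formats in a few lines; this is a minimal sketch, assuming every row shares the same keys (which holds for the parser above).

```python
import csv
import json

def save_results(rows, csv_path="results.csv", json_path="results.json"):
    """Persist a list of uniform dicts to CSV and JSON side by side."""
    if not rows:
        return
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(rows, f, ensure_ascii=False, indent=2)
```

Usage: `save_results(search_amazon("wireless earbuds", pages=2))` writes `results.csv` and `results.json` next to the script.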

Dive deeper, copy the boilerplate, and tweak it for your use case here: https://github.com/maivyly52-gif/amazon-web-scraper-python — and if you find it useful, star the repo and explore the code examples.

Top comments (1)

OnlineProxy

IP rotation beats UA rotation; consider residential/mobile proxies if you need consistency, track ban rates, and normalize currencies/availability per locale. Export deduped ASINs to tidy CSV/JSON, graduate to Scrapy or httpx when you scale, and always play nice with Amazon’s Terms and official APIs. For sanity, split things into fetch.py, parse.py, and runner.py, and wire up search_amazon plus tests around HTML fixtures.