Want a fast, practical guide to scraping Amazon product data with Python? Here’s a concise walkthrough using requests + BeautifulSoup, with anti-bot tips, pagination, and clean parsing. For a working reference, check the GitHub repo: https://github.com/maivyly52-gif/amazon-web-scraper-python
What You’ll Learn
Send realistic HTTP requests (headers, delays)
Parse titles, prices, ratings, URLs with BeautifulSoup
Handle pagination safely
Reduce blocks with rotating user agents/proxies
Know ethical & legal guardrails
Install the dependencies:

```shell
pip install requests beautifulsoup4 fake-useragent
```

(Proxy support? Add httpx or requests[socks], or a provider SDK.)
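If you go the requests[socks] route, pointing a session at a proxy is a one-liner. A minimal sketch follows; the endpoint below is a placeholder, not a real service:

```python
import requests

# Hypothetical endpoint -- substitute your proxy provider's details.
# socks5:// URLs require the requests[socks] extra (PySocks) to be installed.
PROXY = "socks5://user:pass@proxy.example.com:1080"

session = requests.Session()
session.proxies = {"http": PROXY, "https": PROXY}
# session.get("https://www.amazon.com/s?k=test")  # now routed through the proxy
```

Using a `Session` also gives you connection reuse and a single place to set headers for every request.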
Core Steps
1) Build a “human-like” request
```python
import time, random, requests
from fake_useragent import UserAgent

ua = UserAgent()

def build_headers():
    # Fresh User-Agent per request, so a retry doesn't reuse a blocked identity.
    return {
        "User-Agent": ua.random,
        "Accept-Language": "en-US,en;q=0.9",
    }

def fetch(url, *, retries=3, backoff=2):
    for i in range(retries):
        try:
            resp = requests.get(url, headers=build_headers(), timeout=20)
            if resp.status_code == 200 and "Robot Check" not in resp.text:
                return resp.text
        except requests.RequestException:
            pass  # timeout or connection error -- retry after the backoff
        # Linear backoff plus jitter between attempts.
        time.sleep(backoff * (i + 1) + random.uniform(0.2, 1.1))
    return None
```
2) Parse product cards
```python
from bs4 import BeautifulSoup

def parse_search(html):
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for card in soup.select("div.s-main-slot div[data-asin][data-component-type='s-search-result']"):
        asin = card.get("data-asin")
        title_el = card.select_one("h2 a span")
        price_el = card.select_one("span.a-price > span.a-offscreen")  # full price, e.g. "$29.99"
        rating_el = card.select_one("span.a-icon-alt")
        link_el = card.select_one("h2 a")
        if not (asin and title_el and link_el):
            continue
        items.append({
            "asin": asin,
            "title": title_el.get_text(strip=True),
            "price": price_el.get_text(strip=True) if price_el else None,
            "rating": rating_el.get_text(strip=True) if rating_el else None,
            "url": f"https://www.amazon.com{link_el['href'].split('?')[0]}",
        })
    return items
```
3) Walk pagination (carefully)
```python
from urllib.parse import urlencode

def search_amazon(query, pages=1):
    base = "https://www.amazon.com/s"
    results = []
    for page in range(1, pages + 1):
        params = {"k": query, "page": page}
        html = fetch(f"{base}?{urlencode(params)}")
        if not html:
            break  # blocked or failed -- stop rather than hammer the site
        results.extend(parse_search(html))
        time.sleep(random.uniform(1.2, 3.1))  # be gentle
    return results

if __name__ == "__main__":
    data = search_amazon("wireless earbuds", pages=2)
    for row in data[:5]:
        print(row)
```
Anti-Bot Tips (Reduce Blocks)
Rotate User-Agents per request (fake-useragent or a maintained list).
Respectful delays (1–5s jitter) and low concurrency.
Proxies: residential/mobile work best; rotate IPs and subnets.
Fewer parameters in URLs; avoid suspicious patterns.
Fallback strategies: try different storefronts or narrower filters when you hit captchas.
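The first three tips above can be combined into a small helper that picks a fresh identity (User-Agent plus proxy) for every request. The proxy endpoints below are hypothetical placeholders, and the User-Agent strings are just a starting point for a hand-maintained list:

```python
import random

# Hypothetical proxy pool -- substitute your provider's endpoints.
PROXIES = [
    "http://user:pass@p1.example.com:8000",
    "http://user:pass@p2.example.com:8000",
]

# A small hand-maintained list works when fake-useragent is unavailable.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]

def pick_identity():
    """Choose a fresh User-Agent and proxy for one request."""
    headers = {
        "User-Agent": random.choice(USER_AGENTS),
        "Accept-Language": "en-US,en;q=0.9",
    }
    proxy = random.choice(PROXIES)
    return headers, {"http": proxy, "https": proxy}

# Usage with requests (not executed here):
# headers, proxies = pick_identity()
# requests.get(url, headers=headers, proxies=proxies, timeout=20)
```

Keeping identity selection in one function makes it easy to add smarter strategies later, such as retiring proxies that keep returning captchas.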
Data You Can Extract (Typical)
Title, price, list price, rating, review count
ASIN, product URL, image URL
Badges (e.g., “Best Seller”, “Amazon’s Choice”)
Availability snippets
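Most of these extra fields live in the same search-result cards as the title and price. Amazon's markup changes often, so treat the selectors below as a starting sketch to verify against live HTML, not a stable contract:

```python
from bs4 import BeautifulSoup

def parse_extras(card):
    """Pull badge, review count, and image URL from one search-result card.

    Selectors are assumptions based on current Amazon search markup
    and may drift -- check them against a fresh page before relying on them.
    """
    badge = card.select_one("span.a-badge-text")
    reviews = card.select_one("span.a-size-base.s-underline-text")
    image = card.select_one("img.s-image")
    return {
        "badge": badge.get_text(strip=True) if badge else None,
        "review_count": reviews.get_text(strip=True) if reviews else None,
        "image_url": image["src"] if image else None,
    }
```

You can merge this dict into the items built by `parse_search` to enrich each row.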
Legal & Ethical Notes
Check Amazon’s Terms of Use and your local laws before scraping.
Prefer official APIs when possible (e.g., Amazon Product Advertising API) for reliability.
Don’t overload servers; throttle requests and cache results.
Use scraped data only where you have the right to use it.
Next Steps
Turn results into CSV/JSON for analysis.
Add retry with CAPTCHA detection and proxy rotation.
Expand parsing to product detail pages (features, bullets, specs).
Dive deeper, copy the boilerplate, and tweak it for your use case: https://github.com/maivyly52-gif/amazon-web-scraper-python. If you find it useful, star the repo and explore the code examples.