Web scraping is one of the most practical Python skills you can learn. Here's how to build a scraper from scratch.
## What You Need

```bash
pip install requests beautifulsoup4
```
## Step 1: Fetch the Page

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"
response = requests.get(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
})
soup = BeautifulSoup(response.text, "html.parser")
```
Always set a User-Agent. Many sites block requests without one.
## Step 2: Find Your Data
Use your browser's DevTools (F12 → Inspect) to identify the HTML structure. Then:
```python
# Find all product cards
products = soup.find_all("div", class_="product-card")

for product in products:
    name = product.find("h2").text.strip()
    price = product.find("span", class_="price").text.strip()
    print(f"{name}: {price}")
```
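If you prefer CSS selectors, BeautifulSoup's `select()` and `select_one()` cover the same ground and are often terser than chained `find()` calls. Here's a self-contained sketch; the HTML sample is made up for illustration:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for a real product listing page
html = """
<div class="product-card"><h2>Widget</h2><span class="price">$9.99</span></div>
<div class="product-card"><h2>Gadget</h2><span class="price">$19.99</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector; select_one() returns the first match
rows = [
    (card.select_one("h2").get_text(strip=True),
     card.select_one("span.price").get_text(strip=True))
    for card in soup.select("div.product-card")
]
print(rows)  # [('Widget', '$9.99'), ('Gadget', '$19.99')]
```

The selector syntax here is the same one DevTools shows you, so you can often copy it straight from the inspector.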
## Step 3: Handle Pagination
```python
def scrape_all_pages(base_url):
    # Reuse the same User-Agent here; unheadered requests often get blocked
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
    all_data = []
    page = 1
    while True:
        response = requests.get(f"{base_url}?page={page}", headers=headers)
        soup = BeautifulSoup(response.text, "html.parser")
        items = soup.find_all("div", class_="product-card")
        if not items:
            break
        for item in items:
            all_data.append({
                "name": item.find("h2").text.strip(),
                "price": item.find("span", class_="price").text.strip(),
            })
        page += 1
    return all_data
```
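One caveat with this loop: it assumes every card contains both an `<h2>` and a price `<span>`. If either is missing, `find()` returns `None` and `.text` raises `AttributeError`. A small helper (the name `text_or_none` is mine, not a BeautifulSoup API) makes extraction tolerant:

```python
from bs4 import BeautifulSoup

def text_or_none(parent, *args, **kwargs):
    """Return the stripped text of a matching child, or None if absent."""
    el = parent.find(*args, **kwargs)
    return el.get_text(strip=True) if el else None

# Made-up card that is missing its price span
html = '<div class="product-card"><h2>Widget</h2></div>'
card = BeautifulSoup(html, "html.parser").find("div", class_="product-card")

print(text_or_none(card, "h2"))                    # Widget
print(text_or_none(card, "span", class_="price"))  # None
```

With this in place, a malformed card yields a row with `None` fields instead of crashing the whole scrape.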
## Step 4: Save to CSV
```python
import csv

data = scrape_all_pages("https://example.com/products")

# newline="" avoids blank rows on Windows; utf-8 handles non-ASCII product names
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(data)
```
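You can sanity-check this step without touching the network by running the same writer against an in-memory buffer with made-up rows, then reading the result back:

```python
import csv
import io

# Hypothetical rows standing in for scrape_all_pages() output
data = [{"name": "Widget", "price": "$9.99"},
        {"name": "Gadget", "price": "$19.99"}]

buf = io.StringIO()  # in-memory stand-in for products.csv
writer = csv.DictWriter(buf, fieldnames=["name", "price"])
writer.writeheader()
writer.writerows(data)

# Round-trip: reading it back should reproduce the original dicts
rows = list(csv.DictReader(io.StringIO(buf.getvalue())))
print(rows[0]["name"])  # Widget
```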
## Step 5: Add Error Handling
```python
import time

def safe_request(url, retries=3):
    for attempt in range(retries):
        try:
            r = requests.get(url, timeout=10, headers={
                "User-Agent": "Mozilla/5.0"
            })
            r.raise_for_status()
            return r
        except requests.RequestException as e:
            print(f"Attempt {attempt+1} failed: {e}")
            time.sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s
    return None
```
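The retry-with-backoff pattern generalizes beyond HTTP. The sketch below factors it into a generic helper (`with_retries` is a name I'm inventing for illustration) and exercises it with a simulated flaky call, so it runs without any network access:

```python
import time

def with_retries(fn, retries=3, base_delay=0.01):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            time.sleep(base_delay * 2 ** attempt)
    return None  # all retries exhausted

calls = {"count": 0}

def flaky():
    # Fails twice, then succeeds: simulates a transient network error
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

print(with_retries(flaky))  # ok
```

In a real scraper you'd pass something like `lambda: requests.get(url, timeout=10)` and use a `base_delay` of a second or more.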
## Common Pitfalls
- Rate limiting — Add `time.sleep(1)` between requests
- Dynamic content — If data loads via JavaScript, use Playwright or Selenium instead
- Changing HTML — Your selectors will break when the site updates. Prefer stable hooks such as `id` or `data-*` attributes over deeply nested class chains.
- Legal — Check the site's robots.txt and terms of service
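For the robots.txt check, Python's standard library can do the parsing for you. The sketch below feeds `RobotFileParser` a made-up robots.txt body; against a real site you'd call `set_url("https://example.com/robots.txt")` and `read()` instead of `parse()`:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content; normally fetched from the site's /robots.txt
rules = """\
User-agent: *
Disallow: /admin/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://example.com/products"))     # True
print(rp.can_fetch("*", "https://example.com/admin/users"))  # False
```

Running `can_fetch` before each request is cheap insurance; it doesn't replace reading the terms of service, but it catches the paths the site has explicitly asked crawlers to avoid.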
Want ready-to-use scraping scripts? My Web Scraping Starter Kit includes 5 production scripts covering tables, pagination, login-protected sites, and API extraction.
Also check out: Python Automation Toolkit — 10 scripts for common dev tasks.