Art Baker

How to Build a Web Scraper in Python (Step by Step)

Web scraping is one of the most practical Python skills you can learn. Here's how to build one from scratch.

What You Need

pip install requests beautifulsoup4

Step 1: Fetch the Page

import requests
from bs4 import BeautifulSoup

url = "https://example.com/products"
response = requests.get(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
})
soup = BeautifulSoup(response.text, "html.parser")

Always set a User-Agent. Many sites block requests without one.
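If you're making several requests, a requests.Session lets you set that header once and reuse it across every call (it also reuses the underlying TCP connection). A quick sketch:

```python
import requests

# A Session carries the same headers on every request it makes,
# so the User-Agent only needs to be set once.
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
})

# Every session.get() from here on sends that header automatically:
# response = session.get("https://example.com/products")
```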

Step 2: Find Your Data

Use your browser's DevTools (F12 → Inspect) to identify the HTML structure. Then:

# Find all product cards
products = soup.find_all("div", class_="product-card")

for product in products:
    name = product.find("h2").text.strip()
    price = product.find("span", class_="price").text.strip()
    print(f"{name}: {price}")
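Chained find() calls work, but BeautifulSoup also supports CSS selectors via select(), which read closer to what you see in DevTools. A self-contained sketch — the HTML literal is hypothetical sample markup mirroring the product-card structure above:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, standing in for a fetched page
html = """
<div class="product-card"><h2>Widget</h2><span class="price">$9.99</span></div>
<div class="product-card"><h2>Gadget</h2><span class="price">$19.50</span></div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() takes CSS selectors; select_one() returns the first match
products = [
    {"name": card.select_one("h2").text.strip(),
     "price": card.select_one("span.price").text.strip()}
    for card in soup.select("div.product-card")
]
print(products)
```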

Step 3: Handle Pagination

import time

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

def scrape_all_pages(base_url):
    all_data = []
    page = 1
    while True:
        response = requests.get(f"{base_url}?page={page}", headers=HEADERS)
        soup = BeautifulSoup(response.text, "html.parser")
        items = soup.find_all("div", class_="product-card")
        if not items:
            break
        for item in items:
            all_data.append({
                "name": item.find("h2").text.strip(),
                "price": item.find("span", class_="price").text.strip(),
            })
        page += 1
        time.sleep(1)  # be polite: pause between pages
    return all_data
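Counting ?page= numbers assumes the site exposes them. Many sites instead render a "next" link, which you can follow until it disappears — more robust when page numbering changes. A sketch, where the rel="next" markup is an assumption about the target site:

```python
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def next_page_url(soup, current_url):
    """Return the absolute URL of the next page, or None on the last page."""
    link = soup.select_one('a[rel="next"]')
    return urljoin(current_url, link["href"]) if link else None

# Example: a page whose footer links to page 2 (hypothetical markup)
soup = BeautifulSoup('<a rel="next" href="/products?page=2">Next</a>', "html.parser")
print(next_page_url(soup, "https://example.com/products"))
```

In the scraping loop, you'd fetch, parse, collect items, then set the URL to next_page_url(...) and stop when it returns None.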

Step 4: Save to CSV

import csv

data = scrape_all_pages("https://example.com/products")
with open("products.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(data)
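If downstream tools prefer JSON, the same rows serialize just as easily — the data literal below stands in for the output of scrape_all_pages():

```python
import json

# Stand-in for scrape_all_pages() output
data = [{"name": "Widget", "price": "$9.99"}]

with open("products.json", "w", encoding="utf-8") as f:
    # ensure_ascii=False keeps non-ASCII product names readable in the file
    json.dump(data, f, indent=2, ensure_ascii=False)
```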

Step 5: Add Error Handling

import time

def safe_request(url, retries=3):
    for attempt in range(retries):
        try:
            r = requests.get(url, timeout=10, headers={
                "User-Agent": "Mozilla/5.0"
            })
            r.raise_for_status()
            return r
        except requests.RequestException as e:
            print(f"Attempt {attempt+1} failed: {e}")
            time.sleep(2 ** attempt)
    return None
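The retry-with-backoff pattern inside safe_request generalizes to any callable, which also makes it easy to test without touching the network. A sketch with a simulated flaky fetch standing in for requests.get:

```python
import time

def with_retries(fn, retries=3, backoff=2):
    """Call fn(); on exception, wait backoff**attempt seconds and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < retries - 1:
                time.sleep(backoff ** attempt)
    return None  # all retries exhausted

# Simulated flaky fetch: fails twice, then succeeds
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = with_retries(flaky)
print(result)
```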

Common Pitfalls

  1. Rate limiting — Add time.sleep(1) between requests
  2. Dynamic content — If data loads via JavaScript, use Playwright or Selenium instead
  3. Changing HTML — Your selectors will break when the site updates. Prefer stable hooks like IDs or data-* attributes over long chains of presentational class names.
  4. Legal — Check the site's robots.txt and terms of service

Want ready-to-use scraping scripts? My Web Scraping Starter Kit includes 5 production scripts covering tables, pagination, login-protected sites, and API extraction.

Also check out: Python Automation Toolkit — 10 scripts for common dev tasks.
