What is Web Scraping? A Beginner's Guide with Real Python Code


Every website you visit is full of data. Web scraping is how you extract that data automatically using code instead of copying it manually.

A real problem web scraping solves

Imagine you want a list of all books under £10 from an online bookstore. The website has 50 pages of books. Copying them manually would take hours.

A web scraper can do the same job in a minute or two, even with short pauses between requests.

How websites work — the quick version

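At its core, every webpage is an HTML document. When you open a website, your browser downloads that HTML and renders it visually. Web scraping works by downloading the same HTML and extracting specific pieces of data from it. Pages that build their content with JavaScript after loading need extra tools, but many simple sites work exactly this way.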
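To make the idea concrete, here is a minimal sketch that treats a page as plain text and pulls one kind of element out of it. It uses Python's built-in html.parser module rather than a scraping library, and the HTML snippet, the TitleCollector class, and the extract_titles function are made up for illustration.

```python
from html.parser import HTMLParser

# A tiny, made-up page. A real scraper would download this text instead.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Book list</h1>
    <a href="/book/1" title="First Book">First...</a>
    <a href="/book/2" title="Second Book">Second...</a>
  </body>
</html>
"""


class TitleCollector(HTMLParser):
    """Collects the title attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            attributes = dict(attrs)
            if "title" in attributes:
                self.titles.append(attributes["title"])


def extract_titles(html):
    """Feed the HTML text to the parser and return the collected titles."""
    parser = TitleCollector()
    parser.feed(html)
    return parser.titles


print(extract_titles(SAMPLE_HTML))
```

The browser would turn the same text into a visible page. The scraper skips the display step and keeps only the values it was asked for.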

Your first web scraper — 15 lines of code

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Step 1: Download the webpage
url = "https://books.toscrape.com"
response = requests.get(url, timeout=10)  # give up after 10 seconds instead of hanging

# Step 2: Parse the HTML
soup = BeautifulSoup(response.content, "html.parser")

# Step 3: Find all book items on the page
books = soup.find_all("article", class_="product_pod")

# Step 4: Extract the data we want
data = []
for book in books:
    title = book.find("h3").find("a")["title"]
    price = book.find("p", class_="price_color").text
    data.append({"Title": title, "Price": price})

# Step 5: Save to Excel
df = pd.DataFrame(data)
df.to_excel("books.xlsx", index=False)
print(f"Scraped {len(data)} books")

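Run this and you get an Excel file, books.xlsx, with the title and price of every book on that page. The script reads only the single page at that URL, so covering a multi-page catalogue means repeating the same steps for each page. books.toscrape.com is a safe practice website built specifically for learning scraping.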
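Getting from one page to the original goal (all books under £10) needs two more pieces: a list of page URLs and prices as numbers rather than text. The sketch below shows one way to do both. The URL pattern in BASE_URL is an assumption about how the practice site numbers its pages, and page_urls, parse_price, and books_under are hypothetical helper names, so check the pattern in your browser before relying on it.

```python
# Assumed URL pattern for the practice site's catalogue pages
BASE_URL = "https://books.toscrape.com/catalogue/page-{}.html"


def page_urls(last_page):
    """Build the list of catalogue page URLs from page 1 to last_page."""
    return [BASE_URL.format(number) for number in range(1, last_page + 1)]


def parse_price(text):
    """Turn a price string such as '£51.77' into the float 51.77."""
    # Keep digits and the decimal point, drop currency symbols and spaces
    cleaned = "".join(ch for ch in text if ch.isdigit() or ch == ".")
    return float(cleaned)


def books_under(rows, limit):
    """Keep only the rows whose price is below limit."""
    return [row for row in rows if parse_price(row["Price"]) < limit]


# Made-up rows in the same shape as the data list in the script above
sample = [
    {"Title": "Cheap Book", "Price": "£9.50"},
    {"Title": "Pricey Book", "Price": "£51.77"},
]

print(page_urls(3))
print(books_under(sample, 10))
```

In a full scraper, the steps from the script above would run once per URL from page_urls, and books_under would filter the combined list before saving.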

What each part does

requests.get(url) — downloads the raw HTML of the webpage, the same way your browser does

BeautifulSoup — reads the HTML and lets you search through it like a document

find_all("article", class_="product_pod") — finds every book item on the page by looking for its HTML tag and class name

title = book.find("h3").find("a")["title"] — digs into each book item, finds the link inside its heading, and reads the full title from the link's title attribute

df.to_excel() — saves everything neatly into an Excel file
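The lookups above are easier to follow on a small piece of HTML you can see in full. The snippet below is made up, shaped to resemble a single book entry, and shows the difference between reading a tag's visible text and reading one of its attributes.

```python
from bs4 import BeautifulSoup

# Made-up HTML shaped like one book entry on the practice site
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="book_1.html" title="A Very Long Book Title">A Very Long ...</a></h3>
  <p class="price_color">£12.99</p>
</article>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a list of every matching tag
books = soup.find_all("article", class_="product_pod")

# find returns only the first match inside the tag it is called on
link = books[0].find("h3").find("a")
visible_text = link.text    # the shortened text shown on the page
full_title = link["title"]  # the complete title stored in the attribute
price = books[0].find("p", class_="price_color").text

print(len(books), visible_text, full_title, price)
```

Square brackets read an attribute of the tag, while .text reads what sits between the opening and closing tags.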

Real uses of web scraping

Price monitoring — track competitor prices on Amazon or Flipkart automatically
Lead generation — extract business names and contact details from directories
Research — collect hundreds of data points from multiple websites for analysis
Job listings — scrape job boards and filter by your criteria automatically
News aggregation — pull headlines from multiple news sites into one place

Is web scraping legal?

Generally yes, with conditions, although the details depend on your country and on the site you are scraping. Scraping publicly visible data is usually fine. The rules of thumb are:

Only scrape public pages — never pages that require login
Respect the site's robots.txt file
Add delays between requests so you don't overload the server
Never scrape personal or private data

When in doubt, check the website's Terms of Service.
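Two of these rules, respecting robots.txt and adding delays, can be built into the scraper itself. The sketch below uses Python's built-in urllib.robotparser and time modules. The robots.txt text, the example.com URLs, and the build_robots_checker and polite_fetch names are made up for illustration, and the fetch argument stands in for whatever download function you use.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content. A real scraper would load the site's own file.
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
"""


def build_robots_checker(robots_text):
    """Parse robots.txt text and return the parser object."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_fetch(urls, robots, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Fetch allowed URLs one by one, pausing between requests.

    fetch is any function that takes a URL and returns the page content,
    for example: lambda url: requests.get(url, timeout=10).text
    """
    results = {}
    for url in urls:
        if not robots.can_fetch("*", url):
            continue  # robots.txt asks crawlers to stay away from this URL
        if results:
            sleep(delay_seconds)  # wait before every request after the first
        results[url] = fetch(url)
    return results


# Demo with a stand-in fetch function, so nothing is downloaded
robots = build_robots_checker(SAMPLE_ROBOTS)
demo_urls = ["https://example.com/books", "https://example.com/private/admin"]
pages = polite_fetch(demo_urls, robots, fetch=lambda url: "fake page", delay_seconds=0.1)
print(list(pages))
```

Passing fetch and sleep in as arguments keeps the sketch testable without a network connection. In a real scraper, RobotFileParser can also read the file directly from the site with its set_url and read methods.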

How to install the libraries

Open your terminal and run:

pip install requests beautifulsoup4 pandas openpyxl

Then save the script above to a file, for example scraper.py, and run it with python scraper.py. No further setup is needed.
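If the script stops with a ModuleNotFoundError, a quick check like the one below can tell you which library is still missing. It is only a sketch: missing_modules is a hypothetical helper name, and the list uses import names, which is why beautifulsoup4 appears as bs4.

```python
import importlib.util

# Import names differ from pip names: beautifulsoup4 is imported as bs4
REQUIRED = ["requests", "bs4", "pandas", "openpyxl"]


def missing_modules(names):
    """Return the names that Python cannot find in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Still missing:", ", ".join(missing))
else:
    print("All libraries are installed.")
```

openpyxl is never imported in the script directly, but pandas needs it to write .xlsx files, so it belongs on the list.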

The one-line summary

Web scraping is code that reads a webpage the same way your browser does — but instead of displaying it, it extracts specific data and saves it for you.

Written by Raaga Priya Madhan — CSE student, Bangalore. I build Python automation scripts for businesses. See my scraping code on GitHub or connect on LinkedIn