Every website you visit is full of data. Web scraping is how you extract that data automatically using code instead of copying it manually.
A real problem web scraping solves
Imagine you want a list of all books under £10 from an online bookstore. The website has 50 pages of books. Copying them manually would take hours.
A web scraper can do the same job in a minute or so.
How websites work — the quick version
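At its core, every webpage is an HTML document. When you open a website, your browser downloads that HTML and renders it visually. Web scraping works by downloading that same HTML and extracting specific pieces of data from it. (Pages that build their content with JavaScript after loading need extra tools, but plain HTML pages like the one used in this post do not.)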
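To make "HTML is just text you can search" concrete, here is a minimal sketch that parses a tiny hand-written page instead of a real site. The HTML fragment, the shop name, and the class names in it are made up for illustration; only the BeautifulSoup calls are real.

```python
from bs4 import BeautifulSoup

# A hand-written HTML fragment (hypothetical, not copied from any real site).
html = """
<html>
  <body>
    <h1>Tiny Bookshop</h1>
    <p class="price">£7.50</p>
    <p class="note">In stock</p>
  </body>
</html>
"""

# Parse the text into a searchable tree.
soup = BeautifulSoup(html, "html.parser")

# Look elements up by tag name, optionally narrowed by class.
heading = soup.find("h1").text
price = soup.find("p", class_="price").text

print(heading)  # Tiny Bookshop
print(price)    # £7.50
```

A browser would turn the same text into a heading and two paragraphs on screen. The scraper skips the drawing step and goes straight to the pieces it needs.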
Your first web scraper — 15 lines of code
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Step 1: Download the webpage
url = "https://books.toscrape.com"
response = requests.get(url)

# Step 2: Parse the HTML
soup = BeautifulSoup(response.content, "html.parser")

# Step 3: Find all book items on the page
books = soup.find_all("article", class_="product_pod")

# Step 4: Extract the data we want
data = []
for book in books:
    title = book.find("h3").find("a")["title"]
    price = book.find("p", class_="price_color").text
    data.append({"Title": title, "Price": price})

# Step 5: Save to Excel
df = pd.DataFrame(data)
df.to_excel("books.xlsx", index=False)
print(f"Scraped {len(data)} books")
Run this and you get an Excel file, books.xlsx, with the title and price of every book on the site's first page. books.toscrape.com is a safe practice website built specifically for learning scraping.
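The script stores each price as text (for example "£51.77"), so the "books under £10" problem from earlier needs one more step: turning that text into a number. Below is a rough sketch of one way to do it. The helper names parse_price and books_under are my own, the sample rows are invented, and the regex assumes simple prices with no thousands separators.

```python
import re


def parse_price(price_text):
    """Return the first number found in a price string as a float."""
    match = re.search(r"\d+(?:\.\d+)?", price_text)
    if match is None:
        raise ValueError(f"No number found in {price_text!r}")
    return float(match.group())


def books_under(rows, limit):
    """Keep only the rows whose price is strictly below the limit."""
    return [row for row in rows if parse_price(row["Price"]) < limit]


# Made-up sample rows in the same shape as the scraper's `data` list.
sample = [
    {"Title": "Book A", "Price": "£51.77"},
    {"Title": "Book B", "Price": "£9.99"},
    {"Title": "Book C", "Price": "£10.00"},
]

cheap = books_under(sample, 10)
print(cheap)  # only "Book B" is under £10
```

In the real script you would call books_under(data, 10) just before building the DataFrame, so only the cheap books end up in the Excel file.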
What each part does
- requests.get(url): downloads the raw HTML of the webpage, the same way your browser does
- BeautifulSoup: parses the HTML and lets you search through it like a document
- find_all("article", class_="product_pod"): finds every book item on the page by its HTML tag and class name
- title = book.find("h3").find("a")["title"]: digs into each book item and reads the title attribute of its link, which holds the book's title
- df.to_excel(): writes the collected rows to an Excel file
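One thing the walkthrough glosses over: find() returns None when it does not find a match, so a chain like book.find("h3").find("a") crashes if a single item is missing a tag. Here is a hedged sketch of a more defensive version. The extract_book helper is hypothetical, and the HTML is a hand-written imitation of the page structure, not a copy of it.

```python
from bs4 import BeautifulSoup


def extract_book(book):
    """Return a Title/Price dict, or None if the item is missing a piece."""
    heading = book.find("h3")
    link = heading.find("a") if heading is not None else None
    price_tag = book.find("p", class_="price_color")
    if link is None or price_tag is None or not link.has_attr("title"):
        return None
    return {"Title": link["title"], "Price": price_tag.text.strip()}


# Hand-written fragment: the second item is deliberately incomplete.
html = """
<article class="product_pod">
  <h3><a href="#" title="A Complete Book Title">A Complete ...</a></h3>
  <p class="price_color">£12.00</p>
</article>
<article class="product_pod">
  <h3>No link here</h3>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
rows = [extract_book(b) for b in soup.find_all("article", class_="product_pod")]
data = [row for row in rows if row is not None]
print(data)
```

The incomplete item is skipped instead of stopping the whole run, which matters more as the number of pages grows.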
Real uses of web scraping
- Price monitoring — track competitor prices on Amazon or Flipkart automatically
- Lead generation — extract business names and contact details from directories
- Research — collect hundreds of data points from multiple websites for analysis
- Job listings — scrape job boards and filter by your criteria automatically
- News aggregation — pull headlines from multiple news sites into one place
Is web scraping legal?
Generally yes, with conditions. Scraping publicly visible data is usually fine, though the details vary by country and by website. The rules of thumb are:
- Only scrape public pages — never pages that require login
- Respect the site's robots.txt file
- Add delays between requests so you don't overload the server
- Never scrape personal or private data
When in doubt, check the website's Terms of Service.
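Two of the rules above, respecting robots.txt and adding delays, can be handled in code. This is a minimal sketch using Python's built-in urllib.robotparser. The robots.txt content, the example.com URLs, the "my-scraper" user agent, and the helper names are all assumptions made up for illustration, and the rules are inlined so the example runs without touching the network.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Create a robots.txt parser from the file's text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent="my-scraper"):
    """Keep only the URLs that robots.txt lets this user agent fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


def polite_fetch_all(urls, fetch, delay_seconds=1.0):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results


parser = build_parser(ROBOTS_TXT)
candidates = [
    "https://example.com/books/page-1.html",
    "https://example.com/private/admin.html",
]
ok = allowed_urls(parser, candidates)
print(ok)  # the /private/ URL is filtered out
```

Against a real site you would point the parser at the live file with parser.set_url(...) followed by parser.read(), and pass something like requests.get as the fetch argument. A one-second delay is only a starting guess; slower is kinder to small sites.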
How to install the libraries
Open your terminal and run:
pip install requests beautifulsoup4 pandas openpyxl
Then run the script above. It should work without further setup. (openpyxl is the package pandas uses to write .xlsx files.)
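If the script fails with an import error, a quick check like the sketch below can tell you which package is missing. The missing_modules helper is my own naming. Note that the beautifulsoup4 package is imported under the name bs4.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names for the four packages installed above.
required = ["requests", "bs4", "pandas", "openpyxl"]
missing = missing_modules(required)

if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All libraries are installed.")
```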
The one-line summary
Web scraping is code that reads a webpage the same way your browser does — but instead of displaying it, it extracts specific data and saves it for you.
Written by Raaga Priya Madhan — CSE student, Bangalore. I build Python automation scripts for businesses. See my scraping code on GitHub or connect on LinkedIn