DEV Community

Raaga Priya Madhan


What is Web Scraping? A Beginner's Guide with Real Python Code

Every website you visit is full of data. Web scraping is how you extract that data automatically using code instead of copying it manually.

A real problem web scraping solves

Imagine you want a list of all books under £10 from an online bookstore. The website has 50 pages of books. Copying them manually would take hours.

A web scraper can do the same job in a fraction of the time.

How websites work — the quick version

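Every webpage starts out as an HTML document. When you open a website, your browser downloads that HTML and displays it visually. Web scraping works by downloading that same HTML and extracting specific pieces of data from it. (Pages that build their content with JavaScript after loading are harder to scrape this way, because that content is not in the HTML you download.)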
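Since a page is only text until it is parsed, you can try the "extract specific pieces" step without touching the network at all. This minimal sketch feeds BeautifulSoup a small, made-up HTML string in place of a downloaded page and pulls two pieces of data out of it.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document standing in for a downloaded page
html = """
<html>
  <head><title>Example Bookshop</title></head>
  <body>
    <h1>Featured book</h1>
    <p class="price">£12.99</p>
  </body>
</html>
"""

# The parser turns the raw text into a tree you can search
soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.string                     # text inside <title>
price = soup.find("p", class_="price").get_text()  # text inside the price paragraph

print(page_title)  # Example Bookshop
print(price)       # £12.99
```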

Your first web scraper: about 20 lines of code

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Step 1: Download the webpage
url = "https://books.toscrape.com"
response = requests.get(url, timeout=10)  # give up if the server doesn't answer within 10 seconds
response.raise_for_status()  # stop with an error if the page didn't load (e.g. 404 or 500)

# Step 2: Parse the HTML
soup = BeautifulSoup(response.content, "html.parser")

# Step 3: Find all book items on the page
books = soup.find_all("article", class_="product_pod")

# Step 4: Extract the data we want
data = []
for book in books:
    title = book.find("h3").find("a")["title"]  # full title is stored in the link's title attribute
    price = book.find("p", class_="price_color").text
    data.append({"Title": title, "Price": price})

# Step 5: Save to Excel (needs an Excel writer such as openpyxl installed)
df = pd.DataFrame(data)
df.to_excel("books.xlsx", index=False)
print(f"Scraped {len(data)} books")

Run this and you get books.xlsx, an Excel file with the title and price of every book on the first page (20 books; the script does not follow links to the other pages). books.toscrape.com is a safe practice website built specifically for learning scraping.
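The script above reads a single page, while the opening example needed all 50 pages and only the books under £10. One way to cover that, sketched here under some assumptions, is to follow each page's "next" link until there is none, converting prices to numbers so they can be filtered. The function names are hypothetical, the pagination markup (an li tag with class "next" wrapping a link) is assumed from books.toscrape.com's current layout and may change, and fetch is passed in as a parameter so the logic can be tried without a network connection.

```python
import re
import time
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def parse_listing_page(html):
    """Return (books, next_href) for one listing page.

    Prices are converted from text like "£51.77" to floats so they can be compared.
    """
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for book in soup.find_all("article", class_="product_pod"):
        title = book.find("h3").find("a")["title"]
        price_text = book.find("p", class_="price_color").get_text(strip=True)
        # Keep only the number part, ignoring the currency symbol
        price = float(re.search(r"\d+(?:\.\d+)?", price_text).group())
        books.append({"Title": title, "Price": price})

    # Assumed pagination markup: <li class="next"><a href="...">next</a></li>
    next_li = soup.find("li", class_="next")
    next_link = next_li.find("a") if next_li else None
    next_href = next_link["href"] if next_link else None
    return books, next_href


def scrape_all_pages(start_url, fetch, delay=1.0, max_pages=60):
    """Follow 'next' links from start_url and collect books from every page.

    fetch(url) must return the page HTML as bytes or str.
    max_pages is a hypothetical safety cap against endless loops.
    """
    all_books = []
    url = start_url
    pages_seen = 0
    while url and pages_seen < max_pages:
        books, next_href = parse_listing_page(fetch(url))
        all_books.extend(books)
        pages_seen += 1
        # "next" links are relative, so resolve them against the current page URL
        url = urljoin(url, next_href) if next_href else None
        if url:
            time.sleep(delay)  # pause between requests to go easy on the server
    return all_books


# Example use against the real site (makes about 50 requests):
#   import requests
#   books = scrape_all_pages("https://books.toscrape.com/",
#                            lambda u: requests.get(u, timeout=10).content)
#   cheap = [b for b in books if b["Price"] < 10]
```

Passing fetch in, rather than calling requests inside the loop, keeps the parsing and link-following testable with saved HTML, and makes it easy to swap in caching or retries later.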

What each part does

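requests.get(url) downloads the raw HTML of the webpage, the same HTML your browser receives (unlike a browser, it does not run any JavaScript on the page)

BeautifulSoup reads the HTML and lets you search through it like a document

find_all("article", class_="product_pod") finds every book item on the page by looking for its HTML tag and class name (class_ has a trailing underscore because class is a reserved word in Python)

book.find("h3").find("a")["title"] finds the h3 heading inside each book item, then the link inside it, and reads the link's title attribute, which holds the full book title (the visible link text is cut short for long titles)

df.to_excel() saves everything into an Excel file (writing .xlsx files needs an engine such as openpyxl, which is why it is in the install command)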
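To see why the script reads ["title"] instead of the link's text, here is a small offline sketch. The HTML is a simplified, made-up imitation of one product_pod item (the title and price are invented); on books.toscrape.com, long titles are shortened with "..." in the visible link text while the full title is kept in the title attribute.

```python
from bs4 import BeautifulSoup

# Simplified, made-up imitation of one book item on the listing page
html = """
<article class="product_pod">
  <h3><a href="book_1/index.html" title="An Example Book With a Very Long Title">An Example Book With ...</a></h3>
  <p class="price_color">£20.00</p>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; find_all() would return a list of all of them
book = soup.find("article", class_="product_pod")

# Nested search: the <a> inside the <h3> inside this book item
link = book.find("h3").find("a")

shown_text = link.text      # the visible link text (shortened)
full_title = link["title"]  # the value of the title attribute (complete)
price = book.find("p", class_="price_color").text

print(shown_text)  # An Example Book With ...
print(full_title)  # An Example Book With a Very Long Title
print(price)       # £20.00
```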

Real uses of web scraping

  • Price monitoring — track competitor prices on Amazon or Flipkart automatically
  • Lead generation — extract business names and contact details from directories
  • Research — collect hundreds of data points from multiple websites for analysis
  • Job listings — scrape job boards and filter by your criteria automatically
  • News aggregation — pull headlines from multiple news sites into one place

Is web scraping legal?

Often yes, but it depends on the website and on the laws that apply to you. Scraping publicly visible data is usually fine as long as you follow these ground rules:

  • Only scrape public pages — never pages that require login
  • Respect the site's robots.txt file
  • Add delays between requests so you don't overload the server
  • Never scrape personal or private data

When in doubt, check the website's Terms of Service.
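Python's standard library can handle the robots.txt check. The sketch below uses urllib.robotparser with made-up rules passed in directly (for a real site you would call set_url() with the site's robots.txt address and then read()). The bot name and the one-second fallback delay are arbitrary assumptions, and the loop only prints instead of downloading anything.

```python
import time
from urllib.robotparser import RobotFileParser

# Made-up robots.txt rules; for a real site use
#   rp.set_url("https://example.com/robots.txt"); rp.read()
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

rp = RobotFileParser()
rp.parse(robots_lines)

USER_AGENT = "my-learning-scraper"  # hypothetical name for your bot


def allowed(url):
    """True if the robots.txt rules let USER_AGENT fetch this URL."""
    return rp.can_fetch(USER_AGENT, url)


# Use the site's Crawl-delay if it sets one, otherwise a conservative default
delay = rp.crawl_delay(USER_AGENT) or 1.0

for url in ["https://example.com/books/", "https://example.com/private/data"]:
    if not allowed(url):
        print("Skipping (disallowed):", url)
        continue
    print("Would fetch:", url)
    # ... requests.get(url, timeout=10) would go here ...
    time.sleep(delay)  # pause between requests so you don't overload the server
```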

How to install the libraries

Open your terminal and run:

pip install requests beautifulsoup4 pandas openpyxl
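To confirm the install worked for the same Python interpreter you will run the script with, a quick check like this one (a small hypothetical helper, not part of any of these libraries) reports which packages can be imported. Note that beautifulsoup4 is installed under that name but imported as bs4.

```python
import importlib.util

# pip package name -> name used in the import statement
PACKAGES = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
    "openpyxl": "openpyxl",
}


def check_installed(packages=PACKAGES):
    """Return {pip name: True/False} depending on whether each module can be found."""
    return {pip_name: importlib.util.find_spec(module) is not None
            for pip_name, module in packages.items()}


status = check_installed()
for name, ok in status.items():
    print(f"{name:15} {'OK' if ok else 'MISSING: pip install ' + name}")
```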

Then run the script above. No API keys or accounts are needed.

The one-line summary

Web scraping is code that reads a webpage the same way your browser does — but instead of displaying it, it extracts specific data and saves it for you.


Written by Raaga Priya Madhan — CSE student, Bangalore. I build Python automation scripts for businesses. See my scraping code on GitHub or connect on LinkedIn
