Ayaz

How to Scrape a Website with Python in 10 Minutes

Introduction

Web scraping is one of those skills that sounds complicated but is surprisingly easy to pick up in Python. In this tutorial, I'll show you how to scrape real data from a website in under 10 minutes using two popular libraries: Requests and BeautifulSoup.

By the end, you'll have a working scraper that pulls data from a webpage and saves it in a format you can actually use.

What is Web Scraping?

Web scraping is the process of automatically extracting data from websites. Instead of manually copying information, you write a script that does it for you — instantly, at scale.

Common use cases:

  • Collecting product prices for comparison
  • Gathering news headlines
  • Building datasets for research or machine learning
  • Monitoring job listings

What We'll Need

  • Python 3.7+
  • Basic Python knowledge (loops, lists, print statements)
  • Terminal / command prompt

Step 1: Install the Libraries

Open your terminal and install the two libraries we'll use:

pip install requests beautifulsoup4
  • Requests — fetches the HTML content of a webpage
  • BeautifulSoup — parses and navigates that HTML so you can extract what you need

Step 2: Fetch a Web Page

We'll use books.toscrape.com — a website built specifically for scraping practice. No legal issues, no rate limits.

Create a file called scraper.py:

import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)

print(response.status_code)  # Should print 200
print(response.text[:500])   # First 500 characters of the HTML

Run it:

python scraper.py

If you see 200 printed, the request worked. You're now downloading an entire webpage with 2 lines of code.

Step 3: Parse the HTML with BeautifulSoup

Now let's make sense of that HTML:

import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
print(soup.title.text)  # Prints the page title

BeautifulSoup turns the raw HTML into a navigable object. Think of it as a map for the webpage.
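To see what "navigable" means in practice, here's a minimal sketch using a tiny hand-written HTML string (invented for illustration, not taken from the site) — once parsed, tags become attributes you can walk and query:

```python
from bs4 import BeautifulSoup

# A tiny hand-written HTML snippet, for illustration only
html = """
<html><head><title>Demo Page</title></head>
<body>
  <article class="product_pod"><h3><a title="Book One">Book One</a></h3></article>
  <article class="product_pod"><h3><a title="Book Two">Book Two</a></h3></article>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

print(soup.title.text)                # Demo Page
print(soup.find('a')['title'])        # Book One
print(len(soup.find_all('article')))  # 2
```

The same patterns — dotted access like soup.title, single lookup with find, and bulk lookup with find_all — are exactly what we'll use on the real page in the next step.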

Step 4: Find the Data We Want

Right-click any book title on books.toscrape.com and hit "Inspect" in your browser. You'll see book titles are inside <h3> tags wrapped in <article class="product_pod">.

Let's extract all book titles:

import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

books = soup.find_all('article', class_='product_pod')

for book in books:
    title = book.h3.a['title']
    print(title)

Run it and you'll see 20 book titles printed in your terminal. That's web scraping — done.

Step 5: Extract More Data (Price + Rating)

Let's also grab the price and star rating for each book:

import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

books = soup.find_all('article', class_='product_pod')

for book in books:
    title = book.h3.a['title']
    price = book.find('p', class_='price_color').text
    rating = book.p['class'][1]  # e.g. "Three", "Five"
    print(f"{title} | {price} | {rating} stars")

Output will look like:

A Light in the Attic | £51.77 | Three stars
Tipping the Velvet | £53.74 | One stars
Soumission | £50.10 | One stars
...

Step 6: Save the Data to a CSV File

Raw printed data isn't very useful. Let's save it to a CSV so you can open it in Excel or use it in another script:

import requests
from bs4 import BeautifulSoup
import csv

url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

books = soup.find_all('article', class_='product_pod')

with open('books.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Price', 'Rating'])  # Header row

    for book in books:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text
        rating = book.p['class'][1]
        writer.writerow([title, price, rating])

print("Saved to books.csv!")

Open books.csv and you'll have a clean spreadsheet of all 20 books with titles, prices, and ratings.
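If you'd rather verify the file from Python than from Excel, csv.DictReader reads it straight back into dictionaries keyed by the header row. A quick self-contained sketch (it writes two sample rows first so it runs on its own, without the scraper):

```python
import csv

# Write a couple of sample rows so this sketch runs standalone
with open('books.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Price', 'Rating'])
    writer.writerow(['A Light in the Attic', '£51.77', 'Three'])
    writer.writerow(['Tipping the Velvet', '£53.74', 'One'])

# Read it back: each row becomes a dict keyed by the header row
with open('books.csv', newline='', encoding='utf-8') as file:
    rows = list(csv.DictReader(file))

print(rows[0]['Title'])   # A Light in the Attic
print(rows[1]['Rating'])  # One
```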

Step 7: Scrape Multiple Pages

The site has 50 pages. Let's loop through them all:

import requests
from bs4 import BeautifulSoup
import csv

base_url = "http://books.toscrape.com/catalogue/page-{}.html"
all_books = []

for page_num in range(1, 51):  # Pages 1 to 50
    url = base_url.format(page_num)
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    books = soup.find_all('article', class_='product_pod')

    for book in books:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text
        rating = book.p['class'][1]
        all_books.append([title, price, rating])

    print(f"Scraped page {page_num}/50")

with open('all_books.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Price', 'Rating'])
    writer.writerows(all_books)

print(f"Done! Saved {len(all_books)} books.")

This scrapes all 1,000 books from the website and saves them in one CSV file.


Important: Be Responsible When Scraping

Before scraping any real website, always:

  1. Check the robots.txt — visit website.com/robots.txt to see what's allowed
  2. Add delays between requests using time.sleep(1) to avoid overloading servers
  3. Read the Terms of Service — some sites prohibit scraping
  4. Never scrape personal data without consent
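Points 1 and 2 can be sketched with the standard library's urllib.robotparser plus time.sleep. The rules string below is invented for illustration — on a real site you'd load the live file with rp.set_url(...) and rp.read():

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from a string for illustration
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

urls = [
    "https://example.com/books.html",
    "https://example.com/private/data.html",
]

for url in urls:
    if rp.can_fetch("*", url):
        print(f"OK to fetch: {url}")
        # requests.get(url) would go here
        time.sleep(1)  # be polite: pause between requests
    else:
        print(f"Blocked by robots.txt: {url}")
```

can_fetch returns True or False for a given user agent and URL, so the check is just an if statement before each request.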

What We Built

Here's a summary of what you can now do:

  • Fetch a webpage: requests.get(url)
  • Parse HTML: BeautifulSoup(html, 'html.parser')
  • Find elements: soup.find_all('tag', class_='name')
  • Save to CSV: csv.writer
  • Scrape multiple pages: loop with range()
