Introduction
Web scraping is one of those skills that sounds complicated but is surprisingly easy to pick up in Python. In this tutorial, I'll show you how to scrape real data from a website in under 10 minutes using two popular libraries: Requests and BeautifulSoup.
By the end, you'll have a working scraper that pulls data from a webpage and saves it in a format you can actually use.
What is Web Scraping?
Web scraping is the process of automatically extracting data from websites. Instead of manually copying information, you write a script that does it for you — instantly, at scale.
Common use cases:
- Collecting product prices for comparison
- Gathering news headlines
- Building datasets for research or machine learning
- Monitoring job listings
What We'll Need
- Python 3.7+
- Basic Python knowledge (loops, lists, print statements)
- Terminal / command prompt
Step 1: Install the Libraries
Open your terminal and install the two libraries we'll use:
```
pip install requests beautifulsoup4
```
- Requests — fetches the HTML content of a webpage
- BeautifulSoup — parses and navigates that HTML so you can extract what you need
Step 2: Fetch a Web Page
We'll use books.toscrape.com — a website built specifically for scraping practice. No legal issues, no rate limits.
Create a file called scraper.py:
```python
import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)

print(response.status_code)  # Should print 200
print(response.text[:500])   # First 500 characters of the HTML
```
Run it:
```
python scraper.py
```
If you see 200 printed, the request worked — you just downloaded an entire webpage with a few lines of code.
Step 3: Parse the HTML with BeautifulSoup
Now let's make sense of that HTML:
```python
import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

print(soup.title.text)  # Prints the page title
```
BeautifulSoup turns the raw HTML into a navigable object. Think of it as a map for the webpage.
Step 4: Find the Data We Want
Right-click any book title on books.toscrape.com and hit "Inspect" in your browser. You'll see each book sits inside an `<article class="product_pod">` element, and the full title lives in the `title` attribute of the `<a>` tag inside its `<h3>`.
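Under the hood, each book block looks roughly like the trimmed-down snippet below (the real page has more elements and attributes, so treat this as a sketch), and the same navigation logic works on a hardcoded string:

```python
from bs4 import BeautifulSoup

# Simplified version of one product_pod block from the page
html = """
<article class="product_pod">
  <p class="star-rating Three"></p>
  <h3><a href="#" title="A Light in the Attic">A Light in ...</a></h3>
  <div class="product_price"><p class="price_color">£51.77</p></div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
book = soup.find("article", class_="product_pod")

print(book.h3.a["title"])  # A Light in the Attic
print(book.p["class"][1])  # Three
```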
Let's extract all book titles:
```python
import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

books = soup.find_all('article', class_='product_pod')
for book in books:
    title = book.h3.a['title']
    print(title)
```
Run it and you'll see 20 book titles printed in your terminal. That's web scraping — done.
Step 5: Extract More Data (Price + Rating)
Let's also grab the price and star rating for each book:
```python
import requests
from bs4 import BeautifulSoup

url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

books = soup.find_all('article', class_='product_pod')
for book in books:
    title = book.h3.a['title']
    price = book.find('p', class_='price_color').text
    rating = book.p['class'][1]  # e.g. "Three", "Five"
    print(f"{title} | {price} | {rating} stars")
```
Output will look like:
```
A Light in the Attic | £51.77 | Three stars
Tipping the Velvet | £53.74 | One stars
Soumission | £50.10 | One stars
...
```
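The price comes back as a string like `"£51.77"` and the rating as a word, so if you want to do math on them later, a couple of small helpers can normalize them into numbers. The function names here are my own, not part of either library:

```python
# Map the rating word from the class attribute to an integer
RATING_WORDS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

def parse_price(text):
    """Strip the currency symbol and return the price as a float."""
    return float(text.replace("£", "").strip())

def parse_rating(word):
    """Convert a rating word like 'Three' into the number 3."""
    return RATING_WORDS[word]

print(parse_price("£51.77"))  # 51.77
print(parse_rating("Three"))  # 3
```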
Step 6: Save the Data to a CSV File
Raw printed data isn't very useful. Let's save it to a CSV so you can open it in Excel or use it in another script:
```python
import requests
from bs4 import BeautifulSoup
import csv

url = "http://books.toscrape.com/"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
books = soup.find_all('article', class_='product_pod')

with open('books.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Price', 'Rating'])  # Header row
    for book in books:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text
        rating = book.p['class'][1]
        writer.writerow([title, price, rating])

print("Saved to books.csv!")
```
Open books.csv and you'll have a clean spreadsheet of all 20 books with titles, prices, and ratings.
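If you want to verify the CSV round-trips cleanly before scraping for real, here's a self-contained sketch using a hardcoded row instead of scraped data (the filename `sample_books.csv` is just for illustration):

```python
import csv

sample = [["Title", "Price", "Rating"],
          ["A Light in the Attic", "£51.77", "Three"]]

# Write the sample rows out...
with open("sample_books.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(sample)

# ...and read them back in. csv.reader returns every field as a string.
with open("sample_books.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

print(rows == sample)  # True
```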
Step 7: Scrape Multiple Pages
The site has 50 pages. Let's loop through them all:
```python
import requests
from bs4 import BeautifulSoup
import csv

base_url = "http://books.toscrape.com/catalogue/page-{}.html"
all_books = []

for page_num in range(1, 51):  # Pages 1 to 50
    url = base_url.format(page_num)
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    books = soup.find_all('article', class_='product_pod')
    for book in books:
        title = book.h3.a['title']
        price = book.find('p', class_='price_color').text
        rating = book.p['class'][1]
        all_books.append([title, price, rating])

    print(f"Scraped page {page_num}/50")

with open('all_books.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Price', 'Rating'])
    writer.writerows(all_books)

print(f"Done! Saved {len(all_books)} books.")
```
This scrapes all 1,000 books from the website and saves them in one CSV file.
Important: Be Responsible When Scraping
Before scraping any real website, always:
- Check the `robots.txt` — visit `website.com/robots.txt` to see what's allowed
- Add delays between requests using `time.sleep(1)` to avoid overloading servers
- Read the Terms of Service — some sites prohibit scraping
- Never scrape personal data without consent
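The first two points can be sketched with the standard library alone. Python's `urllib.robotparser` evaluates robots.txt rules; the `Disallow` rule below is made up for illustration, and in practice you'd load the real file with `set_url()` and `read()` instead of `parse()`:

```python
import time
from urllib import robotparser

# Feed robots.txt rules in directly (illustrative rules, not a real site's)
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
])

print(rp.can_fetch("*", "http://example.com/private/page"))  # False
print(rp.can_fetch("*", "http://example.com/public/page"))   # True

# Between requests in a scraping loop, pause to be kind to the server
time.sleep(1)
```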
What We Built
Here's a summary of what you can now do:
| Task | Code |
|---|---|
| Fetch a webpage | `requests.get(url)` |
| Parse HTML | `BeautifulSoup(html, 'html.parser')` |
| Find elements | `soup.find_all('tag', class_='name')` |
| Save to CSV | `csv.writer` |
| Scrape multiple pages | Loop with `range()` |