DEV Community

Saifeddin Matoui

Web Scraping with Python: Real-World Examples

Read part 1 here!

Hello Again!

In this post, we'll explore some real-world examples of web scraping using Python and popular libraries like BeautifulSoup and Scrapy.

Example 1: Scraping News Articles

Let's say you want to build a machine learning model to analyze sentiment in news articles. To do this, you'll need a large dataset of news articles with labels indicating the sentiment of each article. Instead of manually collecting and labeling articles, you can use web scraping to automate the process.

One popular source of news articles is the New York Times website. Here's some Python code using BeautifulSoup to scrape the headlines and text from articles on the New York Times website:

import requests
from bs4 import BeautifulSoup

url = 'https://www.nytimes.com/'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

articles = []
for article in soup.find_all('article'):
    headline_tag = article.find('h2')
    body_tag = article.find('p')
    if headline_tag is None or body_tag is None:
        continue  # skip articles missing a headline or summary paragraph
    articles.append({
        'headline': headline_tag.text.strip(),
        'body': body_tag.text.strip(),
    })


In this example, we first use the requests library to retrieve the HTML content of the New York Times homepage. We then create a BeautifulSoup object and use its methods to extract the headlines and text from each article on the page. We store the data in a list of dictionaries, with each dictionary representing an article and its headline and body text.
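Once you have the list of dictionaries, persisting it for later labeling or model training is straightforward with Python's built-in csv module. Here's a minimal sketch; the `articles` list below is sample data in the same shape as the scraped list above (the headlines are placeholders, not real scraped content):

```python
import csv

# Sample data shaped like the scraped `articles` list above
# (placeholder content, not real New York Times headlines).
articles = [
    {'headline': 'Example Headline One', 'body': 'First article body.'},
    {'headline': 'Example Headline Two', 'body': 'Second article body.'},
]

# Write the articles to a CSV file with one row per article.
with open('articles.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['headline', 'body'])
    writer.writeheader()
    writer.writerows(articles)
```

From here you can load `articles.csv` into pandas or any labeling tool to start annotating sentiment.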

Example 2: Scraping Product Prices

Another common use case for web scraping is to collect data on product prices from e-commerce websites. This can be useful for market research, competitor analysis, and more.

Let's say you want to compare prices for a particular product across multiple e-commerce websites. Here's some Python code using Scrapy to scrape prices for a PlayStation 5 from Amazon, Best Buy, and Walmart (note that each site uses different markup, so in practice each domain needs its own selectors):

import scrapy

class ProductSpider(scrapy.Spider):
    name = 'product_spider'
    start_urls = [
        'https://www.amazon.com/dp/B08J62XKJT',
        'https://www.bestbuy.com/site/sony-playstation-5-console/6426149.p?skuId=6426149',
        'https://www.walmart.com/ip/Sony-PlayStation-5/363472942',
    ]

    def parse(self, response):
        # These selectors match Amazon's markup; for a real multi-site
        # spider you'd choose selectors per domain (e.g. by inspecting
        # response.url) since Best Buy and Walmart use different markup.
        title = response.css('h1.a-text-normal::text').get()
        price = response.css('span.a-offscreen::text').get()
        yield {'title': title, 'price': price}
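Price strings scraped this way usually arrive with currency symbols and thousands separators (e.g. "$1,299.99"), so a small normalization helper is useful before comparing prices across sites. Here's a minimal sketch; the sample strings are illustrative, not live data:

```python
import re

def parse_price(raw):
    """Extract a float from a price string like '$1,299.99'.

    Returns None when no numeric value can be found.
    """
    if raw is None:
        return None
    match = re.search(r'\d[\d,]*(?:\.\d+)?', raw)
    if match is None:
        return None
    return float(match.group(0).replace(',', ''))

print(parse_price('$499.99'))       # 499.99
print(parse_price('1,299.00 USD'))  # 1299.0
```

Returning None for unparseable input (such as an "out of stock" message) keeps the comparison logic simple downstream.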

Example 3: Scraping Weather Data

Web scraping can also be used to collect weather data from websites like Weather Underground or the National Weather Service. This data can be used for climate modeling, weather forecasting, and more.

Let's say you want to collect temperature data for Matmata city from Weather Underground. Here's some Python code using BeautifulSoup to scrape the temperature data:

import requests
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/weather/tn/matmata'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

temp_tag = soup.select_one('span.wu-value.wu-value-to')
if temp_tag is not None:
    temperature = temp_tag.text.strip()
    print(f"The current temperature in Matmata is {temperature} degrees Fahrenheit.")
else:
    print("Temperature element not found - the page layout may have changed.")

In this example, we use requests and BeautifulSoup to scrape the current temperature for Matmata city, Tunisia from the Weather Underground website. We first send a GET request to the website and create a BeautifulSoup object. We then use a CSS selector to extract the temperature data from the page, and print it to the console.
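Class names on sites like Weather Underground change over time, and heavily scripted pages may not expose the element to a plain GET request at all, so it's worth converting the scraped string to a number before doing anything with it. Here's a small sketch of that post-processing, using a hard-coded sample string in place of a live scrape:

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9

# Stand-in for the scraped text above; in the real scrape this could be
# None if the element wasn't found, so guard before converting.
raw = '72'
if raw is None:
    print('Temperature element not found - the page layout may have changed.')
else:
    temp_f = float(raw)
    temp_c = fahrenheit_to_celsius(temp_f)
    print(f'{temp_f:.0f} F is {temp_c:.1f} C')
```

With the value as a float, you can log readings over time or feed them into a forecasting model rather than just printing them.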

These are just a few examples of the many real-world applications of web scraping using Python. Whether you're collecting data for research or business purposes, web scraping can help automate the process and save you valuable time and resources.

See you in the next post!
