Building a Price Tracker with Python and BeautifulSoup
Whether you're a savvy shopper, an e-commerce entrepreneur, or a data analyst, staying on top of price fluctuations can save you money, inform inventory decisions, and reveal market trends. Checking prices across dozens of websites by hand, though, is tedious, error-prone, and time-consuming. With a few lines of Python and BeautifulSoup, you can automate the whole process. In this tutorial, we'll build a price tracker that scrapes product prices from e-commerce sites, stores them in a database, and visualizes trends over time. By the end, you'll have a tool that runs quietly in the background on a schedule you choose.
Prerequisites
Before diving into the code, ensure your environment is set up correctly. Here’s what you’ll need:
Software Requirements
- Python 3.x installed on your machine (preferably 3.7 or newer).
- A code editor or IDE (e.g., VS Code, PyCharm, or Jupyter Notebook).
- A modern web browser (for inspecting HTML elements during scraping).
Python Libraries
Install the following Python packages using pip:
pip install beautifulsoup4 requests pandas matplotlib
- BeautifulSoup: For parsing HTML and XML documents.
- Requests: For sending HTTP requests to fetch web pages.
- Pandas: For organizing and storing scraped data.
- Matplotlib: For visualizing price trends (optional but recommended).
Web Scraping Ethics
- Respect website terms of service and robots.txt files.
- Avoid overloading servers with excessive requests (use rate limiting).
- Use headers to mimic a real browser (see tips below).
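The robots.txt check can be automated with Python's standard library. Here is a minimal sketch: it parses a robots.txt body you have already downloaded (in practice you would fetch `https://site.com/robots.txt` with requests first) and tells you whether a URL is fair game.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt, url, user_agent="*"):
    """Check whether robots.txt rules permit fetching the given URL."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)

# Example rules: everything allowed except /private/
rules = "User-agent: *\nDisallow: /private/"
print(is_allowed(rules, "https://example.com/product/1"))   # True
print(is_allowed(rules, "https://example.com/private/x"))   # False
```

Call this before every new domain you scrape; it costs one extra request and keeps you on the right side of the site's stated policy.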
Setting Up Your Project
Create a new directory for your project and initialize it with a requirements.txt file:
mkdir price-tracker
cd price-tracker
touch requirements.txt
Add the required libraries to requirements.txt:
beautifulsoup4
requests
pandas
matplotlib
Now, create a Python file (e.g., price_tracker.py) and import the necessary modules:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
import time
This setup will form the backbone of your price tracker.
Step 1: Scraping Product Prices with BeautifulSoup
Let’s start by scraping a product page. We’ll use a fictional e-commerce site for this example, but the same principles apply to real websites like Amazon, eBay, or any HTML-based store.
Example: Scraping a Product Page
def fetch_product_price(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException as e:
        print(f"Error fetching {url}: {e}")
        return None

    soup = BeautifulSoup(response.text, "html.parser")

    # Adjust these selectors to match the target site's HTML
    name_element = soup.find("h1", class_="product-title")
    price_element = soup.find("span", class_="price")
    if name_element is None or price_element is None:
        print("Product name or price not found on the page.")
        return None

    product_name = name_element.text.strip()
    # Clean the price by removing the currency symbol and thousands separators
    cleaned_price = float(price_element.text.strip().replace("$", "").replace(",", ""))
    return {"name": product_name, "price": cleaned_price}
Key Notes
- User-Agent Headers: Websites often block scrapers by detecting bot traffic. Use a browser’s User-Agent string to mimic a real user.
- Error Handling: Always handle exceptions for network errors, timeouts, or malformed HTML.
- CSS Selectors: Inspect the HTML of the target site (using browser dev tools) to identify the correct CSS classes or IDs for price and product name.
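Price strings in the wild are messier than a bare "$19.99"; you may see thousands separators, surrounding text, or an out-of-stock message. A small regex-based helper (assuming US-style formatting with `,` for thousands and `.` for decimals) is more robust than chained `replace()` calls:

```python
import re

def parse_price(text):
    """Extract the first numeric value from a price string, or None."""
    match = re.search(r"[\d,]+(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))

print(parse_price("$1,299.99"))        # 1299.99
print(parse_price("Price: 45.00 USD")) # 45.0
print(parse_price("Sold out"))         # None
```

Returning None for unparseable strings lets the caller decide how to handle missing prices instead of crashing mid-scrape.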
Step 2: Handling Dynamic Content (Advanced)
While BeautifulSoup is great for static HTML, many modern websites load content dynamically using JavaScript (e.g., React, Vue, or Angular). In such cases, BeautifulSoup alone won’t work. Here’s how to handle it:
Option 1: Use Selenium (For JavaScript-Heavy Sites)
Install Selenium:
pip install selenium webdriver-manager
Then, use it to render JavaScript:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def fetch_dynamic_price(url):
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        driver.get(url)
        time.sleep(5)  # Crude wait for JavaScript; WebDriverWait is more robust
        price = driver.find_element(By.CSS_SELECTOR, ".price").text
    finally:
        driver.quit()  # Always release the browser, even on errors
    return price
🚨 Warning: Selenium is slower and resource-heavy. Use it only when necessary.
Option 2: Use API Endpoints (If Available)
Some sites expose price data via REST APIs. Check the site’s network traffic in dev tools for API endpoints and use requests to fetch JSON data directly.
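Once you spot such an endpoint in the Network tab, the response is usually JSON that you can fetch with `requests.get(url).json()` and parse directly, with no HTML involved. The schema below is purely hypothetical; adjust the keys to whatever the real endpoint returns:

```python
import json

def parse_api_response(body):
    """Extract name and price from a JSON API body.

    Assumes a hypothetical shape like
    {"product": {"name": ..., "price": ...}}.
    """
    data = json.loads(body)
    product = data["product"]
    return {"name": product["name"], "price": float(product["price"])}

# Simulated API response:
body = '{"product": {"name": "Widget", "price": "19.99"}}'
print(parse_api_response(body))  # {'name': 'Widget', 'price': 19.99}
```

API responses are far more stable than HTML class names, so prefer this route whenever a site offers it.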
Step 3: Storing Data in a Database
Storing scraped data ensures you can track prices over time. We’ll use SQLite for simplicity, but you could also use PostgreSQL, MySQL, or even a CSV file.
Example: Saving to SQLite
import sqlite3
def save_to_database(product_data):
    conn = sqlite3.connect("prices.db")
    c = conn.cursor()
    # Create table if it doesn't exist
    c.execute('''
        CREATE TABLE IF NOT EXISTS product_prices (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            product_name TEXT,
            price REAL,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    ''')
    # Insert data
    c.execute('''
        INSERT INTO product_prices (product_name, price)
        VALUES (?, ?)
    ''', (product_data["name"], product_data["price"]))
    conn.commit()
    conn.close()
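It is worth sanity-checking the storage logic before leaving the tracker unattended. This sketch round-trips two inserts through an in-memory SQLite database (same schema as above) and reads back the most recent price; the `latest_price` helper is my own addition, not part of the tracker so far:

```python
import sqlite3

def latest_price(conn, product_name):
    """Return the most recently stored price for a product, or None."""
    row = conn.execute(
        "SELECT price FROM product_prices "
        "WHERE product_name = ? ORDER BY timestamp DESC, id DESC LIMIT 1",
        (product_name,),
    ).fetchone()
    return row[0] if row else None

# Round trip against an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute('''
    CREATE TABLE product_prices (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        product_name TEXT,
        price REAL,
        timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    )
''')
conn.execute("INSERT INTO product_prices (product_name, price) VALUES (?, ?)",
             ("Widget", 19.99))
conn.execute("INSERT INTO product_prices (product_name, price) VALUES (?, ?)",
             ("Widget", 17.49))
print(latest_price(conn, "Widget"))  # 17.49
```

A helper like this is also the natural hook for price-drop alerts: compare the newly scraped price against `latest_price()` before inserting.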
Alternative: Using Pandas
You can also store data in CSV or JSON format:
import os

def save_to_csv(product_data):
    df = pd.DataFrame([product_data])
    # Append to the CSV, writing the header row only when the file is new
    df.to_csv("product_prices.csv", mode="a",
              header=not os.path.exists("product_prices.csv"), index=False)
Step 4: Automating the Price Tracker
To keep your tracker running continuously, use a scheduler or cron job.
Example: Using APScheduler
Install APScheduler:
pip install apscheduler
Then, set up a recurring job:
from apscheduler.schedulers.blocking import BlockingScheduler
def job():
    product_url = "https://example.com/product/123"
    product_data = fetch_product_price(product_url)
    if product_data:
        save_to_database(product_data)
        print("Data saved successfully.")

if __name__ == "__main__":
    scheduler = BlockingScheduler()
    scheduler.add_job(job, 'interval', hours=1)  # Run every hour
    try:
        scheduler.start()
    except KeyboardInterrupt:
        print("Scheduler stopped.")
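If you would rather not keep a Python process running around the clock, a cron job achieves the same effect. This assumes the script performs a single fetch-and-save run and exits (i.e., it calls `job()` directly instead of starting the scheduler); the paths below are placeholders for your own machine:

```shell
# Edit your crontab with: crontab -e
# Run the tracker at the top of every hour and append output to a log file
0 * * * * /usr/bin/python3 /home/you/price-tracker/price_tracker.py >> /home/you/price-tracker/tracker.log 2>&1
```

Cron also restarts the job automatically after a reboot, which the in-process scheduler cannot do.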
Tips
- Rate Limiting: Add a time.sleep() delay between requests to avoid being blocked.
- Logging: Use Python's logging module to track errors and success messages.
- Configuration: Store URLs and database settings in a config file or environment variables.
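The configuration tip can be as simple as reading settings from environment variables with sensible defaults. The variable names here are my own invention; pick whatever convention suits your deployment:

```python
import os

# Hypothetical settings, overridable via environment variables:
#   export TRACKER_PRODUCT_URL="https://example.com/product/123"
#   export TRACKER_DB_PATH="/var/data/prices.db"
PRODUCT_URL = os.environ.get("TRACKER_PRODUCT_URL",
                             "https://example.com/product/123")
DATABASE_PATH = os.environ.get("TRACKER_DB_PATH", "prices.db")
```

This keeps URLs and paths out of the source code, so the same script can track different products on different machines without edits.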
Step 5: Visualizing Price Trends
Now that we’re collecting data, let’s visualize it. We’ll use Matplotlib to plot price trends over time.
Example: Plotting Historical Prices
def plot_price_history(product_name):
    conn = sqlite3.connect("prices.db")
    df = pd.read_sql_query(
        "SELECT * FROM product_prices WHERE product_name = ?",
        conn, params=(product_name,))
    conn.close()
    if not df.empty:
        # SQLite stores timestamps as text; convert for a proper time axis
        df["timestamp"] = pd.to_datetime(df["timestamp"])
        plt.figure(figsize=(10, 5))
        plt.plot(df["timestamp"], df["price"], marker="o")
        plt.title(f"Price Trend for {product_name}")
        plt.xlabel("Date")
        plt.ylabel("Price ($)")
        plt.grid(True)
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
    else:
        print(f"No data found for {product_name}.")
Enhancements
- Use Pandas to aggregate data by day or week.
- Add interactivity with Plotly for web-based dashboards.
- Export charts as PNG or PDF for reports.
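The first enhancement, aggregating by day, can be sketched with pandas' resampling. This assumes a DataFrame with the `timestamp` and `price` columns as read back from the product_prices table:

```python
import pandas as pd

def daily_average(df):
    """Collapse raw price readings to one average value per day."""
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return (df.set_index("timestamp")["price"]
              .resample("D")   # use "W" for weekly buckets
              .mean()
              .dropna())

# Two readings on one day, one on the next:
df = pd.DataFrame({
    "timestamp": ["2024-05-01 09:00", "2024-05-01 21:00", "2024-05-02 10:00"],
    "price": [19.99, 17.99, 18.49],
})
print(daily_average(df))
```

Plotting the smoothed series instead of raw readings makes long-term trends much easier to see once the tracker has run for a few weeks.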
Conclusion
You've now built a fully functional price tracker using Python and BeautifulSoup! This tool can monitor product prices on a schedule, store historical data, and visualize trends. Whether you're tracking competitors, optimizing your pricing strategy, or just saving money on your next purchase, this script is a powerful ally.
Next Steps
Ready to take your price tracker to the next level? Here are some ideas:
- Add Email Alerts: Use SMTP or services like SendGrid to notify you when prices drop.
- Deploy as a Web App: Use Flask or Django to create a dashboard for viewing price trends.
- Scrape Multiple Products: Loop through a list of URLs and store data in a structured format.
- Integrate with APIs: Use Stripe or Shopify APIs to sync data with your e-commerce platform.
- Use Cloud Storage: Store data in AWS S3, Google Cloud, or MongoDB for scalability.
As you explore these enhancements, always remember to respect website policies and avoid unethical scraping practices. Happy coding!