Why Build a Price Comparison Tool?
Price comparison tools are one of the most practical web scraping projects you can build. Whether it's for personal use (finding the best deals) or a business application (competitive pricing intelligence), the fundamentals are the same: scrape prices from multiple sources, normalize the data, and present the results.
In this tutorial, I'll walk you through building a multi-site price scraper from scratch.
Architecture Overview
Our price comparison tool has four components:
- Scrapers — Site-specific modules that extract product data
- Normalizer — Cleans and standardizes data across sources
- Storage — SQLite database for price history
- Reporter — Generates comparison output
┌─────────────┐ ┌────────────┐ ┌──────────┐ ┌──────────┐
│ Scraper A │────▶│ │────▶│ │────▶│ │
│ Scraper B │────▶│ Normalizer │────▶│ SQLite │────▶│ Reporter │
│ Scraper C │────▶│ │────▶│ │────▶│ │
└─────────────┘ └────────────┘ └──────────┘ └──────────┘
Setting Up
pip install requests beautifulsoup4 playwright pandas
playwright install chromium
(Playwright and its Chromium download are optional for this tutorial — the scrapers below use only requests and BeautifulSoup. Install Playwright if you plan to extend the tool to JavaScript-rendered sites.)
The Base Scraper Class
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional
import hashlib
@dataclass
class Product:
    """A normalized product record produced by any site scraper.

    The required fields are what every scraper must fill in; ``image_url``
    and ``in_stock`` are optional extras with sensible defaults.
    """

    name: str
    price: float
    currency: str
    url: str
    source: str
    image_url: Optional[str] = None
    in_stock: bool = True

    @property
    def product_id(self):
        """Deterministic identifier derived from the (source, url) pair.

        The same listing scraped twice hashes to the same id, which lets the
        database accumulate a price history per product.
        """
        key = f'{self.source}:{self.url}'
        return hashlib.md5(key.encode()).hexdigest()
class BaseScraper(ABC):
    """Common base for site scrapers: owns the HTTP session and optional proxy.

    Subclasses implement the site-specific parsing in search()/get_product().
    """

    def __init__(self, proxy_url=None):
        self.proxy_url = proxy_url
        self.session = requests.Session()
        if proxy_url:
            # Route both plain-HTTP and HTTPS traffic through the same proxy.
            self.session.proxies = {scheme: proxy_url for scheme in ('http', 'https')}

    @abstractmethod
    def search(self, query: str) -> list[Product]:
        """Return all products matching *query* on this site."""

    @abstractmethod
    def get_product(self, url: str) -> Optional[Product]:
        """Fetch a single product page; return None if it cannot be parsed."""
Building Site-Specific Scrapers
Amazon Scraper
import requests
from bs4 import BeautifulSoup
import re
import time
import random
class AmazonScraper(BaseScraper):
    """Scrapes Amazon search results and individual product pages.

    NOTE(review): Amazon blocks bots aggressively; without rotating proxies
    expect 503s/captcha pages. raise_for_status() below surfaces those as
    exceptions instead of silently parsing an error page as "no results".
    """

    BASE_URL = 'https://www.amazon.com'
    # One shared header set so search() and get_product() present the same
    # browser fingerprint (previously get_product omitted Accept-Language).
    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
        'Accept-Language': 'en-US,en;q=0.9',
    }

    def search(self, query: str) -> list[Product]:
        """Return products parsed from the first page of search results."""
        url = f'{self.BASE_URL}/s?k={query.replace(" ", "+")}'
        # timeout prevents a hung connection from stalling the whole run.
        response = self.session.get(url, headers=self.HEADERS, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        products = []
        for item in soup.select('[data-component-type="s-search-result"]'):
            try:
                name_el = item.select_one('h2 a span')
                price_whole = item.select_one('.a-price-whole')
                price_frac = item.select_one('.a-price-fraction')
                link = item.select_one('h2 a')
                img = item.select_one('.s-image')
                if not (name_el and price_whole):
                    continue  # ad tiles / unavailable items often lack a price
                # Whole and fractional parts are separate elements; recombine.
                price_str = price_whole.get_text(strip=True).replace(',', '')
                frac = price_frac.get_text(strip=True) if price_frac else '00'
                price = float(f'{price_str}.{frac}')
                products.append(Product(
                    name=name_el.get_text(strip=True),
                    price=price,
                    currency='USD',
                    url=self.BASE_URL + link.get('href') if link else '',
                    source='amazon',
                    image_url=img.get('src') if img else None,
                ))
            except (ValueError, AttributeError):
                continue  # skip tiles whose markup doesn't match expectations
        return products

    def get_product(self, url):
        """Fetch one product page; return None when the layout can't be parsed."""
        response = self.session.get(url, headers=self.HEADERS, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        name = soup.select_one('#productTitle')
        price = soup.select_one('.a-price .a-offscreen')
        if not (name and price):
            return None
        try:
            price_val = float(re.sub(r'[^\d.]', '', price.get_text()))
        except ValueError:
            # e.g. a price range leaves multiple dots after stripping symbols
            return None
        return Product(
            name=name.get_text(strip=True),
            price=price_val,
            currency='USD',
            url=url,
            source='amazon',
        )
eBay Scraper
class EbayScraper(BaseScraper):
    """Scrapes eBay search results; get_product is intentionally a stub."""

    BASE_URL = 'https://www.ebay.com'
    HEADERS = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
    }

    def search(self, query: str) -> list[Product]:
        """Return products parsed from the first page of search results."""
        url = f'{self.BASE_URL}/sch/i.html?_nkw={query.replace(" ", "+")}'
        # timeout + raise_for_status surface blocks instead of parsing them.
        response = self.session.get(url, headers=self.HEADERS, timeout=30)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        products = []
        for item in soup.select('.s-item'):
            name_el = item.select_one('.s-item__title')
            price_el = item.select_one('.s-item__price')
            link_el = item.select_one('.s-item__link')
            img_el = item.select_one('.s-item__image-img')
            if not (name_el and price_el):
                continue
            name = name_el.get_text(strip=True)
            # eBay injects a template card titled "Shop on eBay" at the top of
            # the result list; it is not a real listing, so drop it.
            if name == 'Shop on eBay':
                continue
            price_text = price_el.get_text(strip=True)
            # Ranges like "$10.00 to $15.00" match on the first (lowest) price.
            price_match = re.search(r'[\d,]+\.\d{2}', price_text)
            if not price_match:
                continue
            products.append(Product(
                name=name,
                price=float(price_match.group().replace(',', '')),
                currency='USD',
                url=link_el.get('href') if link_el else '',
                source='ebay',
                image_url=img_el.get('src') if img_el else None,
            ))
        return products

    def get_product(self, url):
        """Not implemented for eBay in this tutorial; always returns None."""
        return None  # Simplified for this tutorial
Price Database
import sqlite3
from datetime import datetime
class PriceDatabase:
    """Append-only SQLite store of price observations.

    Every save_product() call inserts a new row, so the table doubles as a
    per-product price history keyed by the deterministic product_id.
    Usable as a context manager so the connection is always closed.
    """

    def __init__(self, db_path='prices.db'):
        self.conn = sqlite3.connect(db_path)
        self.create_tables()

    def create_tables(self):
        """Create the prices table and its lookup index if missing (idempotent)."""
        self.conn.execute('''
            CREATE TABLE IF NOT EXISTS prices (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                product_id TEXT,
                name TEXT,
                price REAL,
                currency TEXT,
                source TEXT,
                url TEXT,
                scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            )
        ''')
        # get_price_history() filters on product_id; without an index every
        # lookup is a full table scan that slows down as history grows.
        self.conn.execute(
            'CREATE INDEX IF NOT EXISTS idx_prices_product_id ON prices (product_id)'
        )
        self.conn.commit()

    def save_product(self, product: 'Product'):
        """Insert one observation; scraped_at defaults to the current UTC time."""
        self.conn.execute(
            'INSERT INTO prices (product_id, name, price, currency, source, url) VALUES (?, ?, ?, ?, ?, ?)',
            (product.product_id, product.name, product.price, product.currency, product.source, product.url)
        )
        self.conn.commit()

    def get_price_history(self, product_id):
        """Return (price, scraped_at) rows for one product, oldest first."""
        cursor = self.conn.execute(
            'SELECT price, scraped_at FROM prices WHERE product_id = ? ORDER BY scraped_at',
            (product_id,)
        )
        return cursor.fetchall()

    def get_best_prices(self, name_query):
        """Return up to 20 cheapest rows whose name contains *name_query*."""
        cursor = self.conn.execute('''
            SELECT name, price, source, url, scraped_at
            FROM prices
            WHERE name LIKE ?
            ORDER BY price ASC
            LIMIT 20
        ''', (f'%{name_query}%',))
        return cursor.fetchall()

    def close(self):
        """Close the underlying connection; safe to call more than once."""
        self.conn.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the connection, even when the body raised.
        self.close()
The Comparison Engine
import pandas as pd
class PriceComparer:
    """Fans a query out to every registered scraper and aggregates results."""

    def __init__(self):
        self.scrapers = [
            AmazonScraper(),
            EbayScraper(),
        ]
        self.db = PriceDatabase()

    def compare(self, query: str) -> pd.DataFrame:
        """Search every site for *query*, persist each hit, and return the
        20 cheapest results as a DataFrame (empty DataFrame when nothing
        was found on any site).
        """
        all_products = []
        for i, scraper in enumerate(self.scrapers):
            try:
                products = scraper.search(query)
                for p in products:
                    self.db.save_product(p)
                all_products.extend(products)
                print(f'{scraper.__class__.__name__}: found {len(products)} results')
            except Exception as e:
                # One failing site must not abort the whole comparison.
                print(f'{scraper.__class__.__name__} failed: {e}')
            # Jittered delay between sites to avoid a burst of requests;
            # skip it after the final scraper — there is nothing left to wait for.
            if i < len(self.scrapers) - 1:
                time.sleep(random.uniform(1, 3))
        if not all_products:
            return pd.DataFrame()
        df = pd.DataFrame([
            {'name': p.name, 'price': p.price, 'source': p.source,
             'url': p.url, 'in_stock': p.in_stock}
            for p in all_products
        ])
        return df.sort_values('price').head(20)
# Run comparison
comparer = PriceComparer()
results = comparer.compare('wireless mouse')
# compare() returns an empty DataFrame with no columns when every scraper
# fails, so selecting columns unguarded would raise a KeyError.
if results.empty:
    print('No results found')
else:
    # to_string() prints the full frame without pandas' column truncation.
    print(results[['name', 'price', 'source']].to_string())
Scaling with Proxies
When scraping multiple sites, you'll quickly hit rate limits. Using a proxy service like ThorData with rotating residential IPs solves this:
# Example rotating-proxy gateway URL; replace user:pass with real credentials.
proxy_url = 'http://user:pass@proxy.thordata.com:9000'
# Every scraper shares the same gateway; per the provider description above,
# the gateway rotates the outbound residential IP between requests.
scrapers = [
AmazonScraper(proxy_url=proxy_url),
EbayScraper(proxy_url=proxy_url),
]
Scheduling Price Checks
import schedule
def daily_check():
    """Compare prices for a fixed watchlist and report the cheapest hit of each."""
    watchlist = ('wireless mouse', 'mechanical keyboard', 'usb-c hub')
    comparer = PriceComparer()
    for query in watchlist:
        results = comparer.compare(query)
        if not results.empty:
            # Rows come back sorted by price, so the first row is the cheapest.
            top = results.iloc[0]
            print(f'Best {query}: ${top["price"]:.2f} at {top["source"]}')
        # Pause between queries so back-to-back searches don't hammer the sites.
        time.sleep(5)
# Register the job to run once a day at 09:00 local time.
schedule.every().day.at('09:00').do(daily_check)
# schedule only fires jobs when run_pending() is called, so poll in a loop;
# sleeping 60 s keeps CPU usage negligible between checks.
while True:
schedule.run_pending()
time.sleep(60)
Conclusion
A price comparison tool is a great way to learn multi-site scraping. The key challenges are normalizing data across sources and handling anti-bot measures. For serious scraping workloads, a reliable proxy service like ThorData keeps your scrapers running smoothly across all target sites.
The full code from this tutorial gives you a foundation to build on — add more retailers, implement price alerts, or build a web dashboard.
Top comments (0)