Price monitoring is one of the most practical applications of web scraping. Whether you are tracking competitor prices, finding deals, or building a repricing tool for your e-commerce business — the architecture is the same. This guide walks you through building a complete system from scratch.
System Architecture
+---------------+      +----------------+      +---------------+
|   Scheduler   |----->|   Scraper(s)   |----->|   Database    |
| (cron/APSch)  |      |   + Proxies    |      |  (SQLite/     |
+---------------+      +----------------+      |   Postgres)   |
                               |               +-------+-------+
                               |                       |
                       +-------v-------+       +-------v-------+
                       |   Anti-Bot    |       |   Alerting    |
                       |    Bypass     |       |    Engine     |
                       |  (proxies,    |       |   (email,     |
                       |   headers)    |       |   Telegram)   |
                       +---------------+       +---------------+
Data Flow:
1. Scheduler triggers scraping jobs (hourly/daily)
2. Scraper fetches product pages through proxy layer
3. Extracted prices stored with timestamps
4. Alert engine compares prices and notifies on drops
Step 1: Set Up the Database
We will use SQLite for simplicity. Switch to PostgreSQL for production.
import sqlite3

def init_db(db_path="prices.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            url TEXT UNIQUE NOT NULL,
            store TEXT NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS prices (
            id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES products(id),
            price REAL NOT NULL,
            currency TEXT DEFAULT 'USD',
            in_stock BOOLEAN DEFAULT TRUE,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_prices_product_date
        ON prices(product_id, scraped_at)
    """)
    conn.commit()
    return conn

def add_product(conn, name, url, store):
    conn.execute(
        "INSERT OR IGNORE INTO products (name, url, store) VALUES (?, ?, ?)",
        (name, url, store)
    )
    conn.commit()

def record_price(conn, product_id, price, currency="USD", in_stock=True):
    conn.execute(
        "INSERT INTO prices (product_id, price, currency, in_stock) VALUES (?, ?, ?, ?)",
        (product_id, price, currency, in_stock)
    )
    conn.commit()
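Before wiring up the scraper, it is worth sanity-checking the schema. A standalone sketch using an in-memory database (the table definitions are condensed and repeated here so the snippet runs on its own; the product name and URL are placeholders):

```python
import sqlite3

# In-memory database mirroring the products/prices tables above (condensed)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        url TEXT UNIQUE NOT NULL,
        store TEXT NOT NULL
    );
    CREATE TABLE prices (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES products(id),
        price REAL NOT NULL,
        scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
""")

# Register a product and record two price observations
conn.execute("INSERT INTO products (name, url, store) VALUES (?, ?, ?)",
             ("Widget", "https://example.com/widget", "example"))
product_id = conn.execute("SELECT id FROM products WHERE url = ?",
                          ("https://example.com/widget",)).fetchone()[0]
conn.execute("INSERT INTO prices (product_id, price) VALUES (?, ?)", (product_id, 19.99))
conn.execute("INSERT INTO prices (product_id, price) VALUES (?, ?)", (product_id, 17.49))

# Latest recorded price for the product
latest = conn.execute(
    "SELECT price FROM prices WHERE product_id = ? ORDER BY id DESC LIMIT 1",
    (product_id,)
).fetchone()[0]
print(latest)  # 17.49
```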
Step 2: Build the Scraper
Here is a scraper that handles multiple e-commerce sites. The key insight: each store needs its own parser because HTML structures differ.
import requests
from bs4 import BeautifulSoup
import re

class PriceScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/131.0.0.0 Safari/537.36"
            ),
            "Accept-Language": "en-US,en;q=0.9",
        })
        if proxy_url:
            self.session.proxies = {"http": proxy_url, "https": proxy_url}

    def scrape(self, url):
        resp = self.session.get(url, timeout=20)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        if "amazon." in url:
            return self._parse_amazon(soup)
        elif "ebay." in url:
            return self._parse_ebay(soup)
        elif "walmart." in url:
            return self._parse_walmart(soup)
        else:
            raise ValueError(f"Unsupported store: {url}")

    def _parse_amazon(self, soup):
        price_el = (
            soup.select_one("span.a-price .a-offscreen") or
            soup.select_one("#priceblock_ourprice") or
            soup.select_one("#priceblock_dealprice")
        )
        price = self._extract_number(price_el.text) if price_el else None
        title_el = soup.select_one("#productTitle")
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": "in stock" in soup.text.lower(),
            "currency": "USD"
        }

    def _parse_ebay(self, soup):
        price_el = soup.select_one("div.x-price-primary span.ux-textspans")
        title_el = soup.select_one("h1.x-item-title__mainTitle span")
        price = self._extract_number(price_el.text) if price_el else None
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": True,
            "currency": "USD"
        }

    def _parse_walmart(self, soup):
        price_el = soup.select_one('[itemprop="price"]')
        title_el = soup.select_one("h1")
        price = None
        if price_el:
            content = price_el.get("content")
            if content:
                price = float(content)
            elif price_el.text:
                price = self._extract_number(price_el.text)
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": "add to cart" in soup.text.lower(),
            "currency": "USD"
        }

    @staticmethod
    def _extract_number(text):
        cleaned = text.replace(",", "")
        match = re.search(r"[\d]+\.?\d*", cleaned)
        return float(match.group()) if match else None
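Price strings vary wildly between stores, so the number-extraction helper is worth exercising on its own. A standalone sketch of the same regex approach (the sample strings are illustrative):

```python
import re

def extract_number(text):
    # Strip thousands separators, then grab the first decimal number
    cleaned = text.replace(",", "")
    match = re.search(r"\d+\.?\d*", cleaned)
    return float(match.group()) if match else None

samples = ["$1,299.99", "US $45.00", "Now: 17.49 EUR", "Out of stock"]
results = [extract_number(s) for s in samples]
print(results)  # [1299.99, 45.0, 17.49, None]
```

Note that this grabs the first number in the string, so text like "List price $99, now $79" would return 99.0; trimming the element text to the displayed price before parsing avoids that.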
Step 3: Handle Anti-Bot Protection
This is where most price monitoring systems fail. Major e-commerce sites actively block scrapers. Here is your toolkit:
Proxy Rotation
Residential proxies are essential for e-commerce scraping. Datacenter IPs get blocked within minutes on Amazon and Walmart.
# Using ThorData residential proxies
# Sign up: https://affiliate.thordata.com/0a0x4nzu7tvv
PROXY_URL = "http://user:pass@proxy.thordata.com:9000"
scraper = PriceScraper(proxy_url=PROXY_URL)
ThorData offers residential proxy pools that rotate IPs automatically — critical for sustained price monitoring without getting blocked.
Request Spacing and Headers
import time
import random

def scrape_with_backoff(scraper, urls, min_delay=2, max_delay=5):
    results = []
    for url in urls:
        try:
            data = scraper.scrape(url)
            results.append({"url": url, "data": data, "error": None})
        except Exception as e:
            results.append({"url": url, "data": None, "error": str(e)})
        # Random delay to mimic human behavior
        time.sleep(random.uniform(min_delay, max_delay))
    return results
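Note that `scrape_with_backoff` only spaces requests out; it records failures rather than retrying them. The retry-with-exponential-backoff item from the production checklist could be layered on top with a wrapper like this sketch (the `flaky_fetch` function is a stand-in for a call such as `scraper.scrape(url)`):

```python
import time
import random

def retry_with_backoff(fn, max_attempts=4, base_delay=1.0):
    """Call fn(); on failure, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, propagate the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# Stand-in for a flaky scrape: fails twice, then succeeds
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("blocked")
    return {"price": 19.99}

result = retry_with_backoff(flaky_fetch, base_delay=0.01)
print(result, calls["n"])  # {'price': 19.99} 3
```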
Browser Rendering Fallback
Some sites require JavaScript execution. Use Playwright as a fallback:
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_with_browser(url, proxy=None):
    with sync_playwright() as p:
        browser_args = {}
        if proxy:
            browser_args["proxy"] = {"server": proxy}
        browser = p.chromium.launch(headless=True, **browser_args)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.wait_for_selector("[data-price], .price", timeout=10000)
        content = page.content()
        browser.close()
        return BeautifulSoup(content, "html.parser")
Step 4: Price Alert Engine
def check_price_drops(conn, threshold_pct=5):
    """Find products where price dropped more than threshold."""
    cursor = conn.execute("""
        WITH latest AS (
            SELECT product_id, price, scraped_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_id ORDER BY scraped_at DESC
                   ) AS rn
            FROM prices
            WHERE scraped_at > datetime('now', '-48 hours')
        )
        SELECT p.name, p.url, prev.price AS old_price,
               curr.price AS new_price,
               ROUND((prev.price - curr.price) / prev.price * 100, 1) AS drop_pct
        FROM latest curr
        JOIN latest prev ON curr.product_id = prev.product_id
        JOIN products p ON curr.product_id = p.id
        WHERE curr.rn = 1 AND prev.rn = 2
          AND prev.price > curr.price
          AND (prev.price - curr.price) / prev.price * 100 > ?
    """, (threshold_pct,))
    return cursor.fetchall()
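To see the window-function comparison in action, here is a self-contained sketch against an in-memory database with one product and a 20% drop (the 48-hour filter is omitted so the fixed sample timestamps qualify; product data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
    CREATE TABLE prices (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        price REAL,
        scraped_at TIMESTAMP
    );
    INSERT INTO products (id, name, url) VALUES (1, 'Widget', 'https://example.com/w');
    INSERT INTO prices (product_id, price, scraped_at) VALUES (1, 100.0, '2025-01-01 00:00');
    INSERT INTO prices (product_id, price, scraped_at) VALUES (1, 80.0,  '2025-01-02 00:00');
""")

# Rank observations per product, then compare the newest (rn=1) to the previous (rn=2)
rows = conn.execute("""
    WITH latest AS (
        SELECT product_id, price,
               ROW_NUMBER() OVER (
                   PARTITION BY product_id ORDER BY scraped_at DESC
               ) AS rn
        FROM prices
    )
    SELECT p.name, prev.price, curr.price,
           ROUND((prev.price - curr.price) / prev.price * 100, 1)
    FROM latest curr
    JOIN latest prev ON curr.product_id = prev.product_id
    JOIN products p ON curr.product_id = p.id
    WHERE curr.rn = 1 AND prev.rn = 2 AND prev.price > curr.price
""").fetchall()
print(rows)  # [('Widget', 100.0, 80.0, 20.0)]
```

Window functions require SQLite 3.25 or newer, which ships with Python 3.7+ on most platforms.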
def send_alert(product_name, url, old_price, new_price, drop_pct):
    """Send price drop alert via email."""
    import smtplib
    from email.mime.text import MIMEText

    body = (
        f"Price drop alert!\n\n"
        f"{product_name}\n"
        f"${old_price:.2f} -> ${new_price:.2f} ({drop_pct}% off)\n"
        f"{url}"
    )
    msg = MIMEText(body)
    msg["Subject"] = f"Price Drop: {product_name} (-{drop_pct}%)"
    msg["From"] = "alerts@yourdomain.com"
    msg["To"] = "you@email.com"
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login("alerts@yourdomain.com", "app-password")
        server.send_message(msg)
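The architecture diagram also lists Telegram as an alert channel. A sketch against the Telegram Bot API `sendMessage` endpoint; the token and chat ID are placeholders, and this version only builds the request rather than sending it:

```python
def build_telegram_alert(token, chat_id, product_name, url,
                         old_price, new_price, drop_pct):
    """Build the sendMessage request for the Telegram Bot API (not sent here)."""
    text = (
        f"Price drop: {product_name}\n"
        f"${old_price:.2f} -> ${new_price:.2f} ({drop_pct}% off)\n"
        f"{url}"
    )
    api_url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = {"chat_id": chat_id, "text": text}
    # To actually send:  requests.post(api_url, json=payload, timeout=10)
    return api_url, payload

api_url, payload = build_telegram_alert(
    "123:ABC", "42", "Widget", "https://example.com/w", 100.0, 80.0, 20.0
)
print(payload["text"].splitlines()[1])  # $100.00 -> $80.00 (20.0% off)
```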
Step 5: Schedule Everything
import time
import random
from apscheduler.schedulers.blocking import BlockingScheduler

def run_monitoring_job():
    conn = init_db()
    scraper = PriceScraper(proxy_url=PROXY_URL)
    products = conn.execute("SELECT id, url FROM products").fetchall()
    for product_id, url in products:
        try:
            data = scraper.scrape(url)
            if data["price"]:
                record_price(conn, product_id, data["price"],
                             data["currency"], data["in_stock"])
        except Exception as e:
            print(f"Error scraping {url}: {e}")
        time.sleep(random.uniform(2, 5))
    drops = check_price_drops(conn, threshold_pct=5)
    for name, url, old_p, new_p, pct in drops:
        send_alert(name, url, old_p, new_p, pct)
        print(f"ALERT: {name} dropped {pct}%")
    conn.close()

scheduler = BlockingScheduler()
scheduler.add_job(run_monitoring_job, "interval", hours=6)
scheduler.start()
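If you prefer plain cron over APScheduler (the takeaways below mention cron as the simplest starting point), an equivalent crontab entry might look like this; the script path and log location are placeholders:

```shell
# Run the monitoring job every 6 hours; adjust paths to your environment
0 */6 * * * /usr/bin/python3 /opt/price-monitor/run_job.py >> /var/log/price-monitor.log 2>&1
```

The trade-off: cron is simpler to operate, while APScheduler keeps everything in one Python process and supports in-process retries and jitter.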
Scaling Up: Use Pre-Built Actors
Building scrapers for every site is time-consuming. For production use, consider pre-built solutions:
- eBay Scraper — Extract prices, listings, and seller data from eBay
- Walmart Scraper — Monitor Walmart prices and inventory
- AliExpress Scraper — Track AliExpress product prices and reviews
These run on Apify cloud infrastructure, so you do not need to manage proxies, browsers, or servers.
Production Checklist
Before deploying your price monitor to production:
- [ ] Proxy rotation — Use residential proxies (ThorData is solid) to avoid IP bans
- [ ] Error handling — Retry failed requests with exponential backoff
- [ ] Data validation — Reject prices that are 0, negative, or >10x the last known price
- [ ] Rate limiting — Space requests 2-5 seconds apart per domain
- [ ] Monitoring — Alert yourself when scraping success rate drops below 90%
- [ ] Database maintenance — Archive old price records after 90 days
- [ ] Legal compliance — Only scrape public pricing data, respect robots.txt
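The data-validation item above can be sketched as a simple guard applied before `record_price`; the 10x threshold matches the checklist, but tune it to your catalog:

```python
def is_plausible(price, last_known=None):
    """Reject obviously bad scrapes: missing, zero, negative, or wild jumps."""
    if price is None or price <= 0:
        return False
    # A price more than 10x the last known value is almost certainly a parse error
    if last_known is not None and price > last_known * 10:
        return False
    return True

print(is_plausible(19.99))                     # True
print(is_plausible(0))                         # False
print(is_plausible(250.0, last_known=19.99))   # False (>10x jump)
```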
Key Takeaways
- Start simple — SQLite + requests + cron gets you 80% of the way there
- Proxies are not optional — You will get blocked on major e-commerce sites without them
- Store raw data — Keep price history. The trends are more valuable than any single price point
- Alert on anomalies, not every change — A 1% price fluctuation is not worth an email. Set meaningful thresholds
- Use pre-built scrapers for popular sites — Do not reinvent parsing logic that already exists
Need production-ready scrapers? Check out the e-commerce scraping actors on Apify — eBay, Walmart, AliExpress and more, ready to run.