How to Build a Price Monitoring System with Python in 2026 (Complete Guide)

Price monitoring is one of the most practical applications of web scraping. Whether you are tracking competitor prices, finding deals, or building a repricing tool for your e-commerce business — the architecture is the same. This guide walks you through building a complete system from scratch.

System Architecture

+---------------+     +----------------+     +----------------+
|   Scheduler   |---->|  Scraper(s)    |---->|  Database      |
|  (cron/APSch) |     |  + Proxies     |     |  (SQLite/      |
+---------------+     +----------------+     |   Postgres)    |
                             |               +-------+--------+
                             |                       |
                      +------v------+         +------v--------+
                      |  Anti-Bot   |         |  Alerting     |
                      |  Bypass     |         |  Engine       |
                      |  (proxies,  |         |  (email,      |
                      |   headers)  |         |   Telegram)   |
                      +-------------+         +---------------+

Data Flow:
1. Scheduler triggers scraping jobs (hourly/daily)
2. Scraper fetches product pages through proxy layer
3. Extracted prices stored with timestamps
4. Alert engine compares prices and notifies on drops

Step 1: Set Up the Database

We will use SQLite for simplicity. Switch to PostgreSQL for production.

import sqlite3
from datetime import datetime

def init_db(db_path="prices.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            url TEXT UNIQUE NOT NULL,
            store TEXT NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS prices (
            id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES products(id),
            price REAL NOT NULL,
            currency TEXT DEFAULT 'USD',
            in_stock BOOLEAN DEFAULT TRUE,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_prices_product_date
        ON prices(product_id, scraped_at)
    """)
    conn.commit()
    return conn

def add_product(conn, name, url, store):
    conn.execute(
        "INSERT OR IGNORE INTO products (name, url, store) VALUES (?, ?, ?)",
        (name, url, store)
    )
    conn.commit()

def record_price(conn, product_id, price, currency="USD", in_stock=True):
    conn.execute(
        "INSERT INTO prices (product_id, price, currency, in_stock) VALUES (?, ?, ?, ?)",
        (product_id, price, currency, in_stock)
    )
    conn.commit()
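A quick sanity check on the schema design above (product data here is made up): the `UNIQUE` constraint on `url` plus `INSERT OR IGNORE` makes product registration idempotent, so re-running a job that re-adds the same product is a no-op rather than an error or a duplicate row.

```python
import sqlite3

# Demonstrate why the products table pairs a UNIQUE url with
# INSERT OR IGNORE: repeated inserts of the same product collapse
# into a single row. Uses an in-memory database for the check.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        url TEXT UNIQUE NOT NULL,
        store TEXT NOT NULL
    )
""")
for _ in range(3):  # simulate a job re-registering the same product
    conn.execute(
        "INSERT OR IGNORE INTO products (name, url, store) VALUES (?, ?, ?)",
        ("USB-C Hub", "https://example.com/hub", "example"),
    )
count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
print(count)  # 1 — duplicates ignored
```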

Step 2: Build the Scraper

Here is a scraper that handles multiple e-commerce sites. The key insight: each store needs its own parser because HTML structures differ.

import requests
from bs4 import BeautifulSoup
import re

class PriceScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/131.0.0.0 Safari/537.36"
            ),
            "Accept-Language": "en-US,en;q=0.9",
        })
        if proxy_url:
            self.session.proxies = {"http": proxy_url, "https": proxy_url}

    def scrape(self, url):
        resp = self.session.get(url, timeout=20)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")

        if "amazon." in url:
            return self._parse_amazon(soup)
        elif "ebay." in url:
            return self._parse_ebay(soup)
        elif "walmart." in url:
            return self._parse_walmart(soup)
        else:
            raise ValueError(f"Unsupported store: {url}")

    def _parse_amazon(self, soup):
        price_el = (
            soup.select_one("span.a-price .a-offscreen") or
            soup.select_one("#priceblock_ourprice") or
            soup.select_one("#priceblock_dealprice")
        )
        price = self._extract_number(price_el.text) if price_el else None
        title_el = soup.select_one("#productTitle")
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": "in stock" in soup.text.lower(),
            "currency": "USD"
        }

    def _parse_ebay(self, soup):
        price_el = soup.select_one("div.x-price-primary span.ux-textspans")
        title_el = soup.select_one("h1.x-item-title__mainTitle span")
        price = self._extract_number(price_el.text) if price_el else None
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": True,
            "currency": "USD"
        }

    def _parse_walmart(self, soup):
        price_el = soup.select_one('[itemprop="price"]')
        title_el = soup.select_one("h1")
        price = None
        if price_el:
            content = price_el.get("content")
            if content:
                price = float(content)
            elif price_el.text:
                price = self._extract_number(price_el.text)
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": "add to cart" in soup.text.lower(),
            "currency": "USD"
        }

    @staticmethod
    def _extract_number(text):
        cleaned = text.replace(",", "")
        match = re.search(r"[\d]+\.?\d*", cleaned)
        return float(match.group()) if match else None
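The `_extract_number` helper carries a lot of weight, since every parser funnels through it. Here it is as a standalone function (same logic as above) run against typical price strings:

```python
import re

# Standalone version of the scraper's number-extraction step:
# drop thousands separators, then grab the first decimal number.
def extract_number(text):
    cleaned = text.replace(",", "")
    match = re.search(r"[\d]+\.?\d*", cleaned)
    return float(match.group()) if match else None

print(extract_number("$1,299.99"))              # 1299.99
print(extract_number("US $45"))                 # 45.0
print(extract_number("Currently unavailable"))  # None
```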

Step 3: Handle Anti-Bot Protection

This is where most price monitoring systems fail. Major e-commerce sites actively block scrapers. Here is your toolkit:

Proxy Rotation

Residential proxies are essential for e-commerce scraping. Datacenter IPs get blocked within minutes on Amazon and Walmart.

# Using ThorData residential proxies
# Sign up: https://affiliate.thordata.com/0a0x4nzu7tvv

PROXY_URL = "http://user:pass@proxy.thordata.com:9000"

scraper = PriceScraper(proxy_url=PROXY_URL)

ThorData offers residential proxy pools that rotate IPs automatically — critical for sustained price monitoring without getting blocked.
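If your provider gives you a list of gateway endpoints rather than a single auto-rotating one, you can round-robin across them yourself. A minimal sketch (the gateway URLs below are placeholders, not real endpoints):

```python
import itertools
import requests

# Manual proxy rotation: cycle through a pool of gateways so
# consecutive requests leave from different endpoints. Unnecessary
# if your provider already rotates behind a single gateway URL.
PROXIES = [
    "http://user:pass@gw1.example-proxy.com:9000",
    "http://user:pass@gw2.example-proxy.com:9000",
]
proxy_cycle = itertools.cycle(PROXIES)

def fetch(url):
    proxy = next(proxy_cycle)  # round-robin across the pool
    return requests.get(
        url,
        proxies={"http": proxy, "https": proxy},
        timeout=20,
    )
```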

Request Spacing and Headers

import time
import random

def scrape_with_backoff(scraper, urls, min_delay=2, max_delay=5):
    results = []
    for url in urls:
        try:
            data = scraper.scrape(url)
            results.append({"url": url, "data": data, "error": None})
        except Exception as e:
            results.append({"url": url, "data": None, "error": str(e)})
        # Random delay to mimic human behavior
        time.sleep(random.uniform(min_delay, max_delay))
    return results

Browser Rendering Fallback

Some sites require JavaScript execution. Use Playwright as a fallback:

from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_with_browser(url, proxy=None):
    with sync_playwright() as p:
        browser_args = {}
        if proxy:
            browser_args["proxy"] = {"server": proxy}
        browser = p.chromium.launch(headless=True, **browser_args)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.wait_for_selector("[data-price], .price", timeout=10000)
        content = page.content()
        browser.close()
    return BeautifulSoup(content, "html.parser")

Step 4: Price Alert Engine

def check_price_drops(conn, threshold_pct=5):
    """Find products where price dropped more than threshold."""
    cursor = conn.execute("""
        WITH latest AS (
            SELECT product_id, price, scraped_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_id ORDER BY scraped_at DESC
                   ) as rn
            FROM prices
            WHERE scraped_at > datetime('now', '-48 hours')
        )
        SELECT p.name, p.url, prev.price as old_price,
               curr.price as new_price,
               ROUND((prev.price - curr.price) / prev.price * 100, 1) as drop_pct
        FROM latest curr
        JOIN latest prev ON curr.product_id = prev.product_id
        JOIN products p ON curr.product_id = p.id
        WHERE curr.rn = 1 AND prev.rn = 2
          AND prev.price > curr.price
          AND (prev.price - curr.price) / prev.price * 100 > ?
    """, (threshold_pct,))
    return cursor.fetchall()

def send_alert(product_name, url, old_price, new_price, drop_pct):
    """Send price drop alert via email."""
    import smtplib
    from email.mime.text import MIMEText

    body = (
        f"Price drop alert!\n\n"
        f"{product_name}\n"
        f"${old_price:.2f} -> ${new_price:.2f} ({drop_pct}% off)\n"
        f"{url}"
    )
    msg = MIMEText(body)
    msg["Subject"] = f"Price Drop: {product_name} (-{drop_pct}%)"
    msg["From"] = "alerts@yourdomain.com"
    msg["To"] = "you@email.com"

    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login("alerts@yourdomain.com", "app-password")
        server.send_message(msg)
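The architecture diagram also lists Telegram as an alert channel. A sketch using the Bot API's `sendMessage` endpoint (`BOT_TOKEN` and `CHAT_ID` are placeholders — create a bot with @BotFather and look up your chat id first):

```python
import requests

BOT_TOKEN = "123456:ABC-your-bot-token"  # placeholder
CHAT_ID = "your-chat-id"                 # placeholder

def build_alert_text(product_name, url, old_price, new_price, drop_pct):
    # Same message shape as the email alert above.
    return (
        f"Price drop: {product_name}\n"
        f"${old_price:.2f} -> ${new_price:.2f} ({drop_pct}% off)\n"
        f"{url}"
    )

def send_telegram_alert(product_name, url, old_price, new_price, drop_pct):
    # POST to the Bot API's sendMessage method.
    resp = requests.post(
        f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage",
        json={
            "chat_id": CHAT_ID,
            "text": build_alert_text(
                product_name, url, old_price, new_price, drop_pct
            ),
        },
        timeout=10,
    )
    resp.raise_for_status()
```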

Step 5: Schedule Everything

import time
import random
from apscheduler.schedulers.blocking import BlockingScheduler

def run_monitoring_job():
    conn = init_db()
    scraper = PriceScraper(proxy_url=PROXY_URL)
    products = conn.execute("SELECT id, url FROM products").fetchall()

    for product_id, url in products:
        try:
            data = scraper.scrape(url)
            if data["price"]:
                record_price(conn, product_id, data["price"],
                             data["currency"], data["in_stock"])
        except Exception as e:
            print(f"Error scraping {url}: {e}")
        time.sleep(random.uniform(2, 5))

    drops = check_price_drops(conn, threshold_pct=5)
    for name, url, old_p, new_p, pct in drops:
        send_alert(name, url, old_p, new_p, pct)
        print(f"ALERT: {name} dropped {pct}%")
    conn.close()

scheduler = BlockingScheduler()
scheduler.add_job(run_monitoring_job, "interval", hours=6)
scheduler.start()

Scaling Up: Use Pre-Built Actors

Building and maintaining a scraper for every site is time-consuming. For production use, consider pre-built e-commerce scraping actors from the Apify Store. These run on Apify's cloud infrastructure, so you do not need to manage proxies, browsers, or servers yourself.

Production Checklist

Before deploying your price monitor to production:

  • [ ] Proxy rotation — Use residential proxies (ThorData is solid) to avoid IP bans
  • [ ] Error handling — Retry failed requests with exponential backoff
  • [ ] Data validation — Reject prices that are 0, negative, or >10x the last known price
  • [ ] Rate limiting — Space requests 2-5 seconds apart per domain
  • [ ] Monitoring — Alert yourself when scraping success rate drops below 90%
  • [ ] Database maintenance — Archive old price records after 90 days
  • [ ] Legal compliance — Only scrape public pricing data, respect robots.txt
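The data-validation item above can be implemented as a small gate in front of `record_price`. A sketch (the thresholds are illustrative, not prescriptive):

```python
# Reject obviously bad prices before they reach the database:
# missing, zero, negative, or an implausible jump from the last
# known price (default: more than 10x).
def is_valid_price(price, last_known=None, max_jump=10):
    if price is None or price <= 0:
        return False
    if last_known and price > last_known * max_jump:
        return False
    return True

print(is_valid_price(29.99))                     # True
print(is_valid_price(0))                         # False
print(is_valid_price(500.0, last_known=29.99))   # False (>10x jump)
```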

Key Takeaways

  1. Start simple — SQLite + requests + cron gets you 80% of the way there
  2. Proxies are not optional — You will get blocked on major e-commerce sites without them
  3. Store raw data — Keep price history. The trends are more valuable than any single price point
  4. Alert on anomalies, not every change — A 1% price fluctuation is not worth an email. Set meaningful thresholds
  5. Use pre-built scrapers for popular sites — Do not reinvent parsing logic that already exists

Need production-ready scrapers? Check out the e-commerce scraping actors on Apify — eBay, Walmart, AliExpress and more, ready to run.
