Price monitoring is one of the most practical applications of web scraping. Whether you are tracking competitor prices, finding deals, or building a repricing tool for your e-commerce business — the architecture is the same. This guide walks you through building a complete system from scratch.
System Architecture
+---------------+      +----------------+      +---------------+
|   Scheduler   |----->|   Scraper(s)   |----->|   Database    |
| (cron/APSch)  |      |   + Proxies    |      |  (SQLite/     |
+---------------+      +----------------+      |   Postgres)   |
                               |               +-------+-------+
                               |                       |
                       +-------v-------+       +-------v-------+
                       |   Anti-Bot    |       |   Alerting    |
                       |    Bypass     |       |    Engine     |
                       |  (proxies,    |       |   (email,     |
                       |   headers)    |       |   Telegram)   |
                       +---------------+       +---------------+
Data Flow:
1. Scheduler triggers scraping jobs (hourly/daily)
2. Scraper fetches product pages through proxy layer
3. Extracted prices stored with timestamps
4. Alert engine compares prices and notifies on drops
Step 1: Set Up the Database
We will use SQLite for simplicity. Switch to PostgreSQL for production.
import sqlite3

def init_db(db_path="prices.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS products (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            url TEXT UNIQUE NOT NULL,
            store TEXT NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS prices (
            id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES products(id),
            price REAL NOT NULL,
            currency TEXT DEFAULT 'USD',
            in_stock BOOLEAN DEFAULT TRUE,
            scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute("""
        CREATE INDEX IF NOT EXISTS idx_prices_product_date
        ON prices(product_id, scraped_at)
    """)
    conn.commit()
    return conn

def add_product(conn, name, url, store):
    conn.execute(
        "INSERT OR IGNORE INTO products (name, url, store) VALUES (?, ?, ?)",
        (name, url, store)
    )
    conn.commit()

def record_price(conn, product_id, price, currency="USD", in_stock=True):
    conn.execute(
        "INSERT INTO prices (product_id, price, currency, in_stock) VALUES (?, ?, ?, ?)",
        (product_id, price, currency, in_stock)
    )
    conn.commit()
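Before wiring up the scraper, it is worth sanity-checking the schema. A standalone sketch using an in-memory database (the table definitions are condensed and repeated here so the snippet runs on its own; the product name and URL are placeholders):

```python
import sqlite3

# In-memory database mirroring the products/prices tables above (condensed)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        url TEXT UNIQUE NOT NULL,
        store TEXT NOT NULL
    );
    CREATE TABLE prices (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES products(id),
        price REAL NOT NULL,
        scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
""")

# Register a product and record two price observations
conn.execute("INSERT INTO products (name, url, store) VALUES (?, ?, ?)",
             ("Widget", "https://example.com/widget", "example"))
product_id = conn.execute("SELECT id FROM products WHERE url = ?",
                          ("https://example.com/widget",)).fetchone()[0]
conn.execute("INSERT INTO prices (product_id, price) VALUES (?, ?)", (product_id, 19.99))
conn.execute("INSERT INTO prices (product_id, price) VALUES (?, ?)", (product_id, 17.49))

# Latest recorded price for the product
latest = conn.execute(
    "SELECT price FROM prices WHERE product_id = ? ORDER BY id DESC LIMIT 1",
    (product_id,)
).fetchone()[0]
print(latest)  # 17.49
```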
Step 2: Build the Scraper
Here is a scraper that handles multiple e-commerce sites. The key insight: each store needs its own parser because HTML structures differ.
import requests
from bs4 import BeautifulSoup
import re

class PriceScraper:
    def __init__(self, proxy_url=None):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": (
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                "AppleWebKit/537.36 (KHTML, like Gecko) "
                "Chrome/131.0.0.0 Safari/537.36"
            ),
            "Accept-Language": "en-US,en;q=0.9",
        })
        if proxy_url:
            self.session.proxies = {"http": proxy_url, "https": proxy_url}

    def scrape(self, url):
        resp = self.session.get(url, timeout=20)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        if "amazon." in url:
            return self._parse_amazon(soup)
        elif "ebay." in url:
            return self._parse_ebay(soup)
        elif "walmart." in url:
            return self._parse_walmart(soup)
        else:
            raise ValueError(f"Unsupported store: {url}")

    def _parse_amazon(self, soup):
        price_el = (
            soup.select_one("span.a-price .a-offscreen") or
            soup.select_one("#priceblock_ourprice") or
            soup.select_one("#priceblock_dealprice")
        )
        price = self._extract_number(price_el.text) if price_el else None
        title_el = soup.select_one("#productTitle")
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": "in stock" in soup.text.lower(),
            "currency": "USD"
        }

    def _parse_ebay(self, soup):
        price_el = soup.select_one("div.x-price-primary span.ux-textspans")
        title_el = soup.select_one("h1.x-item-title__mainTitle span")
        price = self._extract_number(price_el.text) if price_el else None
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": True,
            "currency": "USD"
        }

    def _parse_walmart(self, soup):
        price_el = soup.select_one('[itemprop="price"]')
        title_el = soup.select_one("h1")
        price = None
        if price_el:
            content = price_el.get("content")
            if content:
                price = float(content)
            elif price_el.text:
                price = self._extract_number(price_el.text)
        return {
            "price": price,
            "title": title_el.text.strip() if title_el else None,
            "in_stock": "add to cart" in soup.text.lower(),
            "currency": "USD"
        }

    @staticmethod
    def _extract_number(text):
        cleaned = text.replace(",", "")
        match = re.search(r"[\d]+\.?\d*", cleaned)
        return float(match.group()) if match else None
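Price strings vary wildly between stores, so the number-extraction helper is worth exercising on its own. A standalone sketch of the same regex approach (the sample strings are illustrative):

```python
import re

def extract_number(text):
    # Strip thousands separators, then grab the first decimal number
    cleaned = text.replace(",", "")
    match = re.search(r"\d+\.?\d*", cleaned)
    return float(match.group()) if match else None

samples = ["$1,299.99", "US $45.00", "Now: 17.49 EUR", "Out of stock"]
results = [extract_number(s) for s in samples]
print(results)  # [1299.99, 45.0, 17.49, None]
```

Note that this grabs the first number in the string, so text like "List price $99, now $79" would return 99.0; trimming the element text to the displayed price before parsing avoids that.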
Step 3: Handle Anti-Bot Protection
This is where most price monitoring systems fail. Major e-commerce sites actively block scrapers. Here is your toolkit:
Proxy Rotation
Residential proxies are essential for e-commerce scraping. Datacenter IPs get blocked within minutes on Amazon and Walmart.
# Using ThorData residential proxies
# Sign up: https://affiliate.thordata.com/0a0x4nzu7tvv
PROXY_URL = "http://user:pass@proxy.thordata.com:9000"
scraper = PriceScraper(proxy_url=PROXY_URL)
ThorData offers residential proxy pools that rotate IPs automatically — critical for sustained price monitoring without getting blocked.
Request Spacing and Headers
import time
import random

def scrape_with_backoff(scraper, urls, min_delay=2, max_delay=5):
    results = []
    for url in urls:
        try:
            data = scraper.scrape(url)
            results.append({"url": url, "data": data, "error": None})
        except Exception as e:
            results.append({"url": url, "data": None, "error": str(e)})
        # Random delay to mimic human behavior
        time.sleep(random.uniform(min_delay, max_delay))
    return results
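Note that `scrape_with_backoff` only spaces requests out; it records failures rather than retrying them. The retry-with-exponential-backoff item from the production checklist could be layered on top with a wrapper like this sketch (the `flaky_fetch` function is a stand-in for a call such as `scraper.scrape(url)`):

```python
import time
import random

def retry_with_backoff(fn, max_attempts=4, base_delay=1.0):
    """Call fn(); on failure, wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, propagate the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# Stand-in for a flaky scrape: fails twice, then succeeds
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("blocked")
    return {"price": 19.99}

result = retry_with_backoff(flaky_fetch, base_delay=0.01)
print(result, calls["n"])  # {'price': 19.99} 3
```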
Browser Rendering Fallback
Some sites require JavaScript execution. Use Playwright as a fallback:
from bs4 import BeautifulSoup
from playwright.sync_api import sync_playwright

def scrape_with_browser(url, proxy=None):
    with sync_playwright() as p:
        browser_args = {}
        if proxy:
            browser_args["proxy"] = {"server": proxy}
        browser = p.chromium.launch(headless=True, **browser_args)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        page.wait_for_selector("[data-price], .price", timeout=10000)
        content = page.content()
        browser.close()
        return BeautifulSoup(content, "html.parser")
Step 4: Price Alert Engine
def check_price_drops(conn, threshold_pct=5):
    """Find products where price dropped more than threshold."""
    cursor = conn.execute("""
        WITH latest AS (
            SELECT product_id, price, scraped_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_id ORDER BY scraped_at DESC
                   ) AS rn
            FROM prices
            WHERE scraped_at > datetime('now', '-48 hours')
        )
        SELECT p.name, p.url, prev.price AS old_price,
               curr.price AS new_price,
               ROUND((prev.price - curr.price) / prev.price * 100, 1) AS drop_pct
        FROM latest curr
        JOIN latest prev ON curr.product_id = prev.product_id
        JOIN products p ON curr.product_id = p.id
        WHERE curr.rn = 1 AND prev.rn = 2
          AND prev.price > curr.price
          AND (prev.price - curr.price) / prev.price * 100 > ?
    """, (threshold_pct,))
    return cursor.fetchall()
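To see the window-function comparison in action, here is a self-contained sketch against an in-memory database with one product and a 20% drop (the 48-hour filter is omitted so the fixed sample timestamps qualify; product data is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
    CREATE TABLE prices (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        price REAL,
        scraped_at TIMESTAMP
    );
    INSERT INTO products (id, name, url) VALUES (1, 'Widget', 'https://example.com/w');
    INSERT INTO prices (product_id, price, scraped_at) VALUES (1, 100.0, '2025-01-01 00:00');
    INSERT INTO prices (product_id, price, scraped_at) VALUES (1, 80.0,  '2025-01-02 00:00');
""")

# Rank observations per product, then compare the newest (rn=1) to the previous (rn=2)
rows = conn.execute("""
    WITH latest AS (
        SELECT product_id, price,
               ROW_NUMBER() OVER (
                   PARTITION BY product_id ORDER BY scraped_at DESC
               ) AS rn
        FROM prices
    )
    SELECT p.name, prev.price, curr.price,
           ROUND((prev.price - curr.price) / prev.price * 100, 1)
    FROM latest curr
    JOIN latest prev ON curr.product_id = prev.product_id
    JOIN products p ON curr.product_id = p.id
    WHERE curr.rn = 1 AND prev.rn = 2 AND prev.price > curr.price
""").fetchall()
print(rows)  # [('Widget', 100.0, 80.0, 20.0)]
```

Window functions require SQLite 3.25 or newer, which ships with Python 3.7+ on most platforms.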
def send_alert(product_name, url, old_price, new_price, drop_pct):
    """Send price drop alert via email."""
    import smtplib
    from email.mime.text import MIMEText

    body = (
        f"Price drop alert!\n\n"
        f"{product_name}\n"
        f"${old_price:.2f} -> ${new_price:.2f} ({drop_pct}% off)\n"
        f"{url}"
    )
    msg = MIMEText(body)
    msg["Subject"] = f"Price Drop: {product_name} (-{drop_pct}%)"
    msg["From"] = "alerts@yourdomain.com"
    msg["To"] = "you@email.com"
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login("alerts@yourdomain.com", "app-password")
        server.send_message(msg)
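The architecture diagram also lists Telegram as an alert channel. A sketch against the Telegram Bot API `sendMessage` endpoint; the token and chat ID are placeholders, and this version only builds the request rather than sending it:

```python
def build_telegram_alert(token, chat_id, product_name, url,
                         old_price, new_price, drop_pct):
    """Build the sendMessage request for the Telegram Bot API (not sent here)."""
    text = (
        f"Price drop: {product_name}\n"
        f"${old_price:.2f} -> ${new_price:.2f} ({drop_pct}% off)\n"
        f"{url}"
    )
    api_url = f"https://api.telegram.org/bot{token}/sendMessage"
    payload = {"chat_id": chat_id, "text": text}
    # To actually send:  requests.post(api_url, json=payload, timeout=10)
    return api_url, payload

api_url, payload = build_telegram_alert(
    "123:ABC", "42", "Widget", "https://example.com/w", 100.0, 80.0, 20.0
)
print(payload["text"].splitlines()[1])  # $100.00 -> $80.00 (20.0% off)
```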
Step 5: Schedule Everything
import time
import random
from apscheduler.schedulers.blocking import BlockingScheduler

def run_monitoring_job():
    conn = init_db()
    scraper = PriceScraper(proxy_url=PROXY_URL)
    products = conn.execute("SELECT id, url FROM products").fetchall()
    for product_id, url in products:
        try:
            data = scraper.scrape(url)
            if data["price"]:
                record_price(conn, product_id, data["price"],
                             data["currency"], data["in_stock"])
        except Exception as e:
            print(f"Error scraping {url}: {e}")
        time.sleep(random.uniform(2, 5))
    drops = check_price_drops(conn, threshold_pct=5)
    for name, url, old_p, new_p, pct in drops:
        send_alert(name, url, old_p, new_p, pct)
        print(f"ALERT: {name} dropped {pct}%")
    conn.close()

scheduler = BlockingScheduler()
scheduler.add_job(run_monitoring_job, "interval", hours=6)
scheduler.start()
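If you prefer plain cron over APScheduler (the takeaways below mention cron as the simplest starting point), an equivalent crontab entry might look like this; the script path and log location are placeholders:

```shell
# Run the monitoring job every 6 hours; adjust paths to your environment
0 */6 * * * /usr/bin/python3 /opt/price-monitor/run_job.py >> /var/log/price-monitor.log 2>&1
```

The trade-off: cron is simpler to operate, while APScheduler keeps everything in one Python process and supports in-process retries and jitter.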
Scaling Up: Use Pre-Built Actors
Building scrapers for every site is time-consuming. For production use, consider pre-built solutions:
- eBay Scraper — Extract prices, listings, and seller data from eBay
- Walmart Scraper — Monitor Walmart prices and inventory
- AliExpress Scraper — Track AliExpress product prices and reviews
These run on Apify cloud infrastructure, so you do not need to manage proxies, browsers, or servers.
Production Checklist
Before deploying your price monitor to production:
- [ ] Proxy rotation — Use residential proxies (ThorData is solid) to avoid IP bans
- [ ] Error handling — Retry failed requests with exponential backoff
- [ ] Data validation — Reject prices that are 0, negative, or >10x the last known price
- [ ] Rate limiting — Space requests 2-5 seconds apart per domain
- [ ] Monitoring — Alert yourself when scraping success rate drops below 90%
- [ ] Database maintenance — Archive old price records after 90 days
- [ ] Legal compliance — Only scrape public pricing data, respect robots.txt
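The data-validation item above can be sketched as a simple guard applied before `record_price`; the 10x threshold matches the checklist, but tune it to your catalog:

```python
def is_plausible(price, last_known=None):
    """Reject obviously bad scrapes: missing, zero, negative, or wild jumps."""
    if price is None or price <= 0:
        return False
    # A price more than 10x the last known value is almost certainly a parse error
    if last_known is not None and price > last_known * 10:
        return False
    return True

print(is_plausible(19.99))                     # True
print(is_plausible(0))                         # False
print(is_plausible(250.0, last_known=19.99))   # False (>10x jump)
```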
Key Takeaways
- Start simple — SQLite + requests + cron gets you 80% of the way there
- Proxies are not optional — You will get blocked on major e-commerce sites without them
- Store raw data — Keep price history. The trends are more valuable than any single price point
- Alert on anomalies, not every change — A 1% price fluctuation is not worth an email. Set meaningful thresholds
- Use pre-built scrapers for popular sites — Do not reinvent parsing logic that already exists
Need production-ready scrapers? Check out the e-commerce scraping actors on Apify — eBay, Walmart, AliExpress and more, ready to run.