Whether you're tracking price drops on a product, watching for new job postings, or monitoring competitor changes — automated website monitoring saves hours of manual checking. In this tutorial, I'll show you how to build a Python-based change detection system from scratch.
How Website Change Detection Works
The core algorithm is simple:
- Fetch the current version of a page
- Compare it to a stored baseline
- Alert if meaningful changes are detected
- Update the baseline
The challenge is in step 2 — filtering out noise (ads, timestamps, session tokens) to detect meaningful changes.
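To make that noise filtering concrete, here's a minimal normalization sketch. The regex patterns are illustrative placeholders, not a complete list — tune them to the volatile parts of the sites you actually monitor:

```python
import re

# Hypothetical noise patterns; extend these for the pages you monitor.
NOISE_PATTERNS = [
    re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?"),   # ISO-ish timestamps
    re.compile(r"sessionid=[A-Za-z0-9]+"),                       # session tokens
    re.compile(r"\b\d+ (seconds?|minutes?|hours?) ago\b"),       # relative times
]


def normalize(text: str) -> str:
    """Strip known-volatile substrings so identical pages compare as identical."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub("", text)
    return text
```

Run both the baseline and the fresh fetch through `normalize()` before hashing or diffing, and timestamp-only updates stop registering as changes.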
Building the Monitor: Step by Step
Step 1: Fetch and Clean the Page
import requests
from bs4 import BeautifulSoup
import hashlib


def fetch_page(url: str) -> str:
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; ChangeMonitor/1.0)"
    }
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Remove noise elements
    for tag in soup.select("script, style, nav, footer, .ads, #cookie-banner"):
        tag.decompose()
    # Extract text content
    text = soup.get_text(separator="\n", strip=True)
    return text


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()
Stripping scripts, styles, and navigation ensures you're comparing content, not boilerplate.
Step 2: Detect Changes with difflib
Python's built-in difflib is perfect for generating human-readable diffs:
import difflib


def detect_changes(old_content: str, new_content: str) -> dict:
    if old_content == new_content:
        return {"changed": False}
    old_lines = old_content.splitlines()
    new_lines = new_content.splitlines()
    differ = difflib.unified_diff(
        old_lines, new_lines,
        fromfile="previous",
        tofile="current",
        lineterm=""
    )
    diff_text = "\n".join(differ)
    # Calculate similarity ratio
    ratio = difflib.SequenceMatcher(
        None, old_content, new_content
    ).ratio()
    return {
        "changed": True,
        "similarity": round(ratio * 100, 2),
        "diff": diff_text,
        # Exclude the "+++"/"---" file headers from the line counts
        "additions": sum(
            1 for l in diff_text.split("\n")
            if l.startswith("+") and not l.startswith("+++")
        ),
        "deletions": sum(
            1 for l in diff_text.split("\n")
            if l.startswith("-") and not l.startswith("---")
        ),
    }
The similarity ratio helps you filter out minor changes (like a timestamp update at 99.8% similarity) from major ones (new product listing at 85% similarity).
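A quick sketch of why the ratio is useful as a filter — the page strings below are made-up examples, but the pattern holds: a one-character timestamp change barely moves the ratio, while real content changes move it noticeably:

```python
import difflib

page_v1 = "Widget X\nPrice: $49.99\nUpdated: 10:01"
page_v2 = "Widget X\nPrice: $49.99\nUpdated: 10:02"                 # timestamp only
page_v3 = "Widget X\nPrice: $39.99\nNew: Widget Y\nUpdated: 10:02"  # real change

# Timestamp-only edit stays very close to 1.0; a price change plus a new
# listing drops the ratio much further.
minor = difflib.SequenceMatcher(None, page_v1, page_v2).ratio()
major = difflib.SequenceMatcher(None, page_v1, page_v3).ratio()
```

Picking a threshold between those two regimes (the tutorial uses 98-99%) is what separates "alert me" from "ignore it".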
Step 3: Store Baselines
Use a simple JSON file to track monitored pages:
import json
import os
from datetime import datetime

BASELINE_FILE = "baselines.json"


def load_baselines() -> dict:
    if os.path.exists(BASELINE_FILE):
        with open(BASELINE_FILE) as f:
            return json.load(f)
    return {}


def save_baseline(url: str, content: str, hash_val: str):
    baselines = load_baselines()
    baselines[url] = {
        "hash": hash_val,
        "content": content,
        "last_checked": datetime.now().isoformat(),
        "last_changed": datetime.now().isoformat()
    }
    with open(BASELINE_FILE, "w") as f:
        json.dump(baselines, f, indent=2)
For production use, swap this for SQLite or Redis — but JSON works fine for monitoring a handful of pages.
Step 4: Send Alerts
Email Alerts (via SMTP)
import os
import smtplib
from email.mime.text import MIMEText


def send_email_alert(url: str, changes: dict):
    msg = MIMEText(
        f"Changes detected on {url}\n\n"
        f"Similarity: {changes['similarity']}%\n"
        f"Additions: {changes['additions']}\n"
        f"Deletions: {changes['deletions']}\n\n"
        f"Diff:\n{changes['diff'][:2000]}"
    )
    msg["Subject"] = f"Change detected: {url[:50]}"
    msg["From"] = "monitor@yourdomain.com"
    msg["To"] = "you@yourdomain.com"
    with smtplib.SMTP("smtp.yourdomain.com", 587) as server:
        server.starttls()
        # Keep credentials out of source control — read them from the environment
        server.login("monitor@yourdomain.com", os.environ["SMTP_PASSWORD"])
        server.send_message(msg)
Slack Webhook Alerts
def send_slack_alert(webhook_url: str, url: str, changes: dict):
    payload = {
        "text": f"Change detected: {url[:60]}",
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": (
                        f"*Similarity:* {changes['similarity']}%\n"
                        f"*Changes:* +{changes['additions']} / "
                        f"-{changes['deletions']} lines"
                    )
                }
            }
        ]
    }
    requests.post(webhook_url, json=payload, timeout=10)
Step 5: Put It All Together
def monitor(urls: list[str], threshold: float = 99.0):
    baselines = load_baselines()
    for url in urls:
        print(f"Checking {url}...")
        try:
            current = fetch_page(url)
            current_h = content_hash(current)
            if url not in baselines:
                print("  New URL - saving baseline")
                save_baseline(url, current, current_h)
                continue
            if current_h == baselines[url]["hash"]:
                print("  No changes (hash match)")
                continue
            changes = detect_changes(baselines[url]["content"], current)
            if changes["changed"] and changes["similarity"] < threshold:
                print(f"  CHANGED! Similarity: {changes['similarity']}%")
                send_email_alert(url, changes)
            # Update the baseline whenever the hash differs, alert or not
            save_baseline(url, current, current_h)
        except Exception as e:
            print(f"  Error: {e}")


if __name__ == "__main__":
    urls_to_monitor = [
        "https://example.com/products/widget",
        "https://example.com/jobs",
        "https://news.example.com/latest",
    ]
    monitor(urls_to_monitor, threshold=98.0)
Real-World Use Cases
1. Price Drop Monitoring
Track product prices and alert when they drop below your target:
import re


def extract_price(content: str) -> float | None:
    match = re.search(r'\$(\d+(?:\.\d{2})?)', content)
    return float(match.group(1)) if match else None


# In your monitor loop:
old_price = extract_price(baselines[url]["content"])
new_price = extract_price(current)
if new_price and old_price and new_price < old_price:
    print(f"Price drop! ${old_price} -> ${new_price}")
2. Job Posting Alerts
Monitor career pages for new positions matching your criteria:
def check_new_listings(old_content: str, new_content: str, keywords: list):
    old_lines = set(old_content.splitlines())
    new_lines = set(new_content.splitlines())
    additions = new_lines - old_lines
    matches = [
        line for line in additions
        if any(kw.lower() in line.lower() for kw in keywords)
    ]
    return matches


new_jobs = check_new_listings(old, new, ["Python", "Remote", "Senior"])
3. Product Restock Monitoring
Watch for "Out of Stock" to "In Stock" transitions — useful for limited drops and popular items.
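A sketch of that transition check, assuming the page uses literal "Out of Stock" / "In Stock" markers (adjust the marker strings to the actual site's wording):

```python
def restock_detected(old_content: str, new_content: str,
                     out_marker: str = "Out of Stock",
                     in_marker: str = "In Stock") -> bool:
    """True when the page flips from the out-of-stock marker to the in-stock one."""
    was_out = out_marker.lower() in old_content.lower()
    now_in = (in_marker.lower() in new_content.lower()
              and out_marker.lower() not in new_content.lower())
    return was_out and now_in
```

Checking for the transition (rather than just the presence of "In Stock") avoids firing repeatedly on every run while the item stays available.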
4. News and Regulatory Alerts
Monitor government pages, regulatory bodies, or news sites for updates that affect your business.
Scaling with Proxies
When monitoring many pages, you'll need proxy rotation to avoid IP blocks. Services like ThorData provide residential proxies ideal for monitoring tasks — they look like real users, which reduces blocking. ScrapeOps adds a monitoring layer on top, so you can track which of your monitors are succeeding and which are getting blocked.
# Using ThorData proxy rotation
proxies = {
    "http": "http://user:pass@proxy.thordata.com:9000",
    "https": "http://user:pass@proxy.thordata.com:9000"
}

response = requests.get(url, proxies=proxies, timeout=30)
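If your provider hands you a list of endpoints instead of a single rotating gateway, you can cycle through them yourself. The proxy URLs below are placeholders:

```python
import itertools

# Placeholder pool — substitute your provider's real endpoints.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:9000",
    "http://user:pass@proxy2.example.com:9000",
    "http://user:pass@proxy3.example.com:9000",
]
_rotation = itertools.cycle(PROXY_POOL)


def next_proxies() -> dict:
    """Return a requests-style proxies dict, advancing through the pool."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}
```

Then each check becomes `requests.get(url, proxies=next_proxies(), timeout=30)`, spreading requests evenly across the pool.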
Scheduling: Run It Automatically
Option A: Cron Job (Linux/Mac)
# Check every 30 minutes
*/30 * * * * cd /path/to/monitor && python3 monitor.py >> monitor.log 2>&1
Option B: Cloud-Based Monitoring with Apify
For hands-off monitoring that runs in the cloud, Apify lets you schedule actors (cloud functions) that run on a cron schedule. No server management, built-in proxy rotation, and results are stored automatically. Check out ready-made monitoring actors or build your own.
Option C: GitHub Actions (Free Tier)
name: Monitor Websites

on:
  schedule:
    - cron: '0 */6 * * *'

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests beautifulsoup4
      - run: python monitor.py
Tips for Production Monitoring
- Set appropriate thresholds: Not every change matters. A 99.5% similarity change is probably just a timestamp
- Respect robots.txt: Check before monitoring. Don't hit sites more often than necessary
- Add delays between checks: 2-5 seconds between requests prevents triggering rate limits
- Log everything: When something breaks at 3 AM, you'll want to know why
- Use CSS selectors for precision: Instead of monitoring entire pages, target specific elements (price div, job listing container)
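The last tip can be sketched as a small helper: hash only the element you care about instead of the whole page. The `.price` selector here is a placeholder — inspect the target page to find the real one:

```python
from bs4 import BeautifulSoup


def extract_target(html: str, selector: str) -> str:
    """Return the text of the first element matching a CSS selector, or ""."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element else ""
```

Feed `extract_target(response.text, ".price")` into `content_hash()` and the rest of the page — banners, related products, footers — can churn all it wants without triggering alerts.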
Conclusion
Website change monitoring is one of those tools that, once you have it, you wonder how you lived without it. The Python implementation above handles 90% of use cases. For the other 10% — JavaScript-heavy sites, large-scale monitoring, or complex alerting — consider cloud platforms like Apify with built-in scheduling and proxy rotation.
The full code from this tutorial is modular enough to extend: swap in a database, add Telegram alerts, or integrate with your existing automation pipeline.
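As one example of that extensibility, a Telegram alert is a single POST to the Bot API's `sendMessage` method. This is a hedged sketch — `bot_token` and `chat_id` are placeholders you obtain from @BotFather and your own chat:

```python
import requests


def build_telegram_payload(chat_id: str, url: str, changes: dict) -> dict:
    """Assemble the sendMessage payload from a detect_changes() result."""
    return {
        "chat_id": chat_id,
        "text": (
            f"Change detected: {url}\n"
            f"Similarity: {changes['similarity']}%\n"
            f"+{changes['additions']} / -{changes['deletions']} lines"
        ),
    }


def send_telegram_alert(bot_token: str, chat_id: str, url: str, changes: dict):
    api_url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    requests.post(api_url, json=build_telegram_payload(chat_id, url, changes),
                  timeout=10)
```

It slots into `monitor()` exactly where `send_email_alert()` is called now.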
What would you monitor? Price drops, job postings, or something else? Share your use case in the comments!