Whether you're tracking price drops on a product, watching for new job postings, or monitoring competitor changes — automated website monitoring saves hours of manual checking. In this tutorial, I'll show you how to build a Python-based change detection system from scratch.
How Website Change Detection Works
The core algorithm is simple:
- Fetch the current version of a page
- Compare it to a stored baseline
- Alert if meaningful changes are detected
- Update the baseline
The challenge is in step 2 — filtering out noise (ads, timestamps, session tokens) to detect meaningful changes.
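To make that noise filtering concrete, here's a minimal normalization sketch. The regex patterns are illustrative placeholders, not a complete list — tune them to the volatile parts of the sites you actually monitor:

```python
import re

# Hypothetical noise patterns; extend these for the pages you monitor.
NOISE_PATTERNS = [
    re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}(:\d{2})?"),   # ISO-ish timestamps
    re.compile(r"sessionid=[A-Za-z0-9]+"),                       # session tokens
    re.compile(r"\b\d+ (seconds?|minutes?|hours?) ago\b"),       # relative times
]


def normalize(text: str) -> str:
    """Strip known-volatile substrings so identical pages compare as identical."""
    for pattern in NOISE_PATTERNS:
        text = pattern.sub("", text)
    return text
```

Run both the baseline and the fresh fetch through `normalize()` before hashing or diffing, and timestamp-only updates stop registering as changes.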
Building the Monitor: Step by Step
Step 1: Fetch and Clean the Page
import requests
from bs4 import BeautifulSoup
import hashlib


def fetch_page(url: str) -> str:
    headers = {
        "User-Agent": "Mozilla/5.0 (compatible; ChangeMonitor/1.0)"
    }
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Remove noise elements
    for tag in soup.select("script, style, nav, footer, .ads, #cookie-banner"):
        tag.decompose()
    # Extract text content
    text = soup.get_text(separator="\n", strip=True)
    return text


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()
Stripping scripts, styles, and navigation ensures you're comparing content, not boilerplate.
Step 2: Detect Changes with difflib
Python's built-in difflib is perfect for generating human-readable diffs:
import difflib


def detect_changes(old_content: str, new_content: str) -> dict:
    if old_content == new_content:
        return {"changed": False}
    old_lines = old_content.splitlines()
    new_lines = new_content.splitlines()
    differ = difflib.unified_diff(
        old_lines, new_lines,
        fromfile="previous",
        tofile="current",
        lineterm=""
    )
    diff_text = "\n".join(differ)
    # Calculate similarity ratio
    ratio = difflib.SequenceMatcher(
        None, old_content, new_content
    ).ratio()
    return {
        "changed": True,
        "similarity": round(ratio * 100, 2),
        "diff": diff_text,
        # Exclude the "+++"/"---" file headers from the line counts
        "additions": sum(
            1 for l in diff_text.split("\n")
            if l.startswith("+") and not l.startswith("+++")
        ),
        "deletions": sum(
            1 for l in diff_text.split("\n")
            if l.startswith("-") and not l.startswith("---")
        ),
    }
The similarity ratio helps you filter out minor changes (like a timestamp update at 99.8% similarity) from major ones (new product listing at 85% similarity).
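A quick sketch of why the ratio is useful as a filter — the page strings below are made-up examples, but the pattern holds: a one-character timestamp change barely moves the ratio, while real content changes move it noticeably:

```python
import difflib

page_v1 = "Widget X\nPrice: $49.99\nUpdated: 10:01"
page_v2 = "Widget X\nPrice: $49.99\nUpdated: 10:02"                 # timestamp only
page_v3 = "Widget X\nPrice: $39.99\nNew: Widget Y\nUpdated: 10:02"  # real change

# Timestamp-only edit stays very close to 1.0; a price change plus a new
# listing drops the ratio much further.
minor = difflib.SequenceMatcher(None, page_v1, page_v2).ratio()
major = difflib.SequenceMatcher(None, page_v1, page_v3).ratio()
```

Picking a threshold between those two regimes (the tutorial uses 98-99%) is what separates "alert me" from "ignore it".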
Step 3: Store Baselines
Use a simple JSON file to track monitored pages:
import json
import os
from datetime import datetime

BASELINE_FILE = "baselines.json"


def load_baselines() -> dict:
    if os.path.exists(BASELINE_FILE):
        with open(BASELINE_FILE) as f:
            return json.load(f)
    return {}


def save_baseline(url: str, content: str, hash_val: str):
    baselines = load_baselines()
    baselines[url] = {
        "hash": hash_val,
        "content": content,
        "last_checked": datetime.now().isoformat(),
        "last_changed": datetime.now().isoformat()
    }
    with open(BASELINE_FILE, "w") as f:
        json.dump(baselines, f, indent=2)
For production use, swap this for SQLite or Redis — but JSON works fine for monitoring a handful of pages.
Step 4: Send Alerts
Email Alerts (via SMTP)
import os
import smtplib
from email.mime.text import MIMEText


def send_email_alert(url: str, changes: dict):
    msg = MIMEText(
        f"Changes detected on {url}\n\n"
        f"Similarity: {changes['similarity']}%\n"
        f"Additions: {changes['additions']}\n"
        f"Deletions: {changes['deletions']}\n\n"
        f"Diff:\n{changes['diff'][:2000]}"
    )
    msg["Subject"] = f"Change detected: {url[:50]}"
    msg["From"] = "monitor@yourdomain.com"
    msg["To"] = "you@yourdomain.com"
    with smtplib.SMTP("smtp.yourdomain.com", 587) as server:
        server.starttls()
        # Keep credentials out of source control — read them from the environment
        server.login("monitor@yourdomain.com", os.environ["SMTP_PASSWORD"])
        server.send_message(msg)
Slack Webhook Alerts
def send_slack_alert(webhook_url: str, url: str, changes: dict):
    payload = {
        "text": f"Change detected: {url[:60]}",
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": (
                        f"*Similarity:* {changes['similarity']}%\n"
                        f"*Changes:* +{changes['additions']} / "
                        f"-{changes['deletions']} lines"
                    )
                }
            }
        ]
    }
    requests.post(webhook_url, json=payload, timeout=10)
Step 5: Put It All Together
def monitor(urls: list[str], threshold: float = 99.0):
    baselines = load_baselines()
    for url in urls:
        print(f"Checking {url}...")
        try:
            current = fetch_page(url)
            current_h = content_hash(current)
            if url not in baselines:
                print("  New URL - saving baseline")
                save_baseline(url, current, current_h)
                continue
            if current_h == baselines[url]["hash"]:
                print("  No changes (hash match)")
                continue
            changes = detect_changes(baselines[url]["content"], current)
            if changes["changed"] and changes["similarity"] < threshold:
                print(f"  CHANGED! Similarity: {changes['similarity']}%")
                send_email_alert(url, changes)
            # Update the baseline whenever the hash differs, alert or not
            save_baseline(url, current, current_h)
        except Exception as e:
            print(f"  Error: {e}")


if __name__ == "__main__":
    urls_to_monitor = [
        "https://example.com/products/widget",
        "https://example.com/jobs",
        "https://news.example.com/latest",
    ]
    monitor(urls_to_monitor, threshold=98.0)
Real-World Use Cases
1. Price Drop Monitoring
Track product prices and alert when they drop below your target:
import re


def extract_price(content: str) -> float | None:
    match = re.search(r'\$(\d+(?:\.\d{2})?)', content)
    return float(match.group(1)) if match else None


# In your monitor loop:
old_price = extract_price(baselines[url]["content"])
new_price = extract_price(current)
if new_price and old_price and new_price < old_price:
    print(f"Price drop! ${old_price} -> ${new_price}")
2. Job Posting Alerts
Monitor career pages for new positions matching your criteria:
def check_new_listings(old_content: str, new_content: str, keywords: list):
    old_lines = set(old_content.splitlines())
    new_lines = set(new_content.splitlines())
    additions = new_lines - old_lines
    matches = [
        line for line in additions
        if any(kw.lower() in line.lower() for kw in keywords)
    ]
    return matches


new_jobs = check_new_listings(old, new, ["Python", "Remote", "Senior"])
3. Product Restock Monitoring
Watch for "Out of Stock" to "In Stock" transitions — useful for limited drops and popular items.
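A sketch of that transition check, assuming the page uses literal "Out of Stock" / "In Stock" markers (adjust the marker strings to the actual site's wording):

```python
def restock_detected(old_content: str, new_content: str,
                     out_marker: str = "Out of Stock",
                     in_marker: str = "In Stock") -> bool:
    """True when the page flips from the out-of-stock marker to the in-stock one."""
    was_out = out_marker.lower() in old_content.lower()
    now_in = (in_marker.lower() in new_content.lower()
              and out_marker.lower() not in new_content.lower())
    return was_out and now_in
```

Checking for the transition (rather than just the presence of "In Stock") avoids firing repeatedly on every run while the item stays available.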
4. News and Regulatory Alerts
Monitor government pages, regulatory bodies, or news sites for updates that affect your business.
Scaling with Proxies
When monitoring many pages, you'll need proxy rotation to avoid IP blocks. Services like ThorData provide residential proxies ideal for monitoring tasks — they look like real users, which reduces blocking. ScrapeOps adds a monitoring layer on top, so you can track which of your monitors are succeeding and which are getting blocked.
# Using ThorData proxy rotation
proxies = {
    "http": "http://user:pass@proxy.thordata.com:9000",
    "https": "http://user:pass@proxy.thordata.com:9000"
}

response = requests.get(url, proxies=proxies, timeout=30)
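If your provider hands you a list of endpoints instead of a single rotating gateway, you can cycle through them yourself. The proxy URLs below are placeholders:

```python
import itertools

# Placeholder pool — substitute your provider's real endpoints.
PROXY_POOL = [
    "http://user:pass@proxy1.example.com:9000",
    "http://user:pass@proxy2.example.com:9000",
    "http://user:pass@proxy3.example.com:9000",
]
_rotation = itertools.cycle(PROXY_POOL)


def next_proxies() -> dict:
    """Return a requests-style proxies dict, advancing through the pool."""
    proxy = next(_rotation)
    return {"http": proxy, "https": proxy}
```

Then each check becomes `requests.get(url, proxies=next_proxies(), timeout=30)`, spreading requests evenly across the pool.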
Scheduling: Run It Automatically
Option A: Cron Job (Linux/Mac)
# Check every 30 minutes
*/30 * * * * cd /path/to/monitor && python3 monitor.py >> monitor.log 2>&1
Option B: Cloud-Based Monitoring with Apify
For hands-off monitoring that runs in the cloud, Apify lets you schedule actors (cloud functions) that run on a cron schedule. No server management, built-in proxy rotation, and results are stored automatically. Check out ready-made monitoring actors or build your own.
Option C: GitHub Actions (Free Tier)
name: Monitor Websites

on:
  schedule:
    - cron: '0 */6 * * *'

jobs:
  monitor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install requests beautifulsoup4
      - run: python monitor.py
Tips for Production Monitoring
- Set appropriate thresholds: Not every change matters. A 99.5% similarity change is probably just a timestamp
- Respect robots.txt: Check before monitoring. Don't hit sites more often than necessary
- Add delays between checks: 2-5 seconds between requests prevents triggering rate limits
- Log everything: When something breaks at 3 AM, you'll want to know why
- Use CSS selectors for precision: Instead of monitoring entire pages, target specific elements (price div, job listing container)
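The last tip can be sketched as a small helper: hash only the element you care about instead of the whole page. The `.price` selector here is a placeholder — inspect the target page to find the real one:

```python
from bs4 import BeautifulSoup


def extract_target(html: str, selector: str) -> str:
    """Return the text of the first element matching a CSS selector, or ""."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.select_one(selector)
    return element.get_text(strip=True) if element else ""
```

Feed `extract_target(response.text, ".price")` into `content_hash()` and the rest of the page — banners, related products, footers — can churn all it wants without triggering alerts.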
Conclusion
Website change monitoring is one of those tools that, once you have it, you wonder how you lived without it. The Python implementation above handles 90% of use cases. For the other 10% — JavaScript-heavy sites, large-scale monitoring, or complex alerting — consider cloud platforms like Apify with built-in scheduling and proxy rotation.
The full code from this tutorial is modular enough to extend: swap in a database, add Telegram alerts, or integrate with your existing automation pipeline.
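As one example of that extensibility, a Telegram alert is a single POST to the Bot API's `sendMessage` method. This is a hedged sketch — `bot_token` and `chat_id` are placeholders you obtain from @BotFather and your own chat:

```python
import requests


def build_telegram_payload(chat_id: str, url: str, changes: dict) -> dict:
    """Assemble the sendMessage payload from a detect_changes() result."""
    return {
        "chat_id": chat_id,
        "text": (
            f"Change detected: {url}\n"
            f"Similarity: {changes['similarity']}%\n"
            f"+{changes['additions']} / -{changes['deletions']} lines"
        ),
    }


def send_telegram_alert(bot_token: str, chat_id: str, url: str, changes: dict):
    api_url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    requests.post(api_url, json=build_telegram_payload(chat_id, url, changes),
                  timeout=10)
```

It slots into `monitor()` exactly where `send_email_alert()` is called now.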
What would you monitor? Price drops, job postings, or something else? Share your use case in the comments!