When you have 1 scraper, monitoring is easy. Check if it runs. Done.
When you have 77 scrapers running on different schedules, extracting data from sites that change their layout every Tuesday at 3am, monitoring becomes a full-time job.
Here is the system I built to stay sane.
The Problem
Web scrapers fail silently. They do not crash with an error. They just return empty data, or stale data, or wrong data. And you only notice when a client says: "Hey, the data looks off."
I needed a monitoring system that catches:
- Scrapers that return 0 results (site changed layout)
- Scrapers that return the same data twice (caching issue)
- Scrapers that take 10x longer than usual (being rate-limited)
- Scrapers that return data in the wrong format (schema changed)
Layer 1: Health Checks (5 minutes to set up)
Every scraper writes a heartbeat file after a successful run:
```python
import json
from datetime import datetime
from pathlib import Path

def write_heartbeat(scraper_name, result_count, duration_seconds):
    heartbeat = {
        "scraper": scraper_name,
        "timestamp": datetime.now().isoformat(),
        "result_count": result_count,
        "duration_seconds": round(duration_seconds, 2),
        "status": "ok" if result_count > 0 else "empty"
    }
    Path("heartbeats").mkdir(exist_ok=True)
    path = Path(f"heartbeats/{scraper_name}.json")
    path.write_text(json.dumps(heartbeat, indent=2))
    return heartbeat
```
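One way to make sure no scraper forgets the heartbeat call is to wrap it in a decorator. This is a sketch of that idea, not the original setup; it assumes each scraper function returns a list of results (the `demo_products` scraper below is purely illustrative):

```python
import functools
import json
import time
from datetime import datetime
from pathlib import Path

def heartbeat(scraper_name):
    """Wrap a scraper so a heartbeat file is written after every run."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            results = fn(*args, **kwargs)
            hb = {
                "scraper": scraper_name,
                "timestamp": datetime.now().isoformat(),
                "result_count": len(results),
                "duration_seconds": round(time.monotonic() - start, 2),
                "status": "ok" if results else "empty",
            }
            Path("heartbeats").mkdir(exist_ok=True)
            Path(f"heartbeats/{scraper_name}.json").write_text(
                json.dumps(hb, indent=2)
            )
            return results
        return wrapper
    return decorator

@heartbeat("demo_products")
def scrape_products():
    # Stand-in for a real scraper
    return [{"title": "Widget", "price": 9.99}]
```

With this, every decorated scraper reports itself; there is no separate call to forget.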
Then a simple checker runs every hour:
```python
import json
from datetime import datetime, timedelta
from pathlib import Path

def check_all_scrapers(max_age_hours=24):
    issues = []
    cutoff = datetime.now() - timedelta(hours=max_age_hours)
    for hb_file in Path("heartbeats").glob("*.json"):
        data = json.loads(hb_file.read_text())
        last_run = datetime.fromisoformat(data["timestamp"])
        if last_run < cutoff:
            issues.append(f"STALE: {data['scraper']} last ran {last_run}")
        elif data["status"] == "empty":
            issues.append(f"EMPTY: {data['scraper']} returned 0 results")
        elif data["duration_seconds"] > 300:
            issues.append(f"SLOW: {data['scraper']} took {data['duration_seconds']}s")
    return issues
```
This alone catches 80% of problems.
Layer 2: Data Quality Checks
Empty results are obvious. But what about wrong results?
```python
def validate_scraper_output(data, schema):
    errors = []
    for item in data:
        for field, expected_type in schema.items():
            if field not in item:
                errors.append(f"Missing field: {field}")
            elif not isinstance(item[field], expected_type):
                errors.append(
                    f"Wrong type: {field} = {type(item[field]).__name__}, "
                    f"expected {expected_type}"
                )
    return errors

# Schema for a product scraper
product_schema = {
    "title": str,
    "price": (int, float),  # isinstance accepts a tuple of types
    "url": str,
    "in_stock": bool
}

errors = validate_scraper_output(scraped_products, product_schema)
if errors:
    send_alert(f"Schema validation failed: {errors[:3]}")
```
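Type checks alone miss plausible-looking garbage: a price of 0, an empty title. A hedged extension of the same idea with per-field value rules (the function and rule names here are mine, not part of the original system):

```python
def validate_values(data, rules):
    """Apply per-field predicate rules; return human-readable errors."""
    errors = []
    for i, item in enumerate(data):
        for field, check in rules.items():
            if field in item and not check(item[field]):
                errors.append(f"Bad value at item {i}: {field} = {item[field]!r}")
    return errors

# Hypothetical value rules for the product scraper
product_rules = {
    "title": lambda t: isinstance(t, str) and t.strip() != "",
    "price": lambda p: isinstance(p, (int, float)) and p > 0,
    "url":   lambda u: isinstance(u, str) and u.startswith("http"),
}

errors = validate_values(
    [{"title": "Widget", "price": 0, "url": "https://example.com/w"}],
    product_rules,
)
# The price of 0 fails its rule, so exactly one error is reported
```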
Layer 3: Trend Detection
The sneakiest failures are gradual. A scraper that normally returns 1000 results and suddenly returns 500 is suspicious.
```python
import sqlite3

def log_run(scraper_name, result_count):
    conn = sqlite3.connect("scraper_metrics.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS runs (
            scraper TEXT, count INTEGER,
            timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
        )
    """)
    conn.execute(
        "INSERT INTO runs (scraper, count) VALUES (?, ?)",
        (scraper_name, result_count)
    )
    conn.commit()

    # Check for anomalies against the 7-day average
    avg = conn.execute("""
        SELECT AVG(count) FROM runs
        WHERE scraper = ?
          AND timestamp > datetime('now', '-7 days')
    """, (scraper_name,)).fetchone()[0]
    conn.close()

    if avg and result_count < avg * 0.5:
        return f"WARNING: {scraper_name} returned {result_count}, avg is {avg:.0f}"
    return None
```
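A fixed 50% threshold is blunt: it fires on naturally noisy scrapers and misses slow decay on stable ones. A small step up is a z-score against the recent history. This is a sketch of that alternative, not what the original system uses:

```python
import statistics

def is_anomalous(history, latest, z_cutoff=3.0):
    """Flag `latest` if it sits more than z_cutoff stdevs below the mean."""
    if len(history) < 5:
        return False  # not enough data to judge
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return (mean - latest) / stdev > z_cutoff

# Roughly 1000 results every run, then a sudden 500: flagged.
assert is_anomalous([1000, 990, 1010, 1005, 995], 500)
# A value inside normal variation: not flagged.
assert not is_anomalous([1000, 990, 1010, 1005, 995], 1002)
```

The history list would come from the same SQLite `runs` table; a scraper with tight, stable counts gets flagged on a small dip, while a noisy one needs a bigger drop.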
Layer 4: Alerts That Do Not Annoy
The biggest mistake is alerting on everything. After 2 days you ignore all alerts.
My rules:
- Critical (immediate): scraper returns 0 results for 2 runs in a row
- Warning (daily digest): result count dropped 50%+
- Info (weekly report): performance trends, slow scrapers
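The "0 results for 2 runs in a row" rule needs a little state between runs. A minimal sketch using a JSON counter file (the file name and function are my own, purely illustrative):

```python
import json
from pathlib import Path

STATE = Path("empty_streaks.json")

def record_run(scraper_name, result_count, threshold=2):
    """Return True when a scraper has been empty `threshold` runs in a row."""
    streaks = json.loads(STATE.read_text()) if STATE.exists() else {}
    if result_count == 0:
        streaks[scraper_name] = streaks.get(scraper_name, 0) + 1
    else:
        streaks[scraper_name] = 0  # any successful run resets the streak
    STATE.write_text(json.dumps(streaks))
    return streaks[scraper_name] >= threshold
```

One empty run stays quiet (sites hiccup); the second consecutive one crosses into critical territory.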
```python
import smtplib
from email.mime.text import MIMEText

def send_daily_digest(issues):
    if not issues:
        return  # No news is good news

    critical = [i for i in issues if i.startswith("CRITICAL")]
    warnings = [i for i in issues if i.startswith("WARNING")]

    body = f"""Scraper Monitor - Daily Digest

Critical ({len(critical)}):
{chr(10).join(critical) or 'None'}

Warnings ({len(warnings)}):
{chr(10).join(warnings) or 'None'}

Total scrapers: 77 | Healthy: {77 - len(issues)} | Issues: {len(issues)}
"""
    # Escalate the subject line when critical issues exist
    if critical:
        send_email("CRITICAL: Scraper failures", body)
    elif warnings:
        send_email("Scraper digest", body)
```
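`send_email` is left out above. A minimal sketch with `smtplib`, split so the message construction is testable; all addresses, hosts, and ports are placeholders, and it assumes an SMTP relay you can reach:

```python
import smtplib
from email.mime.text import MIMEText

def build_alert(subject, body,
                sender="monitor@example.com",
                recipient="me@example.com"):
    """Build a plain-text alert message. Addresses are placeholders."""
    msg = MIMEText(body)
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    return msg

def send_email(subject, body, host="localhost", port=25):
    """Send the alert via a local (or configured) SMTP relay."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(build_alert(subject, body))
```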
The Dashboard (Optional but Satisfying)
I built a simple HTML dashboard that shows:
- Green: ran in last 24h, results > 0
- Yellow: ran but results dropped
- Red: not run or 0 results
It is just a cron job that reads heartbeat files and generates a static HTML page. No framework needed.
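A sketch of such a generator, assuming the heartbeat layout from Layer 1. The color thresholds are my choices, and the yellow state is omitted here because "results dropped" needs the Layer 3 history, not just the latest heartbeat:

```python
import json
from datetime import datetime, timedelta
from pathlib import Path

def render_dashboard(heartbeat_dir="heartbeats", max_age_hours=24):
    """Render heartbeat files as a single static HTML page."""
    cutoff = datetime.now() - timedelta(hours=max_age_hours)
    rows = []
    for hb_file in sorted(Path(heartbeat_dir).glob("*.json")):
        hb = json.loads(hb_file.read_text())
        fresh = datetime.fromisoformat(hb["timestamp"]) >= cutoff
        # Green: fresh run with results. Red: stale, or fresh but empty.
        color = "green" if fresh and hb["result_count"] > 0 else "red"
        rows.append(
            f'<tr style="background:{color}"><td>{hb["scraper"]}</td>'
            f'<td>{hb["result_count"]}</td><td>{hb["timestamp"]}</td></tr>'
        )
    return ("<html><body><table>"
            "<tr><th>Scraper</th><th>Results</th><th>Last run</th></tr>"
            + "".join(rows) + "</table></body></html>")
```

Point a cron job at this, write the string to `index.html` behind any static file server, and the fleet is visible at a glance.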
Results
| Before | After |
|---|---|
| Found failures when clients complained | Found failures in minutes |
| 3-4 scraper fires per week | 0-1 per week |
| Manual checks every morning | Automated daily digest |
| No idea which scrapers were degrading | Trend graphs show everything |
The entire system is ~200 lines of Python. No Grafana, no Prometheus, no Datadog. Just files, SQLite, and email.
What is your scraper monitoring setup?
Are you monitoring your scrapers at all? Or do you find out when things break? I am curious what approaches others use — especially for larger fleets.
I write about web scraping, Python automation, and data engineering. Follow for practical tutorials from someone running 77 scrapers in production.
Related: 130+ web scraping tools | Scraper starter template
Need custom dev tools, scrapers, or API integrations? I build automation for dev teams. Email spinov001@gmail.com — or explore awesome-web-scraping.
More from me: 10 Dev Tools I Use Daily | 77 Scrapers on a Schedule | 150+ Free APIs
NEW: I Ran an AI Agent for 16 Days — What Works