I see too many scraping scripts using CSS selectors to find prices. Then frontend developers change class names. Your scraper breaks. You fix it. They change it again.
There is a better way. Here are three patterns for building resilient price monitors in Python.
Intercepting Internal APIs
Most modern sites hydrate their DOM from JSON fetched by the frontend. You can find this data by filtering for XHR/Fetch requests in the DevTools Network tab.
Instead of fighting WAFs with `requests`, use Playwright to intercept this traffic passively.
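A minimal sketch of that pattern with Playwright's sync API is below. The `fetch_price_payloads` helper, the `example.com` URL, and the `"/api/products"` filter are placeholders; you would swap in whatever endpoint you see in the Network tab.

```python
# Sketch: capture a product API response instead of scraping the DOM.
# Assumes: pip install playwright && playwright install chromium
# The URL and the "/api/products" substring are illustrative placeholders.
from playwright.sync_api import sync_playwright

def fetch_price_payloads(page_url):
    payloads = []

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()

        # Listen passively for responses; no blocking or rewriting.
        def on_response(response):
            if "/api/products" in response.url and response.ok:
                try:
                    payloads.append(response.json())
                except Exception:
                    pass  # Not JSON or body unavailable; skip

        page.on("response", on_response)
        page.goto(page_url, wait_until="networkidle")
        browser.close()

    return payloads

if __name__ == "__main__":
    for payload in fetch_price_payloads("https://example.com/product/123"):
        print(payload)
```

Because you let the real browser make the request, the payload arrives exactly as the frontend sees it, and a class-name change in the HTML no longer matters.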
The Selector Priority Waterfall
Sometimes you must parse HTML. Do not rely on a single selector. Build a hierarchy of reliability, from most stable to most fragile:
- JSON-LD (Structured Data)
- Meta Tags (itemprop="price")
- Data Attributes (data-price)
- CSS Classes
```python
# Check for machine-readable data first
json_ld = soup.find("script", {"type": "application/ld+json"})
if json_ld:
    data = json.loads(json_ld.string)
    return data["offers"]["price"]
```
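A fuller sketch of the waterfall is below, assuming `soup` is a parsed BeautifulSoup document. The `extract_raw_price` name and the `.price` selector in the last tier are placeholders (the real class is whatever the target site uses), which is exactly why that tier sits at the bottom.

```python
# Sketch of the full waterfall: try each tier in order, most stable first.
import json
from bs4 import BeautifulSoup

def extract_raw_price(soup: BeautifulSoup):
    # Tier 1: JSON-LD structured data
    tag = soup.find("script", {"type": "application/ld+json"})
    if tag and tag.string:
        try:
            data = json.loads(tag.string)
            offers = data.get("offers", {})
            if isinstance(offers, list):
                offers = offers[0]
            if "price" in offers:
                return offers["price"]
        except (json.JSONDecodeError, AttributeError, IndexError, TypeError):
            pass  # Malformed or unexpected JSON-LD; fall through

    # Tier 2: microdata meta tag
    meta = soup.find("meta", {"itemprop": "price"})
    if meta and meta.get("content"):
        return meta["content"]

    # Tier 3: data attribute
    node = soup.find(attrs={"data-price": True})
    if node:
        return node["data-price"]

    # Tier 4: CSS class (most fragile, last resort)
    node = soup.select_one(".price")
    if node:
        return node.get_text(strip=True)

    return None
```

Whatever tier matches, treat the result as a raw string and push it through the normalizer in the next section before storing anything.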
Strict Data Normalization
Never store raw strings. Never use floats. Use the `decimal` module and handle locale differences: "1.000" is one in the US but one thousand in Germany.
```python
from decimal import Decimal, InvalidOperation
import re

def normalize_price(raw_text, locale_hint="AUTO"):
    """
    Converts localized price strings to precise Decimal objects.

    Args:
        raw_text: Dirty strings like "€ 1.234,56", "$1,234.56", or "£1 234.56"
        locale_hint: "US", "EU", or "AUTO" for heuristic detection

    Returns:
        Decimal: Safe for financial calculations (never float)
    """
    if not raw_text:
        return None

    # Step 1: Remove artifacts
    # We strip everything except digits, commas, and dots
    # This handles space separators (e.g., "1 200.00" becomes "1200.00")
    cleaned = re.sub(r'[^\d.,]', '', raw_text)
    if not cleaned:
        raise ValueError(f"No numeric data found in: {raw_text}")

    # Step 2: Detect format
    if locale_hint == "AUTO":
        # If both separators exist, the right-most one is the decimal
        if ',' in cleaned and '.' in cleaned:
            last_comma = cleaned.rfind(',')
            last_period = cleaned.rfind('.')
            locale_hint = "EU" if last_comma > last_period else "US"
        # If only a comma exists, check context
        elif ',' in cleaned:
            # Ambiguous case: "1,234"
            # Logic: if exactly 2 digits follow the comma, assume EU (cents)
            # Otherwise assume a US thousands separator
            parts = cleaned.split(',')
            locale_hint = "EU" if len(parts[-1]) == 2 else "US"
        else:
            # Default to US if no comma is present
            locale_hint = "US"

    # Step 3: Normalize to Python standard (US)
    if locale_hint == "EU":
        # Convert "1.234,56" -> "1234.56"
        normalized = cleaned.replace('.', '').replace(',', '.')
    else:
        # Convert "1,234.56" -> "1234.56"
        normalized = cleaned.replace(',', '')

    try:
        return Decimal(normalized)
    except InvalidOperation:
        raise ValueError(f"Normalization failed: {raw_text} -> {normalized}")


# Unit tests for validation
if __name__ == "__main__":
    test_cases = [
        ("$1,234.56", "US", Decimal("1234.56")),
        ("€ 1.234,56", "EU", Decimal("1234.56")),
        ("Price: 1,200", "US", Decimal("1200")),  # US integer
        ("1,20 €", "AUTO", Decimal("1.20")),      # EU decimal
    ]
    for raw, locale, expected in test_cases:
        result = normalize_price(raw, locale)
        print(f"Input: {raw:15} | Mode: {locale:4} | Result: {result}")
        assert result == expected
```
Notes
If you scrape at scale, you will hit blocking: WAFs analyze your TLS fingerprint, and standard Python HTTP clients get blocked even behind good proxies. You must ensure your scraper mimics the TLS signature of a real browser.
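As one illustration (not the approach from the full article), the `curl_cffi` package can impersonate a browser's TLS handshake. This is a hedged sketch; the available impersonation targets depend on the installed version, so check the package docs before relying on the `"chrome"` string below.

```python
# Sketch: TLS fingerprint impersonation with curl_cffi.
# Assumes: pip install curl_cffi
# The "chrome" target and the URL are illustrative placeholders.
from curl_cffi import requests

resp = requests.get(
    "https://example.com/product/123",
    impersonate="chrome",  # mimic a real Chrome TLS signature
    timeout=30,
)
print(resp.status_code)
```

Running the browser itself (as in the Playwright example above) sidesteps the problem entirely, at the cost of more CPU and memory per request.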
We published a deep dive on how to implement this architecture. It includes code for AI parsing and multi-region monitoring.
Check out the full article on our blog: Read the Full Article
